% hgweb annotate header (kept for provenance):
% tommath.src @ 190:d8254fc979e9 (tags: libtommathorig, LTM_0.35)
% changeset: Initial import of libtommath 0.35
% author: Matt Johnston <matt@ucc.asn.au>
% date: Fri, 06 May 2005 08:59:30 +0000
% parents: d29b64170cf0
\documentclass[b5paper]{book}
\usepackage{hyperref}
\usepackage{makeidx}
\usepackage{amssymb}
\usepackage{color}
\usepackage{alltt}
\usepackage{graphicx}
\usepackage{layout}
\def\union{\cup}
\def\intersect{\cap}
\def\getsrandom{\stackrel{\rm R}{\gets}}
\def\cross{\times}
\def\cat{\hspace{0.5em} \| \hspace{0.5em}}
\def\catn{$\|$}
\def\divides{\hspace{0.3em} | \hspace{0.3em}}
\def\nequiv{\not\equiv}
\def\approx{\raisebox{0.2ex}{\mbox{\small $\sim$}}}
\def\lcm{{\rm lcm}}
\def\gcd{{\rm gcd}}
\def\log{{\rm log}}
\def\ord{{\rm ord}}
\def\abs{{\mathit abs}}
\def\rep{{\mathit rep}}
\def\mod{{\mathit\ mod\ }}
\renewcommand{\pmod}[1]{\ ({\rm mod\ }{#1})}
\newcommand{\floor}[1]{\left\lfloor{#1}\right\rfloor}
\newcommand{\ceil}[1]{\left\lceil{#1}\right\rceil}
\def\Or{{\rm\ or\ }}
\def\And{{\rm\ and\ }}
\def\iff{\hspace{1em}\Longleftrightarrow\hspace{1em}}
\def\implies{\Rightarrow}
\def\undefined{{\rm ``undefined''}}
\def\Proof{\vspace{1ex}\noindent {\bf Proof:}\hspace{1em}}
\let\oldphi\phi
\def\phi{\varphi}
\def\Pr{{\rm Pr}}
\newcommand{\str}[1]{{\mathbf{#1}}}
\def\F{{\mathbb F}}
\def\N{{\mathbb N}}
\def\Z{{\mathbb Z}}
\def\R{{\mathbb R}}
\def\C{{\mathbb C}}
\def\Q{{\mathbb Q}}
\definecolor{DGray}{gray}{0.5}
\newcommand{\emailaddr}[1]{\mbox{$<${#1}$>$}}
\def\twiddle{\raisebox{0.3ex}{\mbox{\tiny $\sim$}}}
\def\gap{\vspace{0.5ex}}
\makeindex
\begin{document}
\frontmatter
\pagestyle{empty}
\title{Multi-Precision Math}
\author{\mbox{
%\begin{small}
\begin{tabular}{c}
Tom St Denis \\
Algonquin College \\
\\
Mads Rasmussen \\
Open Communications Security \\
\\
Greg Rose \\
QUALCOMM Australia \\
\end{tabular}
%\end{small}
}
}
\maketitle
This text has been placed in the public domain. This text corresponds to the v0.35 release of the
LibTomMath project.

\begin{alltt}
Tom St Denis
111 Banning Rd
Ottawa, Ontario
K2L 1C3
Canada

Phone: 1-613-836-3160
Email: [email protected]
\end{alltt}

This text is formatted to the international B5 paper size of 176mm wide by 250mm tall using the \LaTeX{}
{\em book} macro package and the Perl {\em booker} package.

\tableofcontents
\listoffigures
\chapter*{Prefaces}
When I tell people about my LibTom projects and that I release them as public domain they are often puzzled.
They ask why I did it and especially why I continue to work on them for free. The best I can explain it is ``Because I can.''
Which seems odd and perhaps too terse for adult conversation. I often qualify it with ``I am able, I am willing.'' which
perhaps explains it better. I am the first to admit there is nothing that special about what I have done. Perhaps
others can see that too and then we would have a society to be proud of. My LibTom projects are what I am doing to give
back to society in the form of tools and knowledge that can help others in their endeavours.

I started writing this book because it was the most logical task to further my goal of open academia. The LibTomMath source
code itself was written to be easy to follow and learn from. There are times, however, where pure C source code does not
explain the algorithms properly. Hence this book. The book literally starts with the foundation of the library and works
itself outwards to the more complicated algorithms. The use of both pseudo-code and verbatim source code provides a duality
of ``theory'' and ``practice'' that the computer science students of the world shall appreciate. I never deviate too far
from relatively straightforward algebra and I hope that this book can be a valuable learning asset.

This book and indeed much of the LibTom projects would not exist in their current form if it was not for a plethora
of kind people donating their time, resources and kind words to help support my work. Writing a text of significant
length (along with the source code) is a tiresome and lengthy process. Currently the LibTom project is four years old,
comprises thousands of users and over 100,000 lines of source code, TeX and other material. People like Mads and Greg
were there at the beginning to encourage me to work well. It is amazing how timely validation from others can boost morale to
continue the project. Definitely my parents were there for me by providing room and board during the many months of work in 2003.

To my many friends whom I have met through the years I thank you for the good times and the words of encouragement. I hope I
honour your kind gestures with this project.

Open Source. Open Academia. Open Minds.

\begin{flushright} Tom St Denis \end{flushright}

\newpage
I found the opportunity to work with Tom appealing for several reasons: not only could I broaden my own horizons, but I could
also contribute to educating others facing the problem of having to handle big number mathematical calculations.

This book is Tom's child and he has been caring for and fostering the project ever since the beginning with a clear mind of
how he wanted the project to turn out. I have helped by proofreading the text and we have had several discussions about
the layout and language used.

I hold a masters degree in cryptography from the University of Southern Denmark and have always been interested in the
practical aspects of cryptography.

Having worked in the security consultancy business for several years in S\~{a}o Paulo, Brazil, I have been in touch with a
great deal of work in which multiple precision mathematics was needed. Understanding the possibilities for speeding up
multiple precision calculations is often very important since we deal with outdated machine architecture where modular
reductions, for example, become painfully slow.

This text is for people who stop and wonder when examining algorithms such as RSA for the first time and ask
themselves, ``You tell me this is only secure for large numbers, fine; but how do you implement these numbers?''

\begin{flushright}
Mads Rasmussen

S\~{a}o Paulo, SP

Brazil
\end{flushright}

\newpage
It's all because I broke my leg. That just happened to be at about the same time that Tom asked for someone to review the section of the book about
Karatsuba multiplication. I was laid up, alone and immobile, and thought ``Why not?'' I vaguely knew what Karatsuba multiplication was, but not
really, so I thought I could help, learn, and stop myself from watching daytime cable TV, all at once.

At the time of writing this, I've still not met Tom or Mads in meatspace. I've been following Tom's progress since his first splash on the
sci.crypt Usenet news group. I watched him go from a clueless newbie, to the cryptographic equivalent of a reformed smoker, to a real
contributor to the field, over a period of about two years. I've been impressed with his obvious intelligence, and astounded by his productivity.
Of course, he's young enough to be my own child, so he doesn't have my problems with staying awake.

When I reviewed that single section of the book, in its very earliest form, I was very pleasantly surprised. So I decided to collaborate more fully,
and at least review all of it, and perhaps write some bits too. There's still a long way to go with it, and I have watched a number of close
friends go through the mill of publication, so I think that the way to go is longer than Tom thinks it is. Nevertheless, it's a good effort,
and I'm pleased to be involved with it.

\begin{flushright}
Greg Rose, Sydney, Australia, June 2003.
\end{flushright}
162  
163 \mainmatter  
164 \pagestyle{headings}  
165 \chapter{Introduction}  
166 \section{Multiple Precision Arithmetic}  
167  
168 \subsection{What is Multiple Precision Arithmetic?}  
169 When we think of longhand arithmetic such as addition or multiplication we rarely consider the fact that we instinctively  
170 raise or lower the precision of the numbers we are dealing with. For example, in decimal we almost immediate can  
171 reason that $7$ times $6$ is $42$. However, $42$ has two digits of precision as opposed to one digit we started with.  
172 Further multiplications of say $3$ result in a larger precision result $126$. In these few examples we have multiple  
173 precisions for the numbers we are working with. Despite the various levels of precision a single subset\footnote{With the occasional optimization.}  
174 of algorithms can be designed to accomodate them.  
175  
176 By way of comparison a fixed or single precision operation would lose precision on various operations. For example, in  
177 the decimal system with fixed precision $6 \cdot 7 = 2$.  

Essentially at the heart of computer based multiple precision arithmetic are the same long-hand algorithms taught in
schools to manually add, subtract, multiply and divide.

\subsection{The Need for Multiple Precision Arithmetic}
The most prevalent need for multiple precision arithmetic, often referred to as ``bignum'' math, is within the implementation
of public-key cryptography algorithms. Algorithms such as RSA \cite{RSAREF} and Diffie-Hellman \cite{DHREF} require
integers of significant magnitude to resist known cryptanalytic attacks. For example, at the time of this writing a
typical RSA modulus would be at least greater than $10^{309}$. However, modern programming languages such as ISO C \cite{ISOC} and
Java \cite{JAVA} only provide intrinsic support for integers which are relatively small and single precision.
\begin{figure}[!here]
\begin{center}
\begin{tabular}{|r|c|}
\hline \textbf{Data Type} & \textbf{Range} \\
\hline char & $-128 \ldots 127$ \\
\hline short & $-32768 \ldots 32767$ \\
\hline long & $-2147483648 \ldots 2147483647$ \\
\hline long long & $-9223372036854775808 \ldots 9223372036854775807$ \\
\hline
\end{tabular}
\end{center}
\caption{Typical Data Types for the C Programming Language}
\label{fig:ISOC}
\end{figure}
203  
204 The largest data type guaranteed to be provided by the ISO C programming  
205 language\footnote{As per the ISO C standard. However, each compiler vendor is allowed to augment the precision as they  
206 see fit.} can only represent values up to $10^{19}$ as shown in figure \ref{fig:ISOC}. On its own the C language is  
207 insufficient to accomodate the magnitude required for the problem at hand. An RSA modulus of magnitude $10^{19}$ could be  
208 trivially factored\footnote{A PollardRho factoring would take only $2^{16}$ time.} on the average desktop computer,  
209 rendering any protocol based on the algorithm insecure. Multiple precision algorithms solve this very problem by  
210 extending the range of representable integers while using single precision data types.  
211  
212 Most advancements in fast multiple precision arithmetic stem from the need for faster and more efficient cryptographic  
213 primitives. Faster modular reduction and exponentiation algorithms such as Barrett's algorithm, which have appeared in  
214 various cryptographic journals, can render algorithms such as RSA and DiffieHellman more efficient. In fact, several  
215 major companies such as RSA Security, Certicom and Entrust have built entire product lines on the implementation and  
216 deployment of efficient algorithms.  
217  
However, cryptography is not the only field of study that can benefit from fast multiple precision integer routines.
Another auxiliary use of multiple precision integers is high precision floating point data types.
The basic IEEE \cite{IEEE} standard floating point type is made up of an integer mantissa $q$, an exponent $e$ and a sign bit $s$.
Numbers are given in the form $n = q \cdot b^e \cdot (-1)^s$ where $b = 2$ is the most common base for IEEE. Since IEEE
floating point is meant to be implemented in hardware the precision of the mantissa is often fairly small
(\textit{23, 48 and 64 bits}). The mantissa is merely an integer and a multiple precision integer could be used to create
a mantissa of much larger precision than hardware alone can efficiently support. This approach could be useful where
scientific applications must minimize the total output error over long calculations.

Yet another use for large integers is within arithmetic on polynomials of large characteristic (i.e. $GF(p)[x]$ for large $p$).
In fact the library discussed within this text has already been used to form a polynomial basis library\footnote{See \url{http://poly.libtomcrypt.org} for more details.}.

\subsection{Benefits of Multiple Precision Arithmetic}
\index{precision}
The benefit of multiple precision representations over single or fixed precision representations is that
no precision is lost while representing the result of an operation which requires excess precision. For example,
the product of two $n$-bit integers requires at least $2n$ bits of precision to be represented faithfully. A multiple
precision algorithm would augment the precision of the destination to accommodate the result while a single precision system
would truncate excess bits to maintain a fixed level of precision.

It is possible to implement algorithms which require large integers with fixed precision algorithms. For example, elliptic
curve cryptography (\textit{ECC}) is often implemented on smartcards by fixing the precision of the integers to the maximum
size the system will ever need. Such an approach can lead to vastly simpler algorithms which can accommodate the
integers required even if the host platform cannot natively accommodate them\footnote{For example, the average smartcard
processor has an 8 bit accumulator.}. However, as efficient as such an approach may be, the resulting source code is not
normally very flexible. It cannot, at run-time, accommodate inputs of higher magnitude than the designer anticipated.

Multiple precision algorithms have the most overhead of any style of arithmetic. For the most part the
overhead can be kept to a minimum with careful planning, but overall, it is not well suited for most memory starved
platforms. However, multiple precision algorithms do offer the most flexibility in terms of the magnitude of the
inputs. That is, the same algorithms based on multiple precision integers can accommodate any reasonable size input
without the designer's explicit forethought. This leads to lower cost of ownership for the code as it only has to
be written and tested once.

\section{Purpose of This Text}
The purpose of this text is to instruct the reader regarding how to implement efficient multiple precision algorithms.
That is, to not only explain a limited subset of the core theory behind the algorithms but also the various ``house-keeping''
elements that are neglected by authors of other texts on the subject. Several well renowned texts \cite{TAOCPV2,HAC}
give considerably detailed explanations of the theoretical aspects of algorithms and often very little information
regarding the practical implementation aspects.

In most cases how an algorithm is explained and how it is actually implemented are two very different concepts. For
example, the Handbook of Applied Cryptography (\textit{HAC}), algorithm 14.7 on page 594, gives a relatively simple
algorithm for performing multiple precision integer addition. However, the description lacks any discussion concerning
the fact that the two integer inputs may be of differing magnitudes. As a result the implementation is not as simple
as the text would lead people to believe. Similarly the division routine (\textit{algorithm 14.20, pp. 598}) does not
discuss how to handle sign or handle the dividend's decreasing magnitude in the main loop (\textit{step \#3}).

Both texts also do not discuss several key optimal algorithms required, such as ``Comba'' and Karatsuba multipliers
and fast modular inversion, which we consider practical oversights. These optimal algorithms are vital to achieve
any form of useful performance in non-trivial applications.

To solve this problem the focus of this text is on the practical aspects of implementing a multiple precision integer
package. As a case study the ``LibTomMath''\footnote{Available at \url{http://math.libtomcrypt.org}} package is used
to demonstrate algorithms with real implementations\footnote{In the ISO C programming language.} that have been field
tested and work very well. The LibTomMath library is freely available on the Internet for all uses and this text
discusses a very large portion of the inner workings of the library.

The algorithms that are presented will always include at least one ``pseudo-code'' description followed
by the actual C source code that implements the algorithm. The pseudo-code can be used to implement the same
algorithm in other programming languages as the reader sees fit.

This text shall also serve as a walk-through of the creation of multiple precision algorithms from scratch, showing
the reader how the algorithms fit together and where to start on various tasks.

\section{Discussion and Notation}
\subsection{Notation}
A multiple precision integer of $n$ digits shall be denoted as $x = (x_{n-1}, \ldots, x_1, x_0)_{ \beta }$ and represent
the integer $x \equiv \sum_{i=0}^{n-1} x_i\beta^i$. The elements of the array $x$ are said to be the radix $\beta$ digits
of the integer. For example, $x = (1,2,3)_{10}$ would represent the integer
$1\cdot 10^2 + 2\cdot10^1 + 3\cdot10^0 = 123$.

\index{mp\_int}
The term ``mp\_int'' shall refer to a composite structure which contains the digits of the integer it represents, as well
as auxiliary data required to manipulate the data. These additional members are discussed further in section
\ref{sec:MPINT}. For the purposes of this text a ``multiple precision integer'' and an ``mp\_int'' are assumed to be
synonymous. When an algorithm is specified to accept an mp\_int variable it is assumed the various auxiliary data members
are present as well. An expression of the type \textit{variablename.item} implies that it should evaluate to the
member named ``item'' of the variable. For example, a string of characters may have a member ``length'' which would
evaluate to the number of characters in the string. If the string $a$ equals ``hello'' then it follows that
$a.length = 5$.

For certain discussions more generic algorithms are presented to help the reader understand the final algorithm used
to solve a given problem. When an algorithm is described as accepting an integer input it is assumed the input is
a plain integer with no additional multiple-precision members. That is, algorithms that use integers as opposed to
mp\_ints as inputs do not concern themselves with the house-keeping operations required such as memory management. These
algorithms will be used to establish the relevant theory which will subsequently be used to describe a multiple
precision algorithm to solve the same problem.

\subsection{Precision Notation}
The variable $\beta$ represents the radix of a single digit of a multiple precision integer and
must be of the form $q^p$ for $q, p \in \Z^+$. A single precision variable must be able to represent integers in
the range $0 \le x < q \beta$ while a double precision variable must be able to represent integers in the range
$0 \le x < q \beta^2$. The extra radix-$q$ factor allows additions and subtractions to proceed without truncation of the
carry. Since all modern computers are binary, it is assumed that $q$ is two.

\index{mp\_digit} \index{mp\_word}
Within the source code that will be presented for each algorithm, the data type \textbf{mp\_digit} will represent
a single precision integer type, while the data type \textbf{mp\_word} will represent a double precision integer type. In
several algorithms (notably the Comba routines) temporary results will be stored in arrays of double precision mp\_words.
For the purposes of this text $x_j$ will refer to the $j$'th digit of a single precision array and $\hat x_j$ will refer to
the $j$'th digit of a double precision array. Whenever an expression is to be assigned to a double precision
variable it is assumed that all single precision variables are promoted to double precision during the evaluation.
Expressions that are assigned to a single precision variable are truncated to fit within the precision of a single
precision data type.

For example, if $\beta = 10^2$ a single precision data type may represent a value in the
range $0 \le x < 10^3$, while a double precision data type may represent a value in the range $0 \le x < 10^5$. Let
$a = 23$ and $b = 49$ represent two single precision variables. The single precision product shall be written
as $c \leftarrow a \cdot b$ while the double precision product shall be written as $\hat c \leftarrow a \cdot b$.
In this particular case, $\hat c = 1127$ and $c = 127$. The most significant digit of the product would not fit
in a single precision data type and as a result $c \ne \hat c$.

\subsection{Algorithm Inputs and Outputs}
Within the algorithm descriptions all variables are assumed to be scalars of either single or double precision
as indicated. The only exception to this rule is when variables have been indicated to be of type mp\_int. This
distinction is important as scalars are often used as array indices and various other counters.

\subsection{Mathematical Expressions}
The $\lfloor \mbox{ } \rfloor$ brackets imply an expression truncated to an integer not greater than the expression
itself. For example, $\lfloor 5.7 \rfloor = 5$. Similarly the $\lceil \mbox{ } \rceil$ brackets imply an expression
rounded to an integer not less than the expression itself. For example, $\lceil 5.1 \rceil = 6$. Typically when
the $/$ division symbol is used the intention is to perform an integer division with truncation. For example,
$5/2 = 2$ which will often be written as $\lfloor 5/2 \rfloor = 2$ for clarity. When an expression is written as a
fraction a real value division is implied, for example ${5 \over 2} = 2.5$.

The norm of a multiple precision integer, for example $\vert \vert x \vert \vert$, will be used to represent the number of digits in the representation
of the integer. For example, $\vert \vert 123 \vert \vert = 3$ and $\vert \vert 79452 \vert \vert = 5$.

\subsection{Work Effort}
\index{big-Oh}
To measure the efficiency of the specified algorithms, a modified big-Oh notation is used. In this system all
single precision operations are considered to have the same cost\footnote{Except where explicitly noted.}.
That is, a single precision addition, multiplication and division are assumed to take the same time to
complete. While this is generally not true in practice, it will simplify the discussions considerably.

Some algorithms have slight advantages over others which is why some constants will not be removed in
the notation. For example, a normal baseline multiplication (section \ref{sec:basemult}) requires $O(n^2)$ work while a
baseline squaring (section \ref{sec:basesquare}) requires $O({{n^2 + n}\over 2})$ work. In standard big-Oh notation these
would both be said to be equivalent to $O(n^2)$. However,
in the context of this text this is not the case as the magnitude of the inputs will typically be rather small. As a
result small constant factors in the work effort will make an observable difference in algorithm efficiency.
360  
361 All of the algorithms presented in this text have a polynomial time work level. That is, of the form  
362 $O(n^k)$ for $n, k \in \Z^{+}$. This will help make useful comparisons in terms of the speed of the algorithms and how  
363 various optimizations will help pay off in the long run.  
364  
365 \section{Exercises}  
366 Within the more advanced chapters a section will be set aside to give the reader some challenging exercises related to  
367 the discussion at hand. These exercises are not designed to be prize winning problems, but instead to be thought  
368 provoking. Wherever possible the problems are forward minded, stating problems that will be answered in subsequent  
369 chapters. The reader is encouraged to finish the exercises as they appear to get a better understanding of the  
370 subject material.  
371  
372 That being said, the problems are designed to affirm knowledge of a particular subject matter. Students in particular  
373 are encouraged to verify they can answer the problems correctly before moving on.  
374  
375 Similar to the exercises of \cite[pp. ix]{TAOCPV2} these exercises are given a scoring system based on the difficulty of  
376 the problem. However, unlike \cite{TAOCPV2} the problems do not get nearly as hard. The scoring of these  
377 exercises ranges from one (the easiest) to five (the hardest). The following table sumarizes the  
378 scoring system used.  
379  
380 \begin{figure}[here]  
381 \begin{center}  
382 \begin{small}  
383 \begin{tabular}{cl}  
384 \hline $\left [ 1 \right ]$ & An easy problem that should only take the reader a manner of \\  
385 & minutes to solve. Usually does not involve much computer time \\  
386 & to solve. \\  
387 \hline $\left [ 2 \right ]$ & An easy problem that involves a marginal amount of computer \\  
388 & time usage. Usually requires a program to be written to \\  
389 & solve the problem. \\  
390 \hline $\left [ 3 \right ]$ & A moderately hard problem that requires a nontrivial amount \\  
391 & of work. Usually involves trivial research and development of \\  
392 & new theory from the perspective of a student. \\  
393 \hline $\left [ 4 \right ]$ & A moderately hard problem that involves a nontrivial amount \\  
394 & of work and research, the solution to which will demonstrate \\  
395 & a higher mastery of the subject matter. \\  
396 \hline $\left [ 5 \right ]$ & A hard problem that involves concepts that are difficult for a \\  
397 & novice to solve. Solutions to these problems will demonstrate a \\  
398 & complete mastery of the given subject. \\  
399 \hline  
400 \end{tabular}  
401 \end{small}  
402 \end{center}  
403 \caption{Exercise Scoring System}  
404 \end{figure}  
405  
406 Problems at the first level are meant to be simple questions that the reader can answer quickly without programming a solution or  
407 devising new theory. These problems are quick tests to see if the material is understood. Problems at the second level  
408 are also designed to be easy but will require a program or algorithm to be implemented to arrive at the answer. These  
409 two levels are essentially entry level questions.  
410  
411 Problems at the third level are meant to be a bit more difficult than the first two levels. The answer is often  
412 fairly obvious but arriving at an exacting solution requires some thought and skill. These problems will almost always  
413 involve devising a new algorithm or implementing a variation of another algorithm previously presented. Readers who can  
414 answer these questions will feel comfortable with the concepts behind the topic at hand.  
415  
416 Problems at the fourth level are meant to be similar to those of the level three questions except they will require  
417 additional research to be completed. The reader will most likely not know the answer right away, nor will the text provide  
418 the exact details of the answer until a subsequent chapter.  
419  
420 Problems at the fifth level are meant to be the hardest  
421 problems relative to all the other problems in the chapter. People who can correctly answer fifth level problems have a  
422 mastery of the subject matter at hand.  
423  
Often problems will be tied together. The purpose of this is to start a chain of thought that will be discussed in future chapters. The reader
is encouraged to answer the follow-up problems and to draw the connections between them.
426  
427 \section{Introduction to LibTomMath}  
428  
429 \subsection{What is LibTomMath?}  
430 LibTomMath is a free and open source multiple precision integer library written entirely in portable ISO C. By portable it  
431 is meant that the library does not contain any code that is computer platform dependent or otherwise problematic to use on  
432 any given platform.  
433  
434 The library has been successfully tested under numerous operating systems including Unix\footnote{All of these  
435 trademarks belong to their respective rightful owners.}, MacOS, Windows, Linux, PalmOS and on standalone hardware such  
436 as the Gameboy Advance. The library is designed to contain enough functionality to be able to develop applications such  
437 as public key cryptosystems and still maintain a relatively small footprint.  
438  
439 \subsection{Goals of LibTomMath}  
440  
441 Libraries which obtain the most efficiency are rarely written in a high level programming language such as C. However,  
442 even though this library is written entirely in ISO C, considerable care has been taken to optimize the algorithm implementations within the  
443 library. Specifically the code has been written to work well with the GNU C Compiler (\textit{GCC}) on both x86 and ARM  
444 processors. Wherever possible, highly efficient algorithms, such as Karatsuba multiplication, sliding window  
445 exponentiation and Montgomery reduction have been provided to make the library more efficient.  
446  
Even with the nearly optimal and specialized algorithms that have been included, the Application Programming Interface
(\textit{API}) has been kept as simple as possible. Often generic placeholder routines will make use of specialized
algorithms automatically without the developer's specific attention. One such example is the generic multiplication
algorithm \textbf{mp\_mul()} which will automatically use Toom-Cook, Karatsuba, Comba or baseline multiplication
based on the magnitude of the inputs and the configuration of the library.
452  
453 Making LibTomMath as efficient as possible is not the only goal of the LibTomMath project. Ideally the library should  
454 be source compatible with another popular library which makes it more attractive for developers to use. In this case the  
MPI library was used as an API template for all the basic functions. MPI was chosen because it is another library that fits
456 in the same niche as LibTomMath. Even though LibTomMath uses MPI as the template for the function names and argument  
457 passing conventions, it has been written from scratch by Tom St Denis.  
458  
The project is also meant to act as a learning tool for students, the logic being that no easy-to-follow ``bignum''
460 library exists which can be used to teach computer science students how to perform fast and reliable multiple precision  
461 integer arithmetic. To this end the source code has been given quite a few comments and algorithm discussion points.  
462  
463 \section{Choice of LibTomMath}  
464 LibTomMath was chosen as the case study of this text not only because the author of both projects is one and the same but  
465 for more worthy reasons. Other libraries such as GMP \cite{GMP}, MPI \cite{MPI}, LIP \cite{LIP} and OpenSSL  
466 \cite{OPENSSL} have multiple precision integer arithmetic routines but would not be ideal for this text for  
467 reasons that will be explained in the following subsections.  
468  
469 \subsection{Code Base}  
470 The LibTomMath code base is all portable ISO C source code. This means that there are no platform dependent conditional  
471 segments of code littered throughout the source. This clean and uncluttered approach to the library means that a  
472 developer can more readily discern the true intent of a given section of source code without trying to keep track of  
473 what conditional code will be used.  
474  
475 The code base of LibTomMath is well organized. Each function is in its own separate source code file  
which allows the reader to find a given function very quickly. On average there are $76$ lines of code per source
file, which makes the source very easy to follow. By comparison, MPI and LIP are single file projects, making code tracing
478 very hard. GMP has many conditional code segments which also hinder tracing.  
479  
480 When compiled with GCC for the x86 processor and optimized for speed the entire library is approximately $100$KiB\footnote{The notation ``KiB'' means $2^{10}$ octets, similarly ``MiB'' means $2^{20}$ octets.}  
481 which is fairly small compared to GMP (over $250$KiB). LibTomMath is slightly larger than MPI (which compiles to about  
482 $50$KiB) but LibTomMath is also much faster and more complete than MPI.  
483  
484 \subsection{API Simplicity}  
485 LibTomMath is designed after the MPI library and shares the API design. Quite often programs that use MPI will build  
486 with LibTomMath without change. The function names correlate directly to the action they perform. Almost all of the  
487 functions share the same parameter passing convention. The learning curve is fairly shallow with the API provided  
488 which is an extremely valuable benefit for the student and developer alike.  
489  
The LIP library is an example of a library with an API that is awkward to work with. LIP uses function names that are often ``compressed'' into
illegible shorthand. LibTomMath does not share this characteristic.
492  
493 The GMP library also does not return error codes. Instead it uses a POSIX.1 \cite{POSIX1} signal system where errors  
are signaled to the host application. This happens to be the fastest approach but certainly not the most versatile. In
effect a math error (i.e. invalid input, heap error, etc.) can cause a program to stop functioning, which is
undesirable in many situations.
497  
498 \subsection{Optimizations}  
499 While LibTomMath is certainly not the fastest library (GMP often beats LibTomMath by a factor of two) it does  
500 feature a set of optimal algorithms for tasks such as modular reduction, exponentiation, multiplication and squaring. GMP  
501 and LIP also feature such optimizations while MPI only uses baseline algorithms with no optimizations. GMP lacks a few  
502 of the additional modular reduction optimizations that LibTomMath features\footnote{At the time of this writing GMP  
503 only had Barrett and Montgomery modular reduction algorithms.}.  
504  
505 LibTomMath is almost always an order of magnitude faster than the MPI library at computationally expensive tasks such as modular  
506 exponentiation. In the grand scheme of ``bignum'' libraries LibTomMath is faster than the average library and usually  
507 slower than the best libraries such as GMP and OpenSSL by only a small factor.  
508  
509 \subsection{Portability and Stability}  
510 LibTomMath will build ``out of the box'' on any platform equipped with a modern version of the GNU C Compiler  
(\textit{GCC}). This means that the library will build without configuration changes or the setting of any environment
variables. LIP and MPI will build ``out of the box'' as well but have numerous known bugs. Most notably, the author of
513 MPI has recently stopped working on his library and LIP has long since been discontinued.  
514  
515 GMP requires a configuration script to run and will not build out of the box. GMP and LibTomMath are still in active  
516 development and are very stable across a variety of platforms.  
517  
518 \subsection{Choice}  
519 LibTomMath is a relatively compact, well documented, highly optimized and portable library which seems only natural for  
520 the case study of this text. Various source files from the LibTomMath project will be included within the text. However,  
521 the reader is encouraged to download their own copy of the library to actually be able to work with the library.  
522  
523 \chapter{Getting Started}  
524 \section{Library Basics}  
525 The trick to writing any useful library of source code is to build a solid foundation and work outwards from it. First,  
a problem along with allowable solution parameters should be identified and analyzed. In this particular case the
inability to accommodate multiple precision integers is the problem. Furthermore, the solution must be written
as portable source code that is reasonably efficient across several different computer platforms.
529  
530 After a foundation is formed the remainder of the library can be designed and implemented in a hierarchical fashion.  
531 That is, to implement the lowest level dependencies first and work towards the most abstract functions last. For example,  
532 before implementing a modular exponentiation algorithm one would implement a modular reduction algorithm.  
533 By building outwards from a base foundation instead of using a parallel design methodology the resulting project is  
534 highly modular. Being highly modular is a desirable property of any project as it often means the resulting product  
535 has a small footprint and updates are easy to perform.  
536  
142  537 Usually when I start a project I will begin with the header files. I define the data types I think I will need and 
19  538 prototype the initial functions that are not dependent on other functions (within the library). After I 
539 implement these base functions I prototype more dependent functions and implement them. The process repeats until  
540 I implement all of the functions I require. For example, in the case of LibTomMath I implemented functions such as  
mp\_init() well before I implemented mp\_mul() and even further before I implemented mp\_exptmod(). As an example of
why this design works, note that the Karatsuba and Toom-Cook multipliers were written \textit{after} the
dependent function mp\_exptmod() was written. Adding the new multiplication algorithms did not require changes to the
mp\_exptmod() function itself, which lowered the total cost of ownership (\textit{so to speak}) of developing
new algorithms. This methodology allows new algorithms to be tested in a complete framework with relative ease.
546  
547 FIGU,design_process,Design Flow of the First Few Original LibTomMath Functions.  
548  
549 Only after the majority of the functions were in place did I pursue a less hierarchical approach to auditing and optimizing  
550 the source code. For example, one day I may audit the multipliers and the next day the polynomial basis functions.  
551  
It only makes sense to begin the text with the preliminary data types and support algorithms required as well.
This chapter discusses the core algorithms of the library, which are the dependencies of every other algorithm.
554  
555 \section{What is a Multiple Precision Integer?}  
556 Recall that most programming languages, in particular ISO C \cite{ISOC}, only have fixed precision data types that on their own cannot  
557 be used to represent values larger than their precision will allow. The purpose of multiple precision algorithms is  
558 to use fixed precision data types to create and manipulate multiple precision integers which may represent values  
559 that are very large.  
560  
561 As a well known analogy, school children are taught how to form numbers larger than nine by prepending more radix ten digits. In the decimal system  
562 the largest single digit value is $9$. However, by concatenating digits together larger numbers may be represented. Newly prepended digits  
563 (\textit{to the left}) are said to be in a different power of ten column. That is, the number $123$ can be described as having a $1$ in the hundreds  
564 column, $2$ in the tens column and $3$ in the ones column. Or more formally $123 = 1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0$. Computer based  
565 multiple precision arithmetic is essentially the same concept. Larger integers are represented by adjoining fixed  
566 precision computer words with the exception that a different radix is used.  
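The positional idea above can be sketched in a few lines of C. The helper below is purely illustrative (it is not a LibTomMath routine, and the names `to_digits` and `beta` are invented for this example): it splits a fixed precision value into radix-$\beta$ digits stored least significant first, exactly the convention used throughout the rest of the chapter.

```c
#include <assert.h>

/* Decompose v into digits of radix beta, least significant digit
 * first. Hypothetical helper for illustration only; returns the
 * number of digits written into dp. */
static int to_digits(unsigned long v, unsigned long beta,
                     unsigned long *dp, int max)
{
    int used = 0;
    while (v != 0 && used < max) {
        dp[used++] = v % beta;   /* extract the current column */
        v /= beta;               /* shift right by one column */
    }
    return used;
}
```

With $\beta = 10$ the value $123$ produces the digit array $\lbrace 3, 2, 1 \rbrace$, mirroring the ones, tens and hundreds columns of the schoolbook example.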
567  
568 What most people probably do not think about explicitly are the various other attributes that describe a multiple precision  
569 integer. For example, the integer $154_{10}$ has two immediately obvious properties. First, the integer is positive,  
570 that is the sign of this particular integer is positive as opposed to negative. Second, the integer has three digits in  
its representation. There is an additional property that the integer possesses that does not concern pencil-and-paper
arithmetic. The third property is how many digit placeholders are available to hold the integer.
573  
574 The human analogy of this third property is ensuring there is enough space on the paper to write the integer. For example,  
575 if one starts writing a large number too far to the right on a piece of paper they will have to erase it and move left.  
576 Similarly, computer algorithms must maintain strict control over memory usage to ensure that the digits of an integer  
577 will not exceed the allowed boundaries. These three properties make up what is known as a multiple precision  
578 integer or mp\_int for short.  
579  
580 \subsection{The mp\_int Structure}  
581 \label{sec:MPINT}  
The mp\_int structure is the ISO C manifestation of a multiple precision integer. The ISO C standard does not provide for
583 any such data type but it does provide for making composite data types known as structures. The following is the structure definition  
584 used within LibTomMath.  
585  
586 \index{mp\_int}  
142  587 \begin{figure}[here] 
588 \begin{center}  
589 \begin{small}  
590 %\begin{verbatim}  
591 \begin{tabular}{l}  
592 \hline  
593 typedef struct \{ \\  
594 \hspace{3mm}int used, alloc, sign;\\  
595 \hspace{3mm}mp\_digit *dp;\\  
596 \} \textbf{mp\_int}; \\  
597 \hline  
598 \end{tabular}  
599 %\end{verbatim}  
600 \end{small}  
601 \caption{The mp\_int Structure}  
602 \label{fig:mpint}  
603 \end{center}  
604 \end{figure}  
605  
606 The mp\_int structure (fig. \ref{fig:mpint}) can be broken down as follows.  
19  607 
608 \begin{enumerate}  
609 \item The \textbf{used} parameter denotes how many digits of the array \textbf{dp} contain the digits used to represent  
610 a given integer. The \textbf{used} count must be positive (or zero) and may not exceed the \textbf{alloc} count.  
611  
\item The \textbf{alloc} parameter denotes how
many digits are available in the array for functions to use before it has to be increased in size. When the \textbf{used} count
of a result would exceed the \textbf{alloc} count, all of the algorithms will automatically increase the size of the
array to accommodate the precision of the result.
616  
\item The pointer \textbf{dp} points to a dynamically allocated array of digits that represent the given multiple
precision integer. It is padded with $(\textbf{alloc} - \textbf{used})$ zero digits. The array is maintained in a least
significant digit order. As a pencil and paper analogy, the array is organized such that the right most digits are stored
620 first starting at the location indexed by zero\footnote{In C all arrays begin at zero.} in the array. For example,  
621 if \textbf{dp} contains $\lbrace a, b, c, \ldots \rbrace$ where \textbf{dp}$_0 = a$, \textbf{dp}$_1 = b$, \textbf{dp}$_2 = c$, $\ldots$ then  
622 it would represent the integer $a + b\beta + c\beta^2 + \ldots$  
623  
624 \index{MP\_ZPOS} \index{MP\_NEG}  
625 \item The \textbf{sign} parameter denotes the sign as either zero/positive (\textbf{MP\_ZPOS}) or negative (\textbf{MP\_NEG}).  
626 \end{enumerate}  
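The digit convention of the \textbf{dp} member can be made concrete with a short sketch. The struct below is a toy mirror of the mp\_int layout (the names `toy_int` and `toy_value` are invented for this example, and a decimal radix is used only for readability); it evaluates $\textbf{dp}_0 + \textbf{dp}_1\beta + \textbf{dp}_2\beta^2 + \ldots$ by Horner's rule.

```c
#include <assert.h>

/* Toy mirror of the mp_int layout, for illustration only. */
typedef struct {
    int used, alloc, sign;
    unsigned long *dp;           /* least significant digit first */
} toy_int;

/* Evaluate the digits in radix beta: dp[0] + dp[1]*beta + ...
 * Horner's rule, scanning from the most significant digit down. */
static unsigned long toy_value(const toy_int *a, unsigned long beta)
{
    unsigned long acc = 0;
    for (int n = a->used - 1; n >= 0; n--) {
        acc = acc * beta + a->dp[n];
    }
    return acc;
}
```

For example, the digit array $\lbrace 3, 2, 1 \rbrace$ with $\beta = 10$ evaluates to $123$, matching the positional description above.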
627  
628 \subsubsection{Valid mp\_int Structures}  
629 Several rules are placed on the state of an mp\_int structure and are assumed to be followed for reasons of efficiency.  
630 The only exceptions are when the structure is passed to initialization functions such as mp\_init() and mp\_init\_copy().  
631  
632 \begin{enumerate}  
633 \item The value of \textbf{alloc} may not be less than one. That is \textbf{dp} always points to a previously allocated  
634 array of digits.  
635 \item The value of \textbf{used} may not exceed \textbf{alloc} and must be greater than or equal to zero.  
\item The value of \textbf{used} implies that the digit at index $(used - 1)$ of the \textbf{dp} array is nonzero. That is,
leading zero digits in the most significant positions must be trimmed.
638 \begin{enumerate}  
639 \item Digits in the \textbf{dp} array at and above the \textbf{used} location must be zero.  
640 \end{enumerate}  
641 \item The value of \textbf{sign} must be \textbf{MP\_ZPOS} if \textbf{used} is zero;  
642 this represents the mp\_int value of zero.  
643 \end{enumerate}  
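The four rules above translate directly into a predicate. The following is a hedged sketch only (the `toy_int` struct and `toy_is_valid` helper are invented for this example; LibTomMath itself assumes validity rather than checking it), but it shows exactly what each rule demands of the structure.

```c
#include <stddef.h>
#include <assert.h>

#define MP_ZPOS 0
#define MP_NEG  1

/* Toy mirror of the mp_int layout, for illustration only. */
typedef struct { int used, alloc, sign; unsigned long *dp; } toy_int;

/* Returns 1 if the structure obeys the validity rules, 0 otherwise. */
static int toy_is_valid(const toy_int *a)
{
    if (a->dp == NULL || a->alloc < 1) return 0;            /* rule 1 */
    if (a->used < 0 || a->used > a->alloc) return 0;        /* rule 2 */
    if (a->used > 0 && a->dp[a->used - 1] == 0) return 0;   /* rule 3 */
    for (int n = a->used; n < a->alloc; n++)                /* rule 3.1 */
        if (a->dp[n] != 0) return 0;
    if (a->used == 0 && a->sign != MP_ZPOS) return 0;       /* rule 4 */
    return 1;
}
```

A structure with a leading zero digit (say \textbf{used} $= 2$ but $\textbf{dp}_1 = 0$) fails rule three, which is why the algorithms in later chapters always ``clamp'' excess zero digits after an operation.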
644  
645 \section{Argument Passing}  
646 A convention of argument passing must be adopted early on in the development of any library. Making the function  
647 prototypes consistent will help eliminate many headaches in the future as the library grows to significant complexity.  
648 In LibTomMath the multiple precision integer functions accept parameters from left to right as pointers to mp\_int  
649 structures. That means that the source (input) operands are placed on the left and the destination (output) on the right.  
650 Consider the following examples.  
651  
652 \begin{verbatim}  
653 mp_mul(&a, &b, &c); /* c = a * b */  
654 mp_add(&a, &b, &a); /* a = a + b */  
655 mp_sqr(&a, &b); /* b = a * a */  
656 \end{verbatim}  
657  
658 The left to right order is a fairly natural way to implement the functions since it lets the developer read aloud the  
659 functions and make sense of them. For example, the first function would read ``multiply a and b and store in c''.  
660  
661 Certain libraries (\textit{LIP by Lenstra for instance}) accept parameters the other way around, to mimic the order  
662 of assignment expressions. That is, the destination (output) is on the left and arguments (inputs) are on the right. In  
663 truth, it is entirely a matter of preference. In the case of LibTomMath the convention from the MPI library has been  
664 adopted.  
665  
666 Another very useful design consideration, provided for in LibTomMath, is whether to allow argument sources to also be a  
667 destination. For example, the second example (\textit{mp\_add}) adds $a$ to $b$ and stores in $a$. This is an important  
668 feature to implement since it allows the calling functions to cut down on the number of variables it must maintain.  
669 However, to implement this feature specific care has to be given to ensure the destination is not modified before the  
670 source is fully read.  
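The aliasing requirement can be illustrated with a small sketch (the `digit_add` helper is invented for this example and is not the LibTomMath routine): column-wise addition is naturally safe for in-place use because each column of the sources is fully read before the corresponding destination digit is written.

```c
#include <assert.h>

/* Add two little-endian digit arrays of equal length in radix beta.
 * The destination c may alias a or b: in each column both source
 * digits are read before c[n] is written, so calls such as
 * digit_add(a, b, a, ...) behave like a = a + b. Illustrative
 * sketch only. Returns the final carry out of the top column. */
static unsigned long digit_add(const unsigned long *a,
                               const unsigned long *b,
                               unsigned long *c,
                               int len, unsigned long beta)
{
    unsigned long carry = 0;
    for (int n = 0; n < len; n++) {
        unsigned long t = a[n] + b[n] + carry; /* read before write */
        carry = t / beta;                      /* carry to next column */
        c[n]  = t % beta;                      /* write destination */
    }
    return carry;
}
```

Had the routine written $c_n$ before reading $b_n$, the call with $c = b$ would corrupt its own input mid-computation; this read-before-write discipline is the ``specific care'' the text refers to.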
671  
672 \section{Return Values}  
673 A well implemented application, no matter what its purpose, should trap as many runtime errors as possible and return them  
674 to the caller. By catching runtime errors a library can be guaranteed to prevent undefined behaviour. However, the end  
675 developer can still manage to cause a library to crash. For example, by passing an invalid pointer an application may  
676 fault by dereferencing memory not owned by the application.  
677  
678 In the case of LibTomMath the only errors that are checked for are related to inappropriate inputs (division by zero for  
679 instance) and memory allocation errors. It will not check that the mp\_int passed to any function is valid nor  
680 will it check pointers for validity. Any function that can cause a runtime error will return an error code as an  
142  681 \textbf{int} data type with one of the following values (fig \ref{fig:errcodes}). 
19  682 
683 \index{MP\_OKAY} \index{MP\_VAL} \index{MP\_MEM}  
142  684 \begin{figure}[here] 
19  685 \begin{center} 
686 \begin{tabular}{ll}  
687 \hline \textbf{Value} & \textbf{Meaning} \\  
688 \hline \textbf{MP\_OKAY} & The function was successful \\  
689 \hline \textbf{MP\_VAL} & One of the input value(s) was invalid \\  
690 \hline \textbf{MP\_MEM} & The function ran out of heap memory \\  
691 \hline  
692 \end{tabular}  
693 \end{center}  
142  694 \caption{LibTomMath Error Codes} 
695 \label{fig:errcodes}  
696 \end{figure}  
19  697 
698 When an error is detected within a function it should free any memory it allocated, often during the initialization of  
699 temporary mp\_ints, and return as soon as possible. The goal is to leave the system in the same state it was when the  
700 function was called. Error checking with this style of API is fairly simple.  
701  
702 \begin{verbatim}  
703 int err;  
704 if ((err = mp_add(&a, &b, &c)) != MP_OKAY) {  
705 printf("Error: %s\n", mp_error_to_string(err));  
706 exit(EXIT_FAILURE);  
707 }  
708 \end{verbatim}  
709  
710 The GMP \cite{GMP} library uses C style \textit{signals} to flag errors which is of questionable use. Not all errors are fatal  
711 and it was not deemed ideal by the author of LibTomMath to force developers to have signal handlers for such cases.  
712  
713 \section{Initialization and Clearing}  
714 The logical starting point when actually writing multiple precision integer functions is the initialization and  
715 clearing of the mp\_int structures. These two algorithms will be used by the majority of the higher level algorithms.  
716  
717 Given the basic mp\_int structure an initialization routine must first allocate memory to hold the digits of  
718 the integer. Often it is optimal to allocate a sufficiently large preset number of digits even though  
719 the initial integer will represent zero. If only a single digit were allocated quite a few subsequent reallocations  
720 would occur when operations are performed on the integers. There is a tradeoff between how many default digits to allocate  
721 and how many reallocations are tolerable. Obviously allocating an excessive amount of digits initially will waste  
722 memory and become unmanageable.  
723  
If the memory for the digits has been successfully allocated then the rest of the members of the structure must
be initialized. Since the initial state of an mp\_int is to represent the zero integer, the allocated digits must be set
to zero, the \textbf{used} count set to zero and the \textbf{sign} set to \textbf{MP\_ZPOS}.
727  
728 \subsection{Initializing an mp\_int}  
729 An mp\_int is said to be initialized if it is set to a valid, preferably default, state such that all of the members of the  
730 structure are set to valid values. The mp\_init algorithm will perform such an action.  
731  
142  732 \index{mp\_init} 
19  733 \begin{figure}[here] 
734 \begin{center}  
735 \begin{tabular}{l}  
736 \hline Algorithm \textbf{mp\_init}. \\  
737 \textbf{Input}. An mp\_int $a$ \\  
738 \textbf{Output}. Allocate memory and initialize $a$ to a known valid mp\_int state. \\  
739 \hline \\  
740 1. Allocate memory for \textbf{MP\_PREC} digits. \\  
741 2. If the allocation failed return(\textit{MP\_MEM}) \\  
3. for $n$ from $0$ to $MP\_PREC - 1$ do \\
743 \hspace{3mm}3.1 $a_n \leftarrow 0$\\  
744 4. $a.sign \leftarrow MP\_ZPOS$\\  
745 5. $a.used \leftarrow 0$\\  
746 6. $a.alloc \leftarrow MP\_PREC$\\  
747 7. Return(\textit{MP\_OKAY})\\  
748 \hline  
749 \end{tabular}  
750 \end{center}  
751 \caption{Algorithm mp\_init}  
752 \end{figure}  
753  
754 \textbf{Algorithm mp\_init.}  
The purpose of this function is to initialize an mp\_int structure so that the rest of the library can properly
manipulate it. It is assumed that the input may not have had any of its members previously initialized, which is certainly
a valid assumption if the input resides on the stack.
758  
759 Before any of the members such as \textbf{sign}, \textbf{used} or \textbf{alloc} are initialized the memory for  
760 the digits is allocated. If this fails the function returns before setting any of the other members. The \textbf{MP\_PREC}  
761 name represents a constant\footnote{Defined in the ``tommath.h'' header file within LibTomMath.}  
762 used to dictate the minimum precision of newly initialized mp\_int integers. Ideally, it is at least equal to the smallest  
763 precision number you'll be working with.  
764  
Allocating a block of digits at first, instead of a single digit, has the benefit of lowering the number of usually slow
heap operations that later functions will have to perform. If \textbf{MP\_PREC} is set correctly, the slack
memory and the number of heap operations will be trivial.
768  
769 Once the allocation has been made the digits have to be set to zero as well as the \textbf{used}, \textbf{sign} and  
770 \textbf{alloc} members initialized. This ensures that the mp\_int will always represent the default state of zero regardless  
771 of the original condition of the input.  
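The seven steps of the algorithm can be sketched in C. This is a minimal toy rendition for illustration (the `toy_int` struct, `toy_init` name and constant values are assumptions for this sketch; the actual LibTomMath source follows below via its XMALLOC macro), not the library function itself.

```c
#include <stdlib.h>
#include <assert.h>

#define MP_PREC 32      /* assumed default precision for this sketch */
#define MP_ZPOS 0
#define MP_OKAY 0
#define MP_MEM  (-2)

/* Toy mirror of the mp_int layout, for illustration only. */
typedef struct { int used, alloc, sign; unsigned long *dp; } toy_int;

/* Sketch of algorithm mp_init: allocate MP_PREC digits, zero them,
 * then set the members to the default zero-integer state. */
static int toy_init(toy_int *a)
{
    a->dp = malloc(MP_PREC * sizeof(*a->dp));   /* step 1 */
    if (a->dp == NULL) return MP_MEM;           /* step 2 */
    for (int n = 0; n < MP_PREC; n++)           /* step 3 */
        a->dp[n] = 0;
    a->sign  = MP_ZPOS;                         /* step 4 */
    a->used  = 0;                               /* step 5 */
    a->alloc = MP_PREC;                         /* step 6 */
    return MP_OKAY;                             /* step 7 */
}
```

Note that the digits are zeroed with an explicit loop rather than relying on the allocator, matching the portability argument made for the real implementation.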
19  772 
773 \textbf{Remark.}  
This function introduces the idiosyncrasy that all iterative loops, commonly initiated with the ``for'' keyword, iterate incrementally
when the ``to'' keyword is placed between two expressions. For example, ``for $a$ from $b$ to $c$ do'' means that
a subsequent expression (or body of expressions) is to be evaluated $c - b + 1$ times so long as $b \le c$. In each
iteration the variable $a$ is substituted for a new integer that lies inclusively between $b$ and $c$. If $b > c$ occurred
the loop would not iterate. By contrast, if the ``downto'' keyword were used in place of ``to'' the loop would iterate
decrementally.
780  
781 EXAM,bn_mp_init.c  
782  
One immediate observation of this initialization function is that it does not return a pointer to an mp\_int structure. It
is assumed that the caller has already allocated memory for the mp\_int structure, typically on the application stack. The
call to mp\_init() is used only to initialize the members of the structure to a known default state.
786  
Here we see (line 23) that the memory allocation is performed first. This allows us to exit cleanly and quickly
if there is an error. If the allocation fails the routine will return \textbf{MP\_MEM} to the caller to indicate there
was a memory error. The function XMALLOC is what actually allocates the memory. Technically XMALLOC is not a function
but a macro defined in ``tommath.h''. By default, XMALLOC will evaluate to malloc() which is the C library's built-in
memory allocation routine.
792  
In order to assure the mp\_int is in a known state the digits must be set to zero. On most platforms this could have been
accomplished by using calloc() instead of malloc(). However, to correctly initialize an integer type to a given value in a
portable fashion the value must actually be assigned. The for loop (line 28) performs this required
operation.
797  
After the memory has been successfully initialized, the remainder of the members are initialized
(lines 29 through 31) to their respective default states. At this point the algorithm has succeeded and
a success code is returned to the calling function. If this function returns \textbf{MP\_OKAY} it is safe to assume the
mp\_int structure has been properly initialized and is safe to use with other functions within the library.
19  802 
803 \subsection{Clearing an mp\_int}  
804 When an mp\_int is no longer required by the application, the memory that has been allocated for its digits must be  
805 returned to the application's memory pool with the mp\_clear algorithm.  
806  
807 \begin{figure}[here]  
808 \begin{center}  
809 \begin{tabular}{l}  
810 \hline Algorithm \textbf{mp\_clear}. \\  
811 \textbf{Input}. An mp\_int $a$ \\  
142  812 \textbf{Output}. The memory for $a$ shall be deallocated. \\ 
19  813 \hline \\ 
814 1. If $a$ has been previously freed then return(\textit{MP\_OKAY}). \\  
2. for $n$ from 0 to $a.used - 1$ do \\
816 \hspace{3mm}2.1 $a_n \leftarrow 0$ \\  
817 3. Free the memory allocated for the digits of $a$. \\  
818 4. $a.used \leftarrow 0$ \\  
819 5. $a.alloc \leftarrow 0$ \\  
820 6. $a.sign \leftarrow MP\_ZPOS$ \\  
821 7. Return(\textit{MP\_OKAY}). \\  
822 \hline  
823 \end{tabular}  
824 \end{center}  
825 \caption{Algorithm mp\_clear}  
826 \end{figure}  
827  
828 \textbf{Algorithm mp\_clear.}  
142  829 This algorithm accomplishes two goals. First, it clears the digits and the other mp\_int members. This ensures that 
830 if a developer accidentally reuses a cleared structure it is less likely to cause problems. The second goal  
831 is to free the allocated memory.  
832  
The logic behind the algorithm is extended by marking cleared mp\_int structures so that subsequent calls to this
algorithm will not try to free the memory multiple times. Cleared mp\_ints are detectable by a predefined invalid
setting of the digit pointer \textbf{dp}.
836  
837 Once an mp\_int has been cleared the mp\_int structure is no longer in a valid state for any other algorithm  
19  838 with the exception of algorithms mp\_init, mp\_init\_copy, mp\_init\_size and mp\_clear. 
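The clearing algorithm can likewise be sketched in C. As with the mp\_init sketch, this is a toy illustration only (the `toy_int` struct and `toy_clear` name are assumptions for this sketch; the actual LibTomMath source, using its XFREE macro, follows below). The NULL digit pointer is the ``predefined invalid setting'' that makes a second clear harmless.

```c
#include <stdlib.h>
#include <assert.h>

#define MP_ZPOS 0
#define MP_OKAY 0

/* Toy mirror of the mp_int layout, for illustration only. */
typedef struct { int used, alloc, sign; unsigned long *dp; } toy_int;

/* Sketch of algorithm mp_clear: zero the digits, free them, then
 * mark the structure so a repeated clear is a no-op. */
static int toy_clear(toy_int *a)
{
    if (a->dp != NULL) {                 /* step 1: not already freed */
        for (int n = 0; n < a->used; n++)
            a->dp[n] = 0;                /* step 2: zero the digits */
        free(a->dp);                     /* step 3: release the memory */
        a->dp    = NULL;                 /* mark as cleared */
        a->used  = 0;                    /* steps 4-6: final values */
        a->alloc = 0;
        a->sign  = MP_ZPOS;
    }
    return MP_OKAY;                      /* step 7 */
}
```

Zeroing before freeing keeps an accidentally reused structure from exposing stale digits, which is the first of the two goals described above.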
839  
840 EXAM,bn_mp_clear.c  
841  
The algorithm only operates on the mp\_int if it hasn't been previously cleared. The if statement (line 23)
checks to see if the \textbf{dp} member is not \textbf{NULL}. If the mp\_int is a valid mp\_int then \textbf{dp} cannot be
\textbf{NULL}, in which case the if statement will evaluate to true.
845  
The digits of the mp\_int are cleared by the for loop (line 25), which assigns a zero to every digit. Similar to mp\_init(),
the digits are assigned zero instead of using block memory operations (such as memset()) since this is more portable.
848  
The digits are deallocated off the heap via the XFREE macro. Similar to XMALLOC, the XFREE macro actually evaluates to
a standard C library function, in this case free(). Since free() only deallocates the memory, the pointer
still has to be reset to \textbf{NULL} manually (line 33).
852  
Now that the digits have been cleared and deallocated, the other members are set to their final values (lines 34 and 35).
19  854 
855 \section{Maintenance Algorithms}  
856  
The previous section describes how to initialize and clear an mp\_int structure. To further support operations
858 that are to be performed on mp\_int structures (such as addition and multiplication) the dependent algorithms must be  
859 able to augment the precision of an mp\_int and  
860 initialize mp\_ints with differing initial conditions.  
861  
862 These algorithms complete the set of low level algorithms required to work with mp\_int structures in the higher level  
863 algorithms such as addition, multiplication and modular exponentiation.  
864  
865 \subsection{Augmenting an mp\_int's Precision}  
When storing a value in an mp\_int structure, a sufficient number of digits must be available to accommodate the entire
result of an operation without loss of precision. Quite often the size of the array given by the \textbf{alloc} member
is large enough to simply increase the \textbf{used} digit count. However, when the size of the array is too small it
must be resized appropriately to accommodate the result. The mp\_grow algorithm will provide this functionality.
870  
871 \newpage\begin{figure}[here]  
872 \begin{center}  
873 \begin{tabular}{l}  
874 \hline Algorithm \textbf{mp\_grow}. \\  
875 \textbf{Input}. An mp\_int $a$ and an integer $b$. \\  
\textbf{Output}. $a$ is expanded to accommodate $b$ digits. \\
877 \hline \\  
878 1. if $a.alloc \ge b$ then return(\textit{MP\_OKAY}) \\  
879 2. $u \leftarrow b\mbox{ (mod }MP\_PREC\mbox{)}$ \\  
3. $v \leftarrow b + 2 \cdot MP\_PREC - u$ \\
142  881 4. Reallocate the array of digits $a$ to size $v$ \\ 
19  882 5. If the allocation failed then return(\textit{MP\_MEM}). \\ 
6. for $n$ from $a.alloc$ to $v - 1$ do \\
884 \hspace{+3mm}6.1 $a_n \leftarrow 0$ \\  
885 7. $a.alloc \leftarrow v$ \\  
886 8. Return(\textit{MP\_OKAY}) \\  
887 \hline  
888 \end{tabular}  
889 \end{center}  
890 \caption{Algorithm mp\_grow}  
891 \end{figure}  
892  
893 \textbf{Algorithm mp\_grow.}  
894 It is ideal to prevent reallocations from being performed if they are not required (step one). This is useful to  
895 prevent mp\_ints from growing excessively in code that erroneously calls mp\_grow.  
896  
The requested digit count is padded up to the next multiple of \textbf{MP\_PREC} plus an additional \textbf{MP\_PREC} (steps two and three).
This helps prevent many trivial reallocations that would otherwise grow an mp\_int by only a few digits at a time.
899  
900 It is assumed that the reallocation (step four) leaves the lower $a.alloc$ digits of the mp\_int intact. This is much  
901 akin to how the \textit{realloc} function from the standard C library works. Since the newly allocated digits are  
902 assumed to contain undefined values they are initially set to zero.  
903  
904 EXAM,bn_mp_grow.c  
905  
906 A quick optimization is to first determine if a memory reallocation is required at all. The if statement (line @24,[email protected]) checks 
142  907 if the \textbf{alloc} member of the mp\_int is smaller than the requested digit count. If the count is not larger than \textbf{alloc} 
908 the function skips the reallocation part thus saving time.  
909  
When a reallocation is performed it is turned into an optimal request to save time in the future. The requested digit count is
padded upwards to the second multiple of \textbf{MP\_PREC} larger than \textbf{alloc} (line @25, [email protected]). The XREALLOC function is used
912 to reallocate the memory. As per the other functions XREALLOC is actually a macro which evaluates to realloc by default. The realloc  
913 function leaves the base of the allocation intact which means the first \textbf{alloc} digits of the mp\_int are the same as before  
914 the reallocation. All that is left is to clear the newly allocated digits and return.  
915  
916 Note that the reallocation result is actually stored in a temporary pointer $tmp$. This is to allow this function to return  
917 an error with a valid pointer. Earlier releases of the library stored the result of XREALLOC into the mp\_int $a$. That would  
918 result in a memory leak if XREALLOC ever failed.  
19  919 
920 \subsection{Initializing Variable Precision mp\_ints}  
921 Occasionally the number of digits required will be known in advance of an initialization, based on, for example, the size  
922 of input mp\_ints to a given algorithm. The purpose of algorithm mp\_init\_size is similar to mp\_init except that it  
923 will allocate \textit{at least} a specified number of digits.  
924  
925 \begin{figure}[here]  
926 \begin{small}  
927 \begin{center}  
928 \begin{tabular}{l}  
929 \hline Algorithm \textbf{mp\_init\_size}. \\  
930 \textbf{Input}. An mp\_int $a$ and the requested number of digits $b$. \\  
931 \textbf{Output}. $a$ is initialized to hold at least $b$ digits. \\  
932 \hline \\  
933 1. $u \leftarrow b \mbox{ (mod }MP\_PREC\mbox{)}$ \\  
2. $v \leftarrow b + 2 \cdot MP\_PREC - u$ \\
935 3. Allocate $v$ digits. \\  
4. for $n$ from $0$ to $v - 1$ do \\
937 \hspace{3mm}4.1 $a_n \leftarrow 0$ \\  
938 5. $a.sign \leftarrow MP\_ZPOS$\\  
939 6. $a.used \leftarrow 0$\\  
940 7. $a.alloc \leftarrow v$\\  
941 8. Return(\textit{MP\_OKAY})\\  
942 \hline  
943 \end{tabular}  
944 \end{center}  
945 \end{small}  
946 \caption{Algorithm mp\_init\_size}  
947 \end{figure}  
948  
949 \textbf{Algorithm mp\_init\_size.}  
950 This algorithm will initialize an mp\_int structure $a$ like algorithm mp\_init with the exception that the number of  
951 digits allocated can be controlled by the second input argument $b$. The input size is padded upwards so it is a  
952 multiple of \textbf{MP\_PREC} plus an additional \textbf{MP\_PREC} digits. This padding is used to prevent trivial  
953 allocations from becoming a bottleneck in the rest of the algorithms.  
954  
Like algorithm mp\_init, the mp\_int structure is initialized to a default state representing the integer zero. This
particular algorithm is useful if the approximate size of the input is known ahead of time. If the approximation is
correct no further memory reallocations are required to work with the mp\_int.
958  
959 EXAM,bn_mp_init_size.c  
960  
961 The number of digits $b$ requested is padded (line @22,MP_[email protected]) by first augmenting it to the next multiple of  
962 \textbf{MP\_PREC} and then adding \textbf{MP\_PREC} to the result. If the memory can be successfully allocated the  
963 mp\_int is placed in a default state representing the integer zero. Otherwise, the error code \textbf{MP\_MEM} will be  
964 returned (line @27,[email protected]).  
965  
142  966 The digits are allocated and set to zero at the same time with the calloc() function (line @25,[email protected]). The 
19  967 \textbf{used} count is set to zero, the \textbf{alloc} count set to the padded digit count and the \textbf{sign} flag set 
968 to \textbf{MP\_ZPOS} to achieve a default valid mp\_int state (lines @29,[email protected], @30,[email protected] and @31,[email protected]). If the function  
returns successfully then it is correct to assume that the mp\_int structure is in a valid state for the remainder of the
970 functions to work with.  
971  
972 \subsection{Multiple Integer Initializations and Clearings}  
973 Occasionally a function will require a series of mp\_int data types to be made available simultaneously.  
974 The purpose of algorithm mp\_init\_multi is to initialize a variable length array of mp\_int structures in a single  
975 statement. It is essentially a shortcut to multiple initializations.  
976  
977 \newpage\begin{figure}[here]  
978 \begin{center}  
979 \begin{tabular}{l}  
980 \hline Algorithm \textbf{mp\_init\_multi}. \\  
981 \textbf{Input}. Variable length array $V_k$ of mp\_int variables of length $k$. \\  
982 \textbf{Output}. The array is initialized such that each mp\_int of $V_k$ is ready to use. \\  
983 \hline \\  
1. for $n$ from 0 to $k - 1$ do \\
985 \hspace{+3mm}1.1. Initialize the mp\_int $V_n$ (\textit{mp\_init}) \\  
986 \hspace{+3mm}1.2. If initialization failed then do \\  
987 \hspace{+6mm}1.2.1. for $j$ from $0$ to $n$ do \\  
988 \hspace{+9mm}1.2.1.1. Free the mp\_int $V_j$ (\textit{mp\_clear}) \\  
989 \hspace{+6mm}1.2.2. Return(\textit{MP\_MEM}) \\  
990 2. Return(\textit{MP\_OKAY}) \\  
991 \hline  
992 \end{tabular}  
993 \end{center}  
994 \caption{Algorithm mp\_init\_multi}  
995 \end{figure}  
996  
997 \textbf{Algorithm mp\_init\_multi.}  
998 The algorithm will initialize the array of mp\_int variables one at a time. If a runtime error has been detected  
999 (\textit{step 1.2}) all of the previously initialized variables are cleared. The goal is an ``all or nothing''  
1000 initialization which allows for quick recovery from runtime errors.  
1001  
1002 EXAM,bn_mp_init_multi.c  
1003  
This function initializes a variable length list of mp\_int structure pointers. However, instead of having the mp\_int
1005 structures in an actual C array they are simply passed as arguments to the function. This function makes use of the  
1006 ``...'' argument syntax of the C programming language. The list is terminated with a final \textbf{NULL} argument  
1007 appended on the right.  
1008  
1009 The function uses the ``stdarg.h'' \textit{va} functions to step portably through the arguments to the function. A count  
$n$ of successfully initialized mp\_int structures is maintained (line @47,[email protected]) such that if a failure does occur,
1011 the algorithm can backtrack and free the previously initialized structures (lines @27,[email protected] to @46,}@).  
1012  
1013  
1014 \subsection{Clamping Excess Digits}  
When a function anticipates a result will be $n$ digits it is simpler to assume this is true within the body of
the function instead of checking during the computation. For example, a multiplication of an $i$ digit number by a
$j$ digit number produces a result of at most $i + j$ digits. It is entirely possible that the result is $i + j - 1$
though, with no final carry into the last position. However, suppose the destination had to be first expanded
(\textit{via mp\_grow}) to accommodate $i + j - 1$ digits then further expanded to accommodate the final carry.
That would be a considerable waste of time since heap operations are relatively slow.
1021  
1022 The ideal solution is to always assume the result is $i + j$ and fix up the \textbf{used} count after the function  
1023 terminates. This way a single heap operation (\textit{at most}) is required. However, if the result was not checked  
1024 there would be an excess high order zero digit.  
1025  
For example, suppose the product of two integers was $x_n = (0 x_{n-1} x_{n-2} \ldots x_0)_{\beta}$. The leading zero digit
1027 will not contribute to the precision of the result. In fact, through subsequent operations more leading zero digits would  
1028 accumulate to the point the size of the integer would be prohibitive. As a result even though the precision is very  
1029 low the representation is excessively large.  
1030  
The mp\_clamp algorithm is designed to solve this very problem. It will trim high-order zeros by decrementing the
\textbf{used} count until a non-zero most significant digit is found. Also in this system, zero is considered to be a
1033 positive number which means that if the \textbf{used} count is decremented to zero, the sign must be set to  
1034 \textbf{MP\_ZPOS}.  
1035  
1036 \begin{figure}[here]  
1037 \begin{center}  
1038 \begin{tabular}{l}  
1039 \hline Algorithm \textbf{mp\_clamp}. \\  
1040 \textbf{Input}. An mp\_int $a$ \\  
1041 \textbf{Output}. Any excess leading zero digits of $a$ are removed \\  
1042 \hline \\  
1. while $a.used > 0$ and $a_{a.used - 1} = 0$ do \\
\hspace{+3mm}1.1 $a.used \leftarrow a.used - 1$ \\
1045 2. if $a.used = 0$ then do \\  
1046 \hspace{+3mm}2.1 $a.sign \leftarrow MP\_ZPOS$ \\  
1047 \hline \\  
1048 \end{tabular}  
1049 \end{center}  
1050 \caption{Algorithm mp\_clamp}  
1051 \end{figure}  
1052  
1053 \textbf{Algorithm mp\_clamp.}  
1054 As can be expected this algorithm is very simple. The loop on step one is expected to iterate only once or twice at  
1055 the most. For example, this will happen in cases where there is not a carry to fill the last position. Step two fixes the sign for  
1056 when all of the digits are zero to ensure that the mp\_int is valid at all times.  
1057  
1058 EXAM,bn_mp_clamp.c  
1059  
Note on line @27,[email protected] how the test on the \textbf{used} count is made on the left of the \&\& operator. In the C programming
language the operands of \&\& are evaluated left to right with a boolean short-circuit if any condition fails. This is
important since if \textbf{used} is zero the test on the right would fetch below the array. That is obviously
undesirable. The parenthesis on line @28,a>[email protected] is used to make sure the \textbf{used} count is decremented and not
the pointer ``a''.
1065  
1066 \section*{Exercises}  
1067 \begin{tabular}{cl}  
1068 $\left [ 1 \right ]$ & Discuss the relevance of the \textbf{used} member of the mp\_int structure. \\  
1069 & \\  
1070 $\left [ 1 \right ]$ & Discuss the consequences of not using padding when performing allocations. \\  
1071 & \\  
$\left [ 2 \right ]$ & Estimate an ideal value for \textbf{MP\_PREC} when performing 1024-bit RSA \\
1073 & encryption when $\beta = 2^{28}$. \\  
1074 & \\  
1075 $\left [ 1 \right ]$ & Discuss the relevance of the algorithm mp\_clamp. What does it prevent? \\  
1076 & \\  
1077 $\left [ 1 \right ]$ & Give an example of when the algorithm mp\_init\_copy might be useful. \\  
1078 & \\  
1079 \end{tabular}  
1080  
1081  
1082 %%%  
1083 % CHAPTER FOUR  
1084 %%%  
1085  
1086 \chapter{Basic Operations}  
1087  
1088 \section{Introduction}  
In the previous chapter a series of low level algorithms were established that dealt with initializing and maintaining
mp\_int structures. This chapter will discuss another set of seemingly non-algebraic algorithms which will form the low
level basis of the entire library. While these algorithms are relatively trivial it is important to understand how they
work before proceeding since these algorithms will be used almost intrinsically in the following chapters.
1093  
1094 The algorithms in this chapter deal primarily with more ``programmer'' related tasks such as creating copies of  
1095 mp\_int structures, assigning small values to mp\_int structures and comparisons of the values mp\_int structures  
1096 represent.  
1097  
1098 \section{Assigning Values to mp\_int Structures}  
1099 \subsection{Copying an mp\_int}  
1100 Assigning the value that a given mp\_int structure represents to another mp\_int structure shall be known as making  
1101 a copy for the purposes of this text. The copy of the mp\_int will be a separate entity that represents the same  
1102 value as the mp\_int it was copied from. The mp\_copy algorithm provides this functionality.  
1103  
1104 \newpage\begin{figure}[here]  
1105 \begin{center}  
1106 \begin{tabular}{l}  
1107 \hline Algorithm \textbf{mp\_copy}. \\  
1108 \textbf{Input}. An mp\_int $a$ and $b$. \\  
1109 \textbf{Output}. Store a copy of $a$ in $b$. \\  
1110 \hline \\  
1111 1. If $b.alloc < a.used$ then grow $b$ to $a.used$ digits. (\textit{mp\_grow}) \\  
2. for $n$ from 0 to $a.used - 1$ do \\
1113 \hspace{3mm}2.1 $b_{n} \leftarrow a_{n}$ \\  
3. for $n$ from $a.used$ to $b.used - 1$ do \\
1115 \hspace{3mm}3.1 $b_{n} \leftarrow 0$ \\  
1116 4. $b.used \leftarrow a.used$ \\  
1117 5. $b.sign \leftarrow a.sign$ \\  
1118 6. return(\textit{MP\_OKAY}) \\  
1119 \hline  
1120 \end{tabular}  
1121 \end{center}  
1122 \caption{Algorithm mp\_copy}  
1123 \end{figure}  
1124  
1125 \textbf{Algorithm mp\_copy.}  
This algorithm copies the mp\_int $a$ such that upon successful termination of the algorithm the mp\_int $b$ will
represent the same integer as the mp\_int $a$. The mp\_int $b$ shall be a complete and distinct copy of the
mp\_int $a$ meaning that the mp\_int $a$ can be modified and it shall not affect the value of the mp\_int $b$.
1129  
1130 If $b$ does not have enough room for the digits of $a$ it must first have its precision augmented via the mp\_grow  
1131 algorithm. The digits of $a$ are copied over the digits of $b$ and any excess digits of $b$ are set to zero (step two  
1132 and three). The \textbf{used} and \textbf{sign} members of $a$ are finally copied over the respective members of  
1133 $b$.  
1134  
1135 \textbf{Remark.} This algorithm also introduces a new idiosyncrasy that will be used throughout the rest of the  
1136 text. The error return codes of other algorithms are not explicitly checked in the pseudocode presented. For example, in  
1137 step one of the mp\_copy algorithm the return of mp\_grow is not explicitly checked to ensure it succeeded. Text space is  
limited so it is assumed that if an algorithm fails it will clear all temporarily allocated mp\_ints and return
1139 the error code itself. However, the C code presented will demonstrate all of the error handling logic required to  
1140 implement the pseudocode.  
1141  
1142 EXAM,bn_mp_copy.c  
1143  
1144 Occasionally a dependent algorithm may copy an mp\_int effectively into itself such as when the input and output  
1145 mp\_int structures passed to a function are one and the same. For this case it is optimal to return immediately without  
1146 copying digits (line @24,a == [email protected]).  
1147  
The mp\_int $b$ must have enough digits to accommodate the used digits of the mp\_int $a$. If $b.alloc$ is less than
1149 $a.used$ the algorithm mp\_grow is used to augment the precision of $b$ (lines @29,[email protected] to @33,}@). In order to  
1150 simplify the inner loop that copies the digits from $a$ to $b$, two aliases $tmpa$ and $tmpb$ point directly at the digits  
1151 of the mp\_ints $a$ and $b$ respectively. These aliases (lines @42,[email protected] and @45,[email protected]) allow the compiler to access the digits without first dereferencing the  
1152 mp\_int pointers and then subsequently the pointer to the digits.  
1153  
1154 After the aliases are established the digits from $a$ are copied into $b$ (lines @48,[email protected] to @50,}@) and then the excess  
1155 digits of $b$ are set to zero (lines @53,[email protected] to @55,}@). Both ``for'' loops make use of the pointer aliases and in  
1156 fact the alias for $b$ is carried through into the second ``for'' loop to clear the excess digits. This optimization  
allows the alias to stay in a machine register fairly easily between the two loops.
1158  
1159 \textbf{Remarks.} The use of pointer aliases is an implementation methodology first introduced in this function that will  
be used considerably in other functions. Technically, a pointer alias is simply a shorthand used to lower the
number of pointer dereferencing operations required to access data. For example, a for loop may resemble
1162  
1163 \begin{alltt}  
1164 for (x = 0; x < 100; x++) \{  
a->num[4]->dp[x] = 0;
1166 \}  
1167 \end{alltt}  
1168  
1169 This could be rewritten using aliases as  
1170  
1171 \begin{alltt}  
mp_digit *tmpa;
tmpa = a->num[4]->dp;
for (x = 0; x < 100; x++) \{
*tmpa++ = 0;
\}
1177 \end{alltt}  
1178  
1179 In this case an alias is used to access the  
1180 array of digits within an mp\_int structure directly. It may seem that a pointer alias is strictly not required  
1181 as a compiler may optimize out the redundant pointer operations. However, there are two dominant reasons to use aliases.  
1182  
1183 The first reason is that most compilers will not effectively optimize pointer arithmetic. For example, some optimizations  
1184 may work for the Microsoft Visual C++ compiler (MSVC) and not for the GNU C Compiler (GCC). Also some optimizations may  
1185 work for GCC and not MSVC. As such it is ideal to find a common ground for as many compilers as possible. Pointer  
1186 aliases optimize the code considerably before the compiler even reads the source code which means the end compiled code  
1187 stands a better chance of being faster.  
1188  
1189 The second reason is that pointer aliases often can make an algorithm simpler to read. Consider the first ``for''  
1190 loop of the function mp\_copy() rewritten to not use pointer aliases.  
1191  
1192 \begin{alltt}  
1193 /* copy all the digits */  
for (n = 0; n < a->used; n++) \{
b->dp[n] = a->dp[n];
1196 \}  
1197 \end{alltt}  
1198  
1199 Whether this code is harder to read depends strongly on the individual. However, it is quantifiably slightly more  
1200 complicated as there are four variables within the statement instead of just two.  
1201  
1202 \subsubsection{Nested Statements}  
1203 Another commonly used technique in the source routines is that certain sections of code are nested. This is used in  
1204 particular with the pointer aliases to highlight code phases. For example, a Comba multiplier (discussed in chapter six)  
1205 will typically have three different phases. First the temporaries are initialized, then the columns calculated and  
1206 finally the carries are propagated. In this example the middle column production phase will typically be nested as it  
1207 uses temporary variables and aliases the most.  
1208  
The nesting also simplifies the source code as variables that are nested are only valid for their scope. As a result
1210 the various temporary variables required do not propagate into other sections of code.  
1211  
1212  
1213 \subsection{Creating a Clone}  
1214 Another common operation is to make a local temporary copy of an mp\_int argument. To initialize an mp\_int  
and then copy another existing mp\_int into the newly initialized mp\_int will be known as creating a clone. This is
1216 useful within functions that need to modify an argument but do not wish to actually modify the original copy. The  
1217 mp\_init\_copy algorithm has been designed to help perform this task.  
1218  
1219 \begin{figure}[here]  
1220 \begin{center}  
1221 \begin{tabular}{l}  
1222 \hline Algorithm \textbf{mp\_init\_copy}. \\  
1223 \textbf{Input}. An mp\_int $a$ and $b$\\  
1224 \textbf{Output}. $a$ is initialized to be a copy of $b$. \\  
1225 \hline \\  
1226 1. Init $a$. (\textit{mp\_init}) \\  
1227 2. Copy $b$ to $a$. (\textit{mp\_copy}) \\  
1228 3. Return the status of the copy operation. \\  
1229 \hline  
1230 \end{tabular}  
1231 \end{center}  
1232 \caption{Algorithm mp\_init\_copy}  
1233 \end{figure}  
1234  
1235 \textbf{Algorithm mp\_init\_copy.}  
1236 This algorithm will initialize an mp\_int variable and copy another previously initialized mp\_int variable into it. As  
1237 such this algorithm will perform two operations in one step.  
1238  
1239 EXAM,bn_mp_init_copy.c  
1240  
1241 This will initialize \textbf{a} and make it a verbatim copy of the contents of \textbf{b}. Note that  
1242 \textbf{a} will have its own memory allocated which means that \textbf{b} may be cleared after the call  
1243 and \textbf{a} will be left intact.  
1244  
1245 \section{Zeroing an Integer}  
Resetting an mp\_int to the default state is a common step in many algorithms. The mp\_zero algorithm will be used to
1247 perform this task.  
1248  
1249 \begin{figure}[here]  
1250 \begin{center}  
1251 \begin{tabular}{l}  
1252 \hline Algorithm \textbf{mp\_zero}. \\  
1253 \textbf{Input}. An mp\_int $a$ \\  
1254 \textbf{Output}. Zero the contents of $a$ \\  
1255 \hline \\  
1256 1. $a.used \leftarrow 0$ \\  
1257 2. $a.sign \leftarrow$ MP\_ZPOS \\  
3. for $n$ from 0 to $a.alloc - 1$ do \\
1259 \hspace{3mm}3.1 $a_n \leftarrow 0$ \\  
1260 \hline  
1261 \end{tabular}  
1262 \end{center}  
1263 \caption{Algorithm mp\_zero}  
1264 \end{figure}  
1265  
1266 \textbf{Algorithm mp\_zero.}  
This algorithm simply resets an mp\_int to the default state.
1268  
1269 EXAM,bn_mp_zero.c  
1270  
1271 After the function is completed, all of the digits are zeroed, the \textbf{used} count is zeroed and the  
1272 \textbf{sign} variable is set to \textbf{MP\_ZPOS}.  
1273  
1274 \section{Sign Manipulation}  
1275 \subsection{Absolute Value}  
1276 With the mp\_int representation of an integer, calculating the absolute value is trivial. The mp\_abs algorithm will compute  
1277 the absolute value of an mp\_int.  
1278  
1279 \begin{figure}[here] 
19  1280 \begin{center} 
1281 \begin{tabular}{l}  
1282 \hline Algorithm \textbf{mp\_abs}. \\  
1283 \textbf{Input}. An mp\_int $a$ \\  
1284 \textbf{Output}. Computes $b = \vert a \vert$ \\  
1285 \hline \\  
1286 1. Copy $a$ to $b$. (\textit{mp\_copy}) \\  
1287 2. If the copy failed return(\textit{MP\_MEM}). \\  
1288 3. $b.sign \leftarrow MP\_ZPOS$ \\  
1289 4. Return(\textit{MP\_OKAY}) \\  
1290 \hline  
1291 \end{tabular}  
1292 \end{center}  
1293 \caption{Algorithm mp\_abs}  
1294 \end{figure}  
1295  
1296 \textbf{Algorithm mp\_abs.}  
This algorithm computes the absolute value of an mp\_int input. First it copies $a$ over $b$. This is an example of an
algorithm where the check in mp\_copy that determines if the source and destination are equal proves useful. This allows,
for instance, the developer to pass the same mp\_int as the source and destination to this function without additional
logic to handle it.
1301  
1302 EXAM,bn_mp_abs.c  
1303  
This fairly trivial algorithm first eliminates non-required duplications (line @27,a != [email protected]) and then sets the
\textbf{sign} flag to \textbf{MP\_ZPOS}.

19  1307 \subsection{Integer Negation} 
1308 With the mp\_int representation of an integer, calculating the negation is also trivial. The mp\_neg algorithm will compute  
1309 the negative of an mp\_int input.  
1310  
1311 \begin{figure}[here]  
1312 \begin{center}  
1313 \begin{tabular}{l}  
1314 \hline Algorithm \textbf{mp\_neg}. \\  
1315 \textbf{Input}. An mp\_int $a$ \\  
\textbf{Output}. Computes $b = -a$ \\
1317 \hline \\  
1318 1. Copy $a$ to $b$. (\textit{mp\_copy}) \\  
1319 2. If the copy failed return(\textit{MP\_MEM}). \\  
1320 3. If $a.used = 0$ then return(\textit{MP\_OKAY}). \\  
1321 4. If $a.sign = MP\_ZPOS$ then do \\  
1322 \hspace{3mm}4.1 $b.sign = MP\_NEG$. \\  
1323 5. else do \\  
1324 \hspace{3mm}5.1 $b.sign = MP\_ZPOS$. \\  
1325 6. Return(\textit{MP\_OKAY}) \\  
1326 \hline  
1327 \end{tabular}  
1328 \end{center}  
1329 \caption{Algorithm mp\_neg}  
1330 \end{figure}  
1331  
1332 \textbf{Algorithm mp\_neg.}  
1333 This algorithm computes the negation of an input. First it copies $a$ over $b$. If $a$ has no used digits then  
1334 the algorithm returns immediately. Otherwise it flips the sign flag and stores the result in $b$. Note that if  
1335 $a$ had no digits then it must be positive by definition. Had step three been omitted then the algorithm would return  
1336 zero as negative.  
1337  
1338 EXAM,bn_mp_neg.c  
1339  
Like mp\_abs() this function avoids non-required duplications (line @21,a != [email protected]) and then sets the sign. We
have to make sure that only non-zero values get a \textbf{sign} of \textbf{MP\_NEG}. If the mp\_int is zero
then the \textbf{sign} is hard-coded to \textbf{MP\_ZPOS}.

19  1344 \section{Small Constants} 
1345 \subsection{Setting Small Constants}  
Often an mp\_int must be set to a relatively small value such as $1$ or $2$. For these cases the mp\_set algorithm is useful.
1347  
1348 \newpage\begin{figure}[here] 
19  1349 \begin{center} 
1350 \begin{tabular}{l}  
1351 \hline Algorithm \textbf{mp\_set}. \\  
1352 \textbf{Input}. An mp\_int $a$ and a digit $b$ \\  
1353 \textbf{Output}. Make $a$ equivalent to $b$ \\  
1354 \hline \\  
1355 1. Zero $a$ (\textit{mp\_zero}). \\  
1356 2. $a_0 \leftarrow b \mbox{ (mod }\beta\mbox{)}$ \\  
1357 3. $a.used \leftarrow \left \lbrace \begin{array}{ll}  
1358 1 & \mbox{if }a_0 > 0 \\  
1359 0 & \mbox{if }a_0 = 0  
1360 \end{array} \right .$ \\  
1361 \hline  
1362 \end{tabular}  
1363 \end{center}  
1364 \caption{Algorithm mp\_set}  
1365 \end{figure}  
1366  
1367 \textbf{Algorithm mp\_set.}  
This algorithm sets an mp\_int to a small single digit value. Step number 1 ensures that the integer is reset to the default state. The
1369 single digit is set (\textit{modulo $\beta$}) and the \textbf{used} count is adjusted accordingly.  
1370  
1371 EXAM,bn_mp_set.c  
1372  
First we zero (line @21,mp_[email protected]) the mp\_int to make sure that the other members are initialized for a
small positive constant. mp\_zero() ensures that the \textbf{sign} is positive and the \textbf{used} count
is zero. Next we set the digit and reduce it modulo $\beta$ (line @22,MP_[email protected]). After this step we have to
check if the resulting digit is zero or not. If it is not then we set the \textbf{used} count to one, otherwise
to zero.

We can quickly reduce modulo $\beta$ since it is of the form $2^k$ and a quick binary AND operation with
$2^k - 1$ will perform the same operation.
19  1381 
One important limitation of this function is that it will only set one digit. The size of a digit is not fixed, meaning source code that uses
this function should take that into account. Only trivially small constants can be set using this function.
1384  
1385 \subsection{Setting Large Constants}  
1386 To overcome the limitations of the mp\_set algorithm the mp\_set\_int algorithm is ideal. It accepts a ``long''  
data type as input and will always treat it as a 32-bit integer.
1388  
1389 \begin{figure}[here]  
1390 \begin{center}  
1391 \begin{tabular}{l}  
1392 \hline Algorithm \textbf{mp\_set\_int}. \\  
1393 \textbf{Input}. An mp\_int $a$ and a ``long'' integer $b$ \\  
1394 \textbf{Output}. Make $a$ equivalent to $b$ \\  
1395 \hline \\  
1396 1. Zero $a$ (\textit{mp\_zero}) \\  
1397 2. for $n$ from 0 to 7 do \\  
\hspace{3mm}2.1 $a \leftarrow a \cdot 16$ (\textit{mp\_mul\_2d}) \\