i
•
|(0
[]
Software can be used to generate elementary functions. Michael A n d r e w s and Thomas Mraz describe an algorithm for applications sensitive to memory but insensitive to speed Software for the CORDIC algorithm as applied to microprocessors is described. For applications sensitive to m e m o r y but insensitive to speed, CORDIC can be used as a universal'subroutine to generate several elementary functions. The universality o f the actual code applies to each o f the elementary functions. The only difference for each function remains in the argument reduction and parameter scaling. A n example for generating sine and cosine functions on a 6800 microprocessor is described.
The CORDIC routine is a crossaddition-based algorithm capable of generating the elementary functions sin, cos, tan, arctan, sinh, cosh, tanh, arctanh, In, exp and square root, as well as performing the operations of multiplication, division and conversion between mixed-radix number systems. Developed in 1956 by Voider I , the first use of the crossaddition algorithm for function generation was in the CORDIC (coordinated rotation digital computer) in satellite-positioning systems. Schmid 2 later demonstrated a technique for performing decimal-to-binary conversion with CORDIC in mixed-radix conversion. Hewlett-Packard have incorporated a hardware implementation of the CORDIC algorithm in the HP9100 series desktop calculators as reported by Walther 3 . A brief comparison of the Cordic algorithm with other algorithms for elementary functions can be found in reference 4. Here, experiences with ~/X on the 8080 and PDP11/20 are described. Of course, if hardware is no problem, vendors can now supply 'numbercrunching' chips like the National Semiconductor 57109 s or the Advanced Micro Devices 95116. However, these devices still require considerable software to format the arguments and access the peripheral hardware. Because of the limited amount of memory available to microprocessor systems, the relatively low byte count necessary for the software realiziation of the CORDIC algorithm makes it particularly well suited as a unified elementary function generator for microprocessors. The remainder of this paper describes the development of a fixed-point software routine to calculate sin and cos for any angle between 0 ° and 90 ° using a M6800 microprocesso.r system. Also discussed are the results obtained from the routine along with software considerations related to program length.
CORDIC ALGORITHM The iterative equations used in trigonometric computations are discussed first. To this end, assume sin and cos are to be calculated for the given input argument 8. The input argument 0 establishes a fixed vector, labelled A in Figure 1, whose magnitude is not important. (A magnitude of unity is desirable since the vertical and horizontal projections obtain sin0 and cos0 directly.) To find the desired functions, the CORDIC algorithm begins with a Department of Computer Science, Colorado State University, Fort Collins, CO 80523, USA
270
0141
Unknownfixed vector
_ / ~
/ /
/;
" .
\
Rototionolvector
\
cos
0
Figure 1. Vectors and arguments used in the CORDIC algorithm
rotational vector at 0 ° with a magnitude equal to 1/kn ; where k n is a constant defined later. The rotational vector, labelled B for convenience, is then rotated through a predetermined angle, cz, as shown in Figure 1. After the rotation, the difference between the argument of A and the argument of B or (arg,4 - argB) is found. The sign of (argA - argB) is used to indicate which direction vector B should be rotated next, so as to move closer to the position of vector A. If (argA - argB) > 0, as is the case in Figure 1, vector B has not rotated far enough and must be rotated in the positive (anticlockwise) direction again. However, if (argA - - argB) < 0, vector B has overshot vector A and the next rotation must be in the negative (clockwise) direction. The angle of each successive rotation is approximately half that of the previous rotation. Thus, by monitoring the sign of (argA argB), B will gradually converge on A. This is indicated by (argA -- argB) being forced to zero. To determine sin0 and cos0, the CORDIC algorithm keeps track of the rotational sequence of vector B until it converges upon vector A. When vector B has converged on vector A , the vertical and horizontal projections of B are identical to those of A. Hence, sin0 and cos0 can be found in two accumulators tracking the movement of vector B. Figure 2 represents a typical rotational sequence for the ith iteration I. The projections of vector Bi on the X and Y axes are given in equations (I) and (2)
9331/78/050270- 05 :$02.00 © 1978 IPC Business Press
microprocessors and microsystems
plished by merely shifting Xi or Yi i bits to the right. Because the only operations required are the shifts and acids or subtracts indicated in equations (9) and (10) the CORDIC algorithm can be easily implemented on a microprocessor system. Another important quantity, Z, contains the accumulation of the ei rotations and is represented by
v,.
Zi + l = Zi + ei
Bi,. -i
where the value for each ei has been chosen 2 as shown in equation (12) so that the rotational sequence converges as quickly as possible
YI÷,
ei = tan'lP "i
Xi
Xi~,
By solving equatiqns (9), (10) and (I I) using the relation between ei and p-1 as stated in equation (I 2), equations (I 3), (14) and (15) are obtained for n iterations
Figure 2. Vector B movement Xi = Bi cos argBi
(1)
Yi = Bi sin argBi
(2)
Once the vector is rotated through the angle + ei (top sign indicates anticlockwise rotation), the new X and Y axes projections become X i + 1 = Bi X/(1 + p.2i) cos (argBi + ei)
(3)
Yi + 1 = Bi x~ (1 + p.=i) sin (argBi -+ el)
(4)
A constant p is selected (the radix of the microprocessor) and ei is chosen so that cosei = (1 + p-2i) -1/2 to facilitate further computation. The term radix refers to the base of a number system. Microprocessors operate in binary, or base 2, and thus have a radix of 2. By applying identities I1 and 12 (see Appendix A) to equations (3) and (4) respectively, the expressions for X i + i and Yi + i can be written as X,.+,
~+,
(12)
~_--x
/
Xi+,
(11)
= Bi X~ (1 + p'2i)(cos argBi cosei -Tsin argBi sins/)
(5)
= Bi X/(1 + p'2i)(cosei sin argBi + sins/cos argBi)
(6)
By substituting the values given by equation (1) and (2) for the cos argB and sin argBi into (5) and (6), and making use of relationships (7) and (8), the iteration equations given in (9) and (10) result
Xn = K n [ X o cos(e)- Yo sine]
(13)
Yn = Kn[ Yo cos(e) +Xo sine]
(14)
Zn : Zo + e
(15)
The values ofXo, Yo and Zo represent the initial values of the X, Y and Z registers, e is the cumulative motion of vector B and Kn is the growth constant associated with the rotation as in equations (3) and (4). They are given by (16) and (17) respectively n-1 e = ~ -+ ei (16) i=0 Kn : n~lx/(1 +p-2i) i=0
(17)
The CORDIC algorithm computes expressions (13), (14) and (15) via the iteration equations stated in (9), (10) and (11). The direction of rotation is determined before each iteration by checking the sign of the quantity in the Z register. The final goal is the reduction of Zn in equation (15) to zero. It is important to initialize the X, Yand Z registers. By setting Xo = 1/K,, Yo = 0, and Zo = 0, the relationships generated by the CORDIC algorithm given in equations (13), (14) and (15) by forcing Zn to zero, reduce to the following
X. -= cos0
(18)
Yn : sin0
(19)
z.
(2o)
1 = 0
-i P sine/ - X/(I + p-2i)
(8)
Thus, registers X and Y contain the desired values of cos0 and sin0 respectively.
Xi + ] = X/g_ p-i Yi
(9)
ARGUMENT
Yi + i = Yi + p-i X i
(10)
Since p represents the radix of the number syste.m being us.ed (p = 2 for binary), the multiplication of p-' X i or p4 Yi indicated in equations (9) and (10) can be accom-
vol 2 no 5 october 78
REDUCTION
The software described here requires that the input argument for which the sin and cos are to be calculated satisfies two stipulations. The first stipulation restricts the input argument to the first quadrant. The second stipulation imposed on the input argument is that it must be prescaled by multiplying a fixed constant.
271
Table 1. A n g l e iteration equivalence i
Se,8OO%x 1
&i
_~ 0 1 2 3 4 5
~26.6 ° ~14.0 ° ~ 7.1 ° ~ 3.6 ° ~ 0.89 o
6 7
~, 0 . 4 5 ° ~ 0.22 °
xo
90 °
: Y: OOOOHEx
45 °
®
s t:
x:ooo%,
: Y : 800OHE x
(~) I initialize x ~ Y I reg=sters
7
[
@[ 5
~188.8 °
@[ Shift X=,& Y., I Restricting the input argument to between 0 ° and 90 °, results in slightly less complicated software, thus occupying less memory, and greater precision is obtained. The reasons for using simpler software and the need for better accuracy are clarified by the following discussion. Table 1 depicts the angles of rotation (i.e. the c~is) for a few values of i. By examining the values given in Table 1, it becomes clear that the largest input argument the rotational vector can converge on is slightly greater than 180 °. Therefore, any input argument between 180 ° and 360 ° must be represented as a negative angle to coyerall four quadrants. Recalling the previous discussion on testing the sign of (argA - argB) to determine whether the rotational vector overshoots or undershoots the fixed vector, it can be seen that problems arise if argA is allowed to be negative. For argA < 0, (argA - argB) > 0 indicates that the rotational vector has overshot instead of undershot the fixed vector. Therefore, additional software becomes necessary to test and remember the sign of the input angle so that a proper determination of whether the rotational vector overshot or undershot the fixed vector can be made. For an unrestricted angle it is also necessary for the angle of the first rotation step to be 90 ° . However, by restricting the input argument to the first quadrant, the 90 ° rotation is superfluous and can therefore be eliminated. Because the number of iterations, or rotations, is limited by the finite wordlength (for a 16-bit signed word, 15 rotations, or iterations, are necessary), eliminating the 90 ° rotation would allow the angle of the last iteration to be about half the size than if the 90 ° rotation was performed. Thls results in slightly increased accu racy. For scaling, it is necessary to take advantage of the full wordlength to obtain the greatest possible accuracy. The scale factor is that value, which when multiplied by the maximum allowable angle, in this case 90 °, is necessary to obtain the maximum positive binary representation, which for a16-bit signed word is 0111 1111 1111 1111 binary or 7FFF in hexadecimal notation. Both the input argument and rotational constants, oqs, must be scaled if they are to be consistent. The maximum decimal value which can be represented by a 16-bit signed binary word is 32 767. Therefore, for 90 ° as the maximum allowable angle, the scale factor is 32 76?/90 or approximately 364.
right i bits
® Xi ÷ Ysi"-~Xi.I
x~ - Y,i ---x,.,
Yi - x.r-~Yi., zi , , , ~ - ~ z . ,
z, - ~/ ~ z , . ,
Y~ + x=j -~Y,.l
1
Figure 3. Program flowchart is 0 ° or 90 ° . Should 0 be 0 ° or 90 ° the X and Y registers are set to 8000(hex) and 0000(hex), or O000(hex) and 8000(hex) to give the correct answers for cosO° and sin0 °, or cos90 ° and sin90 ° respectively. The 0 ° and 90 ° traps are necessary to eliminate the poor results obtained by generating a function of zero, as for sinO° or cos90 °. If the input argument is neither 0 ° nor 90 ° the program will initialize the X and Y registers to 4DBA(hex) and 0000(hex) respectively. The quantity 4DBA is the hexadecimal notation of 1/Kn ; where K n is computed using equation (17) with the radix p. = 2, and n = 15 iterations. X i and Yi are multiplied by 2" by moving the contents of the X and Y registers to Xsi and Ysi, used as shift registers. A shifting subroutine then shifts X i and Y/ / bits to the right, thus achieving the desired multiplication. Decision block [7] checks the sign of the Z register to determine the direction of the next rotation, and hence the proper sign for computing 2(/" + l, Yi + i and Zi + i. Blocks [8j and [9l are added and subtracted by a single subroutine which is called three times to calculate X i + ~, Fi + I and Z i + ~. After the / + l t h values for the X, Y and Z registers are computed, thereby effectively performing a single rotation sequence, the iteration counter i is incremented and the routine repeated for 15 iterations (i= 0 14). When decision block [11l determines that 15 iterations have been performed, the routine exits with cos0 and sin0 in the X and Y registers respectively.
FLOWCHART The flowchart of Figure 3 describes the procedure used in calculating sin0 and cos0. It is assumed that angle 0 has been placed as the initial value in the Z register. Decision blocks [1J and [2] first check whether the input argument
272
PROGRAM The program developed to determine the sin and cos for a given angle based on the CORDIC algorithm is partitioned into five sections as follows
microprocessors and microsystems
R ESU LTS
Table 2. Program subparts Section
Bytes of opcode
traps initialization shifting addition and/or subtraction ei constants
27 13 30 63 30
Table 3. CORDIC precision 0 ° decimal
0 scaled hex
Accuracy, bits cos0 sin0
0 10 20 30 40 50
0000 0E38 1C71 2AAA 38E3 471B 5554 638D 71C6 7FFF
16 14 14 13 14 12
16 13 12 16 12 14
16 12 13
13 15 14
16
16
60 20 80
90
The results obtained for various input arguments are summarized in Table 3, which depicts the accuracy obtained for the calculated sin and cos values. The hexadecimal values in Table 3 were obtained by multiplying the angle by the scale factor of 364, rounding to the nearest whole number and converting to hexadecimal values. The approximate speed of execution has also been calculated by determining the total number of instruction cycles for a complete execution and assuming an instruction cycle time of 1hrs. The approximate execution time was found to be 8.3 ms (scaling not included). The actual code for the algorithm used here is presented in Appendix B. The numerical code and constraints for other elementary functions are currently being developed.
ACKNOWLEDGEMENT This work was partially supported by the National Science Foundation under grant NSF MCS 76-12457.
REFERENCES 1 Weisberger,A J and Toal, T 'Tough mathematical tasks are child's play for number cruncher' Vol 50
• • • • •
0 ° and 90 ° traps initialization of X and Y registers shifting addition and/or subtraction constant storage (eis)
A listing of the number of bytes of opcode for each section is provided in Table 2. Along with the 163 bytes needed in PROM or ROM, 16 consecutive bytes are also required in RAM for register and counter storage. The program uses two subroutines during execution. The first subroutine is called twice to shift the X s and Ys registers. The second subroutine performs the addition and/or subtraction necessary in computing X i + 1, Yi + 1 and Z i + 1, and is therefore called three times during each iteration. The number of bytes of opcode for the first and second subroutines have been included in the figures given in Table 2 for the shifting and addition and/or subtraction routines, respectively.
APPENDIX A Identity I1 cos(X -+ Y) = cosX cos Y -T-sinX sin Y Identity 12 sin(X + Y) = sinX cos Y -+cosX sin Y
A P P E N D I X B. P R O G R A M CODE The program code which computes the sine and cosine of an angle via the CORDIC algorithm is presented in this section. As written, the code requires that the program reside in memory locations 2000(hex) to 20A2(hex), and that RAM locations 000E(hex) to 001D(hex) be free for temporary register storage. Should these memory stipulations be undesirable, the code can be altered to use any memory space convenient, so long as 16 contiguous locations in direct addressable (page O) RAM are allotted for temporary register storage. The notation used consists of standard M6800 instruction mnemonics with additional symbols
vol 2 no 5 october 78
Eleclronics (17 February 1977) pp 102-107
2
Voider, J E 'The Cordic trigonometric computing technique' IEEE Trans. Electron. Comput. Vol EC-8 No 5 (September 1959) pp 330-334
BIBLIOGRAPHY Andrews, M 'Influence of architecture on numerical algorithms' Microprocessors Vol 2 No 3 (June 1978) pp 130--138
Osborne, A 'Number crunching: two hardware solutions' Kilobaud No 17 (May 1978) pp 84-89
Schmid, H 'BCD trig and hyperbolic functions' Electron. Des. Vol 19 (13 September 1973) pp 68-74
Walther, J S 'A unified algorithm for elementary functions' Spring Joint Comput. Conf., A FIPS Proc. Vol 38 (1971) pp 379-385
defined as follows: # indicates immediate addressing; (yz) or (wxyz) are used to indicate storing, or loading, the contents of the direct addressed location yz(hex), or extended addressed location wxyz(hex); and index addressing is indicated by'mnemonic x,ab where ab(hex) is the offset. Also given are the hexadecimal values of the scaled a constants used by the program and the format for the necessary RAM or temporary storage locations. The only initialization necessary before executing the program is to load the scaled value of the desired angle in the ZH&L register (locations 0016 and 0017 as presented). Because of the 0 ° and 90 ° traps, the program assumes any unscaled angle in the range 0 ° to 1 o, and 89 ° to 90 ° as 0 ° and 90 ° respectively. When complete, the cos and sin values are left in the temporary x and y registers respectively and the routine exits by branching to the location immediately following the e/constants (location 20A3 as presented).
273
A ddres~
Label
/nstru( tion
CO[?TIT2elItS
2000
S]'ART
LDX gO000 STX (12)
CLEAR X AND Y REGISTERS
SUB
CHKREG
STX (14)
TSTZ
JUMP INIT
AGIN
LDS #8000 LDAA (16) BNE TO TSTZ STS (12) BRA TO JUMP CMPA #TF BNE TO INIT STS (14) JMP TO EXIT LDX #207F STX (1C) LDX #4DBA STX (12) CLRB STAB (1]) LDX (12) STX (1A) LDX (14)
ADD
CHECK IF ZH - 00 ZIS BETWEEN 0 ° AND 1° S E T X - 8000, Y - 0000, & EXIT C H E C K I F Z H - 7F Z IS BETWEEN 89 ° AND 90 ° SETX = 0000, Y - 8000, & EXIT INITIALIZE u INDEX REGISTER
2084
SETSP
TESTZ
2085
INITIALIZE X - 1/Kn INITIALIZE i - 0
MOVE Yi -~ Ysi
Xi -~ X~i
GET~X STR~X
2063
SR1 TESTA
RTRN SR2
274
POSITION INDEX REGISTER AND STACK POINTER SHIFT Ysli BITS RIGHT SET INDEX REGISTER TO Xsi SHIFTXs/i BITS RIGHT SET ADD COUNTER TO 01 SET SP TO PULLX/ GET Xi, OR Y/, OR Zi BRANCH IF ADD COUNTER = FF CHECK SIGN OF Z GO TO ADD/SUB SUBROUTINE STORE XiH , OR Y/H, OR ZiH
CHECK IF ADD COUNTER = FF
OR SUBTRAC ]ION ONX, Y A N D Z REGISTERS
Table of scaled ~i constants
STX (18) LDX #~010 TXS BSR to SR1 INX INX BSR to SR1 INCA STAA (10) TXS PULA PULB BMI TO GETGX TST (0016) BSR TO SR2 PSHB PSHA INX INX LDAA (10) BMI TO STR~X DECA STAA (l O) BRA TO SETSP LDX (1C) BRA TO TESTZ STX (IC) LDAB (11) INCB CMPB #OF BEQ TO EXIT BRA TO AGIN LDAA (11) BEQ TO RTRN ASR X,08 ROR X,09 DECA BRA TO TESTA RTS BMI TO CHKREG TST (0010) BEQ TO ADD
SUB X,07 SBCA X,06 RTS TST (0010) BEQ TO SUB ADDB X,07 ADCA X,06 RTS
20A2 20A3
EXIT
40 00 25 C8 13 F6 0A 22 05 16 02 8C 01 46 00 A3 O0 51 00 29 00 14 00 0A 00 05 00 02 00 01 - -
~oH ~IH OqL (~2H ~X2L
13H 13L
14H 14L
DECREMENT ADD COUNTER REPEAT FOR Y~, OR Z/ SET INDEX REGISTER TO ~/ STORE ~i+~ INDEX REGISTER VALUE INCREMENT/ I F i = 15, EXIT i ~ 15, REPEAT SUBROUTINE #1 SHIFT Ysi AND Xsi i BITS RIGHT
SUBROUTINE #2
PERFORM ADDITION
Temporary register configuration Address Contents O00E OF 10 11 12 13 14
SP SCRATCHPAD SP SCRATCHPAD ADD COUNTER i
15 16
XH XL YH YL ZH
17
ZL
18
YSH YSL XSH
19 1A 1B 1¢ 1D
XSL
~ INDEX REGISTER VALUE (~X)
microprocessors and microsystems