Pseudorandom number generators for Salford FTN77

Pseudorandom number generators for Salford FTN77

Computer Physics Communications Computer Physics Communications 81 (1994) 237-247 ELSEVIER Pseudorandom number generators for Salford FTN77 Kenneth ...

794KB Sizes 2 Downloads 151 Views

Computer Physics Communications Computer Physics Communications 81 (1994) 237-247

ELSEVIER

Pseudorandom number generators for Salford FTN77 Kenneth G. Hamilton

1

Garjak Research, Inc., 5330 Carroll Canyon Rd, San Diego, CA 92121, USA Received 4 January 1994

Abstract

Previous articles compared and analyzed the pseudorandom number generators that are delivered with offthe-shelf Fortran compilers for personal computers, and updated two CPC library elements to support multiple Fortran compilers. In the current work, the standard generator that is provided with Salford Software's FTN77 compiler is examined, and the two existing CPC generators are migrated to use that compiler's in-line assembler feature. Statistical analyses are presented that pertain to both the current software, and to that o f the previous articles.

NEW VERSION SUMMARY Title of new version: RANSAL Catalogue number. ACTO Program obtainable from: CPC Program Library, Queen's University, of Belfast, N. Ireland (see application form in this issue)

Operating system: MSDOS or equivalent Programming language used: Mixed Fortran-77 and assembly language [ 5 ] Memory required to execute with typical data: Generally, 400-1000 bytes, depending on the version being used No. of bits in a word: 32

Reference to original programs: TRCG, ACARRYPC, RANTAUMAR, and RANTAUMAR2

No. of lines in distributed program, including test data, etc.: 982

Authors of original programs: Ting-Wai Chiu and Tian-Shin Guu (TRCG) [3], and George Marsaglia, B. Narasimhan and Arif Zaman (ACARRYPC) [4]. This is a conversion of a previously-published package by the current author [1,2] based upon those library items.

Keywords: random numbers, random number generators, Monte Carlo, Tausworthe, lagged-Fibonaeei Nature of the physical problem Any Monte Carlo or other calculation requiring a uniform pscudorandom number generator.

Licensing provisions: Standard CPC license Computer for which the program is designed: IBM PC and compatibles, using Intel 386 or higher processors

Method of solution Pseduorandom numbers belonging to a uniform distribution are calculated using the Tausworthe and lagged Fibonacci methods.

1 Correspondence to: K.G. Hamilton, P.O. Box 9388, San Diego, CA 92169-0388, USA. E-mail: 72727.177 @compuserve.com.

Reasons for the new version The Tausworthe generator of Chiu and Guu and the lagged-

0110-4655/94/$07.00 (~ 1994 Elsevier Science B.V. All fights reserved SSDI Ol lO-4655(94)OOO12-Q

238

K.G. Hamilton / Computer Physics Communications 81 (1994) 237-247

Fibonacci generator of Marsaglia et al., were modified in Refs. [1,2] to make them callable by a variety of PC Fortran compilers, using both real and 32-bit protected mode. The Salford FTN77 compiler generates a non-standard object module format, which cannot be linked to subroutines written in Microsoft MASM. As a substitute, FTN77 includes an in-line assembler that provides a MASM-like functionality. Since it was not possible to directly adapt the existing CPC library software to link with FTN77 calling programs, the algorithms were recoded to match the in-line assembler. Compilers which are now supported: Salford FTN77 v2.70 Typical running time Between 0.8 and 1.8 microseconds per pseudorandom number, depending on version of the routine, using an Intel 486DX2-66 CPU. Unusual features of the program Uses in-line assembler capability of Salford FTN77. References [ 1] T.W. Chiu and T.S. Guu, Comput. Phys. Commun. 47 (1987) 129-137. [2] G. Marsaglia, B. Narasimhan and A. Zaman, Comput. Phys. Commun. 60 (1990) 345-349. [3] K.G. Hamilton, Comput. Phys. Commun. 75 (1993) 105-117. [4] K.G. Hamilton, Comput. Phys. Commun. 78 (1993) 172-180. [5] Salford Software, Ltd., FTN77 Reference Manual, version 2.70 (1993).

L O N G WRITE-UP

1. Introduction

Many language compilers are now delivered with a pseudorandom number generator function. While there are several methods available for the generation of streams of pseudorandom numbers, in almost all cases the supplied routine follows the congruential algorithm: Xn = (aXn-i + c) m o d m .

Conventionally, the variable x is produced as an integer in the range [0, m ) and then normalized so as to return a value in [0, 1 ) to a calling program.

The congruence generator has received a certain amount of bad publicity ever since Marsaglia discovered that if sequential values are gathered and treated as n-tuples in hyperspace, the resulting points form a regular lattice [1]. The spacing of the points is predictable [2], given the values of a, c, and m, and so an algorithm can often be devised that will yield a lattice structure so fine as to not cause harm in most applications. Unfortunately, proper care has not always been exercised by software designers, with the result that some truly ghastly generators have been released on unsuspecting users. (Readers with strong stomachs are referred to Sharp and Bays' catalog of horrors [3 ].) As an apparently traumatic reaction to bad congruence generators, some researchers have embraced various forms of sequence generator, !13n = !13n_p ® "tOn-q, q < P,

in which "®" is a simple arithmetic operator such as addition or subtraction. This was first suggested in print, with ® being addition, by Green, Smith and Klein [4], in the late 1950s. (Knuth also shows several examples of this form of generator [ 5 ]. ) When @ is the word-wide no-carry addition modulo 2 ("XOR" in computerese) it is known as the Generalized Feedback Shift Register (GFSR) algorithm, and results from Lewis and Payne's generalization [ 7 ] of a Tausworthe sequence [6]. Such sequence generators are usually implemented as recirculating buffers, so that the newly-computed Xn replaces Xn-p, which will no longer be needed. Marsaglia and co-workers have advocated a form of sequence generator in which the fundamental operation is either addition (with the carry bit being added in on the next pass through the generator), or subtraction (with the borrow bit being processed on the next call) [8,9]. This has recently been shown to be equivalent to a congruential generator with an immense prime value for m [ 10 ]. Additional popular generators have been based on combining two or more simple rou-

K.G. Hamilton / Computer Physics Communications 81 (1994) 237-247

tines, in order to blend together any patterns or regularities and make a more uniform "fog" of points [ 11-13 ]. In previous articles [ 14,15 ], two generators from the CPC library (both written in Intel 80x86 assembly language) were adapted to be callable by a wide range of PC Fortrans, though the use of conditional assembly directives in the entry and exit code. The generators involved were: • A "lagged-Fibonacci" scheme by Marsaglia et al. [ 9 ], in which the borrow bit from each basic subtraction operation was saved and applied to the next iteration, and • a form of GFSR by Chiu and Guu [16], in which the XOR was used to generate floatingpoint return values, while ordinary addition was employed for the production of integers. The supported compilers were those from Lahey, Microsoft, Silicon Valley Software, and Watcom. In this article, the default generator for Salford FTN77 is described and discussed, and the two above routines are migrated to the FTN77 environment.

2. S a l f o r d F T N 7 7

generator

Salford Software, Ltd., produces a 32-bit Fortran compiler for use with Intel 386 and higher CPUs, running in protected mode. Included with the compiler is a pseudorandom number generator function named RANDOM, which returns 64-bit floating point values. The speed of the Salford generator is in the "medium" range (3 as per call on a 66MHz i486 machine), compared to other manufacturerprovided modules. This is fairly good, given that it must be performing extended-precision arithmetic internally, in order to make up the double precision result. The algorithm involved [ 17 ] is Xn

=

1313Xn-I

mod

259,

which is the same as that in the Numerical Algorithms Group package [ 18 ]. For cases in which m = 2a and c = 0 (conditions that are satisfied here) the m a x i m u m period is m/4. This

....5-:

:.. •

• ..~





.°"

.- : ' : .

°

• ".

.: . . , : • . • ..... ~...".,

• ".. -

..



)

.

.~

• ."

: - ;'.._ •

:" i"

• •••"

.

,"

i



..:...

jilb "~..

"

-...

.::

..."

•.

-





-•

t".•

.

"-'.

,..



.

I

": :" "

:"

"

"

•,

••t



" '

. . .: ••••

" •







".

'

' "•

t•

• ." .: tt

..



. .

•• . •

.~, •

~.

"

•| • • • ""

: •

".

'

"

• -

-. .



'



.." . . . .. . .•. . -', . ".. " •.; . ...... , . .. . . f . . . ...~ - " . ": : ". . .

.:

•~ .



., , ....,

". "~.

.

••.:

:"

••

:'"

.

., ,

:

•,

•..:..

v"

°.°

" "

. •

,•

- • . . , .'.:

'i ,. "i':"~ . .'..~. • . . ' 0

"t"

••

°

..-

""

. • "



• ". t



"'..~



...

,

":" ""

• :... :

••

"7'.. . ".

'..':5 "'" .

..~.

••

.," : . ' ...:. •

..

""

• :.

.

.•.

.

- . . . - . • , . .

"'



.



, .-

:

••



:.

. .:. -

.• .



239

.

• •

•" "•

F i g . 1. S c a t t e r i n g o f p o i n t s p r o d u c e d b y c a l l i n g S a l f o r d ' s RANDOM g e n e r a t o r , a n d f o r m i n g 3 x 101° ( x , y ) p o i n t s from sequential non-overlapping pairs. No organized planar p a t t e r n is y e t v i s i b l e , i n t h i s r u n t h a t l a s t e d f o r s e v e r a l d a y s . S u c h a p a t t e r n is i n e v i t a b l e , b u t w o u l d r e q u i r e c e n t u r i e s o f c o m p u t a t i o n t o m a n i f e s t itself•

is achieved if a mod 8 is either 3 or 5; since 1313 mod 8 = 5, the period is 2 57 ~ 1.44 x 1017. Fig. 1 shows the result ofmaldng 3 x 101° (x, y) points out of sequential returns from RANDOM, and then plotting only those that fall into the range 0.5 < x < 0.5002, 0.4 < y < 0.4002. This image size is the same as the one shown in Ref. [ 14] for the Lahey generator, in which a pattern of planes was well-established but far from complete. In this case, however, no pattern can be discerned at all from the Salford generator. The calculation needed to produce Fig. 1 required approximately five days of computation and, given the known period of the algorithm, we can estimate that over 10 000 years would be required for a full cycle. 3. M i g r a t i o n

of CPC software

While the manufacturer-provided generator is probably adequate for most needs, the sequence generators that are available run faster (in addition to having periods that would exceed the age

240

K.G. Hamilton/Computer Physics Communications 81 (1994) 237-247

of the Universe) and so it becomes worthwhile to migrate them to the FTN77 environment in order to provide Salford users with some additional options. The compiler produces object modules in a proprietary format that are incompatible with the OMF or COFF formats that are generated by the Microsoft Macro Assembler (MASM). Since assembly language can be a good thing for time-critical code, Salford has embedded an inline assembler into the compiler. The compiler is then capable of accepting a fairly flexible mixture of Fortran and assembly language within any source file. This facility turns out to be an exceptionally convenient method for the development of algorithms in machine code. In general, it is possible to write a subroutine in which the entry and exit code are in Fortran, along with the local variable declarations, but with the procedural portion being written in direct bit-twiddling machine instructions. This approach has been used to make the two generators operate with FTN77, so that the result is a set of Fortran subroutines with embedded assembly language. These particular source modules will not work with any other brand of compiler.

3.1. The Marsaglia-Narasimhan-Zaman (MNZ) generator Marsaglia, Narasimhan, and Zaman [9] described a subtract-with-borrow generator that had been coded in Intel 8088 assembly language to be used with calling programs that had been compiled with either of two 16-bit real mode PC Fortran compilers. In previous work, this software was generalized by the current author [ 14,15 ], and recoded with 32-bit instructions so as to support several real and protected mode compilers. Versions were constructed to return values either as scalars or in the form of whole arrays. This algorithm has been shown [10] to be equivalent to a congruence generator with an enormous prime modulus and so, in an ultimate sense, should be expected to have a lattice pat-

tern if a calculation proceeds long enough. The length of the period is so great, and the pattern so fine, that it should be many years before computers become available that are fast enough for this to be a problem. Additionally, we must keep in mind that computer arithmetic is not exact, but has a finite number of bits in each floatingpoint word, and that this produces an unavoidable lattice. Pragmatically, then, the lattice issue can be ignored as long as the cell spacing is small compared to the "machine epsilon," the difference between floating point values. Viewed in single precision, this exonerates all of the generators being discussed here, including the one that comes with the compiler, since the hardware representation is a greater source of error. The MNZ generator has now been rewritten to match the in-line assembler of the Salford FTN77 compiler. The the source code is Fortran for the entry and exit sections, and for the data declarations, with low-level machine instructions for the principal procedural portion. Since this is a protected-mode compiler, only 32-bit versions of the subroutine were made, with separate source files for scalar and array arguments. FTN77 reserves the EBX register for use as a pointer to the local data section, and so some register rearrangement was necessary in order to avoid its use. Some minor instruction reorderings were made, which should result in slightly better performance on i486 and Pentium processors. These modifications (which involve not using a value fetched from memory as an index in the next instruction [ 19 ] ) should have no effect on i386 operation.

3.2. The Chiu-Guu Tausworthe (CGT) generator Chiu and Guu constructed a form of Tausworthe generator in which the algorithm ~3 n = ZOn_250 ~ W n - 1 0 3

was used to produce 32-bit integers which were then floated to make single-precision real values, and then returned to a calling program that had

K.G. Hamilton/Computer Physics Communications 81 (1994) 237-247

been compiled with Turbo Pascal v3.0. Their subprogram was also set up to return 16-bit integers using ordinary addition. Previously [ 14,15 ], the Chiu-Guu software was migrated to be callable from several PC Fortrans, and recoded for 32-bit operation. The entry point that returned 16-bit integers was also modified to return 32-bit ones instead. This generator has also been rebuilt to conform to the requirements of the FTN77 in-line assembler, as described in the previous section.

3.3. The Marsaglia-Zaman- Tsang (MZT) generator Marsaglia, Zaman, and Tsang recently described a long-period portable pseudorandom number generator coded entirely in Fortran [20], and suitable for any machine that uses at least a 24-bit mantissa in its floating-point representation. This routine was added to the test suite, in order to provide a reference point: it appears to be the best generator that is available without using assembly language. Any user who lacks an assembler (or assembling compiler, such as FTN77 ) should seriously consider this generator if there is a need for an alternative to the manufacturer-provided module. A version, adapted for array use, was shown by James [21].

3.4. Results The mathematician Alan Turing once provided a definition of artificial intelligence: that an person could interact with a program for some specified length of time (say, 10 minutes) without suspecting that he was talking to a computer. This is, of course, not a definition of artificial intelligence but rather one of a sucessful hoax. (It may sometimes help to think as a detective, instead of as a scientist. ) The testing of pseudorandom numbers involves a similar type of fraud. We produce a stream of numbers through a perfectly well defined procedure, and then insert them into a series of tests that were designed to examine truly random numbers for suspicious charac-

241

teristics. The goal is to make the tests give the wrong answers, and have them report that there is nothing odd going on. Knowing the characteristics of any particular generator, it is presumably possible to design a test that its output will fail, and for that reason the quest for a perfect pseudorandom number generator is quixotic. All that we can hope for is that the numbers will be "random enough" for some application that we have in mind, and this means that the structural characteristics of the generator's output do not couple with anything in the application. The safest situation for a practitioner of Monte Carlo simulations is to have a selection of generators available (of different types), so as to avoid announcing miraculous results that may be based on a numerical artifact. With these caveats in place, let us examine the characteristics of the following generators: • Salford's R A N D O M (supplied with the compiler), • the Marsaglia-Zaman-Tsang (MZT) routine, • the Marsaglia-Narasimhan-Zaman (MNZ) lagged-Fibonacci method, and • the Chiu-Guu Tausworthe (CGT) implementation. The first and second are scalar procedures only: they return one value for each call. Because of the present work, we have both scalar and array versions of the third and fourth algorithms. First, we will look at the execution speed of these routines, and then we will examine the statistical behavior of these generators under some standard tests. Please note that these routines for Salford emit exactly the same streams of numbers as the previously documented MASM ones [14,15] for Lahey, etc., compilers; this means that the statistical results given below are also applicable to that software.

3.4.1. Execution times Fair comparisons between scalar-return routines and ones that return entire arrays of values involve some minor complications. Suppose that an application program needs 1000 "random" numbers. If we are using a subroutine that

242

K. G. Hamilton / Computer Physics Communications 81 (1994) 237-247

returns a single value, then we will need to arrange a loop and call the module 1000 times. On the other hand, if we choose a routine that can return an entire array of values, then we may only need to invoke it once. In comparing different generators, we should therefore include the cost of a 1000-trip loop in the timing of scalar routines, since the execution times of the arrayoriented ones reflect a similar penalty. O f course, the time for 1000 calls is much too small to measure with any accuracy on a PC, since the system clock resolution is only (1/18)th of a second. The solution is to pick a large number, N, as the quantity of "random" numbers that we ultimately need, and then measure the time of a scalar routine by executing two nested loops. In this case, the inner loop runs for a count of 1,2,3 ..... 1000, while the outer one repeats the process N/1000 times. In order to correct for the C P U overhead spent executing the outer loop, we can make a d u m m y subroutine and call it inside a loop that repeats N/1000 times. When the total time to run this is subtracted from the total time for the two nested loops that called the scalar generator, and then divided by N, we have a reasonable value for the average time per call of a scalar routine. Timing a generator that returns an array is done by setting up a call for 1000 values, and executing that N/1000 times inside a loop. When the d u m m y loop time is deducted, and the difference divided by N, we have a per-value number that can be reasonably compared to the scalar resuits. In this way, the cost of executing the array routine's internal loop is balanced by the time to execute the external one in the scalar case. The results for the Salford library and converted CPC library generators, for N = 109, are as shown in Table 1. These times were obtained with an Intel 486DX2-66 processor running in a DEC 466d2LP system. The results indicate that the empty loop required less than one second, while each of the generators executed for 15-60 minutes. It can be seen that the M N Z and CGT routines provide roughly a factor of two speed increase over both the manufacturer-provided generator and the compiled M Z T one. Using a subroutine

that returns arrays of values provides an additional reduction in run time.

3.4.2. Chi-squared test By far, the most "plain vanilla" evaluation that can be performed on a sequence of purportedlyrandom values is thex2-test. In it, a number (K) of equally sized bins are set up to cover the range [0,1 ). As numbers are received, each one falls into exactly one of the bins, and after a sufficient population (say, N ) has been gathered, a count is made of the contents in each of the bins. If N is very large, we should expect to have a count of Navg = N/K for each one of the bins. We live in a world that is far from perfect, however, and so there are deviations from this average. Suppose that we compare this idealized average to the actual counts (which we call Nk ) by computing

z2=

K

(Uk-Ua g) 2

The anticipated value of this quantity is determined by the number of degrees of freedom (DOF) of the system, which in this case is one less than the number of bins. Should the value of g 2 be either too large or too small, the situation looks suspicious. If the test is repeated with another set of input numbers, then a different value for X 2 should be expected; in fact, there is a distribution function for g 2, and tables are provided by Knuth [5] and other sources. For a specified DOF, the table will indicate the value o f x 2 that corresponds to a given cumulative probability. The approach in this case was to create 101 bins, equivalent to D O F = 100. Knuth provides an equation for situations in which D O F exceeds 30, and this formula was used to calculate the values o f x 2 that correspond to the 5% and 95% cumulative probability levels. To explore the behavior of the four existing generators, the X 2 test was run 1000 times with N = 106, and tallies

K.G. Hamilton/Computer Physics Communications 81 (1994) 237-247

243

Table 1 Timings for generators, when N = 109 values were produced Generator

Points/call

Total time (s)

Average time (/~s)

Dummy Library MZT MNZ CGT MNZ CGT

0 1 1 1 1 1000 1000

0.66 3444.51 3264.01 1612.97 1441.92 1059.01 826.87

3.4438 3.2634 1.6123 1.4413 1.0584 0.8262

were made of the number of times that the value of ;(2 fell below the 5% level, and how many times it exceeded the 95% one. Usually, tests of generators are described by quoting a handful of specific numbers, say a dozen values of X 2, and leaving the interpretation open to the reader. In this case, the quantity of individual results is far too great for that, and so a statistical evaluation of the statistical tests is not only possible, but is more concise than the customary presentation. The results are summarized in Table 2. For an "ideal" generator, the first two numerical columns should be 50/0 and 5%. The worst deviations that we can see are 3.8% and 6.0%: this is not bad. It is interesting to note that all of the subroutines tended to err on the side of small X 2, indicating that they are slightly more uniformly distributed than should be expected. (This is most likely fortuitous, however. )

3.4. 3. Kolmogorov-Smirnov test The second standard test that was applied to the generators is the one due to Kolmogorov and Smirnov [5]. Consider an ideal uniform distribution of values (u) between 0 and 1; this is equivalent to a cumulative probability that increases linearly from zero at u = 0 to unity at u=l. A set of 1000 points was generated, and then sorted into ascending order. The running total of these points also defines a cumulative distribution, one that should be reasonably close to the ideal straight-line function. The maximum deviations in the positive and negative directions between the ideal and actual cumulative proba-

bilities is measured in terms of the values Kl000 + and Ki-0o0. If this procedure is repeated a number of times, it will be observed that Kl+000 and K~00 also belong to a distribution, which is tabulated in a manner similar to X 2. There is an expected value, and also well-defined points that correspond to various cumulative probability levels. In this test, the 1000-point KolmogorovSmirnov test was run 1000 times, and counts kept of the number of times that one of the resuits fell either below the 5% point of cumulative probability, or above the 95% point. Obviously, the ideal asymptotic answer would be to see 5% in each of these categories. The results are shown in Table 2, and they are very good: the maximum deviation is around one-half of one percent.

3.4.4. Jumps test The jumps test [22] involves creating a circular array of N pseudorandom numbers, and then counting how many times a value on one side of the mean is followed by one that is on the other side; each such transition is a "jump." Since the values should have neither a serial correlation nor anti-correlation, the number of jumps ( J ) should be equal to half the number of points in the array, and the jump fraction, F = J/N, should tend to 0.5. In fact, each trial will yield a different value for F, and we can compute a standard deviation. By noting that each value in the array is treated as only "above" or "below," we can observe that there is only one arrangement in which all are above, and only one in which they are all below.

244

K.G. Hamilton/Computer Physics Communications 81 (1994) 237-247

Table 2 Statistical results for generators Generator

Chi-squared <5%

>95%

Kolmogorov-Smirnov <5% >95%

Jump fraction

Ideal

5.0%

5.0%

5.00%

5.00%

0.500000 4- 0.000500

Salford MNZ Chiu-Guu MZT

6.0% 5.2% 5.3% 5.9%

4.4% 3.8% 4.6% 4.6%

4.55% 5.10% 4.75% 5.45%

5.05% 4.90% 5.55% 4.95%

0.500015 0.499999 0.500017 0.499999

Further, there are N arrangements in which N 1 are above and 1 below, just as there are N opposite arrangements. In fact, this is a situation that can be described by a binomial distribution, and so the standard deviation should be given by a = 1 / ( 2 N ) ; thus we should expect that an ensemble of tests should converge to a j u m p fraction of 0.5, with a standard deviation of 0.0005. The results are given in Table 2, and they indicate quite good agreement to both of these idealizations.

3.4.5. Autocorrelation test The jumps test essentially is a limited form of autocorrelation, so we can proceed on to perform tests at various lags. The results are given in Table 3, and indicate no problem with any of the generators. The ideal asymptotic result is zero, with a standard deviation of N -l/E, and all of the evaluations appear to be correctly trending to this limit. 3.4. 6. OPSO tests The Overlapping-Pairs Sparse-Occupancy (OPSO) test is one of the "stringent" tests that Marsaglia has defined. The test involves creation of a square array of cells of size K × K, which cover the unit square. A circular array of N pseudorandom values is then generated, and every sequential pair of values forms a point that falls into one of the cells. The pairs are taken in an overlapping manner, so that the first point is given by (Ul, u2), the second by (u2, u3), and the third by (u3, u4 ). The last point is connected to the first as (UN, Ul ). After all of the pairs have been processed,

+ 44+

0.000505 0.000499 0.000517 0.000502

there will be a certain number of empty cells; this quantity is /z . If the test is run numerous times, then an observed average value (/~) can be found along with an observed standard deviation (~). We want to compare these to and a, the values that we would see if the distribution was truly stochastic. Fortunately, Marsaglia has also provided the right answers for three different variations of the test [23], and so we can proceed directly to computation without additional derivation. The tests that were performed (with the ideal results) are: • N = 221,K = 1 0 2 4 , < / t > = 141909, a = 290.26; • N = 222, K = 2048, = 1542998, a = 638.75; and • N = 223 , K = 2 0 4 8 , < / t > = 567637, a = 580.80. For obvious reasons, these are referred to here as simply "OPSOI," "OPSO2," and "OPSO3." Each of these tests was executed 100 times, and the error e = / ~ - was computed; this is the difference between the ideal average value, and our observed one. Table 4 shows the average error (should be a small number, tending toward zero in the limit), and the calculated standard deviation for the 100 trials (should approach a in the limit). Since the OPSO tests return huge integer numbers for/~, the results stated in the table have been divided by a, to make their digestion easier. Thus, the "perfect" result would be 0a 4- 1a. Generally, these results are very satisfying, as the worst news visible only involves 1/7 of a standard deviation.

245

16 G. Hamilton /Computer Physics Communications 81 (1994) 237-247

Table 3 Autocorrelation results for generators Generator

Autocorrelation Lag= 1

Autocorrelation Lag= 2

Autocorrelation Lag= 3

Autocorrelation Lag= 4

Ideal Salford MNZ Chiu-Guu MZT

0.00004-0.0316 °0.00134-0.0318 -0.00124-0.0317 -0.0012+0.0322 0.0003+0.0321

0.00004-0.0316 0.0017+0.0317 0.00024-0.0320 -0.0012+0.0316 -0.00144-0.0317

0.00004-0.0316 -0.00094-0.0326 -0.0021 +0.0313 -0.00124-0.0328 -0.0007+0.0313

0.00004-0.0316 -0.00194-0.0310 -0.00144-0.0313 0.0002+0.0323 -0.00104-0.0306

Table 4 OPSO results for generators Generator

OPSO 1

OPSO2

OPSO3

Ideal Salford MNZ Chiu-Guu MZT

0.000e+ 1.000~ 0.019e4-1.041a 0.026e+0.975a -0.022e+ 1.147a -0.010trq-0.928a

0.000e+ 1.000a 0.05 le4-0.957a 0.038a +0.892a 0.029tr+ 0.918a 0.118a+ 1.044a

0.000e+ 1.000e 0.013tr4-1.148e -0.079e+ 1.030e 0.142e+0.946e -0.090tr+ 1.059tr

4. Conclusion In terms of period length, the default generator that is supplied with FTN77 appears to be the best congruence generator currently in general availability for PCs. Due to the lack of any visible correlation in tests, no unfortunate consequences should be expected, even in relatively large and sensitive calculations. The standard FTN77 is easily better than the routines that are found on most mainframes, which often have cycles in the range of only 2 31 t o 232 . In order to attain the long period, extendedprecision arithmetic was needed, leading to a REAL*8 return result. This naturally imposes a speed penalty, although the generator is still in the middle range of execution time, compared to other manufacturers' 32-bit wares. For those applications in which speed is more significent than precision, many of the available sequence generators will return a REAL*4 value more quickly, and possess even longer periods. Four such routines (two scalar, two vector) have been submitted to the CPC library to accompany this article. The author would like to thank Mr. Bob Jack-

man of Salford Software, Ltd., for bring him into contact with the company's products.

5. CPC Library software The source filethat is associated with this article consists of mixed Fortran and assembly code for the Tausworthc and laggcd-Fibonacci generators, along with Fortran drivers that can be used to verify the proper operation of the routines. A "makcfilc" is provided that will generate all of the test programs, if given as input to Salford's M K utility.These routines with work only with the Salford compiler; users of other compilers are directed to the previously-published routines. The following filesare included: I. S R M A R . F O R - Scalar version of the M N Z algorithm 2. TMAR.FOR - Test program for above 3. VRMAR.FOR - Vector version of the MNZ algorithm 4. TMARV.FOR - Test program for above 5. SRTAU.FOR - Scalar version of the CGT algorithm 6. TTAU.FOR - Test program for above

246

I~ G. Hamilton/Computer Physics Communications 81 (1994) 237-247

7. V R T A U . F O R - Vector version of the C G T algorithm 8. T T A U V . F O R - Test program for above 9. MAKEFILE. - Builds TMAR, TMARV, T T A U and T T A U V All of the files have been concatenated into one, with an identification line preceeding each.

6. Sample output

When T M A R or T M A R V is run, the laggedFibonacci routine will return, and the program will display, the following results: Uninitialized: R = 0.0709601 or R = 0 . 5 4 1 9 1 4 8 or R = 0.5730261 or R = 0 . 5 7 5 2 6 0 5 or R = 0 . 0 1 3 4 5 9 7 or R = 0 . 6 0 7 4 8 8 8 or R = 0 . 8 2 5 2 9 0 6 or R = 0 . 6 3 6 0 0 8 3 or R = 0 . 9 8 5 4 6 4 6 or R = 0.4203059 or I n i t i a l i z e d with R = 0.2586110 or R = 0 . 3 8 5 3 0 5 7 or R = 0 . 6 5 7 8 9 9 3 or R = 0 . 2 7 0 3 3 9 7 or R = 0 . 7 6 0 5 8 6 0 or R = 0 . 6 0 9 6 2 4 7 or R = 0 . 9 4 7 1 8 3 4 or R = 0 . 9 8 4 8 3 2 0 or R = 0 . 6 7 9 4 0 2 8 or R = 0.6453191 or

3D915386 3FOABAEE 3FI2BID7 3F134445 3C5C8645 3FIB8463 3F53463E 3F22D171 3F7C4769 3ED73256 JJ: 13151992 3E8468A9 3EC546C9 3F286C16 3E8A69F7 3F42B5C3 3F1ClO5E 3F727A9C 3F7CIDF4 3F2DED57 3F2533A2

These values are returned by the Tausworthe generator, if T T A U or T T A U V is run: Uninitialized: R = 0.9415818

or 3F710B81

R = 0.1440618

or 3E1384F2

R = 0.1375262

o r 3EOCD3AE

R = 0.7522126

or 3F409101

R = 0.4011315

o r 3ECD611D

R = 0.7559434

or 3F418581

R = 0.9655210

or 3F772C63

R = 0.8187274

or 3F51981F

R = 0.3115642

or 3E9F8558

R = 0.1963905 Initialized

or 3E491A98 with

JJ:

80875745

R = 0.1816825

o r 3E3AOAFD

R = 0.6461046

or 3F25671C

R = 0.6189352

or 3FIE7289

R = 0.3033033

or 3E9B4A91

R = 0.4912424

or 3EFB841F

R = 0.4101366

or 3EDIFD6E

R = 0.0734986

or 3996866D

R = 0.9503846

or 3F734C68

R = 0.4725456

or 3EFIF180

R = 0.9035067

or 3F674C37

References [I ] G. Marsaglia, Proc. Natl. Acad. Sci.U S A 61 (1968) 25-28. [2] G. Marsaglia, Numer. Math. 16 (1970) 8-I0. [3] W.E. Sharp and C. Bayes, Comput. Geosci. 18 (1992) 79-87. [4] B.F. Green, J.E.K. Smith and L. Klein, J. A C M 6 (1959) 527-537. [5] D.E. Knuth, The Art of Computer Programming vol. 2: Seminumerical Algorithms, 2rid ed., AddisonWesley, Reading MA, 1981. [6] R.C. Tausworthe, Math. Comp. 19 (1965) 201209. [7] T.G. Lewis and W.H. Payne, J. A C M 20 (1973) 456-468. [8] G. Marsaglia and A. Zaman, Ann. Appl. Prob. I (1991) 462-480. [9] G. Marsaglia,B. Narasimhan and A. Zaman, Comput. Phys. Commun. 60 (1990) 345-349. [I0] S. Tezuka and P. L'Ecuyer, Proc. 1992 Winter Simulation Conference (J.J. Swain, D. Goldsman, R.C. Crain, and J.R. Wilson, eds), pp. 443-447. [I I] G. Marsaglia, et al., The McGill random number package super-duper, School of Computer Science, McGiU University, Montreal, 1972. [12] P. L'Ecuyer, Commun. A C M 31 (1988) 742-749, 774. [13 ] S. Tezuka and P. L'Ecuyer, A C M Tram. on Modeling and Simulation I (1991) 99-I12. [14] K.G. Hamilton, Comput. Phys. Commun. 75 (1993) I05-I17. [15] K.G. Hamilton, Comput. Phys. Commun. 78 (1993) 172-180. [16] T.W. Chiu and T.S. Guu, Comput. Phys. Commun. 47 (1987) 129-137.

K G. Hamilton / Computer Physics Communications 81 (1994) 237-247

[ 17] R. Jackman, Salford Software, Ltd., private communication. [18] B.D. Ripley, Proc. Roy. Soc. London A 389 (1983) 197-204 [ 19] Intel486 Microprocessor Family, Programmers Reference Manual, Intel Corporation, Santa Clara, CA (1992), Appendix G. [20] G. Marsaglia, A. Zaman and W.W. Tsang, Star. Prob. Lett. 8 (1990) 35-39.

247

[21] F. James, Comput. Phys. Commun. 60 (1990) 329-344. [22] S. Garpman and J. Randrup, Comput. Phys. Comlinebreakmun. 15 (1978)5-13. [23] G. Marsaglia, A current view of random number generators, in: Computer Science and Statistics: the Interface, L. Billiard, ed. (North-HoUand, Amsterdam, 1985).