Evolvable characteristic-based pseudo random number generation

Engineering Applications of Artificial Intelligence 17 (2004) 485–489

Jason C. Isaacs, Robert K. Watkins, Simon Y. Foo*
Department of Electrical Engineering, FAMU-FSU College of Engineering, Tallahassee, FL 32310, USA
Received 30 March 2004

Abstract
We have recently succeeded in developing genetic algorithm (GA)-based random number generators (RNGs) by encoding generators in genomes that manifest themselves as combinations of basic mathematical operators and randomized seeds/parameters. The combination of operators and parameters is “optimized” by selecting the “most fit” algorithm as measured by our GA’s objective function (a metrized version of the Federal Information Processing Standard 140-1 (FIPS) statistical tests). Moreover, offline testing (with Marsaglia’s Diehard test suite) shows that our characteristic-based RNGs perform well. © 2004 Elsevier Ltd. All rights reserved.
Keywords: Evolutionary computation; Random number generation; FPGA

1. Introduction
Somewhat paradoxically, our principal weapon against complexity is randomness. Scientists often use computer simulations to approach problems that are insoluble by analytic techniques. An important, though often overlooked, component in every such simulation is the random number generator. In recent years, the need for efficient, reliable random number generation has grown continuously, and this trend will likely continue into the foreseeable future. This research is therefore aimed at meeting this demand by producing an evolvable random number generator (RNG) for implementation on a Field Programmable Gate Array (FPGA). The advantages of this approach are numerous, and include: (1) a near inexhaustible supply of “new” RNGs; (2) RNGs that can be optimized for various reliability criteria; and (3) greatly increased speed, since implementing the RNG in hardware results in much more rapid execution. The foundations of Genetic Algorithms and Evolutionary Computation will not be discussed here for the sake of brevity; for a thorough introduction to these subjects see Chambers (2001) and Baeck et al. (2001).

Our first step towards the goal of implementing an evolvable RNG is the development of an Evolvable Characteristic-based RNG (ECRNG). The search for an RNG that could be evolved by a GA has itself evolved through several generations. Initial attempts included unsuccessful experiments with serial employment of RNGs, seed optimization (also attempted by Hernandez et al., 2001), and anti-bias filtering. Success came with the idea of evolving the primitive characteristics of known RNGs, i.e., the ECRNG.

*Corresponding author. Department of Electrical Engineering, Florida State University, A342-FAMU-FSU College of Engineering, Tallahassee, FL 32310 USA. Tel.: +1-850-410-6474; fax: +1-850-410-6479. E-mail address: [email protected] (S.Y. Foo).
0952-1976/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.engappai.2004.05.001

2. Characteristics
The evolvable characteristics, i.e., operators and parameters, were selected from known methods of random number generation. For example, a linear congruential generator (LCG) generally uses prime numbers, multiplication, addition, and a modulus operation. Many other generators were investigated for their distinct operations and parameters, and lists of these properties were collected. An analysis of these qualities showed that they could be combined in many logical ways to produce a “random” number. Hence the idea of evolving combinations of these qualities to produce RNGs took shape.

While a string can be a sequence of any symbols, the genetic encoding for the ECRNG uses integers in the range [0, 15] to represent mathematical operators, seeds, and value parameters, as shown in Table 1. The individual chromosome is decoded according to this table. Digraphs in the even positions refer to seeds/parameters, while those in odd positions refer to operations. For example, a 00 in the zero position (even) is decoded as a seed 15881431; 01 in the one position (odd) is decoded as the addition operator; 01 in the two position specifies the second addend 3367900313; and so on to produce a random 16-, 20-, or N-bit number. The specific parameters returned during decoding are selected at random from the stored lists of prime numbers, while the operations are decoded directly (Table 2). Prime numbers were chosen because of their prevalence in known methods of random number generation. Of course, every GA is a copious consumer of random numbers, but our gene decoder also requires random digits, which are provided by an XOR reassignment-based RNG (Hernandez et al., 2001). The final two positions in the genome (20 | 66 in the example, above) are tags that indicate the individual’s rank within the current population and its “fitness”, respectively, and are used for “optimizing” the population. The fitness measure is the result of the individual’s performance on the FIPS 140-1 tests [1] (Isaacs et al., 2000).

Table 1
Parameter and operations decoding table

Allele  Parameter value  Operation (Method 1)    Operation (Method 2)
00      55457            Val1 + exp(Val2)        Val1 + exp(Val2)
01      29753            Val1 + Val2             Val1 + Val2
02      4297389          Val1 + Val2             Val1 and Val2
03      988736           Val1 + Val2             Val1 inclusive or Val2
04      15881431         Val1 + Val2             Val1 inclusive or w/reas Val2
05      2212313463       Val1 + Val2             Val1 + Val2
06      285974774        Val1 - Val2             Val1 - Val2
07      35083272         Val1 × Val2             Val1 × Val2
08      41801257775      Val1 mod Val2           Val1 mod Val2
09      48611277         Val1 xor Val2 w/reas    Val1 xor Val2 w/reas
10      7727327          Val1 xor Val2           Val1 xor Val2
11      34963724         Val1 xor Val2 w/reas    Val1 xor Val2 w/reas
12      95027453         Val1 xor Val2           Val1 xor Val2
13      62191547         Val1 xor Val2 w/reas    Val1 xor Val2 w/reas
14      1038897742       Val1 mod Val2           Val1 mod Val2
15      89867472         Val1 >> 3 + Val2        Val1 >> 3 + Val2

Table 2
Decoded gene using Method 1 operations

Chromosome: |0|1|1|5|9|9|12|3|3|6|5|7|0|10|14|14|13|13|6|0|4|15|10|7|2|7|8||20|60|
Decoded: 55457 + 29753 + 48611277 xor 95027453 + 988736 - 2212313463 × 55457 xor 1038897742 mod 62191547 xor 285974774 + exp(15881431) >> 3 + 7727327 × 4297389 × 41801257775 = 16- or 20-bit number.

Fig. 1. Flowchart of the ECRNG. (Blocks: seed and recombinant method; genetic algorithm; gene: RNG; translator; random number set; fitness evaluation: FIPS 140-1; store best RNGs for DIEHARD testing.)

In summary, this GA (Fig. 1) requires: an initialization of a population of strings; decoding and testing of the RNGs (strings) for selection; and prescribed iterations (generations) of reproduction, crossover, mutation, decoding, testing, and further selection of strings until the stopping criterion is satisfied. A number of GA parameters have been experimented with to optimize RNG behavior, including selection, recombination, and mutation techniques. Results from these experiments have been incorporated into the algorithm to help extract weak populations.

[1] National Institute of Standards and Technology, US Federal Information Processing Standards, FIPS-140-1.
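The decoding rule described in this section (even positions give parameters, odd positions give operations, the last two positions are rank and fitness tags) can be sketched as follows. This is a minimal illustration, not the authors' translator: it uses a fixed lookup of the Table 1 parameter values, whereas the paper states that parameters are drawn at random from stored lists; the operator labels are shorthand invented here; and alleles 6 and 7 are taken as subtraction and multiplication, which is an assumption (the decoded example in Table 2 shows a minus sign for allele 6). The sketch only builds the symbolic expression and does not evaluate it, because the evaluation order and the "w/reas" variant are not specified in the text.

```python
# Parameter values from Table 1, indexed by allele 0..15.
PARAMETERS = [
    55457, 29753, 4297389, 988736, 15881431, 2212313463, 285974774, 35083272,
    41801257775, 48611277, 7727327, 34963724, 95027453, 62191547, 1038897742,
    89867472,
]

# Method 1 operations, indexed by allele 0..15 (labels are illustrative).
# Alleles 6 ("-") and 7 ("*") are assumptions, see the lead-in.
METHOD1_OPS = [
    "+exp", "+", "+", "+", "+", "+", "-", "*",
    "mod", "xor/reas", "xor", "xor/reas", "xor", "xor/reas", "mod", ">>3+",
]


def decode(chromosome):
    """Split off the rank/fitness tags and translate the remaining genes.

    Even positions are looked up as parameters, odd positions as operations.
    Returns (tokens, rank, fitness).
    """
    genes, rank, fitness = chromosome[:-2], chromosome[-2], chromosome[-1]
    tokens = []
    for position, allele in enumerate(genes):
        table = PARAMETERS if position % 2 == 0 else METHOD1_OPS
        tokens.append(table[allele])
    return tokens, rank, fitness


# The chromosome printed in Table 2 (27 genes followed by the two tags).
TABLE2_CHROMOSOME = [0, 1, 1, 5, 9, 9, 12, 3, 3, 6, 5, 7, 0, 10, 14, 14, 13,
                     13, 6, 0, 4, 15, 10, 7, 2, 7, 8, 20, 60]

tokens, rank, fitness = decode(TABLE2_CHROMOSOME)
print(" ".join(str(t) for t in tokens))
```

Because the chromosome has an odd number of genes, the decoded sequence always starts and ends with a parameter, with one operation between each pair of parameters.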

Fig. 2. Average FIPS 140-1 performance over 100 generations using elitist, m-tournament, and random roulette selection.

3. Genetic algorithm operations

3.1. Selection method
The selection operator in a GA selects chromosomes from the population for reproduction. In most cases, the fitter the chromosome (according to the fitness function), the more times it will be selected to reproduce. To determine the best selection method for this GA, experiments are performed using three selection methods: elitist, modified tournament (m-tournament), and random.

The elitist selection mechanism used here preserves the top 20% of each generation and allows these fit individuals to reproduce only among their own kind. Results for 100 generations of evolution are shown in Fig. 2. These results show a declining average fitness of the population over time, suggesting that while the best chromosomes in this region may be highly fit with respect to other individuals in this space, the overall fitness of this neighborhood is not high. Although this isolation in a constricted region of the search space does allow for a thorough search of the solution neighborhood, it generally yields few optima and thus will not be used.

In the m-tournament selection employed, high-fitness individuals are preferentially selected for reproduction, but weak individuals are not necessarily eliminated. This selection strategy is applied to a population of 20 chromosomes by preserving the top five performers in each generation and providing the winner with five randomly selected mates; the first runner-up receives four mates, the second runner-up three mates, and so on. Typical results from experiments using this m-tournament selection mechanism are presented in Fig. 2. Analysis of these results demonstrates a rapid convergence of the average fitness of the solution populations. This proved to be true over several runs and was not the result of a fortunately well-chosen region of the solution space. Many highly fit solutions are found using this selection method, and as a result it will be used in every genetic algorithm implemented in this study.

Finally, a random selection method is investigated. This method decouples objective function performance and reproduction selection criteria by selecting individuals in the population at random for reproduction, consequently ignoring the results of the fitness evaluation completely. It clearly avoids the inbreeding trap of the elitist method, but, unlike the m-tournament selection, does not actively promote the proliferation of fit solutions. Fig. 2 also illustrates experimental results from incorporating random selection into the ECRNG. This method allows for a broad search of the space, akin to a brute-force attack, but the search never converges on highly fit individuals and eliminates the inherent strength of the evolutionary algorithm, its ability to find quality solutions; therefore it will not be used.
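The m-tournament scheme described above can be sketched as follows. The counts come from the text (population of 20, top five preserved, five mates for the winner down to one mate for the fifth-ranked individual). Several details are assumptions made only for this illustration: mates are drawn from the whole population without replacement, a parent is never paired with itself, and each pairing yields one offspring. The function name and signature are hypothetical.

```python
import random


def m_tournament_pairs(population, fitness, rng, keep=5):
    """Return (elite, pairs) for one generation of m-tournament selection.

    elite: the `keep` fittest chromosomes, preserved unchanged.
    pairs: (parent, mate) tuples; the fittest parent gets `keep` mates,
           the next one `keep - 1`, and so on down to one mate.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:keep]
    pairs = []
    for rank, parent in enumerate(elite):
        n_mates = keep - rank
        # Assumption: mates come from anywhere in the population,
        # so weak individuals still get a chance to reproduce.
        candidates = [c for c in population if c is not parent]
        for mate in rng.sample(candidates, n_mates):
            pairs.append((parent, mate))
    return elite, pairs


rng = random.Random(0)
population = [[rng.randrange(16) for _ in range(27)] for _ in range(20)]
# Placeholder fitness (sum of alleles); the paper uses a FIPS 140-1 score.
elite, pairs = m_tournament_pairs(population, fitness=sum, rng=rng)
print(len(elite), "preserved,", len(pairs), "pairings")
```

With these numbers the five preserved chromosomes plus 5 + 4 + 3 + 2 + 1 = 15 pairings refill a population of 20 if each pairing produces a single child.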

3.2. Recombination
This operator chooses a position (locus) on the chromosome and exchanges the sub-sequences before and after the locus between two parent chromosomes to create two offspring. Two methods of crossover were considered: random single-site and single-site midpoint. The following example demonstrates two techniques that can be used with random single-site crossover. The Operator/Parameter (O/P) overlapping method allows for operator and parameter gene mixing, whereas the O/P maintaining method preserves operator and parameter functionality. Examples of these two methods of random single-site crossover are given below.

O/P overlapping method:
GENE1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
GENE2: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Crossover point for child “1” is 17. Crossover point for child “2” is 14.
Child 1: 17 18 19 20 21 22 23 24 25 26 27 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Child 2: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 15 16 17 18 19 20 21 22 23 24 25 26 27

O/P maintaining method:
GENE1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
GENE2: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Crossover point for child “1” is 17. Crossover point for child “2” is 14.
Child 1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Child 2: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
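The two numbered examples can be reproduced with a few lines of list slicing. The index arithmetic below is an interpretation inferred from the printed gene labels, not a description taken from the authors' code: in the overlapping variant the tail of one parent (from the 1-based crossover point) is moved to the front of the child and the remaining positions are filled from the other parent; in the maintaining variant every gene keeps its original position. Which parent contributes the head and whether the gene at the crossover point belongs to the head or the tail are assumptions.

```python
def crossover_overlapping(a, b, point):
    """Tail of `a` from 1-based locus `point` moves to the front of the child;
    the rest is filled from `b` at the same positions.

    Genes taken from `a` shift by point - 1 positions, so an operator gene
    can land on a parameter position (and vice versa) when the shift is odd.
    """
    tail = a[point - 1:]
    return tail + b[len(tail):]


def crossover_maintaining(a, b, point):
    """Classic single-site crossover: head of `a`, tail of `b`.

    Every gene stays at its own index, so operators remain operators and
    parameters remain parameters.
    """
    return a[:point - 1] + b[point - 1:]


labels = list(range(1, 28))  # gene positions 1..27, as in the examples
print(crossover_overlapping(labels, labels, 17))
print(crossover_overlapping(labels, labels, 14))
print(crossover_maintaining(labels, labels, 17))
```

A midpoint crossover is the special case of `crossover_maintaining` with a fixed `point` near the middle of the chromosome, whereas the random single-site variant draws `point` anew for each child.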

The overlap of operator and parameter genes means that the transferred genes may lose fitness because they are decoded anew, i.e., operations may no longer be operations. Without further study it would be hard to determine the full effects of this crossover technique, since it is difficult to tell which genes on the chromosome have the greatest effect on the individual’s fitness. Therefore, in order to eliminate gene overlap, the O/P maintaining method of crossover will be used. Experiments are performed using both a midpoint and a random single-site crossover with the O/P maintaining technique. Fig. 3 illustrates the experimental results obtained from applying these recombination techniques to the ECRNG. From these results, it is clear that the best method for this application is a random single-site crossover.

Fig. 3. Average FIPS 140-1 performance over 100 generations using a midpoint and random single-site crossover with tournament selection.

Table 3
Method 1 operations list renumbered for bias testing

Allele  Operation
0       Val1 xor Val2 w/reas
1       Val1 xor Val2
2       Val1 xor Val2 w/reas
3       Val1 xor Val2
4       Val1 + exp(Val2)
5       Val1 xor Val2 w/reas
6       Val1 >> 3 + Val2
7       Val1 mod Val2
8       Val1 + Val2
9       Val1 mod Val2
10      Val1 × Val2
11      Val1 - Val2
12      Val1 + Val2
13      Val1 + Val2
14      Val1 + Val2
15      Val1 + Val2

3.3. Mutation
This operator randomly alters selected genes in a chromosome. For example, a 1 can be changed to a 9. The 0–15 integer encoding means that there is a probability p = (1/27 × 14/15) × mutation rate of a change at any one locus. Typically, we use a mutation rate between 10 and 12 percent. This means a change occurs with a probability of approximately 0.0039 at any position on the chromosome. Higher mutation rates can be used to prevent the GA from either converging too rapidly or becoming stagnant in the current solution space neighborhood.
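As a worked check of the mutation probability in Section 3.3, the snippet below evaluates p = (1/27 × 14/15) × mutation rate for rates of 10, 11, and 12 percent, and then applies a per-locus mutation with that probability. The `mutate` function is a sketch under assumptions: each of the 27 genes is tested independently, and a mutated gene is replaced by a different allele chosen uniformly from 0–15. The paper does not state either detail, and the names used here are illustrative.

```python
import random

N_LOCI = 27      # genes per chromosome (rank and fitness tags excluded)
N_ALLELES = 16   # integer encoding 0..15


def per_locus_probability(mutation_rate):
    """p = (1/27 x 14/15) x mutation rate, as given in the text."""
    return (1 / N_LOCI) * (14 / 15) * mutation_rate


def mutate(genes, p, rng):
    """Return a copy of `genes` in which each locus changes with probability p.

    Assumption: a mutated locus receives a uniformly chosen different allele.
    """
    mutant = list(genes)
    for locus in range(len(mutant)):
        if rng.random() < p:
            others = [a for a in range(N_ALLELES) if a != mutant[locus]]
            mutant[locus] = rng.choice(others)
    return mutant


for rate in (0.10, 0.11, 0.12):
    print(f"mutation rate {rate:.2f} -> p = {per_locus_probability(rate):.4f}")

rng = random.Random(0)
parent = [rng.randrange(N_ALLELES) for _ in range(N_LOCI)]
child = mutate(parent, per_locus_probability(0.11), rng)
```

The three rates give per-locus probabilities of roughly 0.0035, 0.0038, and 0.0041, which bracket the value of 0.0039 quoted in the text.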

4. ECRNG genetic analysis
Initial runs of the ECRNG, using 500 genes (the top 5 fittest genes evolved per run, over 100 runs of 20 generations each), yielded a curious allele distribution. A simple plot of the allele frequencies demonstrates a noticeable bias towards alleles 0, 1, 4, and 5; moreover, this bias is not specific to either operations or parameters, but affects both. At first, it appeared that this could be due to a dominance of the addition operators over other operators, and therefore a reflection of this selectivity by the fitness function. To test this hypothesis, the operations list was renumbered as shown in Table 3. This renumbering resulted in the same bias towards allele values 0, 1, 4, and 5. This was an interesting dilemma. There were two remaining explanations: first, the genetic algorithm was selecting these alleles because of an unusual preference for these values (and not for their function), or second, the allele values were more prevalent due to an improperly randomized initial population. The second possibility, being easier to investigate, seemed like the best place to start. After reviewing the initialization procedure, it was found that the seed list used to generate the initial population was composed of numbers that were between 15 and 16 bits long. It was therefore determined that a list composed of 32-bit numbers would be more appropriate for the inline RNG. The seed list was replaced with 32-bit numbers and the ECRNG was run again, using the Method 1 operations list, to produce 500 highly fit solutions. These solution chromosomes were then analyzed for allele distribution, which turned out to be even across all alleles. This episode underscores the value of a random initial population that allows the GA to work without external influence. Now that the bias has been eliminated, it is time to test a few fit solutions of the ECRNG against Marsaglia’s rigorous statistical test suite DIEHARD (Marsaglia and Zaman, 1993). This test suite has been the de facto standard for over a decade (however, NIST has recently issued what it hopes will be a new standard (NIST Special Publication 800-22, 2001)).

Fig. 4. FIPS 140-1 results for 500 RNG chromosomes.
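The allele-distribution check described in this section amounts to a frequency count over the archived chromosomes, kept separate for parameter (even) and operation (odd) positions. The sketch below shows one way to do this; the function names and the over-representation threshold (1.5 times the uniform expectation) are hypothetical choices for illustration, not values from the paper.

```python
from collections import Counter


def allele_frequencies(chromosomes, n_genes=27):
    """Count alleles at parameter (even) and operation (odd) positions.

    Only the first `n_genes` entries are counted, so trailing rank and
    fitness tags are ignored.
    """
    params, ops = Counter(), Counter()
    for chromosome in chromosomes:
        for position, allele in enumerate(chromosome[:n_genes]):
            target = params if position % 2 == 0 else ops
            target[allele] += 1
    return params, ops


def overrepresented(counts, n_alleles=16, factor=1.5):
    """Alleles seen more than `factor` times the uniform expectation."""
    expected = sum(counts.values()) / n_alleles
    return sorted(a for a, c in counts.items() if c > factor * expected)


# Toy population: two short chromosomes, counted over 4 genes each.
params, ops = allele_frequencies([[0, 1, 0, 1], [0, 1, 2, 3]], n_genes=4)
print(dict(params), dict(ops))
```

Running the same count before and after renumbering the operations list (Table 3) separates a bias tied to allele values from one tied to operator function.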

5. RNG solutions testing
After extensive work on the ECRNG algorithm, the next relevant step is to test some of the evolved RNG chromosomes. First, the FIPS 140-1 scores of 500 individual chromosomes were measured, as shown in Fig. 4. Then a random sample of 230 individuals was selected from these 500 and tested offline against Diehard, as shown in Fig. 5. The FIPS scores show a fit population of RNG solutions with respect to the online fitness testing, but this only demonstrates that these RNGs possess randomness qualities for the first 200,000 bits generated. Therefore, they must be tested offline to determine whether these qualities persist through 20 million bits. Passing all 18 tests is outstanding, and even passing 14 or more is considered very difficult. Therefore, RNGs with DIEHARD scores of 14+ will be considered highly fit and will be archived for future use.

Fig. 5. DIEHARD score distribution results for 230 RNG chromosomes.
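As background for the online fitness measure, FIPS 140-1 defines four statistical tests on a 20,000-bit sample (monobit, poker, runs, and long run). The sketch below implements the first two with the pass intervals published in FIPS 140-1. It is not the authors' fitness function: the paper uses a "metrized" version whose scoring is not described, so these functions return only a raw statistic or a pass/fail result, and all function names are illustrative.

```python
import random

SAMPLE_BITS = 20000  # sample size required by the FIPS 140-1 tests


def monobit_ok(bits):
    """Monobit test: the number of ones must satisfy 9654 < X < 10346."""
    if len(bits) != SAMPLE_BITS:
        raise ValueError("FIPS 140-1 tests expect exactly 20,000 bits")
    return 9654 < sum(bits) < 10346


def poker_statistic(bits):
    """Poker statistic X = (16/5000) * sum(f_i^2) - 5000 over 4-bit nibbles."""
    if len(bits) != SAMPLE_BITS:
        raise ValueError("FIPS 140-1 tests expect exactly 20,000 bits")
    counts = [0] * 16
    for i in range(0, SAMPLE_BITS, 4):
        nibble = bits[i] << 3 | bits[i + 1] << 2 | bits[i + 2] << 1 | bits[i + 3]
        counts[nibble] += 1
    return 16 / 5000 * sum(c * c for c in counts) - 5000


def poker_ok(bits):
    """Poker test: passes when 1.03 < X < 57.4."""
    return 1.03 < poker_statistic(bits) < 57.4


# A 4-bit counter 0, 1, ..., 15, 0, 1, ... written out as bits: well balanced,
# but its nibble counts are too uniform to look random.
counter_bits = [(i % 16 >> s) & 1 for i in range(5000) for s in (3, 2, 1, 0)]
print(monobit_ok(counter_bits), poker_ok(counter_bits))

# Bits from Python's built-in generator, for comparison.
rng = random.Random(2004)
prng_bits = [rng.getrandbits(1) for _ in range(SAMPLE_BITS)]
print(monobit_ok(prng_bits), poker_ok(prng_bits))
```

The counter sequence passes the monobit test but fails the poker test at the lower bound, which illustrates why the suite rejects sequences that are too regular as well as those that are visibly biased.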

6. Conclusions / Future work
The ECRNGs perform outstandingly when compared to various known RNG methods, but it has proven difficult (and may perhaps be impossible) to give these ECRNGs a full implementation in a limited amount of hardware, owing to the storage of lists of large prime numbers and the requirements of computationally complex operations. The ECRNG will continue to generate RNGs, and these will be tested and catalogued for future use, but the search for an RNG that can both be evolved by a GA and implemented on an FPGA will move forward. Our focus has therefore shifted to RNGs that depend exclusively on hardware-friendly operations, i.e., strictly Boolean operations. One such approach, which shows great promise, is the use of Cellular Automata-based RNGs.

References
Baeck, T., Fogel, D.B., Michalewicz, Z., Back, T. (Eds.), 2001. Evolutionary Computation 1: Basic Algorithms and Operators. 1st Edition. Institute of Physics Publishing, Philadelphia, May 15.
Chambers, L. (Ed.), 2001. The Practical Handbook of Genetic Algorithms. 2nd Edition. Chapman & Hall/CRC, Boca Raton, Fla, London.
Hernandez, J.C., Ribagorda, A., Isasi, P., Sierra, J.M., 2001. Finding near optimal parameters for linear congruential pseudorandom number generators by means of evolutionary computation. Conference Proceedings: GECCO.
Isaacs, J., Watkins, R., 2000. An XOReassignment QRNG: Pass the FIPS-140 using 148 gates and a 9 bit seed. Technical Paper.
Marsaglia, G., Zaman, A., 1993. Monkey tests for random number generators. Computers and Mathematics with Applications 26 (9), 1–10.
NIST Special Publication 800-22, 2001. A statistical test suite for the validation of random number generators and pseudo random number generators for cryptographic applications, May 15, 2001.