Computer Physics Communications 165 (2005) 37–48 www.elsevier.com/locate/cpc
Fast recall of state-history in kinetic Monte Carlo simulations utilizing the Zobrist key D.R. Mason a , T.S. Hudson a,∗ , A.P. Sutton a,b a Department of Materials, Oxford University, OX1 3PH, UK b Helsinki University of Technology, Laboratory of Computational Engineering, PO Box 9203, FIN 02015 ESPOO, Finland
Accepted 3 September 2004 Available online 11 November 2004
Abstract We present a novel application of the Zobrist hashing method, known in the computer chess literature, to simulation of diffusional phase transformations in metal alloys. A history of previously visited states can be easily maintained, allowing very fast lookup of energies and transition rates calculated earlier in the simulation. The method has been applied to the simulation of a Fe-1at.%Cu system, with simple potentials and a transition rate for diffusional events approximated from the difference in internal energy between trial states. In this simple model at temperatures of 1073 K we find that 61.2% of the states considered during the simulation have been seen previously, and that this proportion rises to 85.1% at 773 K and even to 99.9% at 373 K. Rapid recall of these states reduces the computational time taken for the same sequence of atom-vacancy exchange moves by a factor of 6.3 at 773 K rising to over 100 at 373 K. We suggest that a similar speedup factor will be found using more sophisticated models of diffusion and that the method can, with small modifications, be applied to a wide range of kinetic Monte Carlo simulations of atomistic diffusion processes. 2004 Elsevier B.V. All rights reserved. PACS: 68.35.F; 61.72.J Keywords: Vacancies; Vacancy diffusion; Kinetic Monte Carlo; Residence time; Hash function; On-the-fly kMC
1. Introduction In kinetic Monte Carlo (kMC) simulations of crystalline solids, a dramatic increase in algorithm efficiency can be obtained by storing in memory some details about previously visited states [1,2]. In his 2000 review of kinetic Monte Carlo algorithms [3], Jónsson states: * Corresponding author.
E-mail address:
[email protected] (T.S. Hudson). 0010-4655/$ – see front matter 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.cpc.2004.09.007
38
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
The problem with fast, repeating processes which slow down the time evolution in the simulation has been addressed in several ways, but a more systematic method for dealing with this problem may be possible and would certainly be useful. It should be possible to develop algorithms where previously visited configurations are recognized and repeated cycling between configurations identified and dealt with as a local equilibrium. For example if the energy of a configuration is computationally expensive to calculate, as it is in the case of most multibody interatomic potentials, or when structural relaxations are performed after each kMC move [4], recalling the transition rates to neighboring states each time a configuration is reencountered will greatly increase the speed. One of the obstacles to utilizing this however has been the lack of an efficient method for comparing the current state of the system to the historical database of configurations visited. The most obvious method, an atom by atom comparison is time consuming, the total comparison time scaling with the size of the system multiplied by the number of states in storage. Here we present Zobrist keys [5] as the starting point for a simple and efficient method of comparing a state to the database of states previously visited. The Zobrist hashing strategy was originally devised for computer chess, Go, and other similar two player board games, where it is now widely used [6]. It is a fast method to calculate and update a signature or key representing a given board position. For a crystalline alloy or other impure crystal, we can apply the same scheme to maintain a key representing the configurations of atoms within the crystal. The analogy is quite straightforward between a board game position and the configuration of such a crystalline solid: types of pieces are analogous to chemical elements, and positions on the game board are analogous to atomic positions within the crystal. Two features of the Zobrist key make it particularly useful amongst possible hash functions for representing crystal configurations. Firstly, it maps similar configurations of atoms to very different key values, and the distribution of key values is uniform across the space of all possible keys. Secondly, it can be calculated rapidly and incrementally. In Section 2 we show how the Zobrist hashing method can be applied to simulations of diffusional phase transformation in systems with an underlying lattice. We discuss the problem of maintaining a sparse hash table and a suitable strategy for replacing old states that the system is unlikely to revisit in Section 3. We then apply this method to a simple model of diffusion in Fe-1at.% Cu in Section 4.
2. Generating a Zobrist key for crystalline configurations We consider a kinetic Monte Carlo simulation of solid-state diffusion in alloys. We might allow atoms to relax from their ideal lattice positions, but we assume that it remains possible to associate each atom uniquely with a lattice site. A state σ in the simulation is a particular configuration of atoms and vacancies on a lattice. The system moves from one state to the next by the exchange of atoms, in our case a nearest neighbor pair of atom and vacancy. We determine stochastically which state will be the next visited by comparing the transition rates between states. We would like to store the energy of previously visited states and the transition rates between states, in case the simulation subsequently returns to the same state. A configuration of atoms may be represented as an array ρ, of length N , where N is the number of lattice sites in the system. The number of distinct atomic types, including a vacancy type, is T . The ith entry of the array contains the type of atom at site i; ρi ∈ {A, B, . . . , V }. The Zobrist key for the initial configuration of atoms k is then constructed as follows: (1) A two-dimensional array z is constructed before the simulation is started, with N rows and T columns. An entry of the array can be referenced by zi,ρi . Each entry of the array is assigned a randomly-generated 64-bit integer. (2) The key for the initial configuration is initialized to k = 0.
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
39
(3) k is then constructed incrementally using the bitwise exclusive or (xor) function. For each atom i = 1, 2, . . . , N in the lattice, k xor zi,ρi → k. Thus we construct a new composite pseudorandom number which combines the signature of each atom in the configuration. 2.1. Update scheme The xor function has the following useful properties: a xor b = b xor a a xor a = 0 (a xor b) xor c = a xor (b xor c) a xor 0 = a
(1)
The xor operation acts as a toggle. If applied once it will ‘add’ an atom type ρi at a particular position i, as represented by the random sequence of 64 bits zi,ρi to the key. If applied once more it will ‘remove’ the atom from the key. The construction of the initial configuration therefore adds each atom to the key. If an atom of type A at site i exchanges with an atom of type B at site j , the key is updated in four xor steps to become: k xor zi,A → k k xor zj,B → k k xor zi,B → k k xor zj,A → k
(2)
Where the first two lines of Eq. (2) remove the atoms at sites i and j , and the third and fourth lines replace them with their types exchanged. Note that the configuration ρ will always map to the same key, independently of the path taken to reach it. Also, since the key is only a function of the atom type at each position, indistinguishable states will map to the same key. 2.2. Update scheme for vacancy diffusion The construction of the key as used above overspecifies the system. It is only necessary to utilize the lattice positions of T − 1 atom types, as the final type must occupy all remaining positions. Without loss of generality, the array entries zi,X may be set to zero for atom type X for all positions i. X should be chosen to be the atom type which is most often exchanged, as k xor 0 = k. If we are simulating vacancy diffusion, we choose X = V , then an exchange requires only two xor operations. To remove an atom of type A from site i and replace it at site j , Eqs. (2) reduce to: k xor zi,A xor zj,A → k
(3)
If the simulation is of Kawasaki dynamics in a dilute medium it may be efficient to choose X as the host medium. 2.3. Managing storage and retrieval of information Using this key to represent the entire configuration, a hash table may be employed to manage the storage and retrieval of any information we require about the state. This process is detailed in Appendix B, which also discusses the probability of two different states accidentally having the same Zobrist key. Hash tables provide an efficient method for determining whether a particular key has been seen before, and if so, detailing from where in memory details of the state may be retrieved.
40
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
Fig. 1. To show that the number of calculated transition rates required is proportional to the number of neighboring states not seen previously. The model described in Section 4 has been used. Plotting CPU time per move (in milliseconds) as a function of the number of states seen shows that the time taken for a step in the simulation is proportional to the number of transition rates that have to be calculated. The points are the average time taken per move for the simulations in Section 4. This plot uses a very large cache size M = 262,144 states, and pensionable parameter α = 0.999. Inset: increasing the cache size using the LRU strategy for a single run at T = 1073 K shows that for larger cache sizes the time taken to find the oldest state becomes significant, and so the time taken for a simulation would increase indefinitely with cache size if this simple strategy were used.
Fig. 1 shows the average computational time per kMC move, plotted as a function of the proportion of states recovered from memory in the model system in Section 4. It clearly demonstrates that there are substantial computational gains possible when revisited states are identified by this method.
3. Stored state replacement/expiration strategy Kinetic Monte Carlo simulations may comprise an indefinite number of moves, so we will want to add a very large number of states to memory. Storing more states will increase the probability that information about a transition will be recovered from memory. As hash table size and space for storing the details of visited states is limited by available computer memory, less useful states must be replaced. The exact least recently used (LRU) replacement strategy is widely used in web page caching [7] and elsewhere, but it may be inefficient in our case. This is because to guarantee we have found the least recently used state to replace in our state database, we would have to scan through the entire database of stored states, or continuously maintain a sorted list of these states, every time a new insertion into our database was made. The alternative method we propose, which we shall denote the next pensionable state (NPS) replacement strategy, compromises on the intention to remove the least recently used state from our cache: the first instance encountered of a less recently used or pensionable state is removed. This dramatically improves speed, and in fact, when designed carefully, on average reduces replacement time to a small constant independent of the total number of states stored. The failure of the LRU strategy is shown in the inset of Fig. 1. Our definition of whether a state is pensionable depends primarily on the time that state was last used. At this point we need to make a slight digression to clarify the different independent counters which could be interpreted as a time. We have identified five in this model:
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
41
(1) Real time simulated. This represents the real time (in seconds) taken for the sequence of transitions, and is determined by the sequence of states visited and the model for the rate of diffusion used. This counter is incremented stochastically with each move. It is the most interesting quantity for comparing the simulation with experiment. (2) CPU time elapsed. This is the time taken for the simulation to run (usually in hours). This is of key importance to the user. (3) Monte Carlo moves performed. This marks the number of vacancy-atom exchanges which have actually occurred. When we discuss speed we mean the ratio of this quantity to the elapsed CPU time. (4) Distinct states visited. As each new state is found and written into the memory for the first time a counter is incremented. This counter records the total number of distinct states visited through the simulation. It marks the progress of the system through its configurational phase space. (5) Distinct states visited since last use. This counter is somewhat different as it must take a different value for each state stored in the memory. Each time a state stored in memory is visited a counter is reset to zero. Then for each distinct state visited the counter is incremented. The last of these counters is difficult to implement in practice, as it would require an order (M) operation for each step, where M is the size of the memory cache. It is included because sampling from this counter would directly give a means of determining the required size of the memory cache. At low temperatures we might expect only a few distinct states to be visited between uses of a state because the system is often in a steep sided energy minimum in configuration space. At high temperatures the steepness of the sides would be reduced, allowing access to a greater number of distinct states between revisits. The expected number of distinct states visited between uses gives a scale for the volume of configuration space in a typical minimum. A small multiple of the expected number should be used for the size of the cache to ensure the bowl of the minimum is completely covered. As this last counter is not likely to be available in a real implementation, we will use the fourth counter, the number of distinct states visited, to determine whether a state is old, and therefore can be acceptably dismissed. The value of this counter is tagged to a state each time it is used, so each state i has its own time of last visit, ti . If this time is less than or equal to the value of a pensionable age cutoff p, then the state is eligible for removal. This cutoff is initially set to p = 0. The cache is filled in sequence from position 1 to position M. When the last empty space has been filled, adding a new state requires eliminating an old. Starting from the position of the last new state written, we cycle through the cache looking for a state eligible for removal. If the current time is t0 , then each time we remove a state i we set the pensionable age to: p = αti + (1 − α)t0 ,
(4)
where α is an adjustable parameter 0 α 1. If the whole cache is tested and no eligible state is found, then the oldest state has been identified in the process of checking, so that state is replaced. Note that the choice α = 0 will keep p pinned to the current time, and so all states are eligible for removal. This choice means the position that a new state is added cycles through the cache. The choice α = 1 keeps the next state removed to be at least as old as the one previously removed. As p starts at 0, this choice means the oldest state is always removed and so this returns the LRU strategy. Any intermediate choice skips over newly visited states and eliminates older candidates. This skipping means that the cache is regularly cycled, and that few states need to be considered before one is chosen for removal.
4. Modeling the Fe–Cu system In this section we describe a model system to demonstrate the utility of the Zobrist key in improving the efficiency of kinetic Monte Carlo simulations. We use a simplified model of phase separation in a Fe–Cu binary
42
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
alloy system. Diffusion occurs by the motion of a single vacancy. Periodic boundary conditions are used. We have ignored elastic relaxation effects and atoms remain at lattice positions of a bcc lattice. We allow vacancy-atom exchanges only to the eight nearest neighbors of the vacancy. Although this model is too simplistic to describe phase-separation in real Fe–Cu alloys it enables us the demonstrate the features of the Zobrist key, which is our purpose here. An alloy system was simulated to provide inhomogeneity in the system, which provides a much larger configuration space. This space will also have energy minima which may localize the system for long periods. Atoms are initially randomly positioned on the lattice with lattice constant equal to that of pure iron (a = 2.8665 Å). One lattice site is left vacant in a system of 64 × 64 × 64 unit cells. We use many body EAM potentials for the Fe–Cu system derived by Ackland and Bacon [8]. The kinetic Monte Carlo simulation proceeds by considering transition rates to neighboring configurations. The rate of the transition is a function of the difference in energy of the two configurations and the type of atom being exchanged, t [4]: E(σ ) − E(σ ) Em (t) exp − . r(σ → σ ) = νD exp − (5) kB T 2kB T Em (t) is the migration energy appropriate for an atom of type t in pure iron. νD is the Debye frequency (taken here to be the constant value 1 × 1013 Hz). E(σ ) is the energy of the configuration of atoms denoted σ . kB is the Boltzmann constant, T the simulated temperature. For a previously visited state σ managed by the hash table and referenced by a Zobrist key we must store the transition rates across the saddle points, r(σ → σ ). In addition to this we can add the following information to improve the efficiency of our calculation • The internal energy of the configuration of atoms, E(σ ). • The energy barriers of the saddle points, Em (t) + 1/2(E(σ ) − E(σ )). If we were performing a simulation on a flexible lattice we should probably add to these the displacements of all atoms from their lattice positions. Other simulation models may utilize additional data. We stated in Section 3 that we expect states to be revisited. To test the extent to which states are indeed revisited we store all states considered, without any state replacement, and continue the simulation for the first one million moves. The simulation was run from the same starting configuration with increasing simulated temperatures from 373 to 1073 K. At high temperature we expect the motion of the vacancy to be a random walk, but at low temperature we expect the vacancy to be trapped at Fe–Cu interfaces. In these simulations a simulation cell of 64 × 64 × 64 bcc unit cells with a single vacancy used, and statistics were gathered and averaged over four independent simulation runs. We consider the number of times each stored state is likely to be used. On the first visit to the state a counter r is initialized to zero. On subsequent visits this counter is incremented. The cumulative probability distribution of a state in cache being revisited r times by the end of the simulation is plotted in Fig. 3. At low temperatures a state is likely to be revisited a large number of times. For example, in a simulation at a simulated temperature of 573 K a state has a 50% chance of being revisited ten times or more. When a state is revisited it may have been first constructed some time ago. The number of MC steps (in this model corresponding to single vacancy exchanges) since the state was first constructed was recorded for each state found that was visited previously. The resulting cumulative probability distribution is shown in Fig. 2. At high temperatures, the almost random walk of the vacancy means that the simulation is moving steadily through the configuration space. It is therefore unlikely that the system revisits a configuration first constructed a long time ago. At low temperatures the system gets trapped in low energy configurations, increasing the probability that a state could be revisited long after it was first seen. The states most likely to be revisited are those recently seen. This is especially true when the system is localized in low energy regions of the configuration space. At low temperatures it is therefore the case that even if only
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
43
Fig. 2. The number of MC steps elapsed since a revisited state was first constructed. The points for m > 100 steps were collected into bins, and the points plotted are for the centre of these bins. The steps seen at low numbers of steps are an artifact of the simulation being performed on a bcc lattice. As the single vacancy moves from one simple cubic sublattice to the other each step, it is not possible to revisit a state first constructed an odd number of moves previously. The lines connecting points are to guide the eye.
Fig. 3. The cumulative probability distribution of a given state in the cache being used at least r times. The intersection of the line with the vertical axis gives the probability that a state will ever be reused. The lines connecting points are to guide the eye.
a relatively small number of states are stored in memory, a high speedup factor can be achieved, whereas at high temperatures even an enormous cache will have a more limited impact on possible speedup. In Fig. 4 the proportion of states considered in the simulation that were found in the cache, whether previously visited by the system or as trial configurations, is plotted as a function of the cache size. The highest number of states identified as having been seen before is strongly dependent on temperature. Beyond a cache size of a few hundred stored states diminishing returns make further increases in cache size ineffective. However, as a general rule the cache size should be set as large as possible within computational limits, and the NPS replacement strategy should be used with a value of the pensionable age parameter α just slightly less than one. We have found a value of α = 0.9 to be efficient. Finally, we consider the specific speedup available for the Fe-1at.%Cu system modeled. The simulations were performed on a 2.4 GHz Pentium 4m processor. If the time taken to find the transition rates is large compared to the overhead of constructing trial states and making a selection, then we can deduce a theoretical maximum speedup from Fig. 4, given by smax = 1/(1 − p). The speedup s achieved by our model is somewhat less that this as we have a finite overhead, seen in Fig. 1 to be 11.7 microseconds per move. As mentioned in Section 4, our model estimates the transition rate between states i and j as a function of the internal energy of the states. The energy of state j will be stored if j has been previously visited, or if the transition k → j has been considered—i.e. if any of the states k neighboring j had been visited. If we were to calculate
44
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
Fig. 4. The proportion p of considered states found in the cache as a function of cache size M. The points for infinite cache size are estimated using M = 262,144. The lines connecting points are to guide the eye.
Fig. 5. Speedup seen in our simulation, together with theoretical maximal speedups using a model for the transition rates based only on difference in energy between states, and a model using saddle point energies calculated directly. The lines connecting points are to guide the eye.
saddle point energies on the fly, using the dimer method or otherwise [9–11], then the energy of the barrier i → j would be available only if state i or state j had been visited. Using this model we would find a second maximum = 1/(1 − p ), where p is the proportion of required saddle energies found in cache. speedup. This is given by smax The speedup achieved by our model and the two theoretical maxima is shown in Fig. 5. It is found that the speedup in our model is greatest at low temperatures where states are revisited many times. For a simulation at 573 K we find an increase in computational speed, that is the number of vacancy-atom exchanges performed per second of CPU time, of 43.1. At 873 K this speedup has dropped to 3.8. At very low temperatures the achieved speedup reaches a maximum value given by the ratio of the time taken to calculate all transition rates and the overhead per move in constructing trial states and selecting a move. This maximum is found to be over 100.
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
45
5. Conclusions We have shown in Section 4 that in kinetic Monte Carlo simulations of atomistic diffusion on a lattice, states can be revisited many times. At low temperatures the system is more likely to be localized in low energy regions of configuration space. This has the effect of making recently visited configurations of atoms more likely to be revisited. We have shown that by utilizing the Zobrist key together with appropriate memory replacement strategies, the previously visited states can be efficiently identified and information about them can be recalled. This makes substantial computational savings because energies and transition rates do not need to be recalculated for those states. For a kinetic Monte Carlo simulation of diffusion in Fe-1at.% Cu, we found that a speedup of 2.5 was possible even for the highest temperature simulation of 1073 K using the Zobrist scheme for identifying states and NPS strategy for maintaining a history of previously visited states. With lower simulated temperatures, the speedup factor we achieved is increased, to 6.3 at 773 K rising to 103.9 at 373 K. We have used rather simple potentials and constrained the simulation to take place on a rigid lattice. We have also used only internal energy differences between states to construct a transition rate rather than attempting to evaluate accurately the free energy barrier between a metastable state and the corresponding saddle point. We argue that these points could, in principle, be corrected. A careful examination of possible exchanges could be undertaken [12], bond-order potentials used [13,14], the condition of lattice rigidity could be relaxed [4,15] and the saddle points located on-the-fly. All these improvements in the physical description would increase the computational time required for the same number of Monte Carlo moves. Most importantly we also believe that the speedup due to the maintenance of the history of stored states would be very similar in a more sophisticated model to that reported here. With more costly schemes for calculating the transition rate to trial states this speedup factor would be increased further for the lower temperature simulations. If the overhead costs of finding trial configurations and selecting the move, which we have measured to be a few microseconds, are negligible compared to the time taken to calculate the rate of transition between states, then a theoretical speedup of over 800 should be possible for a simulation at 373 K. The method we have presented is generally applicable to kinetic Monte Carlo simulations based on a lattice, or where the atoms are displaced from an underlying lattice. More generally the methodology of managing a rolling history of recently visited states is applicable to any kinetic Monte Carlo simulation with indexable states, for instance where the states are defined entirely by bond connectivity. In these general circumstances another hashing function instead of the Zobrist key will have to be developed.
Acknowledgements D.R.M. was funded by EPSRC grant number GR/R67699/02. T.S.H. was funded by the UK Office of Science and Technology, EURATOM, the ORS scheme, and a Linacre College Applied Materials scholarship. The authors would like to thank Professor Alfred Cerezo for useful discussions.
Appendix A. Algorithm in pseudocode In this section we present an example of the n-fold way kinetic Monte Carlo algorithm, as presented originally by Lebowitz et al. [16], decorated with calls to the Zobrist key and hash table routines. We start with an input ‘initial’ state σ , and an initial Zobrist key k. There are F possible events which can occur from this state, taking the system to one of F ‘final’ states. We would like to find which of the ‘final’ states the system moves to, plus the time taken for this move, in as few calculations as possible. This algorithm will overwrite
46
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
σ with the selected final state σe , and maintain the history of previously visited events. Comments are indicated by a preceding hash character (#). The Hash key routine “newHashKey(k, v, a, ρ)” is defined in Eq. (2). Two calls to the history of stored states are also required. Firstly the call “getHistory(k, σ )” looks up the Zobrist key k in the hash table. If present, the previously visited state σ is returned. If not, σ is a dummy state with no information. Secondly the call “putHistory(k, σ )” looks to see if a stored state with the Zobrist key k exists, and if so replaces this state with σ . If not, a new position in the hash table is created and the state σ stored. If the hash table is full then the Next Pensionable State strategy is used to decide upon a state which has outlived its utility and can be eliminated. A.1. Kinetic Monte Carlo with Zobrist key 1. loop e = 1, F # loop through each possible event v = vacancyStart(e) # find vacancy location for event e a = vacancySwap(e) # find atom swapping with vacancy ρ = atomType(a) # find type of atom at position a ze = newHashKey(z, v, a, ρ) # calculate new Zobrist key call getHistory(ze , σe ) # retrieve all stored information end loop # at this point all the information available in the history has been copied # into the array of trial states σe . 2. (exchange information between σ and the trial states σe ) # this step is simulation dependent. The free energy of the saddle # point must be equal from either side of the barrier, so some # transition rates could be transferred across. 3. eventsLeft = EventsWithoutRates(σ ) # find an array containing the states which do not yet have rates 4. loop i = 1, size(eventsLeft(i)) # loop through the remaining states # this loop can be parallelized e = eventsLeft(i) # find event number call findRate(σ, σe ) # calculate unknown rate call putHistory(ze , σe ) # store new information in σe end loop # at this point σ has a rate for every event e. # the next move can therefore be determined 5. call putHistory(z, σ ) # store the new information in σ 6. e = selectEvent(σ ) # stochastically decide which event to perform 7. t = selectTime(σ ) # stochastically determine the time taken 8. if (free energy of σe is missing) find free energy of σe # this is unlikely, but could happen if σ were last visited # some time ago, and it’s neighbors have since been written over. 9. (share uncorrelated events) # If performing one physical process does not affect the rate # of another, then the rates of these uncorrelated events can # be transferred across at this stage if desired. 10. call performEvent(e) # exchange atoms 11. σ = σe # overwrite state σ 12. z = ze # overwrite Zobrist key
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
47
Appendix B. Hash tables with the Zobrist key Using the Zobrist keys defined in Section 2, it is a simple matter to uniquely tag atomic configurations, but it is not so easy to search through a list of tags until the one which is desired is found. Such a search operation takes order M comparisons, where M is the size of the data set. Hash tables [17,18] are an efficient and commonly used method for storing and retrieving entries in large data sets. A hash table is a large array of which only M entries are occupied with pointers to the memory locations of the M items in the dataset. If we want to store or retrieve data, in our case the energy and transition rates from a state σ , the procedure is: (1) Construct a Zobrist key kσ for the state. (2) Map this key to a particular element of the hash table array j = g(kσ ). The function g is a many-to-one function, which we have taken to be merely extracting the highest few bits. (3) Examine the pointer in element j of the hash table. If it is null, then σ is not stored in memory. (4) If element j is not null, then it points to a dataset entry σ . This is expected to be the desired dataset entry, but, owing to the flexibility in the construction of the Zobrist key and the hash table element, this is not guaranteed. A test must be performed to ensure that kσ = kσ , as this ensures σ = σ (also see later discussion). (5) A strategy must be defined to look for the correct state if this test fails. The simplest strategy is to look at the hash table element j + 1 and return to step 3. This is known as linear probing [19]. However care must be taken in maintaining the correctness of the hash table if linear probing is used in conjunction with a replacement strategy (as described in Section 3), where entries are periodically removed from the hash table. At step 3 we have the possibility of exiting the loop with σ not found. If a retrieve operation was being performed we therefore must calculate or fetch the data desired for σ by some other means. In our vacancy diffusion example, the energy of a configuration of atoms and the transition rates must be calculated from the potentials. However, if we were performing a store operation we must determine how and where best to store the data. In step 4, we can find a pointer to an incorrect state. Two possible errors can occur: • We have mapped the same hash table element to two stored states. If we have M states stored in memory, and a hash table containing e elements, there is a M/e chance of this happening each time a new state is stored. This error can be overcome using linear probing, but it is best to keep the ratio M/e small. We therefore want our hash table to remain sparse, and so must clean up our database of previously visited states as the simulation progresses. Our replacement strategy and the physical justification for this strategy is discussed in Sections 3 and 4. • We have mapped two different states to the same Zobrist key. If this occurred during a simulation, the results of the simulation would be incorrect because it would falsely assume that it had seen the state before, and use errant data. We now consider the probability of this occurring using a 64-bit key. The number of possible atomic configurations is: N nconfig ≈ (B.1) ct −ct , t
where ct is the concentration of atomic type t and the product runs over all types, including the vacancy. Since there are only 264 distinct Zobrist keys, for a large number of atoms in the system each key does not uniquely identify one configuration. However, since the keys generated are evenly distributed by construction, the chances of finding a particular one repeated is M/264 . If over 109 different moves in the simulation (a lengthy kMC run) each state is compared to a database of 106 previously stored configurations, the probability of a single collision of this kind is under 1015 /264 ≈ 5.4 × 10−5 . This justifies our use of a 64-bit Zobrist key. The size of the key can be increased to further reduce this probability if required.
48
D.R. Mason et al. / Computer Physics Communications 165 (2005) 37–48
References [1] D.R. Mason, R.E. Rudd, A.P. Sutton, Stochastic kinetic Monte Carlo algorithms for long-range Hamiltonians, Comput. Phys. Comm. 160 (2) (2004) 140–157. [2] F. Montalenti, A.F. Voter, Exploiting past visits or minimum-barrier knowledge to gain further boost in the temperature-accelerated dynamics method, J. Chem. Phys. 116 (2002) 4819–4828. [3] H. Jónsson, Theoretical studies of atomic scale processes relevant to crystal growth, Ann. Rev. Phys. Chem. 51 (2000) 623–653. [4] D.R. Mason, R.E. Rudd, A.P. Sutton, Atomistic modelling of diffusional phase transformations with elastic strain, J. Phys. Condens. Matter 16 (2004) 1–19. [5] A.L. Zobrist, A new hashing method with applications for game playing, Technical report 88, Computer Science Department, University of Wisconsin, Madison, 1970; Reprinted in: ICCA J. 13 (1990) 69–73. [6] T.A. Marsland, in: second ed., Encyclopedia of Artificial Intelligence, Wiley & Sons, 1992, pp. 224–241. [7] V.S. Mookerjee, Y. Tan, Analysis of a least recently used cache management policy for web browsers, Oper. Res. 50 (2002) 345–357. [8] G.J. Ackland, D.J. Bacon, A.F. Calder, T. Harry, Computer simulation of point defect properties in dilute Fe–Cu alloy using a many-body interatomic potential, Phil. Mag. A 75 (3) (1997) 713–732. [9] G. Henkelman, H. Jónsson, A dimer method for finding saddle points on high dimensional potential surfaces using only first derivatives, J. Chem. Phys. 111 (15) (1999) 7010–7022. [10] G. Henkelman, H. Jónsson, Improved tangent estimate in the nudged elastic band method for finding minimum energy paths and saddle points, J. Chem. Phys. 113 (22) (2000) 9978–9985. [11] D. Passerone, M. Ceccarelli, M. Parrinello, A concerted variational strategy for investigating rare events, J. Chem. Phys. 118 (5) (2003) 2025–2032. [12] G. Henkelman, H. Jónsson, Long time scale kinetic Monte Carlo simulations without lattice approximation and predefined event table, J. Chem. Phys. 115 (21) (2001) 9657–9666. [13] D.G. Pettifor, M. Aoki, Bonding and structure of intermetallics—a new bond order potential, Phil. Trans. Roy. Soc. A 334 (1991) 439–449. [14] D. Nguyen-Manh, D.G. Pettifor, D.J.H. Cockayne, M. Mrovec, S. Znam, V. Vitek, Environmentally dependent bond-order potentials: New developments and applications, B. Mater. Sci. 26 (1) (2003) 43–51. [15] H. Ikeda, H. Matsuda, Computer simulation of phase decomposition process generating precipitates harder than matrix, Mater. Trans. JIM 37 (1996) 1413–1421. [16] A.B. Bortz, M.H. Kalos, J.L. Lebowitz, A new algorithm for Monte Carlo simulation of Ising spin systems, J. Comp. Phys. 17 (1975) 10–18. [17] A.I. Dumey, Indexing for rapid random access memory systems, Comput. and Autom. 5 (12) (1956) 6–9. [18] D.E. Knuth, The Art of Computer Programming, Sorting and Searching, vol. 3, Addison-Wesley, Reading, MA, 1973. [19] G. Schay Jr., W.G. Spruth, Analysis of a file addressing model, Comm. ACM 5 (8) (1962) 459–462.