Drivers of Late Pleistocene human survival and dispersal: an agent-based modeling and machine learning approach

Drivers of Late Pleistocene human survival and dispersal: an agent-based modeling and machine learning approach

Quaternary Science Reviews 221 (2019) 105867 Contents lists available at ScienceDirect Quaternary Science Reviews journal homepage: www.elsevier.com...

2MB Sizes 0 Downloads 15 Views

Quaternary Science Reviews 221 (2019) 105867

Contents lists available at ScienceDirect

Quaternary Science Reviews journal homepage: www.elsevier.com/locate/quascirev

Drivers of Late Pleistocene human survival and dispersal: an agentbased modeling and machine learning approach Ali R. Vahdati a, *, John David Weissmann a, Axel Timmermann b, c,  n a, Christoph P.E. Zollikofer a, ** Marcia S. Ponce de Leo a b c

Department of Anthropology, University of Zurich, Zurich, Switzerland Pusan National University, Pusan, South Korea Center for Climate Physics, Institute for Basic Science, Busan, South Korea

a r t i c l e i n f o

a b s t r a c t

Article history: Received 30 July 2019 Accepted 4 August 2019 Available online xxx

Understanding Late Pleistocene human dispersals from Africa requires understanding a multifaceted problem with factors varying in space and time, such as climate, ecology, human behavior, and population dynamics. To understand how these factors interact to affect human survival and dispersal, we have developed a realistic agent-based model that includes geographic features, climate change, and time-varying vegetation and food resources. To enhance computational efficiency, we further apply machine learning algorithms. Our approach is new in that it is designed to systematically evaluate a large-scale agent-based model, and identify its key parameters and sensitivities. Results show that parameter interactions are the major source in generating variability in human dispersal and survival/ extinction scenarios. In realistic scenarios with geographical features and time-evolving climatic conditions, random fluctuations become a major source of variability in arrival times and success. Furthermore, parameter settings as different as 92% of maximum possible difference, and occupying more than 30% of parameter space can result in similar dispersal scenarios. This suggests that historical contingency (similar causes e different effects) and equifinality (different causes e similar effects) are primary constituents of human dispersal scenarios. While paleoanthropology, archaeology and paleogenetics now provide insights into patterns of human dispersals at an unprecedented level of detail, elucidating the causes underlying these patterns remains a major challenge. © 2019 Elsevier Ltd. All rights reserved.

Keywords: Out of Africa Human dispersal Model validation Machine learning Sensitivity analysis Pleistocene Global Climate dynamics Paleogeography Data treatment Data analysis

1. Introduction There is now almost general consensus that Homo sapiens evolved in Africa over an extended period of time, and that African populations of our species started dispersing over the entire globe during the Late Pleistocene, resulting in the peopling of all habitable regions of our planet (Groucutt et al., 2015; Galway-Witham and Stringer, 2018; Scerri et al., 2018). How exactly the global spread of humans unfolded over time and space, however, remains a question of intense research. Was there a single “out-of-Africa” event, or multiple dispersal waves, or a continuous flow of people? Did people migrate along specific routes, or diffuse along

* Corresponding author. ** Corresponding author. E-mail addresses: [email protected] (A. R. Vahdati), [email protected] (C.P.E. Zollikofer). https://doi.org/10.1016/j.quascirev.2019.105867 0277-3791/© 2019 Elsevier Ltd. All rights reserved.

environmental gradients? And which factors mediated population dispersal from Africa? A wide range of approaches is currently used to address these questions, and to reconstruct past dispersal events at various levels of spatial and temporal scale. Paleoanthropological and archaeological evidence, as well as patterns of genetic and phenetic diversity in past and extant populations have led to important insights into the complexities of human spatiotemporal population dispersals, such as the timing of arrivals outside Africa (Liu et al., 2015; Bae et al., 2017; Clarkson et al., 2017; Westaway et al., 2017; Groucutt et al., 2018; Hershkovitz et al., 2018; O'Connell et al., 2018; Westaway, 2019), population bottlenecks and founder effects, local population admixture and replacement (von CramonTaubadel, 2014; Nielsen et al., 2017). Also, surprising new insights emerge about the admixture of H. sapiens with archaic local populations such as Neanderthals and Denisovans over extended periods of time and spatial ranges (Dannemann and Racimo, 2018;

2

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

Wolf and Akey, 2018; Yang and Fu, 2018). This rapidly-developing field of research produces new evidence on dispersal patterns and dynamics on an almost daily basis (Harvati et al., 2019). However, we only begin to understand the underlying causes and mechanisms of dispersals. Various factors have been proposed to have been involved in these dispersals, such as climate (Abbate and Sagri, 2012; Eriksson et al., 2012; Stewart and Stringer, 2012), sea level changes (Armitage et al., 2011; Abbate and Sagri, 2012), tectonics (Bailey and King, 2011; Winder et al., 2015), ecology, faunal turnover rates (Palombo, 2013), food resources (Dennell, 2003; Parton et al., 2015), demography (Dennell, 2003; Abbate and Sagri, 2012), as well as cognition and technoculture (Tattersall, 2009). It is likely that a combination of these factors has contributed to human dispersals from Africa, but reconstructing the influence of each of them remains a challenging task. The difficulty is because direct experimental testing of past dispersal processes is not possible, and empirical data are sparse and incomplete. Computer models have thus been used as “in silico” experimental platforms to explore dispersal scenarios as a function of various model parameters, and to assess the likelihood of alternative scenarios given the available empirical data (Eriksson € lzchen et al., 2016). However, testing et al., 2012; Rabett, 2018; Ho alternative scenarios with computer simulations has several limitations. First, only a small proportion of all possible scenarios can be tested explicitly (Reyes-Centeno et al., 2015). Second, it remains open whether the model scenario that best fits the empirical data is the most likely one. To tackle the second issue, Approximate Bayesian Computing (ABC) has been used, where thousands of model parameter combinations can be tested to evaluate the most likely parameter settings given a set of empirical data (reviewed in , 2006)). This approach has been used to (Marjoram and Tavare show that climate and local ecosystem productivity and rates of resource exploitation played a significant role in past human dispersals (Eriksson et al., 2012). However, a set of fundamental questions pertaining to human dispersal models has not yet been addressed. How do the various factors potentially influencing dispersals interact with each other, and how does their influence vary in time and space? What is the importance of historical contingency (similar initial conditions e different outcomes) and of equifinality (different initial conditions e similar outcomes)? Answering these questions is essential for understanding the potentials and limitations of human dispersal models, and for inferring the causes of the past dispersal events from the available empirical data. Here we present a new dispersal simulation model addressing these questions. The model incorporates individual- and population-level parameters as well as time-varying boundary conditions in geography, sea level and climate. These simulations are not to replicate the historical dispersal events, but to explore the combined effects of possible drivers of dispersals. We have previously compared our computational model against theoretical expectations and validated its behavior (Callegari et al., 2013; Tkachenko et al., 2017). Fig. 1 illustrates the different stages of our analysis. We used a spherical grid based on the subdivision of an icosahedron, each node of which is 25e33 km apart from its neighbors. Each node of the grid has a set of biotic (food availability) and abiotic (geographical features) attributes. Food availability is measured in Net Primary Production (NPP), which is estimated based on simulated past regional temperature and precipitation levels (Timmermann and Friedrich, 2016). Geographical features include altitude (Amante and Eakins, 2018), sea level (Waelbroeck et al., 2002), ice cover (Ganopolski et al., 2010), and present-day rivers

(Made with Natural Earth., 2018). These biotic and abiotic attributes determine the carrying capacity of each node. We first use a ramp function to translate a node's NPP to its carrying capacity K (see Material and Methods), and then modify K according to the geographical features. For example, individuals cannot live in icecovered or too elevated regions, whereas rivers and sea shores increase the carrying capacity of their nodes. Simulations start with a population of N ¼ 140 individuals in Southern Africa at 85 ka (“burn-in” phase), largely coinciding with the second phase of dispersal of Homo sapiens from Africa (Groucutt et al., 2018). At every time step agents can perform different actions such as moving, mating and reproducing. Movement is influenced by the environment: the direction in which to move is a random choice weighted by altitude and carrying capacity of the neighboring nodes. The agents' choices of movement are independent from each other. Tables S1 and S2 describe environmental and agents' configurable attributes, respectively. In addition to attributes of grid nodes, i.e. NPP and geographical features, we analyzed the effect of the movement probability of individuals (pm ), and population birth rates (b0 ) on survival and dispersal (see Table S2). See Materials and Methods for how birth and death probabilities are calculated. We modeled dispersals at four levels of complexity (Fig. 1): a “uniform earth” scenario where individuals move on a featureless earth in any direction with equal probability; a “uniform continents” scenario, where individuals move with equal probability on landmasses in the shape of continents; a “constant NPP” scenario, where individuals live under realistic but time-invariant topographic and NPP conditions; and a “dynamic NPP” scenario, where individuals live under realistic time-evolving topographic and NPP conditions (Waelbroeck et al., 2002; Ganopolski et al., 2010; Timmermann and Friedrich, 2016). Fully analyzing the simulation outcomes as a function of parameter settings is practically infeasible because simulations are computationally expensive: each 85 000-year simulation takes 2e5 h depending on parameter settings. Moreover, because of the intrinsic stochasticity of agent-based models, which also limits parallelization, several replicates per parameter setting are required. Thus, analyzing the effect of each parameter and its interactions with other parameters requires tens of thousands of simulation runs, which can take several months. We tackle these problems using a rarely-benefited technique in this field: machine learning (Hastie et al., 2009; Jo et al., 2017). We construct mathematical surrogate-models of the simulations, which can estimate the computational model's behavior to a high degree of accuracy, and in a short time. The surrogate-models are tested against various subsets of the data to make sure that they identify general patterns of data, regardless of parameter values. We then use the surrogate-models to perform a sensitivity analysis (see Materials and Methods), which quantifies the effect of each parameter and of parameter interactions on model outcomes. Predicting the simulations' behavior with machine learning can also inform us to what degree model parameters determine its output and how much stochasticity there is. Using this approach we address two questions: a) does a local population survive under a given parameter setting, and b) if yes, how long does it take for that population to arrive in a given geographical region (Table S4). We construct decision trees to reveal the complex network of interactions between parameters (Fig. 2), and how each parameter can increase or decrease survival likelihoods or arrival times at specified geographic locations.

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

3

Fig. 1. Study design. The figure illustrates how different simulations were combined to inform us about drivers of Late Pleistocene human extinctions and dispersals. First, we ran simulations on grids in four different simulation scenarios. Each world varies in the ‘node parameters’ (see Tables S1 and 1). Simulations also vary in population parameters (see Tables S1 and 1). Simulations determine population extinctions and arrival times. We use machine learning to understand the relationships between this large parameter space and simulation outcomes, which informs us about the predictability of simulation outcomes, helps us perform sensitivity analysis, and allows us determine parameter value combinations that result in similar arrival times.

Fig. 2. Parameter values that control population survival in the uniform earth scenario. The decision tree represents the outcomes of an ensemble of 24 000 simulations. The tree is structured according to the importance of a given parameter for population survival in the simulations (top to bottom of the tree hierarchy), and according to the parameter's threshold value above/below which a majority of simulations results in population extinction versus survival (left versus right branches). The top box comprises all simulations (24 000 ¼ 100%). Each box comprises the following information: line 1: the percentage of simulations according to the parameter threshold criterion defined in the parent box; box color: the percentage of simulations that lead to extinction (orange) versus survival (blue) under local parameter settings; line 2: the parameter threshold criterion for the two child boxes. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

2. Results 2.1. Drivers of population survival We first trained machine learning algorithms to predict

survivals from population parameters. The algorithms made predictions with accuracies and specificities ranging between 97 and 99% (see the supplementary Materials and Methods). Then, we used the machine learning models to predict population survivals for tens of thousands of random parameter value combinations and

4

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

analyzed the sensitivity of model outcomes to changes in parameter values. The results (Fig. 3 and Table S3) show that birth rate b0 is the single most important parameter determining population survivals. After that, parameter interactions determine population survivals more than any single parameter. Interactions occur between parameters when their combined effects on model outcomes are not additive: the effect of one parameter varies as a lez and Cox, function of other parameters (Berrington de Gonza 2007). Parameter interactions account for 18%, 19%, 10%, and 5% of variability in survivals in the four simulation scenarios, from the simplest to the most complex scenario, respectively. Even in the simple uniform earth simulations, population survivals depend on complex parameter interactions (Fig. 2).

2.2. Drivers of population dispersals The following changes in each parameter increase survival probabilities: higher b0, lower movement probability pm (Table S3 e total effects), and higher NPP (global NPP in the uniform earth and uniform continents, or NPPmax e Table 1) (Fig. 2 and Fig. S4). Changes to Kmax (maximum carrying capacity) have little effect on

Fig. 3. Parameter importances for population survivals. The figure shows parameter importances in our four modeling scenarios (color legend). X-axis shows model parameters and y-axis shows the first order Sobol indices on the effect of a parameter on population survivals. First order effects show how much each parameter accounts for population survivals without taking into account interactions with other parameters. Effect of interactions is the last point on the x-axis. For exact numbers and total effects, which show parameter importances when taking interactions into account, see Table S3. Not all models have the same number of parameters. This is evident in the number of bars. Error bars show 95% confidence intervals. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

survival probabilities (Fig. S4). Kmax has a small range, and that could explain why we do not see it affecting survivals. We chose the current range because values below Kmax ¼ 40 (0.052 individuals per km2 ) lead to many extinctions, and values above Kmax ¼ 50 (0.065 individuals per km2 ) are too far from empirical estimates (Zahid et al., 2016). Survival probabilities increase at higher b0 because it leads to higher birth to death probabilities when population size is below the carrying capacity (Fig. S2). With an increased pm , a population quickly spreads to a larger area, reducing the mating chance of each individual, and hence reducing the survival probability. Higher global NPP increases survival probability by increasing the carrying capacity of a region. See Table 1 for a biological interpretation of the parameter ranges. Population dispersals in the two simpler scenarios are not affected as much by parameter interactions as population survivals (Fig. 4 vs Fig. 3). This means that each parameter's effect on arrival times in the two simpler scenarios is more or less constant across all values of other parameters, and the values of other parameters do not change a parameter's contribution to arrival times. In the constant NPP and dynamic NPP models, parameter interactions can account for a large variance of arrival times (Fig. 4). This shows how a model's behavior changes, and becomes nonlinear, after introducing environmental heterogeneity. b0 is still the largest determinant of arrival times, even more than movement probability pm. The only exception is dispersals from the Middle East to Southeast Asia, where pm is more important. Global NPP and pm have similar magnitude of effects in determining arrival times in the uniform earth and uniform continents models. The effect of NPPmax and Kmax , however is small in the constant NPP and dynamic NPP models (Figs. S1 and 4). pm is expectedly a major determinant of arrival times in almost all models (Fig. 4). It has to be high for a population to disperse quickly, but not too high, otherwise they die out (Fig. S4c). b0 and NPP are relevant because they affect population size, and a larger population has more chance of successfully dispersing to new lands. Restricting population dispersals to landmasses (the uniform continents scenario) does not qualitatively change parameter importances for dispersal (Figs. 3 and 4). Adding geographical features alone or geographical features and fluctuating climate introduces more stochasticity into simulation outcomes. Specifically, arrival times become mostly unpredictable in the constant NPP and dynamic NPP scenarios: model parameters alone can poorly explain model outcomes because landscape and/or climate features make arrival times stochastic (see Materials and Methods and Table S4). We asked whether the Sahara is the main source of stochasticity, making arrival times unpredictable. To that end, we constructed machine learning models to predict arrival times from the Middle East to other regions out of Africa. In the constant NPP scenario, the new starting region only helped predictions to the Arabian Peninsula and Southeast Asia, but predictions are still poor for other regions (Table S5). In the dynamic NPP scenario, arrival times from the Middle East are even less predictable (see Materials and

Table 1 Simulation scenario parameters and their ranges. The range of b0 translates 2.7 to 9.45 infants per life time of a female, considering fertile ages of 18e45 years, which encompasses the estimated value of 4.11 for hunter-gatherer and agricultural populations (Zahid et al., 2016). Each individual moves on average 1.5 km per year for the lowest pm , and 6 km per year for the highest one. A Kmax of 40 and 50 translates to an average of 0.052 and 0.065 individuals per km2, respectively, as Kmax applies to each node of the world grid which is on average 770 km2 . Scenario

Uniform earth Uniform Continents Constant NPP Dynamic NPP

Environmental Boundary conditions

Population parameters ranges

Topography

Climate/NPP

global NPP

b0

pm

Kmax

NPPmax

2D 2D 3D 3D

Uniform Uniform Time-constant Time-evolving

0.3e0.6 0.3e0.6 NA NA

0.10e0.35 0.10e0.35 0.10e0.35 0.10e0.35

0.05e0.2 0.05e0.2 0.05e0.2 0.05e0.2

45 45 40e50 40e50

1.2 1.2 0.8e1.2 0.8e1.2

land land land land

only þ sea þ sea þ sea

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

5

Fig. 4. Parameter importances for arrival times. The figure shows parameter importances in our four modeling scenarios: a. the uniform earth, where arrival times to a single region located in the Horn of Africa is analyzed because arrival times are equal in all directions; b. the uniform continents; c. the constant NPP scenario; and d. the dynamic NPP scenario. We only included regions where surrogate machine learning models explained a minimum of 90% of the variance of simulation output. In d, arrivals to the Arabian Peninsula and Southeast Asia are from the Middle East. In all panels, the x-axis shows model parameters and the y-axis shows the first order Sobol indices on the effect of a parameter on arrival times to the sampling regions. First order effects show how much each parameter accounts for arrival time without taking into account interactions with other parameters. Effect of interactions is the last point on the x-axis. For exact numbers and total effects, which show parameter importances when taking interactions into account, see the electronic supplementary material.

Methods and Table S5). Arrival times in both constant NPP and dynamic NPP scenarios have large variances, and therefore, even when a meta-model manages to explain more than 90% of the variance of actual arrival times, its predictions are still different from the simulated arrival times by several thousand years (Table S4). Only when the variance of actual arrival times is lower than 2 000 years, are the predictions accurate within a few hundred years. Adding more simulations or increasing the number of features by adding second and third degree polynomials, does not often help the predictions (Table S6). This unpredictability is inherent to the simulations. Adding more complexity to the simulations does not necessarily lead to more stochasticity and unpredictability. Machine learning models can better predict arrival times in the dynamic NPP scenario than the constant NPP scenario, where climate is fixed at 85 ka. This can happen because climate changes can block and open corridors. When a corridor is blocked, populations with different parameter settings have little advantage over one another, as they all get stuck behind the corridor. After the corridor opens, all populations start from this new starting point to disperse. This makes it easier to predict arrival times to different regions because there are fewer random elements along the way, as starting points keep resetting. This phenomenon is also evident from comparing the variance of arrival times between the constant NPP and dynamic NPP scenarios. In the constant NPP simulations, populations arrive in Northern South America with a Coefficient of Variation (CV) of 0.26 whereas populations in the dynamic NPP models arrive with a CV of 0.02 (Table S4 and Fig. 5). As another example, variance of arrival times

in Middle East can be explained by up to 93% in the dynamic NPP scenario, but only 47% explained in the constant NPP scenario. Here again, the CV of arrival times is smaller in the dynamic NPP scenario (0.83 vs 0.57 - Table S4). Note the different distributions of arrival times in the constant NPP and dynamic NPP scenarios (Fig. 5). In the dynamic NPP scenario, arrivals occur mostly in short time spans and there is almost no arrivals events between these clusters. In the constant NPP scenario, arrival time distributions are skewed where there are more or less arrivals during most of the simulation time. This is another observation that suggests the gatekeeper effects of changing climatic conditions. The other noteworthy difference between the two distributions is that climate change slows arrivals to most of the sampling regions. This is also evident from the mean and median arrival times in the dynamic NPP and the constant NPP scenarios (Table S4). At the same time, stochasticities induced by climate change decrease successful dispersals to the Americas. Only 1.5% (73 of 4 705) of simulations in the dynamic NPP arrived in Patagonia, in contrast to the 11.6% (544 of 4 705) of the simulations in the constant NPP. 2.3. Equifinality To assess whether there are certain parameter values that lead to similar arrival times (equifinality), we calculated the fraction of parameter space that leads to similar arrival times (within 1 000 years) to some sampling regions in the dynamic NPP scenario (Fig. S5). We predicted population survivals of 60 000 parameter

6

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

Fig. 5. Distribution of arrival times to eight sampling regions in the constant NPP and dynamic NPP scenarios. The map shows the carrying capacity at 85 ka (warmer colors denote higher carrying capacity), i.e. the time at which constant NPP simulations were performed. Each histogram on the map shows the distribution of arrival times to a region (histograms' titles). X-axes show arrival times in ka, and y-axes are normalized frequencies in percentage. Each histogram shows distributions in two modeling scenarios: in black simulations in the dynamic NPP scenario, and in red simulations in the constant NPP scenario. Note multi-modal histograms in the dynamic NPP scenario. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

settings, and then we determined the arrival times of survived populations. The peak of arrival time distributions show that up to more than 30% of parameter settings can lead to similar arrival times. The parameter settings that lead to similar arrival times can be quite different from one another, as different as 92% of the maximum possible difference (Fig. S6). Our estimated arrival times in the Iberian Peninsula, depending on parameter value combinations of 60 000 simulations, are between 831 and 73 831 years ago. The distribution of the arrival times (Fig. S5d), however, shows three distinct peaks, one of which at around 55e35 ka largely overlaps with the estimates of an archaeological study of first arrival times to the Iberian Peninsula at s-Sa nchez et al., 2019; de la Pen ~ a, 2019). around 45 ka (Corte We obtained all the different parameter settings that result in similar arrival times in the Iberian Peninsula with a maximum of 1 000 years difference within the 50e40 ka interval (Fig. S6). Such parameter settings can be as different as up to 66% of the maximum possible difference in our simulations. For example, a population A with a b0 ¼ 0:35, NPPmax ¼ 0:84, Kmax ¼ 40:32, and pm ¼ 0:07, can arrive in the Iberian Peninsula at about 44ka, similar to a population B with the following different parameter settings: b0 ¼ 0:32, NPPmax ¼ 1:17, Kmax ¼ 49:48, and pm ¼ 0:19. Such parameter differences translate to the following biological differences between the two populations: population A has on average 9.45 infants per female life time, whereas population B has 8.64 infants. Each individual in A moves on average 2.1 km per year, but individuals in B have substantially higher average mobility (5.7 km per year). Population densities are 52.4 and 64.3 individuals per 1 000 km2 for populations A and B, respectively. Between-population differences in NPPmax and Kmax indicate that population B is more efficient in exploiting high-NPP environments than population A.

3. Discussion There has been increasing awareness on the importance of thoroughly documenting, validating and testing modeling frameworks, and our study follows these incentives (Grimm et al., 2006; Stillman et al., 2015; Schulze et al., 2017). Our approach is novel in several aspects. It uses a global large-scale ABM with different levels of model complexity, rather than a stepping-stone model (Kimura, 1953). Using ABMs in addition to mathematical models (e.g. the commonly used reaction-diffusion models (Young and ndez, 1999; Davison et al., 2006; Bettinger, 1995; Fort and Me Ackland et al., 2007; Patterson et al., 2010; Timmermann and Friedrich, 2016) e see (Tkachenko et al., 2017) for a comparison of our ABM model with theoretical predictions) has several advantages: they can model emergent properties of a system, which result from interactions among individuals and are inherently unpredictable (Mitchell, 2009); they can model discrete phenomena; they can incorporate stochasticity; and they can have a large number of heterogeneous individuals interacting with one another in a heterogeneous environment (Macal and North, 2005; Bazghandi, 2012). As a result, even seemingly simple systems can show complex properties, such as the complex ramifications of parameter interactions to determine population survivals and extinctions in the uniform earth scenario (Fig. 2). Our ABM model allows testing various hypotheses about the causes underlying human dispersals. For example, a higher NPP can imply better land use and hunting strategies/tools. NPP values can also be used to model prey species distribution. Increased movement probability can represent a higher adaptive capacity for home displacement (Rolland, 2010). NPPwater can stand for how well humans could have taken advantage of fresh water resources, and how much regional vegetation improves in presence of fresh water (Parton et al., 2015). Other model parameters such as interbirth

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

interval, minimum and maximum age of pregnancy, population reproduction/death rate, and how well individuals can survive in high altitudes can be used to test more hypotheses, e.g. about the influence of reproductive behavior and physiology on dispersals. The next novelty of our study design is the use of machine learning to analyze the simulations' sensitivity to input parameters, and to better understand the complex outputs of the ABM. Using machine learning to analyze the simulations has several advantages. First, it helps reduce computational costs of testing different parameter value combinations, which lifts practical limitations on parameter space exploration. Second, it can limit the required number of simulation replicates. Agent-based simulations are stochastic, and often need large numbers of replicates before one can conclude a pattern from their outputs. However, if a model behaves predictably when we change its parameter values, we can conclude that adding more replicates would not considerably change the results, and the single instances of simulations with varying parameter values are enough. Using cross-validation helps this aim by ensuring that we are not over-interpreting (over-fitting) the model outputs and that the patterns can be generalized to new data. Third, it can reveal when output variability can be explained by the model parameters, and when other sources such as landscape features and climate change are affecting the results. Hence, it shows parameter importances in historical perspective, i.e., the relative importance of factors affecting dispersals is not just a global static property of the system, but a function of space and time. This allowed us to determine that arrival times in the most complex scenarios are results of stochastic events that cannot be predicted from model parameters (“historical contingency”). Fourth, being able to compare large combinations of parameter values, it shows that no single “optimum” combination of parameter values best fits given arrival times. In archaeology this is known as the “equifinality problem” (Premo, 2010): different combinations of causes/processes result in similar outcomes/patterns. Sensitivity analysis of the simulation outputs showed various amounts of parameter interactions. Interactions account for survival variation more in the simpler scenarios (interactions account for 18% and 19% of the variation respectively for the uniform earth and uniform continents scenarios) than the complex ones (10% and 5% respectively for the constant NPP and dynamic NPP scenarios Fig. 3). However, in the simple scenarios, interactions have small effects on arrival times (4% - Fig. 4a and b). In the two complex scenarios, depending on the region, interactions account for a smaller or larger variation in arrival times. Interactions play small roles in arrival times in the Horn of Africa from southern Africa and in the Southeast Asia from the middle East in the constant NPP scenario. Similarly, interactions have small effects on arrivals in the Horn of Africa and Patagonia in the dynamic NPP scenario. We can deduce that the paths to these regions are more or less smooth, similar to the simpler scenarios, and thus interactions account for small variations in arrival times. The path to Patagonia in the dynamic NPP model gets blocked by climatic conditions, and only the final sections of the path in the Americas matters for parameter effects. Assessing the predictability of simulation outputs showed that climate and geographical features are the most important determinants of arrival times, more than individual/population characteristics, thus largely confirming the insights gained in a previous study (Timmermann and Friedrich, 2016). In a heterogeneous environment (constant NPP), a population's crossing of corridors, such as the Sahara, depends on stochastic behavior of individuals, and less on the population parameters. Therefore, arrival times in such simulations are not predictable. Adding a fluctuating climate has the unexpected effect of making dispersals more predictable. This occurs because climate fluctuations

7

completely block important corridors. Thus, all populations, regardless of their properties, get stuck behind these corridors. After the climate opens a corridor, different populations have a fresh and equal start for dispersing. This makes arrival times more a function of climate changes, but also predictable. Inferring human dispersal scenarios from empirical data remains a matter of debate, suggesting single or multiple dispersal events in ranges between 200 ka to 50 ka (Reyes-Centeno et al., 2015; Hershkovitz et al., 2018; Masoj c et al., 2019). Genetic data hint to dispersals from Africa between 130 and 45 kya (Scally and Durbin, 2012; Fernandes et al., 2012; Fu et al., 2013; Groucutt et al., 2018), some archaeological data suggest 125 kya (Armitage et al., 2011), whereas others suggest the first dispersals occurred only as early as 60e50 kya (Mellars et al., 2013). Similarly, first dispersal routes are still debated (Bailey and King, 2011; Petraglia et al., 2012). The fossil record and genetic inferences both have limitations. The fossil record has sampling bias and cannot determine whether a specimen is the first evidence of a lineage or whether the lineage has gone extinct. Genetic inference models, on the other hand, have assumptions that are not necessarily realistic, and relaxing them can lead to different results (Chikhi et al., 2010). The fossil record suggests that modern humans were in the Levant and the Arabian Peninsula 130e80 ka (Grun and Stringer, 1991; Grun et al., 2005; Rose et al., 2011; Reyes-Centeno et al., 2015; Groucutt et al., 2018; Harvati et al., 2019), and they have likely reoccupied the Levant at 55 ka (Hershkovitz et al., 2015). The fossil record in South and Central Asia is sparse, but combined paleoanthropological, archaeological and genetic evidence speaks for multiple-dispersals starting at 100 k or even earlier (Bae et al., 2017). First occupation of Western Europe and Siberia was at about 45 ka (Fu et al., 2014; Benazzi et al., 2011). No single optimum parameter setting matches these empirical data. This is for three reasons. First, there is a strong dependence of parameters on one another (parameter interactions) in determining the fate of a populations (Figs. 3 and 4). Second, many parameter settings, even when they are very different, as much as 92% of the maximum difference, can lead to similar outcomes (equifinality e Fig. S6). Third, even similar parameter settings can result in largely different outcomes because of the immense stochasticity (historical contingency). This urges any model that aims to suggest sensible parameter values about human arrival times to test a range of values. Reporting a single optimum parameter value combination that results in a model behavior similar to empirical data can be misleading. We note again that the aim of our model was not to fit empirical estimates. It would be possible to choose model parameters such that any desired empirical estimates are replicated. However, as the results show, huge parts of parameter space can result in similar arrival times and similar parameter settings can lead to diversed arrival times. Further empirical evidence about human dispersals and population dynamics can make the potential parameter space smaller, thus reduce the uncertainty in computational models. For example, more certainty about the first successful dispersal wave out of Africa is an important information limiting the parameter space. In conclusion, we have validated and analyzed a complex computational model of human dispersals. We showed that in simple landscapes, arrival times are a function of the model parameters and there are small stochasticities in model behavior. However, even in the simple landscape settings, parameter interactions are major determinants of survivals. Once we add geographical features and climate fluctuations, stochasticity increases so much that model behavior becomes poorly predictable. This limit in predictability suggests that historical contingency played an important role in the human dispersals from Africa. Even

8

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

when a model behaves predictably, contrasting parameter settings can result in similar arrival times, which shows the importance of model validation before drawing conclusions about key parameter values. The framework presented here can further be used to evaluate alternative hypotheses about human dispersals by exploring more model parameters that we have kept constant in this study. It can also help answer if and why modern humans outcompeted Neanderthals by modeling two species with various parameter value combinations and letting them compete for resources over time and space. 4. Methods 4.1. Calculating birth and death probabilities We determined birth pb and death pd probabilities for a single individual as pb ¼ b0 , and pd ¼ d0 þ ðb0  d0 Þ  N= K. Here, N is population size, K is carrying capacity, b0 and d0 are the birth and death probabilities for population size zero. 4.2. Determining a region's carrying capacity A ramp function is used to determine a cell's carrying capacity: if NPP < NPPmax , K ¼ NPP  Kmax =NPPmax ; otherwise K ¼ Kmax . See Table S2 for description of the parameters. 4.3. Sobol sensitivity analysis We analyzed the sensitivity of machine learning model outputs to variability in model inputs using the Sobol sensitivity method (Sobol, 2001; Saltelli et al., 2010). Briefly, sensitivity analysis measures the uncertainty of model outputs that can be accounted for by changes in model inputs. We used the Sobol algorithm implemented in the Python package SALib (Herman and Usher, 2017). We followed the following procedure for performing sensitivity analysis: first, we specified the lower and upper bounds of each parameter which were the same as the minimum and maximum values used for each parameter for the simulations. Next, we produced sample parameter values using the saltelli. sample function in the SALib package which produces a Sobol sequence of parameter value combinations. We used the following parameter values for all sensitivity analysis: N ¼ 3\thinspace000, where N is number of generated samples, and calc second order ¼ True, to calculate higher-order interactions. N ¼ 3\thinspace000 and calc second order ¼ True result in to 24 000, 24 000, 30 000, and 30 000 parameter settings for the uniform earth, uniform continents, constant NPP, and dynamic NPP scenarios, respectively, whose behavior were predicted with machine learning surrogate-models. Note that the simulation model has more parameters (Tables S1 and S2) than the ones we test here. We kept those parameters constant because running the model is computationally too expensive to test the effect of all parameters (see the model's manual at github. com/uzh/QHG for a list of all its parameters). 4.4. Machine learning models We constructed all the machine learning models using Python's Scikit-learn library (Pedregosa et al., 2011). For all the classification and regression estimations, we used the following algorithms: Knearest neighbors (KNN), Logistic and Ridge regression, random forests, and gradient boosting regression trees. KNN, one of the simplest machine learning algorithms, is a distance-based algorithm: it predicts the outcome new data based on their distance from the training samples. KNN is an easy to

understand and fast algorithm. If a problem is a classification one, KNN uses a majority voting system to decide the category of the new data. If it is a regression problem, it takes the mean of the training sample near the new data. Logistic regression is a classification algorithm. It uses a sigmoid function to assign probabilities to each new data point and categorize them accordingly. Ridge regression is similar to a simple linear regression, except coefficients of the model are minimized to be as close to zero as possible. This regularization helps adjust the importance of different features and avoid over-fitting. Random forests and gradient boosting algorithms are ensemble methods of decision trees. Decision trees often over-fit the data, i.e. have high variance. Ensembles of tree help use the power of many decision trees and reduce variance. A random forest is a collection of decision trees where each tree has small random differences from other ones. Each tree makes good predictions only on a fraction of training data. A random forest then predicts a new data point by asking all the trees about it. It uses the majority vote in case of classification and the mean in case of regression to make a prediction. Gradient boosting also uses many decision trees to make predictions. But it takes a different approach in building those trees. Each new tree, instead of being random, tries to minimize the error of other trees built before it. See the supplementary methods for the range of parameters of machine learning models we tested, and the details of training the models. The complete list of simulation parameters and outputs are in the electronic supplementary tables. 4.4.1. Predicting the behavior of the uniform earth simulations The model consists of a sphere with no geographical features (all cells on the sphere are habitable and have equal NPP). Simulations start with a populations of 140 individuals in a region consisting of seven neighboring cells. Since populations disperse homogeneously in all directions, we used a single sampling region equivalent to Horn of Africa with a great-circle distance of 100 km to determine the time at which populations arrive at this region (Table S7). 4.4.2. Predicting the behavior of the uniform-continents simulations This model is one level more complex than the previous one, where individuals are limited to areas in the shape of continents. This limitation isolates the effect of the shape of landmasses on human dispersal across continents. Populations still disperse homogeneously in all directions, except where blocked by water. 4.4.3. Predicting the behavior of the constant NPP simulations In this more complex model, there are realistic geographical features, such as mountains and rivers, but they are fixed in time (85 ka, Fig. 5). The geographical features prevent populations from dispersing with equal probabilities in all directions and favors certain dispersal paths. We record arrival times to the same 13 regions as in the uniform continents scenario (Fig. S3 and Table S7). 4.4.4. Predicting the behavior of the dynamic NPP simulations Our most realistic simulations include realistic geographical features and a changing climate. Climate changes every 1 000 years, and affects land NPP (Timmermann and Friedrich, 2016) values, ice sheets (Ganopolski et al., 2010) and sea level (Waelbroeck et al., 2002). Populations start in Southern Africa at 85 ka. The changes in topography/NPP create and remove obstacles for population dispersals. This allows us to evaluate climate's effect on population dispersals. Varying simulation parameters are the same as the constant NPP scenario.

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

CRediT authorship contribution statement Ali R. Vahdati: Conceptualization, Methodology, Investigation, Validation, Formal analysis, Writing - original draft, Writing - review & editing. John David Weissmann: Software, Methodology, Investigation, Writing - review & editing, Data curation. Axel Timmermann: Conceptualization, Resources, Writing - review &  n: Writing - review & editing, Data curation. Marcia S. Ponce de Leo editing. Christoph P.E. Zollikofer: Conceptualization, Project administration, Supervision, Funding acquisition, Writing - review & editing. Acknowledgements This research was supported by Swiss Platform for Advanced Scientific Computing (PASC), grant F-74105-03-01. A.T. was supported through the Institute for Basic Science (project code IBSR028-D1). ARV thanks Hamed Khakzad for valuable discussions about machine learning models. In memory of George Lake, who co-initiated the PASC project. Appendix A. Supplementary data Supplementary data to this article can be found online at https://doi.org/10.1016/j.quascirev.2019.105867. References Abbate, E., Sagri, M., 2012. Early to Middle Pleistocene Homo dispersals from Africa to Eurasia: geological, climatic and environmental constraints. Quat. Int. 267, 3e19. Ackland, G.J., Signitzer, M., Stratford, K., Cohen, M.H., 2007. Cultural hitchhiking on the wave of advance of beneficial technologies. Proc. Natl. Acad. Sci. U. S. A. 104, 8714e8719. Amante, C., Eakins, B.W., 2018. Arc-Minute Global Relief Model: Procedures, Data Sources and Analysis. Armitage, S.J., Jasim, S.A., Marks, A.E., Parker, A.G., Usik, V.I., Uerpmann, H.P., 2011. The Southern route ”Out of Africa”: evidence for an early expansion of modern humans into Arabia. Science 331, 453e456. Bae, C.J., Douka, K., Petraglia, M.D., 2017. On the origin of modern humans: Asian perspectives. Science 358, eaai9067. Bailey, G.N., King, G.C.P., 2011. Dynamic landscapes and human dispersal patterns: tectonics, coastlines, and the reconstruction of human habitats. Quat. Sci. Rev. 30, 1533e1553. Bazghandi, A., 2012. Techniques, advantages and problems of agent based modeling for traffic simulation. IJCSI Int. J. Comput. Sci. 9, 115e119. Benazzi, S., Douka, K., Fornai, C., Bauer, C.C., Kullmer, O., Svoboda, J., Pap, I., Mallegni, F., Bayle, P., Coquerelle, M., Condemi, S., Ronchitelli, A., Harvati, K., Weber, G.W., 2011. Early dispersal of modern humans in Europe and implications for Neanderthal behaviour. Nature 479, 525eU249. lez, A., Cox, D.R., 2007. Interpretation of interaction: a review. Berrington de Gonza Ann. Appl. Stat. 1, 371e385. Callegari, S., Weissmann, J.D., Tkachenko, N., Petersen, W.P., Lake, G., Ponce de n, M.S., Zollikofer, C.P.E., 2013. An agent-based model of human dispersals at Leo a global scale. Adv. Complex Syst. 16, 1350023. Chikhi, L., Sousa, V.C., Luisi, P., Goossens, B., Beaumont, M.A., 2010. The confounding effects of population structure, genetic diversity and the sampling scheme on the detection and quantification of population size changes. Genetics 186, 983e995. Clarkson, C., Jacobs, Z., Marwick, B., Fullagar, R., Wallis, L., Smith, M., Roberts, R.G., Hayes, E., Lowe, K., Carah, X., Florin, S.A., McNeil, J., Cox, D., Arnold, L.J., Hua, Q., Huntley, J., Brand, H.E.A., Manne, T., Fairbairn, A., Shulmeister, J., Lyle, L., Salinas, M., Page, M., Connell, K., Park, G., Norman, K., Murphy, T., Pardoe, C., 2017. Human occupation of northern Australia by 65,000 years ago. Nature 547, 306e310. s-S nez-Espejo, F.J., Simo n-Vallejo, M.D., Stringer, C., Lozano Corte anchez, M., Jime ez, J.L., Odriozola, C.P., RiquelmeFrancisco, M.C., García-Alix, A., Vera Pela ldez, R., Maestro Gonza lez, A., Ohkouchi, N., MoralesCantal, J.A., Parrilla Gira ~ iz, A., 2019. An early Aurignacian arrival in southwestern Europe. Nat. Ecol. Mun Evolut. 3, 207e212. Dannemann, M., Racimo, F., 2018. Something old, something borrowed: admixture and adaptation in human evolution. Curr. Opin. Genet. Dev. 53, 1e8. Davison, K., Dolukhanov, P., Sarson, G.R., Shukurov, A., 2006. The role of waterways in the spread of the Neolithic. J. Archaeol. Sci. 33, 641e652. ~ a, P., 2019. Dating on its own cannot resolve hominin occupation patterns. de la Pen Nat. Ecol. Evolut. 3, 712e712.

9

Dennell, R., 2003. Dispersal and colonisation, long and short chronologies: how continuous is the Early Pleistocene record for hominids outside East Africa? J. Hum. Evol. 45, 421e440. Eriksson, A., Betti, L., Friend, A.D., Lycett, S.J., Singarayer, J.S., von CramonTaubadel, N., Valdes, P.J., Balloux, F., Manica, A., 2012. Late Pleistocene climate change and the global expansion of anatomically modern humans. Proc. Natl. Acad. Sci. U. S. A. 109, 16089e16094. Fernandes, V., Alshamali, F., Alves, M., Costa, M.D., Pereira, J.B., Silva, N.M., Cherni, L., Harich, N., Cerny, V., Soares, P., Richards, M.B., Pereira, L., 2012. The Arabian cradle: mitochondrial relicts of the first steps along the southern route out of Africa. Am. J. Hum. Genet. 90, 347e355. ndez, V., 1999. Time-delayed theory of the Neolithic transition in Europe. Fort, J., Me Phys. Rev. Lett. 82, 867e870. Fu, Q., Mittnik, A., Johnson, P.L.F., Bos, K., Lari, M., Bollongino, R., Sun, C., Giemsch, L., Schmitz, R., Burger, J., Ronchitelli, A.M., Martini, F., Cremonesi, R.G., Svoboda, J., €bo, S., Krause, J., 2013. Bauer, P., Caramelli, D., Castellano, S., Reich, D., P€ aa A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23, 553e559. Fu, Q., Li, H., Moorjani, P., Jay, F., Slepchenko, S.M., Bondarev, A.A., Johnson, P.L.F., Filippo, C.D., Meyer, M., Zwyns, N., Salazar-garcı, D.C., Aximu-Petri, A., Pru, K., Kuzmin, Y.V., Keates, S.G., Kosintsev, P.A., Razhev, D.I., Richards, M.P., Peristov, N.V., Pruefer, K., de Filippo, C., Meyer, M., Zwyns, N., SalazarGarcia, D.C., Kuzmin, Y.V., Keates, S.G., Kosintsev, P.A., Razhev, D.I., Richards, M.P., Peristov, N.V., Lachmann, M., Douka, K., Higham, T.F.G., Slatkin, M., Hublin, J.J., Reich, D., Kelso, J., Viola, T.B., Paeaebo, S., 2014. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445þ. Galway-Witham, J., Stringer, C., 2018. How did Homo sapiens evolve? Science 360, 1296e1298. Ganopolski, A., Calov, R., Claussen, M., 2010. Simulation of the last glacial cycle with a coupled climate ice-sheet model of intermediate complexity. Clim. Past Discuss. 5, 2269e2309. Grimm, V., Berger, U., Bastiansen, F., Eliassen, S., Ginot, V., Giske, J., Goss-Custard, J., Grand, T., Heinz, S.K., Huse, G., Others, 2006. A standard protocol for describing individual-based and agent-based models. Ecol. Model. 198, 115e126. Groucutt, H.S., Petraglia, M.D., Bailey, G., Scerri, E.M.L., Parton, A., Clark-Balzan, L., Jennings, R.P., Lewis, L., Blinkhorn, J., Drake, N.A., Breeze, P.S., Inglis, R.H., s, M.H., Meredith-Williams, M., Boivin, N., Thomas, M.G., Scally, A., 2015. Deve Rethinking the dispersal of Homo sapiens out of Africa. Evol. Anthropol. Issues News Rev. 24, 149e164. Groucutt, H.S., Grün, R., Zalmout, I.A.S., Drake, N.A., Armitage, S.J., Candy, I., ClarkWilson, R., Louys, J., Breeze, P.S., Duval, M., Buck, L.T., Kivell, T.L., Pomeroy, E., Stephens, N.B., Stock, J.T., Stewart, M., Price, G.J., Kinsley, L., Sung, W.W., Alsharekh, A., Al-Omari, A., Zahir, M., Memesh, A.M., Abdulshakoor, A.J., AlMasari, A.M., Bahameem, A.A., Al Murayyi, K.M.S., Zahrani, B., Scerri, E.L.M., Petraglia, M.D., 2018. Homo sapiens in Arabia by 85,000 years ago. Nat. Ecol. Evolut. 2, 800e809. Grun, R., Stringer, C.B., 1991. Electron-spin-resonance dating and the evolution of modern humans. Archaeometry 33, 153e199. Grun, R., Stringer, C., McDermott, F., Nathan, R., Porat, N., Robertson, S., Taylor, L., Mortimer, G., Eggins, S., McCulloch, M., 2005. U-series and ESR analyses of bones and teeth relating to the human burials from Skhul. J. Hum. Evol. 49, 316e334. €ding, C., Bosman, A.M., Karakostis, F.A., Grün, R., Stringer, C., et al., Harvati, K., Ro 2019 Jul 10. Apidima cave fossils provide earliest evidence of homo sapiens in eurasia. Nature 571 (7766), 500e504. Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning : Data Mining, Inference, and Prediction, 2 edition. Springer, New York, NY. Herman, J., Usher, W., 2017. SALib: an open-source python library for sensitivity analysis. J. Open Source Softw. 2, 9e10. Hershkovitz, I., Marder, O., Ayalon, A., Bar-Matthews, M., Yasur, G., Boaretto, E., Caracuta, V., Alex, B., Frumkin, A., Goder-Goldberger, M., Gunz, P., Holloway, R.L., Latimer, B., Lavi, R., Matthews, A., Slon, V., Mayer, D.B.Y., Berna, F., Bar-Oz, G., Yeshurun, R., May, H., Hans, M.G., Weber, G.W., Barzilai, O., 2015. Levantine cranium from Manot Cave (Israel) foreshadows the first European modern humans. Nature 520, 216eU178. Hershkovitz, I., Weber, G.W., Quam, R., Duval, M., Grün, R., Kinsley, L., Ayalon, A., n-Torres, M., Bar-Matthews, M., Valladas, H., Mercier, N., Arsuaga, J.L., Martino s, L., Sarig, R., May, H., Bermúdez de Castro, J.M., Fornai, C., Martín-France Krenn, V.A., Slon, V., Rodríguez, L., García, R., Lorenzo, C., Carretero, J.M., Frumkin, A., Shahack-Gross, R., Bar-Yosef Mayer, D.E., Cui, Y., Wu, X., Peled, N., Groman-Yaroslavski, I., Weissbrod, L., Yeshurun, R., Tsatskin, A., Zaidner, Y., Weinstein-Evron, M., 2018. The earliest modern humans outside Africa. Science 359, 456e459. €lzchen, E., Hertler, C., Timm, I., Lorig, F., 2016. Evaluation of Out of Africa hyHo potheses by means of agent-based modeling. Quat. Int. 413, 78e90. Jo, C., Cho, Y., Egger, B., 2017. A machine learning approach to live migration modeling. In: Proceedings of the 2017 Symposium on Cloud Computing SoCC'17. ACM Press, New York, New York, USA, pp. 351e364. Kimura, M., 1953. “Stepping Stone” model of population. Annu. Rep. Natl. Inst. Genet. 3, 62e63. n-Torres, M., Cai, Y.j., Xing, S., Tong, H.w., Pei, S.w., Sier, M.J., Liu, W., Martino Wu, X.h., Edwards, R.L., Cheng, H., Li, Y.y., Yang, X.x., de Castro, J.M.B., Wu, X.j., 2015. The earliest unequivocally modern humans in southern China. Nature 526, 696e699.

10

A. R. Vahdati et al. / Quaternary Science Reviews 221 (2019) 105867

Macal, C.M., North, M.J., 2005. Tutorial on agent-based modeling and simulation. In: Proceedings of the 2005 Winter Simulation Conference, pp. 2e15. Made with Natural Earth. Free Vector and Raster Map Data @ Naturalearthdata.com, 2018. , S., 2006. Modern computational approaches for analysing Marjoram, P., Tavare molecular genetic variation data. Nat. Rev. Genet. 7, 759e770. Masoj c, M., Nassr, A., Kim, J.Y., Krupa-Kurzynowska, J., Sohn, Y.K., Szmit, M., Kim, J.C., Kim, J.S., Choi, H.W., Wieczorek, M., Timmermann, A., 2019. Saharan green corridors and middle Pleistocene hominin dispersals across the Eastern Desert, Sudan. J. Hum. Evol. 130, 141e150. Mellars, P., Gori, K.C., Carr, M., Soares, P.A., Richards, M.B., 2013. Genetic and archaeological perspectives on the initial modern human colonization of southern Asia. Proc. Natl. Acad. Sci. 110, 10699e10704. Mitchell, M., 2009. Complexity: A Guided Tour, 1 edition. Oxford University Press. arXiv:1011.1669v3. Nielsen, R., Akey, J.M., Jakobsson, M., Pritchard, J.K., Tishkoff, S., Willerslev, E., 2017. Tracing the peopling of the world through genomics. Nature 541, 302e310. O'Connell, J.F., Allen, J., Williams, M.A.J., Williams, A.N., Turney, C.S.M., Spooner, N.A., Kamminga, J., Brown, G., Cooper, A., 2018. When did Homo sapiens first reach Southeast Asia and Sahul? Proc. Natl. Acad. Sci. 115, 8482e8490. Palombo, M.R., 2013. What about causal mechanisms promoting early hominin dispersal in Eurasia? A research agenda for answering a hotly debated question. Quat. Int. 295, 13e27. Parton, A., Farrant, A.R., Leng, M.J., Telfer, M.W., Groucutt, H.S., Petraglia, M.D., Parker, A.G., 2015. Alluvial fan records from southeast Arabia reveal multiple windows for human dispersal. Geology 43, 295e298. Patterson, M.A., Sarson, G.R., Sarson, H.C., Shukurov, A., 2010. Modelling the Neolithic transition in a heterogeneous environment. J. Archaeol. Sci. 37, 2929e2937. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825e2830. Petraglia, M.D., Ditchfield, P., Jones, S., Korisettar, R., Pal, J., 2012. The Toba volcanic super-eruption, environmental change, and hominin occupation history in India over the last 140,000 years. Quat. Int. 258, 119e134. Premo, L.S., 2010. Equifinality and Explanation: the Role of Agent-Based Modeling in Postpositivist Archaeology. Simulating Change: Archaeology into the Twenty-First Century. University of Utah Press, Salt Lake City, pp. 28e37. Rabett, R.J., 2018. The success of failed Homo sapiens dispersals out of Africa and into Asia. Nat. Ecol. Evolut. 2, 212e219. Reyes-Centeno, H., Hubbe, M., Hanihara, T., Stringer, C., Harvati, K., 2015. Testing modern human out-of-africa dispersal models and implications for modern human origins. J. Hum. Evol. 87, 95e106. Rolland, N., 2010. The earliest hominid dispersals beyond Subsaharan Africa: a survey of underlying causes. Quat. Int. 223e224, 54e64. Rose, J.I., Usik, V.I., Marks, A.E., Hilbert, Y.H., Galletti, C.S., Parton, A., Geiling, J.M., Cerny, V., Morley, M.W., Roberts, R.G., 2011. The Nubian complex of Dhofar, Oman: an african middle stone age industry in southern Arabia. PLoS One 6. Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., Tarantola, S., 2010. Variance based sensitivity analysis of model output. Design and estimator for

the total sensitivity index. Comput. Phys. Commun. 181, 259e270. Scally, A., Durbin, R., 2012. Revising the human mutation rate: implications for understanding human evolution. Nat. Rev. Genet. 13, 745e753. Scerri, E.M.L., Thomas, M.G., Manica, A., Gunz, P., Stock, J.T., Stringer, C., Grove, M., Groucutt, H.S., Timmermann, A., Rightmire, G.P., D'Errico, F., Tryon, C.A., Drake, N.A., Brooks, A.S., Dennell, R.W., Durbin, R., Henn, B.M., Lee-Thorp, J., DeMenocal, P., Petraglia, M.D., Thompson, J.C., Scally, A., Chikhi, L., 2018. Did our species evolve in subdivided populations across Africa, and why does it matter? Trends Ecol. Evol. 33, 582e594. Schulze, J., Müller, B., Groeneveld, J., Grimm, V., 2017. Agent-based modelling of social-ecological systems: achievements, challenges, and a way forward. J. Artif. Soc. Soc. Simul. 20. Sobol, I.M., 2001. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simulat. 55, 271e280. Stewart, J.R., Stringer, C.B., 2012. Human evolution out of Africa: the role of refugia and climate change. Science 335, 1317e1321. Stillman, R.A., Railsback, S.F., Giske, J., Berger, U., Grimm, V., 2015. Making predictions in a changing world: the benefits of individual-based ecology. Bioscience 65, 140e150. Tattersall, I., 2009. Human origins: out of africa. Proc. Natl. Acad. Sci. U. S. A. 106, 16018e16021. Timmermann, A., Friedrich, T., 2016. Late Pleistocene climate drivers of early human migration. Nature 538, 92e95. Tkachenko, N., Weissmann, J.D., Petersen, W.P., Lake, G., Zollikofer, C.P.E., Callegari, S., 2017. Individual-based modelling of population growth and diffusion in discrete time. PLoS One 12, e0176101. von Cramon-Taubadel, N., 2014. Evolutionary insights into global patterns of human cranial diversity: population history, climatic and dietary effects. J. Anthropol. Sci. 92, 43e77. Waelbroeck, C., Labeyrie, L., Michel, E., Duplessy, J.C., McManus, J.F., Lambeck, K., Balbon, E., Labracherie, M., 2002. Sea-level and deep water temperature changes derived from benthic foraminifera isotopic records. Quat. Sci. Rev. 21, 295e305. Westaway, M.C., 2019. The first hominin fleet. Nat. Ecol. Evolut. 3, 999e1000. Westaway, K.E., Louys, J., Awe, R.D., Morwood, M.J., Price, G.J., Zhao, J.x., Aubert, M., Joannes-Boyau, R., Smith, T.M., Skinner, M.M., Compton, T., Bailey, R.M., van den Bergh, G.D., de Vos, J., Pike, A.W.G., Stringer, C., Saptomo, E.W., Rizal, Y., Zaim, J., Santoso, W.D., Trihascaryo, A., Kinsley, L., Sulistyanto, B., 2017. An early modern human presence in Sumatra 73,000e63,000 years ago. Nature 548, 322e325. s, M.H., King, G.C.P., Bailey, G.N., Inglis, R.H., MeredithWinder, I.C., Deve Williams, M., 2015. Evolution and dispersal of the genus Homo : a landscape approach. J. Hum. Evol. 87, 48e65. Wolf, A.B., Akey, J.M., 2018. Outstanding questions in the study of archaic hominin admixture. PLoS Genet. 14, e1007349. Yang, M.A., Fu, Q., 2018. Insights into modern human prehistory using ancient genomes. Trends Genet. 34, 184e196. Young, D.A., Bettinger, R.L., 1995. Simulating the global human expansion in the Late Pleistocene. J. Archaeol. Sci. 22, 89e92. Zahid, H.J., Robinson, E., Kelly, R.L., 2016. Agriculture, population growth, and statistical analysis of the radiocarbon record. Proc. Natl. Acad. Sci. 113, 931e935.