Statistical Methodology 9 (2012) 55–62
Parameter estimation from a model grid: application to the Gaia RVS spectra

A. Bijaoui, A. Recio-Blanco, P. de Laverny, C. Ordenovic

University of Nice Sophia Antipolis, UMR CNRS 6202, OCA, B.P. 4229, 06304 Nice Cedex 04, France
Keywords: classification; parameter estimation; model grid; decision trees; astrophysical data analysis; astrophysical data mining

Abstract. In the framework of the ESA Gaia mission, stellar atmospheric parameters will be extracted from the millions of spectra that will be observed by Gaia RVS (Wilkinson et al. 2005) [21]. Given this large number of observed spectra, the analysis must be carried out using fast and robust automated algorithms. In this paper, we analyze the efficiency of a selection of fitting algorithms in obtaining stellar parameters for a sample of spectra. Several of these algorithms are based on the use of a decision tree, either oblique, kd or decorated. The tests were carried out using the same model grid in the same software environment, and different performance indices associated with our scientific goal were examined. The Gauss–Newton algorithm initialized by a decision tree appeared to best satisfy the performance criteria.
1. The astrophysical purpose

The ESA Gaia mission is forecast for launch in 2013. Its instruments will determine to very high accuracy the positions and proper motions of a billion stars in the Milky Way Galaxy. The Radial Velocity Spectrograph (RVS) [21], primarily dedicated to obtaining stellar radial velocities, is one of the three instruments in the Gaia complement. The millions of spectra collected with Gaia RVS will also be analyzed to determine stellar atmospheric parameters, in particular the effective temperature, the surface gravity, the mean metal content and some chemical abundances.

The goal of the present work is to determine the model S ≡ S(l, Θ) which best fits the observational vector O ≡ {O(l)}, l ∈ (1, L). Θ is the parameter vector, with components θi, i ∈ (1, I).
Fig. 1. A sample from the grid of theoretical spectra. The bottom spectrum corresponds to a theoretical spectrum close to the solar one and the other spectra to those obtained by the variation of one of the physical parameters.
Since the spectral model computation requires a large amount of CPU time and memory, it is better to prepare in advance a grid of models covering a selected range of parameters. For our application, the grid is comprised of a set of synthetic spectra computed using MARCS stellar atmospheric models [9], sampled in three parameters: the effective temperature Teff, the logarithmic surface gravity log g and the mean metallicity [M/H]. The spectra were generated using the same wavelength sampling and the same line spread function as those that will be provided by Gaia RVS. A sample of these spectra is shown in Fig. 1.

Due to the high number of spectra that will be observed, the analysis of the Gaia RVS spectra must be carried out using fast and robust automated algorithms. In this paper, we analyze how efficiently a selection of fitting algorithms derives stellar parameters for a sample of spectra. In particular, different criteria (performance indices) associated with our scientific goal will be examined.

2. The methodology

The processed material. The tests were performed using a grid of N = 1386 model spectra, each with 971 spectral elements. The three grid parameters were regularly sampled: Teff from 4500 to 7750 K in 250 K steps, log g from 1 to 5 in 0.5 steps and [M/H] from −2 to 1 in 0.25 steps. To create the sample of test spectra, for each parameter set corresponding to a grid point, a random value was added to each parameter, its absolute value being smaller than half of the sampling step. Simulated observed spectra, free of noise, were then computed for each of these parameter sets by multi-linear interpolation on the model grid. Thus, the algorithms were tested outside the grid points, while the whole parameter space was scanned (see the sketch below).

In our application, the conditional PDF p(O|S) is assumed to be Gaussian; thus, the classical method of least squares is applied. The efficiency of each algorithm strongly depends on the signal-to-noise ratio (SNR). All the tests were carried out using the same four SNR values: ∞, 1000, 100 and 10. Here, the SNR is defined as the ratio between the continuum amplitude and the standard deviation of the noise in the spectra. A large majority of the spectra to be observed by Gaia RVS are expected to have an SNR of 10 or less.
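As an illustration of this test-set construction, the following minimal sketch generates one perturbed test spectrum by multi-linear interpolation on the grid. The grid file name, the array layout and all identifiers are hypothetical, not taken from the authors' implementation:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

rng = np.random.default_rng(0)

# Hypothetical axes following the sampling quoted in the text.
teff_axis = np.arange(4500.0, 7751.0, 250.0)   # K
logg_axis = np.arange(1.0, 5.01, 0.5)
mh_axis   = np.arange(-2.0, 1.01, 0.25)

# Assumed array of MARCS synthetic spectra with shape
# (len(teff_axis), len(logg_axis), len(mh_axis), 971).
model_cube = np.load("marcs_grid.npy")          # placeholder file name

# Multi-linear interpolation on the model grid (linear is the default).
interp = RegularGridInterpolator((teff_axis, logg_axis, mh_axis), model_cube)

def perturbed_test_spectrum(t, g, m):
    """Offset a grid point by less than half a step in each parameter
    and interpolate a noise-free 'observed' spectrum at that location."""
    steps = np.array([250.0, 0.5, 0.25])
    theta = np.array([t, g, m]) + rng.uniform(-0.5, 0.5, 3) * steps
    # Clip to the grid bounds so the interpolator stays in range.
    theta = np.clip(theta, [teff_axis[0], logg_axis[0], mh_axis[0]],
                           [teff_axis[-1], logg_axis[-1], mh_axis[-1]])
    return theta, interp(theta[None, :])[0]
```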
The problem specificities. Our problem is a classical optimization problem with the following characteristics:

• The models are not given by a closed-form relation; they are known only at the grid points, so specific interpolations are needed. A regular sampling of the parameter space was used.
• The flux values of the spectra do not vary linearly with the parameters.
• The distance function may have secondary minima; the algorithm is required to provide the global minimum.

3. The tested algorithms

The following algorithms were tested under the described conditions.

The exhaustive search algorithm (ESA). The first approach consists of computing the distances between the test spectrum and each model in the grid, and then selecting the model that corresponds to the minimum distance. As this method scans the entire set of models, it is often used for small model grids, and the final estimate cannot correspond to a secondary minimum. However, the precision of the final estimation depends on the sampling of the model grid. In order to improve this precision, a correction can be applied after the determination of the model that is the nearest neighbor to the test spectrum. This correction takes into account the local derivatives about the nearest-neighbor grid point and corresponds to one step of the Gauss–Newton algorithm (see below). However, the testing carried out here showed that this correction was not sufficient to reduce the biases; it was therefore better to apply the Gauss–Newton algorithm directly (both steps are sketched together at the end of this section).

The Nelder–Mead algorithm (NMA). The Nelder–Mead algorithm [14,2] is a commonly used optimization algorithm in n-dimensional space. The method is based on the transformation of a simplex according to its vertex values. Interpolations within the grid must be carried out in order to obtain sufficient precision, taking into account the sampling steps. In our experiments, the number of iterations of the algorithm was on the order of log(N). Furthermore, there is no guarantee that NMA will converge to the absolute minimum, since the simplex may become trapped in a secondary minimum. Our experiments showed that the choice of the starting simplex was important to whether or not convergence was achieved. In the case of non-convergence, the algorithm was stopped after a given number of iteration steps (100).

Stochastic methods. Many stochastic methods have been proposed that iteratively obtain the global minimum. For example, the simulated annealing method appears to be well suited to carrying out parameter estimation [8,18]. Genetic algorithms have also been applied to stellar parameter estimation from spectra [1]. These algorithms require a large number of iterations in order to converge on a solution, without a guarantee that the convergence will be towards the global minimum. In this work, in order to improve the convergence of the Nelder–Mead algorithm, a random value was added to the parameters obtained at each iteration, leading to a simple stochastic variant. The mean absolute amplitude of the random value was progressively decreased. This improved the convergence, but the CPU time also increased by a significant factor. Consequently, this algorithm appeared not to be well suited to the tested grid.

The Gauss–Newton algorithm (GNA). The Gauss–Newton algorithm (GNA) [6] is based on a linearization around a given parameter set Θ^(0) associated with a model S0. The corrections δΘ are obtained from the relation

δΘ = (J^T J)^{-1} J^T (O − S0),    (1)

J being the Jacobian matrix [∂S(l, Θ^(0))/∂θi]. In the case of large corrections, further iterations need to be carried out through linearization around the new values. Generally, GNA converges, but there is no guarantee that the resulting values correspond to the global minimum distance. Bailer-Jones [3] implemented a GNA version on a model grid, the ILIUM algorithm, in which, for each test spectrum, the model and its derivatives are computed using a cubic spline interpolation. In the present paper, the basic Gauss–Newton algorithm was applied using a linear interpolation on the derivatives. For some test spectra the GNA did not converge and was stopped after 100 iterations.
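To make the exhaustive search and the Gauss–Newton refinement concrete, here is a minimal sketch that pairs them, solving Eq. (1) through a least-squares solve (mathematically equivalent to (J^T J)^{-1} J^T when J^T J is invertible). The finite-difference Jacobian, the flattened arrays models (N × 971 spectra) and params (N × 3 parameter sets), and the stopping tolerance are assumptions, not the authors' F90 implementation:

```python
import numpy as np

def exhaustive_search(obs, models):
    """ESA: index of the grid model at minimum Euclidean distance."""
    d2 = np.sum((models - obs) ** 2, axis=1)
    return int(np.argmin(d2))

def gauss_newton(obs, theta0, model_fn, steps, n_iter=100, tol=0.1):
    """Basic GNA: iterate Eq. (1) with a finite-difference Jacobian.
    model_fn(theta) interpolates a spectrum from the grid."""
    theta = np.asarray(theta0, dtype=float)
    steps = np.asarray(steps, dtype=float)
    for _ in range(n_iter):
        s0 = model_fn(theta)
        # Finite-difference Jacobian, one column per parameter.
        J = np.empty((s0.size, theta.size))
        for i, h in enumerate(steps):
            dt = np.zeros_like(theta)
            dt[i] = h
            J[:, i] = (model_fn(theta + dt) - s0) / h
        # Least-squares solve of J @ delta = O - S0, i.e. Eq. (1).
        delta, *_ = np.linalg.lstsq(J, obs - s0, rcond=None)
        theta = theta + delta
        if np.all(np.abs(delta) < tol * steps):  # e.g. 10% of a step
            break
    return theta

# Usage: initialize GNA at the exhaustive-search minimum, reusing the
# interpolator from the previous sketch as the model function.
# model_fn = lambda th: interp(th[None, :])[0]
# k = exhaustive_search(obs, models)
# theta = gauss_newton(obs, params[k], model_fn, steps=[250.0, 0.5, 0.25])
```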
The MATISSE algorithm. In GNA, it can be shown that the parameter corrections are obtained by a projection of the test spectra onto specific vectors associated with the derivatives of the model spectra with respect to the stellar parameters. Other optimization algorithms have been developed by considering alternative projection vectors. In particular, Tegmark et al. [19] developed an estimation algorithm based on vectors associated with data compression. Heavens et al. [11] showed that the resulting MOPED algorithm was well suited to processing large sets of spectra in short timeframes. In these algorithms, the resulting projection vectors consist of combinations of models and are derived through a selection of eigenvectors. In this context, we determined a way to derive the projection vectors directly from the model grid in order to use the information contained in the grid more efficiently. This process was implemented in the MATrix Inversion for Spectral SynthEsis (MATISSE) algorithm, in application to the automated parametrization of stellar spectra [17].

Decision trees for a fast nearest-neighbor search. Taking into account the sampling of the model grid, the optimization can be considered, in the first instance, as the identification of the nearest neighbor. Bentley et al. [4] showed that a good approximation can be achieved using a kd-tree with a fast algorithm. The kd-tree is a basic space-partitioning structure in a k-dimensional space: at each node, the parameter space is divided into two subspaces by a hyperplane. For each node, the median of the spectral flux values corresponding to a selected spectral element is computed from the model spectra in the node subset; this subset is then split by the hyperplane corresponding to the median. If N is the number of model spectra, the tree has log(N) levels. The search is carried out by scanning through the tree, comparing the test spectrum flux at each selected spectral element to the corresponding median. By construction, this procedure requires log(N) comparisons for each test spectrum. Many other tree-like structures have been introduced to improve the search for the nearest neighbor [16], and kd-trees have been applied previously in astrophysical data analysis [15].

Since the spectral length is 971, a simple kd-tree obviously cannot make use of all the information in the models. Instead of selecting a single spectral element at each node, we make an oblique decision using a projection vector at each node [20]. The determination of the optimal projection vectors then becomes the problem to be solved. We obtained good results using a difference vector (DV). As for a kd-tree, the two subsets are equally populated, taking into account the median of the projection coefficients. Let m1 and m2 be the means of the model spectra within each of the two respective subsets; the DV is defined as the vector m2 − m1. The algorithm is necessarily iterative: we start from an initial DV and determine the two new subsets, then compute the mean of each new subset and the resulting DV. If the angle between the new DV and the previous one is too large, we iterate the process until convergence is achieved (a sketch of this node-splitting iteration is given below). Each node of the tree is decorated with the means and the standard deviations of the parameters of the spectra within its associated subset. The noise in the observed spectra leads to possible misclassifications.
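A minimal sketch of the difference-vector iteration at a single node, under the same assumed models array (N × 971) as above; the cosine convergence threshold is a hypothetical choice:

```python
import numpy as np

def dv_split(models, max_iter=50, cos_tol=0.999):
    """Oblique split of a node subset by an iterated difference vector.
    Returns the DV, the median projection, and the two index subsets."""
    dv = models[1] - models[0]                 # arbitrary initial DV
    dv /= np.linalg.norm(dv)
    for _ in range(max_iter):
        proj = models @ dv
        med = np.median(proj)
        left = proj <= med                     # equally populated halves
        m1 = models[left].mean(axis=0)
        m2 = models[~left].mean(axis=0)
        new_dv = m2 - m1                       # DV = m2 - m1
        new_dv /= np.linalg.norm(new_dv)
        converged = abs(new_dv @ dv) >= cos_tol
        dv = new_dv
        if converged:                          # angle small enough: stop
            break
    proj = models @ dv
    med = np.median(proj)
    left = proj <= med
    return dv, med, np.where(left)[0], np.where(~left)[0]
```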
At each node, one has to take into account that the projection coefficient is distributed according to a Gaussian law. Since this random variable is defined on the whole real axis, in principle both branches would be chosen, and the algorithm would then have to explore the decision tree fully, which would be inefficient. We therefore replaced the Gaussian distribution by an Epanechnikov kernel [7], which corresponds to a truncated parabola. Due to this thresholding, generally only one branch is chosen, so that in the end only a few leaves are selected. After all the levels of the tree have been scanned, a subset of models from the grid is selected and their distances to the test spectrum are computed. The following algorithms each make different use of the information obtained in the exploration of the decision tree to determine the final stellar parameters.

The minimum identification from the decision tree (MIDT). After the tree is scanned, the model within the selected subset corresponding to the minimal distance is taken as the best fit. Its precision is limited by the model grid sampling.

The interpolation from the nearest points (INDT). After the determination of the environment corresponding to the nearest points, the Nadaraya–Watson interpolation formula [10] is applied (a sketch is given below). The interpolation introduces a smoothing that can lead to biases on the parameter values. It can be shown that performing an inversion of the kernel matrix reduces these biases [5]; in our experiments, however, this did not improve the parameter determination efficiency, while the CPU time increased significantly.
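The INDT step can be illustrated by a Nadaraya–Watson average of the parameters of the grid models retained by the tree search, with Epanechnikov weights on the spectral distances; the bandwidth h is a hypothetical tuning parameter, not a value quoted in the paper:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: a truncated parabola, zero for |u| >= 1."""
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def indt_estimate(obs, neigh_spectra, neigh_params, h):
    """Nadaraya-Watson estimate of the parameter vector from the grid
    models retained by the tree search (rows of neigh_spectra), whose
    parameter sets are the rows of neigh_params."""
    d = np.linalg.norm(neigh_spectra - obs, axis=1)
    w = epanechnikov(d / h)
    if w.sum() == 0.0:                 # no neighbor inside the bandwidth
        w = np.ones_like(d)            # fall back to a plain average
    return (w[:, None] * neigh_params).sum(axis=0) / w.sum()
```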
Table 1. The CPU time (in ms) estimated for each algorithm at each SNR for a single test spectrum.

SNR    ESA   NMA   GNA   MIDT  INDT  MTDT  GNDT
∞      17.4  45.3   2.8   1.6   2.7   4.2   6.5
1000   16.5  45.6   2.8   1.6   2.7   4.6   6.5
100    15.6  44.7   4.7   1.6   2.4   4.6   6.5
10     15.7  41.6  17.6   1.4   2.4   4.9   6.4
Table 2. The internal standard deviation of the effective temperature, σT (in K).

SNR     ESA    NMA    GNA    MIDT   INDT   MTDT   GNDT
∞       0.0    0.0    0.0    0.0    0.0    0.0    0.0
1000    4.8   27.1    4.3    5.0    2.9   15.8    5.7
100    39.2   81.4   30.5   43.4   23.9   79.9   34.2
10    228.7  232.0  253.5  242.0  151.1  258.6  228.5
The estimation from the local environment (MTDT). After the minimum localization, the parameters are determined from a projection on specific vectors. As for the MATISSE algorithm [17], these vectors are optimally determined from a large environment. Here, this operation was carried out iteratively until the parameter variation was less than 10% of the sampling step.

The Gauss–Newton algorithm with the decision tree (GNDT). After the minimum localization, the parameters are determined iteratively using Gauss–Newton corrections, again until the variation is less than 10% of the sampling step.

4. The performance indices

The following indices were computed for each algorithm at each SNR (their computation is sketched below).

The CPU time. The algorithms were all coded in F90 on a Dell Latitude D830 laptop, with the Windows XP platform and the Compaq Visual Fortran 6.6 compiler. The CPU time performance index estimates the time taken, in milliseconds, for each algorithm at each SNR to determine the parameters of a single test spectrum. It does not take into account any preliminary computations, such as the time required for computing the derivatives or for building the decision tree. The estimated CPU times are listed in Table 1. The use of a decision tree clearly accelerated the parameter determination. Surprisingly, GNA carried out fast determinations of the parameters at high SNR. NMA was rejected at this stage due to its excessive CPU time.

The internal standard deviation of the effective temperature, σT (see Table 2). Each test spectrum was generated with 10 realizations for each SNR; σT is the standard deviation of the derived effective temperatures over these realizations per SNR. The sampling step of the model grid in Teff is 250 K. The targeted 100 K accuracy was not achieved at SNR = 10 by any of the algorithms. INDT achieved the best results here at all SNRs, a result that is tempered by the following index.

The mean absolute difference between the measured effective temperatures and the input effective temperature, bT. The absolute difference (bias) between the effective temperature determined for each realization at a given SNR and the input effective temperature was calculated, and the mean of these differences at each SNR provides the bT values given in Table 3. A large disparity can be observed in the results: GNA and GNDT appear to be the methods that lead to the smallest biases. We note that the INDT mean bias for SNR = 10 is large compared to the other values at that SNR (excluding NMA).

The square root of the mean square difference between the measured effective temperatures and the input effective temperature, eT. The results for eT, calculated using the SNR realizations as for bT, are shown in Table 4. eT takes into account both the biases and the statistical fluctuations. The bias bT could be reduced by mapping, but our experiments showed that this reduction is balanced by an increase of the statistical fluctuations.
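As a rough illustration of the indices defined in this section (and of the 3-σ rejection used below for RT), the following sketch computes them from the 10 realizations per test spectrum; the array shapes and the aggregation over the grid are plausible assumptions rather than the authors' exact procedure:

```python
import numpy as np

def temperature_indices(t_est, t_true):
    """t_est: (n_spectra, 10) derived Teff per noise realization;
    t_true: (n_spectra,) input Teff of each test spectrum."""
    sigma_t = t_est.std(axis=1).mean()                     # sigma_T
    b_t = np.abs(t_est - t_true[:, None]).mean()           # mean bias b_T
    e_t = np.sqrt(((t_est - t_true[:, None]) ** 2).mean()) # global error e_T
    return sigma_t, b_t, e_t

def robust_mean(x, n_iter=10):
    """Iterative 3-sigma rejection, as used for the robust index R_T.
    Returns the clipped mean and the fraction of spectra retained."""
    mask = np.ones_like(x, dtype=bool)
    for _ in range(n_iter):
        m, s = x[mask].mean(), x[mask].std()
        mask = np.abs(x - m) <= 3.0 * s
    return x[mask].mean(), mask.mean()
```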
Table 3. The mean biases bT for the effective temperature (in K).

SNR     ESA    NMA    GNA    MIDT   INDT   MTDT   GNDT
∞      92.8  164.1   10.9  100.5   68.8  113.4   21.5
1000   91.1  159.6   11.0   98.9   68.8  111.3   20.0
100    77.7  151.5   15.3   84.8   66.9   94.0   18.5
10     90.5  172.1   88.0  102.8  157.6  114.4   85.2
Table 4. The global mean error eT for the effective temperature (in K).

SNR     ESA    NMA    GNA    MIDT   INDT   MTDT   GNDT
∞     127.3  303.5   40.5  139.7   97.5  226.9   51.5
1000  127.4  300.5   51.0  139.1   97.6  228.9   50.9
100   132.1  294.6   65.3  144.8  101.8  218.4   68.9
10    308.7  371.4  368.2  320.9  284.4  355.9  312.3
Table 5. The mean efficiency ET for the effective temperature.

SNR     ESA    NMA    GNA    MIDT   INDT   MTDT   GNDT
∞      4.76   8.6    0.84   5.34   3.78   4.69   1.21
1000   4.77   8.91   0.98   5.38   3.80   4.90   1.29
100    4.96   9.66   1.63   5.64   3.99   5.65   1.85
10    10.95  14.92  10.75  12.13  10.69  12.53  10.22
Table 6. The robust mean efficiency RT for the effective temperature.

SNR     ESA    NMA    GNA    MIDT   INDT   MTDT   GNDT
∞      4.03   3.48   0.27   4.09   2.64   1.26   0.66
1000   4.06   3.73   0.30   4.12   2.65   1.61   0.74
100    4.29   5.00   1.08   4.45   2.89   3.06   1.3
10    10.18  11.73   9.68  11.28   9.82  11.37   9.58
eT is the correct criterion with which to characterize the final precision. We note that at high SNRs the Gauss–Newton methods provided the best estimates, while at low SNRs the use of the decision tree improved the results.

The mean efficiency ET. Let rT(n) be the mean ratio between the square root of the quadratic error obtained for a given parameter set n and tT(n), the corresponding optimal theoretical value computed from the Fisher information on the parameter Teff for SNR = 100. rT(n) gives information on the efficiency of the algorithm for a given spectrum; its mean, ET, is the mean efficiency of the algorithm over the entire grid. Table 5 confirms the conclusions drawn from the previous table: the Gauss–Newton methods are globally the most efficient.

The robust mean efficiency RT. For a given algorithm, rT(n) may reach large values for a small percentage of the test spectra. This is due to a convergence failure connected to the non-convexity of the distance function between the test spectrum and the models: for these test spectra, the algorithm fails to provide the global minimum of the distance function. We apply an iterative 3-σ rejection algorithm to the rT set in order to remove the test spectra for which the algorithm failed to find the global minimum distance. RT is obtained after 10 iterations and can be considered a robust value of the algorithm efficiency. The results are shown in Table 6, and Table 7 gives the percentage of test spectra remaining in the calculation of the final RT values. The results for RT show that ESA and the algorithms based on the decision tree are the most effective at avoiding solutions trapped in a secondary minimum.

The algorithm choice. The Gauss–Newton algorithm initialized using the oblique decision tree (GNDT) achieves the best global performance in the determination of parameters for the sample of test spectra. In particular:
Table 7. The percentage of test spectra remaining in the calculation of RT.

SNR    ESA  NMA  GNA  MIDT  INDT  MTDT  GNDT
∞       96   86   91    95    93    80    94
1000    96   85   90    95    93    82    93
100     96   87   95    95    93    86    93
10      97   92   96    97    97    96    98
• The CPU time is independent of the SNR. For Gaia RVS the SNR will generally be low (<10), and GNDT will then be more efficient than GNA.
• The ET indicator shows that, at low SNR, the deviation due to the noise is globally close to its theoretical value: since tT is computed for SNR = 100, the noise at SNR = 10 is ten times larger, so even an optimal algorithm is expected to give ET close to 10. At high SNR, GNA would be slightly better for the model grid considered.
• The percentage values indicate that GNDT is more robust than GNA at low SNR, since only 2% of the test spectra are rejected instead of 4%.

It can be noted that, for each algorithm, the target Teff accuracy of 100 K was not achieved at the SNR which corresponds to the majority of the Gaia RVS spectra. The mean efficiency ET shows that this is not due to the algorithms: at this SNR, they generally exploit the information correctly. The failure occurs for astrophysical reasons. In general, for the model grid considered, the variation of the spectrum with the temperature is not sufficient for reaching our accuracy goal. This result must be tempered by considering how the error is distributed on the grid: the poor performances are obtained for spectra displaying faint spectral lines, which is particularly the case for stars having a low metallicity. Nevertheless, the target accuracy is reached for about a quarter of the model spectra.

5. Conclusion

The new era of large astronomical surveys has led to the continuing accumulation of vast astrophysical data sets that need to be automatically analyzed using sophisticated algorithms. The comparison of observations to stellar atmospheric models is one of the key steps. This paper focused on the nature of automated algorithms and their comparison in the case of stellar spectral analysis for the Gaia RVS application. Nevertheless, further exploration of the questions posed here is still required.

In this paper, a specific decision tree was introduced in order to (i) accelerate the search for the nearest model, and (ii) avoid entrapment in a secondary minimum by means of a complete scan of the parameter space. The decision tree is oblique, based on projections on specific vectors at each tree node. It is similar to a kd-tree for which each set of models is divided into two equal parts according to the median of the projection coefficients. The coupling of the decision tree with the classical Gauss–Newton algorithm provided a robust and efficient algorithm with which to estimate the parameters on a regularly spaced grid.

During the learning phase, an empirical deviation σk between the parameters of the associated model spectra can be computed for each tree node k. The analysis of the σk array allows identification of the regions of the parameter space that will lead to imprecise results. This is mainly due to a spectral degeneracy: two spectra corresponding to very different parameter sets can look very similar. Thus, decision trees are well suited to localizing the parameter regions where the sampling is not sufficient to distinguish between two such spectra, which suggests an interesting sampling strategy. Other sampling strategies can be implemented, such as random sampling or the use of a Latin hypercube [12].

The interest in using methods based on compressed data was mentioned above. These methods are limited by the non-linearity of the spectrum with respect to the parameters. Using diffusion maps, Lee and Wasserman [13] and Richards et al. [18] showed that it is possible to reduce this difficulty. The use of compressed data would also be possible for the algorithms tested here.
Our experiments showed that a compression rate of 95% can be applied through a principal component analysis without a significant decrease in the precision (a sketch is given below).

Whatever the algorithm, it is easy to determine from simulations the conditional PDF p(Θ̂ | Θ). This is sufficient for providing a confidence interval on the parameters determined for the test spectra.
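A minimal sketch of such a compression, assuming the same models array: build a principal-component basis from the grid and project the spectra onto it before fitting. The choice of 50 components (roughly a 95% compression of the 971 spectral elements) is an illustrative assumption:

```python
import numpy as np

def pca_compress(models, n_comp=50):
    """Build a PCA basis from the model grid and return a projector.
    Keeping ~50 of 971 spectral elements is about a 95% compression."""
    mean = models.mean(axis=0)
    # SVD of the centered grid; the rows of vt are principal components.
    _, _, vt = np.linalg.svd(models - mean, full_matrices=False)
    basis = vt[:n_comp]

    def project(spectrum):
        return basis @ (spectrum - mean)

    return project

# Usage: distances (and hence all the fitting algorithms above) can be
# computed between project(obs) and the projected grid models instead
# of between the full 971-element spectra.
```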
A deeper analysis is sometimes required in order to characterize the estimated parameters. In this case, a Bayesian inference was performed by studying the posterior PDF p(Θ | Θ̂). Its computation takes into account a prior PDF, usually considered as uniform on the model grid. This analysis is very time-consuming. The conditional posterior PDF is generally characterized by its expectation and its standard deviation. Our experiments showed that the biases were significantly reduced by this analysis, but the deviation also increased significantly.

In conclusion, the existence of such different procedures for estimating parameters leads us to examine this fundamental problem with the feeling that each algorithm would add value to the overall analysis. Hence, in the analysis of the Gaia RVS spectra, it is planned that multiple determinations of the stellar parameters will be performed using several key automated algorithms.

Acknowledgments

The authors are sincerely grateful to Dr. C. Worley for her careful review of the English in this paper. They also thank the anonymous reviewers for their constructive comments.

References

[1] C. Allende Prieto, An automated system to classify stellar spectra I, Mon. Not. R. Astron. Soc. 339 (2003) 1111–1116.
[2] C. Allende Prieto, Stellar atmospheric parameters: the four-step program and Gaia's radial velocity spectrometer, in: Classification and Discovery in Large Astronomical Surveys, AIP Conf. Proc. 1082 (2008) 47–53.
[3] C.A.L. Bailer-Jones, The ILIUM forward modelling algorithm for multivariate parameter estimation and its application to derive stellar parameters from Gaia spectrophotometry, Mon. Not. R. Astron. Soc. 403 (2010) 96–116.
[4] J.L. Bentley, J.H. Friedman, Fast algorithms for constructing minimal spanning trees in coordinate spaces, IEEE Trans. Comput. 27 (1978) 97–105.
[5] A. Bijaoui, A. Recio-Blanco, P. de Laverny, Parameter estimation from optimal projections in a local environment, in: Classification and Discovery in Large Astronomical Surveys, AIP Conf. Proc. 1082 (2008) 54–60.
[6] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996, p. 260.
[7] V.A. Epanechnikov, Non-parametric estimation of a multivariate probability density, Theory Probab. Appl. 14 (1969) 153–158.
[8] O.R. Gonzalez, C. Küper, K. Jung, P.C. Naval, E. Mendoza, Parameter estimation using simulated annealing for S-system models of biochemical networks, Bioinformatics 23 (2007) 480–486.
[9] B. Gustafsson, B. Edvardsson, K. Eriksson, U.G. Jørgensen, Å. Nordlund, B. Plez, A grid of MARCS model atmospheres for late-type stars—I. Methods and general properties, Astron. Astrophys. 486 (2008) 951–970.
[10] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York, 2001, p. 35.
[11] A. Heavens, R. Jimenez, O. Lahav, Massive lossless data compression and multiple parameter estimation from galaxy spectra, Mon. Not. R. Astron. Soc. 317 (2000) 965.
[12] K. Heitmann, D. Higdon, C. Nakhleh, S. Habib, Cosmic calibration, Astrophys. J. 646 (2006) L1–L4.
[13] A.B. Lee, L. Wasserman, Spectral connectivity analysis, J. Amer. Statist. Assoc. 105 (2010) 1241–1255.
[14] J.A. Nelder, R. Mead, A simplex method for function minimization, Comput. J. 7 (1965) 308–313.
[15] R.C. Nichol, S. Chong, A.J. Connolly, S. Davies, C. Genovese, A.M. Hopkins, C.J. Miller, A.W. Moore, D. Pelleg, G.T. Richards, J. Schneider, I. Szapudi, L. Wasserman, Computational astrostatistics: fast and efficient tools for analysing huge astronomical data sources, in: E.D. Feigelson, G.J. Babu (Eds.), Statistical Challenges in Astronomy (SCMA III), Springer, New York, 2003, pp. 265–278.
[16] P. Ram, D. Lee, H. Ouyang, A.G. Gray, Rank-approximate nearest neighbor search: retaining meaning and speed in high dimensions, in: Advances in Neural Information Processing Systems (NIPS), vol. 22, 2010.
[17] A. Recio-Blanco, A. Bijaoui, P. de Laverny, Automated derivation of stellar atmospheric parameters and chemical abundances: the MATISSE algorithm, Mon. Not. R. Astron. Soc. 370 (2006) 141–150.
[18] J.W. Richards, P.E. Freeman, A.B. Lee, C.M. Schafer, Accurate parameter estimation for star formation history in galaxies using SDSS spectra, Mon. Not. R. Astron. Soc. 399 (2009) 1044–1057.
[19] M. Tegmark, A.N. Taylor, A.F. Heavens, Karhunen–Loève eigenvalue problems in cosmology: how to tackle large data sets? Astrophys. J. 480 (1997) 22–35.
[20] R. White, Astronomical applications of oblique decision trees, in: Classification and Discovery in Large Astronomical Surveys, AIP Conf. Proc. 1082 (2008) 37–43.
[21] M.I. Wilkinson, et al., Spectroscopic survey of the Galaxy with Gaia—II. The expected science yield from the Radial Velocity Spectrometer, Mon. Not. R. Astron. Soc. 359 (2005) 1306–1335.