Expert Systems with Applications 37 (2010) 1719–1727
Parameterization of RVS synthetic stellar spectra for the ESA Gaia mission: Study of the optimal domain for ANN training

Diego Ordóñez a,*, Carlos Dafonte a, Minia Manteiga b, Bernardino Arcay a

a Department of Information and Communications Technologies, Faculty of Computer Science, University of A Coruña, 15071 A Coruña, Spain
b Department of Navigation and Earth Sciences, University of A Coruña, 15011 A Coruña, Spain
Article info

Keywords: Artificial Neural Networks, Connectionist systems, FFT, Wavelet transform, Gaia mission, Stellar spectra, Stellar parameters
Abstract

One of the upcoming cornerstone missions of the European Space Agency (ESA) is Gaia, a spacecraft that will be launched in 2011 and will carry out a stereoscopic census of our Galaxy and its environment by measuring with unprecedented accuracy the astrometry (distances and motions) of its components, their photometric distribution from the ultraviolet to the infrared, and, in the case of the brightest objects (mainly stars), the spectrum with intermediate resolution in the region of the infrared CaII triplet, with a spectrograph known as the Radial Velocity Spectrometer (RVS). Stars are the basic constituents of our Galaxy, and they can be characterized if we can estimate their principal atmospheric parameters: effective temperature, gravity, metal content (general abundance of elements other than H and He), and their abundance of alpha elements (elements with Z > 22, [α/Fe]), which provide information on the physical environment in which the star was born. This work presents our results for the parameterization of stellar spectra with simulated data (synthetic spectra) in the spectral region of the RVS and with the application of Artificial Intelligence techniques based on ANNs. Our work has two main purposes: to determine the optimal domain for the ANNs' performance, and to develop an adequate noise detection and filtering algorithm.

© 2009 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +34 981167000; fax: +34 981167160. E-mail addresses: [email protected] (D. Ordóñez), [email protected] (C. Dafonte), [email protected] (M. Manteiga), [email protected] (B. Arcay).
0957-4174/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2009.07.038

1. Introduction

In a star, the distribution of light intensity as a function of wavelength (color), in other words the electromagnetic stellar spectrum, is determined by the physical and chemical conditions of its atmosphere (mainly effective temperature and gravity, and also the abundances of chemical elements). From the very start of astronomical spectroscopy, in the 19th century, the measurement of certain spectral features (absorption or emission bands or lines) and their relative intensities has helped astrophysicists to determine these properties and elaborate models that characterize the mass, age, and evolutionary stage of stars.

The earliest characterization and classification methods were heuristic and established a bidimensional framework of reference called the MK system (in honor of its creators, Morgan, Keenan, & Kellman, 1943). The MK system orders the spectra into a series of spectral types, O–B–A–F–G–K–M, with decreasing intensities of the Hydrogen lines; this series was subsequently found to reflect a sequence of decreasing atmospheric temperatures. The system assigns a Roman numeral to the spectra, going from I to V, that indicates the luminosity class and depends on the intrinsic brightness of the star; from a physical point of view, it is related to the surface gravity of that star. Astrophysicists classify spectra in the MK system on the basis of a set of reference spectra, used as template standards, that define each of the classes. The advantage of this method is that it provides a valid, albeit imprecise, estimation of the star type (early or hot, intermediate like the Sun, or cold and late) and of its evolutionary stage (stars in the Main Sequence or evolved stars).

Increasing understanding of the physical processes that take place inside a star and of the radiation transport processes through the stellar atmosphere led to the first spectral synthesis models (Kurucz, 1992; Gray & Napier, 2001). These models allowed researchers to obtain the theoretical spectrum of a star, based mainly on the choice of its principal atmospheric properties: effective temperature and gravity (Teff and log g), and abundance of chemical elements ([Fe/H] and [α/Fe]). By comparing these synthetic spectra with the real spectrum, the values of these parameters in the star can be estimated with a certain margin of error. Spectral synthesis is a common method for high resolution spectra (low noise and a small sampling interval).

In the course of the last two decades, Observational Astrophysics has witnessed an authentic revolution in the capacities of telescopes and associated instruments, as well as in the automation of data acquisition, processing, and archiving. Both the most recent ground-based telescope projects (e.g. the Great Canary
Telescope at the Observatorio del Roque de los Muchachos in La Palma, Spain) and the existing space telescopes (IUE, the Hubble telescope, etc.) include the creation of extensive databases whose exploitation inevitably requires the use of automatic techniques for processing, classification, and parameterization. The techniques used in Computational Astrophysics for this automatic processing of astronomical data (spectra, images, and photometric data) are mainly twofold: statistical techniques (Minimal Distance Methods, Cluster Analysis) and techniques based on Artificial Intelligence methods (Rodriguez, Arcay, Dafonte, Manteiga, & Carricajo, 2004; Dafonte, Rodriguez, Arcay, Carricajo, & Manteiga, 2005), in particular Artificial Neural Networks (ANNs). A large number of publications are available in this field, such as Von Hippel, Allende, and Sneden (2002), Bailer-Jones (2000, 2008), Fiorentin et al. (2007) or Harinder, Gulati, and Gupta (1998). In general terms, their purpose consists in identifying the astronomical source type (star, galaxy, quasar, asteroid, etc.) and, if the data set is uniform, as is the case with stars, characterizing its members by parameterizing their main properties.

Stars are the basic constituents of our Galaxy and, as explained before, they can be characterized if we can know or estimate their effective temperature, gravity, metal content (general abundance of elements other than H and He, usually determined from the abundance of Iron relative to Hydrogen, [Fe/H]), and in particular their abundance of alpha elements (elements with Z > 22, [α/Fe]), which provide information on the physical environment in which the star was born. The abundances of individual chemical elements can only be obtained in the presence of spectra with a good signal-to-noise ratio (SNR) and an adequate spectral resolution.
One of the upcoming missions of the European Space Agency (ESA) is Gaia, a mission that is presently approved and outlined by a team of researchers that includes the authors of this article. Gaia, a spacecraft that will be launched in 2011, includes several instruments and will carry out a stereoscopic census of our Galaxy and its environment by measuring with unprecedented accuracy the astrometry (distances and motions) of its components, the photometric distribution from the ultraviolet to the infrared (between 3050 and 12,320 Å), and, in the case of the brightest objects (mainly stars), the spectrum with intermediate resolution in the region of the CaII triplet, between 8470 and 8740 Å, measured with a spectrograph known as the Radial Velocity Spectrometer (RVS).

This work presents our results for the parameterization of stellar spectra with simulated data (synthetic spectra) in the spectral region of the RVS and with the application of Artificial Intelligence techniques based on ANNs. Our work has two purposes: to determine the optimal domain in which to work with the data, and to develop an adequate noise detection and filtering algorithm. Domain refers to the data format that results from the transformations applied to the stellar spectra so that the parameterization algorithm behaves in the best possible way. We considered the following three domains: the original domain, i.e. the flux of light received as a function of wavelength; the transformed domain, i.e. the result of applying the Fourier Transform (Cooley & Tukey, 1965) to the spectra in wavelength; and finally the Wavelet Transform, which allows us to use the multilevel analysis of signal approximations and details (Mallat, 1989) as inputs of the neural network. The choice of the data domain, the selection of an adequate neural network for the classification, and the detection of the SNR are related tasks that allow us to work towards hybrid systems for information processing in which specific techniques can be applied to each case.
The considered SNRs cover the range that can be expected for most of the stellar spectra that will be obtained with the RVS once
the Gaia satellite is operative, i.e. SNR 5, 10, 25, 50, 75, 100, 150, 200 and ∞. We also considered additive white Gaussian noise in the spectral signal.

This work consists of seven sections. Following this Introduction, Section 2 describes the matrix of synthetic stellar spectra that was used for this study and the variation range of each physical–chemical stellar parameter (Teff, log g, [Fe/H] and [α/Fe]) that was considered. We briefly describe how the input domain changes when the Fourier Transform of these signals is carried out, as well as the Wavelet Transform, for which five different approximation and detail levels were considered. Section 3 explains how the experiments for the prediction of the SNR and the determination of the four physical–chemical parameters of the stars were designed, whereas Section 4 describes the developed equipment and tools and their computational efficiency. Section 5 treats the problem of signal noise prediction. Section 6 presents the results for each domain considered independently (Section 6.1) and for the combination of various methods (Section 6.2). Finally, Section 7 presents our conclusions on the obtained results and our plans for improvement and continuity in the course of the Gaia project.
2. Data description. Input domains

For our tests the Gaia RVS Spectralib was used, a library of stellar spectra compiled by A. Recio-Blanco and P. de Laverny from the Nice Observatory, and B. Plez from Montpellier University. A technical note is available describing the atmosphere models from which the synthetic spectra were calculated and the parameters that were used (Recio-Blanco, de Laverny, & Plez, 2005). The library has the following characteristics:

1. Total number of examples: 9408.
2. Initial wavelength 847.58 nm, final wavelength 873.59 nm, wavelength step 0.0268 nm, number of points per spectrum 971.
3. Ranges of the parameters: see Table 1.

When the Gaia satellite becomes operative, the RVS instrument will inevitably include noise from various sources (sensitivity of the detectors, background noise near the source, instrumental noise, etc.). We have therefore considered the possibility of working with synthetic spectra that are modified by various noise levels according to a simple noise model, white noise, and various SNR values: 5, 10, 25, 50, 75, 100, 150, 200 and ∞. Careful study of these values provides us with information on how noise affects the prediction of parameters and how we can mitigate its effect by detecting its intensity with respect to the signal, as described in Section 5.

The dataset represents the total number of examples used to carry out the first stage of the experiment (comparison of results according to input domains). This set was arbitrarily divided into two subsets, in a proportion of 70–30%; the first subset is used to train the algorithms, the second for testing (the results are shown in the comparison). The above data were obtained through the participation of our research team in the Gaia project.
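The noise degradation and the 70–30% split described above can be sketched as follows. This is an illustrative Python sketch, not the project's implementation: the function names are ours, and we assume SNR is defined as the ratio of the mean signal level to the noise standard deviation, a convention the text does not spell out.

```python
import numpy as np

def add_white_noise(spectrum, snr, rng=None):
    """Degrade a clean spectrum with additive white Gaussian noise.

    Assumes SNR = mean signal level / noise standard deviation;
    the paper does not state its exact convention."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.mean(np.abs(spectrum)) / snr
    return spectrum + rng.normal(0.0, sigma, size=spectrum.shape)

def train_test_split(examples, train_fraction=0.7, rng=None):
    """Arbitrary 70-30% split of the example set, as in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(len(examples))
    cut = int(train_fraction * len(examples))
    return ([examples[i] for i in idx[:cut]],
            [examples[i] for i in idx[cut:]])
```

For the 9408-example library, such a split yields 6585 training and 2823 test spectra.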
Since the design and implementation of a project of these dimensions entails a large number of tasks and requires the application of various technologies, which can hardly be expected to be carried out by one single team, the Gaia consortium has divided the tasks among several coordination units (CUs). Our research team belongs to CU8, the unit in charge of classification tasks, which means that we shall focus on classification by means of parameterization of spectra that proceed from individual stars. Our input information consists of calibrated photometry, spectroscopy, and astrometry, data gathered by the
Table 1
Parameters and value ranges.

Parameter  Min    Max   Values
Teff       4500   7750  4500, 4750, 5000, 5250, 5500, 5750, 6000, 6250, 6500, 6750, 7000, 7250, 7500, 7750
log g      −0.5   5     −0.5, 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5
[Fe/H]     −5     1     −5, −4, −3, −2, −1.5, −1, −0.75, −0.5, −0.25, 0, 0.25, 0.5, 0.75, 1
[α/Fe]     −0.2   0.4   −0.2, 0, 0.2, 0.4
satellite and used to estimate the main astrophysical parameters of the stars: Teff, log g, [Fe/H], and [α/Fe]. The Gaia project is divided into 10 development cycles with a duration of six months each; during each cycle, CU2 generates new simulated data and prepares and refines the new simulations for the next cycle. At the time of our experiment the project was in its fifth cycle and the reference simulations were, as always, those of the previous cycle, in our case cycle 4. As was explained in the introduction, the input domains on which we focus are the following:

Fig. 2. Multilevel analysis.
(1) The spectrum in wavelength: in this case, the processing consists only in normalizing the spectrum and scaling its values to the [0, 1] interval, so that they can be used as input signals for the ANNs.

(2) The result of transforming the spectra in wavelength by applying the FFT (Cooley & Tukey, 1965). The process of transformation into the selected domain is described in Fig. 1. The input of the algorithm is the normalized original signal in units of intensity flux per wavelength interval; the value at each λ, or wavelength, corresponds to the flux value of the spectrum. From that point onwards, we apply classical signal processing steps, as can be seen in Fig. 1. The entire process consists of four stages (Bendat & Piersol, 1971): (a) subtract the average value from the signal; (b) apply a window function of the Hamming type; (c) apply the FFT; (d) calculate the absolute value of the resulting signal.

(3) The application of the wavelet transform (Meyer et al., 1989) to the spectra. An efficient way of implementing this scheme by means of filters was developed by Mallat (1989), whose practical filtering algorithm produced a rapid wavelet transform. We shall refer to this analysis as the multilevel analysis and apply it, in this particular case, to spectra. For many signals, the low frequency content is the most important part, because it provides them with an identity, whereas the high frequency content merely conveys nuances. In wavelet analysis, we frequently speak of approximations and details: the former are signal components with high scales and low frequencies, the latter are components with low scales and high frequencies. In our case, the signal is a stellar spectrum, generally composed of high frequency absorption lines with variable intensity, wide lines, and molecular bands of various types, which leads us to think that a wavelet analysis may be very adequate for our problem. The concept of multilevel analysis refers to the repeated application of filtering to each of the successive signal approximations, obtaining a new level after each filtering stage. Fig. 2 shows how the filters are used to obtain the approximations and details. We consider up to five levels of approximations and details, which yield a total of 5 × 2 = 10 different signals for each spectrum. We carry out the tests for each signal in order to determine where the relevant information for each parameter is to be found. All these approximations and details constitute what we consider to be the third study domain. With regard to the wavelet analysis, an important parameter remains to be determined: the choice of the mother wavelet. In this case, we decided to carry out the analysis with the Daubechies wavelets (Daubechies, 1988), because they are orthogonal and easily implementable by means of digital filtering techniques (Meng, Wang, & Liu, 2000).
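The two transformed domains can be sketched in code. This is an illustrative Python sketch, not the project's implementation: the Haar filters (the simplest Daubechies wavelet) stand in here for the higher-order Daubechies filters used in the paper, so the resulting signal lengths differ slightly from those in Table 2.

```python
import numpy as np

def fft_domain(spectrum):
    """Four-stage pipeline of Fig. 1: demean, Hamming window, FFT, modulus.
    Keeping only the low-frequency half of the symmetric output (335 points
    in the paper) would be done downstream."""
    x = spectrum - np.mean(spectrum)           # (a) subtract the average
    x = x * np.hamming(len(x))                 # (b) Hamming-type window
    return np.abs(np.fft.fft(x))               # (c) FFT, (d) absolute value

def multilevel_analysis(signal, levels=5):
    """Mallat-style multilevel analysis: filter and downsample by 2 at each
    level, collecting approximations (low-pass) and details (high-pass)."""
    lo = np.array([1.0, 1.0]) / np.sqrt(2.0)   # Haar low-pass filter
    hi = np.array([1.0, -1.0]) / np.sqrt(2.0)  # Haar high-pass filter
    approximations, details = [], []
    a = signal
    for _ in range(levels):
        d = np.convolve(a, hi)[1::2]           # detail: filter, downsample
        a = np.convolve(a, lo)[1::2]           # next approximation
        approximations.append(a)
        details.append(d)
    return approximations, details
```

Applied to a 971-point spectrum, five levels yield the 5 × 2 = 10 signals used as alternative network inputs, each roughly half the length of the previous one.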
3. Experiment design
Fig. 1. Fourier transform domain.
The previous section described the three formats or input data domains that are to be considered in the extraction of stellar parameters and the comparison of various experiments. The dimension of the input data can be observed in Table 2. For the spectrum in wavelength, the dimensionality results from using all the information provided by the signal of the normalized spectrum, i.e. a total of 971 points. However, for the result of the
Table 2
Dimensions of the data formats (N = number of signal points).

Format      N
FFT         335
Wavelength  971
A1/D1       490
A2/D2       249
A3/D3       129
A4/D4       69
A5/D5       39
algorithms that were used to analyse the transformed domains, the dimensions change. In the case of the Fourier analysis, we know that the resulting signal is symmetric and band-limited. This is why we keep 335 points: we discard half of the points (by symmetry) as well as the highest frequencies (because the signal cancels there). In the case of the multiresolution analysis with wavelets, as we descend the filtering sublevels, the number of signal points decreases by a factor of approximately 2, which results in signals with the number of points indicated in Table 2.

The algorithm used to evaluate the adjustment to each of the input domains is an ANN. We generate one network for each parameter to be predicted (Teff, log g, [Fe/H], [α/Fe]). The network design is based on experiments that were described in related articles and have proven their usefulness in parameterization (Kaempf et al., 2005; Bailer-Jones, 2000). All these experiments point towards feed-forward networks with three layers (input, hidden, output), trained with the error backpropagation algorithm (Rumelhart, Hinton, & Williams, 1986). After deciding the network architecture, we must determine the dimension of the neural network. For each experiment, the number of neurons in the input layer coincides with the number of points of the format that was selected for the signal. The output layer consists of one single processing element, determined by the parameter to be predicted. The activation function for the output neuron of the network is a sigmoidal function with values in the closed [0, 1] interval; this requires a subsequent interpretation of the network result by adapting it to the value range of the parameter to be predicted. Even though there is no formula with which to calculate the number of hidden processing elements that is adequate for the network training, we were able to determine a correct approach by following a heuristic strategy with several adjustments.
We set a maximum number of processing elements, 200, and then calculated as follows: the number of inputs plus the number of outputs, divided by 2; the number of hidden processing elements is the smaller of the two values. The reason for limiting the number of processing elements is that we want to make good use of the available computer resources: a network of, for instance, 1000 inputs and 500 processing elements in the hidden layer with 10,000 training patterns would be too expensive in terms of resources, requiring an additional week of work with the equipment described in Section 4 just to complete each network training. In any case, we do not need such a large number of processing elements in the hidden layer, because the execution of both experiments has shown that an increase in processing elements does not result in any significant improvement. Since the parameterization of spectra is not a linear problem, the network architecture requires more than one layer. Obtaining a given parameter on the basis of a spectrum can be seen as a function whose input is the spectrum in the corresponding format (wavelength, FFT, or wavelet analysis) and whose value is the parameter that must be predicted. It has been shown that multilayer neural networks, with hidden processing elements whose activation function is non-polynomial, can approximate
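The sizing heuristic and the mapping of the sigmoidal output back to a physical range can be written down directly. The function names below are illustrative, not taken from XOANE.

```python
def hidden_layer_size(n_inputs, n_outputs=1, cap=200):
    """Heuristic from the text: (inputs + outputs) / 2, capped at 200."""
    return min((n_inputs + n_outputs) // 2, cap)

def denormalize(y, lo, hi):
    """Map the sigmoidal network output in [0, 1] back to the physical
    range of the parameter, e.g. Teff in [4500, 7750] K."""
    return lo + y * (hi - lo)
```

For a wavelength-domain network (971 inputs) the cap applies and the hidden layer has 200 processing elements; for the smallest wavelet signal (39 points) the heuristic gives 20.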
any function (Leshno & Schocken, 1993). The configuration with three network layers (input, hidden, and output) is very frequently used for many purposes, not only to solve parameterization problems, and has provided excellent results in the present work.

4. Equipment, tools, and computational efficiency

Our experiment required a rack of four servers, each equipped with two Intel Xeon Quad Core processors and 16 GB of RAM. This hardware architecture allowed us to launch a total of 32 parallel trainings (eight per computer) without any significant impact on the equipment's productivity. The time needed to finish a training was highly variable and depended on three main factors:

(1) Number of processing elements of the hidden layer.
(2) Dimension of the input domain.
(3) Number of training stages with which the algorithm is configured.

The neural networks and the processing algorithms were implemented in JAVA (a requirement of the Gaia project). The neural networks were defined and trained with a tool that was developed by our research group: XOANE (Ordóñez, Dafonte, Arcay, & Manteiga, 2007), the eXtensible Object Oriented Artificial Neural Networks Engine, a tool that allows us to arbitrarily shape network architectures, training algorithms, and tests. We use this framework instead of other, more popular ones (e.g. Matlab) because its execution times are shorter and it enables us to obtain intermediate results that finally lead us to the network's generalization point for each experiment. The search for the network generalization point is a recurring question when we tackle a classification problem with neural networks. The best network is not the one with the smallest mean error on the training set, but the one with the best results on the test set.
In a normal training iteration, with correct network parameterization conditions, network training, and weight initialization, the error usually decreases for both sets during the training stages until a point is reached at which the error keeps decreasing for the training set but the results for the test set start to deteriorate; this point is called the network generalization point. XOANE allows us to save the state of the network at stage intervals selected by the user. For instance, if we configure a training with 1000 stages, stored at intervals of 100 stages, at the end of the algorithm we have saved 10 networks that correspond to the state of the network at stages 100, 200, . . . up to 1000. The networks are saved in XML (Bay et al., 2006) according to a format that was specified in an XML schema. These XML documents can afterwards be recovered so as to reproduce the tests, or transformed to be used in an environment that is external to the tool. Experience tells us that by saving the network every 100 stages we are not likely to find the exact generalization point, but we will be nearby, with a maximum error of 50 training stages (half of the defined interval). If we want more precision, we reduce the interval, but a smaller interval also means that more networks will be saved, and this is a costly operation that considerably increases the execution time of the trainings in the case of very large networks. In the present experiment, the networks were saved at intervals of 25 stages.

5. Nature of the spectral signal and prediction of the noise level

The reliability of the parameterization algorithm largely depends on the wavelength coverage, the spectral resolution, and the noise intensity. From the point of view of the design, we must
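The search for the generalization point among checkpoints saved at fixed stage intervals then reduces to picking the saved network with the lowest test-set error. This is a sketch of the selection step only; XOANE's actual interface is not described in this text.

```python
def generalization_point(checkpoints):
    """checkpoints: list of (stage, test_error) pairs for networks saved
    at fixed intervals, e.g. every 25 stages as in this experiment.
    Returns the stage of the network with the lowest test error, which
    lies within half an interval of the true generalization point."""
    best_stage, best_error = min(checkpoints, key=lambda c: c[1])
    return best_stage
```

With checkpoints every 25 stages, the selected stage is guaranteed to be at most about 12 training stages away from the exact generalization point.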
know the quality of the parameterization for a given set of observations. Since the wavelength interval and the resolution are determined in advance, the treatment of the noise problem and the configuration of the information extraction algorithm that will be applied determine the extraction quality of the parameters (Bailer-Jones, 2000). Our experiment assumes additive noise of the Gaussian type, the noise that the Gaia development team is initially considering in each cycle. We know that noise of this nature has a spectral density that is continuous over all frequencies. Unlike the noise, the Fourier Transform of any clean spectrum is band-limited and even cancelled at high frequencies. Therefore, if the noise is additive and has a constant spectral density, and if the signal's only non-zero spectral density components appear at low frequencies, then in a noisy spectrum the highest values of the transformed variable correspond only to spectral components of the noise. We will further develop this aspect in order to classify the noise and predict its intensity. Fig. 3 describes this behaviour for the highest frequencies. Another important aspect is that a higher noise intensity in the studied signal (spectrum) implies more intense values at the higher frequencies. Therefore, in Fig. 4 the signal with the lowest values corresponds to SNR 100, followed by SNR 25, 10, and 5 (the values with the highest intensity). After determining the nature of the signal and the noise, we now only need to introduce a method to predict the noise level. Our point of departure is the data we already have, i.e. the sets at various SNRs mentioned in Section 2, which amount to a total of nine sets, including the clean set, of the same examples with different noise intensities.
For each resulting signal, we calculate the value of the integral over the last signal points, because more noise means more intensity in the last points and, therefore, a larger integral value. For each signal-to-noise set, we calculate the average of these integral values over all the data of the set. This results in a numeric value for each set. With the reference of the values calculated during the previous step, we proceed to calculate the intervals that will serve as a basis to determine the noise category of a new example. Given the integral values I_SNR5, I_SNR10, I_SNR25, I_SNR50, I_SNR75, I_SNR100, I_SNR150, I_SNR200, and I_SNR10000, we calculate the midpoint values (I_SNRX + I_SNRX+1)/2 and conclude, for example, that a spectrum has noise with intensity SNR 50 if the value of the integral over the last points lies between (I_SNR50 + I_SNR75)/2 and (I_SNR25 + I_SNR50)/2. This strategy has a low error rate (90% hits for the available data), but by experimenting with the spectra we observed that if we train with the examples of a specific signal-to-noise level and test with examples of the previous or next level, the results do not deteriorate substantially. On the basis of that fact, we decided to establish noise level groups, considering the same algorithm for SNR 5, 10, and 25; for SNR 50, 75, and 100; and for SNR 150 and 200. This strategy allows us to minimize the error probabilities (all the examples in the test set are placed in the correct category). This calculation allows us to select, for each input domain and SNR pair, the specific algorithm that optimizes the behaviour for any level of noise intensity in the signal. Fig. 5 shows that if we know the intensity of the noise in a signal, we can elaborate a specific treatment for that signal. The idea behind this is to use the knowledge gathered in the course of numerous experiments by applying a wide range of signal processing algorithms and carrying out specific trainings for each particular input domain, parameter, and noise level. As a result of all these tests, we know with a high level of certainty which is the most adequate way of parameterizing according to the noise and parameter type. This knowledge will be used to generate the final classifier (Fig. 5). The final classifier is the result of combining the acquired knowledge.
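The interval-based noise categorization can be sketched as follows. This is an illustrative sketch: the number of tail points and the function names are our assumptions, not values given in the paper.

```python
import numpy as np

def tail_integral(fft_magnitude, n_tail=50):
    """Integral (sum) of the last points of |FFT|; noisier spectra give a
    larger value.  n_tail is an illustrative choice, not from the paper."""
    return float(np.sum(fft_magnitude[-n_tail:]))

def classify_snr(value, reference):
    """reference: dict SNR -> average tail integral of that noise set.
    A spectrum is assigned the SNR whose interval, bounded by the
    midpoints between consecutive reference values, contains its own
    tail integral."""
    # Sort from noisiest (largest integral) to cleanest.
    snrs = sorted(reference, key=lambda s: reference[s], reverse=True)
    for a, b in zip(snrs, snrs[1:]):
        boundary = (reference[a] + reference[b]) / 2.0
        if value >= boundary:
            return a
    return snrs[-1]  # below every midpoint: cleanest category
```

For instance, a tail integral lying between (I_SNR50 + I_SNR75)/2 and (I_SNR25 + I_SNR50)/2 is classified as SNR 50; mapping the result onto the groups {5, 10, 25}, {50, 75, 100}, {150, 200} is then a simple lookup.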
Noise is a determining factor in the quality of the adjustment, but thanks to the algorithm described in this section, we can apply the adequate transformation to the spectrum for the parameter to be predicted and for that specific noise level or range of noise levels, and we can select the network that is most adequate to process that input. In other words, the final classifier in
Fig. 3. Comparison of the spectral strength of a clean signal and a signal with SNR 10.
Fig. 4. One sample with several SNRs (5, 10, 25, 100); s is the transformed variable.
Fig. 5. Outline of the final classifier based on neural networks.

Fig. 5 was designed to give the best possible response for each particular case.

6. Results

6.1. Results for each input domain

The results of the experiment are presented numerically in Tables 3–6. The tabulated values must be seen in context: for instance, a temperature error of 100 K must be compared with the value range of the parameter, which in this case lies between 4500 and 7750 K (see the description of the simulated data in Section 2). Each table shows the results for each study domain after determining the noise level of the spectra. The first row states the results for spectra transformed with the algorithm described in Section 3 (application of the Fourier Transform); the second row refers to the normalized spectrum in wavelength as input (no transform). The remaining rows of each table refer to feeding the network with the successive approximations (Ai) and details (Di) obtained with the multilevel analysis applying wavelets.

Table 3
Results for clean (SNR ∞) spectra.

Clean       Teff      log g    [Fe/H]   [α/Fe]
FFT         35.833    0.0708   0.059    0.0347
Wavelength  72.2453   0.1475   0.1175   0.0684
A1          81.1248   0.1605   0.1283   0.0749
A2          85.3519   0.1849   0.1464   0.0811
A3          92.5648   0.206    0.17     0.0834
A4          108.322   0.2191   0.2083   0.0932
A5          115.155   0.2576   0.2389   0.1092
D1          166.298   0.2819   0.1868   0.0752
D2          159.379   0.338    0.1859   0.0732
D3          138.612   0.2708   0.1716   0.0669
D4          105.339   0.2278   0.1579   0.0727
D5          118.439   0.2438   0.2286   0.0857

Table 4
Results for input spectra with SNR 10.

SNR10       Teff     log g   [Fe/H]  [α/Fe]
FFT         739.35   1.31    1.07    0.19
Wavelength  297.28   0.59    0.37    0.17
A1          288.66   0.57    0.36    0.16
A2          272.27   0.55    0.35    0.16
A3          267.77   0.57    0.35    0.17
A4          284.2    0.61    0.41    0.17
A5          328.01   0.68    0.44    0.18
D1          834.52   1.62    1.25    0.23
D2          612.56   1.26    0.71    0.18
D3          554.2    1.08    0.62    0.18
D4          497.37   0.91    0.58    0.18
D5          436.97   0.76    0.57    0.18

Instead of showing the error distribution for all the cases, which would result in an enormous amount of data, we show the best
Table 5
Results for input spectra with SNR 75.

SNR75         Teff (K)   log g   [Fe/H]   [a/Fe]
FFT           287.92     0.5     0.31     0.13
Wavelength    79.93      0.17    0.13     0.08
A1            99.26      0.2     0.15     0.09
A2            103.27     0.23    0.16     0.1
A3            115.42     0.25    0.18     0.1
A4            129.18     0.27    0.23     0.11
A5            148.99     0.33    0.27     0.13
D1            408.19     0.89    0.42     0.15
D2            276.13     0.55    0.27     0.12
D3            247.32     0.45    0.26     0.12
D4            215.42     0.45    0.23     0.11
D5            193.68     0.37    0.28     0.12

Table 6
Results for input spectra with SNR 200.

SNR200        Teff (K)   log g   [Fe/H]   [a/Fe]
FFT           175.38     0.28    0.19     0.09
Wavelength    80.16      0.16    0.12     0.07
A1            106.49     0.17    0.16     0.08
A2            90.92      0.19    0.15     0.08
A3            99.35      0.24    0.17     0.09
A4            111.11     0.22    0.21     0.09
A5            130.43     0.27    0.25     0.11
D1            259.67     0.56    0.27     0.12
D2            207.69     0.4     0.2      0.09
D3            188.39     0.34    0.19     0.09
D4            147.51     0.29    0.18     0.09
D5            145.45     0.29    0.24     0.1
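The per-parameter errors reported in Tables 3–6 are mean errors over the test set, and the text below also quotes the bound below which 95% of the predictions fall. A minimal sketch of such an evaluation, with illustrative function names that are our own (the authors' test harness is not described here):

```python
# Minimal sketch of the per-parameter evaluation behind Tables 3-6 and
# Fig. 6: mean absolute error, plus the error bound below which a given
# fraction (e.g. 95%) of the predictions fall. Names are illustrative.

def mean_abs_error(predicted, actual):
    """Mean absolute prediction error for one stellar parameter."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)

def error_bound(predicted, actual, fraction=0.95):
    """Error below which the given fraction of predictions fall
    (e.g. 'errors below 100 K for 95% of the spectra')."""
    errors = sorted(abs(p - a) for p, a in zip(predicted, actual))
    return errors[int(fraction * (len(errors) - 1))]
```

Reporting the 95% bound alongside the mean, as the authors do, makes the handful of outliers (rare objects with very high error) visible without letting them dominate the summary statistic.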
case of clean spectra and the Fourier-transformed domain (second case), and then analyse the results in the presence of noise. Fig. 6 shows the error distribution for each parameter without noise; this presentation covers 95% of the data set selected as a test for each parameter. The errors are concentrated around zero, with a minimal number of outliers (rare objects that generate a very high error rate). Since we show 95% of the examples, we can extrapolate that, in the absence of noise, 95% of the temperature predictions are expected to report errors below 100 K. The distribution pattern is similar for all the parameters: for 95% of the spectra, the error in gravity is smaller than 0.25, and the errors in metallicity and in the alpha elements are smaller than 0.2. A relevant fact to extract from Fig. 6 is the concentration of errors around zero, which makes the results for metallicity and the alpha elements particularly interesting.

Fig. 6. Distribution of frequency errors for every parameter and SNR ∞.

Fig. 7 provides a different look at the results: we can observe the errors for each discrete value of each parameter.

Fig. 7. Errors for parameter values and SNR ∞.

The typical deviations from the correct parameter value are small, as can be seen in the cases of temperature, gravity, and metallicity. This apparently does not hold for the alpha elements, but the impression is due to the different scale on which the values of this parameter are presented, since the values are very small and closely spaced (0.2 and 0.4). The mean error (0.034) is in fact comparable to that of the other parameters. When analysing the results of Tables 3–6, we notice that the algorithm that gives the best results for the clean spectra does not perform as well in the presence of noise. This is mainly due to the characteristics of the noise mentioned in Section 5 (the noise is additive and affects all frequencies). The most consistent results are found in the wavelength domain, with slight variants in which certain approximations and details provide small improvements (e.g., A3 with SNR 10 when predicting temperature). If we knew in advance what the noise level in the signal was, we could select the input domain and the network that provide the best result for a given parameter under the given circumstances. But a real situation does not provide this prior knowledge, so we are, in theory, obliged to choose an intermediate alternative that gives good results in the presence of noise and acceptable results with clean spectra. In such a scenario, and considering the mean error results, the wavelength domain would be
the best solution. However, thanks to the algorithm described in Section 5, we can apply the processing technique that is most suitable for each case.

6.2. Results for the combination of methods

Making use of our noise level detection, we apply the general algorithm described in Section 5. Because there are so many noise levels (9), we decided to classify them, as described above, into groups of three levels ([5, 10, 25], [50, 75, 100], [150, 200, 1E3]), and this for each parameter. The resulting combination of noise levels, techniques, and parameters is given in Table 7. The results of the final algorithm are reflected in Table 8, which takes into account the algorithms that are to be used in each particular case (Table 7). These results are always better than, or at least equal to, those obtained with one single technique for all cases, as can be observed by comparing the results of Tables 3–6 with those of Table 8. The results for noisy spectra are of excellent quality from an astrophysical point of view. But we can obtain even more important advantages from this combination of techniques. Until now, the tabulated results have referred to errors obtained by using the same SNR for
Table 7
Processing techniques (W = wavelength domain).

SNR group          Teff    log g   [Fe/H]   [a/Fe]
[5, 10, 25]        W       A2      A2/A3    A1
[50, 75, 100]      W       W       W        W
[150, 200, 1E3]    W       W       W        W
∞ (Clean)          FFT     FFT     FFT      FFT
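The per-case selection encoded in Table 7 amounts to a lookup keyed on the detected noise group and the target parameter. A hedged sketch of that step follows; the mapping is transcribed from Table 7, but the group boundaries and all function names are our own illustrative assumptions:

```python
# Sketch of the selection step of Section 6.2: given the SNR estimated by
# the noise-detection algorithm and the parameter to predict, choose the
# input domain (and hence the trained network). Mapping transcribed from
# Table 7 ('W' = wavelength domain); boundaries are our own assumption.

TABLE7 = {
    'low':   {'Teff': 'W',   'Logg': 'A2',  '[Fe/H]': 'A2/A3', '[a/Fe]': 'A1'},
    'mid':   {'Teff': 'W',   'Logg': 'W',   '[Fe/H]': 'W',     '[a/Fe]': 'W'},
    'high':  {'Teff': 'W',   'Logg': 'W',   '[Fe/H]': 'W',     '[a/Fe]': 'W'},
    'clean': {'Teff': 'FFT', 'Logg': 'FFT', '[Fe/H]': 'FFT',   '[a/Fe]': 'FFT'},
}

def snr_group(snr):
    """Map a detected SNR to its Table 7 group: [5, 10, 25],
    [50, 75, 100], [150, 200, 1E3]; None means no noise detected."""
    if snr is None:
        return 'clean'
    if snr <= 25:
        return 'low'
    if snr <= 100:
        return 'mid'
    return 'high'

def select_processing(snr, parameter):
    """Pick the processing technique for one spectrum and one parameter."""
    return TABLE7[snr_group(snr)][parameter]
```

The design choice worth noting is that the lookup is per parameter, not per spectrum: the same noisy spectrum may be routed through the wavelength domain for temperature but through the A2 approximation for gravity.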
Table 8
Results of the final algorithm.

SNR        Teff (K)   log g    [Fe/H]   [a/Fe]
∞          35.833     0.0708   0.059    0.0347
200        80.16      0.16     0.12     0.07
150        81.39      0.16     0.12     0.08
100        90.06      0.18     0.14     0.09
75         97.42      0.19     0.15     0.09
50         118.50     0.23     0.17     0.10
25         188.53     0.39     0.26     0.14
10         267.77     0.55     0.35     0.16
5          438.17     0.80     0.52     0.18

the training set and the test set. Our research team, however, has also carried out cross tests, in which a network trained at one SNR was applied to test sets with different noise intensities. The results were similar as long as the noise levels for training and testing remained similar (cf. the results in Tables 3–6 and 8), but when we tested with noise levels other than those of the training, the results were unacceptable for parameterization, regardless of the parameter. This is why the noise detection algorithm plays a fundamental role in blind tests, i.e. tests for which we do not know the noise level in advance, as happens with real problems.

7. Conclusions and future work

The literature reports studies that expose the results of neural networks applied to the extraction of all types of information from spectral signals (Kaempf et al., 2005; Bailer-Jones, 2000; Christlieb, Wisotzki, & Graßhoff, 2002; Von Hippel et al., 2002; Allende, Rebolo, Garcia, & Serra-Ricart, 2000; Fiorentin et al., 2007; Recio-Blanco, Bijaoui, & de Laverny, 2002). In all these studies, neural networks are shown to be a robust and reliable alternative for the automatic extraction of information. Most conventional methods focus on a specific domain, analyse how it behaves in the presence of noise, and evaluate how well it performs in tasks of spectral classification or parameterization. The present work has proposed the implementation of a hybrid classification system for stellar spectra that studies the nature of the signal in order to improve on the results of applying a single technique. Given a signal and a parameter to predict, our first task consisted in detecting the noise intensity for that specific SNR-parameter combination. We then selected an input domain and a parameterization algorithm on the basis of objective criteria (see the results in Section 5).

During the analysis of the domains and formats of the input data, we exploited the information provided by the approximations and details that result from wavelet filtering. We generated new input domains by combining the results of approximations and details, captured the relevant information for each parameter, and fed it to the input of the parameterization algorithm. The results obtained for clean and noisy spectra show a distribution of mean errors of excellent quality when compared with the previous works on stellar parameterization described in Section 1. This hybrid perspective of the parameterization algorithm is not limited to the use of ANNs: our method is open to the incorporation of new techniques that can replace the current ANNs in order to obtain algorithms of a different nature that yield better results in specific circumstances. Among the future techniques for automatic classification and parameterization are, besides ANNs, genetic algorithms and minimal distance methods.

Acknowledgement

This work has been funded by the Spanish Ministry of Education and Science (ESP2006-13855-C02-02).
References

Allende, C., Rebolo, R., Garcia, R., & Serra-Ricart, M. (2000). The INT search for metal-poor stars. Spectroscopic observations and classification via Artificial Neural Networks. The Astronomical Journal, 120, 1516–1531.
Bailer-Jones, C. A. L. (2008). A method for exploiting domain information in astrophysical parameter estimation. In Astronomical data analysis software and systems XVII. ASP conference series, London (Vol. XXX).
Bailer-Jones, C. A. L. (2000). Stellar parameters from very low resolution spectra and medium band filters. Astronomy and Astrophysics, 357, 197–205.
Bendat, J. S., & Piersol, A. G. (1971). Random data: Analysis and measurement procedures. New York: Wiley-Interscience.
Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F., & Cowan, J. (2006). Extensible Markup Language (XML). W3C Recommendation.
Christlieb, N., Wisotzki, L., & Graßhoff, G. (2002). Statistical methods of automatic spectral classification and their application to the Hamburg/ESO survey. Astronomy and Astrophysics, 391, 397–406.
Cooley, J. W., & Tukey, J. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19, 297–301.
Dafonte, C., Rodriguez, A., Arcay, B., Carricajo, I., & Manteiga, M. (2005). A comparative study of KBS, ANN and statistical clustering techniques for unattended stellar classification. Lecture Notes in Computer Science (Vol. 3773, pp. 566–577). Springer Verlag.
Daubechies, I. (1988). Orthogonal bases of compactly supported wavelets. Communications on Pure and Applied Mathematics, 41, 909–996.
Fiorentin, P. R., Bailer-Jones, C. A. L., Lee, Y. S., Beers, T. C., Sivarani, T., Wilhelm, R., et al. (2007). Estimation of stellar atmospheric parameters from SDSS/SEGUE spectra. Astronomy and Astrophysics, 467, 1373–1387.
Gray, R. O., & Napier, M. G. (2001). The physical basis of luminosity classification in the late A-, F-, and early G-type stars I. Precise spectral types for 372 stars. The Astronomical Journal, 121, 2148–2158.
Harinder, P., Gulati, R. K., & Gupta, R. (1998). Stellar spectral classification using principal component analysis and artificial neural networks. MNRAS, 295, 312–318.
Kaempf, T. A., Willemsen, P. G., Bailer-Jones, C. A. L., & de Boer, K. S. (2005). Parameterisation of RVS spectra with Artificial Neural Networks: First steps. In Tenth RVS workshop, Cambridge, September 2005.
Kurucz, R. (1992). In B. Barbuy & A. Renzini (Eds.), The stellar populations of galaxies (p. 225). IAU Symp. No. 149. Dordrecht: Kluwer.
Leshno, M., & Schocken, S. (1993). Multilayer feedforward networks with non-polynomial activation functions can approximate any function. Working Paper IS-91-26. Center for Digital Economy Research, Stern School of Business.
Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7).
Meng, H., Wang, Z., & Liu, G. (2000). Performance of the Daubechies wavelet filters compared with other orthogonal transforms in random signal processing. In Proceedings of ICSP2000.
Meyer, Y. (1989). In J. M. Combes (Ed.), Wavelets (p. 21). Berlin: Springer Verlag.
Morgan, W. W., Keenan, P. C., & Kellman, E. (1943). An atlas of stellar spectra with an outline of spectral classification. Astrophysics monographs. University of Chicago Press.
Ordóñez, D., Dafonte, C., Arcay, B., & Manteiga, M. (2007). A canonical integrator environment for the development of connectionist systems. Dynamics of Continuous, Discrete and Impulsive Systems, 14, 580–585.
Recio-Blanco, A., de Laverny, P., & Plez, B. (2005). RVS-ARB-001. European Space Agency technical note.
Recio-Blanco, A., Bijaoui, A., & de Laverny, P. (2002). Automated derivation of stellar atmospheric parameters and chemical abundances: The MATISSE algorithm. Royal Astronomical Society, 000, 111.
Rodriguez, A., Arcay, B., Dafonte, C., Manteiga, M., & Carricajo, I. (2004). An automated knowledge-based analysis and classification of stellar spectra using fuzzy reasoning. Expert Systems with Applications, 27(2), 237–244.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
Von Hippel, T., Allende, C., & Sneden, C. (2002). Automated stellar spectral classification and parameterization for the masses. In The Garrison Festschrift conference proceedings, June 10–11.