Period analysis of variable stars: Temporal dependence and local optima

Period analysis of variable stars: Temporal dependence and local optima

Computational Statistics & Data Analysis 51 (2007) 6070 – 6083 www.elsevier.com/locate/csda Period analysis of variable stars: Temporal dependence an...

864KB Sizes 0 Downloads 63 Views

Computational Statistics & Data Analysis 51 (2007) 6070 – 6083 www.elsevier.com/locate/csda

Period analysis of variable stars: Temporal dependence and local optima Peter M. Hooper∗ Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Alta., Canada T6G 2G1 Received 11 April 2006; received in revised form 6 December 2006; accepted 6 December 2006 Available online 8 January 2007

Abstract A number of methods have been proposed to estimate the period of a variable star; e.g., a recent approach uses smoothing spline regression to fit tentative periodic functions (light curves) and selects the period minimizing a robust goodness-of-fit criterion. These methods assume that measurement errors vary independently over time. Empirical evidence, however, indicates substantial temporal dependence, possibly related to changes in observing conditions. Dependence complicates the period analysis in several respects: selection of a “best” period among several local optima, estimation of the light curve, and evaluation of uncertainty about period and light curve estimates. This article presents methods designed to accommodate dependent errors. An analysis of several data sets shows that the proposed approach can produce substantially different and arguably better results compared with other methods. © 2007 Elsevier B.V. All rights reserved. Keywords: Adaptive logistic basis regression; Golden section bootstrap; Period; Periodic function; Temporal dependence; Unequally spaced data

1. Introduction Variable stars are stars whose observed brightness changes over time. In periodic variable stars the change is periodic over time. The magnitude of brightness and the pattern of its variation allow classification of variable stars into groups of interest, such as eclipsing binary and RR Lyrae. The former group consists of binary systems (two stars orbiting each other) where brightness decreases when one star passes in front of the other. The latter group is one of several categories exhibiting periodic pulsation (contraction and expansion) of the stars and their outer layers. The primary statistical problem associated with classifying a variable star is that of estimating its period and light curve. Suppose we have a sequence of brightness measurements yi obtained at unequally spaced time points ti for i = 1, . . . , n. Given a period , define phase values ui = ti / mod 1. Consider a statistical model yi = g(ti /) + ei ,

(1)

where g is a smooth periodic function with period 1 and ei is measurement error. The statistical problem has two parts: (i) select  so that the model provides an optimal fit, and (ii) given , fit the light curve g. Several solutions have been developed. Earlier approaches employ goodness-of-fit criteria that avoid fitting g in part (i). These include: methods based on simple cosine models (Deeming, 1975; Lomb, 1976; Scargle, 1982); a ∗ Tel.: +1 780 4924239; fax: +1 780 4926826.

E-mail address: [email protected]. 0167-9473/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2006.12.006

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

Star 0306

6071

Star 0306

-11.3

Residual

Brightness

0.05

-11.5

-0.05 -11.7 0.0

0.2

0.4

0.6

Phase for period 0.8763

0.8

1.0

780

790

800 Time

810

820

Fig. 1. Plots of (a) brightness against phase and (b) residuals against time, illustrating patterns among the residuals from a light curve estimate for star 0306 with period 0.8763. Time is measured in days, “brightness” refers to negative R magnitude, and phase = time/period mod 1. Two sets of points are highlighted: ◦ for nights 776, and 777 with mainly positive residuals, and  for nights 804, 805, and 807 with mainly negative residuals. Two outliers are omitted from both plots: residual −0.56 on night 810 with phase 0.36 and residual −0.21 on night 819 with phase 0.52.

 ∗ − yi∗ )2 , where yi∗ is the brightness value associated with the ith ordered phase value u∗i method minimizing (yi+1 (Lafler and Kinman, 1965); a “string-length” method minimizing the sum of distances between points (yi∗ , u∗i ) and ∗ , u∗ ) (Dwortesky, 1983); and an ANOVA method that implicitly approximates g by a piecewise constant function (yi+1 i+1 (Stellingwerf, 1978). More recent approaches employ nonparametric smoothers to fit g in part (i). Reimann (1994) fits g using SuperSmoother (Friedman, 1984) and selects  by minimizing an absolute error criterion. Oh et al. (2004) use penalized smoothing splines for g with a robust cross-validation (RCV) criterion to estimate both  and a smoothing parameter which depends on . The RCV criterion is a computationally efficient approximation of a leave-one-out cross-validated estimate of expected Huber loss. Hall et al. (2000) present theory concerning nonparametric estimation of periodic functions. The above-mentioned methods and theory are based on an assumption that measurement errors vary independently over time. Empirical evidence casts serious doubt on this assumption. An analysis of data previously studied by Oh et al. (2004) reveals clear temporal patterns among the residuals. Before presenting an example, a few comments are in order. The data are a derived product from the project ‘Stellar astrophysics and research on exoplanets’ (Charbonneau et al., 2000). The data consist of coincident measurements of R magnitude for eight stars, with n = 351 observations for each star. The observations were made on 13 nights over a 44-night interval, with the number of observations per night varying from 7 to 42. Stellar magnitude is defined in terms of flux density as −2.5 log(flux) + constant. Brightness is related directly to flux and inversely to magnitude; i.e., brighter stars have lower magnitude. To avoid confusion over this inverse relationship, the nontechnical term “brightness” is used here to refer to negative R magnitude. Fig. 1 shows plots for star 0306, a detached type of eclipsing binary. The period and light curve were estimated using methods described in Section 2. Temporal dependence is not immediately apparent when plotting brightness against phase, but patterns emerge when highlighting points by night or plotting residuals against time. E.g., positive residuals 25 1 2 occur for 32 of 33 observations on night 776, 32 on night 777, 27 on night 804, 13 on night 805, and 32 on night 807. Similar patterns, but on different nights, are found in plots for the other seven stars. These patterns suggest the need for a model with dependent errors; e.g., yi = g(ti /) + h(ti ) + ei ,

(2)

where g is again a smooth periodic function with period 1, h is a random function, and the ei are independent errors. The function h may reflect changes in observing conditions; i.e., the extent to which incoming light is blurred by the

6072

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

Gap Size

0.5 0.3 0.1 0

1

2

3

4

5

Period Fig. 2. Line plots showing the three largest gap sizes as a function of period: largest gap, sum of two largest, and sum of three largest. Gap sizes are defined as (u∗i − u∗i−1 ) for i = 2, . . . , n, and (u∗1 − u∗n + 1), where u∗1  · · ·  u∗n are the ordered phase values determined by period  and times t1 , . . . , tn shown in Fig. 1(b).

atmosphere. The observed intensities of variable stars are typically adjusted to compensate for changing conditions by comparing the intensity of nonvarying bright stars with their previously estimated value (Reimann, 1994, p. 3). This calibration may reduce but not eliminate the effect of changing conditions. Dependent errors complicate the period analysis in several respects. Patterns related to h can distort the light curve estimate g. ˆ The extent of distortion depends on how patterns are dispersed across phase values, and hence on the period . There appears to be little distortion in Fig. 1(a), partly because observations on each night are well dispersed across phase values and also because there are no large gaps among the ordered phase values for  = 0.8763 given this particular set of observation times t1 , . . . , tn . Large gaps do occur for periods near integers and also for less obvious values; see Fig. 2. Light curve estimates must be interpolated across gaps, so large gaps are problematic even when measurement errors are independent. E.g., let gˆ denote the periodic function with period 1 displayed in Fig. 1(a) and consider scatter plots of g(t ˆ i /) against ti / mod 1 for various periods . It turns out that the major eclipse is hidden within a gap covering 0.9 to 0.1 when  = 0.9929, 1.5402, and 2.3355, the minor eclipse from 0.4 to 0.6 is hidden when  = 1.5539, 2.5359, 2.7543, and 3.6365, and both eclipses are hidden when  = 1.9907. Dependence patterns tend to be more influential when they occur near the edge of gaps. Dependence also complicates the choice of a goodness-of-fit criterion. Methods suitable for independent errors, such as leave-one-out cross-validation, can produce negatively biased estimates of risk. The degree of bias can vary with , sometimes producing a poor choice of  among local optima. Dependent errors also increase the difficulty in quantifying uncertainty about period and light curve estimates. This article presents examples illustrating these phenomena and proposes methods addressing the problems. The proposed strategy begins by applying a computationally efficient smoother to estimate curves g for a large set of  values. The goal here is to obtain a rough estimate of a risk function based on a robust goodness-of-fit criterion such as absolute error. The risk function is used to select a manageable number of candidate periods. A light curve g is then estimated for each candidate  using a curtailed backfitting algorithm; i.e., obtain an initial g, ˆ then estimate h ˆ from residuals y − g(t/), ˆ then obtain a new gˆ using adjusted brightness values y − h(t). Risk estimates based on ˆ are used to select an optimal period. A version of the golden-section bootstrap is proposed residuals y − g(t/) ˆ − h(t) for approximate pointwise confidence intervals for the light curve. ˆ and with various loss functions. The estimator gˆ This strategy can be applied with various estimators gˆ and h, considered here employs an adaptive logistic basis (ALB) regression model (Hooper, 2001a). The present work began with an investigation of whether ALB models are useful for periodic data. Theoretical considerations and empirical results showed that ALB models can represent light curves in a parsimonious manner. The focus of the investigation shifted following the discovery of residual dependence patterns, which had gone unnoticed in previous work. The chosen estimator hˆ assumes a “night effect” model; i.e., h(t) remains constant over each night. This simple model seems to provide an adequate adjustment for dependence patterns. The loss function employed in all examples is absolute error with a BIC-type penalty for overfitting. Results for Huber loss and squared error loss are also briefly reported. The new methods are developed in Section 2 and applied to data for seven stars in Section 3. The results indicate that period estimates reported in Oh et al. (2004) may be incorrect for three of the seven stars; i.e., the new methods identify alternative local optima. Visual examination of fitted curves and residuals suggests that the new period estimates

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

0.8

0.8

0.8

0.8

0.4

0.4

0.4

0.4

0.0

0.0

0.0

0.0

0.4

0.8

0.0

0.4

0.8

Phase

Phase

6073

0.0 0.0

0.4 0.8 Phase

0.0

0.4

0.8

Phase

Fig. 3. Four ALB basis functions k plotted against phase. The fitted light curve for star 0306, shown in Fig. 1(a), is a linear combination of these four curves.

provide a better fit. In Section 4 the methods are extended to estimate light curves with several periodic components. Concluding remarks are given in Section 5. Details concerning the fitting of ALB regression models are provided in an Appendix. 2. Methodology 2.1. ALB regression models ALB regression and related classification models provide flexible tools for the analysis of multi-dimensional data. Applications include covariance models characterizing variability in foetal growth (Hooper et al., 2002) and conditional probability models for functional sites in DNA (Hooper, 2001b). In this section ALB methodology is adapted to the problem of estimating a periodic function. The key idea here is to use a two-dimensional smoother to estimate a onedimensional periodic function by mapping time points ti to points xi on a circle. A smooth periodic function g(t/) with period 1 can be expressed as g(t/) = f (x) = f (cos(2t/), sin(2t/)),

(3)

where f is a smooth function on the unit circle. ALB models approximate f by a linear combination of logistic basis functions, shown here with a “reference point” parameterization (Hooper, 2001a,b): f (x) ≈ fK (x) =

K 

k k (x),

k=1 K 

k (x) = exp(k − −2 x − k 2 )

m=1

exp(m − −2 x − m 2 ).

(4)

 Note that k (x) = 1 for all x, so there is no need for a constant term in (4). The function fK is determined by scalar parameters , k , k and vectors k ∈ R2 . The model is over-parameterized, with  and one pair (k , k ) redundant, so the effective number of parameters determining fK is 4K − 3. Theoretical considerations suggest that logistic basis functions are useful for modeling light curves. Hooper (2001a) showed that each log k is a concave function, so level sets {x ∈ R2 : k (x) > c} are convex. When restricted to the unit circle and represented as functions of phase, i.e., k (cos(2u), sin(2u)) for u ∈ [0, 1], the basis functions define a flexible family of smooth periodic curves. The inherent restrictions on these curves allow a parsimonious fit for many types of light curves. Empirical results in Section 3 suggest that K 4 basis functions are often sufficient for a good fit. As an example, Fig. 3 shows plots of the K = 4 basis functions used to construct the light curve gˆ for star 0306 in Fig. 1(a). This example illustrates how implicit constraints on the shape of the basis functions are well suited for light curves with intervals of constant brightness, such as curves for detached eclipsing binaries. The ALB fit remains flat over these intervals. By comparison, methods based on local smoothing can produce more “wiggly” curves; e.g., see alternative fits for star 0306 in Oh et al. (2004, Figs. 2 and 3).

6074

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

The estimation of g for a specified period  involves minimizing a penalized risk estimate over K and the parameters ˆ i ) be the adjusted brightness, with hˆ set to zero for the initial estimate g. ˆ First consider defining fK . Let zi = yi − h(t the criterion  q  n n˜ 1 R() = |zi − fK (xi )|q , (5) n˜ − 4K + 3 n i=1

with q = 1 for absolute error, q = 2 for squared error, and n˜ given at (6) below. The multiplicative penalty used here is similar to a generalized cross-validation criterion (GCV, Craven and Wahba, 1979). Assuming an error density of the form exp{a − |e|q /b}, Hooper (2001a) showed that, for n˜ = n and (4K − 3)/n small, criterion (5) behaves like the Akaike Information Criterion (Akaike, 1973). AIC minimizes the negative log likelihood plus a penalty equaling the effective number of parameters. The Bayesian Information Criterion (Schwartz, 1979) multiplies the AIC penalty by log(n)/2. Consequently, criterion (5) with n˜ = 2n/ log(n) behaves like BIC. Oh et al. (2004) employed the Huber (1981) loss function  2 e if |e|c, lc (e) = c(2|e| − c) otherwise,

(6)

(7)

with c = 1.345 MAD. The choice of the cutoff c ensures 95% efficiency under a normal model in a location problem. AIC and BIC penalties can be derived assuming an error density of the form exp{a − lc (e)/b}. If MAD equals the median of the distribution of |e|, then a calculation yields b = 3.214(MAD)2 . AIC and BIC are thus equivalent to minimizing (1/n)

n 

lc (zi − fK (xi )) + 3.214(MAD)2 (4K − 3)/n˜

(8)

i=1

with n˜ = n for AIC and n˜ = 2n/ log(n) for BIC. When using Huber loss, one requires a single estimate of MAD to compare risks for different periods . An estimate can be obtained by fitting light curves for candidate periods  using the absolute error criterion, calculating median absolute residuals for each , then averaging these median values. It is well known that AIC tends to overfit while BIC produces more parsimonious models. Parsimony is important in the present application, to limit effects of dependent errors on the initial estimate g, ˆ so n=2n/ ˜ log(n) is used throughout this work. The parameters defining fK are optimized by stochastic approximation, as described in the Appendix. Criteria (5) and (8) are motivated by statistical models with identically distributed errors ei . In practice, this assumption may be unrealistic; e.g., the error variance may fluctuate over time. It is thus worth noting that the light curve estimator minimizing absolute error is robust with respect to both outliers and heteroscedasticity. Parameter updates during optimization depend only on the sign of a residual, not on its absolute value; see (13) with q = 1 in the Appendix. 2.2. Finding and comparing local optima This section outlines an algorithm to identify and compare a set of candidate periods and light curves. The first step is to estimate the risk over a large set of periods. It is common practice when analyzing equally spaced time series to scan over a grid of equally spaced frequencies 1/. That approach seems inefficient here, since most of the computations would be spent scanning higher frequencies where the risk function is large. Plots of estimated risk functions suggest that spacings between neighboring local minima are roughly equal when risk is plotted against log(); e.g., see Fig. 4. This suggests using a grid on the log() scale. A lower cut-off for log() is then required, and it is not clear what this should be. As a compromise, I used a sequence 1 < · · · < m with 1 = 0.0001, m ≈ 10, and i+1 = i + (0.0001) max{i , 0.1}. The log(i ) are thus equally spaced for 0.1 < i < 10, providing good coverage over this interval. The i are equally spaced for i < 0.1, providing increasingly sparse coverage on the log() scale as  → 0.

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

6075

Risk

0.8

0.4

0.2

0.5

1.0

2.0

5.0

10.0

Period Fig. 4. Smoothed estimates of absolute-error risk for star 4699, based on ALB fits with K = 12 and a highly streamlined training algorithm. Risk is plotted against period  (on a logarithmic scale) for 0.2 <  < 10. The four best local minima are highlighted, including proximate values 3.2257 and 3.2905. Light curve estimates for these two periods are shown below in Fig. 5(e,f).

Approximate estimates of absolute-error risk are obtained by fitting ALB models with fixed K = 12 and a highly streamlined training algorithm; see the Appendix. This choice of K is large enough to identify moderately complex light curves. The best 500 periods are identified and their risks are re-estimated using K = 12 and a moderately streamlined training algorithm (with 10 times as many epochs as in the initial scan). This second step reduces the variance of the risk estimates. The 500 periods are then clustered, with 1 < 2 assigned to the same cluster if 2 < (1.01)1 . The risk function is approximated by a smooth curve (robust ALB fit) within each cluster, and the result is used to identify local minima (possibly more than one within a cluster). A backfitting algorithm is then used to obtain more accurate risk estimates near the local minima. Given , first set ˆ hˆ = 0. Calculate gˆ by applying the full ALB training algorithm with response variable y − h(t); see the Appendix. This calculation includes selecting K to minimize the penalized risk; i.e., fit models with K = 1, 2, 3, . . . and stop ˆ i ) defined as the when the minimum risk (from 1 to K) remains constant for five steps. After fitting g, ˆ calculate h(t median of residuals yj − g(t ˆ j /) for all observations yj made on the same night as yi . Repeat the previous two steps, ˆ After each step, calculate a risk estimate using the residuals y − g(t/) ˆ getting new estimates gˆ and h. ˆ − h(t); i.e., R1 , R2 , R3 , and R4 . The estimate R3 () will be used to compare periods. The other risk estimates are reported to show how R decreases at each step. In the examples there is usually a large reduction from R1 to R2 , then small diminishing reductions thereafter. Additional iterations appear to be unnecessary. The estimates R3 () are calculated for periods near previously identified local minima to see if these can be improved. Risk estimates for all local minima are then compared. It is helpful to plot the data and fitted light curves for several local minima. In some cases the plots may confirm the selection of an optimal period; i.e., competing periods may exhibit irregularities associated with gaps and/or dependence patterns. In other cases the choice among local minima may be less clear. In subsequent sections the algorithm described above is referred to simply as the ALB analysis. When applying the ALB analysis with alternative loss functions for g, ˆ alternative estimators hˆ were also used: the sample mean for squared error loss and the 50% trimmed mean for Huber loss. These choices allow a comparison of highly robust, moderately robust, and nonrobust methods. A Fortran implementation of the algorithm was used to carry out the analyses described in Sections 3 and 4. The program can be obtained by writing to the author. Computation time using a 1.6 GHz CPU is about 12 min for each star. This includes about 8 min for ≈ 56 000 risk estimates in the initial scan. 2.3. Comparing harmonics In several examples discussed in Section 3, the two best local optima occur at periods 0 and 1 = 20 . The risk estimates R3 (j ) are similar and, after plotting the data and light curves, it is not clear whether the simpler 0 curve is adequate. A goodness-of-fit test is appropriate here because the null hypothesis 0 is nested in the alternative 1 ; i.e., any curve with period 0 can be doubled and expressed as a curve with period 1 . Consider an approximate likelihood ratio test (LRT) motivated by the following assumptions. Suppose 0 is fixed. Under each hypothesis, suppose that model (2) holds with g determined by an ALB function with known K = K(j ) = Kj , that h is represented by fixed

6076

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

night effects, and that errors ei are independent with of the form exp{a − |e|q /b} for given q ∈ {1, 2}. The  density −n/q q , where the eˆi are the residuals from the fitted model maximum likelihoods Lj are each proportional to { |eˆi | /n} for j . The asymptotic null distribution of the LRT statistic −2 log(L0 /L1 ) is 2 with =4(K1 −K0 ). The LRT statistic can be approximated by    n˜ − 4K0 + 3 R3 (0 ) 1/q 2n log (9) n˜ − 4K1 + 3 R3 (1 ) using the risk estimates R3 based on criterion (5). This expression is not the same as the LRT statistic because the backfitting algorithm used for R3 does not necessarily maximize the likelihood. There are problems with the assumptions underlying this test. The values 0 , K0 , and K1 are not fixed and other assumptions are fairly restrictive. The test may tend to be liberal, especially if there are strong temporal dependencies that are not captured by the simple “night effect” model for h. Still, the test has some value in quantifying uncertainty about the selected period. The BIC-type penalty in (5) typically selects 0 unless the test indicates very strong evidence against 0 . E.g., if R3 (0 ) = R3 (1 ), K0 = 2, K1 = 4 and so = 8, then expression (9) exceeds 46.68 for all n 20. Thus, if R3 (0 ) > R3 (1 ), then one can be relatively confident selecting 1 . Otherwise, if R3 (0 ) is slightly smaller than R3 (1 ), then the selection of 0 is less certain. 2.4. Confidence intervals for light curves It is difficult to obtain reliable confidence intervals for light curves without a better understanding of temporal dependence effects. Methods based on independent errors are likely to under-estimate potential error. The following is a tentative proposal for approximate pointwise 1 − confidence intervals based on the golden-section (or wild) bootstrap (Härdle and Marron, 1991). ˆ denote estimates of the combined error h(ti ) + ei . Define Suppose we have estimates ˆ and g. ˆ Let ri = yi − g(t ˆ i /) √ √ ∗ a two-point distribution for bootstrap errors: ri = wi ri , where wi = (1 − 5)/2 with probability p = (5 + 5)/10, √ and wi = (1 + 5)/2 with probability 1 − p. With these definitions, the first three moments of the bootstrap error distribution agree with the first three sample moments of {r1 , . . . , rn }. Based on the “random night effect” model for h, the wi are defined to be equal for all observations on the same night and to be independent for observations on different ˆ + r ∗ for i = 1, . . . , n, and fit the nights. Generate M independent bootstrap samples, each of the form yi∗ = g(t ˆ i /) i ∗ (u) be the estimated light light curve for each sample using the backfitting algorithm described in Section 2.2. Let gm curve for the mth bootstrap sample evaluated at a phase value u ∈ [0, 1]. Let gˆ 1 (u) and gˆ 2 (u) be the /2 and 1 − /2 ∗ (u), m = 1, . . . , M}. The intervals [gˆ (u), gˆ (u)] can be severely affected by bias in the ALB sample quantiles of {gm 1 2 estimator; e.g., the interval sometimes fails to cover the original estimate g(u). ˆ To avoid this problem, the proposed confidence intervals are centered about g: ˆ g(u) ˆ ± {gˆ 2 (u) − gˆ 1 (u)}/2.

(10)

When = 0.05 it may be helpful to think of {gˆ 2 (u) − gˆ 1 (u)}/4 as a bootstrap estimate of the standard error of g(u). ˆ This symmetric interval seems to give sensible results in the examples, but coverage properties have yet to be studied. Computation time for confidence intervals based on 1000 bootstrap samples was between 30 and 90 min per light curve, depending on the complexity of the curve. 3. Analysis of seven stars Oh et al. (2004, Table 1) used data for seven stars to compare their RCV method with seven alternative methods. The results for all eight methods were similar for six of the seven stars, in each case yielding period estimates consistent with one of the local minima obtained via the ALB analysis. For the exception (star 4699), the estimates were split between the two neighboring local minima identified in Fig. 4. This level of agreement among methods seems surprising, given the results described here. Oh et al. (2004) do not indicate the range of periods investigated for each star, so a possible explanation is that the range was restricted based on a priori considerations. Results for the ALB analysis with absolute error loss are summarized in Table 1. The summary describes the three best candidate periods (local minima) for each star. For each candidate, the table includes risk estimates R1 , . . . , R4

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

6077

Table 1 Summary of analysis of seven stars Star

Period

R1

R2

R3

R4

0306

0.4381 0.8763∗† 1.7525

0.869 0.748 0.875

0.735 0.634 0.792

0.720 0.629 0.777

0.716 0.628 0.770

0969

2.2113∗† 2.2306 4.4252

0.556 0.564 0.567

0.443 0.462 0.495

0.441 0.455 0.479

1164

1.4633∗ 2.1057 2.9433†

1.956 2.374 1.625

1.691 2.117 1.271

4699

2.3015 3.2257∗† 3.2905

1.245 1.230 1.201

4865

1.1629† 2.3256∗ 3.4880

1744

5954

K

#lm

Symm

3 4 9

2 2 4

0.912 0.998 1.000

0.440 0.451 0.474

3 3 5

1 1 2

0.000 0.000 0.880

1.661 2.109 1.239

1.648 2.104 1.234

5 6 11

2 2 3

0.900 0.814 0.888

1.084 0.952 0.968

1.087 0.950 0.967

1.082 0.948 0.964

3 4 4

1 2 2

0.000 0.972 0.850

1.344 1.275 1.383

0.999 1.158 1.271

0.970 1.142 1.257

0.968 1.120 1.254

3 8 13

1 2 4

0.000 0.952 0.660

0.6666 1.3328∗† 1.9990

0.818 0.804 0.844

0.686 0.680 0.717

0.685 0.678 0.712

0.684 0.678 0.711

2 4 2

1 2 1

0.000 0.972 0.000

0.4903† 0.9806∗ 1.4711

0.768 0.800 0.842

0.629 0.664 0.757

0.622 0.655 0.739

0.621 0.653 0.727

2 4 6

1 2 3

0.000 0.996 0.680

produced in the backfitting algorithm and the number of basis functions K used for g. ˆ Candidates agreeing closely with RCV are marked with an asterisk (∗) and those selected by ALB (minimizing R3 ) are marked with a dagger (†). Risks were calculated using rescaled brightness values and are not comparable for different stars. The table also includes two values describing the shape of the fitted light curve: the number of local minima (#lm) and a measure of symmetry (symm) defined as follows. If #lm = 1 then symm = 0. Otherwise symm = 2 min{d, 1 − d}, where d is the absolute difference between the phase values of the two smallest local minima on the light curve. We thus have 0 symm 1, with symm = 1 when the two local minima are equally spaced over time. Symmetry provides information about the orbit of an eclipsing binary. If the orbit is roughly circular (or if an axis of the orbit points toward the observer) then symm ≈ 1. If the orbit is strongly eccentric, then symm can be substantially less than 1. Hoffmeister et al. (1985, p. 206) give an example (DI Her) with symm ≈ 0.23. Figs. 5 and 6 show plots of data, fitted light curves, and 95% confidence intervals based on M = 1000 bootstrap samples. Note that, with observations on 13 nights, there are a total of 213 = 8192 possible bootstrap samples. The light curve estimates and confidence bounds were evaluated over a grid of 1000 phase values. Individual stars are discussed below. Star 0306: The light curve for period 0.8763 is shown in Fig. 1(a). The effect of dependence patterns is relatively benign here and confidence intervals (not shown) are fairly tight compared with other stars. Star 0969: The light curve for period 2.2113 is shown in Fig. 5(a). The confidence intervals are relatively wide, reflecting substantial dependence patterns. The plot for period 4.4252 in Fig. 5(b) shows how gaps can affect the fitted light curve and confidence intervals. Star 1164: RCV is consistent with 1.4633, butALB indicates a much better fit for 2.9433. Plots are shown in Fig. 5(c,d). Three sets of points are highlighted. Night 783 (◦) indicates lack of fit in plot Fig. 5(c). The points begin at phase 0.8; residuals are near zero up to phase 1.0 and then decrease rapidly between 0.0 and 0.1. This pattern is accommodated in the first dip in Fig. 5(d). The points for night 804 () are consistent with dips in both plots, but with large positive residuals. Points for night 805 (+) occur at the bump near phase = 0.4 in Fig. 5(d). This bump appears to represent

6078

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

Star 0969

Star 0969 -12.18

Brightness

Brightness

-12.18

-12.24

-12.24

-12.30

-12.30 0.0

0.2

0.4

0.6

0.8

0.0

1.0

0.2

Phase for period 2.2113

Brightness

Brightness

0.8

1.0

0.8

1.0

0.8

1.0

0.8

1.0

-12.2

-12.4

-12.6

-12.4

-12.6

-12.8

-12.8 0.0

0.2

0.4

0.6

0.8

0.0

1.0

0.2

0.4

0.6

Phase for period 1.4633

Phase for period 2.9433

Star 4699

Star 4699 -13.40 Brightness

-13.40 Brightness

0.6

Star 1164

Star 1164 -12.2

-13.50

-13.50

-13.60

-13.60 0.0

0.2

0.4

0.6

0.8

0.0

1.0

0.2

0.4

0.6

Phase for period 3.2905

Phase for period 3.2257 Star 4865

Star 4865

-13.3

Brightness

Brightness

0.4

Phase for period 4.4252

-13.5

-13.3

-13.5

-13.7

-13.7 0.0

0.2

0.4

0.6

Phase for period 1.1629

0.8

1.0

0.0

0.2

0.4

0.6

Phase for period 2.3256

Fig. 5. Light curve estimates for four stars: 0969 (a,b), 1164 (c,d), 4699 (e,f), and 4865 (g,h).

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

Star 1744

Star 1744 -12.43

Brightness

Brightness

-12.43

-12.45

-12.47

-12.45

-12.47 0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

Phase for period 0.6666

0.4

0.6

0.8

1.0

0.8

1.0

0.8

1.0

Phase for period 1.3328 Star 5954

Star 5954

-13.60 Brightness

Brightness

-13.60

-13.70

-13.70

0.0

0.2

0.4

0.6

0.8

1.0

0.0

0.2

Phase for period 0.4903

810

810 Time

820

800

800

790

790

780

780 0.2

0.4

0.6

Phase for period 1.3328

0.6

Time versus Phase

820

0.0

0.4

Phase for period 0.9806

Time versus Phase

Time

6079

0.8

1.0

0.0

0.2

0.4

0.6

Phase for period 0.9806

Fig. 6. Light curve estimates for stars 1744 (a,b) and 5954 (c,d). Plots of time versus phase for periods 1.3328 (e), and 0.9806 (f).

a dependence pattern rather than the actual light curve. The gaps on both sides of the bump make it difficult to avoid overfitting gˆ here. The curve in Fig. 5(d) suggests a detached eclipsing binary with a moderately eccentric orbit. Star 4699: ALB selects period 3.2257, with 3.2905 a close second; see Fig. 5(e,f). Plot Fig. 5(e) indicates a good fit consistent with a detached eclipsing binary. Plot Fig. 5(f) reveals a questionable fit in the interval 0.2–0.4 involving the points for night 804 (◦) and the gap to the right of these points. In Oh et al. (2004), RCV agrees closely with 3.2257 but six of the other methods are closer to 3.2905. The unadjusted ALB risk R1 is also minimized at 3.2905. Star 4865: ALB selects 1.1629, RCV is close to 2.3256, and the unadjusted risk R1 is minimized at 2.3256; see Fig. 5(g,h). In Fig. 5(g) the points for night 787 (◦) follow the fitted curve but with large positive residuals. This dependence pattern is reflected in the wider confidence intervals between phase 0.5 and 0.6. In plot Fig. 5(h) the ALB

6080

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

curve seems to ignore night 787 but the wide confidence intervals across the gap reflect a high level of uncertainty. Points for night 782 () have a noticeable pattern, with residuals that are large and positive at the start of the night (phase 0.1 in both plots) but decrease to around zero by the middle of the night. Star 1744: ALB selects 1.3328, with the (near) harmonic 0.6666 a close second; see Fig. 6(a,b). The evidence supporting the larger period is relatively strong here. The test statistic (9) equals 57.8, with P -value < 0.00001 based on the 28 distribution. It is also helpful to examine how phase values for the larger period vary with time; see Fig. 6(e). Observations associated with the two minima in the light curve occur on different nights, but these are well dispersed over time. This dispersion makes it less likely that the difference between the two minimum brightness values is due to temporal dependence. Star 5954: The analysis here is similar to that in the preceding example, but with a different conclusion. ALB selects 0.4903 while RCV is consistent with 0.9806; see Fig. 6(c,d). The test statistic (9) equals 14.3, producing a P-value of 0.074, so there is little evidence that the larger period provides a better fit. The plot of time versus phase in Fig. 6(f) supports this conclusion. Observations associated with the two light curve minima occur over substantially different time intervals, so the difference between minimum brightness values may well reflect changes in observing conditions. This inference is reinforced when one notes the difference between the two local maximum values, which should be the same for an eclipsing binary. It is of course possible that  = 0.4903 yields a correct light curve and that 5954 is an eclipsing binary with circular orbit and eclipses of equal brightness. Results for squared error and Huber loss are similar to those for absolute error loss. For star 0306, both criteria again select period 0.8763. The two outliers (described in the caption for Fig. 1) have greater effect on squared error loss and the resulting choice between harmonics is not as clear cut: R3 (0.8763) = 0.158 and R3 (0.4382) = 0.159. For star 1744 both criteria select the harmonic 0.6666, contrary to the previous selection, but the results are inconclusive. Huber loss has R3 (0.6666) = 1.807 and R3 (1.3327) = 1.845. Squared error loss has R3 (0.6666) = 0.292 and R3 (1.3325) = 0.298. The test statistic (9) for q =2 yields 43.6, with P -value < 0.00001 based on the 28 distribution. For star 5954, both results agree with those using absolute error. The risks for squared error loss are R3 (0.4903) = 0.270 and R3 (0.9805) = 0.298. The test statistic equals 16.0, with a P-value of 0.04.

4. Multiple periodicity This section briefly describes a method for dealing with multiple periodicity. Suppose the light curve is a sum of periodic functions:

yi =

M 

gm (ti /m ) + h(ti ) + ei ,

m=1

where each gm has period 1. Oh et al. (2004) suggest a backfitting algorithm to estimate the periods m and component curves gm . A similar approach can be followed by iteratively applying the strategy of Section 2.2 within each backfitting step. For simplicity, suppose M = 2. First apply the strategy using data {(ti , yi ), i = 1, . . . , n} to get an initial estimate (ˆ 1 , gˆ 1 ). Then iterate between two steps, applying the strategy to the residuals from the fit obtained in the previous step; i.e., (i) calculate (ˆ 2 , gˆ 2 ) using (ti , yi − gˆ 1 (ti /ˆ 1 )) and (ii) calculate (ˆ 1 , gˆ 1 ) using (ti , yi − gˆ 2 (ti /ˆ 2 )). Stop when the estimates have converged. This algorithm was applied to data for a Delta Scuti-type star (0543) previously analyzed by Oh et al. (2004). The ALB results are almost identical to theirs, with period estimates differing by only 0.00001. The algorithm converged immediately; i.e., the initial step gave ˆ 1 = 0.03904, step (i) gave ˆ 2 = 0.04805, and step (ii) reproduced ˆ 1 = 0.03904. The R3 criterion provided a clear winner among local optima in each step. The component functions gˆ 1 and gˆ 2 are shown in Fig. 7, with 95% golden-section bootstrap confidence intervals. Both curves are determined by K = 2 basis functions. Note that gˆ 1 varies around the average brightness level while gˆ 2 varies around zero. The vertical axis in plot (a) represents the residuals yi − gˆ 2 (ti /ˆ 2 ), while that in (b) represents yi − gˆ 1 (ti /ˆ 1 ). The data reveal patterns of temporal dependence similar to those seen in previous examples; e.g., the points for night 789 (◦) have large negative residuals in both plots.

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

a

b

First component

6081

Second component

Residual

Residual

-11.80

-11.83

0.00

-0.02

0.0

0.2

0.4

0.6

0.8

Phase for period 0.03904

1.0

0.0

0.2

0.4

0.6

0.8

1.0

Phase for period 0.04805

Fig. 7. Components gˆ 1 and gˆ 2 of a light curve estimate for a star with multiple periodicity.

5. Concluding remarks This work introduces new methods for estimating the period and light curve of a variable star. The methods were implemented using ALB regression models, a choice supported by theoretical considerations and empirical results. Inherent restrictions on the basis functions allow a parsimonious fit for many types of light curves, with K 4 obtained in all but one of the examples considered here. At the start of this research, it appeared that the stochastic approximation algorithm used to fit ALB would be too inefficient to search over a large set of periods. This problem was overcome by developing a streamlined version of the training algorithm that is used for an initial estimate of the risk function. The algorithm has two desirable properties: computation time increases slowly with n and is about the same for absolute error, squared error, and Huber loss. The data analysis in Section 3 compares the performance of the ALB approach with results reported by Oh et al. (2004) for several alternative methods. It was noted in Section 2.1 that local smoothers tend to produce rougher fits compared with ALB. In my view, this aspect of the comparison is relatively unimportant. Given knowledge of the true period , there are many smoothing methods that will produce acceptable results. A more important and more difficult problem is that of selecting the correct period  among feasible alternatives. The main conclusion drawn from Section 3 is that failure to recognize and account for temporal dependence can result in the selection of a substantially incorrect period (and light curve). The key idea presented in this work is to model temporal dependence as a random “night effect”. Related methods include robust loss functions to compare goodness of fit, a BIC-type penalty to control overfitting, an approximate LRT to compare harmonics, and bootstrap confidence intervals for light curves. These ideas are not intrinsically connected with ALB and can likely be applied using other regression techniques, such as spline and kernel smoothers. There is considerable scope for further work. One subject that remains to be discussed is confidence regions for the period. The approach of Section 2.4 can be modified to construct “local” bootstrap confidence intervals for ; i.e., ˆ calculate bootstrap samples as before and, for each sample, find the assuming that  is in the vicinity of an estimate , ˆ A more challenging problem is that of evaluating uncertainty about different local local minimum of R3 () nearest . minima of the risk function. Attempts to move from “local” to “global” bootstrap confidence regions (not necessarily intervals) would appear to fail on two grounds. First, it does not make sense to generate bootstrap samples using ˆ if ˆ Second, the computational cost of bootstrapping we suspect that the true  is near another local minimum far from . the search for a global optimum would be severe. There is still the option of identifying several candidate periods, as in Table 1, and constructing a local bootstrap confidence interval around each candidate. That would leave the problem of assessing the relative likelihood of candidates. Plots of the data and light curves can be informative in this regard, as seen in the examples, but conclusions based on plots are subjective. The LRT provides a more objective approach for comparing harmonics. More work is

6082

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

needed on the sampling distribution of this test. One option is to calculate a P-value based on a bootstrap distribution rather than the chi-squared distribution. The methods considered here ignore expert knowledge about known categories of variable stars; e.g., relationships among period, brightness, number of local minima, and other characteristics of shape. Period and light curve estimates are important for classifying stars, but the converse may be true as well. Better period estimates might be obtained by combining estimation with classification. A referee suggests a possible extension of the ALB methodology to analyze stars, such as red giants, with nearly periodic light curves. Consider a model yi = g(ti / + a(ti )) + h(ti ) + ei ,

(11)

where a(t) represents a phase shift that varies slowly over time. It may be possible to develop a backfitting algorithm to estimate the phase shift along with the period, periodic curve, and night effect. Acknowledgments I wish to thank Hee-Seok Oh for an introduction to variable stars and for providing the data. Thanks also go to N.G.N. Prasad for suggesting the golden-section bootstrap. I am grateful to several referees for constructive suggestions. This work was partially supported by the Natural Sciences and Engineering Research Council of Canada. Appendix A. Estimation of ALB models Criterion (5) was minimized for fixed  and K using a stochastic gradient algorithm similar to that of Hooper (2001a) but with several modifications. Parameters are initialized as follows:  = 2/K, k = (cos(2k/K), sin(2k/K)) , k = 0, and k = the average of those zi with xi nearest k . Parameters are then repeatedly perturbed as observations (xi , zi ) are randomly selected in a series of “epochs”. In each epoch the entire data set is selected in random order, with independent random permutations for different epochs. Let M be the total number of selections, so M/n is the number of epochs. At the mth selection (x, z), parameters are updated as follows: k ← k + am bk , k ← k + (am /2)bk {k − fK (x)}, k ← k + am bk {k − fK (x)}(x − k ),

(12)

bk = |z − fK (x)|q−1 sign(z − fK (x))k (x).

(13)

where

The parameter  remains fixed. Note that reference points k are not restricted to the unit circle. The same algorithm was used for Huber risk (8), but with (13) replaced by bk = min{c, |z − fK (x)|} sign(z − fK (x))k (x).

(14)

The “gains” am are positive values controlling the magnitude of the perturbations. Two types of gain functions are employed in the algorithm described in Section 2.2. At the initial stage, to obtain rough estimates of the risk for a large number of  values (≈ 56 000), a highly streamlined version of the training algorithm is used, with M ≈ 3000 (rounded up so that M/n is an integer). The gain function has two parts: a burn-in phase, where the gains decrease linearly, then an “estimation” phase, where the gains are held constant. In the burn-in phase, the parameters typically move into a configuration in the vicinity of a (local) optimum. In the estimation phase, the parameters vary with small perturbations, remaining near the optimum. The risk is estimated by averaging over the estimation phase. In the latter stage of the algorithm, to obtain accurate estimates of risk and light curve for a small number of  values, a full version of the training algorithm is used, with M ≈ 300 000. Here the gains have the same form as those in Hooper (2001a), decreasing like 1/m for the first half of the iterations, then linearly to zero over the second half. The shape of the gain function is guided by theory on the convergence of stochastic approximation (Benveniste et al., 1990).

P.M. Hooper / Computational Statistics & Data Analysis 51 (2007) 6070 – 6083

6083

References Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Cáski, F. (Eds.), Second International Symposium on Information Theory. Akademiai Kaidó, Budapest, pp. 267–281. Benveniste, A., Métivier, M., Priouret, P., 1990. Adaptive Algorithms and Stochastic Approximations. Springer, New York, pp. 45–47. Charbonneau, D., Brown, T.M., Latham, D.W., Mayor, M., 2000. Detection of planetary transits across a sun-like star. Astrophys. J. 529, L45–L48. Craven, P., Wahba, H., 1979. Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 317–403. Deeming, T.J., 1975. Fourier analysis with unequally-spaced data. Astrophys. Space Sci. 36, 137–158. Dwortesky, M.M., 1983. A period-finding method for sparse randomly spaced observations or “How long is a piece of string?”. Monthly Notices Roy. Astronom. Soc. 203, 917–924. Friedman, J.H., 1984. A variable span smoother. Technical Report 5. Laboratory for Computational Statistics, Department of Statistics, Stanford University, Stanford. Hall, P., Reimann, J., Rice, J., 2000. Nonparametric estimation of a periodic function. Biometrika 87, 545–557. Härdle, W., Marron, J.S., 1991. Bootstrap simultaneous error bars for nonparametric regression. Ann. Statist. 19, 778–796. Hoffmeister, C., Richter, G., Wenzel, W., 1985. Variable Stars. Springer, Berlin. Hooper, P.M., 2001a. Flexible regression modeling with adaptive logistic basis functions (with discussion). Canad. J. Statist. 29, 343–378. Hooper, P.M., 2001b. Reference point logistic regression and the identification of DNA functional sites. J. Classification 18, 81–107. Hooper, P.M., Mayes, D.C., Demianczuk, N.N., 2002. A model for foetal growth and diagnosis of intrauterine growth restriction. Statist. Med. 21, 95–112. Huber, P.J., 1981. Robust Statistics. Wiley, New York. Lafler, J., Kinman, T.D., 1965. An RR Lyrae survey with the Lick 20-inch astrograph, II: the calculation of RR Lyrae periods by electronic computer. Astrophys. J. (Suppl. Ser.) 11, 216–222. Lomb, N.R., 1976. Least-squares frequency analysis of unequally spaced data. Astrophys. Space Sci. 39, 447–462. Oh, H.S., Nychka, D., Brown, T., Charbonneau, P., 2004. Period analysis of variable stars by robust smoothing. Appl. Statist. 53, 15–30. Reimann, J.D., 1994. Frequency estimation using unequally-spaced astronomical data. Ph.D. Dissertation. Department of Statistics, University of California at Berkeley, Berkeley. Scargle, J.D., 1982. Studies in astronomical time series analysis, II: statistical aspects of spectral analysis of unevenly spaced data. Astrophys. J. 263, 835–853. Schwartz, G., 1979. Estimating the dimension of a model. Ann Statist. 6, 461–464. Stellingwerf, R.F., 1978. Period determination using phase dispersion minimization. Astrophys. J. 224, 953–960.