Electoral Studies 28 (2009) 354–367
On filtering longitudinal public opinion data: Issues in identification and representation of true change

Mark Alexander Pickup (University of Oxford, United Kingdom) and Christopher Wlezien (Temple University, United States)
Keywords: Kalman filter; State space modelling; Time series; Dynamics; Public opinion

Abstract
The Kalman filter is a popular tool in engineering and economics. It is becoming popular in political science, touted for its abilities to reduce measurement error and produce more precise estimates of true public opinion. Its application to survey measures of public opinion varies in important ways compared to the traditionally understood Kalman filter. It makes a priori assumptions about the variance of the sampling error that would not usually be made and does so in a way that violates an important property of the Kalman filter. Consequently, the behavior of the filter modified for public opinion measures is less well-known. Through simulations we assess whether and to what extent filtering: reliably detects the characteristics of time series; does so across series with different rates of autoregressive decay; and does so when the variance of the sampling error is unknown. We also examine whether the filtered data represent the level of true underlying variance and the extent to which filtering assists or hinders our ability to detect exogenous shocks. We learn a number of things. Most importantly, taking into account sampling error variance when filtering data can work well, though its performance does vary. First, filtering works best identifying time series characteristics when assuming a stationary process, even if the underlying process contains a unit root. Second, the performance of filtering drops off when we incorrectly specify the variance of the sampling error, and especially when we overestimate it. Third, when estimating exogenous shocks it is better to make no a priori assumptions regarding the measurement error variance unless we are absolutely certain we know what it is. In fact, applying the filter without specifying the measurement error variance is more often than not the best choice. © 2009 Elsevier Ltd. All rights reserved.
1. Introduction

Public opinion time series data represent a combination of true values and error. There are different sources of error. Observations can reflect what we call ''design'' effects, borrowing from the polling literature. These represent errors introduced by our measurement approach. In polling, for instance, clustering, stratifying, and the like can introduce error into poll results (Groves, 1989). When studying pre-election polls, the main source of design effects surrounds the polling universe. It is not easy to
determine in advance who will vote on Election Day, and all one can do is estimate the voting population.1

1 Pollsters attempt to ascertain each respondent's likelihood of voting based on such factors as interest in the election, self-reports of frequency of voting in the past, and the like. With this information about the voting likelihood of individual respondents, pollsters use two different approaches to obtaining likely voter samples. By the probability approach, pollsters weight individual respondents by their estimated probability of voting. Thus, the result of a poll is the sum of weighted votes from the survey. The cut-off approach is simpler. Instead of assigning gradations of voting likelihood, pollsters simply divide their registered respondents into two groups – the likely and unlikely voters. The result of the poll is the preference reported for the set of likely voters. Also see Wlezien and Erikson (2007).

Whether and how an organization constructs a likely voter sample can have meaningful consequences for the results, that is, to the extent polls overestimate participation of one segment of the population and underestimate another (see Traugott, 2001). Another important source of error is from sampling. We rarely have data for populations taken as a whole. All polls obviously contain some degree of sampling error. This clearly impacts all research on public opinion, and electoral and voting behavior research that relies on samples of the public. It also is an issue in other areas of political science (social sciences more broadly) that use data that are perhaps less obviously based on samples. Well-known economic measures such as inflation and unemployment contain such error, though a smaller amount than for the typical opinion poll. Thus, changes are observed from period to period even when the true underlying value is constant and unchanging. This is well-known. The problem is that one cannot literally separate sampling error from reported preferences. Sampling error is random, after all. If, however, we could determine the dynamics of the true underlying values of public opinion and the variance of the measurement error, it would be possible to at least partially address the consequences of measurement error. This would, on average, provide better estimates of our variable of interest. Unfortunately, determining these parameters is often not easy to do. Some scholars – most notably Green et al. (1999) – have advocated a variant of the Kalman filter that takes into account expected sampling error variance. Other scholars have adopted this approach in their estimates of survey house effects (Jackman, 2005; Pickup and Johnston, 2008).2 In so doing, they provide estimates of real underlying movement. Of course, because sampling error is random, any general filtering method will necessarily misrepresent the history of the true, underlying series. That is, it will subtract out more than it should at some points in time and less than it should at others. Although this is well-known, little is known about the extent to which it is true for the methods commonly utilised by political scientists. How much does filtering distort? We are interested in whether we can reliably assess the true characteristics of time series. Does filtering allow us to detect whether a series is stationary, where all shocks to the series decay, or integrated, where shocks last? Does it do so across stationary
2 ‘‘House’’ effects represent the consequences of survey organizations (or ‘‘houses’’) employing different methodologies, including design itself. Indeed, much of the observed difference across organizations may reflect underlying differences in design, for instance, when survey houses weight using different distributions of party identification (Wlezien and Erikson, 2001). Results can differ across these houses for other reasons, including data collection mode, interviewer training, procedures for coping with refusals, and the like (see Converse and Traugott, 1986; Lau, 1994; also see Crespi, 1988). Whatever the source, time series results may bounce around from period to period simply because results reported for different periods are generated by different organizations. This clearly complicates the assessment of change over time. A good amount of work has been done (and is underway) in estimating and correcting for the differences across organizations, thus better enabling us to combine data from different sources (Erikson and Wlezien, 1999; Jackman, 2005; Wlezien and Erikson, 2007; Pickup and Johnston, 2008).
series with different rates of autoregressive decay? We also are interested in assessing shocks that we associate with specific events. With filtering can we better identify the existence and nature of shocks? These questions are motivated by a basic suspicion, namely, that filtering can underestimate the extent of true movement in a series. This may be true even where we have full information about the sampling error variance. It is of special concern where we do not have good information, and particularly where we assume too much. Without accurate information about the error variance, the filter may purge real short-term movement in our underlying series, thereby understating the true change. This is a real problem because it is easy to overestimate the amount of error variance in polls in the modern day, as few survey organizations report results of random samples. Post-stratification is commonly used to weight sample characteristics to population values. To the extent post-stratification variables are related to our variable of interest, such as vote preference, weighting will reduce the observed error variance (Little, 1993).3 Consider pre-election trial-heat polls in the US, every one of which seemingly is weighted to reflect some projected Election Day population (Wlezien and Erikson, 2001). It is difficult to determine the voting population well in advance, and so the vector of weights will always be imperfect, but weighting by variables that predict vote preference on Election Day will tend to reduce observed sampling error variance over time.4 This is true even if one gets the particular weights wrong, so long as the weights remain fairly constant over the course of the campaign: overweighting high income voters may lead one to consistently overestimate support for a Republican presidential candidate in the US, for example, but it nevertheless will reduce the bouncing around of reported results from poll to poll.5 Recent election polling in the US confirms this suspicion. For this expository analysis we use the results of tracking polls conducted by survey organizations at the end of the 2004 presidential election campaign, specifically, ABC, TIPP,6 Reuters/Zogby, and Democracy Corps (conducted by Greenberg Quinlan Rosner Research). No other organizations conducted tracking polls during October of 2004.7 In the tracking poll data the results are moving averages, where respondents interviewed today are included in poll results reported tomorrow and usually the day after and sometimes the day after that. We do not want
3 Such weights can increase the error variance for unrelated measures.
4 This is true to the extent the variables are exogenous to the campaign.
5 There are other practices, besides weighting, that can reduce error variance; e.g., a survey organisation may adjust its bottom-line results based on what other organisations report or on what its own previous surveys have suggested. There are also design effects that may increase error variance, as will weighting for nonresponse.
6 Technometrica Institute of Policy and Politics.
7 See PollingReport.com. It is worth noting that the Washington Post also reported tracking poll results based on data collected together with ABC – the reported results differ (and only a little) because of slight differences in weighting.
Table 1
An expository analysis of variance – trial-heat polls during the final month of the 2004 US Presidential election campaign.

Survey House       Number of polls   Mean number of respondents   Observed variance   Estimated error variance   Implied true variance
ABC                8                 1256                         0.90                1.99                       -1.09
Democracy Corps    7                 1030                         0.32                2.43                       -2.11
Reuters/Zogby      10                1200                         0.92                2.08                       -1.16
TIPP               6                 1284                         1.09                1.95                       -0.86
to count the same respondents on multiple days, and it is very easy to remove this overlap, e.g., for a survey house reporting 3-day moving averages, we only use poll results for every third day. We then calculate the Bush share of the two-party vote intention, ignoring all other candidates, e.g., Nader. To be absolutely clear, the number on a particular day is the Bush share of all respondents reporting a Bush or Kerry preference in the poll. To illustrate the effects of weighting, Table 1 compares the variance in poll results that we observe with the amount that we would expect based on sampling error alone. Assuming random sampling (or its equivalent) by pollsters, the latter is easy to compute (see Heise, 1969) – for each daily poll the estimated error variance is p(1 − p)/N, where p = the proportion voting for Bush. We start with the observed variance in the polls for the year – the mean squared deviation of poll results from the observed mean. Then we estimate the measurement error variance using the above equation. Finally we calculate the true variance by simply subtracting the estimated error variance from the observed variance.8 The surplus (observed minus error) variance equals the implied true variance of presidential preferences. Table 1 displays the numbers. In the table we can see that the daily observed variance is very stable, about 0.8 on average, and ranges between 0.32 and 1.09 depending on the survey house. We also can see that the estimated error variance is much larger than what we observe, over 2.1 percentage points on average. The implied true variance thus is negative, and substantially so. Indeed, we observe approximately only 40% of the variance that we would expect based on random sampling alone. Of course preferences may have changed during this period, and surely did at least to some extent. There also is other survey error, as discussed above. The problem is that we do not know how much and cannot tell from this analysis because we do not know how much of the variance is due to error. Most importantly for our research in this paper, assuming random sampling greatly exaggerates the amount of error variance that we observe in practice. We are interested in how poor estimation of measurement error affects the mechanics of filtering. Even where we do have full information about sampling error, we want to know how reliable filtering is. We will be examining the variant of the Kalman filter typically applied to political science public opinion data (Beck, 1989, 1991; Green
8 The measurement errors are assumed to be independent of the true movements.
et al., 1999; Jackman, 2005; and Pickup and Johnston, 2008). The traditional Kalman filter produces the optimal prediction of a time series at time t, only considering the observations previous to time t, and produces the optimal filtered values considering information previous to and including time t. However, the filter that has become popular in political science specifies the variance of the measurement error a priori. The way in which this is done means that the filter used is not strictly speaking a Kalman filter, as described in the next section. Consequently, the properties of the traditional Kalman filter, including optimality, cannot be assumed for the variant that is actually applied to political science survey data. This is particularly true when the specified measurement error is incorrect. To examine the consequences of this filtering approach, we use a series of basic Monte Carlo simulations. We design a number of series that we think represent the types of processes that scholars of elections and public opinion often encounter and assess the performance of the filter. We focus on three main things: (1) the identification of the characteristics of the time series, (2) the estimation of the true variance of innovations to the series, and (3) the estimation of the effects of particular events. Let us see what the analyses reveal.

2. Sampling error and the Kalman filter

Kalman filtering was developed in 1960 by engineer Rudolf Kalman to improve the separation of signal from noise in missile guidance systems. Since then, the filter has been applied to many areas: radar tracking, inertial navigation systems, weather forecasting, and autopilot systems are just a few examples. More recently, it has captured the attention of economists and political scientists. In political science, it has become particularly touted for its ability to filter sampling error from public opinion survey data. Donald Green, Alan Gerber, and Suzanna De Boef have even produced an easy-to-use Web-based program (Samplemiser) that applies the Kalman filter to longitudinal public opinion data. In order to apply the Kalman filter, a series is modelled as the combination of signal and noise. The signal is produced by levels and changes in the true underlying process and the noise is produced by measurement error. To do this, state space modelling is utilised. For example, a longitudinal measure of public opinion y_t can be represented as:
y_t = μ_t + ε_t,    ε_t ~ NID(0, σ²_ε)
μ_t = ρ μ_{t−1} + ξ_t,    ξ_t ~ NID(0, σ²_ξ)
This is a linear Gaussian state space model. The first line is called the observation or measurement equation. It represents the observed measure of public opinion at time t as the combination of an unobserved state μ_t and a disturbance ε_t. The unobserved state is interpreted as the actual value of public opinion at time t. The disturbance of the observation equation (ε_t) is the measurement error. The second line is the state equation. It models the dynamics of true public
opinion over time. In this example, it is modelled as a first order autoregressive process. It can also be modelled as a random walk or any other of an infinite number of more complex dynamic processes. The level disturbance (ξ_t) and the measurement error (ε_t) are assumed to be serially and mutually independent and normally distributed with zero mean and variances σ²_ξ and σ²_ε respectively (Commandeur and Koopman, 2007). The Kalman filter is a recursive algorithm that incorporates the parameters from the state space model (in the current example: σ²_ε, σ²_ξ and ρ) to produce the optimal prediction of μ_t only using observations prior to time t, {y_1, y_2, ..., y_{t−1}}. The discrepancy between this prediction and the observed data at t is called the prediction error. The prediction errors are of key importance to the Kalman filter, in that the parameters of the state space model are estimated such that they minimise these errors and their variance. This is different from classical regression, in which estimation is done by minimising the observation errors rather than the prediction errors. Once the parameters of the state space model of public opinion are estimated, they can be used to produce the optimal values of the state at time t using observations {y_1, y_2, ..., y_t}. These are called the ''filter states.'' The algorithm can also be used to produce optimal values of the state at time t using all available observations {y_1, y_2, ..., y_n}. These are called the ''smoothed states.'' They are the best estimates of μ_t – true public opinion – given the data. When the Kalman filter has been applied to public opinion survey data in the political sciences, the variance of the measurement error (σ²_ε) is specified as a function of the number of individuals polled at time t and the smoothed value of public opinion at time t−1.9 For example, when the measures of public opinion are proportions and come from surveys of N respondents, σ²_ε is specified to be μ_t(1 − μ_t)/N. With additional information of this sort it is possible to allow σ²_ε to vary over time (as a function of N and μ_t). This is the basis of the Kalman filter as it is applied by Beck (1989), Green et al. (1999), Jackman (2005) and Pickup and Johnston (2007). Specifying the measurement error variance in this way produces a filter that is not strictly speaking the Kalman filter. As a consequence, it loses some of the properties of the Kalman filter, including the guarantee of optimality (see Appendix for explanation). This is a minor concern compared to the potentially greater problem of misspecifying the variance of the measurement error, which may have serious consequences. The assumption that measures of public opinion come from true probability samples is problematic. In pre-election polling research, for example, this is rarely the case any more. Where random sampling was the norm 30 years ago, now survey organizations rely on likely voter screens and weighting procedures. Unfortunately, we know only a little about the specific practices of the different organizations or their consequences. It thus is difficult to determine sampling error across survey organizations. How much error on average do we expect for different organizations?
9 Sometimes the observed values are used but this is technically incorrect.
We just do not know. We also do not know how much error to expect over time, as practices can change and some practices induce more random error than others (consider Erikson et al., 2004). The absence of good information about sampling error complicates the properties of the Kalman filter as it is applied in political science. If we guess and err we introduce additional error. It is well-known that underestimating the measurement error variance (in the extreme, ignoring it) can result in negatively correlated prediction errors. This fact is sometimes, rightly, used to recommend the use of filtering (Beck, 1989). It is equally true that overestimating the measurement error variance will result in positively correlated prediction errors. Either way, the correlation of prediction errors means that the Kalman filter does not result in optimal estimates of μ_t – the reason for this is outlined in the Appendix. Intuitively, underestimating is the lesser concern. If our guess is too high, and we overstate the sampling error variance, the filter will tend to overcorrect, and the estimated model will be ''under adaptive,'' mistaking a proportion of the true innovations for measurement error. This can also be expected to produce problems when estimating the presence of external shocks to public opinion. Mistaking a proportion of the true innovations for measurement error will increase the true standard errors of the estimated parameters while simultaneously leading us to underestimate those standard errors. (For more detail, see the Appendix.) In this circumstance, filtering will not only distort history; it may also misidentify the general characteristics of the true process, such as the degree of true variance and the magnitude and statistical significance of external shocks. The problem, of course, is knowing whether and to what extent our guess is too low or too high. Of course, we can simply assume no prior information about sampling error. If nothing is known about σ²_ε it can be estimated, as long as it can be assumed it does not vary over time – although this assumption can be relaxed, as we discuss at the end of our analysis. This is the traditional Kalman filter. The filtered series still will misrepresent the true history, by definition and according to the proportion of sampling error, though they should nicely reflect the general time series characteristics of the underlying process and also the true variance, at least given sufficient data. In much political science research – indeed, much social science research – we expect neither purely stationary nor nonstationary processes but rather a combination of the two (Wlezien, 2000). In these ''combined'' processes some shocks persist indefinitely and others decay. It may be that individuals react in different ways to the same shocks, where shocks decay immediately for half of the population and cumulate fully for the other half. Alternatively, it may be that people react identically to shocks but the effect of half of each shock decays immediately and the other half lasts indefinitely. Regardless, in the long-run such series will behave as a unit root process (Granger, 1980). In addition to assessing the performance of filtering with stationary and traditional unit root time series, we are interested in how it works with such combined series, particularly without good information about the sampling
error variance. We expect the filter to better detect the underlying unit root in such combined series as the stationary portion decreases. Recently, it has become popular to apply Bayesian methods to the estimation of the state space model parameters. Traditionally the state space equation would be estimated using maximum likelihood. Alternatively, the unobserved states (μ_t) of the state space equation can be simulated through Markov Chain Monte Carlo (MCMC). This is a more direct way of producing the smoothed states (true public opinion) but does not directly produce the filtered states, although these can always be estimated after the fact. The approach also differs from the traditional Kalman filter in its properties. (Again, though, the most pressing concern is the consequences of misspecifying the measurement error variance.) Due to its increasing popularity, it is this variant of the Kalman filter that we focus on in our analysis. We have spelled out our general expectations. Now let us provide greater specifics.
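To fix ideas, the following is a minimal base-R sketch of the filtering recursions and the prediction error (maximum likelihood) estimation described above, for the AR(1) state space model. It is only an illustration, not the procedure used by any of the cited authors: the initialisation, the parameter transformations, the example data and the serial-correlation check are our own assumptions.

```r
# Kalman filter for y_t = mu_t + eps_t, mu_t = rho*mu_{t-1} + xi_t.
# Returns filtered states and one-step prediction errors and variances.
kalman_filter <- function(y, rho, var_xi, var_eps) {
  n <- length(y)
  filt <- numeric(n); v <- numeric(n); F <- numeric(n)
  a <- 0; P <- 1e7                      # (near-)diffuse initialisation
  for (t in 1:n) {
    a_pred <- rho * a                   # prediction of mu_t given y_1..y_{t-1}
    P_pred <- rho^2 * P + var_xi
    v[t] <- y[t] - a_pred               # prediction error
    F[t] <- P_pred + var_eps            # prediction error variance
    K <- P_pred / F[t]                  # Kalman gain
    a <- a_pred + K * v[t]              # filtered state given y_1..y_t
    P <- (1 - K) * P_pred
    filt[t] <- a
  }
  list(filtered = filt, pred_error = v, pred_var = F)
}

# Gaussian log-likelihood from the prediction error decomposition.
# The first observation is dropped because of the diffuse initialisation.
negloglik <- function(par, y, var_eps = NULL) {
  rho <- tanh(par[1])                   # keep rho in (-1, 1)
  var_xi <- exp(par[2])
  v_eps <- if (is.null(var_eps)) exp(par[3]) else var_eps
  kf <- kalman_filter(y, rho, var_xi, v_eps)
  v <- kf$pred_error[-1]; F <- kf$pred_var[-1]
  0.5 * sum(log(F) + v^2 / F)
}

# Illustrative data: a true AR(0.8) series observed with N(0,1) sampling error
set.seed(1)
mu <- as.numeric(arima.sim(list(ar = 0.8), n = 100))
y  <- mu + rnorm(100)

# Specifying the sampling error variance (here, its true value of 1)...
fit_known   <- optim(c(0, 0), negloglik, y = y, var_eps = 1)
# ...versus leaving it unspecified, as in the traditional Kalman filter
fit_unknown <- optim(c(0, 0, 0), negloglik, y = y)
rho_hat <- tanh(fit_unknown$par[1])

# A simple check for correlated prediction errors (see Section 3.4)
kf <- kalman_filter(y, rho_hat, exp(fit_unknown$par[2]), exp(fit_unknown$par[3]))
Box.test(kf$pred_error[-1], lag = 1, type = "Ljung-Box")
```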
3. Simulating the consequences of filtering

Table 2
The different time series.

The Component Series
1. AR(1) with rho = 0.8, where shocks are drawn from N(0,1)
2. AR(1) with rho = 0.5, where shocks are drawn from N(0,1)
3. I(1), where shocks are drawn from N(0,1)
4. S is drawn from N(0,1)
5. AR(1) with rho = 0.8, where the deterministic shock equals 0.5 at t = 10 and the stochastic shocks are drawn from N(0,1)
6. AR(1) with rho = 0.8, where the deterministic shock equals 2.0 at t = 10 and the stochastic shocks are drawn from N(0,1)
7. AR(1) with rho = 0.8, where the deterministic shock equals 3.5 at t = 10 and the stochastic shocks are drawn from N(0,1)

The Simulated Series
1. AR component series 1 + S
2. AR component series 2 + S
3. Integrated component series 3 + S
4. AR component series 1 plus Integrated component series 3 + S
5. AR component series 2 plus Integrated component series 3 + S
6. 20% AR series 2 + 20% Integrated series 3 + 60% S
7. AR component series 5 + S
8. AR component series 6 + S
9. AR component series 7 + S

To see the implications of filtering, we consider a set of simulations. For each simulation, a series of 10,000 numbers was drawn randomly from a normal distribution with a mean of 0 and a variance of 1. Using these numbers we generated series with different time series characteristics. To begin with we produced an autoregressive (AR) process with a rho of 0.8. Since the level of stationarity may matter for our exercise, we also produced a separate AR process with a rho of 0.5. Next we produced a pure unit root (UR) process. Then, using these AR and UR processes we created two combined series, initially weighting each component equally, which merely involves adding the AR and UR series. These five series (two AR, one UR and two combined) are the underlying ''true'' series of interest. We want to compare the extent to which filtering allows us to identify these series and represent them accurately in the presence of sampling error, relative to not filtering. We thus need to add an amount of error to each of our series. For this analysis, we generate an additional series of random numbers from a normal distribution (N(0,1)). We then add it to each of the five series. We also created a series that we believe has the same error variance (and dynamics) that we see on average during the last 60 days of presidential election campaigns (Wlezien and Erikson, 2002). That is, we expect a good amount of random noise but also real campaign shocks, some of which last and others of which decay. A weight of 60% on the random series generated the expected proportion of true to observed variance that we see on average, and we simply weighted the UR and AR (rho = 0.5) components equally, i.e., 20% each. Finally, to assess the effects of hypothetical campaign shocks, we added exogenous shocks to the random series, 100 time points apart, beginning at time point ten. This was done three ways, using shocks of magnitude 0.5, 2.0 and 3.5, the sizes of which were chosen based on the effects of major events during presidential election campaigns in the US. Holbrook (1996) has shown that the mean convention effect is 6.5 percentage points and the mean debate effect 2.2 percentage points. These effects can be expressed as ratios to the sampling error variance (about 1.9) we would expect in polls conducted during the last 60 days of the campaign (Wlezien and Erikson, 2002). The average convention effect is 3.4 times the expected error variance and the average debate effect is about 1.2 times as much. Given that our randomly drawn series has an error variance of 1.0 by construction, a shock of 3.5 is proportionate to what we would expect from a convention effect and a shock of 2.0 is about double what we would expect
Table 3
Characterizing the time series process – mean rhos for the filtered series, generated assuming an AR process (N = 100). Entries are mean rho, with standard deviation in parentheses; columns refer to the true underlying process, filtering assuming perfect information about sampling error, filtering assuming no information about sampling error, and no filter.

Series                                   True    Perfect information   No information   No filter
AR(0.8) + S                              0.80    0.76 (0.01)           0.75 (0.02)      0.56 (0.00)
AR(0.5) + S                              0.50    0.46 (0.02)           0.41 (0.04)      0.28 (0.01)
I + S                                    1.00    1.00 (0.00)           1.00 (0.00)      0.99 (0.00)
Combined: AR(0.8) + I + S                1.00    1.00 (0.01)           1.00 (0.00)      0.99 (0.00)
Combined: AR(0.5) + I + S                1.00    1.00 (0.00)           1.00 (0.00)      0.99 (0.00)
Combined: 0.2*AR(0.5) + 0.2*I + 0.6*S    1.00    1.00 (0.00)           1.00 (0.00)      0.97 (0.00)
from a typical debate. A shock of 0.5 is roughly half of an expected debate effect. After adding the shocks to the random series, we generated three new AR series with a rho of 0.8 for each. This gives us nine series, which are summarized in Table 2. Now, we want to see how filtering technology works with these different series. To do this, we treat each simulated series of 10,000 observations as 100 separate time series of 100 observations. We then apply the Kalman filter to each of the separate series and summarize the performance. As discussed, applying the filter requires us to make different assumptions, especially (1) whether the underlying process is autoregressive or unit root and (2) information about the level of sampling error. Recall that the common assumption is that the process is a unit root and that the level of sampling error variance is known (Pickup and Johnston, 2008; Jackman, 2005). We relax these assumptions here. We begin with the eight series without exogenous shocks and assume an autoregressive process and estimate the rho. We first do this assuming perfect information about sampling error variance (N(0,1)). We then assume no information about sampling error variance. Table 3 summarizes the estimates of rho under the two conditions. Specifically, it shows the means and standard deviations of the estimated rhos for the different series under the two conditions. For comparison, it shows the rhos for the true underlying series, and the means and standard deviations of the rhos estimated without filtering.
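As a concrete illustration, series of the kind listed in Table 2 can be generated along the following lines. This is a minimal sketch: the random seed and the particular base-R functions are our own choices, while the weights and shock sizes follow the text.

```r
# Component and simulated series in the spirit of Table 2
set.seed(42)
n <- 10000
shocks <- rnorm(n)                                  # innovations, N(0,1)
ar8 <- as.numeric(arima.sim(list(ar = 0.8), n = n, innov = shocks))
ar5 <- as.numeric(arima.sim(list(ar = 0.5), n = n, innov = shocks))
ur  <- cumsum(shocks)                               # pure unit root, I(1)
S   <- rnorm(n)                                     # sampling error, N(0,1)

sim1 <- ar8 + S                                     # simulated series 1
sim3 <- ur + S                                      # simulated series 3
sim4 <- ar8 + ur + S                                # combined series
sim6 <- 0.2 * ar5 + 0.2 * ur + 0.6 * S              # 'realistic' weighted series

# Deterministic shocks of 2.0 at t = 10, 110, 210, ... added to the random
# draws before generating an AR(0.8) series (component series 6 in Table 2)
det <- rep(0, n); det[seq(10, n, by = 100)] <- 2.0
sim8 <- as.numeric(stats::filter(rnorm(n) + det, 0.8, method = "recursive")) + S

# Treat each simulated series as 100 separate series of 100 observations
samples <- split(sim1, rep(1:100, each = 100))
```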
3.1. An analysis of time series characteristics

The results in Table 3 tell us quite a lot. Notice first that the mean rhos for the filtered series are close to the true rhos. This clearly is true for the unit root and combined series, where the means all are 1.0. Even assuming an autoregressive process, therefore, filtering can do a good job identifying a unit root in a series – one does not need to assume a unit root when modelling time series. Indeed, the process can be determined empirically. This is of importance because when assuming a unit root, scholars will misrepresent stationary processes, by definition.10 Filtering works slightly less well for stationary series in representing the true process, as the mean rhos are all a little below the true rhos. However, they are much closer than what we estimate when not using a filter – without a filter, the true rho of 0.8 is estimated to be 0.56 on average and the rho of 0.5 is estimated to be 0.28. Importantly, whether one specifies the sampling error variance makes little difference. The average rhos when not assuming any information about error variance are only marginally lower than those when assuming the correct amount of error variance. The average performance in Table 3 conceals a good amount of variance in the estimated rhos. Table 4 provides a more direct assessment. It shows the root mean squared
10 Filtering still cannot, at least using the basic setup, differentiate pure random walks and combined series. That is, we cannot tell whether all shocks last or just some portion of shocks. This is of obvious consequence when identifying the characteristics of time series, and indicates the need for additional tests to differentiate unit root and combined series.
Table 4
Root mean squared errors of rhos for the filtered series, where filtered data are generated assuming an AR process (N = 100). Columns refer to filtering assuming perfect information about sampling error, filtering assuming no information about sampling error, and no filter.

Series                                   Perfect information   No information   No filter
AR(0.8) + S                              0.12                  0.17             0.27
AR(0.5) + S                              0.21                  0.40             0.26
I + S                                    0.01                  0.01             0.02
Combined: AR(0.8) + I + S                0.01                  0.01             0.03
Combined: AR(0.5) + I + S                0.01                  0.01             0.03
Combined: 0.2*AR(0.5) + 0.2*I + 0.6*S    0.01                  0.01             0.09
errors (RMSEs) for the rhos of the filtered series and the unfiltered series. First consider the AR series. As in the previous table, the filtered series are generally a clear improvement over the unfiltered series. This only fails to be the case for the estimate when the true rho is 0.5. While the estimated rho in this case is fairly accurate on average, the variance of the estimates is very large. This is a consequence of only having 100 data points to identify a relatively small autoregressive term. It is particularly problematic when there is no information about the measurement error variance. Not surprisingly, there are no equivalent problems when dealing with series containing a unit root. The RMSEs are low for the filtered unit root series (with or without sampling error variance information). These all mark real improvements over the RMSEs for the unfiltered unit root series, though notice that these are also relatively low. The message of Table 4 is a slightly nuanced version of that from Table 3. In terms of estimating the autoregressive dynamics of a time series, using a filter is always better than not, and perfectly specifying the sampling error variance provides little gain over making no such specification unless the autoregressive parameter is relatively low. In this particular scenario, data limitations will result in a large variance for the estimated rho when sampling error is unknown.

3.2. An analysis of variance

Thus far, we have seen that the performance of filtering technology in identifying the characteristics of a time series varies quite a lot. Let us now see how it works to estimate the variance of the true innovations. For this analysis, we simply compare the true innovation variances with the innovation variances of the 100 samples of the smoothed state series. The first column of Table 5 indicates the true innovation variance. The second column of the table reports results for the innovations of the smoothed series generated assuming perfect information about sampling error. The third column of the table shows results where estimates were generated assuming no information about sampling error. The fourth column describes the estimates when not using a filter. The results in Table 5 parallel what we saw in Table 3. Comparing the first and second columns, we can see that the filtered estimates on average nicely reflect the amount
Table 5
Mean innovation variance for the filtered series, generated assuming an AR process (N = 100). Entries are mean variance, with standard deviation in parentheses; columns refer to the true underlying process, filtering assuming perfect information about sampling error, filtering assuming no information about sampling error, and no filter.

Series                                   True    Perfect information   No information   No filter
AR(0.8) + S                              1.0     1.09 (0.03)           1.13 (0.06)      2.55 (0.04)
AR(0.5) + S                              1.0     1.03 (0.03)           0.64 (0.06)      2.24 (0.03)
I + S                                    1.0     1.07 (0.03)           1.06 (0.04)      3.17 (0.05)
Combined: AR(0.8) + I + S                2.0     2.10 (0.05)           1.86 (0.07)      4.33 (0.06)
Combined: AR(0.5) + I + S                2.0     2.08 (0.05)           1.50 (0.06)      4.56 (0.07)
Combined: 0.2*AR(0.5) + 0.2*I + 0.6*S    0.4     0.02 (0.00)           0.05 (0.00)      0.85 (0.01)
of true innovation variance when we have perfect information about sampling error. This actually comes as little surprise given that the true variance is exactly equal to the observed variance minus the error variance. The third column of the table shows that things change little when assuming no information. Here, as before, the autoregressive series with a rho of 0.5 presents a challenge when measurement error variance is unknown. The smoothed state series shows less variance than the underlying true series. In all cases, though, the estimated innovation variances from the filtered series are a clear improvement over those from the unfiltered series. In addition to the average estimated values of the variances, we are interested in the RMSEs for the estimated variances. These are shown in Table 6. Again, the RMSEs for the filtered series are substantially lower than for the unfiltered series. Unlike before, with the estimation of rhos, whether the true series is autoregressive, a pure random walk or a combined series does not appear to matter. The gain to be obtained from filtering is consistent. Even more so than before, a priori information about the amount of sampling error provides little apparent advantage when filtering.11 We have seen that filtering works well in characterizing a unit root process even when assuming an autoregressive process. An alternative is to assume a unit root. What happens when we do? Tables 7 and 8 present comparable results to those in Tables 5 and 6. They present the mean estimated innovation variances and their RMSEs. This time we start with the unit root processes. Assuming a unit root process produces no better estimates of the innovation variances compared to those obtained from estimating the AR process. In both instances the estimated innovation variances are quite accurate. When the true process is
autoregressive, however, there is a clear difference. Assuming a unit root process does far worse. Of course we know that the assumption would lead the filter to misrepresent the rhos, by definition. Table 7 shows further that assuming a unit root clearly underestimates the variance of innovations. The performance of the filter under this assumption worsens as the true process departs further and further from unit root. This is true whether or not the measurement error is specified, though the problems are slightly greater when the true measurement error is unknown. It is important to note from Table 7 that filtering always does better than not filtering, even where the unit root assumption is erroneous, and whether or not measurement error is specified. In Table 8 we can see that filtering not only does better on average, it does so consistently – that is, the RMSEs for the filtered series always are lower.
3.3. Consequences of misspecifying the measurement error variance

So far we have considered the cases of perfect information regarding the measurement error variance and no information. However, we have argued that when filtering is applied to survey data in political science the most likely scenario is misspecification of the measurement error variance. In order to examine the consequences, we have applied the Kalman filter to the 6th simulated series – designed to correspond with what some believe is ''realistic'' – assuming different levels of measurement error variance. We assumed no measurement error (no
Table 6
Root mean squared errors of the innovation variances, where filtered series are generated assuming an AR process (N = 100). Columns refer to filtering assuming perfect information about sampling error, filtering assuming no information about sampling error, and no filter.

Series                                   Perfect information   No information   No filter
AR(0.8) + S                              0.43                  0.63             1.63
AR(0.5) + S                              0.44                  0.76             1.32
I + S                                    0.45                  0.53             2.27
Combined: AR(0.8) + I + S                0.75                  0.86             2.51
Combined: AR(0.5) + I + S                0.78                  0.92             2.75
Combined: 0.2*AR(0.5) + 0.2*I + 0.6*S    0.38                  0.35             0.48

11 One might ask how robust the model estimations are to alternate lag structure specifications. We have run simulations to examine what happens when the lag structure is misspecified – e.g., assuming an AR(2) structure when the correct lag structure is AR(1). The answer is that the root mean squared error for the time series estimated without fixing the measurement error variance is only fractionally larger than it is with the properly specified lag structure. It still is less than that from the time series estimated by fixing the measurement error variance at the correct or an incorrect value. This suggests that the estimation procedure is robust to lag structure misspecification. Of course, one could so badly misspecify the lag structure that the resulting estimation is simply wrong but, importantly, there is no evidence that fixing the measurement error variance does anything to resolve this problem.
Table 7
Mean innovation variance for the filtered series, generated assuming a unit root process (N = 100). Entries are mean variance, with standard deviation in parentheses; columns refer to the true underlying process, filtering assuming perfect information about sampling error, filtering assuming no information about sampling error, and no filter.

Series                                   True    Perfect information   No information   No filter
AR(0.8) + S                              1.0     0.86 (0.03)           0.56 (0.04)      3.23 (0.05)
AR(0.5) + S                              1.0     0.61 (0.03)           0.08 (0.01)      3.47 (0.05)
I + S                                    1.0     1.06 (0.03)           1.05 (0.04)      3.15 (0.05)
Combined: AR(0.8) + I + S                2.0     2.06 (0.05)           1.82 (0.06)      4.31 (0.06)
Combined: AR(0.5) + I + S                2.0     2.04 (0.05)           1.47 (0.06)      4.54 (0.07)
Combined: 0.2*AR(0.5) + 0.2*I + 0.6*S    0.4     0.03 (0.00)           0.05 (0.00)      0.85 (0.01)
filter), half of the true measurement error, the true measurement error (perfect information), one and a half times the true measurement error, twice the true measurement error, two and a half times the true measurement error, and thrice the measurement error. The mean root mean square errors of the estimated smoothed series relative to the true underlying series are presented in Fig. 1. The figure also provides the reference of filtering with unknown measurement error. The first thing evident from Fig. 1 is that filtering can help. Filtering with perfect information regarding the measurement error variance produces a 25 percent reduction in the root mean square errors over the unfiltered data. However, this gain from filtering begins to disappear as the specified measurement error variance is overstated. If the specified measurement error variance is twice the true measurement error variance, the filter provides only a fourteen percent reduction in the root mean square errors of the series. This is still an improvement over the unfiltered series but filtering with an unspecified measurement error variance actually does better. In fact, it does better than having perfect information. Interestingly, specifying the measurement error variance as lower than the true measurement error variance also does very well. Underestimating the measurement error is better than overestimating. What this all tells us is that for data that follow this particular data generating process – a process that was designed to reflect what we observe in the typical US presidential election campaign – filtering is advantageous in extracting true public opinion from sampling error. However, unless the true sampling error variance is precisely known, it is better to filter with an unspecified
measurement error variance. This approach is more akin to the traditional Kalman filter.

3.4. Diagnosing misspecification of measurement error variance

The preceding section raises the question: what is the best practice if we believe we really do know the measurement error variance but we would like to test this assumption? In practice, one doesn't have the true underlying process to test the estimated smoothed series against. The answer is a simple diagnostic test. If the measurement error variance is correctly specified then the prediction errors should be uncorrelated. It is of course possible that they will be correlated due to other model misspecification problems, but uncorrelated prediction errors are sufficient to demonstrate that the specified measurement error variance is correct. This can be seen by testing the prediction errors from the previous analysis for serial correlation. Fig. 2 presents the level of serial correlation in the prediction errors from the filtering of the AR(1) series with a rho of 0.8 assuming different levels of measurement error variance. When the measurement error variance is correctly specified and when it is not specified at all, the serial correlation is very low (and not statistically significant). When the specified measurement error variance is too low, the prediction errors are negatively correlated at a statistically significant level. In the extreme, when the measurement error variance is assumed to be zero, the serial correlation is on average -0.11. Specifying the measurement error variance as too high has the clear consequence of producing positively correlated prediction errors. The serial correlation is on average a statistically
Table 8
Root mean squared errors of the innovation variances, where filtered series are generated assuming a unit root process (N = 100). Columns refer to filtering assuming perfect information about sampling error, filtering assuming no information about sampling error, and no filter.

Series                                   Perfect information   No information   No filter
AR(0.8) + S                              0.41                  0.62             2.33
AR(0.5) + S                              0.57                  0.93             2.58
I + S                                    0.44                  0.51             2.26
Combined: AR(0.8) + I + S                0.73                  0.85             2.49
Combined: AR(0.5) + I + S                0.76                  0.92             2.73
Combined: 0.2*AR(0.5) + 0.2*I + 0.6*S    0.38                  0.35             0.49
[Fig. 1. Root mean squared errors of filtered series under different assumptions about survey error. Horizontal axis: proportion of actual measurement error variance specified (with a reference for variance unspecified); vertical axis: RMSE.]
[Fig. 2. Serial correlation of forecast errors under different assumptions about survey error. Horizontal axis: proportion of actual measurement error variance specified (with a reference for variance unspecified); vertical axis: correlation.]
significant 0.17 when the measurement error variance is specified as twice its true value. This serves to demonstrate a simple diagnostic that can – and should – be applied when filtering methods are utilised. If the prediction errors are serially correlated, the model is misspecified and this misspecification may very well be due to the level of assumed measurement error variance.

3.5. An analysis of specific shocks

In addition to recovering true public opinion from noisy measures, analysts are often interested in using the filter to produce better estimates of campaign effects, as noted
[Fig. 3a, b, c. Root mean squared errors for estimated shock coefficients, for shocks of 0.5, 2.0 and 3.5. Horizontal axis: proportion of actual measurement error variance specified (with a reference for variance unspecified); vertical axis: RMSE of coefficient estimate.]
above. To determine how filtering might assist or hinder in this endeavour, the three simulated series with exogenous shocks – the 7th, 8th and 9th simulated series from Table 2 – were put through a filter that includes a shock of unknown magnitude in the state space model:
[Fig. 4a, b, c. Proportion of correct and incorrect estimated shock coefficients, for shocks of 0.5, 2.0 and 3.5. Horizontal axis: specified measurement error variance as a multiple of the true variance (with references for variance unknown, correct and incorrect); vertical axis: proportion of estimates significant and correct/incorrect.]
y_t = μ_t + ε_t,    ε_t ~ NID(0, σ²_ε)
μ_t = ρ μ_{t−1} + β x_t + ξ_t,    ξ_t ~ NID(0, σ²_ξ)
where β is the estimated magnitude of the shock. As before, we treat the simulated series as 100 separate time series of 100 observations. The results of this analysis are summarized in Fig. 3a, b and c, and in Fig. 4a, b and c. Fig. 3a–c present the root mean square errors of the estimated β coefficients. Clearly, for all types of shocks – small, medium and large – filtering provides little assistance in estimating their magnitude. To the extent filtering has any effect, it is to increase the errors. This suggests that if our interest is in estimating the magnitude of campaign events, filtering offers little purchase. However, this ignores two facts. First, estimating the long-run effect of an exogenous shock requires the estimated rho for the series – specifically β/(1 − ρ) – and as has been demonstrated, filtering is of significant advantage in correctly estimating rho. Therefore, filtering is of significant advantage in estimating the long-run impact of any exogenous shock. Second, one of the stated advantages of filtering is that it reduces standard errors and increases the power of shock detection. This still needs to be explored. Fig. 4a–c indicate the proportion of the estimated coefficients that were significant and correct, and significant and incorrect. The second of these statistics is important, as overstating the magnitude of the measurement error variance increases the true standard errors on the coefficients but may, at the same time, decrease the estimated standard errors by reducing the variance of the ξ_t term. Artificially reducing the standard errors could
make a greater number of estimated coefficients statistically significant but this is equally likely to be the case for inaccurate estimates as for accurate estimates. Looking at the results for the largest shock, in the third frame of Fig. 4, we see that filtering with perfect information about measurement error variance does somewhat increase the number of correct and significant coefficients. Overstating the measurement error variance by a factor of two produces a slight increase again but this is matched by an equally large increase in the number of significant and incorrect coefficients. Overstating the measurement error variance by a factor of three reduces the number of significant and correct coefficients while increasing the number of significant and incorrect coefficients. Filtering with an unspecified measurement error variance is better than the last of these scenarios but no better than overstating the measurement error variance by a factor of two. Looking at the results for the smallest shock, in the first frame of Fig. 4, it is clearly unlikely that the shock will be detected under any circumstances. Filtering with perfect information regarding the measurement error variance does increase this probability somewhat. Overstating the measurement error variance decreases this probability, while greatly increasing the probability of estimating a significant and incorrect coefficient. Filtering with an unspecified measurement error variance under these circumstances is better than any scenario except that of perfect information.
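For concreteness, a shock of unknown magnitude can be estimated along the following lines, in the spirit of the model above and the earlier filtering sketch. It is only an illustration: the data-generating values, the shock date and the fixed measurement error variance are our own assumptions.

```r
# Shock of unknown magnitude beta entering the state equation,
# mu_t = rho*mu_{t-1} + beta*x_t + xi_t, estimated by maximum likelihood.
negloglik_shock <- function(par, y, x, var_eps = NULL) {
  rho <- tanh(par[1]); var_xi <- exp(par[2]); beta <- par[3]
  v_eps <- if (is.null(var_eps)) exp(par[4]) else var_eps
  n <- length(y); a <- 0; P <- 1e7
  v <- numeric(n); F <- numeric(n)
  for (t in 1:n) {
    a_pred <- rho * a + beta * x[t]
    P_pred <- rho^2 * P + var_xi
    v[t] <- y[t] - a_pred; F[t] <- P_pred + v_eps
    K <- P_pred / F[t]
    a <- a_pred + K * v[t]; P <- (1 - K) * P_pred
  }
  0.5 * sum(log(F[-1]) + v[-1]^2 / F[-1])
}

# One 100-observation sample with a shock of 2.0 at t = 10 (illustrative)
x_shock <- as.numeric(seq_len(100) == 10)
y_shock <- as.numeric(stats::filter(rnorm(100) + 2.0 * x_shock, 0.8,
                                    method = "recursive")) + rnorm(100)
fit <- optim(c(0, 0, 0), negloglik_shock, y = y_shock, x = x_shock, var_eps = 1)
beta_hat <- fit$par[3]
```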
For the medium-sized shock, perfect information regarding the measurement error variance again increases the chances of detection. Overstating the measurement error variance increases the probability of an incorrect and significant result and/or reduces the probability of detecting a significant result. Again, filtering with an unspecified measurement error variance is better than any scenario except that of perfect information. The pattern of these results is clear. Filtering can increase the probability of estimating effects of exogenous shocks. But, this is true only when the measurement error variance is known or else left unspecified. Unless the measurement error variance is precisely known, it is better to filter without specifying the measurement error. Once again, this is more akin to the traditional Kalman filter.

4. Discussion

The Kalman filter is a popular tool in engineering and economics. It reduces noise in longitudinal measures and allows the practitioner to forecast future values using all previous values rather than just the last. It is becoming popular in political science, touted for its abilities to reduce measurement error and produce more precise estimates of true public opinion. Its application to survey measures of public opinion is different from its more traditional use in forecasting economic variables. Consequently, its properties when applied to public opinion measures are less well-known. Through simulations we assess whether and to what extent the Kalman filter: reliably detects the characteristics of time series; does so across series with different rates of autoregressive decay; and does so when the amount of sampling error is unknown. We also examine whether the filtered data represent the level of true underlying variance and the extent to which we can identify specific shocks using these data. We have learned a number of things. Most fundamentally, taking into account sampling error variance when filtering data can work very well. That is, by doing so we can better identify the general characteristics of time series and the amount of true variance. This tells us that scholars (e.g., Jackman, 2005; Pickup and Johnston, 2008) using the variant of the Kalman filter seemingly are on solid statistical ground. This said, the filter does work better under some conditions than others. First, it works best when assuming a stationary process. It does a good job on average identifying autoregressive processes and also the presence of a unit root. As the rho of the AR process declines, however, the probability of incorrectly estimating it increases. Information about sampling error has an important effect, particularly in the identification of AR processes. Of course, when assuming a unit root process, Kalman filtering is unable to identify AR processes by definition and assumption. This, together with the fact that the filter generally does a good job detecting UR processes, implies that the assumption of a unit root process is both unnecessary and unhelpful. Second, the performance drops off when we misestimate the amount of sampling error, and especially when we overestimate it. Assuming no information about sampling error, filtering tends to nicely reflect the time
series dynamics of the series. Importantly, in almost all cases, filtering is better than not filtering and unless the measurement error variance is known precisely, it is better to not specify it. One may expect the measurement error variance to change over time. How might this be incorporated into the filter without pre-specifying the measurement error variance? Pickup and Johnston (2008) propose that measurement error be allowed to vary in proportion to the number of respondents in the polls used to measure public opinion. Specifically, they allow the measurement error variance to be a function of the usual estimate:
σ²_ε,t = σ²_0 + σ²_1 · μ_t(1 − μ_t)/N
This function contains two unknown coefficients. One coefficient (σ²_1) allows for the fact that the variance of the measurement error may be greater or lesser than that expected from a proper probability sample, while the other coefficient (σ²_0) allows for the effect of the various adjustments that houses make to their estimates. The inclusion and estimation of these coefficients allows the measurement error variance to change as a function of N without specifying perfect certainty about the variance. When this approach was applied to polls published during the 2004 US presidential election, it was found that the true measurement error variance was, in fact, significantly less than that usually assumed (Pickup and Johnston, 2008). This approach to allowing heterogeneity in the measurement error variance without assuming its exact value can be extended further. The variance (σ²_ε,t) can also be specified as a direct function of time, allowing for periods of greater and lesser volatility in measurement error. It may also be modelled as a function of relevant covariates. Each of these extensions is straightforward and broadens the applicability of the state space approach to modelling public opinion. Given the focus of much research on public opinion, particularly during election campaigns, we are also interested in the degree to which filtering allows us to detect the effects of events, e.g., conventions or debates in the US or party conferences in the UK. Our simulations suggest that filtering can help detect such shocks to preferences. Again though, unless the error variance is precisely known, it is much better to use the filter without specifying any measurement error at all. This result has larger implications. Including multiple measures of the underlying value of public opinion is again a simple extension of the methods described here. By combining polls from different survey houses, we can estimate the systematic bias of each survey house. The lessons learned here regarding the detection of shocks apply equally to the detection of such house effects. After all, differences in survey houses are the statistical equivalent of campaign shocks in poll results. Based on our results, future analysis of house effects should not assume precise information about the amount of measurement error variance. Doing so could have meaningful consequences for the resulting estimates and the representation of true opinion change.
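As a small sketch, the variance function above can be written directly, with σ²_0 and σ²_1 left free to be estimated rather than fixed a priori; the function below is only an illustration of the form.

```r
# Measurement error variance as a function of the poll size, with unknown
# coefficients sigma0_sq and sigma1_sq to be estimated (not fixed a priori)
meas_error_var <- function(sigma0_sq, sigma1_sq, mu, N) {
  sigma0_sq + sigma1_sq * mu * (1 - mu) / N   # mu is a proportion, N the poll size
}
```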
5. Putting the filter into practice
Thus far we have demonstrated what time series filtering offers public opinion research. We have highlighted what it can (and cannot) do under different conditions. We have also suggested extensions to the application of filtering. From our examination, it is clear that public opinion researchers interested in filtering will need to utilise state space modelling. This presents no theoretical limitation, as the classic models familiar to experts in the public opinion subfield can be expressed in state space form. There is a gap between the theoretical and the practical, however, and we have not yet made clear what scholars need to do to actually use the technology. For expository purposes, let us provide an example. Say that you are interested in estimating the Autoregressive Distributed Lag model known as ADL(1,1), which encompasses the popular error correction model (Hendry, 1995). This model takes the form:
$$y_t = \rho y_{t-1} + \mu + x_t\beta_0 + x_{t-1}\beta_1 + e_t, \qquad e_t \sim \mathrm{NID}\!\left(0, \sigma^2_e\right)$$
Now say that you are interested in accounting for measurement error in $y_t$. The state space representation of the ADL(1,1) with measurement error of unknown and time-invariant variance is:
$$y_t = \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}\!\left(0, \sigma^2_\varepsilon\right)$$
$$\alpha_t = \rho\alpha_{t-1} + \mu + x_t\beta_0 + x_{t-1}\beta_1 + e_t, \qquad e_t \sim \mathrm{NID}\!\left(0, \sigma^2_e\right)$$
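To give a sense of what estimation involves in practice, the following is a minimal sketch in Python using the statsmodels state space tools. It is one possible implementation rather than a canonical one: the class name, starting values, approximate diffuse initialisation, treatment of the first lagged regressor and the simulated data are all illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.statespace.mlemodel import MLEModel

class ADLStateSpace(MLEModel):
    """ADL(1,1) state equation observed with measurement error.

    Observation: y_t = alpha_t + eps_t,  eps_t ~ N(0, sigma2_eps)
    State: alpha_t = rho*alpha_{t-1} + mu + beta0*x_t + beta1*x_{t-1} + e_t,
           e_t ~ N(0, sigma2_e)
    """

    def __init__(self, endog, exog):
        exog = np.asarray(exog, dtype=float)
        super().__init__(endog, k_states=1, k_posdef=1,
                         initialization='approximate_diffuse')
        self.x = exog
        self.x_lag = np.r_[exog[0], exog[:-1]]  # crude value for the first lag
        self['design', 0, 0] = 1.0
        self['selection', 0, 0] = 1.0
        # Pre-allocate a time-varying state intercept; filled in update().
        self['state_intercept'] = np.zeros((1, self.nobs))

    @property
    def param_names(self):
        return ['rho', 'mu', 'beta0', 'beta1', 'sigma2_eps', 'sigma2_e']

    @property
    def start_params(self):
        return np.array([0.5, 0.0, 0.0, 0.0, 1.0, 1.0])

    def transform_params(self, unconstrained):
        constrained = np.array(unconstrained)
        constrained[0] = np.tanh(unconstrained[0])  # keep |rho| < 1
        constrained[4:] = unconstrained[4:] ** 2    # keep variances non-negative
        return constrained

    def untransform_params(self, constrained):
        unconstrained = np.array(constrained)
        unconstrained[0] = np.arctanh(constrained[0])
        unconstrained[4:] = constrained[4:] ** 0.5
        return unconstrained

    def update(self, params, *args, **kwargs):
        params = super().update(params, *args, **kwargs)
        rho, mu, b0, b1, s2_eps, s2_e = params
        self['transition', 0, 0] = rho
        self['obs_cov', 0, 0] = s2_eps
        self['state_cov', 0, 0] = s2_e
        self['state_intercept', 0, :] = mu + b0 * self.x + b1 * self.x_lag

# Illustration on simulated data (all numbers are arbitrary).
rng = np.random.default_rng(1)
T, rho, mu, b0, b1 = 300, 0.8, 0.2, 0.5, -0.3
x = rng.normal(size=T)
alpha = np.zeros(T)
for t in range(1, T):
    alpha[t] = (rho * alpha[t - 1] + mu + b0 * x[t] + b1 * x[t - 1]
                + rng.normal())
y = alpha + rng.normal(scale=1.5, size=T)  # add measurement error

res = ADLStateSpace(y, x).fit(disp=False)
print(res.summary())
opinion_hat = res.smoothed_state[0]  # smoothed estimate of the latent series
```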
There are a number of software packages capable of estimating such a model. An Introduction to State Space Time Series Analysis by Commandeur and Koopman (2007) explains how to perform state space time series analysis using two programs: STAMP and SsfPack. The first is a component of OxMetrics 5.0, available from Timberlake Consultants. The second is available for free as a library of C functions that can be linked to C programs or to the programming language Ox. There is also a state space package available in R called sspir (Dethlefsen and Lundbye-Christensen, 2006). Using WinBUGS, state space methods can be implemented within a Bayesian framework in a relatively straightforward manner. The variance of the measurement error could also be modelled as a function of time or of other covariates, such as sample size.

In general, the state space approach is very flexible and powerful. It can accommodate time-varying parameters, trends, cycles and structural breaks. It is a powerful method for forecasting. It is more flexible in handling nonstationarity, missing data and measurement error than classical approaches. It is also more easily extended to the multivariate case, and it is very useful for the estimation of non-linear dynamic models. Further information on the use of state space modelling in political science, along with example WinBUGS code, can be found at: http://pollob.politics.ox.ac.uk/statespace/

Acknowledgments

An earlier version of this paper was presented at the workshop on Producing Better Measures by Combining Data Cross-Temporally, at the University of Oxford, in November of 2007. We thank participants in that conference for comments, especially James Stimson. We also thank David Hendry, Neal Shepherd and the anonymous reviewers.
Appendix. On the properties of the Kalman filter

Here we describe how the Kalman filter has been applied to survey data and how it differs from the traditional use in other areas. We also discuss the expected consequences of misspecifying the measurement error variance. The discussion is limited to the specific state space model utilised in our analysis. For a more general discussion of the Kalman filter and the state space model, see Commandeur and Koopman (2007), Durbin and Koopman (2001) or Harvey (1993). In our analysis we represent the data from public opinion surveys with the following state space model:
$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}\!\left(0, \sigma^2_\varepsilon\right)$$
$$\mu_t = \rho\mu_{t-1} + \xi_t, \qquad \xi_t \sim \mathrm{NID}\!\left(0, \sigma^2_\xi\right)$$
where $y_t$ represents the observed measure of public opinion at time $t$. The unobserved state $\mu_t$ is interpreted as the actual value of public opinion at time $t$, modelled as a first-order autoregressive process. The disturbance $\varepsilon_t$ is the measurement error and $\rho$ is the first-order autoregression parameter. The level disturbance ($\xi_t$) and the measurement error ($\varepsilon_t$) are assumed to be serially and mutually independent and normally distributed with zero mean and variances $\sigma^2_\varepsilon$ and $\sigma^2_\xi$ respectively.

Define an optimal forecast of $\mu_t$ as $\mu_{t|t-1} = h\mu_{t-1}$. The optimal forecast of $\mu_t$ is that with the parameter $h$ which minimises $E[(\mu_t - \mu_{t|t-1})^2]$. As $E[(\mu_t - \mu_{t|t-1})^2] = E[(\mu_{t-1}(\rho - h))^2] + \sigma^2_\xi$, it is minimised when $h = \rho$ (Box and Jenkins, 1962; Harrison, 1967). This would be a simple minimisation problem except that we do not observe $\mu_t$ and so cannot calculate $E[(\mu_t - \mu_{t|t-1})^2]$. We can, however, observe the forecast errors $e_t = y_t - \mu_{t|t-1}$ and $E(e_t^2) = E[(y_t - \mu_{t|t-1})^2]$. If $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2_\varepsilon)$, independently of $\mu_t, \mu_{t-1}, \ldots$, then:
$$E\!\left(e_t^2\right) = E\!\left[\left(y_t - \mu_{t|t-1}\right)^2\right] = E\!\left[\left(\mu_t + \varepsilon_t - \mu_{t|t-1}\right)^2\right] = E\!\left[\left(\mu_t - \mu_{t|t-1}\right)^2\right] + \sigma^2_\varepsilon$$

Therefore $E(e_t^2)$ will be minimised when $E[(\mu_t - \mu_{t|t-1})^2]$ is minimised. If we can express the forecast errors and their variance as a function of the state space model parameters, we can estimate those parameters that minimise $E(e_t^2)$. These same parameters will be those that minimise $E[(\mu_t - \mu_{t|t-1})^2]$.
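This decomposition can be checked numerically. The following is a minimal sketch, simulating the model above with arbitrary parameter values and using the forecast $\mu_{t|t-1} = \rho\mu_{t-1}$; the true states are available here only because the data are simulated.

```python
import numpy as np

# Simulate the state space model above and check numerically that
# E(e_t^2) = E[(mu_t - mu_{t|t-1})^2] + sigma2_eps when h = rho.
rng = np.random.default_rng(0)
T, rho, sigma_eps, sigma_xi = 100_000, 0.9, 2.0, 1.0

mu = np.zeros(T)
for t in range(1, T):
    mu[t] = rho * mu[t - 1] + rng.normal(scale=sigma_xi)
y = mu + rng.normal(scale=sigma_eps, size=T)

forecast = rho * mu[:-1]          # mu_{t|t-1} using the true previous state
e = y[1:] - forecast              # observable forecast errors
state_err = mu[1:] - forecast     # unobservable in practice

print(np.mean(e ** 2))                           # approx. E(e_t^2)
print(np.mean(state_err ** 2) + sigma_eps ** 2)  # matches, up to simulation noise
```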
The Kalman filter can be used for this purpose, as follows. We have determined that the optimal forecast of $\mu_t$ is $\rho\mu_{t-1}$. However, as we do not observe $\mu_{t-1}$, consider the optimal estimator of $\mu_t$ at $t-1$: $\mu_{t|t-1} = \rho\mu_{t-1|t-1}$. The $\mu_{t-1|t-1}$ are the optimally estimated values of $\mu_{t-1}$ given all information up to and including $t-1$. They are called the filtered values. Then:
$$E\!\left(e_t^2\right) = E\!\left[\left(y_t - \mu_{t|t-1}\right)^2\right] = E\!\left[\left(\rho\mu_{t-1} + \xi_t + \varepsilon_t - \rho\mu_{t-1|t-1}\right)^2\right] = E\!\left[\left(\rho\left(\mu_{t-1} - \mu_{t-1|t-1}\right)\right)^2\right] + \sigma^2_\xi + \sigma^2_\varepsilon$$

The traditional Kalman filter is a recursive algorithm in which $E[(\rho(\mu_{t-1} - \mu_{t-1|t-1}))^2] + \sigma^2_\xi = \mathrm{VAR}(\mu_{t|t-1})$ is usually represented as $P_{t|t-1}$ and the forecast error variance $E(e_t^2)$ is represented as $F_t$, so that $F_t = P_{t|t-1} + \sigma^2_\varepsilon$. The filtered values are calculated as $\mu_{t|t} = \mu_{t|t-1} + (P_{t|t-1}F_t^{-1})e_t$ and the filtered state estimation error variance is calculated as $P_t = P_{t|t-1} - P_{t|t-1}^2 F_t^{-1}$. Note that the Kalman filter never refers to the unobserved values of $\mu_t$, only its predicted and filtered values ($\mu_{t|t}$ and $\mu_{t|t-1}$).12 Smoothing occurs by taking the estimated values of $\mu_{t|t}$ and $P_t$ from the Kalman filter and working backwards (Harvey, 1993). In this way the Kalman filter provides a method both to produce filtered and smoothed estimates of $\mu_t$, and to express the forecast errors and their variances as a function of the parameters of the state space model.

Determining the values of the state space parameters that minimise $E(e_t^2)$ is typically achieved by maximising the appropriate log likelihood function, which is expressed in terms of the forecast errors and their variances, which are, in turn, expressed as a function of the parameters of the state space equation (done by using the Kalman filter equations). Maximising the resulting log likelihood function yields estimates of $\rho$, $\sigma^2_\varepsilon$ and $\sigma^2_\xi$ which provide the optimal predictions of $\mu_t$. When this occurs, the $e_t$ are uncorrelated (West and Harrison, 1997). It can be shown that these estimates will also provide the optimal filtered and smoothed values of $\mu_t$. It is becoming increasingly common to instead use MCMC simulation to estimate the parameters of the above state space model ($\rho$, $\sigma^2_\varepsilon$, $\sigma^2_\xi$ and the unobserved state values $\mu_1, \mu_2, \mu_3, \ldots, \mu_{t-1}$) that minimise $E(e_t^2)$. In either case, it is necessary to assume that $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2_\varepsilon)$ and is independent of $\mu_t, \mu_{t-1}, \ldots$. It also continues to hold that once the optimal estimator is obtained the $e_t$ are uncorrelated.
12 This does require initial values $\mu_0$ and $P_0$ (Harvey, 1993).
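The recursion and the prediction error form of the likelihood can be written compactly. The following is a minimal sketch in Python; the stationary initial values, the log-variance parameterisation of the likelihood and the simulated data are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def kalman_filter(y, rho, sigma2_eps, sigma2_xi, mu0=0.0, P0=None):
    """Scalar Kalman filter for y_t = mu_t + eps_t, mu_t = rho*mu_{t-1} + xi_t.

    Returns filtered states mu_{t|t}, one-step forecast errors e_t and
    their variances F_t, following the recursions described above.
    """
    if P0 is None:                                  # stationary initial variance
        P0 = sigma2_xi / (1.0 - rho ** 2)
    n = len(y)
    mu_filt, e, F = np.zeros(n), np.zeros(n), np.zeros(n)
    mu_prev, P_prev = mu0, P0
    for t in range(n):
        mu_pred = rho * mu_prev                     # mu_{t|t-1}
        P_pred = rho ** 2 * P_prev + sigma2_xi      # P_{t|t-1}
        e[t] = y[t] - mu_pred                       # forecast error
        F[t] = P_pred + sigma2_eps                  # F_t = P_{t|t-1} + sigma2_eps
        mu_prev = mu_pred + (P_pred / F[t]) * e[t]  # mu_{t|t}
        P_prev = P_pred - P_pred ** 2 / F[t]        # P_t
        mu_filt[t] = mu_prev
    return mu_filt, e, F

def neg_loglike(params, y):
    """Gaussian log likelihood in prediction error decomposition form."""
    rho, log_s2_eps, log_s2_xi = params
    _, e, F = kalman_filter(y, rho, np.exp(log_s2_eps), np.exp(log_s2_xi))
    return 0.5 * np.sum(np.log(2.0 * np.pi * F) + e ** 2 / F)

# Illustration on simulated data (all parameter values are arbitrary).
rng = np.random.default_rng(2)
T, rho, s2_eps, s2_xi = 500, 0.9, 4.0, 1.0
mu = np.zeros(T)
for t in range(1, T):
    mu[t] = rho * mu[t - 1] + rng.normal(scale=s2_xi ** 0.5)
y = mu + rng.normal(scale=s2_eps ** 0.5, size=T)

fit = minimize(neg_loglike, x0=np.array([0.5, 0.0, 0.0]), args=(y,),
               method='L-BFGS-B',
               bounds=[(-0.99, 0.99), (-10.0, 10.0), (-10.0, 10.0)])
print(fit.x[0], np.exp(fit.x[1]), np.exp(fit.x[2]))  # rho, sigma2_eps, sigma2_xi
```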
Typically, when the Kalman filter is applied to public opinion data, the measurement error variance $\sigma^2_\varepsilon$ is not estimated but rather specified as a function of the smoothed value of $\mu_t$. When this is done, the first assumption no longer holds: $\varepsilon_t$ is not distributed independently of $\mu_t, \mu_{t-1}, \ldots$. Therefore, specifying the measurement error in this way means we can no longer rely on the usual properties of the Kalman filter, including its guarantee of optimality. Of even greater concern is the case when $\sigma^2_\varepsilon$ is specified incorrectly. Doing so will result in serially correlated forecast errors and larger forecast error variances (West and Harrison, 1997), and minimising $E(e_t^2)$ does not produce the optimal forecasted, filtered or smoothed values of $\mu_t$.
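The following is a minimal sketch of the first of these consequences, holding $\rho$ and $\sigma^2_\xi$ at their true values so that only the specified measurement error variance differs; the parameter values are arbitrary.

```python
import numpy as np

def forecast_errors(y, rho, s2_eps, s2_xi):
    """One-step forecast errors from the scalar filter, holding rho and the
    state disturbance variance at given values (stationary initial values)."""
    m, P = 0.0, s2_xi / (1.0 - rho ** 2)
    e = np.empty(len(y))
    for t in range(len(y)):
        m_pred, P_pred = rho * m, rho ** 2 * P + s2_xi
        F = P_pred + s2_eps
        e[t] = y[t] - m_pred
        m = m_pred + (P_pred / F) * e[t]
        P = P_pred - P_pred ** 2 / F
    return e

# Simulate the basic model, then filter with the true measurement error
# variance and with one over-specified by a factor of ten.
rng = np.random.default_rng(3)
T, rho, s2_eps, s2_xi = 5_000, 0.9, 1.0, 1.0
mu = np.zeros(T)
for t in range(1, T):
    mu[t] = rho * mu[t - 1] + rng.normal(scale=s2_xi ** 0.5)
y = mu + rng.normal(scale=s2_eps ** 0.5, size=T)

e_true = forecast_errors(y, rho, s2_eps, s2_xi)
e_over = forecast_errors(y, rho, 10.0 * s2_eps, s2_xi)

def lag1_autocorr(e):
    return np.corrcoef(e[:-1], e[1:])[0, 1]

print(lag1_autocorr(e_true), e_true.var())  # roughly zero autocorrelation
print(lag1_autocorr(e_over), e_over.var())  # positive autocorrelation, larger variance
```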
Incorrectly specifying the measurement error variance produces an additional problem. Consider the second state space model utilised in our analysis, that which contains a shock term:

$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}\!\left(0, \sigma^2_\varepsilon\right)$$
$$\mu_t = \rho\mu_{t-1} + \beta_1 x_t + \xi_t, \qquad \xi_t \sim \mathrm{NID}\!\left(0, \sigma^2_\xi\right)$$
The consequence of over-specifying the measurement error variance ($\sigma^2_\varepsilon$) is to introduce additional noise into the estimated unobserved states ($\mu_t$). This increases the true standard error of $\hat{\beta}_1$. At the same time, it will result in an underestimation of $\sigma^2_\xi$, which in turn reduces the estimated standard error of $\hat{\beta}_1$ relative to the true standard error. Therefore, the estimated standard error of $\hat{\beta}_1$ will be a function both of the increasing true standard error and of the increasing underestimation of that true standard error. Interestingly, under-specifying the measurement error variance also increases the true standard error of $\hat{\beta}_1$ but does not result in an underestimation of the standard error. In this way, over-specification is a bigger problem than under-specification.

References

Beck, N., 1989. Estimating dynamic models using Kalman filtering. Political Analysis 1, 121–156.
Beck, N., 1991. Comparing dynamic specifications: the case of presidential approval. Political Analysis 3, 51–87.
Box, G.E.P., Jenkins, G.M., 1962. Some statistical aspects of adaptive optimisation and control. Journal of the Royal Statistical Society. Series B (Methodological) 24, 297–343.
Commandeur, J.J.F., Koopman, S.J., 2007. An Introduction to State Space Time Series Analysis. Oxford University Press, Oxford.
Converse, P.E., Traugott, M.W., 1986. Assessing the accuracy of polls and surveys. Science 234, 1094–1098.
Crespi, I., 1988. Pre-Election Polling: Sources of Accuracy and Error. Russell Sage, New York.
Dethlefsen, C., Lundbye-Christensen, S., 2006. Formulating state space models in R with focus on longitudinal regression models. Journal of Statistical Software 16 (1), 1–15.
Durbin, J., Koopman, S.J., 2001. Time Series Analysis by State Space Methods. Oxford University Press, Oxford.
Erikson, R.S., Wlezien, C., 1999. Presidential polls as a time series: the case of 1996. Public Opinion Quarterly 63, 163–177.
Granger, C.W.J., 1980. Long memory relationships and the aggregation of dynamic models. Journal of Econometrics 14, 227–238.
Green, D.P., Gerber, A.S., DeBoef, S.L., 1999. Tracking opinion over time. Public Opinion Quarterly 63, 178–192.
Groves, R.M., 1989. Survey Errors and Survey Costs. Wiley, New York.
Harrison, P.J., 1967. Exponential smoothing and short-term sales forecasting. Management Science. Series A, Sciences 13, 821–842.
Harvey, A.C., 1993. Time Series Models, second ed. The MIT Press, Cambridge, Massachusetts.
Heise, D.R., 1969. Separating reliability and stability in test–retest correlations. American Sociological Review 34, 93–101.
Hendry, D., 1995. Dynamic Econometrics. Oxford University Press, Oxford.
Holbrook, T., 1996. Do Campaigns Matter? Sage Publications, Thousand Oaks, CA.
Jackman, S., 2005. Pooling the polls over an election campaign. Australian Journal of Political Science 40, 499–517.
Lau, R., 1994. An analysis of the accuracy of 'trial heat' polls during the 1992 presidential election. Public Opinion Quarterly 58, 2–20.
Little, R.J.A., 1993. Post-stratification: a modeler's perspective. Journal of the American Statistical Association 88, 1001–1012.
Pickup, M., Johnston, R., 2007. Campaign trial heats as electoral information: evidence from the 2004 and 2006 Canadian Federal elections. Electoral Studies 26, 460–476.
Pickup, M., Johnston, R., 2008. Campaign trial heats as election forecasts: measurement error and bias in 2004 presidential campaign polls. International Journal of Forecasting 24, 270–282.
Traugott, M.W., 2001. Assessing poll performance in the 2000 campaign. Public Opinion Quarterly 63, 389–419.
West, M., Harrison, J., 1997. Bayesian Forecasting and Dynamic Models, second ed. Springer-Verlag, New York.
Wlezien, C., 2000. An essay on 'combined' time series process. Electoral Studies 17, 77–93.
Wlezien, C., Erikson, R.S., 2001. Campaign effects in theory and practice. American Politics Research 29, 419–437.
Wlezien, C., Erikson, R.S., 2002. The timeline of election campaigns. Journal of Politics 64, 969–993.
Wlezien, C., Erikson, R.S., 2007. The horse race: what polls reveal as the election campaign unfolds. International Journal of Public Opinion Research 63, 163–177.