Australasian Marketing Journal 19 (2011) 281–292
Australasian Marketing Journal journal homepage: www.elsevier.com/locate/amj
Pseudo panels as an alternative study design
Catherine Frethey-Bentham ⇑
Department of Marketing, The University of Auckland, Private Bag 92019, Auckland, New Zealand
Article info
Article history: Received 21 January 2011; Revised 20 June 2011; Accepted 4 July 2011; Available online 30 July 2011
Keywords: Longitudinal studies; Cross-sectional studies; Consumer dynamics; Pseudo panels; Data fusion; Statistical matching
Abstract
Marketing academics and practitioners increasingly recognise the importance of studying the dynamic nature of consumer attitudes and behaviour. However, recent works highlight a dearth of longitudinal studies into consumer dynamics published in marketing academic literature (Leonidou et al., 2010; Rindfleisch et al., 2008; Williams and Plouffe, 2007). In an attempt to address this gap, this article evaluates the ability of available research designs to meet various research objectives in the study of consumer dynamics. This evaluation highlights the need for a technique capable of modelling gross and individual level change using repeated cross-sectional data. The article proposes the use of pseudo panels to address this gap and advances the use of data fusion techniques for matching independent samples over multiple time periods to create these pseudo panels.
© 2011 Australian and New Zealand Marketing Academy. Published by Elsevier Ltd. All rights reserved.
1. Introduction
The process of market evolution is central to marketing (Giesler, 2008). In an ever more complex market environment where consumer preferences shift constantly, marketers must understand the nature of consumer change (Blackwell et al., 2001). Indeed, it has long been recognised that turnover tables showing the transitions between discrete states in behaviour provide important, basic tools for understanding processes of social change (Lazarsfeld and Rosenberg, 1955). Thus, consumer researchers do not merely aim to comprehend consumer behaviour at a single point in time but are instead interested in how individual consumers evolve in the long run. It is generally held that the best means of tracking and understanding consumer dynamics is to follow the movements of a group of consumers through time (Leeflang and Wittink, 2000; Smith and Lux, 1993). In market research, at least at the quantitative level, this is traditionally achieved through the recruitment of a longitudinal panel. However, recent investigations into the types of studies conducted in the marketing academic literature demonstrate a dearth of longitudinal panel research (Leonidou et al., 2010; Rindfleisch et al., 2008; Williams and Plouffe, 2007). The purpose of this article is to propose the use of pseudo panels to overcome many of the methodological difficulties faced by marketing practitioners and academics interested in studying the dynamic nature of consumer behaviour.
⇑ Tel.: +64 9 3737599x88830; fax: +64 9 3737444.
The article begins by briefly reviewing the various study designs available to researchers and outlines the ability of each study design to meet various objectives in the study of consumer dynamics. The overall benefits and limitations of different study designs are then highlighted. Existing econometric pseudo panel models that attempt to address the deficiencies of currently available study designs are introduced and critiqued. Finally, the article proposes a novel approach to address these issues in the form of pseudo panels which are developed using data fusion techniques. These pseudo panels allow the investigation of gross and individual level change using repeated cross-sectional data. It is argued that pseudo panels developed using data fusion are superior to econometric panels for studying consumer behaviour because they (1) use a larger number of variables to match individuals between time periods and therefore account for respondent heterogeneity and achieve matches with greater accuracy between time periods, (2) preclude the use of cohort averages, therefore securing greater detail at the individual level, (3) utilise ‘live’ rather than predicted or averaged respondent values and (4) are designed specifically for use with marketing data such as those collected in consumer panels.
2. Overview of study designs
Integrally involved in consumers’ behaviour is the use and expenditure of time. Acquisition and consumption of both products and information regarding products are not cross-sectional events of short and unvarying duration. Rather, they are dynamic processes that occur over time and that may involve spans of time from one occasion and one individual to another (Jacoby et al., 1976). Hence, a long-term comprehension of consumer
1441-3582/$ - see front matter © 2011 Australian and New Zealand Marketing Academy. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.ausmj.2011.07.001
C. Frethey-Bentham / Australasian Marketing Journal 19 (2011) 281–292
trends assists an understanding of change in consumer attitudes or preferences (at both the macro and micro level), provides knowledge of cyclical behaviour, and allows assessment of the full impact of marketing strategies (Mela et al., 1997). Such an understanding enables firms to create and monitor strategic plans and assists them in maintaining long-term profitability (Nauck et al., 2006). The modelling of consumer behaviour can be broken down into three general categories: (1) cross-sectional studies, (2) longitudinal panel studies, and (3) some combination of longitudinal and cross-sectional designs. Each of these types of study can be executed in a variety of ways, which are discussed further below.
2.1. Cross-sectional studies
At the broadest level, cross-sectional studies form a class of research methods that involve observation of some subset of a population all at the same time, in which groups can be compared with respect to variables of interest. A further distinction is made here between one-shot cross-sectional studies, which collect data at one time period only, and repeated surveys, which collect data on the same or similar variables from subsets of a population at distinct time intervals. Each of these designs is discussed further below.
One-shot cross-sectional surveys1 are the simplest and most common study design. They involve completion of a survey by respondents at a single point in time and provide a ‘‘snapshot’’ of the frequency and characteristics of a population at that particular point in time.
Repeated surveys are defined as a series of separate cross-sectional surveys conducted at different points in time. No effort is made to ensure that any of the same elements are sampled for the individual surveys (Kalton and Citro, 1995). The elements are sampled from a population defined in the same manner for each individual survey and many of the same questions are asked in each survey.
A new sample is selected at each time point, so each cross-sectional survey is based on a probability sample of the population existing at the time of data collection (assuming that probability sampling procedures were adhered to).
2.2. Longitudinal panel studies
A longitudinal panel, in the broadest definition, is a study in which the movements of a group of individuals are tracked through time. The types of movements followed, the recording device, and the variables collected differ greatly across panel types. Panels can be broadly divided into two categories: scanner panels and consumer diary panels. Each of these panels is further described below.
Consumer diary panels are defined as diaries or reporting devices, for use particularly in surveys, where each member of a continuing panel reports attitudes, activities, purchases, opinions or the like during repetitive unit periods of time (US Patent No. 4,000,915, 1977). Consumer diary panels have been used by marketers since the early 1940s. They have been useful for examining a variety of individual level changes including brand switching (Womer, 1944) and changes in consumer attitudes and behaviour (for example, The Nielsen Company’s HomeScan Panel Views Survey). Consumer panels were traditionally in the form of consumer diaries in which a sample of consumers would regularly record relevant attitudinal characteristics and behaviour
1 Also referred to as cross-sectional studies (Rindfleisch et al., 2008).
in paper diaries. More recently, many consumer panels are conducted online.2
Scanner panels typically involve a panel member using some type of hand-held scanner to record product use. Much of the empirical research utilising scanner data focuses on predicting brand or product choice on the basis of variables such as loyalty and loyalty programmes (e.g., Labeaga et al., 2007; Meyer-Waarden, 2008; Sharp and Sharp, 1997), advertising and promotional strategies (e.g., Gönül and Srinivasan, 1996; Mela et al., 1997; Steenkamp and Gielens, 2003) or other marketing mix variables (e.g., Han et al., 2001; Kusum et al., 2001). As such, most current use of scanner panel data tilts towards developing models, both aggregate and disaggregate, of the effects of marketing mix variables on consumer choice (Andrews and Currim, 2005; Malhotra et al., 1999; Winer et al., 1994).
2.3. Combination survey designs
In an attempt to exploit the advantages of both cross-sectional and longitudinal panel data, some survey designs combine the two. These designs include overlapping, rotating, repeated and split panel designs, which differ in terms of the panel and cross-sectional components and the manner in which they are updated. Each of these designs is discussed further below.
Overlapping surveys, like repeated surveys, are a series of cross-sectional surveys conducted at different points in time. However, repeated surveys do not attempt to secure any sample overlap in the survey from one point in time to the next, while an overlapping survey is designed to provide such overlap. Thus, the aim may be to maximise the degree of overlap while taking into account both the changes desired in selection probabilities for sample elements that remain in the survey population and changes in population composition over time (Kalton and Citro, 1995).
Repeated panel surveys are made up of a series of panel surveys, each of any given duration. There may or may not be overlap between the time periods covered by the individual panels. Statistics New Zealand’s Quarterly Employment Survey (QES) is an example of a repeated panel survey with overlap. The QES is derived quarterly from approximately 18,000 surveyed business locations in a range of industries and regions throughout New Zealand; all businesses in the sample are surveyed in each quarter until the sample is reselected or redesigned (approximately every five to six years), when some businesses are rotated out of the panel. Repeated panel surveys with overlap have a focus on longitudinal measures (for example, durations of periods of brand usage). In consequence, repeated panel surveys tend to have longer durations and fewer panels in operation at any given time than rotating panel surveys (Kalton and Citro, 1995).
Rotating panel surveys3 are equivalent to repeated panel surveys with overlap. That is, sample elements have a restricted panel life – as they leave the panel, new entrants are added. However, the distinction between the two is made because these two designs have different objectives. Rotating panel surveys are widely used to provide a series of cross-sectional estimates and estimates of net change
2 While many consumer panels are conducted online, the term ‘consumer panels’ is not to be confused with the term ‘online panels’. Consumer panels refer to situations where panel members self-report attitudes, behaviour or the like during repetitive unit periods of time (US Patent No. 4,000,915, 1977). As such, consumer panels represent a distinct type of longitudinal study design. Conversely, online panels are defined as pre-recruited groups of people willing to participate in online marketing research events such as surveys (McDevitt and Small, 2002). By this definition, an online panel is a database of respondents, or sampling frame, from which to select a sample rather than a type of study design. As such, online panels are not discussed as a type of study design.
3 Also referred to as sampling on successive occasions with partial replacement of units (Patterson, 1950) and sampling for a time series (Hansen et al., 1953).
(for example, brand usage and changes in such usage rates), and accordingly elements are only retained for short periods (Kalton and Citro, 1995).
Split-panel surveys are best defined as a combination of a panel survey and a repeated or rotating panel survey. By combining a panel survey with a repeated or rotating panel survey, a split-panel survey can provide the advantages of each. However, given a constraint on total resources, the sample size for each component is necessarily smaller than if only one component had been used.
3. Objectives in the study of consumer dynamics
In order to provide insight and recommendations pertaining to the different study designs, it is essential to look into the different study objectives that face consumer behaviourists. Kalton and Citro (1995, p.31) summarised a number of objectives surrounding the estimation of population parameters. These are adapted and contextualised to relevant consumer behaviour examples as follows:
(a) The estimation of population parameters at distinct time points, for example, the proportion of the target population purchasing Brand X in a given time frame.
(b) The estimation of average values of population parameters across time, for example, the average weekly consumption of Brand X.
(c) The estimation of net changes – that is, changes at the aggregate level, for example, change in the proportion of users of Brand X from one year to the next.
(d) The aggregation of data for individuals over time, for example, the summation of twelve monthly purchases on Brand X to give annual spend.
(e) The estimation of gross changes and other components of individual change, for example, the proportion of individuals purchasing Brand X this year that did not do so in the previous year.
In addition to the objectives surrounding the estimation of population parameters, the author identifies two additional objectives relating to study designs.
These are as follows:
(a) The ability to make causal inferences, for example the ability to infer that sales promotion Y was responsible for differences observed in sales of Brand X.
(b) The ability to conduct a study with limited resources.
The following sections consider each of these objectives individually and discuss the ability of different data collection methods to meet the objective in question. These insights are further outlined in Table 1.
3.1. Estimation of population parameters at distinct time points
When estimating population parameters at distinct time points, cross-sectional designs are preferable to any longitudinal panel designs (including diary and scanner panels). The strength of cross-sectional designs is that a new sample is selected at any given time point. Therefore, each cross-sectional survey is based on a probability sample of the population existing at the time of data collection (assuming that probability sampling procedures were adhered to). For this reason, cross-sectional surveys are particularly good at producing cross-sectional estimates (Kalton and Citro, 1995). The drawbacks of using longitudinal panel designs to estimate population parameters at distinct time points largely result because (1) after the first wave’s recruitment,
the study is restricted to the members of that sample although changes in the population may occur, (2) despite attempts to locate respondents from wave to wave, there is invariably a fair amount of panel attrition (Yee and Niemeier, 1996), and (3) panel conditioning can bias estimates of population parameters for diary panels. While there has been a considerable amount of research devoted to the characteristics of the individuals that agree to sign up to a panel as opposed to non-respondents in general, a smaller body of research has addressed the characteristics of those who drop out of longitudinal panel surveys (Lynn et al., 2005). Panel attrition after the first wave has been found to be related to such variables as survey enjoyment and interviewer effects (Hill and Willis, 2001), survey length (Apodaca et al., 1998), and respondent migration (Laurie et al., 1999; Lepkowski and Couper, 2002). Other studies indicate that panel attrition is found in all panel studies and is non-random (Falaris and Peters, 1998; Lee et al., 2004; Toh and Hu, 1996). Direct assessment of bias due to attrition and non-response is only possible in special situations, such as where a very informative sampling frame is available, linkage to individual records is possible, or some other form of validation study can be carried out. When attrition occurs, attempts are sometimes made to add samples of new entrants to a panel at later time points. However, because of the aforementioned difficulties in pinpointing the panel characteristics which might cause biased estimates, such updating is generally difficult and often done imperfectly (Kalton and Citro, 1995). Non-response losses also heighten concerns about non-response bias when a panel sample is used to estimate cross-sectional parameters for later time points. 
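Why non-random attrition matters for later cross-sectional estimates can be shown with a small calculation. The sketch below is illustrative only: the group sizes, the retention rates and the function name `retained_share` are hypothetical and do not come from any study cited here, and purchase behaviour is assumed not to change between waves, so that any shift in the estimate is due to attrition alone.

```python
def retained_share(n_buyers, n_nonbuyers, keep_buyers, keep_nonbuyers):
    """Share of Brand X buyers among panel members still present after attrition.

    All inputs are hypothetical: the two group sizes at wave 1 and the
    proportion of each group that remains in the panel at a later wave.
    """
    kept_buyers = n_buyers * keep_buyers
    kept_nonbuyers = n_nonbuyers * keep_nonbuyers
    return kept_buyers / (kept_buyers + kept_nonbuyers)


# Wave 1: 300 buyers and 700 non-buyers, so the true share of buyers is 0.30.
# Random attrition: both groups are retained at the same (assumed) rate.
random_attrition = retained_share(300, 700, 0.8, 0.8)
# Selective attrition: buyers are assumed to drop out more often than non-buyers.
selective_attrition = retained_share(300, 700, 0.5, 0.9)

print(round(random_attrition, 3))
print(round(selective_attrition, 3))
```

With these illustrative figures the retained share of buyers stays at 0.30 under random attrition but falls to roughly 0.19 under selective attrition, even though no individual has changed behaviour. Random attrition costs sample size only, whereas selective attrition also shifts the estimate.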
When attrition is systematic (selective or non-random) the sample profile changes thus biasing the results, affecting the external validity of the study and reducing the generalisability of the findings (Kalton and Citro, 1995; Norris, 1987). This is because many models used to analyse panels require balance, in that there must be observations for all cross-sectional units for all time periods, necessitating the removal of all earlier observations of those who drop out (Christian and Frischmann, 1989). When attrition is systematic, not only are sample sizes reduced, but results are also biased because the final pared down sample is no longer representative of the population (Toh and Hu, 1996) and often with longitudinal panel studies representativeness is the exception rather than the rule (Koller and Salzberger, 2009). Thus, while longitudinal panel data is particularly well suited for stationary populations, most target populations are not closed systems (Blackwell et al., 2001). By keeping the sample population fixed, there is a risk of making inaccurate conclusions about the true population, which may have changed. Another concern in measuring population parameters using diary panels is the potential incidence of panel conditioning (sometimes referred to as time-in-sample bias).4 Diary panels are susceptible to panel conditioning since they utilise self-report measures. Conditioning effects are of particular concern to panels where the length of time between interview points is relatively short (e.g., weeks or months). Panel conditioning effects have been reported in a number of studies (e.g., Bailar, 1975; Corder and Horvitz, 1989). There is still some debate as to the portion of bias attributable to conditioning as opposed to those caused by attrition (Lynn et al., 2005). While attempts to disentangle the effects
4 Panel conditioning occurs where panel members’ responses at a given wave of data collection are affected by their participation in previous waves: respondents often become primed to measurement instruments after repeated interviewing, thus making the sample atypical (Binder and Hidiroglou, 1988).
Table 1
Properties of alternative survey designs.a

Objective A1: Estimation of population parameters at distinct time points – ability of sample to be representative of population
  One-shot cross-sectional survey: Good. Notes: estimation for a singular time point only.
  Repeated cross-sectional surveys: Good. Notes: takes population changes into account.
  Diary panel surveys: Poor. Notes: panels suffer from attrition, requires a mechanism for taking population changes into account.
  Scanner panel: Poor. Notes: panels suffer from attrition, requires a mechanism for taking population changes into account.
  Rotating/repeated (with overlap)/split-panel surveys: Variable. Notes: requires a robust mechanism for sample update and panel attrition that takes population changes into account.

Objective A2: Estimation of population parameters at distinct time points – accuracy of responses/observations
  One-shot cross-sectional survey: Average. Notes: often requires retrospective questioning meaning there can be recall/telescoping problems.
  Repeated cross-sectional surveys: Average. Notes: may require retrospective questioning meaning there can be recall/telescoping problems.
  Diary panel surveys: Average. Notes: possible to record behaviour as it occurs (reducing recall/telescoping bias) but may suffer from panel conditioning.
  Scanner panel: Good. Notes: records behaviour as it occurs and does not suffer from panel conditioning (requires no self-reporting), but does not collect attitudinal data.
  Rotating/repeated (with overlap)/split-panel surveys: Variable. Notes: may require retrospective questioning meaning there can be recall/telescoping problems.

Objective B: Estimation of average values of population parameters
  One-shot cross-sectional survey: Average. Notes: attributes/deficiencies same as for Objective A.
  Repeated cross-sectional surveys: Average. Notes: attributes/deficiencies same as for Objective A.
  Diary panel surveys: Average. Notes: attributes/deficiencies same as for Objective A.
  Scanner panel: Average. Notes: attributes/deficiencies same as for Objective A.
  Rotating/repeated (with overlap)/split-panel surveys: Good. Notes: composite estimation can be used to produce efficient estimates.

Objective C: Estimation of net changes
  One-shot cross-sectional survey: Not possible.
  Repeated cross-sectional surveys: Average. Notes: estimates combine effects of changing attitudes/behaviour and changing population resulting in higher variation in estimates and statistical tests having lower power than panel studies.
  Diary panel surveys: Good. Notes: variance change in panel component reduced by positive correlation between waves.
  Scanner panel: Good. Notes: variance change in panel component reduced by positive correlation between waves.
  Rotating/repeated (with overlap)/split-panel surveys: Good. Notes: composite estimation can be used to produce efficient estimates.

Objective D: Aggregation of data for individuals over time
  One-shot cross-sectional survey: Not possible.
  Repeated cross-sectional surveys: Not possible.
  Diary panel surveys: Good.
  Scanner panel: Good.
  Rotating/repeated (with overlap)/split-panel surveys: Average. Notes: can be used for aggregates over short time periods.

Objective E: The estimation of gross changes and other components of individual change
  One-shot cross-sectional survey: Not possible.
  Repeated cross-sectional surveys: Not possible.
  Diary panel surveys: Good.
  Scanner panel: Good.
  Rotating/repeated (with overlap)/split-panel surveys: Average. Notes: can be used for individual/gross changes over short time periods.

Objective F: Ability to make causal inferences
  One-shot cross-sectional survey: Average – Poor. Notes: reveals the covariation between variables but lacks information about the temporal sequence of presumed cause and effect.
  Repeated cross-sectional surveys: Average – Poor. Notes: reveals the covariation between variables but lacks information about the temporal sequence of presumed cause and effect.
  Diary panel surveys: Good. Notes: contains information about the covariation between variables and the temporal sequence of the relationship.
  Scanner panel: Average – Good. Notes: contains information about the covariation between variables and the temporal sequence of the relationship but is limited to observational data only.
  Rotating/repeated (with overlap)/split-panel surveys: Average. Notes: contains information about the covariation between variables and some information about the temporal sequence of the relationship (limited to the panel component).

Objective G: The ability to conduct a study with limited resources
  One-shot cross-sectional survey: Good. Notes: costs limited to one period of time. Respondent compensation costs are typically low.
  Repeated cross-sectional surveys: Average. Notes: sample selection costs. Respondent compensation lower than panels.
  Diary panel surveys: Poor. Notes: extensive resources required, including panel management and often respondent compensation costs. Limited sample selection costs after initial set-up.
  Scanner panel: Poor. Notes: extensive resources required, including panel management, potentially requires hand-held scanners for each household and often respondent compensation costs. Limited sample selection costs after initial set-up.
  Rotating/repeated (with overlap)/split-panel surveys: Average – Poor. Notes: sample selection costs. Panel upkeep costs (time and monetary). Usually respondent compensation costs.

a Adapted from Duncan and Kalton (1987).
of attrition and conditioning have been made (Waterton and Lievesley, 1989), the results remain tentative at best. Panel conditioning has the potential to occur in all longitudinal panel studies and, as such, will impact on all of the subsequently discussed objectives. It should be noted that panels (particularly scanner panels) will potentially produce more accurate estimates of purchase behaviour since they record behaviour as it occurs. This reduces the burden on respondents and the likelihood of any recall bias or telescoping.5 Many studies comparing the characteristics of events which are reported with those not reported identify time as a fundamental factor – the longer the recall period, the greater the
5 Telescoping is defined as errors that arise when respondents misplace the occurrence of events in time (Churchill and Iacobucci, 2005).
expected bias caused by respondent retrieval and reporting error (Lynn et al., 2005). Bound et al. (2001) refer to studies demonstrating the impact of time on the reporting of consumer expenditures and earnings, health-related issues, motor vehicle accidents, crime, and recreation. Whilst some researchers assert that time might not increase the magnitude of recall bias under all conditions (e.g., Mathiowetz and Duncan, 1988; Schaeffer, 1994), such bias is likely whenever retrospective questioning is used, particularly for complex behavioural experiences (Bound et al., 2001). In contrast to a cross-sectional study, a longitudinal design does far more justice to the complexity of consumer behaviour (Koller and Salzberger, 2009). However, scanner panel data does not reveal any attitudinal information, and the ability to capture any changes in demographic information will be dependent on the frequency with which demographic information is updated subsequent to the establishment survey.6 These points apply for scanner data across all objectives subsequently discussed.
Combination designs such as overlapping, rotating, repeated, and split-panel surveys offer better potential to select a representative sample (Binder and Hidiroglou, 1988) since the limited membership of sample elements acts to reduce the problem of panel attrition in comparison to a non-rotating panel survey, and the continual introduction of new samples helps to maintain an up-to-date sample of a changing population (Duncan and Kalton, 1987). In particular, overlapping surveys are useful in situations where some sample overlap is required and where the desired element selection probabilities vary over time. This situation often arises in establishment surveys because the desired selection probability for an establishment may vary from one cross-sectional survey to the next to reflect its change in size. Kalton et al.
(1989) discuss the use of rotating panel designs in which fresh replicate samples are added to the panel at each round as a means to examine panel conditioning through having a comparison group. In theory, this method should enable the effects to be observed, but it relies on holding all other survey conditions constant. In practice this is difficult to do (Lynn et al., 2005).
3.2. Estimation of average values of population parameters
Cross-sectional studies are often preferred over panel studies when measuring average cross-sectional estimates. This is because, as with the estimation of population parameters at distinct time points, each cross-sectional survey is based on a probability sample of the population existing at the time of data collection (again assuming that probability sampling procedures were adhered to). Furthermore, when the objective is to produce average cross-sectional estimates, another factor to be considered is the correlation between the values of the survey variables for the same individual at different time points. In panel studies this correlation is typically positive and increases the standard errors of the average cross-sectional estimates from a panel survey. Therefore, this objective also favours repeated surveys over panel surveys for average cross-sectional estimates (Kalton and Citro, 1995).
3.3. Estimation of net changes
One-shot cross-sectional studies are incapable of measuring change over time and repeated cross-sectional surveys are also inferior to longitudinal panel designs in measuring net change. Although it may be contended that the superior representations
6 Establishment surveys are defined as surveys conducted for the purpose of estimating the scale and characteristics of a potential market. These are often important when setting up a panel to help determine the composition of the panel and enable the calculation of weighting factors where necessary (Birn, 2004).
of the population for repeated cross-sectional studies also argue in favour of these designs (over a longitudinal panel design) for estimating net changes, a limitation of repeated cross-sectional studies stems from their lower power in detecting significant differences between time periods. Conversely, because longitudinal panel designs measure observations on the same individuals, it is possible to focus on changes occurring within subjects and to make population inferences that are not as sensitive to between-subject variation (Yee and Niemeier, 1996). To demonstrate the reasons for this, consider the following example. Let net change be represented by $x_2 - x_1$, where $x_1$ and $x_2$ are the means of the variable of interest at times 1 and 2. The variance $\sigma^2$ of the net change is in general given by:

$$\sigma^2(x_2 - x_1) = \sigma^2(x_1) + \sigma^2(x_2) - 2\rho\,[\sigma^2(x_1)\,\sigma^2(x_2)]^{1/2}$$

where $\rho$ is the correlation between $x_1$ and $x_2$. In a repeated survey with two independent samples, $\rho = 0$. This is a disadvantage since no shared variance between the two samples is taken into account, meaning that estimates of net change will typically be less precise. In a panel survey, however, $\rho$ is the correlation between an individual’s x-values at each time period. Commonly, an individual’s x-values will be positively correlated over time, in which case the panel survey will yield a much more precise estimate of net change than will a repeated cross-sectional survey of the same size (Duncan and Kalton, 1987).7 Again, it should be noted that scanner panel data is restricted to measuring behavioural (rather than attitudinal) changes.
In the case of combination surveys, a relationship exists between the values attached to some units of the sample at two or more points of time. As such, it is possible to use the information contained in earlier samples to improve the current estimate of the population parameter. This information is utilised to improve error estimates in comparison to those obtained for independent samples. For any form of overlapping samples (including repeated designs with overlap, rotating panel designs, and split-panel designs), estimates of variance can be calculated on the basis of: (1) the standard error of the paired units, where assumptions must be made about the correlation structure between repeated observations on the same units; (2) the variance between observations that are independent at different points in time; and (3) weights for each of the aforementioned variances, which will be dependent on sample size, the underlying variances and the sample design. Many different methods of calculating estimates for overlapping/rotating designs have been proposed (see Binder and Hidiroglou, 1988 for a review of some of these methods).
3.4. Aggregation of data for individuals over time
Cross-sectional studies utilise independent samples. As such, aggregation of individual level data over time is not possible. Conversely, the multiple observations in longitudinal panel data allow for this type of analysis. Therefore, panel studies are useful in predicting long-term or cumulative effects, which may be particularly useful in growing markets to show the full impact of marketing strategies over time and which are normally hard to analyse in cross-sectional studies. In combination studies, the extent to which individual changes can be observed and aggregation over time can be performed is limited to the duration of the panel component of the combination study (which will vary in size depending on whether an overlapping, rotating, repeated or split panel design
7 It needs to be recognised, however, that panel surveys and repeated cross-sectional surveys may be measuring different concepts of net change. In a repeated cross-sectional survey, net change reflects a combination of changing values and the changing population composition (as it accounts for new entrants to the population). Conversely, in a panel survey, net change typically reflects only changing values because panel members remain the same over time (Duncan and Kalton, 1987).
286
C. Frethey-Bentham / Australasian Marketing Journal 19 (2011) 281–292
is employed). Assuming a constraint on total resources for any given research study, the longitudinal component of a combination survey will comprise a smaller sample size than would have been the case if all the resources had been devoted to the panel component (Duncan and Kalton, 1987; Kalton and Citro, 1995). 3.5. Estimation of gross changes and other components of individual change Longitudinal panels encompass repeated measures on the same group which enhances the analytic potential of data and enables components of gross and individual change to be measured. Consequently, longitudinal panels can reveal churn patterns and lag effects, for instance the conditions under which consumers change brand choice, or the respective roles of factors such as mass media and word-of-mouth in changing brand preferences. Conversely, cross-sectional surveys are best suited to the exploration of behaviours that vary across individuals but not across time. With cross-sectional designs, gross change or change at an individual level can only be inferred from speculation or qualitative analysis of shifts. Repeated surveys can collect data on events occurring in a specified period and on durations of events by retrospective questioning. However, this often introduces a serious problem of response error in recalling dates and the risk of telescoping bias (Bassi et al., 2000; Duncan and Kalton, 1987; Kalton and Citro, 1995; Kalton et al., 1989). Solon (1989) notes that marginal probabilities (representing net change) may be estimated using either longitudinal or repeated cross-sectional data, but certain conditional probabilities may be estimated only with longitudinal data. For example, it is possible to use either longitudinal or repeated cross-sectional data to investigate whether there was an increase purchase of Brand X between two time periods; but to assess whether a person who uses Brand X at time 1 is more likely to use Brand X at time 2 requires the use of longitudinal data. 
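The distinction Solon (1989) draws between marginal and conditional probabilities can be illustrated with a small sketch. The two panels below are invented for illustration only, and the function names are hypothetical. Both panels produce identical Brand X shares in each wave (no net change), yet their repeat-purchase rates differ sharply, and only linked time 1 and time 2 observations reveal this.

```python
from collections import Counter

def marginal_share(wave, brand="X"):
    """Share of respondents using `brand` in a single wave (net-change input)."""
    return sum(1 for b in wave if b == brand) / len(wave)

def repeat_rate(panel, brand="X"):
    """P(brand at time 2 | brand at time 1); requires linked (t1, t2) pairs."""
    followers = [t2 for t1, t2 in panel if t1 == brand]
    return sum(1 for t2 in followers if t2 == brand) / len(followers)

def turnover_table(panel):
    """Counts of every (time 1, time 2) transition."""
    return Counter(panel)

# Hypothetical panel A: a loyal market in which nobody switches.
panel_a = [("X", "X")] * 50 + [("Y", "Y")] * 50
# Hypothetical panel B: heavy churn, but the two flows cancel out.
panel_b = ([("X", "X")] * 10 + [("X", "Y")] * 40
           + [("Y", "X")] * 40 + [("Y", "Y")] * 10)

for name, panel in (("A", panel_a), ("B", panel_b)):
    wave1 = [t1 for t1, _ in panel]
    wave2 = [t2 for _, t2 in panel]
    print(name, marginal_share(wave1), marginal_share(wave2), repeat_rate(panel))
```

A repeated cross-sectional design observes only the two wave shares, so it could not tell these two markets apart.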
Similarly, longitudinal panels reveal which brand (if any) defectors of Brand X are most likely to switch to. While scanner and diary panels both encompass the ability to measure gross and individual level change, scanner panel data will usually provide a more accurate portrayal of purchase behaviour since it records actual behaviour as it occurs. Conversely, diary panels are self-report devices which will still be subject to some telescoping and/or recall bias. However, diary panels will be preferable if attitudinal change is of interest for reasons discussed in Section 3.1.

Combination designs (including repeated designs with overlap, rotating panel designs, and split-panel designs) are typically concerned with estimating current levels and net change. Combination designs can provide only limited information about individual level and gross change since any of these estimates will be based on a smaller sample than would have been the case if all the resources had been devoted to the panel component (Duncan and Kalton, 1987; Kalton and Citro, 1995).

3.6. Ability to make causal inferences

Panel studies also offer benefits for establishing the direction and magnitude of causal relations (Finkel, 1995; Menard, 2002; Rindfleisch et al., 2008). There are generally four criteria of causal inference in marketing and the social sciences: (1) variables must covary; (2) their relationship must not be spurious; (3) there must be a logical time order between variables; and (4) there must be a theory about causal mechanisms (Churchill and Iacobucci, 2005). Cross-sectional studies demonstrate covariation, but they generally cannot establish temporal order. Since cause and effect are measured at the same point in time, it may not be possible to distinguish whether any (assumed) effect preceded or followed the cause, and thus these relationships are not certain. Panel data inform causal analysis by determining the temporal sequence of presumed cause and effect (Miller, 1999), and they permit better tests of spuriousness (Finkel, 1995; Gravlee et al., 2009). Furthermore, due to the large range of data collected, longitudinal panel studies can help solve the problems normally encountered when defining a theory on the basis of a one-shot study. This is because, as the research progresses over a period of time, a longitudinal panel study can allow for the influences of competing stimuli on the subject, which might increase the validity of the findings.

However, longitudinal study designs per se do not guarantee valid causal inferences (Taris and Kompier, 2003). All longitudinal panel designs are susceptible to selective attrition. Selective attrition may result in a restriction of the range of the variables of interest (e.g., only brand loyal or involved respondents remain), meaning that the strength of associations can be under- or overestimated. Therefore, some aspects of consumer behaviour may be better suited to cross-sectional designs. For example, cross-sectional research designs may be better for measuring brand preference in stagnant (often mature) markets since cross-sectional data will often be more representative of the parent population and hence provide a more accurate estimation of population parameters at distinct time points.

Furthermore, as with any within-subject design, longitudinal panel data are susceptible to maturation and history effects. Maturation results when experimental groups are maturing at different speeds. For example, in children some socioeconomic groups have been demonstrated to mature faster than others (Cook and Campbell, 1979). This causes a confounding effect whereby changes in a phenomenon under study could be attributed either to a hypothesised cause or to psychological changes in the subject.
Whereas maturation effects involve an internal process, history effects involve an external event that occurs between the two measurements. History becomes a threat when other factors external to the subjects (in addition to the treatment variable) occur by virtue of the passage of time and may impact outcome variables (Cook and Campbell, 1979).

3.7. Ability to conduct a study with limited resources

One-shot cross-sectional designs typically require the least input in terms of time and monetary resources since respondents are generally only contacted once and sample management over time is of minimal concern. Furthermore, recruitment costs for one-shot cross-sectional studies, while not to be underestimated in many instances (particularly when incidence rates of the target population are low), are typically lower than for any sampling over time. Similarly, with repeated surveys a new sample is drawn at each wave and, while recruitment costs can be substantial, there are no sample management costs. Conversely, longitudinal panels demand additional expenses in terms of time and money (Rindfleisch et al., 2008).

While longitudinal panel data is typically the best method of capturing the complexity of consumer dynamics and gaining causal insights (Rindfleisch et al., 2008), longitudinal studies are still the exception. Rindfleisch et al. (2008) note that of the 178 survey-based Journal of Marketing and Journal of Marketing Research articles published between 1996 and 2005, 94 percent are cross-sectional in nature. Furthermore, Leonidou et al. (2010) assessed all international marketing articles published in the leading mainstream marketing journals during the period of 1975–2004 and also found that longitudinal research was relatively rare (12 percent of the articles utilised longitudinal research designs), and usually took the form of identifying changes in results at different points in time.
Williams and Plouffe’s (2007) review of 1012 sales articles published in 15 leading marketing journals found that less than one percent of these articles included longitudinal data. Dekimpe and Hanssens (2000) suggest that the application of longitudinal panel analysis in marketing settings has been hampered by data limitations. Indeed, the dearth of longitudinal panel studies appears to be largely a consequence of the costs and difficulty in obtaining longitudinal panel data sets and/or maintaining the sample over time (Bove and Johnson, 2009; Leonidou et al., 2010). These costs include those associated with tracking and tracing mobile sample members and, often, the costs of incentives to encourage members to continue to cooperate in the panel. Many researchers find that the problems associated with panel data render it unsuitable for their requirements, and/or are simply unable to justify the cost and resources required to effectively manage a panel.

Dekimpe and Hanssens (2000) note that a major reason for the historical scarcity of longitudinal marketing data relates to firms’ incentive and data collection systems, whereby managers typically have little incentive to build databases of historical performance and marketing effort for their products and services. This is because often only current and future performance is rewarded, and the costs simply outweigh the benefits. This is particularly true for small businesses where resources are often scarce. In recent years, the value of the information yielded through panel data has become more apparent and practitioners have access to more such data (Dekimpe and Hanssens, 2000). However, these panels are often costly to manage. Additionally, most available data cover only the recent years in which technology has been able to store and manage this information.

Combination designs incorporate the costs of panel management to the extent that they comprise a panel component. They can also incur costs associated with selecting independent samples at each time period, but are still typically more cost effective than true panel designs.

4. Dynamic consumer research at present

The review raises the question of what conclusions can be made when utilising data collected in cross-sectional versus longitudinal panel versus combination study designs. While this is somewhat dependent on the study’s objectives, it is clear that no single study design is currently available for the researcher who wishes to meet many of the objectives discussed in Section 3; if a survey design is strong on one set of objectives, it is often weak on others. The literature review has pointed out that longitudinal panel designs are preferable for measuring net change, collecting data on events occurring in a specific time period (since respondents’ recall will diminish over time), aggregating data for individuals over time, and measuring gross change and other components of individual change. Conversely, cross-sectional survey designs typically produce more accurate estimates of population parameters and average values at distinct time points and are typically more time and cost effective.

Overlapping surveys, rotating surveys, split-panel surveys and repeated surveys with overlap attempt to address this issue by balancing the benefits of longitudinal and cross-sectional data (Binder and Hidiroglou, 1988). The limited membership of sample elements acts to reduce the problems of panel conditioning and panel attrition in comparison to a non-rotating panel survey, and the continual introduction of new samples helps to maintain an up-to-date sample of a changing population (Duncan and Kalton, 1987). However, in reality the analytical and information benefits associated with longitudinal data will be restricted to the panel component of overlapping, rotating, or repeated survey designs, and the benefits gained from cross-sectional data will be restricted to the cross-sectional component of these survey designs.
Furthermore, there is no universal theory on the best manner in which to gain efficiency of estimation in these designs (Binder and Hidiroglou, 1988). Thus, it can be argued that such survey designs are rather inefficient as they also suffer the weaknesses of both panel and cross-sectional data. As an example, consider a rotating panel design. This will require time and resources to manage the panel component, but estimates of gross and other components of individual change and aggregation of data for individuals over time will be limited to the sample size of the panel component since it is not possible to produce these estimates for individuals rotating in and out of the panel. Furthermore, pooled estimates of population parameters and average values of the population may be skewed as a result of members that have been in the panel for several waves.

This paradox leads one to question whether there is a more efficient method of collecting or analysing data that meets the majority of requirements discussed in Section 3. This proposition is considered below.

4.1. Econometric pseudo panels as a solution to data limitations

Given the deficiencies of cross-sectional data and the problems associated with collecting longitudinal panel data, one practical solution is to exploit, as much as possible, all the information already available in different cross-sectional data sources. This section explores a family of methodologies specifically designed to model gross change using a repeated time series of cross-sectional data. Such models are rarely, if ever, mentioned in the marketing literature. However, econometricians have, for over 20 years, acknowledged the need to develop tools capable of modelling gross change using repeated cross-sectional data. Specifically, they have proposed several ‘pseudo panel’ models which purport to model gross change utilising a time series of cross-sectional data. The pioneering work of Deaton (1985) opened this alternative possibility to estimate models of individual behaviour using micro data.
Deaton (1985) proposes that if data are structured as a time series of independent cross-sections (i.e., independent samples of individuals for different periods of time) it is possible to divide the population into cohorts so that each cohort contains the same individuals over time. It is then feasible to calculate the sample means for each group in each time period, and use these sample means as a panel subject to measurement errors. The notion behind this process is that for large enough cohorts (or samples), successive surveys will generate successive random samples of individuals from each of the cohorts. Summary statistics from these random samples generate a time series that can be used to infer behavioural relationships for the cohort as a whole just as if panel data were available (Deaton, 1985). The basic premise is that it is possible to construct pseudo panels, whereby the sample is divided into groups whose membership is assumed to be fixed over time. The average behaviour of these groups is then tracked over time as long as the sample is continually representative of a population that has fixed composition (Gibson and Scobie, 2001).⁸

Deaton (1985) proposes a measurement-error corrected within-groups estimator for the static model with individual effects, which is consistent with a fixed number of observations per cohort. Collado (1997), McKenzie (2004), and Moffitt (1993) extend this work to show that dynamic models may also be consistently estimated using a time series of independent cross-sections. Although these papers are concerned with essentially the same model⁹, the proposed estimators and their presentations are very different. The models considered by McKenzie (2004) and Moffitt (1993) differ since they allow for heterogeneity of the parameters across cohorts. This makes it difficult to compare these procedures and their underlying assumptions. However, some generalisations about these econometric models can be made. That is, the estimators first aggregate the individual data into cohorts comprising individuals with some similar observed characteristic(s), for example, year of birth. Using this data, the lagged dependent variable is then replaced by a predicted value from an auxiliary regression and the dynamic model is subsequently estimated via ordinary least squares (OLS), instrumental variables (IV) or generalised method of moments (GMM).

8 The authors of these works propose that such an aggregation process creates cohort population means that have a genuine panel structure given that, at the population level, the groups contain the same individuals over time. Deaton (1985) asserts that if there are additive individual effects, there will be corresponding additive cohort effects. Further, the sample cohort means from the surveys are consistent but error-ridden estimates of the true cohort means. Hence, provided errors-in-variables techniques are used (and error variances and covariances can be estimated from the surveys), the sample cohort means can be used as panels for estimating the relationship.

9 Namely the first order autoregressive model with exogenous variables.

These econometric pseudo panels have several benefits. First, they suffer less from problems related to sample attrition because the samples are renewed at every period. Second, because they are constructed by averaging groups of individual observations, there are few problems related to measurement error (at least at the individual level). The third argument in favour of the use of these pseudo panels is a more practical one. Specifically, because of the wide availability of cross-sectional data it is possible to construct pseudo panels that are appropriately representative and that cover periods reaching substantially further back in time than true panels can.

Notwithstanding the fact that the pseudo panel approaches from econometrics offer a framework for making joint use of independent cross-sectional information, many of the models have limitations. There are a number of model specification problems¹⁰, but perhaps the key problem for marketers intending to utilise these pseudo panel models is that they are all developed for application to econometric modelling issues. As such, the research objectives are often very different to those of consumer behaviourists.
Therefore, the key concerns relate to these models’ ability to accommodate marketing data, in particular consumer panels. Data capability issues include:

(1) Demographic and behavioural variables measured in economic panels are often characterised by limited inter-cohort heterogeneity and therefore pooling of estimators might be reasonable. For example, cohorts defined by birth year are commonly used in the economic literature in the study of household consumption (McKenzie, 2004). However, it is unlikely that some of the behavioural variables measured in a consumer diary panel will reflect such inter-cohort homogeneity. Consider the same example of age cohorts. In most instances it would seem unfeasible to assume that all individuals belonging to the same age group or generation would display similar enough consumer behaviour to justify pooling their responses and treating this as a single observation. For example, while consumers in the same age cohort might represent similar household compositions, it is unlikely they will be similar with respect to brand purchase behaviour in a number of product categories.

(2) In relation to the aforementioned point, the aggregation framework utilised in the econometric models might not be desirable. Aggregating responses will likely mask much of the complexity underlying any given consumer behaviour. Conversely, an estimation method that is based on individual level data (rather than cohort averages) can make more efficient use of the available information since it takes into account more granular changes occurring at the micro level, and potential problems of interpretation arising out of aggregation are likely to be mitigated.

10 Model specification problems include: (1) the models establish the large-sample properties of economic estimators and test statistics by driving the number of cohorts to infinity; this is unlikely to be satisfactory as there is often a physical limit beyond which one cannot increase the number of cohorts (e.g., birth cohorts); (2) standard pooling estimators will lead to invalid inference if the response parameters are characterised by cohort-wise heterogeneity (this pertains to dynamic modelling in particular) (Pesaran and Smith, 1995) and, given that cohorts should be chosen so that they are as distinct from each other as possible (cf. Deaton, 1985), it is important to have a framework which allows for some sort of response heterogeneity; (3) many of the models assume intra-cohort homogeneity. In some cases this may be too great an assumption as it is difficult to assume away unobserved individual heterogeneity in micro data (Girma, 2000); (4) the models assume that the population is fixed. They may introduce biases if the average cohort household fails to account for changing trends in household dissolution and creation (such as migration, for instance); this assumption seems to be rather ambitious as evidence of most consumer groups suggests that they are typically affected by some form of migration and not at all static (Blackwell et al., 2001); and (5) the decision about the clustering of observations in cohorts depends on a trade-off, namely that a larger number of cohorts means a smaller number of individuals per cohort. On the one hand, one would like to have a large number of cohorts so that the regressions performed with the resulting pseudo panels suffer less from small sample problems; on the other hand, if the number of observations per cohort is not large enough, the average characteristics per cohort would fail to be good estimates for the population cohort means. The literature is not yet conclusive as to a solution to this issue (Cuesta et al., 2007).
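The cohort-mean construction described in Section 4.1, and the concern in point (2) that averaging can mask individual behaviour, can be sketched as follows. This is a minimal illustration under stated assumptions: the records, the `spend` variable, the ten-year cohort width and the function name `cohort_means` are all hypothetical, and the sketch omits the errors-in-variables correction that Deaton (1985) applies to the sample cohort means.

```python
from collections import defaultdict
from statistics import mean

def cohort_means(records, cohort_width=10):
    """Average `spend` by (birth cohort, wave).

    Each record is a dict with keys 'wave', 'birth_year' and 'spend'.
    Different people are sampled in each wave; only the cohort is tracked.
    """
    cells = defaultdict(list)
    for r in records:
        cohort = (r["birth_year"] // cohort_width) * cohort_width
        cells[(cohort, r["wave"])].append(r["spend"])
    return {key: mean(values) for key, values in sorted(cells.items())}

# Hypothetical repeated cross-sections: two waves of independent respondents.
records = [
    {"wave": 1, "birth_year": 1962, "spend": 10.0},
    {"wave": 1, "birth_year": 1968, "spend": 30.0},
    {"wave": 1, "birth_year": 1975, "spend": 20.0},
    {"wave": 2, "birth_year": 1961, "spend": 0.0},
    {"wave": 2, "birth_year": 1966, "spend": 40.0},
    {"wave": 2, "birth_year": 1973, "spend": 22.0},
]

panel = cohort_means(records)
for (cohort, wave), value in panel.items():
    print(cohort, wave, value)
```

In this invented data the 1960s cohort mean is identical in both waves even though the individual values underneath it range from 0 to 40, which is the kind of within-cohort variation that a cohort average cannot show.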
5. The potential for pseudo panels in consumer behaviour

While econometric pseudo panels provide a valuable starting point for exploiting the benefits of panels using cross-sectional data, it is unlikely that these pseudo panel models will be of use when exploring gross and individual level change in the attitudinal and behavioural variables typically collected in consumer panels (particularly diary panels). This predicament highlights not just the necessity for an approach that allows the investigation of gross and individual level change through the use of repeated cross-sectional data, but also one that is developed for use in marketing. Unlike econometric models, this technique should allow for more respondent heterogeneity at the micro level and should take into account any required changes in the population.

This article contends that a radically different approach to matching individuals between time periods is required. Such an approach would make use of a large number of variables to match individuals between time periods (to account for respondent heterogeneity and achieve matches with greater accuracy between time periods) and would preclude the use of cohort averages (to secure greater detail at the individual level). Ideally, the technique would also utilise ‘live’ rather than predicted or averaged respondent values. The technique should also be designed specifically for use with marketing data such as those collected in consumer panels.

This article proposes the potential for a family of techniques referred to as data fusion¹¹ to provide a solution to these issues. These techniques have been used for many years in the field of media research as a means of linking two independent samples of individuals and utilise attitudinal, behavioural, and demographic data. Traditional data fusion typically involves matching two separate databases (Files A and B) to explore dependency relationships.
One file contains independent variables, while the other contains dependent variables. Individual units from data File A are matched to similar units in data File B on the basis of variables common to both files. This creates single source data which can be used to explore dependency relationships between variables in the merged file. The assumption behind traditional statistical matching is that data files are collected at approximately the same time period. This body of techniques has not yet been applied in a dynamic context. In this research, it is proposed that data files collected at different intervals in time are fused. The following section addresses some of the issues surrounding the use of data fusion techniques to develop pseudo panel data.

11 Sometimes termed statistical matching or synthetical matching (D’Orazio et al., 2006).

5.1. Matching variable congruence

When conducting a data fusion, the best matches are achieved when the variables of interest in Files A and B, termed $Y_t$ and $Y_{t+1}$ respectively, are conditionally independent given the matching variables¹², termed $X$. This notion is termed the conditional independence assumption (see Sims, 1972 for further discussion about this). The conditional independence assumption implies that the matching ($X$) variable(s) is unchanged, and has a similar value in Files A and B. However, to find a perfect match for each individual in terms of the matching variables may be impossible, especially if (some) matching variables are continuous (Rässler, 2002). Therefore, the fusion process is often carried out using an algorithm based on nearest neighbour techniques calculated by means of a distance measure. Very large minimum distance measures will usually alert an analyst if good matches are unavailable for any given donor and recipient pairs (Reilly, 2000). The inability to produce exact matches is not of concern provided that matching variables remain fairly comparable between data Files A and B.

In the case where an analyst is matching across two time periods, this situation is not quite so straightforward. This is demonstrated through the following two scenarios. First, assume that one wishes to conduct a traditional matching exercise whereby an individual in File A is matched with their closest possible match in File B, using the matching variables gender, age and income. Because the data files are collected at the same point in time, the analyst merely needs to match the individual to their closest counterpart on the basis of age, gender and income. Alternatively, consider this second scenario. Assume now that one is matching over multiple periods of time.
Assuming that File A is the donor file, the analyst must impute the $Y_t$ variable(s) in File A into File B. However, the donor matching variables may have changed in the interim between when Files A and B were collected (e.g., age and income distributions may have changed between waves of data collection). This change arises as a result of the passage of time. In such a situation, File A must be adjusted to reflect these changes. This should be done prior to conducting the matching exercise.

5.2. Donor and recipient choice

When conducting any statistical match, one must typically decide which file will be treated as the recipient file and which file will be the donor file. Traditional data fusion situations are symmetrical. Therefore, the choice of recipient and donor files typically depends on the matching task (D’Orazio et al., 2006). When fusing data over multiple periods of time the analyst will be faced with data files that are asymmetrical by design. In this instance the choice of donor and recipient units may not be as simple. This is because the empirical distributions of the recipient files are preserved exactly in the data fusion process, while the empirical distributions of the donor files are subject to change, bringing them more in line with the recipient files (Rässler, 2002).¹³

12 Also referred to as common variables.

13 It should be noted that the empirical marginal distributions of imputed variables $Y_t$ in the statistically matched file should be compared with their empirical distributions in the original donor sample (prior to the matching process) to evaluate the similarity of both samples. Rässler (2002) suggests that the empirical distributions $\tilde{f}$ should not differ from $\hat{f}_{Y_i}$ more than two random samples drawn from the true underlying population.

Fig. 1 demonstrates the outcome of choosing File B (collected in the latter time period) versus File A (collected in the earlier time period) as the donor file. Fig. 1 demonstrates that over time, the distribution curve shifts to reflect changes that occur in the population. In Situation 1, File B is chosen as the donor file. This means that File A, which was collected in the earlier time period, is the recipient file. As such, the data fusion will exactly preserve the distribution from the earlier time period (File A) and will not take into account changes that have occurred in the population since File A was collected. With each subsequent fusion, the number of sufficient matches decreases (since the population is changing and the fusion’s distribution does not reflect these changes) and after several waves only a small number of individuals can be adequately matched (i.e., those in the overlap of the distributions). This lack of update would also be present in a true-panel design (without update).

Alternatively, consider Fig. 1, Situation 2. This represents the same changes in population. However, in this scenario File B (from the latter time period) is the recipient file and File A (from the earlier time period) is the donor file. In this situation the data fusion will exactly preserve the distribution from the latter time period (File B) and will take into account changes that have occurred in the population since File A was collected. As such, with each subsequent fusion the data file is updated and matches are possible after the new and distinct members of the population have been present for more than one wave of data collection. This is similar to an updated panel situation.

Therefore, because it is important to preserve the distribution of the most recent time period under study, data from the earlier time period, time t (donors), should be fused into data from the later time period, time t + 1 (recipients). This is to ensure that the sample distribution best reflects the population distribution of the most recent point in time.
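The nearest neighbour matching of Section 5.1 and the donor and recipient roles recommended in Section 5.2 can be sketched together. This is an illustrative sketch only, not the procedure of any particular fusion package: the records, the scaling constants, the distance threshold and the function names `distance` and `fuse` are all assumptions made for the example.

```python
import math

def distance(a, b, keys, scales):
    """Scaled Euclidean distance over the common (matching) variables X."""
    return math.sqrt(sum(((a[k] - b[k]) / scales[k]) ** 2 for k in keys))

def fuse(recipients, donors, keys, scales, y_name, max_distance):
    """Give each recipient the observed y value of its nearest donor.

    Recipients are kept record for record, so their distribution is preserved;
    a donor may be used several times or not at all.
    """
    fused = []
    for r in recipients:
        best = min(donors, key=lambda d: distance(r, d, keys, scales))
        gap = distance(r, best, keys, scales)
        row = dict(r)
        row[y_name] = best[y_name]          # 'live' donor value, not an average
        row["match_distance"] = gap
        row["poor_match"] = gap > max_distance
        fused.append(row)
    return fused

# Hypothetical data: File A (time t) donates y_t to File B (time t + 1).
file_a = [
    {"age": 25, "income": 40, "y_t": "Brand X"},
    {"age": 47, "income": 65, "y_t": "Brand Y"},
]
file_b = [
    {"age": 26, "income": 42, "y_t1": "Brand X"},
    {"age": 45, "income": 70, "y_t1": "Brand X"},
    {"age": 80, "income": 20, "y_t1": "Brand Y"},
]

scales = {"age": 10.0, "income": 20.0}   # assumed scaling, for illustration only
pseudo_panel = fuse(file_b, file_a, ["age", "income"], scales, "y_t", max_distance=1.0)
for row in pseudo_panel:
    print(row)
```

Every File B record appears exactly once in the fused file with its own observed values intact, which is the sense in which the recipient distribution is preserved. The third recipient has no close donor, so it is flagged rather than silently matched, in the spirit of the large minimum distance warning noted by Reilly (2000).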
This matter is considered further below in a discussion about the stationarity of the File A and File B distributions.

5.3. Population and sample distributions

One of the strongest assumptions of statistical matching is that $A \cup B$ is a unique data set of i.i.d. records from $f(x, y_t, y_{t+1})$. When the two samples are drawn at different times, this assumption may no longer hold. For example, let File B be the most up-to-date sample of size $n_B$ still from $f(x, y_t, y_{t+1})$, which is the joint distribution of interest, with $Y_t$ missing. Let File A be a sample independent of File B whose $n_A$ sample units are i.i.d. from the distribution $g(x, y_t, y_{t+1})$, with $g$ distinct from $f$. It is questionable whether these samples can be statistically matched. When completing File B, imputation of B records with live $Y_t$ values from File A corresponds to mimicking in the completed recipient file the conditional distribution of $Y_t$ given $X$ in observed File A (Kiesl and Rässler, 2006). Matching can be performed when, although the two distributions $f$ and $g$ differ, the conditional distribution of $Y_t$ given $X$ is the same on both occasions. In this case, appropriate statistical matching procedures should be defined which assign different roles to the two samples A and B: File A should lend information on $Y_t$ to File B (D’Orazio et al., 2006; McDonald and Kippen, 2000).

The previous paragraph thus raises the question of stationarity of data files.¹⁴ The assumption of stationarity for two time periods should be a reasonable assumption to make provided the data files meet two criteria, these being:

(1) They are collected at reasonably similar intervals in time, for example, a year apart.

(2) There are no great changes in the population structure due to such causes as extreme migration or catastrophic events.
14 A stationary process is a stochastic process whose joint probability distribution does not change when shifted in time or space (Rodgers and DeVol, 1981).
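The matching condition discussed in Section 5.3 can be written out explicitly. The following is only a sketch in the notation used above; the symbol $\tilde f$, denoting the distribution of $(X, Y_t)$ in the completed File B, is introduced here for illustration and does not appear in the cited sources. Because the recipient records keep their own $X$ values (distributed as $f(x)$) and receive $Y_t$ values that follow the donor file’s conditional distribution $g(y_t \mid x)$:

```latex
\begin{align*}
\tilde f(x, y_t) &= f(x)\, g(y_t \mid x)
  && \text{(completed File B)} \\
f(x, y_t) &= f(x)\, f(y_t \mid x)
  && \text{(target distribution)} \\
\tilde f(x, y_t) = f(x, y_t)
  &\iff g(y_t \mid x) = f(y_t \mid x)
  && \text{wherever } f(x) > 0 .
\end{align*}
```

In words, the marginal distributions of $X$ in the two files may differ, but the fused file reproduces the target distribution of $(X, Y_t)$ only if the conditional distribution of $Y_t$ given $X$ is the same on both occasions.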
Situation 1: File B (Collected at Latter Time Period) Chosen as Donor File
Situation 2: File A (Collected at Earlier Time Period) Chosen as Donor File
Fig. 1. Data fusion over time.
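The donor and recipient roles in Fig. 1, Situation 2 can be illustrated with a minimal nearest-neighbour sketch. This is not the article’s implementation: the function `fuse`, the variable names (`age`, `spend_t`, `spend_t1`) and the toy records are all hypothetical, and a single numeric matching variable with squared distance is assumed purely for brevity.

```python
def fuse(donors, recipients, match_keys, donor_var):
    """Nearest-neighbour fusion: copy donor_var from the closest donor
    record into each recipient record (hypothetical helper)."""
    fused = []
    for rec in recipients:
        # Squared distance over the common matching variables X.
        best = min(
            donors,
            key=lambda d: sum((d[k] - rec[k]) ** 2 for k in match_keys),
        )
        completed = dict(rec)                    # recipient values are kept as observed
        completed[donor_var] = best[donor_var]   # 'live' value taken from the donor file
        fused.append(completed)
    return fused


# Hypothetical toy data: X = age, Y = spend.
file_a = [  # earlier wave, time t (donors)
    {"age": 25, "spend_t": 40},
    {"age": 35, "spend_t": 55},
    {"age": 45, "spend_t": 70},
]
file_b = [  # later wave, time t + 1 (recipients); the population has shifted
    {"age": 34, "spend_t1": 60},
    {"age": 46, "spend_t1": 75},
    {"age": 58, "spend_t1": 90},
]

pseudo_panel = fuse(file_a, file_b, match_keys=["age"], donor_var="spend_t")

# The fused file keeps exactly the recipients' (File B) distribution of X.
print([r["age"] for r in pseudo_panel])
# Each recipient now carries an observed time-t value from its nearest donor.
print([r["spend_t"] for r in pseudo_panel])
```

In this toy case the oldest recipient has no close counterpart in File A and simply receives the value of the nearest available donor, which is the kind of weak match that arises outside the overlap of the two distributions.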
When conducting data fusion over time, the assumption of stationarity only needs to hold over any two time periods for which the fusion is taking place, provided the data file collected at the latter time period (File B) is utilised as the recipient file (i.e., Fig. 1, Situation 2). This is because this selection of recipients ensures that the distribution of variables in File B (i.e., $f(X_{t+1})$) remains unchanged. Therefore, when it is matched with the following time period (time t + 2) the values of X and Y are original rather than imputed values. This means that they should be from a similar distribution such that $f(x, y_{t+1}) = f(x, y_{t+2})$. Furthermore, using File B as recipients will allow for any updates in the population to be taken into account. 6. Discussion: using data fusion to develop pseudo panel data This article has highlighted the potential of using data fusion techniques to develop pseudo panel data using cross-sectional data. Such a technique would be capable of modelling gross change using repeated cross-sectional data and would be able to accommodate the types of variables collected in marketing. Some considerations for the researcher deliberating using such a technique were noted. This section now discusses the implications of these considerations and provides some recommendations for marketers interested in utilising data fusion techniques to develop pseudo panels. The first consideration for developing pseudo panels using repeated cross-sectional data relates to the notion that the donor matching variables may have changed in the interim between when Files A and B were collected. Take for example the simple case of matching people in two data files on the basis of their age. If data File A was collected in 2009 and data File B was collected in 2010, then data File A should be age adjusted by a year prior to the matching process.
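The age adjustment in this example can be sketched as follows. The helper `age_adjust` and the record layout are assumptions made for illustration only; the article does not prescribe an implementation.

```python
def age_adjust(records, years_elapsed, age_key="age"):
    """Return copies of the donor records with age advanced by the
    time elapsed between the two waves (hypothetical helper)."""
    return [{**r, age_key: r[age_key] + years_elapsed} for r in records]


# Hypothetical donor records collected in 2009 (File A).
file_a_2009 = [{"id": "a1", "age": 29}, {"id": "a2", "age": 41}]

# File B was collected in 2010, so donors are aged by one year before matching.
file_a_adjusted = age_adjust(file_a_2009, years_elapsed=2010 - 2009)
print([r["age"] for r in file_a_adjusted])
```

The original records are left unchanged, so the unadjusted File A remains available for analyses that refer to the earlier time period.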
While age represents a simple example, assume now that two data files were being matched based on people’s income or highest educational attainment. The question then arises as to the best method by which to adjust these demographic variables for the passage of time prior to matching. Furthermore, the situation becomes more complex when matching people on the basis of psychographic variables. This requires further research before the proposed technique is applied to such variables. Secondly, when fusing data over multiple periods of time, the choice of which file should be the recipient and which the donor is not as straightforward as with traditional data fusion, since such data files are asymmetrical by design. In traditional data fusion the choice of donor and recipient files depends on the accuracy of the information contained in the two files, the sample size, and the phenomena under study (D’Orazio et al., 2006). By contrast, the choice of donor and recipient files will be more restricted when fusing files collected at differing time periods. It is suggested that in this latter scenario, File B (from time t + 1) is the recipient file and File A (from time t) is the donor file. This will ensure that the population is constantly updated and that matches become possible once new and distinct members of the population have been present for more than one wave of data collection. This restricted choice may affect data fusions over time when File A (the suggested donor file) is smaller than File B (the suggested recipient file). This is because theory suggests that the analyst should choose the smaller file as the recipient (D’Orazio et al., 2006), since some records in the donor file would be imputed more than once in the recipient file if the donor file were smaller. This, in turn, can artificially modify the variability of the distribution of the imputed variable in the final synthetic file (D’Orazio et al., 2006). Restricting the number of matches per donor, or utilising only a sample of data File B, may help to resolve this issue. Last, an analyst considering utilising data fusion techniques to develop pseudo panel data must consider the assumption of stationarity. It was noted that the assumption of stationarity does not need to be extended beyond the two time periods for which the fusion is taking place when the data file collected at the latter time period (File B) is utilised as the recipient file. Assuming that there is not too large a passage of time between when the data files were collected, and that File B is the recipient file, the assumption of stationarity should hold. This is supported by studies in New Zealand and Australia which suggest that while these populations may demonstrate significant changes over a ten or twenty year period, the assumption of stationarity might be reasonable over a one-year period (McDonald and Kippen, 2000). Furthermore, other applications of statistical matching have assumed stationarity over a one-year period (see for example Ingram et al., 2000).
However, if there are concerns about new entrants to the population, matching can be performed on the overlap of the two distributions only. 7. Conclusion Marketers have long recognised the importance of understanding dynamic patterns of behavioural and attitudinal change (Koller and Salzberger, 2009). While longitudinal panel data are typically the best means of capturing the complexity of consumer dynamics and gaining causal insights (Rindfleisch et al., 2008), a number of journal articles have recently highlighted the dearth of longitudinal studies published in the marketing academic literature (Leonidou et al., 2010; Rindfleisch et al., 2008; Williams and Plouffe, 2007). In an attempt to address this gap, this work has discussed cross-sectional, longitudinal and combination study designs and their ability to meet various research objectives. A review of the benefits and limitations of the different study designs emphasised that the attributes of cross-sectional data are often not apparent in longitudinal panel data and vice versa. The literature review then examined econometric pseudo panels in which repeated cross-sectional data are split into cohorts,
averaged, and each cohort mean is treated as an observation. Since pseudo panels were first proposed over 20 years ago, various improvements have been made to the specification of these models. However, they still suffer from specification problems (in particular, the assumption of a fixed population) and rely on cohort averages, which are less relevant to the modelling of consumer panel data. Furthermore, these models are incapable of accommodating variables collected in consumer panels, such as consumer attitudes and behaviour. This article advocates a radically different approach to developing pseudo panels as a solution for researchers interested in modelling consumer dynamics. Data fusion techniques are proposed as a means of matching independent samples over multiple time periods to create pseudo panels. Using data fusion to develop pseudo panel data should provide several advantages: (1) it should allow for more respondent heterogeneity at the micro level; (2) it will take into account any changes in the population; (3) it will use a large number of variables to match individuals between time periods, therefore accounting for respondent heterogeneity and achieving matches with greater accuracy between time periods; (4) it will preclude the use of cohort averages, hence securing greater detail at the individual level; and (5) it will utilise ‘live’ rather than predicted or averaged respondent values. It is suggested that such a technique is highly appropriate for use with marketing data such as those collected in consumer panels. Finally, theoretical concerns associated with data fusion applied across time periods were discussed. Acknowledgements The author would like to express her grateful thanks to Professor Cristel Russell and Professor Rod Brodie for their helpful comments on the first draft of this paper. References Andrews, R.L., Currim, I.S., 2005.
An experimental investigation of scanner data preparation strategies for consumer choice models. International Journal of Research in Marketing 22 (3), 319–331. Apodaca, R., Lea, S., Edwards, B., 1998. The effect of longitudinal burden on survey participation. Paper presented at the conference proceedings of survey research methods section of the American Statistical Association. Retrieved 3 January, 2010 from http://www.amstat.org/sections/srms/proceedings/papers/1998_156.pdf. Bailar, B.A., 1975. The effects of rotation group bias in estimates from panel surveys. Journal of the American Statistical Association 70 (349), 23–30. Bassi, F., Hagenaars, J.A., Croon, M.A., Vermunt, J.K., 2000. Estimating true changes when categorical panel data are affected by uncorrelated and correlated classification errors. Sociological Methods and Research 29 (2), 230–268. Binder, D.A., Hidiroglou, M.A., 1988. Sampling in Time. In: Krishnaiah, P.R., Rao, C.R. (Eds.), Handbook of Statistics, vol. 6. North Holland, New York, pp. 187–211. Birn, R., 2004. The Effective Use of Market Research: How to Drive and Focus Better Business Decisions, 4th ed. Kogan Page, London. Blackwell, R.D., Miniard, P.W., Engel, J.F., 2001. Consumer Behavior, 9th ed. Harcourt College Publishers, Orlando, FL. Bound, J., Brown, C., Mathiowetz, N., 2001. Measurement Error in Survey Data. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of Econometrics, vol. 5. Elsevier Science, North Holland, pp. 3705–3843. Bove, L.L., Johnson, L.W., 2009. Does “true” personal or service loyalty last? A longitudinal study. Journal of Services Marketing 23 (3), 187–194. Christian, C.W., Frischmann, P.J., 1989. Attrition in the statistics of income panel of individual returns. National Tax Journal 42 (4), 495–501. Churchill, G.A., Iacobucci, D., 2005. Marketing Research: Methodological Foundations. South-Western, Mason, OH. Collado, M.D., 1997. Estimating dynamic models from time series of independent cross-sections.
Journal of Econometrics 82 (1), 37–62. Cook, T.D., Campbell, D.T., 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Houghton Mifflin, Boston, MA. Corder, L.S., Horvitz, D.G., 1989. Panel Effects in the National Medical Care Utilisation and Expenditure Survey. In: Kasprzyk, D., Duncan, G.J., Kalton, G., Singh, M.P. (Eds.), Panel Surveys. John Wiley and Sons, New York, pp. 304–318. Cuesta, J., Ñopo, H., Pizzolitto, G., 2007. Using Pseudo-Panels to Measure Income Mobility in Latin America (working paper #625). Inter-American Development Bank Research Department, Washington, DC.
D’Orazio, M., Di Zio, M., Scanu, M., 2006. Statistical matching: Theory and practice. John Wiley and Sons Ltd., West Sussex. Deaton, A., 1985. Panel data from time series of cross-sections. Journal of Econometrics 30 (1/2), 109–126. Dekimpe, M.G., Hanssens, D.M., 2000. Time-series models in marketing: past, present, and future. International Journal of Research in Marketing 17 (2–3), 183–193. Duncan, G.J., Kalton, G., 1987. Issues of design and analysis of surveys across time. International Statistical Review 55 (1), 97–117. Falaris, E.M., Peters, H.E., 1998. Survey attrition and schooling choices. Journal of Human Resources 33 (2), 531–554. Finkel, S.E., 1995. Causal Analysis with Panel Data. Sage Publications, Thousand Oaks, CA. Gibson, J.K., Scobie, G.M., 2001. Household Saving Behaviour in New Zealand: A Cohort Analysis (working paper 01/18). New Zealand Treasury, Wellington, New Zealand. Giesler, M., 2008. Conflict and compromise: drama in marketplace evolution. Journal of Consumer Research 34 (6), 739–753. Girma, S., 2000. A quasi-differencing approach to dynamic modeling from a time series of independent cross-sections. Journal of Econometrics 98 (2), 365–383. Gönül, F., Srinivasan, K., 1996. Estimating the impact of consumer expectations of coupons on purchase behavior: a dynamic structural model. Marketing Science 15 (3), 262–279. Gravlee, C.C., Kennedy, D.P., Godoy, R., Leonard, W.R., 2009. Methods for collecting panel data: what can cultural anthropology learn from other disciplines? Journal of Anthropological Research 65 (3), 453–483. Han, S., Gupta, S., Lehmann, D.R., 2001. Consumer price sensitivity and price thresholds. Journal of Retailing 77 (4), 417–424. Hansen, M.H., Hurwitz, W.N., Madow, W.G., 1953. Sample Survey Methods and Theory. Wiley, New York. Hill, D.H., Willis, R.J., 2001. Reducing panel attrition: a search for effective policy instruments. Journal of Human Resources 36 (3), 416–438.
Ingram, D.D., O’Hare, J., Scheuren, F., Turek, J., 2000. Statistical matching: A new validation case study. Paper presented at the conference proceedings of survey research methods section of the American Statistical Association. Retrieved 3 January, 2010 from http://www.amstat.org/sections/srms/proceedings/papers/2000_126.pdf. Jacoby, J., Szybillo, G.J., Berning, C.K., 1976. Time and consumer behavior: an interdisciplinary overview. Journal of Consumer Research 2 (4), 320–339. Kalton, G., Citro, C.F., 1995. Panel surveys: adding the fourth dimension. Innovation: The European Journal of Social Sciences 8 (1), 25–40. Kalton, G., Kasprzyk, D., McMillen, D.B., 1989. Nonsampling Errors in Panel Surveys. In: Kasprzyk, D., Duncan, G.J., Kalton, G., Singh, M.P. (Eds.), Panel Surveys. John Wiley and Sons, New York, pp. 249–270. Kiesl, H., Rässler, S., 2006. How Valid can Data Fusion be? (IAB discussion paper 200615). Institute for Employment Research, Nuremberg, Germany. Koller, M., Salzberger, T., 2009. Benchmarking in service marketing: a longitudinal analysis of the customer. Benchmarking: An International Journal 16 (3), 401–414. Kusum, A., Lehmann, D.R., Neslin, S.A., 2001. Marketing response to a major policy change in the marketing mix. Journal of Marketing 65 (1), 44–61. Labeaga, J.M., Lado, N., Martos, M., 2007. Behavioral loyalty towards store brands. Journal of Retailing and Consumer Services 14 (5), 347–356. Laurie, H., Smith, R., Scott, L., 1999. Strategies for reducing nonresponse in a longitudinal panel survey. Journal of Official Statistics 15 (2), 269–282. Lazarsfeld, P.F., Rosenberg, M., 1955. The Language of Social Research. Free Press, Glencoe, IL. Lee, E., Hu, M.Y., Toh, R.S., 2004. Respondent non-cooperation in surveys and diaries: an analysis of item non-response and panel attrition. International Journal of Market Research 46 (3), 311–326. Leeflang, P.S.H., Wittink, D.R., 2000. Building models for marketing decisions: past, present and future.
International Journal of Research in Marketing 17 (2/3), 105–126. Leonidou, L.C., Barnes, B.R., Spyropoulou, S., Katsikeas, C.S., 2010. Assessing the contribution of leading mainstream marketing journals to the international marketing discipline. International Marketing Review 27 (5), 491–518. Lepkowski, J.M., Couper, M.P., 2002. Nonresponse in the Second Wave of Longitudinal Household Surveys. In: Groves, R.M., Dillman, D.A., Eltinge, J.L., Little, R.J.A. (Eds.), Survey Nonresponse. John Wiley and Sons, New York, pp. 259–272. Lynn, P., Buck, N., Burton, J., Jackle, A., Laurie, H., 2005. A Review of Methodological Research Pertinent to Longitudinal Survey Design and Data Collection. Institution for Social and Economic Research, University of Essex, Essex. Malhotra, N.K., Peterson, M., Kleiser, S.B., 1999. Marketing research: a state-of-the-art review and directions for the twenty-first century. Journal of the Academy of Marketing Science 27 (2), 160–183. Mathiowetz, N.A., Duncan, G.J., 1988. Out of work, out of mind: response errors in retrospective reports of unemployment. Journal of Business and Economic Statistics 6 (2), 221–229. McDevitt, P.K., Small, M.H., 2002. Proprietary market research: are online panels appropriate? Marketing Intelligence and Planning 20 (5), 285–296. McDonald, P., Kippen, R., 2000. Population futures for Australia and New Zealand: an analysis of the options. New Zealand Population Review 26 (2), 45–65.
McKenzie, D.J., 2004. Asymptotic theory for heterogeneous dynamic pseudo-panels. Journal of Econometrics 120 (2), 235–262. Mela, C.F., Gupta, S., Lehmann, D.R., 1997. The long-term impact of promotion and advertising on consumer brand choice. Journal of Marketing Research 34 (2), 248–260. Menard, S.W., 2002. Longitudinal Research, 2nd ed. Sage Publications, Thousand Oaks, CA. Meyer-Waarden, L., 2008. The influence of loyalty programme membership on consumer purchase behaviour. European Journal of Marketing 42 (1/2), 87–114. Miller, W.E., 1999. Temporal order and causal inference. Political Analysis 8 (2), 119–140. Moffitt, R., 1993. Identification and estimation of dynamic models with a time series of repeated cross-sections. Journal of Econometrics 59 (1/2), 99–123. Nauck, D.D., Ruta, D., Spott, M., Azvine, B., 2006. Being proactive: analytics for predicting customer actions. BT Technology Journal 24 (1), 17–26. Norris, F.H., 1987. Effects of attrition on relationships between variables in surveys of older adults. Journal of Gerontology 42 (6), 597–605. Patterson, H.D., 1950. Sampling on successive occasions with partial replacement of units. Journal of the Royal Statistical Society, Series B (Methodological) 12 (2), 241–255. Pesaran, M.H., Smith, R., 1995. Estimating long-run relationships from dynamic heterogeneous panels. Journal of Econometrics 68 (1), 79–113. Rässler, S., 2002. Statistical Matching: A Frequentist Theory, Practical Applications, and Alternative Bayesian Approaches. Springer-Verlag, New York. Reilly, J.L., 2000. The development and evaluation of statistical matching applications. Unpublished Master of Science thesis, The University of Auckland, Auckland, New Zealand. Rindfleisch, A., Malter, A.J., Ganesan, S., Moorman, C., 2008. Cross-sectional versus longitudinal survey research: concepts, findings, and guidelines. Journal of Marketing Research 45 (3), 261–279. Rodgers, W.L., DeVol, E.B., 1981. An evaluation of statistical matching.
Paper presented at the proceedings of the survey research methods section of the American Statistical Association. Retrieved 3 January, 2010 from http://www.amstat.org/sections/srms/proceedings/papers/1981_027.pdf. Schaeffer, N.C., 1994. Errors of Experience. Response Error in Reports about Child Support and their Implications for Questionnaire Design. In: Schwartz, N., Sudman, S. (Eds.), Autobiographical Memory and the Validity of Retrospective Reports. Springer, New York.
Sharp, B., Sharp, A., 1997. Loyalty programs and their impact on repeat-purchase loyalty patterns. International Journal of Research in Marketing 14 (5), 473–486. Sims, C.A., 1972. Comments on Okner (1972). Annals of Economic and Social Measurement 1 (3), 343–345. Smith, R.A., Lux, D.S., 1993. Historical method in consumer research: developing causal explanations of change. Journal of Consumer Research 19 (4), 595–610. Solon, G., 1989. The Value of Panel Data in Economic Research. In: Kasprzyk, D., Duncan, G., Kalton, G., Singh, M.P. (Eds.), Panel Surveys. John Wiley and Sons, New York, pp. 486–496. Steenkamp, J.B.E.M., Gielens, K., 2003. Consumer and market drivers of the trial probability of new consumer packaged goods. Journal of Consumer Research 30 (3), 368–384. Strom, L.S., 1977. Consumer diary. US Patent 4,000,915. Washington, D.C.: U.S. Patent and Trademark Office. Retrieved 4 January from http://www.freepatentsonline.com/4000915.html. Taris, T.W., Kompier, M., 2003. Challenges in longitudinal designs in occupational health psychology. Scandinavian Journal of Work Environment and Health 29 (1), 1–4. Toh, R.S., Hu, M.Y., 1996. Natural mortality and participation fatigue as potential biases in diary panels: impact of some demographic factors and behavioral characteristics on systematic attrition. Journal of Business Research 35 (2), 129–138. Waterton, J., Lievesley, D., 1989. Evidence of Conditioning Effects in the British Social Attitudes Panel. In: Kasprzyk, D., Duncan, G.J., Kalton, G., Singh, M.P. (Eds.), Panel Surveys. John Wiley and Sons, New York. Williams, B.C., Plouffe, C.R., 2007. Assessing the evolution of sales knowledge: a 20-year content analysis. Industrial Marketing Management 36 (4), 408–419. Winer, R.S., Bucklin, R.E., Deighton, J., Erdem, T., Fader, P.S., Inman, J.J., Katahira, H., Lemon, K., Mitchell, A., 1994. When worlds collide: the implications of panel-based choice models for consumer behavior. Marketing Letters 5 (4), 383–394.
Womer, S., 1944. Some applications of the continuous consumer panel. Journal of Marketing 9 (2), 132–136. Yee, J.L., Niemeier, D., 1996. Advantages and disadvantages: Longitudinal vs. repeated cross-section surveys. Project Battelle, 94 (16), FHWA, HPM-40, Retrieved 3 January, 2010 from http://ntl.bts.gov/data/letter_am/bat.pdf.