Measuring the food environment using scanner data
7
Recently, policymakers have explored strategies to promote healthier eating and combat rising obesity rates in the United States by changing the food environment. These strategies include making access to healthy foods easier and making access to unhealthy foods more difficult. Several private and government initiatives have been developed, researched, and implemented to achieve healthier food access, including locating farmers' markets and large supermarkets in low-resource neighborhoods (Cummins, Flint, & Matthews, 2014; Elbel et al., 2015; Sadler, 2016; Singleton, Li, Odoms-Young, Zenk, & Powell, 2019) and implementing zoning restrictions that limit access to unhealthy foods (Sturm & Cohen, 2009).
Scanner data can be useful for exploring food policy questions regarding how the food environment affects food choices and health outcomes. Publicly available survey data usually suppress the locations of households because of disclosure issues, so researchers cannot construct measures of the food environment of surveyed households. For example, in the public-use Consumer Expenditure Survey, information on state of residence is made available for about 80% of the households, but the other 20% of the households are recoded to other states within each census region or suppressed because of nondisclosure requirements by the US Census Bureau [Bureau of Labor Statistics (BLS), 2016].ᵃ Household scanner data overcome this issue by including information on the state and county where the household is located and, in some cases, the census block.
The detailed purchase information available in the scanner data also allows researchers to develop more refined definitions of the food environment. It is common practice in the public health, nutrition, and economics literature to assume that supermarkets offer healthier food items compared with convenience stores. The number or density of supermarkets within a particular geographic boundary of a household is used as a proxy for healthfulness of the food environment. For example, in the Food Access Research Atlas, the US Department of Agriculture (USDA) defines “low-access census tracts” as those with a significant number or share of individuals in the tract far from a supermarket. The density of supermarkets is used in defining a food desert and food swamp, which is discussed more in Section 7.1. The detailed product information available in scanner data allows researchers to go beyond this rather blunt measure of healthfulness of the food environment. Rather than assume that a particular store format offers more healthful food choices, the analyst can use information on purchases at retail stores to characterize the food environment. Section 7.2 describes a method of using the rich product information available in scanner data to characterize the food environment that does not assume that supermarkets offer the healthiest foods.
ᵃ The BLS recently developed experimental state-level weights for the 2016–17 Consumer Expenditure Survey (see https://www.bls.gov/cex/csxresearchtables.htm#stateweights for more details).
Using Scanner Data for Food Policy Research. https://doi.org/10.1016/B978-0-12-814507-4.00007-9 © 2020 Elsevier Inc. All rights reserved.
In our application (Section 7.3), we examined different measures of the food environment and their association with healthfulness of household food purchases. Using a subset of households who report random-weight purchases (i.e., Fresh Foods panel) and health conditions (Ailment panel) in the Nielsen Homescan data, we estimated the association of food prices and retail access with the healthfulness of household food purchases in the same regression model. This allowed us to disentangle the association between healthfulness of household food purchases and food prices. Much of the previous research has examined the price and retail food environment effects separately, making it difficult to compare their relative importance given the modeling and data differences across studies.
7.1 General approach to measuring the US food environment
The retail food environment has been defined several ways in the economics, public health, and nutrition literature, and the definition largely depends on the type of data available to researchers. Publicly available data on the number and type of retail stores are available mostly at the county level, and several studies have appended county-level data to scanner data to measure the food environment (see Section 5.9 for more details). Other studies have used proprietary data that provide the exact location of retail stores so that researchers can quantify the number and type of retail stores for a smaller geographic area (e.g., census tract) or construct geographic areas around each household within the dataset (i.e., buffer rings). In this section, we discuss the different datasets that can be used to measure the food retail environment in the United States.
The simplest method used to account for the US food environment is appending county-level information on the number or density of retail stores by format to household scanner data. Supermarkets, warehouse clubs and supercenters, and other large-store formats are assumed to offer a greater variety of foods, including more healthy food options. Conversely, convenience stores and other small-store formats are assumed to have a limited product assortment, carrying highly storable foods that appeal to many American palates and tend to be unhealthy. The USDA's Food Environment Atlas shows the counts and density (per 1000 persons in a county) of supermarkets, convenience stores, specialized food stores, and warehouse clubs and supercenters. The Food Environment Atlas is available from 2009, but its data source, the US Census Bureau's County Business Patterns, is available for earlier years.
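The county-level appending step described above can be illustrated with a minimal sketch. The county identifiers, store counts, populations, and field names below are hypothetical and chosen only for illustration; they are not taken from the Food Environment Atlas or from any scanner dataset.

```python
# Hypothetical county-level store counts and populations (illustrative values only).
county_stores = {
    "county_A": {"population": 300_000, "supermarket": 45, "convenience": 120},
    "county_B": {"population": 1_000_000, "supermarket": 130, "convenience": 410},
}

# Hypothetical household records keyed by a county identifier.
households = [
    {"household_id": 1, "county_id": "county_A"},
    {"household_id": 2, "county_id": "county_B"},
    {"household_id": 3, "county_id": "county_Z"},  # county absent from the store file
]


def density_per_1000(count, population):
    """Stores per 1,000 residents of the county."""
    return 1000.0 * count / population


def append_county_density(households, county_stores, formats):
    """Return copies of the household records with one density field per store format."""
    merged = []
    for hh in households:
        row = dict(hh)
        county = county_stores.get(hh["county_id"])
        for fmt in formats:
            if county is None:
                # Keep unmatched households and mark the measure as missing.
                row[f"{fmt}_per_1000"] = None
            else:
                row[f"{fmt}_per_1000"] = density_per_1000(county[fmt], county["population"])
        merged.append(row)
    return merged


merged = append_county_density(households, county_stores, ["supermarket", "convenience"])
```

Keeping unmatched households with a missing value, rather than dropping them, makes it easier to check how many panelists lack a county-level measure before estimation.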
The Food Environment Atlas also calculates the share of total county population and subpopulations in each county (e.g., children and seniors) that live in low-access areas. The USDA defines access for each census tract along several dimensions, including availability of healthy foods as measured by households' proximity to supermarkets, individual resource constraints (i.e., income and vehicle ownership), and neighborhood resource constraints (i.e., access to public transportation). Definitions of access vary by urban-rural areas. For example, low-access census tracts are those that are low income and have a significant number (at least 500 people) or share
(at least 33%) of the population that are >0.5 or 1 mile from the nearest supermarket, supercenter, or large grocery store for an urban area or >10 or 20 miles for a rural area. Low-income, low-access areas are commonly referred to as “food deserts.” Cooksey-Stowers, Schwartz, and Brownell (2017) argued that the food environment is also composed of sources of unhealthy foods, and the combination of healthy and unhealthy food access should be considered when measuring the food environment. In particular, they found their measures of “food swamps,” areas where unhealthy food options (typically convenience stores and restaurants) inundate healthy food options, were better predictors of obesity than measures of food deserts. Using the USDA Food Environment Atlas, Cooksey-Stowers et al. (2017) developed county-level measures of food swamps they call the Retail Food Environment Indexes. These indexes are the ratio of the number of stores offering unhealthy foods to the number of stores offering healthy foods; the indexes vary by the type of store format that is considered healthy and unhealthy. Most of the publicly available data used to construct measures of the food environment are disaggregated to the county level. However, USDA calculates access measures for census tracts for select years in the Food Access Research Atlas. As with their county-level counterparts presented in the Food Environment Atlas, the Food Access Research Atlas shows the census tracts that are low income and low access. These data are constructed using proprietary Nielsen TDLinx data that identify the exact location of stores and block-level population data from the 2010 Census of Population and Housing. 
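The low-income, low-access rule and the ratio-style food swamp index described above can be sketched as two small functions. This is an illustration under stated assumptions, not the USDA's or Cooksey-Stowers et al.'s code: the function names, argument names, and the choice of which store formats count as healthy or unhealthy are hypothetical, and the distance cutoff is assumed to have been applied already when counting the population far from a store.

```python
def is_low_income_low_access(low_income, pop_beyond, pop_total,
                             min_count=500, min_share=0.33):
    """Flag a census tract as low income and low access.

    pop_beyond is the number of residents living farther than the chosen
    distance cutoff from the nearest supermarket, supercenter, or large
    grocery store. The tract qualifies if it is low income and either at
    least `min_count` people or at least `min_share` of the population
    live beyond the cutoff.
    """
    if pop_total <= 0:
        return False
    far_enough = pop_beyond >= min_count or pop_beyond / pop_total >= min_share
    return bool(low_income and far_enough)


def retail_food_environment_index(store_counts, unhealthy_formats, healthy_formats):
    """Ratio of stores in unhealthy formats to stores in healthy formats.

    Returns None when the area has no stores in the healthy formats,
    because the ratio is undefined in that case.
    """
    unhealthy = sum(store_counts.get(f, 0) for f in unhealthy_formats)
    healthy = sum(store_counts.get(f, 0) for f in healthy_formats)
    return unhealthy / healthy if healthy > 0 else None


# Hypothetical county with illustrative store counts.
counts = {"convenience": 30, "fast_food": 50, "supermarket": 8, "farmers_market": 2}
rfei = retail_food_environment_index(
    counts,
    unhealthy_formats=("convenience", "fast_food"),
    healthy_formats=("supermarket", "farmers_market"),
)
```

Passing the format lists as arguments reflects the point in the text that the indexes vary with the store formats treated as healthy and unhealthy.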
Most household scanner datasets include information on where each sample household lives, and researchers can append food access information from the Food Access Research Atlas as a control variable or variable of interest to household scanner data (see Section 5.9 in Chapter 5 for more studies that have done this). Proprietary data like Nielsen TDLinx allow researchers to calculate a more geographically refined measure of the food environment. For the Food Access Research Atlas, food access measures were constructed in terms of census tracts but some argue that the use of administrative units like census tracts or counties is arbitrary and may not be a behaviorally relevant boundary. Proprietary data that include the exact location of stores have been used to create geographic buffers around each sample household within scanner data. The assignment of buffer distances around each household may also seem arbitrary, but these distances can be varied and the robustness of parameter estimates that are based on particular buffer distances can be checked. In our application in this chapter, we allow the buffer distance to vary with income and urbanicity and allow the econometric model to determine the distances in which the retail environment has the strongest associations with the nutritional quality of household food choices (see Section 7.3). Proprietary databases that have been used to construct the retail food environment include InfoUSA (used in our application), Nielsen TDLinx, Dun & Bradstreet Duns Market Identifiers File, and the Walls and Associates National Establishment TimeSeries database. Each database comes with strengths and weaknesses. 
For example, Gordon-Larsen, Rummo, and Albrecht (2015) compared food outlet data from TDLinx and the Duns Market Identifiers File to a field-based census of food stores and restaurants in 31 census tracts in Durham, North Carolina (often referred to as “ground truthing”). They found that 111 (64%) and 95 (55%) of the food stores identified in their census of stores through ground-truthing methods were listed in TDLinx and
Dun & Bradstreet, respectively. Cho, McLaughlin, Zeballos, Kent, and Dicken (2019) compared many of these databases, and this publication may be a good starting place for researchers to determine the database that fits their research objectives. Many studies have examined how the food environment shapes food choices and dietary and health outcomes. However, identifying the causality between the food environment and outcomes related to food choices is difficult because both individuals and food retail and foodservice companies choose to locate in certain neighborhoods for unobserved reasons that may be correlated. Individuals are not randomly assigned where to live but rather self-select into areas based on preferences that may or may not be observed in the dataset. So, for example, the observed association between supermarket density and healthfulness of food purchases may be picking up the influence of unmeasured endogenous variables (i.e., the mechanism that “distributes” different individuals to different environments). Different statistical methods have been used to account for unobserved characteristics that may influence where individuals and food retailers choose to locate. One way this has been achieved is instrumenting for the food environment with counts of or proximity to a highway, street connectivity, or land zoned for commercial use. These have been popular instruments for fast-food restaurant location (Chen, Florax, & Snyder, 2013; Dunn, 2010), measures of food swamps (Cooksey-Stowers et al., 2017), and food store access (Rummo et al., 2017). The argument for using highway access, street connectivity, and zoning as instruments is that they are associated with food environment development but not associated with consumer food choices and food-related outcomes (e.g., obesity) other than through their association with food environment development. 
Studies have also used the geographical expansion of Walmart stores from Walmart headquarters in Bentonville, Arkansas, as an instrument for proximity to Walmart on obesity (Courtemanche & Carden, 2011) and healthfulness of purchase baskets (Volpe, Okrent, & Leibtag, 2013). These strategies can be implemented with household scanner data by appending these instrumental variables to each household's census tract or county. Another way to account for endogeneity arising from self-selection is to take advantage of the longitudinal nature of household scanner data. With repeated observations of the same household over time, unobserved heterogeneity can be accounted for using random- or fixed-effects models. In the application outlined in Section 7.3, we use fixed-effects models to account for unobserved heterogeneity (e.g., dietary knowledge, habits, and restrictions) that affects food choices but is unobserved in household scanner data. Also, quasi-experiments that arise from policy changes that induce changes in the food environment (e.g., Healthy Food Financing Projects, see Cummins et al., 2014, for more details) can be used to look at changes in food purchasing behaviors before and after a policy change.
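The fixed-effects idea mentioned above can be illustrated with the within transformation for a single regressor: each household's observations are demeaned before the slope is computed, which removes any time-invariant household trait. The sketch below is not the model estimated in Section 7.3; the data are hypothetical, with x standing for a retail-access measure and y for an outcome score in arbitrary units.

```python
from collections import defaultdict


def within_slope(panel):
    """Fixed-effects (within) slope for one regressor.

    panel is an iterable of (household_id, x, y) tuples. Each household's
    x and y are demeaned, so household-specific intercepts drop out.
    """
    groups = defaultdict(list)
    for hh, x, y in panel:
        groups[hh].append((x, y))
    num = den = 0.0
    for obs in groups.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den


def pooled_slope(panel):
    """Ordinary least squares slope that ignores the household dimension."""
    xs = [x for _, x, _ in panel]
    ys = [y for _, _, y in panel]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical panel: both households share a true slope of 2, but household B
# has a much lower unobserved intercept and lives where x is higher.
panel = [
    ("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
    ("B", 4, -12), ("B", 5, -10), ("B", 6, -8),
]
fe = within_slope(panel)
pooled = pooled_slope(panel)
```

In this constructed example the pooled slope is negative even though the within-household slope is positive, which is the kind of distortion that sorting on unobserved household traits can produce.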
7.2 Approximate Healthy Eating Index for stores and households
Following Zhen, Lin, Okrent, Karns, and Chrest (2019), we constructed a measure of the healthfulness of food offered by stores and purchased by households using the 2004–06 Nielsen Homescan Fresh Foods panel (see Box 7.1). Instead of assuming, as much of the literature does, that large-format stores like supermarkets offer the most healthful food options, we constructed the approximate Healthy Eating Index (aHEI), which approximates the 2005 Healthy Eating Index (HEI-2005) for stores and households in the Fresh Foods panel. The aHEI for foods offered by stores was then used in measuring the retail food environment, as described in Section 7.3.
Box 7.1 The 2004–06 Nielsen Homescan Fresh Foods and Ailment panel data
The 2004–06 Nielsen Homescan Fresh Foods panel is a subset of households in the Homescan panel that records the purchase of barcode and random-weight products. To reduce respondent burden, Nielsen decreased the amount of information collected about random-weight products in 2007. Prior to 2007, households reported random-weight purchases for 44 categories, and from 2007 onward, households reported purchases for 10 broad categories. For example, before 2007, households recorded random-weight purchases separately for sweet baked goods, breads, rolls, bagels, cakes, pies, cookies, and other sweet baked goods, and starting in 2007, households recorded random-weight purchases for baked goods. Also, starting in 2007, Nielsen stopped assigning prices to random-weight product groups. Because the more detailed random-weight descriptions and price information are necessary to calculate the approximate HEI, our analysis focuses on the latest period when Nielsen collected the more detailed random-weight data, 2004–06.
Of the 2004–06 Fresh Foods panel, 9624 households lived in the 52 Nielsen Homescan markets and consistently reported retail food purchases in at least 10 months of the year (i.e., the households included in the static panel).ᵃ Specific projection factors were calculated for the Fresh Foods panel so that weighted demographic statistics match census targets precisely. Hence, the weighted statistics can be considered population statistics for the markets and years covered. Compared with the population, households in the Fresh Foods panel were older, more educated, and had higher income and fewer children. This demographic information is consistent with studies that have analyzed the entire static Homescan panel (see Section 5.1 in Chapter 5 for details).
We also used the 2007 Ailment panel subsample of the Nielsen Homescan Fresh Foods panel. A subsample of Nielsen Homescan households reported whether household members were overweight or obese in the past 6 months. These data are collected at the beginning of each calendar year and are therefore lagged 6 months, so the 2007 Ailment panel data are appropriate to use with the 2006 Homescan Fresh Foods panel. A total of 1050 households were in both the 2006 Fresh Foods and 2007 Ailment panel subsamples. The percentage of the subsample that were overweight or obese is 67% after applying the projection factors to household members with reported obesity or overweight status. This estimate corresponds exactly with the Centers for Disease Control and Prevention's (CDC's) reported estimate that 67% of the US population in 2005–06 was obese or overweight (CDC, 2008).
ᵃ Households in the Fresh Foods panel also live in nine remaining areas that are excluded from this analysis. These are households in areas in each census division that do not live in one of the 52 Homescan markets.
The HEI-2005 evaluates diet quality based on its conformance with the 2005 Dietary Guidelines for Americans. Measured on a scale of 0 (no adherence to the Dietary Guidelines) to 100 (perfect adherence to the Dietary Guidelines), the HEI-2005 is based on meeting thresholds for nine food-group components measured in cup equivalents or grams (total fruit; whole fruit; total vegetables; dark green vegetables, orange vegetables, and legumes; total grains; whole grains; milk; meat and beans; oils); a discretionary calories component measured as percentage of calories (solid fats, alcohol, and added sugars); and two components, saturated fat and sodium, measured in grams. For example, consumption of total fruit; whole fruit; total vegetables; dark green vegetables, orange vegetables, and legumes; total grains; and whole grains receives a score of 0–5 based on its closeness to thresholds. Consumption of oils, saturated fat, and sodium that meets or is below thresholds receives a score of 0–10.
Calculating the exact HEI-2005 requires converting barcode-level quantities recorded in the scanner data into eight components measured in cup equivalents and four components of ingredient composition of quantities measured in grams and as a percentage of calories. However, linking barcodes in the household or retail scanner data to these 12 components is difficult. USDA recently created such files for the IRI product dictionary for a few select years (Carlson, Tselepidakis Page, Zimmerman, Tornow, & Hermansen, 2019), but for many other scanner datasets these linking files are unavailable.
Therefore, we modified a method that creates a monthly aHEI for each store and household in the 2004–06 Nielsen Homescan. This approach is similar to the nutrient profiling algorithm developed by Arsenault, Fulgoni, Hersey, and Muth (2012), but instead of imputing the HEI using select nutrients in foods, the quantity shares of foods are used to predict the HEI. Following Volpe and Okrent (2012) and using the National Health and Nutrition Examination Survey (NHANES), we estimated the association of the quantity shares of 30 mutually exclusive food groups with a respondent's HEI-2005 score. First, we calculated the exact HEI-2005 scores for foods obtained at retail stores on the first day of dietary interviews in NHANES for 2003–08. We then regressed the exact HEI-2005 score on quantity shares of the 30 store-purchased food categories.ᵇ The quantity share for each food category was calculated by dividing the gram weight of the category by the sum of the gram weights for the 30 categories. Each HEI regression contained an intercept and 29 slope coefficients on 29 food categories. The 30th food category (commercially prepared other non-sweet items) was dropped from the regression to avoid perfect collinearity because the 30 quantity shares sum to one.
ᵇ This is a different approach from Volpe and Okrent (2012), who regressed exact HEI-2005 on quantities rather than quantity shares. Homescan purchases are known to be underreported (Zhen, Taylor, Muth, & Leibtag, 2009), and most consumers shop at several retailers. Using Homescan per capita purchases and NHANES coefficients to predict household and retailer aHEI could create significant bias. In contrast, the shares of food groups in Homescan are more closely aligned with other nationally representative datasets.
It is well known that diet quality generally follows an income gradient (Darmon & Drewnowski, 2008). It is possible, then, that the contribution of each food group to the HEI-2005 score differs across income levels because purchase patterns likely differ by income even for households with similar nutritional quality of purchases. Therefore, we estimated the HEI-2005 regression by income group (i.e., income-to-poverty ratio [IPR] less than or equal to 185% vs. above 185% of the poverty line). For the low-income sample, all but six categories (refined grains, low-fat dairy, eggs, oils, frozen commercially prepared non-sweet items, and packaged commercially prepared snacks) have coefficients statistically significant at the 5% level or better (Table 7.1). For the high-income sample, all but five categories (starchy vegetables, other nutrient-dense vegetables, low-fat dairy, fish, and packaged commercially prepared snacks) have coefficients statistically significant at the 5% level or better. Across both income groups, the signs on most of the coefficient estimates are consistent with expectations: fruit, vegetables, and low-fat protein foods are positively associated with HEI-2005 scores, while regular-fat protein foods, solid fat, and sweets are negatively associated with HEI-2005 scores. Only 2 of the 29 slope coefficients are statistically different between low- and high-income NHANES respondents, and only the slope coefficient on “frozen commercially prepared non-sweet items” differs in sign between income levels. Between 42% and 45% of the variation in HEI-2005 is explained by consumption of the 30 food groups.
Table 7.1 HEI regression using 2003–08 NHANES dietary intake data by income level.
Explanatory variable (food group)       IPR ≤ 185%                IPR > 185%
                                        Estimate    Std. error    Estimate    Std. error
Intercept                               55.11**     1.06          55.41**     1.04
1. Whole fruit                          28.16**     2.89          30.87**     2.62
2. Fruit juice                          11.20**     1.59          11.46**     1.75
3. Dark green vegetables                35.22**     12.00         40.98**     9.82
4. Orange vegetables                    66.87**     16.17         33.21**     14.51
5. Starchy vegetables                   9.45**      2.47          4.04        4.31
6. Other nutrient-dense vegetables      16.52**     5.66          8.29        7.81
7. Other mostly water vegetables        15.67**     5.86          20.26**     4.65
8. Legumes                              52.33**     6.31          39.46**     6.39
9. Whole grains                         25.58**     4.38          20.53**     3.95
10. Refined grains                      2.83        2.98          5.82**      2.15
11. Low-fat dairy                       0.35        1.59          2.04        1.71
12. Regular-fat dairy                   −15.12**    1.92          −21.30**    1.85
13. Low-fat red meat                    10.15**     3.67          14.08**     5.35
14. Regular-fat red meat                −44.47**    3.15          −43.66**    2.47
15. Poultry                             12.96**     3.76          18.11**     3.59
16. Fish                                15.12**     5.67          15.32       8.26
17. Nuts and seeds                      91.59**     17.69         62.25**     11.94
18. Eggs                                −5.82       4.29          −17.41**    4.90
19. Oils                                470.89      284.53        397.03**    170.43
20. Solid fat                           −65.66**    14.14         −105.20**   27.20
21. Sugar and sweeteners                −37.76**    2.52          −33.84**    3.15
22. SSB                                 −23.07**    1.12          −23.59**    1.11
23. Non-SSB                             −7.80**     1.36          −7.01**     1.58
24. Water                               −3.23**     1.45          −5.74**     1.40
25. Frozen CP sweet items               −48.12**    3.11          −42.97**    3.38
26. Other CP sweet items                −28.50**    1.73          −30.87**    1.63
27. Frozen CP non-sweet items           −7.55       4.39          10.75**     3.81
28. Canned CP non-sweet items           −18.18**    2.15          −20.28**    2.46
29. Packaged CP snacks                  −1.11       3.94          −4.42       4.83
N                                       12,365                    11,701
Adjusted R²                             0.451                     0.424
Note: IPR, ratio of family income to 185% of the poverty level. CP, commercially prepared. SSB, sugar-sweetened beverage. Coefficients in bold font are statistically different between the two income samples. Food group-level intakes were expressed in weight shares with commercially prepared other non-sweet items dropped to prevent perfect collinearity. ** Statistical significance at the 5% level.
The similarity of results between low- and high-income households suggests that the primary driver of the diet quality-income gradient may be differences in the mix of food groups, not differences in within-food group nutritional quality. For example, within the sugar-sweetened beverage group, there is a large variation in sugar content across different brands. The contribution of sugar-sweetened beverage intake to the exact HEI-2005 score differs depending on the
SSB brands consumed. The fact that the coefficients on the sugar-sweetened beverage group share (Table 7.1) are not statistically different between the low-income and high-income NHANES samples suggests the within-food group differences in nutrient density across income levels may not be an important factor in explaining the income gradient of overall nutrition quality.
We then estimated the aHEI for households and stores in the 2004–06 Nielsen Homescan Fresh Foods panel using the estimated income group-specific coefficients in Table 7.1 and coefficients (not shown) estimated from NHANES pooled across income groups, respectively. Household purchases were first classified into 1 of 30 food groups that were used in the HEI regression.ᶜ For each chain retailer, we used the household projected purchase quantities (i.e., quantities calculated with projection factors, or sample weights) at the chain level to obtain the quantity shares for each food group-chain-month-year. For independent stores, we calculated quantity shares by retail format (Table 7.2) using projected household purchase quantities at nonchain stores. The quantity shares for each food group-household-month-year were calculated using the household's purchase quantities at all retail outlets. Using the quantity shares, the monthly aHEI for each chain, retail format, and household was calculated using the NHANES-based linear HEI regression model parameters.
ᶜ Note that the NHANES dietary recall data provide quantities consumed, whereas the Nielsen Homescan data provide quantities purchased.
If no Fresh Foods panelist shopped at a chain in a particular month, the aHEI for that chain is missing for that month. To fill in these missing values, we regressed nonmissing retailer aHEIs on binary indicator variables for chain names, year, and month; interactions between retail channel and year indicators; and interactions between retail channel and month indicators. The predicted values from this regression were used to calculate the missing retailer aHEI scores. This is equivalent to replacing missing retailer aHEI scores with their averages adjusted for systematic chain, yearly, and seasonal variation. The set of chain-month-year observations with imputed aHEI scores represented 0.2% of the market in dollar sales.
Table 7.2 Means and standard deviations of retailer aHEI by retail format.
Retail format         Mean      Standard deviation
Club store            0.04      0.66
Convenience store     −0.45     1.04
Gas station           −0.81     0.90
Drug store            −0.49     1.21
Mass merchandiser     −0.40     1.06
Supercenter           −0.07     0.22
Supermarket           0.11      0.95
Note: The retailer aHEI has been standardized to have zero mean and unit variance over the full sample of all retailers. The pairwise differences in format-specific mean aHEI are statistically significant except between any pair of supermarket, club store, and supercenter and between any pair of drug store, mass merchandiser, and convenience store. The Type I experiment-wise error rate is controlled at 5%.
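The two core steps described in this section, regressing a diet-quality score on food-group quantity shares with one group dropped and then applying the coefficients to a retailer's or household's purchase shares, can be sketched as follows. This is a toy illustration, not the chapter's estimation code: it uses three hypothetical food groups instead of 30, invented gram weights, and scores generated from an assumed linear rule, and it solves the normal equations with a small hand-written routine so that it needs no external libraries.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quantity_shares(grams):
    """Share of each food group in total gram weight."""
    total = sum(grams.values())
    return {group: weight / total for group, weight in grams.items()}


def fit_share_regression(samples, groups, dropped):
    """OLS of the score on an intercept and the shares of every group except `dropped`.

    One group is omitted because the shares sum to one, which would make
    the full set of shares perfectly collinear with the intercept.
    """
    kept = [g for g in groups if g != dropped]
    rows, y = [], []
    for grams, score in samples:
        shares = quantity_shares(grams)
        rows.append([1.0] + [shares.get(g, 0.0) for g in kept])
        y.append(score)
    k = len(kept) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return dict(zip(["intercept"] + kept, solve(xtx, xty)))


def predict_score(coefs, grams):
    """Predicted score for a basket: intercept plus coefficient-weighted shares."""
    shares = quantity_shares(grams)
    return coefs["intercept"] + sum(
        b * shares.get(g, 0.0) for g, b in coefs.items() if g != "intercept"
    )


# Hypothetical "survey" baskets (grams by group) with scores generated from the
# assumed rule: score = 50 + 30 * fruit share - 20 * sweets share.
groups = ["fruit", "sweets", "other"]
samples = [
    ({"fruit": 200, "sweets": 100, "other": 700}, 54.0),
    ({"fruit": 500, "sweets": 100, "other": 400}, 63.0),
    ({"fruit": 100, "sweets": 400, "other": 500}, 45.0),
    ({"fruit": 300, "sweets": 300, "other": 400}, 53.0),
]
coefs = fit_share_regression(samples, groups, dropped="other")

# Hypothetical monthly purchase quantities at one retailer.
store_score = predict_score(coefs, {"fruit": 250, "sweets": 250, "other": 500})
```

Because the prediction depends only on shares, scaling every purchase quantity by the same factor leaves the predicted score unchanged, which is one reason a share-based specification is less sensitive to underreported purchase volumes than one based on quantities.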
We assigned the estimated chain-level aHEI for each month-year to all stores in the chain in that period. For nonchain stores, we assigned the estimated format-specific aHEI for the corresponding period. This process amounts to assuming no within-chain and within-channel variation in the healthfulness of foods across stores of the same chain and across independent stores of the same channel, respectively. We did this for two reasons. First, there are fewer Homescan transactions at the store-month-year level than at the chain (or format)-month-year level. With fewer transactions, the noise-to-signal ratio increases for the retail aHEI because the estimated healthfulness for a store is based on the purchase transactions of a few Homescan households whose preferences may not be representative of the store's client population. Second, recall that our objective is to associate the retail environment with the nutritional quality of household purchases. Calculating retail aHEIs at the store level would aggravate the concern for endogeneity because the household aHEIs (the dependent variable) and the retail environment (the independent variables constructed from retail aHEIs of stores in the vicinity of the household) are based on purchases of the same household (and a few others in the same neighborhood). Estimating retail aHEIs at the chain and retail format levels alleviates this concern because the estimates are based on a greater number of Homescan households from multiple locations. For ease of interpretation and comparability, we standardized the retailer and household aHEIs by subtracting the sample average and dividing by the standard deviation. Hence, the standardized retailer and household aHEIs have zero mean and unit variance. Based on the calculated retailer aHEIs, supermarkets have the highest average nutrition index at 0.11 followed by club stores (Table 7.2). As expected, foods sold at gas stations have the lowest nutrition index. 
These scanner data-based patterns are consistent with results of previous urban food environment audits. The pairwise differences in format-specific mean aHEI are statistically significant except between any pair of supermarket, club store, and supercenter and between any pair of drug store, mass merchandiser, and convenience store.
Table 7.3 presents the summary statistics on household aHEI by income group and urbanicity for the sample of 9624 Fresh Foods households in the 52 Nielsen markets during the 2004–06 period. On average, low-income households have lower aHEI scores than high-income households, and households in areas of higher population density have higher aHEI scores than those in less populated areas. All pairwise differences in the mean aHEI are statistically significant at the 5% level even after controlling for multiple comparison bias. The positive association of household nutrition with urbanicity could be caused by simultaneity between consumer preferences for urbanicity and nutrition or may suggest a genuine causal relationship between the food environment and the nutritional quality of food purchases. In our subsequent main analysis, we employed several empirical and econometric techniques to reduce simultaneity and to uncover evidence of causality (Box 7.2).
Table 7.3 Means and standard deviations of household aHEI by income group and urbanicity.
Household type                           Mean      Standard deviation
Low income, high population density      0.01      1.22
High income, high population density     0.07      1.02
Low income, low population density       −0.11     1.10
High income, low population density      −0.05     0.90
Note: The household aHEI has been standardized to have zero mean and unit variance over the full sample of all households. Low-income households are those with an IPR less than or equal to 185%. Low population density areas have 1250 or fewer persons per square mile of land area. The pairwise differences in mean aHEI are statistically significant for all pairs. The Type I experiment-wise error rate is controlled at 5%.
Table 7.4 reports results from the regression of annual average household aHEI on household characteristics using the subsample of 1050 Homescan panelists that were in both the 2006 Fresh Foods panel and the 2007 Ailment panel. As expected, household nutrition is positively associated with household socioeconomic status. A household with a college-educated head, on average, is associated with a 0.15 standard deviation (SD) improvement in nutrition compared with a household without a
Box 7.2 An alternative method of measuring retailer aHEIs using retail scanner data

An alternative method of creating retailer aHEIs is to use store scanner data rather than household scanner data. In our application, we scored the aHEI for each retail chain using all household purchases at this chain, and the aHEI for each independent retailer using all household purchases for the retail format. Hence, retailer aHEI varies month to month for chains and independent retailers but not across markets. We did this to minimize selection bias of households choosing particular foods that satisfy their tastes. It has been reported that Homescan households are different from the US population (see Section 5.2), and their preferences for particular foods may be less or more healthful than those of the US population. This implies that retailer aHEIs based on household scanner data are biased toward the preferences of Homescan households. Store scanner data report all sales for stores that agree to release their information to researchers (see Chapter 2). Hence, retailer aHEIs based on retail scanner data would not suffer from household selection bias. However, store scanner data have some issues as well. For example, not all stores report random-weight food sales to Nielsen or IRI (Muth et al., 2016). Because the produce department accounts for a significant portion of total random-weight food sales, the missing random-weight food sales would likely lower a store's aHEI score. Furthermore, Levin et al. (2018) found significant underreporting in both counts and sales compared with the Economic Census, TDLinx, and other datasets. See Allcott et al. (2019) for an example of using retail scanner data to measure the healthfulness of the retail environment.
Table 7.4 Association of household aHEI with household attributes.
Dependent variable: Household aHEI (annual average, standardized)

| Explanatory variable | Estimate | Standard error |
|---|---|---|
| Intercept | −0.36⁎⁎⁎ | 0.10 |
| Overweight/obese | −0.01 | 0.04 |
| Household age | 0.04⁎⁎⁎ | 0.01 |
| College educated (binary) | 0.15⁎⁎⁎ | 0.04 |
| Income | 0.10⁎⁎⁎ | 0.03 |
| Income-squared | −0.04⁎⁎⁎ | 0.01 |
| Asian household (binary) | 0.34⁎⁎⁎ | 0.09 |
| Midwest (binary) | 0.02 | 0.06 |
| South (binary) | −0.03 | 0.05 |
| West (binary) | 0.18⁎⁎⁎ | 0.05 |
| Adjusted R-squared | 0.09 | |

Note: Household age = average household head age group number (1: under 25 years; 2: 25–29 years; 3: 30–34 years; 4: 35–39 years; 5: 40–44 years; 6: 45–49 years; 7: 50–54 years; 8: 55–64 years; and 9: 65+ years). Income = per capita income (standardized). Overweight/obese = proportion of overweight or obese household members. Estimates are based on the subsample of 1050 Homescan households that were in both the 2006 Fresh Foods panel and the 2007 Ailments panel.
⁎⁎⁎ Statistical significance at the 1% level.
college-educated head. The coefficients on income and income squared are positive and negative, respectively, suggesting a positive but concave income gradient for nutrition. The coefficient on the share of overweight/obese household members is negative but insignificant. A priori, we had expected the overweight/obese measure to be significantly correlated with household aHEI. There could be several reasons for this unexpected result. First, the Homescan panel may not be representative of the US population in nutrition attitudes and behavior. Previous research found evidence that the Homescan sample is more nutrition-conscious than the population (Muth et al., 2013). Restricting the sample to households in both the Fresh Foods and Ailment panels might have further reduced the representativeness of the results. Second, the dependent variable, household aHEI, is imputed using quantity shares and the linear HEI regression coefficients (Table 7.1). By contrast, the exact HEI-2005 is the sum of 12 component scores that are nonlinear functions of food group/nutrient intakes. Therefore, the household aHEI measures the true HEI with error. This “error on the left” results in larger standard errors for the regression coefficients (Hausman, 2001). Third, household member overweight/obesity status is self-reported, which is subject to reporting error. This “error on the right” tends to bias the coefficient on overweight/obesity toward zero. Because of the lack of association between household aHEI and overweight/obesity in the Fresh Foods-Ailment subsample, we did not include the overweight/obesity variable in the subsequent analysis. This allowed us to focus on the larger sample of 9624 households in the 52 Homescan markets during the 2004–06 period for our main analysis.
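The two measurement-error effects described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the chapter's data or code: the slope of −0.10, the unit error variances, and all variable names are hypothetical choices made only for illustration.

```python
import numpy as np

def ols_slope_and_se(x, y):
    """OLS slope and its conventional standard error for y = a + b*x + e."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    sxx = np.sum(x_c ** 2)
    slope = np.sum(x_c * y_c) / sxx
    resid = y_c - slope * x_c
    sigma2 = np.sum(resid ** 2) / (len(x) - 2)
    return slope, np.sqrt(sigma2 / sxx)

rng = np.random.default_rng(0)
n = 200_000
true_slope = -0.10  # hypothetical value, not an estimate from the chapter

x_true = rng.normal(size=n)                         # error-free regressor
y_true = true_slope * x_true + rng.normal(size=n)   # error-free outcome

# "Error on the left": the outcome is observed with noise.
y_noisy = y_true + rng.normal(size=n)
# "Error on the right": the regressor is observed with noise.
x_noisy = x_true + rng.normal(size=n)

b_clean, se_clean = ols_slope_and_se(x_true, y_true)
b_left, se_left = ols_slope_and_se(x_true, y_noisy)
b_right, se_right = ols_slope_and_se(x_noisy, y_true)

print(f"no error:       slope={b_clean:.3f}, se={se_clean:.4f}")
print(f"error on left:  slope={b_left:.3f}, se={se_left:.4f}")
print(f"error on right: slope={b_right:.3f}, se={se_right:.4f}")
```

In this setup, noise in the outcome leaves the slope centered on its true value but inflates its standard error, whereas noise in the regressor shrinks the slope toward zero (by about half here, because the signal and noise variances are equal).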
7.3 Measuring the retail food environment

Unlike most food environment studies that use proximity to or density of supermarkets and convenience stores as measures of the food environment, we used the aHEIs of retailers described in Section 7.2 and buffers around each Homescan Fresh Foods panelist to create a continuous spatial measure of neighborhood food healthfulness. To do this, retailer locations including longitude/latitude and address were obtained from the 2004–06 annual archived files of InfoUSA and merged to the census tract of each household in the Fresh Foods panel. Following Zhen et al. (2019), we adapted the polynomial inverse lag (PIL) of Mitchell and Speaker (1986) to create a continuous spatial measure of neighborhood food healthfulness around each household. Overall, we linked 99% of Homescan food purchases (in expenditures) to establishments (by chain or retail format type) in the 2004–06 annual archived InfoUSA files. The remaining 1% were purchases at named retailers in Homescan that could not be matched to food retailers in InfoUSA by business name. We matched 821 chain and independent (named) food retailers in Homescan to the corresponding businesses in InfoUSA based on business names and Standard Industrial Classification (SIC) codes. This matching produced an average matched establishment count of 92,105 per year, which accounted for 92% of total food sales in Homescan. By retail channel, we matched 90%, 95%, and 91% of Homescan food purchases at supermarkets, drug stores, and mass merchandisers, respectively, to specific retailers in InfoUSA (Table 7.5). Because the club and supercenter channels contained few retail chains, 100% of these stores were matched in InfoUSA. The match rate between named retailers in Homescan and InfoUSA was lower for the convenience channel because this channel contained many independent stores.
In terms of food sales, supermarkets in the Homescan data accounted for over 70% of total US retail food sales, followed by supercenters at 14%. Convenience stores and gas stations sold just 1% and 0.1% of retail food products, respectively. The remaining 208,710 additional food retail establishments in InfoUSA were classified into seven generic retail formats (see Table 7.2 for the list of formats), based on their primary SIC codes. These unmatched InfoUSA retailers were usually smaller and independently owned, to which we assigned format-specific average food sales calculated based on Homescan purchases at unnamed retailers. These stores accounted for about 7% of total food purchases in the Homescan data. We drew a 30-mile radius from the population-weighted centroid of each tract (population estimates from the 2004–06 American Community Survey) and calculated the average retailer aHEI, weighted by retailer-specific per-store total food sales, in incremental 1-mile concentric circles around each tract centroid.d Healthfulness of the retail food environment was measured by the weighted retailer aHEI and differentiated by distance to the household. Using the weighted aHEI score, chmt (m = 1, 2, …, 30),
d We divided Homescan retailer total food sales by store count from InfoUSA data to obtain per-store sales for each retailer by year.
Table 7.5 Retailer match rates between Homescan and InfoUSA and distribution of food purchases across retail formats.

| | Supermarkets (%) | Drug stores (%) | Mass merchandisers (%) | Club stores (%) | Convenience stores (%) | Gas stations (%) | Supercenters (%) |
|---|---|---|---|---|---|---|---|
| Homescan purchases linked to InfoUSA: by retailer name | 90 | 95 | 91 | 100 | 55 | 87 | 100 |
| Homescan purchases linked to InfoUSA: by retail format^a | 9 | 5 | 9 | 0 | 42 | 12 | 0 |
| Homescan purchases not linked to InfoUSA^b | 1 | 0 | 0 | 0 | 3 | 1 | 0 |
| As a % of total Homescan food purchases | 72 | 2 | 3 | 8 | 1 | 0.1 | 14 |

Note: We linked Homescan purchases to businesses in annual archived files (2004, 2005, and 2006) of InfoUSA by retailer name and the retailer's primary SIC code. We used this Homescan-InfoUSA linkage to calculate per-store food sales, impute monthly retail chain- and format-level aHEIs, and measure the retail aHEI in varying distances from each Homescan household. The linkage statistics reported in this table are average values over the 2004–06 period. Homescan food purchases are projected to national estimates using survey weights provided by Nielsen for the static Fresh Foods panel. A retail format is one of the following: grocery store, drug store, mass merchandiser, club store, convenience store, gas station, and supercenter.
^a Homescan purchases at unnamed retailers linked to InfoUSA businesses by retail format.
^b Purchases at retailers named in Homescan but not found in InfoUSA.
for the mth concentric circle around household h's tract in period t, we constructed the lth transformed spatial PIL variable:

$$
z_{hlt} = \sum_{m=1}^{M} \frac{c_{hmt}}{m^{l}}, \quad m = 1, \ldots, 30, \tag{7.1}
$$
where l = 2, …, N and N is the spatial lag length to be determined empirically. The choice of M is guided by the recognition that a larger M reduces the approximation error of the PIL variables to the true shape of the spatial lag distribution (Mitchell & Speaker, 1986). However, the marginal reduction in the approximation error decreases as M increases. We repeated the analysis below using M = 15 and obtained virtually identical results.
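The construction of the ring-level retail aHEI and the PIL transformation in Eq. (7.1) can be sketched in a few lines. This is an illustrative sketch, not the authors' code. It assumes that the "incremental 1-mile concentric circles" are non-overlapping rings, that rings with no retailers take the value 0 (the mean of a standardized score), and that every retailer has positive per-store sales; the function and argument names are hypothetical.

```python
import numpy as np

def ring_weighted_ahei(distance_miles, retailer_ahei, per_store_sales, max_miles=30):
    """Sales-weighted mean retailer aHEI in each 1-mile ring (m-1, m] around a centroid.

    Assumptions (not stated in the chapter): rings are non-overlapping,
    empty rings are set to 0, and per-store sales are strictly positive.
    """
    distance_miles = np.asarray(distance_miles, dtype=float)
    retailer_ahei = np.asarray(retailer_ahei, dtype=float)
    per_store_sales = np.asarray(per_store_sales, dtype=float)
    # Ring index: 0.4 miles -> ring 1, 1.2 miles -> ring 2; distance 0 goes to ring 1.
    ring = np.maximum(np.ceil(distance_miles), 1).astype(int)
    c = np.zeros(max_miles)
    for m in range(1, max_miles + 1):
        in_ring = ring == m
        if in_ring.any():
            c[m - 1] = np.average(retailer_ahei[in_ring], weights=per_store_sales[in_ring])
    return c

def pil_variables(c, max_lag):
    """PIL variables z_l = sum_m c_m / m**l for l = 2, ..., max_lag (Eq. 7.1)."""
    c = np.asarray(c, dtype=float)
    m = np.arange(1, len(c) + 1)
    return {l: float(np.sum(c / m ** l)) for l in range(2, max_lag + 1)}

# Toy example: three retailers within 3 miles of one tract centroid.
c_example = ring_weighted_ahei(
    distance_miles=[0.5, 0.8, 1.5],
    retailer_ahei=[1.0, 3.0, 2.0],
    per_store_sales=[1.0, 3.0, 5.0],
    max_miles=3,
)
z_example = pil_variables(c_example, max_lag=3)
print(c_example, z_example)
```

Because each ring's score is divided by a power of its distance, nearby rings dominate every PIL variable, and higher-order variables discount distant rings more steeply.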
7.4 Association between healthfulness of household purchases and retail food environment

We used our measure of the retail food environment (i.e., the transformed spatial PIL variable for household h at time t, zhlt) to examine associations between household aHEI and the retail food environment. A benefit of using household scanner data for examining the association between the healthfulness of household food purchases and the retail food environment is the availability of information on prices paid by the household panelists. This information allowed us to model the association between household aHEI and food prices and measures of the retail food environment in the same regression model. Hence, we could then compare the magnitude of the association of the nutritional quality of household purchases with price and the retail food environment variables. Most of the previous research has examined the price and retail food environment effects separately, making it difficult to compare the relative importance of these two factors due to modeling and data differences across studies that likely influence the estimates. Only a few studies have looked at the effect of both food prices and the retail food environment on purchasing decisions and obesity of households and found prices to be a more important factor (Ghosh-Dastidar et al., 2014; Lin, Ver Ploeg, Kasteridis, & Yen, 2014).
7.4.1 Reduced-form model and statistical considerations

We hypothesized that the nutrition quality of a household h's store-bought foods in month t, aHEIht, is a function of a number of social, economic, and environmental factors:

$$
\begin{aligned}
aHEI_{ht} = {} & \sum_{l=2}^{N} \left( \beta_{zl} z_{hlt} + \beta_{zyl} z_{hlt} y_{ht} \right) + \sum_{i=1}^{30} \gamma_{i} \ln p_{hit} + \sum_{i=1}^{30} \gamma_{iy} \ln p_{hit}\, y_{ht} \\
& + \sum_{i=1}^{3} \phi_{i} D_{hit} + \sum_{i=2}^{12} \delta_{i}\, mon_{i} + \delta_{05}\, yr05 + \delta_{06}\, yr06 + \alpha_{h} + \epsilon_{ht},
\end{aligned}
\tag{7.2}
$$
where yht is per capita income; zhltyht is the interaction between our measure of the retail food environment and income used to capture potential differences in the association by income level; phit is the price for food category i; the Dhit variables include per capita income, per capita income squared, and county temperature; moni is an indicator for the ith calendar month of the year to account for seasonality; yr05 and yr06 are indicators for year 2005 or 2006, respectively; αh is the intercept for household h as the household fixed effect; ϵht is the regression residual; and β, γ, ϕ, and δ are coefficients to be estimated. The price variables control for the effect of food cost variation on household aHEI, and the temperature variable controls for seasonal variations in the nutrition quality of household purchases that differ across locations. Because of potential urbanicity-based differences in shopping behavior, we stratified the sample into a high population-density sample and a low population-density sample using the median county population density of 1250 persons per square mile of land area as the cutoff. We estimated Eq. (7.2) separately for the low and high population-density samples.

The optimal value of the spatial lag length, N, was determined by first starting N at a reasonable value and testing the statistical significance of the coefficients on zhNt and zhNtyht. If N is too low, the PIL variables may not approximate the curvature of the spatial distribution of the association well. If N is too high, there may be too much multicollinearity between the PIL variables. For both samples, we started with N = 4 and continued to lower N until the coefficients on zhNt and zhNtyht were both statistically significant. The effect of a unit change in the weighted average retailer aHEI in the concentric circle m miles away, chmt, on household h's aHEI is

$$
\frac{\Delta aHEI_{ht}}{\Delta c_{hmt}} = \sum_{l=2}^{N} \frac{\beta_{zl} + \beta_{zyl}\, y_{ht}}{m^{l}}. \tag{7.3}
$$
We estimated the linear model as a fixed-effects model using SAS 9.4 PROC PANEL, where the fixed effects are at the household level. The household fixed effects are subtracted from the regression through within-transformation prior to estimation. The benefit of a household fixed-effects model is that the results are robust to correlations between the explanatory variables and the household fixed effects. The cost of using the within-transformation is that the coefficients on time-invariant household characteristics such as size and the educational attainment of the household head cannot be estimated because of their perfect collinearity with the household fixed effects.

Although the household fixed effects control for unobserved heterogeneity that may bias the association between household aHEI and retailer aHEI, we are also aware of other sources of potential endogeneity in using the same purchase data to create variables on the right- and left-hand sides of the linear regression model. For this reason, we deployed three additional strategies to reduce this bias. First, we calculated the aHEI for each retail chain using all household purchases at this chain. As long as some households in the United States purchased foods at this chain, we have an aHEI for the retailer. For independent, nonchain grocery stores, we calculated an aHEI using all purchases at independent grocers of the same retail format rather than using a store-specific aHEI to reduce selection bias. An alternative approach to the one we used is to score the healthfulness of a chain's offerings using only the purchases of sample households in the geographic area for that chain and then associate the aHEI of each household's purchases with the retailer. Compared with this alternative, our approach is less susceptible to selection bias. Second, we linked the retailer aHEI to retailer location data in the 2004–06 annual archived files of InfoUSA by year. Year-over-year variation in InfoUSA due to store openings and closings provides additional supply-side variation in retail aHEI. Third, of the 9624 households in our study, 756 moved between census tracts at least once in the 2004–06 period. The additional variation in retail aHEI for these movers helps identify the household purchase-retail environment association.

In summary, identification of associations between the household purchase basket and the retail environment relies on (1) monthly variation in retail chain-level and format-level aHEI, (2) use of household fixed effects, (3) between-year differences in the composition of retailers across neighborhoods as registered by InfoUSA, and (4) household relocation between census tracts. It should be noted, however, that as an observational study our approach reduces but does not eliminate selection bias. An alternative to our approach is to rely on supermarket entry and exit data to identify the causal effects of changes in the food environment on the nutritional quality of purchases (Cummins et al., 2014; Dubowitz et al., 2015; Elbel et al., 2015). However, even these seemingly exogenous food environment changes may not be perfect natural experiments because entry and exit decisions are likely made with a consideration of consumer demand.
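The within-transformation behind the household fixed-effects estimator can be illustrated with a short sketch. The chapter's estimates come from SAS PROC PANEL; the NumPy code below is only an illustration on synthetic data, with hypothetical function names, a hypothetical true slope of 0.5, and a household effect deliberately correlated with the regressor.

```python
import numpy as np

def within_transform(values, groups):
    """Subtract each group's mean from its own observations."""
    values = np.asarray(values, dtype=float)
    flat = values.reshape(values.shape[0], -1)
    _, idx = np.unique(np.asarray(groups).ravel(), return_inverse=True)
    sums = np.zeros((idx.max() + 1, flat.shape[1]))
    np.add.at(sums, idx, flat)                 # accumulate per-group sums
    counts = np.bincount(idx)[:, None]
    return (flat - (sums / counts)[idx]).reshape(values.shape)

def fixed_effects_ols(y, X, groups):
    """OLS on group-demeaned data (the within estimator)."""
    y_w = within_transform(y, groups)
    X_w = within_transform(X, groups)
    beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
    return beta

# Synthetic panel: 2000 households observed for 12 months each.
rng = np.random.default_rng(1)
n_households, n_months = 2000, 12
household = np.repeat(np.arange(n_households), n_months)
alpha = rng.normal(size=n_households)                           # unobserved household effect
x = 0.8 * alpha[household] + rng.normal(size=household.size)    # regressor correlated with alpha
y = 0.5 * x + alpha[household] + rng.normal(size=household.size)

# Pooled OLS ignores alpha and is biased; the within estimator removes alpha.
pooled = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0][1]
beta_fe = fixed_effects_ols(y, x[:, None], household)[0]
print(f"pooled OLS slope: {pooled:.3f}, fixed-effects slope: {beta_fe:.3f}")
```

Demeaning by household removes anything constant within a household, which is also why coefficients on time-invariant characteristics cannot be recovered from the transformed data.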
7.4.2 Price index construction

We constructed a Fisher Ideal price index for each of the 30 food categories to measure household price variation, phit. Because the within-category variation reflects quality and variety of products purchased across different households, we used a price index to remove this source of food cost differences that are endogenous to a household's taste preferences. The Fisher Ideal price index for household h's purchase of food category i in month t was calculated as

$$
p_{hit} = \sqrt{\frac{\sum_{k} p_{kht}\, q_{k0}}{\sum_{k} p_{k0}\, q_{k0}} \cdot \frac{\sum_{k} p_{kht}\, q_{kht}}{\sum_{k} p_{k0}\, q_{kht}}}, \quad i = 1, \ldots, 30, \tag{7.4}
$$
where pkht and qkht are the price and purchase quantity of elementary food item k for household h in month t, respectively, and pk0 and qk0 are the base price and base quantity of elementary item k at the brand level. We set the base at the 2004–06 Homescan sample average, which means the price index was equal to one at the base. Note that because the same qk0’s or qkht’s appear in both the numerator and denominator of the Fisher Ideal index formula, we did not need to convert the household purchase quantity into a per capita quantity before calculating the index number. Similar to how we imputed missing aHEIs for households and following Zhen, Wohlgenant, Karns, and Kaufman (2011), we imputed missing elementary prices for products not purchased by a household in a time period by using the predicted values from a linear regression of observed prices on a set of binary indicator variables for months, Nielsen markets, brands, and interactions between these indicator variables. This is equivalent to assigning the market-, time- and brand-specific average price for missing elementary prices.
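As a small worked illustration of Eq. (7.4), the sketch below computes a Fisher Ideal index as the geometric mean of the Laspeyres and Paasche indexes for one household, category, and month. The prices and quantities are made-up numbers and the function name is hypothetical; the sketch also assumes that any missing elementary prices have already been imputed.

```python
import numpy as np

def fisher_ideal_index(p_t, q_t, p_0, q_0):
    """Fisher Ideal price index: geometric mean of the Laspeyres and Paasche indexes.

    p_t, q_t: household prices and quantities of the elementary items in the period.
    p_0, q_0: base prices and quantities of the same items.
    """
    p_t, q_t, p_0, q_0 = (np.asarray(a, dtype=float) for a in (p_t, q_t, p_0, q_0))
    laspeyres = np.sum(p_t * q_0) / np.sum(p_0 * q_0)   # base-quantity weights
    paasche = np.sum(p_t * q_t) / np.sum(p_0 * q_t)     # current-quantity weights
    return float(np.sqrt(laspeyres * paasche))

# Two elementary items in one food category (illustrative numbers only).
p_0, q_0 = [1.0, 2.0], [10.0, 5.0]
p_t, q_t = [1.2, 2.0], [8.0, 6.0]

index = fisher_ideal_index(p_t, q_t, p_0, q_0)
index_at_base = fisher_ideal_index(p_0, q_0, p_0, q_0)
index_per_capita = fisher_ideal_index(p_t, np.array(q_t) / 3, p_0, q_0)
print(index, index_at_base, index_per_capita)
```

The last call divides the household quantities by a household size of 3 and returns the same index, which is the reason no per capita conversion is needed before computing the index.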
7.4.3 Results

Table 7.6 reports the results of the fixed-effects model of Eq. (7.2) by urbanicity. Using the procedure outlined in the “Reduced-Form Model and Statistical Considerations” section above, we determined the optimal value of N to be 2 and 3 for the low population-density and high population-density samples, respectively. The coefficients on the PIL variables are not statistically significant in the low population-density sample, which suggests that there is little association between the retail aHEI and household aHEI at this level of urbanicity. For the high population-density sample, all coefficients on the PIL variables are significant at the 1% level. The magnitude and sign of each coefficient on a PIL variable are difficult to interpret as a stand-alone parameter because the shape of the spatial distribution of the association is jointly determined by all PIL coefficients. The most intuitive way to see these results is to plot the predicted change in household aHEI associated with a 1-SD increase in retail aHEI in varying distances to the household. Fig. 7.1 illustrates the magnitude of these associations by income group and urbanicity. Consistent with the statistically insignificant PIL coefficient estimates for the low population-density sample in Table 7.6, the retail-household aHEI association is insignificant at any income level and distance to the household's tract (panels A, C, and E of Fig. 7.1). For the high population-density sample, the association is statistically significant for average- and below-average-income households and peaks at 2 miles from the household tract centroid. For densely populated areas, a 1-SD increase in retail aHEI at 2 miles is associated with a 0.04- and 0.06-SD improvement in household aHEI for average- and below-average-income households, respectively. There is no statistically significant association for above-average-income households living in high population-density areas.
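The distance profile of the association can be traced by plugging the high population-density PIL coefficients reported in Table 7.6 into Eq. (7.3). The sketch below is illustrative rather than the authors' code: it takes income to be measured in standard deviations from the mean (−1, 0, and +1 for below-average, average, and above-average income), and it does not reproduce the confidence bands in Fig. 7.1, which would require the coefficient covariance matrix.

```python
import numpy as np

# High population-density estimates as reported in Table 7.6 (keys are the PIL order l).
BETA_Z = {2: 0.2855, 3: -0.2917}    # PIL_2, PIL_3
BETA_ZY = {2: -0.2124, 3: 0.2141}   # per capita income * PIL_2, * PIL_3

def predicted_change(m, income_sd, beta_z=BETA_Z, beta_zy=BETA_ZY):
    """Eq. (7.3): change in household aHEI per unit change in retail aHEI m miles away."""
    m = np.asarray(m, dtype=float)
    return sum((beta_z[l] + beta_zy[l] * income_sd) / m ** l for l in beta_z)

miles = np.arange(1, 31)
profiles = {}
for label, income_sd in [("below-average", -1.0), ("average", 0.0), ("above-average", 1.0)]:
    effect = predicted_change(miles, income_sd)
    profiles[label] = effect
    peak_mile = int(miles[np.argmax(effect)])
    print(f"{label:>13} income: peak at {peak_mile} miles, effect at 2 miles = {effect[1]:.4f}")
```

Up to rounding of the published coefficients, the point estimates at 2 miles are close to the 0.04 and 0.06 SD quoted above for average- and below-average-income households, and both profiles peak at 2 miles.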
Two aspects of these results are interesting. First, there is substantial heterogeneity in the retail-household aHEI association. The lack of statistical association in the low population-density sample can be rationalized if these consumers travel farther for grocery stores and if there is a wide dispersion in distance traveled. When both happen, it will be difficult for Eq. (7.1) to obtain statistical significance at specific distances. For the high population-density sample, consumers have a greater number and variety of retailers closer to home. When travel distance is less dispersed across consumers, Eq. (7.1) is better able to identify the distances at which the retail environment has the strongest associations with household purchases. The lack of significance for above-average-income households in densely populated areas suggests that higher-income consumers are less influenced by the healthfulness of the food environment. This would be the case if higher-income consumers are more health-conscious and nutritionally knowledgeable than lower-income consumers such that the latter are more responsive to nutrition cues provided by a healthier food environment. For example, compared with a health-oriented consumer, a less health-conscious consumer may be more likely to buy unhealthy food on impulse, and an unhealthy retail environment increases the odds of impulse purchases for this consumer type. Second, because we find different degrees of associations across urbanicity and income groups, our results are probably not entirely driven by endogeneity. To see this, note that by interacting the PIL variables with income in Eq. (7.2), it is very
Table 7.6 Fixed-effects model results.^a

| Explanatory variables | Low population-density subsample^b: Estimate | Low population-density subsample^b: Standard error | High population-density subsample^c: Estimate | High population-density subsample^c: Standard error |
|---|---|---|---|---|
| Intercept | −0.2102 | 0.2193 | 0.9412⁎⁎⁎ | 0.2464 |
| PIL_2 | 0.0077 | 0.0054 | 0.2855⁎⁎⁎ | 0.0803 |
| PIL_3 | | | −0.2917⁎⁎⁎ | 0.0841 |
| Per capita income*PIL_2 | −0.0039 | 0.0047 | −0.2124⁎⁎⁎ | 0.0684 |
| Per capita income*PIL_3 | | | 0.2141⁎⁎⁎ | 0.0721 |
| Log(price of whole fruit) | −0.2020⁎⁎⁎ | 0.0117 | −0.1967⁎⁎⁎ | 0.0125 |
| Log(price of fruit juice) | −0.0628⁎⁎⁎ | 0.0174 | −0.0641⁎⁎⁎ | 0.0176 |
| Log(price of dark green vegetables) | −0.0420⁎⁎⁎ | 0.0092 | −0.0547⁎⁎⁎ | 0.0125 |
| Log(price of orange vegetables) | −0.0679⁎⁎⁎ | 0.0159 | −0.1316⁎⁎⁎ | 0.0178 |
| Log(price of starchy vegetables) | −0.0334⁎⁎ | 0.0134 | −0.0302⁎⁎ | 0.0147 |
| Log(price of other-nutrient dense vegetables) | −0.0363⁎⁎⁎ | 0.0118 | −0.0412⁎⁎⁎ | 0.0129 |
| Log(price of other-mostly water vegetables) | −0.0650⁎⁎⁎ | 0.0146 | −0.0737⁎⁎⁎ | 0.0158 |
| Log(price of legumes) | −0.0759⁎⁎ | 0.0312 | −0.0994⁎⁎⁎ | 0.0378 |
| Log(price of whole grains) | −0.0672⁎⁎⁎ | 0.0152 | −0.0745⁎⁎⁎ | 0.0167 |
| Log(price of refined grains) | −0.0546⁎⁎⁎ | 0.0145 | −0.0898⁎⁎⁎ | 0.0159 |
| Log(price of low-fat dairy) | −0.0054 | 0.0203 | −0.0189 | 0.0213 |
| Log(price of regular-fat dairy) | 0.1079⁎⁎⁎ | 0.0183 | 0.1237⁎⁎⁎ | 0.0196 |
| Log(price of low-fat red meat) | −0.0464⁎⁎⁎ | 0.0147 | −0.0555⁎⁎⁎ | 0.0172 |
| Log(price of regular-fat red meat) | 0.2135⁎⁎⁎ | 0.0135 | 0.1913⁎⁎⁎ | 0.0154 |
| Log(price of poultry) | −0.1089⁎⁎⁎ | 0.0122 | −0.1009⁎⁎⁎ | 0.0139 |
| Log(price of fish) | −0.0141 | 0.0141 | −0.0340⁎⁎ | 0.0154 |
| Log(price of nuts and seeds) | −0.1387⁎⁎⁎ | 0.0142 | −0.1220⁎⁎⁎ | 0.0163 |
| Log(price of eggs) | 0.0248 | 0.0133 | 0.0209 | 0.0144 |
| Log(price of oils) | −1.1756⁎⁎⁎ | 0.0190 | −1.3259⁎⁎⁎ | 0.0198 |
| Log(price of solid fat) | 0.2228⁎⁎⁎ | 0.0166 | 0.1950⁎⁎⁎ | 0.0180 |
| Log(price of sugar and sweeteners) | 0.1653⁎⁎⁎ | 0.0189 | 0.1041⁎⁎⁎ | 0.0220 |
| Log(price of SSB) | 0.2717⁎⁎⁎ | 0.0133 | 0.2878⁎⁎⁎ | 0.0146 |
| Log(price of non-SSB) | 0.0404⁎⁎⁎ | 0.0153 | 0.0550⁎⁎⁎ | 0.0172 |
| Log(price of bottled water) | 0.0487⁎⁎⁎ | 0.0135 | 0.1239⁎⁎⁎ | 0.0153 |
| Log(price of frozen CP sweet items) | 0.1232⁎⁎⁎ | 0.0140 | 0.1062⁎⁎⁎ | 0.0152 |
| Log(price of other CP sweet items) | 0.2511⁎⁎⁎ | 0.0116 | 0.2732⁎⁎⁎ | 0.0128 |
| Log(price of frozen CP non-sweet items) | −0.0546⁎⁎⁎ | 0.0165 | −0.0200 | 0.0175 |
| Log(price of canned CP non-sweet items) | 0.0623⁎⁎⁎ | 0.0161 | 0.0658⁎⁎⁎ | 0.0175 |
| Log(price of packaged CP snacks) | 0.0346⁎⁎ | 0.0164 | 0.0312 | 0.0174 |
| Log(price of other CP non-sweet items) | −0.0129 | 0.0101 | −0.0132 | 0.0107 |
| Per capita income*log(price of whole fruit) | −0.0216 | 0.0128 | −0.0114 | 0.0111 |
| Per capita income*log(price of fruit juice) | −0.0340 | 0.0199 | 0.0171 | 0.0162 |
| Per capita income*log(price of dark green vegetables) | 0.0130 | 0.0101 | −0.0433⁎⁎⁎ | 0.0110 |
| Per capita income*log(price of orange vegetables) | 0.0127 | 0.0184 | 0.0194 | 0.0170 |
| Per capita income*log(price of starchy vegetables) | 0.0658⁎⁎⁎ | 0.0154 | 0.0217 | 0.0140 |
| Per capita income*log(price of other-nutrient dense vegetables) | −0.0220 | 0.0132 | 0.0173 | 0.0119 |
| Per capita income*log(price of other-mostly water vegetables) | 0.0060 | 0.0156 | −0.0063 | 0.0136 |
| Per capita income*log(price of legumes) | 0.0541 | 0.0341 | −0.0333 | 0.0322 |
| Per capita income*log(price of whole grains) | 0.0136 | 0.0170 | 0.0017 | 0.0154 |
| Per capita income*log(price of refined grains) | 0.0264⁎⁎ | 0.0150 | 0.0159 | 0.0139 |
| Per capita income*log(price of low-fat dairy) | 0.0204 | 0.0216 | −0.0091 | 0.0183 |
| Per capita income*log(price of regular-fat dairy) | 0.0187 | 0.0193 | −0.0163 | 0.0178 |
| Per capita income*log(price of low-fat red meat) | 0.0208 | 0.0172 | −0.0023 | 0.0160 |
| Per capita income*log(price of regular-fat red meat) | −0.0434⁎⁎⁎ | 0.0148 | −0.0243 | 0.0137 |
| Per capita income*log(price of poultry) | 0.0268⁎⁎ | 0.0139 | 0.0113 | 0.0132 |
| Per capita income*log(price of fish) | −0.0205 | 0.0160 | −0.0012 | 0.0138 |
| Per capita income*log(price of nuts and seeds) | −0.0028 | 0.0162 | −0.0008 | 0.0154 |
| Per capita income*log(price of eggs) | 0.0168 | 0.0124 | −0.0030 | 0.0124 |
| Per capita income*log(price of oils) | −0.0433⁎⁎ | 0.0220 | −0.0503⁎⁎⁎ | 0.0192 |
| Per capita income*log(price of solid fat) | 0.0211 | 0.0190 | 0.0499⁎⁎⁎ | 0.0173 |
| Per capita income*log(price of sugar and sweeteners) | 0.0203 | 0.0221 | 0.0136 | 0.0223 |
| Per capita income*log(price of SSBs) | 0.0062 | 0.0150 | −0.0049 | 0.0137 |
| Per capita income*log(price of non-SSBs) | 0.0089 | 0.0177 | 0.0136 | 0.0154 |
| Per capita income*log(price of bottled water) | 0.0307⁎⁎ | 0.0152 | 0.0355⁎⁎ | 0.0146 |
| Per capita income*log(price of frozen CP sweet items) | −0.0098 | 0.0155 | −0.0117 | 0.0151 |
| Per capita income*log(price of other CP sweet items) | −0.0081 | 0.0124 | 0.0440⁎⁎⁎ | 0.0108 |
| Per capita income*log(price of frozen CP non-sweet items) | −0.0220 | 0.0182 | −0.0054 | 0.0162 |
| Per capita income*log(price of canned CP non-sweet items) | −0.0156 | 0.0182 | 0.0364⁎⁎ | 0.0155 |
| Per capita income*log(price of packaged CP snacks) | 0.0243 | 0.0175 | −0.0106 | 0.0156 |
| Per capita income*log(price of other CP non-sweet items) | −0.0051 | 0.0116 | 0.0017 | 0.0098 |
| Per capita income | 0.0363⁎⁎⁎ | 0.0115 | 0.0598⁎⁎⁎ | 0.0140 |
| Per capita income squared | −0.0020 | 0.0051 | −0.0049 | 0.0045 |
| County temperature | −0.0244⁎⁎⁎ | 0.0081 | −0.0196⁎⁎ | 0.0083 |
| Indicator for year 2005 | 0.0735⁎⁎⁎ | 0.0063 | 0.0716⁎⁎⁎ | 0.0073 |
| Indicator for year 2006 | 0.1401⁎⁎⁎ | 0.0072 | 0.1562⁎⁎⁎ | 0.0083 |
| Calendar month fixed effects | Included | | Included | |
| R2 | 0.3894 | | 0.3711 | |
| N | 128,916 | | 121,824 | |

Note: Household aHEI, the dependent variable, is imputed using the income group-specific HEI regression coefficients reported in Table 7.1. Fixed effects are at the household level. Per capita income and county temperature are standardized to have mean 0 and standard deviation of 1. Each food group price index has a base value of 1 where the base is the national average cost for this food group in 2004–06. Sample households are those Fresh Foods panelists residing in 52 Homescan markets. CP, commercially prepared. PIL, polynomial inverse lag. PIL_2 and PIL_3 correspond to the zh2t and zh3t PIL variables. SSB, sugar-sweetened beverages.
^a Household aHEI standardized to have mean 0 and variance of 1.
^b Includes counties with 1250 or fewer persons per square mile of land.
^c Includes counties with more than 1250 persons per square mile of land.
⁎⁎⁎ Statistical significance at the 1% level.
⁎⁎ Statistical significance at the 5% level.
Fig. 7.1 Predicted change in household aHEI associated with a 1-SD increase in retail aHEI in varying distances to household tract: (A) below-average income, low population density; (B) below-average income, high population density; (C) average income, low population density; (D) average income, high population density; (E) above-average income, low population density; and (F) above-average income, high population density. Note: Figures based on the PIL coefficient estimates and their standard errors reported in Table 7.6. The 95% confidence interval is presented by the gray bands. Distance is measured between each concentric circle of retail food environment to the population-weighted centroid of the household tract. Above- and below-average income is measured at 1-SD above and below the per capita average income, respectively.
similar to a triple-difference model widely used in the program evaluation literature. The difference is that we do not have a natural experiment to help identification. A triple-difference model controls for confounders common to subjects in the treatment group. In our case, any effect of common confounders (e.g., supply-demand simultaneity) would be embedded in the zhlt variables. Therefore, the statistically significant coefficients on the zhltyht terms suggest much of the association remains even after controlling for common confounders.

In contrast to the varying degree of statistical significance between areas of urbanicity, most of the coefficients on lnphit are significant at the 1% or 5% level and have the signs one would expect from the law of demand and the marginal contribution of each food group to the aHEI in Table 7.1. A few coefficients on the interaction terms lnphityht are also statistically significant, which is likely caused by differences in price elasticities of demand by income group. A 1-SD increase in sugar-sweetened beverage prices (equivalent to a 26% price change) is associated with a 0.07-SD improvement in household aHEI. Although this is comparable to the predicted 0.04–0.06-SD improvement in household aHEI associated with a 1-SD increase in retail aHEI at 2 miles, the former is significant for all income and urbanicity combinations, while the latter is significant only for average- and below-average-income households in densely populated areas.
7.5 Concluding remarks

In this chapter, we demonstrated how household scanner data can be used to measure the healthfulness of the food environment. We applied the scanner data-based food environment measures to a reduced-form model of the nutritional quality of household food purchases. The model is reduced form because we largely abstracted from the intermediating steps of consumer food decisions, including the decisions of which stores to visit and what foods to purchase. Using this econometric model, we found statistically significant associations between the retail environment and household nutrition for some segments of the population. Although we developed a four-pronged approach to reduce endogeneity, the associations uncovered by the reduced-form model may or may not be causal.

Future research could extend this exploratory research in several ways. First, researchers should continue to look for natural experiments in which the food environment is exogenously changed or for credible instrumental variables for supermarket location decisions. It would not be appropriate to recommend policy changes without establishing causality. Second, scanner data offer an unparalleled level of depth in the measurement of consumer and retailer actions. To fully exploit the richness of these data, it is necessary to develop structural models of consumer store and food choices and retailer product pricing and stocking strategies.
References
Allcott, H., Diamond, R., Dubé, J.-P., Handbury, J., Rahkovsky, I., & Schnell, M. (2019). Food deserts and the causes of nutritional inequality. Quarterly Journal of Economics, 134(4), 1793–1844. https://doi.org/10.1093/qje/qjz015.
Arsenault, J. E., Fulgoni, V. L., III, Hersey, J. C., & Muth, M. K. (2012). A novel approach to selecting and weighting nutrients for nutrient profiling of foods and diets. Journal of the Academy of Nutrition and Dietetics, 112(12), 1968–1975. https://doi.org/10.1016/j.jand.2012.08.032.
Bureau of Labor Statistics. (2016). Consumer expenditure surveys: Protection of respondent confidentiality. Retrieved from: https://www.bls.gov/cex/pumd_disclosure.htm#Geographic.
Carlson, A. C., Tselepidakis Page, E., Zimmerman, T. P., Tornow, C. E., & Hermansen, S. (2019). Linking USDA nutrition databases to IRI household-based and store-based scanner data [Technical Bulletin No. 1952]. Washington, DC: US Department of Agriculture, Economic Research Service.
Centers for Disease Control and Prevention. (2008). Prevalence of overweight, obesity and extreme obesity among adults: United States, trends 1976–80 through 2005–2006. (December), Table 1. Retrieved from: https://www.cdc.gov/nchs/data/hestat/overweight/overweight_adult.pdf.
Chen, S. E., Florax, R. J., & Snyder, S. D. (2013). Obesity and fast food in urban markets: A new approach using geo-referenced micro data. Health Economics, 22(7), 835–856. https://doi.org/10.1002/hec.2863.
Cho, C., McLaughlin, P. W., Zeballos, E., Kent, J., & Dicken, C. (2019). Capturing the complete food environment with commercial data: A comparison of TDLinx, ReCount, and NETS databases [Technical Bulletin No. 1953]. Washington, DC: US Department of Agriculture, Economic Research Service.
Cooksey-Stowers, K., Schwartz, M. B., & Brownell, K. D. (2017). Food swamps predict obesity rates better than food deserts in the United States. International Journal of Environmental Research and Public Health, 14(11), 1366. https://doi.org/10.3390/ijerph14111366.
Courtemanche, C., & Carden, A. (2011). Supersizing supercenters? The impact of Walmart supercenters on body mass index and obesity. Journal of Urban Economics, 69(2), 165–181. https://doi.org/10.1016/j.jue.2010.09.005.
Cummins, S., Flint, E., & Matthews, S. A. (2014). New neighborhood grocery store increased awareness of food access but did not alter dietary habits or obesity. Health Affairs, 33(2), 283–291. https://doi.org/10.1377/hlthaff.2013.0512.
Darmon, N., & Drewnowski, A. (2008). Does social class predict diet quality? The American Journal of Clinical Nutrition, 87(5), 1107–1117. https://doi.org/10.1093/ajcn/87.5.1107.
Dubowitz, T., Ghosh-Dastidar, M., Cohen, D. A., Steiner, E. D., Hunter, G. P., Flórez, K. R., … Collins, R. L. (2015). Diet and perception change with supermarket introduction in a food desert, but not because of supermarket use. Health Affairs, 34(11), 1858–1868. https://doi.org/10.1377/hlthaff.2015.0667.
Dunn, R. A. (2010). The effect of fast-food availability on obesity: An analysis by gender, race, and residential location. American Journal of Agricultural Economics, 92(4), 1149–1164. https://doi.org/10.1093/ajae/aaq041.
Elbel, B., Moran, A., Dixon, L. B., Kiszko, K., Cantor, J., Abrams, C., & Mijanovich, T. (2015). Assessment of a government-subsidized supermarket in a high-need area on household food availability and children’s dietary intakes. Public Health Nutrition, 18(15), 2881–2890. https://doi.org/10.1017/S1368980015000282.
Ghosh-Dastidar, B., Cohen, D., Hunter, G., Zenk, S. N., Huang, C., Beckman, R., & Dubowitz, T. (2014). Distance to store, food prices, and obesity in urban food deserts. American Journal of Preventive Medicine, 47(5), 587–595. https://doi.org/10.1016/j.amepre.2014.07.005.
Gordon-Larsen, P., Rummo, P. E., & Albrecht, S. S. (2015). Field validation of food outlet databases: The Latino food environment in North Carolina. Public Health Nutrition, 18(6), 977–982. https://doi.org/10.1017/S1368980014001281.
Hausman, J. (2001). Mismeasured variables in econometric analysis: Problems from the right and problems from the left. The Journal of Economic Perspectives, 15(4), 57–67. https://doi.org/10.1257/jep.15.4.57.
Levin, D., Noriega, D., Dicken, C., Okrent, A., Harding, M., & Lovenheim, M. (2018). Examining food store scanner data: A comparison of the IRI InfoScan data with other data sets, 2008–2012. Technical Bulletin 1949. Washington, DC: US Department of Agriculture, Economic Research Service. October.
Lin, B.-H., Ver Ploeg, M., Kasteridis, P., & Yen, S. T. (2014). The roles of food prices and food access in determining food purchases of low-income households. Journal of Policy Modeling, 36(5), 938–952. https://doi.org/10.1016/j.jpolmod.2014.07.002.
Mitchell, D. W., & Speaker, P. J. (1986). A simple, flexible distributed lag technique: The polynomial inverse lag. Journal of Econometrics, 31(3), 329–340. https://doi.org/10.1016/0304-4076(86)90064-3.
Muth, M. K., Cates, S. C., Karns, S. A., Siegel, P. H., Wohlgenant, K. C., & Zhen, C. (2013). Comparing attitudinal survey responses from proprietary and government surveys. Research Triangle Park, NC: RTI International. March.
Muth, M. K., Sweitzer, M., Brown, D., Capogrossi, K., Karns, S. A., Levin, D., … Zhen, C. (2016). Understanding IRI household-based and store-based scanner data [Technical Bulletin 1488]. Washington, DC: US Department of Agriculture, Economic Research Service.
Rummo, P. E., Guilkey, D. K., Ng, S. W., Meyer, K. A., Popkin, B. M., Reis, J. P., … Gordon-Larsen, P. (2017). Does unmeasured confounding influence associations between the retail food environment and body mass index over time? The coronary artery risk development in young adults (CARDIA) study. International Journal of Epidemiology, 46(5), 1456–1464. https://doi.org/10.1093/ije/dyx070.
Sadler, R. C. (2016). Strengthening the core, improving access: Bringing healthy food downtown via a farmers’ market move. Applied Geography, 67, 119–128.
Singleton, C. R., Li, Y., Odoms-Young, A., Zenk, S. N., & Powell, L. M. (2019). Change in food and beverage availability and marketing following the introduction of a healthy food financing initiative-supported supermarket. American Journal of Health Promotion, 33(4), 525–533. https://doi.org/10.1177/0890117118801744.
Sturm, R., & Cohen, D. A. (2009). Zoning for health? The year-old ban on new fast-food restaurants in South LA. Health Affairs, 28(6, Suppl. 1), w1088–w1097. https://doi.org/10.1377/hlthaff.28.6.w1088.
Volpe, R., & Okrent, A. (2012). Assessing the healthfulness of consumers' grocery purchases. Economic Information Brief 102. Washington, DC: US Department of Agriculture, Economic Research Service. November. Retrieved from: https://www.ers.usda.gov/publications/pub-details/?pubid=43682.
Volpe, R., Okrent, A., & Leibtag, E. (2013). The effect of supercenter-format stores on the healthfulness of consumers’ grocery purchases. American Journal of Agricultural Economics, 95(3), 568–589. https://doi.org/10.1093/ajae/aas132.
Zhen, C., Lin, B.-H., Okrent, A., Karns, S., & Chrest, D. (2019). Nutrition and the retail environment: Evidence from a national consumer panel. In Working paper. Athens, GA: Department of Agricultural and Applied Economics, University of Georgia.
Zhen, C., Taylor, J. L., Muth, M. K., & Leibtag, E. (2009). Understanding differences in self-reported expenditures between household scanner data and diary survey data: A comparison of Homescan and Consumer Expenditure Survey. Review of Agricultural Economics, 31(3), 470–492. https://doi.org/10.1111/j.1467-9353.2009.01449.x.
Zhen, C., Wohlgenant, M. K., Karns, S., & Kaufman, P. (2011). Habit formation and demand for sugar-sweetened beverages. American Journal of Agricultural Economics, 93(1), 175–193. https://doi.org/10.1093/ajae/aaq155.