SOCIAL SCIENCE RESEARCH 13, 268-286 (1984)

Correcting for Ratio Variable Correlation: Examples Using Models of Mortality

BRIAN F. PENDLETON
University of Akron
The major purpose of this paper is to examine one part of the ratio variable correlation problem, correlated denominators, using path models of young, middle-age, and older-age mortality, each with 11 sociodemographic and cause-of-death independent variables. The models are calculated using a traditional ratio approach and a residual variable approach for a sample of counties. The methodological discussion focuses on differences in coefficients for the two approaches in terms of significance, magnitude, and hypothesis support. It is concluded that the residual approach is worth pursuing as a viable statistical solution to the correlated denominators component of the ratio variable correlation problem.
Much of the research used to study health status, social behavior, and mortality relies on the use of aggregate data. A major controversy exists in this area. Ratio variables often are constructed for the purpose of controlling an extraneous variable (e.g., population size or per capita characteristics; cf. Uslaner, 1976, and Lyons, 1977). Evidence indicates that the correlation of two or more ratios that have common, or highly correlated, components can sometimes lead to statistical spuriousness and conceptual ambiguity (Fuguitt and Lieberson, 1974; Schuessler, 1973, 1974; Bollen and Ward, 1979; Anderson and Lydic, 1977, 1978; Atchley, Gaskins, and Anderson, 1976; Pendleton, Warren, and Chang, 1979; Pendleton, Newman, and Marshall, 1983). Little empirical evidence exists, however, for examining the degree and consequences of this spuriousness when actual data are used. The major purpose of this paper is to examine the problem of ratio variable correlation for multivariate path models of young, middle-age, and older-age mortality and to test a residual procedure designed to “correct” for the spuriousness resulting from ratio variable correlations. Briefly reviewed first is the problem of ratio variable correlation, followed by a description of the multivariate model of mortality. Hypotheses, methods and variables, and results follow, and a discussion of the results and implications ends the paper.

The author thanks Richard D. Warren, H. C. Chang, David L. Rogers, and an anonymous reviewer for extremely helpful comments on earlier versions of this paper. Requests for reprints should be sent to Brian F. Pendleton, Department of Sociology, University of Akron, Akron, OH 44325.

0049-089X/84 $3.00 Copyright © 1984 by Academic Press, Inc. All rights of reproduction in any form reserved.

RATIO VARIABLE CORRELATION
Recent social science research shows that at a purely empirical level when two ratios are correlated, and two or more of the ratio components are highly correlated, the resulting correlation will be spuriously inflated. For example, if two ratio variables are computed, $a = b/c$ and $d = e/f$, and the denominators are highly correlated (i.e., $r_{cf} \neq 1.00$ but is high), the correlation between the ratios is (Pearson, 1897; Fuguitt and Lieberson, 1974; Pendleton et al., 1979)

$$
r_{ad} = \frac{r_{be}V_bV_e - r_{bf}V_bV_f - r_{ec}V_eV_c + r_{cf}V_cV_f}{\left(V_b^2 + V_c^2 - 2r_{bc}V_bV_c\right)^{1/2}\left(V_e^2 + V_f^2 - 2r_{ef}V_eV_f\right)^{1/2}} \tag{1}
$$
where $r_{ad}$ = correlation between $b/c$ and $e/f$; $r_{bc}$, $r_{ef}$, $r_{be}$, $r_{ec}$, $r_{bf}$, $r_{cf}$ = product-moment correlation coefficients; and $V_b$, $V_e$, $V_c$, $V_f$ = coefficients of variation (standard deviation divided by the mean) for variables $b$, $e$, $c$, and $f$, respectively. Even though the researcher’s original intention may have been to examine the association between $b$ and $e$, the introduction of highly correlated control (or deflating) variables $c$ and $f$ introduces a statistical dependency (Pearson, 1910; Yule, 1910; Vanderbok, 1977; Bollen and Ward, 1979; Fuguitt and Lieberson, 1974; Schuessler, 1974; Pendleton et al., 1983). Even if it is assumed in Eq. (1) that $r_{be} = r_{ec} = r_{bf} = r_{bc} = 0$ (i.e., all intercorrelations other than that between the denominators are equal to 0), the correlation $r_{ad}$ between ratios $a$ and $d$ is greater than zero because of the mutual dependencies on $c$ and $f$, the highly correlated denominators. In this case, the spurious correlation would be equal to (Schuessler, 1973; Pearson, 1897; Fuguitt and Lieberson, 1974; Pendleton et al., 1979)

$$
r_{ad} = \frac{r_{cf}V_cV_f}{\left(V_b^2 + V_c^2\right)^{1/2}\left(V_e^2 + V_f^2\right)^{1/2}} \tag{2}
$$
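As a rough numerical illustration of Eqs. (1) and (2), the following Python sketch evaluates the approximation for one hypothetical set of inputs. The function name `ratio_correlation`, the dictionary keys, and the input values (coefficients of variation of .5 and a denominator correlation of .90) are assumptions chosen only for illustration; they are not taken from the data analyzed in this paper.

```python
import math


def ratio_correlation(r, v):
    """Approximate correlation between a = b/c and d = e/f, following Eq. (1).

    r: dict of product-moment correlations keyed "be", "bf", "ec", "cf", "bc", "ef".
    v: dict of coefficients of variation keyed "b", "c", "e", "f".
    """
    numerator = (
        r["be"] * v["b"] * v["e"]
        - r["bf"] * v["b"] * v["f"]
        - r["ec"] * v["e"] * v["c"]
        + r["cf"] * v["c"] * v["f"]
    )
    denom_a = math.sqrt(v["b"] ** 2 + v["c"] ** 2 - 2 * r["bc"] * v["b"] * v["c"])
    denom_d = math.sqrt(v["e"] ** 2 + v["f"] ** 2 - 2 * r["ef"] * v["e"] * v["f"])
    return numerator / (denom_a * denom_d)


# Hypothetical case: every correlation is zero except the one between the
# two denominators, so Eq. (1) reduces to the special case in Eq. (2).
r = {"be": 0.0, "bf": 0.0, "ec": 0.0, "bc": 0.0, "ef": 0.0, "cf": 0.9}
v = {"b": 0.5, "c": 0.5, "e": 0.5, "f": 0.5}

spurious = ratio_correlation(r, v)
print(round(spurious, 3))
```

Under these assumed values the numerators are unrelated to each other and to both denominators, yet the two ratios correlate at about .45, which is the purely spurious component described by Eq. (2).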
It also has been shown that when the denominators are the same the statistical dependency still exists and that this problem, both for highly correlated denominators and for equal denominators, extends from bivariate correlation to multivariate statistical techniques based on correlation matrices (cf. Fuguitt and Lieberson, 1974; Pendleton et al., 1979, 1983; Rangarajan and Chatterjee, 1969; Pearson, 1897). It appears, however, that “spuriousness” is a major problem only when model testing, not model building (Pendleton et al., 1983; see also Kasarda and Nolan, 1979). Although a number of solutions have been investigated, including logarithms (O’Conner, 1977; Schuessler, 1973, 1974), partial correlation
(Kuh and Meyer, 1955; Przeworski and Cortes, 1977; Fleiss and Tanur, 1971), part correlation (Logan, 1971, 1972), and principal components analyses (Atchley et al., 1976), debate about the efficiency and bias of such solutions continues (Kasarda and Nolan, 1979; Anderson and Lydic, 1978; Atchley and Anderson, 1978; Long, 1979a; Wright, 1979; Pendleton et al., 1983; Bollen and Ward, 1979). Recently, it has been suggested that each ratio variable may be expressed as a residual in which the numerator has been regressed on the denominator (Freeman and Kronenfeld, 1973; Pendleton et al., 1979, 1983; Bollen and Ward, 1979; Vanderbok, 1977; Fuguitt and Lieberson, 1974). This study is an attempt to assess the empirical and hypothesis testing differences between the traditional use of ratios and the residual form of control for ratio variables. For the sake of parsimony, ratios with correlated denominators are used. Mortality models for young, middle-age, and older-age mortality will be expressed both in the traditional ratio form and as residuals and used as the substantive example.

THE MORTALITY MODEL
Briefly discussed below is a simple 11-variable mortality model. It will provide an empirical test of the ratio variable correlation problem where correlated denominators have purposefully been built in. Variables were chosen that were prevalent in the literature and could easily be calculated as rates and residuals to allow for comparisons. Because previous research rarely defined situations for only a young, middle-age, or older-age population, hypothesized relationships are applied to all three age groupings with support or refutation to be determined inductively in the findings. Substantive characteristics of the model construction are discussed in greater detail by Pendleton (1977, pp. 16-49; see also Kitagawa and Hauser, 1973; McGirr, 1976a, 1976b).

Social Class

Although social class measures often are crude and the dividing lines between class strata are arbitrarily made (Roberts, McBee, and MacDonald, 1970; Hodge and Siegal, 1968), an inverse relationship between social class and mortality has been documented by a number of studies (Antonovsky, 1967; Sheps and Watkins, 1947; Griffiths, 1971; Roberts et al., 1970). There are three middle-range concepts employed most often to represent social class: education, occupation, and income. The weight of the evidence points to an inverse relationship between these measures of social class and mortality at all ages. Kitagawa and Hauser (1968, 1973), Stockwell (1963), and Upchurch (1962) all report a general inverse relationship between education and mortality. Schwirian and Lagreca (1971) report the influence of education on mortality operating through soundness of housing (see also McGirr,
1976b). Persons engaged in higher status occupations generally experience lower overall mortality (Sly and Chi, 1972; King, 1971; Brenner, 1971; Metropolitan Life Insurance Co., 1959) but this direction is not a universal finding (e.g., Sauer and Parke, 1974; Stocks, 1938; Martin, 1951) and results appear to differ somewhat depending upon causes of death investigated (e.g., Stern, 1951; Tuckman, Youngman, and Kreizman, 1965; Logan, 1954). Income (or economic status) almost invariably is negatively associated with mortality (Coulter and Guralnick, 1959; Patno, 1960; Stockwell, 1963; Yeracaris, 1955; Altenderfer, 1947; Sly and Chi, 1972; Ellis, 1957; Coombs, 1941; McGirr, 1976b; Stern, 1951), and Schwirian and Lagreca (1971) conclude that income influences mortality through housing quality.

Urban-Rural Residency
Rural areas generally enjoy lower overall mortality, but this differential appears to be decreasing over time (Price, 1954; Nam, 1968; Syme, Borhani, and Buechley, 1966; Pendleton and Chang, 1979; Kitagawa and Hauser, 1973). Most authors, however, find urban areas to be characterized by higher death rates from most causes of death (Arriga, 1967; Wiehl, 1948; Hamilton, 1955; McGirr, 1976a, 1976b; Stern, 1951; Stocks, 1938; Dorn, 1959; Hitt and Bertrand, 1951; McMahan, 1951).

Marital Status
Research shows married populations to have lower mortality than single, divorced, or widowed populations (Nam, 1968; Sheps, 1961; Geerken and Gove, 1974; Gove, 1973; Young, Benjamin, and Wallis, 1970). This inverse relationship generally holds for all age categories and both sexes (Shurtleff, 1955).

Sex

Among all the differentials of mortality, the most universally supported is sex (Dorn, 1959; Nam, 1968; Wingard, 1982). Research invariably shows lower mortality for females (Price, 1954; McMahan, 1951; Spiegelman, 1967; Madigan, 1957; Enterline, 1961; National Center for Health Statistics, 1973). Some authors posit social class and marital status as variables intervening between the sex differential and mortality (Patno, 1960; Yeracaris, 1955; Kitagawa and Hauser, 1968; Stocks, 1938; Logan, 1954; Martin, 1951; Gove, 1973; Geerken and Gove, 1974).

Health and Medical Care and Facilities
Where medical or health facilities and care are relatively unavailable to the population, mortality is higher. Sound health and the availability and utilization of medical facilities for emergency and durational care are inversely associated with mortality (Dorn, 1959; Stockwell, 1961; Sly
and Chi, 1972; Schwirian and Lagreca, 1971). There is some evidence that social class is antecedent to health and medical care and facilities (Upchurch, 1962).

Housing

Housing density (overcrowding) and poor housing conditions usually are linked positively with mortality (Ellis, 1957; Stockwell, 1963; Coombs, 1941). In the past, this relationship was especially strong when mortality was measured as the incidence of acute (infectious) diseases (Benjamin, 1965; Mabry, 1958; Ellis, 1957). There also is evidence to indicate that housing serves as an intervening variable between education and income levels and mortality (Schwirian and Lagreca, 1971).

Causes of Death

Three causes of death are delimited to represent the composition of overall adult mortality: acute, chronic, and social (World Health Organization, 1957; Weatherby, Nam, and Isaacs, 1983; Hillery, Ludtke, and Weisbuch, 1968; Roberts et al., 1970). Maternal deaths, infant deaths and other symptoms, senility, and ill-defined conditions are not directly relevant to this study and are not considered further. Greater control of acute diseases has lowered their relative contribution to overall mortality (Dauer, Korns, and Schuman, 1968; Spiegelman, 1967; Ellis, 1957). Chronic diseases are characterized by a degeneration of normal physiological processes. Mortality from degenerative diseases has increased in relative importance during the past few decades. The third major category of causes of death is social. Included here are motor vehicle accidents, all other accidents, suicide (and self-inflicted injury), and homicide (and war). Social mortality other than motor vehicle accidents is somewhat culture bound, largely because available sociocultural materials and economic factors define the possibilities of death (Hillery et al., 1968). Figure 1 graphically displays the analytical relationships between the dependent variables, causes of death, and young, middle-age, and older-age mortality.
The literature review suggests the following multivariate hypotheses.

TH.1. Social class as measured by income, educational and occupational characteristics will exhibit significant inverse relationships in multivariate models with young, middle-age, and older-age mortality due to acute, chronic, and social causes.

TH.2. Residency, marital, health and medical, sex, and housing status will exhibit significant relationships in multivariate models with young, middle-age, and older-age mortality due to acute, chronic, and social causes.
[Figure 1: path diagram. Variables shown: Sex; Marital Status; Social Class (Education); Social Class (Occupation); Social Class (Income); Residency; Housing; Health and Medical Care and Facilities; Acute Causes of Death; Chronic Causes of Death; Social Causes of Death; Mortality (A. Young, B. Middle-Age, C. Older-Age).]

FIG. 1. Analytical variables and orders of priority for models of young, middle-age, and older-age mortality. For the sake of clarity directional arrows are left out. However, the models may be read as path models with temporal movement from left to right and downward. Exceptions to this causal inference are the relationships between sex and marital status, and among acute disease, chronic disease, and social causes of death: these are associations with no causal implications.
MEASUREMENT AND METHODS

The three dependent variables and 11 independent variables just reviewed are measured with aggregate data from all 99 Iowa counties for 1970.¹ Discussed below are the procedures for dependent and independent variable measurement.

Dependent Variables: Young, Middle-Age, and Older-Age Mortality
Correlates and causes of death are known to differ with various age groups, but rarely has an attempt been made to divide adult mortality into sociologically meaningful groups and identify socioeconomic antecedents to causes of death. The division of adult mortality into young, middle-age, and older-age mortality can aid in the interpretation of mortality differentials at meaningful life cycle levels. “Young” mortality refers to the standardized death rate for persons 20 to 39 years of age. Various methods for age standardization are available (Daniel, 1974; Kitagawa, 1955, 1964; Wunsch and Termote, 1978). The direct method is used in this study (U.S. Bureau of the Census, 1975, p. 419) in which
$$
m_1 = \frac{\sum_a m_a P_a}{P} \tag{3}
$$

where $m_1$ = age-adjusted standardization rate, $m_a$ = the age-specific death rate for a particular area at a given time, $P_a$ = the standard population at each age, and $P = \sum_a P_a$ = the total standard population.

¹ Data are from Project 1972 at Iowa State University, Dr. David L. Rogers, principal investigator, and Pendleton (1977), drawing from the Iowa State Department of Health (1977), Taylor (1977), and various Bureau of the Census publications.
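The direct method described above weights each age-specific death rate by the standard population at that age and divides by the total standard population. A minimal Python sketch follows; the function name `direct_standardized_rate` and the three-group rates and populations are hypothetical and serve only to show the arithmetic.

```python
def direct_standardized_rate(age_specific_rates, standard_population):
    """Directly standardized death rate: sum(m_a * P_a) / sum(P_a)."""
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("rates and standard population must cover the same age groups")
    total_standard = sum(standard_population)
    if total_standard <= 0:
        raise ValueError("standard population must be positive")
    weighted_deaths = sum(
        rate * pop for rate, pop in zip(age_specific_rates, standard_population)
    )
    return weighted_deaths / total_standard


# Hypothetical age-specific death rates for one county (deaths per person)
# and a hypothetical standard population for the same three age groups.
rates = [0.001, 0.002, 0.004]
standard = [5000, 3000, 2000]

adjusted = direct_standardized_rate(rates, standard)
print(round(adjusted, 4))
```

Because every county is weighted by the same standard population, differences between counties in the adjusted rate cannot be attributed to differences in age composition within the age range considered.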
Both the minimum and maximum age values of this age group are by no means universally accepted. Past research (e.g., Weiss, 1976; U.S. Department of Health, Education, and Welfare, 1974; Metropolitan Life Insurance Co., 1977; Neugarten, Moore, and Lowe, 1968), however, indicates that a satisfactory median for establishing the lower age limit for “young” is 20 years of age. “Middle-aged” mortality refers to the standardized death rate for persons 40 to 59 years of age. Research by Dorn (1959), Spiegelman (1967), Weiss (1976), the Metropolitan Life Insurance Company (1977), Neugarten et al. (1968), and Neugarten (1974) supports a division between young and middle age at 40 (see also Bogue, 1969, and U.S. Bureau of the Census, 1975, p. 473). “Older-aged” mortality refers to the standardized death rate for persons 60 to 75 years of age. Criteria for identifying an older segment of a population are perhaps least agreed upon for any substantive age grouping. The Metropolitan Life Insurance Company (1960a, 1960b) has used 60 to 80 and 65 to 84 years of age, respectively, to define the older segment of the United States population, and Weatherby et al. (1983) use 85+ in a cross-national context. Demographic research by Spiegelman (1967), Bogue (1969), and, for county mortality, by McGirr (1976b) and Van Es and Bowling (1976) has defined the percent of the population over age 65 as an elderly population. Neugarten (1974) and Neugarten et al. (1968), however, found people’s perception of “old-age” actually was lower. An upper limit of 75 is chosen because the impact socioeconomic factors have in determining mortality lessens considerably after the average expectation of life age is passed and because of people’s perceptions of a difference at age 75 (Neugarten, 1974).

Independent Variables
When calculating each independent variable as a rate, the denominator, or deflating factor, is a population variable (e.g., total county population, population 14 and older, population 20 and older). Residuals are calculated by regressing the numerator on the denominator. The resulting regression coefficient is then used to calculate a predicted numerator; the residual is the difference between the observed numerator and the predicted numerator (the residual approach is discussed by Bollen and Ward, 1979; Pendleton et al., 1979, 1983; Vanderbok, 1977). For example, if $X = Y/Z$, a residual variable $\tilde{X}$ is calculated as

$$
Y = b_0 + b_1 Z + e \tag{4}
$$

$$
\hat{Y} = b_0 + b_{YZ} Z \tag{5}
$$

$$
\tilde{X} = Y - \hat{Y} \tag{6}
$$
where $Y$ = numerator (criterion variable); $Z$ = denominator (deflating variable); $b_0$ = intercept; $b_1$ = regression coefficient; $e$ = error; $\hat{Y}$ = predicted $Y$; $b_{YZ} = b_1$ = regression coefficient from Eq. (5) where $Z$, the denominator or deflating variable, has been entered for control purposes; and $\tilde{X}$ = residual variable. Each of the 11 independent variables was calculated as a rate, and then as a residual, for 1970. The discussion below explains the operationalization of these variables; findings in which the three mortality models (young, middle-age, older-age) for 1970 expressed as ratios are compared to those expressed as residuals follow. Notice that the denominators used for all variables are highly correlated (e.g., county population, population 14 and older, population 20 and older, total number of families), which meets the example described earlier in Eqs. (1) and (2).
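The residual calculation in Eqs. (4) through (6) can be sketched in Python as a simple least-squares regression of the numerator on the denominator. The function name `residualize` and the five-county counts below are hypothetical; the sketch assumes one numerator and one denominator per variable, as in the text.

```python
def residualize(numerator, denominator):
    """Residuals from an OLS regression of the numerator (Y) on the denominator (Z)."""
    n = len(numerator)
    if n != len(denominator) or n < 2:
        raise ValueError("need two equal-length series with at least two cases")
    mean_y = sum(numerator) / n
    mean_z = sum(denominator) / n
    s_zz = sum((z - mean_z) ** 2 for z in denominator)
    if s_zz == 0:
        raise ValueError("denominator has no variance")
    s_zy = sum((z - mean_z) * (y - mean_y) for y, z in zip(numerator, denominator))
    b1 = s_zy / s_zz              # slope of Y on Z
    b0 = mean_y - b1 * mean_z     # intercept
    # Residual = observed numerator minus predicted numerator.
    return [y - (b0 + b1 * z) for y, z in zip(numerator, denominator)]


# Hypothetical counts for five counties: number married (Y) and
# population age 14 and older (Z).
married = [4200, 9100, 2500, 15800, 6100]
pop_14_plus = [7000, 15000, 4000, 26000, 10500]

marital_residual = residualize(married, pop_14_plus)
print([round(x, 1) for x in marital_residual])
```

By construction the residuals sum to zero and are uncorrelated with the denominator, which is the sense in which the deflating variable has been controlled without forming a ratio.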
Sex is calculated as the proportion of the county’s total population that is over age 14 and female. The residual is calculated from the number of females regressed on the total county population. The proportion of the county population age 14 and over and married is chosen to represent marital status. The contribution of persons below age 14 to marital status is negligible. The residual calculation is the regression of number of married on the population age 14 and older. High school graduates among the population 20 years of age and older are chosen to represent the general educational status of a county. Most high school educations are completed by age 20 while many college educations (the most viable alternative) continue into the late 20s well beyond the lower age boundary for the young mortality model. The ratio calculation is the number of high school graduates divided by the population 20 and older. Residual calculation is done by regressing the number of high school graduates on the population 20 and older. Occupations designated as white-collar (Bergel, 1962) for 1970 were chosen for an occupational measure and include professional, technical, and kindred workers; managers, officials, and proprietors (except farm); clerical and kindred workers; and sales workers. The ratio calculation for the proportion of the labor force engaged in white-collar occupations is the sum of males and females employed in white-collar jobs divided by the total employed labor force. Residuals are obtained by regressing those employed in white-collar occupations on the total employed labor force. Income is measured by the proportion of all families above the state median family income level for 1970. The ratio calculation is the number of families in the county above the state’s median family income level for 1970 (median = $6664) divided by the total number of families in the county. 
Residuals are calculated by regressing the number of families above the median family income level on the total number of families
in the county. Residency is represented by the proportion of a county’s population that is urban and uses the number of people residing in urban areas and total population. Housing density is the proportion of all occupied housing units with more than 1.0 persons per room. Housing units with more than 1.0 persons per room are considered to be “overcrowded.” Ratios are calculated by dividing the number of occupied housing units with more than 1.0 persons per room by the total number of occupied housing units. Residuals are the regression of the former on the latter. The number of medical doctors available in the county accurately reflects both health care availability and the proximity of medical care facilities (e.g., clinics, hospitals, or private office practices). The ratio calculation for 1970 is the number of medical doctors divided by the total population; residuals are again obtained by regressing the former on the latter. Acute and chronic diseases and social causes of death were identified from work by Hillery et al. (1968), Roberts et al. (1970), and the World Health Organization (1957).² The ratio calculation is the sum of all deaths due to acute diseases divided by the total number of deaths. Residuals are obtained by regressing the number of acute disease deaths on the total number of deaths. Data used for acute diseases and the total number of deaths are 3-year averages. The logic of these calculations is the same for chronic diseases and social causes of death.

FINDINGS

Correlations
Table 1 displays the zero-order correlations for all variables for 1970. Coefficients above the diagonal are based on variables calculated as rates. Coefficients below the diagonal are based on variables calculated as residuals. Note that the correlations for the three mortality ages are the same both for ratios and residuals because the mortality rates have been standardized and, thus, are the same. Following are tables displaying coefficients for OLS equations based on variables calculated as rates and residuals. The variables should be read vertically as one continuous equation.

² Acute diseases include influenza and pneumonia; all forms of tuberculosis; syphilis and its sequelae; all forms of dysentery; bronchitis, emphysema, and asthma; meningococcal infections; poliomyelitis; meningitis; and all other infectious and parasitic diseases. Chronic diseases include diseases of the heart; hypertension, cerebrovascular disease, arteriosclerosis, all other diseases of arteries, arterioles, and capillaries; all other major cardiovascular diseases; malignant neoplasms, including neoplasms of lymphatic and hematopoietic tissues; diabetes mellitus; peptic ulcer and ulcer of the stomach and duodenum; cirrhosis of the liver; nephritis and nephrosis; and vascular lesions affecting the central nervous system. Social causes of death include motor vehicle accidents, all other accidents, suicides, homicides, and all other external causes.
TABLE 1
Correlation Matrix for Socioeconomic Characteristics, Causes of Death, and Young, Middle-age, and Older-age Mortality Calculated with Ratios and Residuals for 1970ᵃ

Variables (rows and columns): Young mortality; Middle-age mortality; Older-age mortality; Sex status; Marital status; Education; Occupation; Income; Residency; Housing; Health and medical; Acute causes of death; Chronic causes of death; Social causes of death.

[Correlation coefficients are not recoverable from the source.]

ᵃ Correlations based on variables calculated as rates are above the diagonal. Correlations based on variables calculated as residuals are below the diagonal.
* p ≤ .05. ** p ≤ .01. *** p ≤ .001.
Ratio and Residual Standardized Partial Coefficients
Table 2 displays standardized partial regression coefficients for young, middle-age, and older-age mortality models calculated with ratios for 1970. The models of mortality display some interesting direct effects. Between 16 and 26% of the variation in young, middle-age, and older-age mortality is accounted for by 1970 socioeconomic characteristics and causes of death. The strongest predictor variable is white-collar occupations with young mortality (it is in the hypothesized direction); white-collar occupations and income with middle-age mortality (the latter is in the hypothesized direction); and urban residency with older-age mortality (it is in the hypothesized direction).

TABLE 2
Regression Coefficients for Socioeconomic Characteristics, Causes of Death, and Young, Middle-age, and Older-age Mortality Calculated with Ratios for 1970

                                                    Dependent variables 1970
Independent variables 1970              Young mortalityᵃ        Middle-age mortalityᵃ   Older-age mortalityᵃ
Sex status                              .227ᵇ ([illegible])     .020 (.095)             -.071 (.258)
Marital status                          -.289ᵇ (.02)            .282ᵇ (.044)            .281ᵇ (.119)
Education                               .24[?]ᵇ (.019)          .105ᵇ (.029)            .017 (.078)
Occupation                              -.485ᵇ (.020)           .432ᵇ (.030)            .016 (.081)
Income                                  .137ᵇ (.012)            -.458ᵇ (.018)           -.006 (.048)
Residency                               .045ᵇ ([illegible])     .134ᵇ (.007)            .519ᵇ (.018)
Housing                                 -.009 (.044)            .341ᵇ (.067)            .134 (.183)
Health and medical care facilities      -.080 (.076)            -.068 (.116)            -.075 (.315)
Causes of death
  Acute                                 -.075ᵇ (.003)           .161 (.004)             .103ᵇ (.011)
  Chronic                               -.233ᵇ (.001)           -.056ᵇ (.001)           .061ᵇ (.004)
  Social                                -.080ᵇ (.003)           .093ᵇ (.004)            .027ᵇ (.012)
R²                                      .16                     .23                     .26

ᵃ The first number is the standardized partial regression coefficient (direct effect). The numbers in parentheses are the standard error.
ᵇ Coefficient is at least twice its standard error and is significant.
Table 3 presents the standardized partial regression coefficients for socioeconomic characteristics, causes of death, and young, middle-age, and older-age mortality calculated with residuals for 1970. Table 3 is organized the same as Table 2. Two models of mortality display R²s above .20; middle-age mortality is .24 and older-age mortality is .27. While only 14% of the variation in young mortality is accounted for by the 1970 socioeconomic characteristics and causes of death, all but one of the partial βs are significant. Relationships correctly hypothesized are those with marital status, urban residency, housing density, and health and medical care and facilities. With the correlated effect of population size and number of deaths removed through residualization, we find all three causes of death to be negatively related to young mortality; chronic and social causes are significant, although acute causes of death is not.

Within the middle-age mortality model all but one of the partial βs again are significant. Socioeconomic characteristics operating in hypothesized directions are sex status (β = -.08; but not significant), white-collar occupations (β = -.12), county wealth or income (β = -.27), and housing density (β = .13). While all three causes of death are significant and positive, a special note is made about the large β for social causes of death among the middle-age (β = .43).

Most important among the socioeconomic characteristics accounting for the R² of .27 in the older-age mortality model are sex status (β = -.20), marital status (β = .31), and social causes of death (β = .44). Sex status, white-collar occupations, urban residency, housing density, and health and medical care/facilities all are socioeconomic characteristics that are significant and in the hypothesized directions. All three causes of death are significant in the older-age mortality model. Most prevalent, however, are social causes of death (β = .44).

TABLE 3
Regression Coefficients for Socioeconomic Characteristics, Causes of Death, and Young, Middle-age, and Older-age Mortality Calculated with Residuals for 1970

                                                    Dependent variables 1970
Independent variables 1970              Young mortalityᵃ        Middle-age mortalityᵃ   Older-age mortalityᵃ
Sex status                              .370ᵇ (.18 E-3)         -.082 (.27 E-3)         -.203ᵇ (.74 E-3)
Marital status                          -.208ᵇ (.10 E-3)        .195ᵇ (.15 E-3)         .309ᵇ (.41 E-3)
Education                               .020ᵇ (.01 E-5)         .080ᵇ (.16 E-3)         .040ᵇ (.44 E-3)
Occupation                              .199ᵇ (.07 E-3)         -.121ᵇ (.10 E-3)        -.276ᵇ (.28 E-3)
Income                                  .229ᵇ (.01)             -.268ᵇ ([illegible])    .066ᵇ ([illegible])
Residency                               .126ᵇ (.01 E-5)         -.108ᵇ (.01 E-5)        .153ᵇ (.01 E-5)
Housing                                 .130ᵇ (.40 E-3)         .132ᵇ (.60 E-3)         .011ᵇ (.16 E-2)
Health and medical care/facilities      -.172ᵇ (.001)           -.117ᵇ ([illegible])    .065ᵇ (.58 E-2)
Causes of death
  Acute                                 -.002 (.003)            .157ᵇ ([illegible])     .103ᵇ (.01)
  Chronic                               -.269ᵇ (.004)           .086ᵇ (.007)            .172ᵇ ([illegible])
  Social                                -.151ᵇ (.09 E-3)        .432ᵇ (.13 E-3)         .443ᵇ (.36 E-3)
R²                                      .14                     .24                     .27

ᵃ The first number is the standardized partial regression coefficient (direct effect). The numbers in parentheses are the standard error.
ᵇ Coefficient is at least twice its standard error and is significant.

Comparison of Rate and Residual Equations in Standardized Form
The young, middle-age, and older-age mortality models for 1970 rate and residual calculations offer insight into the apparent advantage of residual analysis. The R²’s for each mortality model are very similar, but a number of differences are apparent when the direct effects are compared. When calculated with ratios, 14 of the 24 relationships between socioeconomic characteristics and young, middle-age, and older-age mortality are significant; 7 of these 14 are in the hypothesized directions. In contrast, when residuals are used, 23 of the 24 relationships are significant and 12 of these are in the hypothesized directions. The most consistently significant socioeconomic variables in the rates models are marital status and urban residency. In the residual analyses marital status, education, white-collar occupations, income, urban residency, housing, and health and medical care/facilities are significant in all three mortality models for 1970.

For the most part, levels of significance and directions for the three causes of death are the same in the mortality equations using rates and in those using residuals. Two notable exceptions are the extremely high, positive β’s between social causes of death and middle-age and older-age mortality (β = .43 and .44, respectively) in the residual analysis.

In summary, the use of residuals instead of ratios in 1970 resulted in a much greater number of significant relationships between mortality and both socioeconomic characteristics and causes of death. Also, more of these significant relationships using residuals tend to be in the hypothesized direction.³ As noted by Pendleton et al. (1983), in model testing the ratio variable correlation problem may be of major importance, and correcting for the induced spuriousness becomes a major priority for the researcher (see also Yule, 1910). Kasarda and Nolan (1979), a few years earlier, stressed the importance of the theoretical framework from which models are derived (see also Schuessler, 1973; Long, 1979a, 1979b; Macmillan and Daft, 1979).

Interestingly, the magnitudes of the relationships in the ratio and residual analyses differ, but not with any patterned regularity. About half increase from the ratio to the residual models and half decrease. Yet the number of significant relationships, which increases dramatically when one moves to the residual models, indicates that correlated denominators can change dramatically the standard errors of regression coefficients.⁴

DISCUSSION

More of the significant relationships, both hypothesized and not hypothesized, were found with equations using variables calculated as residuals. The multivariate mortality models were developed from extremely strong previous research. It may be assumed that the residual analysis more accurately described the hypothesized relationships and, thus, provided a more accurate empirical “picture” of the theory.

Multiple benefits to the development of theories of socioeconomic epidemiology may be gathered from studies based upon residual, as opposed to more traditional rate or ratio, calculations. Socioeconomic epidemiology is the study of the extent to which differences in socioeconomic status account for differences in mortality, indicating gains that could be achieved in mortality reduction if socioeconomic conditions are improved. A number of multivariate relationships tested and generated in this study provide insight into, and raise questions about, relationships between county socioeconomic characteristics and young, middle-age, and older-age mortality.
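The contrast between ratio and residual variables drawn in this comparison can be illustrated with a small simulation. The sketch below is not the county data or the exact procedure of this study; it assumes simulated values, hypothetical variable names (`population`, `deaths`, `physicians`), and a simple least-squares regression of each numerator on the shared denominator with an intercept. Two numerators that are unrelated to each other and to the denominator are turned into ratios and into residuals, and the two resulting correlations are compared.

```python
import numpy as np


def residualize(numerator, denominator):
    """Residuals from an OLS regression of numerator on denominator (with intercept)."""
    design = np.column_stack([np.ones_like(denominator), denominator])
    coef, *_ = np.linalg.lstsq(design, numerator, rcond=None)
    return numerator - design @ coef


def compare_ratio_and_residual(n_units=2000, seed=0):
    """Correlate two unrelated numerators as ratios and as residuals."""
    rng = np.random.default_rng(seed)
    # Hypothetical shared denominator (e.g., county population).
    population = rng.uniform(1_000, 50_000, n_units)
    # Hypothetical numerators, generated independently of each other
    # and of the denominator, so any correlation between them is spurious.
    deaths = rng.uniform(50, 150, n_units)
    physicians = rng.uniform(50, 150, n_units)

    r_ratio = np.corrcoef(deaths / population, physicians / population)[0, 1]
    r_resid = np.corrcoef(
        residualize(deaths, population), residualize(physicians, population)
    )[0, 1]
    return r_ratio, r_resid


if __name__ == "__main__":
    r_ratio, r_resid = compare_ratio_and_residual()
    print(f"correlation of ratios:    {r_ratio:.2f}")
    print(f"correlation of residuals: {r_resid:.2f}")
```

Under these assumptions the two ratios are strongly correlated only because they share a denominator, while the residualized variables are essentially uncorrelated. Real data, in which numerators depend on the denominator, would behave less cleanly than this simulation.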
The cross-sectional analyses for the 1970 ratio and residual models, for example, show higher occupational status and social causes of death to be positively correlated with young mortality (with other factors controlled), a rather surprising result in view of past bivariate research. Do present societal standards lead young people to pursue occupational careers that have a profound influence on their social causes of death?

The more important methodological and statistical finding concerns the use of rate and ratio variables in health, demographic, and sociological research. Standardized partial regression coefficients calculated with ratios resulted in fewer significant and less meaningful relationships. The use of residualized variables, where the numerator is regressed on the denominator, provided substantially more relationships that were significant and in the hypothesized directions.

Perhaps most important for future research in health, medicine, and the social sciences is an examination of research using ratios that have correlated components. Misleading associations are now known to result in certain situations when correlational analyses are used and the ratio form of the variable is not theoretically specified. But the magnitude of this “spuriousness” and the degree to which ratio variable correlations affect final associations remain relatively uncharted. Tangential effects of the “spuriousness” on the component computations of other inferential statistics remain to be identified.

³ It should be remembered that residuals cannot be calculated for the three dependent variables (young, middle-age, and older-age mortality). These are calculated and used as standardized rates in both the rate and residual analyses.

⁴ It should be noted that the lack of patterned change in magnitude between ratio and residual models may be due to the substantive nature and ordering of variables in the mortality models rather than the lack of a correlated denominator effect manifested in the β coefficients. This interpretation is partially supported by a second look at the zero-order correlation coefficients (Table 1). More than half the comparisons show lower bivariate correlations for the residualized variables, empirically providing some support for the notion of spuriousness when there are correlated denominators.

REFERENCES

Altenderfer, M. E. (1947), “Relationship between per capita income and mortality in the cities of 100,000 or more population,” Public Health Reports 62, 1681-1691.
Anderson, D., and Lydic, R. (1977), “Ratio data and the quantification of drug effects,” Biobehavioral Reviews 1, 55-57.
Anderson, D., and Lydic, R. (1978), “A simulation program examining the use of ratios and raw variables in analyses of variance,” Computer Programs in Biomedicine 8, 87-90.
Antonovsky, A. (1967), “Social class, life expectancy, and overall mortality,” Milbank Memorial Fund Quarterly 45, 31-73.
Arriaga, E. E. (1967), “Rural-urban mortality in developing countries: an index for detecting rural underregistration,” Demography 4, 98-107.
Atchley, W. R., and Anderson, D. (1978), “Ratios and the statistical analysis of biological data,” Systematic Zoology 27, 71-78.
Atchley, W. R., Gaskins, W. C., and Anderson, D. (1976), “Statistical properties of ratios, Pt I, Empirical results,” Systematic Zoology 25, 137-148.
Benjamin, B. (1965), Social and Economic Factors Affecting Mortality, Mouton, The Hague/Paris.
Bergel, E. E. (1962), Social Stratification, McGraw-Hill, New York.
Bogue, D. J. (1969), Principles of Demography, Wiley, New York.
Bollen, K., and Ward, S. (1979), “Ratio variables in aggregate data analysis: their uses, problems, and alternatives,” Sociological Methods and Research 7, 431-450.
Brauell, J. F., and Gillespie, M. K. (1981), “Comparative demography,” International Journal of Comparative Sociology 22, 141-168.
Brenner, M. H. (1971), “Economic changes and heart disease mortality,” American Journal of Public Health 61, 606-611.
Coombs, L. C. (1941), “Economic differentials in causes of death,” Medical Care 1, 246-254.
Coulter, E. J., and Guralnick, L. (1959), “Analysis of vital statistics by census tract,” Journal of the American Statistical Association 54, 730-740.
Daniel, W. W. (1974), Biostatistics: A Foundation for the Health Sciences, Wiley, New York.
Dauer, C. C., Korns, R. F., and Schuman, L. M. (1968), Infectious Diseases, Harvard Univ. Press, Cambridge, Mass.
Dorn, H. F. (1959), “Mortality,” in The Study of Population: An Inventory and Appraisal (P. M. Hauser and O. D. Duncan, Eds.), pp. 437-471, Univ. of Chicago Press, Chicago, Ill.
Ellis, J. M. (1957), “Socioeconomic differentials in mortality from chronic diseases,” Social Problems 5, 30-36.
Enterline, P. E. (1961), “Causes of death responsible for recent increases in sex mortality differentials in the United States,” Milbank Memorial Fund Quarterly 39, 312-328.
Featherman, D. L. (1971), “A research note: a social structural model for the socioeconomic career,” American Journal of Sociology 77, 293-304. (a)
Featherman, D. L. (1971), “Residential background and socioeconomic achievements in metropolitan stratification systems,” Rural Sociology 36, 107-124. (b)
Featherman, D. L., and Hauser, R. M. (1976), “Changes in the socioeconomic stratification of the races, 1962-73,” American Journal of Sociology 82, 621-651.
Fleiss, J. L., and Tanur, J. M. (1971), “A note on the partial correlation coefficient,” American Statistician 25, 43-44.
Freeman, J. H., and Kronenfeld, J. E. (1973), “Problems of definitional dependency: the case of administrative intensity,” Social Forces 52, 108-121.
Fuguitt, G. V., and Lieberson, S. (1974), “Correlation of ratios or difference scores having common terms,” in Sociological Methodology 1973-1974 (H. L. Costner, Ed.), pp. 128-144, Jossey-Bass, San Francisco.
Geerken, M., and Gove, W. R. (1974), “Race, sex, and marital status: their effect on mortality,” Social Problems 21, 567-580.
Gordon, R. A. (1968), “Issues in multiple regression,” American Journal of Sociology 73, 592-616.
Gove, W. R. (1973), “Sex, marital status, and mortality,” American Journal of Sociology 79, 45-67.
Griffiths, M. (1971), “A geographical study of the mortality in an urban area,” Urban Studies 8, 111-120.
Hamilton, H. C. (1955), “Ecological and social factors in mortality variation,” Eugenics Quarterly 2, 212-223.
Hillery, G. A., Ludtke, R. L., and Weisbuch, J. (1968), Causes of Death in the Demographic Transition, paper presented at the annual meeting of the Population Association of America, Department of Sociology, Virginia Polytechnical Institute, Boston.
Hitt, H. L., and Bertrand, A. L. (1951), “Rural-urban differences in mortality,” in The Sociology of Urban Life (T. Lynn Smith and C. A. McMahan, Eds.), pp. 267-280, Dryden, New York.
Hodge, R. W., and Siegal, P. M. (1968), “The measurement of social class,” in International Encyclopedia of the Social Sciences, Vol. 15, pp. 316-325, Macmillan Co., New York.
Iowa State Department of Health (1977), Special mortality tabulations performed for this study, Iowa State Department of Health, Des Moines.
Kasarda, J. D., and Nolan, P. D. (1979), “Ratio measurement and theoretical inference in social research,” Social Forces 58, 212-227.
King, H. (1971), “Clerical mortality patterns of the Anglican communion,” Social Biology 18, 164-176.
Kitagawa, E. M. (1955), “Components of a difference between two rates,” Journal of the American Statistical Association 50, 1168-1194.
Kitagawa, E. M. (1964), “Standardized comparisons in population research,” Demography 1, 296-315.
Kitagawa, E. M., and Hauser, P. M. (1968), “Education differentials in mortality by cause of death: United States, 1960,” Demography 5, 318-353.
Kitagawa, E. M., and Hauser, P. M. (1973), Differential Mortality in the United States: A Study in Socioeconomic Epidemiology, Harvard Univ. Press, Cambridge, Mass.
Kuh, E., and Meyer, J. R. (1955), “Correlation and regression estimates when the data are ratios,” Econometrica 23, 400-416.
Land, K. C., and Felson, M. (1976), “A general framework for building dynamic macro social indicator models: including an analysis of changes in crime rates and police expenditures,” American Journal of Sociology 82, 565-604.
LeRichie, W. H., and Milner, J. (1971), Epidemiology as Medical Ecology, Williams & Wilkins, Baltimore.
Logan, C. H. (1972), “General deterrent effects of imprisonment,” Social Forces 51, 64-73.
Logan, W. P. D. (1954), “Social class variations in mortality,” Public Health Reports 69, 1217-1223.
Long, S. B. (1979), “The continuing debate over the use of ratio variables: facts and fiction,” in Sociological Methodology: 1980 (K. Schuessler, Ed.), pp. 37-74, Jossey-Bass, San Francisco. (a)
Long, S. B. (1979), Deterrence Findings: Examining the Impact of Errors in Measurement, Bureau of Social Science Research, Washington, D.C. (b)
Lyons, W. (1977), “Per capita index construction: a defense,” American Journal of Political Science 21, 177-191.
Mabry, J. (1958), “Some ecological contributions to epidemiology,” in Patients, Physicians and Illness (E. G. Jaco, Ed.), pp. 49-54, Free Press, New York.
Macmillan, A., and Daft, R. L. (1979), “Administrative intensity and ratio variables: the case against definitional dependency,” Social Forces 58, 228-248.
Madigan, F. (1957), “Are sex mortality differences biologically caused?” Milbank Memorial Fund Quarterly 35, 202-223.
Martin, W. J. (1951), “A comparison of the trends of male and female mortality,” Journal of the Royal Statistical Society, Series A (General) 114, 387-406.
McGirr, N. (1976), Differentials in Mortality: Some Underlying Dimensions, paper presented at the annual meeting of the Population Association of America, Montreal, Canada; Department of Sociology, Duke University. (a)
McGirr, N. (1976), Mortality Differentials in the Southeast, paper presented at the annual meeting of the Southern Regional Demographic Group, New Orleans; Department of Sociology, Duke University. (b)
McMahan, C. A. (1951), “Rural-urban differences in longevity,” in The Sociology of Urban Life (T. L. Smith and C. A. McMahan, Eds.), pp. 281-289, Dryden, New York.
Medansky, A. (1964), “Spurious correlation due to deflating variables,” Econometrica 32, 652-655.
Metropolitan Life Insurance Co. (1959), “Mortality and social class,” Statistical Bulletin 40(Oct.), 9-11.
Metropolitan Life Insurance Co. (1960), “Causes of death in later life,” Statistical Bulletin 41(Oct.), 6-8. (a)
Metropolitan Life Insurance Co. (1960), “Trends in survival at the older ages,” Statistical Bulletin 41(Sept.), 1-3. (b)
Metropolitan Life Insurance Co. (1977), “Leading causes of death among insured lives,” Statistical Bulletin 58(April), 9.
Nam, C. B. (Ed.) (1968), Population and Society, Houghton Mifflin, Boston.
National Center for Health Statistics (1973), Mortality Trends: Age, Color, and Sex, United States, 1950-1969, U.S. Department of Health, Education, and Welfare, Washington, D.C.
Neugarten, B. L., Moore, J. W., and Lowe, J. C. (1968), “Age norms, age constraints, and adult socialization,” in Middle Age and Aging (B. L. Neugarten, Ed.), pp. 22-28, Univ. of Chicago Press, Chicago.
Neugarten, B. L. (1974), “Age groups in American society and the rise of the young-old,” The Annals of the American Academy of Political and Social Science 415, 187-198.
Omran, A. R. (1971), “The epidemiologic transition: a theory of the epidemiology of population change,” Milbank Memorial Fund Quarterly 49, 509-538.
O’Connor, J. F. (1977), “A logarithmic technique for decomposing change,” Sociological Methods and Research 6, 91-102.
Patno, M. E. (1960), “Mortality and economic level in an urban area,” Public Health Reports 75, 841-851.
Pearson, K. (1897), “Mathematical contributions to the theory of evolution: on a form of spurious correlation which may arise when indices are used in the measurement of organs,” Proceedings of the Royal Society of London 60, 489-498.
Pearson, K. (1910), “On the correlation of death rates,” Journal of the Royal Statistical Society 73, 534-539.
Pendleton, B. F. (1977), Socioeconomic Epidemiology: Differential Determinants in a Longitudinal Framework, unpublished Ph.D. dissertation, Department of Sociology, Iowa State University, Ames.
Pendleton, B. F., Warren, R. D., and Chang, H. C. (1979), “The problem of correlated denominators in multiple regression and change analyses,” Sociological Methods and Research 7, 451-475.
Pendleton, B. F., and Chang, H. C. (1979), “Ecological and social differentials in mortality: inequalities by metropolitan-nonmetropolitan residency and racial composition,” Sociological Focus 12, 21-35.
Pendleton, B. F., Newman, I., and Marshall, R. S. (1983), “A Monte Carlo approach to correlational spuriousness and ratio variables,” Journal of Statistical Computation and Simulation 18, 93-124.
Price, P. H. (1954), “Trends in mortality differentials in the United States,” Southwestern Social Science Quarterly 35, 255-263.
Przeworski, A., and Cortes, F. (1977), “Comparing partial and ratio regression models,” Political Methodology 4, 63-75.
Rangarajan, C., and Chatterjee, S. (1969), “A note on comparison between correlation coefficients of original and transformed variables,” American Statistician 23, 28-29.
Roberts, R. E., McBee, G. W., and MacDonald, E. J. (1970), “Social status, ethnic status, and urban mortality: an ecological analysis,” Texas Reports on Biology and Medicine 28, 13-28.
Sauer, H. I., and Parke, D. W. (1974), “Counties with extreme death rates and associated factors,” American Journal of Epidemiology 99, 258-264.
Schuessler, K. (1973), “Ratio variables and path models,” in Structural Equation Models in the Social Sciences (A. S. Goldberger and O. D. Duncan, Eds.), pp. 201-228, Seminar, New York.
Schuessler, K. (1974), “Analysis of ratio variables: opportunities and pitfalls,” American Journal of Sociology 80, 379-396.
Schwirian, K. P., and Lagreca, A. J. (1971), “An ecological analysis of urban mortality rates,” Social Science Quarterly 52, 574-587.
Sheps, C., and Watkins, J. H. (1947), “Mortality in the socio-economic districts of New Haven,” The Yale Journal of Biology and Medicine 20, 51-80.
Sheps, M. C. (1961), “Marriage and mortality,” American Journal of Public Health 51, 547-555.
Shurtleff, D. (1955), “Mortality and marital status,” Public Health Reports 70, 248-252.
Sly, D. F., and Chi, P. S. K. (1972), “Economic development, modernization, and demographic behavior,” The American Journal of Economics and Sociology 31(4), 373-386.
Spiegelman, M. (1967), “Recent mortality in countries of traditionally low mortality,” in Proceedings of the World Population Conference, 1965, Vol. 2, pp. 375-378, Department of Economic and Social Affairs, United Nations, New York.
Stern, B. J. (1951), “Socio-economic aspects of heart disease,” Journal of Educational Sociology 24, 450-462.
Stocks, P. (1938), “The effects of occupation and of its accompanying environment on mortality,” Journal of the Royal Statistical Society 101, 669-696.
Stockwell, E. (1961), “Socioeconomic status and mortality in the United States,” Public Health Reports 76, 1081-1086.
Stockwell, E. (1963), “A critical examination of the relationship between socioeconomic status and mortality,” American Journal of Public Health 53, 954-964.
Syme, S. L., Borhani, N. O., and Buechley, R. W. (1966), “Cultural mobility and coronary heart disease in an urban area,” American Journal of Epidemiology 82, 334-346.
Taylor, J. R. (1977), “Comparison of observed expectation of life at birth for selected Iowa geographic areas,” unpublished paper, Iowa State Department of Health, Des Moines.
Tuckman, J., Youngman, W. F., and Kreizman, G. B. (1965), “Occupational level and mortality,” Social Forces 43, 575-577.
U. S. Bureau of the Census (1975), The Methods and Materials of Demography (H. S. Shryock, J. S. Siegal, and Associates, Eds.), U.S. Govt. Printing Office, Washington, D.C.
U. S. Department of Health, Education, and Welfare (1974), Vital Statistics of the United States: 1970, Vol. III-Marriage and Divorce, U.S. Govt. Printing Office, Washington, D.C.
Upchurch, H. M. (1962), “A tentative approach to the study of mortality differentials between educational strata in the United States,” Rural Sociology 27, 213-217.
Uslaner, E. (1976), “The pitfalls of per capita,” American Journal of Political Science 20, 125-133.
Vanderbok, W. G. (1977), “On improving the analysis of ratio data,” Political Methodology 4, 171-184.
Van Es, J. C., and Bowling, M. (1976), The Aging of Local Populations: Illinois Counties between 1950 and 1970, paper presented at the annual meeting of the Rural Sociological Society, Department of Sociology, University of Illinois-Urbana, New York.
Weatherby, N. L., Nam, C. B., and Isaac, L. W. (1983), “Development, inequality, health care, and mortality at the older ages: a cross-national analysis,” Demography 20, 27-43.
Weiss, N. S. (1976), “Recent trends in violent deaths among young adults in the United States,” American Journal of Epidemiology 103, 416-422.
Wiehl, D. G. (1948), “Mortality and socio-environmental factors,” The Milbank Memorial Fund Quarterly 26(Oct.), 335-365.
Wingard, D. L. (1982), “The sex differential in mortality rates,” American Journal of Epidemiology 115, 205-216.
World Health Organization (1957), Manual of the International Statistical Classification of Diseases, Injuries, and Causes of Death (7th rev.), World Health Organization, Geneva.
Wright, R. L. (1979), Comparing regression models involving ratio variables, pp. 574-579, proceedings of the Business and Economic Statistics Section, American Statistical Association, Washington, D.C.
Wunsch, G. J., and Termote, M. G. (1978), Introduction to Demographic Analysis, Plenum, New York.
Yeracaris, C. A. (1955), “Differential mortality, general and cause specific in Buffalo, 1930-1941,” Journal of the American Statistical Association 50, 1235-1247.
Young, M., Benjamin, B., and Wallis, C. (1970), “The mortality of widowers,” in Social Demography (T. R. Ford and G. F. DeJong, Eds.), pp. 172-177, Prentice-Hall, Englewood Cliffs, N.J.
Yule, G. U. (1910), “On the interpretation of correlations between indices or ratios,” Journal of the Royal Statistical Society 73, 644-647.