Regional Science and Urban Economics 8 (1978) 153-173. © North-Holland
AN EXAMINATION OF RESIDENTIAL MOBILITY THROUGH THE USE OF THE LOG-LINEAR MODEL: I. THEORY
J.B. WHITNEY
Department of Statistics, University of Waterloo, Waterloo, Ont., Canada
B.N. BOOTS
Received May 1976, final version received June 1977
A method is proposed for studying the effect of socio-economic characteristics on household mobility in an urban area. This method defines a priori, non-physical spaces in which the households move and then constructs on these spaces a probability model which can detect associations between the socio-economic characteristics of a household and where the household moves in these a priori spaces.
1. Introduction
In this paper we discuss a method for assessing the impact of selected household characteristics on the residential mobility of households in a specified area. An individual household can be thought of as simultaneously occupying positions in a variety of spaces representing aspects of the city's structure. The utility of analyzing urban behaviour in terms of such spaces has been demonstrated by studies such as those of Rees (1970) and Yeates (1972). The spaces constructed here may be considered 'a priori' in that their definition precedes the collection of data. The choice of appropriate spaces and their associated dimensions is influenced by the findings of previous empirical studies. In addition, all the spaces created are 'non-physical' ones. This is intentional and the reasons are several. First, as many researchers have shown empirically [e.g., Murdie (1969), Timms (1971)], a position in a non-physical space (e.g., a social space) often implies a distinctive location in real physical space. Second, in examining residential movements, where the destination location can be considered to follow the housing selection, it may be inappropriate to search for patterns in physical space because the less well sorted the housing opportunities, the greater the variation in spatial location. In addition, the units of physical space for which data are available (e.g., census tracts, traffic zones) are often subject to considerable internal variation and may well contribute to the low explanatory power of some traditional approaches [see for example Simmons (1974)].
J.B. Whitney and B.N. Boots, An examination of residential mobility
The base population under study consists of all households in a specified urban area at time $t_0$. During a given time period $t_1 - t_0$ (in our case about one year), a household can be classified in one of the following three categories: (i) remains at the same address, (ii) moves out of the urban area, or (iii) moves to a new address in the urban area. In this paper our discussion will deal with the sub-population of the third category, the population of the households that move within the urban area in the time period $t_1 - t_0$. As mentioned above, we assume these households move in a number of non-physical spaces, which have been constructed a priori. The spaces which we have selected are: a housing space (H), a community space (C), a tenure space (T) and combinations of these spaces such as (H, C), (H, T), (H, C, T), etc. On these spaces we suggest a method of constructing a probability model which can be used to assist in assessing the impact that selected characteristics have on the movement patterns of households in these spaces. The method is based on the log-linear model, which requires that the response variables be polytomous. The main objective of this modelling procedure is to describe a complex set of data concisely and to facilitate comparisons with other sets of data. The central problem in meeting this objective is to obtain a suitable joint probability distribution for the moving households.
The log-linear model has received much attention in the recent literature; see for instance the discussions by Birch (1963), Bishop (1969), Bishop et al. (1975), Cox (1970, 1972), Davis (1974), Goodman (1970, 1971, 1972), Haberman (1974), Nelder and Wedderburn (1972), Mantel (1966), Plackett (1974), Theil (1970). The results given for the log-linear model in this paper are not new; they could be obtained by reading the above references. However, to our knowledge, this is the first application of the log-linear model in the residential location context.
The procedures proposed here require a large amount of data. Until recently this amount of data did not seem to be available. But now assessment rolls are computerized and a good deal of the required information can be obtained by comparing assessment rolls for succeeding years. This discussion is given in two papers: part I (this paper) and part II. Part I deals with theoretical aspects, while in part II the theory discussed in part I will be applied to data obtained from computerized assessment rolls.
2. The a priori spaces
2.1. The housing space (H)
A household, when it moves, has a choice of various kinds of housing. For instance, housing could be classified according to quality and type, where type may refer to detached, semi-detached, or multiple-housing. Indeed, using a factor analytic approach, Yeates (1972, tab. …) found the principal dimensions
of housing variation in both Kingston and Winnipeg to be ones which summarized characteristics of quality and type of home. Therefore we suggest that a suitable housing space can be constructed by identifying housing of a particular quality $q$ and type $t$ with a point in a 2-dimensional space which has as its two axes quality and type. The variable type is polytomous. Quality can be either a continuous or a discrete variable, but because of the comments in the previous section we will assume quality to be a discrete variable taking on $Q$ values. Therefore, the points in the housing space $H$ can be given as
$$H = \{(q, t): q = 1, \ldots, Q;\ t = 1, \ldots, T\}.$$
$H$ is a 2-dimensional rectangular lattice where the axes refer to the method of classification.
2.2. The community space (C)
Most attempts to define a community space have made use of some or all of the recurrent dimensions identified first by social area analysis and later by factorial ecology (i.e., social status, family status, ethnic status, and mobility) [Berry (1971)]. Such an approach emphasizes the social component while overlooking physical environmental variables, even though they are recognized as important influences in urban behaviour [Fishelton (1975)]. Such an approach also means that the units involved in such spaces are aggregates of individual households. In our study we wish both to retain the individual household as the unit for all spaces and to provide some balance between the social and physical attributes in the definition of the community space. In addition, to facilitate development of the model, we prefer to define a community space of limited dimensions. We suggest that an appropriate one-dimensional space might be one which makes use of zoning classes to create a discrete scale. Such a scale would provide a direct measure of the physical homogeneity of the area in which a household was located in terms of land use and building types. This would also suggest implicitly much about the nature of its social homogeneity. Thus, the points in the community space $C$ can be given as
$$C = \{k: k = 1, \ldots, K\}.$$
2.3. The tenure space (T)
The tenure of a household in its simplest classification can be defined as two categories, own or rent. Hence the tenure space $T$ can be very simply represented by
$$T = \{r: r = 1, 2\},$$
where 1 is the category of own, while 2 represents rent or non-owners. Pickvance (1973, 1974), in particular, has demonstrated the importance of tenure characteristics to residential mobility.
2.4. The combined spaces
The points of these spaces are all possible ordered n-tuples that can be obtained from the subspaces which are used to construct the combined space. The n-tuples can be represented as points of an n-dimensional rectangular lattice, the axes of which are in one-to-one correspondence with the axes of the subspaces. For example, the points of the housing, tenure space (H, T) would be represented by the set of triplets
$$(H, T) = \{(q, t, r): q = 1, \ldots, Q;\ t = 1, \ldots, T;\ r = 1, 2\}.$$
These triplets could be plotted on a 3-dimensional lattice where the axes are the adjoined axes of the two subspaces $H$ and $T$.
3. Probability models on the a priori spaces
3.1. Organization of data and notation
We are concerned with the households of an urban area at time $t_0$, and in particular with the subset of households that move within the urban area during the time period between $t_0$ and $t_1$. At any time $t$ we can attach to each household of the urban area a number of characteristics which we feel might be important in studying the movement characteristics of a household in relation to some or all of the a priori spaces. If these characteristics are considered to be explanatory rather than response, they could be recorded as continuous variables. For simplicity we assume that the characteristics are recorded in terms of discrete variables. Examples of such characteristics are: life cycle stage, income, ethnicity, length of residence at that location, occupation, family size, and location in the a priori spaces at time $t_0$.
The models we apply to the a priori spaces are conditional upon some or all of the state of affairs at time $t_0$. We assume that with a particular a priori space in mind there will be some characteristics at time $t_0$ which can be used to stratify all the households of the urban area into subgroups or cohorts, for which it is useful to study individually the impact of other characteristics on the moving households. In the remainder of the discussion we will refer to the other characteristics as the covariates. The covariates would likely be recorded at time $t_0$, but this is not necessary. It is possible that a covariate is a change that has occurred in the time period $t_1 - t_0$. The division of the population into cohorts will give a more homogeneous group of households and hence we should be able to study the impact of covariates by means of a simpler model. High-order interaction terms needed to model a very heterogeneous group can be omitted; in this respect the simpler
models are also easier to interpret. Studying each cohort individually also allows for some flexibility. It will not be necessary, and in some cases may not be desirable, to have the same set of covariates for each cohort. In addition, for some cohorts, it may be useful to redefine the space of interest by dropping and/or combining points. Another important motivation for defining cohorts is to facilitate the implementation of the procedure that we propose. In effect, we wish to avoid contingency tables with extremely large numbers of cells, many of which may be empty or have low counts.
Now, having decided the space upon which we wish to study movement patterns [for definiteness we shall choose the housing, tenure space (H, T)], we determine the cohorts. The defining characteristic of the cohorts is chosen to be the location of the households in (H, T) at time $t_0$. The location in the space at time $t_0$ is selected because the type of model in which we are interested is a conditional model describing the influences that some covariates have on the moves in the space of interest, given that the move begins at a given point in that space. In addition, regardless of what space we are considering, tenure would seem to be a natural subdivision in order to obtain some degree of homogeneity.
Given a particular cohort, the units of the population with which we will be most concerned are those households of the cohort that moved in the time period $t_1 - t_0$. These moving households can be classified according to their location in (H, T) at time $t_1$ and a number of covariates; for our purposes say just one covariate $Z$. If we apply these classifications simultaneously, the groups which result can be represented by the cells of a 4-dimensional rectangular lattice, where the axes refer to the methods of classification. The resulting array of cell frequencies is often looked upon as a multi-dimensional contingency table. For our particular choice of the space (H, T) and covariate $Z$, we can let the classifications $\{(q, t, r): q = 1, \ldots, Q;\ t = 1, \ldots, T;\ r = 1, 2\}$ of the space (H, T) be associated with axes 1, 2 and 3, respectively. The classifications of the covariate $Z$, say $m = 1, \ldots, M$, will then be associated with a fourth axis. In this discussion and for the implementation of this proposal we shall assume the data on the moving households of a given cohort to be organized in the form of a multi-dimensional contingency table.
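As an illustration only, the organization of the data into a four-way table can be sketched in Python. The household records, the category codes and the helper name `margin` are hypothetical and are not taken from the assessment-roll data; the sketch simply counts households per lattice point $(q, t, r, m)$ and shows how counts can be summed over chosen axes.

```python
from collections import Counter

# Hypothetical records for the moving households of one cohort:
# (quality q, type t, tenure r, covariate category m), all coded 1, 2, ...
moves = [
    (1, 1, 1, 1), (1, 1, 1, 1), (2, 1, 2, 1),
    (2, 2, 2, 2), (1, 2, 1, 2), (2, 2, 2, 2),
]

# The four-way contingency table: one count per occupied lattice point.
table = Counter(moves)

def margin(table, keep):
    """Sum the cell counts over every axis not listed in `keep` (0-based axes)."""
    out = Counter()
    for cell, count in table.items():
        out[tuple(cell[a] for a in keep)] += count
    return out

# Number of moving households in each category of the covariate Z (axis 4).
totals_by_covariate = margin(table, keep=(3,))
```

A dictionary keyed by cell is used here because tables of this kind can have many empty cells; a dense array would serve equally well for small tables.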
cell frequencies $x_{qtrm}$ over the dimensions omitted in the table. For instance, in $C_{14}$ the observed frequency in cell $(q, m)$ is of the form $\sum_{t,r} x_{qtrm} = x_{q++m}$, where + denotes summation over the subscript.
3.2. The random process underlying the model
Throughout the rest of this discussion we will keep, unless otherwise stated, the classification that gave table $C_{1234}$. What is said for this case we assume true for cases involving the other spaces. For a given cohort at time $t_1$, there will be $N$ of the households in the cohort that have moved within the urban area; $x_{qtrm}$ of these will have moved to the point $(q, t, r)$ of (H, T) and have characteristic $m$ of covariate $Z$. If we assume that, for those households that moved, each household acts individually, and each household with characteristic $m$, $m = 1, \ldots, M$, has approximately the same probability of moving to point $(q, t, r)$ of (H, T), then for any value of $m$, say $m_0$, the variables $x_{qtrm_0}$ will follow a multinomial distribution with total number of trials $x_{+++m_0}$ and parameters $p_{qtrm_0}$. The combined distribution for the $x_{qtrm}$ is then a product of multinomial distributions. With this distributional assumption we have that, conditional on the totals of $C_4$, the observed table $C_{1234}$ is an observation from a product of multinomial distributions, each cell having parameter $p_{qtrm}$ or equivalently expectation $\lambda_{qtrm} = p_{qtrm}\, x_{+++m}$. The extension is immediate to the cases where the simultaneous classification of the covariates produces a multi-dimensional table. In order to describe this model more concisely, that is, to determine more explicitly how the methods of classification are associated with each other, we will try to determine relationships among the $\lambda_{qtrm}$.
3.3. The log-linear model
One of the standard quantities used to measure associations among methods of classification is a cross product ratio of the $\lambda_{qtrm}$ or some function of it; if the ratio is 1 this indicates that no association is present. Motivated by this type of measure, it is argued by Plackett (1974) and others that an appropriate method of expressing the expectation of each cell, $\lambda_{qtrm}$, in order to assess associations, is to assume that the $\lambda_{qtrm}$ depend multiplicatively on the methods of classification, or that the natural logarithm of the $\lambda_{qtrm}$ depends linearly on the methods of classification. Hence, using the notation of Bishop (1969), the expectations $\lambda_{qtrm}$ are expressed in terms of the log-linear model
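$$\begin{aligned}
\ln \lambda_{qtrm} = {} & U + U_1(q) + U_2(t) + U_3(r) + U_4(m) \\
& + U_{12}(qt) + U_{13}(qr) + U_{14}(qm) + U_{23}(tr) + U_{24}(tm) + U_{34}(rm) \\
& + U_{123}(qtr) + U_{124}(qtm) + U_{134}(qrm) + U_{234}(trm) + U_{1234}(qtrm).
\end{aligned} \tag{1}$$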
The U-terms refer to the main and multi-factor effects. The numerical subscripts refer to the axes involved and the alphabetic characters refer to the categories of these axes. For example, the term $U_{24}(tm)$ refers to the two-factor effect of dimensions 2 and 4 for categories $t$ and $m$, that is, the two-factor effect of housing type $t$ and category $m$ of covariate $Z$. (It is common terminology to refer to two-factor effects as first-order interactions, three-factor effects as second-order interactions, etc. In the sequel either terminology will be used.) Expression (1) resembles the kind of relationship that one has for the expectation in the usual analysis of variance model for quantitative responses. As in the analysis of variance model, the number of parameters is greater than the number of possible expected values and hence restrictions have to be placed on the parameters to obtain a unique specification. Two of the possible restrictions that can be applied are:
(i) As in the usual model of analysis of variance, the subscripted terms are viewed as deviations from an overall mean, and so we have that, for example,
$$\sum_q U_1(q) = 0, \qquad \sum_q U_{12}(qt) = \sum_t U_{12}(qt) = 0.$$
In general each U-term summed over any of its classifications is zero. These restrictions were suggested by Birch (1963).
(ii) For pre-assigned values of $q$, $t$, $r$ and $m$, say $q'$, $t'$, $r'$ and $m'$, set $U$ equal to $\ln \lambda_{q't'r'm'}$ and set each U-term which is a function of any of these pre-assigned values equal to zero. For example, suppose that $q' = t' = r' = m' = 1$; then terms of the type $U_1(1) = U_{12}(q1) = U_{1234}(qt1m) = 0$. These restrictions were suggested by Mantel (1966).
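The two sets of restrictions can be compared on a small example. The Python sketch below is illustrative only: it assumes a two-way table (axes $q$ and $t$) rather than the four-way table of expression (1), the cell expectations are hypothetical, and the variable names are ours. It computes the U-terms of the full model once as deviations from overall means (Birch-type restrictions) and once relative to a chosen reference cell (Mantel-type restrictions).

```python
import math

# Hypothetical 2 x 3 table of positive cell expectations (illustrative only).
lam = [[10.0, 20.0, 30.0],
       [40.0, 25.0, 15.0]]
log_lam = [[math.log(v) for v in row] for row in lam]
Q, T = len(lam), len(lam[0])

# Birch-type restrictions: U-terms are deviations from the overall mean of ln(lambda),
# so each U-term sums to zero over any of its indices.
u = sum(sum(row) for row in log_lam) / (Q * T)
u1 = [sum(log_lam[q]) / T - u for q in range(Q)]
u2 = [sum(log_lam[q][t] for q in range(Q)) / Q - u for t in range(T)]
u12 = [[log_lam[q][t] - u - u1[q] - u2[t] for t in range(T)] for q in range(Q)]

# Mantel-type restrictions with reference cell q' = t' = 1 (index 0 here):
# the constant is ln(lambda) of the reference cell, and every U-term that
# involves a reference category is zero.
v = log_lam[0][0]
v1 = [log_lam[q][0] - v for q in range(Q)]
v2 = [log_lam[0][t] - v for t in range(T)]
v12 = [[log_lam[q][t] - v - v1[q] - v2[t] for t in range(T)] for q in range(Q)]
```

Under either set of restrictions the number of free parameters in this two-way case, 1 + (Q - 1) + (T - 1) + (Q - 1)(T - 1), equals the number of cells QT, and both sets of U-terms reproduce the same values of $\ln \lambda$.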
The model with Birch's restrictions has as bases for comparison overall means, while the model with Mantel's restrictions has the expectation of a chosen cell as the base for comparison. Expression (1), when subjected to either restriction (i) or (ii), has the number of unknown parameters equal to the number of cells. This expression under either restriction is the most complex model that we can have under the log-linear model, as all multi-factor U-terms are present. That is, all possible interactions among methods of classification that can be expressed by the log-linear model are present. The method of assessing the interactions and the search for a concise description of the data is in effect a search for a simpler model than expression (1): a model where some of the U-terms are set equal to zero, especially the higher-order interaction U-terms. Setting a U-term equal to zero implies that all the components of the U-term indexed by the alphabetic indexes are zero. The following are some models that might be assessed.
$H_1$: The usual contingency table hypothesis of no association among the methods of classification would be given by setting all interaction terms to zero, to give the model
$$\ln \lambda_{qtrm} = U + U_1(q) + U_2(t) + U_3(r) + U_4(m).$$
This is likely to be the simplest model tested, as main effects are usually conceded as existing and the possible associations are of interest.
$H_2$: The model which would correspond to the hypothesis that the response classifications are associated with one another but are not associated with the covariate $Z$ is given by
$$\ln \lambda_{qtrm} = U + U_1(q) + U_2(t) + U_3(r) + U_4(m) + U_{12}(qt) + U_{13}(qr) + U_{23}(tr) + U_{123}(qtr).$$
$H_3$: The model corresponding to the hypothesis that the response variables are not associated with one another but are associated with the covariate $Z$ is given by
$$\ln \lambda_{qtrm} = U + U_1(q) + U_2(t) + U_3(r) + U_4(m) + U_{14}(qm) + U_{24}(tm) + U_{34}(rm).$$
The assessment of these relationships among the classifications is done by assessing how well the contingency table produced by the hypothesis and the data fits the observed table $C_{1234}$. A judgment, partly based on statistical considerations, is then made as to whether or not the data support the hypothesized model.
3.4. Methods of assessing the hypothesized model
The two methods commonly used are the likelihood ratio test and the usual chi-square goodness-of-fit test that is associated with contingency tables. Both methods require the calculation of the maximum likelihood estimates of the U-terms under the hypothesized model or, what turns out to be equivalent, the maximum likelihood estimates of the $\lambda_{qtrm}$ under the hypothesized model. In many cases the maximum likelihood estimates of the $\lambda_{qtrm}$ are more convenient to find and to use than are those of the U-terms themselves. This is especially true for the class of hierarchical models to be defined in the next section. For the type of model¹ that we are dealing with in this paper, the test statistics of both methods can, under fairly wide conditions, be well approximated by the same chi-square distribution.
¹ As the total number in the population increases in size (approaches infinity), the $\lambda_{qtrm}$ also approach infinity.
Denote the set of all $\lambda_{qtrm}$ by $\lambda$ and the corresponding set of $x_{qtrm}$ by $x$. With this notation the maximum likelihood estimate of the $\lambda_{qtrm}$ under a given hypothesis $H_i$ is defined to be the set $\lambda$, henceforth denoted by $\hat\lambda_i$, which makes the set of observed outcomes $x$ the most probable under the hypothesized model $H_i$. To obtain the statistic for the likelihood ratio test, let the probability of observing $x$ when $\lambda = \hat\lambda_i$ be $p_i(x; \hat\lambda_i)$. Let the maximum likelihood estimate of $\lambda$ under the full model (1) be given by $\hat\lambda$ and the probability of observing $x$ when $\lambda = \hat\lambda$ be $p(x; \hat\lambda)$. Then the test statistic $R_i(x)$ for the likelihood ratio test is constructed by comparing $p_i(x; \hat\lambda_i)$ with $p(x; \hat\lambda)$ through the ratio
$$R_i(x) = \frac{p(x; \hat\lambda)}{p_i(x; \hat\lambda_i)}.$$
Values of $R_i(x)$ near 1 indicate that the data could likely have come from either model and so the data are unable to discriminate between the two models. In this case, since $H_i$ would have a special meaning for us or be a simpler model, we would tend to select the model $H_i$ over the full model. We could also say that if $R_i(x)$ is near 1, the U-terms omitted from the full model help very little in describing the data and hence may be set equal to zero. If $R_i(x)$ is large, then the full model makes the observed $x$ much more probable than it could ever be under model $H_i$. The data then discriminate between the two models in favour of the full model. Or one may say that some or all of the U-terms omitted under $H_i$ appear to be
needed to describe the data and therefore should not be set equal to zero. To help us judge how large $R_i(x)$ should be before we say it discriminates between the two models, we consider the distribution of $R_i(x)$ or equivalently of $G_i(x) = 2 \ln R_i(x)$ when the hypothesized model $H_i$ is assumed true. Under fairly wide conditions, it can be shown that if each element of $\hat\lambda_i$ is ≥ 5, the quantity $G_i(x)$ is approximately distributed chi-square with $w - y$ degrees of freedom, where $w$ is the number of linearly independent parameters for the full model and $y$ is the number of linearly independent parameters for the hypothesized model $H_i$. Therefore, if we wish to test whether $R_i(x)$ cannot discriminate between the two models, that is, whether the observed data are consistent with the hypothesized model $H_i$, we can perform a test of significance with respect to the chi-square distribution: i.e., determine $\Pr\{\chi^2_{w-y} \ge G_i(x)\} = \alpha$; if $\alpha$ is small, the observed data are inconsistent with the hypothesis at the level $\alpha$.
If we have two hypothesized models $H_1$ and $H_2$ where the U-terms of $H_2$ include the U-terms of $H_1$, then $R_1(x)$ and $G_1(x)$ can be partitioned in a manner similar to the partitioning of the sum of squares in the usual analysis of variance model. To see this we write
$$R_1(x) = \frac{p(x; \hat\lambda)}{p_1(x; \hat\lambda_1)} = \frac{p_2(x; \hat\lambda_2)}{p_1(x; \hat\lambda_1)} \cdot \frac{p(x; \hat\lambda)}{p_2(x; \hat\lambda_2)},$$
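or
$$R_1(x) = \frac{p_2(x; \hat\lambda_2)}{p_1(x; \hat\lambda_1)}\, R_2(x).$$
In terms of the $G(x)$ functions we get
$$G_1(x) = 2\{\ln p_2(x; \hat\lambda_2) - \ln p_1(x; \hat\lambda_1)\} + G_2(x),$$
so that
$$G_1(x) - G_2(x) = 2\{\ln p_2(x; \hat\lambda_2) - \ln p_1(x; \hat\lambda_1)\}.$$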
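The partition can be checked numerically on a small table. The following Python sketch is an illustration under stated assumptions, not part of the original analysis: the 2 x 2 x 2 counts are hypothetical, three axes are used instead of four for brevity, and the two nested models are chosen because their maximum likelihood fitted values have a well-known closed form (products of marginal totals). The statistic for each model is computed as $2 \sum x \ln(x / \hat\lambda)$, which is $2 \ln R$ when the fitted values of the full model are the observed counts.

```python
import math
from itertools import product

# Hypothetical 2 x 2 x 2 table of counts x[(i, j, k)] (illustrative only).
cells = list(product(range(2), repeat=3))
x = dict(zip(cells, [30, 10, 20, 15, 12, 18, 25, 20]))
N = sum(x.values())

def margin(keep):
    """Sum the counts over every axis not listed in `keep`."""
    out = {}
    for cell, n in x.items():
        key = tuple(cell[a] for a in keep)
        out[key] = out.get(key, 0) + n
    return out

m1, m2, m3, m12 = margin((0,)), margin((1,)), margin((2,)), margin((0, 1))

# Model H1: main effects only; fitted values from the one-way margins.
fit1 = {c: m1[c[:1]] * m2[c[1:2]] * m3[c[2:]] / N**2 for c in cells}
# Model H2: H1 plus the two-factor term for axes 1 and 2.
fit2 = {c: m12[c[:2]] * m3[c[2:]] / N for c in cells}

def G(fit):
    """2 ln R for a fitted table against the full model, whose fit is x itself."""
    return 2 * sum(n * math.log(n / fit[c]) for c, n in x.items() if n > 0)

G1, G2 = G(fit1), G(fit2)
# The part of G1 attributable to the extra U-term: 2{ln p2 - ln p1}.
between = 2 * sum(n * math.log(fit2[c] / fit1[c]) for c, n in x.items())
```

Here `G1 - G2` and `between` agree up to rounding, which is the partition written out above for this particular pair of models.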
We can also take $R_i(x)$ or $G_i(x)$ as a measure of how discrepant model $H_i$ is with the full model. Then $G_1(x)$, for instance, can be expressed in terms of the amount by which the discrepancy with the full model is decreased, namely $2\{\ln p_2(x; \hat\lambda_2) - \ln p_1(x; \hat\lambda_1)\} = G_1(x) - G_2(x)$, by using the additional parameters that are needed to form model $H_2$, and the discrepancy that still exists when model $H_2$ is used. If $p_2(x; \hat\lambda_2)/p_1(x; \hat\lambda_1)$ is near 1, or $G_1(x) - G_2(x)$ is near 0, the amount of decrease in the discrepancy is very little and the data cannot discriminate very well between the two models. Hence the additional U-terms needed to form model $H_2$ from model $H_1$ seem to be of little use and may be considered as having the value zero. Under the assumption that model $H_1$ is the correct model it can be argued that $G_1(x) - G_2(x)$ is distributed $\chi^2$ with $y_2 - y_1$ degrees of freedom, where $y_i$ is the number of parameters in model $H_i$. Again, if we find that the probability of observing a $\chi^2_{y_2 - y_1}$ variate greater than our observed $G_1(x) - G_2(x)$ is small, i.e., $\Pr\{\chi^2_{y_2 - y_1} \ge G_1(x) - G_2(x)\} = \alpha$, $\alpha$ small, we would suspect the hypothesis that model $H_1$ is true. It would then be reasonable to consider model $H_2$ as a possible model to describe the data. Model $H_2$ could be tested in a similar way against a model which includes the U-terms of $H_2$, and so on. This procedure is described in detail in Goodman (1971) and is a method which helps in choosing an appropriate log-linear model. This method will be used in part II.
To use the above procedures we need to know the number of parameters in a given model. The method of counting the parameters is somewhat similar to the method used in analysis of variance and is the same regardless of whether the Mantel or Birch restrictions are placed on the parameters. The Mantel and Birch restrictions on the parameters are such that:
(i) The number of parameters (including the constant term) in the full model equals the number of cells in the contingency table.
(ii) The main effect corresponding to a given characteristic or dimension of the table has one less parameter than the number of categories for the characteristic.
(iii) For the interaction U-terms, first note the characteristics or dimensions involved in the given interaction. Then the product of the numbers of parameters in the corresponding main effects equals the number of parameters associated with the given interaction.
For the chi-square goodness-of-fit test we calculate
$$A_i(x) = \sum_{qtrm} \frac{(x_{qtrm} - \hat\lambda_{qtrm})^2}{\hat\lambda_{qtrm}},$$
where the $\hat\lambda_{qtrm}$ are the elements of $\hat\lambda_i$, the maximum likelihood estimate of $\lambda$ under the hypothesized model $H_i$. If the $\hat\lambda_{qtrm}$ are ≥ 5 and the total number of moving households is fairly large, the distribution of $A_i(x)$ is approximated by a chi-square distribution with $w - y$ degrees of freedom when the hypothesis $H_i$ is true. The test of significance for $H_i$ is carried out in the usual way. The quantity $A_i(x)$ does not lend itself to a partitioning similar to that of $R_i(x)$.
A third and complementary approach is to examine residuals. To every cell $(q, t, r, m)$ of $C_{1234}$ for a given hypothesis is associated the observed frequency $x_{qtrm}$ and the expected or fitted value $E_{qtrm}$. In many cases $E_{qtrm}$ will be the maximum likelihood estimate given the hypothesized model. A function $e(x_{qtrm}, E_{qtrm})$ which gives a measure of discrepancy between the two values can be considered as a residual of the cell $(q, t, r, m)$ for the given hypothesis. The main objectives of the study of residuals are: (i) to look for departures from the hypothesized model, and (ii) to look for what might be going on beyond what is already in the hypothesized model. The definition of the residual [the specification of the function $e(x_{qtrm}, E_{qtrm})$] is somewhat arbitrary, but to accomplish objectives (i) and (ii), especially (i), we should define the residuals so that their distribution is as near as possible to some known form. The following definitions are ones that would seem to be appropriate for the log-linear model [Cox and Snell (1968), Haberman (1974)]:
(a) $(x_{qtrm} - E_{qtrm}) / (E_{qtrm})^{1/2}$
Each of these asymptotically defines a standard normal deviate.³ Definitions (a) and (b) are directed more towards objective (ii), while definition (c) is suggested with objective (i) in mind. Along with a scanning of the residuals, plots of the residuals, such as quantile plots, aid in the interpretation and detection of departures. It may be that the likelihood ratio test or the chi-square test indicates that the hypothesized model is inconsistent with the observed data. An examination of the residuals will reveal those cells which cause the test to indicate inconsistency. If these cells are few in number there may be grounds for retaining the model as a tentative one. Even if the model is judged to be consistent by the tests, the residuals should still be studied as an additional check on the adequacy of the model.
3.5. A useful class of models (hierarchical models)
A model is considered to be hierarchical if, for any U-term set to zero, any other U-term whose numerical subscript contains the numerical subscripts of the one set to zero will also be zero. For example, if $U_{23} = 0$ then for the model to be hierarchical the terms $U_{123} = U_{234} = U_{1234} = 0$. Through these models we can try to obtain a suitable model which is simpler than the full model by eliminating in descending order the high-order interaction U-terms. We could also start from the very simple model $H_1$ and build more complex ones by adding higher-order interaction terms. The support given to any of these models can be assessed by the methods discussed in section 3.4. One of the main difficulties in implementing these procedures is obtaining the maximum likelihood estimate of the $\lambda_{qtrm}$ under a given hypothesized model. The hierarchical models in general will require an iterative procedure to obtain the maximum likelihood estimates. A suggested procedure for this is iterative scaling or iterative proportional fitting [Plackett (1974)].
The nice feature about this procedure is that it is relatively simple and i general programme can be set up which is applicable to all hierarchical mc;lels [Bishop (1969)].’ The iterative scaling procedure produces the best expected value in the IikelihooJ sense, for each cell, given certain marginal tables. The given marginal tables arc those associated with the U-terms that are included in the hypothcsizcd mod& To every U-term there is associated the corresponding subscripted marginal table; associated with the U-term, Ij234 is the tublz CzJa. Because the m&l IS hierarchical, U, 34, being in the model, implies UZj, U24, U2, etc. NT dso in th\: model. This would mean that tables C2 J, CZ4, C,, CC arc given or held f~xcd. But C 234 being fixed implies that these subtablcs are liscd. Hence spccil’ying that ‘These definitions can be further refined to give better approximations to the standard normal deviate. The refinement is mostly an adjustment of the variance. These definitions seem to give reasonable results and are simple to calculate. 4There are a number of programmrs on the market which arc specially designed for this procedure. One of them, CTab, is by Haberman, Universily of Chicago.
thcsc subtAlcs bc fixed along with Czzj \f ould be redundant. hicrnrchical model UC’ look for all the non-redundant fixed proceed to fit the best expectation, Aqrm with those tables spccrfic details. see Bishop (1969) or PIackett (1974). With co\ariates included in the mode}, the nrobabilities
Thus for a given tables and then fixed. For more
determined are conditional probabilities given the covariates. This means that the subtables corresponding to the simultaneous classification of the covariates are held fixed. This implies that the corresponding U-terms must always be included in the hypothesized model. It is a convenient feature of the approach discussed here that labelling a variable as a covariate only imposes this restriction on model selection. The tests and the methods of obtaining maximum likelihood estimates remain unchanged. At times, if the numbers in the categories of a characteristic are not fixed a priori, it may be convenient to blur the distinction between covariate and response, as it may be difficult to make the distinction until precise questions are posed. In such cases we can carry on as though all characteristics are responses and see what models emerge. When we do decide which characteristics are explanatory we can then adjust the model if necessary.

If the full model is hypothesized then the full table is held fixed and the maximum likelihood estimates for the $\lambda_{qtrm}$ are taken to be the observed cell entries, with zero entries adjusted by adding the quantity 1/2. This adjustment is used to overcome some difficulties that arise when an empty cell occurs. There are many suggestions for overcoming the difficulties of empty cells. This adjustment seems to work as well as any. However, if the empty cells are numerous and the remaining cells have low counts, these adjustments can give a distorted picture. At present there is very little theory on how to deal with data of this form. In extreme cases there may be so little information in the data that these statistical techniques cannot be used.
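The empty-cell adjustment for the full model can be illustrated with a minimal sketch. The function name `saturated_estimates` and the example table are hypothetical and chosen only for illustration; the sketch assumes, as the text suggests, that only cells with a zero count are adjusted.

```python
import numpy as np

def saturated_estimates(counts, adjustment=0.5):
    """Cell estimates under the full model: the observed counts,
    with every empty cell replaced by `adjustment` (1/2 in the text)."""
    counts = np.asarray(counts, dtype=float)
    return np.where(counts == 0, adjustment, counts)

# Hypothetical 2 x 2 table with one empty cell.
observed = np.array([[12, 0],
                     [7, 3]])
lam_hat = saturated_estimates(observed)

# The logarithms needed for the U-terms are now finite in every cell.
log_lam_hat = np.log(lam_hat)
```

With many empty cells and low counts elsewhere, the adjusted values dominate the logarithms, which is the distortion the text warns about.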
Whether Birch's restrictions or Mantel's restrictions are used, the individual U-terms are linear combinations of the $\ln \lambda_{qtrm}$. The same linear combinations of the maximum likelihood estimates $\ln \hat{\lambda}_{qtrm}$ can be used to obtain the maximum likelihood estimates of the U-terms.
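Hence, for any multi-factor interaction U-term, say $U_{12}(qt)$, we can create a standardized variable $\hat{U}_{12}(qt)/(\operatorname{var} \hat{U}_{12}(qt))^{1/2}$, which is asymptotically normal with mean 0 and variance 1 if $U_{12}(qt) = 0$ for all $q$ and $t$. The quantity $\hat{U}_{12}(qt)$ is the maximum likelihood estimate of $U_{12}(qt)$ and, for the Mantel model with $q' = t' = r' = m' = 1$,

$$\hat{U}_{12}(qt) = \ln \hat{\lambda}_{qt11} - \ln \hat{\lambda}_{1t11} - \ln \hat{\lambda}_{q111} + \ln \hat{\lambda}_{1111}.$$

The variance of $\hat{U}_{12}(qt)$, obtained by treating the $\ln \hat{\lambda}_{qtrm}$ as independent with variance $1/\hat{\lambda}_{qtrm}$, is given by

$$\operatorname{var} \hat{U}_{12}(qt) = \frac{1}{\hat{\lambda}_{qt11}} + \frac{1}{\hat{\lambda}_{1t11}} + \frac{1}{\hat{\lambda}_{q111}} + \frac{1}{\hat{\lambda}_{1111}}.$$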
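The standardized variate for a first-order interaction under Mantel's restrictions can be computed along the following lines. This is a sketch only: the function name `standardized_u12` is hypothetical, array index 0 stands for the paper's reference category 1, and the variance is the sum of reciprocal cell estimates that results from treating the log estimates as independent.

```python
import numpy as np

def standardized_u12(lam_hat):
    """Standardized estimates of U_12(qt) for q, t > 1.

    lam_hat: 4-way array of positive cell estimates indexed (q, t, r, m).
    Index 0 in each dimension plays the role of the reference category.
    """
    lam = np.asarray(lam_hat, dtype=float)[:, :, 0, 0]   # slice at r = m = reference
    log_lam = np.log(lam)
    # Log odds ratio of cell (q, t) against the reference row and column.
    u12 = log_lam - log_lam[:, [0]] - log_lam[[0], :] + log_lam[0, 0]
    inv = 1.0 / lam
    var = inv + inv[:, [0]] + inv[[0], :] + inv[0, 0]
    # Terms with q = 1 or t = 1 are zero by the restrictions, so they are dropped.
    return u12[1:, 1:] / np.sqrt(var[1:, 1:])

# Hypothetical 2 x 2 x 2 x 2 table of cell estimates.
example = np.full((2, 2, 2, 2), 10.0)
example[:, :, 0, 0] = [[10.0, 20.0], [30.0, 120.0]]
z = standardized_u12(example)   # one standardized variate, for q = t = 2
```

Values of `z` near or above 2 in absolute value would flag the corresponding U-term as a candidate for the model, in the sense described in the text.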
Although the standardized variates are not mutually independent, we can get some idea of whether a multi-factor U-term is zero simply by inspection of the values of the observed standardized variates. Standardized variates which have values of about 2 and over in absolute value indicate that the corresponding U-terms should be in the model which describes the data. For high-dimensional tables, say of dimension four or more, this gives us a starting model in the search for a model which fits the data. A further useful aid is to rank the observed standardized variates in order of increasing absolute value and plot them against the quantiles of a half-normal distribution.¹ If all U-terms are zero then the points would lie close to a line of unit slope. Departure from this line would indicate U-terms which may not be zero and which should be considered as candidates for inclusion in the model. When the model is selected, the parameters can again be estimated from the maximum likelihood estimates $\hat{\lambda}_{qtrm}$ for $\lambda_{qtrm}$. Again the variances of the resulting estimators of the U-terms may be obtained by treating the variates $\ln \hat{\lambda}_{qtrm}$ as if they were independent with variance $1/\hat{\lambda}_{qtrm}$ [Plackett (1974)].

¹The half-normal plot was first introduced by Daniel (1959) and was first applied to the log-linear model by Cox and Lauh (1967).

4. Interpretations

4.1. Interpretation of models

Models which contain third-order interactions or three or more second-order interactions are usually very difficult to interpret. The most that can usually be said is that the association structure among the characteristics is complex. In such cases it would be more profitable to find another or finer stratification of the data to obtain simpler models. The most complex model that will be discussed in this section will have no more than one second-order interaction term. One of the simplest models is the model $H_1$ of section 3.3,

$$\ln \lambda_{qtrm} = U + U_1(q) + U_2(t) + U_3(r) + U_4(m).$$

The model is free of interaction terms and implies that the characteristics are not associated with each other. Consider the model

$$\ln \lambda_{qtrm} = U + U_1(q) + U_2(t) + U_3(r) + U_4(m) + U_{12}(qt) + U_{23}(tr) + U_{34}(rm).$$
Since there is no second-order interaction term in this model, the association that exists between any pair of characteristics, say 2 and 3, is the same at every level of characteristics 1 and 4. The $U_{13}(qr)$ term is absent and this implies that characteristics 1 and 3, quality and tenure, are independent at every level of characteristics 2 and 4. Similar statements can be made for the other pairs of characteristics which have first-order interactions absent from the model. There is no interaction term linking characteristic 4 to characteristics 1 or 2. In such cases the contingency table can be collapsed over characteristic 4 without changing the interaction between characteristics 1 and 2 or between characteristics 2 and 3. Therefore we could collapse over characteristic 4 and consider a model on the (H, T) space. The $U_{12}(qt)$ and $U_{23}(tr)$ terms in this new model would be the same as in the original. In general, if a characteristic, say $j$, is not linked by a U-term to a subset $S$ of the remaining characteristics of the contingency table, the table can be collapsed over characteristic $j$ without changing the U-terms involving characteristics that belong to $S$.

If we take the model and include a second-order interaction term, say $U_{123}(qtr)$, we get

$$\ln \lambda_{qtrm} = U + U_1(q) + U_2(t) + U_3(r) + U_4(m) + U_{12}(qt) + U_{13}(qr) + U_{23}(tr) + U_{34}(rm) + U_{123}(qtr).$$
The second-order interaction term $U_{123}(qtr)$ implies that the first-order interaction of any pair of characteristics obtained from 1, 2 and 3 changes for at least one level of the third characteristic in this triplet. The first-order interactions of pairs of characteristics not obtainable from the triplet 1, 2 and 3 are interpreted as in the previous model.
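The collapsing rule stated above for the model with first-order terms linking characteristics 1 and 2, 2 and 3, and 3 and 4 can be checked numerically. The sketch below is illustrative only: the dimensions and U-term values are made up, Mantel-type restrictions are assumed (a U-term is zero whenever one of its indices is at the reference category, array index 0), and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, T, R, M = 3, 2, 2, 3   # hypothetical numbers of categories

def mantel_term(shape):
    """Random U-term that is zero whenever any index is at the reference (index 0)."""
    u = rng.normal(size=shape)
    for axis in range(len(shape)):
        idx = [slice(None)] * len(shape)
        idx[axis] = 0
        u[tuple(idx)] = 0.0
    return u

u1, u2, u3, u4 = (mantel_term((n,)) for n in (Q, T, R, M))
u12, u23, u34 = mantel_term((Q, T)), mantel_term((T, R)), mantel_term((R, M))

# ln(lambda_qtrm) for the model with U_12, U_23 and U_34 only (constant U = 1.0).
log_lam = (1.0
           + u1[:, None, None, None] + u2[None, :, None, None]
           + u3[None, None, :, None] + u4[None, None, None, :]
           + u12[:, :, None, None] + u23[None, :, :, None]
           + u34[None, None, :, :])
lam = np.exp(log_lam)

def log_odds_ratio(table2d):
    """Log odds ratios of a two-way table against its first row and column."""
    L = np.log(table2d)
    return L - L[:, [0]] - L[[0], :] + L[0, 0]

collapsed = lam.sum(axis=3)                           # collapse over characteristic 4
u12_collapsed = log_odds_ratio(collapsed[:, :, 0])    # (q, t) at r = reference
u23_collapsed = log_odds_ratio(collapsed[0, :, :])    # (t, r) at q = reference
```

In this construction `u12_collapsed` and `u23_collapsed` reproduce `u12` and `u23`, because summing over characteristic 4 only introduces a factor that depends on the level of characteristic 3.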
It seems that the most useful interpretations for the parameters are those that can be connected directly to the cell probabilities $p_{qtrm}$ or cell expectations $\lambda_{qtrm}$. In a multinomial model with total number of observations $n$, $p_{qtrm} = \lambda_{qtrm}/n$ and so $p_{qtrm}$ and $\lambda_{qtrm}$ are essentially the same. This being so, we will make the interpretations in terms of the probabilities $p_{qtrm}$. In section 3.3 it was mentioned that the expectations or the probabilities were assumed to depend multiplicatively on the methods of classification. Natural logarithms were taken so that the multiplicative relation could be put into a linear form, a form which more closely resembles analysis of variance and regression. The interpretation of the parameters in this discussion is based on probability ratios or odds and because of this the multiplicative formulation is convenient. Further, the interpretation of the parameters of a second-order interaction and of its lower-order relatives is usually quite cumbersome and so our discussion is restricted to no more than first-order interactions. Also, for ease of discussion, our attention is restricted to a 3-dimensional contingency table. The log-linear model containing all first-order interaction terms for a 3-dimensional table is

$$\ln p_{qtr} = U + U_1(q) + U_2(t) + U_3(r) + U_{12}(qt) + U_{13}(qr) + U_{23}(tr)$$
and in multiplicative form is

$$p_{qtr} = \gamma\, \gamma_1(q)\, \gamma_2(t)\, \gamma_3(r)\, \gamma_{12}(qt)\, \gamma_{13}(qr)\, \gamma_{23}(tr),$$

where the natural logarithm of a $\gamma$-term is the corresponding U-term.

The model with Mantel's restrictions: In this case the base for comparison is the probability or expectation of a selected cell, say the cell (1, 1, 1) which has $q = t = r = 1$. This being so, it is easily found that $\gamma = p_{111}$ and the main effect terms are given by the probability ratios or odds,

$$\gamma_1(q) = p_{q11}/p_{111}, \qquad \gamma_2(t) = p_{1t1}/p_{111}, \qquad \gamma_3(r) = p_{11r}/p_{111}.$$

The first-order interaction terms are odds ratios. For instance,

$$\gamma_{12}(qt) = \frac{p_{qt1}/p_{1t1}}{p_{q11}/p_{111}} = \frac{p_{qt1}\, p_{111}}{p_{q11}\, p_{1t1}}.$$
This odds ratio can be viewed as a comparison of choosing category $q$ versus category 1 of characteristic 1 at two levels of characteristic 2, $t = 1$ and $t$, when characteristic 3 is held fixed at $r = 1$. If the odds ratio is the same for every value of $t$ then $\gamma_{12}(qt) = 1$ for all $q$ and $t$. In the model where there are no second-order interactions involving characteristics 1 and 2, $\gamma_{12}(qt) = 1$ for all $q$ and $t$ is taken to mean that characteristics 1 and 2 are independent. If a similar comparison is done at different values of $r$, i.e., $p_{qtr}\, p_{11r}/(p_{q1r}\, p_{1tr})$, and this ratio changes for at least one value of $r$, then the second-order interaction term $\gamma_{123}(qtr)$ would be in the model. In the above argument the roles of $q$ and $t$ can be reversed and this argument can also be applied to any of the other first-order interaction terms.
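The odds and odds-ratio reading of the Mantel parameters can be made concrete with a short sketch. The function name `mantel_gammas` and the probability table are hypothetical; array index 0 plays the role of the reference category 1, and a model with at most first-order interactions is assumed.

```python
import numpy as np

def mantel_gammas(p):
    """Multiplicative terms of a 3-way table under Mantel-type restrictions,
    with cell (0, 0, 0) as the base for comparison."""
    p = np.asarray(p, dtype=float)
    ref = p[0, 0, 0]

    def odds_ratio(s):
        # Odds ratios of a two-way slice against its first row and column.
        return s * s[0, 0] / (s[:, [0]] * s[[0], :])

    return {
        "gamma": ref,
        "gamma1": p[:, 0, 0] / ref,           # odds of q versus 1 at t = r = 1
        "gamma2": p[0, :, 0] / ref,
        "gamma3": p[0, 0, :] / ref,
        "gamma12": odds_ratio(p[:, :, 0]),    # r held at the reference level
        "gamma13": odds_ratio(p[:, 0, :]),
        "gamma23": odds_ratio(p[0, :, :]),
    }

# Hypothetical 2 x 2 x 2 table of cell probabilities, indexed (q, t, r).
p_example = np.array([[[0.10, 0.05], [0.20, 0.10]],
                      [[0.05, 0.10], [0.30, 0.10]]])
g = mantel_gammas(p_example)
```

Here `g["gamma12"][1, 1]` is the odds ratio of category 2 versus 1 of characteristic 1 at levels 2 and 1 of characteristic 2; a value of 1 throughout `gamma12` would correspond to the absence of that interaction.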
A further approach is through the use of geometric averages of probability ratios. In this approach a category is chosen from each characteristic and used as a base for comparison. For the purpose of discussion we will take the first category of each characteristic. The first consideration is the main effect terms. If, for characteristic 1, we compare the probability of category $q$ versus category 1 for all levels of the remaining characteristics we get the set of probability ratios $p_{qtr}/p_{1tr}$ for all $t$ and $r$. An average of these ratios will give an overall measure for characteristic 1 of whether or not category $q$ is favoured over category 1. The average that is compatible with the multiplicative model or the log-linear model is the geometric average $(\prod_t \prod_r p_{qtr}/p_{1tr})^{1/TR}$. If this geometric average is > 1 (< 1) then overall for characteristic 1 category $q$ is favoured (is not favoured) over category 1. This geometric average corresponds to a simple function of the U-terms. By taking the natural logarithm of the geometric average, and using Birch's restrictions under which each interaction term sums to zero over either of its indices, we get

$$\frac{1}{TR} \sum_t \sum_r \ln \frac{p_{qtr}}{p_{1tr}} = U_1(q) - U_1(1).$$
In terms of the multiplicative model this becomes $\gamma_1(q)/\gamma_1(1)$. Similar results can be obtained for the remaining characteristics. In this way the differences of main effect terms can be related to overall averages of probability ratios. Also, for characteristic 1 we can compute similar geometric averages at each level of one of the remaining characteristics, say characteristic 2. This would give an expression of the form $(\prod_r p_{qtr}/p_{1tr})^{1/R}$. These quantities can vary considerably as $t$ ranges through its values. If this is so, $\gamma_1(q)/\gamma_1(1)$, which can easily be seen to be a geometric average of the quantities $(\prod_r p_{qtr}/p_{1tr})^{1/R}$, is likely to give a misleading impression of how category $q$ is favoured with respect to category 1.
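The link between geometric averages of probability ratios and differences of main effect terms can be checked with a small sketch. It assumes sum-to-zero (Birch-type) restrictions, under which the main effect of characteristic 1 is the mean of the log probabilities at each level minus the grand mean. The function names and the randomly generated table are hypothetical.

```python
import numpy as np

def geometric_average_ratio(p, q):
    """Geometric average over t and r of p[q, t, r] / p[0, t, r]."""
    p = np.asarray(p, dtype=float)
    return np.exp(np.log(p[q] / p[0]).mean())

def conditional_geometric_averages(p, q):
    """Geometric average over r only, one value for each level t."""
    p = np.asarray(p, dtype=float)
    return np.exp(np.log(p[q] / p[0]).mean(axis=1))

def birch_main_effect_1(p):
    """U_1(q) under sum-to-zero restrictions: level mean of ln p minus the grand mean."""
    log_p = np.log(np.asarray(p, dtype=float))
    return log_p.mean(axis=(1, 2)) - log_p.mean()

# Hypothetical 3 x 4 x 2 table of positive cell probabilities.
rng = np.random.default_rng(1)
p_table = rng.random((3, 4, 2)) + 0.1
p_table /= p_table.sum()

u1 = birch_main_effect_1(p_table)
overall = geometric_average_ratio(p_table, 1)          # equals exp(U_1(2) - U_1(1))
by_level = conditional_geometric_averages(p_table, 1)  # may vary with t
```

Comparing `overall` with the spread of `by_level` shows how much an overall ratio can hide when the conditional averages differ across levels of characteristic 2.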
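In terms of the multiplicative model we get

$$\Bigl(\prod_r \frac{p_{qtr}}{p_{11r}}\Bigr)^{1/R} = \frac{\gamma_1(q)}{\gamma_1(1)}\, \frac{\gamma_2(t)}{\gamma_2(1)}\, \frac{\gamma_{12}(qt)}{\gamma_{12}(11)}, \qquad (2)$$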
the product of the ratios of main effect terms times the adjustment factor $\gamma_{12}(qt)/\gamma_{12}(11)$. Part of the average odds of $q, t$ vs. 1, 1 is accounted for by the overall odds of $q$ versus $q = 1$ and $t$ versus $t = 1$ of characteristics 1 and 2, that is, by the product of the ratios of the main effect terms. The adjustment factor or interaction ratio is the amount by which the average odds of $q, t$ vs. 1, 1 deviates from what can be explained by the overall effects of $q$ and $t$. Additional insight into the interpretation of $\gamma_{12}(qt)$ can be gained by considering the following. Expression (2) can be written as

[equation lost in the source]
The quantities $(\prod_r p_{qtr}/p_{1tr})^{1/R}$ and $(\prod_r p_{qtr}/p_{q1r})^{1/R}$ are the conditional geometric averages over characteristic 3 of categories $q$ versus 1 of characteristic 1 given $t$ of characteristic 2, and of categories $t$ versus 1 of characteristic 2 given $q$ of characteristic 1. If the first geometric average has the same value for all $t$ and the second the same value for all $q$, then we can write

[equation lost in the source]
Now

[equation lost in the source]

since the average is the same for all values of $t$. Therefore

[equation lost in the source]

Make this substitution in the above expression and get $\gamma_{12}(qt)/\gamma_{12}(1, 1) = 1$ for all $q$ and $t$. This implies that $\gamma_{12}(qt) = 1$ or $U_{12}(qt) = 0$ for all $q$ and $t$. Hence, in a model which has at most first-order interactions, if the main effect terms $\gamma_1(q)/\gamma_1(1)$ and $\gamma_2(t)/\gamma_2(1)$ are equal to all their respective conditional geometric averages then no interaction exists between characteristics 1 and 2 and the terms corresponding to this interaction do not enter the model.

In a manner similar to obtaining conditional geometric averages with respect to the main effect terms, we can find similar quantities for the first-order interaction terms. If these conditional first-order quantities differ from level to level then a second-order interaction is present in the model. This type of argument can be extended to the higher interactions of a model.
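One way to see why constant conditional geometric averages force the interaction between characteristics 1 and 2 to vanish is sketched below. The steps are a reconstruction under stated assumptions, not the authors' own displays: a model with at most first-order interactions is assumed, together with sum-to-zero restrictions, so that $\prod_t \gamma_{12}(qt) = \prod_q \gamma_{12}(qt) = 1$.

```latex
% Conditional geometric averages in terms of the multiplicative parameters
% (the terms involving characteristic 3 cancel when averaged over r):
\begin{align*}
\Bigl(\prod_r \frac{p_{qtr}}{p_{1tr}}\Bigr)^{1/R}
  &= \frac{\gamma_1(q)}{\gamma_1(1)}\,\frac{\gamma_{12}(qt)}{\gamma_{12}(1t)}, &
\Bigl(\prod_r \frac{p_{qtr}}{p_{q1r}}\Bigr)^{1/R}
  &= \frac{\gamma_2(t)}{\gamma_2(1)}\,\frac{\gamma_{12}(qt)}{\gamma_{12}(q1)}.
\end{align*}
% If the first average does not depend on t, it equals its own geometric
% mean over t, which is gamma_1(q)/gamma_1(1) because the products of
% gamma_12(qt) and gamma_12(1t) over t are both 1. Hence
\begin{align*}
\gamma_{12}(qt) &= \gamma_{12}(1t) \quad \text{for all } q, t,
\intertext{and, by the same argument applied to the second average,}
\gamma_{12}(qt) &= \gamma_{12}(q1) \quad \text{for all } q, t.
\intertext{Setting $q = 1$ in the second relation and combining with the first gives}
\gamma_{12}(qt) &= \gamma_{12}(1t) = \gamma_{12}(11),
\qquad\text{so}\qquad
\frac{\gamma_{12}(qt)}{\gamma_{12}(11)} = 1 .
\end{align*}
% With the product restriction, gamma_12(11)^T = 1, so gamma_12(qt) = 1
% and U_12(qt) = 0 for all q and t.
```

The argument uses only the two conditional averages named in the text, which is why equality of the main effect ratios with all of their conditional averages is enough to rule out the interaction.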
5. Data requirements

5.1. Amount of data required

The model selection or determination of relationships among the cell expectations is based upon fitting a contingency table under a given hypothesis and assessing the fit by means of asymptotic properties of distributions and residuals. For the asymptotic properties to be reasonable approximations, the estimated expected value of each cell under the hypothesis should not be too small, say at least 5. In hierarchical models a hypothesis fixes certain subtables and it is also a property of these models that the full table of the estimated expected values, when collapsed to give tables corresponding to the fixed tables, must give the observed fixed tables. Hence for the asymptotic properties to give reasonable approximations we need the observed cell frequencies of the fixed tables to be large. It is easily seen that a very modest number of classifications for an individual moving household in a given cohort will result in a very large number of cells in the full contingency table. Hence, to obtain large enough cell frequencies for the fixed subtables of hypothesized models with some complexity, we will need a large number of moving households in a given cohort. This must be kept in mind when designing the a priori spaces and when selecting and categorizing the covariates. However, the analysis of each cohort in isolation from other cohorts helps to keep the number of cells in the full contingency table under control, as the a priori spaces and the covariates can be tailored to the particular cohort.
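The growth in the number of cells can be illustrated with a rough calculation. The function name `cells_and_min_sample` and the category counts are hypothetical, and the sample size returned is only a crude lower bound: it assumes households spread evenly over the cells of the table in question, which real data rarely do.

```python
from math import prod

def cells_and_min_sample(categories, min_expected=5):
    """Number of cells in a table and the smallest sample size that could
    give every cell an expected count of at least `min_expected`."""
    n_cells = prod(categories)
    return n_cells, n_cells * min_expected

# Hypothetical design: four characteristics with 4, 3, 3 and 5 categories.
full_table = cells_and_min_sample((4, 3, 3, 5))   # (180, 900)

# A fixed subtable involving only the first two characteristics.
fixed_subtable = cells_and_min_sample((4, 3))     # (12, 60)
```

The same function applied to the fixed subtables of a hypothesized model gives a quick sense of whether a cohort is large enough before any fitting is attempted.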
(i) The stochastic or response component of the model is a multinomial distribution or a product of multinomial distributions.
(ii) The deterministic component expresses the probabilities $p_{qtrm}$ as a multiplicative function of the response and explanatory variables.

When viewed in this context, the log-linear model is a sub-model, but an important and versatile sub-model, of the very general semi-Markov model discussed by Ginsberg (1971, 1972a, 1972b). The log-linear model can be generalized to include continuous explanatory variables such as time: see Cox (1970) and Plackett (1974). Generally for these cases the iterative proportional fitting procedure for obtaining maximum likelihood estimates is not applicable and an iterative procedure such as the Newton-Raphson procedure is needed.

As mentioned before, the methods described in this paper are mainly directed towards providing a method of assessing whether or not various associations exist, and hence towards the selection of a suitable model to express these associations. The models that we are considering may be thought of as Markov-type models in that they only depend on where the household was immediately before the move. The systems that we look at may be stable with respect to the model, in that the model we select for a particular cohort may be suitable for a number of future time periods, even though the probabilities associated with the model may change. If we were able to trace a moving household through successive time periods we could investigate the above statement by obtaining a suitable model or set of models for each time period. We could even go further and add time as an additional dimension to the contingency table and then search for a suitable set of models. These models could indicate if the process is Markov or if the process is time homogeneous. A more detailed discussion of this is given in Bishop et al. (1975).
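For the purely categorical case, the iterative proportional fitting procedure mentioned above can be sketched as follows. This is a generic illustration rather than the authors' implementation: the function name `ipf`, its parameters and the example tables are hypothetical, and all fixed margins are assumed to be strictly positive.

```python
import numpy as np

def ipf(observed, margins, n_iter=200, tol=1e-10):
    """Fit a hierarchical log-linear model by iterative proportional fitting.

    observed: array of cell counts.
    margins:  list of tuples of axes; each tuple names one fixed subtable.
    Returns fitted cell expectations whose fixed subtables match the data.
    """
    observed = np.asarray(observed, dtype=float)
    fitted = np.full(observed.shape, observed.sum() / observed.size)
    all_axes = tuple(range(observed.ndim))
    for _ in range(n_iter):
        previous = fitted.copy()
        for keep in margins:
            drop = tuple(a for a in all_axes if a not in keep)
            target = observed.sum(axis=drop, keepdims=True)
            current = fitted.sum(axis=drop, keepdims=True)
            fitted = fitted * target / current   # rescale to match this subtable
        if np.abs(fitted - previous).max() < tol:
            break
    return fitted

# Independence in a hypothetical 2 x 2 table: fix the two one-way margins.
two_way = np.array([[10, 20], [30, 40]])
fit_indep = ipf(two_way, margins=[(0,), (1,)])

# All first-order interactions in a hypothetical 2 x 2 x 2 table.
three_way = np.array([[[10, 20], [30, 40]],
                      [[15, 25], [35, 5]]])
fit_pairs = ipf(three_way, margins=[(0, 1), (0, 2), (1, 2)])
```

Each pass rescales the current fit so that one fixed subtable at a time agrees with the observed one; models with continuous explanatory variables fall outside this scheme, which is why a Newton-Raphson type procedure is needed there.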
References

Berry, B.J.L., 1971, Comparative factorial ecology, Economic Geography 47 (supplement, special edition).
Birch, M.W., 1963, Maximum likelihood in three-way contingency tables, Journal of the Royal Statistical Society B 25, 220-233.
Bishop, Y.M.M., 1969, Full contingency tables, logits, and split contingency tables, Biometrics 25, 383-399.
Bishop, Y.M.M., S.E. Fienberg and P.W. Holland, 1975, Discrete multivariate analysis: Theory and practice (M.I.T. Press, Cambridge, MA).
Cox, D.R., 1970, The analysis of binary data (Methuen, London).
Cox, D.R., 1972, The analysis of multivariate binary data, Journal of the Royal Statistical Society C 21, 113-120.
Cox, D.R. and E. Lauh, 1967, A note on the graphical analysis of multidimensional contingency tables, Technometrics 9, 481-488.
Cox, D.R. and E.J. Snell, 1968, A general definition of residuals (with discussion), Journal of the Royal Statistical Society B 30, 248-275.
Daniel, C., 1959, Use of half normal plots in interpreting factorial two-level experiments, Technometrics 1, 311-341.
Davis, J.A., 1974, Hierarchical models for significance tests in multidimensional contingency tables: An exegesis of Goodman's recent papers, in: H. Costner, ed., Sociological methodology (Jossey-Bass, San Francisco, CA).
Fishelson, G., 1975, Household's location in an urban area: An extension of the traditional model, Socio-Economic Planning Sciences 9, [...]-13.
Ginsberg, Ralph B., 1971, Semi-Markov processes and mobility, Mathematical Sociology 1, 233-262.
Ginsberg, Ralph B., 1972a, Critique of probabilistic models: Application of the semi-Markov model to migration, Mathematical Sociology 2, 63-82.
Ginsberg, Ralph B., 1972b, Incorporating causal [...]
[...] of dichotomous variables, American Sociological Review 37, 28-46.
Haberman, S.J., 1974, The analysis of frequency data (University of Chicago Press, Chicago, IL).
Mantel, N., 1966, Models for complex contingency tables and polychotomous dosage response curves, Biometrics 22, 83-95.
Murdie, R.A., 1969, The factorial ecology of metropolitan Toronto, 1951-1961: An essay on the social geography of the city, Department of Geography Research Paper no. 116 (University of Chicago, Chicago, IL).
Nelder, J.A. and R.W.M. Wedderburn, 1972, Generalized linear models, Journal of the Royal Statistical Society A 135, 370-384.
Pickvance, C.G., 1972, Life-cycle, housing tenure and intra-urban residential mobility: A causal model, The Sociological Review 21, 279-297.
Pickvance, C.G., 1974, Life-cycle, housing tenure and residential mobility: A path analytic approach, Urban Studies 11, 171-188.
Plackett, R.L., 1974, The analysis of categorical data (Charles Griffin, London).
Rees, P.H., 1970, The factorial ecology of metropolitan Chicago, in: B.J.L. Berry and F.E. Horton, eds., Geographical perspectives on urban systems (Prentice-Hall, Englewood Cliffs, NJ).
Simmons, J.W., 1974, Patterns of residential movement in Metropolitan Toronto, Department of Geography Research Publication no. 13 (University of Toronto Press, Toronto).
Theil, H., 1970, On the estimation of relationships involving qualitative variables, American Journal of Sociology 76, 103-154.
Timms, D.W.G., 1971, The urban mosaic: Towards a theory of residential differentiation (Cambridge University Press, Cambridge).
Yeates, M., 1972, The congruence between housing space, social space and community space and some experiments concerning its implications, Environment and Planning 4, 397-414.