Computers, Environment and Urban Systems 27 (2003) 53–70 www.elsevier.com/locate/compenvurbsys
Implementing spatial segregation measures in GIS David W.S. Wong* Department of Geography, George Mason University, Fairfax, VA 22030, USA Received 11 October 2000; received in revised form 15 June 2001
Abstract
Segregation can be regarded as the spatial separation of different population groups. Most studies of segregation so far have relied on aspatial measures that are not effective in differentiating spatial distribution patterns among population groups. Spatial measures have been introduced but have not been adopted widely because they are difficult to compute even with Geographic Information Systems (GIS). However, it is still logical to employ GIS technology to compute spatial segregation measures. A previous implementation attempt was successful, but it was inefficient and suffered from several shortcomings. This paper describes a recent effort to implement a set of spatial segregation measures in a GIS environment. An integrative approach is adopted to take advantage of various recent advances in GIS technology. Algorithms designed to implement spatial segregation measures consist of general procedures to extract spatial information from feature data and to combine spatial information with attribute (population) data to derive the indices. Different types of spatial information include geometric characteristics and spatial relationships of the areal units, and the location information of the population. Using ArcView GIS and its Avenue programming tool as an application example, computations of various spatial segregation indices were implemented as additional functions in an ArcView user interface. © 2002 Elsevier Science Ltd. All rights reserved.
Keywords: Spatial segregation measures; Spatial information; ArcView GIS
* Tel.: +1-703-993-1212; fax: +1-703-993-1216 (D.W.S. Wong). 0198-9715/03/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0198-9715(01)00018-7
D.W.S. Wong / Comput., Environ. and Urban Systems 27 (2003) 53–70
1. Introduction
Measuring the level of segregation among population groups effectively has been one of the central issues in segregation and population studies. Researchers have frequently used the dissimilarity index D advocated by Duncan and Duncan (1955) to indicate the level of segregation. Even though many doubts have been cast on the ability of the D index to indicate the level of segregation, it has been reaffirmed to be the most useful index for the evenness dimension of segregation (Massey & Denton, 1988). Nevertheless, a series of studies has revealed the inadequacies of the dissimilarity index from the spatial perspective, and a set of spatial segregation indices has been proposed during the past decade. These segregation measures are regarded as spatial either because they explicitly utilize geographical information in their formulations, such that the results will change if the locations of population groups change, or because the spatial interaction among population groups across areal unit boundaries is accounted for in determining the level of segregation. Unfortunately, almost a decade after some of these measures were first introduced, very few studies have employed them. Part of the reason that spatial measures have not been widely adopted is that they are difficult to compute, while the aspatial measures are convenient and easy to use. Traditional segregation measures such as the D index or the entropy-based diversity index can easily be computed using generic software packages such as spreadsheet and database programs. These generic computing tools are more than adequate to compute aspatial indices even if the study involves a geographically extensive area and thus a large data set. But to compute even the simplest spatial index, certain types of geographic information are required. If the study area involves only a few subunits, the geographic information may be derived manually.
But for realistic studies with reasonable size data sets, any type of geographic information has to be derived in an automatic fashion and the use of Geographic Information Systems (GIS) becomes necessary. Unfortunately, the development of GIS technology has not matured enough to the level such that GIS users can execute standard built-in functions to compute spatial segregation indices. Thus, these spatial measures are still not accessible to most researchers and practitioners. A short research note (Wong, 2002b) has announced the completion of a recent effort to develop tools in GIS to compute spatial segregation measures. The resulting tools are available to the public (http://geog.gmu.edu/seg). However, that research note does not provide a detailed review of the spatial measures or discuss the methodological issues in GIS related to the development of the tools. Therefore, the current paper focuses on approaches to implement the set of spatial segregation measures in GIS environments. Algorithms were designed to derive spatial information from feature data. The spatial information is then combined with population attribute data to compute spatial segregation indices. These algorithms are much more efficient than the relatively primitive approach adopted in the past (Wong & Chong, 1998), and exploit the advanced GIS technology, such as the object-oriented environment, efficient spatial indexing methods to support spatial queries, and simpler spatial data formats. The current implementation also includes multi-group measures, which were
not implemented in the previous effort. Calculations of spatial indices are structured as additional GIS functions in ArcView—one of the most popular GIS. The next section provides an overview of the set of spatial segregation measures. These measures can handle the traditional two-group settings and the multi-group settings. Interested readers are encouraged to refer to the original publications for detailed discussions. The third section systematically categorizes and discusses different types of spatial information required for the computation of spatial segregation indices. In the fourth section, the previous approach to implement spatial measures is critically reviewed and a new and improved approach is presented. The fifth section discusses an effort using ArcView GIS as an application example to implement the set of spatial indices. It is followed by an illustration using simulated landscapes and an empirical example to show how these measures can be used in GIS.
2. Spatial segregation measures
Numerous methods have been proposed to measure segregation. Various methods or approaches are based upon different conceptualizations of the segregation phenomenon (Kaplan & Holloway, 1998). The spatial assimilation concept is still central to segregation studies (Massey, 1985), and the evenness of racial distribution as manifested through residential pattern is believed to be the most important aspect. Therefore, most of the discussions below revolve around the spatial evenness of population distribution in measuring segregation.
2.1. Two-group measures
The dissimilarity index D was advocated by Duncan and Duncan (1955) to reflect the level of segregation between two population groups. This index is easy to compute and has intuitive interpretations favored by many sociologists and population researchers. Therefore, it has been used extensively in the past several decades. Specifically, it is defined as
\[ D = \frac{1}{2} \sum_i \left| \frac{w_i}{W} - \frac{b_i}{B} \right| \tag{1} \]
where $b_i$ and $w_i$ are black and white population counts in areal unit i, and B and W are the total black and white population counts of the entire study region, respectively, using the traditional two-group black–white setting. D ranges from 0 to 1, indicating no segregation to perfect segregation, respectively. The index has received very strong endorsements from recent findings that it is very effective in capturing the evenness dimension, which is the most important dimension in measuring segregation (Massey & Denton, 1988). Since 1991, a series of publications has revealed a major deficiency of the dissimilarity index. It is true that D is effective in capturing the evenness
of population, but only to the extent that the spatial arrangement of population is not considered. As long as each areal unit within the study area is dominated by one group or the other exclusively, the D index will return a "1", indicating a perfectly segregated situation, even if some adjacent areal units are occupied by different groups. In the formulation of D, population groups in different areal units, even if the units are adjacent to each other, cannot interact across unit boundaries to lower the level of segregation. Alternative spatial segregation measures have been introduced. So far, most of their formulations were based upon Newby’s argument that segregation involves spatial separation of population groups (Newby, 1982), and spatial separation reduces interaction among groups over space. Therefore, different population groups locating next to each other, even if they are in different areal units, should have a relatively low level of segregation. Most spatial measures model segregation by allowing population groups in neighboring units to interact. The D(adj) index introduced by Morrill (1991) is the original dissimilarity index less the amount of potential interaction between different groups across areal unit boundaries. The level of potential interaction between any pair of neighboring units is then determined by the differences in the racial mixes of neighboring units. Formally,
\[ D(adj) = D - \frac{\sum_i \sum_j \left| c_{ij} (z_i - z_j) \right|}{\sum_i \sum_j c_{ij}} \tag{2} \]
where D is defined as before, $z_i$ and $z_j$ are the proportions of minority (or majority) population in areal units i and j, and $c_{ij}$ is one if i and j are neighbors and zero otherwise. The D(adj) measure was further modified by Wong (1993) in several directions. Based upon the premise that the intensity of interactions across a boundary is not a simple function of adjacency, but likely of the length of the shared boundary, the D(adj) index is slightly rewritten to incorporate a boundary-length component to moderate the interactions across areal units. This index, labeled D(w), is defined as
\[ D(w) = D - \frac{1}{2} \sum_i \sum_j w_{ij} \left| z_i - z_j \right| \tag{3} \]
where all terms are defined as before, and
\[ w_{ij} = \frac{d_{ij}}{\sum_j d_{ij}} \tag{4} \]
In Eq. (4), $d_{ij}$ is the length of the shared boundary between areal units i and j, and the denominator is basically the total length of the boundary for areal unit i. Furthermore, Wong (1993) argued that the intensity of interactions between areal units
is also a function of the size and shape, or the compactness, of the two adjacent areal units. To incorporate the geometric characteristics of areal units into the evaluation of segregation, a compactness measure based upon the perimeter-area ratio was used. Eq. (3) is further modified to formulate the index D(s), which is
\[ D(s) = D - \frac{1}{2} \sum_i \sum_j w_{ij} \left| z_i - z_j \right| \, \frac{\frac{1}{2} \left[ (P_i/A_i) + (P_j/A_j) \right]}{\mathrm{MAX}(P/A)} \tag{5} \]
where $P_i/A_i$ is the perimeter-area ratio for areal unit i, and MAX(P/A) is the maximum perimeter-area ratio among all the areal units in the study region.
2.2. Multi-group measures
All the measures described above are for comparing two population groups, but today’s societies are more likely to be multiethnic. For multi-group comparisons, the two-group measures can still be used by comparing all possible pairs of population groups (Morrill, 1995). However, the results are not comprehensive and multiple groups cannot be treated simultaneously. Based upon the concept of the dissimilarity index D, Morgan (1975) and later Sakoda (1981) introduced a multi-group version of D. Specifically, it is defined as
\[ D(m) = \frac{1}{2} \, \frac{\sum_i \sum_j \left| N_{ij} - E_{ij} \right|}{\sum_j N P_{.j} \left( 1 - P_{.j} \right)} \tag{6} \]
where
\[ E_{ij} = \frac{N_i N_j}{N} \tag{7} \]
In Eqs. (6) and (7), $N_{ij}$ is the population count of the jth population group in areal unit i, $N_i$ is the total population regardless of groups in areal unit i, $N_j$ is the total population of group j in the entire study region, N is the total population in the entire region, and $P_{.j}$ is the proportion of population in group j. The interpretation of D(m) is the same as D. This multi-group measure of segregation can accommodate more than two groups, but shares the same limitations with the dissimilarity index D. That is, the measure is aspatial and rearranging populations among areal units will not change the overall level of segregation. A spatial version SD(m) was proposed based upon the concept of composite population counts (Wong, 1998). The composite population count of areal unit i for group j is defined as
\[ CN_{ij} = \sum_k d(N_{kj}) \tag{8} \]
where d(.) is a function defining the neighborhood of i. The premise is that within this neighborhood of i, people belonging to different ethnic groups can interact as if they were in unit i. The subscript k refers to areal units within the study region, and k should include i. After the composite population counts for all areal units are computed, they are used to calculate D(m) as if they were the original population counts. Therefore, D(m) and SD(m) have the same mathematical properties. Using the composite population counts in SD(m) accommodates the interactions of population groups within the neighborhood, which are not accounted for in the conceptualization of D(m). All measures mentioned so far rely on the concept of dissimilarity among population groups. Another spatial segregation measure for multiple groups is based upon explicit spatial dissimilarity, or the concept of spatial congruence. If different groups do not have similar distribution patterns, they are likely separated in a spatial sense. To capture the overall spatial distribution characteristics (overall location, dispersion and orientation) of each population group, a standard deviational ellipse can be used to fit the locations of people in that group. After multiple ellipses are derived for different groups, they are then compared and combined to derive an index of segregation based upon the ratio of the intersection and union of all ellipses (Wong, 1999). Specifically, the index is defined as
\[ S = 1 - \frac{E_1 \cap E_2 \cap E_3 \cap \ldots \cap E_n}{E_1 \cup E_2 \cup E_3 \cup \ldots \cup E_n} \tag{9} \]
where $E_i$ is the deviational ellipse describing the distribution of population group i. In general, a clustered distribution will yield a smaller ellipse and a more dispersed distribution will generate a larger ellipse, with the rotation of the ellipse indicating the orientation of the distribution. However, one should recognize that all spatial measures discussed above, including the ellipse-based measure, are summary measures of the entire study region.
Different levels of segregation at the local scale may not be distinguishable by the summary index values representing the overall regional situations. Besides the geographical variability of segregation, there are other specific spatial issues related to the use of segregation measures, such as the modifiable areal unit problem (Wong, 1998). However, they are beyond the scope of this paper. There are also many broad issues related to the evaluation of segregation level, including the long-term debate about the effectiveness of different measures in capturing different dimensions or aspects of segregation. For instance, using measurements related to the concentration dimension, Poulsen and Johnston (2000) demonstrated that ghetto-like environments were not characteristic of areas with a high minority concentration in Australia. To date, Massey and Denton’s review (1988) is still the most comprehensive survey on segregation measurement issues, and the evenness dimension is believed to be the most important. There are additional issues on the use of spatial segregation indices, such as their statistical properties, even though all measures discussed so far range from zero to one. Their effectiveness as compared to aspatial segregation measures in real world applications
is definitely an interesting research question that has yet to be answered. Some of the issues can be investigated with analytical tools and methods; however, empirical studies are needed to explore the usefulness of these spatial indices. Unfortunately, computing any of these spatial measures is not an easy task. Computation tools are needed to facilitate the use of these indices in empirical studies. The rest of this paper discusses the development of these tools.
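To make the contrast between the aspatial and the spatial two-group measures concrete, the following minimal sketch computes D (Eq. 1) and D(adj) (Eq. 2) for two hypothetical four-unit landscapes. It is an illustration only and not the ArcView implementation discussed later: the function names, the neighbor-list representation of $c_{ij}$, and the population counts are assumptions made for this example, $z_i$ is taken as the black proportion of unit i, and every unit is assumed to have a non-zero population.

```python
def dissimilarity(w, b):
    """Aspatial dissimilarity index D (Eq. 1) from per-unit counts."""
    total_w, total_b = sum(w), sum(b)
    return 0.5 * sum(abs(wi / total_w - bi / total_b) for wi, bi in zip(w, b))


def d_adj(w, b, neighbors):
    """D(adj) (Eq. 2): D minus the average contrast across shared boundaries.

    neighbors[i] is the set of units adjacent to unit i, i.e. c_ij = 1.
    Assumes every unit is populated, so the proportions z_i are defined.
    """
    z = [bi / (wi + bi) for wi, bi in zip(w, b)]      # black proportion per unit
    contrast = sum(abs(z[i] - z[j]) for i, ns in enumerate(neighbors) for j in ns)
    n_links = sum(len(ns) for ns in neighbors)        # sum of all c_ij
    return dissimilarity(w, b) - contrast / n_links


# Hypothetical landscape: four areal units arranged in a row.
neighbors = [{1}, {0, 2}, {1, 3}, {2}]
clustered = ([100, 100, 0, 0], [0, 0, 100, 100])      # (white, black) counts
alternating = ([100, 0, 100, 0], [0, 100, 0, 100])

for name, (w, b) in (("clustered", clustered), ("alternating", alternating)):
    print(name, round(dissimilarity(w, b), 3), round(d_adj(w, b, neighbors), 3))
```

In both landscapes every unit is occupied by a single group, so D equals 1 in each case. D(adj) is about 0.67 for the clustered arrangement and 0 for the alternating one, because in the latter every shared boundary separates units of different composition.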
3. Spatial information and GIS Traditional measures of segregation mostly require attribute or population data only for their computation procedures (Kaplan & Holloway, 1998). No cartographic or feature data describing the geographical and geometrical characteristics of the features are needed. Therefore, their calculations are rather straightforward with today’s spreadsheet or database programs and no GIS are required. However, the advantages of using GIS in segregation studies are well documented (Wong, 1996). A significant strength of GIS highlighted by many GIS proponents is their spatial analytical capability. GIS can assist any analysis in three broad categories: analyze attribute data without incorporating any geographic characteristic of the features, analyze only the geographic information without referencing to the attribute data, and analyze the spatial features by combining the geographical information and the attribute data (Goodchild, 1987; Laurini & Thompson, 1992). Computing spatial segregation measures belongs to the last category of analysis in GIS because both the population data and their spatial characteristics are needed in the processes. All reviewed spatial measures that were proposed to improve traditional segregation measures share a common characteristic: a certain type of spatial information is required in the computation. However, to enable this type of analysis, spatial information has to be extracted from spatial data in order to be combined with population data. GIS are critical to this type of analysis because they can provide pertinent spatial information required for the computation processes. In addition, computations of certain spatial measures such as the ellipse-based S index rely on the manipulation of geographical features. Most spatial data sets required for population analyses are disseminated in GIS formats. 
To handle and manipulate these data sets and to facilitate spatial segregation analysis, GIS are logical and necessary tools. Table 1 lists various types of spatial information that are required by different spatial segregation measures reviewed so far. Spatial information involved can be classified into three categories: spatial relations, geometric characteristics, and location (Wong, 2000). Because several spatial measures (all spatial variations of D and its spatial multi-group version) rely on the premise that interactions across neighboring units are important in measuring segregation, the adjacency or neighborhood information is used in these spatial measures. In the spatial versions of the two-group measures, adjacency is used to compare the differences in racial mixes of neighboring units. The spatial version of the multi-group index uses adjacency information to identify the neighborhood from which the composite population counts are derived. However, adjacency is a limiting case in defining a neighborhood. A more flexible way to define a neighborhood is based upon the distance measure. This flexibility is accommodated in the formulation of SD(m) in Eq. (8), and therefore distances between areal units may be required to determine neighborhoods. To compute D(w) and D(s), geometric characteristics of the areal units are needed in addition to the adjacency or neighborhood information (Table 1). Because D(w) takes into consideration how the length of a shared boundary can affect cross-unit interactions, the length of the common boundary of each pair of neighboring areal units is required. For D(s), an additional consideration is the compactness of areal units. The compactness measure adopted is based upon the perimeter and area of the unit. Therefore, these two geometric characteristics of the areal units have to be derived or computed from the feature database through GIS. Finally, to fit an ellipse to the spatial distribution of an ethnic group, the locations of people in terms of x–y coordinates are explicitly utilized in the computation process. The information of where people are located has to be obtained from the feature database through GIS. Depending on the data structure in which the population distribution information is captured, the x–y coordinates can be the actual locations of individuals, or the centroids of areal units if the population data are aggregated to areal units as polygons in the vector data format.

Table 1
Types of spatial information required by spatial segregation indices

Type of spatial information   Specific metric       D(adj)   D(w)   D(s)   SD(m)   S
Spatial relations             Adjacency             X        X      X      X
                              Distance                                     X
Geometric characteristics     Area, perimeter                       X
                              Length of boundary             X      X
Locational                    Location                                             X
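The way neighborhood information enters the multi-group measures can be sketched in a few lines. The example below computes D(m) from Eqs. (6) and (7) and then SD(m) by first forming the composite population counts of Eq. (8). It is a hedged illustration: the function names and data are hypothetical, and the neighborhood function d(.) is assumed here to be a simple distance threshold between unit centroids, which is only one of several possible definitions.

```python
from math import hypot


def multigroup_d(counts):
    """D(m) of Eqs. (6)-(7); counts[i][j] is the count of group j in unit i."""
    unit_totals = [sum(row) for row in counts]            # N_i
    group_totals = [sum(col) for col in zip(*counts)]     # N_j
    n = sum(unit_totals)                                  # N
    numerator = sum(
        abs(counts[i][j] - unit_totals[i] * group_totals[j] / n)
        for i in range(len(counts))
        for j in range(len(group_totals))
    )
    # N * P_j * (1 - P_j) equals N_j * (1 - N_j / N)
    denominator = 2 * sum(nj * (1 - nj / n) for nj in group_totals)
    return numerator / denominator


def composite_counts(counts, centroids, radius):
    """CN_ij of Eq. (8), assuming d(.) selects units whose centroids lie
    within `radius` of the centroid of unit i (unit i itself included)."""
    composite = []
    for xi, yi in centroids:
        near = [k for k, (xk, yk) in enumerate(centroids)
                if hypot(xk - xi, yk - yi) <= radius]
        composite.append([sum(counts[k][j] for k in near)
                          for j in range(len(counts[0]))])
    return composite


def spatial_multigroup_d(counts, centroids, radius):
    """SD(m): D(m) applied to the composite population counts."""
    return multigroup_d(composite_counts(counts, centroids, radius))


# Hypothetical data: three units in a row, each occupied by a single group.
counts = [[90, 0, 0], [0, 90, 0], [0, 0, 90]]
centroids = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(multigroup_d(counts), spatial_multigroup_d(counts, centroids, radius=1.0))
```

With these inputs D(m) equals 1, while SD(m) with a radius of one distance unit drops to 0.3125 because each unit now shares population with its immediate neighbors. Setting the radius to zero reproduces D(m), since each neighborhood then contains only the unit itself.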
4. Approaches implementing spatial segregation measures
4.1. A loose-coupling approach
Unfortunately, the current development of GIS is not mature enough for novice users to implement these spatial measures easily within existing GIS environments. Depending on the spatial data structure, different types of spatial information described in Table 1 may be captured by the feature data, but are not readily available to the users for analysis. Most GIS do not have existing or built-in functions to extract various types of spatial information into formats that allow users to decide how the information can be combined with attribute data for
analysis. Extensive programming efforts are required to extract or compute various spatial information. The results can then be stored in formats that are easily accessible by the users. Therefore, one approach to implement spatial segregation measures in GIS is to compute and store various types of spatial information explicitly, and users are empowered to define how spatial information and attributes can be combined or analyzed separately. Wong and Chong (1998) adopted this two-step approach to implement the set of two-group spatial segregation measures, D(adj), D(w) and D(s), but not the multi-group spatial measures. This approach is generally regarded as the loose-coupling approach because two types of packages (a GIS package and another package for handling the mathematical or matrix operations) are loosely coupled together to accomplish the entire procedure. The first part of the two-step approach involves the extraction of geographical information from feature data. Borrowing ideas from Ding and Fotheringham (1992), spatial adjacency relationships were extracted from an internal file in the GIS package (the arc attribute table of the ARC/INFO coverage) and were explicitly recorded in an adjacency matrix. Geometric attributes (area and perimeter) of enumeration units were extracted from the GIS database (the polygon attribute table in the ARC/INFO coverage) and were written out into a separate file. Population attributes of enumeration units were also copied to a separate file. In the second step of the process, users can specify how different types of spatial information and attributes can be combined together. This sub-process is enabled by the use of a powerful statistical and mathematical package (S-plus). Users can manipulate various matrices derived from files storing spatial information and attribute data. By performing various matrix algebraic operations, spatial segregation measures were computed.
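The first step of this loose-coupling procedure can be pictured with a small sketch that turns arc records into an adjacency matrix and a shared-boundary-length matrix. The record layout used here, a tuple of (left polygon, right polygon, arc length) with None marking the outside of the study region, is a simplified assumption made for illustration; it is not the actual file layout of an arc attribute table, and the function name is hypothetical.

```python
def matrices_from_arcs(arcs, n_units):
    """Build an adjacency matrix c and a boundary-length matrix d.

    arcs: iterable of (left, right, length) records, one per arc, where
    left and right are polygon indices or None for the outside region.
    """
    c = [[0] * n_units for _ in range(n_units)]
    d = [[0.0] * n_units for _ in range(n_units)]
    for left, right, length in arcs:
        if left is None or right is None or left == right:
            continue                      # outer boundary or dangling arc
        c[left][right] = c[right][left] = 1
        d[left][right] += length          # several arcs may separate one pair
        d[right][left] += length
    return c, d


# Hypothetical arcs for three polygons: 0 borders 1, and 1 borders 2 twice.
arcs = [(0, 1, 2.0), (1, 2, 1.5), (0, None, 4.0), (1, 2, 0.5)]
c, d = matrices_from_arcs(arcs, n_units=3)
print(c)
print(d)
```

Once c and d are available as matrices, indices such as D(adj) and D(w) reduce to element-wise operations and sums, which is the kind of manipulation the second step delegates to a matrix-oriented package.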
This two-step loose-coupling approach provides a flexible setting for spatial analysis and modeling in the sense that users have great latitude to specify the relationship between spatial information and attribute data. Users can explore different formulations by specifying different algebraic operations. Nevertheless, this loose-coupling approach has several major shortcomings. From the user’s perspective, this approach is rather costly in a financial sense because users have to acquire two different packages (a GIS package and a statistical/mathematical modeling package). Users also have to learn two packages. In addition, users have to be well versed in matrix algebra in order to manipulate those matrices correctly and successfully. The general public probably will not have access to this type of computation environment. Another potential problem with this approach is that when the study area involves a large number of areal units, the matrices recording the spatial information and attribute data may be too large to be handled efficiently. The loose-coupling approach can also be regarded as a database management approach because it relies on processes extracting data from spatial databases and manipulating the extracted information. No GIS operation is required and practically GIS need not be used. The approach also relies on the older topological spatial data format in which adjacency information is explicitly recorded. Geometric measures of features are readily available in most cases because they are recorded as feature attributes. On the other hand, with the advances in GIS technology, the
loose-coupling method may not be efficient and effective anymore. A new approach based upon current GIS technology to implement spatial segregation measures is warranted.
4.2. An integrated approach
Most newer GIS share several characteristics that are important to the implementation of spatial segregation measures. These characteristics include the object-oriented environment, efficient spatial query capability through advanced spatial indexing systems, friendly and flexible user interfaces, and powerful programming tools. In addition, the improvement in hardware performance due to the advancement in computer engineering allows some of the intensive computation processes to be executed efficiently. All these enable a more integrated approach to implement spatial segregation indices. All spatial segregation indices mentioned above, except one, modify the aspatial index of dissimilarity, which can be computed without any GIS-specific operation. Therefore, before executing any spatial operation in GIS, basic data manipulation functions and procedures are used to compute the D index first. The remaining steps have to rely on spatial operations in GIS. Table 1 indicates that adjacency is a common type of spatial relation among most spatial segregation measures. Adjacency can be regarded as a limiting case of spatial relation based upon distance. If two areal units are adjacent to each other, the distance between their nearest parts is zero. Therefore, both the adjacency relation and distance can be enumerated with the same type of spatial query and the definition of neighbors is no longer limited to adjacency. Previous efforts in depicting spatial relationships have been very much dominated by the work in spatial statistics and spatial econometrics (Anselin, 1988; Griffith, 1988).
Spatial weights matrices, which record the spatial relationship of geographical features, are created first, and then the computation of statistics using the spatial weights matrices and attributes follows. But with powerful spatial query functions in GIS and advanced spatial indexing systems, spatial queries can be executed very efficiently even with large spatial databases. Therefore, algorithms designed to compute spatial segregation indices start with the extraction of spatial relations through spatial queries. Neighbors of a given areal unit can be identified by selecting areal units touching the unit. If the neighborhood is defined with a distance function, the same spatial selection process can be performed with the distance parameter. After the spatial relation (either adjacency or neighborhood as defined by a given distance) is enumerated, population data of the given areal unit and its neighboring units are also extracted and are immediately combined with spatial relation information to compute certain elements of the spatial segregation indices. These procedures offer significant improvements in efficiency over the previous approach. They are not dependent upon a specific data format (topological) as the previous approach is, and even the simplest and more compact "spaghetti" data format can work well with the current approach. Even though the adjacency information may not be recorded explicitly in the spatial databases, the information can
be derived through spatial queries ‘‘on the fly’’. The search for neighbors is an efficient spatial search process with the help of efficient spatial indexing systems instead of a laborious database search for left and right polygons. Besides the spatial relation information, certain spatial segregation indices [D(w) and D(s)] require geometric parameters of areal units in the computation. In an object-oriented GIS environment, some of those geometric characteristics, such as area and perimeter, are stored as attributes of the geographical features. If not, they can be easily computed from the feature data. When neighborhood units of a given area are selected, their geometric parameters are also obtained, and can then be used in the computation immediately, such as the calculation of the compactness index. However, depending on the specific spatial data model adopted, the length of the shared boundary between any pair of areal units may not be stored as a feature attribute as found in the arc-node topological data model (Burrough & McDonnell, 1998). The boundary-length parameter may have to be derived using built-in GIS functions. For data stored in non-topological formats, a general method to compute the length of the shared boundary is to identify the intersection of the neighboring units, and then compute the length of the intersected segment. After the geometric parameters are derived or computed, they are combined with population data and spatial relation information to calculate the values for spatial segregation measures. Fig. 1 summarizes the major steps of the general algorithmic design for calculating spatial segregation measures. For the ellipse-based measure, location of people is the only spatial information required. 
Therefore, after the location information (in terms of x–y coordinates) of all enumeration units is extracted as a feature attribute, the coordinate information is combined with population count data to derive parameters specifying the ellipse. Using these parameters, an ellipse object is constructed. The same process is repeated for each population group. After all ellipses are derived, they are converted into polygon features such that polygon overlay operations are performed on them to obtain the intersection and union of all ellipses (Chrisman, 1997). The result, which can be visually represented by the polygons of intersection and union, can be displayed to the analyst for evaluation. The geometric characteristics of the ellipse object, which is among several different geometric objects supported by many object-oriented GIS environments, can be specified by a few parameters. The ellipse object is tremendously useful and important in computing the S index. Without the object-oriented environment, it will be very tedious, though still feasible, to create ellipse-shape polygons to derive the S index.
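The ellipse-based procedure can be approximated without a GIS in order to see how the pieces fit together. The sketch below fits one ellipse per group from unit centroids weighted by population counts, and then estimates the ratio in Eq. (9) by counting cells of a regular grid. Two substitutions should be noted as assumptions of this illustration: the ellipse parameters are derived from the weighted covariance of the coordinates (one common way of characterizing a standard deviational ellipse, with semi-axes of one standard deviation), and the grid count stands in for the polygon overlay operations described above. All function names and data are hypothetical.

```python
from math import atan2, cos, sin, sqrt


def deviational_ellipse(centroids, counts):
    """Return (cx, cy, a, b, theta): center, semi-axes a >= b, rotation."""
    total = sum(counts)
    cx = sum(n * x for (x, _), n in zip(centroids, counts)) / total
    cy = sum(n * y for (_, y), n in zip(centroids, counts)) / total
    sxx = sum(n * (x - cx) ** 2 for (x, _), n in zip(centroids, counts)) / total
    syy = sum(n * (y - cy) ** 2 for (_, y), n in zip(centroids, counts)) / total
    sxy = sum(n * (x - cx) * (y - cy)
              for (x, y), n in zip(centroids, counts)) / total
    theta = 0.5 * atan2(2 * sxy, sxx - syy)     # major-axis angle from x-axis
    mid = (sxx + syy) / 2
    half = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return cx, cy, sqrt(mid + half), sqrt(max(mid - half, 0.0)), theta


def inside(ellipse, x, y):
    """True if (x, y) lies within the ellipse; degenerate ellipses are empty."""
    cx, cy, a, b, theta = ellipse
    if a == 0 or b == 0:
        return False
    dx, dy = x - cx, y - cy
    u = dx * cos(theta) + dy * sin(theta)       # coordinate along major axis
    v = -dx * sin(theta) + dy * cos(theta)      # coordinate along minor axis
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def ellipse_s_index(ellipses, steps=200):
    """Grid estimate of S = 1 - area(intersection) / area(union), Eq. (9)."""
    reach = max(e[2] for e in ellipses)         # largest major semi-axis
    x0 = min(e[0] for e in ellipses) - reach
    x1 = max(e[0] for e in ellipses) + reach
    y0 = min(e[1] for e in ellipses) - reach
    y1 = max(e[1] for e in ellipses) + reach
    inter = union = 0
    for r in range(steps):
        y = y0 + (r + 0.5) * (y1 - y0) / steps
        for c in range(steps):
            x = x0 + (c + 0.5) * (x1 - x0) / steps
            hits = [inside(e, x, y) for e in ellipses]
            inter += all(hits)
            union += any(hits)
    if union == 0:
        raise ValueError("no grid cell falls inside any ellipse")
    return 1.0 - inter / union


# Hypothetical centroids of eight units and counts for two groups.
centroids = [(0, 0), (2, 0), (0, 1), (2, 1), (10, 0), (12, 0), (10, 1), (12, 1)]
group_a = [10, 10, 10, 10, 0, 0, 0, 0]
group_b = [0, 0, 0, 0, 10, 10, 10, 10]
ellipses = [deviational_ellipse(centroids, g) for g in (group_a, group_b)]
print(ellipse_s_index(ellipses))
```

In this example the two groups occupy disjoint sets of units far apart, so the ellipses do not overlap and S equals 1; two groups with identical distributions would give identical ellipses and S equal to 0. The grid resolution (`steps`) controls the accuracy of the area estimate for partially overlapping ellipses.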
5. An example of implementation using ArcView
The general algorithm for the spatial variations of D described in Fig. 1 is implemented in several slightly different manners in ArcView GIS with adjustments specific to different spatial segregation indices. ArcView is chosen as an illustrative example for several reasons. It is one of the most popular desktop commercial GIS in an object-oriented environment. Its spatial indexing system is relatively
D.W.S. Wong / Comput., Environ. and Urban Systems 27 (2003) 53–70
Fig. 1. Algorithms for implementing spatial segregation indices.
sophisticated to enable efficient spatial querying even if the spatial database is relatively large. Its programming environment with Avenue scripts is reasonably powerful with a user-friendly and flexible graphic user interface. In this example, Avenue scripts were written to implement four specific algorithms: the aspatial segregation measures D and D(m), the spatial variations of D for two groups, the spatial version of the multi-group D(m), and the ellipse-based S index. Aspatial measures [D and D(m)] are included for comparison purposes. When D and D(m) are computed, no GIS-specific functions are required. The operations involved are algebraic and database-related functions. When more than two population groups are selected, D(m) will be calculated instead of D. Because D(s) is built upon D(w), and D(w) is built upon D(adj), it is logical to implement all these two-group spatial measures in one program in which the values are computed sequentially. A neighborhood of a given areal unit is identified by selecting areal units within a distance of zero units. The length of the shared boundary is obtained in two steps. First, a line feature is created by intersecting the two neighboring
polygons. Then an Avenue request is sent to the newly created line feature to obtain the length. Area and perimeter values are obtained by sending appropriate Avenue requests to the polygon features. The multi-group spatial index [SD(m)] was implemented separately, partly because it deals with more than two population groups and partly because the composite population counts have to be derived. The Avenue script for SD(m) identifies neighboring units with the same method as in the two-group spatial indices. The ellipse-based measure, as indicated in Fig. 1, has a different algorithm. Appropriate Avenue requests are sent to the polygon objects to obtain their x–y coordinates. After the ellipse parameters (angle of rotation, deviation along the x-axis and deviation along the y-axis) are derived, they are used to create an ellipse object in ArcView. However, to facilitate later steps in the computation, all ellipses are converted into polygon features. Then overlay operations are applied to them to obtain the union and intersection of the ellipses. The algorithm also allows users to save all ellipses, their intersection and union as new polygon features in shapefiles, the native non-topological file format for ArcView GIS. Then the polygons showing the ellipses, intersection and union can be brought back into ArcView for visual examination. These four slightly different algorithms were implemented in four Avenue scripts and each of them is attached to a menu item in the modified graphic user interface of ArcView. Fig. 2 shows the user interface with the new menu (‘‘Segregation’’) and the four menu items corresponding to the four major computation procedures. These new features are stored in an ArcView project file that can be downloaded freely from the author’s website (http://geog.gmu.edu/seg). In order to utilize these add-on functions in ArcView, all users need to do is to add data themes into the view document and to select the desired indices for analysis.
Users interested in altering the original scripts to adjust for specific situations can request the scripts from the author directly.
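The two-step derivation of the shared boundary length (intersect two neighboring polygons, then measure the resulting line) can be sketched outside ArcView as follows. This is an illustrative stand-in for the Avenue requests, not the author's script. It assumes polygons are given as simple lists of (x, y) vertices with straight edges, and all function names are hypothetical.

```python
import math

def _edges(polygon):
    """Yield the boundary segments of a polygon given as a vertex list."""
    n = len(polygon)
    for k in range(n):
        yield polygon[k], polygon[(k + 1) % n]

def _overlap_length(seg_a, seg_b, tol=1e-9):
    """Length of the overlap of two segments if collinear, otherwise 0."""
    (ax, ay), (bx, by) = seg_a
    (cx, cy), (dx, dy) = seg_b
    ux, uy = bx - ax, by - ay
    length = math.hypot(ux, uy)
    if length < tol:
        return 0.0
    # Both end points of seg_b must lie on the line through seg_a.
    for px, py in ((cx, cy), (dx, dy)):
        if abs(ux * (py - ay) - uy * (px - ax)) > tol * length:
            return 0.0
    # Project seg_b onto seg_a and intersect the two parameter intervals.
    t1 = (ux * (cx - ax) + uy * (cy - ay)) / length
    t2 = (ux * (dx - ax) + uy * (dy - ay)) / length
    lo = max(0.0, min(t1, t2))
    hi = min(length, max(t1, t2))
    return max(0.0, hi - lo)

def shared_boundary_length(poly_a, poly_b):
    """Total length of boundary that two non-overlapping polygons share."""
    return sum(_overlap_length(ea, eb)
               for ea in _edges(poly_a)
               for eb in _edges(poly_b))

# Hypothetical areal units: two unit squares side by side and one square
# that touches the first only at a corner.
unit_1 = [(0, 0), (1, 0), (1, 1), (0, 1)]
unit_2 = [(1, 0), (2, 0), (2, 1), (1, 1)]
unit_3 = [(1, 1), (2, 1), (2, 2), (1, 2)]
print(shared_boundary_length(unit_1, unit_2))
print(shared_boundary_length(unit_1, unit_3))
```

Note that a unit touching another only at a corner lies within zero distance of it but shares no boundary length, so the zero-distance neighbor selection and the boundary-length computation can give different pictures of the same pair.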
6. Simulated landscapes and an empirical example
To demonstrate how these tools can be used, a set of five simulated spatial configurations is shown in Fig. 3. These five configurations are arranged into two groups. The first group (a–c) is for the two-group situations. The other two (d–e) are for multi-group situations. Configuration (a) is a checkerboard-like system, while in configuration (b), people in one group are all concentrated in one quadrant. Configuration (c) is similar to (b), but with only four large areal units instead of 100. The D index and its spatial versions are derived for the three two-group configurations and the results are reported in Fig. 3. Based upon previous research, it is not surprising to see that all three configurations have D = 1, indicating perfectly segregated situations. D(w) and D(s) have identical results for configurations (a) and (b) because the P/A ratios for all areal units are the same and are equal to the MAX(P/A). But for configuration (c), the D(s′) is calculated using the MAX(P/A) ratio for configurations (a) and (b). Configuration (c) appears to have a higher level of
Fig. 2. ArcView GUI with the new segregation tools and results from Washington, DC.
segregation according to D(s′) than that indicated by D(adj) and D(w). This is because D(s′) takes into account the fact that the large block of minority at the corner has a lower chance of interacting with others as compared with other configurations with smaller blocks which provide a higher potential for spatial interaction. Configurations (d) and (e) are for multi-group situations. Four groups (1, 2, 3 and 4) are involved. Configuration (d) has a more dispersed pattern than configuration (e), and thus the segregation level of (d) based upon both multi-group measures is lower than that of (e). Fig. 3 also includes the intersection and union polygons of the ellipses. Apparently, the intersecting polygon for configuration (d) is relatively big as compared to the union polygon. In contrast, the intersecting polygon for configuration (e) is rather small as compared to the union polygon, resulting in a higher level of segregation. Using census tract data of Washington, DC from the 1990 census as an example, the results for aspatial and spatial measures are reported in Table 2. In the two-group illustration, white and black are used. The D value is 0.7670 while all spatial measures are lower because they account for the potential interaction of the two groups across census tract boundaries. In the multi-group illustration, white, black and Asian are used. The aspatial D(m) yields a value of 0.7616 and the segregation level reflected by the spatial version of D(m) is 0.6508. The ellipses of the three population groups are also shown in Fig. 2, with the center of whites in the west and center for blacks in the east. Asians are in between the two other groups, but the
Fig. 3. Simulated configurations for two-group and multi-group spatial measures.
intersection of the three ellipses is rather small, resulting in a high level of segregation. Please note that for the ellipse-based measure, the segregation value, though it is bounded between zero and one, is meaningful only when it is compared with the result from another study region (Wong, 1999).
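The contrast between the aspatial D and an adjacency-adjusted version on simulated landscapes can be reproduced in spirit with a short sketch. It assumes the standard dissimilarity index of Duncan and Duncan (1955), D = 0.5 Σ |a_i/A − b_i/B|, and Morrill's (1991) adjustment, D(adj) = D − Σ_i Σ_j |c_ij (z_i − z_j)| / Σ_i Σ_j c_ij, where z_i is the proportion of the first group in unit i and c_ij = 1 for neighbors. The 10 × 10 grids, the population of 10 per cell and the rook (edge-sharing) neighbor definition are illustrative assumptions, so the values need not match those reported in Fig. 3.

```python
def dissimilarity(a, b):
    """Aspatial two-group dissimilarity index D."""
    total_a, total_b = float(sum(a)), float(sum(b))
    return 0.5 * sum(abs(x / total_a - y / total_b) for x, y in zip(a, b))

def adjacency_adjusted(a, b, neighbors):
    """D minus the mean absolute difference in group share among neighbors."""
    z = [x / float(x + y) if (x + y) > 0 else 0.0 for x, y in zip(a, b)]
    pairs = [(i, j) for i, js in neighbors.items() for j in js]
    contrast = sum(abs(z[i] - z[j]) for i, j in pairs)
    return dissimilarity(a, b) - contrast / len(pairs)

def rook_neighbors(n):
    """Edge-sharing neighbors on an n x n grid (assumed neighbor rule)."""
    nb = {}
    for r in range(n):
        for c in range(n):
            cells = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nb[r * n + c] = [rr * n + cc for rr, cc in cells
                             if 0 <= rr < n and 0 <= cc < n]
    return nb

n = 10
nb = rook_neighbors(n)

# Checkerboard-like pattern: each cell holds 10 people of a single group.
checker_a = [10 if (r + c) % 2 == 0 else 0
             for r in range(n) for c in range(n)]
checker_b = [10 - v for v in checker_a]

# One group concentrated entirely in one quadrant.
quad_a = [10 if (r < n // 2 and c < n // 2) else 0
          for r in range(n) for c in range(n)]
quad_b = [10 - v for v in quad_a]

for name, a, b in (("checkerboard", checker_a, checker_b),
                   ("quadrant", quad_a, quad_b)):
    print(name, dissimilarity(a, b), adjacency_adjusted(a, b, nb))
```

Under these assumptions both patterns give D = 1, while the adjusted index separates them: every rook neighbor in the checkerboard belongs to the other group, whereas in the quadrant pattern only the units along the quadrant edge have neighbors of the other group.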
7. Summary and discussions
Studies of segregation have relied primarily on measures developed by sociologists and demographers. These measures are very useful, but are not sufficient to describe
Table 2
Values of segregation values for Washington, DC

Two-group (White/Black)
  D        D(adj)   D(w)     D(s)
  0.7670   0.6370   0.6506   0.7064

Multi-group (White/Black/Asian)
  D(m)     SD(m)    S
  0.7616   0.6508   0.9592
the spatial distributions among population groups. Spatial measures were introduced but were not widely used, mainly because there were no tools available to compute these relatively complicated measures. This paper provided a concise overview of several spatial segregation measures which can complement traditional aspatial measures of segregation. The paper identified the categories of spatial information that can be derived from GIS and are required to compute spatial segregation measures. An older approach to implement a subset of spatial segregation measures was reviewed here, but a more integrated approach is adopted to implement all measures. Algorithms were designed based upon the types of spatial information required in the computation, and the general algorithm was modified for different spatial segregation measures. Using ArcView and its Avenue scripting language as an example, the set of spatial segregation indices was implemented as additional tools in the ArcView user interface. By implementing these spatial measures using one of the most popular GIS packages, it is hoped that researchers and analysts who are interested in using these spatial measures have the necessary tools for the computation. The previous effort, which combined the capabilities of ARC/INFO and S-plus to implement the two-group indices only, put a great deal of burden on the technical skill of the users/analysts and relied on older GIS technology. The current effort took advantage of the object-oriented GIS environment and its advanced spatial indexing system to facilitate efficient spatial queries for even large spatial databases. The definition of neighbors is no longer limited to adjacent units, but is distance-based. The new tools developed work seamlessly in ArcView. When these tools for computing spatial measures become more accessible, it is hoped that new studies using these measures can shed additional light on the spatial dimension of segregation.
Many issues related to spatial segregation measures are yet to be explored and these new spatial tools should be valuable to such endeavors. Most spatial analytical and modeling procedures combine attribute data with spatial information of geographical features captured by feature data in the analysis. The general algorithm proposed in this paper should be applicable to implementing various spatial analytical and modeling tools in GIS, including various spatial autocorrelation statistics and spatial interaction models. It is likely that the algorithm has to be adjusted for specific situations, as was done for the different spatial segregation measures covered in this paper. This paper also demonstrated an obvious advantage of using GIS to compute spatial measures when the ellipse-based S index is used. Parameters for an ellipse can possibly be derived without GIS. However, GIS are definitely the ideal tools to
derive the intersection and union of the ellipses. In addition, the results can be displayed geographically to offer insights and enhance further analysis. All spatial segregation measures discussed in this paper can be regarded as global measures because they are summary indicators for the entire region. GIS are not very useful for representing the results of these global measures (except the ellipse-based S index), and the local variations in segregation level cannot be revealed. But for local segregation measures (Wong, 2002a), the implementation approach suggested in this paper is definitely applicable and the results can be displayed visually in maps with GIS in a very effective manner.
Acknowledgements
This project is partially supported by the National Institute of Health/Child Health and Human Development (NICHD) under the National Institute of Health (NIH) grant number 1 R03 HD38292-01.
References
Anselin, L. A. (1988). Spatial econometrics: methods and models. Dordrecht: Kluwer Academic Publishers.
Burrough, P. A., & McDonnell, R. A. (1998). Principles of geographical information systems. Oxford: Oxford University Press.
Chrisman, N. (1997). Exploring geographic information systems. New York: John Wiley.
Ding, Y., & Fotheringham, A. S. (1992). The integration of spatial analysis and GIS. Computers, Environment and Urban Systems, 16, 3–19.
Duncan, O. D., & Duncan, B. (1955). A methodological analysis of segregation indexes. American Sociological Review, 20, 210–217.
Goodchild, M. F. (1987). A spatial analytical perspective on geographic information systems. International Journal of Geographical Information Systems, 1(4), 327–334.
Kaplan, D. H., & Holloway, S. R. (1998). Segregation in cities. Washington, DC: Association of American Geographers.
Laurini, R., & Thompson, D. (1992). Fundamentals of spatial information systems. New York: Academic Press.
Massey, D. S. (1985). Ethnic residential segregation: a theoretical synthesis and empirical review. Sociology and Social Research, 69(3), 315–350.
Massey, D. S., & Denton, N. A. (1988). The dimensions of residential segregation. Social Forces, 67, 281–315.
Morgan, B. S. (1975). The segregation of socioeconomic groups in urban areas. Urban Studies, 12, 47–60.
Morrill, R. L. (1991). On the measure of geographical segregation. Geography Research Forum, 11, 25–36.
Morrill, R. L. (1995). Racial segregation and class in a liberal metropolis. Geographical Analysis, 27, 22–41.
Newby, R. G. (1982). Segregation, desegregation, and racial balance: status implications of these concepts. The Urban Review, 14, 17–24.
Poulsen, M. F., & Johnston, R. J. (2000). The ghetto model and ethnic concentration in Australian cities. Urban Geography, 21, 26–44.
Sakoda, J. N. (1981). A generalized index of dissimilarity. Demography, 18, 245–250.
Wong, D. W. S. (1993). Spatial indices of segregation. Urban Studies, 30, 559–572.
Wong, D. W. S. (1996). Enhancing segregation studies using GIS. Computers, Environment and Urban Systems, 20(2), 99–109.
Wong, D. W. S. (1998). Measuring multiethnic spatial segregation. Urban Geography, 19, 77–87.
Wong, D. W. S. (1999). Geostatistics as measures of spatial segregation. Urban Geography, 20(7), 635–647.
Wong, D. W. S. (2000). Several fundamentals in implementing spatial statistics in GIS: using centrographic measures as examples. Geographic Information Sciences, 5(2), 163–174.
Wong, D. W. S. (2002a). Modeling local segregation: a spatial interaction approach. Geographical and Environmental Modelling (to appear).
Wong, D. W. S. (2002b). Spatial measures of segregation and GIS. Urban Geography (to appear).
Wong, D. W. S., & Chong, W. K. (1998). Using spatial segregation measures in GIS and statistical modeling packages. Urban Geography, 19(5), 477–485.