Spatial data constraints: Implications for measuring broadband


Telecommunications Policy 32 (2008) 490–502

Journal URL: www.elsevierbusinessandmanagement.com/locate/telpol

Tony H. Grubesic

Department of Geography, Indiana University, Bloomington, IN 47405-7100, USA

Keywords: Broadband; DSL; Spatial analysis; Data constraints; Availability

Abstract

The accurate determination of where broadband telecommunication services are available in the United States continues to be a significant challenge. Existing data regarding broadband provision, such as those provided by the Federal Communications Commission (FCC), simply designate ZIP codes with at least one high-speed Internet subscriber. As ZIP code areas vary greatly in size and shape, the lack of geographic specificity as to exactly where broadband is available, particularly within ZIP code areas, confounds communications policymaking. Further, there are a number of additional geographic nuances concerning broadband availability that also inhibit empirical examination and policy generation, including the spatial limitations of digital subscriber line services. The purpose of this paper is to briefly review the issues concerning broadband measurement in the United States and provide an empirical analysis of several spatial data constraints that must be accounted for when interpreting and constructing public telecommunications policy. © 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Where is broadband available? In the United States, this should be a relatively easy question to answer. For example, there are a number of public and private research entities that attempt to track where broadband telecommunications services are provided. The Pew Internet and American Life project utilizes random-digit dial telephone surveys to determine how people connect to the Internet (Horrigan, Stolp, & Wilson, 2006). The United States Federal Communications Commission (FCC, 2007) publishes a semiannual report summarizing the number of facilities-based providers who have at least one broadband subscriber in each ZIP code in the US. There are also many technology consulting firms that track trends in broadband adoption and provision. For example, Forrester Research conducts annual surveys of 60,000–100,000 households to determine their technology adoption and behaviors, including broadband use (Kolko, 2007). Recent estimates suggest that while 99.96% of US households are located in a ZIP code with at least one provider, nearly half of all households are located in ZIP codes with more than 10 providers (FCC, 2007). Further, Horrigan (2007) notes that 47% of American adults have broadband at home, representing a nearly 100% increase in home adoption during the past three years.

While these numbers are certainly impressive, the lack of geographic specificity regarding exactly where broadband is available remains troubling. For example, consider Fig. 1, which illustrates Nye County, Nevada, and ZIP code 89049 (Tonopah, NV). With a population of 3140 in the Tonopah ZIP code, this region is not an extremely large broadband market; however, it is a geographically expansive one. The ZIP code covers 14,874.4 km², an area larger than the state of Connecticut (13,023 km²). According to the 2004 FCC database, there are between one and three broadband providers in this ZIP code. Where are these providers? The FCC does not disclose this information and, given the size of 89049, understanding where a provider offers service is certainly an important consideration. For example, if one makes the

E-mail address: [email protected]
0308-5961/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.telpol.2008.05.002


Fig. 1. Broadband coverage for Tonopah, ZIP code 89049. (Map of Nevada showing Nye County, the Tonopah ZIP code (89049), the Tonopah central office and the other Nye County central offices, with Reno and Las Vegas labeled; scale bar in kilometers.)

assumption that digital subscriber line service is available, it is possible to more narrowly define the region of broadband availability to an area near the Tonopah central office (CO), which is located at the far western reaches of 89049.¹ In this context, broadband provision, availability and accessibility are clearly not equivalent from a spatial perspective. While the provision of broadband has been established for the ZIP code in question, it is unlikely that DSL service is available for all of 89049. As a result, the extent of broadband accessibility, at least for this region, is unknown.

This type of uncertainty is illustrative of several relatively common data constraints that analysts face when evaluating and developing telecommunication policies in the United States. Geographic space, technological limitations of broadband platforms, data aggregation routines and methods of analysis can bias policy interpretation and formulation. The purpose of this paper is to illuminate a specific problem associated with broadband deployment, namely, using spatial data to better estimate where digital subscriber line service is available. By focusing on this specific platform, this paper also provides an important, albeit nuanced, review of the geographical issues that can confound both broadband measurement and policy evaluation when using spatial data for analysis. To reiterate, this paper does not seek to serve as a comprehensive review of the data constraints associated with measuring broadband. There are far too many supply and demand-side issues to address appropriately. Instead, this paper uses DSL coverage estimates to evaluate the implications of spatial data and their accurate representation for telecommunications policy development.

The remainder of this paper is organized as follows. The next section reviews several of the existing limitations associated with broadband data and their associated measurement approaches. This is followed by a case study of the Columbus, Ohio, metropolitan area, which illustrates several of the more challenging geographic limitations of broadband data, emphasizing digital subscriber lines. Finally, Section 4 provides a brief discussion of how these inconsistencies can be remedied and offers some concluding thoughts.

1 The author recognizes that cable or wireless/satellite broadband might be the platform that is actually available in Tonopah; this example is for illustrative purposes only.


2. Data constraints and problematic measurement assumptions

In the United States, the most comprehensive, publicly available database documenting broadband provision is maintained by the FCC. The Form 477 data collection program publishes a semiannual report of facilities-based providers with at least one broadband subscriber in each ZIP code. As discussed by Grubesic (2006) and many others, there are several notable drawbacks to the Form 477 data.² Again, one pertinent example of its limitations was highlighted in Fig. 1. Specifically, the presence of a provider(s) in a ZIP code does not guarantee ubiquitous access.

These basic concerns with the FCC data, and broadband data more generally, are echoed in a recent address by Greenstein (2007) and a special report regarding the measurement of broadband by Flamm, Friedlander, Horrigan, and Lehr (2007). Where the former is concerned, several of the most pressing data constraints for the emerging Internet economy are addressed. While a portion of this discussion focuses on issues related to Internet use in firms, electronic retailing, software use and the pricing of Internet access, Greenstein (2007) does acknowledge FCC broadband data limitations. For example, in addition to the major gaps in our understanding of the geographic scope of availability within a ZIP code, Greenstein (2007) suggests that there is a dearth of information regarding the size of firms offering broadband, pricing and bandwidth provided to end users. Where the special report on broadband data by Flamm et al. (2007) is concerned, three important points are highlighted:

(1) Collection of data should be at a sufficiently fine-grained level to permit regional analysis of the impacts of communication technology;
(2) The United States should be able to produce a map showing the availability of infrastructure in the country;
(3) Academic researchers, non-profit organizations, the government and the private sector must work collaboratively to gather data that permits assessment of quality of service and the user experience.

All of these points have merit and largely echo the sentiment of analysts and researchers working on broadband issues in both the public and private sectors throughout the United States. A supplementary point worth making is the potential for drastically overstating the populations covered by broadband. As noted previously, the FCC estimates suggest that 99.96% of US households are located in a ZIP code with at least one provider. While this is indeed a statement of fact, it should not be interpreted to mean that those 99.96% of US households can actually access broadband. As will be discussed in the next section, great care is needed when estimating how many households are actually covered by broadband. Evidence suggests the potential for drastically overestimating these numbers, particularly where digital subscriber lines are concerned.

Obviously, there are a number of representational issues associated with broadband data in the United States. However, there are supplemental databases that can help clarify many of the issues associated with broadband availability. For example, Prieger and Hu (2007) utilize a combination of Census 2000 data, telecommunications infrastructure data and DSL availability data to model issues of race, competition and service quality in the US.³ Notably, because DSL service degrades with distance, Prieger and Hu (2007) utilize a relatively conservative, Euclidean (i.e. straight line) service range of 1.5 miles from each telephone exchange CO to estimate DSL availability.⁴ This type of empirical sensitivity to the geographic limitations associated with telecommunication platforms is certainly important, and a feature that is currently lacking in many analyses concerning broadband availability (e.g. FCC, 2004). The potential for misrepresenting spatial data is an important one in telecommunication analysis, with many of the salient issues reviewed in Grubesic and Murray (2005).
In addition, Madden and Tan (2007) explore methodological approaches for identifying appropriate extrapolation techniques for forecasting telecommunications data. In both cases, the authors address the validity of assumptions made in basic telecommunications analysis, highlighting potential inconsistencies and their potential impacts on policy generation. Of particular interest to this paper is how DSL service range and coverage is modeled in the literature. As mentioned previously, many analyses simply assume that if DSL is available in a telephone exchange area (i.e. wirecenter), DSL is also available throughout the exchange region (Rappaport, Kridel, Taylor, Alleman, & Duffy-Deno, 2003). Even in studies that account for the limitations in service range, such as Prieger and Hu (2007), it is still possible to generate erroneous estimates of DSL availability if the spatial relationship between COs and subscribers is not modeled accurately. In the next section, a case study of Columbus, Ohio, is used to highlight these potential spatial data constraints and the degree to which numerical error can accumulate in these approaches is quantified.

2 For a more detailed discussion regarding the limitations of the FCC data, see Prieger and Lee (2008); Flamm (2006) and Prieger and Hu (2007).
3 The authors focus on the Ameritech operating area, which is the incumbent local exchange company (LEC) in five Midwestern states (Wisconsin, Illinois, Michigan, Indiana, and Ohio).
4 Prieger and Hu (2007) also note that the service range for DSL is contingent upon line distance, which corresponds to the actual length of a twisted copper pair from a household to a central office.


Fig. 2. Columbus, Ohio, metropolitan statistical area. (Map of Ohio showing the Columbus MSA and its urbanized area; scale bar in kilometers.)

3. The spatial representation of broadband availability

3.1. Study area and data

The Columbus metropolitan area is a relatively large urban complex located in the central portion of Ohio (Fig. 2). With a population of nearly 1.6 million, Columbus has a relatively robust broadband market. According to the FCC (2004), there were two ZIP codes in the metropolitan area that had 17 broadband providers and a regional average of six in 2004. For the purpose of this case study, a number of geographic base files are used for analysis. First, CO and telephone exchange area data were acquired from GDT (2004).⁵ The CO database consists of all facilities in the state of Ohio that are listed by the Local Exchange Routing Guide from Telcordia Technologies. This includes information regarding the physical location of COs, the geographic extent of their coverage areas (i.e. wirecenter service areas) and general information on CO capabilities. As noted previously, FCC broadband data by ZIP code for December 2004 are also utilized in this analysis. The ZIP code boundary files for this analysis were also acquired from GDT. Finally, a comprehensive database of Ohio streets was acquired from Caliper (2005).

3.2. Estimating DSL coverage

As noted in Sections 1 and 2, accurately estimating DSL coverage in the United States remains difficult. As distance increases from the CO, service quality often degrades and the likelihood of provision also decreases. In many instances, 18,000 ft is the maximum distance at which asymmetric DSL is available (Grubesic & Horner, 2006).⁶ Fig. 3 displays all COs and their respective service areas for the Columbus MSA. COs and service areas were selected by querying the spatial

5 Geographic Data Technology was recently acquired by TeleAtlas.
6 There are markets where remote digital subscriber line access multiplexers (RDSLAM) are installed. In effect, this remote switch allows for additional households and businesses to be covered by DSL technologies through the use of fiber-based relay stations. In the Columbus area, the AT&T Uverse plan is the latest and most aggressive incarnation of a fiber-to-the-node (FTTN) and fiber-to-the-premises (FTTP) plan. At the time this paper was written, the Ohio Department of Commerce had just cleared AT&T to offer such services (CBF, 2007), so their specific spatial distribution was unknown.


Fig. 3. Wirecenters and central offices intersecting the Columbus MSA. (Map legend: Columbus MSA, central office, wirecenter service area; scale bar in kilometers.)

database to determine which wirecenter regions intersected the eight county Columbus MSA. In total, there are 110 wirecenter service areas and 130 COs.⁷ Because DSL service is available in every CO for this study area, one could make the assumption that DSL coverage within each wirecenter area is ubiquitous, as exemplified in Rappaport et al. (2003). To determine population or households served, coverage statistics can be aggregated from Census blocks to quantify the coverage area. For example, in this instance, 718,561 households are estimated to have access to broadband DSL in the study area. Obviously, this type of assumption has significant potential for error, particularly considering the geographic service limitations inherent to DSL. Basically, the assumption of ubiquitous coverage within a wirecenter is too generous.

In an effort to more accurately represent DSL coverage for Columbus, one can follow the lead set by Prieger and Hu (2007), where DSL service range is estimated by examining the spatial distribution of existing subscribers by ZIP+4 units.⁸ In this instance, the authors determined that 1.5 miles (2.41 km) was the critical deployment distance between COs and subscribers. Alternatively, without access to such detailed subscriber data, or in instances where subscriber data are incomplete, one could generate a simple Euclidean distance-based buffer from all CO locations corresponding to the critical distance of 1.5 miles to estimate DSL coverage. For the purposes of this paper, a more generous estimate of 18,000 ft (3.41 miles or 5.486 km) is utilized. This is not without basis, as this geographic range represents a typical maximum qualification distance for many providers (Abe, 2000).⁹ Fig. 4 illustrates the estimated service coverage for DSL in the Columbus MSA when using this approach. There are three major features of this map worth noting. 
First, once the erroneous assumption of ubiquitous coverage for wirecenter service areas is dropped, the broadband landscape shifts dramatically. For example, a basic spatial analysis of block data indicates a decrease of 86,802 households covered. Secondly, as Fig. 4 displays, not only is the geographic extent of DSL coverage more limited for the region as a whole, the interplay of DSL service range both within and between wirecenter areas is significant. Similar to the findings of Prieger and Hu (2007), there are many instances where DSL service range does

7 In many instances, a wirecenter area is served by more than one central office. This is particularly common in central business districts where demand for telephone lines is the highest.
8 DSL subscription data are indicative of at least one household or business in the ZIP+4 area subscribing to DSL. This information is then aggregated to Census blocks.
9 The authors recognize that new technologies, such as RDSLAMs, have the potential to increase this distance. See Grubesic and Horner (2006) for more details.


Fig. 4. Within and between-unit bias for Euclidean DSL coverage estimates. (Map legend: 18,000 ft Euclidean buffer, central office, wirecenter service area, Logan, OH; scale bar in kilometers.)

not extend to 100% of the wirecenter service area. For example, the geographic expanse of the Logan, Ohio, wirecenter far exceeds the DSL coverage range of its CO (Fig. 4). This represents a distinct within-unit spatial bias in DSL coverage. However, there also appears to be some overlap in coverage from the CO located northwest of Logan (Sugar Grove, Ohio). Is it possible that customers in the Logan wirecenter service area can get DSL coverage from the Sugar Grove CO? Hypothetically, yes. In reality, no. This represents a between-unit bias in DSL coverage. DSL service between COs cannot overlap wirecenter service areas (Abe, 2000). Twisted copper pairs for the plain old telephone system (POTS) extend from the CO to households located in their wirecenter only. As DSL technology relies on these copper pairs, overlap between wirecenters is not possible. While it is conceivable that a customer could pay for his or her premises to be completely rewired to a different CO, the cost of doing this would be astronomical and in most cases, it is likely that the local exchange carrier would decline this service request at any price. In short, the between-unit bias is one of the more subtle, yet important aspects of measuring broadband availability. If not accounted for, this type of spatial bias can drastically alter the composition of DSL coverage areas, yielding a numerical inflation of household coverage estimates. Consider, for example, Fig. 5, where between-unit bias is accounted for in the spatial analysis. The Euclidean-distance buffers are geographically ‘‘clipped’’ to represent coverage areas within their wirecenter service areas only. Numerically, the difference between a coverage analysis that accounts for between-unit bias versus one that does not, is significant (Table 1). By accounting for between-unit bias, 17,271 households are eliminated from the coverage estimates and a more realistic landscape of broadband DSL availability begins to emerge.
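The difference between an unclipped and a clipped Euclidean buffer can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the GIS workflow used in this paper: the wirecenters are hypothetical axis-aligned rectangles with one CO each, all coordinates (in feet) are invented, and the names `Wirecenter`, `covered_unclipped` and `covered_clipped` are introduced purely for the example.

```python
import math
from dataclasses import dataclass

DSL_RANGE_FT = 18_000  # qualification distance used in the paper


@dataclass(frozen=True)
class Wirecenter:
    """Hypothetical rectangular wirecenter served by a single CO."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    co: tuple  # (x, y) location of the central office, in feet

    def contains(self, pt):
        x, y = pt
        return self.xmin <= x < self.xmax and self.ymin <= y < self.ymax


def within_range(pt, co, limit=DSL_RANGE_FT):
    """True if the straight-line distance from pt to co is within limit."""
    return math.dist(pt, co) <= limit


def covered_unclipped(pt, wirecenters):
    """Euclidean buffer only: any CO within range counts as coverage."""
    return any(within_range(pt, wc.co) for wc in wirecenters)


def covered_clipped(pt, wirecenters):
    """Buffer clipped to the wirecenter: only the household's own CO counts."""
    return any(wc.contains(pt) and within_range(pt, wc.co) for wc in wirecenters)


# Two invented, adjacent wirecenters; coordinates are in feet.
west = Wirecenter("West", 0, 0, 60_000, 40_000, co=(55_000, 20_000))
east = Wirecenter("East", 60_000, 0, 120_000, 40_000, co=(110_000, 20_000))
wirecenters = [west, east]

households = [
    (50_000, 20_000),   # in West, close to the West CO
    (65_000, 20_000),   # in East, near the West CO but far from its own CO
    (100_000, 25_000),  # in East, close to the East CO
    (10_000, 5_000),    # in West, far from every CO
]

unclipped = sum(covered_unclipped(h, wirecenters) for h in households)
clipped = sum(covered_clipped(h, wirecenters) for h in households)
print(unclipped, clipped)
```

The second household is the between-unit case: it sits inside the buffer of a neighboring CO, so the unclipped count includes it, while the clipped count does not.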

3.3. Network constrained DSL coverage

A final approach to the spatial representation of DSL network coverage concerns the differences between Euclidean-distance-based coverage analysis and a more realistic portrayal that incorporates network distances and line-lengths. Prieger and Hu (2007, p. 10) note that, ‘‘telephone wires often run along roads arrayed in a grid, so the distance ‘as the crow


Fig. 5. The 18,000 ft Euclidean distance buffer to account for between-unit bias in DSL coverage. (Map legend: 18,000 ft Euclidean buffer (clipped), central office, wirecenter service area, Logan, OH; scale bar in kilometers.)

Table 1
Differences in DSL coverage estimates, Columbus, Ohio MSA

Wirecenter coverage area characteristics | Block count | Population | Households
Scenario (1) ubiquitous coverage | 42,347 | 1,830,255 | 718,561 (100.00%)
Scenario (2) 18,000 ft Euclidean buffer | 33,347 | 1,578,365 | 631,453 (87.87%)
Scenario (3) 18,000 ft Euclidean buffer, accounting for between-unit bias | 31,992 | 1,531,196 | 614,182 (85.47%)
Scenario (4) 18,000 ft generalized network buffer | 29,266 | 1,408,188 | 567,999 (79.04%)
Scenario (5) 18,000 ft generalized network buffer, accounting for between-unit bias | 28,152 | 1,356,335 | 547,370 (76.17%)
Scenario 2 minus Scenario 5 | 5,195 | 222,030 | 84,083

flies’ between a house 2.2 wire miles from the central office may be as short as 1.5 miles’’. Specifically, ‘‘if the wires take right angle turns along streets, the ‘worst case’ scenario is a right triangle with base and height each of length 1.1 miles. In this case, the distance from the house to the CO by air is only 1.5 miles’’ (Prieger & Hu, 2007). This is an extremely important spatial consideration when modeling DSL coverage. While the spatial errors associated with a failure to account for this subtlety might not be significant for a single wirecenter service area, when an entire region is considered simultaneously, the accumulation of error can be significant. In an effort to account for this problem, a network-based coverage threshold is calculated for each CO in the Columbus MSA. In this instance, a high-resolution street network is used as a basis for estimating a non-compact, convex, network service area of 18,000 ft around each CO.¹⁰ Fig. 6a highlights the resulting landscape of DSL coverage for the Columbus area.

10 These convex coverage areas are generated by enclosing the point on each street segment in the network that is 18,000 ft away from the central office with a minimum bounding polygon. While not utilized in this paper, it is possible to generate compact, non-convex service areas to estimate DSL coverage. Both types of service areas are easily generated in most commercial GIS packages such as ArcGIS and TransCAD.
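The gap between straight-line and along-street distance can be illustrated with a small shortest-path computation. The sketch below is a simplified stand-in for the network analysis described above, under loudly stated assumptions: the street network is a hypothetical square grid with 1,000 ft blocks (`BLOCK_FT` and `HALF_WIDTH` are invented parameters), a single CO sits at the origin, and the code counts intersections within range rather than building the bounding polygon described in footnote 10. It also reproduces the right-triangle arithmetic quoted from Prieger and Hu (2007).

```python
import heapq
import math

DSL_RANGE_FT = 18_000
BLOCK_FT = 1_000   # hypothetical spacing between parallel streets
HALF_WIDTH = 20    # hypothetical grid: 20 blocks in each direction from the CO


def neighbors(node):
    """Yield (adjacent intersection, street length in feet) on the grid."""
    i, j = node
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if abs(ni) <= HALF_WIDTH and abs(nj) <= HALF_WIDTH:
            yield (ni, nj), BLOCK_FT


def network_distances(source):
    """Dijkstra's algorithm: shortest along-street distance to every node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        for nxt, length in neighbors(node):
            nd = d + length
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


co = (0, 0)
dist = network_distances(co)

# Intersections within 18,000 ft measured along streets vs. as the crow flies.
in_network_range = {n for n, d in dist.items() if d <= DSL_RANGE_FT}
in_euclid_range = {
    n for n in dist
    if math.hypot(n[0] * BLOCK_FT, n[1] * BLOCK_FT) <= DSL_RANGE_FT
}
print(len(in_euclid_range), len(in_network_range))

# The right-triangle case quoted from Prieger and Hu (2007).
wire_miles = 1.1 + 1.1
air_miles = math.hypot(1.1, 1.1)
print(round(wire_miles, 1), round(air_miles, 2))
```

On a regular grid the network distance reduces to the sum of the horizontal and vertical offsets, so every intersection reachable within 18,000 ft of wire also lies inside the Euclidean buffer, but not the reverse. Dijkstra's algorithm is used here only so that the same code would apply to an irregular street network.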


Fig. 6. (a) The 18,000 ft network distance buffer for estimating DSL coverage and (b) the 18,000 ft network distance buffer to account for between-unit bias in DSL coverage. (Map legend for both panels: 18,000 ft network distance buffer, central office, wirecenter service area, Logan, OH.)

Not surprisingly, this spatial representation indicates a dramatic reduction in households and population covered, even when compared to the Euclidean distance coverage area where between-unit bias was accounted for (Table 1). A final modification for the network constrained buffer approach is, in fact, to account for between-unit bias. Fig. 6b displays these results and the accompanying statistics are noted in Table 1. Again, over 20,600 households are removed from the coverage area when between-unit bias is accounted for. In sum, the empirical evidence provided by Figs. 4–6 and the numerical evidence provided in Table 1 suggest the potential for overestimating DSL coverage in the Columbus MSA is substantial. Fig. 7 provides a final, graphical summary of exactly how these different coverage approaches vary for a single CO and its associated wirecenter area. While the differences in coverage are not always dramatic, one can certainly see the need to account for these spatial constraints


Fig. 7. A comparative snapshot of wirecenter coverage. (Map legend: clipped network coverage, network coverage, clipped Euclidean coverage, Euclidean coverage, South Solon, OH (central office), South Solon, OH (wirecenter); scale bar in feet.)

when evaluating broadband coverage. In sum, the difference between using a strict Euclidean distance buffer and a network constrained distance buffer in which between-unit bias is accounted for is, for the overall study area, 222,030 in population and 84,083 in households.

3.4. FCC broadband data and potential coverage error

A final important issue in this paper is developing an approach for rectifying DSL coverage at the ZIP code level for the purpose of estimating potential coverage error in the FCC broadband database. As mentioned in Section 2, the FCC claims that 99.96% of US households are located in a ZIP code with at least one provider. Again, while this may be true, there are many instances where ZIP codes have large spatial footprints and the presence of a provider(s) does not guarantee ubiquitous access. That said, how much of a gap actually exists when evaluating DSL coverage at the ZIP code level in the Columbus MSA? Not surprisingly, estimating this type of numerical error is an extremely difficult task, given the spatial mismatch between wirecenter, ZIP code and Census block boundaries combined with the coverage polygons for each CO.

Two minor modifications needed to be made to the study area to accomplish this analysis. First, the Columbus MSA intersects with 161 unique ZIP code areas. In many locations, the spatial extent of these areas exceeds that of the wirecenter boundaries used for the initial analysis. This means that additional households are potentially served by COs not included in the first coverage analysis. To account for these households, an additional 53 wirecenters and 59 COs were integrated with the existing data to ensure complete coverage of each ZIP code area. Secondly, each Census block was assigned an identification number that corresponded to the ZIP code area in which it was located.¹¹ This is a critical step in the analysis, because it allows for extremely accurate household coverage estimates within ZIP codes. Because the basic unit of coverage analysis (i.e. Census blocks) is relatively disaggregated, as coverage buffers are applied from COs, each Census block that is

11 Census blocks were allowed membership in one ZIP code only. This was accomplished with a simple geometric rule, based on the centroid location of each Census block polygon.


Fig. 8. (a) Network constrained DSL household coverage by ZIP code (y-axis: percent DSL coverage, 0 to 100; x-axis: ZIP code; series: percent households covered). Note: not all ZIP codes are labeled on x-axis. (b) Potential DSL coverage error in FCC broadband data (y-axis: percent coverage error, 0 to 100; x-axis: ZIP code; series: percent potential coverage error). Note: not all ZIP codes are labeled on x-axis.

covered (including its number of households) can be attributed to a particular ZIP code, while still maintaining wirecenter boundary constraints. This allows one to estimate, with good accuracy, the location and percentage of households within coverage range of DSL service for each ZIP code area.
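The block-to-ZIP aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually run for this paper: the Census blocks, household counts and ZIP labels ("A", "B", "C") are invented, each block is assumed to have already been assigned to one ZIP code by its centroid, and the `in_range` flag is assumed to come from a clipped network coverage analysis. The "potential error" is computed simply as the gap between an assumed 100% ZIP-wide availability and the estimated percentage covered.

```python
from collections import defaultdict

# Hypothetical Census blocks: (ZIP label assigned by centroid, households,
# whether the block lies inside a clipped 18,000 ft network service area).
blocks = [
    ("A", 120, True),
    ("A", 80, True),
    ("B", 150, True),
    ("B", 50, False),
    ("C", 40, False),
]


def coverage_by_zip(blocks):
    """Return {zip: percent of households within DSL range}."""
    total = defaultdict(int)
    covered = defaultdict(int)
    for zip_code, households, in_range in blocks:
        total[zip_code] += households
        if in_range:
            covered[zip_code] += households
    # Skip ZIP codes with no households to avoid dividing by zero.
    return {z: 100.0 * covered[z] / total[z] for z in total if total[z] > 0}


def potential_error(percent_covered):
    """Gap between assumed ZIP-wide availability (100%) and estimated coverage."""
    return {z: 100.0 - pct for z, pct in percent_covered.items()}


pct = coverage_by_zip(blocks)
err = potential_error(pct)
for z in sorted(pct):
    print(z, f"{pct[z]:.1f}% covered", f"{err[z]:.1f}% potential error")
```

Working from blocks upward keeps the household counts tied to the smallest available unit, so a ZIP code that straddles several wirecenters is still summarized by a single percentage.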


Table 2
Statistical correlates of DSL coverage for Columbus MSA ZIP code areas
(Pearson correlation, with two-tailed significance in parentheses; N = 161 for every pair)

Variable | (1) | (2) | (3) | (4) | (5)
(1) Percent of households covered | 1 | .404** (.000) | .315** (.000) | .015 (.852) | .251** (.001)
(2) FCC broadband provider count | .404** (.000) | 1 | .420** (.000) | .003 (.965) | .180* (.023)
(3) Household density | .315** (.000) | .420** (.000) | 1 | -.421** (.000) | .079 (.316)
(4) Wirecenter area size (square miles) | .015 (.852) | .003 (.965) | -.421** (.000) | 1 | .277** (.000)
(5) Central office count | .251** (.001) | .180* (.023) | .079 (.316) | .277** (.000) | 1

** Correlation is significant at the 0.01 level (2-tailed).
* Correlation is significant at the 0.05 level (2-tailed).

This final spatial analysis yields two interesting results worth mentioning. First, Fig. 8a displays the percentage of each ZIP code’s households that are covered by the network constrained DSL coverage approach.12 The interpretation of this information is fairly intuitive. Percent coverage decreases from 100% (n ¼ 26) to 0.0% (n ¼ 21), with an average value of 61.47 and a standard deviation of 32.54. Broadly interpreted, this suggests that an average of 61.5% of all households in the study area can receive DSL at a service distance of 18,000 ft. Not surprisingly, there are still a handful of ZIP codes where DSL service would not be available under this scenario. These are primarily small, unincorporated locations that are geographically isolated from existing COs or on the fringe of wirecenter service area boundaries. Fig. 8b displays the difference between the coverage estimates generated in this study and the broadband statistics reported by the FCC. The interpretation of this graph is slightly different than Fig. 8a. In Fig. 8b, values of 100% indicate that the FCC is reporting at least one facilities-based broadband provider with an active broadband line in the ZIP code area, but according to the coverage analysis in this paper, DSL would likely not be available. Again, households in these ZIP code areas are too distant from a CO to receive DSL service. In these instances, one would assume that cable or wireless broadband are the platforms obtainable. Values less than 100% and greater than 0.0% represent the potential DSL coverage gap between the FCC’s broadband statistics and the network constrained coverage analysis. In these ZIP codes, the FCC is reporting that at least one or more broadband providers are active. 
Therefore, if the assumption is made that DSL is the only platform available in these ZIP code areas (granted, this is a fairly major assumption), under the existing reporting rubric, the FCC would indicate that 100% of these households are located in a ZIP code where broadband is available. However, by operationalizing the network constrained coverage analysis and assuming the coverage range is 18,000 ft from a CO, Fig. 8b indicates the potential numerical overestimation, that is, the difference between the households actually within the 18,000 ft range and the erroneous assumption of ubiquitous coverage. For the 112 ZIP code areas where there is a potential coverage gap, the average share of households within DSL range is 35.82%. Finally, values of 0.0% in Fig. 8b suggest that all of the households in these ZIP code areas are within the 18,000 ft coverage range of an existing CO and would likely be eligible for DSL service.

3.5. Statistical correlates of DSL coverage

Table 2 highlights the statistical correlates of DSL coverage for the Columbus MSA. The results are not particularly surprising and simply confirm several of the known, demand-side determinants of broadband provision. For example, there are positive and significant correlations between the percentage of households within range (18,000 ft) of DSL service (network constrained) and FCC broadband provider counts, household density and CO counts at the ZIP code level. As noted in previous studies, household density is positively associated with broadband provision (Grubesic, 2006; Grubesic & Murray, 2004). Further, the correlation analysis suggests that the size of wirecenter service areas is negatively correlated with household density (Grubesic, 2003). Larger wirecenters tend to be more rural and less densely settled. As illustrated by this paper, many of the larger wirecenters in the Columbus MSA have households located outside the standard 18,000 ft DSL range of their respective CO(s).

¹² This is the most conservative coverage estimate, but it is likely the most realistic.
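The quantities discussed in Sections 3.4 and 3.5 can be illustrated with a minimal sketch. It is not the procedure used in this paper: the records below are invented toy values rather than Columbus MSA data, the field names (`households`, `in_range`, `fcc_providers`, `hh_density`) are hypothetical, and the use of a Pearson coefficient is an assumption, since the type of correlation reported in Table 2 is not restated here. The sketch computes the share of households within the 18,000 ft range for each ZIP code area (as in Fig. 8a), the gap relative to the 100% coverage implied by the FCC reporting rubric (as in Fig. 8b), and one correlation of the kind summarized in Table 2.

```python
from math import sqrt

# Hypothetical toy records, NOT data from the study: one dict per ZIP code area.
# "in_range" is the assumed count of households within 18,000 ft (network
# distance) of a CO; "hh_density" is households per square kilometer.
zip_areas = [
    {"zip": "A", "households": 1000, "in_range": 1000, "fcc_providers": 8, "hh_density": 400.0},
    {"zip": "B", "households": 800, "in_range": 200, "fcc_providers": 3, "hh_density": 60.0},
    {"zip": "C", "households": 500, "in_range": 0, "fcc_providers": 1, "hh_density": 5.0},
    {"zip": "D", "households": 1200, "in_range": 900, "fcc_providers": 5, "hh_density": 150.0},
]


def percent_covered(area):
    """Percentage of households within DSL range (Fig. 8a-style value)."""
    return 100.0 * area["in_range"] / area["households"]


def coverage_gap(area):
    """Fig. 8b-style gap between FCC-implied and estimated DSL coverage.

    Under the ZIP-level rubric, one reported provider implies 100% coverage.
    Returns None when no provider is reported, because no gap is defined.
    """
    if area["fcc_providers"] < 1:
        return None
    return 100.0 - percent_covered(area)


def pearson(xs, ys):
    """Pearson correlation coefficient (an assumed choice of statistic)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


coverage = [percent_covered(a) for a in zip_areas]
gaps = [coverage_gap(a) for a in zip_areas]
density = [a["hh_density"] for a in zip_areas]
r_density = pearson(coverage, density)

for area, cov_pct, gap in zip(zip_areas, coverage, gaps):
    print(f"ZIP {area['zip']}: covered {cov_pct:.1f}%, gap vs. FCC {gap:.1f}%")
print(f"Correlation of coverage with household density: {r_density:.2f}")
```

In this toy example, a gap of 100% corresponds to a ZIP code area where the FCC reports a provider but no household is within range, and a gap of 0% to one where every household is within range, mirroring the reading of Fig. 8b given above.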


4. Discussion and conclusion

While the empirical analysis in this paper focuses on a relatively small aspect of broadband measurement (i.e. DSL), the implications of these results have a much broader context for telecommunications policy development and evaluation. First, telecommunications data are complex. For example, in addition to establishing the availability of DSL in a given region, issues of service range, line qualification, line conditioning, pricing and bandwidth are likely to further complicate both spatial and econometric analyses. Not surprisingly, the challenges associated with DSL are also apparent in other broadband platforms. Consider wireless broadband, which is also subject to several spatial and technological caveats. The accurate estimation of coverage areas, price, bandwidth, signal propagation and network security are all valid areas of concern when evaluating wireless options.

Secondly, how does one account for these issues when generating and interpreting telecommunications policy? Obviously, there is no clear-cut answer to this. Although Grubesic and Murray (2005) provide a framework for addressing the challenges associated with spatial data in the context of telecommunications policy, there are a number of unresolved issues. For example, where broadband data are concerned, there are serious limitations associated with using ZIP codes to collect broadband provision information. In addition to the weaknesses highlighted in previous work (Flamm, 2006; Prieger, 2003), it is important to note that ZIP codes are not regions; they are linear features, corresponding to address ranges and streets, that were designed for the sole purpose of making the United States Postal Service more efficient when delivering mail (Grubesic & Matisziw, 2006).
While ZIP codes certainly provide an easy way to tabulate business data, including broadband provision, they are less than desirable for spatial and economic analysis, regardless of their popularity in the public and private sectors.

A partial solution to the development of better telecommunication policies is the use of more platform-specific infrastructure data. As highlighted by this paper, CO and wirecenter data are critical to understanding DSL coverage. In fact, the results presented in Section 3 should indicate to analysts just how little one could say about DSL availability when provided with data at the ZIP code level. Similarly, information on cellular radio towers, spectrum allocation, Wi-Fi hot spots and satellites is equally critical to understanding issues of wireless broadband availability. In the end, only the providers actually know which households are served. The challenge for researchers is finding both creative and computationally efficient ways to integrate the infrastructure information that is available into spatial and econometric studies for evaluating policy. Obviously, geographic information systems (GIS) offer one software-based approach for managing and manipulating these data, but there are many others.

Finally, what do the results of this analysis say about the FCC Form 477 broadband data and their utility for policy-based analysis? Simply put, the FCC data are, at best, difficult to use. Interestingly, there is movement at the federal level to remedy many of the shortcomings associated with the current FCC data. Congressman Edward J. Markey (D-MA) introduced the Broadband Census of America Act to the US House of Representatives in October 2007. In a nutshell, the bill is intended to provide a better and more comprehensive inventory of broadband availability in the United States.
This would include a broadband inventory map that “identifies and depicts the geographic extent to which broadband service capability is deployed and available from a commercial provider or public provider throughout each State” (H.R. 3919, 3). More specifically, the bill is seeking to tabulate broadband data at the nine-digit ZIP code level, census tract level or the functional equivalent. If enacted, this bill would serve as an absolutely critical step in the broadband data collection and distribution efforts of the federal government. Needless to say, the information in such a census would likely provide enough detail to finally determine the geographic, demographic and socio-economic extent of the digital divide. Moreover, these data would certainly help generate more meaningful public policies to help remedy existing inequities.

From an empirical perspective, it is critical that the sponsors of this bill do not settle for substandard geographic units for tabulation purposes. As noted in this paper and in previous work (Grubesic, 2008; Grubesic & Matisziw, 2006), ZIP codes, in any form (i.e. 5 or 9 digit), are not satisfactory spatial units. A best-case scenario would be the collection of broadband provision data at the Census block level. While Census block groups would also provide enough geographic detail for a meaningful local-level analysis, Census tracts would represent the absolute spatial maximum for unit size.

In sum, analysts are often restricted by some combination of data quality, availability or other observational biases. As illustrated here, although spatial data are important for telecommunication analysis, care is needed in their integration and use. In the case of DSL coverage, many of the potentially troubling errors can be rectified through the use of basic GIS functionality and spatial analysis routines.
Most notably, by formally accounting for the true spatial nature of DSL coverage, one can effectively correct for overestimations in households served in DSL markets. Where policy is concerned, perhaps the most important point is that while the numerical overestimates for DSL coverage may be small for a single study area, the inflation of error and its propagation through subsequent statistical analysis can have a detrimental impact on the validity of results for regional or national-scale studies. In fact, this holds true for any analysis regarding broadband and its policies. In conclusion, care must be exercised by analysts, particularly when using spatial data, to ensure that the most realistic and accurate representation of broadband, regardless of platform, is maintained.

References

Abe, G. (2000). Residential broadband. Indianapolis, Indiana: Cisco Press.
Caliper Corporation. (2005). URL: <http://www.caliper.com>.


CBF [Columbus Business First]. (2007). AT&T gets state’s OK for U-verse. URL: <http://www.bizjournals.com/columbus/stories/2007/11/05/daily27.html>.
Federal Communications Commission [FCC]. (2004). Availability of advanced telecommunications capability in the United States. Washington, D.C. URL: <http://www.fcc.gov/broadband/706.html>.
Federal Communications Commission [FCC]. (2007). FCC strategic goals: Form 477 reporting requirements and deployment data. URL: <http://www.fcc.gov/broadband/data.html>.
Flamm, K. (2006). Diagnosing the disconnected: Where and why is broadband access unavailable in the US? URL: <web.si.umich.edu/tprc/papers/2006/588/flammbb0806.pdf>.
Flamm, K., Friedlander, A., Horrigan, J., & Lehr, W. (2007). Measuring broadband: Improving communications policymaking through better data collection. Washington, DC: Pew Internet and American Life Project.
Geographic Data Technology [GDT]. (2004). Wirecenter premium. URL: <http://www.teleatlas.com/index.htm>.
Greenstein, S. (2007). Data constraints and the internet economy: Impressions and imprecision. NSF/OECD meeting on factors shaping the future of the internet. URL: <http://www.oecd.org/dataoecd/5/54/38151520.pdf>.
Grubesic, T. H. (2003). Inequities in the broadband revolution. Annals of Regional Science, 37, 263–289.
Grubesic, T. H. (2006). A spatial taxonomy of broadband regions in the United States. Information Economics and Policy, 18, 423–448.
Grubesic, T. H. (2008). Zip codes and spatial analysis: Problems and prospects. Socio-Economic Planning Sciences, 42(2), 129–149.
Grubesic, T. H., & Horner, M. W. (2006). Deconstructing the divide: Extending broadband xDSL services to the periphery. Environment and Planning B, 33, 685–704.
Grubesic, T. H., & Matisziw, T. C. (2006). On the use of ZIP codes and ZIP code tabulation areas (ZCTAs) for the spatial analysis of epidemiological data. International Journal of Health Geographics, 5, 58.
Grubesic, T. H., & Murray, A. T. (2004). Waiting for broadband: Local competition and the spatial distribution of advanced telecommunication services in the United States. Growth and Change, 35(2), 139–165.
Grubesic, T. H., & Murray, A. T. (2005). Geographies of imperfection in telecommunication analysis. Telecommunications Policy, 29, 69–94.
Horrigan, J. B. (2007). Why it will be hard to close the broadband divide. Pew Internet and American Life Project. URL: <http://pewresearch.org/pubs/556/why-it-will-be-hard-to-close-the-broadband-divide&reason=0>.
Horrigan, J. B., Stolp, C., & Wilson, R. H. (2006). Broadband utilization in space: Effects of population and economic structure. The Information Society, 22, 341–354.
H.R. 3919. (2007). Broadband census of America act of 2007. Report no. 110–443, US House of Representatives.
Kolko, J. (2007). A new measure of residential broadband availability. Telecommunications policy research conference. URL: <http://web.si.umich.edu/tprc/papers/2007/716/measuring%20BB%20availability%20v6%20081707.pdf>.
Madden, G., & Tan, J. (2007). Forecasting telecommunications data with linear models. Telecommunications Policy, 31, 31–44.
Prieger, J. E. (2003). The supply side of the digital divide: Is there equal availability in the broadband internet access market? Economic Inquiry, 41(2), 346–363.
Prieger, J. E., & Hu, W.-M. (2007). The broadband digital divide and the nexus of race, competition and quality. URL: <http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1008309>.
Prieger, J. E., & Lee, S. (2008). Regulation and the deployment of broadband. In Y. K. Dwivedi, A. Papazafeiropoulou, & J. Choudrie (Eds.), Handbook of research on global diffusion of broadband data transmission. New York: Information Science Reference.
Rappoport, P. N., Kridel, D. J., Taylor, L. D., Alleman, J. H., & Duffy-Deno, K. T. (2003). Residential demand for access to the Internet. In G. Madden (Ed.), Emerging telecommunications networks: The international handbook of telecommunications economics, Vol. 2 (pp. 55–72). Cheltenham, UK: Edward Elgar.