ORIGINAL ARTICLE
Medicare Claims Data Resources: A Primer for Policy-Focused Radiology Health Services Researchers
Andrew B. Rosenkrantz, MD, MPA a, Danny R. Hughes, PhD b,c, Richard Duszak Jr, MD d
Abstract
As societal stakeholders call for increased evidence-based health policy, considerable attention has focused on Medicare, the country’s largest payer. Concurrently, medical imaging has come under considerable scrutiny as a contributor to rising health care expenditures. Accordingly, many recent studies have focused on multiple factors related to the utilization of imaging among Medicare beneficiaries. This article summarizes several national Medicare fee-for-service data sources relevant to supporting ongoing investigations. Aggregated 100% data sets include the Physician/Supplier Procedure Summary Master Files and the Medicare Provider Utilization and Payment Data: Physician and Other Supplier Public Use File. The former focuses on services, specialties, and sites of service; the latter focuses on providers. Both permit high-level national assessments of imaging utilization and spending. Individual 5% random-sample claims-level data sources include the Carrier Standard Analytical File Limited Data Set and the Research Identifiable File, which contain greater beneficiary-level information. Both facilitate more robust patient- and encounter-level analyses and some assessment of downstream outcomes but involve greater costs and require greater privacy oversight. More recently, Medicare data are being merged with registry data (eg, Surveillance, Epidemiology, and End Results–Medicare Linked Database files), creating opportunities for even more robust analyses given richer clinical information. Understanding these data sets and trade-offs in their use will aid policy-focused imaging health services researchers in most effectively conducting their investigations.
Key Words: Health services research, health policy, Medicare, claims data, radiology
J Am Coll Radiol 2017;-:---.
Copyright 2017 American College of Radiology
INTRODUCTION
Health services research focuses on access to and drivers of health care services, the cost and quality of such services, and their downstream impact on patients and the population at large [1-3]. In light of the current federal health policy environment placing increased emphasis on promoting quality and value of care, policy-focused
a Department of Radiology, NYU Langone Medical Center, New York, New York.
b Harvey L. Neiman Health Policy Institute, Reston, Virginia.
c Department of Health Administration and Policy, George Mason University, Fairfax, Virginia.
d Department of Radiology and Imaging Sciences, Emory University School of Medicine, Atlanta, Georgia.
Corresponding author and reprints: Andrew B. Rosenkrantz, MD, MPA, Department of Radiology, Center for Biomedical Imaging, NYU School of Medicine, NYU Langone Medical Center, 660 First Avenue, 3rd Floor, New York, NY 10016; e-mail:
[email protected]. Drs Rosenkrantz and Duszak are supported by research grants from the Harvey L. Neiman Health Policy Institute. The authors have no conflicts of interest related to the material discussed in this article.
© 2017 American College of Radiology 1546-1440/17/$36.00 · http://dx.doi.org/10.1016/j.jacr.2017.04.005
health services research is recognized as increasingly important [4]. In radiology, this interest is evidenced by a recent broad range of studies examining trends in utilization and spending for medical imaging services nationally [5] and regionally [6] and at the levels of provider specialty [7], site of service [8], and specific type of service [9]. By its nature, meaningful policy-focused health services research relies on large data sets containing information for broad populations. Such data sets offer insights regarding societal use of particular families of services (eg, imaging) in a manner that is not possible when solely using small-scale local data sets from a single center, as is the norm for many clinically oriented investigations. These large population-level data sets effectively form the backbone of high-quality policy-focused health services research. A key challenge for radiologists and nonradiology researchers pursuing policy-focused imaging health services research is access to relevant large data sets. Individual investigators cannot generate such data sets on
their own using local institutional resources, thus necessitating external data sources. Although private payer data sets exist [10-13], these are often expensive, encumbered by restrictive data use agreements, and of variable relevance to federal policymaking. As such, most investigators focus on data from CMS. CMS provides access to numerous data sets on the basis of billed Medicare services in efforts to improve the transparency and accountability of care [14], with the goal that interested researchers will take advantage of such data sets to improve health care delivery overall and the Medicare program in particular [14,15] to benefit the public good [16]. For all these reasons, CMS administrative data sets serve as the cornerstone for most policy-focused imaging health services research. Nonetheless, many academic radiologists lack a basic familiarity with these data sets, including their relative strengths and limitations. In this article, we provide an overview of the available national administrative CMS fee-for-service data sets with particular relevance to radiology research and focus on two widely available aggregated data sets, two commonly used individual claims-level data sets, and a novel data source linking individual claims-level and cancer registry data.
COMMONALITIES OF CMS CLAIMS DATA SETS
In CMS data sets, patients receiving services are designated as beneficiaries. Physicians and other qualified health care professionals billing services are designated as providers. Since 2006, CMS has identified providers using 10-digit National Provider Identifier (NPI) numbers [17]. Providers’ specialties are designated using 2-digit numbers from Medicare Healthcare Provider Taxonomy Codes [18]. Specialty codes most relevant to radiology are 30 (diagnostic radiology), 36 (nuclear medicine), and 94 (interventional radiology). CMS claims data do not offer more granular stratification of diagnostic radiologists (eg, neuroradiologists, breast imagers, abdominal imagers), but algorithm-based mapping tools may allow such identification [19]. Provider specialty code assignments originate in the Medicare Provider Enrollment, Chain, and Ownership System, an online interface used by providers to initially enroll in Medicare and to periodically update their information [20]. Frequently, physician practice credentialing staff members submit this information on behalf of participating providers (ie, providers are not required to personally register).
Rendered services are designated by Healthcare Common Procedure Coding System (HCPCS) codes [21]. Level I HCPCS codes correspond with Current Procedural Terminology codes, which are developed and maintained by the AMA to identify medical services and procedures [22]. Level II HCPCS codes represent additional codes added by CMS to describe services not currently identified by Current Procedural Terminology codes. The two sets are differentiated by the use of five numerals for level I codes (eg, code 76092 for standard film screening mammography), compared with a single letter followed by four numerals for level II codes (eg, code G0202 for digital screening mammography) [23]. Services are also assigned two-digit place-of-service codes reflecting the sites at which the services were rendered (eg, inpatient hospital, emergency department, hospital-based outpatient, non-hospital-based outpatient) [24]. The place of service is further stratified as a facility or nonfacility given differences in the Medicare fee schedules for these two alternatives.
The online CMS Data Navigator provides a simple menu-driven user interface for identifying and differentiating many CMS data sets [25]. The Data Navigator allows filtering the data sets by program (including Medicare, Medicare Advantage, Medicaid, and the Center for Medicare and Medicaid Innovation), setting or type of care (including hospital, inpatient, outpatient, physician services, and providers), topic (including access to care, coding, disparity, expenditures, mammography, outcomes, Physician Fee Schedule, screening, and utilization), geography (including county, hospital referral region, national, regional, and state), and document type (including fact sheet, interactive tool, publicly available data set, publication, report, and restricted-use data file).
The Data Navigator, however, does not itself store any of the data sets but rather directs researchers to relevant data source sites on the basis of the listed categories. In addition, the CMS Research Data Assistance Center (ResDAC) offers academic and nonprofit researchers free assistance in accessing and using CMS data sets, including information and training [26]. ResDAC also serves as the body through which researchers place requests for many of the data sources that require specific applications with subsequent review and approval by CMS. Finally, almost all of the data sets discussed herein are so large that they are not directly accessible using Microsoft Excel or other spreadsheet software; they must be handled through dedicated databases or statistical packages such as SAS or Microsoft Access. In some such instances, CMS provides code for initially importing the data set into SAS.
Journal of the American College of Radiology Volume - n Number - n Month 2017
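The coding conventions described in this section can be illustrated with a short Python sketch. It applies only the rule stated above (five numerals for level I HCPCS codes, one letter followed by four numerals for level II codes) and the three radiology specialty codes named in the text. The function and constant names are hypothetical, and real claims files may contain code formats that this simplified rule does not cover.

```python
import re

# Specialty codes named in the text as most relevant to radiology.
RADIOLOGY_SPECIALTY_CODES = {
    "30": "diagnostic radiology",
    "36": "nuclear medicine",
    "94": "interventional radiology",
}

# Simplified patterns taken from the rule stated in the article.
LEVEL_I_PATTERN = re.compile(r"\d{5}")         # five numerals
LEVEL_II_PATTERN = re.compile(r"[A-Z]\d{4}")   # one letter, then four numerals


def hcpcs_level(code: str) -> str:
    """Classify an HCPCS code as level "I", level "II", or "unrecognized"."""
    normalized = code.strip().upper()
    if LEVEL_I_PATTERN.fullmatch(normalized):
        return "I"
    if LEVEL_II_PATTERN.fullmatch(normalized):
        return "II"
    return "unrecognized"


def is_radiology_specialty(specialty_code: str) -> bool:
    """Return True for the three radiology-related specialty codes."""
    return specialty_code.strip() in RADIOLOGY_SPECIALTY_CODES


# The two mammography codes cited in the text fall into different levels.
film_level = hcpcs_level("76092")
digital_level = hcpcs_level("G0202")
```

A filter of this kind would typically be the first step in restricting a large claims extract to imaging services billed by radiologists.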
SPECIFIC CMS CLAIMS DATA SETS
CMS claims data sets can be categorized broadly as (1) aggregated data sets, (2) individual claims-level data sets, and (3) individual claims-level data sets linked to clinical registries (Table 1). We detail common data sets in each group below.
Physician/Supplier Procedure Summary Master Files
Background. CMS provides a wide range of data sets designated as public use files (PUFs) [27]. PUFs are nonidentifiable files in which the data have been edited and stripped of any information that could be used to identify unique beneficiaries, instead providing data solely at an aggregate level [28]. The Physician/Supplier Procedure Summary (PSPS) Master Files [29] are the PUFs that have to date been most widely used by policy-focused radiology health services researchers. These annual data sets contain a 100% summary of all Part B fee-for-service claims. Each file is aggregated by combinations of HCPCS code, provider specialty code, place-of-service code, and several additional modifiers [29]. For each entry, the file provides the total submitted services and charges, the total allowed services and charges, the total denied services and charges, and the total payment amounts [29]. Because the PSPS Master Files have been available for nearly a quarter century, they support longitudinal studies exploring extended temporal trends in imaging utilization and spending. In addition, the inclusion of the billing provider and the place of service facilitates analyses of imaging billed by radiologists versus nonradiologists and of imaging performed in specific settings of interest, respectively. Given that PSPS Master Files report only frequency data and that Medicare fee-for-service enrollment varies each year, those frequency data must be normalized for enrollment to calculate rates of utilization (eg, per 10,000 or 100,000 Medicare beneficiaries) [30].
Examples.
PSPS Master Files have been broadly applied in policy-focused radiology health services research, addressing such topics as temporal trends in the utilization of individual procedures (eg, CT angiography [31], CT colonography [32], and screening mammography [33]), the relative roles of radiologists and nonradiologists in individual procedures (eg, vascular ultrasound [34], endovascular neurointervention [35], lumbar puncture [36]), as well as shifts in the location of imaging (eg, from private offices to hospital facilities [37]).
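The enrollment normalization described for the PSPS Master Files can be sketched as follows. The service counts and enrollment figures are invented for illustration and are not drawn from any CMS file; the function name is likewise hypothetical.

```python
def utilization_rate(service_count: int, enrollment: int, per: int = 10_000) -> float:
    """Services per `per` Medicare fee-for-service beneficiaries."""
    if enrollment <= 0:
        raise ValueError("enrollment must be positive")
    return service_count * per / enrollment


# Hypothetical yearly totals: raw counts fall, but so does enrollment.
yearly_totals = {
    2010: {"services": 700_000, "enrollment": 35_000_000},
    2014: {"services": 660_000, "enrollment": 33_000_000},
}

rates = {
    year: utilization_rate(values["services"], values["enrollment"])
    for year, values in yearly_totals.items()
}
```

In this invented example the raw service count drops between the two years while the rate per 10,000 beneficiaries is unchanged, which is why unnormalized frequencies can be misleading for trend analyses.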
Medicare Provider Utilization and Payment Data: Physician and Other Supplier Public Use File
Background. A second PUF relevant to policy-focused health services research is the Medicare Provider Utilization and Payment Data: Physician and Other Supplier PUF [38], which was developed in response to an Obama administration transparency initiative. Unlike the PSPS Master Files, which aggregate data to focus on specific services, the Physician and Other Supplier PUF focuses on services at the individual provider level. The Physician and Other Supplier PUF contains utilization and payment information regarding services and procedures performed by providers for Medicare fee-for-service beneficiaries on the basis of a 100% sample of final-action physician/supplier Part B noninstitutional line items [39]. The data are organized at the level of unique combinations of provider (as indicated by NPI), service (as indicated by HCPCS code), and place of service (facility versus nonfacility). Thus, individual providers have numerous entries to reflect all unique rendered services. For each record reflecting a given HCPCS code rendered by a given provider, the file contains the total service count, the total unique beneficiaries receiving the service, the number of unique beneficiaries per day receiving the service, the average charges submitted by the provider for the service, the average payment amount allowed by Medicare (including deductibles and coinsurance), the average actual Medicare payment (not including deductibles and coinsurance), and the average standardized Medicare payment (which corrects for geographic variation in payments, for example because of differences in local wages or input prices, to facilitate geographic comparisons) [39]. Additional information for providers, along with their NPIs, includes name, gender, address, credential, and specialty. CMS also provides two aggregate tables based on the Physician and Other Supplier PUF [40].
One aggregate table provides summary data across all HCPCS codes at the individual provider level, such that each provider has a single entry. The provider-level summary data include total service counts, total unique billed services, total unique beneficiaries receiving care, total submitted charges, total allowed Medicare amounts, total actual Medicare payments, and total standardized Medicare payments [39]. This file also provides summary characteristics of the beneficiaries receiving care from each provider, including summary measures of beneficiary age, gender, race, presence of chronic conditions, and risk scores [39]. The
Journal of the American College of Radiology Rosenkrantz, Hughes, Duszak n Medicare Claims Data Resources For Radiologist
Table 1. Characteristics of Medicare claims data resources commonly used by policy-focused radiology health services researchers

| Characteristic | Physician/Supplier Procedure Summary Master Files | Physician and Other Supplier Public Use Files | Limited Data Set Standard Analytical Files | Carrier Research Identifiable File | SEER-Medicare Linked Database Files |
| --- | --- | --- | --- | --- | --- |
| Patient population | Medicare fee-for-service beneficiaries | Medicare fee-for-service beneficiaries | Medicare fee-for-service beneficiaries associated with a 5% national sample | Medicare fee-for-service beneficiaries associated with a 5% national sample | Medicare fee-for-service beneficiaries associated with a 5% national sample included in a SEER-participating cancer registry |
| Medicare sample | 100% | 100% | 5% | 5% | 93% of patients ≥65 years of age in SEER registries |
| Data format | Aggregated | Aggregated | Individual claims level | Individual claims level | Registry-linked individual claims level |
| Data field content | Submitted and paid claims frequency and payments at the CPT/HCPCS service level by specialty group and site of service | Paid claims frequency, charges, and payments at the CPT/HCPCS service level for all Medicare participating providers | Submitted and paid claims with both CPT/HCPCS and ICD codes; provider identifier; facility of service; beneficiary age, gender, race, and county; date of service since 2010 | All fields contained within Limited Data Set Standard Analytical Files as well as beneficiary ZIP code and date of birth; beneficiary Social Security number with special permission; date of service for years before 2010 | All fields contained within Carrier Research Identifiable File as well as tumor histology, size, stage, and grade, and treatments received |
| IRB approval required for research use | Generally not | Generally not | Possibly | Approval by CMS privacy board and local IRB are both required | Approval by local IRB required |
| Data access | External medium | Download from CMS website | External medium | External medium | External medium |
| Ability to handle using Excel | No | Possibly, but only if using aggregate tables or filtered version of detailed file | No | No | No |
| Years available | 1991-2014 | 2012-2014 | 1996 to a portion of 2016 | 1999-2015 | Services through 2014 for patients diagnosed with cancer through 2013 |
| Initial data acquisition fee | $250 per calendar year data set | None | $1,700/year or $1,075/quarter | Range from $4,000 for up to 1 million beneficiaries to $20,000 for 5 million to 20 million beneficiaries for a single requested quarter; additional fee of 50% of the first-quarter fee for each additional quarter; fee varies if additional request customization | $110 for each of a basic patient-level and provider-level summary file for a single cancer site for a single year, along with $85 for each additional Medicare file requested |

Note: CPT = Current Procedural Terminology; DUA = data use agreement; HCPCS = Healthcare Common Procedure Coding System; ICD = International Classification of Diseases; IRB = institutional review board; SEER = Surveillance, Epidemiology, and End Results.
other aggregate table summarizes the data by HCPCS codes at the national and state levels.
Examples. The detailed level of information available through the Physician and Other Supplier PUF allows granular analyses on the basis of the specific services offered by groups of providers. One study used the file to classify the subspecialty of academic radiologists on the basis of individual radiologists’ specific distribution of billed services, correctly recognizing subspecialties of diagnostic radiologists (eg, thoracic, abdominal, and musculoskeletal imaging) that are not among the three standard CMS provider codes [19]. Another study used the file to assess the impact on radiologists of proposed federal regulations that would use billed claims to determine physicians’ eligibility for special considerations in new payment models [41].
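The relationship between the detailed Physician and Other Supplier PUF and its provider-level aggregate table can be sketched as a simple roll-up. This is an illustrative sketch only: the field names (`npi`, `hcpcs`, `place_of_service`, `service_count`, `avg_medicare_payment`), the NPI, and all values are hypothetical and do not reproduce the actual CMS column names or CMS's own aggregation method.

```python
from collections import defaultdict


def summarize_by_provider(rows):
    """Collapse provider/service/place-of-service rows to one entry per provider."""
    totals = defaultdict(
        lambda: {"total_services": 0, "codes": set(), "total_payment": 0.0}
    )
    for row in rows:
        entry = totals[row["npi"]]
        entry["total_services"] += row["service_count"]
        entry["codes"].add(row["hcpcs"])
        # The detailed rows carry averages, so totals are count x average.
        entry["total_payment"] += row["service_count"] * row["avg_medicare_payment"]
    return {
        npi: {
            "total_services": entry["total_services"],
            "unique_hcpcs": len(entry["codes"]),
            "total_payment": round(entry["total_payment"], 2),
        }
        for npi, entry in totals.items()
    }


# Hypothetical detailed rows for a single provider.
detail_rows = [
    {"npi": "1234567890", "hcpcs": "71020", "place_of_service": "F",
     "service_count": 100, "avg_medicare_payment": 10.0},
    {"npi": "1234567890", "hcpcs": "71020", "place_of_service": "O",
     "service_count": 50, "avg_medicare_payment": 25.0},
    {"npi": "1234567890", "hcpcs": "74177", "place_of_service": "F",
     "service_count": 20, "avg_medicare_payment": 100.0},
]

provider_summary = summarize_by_provider(detail_rows)
```

Unique beneficiary counts are deliberately not summed here, because the same beneficiary may appear under several services for one provider; a provider-level beneficiary count cannot be recovered from the detailed rows alone.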
Limited Data Set Standard Analytical Files
Background. CMS Limited Data Sets (LDS) provide information at the individual beneficiary and encounter levels. Beneficiaries are assigned encrypted identifiers, and selected variables are either blanked or ranged to maintain anonymity [28]. Nonetheless, it is possible to reidentify beneficiaries by using technology to link the files to other data sets, and as such the LDS files are deemed to contain protected health information and require appropriate privacy oversight [15]. The LDS of greatest interest to radiology researchers is the Carrier Standard Analytical File (SAF), also referred to as the Medicare Claims File [42]. LDS SAFs are available for specific claim types, including inpatient, outpatient, and physician/supplier Part B claims, with each such file containing all claims associated with a 5% national sample of Medicare beneficiaries [43]. The Physician/Supplier Part B LDS SAF contains fields included in previously described data sets such as HCPCS codes, place of service, and information regarding submitted and allowed charges and payment [44]. However, the SAFs include a wealth of additional fields, including facility of service; International Classification of Diseases (ICD)-9 or ICD-10 codes; and beneficiary demographic information such as age, gender, race, and county [44]. The SAFs also include the date of service beginning in 2010, as well as the quarter of service for earlier years. Beneficiary ZIP code and date of birth are not provided.
Examples. The inclusion of beneficiary-level information such as age, gender, and race allows assessment of variations and disparities in care in a manner not possible using PSPS Master Files or the Physician and Other Supplier PUF. Moreover, the encrypted identifier allows far more powerful analyses related to distinct services received by individual beneficiaries, such as identifying beneficiaries who received specific combinations of services or examining downstream events subsequent to a baseline imaging service. One study used LDS to demonstrate that, in a multivariable model, advanced practice clinicians are associated with the ordering of more imaging services than primary care practitioners for similar-complexity patients during office visits [45]. Another used LDS to assess adherence to continuous breast cancer screening, demonstrating a decreased rate of subsequent screening mammography after the release of the revised US Preventive Services Task Force guidelines [46].
Carrier Research Identifiable File
Background. Research Identifiable Files (RIFs) contain information comparable with that in LDS but provide the date of service for years before 2010 as well as additional beneficiary-level protected health information, such as ZIP code, date of birth, and, with special permission, Social Security number [28]. Such information allows RIFs to be linked at the beneficiary level to a variety of non-CMS data sets [28]. Of greatest relevance to radiology is the Carrier (or Physician/Supplier Part B claims) RIF [47], which is comparable with the LDS Carrier SAF, containing HCPCS codes; ICD-9 or ICD-10 codes; reimbursement amounts; patient demographics; provider information; and dates of service. As with the LDS SAFs, only claims associated with a 5% sample of Medicare enrollees are available. But unlike the LDS, researchers may customize the requested cohort on the basis of combinations of various claims elements such as state, diagnosis, provider type, and location [47-49]. Requests for RIF are developed in conjunction with ResDAC and then submitted to ResDAC for further processing [50,51]. The ResDAC website provides numerous downloadable templates for preparing the application [52]. The review process may entail multiple rounds of revision to the initial request. CMS indicates that the process typically takes a minimum of 3 to 5 months [53] and potentially as long as 6 to 8 months [54].
Examples. The RIFs represent the most robust and powerful data set available from CMS. As with the LDS, they allow linking multiple disparate services received by
individual beneficiaries but go further by allowing cross-referencing with other data sources. One study used the Carrier RIF to demonstrate a higher frequency of downstream CT examinations in patients undergoing ultrasound in the emergency department setting when performed and interpreted by radiologists rather than by nonradiologist providers [55]. An additional study conducted an interrupted time-series analysis from 2005 through 2012 to demonstrate an immediate and significant decline in screening mammography rates after the 2009 revised US Preventive Services Task Force guidelines [56]. A further study linked Medicare claims data to 2004 to 2012 Medicare Chronic Conditions Warehouse data to show that tracking of imaging expenditures could identify patients with otherwise undiagnosed chronic conditions [57].
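The beneficiary-level linkage that the LDS and RIF make possible, such as examining downstream events after a baseline imaging service, can be sketched in Python as follows. The record layout (`bene_id`, `hcpcs`, `service_date`), the two HCPCS codes, and the 30-day window are assumptions chosen for illustration; they are not the methods of the cited studies, and the sketch presumes that exact service dates are available.

```python
from datetime import date, timedelta


def beneficiaries_with_downstream(claims, index_codes, downstream_codes, window_days=30):
    """Return IDs of beneficiaries who had a downstream service on or after
    an index service and no more than `window_days` later."""
    by_beneficiary = {}
    for claim in claims:
        by_beneficiary.setdefault(claim["bene_id"], []).append(claim)

    window = timedelta(days=window_days)
    flagged = set()
    for bene_id, bene_claims in by_beneficiary.items():
        index_dates = [c["service_date"] for c in bene_claims if c["hcpcs"] in index_codes]
        later_dates = [c["service_date"] for c in bene_claims if c["hcpcs"] in downstream_codes]
        if any(
            timedelta(0) <= later - start <= window
            for start in index_dates
            for later in later_dates
        ):
            flagged.add(bene_id)
    return flagged


# Hypothetical claims keyed by an encrypted beneficiary identifier.
claims = [
    {"bene_id": "A", "hcpcs": "76700", "service_date": date(2014, 1, 1)},
    {"bene_id": "A", "hcpcs": "74177", "service_date": date(2014, 1, 10)},
    {"bene_id": "B", "hcpcs": "76700", "service_date": date(2014, 1, 1)},
    {"bene_id": "B", "hcpcs": "74177", "service_date": date(2014, 3, 1)},
    {"bene_id": "C", "hcpcs": "74177", "service_date": date(2013, 12, 1)},
    {"bene_id": "C", "hcpcs": "76700", "service_date": date(2014, 1, 1)},
]

flagged = beneficiaries_with_downstream(claims, {"76700"}, {"74177"})
```

Only beneficiary A is flagged in this invented data: B's second service falls outside the window, and C's occurs before the index service. An analysis of this kind is not possible with the aggregated PUFs, which carry no beneficiary identifier.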
Surveillance, Epidemiology, and End Results–Medicare Linked Database Files
Background. The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI) uses information from a range of clinical cancer registries to provide statistics and epidemiologic information regarding the US cancer population [58]. Since 1973, SEER has monitored trends in the incidence, demographics, primary treatments, survival, and causes of death for a wide range of cancers. Its participating registries routinely collect information on patients newly diagnosed with cancer in specified geographic regions, which SEER uses to release annual updates. The SEER registries, NCI, and CMS have partnered to establish the collection of files described as the SEER-Medicare Linked Database [59]. This database merges SEER information with data regarding all Medicare claims rendered for patients with cancer from the time of Medicare enrollment until death, including services occurring before the time of diagnosis as well as during long-term follow-up after treatment. As such, it provides nearly all information contained in administrative claims data but enriched by comprehensive clinical registry data. The registries provide CMS with identifiers for all individuals in their files, which CMS then matches to its master enrollment file. Currently, the database provides information for 311,503 patients with lung cancer, 260,922 with breast cancer, and 369,026 with prostate cancer [60]. The detailed SEER registry data include tumor histology, size, stage, grade, and treatments received. Medicare incorporates additional encrypted
information regarding the physicians and hospitals rendering such services, as well as information associated with a 5% national sample of Medicare patients residing in SEER-monitored areas but without cancer diagnoses, thereby serving as a control cohort [61,62]. Given the possibility of patient reidentification, the SEER-Medicare Linked Database is not a PUF [63]. Rather, investigators must submit requests that are reviewed by SEER and NCI [64,65]. Such requests must seek data for specific cancer site(s) and year(s), as well as for specific Medicare files; the entire database cannot be requested. The review process typically takes 4 to 6 weeks, with an additional 4 to 6 weeks then needed to prepare the data [64].
Examples. The NCI indicates that interest in the SEER-Medicare Linked Database has grown “exponentially,” with a continually growing number of requests [66], and indeed, numerous studies have applied the data for radiology health services research. McCarthy et al [67] used the database to demonstrate that prior mammography use could help explain the tendency for a more advanced stage of breast cancer diagnosis for older black than older white women. Farjah et al [68] demonstrated that lung cancer survival improved with an increasing number of modalities (among CT, PET, and invasive procedures) used for mediastinal staging. Kokabi et al [69] demonstrated access to cancer-directed therapies, including transarterial chemoembolization and ablation, to be important contributors to prolonged overall survival for patients with unresectable hepatocellular carcinoma and favorable sociodemographic factors.
LIMITATIONS
Although CMS claims data sets offer enormous potential for policy-focused radiology health services researchers, these data sets also carry a number of limitations. First, because the data sets contain claims data solely for the Medicare fee-for-service population, no information is available regarding patients receiving insurance coverage through a Medicare Advantage plan, Medicaid, or private insurers. In addition, Medicare does not ensure the complete accuracy of all of the administrative data [70], which are susceptible to errors due to incorrect or incomplete billing. Moreover, to avoid potential reidentification, CMS prohibits displaying a count of 10 or fewer in any individual cell [39] (eg, for the number of rendered services or the number of patient
admissions or discharges, instead blanking such entries in the files). This removal of small counts may affect the overall results if occurring at a sufficient frequency for enough cells within a given analysis. Finally, from a practical standpoint for imaging health services research, the organization of CMS data by individual HCPCS code is too granular for many desired focused analyses, given that more than 1,000 such codes are not uncommonly performed by radiologists. Rather, families of imaging-related HCPCS codes (eg, related to a given body region or focus area) must be defined to allow meaningful and standardized investigations. Although such definitions have previously been lacking, the recently proposed Neiman Imaging Types of Service classification system for noninvasive diagnostic imaging professional services [71] stands poised to address this gap and to allow more reproducible and targeted research of imaging trends using national claims data.
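The small-cell restriction described above, and its potential effect on aggregate results, can be illustrated with a brief sketch. The threshold follows the "10 or fewer" wording in the text; the function name, the cell labels, and the counts are hypothetical, and the handling of zero counts in actual CMS files is not addressed here.

```python
def suppress_small_cells(counts, threshold=10):
    """Blank (set to None) any cell whose count is at or below the threshold."""
    return {
        cell: (None if value <= threshold else value)
        for cell, value in counts.items()
    }


# Hypothetical service counts for four provider/service cells.
raw_counts = {"cell_1": 250, "cell_2": 10, "cell_3": 7, "cell_4": 11}

released = suppress_small_cells(raw_counts)

# Totals computed from the released table omit the blanked cells.
true_total = sum(raw_counts.values())
visible_total = sum(value for value in released.values() if value is not None)
undercount = true_total - visible_total
```

In this invented example, two of four cells are blanked and the visible total falls short of the true total by 17 services, showing how suppression can bias sums when many cells in an analysis hold small counts.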
CONCLUSIONS
National data sets regarding the utilization of and spending on health care services are the mainstay of policy-focused health services research. A variety of CMS fee-for-service data sets offer trade-offs between their degree of comprehensiveness and ease of access. This review is intended to familiarize policy-focused imaging health services researchers with several particularly relevant CMS fee-for-service data sets to guide such investigators in most effectively identifying proper data sources to support their research.
TAKE-HOME POINTS
- CMS offers a number of national administrative fee-for-service data sets containing information regarding billed claims. These can provide a foundation for a variety of policy-focused imaging health services research studies.
- The Physician/Supplier Procedure Summary (PSPS) Master Files and Medicare Provider Utilization and Payment Data: Physician and Other Supplier File are both public use files (PUFs) based on 100% samples of claims with aggregate data. These can facilitate a broad variety of high-level descriptive studies on imaging utilization and spending but do not support assessments of downstream outcomes.
- The Carrier Standard Analytical File Limited Data Set (SAF LDS) and the Carrier Research Identifiable File (RIF) both contain 5% random samples of actual beneficiary-level claims. They permit some assessments of downstream outcomes related to imaging, but both entail substantially greater costs to obtain and analyze data.
- The SEER-Medicare Linked Database files merge extensive clinical information from national cancer registries with information from all lifetime Medicare fee-for-service claims for such patients, enabling powerful analyses of the role of imaging in the cancer population, including its impact on downstream outcomes.
- Understanding the details and trade-offs of various administrative CMS data sets will aid policy-focused imaging health services researchers in selecting appropriate data sources for their investigations.
ADDITIONAL RESOURCES
Additional resources can be found online at: http://dx.doi.org/10.1016/j.jacr.2017.04.005.