Tutorial R
Divining Healthcare Charges for Optimal Health Benefits Under the Affordable Care Act
Ken Yale, DDS, JD
Chapter Outline
Introduction
Pre-Tutorial Background on Original Data Set
Hospital Charge Analysis
Initial Data Exploration
Is There a Relationship Between Mean Charges and Status (Public or Private)?
Relationship of Charges to Location
Quality Matters
INTRODUCTION
This tutorial takes you through a data set on hospital charges for an inpatient stay for heart failure, gathered from all the hospitals in the State of New York. The State of New York recently released charge and cost data (the latter estimated using a formula) for all of its hospitals. With enormous changes taking place in healthcare insurance coverage and costs, this information may become very important for consumers as they look for the best value (i.e., the highest quality for the lowest charge) in caring for their health.
Health care is one of the most perplexing areas for price comparison. Over the years, hospitals and physicians have charged a government-mandated price, negotiated prices with health insurance companies, or charged whatever the market would bear for those without health insurance. As a result, hospitals and physicians have learned to shift costs around, so that lower government-mandated prices could be offset by charging more to insurance companies or to persons without health insurance. Because of this price shifting, it has been almost impossible to determine how much you might pay for a procedure or for a visit to the hospital or doctor's office, and there was little demand for the information because you might never see the charges or costs if your health insurance company handled them.
This situation could change in the near future. The Affordable Care Act, among other things, requires almost everyone to have health insurance coverage so that they can pay for the care they receive. Whether through employers, government benefit programs, or individual health insurance policies, most U.S. citizens are required by federal law to have health insurance coverage.
The new federal requirement could make it more important than ever to understand the amount you are charged for a service and the quality measured for it, and could increase demand for hospitals that provide higher-value products and services. Here is some background on this growing demand for price comparison. The new individual health insurance policies created by the Affordable Care Act come in tiers known as metal levels (Bronze, Silver, Gold, and Platinum). A “Gold” plan offers richer benefits and lower payments every time you go to the hospital or see a physician (these payments are known as “copayments” or “copays”). The higher benefits and lower out-of-pocket copays mean the plan is more expensive to purchase. Since the monthly “premium” can be $800 or more (roughly the monthly payment on an expensive car), many people may decide to purchase a less expensive “Silver” or “Bronze” plan (if they purchase a health insurance policy at all; they can instead decide not to buy insurance and pay a federal penalty for not having coverage). Silver and Bronze plans can have significantly lower monthly premiums, but in exchange they have higher copayments and deductibles (the deductible is the annual amount you have to pay before the insurance kicks in and starts paying anything). A person with health insurance coverage can therefore have significant out-of-pocket costs when visiting a hospital or doctor, and these costs can become very high before the deductible is met. As a result, knowing what a hospital or doctor charges for a product or service can be very important, and demand for such information is expected to grow as more people get health insurance coverage.

Practical Predictive Analytics and Decisioning Systems for Medicine. DOI: http://dx.doi.org/10.1016/B978-0-12-411643-6.00035-1 © 2015 Ken Yale. Published by Elsevier Inc. All rights reserved.

PART | 2 Practical Step-by-Step Tutorials and Case Studies

This tutorial looks at inpatient hospital charges, based on newly released information from the New York State Department of Health (https://health.data.ny.gov/Health/Hospital-Inpatient-Discharges-SPARCS-De-Identified/u4ud-w55t). That data set is 940 MB (almost a gigabyte), which might be difficult to download to your computer. Therefore, we went to the smaller Hospital Inpatient Cost Transparency data set (https://health.data.ny.gov/Health/Hospital-Inpatient-CostTransparency-Beginning-200/7dtz-qxmr), which is still relatively large at 61 MB: more than 380,000 rows of data on about 315 different procedures, with variables in 14 columns. From it we extracted the data set you see here: the charges and associated costs for the moderate-severity “Heart Failure” procedure (DRG 194, severity level 2) for all hospitals in New York State. We could have used any of the 315 procedures; we chose this one because a couple of quality metrics are associated with that DRG.
To this extract we added back zip codes (which were in the original 940-MB data but removed in the 61-MB version), the ownership status of each hospital (Publicly or Privately owned), its academic status (often called “academic medical center,” but here we use the simpler Training or Non-training hospital designation), and two proxies for quality: the AMI and HF process measures. The following is additional background information from the New York State Department of Health:

Hospital Inpatient Cost Transparency: Beginning 2009
This data set contains information submitted by New York State Article 28 Hospitals as part of the New York Statewide Planning and Research Cooperative (SPARCS) and Institutional Cost Report (ICR) data submissions. The data set contains information on the volume of discharges, All Payer Refined Diagnosis Related Group (APR-DRG), the severity of illness level (SOI), medical or surgical classification, the median charge, median cost, average charge and average cost per discharge. When interpreting New York’s data, it is important to keep in mind that variations in cost may be attributed to many factors. Some of these include overall volume, teaching hospital status, facility specific attributes, geographic region, and quality of care provided.

For more information, check out: http://www.health.ny.gov/statistics/sparcs/
PRE-TUTORIAL BACKGROUND ON ORIGINAL DATA SET
The original Cost Transparency data set reported charges and costs for all procedures for the 3 years 2009, 2010, and 2011. Figure R.1 shows a screenshot of the original 61-MB data set. For our initial view of the data, we used the Filter function on the Data tab to check for missing data (blanks) (Figure R.2). The Filter also lets you sort the data alphabetically. This initial review showed very few blanks, and sorting the hospital names alphabetically identified some hospitals that reported for only one or two of the three years. On further investigation, it turned out that those hospitals had closed, did not perform certain procedures, or did not perform the procedure every year. In addition, the number of different procedures in the data set was very large, which would make for an unwieldy tutorial. We therefore extracted the Heart Failure procedure (DRG 194) and, within DRG 194, the Moderate cases (Severity 2), which seemed to have data for most hospitals in the data set as well as an associated quality metric. We then removed the hospitals that lacked usable data for all 3 years, including:
• Peninsula Hospital Center: closed
• Women and Children’s Hospital of Buffalo: no 2010 data
• TLC Health Network Tri-County Memorial Hospital: only 2009 data
• SVCMC-St Vincents Manhattan: only 2009 and 2010 data
• Sunnyview Hospital and Rehabilitation Center: only 2009 data
• North General Hospital: only 2009 data
• Long Island Jewish Schneider Children’s Hospital Division: too few procedures (one in 2009, three in 2010, four in 2011)
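For readers who want to reproduce the year-completeness screen outside STATISTICA, here is a minimal Python sketch. It assumes each row is a dictionary with hypothetical keys "Facility ID" and "Year"; the sample rows are made up and are not taken from the New York data.

```python
from collections import defaultdict

REQUIRED_YEARS = frozenset({2009, 2010, 2011})

def incomplete_facilities(rows, required_years=REQUIRED_YEARS):
    """Return the Facility IDs that lack a row for at least one required year."""
    years_seen = defaultdict(set)
    for row in rows:
        years_seen[row["Facility ID"]].add(row["Year"])
    # A facility is complete only if every required year is among the years it reported
    return {fid for fid, years in years_seen.items() if not required_years <= years}

# Made-up rows: facility 1 reports every year, facility 2 only in 2009 and 2010
rows = [
    {"Facility ID": 1, "Year": 2009},
    {"Facility ID": 1, "Year": 2010},
    {"Facility ID": 1, "Year": 2011},
    {"Facility ID": 2, "Year": 2009},
    {"Facility ID": 2, "Year": 2010},
]
flagged = incomplete_facilities(rows)
complete_rows = [row for row in rows if row["Facility ID"] not in flagged]
print(flagged, len(complete_rows))
```

This check covers only missing years; a hospital that reported every year but with very few cases (like the last one in the list) would need a separate check on Discharges.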
FIGURE R.1 View of the original Hospital Inpatient Cost Transparency dataset.
FIGURE R.2 Using the Filter function in Statistica.
Finally, we added information on training hospitals (also known as Academic Medical Centers), because these institutions generally advertise their university affiliation, giving the appearance of higher-quality care (although the evidence for this is not settled). It would be interesting to see whether they charge more or less than other hospitals, and whether they in fact provide higher-quality care. The information on training hospitals came from www.healthguideusa.org/teaching_hospitals_new_york.htm and from the Association of American Medical Colleges at www.aamc.org/members/coth/. We then added information on whether each hospital was government owned (Public) or privately owned (Private), obtained from the New York City Department of Health & Human Services (http://home.nyc.gov/portal/site/nycgov/menuitem.12383c1cbb72dee6a62fa24601c789a0/) and from the federal Centers for Medicare and Medicaid Services (CMS) Medicare Compare hospital database (https://data.medicare.gov/Hospital-Compare/Hospital-General-Information/v287-28n3?category=Hospital-Compare&view_name=Hospital-General-Information). These data let us compare charges between Public and Private hospitals in the State of New York. The Private hospitals are all non-profit, as the New York State legislature outlawed for-profit hospitals.
HOSPITAL CHARGE ANALYSIS
We first want to get familiar with the data and see whether there are simple relationships between hospital charges and the other variables. Once you open STATISTICA, import the data set labeled Hospital Inpatient Cost Transparency Tutorial by going to the Home tab, clicking the Open icon, and selecting Open Document. Double-click the file and it opens in the STATISTICA desktop (Figure R.3). You can click “Import all sheets to a Workbook”; in this case there is only one data sheet in the file (Figure R.4). The data set contains the following variables (Figure R.5).
Facility ID: The identification number given to the facility by the State of New York: a unique code identifying a facility location certified to provide healthcare services under Article 28 of the Public Health Law. This number is assigned when the facility receives a Certificate of Operation, and is also known as the Permanent Facility Identifier (PFI). There are 194 unique facilities in this data set.
Facility Name: The name of the facility where services were performed, based on the Permanent Facility Identifier (PFI), as maintained by the New York State Department of Health Division of Health Facility Planning. Note that this field contains the Facility Name current as of the update date of this record. It is not specific to the discharge year, and names change as hospitals change ownership.
Zip Code: The zip code of the hospital as recorded by the New York State Department of Health (http://hospitals.nyhealth.gov/browse_search.php?form=ALL).
Status: Public or Private. Hospitals are designated Public based on the lists maintained by the New York City Department of Health & Human Services and by the CMS Medicare Compare website.
AMC: “Academic Medical Center.” In this data set it indicates whether the hospital is a training hospital, employing a significant number of physician “residents” in training and associated medical school faculty. “T” indicates a training hospital and “N” indicates the hospital is not a training facility.
FIGURE R.3 The Open icon allows you to find and open the dataset.
FIGURE R.4 Importing the dataset to a Workbook.
FIGURE R.5 View of the opened Hospital Inpatient Cost Transparency dataset file modified for this tutorial.
APR DRG: Diagnosis Related Groups (DRGs) are a patient classification system that relates the types of patients a hospital treats (i.e., its case mix) to the costs the hospital incurs. There are currently three major versions of the DRG in use: basic DRGs, All Patient DRGs, and All Patient Refined DRGs. The basic DRGs are used by the Centers for Medicare and Medicaid Services (CMS) for hospital payment for Medicare beneficiaries. The All Patient DRGs (AP-DRGs) expand the basic DRGs to be more representative of non-Medicare populations, such as pediatric patients. The All Patient Refined DRGs (APR-DRGs) incorporate severity of illness subclasses into the AP-DRGs.
APR DRG Description: The name given to the particular APR DRG code. For example, DRG 194 is the code for Heart Failure.
APR Severity of Illness Code: The APR-DRGs have a severity of illness sub-code classification. According to 3M, the company that created the APR-DRGs, severity of illness is the “extent of physiologic decompensation or organ system loss of function.” It is one of several factors used to define the complexity of a particular patient case. There are four severity of illness subclasses, numbered 1 to 4, indicating minor, moderate, major, or extreme severity of illness, respectively.
APR Medical Surgical Code: Procedures are classified as either medical or surgical. Surgical procedures may require more intensive resources, such as an operating room, anesthesia, or a recovery room. The distinction between medical and surgical also helps define the clinical specialty involved. In this data set “M” designates a medical procedure.
Year: The calendar year in which the procedures were performed and for which the Charges, Costs, and other information were reported.
Discharges: The number of patients discharged from the hospital for that particular APR-DRG in the year indicated.
Charge: The amount charged by the facility for the specific procedure.
Cost: Costs are estimates based on facility data reported to the New York Statewide Planning and Research Cooperative. Cost is estimated for each procedure by applying a Ratio of Cost to Charges (RCC) to the charge. RCCs are specific to the procedure, the facility, and the severity of the patient's illness. RCCs are calculated and reported by the individual facility, and they are certified and may be audited.
The State of New York provides an example: “For example, if hospital charge is $20,000 and the RCC is 50%, the estimated cost is $10,000.”
AMI: The Acute Myocardial Infarction quality measure is a composite measure of the overall processes of care used to treat heart attack patients (for more information, see the National Quality Measures Clearinghouse AMI web page at www.qualitymeasures.ahrq.gov/content.aspx?id=35572). In this data set, the measure is scored out of a possible 100.
HF: The Heart Failure quality measure is a composite measure of the overall processes of care used to treat heart failure patients (for more information, see the National Quality Measures Clearinghouse HF web page at www.qualitymeasures.ahrq.gov/content.aspx?id=35573). In this data set, the measure is scored out of a possible 100.
A note on the hospital performance measures AMI and HF: the measures included in the file were “screen scraped” from the quality tab of the New York State Department of Health hospital profile (see an example at http://hospitals.nyhealth.gov/browse_view.php?id=208&p=quality&hpntoken=1). This data set includes the composite measures for heart attack (AMI) and heart failure (HF), which we use as a proxy for quality. These are process measures, however, and are not adjusted for the complications or difficulties of individual patients (known as “risk adjustment”). For example, a hospital can follow an excellent process and still have a bad outcome. Many other outcome and quality measures give a better indication of the quality of care. In addition, the data are based on content on the website in 2012, and the underlying data came from CMS Hospital Compare data submissions, which could have been measured some time before they appeared on the CMS website. More information may be found at http://hospitals.nyhealth.gov/faq.php.
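The RCC arithmetic in the State's example is a single multiplication: estimated cost = charge × RCC. A minimal Python sketch follows; the function name `estimated_cost` is hypothetical, and it assumes the RCC is expressed as a fraction (0.50 for 50%).

```python
def estimated_cost(charge: float, rcc: float) -> float:
    """Estimate cost as charge x RCC, with the RCC given as a fraction (0.50 = 50%)."""
    if charge < 0 or rcc < 0:
        raise ValueError("charge and RCC must be non-negative")
    return charge * rcc

# The State's worked example: a $20,000 charge with a 50% RCC
print(estimated_cost(20_000, 0.50))  # 10000.0
```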
Initial Data Exploration
To get a feel for the data, and perhaps discover some initial information, we are going to see how hospital charges vary with the other variables: location (Zip Code), ownership (Private or Public), academic status (T for training institution and N for not a training institution), and proxies for quality (AMI and HF). Mean Charge is therefore our dependent variable, and the other variables are predictor variables. We shall first explore the data graphically, looking for interesting relationships between variables.
Is There a Relationship Between Mean Charges and Status (Public or Private)?
We shall first create a means plot by going to the Graphs tab and clicking the Means button (Figure R.6). Click the Variables button, select Mean Charge as the continuous dependent variable and Status as the grouping variable, then click OK and OK again (Figure R.7). It appears that Public hospitals, on average, charge slightly more than Private hospitals, with a very wide range of charges among Public hospitals and a much narrower range among Private hospitals (Figure R.8). The wide range of charges looks unusual (one of those “That’s odd . . .” moments) and may require further exploration. Let’s continue and see what else we find.
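A means plot is essentially a per-group summary of the dependent variable. As a rough text equivalent, the Python sketch below groups rows by Status and reports each group's mean, minimum, and maximum Mean Charge. The column names mirror the tutorial's variables, but the rows are made-up values, and we have not checked what STATISTICA's whiskers represent (they may be confidence intervals rather than the min/max used here).

```python
from collections import defaultdict
from statistics import mean

def summarize_by_group(rows, group_key, value_key):
    """Return {group: (mean, min, max)} of a numeric column grouped by a category."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {g: (mean(v), min(v), max(v)) for g, v in groups.items()}

# Illustrative (made-up) rows, not values from the tutorial data set
rows = [
    {"Status": "Private", "Mean Charge": 20000},
    {"Status": "Private", "Mean Charge": 24000},
    {"Status": "Public", "Mean Charge": 18000},
    {"Status": "Public", "Mean Charge": 60000},
]
for status, (avg, low, high) in summarize_by_group(rows, "Status", "Mean Charge").items():
    print(f"{status}: mean={avg:,.0f}  range={low:,}-{high:,}")
```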
FIGURE R.6 Locating the Means Plot button in the Graphs tab.
FIGURE R.7 Selecting variables for the means plot.
Relationship of Charges to Location
A scatterplot can help us look at the relationship between Mean Charges and Zip Code. Since the lower-numbered zip codes in New York State happen to be in the more populated eastern parts of the state, and higher-numbered zip codes tend to be in more rural and western areas, it will be interesting to see whether charges follow a similar pattern. Go to the Graphs tab, click the Scatterplot button, and select Zip Code as the X variable and Mean Charges as the Y variable. Then click OK and OK again (Figure R.9).
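To put a number on what the scatterplot shows, you could compute a Pearson correlation coefficient between Zip Code and Mean Charge. The sketch below is a minimal pure-Python version; treating zip codes as numbers is only a rough geographic proxy (as the paragraph above suggests), Pearson's r measures only linear association, and the sample points are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up points in which charges fall as zip codes rise
zips = [10016, 11201, 12208, 13210, 14203]
charges = [38000, 35000, 27000, 24000, 22000]
print(round(pearson_r(zips, charges), 2))
```

A negative r would correspond to higher charges in the lower-numbered (eastern, more populated) zip codes.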
FIGURE R.8 Initial Private hospitals and Public hospitals means plot comparison.
FIGURE R.9 Choosing variables to set up a scatterplot graph.
The graph in Figure R.10 suggests a relationship between Zip Code and Mean Charges (charges tend to fall as zip codes rise), but another notable feature is the outlier at around $121,000. This should be explored further, as it may also explain the wide range of charges identified for Public hospitals. One way to check for outliers is a box plot. Go to the Graphs tab, click the Histogram button, and choose Mean Charge as the Variable (Figure R.11). The resulting graph shows all charges below $60,000 except one above $120,000 (Figure R.12).
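Box plots commonly flag points beyond 1.5 times the interquartile range (IQR) from the quartiles (Tukey's fences). The Python sketch below applies that common convention using `statistics.quantiles` with its default "exclusive" method; we have not confirmed that it matches STATISTICA's box plot defaults, and the charges are made up to include one extreme value.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot whisker rule."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; needs at least two values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up mean charges with one extreme value
charges = [18000, 21000, 23000, 25000, 27000, 30000, 34000, 41000, 121000]
print(tukey_outliers(charges))
```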
FIGURE R.10 Scatterplot showing relationship between mean charges and zip code; note the outlier.
FIGURE R.11 Selecting a variable for a box plot histogram.
We locate the outlier and find that it is an abnormally high charge from a hospital that performs very few of these procedures. Performing few procedures could mean the hospital does not normally handle these kinds of cases, or the figure could be a mistake. Alternatively, as a Public hospital, it may face political pressures that create a unique situation. All of these are guesses; we decide to remove the outlier because it skews the data. We remove all three rows (2009, 2010, and 2011 data) for that hospital (Figure R.13).
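Removing every year of data for the flagged hospital, not just the extreme row, can be sketched as a filter on Facility ID. In this minimal Python illustration, the $100,000 threshold, the facility IDs, and the charges are all hypothetical values chosen for the example.

```python
def drop_facilities(rows, facility_ids):
    """Remove every row (all years) belonging to the given Facility IDs."""
    drop = set(facility_ids)
    return [row for row in rows if row["Facility ID"] not in drop]

# Made-up rows: facility 202 has one extreme year
rows = [
    {"Facility ID": 101, "Year": 2009, "Mean Charge": 20000},
    {"Facility ID": 101, "Year": 2010, "Mean Charge": 21000},
    {"Facility ID": 101, "Year": 2011, "Mean Charge": 22000},
    {"Facility ID": 202, "Year": 2009, "Mean Charge": 30000},
    {"Facility ID": 202, "Year": 2010, "Mean Charge": 121000},
    {"Facility ID": 202, "Year": 2011, "Mean Charge": 45000},
]
# Hypothetical threshold for flagging a facility
flagged = {row["Facility ID"] for row in rows if row["Mean Charge"] > 100_000}
cleaned = drop_facilities(rows, flagged)
print(flagged, len(cleaned))
```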
FIGURE R.12 Boxplot histogram showing distribution of variables and revealing an outlier.
FIGURE R.13 Outliers located for removal from dataset.
With the outlier removed, we go back to the histogram of Mean Charges and see a more even distribution of charges. Note that the charges are still skewed to the right (toward higher charges), but without the extreme outlier (Figure R.14). We also run the scatterplot again to view the relationship between Zip Code and Mean Charges, and see a more pronounced slope (Figure R.15). Finally, we reopen the previous Means analysis of Mean Charges by Public or Private status. Public hospitals now show lower overall charges, but still a wide range of charges (Figure R.16).
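Right skew can also be checked numerically: the mean sits above the median, and the moment coefficient of skewness g1 = m3 / m2^1.5 is positive. This is a minimal Python sketch using the population-moment form of g1 (other software may report an adjusted sample version); the charges are made up.

```python
from statistics import mean, median

def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (> 0 means a longer right tail)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up charges with a long right tail
charges = [18000, 20000, 21000, 22000, 24000, 27000, 33000, 45000]
print(f"mean={mean(charges):,.0f}  median={median(charges):,.0f}  g1={moment_skewness(charges):.2f}")
```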
FIGURE R.14 Boxplot histogram showing distribution of variables after outlier removed.
FIGURE R.15 New scatterplot after outlier removed showing relationship between mean charges and zip code.
We also want to see whether there is a relationship between Mean Charges and Teaching status. To get a quick graphical view, we go to the Graphs tab, click the Means button, and use Mean Charge as the continuous dependent variable and Teaching status (AMC) as the grouping variable (Figure R.17). Teaching hospitals appear to have higher charges (Figure R.18). Since we are looking at the effect of Private versus Public status, and Teaching versus Non-teaching status, on Mean Charges, it is worth exploring the data further to find the proportions of Private to Public hospitals and of Teaching to Non-teaching hospitals. To view this, go to the Graphs tab, click the Histogram button, and use Status as the Variable to see how Private and Public hospitals compare (Figure R.19).
FIGURE R.16 New Private hospitals and Public hospitals means plot comparison after outlier removed.
FIGURE R.17 Setting up means plot, choosing variables.
FIGURE R.18 Relationship between Mean Charges and teaching status.
FIGURE R.19 Setting up box plot histogram, choosing variable Status (Private or Public).
Private hospitals outnumber Public hospitals by more than five to one (Figure R.20). To view Teaching versus Non-teaching hospitals, go to the Graphs tab, click the Histogram button, and select AMC as the Variable (Figure R.21). The graph shows that about 33% of hospitals are labeled as Teaching (Figure R.22).
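The category counts behind these histograms can be sketched in Python. Because the data set has one row per hospital per year, this sketch counts each hospital once by Facility ID; when every hospital has the same number of rows, the proportions are the same either way. The rows are made up, and the key names are assumptions that mirror the tutorial's variables.

```python
from collections import Counter

def category_shares(rows, key, id_key="Facility ID"):
    """Count each hospital once (rows repeat per year); return {category: (count, share)}."""
    per_hospital = {}
    for row in rows:
        per_hospital.setdefault(row[id_key], row[key])  # keep the first value seen per hospital
    counts = Counter(per_hospital.values())
    total = sum(counts.values())
    return {cat: (n, n / total) for cat, n in counts.items()}

# Made-up rows: three hospitals, two of them reporting in two years
rows = [
    {"Facility ID": 1, "Year": 2010, "AMC": "T"},
    {"Facility ID": 1, "Year": 2011, "AMC": "T"},
    {"Facility ID": 2, "Year": 2010, "AMC": "N"},
    {"Facility ID": 2, "Year": 2011, "AMC": "N"},
    {"Facility ID": 3, "Year": 2011, "AMC": "N"},
]
for cat, (n, share) in sorted(category_shares(rows, "AMC").items()):
    print(f"{cat}: {n} hospitals ({share:.0%})")
```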
FIGURE R.20 Private hospitals compared to Public hospitals: numbers of hospitals in the state.
FIGURE R.21 Setting up box plot histogram, choosing variable AMC (teaching or non-teaching).
From our initial review, it appears that Teaching hospitals charge more than Non-teaching hospitals, Public hospitals charge less than Private hospitals, and hospitals in and near New York City charge more than hospitals farther away. Further analysis is needed to determine the relationship with quality of care, and to assess whether the higher-charging hospitals provide higher-quality care. We perform these analyses next.
FIGURE R.22 Teaching hospitals compared to non-teaching hospitals: numbers of hospitals in the state.
QUALITY MATTERS
Now that we have a feel for the data, and see that there are some correlations between the variables, we turn to the matter of quality. Charges might be higher because of greater use of resources and better resulting quality. If this is the case, you might “get what you pay for” and may wish to pay for the better quality. On the other hand, if quality is not related to charges and we wish to go to the highest-quality hospital at the lowest cost (the current definition of “value” in the healthcare ecosystem), then we would like to know how best to predict which hospital offers the best value. This is where predictive analytics can help. Here, we use a limited number of variables to simplify the example; a fuller understanding of the relationship between quality and cost requires many more quality measures and is beyond the scope of this tutorial. The quality metrics in this tutorial data set are the HF and AMI measures, which CMS requires hospitals to report. These are surrogates rather than comprehensive measures of quality, as they are process (rather than outcome) metrics and use only a small set of parameters. As CMS describes, HF “estimates a hospital-level risk-standardized mortality rate (RSMR), defined as death from any cause within 30 days after the index admission date, for patients discharged from the hospital with a principal diagnosis of heart failure (HF)” (www.qualitymeasures.ahrq.gov/content.aspx?id=35573). The AMI measure is even further removed from the Heart Failure DRG, the subject of our inquiry. CMS states that AMI “estimates a hospital-level risk-standardized mortality rate (RSMR), defined as death from any cause within 30 days after the index admission date, for patients discharged from the hospital with a principal diagnosis of acute myocardial infarction (AMI)” (www.qualitymeasures.ahrq.gov/content.aspx?id=35572).
Nevertheless, the same hospital services treat both the acute myocardial infarction and heart failure diagnosis groups, which is why we include these measures in this tutorial as surrogates for quality. Many other measures of quality can be used; the reader can find them at the National Quality Measures Clearinghouse (www.qualitymeasures.ahrq.gov/index.aspx). Additional measures used by the New York State Department of Health can be accessed on its website (http://hospitals.nyhealth.gov/technotes.php). The example in this tutorial predicts quality of care, with the HF metric as the target variable. The predictor variables are Zip Code, Status (Private or Public), AMC (Teaching or Non-teaching), Discharges, and Mean Charge. All of these variables except Discharges have been discussed previously, and some relationship seems to exist among them. Discharges is included on the hypothesis that the more procedures a hospital performs (here Discharges serves as a surrogate for the number of procedures), the better it gets at them, and therefore the higher its quality of care.
FIGURE R.23 Locating Sort Cases functionality in Statistica.
FIGURE R.24 Sorting by year to isolate observations (rows) with missing HF values.
We first look at the HF target variable and see that there are numerous missing values. This is because the measures were obtained only for 2011. To run the analysis we need to replace or remove the missing values. For 2009 and 2010, we are going to remove the cases, all of which have HF designated as UK (for unknown). First, right-click the header of the Year variable and click Sort Cases (Figure R.23). The Sort Options dialog allows us to sort by Year only (Figure R.24). When we click OK, the other variables are included in the sort by default (Figure R.25).
FIGURE R.25 Sorting by year allows you to isolate rows (cases) from 2009 and 2010, which are missing values.
FIGURE R.26 Highlighting the 2009 and 2010 cases, starting from the top.
We then highlight the cases for 2009, starting from the top (Figure R.26). Scroll down to the last 2010 case and hold the Shift key while clicking it to extend the selection over all 2009 and 2010 cases; then, in the Edit tab, click the Delete button and choose Cases (Figure R.27). A dialog box then asks you to confirm the cases you wish to delete (Figure R.28).
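The sort-and-delete steps amount to keeping only the 2011 rows. A minimal Python sketch follows, with made-up rows; it also checks the claim in the text that every deleted 2009/2010 case has HF recorded as "UK". The function name `split_by_year` is hypothetical.

```python
def split_by_year(rows, keep_year=2011):
    """Split rows into (kept, dropped) by Year, mirroring the sort-and-delete steps."""
    kept = [row for row in rows if row["Year"] == keep_year]
    dropped = [row for row in rows if row["Year"] != keep_year]
    return kept, dropped

# Made-up rows in the shape described in the text
rows = [
    {"Facility ID": 1, "Year": 2009, "HF": "UK"},
    {"Facility ID": 1, "Year": 2010, "HF": "UK"},
    {"Facility ID": 1, "Year": 2011, "HF": 96},
    {"Facility ID": 2, "Year": 2011, "HF": "NA"},
]
kept, dropped = split_by_year(rows)
# Sanity check from the text: every deleted 2009/2010 case should have HF = "UK"
assert all(row["HF"] == "UK" for row in dropped)
print(len(kept), "rows kept,", len(dropped), "rows dropped")
```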
FIGURE R.27 All 2009 and 2010 rows (cases) highlighted, locating Delete button in Edit tab of Statistica.
FIGURE R.28 Affirming cases to delete.
What remains are the cases for 2011. Here, we take all the missing values, designated by either 0 or NA (for “Not Available”), and substitute 94.5, which is the mean of the 169 existing values. We could also use the Mean imputation function, but since only 25 of the 194 values are missing, we simply change them to 94.5. To do this, highlight the column you wish to change, go to the Edit tab, click the Replace button, and either Replace All in the column or Find and Replace each cell (Figure R.29).
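The mean substitution can be sketched in a few lines of Python. This assumes, as the text says, that missing HF values appear as 0 or "NA" (so a genuine score of 0 would also be treated as missing); the HF scores below are made up so that the observed ones average 94.5. Like any mean imputation, it shrinks the variability of the column somewhat.

```python
from statistics import mean

MISSING_MARKERS = ("NA", 0)  # the text says missing HF values appear as 0 or NA

def impute_with_mean(values, missing=MISSING_MARKERS):
    """Replace missing markers with the mean of the observed values; return (filled, fill_value)."""
    observed = [v for v in values if v not in missing]
    fill_value = mean(observed)
    return [fill_value if v in missing else v for v in values], fill_value

# Made-up HF scores; the observed ones happen to average 94.5
hf = [96, 93, "NA", 95, 0, 94]
filled, fill_value = impute_with_mean(hf)
print(fill_value, filled)
```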
FIGURE R.29 Replacing NA with a mean value. Together, the steps above show the work it takes to clean up data and get it right.
FIGURE R.30 Locating Data Miner Recipes in Statistica.
Next, we use the Data Miner Recipes application in STATISTICA. This gives us an initial feel for the data and for which predictive models might be useful. First, go to the Data Mining tab and click the Data Miner Recipes button (Figure R.30). Click the New button to open the Data Miner Recipes application, then use the Open/Connect data file button to search for the appropriate data set (Figure R.31).
FIGURE R.31 Selecting the source of data to run in Data Miner Recipe.
FIGURE R.32 Opening the specific dataset.
Here, we open the file to reveal the STATISTICA data set we plan to use (Figure R.32). Clicking OK connects the data set to the application. Next, clicking the Select Variables button opens a dialog box where we select our Target variable (HF) and Input variables (Zip Code, Discharges, Mean Charge, Status, and AMC). You can select multiple variables by holding down the Ctrl key while left-clicking each one (Figure R.33).
FIGURE R.33 Selecting the predictor (Input) and outcome (Target) variables.
FIGURE R.34 Locating Run to Completion in the Next Step button.
FIGURE R.35 Data Miner Recipe runs multiple models, here showing Boosted Trees giving the highest correlation coefficient.
After clicking OK, open the drop-down arrow on the Next step button, find Run to Completion, and click it (Figure R.34). Once the software finishes, we see the results. Boosted Trees shows the greatest predictive ability (the highest correlation coefficient), while the other three default models predict less well (Figure R.35).
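Ranking models by correlation coefficient, as in Figure R.35, means correlating each model's predicted HF values with the observed ones and sorting. The Python sketch below shows that idea only; it is not how Data Miner Recipes computes its results internally. "Model A" and "Model B" are placeholder names and all values are made up. In practice the correlation is best computed on data held out from training.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def rank_models(observed, predictions):
    """Return (model name, r) pairs sorted from highest to lowest correlation with the target."""
    scores = {name: pearson_r(observed, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up observed HF scores and predictions from two placeholder models
observed_hf = [92, 95, 97, 90, 94]
predictions = {
    "Model A": [92.5, 94.6, 96.8, 90.4, 94.1],
    "Model B": [94, 93, 95, 93, 94],
}
for name, r in rank_models(observed_hf, predictions):
    print(f"{name}: r = {r:.2f}")
```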