P45 Prognostic modeling with the lasso: An empirical evaluation
E.W. Steyerberg and M.J.C. Eijkemans
Erasmus University Rotterdam, The Netherlands

Prognostic modeling with regression analysis poses problems with respect to the selection of predictors and the estimation of the regression coefficients. Stepwise selection of predictors is widely used, despite several disadvantages, such as bias in the coefficients of the selected characteristics (overestimation) and in the standard errors (underestimation). Estimates of the regression coefficients can be corrected for statistical overestimation with a shrinkage factor. The 'Lasso', developed by R. Tibshirani, is a recent method that combines selection and estimation in a statistically attractive way. In this study we compare the predictive performance of the Lasso with other selection and estimation strategies. We used the GUSTO-I data set (data courtesy: GUSTO investigators and Duke University), which consisted of 40,000 patients from over 1000 hospitals. Seven percent of the patients had died within 30 days after myocardial infarction. Logistic regression analysis was applied with 8 dichotomous predictors. For selection, we used a backward stepwise method with various p-values: 0.01, 0.05, 0.15, 0.50, 1.0 (fixed selection). For estimation, we shrunk the coefficients on the basis of a bootstrap procedure (300 replications). Models were developed in one half of the data set (n=20,000). Predictive performance was quantified as the model log-likelihood in the other half. 121 small and 48 large subsamples were formed on the basis of geographic location of the hospitals. Averages of the predictive performance were calculated over these small and large subsamples. We found that the Lasso performed very similarly to selection with a p-value of 0.50 and shrinkage of the coefficients.
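The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the GUSTO-I data: the data are simulated, the effect sizes and penalty values are arbitrary, and all function names (`simulate`, `fit_logistic`, `log_likelihood`, `soft_threshold`) are hypothetical. The Lasso fit here uses plain proximal gradient descent with a fixed number of iterations, chosen for brevity rather than accuracy, so the estimates are approximate. The sketch fits a logistic model with 8 dichotomous predictors on a development half and scores it by log-likelihood on a validation half.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def soft_threshold(b, t):
    # Lasso proximal step: move b toward zero by t; exactly zero if |b| <= t.
    if b > t:
        return b - t
    if b < -t:
        return b + t
    return 0.0

def simulate(n, intercept, betas, rng):
    # Synthetic stand-in data: dichotomous predictors, binary outcome.
    X, y = [], []
    for _ in range(n):
        x = [1.0 if rng.random() < 0.5 else 0.0 for _ in betas]
        p = sigmoid(intercept + sum(b * v for b, v in zip(betas, x)))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

def fit_logistic(X, y, lam=0.0, step=1.0, iters=300):
    # Proximal gradient descent on: mean negative log-likelihood + lam * sum(|beta_j|).
    # The intercept is not penalised; lam=0 approximates ordinary maximum likelihood.
    n, k = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * k
    for _ in range(iters):
        g0, g = 0.0, [0.0] * k
        for x, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, x))) - yi
            g0 += r
            for j in range(k):
                g[j] += r * x[j]
        b0 -= step * g0 / n
        beta = [soft_threshold(bj - step * gj / n, step * lam)
                for bj, gj in zip(beta, g)]
    return b0, beta

def log_likelihood(X, y, b0, beta):
    # Binomial log-likelihood of outcomes y under the fitted model.
    ll = 0.0
    for x, yi in zip(X, y):
        p = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll

if __name__ == "__main__":
    rng = random.Random(0)
    true_betas = [0.9, 0.7, 0.5, 0.3, 0.0, 0.0, 0.0, 0.0]  # hypothetical effects
    X_dev, y_dev = simulate(1000, -3.8, true_betas, rng)   # development half
    X_val, y_val = simulate(1000, -3.8, true_betas, rng)   # validation half
    for lam in (0.0, 0.002, 0.01):                         # arbitrary penalties
        b0, beta = fit_logistic(X_dev, y_dev, lam=lam)
        kept = sum(1 for b in beta if b != 0.0)
        ll = log_likelihood(X_val, y_val, b0, beta)
        print(f"lambda={lam}: {kept} predictors kept, validation log-likelihood {ll:.1f}")
```

The soft-thresholding step is what lets the penalty do selection and estimation at once: coefficients that survive are shrunk toward zero, and the rest are set exactly to zero. In practice the penalty would be tuned (for example by cross-validation) rather than fixed as it is here.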
A much poorer performance was seen if predictors were selected with the conventional significance level of 0.05 or if the original regression coefficients were used for prediction without shrinkage. We conclude that the Lasso is a promising method for prognostic modeling. If stepwise selection is preferred, a much higher p-value than the conventional level of 0.05 should be used, with subsequent shrinkage of the regression coefficients.

P46 ASSESSING THE EFFECTS OF TREATMENT IN OBSERVATIONAL STUDIES: ADJUSTING FOR DISEASE SEVERITY
Caroline A. Sabin, Amanda Mocroft and Andrew N. Phillips
Royal Free Hospital School of Medicine, London, England

Whilst it is usually accepted that treatment efficacy should be evaluated from randomized clinical trials, in some conditions, such as human immunodeficiency virus (HIV) infection, there are concerns that clinical trials may be impractical and that rapid changes in treatment strategies mean that clinical trial results are out of date by the end of the trial. In such circumstances, it has been suggested that treatment effects could be estimated from observational studies. However, assessment of treatment from observational studies is often subject to bias. For example, patients may be given treatment because they are sicker than patients who remain untreated. These biases have been described, and approaches to obtaining treatment effects which are adjusted for the disease stage of the individual throughout follow-up are suggested. In HIV infection one approach is to include the CD4