Can we live without P values? The answer

Can we live without P values? The answer

Accepted Manuscript Can We Live without p-Values? The Answer Eugene H. Blackstone, MD PII: S0022-5223(17)31947-5 DOI: 10.1016/j.jtcvs.2017.09.030 ...

246KB Sizes 1 Downloads 43 Views

Accepted Manuscript Can We Live without p-Values? The Answer Eugene H. Blackstone, MD PII:

S0022-5223(17)31947-5

DOI:

10.1016/j.jtcvs.2017.09.030

Reference:

YMTC 11981

To appear in:

The Journal of Thoracic and Cardiovascular Surgery

Received Date: 7 September 2017 Accepted Date: 8 September 2017

Please cite this article as: Blackstone EH, Can We Live without p-Values? The Answer, The Journal of Thoracic and Cardiovascular Surgery (2017), doi: 10.1016/j.jtcvs.2017.09.030. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

1

Can We Live without p-Values? The Answer

2

Eugene H. Blackstone, MD

3

RI PT

4

Heart and Vascular Institute, Department of Thoracic and Cardiovascular Surgery

6

Research Institute, Department of Quantitative Health Sciences

7

Cleveland Clinic

8

Cleveland, Ohio

9

USA

11

M AN U

10

SC

5

Word count: 497

13

Author disclosures: Author has nothing to disclose with regard to commercial support.

14

Funding: Cleveland Clinic

TE D

12

15 16

24 25 26 27 28 29 30 31 32

AC C

18 19 20 21 22 23

EP

17

Address for correspondence Eugene H. Blackstone, MD Department of Thoracic and Cardiovascular Surgery Cleveland Clinic 9500 Euclid Avenue / Desk JJ-4 Cleveland, OH 44195 Phone: 216-444-6712 Fax: 216-445-0444 Email: [email protected]

C:\pdfconversion\WORK\authordoc\Can we live without p values.docx

9/15/2017 11:59 AM

ACCEPTED MANUSCRIPT 2

Central Message

34

Methods borrowed from machine learning can be used to identify important variables in

35

ordinary multivariable analyses without use of p-values.

36

Characters + spaces: 142

RI PT

33

37

Central Figure

42 43

EP

41

Eugene H. Blackstone, MD

AC C

40

TE D

M AN U

39

SC

38

ACCEPTED MANUSCRIPT 3

When members of the American Statistical Association (ASA) Board decided to develop

45

a policy statement on p-values and statistical significance, they did so knowing they had

46

not previously taken positions on specific matters of statistical practice.1 They

47

recognized that “misunderstanding or misuse of statistical inference” had reached a

48

point that a policy statement was necessary: “In view of the prevalent misuses of and

49

misconceptions concerning p-values, some statisticians prefer to supplement or even

50

replace p-values with other approaches.” The common use of p-values to identify risk

51

factors in multivariable models, which Naftel2 called more of an art than science, sprang

52

to mind. Could one really “replace p-values” and actually better answer research

53

questions? I challenged Hemant Ishwaran, PhD, one of our Deputy Statistical Editors, to

54

consider how one might identify “significant” risk factors in commonly used multivariable

55

models without using p-values.

SC

M AN U

Applying methods borrowed from machine learning, he and University of Miami

TE D

56

RI PT

44

graduate student Min Lu developed a novel method to accomplish this.3 They introduce

58

two measures of “variable importance” as a substitute for p-values and illustrate them

59

by reanalyzing heart failure data using well-known Cox proportional hazards regression.

60

EP

57

The method is more than a substitute for p-values, however. Built in is bootstrap resampling, whereby new datasets are formed and analyzed by sampling the original

62

one with replacement. This means that some observations are repeated while others

63

are left out—about a third on average—allowing statistics to be generated based on

64

how well results are predicted for the left-out patients. “Important” variables are those

65

that improve this prediction. I cannot overemphasize the huge advantage this provides.

66

We often chide authors who claim to have “predictive models,” but provide no internal

AC C

61

ACCEPTED MANUSCRIPT 4 67

(let alone external) validation to be able to use the word “predictor.” The Lu/Ishwaran

68

method provides 1,000s of cross-validations as a byproduct. Another advantage is less sensitivity to sample size than p-values. This is

70

important for analyses of genomic data and large national datasets in which extremely

71

tiny p-values may be produced for nearly every variable, and it becomes unclear what

72

the important features are. Further, their method is a gateway into a host of other

73

machine-learning methods, such as efficient ways to manage large numbers of

74

variables and ways to illuminate the shape of relationships of continuous variables to

75

outcomes without model assumptions.

M AN U

SC

RI PT

69

There are limitations, however. The method is computationally intensive, but

76

those who already routinely borrow machine-learning methods to generate multivariable

78

models are used to this.4 It does not provide statistics with which we are familiar.

79

Fortunately, Efron and Hastie from Stanford have recently published an accessible book

80

intended to bridge statistics of the past, including p-values, and those of the computer

81

age.5

TE D

77

Can we live without p-values? Perhaps not for research (such as clinical trials)

83

well suited to a method modeled on English common law (innocent until proven guilty

84

beyond reasonable doubt). But for variable selection using common multivariable

85

models, it may well be possible to live, and live well, without p-values!

AC C

86

EP

82

ACCEPTED MANUSCRIPT 5

References

87 88

90 91

1. Wasserstein RL, Lazar NA. The ASA’s statement on p-values: context, process, and purpose. Am Statistician. 2016;70:129–133.

RI PT

89

2. Naftel DC. Do different investigators sometimes produce different multivariable

equations from the same data? (Letter). J Thorac Cardiovasc Surg. 1994;107:1528-

93

9.

98 99

M AN U

97

4. Rajeswaran J, Blackstone EH. Identifying risk factors: challenges of separating signal from noise. J Thorac Cardiovasc Surg. 2017;153:1136-8. 5. Efron B, Hastie T. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. New York: Cambridge University Press, 2016.

TE D

96

Thorac Cardiovasc Surg. 2017 (in press).

EP

95

3. Lu M, Ishwaran H. A prediction-based alternative to p-values in regression models. J

AC C

94

SC

92

AC C

EP

TE D

M AN U

SC

RI PT

ACCEPTED MANUSCRIPT