Pragmatic statistical issues in biological research: Introduction to special series

Robert S. Danziger, Michael L. Berbaum

Journal of Molecular and Cellular Cardiology
DOI: 10.1016/j.yjmcc.2018.03.013
Received: 20 February 2018; Accepted: 17 March 2018


Robert S. Danziger, MD, MBA1,* ([email protected]), Michael L. Berbaum, PhD2

1Departments of Medicine, Physiology and Biophysics, and Pharmacology, University of Illinois at Chicago
2Institute of Health Research and Policy, University of Illinois at Chicago

*Corresponding author at: Department of Medicine, University of Illinois at Chicago, 840 S Wood St, Chicago, IL 60612

Abstract: This article introduces a special series on pragmatic statistical issues commonly encountered in cardiovascular and biomedical research.

Keywords: statistics; reproducibility; quantitative analysis; comparisons; power analyses

Statistical methods address "ways of dealing with the collection, analysis, interpretation, presentation and organization of data" and may provide one of the robust ways of addressing reproducibility. Within the biosciences, we perceive that amid all this activity there has been limited attention to the frankly statistical aspects of improving reproducibility. In a survey by Nature [1], nearly 90% of responding scientists reported that reproducibility would be improved by "more robust experimental design and better statistics." The purpose of this Series, which will be published over the next 1-2 years, is to address some of the key elements believed to contribute to a lack of reproducibility in the biomedical sciences, focusing specifically on statistical questions relevant to the types of studies published in cardiology and cardiovascular research. Topics include:

1. When a single animal provides a series of measurements, how many animals must be tested in treatment and comparison groups?


Typical Question: We are doing a very difficult and expensive experiment. Each animal we study requires several weeks to prepare. We can make multiple measurements on each animal. How do we determine how many animals should be studied and the number of measurements to be made on each one? How do we allow for retention, or its mirror image, dropout, and even death?
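A back-of-envelope version of this calculation uses the design effect for correlated measurements. The sketch below is illustrative only: the function name, the intraclass correlation, and the sample sizes are assumptions for the example, and a full mixed-model power analysis would refine them.

```python
import math

def animals_needed(n_simple, m, icc):
    """Inflate a naive per-group sample size for clustered (repeated)
    measurements using the design effect 1 + (m - 1) * icc.

    n_simple: animals per group needed if every measurement were independent
    m:        measurements taken on each animal
    icc:      intraclass correlation among measurements within one animal
    Returns the number of animals per group, rounded up.
    """
    deff = 1 + (m - 1) * icc            # design effect for m correlated measures
    return math.ceil(n_simple * deff / m)

# Example: a power analysis assuming independent observations asks for 30
# observations per group; each animal yields 5 measurements with ICC = 0.4.
print(animals_needed(30, 5, 0.4))
```

The point of the sketch: taking more measurements per animal helps, but with correlated data it never substitutes one-for-one for adding animals, and at ICC = 1 extra measurements buy nothing at all.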

2. When is exclusion of wild or outlying data points permissible, and what statistical precautions avoid misleading results?

Typical Question: We are studying the effect of some drugs on cardiac function in an animal model. The effect is very different in some animals. How do we best handle this statistically?
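One defensible precaution is to screen with a robust rule stated in advance rather than deleting points by eye. A minimal sketch of a median/MAD screen follows; the threshold and data are illustrative assumptions, not a prescription.

```python
import numpy as np

def mad_outliers(x, threshold=3.5):
    """Flag points whose modified z-score (median/MAD based) exceeds the
    threshold. More resistant to the outliers themselves than mean/SD rules."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # 0.6745 scales the MAD to be comparable with the SD for normal data
    modified_z = 0.6745 * (x - med) / mad
    return np.abs(modified_z) > threshold

response = [1.1, 0.9, 1.0, 1.2, 0.8, 9.5]   # one animal responds wildly
flags = mad_outliers(response)
print(flags)
```

Because the rule uses the median and MAD rather than the mean and SD, the extreme point cannot mask itself by inflating the spread estimate.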

3. Appropriate use of statistics in hypothesis testing (null hypothesis testing): One reads apparently strong criticisms of the way statistical testing is conducted and suggestions for revised procedures.

Typical Question: What are the underlying issues and do these “reforms” have merit? Should I conduct testing (inference)?
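One commonly suggested reform is to report the estimate and its confidence interval alongside any p-value, rather than the p-value alone. A sketch of that reporting style for two groups using a Welch-type interval; the data are invented for illustration.

```python
import numpy as np
from scipy import stats

def diff_with_ci(a, b, alpha=0.05):
    """Welch t-test plus a confidence interval for the mean difference,
    so the estimate and its precision are reported, not just a p-value."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / ((a.var(ddof=1) / len(a))**2 / (len(a) - 1)
                  + (b.var(ddof=1) / len(b))**2 / (len(b) - 1))
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    p = stats.ttest_ind(a, b, equal_var=False).pvalue
    return diff, (diff - t_crit * se, diff + t_crit * se), p

control = [52.1, 48.3, 55.0, 50.2, 49.7, 53.4]
treated = [58.9, 61.2, 55.4, 60.1, 57.8, 62.5]
est, ci, p = diff_with_ci(treated, control)
print(f"difference = {est:.1f}, 95% CI ({ci[0]:.1f}, {ci[1]:.1f}), p = {p:.4f}")
```

The interval communicates both the size of the effect and the precision with which it was estimated, which a bare p-value does not.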

4. Straw-man hypothesis testing: superiority versus equivalence. With low power, a study may never detect a difference. Nowadays we may want to test for more than differences, e.g., superiority (inferiority) or an order (monotone, umbrella).

Typical Questions: How are these tests to be conducted statistically, and what are the pros and cons? We have a new intervention (drug) and want to compare it with an existing care regime or drug. How should we conduct such a test?
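For the equivalence case, the standard device is two one-sided tests (TOST) against a pre-specified margin. A simplified sketch follows; the margin, data, and pooled degrees of freedom are illustrative assumptions, and dedicated equivalence-testing routines would be used in practice.

```python
import numpy as np
from scipy import stats

def tost_equivalence(a, b, margin):
    """Two one-sided tests (TOST): conclude the group means are equivalent
    only if the difference is shown to be above -margin AND below +margin.
    Returns the larger of the two one-sided p-values."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    df = len(a) + len(b) - 2          # simple pooled-style df for this sketch
    t_lower = (diff + margin) / se    # H0: diff <= -margin
    t_upper = (diff - margin) / se    # H0: diff >= +margin
    p_lower = 1 - stats.t.cdf(t_lower, df)
    p_upper = stats.t.cdf(t_upper, df)
    return max(p_lower, p_upper)

# Two treatments with nearly identical responses, margin of 0.5 units:
a = [0.1, -0.1, 0.05, -0.05, 0.0] * 4
b = [0.12, -0.08, 0.03, -0.02, 0.01] * 4
print(f"TOST p = {tost_equivalence(a, b, margin=0.5):.3g}")
```

Note the asymmetry with ordinary testing: a non-significant difference test is not evidence of equivalence, whereas a significant TOST result is.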

5. What is learned from extending or repeating the same experiment? When is extending a study legitimate? How can one learn from repeated experiments?

Typical Question: If I am testing for superiority (1-sided) and don't reach significance, can I just infer that there is a difference? Can I combine the results with those of other studies?
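When several comparable studies address the same one-sided question, their evidence can be pooled formally rather than informally. One classical option is Stouffer's z-score method; the sketch below uses invented p-values to show how individually non-significant studies can combine into stronger pooled evidence.

```python
import numpy as np
from scipy import stats

def stouffer(p_values, weights=None):
    """Combine one-sided p-values from comparable studies via Stouffer's
    z-score method; equal weights unless study weights are supplied."""
    p = np.asarray(p_values, float)
    w = np.ones_like(p) if weights is None else np.asarray(weights, float)
    z = stats.norm.isf(p)                       # p -> one-sided z-score
    z_combined = np.sum(w * z) / np.sqrt(np.sum(w**2))
    return stats.norm.sf(z_combined)            # combined one-sided p

# Three underpowered studies, each just short of significance on its own:
print(f"combined p = {stouffer([0.07, 0.09, 0.06]):.4f}")
```

Pooling like this is only legitimate when the studies genuinely test the same hypothesis in the same direction; a fuller treatment would use meta-analytic models that also examine heterogeneity between studies.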

6. Multiplicity. How many tests can a single design support? What are the limits on conducting multiple analyses and tests?

Typical Question: Sometimes there are a variety of ways to make comparisons on the same data (parameter) – indeed, there are more comparisons conceptually than the data can distinguish (identify). This is termed multiplicity, and there are a number of solutions – multiple comparison techniques. How do I recognize the situation and pick a good technique?
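When the false discovery rate is the right error to control, one widely used technique is the Benjamini-Hochberg step-up procedure. A self-contained sketch (the p-values are invented for illustration; family-wise procedures such as Holm would be more conservative):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean array marking
    which tests are declared significant while controlling the FDR at alpha."""
    p = np.asarray(pvals, float)
    order = np.argsort(p)
    m = len(p)
    thresholds = alpha * (np.arange(1, m + 1) / m)   # alpha * i / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest i with p_(i) <= alpha*i/m
        reject[order[:k + 1]] = True       # step up: reject all smaller p too
    return reject

pvals = [0.001, 0.012, 0.020, 0.041, 0.300]   # raw p-values from 5 comparisons
print(benjamini_hochberg(pvals))
```

The step-up structure is the key idea: each ordered p-value is compared to a threshold that grows with its rank, so the procedure adapts to how many hypotheses appear to be non-null.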

7. Multiple outcomes from the same subjects (or animals). Often a number of aspects of a biological process are measured, and all of these are potentially interacting responses from the same subjects or animals. How should we approach such data statistically (and practically)?

Typical Question: We are trying to determine the effects of a drug on cardiac function and toxicity. We have looked at 26 different parameters, e.g., ejection fraction, diastolic function, blood pressure, etc. We find that two of these parameters are affected by the drug. How do we interpret this?
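A useful first question is how many of the 26 tests would look "significant" by chance alone. Treating the tests as independent for the sake of the sketch (an assumption; correlated outcomes change the numbers), the count of chance findings under the null is binomial:

```python
from scipy import stats

# With 26 independent parameters each tested at alpha = 0.05 and NO true
# drug effect, the number of "significant" parameters is Binomial(26, 0.05).
m, alpha = 26, 0.05
expected = m * alpha                              # mean false-positive count
p_two_or_more = 1 - stats.binom.cdf(1, m, alpha)  # P(at least 2 by chance)
print(f"expected false positives: {expected:.2f}")
print(f"P(at least 2 significant by chance): {p_two_or_more:.2f}")
```

Since roughly one chance finding is expected and two or more occur quite often under the null, two hits out of 26 are not, by themselves, strong evidence of a drug effect without multiplicity adjustment or a pre-specified primary outcome.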


8. When can/should missing data be filled in and what methods are acceptable?

Typical Question: We are correlating cardiac function with molecular changes in heart tissue. However, for some animals we have measurements only of function, since the assay on the tissue did not work and/or the animal died before the time at which the heart was to be harvested for study. Alternatively, in some animals we were able to do the assay on the heart tissue and do not have the functional data.
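Whatever is done, the choice should be explicit, because naive fixes have known costs. The simulation below (all numbers invented) shows one such cost: single mean imputation fills the holes but shrinks the variance, which is one reason multiple imputation or likelihood-based methods are generally preferred.

```python
import numpy as np

rng = np.random.default_rng(0)
func = rng.normal(50, 10, 200)                  # cardiac function, all animals
assay = 0.8 * func + rng.normal(0, 5, 200)      # tissue assay, correlated

missing = rng.random(200) < 0.3                 # ~30% of assays failed
assay_obs = assay.copy()
assay_obs[missing] = np.nan

# Complete-case analysis: drop animals whose assay failed
cc = assay_obs[~np.isnan(assay_obs)]

# Naive single mean imputation: replaces every hole with one constant,
# so the imputed sample is artificially less variable than the real one
imputed = np.where(np.isnan(assay_obs), np.nanmean(assay_obs), assay_obs)

print(f"true SD {assay.std():.1f}, complete-case SD {cc.std():.1f}, "
      f"mean-imputed SD {imputed.std():.1f}")
```

Complete-case analysis is only unbiased when the data are missing completely at random, which assay failure and early death usually are not; that is exactly the kind of assumption the article in this series will examine.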

9. Heterogeneity among subjects. Randomization makes groups of subjects (animals) equivalent (between groups). But the subjects still differ among themselves within groups. What are worthwhile ways to reduce the impact of such individual differences on results?

Typical Question: We are determining the effect of aortic banding on blood pressure. In each rat, baseline blood pressure is a little different. Do these initial differences matter, and how best can we handle them?
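One standard way to absorb baseline differences is analysis of covariance: regress the outcome on treatment plus the baseline value. A simulated sketch (effect size, variances, and sample size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
baseline = rng.normal(120, 8, n)               # baseline BP differs per rat
banded = np.repeat([0, 1], n // 2)             # 0 = sham, 1 = aortic banding
outcome = baseline + 15 * banded + rng.normal(0, 4, n)

# ANCOVA via least squares: outcome ~ intercept + treatment + baseline.
# Adjusting for baseline absorbs between-rat variation and sharpens the
# estimated treatment effect.
X = np.column_stack([np.ones(n), banded, baseline])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"adjusted treatment effect: {coef[1]:.1f} mmHg")

# The unadjusted difference in group means carries the baseline noise along:
unadjusted = outcome[banded == 1].mean() - outcome[banded == 0].mean()
print(f"unadjusted difference:     {unadjusted:.1f} mmHg")
```

Both estimators are unbiased under randomization; the gain from adjustment is precision, because the baseline covariate explains within-group variation that would otherwise inflate the error term.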

10. When we actually do the experiments, we find that the requisite "n" to show statistical significance is greater or less than that predicted by our power analysis. How do we interpret this? Was the SD of the response different from that anticipated? Was the distribution of the response different than assumed?

Typical Question: I have performed an experiment ten times and detect near significance (just above P = 0.05, e.g., P = 0.07). Should I keep repeating it? What can I learn from the data?
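Revisiting the power analysis with the observed SD is usually more informative than simply repeating the experiment. A sketch using the common normal-approximation formula for a two-sided, two-sample comparison (the effect sizes plugged in are illustrative):

```python
from scipy import stats

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided two-sample t-test, using the
    normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (mean difference / SD)."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return 2 * ((z_a + z_b) / effect_size) ** 2

# If the SD turns out to be twice what was assumed, the standardized effect
# halves and the required n quadruples:
print(round(n_per_group(0.8)), "vs", round(n_per_group(0.4)))
```

The quadrupling is visible in the formula itself: n scales as 1/d², so underestimating the SD (equivalently, overestimating d) is the most common reason a planned "n" falls short in practice.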

11. Parametric versus non-parametric analyses. There are two kinds of statistical tests, parametric and non-parametric. How should we decide which ones to use? What are the trade-offs?

Typical Question: I am measuring the BMI of individuals and trying to determine the relationship of BMI to exercise and calories consumed. What test should I use?
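As a concrete instance of the trade-off: Pearson correlation is the parametric choice and assumes a linear relationship, while Spearman's rank correlation assumes only monotonicity. The exercise-BMI relationship simulated below is invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
exercise = rng.uniform(0, 10, 50)                   # hours per week
bmi = 32 * np.exp(-0.15 * exercise) + rng.normal(0, 0.5, 50)  # curved, monotone

# Pearson measures linear association; Spearman measures monotone association,
# so for this curved (but monotone) relationship Spearman is the safer choice.
r_pearson, _ = stats.pearsonr(exercise, bmi)
r_spearman, _ = stats.spearmanr(exercise, bmi)
print(f"Pearson r = {r_pearson:.2f}, Spearman rho = {r_spearman:.2f}")
```

The general trade-off: parametric tests buy power when their distributional assumptions hold, while rank-based tests sacrifice a little power in exchange for robustness to curvature, outliers, and non-normal distributions.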


12. What are principles and best practices for managing research data?

Typical Question: We use Excel spreadsheets for our data entry and record keeping and to make plots. What could go wrong? What are alternative and better platforms for saving and analyzing data?
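One well-documented failure mode is silent type coercion: spreadsheet software has been known to turn gene symbols such as SEPT2 into dates and long identifiers into scientific notation. Reading every field as text and converting explicitly makes each conversion an auditable step; a minimal sketch using only the Python standard library (the file contents are invented):

```python
import csv
import io

# Stand-in for a real data file; in practice this would be open("data.csv")
raw = io.StringIO(
    "animal_id,gene,ejection_fraction\n"
    "A001,SEPT2,55.0\n"
    "A002,MARCH1,47.5\n"
)

rows = list(csv.DictReader(raw))       # every field stays a string
for row in rows:
    # Explicit, visible cast; a malformed value raises an error here
    # instead of being silently reinterpreted
    row["ejection_fraction"] = float(row["ejection_fraction"])

print(rows[0])
```

The design point is that conversions happen in code that can be reviewed and rerun, rather than inside an interactive tool whose coercions leave no trace.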

US

13. Oversight and review of data: When there are doubts about the origin of a dataset, i.e., was it “faked?”, what tools can help decide the matter rigorously (non-

AN

subjectvely)?

Typical Questions: My post-doc has just given me a spreadsheet with 526 protein measurements. I really am questioning whether he actually measured all of these or made some up. The graduate student has generated some data points that just look "too good to be true." How can I check if they are fabricated? I wonder if a dataset I was given was filtered and outliers removed.
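One simple screening tool is terminal-digit analysis: in genuinely measured data the last recorded digit is usually close to uniform on 0-9, whereas fabricated numbers often favor "round" endings. The sketch below uses an invented dataset; a suspicious result warrants investigation, not a verdict.

```python
import numpy as np
from scipy import stats

def terminal_digit_test(values, decimals=2):
    """Chi-square test for uniformity of the final recorded digit.
    `decimals` is the number of decimal places at which data were recorded."""
    scaled = np.round(np.asarray(values, float) * 10**decimals).astype(int)
    digits = np.abs(scaled) % 10               # the terminal digit of each value
    counts = np.bincount(digits, minlength=10)
    return stats.chisquare(counts).pvalue      # small p => digits not uniform

# Fabricated-looking data: every "measurement" ends in 0 or 5
suspicious = [1.10, 2.35, 4.50, 3.25, 6.40,
              2.15, 3.30, 5.55, 2.20, 1.45] * 10
print(f"p = {terminal_digit_test(suspicious):.3g}")
```

The test only makes sense when the final digit carries no biological signal, i.e., it sits below the precision of the instrument; applied to coarsely rounded data it will flag honest rounding as readily as fabrication.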


14. Studying dose-response relationships. A change is introduced into a biological system. We would like to determine how much change it produces in certain features of the system that are of interest. In the case of a drug, this is a dose-response relationship. There are a number of issues to consider:

Typical Questions: How many different doses should be considered? How should we determine the time window for observing the response? Should we observe the response repeatedly, and how should we summarize the time course of the response? Are there multiple kinds of response, perhaps occurring at different time lapses and with different durations?
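A common summary of such data is a fitted Hill (sigmoidal dose-response) curve, from which Emax and EC50 are read off. A sketch with simulated responses (all values, including the doses and the three-parameter form of the model, are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, emax, ec50, n):
    """Three-parameter Hill equation: response rises from 0 toward emax,
    reaching half-maximum at ec50, with steepness governed by n."""
    return emax * dose**n / (ec50**n + dose**n)

# Doses spanning several log units, with simulated responses
dose = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
resp = np.array([0.13, 0.48, 1.90, 5.89, 14.16, 20.75, 23.85, 24.68])

params, _ = curve_fit(hill, dose, resp, p0=[25, 1.0, 1.0])
emax, ec50, n = params
print(f"Emax = {emax:.1f}, EC50 = {ec50:.2f}, Hill slope = {n:.2f}")
```

Spacing the doses logarithmically, as above, is what makes both the plateau and the steep middle portion of the curve identifiable; equally spaced doses tend to cluster on one limb and leave EC50 poorly determined.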

15. Sometimes it seems useful to categorize a continuous response into discrete categories: present versus absent, or low, medium, high. What statistical principles should be followed in this process, and what tools are available to carry out this work?

Typical Question: We are measuring blood pressure to determine if there is a relationship between renal function as measured by creatinine and blood pressure. How should we analyze creatinine measurements and blood pressures, i.e., as low, normal, or high versus as continuous variables?
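A quick simulation illustrates what categorization costs: correlating blood pressure with creatinine kept continuous versus collapsed into low/normal/high categories. The cut-points and effect sizes below are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1000
creatinine = rng.normal(1.0, 0.3, n)                 # mg/dL
bp = 110 + 20 * creatinine + rng.normal(0, 12, n)    # true linear relation

# Continuous analysis keeps the full information in the measurements
r_cont, p_cont = stats.pearsonr(creatinine, bp)

# Categorizing discards all variation within each category
cats = np.digitize(creatinine, [0.8, 1.2])           # 0=low, 1=normal, 2=high
r_cat, _ = stats.pearsonr(cats, bp)

print(f"continuous r = {r_cont:.2f}, categorized r = {r_cat:.2f}")
```

The attenuation is systematic, not an artifact of this simulation: replacing a measurement with its category throws away within-category variation, so a larger sample is needed to detect the same underlying relationship.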

16. When results cannot be replicated, either materials and techniques are at variance (lack of reproducibility) or there are unknown factors at play. Either way, the investigation will yield worthwhile discoveries.

Truly irreproducible results may provide significant insights. "Replication can increase certainty when findings are reproduced and promote innovation when they are not." If 1) there is adequate reporting of all known variables and methodological details believed to be relevant, and 2) robust and appropriate statistics have been used, then greater insights may be discovered if results are not reproducible [4]. Prominent among these is the recognition that there are unappreciated factors or contributory components to the results.

For example, the temperature at which rodents are housed has not normally been reported in the past, since it was not believed to be a significant variable. However, it has recently been found to be one [5]. Thus, lack of reproducibility of results in mice, especially in cancer models, may be due, at least in part, to lack of control of this variable. In general, the larger the sample size required to demonstrate a significant difference, the greater the number of unappreciated variables. Similarly, 'passenger mutations' [6] and non-uniform antibodies [7] were discovered to confound results. But it is only with rigorous reporting and statistics that truly irreproducible results may lead to these insights.

The goal of this series is to address commonly encountered statistical issues in the normal contexts in which they arise. Both theoretical background and websites that can carry out the analyses discussed will be included in each article. Since this is a series, we encourage readers to recommend topics of interest or current questions from their laboratories.

References

[1] M. Baker, 1,500 scientists lift the lid on reproducibility, Nature 533(7604) (2016) 452-454.

[2] N.A. Vasilevsky, M.H. Brush, H. Paddock, L. Ponting, S.J. Tripathy, G.M. Larocca, M.A. Haendel, On the reproducibility of science: unique identification of research resources in the biomedical literature, PeerJ 1 (2013) e148.

[3] C. Kilkenny, N. Parsons, E. Kadyszewski, M.F. Festing, I.C. Cuthill, D. Fry, J. Hutton, D.G. Altman, Survey of the quality of experimental design, statistical analysis and reporting of research using animals, PLoS One 4(11) (2009) e7824.

[4] A. Ward, T.O. Baldwin, P.B. Antin, Research data: Silver lining to irreproducibility, Nature 532(7598) (2016) 177.

[5] K.M. Kokolus, M.L. Capitano, C.T. Lee, J.W. Eng, J.D. Waight, B.L. Hylander, S. Sexton, C.C. Hong, C.J. Gordon, S.I. Abrams, E.A. Repasky, Baseline tumor growth and immune control in laboratory mice are significantly influenced by subthermoneutral housing temperature, Proc Natl Acad Sci U S A 110(50) (2013) 20176-20181.

[6] T. Vanden Berghe, P. Hulpiau, L. Martens, R.E. Vandenbroucke, E. Van Wonterghem, S.W. Perry, I. Bruggeman, T. Divert, S.M. Choi, M. Vuylsteke, V.I. Shestopalov, C. Libert, P. Vandenabeele, Passenger mutations confound interpretation of all genetically modified congenic mice, Immunity 43(1) (2015) 200-209.

[7] M. Baker, Reproducibility crisis: Blame it on the antibodies, Nature 521(7552) (2015) 274-276.