
SYMPOSIUM: RESEARCH

How to analyze an observational study

Box 1: Preliminaries and reporting

1. Get your data in order: could someone else pick up your dataset and analyze it?
   a. Complete data entry checks
   b. Label data fields
   c. Decide what to do about missing data
   d. Determine (in advance) which confounders to include in the analysis
   e. Identify the type and distributional shape of continuous data
2. Decide on a statistical analysis package (e.g. Stata, SPSS, R, Excel) and use it systematically (keep a "log file"): could someone else replicate your analysis on a similar dataset?
3. Provide a full description of the characteristics of your study sample (a summary statistics table): could someone else easily determine if your results apply to their population?
4. Justify the use of and check the assumptions of statistical tests: would someone else have used the same approach?
5. Report the study and results using an appropriate checklist as a template (e.g. the STROBE statement (https://www.strobe-statement.org/)): could someone else critically appraise your study and/or replicate it?
6. Publish your (anonymized) data if possible: can you enable others to replicate your analysis and/or address additional research questions?
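Several of the checks in item 1 can be automated. The sketch below uses Python with pandas; the dataset and all column names are invented for illustration and are not taken from any study discussed here.

```python
import pandas as pd

# Hypothetical dataset: one row per participant (names and values illustrative)
df = pd.DataFrame({
    "age_years": [12, 14, None, 17],
    "cfs_case": [1, 0, 1, 0],            # 1 = case, 0 = control
    "test_time_s": [61.2, 48.9, 70.1, None],
})

# 1a. Data entry checks: implausible values show up in the summary statistics
print(df.describe())

# 1c. How much data is missing, per column?
print(df.isna().sum())

# 1e. Distributional shape of continuous data: skewness near 0 suggests
# symmetry (also inspect a histogram before choosing a statistical test)
print(df["test_time_s"].skew())
```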

Celia Brown

Abstract This paper outlines the common statistical methods used to analyze four types of observational study: ecological studies, cross-sectional studies, case–control studies and cohort studies. Three statistical methods are considered in detail: correlation coefficients, t-tests for the difference between two means from independent samples and odds ratios. For each method, the need for the same four key outputs is highlighted: the measure of effect, its precision (95% confidence interval), its statistical significance (p-value) and its clinical significance (comparison to the minimum important difference). Where possible, this paper explains the derivation of measures of effect and statistical significance to help readers understand where the numbers come from. The paper also highlights other important aspects of analyzing the quantitative data collected in an observational study, potential biases to consider, and good practice in reporting. Examples from relevant pediatric studies are used where appropriate.

Keywords case–control study; cohort study; cross-sectional study; ecological study; statistical methods


Introduction

Observational studies come in a variety of forms and have a variety of aims. In general, observational studies seek to estimate the amount of disease or health in a specified area at (in) a specified time (period) and increase our understanding of the causes, risk factors and/or protective factors for disease. In this article we explain how to analyze the data collected for four common forms of observational study: ecological, cross-sectional, case–control and cohort, using appropriate statistical methods. There are four key outputs from a statistical analysis, regardless of the study design: the measure of effect, its precision (confidence interval), its statistical significance (p-value) and its clinical significance (comparison with the minimum important difference or equivalent). There are several aspects of observational studies that are not explicitly considered in this article, yet are either prerequisites to data analysis or follow on from it as good practice in reporting. These aspects are listed in Box 1.

Ecological studies

Ecological studies generally use area- or national-level data on two variables that are hypothesized to be related. Our example (Althabe et al., 2006) uses data on caesarean section rates and neonatal mortality at country level, with countries analyzed by income group. The authors hypothesized a negative relationship between these two variables for low-income countries, in which the overall need for medically-justified caesarean sections is not met (such that increasing the rate would reduce mortality) but the rate of medically-unjustified sections is very low (and would therefore have little, if any, effect on the mortality rate). Data for 119 countries were included in the study (59 of which were in the low-income group) and, as per good practice, the data were made publicly available. As with many ecological studies, both the independent (caesarean section rate) and dependent (neonatal mortality rate) variables in this study were measured on a ratio scale, so the data can be visualized on a scatter diagram and analyzed using a correlation coefficient, which is the measure of effect. A correlation coefficient describes the strength of the relationship between two variables, and can range from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation). If you imagine a line of best fit drawn through the points on the scatter diagram, the correlation coefficient tells us how close the average point is to this line. It does not tell us about the gradient of the line (or its equation for non-linear relationships). Correlation coefficients can be calculated for ordinal, interval and ratio data, although different methods are suitable for different types and distributional shapes of data and the nature of the relationship between the two variables (Table 1). Correlation coefficients can be calculated "by hand", but doing so does not do much to aid understanding of the metric. However, a correlation coefficient should never be computed without first checking the nature of the relationship between the two variables on a scatter diagram. The authors of our example study used the natural logarithm of neonatal mortality rates to ensure the required assumption of a

Celia Brown PhD, Associate Professor in Quantitative Research, University of Warwick, Warwick Medical School, Coventry, UK. Conflicts of interest: none declared.

PAEDIATRICS AND CHILD HEALTH xxx:xxx


© 2019 Elsevier Ltd. All rights reserved.

Please cite this article as: Brown C, How to analyze an observational study, Paediatrics and Child Health, https://doi.org/10.1016/j.paed.2019.11.005


linear (straight line) relationship between the two variables was met, so they could use a Pearson's correlation coefficient (Figure 1a and b). A negative correlation of r = -0.775 was found for the low-income group of countries, which was statistically significant (p < 0.001). Cohen's rules of thumb for interpreting the strength of an association from a correlation coefficient are: more than 0.1 = small, more than 0.25 = medium and more than 0.4 = large (in absolute value). The authors did not give a 95% confidence interval for the correlation coefficient, but this would be -0.860 to -0.647, all of which is above the minimum value for a large effect size, so we expect this association is clinically significant.

To help us further interpret the clinical significance of a correlation coefficient, it is helpful to calculate r-squared (r2), the coefficient of determination, which is the square of the correlation coefficient. In this case, the r2 value is 0.6, or 60%. The formal definition of r2 is that it quantifies the amount of variation in the dependent variable that is explained by variation in the independent variable. This definition is not particularly intuitive, but it can be explained with the help of Figure 1c. The vertical distance of each country's point from the diagonal line of best fit (plotted using a linear regression) is the residual (error) for that point. The aim of a regression is to plot the line that minimizes the sum of the squared residuals across all of the data points (they are squared because some are positive and some are negative). The sum of the squared residuals for the data plotted in Figure 1c is 9.086; this is the variability left "unexplained" by the regression model. If there were no relationship at all between the caesarean section rate and neonatal mortality, the line of best fit would be horizontal and would lie at the mean neonatal mortality rate of 3.01 on the log scale. The sum of the squared residuals from this horizontal line is 22.774. Therefore using the caesarean section rate reduces the unexplained variation to

Choosing a correlation coefficient

Correlation coefficient   Type(s) of data              Distributional shape             Nature of relationship
Pearson                   Interval or ratio            Normal                           Linear
Spearman                  Ordinal, interval or ratio   Any                              Monotonic
Kendall tau-b             Ordinal, interval or ratio   Any (small samples, many ties)   Monotonic

Table 1
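The choices in Table 1 can be illustrated in a few lines of Python using scipy; the data points below are invented (y = x squared gives a relationship that is monotonic but not linear).

```python
from scipy import stats

# Toy data: monotonic but non-linear (illustrative values, not from the study)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 4, 9, 16, 25, 36, 49, 64]   # y = x**2

pearson_r, _ = stats.pearsonr(x, y)      # assumes a linear relationship
spearman_rho, _ = stats.spearmanr(x, y)  # rank-based: monotonic is enough
kendall_tau, _ = stats.kendalltau(x, y)  # preferred with small N / many ties

print(pearson_r, spearman_rho, kendall_tau)
```

Spearman and Kendall report a perfect monotonic association (1.0), while Pearson is high but below 1 because the relationship is not linear: a reminder to inspect the scatter diagram before choosing a coefficient.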

Figure 1 The relationship between caesarean section rate and neonatal mortality rate in 59 low-income countries (data from Althabe et al., 2006). (a) Using the "raw" neonatal mortality data (the relationship is a curve). (b) Using the natural logarithm (LN) of the neonatal mortality data (the relationship is now linear). (c) With the linear line of best fit (diagonal line) and mean natural logarithm (LN) of the neonatal mortality rate (horizontal line).


9.086, so the proportion of the variation explained is (22.774 - 9.086)/22.774 = 0.6, or 60%. The remaining variation is due to other factors, which could include the proportion of women having a skilled attendant at birth or clean cord practices following delivery. Ecological studies generally use secondary data and can be quick and cheap to undertake. However, they can only tell us about associations, and extreme care is required in interpreting their results. Confounding and the ecological fallacy are key potential weaknesses that need to be considered when analyzing data from, or critically appraising, an ecological study.
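The residual-based derivation of r2 described above can be verified numerically. The sketch below uses simulated data standing in for the Althabe et al. dataset (the variable names and generated values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 59)                   # e.g. caesarean section rate
y = 4.0 - 0.2 * x + rng.normal(0, 0.5, 59)   # e.g. LN(neonatal mortality)

slope, intercept = np.polyfit(x, y, 1)               # line of best fit
ss_res = np.sum((y - (slope * x + intercept)) ** 2)  # "unexplained" variation
ss_tot = np.sum((y - y.mean()) ** 2)                 # variation around the mean

r_squared = (ss_tot - ss_res) / ss_tot   # proportion of variation explained
r = np.corrcoef(x, y)[0, 1]              # Pearson correlation

# The residual-based quantity equals the squared correlation coefficient
print(r_squared, r ** 2)
```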

Cross-sectional studies

Cross-sectional studies provide a "snapshot" of what is happening in a population at a particular point in time. Data are gathered from a representative sample of the population of interest, often using surveys. They are used to address a variety of research questions and can be used to compare different populations, as in our example study (Sulheim et al., 2015). As such, a variety of approaches to analysis are used; the optimal approach will depend on the nature of the research question and the type of data collected. A common approach to analysis is the use of odds ratios, which are explained in the section on case–control studies below. Our example study compares cognitive function in adolescents aged 12–18 with and without chronic fatigue syndrome. Although two groups (cases and controls) are being compared, this study is still a cross-sectional design because it is seeking contemporaneous associations between disease status and an outcome; it is not looking backwards to determine risk factors for the disease, or forwards to examine the future consequences of having a risk factor or a disease. Seven measures of cognitive function were assessed in the study, so we can imagine a data collection spreadsheet with one row for each individual, and columns for disease status, personal characteristics such as gender, and scores/times on each of the seven measures. Here we use the Color-Word Interference Test assessment of cognitive inhibition to illustrate the use of differences in mean scores between groups as the outcome measure.

The time in seconds taken to complete the test was recorded for each individual; as these times were approximately normally distributed within each group, they can be summarized using the mean and standard deviation (those with chronic fatigue syndrome: N = 120, mean 59.7 s, SD 15.2 s; those without: N = 39, mean 53.5 s, SD 14.0 s). We want to know if the difference in mean times of 6.2 s is worth writing home about. First, we need to know the magnitude of this difference in a format that is easy to interpret, and for this we use the "standardized" difference in means as our effect size. The difference in means is standardized with respect to the pooled standard deviation, which is the standard deviation for the entire study sample of 159 adolescents: 15.1 s. The effect size is therefore 6.2/15.1 = 0.41 standard deviations. Cohen's rules of thumb here are: more than 0.2 = small, more than 0.5 = medium and more than 0.8 = large; the authors of the paper were seeking at least medium effect sizes (i.e. the minimum important difference was 0.5 standard deviations), so the difference in means is not clinically significant. (However, until we have calculated the 95% confidence interval for the difference in means we cannot rule out that it could be clinically significant.) Second, we want to know if this difference in means is statistically significant, i.e. whether we can reject the null hypothesis that the difference in means is 0. Because we have continuous, normally distributed data from two independent groups, we can use the Student t-test for independent samples (other options for continuous data are shown in Table 2). In reality, we would use a computer program to conduct a Student t-test, but the process is outlined below as it may help the reader understand where the t-statistic and p-value for the test come from.

We need to remember that we have data from a sample of the whole population with and without chronic fatigue; with a different sample we would most likely have found different means and thus a different difference between those means. We use our pooled standard deviation and the sample sizes to estimate the standard error of the difference in means: the likely amount of variation in the differences in means that we would have got with different samples from our populations. Our

Choosing a statistical test for comparing averages (means or medians) across groups

Number of groups   Relationship between groups                             Distribution of data   Most appropriate statistical test
2                  Independent/unpaired (different individuals per group)  Normal                 Student t-test (independent samples)
2                  Paired (2 observations from each individual)            Normal                 Student t-test (paired data)
2                  Independent/unpaired                                    Skewed                 Mann Whitney U test
2                  Paired                                                  Skewed                 Wilcoxon signed rank test
>2                 Independent/unpaired                                    Normal                 One-way ANOVA
>2                 Paired                                                  Normal                 Repeated measures ANOVA
>2                 Independent/unpaired                                    Skewed                 Kruskal–Wallis test
>2                 Paired                                                  Skewed                 Friedman test

Use a two-sided test unless there is a good reason not to!

Table 2


standard error of the difference in means is 2.75 s, and we use this to draw the most likely distribution of the differences in means that we would get if we took repeated random samples from the same populations when the null hypothesis is true (Figure 2): it is a t-distribution with mean 0, standard deviation 2.75 and 157 degrees of freedom (the total sample size minus 2). We look to see how often we would have got a difference in means at least as large as 6.2 s (as either a positive or negative value) if the null hypothesis is true, by calculating the area under the curve above 6.2 and below -6.2 respectively, as a proportion of the area under the whole curve (this is known as a two-sided test). 6.2 s is 2.25 standard errors of the difference in means (6.2/2.75 = 2.25), and this standardized difference in means is the test statistic (the t-statistic) for our statistical test. The area under the curve (combining both ends) is 0.026 of the area under the entire curve (we could find this out by looking up 2.25 in the table of the t-distribution given our sample size and for a two-sided test), and this is the p-value of the test: the probability of getting a difference in means in our sample of at least 6.2 s if the null hypothesis is true in the population. Using the "standard" critical p-value of 0.05, we would reject the null hypothesis and conclude that response times for the test are slower for those with chronic fatigue compared to those without. With a p-value below 0.05, the whole of the 95% confidence interval for the difference in means (0.8 to 11.7 s) is above 0, and does not rule out a clinically significant effect size of at least 0.5 (at the top end of the 95% confidence interval, the effect size would be 0.77). Cross-sectional studies can provide useful evidence of associations between variables, but because they collect data at a single point in time, they cannot be used to establish causation.
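In practice the whole test can be run directly from the summary statistics quoted above. A sketch in Python using scipy (the pooled-variance Student t-test, as in the text):

```python
from scipy import stats

# Summary statistics reported in the example study
mean_cfs, sd_cfs, n_cfs = 59.7, 15.2, 120   # with chronic fatigue syndrome
mean_ctl, sd_ctl, n_ctl = 53.5, 14.0, 39    # without

t_stat, p_value = stats.ttest_ind_from_stats(
    mean_cfs, sd_cfs, n_cfs,
    mean_ctl, sd_ctl, n_ctl,
    equal_var=True,   # pooled-variance Student t-test, df = 157
)
print(round(t_stat, 2), round(p_value, 3))   # reproduces the reported p of 0.026
```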
It is also important to consider the number of analyses undertaken (and reported); in our example study, seven measures of cognitive function were used but the "standard", single-outcome critical p-value of 0.05 was applied. As more tests are conducted, the risk of a false positive result (rejecting a null hypothesis that is true, a type I error) increases, so it is good practice to apply a correction factor and set a lower critical p-value when undertaking multiple statistical tests. The easiest correction is the Bonferroni adjustment, in which you divide 0.05 by the number of tests undertaken; but this is conservative and instead increases the risk of a false negative result (not rejecting a null hypothesis that is false, a type II error).
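For example, with seven tests the Bonferroni-adjusted critical p-value is 0.05/7, roughly 0.0071. The p-values below are invented purely for illustration:

```python
# p-values from several hypothetical tests (illustrative numbers only)
p_values = [0.026, 0.004, 0.210, 0.050, 0.001, 0.700, 0.012]

n_tests = len(p_values)              # e.g. seven measures of cognitive function
alpha = 0.05
bonferroni_alpha = alpha / n_tests   # 0.05 / 7, roughly 0.0071

significant = [p for p in p_values if p < bonferroni_alpha]
print(significant)   # only the smallest p-values survive the correction
```

Note that a p-value of 0.026, significant against the usual 0.05 threshold, would not survive this correction.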

Figure 2 Determining the statistical significance (p-value) of the difference in means. The difference in means was 6.2 s (with a standard error of 2.75 s), so we find the area to the right of +6.2 and to the left of -6.2 on the t-distribution with mean 0, SD 2.75 and 157 degrees of freedom.

Case–control studies

Case–control studies are appropriate to explore risk factors for disease, particularly when the disease in question is rare. The idea is to "over-sample" those with the disease (cases) and compare their exposure to one or more risk factors with that of a group of those without the disease (controls). Cases are over-sampled relative to the incidence or prevalence of the disease in the population (if individuals were selected at random from the population of interest, too few of them would have the disease to enable a precise estimate of the effect of a risk factor to be made, because the disease is rare). Controls can be "matched" or "unmatched" to cases, and the method of analysis depends on whether there is matching (and, if so, how many controls are matched to each case). Both the outcome/dependent variable (disease status: case or control) and the explanatory/independent variable (exposure to risk factor) are usually binary or dichotomous (1/0) variables. This means that the results of simple case–control studies can be shown in what is known as a 2 × 2 table (example in Box 2) and are analyzed using the odds ratio as the measure of effect (together with its 95% confidence interval). An odds ratio is the odds of exposure (to the risk factor) for cases relative to (divided by) the odds of exposure for controls. As with the coefficient of determination, this definition can be hard to conceptualize. Imagine 50 cases are recruited, of whom 45 are found to have been exposed to the risk factor in question; we can calculate that five cases were not exposed. The ratio of exposed to unexposed cases is 45:5 or 9:1. For every case that was not exposed, there were nine cases who were exposed; the odds of exposure for cases is nine. Now imagine 50 (unmatched) controls are recruited, of whom 20 were exposed and hence 30 were not exposed. The ratio of exposed to unexposed controls is 20:30 or (approximately) 0.67:1. For every control that was not exposed, there were 0.67 controls who were exposed; the odds of exposure for controls is 0.67. The odds ratio is then 9/0.67 or 13.5. Cases have over thirteen-fold higher odds of having been exposed to the risk factor. However, what is perhaps more useful is that we can also say that those who were exposed have over thirteen-fold higher odds of being a case. (Try and see if you can show this using algebra!)

If the odds of exposure were the same for cases as for controls, the odds ratio would be 1; it would be lower than 1 if the odds of exposure were lower for cases than for controls, i.e. if the exposure was protective against being a case. Cohen's rules of thumb for interpreting odds ratios are: more than 1.5 or less than 0.67 = small, more than 2.5 or less than 0.4 = medium, and more than 4.3 or less than 0.23 = large. A statistical significance test for an odds ratio will use the null hypothesis value of 1. There is a very wide 95% confidence interval for the odds ratio using the data above, of 4.6 to 39.9 (note the confidence interval is not symmetric around the odds ratio itself). As the entire confidence interval is above 1, the odds ratio is statistically significantly different from 1 at p < 0.05 (indeed, p < 0.001), and we reject the null hypothesis of there being no association between exposure and being a case. This result is likely to be clinically significant as the whole of the 95% confidence interval exceeds Cohen's "large" effect size, but there remains considerable uncertainty as to what the "true" odds ratio is. The reason for using the odds ratio, rather than the risk ratio (relative risk), is explained in Box 2. The vast majority of published case–control studies use more complicated statistical methods than a 2 × 2 table and corresponding odds ratio as in the example above. This enables the researchers to consider the effect of multiple risk factors (including continuous variables such as age, rather than just binary variables such as exposure) and control for confounders in a single analysis; the method most often used for this is binary logistic regression. Having controlled for confounders, the results are displayed as adjusted odds ratios. Nevertheless, the measure of effect in logistic regression is still an odds ratio, and this measure can be interpreted in the same way as for a univariate (2 × 2) analysis.

To recap, the odds ratio gives the odds of exposure for cases relative to that for controls: an odds ratio of 2 is a doubling of the odds (an increase of 100%) and an odds ratio of 0.5 is a halving of the odds (a decrease of 50%). Case–control studies have the advantage of being fairly quick and cheap (compared to cohort studies in particular) and are useful to explore risk factors for rare diseases. One important source of bias is a difference in response rates between cases and controls: in a study looking at risk factors for pediatric inflammatory bowel disease, Jakobsen and colleagues (2013) report response rates of 91% for cases and 45% for controls; we do not know if the controls who responded were representative of all controls. Another key concern is recall bias: cases may be "searching" for potential causes of their disease and may have different recall to controls.

Why odds and not risk
The 2 × 2 table for the example in the text, also including totals, is shown in the first four columns of the table below. We calculated the odds ratio as 13.5. In a case–control study, cases are almost always over-sampled (in our example we had one control for every case, giving an apparent incidence of 50%). Let's see what the odds ratio and relative risk would have been had we included enough controls so that the proportion of cases in the study matched the proportion of cases in the population. We cannot do this in real life because only very rarely do we know the true incidence or prevalence of the disease. If the true incidence of our disease in our total population of 2,000 was just 2.5%, and had we included all 50 cases, then there would be 1,950 people in the population without the disease. If we assume that we had randomly sampled 50 people without the disease, the exposure status of the remaining 1,900 should be in the same proportions as those we had sampled, meaning that of the total 1,950 disease-free individuals, 40% (N = 780) would have been exposed and 60% (N = 1,170) would not have been. The odds of exposure for all those without the disease is still 0.67 and so the odds ratio is still 13.5. The relative risk is calculated by considering the rows of the table, i.e. exposure status, rather than the columns as for the odds ratio. At population level, the risk (probability) of having the disease amongst those exposed is 45/825 = 0.055 and the risk amongst those not exposed is 5/1,175 = 0.004, giving a relative risk of 0.055/0.004 = 12.8, which is similar to the odds ratio. However, using the data from cases and our sample of controls, the risk of having the disease for those exposed is 45/65 = 0.69, and the risk for those not exposed is 5/35 = 0.14. The relative risk is 0.69/0.14 = 4.84. Being exposed is still shown to increase the risk of having the disease, by a factor of just under five, but this is well below the effect of the exposure estimated using the odds ratio: calculating the relative risk from a case–control sample distorts the true magnitude of the effect of a risk factor.

             Cases   Controls   Total      Controls       Total
                     (sample)   (sample)   (population)   (population)
Exposed        45       20         65          780            825
Unexposed       5       30         35        1,170          1,175
Totals         50       50        100        1,950          2,000

Box 2
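The worked example can be reproduced in a few lines. The 95% confidence interval below uses the standard log-odds-ratio (Woolf) method, which matches the interval quoted in the text:

```python
import math

# 2 x 2 table from the worked example
exposed_cases, unexposed_cases = 45, 5
exposed_controls, unexposed_controls = 20, 30

odds_cases = exposed_cases / unexposed_cases           # 9.0
odds_controls = exposed_controls / unexposed_controls  # ~0.67
odds_ratio = odds_cases / odds_controls                # 13.5

# 95% CI via the standard error of the log odds ratio (Woolf's method)
se_log_or = math.sqrt(1/45 + 1/5 + 1/20 + 1/30)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
print(round(odds_ratio, 1), round(lo, 1), round(hi, 1))   # 13.5, CI 4.6 to 39.9

# The "relative risk" computed naively from the sample is misleading,
# because cases were over-sampled (see Box 2)
rr_sample = (45 / 65) / (5 / 35)
print(round(rr_sample, 2))   # well below the odds ratio of 13.5
```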


Cohort studies

Cohort studies, whether prospective or retrospective, can provide evidence of causality when randomization is not possible: the British Doctors Study demonstrating the health risks of smoking is probably the most famous example of a cohort study. As with case–control studies, the underlying data required to answer the research question(s) can usually be presented in a 2 × 2 table: exposure to a risk factor and whether or not the health outcome being considered subsequently occurred are both shown as binary variables (yes/no).



However, because they are not randomized, confounding is still a potential problem. As a result, researchers will usually collect data on a wide range of potential confounding variables at baseline (e.g. participant demographics such as socioeconomic status) and use these to undertake "adjusted" analyses, as described above for case–control studies. Most cohort studies include a random sample of the whole population of interest (i.e. regardless of exposure status) and follow the entire cohort to see which health outcomes occur. An example is the Avon Longitudinal Study of Parents and Children, which aimed to identify risk factors in early life (up to 3 years) for obesity at age 7 (Reilly et al., 2005). Here, maternal education (as a proxy for socioeconomic status) was considered a key potential confounder. While in cohort studies such as Reilly et al.'s it is possible to calculate incidence and hence use relative risk as the measure of effect, most researchers (including in this example) actually use logistic regression methods and present their results as odds ratios. However, this convention does not preclude the use of other approaches to both cohort study design and analysis. For example, Rees et al. (2004) compared psychiatric morbidity amongst children and parents between children admitted to pediatric intensive care and those admitted to general pediatric wards. Outcomes between the two groups were compared with the Mann Whitney test (see Table 2) or the chi-squared test of association (see Chapter 8 of Swinscow in Further Reading). A key potential weakness of cohort studies is attrition/loss to follow-up: those lost from the study may be systematically different to those who remain, or there may be differential attrition rates by exposure status, which could bias the results. Providing a comparison of the initial and final cohort characteristics is therefore an important part of the analysis of a cohort study.
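The chi-squared test of association mentioned above can be run on a 2 × 2 table in a couple of lines with scipy. The counts below are invented for illustration (this is not the Rees et al. data):

```python
from scipy.stats import chi2_contingency

# Hypothetical cohort 2 x 2 table: rows = exposed/unexposed,
# columns = outcome occurred / did not occur (counts are illustrative)
table = [[30, 70],    # exposed
         [15, 85]]    # unexposed

# For a 2 x 2 table scipy applies Yates' continuity correction by default
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), round(p, 3), dof)
```

A p-value below 0.05 here would suggest an association between exposure and outcome, subject to the usual caveats about confounding.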

Conclusion

As with other aspects of study design, deciding on the methods of data analysis that will be employed should form part of the study planning phase and be documented in the study protocol (which should be published if possible). However, not all studies go according to plan and sometimes changes to the planned data analyses are required; again, these need to be documented and the reasons for such changes explained. There is not always one and only one appropriate statistical method for analyzing a dataset, although different methods applied to the same data should result in the same conclusions being drawn (if they do not, it is likely that one method was not actually appropriate!). In most situations, an early conversation with a statistician will be fruitful, just as early conversations with patient and public representatives are useful. Keep both of these groups in mind as you progress through the study: staying true to statistical principles, but thinking about how you will present and explain your results (effect size, confidence interval, p-value and clinical significance) to those who will use and/or benefit from the research.

Funding

CB is supported by the National Institute for Health Research Collaboration for Leadership in Applied Health Research and Care West Midlands (NIHR CLAHRC WM), now recommissioned as NIHR Applied Research Collaboration West Midlands. The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

FURTHER READING
Althabe F, Sosa C, Belizán JM, Gibbons L, Jacquerioz F, Bergel E. Cesarean section rates and maternal and neonatal mortality in low-, medium-, and high-income countries: an ecological study. Birth 2006; 33: 270–7.
Bland JM, Altman DG. The odds ratio. BMJ 2000; 320: 1468. https://doi.org/10.1136/bmj.320.7247.1468.
Davies HTO, Crombie IK, Tavakoli M. When can odds ratios mislead? BMJ 1998; 316: 989–91. https://doi.org/10.1136/bmj.316.7136.989.
Gøtzsche PC. Believability of relative risks and odds ratios in abstracts: cross sectional study. BMJ 2006; 333: 231–4.
Jakobsen C, Paerregaard A, Munkholm P, Wewer V. Environmental factors and risk of developing paediatric inflammatory bowel disease. A population based study 2007–2009. J Crohn's Colitis 2013; 7: 79–88.
Jogalekar A. Chocolate consumption and Nobel prizes: a bizarre juxtaposition if ever there was one. 2012. Retrieved from https://blogs.scientificamerican.com/the-curious-wavefunction/chocolate-consumption-and-nobel-prizes-a-bizarre-juxtaposition-if-there-ever-was-one/.
Messerli FH. Chocolate consumption, cognitive function, and Nobel Laureates. NEJM 2012; 367: 1562–4. https://doi.org/10.1056/NEJMon1211064.
Rees G, Gledhill J, Garralda ME, Nadel S. Psychiatric outcome following paediatric intensive care unit (PICU) admission: a cohort study. J Intensive Care Med 2004; 30: 1607–14.
Reilly JJ, Armstrong J, Dorosty AR, et al. Early life risk factors for obesity in childhood: cohort study. BMJ 2005; 330: 1357.
Skelly AC, Dettori JR, Brodt ED. Assessing bias: the importance of considering confounding. Evid Based Spine Care J 2012; 3: 9–12.
Swinscow T. Statistics at square one. London: BMJ Publishing, 1997.
You may also find the StatQuest YouTube videos by Josh Starmer useful, such as this one on logistic regression: https://www.youtube.com/watch?v=yIYKR4sgzI8.
