WASP (Write a Scientific Paper) using Excel – 6: Standard error and confidence interval



Early Human Development xxx (xxxx) xxx–xxx

Contents lists available at ScienceDirect

Early Human Development journal homepage: www.elsevier.com/locate/earlhumdev

Victor Grech

Academic Department of Paediatrics, Mater Dei Hospital, Malta

ARTICLE INFO

Keywords: Software; Computers; Statistics; Biostatistics

ABSTRACT

The calculation of descriptive statistics includes the calculation of standard error and confidence interval, an inevitable component of data analysis in inferential statistics. This paper provides pointers as to how to do this in Microsoft Excel™.

1. Introduction

The previous paper explained the derivation of standard deviation (SD, σ, sigma), and what it represents [1]. This paper will show how SD can be used to derive further information about the population being sampled.

2. Standard error of the mean

A series of repeated samples from the same population will have different means, purely due to the random nature of sampling. If analysed, the means of these samples are approximately normally distributed around the true (and usually unknown) population mean. This series of means therefore has a standard deviation, just like an individual dataset. Commonly, however, researchers have only one dataset, which constitutes a single sample of the population under study. The standard error of the mean of a single sample is an estimate of the standard deviation of the distribution of means that would be obtained were a series of such samples available for analysis. This value is referred to as the standard error of the mean and is calculated by dividing the standard deviation of the sample by the square root of the number of observations in the sample, i.e. SD/√n. The sample standard deviation is itself calculated with a divisor of n − 1 rather than n, a modification known as Bessel's correction. This was explained in the previous paper in this series [1] and the calculations are reproduced in Table 1. The quick Excel-generated summary statistics were also explained in the previous paper and are reproduced in Fig. 1.

3. Utility of standard error

Standard errors are crucial in order to study the significance of the differences between means as part of inferential testing, such as in t-tests. While these are calculated by Excel, it is important to understand software-generated outputs and not just quote p values in a paper's results section. As already explained [1], the standard deviation of a sample will not systematically decrease with increasing sample size, but will more accurately reflect the population standard deviation. The standard error, however, is a measure of the precision of an estimate of the population mean. The standard error thus decreases with increasing sample size.
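The behaviour of the standard error can be illustrated with a small simulation outside Excel. The following is a minimal sketch in Python, not part of the original method. It assumes a hypothetical normally distributed population with mean 1.5 and SD 0.84 (values borrowed from the worked example purely for illustration), and the function name, seed and number of repeats are arbitrary choices.

```python
import math
import random
import statistics

# Hypothetical population parameters (an assumption for illustration only).
POP_MEAN = 1.5
POP_SD = 0.84


def sd_of_sample_means(n, repeats=2000, seed=1):
    """Draw `repeats` random samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


for n in (15, 60, 240):
    observed = sd_of_sample_means(n)
    predicted = POP_SD / math.sqrt(n)  # standard error formula: SD / sqrt(n)
    print(f"n={n:3d}  SD of sample means={observed:.3f}  SD/sqrt(n)={predicted:.3f}")
```

Under these assumptions, the spread of the simulated sample means should track SD/√n closely, roughly halving with each fourfold increase in sample size, while the SD of the individual observations stays the same.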

4. Confidence interval

Returning to the small sample (n = 15) of urinary lead levels, the mean was 1.5 μmol/24 h with a standard deviation of 0.84. The standard error is therefore SD/√n, i.e. 0.84/√15 = 0.22. The 95% confidence interval is the range 1.5 ± (1.96 × 0.22) = 1.5 ± 0.43, i.e. 1.07 to 1.93 μmol/24 h. This implies that, were the population to be repeatedly sampled and an interval calculated in this way from each sample, approximately 95% of those intervals would include the true population mean; there is only a 5% chance that an interval so constructed excludes it. Bayesian statisticians use a related but distinct quantity, the credible interval, which they define as the range of possible values that the real population mean could have, with 95% probability. The standard error of a mean therefore allows a statement of probability with regard to the difference between the mean of the sample and the true population mean. This also implies that, by chance alone, the mean of any particular sample has an approximately 5% chance of lying more than about two standard errors above or below the true population mean.
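The arithmetic above can be reproduced outside Excel. The following is a minimal sketch in Python using only the summary values quoted in the text. The normal multiplier is computed, whereas the t multiplier for 14 degrees of freedom (2.145) is an assumption taken from standard tables rather than derived here. It is included because Excel's CONFIDENCE.T function is based on the t-distribution, which would explain why Table 1 reports 0.47 rather than 0.43.

```python
import math
from statistics import NormalDist

# Summary values quoted in the text for the urinary lead sample.
mean, sd, n = 1.5, 0.84, 15

se = sd / math.sqrt(n)  # standard error of the mean, SD / sqrt(n)

# Two-sided 95% multiplier from the standard normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)
half_width_z = z * se

# Assumption: two-sided 95% critical value of Student's t for n - 1 = 14
# degrees of freedom, taken from standard tables rather than computed here.
T_CRIT_14_DF = 2.145
half_width_t = T_CRIT_14_DF * se

print(f"SE = {se:.2f}")
print(f"Normal-based 95% CI: {mean - half_width_z:.2f} to {mean + half_width_z:.2f}")
print(f"t-based half-width: {half_width_t:.2f}")
```

With these inputs the normal-based half-width rounds to 0.43 and the t-based half-width to 0.47. The gap between the two narrows as the sample size grows.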

https://doi.org/10.1016/j.earlhumdev.2018.01.013

0378-3782/ © 2018 Elsevier B.V. All rights reserved.

Please cite this article as: Grech, V., Early Human Development (2018), https://doi.org/10.1016/j.earlhumdev.2018.01.013


Fig. 1. Excel summary statistics.

Table 1. Calculation of standard error and confidence interval, with the Excel formula shown beside each computed value.

Manual calculation (data in cells B3:B17):

Row    Urinary lead (μmol/24 h) [B]    Differences from mean [C]    Differences squared [D]
3      0.1                             −1.4                         1.96
4      0.4                             −1.1                         1.21
5      0.6                             −0.9                         0.81
6      0.8                             −0.7                         0.49
7      1.1                             −0.4                         0.16
8      1.2                             −0.3                         0.09
9      1.3                             −0.2                         0.04
10     1.5                             0                            0
11     1.7                             0.2                          0.04
12     1.9                             0.4                          0.16
13     1.9                             0.4                          0.16
14     2                               0.5                          0.25
15     2.2                             0.7                          0.49
16     2.6                             1.1                          1.21
17     3.2                             1.7                          2.89

Total            22.50    Sum of differences squared    9.96
n                15       Variance                      0.71    =D19/(15-1)
Mean             1.50     Std. dev.                     0.84    =SQRT(D20)

Summary statistics (values in column G):

Row    Statistic                   Value     Formula
3      Mean                        1.50      =AVERAGE(B3:B18)
4      Standard error              0.22      =G7/SQRT(G15)
5      Median                      1.50      =MEDIAN(B3:B18)
6      Mode                        1.90      =MODE(B3:B18)
7      Standard deviation          0.84      =STDEV.S(B3:B17)
8      Sample variance             0.71      =VAR.S(B3:B17)
9      Kurtosis                    −0.21     =KURT(B3:B17)
10     Skewness                    0.22      =SKEW(B3:B17)
11     Range                       3.10      =G13-G12
12     Minimum                     0.10      =MIN(B3:B17)
13     Maximum                     3.20      =MAX(B3:B17)
14     Sum                         22.50     =SUM(B3:B17)
15     Count                       15        =COUNT(B3:B17)
16     Confidence level (95.0%)    0.47      =CONFIDENCE.T(0.05,G7,G15)
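The column-wise calculation in Table 1 can be cross-checked outside Excel. The following is a minimal sketch in Python, offered as an independent check rather than as part of the Excel workflow. The variable names are illustrative, and only the standard library is assumed.

```python
import math
import statistics

# Urinary lead values (umol/24 h) from column B of Table 1.
lead = [0.1, 0.4, 0.6, 0.8, 1.1, 1.2, 1.3, 1.5, 1.7, 1.9, 1.9, 2.0, 2.2, 2.6, 3.2]

n = len(lead)
mean = math.fsum(lead) / n

# Differences from the mean and their squares, as in the table.
differences = [x - mean for x in lead]
squared = [d ** 2 for d in differences]

sum_of_squares = math.fsum(squared)
variance = sum_of_squares / (n - 1)  # Bessel's correction, as in =D19/(15-1)
sd = math.sqrt(variance)             # as in =SQRT(D20)
se = sd / math.sqrt(n)               # as in =G7/SQRT(G15)

# The standard library returns the same sample variance and SD directly.
assert math.isclose(variance, statistics.variance(lead))
assert math.isclose(sd, statistics.stdev(lead))

print(f"n={n}  mean={mean:.2f}  sum of squares={sum_of_squares:.2f}")
print(f"variance={variance:.2f}  SD={sd:.2f}  SE={se:.2f}")
print(f"median={statistics.median(lead):.2f}  mode={statistics.mode(lead):.2f}")
```

Rounded to two decimal places, these values agree with those in Table 1 (mean 1.50, variance 0.71, SD 0.84, SE 0.22).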

Similarly, were two samples to be compared in order to ascertain whether they truly represent two different populations, there is also a 5% chance that:

1. The two samples may be found to arise from the same population even if they do not, or
2. The two samples may be found to be different even when they arise from the same population.

All of the equations needed to obtain the abovementioned values dynamically are detailed in Table 1, while Fig. 1 shows how to obtain them as one-off values using Excel's Analysis ToolPak. Further conclusions and deductions, i.e. inferential statistics, will be dealt with in the rest of this series of papers.

5. Use of standard deviation, standard error and confidence interval

The variance and, better still, the standard deviation are used to describe the properties of a dataset. They approximate the corresponding values in the population studied, but strictly describe only the sample subjects studied. The standard error is used to draw inferences from the samples observed and extend such findings to the reference population, including the comparison of means of different study groups. Indeed, the next paper in this series will deal with the t-distribution [3].

Acknowledgments

The inspiration for this series of papers arises from Thomas Douglas Victor Swinscow's original series of papers in the 1970s entitled “Statistics at Square One” [2] as well as the Excel-based statistics talks prepared for the international Write a Scientific Paper course (WASP – http://www.ithams.com/wasp) [4,5]. I would also like to thank Dr. Neville Calleja (Director, Department of Health Information & Research, Ministry of Health, the Elderly and Community Care) for reviewing these manuscripts.

Conflict of interest statement

There are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

References

[1] V. Grech, Biomedical Statistics using Excel – 5: Quartiles and Standard Deviation, 2018 (previous paper in this series).
[2] T. Swinscow, Statistics at square one, Br. Med. J. 1 (6020) (1976) 1240.
[3] V. Grech, WASP (Write a Scientific Paper) Using Excel – 7: The t-Distribution, 2018 (next paper in this series).
[4] V. Grech, WASP – Write a Scientific Paper course: why and how, J. Vis. Commun. Med. 40 (3) (2017) 130–134.
[5] V. Grech, S. Cuschieri, Write a Scientific Paper (WASP) - a career-critical skill, Early Hum. Dev. (2018), http://dx.doi.org/10.1016/j.earlhumdev.2018.01.001 (in press).
