The sampling distribution


STATISTICS AND RESEARCH DESIGN

Nikolaos Pandis
Corfu, Greece, and Bern, Switzerland

In statistics, we use a random sample from the population of interest to draw conclusions and make inferences about the population. If the sample does not represent the population of interest, then inferences from data derived from the sample might not be valid. For example, determining the effect of an intervention for adolescents based on the results from a sample of adults could yield the wrong conclusion.

The sampling distribution¹

The first of the 2 plots (left) in Figure 1 shows the distribution of age in 1 sample drawn from a population of interest. We can see some skewness in the data; however, no gross deviations from normality are observed. The right plot, with normally distributed data, shows the distribution of the means of 1000 random samples drawn from the same population as the sample in the left plot. This distribution of sample means is called the sampling distribution. Strictly, the sampling distribution is the distribution of all possible sample means that could be drawn from the population; it is almost always a hypothetical distribution because typically we cannot calculate every conceivable sample mean. The mean of the sampling distribution equals the population mean, so the sample mean is an unbiased estimator of the population mean, and its standard deviation can be computed. The notion of the unbiased estimator is based on the idea that if samples of the same size are repeatedly drawn from a population and their means are calculated with the common formula, then the average of these means will approach the population mean as the number of samples grows.
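The unbiasedness property can be written out in one line. As a sketch, assume the observations X_1, ..., X_n are drawn at random from a population with mean μ; the symbols X_i, n, and X̄ (the sample mean) are notation introduced here for illustration and are not taken from the article:

```latex
\mathbb{E}[\bar{X}]
  = \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
  = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i]
  = \frac{1}{n}\cdot n\mu
  = \mu
```

The result does not depend on the shape of the population distribution, only on each observation having expected value μ.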

Fig 1. Distribution of age.

Private practice, Corfu, Greece; visiting assistant professor, Department of Orthodontics and Dentofacial Orthopedics, School of Dental Medicine/Medical Faculty, University of Bern, Bern, Switzerland.
Am J Orthod Dentofacial Orthop 2015;147:517-9
0889-5406/$36.00
Copyright © 2015 by the American Association of Orthodontists.
http://dx.doi.org/10.1016/j.ajodo.2015.01.009

The interesting part is that the sampling distribution (the distribution of the means) is approximately normal when the samples are large enough, regardless of the distribution of the individual observations in the target population, and this characteristic is used to make statistical inferences.
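A small simulation can make this behavior concrete. The following is a minimal sketch, not the procedure used to produce the figures: it assumes a hypothetical right-skewed population (exponential with mean 10), and the names `simulate_sample_means` and `skewness`, the sample size of 50, and the seed are all illustrative choices.

```python
import random
import statistics


def simulate_sample_means(n_samples=1000, sample_size=50, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Hypothetical skewed population: exponential with mean 10
        # (expovariate takes the rate, ie, 1 / mean).
        sample = [rng.expovariate(1 / 10) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetric distribution."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - center) / spread) ** 3 for v in values])


means = simulate_sample_means()

# The average of the sample means should be close to the population mean (10).
print("mean of sample means:", round(statistics.fmean(means), 2))

# An exponential population has skewness 2; the sample means should be
# far more symmetric, ie, have skewness much closer to 0.
print("skewness of sample means:", round(skewness(means), 2))
```

Increasing `sample_size` in this sketch should pull the skewness of the means further toward 0, whereas increasing `n_samples` only makes the simulated histogram smoother.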


Fig 2. Sampling distribution with 1 sample (wide curve) and 25 samples (narrow curve).

Fig 3. The standard deviation and the standard error.

As the sample size increases, the margin of error gets smaller: ie, the sampling distribution is more peaked, and the estimate is more precise. The sampling distribution plot in Figure 2 shows 2 curves (wide and narrow) that have the same mean but are based on different numbers of samples. The wide sampling distribution includes only 1 sample, whereas the narrow sampling distribution includes 25 samples. As more data contribute to the estimate, the spread of the distribution narrows; hence, the precision (reflected in the width around the mean) increases. With a large enough sample size (as a rule of thumb, n > 30), the sampling distribution from which we are drawing our 1 sample will be approximately normal regardless of the shape of the population distribution. This property is called the “central limit theorem.” It is important because it says that when the sample size is large, the distribution of the sample estimates tends to be normal. This happens even if the distribution of the original data is not normal. If the original distribution is approximately normal, the sampling distribution is approximately normal even with small sample sizes. The sampling distribution has a mean equal to the population mean μ. Although estimates obtained from each sample will vary, their overall mean (the mean of the sampling distribution) will always be equal to the population value.

Figure 3 shows the relationship between the standard deviation and the standard error. As pointed out earlier, the sampling distribution will be approximately normal when the sample size is larger than 30, and the standard error (the standard deviation of the sampling distribution) determines the margin of error on either side of the sample mean. We know that about 95% of a normal distribution lies within about 2 SD (more precisely, 1.96 SD) of its mean, so about 95% of sample means lie within 1.96 standard errors of the population mean. The standard error depends on the variation in the population and the sample size, and it measures how precisely the population mean is estimated by the sample mean.
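The standard error of the mean is usually estimated from a single sample as the sample standard deviation divided by the square root of the sample size. The sketch below illustrates that calculation and the resulting approximate 95% interval; the function names and the simulated ages (normal with mean 14 and SD 2) are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def approx_95_interval(sample):
    """Sample mean plus or minus 1.96 standard errors (normal approximation)."""
    center = statistics.fmean(sample)
    margin = 1.96 * standard_error(sample)
    return center - margin, center + margin


rng = random.Random(1)

# Hypothetical ages drawn from the same population, with two sample sizes.
small = [rng.gauss(14, 2) for _ in range(25)]
large = [rng.gauss(14, 2) for _ in range(400)]

for label, sample in (("n = 25", small), ("n = 400", large)):
    low, high = approx_95_interval(sample)
    print(label,
          "SE:", round(standard_error(sample), 3),
          "interval:", (round(low, 2), round(high, 2)))
```

Both samples have a similar standard deviation because they come from the same population, but the larger sample should show a clearly smaller standard error and a narrower interval. For small samples, a t-based multiplier would normally replace 1.96; the normal approximation is used here only to mirror the text.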

REFERENCE

1. Kirkwood BR, Sterne JAC. Essential medical statistics. 2nd ed. Oxford, United Kingdom: Blackwell; 2003. p. 14-41.

American Journal of Orthodontics and Dentofacial Orthopedics

April 2015  Vol 147  Issue 4