Making valid and reliable inferences from a sample to a population is a cornerstone of science, and there are many pitfalls along the way. Because of these difficulties, we often hear students and researchers attempting to limit the claims they make for their analyses by stating that their results ‘apply only to the sample at hand and should be generalized to the broader population with caution’. Such caveats should, in almost every instance, be viewed with scepticism, for we are hardly ever interested in the idiosyncratic characteristics of a particular sample. Even when this sort of statement is made, generalization to a broader population is almost always implicit in the conclusion being drawn.

In science we are almost always interested in making general statements from particular instances, i.e. drawing inferences from known facts to unknown facts. What statements like the above are really saying is ‘I know this sample is not very representative of the population but, if it were, this might be true’. Such statements, however, are little better than armchair conjecture, as they fail to link the postulated theory adequately with observations representative of the population of interest. Fortunately, if a sample is collected properly, it is possible to make valid and reliable generalizations to the broader population within known bounds of error. To do this, it is essential to understand the concept of the sampling distribution, as this is the key that allows us to link our specific sample with the broader population.

Next is to be able to distinguish between the

But how do we know the sampling distribution of our statistic without drawing a huge (or infinite) number of samples each time we wish to use it? Physically drawing all the samples needed to plot a sampling distribution would of course be completely impractical. Fortunately, we do not need to, because there are known mathematical links between the statistics calculated from a sample and the sampling distribution of those statistics. If we draw a sufficiently large random sample, all the information necessary for drawing inferences to the population from which the sample was drawn is contained within the sample data. To understand why this is so, it is important to understand a number of additional, inter-related ideas. The first, and possibly the most important of these, is the concept and properties of the normal distribution.
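The idea can be illustrated with a small simulation. What follows is only a sketch under stated assumptions: the population is a hypothetical, right-skewed set of values invented for the example, the statistic is the sample mean, and the function names (`simulate_sample_means`, `standard_error`) are illustrative rather than taken from any particular package. It compares the spread of the sample mean obtained the impractical way, by drawing thousands of samples, with the estimate obtained from a single sample using the standard formula for the standard error of the mean, s/√n.

```python
import random
import statistics


def simulate_sample_means(population, n, draws, rng):
    """Brute force: the mean of each of `draws` random samples of size n."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


def standard_error(sample):
    """Standard error of the mean estimated from one sample: s / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population (assumption: exponential, mean 50).
population = [rng.expovariate(1 / 50) for _ in range(50_000)]

n = 200  # size of each random sample

# The impractical route: draw many samples and look at how their means vary.
means = simulate_sample_means(population, n, draws=2_000, rng=rng)
brute_force_se = statistics.stdev(means)

# The practical route: draw ONE sample and estimate the same spread from it.
one_sample = rng.choices(population, k=n)
analytic_se = standard_error(one_sample)

print(f"spread of 2,000 sample means: {brute_force_se:.2f}")
print(f"estimate from a single sample: {analytic_se:.2f}")
```

The two figures should be of similar size, although the single-sample estimate will itself vary from one sample to the next. Sampling here is with replacement (`random.Random.choices`), a simplifying assumption that makes little difference when the population is much larger than the sample.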