## The Normal distribution

It is common in statistics to make use of probability distributions: distributions (shapes) specified by a mathematical formula incorporating one or more 'parameters.' Because they are defined mathematically, we know how they behave under different values of the parameters. If we can assume that real data are a sample from a distribution which follows, at least approximately, a known theoretical form, then we can infer certain facts about them and can calculate how likely it is that a given value could have come from that distribution.

The most commonly used probability distribution is the Normal or Gaussian distribution, which follows the well-known bell-shaped curve often seen with 'real' data. Technically, the familiar Normal curve is a frequency curve or 'probability density function' (PDF), centred on the x-axis at the mean, with the height representing the 'probability density.' The total area under any PDF is always equal to one, so the area under the curve (AUC) to one side of a given x-axis value can be interpreted as the probability of observing a result at least as extreme as that value.
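As a rough illustration of the area interpretation, the sketch below uses Python's standard-library `statistics.NormalDist`. The mean of 100 and SD of 15 are hypothetical values chosen only for the example, and `total_area` and `upper_tail` are helper names introduced here, not anything from the text. It checks numerically that the area under the PDF is close to one, and then reads off the area to the right of a chosen value.

```python
from statistics import NormalDist

# Hypothetical example values, not taken from the text.
MEAN = 100.0
SD = 15.0
dist = NormalDist(mu=MEAN, sigma=SD)


def total_area(dist, n_steps=20_000, width_in_sd=8.0):
    """Approximate the total area under the PDF with the trapezium rule.

    The curve is integrated over mean +/- width_in_sd standard deviations;
    the area outside that range is negligible for a Normal curve.
    """
    lo = dist.mean - width_in_sd * dist.stdev
    hi = dist.mean + width_in_sd * dist.stdev
    step = (hi - lo) / n_steps
    heights = [dist.pdf(lo + i * step) for i in range(n_steps + 1)]
    return step * (sum(heights) - 0.5 * (heights[0] + heights[-1]))


def upper_tail(dist, x):
    """Area under the curve to the right of x, i.e. P(X >= x)."""
    return 1.0 - dist.cdf(x)


if __name__ == "__main__":
    print(f"Total area under the PDF: {total_area(dist):.6f}")
    print(f"Area to the right of 130:  {upper_tail(dist, 130.0):.4f}")
```

The cumulative distribution function (`cdf`) gives the area to the left of a value, so the right-hand tail is simply one minus that area.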

The shape of the Normal curve (Fig. 5.10) is determined entirely by its two parameters, the mean (m) and the standard deviation (SD). Whatever the actual values of the mean and standard deviation, approximately 68 per cent of the area under a Normal curve will lie within the x-axis range defined by the mean ± 1 SD, and approximately 95 per cent will lie within the mean ± 2 SD (more precisely, ± 1.96 SD).
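The 68 and 95 per cent figures can be checked directly. The following is a minimal sketch using the standard-library `statistics.NormalDist`; the function name `area_within` and the two (mean, SD) pairs are assumptions made for illustration. It also includes 1.96 SD, the multiplier that corresponds more exactly to 95 per cent.

```python
from statistics import NormalDist


def area_within(k, mean=0.0, sd=1.0):
    """Proportion of the area lying within mean +/- k standard deviations."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


if __name__ == "__main__":
    # Two hypothetical (mean, SD) pairs, to show that the proportions
    # do not depend on the particular parameter values.
    for mean, sd in [(0.0, 1.0), (170.0, 10.0)]:
        for k in (1.0, 1.96, 2.0):
            share = area_within(k, mean, sd)
            print(f"mean={mean}, SD={sd}, k={k}: {share:.4f}")
```

Both parameter pairs give the same proportions for each value of `k`, which is the point made in the text: the proportions depend only on the number of SDs from the mean.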

Fig. 5.10 The Normal distribution (y-axis: probability density)

The distance of any point on the x-axis from the mean, expressed in units of the standard deviation, is known as the standard Normal deviate (Z); effectively this is a variable with mean zero and standard deviation one. The Normal distribution with mean zero and SD 1 is called the Standard Normal distribution, written Z ~ N(0,1). Any normally distributed variable can be transformed into one with a standard Normal distribution by subtracting the mean and dividing by the standard deviation. This is the basis of many hypothesis tests, and hence of sample size calculations too, as it gives a common reference point whatever the original scale of the variable. Tables of the Normal distribution are actually tables of the standard Normal distribution and will tell you, for a given Z (x-axis) value, the proportion of the AUC which lies to one side of the value (these are 1-tailed or 1-sided p-values) or the proportion of the AUC which lies outside the range defined by ±Z (these are 2-tailed or 2-sided p-values).
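The standardisation step and the table look-up can be sketched in a few lines of Python, with `statistics.NormalDist` standing in for a printed table of the standard Normal distribution. The function names and the observation, mean and SD used below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import NormalDist

# Stands in for a printed table of the standard Normal distribution.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def standardise(x, mean, sd):
    """Convert x to a standard Normal deviate: Z = (x - mean) / SD."""
    return (x - mean) / sd


def one_tailed_p(z):
    """Proportion of the area lying beyond z, on the same side as z."""
    return 1.0 - STANDARD_NORMAL.cdf(abs(z))


def two_tailed_p(z):
    """Proportion of the area lying outside the range -|z| to +|z|."""
    return 2.0 * one_tailed_p(z)


if __name__ == "__main__":
    # Hypothetical observation of 126 from a variable with mean 120 and SD 4.
    z = standardise(x=126.0, mean=120.0, sd=4.0)
    print(f"Z = {z:.2f}")
    print(f"1-tailed p = {one_tailed_p(z):.4f}")
    print(f"2-tailed p = {two_tailed_p(z):.4f}")
```

Because the Normal curve is symmetric about the mean, the 2-tailed value is simply twice the 1-tailed value for the same Z.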
