Statistical significance

All results obtained by statistical methods suffer from the disadvantage that they might have arisen by pure statistical accident. The level of statistical significance is measured by the probability that this has not, in fact, happened. P is an estimate of the probability that the result occurred by statistical accident, so a large value of P represents a low level of statistical significance and vice versa.
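
As an illustration (the figures are invented for the purpose, not taken from any real study), suppose a coin comes up heads nine times in ten flips. If the coin is fair, P is the chance of a result at least that extreme arising by accident alone; a minimal Python sketch:

```python
from math import comb

# Hypothetical data: 9 heads in 10 flips of a supposedly fair coin.
n, k = 10, 9

# P: the chance of k or more heads arising by accident if the coin is fair.
p = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
print(f"P = {p:.4f}")  # P = 0.0107, roughly a one percent chance
```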

In experiments where we are obliged to resort to statistics, it is therefore proper procedure to define in advance a level of significance at which a correlation will be deemed to have been proven, though the choice is often actually made after the event. It is important to realise that, however small the value of P, there is always a finite chance that the result is a pure accident. A typical level at which the threshold of P is set would be 0.01, which means that there is a one percent chance that the result was accidental. The significance of such a result would then be indicated by the statement P<0.01.
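
A minimal sketch of the procedure, with the threshold fixed in advance and the P value carried over from the hypothetical coin example above:

```python
# The significance level is fixed before the experiment, not after.
alpha = 0.01       # threshold chosen in advance
p = 0.0107         # P value from the coin sketch above

if p < alpha:
    print("correlation deemed proven: P < 0.01")
else:
    print("not significant at the chosen level")  # this branch runs here
```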

Unfortunately it has become customary in some branches of science, particularly epidemiology, to operate with much lower levels of significance. A level frequently quoted is P<0.05, which means that there is a one in twenty chance that the whole thing was accidental. (In one notorious case the threshold was even raised to 0.1 in order to obtain the “required” result.) This is particularly worrying in areas that are newsworthy or politically correct, since it is likely that more than twenty similar experiments are being conducted worldwide, so it is almost certain that at least one will produce a positive result, whether the correlation is genuine or not. Because negative results are almost never published (publication bias), this means that an unknown but possibly large number of false claims are sustained as verities.
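
The arithmetic behind this worry is straightforward. The following sketch (the experiment counts are illustrative) computes the chance that at least one of several independent experiments, all testing a non-existent effect at the P<0.05 level, produces an accidental positive:

```python
# Chance that at least one of n independent experiments, each testing a
# non-existent effect at the P < 0.05 level, yields an accidental positive.
for n in (20, 50, 100):
    print(f"{n} experiments: {1 - 0.95**n:.2f}")
# 20 experiments: 0.64
# 50 experiments: 0.92
# 100 experiments: 0.99
```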

It is difficult to generalise, but on the whole P<0.01 would normally be considered significant and P<0.001 highly significant.

The provenance of the P<0.05 criterion is usually traced to the great pioneer of significance testing, R A Fisher, who is deemed to have given it his imprimatur. He did not in fact do so, and late in his life stated that he had merely used this level in his calculations as a “mathematical convenience”. Furthermore, he also stated that “without randomisation there is no significance”.

Many leading scientists and mathematicians today believe that the emphasis on significance testing is grossly overdone. P<0.05 has become an end in itself and the determinant of a successful outcome to an experiment, much to the detriment of the fundamental objective of science, which is to understand.

An alternative way of putting it is to quote a confidence interval, e.g. (relative risk: 0.56; 95% confidence interval: 0.32–0.97), which means that there is a one in forty chance that the true relative risk is as high as 0.97 or more, which is to all intents and purposes unity.
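
For readers who want to see where such an interval comes from: a 95% confidence interval for a relative risk is conventionally computed on the logarithmic scale as log(RR) ± 1.96 × SE. The sketch below backs a standard error out of the quoted figures purely for illustration:

```python
from math import exp, log

# Figures from the text: relative risk 0.56, 95% CI 0.32-0.97.  The
# standard error below is backed out from the quoted upper bound and is
# an assumption made purely for illustration.
rr = 0.56
se = (log(0.97) - log(rr)) / 1.96

lower = exp(log(rr) - 1.96 * se)   # 0.32
upper = exp(log(rr) + 1.96 * se)   # 0.97 by construction
print(f"95% CI: {lower:.2f}-{upper:.2f}")

# Each tail of a 95% interval holds 2.5% of the probability: a one in
# forty chance that the true relative risk is at least the upper bound.
```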
