Let’s start off with the tossing of a coin, calling one outcome H for heads and the other T for tails. If we toss it once there are four events to consider: neither, H, T, or either. To each we can assign a number representing its probability, starting with the two extremes:
Neither heads nor tails 0
Either heads or tails 1
Our events H and T are equally likely and between them must account for the total of 1, so each has evens probability, which we call ½. Now suppose we toss the coin twice. Four combinations can occur: HH, HT, TH, TT. We have no reason to suppose that any of these is less probable than another, so we assign them equal probabilities of ¼ each, totalling 1. Grouping them, we get the following:
Two heads ¼
One of each ½
Two tails ¼
Because “one of each” can happen in two ways (HT and TH), we give each way its equal share of ¼ and add them; so we have distributed our probabilities in the proportions 1, 2, 1. If we now toss the coin three times per experiment, the eight possible sequences group into four combinations:
Three heads 1/8
Two heads and a tail 3/8
Two tails and a head 3/8
Three tails 1/8
There are three ways of getting each two-plus-one combination: TTH, THT, HTT and HHT, HTH, THH. So we have now distributed our probabilities in the proportions 1, 3, 3, 1 and divided by the total, 8, as the total probability of the events has to be 1. A pattern is emerging. It is known as Pascal’s triangle, though it was known to Chinese algebraists in 1303. Try it for four or five tossings and you can verify it yourself:
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
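Each number is found by adding the two numbers on either side above it, with a 1 at each end of every row. Thus, without any complicated calculations, we can predict the probability of any number of heads for any number of tossings of a fair coin: divide the entry by the sum of its row.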
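The addition rule is easy to check by machine. The following is a minimal sketch in Python, not part of the original discussion; the function names pascal_row and head_counts are made up for illustration. It builds each row purely by addition and compares it with a brute-force count over every possible sequence of tosses.

```python
from itertools import product

def pascal_row(n):
    """Return row n of Pascal's triangle, built only by addition."""
    row = [1]  # row 0 is the single 1 at the apex
    for _ in range(n):
        # each inner entry is the sum of the two entries above it
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

def head_counts(n):
    """Count, by brute force, how many of the 2**n sequences have 0..n heads."""
    counts = [0] * (n + 1)
    for seq in product("HT", repeat=n):
        counts[seq.count("H")] += 1
    return counts

for n in range(2, 6):
    row = pascal_row(n)
    assert row == head_counts(n)  # the triangle matches direct counting
    # dividing by the row total, 2**n, turns the counts into probabilities
    print(n, row, [c / 2**n for c in row])
```

The brute-force count doubles in size with every extra toss, which is exactly why the triangle is the more practical route.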
Each number in the triangle represents the number of ways of selecting r things from n different things, where n is the number of tossings (the row number, counting the single 1 at the apex of the full triangle as row 0) and r is the horizontal position from the left, counting from 0. As a mathematical shorthand we write this ${}^{n}C_{r}$, and we can see from the symmetry of the triangle that it has the important property that ${}^{n}C_{r} = {}^{n}C_{n-r}$.
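For readers who want to compute these numbers without drawing the triangle, here is a short sketch. It assumes the standard factorial formula n!/(r!(n-r)!), which the text above does not derive, and compares it with Python's built-in math.comb (available from Python 3.8). The function name ways is just an illustrative choice.

```python
from math import comb, factorial

def ways(n, r):
    """Number of ways of choosing r things from n, by the factorial formula."""
    return factorial(n) // (factorial(r) * factorial(n - r))

n = 5
row = [ways(n, r) for r in range(n + 1)]
print(row)  # the n = 5 row of the triangle: [1, 5, 10, 10, 5, 1]

for r in range(n + 1):
    assert ways(n, r) == comb(n, r)      # agrees with the standard library
    assert ways(n, r) == ways(n, n - r)  # the symmetry property
```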
That is all very well, but what if our coin were biased, so that the probabilities of H and T are unequal, though of course they still sum to unity? Let us set the probability of H to p and the probability of T to q.
Then we get for the double tossing:
Two heads $p^2$
One of each $2pq$
Two tails $q^2$
For the triple tossing we get:
Three heads $p^3$
Two heads and a tail $3p^2q$
Two tails and a head $3pq^2$
Three tails $q^3$
We can extend this to any number of tossings, n, and have therefore established, by common sense, the Bernoulli theorem:
If the probability of success in each trial is p, then the probability of r successes in n trials is ${}^{n}C_{r}\,p^{r}q^{n-r}$, where q = 1 − p is the probability of failure.
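The formula translates almost word for word into code. This is a minimal sketch, with binomial_pmf as an assumed name rather than anything from the text. As a sanity check it recovers the fair-coin figures for three tossings found earlier, and confirms that the probabilities sum to 1 for an arbitrarily chosen biased case.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n trials, each succeeding with probability p."""
    q = 1 - p  # probability of failure in a single trial
    return comb(n, r) * p**r * q**(n - r)

# Fair coin, three tossings: zero, one, two and three heads
fair = [binomial_pmf(r, 3, 0.5) for r in range(4)]
print(fair)  # [0.125, 0.375, 0.375, 0.125], i.e. 1/8, 3/8, 3/8, 1/8

# For any p the probabilities over r = 0..n should total 1 (up to rounding)
total = sum(binomial_pmf(r, 7, 0.3) for r in range(8))
assert abs(total - 1) < 1e-12
```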
Let us illustrate this numerically. If our coin were biased, the probability of a head might be 0.4 and of a tail 0.6, so wherever we see an H we put 0.4 and wherever we see a T we put 0.6. If the idea of a biased coin seems difficult, imagine a black bag containing four discs marked H and six discs marked T. You shake it up, draw a disc, note its letter, then replace it. Then for two selections the probabilities become:
Two heads 0.4 × 0.4 = 0.16
One of each 2 × 0.4 × 0.6 = 0.48
Two tails 0.6 × 0.6 = 0.36
So for any combination all we do is multiply the probabilities together and then multiply by the number of ways we can get that particular combination. We check our calculations by making sure the probabilities add up to 1; here 0.16 + 0.48 + 0.36 = 1. We can do this for three, four, ten, one thousand or any number of tossings and calculate the exact probabilities. This is the binomial distribution, and that is all it is: common sense. Of course it is rather tiring to toss a coin one thousand times, so we get a computer to do it.
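Getting a computer to do the tossing might look something like the sketch below. It is an illustration only: the function name simulate_heads, the fixed seed and the number of experiments are all arbitrary assumptions. Each experiment tosses the biased coin (probability of a head 0.4) twice, and the observed frequencies are compared with the exact figures worked out above.

```python
import random

def simulate_heads(n, p, experiments, seed=0):
    """Toss a coin n times per experiment; return the observed frequency of 0..n heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = [0] * (n + 1)
    for _ in range(experiments):
        # each toss is a head when a uniform random number falls below p
        heads = sum(rng.random() < p for _ in range(n))
        counts[heads] += 1
    return [c / experiments for c in counts]

freqs = simulate_heads(n=2, p=0.4, experiments=100_000)
exact = [0.36, 0.48, 0.16]  # two tails, one of each, two heads
for observed, expected in zip(freqs, exact):
    print(f"observed {observed:.3f}   exact {expected:.2f}")
```

The observed frequencies will not match the exact values perfectly, but they settle closer to them as the number of experiments grows.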
Since each toss gives a head with probability p, the average number of heads we are going to get in n tossings is np, so this is the mean of a binomial distribution. The standard deviation is the square root of npq.
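These two results can be checked numerically rather than taken on trust. The sketch below uses example values of n = 10 and p = 0.4, computes the mean and standard deviation directly from the individual probabilities, and compares them with np and the square root of npq. The variable names are illustrative.

```python
from math import comb, isclose, sqrt

n, p = 10, 0.4
q = 1 - p

# exact probability of each possible number of heads, r = 0..n
probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

# mean and variance computed from their definitions as weighted sums
mean = sum(r * pr for r, pr in enumerate(probs))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
sd = sqrt(variance)

assert isclose(mean, n * p)          # np
assert isclose(sd, sqrt(n * p * q))  # square root of npq
print(f"mean = {mean:.4f}, standard deviation = {sd:.4f}")
```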
Here are the numbers for n = 10, p = 0.4:
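The figures can be regenerated from the formula at any time. The following is a minimal sketch; the layout of the printed table is an arbitrary choice, not a reproduction of the original presentation.

```python
from math import comb

n, p = 10, 0.4
q = 1 - p

# (number of heads, probability) for every possible outcome
table = [(r, comb(n, r) * p**r * q**(n - r)) for r in range(n + 1)]

print(" r  probability")
for r, prob in table:
    print(f"{r:2d}  {prob:.4f}")
print("sum", round(sum(prob for _, prob in table), 10))
```

The most likely result is 4 heads, with a probability of about 0.25, which agrees with the mean np = 4.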