**Of birthdays and clusters**

As we gratefully witness the dying fall
of yet another silly season, it is interesting to note how often birthdays
provide the basis of the most popular fallacies. In August
2001 a classic of the genre appeared when the *New Scientist* announced
that researchers in Scotland had established that anorexics are likely to
have been June babies. The team studied 446 women who had been diagnosed as
anorexic and observed that 30% more than average were born in June. As the
monthly average of births is about 37, we deduce that the June number must have
been 48. At first sight this looks like a significant result (at least by
epidemiological standards) since the probability of getting 48 or more in a
random month is about 3%. **But that is not what they are doing!** They are
making twelve such selections and **then picking the biggest**. Application
of the theory of the statistics of extremes tells us that the probability of the
largest of twelve such selections being 48 or greater is 30%, which is not
significant even by epidemiological standards.
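The arithmetic behind that argument can be sketched in a few lines of Python. This is a minimal illustration, not the calculation the researchers or the *New Scientist* used: it assumes each of the 446 women was equally likely to be born in any of twelve equal months (a simple binomial model) and treats the twelve monthly counts as independent, which is only approximately true. The function names are made up for the example.

```python
from math import comb


def binomial_upper_tail(n, p, threshold):
    """P(X >= threshold) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(threshold))
    return 1.0 - lower


def prob_max_exceeds(p_single, selections):
    """Chance that at least one of `selections` independent looks beats the
    threshold, when each single look does so with probability p_single."""
    return 1.0 - (1.0 - p_single) ** selections


# One month chosen in advance: 48 or more births out of 446 (assumed binomial model).
single_month = binomial_upper_tail(446, 1 / 12, 48)
print(f"one pre-chosen month: {single_month:.3f}")  # a few per cent

# The article's rounded 3% figure, applied to the largest of twelve months.
print(f"largest of twelve:    {prob_max_exceeds(0.03, 12):.2f}")  # 0.31
```

The second function is the whole point: a result that looks rare for one month chosen beforehand becomes unremarkable once you are allowed to pick the most extreme month after seeing the data.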

One of the concepts that gets the media excited and brings a glow of expectation to the eyes of compensation lawyers is the cluster. Every time there is a disease outbreak, cluster-watching comes back into vogue, as is now happening in the UK during the foot and mouth epidemic. In the distant days when epidemiology was young and saved millions of lives, a cluster comprised hundreds of cases, as in the famous story of Dr John Snow and the Broad Street pump. In 1854 Snow was able to rid London’s Soho of cholera and end a train of epidemics that claimed thousands of lives.

Gradually, under the pressure of the
epidemiologists’ need to publish, the media hunger for stories and the
lawyers’ lust for fees, the size of clusters came down. By the 1970s it had
reached a low of eight in a case in Woburn, Massachusetts, which was made famous
by being turned into a Disney film *A Civil Action*, in which John Travolta
played the crusading lawyer who sought “justice”, naturally in the form of
millions of dollars, against two companies who allegedly dumped chemicals in the
drinking water that, by deft application of the *post hoc* fallacy, were
deemed to have caused eight children’s deaths from leukaemia.

By July 2000, however, the size of a cluster fell to a startling two, when a pair of victims of new variant CJD were found to have lived in the same village, Queniborough in Leicestershire. A feeding frenzy developed in the media and the village was besieged by reporters and “experts”.

Which brings us back to birthdays. One of the many beautiful aspects of mathematics is that a result from one area can be applied to an apparently unrelated one by analogy. An age-old conundrum is “How many children would you need in a classroom for there to be an evens chance of two of them having the same birthday?” Most people would plump for an answer of about half (or even twice) 365, but this is an attempt to answer a different question, namely “How many would you require for an evens chance of two birthdays occurring on a specific date?” In fact, the correct answer is just 23. The method of calculation is given in an excellent web site for schools at http://www.mste.uiuc.edu/reese/birthday/intro.html .
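For readers who would rather check the figure of 23 than take it on trust, here is a minimal sketch of the standard calculation. It assumes 365 equally likely birthdays and ignores leap years and seasonal variation in births; the function names are illustrative only.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming every day is equally likely."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def smallest_class(target=0.5, days=365):
    """Smallest group size whose chance of a shared birthday reaches `target`."""
    n = 1
    while prob_shared_birthday(n, days) < target:
        n += 1
    return n


print(smallest_class())  # 23
```

The trick is to work out the chance that everyone's birthday is different and subtract it from one; with 23 children there are already 253 possible pairs, which is why the answer is so much smaller than intuition suggests.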

At the time, ninety people had died of vCJD. The probability of two of them having the same birthday is 0.999993848, i.e. a certainty. In fact there was an evens chance of three of them having the same birthday. Likewise, if we divided the UK up into 1000 areas of equal population, we would find by the same calculation that the probability of two of the ninety coming from the same area is about 0.98. Yet the original Queniborough two were claimed to be statistically significant and the hunt was begun for more “linked pairs”.
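These figures can be reproduced with a short script. The sketch below assumes that the ninety cases fall independently and uniformly over the available "cells" (365 birthdays, or 1000 hypothetical areas of equal population); the function names and the exact counting method for triples are choices made for this example, not something taken from the original analysis.

```python
from math import comb, factorial


def prob_pair(n, cells):
    """Probability that at least two of n items land in the same cell."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (cells - i) / cells
    return 1.0 - p_all_distinct


def prob_triple(n, cells):
    """Probability that at least three of n items land in the same cell.

    Counts exactly the arrangements in which no cell holds more than two
    items: i cells hold a pair and n - 2*i cells hold a single item.
    """
    ways_no_triple = 0
    for i in range(min(n // 2, cells) + 1):
        singles = n - 2 * i
        ways_no_triple += (
            comb(cells, i)                  # which cells hold a pair
            * comb(cells - i, singles)      # which cells hold a single item
            * (factorial(n) // 2**i)        # how the n items fill those cells
        )
    return 1.0 - ways_no_triple / cells**n


print(f"{prob_pair(90, 365):.9f}")   # shared birthday among ninety
print(f"{prob_pair(90, 1000):.2f}")  # two of ninety in the same of 1000 areas
print(f"{prob_triple(90, 365):.2f}") # three of ninety sharing a birthday
```

Real populations are not spread evenly over equal areas, and any unevenness makes coincidences more likely still, so the uniform assumption if anything understates the chance of a "cluster" arising by accident.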

Once the micro-cluster was identified, of course, the search was widened to include people who came from “the same part of Leicestershire” to expand it to five. This is known in the trade as the method of the Texas sharp-shooter, who sprays the side of a barn with bullets and then draws a target round the most prominent cluster. The cause of the outbreak was even identified as the practices of two butchers in the village, regardless of the fact that they were the same as the practices almost everywhere else at the time. Such vagaries are beyond the power of mathematics, but not, of course, of epidemiology. It was claimed that the victims were 15 times more likely to use those butchers than other sources (though in the small print there was a one in twenty chance that this number could be either a tenth of that or ten times it, but that’s epidemiology for you).

And so the search for clusters goes on, usually around the usual suspects of manufacturing industry and power generation. Most recently, pylons were blamed for the spread of the newsworthy foot and mouth disease, by a UK academic who believes that pylons are responsible for all sorts of diseases. In the new age of unreason it is the story that matters and to hell with the science.

© John Brignell 2001