The Joy of Stats

I work in the no-man's land between biologists and statisticians. The trenches on either side are full but in between I have found a niche where if I'm shelled by the biologists I run and hide with my statistician colleagues, and vice versa. Such is the work of the quantitative geneticist. It is difficult to maintain interest when describing this work: a recent paper in PNAS, "Heavy use of equations impedes communication among biologists" (2012:109:11735–11739), estimated that papers receive 28% fewer citations overall for each additional equation per page in the main text. Though bucking this trend, this self-referencing paper has been cited 20 times already, so perhaps there is hope.

Class of 2008

Like many of you I spend most of my time in meetings and writing grant applications, but in the gaps in between, whereas you may go to the lab or the field, I analyse data. An exception and rewarding change to this routine is teaching quantitative genetics and statistics. NIAB has run a two-week course every year since 2008: Quantitative Methods in Plant Breeding. Developing and teaching this course has taught me a lot. Most of the new things I've learnt in the last few years, genomic selection for example, have come about from first trying to understand the methods myself and then developing teaching materials around them. The course has been a success: we didn't need to advertise this year to fill it and I've put versions of it on in France, Malaysia, Australia, India and even Birmingham. It is great that there is a demand to learn the sorts of things that I think are important in plant breeding.

Here is one of the class exercises; a bit of light relief for the participants really as it involves neither equations nor computers. I'm not sure if it counts as quantitative genetics but as it was told to me by one of my quantitative genetics mentors, Mike Kearsey, I'm counting it as such:

A paper was published about four years ago which reported the use of high density genetic maps to identify sites of recombination in some doubled haploid mapping populations. The authors counted the number of recombinations in each doubled haploid and treated this as a phenotype for which they then detected and mapped QTL in the usual way. What is so fundamentally wrong with this that it should have never been published? For the answer skip to the end.

There are some recurring themes that come up in my job, many of which I now get a chance to rant about during my teaching. Lack of awareness of statistical power has become a particular hobby horse. Power, in the statistical sense, is the probability of rejecting the null hypothesis when it's false. It can be calculated and I think it should be calculated routinely for most experiments. It obliges you to think about what magnitude of effect you would like to detect and how likely your planned experiment is to detect it. Typically this probability is much lower than you imagine, so you should design a better (usually bigger) experiment. Even after an experiment is finished, calculation of power can provide a reality check on your results: have you really detected 40 QTL in your GWAS with 50 individuals and 60 markers? Actually you don't need to do the math in this case. The answer is no.
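As a rough illustration of the sort of calculation meant here, the sketch below approximates power for the simplest case: comparing the means of two equal-sized groups. It is not taken from the course material. It assumes a two-sided test, a normal approximation to the test statistic, and an effect size expressed in units of the within-group standard deviation; the function names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the true difference between the group means in units
    of the within-group standard deviation (an assumption of this sketch).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the statistic lands beyond either critical value.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_for_power(effect_size, target_power=0.8, alpha=0.05):
    """Smallest n per group reaching target_power (effect_size must be non-zero)."""
    n = 2
    while power_two_sample(effect_size, n, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    for n in (10, 20, 50, 100):
        print(n, round(power_two_sample(0.5, n), 2))
    print("n per group for 80% power:", n_for_power(0.5))
```

Under these approximations, an experiment with 10 individuals per group has only about a 20% chance of detecting an effect of half a standard deviation, and around 63 per group are needed for 80% power. A t-test based calculation would ask for slightly more.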

An interesting corollary is the use of the underpowered experiment to show there is no effect. I don't think this is intentional but I sometimes wonder. I'm sad to say that certain official testing authorities have used this approach in the past: a change in trials protocol is tested experimentally on a small scale to demonstrate that it does not interact with variety performance. Typically the experiment is modest in size so the magnitude of any interaction would have to be huge to stand any chance of detection. There may be no alternative to changing trials protocols but to justify it in this way is scientifically delusional and gives statistics a bad name. In this example, if you really wanted to justify the process statistically, you would test whether the new process had an effect significantly smaller than some threshold that would cause you concern. This would usually require an impractically large experiment so the change must be justified in some other manner.
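To give a feel for why such an experiment tends to be impractically large, here is a minimal sketch of the power of an equivalence test, using the two one-sided tests (TOST) idea for a simple two-group comparison. This is a stand-in for the interaction test described above, not the method any testing authority uses. It assumes a normal approximation, a true difference of exactly zero, and a threshold (margin) expressed in units of the within-group standard deviation; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def equivalence_power(margin, n_per_group, alpha=0.05):
    """Approximate power to show |difference| < margin when the true difference is 0.

    margin is the largest difference you would tolerate, in SD units
    (an assumption of this sketch).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    # Standard error of the difference between two group means, in SD units.
    se = sqrt(2 / n_per_group)
    # Both one-sided tests must reject; by symmetry this is 2 * Phi(.) - 1.
    power = 2 * _STD_NORMAL.cdf(margin / se - z_crit) - 1
    return max(0.0, power)


def n_for_equivalence(margin, target_power=0.8, alpha=0.05):
    """Smallest n per group reaching target_power (margin must be positive)."""
    n = 2
    while equivalence_power(margin, n, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    for margin in (0.5, 0.2, 0.1):
        print(margin, n_for_equivalence(margin))
```

With a margin of 0.2 standard deviations this sketch asks for over 400 individuals per group, and with 20 per group the power is effectively zero: a small experiment cannot demonstrate the absence of a small effect.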

Correlation-causality
Cartoon from http://xkcd.com/

The answer to the problem: It's a cause and effect thing. The number of crossovers is determined by the parent and not by the gametes which it produces. As the F1 from which the doubled haploids are produced is a single genotype, there can be no genetic variation for this trait among the DH lines and so no QTL to detect.

Ian Mackay
