
Beware the tail that wags the dog: informal and formal models in biology



    Informal models have always been used in biology to guide thinking and devise experiments. In recent years, formal mathematical models have also been widely introduced. It is sometimes suggested that formal models are inherently superior to informal ones and that biology should develop along the lines of physics or economics by replacing the latter with the former. Here I suggest to the contrary that progress in biology requires a better integration of the formal with the informal.

    In a series of previous essays, I discussed how formal mathematical models have played a far more significant role in biology than most biologists typically appreciate (Gunawardena, 2012, 2013, 2014). Here I want to focus on the interplay between formal and informal models.

    The word “model” has many meanings in biology. We speak of model organisms as institutionalized representatives of particular phyla. We occasionally build physical models, as Crick and Watson did for DNA. Mostly, however, a model refers to some form of symbolic representation of our assumptions about reality, and that is the sense in which I will use the word here. An informal model is one in which the symbols are mental, verbal, or pictorial, perhaps a scrawl of blobs and arrows on the whiteboard; in contrast, a formal model is one in which the symbols are mathematical.

    Informal models pervade biology. They help to guide our thinking, and experimentalists rely on them to design experiments. The model may turn out to be nonsense, and an experiment may reveal that, but one has to start somewhere. It is sometimes claimed that one starts with data, from which a model is constructed. But why those data? And how should those data be interpreted? The answers reveal informal models that precede the acquisition of data.

    Models, whether informal or formal, allow us to capture assumptions and to undertake reasoning. Informal models have two classes of assumptions: those that are explicit in the model itself, or foreground assumptions; and those that are only implicit but potentially significant, or background assumptions. In molecular biology, a foreground assumption might be that blob X is an activated enzyme, which implements an informal arrow. A background assumption might be that X has multiple posttranslational modifications, which influence activation but differ depending on the organism. Whether a particular fact is in the foreground or relegated to the background depends on the problem at hand and the questions being asked. This allows us to tolerate much ambiguity. Does X mean chicken X or fly X? Does it matter? When it does, background becomes foreground; when it does not, foreground becomes background. The ever-present inconsistencies in biological life can be managed by relegating some findings to a background limbo until they can be reliably brought into the foreground or rejected. Informal models are readily corrected and updated and change organically with the changing context.

    Foreground assumptions allow us to embrace reductionism—the properties of the blobs determine those of the system—whereas background assumptions reconcile this with the actual behavior of living organisms, in which the properties of the blobs depend on the system of which they are a part (Gunawardena, 2013). What X actually does can depend, sometimes, on whether it is chicken X or fly X.

    In contrast to their flexibility with assumptions, informal models permit only limited forms of reasoning. The psychologist Daniel Kahneman, drawing upon his collaboration with the late Amos Tversky, which led to Kahneman's 2002 Nobel Memorial Prize in Economics, distinguishes between fast thinking and slow thinking (Kahneman, 2011). Fast thinking is analogical, intuitive, and emotional; slow thinking is logical, deliberative, and effortful. Fast thinking can sometimes lead you astray (Kahneman, 2011, p. 44):

    A bat and ball together cost $1.10.

    The bat costs one dollar more than the ball.

    How much does the ball cost?

    If your answer is a dime, you were thinking fast; if it is a nickel, you were thinking slow. When Harvard, MIT, and Princeton students were tested, half of them were too fast for their own good (Kahneman, 2011). Informal models do not necessarily lead to fast thinking, but they encourage intuitive plausibility over logical deduction.
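
    The slow-thinking answer can be checked with a line of algebra. As a sketch, write b for the price of the ball in dollars (the symbol is introduced here for illustration only), so that the bat costs b + 1.00:

```latex
\begin{align*}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05
\end{align*}
```

    The intuitive answer fails the same check: a ball at $0.10 implies a bat at $1.10 and a total of $1.20.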

    Formal models are couched in equations with mathematical symbols. In marked contrast to their informal counterparts, they cannot tolerate any form of ambiguity: all assumptions are in the foreground, and there is no background. If it is not in the equations, it is not in the model. Moreover, it is often necessary to make more concrete assumptions than in the corresponding informal model, so that, for instance, an informal arrow is given a particular mathematical structure, which may amount to no more than a guess for the purposes of formalization (Gunawardena, 2014). Formal models are extremely brittle, and the tiniest change in assumptions requires new conclusions to be derived, which may differ from the previous conclusions. The value of a formal model does not rest on its ability to deal with assumptions, in which it compares very poorly to an informal model, but on its capacity for reasoning by logical deduction, the impact of which I have discussed elsewhere (Gunawardena, 2012, 2013, 2014).
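
    To illustrate how an informal arrow acquires a particular mathematical structure, consider a hypothetical arrow on the whiteboard from a substrate S to a product P. The two rate laws below are both common ways to formalize such an arrow; the symbols k, V_max, and K_M are assumptions introduced for this sketch and do not come from any specific model discussed here:

```latex
\begin{align*}
\text{mass action:} \quad \frac{d[P]}{dt} &= k\,[S] \\
\text{saturating (Michaelis--Menten form):} \quad \frac{d[P]}{dt} &= \frac{V_{\max}\,[S]}{K_M + [S]}
\end{align*}
```

    Both are compatible with the same informal arrow, yet the first grows without limit as [S] increases, whereas the second levels off at V_max. Choosing between them is exactly the kind of concrete assumption that the informal model never had to make.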

    Formal models have begun to coexist with informal models at the molecular and cellular levels in biology. Will that continue to be the case, or will formal models take over, as some envisage (Brenner, 2010)? Physics, for instance, has been governed exclusively by formal models since it emerged from natural philosophy. A more informative comparison might be with economics, which should be closer to biology in studying aspects of primate behavior. Economic arguments used to be conducted in ordinary language and relied exclusively on informal models. That is no longer the case (Morgan, 2012). Professional economists now speak mathematics to each other, and what is not mathematical is not taken seriously. Paul Krugman, in a meditation on metaphors and models in complex systems,1 recounts how important work in development economics undertaken using informal models in the 1950s was overlooked until recast using formal models in the 1990s. In the intervening years, formalization led to misunderstanding and ignorance. In Krugman's view, this was a necessary evil because of the clarity that formalization brought elsewhere.

    Economics, unlike biology, has become an experimental discipline only very recently, when the work of psychologists like Kahneman and Tversky revealed that some primates do not always act economically as economists had assumed in their models (Tversky and Kahneman, 1981). Taxi drivers, for instance, were expected to optimize their earnings by working longer hours when business is good and shorter hours when it is bad. When the experiment was done, it was found that they prefer to have a steady wage for the day (which surely makes their families less insecure), and so they do the opposite of what the formal model says and work shorter hours when business is good and longer hours when it is bad (Camerer et al., 1997). However, this intrusion of reality has been largely limited to the microeconomic behavior of agents like us, rather than to the macroeconomics of countries and the global economic system, where, it seems, at least from what we hear on the news, the gap between models and reality remains substantial.

    So, perhaps biology will be saved from formalization by its grounding in experimental reality. To assess this, let us turn to evolutionary biology, which provides two instructive examples of the interplay between formal and informal models.

    Charles Darwin convinced most scientists of his time that species had evolved through descent with modification. He was markedly less successful in showing that it happened through natural selection. Two serious objections were made to this proposed mechanism. The engineer Fleeming Jenkin pointed out that because inheritance was believed to blend the characters of parents, chance variation in an individual could not persist over generations (Jenkin, 1867). This undermined many of the informal arguments that Darwin made in The Origin of Species, but it was not fatal. Darwin's cousin, Francis Galton, the first to undertake a statistical analysis of inheritance, came up with formal arguments for why the overall variation in the population might still be maintained despite blending (Galton created eugenics and was deeply worried about blending with those less fortunate than himself), and these ideas were further developed by Galton's protégé, Karl Pearson, into a “biometric” theory of heredity and evolution (Pearson, 1897). However, a better resolution of Jenkin's objection had to wait for the rediscovery of Mendel's genetics, which we will come to later.

    The second and more devastating objection came from the renowned physicist William Thomson. Thomson had been knighted for his contributions to the first transatlantic telegraph cable, an engineering feat on which he worked closely with Fleeming Jenkin, and such was his prestige that he eventually became the first British scientist to be elevated to the peerage. For his barony, Thomson took the name Kelvin, after the river that ran near his Glasgow laboratory, and that is how we now remember him and his contributions to thermodynamics in the degrees Kelvin of the absolute scale of temperature (Smith and Wise, 1989). Kelvin had long thought that geological thinking was inconsistent with thermodynamics, and he developed a formal model of planetary cooling, which gave an estimate of 100,000,000 years for the age of the Earth (Thomson, 1863).

    This was much too short for Darwin. A much longer span of geological time was crucial for Darwin's informal model of natural selection. In the first, 1859, edition of The Origin of Species, Darwin had wanted to drive home this point so strongly that he estimated the age of the geological formation known as the Weald in southern England and arrived at the extraordinary figure of 306,662,400 years (Darwin, 1985, p. 297)! Darwin should have known better than to publish in his magnum opus what a physicist would dismiss as a “back of the envelope” calculation. Although couched in numbers, it was decidedly informal in its reasoning. The estimate was rapidly criticized in both the scientific and the popular press, and in the third, 1861, edition Darwin entirely omitted his “confounded Wealden calculations” (Burchfield, 1990).

    It was, however, too late. Kelvin and his allies now had an easy target to attack. Kelvin also had a broader purpose. He wanted not just to discredit natural selection (he believed in the argument from design for the existence of a Creator) but to make geology subservient to physics (Thomson, 1871). It was hard to argue with a formal model wielded by someone as intimidating as Kelvin, whom Darwin came to see as “an odious spectre” (Burchfield, 1990). T. H. Huxley, Darwin's great friend and bulldog defender, who had so successfully fought off religious objections to evolution in his famous debate with Bishop Samuel Wilberforce, could only concede that “Biology takes her time from geology. … If the geological clock is wrong, all the naturalist will have to do is to modify his notions of the rapidity of change accordingly” (Huxley, 1869).

    Darwin was left isolated. He knew that Kelvin's time scale was woefully inadequate for the scale of biological change that he had studied for so long. His own informal model of how natural selection must be working, through gradual positive selection, told him something very different. But few of Darwin's peers felt comfortable with an organic world run by blind chance. They were more inclined toward mechanisms of evolutionary progress, such as Lamarckism or orthogenesis, and they were only too happy to let Kelvin make their case for them. It is poignant to think that Darwin died believing that he had failed to convince his contemporaries of natural selection (Bowler, 1983). It was not religion that did the damage; it was Kelvin's formal model for the age of the Earth.

    Which was wrong. Huxley had, in fact, fought a superb rearguard action while retreating under fire and had made a remark that we should all take to heart: “Mathematics may be compared with a mill of exquisite workmanship, which grinds you stuff of any degree of fineness; but, nevertheless, what you get out depends upon what you put in” (Huxley, 1869). Kelvin's formal model may have been correct mathematics, but it led to wrong science. Kelvin was still alive when Becquerel's 1896 discovery of radioactivity provided the missing heat source that had been left out of his formal model. In a broader sense, of course, Kelvin was right. Geology and biology have to base themselves on physics, which does provide the means to calculate the age of the Earth. But only if the physicists get their assumptions right.
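
    Huxley's mill can be made concrete with a small calculation. The sketch below uses the standard result for a solid that starts at a uniform temperature and cools by conduction alone, with no internal heat source: the temperature gradient at the surface after time t is T0 / sqrt(pi * kappa * t), so an observed gradient G implies an age t = T0^2 / (pi * kappa * G^2). The function name and the input values (in nineteenth-century units of degrees Fahrenheit, feet, and years) are illustrative assumptions chosen here, not figures taken from this essay.

```python
import math


def cooling_age_years(initial_temp, surface_gradient, diffusivity):
    """Age implied by conductive cooling with NO internal heat source.

    initial_temp     -- assumed uniform starting temperature (deg F)
    surface_gradient -- observed temperature rise per unit depth (deg F / ft)
    diffusivity      -- assumed thermal diffusivity of rock (ft^2 / year)
    """
    return initial_temp ** 2 / (math.pi * diffusivity * surface_gradient ** 2)


# Illustrative inputs only (assumptions for this sketch).
baseline = cooling_age_years(initial_temp=7000.0,
                             surface_gradient=1 / 50,
                             diffusivity=400.0)

# Same mathematics, different assumption about the observed gradient.
shallower = cooling_age_years(initial_temp=7000.0,
                              surface_gradient=1 / 100,
                              diffusivity=400.0)

print(f"baseline age:  {baseline / 1e6:.0f} million years")
print(f"half gradient: {shallower / 1e6:.0f} million years")
```

    With these inputs the estimate lands near one hundred million years, and halving the assumed gradient quadruples it. The formula itself is never wrong; what it cannot do is notice that a heat source such as radioactivity is missing from its assumptions.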

    The second example of the interaction between formality and informality requires some background. The rediscovery of Mendel's genetics provided the first clue that Fleeming Jenkin's objection might be repudiated more convincingly than by Pearson's biometric theory. However, it was no easy matter to understand how alleles mix and match in a population under recombination, mutation, and selection. This was a problem that could only be worked out by mathematics, a task largely accomplished by R. A. Fisher, J. B. S. Haldane, and S. Wright. The key step in setting up their formal model of population genetics was figuring out how to deal with selection, which, in contrast to recombination and mutation, acted on the phenotype. Phenotypes are extraordinarily complicated, arising as they do from an intricate dialogue between genotypes, development, and the environment. Neither the process of development nor the influence of the environment could be readily formalized. The solution they found was simplicity itself. The phenotype—the organism, in other words—was omitted from the formal model, and selection was assumed to act on the genotype. Mathematically speaking, it was a stroke of genius, which set up a rigorously formulated problem of allele frequency dynamics while avoiding the morass of organismal biology. Biologically speaking, well, that was another matter, as we will see.

    The resulting formal model was, without question, the most successful ever devised in biology. It showed how variation could persist in a population, how Mendelian genetics could account for the statistics of the continuously varying characters that the biometricians had studied, how natural selection could act efficiently in large populations, so that alleles of even small selective advantage would eventually become fixed, much as Darwin had imagined, and how in small populations, random genetic drift could dominate over selection. These mathematical results provided the foundation on which the evidence for natural selection in nature was assembled (Dobzhansky, 1937). This, in turn, compelled the vast majority of working biologists to abandon Lamarckism and orthogenesis and adopt the modern, neo-Darwinian synthesis.
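
    The contrast between large and small populations can be sketched with a toy simulation. What follows is a minimal haploid Wright-Fisher model, a textbook simplification and not the formulation worked out by Fisher, Haldane, and Wright; the function name, population sizes, selective advantage, and trial counts are all assumptions made for illustration. In keeping with the formal model described above, selection acts directly on the genotype and there is no organism anywhere in the code.

```python
import random


def fixation_fraction(pop_size, advantage, start_freq, trials, seed=1):
    """Fraction of simulated populations in which the favoured allele fixes.

    Each generation, selection reweights the allele frequency and the next
    generation is then drawn by binomial sampling (drift).
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        count = max(1, round(start_freq * pop_size))
        while 0 < count < pop_size:
            p = count / pop_size
            # Selection on the genotype: carriers have fitness 1 + advantage.
            p_selected = p * (1 + advantage) / (1 + advantage * p)
            # Drift: each of pop_size offspring inherits the allele
            # independently with probability p_selected.
            count = sum(rng.random() < p_selected for _ in range(pop_size))
        if count == pop_size:
            fixed += 1
    return fixed / trials


# Hypothetical parameter values, chosen only to make the contrast visible.
large = fixation_fraction(pop_size=400, advantage=0.05, start_freq=0.1, trials=40)
small = fixation_fraction(pop_size=10, advantage=0.05, start_freq=0.1, trials=400)

print(f"large population: allele fixed in {large:.0%} of runs")
print(f"small population: allele fixed in {small:.0%} of runs")
```

    With the same selective advantage and the same starting frequency, the allele should fix in nearly every large-population run, whereas in the small population it is usually lost by chance before selection can act.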

    Despite the vital role played by mathematics in this hugely significant development, which would have been literally unthinkable in its absence, the mathematics itself remained invisible to most biologists. Even today, many of those who would staunchly defend evolution against its detractors would also confidently assert that there is no place for mathematics in biology. The field biologists who completed the modern synthesis (T. Dobzhansky, E. Mayr, G. G. Simpson, and others) knew better, but they largely kept their distance from the mathematical details, leaving those to the population geneticists.

    This brings us to our second example. In 1963, Ernst Mayr, one of the founders of the modern synthesis, made the confident assertion in his Animal Species and Evolution that “Much that has been learned about gene physiology makes it evident that the search for homologous genes is quite futile except in very close relatives (Dobzhansky, 1955)” (Mayr, 1963, p. 609).

    We might call this Ernst Mayr's “great mistake.” We now know that you can extricate a Hox gene from a chicken, put it in a fly, and it will make a fly, up to a point (Lutz et al., 1996). Mayr's “great mistake” is every bit as embarrassing as Darwin's “confounded Wealden calculations,” but Darwin's mistake is almost endearing, whereas Mayr's mistake is thoroughly disturbing. Anyone reading Mayr's wonderful book will be struck by its integrative biological perspective and its disdain for simplistic explanations, genetic or otherwise. Indeed, Mayr had a famous dispute with his friend J. B. S. Haldane about “beanbag genetics” (Rao and Nanjundiah, 2011). Why did such a fine biologist, who was so thoughtful, get an issue like this so wrong? Sadly, I never had the chance to ask him. He died in 2005 at the age of 100, working at Harvard to the end (Mayr, 2004).

    There was little evidence for such an assertion in 1963. If anything, as Mayr's reference to Dobzhansky, another of the founders, makes clear (Dobzhansky, 1955, pp. 243–251), the functional homologies in metabolic biochemistry between yeast and animals were already suggesting deeper connections. But, here, too, Dobzhansky says, “it is probable that different genes do it in man and in yeast” (Dobzhansky, 1955, p. 251).

    I suspect that Mayr, along with Dobzhansky and their other colleagues, had taken the conclusions of formal population genetics too seriously. They had come to believe in the awesome power of natural selection to act upon variation, and this had infected their informal models and the way they imagined evolution to be at work. Natural selection may act, but what is the nature of the variation from which it draws its power? The formal model can say what happens to existing variation under whatever evolutionary dynamics is assumed to be acting, but how can it say what the existing variation is in the first place? Even to know what kind of genetic variation is invisible to selection, let alone to know what becomes phenotypically selectable in a given environment, one has to know how an organism deals with variation—how it converts genotype to phenotype (Lewontin, 1974). But there is no organism in the model. If you do not put it in, as T. H. Huxley pointed out, it is not going to miraculously emerge. The stroke of genius in leaving the organism out now comes back to haunt a later generation of biologists. If Mayr had not been so convinced that small mutations accumulating through positive selection explain everything, he might have been more tentative in his assertion. Instead, he allowed the formal model to overwhelm his informal model; he allowed the tail to wag the dog.

    The Hox genes were the first in a series of striking discoveries about the molecular basis by which animals are constructed, which have in turn begun to tell us about the nature of phenotypic variation. Deeply conserved developmental modules have been found that, unlike modules in technological artifacts such as computers, are not rigidly wired together but instead are only “weakly linked” (Kirschner and Gerhart, 2005). Polydactyly is a not-uncommon developmental abnormality, which can be corrected by surgery. But how can a digit primordium in a developing limb bud construct a sixth finger when we usually have five? How do the modules responsible for cartilage, bone, muscle, vasculature, and nerves respond to the challenge from the primordium? “You are doing what? You are making a sixth finger? Absolutely not! We did not sign up for that.” They put down their tools and call an official strike and the organism breaks. That would be strong linkage. But that is not what happens. There is no contractual agreement written in the genes. It is all more laid back, more Californian: “Hey, those primordia dudes are making an extra finger. Let's go join the party!” That is weak linkage. In this way, mutation can be hidden from selection to become heritable phenotypic variation. A sixth digit may be mildly deleterious today, but the capability to make extra digits was useful when animals were learning to survive on land. Perhaps—who knows?—it might come in handy again.

    The emergence of “evo-devo” as a discipline offered hope for better understanding of how variation and selection work together. One might have expected, consequently, that formal and informal models would have achieved a better rapport than in Ernst Mayr's day. But here is Sean Carroll on the subject: “Millions of biology students have been taught the view (from population genetics) that evolution is change in gene frequencies. Isn't that an inspiring theme? This view forces the explanation toward mathematics and abstract descriptions of genes and away from butterflies and zebras. … Instead of change in gene frequencies, let's try evolution of form is change in development” (Carroll, 2005), to which Michael Lynch responds, “This statement illustrates two fundamental misunderstandings. Evolutionary biology is not a story-telling exercise, and the goal of population genetics is not to be inspiring, but to be explanatory. … Nothing in evolution makes sense except in the light of population genetics” (Lynch, 2007). The dog and the tail have parted company across a gulf of misunderstanding.

    We still have a way to go. But that is another story.

    To sum up what we have learned from our excursion into evolutionary biology: formal models are not descriptions of reality; they are descriptions of our assumptions about reality; they are only as good as their assumptions; if you make the wrong assumptions, correct mathematics can still produce wrong science; it is more important to understand the assumptions than to believe the conclusions; the latter are logically ordained in the former, but the former reflect the price that you are actually paying; if those assumptions are at odds with your informal model, then life gets interesting, and you have to decide which to change; formal and informal models ought to work together, each influencing the other in a virtuous cycle; but this may be harder than it looks, if evolutionary biology is any guide; in particular, the tail should not wag the dog and destroy the cycle.

    It seems unlikely that biology will go the way of economics. In biology, formal models rely on informal models to bridge the gap between reductionism and reality, between what we can logically infer about molecules and what we can discover about life by observation and experiment. We may need to be rigorous, but we had better be right. We need informal models to keep our formal models on the straight and narrow. Our tails need their dogs. Let us spare a thought in passing for our poor colleagues toiling in the “dismal science” of economics, which seems, from a safe academic distance, to be all tail and no dog.



    1 P. Krugman, “The fall and rise of development economics.” Available at


    I am grateful to Evelyn Keller, Doug Kellogg, and Michael Lynch for very helpful comments, which clarified several issues, but they are not to blame for the remaining deficiencies of this essay. I also thank Mary Welstead for her stringent editorial consultancy.


  • Bowler PJ (1983). The Eclipse of Darwinism: Anti-Darwinian Evolution Theories in the Decades around 1900, Baltimore: Johns Hopkins University Press.
  • Brenner S (2010). Sequences and consequences. Phil Trans R Soc 365, 207-12.
  • Burchfield JD (1990). Lord Kelvin and the Age of the Earth, Chicago: University of Chicago Press.
  • Camerer C, Babcock L, Loewenstein G, Thaler R (1997). Labor supply of New York City cabdrivers: one day at a time. Q J Econ 112, 407-41.
  • Carroll S (2005). Endless Forms Most Beautiful: The New Science of Evo-Devo, New York: W. W. Norton.
  • Darwin C (1985). The Origin of Species by Means of Natural Selection or the Preservation of Favoured Races in the Struggle for Life, London: Penguin Books, [reprint of the original 1859 edition].
  • Dobzhansky T (1937). Genetics and the Origin of Species, New York: Columbia University Press.
  • Dobzhansky T (1955). Evolution, Genetics and Man, New York: John Wiley & Sons.
  • Gunawardena J (2012). Some lessons about models from Michaelis and Menten. Mol Biol Cell 23, 517-519.
  • Gunawardena J (2013). Biology is more theoretical than physics. Mol Biol Cell 24, 1827-1829.
  • Gunawardena J (2014). Models in biology: “accurate descriptions of our pathetic thinking.” BMC Biol 12, 29.
  • Huxley TH (1869). Geological reform. Q J Geol Soc Lond 25, 38-53.
  • Jenkin F (1867). The origin of species. North British Rev 46, 277-318.
  • Kahneman D (2011). Thinking, Fast and Slow, New York: Farrar, Straus and Giroux.
  • Kirschner MW, Gerhart JC (2005). The Plausibility of Life, New Haven, CT: Yale University Press.
  • Lewontin RC (1974). The Genetic Basis of Evolutionary Change, New York: Columbia University Press.
  • Lutz B, Lu H-C, Eichele G, Miller D, Kaufman TC (1996). Rescue of Drosophila labial null mutant by the chicken ortholog hoxb-1 demonstrates that the function of Hox genes is phylogenetically conserved. Genes Dev 10, 176-184.
  • Lynch M (2007). The frailty of adaptive hypotheses for the origins of organismal complexity. Proc Natl Acad Sci USA 104, 8597-8604.
  • Mayr E (1963). Animal Species and Evolution, Cambridge, MA: Harvard University Press.
  • Mayr E (2004). 80 years of watching the evolutionary scenery. Science 305, 46-47.
  • Morgan MS (2012). The World in the Model. How Economists Work and Think, Cambridge, UK: Cambridge University Press.
  • Pearson K (1897). Mathematical contributions to the theory of evolution. On the law of ancestral heredity. Proc R Soc Lond 62, 386-412.
  • Rao V, Nanjundiah V (2011). J. B. S. Haldane, Ernst Mayr and the beanbag genetics dispute. J Hist Biol 44, 233-281.
  • Smith C, Wise MN (1989). Energy and Empire: A Biographical Study of Lord Kelvin, Cambridge, UK: Cambridge University Press.
  • Thomson W (1863). On the secular cooling of the Earth. Phil Mag 25, 1-14.
  • Thomson W (1871). On geological time. Trans Glasgow Geol Soc 3, 1-28.
  • Tversky A, Kahneman D (1981). The framing of decisions and the psychology of choice. Science 211, 453-458.