The most important number in human biology

There is one number that encapsulates our evolutionary history. It dictates the degree to which natural selection has shaped us. It has influenced how our bodies are put together, how our genome works, how we interact with parasites, and how susceptible we are to disease. It is instrumental to understanding what is, and isn’t, functional in the genome. It is the most important number in human biology.

This number is the effective population size for the human species.

The reason for this post is that buried in the supplemental data of a recent paper by Bergeron et al., there are two new estimates of the effective population size for humans: 6,600 and 1,500 (rounding to two significant digits). That's really low. Previous estimates were in the 10,000 range.

To understand why this downward revision has profound effects on how we think about function and evolution of the human genome, we will need to cover some background.

The effective population size is not a number that you can easily count or calculate. It is a mathematical abstraction derived from population-level estimates of diversity in the genome. It is the equivalent of saying that “population x” has the properties of an idealized population of this size. In this ideal population, all members reproduce, all mating is random, the numbers of males and females are equal, the population size does not fluctuate, and other effects, such as rates of recombination, are eliminated. It asks: given how humans evolve in the real world, what is the size of the equivalent idealized population that has all of the same general properties?

So why is this number so important?

To understand the importance of this number we need to discuss some of the work of Motoo Kimura, the most important scientist of the past century that no one knows about. He showed that the effective population size places a limit on the effectiveness of natural selection. Natural selection will only act on a genomic change if

|s| > 1/(2Ne)

So let's break this equation down.

First, we need to introduce “s”, the selection coefficient. It is the difference in reproductive success of a genetic variant compared to the wildtype. In other words, if I change a genome to create a new variant and I measure how many offspring this variant has in comparison to the original unmodified organism, then “s” is a measure of this change in the number of offspring. If there is no change, then s = 0 and we would call this variant “neutral”. If, on average, the variant has 1% more offspring than the original organism, then s = 0.01, and the variant would be said to be under positive selection. If, again on average, the variant has 1% fewer offspring than the original organism, then s = -0.01, and the variant would be selected against. One thing to keep in mind is that “s” is an average: it’s as if we compare the variant and the unmodified organisms over and over again and calculate the average difference in offspring number. At the individual level, the number of offspring can vary quite a bit. This simply reflects the randomness of life, filled with chance and stochastic events. It is also important to realize that we have two copies of our genome, and that any new change is present on only one copy. So for every offspring our variant has, it will pass down the change only half of the time.

Secondly, we have Ne. This is the effective population size that I described earlier.

To fully understand the implications of this equation, we need to cover one more concept. It is the idea that when a new genomic change is created, something that we refer to as a mutation, it has one of two ultimate fates: at some point in the future, it is lost because no descendants inherit it, or it spreads throughout the entire population so that all members of the population have it. So it either dies, never to be seen again, or it becomes a permanent part of the genome, altering the species forever. The former event is called extinction, the latter is fixation.
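To make the two fates concrete, here is a minimal simulation sketch. It assumes a textbook Wright-Fisher population (constant size, random mating, non-overlapping generations) and a neutral mutation; the function names, the population size of 50, and the number of replicates are all illustrative choices, not values from Bergeron et al.

```python
import random

def fate_of_new_mutation(n_diploid, rng):
    """Follow one new neutral mutation until it is lost or fixed.

    Assumes an idealized Wright-Fisher population: constant size,
    random mating, non-overlapping generations, no selection.
    """
    total_copies = 2 * n_diploid  # two genome copies per individual
    count = 1                     # a new mutation starts as a single copy
    while 0 < count < total_copies:
        freq = count / total_copies
        # Each genome copy in the next generation is drawn at random
        # from the copies present in the current generation.
        count = sum(1 for _ in range(total_copies) if rng.random() < freq)
    return "fixed" if count == total_copies else "lost"

def tally_fates(n_diploid=50, replicates=1000, seed=1):
    """Repeat the experiment many times and count each outcome."""
    rng = random.Random(seed)
    fates = [fate_of_new_mutation(n_diploid, rng) for _ in range(replicates)]
    return {"fixed": fates.count("fixed"), "lost": fates.count("lost")}

if __name__ == "__main__":
    print(tally_fates())
```

Every run ends in one of the two outcomes, and the large majority end in extinction. For a neutral mutation in this idealized setting, the expected share that reaches fixation is 1/(2N), where N is the number of individuals.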

So now, thinking back on the equation, it divides genomic changes into two classes. Those where |s| > 1/(2Ne) are in the realm of natural selection. When s is positive, natural selection will favour them, pushing them towards fixation. When s is negative, natural selection will purge them from the population, making them go extinct. Note that despite the effects of selection, most new mutations will go extinct. This is simply due to the stochastic nature of inheritance. Then there are genomic changes where |s| < 1/(2Ne). These mutations are, for all intents and purposes, neutral. Whether they reach fixation, as opposed to extinction, is purely a matter of chance. Most of the mutations that go on to be fixed in the human genome fall in this second class. Most of evolution in the human genome (i.e., permanent changes in the genome due to the fixation of variants over extended geological times) is neutral in nature.
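The two classes can be illustrated with the standard diffusion approximation for fixation probability usually attributed to Kimura. This is a sketch under stated assumptions: the mutation starts as a single copy, its effect is additive, and the census size is set equal to Ne unless given separately. The function name and the example values of s are hypothetical choices made for illustration.

```python
import math

def fixation_probability(s, ne, n=None):
    """Diffusion approximation for the fixation probability of a new mutation.

    s  : selection coefficient (additive effect assumed)
    ne : effective population size
    n  : census size; assumed equal to ne when not supplied
    """
    if n is None:
        n = ne
    p0 = 1.0 / (2 * n)  # starting frequency: one copy among 2N
    if s == 0:
        return p0       # neutral case: chance alone
    a = 4 * ne * s
    # (1 - exp(-a * p0)) / (1 - exp(-a)), written with expm1 for accuracy
    return math.expm1(-a * p0) / math.expm1(-a)

if __name__ == "__main__":
    ne = 6600
    neutral = fixation_probability(0, ne)
    print(f"threshold 1/(2Ne) = {1 / (2 * ne):.2e}")
    for s in (-1e-3, -1e-5, 0.0, 1e-5, 1e-3):
        p = fixation_probability(s, ne)
        print(f"s = {s:+.0e}  P(fix) = {p:.3e}  relative to neutral = {p / neutral:.3g}")
```

With Ne = 6,600, changes with |s| = 0.00001 sit below the threshold and fix at nearly the neutral rate, while changes with |s| = 0.001 sit above it and behave very differently depending on their sign. Even the advantageous one reaches fixation only about 0.2% of the time, which is the sense in which most new mutations go extinct despite selection.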

So where does this barrier lie? What separates changes that can be seen by, from those that are hidden from, natural selection? This divide is highly dependent on the effective population size. That’s why this number is so important.
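As a quick worked example, the threshold 1/(2Ne) can be computed for the estimates mentioned in this post. The labels and the function name are only for illustration, and 10,000 stands in for the "10,000 range" of earlier estimates.

```python
def selection_threshold(ne):
    """Smallest |s| that selection can act on, per |s| > 1/(2Ne)."""
    return 1.0 / (2 * ne)

# Effective population sizes quoted in this post.
estimates = {
    "earlier estimates (about 10,000)": 10_000,
    "new estimate (6,600)": 6_600,
    "new estimate (1,500)": 1_500,
}

for label, ne in estimates.items():
    print(f"{label}: |s| must exceed {selection_threshold(ne):.2e}")
```

Going from 10,000 to 1,500 raises the threshold from 0.00005 to roughly 0.00033, so a change must have about a 6.7 times larger effect on reproductive success before selection can act on it.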

With these new lower estimates of the effective population size for humans, we have to come to grips with the fact that humans evolve in a very weak selection regime. This means that any “functional” part of the genome needs to offer a quite substantial fitness advantage (where |s| > 1/(2Ne)), or else it will inevitably be lost due to the onslaught of mutations constantly altering the genome. It also means that a biochemically active part of the genome can be there simply because biology is messy: genomic regions associated with biochemical activity will not be subject to negative selection unless they pose a significant burden (where |s| > 1/(2Ne)).

This most important number should always be at the back of our minds when we think about the evolution of our genome, its various bits and parts, and why we are the way we are.