Coalescence time: Is it different for haploids and diploids in population genetics?


I'm trying to model the divergence of cyanobacterial cells in 2 populations with mutation rate $\mu$, and I need to verify my model against a valid theory. I don't have much biology background, and all the theories I can find are valid either for haploid populations or for mating populations. All I need is a theory of coalescence time/divergence for a haploid population without mating. So far I have this:

$$T = 2 \times N \times d, \qquad D = 2 \times \mu \times T$$

in which $N$ is the number of diploid individuals, $d$ is the number of subpopulations (in this case 2), $T$ is the coalescence time and $D$ is the divergence. Can I apply this to my case even though it's defined for a diploid population?



Short answer

The expected coalescence time is twice as long in a diploid population as in a haploid population of the same size.

Long answer

Imagine you sample one chromosome at random. Then you sample a second one and ask: what is the probability that they coalesce in the previous generation? This probability is $\frac{1}{PN}$, where $N$ is the population size and $P$ is the ploidy number (1 for haploids, 2 for diploids). The probability that they do not coalesce is therefore $1-\frac{1}{PN}$. The probability that the two chromosomes coalesce $t$ generations ago is the probability that they don't coalesce for $t-1$ generations and then coalesce. That is, the probability that the two chromosomes coalesce $t$ generations ago is

$$P(t) = \left(1-\frac{1}{PN}\right)^{t-1}\frac{1}{PN}$$

You might recognize here a geometric distribution, where the probability of success is $\frac{1}{PN}$. To simplify the math, let's assume that the population size is very large. In that case the above formulation is well approximated by

$$P(t) \approx \frac{1}{PN} e^{-t\frac{1}{PN}}$$

You might now recognize the exponential distribution with rate $\frac{1}{PN}$. We can now compare the expected time to coalescence in a diploid population and in a haploid population.

The mean of the above exponential distribution is simply $PN$. Therefore, the expected time to coalescence is $2N$ generations in diploids and only $N$ generations in haploids.
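As a quick numerical check of the argument above, here is a minimal Python sketch. The function names are my own (hypothetical), and it assumes the same idealized model as the text: constant population size, two sampled chromosomes, and a coalescence probability of $\frac{1}{PN}$ per generation.

```python
import math


def coalescence_pmf(t, n, ploidy):
    """Geometric probability that two lineages coalesce exactly t generations ago."""
    p = 1.0 / (ploidy * n)
    return (1.0 - p) ** (t - 1) * p


def coalescence_pdf_approx(t, n, ploidy):
    """Exponential approximation of the same probability for large populations."""
    rate = 1.0 / (ploidy * n)
    return rate * math.exp(-rate * t)


def expected_coalescence_time(n, ploidy):
    """Mean of the geometric distribution, 1/p = ploidy * n generations."""
    return ploidy * n


def numeric_mean(n, ploidy):
    """Check the mean by summing t * P(t) directly over a long time horizon."""
    t_max = 60 * ploidy * n  # the probability mass beyond this point is negligible
    return sum(t * coalescence_pmf(t, n, ploidy) for t in range(1, t_max + 1))


if __name__ == "__main__":
    N = 100
    for ploidy, label in ((1, "haploid"), (2, "diploid")):
        print(label, expected_coalescence_time(N, ploidy), numeric_mean(N, ploidy))
```

The direct sum should agree with $PN$ for both ploidies, and the exponential curve should sit very close to the geometric one once $PN$ is in the hundreds or more.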

Details of your post

In your post you talk about structured populations. The calculations are a little more complicated in structured populations and I don't remember off the top of my head how they work.

You talk about divergence $D$. I don't know what you mean by divergence. From coalescent results you can compute the expected heterozygosity, the number of pairwise differences, and even the whole site frequency spectrum. You can as well make those calculations for the structured coalescent model.

Yes, you can: the model describes the coalescence of alleles, as the Wikipedia article on coalescent theory states.

It comprises a probabilistic assessment of variation in time to common ancestry of alleles in a relatively small sample of individuals, from a much larger population.

What you are trying to check is how many generations have passed since any number of alleles shared a common ancestral copy. For that we have:

$$t = \frac{4N}{n(n-1)}$$

This was developed by J. F. C. Kingman (based on the earlier Wright-Fisher model) for a constant population size of $N$, where $t$ is the coalescence time of $n$ alleles. In this equation the numerator is $4N$ because $2N$ is the total allele population size. You have to multiply it again by $2$ because the total time two alleles have been differing is twice the time that separates them from their most recent common ancestor. Therefore, for haploid populations you should use $2N$ in the numerator.
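To make the haploid/diploid difference concrete, here is a minimal Python sketch. It assumes the standard Kingman result that, while $k$ lineages remain, the expected waiting time to the next coalescence is $\frac{2PN}{k(k-1)}$ generations, with $P$ the ploidy. That gives $4N$ in the numerator for diploids and $2N$ for haploids, as described above. The function names are hypothetical.

```python
def expected_interval(k, n_individuals, ploidy):
    """Expected generations during which exactly k lineages remain."""
    if k < 2:
        raise ValueError("need at least two lineages to coalesce")
    return 2.0 * ploidy * n_individuals / (k * (k - 1))


def expected_tmrca(sample_size, n_individuals, ploidy):
    """Expected time to the most recent common ancestor of the whole sample.

    Sums the expected waiting times as the sample goes from
    sample_size lineages down to a single lineage.
    """
    return sum(
        expected_interval(k, n_individuals, ploidy)
        for k in range(2, sample_size + 1)
    )


if __name__ == "__main__":
    N = 1000
    for ploidy, label in ((1, "haploid"), (2, "diploid")):
        print(label, expected_interval(2, N, ploidy), expected_tmrca(10, N, ploidy))
```

For two alleles this reduces to $N$ generations in haploids and $2N$ in diploids, which matches the first answer.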

There are versions of this model that include mutation rate and non-constant $N$, but it may be easier to simply look for a good model to estimate your $N_e$ (the effective population size, that is, the $N$ of a constant-size population that would experience the same genetic drift as your actual population), since $N_e$ can be used in this formula when $N$ is not constant.

The population genetics of haplo-diploids and X-linked genes

From the available electrophoretic data, it is clear that haplodiploid insects have a much lower level of genetic variability than diploid insects, a difference that is only partially explained by the social structure of some haplodiploid species. The data comparing X-linked genes and autosomal genes in the same species is much more sparse and little can be inferred from it. This data is compared with theoretical analyses of X-linked genes and genes in haplodiploids. (The theoretical population genetics of X-linked genes and genes in haplodiploids are identical.) X-linked genes under directional selection will be lost or fixed more quickly than autosomal genes as selection acts more directly on X-linked genes and the effective population size is smaller. However, deleterious disease genes, maintained by mutation pressure, will give higher disease incidences at X-linked loci and hence rare mutants are easier to detect at X-linked loci. Considering the forces which can maintain balanced polymorphisms, there are much stronger restrictions on the fitness parameters at X-linked loci than at autosomal loci if genetic variability is to be maintained, and thus fewer polymorphic loci are to be expected on the X-chromosome and in haplodiploids. However, the mutation-random drift hypothesis also leads to the expectation of lower heterozygosity due to the decrease in effective population size. Thus the theoretical results fit in with the data but it is still subject to argument whether selection or mutation-random drift are maintaining most of the genetic variability at X-linked genes and genes in haplodiploids.


Despite a great deal of theoretical attention, we have limited empirical data about how ploidy influences the rate of adaptation. We evolved isogenic haploid and diploid populations of Saccharomyces cerevisiae for 200 generations in seven different environments. We measured the competitive fitness of all ancestral and evolved lines against a common competitor and find that in all seven environments, haploid lines adapted faster than diploids, significantly so in three environments. We apply theory that relates the rates of adaptation and measured effective population sizes to the properties of beneficial mutations. We obtained rough estimates of the average selection coefficients in haploids between 2% and 10% for these first selected mutations. Results were consistent with semi-dominant to dominant mutations in four environments and recessive to additive mutations in two other environments. These results are consistent with theory that predicts haploids should evolve faster than diploids at large population sizes.

Inheritance of Mitochondria (With Experiments)

Mitochondria are the cytoplasmic organelles which contain various respiratory enzymes and are the main source of energy for the cell. They are found in cells in variable numbers. The complexity of structure of the mitochondria and their similarity in some ways to plastids suggest the possibility that they may be inherited in the same way as the plastids.

The mitochondria have a double membrane and contain besides respiratory enzymes of electron transport chain (the cytochromes), their own genetic determinants (DNA). There is certain amount of electron micrographic evidence for the continuity of this cytoplasmic organelle through cell division.

Although most of the mitochondrial proteins and enzymes are produced by nuclear genes, nearly 20% of them result from the activity of mitochondrial genes.

The fact that mitochondrion contains its own DNA has led some to speculate that it evolved from symbiotic micro-organism that gradually lost the ability to exist independently. There is enough evidence in support of this. But, some people disagree with this. Mitochondria cannot be regarded truly autonomous cytoplasmic organelles as they require both their own genes and nuclear genes in order to exist.

Mitochondrial heredity has been exemplified by yeast (Saccharomyces cerevisiae) and Neurospora crassa.

Ephrussi’s Experiment with Yeast:

Certain strains of yeast (S. cerevisiae) produce tiny colonies when grown on agar medium. Ephrussi (1953) observed that one or two out of every one thousand colonies were only about one-third or one-half of the diameter of the remainder. The small colonies are termed petite colonies.

Cells from the normal large colonies, when spread on culture medium, again produced a small proportion of petite colonies, and this happened time after time. The cells from the small colonies were true breeding and they produced only petites.

Biochemical studies have established that the slow growth of petite colonies was due to the loss of aerobic respiratory enzymes particularly cytochrome a and b and enzyme cytochrome oxidase occurring in mitochondria of the cells and the utilization of the less efficient fermentation process by the cells.

The petite phenotype can result from mutation of either nuclear genes or mitochondrial genes. Petite mutants resulting from mutation in a nuclear gene follow a Mendelian pattern of inheritance, with segregation occurring in heterozygotes.

This type of petite mutation is called segregational petite or nuclear petite. When the individuals of petite colony are crossed to the individuals from normal large sized colony, normal zygotes are formed which produce normal cells vegetatively.

When meiosis takes place in diploid cells, haploid cells are recovered that will form petite and normal colonies in 1: 1 ratio as shown below:

Since such colonies evidently arise from mutant nuclear genes, they are called segregational petites.

Boris Ephrussi and his associates (1953) also found that in the presence of small amounts of acridine dyes such as acriflavine and euflavine, many cytochrome-deficient petite colonies developed which showed extra-chromosomal (non-Mendelian) inheritance of the petite character. They were called vegetative petites.

The rate of mutation was much higher at low concentrations than that normally expected for chromosomal mutation. The vegetative petites may arise directly from mutations in mitochondrial genes leading to defective mitochondria.

There are two classes of vegetative or extra-chromosomal petites:

(i) Neutral petites, and

(ii) Suppressive petites, which show different patterns of inheritance.

Neutral Extra-chromosomal or Vegetative Petites:

When a cross is made between wild type haploid yeast and neutral petite haploid yeast, normal diploid offspring are obtained. The diploid individuals produce several normal diploids by budding. When meiosis occurs in normal diploids, haploid ascospores are formed which produce normal haploid colonies.

If the determinants of this trait were chromosomal, one would expect normal and petite traits in a 1:1 ratio in the population of haploid spore cells. Because only normal colonies are recovered, the inheritance appears to be non-chromosomal.

The genetic basis of this type of inheritance can be explained by assuming the presence of an extra-chromosomal or cytoplasmic factor [rho + ] in the normal strain, which is missing, written [rho – [N]], in neutral petite mutants. The neutral petites [rho – [N]] usually lack mitochondrial DNA. Now, if the haploid neutral petite is crossed to a haploid normal strain, the diploid would be normal.

The normal condition in diploid cells appears because of normal mitochondria with [rho + ] factor contributed by normal haploid strain. These normal mitochondria replicate and are passed on to haploid spores after meiosis in diploid cells. The mitochondria contributed by neutral petite mutant possibly do not replicate and gradually degenerate. So all the haploid spores and their descendants would be normal.

The pattern of inheritance is as follows:

Inheritance Pattern of Suppressive Petites:

The suppressive petite mutant shows different behaviour than the neutral petite. When a cross is made between haploid cells of suppressive petite and haploid cells of normal strain, diploid cells are obtained which are in part normal and in part petite and as their name indicates, they can suppress normal aerobic respiration in presence of normal cytoplasm.

The normal diploids after meiosis produce normal haploid spores while the diploid petites after budding produce diploids which may be all petites or some normal and some petites. Normal diploids after sporulation produce only normal and no petite.

It is thus obvious that suppressive petites follow Non-Mendelian pattern of inheritance. The genetic basis of this type of inheritance can be explained by assuming the presence of an extra-chromosomal factor [rho + ] in normal strain and [rho – [s]] in suppressive petites.

The genetic cause for suppressive petite is mitochondrial mutation. Unlike neutral petites, the mitochondria of suppressive petites contain mutant DNA. The mutant mitochondria can replicate and can be passed on to the progeny cells which can, in turn, express mutant phenotype. In the cross in question it is the relative proportion of normal and mutant mitochondria that determines the phenotype of the particular cell.

The diploid cells and haploid spores would be normal if normal mitochondria predominated and they would show mutant phenotype if mutant mitochondria predominated. The lack of normal segregation and also the high mutability of normal colony cells provide good evidence that vegetative petite phenotype is due to extra-chromosomal or cytoplasmic genes.

The Poky Strain in Neurospora:

There are several examples of mitochondrial enzyme deficiency which are cases of extra-chromosomal inheritance. One of the classical examples of extra-chromosomal inheritance of plasma genes came from studies of Neurospora.

In this fungus, there is a slow growing mutant strain called poky. The mitochondria contain cytochromes a, b and c, which are electron transport proteins necessary for oxidative phosphorylation.

In poky strains, either cytochrome a or cytochrome b is absent but cytochrome c is present in excess. Poky differs from petite in that the two mutants are not deficient for the same enzymes. When poky as female parent was crossed with a normal strain as a male parent, the progeny were found to be poky.

In the reciprocal cross (normal ♀ × poky ♂), the progeny were normal. This non-Mendelian uniparental inheritance suggested that the cytoplasm of the female parent was important, because the only difference between reciprocal crosses was in the contribution of cytoplasm.

The male gametes in Neurospora contribute a negligible amount of cytoplasm, just as in animals or higher plants. So, it is probable that the factor for pokyness resided somewhere in the cytoplasm. The segregation of poky from normal is never observed, and the progeny of poky ♀ × normal ♂ will always be poky. Thus nuclear genotype has no effect on this particular phenotype.

Principles of Evolution, Ecology and Behavior

Chapter 1. Introduction [00:00:00]

Professor Stephen Stearns: Today we’re going to talk about Adaptive Genetic Change. And in order to set the stage for this, before I get into the slides, I would like you to consider the following proposition. Every evolutionary change on the planet, that has ever led to something that you think is cool and interesting and is well designed, whether it is the brain of a bat, or the vertebrate immune system, or the beautiful structure of the ribosome, or the precision of meiosis, has occurred through a process of adaptive genetic change. A mutation has occurred that had an effect on a process or a structure and, if it increased the reproductive success of the organism that it was in, it was retained by evolution and if it did not, it disappeared.

So what we’re talking about today is a look into a very basic mechanism that is operating in all of life and is causing the accumulation of information. Now, these are the keys to the lecture. In the middle of the lecture you’re going to get a couple of slides that have tables and equations on them and stuff like that, and I’ll lead you through one of those tables, and I’ll ask you to go through another one. But they’re not the point. The point is this. There are four major genetic systems, and there are some interesting exceptions to them. But you can capture a big chunk of the variation in the genetics of the organisms on the planet with just four systems. Okay?

They are sexual versus asexual and haploid versus diploid, and those differences make a big difference to how fast evolution occurs. You guys are sexual diploids and you evolve slowly, and your pathogens are asexual haploids and they evolve fast. That’s important, the kind of thing you ought to know.

Now when we get into the equations of population genetics–they’re just algebra–the point is that you can always go find them in a book and you can program them pretty easily, even in simple spreadsheet programs like Excel, and you can understand their basic properties by playing around with them. If you go on the web and go to Google and type Hardy-Weinberg equations, you’re going to get 20 websites around the country where some professor of population genetics has put up some package for students to play with and it’s going to generate all kinds of beautiful pictures and stuff like that.

It’s real easy for you to lay your hands on these tools now. What’s important for you to know is (a) that they are there and represent something important (b) what their major consequences are and (c) how to get a hold of them when you need them. I am not going to ask you to repeat the derivation of the Hardy-Weinberg equations on a mid-term. Okay? But I do expect you to know why they’re important and what they’re about.

The third thing that I want you to take home from this lecture is that when adaptive genetic change starts to occur, it is virtually always slow at the beginning, fast in the middle and slow at the end. So that if you are looking at a graph of gene frequencies over time, it looks like an S and that’s the third thing. That’s it, there’s the lecture, ta-da. Now background to this decision.

When, in 1993, Rolf Hoekstra and I began to put together the first edition of this book, I asked Rolf to be my co-author because he is a population geneticist. He has a marvelously clear mind. He likes those kinds of equations and he’s really good at them. And we, Rolf and I, went around and we asked about fifteen of the leading evolutionary biologists in the world, “What’s important? What should every biologist know about evolution? This is for everybody. This is for doctors and molecular biologists and developmental biologists, everybody. What should they know?” And I said, “Rolf, your job is to figure out the part from population genetics.” And he came back, after about two weeks, and he said, “You know Steve, I don’t think there is anything.”

I was shocked. I said, “Rolf, you’re a population geneticist. This stuff is important, right?” And then he said, “You know, the way we normally teach population genetics, which is as a big bunch of equations that are about drift and frequency change under selection and so forth, most people end up not really needing that. What they need to know is that there are four main genetic systems and that genetic change is slow, fast, slow.”

So that’s where this lecture came from. It came from somebody thinking deeply about that, and asking lots of people. Now if you like this, there’s a whole field there, there’s a whole bunch of wonderful stuff that you can do. But these are the things that everybody I think should know.

So here’s the outline. I’m going to give you the context, the historical context that led to the concentration on genetics in evolutionary biology. I’ll talk a little about the main genetic systems. Then I’ll run through changes in gene frequencies under selection and, if I have time, I’ll get to selection on quantitative traits. If I don’t get to selection on quantitative traits, it will be because I have engaged in a dialog with you about some interesting puzzles, and that dialog is more important to me than getting to quantitative genetics. Okay?

Chapter 2. History of Genetics [00:05:45]

So here’s how genetics became a key element in evolutionary thought. Darwin did not have a plausible genetic mechanism and he failed to read Mendel’s paper, which came out six years after he wrote The Origin, but before he constructed some of the later editions of his book, and so he reacted by incorporating elements of Lamarck into his later editions. If you read the Sixth Edition of The Origin of Species, it’s got some really Lamarckian statements in it, inheritance of acquired characteristics.

Anybody here know what the problem was with Darwin’s original model? Anybody know how Darwin thought genetics worked in 1859? He had a model of blending inheritance. That meant that he thought that when the gametes were formed, gemmules from all over the body, that had been out there soaking up information about the environment, swam down into the gametes, into the gonads, carrying with them information about the environment into the gametes, and that then when the zygote was formed, that the information from the mother and the information from the father blended together like two liquids.

In other words, he didn’t think of genes as distinct material particles. He thought of them as fluids. Now if I give you a glass of red wine and a glass of white wine, and I pour them together, I get pink wine. And if I take that glass of pink wine and I pour it together with another glass of white wine, I get even lighter pink. And you can see that if I continue this, pretty soon red disappears completely. The problem with blending inheritance is that the parental condition gets blended out and there isn’t really a preservation of information.

That’s why Darwin came under attack. And Mendel wasn’t known, and he resorted to Lamarckianism, and he was wrong. So genetics became an issue. In the year 1900 there was a simultaneous rediscovery of Mendel’s Laws, and at that point people went back and they read Mendel’s paper, and they realized that they had missed this 35 years earlier.

Then the so-called ‘fly group’ of Thomas Hunt Morgan and Sturtevant and Bridges, who were working at Cal-Tech, demonstrated that genes are carried on chromosomes. And enough then was known about cytology, so that we knew that chromosomes had an elaborate kind of behavior, at mitosis and meiosis, and people then, about 1915, showed that in fact the behavior of chromosomes was consistent with Mendel’s Laws. They didn’t know at that point what chromosomes were made out of. They had no notion of the genetic code, but they could establish experimentally that genes were on chromosomes and that was done by 1915.

However, there were still issues about whether all of this would actually work at the population level. It was not immediately clear that you could take Mendelian genetics and then construct populations out of it, that obeyed Mendel’s Laws, and have natural selection work. To do that actually required a fair amount of math, and the people who did it were Ronald Fisher, J.B.S. Haldane and Sewall Wright, and they did it between about 1918 and 1932.

In so doing, they also invented much of what is now regarded as basic statistics. So Fisher had to invent analysis of variance in order to understand quantitative genetics, and Wright had to invent path coefficients in order to understand how pedigrees translate into patterns of inheritance. So these guys laid the foundations.

As a result of that, genetics really became regarded as kind of the core of evolutionary biology during the twentieth century, and there’s been a tremendous concentration on it. And it is still true that many people will not accept a claim about any evolutionary process unless it can be shown to be consistent with genetics. That’s sort of a Gold Standard. If you can’t do it genetically, if you can demonstrate it’s genetically illogical, then a claim just falls theoretically; you don’t even have to go out and get the data. Therefore, of course, the Young Turks have great joy in discovering cases that don’t fit and come up with epigenetics and lots of stuff like that. At any rate, that’s ahead of you; that’s not today.

Chapter 3. Different Genetic Systems [00:10:56]

The genetic system of a species is really the basic determinant of its rate of change. So we have sexual versus asexual species–there are complications to this–and we have haploid versus diploid, and there are other ploidy levels. Can anybody name me asexual vertebrates? Not sexual vertebrates, but asexual vertebrates? Anybody ever heard of an asexual vertebrate? Fish, amphibians, reptiles, birds, mammals?

Student: Wasn’t there a recent documentation of a shark? You mentioned it.

Professor Stephen Stearns: I could imagine that a shark might be capable of being asexual. I haven’t heard of that case.

Student: I think it was kind of a [inaudible]

Professor Stephen Stearns: Yes, there are some. There are some asexual lizards. There are some interestingly asexual fish. There are some frogs that manage to be kind of quasi-asexual by using male sperm but then not incorporating it into gametes–excuse me, in the developing baby. So they use it just to stimulate development. There’s one case in captivity of an asexual turkey.

But asexual types are not frequent among vertebrates. They are common in plants. Of course, most bacterial sex is asexual, although bacteria do have a bit of sex. You’re diploid; your adult large form is diploid. Anybody know what group of plants is haploid in the state in which you normally see them in nature, where the big recognizable thing is haploid? I’ll show you one in a minute. I just wanted to check. Mosses; mosses are haploid. Okay, so this is what’s going on with these four systems.

Basically the difference between sexual haploids and sexual diploids is the point in the lifecycle where meiosis occurs. If the adult is diploid and meiosis occurs in gonads in the adults that produce gametes, and then the zygote form develops so that all of the cells in the developing organism are diploid, you get the diploid cycle. If you have the zygote having meiosis immediately, or shortly after being formed, so that the developing young are haploid, then you get a haploid adult. So this is what moss do and this is what we do. Then we have asexual haploids and asexual diploids, and at least in outline they look pretty simple. Asexual diploid, just makes a copy of itself; just goes through mitosis, makes babies. Asexual haploid, same kind of thing.

So those are the four major genetic systems. There are many, many variations on them. So the asexual haploids are things like the tuberculosis pathogen, blue-green algae, the bread mould, the penicillin fungus, cellular slime moulds, and they constitute the bulk of the organisms on the planet.

Sexual haploids are things like moss, and red algae; most fungi are sexual haploids. In this case you can see that’s where the haploid adult is in the lifecycle. Here is where the gametes are formed. They are formed up on the head of the adult. You can see the pink and the blue are coding for the male and the female gametes, on different parts of the gametophyte. Then the zygote forms where the sperm gets into an ovule, on the tip of the plant, and then the young actually develop up here. So this is haploid up here and then the spores go out–meiosis has occurred in here and the spores go out as haploid spores. So that is a sexual haploid lifecycle.

The asexual diploids include the dinoflagellates; there are about ten groups of the protoctists–that’s the modern name for what you think of as protozoa, but it also includes some single-celled organisms that have chloroplasts in them–the unicellular algae, some protozoa, some unicellular fungi. There are a lot of multi-cellular animals that are asexual diploids, and this one here, the bdelloid rotifer, is one of them. It is called a scandalous ancient asexual. Anybody know why the word ‘scandalous’ is used in this context? Yes? What?

Student: No males.

Professor Stephen Stearns: There are no males; bdelloid rotifers do not have any males, nobody’s ever seen a male bdelloid rotifer. But that’s not the scandal. I mean, if you’re a male you might think it was scandalous. Right? [Laughter] But for an evolutionary biologist, no, that’s not scandalous.

Well it actually has to do with this part of it right here. Almost all asexual organisms on the planet, that are multi-cellular–leaving out the bacteria–but all the multi-cellular ones are derived from sexual ancestors and originated relatively recently, with a few exceptions, and this is one of the exceptions. There is a whole huge body of literature on the evolution of sex that says one of the things that sex is good for is that it allows long persistence.

We see that sexual things have been in a sexual state on the Tree of Life for a long time, and the asexual things have branched off of it, and we don’t see very many ancient ones. The reason for that–we’ll come to that, when we get to the evolution of sex–is that both because of mutations and because of pathogens, sex repairs damage and defends the organism against attack. So this is a low maintenance, poorly defended organism, and it looks like it’s been around without sex for perhaps 300 million years. The scandal is we don’t know how it did it. Okay? That’s why it’s called a scandalous ancient asexual. Yes, that’s a very intellectual definition of scandal, I agree.

Okay, sexual diploids. You guys are sexual diploids, this bee is a sexual diploid, and that flower is a sexual diploid. They have this kind of lifecycle, as is sketched here, the one that I talked about earlier. So about twenty animal phyla are sexual diploids. Many plants, most multi-cellular plants are, and there are some algae, protozoa and fungi that are sexual diploids. They include the malaria and sleeping sickness pathogens. There are some things that don’t fit; the sexual diploid part doesn’t fit, for malaria and sleeping sickness.

The things that are alternating between being haploid and diploid, with neither one dominating, are mushrooms, microsporidian parasites, which are things that are actually quite common in many insects, and the malaria–malaria has a very complex lifecycle. So it is haploid inside your red blood cells, it’s diploid at a certain point in a mosquito, and it’s moving back and forth.

The things that alternate sexual and asexual reproduction: there are some rotifers, some cnidarians, some water fleas, some annelids. There’s a great little annelid that lives in the bottom of the Harbor of Naples in Italy, and it actually does everything. It can be asexual–the same species–it can be asexual; it can be born as a female and turn into a male; it can be born as a male and turn into a female; and it can be born as both and do both. So some things are really flexible, but most things aren’t. And the timing of sexuality and asexuality is an important part of the lifecycle of all of these things.

Last fall, for example, there were huge jellyfish blooms over much of the world’s oceans, and that’s part of a complex lifecycle in which there is an asexual phase on the floor of the ocean, that builds up what looks like a stack of dinner plates, and then the top plate flips off and turns into a jellyfish. It goes off as a jellyfish and has sex and makes larvae, and then goes down and turns into an asexual thing on the bottom that makes stacks of dinner plates. So there’s a lot of variety out there. All of these things probably evolved from an asexual haploid and we say that because we believe that the bacterial state was the ancestral condition.

Chapter 4. Math of Genetics [00:20:45]

Okay, now genetics constrains evolution, and genetics is doing something to evolutionary thought which is about what chemistry does to metabolism and structure, and is about what physics does to chemistry. Okay? There’s a broad analogy there. If you want to understand molecular and cell biology, you learn a lot of chemistry. If you want to understand some evolution, then you need to learn a little bit about how genetics constrains evolution, and so you need a little math. So I’m going to give you some simple math, and here’s some terminology to soak up.

So we’re going to represent these ideas by symbols. We’re going to call alleles A1 and A2. So those are two alleles at one locus; a little exercise of genetic terminology. We’re going to let p be the frequency of A1, and q the frequency of A2. And frequency just means the following: some traits are Mendelian, which means that they’re easily recognized in the phenotype.

One of the Mendelian traits in humans is the ability to curl your tongue. I am a tongue curler. Okay? How many of you can curl your tongues? Okay, let’s say it’s about 45. How many of you cannot curl your tongues? Let’s say it’s about 30. So the frequency of tongue curling is going to be–I’m just making up the numbers, right?–45 divided by 75. That’s how we get the number. And by the way, the frequency of the other one is going to be 1 minus that frequency, because p plus q is equal to 1. And we’ll let s be the selection coefficient, which is measuring the reproductive success of the organism carrying this trait, the difference that it makes.

And if we look at the genetic change in asexual haploids, basically what one does is make a table of the process, and it is moving from young, in the present generation, through the adult stage, to young in the next generation. So we try to go through one generation. This is an act of Cartesian reduction. We’re taking a complex process and breaking it down into the parts that are essential for the thing that we’re thinking about.

We have genotype frequencies–for genotypes A1 and A2 they’re p and q–and we have relative fitnesses up here. The only place that selection is making any difference, on this whole page that’s in front of you, is right here. And basically what–our placing that there is an act that means the following. We are only going to think about the case in which there is some difference in the juvenile survival of A2; it’s different from A1. If it makes it to adulthood, there’s no difference; we don’t put that down in the table. So this is a case where we’re just–you know, it’s a special case–we’re just looking at the juvenile survival difference between A1 and A2. What happens?

Well, it changes the frequencies of A1 and A2 in the adults. Basically it changes them by reducing the number of A2s. Some of them have died out; that’s what the 1 minus s is doing. You can take these expressions here and you can simplify them so they look like this–it’s just a little bit of algebra–and because these are the frequencies in the adults, the young in the next generation have exactly those frequencies, because there is no selective difference in the adult stage. Okay? That’s what that table means.

Now a little bit about this. This little process that I’ve gone through, which probably looks like remarkably simplistic bookkeeping to you, is actually the part of doing applied mathematics which is the most difficult. It is the translation of a process into something analytically simple, that you can deal with. In the act of doing it, you make certain assumptions to simplify the situation, and by writing them down it helps you to remember what assumptions you made and what thing you’re actually looking at.

We’re not looking at all of evolution here, we’re looking at a very special case: we’re looking at asexual haploids where selective differences only occur in juveniles. What happens is you get a change in the gene frequencies of the adults that results from that process, and then that exact change is passed on to the next generation. So that’s the part of this process that I want you to remember. You can go look this stuff up any time. You don’t need to memorize that. You can program this as recursion equations and apply them repeatedly. Okay?
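The haploid table described above can be written as a one-line recursion. The following is a minimal sketch, assuming A1 has relative fitness 1 and A2 has relative juvenile survival 1 - s, as in the lecture; the function names `haploid_next` and `iterate` and the starting values are illustrative choices, not part of the lecture.

```python
def haploid_next(p, s):
    """One generation of juvenile selection in asexual haploids.

    p is the frequency of A1 (relative fitness 1); A2 has frequency
    q = 1 - p and relative juvenile survival 1 - s.
    """
    q = 1.0 - p
    mean_fitness = p + q * (1.0 - s)  # equals 1 - s*q
    # Frequency of A1 among surviving adults, which is also its
    # frequency among the young of the next generation.
    return p / mean_fitness


def iterate(p0, s, generations):
    """Apply the recursion repeatedly and return the whole trajectory."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(haploid_next(trajectory[-1], s))
    return trajectory


# Hypothetical starting point: both alleles at 50%, A2 survives 10% less well.
trajectory = iterate(0.5, 0.1, 50)
```

Applying `haploid_next` repeatedly is all that "programming the recursion" amounts to here; the frequency of A1 rises every generation as long as s is positive.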

Now let’s do it for sexual diploids. In the sexual diploids, you’ve already been exposed to the Hardy-Weinberg Law, this p², 2pq, q² law. In order to get it, we have to assume random mating in a big population. The reason you need the big population is so that those p’s and q’s are actually accurate measures. In a small population they’re noisy, but in a big population they are good stable estimates. And if there’s random mating, that means that matings are occurring in proportion to the frequency of each type.

So you get a Punnett diagram like this. The probability of one of these alleles occurring in one parent is going to be p, and of the other allele in that parent, q. Same for the other parent. These are the possible zygotes that will result from that. This one has probability p², this one has probability q², and these two together have probability 2pq. That’s just simple basic probability theory.

Now, the important thing about the Hardy-Weinberg Law is that it implies that there’s no change from one generation to the next. The gene frequencies under Hardy-Weinberg don’t change. That means that the information that’s been accumulated on what works in the population doesn’t change for random reasons. If it’s going to change, it’s going to change because that big population is going to come under selection. Okay?
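The claim that gene frequencies do not change under Hardy-Weinberg can be checked numerically. This is a small sketch under the lecture's assumptions (random mating, large population, no selection); the function names and the starting frequency 0.3 are made up for illustration.

```python
def zygote_frequencies(p):
    """Genotype frequencies (A1A1, A1A2, A2A2) after random mating."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_frequency(f11, f12, f22):
    """Frequency of A1 recovered from genotype frequencies.

    Homozygotes A1A1 carry only A1; heterozygotes carry it in half
    of their gene copies.
    """
    return f11 + 0.5 * f12


p = 0.3  # hypothetical starting frequency of A1
f11, f12, f22 = zygote_frequencies(p)
p_next = allele_frequency(f11, f12, f22)
```

Here `p_next` equals `p` up to rounding, because p² + pq = p(p + q) = p.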

That means that replication is accurate and fair, at the level of the population, just as it is at the level of the cell. Now, of course, gene drift is going on, but we’re not so worried here about gene drift, because gene drift is affecting things that aren’t making a difference to selection, and we’re building models of selection. What Hardy-Weinberg does is tell you if there isn’t any drift, if there isn’t any mutation, if there isn’t any selection, if there isn’t any migration in the population, and if you don’t have a high mutation rate, things are going to stay the same. So if they’re changing, one of those things is making a difference. Okay? And that gives us a baseline.

So it gives us a baseline to see the process of selection occurring, but it also means that random mating in large populations preserves information on what worked in the past. So you don’t have to invent everything all over again. And a note for future lectures, these are also the conditions that remove conflict by guaranteeing fairness. So basically the Hardy-Weinberg situation is one in which everything that was in the population last generation has exactly the same chance of getting into the next generation, in proportion to its frequency; nothing is going to change.

Okay, here’s a genetic counseling problem, and I’m going to take a little time on this. We go back to John and Jill. They’ve fallen in love, they want to get married, but they’re worried. John’s brother died of a genetic disease, and that is a nasty one. It’s recessive, it’s lethal, it kills anybody that carries it before they can reproduce. That’s fact one. Jill doesn’t have any special history of this disease in her family, but that history’s not well known, and so we estimate the probability that Jill carries the disease from the frequency of deaths in the general population, and that frequency is 1% to make it easier for you to calculate. Okay?

What’s the probability that they will have a child that dies from this disease in childhood? The probability is .03. Your problem is not to tell me .03, your problem is to tell me why did I use that equation? Okay? So take a look at that equation for a minute, take a look at that problem, and let’s go through and pull it apart. Can anybody see why either the two-thirds or the one-quarter is in the equation?

Student: We know that his brother has a recessive version of the lethal gene, and therefore John is either heterozygous–doesn’t look like it’s dominant, looks like it’s recessive. So if he is heterozygous or homozygous recessive, then he’s carrying the gene which is what we’re worried about. So there’s a two-thirds chance that he is either carrying it or actually has the disease.

Professor Stephen Stearns: That’s correct. The only slip you made in expressing that is that we know that if they are going to have a child that has the defect, they both must be heterozygous, and so we’re concentrating specifically on what’s the probability that they’re heterozygous. You then gave me that probability. Does anybody have a problem seeing why the probability that John is a heterozygote is two-thirds, rather than 50%; excuse me, that the baby is a heterozygote is two-thirds? Yes?

Student: So we’re going to keep him as a [inaudible].

Professor Stephen Stearns: Yes, you do. Okay. This is for the baby. Okay? If John is a heterozygote and if Jill is a heterozygote, they can have either a homozygous recessive, and that one will die before birth; they can have a homozygous dominant, perfectly healthy; or they can have a heterozygote. The probability of the homozygote recessive is 25%, the probability of the homozygous dominant is 25%, and the probability of the heterozygote is 50%.

But, the probability that John and Jill will have a baby that dies from this disease in childhood is going to be therefore this one-quarter. This two-thirds is going to be the probability that John is a heterozygote. How do we know that John–John’s parents were both heterozygotes?

Student: They had a recessive son.

Professor Stephen Stearns: They had a recessive son. John’s parents had to be heterozygotes. Therefore, given that John’s parents were heterozygotes, his probability is two-thirds. We know he survived to adulthood; the other 25% died. So of those who survived to adulthood, two-thirds are heterozygotes and one-third are homozygotes.

Student: Why can’t one be homozygote recessive and the other one be heterozygous? [Inaudible].

Professor Stephen Stearns: Because if one, the parent–if one parent was a homozygote, it could only have been homozygous dominant, because it survived to adulthood, to have a child. And if the other parent was a heterozygote, the only possibilities for the children are both heterozygotes and that wasn’t the case, because John’s brother died. Okay? So this is the probability that John is a heterozygote. This is the probability that if John and Jill have a baby, it will have the problem. What’s this thing in the middle–2 times 0.9 times 0.1?

Student: [Inaudible]

Professor Stephen Stearns: Right. That’s the probability that Jill is a heterozygote, and we get that from here. The square root of 1% is .1. 1 minus .1 is .9. This is q and this is p and this is 2pq. Okay? Where did we get this from? That’s in Jill’s part of the population. Those are the baby–oh you’ve got it.

Student: The probability has to be out of the entire population, and the long-term population, they can’t reproduce–[inaudible].

Professor Stephen Stearns: Right. So we have to correct the percentages for the ones that have died. Yes, you got it. Do you see how much goes into dissecting an equation like that? But because we’ve set up the logical apparatus, we can go through a sequence of steps and say, “Okay, first we know they both have to be heterozygous. Then, if they are both heterozygous, the probability that John is, is two-thirds; the probability that Jill is, is 2pq, corrected for the fact that 1% have died. She has survived, so we have to correct for that. Then this is the probability that their baby has the disease.” That’s the kind of process that one goes through when thinking about population genetics.
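The arithmetic behind the .03 can be laid out step by step. The slide's exact equation is not reproduced in the transcript, so the following is a sketch pieced together from the spoken description: two-thirds for John, 2 times 0.9 times 0.1 for Jill, one-quarter for the child. The second version, which divides Jill's term by the surviving fraction, is one reading of the "corrected for the 1% that have died" remark and is an assumption; variable names are illustrative.

```python
death_frequency = 0.01        # frequency of affected (homozygous recessive) births
q = death_frequency ** 0.5    # recessive allele frequency: sqrt(1%) = 0.1
p = 1.0 - q                   # 0.9

p_john_carrier = 2.0 / 3.0    # John survived, and both his parents were heterozygotes
p_jill_carrier = 2.0 * p * q  # 2pq = 0.18, the "thing in the middle"
p_child_affected = 1.0 / 4.0  # two heterozygous parents

risk = p_john_carrier * p_jill_carrier * p_child_affected

# Assumed refinement: condition Jill's term on her having survived,
# i.e. divide 2pq by the fraction of the population that is not affected.
p_jill_carrier_given_alive = p_jill_carrier / (1.0 - q * q)
risk_conditioned = p_john_carrier * p_jill_carrier_given_alive * p_child_affected
```

Both versions round to .03, so the correction for the 1% that died changes the answer only in the fourth decimal place.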

This is the table for sexual diploids that reflects this kind of thinking. It is more complicated because now we have to keep track of both the haploid and the diploid condition. So we have these haploid gametes, with frequencies p and q. We have the diploid zygotes. Then another process comes in.

We can have a selective difference–I made a +S here; I made a -S in the last one. I made that change deliberately, just so that you’d see it as arbitrary, because we can make S negative or positive itself. Right? S doesn’t have to be a positive number; neither does H. Anybody have an idea what H might be in there for? It’s in there to represent something that’s going on in genetics. Yes?

Student: Is it heritability?

Professor Stephen Stearns: No it’s not heritability, in this context. Okay? Yes?

Student: Is it the Marsh’s coefficient for being heterozygous? [Inaudible]

Professor Stephen Stearns: Not in this context. Good idea, but no. What is it about that heterozygote that doesn’t necessarily have anything to do with selection? H expresses dominance. It expresses the degree to which A1 is covering up A2 in the phenotype.

Dominance itself is not something that’s always there. If there isn’t any dominance, then the heterozygote is just exactly halfway in the phenotype between the two homozygotes. So H is a little mathematical symbol that allows us to deal with situations in which either there’s a lot of dominance or none at all. If H = 0, there’s no dominance. Okay? No excuse me, the way it’s set up, if H = 0, then A1, A2 is just exactly like A1, A1, and there is dominance. So we have to make H something non-zero, in order to express deviations from dominance, the way this one is set up.

At any rate, the–what’s going on here is essentially the same kind of selection process. There is a selective difference, which is disadvantaging A2. So A2 doesn’t survive as well as A1. When it is in the heterozygous form, it may do better, if there’s some dominance. And that results in a more complicated set of equations.

W here is defined as this big term. We have basically the adults being p², 2pq times 1 plus hs. And A2, A2 has a frequency of q² times 1 plus s, which is the selection coefficient over here. So q is changing the most, and to the degree that A2 can be seen in the heterozygote, it will also be affected by s, but it won’t be affected if there is complete dominance. Okay? So if h is zero, there’s no effect of selection on the heterozygote; this term cancels out. The result of that is that you get these frequencies forming the next generations.

Now there are a couple of ways of setting up this whole derivation, and in the Second Edition of the book, Box 4.1 and Box 4.2 do it a little bit differently. You might want to just step through those things in section. The goal here is not to memorize how to derive the equations, or to memorize the equations. Because, as I’ve said, you can always pick them up in a book, or pull them off the web, and you can find programs that will do it all the time. The goal is to understand what it is that population geneticists are thinking about when they set it up this way, and what power it gives them.

So let me just show you what happens when you program these recursion equations. By the way, they’re called recursion equations because they give us the frequency in the next generation as a function of the frequency in this generation. So they form kind of a Markov chain. They allow us to calculate next time from this time; that’s something computers are really good at.
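A recursion of the kind just described could be programmed as follows. This is a sketch that assumes the relative fitnesses spoken in the lecture (1 for A1A1, 1 + hs for A1A2, 1 + s for A2A2, with W their frequency-weighted sum); the function name `diploid_next` and the test values are illustrative only.

```python
def diploid_next(p, s, h):
    """One generation of selection in a randomly mating diploid population.

    p is the frequency of A1 among gametes. Zygotes form in
    Hardy-Weinberg proportions and then survive with relative
    fitnesses 1 (A1A1), 1 + h*s (A1A2) and 1 + s (A2A2).
    Returns the frequency of A1 among the next generation's gametes.
    """
    q = 1.0 - p
    w11, w12, w22 = 1.0, 1.0 + h * s, 1.0 + s
    # W: mean fitness, the normalizing "big term".
    mean_w = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    # A1 copies come from all A1A1 survivors and half of the A1A2 survivors.
    return (p * p * w11 + p * q * w12) / mean_w


# With h = 0 the heterozygote looks exactly like A1A1 (complete dominance),
# and a negative s disadvantages A2A2 only.
p_after = diploid_next(0.5, -0.1, 0.0)
```

Setting s to zero returns p unchanged, which recovers the Hardy-Weinberg baseline.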

Chapter 5. Rates of Change in Different Genetic Types [00:40:42]

So this is the take-home message of all that analysis: you look at genetic change, in asexual haploids, sexual diploids, and it’s slow at the beginning, fast in the middle, and slow at the end. The haploids change faster than the diploids, and the dominants change faster than the recessives. So let’s step through that and see if you can tell me why this is the case.

First let’s take the asexual haploids, or haploids of any kind. Why is it that haploids change gene frequencies faster, for given selection pressures, than do diploids? Yes?

Student: The entire gene–all the genes are inherited. It’s not all [inaudible] it’s sort of a complete replication of them, the order.

Professor Stephen Stearns: Well that is what a haploid is, but that doesn’t explain why it’s faster. The statement is true, but it’s not an answer to my question. Another try. Yes?

Student: Well all the [inaudible], the bad genes die off. [Inaudible]

Professor Stephen Stearns: Okay, that’s going in the right direction, but I think it can be expressed even more clearly. Yes?

Student: [Inaudible]

Professor Stephen Stearns: That’s interesting. That actually gets into the evolution of sex. I’m actually thinking though about an answer that has more to do with developmental biology and not so much to do with sex, at this point. Um, actually I think that, uh, your answer is partially correct, but it’s more complicated than what I was looking for. [Laughs] Yes?

Student: Is it that all asexuals can reproduce?

Professor Stephen Stearns: No, it’s not that all of the asexuals can reproduce. Many of them die as juveniles. It has to do with haploidy versus diploidy. Yes?

Student: Then if the organism has the allele that’s different, it’s going to best.

Professor Stephen Stearns: Yes.

Student: And that’s when this other comes along.

Professor Stephen Stearns: Every gene is expressed, and there’s no dominance covering up any hidden genetic information. The genes are exposed to selection, in haploids. Yes?

Student: So why is that faster than a dominant zygote, [inaudible]?

Professor Stephen Stearns: Good. We’ll find out as we go through the next questions. Okay? So the haploids are faster than a dominant diploid because–?

Student: [Inaudible]. That’s why it’s a recessive gene.

Professor Stephen Stearns: Basically, yes. The heterozygotes react like the dominant, but contain the recessive. And so if you’re measuring the rate of evolution as the rate at which the dominant takes over the population, it’s carrying along in the heterozygotes a bunch of recessives. Okay? They’re doing just as well as it is. So development, which is covering up the difference between the two, is actually giving the recessives an advantage and slowing down the rate at which the dominant can take over. Okay?

Recessive diploid: I think that you now see why that would be the slowest. If we have an advantageous recessive gene, it gets slowed down by the fact that when it’s in the heterozygote, its effects are being covered up by the other allele. Okay, why is it S-shaped? Why is the trait–let’s do it for a dominant diploid sexual. Okay? Slow at the beginning, fast in the middle, really slow at the end. Let’s concentrate on first why this is really slow at the end, and then we can also look at why a recessive diploid sexual is really slow at the beginning.

What do you have to think about in order to pull the answer out of that diagram? What proportion of the population is in heterozygous form, as you get near the end? If you’re a dominant diploid sexual and you’re at a frequency of .9, 81% of you are going to be dominant homozygotes, 18% of you are going to be heterozygotes, and 1% of you are going to be recessive. There are eighteen times as many heterozygotes as there are recessive homozygotes. Selection, at that point, is trying to eliminate that 1% of recessive homozygotes. It can’t touch the 18%.

If you carry that process over, where we’re dealing with .01 and .99, it gets even more extreme. A tinier and tinier fraction of that population is a recessive homozygote. A larger and larger fraction of the remaining recessive alleles are tied up in heterozygotes, where selection can’t operate. So this thing just slows way down. It gets harder and harder to get rid of the disadvantageous alleles, because a larger and larger proportion of them–not an absolute number but a larger proportion of them–are hidden in the heterozygotes.
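The point about recessive alleles hiding in heterozygotes can be made concrete by counting allele copies under Hardy-Weinberg proportions. This is a minimal sketch; the function names are illustrative.

```python
def genotype_shares(q):
    """Hardy-Weinberg shares of (dominant homozygote, heterozygote,
    recessive homozygote) when the recessive allele has frequency q."""
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q


def hidden_fraction(q):
    """Fraction of all recessive allele copies that sit in heterozygotes,
    where complete dominance shields them from selection."""
    _, het, hom_rec = genotype_shares(q)
    copies_in_het = het            # one recessive copy per heterozygote
    copies_in_hom = 2.0 * hom_rec  # two recessive copies per homozygote
    return copies_in_het / (copies_in_het + copies_in_hom)


shares_at_10_percent = genotype_shares(0.1)   # the 81% / 18% / 1% case
hidden_at_10_percent = hidden_fraction(0.1)
hidden_at_1_percent = hidden_fraction(0.01)
```

The hidden fraction simplifies algebraically to p = 1 - q, so as the recessive allele becomes rarer, the share of its copies that selection cannot see approaches 100%.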

The same thinking describes why evolutionary change in a recessive diploid, where the recessive gene has the advantage, is very slow at the beginning. If a new recessive mutation comes into the population, it’s at a very low frequency. Its frequency is 1 divided by the number of individuals in the population. The only things it can mate with are dominant forms. All of its babies are heterozygotes.

So at the beginning selection can’t operate on it at all. Only after two heterozygotes manage to get together and mate, which means they must have come to fairly high frequency, will they have a baby that is a recessive homozygote that selection will operate on. So it takes a while to get this going. And because of dominance, it takes a long time to build up to the point where it accelerates. But then at the end it’s fast, because at the end the thing that’s being selected is the recessive, and it speeds up as it goes through.
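The ordering of speeds discussed in this chapter (haploid, then dominant diploid, then recessive diploid) can be explored by iterating the recursions until an advantageous allele reaches a given frequency. The sketch below is illustrative only: the parametrization (fitness 1 + s for the favored type, 1 otherwise), the value s = 0.1, the starting frequency 0.01 and the target 0.5 are all assumptions chosen for the example, not values from the lecture.

```python
def next_freq_haploid(p, w_A, w_a):
    """Haploid selection: p is the frequency of allele A."""
    return p * w_A / (p * w_A + (1.0 - p) * w_a)


def next_freq_diploid(p, w_AA, w_Aa, w_aa):
    """Diploid selection after random mating: p is the frequency of allele A."""
    q = 1.0 - p
    mean_w = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def generations_until(step, p0, target, max_generations=100000):
    """Count generations until the frequency first reaches the target."""
    p, t = p0, 0
    while p < target and t < max_generations:
        p = step(p)
        t += 1
    return t


s = 0.1  # hypothetical advantage of the favored allele A
gens_haploid = generations_until(
    lambda p: next_freq_haploid(p, 1.0 + s, 1.0), 0.01, 0.5)
gens_dominant = generations_until(       # A dominant: Aa is as fit as AA
    lambda p: next_freq_diploid(p, 1.0 + s, 1.0 + s, 1.0), 0.01, 0.5)
gens_recessive = generations_until(      # A recessive: only AA benefits
    lambda p: next_freq_diploid(p, 1.0 + s, 1.0, 1.0), 0.01, 0.5)
```

Under these assumptions the haploid needs the fewest generations to reach 50%, the dominant diploid somewhat more, and the recessive diploid far more, because a rare advantageous recessive is almost always carried in heterozygotes.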

Okay, I thought this would happen, uh, it’s time for class to end, and I’m just getting to quantitative genetics, and so I’m going to let you pick up quantitative genetics from the lecture notes and from the reading. I do want to indicate as potential paper topics though that quantitative genetics has got some of the most interesting questions that we encounter in evolutionary biology, and that it includes questions like the heritability of intelligence, the heritability of SAT scores–those are all things where the apparatus you need to analyze the issue is given to you by quantitative genetics.

And there is a good paper on this, and I have put it up on the course website, under Recommended Readings; there’s now a folder called Recommended Readings, PDFs of Recommended Readings. You can find this paper and some other ones in there, if that’s something that strikes your fancy. Go take a look at the title and abstract. So this is the summary of today’s lecture. And the next time we’re going to talk about the origin and the maintenance of genetic variation.


The IICR and the PSMC

In this study we have shown that it is always possible to find a demographic history involving only population size changes that perfectly explains any distribution of coalescence times T2, even when this distribution was actually generated by a model in which there was no population size change. To illustrate this we first focused on a simple n-island model for which the pdf of T2 can be derived, and obtained an analytic formula of the fictitious population size change history, named IICR, as a function of the number of islands and the migration rate of the model. We also showed that the IICR can be computed for any (neutral) model from any observed distribution of T2 values. We showed that the empirical and theoretical IICRs were identical when the latter could be obtained. We then obtained the empirical IICR under models involving changes in migration rates or in deme size. This suggests that, at least for a sample of size 2, even an infinite amount of genetic data from independent loci alone may not allow us to distinguish structure from population size change models. In addition, the history of population size changes in Figure 5 would suggest that four demographic changes occurred, two expansions and two contractions, whereas only three changes of the migration rate were actually simulated.
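As a rough illustration of how an empirical IICR might be obtained from a set of T2 values, the sketch below assumes the IICR at time t is the ratio P(T2 > t) / f(t), where f is the density of T2, with time in coalescent units. That definition, the histogram estimator, the bin edges and the function name `empirical_iicr` are assumptions made for this example, not the authors' implementation. The sanity check uses a constant-size panmictic population, for which T2 is exponential with rate 1 and the ratio should be close to 1 everywhere.

```python
import random


def empirical_iicr(t2_samples, bin_edges):
    """Histogram estimate of P(T2 > t) / f(t) at the midpoint of each bin."""
    n = len(t2_samples)
    estimates = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        midpoint = 0.5 * (lo + hi)
        in_bin = sum(1 for t in t2_samples if lo <= t < hi)
        density = in_bin / (n * (hi - lo))                        # estimate of f(t)
        survival = sum(1 for t in t2_samples if t > midpoint) / n  # P(T2 > t)
        estimates.append(survival / density if density > 0 else float("nan"))
    return estimates


random.seed(1)
# Constant-size reference case: T2 ~ Exponential(1) in scaled time.
samples = [random.expovariate(1.0) for _ in range(200000)]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
iicr = empirical_iicr(samples, edges)
```

Feeding the same function T2 values simulated under a structured model would return a time-varying curve, even though no population size ever changed.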

The theory presented here is simple and general. It allows us to predict the IICR and state that any method ignoring population structure will try to estimate the IICR. In the case of complex demographic histories with population structure, interpreting the IICR as a population size or a ratio of population sizes can be misleading. To clarify the difference between the IICR and an effective population size we can consider the following rationale. If a structured population could be summarized by a single Ne then a change in gene flow should be matched by a simultaneous change in Ne. In that case, changes in Ne would be misleading (as the size would not change) but their timing might still be meaningful. For instance a ‘hump’ inferred using diCal or the PSMC could be easily translated into a change in gene flow patterns. In such a case, we could reinterpret the changes in Ne by saying, for each hump, that gene flow decreased and then increased again. What the IICR shows is that it is not that simple. The fact that a structured model can only be summarized by a trajectory of spurious population sizes means that the timing of changes in migration rates will interact in a complex manner, hence generating IICR profiles that may be only loosely related with population-related events. This can be seen in Figures 5 and 6 (and the Supplementary Figures S1–S4).

Human history with changes in migration rates. This figure shows, in red, the history of population size changes inferred by Li and Durbin (2011) from the complete diploid genome sequences of a Chinese male (YH) (Wang et al., 2008). The 10 green curves correspond to the IICR of 10 independent replicates of the same demographic history involving three changes in migration rates. The x axis represents time in years in a log scale, whereas the y axis represents real or inferred population size in units of diploid genomes. The times at which these changes occur are represented by the vertical arrows at 2.52 Myr ago, 0.95 Myr ago and 0.24 Myr ago. The blue shaded areas correspond to (1) the beginning of the Pleistocene (Pleist.) at 2.57–2.60 Myr ago, (2) the beginning of the Middle Pleistocene (Mid. Pleist.) at 0.77–0.79 Myr ago and (3) the oldest known fossils of anatomically modern humans (AMH) at 195–198 kyr ago. Following Li and Durbin (2011), we assumed that the mutation rate was μ=2.5 × 10⁻⁸ and that generation time was 25 years. We also kept their ratio between mutation and recombination rates. Each deme had a size of 530 diploids and the total number of haploid genomes was thus constant and equal to 10 600. A full color version of this figure is available at the Heredity journal online.

These results do not invalidate the use of panmictic models for the reconstruction of population history as long as population structure can indeed be neglected (Figure 3 and Supplementary Figure S3), but they certainly stress the need for caution in the interpretation of this history. When Li and Durbin published their landmark study in 2011 (Li and Durbin, 2011), they showed for the first time that it was possible to reconstruct the demographic history of a population by using the genome of a single diploid individual. It was a remarkable feat based on the SMC model introduced by McVean and Cardin (2005). Its application to various species (Groenen et al., 2012; Prado-Martinez et al., 2013; Zhao et al., 2013; Zhan et al., 2013; Green et al., 2014; Hung et al., 2014; Zhou et al., 2014) has been revolutionary and led to the development of new methods (Sheehan et al., 2013; Schiffels and Durbin, 2013; Liu and Fu, 2015). However, the increasing number of studies pointing at the effect of population structure (Leblois et al., 2006; Nielsen and Beaumont, 2009; Chikhi et al., 2010; Heller et al., 2013; Paz-Vinas et al., 2013) or changes in population structure (Wakeley, 1999, 2001; Wakeley and Aliacar, 2001; Städler et al., 2009; Broquet et al., 2010; Heller et al., 2013; Paz-Vinas et al., 2013) in generating spurious changes in inferred population size suggested that new models should be analyzed that can incorporate population structure (Goldstein and Chikhi, 2002; Harding and McVean, 2004). For instance, Mazet et al. (2015) have recently shown that genomic data from a single diploid individual can be used to distinguish an n-island model from a model with a single population size change. Their likelihood-based approach uses the distribution of coalescence times for a sample of size two (T2). This study represents an interesting alternative as it should be possible to determine whether a model of population structure is more likely than a model of population size change to explain a particular data set. The approach of Mazet et al. (2015) is however limited to a very simple model of population size change. Demographic models inferred by several recent methods (Li and Durbin, 2011; Schiffels and Durbin, 2013; Sheehan et al., 2013; Liu and Fu, 2015) are not limited to one population size change. They are thus more realistic and, as we have shown here, this comes at a certain price. As they allow for several tens of population size changes, they mimic more precisely the genomic patterns arising from structured models. Therefore, they reconstruct a demographic history that can optimally explain any particular pattern of genomic variation only in terms of population size changes. As we have shown here, and until we can separate models (see below), this casts doubts on any history reconstructed from genomic data by the above-mentioned approaches. Indeed, if any pattern of (neutral) genomic variation can be interpreted efficiently in terms of population size changes, then how can we identify the cases where the observed genomic data were not generated by population size changes?

Li and Durbin (2011) acknowledged that one should be cautious when interpreting the changes inferred by their method. For instance, they showed (see their Supplementary Materials, Figure S5) that when one population of constant size N splits in two half-sized populations that later merge again, their method will identify a change of N even though N actually never changed. Still, their method is implicitly or explicitly used and interpreted in terms of population size changes, including by themselves. There are therefore several issues that need to be addressed. One issue is to determine whether it is possible to separate models of population size change from models of population structure (Mazet et al., 2015, and see perspectives below). When population structure can be ignored, our results actually contribute to the validation of the PSMC (Figure 3 and Supplementary Figure S3). We found that the PSMC performed impressively well and generally reconstructed the IICR with great precision. It is therefore at this stage one of the best methods (Sheehan et al., 2013; Schiffels and Durbin, 2013; Liu and Fu, 2015) published so far and remains a landmark in population genetics inference.

The IICR: toward a critical interpretation of effective population sizes

The concept of effective size is central to population genetics. It allows population geneticists to replace complex real-world populations by equivalent and simpler Wright–Fisher populations that would have the same ‘rate of genetic drift’ (Wakeley and Sargsyan, 2009). The concept is however far from trivial and it is not always clear what authors mean when they mention the Ne of a particular species or population, as rightly noted by Sjödin et al. (2005) among others. Several Nes have been defined depending on the property of interest (inbreeding, variance in allele frequency over time and so on) and its relationship to genetic drift (Wakeley and Sargsyan, 2009). This is a complex issue that we do not aim at reviewing or discussing in detail here.

The IICR is related to the coalescent Ne (Sjödin et al., 2005; Wakeley and Sargsyan, 2009) but it is explicitly variable with time. Given that most species are likely to be spatially structured, interpreting the IICR as a simple (coalescent) effective size may generate serious misinterpretations.

The IICR is a trajectory of instantaneous ‘population sizes’ that fully explains complex models without loss of information. The circumstances under which this trajectory can indeed be appropriately summarized by one effective population size are still to be determined and will depend on the questions asked and the amount of markers used. For instance, for ‘strong migration scenarios’ (M=500 and M=100) the inferred population size changes are recent and abrupt, and the period during which the population was stationary will be significant in generating patterns of genetic diversity (Wakeley, 1999, 2001; Wakeley and Aliacar, 2001; Charlesworth et al., 2003; Wakeley and Sargsyan, 2009). However, even for such cases of low genetic differentiation (FST≈1/2001=0.0005 and FST≈1/401=0.0025, respectively), the spurious population size drop could perhaps be detected with genomic information. For M=100 the population size decrease starts between t=0.05 and t=0.10, which for N=100 to N=1000 could correspond to values between 5 and 100 generations ago, respectively. In other words, an n-island model may actually behave differently from a Wright–Fisher model even under some ‘strong migration’ conditions. The approximation will therefore be valid for some questions and data sets, and invalid for others (Charlesworth et al., 2003; Wakeley and Sargsyan, 2009). Note also that for very low migration rates (M=0.1, M=0.2, corresponding to very high FST≈0.71 and FST≈0.56, respectively) the recent history is also characterized by a stationary IICR. Most genes will then coalesce within demes and only a small proportion will provide information on the ancient IICR values and therefore on population structure (see Mazet et al., 2015).
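The FST values quoted above are consistent with the classical island-model approximation FST ≈ 1/(1 + 4M), and the generation counts with multiplying scaled time t by N. Both readings are inferred from the quoted numbers (1/2001 for M=500, 5 generations for t=0.05 and N=100) rather than stated explicitly in the text, so the sketch below should be taken as an assumption-laden check of the arithmetic; the function names are illustrative.

```python
def fst_island(M):
    """Approximate FST for an island model with scaled migration rate M,
    assuming the classical relation FST = 1 / (1 + 4M)."""
    return 1.0 / (1.0 + 4.0 * M)


def scaled_time_to_generations(t, N):
    """Convert scaled time to generations, assuming time is measured
    in units of N generations as the quoted numbers imply."""
    return t * N


fst_values = {M: fst_island(M) for M in (500, 100, 0.2, 0.1)}
most_recent = scaled_time_to_generations(0.05, 100)    # lower bound quoted in the text
most_ancient = scaled_time_to_generations(0.10, 1000)  # upper bound quoted in the text
```

Rounded to the precision used in the text, these reproduce the quoted values for strong migration (M=500, M=100) and weak migration (M=0.1, M=0.2), as well as the range of 5 to 100 generations.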

The IICR and the complex history of species: toward a critical reevaluation of population genetics inference

The PSMC has now been applied to many species, generating curves that are very similar to those represented in Figure 5. In Figure 5a, the population size changes detected by the PSMC were not correlated in a simple manner to the changes in gene flow or deme size. This is likely the result of two factors. First, a structured population cannot always be summarized by a single number. Second, the PSMC requires a discretized distribution of time that may lead to missing abrupt changes such as those simulated here. For real data sets where changes in migration rates or in population size may be smoother, this may not be so problematic. For the human data, assuming a simple model of population structure, we inferred periods of change in gene flow that correspond to major transitions in the recent human evolutionary history, including the emergence of anatomically modern humans. Given that humans are likely to have been subjected to a complex history of spatial expansions and contractions and changes in the levels of gene flow (Wakeley, 1999, 2001; Harpending and Rogers, 2000; Goldstein and Chikhi, 2002; Harding and McVean, 2004), our results are necessarily simplistic but suggest that a reinterpretation of panmictic models may be needed and possible. Our results are at odds with a history of population crashes and increases depicted in various population genetic studies, but they are in phase with fossil data and provide a more realistic interpretation framework. We thus wish to call for a critical reappraisal of what can be inferred from genetic or genomic data. The histories inferred by methods ignoring structure represent a first approximation but they are unlikely to provide us with the information we need to better understand the recent evolutionary history of humans or other species. It is difficult to imagine that humans have been one single panmictic population whose size has changed over the last few million years (that is, since the appearance of the Homo genus). This does not minimize the achievement of the Li and Durbin (2011) study, but it does question how inferences from genetic data are sometimes presented and interpreted.


We focused throughout this study on T2, the time to the most recent common ancestor for a sample of size two. For larger samples we can define Tk as the time during which there are k lineages. It would be important to determine whether, for structured models, the IICR estimated from the distribution of Tk varies significantly with k. If that were the case, that would suggest that it is possible to separate structure from population size change with the distributions of Tk for various k values. The reason for this is that population size change models should generate identical IICR for all Tk distributions, as they should all correspond to the same (real) history of population size change. To our knowledge the distribution of Tk for k>2 has not yet been explicitly derived for the n-island or other structured models (but see interesting studies such as Herbots, 1994; Wakeley and Aliacar, 2001; Wakeley, 2001; Nielsen and Wakeley, 2001).

One simple solution to this question is to simulate genetic data under a structured model of interest and then compare the simulated Tk distributions under that model and the Tk distributions of the corresponding model of population size change identified using the T2 distribution. Preliminary simulations suggest that the Tk distributions produce different IICRs, at least for some models of population structure. For instance, we predict that the analysis of human genomic data with the PSMC and MSMC (multiple sequentially Markovian coalescent) should produce different curves under a model of population structure but identical ones for a model of population size change. This prediction can be tested by comparing the PSMC and MSMC curves of Li and Durbin (2011) and Schiffels and Durbin (2013), respectively. Visual inspection of the corresponding figures suggests indeed that they are different, and therefore that our model of population structure is a valid alternative. However, we stress that an independent study is required. Indeed, the history reconstructed by these methods with real data is not very precise and the two curves are not easily comparable because they are expected to provide poor estimates at different moments. Any difference between the two analyses should thus be evaluated and validated with simulations.

Finally, one underlying assumption of our study is that the coalescent represents a reasonable model for the genealogy of the genes sampled. Given that the coalescent is an approximation of the true gene genealogy, and that there are species for which the coalescent may not be the most appropriate model (Wakeley and Sargsyan, 2009), we should insist that our results can, at this stage, only be considered for coalescent-like genealogies. The development of similar approaches for other genealogical models would definitely be a very interesting avenue of research.


Throughout we are concerned with neutral genetic diversity at a single nonrecombining locus in a haploid population. As usual, N is the population size. The results should hold for a diploid population with gametic migration if we replace N with 2N. The population model we consider is a modification of the well-known Moran model of reproduction (Moran 1958, 1962). In the Moran model, a single randomly chosen individual reproduces each time step. To keep the population size constant a randomly chosen individual, but not the offspring, dies to make room for the offspring.

In our model, which was first presented in Eldon and Wakeley (2006), a single randomly chosen individual (the parent) reproduces each time step. With probability 1 – ɛ the parent has one offspring. Alternatively, with probability ɛ the parent has ψN – 1 offspring (a large reproduction event) with 0 < ψ < 1. To keep population size constant when a large reproduction event occurs, a total of ψN – 1 individuals die to make room for the new offspring. In our model the parent always persists. The parameter ψ represents the fraction of the population that is replaced by the offspring of the parent. Eldon and Wakeley (2006) show that this modified Moran model of overlapping generations gives rise to a coalescent process that allows for asynchronous multiple mergers of ancestral lines, i.e., is of the same type as the ancestral process considered by Pitman (1999) and Sagitov (1999).
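The reproduction scheme just described can be sketched as a forward-in-time simulation of a single time step. This is only an illustration under stated assumptions: the function name, the representation of the population as a list of lineage labels, and the rounding of ψN to an integer are choices made here, not part of the original model description.

```python
import random


def modified_moran_step(population, epsilon, psi, rng=random):
    """Advance a haploid population by one time step (illustrative sketch).

    population : list of lineage labels, one per individual (length N)
    epsilon    : probability of a large reproduction event
    psi        : fraction of the population made up of the parent's
                 family after a large event (0 < psi < 1)
    """
    n = len(population)
    parent = rng.randrange(n)  # a single randomly chosen parent

    if rng.random() < epsilon:
        # Large event: psi*N - 1 offspring (psi*N rounded: an assumption).
        n_offspring = max(1, int(round(psi * n)) - 1)
    else:
        n_offspring = 1  # ordinary Moran-type step

    # The parent always persists, so only other individuals can die.
    others = [i for i in range(n) if i != parent]
    replaced = rng.sample(others, n_offspring)

    new_population = list(population)
    for i in replaced:
        new_population[i] = population[parent]  # offspring inherit the label
    return new_population


if __name__ == "__main__":
    rng = random.Random(1)
    pop = list(range(10))
    print(modified_moran_step(pop, epsilon=1.0, psi=0.5, rng=rng))
```

With `epsilon=1.0` and `psi=0.5` in a population of 10, the parent's label ends up in five individuals (the parent plus four offspring), while the population size stays constant.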

For ease of presentation, we define the following quantities: Nγ, cN, λγ, and IA. The quantity Nγ is the coalescence timescale in our model. The coalescence timescale is proportional to the number of time steps, on average, it takes for two individuals to coalesce (in a single population). It depends on the value of ɛ that we assume has the form ɛ ≡ 2φ/N^γ for some constants φ and γ with 0 < φ, γ < ∞. In our model, the coalescence timescale is N^γ/2 time steps when 0 < γ < 2. In the usual Moran model, the timescale is N^2/2 time steps, which is also the value of Nγ when γ ≥ 2.

For a single population, Eldon and Wakeley (2006) show that different coalescent processes result depending on γ. Multiple mergers of ancestral lines are allowed in the coalescent process when 0 < γ ≤ 2, while Kingman's coalescent (Kingman 1982a,b) results when γ > 2. The probability that two individuals do coalesce in a single time step is denoted by cN and depends on ɛ. The rate λγ of coalescence of two individuals is obtained from cN by “speeding up” time by a factor of Nγ. When 0 < γ ≤ 2, λγ depends on the reproduction parameters φ and ψ. In mathematical notation, Nγ is expressed as

$$N_\gamma = \frac{\min(N^\gamma, N^2)}{2},$$

and the coalescence probability cN is

$$c_N = \frac{2(1-\varepsilon) + \varepsilon\,\psi N(\psi N - 1)}{N(N-1)}.$$

For notational convenience, we also define the indicator function IA as

$$I_A = \begin{cases} 1 & \text{if } A \text{ holds,} \\ 0 & \text{otherwise.} \end{cases}$$

For example, $I_{\gamma<2}$ = 1 if γ < 2, and zero otherwise. In our model a large reproduction event occurs when the number of offspring of the parent equals ψN – 1. These events occur with probability ɛ. Our choice of ɛ = 2φ/N^γ results in the coalescence timescale being Nγ. The rate λγ of coalescence is then

$$\lambda_\gamma = I_{\gamma \ge 2} + \varphi\psi^2 I_{\gamma \le 2}.$$

The coalescence rate λγ is a key quantity in nearly all of our results below.
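To see how λγ emerges from the discrete model, the sketch below multiplies a pairwise coalescence probability by the timescale Nγ for a large N. The per-step probability used here is an assumption derived from the model description (two sampled individuals coalesce if both belong to the parent-plus-offspring group of that step); it is not quoted from the source, and the function names are illustrative.

```python
def coalescence_timescale(N, gamma):
    """N_gamma: N^gamma / 2 when gamma < 2, N^2 / 2 when gamma >= 2."""
    return min(N ** gamma, N ** 2) / 2.0


def pair_coalescence_prob(N, psi, phi, gamma):
    """Per-time-step coalescence probability c_N for two individuals.

    ASSUMPTION: the pair coalesces when both individuals are in the
    parent-plus-offspring group, which has size 2 in an ordinary step
    and size psi*N in a large reproduction event.
    """
    eps = 2.0 * phi / N ** gamma           # probability of a large event
    pairs = N * (N - 1)
    ordinary = 2.0 / pairs                  # group of size 2
    large = psi * N * (psi * N - 1) / pairs  # group of size psi*N
    return (1.0 - eps) * ordinary + eps * large


def scaled_coalescence_rate(N, psi, phi, gamma):
    """Finite-N approximation of lambda_gamma = N_gamma * c_N."""
    return coalescence_timescale(N, gamma) * pair_coalescence_prob(
        N, psi, phi, gamma)


if __name__ == "__main__":
    N, psi, phi = 10 ** 6, 0.5, 1.0
    for gamma in (1.5, 2.0, 3.0):
        rate = scaled_coalescence_rate(N, psi, phi, gamma)
        print(f"gamma = {gamma}: lambda_gamma ~ {rate:.4f}")
```

For large N the scaled rate approaches φψ² when γ < 2 and 1 when γ > 2, in line with the values of λγ quoted in the text (ψ² with φ = 1, and 1 for the standard coalescent).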

Model of subdivision:

We now consider the finite island model of population subdivision with the simplifying assumption that migration does not change the sizes of the subpopulations (Nagylaki 1980; Strobeck 1987; Herbots 1997). Reproduction in all the subpopulations follows the modified Moran model described above. The discrete-time ancestral process for a sample of size 2 is a Markov chain with transition probabilities given in Equation A1 in the appendix.

We are concerned with small migration rates, specifically those on the order of 1/Nγ time steps. This means that a single individual resides in the same subpopulation for 2Nγ time steps, on average, before migrating to a different subpopulation. When 0 < γ < 2, each individual resides in the same subpopulation for only N^γ time steps, on average. This time can be much shorter (when 0 < γ < 1) than the usual average of N time steps assumed in Wright–Fisher populations. In other words, a large number of individuals migrate during N time steps when 0 < γ < 1. We let m denote the probability that a single individual resided in a different subpopulation in the previous time step and model m as m = mγ ≡ κ/(2Nγ) in which κ is a finite constant (0 < κ < ∞).

To illustrate the difference between our migration rate κ and the usual migration rate Nm, let M* ≡ N^2 mγ denote a migration rate scaled in units of N^2 time steps (or N generations). This corresponds to the usual “Nm” in the Wright–Fisher model. Substituting for mγ gives M* = κN^2/(2Nγ), which equals κN^(2–γ) when 0 < γ < 2. If, for example, γ = 3/2, then M* = κN^(1/2). When γ < 2 the migration rate M* is very high, i.e., M* → ∞ as N → ∞, since κ is finite. However, in our modified model of reproduction coalescence also occurs on the timescale of N^(3/2) time steps (or N^(1/2) generations when γ = 3/2) and thus “counteracts” the effects of high migration rate.
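The scaling argument can be checked numerically. The sketch below simply combines the definitions given in the text, mγ = κ/(2Nγ) and M* = N^2 mγ, with Nγ taken as N^γ/2 for γ < 2 and N^2/2 otherwise; the function name and the example values are illustrative.

```python
def scaled_migration_rate(N, kappa, gamma):
    """M* = N^2 * m_gamma, with m_gamma = kappa / (2 * N_gamma)."""
    n_gamma = min(N ** gamma, N ** 2) / 2.0  # coalescence timescale
    m_gamma = kappa / (2.0 * n_gamma)        # per-time-step migration probability
    return N ** 2 * m_gamma


if __name__ == "__main__":
    kappa = 1.0
    for N in (10 ** 2, 10 ** 4, 10 ** 6):
        skewed = scaled_migration_rate(N, kappa, gamma=1.5)
        standard = scaled_migration_rate(N, kappa, gamma=3.0)
        print(f"N = {N:>8}: M* = {skewed:10.1f} (gamma = 3/2), "
              f"{standard:.1f} (gamma > 2)")
```

With γ = 3/2 the scaled rate grows like κ√N, whereas for γ > 2 it stays at κ regardless of N.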

The main results of this work concern expected coalescence times (Equations 3 and 5) and FST-like measures (Equations 10–12). We also derive the densities of the coalescence times (see the appendix). The densities are used to derive distribution functions for the number of segregating sites between two sequences (see the appendix), which in turn yield expressions for FST-like measures including mutation (Equations 13 and 14).

The distributions of the coalescence times are functions of λγ:

DNA sequences differ because they have accumulated mutations from the time of their most recent common ancestor until they are sampled. By assuming a very low mutation rate, Slatkin (1991) derived an expression for FST in terms of expected values of coalescence times. The time until two genes coalesce is therefore a fundamental quantity in theoretical work on structured populations. Given two genes sampled from a structured population, two different coalescence times arise that are of interest: the time T0 until two genes sampled from the same subpopulation coalesce and the time T1 until two genes sampled from different subpopulations reach a common ancestor. The densities of T0 and T1 were previously derived under the structured coalescent by Takahata (1988) and Nath and Griffiths (1993) in the case of two subpopulations and by Herbots (1997) for any finite number of subpopulations.

Given the transition rates in Equation A2, we can obtain the distributions of the coalescence times T0 and T1 (see the appendix). Figure 1 shows the distributions of T0 and T1, respectively, as functions of time for different values of ψ (the fraction of the population replaced by the offspring of a single individual). As ψ increases (i.e., tends to 1), the coalescence times T0 and T1 become very short.

The densities of the times to coalescence for two genes sampled from the same (T0), or different (T1), subpopulations as functions of time for different values of ψ when the number of subpopulations D = 3 and φ = κ = 1. The coalescence timescale is N^2/2 in a and c and N^γ/2 with 0 < γ < 2 in b and d. The solid lines in a and c are the densities obtained under the standard coalescent (i.e., γ > 2).

The expected value and variance of T0 are both less than the corresponding quantities for T1. Specifically,
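The ordering E(T0) < E(T1) can be illustrated with a first-step analysis of a two-state chain (both lines in the same subpopulation, or in different ones). The sketch assumes, in units of Nγ time steps, that lines in the same subpopulation coalesce at rate λγ and separate at rate κ, and that separated lines re-enter the same subpopulation at rate κ/(D – 1). The last rate is the one the text attributes to Equation A2; the separation rate is an assumption made here, and the function name is illustrative.

```python
def expected_coalescence_times(lam, kappa, D):
    """Return (E[T0], E[T1]) for the assumed two-state chain.

    lam   : coalescence rate for two lines in the same subpopulation
    kappa : scaled migration parameter
    D     : number of subpopulations (D >= 2)
    """
    leave = kappa            # ASSUMED rate of moving to different demes
    join = kappa / (D - 1)   # rate of re-entering the same deme

    # First-step equations:
    #   (lam + leave) * E0 - leave * E1 = 1
    #          -join  * E0 +  join * E1 = 1
    a11, a12, b1 = lam + leave, -leave, 1.0
    a21, a22, b2 = -join, join, 1.0

    # Solve the 2x2 linear system with Cramer's rule.
    det = a11 * a22 - a12 * a21
    e0 = (b1 * a22 - a12 * b2) / det
    e1 = (a11 * b2 - a21 * b1) / det
    return e0, e1


if __name__ == "__main__":
    for lam in (1.0, 0.25):  # 0.25 corresponds to psi = 0.5 with phi = 1
        e0, e1 = expected_coalescence_times(lam, kappa=1.0, D=3)
        print(f"lambda = {lam}: E[T0] = {e0:.2f}, E[T1] = {e1:.2f}")
```

Under these assumptions the solution is E(T0) = D/λγ and E(T1) = D/λγ + (D – 1)/κ, so E(T0) is inversely proportional to λγ and the gap between the two expectations depends only on D and κ.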

The significance of the result in Equation 3 is best understood by an example. When γ < 2, say γ = 3/2, then the timescale is Nγ = N^(3/2), and λγ = ψ^2 (assuming φ = 1). Our migration parameter is then κ = mγNγ = mγN^(3/2). Migration is scaled in units of N^2 time steps in a standard Moran population. If we let M* ≡ N^2 mγ be a scaled migration rate in units of N^2 time steps, then if mγ is of order 1/N^(3/2) as above, M* becomes very high in a large population. Specifically, since γ = 3/2, we have M* = κN^(1/2) → ∞ (as N → ∞), since κ is a constant. The result in Equation 3 says that even when M* → ∞ one will still see evidence of population structure in DNA sequence data, since coalescence occurs on a timescale of N^(3/2) time steps (in a large population) when γ = 3/2. In fact, M* → ∞ as N → ∞ whenever 0 < γ < 2.

Similarly, Var(T0) is always less than Var(T1). In addition, the expected value and variance of T0 are inversely proportional to λγ and thus will be small when the probability of large reproduction events is close to one. The expressions for E(T0) and E(T1) (Equation 3) obtained under the usual reproduction models (Nei and Feldman 1972; Li 1976; Griffiths 1981) can be recovered by assuming that large reproduction events occur on a longer timescale (γ > 2) than usual (e.g., Wright–Fisher) sampling, in which case λγ = 1. The variances of T0 and T1 were first derived by Hey (1991) under the structured coalescent and can be recovered in the same way from Equation 4.

A many-demes limit:

The structured coalescent simplifies under certain migration mechanisms when the number of subpopulations is taken to be much greater than the sample size of DNA sequences (Wakeley 1998). The convergence of the ancestral process under a many-demes limit (i.e., when D → ∞) follows from the work of Möhle (1998), which shows how events in a stochastic process that occur on different timescales can be separated (see the appendix for a more detailed description). We consider the ancestral process in the limit N → ∞ and D → ∞. Switching the order of the limits leads to the same coalescent process (see the appendix).

The limit process of two genes sampled from a population subdivided into very many subpopulations (D → ∞), each of which is very large (N → ∞), is of the form P*e^(tG*) in which P* and G* are given by Equations A16 and A19, respectively. The form of P* tells us that the ancestral process immediately enters the continuous-time process if the two genes are sampled from two different demes. If the two genes are sampled from the same subpopulation, they coalesce with probability λγ/(λγ + κ) or enter the continuous-time process by moving to different subpopulations with probability κ/(λγ + κ). In the continuous-time process the two lines wait with exponential time with rate λγκ/(λγ + κ) on a timescale of DNγ time steps until they coalesce. The ancestral process under the many-demes limit model (Equation A19) differs from the limit process obtained when the number of subpopulations is finite (Equation A2), in that G* has a zero entry for the transition where the two alleles enter the same subpopulation, after having been separated. When D < ∞, the corresponding rate is κ/(D – 1) (Equation A2). Ancestral lines can coalesce, however, only if they reside in the same subpopulation. The matrix B* (Equation A18) ensures that the two lines do arrive in the same subpopulation.

Again we are interested in the coalescence times T0 and T1 of two genes sampled from the same, or different, subpopulations, respectively. The distribution of T0 is a mixture distribution (see the appendix), and we obtain

The expressions for the expected value and variance of T0 and T1 obtained under the many-demes limit model (Equations 5 and 6) are functions of λγ and κ in the same way as the corresponding expected values and variances (Equations 3 and 4) obtained for a finite number of subpopulations. In particular, we always expect a shorter coalescence time for two ancestral lines sampled from the same subpopulation than if they were sampled from different subpopulations.

Deriving FST and NST:

The quantity FST is commonly used to assess levels of population subdivision. The inbreeding coefficient of an individual relative to a collection of subpopulations, FIT, can be attributed to nonrandom mating within a subpopulation (FIS) and to differences among subpopulations (FST; Wright 1951). Two sequences are identical by descent if they have not experienced mutation from the time of their most recent common ancestral sequence until they are sampled. If we let f0 and f denote the probability of identity by descent of two genes sampled from the same subpopulation (f0) and at random from the collection of subpopulations (f), we can express FST as

$$F_{ST} = \frac{f_0 - f}{1 - f}$$

(Nei 1973). By the definition of FST in terms of inbreeding coefficients (as in Equation 7), FST depends on the mutation rate (μ). By forcing μ to be very low, Slatkin (1991) derived an approximation of FST that is a function of the expectations of coalescence times and is given by

$$\lim_{\mu \to 0} F_{ST} = \frac{E(T) - E(T_0)}{E(T)},$$

in which T is the coalescence time of two lines randomly sampled from the collection of subpopulations, T0 is the time to coalescence of two lines from the same subpopulation, and μ is the mutation rate.
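Slatkin's low-mutation approximation expresses FST as the relative excess of the overall expected coalescence time over the within-subpopulation one, (E(T) – E(T0))/E(T). A minimal sketch follows, assuming that a randomly sampled pair comes from the same subpopulation with probability 1/D (equal-sized subpopulations); that sampling weight, the function names and the numerical inputs are assumptions for illustration, not values from the text.

```python
def mean_time_random_pair(expected_t0, expected_t1, D):
    """E[T] for two lines drawn at random from D equal-sized subpopulations.

    ASSUMPTION: the pair comes from the same subpopulation with
    probability 1/D and from different ones with probability 1 - 1/D.
    """
    return expected_t0 / D + (1.0 - 1.0 / D) * expected_t1


def slatkin_fst(expected_t, expected_t0):
    """Low-mutation approximation FST ~ (E[T] - E[T0]) / E[T]."""
    if expected_t <= 0:
        raise ValueError("expected coalescence time must be positive")
    return (expected_t - expected_t0) / expected_t


if __name__ == "__main__":
    # Hypothetical expected coalescence times (arbitrary time units).
    e_t0, e_t1, D = 3.0, 5.0, 3
    e_t = mean_time_random_pair(e_t0, e_t1, D)
    print(f"E[T] = {e_t:.4f}, FST ~ {slatkin_fst(e_t, e_t0):.4f}")
```

Because only ratios of expected times enter, the time unit cancels; anything that shortens E(T0) relative to E(T), such as a higher within-subpopulation coalescence rate, raises the approximate FST.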

To obtain an expression of FST in terms of coalescence times under skewed offspring distribution, we can proceed by first obtaining the expected coalescence time E(T) of two genes randomly sampled from the collection of subpopulations, which is readily obtained from Equations 3 and A10 and is given by

When the number of subpopulations D is finite, the general form of FST is

For example, when 0 < γ < 2, the rate of coalescence is λγ = ψ^2 (with φ = 1) and Equation 10 gives . The expression for FST in Equation 10 has the same form as the one derived by Slatkin (1991) under the standard coalescent. The key difference is that, under skewed offspring distribution, FST is a function of the rate λγ (Equation 2) of coalescence and thus a function of the reproduction parameters φ and ψ. The result that Slatkin (1991) obtained can be recovered from Equation 10 by taking γ > 2, in which case λγ = 1 (recall that the probability of large reproduction events ∝ 1/N^γ).

When the number of subpopulations D → ∞, we obtain from Equation 10

In Equation 11 we have taken two limits: N → ∞ and D → ∞. Switching the order of the limits gives the same limit result for FST in Equation 11.

Following Wright (1951), the value of FST has often been used to estimate levels of gene flow. Figure 2 shows the estimate of the migration rate κ, obtained from Equation 11, as a function of ψ for different values of FST (Figure 2a) and φ (Figure 2b) and for two different values of λγ. Since FST is a function of ψ and φ, so is any estimate of gene flow based on FST, as Figure 2 clearly shows.

The estimate of migration rate from Equation 11 as a function of ψ. (a) when FST = 0.1 (solid line), FST = 0.2 (dashed line), and FST = 0.5 (dotted line). (b) with FST = 0.1 and φ = 1 (solid line), φ = 2 (dashed line), and φ = 5 (dotted line).

Lynch and Crease (1990) used the number of pairwise sequence differences of DNA sequences to estimate levels of genetic heterogeneity. In that context, Lynch and Crease (1990) introduced the quantity NST that has the form in which and are the average number of pairwise differences between sequences sampled from different, or the same, subpopulations, respectively. If mutation rate is constant and mutations occur according to the infinite-sites model (Watterson 1975), then NST estimates (Slatkin 1993). Using the results obtained for expected coalescence times (Equation 3), we obtain as in Equation 11 for the many-demes limit model of population subdivision and

when D < ∞. The effect of skewed offspring distribution is the same on NST as it is on FST. Under the infinite-sites mutation model we do not need an assumption of small mutation rate to obtain an expression of NST in terms of coalescence times, unlike the case for FST. As NST is defined, the mutation parameter cancels out (Slatkin 1993).

Number of segregating sites between pairs of sequences:

By the definition of FST in terms of probabilities of identity by descent (Equation 7), FST depends on mutation. Eldon and Wakeley (2006) show that the limit process (as N → ∞) of our model of skewed offspring distribution predicts nonzero levels of genetic variation only when γ > 1. If we (as in Eldon and Wakeley 2006) let μ denote the probability of mutation for each offspring in a single time step, we define the mutation rate θ as (and γ > 1). We can include mutation in an expression for FST by first obtaining the probability distributions of the number of segregating sites, under the infinite-sites model (Watterson 1975), between two genes given a model of population subdivision with migration. Let K0 denote the number of segregating sites between two genes sampled from the same subpopulation and K denote the number of segregating sites between two genes sampled randomly from the collection of subpopulations. The distributions of K0 and K are derived in the appendix, along with the distribution of the number of segregating sites K1 between two genes sampled from different subpopulations. Then by the definition of FST given in Equation 7 we obtain

From Equation 14 we conclude that mutation can affect FST only if θ is large relative to λγ. The expression for FST in Equation 14 has the same form as the one derived by Wilkinson-Herbots (1998) and by Nei (1975) and Takahata (1983) by other methods, under the Wright–Fisher model, including mutation. In Figure 3, FST from Equation 14 is graphed as a function of ψ for different values of θ and κ. The interpretation of Figure 3 is that FST, as a function of ψ, can vary considerably when the timescale of coalescence (and migration) is in units of N^γ/2 generations with 1 < γ < 2 (Figure 3, b and d).

The quantity FST from Equation 14 as a function of ψ (with φ = 1) for different values of θ, κ, and rate of coalescence (λγ). Solid lines, θ = 10; dashed lines, θ = 1; dotted lines, θ = 0.1.

Nei's genetic distance d:

Not all indicators of separation between populations depend on λγ. Nei's (1972) genetic distance is more appropriate for estimating divergence time between species, and FST-like quantities are more suitable for inferring population structure within species (Slatkin 1991). Nei's (1972) genetic distance measure is given by in which f0 and f1 are the probabilities of identity by descent of two genes sampled from the same or different subpopulations, respectively, and we add the subscript N to remind us that time is discrete. If we now assume that 0 < μE(ti) < 1 for i = 0, 1, then using the Maclaurin series expansion of the logarithmic function we obtain (previously obtained by Slatkin 1991) in which t0 and t1 are the coalescence times for two genes sampled from the same, or different, subpopulations, respectively. To obtain an expression of d for continuous time, we assume that the product converges to a constant θ as N → ∞ (and γ > 1). Rewriting the approximation for dN gives

which has the continuous-time limit

However, using the expressions for E(T1) and E(T0) (Equation 3), we obtain and so Nei's (1972) genetic distance is independent of λγ. Another way of deriving an expression for d is to note that the probability of identity by descent of two genes is the same as the probability that no mutations occur from the time they are sampled until they reach a common ancestor. Thus fi = P(Ki = 0) for i = 0, 1. We can therefore write

for any model of population subdivision. For the many-demes limit model under consideration,

Using either the limit approach (Equation 16) or the substitution approach (Equation 17) in the many-demes limit model, and assuming small θ/κ (i.e., 0 < θ/κ < 1), d is of the form θ/κ. The same result is obtained for a finite number of subpopulations. Indeed, when D is finite, we obtain from Equations A28 and A29

Even if (θ/2)(D – 1)/κ > 1, we have from Equations 17 and 18 that d is not a function of λγ. Thus Nei's (1972) genetic distance can be used to estimate divergence times of species even if one or both species have skewed offspring distribution, since d is proportional to the time of separation of two populations (Nei 1972; Slatkin 1991).

Wrapping Up Haploid vs. Diploid

Both diploid and haploid cells and organisms occur in nature. Haploid and diploid chromosome sets differ in the number of chromosomes present and in the types of cells in which they occur. Haploid cells contain half the chromosome count of diploid cells and, in animals, are mostly gametes, whereas somatic cells are diploid. Some organisms, such as many algae, alternate between haploid and diploid phases in their life cycle. Diploid cells divide by mitosis, creating daughter cells that are genetically identical to the parent cell and to each other. Haploid gametes, on the other hand, are produced from diploid cells by meiosis; each carries a different combination of the parents' genetic material, so the resulting cells differ from the parent cell and from one another.



Our PCA for chromosome 2L illustrates the influence that inversions can have on polymorphism data. Importantly, the structure created by In(2L)t extends several megabases beyond the inversion’s breakpoint 28,42. Therefore, we excluded all of chromosome 2L for the demographic analyses in this study and recommend that future demographic studies of natural populations of Drosophila melanogaster address the potential effect of this inversion prior to demographic inference. We also identified an excess of singletons among low-recombining regions (smaller than 1.5 cM/Mb) in the autosomal data of our Swedish sample, which is likely caused by linked negative selection and should therefore be filtered out when conducting demographic inference.

Early studies about the demographic history of European populations did not consider the effect of gene flow when estimating the age of the population split 3,4,5. However, Li and Stephan 4 already predicted that accounting for gene flow would lead to older divergence times, owing to the homogenizing effect of migration on allele frequencies in isolated populations. Our results confirm this prediction (Table 1, Table S2) and show that taking gene flow into account almost doubles the estimated divergence time (2247 for model NOMIG vs. 4139 for model ASYMIG, Table S2). However, estimations presented in this study (Table 1) are substantially lower than previous estimations of the divergence time (despite taking gene flow into account) because different mutation rates and generation times were used to rescale the timing of demographic events from coalescence units into years. While increasingly sophisticated methods are improving the performance of demographic estimations, it is also becoming evident that empirical measurements of mutation rates or generation times are critical to assess the absolute age of evolutionary events. However, demographic models are often employed to predict neutral distributions of statistics of genetic variation that are then used as null distributions in statistical tests of selection, and for such purposes no information about mutation rates or generation times is required.

A major difference between the demographic histories estimated using MSMC2 and dadi is the absence, in the former, of an obvious bottleneck in the ancestral lineages of the European sample. The existence of a population size bottleneck in the demographic past of cosmopolitan populations was first reported by Li and Stephan 4 as well as Thornton and Andolfatto 27 and has since been considered a major confounding effect for the detection of selective sweeps 24. More work is needed in order to evaluate whether such bottlenecks are artefacts caused by over-simplistic parameterizations of demographic models or whether MSMC2 (or similar methods) cannot detect such population size bottlenecks under specific conditions. This could be investigated by testing whether MSMC2 can identify a bottleneck when used to analyze simulated datasets obtained from the ASYMIG model. Another promising avenue is RELATE, a recently published demographic inference method based on ancestral recombination graph (ARG) reconstruction 43. Similarly to MSMC2, RELATE can be used to estimate continuous changes in population sizes but it also allows for the analyses of significantly larger sample sizes, which is expected to improve the quality of demographic inferences. It is also noteworthy that MSMC2 appears to confirm the existence of a recent admixture event between African and European lineages while the RASYMIG model in the dadi analysis (in which gene flow could start after population divergence) provided a poorer fit to the observed data than the model with ongoing gene flow. More work is needed to identify whether this corroborates the earlier description of a recent event of cosmopolitan pulse-admixture into the African gene pool 12 or rather reflects a loss of statistical signal in the most recent past. This work could be achieved by evaluating the performance of dadi and MSMC2 using simulations with recent gene flow and by applying the recently published ARG-based methods, which facilitate the estimation of gene flow and local admixture mapping 43,44. MSMC2 and dadi rely on different summarizations of population genomic variation, which in principle could be capturing different aspects of the evolutionary signal, but it remains unclear how precisely this may contribute to the differences observed in this study.

Yeast Adaptation Study Finds Diploids Evolve More Slowly than Haploids

Experimental evolution is a good way to enhance our current understanding of how genomes (the sets of chromosomes in an organism's cells) evolve and the role of individual mutations in adaptation.

Organisms differ in ploidy, or how many copies of the genome they carry in their cells. For example, says Gregory Lang, assistant professor of biological sciences, humans have two copies of our genome in each cell: one from the mom and one from the dad. Bacteria have one copy of their genome in each cell. The common strawberry has eight copies. In other words, humans are diploid, bacteria are haploid, and strawberries are octoploid.

Understanding the influence of ploidy on evolution is only possible through experimental evolution with organisms such as yeast. Not only can yeast undergo as many as ten generations in a 24-hour period, it can also be stably maintained at different ploidies.

Lang, along with graduate student Daniel A. Marad and post-doc Sean W. Buskirk, set out to answer a basic question: How do the rates of adaptation differ between haploid and diploid organisms? They found that diploids, with two copies of the genome, evolve more slowly than haploids, which have only one copy. They also found that the beneficial mutations diploids pick up look different compared to what is seen in haploids.

Their results have been published in a paper in Nature Ecology & Evolution called “Altered access to beneficial mutations slows adaptation and biases fixed mutations in diploids.”

To understand these dynamics, the team measured the rate of adaptation for 48 diploid populations through 4,000 generations of the yeast Saccharomyces cerevisiae and compared these results to previously evolved haploid populations. They sequenced two clones each from 24 populations after 2,000 generations and performed whole-genome whole-population time-course sequencing on two populations.

"Using a powerful combination of experimental evolution and whole-genome sequencing, we determined the rate of adaptation and the types of mutations that arise in populations of yeast that are identical except for the number of copies of their genome," says Lang.

"We show that diploids adapt more slowly than haploids, that ploidy alters the spectrum of beneficial mutations, and that the prevalence of homozygous mutations depends on their genomic position," says Marad. "In addition, we validate haploid-specific, diploid-specific and shared mutational targets by reconstruction."

According to Lang, evolutionary biology has a rich history, with many classical theories still in need of experimental tests. One theory, known as Haldane's sieve, postulates that beneficial mutations that are recessive, or have no selective benefit when present in only one copy in a diploid, are unable to increase in frequency in diploid populations.

Lang and his colleagues tested Haldane's sieve by taking beneficial mutations that arose in haploids and moving them individually into a different context, diploids, to see if they have a selective advantage there. Consistent with Haldane's sieve, they found that beneficial mutations that arise in diploids are not recessive, but that most beneficial mutations in haploids are.
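
The intuition behind Haldane's sieve can be illustrated with a textbook one-locus model of selection in a diploid population. The sketch below is not the analysis used in the study. It assumes deterministic selection, random mating, and hypothetical parameter names and values: `s` for the selective advantage of the mutant homozygote, `h` for its dominance, and a starting frequency of 1 in 10,000. With `h = 0` the mutation gives no benefit in a single copy, so while it is rare it is almost always carried in heterozygotes and barely changes in frequency.

```python
def next_freq(p, s, h):
    """One generation of deterministic selection at a diploid locus.

    Assumed fitnesses (illustrative, not from the study):
    wild-type homozygote = 1, heterozygote = 1 + h*s, mutant homozygote = 1 + s.
    """
    q = 1.0 - p
    w_het = 1.0 + h * s
    w_hom = 1.0 + s
    # Mean fitness under random mating (Hardy-Weinberg genotype proportions).
    mean_w = p * p * w_hom + 2.0 * p * q * w_het + q * q
    # Frequency of the mutant allele after selection.
    return p * (p * w_hom + q * w_het) / mean_w


def freq_after(p0, s, h, generations):
    """Iterate next_freq for a given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, h)
    return p


if __name__ == "__main__":
    p0, s, generations = 1e-4, 0.05, 500  # hypothetical values
    for h in (0.0, 0.5, 1.0):  # recessive, additive, dominant
        print(f"h = {h}: frequency after {generations} generations = "
              f"{freq_after(p0, s, h, generations):.6f}")
```

Under these assumptions the recessive mutation stays close to its starting frequency, while the additive and dominant mutations rise to high frequency over the same period. In a haploid population every copy is exposed to selection, so dominance plays no role.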

"Collectively, this work fills a gap in our understanding of how ploidy impacts adaptation, and provides empirical support for the hypothesis that diploid populations have altered access to beneficial mutations," says Marad.

Lang says a key takeaway from the paper is that ploidy, which varies considerably in the natural world, has a significant effect on how genomes can and will evolve.

This work was supported by the Charles E. Kaufman Foundation of The Pittsburgh Foundation.
