6.2.1: Gene Expression in Evolution - Biology

Learning Objectives

  • Using examples, describe how changes in gene expression can be associated with changes in phenotype and evolution.

Mutations can occur in both cis-elements and trans-factors; both can result in altered patterns of gene expression. If an altered pattern of gene expression confers a selective advantage (or at least does not produce a major disadvantage), the underlying mutation may be selected and maintained in future populations. Such mutations may even contribute to the evolution of new species. An example of a sequence change in an enhancer is found in the Pitx gene.

Example: Pitx expression in Stickleback

The three-spined stickleback provides an example of natural selection of a mutation in a cis-regulatory element. This fish occurs in two forms: (1) populations that inhabit deep, open water and have a spiny pelvic fin that deters larger predator fish from feeding on them; (2) populations that inhabit shallow-water environments and lack this spiny pelvic fin. In shallow water, it appears that a long, spiny pelvic fin would be a disadvantage because it frequently contacts the sediment at the bottom of the pond and allows parasitic insects in the sediment to invade the stickleback. Researchers compared gene sequences of individuals from both deep-water and shallow-water environments. They observed that in embryos from the deep-water population, a gene called Pitx was expressed in several groups of cells, including those that developed into the pelvic fin. Embryos from the shallow-water population expressed Pitx in the same groups of cells as the deep-water population, with an important exception: Pitx was not expressed in the pelvic fin primordium. Further genetic analysis showed that the absence of Pitx expression from the developing pelvic fin of shallow-water stickleback was due to the absence (through mutation) of a particular enhancer element upstream of Pitx.

Figure 1: Development of a large, spiny pelvic fin in deep-water stickleback (left) depends on the presence of a particular enhancer element upstream of a gene called Pitx. Mutants lacking this element, and therefore the large pelvic fin (right), have been selected for in shallow-water environments. (Wikipedia-Richard Wheeler-GFDL)
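The modular logic of this example can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not a description of real stickleback genetics: the enhancer names and the tissue labels "tissue_A" and "tissue_B" are placeholders invented for illustration, and the only detail taken from the text is that losing one enhancer removes Pitx expression from the pelvic fin primordium while leaving expression elsewhere intact.

```python
# Toy model: a gene is transcribed in a tissue only if an enhancer that is
# active in that tissue is present near the gene.
# Enhancer names and the tissue_A / tissue_B labels are hypothetical.

DEEP_WATER_ENHANCERS = {
    "enh_A": "tissue_A",
    "enh_B": "tissue_B",
    "enh_pelvic": "pelvic fin primordium",
}


def expression_pattern(enhancers):
    """Return the set of tissues in which the gene is expressed."""
    return set(enhancers.values())


# The shallow-water allele lacks only the pelvic enhancer; the coding
# sequence and the remaining enhancers are unchanged.
shallow_water_enhancers = {
    name: tissue
    for name, tissue in DEEP_WATER_ENHANCERS.items()
    if name != "enh_pelvic"
}

deep = expression_pattern(DEEP_WATER_ENHANCERS)
shallow = expression_pattern(shallow_water_enhancers)

# Tissues that lose expression when the enhancer is deleted.
print(sorted(deep - shallow))
```

Because the protein-coding sequence is untouched in this sketch, the gene product itself is the same in both alleles; only where it is made differs.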

Thinking about the mutation

Consider the enhancer element mutated in the shallow-water sticklebacks:

What is the function of that DNA sequence? What type of protein would bind there?

Do you think this mutation acts in a dominant or recessive manner?

Which of these processes are affected by the mutation: DNA replication, transcription, splicing, and/or translation?

Would it be possible for another mutation to reverse the effects of this mutation and produce a shallow-water stickleback with a long fin?

Example: Hemoglobin expression in placental mammals.

Hemoglobin is the oxygen-carrying component of red blood cells (erythrocytes). Hemoglobin usually exists as tetramers of four non-covalently bound hemoglobin molecules. Each hemoglobin molecule consists of a globin polypeptide with a covalently attached heme molecule. Heme is made through a specialized metabolic pathway and is then bound to globin polypeptide through post-translational modification.

The composition of the tetramers changes during development. From early childhood onward, most tetramers are of the type α2β2, which means they consist of two copies of each of two slightly different globin proteins named α and β. A small amount of adult hemoglobin is α2δ2, which has δ globin instead of the more common β globin. Other tetrameric combinations predominate before birth: ζ2ε2 is most abundant in embryos, and α2γ2 is most abundant in fetuses. Although the six globin proteins (α = alpha, β = beta, γ = gamma, δ = delta, ε = epsilon, ζ = zeta) are very similar to each other, they do have slightly different functional properties. For example, fetal hemoglobin has a higher oxygen affinity than adult hemoglobin, allowing the fetus to more effectively extract oxygen from maternal blood. The specialized γ globin genes that are characteristic of fetal hemoglobin are found only in placental mammals.

Figure 3: Expression of globin genes during prenatal and postnatal development in humans. The organs in which globin genes are primarily expressed at each developmental stage are also indicated. (Original-Deyholos-CC:AN)

Each of these globin polypeptides is encoded by a different gene. In humans, globin genes are located in clusters on two chromosomes (Figure 4). We can infer that these clusters arose through a series of duplications of an ancestral globin gene. Gene duplication events can occur through rare errors in processes such as DNA replication, meiosis, or transposition. The duplicated genes can accumulate mutations independently of each other. Mutations can occur in the regulatory regions (e.g. promoter regions), in the coding regions, or in both. In this way, the regulatory regions of globin genes have evolved to drive expression at different phases of development, and the coding regions have evolved to produce proteins optimized for the prenatal environment.

Figure 4: Fragments of human chromosome 11 and human chromosome 16 on which are located clusters of β-like and α-like globin genes, respectively. Additional globin genes (θ, μ) have also been described by some researchers, but are not shown here. (Original-Deyholos-CC:AN)

Of course, not all mutations are beneficial: some mutations can lead to inactivation of one or more of the products of a gene duplication. This can produce what is called a pseudogene. Examples of pseudogenes (ψ) are also found in the globin clusters. Pseudogenes have mutations that prevent them from being expressed at all. The globin genes provide an example of how gene duplication and mutation, followed by selection, allow genes to evolve specialized expression patterns and functions. Many genes have evolved as gene families in this way, although they are not always clustered together as are the globins.

Exercise 1

Individuals with diseases such as sickle cell disease or β-thalassemia have mutations that either cause a malformed protein or the absence of the HBB (beta-globin) protein. How could understanding the normal expression of other hemoglobin proteins help develop new therapies for these patients? What tools and processes might be used?


Recent clinical trials are exploring the possibility of using gene editing to express fetal hemoglobin in sickle cell and β-thalassemia patients. This process isolates stem cells from patients, edits the DNA in vitro, and then transplants the edited cells back into the patient (reviewed in

A news report about one of these patients is at this link

Selected references:

Ye L, Wang J, Tan Y, et al. (2016) Genome editing using CRISPR-Cas9 to create the HPFH genotype in HSPCs: An approach for treating sickle cell disease and β-thalassemia. Proc Natl Acad Sci U S A 113(38):10661-10665. doi:10.1073/pnas.1612075113

Weber L, Frait G, Felix T, et al. (2020) Editing a γ-globin repressor binding site restores fetal hemoglobin synthesis and corrects the sickle cell disease phenotype. Science Advances 12 Feb 2020.

Demirci S, Leonard A, Tisdale JF. (2020) Genome editing strategies for fetal hemoglobin induction in beta-hemoglobinopathies. Human Molecular Genetics 14 May 2020.

The Evolution and Functional Significance of Nested Gene Structures in Drosophila melanogaster

Nearly 10% of the genes in the genome of Drosophila melanogaster are in nested structures, in which one gene is completely nested within the intron of another gene (nested and including gene, respectively). Even though the coding sequences and untranslated regions of these nested/including gene pairs do not overlap, their intimate structures and the possibility of shared regulatory sequences raise questions about the evolutionary forces governing the origination and subsequent functional and evolutionary impacts of these structures. In this study, we show that nested genes experience weaker evolutionary constraint, have faster rates of protein evolution, and are expressed in fewer tissues than other genes, while including genes show the opposite patterns. Surprisingly, despite completely overlapping with each other, nested and including genes are less likely to display correlated gene expression and biological function than the nearby yet nonoverlapping genes. Interestingly, significantly fewer nested genes are transcribed from the same strand as the including gene. We found that same-strand nested genes are more likely to be single-exon genes. In addition, same-strand including genes are less likely to have known lethal or sterile phenotypes than opposite-strand including genes only when the corresponding nested genes have introns. These results support our hypothesis that selection against potential erroneous mRNA splicing when nested and including genes are on the same strand plays an important role in the evolution of nested gene structures.

The distribution of genes in the genome is not random. There are regions with few functional genes and regions where genes are densely packed. Close proximity between genes is known to have significant functional consequences. Indeed, neighboring genes have been shown to have correlated expression patterns in eukaryotes (including yeast [Cohen et al. 2000], Caenorhabditis elegans [Lercher et al. 2003], Drosophila [Boutanaev et al. 2002], Arabidopsis thaliana [Williams and Bowles 2004], and humans [Lercher et al. 2002; Trinklein et al. 2004]), as well as shared biological functions and/or signaling pathways (Elo et al. 2003; Lee and Sonnhammer 2003; Al-Shahrour et al. 2010). In extreme cases, the distance between neighboring genes is 0, and parts or all of their gene structures (exons, introns, or untranslated regions [UTRs]) overlap with each other (overlapping genes). These structures are commonly observed in eukaryotes (e.g., C. elegans [Chen and Stein 2006], Drosophila [Misra et al. 2002], and mammals [Veeramachaneni et al. 2004]).

An especially interesting class of overlapping genes is that in which one gene is completely nested within an intron of another gene (nested and including gene, respectively [reviewed in Kumar 2009]). Even though the coding sequences of these nested/including gene pairs do not overlap, their intimate structures raise questions about the evolutionary forces governing the origination of nested gene structures and their subsequent functional and evolutionary impacts. We found that, in Drosophila melanogaster, approximately 16% of the genes (2,295 out of 14,072 genes) overlap with at least one other gene in exons, introns, or UTRs. Genes in nested structures account for 9.5% of the D. melanogaster genes (1,338 genes), which is more than in C. elegans (2.7%, Chen and Stein 2006) and human (2.73%, Yu et al. 2005). To examine the evolutionary and functional significance of nested gene structures in D. melanogaster while controlling for intrinsic attributes of genes in close proximity, we compared nested/including gene pairs to “control gene pairs,” which have chromosomal distributions matching those of nested/including gene pairs and are within 500 bp of each other but do not overlap (see Materials and Methods).

Mutational Input Is a Key Determinant of the Location of Nested Genes

Previous analysis showed that most nested gene structures in Drosophila originated through insertions or de novo origination of coding sequences in introns (Assis et al. 2008). Larger introns are larger targets for insertion or de novo mutations and should be more likely to harbor nested genes. Indeed, we found that the total intron lengths of including genes are significantly longer than those of control genes, even after excluding the sequence contributed by nested genes (median: 12,183 [including] and 308 [control]; Mann–Whitney U test [MWU], P < 10⁻¹⁶). Including genes also have more introns than both nested genes and control genes (median: 7 [including], 2 [control], and 1 [nested]; MWU, P < 10⁻¹⁶ for both comparisons). Focusing on including genes, introns with nested genes are significantly longer than introns without nested genes (median: 4,826 [with nested genes] and 138 [without nested genes]; MWU, P < 10⁻¹⁶). Because long introns were found to be more evolutionarily conserved and suggested to be more likely to harbor functional sequences (Haddrill et al. 2005), this observation is unlikely to be due to larger introns being more tolerant of insertions. Moreover, the D. melanogaster–D. simulans divergence of the longest introns of including genes is smaller than that of other introns of including genes even after excluding nested genes (median: 0.071 [longest] and 0.082 [others]; MWU test, P = 0.0012), indicating that the observation of long introns being more evolutionarily conserved does not result from a fraction of nested genes in them. These results support the view that the mutational process is a key determinant of the location of nested genes.
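As a rough illustration of what the Mann–Whitney U comparisons above measure, the U statistic can be computed by counting, over all cross-group pairs, how often a value from one group exceeds a value from the other. The sketch below uses made-up intron lengths, not data from the study, and omits the P value calculation.

```python
def mann_whitney_u(xs, ys):
    """U statistic for xs: number of (x, y) pairs with x > y; ties count 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical total intron lengths in bp (NOT values from the study).
including = [12000, 9500, 15000, 7000]
control = [300, 450, 200, 8000]

u = mann_whitney_u(including, control)
n_pairs = len(including) * len(control)

# u / n_pairs estimates the probability that a randomly chosen including
# gene has longer introns than a randomly chosen control gene.
print(u, u / n_pairs)
```

A ratio near 0.5 would indicate no tendency for either group to have larger values; a ratio near 1 indicates that the first group is almost always larger.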

Selection Plays an Important Role in the Maintenance and Functional Significance of Nested Gene Structures

Several hypotheses that potentially explain the selective pressures influencing the fixation of nested structures in the population, and their subsequent functional evolution, make specific predictions about the current expressional and functional correlations of nested and including genes. In addition to the common chromosomal environment that might have led to correlated expression of genes in proximity (reviewed in Hurst et al. 2004; Oliver and Misteli 2005), genes in nested structures might be selectively favored if their expression and/or biological functions are coregulated, resulting in even stronger positively correlated expression and/or biological functions than neighboring genes. On the other hand, the proximity of nested and including genes may result in interference during transcription, leading to selection against spatially and temporally correlated expression of nested and including genes (“transcriptional interference” [Shearwin et al. 2005; Liao and Zhang 2008]). Alternatively, the evolution of nested gene structures could be a nearly neutral process (Lynch and Conery 2003; Lynch 2006), in which case the expression and functional correlations between nested and including genes would be similar to those of genes in close proximity.

Nested/including gene pairs are significantly positively correlated (estimated using Spearman rank ρ) in gene expression levels across tissues (FlyAtlas, Chintapalli et al. 2007; MWU, P = 0.025). This is also observed for control gene pairs (MWU, P < 2 × 10⁻¹⁶). However, the correlations in expression of nested/including gene pairs are significantly weaker (Spearman rank ρ median 0.019 [nested/including gene pairs] vs. 0.174 [control gene pairs]; MWU, P = 8.6 × 10⁻¹⁴; fig. 1) and less likely to be positive (52.74% [nested/including gene pairs] vs. 69.44% [control gene pairs]; Fisher’s exact test [FET], P = 4 × 10⁻⁹) than those of control gene pairs. In fact, the correlations in expression of nested/including gene pairs are not different from those of two randomly chosen genes that are not adjacent but on the same chromosome (“random control gene pairs”; Spearman rank ρ median 0.019 [nested/including gene pairs] vs. 0.032 [random control gene pairs]; MWU, P = 0.76; fig. 1). Furthermore, we employed logistic regression and found that nested/including gene pairs are less likely than control gene pairs to have one gene (the nested gene of nested/including gene pairs) expressed in a subset of the tissues of the other gene (the including gene of nested/including gene pairs; P = 0.05, odds ratio = 0.78), to have the same highest expressed tissues (P = 8 × 10⁻¹¹, odds ratio = 0.25), and to be associated with the same GO (Gene Ontology) categories (P = 0.002, 0.001, 0.02; odds ratios = 0.14, 0.17, 0.16 for biological process, molecular function, and cellular component, respectively). Yet, again, when we compared nested/including gene pairs with random control gene pairs, none of these three differences were significant.
The correlations in expression patterns and involvement in biological functions of nested/including gene pairs are significantly different from what has been observed for nearby nonoverlapping genes, suggesting that selection against transcriptional interference might have led to their expression in different tissues and involvement in different biological functions.
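The per-pair statistic used above, Spearman rank ρ of expression levels across tissues, is the Pearson correlation of the ranks. A minimal sketch follows; the expression values are invented for illustration and the helper names are not from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical expression levels of one nested/including pair in 5 tissues.
nested_gene = [0.1, 0.2, 9.5, 0.3, 0.1]
including_gene = [5.0, 7.5, 6.0, 8.0, 4.0]
print(round(spearman_rho(nested_gene, including_gene), 3))
```

In the analysis described above, one such ρ is obtained per gene pair, and the distributions of ρ are then compared between nested/including pairs and control pairs.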

Fig. 1. Distributions of Spearman ρ in gene expression for nested/including gene pairs and control gene pairs. Nested/including gene pairs are less positively correlated in their expression level across 20 tissues than control gene pairs, but have similar correlations in expression with nonadjacent pairs of genes on the same chromosome (“random control gene pairs”).

Paucity of Same-Strand Nested/Including Gene Pairs Might Result from Selection against Missplicing

Nested genes can be transcribed from the same strand as their including genes (same strand) or from the opposite strand (opposite strand). The majority of nested genes (71.27%) were found to be on the opposite strand. This proportion is significantly different from the proportion for control gene pairs (53.55%) and from the proportion expected if orientations were random (50%; FET, P < 10⁻¹⁶ for both comparisons). Although the strand biases of nested genes have been widely reported in different eukaryotes (63% of same-strand nested genes in human [Yu et al. 2005] and 88% in C. elegans [Chen and Stein 2006]), the biological cause of this bias has not been specifically discussed and tested on a genomic scale.

The paucity of same-strand nested gene structures may have resulted from intrinsic strand biases of the mutational processes leading to nested gene structures. Alternatively, it may be due to differential selection on same-strand and opposite-strand nested genes. Several cases of genes, transposable elements, or endogenous retroviruses that are nested within introns of other genes are known to cause aberrant splicing of the outer including genes (Horowitz and Berg 1995; Kaer et al. 2011; Maksakova et al. 2006). The missplicing of including genes was shown to depend on the presence of splice sites within the sequences of transposable elements or endogenous viruses (van de Lagemaat et al. 2006; Kaer et al. 2011). The splice sites of nested genes are more likely to interfere with splicing of including genes when the two genes are transcribed from the same strand. Consistent with this hypothesis, we found that same-strand nested genes are more likely to be single-exon genes (72.53%) than opposite-strand nested genes (37.41%; FET, P < 10⁻¹⁶). Focusing on nested genes that have more than one exon, same-strand nested genes still have fewer introns than opposite-strand nested genes (median: one intron [same-strand nested genes] vs. two introns [opposite-strand nested genes]; MWU, P = 0.00013). Our observation is not due to opposite-strand nested genes being longer than their same-strand counterparts because the coding sequence length is not statistically different between same-strand and opposite-strand nested genes (median: 817.5 [same strand] vs. 898 [opposite strand]; MWU, P = 0.11).

Seventy-three nested genes are young (less than 35 million years old [Clark et al. 2007; Zhang et al. 2010]) and originated through duplication of another gene (parental gene). The duplication process can be via either DNA or RNA intermediates. A characteristic of RNA-based duplication is that the new genes lose all introns that were originally present in their parental gene (reviewed in Kaessmann et al. 2009), and this process accounts for around 12.10% of duplicated genes in Drosophila (Zhang et al. 2010). Among the 73 duplicated nested genes, only 16.67% of opposite-strand nested duplicated genes originated through RNA-based duplication, while 42.11% of same-strand nested duplicated genes originated via RNA intermediates (FET, P = 0.054). This difference is marginally significant, likely due to the small sample size. Additionally, the decrease in intron number of duplicated nested genes, when compared with their respective parental genes, is significantly larger for same-strand nested duplicated genes than opposite-strand nested duplicated genes (median: one intron difference [same-strand nested genes] vs. zero intron difference [opposite-strand nested genes]; MWU, P = 0.028). Note that this difference is not due to the variation in intron numbers of the parental genes of same-strand and opposite-strand nested genes, which is not significantly different (MWU, P = 0.41).

If missplicing is indeed more likely to happen when including genes and nested genes are on the same strand than when they are on opposite strands, we expect that same-strand including genes are less likely to be essential for the fitness of flies. In extreme cases, we expect that loss of function or expression knockdown by RNA interference (RNAi) of same-strand including genes is less likely to be associated with lethal phenotypes. When considering all same-strand and opposite-strand including genes, there is no significant difference in the proportion of genes having known lethal phenotypes (38.85% [same strand] vs. 44.66% [opposite strand]; table 1). Yet, when we only considered including genes whose nested genes have introns (and therefore are more likely to cause missplicing), same-strand including genes are significantly less likely to have known lethal phenotypes (26.0% [same strand] vs. 42.33% [opposite strand]; table 1). The result is strengthened if we consider both lethal and sterile phenotypes (30.00% [same strand] vs. 47.44% [opposite strand]; table 1). It is worth noting that the genetic disturbance (null mutant or expression knockdown) we considered here is extreme, and it is likely that, when considering more subtle influences on fitness, the difference between same-strand and opposite-strand including genes will be more significant and should be more general. Overall, our observations that same-strand nested genes contain fewer introns and that same-strand including genes have a lower probability of being associated with lethal and sterile phenotypes suggest that the paucity of same-strand nested/including gene pairs could be attributable to purifying selection against missplicing when nested genes are transcribed from the same strand.

Table 1

Known Phenotypic Effects of Including Genes

| Gene set | Strand | Lethal | Sterile | Viable | FET P value: lethal vs. nonlethal (a) | FET P value: affected (b) vs. viable |
|---|---|---|---|---|---|---|
| All including genes | Same strand | 68 | 9 | 98 | 0.23 | 0.2 |
| | Opposite strand | 159 | 19 | 178 | | |
| Including genes with intron-containing nested genes | Same strand | 13 | 2 | 35 | 0.037 | 0.027 |
| | Opposite strand | 91 | 11 | 113 | | |

a Genes without known lethal phenotype (could have known sterile phenotype).

b Genes with known lethal or sterile phenotype.
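The percentages quoted in the text and the Fisher's exact tests in Table 1 can be recomputed from the counts. The sketch below is a minimal pure-Python version that assumes the common two-sided convention (summing the probabilities of all tables no more likely than the observed one); the source does not state which variant was used, so small differences from the published P values are possible. The function and variable names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x):
        # Unnormalised probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / total


def percent(part, whole):
    return round(100 * part / whole, 2)


# (lethal, sterile, viable) counts copied from Table 1.
TABLE_1 = {
    "all including genes": {
        "same": (68, 9, 98), "opposite": (159, 19, 178)},
    "intron-containing nested genes": {
        "same": (13, 2, 35), "opposite": (91, 11, 113)},
}


def lethal_vs_nonlethal(same, opposite):
    return fisher_exact_two_sided(same[0], same[1] + same[2],
                                  opposite[0], opposite[1] + opposite[2])


def affected_vs_viable(same, opposite):
    return fisher_exact_two_sided(same[0] + same[1], same[2],
                                  opposite[0] + opposite[1], opposite[2])


for label, rows in TABLE_1.items():
    same, opposite = rows["same"], rows["opposite"]
    print(label)
    print("  % lethal:", percent(same[0], sum(same)),
          "vs", percent(opposite[0], sum(opposite)))
    print("  P (lethal vs. nonlethal):",
          round(lethal_vs_nonlethal(same, opposite), 3))
    print("  P (affected vs. viable):",
          round(affected_vs_viable(same, opposite), 3))
```

Working with integer binomial coefficients until the final division avoids floating-point ties when deciding which tables are "at least as extreme" as the observed one.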

Nested Genes Evolve Faster, Are More Narrowly Expressed, and Are Enriched with Testis-Related Functions While Including Genes Show the Opposite Patterns

To test whether genes in nested structures show different patterns of evolution, we examined the site frequency spectrum of coding variants (using Tajima’s D [Tajima 1989]), relative rates of protein evolution (dN/dS [Yang 2007]), and the proportion of amino acid substitutions fixed by positive selection (α [Smith and Eyre-Walker 2002]) of including genes, nested genes, and control genes, and classified genes into those that are present in all 12 Drosophila species (i.e., genes older than 35 million years; Clark et al. 2007) or not (Zhang et al. 2010) (table 2). Including genes have more negative Tajima’s D, lower dN/dS, and are more likely to be conserved across the Drosophila species than either nested genes or control genes, suggesting they are under stronger purifying selection. On the other hand, nested genes, while not differing in Tajima’s D from control genes, have larger dN/dS and α, and tend to be younger than both including genes and control genes. We did not detect any significant difference between same- and opposite-strand including genes or nested genes in these analyses.
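For readers unfamiliar with α, the estimator of Smith and Eyre-Walker (2002) compares divergence and polymorphism counts at nonsynonymous and synonymous sites: α = 1 − (Ds · Pn) / (Dn · Ps). The sketch below shows only this basic formula with invented counts; how the study pooled sites across genes is not described in this section, so the example should not be read as the authors' exact procedure.

```python
def alpha(dn, ds, pn, ps):
    """Proportion of amino acid substitutions attributed to positive selection.

    dn, ds: nonsynonymous and synonymous fixed differences between species
    pn, ps: nonsynonymous and synonymous polymorphisms within a species
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1 - (ds * pn) / (dn * ps)


# Hypothetical counts (NOT from the study).
print(alpha(dn=40, ds=100, pn=10, ps=50))  # 0.5
```

With these made-up counts, the ratio of nonsynonymous to synonymous changes is twice as high for divergence (40/100) as for polymorphism (10/50), so half of the amino acid substitutions are attributed to positive selection.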

Table 2

Evolutionary Properties and Expression Patterns of Nested, Including, and Control Genes

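| | Median: Including | Median: Nested | Median: Control | MWU test P value: Including vs. Nested | MWU test P value: Including vs. Control | MWU test P value: Nested vs. Control |
|---|---|---|---|---|---|---|
| Tajima’s D | −2.76 | −1.77 | −1.87 | <10⁻⁸ | <10⁻⁸ | >0.05 |
| dN/dS | 0.042 | 0.107 | 0.073 | <10⁻⁸ | <10⁻⁸ | <10⁻⁸ |
| Breadth of expression (number of tissues) | 18 | 4 | 19 | <10⁻¹⁶ | 0.363 | <10⁻¹⁶ |

| | Proportion (%): Including | Proportion (%): Nested | Proportion (%): Control | FET P value: Including vs. Nested | FET P value: Including vs. Control | FET P value: Control vs. Nested |
|---|---|---|---|---|---|---|
| Conserved across 12 Drosophila species | 99.05 | 88.13 | 91.24 | <10⁻¹⁶ | <10⁻¹⁶ | 0.027 |
| Highest expression in brain | 29.09 | 5.21 | 9.44 | <10⁻¹⁶ | <10⁻¹⁶ | 0.003 |
| Highest expression in testis | 6.43 | 43.91 | 13.52 | <10⁻¹⁶ | 1.45 × 10⁻⁶ | <10⁻¹⁶ |
| Highest expression in ovary | 13.78 | 5.36 | 23.94 | 1.3 × 10⁻⁷ | 9.06 × 10⁻⁸ | <10⁻¹⁶ |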
Young duplicate genes0.98.47㰐 � 5.2 × 10 � 0.02

We also found that nested and including genes have unusual gene expression patterns. Nested genes are expressed in significantly fewer tissues (have narrower breadth of expression) than either including genes or control genes (table 2). They also have significantly higher expression specificity (see Materials and Methods) than either including or control genes (MWU, P < 10⁻¹⁶ for both comparisons; fig. 2). While same- and opposite-strand nested genes do not differ in their breadth of expression (MWU, P = 0.15), same-strand nested genes have significantly higher expression specificity than opposite-strand nested genes (0.95 [same-strand] vs. 0.93 [opposite-strand]; MWU, P = 0.009). The composition of tissues where genes have their highest expression is also significantly different between including genes, nested genes, and control genes (chi-square test, P < 10⁻¹⁶ for all comparisons; fig. 3). This composition is not different between same- and opposite-strand including genes but significantly different between same- and opposite-strand nested genes (chi-square test, P = 0.024; fig. 3). Including genes are more enriched with genes having their highest expression in brain than either nested genes or control genes (table 2). In contrast, nested genes are significantly enriched with genes having highest expression in testis but are deficient for genes having highest expression in ovaries (table 2). The enrichment of high testis expression is especially strong for same-strand nested genes (58.46% [same strand] vs. 38.18% [opposite strand]; FET, P = 1.67 × 10⁻⁶).
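The two expression summaries used above can be illustrated as follows. The study's own definition of expression specificity is given in its Materials and Methods, which is not reproduced here; the sketch assumes the widely used tau index as a stand-in, and the expression threshold for breadth is likewise an assumption. The expression values are invented.

```python
def breadth_of_expression(levels, threshold):
    """Number of tissues whose expression level exceeds the threshold."""
    return sum(1 for x in levels if x > threshold)


def tau_specificity(levels):
    """Tau index: 0 for uniform expression, 1 for a single-tissue gene.

    Assumed here as one common specificity measure; it may differ from
    the measure used in the study.
    """
    if len(levels) < 2:
        raise ValueError("need expression levels for at least two tissues")
    peak = max(levels)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - x / peak for x in levels) / (len(levels) - 1)


# Hypothetical expression levels across four tissues.
broad_gene = [5.0, 5.0, 5.0, 5.0]
testis_only_gene = [0.0, 0.0, 8.0, 0.0]

for name, levels in [("broad", broad_gene), ("testis-only", testis_only_gene)]:
    print(name, breadth_of_expression(levels, threshold=1.0),
          tau_specificity(levels))
```

A narrowly expressed gene scores low on breadth and high on specificity, which is the pattern reported for nested genes.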

Fig. 2. Expression specificity of genes in nested structures and control genes. Boxplots for the expression specificity of including genes, nested genes, and control genes. The expression specificity is highest for same-strand nested genes followed by opposite-strand nested genes, both of which are significantly higher than either including genes or control genes.

Fig. 3. The distributions of tissues where genes have their highest expression. Nested genes, especially same-strand nested genes, are enriched with genes having their highest expression level in testis when compared with both including and control genes. On the contrary, including genes are enriched with genes having their highest expression in brain.

Consistent with the previous finding that the majority of nested gene structures originated through insertion of DNA sequences into introns of including genes via gene duplications (Assis et al. 2008), we observed a significantly larger proportion of nested genes that were previously identified as young duplicated genes (Zhang et al. 2010) than of either including genes or control genes (table 2). Young duplicated genes tend to evolve rapidly (Chen et al. 2010), which could have led to the observed exceptional evolutionary properties of nested genes. On the other hand, the two interesting properties of nested genes, narrow expression (Larracuente et al. 2008) and enrichment of highest expression in testis (reviewed in Swanson and Vacquier 2002), are widely known to be correlated with rapid protein evolution. To test whether the unusual evolutionary and expression properties of nested genes are due to the larger proportion of duplicate genes, we compared nested genes with a set of control genes that have the same proportion of young duplicated genes (“duplication control genes,” see Materials and Methods). Nested genes still show faster rates of protein evolution (dN/dS, MWU, P < 10⁻⁹), have greater α (MWU, P = 0.0021), are expressed in fewer tissues (MWU, P < 10⁻¹⁶), have higher expression specificities (MWU, P < 10⁻¹⁶), and are enriched with genes having highest expression in testis (FET, P < 10⁻¹⁶). These results indicate that the observed patterns cannot be explained simply by the higher proportion of duplicate genes. In contrast, when using another set of control genes that have the same expression patterns as nested genes (“expression control genes,” see Materials and Methods), nested genes are not significantly different from control genes with respect to dN/dS, α, or gene age (MWU, P > 0.05 for all comparisons). Accordingly, the evolutionary properties of nested genes might have been the “by-product” of their expressional attributes.
However, selection to decouple the functions of nested genes from those of including genes due to their nested structures could have led to the observed narrow expression of nested genes and could be the ultimate cause for the evolutionary properties of nested genes.

While including genes are slowly evolving, highly conserved, broadly expressed, and enriched with genes having their highest expression in brain, nested genes are the opposite: fast evolving, narrowly expressed, and enriched with genes having their highest expression in testis. Thus, positive selection for coregulation in gene expression and biological function, which might have driven the evolution of gene clusters (reviewed in Hurst et al. 2004), is unlikely to apply to the fixation of nested gene structures. The fixation of nested gene structures, similarly to evolution of other complex genomic organizations (Lynch and Conery 2003 Lynch 2006), could have been a nearly neutral process. However, we have evidence supporting the role of natural selection in shaping the relative orientations and functional importance of nested gene structures. We showed that nested/including gene pairs are less likely to be transcribed from the same strand and that same-strand nested genes are more likely to be single-exon genes and have fewer exons if they are multiexon genes. Together with the finding that including genes with same-strand nested genes that contain introns are less likely to be essential for fitness of flies, our results support that selection against missplicing events of same-strand nested/including gene pairs leads to this bias. In addition, the correlations in expressions and biological functions of nested/including gene pairs are lower than those of nearby gene pairs but similar to any two random genes of the same chromosome. This is consistent with the hypothesis that selection against transcriptional interference plays an important role in shaping the functional significance and indirectly affects evolutionary properties of nested gene structures. 
In sum, despite the proximity of nested and including genes, we found that they are far from similar to each other in terms of evolutionary properties, expressional patterns, and biological functions, and selection against the potential deleterious impacts caused by their close proximity might have been the main force governing their evolution.


Polyunsaturated fatty acids (PUFAs) are essential components of the plasma membrane. Various PUFAs have crucial roles in plant physiological and cellular processes such as cold acclimation, defense mechanisms against biotic and abiotic stresses, and chloroplast development [1]. PUFA biosynthesis occurs through different and complex pathways of desaturation and elongation steps [2]. Fatty acid desaturase (FAD) enzymes introduce double bonds into the hydrocarbon chain of fatty acids. Two groups of FADs have been identified in plants: acyl–acyl carrier protein (acyl-ACP) desaturases and membrane-bound FADs, or acyl-lipid desaturases [3]. While the FADs identified in plants, animals, algae, and fungi are membrane-bound desaturases, the plant acyl-ACP desaturase (FAB2/SAD) is the only soluble FAD [4, 5]. The acyl-ACP desaturases introduce the first double bond into the acyl chain of saturated fatty acids in plastids. Membrane-bound FADs, by contrast, are found in the chloroplast and the endoplasmic reticulum (ER). Desaturation occurs through two different pathways in the chloroplast and the ER [6]. In the chloroplast and the ER, double bond formation requires the NADPH/ferredoxin and NADH/cytochrome b5 systems as electron donors, respectively [7].

On the other hand, the quality of edible oils depends on their unsaturated fatty acid content [8], and FADs are essential in determining that quality [9]. FADs have attracted increasing attention because they can adjust the level of unsaturated fatty acids, which can improve both oil quality and plant resistance to various stresses, including drought, salt, heat, cold, and pathogens [10,11,12,13]. For instance, the cell membrane is the primary site of cold-induced injury, and unsaturated fatty acids have a lower melting temperature than saturated fatty acids. Therefore, adjusting membrane lipid fluidity by manipulating FADs and changing the levels of unsaturated fatty acids may be helpful for cold acclimation [14]. To date, several studies have assessed the expression of genes encoding fatty acid desaturases in response to biotic and abiotic stresses [12, 15,16,17]. An investigation of the expression of the SACPD-A and SACPD-B genes (encoding soluble Δ9 stearoyl-ACP desaturases) and of the amounts of stearic acid (C18:0) and oleic acid (C18:1) in soybean revealed that the transcript levels of both genes and the amount of oleic acid increased dramatically at low temperature. Conversely, at high temperatures the amount of C18:0 increased and the expression of these genes decreased [18]. Wang et al. (2012) examined the expression of oleate desaturase (GbFAD2 and GbFAD6) and GbSAD genes at various temperatures in Ginkgo biloba L. leaves. According to their results, the expression of the GbFAD2 and GbSAD genes increased at 4 and 15 °C, whereas it was suppressed at 35 and 45 °C.

In contrast, the expression of GbFAD6 was constant at different temperatures [19]. The expression of the FAD2–1 and FAD2–2 genes of olive increased in response to wounding [20]. Likewise, the FAD2 and FAD6 genes are necessary for salt tolerance at the early seedling stage in Arabidopsis [21, 22]. Zhang et al. (2005) developed transgenic tobacco plants overexpressing the FAD3 or FAD8 genes. According to their findings, overexpression of FAD8 or FAD3 enhanced tolerance to drought [23]. The importance of FADs in plant pathways has thus been confirmed previously. A homologous region based on a conserved sequence of a gene family can be used to identify new genes. The FAD gene family is vital for the production of PUFAs in plants; thus, a comprehensive understanding of FAD genes obtained through bioinformatics studies can help disclose their functions in the studied plants.

Wheat (Triticum aestivum L.) is one of the most important cereal crops. Because of its high content of unsaturated fatty acids, wheat germ oil, one of the essential by-products of wheat, can be a good alternative edible oil with clinical benefits. Studies have shown that wheat germ oil contains different fatty acids, including linoleic acid (C18:2), palmitic acid (C16:0), oleic acid (C18:1), linolenic acid (C18:3), and stearic acid (C18:0) [24]. Although wheat is a good source of edible oil, the characterization and analysis of the FAD family in wheat have not yet been performed. Comprehensive analyses of gene families also improve our understanding of their evolution and functions in plants [25]. Therefore, in this study, the identification, evolutionary relationships, duplication and selection pressure, exon-intron structure, promoter analysis, transcript-targeting miRNA and simple sequence repeat marker prediction, RNA-seq data analysis, three-dimensional structure, and docking studies of the TaFADs were investigated in wheat using bioinformatics tools. Figure 1 provides a flow-chart of the data analysis process.

Figure 1. A flow-chart of the data analysis process

Materials and Methods

Genome Sequences

We retrieved all publicly available prokaryotic genome sequences and associated annotations from the Integrated Microbial Genomes (IMG) system (Markowitz et al. 2009).

Horizontally Transferred Genes

We used three large data sets of HGTs. The first data set (Sorek et al. 2007) included genes that can and cannot be transformed into E. coli in the laboratory. The second data set (Lercher and Pal 2008) described genes that were naturally transferred into E. coli at different evolutionary times, inferred from the presence/absence of genes across species. The inference was based on the DELTRAN algorithm, with relative penalties of 2:1 for HGTs and gene losses (Lercher and Pal 2008), as in a recent study (Gophna and Ofran 2011). We identified the likely donor species of each horizontally transferred gene in this data set by Blasting the gene with an E value cutoff of $10^{-6}$ in all 1,127 finished Bacteria and Archaea genomes in IMG that are outside the family Enterobacteriaceae, to which E. coli belongs (fig. 2A). The genome harboring the best basic local alignment search tool (Blast) hit is considered the donor of the transferred gene. Reciprocal Blast searches are unnecessary, because the best Blast hit of the identified donor gene in E. coli will be either 1) the original gene under investigation or 2) a paralog of the original gene under investigation. But, because the gene under investigation was identified by phylogenetic analysis to be horizontally transferred to E. coli rather than a recent paralog of another gene in E. coli, (2) is not possible. Thus, the only possibility is (1), which makes it unnecessary to Blast the E. coli genome using the identified donor gene as the query. Furthermore, errors in donor identification are expected to be random, which would weaken the true signal but not bias our result. The third data set included relatively recent HGTs identified from 171 recipient genomes by nucleotide composition-based Bayesian inference (Nakamura et al. 2004). 
We discarded 38 of these genomes because of the lack of any annotation of ribosomal protein genes that are required for determining the preferred codons for codon adaptation index (CAI) estimation.

Genome-Wide Gene Expression Data

We used published E. coli gene expression data from the log growth phase obtained from a high-density oligonucleotide tiling array experiment (Cho et al. 2009). To download all publicly available microarray expression data from other prokaryotes, we used the Stanford Microarray Database (Hubble et al. 2009), which houses hundreds of expression data sets based on cDNA microarrays. Expression data from six species (Bacillus subtilis, ID: 66211; Campylobacter jejuni, ID: 28770; Helicobacter pylori, ID: 16576; Mycobacterium tuberculosis, ID: 14047; Salmonella typhimurium, ID: 23956; and Vibrio cholerae, ID: 66211) were used in our analysis. We also used the NCBI Gene Expression Omnibus and downloaded the microarray data of Dehalococcoides ethenogenes (GSE 10185), Geobacter sulfurreducens (GSE 22511), Listeria monocytogenes (GSE 16336), and Streptococcus agalactiae (GSE 21564).

Synonymous Codon Usage Bias

To calculate the relative synonymous codon usage (RSCU) in a species ( Sharp and Li 1986), we used ribosomal protein genes, which are generally among the most highly expressed genes in a genome ( Sharp et al. 1986). Based on the RSCU values, the CAI was calculated for each gene in a genome ( Sharp and Li 1987). Briefly, CAI of a gene is the geometric mean of RSCU of all codons divided by the highest possible geometric mean of RSCU given the same amino acid sequence.
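The CAI definition above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the standard genetic code table, and the pseudocount of 0.5 for unobserved codons are assumptions made for illustration. Because RSCU values within one amino acid differ from raw codon counts only by a shared factor, the ratio of a codon's RSCU to the highest RSCU among its synonyms is computed directly from counts.

```python
from collections import Counter, defaultdict
from itertools import product
from math import exp, log

# Standard genetic code in TCAG order (assumption: no alternative code is needed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon: its RSCU divided by the highest RSCU among its synonyms.

    reference_genes plays the role of the ribosomal protein genes in the text.
    The pseudocount (hypothetical choice) avoids log(0) for unobserved codons.
    """
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    by_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            by_aa[aa].append(codon)
    weights = {}
    for synonyms in by_aa.values():
        if len(synonyms) == 1:  # Met and Trp carry no codon-choice information
            continue
        best = max(counts[c] + pseudocount for c in synonyms)
        for c in synonyms:
            weights[c] = (counts[c] + pseudocount) / best
    return weights


def cai(gene, weights):
    """Geometric mean of the codon weights over all informative codons of a gene."""
    logs = [log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("gene contains no informative codons")
    return exp(sum(logs) / len(logs))
```

A gene built only from the codons preferred in the reference set receives a CAI of 1, and each less-preferred codon pulls the geometric mean toward 0.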

Classification of Informational Genes and Operational Genes

Following an earlier study ( Jain et al. 1999), we regarded genes annotated with “transcription,” “translation,” “DNA replication,” or any of their subterms in Gene Ontology ( Ashburner et al. 2000) as informational genes. All other genes were considered operational genes.

Protein–Protein Interactions

The E. coli protein–protein interaction data were retrieved from a recent publication ( Hu et al. 2009), in which 5,993 nonredundant pairwise physical interactions among 1,757 proteins were identified by an affinity-based method and genomic context-based inferences.

Statistical Analysis

We estimated the relative contributions of all predictors to the total variance in gene transferability by calculating the relative contribution of variability explained (RCVE) for each predictor using $\mathrm{RCVE} = 1 - R^2_{\mathrm{reduced}} / R^2_{\mathrm{full}}$, where $R^2_{\mathrm{full}}$ and $R^2_{\mathrm{reduced}}$ are the $R^2$ (square of the correlation coefficient) for the full linear model and the model without the predictor of interest, respectively (Park and Makova 2009). To diagnose multicollinearity of each predictor, variance inflation factors (VIFs) (Kutner et al. 2005) were calculated. All predictors in the model used had VIFs below 2, suggesting that multicollinearity did not adversely affect our model. Linear multiple regression analysis was performed in the R statistical package.
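The two diagnostics described above can be sketched in a few lines. The analysis itself was run in R; the pure-Python version below is only an illustration, and its function names and dictionary-of-columns input format are assumptions. RCVE compares the R² of the full model with that of the model lacking one predictor, and the VIF of a predictor is 1 / (1 − R²) from regressing that predictor on the remaining ones.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rcve(predictors, y, name):
    """RCVE = 1 - R^2_reduced / R^2_full for the predictor called `name`."""
    full = r_squared(list(predictors.values()), y)
    reduced = r_squared([v for k, v in predictors.items() if k != name], y)
    return 1.0 - reduced / full


def vif(predictors, name):
    """Variance inflation factor: 1 / (1 - R^2) of `name` regressed on the other predictors."""
    others = [v for k, v in predictors.items() if k != name]
    return 1.0 / (1.0 - r_squared(others, predictors[name]))
```

With uncorrelated predictors the VIF equals 1, and it grows as a predictor becomes more nearly a linear combination of the others.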

Materials and methods

Transcriptome and assemblies

We used the embryonic samples of Idiosepius and Nautilus as well as their adult tissues to capture regulatory genes critical for systemic development of the eye and lens across species. For embryonic eye transcriptomics (RNA-seq) analysis, we utilized assemblies (stage 25 embryos of the pygmy squid, Idiosepius paradoxus, and 3-month-old embryos of the chambered nautilus, Nautilus pompilius) obtained by Ogura et al. (2013). For adult Idiosepius and Nautilus, we generated novel sets of RNA-seq data. Tissues of Idiosepius and Nautilus were removed and homogenized in TRIzol reagent (Invitrogen) immediately after the animals were sacrificed. To minimize possible nucleotide polymorphism, we utilized a single individual of Nautilus. However, due to the small size of Idiosepius, we pooled tissues from several individuals. Total RNAs were isolated according to the manufacturer's protocol, followed by on-column DNase treatment using a QIAGEN RNeasy kit. The quality of the RNAs was tested with an Agilent Nanodrop and an Agilent 2100 bioanalyzer. The RNA samples were sent to BGI Inc, and short read sequences were obtained with an Illumina Hiseq2000 according to the company's procedures.

FASTQ sequences of Idiosepius or Nautilus were pooled into one dataset and were assembled using the Trinity platform (Grabherr et al. 2011). To obtain normalized intensities of gene expression across tissues (fragments per kilobase per million reads, FPKM), reads from each sample were mapped onto the Trinity assembly with Bowtie (Langmead et al. 2009) and analyzed with RSEM (Li and Dewey 2011) and edgeR (Robinson et al. 2010). In the assembly procedure, variants of putative alternative splicing (sub-components of the Trinity output) were estimated as different contigs, but we merged variants from one sub-component based on the “%comp_fpkm” values of the edgeR output. Analytical pipelines of the NIG Cell Innovation program were used for the annotation steps applied to the assembled contigs.
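As a reminder of what the FPKM unit expresses, the following Python sketch applies the textbook definition: fragments mapped to a contig, scaled by contig length in kilobases and by the total number of mapped fragments in millions. It is not the RSEM estimator used in the study, which works with effective lengths and probabilistic read assignment, and the function name and input dictionaries are hypothetical.

```python
def fpkm(fragment_counts, lengths_bp):
    """Textbook FPKM for each contig in one sample.

    fragment_counts: contig id -> number of fragments mapped to the contig
    lengths_bp:      contig id -> contig length in base pairs
    """
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("no mapped fragments in this sample")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return {
        contig: count * 1e9 / (lengths_bp[contig] * total)
        for contig, count in fragment_counts.items()
    }
```

Normalizing by both length and sequencing depth is what makes values comparable across contigs of different sizes and across tissue samples of different read depth.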

Data from the eyes of Idiosepius were assembled together with data from brain, arm, gonad, and gut. Contigs shorter than 500 bp or with an FPKM of less than 1 were filtered out. Contigs that passed these criteria were used as “the eye genes”. Data from Nautilus eyes were assembled together with data from brain, arm, and siphuncle and processed in the same way. Sequence homology was tested using NCBI BLAST 2.2.30+ (Camacho et al. 2008) after filtering out genes shorter than 500 bp to remove gene fragments having traceability. For comparative analysis, we obtained gene models from Ensembl for two gastropods, the sea hare Aplysia californica (AplCal3.0, GCF_000002075.1, July 2013) and the giant owl limpet Lottia gigantea (Lotgi1, INSDC Assembly GCA_000327385.1, January 2013); the Pacific oyster, Crassostrea gigas (oyster_v9, INSDC Assembly GCA_000297895.1, September 2012); the polychaete annelid, Capitella teleta (Capitella teleta v1.0, INSDC Assembly GCA_000328365.1, December 2012); the fly, Drosophila melanogaster (BDGP6, INSDC Assembly GCA_000001215.4); and human (GRCh38, INSDC Assembly GCA_000001405.15, December 2013). Eye transcriptome data from human fetuses were obtained from an EST analysis by Choy et al. (2006). Choy et al. (2006) listed 4010 human gene models as the fetal eye genes using the previous human genome build. However, 669 genes were missing in the current human genome build (the Ensembl Human Build 38). To compensate, we obtained EST sequences from NCBI (BY794942-BY800475) and used these sequences in the search for homology.

Molecular phylogenetic analysis

The nucleotide sequences obtained in this study are available under the following accession numbers: [DDBJ: LC021432-LC021456] and listed in Supplementary Table S2 . For each set of genes (opsins, arrestins, and crystallins), we obtained 97, 16, and 31 sequences from the NCBI and made alignments together with 7, 3, and 14 cephalopod sequences found in this study, respectively. The NCBI accession numbers of the genes are shown in the respective figures.

We used MUSCLE on the EMBL-EBI Web Services to generate a multiple sequence alignment (Edgar 2004; McWilliam et al. 2013). To remove poorly aligned sequences, we used TrimAl v1.4.rev15 build[2013-12-17] with the -gappyout option (Capella-Gutierrez et al. 2009). Maximum-likelihood inference of phylogenetic trees was performed using RAxML version 8.0.26 (the -f a -# 1000 -m PROTGAMMAGTR options were applied) (Stamatakis 2014). One thousand bootstrap replicates were performed with the same search options as described above.

In situ hybridization

To generate Idiosepius Tbx20 DIG-labeled RNA targeted probes, we performed RT-PCR using the following primer set (F: ACCAGCCTCGAATTCACATC, R: GGAGGCCCAAATTAGGAAAG). To generate the Idiosepius cDNA, we utilized SMARTer RACE kit (Takara Clontech). The PCR fragments obtained from the RT-PCR were sub-cloned into T-vector (Promega) and used as templates for in vitro transcription using DIG RNA probe synthesis kit (Roche). Whole-mount in situ hybridization was performed using stage 25 embryos of Idiosepius according to the previously published protocol ( Yoshida et al. 2010 ).


Linking Gene Expression and Phenotypic Traits

To advance our understanding of how molecular mechanisms allow organisms to adapt to and persist in altered environments, we linked gene coexpression networks with changes in phenotypic traits using resurrected Daphnia isolates separated by centuries of evolution and anthropogenic change. Network analyses allowed us to identify gene clusters and their networks that may underlie organismal responses to environmental shifts. To provide a direct phenotype–genotype link, we applied such a network approach and combined it with quantitative trait data observed in members of a single Daphnia population before and after a historic shift in nutrient supply associated with modern agricultural activities (Frisch et al. 2014; Roy Chowdhury et al. 2015). Specifically, we explored the transcriptional regulation of two physiological traits related to P acquisition (RE and bP), and a higher order phenotypic trait dependent on RE and bP, that is, somatic GR, using a trait-associated gene coexpression network. The resulting network suggests a strong relationship of transcriptional responses with P-supply, with over 50% of the 17 observed modules significantly associated with the P-related phenotypic traits.

Our analysis identified distinct genes and pathways that were tied to individual phenotypic traits, and that are potential candidates for further exploration of their role in evolutionary adaptation to P enrichment.

Retention Efficiency

Two hubgenes of brown_RE belonged to the jumonji gene family, which is known to regulate chromatin organization and thus gene expression (Takeuchi et al. 2006). This finding suggests a certain degree of epigenetic regulation (previously described as DNA compaction in Daphnia; Jalal et al. 2014) in RE. Overrepresentation of genes in purple_RE involved in amino acid metabolism (including trypsins) may indicate the exploitation of alternate P-sources under P-limitation, as seen in plants (Abel et al. 2002). One of the three top hubgenes was an MCO, a member of a gene family essential for iron metabolism in many organisms (Lang et al. 2012). Previous research in Daphnia identified a significant interaction between P-limitation and iron kinetics (Lind and Jeyasingh 2018). This finding, together with our results, suggests a central role of MCOs in modulating essential cellular processes under P-limitation.

Body P

Regulation of body P might be necessary in order to counteract the effect of unusually high RE in ancient clones, and to retain cellular homeostasis, for example, by active release of inorganic P ( Rigler 1961) or moulting ( He and Wang 2007).

We speculate that the tan_bP genes highly expressed under HiP in ancient clones, including many nonannotated genes (supplementary fig. S5c, Supplementary Material online), may contribute to the maintenance of defined bP concentrations when P is abundant. Regulation of bP may additionally be achieved by one of the hubgenes of greenyellow_bP, identified as a histone tail methylase. Histone tail methylation has profound effects on gene transcription and can be passed transgenerationally in invertebrates, with the possibility of a long-lasting epigenetic memory of environmental conditions (Klosin et al. 2017). Correlation of bP with genes involved in protein metabolism suggests that both ancient and modern genotypes are able to maintain homeostasis in body P-content in response to dietary P-supply by producing metabolic adjustments in P-usage.

Growth Rate

In contrast to the trait–module correlations of RE and bP, which were driven by evolutionary history (contrasting ancient or modern clones), GR module correlations were driven by treatment, with similar responses of ancient and modern clones, suggesting environmentally induced gene expression. The observed functional enrichment in signaling cascades involving transmitters and receptors indicates such environmental triggering of gene expression, particularly in blue_GR. According to the Growth Rate Hypothesis (Main et al. 1997; Sterner and Elser 2002), GR is a trait that strongly depends on various molecular and physiological parameters controlling P-allocation to ribosomal RNA. Thus, when P supply in the environment is not limiting, these signaling cascades may lead to increased rRNA biogenesis, thus increasing GR (Sterner and Elser 2002). Coregulation of these genes by several transcription factors lends further support to this idea: genes controlled by the top three promoter motifs in each of the two modules reflect the same functional enrichment as the entire module (fig. 3c). Notably, a large number of blue_GR genes are potentially coregulated by two or more promoter binding sites (fig. 3c). Phosphoglycerate dehydrogenase, one of the lightgreen_GR hubgenes, is crucial to the biosynthesis of L-serine, an amino acid central to cellular proliferation (de Koning et al. 2003). One of the blue_GR hubgenes, carbonic anhydrase, is a zinc metalloenzyme involved in several biological processes such as CO2 transport, maintenance of acid-base balance, and glycogen and lipid synthesis (Zolfaghari et al. 2014). Both enzymes therefore mediate multiple pathways that can play a significant role in regulating growth. Enriched gene families of blue_GR genes included signal transduction mechanisms, involving phosphodiesterases and phosphatases. These are known to be involved in P-scavenging in plants (Plaxton and Tran 2011), but also in Daphnia (e.g., alkaline phosphatase; McCarthy et al. 2010).

Evolution of Gene Expression Patterns and Networks

Adaptation to environmental change is typically associated with divergent gene expression patterns (DeBiasse and Kelly 2016; Kenkel and Matz 2016; Sikkink et al. 2019). The trait-associated transcriptional responses of ancient and modern Daphnia observed here support such findings (fig. 2b), providing evidence of distinct evolved patterns of gene expression: constitutive gene expression (brown_RE), conserved gene expression plasticity (lightgreen_GR, blue_GR), and evolved plasticity (purple_RE, greenyellow_bP), sensu Renn and Schumer (2013). Given that the interaction of genes within modules is stronger than between modules, and that modules are regarded as “semi-independent” units that evolve independently due to reduced pleiotropic constraints (Wagner et al. 2007; Snell‐Rood et al. 2010; Lotterhos et al. 2018), the coexistence of these observed gene expression patterns within a single network strongly supports the idea of individual evolutionary trajectories of these trait-associated modules.

Plastic gene expression in response to P-supply was common to almost all focal modules, but was not limited to ancient or modern clones ( fig. 2 and table 1). For example, RE correlated strongly with modules that showed signs of newly evolved plasticity (i.e., purple_RE). In contrast, genes in both GR-associated modules (i.e., blue_GR, lightgreen_GR) maintained similar gene expression plasticity in ancient and modern clones. While our data suggest that gene expression is often plastic, such plasticity did not always translate into similar plasticity in the tested phenotypic traits (e.g., GR): a plastic gene expression in ancient clones was less obvious in their phenotypic response. A potential explanation for this may be the complexity of this trait that depends on many other factors, including nutrient availability and assimilation, and other factors involved in cellular and developmental processes.

Complementing the trait-associated network, the use of network preservation statistics can identify the “wiring” of molecular mechanisms that are shared or divergent between ancient isolates and their modern counterparts (Oldham et al. 2006). Our results highlight a highly preserved network structure, with >70% of the ancient Daphnia modules preserved in modern descendants, that was also reflected by shared gene regulatory mechanisms in ancient and modern modules (i.e., bluePres). Such a pattern is not unexpected in members of the same population, considering a similar level of preservation in closely related taxa that diverged from a common ancestor several million years ago (such as humans and chimpanzees; Oldham et al. 2006). However, the analysis of network preservation also revealed patterns of evolutionary divergence of ancient Daphnia and their modern descendants in individual modules. In this context, the detection of a newly formed module (yellowPres) in the modern Daphnia genotypes was especially striking. Such new modules provide evidence for evolutionary novelty on the level of transcription (Oldham et al. 2006). Analysis of the gene expression pattern of yellowPres revealed a plastic response of module members to P-availability, highlighting the role of gene expression plasticity in the evolutionary adaptation to P-supply. However, we found that in order to obtain detailed information about the evolutionary history of such plasticity, the consideration of network preservation statistics and resulting modules in isolation is insufficient. By integrating preservation and trait-associated networks, we were able to establish a link between the evolution of gene expression and phenotypic plasticity. 
Here, the presence of a high percentage of the yellowPres genes in two trait-associated modules, purple_RE (“evolved plasticity”) and blue_GR (“conserved plasticity”) indicated the coexistence of different types of plasticity in a single module (here: yellowPres), and that this plasticity may be associated with different traits.

Concluding Remarks

To genuinely advance the understanding of phenotypic evolution, comprehensive methods are required that consider entire organisms instead of single traits ( Forsman 2015). Such a holistic understanding is vital in order to predict evolutionary trajectories that result from major geochemical shifts that currently affect our planet, and is essential for the development of conservation strategies. The results presented here are a contribution toward such an understanding, and emphasize the need for an integrative approach that combines physiological and “omics data” in keystone species.

Genomic manifestation: theory predicts relaxed selection on loci underlying genetically controlled phenotypic plasticity and thus higher genomic variation in associated genes (Snell‐Rood et al. 2010). This raises the question whether the observed differences in gene expression between ancient and modern Daphnia clones are manifested in the genomic sequence, for instance as increased genetic divergence in modules with evolved plasticity.

Molecular regulators of phenotypic plasticity: while transcriptional regulation provides a critical mechanism for organisms to respond rapidly and efficiently to environmental change (Turner 2009), the contribution of different molecular regulators of plasticity (e.g., epigenetic modifications, transcription factors) remains largely unknown and should be considered in future research.

Role of hubgenes: on a functional level, recent advances in molecular techniques now allow for a detailed analysis of network structures to test whether molecular cascades collapse when predicted hubgenes are modified via gene knockdown or gene editing approaches (e.g., RNAi, or CRISPR and TALEN techniques; Nakanishi et al. 2014; Naitou et al. 2015; Rivetti et al. 2018).

Contrasting resurrected members of populations that lived hundreds of years ago with their modern descendants, as done here, is a rare opportunity to track evolutionary trajectories in natural environments. Our study highlights the prospects of resurrection ecology when integrated with modern biology. It further emphasizes the applicability of this approach to numerous other organisms that produce dormant stages with long-term viability, and its significance for an in-depth understanding of evolutionary adaptation to global environmental change.

Results and Discussion

Old-Biased Genes Are Not under Weaker Selection

Evolutionary theories of ageing predict weaker selection on genes which are expressed in old individuals due to low effective population size and reduced fecundity (Kirkwood and Austad 2000; Flatt and Partridge 2018). In ant queens, we may expect a reduction of this “selection shadow” as low extrinsic mortality and lifelong, high fertility should lead to a stable effective population size up to old age. We tested this by estimating and comparing selection strength between three groups of genes. These were 1) old-biased genes (n = 46): significantly over-expressed in seven old (18 weeks) compared with seven young (4 weeks) C. obscurior queens; 2) young-biased genes (n = 96): significantly over-expressed in young compared with old queens; and 3) unbiased genes (n = 2,616): no significant difference in expression between young and old queens. To estimate direction and strength of selection, we measured dN/dS (ratio of nonsynonymous to synonymous substitution rates) for one-to-one orthologs with a set of 10 ant species (see Materials and Methods). A dN/dS ratio ≈ 1 indicates neutral evolution, whereas values ≪ 1 signify purifying selection. We find no evidence for weaker purifying selection in old-aged queens, since dN/dS in old-biased genes (median: 0.084) is in fact significantly lower than in young-biased genes (median: 0.127; P = 0.016, Mann–Whitney U test; fig. 1), indicating increased purifying selection with age. This is in contrast to published results for age-biased genes in humans, in which old-biased genes had a significantly higher dN/dS (median: 0.22) than young-biased genes (median: 0.09; P = $1.4 \times 10^{-50}$), as would be expected for a reduction in purifying selection with age (Jia et al. 2018). This was confirmed by a further study on several mammalian tissues, in which an adjusted dN/dS metric correlated more strongly with expression in young compared with old individuals (Turan et al. 2019). 
Interestingly, dN/dS in young-biased genes is also significantly higher than in unbiased genes (median: 0.100; P = $2.2 \times 10^{-4}$, Mann–Whitney U test), as has previously been reported for the ant, Lasius niger (Lucas et al. 2017). To further test the ability of this method to detect a selection shadow in insects, we repeated the analysis for D. melanogaster. Age-biased gene expression was measured for a novel data set containing expression data for young (10 days) and old (38 days) female flies across two tissues (head and fat body) and different feeding regimes. Evolutionary rates were obtained for these genes from published analyses based on alignments of 12 Drosophila species (Clark et al. 2007). In contrast to our results for ant queens but in agreement with expectations for a selection shadow, we find significantly higher dN/dS levels in old-biased fly genes (median: 0.060) compared with young-biased genes (median: 0.047; P = $5.1 \times 10^{-8}$, Mann–Whitney U test).

Evolutionary rates (dN/dS) in genes with unbiased expression, young-biased, and old-biased expression in C. obscurior queens and D. melanogaster adult females. Significance was tested with Mann–Whitney U test.
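For readers unfamiliar with the test used for these group comparisons, the sketch below computes a two-sided Mann–Whitney U test in plain Python. It rests on stated assumptions: it uses the normal approximation with a tie correction and no continuity correction, which may differ from the exact procedure or software settings used in the study, and the function name is hypothetical. The inputs would be two lists of per-gene dN/dS values, for example one for old-biased and one for young-biased genes.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided P value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        t = j - i                    # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are identical")
    z = (u_x - mean_u) / sqrt(var_u)
    p_two_sided = erfc(abs(z) / sqrt(2.0))
    return u_x, p_two_sided
```

Because the test works on ranks rather than raw values, it suits dN/dS distributions, which are typically skewed toward small values.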
