Critical discussion: Gene expression and systems biology allows us to understand the control of complex traits

I'm having a bit of trouble tackling this question. Any help on what I should write about, or the points I should mention, would be much appreciated. I keep seeing articles about eQTL and QTL; are these relevant as well?

This turned out to be too long for a comment.

If you are looking for an explanation of the title/question, it is roughly as follows.

If we obtain gene expression data from all the different cell types, and also extract the interactions between those genes (e.g., if gene a goes up, gene b goes down), we can study multigene traits such as quality of life, height, mental disabilities and cardiovascular risk. The point is that for most health problems caused by variation in a single gene, we already know the gene involved, or can find it easily.

More complex conditions are often hidden in statistics, and the links are less clear. For example, so far, known genetic variants account for fewer than roughly 30% of autism cases. In the remaining cases there is no clear genetic predisposition or strong evidence for one. This is mainly because many genes interact to produce the final trait or illness, so examining one gene at a time under the microscope will not reveal it.
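
To make the "many genes, each with a small effect" point concrete, here is a toy simulation in Python. It is only a sketch: it assumes a purely additive model with 100 hypothetical loci of equal effect, no gene-gene interactions and Gaussian non-genetic noise, and every number is made up for illustration rather than taken from real data.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_people=2000, n_loci=100, noise_sd=3.0, seed=1):
    """Toy additive model: each hypothetical locus contributes 0, 1 or 2 trait units."""
    rng = random.Random(seed)
    genotypes = [
        [int(rng.random() < 0.5) + int(rng.random() < 0.5) for _ in range(n_loci)]
        for _ in range(n_people)
    ]
    # Trait = sum of many small genetic effects + non-genetic noise
    trait = [sum(g) + rng.gauss(0.0, noise_sd) for g in genotypes]
    return genotypes, trait


genotypes, trait = simulate()

# Fraction of trait variance (r^2) explained by each locus on its own
single_r2 = [
    pearson_r([g[j] for g in genotypes], trait) ** 2
    for j in range(len(genotypes[0]))
]

# Fraction explained by adding up all loci (a simple polygenic score)
score = [sum(g) for g in genotypes]
combined_r2 = pearson_r(score, trait) ** 2

print(f"best single locus r^2: {max(single_r2):.3f}")
print(f"all loci combined r^2: {combined_r2:.3f}")
```

In this setup each locus contributes a variance of 0.5 out of a total of roughly 59, so on its own it can explain only about 1% of the trait variance, while the summed score explains most of it. Real traits also involve gene interactions and environment, which this toy leaves out.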

You should look into multigene traits; perhaps pick your favourite illness (e.g. autism) and see how it is inherited (tip: not fully). Whole-genome sequencing is an important tool for tracing such issues because it gives a broad overview and a lot of genetic information about the patient.

Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324 India

Both authors contributed equally: Lekha T Pazhamala and Himabindu Kudapa

Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324 India

Both authors contributed equally: Lekha T Pazhamala and Himabindu Kudapa

Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria

Vienna Metabolomics Center, University of Vienna, Vienna, Austria

ARC Centre of Excellence in Plant Energy Biology and School of Molecular Sciences, The University of Western Australia, Perth, WA, Australia

Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, 502 324 India

State Agricultural Biotechnology Centre, Crop Research Innovation Centre, Food Futures Institute, Murdoch University, Murdoch, WA, Australia

Rajeev K Varshney, Center of Excellence in Genomics & Systems Biology, ICRISAT, Patancheru- 502 324, Hyderabad, India

Assigned to Associate Editor Henry T. Nguyen.

Complex systems biology

Complex systems theory is concerned with identifying and characterizing common design elements that are observed across diverse natural, technological and social complex systems. Systems biology, a more holistic approach to study molecules and cells in biology, has advanced rapidly in the past two decades. However, not much appreciation has been granted to the realization that the human cell is an exemplary complex system. Here, I outline general design principles identified in many complex systems, and then describe the human cell as a prototypical complex system. Considering concepts of complex systems theory in systems biology can illuminate our overall understanding of normal cell physiology and the alterations that lead to human disease.

1. The science of complex systems theory

Science and technology allow us to understand our environment as well as to manipulate it and create new environments and new systems. This has allowed humans to emerge out of nature and, more recently, to create new complex worlds that closely resemble natural systems [1]. Human-made systems often follow the same design principles that govern natural systems. The most important of these design principles is evolution by natural selection [2]. However, human-made systems are not exactly the same as those created by nature. We are gaining an increasing ability to create new complex environments and new machines that perform as well as, or even better than, natural organisms [3]. Man-made complex systems, such as stock markets or multi-user online social networks, together with technologies that collect and process increasing amounts of data, offer us an opportunity to better observe and understand complex systems, natural or man-made. We can increasingly measure the activity of the variables that constitute these systems. This provides a better glimpse into the quantity and connectivity of most variables that control a complex system. When all these variables work together, they make up a system that appears to us as a single living unit.

We are beginning to realize that, in general, complex systems, man-made or natural, share many universal design patterns: concepts and principles of design that reappear in diverse, seemingly unrelated systems [4,5]. These design patterns are the essential elements for building successful complex systems that can function, compete, survive, reproduce and evolve for long periods through multiple generations towards increased fitness and overall growth. The science of complex systems theory attempts to understand these repeating design principles that reappear in different natural and man-made complex systems and environments [6]. The goal of complex systems science is to define these properties more precisely, towards a greater understanding of complex systems as a whole, beyond the understanding of one specific system or one specific design concept. Better understanding these universal principles will enable us to better digest the rapid changes that occur around us due to technological and social evolution [3]. To study and understand complex systems, researchers, when possible, conduct multivariate experiments, recording measurements of the system's variables under relatively controlled conditions to track the system's dynamics under different perturbations over time. These measurements and recordings are used to build models, which are needed to generate hypotheses consistent with the data. Models attempt to represent the system at a coarse-grained level of abstraction, a skeleton of the real complex system under investigation. The process of modelling aims to capture the essence of the complexity, abstracting the real system into a manageable size that is cognitively, mathematically and theoretically explainable. Models that simulate real-world complex systems are built to capture the dynamics and architecture of a system in order to predict its future behaviour and to explain its past behaviour.
Such models help us to better understand and potentially fix system failures, such as those happening in disease processes inside human cells. The famous saying about models is that they are all wrong, but some are useful [7], and as such, models play an important role in understanding and taming complex systems. From these models, insightful theoretical rules can be extracted.
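
As a minimal illustration of building a model from recorded measurements and using it to predict future behaviour, the Python sketch below fits an exponential growth model to a short time series by least squares on the log scale. The measurements are hypothetical, and the exponential form is an assumption: a real cell population would eventually saturate, which is exactly the kind of simplification the "all models are wrong" remark refers to.

```python
import math


def fit_exponential(times, values):
    """Least-squares fit of log(y) = log(y0) + r * t; returns (y0, r)."""
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_t, mean_l = sum(times) / n, sum(logs) / n
    r = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    log_y0 = mean_l - r * mean_t
    return math.exp(log_y0), r


def predict(y0, r, t):
    """Model prediction at time t (an extrapolation outside the fitted range)."""
    return y0 * math.exp(r * t)


# Hypothetical measurements of a growing cell population (illustrative only)
times = [0, 1, 2, 3, 4]
counts = [100, 165, 270, 450, 740]

y0, r = fit_exponential(times, counts)
print(f"fitted y0 = {y0:.1f}, growth rate r = {r:.3f}")
print(f"predicted count at t = 6: {predict(y0, r, 6):.0f}")
```

Fitting on the log scale turns the nonlinear model into a straight line, so ordinary least squares can be used without an iterative optimizer.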

However, while we desire dynamical models that would explain the behaviour of complex systems, in reality these models are often too difficult to construct, and when constructed, they suffer from many shortcomings, mainly because of missing information. The problem is both a lack of data and a data deluge. For dynamical models to be realistic, they need accurate initial conditions, exact causal relationships between system variables [8] and defined kinetics. Such data are often not easily observable. Hence, dynamical models of complex systems suffer from the free-parameter problem, where many models can fit the same observed data [9]. The other issue with dynamical models of complex systems is the nonlinearity characteristic of complex systems [10]. Because of the complex relationships between the variables in complex systems, the dynamics of the system quickly become nonlinear and complex, in ways that current mathematics cannot explain well. Statistical methods such as correlation analysis, on the other hand, are simpler approaches that are much more practical today [11]. Correlation-based approaches do not fully explain the system's behaviour over time; however, because data are plentiful but also incomplete and inaccurate, finding correlations between system variables provides immediate new knowledge.
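
A minimal sketch of correlation analysis in Python: compute the Pearson correlation for every pair of measured variables and keep the strongly correlated pairs as edges of a network. The gene names, expression values and the 0.8 cut-off are hypothetical choices for illustration. The resulting edges indicate association, not causation.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def correlation_network(profiles, threshold=0.8):
    """Return edges (a, b, r) for every variable pair with |r| >= threshold."""
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(profiles[a], profiles[b])
            if abs(r) >= threshold:
                edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical expression levels of four genes across six samples
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # rises with geneA
    "geneC": [6.0, 5.1, 3.9, 3.2, 2.0, 0.9],   # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],   # only weakly related
}

for edge in correlation_network(profiles):
    print(edge)
```

Negative correlations are kept as edges too (the sign is stored with each edge), since "if a goes up, b goes down" is as informative as co-variation in the same direction.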

In biology, emerging technologies such as deep sequencing of DNA and RNA [12], mass spectrometry proteomics [13] and metabolomics [14] allow a glimpse into the dynamical state of many components making up the complex systems within human cells. These emerging multivariate biotechnologies, although inaccurate and noisy, help accelerate the discovery of the inner workings of cells in their entirety because they can measure the levels of thousands of molecular species all at once, in one experiment. As more knowledge is accumulated about complex systems such as the human cell, this knowledge can be fed back into mathematical or computational models to refine them and make them more accurate. This additional information adds power and value to the models' ability to capture the systems' functionality in greater detail, and enables better predictions about how components and processes of the system come together to produce cellular behaviours, such as responses to stimuli that induce cell proliferation, cell growth, cell differentiation/specialization or programmed cell death. The goal is to fill in the missing pieces of the model's puzzle towards a better understanding of specific complex systems such as the natural cell. With the accumulation of more data, the scientific method is transforming to rely increasingly on the organization, integration, visualization and utilization of background prior knowledge extracted from large datasets composed of measurements recorded from the variables of real complex systems. This computationally organized background knowledge is used to analyse newly acquired data [15]. As technology advances, recorded data about a complex system's history are accumulating more rapidly than our current ability to store and analyse such data for useful understanding, or, in other words, for optimal knowledge extraction.
As storage devices are rapidly decreasing in cost, and devices to record almost everything around us are emerging rapidly, we find ourselves surrounded by a sea of data [11]. Such data provide great opportunity to conquer the secrets of complexity but also overwhelm us with bits and bytes of data with no clear meaning. We often find ourselves only using a small fraction of the measured data, only scratching the surface of a mine full of treasures.
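
The idea that accumulating measurements refine a model can be sketched with a single unknown parameter. The Python example below uses Welford's online algorithm to update the estimate of a hypothetical rate parameter as noisy measurements stream in. It assumes independent Gaussian measurement noise, which real biological data rarely satisfy so neatly.

```python
import math
import random


class RunningEstimate:
    """Welford's online algorithm for the mean and variance of a data stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new measurement into the estimate without storing old data."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def std_error(self):
        """Standard error of the mean; undefined (inf) with fewer than 2 points."""
        if self.n < 2:
            return math.inf
        variance = self._m2 / (self.n - 1)
        return math.sqrt(variance / self.n)


rng = random.Random(42)
true_rate = 0.7  # hypothetical parameter the model is trying to learn
est = RunningEstimate()
for n_seen in range(1, 10001):
    est.update(true_rate + rng.gauss(0.0, 0.5))  # one noisy measurement
    if n_seen in (10, 100, 1000, 10000):
        print(f"n = {n_seen:>5}: estimate = {est.mean:.3f} +/- {est.std_error():.3f}")
```

The uncertainty of the estimate shrinks roughly as 1/sqrt(n), so each tenfold increase in data buys only about a threefold gain in precision.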

2. Emerging patterns in complex systems

Different areas of scientific research such as computer science, sociology, mathematics, physics, economics and biology are increasingly realizing the importance of complex systems theory, because the same design patterns and concepts are emerging in these different fields of science. Models that capture complex systems' structure and dynamics are commonly explained by a few governing principles such as survival of the fittest [2], rich-get-richer [16] and duplication–divergence [17] whereas in fact, there are more forces all acting in concert to shape the structure and behaviour of many different types of complex systems. In combination, these forces can work in parallel, and sometimes counteract one another, to produce the final outcome behaviour of the system that is manifested as continual dynamical and functional structural changes. Different complex systems have slightly different sets of forces, different ingredients that compose their wholes. The proper combination of design concepts and forces, if understood correctly, can lead to an ability to better create, control, predict and fix the complex systems around us, including ourselves and our society, and our natural, economic and technological environments. The human cell, multicellular organisms, economic systems, intricate engineered systems and the Web are all evolving complex systems existing in complex and ever dynamically changing environments. These systems share similar emerging design patterns, the blueprint for generating a complex system. Some of those patterns can be unravelled using modelling.
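
The rich-get-richer principle is commonly modelled as preferential attachment. The Python sketch below grows a network one node at a time, linking each new node to one existing node chosen with probability proportional to that node's current number of links (a one-link-per-node variant of the Barabási–Albert model; the network size and random seed are arbitrary).

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, seed=0):
    """Grow a network where each new node links to one existing node,
    chosen with probability proportional to that node's degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per link it has, so a uniform
    # pick from the list is a degree-proportional pick of a node.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges


edges = preferential_attachment(5000)
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

print("top hubs (node, degree):", degree.most_common(3))
print("median degree:", sorted(degree.values())[len(degree) // 2])
```

A few nodes typically end up as highly connected hubs while most nodes keep a single link. Keeping a list with one entry per link endpoint avoids recomputing degree-weighted probabilities at every step.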

3. Complex environments versus complex agents

When using the generalized term complex systems and discussing concepts of complex system design, we can distinguish between two main types: complex environments and complex agents. Complex agents are those systems that have clearly defined boundaries, a physical border that encases the system. Complex agents typically have one or a few central processing units, a clock, and mechanisms to efficiently obtain and use energy. The agents commonly include sensors and actuators. These types of complex systems interact with their environments through their sensors and actuators, and can typically move, grow, self-repair and self-reproduce. Often, these agents are aware of their existence. Some examples of complex agents are us, our cells, trees, birds, fish, worms, cars, airplanes and some robots (figure 1). The complex agents exist in complex environments, or within other larger encompassing complex agents. Complex environments, on the other hand, have less defined boundaries, and their governance is also commonly not well defined. These complex systems typically do not have a central processing unit; that is, they do not have a single central brain. Agents in such complex environments are sometimes all similar, or of the same type, or at least have some basic properties in common. Agents in complex environments act as individuals, but together they give rise to the entire dynamics of the system. Examples of complex environments are natural and man-made ecosystems such as flocks of birds, cities, traffic systems, beehives, countries or social networks (figure 1).

Figure 1. Examples of complex environments: flock of birds, beehive, social networks, cities and states. Examples of complex agents: plane, worm, car, fish, cell, bird, tree, robot. Complex environments gradually tend to evolve into a complex agent. Once many copies of a complex agent exist, these copies can populate a new complex environment.

The distinction between complex agents and complex environments is blurry because some typical properties of complex environments are present in some complex agents and vice versa. Complex environments are typically populated by complex agents. Intuitively, complex environments grow faster as they become more complex and diverse. On the other hand, complex agents become less flexible as they grow in complexity, so, in principle, evolution slows down as complexity increases for complex agents. As there is a blurry line that separates complex agents from complex environments, it is plausible that these complex systems are just at different stages of their evolution. The complex environments are at the young, newly created stage of a complex system. Over time, these complex environments will begin to congeal, accumulating properties of complex agents one by one as they evolve towards becoming an agent. However, once the system is completely an agent, and there are many almost exact copies of those agents in the environment, these many interacting agents will populate complex environments (figure 1, arrows). This abstract view can be supported by our basic understanding of how biological natural cells came into being, or how multicellular organisms evolved from unicellular organisms. At first, the system was a complex environment where cellular components such as RNA were mixing in the primordial soup [18]. Once more organization had evolved, cells were formed, surrounded by their membranes. Then membranous cells evolved to have sensors and other components that made them become prototypical agents. Once cellular agents existed and proliferated, they started forming multicellular organisms. The first multicellular organisms were created by the same type of cells, but then cell types emerged where different cells assumed different specialized roles. 
As cells became increasingly specialized, they also became more dependent on one another, ultimately producing a new type of complex agent, that is, a multicellular organism. Hence, complex environments may be just at an early stage of the evolutionary process of complex systems, gradually moving towards becoming complex agents. Once many complex agents of the same type exist in an environment, they can form a new layer of complexity, which can serve as a foundation for the next layer.

4. Natural versus technological evolution

Complex systems have emerged through natural or man-made evolution. This has produced parallels between natural and technological systems despite their differences. While natural evolution has been operating for billions of years, man-made technological and economic evolution has made a significant impact on the Earth only in the past few thousand years [1]. Hence, evolutionary rates differ greatly between the two types of complex systems: man-made versus natural [3]. Natural evolution must wait for random favourable mutations in the DNA of an organism to occur over many generations, whereas in technological evolution new ideas can become new products overnight. Technological evolution moves at different rates across the planet, but overall, since the industrial revolution, the complexity of man-made systems seems to have been growing at an accelerating rate. Evolutionary rates also vary across the planet for natural evolution. In the rainforest, many species can rapidly emerge because conditions there are plentiful and favourable for life: there is fresh water, sun and rain, and the temperatures are just right for biological life to evolve and thrive. Other areas on the planet, such as arid hot or cold deserts, do not promote rapid natural evolution, and complexity emerges there more slowly. Permissive conditions for growth are obvious for natural systems but less well defined for technological evolution. Technological evolution is moving at much faster rates in major cities or on the Web, where interactions between people and the demand for new products are greater than in less habitable regions of the globe. However, there are forces that balance these trends. Geographical diffusion of innovations [19] and the spread of complexity carry technological and natural complexity to remote places on Earth.
Technological complexity is increasingly populating the air, sea and outer space. The sea is full of natural life, but it is not favourable for human life and technological evolution. Space, on the other hand, might be found to be the best place for robots and computers because it is isolated from damaging heat, dust and bacterial agents [20].

5. Types of systems versus their instances

A snapshot of a complex system at one particular moment in time captures the state of the system's variables at that time. Such a frozen-in-time state of a system is the manifestation of the instantiation of variables of different types. The distinction between variable types and instances of variables, or between complex system types and actual complex systems, is critical for introducing more clarity. An instance of a variable that is part of a complex system, or the state of an entire complex system, typically follows the born-live-and-die cycle. The variable type, or the complex system type, on the other hand, is an abstract representation of the kind of variable or complex system it is. It is not an actual physical entity but a template. Both complex system and variable instances, as well as their types, can evolve. However, actual instances of variables, or entire complex systems, evolve only while they are present, or alive, whereas templates can evolve indefinitely. You are an instance of the human template. The template of a variable, or the type of a complex system, the abstract generalization of the real thing, can evolve without needing to be bound to a real existence. The template does not have temporal boundaries. In computer programming languages, the distinction between variables and variable types is clear. Variables can be of different types. Variables are first declared, and thereby instantiated. They are then assigned values that fit their type during program execution. Such values can change while the program is running, and the variables containing the values live within the program for a short period of time while the program runs. Similarly, cells have DNA that serves as a template to produce instances of RNA and protein molecules.
Such analogies can help with considering the distinction between an instance and a type, or a template, of a complex system or a variable within a complex system.
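
The type-versus-instance distinction can be made concrete in Python, following the DNA-to-RNA analogy. In this sketch the `RNA` class plays the role of the type, each object is an instance that is born, ages and dies, and the DNA string persists unchanged as the template. The pairing table is a simplification that ignores strand orientation, and the three-step lifetime is a hypothetical value.

```python
from dataclasses import dataclass

# Simplified base-pairing rule for transcribing a DNA template strand into RNA
# (strand orientation is ignored for illustration)
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


@dataclass
class RNA:
    """The class is the type (template); each object is one instance."""
    sequence: str
    age: int = 0
    alive: bool = True

    def tick(self, lifetime):
        """Age by one time step; the instance dies once it reaches its lifetime."""
        self.age += 1
        if self.age >= lifetime:
            self.alive = False


def transcribe(template):
    """Create a new RNA instance from a DNA template; the template is not consumed."""
    return RNA("".join(PAIRING[base] for base in template))


dna_template = "TACGGT"
population = []
for step in range(6):
    population.append(transcribe(dna_template))      # birth
    for molecule in population:
        molecule.tick(lifetime=3)                    # life
    population = [m for m in population if m.alive]  # death

print(dna_template, "->", population[0].sequence, "| live copies:", len(population))
```

Individual RNA objects come and go every few steps, while the template string and the class definition persist for the whole run, mirroring the born-live-and-die cycle of instances versus the persistence of types.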

6. Summary of design principles with initial relations

Complexity theory often focuses on only a few of the design principles of complex systems, most of the time applied to only one real-world complex system: ironically, still reductionism. The reductionist view proposes that complex systems are made of parts, and that understanding these parts can lead to an understanding of the entire system [21]. This view dominated science in the past, but it is now accepted that new methods are required to better understand complexity: how the parts come together to give rise to something greater than the parts [22,23]. To achieve such understanding, it may be insightful to examine how the design patterns of complex systems are related. To develop intuition about this idea, an initial collection of design principles of complex systems is presented below, with a brief description of each principle. The next step is to try to identify how these principles are related. The hope is that the relationships between these design principles will become immediately and intuitively obvious. One thing to keep in mind is that the definitions of many of these abstract concepts may not be precise; this is a problem because one definition may mean different things to different people. These definitions can surely improve, but making them perfect is challenging and may require formal mathematical representation. The descriptions of the design principles presented below are abstract but real. So, for the moment, try not to worry about the specific phrasing of the definitions, and focus instead on the essence of their meaning. Some of these design principles are observed in complex systems in general, covering both natural and technological systems, with some hinted relationships between concepts.

Survival of the fittest is a central design pattern shaping complex systems [2]. This concept is an outcome of competition. Competition is often not fair, where the rich and fit usually become richer or fitter faster than the others [16]. Rich get richer is a growth process where the rich, the ones having many relationships, central, essential and fit, grow faster than the poor, lonely, unfit, weak and less-connected. Complex agents in complex environments commonly also grow by duplication–divergence [17]. Duplication–divergence is a known biological design principle of natural evolution that is also common in technological evolution, economics or on the Web. For example, successful car models, websites and software in general evolve through duplication–divergence. Hence, the successful novel and fit complex agent, organism or product can become an attractor, drawing more connections and copies from it than to it [10]. Sometimes successful novel and fit complex agents emerge from the merger of two existing agents, to form a new innovative and more competitive agent, or product, or organism. Once successful, innovative agents replicate and diversify fast. So innovation plays an important role in the continual evolution of a complex system. Innovations can only become realized on the foundation of already existing, solidified and successful previous innovations [19]. Hence, as mentioned above, complex systems are organized in layers where each layer establishes a solid foundation for the next-order layer to be able to evolve.
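
Duplication–divergence can also be written as a simple network-growth rule. The Python sketch below implements one common variant (an assumption, not necessarily the model of reference [17]): a random node is copied, the copy keeps each of the original's links with probability `retain_prob`, and copies that keep no links are discarded. The parameter value and network size are arbitrary.

```python
import random


def duplication_divergence(n_nodes, retain_prob=0.4, seed=0):
    """Grow an undirected network by duplication-divergence (one common variant).

    Each step copies a randomly chosen node (duplication); the copy keeps
    each of the original's links independently with probability
    retain_prob (divergence). Copies that keep no links are discarded.
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # start from a single linked pair
    nodes = [0, 1]
    while len(nodes) < n_nodes:
        original = rng.choice(nodes)
        kept = {nbr for nbr in adj[original] if rng.random() < retain_prob}
        if not kept:
            continue  # an isolated copy would disconnect the network
        copy = len(nodes)
        adj[copy] = kept
        for nbr in kept:
            adj[nbr].add(copy)
        nodes.append(copy)
    return adj


network = duplication_divergence(2000)
degrees = sorted(len(nbrs) for nbrs in network.values())
print("nodes:", len(network))
print("median degree:", degrees[len(degrees) // 2], "max degree:", degrees[-1])
```

Because a copy inherits part of its original's neighbourhood, nodes that are already well connected are more likely to gain new links through their neighbours' copies, which is one way duplication–divergence connects to the rich-get-richer principle.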

Another essential and related principle is information transfer. Information is constantly flowing, commonly compressed, decompressed and translated. Transmitters broadcast information, and then sensors intercept it. Agents in complex systems not only have the ability to passively listen and adapt to their environment, but can also communicate with the environment and change the environment to match their need. Sensors pass information about the state of the environment into the internal central processing centres. Before information is passed to such centres, the signal can be amplified and filtered. In the processing centres, classifiers use the information intelligently, learning from experiences to make optimal decisions about responding and adapting to the state of the environment the next time they are exposed to a previously experienced state. Hence, these classifiers use memory to determine the appropriate future response of the agent. Often this response is simply turning on or off a switch. Sensors, and other components that pass information, implement such switches as well as filters and amplifiers to convert noisy information from the environment to valuable and useful messages, often through the process of discretization or digitization. Tagging, symbolizing, grouping and classifying signals are ways to abstract many similar objects and observations related to forms from the environment into abstract simplified representations. Groups and classes are labelled, converted from their physical reality to symbols encoded into messages. These symbols make it easier for the central processing unit to process information from the environment, and to compute the appropriate response, which involves transmitting information to other complex agents. To compute the right response, internal processing centres use learning, memory and adaptation. 
The ability to adapt to new environments is critical for the survival of the complex agent living in the complex environment. Robustness to fluctuations and changes in the environment is required for overall fitness and viability [24]. However, a balance between rigidity, robustness and tolerance to change on the one hand, and flexibility on the other, is required to provide the level of plasticity needed for proper adaptation [25]. When learning is successful, responses are commonly automated. Automation is also needed for efficient production. Efficient and sophisticated mechanisms are in place to manufacture many (almost exact) replicas of complex agents and their parts. This allows the cycle of birth–life–death to continue and the complex system type to continually proliferate. The birth–life–death concept is related to the observation that complex systems and their parts are dynamically replaced by new parts, while global patterns of the entire complex system and ecosystem remain. For example, proteins in a cell continually turn over; the water molecules in a river are never the same, but the river stays in constant flow; cars on a highway keep passing; blood cells travel through blood vessels; and people commute to and from work, in and out of a big city. In some of those cases, the complex agents, or their parts, circulate, as with blood cells or commuters, while in other cases the flowing complex agents, or their parts, are completely replaced every time. Hence, complex systems have elaborate and efficient transportation systems that permit the transfer of resources and agents to remote locations quickly and efficiently. Such transportation systems are commonly organized in a tree-like hierarchical structure, where the leaves of the tree, the terminal locations on the tree-like system, often have a unique address encoded in a string of symbols.
The hierarchical structure of transportation systems is common in complex systems. To move around, locomotion is necessary: the ability of complex agents to move about in their complex environment. Economic systems rely on planes, ships and trucks to transfer goods and workers from one unique terminal address to another. Plants lack the ability to move, and this handicap is compensated by an amazing ability to use solar energy, a capacity to extract nutrients from the ground, and the capability to pollinate and reproduce effectively without the need to travel. Plants and other complex natural systems have seeds that contain compressed information that can be used to create completely new copies of the same complex agents. Such seeds often have mechanisms to travel and diffuse to reach their target for optimal fertilization. They are generated in many copies, each slightly different, of which only a few will be selected to give rise to the next generation.
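
The information-processing chain described for sensors (amplify, filter, then digitize into an on/off switch) can be sketched in Python. The gain, filter window and thresholds are hypothetical, and the two-threshold (hysteresis) switch is a design choice added here so that noise near a single threshold does not make the output flicker.

```python
import random


def amplify(signal, gain):
    """Multiply every sample by a constant gain."""
    return [gain * s for s in signal]


def moving_average(signal, window):
    """Low-pass filter: mean of the last `window` samples.

    Samples before t = 0 are treated as 0 (a quiet sensor before recording).
    """
    out = []
    for i in range(len(signal)):
        recent = signal[max(0, i - window + 1): i + 1]
        out.append(sum(recent) / window)
    return out


def hysteresis_switch(signal, on_level, off_level):
    """Digitize to 0/1: switch on above on_level, back off below off_level."""
    state, out = 0, []
    for s in signal:
        if state == 0 and s > on_level:
            state = 1
        elif state == 1 and s < off_level:
            state = 0
        out.append(state)
    return out


rng = random.Random(7)
# Hypothetical weak stimulus: off for 50 steps, on for 50, off for 50
stimulus = [0.0] * 50 + [0.2] * 50 + [0.0] * 50
noisy = [s + rng.gauss(0.0, 0.1) for s in stimulus]  # sensor noise

response = hysteresis_switch(moving_average(amplify(noisy, 10.0), 10), 1.5, 0.5)
print("".join(str(bit) for bit in response))
```

The raw stimulus is only twice the noise level, yet after amplification and averaging the switch output is a clean block of 1s during the stimulus and 0s elsewhere, with a short delay set by the filter window.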

Notable barriers are present to protect complex agents from other agents and from the outside. These containers, or modules, hide internals from exposed externals. The externals have an interface that facilitates communication with the environment and other systems, using standard protocols, symbols and flags. Related to this is the plug-and-play design principle, which allows reusability and generality. This principle permits complex systems to work together to form higher-order systems, and this modularity creates hierarchies. Interacting complex systems can switch between individual behaviour and behaviour as part of a pack. When in a pack, complex systems often form distinct geometrical shapes. Shapes in complex systems commonly tessellate, forming elaborate mosaics [26]. Polymorphic complex systems in a pack behave randomly in parallel but often display amazing synchronicity. Synchronicity can be achieved through governance, for example, by a conductor who signals to an orchestra, but often synchronicity in complex systems does not require governance. Randomness and noise are required for such emergent behaviour. Noise is also required for other aspects of dynamical behaviour that support complexity and evolution; in particular, noise helps a system avoid becoming stuck in a local evolutionary minimum. Randomness and noise result in a constant search for homeostasis, but complex systems never settle at a steady state forever [27]. Complex systems continually grow, improve in fitness and increase in complexity because their environment is constantly changing in that direction [28]. Phase transitions are short periods in which a system in a rather stable state undergoes one small change that induces many changes, turning the system into a new quasi-stable state [10]. Finding an improved fitness state is a design principle directly related to efficiency and energy utilization.
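
The claim that noise helps a system escape a local minimum can be illustrated with simulated annealing, a standard optimization technique used here only as a stand-in for noise in evolution. On a toy landscape with two valleys (the function is hypothetical), noise-free gradient descent started at x = 2 settles in the shallower valley near x = 1.13, while a noisy search whose noise level slowly decreases can also reach the deeper valley near x = -1.30.

```python
import math
import random


def f(x):
    """A toy landscape with a shallow local minimum and a deeper global one."""
    return x**4 - 3 * x**2 + x


def df(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def greedy_descent(x, lr=0.01, steps=5000):
    """Noise-free gradient descent: always moves downhill, so it can get stuck."""
    for _ in range(steps):
        x -= lr * df(x)
    return x


def noisy_search(x, steps=20000, step_size=0.5, t0=2.0, seed=0):
    """Simulated annealing: random moves, occasionally accepting uphill ones.

    An uphill move is accepted with probability exp(-increase / T), and the
    'temperature' T (the noise level) decreases linearly towards zero.
    Returns the best point visited and its value.
    """
    rng = random.Random(seed)
    fx = f(x)
    best_x, best_f = x, fx
    for k in range(steps):
        temperature = t0 * (1 - k / steps) + 1e-9
        candidate = x + rng.uniform(-step_size, step_size)
        fc = f(candidate)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temperature):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f


x_greedy = greedy_descent(2.0)
best_x, best_f = noisy_search(2.0)
print(f"without noise: x = {x_greedy:.3f}, f = {f(x_greedy):.3f}")
print(f"with noise:    x = {best_x:.3f}, f = {best_f:.3f}")
```

Early on, the high noise level lets the search climb over the barrier between the valleys; as the noise fades, it settles into whichever deep valley it has found, a rough analogue of the balance between exploration and stability described above.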

While most processes in complex systems use energy, and complex agents compete for energy resources, a system's use of energy is more concerned with overall fitness than with energy conservation and energy efficiency [29]. This is one of many concepts that make complex systems different from the typical systems studied in physics. However, energy conservation and efficiency can help complex systems to compete better. Interestingly, dead organisms often become the energy resource for other organisms, while highly decomposed organic material, crude oil, serves as the major energy source for the initial phase of the technological evolution we see today. Most complex systems produce waste; in balanced ecosystems, the waste from one complex system is a resource for another. However, technological man-made complex systems produce waste that is not well recycled. Related to this are feedback loops, which are important dynamical structures that set the creation of complex systems in motion. The primordial metabolic soup was made of simple enzymes forming competing feedback loops [18]. Competition involves taking action in markets, where trade can make two or more complex systems winners. Successful trade requires diversity of products and specialization of services. Winners in trade are often the innovators, or those who best listen to innovation. Trade results in cooperation, which can develop into symbiosis: the codependency of two separate complex systems on each other in order to coexist. Parasitism is a one-sided form of symbiosis. Parasitic complex agents exploit the success of their hosts for their own survival needs. Successful complex agents must learn how to self-repair and fight parasites, while parasites engage in a game of creative evasion strategies. Parasites sometimes kill their hosts, but usually not before they replicate and their copies jump to other hosts, so they can spread.

All the concepts listed above briefly introduce some of the design principles of complex systems, with some hinted relationships between them. However, more detailed explanations are needed to describe these concepts with less ambiguity, and specific examples are required to illustrate how they take shape in real-world natural and technological systems. Such detailed descriptions are beyond the scope of this review. Here, we are concerned with how some of these general observations about complex systems apply to human cells, and how such a perspective can inform systems biology.

7. The human cell: an example of a complex system

The human cell is a complicated living natural machine. The cells that together compose our bodies are a prototypical example of a natural complex system that evolved and was optimized over billions of years. Part of what makes human cells a typical complex system is that they are made of many different types of components, with many copies of each, all interacting in concert and in parallel to form a higher-order functional entity that is part of an organism.

We are made of tens of trillions of cells (commonly cited estimates range from about 30 to 50 trillion). Almost all of these cells contain the same genome, made of long DNA molecules that hold the template and symbolic instructions needed to make an entire organism. Information about how to construct a complete organism is thus highly compressed in the nuclei of human cells. Although the DNA in all our cells is the same, the approximately 400 different cell types constituting our body are markedly different from one another. This is because different sets of genes are expressed in each cell type. This differential gene expression is largely the result of different extracellular signals that instruct cells how to behave. Cells receive extracellular signals from other cells telling them which genes to express, and in turn what proteins to make and, ultimately, how to behave and which cell type to become. Cells can form elaborate structures and become specialized through such cell–cell communication protocols, which result either from cell–matrix interactions or from paracrine or endocrine signals coming from other cells. These signals are carried by small molecules that can pass through the cell membrane or bind to receptors at the cell surface; the receptors are the sensors of the cellular complex system. Intracellular cell-signalling pathways are triggered by the complex combination of extracellular factors, all acting in parallel to inform cells about the state of the environment. This form of signalling controls the dynamics of the gene regulatory networks that determine the cell's gene expression programme. Cell surface receptors span the lipid bilayer of the cell's plasma membrane, which is the barrier of the cell's complex system. These receptors listen to what is happening outside the cell and communicate changes in the environment to components inside the cell.
When the concentration of a neurotransmitter in a brain region, or of a hormone in the blood, is altered, receptors on the cell's surface can become activated or inhibited. Information about such a change is communicated to the cell's central processing machinery, an intricate signalling network of proteins and metabolites that amplify, filter, process, decode and transmit information. Extracellular ligands, such as hormones, neurotransmitters or drugs, bind directly to receptor proteins. This binding enables receptors to transduce signals by changing their three-dimensional structure. The change in a receptor's conformation causes other proteins inside the cell, such as enzymes, to change their activity level, for example by binding to or unbinding from the receptor. These intracellular interactions can activate other enzymes that catalyse biochemical reactions inside the cell. Through these biomolecular dynamics, information is transferred from outside the cell into its internal regions. Cascades of biochemical reactions act constantly and in parallel inside cells, with different signalling pathways continually being activated and deactivated. Hence, information from thousands of receptors of different types, present on the surface of each cell, is integrated to determine the cell's behaviour. This can be achieved by regulating gene expression through activation or inhibition of transcription factors, proteins that bind to the cell's DNA to regulate gene expression. Other effectors of cell-signalling events include proteins that regulate protein translation and protein degradation, modulation of electrical activity through post-translational modification of channel proteins in the membrane, and regulation of several other cellular machineries and organelles [30].

One outcome of such regulation is the ability of some human cells to crawl [31–33]. The direction and speed of crawling are determined by the cell signalling network [32], and the crawling machinery can be considered one of the cell's actuators. Another organelle regulated by the cell signalling network is the mitochondrion. Mitochondria act as both engines and sensors [34]. They produce ATP and GTP, the cell's common energy currency, and regenerate NAD+, which the cell uses as an electron carrier. These molecules are used by many proteins to perform their work. Interestingly, mitochondria sense energy levels, and when they receive certain signals they can induce programmed cell death, also called apoptosis [35]. This altruistic behaviour is initiated when mitochondria release proteins that trigger signals leading the cell to commit suicide for the benefit of the entire organism. The evolutionary origin of the mitochondrion also exemplifies symbiosis. The similarity of mitochondria to some bacteria that exist today strongly suggests that ancestral cells were initially infected with such bacteria, which gradually became part of the cell through an evolving endosymbiotic relationship [36].

Programmed cell death is sometimes needed if the cell is damaged or infected. However, before taking such a drastic measure, cells rely on evolved defence and self-repair mechanisms. One example of a defence system in human cells is the interferon response to viral infection [37]. Cells have specific receptors and intracellular proteins that can detect viral double-stranded RNA and signal to the cell signalling network to turn on an immune response. This response alerts neighbouring cells to the infection and triggers internal reactions that deal with the foreign agent in various ways [38]. Similarly, an example of a self-repair mechanism is the DNA damage response, machinery that can repair double-stranded DNA breaks [39]. The DNA damage response is also linked to the programmed cell death machinery: if the DNA damage is too extensive, it signals to the cell signalling network to activate apoptosis. It is also linked to the cell cycle apparatus, which underlies the remarkable ability of cells to efficiently reproduce themselves; if DNA damage is detected, the cell cycle programme is signalled to halt. Cell damage can be caused by reactive oxygen species, a by-product of metabolism [40,41] that can be considered one of the cell's waste products. Cells have developed mechanisms to neutralize reactive oxygen species and also to use them for cell signalling, but at elevated levels they can cause damage and lead to disease. Another example of a waste disposal mechanism is the recent observation that, during sleep, brain cells shrink and the space between them expands. A recent study suggested that this helps remove metabolic toxins that accumulate while we are awake and using our brains fully [42]. In Alzheimer's disease, the amyloid plaques that form in the brain could be considered improperly handled cellular waste [43].
The circadian cycle is only one of several clocks embedded within the cell signalling and gene regulatory networks. These clocks ensure the cyclic regulation of processes that need to be active periodically [44,45]. The connections described above between general design patterns observed in many complex systems and those observed in human cells are summarized visually in figure 2. The list of connections is not exhaustive and is given here only to illustrate the general concept. As our understanding of the internal components of human cells increases, many more examples are expected to emerge.

Figure 2. The human cell is a prototypical complex system. In red and outside the box are general complex systems properties. Inside are manifestations of these abstract concepts in human cells. Review articles that further explain some of the subcellular systems mentioned in the figure are as follows: cell crawling [31–33], mitochondria [34–36], interferon response [37,38], cell signalling network [30], DNA damage response [39], reactive oxygen species [40,41], circadian rhythms [44,45] and autophagy [46].

8. Conclusion

Cells and their internal constituents are too small to observe with the naked eye, and the macromolecular components within cells can be observed only with the best microscopes. Until recently, we could study only a few molecular components of a cell in a single experiment. However, thanks to biotechnological breakthroughs of the past few decades, we can now study the inner workings of cells at a more global scale, with refined resolution and detail. This is because these emerging biotechnologies, for example DNA, RNA and protein sequencing, can measure the levels of many molecular species in a single experiment. They produce snapshots of the state of the many variables composing the cellular complex system. This revolution in cell and molecular biology is called systems biology [22], a term that is now often used interchangeably with big data bioinformatics [47]. It enables a more global and holistic understanding of cell regulation. However, achieving such understanding also requires new theories that explain how all these parts come together to produce higher-order functions, and before such theories can form, we need to be able to handle the masses of data collected with these technologies. With the rapid reduction in the cost of computing and storage, and with technologies that permit recording almost everything, we can now track the state of the variables that make up many types of complex systems, including human cells, over time and under various controlled or natural spontaneous perturbations. How much data do we need to collect to build an accurate coarse-grained representation of an entire human cell? How can we best extract knowledge nuggets from such data and make predictions about behaviours and conditions of the system that have not yet been measured or seen? How can we visualize and integrate these high-dimensional data?
These are some of the grand challenges facing data scientists today, including computational systems biologists.

The field of systems biology is both data-rich and data-poor. It is data-rich because mounds of data have already been collected and still need to be analysed further, and data-poor because the system is so complex and so difficult to observe that the data collected so far are clearly insufficient to fully understand the intricate molecular mechanisms that drive human cellular behaviour.

Currently, we do not fully understand the molecular details of how cell signalling networks integrate and process information to regulate cellular function. Open questions include how the many different ligands diffusing in the extracellular medium, each capable of binding to different and multiple receptor types, initiate intracellular activity changes that result in alternative cellular phenotypes. Until recently, cell and molecular biologists used a reductionist approach to study such a complex system. Reductionism in biology has meant that experimentalists often spent their entire scientific careers analysing only one, or a few, genes and their protein products, whereas in fact each mammalian cell expresses thousands of different genes and proteins that function together simultaneously. All these different proteins work together in concert, influencing each other's activity and abundance. However, because such biomolecules are so small, we cannot see exactly how they work, and we have to resort to measuring their activity using indirect methods. Studies of only a few genes or proteins by individual laboratories still dominate biomedical research today. The information from labour-intensive, low-throughput, single-gene experiments conducted by many laboratories around the world continues to accumulate. Information from such studies, characterizing individual proteins and their interactions, can be used to reconstruct, through data integration, a more global picture of the cell regulatory puzzle [30]. However, such data collection suffers from research focus biases [48] and reproducibility concerns [49]. Nevertheless, systems biology approaches are gradually becoming the new standard. The concept of studying systems in biology was introduced earlier, but at that time not enough molecular details were available to link molecular interactions to system behaviour [22].

In recent years, much excitement has been generated by the opportunities presented by artificial intelligence and machine learning, in particular deep learning. Deep-learning applications to systems biology can indeed accelerate discovery through knowledge imputation [50]. Deep learning can provide answers without the need to know all the details, and it can also discover new knowledge that researchers have overlooked, much as a deep neural network discovered new strategies for the game Go that humans had not considered in over 2000 years of mastering this complex game [51]. Technological evolution is also seeing rapid progress as deep-learning algorithms become more accessible through specialized hardware and easy-to-use open-source software libraries. While these developments enable progress, such progress can sometimes be achieved without fully understanding its implications from a complex systems theory perspective. In this review, I have attempted to highlight the importance of obtaining a deeper understanding of the human cell as a complex system, as well as of other complex systems around us and inside us.

Network inference

The emergence of high-throughput methods has revolutionized the study of diseases in the past decade and has greatly facilitated the exploration of interactions between biological entities. However, testing all of these interactions experimentally remains technically and financially infeasible. Reconstructing the underlying dependency relationships from observed data is an alternative strategy, which can also help generate new biological hypotheses. These techniques have been widely explored in transcriptomics, where microarray gene expression data are used to infer gene regulatory networks (GRNs). Other inferred networks of interest include transcript-binding networks, protein–protein interaction (PPI) networks, gene co-expression networks and metabolic networks. Because most methods are generalizable, we review them with a focus on transcriptome applications. Such data-driven network inference strategies have recently shown great potential for discovering networks perturbed in disease, as discussed later in this section.

The simplest way to estimate pairwise relevance is with correlation coefficients or mutual information (MI). On top of that, we can either define module networks, as implemented in the weighted gene co-expression network analysis toolbox (WGCNA) [47], or generate an ordinary network by removing edges whose relevance falls below a certain threshold. Networks generated in this way are undirected and are known as correlation or relevance networks. These methods should be applied with caution, because they are likely to produce numerous indirect connections as false positives. To overcome this limitation, context likelihood of relatedness (CLR) [48] derives a new score from the distribution of MI values to serve as the edge weight, so that the false-positive rate can be controlled. The Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE) [49] performs an extra filtering step in which the weakest edge in each triplet is interpreted as an indirect interaction and removed. This approach was found to be helpful in a GRN inference study in mammalian cells [49], but its computational complexity increases significantly with network size. MRNet, which combines criteria from CLR and ARACNE, is implemented in the R package MINET [50]. Such methods, though attractive for their simplicity, have limitations in identifying joint regulatory effects, and their computational advantage disappears when extra filtering steps are added to control the false-positive rate.
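A minimal sketch of a relevance network followed by an ARACNE-style pruning step is shown below. To stay short, it uses absolute Pearson correlation as the relevance score instead of MI (an assumption; ARACNE itself applies the data processing inequality to MI estimates), and the threshold, tolerance and toy data are hypothetical. Genes 0, 1 and 2 form a chain, so the 0-2 correlation is real but indirect.

```python
import itertools

import numpy as np


def relevance_network(expr, threshold=0.3):
    """Undirected relevance network from a (samples x genes) matrix.

    Edge weight is |Pearson r|; edges weaker than `threshold` are dropped.
    Returns {(i, j): weight} with i < j.
    """
    corr = np.abs(np.corrcoef(expr, rowvar=False))
    n_genes = corr.shape[0]
    return {
        (i, j): float(corr[i, j])
        for i, j in itertools.combinations(range(n_genes), 2)
        if corr[i, j] >= threshold
    }


def dpi_prune(edges, tolerance=0.0):
    """ARACNE-style data processing inequality: in every fully connected
    triplet, the weakest edge is treated as indirect and removed."""
    to_remove = set()
    nodes = sorted({node for edge in edges for node in edge})
    for a, b, c in itertools.combinations(nodes, 3):
        trio = [(a, b), (a, c), (b, c)]
        if all(e in edges for e in trio):
            weakest = min(trio, key=edges.get)
            next_weakest = min(edges[e] for e in trio if e != weakest)
            if edges[weakest] < next_weakest * (1.0 - tolerance):
                to_remove.add(weakest)
    # Removals are decided on the unpruned network, then applied together.
    return {e: w for e, w in edges.items() if e not in to_remove}


# Toy data: gene 0 drives gene 1, gene 1 drives gene 2, gene 3 is unrelated.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
w = rng.normal(size=n)
expr = np.column_stack([x, y, z, w])

raw = relevance_network(expr)
pruned = dpi_prune(raw)
print(sorted(raw))     # [(0, 1), (0, 2), (1, 2)]: includes the indirect 0-2 edge
print(sorted(pruned))  # [(0, 1), (1, 2)]: the indirect edge is removed
```

The triplet loop is what makes this step expensive: the number of triplets grows cubically with the number of genes, which matches the scaling concern raised for ARACNE above.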

Network inference can also be formulated as a regression problem, as in TIGRESS [51]: the expression of one gene (the response) is modelled as a function of the expression of all other genes (the predictors). TIGRESS applies the lasso to obtain sparse solutions, because links identified in this way are less likely to be indirect. The output of this strategy is also known as an estimator of partial correlation relationships between genes. Because of the predefined roles of 'predictor' and 'response', networks inferred in this way are both weighted and directed. GENIE3 [52] is a similar algorithm that uses random forest regression instead of the lasso. It was recently extended in iRafNet [53] (available as an R package), which integrates multiple data types to significantly reduce the search space.
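The regression formulation can be sketched with a plain coordinate-descent lasso in NumPy. This is a simplified illustration of the idea behind TIGRESS, not its implementation: TIGRESS adds stability selection over resampled data, which is omitted here, and the penalty `alpha` and the toy data are assumptions. Each gene is regressed on all others, and a nonzero coefficient is read as a directed edge from predictor to response.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + alpha * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]   # remove predictor j from the current fit
            rho = X[:, j] @ resid / n
            # Soft-thresholding sets weak coefficients exactly to zero.
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]   # put it back with its updated value
    return beta


def regression_network(expr, alpha=0.2):
    """Return {target: {regulator: |coef|}}, keeping only nonzero lasso coefficients."""
    Xs = (expr - expr.mean(axis=0)) / expr.std(axis=0)  # standardize each gene
    n_genes = Xs.shape[1]
    network = {}
    for target in range(n_genes):
        predictors = [g for g in range(n_genes) if g != target]
        beta = lasso_cd(Xs[:, predictors], Xs[:, target], alpha)
        network[target] = {g: abs(b) for g, b in zip(predictors, beta) if b != 0.0}
    return network


# Toy chain: gene 0 -> gene 1 -> gene 2, plus an unrelated gene 3.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = x + 2.0 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
w = rng.normal(size=n)
net = regression_network(np.column_stack([x, y, z, w]))
print(sorted(net[2]))  # [1]: gene 0 is dropped because it acts on gene 2 only via gene 1
```

Although genes 0 and 2 are clearly correlated, the lasso keeps only gene 1 as a predictor of gene 2, which is the sense in which regression-based links are less likely to be indirect.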

Probabilistic graphical models, such as Bayesian network (BN) inference [54], can also be used to estimate direct influences. They compute the probability of the observed data given various candidate networks, and the network with the highest probability is selected as the most probable one. BN algorithms can capture linear, nonlinear, stochastic and combinatorial relationships between variables, and their probabilistic nature makes them powerful for handling noisy data. However, BN algorithms are usually time-consuming. BNFinder [55] addresses this concern by taking advantage of multiple CPU cores, speeding up the algorithms linearly with the number of cores. BNFinder has recently been used to identify genes with transcriptomic changes in smokers and to estimate the directional relationships between these changes [56]. In that work, the search space was further narrowed by performing BN inference only on genes that were differentially expressed between groups.
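Score-based structure learning can be sketched with a linear-Gaussian Bayesian information criterion (BIC) and an exhaustive search over small parent sets. This is a generic illustration, not the BNFinder algorithm: it assumes a known node ordering (which guarantees an acyclic graph), linear relationships and Gaussian noise, and BIC stands in for the posterior probability described above. `max_parents` and the toy data are hypothetical.

```python
import itertools

import numpy as np


def gaussian_bic(child, parents, data):
    """BIC of a linear-Gaussian model child ~ parents (lower is better)."""
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    k = X.shape[1] + 1  # regression coefficients plus the noise variance
    return n * np.log(rss / n) + k * np.log(n)


def learn_parents(data, order, max_parents=2):
    """For each node, choose the BIC-best parent set among nodes earlier in `order`.

    Restricting parents to earlier nodes guarantees the result is a DAG.
    """
    parents = {}
    for pos, child in enumerate(order):
        candidates = order[:pos]
        best_set, best_score = (), gaussian_bic(child, (), data)
        for size in range(1, min(max_parents, len(candidates)) + 1):
            for subset in itertools.combinations(candidates, size):
                score = gaussian_bic(child, subset, data)
                if score < best_score:
                    best_set, best_score = subset, score
        parents[child] = best_set
    return parents


# Toy chain: gene 0 -> gene 1 -> gene 2.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)
z = y + 0.5 * rng.normal(size=n)
dag = learn_parents(np.column_stack([x, y, z]), order=[0, 1, 2])
print(dag)
```

On this chain the search should typically select gene 0 as the parent of gene 1 and gene 1 as the parent of gene 2: adding gene 0 as a second parent of gene 2 barely reduces the residual error, so the BIC penalty rejects it. The number of scored parent sets grows combinatorially with the candidates, which is why narrowing the candidate genes, as in the smoking study, helps.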

BNs are directed acyclic graphs and therefore cannot represent feedback loops, which are important features of many biological networks. To overcome this limitation, dynamic Bayesian network (DBN) algorithms have been proposed. In these algorithms, DBNs are estimated from time series observations, with each entity unfolded into several nodes corresponding to the time points. The algorithms then construct a prior network, indicating the statistical dependence between variables at the initial time point, and a transition network, indicating the dependence between nodes at consecutive time points. Unlike BN algorithms, DBN algorithms can detect cyclic loops through the transition network. Most tools, such as Banjo [57], SEBINI [58] and BNFinder [55], are designed for both static BN and DBN inference. An alternative approach for time series data is ordinary differential equations (ODEs), which are specifically designed for inferring dynamic interactions between entities. In these models, the rate of change of one measure, rather than the measure itself, is modelled as a function of all other variables. Simple module methods and correlation/relevance network approaches are also applicable to time series data; TD-ARACNE [59] and the recently proposed MIDER [60] are two example packages that accept time series data as input.
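As an idealized sketch of the ODE formulation, assume linear dynamics dx/dt = A x, where A[i, j] is the effect of gene j on the rate of change of gene i. Rates are estimated by finite differences and A by least squares. The three-gene negative feedback loop, the noise-free Euler-simulated data and the 1e-3 edge cutoff are all hypothetical; real expression time series are sparse, noisy and nonlinear. Unlike a static BN, the recovered model can contain the cycle 0 → 1 → 2 → 0.

```python
import numpy as np


def simulate(A, x0, dt=0.01, steps=300):
    """Forward-Euler trajectory of dx/dt = A x (rows are time points)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * A @ xs[-1])
    return np.array(xs)


def infer_linear_ode(trajectories, dt):
    """Least-squares estimate of A from finite-difference rates of change."""
    states = np.vstack([traj[:-1] for traj in trajectories])
    rates = np.vstack([np.diff(traj, axis=0) / dt for traj in trajectories])
    # Each row satisfies rate = A @ state, i.e. rates ≈ states @ A.T.
    A_T, *_ = np.linalg.lstsq(states, rates, rcond=None)
    return A_T.T


# Hypothetical loop: gene 0 activates 1, gene 1 activates 2, gene 2 represses 0.
A_true = np.array([[-1.0, 0.0, -2.0],
                   [1.5, -1.0, 0.0],
                   [0.0, 1.5, -1.0]])
dt = 0.01
trajectories = [simulate(A_true, x0, dt) for x0 in np.eye(3)]  # three perturbation experiments
A_hat = infer_linear_ode(trajectories, dt)

edges = [(j, i) for i in range(3) for j in range(3)
         if i != j and abs(A_hat[i, j]) > 1e-3]
print(edges)  # [(2, 0), (0, 1), (1, 2)] as (regulator, target) pairs
```

Because the data were generated with the same Euler step, the finite differences are exact and A is recovered up to rounding; with measured data, the derivative estimate is usually the weakest link.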

One natural question, when facing so many methods to choose from, is which method is best in a given circumstance. This question has not yet been conclusively answered. Even though many new methods claim to be superior, it can also be argued that these methods are complementary rather than competing. All methods have their own advantages and limitations, especially when applied under different experimental settings. Marbach et al. [61] carried out an extensive comparison of most of the above-mentioned methods. They evaluated the methods using the area under the precision–recall curve, which summarizes the trade-off between precision (the fraction of predicted edges that are correct) and recall (the fraction of true edges that are recovered). Interestingly, they found that no method performed optimally across all experimental settings. In contrast, integrating multiple network inference methods gave the most robust and highest performance across diverse data sets. This finding implies that individual methods capture only partial network structures but, fortunately, complement each other well. The same conclusion was reached in a previous work [62]. Inspired by this, efforts have been made to develop tools that combine networks inferred by multiple methods (e.g. NAIL [63]). However, most of these methods become limited when the data volume increases significantly. Relevance networks would perform better with large data volumes, but unfortunately they are not directional and are more likely to include many indirect edges, giving a high false-positive rate. A simple yet effective strategy to solve this problem is to narrow down the search space using prior knowledge, as discussed in [53] and [56]. Though most existing network inference methods are currently used for GRNs and co-expression networks, they are equally valuable for inferring other networks, such as metabolic networks [64].
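The integration idea can be sketched by averaging each edge's rank across methods and scoring the consensus against a gold standard. Rank averaging is one simple aggregation scheme; we do not claim it reproduces the exact procedure of Marbach et al. [61] or NAIL [63]. Average precision is used as a stand-in for the area under the precision–recall curve, and all edges, scores and the gold standard below are hypothetical.

```python
def average_rank(*edge_scores):
    """Combine several {edge: confidence} dicts by mean rank (1 = most confident).

    An edge missing from one method's predictions gets that method's worst rank + 1.
    Ties in mean rank are broken by edge name so the output is deterministic.
    """
    all_edges = set().union(*edge_scores)
    ranks = {edge: [] for edge in all_edges}
    for scores in edge_scores:
        ordered = sorted(scores, key=scores.get, reverse=True)
        position = {edge: i + 1 for i, edge in enumerate(ordered)}
        missing_rank = len(ordered) + 1
        for edge in all_edges:
            ranks[edge].append(position.get(edge, missing_rank))
    mean_rank = {edge: sum(r) / len(r) for edge, r in ranks.items()}
    return sorted(all_edges, key=lambda edge: (mean_rank[edge], edge))


def average_precision(ranked_edges, gold_standard):
    """Average precision, a common estimate of the area under the precision-recall curve."""
    hits, total = 0, 0.0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold_standard:
            hits += 1
            total += hits / k  # precision at each recovered true edge
    return total / len(gold_standard)


# Hypothetical predictions from two methods and a hypothetical gold standard.
method_a = {("G1", "G2"): 0.9, ("G2", "G3"): 0.8, ("G1", "G3"): 0.7}
method_b = {("G2", "G3"): 0.95, ("G1", "G2"): 0.6, ("G3", "G4"): 0.5}
gold = {("G1", "G2"), ("G2", "G3"), ("G3", "G4")}

consensus = average_rank(method_a, method_b)
print(consensus)  # [('G1', 'G2'), ('G2', 'G3'), ('G1', 'G3'), ('G3', 'G4')]
print(round(average_precision(consensus, gold), 3))  # 0.917
```

Working with ranks rather than raw scores sidesteps the fact that different methods report confidences on incomparable scales (correlations, MI values, regression weights, probabilities).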
Finally, most current studies focus on network inference under various experimental conditions, but some have extended these methods to assess networks that are disturbed in disease. For example, two recent studies [65, 66] successfully used these methods to identify several key regulators of transcriptomic changes in Alzheimer's disease (AD).


The goal of our study was to demonstrate the use of an integrative systems approach for connecting gene expression patterns to physiological characteristics, thereby providing mechanistic insight into genome function under abiotic stress conditions. Central to our approach is the use of the genomic signature concept to characterize the plant stress phenotype and to provide a link to the underlying network pathways, modules and, eventually, genes. Expression array data have previously been used to create signature cataloging systems (reference signature databases) that characterize chemical perturbations of tissue samples and cell culture populations [47, 48] and, more recently, that link genes and disease states to potential therapies [22, 23]. In the present study, we extend the signature cataloging approach to plant biology and ecological genomics by using the ATGenExpress abiotic stress dataset to compile our first-generation reference signature database.

Validation of the reference database, and of the approach in general, was accomplished with independent datasets for UV-B [41] and cold [42], and with our own datasets for heat, drought and simultaneously imposed heat-drought treatments. Altogether, more than half of the stress treatments included in the signature database were scanned by independent query signatures. Our results are encouraging and show that, despite differences in array platform, growth conditions and even the application of treatments, the signature approach is robust in classifying the plant stress phenotype. This was particularly evident for highly conserved, stress-specific responses such as heat and UV-B. At the same time, our results illustrate the complexity of the stress response, which is characteristic of cross-talk pathways [8–14] and of multiple secondary effects from prolonged treatments. For example, the early cold stress query signatures (3 h and 6 h) showed very high similarity to cold signatures, with only weak similarity scores to other signature phenotypes. In contrast, the 24 h cold query showed similarity to cold signatures as well as to drought and osmotic signatures. This result likely reflects secondary effects of the prolonged (24 h vs. 3 h and 6 h) cold treatment. Not surprisingly, the co-occurrence of cold and dehydration responses, reflected as cross-talk in their signaling pathways, is widely reported in the literature [7, 49].

One promising aspect of the signature approach as applied in this study is its potential use for classifying the dual imposed heat/drought treatment. In nature, a departure from homeostatic equilibrium, or stress, is often brought about by multiple environmental factors [46]. Heat and drought, for example, are co-occurring stresses that have been implicated in severe yield losses ([46], and citations within). In this study, the highest similarity scores were observed with the heat and drought reference signatures, but the significance of the drought score depended on the depth to which the signature lists were interrogated (see Additional File 5). One explanation for this finding is that the drought-responsive transcripts were further down the signature list than the more strongly responsive heat-induced transcripts, so a greater depth of the signature lists had to be compared. This suggests that care must be taken when comparing multiple stress phenotypes. Nevertheless, our results are encouraging in this regard, and future research should consider additional statistical means of determining the depth of signature list comparisons.
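A toy example shows why the drought match can depend on how deep the lists are compared. The gene labels, list lengths and the simple overlap fraction are hypothetical and are not the statistic used in the study: in the query below, the heat-responsive transcripts rank above the drought-responsive ones, so the drought reference registers only once the comparison extends past the heat block.

```python
def overlap_at_depth(list_a, list_b, depth):
    """Fraction of the top-`depth` entries that two ranked signature lists share."""
    return len(set(list_a[:depth]) & set(list_b[:depth])) / depth


# Hypothetical ranked lists (most responsive transcript first).
query = ["h1", "h2", "h3", "h4", "d1", "d2", "d3", "d4"]  # combined heat/drought query
heat_ref = ["h2", "h1", "h4", "h3", "x1", "x2", "x3", "x4"]
drought_ref = ["d2", "d1", "d4", "d3", "y1", "y2", "y3", "y4"]

for depth in (4, 8):
    print(depth, overlap_at_depth(query, heat_ref, depth),
          overlap_at_depth(query, drought_ref, depth))
# 4 1.0 0.0
# 8 0.5 0.5
```

At depth 4 the query looks like a pure heat response; only at depth 8 does the drought component appear, at the cost of diluting the heat match.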

Network theory and analysis were used in an attempt to relate the phenotypic signature information to genome-wide transcriptional programs. Network theory, in general, is promising in this regard because it allows us to view the biology as a system of networks and interacting modules [27]. Here, we use the weighted gene coexpression network approach recently proposed by Zhang and Horvath [20], which has been used successfully to link molecular targets to oncogenic signals [30] and to complex traits (e.g., mouse weight [50]), and even to reveal network divergence between human and chimpanzee neural patterns [19]. This approach is particularly relevant for our application because it is based on unsupervised clustering, bypasses multiple testing problems when relating gene information to physiological traits, and does not need a priori gene ontology information. The latter point is especially important for ecological genomics, which continues to transition from model organisms to organisms of greater ecological relevance.

Weighted gene coexpression network analysis produced six distinct modules from the abiotic stress dataset. Importantly, this unsupervised approach grouped genes into network modules that reflect biological processes. For example, brown module genes clearly participate in photosynthetic processes, while turquoise module genes contribute to starch and sucrose regulation. In addition, specific stress-responsive modules were identified. The green module, for example, was almost entirely unique to the heat stress pathway and was, in fact, enriched in genes known to participate in heat-responsive programs. Equally interesting was the identification of module genes participating in multiple stress-responsive pathways. This was apparent for modules consisting of conserved metabolic pathways, i.e., the brown (photosynthesis) and blue (starch/sucrose metabolism) modules.

One of the more promising outcomes of weighted gene coexpression network analysis was the identification of a common abiotic stress-responsive module (red module) that was enriched in differentially expressed genes for all treatments investigated. The most connected gene, or hub, within this module was an uncharacterized ankyrin repeat family protein that was specific to our analysis. Ankyrin proteins have been reported to act as regulators in signaling by salicylic acid, a key molecule in signal transduction of biotic stress responses [51]. The discovery of this ankyrin family member as the hub of our common stress-responsive module suggests that salicylic acid signaling may play a role in the abiotic stress response, which would corroborate results from exogenously applied salicylic acid [52]. In addition, this common stress-responsive module was enriched in genes known to participate in calcium and calmodulin signaling pathways, which have been shown to participate in a multitude of cellular functions, including cell death [53].
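The soft-thresholding and hub ideas behind weighted gene coexpression network analysis can be sketched in a few lines of NumPy. This is a simplified illustration, not the WGCNA package: β = 6 is assumed here (the method itself chooses β with a scale-free topology criterion), module membership is taken as given rather than derived from clustering, and the module data are simulated. Raising |correlation| to the power β keeps strong correlations while shrinking weak ones toward zero, and the hub is the gene with the largest summed adjacency within its module.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta, with a_ii = 0."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def module_hub(expr, module_genes, beta=6):
    """Return the gene with the highest intramodular connectivity k_i = sum_j a_ij."""
    k = soft_threshold_adjacency(expr[:, module_genes], beta).sum(axis=1)
    return module_genes[int(np.argmax(k))], k


# Hypothetical module: gene 0 drives genes 1-4, which carry increasing noise.
rng = np.random.default_rng(0)
n = 1000
driver = rng.normal(size=n)
followers = [driver + s * rng.normal(size=n) for s in (0.3, 0.5, 0.7, 0.9)]
expr = np.column_stack([driver] + followers)

hub, connectivity = module_hub(expr, [0, 1, 2, 3, 4])
print(hub, np.round(connectivity, 2))
```

Because every follower is correlated with the driver more strongly than with any other follower, the driver accumulates the largest connectivity, which is the intuition behind reading a module hub as a candidate regulator.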

Although our findings are robust within the current context, a number of questions remain. For example, the reference database is generated from immediately perturbed systems that typically exhibit marked and highly significant changes in transcript abundance; it does not include acclimated states, in which changes in transcript abundance are typically smaller. This has recently been shown in studies of changes in gene expression in response to long-term growth at elevated carbon dioxide concentration [54–56]. Therefore, whether scanning the database with a signature from a fully acclimated organism would return a highly correlated signature is uncertain. However, we hypothesize that the acclimated state will also be characterized by unique expression patterns that, in theory, should be amenable to our approach. Like Lamb et al. [22], we are also uncertain how to interpret the significance of the similarity score. Unique to our approach is the use of ordered-list statistics to compare signatures. This statistical test provides a p-value, based on permuted data, that indicates whether signature comparisons are more similar than expected by chance alone. Unfortunately, the interconnectedness among stress-responsive pathways resulted in low p-values even for some comparisons with low similarity scores (data not shown). However, we are reluctant to disregard the p-value entirely, because as the reference signature database grows and more diverse datasets are included, the p-value may help assign phenotypes to general categories (e.g., abiotic stress vs. development).
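A weighted-overlap score with a permutation p-value can be sketched as below. It is written in the spirit of ordered-list statistics but is not the authors' exact statistic: it compares only the tops of the lists, and the decay weight `alpha`, the gene universe and the lists are hypothetical. The p-value is the fraction of randomly drawn reference lists that score at least as high as the observed one.

```python
import math
import random


def weighted_overlap(list_a, list_b, alpha=0.1):
    """Sum over depths n of |top-n(a) & top-n(b)|, weighted by exp(-alpha * n)."""
    top_a, top_b, score = set(), set(), 0.0
    for n in range(1, min(len(list_a), len(list_b)) + 1):
        top_a.add(list_a[n - 1])
        top_b.add(list_b[n - 1])
        score += math.exp(-alpha * n) * len(top_a & top_b)
    return score


def permutation_p_value(query, reference, universe, alpha=0.1, n_perm=1000, seed=0):
    """Compare the observed score with scores of random reference lists drawn from `universe`."""
    rng = random.Random(seed)
    observed = weighted_overlap(query, reference, alpha)
    pool = list(universe)
    exceed = sum(
        weighted_overlap(query, rng.sample(pool, len(reference)), alpha) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


genes = [f"g{i}" for i in range(200)]     # hypothetical gene universe
query = genes[:20]
similar_ref = genes[:15] + genes[100:105]  # shares most top-ranked genes
unrelated_ref = genes[150:170]             # shares none

print(permutation_p_value(query, similar_ref, genes))
print(permutation_p_value(query, unrelated_ref, genes))
```

Because the null lists are drawn uniformly from all genes, any reference that shares even part of a pathway with the query tends to beat them. This is one possible reason why interconnected stress pathways can produce low p-values despite low similarity scores.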

Here, we present a first-step approach toward classifying and understanding the processes behind the plant stress phenotype. We integrated two analytical techniques that have traditionally been applied only within the biomedical community. Results from our adaptation of these techniques show that one can take an unknown query signature and, using pattern matching software, scan a reference database to classify both singular and multiple plant stress phenotypes. One can then use a number of inferential techniques to link phenotypic attributes to their corresponding signaling modules and genes. In essence, this technique provides a tool for navigating the potential phenotypes of a given Arabidopsis genotype. In the current context, the approach is restricted to a single organism. However, a number of technical advances, including sequence-based transcriptomics [57], comparative gene ontology algorithms [58], and analytical approaches for linking network characteristics to quantitative genetics [59], illustrate the potential to extend our methodology to questions of evolutionary and ecological interest, particularly physiological trait development.

Two attributes of our approach facilitate its use for such purposes. First, the technique is applied within a network framework. Network theory has been well received in molecular biology for providing a 'systems biology' framework for the discipline (see [60] for a historical perspective), and has more recently been proposed as a possible means of determining the evolutionary basis of complex phenotypic traits [61]. Second, and just as important, is the potential to link our approach with population-based genetic analyses. Many molecular, systems biology experiments are conducted within a narrow adaptive context, with little or no regard for other, nonadaptive evolutionary forces (drift, mutation, recombination, gene flow). Combining genetic association with network analysis, as demonstrated by [59], within a population genetic context allows appropriate testable null models (e.g., genetic drift) to be included in such studies. Therefore, relating genomic information to genetic information, e.g., quantitative trait loci, is not only possible but crucial for those interested in fully exploring the evolutionary mechanisms shaping phenotypic development.

Although useful in its current form, we envision that the true potential of our approach will be realized when the scientific community accepts, critiques, and eventually amends our methods with current and future applicable analytical and technological improvements. To facilitate this process, we conducted our analysis within the publicly available R statistical language and make available to the scientific community our signature compendium, R scripts used within, and a brief tutorial illustrating the process, with the near-term goal of providing the community with an integrative systems tool for connecting genes and signaling networks to phenotypic characteristics in order to further the continuing goal of understanding plant genome function.


Data collection, merging, and standardization

An overview of the general approach used for data collection and analysis is provided in Supplementary Figs. 1 and 2. Gene expression profiles were selected from the curated public-access repositories Gene Expression Omnibus 34 and ArrayExpress. 35 To be included in the initial analysis, studies had to: (a) be performed in the rat (Rattus norvegicus) or human, (b) profile chondrocytes from tissue or in vitro culture systems, (c) provide adequate phenotypic information, (d) provide complete raw data for a minimum of three biological replicates, and (e) be performed on Affymetrix microarray platforms (Affymetrix ® Inc., USA) using 25-mer oligo probe sets. All studies released up to December 2015 were considered. All raw data were imported into and analyzed using R. 36 A quality control and pre-processing pipeline was applied to each study individually, and each study was assessed for systematic technical issues. Expression data were background-corrected using the RMA algorithm, 37 with cyclic loess normalization applied across each data set. Probe sets were re-annotated with the appropriate Ensembl gene identifiers. Expression data for each gene were then collapsed into a single-gene measurement by retaining the probe set with the maximum mean expression value, using the “collapseRows” function in the WGCNA R package. 38 The output of this workflow was a normalized matrix of expression values with one summarized gene per row. Data sets were then intersected on common gene identifiers so that all data sets contained the same genes. The matrix of merged data sets was termed a “meta-set”. The rat meta-set consisted of 115 arrays (10,159 common annotations) and the human meta-set consisted of 129 arrays (11,392 common annotations). A Z-score normalization was applied to each species meta-set using the inSilicoMerging R package. 39 A complete description of data sources and retained samples is provided in Supplementary Data SD43 (human) and SD44 (rat).
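The collapsing, intersection and standardization steps can be sketched as follows. This is a minimal NumPy illustration, not the authors' code. It mimics the "maximum mean" rule described for collapseRows by keeping, for each gene, the probe set with the highest mean expression. It then intersects studies on shared gene identifiers. It also assumes that the Z-score is computed per gene within each contributing study before the studies are concatenated. All function names are hypothetical.

```python
import numpy as np

def collapse_max_mean(expr, probe_ids, probe_to_gene):
    """Collapse a probes x samples matrix to one row per gene.

    For each gene, keep the probe set with the highest mean expression
    (assumed analogue of WGCNA's max-mean collapsing rule).
    """
    best = {}  # gene -> (mean expression, row index)
    for row, probe in enumerate(probe_ids):
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe set without a gene annotation is dropped
        mean = expr[row].mean()
        if gene not in best or mean > best[gene][0]:
            best[gene] = (mean, row)
    genes = sorted(best)
    return genes, expr[[best[g][1] for g in genes]]

def zscore_rows(x):
    """Standardize each gene (row) to mean 0, SD 1 across one study's samples."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)

def build_meta_set(studies):
    """Merge (genes, matrix) pairs on their common genes into one meta-set."""
    common = sorted(set.intersection(*(set(genes) for genes, _ in studies)))
    blocks = []
    for genes, x in studies:
        position = {g: i for i, g in enumerate(genes)}
        # Reorder rows to the shared gene order, then standardize within the study.
        blocks.append(zscore_rows(x[[position[g] for g in common]]))
    return common, np.hstack(blocks)  # genes x (all samples of all studies)
```

Standardizing within each study before merging is one common way to remove study-level offsets. Other batch-adjustment choices would change the merged values.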

Weighted gene co-expression network analysis

To establish universal gene identifiers and facilitate comparison across species, rat gene identifiers from Affymetrix probes were re-annotated with human Ensembl gene orthologs using biomaRt, an R interface to the BioMart database. 40 Only identifiers common to both meta-sets were retained, and genes with a global variance less than 0.3 were removed to reduce noise and computational demands. The data met the assumptions of a scale-free network. The general approach employed to develop co-expression modules is described by Miller et al. 16 and is outlined in the graphical workflows in Supplementary Figs. 1 and 2. A consensus network is a single network arising from multiple sources of data; in this study it was constructed from the weighted average of the human and rat correlation matrices. By definition, consensus modules are the branches of a clustering tree built from a consensus gene dissimilarity. As in the single-network approach, consensus modules contain genes that are closely related in both networks, i.e., the modules are present in both networks. 41 Consensus network and module generation were performed in WGCNA (version 1.49) with the following changes to the default settings for consensus network generation: β = 7, deepSplit = 1, cutHeight = 0.25, and a minimum module size of 30 genes. These parameters were consistent with those used for single-species network generation. Only consensus module colors are equivalent across the species. Modules were characterized by a representative profile termed the module eigengene (ME), defined as the first principal component of the module gene expression profile. MEs were used for module–trait association analysis, differential eigengene network analysis, and differential gene expression analysis. Differences in expression across trait groups were tested using a Kruskal–Wallis one-way analysis of variance.
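The sketch below illustrates two core quantities under stated assumptions. First, the consensus adjacency is taken as the soft-thresholded absolute value of a weighted average of per-species correlation matrices. This is an unsigned network with β = 7, as in the text. Second, the module eigengene is computed as the first singular vector of the standardized module expression matrix, sign-aligned with the module's average expression. Because samples are in rows here, the per-sample eigengene is the first *left* singular vector; this is equivalent to the first right-singular vector of a genes x samples matrix. Helper names are hypothetical. WGCNA's own functions (e.g. moduleEigengenes) handle additional details such as topological overlap and missing data.

```python
import numpy as np

def consensus_adjacency(expr_sets, beta=7, weights=None):
    """Unsigned soft-threshold adjacency |weighted mean correlation| ** beta.

    expr_sets: list of samples x genes arrays sharing the same gene columns
    (e.g. one human and one rat meta-set). Weighted averaging of the
    correlation matrices follows the description in the text.
    """
    cors = [np.corrcoef(X, rowvar=False) for X in expr_sets]
    w = np.ones(len(cors)) if weights is None else np.asarray(weights, dtype=float)
    consensus = sum(wi * c for wi, c in zip(w, cors)) / w.sum()
    return np.abs(consensus) ** beta

def module_eigengene(X_module):
    """First principal component (one value per sample) of a module's genes."""
    # Standardize each gene (column) across samples.
    Z = (X_module - X_module.mean(axis=0)) / X_module.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(Z, full_matrices=False)
    me = u[:, 0]
    # SVD signs are arbitrary: orient the eigengene with the average module profile.
    if np.corrcoef(me, Z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me
```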
A gene's module membership (kME) is defined as the Pearson correlation between the gene's expression profile and each ME; genes with high kME values were considered “hub” genes, being highly co-expressed within a subnetwork. How well these hubs were preserved across species was determined by correlating gene kME values between species.
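Module membership and the cross-species hub comparison reduce to Pearson correlations. The NumPy sketch below illustrates this. It assumes that hubs are simply the genes with the highest kME and that kME vectors are already aligned on shared orthologs. The helper names are hypothetical.

```python
import numpy as np

def module_membership(X, me):
    """kME: Pearson correlation of each gene (column of samples x genes X) with the eigengene."""
    Xc = X - X.mean(axis=0)
    mc = me - me.mean()
    return (Xc * mc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (mc ** 2).sum())

def top_hubs(kme, genes, n=10):
    """Return the n genes with the highest kME (assumed hub definition)."""
    order = np.argsort(kme)[::-1][:n]
    return [genes[i] for i in order]

def hub_preservation(kme_a, kme_b):
    """Correlation of kME values for the same ortholog-matched genes in two species."""
    return np.corrcoef(kme_a, kme_b)[0, 1]
```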

Module preservation statistical tests 42 were used to assess how well the network properties of a module in a reference data set were preserved in a comparator data set (modulePreservation function in WGCNA). Preservation statistics are influenced by a number of variables (module size, network size, etc.). A composite preservation Z-score ($Z_{\mathrm{summary}}$) was used to define preservation relative to a module of randomly assigned genes, where values $5 < Z_{\mathrm{summary}} < 10$ represent moderate preservation and $Z_{\mathrm{summary}} > 10$ indicates high preservation. The composite statistic summarizes density-based and connectivity-based preservation statistics (Eq. 1):

$$Z_{\mathrm{summary}} = \frac{Z_{\mathrm{density}} + Z_{\mathrm{connectivity}}}{2} \qquad (1)$$

Density-based measures assessed whether module nodes remained densely connected in a test network; connectivity-based measures assessed whether the connectivity patterns among module nodes in the reference network were similar to those in the test network. A separate summary p value for module preservation, given as the median of the log-p values for the associated permutation Z statistics, was also calculated. Permutation tests, in which the module labels of the test network were randomly permuted, were employed to determine the significance of the observed preservation test statistics. A module of randomly assigned genes, the “gold” (R21) module, was prepared as a sham module to evaluate bias in module preservation across species. The reader is referred to other sources for glossaries of terms associated with WGCNA. 16, 42
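The permutation logic can be illustrated with a deliberately simplified sketch. It uses one density statistic, the mean adjacency among module genes in the test network. It uses one connectivity statistic, the correlation of pairwise module adjacencies between the reference and test networks. Each statistic is converted to a Z-score against random gene sets of the same size, which is equivalent to permuting the labels of a single module, and the two Z-scores are averaged. WGCNA's modulePreservation combines several statistics of each kind, so this sketch will not reproduce its numbers. Function names are hypothetical.

```python
import numpy as np

def mean_adjacency(adj, idx):
    """Density statistic: mean off-diagonal adjacency among the module genes."""
    sub = adj[np.ix_(idx, idx)]
    n = len(idx)
    return (sub.sum() - np.trace(sub)) / (n * (n - 1))

def adjacency_pattern_cor(ref_adj, test_adj, idx):
    """Connectivity statistic: correlation of module pairwise adjacencies across networks."""
    upper = np.triu_indices(len(idx), k=1)
    a = ref_adj[np.ix_(idx, idx)][upper]
    b = test_adj[np.ix_(idx, idx)][upper]
    return np.corrcoef(a, b)[0, 1]

def z_summary(ref_adj, test_adj, module_idx, n_perm=200, seed=0):
    """Return (Z_summary, [Z_density, Z_connectivity]) from a permutation null."""
    rng = np.random.default_rng(seed)
    module_idx = np.asarray(module_idx)
    n_genes, size = ref_adj.shape[0], len(module_idx)
    observed = np.array([mean_adjacency(test_adj, module_idx),
                         adjacency_pattern_cor(ref_adj, test_adj, module_idx)])
    null = np.empty((n_perm, 2))
    for i in range(n_perm):
        # Random "module" of the same size, i.e. permuted module labels.
        idx = rng.choice(n_genes, size=size, replace=False)
        null[i] = (mean_adjacency(test_adj, idx), adjacency_pattern_cor(ref_adj, test_adj, idx))
    z = (observed - null.mean(axis=0)) / null.std(axis=0, ddof=1)
    return z.mean(), z
```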

The gene expression profile of a consensus module of highly co-expressed genes can be summarized by a single representative profile, the eigengene (the first right-singular vector of the standardized expression profile of each module); i.e., a module can be characterized by one representative expression profile. 41 Eigengene networks for single (species-specific) co-expression networks were prepared from the correlations between pairs of eigengenes from different modules, where the connection strength (adjacency) between eigengenes $E_I$ and $E_J$ is (Eq. 2):

$$A_{IJ} = \frac{1 + \operatorname{cor}(E_I, E_J)}{2} \qquad (2)$$

The study considered the preservation of correlations between all pairs of consensus MEs across the two species networks, $A^{\mathrm{Eigen},(\mathrm{human})}$ and $A^{\mathrm{Eigen},(\mathrm{rat})}$, where $A^{\mathrm{Eigen},(s)}$ is the adjacency matrix for data set $s$ defined in Eq. 2. A preservation network $\mathrm{Preserv}^{(\mathrm{human},\mathrm{rat})} = \mathrm{Preserv}(A^{\mathrm{Eigen},(\mathrm{human})}, A^{\mathrm{Eigen},(\mathrm{rat})})$ was prepared in which the adjacencies are defined as (Eq. 3):

$$\mathrm{Preserv}_{IJ}^{(\mathrm{human},\mathrm{rat})} = 1 - \left| A_{IJ}^{\mathrm{Eigen},(\mathrm{human})} - A_{IJ}^{\mathrm{Eigen},(\mathrm{rat})} \right| \qquad (3)$$

where $A_{IJ}^{\mathrm{Eigen},(s)}$ is computed from $E_I^{(s)}$ and $E_J^{(s)}$, the eigengenes of the I-th and J-th modules in data set $s$. High values of $\mathrm{Preserv}_{IJ}^{(\mathrm{human},\mathrm{rat})}$ indicate robust preservation, across the two networks, of the correlation between module eigengenes I and J. The scaled connectivity $C_I$, or degree, of the I-th module is the mean connection strength with all other eigengenes; for the preservation network it is given by (Eq. 4):

$$C_I^{\mathrm{Preserv}} = \frac{1}{N - 1} \sum_{J \neq I} \mathrm{Preserv}_{IJ}^{(\mathrm{human},\mathrm{rat})} \qquad (4)$$

where $N$ denotes the number of MEs. This value is close to 1 if the correlations between the I-th eigengene and all other eigengenes are preserved across the two networks. The density of the preservation network, $D(\mathrm{Preserv}^{(\mathrm{human},\mathrm{rat})})$, defined as the average scaled connectivity, is given by (Eq. 5):

$$D\left(\mathrm{Preserv}^{(\mathrm{human},\mathrm{rat})}\right) = \frac{1}{N} \sum_{I=1}^{N} C_I^{\mathrm{Preserv}} \qquad (5)$$

Values of $D(\mathrm{Preserv}^{(\mathrm{human},\mathrm{rat})})$ that are large, approaching 1, indicate strong preservation of the correlations between all eigengene pairs across the two networks (human and rat). Procedures for detecting modules in networks can also be applied to eigengene networks to find modules of highly positively correlated eigengenes, termed “meta-modules”. 41
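The eigengene preservation calculation can be sketched in a few lines. The sketch assumes the standard eigengene-network definitions: eigengene adjacency (1 + cor)/2, preservation adjacency 1 − |A_human − A_rat|, scaled connectivity as the mean preservation adjacency with all other eigengenes, and density as the mean scaled connectivity. Inputs are samples x modules eigengene matrices with the consensus modules in the same column order for both species. The function names are hypothetical.

```python
import numpy as np

def eigengene_adjacency(ME):
    """(1 + cor(E_I, E_J)) / 2 for a samples x modules eigengene matrix."""
    return (1.0 + np.corrcoef(ME, rowvar=False)) / 2.0

def preservation_network(ME_human, ME_rat):
    """Return preservation adjacencies P, scaled connectivities C and density D."""
    P = 1.0 - np.abs(eigengene_adjacency(ME_human) - eigengene_adjacency(ME_rat))
    N = P.shape[0]
    C = (P.sum(axis=1) - np.diag(P)) / (N - 1)  # mean over all J != I
    D = C.mean()
    return P, C, D
```

The two species may contribute different numbers of samples, because only the modules x modules correlation structure is compared.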

Module–trait relationships

To determine whether modules were associated with chondrocyte phenotypes or traits, the MEs were correlated with a binary matrix coding the membership of each sample in a phenotypic trait or experimental group (1 = member, 0 = non-member). Multidimensional scaling plots of each meta-set were used to define clusters of samples, rather than using the phenotypic data from the published data sets to define sample groups.
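The module–trait step can be illustrated as follows. Group labels (here assumed to come from the sample clusters) are coded as a 0/1 membership matrix, and every eigengene is correlated with every column. With a binary column this Pearson correlation is the point-biserial correlation. This is a minimal NumPy sketch with hypothetical names, and significance testing is left out.

```python
import numpy as np

def binary_trait_matrix(groups):
    """One column per group level: 1 = member, 0 = non-member."""
    levels = sorted(set(groups))
    B = np.array([[1.0 if g == level else 0.0 for level in levels] for g in groups])
    return levels, B

def module_trait_correlation(ME, B):
    """Pearson correlation of each eigengene (column of ME) with each trait column of B."""
    ME_std = (ME - ME.mean(axis=0)) / ME.std(axis=0)
    B_std = (B - B.mean(axis=0)) / B.std(axis=0)
    return ME_std.T @ B_std / ME.shape[0]  # modules x traits
```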

Network visualization and annotation

The module network structure, consisting of nodes (genes filtered for high module membership, kME) and edges (weighted intramodular connections based upon the topological overlap matrix), was represented graphically using Cytoscape (v3.3.0, January 2016). 43 Only nodes with high degree were retained for clarity. Enrichment of protein–protein interaction networks was assessed using STRING v10. 44, 45 Pathway enrichment analysis was undertaken for each consensus module using the ConsensusPathDB platform (release 31, September 2015). 46 Modules were functionally annotated using DAVID. 47
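Over-representation analysis of the kind used for pathway enrichment is often based on a hypergeometric test. This asks how likely it is that at least k of a module's n genes fall in a pathway of K genes, given a background of N genes. The sketch below assumes that form of test and uses exact integer arithmetic. It is not the exact scoring used by ConsensusPathDB or DAVID (DAVID, for example, uses a modified Fisher exact test), and the function name is hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(module_genes, pathway_genes, universe):
    """Return (overlap k, P(X >= k)) for a module vs. a pathway in a gene universe."""
    universe = set(universe)
    module = set(module_genes) & universe
    pathway = set(pathway_genes) & universe
    N, K, n = len(universe), len(pathway), len(module)
    k = len(module & pathway)
    # Upper tail of the hypergeometric distribution, summed exactly.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)
```

In practice, p-values from many modules and pathways would also need multiple-testing correction.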

Class prediction analysis

Class prediction analysis was performed using the pamr package implemented in R 48 (see Supplementary Fig. 3). This method employs a “nearest shrunken centroids” approach to identify the cohorts of genes that best characterize classes in high-dimensional data. For every gene, the difference between its average expression in a class and its overall average is divided by the within-class SD for that gene; this gives the standardized centroid for each class. In nearest centroid classification, the gene expression profile of a new (test) sample is compared with each class centroid, and the new sample is assigned to the class whose centroid is nearest by squared distance. The nearest shrunken centroid modification “shrinks” all class centroids toward the overall centroid by a threshold value; the threshold is chosen by 10-fold cross-validation over a range of threshold values. Genes from modules with important trait associations were used as the selected features for class prediction, where the two classes were “healthy” and “osteoarthritic” cartilage. Classification training was performed on gene expression data (Illumina) from an independent data set 49 profiling healthy (n = 7) and osteoarthritic (n = 33) cartilage. This was repeated for each of ten randomized test and training sets. Receiver operating characteristic (ROC) curve and area under the curve (AUC) analyses were undertaken for each gene signature using the ROCR package in R. 50
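The nearest shrunken centroid rule can be sketched in NumPy as follows. The sketch reflects our reading of the shrunken-centroid formulation behind pamr and rests on several assumptions. The gene-wise pooled within-class SD is offset by its median (s0). The per-class scale factor is sqrt(1/n_k − 1/n). Standardized centroid deviations are soft-thresholded by delta. Prediction minimizes a standardized squared distance with a class-prior term. The cross-validation used to pick delta is omitted, and all names are hypothetical.

```python
import numpy as np

def fit_shrunken_centroids(X, y, delta):
    """Fit shrunken class centroids. X: samples x genes, y: class labels, delta: threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n = X.shape[0]
    n_k = np.array([(y == k).sum() for k in classes])
    overall = X.mean(axis=0)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class SD for every gene, plus a median offset s0.
    resid = np.vstack([X[y == k] - centroids[i] for i, k in enumerate(classes)])
    s = np.sqrt((resid ** 2).sum(axis=0) / (n - len(classes)))
    s0 = np.median(s)
    m = np.sqrt(1.0 / n_k - 1.0 / n)
    scale = m[:, None] * (s + s0)
    d = (centroids - overall) / scale  # standardized centroid deviations
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)  # soft thresholding
    return {
        "classes": classes,
        "centroids": overall + scale * d_shrunk,
        "gene_scale": s + s0,
        "priors": n_k / n,
        "active": np.any(d_shrunk != 0.0, axis=0),  # genes that still discriminate
    }

def predict(model, X_new):
    """Assign each sample to the class with the smallest standardized distance."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    diff = X_new[:, None, :] - model["centroids"][None, :, :]
    dist = (diff ** 2 / model["gene_scale"] ** 2).sum(axis=2)
    score = dist - 2.0 * np.log(model["priors"])
    return model["classes"][np.argmin(score, axis=1)]
```

Larger delta values drive more genes to zero deviation, which is how the method performs feature selection. With a very large delta every centroid collapses onto the overall centroid.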

Code availability

Code contributing to the analysis presented here is available in Supplementary Methods 1 with supporting processed and annotated data files for rat and human.


Chapter 1
Scientific Thinking
Your best pathway to understanding the world
Science is a process for understanding the world.
1.1 Scientific thinking and biological literacy are essential in the modern world.
A beginner’s guide to scientific thinking.
1.2 Thinking like a scientist: how do you use the scientific method?
1.3 Element 1: Make observations.
1.4 Element 2: Formulate a hypothesis.
1.5 Element 3: Devise a testable prediction.
1.6 Element 4: Conduct a critical experiment.
1.7 Element 5: Draw conclusions, make revisions.
Well-designed experiments are essential to testing hypotheses.
1.8 Controlling variables makes experiments more powerful.
1.9 This is how we do it: Is arthroscopic surgery for arthritis of the knee beneficial?
1.10 We’ve got to watch out for our biases.
1.11 What are theories? When do hypotheses become theories?
Scientific thinking can help us make better decisions.
1.12 Visual displays of data can help us understand phenomena.
1.13 Statistics can help us make decisions.
1.14 Pseudoscience and anecdotal evidence can obscure the truth.
1.15 There are limits to what science can do.
What are the major themes in biology?
1.16 Important themes unify and connect diverse topics in biology.

Chapter 2
The Chemistry of Biology: Atoms, molecules, and their roles in supporting life
Atoms, molecules, and compounds make life possible.
2.1 Everything is made of atoms.
2.2 An atom’s electrons determine whether (and how) the atom will bond with other atoms.
2.3 Atoms can bond together to form molecules and compounds.
Water has features that enable it to support all life.
2.4 Hydrogen bonds make water cohesive.
2.5 Hydrogen bonds between molecules give water properties critical to life.
Living systems are highly sensitive to acidic and basic conditions.
2.6 The pH of a fluid is a measure of how acidic or basic the solution is.
2.7 This is how we do it: Do antacids impair digestion and increase the risk of food allergies?

Chapter 3
Molecules of Life:
Macromolecules can store energy and information and serve as building blocks
Macromolecules are the raw materials for life.
3.1 Carbohydrates, lipids, proteins, and nucleic acids are essential to organisms.
Carbohydrates can fuel living machines.
3.2 Carbohydrates include macromolecules that function as fuel.
3.3 Many complex carbohydrates are time-release packets of energy.
3.4 Not all carbohydrates are digestible by humans.
Lipids serve several functions.
3.5 Lipids store energy for a rainy day.
3.6 Dietary fats differ in degrees of saturation.
3.7 This is how we do it: How do trans fatty acids affect heart health?
3.8 Cholesterol and phospholipids are used to build sex hormones and membranes.
Proteins are building blocks.
3.9 Proteins are bodybuilding macromolecules essential in our diet.
3.10 A protein’s function is influenced by its three-dimensional shape.
3.11 Enzymes are proteins that speed up chemical reactions.
3.12 Enzyme activity is influenced by chemical and physical factors.
Nucleic acids encode information on how to build and run a body.
3.13 Nucleic acids are macromolecules that store information.
3.14 DNA holds the genetic information to build an organism.
3.15 RNA is a universal translator, reading DNA and directing protein production.

Chapter 4
The smallest part of you
What is a cell?
4.1 All organisms are made of cells.
4.2 Prokaryotic cells are structurally simple but extremely diverse.
4.3 Eukaryotic cells have compartments with specialized functions.
Cell membranes are gatekeepers.
4.4 Every cell is bordered by a plasma membrane.
4.5 Faulty membranes can cause diseases.
4.6 Membrane surfaces have a “fingerprint” that identifies the cell.
4.7 Connections between cells hold them in place and allow for communication.
Molecules move across membranes in several ways.
4.8 In passive transport, molecules diffuse spontaneously across a membrane.
4.9 In active transport, cells use energy to transport molecules across a membrane.
4.10 Endocytosis and exocytosis are used to move large particles into and out of cells.
Important landmarks distinguish eukaryotic cells.
4.11 The nucleus is the cell’s genetic control center.
4.12 The cytoskeleton provides support and can generate motion.
4.13 Mitochondria are the cell’s energy converters.
4.14 This is how we do it: Can cells change their composition to adapt to their environment?
4.15 Lysosomes are the cell’s garbage disposals.
4.16 In the endomembrane system, cells build, process, and package molecules, and disarm toxins.
4.17 The cell wall provides additional protection and support for plant cells.
4.18 Vacuoles are multipurpose storage sacs for cells.
4.19 Chloroplasts are the plant cell’s solar power plant.

Chapter 5
From the sun to you in just two steps

Energy flows from the sun and through all life on earth.
5.1 Can cars run on french fry oil?
5.2 Energy has two forms: kinetic and potential.
5.3 As energy is captured and converted, the amount of energy available to do work decreases.
5.4 ATP molecules are like rechargeable batteries floating around in all living cells.
Photosynthesis uses energy from sunlight to make food.
5.5 Where does plant matter come from?
5.6 Photosynthesis takes place in the chloroplasts.
5.7 Light energy travels in waves.
5.8 Photons cause electrons in chlorophyll to enter an excited state.
5.9 The energy of sunlight is captured as chemical energy.
5.10 The captured energy of sunlight is used to make sugar.
5.11 We can use plants adapted to water scarcity in the battle against world hunger.
Living organisms extract energy through cellular respiration.
5.12 Cellular respiration: the big picture.
5.13 Glycolysis is the universal energy-releasing pathway.
5.14 The citric acid cycle extracts energy from sugar.
5.15 ATP is built in the electron transport chain.
5.16 This is how we do it: Can we combat jet lag with NADH pills?
There are alternative pathways for acquiring energy.
5.17 Beer, wine, and spirits are by-products of cellular metabolism in the absence of oxygen.

Chapter 6
DNA and Gene Expression

DNA: what is it, and what does it do?
6.1 Knowledge about DNA is helping to increase justice in the world.
6.2 DNA contains instructions for the development and functioning of all living organisms.
6.3 Genes are sections of DNA that contain instructions for making proteins.
6.4 Not all DNA contains instructions for making proteins.
6.5 How do genes work? An overview.
Information in DNA directs the production of the molecules that make up an organism.
6.6 In transcription, the information coded in DNA is copied into mRNA.
6.7 In translation, the mRNA copy of the information from DNA is used to build functional molecules.
6.8 Genes are regulated in several ways.
Damage to the genetic code has a variety of causes and effects.
6.9 What causes a mutation and what are the consequences?
6.10 This is how we do it: Does sunscreen use reduce skin cancer risk?
6.11 Faulty genes, coding for faulty enzymes, can lead to sickness.

Chapter 7
Harnessing the genetic code

Living organisms can be manipulated for practical benefits.
7.1 What is biotechnology and what does it promise?
7.2 A few important processes underlie many biotechnology applications.
7.3 CRISPR is a tool with the potential to revolutionize medicine.
Biotechnology is producing improvements in agriculture.
7.4 Biotechnology can improve food nutrition and farming practices.
7.5 Rewards, with risks: what are the possible dangers of genetically modified foods?
7.6 This is how we do it: How can we determine whether GMOs are safe?
Biotechnology has the potential for improving human health.
7.7 Biotechnology can help treat diseases and produce medicines.
7.8 Gene therapy: biotechnology can help diagnose and prevent genetic diseases, but has had limited success in curing them.
7.9 Cloning offers both opportunities and perils.
Biotechnology can improve the criminal justice system.
7.10 The uses (and abuses) of DNA fingerprinting.

Chapter 8
Chromosomes and Cell Division

There are different types of cell division.
8.1 Immortal cells can spell trouble.
8.2 Some chromosomes are circular; others are linear.
8.3 There is a time for everything in the eukaryotic cell cycle.
8.4 Cell division is preceded by chromosome replication.
Mitosis replaces worn-out old cells with fresh new duplicates.
8.5 Overview: mitosis leads to duplicate cells.
8.6 The details: mitosis is a four-stage process.
8.7 Cell division out of control may result in cancer.
Meiosis generates sperm and eggs and a great deal of variation.
8.8 Overview: sexual reproduction requires special cells made by meiosis.
8.9 The details: Sperm and egg are produced by meiosis.
8.10 Male and female gametes are produced in slightly different ways.
8.11 Crossing over and meiosis are important sources of variation.
8.12 What are the costs and benefits of sexual reproduction?
There are sex differences in the chromosomes.
8.13 How is sex determined in humans (and other species)?
8.14 This is how we do it: Can the environment determine the sex of a turtle’s offspring?
Deviations from the typical chromosome number lead to problems.
8.15 Down syndrome can be detected before birth.
8.16 Life is possible with too many or too few sex chromosomes.

Chapter 9
Genes and Inheritance
Family resemblance: how traits are inherited

Why (and how) do offspring resemble their parents?
9.1 Your mother and father each contribute to your genetic makeup.
9.2 Some traits are controlled by a single gene.
9.3 Mendel’s research in the nineteenth century informs our current understanding of genetics.
9.4 Segregation: you have two copies of each gene but each sperm or egg you produce has just one copy.
9.5 Observing an individual’s phenotype is not sufficient to determine its genotype.
Tools of genetics highlight a central role for chance.
9.6 Using probability we can make predictions in genetics.
9.7 A test-cross enables us to figure out which alleles an individual carries.
9.8 We use pedigrees to decipher and predict the inheritance patterns of genes.
How are genotypes translated into phenotypes?
9.9 The effects of both alleles in a genotype can show up in the phenotype.
9.10 Blood types: Some genes have more than two alleles.
9.11 How are continuously varying traits such as height influenced by genes?
9.12 Sometimes one gene influences multiple traits.
9.13 Sex-linked traits differ in their patterns of expression in males and females.
9.14 This is how we do it: What is the cause of male-pattern baldness?
9.15 Environmental effects: identical twins are not identical.
Some genes are linked together.
9.16 Most traits are passed on as independent features.
9.17 Genes on the same chromosome are sometimes inherited together.

Chapter 10
Evolution and Natural Selection

Evolution is an ongoing process.
10.1 We can see evolution occurring right before our eyes.
Darwin journeyed to a new idea.
10.2 Before Darwin, many believed that species had been created all at once and were unchanging.
10.3 Observing living organisms and fossils around the world, Darwin developed a theory of evolution.
Four mechanisms can give rise to evolution.
10.4 Evolution occurs when the allele frequencies in a population change.
10.5 Mechanism 1: Mutation—a direct change in the DNA of an individual—is the ultimate source of all genetic variation.
10.6 Mechanism 2: Genetic drift is a random change in allele frequencies in a population.
10.7 Mechanism 3: Migration into or out of a population may change allele frequencies.
10.8 Mechanism 4: When three simple conditions are satisfied, evolution by natural selection is occurring.
10.9 A trait does not decrease in frequency simply because it is recessive.
Populations of organisms can become adapted to their environments.
10.10 Traits causing some individuals to have more offspring than others become more prevalent in the population.
10.11 Populations can become better matched to their environment through natural selection.
10.12 There are several ways that natural selection can change the traits in a population.
10.13 This is how we do it: Why do zebras have stripes?
10.14 Natural selection can cause the evolution of complex traits and behaviors.
The evidence for evolution is overwhelming.
10.15 The fossil record documents the process of natural selection.
10.16 Geographic patterns of species distributions reflect species’ evolutionary histories.
10.17 Comparative anatomy and embryology reveal common evolutionary origins.
10.18 Molecular biology reveals that common genetic sequences link all life forms.
10.19 Experiments and real-world observations reveal evolution in progress.

Chapter 11
Evolution and Behavior
Communication, cooperation, and conflict in the animal world

Behaviors, like other traits, can evolve.
11.1 Behavior has adaptive value, just like other traits.
11.2 Some behaviors are innate.
11.3 Some behaviors must be learned (and some are learned more easily than others).
11.4 Complex-appearing behaviors don’t require complex thought to evolve.
Cooperation, selfishness, and altruism can be better understood with an evolutionary approach.
11.5 “Kindness” can be explained.
11.6 Apparent altruism toward relatives can evolve through kin selection.
11.7 Apparent altruism toward unrelated individuals can evolve through reciprocal altruism.
11.8 In an “alien” environment, adaptations produced by natural selection may no longer be adaptive.
11.9 Selfish genes win out over group selection.
Sexual conflict can result from unequal reproductive investment by males and females.
11.10 Males and females invest differently in reproduction.
11.11 Males and females are vulnerable at different stages of the reproductive exchange.
11.12 Competition and courtship can help males and females secure reproductive success.
11.13 Mate guarding can protect a male’s reproductive investment.
11.14 This is how we do it: When paternity uncertainty seems greater, is paternal care reduced?
11.15 Monogamy versus polygamy: mating behaviors vary across human and animal cultures.
11.16 Sexual dimorphism is an indicator of a population’s mating behavior.
Communication and the design of signals evolve.
11.17 Animal communication and language abilities evolve.
11.18 Honest signals reduce deception.

Chapter 12
The Origin and Diversification of Life on Earth
Understanding biodiversity

Life on earth most likely originated from non-living materials.
12.1 Cells and self-replicating systems evolved together to create the first life.
12.2 This is how we do it: Could life have originated in ice, rather than in a “warm little pond”?
Species are the basic units of biodiversity.
12.3 What is a species?
12.4 Species are not always easily defined.
12.5 How do new species arise?
Evolutionary trees help us conceptualize and categorize biodiversity.
12.6 The history of life can be imagined as a tree.
12.7 Evolutionary trees show ancestor–descendant relationships.
12.8 Similar structures don’t always reveal common ancestry.
Macroevolution gives rise to great diversity.
12.9 Macroevolution is evolution above the species level.
12.10 Adaptive radiations are times of extreme diversification.
12.11 There have been several mass extinctions on earth.
An overview of the diversity of life on earth: organisms are divided into three domains.
12.12 All living organisms are classified into one of three groups.
12.13 The bacteria domain has tremendous biological diversity.
12.14 The archaea domain includes many species living in extreme environments.
12.15 The eukarya domain consists of four kingdoms: plants, animals, fungi, and protists.

Chapter 13
Animal Diversification
Visibility in motion

Animals are just one branch of the eukarya domain.
13.1 What is an animal?
13.2 There are no “higher” or “lower” species.
13.3 Four key distinctions divide the animals.
Invertebrates (animals without a backbone) are the most diverse group of animals.
13.4 Sponges are animals that lack tissues and organs.
13.5 Jellyfishes and other cnidarians are among the most poisonous animals in the world.
13.6 Flatworms, roundworms, and segmented worms come in all shapes and sizes.
13.7 Most mollusks live in shells.
13.8 Arthropods are the most diverse group of animals.
13.9 This is how we do it: How many species are there on earth?
13.10 Flight and metamorphosis produced the greatest adaptive radiation ever.
13.11 Echinoderms are vertebrates’ closest invertebrate relatives.
The phylum Chordata includes vertebrates (animals with a backbone).
13.12 All vertebrates are members of the phylum Chordata.
13.13 The movement onto land required several adaptations. All terrestrial vertebrates are tetrapods.
13.14 Amphibians live a double life.
13.15 Birds are reptiles in which feathers evolved.
13.16 Mammals are animals that have hair and produce milk.
Humans and our closest relatives are primates.
13.17 We are descended from arboreal primates, but our human ancestors left the trees.
13.18 How did we get here? The past 200,000 years of human evolution.

Chapter 14
Plant and Fungi Diversification
Where did all the plants and fungi come from?

Plants face multiple challenges.
14.1 What is a plant?
14.2 Colonizing land brought new opportunities and new challenges.
14.3 Non-vascular plants lack vessels for transporting nutrients and water.
14.4 The evolution of vascular tissue made large plants possible.
The evolution of the seed opened new worlds to plants.
14.5 What is a seed?
14.6 With the evolution of the seed, gymnosperms became the dominant plants.
14.7 Conifers include the tallest and longest-living trees.
Flowering plants are the most diverse plants.
14.8 Angiosperms are the dominant plants today.
14.9 A flower is nothing without a pollinator.
14.10 Angiosperms improve seeds with double fertilization.
Plants and animals have a love-hate relationship.
14.11 Flowering plants use fruits to entice animals to disperse their seeds.
14.12 Unable to escape, plants must resist predation in other ways.
Fungi and plants are partners but not close relatives.
14.13 Fungi are more closely related to animals than they are to plants.
14.14 Fungi have some structures in common but are incredibly diverse.
14.15 Most plants have fungal symbionts.
14.16 This is how we do it: Can beneficial fungi save our chocolate?

Chapter 15
Microbe Diversification
Bacteria, archaea, protists, and viruses: the unseen world

There are microbes in all three domains.
15.1 Not all microbes are closely related evolutionarily.
15.2 Microbes are the simplest but most successful organisms on earth.
Bacteria may be the most diverse of all organisms.
15.3 What are bacteria?
15.4 Metabolic diversity among the bacteria is extreme.
Bacteria can hurt or help human health.
15.5 Many bacteria are beneficial to humans.
15.6 This is how we do it: Are bacteria thriving on our office desks?
15.7 Only a small percentage of microbial species cause diseases, but they kill millions of people.
15.8 Bacteria’s resistance to drugs can evolve quickly.
Archaea define a prokaryotic domain distinct from bacteria.
15.9 Archaea are profoundly different from bacteria.
15.10 Archaea thrive in habitats too extreme for most other organisms.
Most protists are single-celled eukaryotes.
15.11 The first eukaryotes were protists.
15.12 There are animal-like protists, fungus-like protists, and plant-like protists.
15.13 Some protists are very harmful to human health.
At the border between living and non-living, viruses do not fit into any domain.
15.14 Viruses are not exactly living organisms.
15.15 Viruses infect a wide range of organisms and are responsible for many diseases.
15.16 HIV illustrates the difficulty of controlling infectious viruses.

Chapter 16
Population Ecology
Planet at capacity: patterns of population growth

Population ecology is the study of how populations interact with their environments.
16.1 What is ecology?
16.2 Populations can grow quickly for a while, but not forever.
16.3 A population’s growth is limited by its environment.
16.4 Some populations cycle between large and small.
16.5 Maximum sustainable yield is useful but nearly impossible to implement.
A life history is like a species summary.
16.6 Life histories are shaped by natural selection.
16.7 There are trade-offs between growth, reproduction, and longevity.
16.8 This is how we do it: Rapid growth comes at a cost.
16.9 Populations can be depicted in life tables and survivorship curves.
Ecology influences the evolution of aging in a population.
16.10 Things fall apart: what is aging and why does it occur?
16.11 What determines the average longevity in different species?
16.12 Can we slow down the process of aging?
The human population is growing rapidly.
16.13 Age pyramids reveal much about a population.
16.14 Demographic transitions often occur as less developed countries become more developed.
16.15 Human population growth: how high can it go?

Chapter 17
Ecosystems and Communities
Organisms and their environments

Ecosystems have living and non-living components.
17.1 What are ecosystems?
17.2 Biomes are the world’s largest ecosystems, each determined by temperature and rainfall.
Interacting physical forces create climate and weather patterns.
17.3 Global air circulation patterns create deserts and rain forests.
17.4 Local topography influences the climate and weather.
17.5 Ocean currents influence the climate and weather.
Energy and chemicals flow within ecosystems.
17.6 Energy flows from producers to consumers.
17.7 Energy pyramids reveal the inefficiency of food chains.
17.8 Essential chemicals cycle through ecosystems.
Species interactions influence the structure of communities.
17.9 A species’ role in a community is defined as its niche.
17.10 Interacting species evolve together.
17.11 Competition can be hard to see, yet it influences community structure.
17.12 Predation produces adaptation in both predators and their prey.
17.13 Parasitism is a form of predation.
17.14 Not all species interactions are negative.
17.15 This is how we do it: Investigating ants, plants, and the unintended consequences of environmental intervention.
Communities can change or remain stable over time.
17.16 Primary succession and secondary succession describe how communities can change over time.
17.17 Some species have greater influence than others within a community.

Chapter 18
Conservation and Biodiversity
Human influences on the environment

Biodiversity is valuable in many ways.
18.1 Biodiversity has intrinsic and extrinsic value.
18.2 This is how we do it: When 200,000 tons of methane disappears, how do you find it?
18.3 Biodiversity occurs at multiple levels.
18.4 Where does the greatest biodiversity occur?
Extinction reduces biodiversity.
18.5 There are multiple causes of extinction.
18.6 We are in the midst of a mass extinction.
Human activities can damage the environment.
18.7 The effects of some ecosystem disturbances are reversible and others are not.
18.8 Human activities can damage the environment: 1. Introduced non-native species.
18.9 Human activities can damage the environment: 2. Acid rain.
18.10 Human activities can damage the environment: 3. Greenhouse gas releases.
18.11 Human activities can damage the environment: 4. Tropical deforestation.
We can develop strategies for effective conservation.
18.12 Reversal of ozone layer depletion is a success story.
18.13 We must prioritize which species should be preserved.
18.14 There are multiple effective strategies for preserving biodiversity.

Chapter 19
Plant Structure and Nutrient Transport
How plants function, and why we need them

Plants are a diverse group of organisms with multiple pathways to evolutionary success.
19.1 Older, taller, bigger: plants are extremely diverse.
19.2 Monocots and eudicots are the two major groups of flowering plants.
19.3 The plant body is organized into three basic tissue types.
Most plants have common structural features.
19.4 Roots anchor the plant and take up water and minerals.
19.5 Stems are the backbone of the plant.
19.6 Leaves feed the plant.
19.7 Several structures help plants resist water loss.
Plants harness sunlight and obtain usable chemical elements from the environment.
19.8 Four factors are necessary for plant growth.
19.9 Nutrients cycle from soil to organisms and back again.
19.10 Plants acquire essential nitrogen with the help of bacteria.
19.11 This is how we do it: Carnivorous plants can consume prey and undergo photosynthesis.
Plants transport water, sugar, and minerals through vascular tissue.
19.12 Plants take up water and minerals through their roots.
19.13 Water and minerals are distributed through the xylem.
19.14 Sugar and other nutrients are distributed through the phloem.

Chapter 20
Growth, Reproduction, and Environmental Responses in Plants
Problem solving with flowers, wood, and hormones

Plants can reproduce sexually and asexually.
20.1 Plant evolution has given rise to two methods of reproduction.
20.2 Many plants can reproduce asexually when necessary.
20.3 Plants can reproduce sexually, even though they cannot move.
20.4 Most plants can avoid self-fertilization.
Pollination, fertilization, and seed dispersal often depend on help from other organisms.
20.5 Pollen grains and embryo sacs contain the plant gametes.
20.6 Plants need help getting the male gamete to the female gamete for fertilization.
20.7 This is how we do it: Does it matter how much nectar a flower produces?
20.8 Fertilization occurs after pollination.
20.9 Ovules develop into seeds, and ovaries into fruits.
Plants have two types of growth, usually enabling lifelong increases in length and thickness.
20.10 How do seeds germinate and grow?
20.11 Plants grow differently from animals.
20.12 Primary plant growth occurs at the apical meristems.
20.13 Secondary growth produces wood.
Hormones regulate growth and development.
20.14 Hormones help plants respond to their environments.
20.15 Gibberellins and auxins stimulate growth.
20.16 Other plant hormones regulate flowering, fruit ripening, and responses to stress.
External cues trigger internal responses.
20.17 Tropisms influence plants’ direction of growth.
20.18 Plants have internal biological clocks.
20.19 With photoperiodism and dormancy, plants prepare for winter.

Chapter 21
Introduction to Animal Physiology
Principles of animal organization and function

Animal body structures reflect their functions.
21.1 Animal organ systems are built from four tissue types with distinct functions.
21.2 Connective tissue provides support.
21.3 Epithelial tissue covers and protects most inner and outer surfaces of the body.
21.4 Muscle tissue enables movement.
21.5 Nervous tissue transmits information.
21.6 Each organ system performs a coordinated set of related body functions.
Animals maintain a steady internal environment.
21.7 Animal bodies function best within a narrow range of internal conditions.
21.8 Animals regulate their internal environment through homeostasis.
How does homeostasis work?
21.9 Negative and positive feedback systems influence homeostasis.
21.10 Animals employ various mechanisms to regulate body temperature.
21.11 This is how we do it: Why do we yawn?
21.12 Animals regulate their water balance within a narrow range.
21.13 In humans, the kidneys regulate water balance.

Chapter 22
Circulation and Respiration
Transporting fuel, raw materials, and gases into, out of, and around the body

The circulatory system is the chief route of distribution in animals.
22.1 What is a circulatory system, and why is one needed?
22.2 Circulatory systems can be open or closed.
22.3 Vertebrates have several different types of closed circulatory systems.
The human circulatory system consists of a heart, blood vessels, and blood.
22.4 Blood flows through the four chambers of the human heart.
22.5 Electrical activity in the heart generates the heartbeat.
22.6 Blood flows out of and back to the heart in blood vessels.
22.7 This is how we do it: Does thinking make your head heavier?
22.8 Blood is a mixture of cells and fluid.
22.9 Blood pressure is a key measure of heart health.
22.10 Cardiovascular disease is a leading cause of death in the United States.
22.11 The lymphatic system plays a supporting role in circulation.
The respiratory system enables gas exchange in animals.
22.12 Oxygen and carbon dioxide must get into and out of the circulatory system.
22.13 Oxygen is transported while bound to hemoglobin.
22.14 Gas exchange takes place in the gills of aquatic vertebrates.
22.15 Gas exchange takes place in the lungs of terrestrial vertebrates.
22.16 Muscles control the flow of air into and out of the lungs.
22.17 Birds have unusually efficient respiratory systems.
22.18 Adaptation or acclimation to low-oxygen conditions at high elevation improves oxygen delivery.

Chapter 23
Nutrition and Digestion
At rest and at play: optimizing human physiological functioning

Food provides the raw materials for growth and the fuel to make it happen.
23.1 Why do organisms need food?
23.2 Animals have a variety of diets.
23.3 Calories count: organisms need sufficient energy.
Nutrients are grouped into six categories.
23.4 Water is an essential nutrient.
23.5 Proteins in food are broken down to build proteins in the body.
23.6 Carbohydrates and fats provide bodies with energy and more.
23.7 Vitamins and minerals are necessary for good health.
We extract energy and nutrients from food.
23.8 We convert food into nutrients in four steps.
23.9 Ingestion is the first step in the breakdown of food.
23.10 Digestion dismantles food into usable parts.
23.11 Absorption moves nutrients from your gut to your cells.
23.12 Elimination removes unusable materials from your body.
23.13 Some animals have alternative means for processing their food.
What we eat profoundly affects our health.
23.14 What constitutes a healthy diet?
23.15 This is how we do it: Does human judgment depend on blood sugar?
23.16 Obesity can result from too much of a good thing.
23.17 Weight-loss diets are a losing proposition.
23.18 Diabetes is caused by the body’s inability to regulate blood sugar effectively.

Chapter 24
Nervous and Motor Systems
Actions, reactions, sensations, and addictions: meet your nervous system

What is the nervous system?
24.1 Why do we need a nervous system?
24.2 Neurons are the building blocks of all nervous systems.
24.3 The vertebrate nervous system consists of the peripheral and central nervous systems.
How do neurons work?
24.4 Dendrites receive external stimuli.
24.5 The action potential propagates a signal down the axon.
24.6 At the synapse, a neuron interacts with another cell.
24.7 There are many types of neurotransmitters.
Our senses detect and transmit stimuli.
24.8 Sensory receptors are our windows to the world around us.
24.9 Taste: an action potential serves up a taste sensation to the brain.
24.10 Smell: receptors in the nose detect airborne chemicals.
24.11 Vision: seeing is the perception of light by the brain.
24.12 Hearing: sound waves are collected by the ears and stimulate auditory neurons.
24.13 Touch: the brain perceives pressure, temperature, and pain.
The muscular and skeletal systems enable movement.
24.14 Muscles generate force through contraction.
24.15 The skeletal system functions in support, movement, and protection.
The brain is organized into distinct structures dedicated to specific functions.
24.16 The brain has several distinct regions.
24.17 Specific brain areas are involved in the processes of learning, language, and memory.
24.18 This is how we do it: Can intense cognitive training induce brain growth?
Drugs can hijack pleasure pathways.
24.19 Our nervous system can be tricked by chemicals.
24.20 A brain slows down when it needs sleep. Caffeine wakes it up.
24.21 Alcohol interferes with many different neurotransmitters.

Chapter 25
Mood, emotions, growth, and more: hormones as master regulators

Hormones are chemical messengers regulating cell functions.
25.1 The “cuddle” chemical: oxytocin increases trust and enhances pair bonding.
25.2 Hormones travel through the circulatory system to influence cells elsewhere in the body.
25.3 Hormones can regulate target tissues in different ways.
Hormones are produced in glands throughout the body.
25.4 The hypothalamus controls secretions of the pituitary.
25.5 Other endocrine glands also produce and secrete hormones.
Hormones influence nearly every facet of an organism.
25.6 Hormones can affect physique and physical performance.
25.7 Hormones can affect mood.
25.8 Hormones can affect behavior.
25.9 Hormones can affect cognitive performance.
25.10 Hormones can affect health and longevity.
Environmental contaminants can disrupt normal hormone functioning.
25.11 Chemicals in the environment can mimic or block hormones, with disastrous results.
25.12 This is how we do it: Would you like your receipt? (Maybe not.)

Chapter 26
Reproduction and Development
From two parents to one embryo to one baby

How do animals reproduce?
26.1 Reproductive options (and ethical issues) are on the rise.
26.2 There are costs and benefits to having a partner: sexual versus asexual reproduction.
26.3 Fertilization can occur inside or outside a female’s body.
Male and female reproductive systems have important similarities and differences.
26.4 Sperm are made in the testes.
26.5 There is unseen conflict among sperm cells.
26.6 This is how we do it: Can males increase sperm investment in response to the presence of another male?
26.7 Eggs are made in the ovaries (and the process can take decades).
26.8 Hormones direct the process of ovulation and the preparation for gestation.
Sex can lead to fertilization, but it can also spread sexually transmitted diseases.
26.9 In fertilization, two cells become one.
26.10 Numerous strategies can help prevent fertilization.
26.11 Sexually transmitted diseases reveal battles between microbes and humans.
Human development occurs in specific stages.
26.12 Early embryonic development occurs during cleavage, gastrulation, and neurulation.
26.13 There are three stages of pregnancy.
26.14 Pregnancy culminates in childbirth and the start of lactation.
Reproductive technology has benefits and dangers.
26.15 Assisted reproductive technologies are promising and perilous.

Chapter 27
Immunity and Health
How the body defends and maintains itself

Your body has different ways to protect you against disease-causing invaders.
27.1 Three lines of defense prevent and fight pathogen attacks.
27.2 External barriers prevent pathogens from entering your body.
27.3 The non-specific division of the immune system recognizes and fights pathogens and signals for additional defenses.
27.4 The non-specific system responds to infection with the inflammatory response and with fever.
Specific immunity develops after exposure to pathogens.
27.5 The specific division of the immune system forms a memory of specific pathogens.
27.6 The structure of antibodies reflects their function.
27.7 Lymphocytes fight pathogens on two fronts.
27.8 Clonal selection helps in fighting infection now and later.
27.9 This is how we do it: Does contact with dogs make kids healthier?
27.10 Cytotoxic T cells and helper T cells serve different functions.
Malfunction of the immune system causes disease.
27.11 Autoimmune diseases occur when the body turns against its own tissues.
27.12 AIDS is an immune deficiency disease.
27.13 Allergies are an inappropriate immune response to a harmless substance.


7. Mechanism of Host Gene Expression Inhibition by Nsp1 of α-CoVs

Although our understanding of the biological functions of α-CoV nsp1 is somewhat limited, several studies have revealed the biological functions of TGEV nsp1. Expression of TGEV nsp1 strongly inhibits reporter gene expression and host protein synthesis in mammalian cells [60]. Expressed TGEV nsp1 is detected in both the nucleus and the cytoplasm [89]. TGEV nsp1 neither binds the 40S ribosomal subunit nor promotes host mRNA degradation, indicating that TGEV nsp1 suppresses host gene expression by a mechanism that differs from that of SARS-CoV nsp1. Interestingly, TGEV nsp1 inhibits protein translation in HeLa cell extracts, whereas it does not affect translation in rabbit reticulocyte lysate (RRL). Furthermore, TGEV nsp1 suppresses translation in RRL supplemented with HeLa S100 post-ribosomal supernatant or HeLa S10 extract [60]. These data suggest that RRL lacks a factor (or factors) needed for TGEV nsp1-mediated translational inhibition, and that both HeLa S10 extract and post-ribosomal HeLa S100 supernatant contain this putative factor. Inactivating the translation-inhibition activity of TGEV nsp1 does not affect virus replication, yet it significantly reduces virus virulence in piglets [52], strongly supporting the possibility that α-CoV nsp1 is a major pathogenic determinant.

Expression of HCoV-229E nsp1 or HCoV-NL63 nsp1 in mammalian cells inhibits reporter gene expression driven by SV40, HSV-TK, or CMV promoters [53,81]. Shen et al. reported that a conserved region of nsp1 (beginning at amino acid position 91) from α-CoVs, including FIPV, PEDV, HCoV-229E, HCoV-NL63, and TGEV, is important for the protein's ability to inhibit host gene expression [52].

Cate Livingstone - Executive Editor, Global Research, Wiley
Georgi Hristov - Assistant Editor, EMBO Press
Vivian Killet - Assistant Editor, EMBO Press
Uta Mackensen - Graphics Editor

Ido Amit is a Professor in Immunology at the Weizmann Institute of Science. His lab uses systems biology approaches and single-cell functional genomics to understand how gene regulatory networks and chromatin dynamics control cell fate and differentiation in the immune system, neurodegeneration and cancer.

Johan Auwerx is Professor at the École Polytechnique Fédérale in Lausanne, Switzerland. He uses cross-species systems genetics and physiology to understand metabolism in health, aging and disease. Much of his work has focused on understanding how genetic and environmental factors, such as diet, exercise and hormones, control mitochondrial metabolism by modulating the activity of transcription factors and their associated coregulators.

Gary Bader is Associate Professor, Molecular Genetics and Computer Science at The Donnelly Centre, University of Toronto. His research is focused on interpreting genomic data using biological pathways to gain a better understanding of cellular function. This work is supported by extensive projects in protein interaction prediction and in pathway and network databases.

Philippe Bastiaens is Director of the Department of Systemic Cell Biology at the Max Planck Institute of Molecular Physiology and professor at the Faculty of Chemistry and Chemical Biology at the TU Dortmund. He experimentally and theoretically explores how the spatial organization of signaling molecules emerges from their collective dynamics and how this defines cellular identity.

Ewan Birney is co-Director of EMBL-EBI, and runs a small research group. He played a vital role in annotating the genome sequences of human, mouse, chicken and several other organisms and led the analysis group for the ENCODE project. Ewan’s main areas of research include functional genomics, assembly algorithms, statistical methods to analyse genomic information and compression of sequence information.

Michael Boutros is a group leader at the German Cancer Research Center (DKFZ) and Professor at Heidelberg University. His research interests are in systems genetics, particularly in relation to cellular signaling in developmental biology and cancer. His group uses large-scale functional approaches to dissect genotype-to-phenotype relationships.

Markus Covert is an Associate Professor in the Department of Bioengineering at Stanford University. His major interests are in developing the technology and computational approaches to facilitate live-cell imaging of mammalian signaling networks, and also in building comprehensive mathematical "whole-cell" models of cellular behavior based on the genotype.

Patrick Cramer is Director at the Max Planck Institute for Biophysical Chemistry in Göttingen, Germany. His laboratory combines functional genomics, computational biology, and structural biology to elucidate the mechanisms of gene transcription and regulation. His research has defined mechanisms of transcription initiation and elongation, and principles of mRNA metabolism in eukaryotic cells.

Ileana Cristea is Professor in the Department of Molecular Biology at Princeton University, and currently serves as President-elect of US HUPO. Her research aims to define mechanisms of cellular host defense against viral pathogens. She has developed methods for characterizing dynamic protein interactions in space and time during the progression of an infection, and her group's studies integrate proteomics, molecular virology, microscopy, bioinformatics, and computational modeling.

Roland Eils heads the Division of Theoretical Bioinformatics at the German Cancer Research Center (DKFZ) in Heidelberg. He is also the founding and managing director of BioQuant, Heidelberg University's systems biology center. His research interest lies in deciphering complex pathomechanisms in diseases through an integrated genomics, imaging and computational modeling approach.

Jan Ellenberg is a group leader at the EMBL in Heidelberg. His research group works on the functional dynamics of nuclear structure during the cell cycle combining advanced quantitative fluorescence microscopy approaches and computer simulations of biological processes.

Michael Elowitz is Assistant Professor, biology and applied physics at Caltech. His team is interested in how genetic circuits operate and evolve in living cells. Using experimental and theoretical approaches, the group studies the behaviour of simple genetic elements, and the circuits they comprise, at the single cell level.

James Ferrell is Professor and Chair of Chemical and Systems Biology at Stanford University School of Medicine. His research focuses on the design principles of signaling systems, particularly the cell cycle.

Eileen Furlong is joint head of the Genome Biology unit at EMBL, Heidelberg. Her research interests focus on transcriptional networks during development. For this purpose, her group combines genomic, genetic and bioinformatic approaches to gain predictive insights into developmental progression.

Anne-Claude Gavin is group leader and senior scientist in the Structural and Computational Biology Unit of the European Molecular Biology Laboratory (EMBL) in Heidelberg. Her group integrates biochemical, mass spectrometry, structural and computational methods to characterize cellular networks and circuitry at the molecular level, both spatially and temporally. Her research aims at understanding how cellular components work collectively to achieve biological function.

Ronald Germain is Chief of the Lymphocyte Biology Section, Laboratory of Immunology and Director of the Program in Systems Immunology and Infectious Disease Modeling at the National Institute of Allergy and Infectious Diseases, NIH, USA, as well as Associate Director, Trans-NIH Center for Human Immunology. His primary research interests are in the workings of the immune systems at multiple scales of biological resolution and the use of imaging and computational approaches in advancing our understanding of immune processes.

Mark Gerstein is the Albert L Williams Associate Professor of Biomedical Informatics at Yale University. He is co-director of the Yale Computational Biology and Bioinformatics Program. His research is focused on bioinformatics, and he is particularly interested in large-scale integrative surveys, biological database design, macromolecular geometry, molecular simulation, human genome annotation, gene expression analysis, and data mining.

Anne-Claude Gingras is a Senior Investigator at the Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital in Toronto and an Associate Professor in the Department of Molecular Genetics at the University of Toronto. Her research focuses on the study of signalling pathways using systematic approaches and on the development of quantitative proteomics technologies, both experimental and computational.

Alexander Hoffmann is Professor of Microbiology and Immunology at UCLA, and the founding director of the Institute for Quantitative and Computational Biosciences (QCBio), following a decade at UCSD where he developed infrastructures for Systems Biology research and education. He holds degrees in Physics and Zoology (Cambridge University), and owes his training to Robert Roeder and David Baltimore, as well as his many computational biology students.

Frank Holstege is Head of the Genomics Laboratory at the University Medical Center Utrecht in the Netherlands. The laboratory has interests in mechanisms of eukaryotic transcription regulation. It combines a microarray facility and technology group with a bioinformatics group that is engaged in mining of DNA microarray data and integrative analyses of genome-scale datasets.

Laurence D. Hurst is the Professor of Evolutionary Genetics at The University of Bath, U.K. Employing bioinformatic, comparative genomic and systems biological tools, his research interests concern how genes and genomes evolve. In particular, his work focuses on understanding gene order evolution, why most genes appear to be dispensable and why synonymous mutations are under selection.

Terence Hwa is the Presidential Chair Professor at the Department of Physics, and co-director of the graduate program in Quantitative Biology at the University of California, San Diego. His research focuses on microbial systems, where he uses a combination of experimental and theoretical approaches to connect molecular networks to microbial physiology.

Trey Ideker is Professor of Medicine and Bioengineering at the University of California at San Diego. He serves as Division Chief of Medical Genetics, Director of the National Resource for Network Biology and Director of the San Diego Center for Systems Biology, as well as being Adjunct Professor of Computer Science and Member of the Moores UCSD Cancer Center. He is a pioneer in assembling genome-scale measurements to construct network models of cellular processes and disease. His recent research activities include assembly of networks governing the response to DNA damage; development of the Cytoscape and NetworkBLAST software packages for biological network visualization and cross-species network comparison; and methods for identifying network-based biomarkers in development and disease.

Dirk Inzé is full Professor in Plant Physiology and Plant Molecular Biology at Ghent University and Director of the Department of Plant Systems Biology of VIB (Flanders Institute for Biotechnology). Dirk Inzé’s research ambition is to decipher the complex molecular networks regulating plant organ growth and crop productivity.

Boris Kholodenko is the Science Foundation Ireland Stokes Professor of Systems Biology, Deputy Director of Systems Biology Ireland, University College Dublin and Adjunct Professor at Thomas Jefferson University, Philadelphia, USA. His studies aim to understand cellular information transfer and cell-fate decisions governed by the spatiotemporal dynamics of signaling and gene networks.

Hiroaki Kitano is director of Sony Computer Science Laboratories, Inc. and Director of the Systems Biology Institute, Tokyo, Japan. His recent research interests concern biological robustness, cancer systems biology, software platforms for systems biology and robotics.

Jan Korbel is group leader at the EMBL in Heidelberg and at EMBL-EBI in Hinxton. His group is investigating the mutational origins and functional consequences of genetic variation, with a special focus on genomic structural variation using experimental and computational approaches in molecular genetics and genomics. His interests range from germline genetics to somatic DNA alterations occurring in cancer, and one main research objective is to dissect the mechanistic basis of DNA alteration processes associated with disease.

Nevan J. Krogan is Professor of Cellular & Molecular Pharmacology at the University of California-San Francisco, and Director of the California Institute for Quantitative Biosciences at UCSF. He holds affiliated appointments at the Helen Diller Family Comprehensive Cancer Center at UCSF, and the Gladstone Institutes for Cardiovascular Disease and Virology & Immunology. Dr. Krogan is an expert in the fields of functional genomics and systems biology. His lab is now developing a "systems to mechanism" approach to biology, applying global proteomic and genomic techniques to study various biological processes, notably the functional interface between pathogenic organisms and their hosts. Ultimately, this approach aims to provide insight into host pathways that are routinely hijacked and rewired, giving a clearer view of where future efforts at therapeutic intervention should be directed.

Galit Lahav is an Associate Professor of Systems Biology at Harvard Medical School. Her lab combines quantitative live-cell imaging and mathematical modeling to study the dynamics of signaling networks in human cells, in order to understand cellular decision-making in individual cancer and healthy cells.

Rune Linding is Professor at the Biotech Research and Innovation Centre (BRIC) at the University of Copenhagen (UCPH). His lab is a network biology research group that explores biological systems by developing and deploying algorithms on quantitative global cell signaling data. The long-term aim is to forecast cell behavior with an accuracy similar to that of weather or aircraft models, and to use this capability to target complex diseases such as cancer.

Andrew Millar holds a Chair of Systems Biology at the University of Edinburgh. He studied Genetics at Cambridge University, then gained his Ph.D. at The Rockefeller University, New York, studying the 24-hour biological clock. His laboratory combines experiments, bioinformatics and mathematical modelling in order to understand daily and seasonal rhythms in plants and algae.

Vamsi Mootha is a Professor of Systems Biology and Medicine at Harvard Medical School and Massachusetts General Hospital. His laboratory utilizes genomics, biochemistry, and systems biology to study mitochondria and inborn errors of metabolism.

Felix Naef is Associate Professor at the École Polytechnique Fédérale in Lausanne, Switzerland. His lab studies systems with rich temporal dynamics such as circadian rhythms and transcriptional bursting. By combining theory and experiments, his approaches span multiple scales ranging from single cell time-lapse imaging to functional genomics in tissues.

Jeremy Nicholson is Professor and Head of Biological Chemistry at Imperial College, London University. His research interests include biological NMR spectroscopy, novel LC-MS and electrochemical approaches to bioanalysis, chemometrics, metabolic modelling, and studies leading to an understanding of the molecular basis of disease and toxic processes.

Garry Nolan is Professor of Microbiology & Immunology at the Stanford University School of Medicine. His group uses high-throughput single-cell analysis of kinase-driven signaling cascades to interrogate autoimmunity, cancer, virology and bacterial pathogens, as well as to understand normal immune system function. Using advanced flow cytometric techniques (including a new hybrid mass spectrometer/flow cytometer that allows as many as 50 to 100 epitope-specific parameters to be measured per cell) and computational biology approaches, the team focuses on high-throughput drug screening, mouse models of disease, and the application of these approaches to primary patient materials from clinical trials, both for disease management and for understanding disease processes at the single-cell level.

Béla Novák is the Professor of Systems Biology at the Oxford Centre for Integrative Systems Biology, Department of Biochemistry, University of Oxford. His research group is interested in the dynamics of intracellular signal transduction networks such as the one that controls cell cycle progression in eukaryotes.

Duncan Odom obtained his PhD in bioinorganic chemistry from the California Institute of Technology in 2001. He was a postdoctoral fellow at the Whitehead Institute / MIT until 2006, after which he moved to the University of Cambridge, Cancer Research UK – Cambridge Institute. Since 2011, he has also been an Associate Faculty member at the Wellcome Trust Sanger Institute.

Alexander van Oudenaarden is the Director of the Hubrecht Institute for Developmental Biology and Stem Cell Research in Utrecht. The van Oudenaarden lab is using a combination of experimental, computational, and theoretical approaches to quantitatively understand decision-making in single cells with a focus on questions in developmental and stem cell biology.

Bernhard Palsson is Professor of Bioengineering and Adjunct Professor of Medicine at the University of California, San Diego. His current research at UCSD focuses on the reconstruction of genome-scale biochemical reaction networks, the development of mathematical analysis procedures for genome-scale models, and the experimental verification of genome-scale models, with current emphasis on cellular metabolism and transcriptional regulation in E. coli and yeast.

Lucas Pelkmans is Professor of Quantitative Biology and holds the Ernst Hadorn Chair at the University of Zurich. The Pelkmans lab applies experimental cell biology, large-scale imaging, computer vision, data-driven modelling, and theoretical approaches to understand emergent phenomena and self-organisation in molecular and cellular systems.

Norbert Perrimon is Professor of Genetics at Harvard Medical School and an Investigator of the Howard Hughes Medical Institute. Over the years, Perrimon and his colleagues have made a number of contributions to our understanding of the structure of signal transduction pathways. In addition, his laboratory has developed a number of techniques that have proven useful for identifying gene functions. Recently, most of his efforts have focused on applying RNA interference methodology to high-throughput screening in Drosophila cells and in vivo, with the ultimate goal of studying genetic redundancy in biological networks.

Ana Pombo is Group Leader at the Berlin Institute for Medical Systems Biology, and Professor at Humboldt University. Her research team studies mechanisms of gene regulation at multiple scales, from the local action of transcription factors and regulation of RNA polymerase II, to chromatin looping events that connect regulatory DNA sequences with their target genes, to how whole chromosomes are positioned relative to each other within cell nuclei.

Joshua Rabinowitz is Professor of Chemistry and Integrative Genomics at Princeton University. His lab develops technologies for measuring cellular metabolites and their fluxes, and combines these “metabolomic” approaches with quantitative modeling to understand cellular and tissue metabolism, including its normal regulation and its dysregulation in disease.

Nikolaus Rajewsky has been Scientific Head of the Berlin Institute for Medical Systems Biology since 2008. His lab combines theoretical/computational and experimental methods to understand more about gene regulation in animals. Research teams are interdisciplinary, employing techniques from molecular biology and biochemistry on different model organisms and driving the analysis with tools and concepts from bioinformatics, statistics, and physics. A major focus is on post-transcriptional gene regulation by small RNAs and RNA-binding proteins.

Rama Ranganathan is Professor and Director at The Center for the Physics of Evolving Systems at the University of Chicago. He is interested in understanding the structural principles of function in cellular signaling systems and how these systems are built through the process of evolution.

Aviv Regev received her Ph.D. in computational biology from Tel Aviv University. She joined the Broad Institute as a core member, and MIT as a faculty member in 2006, and has been an Early Career Scientist at the Howard Hughes Medical Institute since 2009. Regev’s research centers on understanding how complex molecular networks function and evolve in the face of genetic and environmental changes, over time-scales ranging from minutes to millions of years. Prior to joining the Broad Institute, Regev was a fellow at the Bauer Center for Genomics Research at Harvard University, where she developed new approaches to the reconstruction of regulatory networks and modules from genomic data.

Frederick (Fritz) Roth is a Professor at the University of Toronto’s Donnelly Centre and Departments of Molecular Genetics and Computer Science and at the Lunenfeld-Tanenbaum Research Institute of Mt. Sinai Hospital. His team develops genetic and computational technologies to map protein and genetic interactions, understand complex traits, and relate genomic variation to human disease.

Eytan Ruppin is Professor of Computer Science and the Director of the Center for Bioinformatics and Computational Biology at the University of Maryland. His research is focused on the computational study of cancer genomics and metabolism, with emphasis on the development of new system-wide approaches for precision oncology.

Uwe Sauer is Professor of Systems Biology at the ETH Zurich. His research interests focus on complex metabolic and regulatory networks in bacteria and yeast. For this purpose, his group develops methods for ¹³C-flux analysis and metabolomics, both for quantitative and high-throughput analyses, that are combined with computational models.

Eric Schadt is Chief Scientific Officer at Pacific Biosciences, where he oversees the scientific strategy for the company, including creating the vision for next-generation sequencing applications of the company's technology. He is also a founding member of Sage Bionetworks, an open access genomics initiative designed to build and support databases and an accessible platform for creating innovative, dynamic models of disease.

Dirk Schübeler is a group leader at the Friedrich Miescher Institute for Biomedical Research in Basel and Professor at the University of Basel. His group combines functional genomics and targeted mutagenesis towards understanding how chromatin and DNA modifications impact genome regulation.

Luis Serrano is Head of the Structural and Computational Biology Unit at the EMBL, Heidelberg. His research group is investigating how to combine protein structural information with protein design algorithms to predict protein-protein and protein-DNA interactions.

Lucy Shapiro is Ludwig Professor of Cancer Research in the Department of Developmental Biology at the Stanford University School of Medicine and the Director of the Beckman Center for Molecular and Genetic Medicine at Stanford. The focus of her work is the genetic circuitry that controls the progression of the cell cycle and the 3D deployment of regulatory proteins that coordinates the cell cycle regulatory network.

James Sharpe is coordinator of the Systems Biology Unit at the Centre for Genomic Regulation (CRG) in Barcelona, Spain. His group studies multicellular systems using both dynamical modeling (2D and 3D) and experimental approaches. They focus on image-driven simulations of limb development and the spatial dynamics of gene circuits. He invented Optical Projection Tomography as a method of 3D data capture, and the group also continues to develop mesoscopic imaging technology.

Benny Shilo is a group leader in the Department of Molecular Genetics, at the Weizmann Institute, Israel. Following his discovery of proto-oncogene homologues in Drosophila, his lab has focused on dissection of developmental signaling pathways. For the past decade he has also been involved in collaborations with computational biologists aimed at understanding the basis for robustness and scaling during development.

Pamela Silver is Professor of Systems Biology at Harvard Medical School and the Director of the Harvard University-wide PhD Program in Systems Biology. She is also a member of the Department of Cancer Biology at the Dana Farber Cancer Institute. Her research interests include the systems biology of RNA, understanding the dynamics of intranuclear networks, using cell-based screens for pathway discovery, and synthetic biology to design eukaryotic cells.

Jan Skotheim is an Associate Professor of Biology at Stanford University. His lab works on quantitative aspects of cell signaling and computation as applied to understanding cell division. A central motivating question for his laboratory is how cell growth triggers cell division.

Christina Smolke is an Assistant Professor in the Department of Bioengineering at Stanford University. She is also a member of the Cancer Immunotherapeutics Program at the City of Hope. Her research program focuses on the design of RNA-based information processing and control devices, and their integration into cellular computation, signal processing, and systems-level engineering strategies.

Michael Snyder is the Stanford Ascherman Professor and Chair of Genetics and the Director of the Center of Genomics and Personalized Medicine. His laboratory was the first to carry out a large-scale functional genomics project in any organism, and currently carries out a variety of projects in the areas of genomics and proteomics, both in yeast and in humans.

Peter Sorger is Professor of Systems Biology at Harvard Medical School and of Biological Engineering at MIT, and received his Ph.D. in 1993 from Trinity College Cambridge under the supervision of Hugh Pelham. Sorger then trained as a Markey Postdoctoral Fellow with Harold Varmus and Andrew Murray at UCSF. Sorger's research focuses on modeling and measuring death and survival pathways in mice and humans and on chromosome segregation. He directs the Center for Cell Decision Processes and is a cofounder of Merrimack Pharmaceuticals and Glencoe Software.

Igor Stagljar is a Professor of Molecular Genetics and Biochemistry at Donnelly Centre at the University of Toronto in Canada. His lab uses high-throughput interactive proteomics, genetic, cell biological and bioinformatic tools to understand how cell signaling and membrane transport pathways control cell behavior in normal and disease cells.

Alexander Stark is a Group Leader and Senior Scientist at the Research Institute of Molecular Pathology (IMP) in Vienna, Austria. He studies transcriptional regulation using an interdisciplinary approach that combines genome-wide functional assays with bioinformatics and biochemistry. A particular focus of his work is on transcriptional enhancers, core promoters, and transcriptional regulatory proteins (transcription factors and cofactors).

Lars Steinmetz is a Professor of Genetics at Stanford University and the Associate Head of the Genome Biology Unit at the European Molecular Biology Laboratory (EMBL). His research focuses on developing and applying interdisciplinary, genome-wide technologies to investigate the function and mechanism of genome regulation, the genetic basis of complex phenotypes and the genetic and molecular systems underpinning disease.

Amos Tanay is an Associate Professor and Kimmel investigator in the department of Computer Science and the department of Biological Regulation at the Weizmann Institute. His research is focused on genomic and chromosomal regulation of heterogeneous populations of single cells within tissues, aiming to understand how cells acquire, stabilize, and later modify their epigenetic and functional states.

Marc Vidal is an Associate Professor in Cancer Biology and the Director of the Center for Cancer Systems Biology at the Dana-Farber Cancer Institute and an Associate Professor of Genetics at Harvard Medical School. His laboratory studies how complex macromolecular networks are organized and how perturbations in those networks can lead to diseases such as cancer.

Albertha (Marian) Walhout obtained a PhD in Medicine at the University of Utrecht, the Netherlands, with a focus on Molecular Biology and Biochemistry. In 1998 she came to the USA for a post-doc at Harvard Medical School in Functional Genomics and Systems Biology. She started her lab at UMass Medical School in the spring of 2003. Her work centers on the elucidation of gene regulatory networks and how they relate to development, physiology and disease.

Jun Wang is Director of BGI (previously known as the Beijing Genomics Institute) and was instrumental in founding the BGI Bioinformatics Department in 1999. Dr. Wang also holds a position as an Ole Rømer professor at the University of Copenhagen. He has authored 200+ peer-reviewed original papers and his research focuses on genomics and related bioinformatics analysis of complex diseases and agricultural crops, with the goal of developing applications using the genomic information.

Hans V. Westerhoff is AstraZeneca Professor of Systems Biology at the Manchester Centre for Integrative Systems Biology (heading the Doctoral Training Centre Systems Biology), Professor of Microbial Physiology at the Free University Amsterdam and of Mathematical Biochemistry at the University of Amsterdam. Chairing the Steering Committee of the German HepatoSys program, he has worked on Hierarchical Control and Regulation, the siliconcell, EGF signalling, and DNA structure, exemplifying bottom-up Systems Biology.

Lingchong You is Paul Ruffin Scarborough Associate Professor of Engineering in the Pratt School of Engineering at Duke University. His laboratory explores design principles of biological networks and uses synthetic gene circuits for applications in computation, engineering and medicine.

Marino Zerial is Managing Director of the MPI-CBG in Dresden. He has been conducting interdisciplinary research on endocytosis and signalling, combining biochemistry, live cell imaging, image processing, functional genomics and computational approaches to systems analysis.

