
Confusion related to gene expression


I have a set of gene expression data downloaded from http://www.ncbi.nlm.nih.gov/geo. I have two sets of data: one is the raw probe intensity data in the form of CEL files, and the other is a processed txt file.

Here is a link for example http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7904

I don't know what kind of processing is done to generate the processed data from the raw CEL files. For example in the above link, I have processed data like

SOFT formatted family file(s) MINiML formatted family file(s) Series Matrix File(s)

I don't know what sort of processing is done to generate those files. Any suggestions? I can read the raw CEL files in matlab but what sort of preprocessing am I supposed to do?

I also want to map these probe IDs to gene IDs and get the corresponding genes. How can I accomplish this? I know matlab, but I am a bit confused about the terminology and all. Any suggestions?


If you click through to some of the samples in the study, eg. GSM194397, GSM194398, etc. it mentions under the "Data processing" section that "The data were analyzed with dchip with default normalization settings".

You can learn more about dChip over at the dChip website.


The methods used for processing the data should be mentioned in the Methods section of the paper the data come from (here it is Richardson AL, Wang ZC, De Nicolo A, Lu X et al. X chromosomal abnormalities in basal-like human breast cancer. Cancer Cell 2006 Feb;9(2):121-32. PMID: 16473279). I quote from the paper:

"RNA extraction, cRNA synthesis, and hybridization to Affymetrix Human Genome U133 Plus 2.0 Arrays were performed as described previously (Signoretti et al., 2002 and Wang et al., 2004). Raw expression data obtained using Affymetrix GENECHIP software was normalized and analyzed using DNA-Chip Analyzer (dChip) custom software (W.H. Wong and C. Li, http://www.dChip.org/). Array probe data were normalized to the mean expression level of each probe across a sample set. Where indicated, tumors were classified as BLC or non-BLC on the basis of their expression array characteristics, using dChip hierarchical clustering analysis as previously described (Matros et al., 2005 and Wang et al., 2004). Comparisons between results obtained on BLC or BRCA1 tumors, non-BLC tumors, and normal breast samples were performed using the dChip “Compare Sample” function. A threshold of 1.2-fold overexpression in BLC and BRCA1 tumors was applied with 90% confidence. Of 1271 gene probes that map to the X chromosome, 60 satisfied these overexpression criteria with a range of fold difference from 1.35 to 5.11. The false discovery rate (number) of 1000 permutations was as follows: median, 0% (0); 90th percentile, 3.3% (2). Of the 60 probes, 19 were redundant (two or more probes mapping to the same gene) and excluded, leaving 41 gene-specific probes for use in the expression plot of Figure 5B. The complete gene expression array data set is available on the NCBI GEO database (accession no. GSE3744)."

Since this is a standard Affymetrix human genome chip, its CDF should either be embedded in most packages used to analyse this kind of data (limma is an industry gold standard example ;-), or be downloadable from the Affymetrix website. Please note that the CDF itself may be quite old and its annotation not always accurate any more (I have no experience with human data), so it's worth using an updated CDF (see Dai et al. 2005).
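If you would rather script the last steps yourself, here is a minimal Python sketch of two things mentioned above: scaling each probe to its mean across the sample set (as the quoted Methods describe) and mapping probe IDs to genes. Everything in it is an assumption for illustration: the probe IDs, gene symbols and intensities are made up, the annotation dictionary stands in for whatever platform annotation table you download, and averaging probes that hit the same gene is just one convention (the paper excluded redundant probes instead). It does not reproduce dChip's model-based processing. The same logic ports directly to matlab.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotation table: probe-set ID -> gene symbol.
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}

# Hypothetical intensities: probe-set ID -> one value per sample.
intensities = {
    "probe_a": [200.0, 240.0, 100.0, 120.0],
    "probe_b": [150.0, 170.0, 90.0, 70.0],
    "probe_c": [50.0, 55.0, 52.0, 48.0],
    "probe_d": [10.0, 12.0, 11.0, 9.0],  # not in the annotation; dropped below
}


def normalize_to_probe_mean(values):
    """Divide each value by the probe's mean across all samples."""
    m = mean(values)
    return [v / m for v in values]


def collapse_to_genes(expr, annotation):
    """Average the probe sets that map to the same gene, sample by sample."""
    by_gene = defaultdict(list)
    for probe, values in expr.items():
        gene = annotation.get(probe)
        if gene is not None:  # skip probes without a gene annotation
            by_gene[gene].append(values)
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in by_gene.items()}


normalized = {probe: normalize_to_probe_mean(v) for probe, v in intensities.items()}
gene_expression = collapse_to_genes(normalized, probe_to_gene)
```

Here `gene_expression` ends up with one row per annotated gene, and the unannotated `probe_d` is silently dropped; in a real analysis you would probably want to log which probes were discarded.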


Why You Should Know Your Gene’s Accession Number

A typical mammalian gene usually does not encode a single protein, thanks partly to the phenomenon of alternative mRNA splicing. Nearly all mammalian genes contain multiple introns, and greater than 90% of all intron-containing genes undergo alternative splicing to generate multiple transcript variants, and, subsequently, different protein isoforms (Pan, et al., 2008; Park, et al., 2018). Differential alternative splicing usually occurs within and between tissues, but some (5%) can occur between individuals (Kwan, et al., 2007; Wang, et al., 2008). As a result, one gene can potentially express many different proteins.

At GeneCopoeia, we provide customers with DNA-based tools that are used for many different types of gene function studies. These include plasmids for open reading frame (ORF) expression, gene knockout via CRISPR sgRNA, microRNA (miRNA) validation studies using 3′ UTRs, etc., as well as qPCR primers. When customers request these reagents, they often encounter multiple accession numbers for each gene and do not know which one they need to order. Likewise, a customer might be interested in using plasmids to study a particular gene, but, when asked, they will not know the accession number of the variant or isoform they are working with. In this Technical Note, we talk about the multi-variant diversity of mammalian genes, and how the accession number of the gene you are working with needs to be a major consideration when requesting different types of plasmids from GeneCopoeia.


Abstract

Analyses of gene set differential coexpression may shed light on molecular mechanisms underlying phenotypes and diseases. However, differential coexpression analyses of conceptually similar individual studies are often inconsistent and underpowered to provide definitive results. Researchers can greatly benefit from an open-source application facilitating the aggregation of evidence of differential coexpression across studies and the estimation of more robust common effects. We developed Meta Gene Set Coexpression Analysis (MetaGSCA), an analytical tool to systematically assess differential coexpression of an a priori defined gene set by aggregating evidence across studies to provide a definitive result. In the kernel, a nonparametric approach that accounts for the gene-gene correlation structure is used to test whether the gene set is differentially coexpressed between two comparative conditions, from which a permutation test p-statistic is computed for each individual study. A meta-analysis is then performed to combine individual study results with one of two options: a random-intercept logistic regression model or the inverse variance method. We demonstrated MetaGSCA in case studies investigating two human diseases and identified pathways highly relevant to each disease across studies. We further applied MetaGSCA in a pan-cancer analysis with hundreds of major cellular pathways in 11 cancer types. The results indicated that a majority of the pathways identified were dysregulated in the pan-cancer scenario, many of which have been previously reported in the cancer literature. Our analysis with randomly generated gene sets showed excellent specificity, indicating that the significant pathways/gene sets identified by MetaGSCA are unlikely false positives. MetaGSCA is a user-friendly tool implemented in both forms of a Web-based application and an R package “MetaGSCA”. 
It enables comprehensive meta-analyses of gene set differential coexpression data, with an optional module of post hoc pathway crosstalk network analysis to identify and visualize pathways having similar coexpression profiles.
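As a rough illustration of the second pooling option named above, the following Python sketch shows generic fixed-effect inverse-variance pooling: each study estimate is weighted by the reciprocal of its variance. It is not MetaGSCA's implementation; the function name, the effect estimates and the variances are hypothetical, and the scale on which MetaGSCA pools its per-study results is not specified here.

```python
import math


def inverse_variance_pool(estimates, variances):
    """Fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]  # precise studies get more weight
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical per-study effect estimates and their variances.
study_estimates = [0.10, 0.30, 0.20]
study_variances = [0.04, 0.04, 0.02]
pooled_estimate, pooled_se = inverse_variance_pool(study_estimates, study_variances)
```

The third study has half the variance of the others, so it contributes twice the weight to the pooled estimate.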


Main Concepts and Definitions

Two main paradigms exist in the field of machine learning: supervised and unsupervised learning. Both have potential applications in biology.

In supervised learning, objects in a given collection are classified using a set of attributes, or features. The result of the classification process is a set of rules that prescribe assignments of objects to classes based solely on values of features. In a biological context, examples of object-to-class mappings are tissue gene expression profiles to disease group, and protein sequences to their secondary structures. The features in these examples are the expression levels of individual genes measured in the tissue samples and the presence/absence of a given amino acid symbol at a given position in the protein sequence, respectively. The goal in supervised learning is to design a system able to accurately predict the class membership of new objects based on the available features. Besides predicting a categorical characteristic such as class label, (similar to classical discriminant analysis), supervised techniques can be applied as well to predict a continuous characteristic of the objects (similar to regression analysis). In any application of supervised learning, it would be useful for the classification algorithm to return a value of “doubt” (indicating that it is not clear which one of several possible classes the object should be assigned to) or “outlier” (indicating that the object is so unlike any previously observed object that the suitability of any decision on class membership is questionable).
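To make the supervised setting concrete, here is a minimal Python sketch of a nearest-centroid classifier that can also return "doubt" or "outlier". The classifier choice, the thresholds `doubt_margin` and `outlier_distance`, and the two-gene expression profiles are all assumptions made for illustration; they are not taken from the text.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(samples, labels):
    """Learn one centroid per class from labeled feature vectors."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_class.items()}


def predict(model, x, doubt_margin=0.5, outlier_distance=10.0):
    """Return a class label, or "doubt"/"outlier" when the decision is unsafe."""
    ranked = sorted((math.dist(x, c), y) for y, c in model.items())
    best_distance, best_label = ranked[0]
    if best_distance > outlier_distance:
        return "outlier"  # unlike anything seen during training
    if len(ranked) > 1 and ranked[1][0] - best_distance < doubt_margin:
        return "doubt"  # two classes are almost equally close
    return best_label


# Hypothetical expression levels of two genes in four labeled tissue samples.
train_x = [[1.0, 1.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
train_y = ["healthy", "healthy", "disease", "disease"]
model = train(train_x, train_y)

clear_case = predict(model, [1.0, 2.0])
ambiguous_case = predict(model, [5.0, 4.75])
unusual_case = predict(model, [50.0, 50.0])
```

The three test profiles are chosen to land in the three possible outcomes: close to one centroid, equidistant from both, and far from everything.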

In contrast to the supervised framework, in unsupervised learning, no predefined class labels are available for the objects under study. In this case, the goal is to explore the data and discover similarities between objects. Similarities are used to define groups of objects, referred to as clusters. In other words, unsupervised learning is intended to unveil natural groupings in the data. Thus, the two paradigms may informally be contrasted as follows: in supervised learning, the data come with class labels, and we learn how to associate labeled data with classes; in unsupervised learning, all the data are unlabeled, and the learning procedure consists of both defining the labels and associating objects with them.
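The unsupervised case can be sketched in the same spirit. The Python example below uses plain k-means (Lloyd's algorithm) as one possible clustering method; the choice of algorithm, the starting centers and the unlabeled two-gene profiles are assumptions for illustration only.

```python
import math


def kmeans(points, centers, iterations=10):
    """Plain Lloyd's algorithm: assign to nearest center, then recompute centers."""
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centers[k]
            )
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return assignment, centers


# Hypothetical unlabeled expression profiles (two genes per sample).
profiles = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
cluster_ids, final_centers = kmeans(profiles, [profiles[0], profiles[3]])
```

No labels are supplied; the cluster indices in `cluster_ids` are the labels the procedure defines for itself.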

In some applications, such as protein structure classification, only a few labeled samples (protein sequences with known structure class) are available, while many other samples (sequences) with unknown class are available as well. In such cases, semi-supervised techniques can be applied to obtain a better classifier than could be obtained if only the labeled samples were used [5]. This is possible, for instance, by making the “cluster assumption,” i.e., that class labels can be reliably transferred from labeled to unlabeled objects that are “nearby” in feature space.

Life science applications of unsupervised and/or supervised machine learning techniques abound in the literature. For instance, gene expression data was successfully used to classify patients in different clinical groups and to identify new disease groups [6–9], while genetic code allowed prediction of the protein secondary structure [10]. Continuous variable prediction with machine learning algorithms was used to estimate bias in cDNA microarray data [11].

To support precise characterization of both supervised and unsupervised machine learning methods, we have adopted certain mathematical notations and concepts. In the next sections, we employ vector notation (x denotes an ordered p-tuple of numbers for some integer p), matrix notation (X denotes a rectangular array of numbers, where x_ij will denote the number in the ith row and jth column of X), conditional probability densities, and sufficient matrix algebra to define the multivariate normal density. Necessary formal background in algebra and probability can be found elsewhere [12].
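For reference, the multivariate normal density mentioned above has the standard form below for a p-dimensional vector x. The symbols μ (mean vector) and Σ (p × p covariance matrix) are the conventional ones and are assumed here rather than taken from the text.

```latex
f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top}
\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)
```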


Identification of a DREB-related gene in Triticum durum and its expression under water stress conditions

Genes from the DREB family are involved in plants’ responses to dehydration and possibly play a role in their ability to tolerate water stress. Understanding the relationship between water stress tolerance and expression of specific genes requires the isolation and characterisation of the sequences that may be involved. We report the isolation and characterisation of a gene in Triticum durum, namely TdDRF1, which belongs to the DREB gene family and produces three forms of transcripts through alternative splicing. The relationship between the expression profile of the TdDRF1 gene and water stress was assessed by real-time reverse transcription–polymerase chain reaction in a time-course experiment up to 7 days. Water stress experimental conditions were selected to relate changes in gene expression to a time frame reflecting as closely as possible that during which water stress starts having a visible effect under field conditions. Among the three isoforms of TdDRF1, the truncated form TdDRF1.2 was at all times the most expressed. Its expression, together with the TdDRF1.3 transcript, increased sharply after 4 days of dehydration, but then decreased at 7 days. The TdDRF1.1 transcript was the least expressed overall and varied least with the duration of dehydration. Genotypic differences in TdDRF1 gene expression are currently under investigation.


Physiological characteristics and related gene expression of after-ripening on seed dormancy release in rice

Laboratory of Seed Science and Technology, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, Jiangsu, China

Z. Wang and H. Zhang, Laboratory of Seed Science and Technology, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China.

Abstract

After-ripening is a common method used for dormancy release in rice. In this study, the rice variety Jiucaiqing (Oryza sativa L. subsp. japonica) was used to determine dormancy release following different after-ripening times (1, 2 and 3 months). Germination speed, germination percentage and seedling emergence increased with after-ripening; more than 95% germination and 85% seedling emergence were observed following 1 month of after-ripening within 10 days of imbibition, compared with <45% germination and 20% seedling emergence in freshly harvested seed. Hence, 3 months of after-ripening could be considered a suitable treatment period for rice dormancy release. Dormancy release by after-ripening is mainly correlated with a rapid decline in ABA content and increase in IAA content during imbibition. Subsequently, GA1/ABA, GA7/ABA, GA12/ABA, GA20/ABA and IAA/ABA ratios significantly increased, while GA3/ABA, GA4/ABA and GAs/IAA ratio significantly decreased in imbibed seeds following 3 months of after-ripening, thereby altering α-amylase activity during seed germination. Peak α-amylase activity occurred at an earlier germination stage in after-ripened seeds than in freshly harvested seeds. Expression of ABA, GA and IAA metabolism genes and dormancy-related genes was regulated by after-ripening time upon imbibition. Expression of OsCYP707A5, OsGA2ox1, OsGA2ox2, OsGA2ox3, OsILR1, OsGH3-2, qLTG3-1 and OsVP1 increased, while expression of Sdr4 decreased in imbibed seeds following 3 months of after-ripening. Dormancy release through after-ripening might be involved in weakening tissues covering the embryo via qLTG3-1 and decreased ABA signalling and sensitivity via Sdr4 and OsVP1.


Supplementary information

Mutated proteins that can antagonize the function of the wild-type protein, often because the proteins are part of a macromolecular complex, which is rendered defective by the presence of the mutated protein.

Thalassaemia is a blood disorder causing anaemia. In the severe form of β⁰-thalassaemia, no β-globin protein is detectable in peripheral blood.

An mRNA feature that increases the probability of the mRNA undergoing nonsense-mediated mRNA decay. Examples include an exon–exon junction complex deposited as a consequence of splicing more than 30–35 nucleotides downstream of a termination codon, an unusually long (>1 kb) 3′ untranslated region or a selenocysteine codon that is interpreted as a stop codon.

Upstream open reading frame

(uORF). A short ORF in the 5′ end of mRNA (upstream of the main ORF) that can regulate the translation of the main ORF.

The amino-acyl site on the ribosome is where charged tRNA molecules (with the exception of the translation-initiating charged tRNA) bind during protein synthesis.

A process that occurs during translation when a water molecule attacks the bond between the nascent peptide and the tRNA molecule in the ribosome, thereby releasing the completed polypeptide.

Staufen-mediated mRNA decay

An mRNA decay pathway in which the staufen protein recruits UPF1 to an mRNA 3′ untranslated region, causing translation-dependent destabilization of the mRNA.

A large protein complex that degrades mRNAs through its 3′-to-5′ exoribonuclease activities.

An amino acid that is incorporated into the growing polypeptide during translation of mRNAs bearing a selenocysteine insertion sequence, which directs its incorporation at UGA codons that would otherwise be recognized as termination codons.

Occurs when an intron fails to be excised out of a pre-mRNA during alternative splicing, giving rise to a transcript with a premature termination codon.

Programmed ribosomal frameshifts

(PRFs). During translation, incidents of ribosome ‘slippage’ and adoption of a new reading frame.

A tertiary RNA structure formed by base pairing between the loop of a stem–loop structure and nearby ribonucleotides. It is extremely difficult for helicases to unwind this structure.

A disease manifesting defects in tissues derived from cells in the neural crest lineage (neurocristopathy). Individuals with Waardenburg syndrome have defects in hair, skin and eye pigmentation and may suffer from hearing loss.

A congenital malady in which nerve cells are missing from the end of the bowel, thereby causing problems with passing stool.

An X-linked disorder causing mild to moderate intellectual disability, facial dysmorphism and arms and legs that are abnormally long and slender.

An X-linked disorder characterized by intellectual disability, poor muscle tone and macrocephaly.

Peptides absent from normal cells that are produced by tumour-mutated genes that are presented to and activate the immune system.


Conclusions and perspectives

Accumulation of Arabidopsis transcriptome data has facilitated the genome-wide analysis of gene co-expression profiles. Several co-expression databases provide condition-independent correlation coefficients computed from large sets of microarray data. These databases have allowed the search of co-expressed genes with genes of interest. Co-expression networks constructed from pair-wise correlation coefficients have provided an efficient way to identify functional transcription modules associated with specific biological processes. The biologically relevant hypotheses developed using co-expression analysis have assisted in the design of hypothesis-driven experiments and gene prioritization for those experiments. In summary, co-expression analysis, using microarray data accumulated so far, is now within reach of many researchers, even if they do not compute the correlation coefficients themselves.

Correlation coefficients provided in the databases are a convenient measure for estimating gene-to-gene co-expression. However, we emphasize that it is crucial to review original expression data. Genes naturally exhibit high correlation if entire expression patterns across diverse conditions are similar. On the other hand, genes also exhibit high correlation if they are expressed together under a few conditions and are otherwise silent. Thus, reviewing original expression data provides insights into the reason why genes of interest show high correlation. Some of the co-expression databases implement a browser of original expression data, which helps in the discrimination of meaningful co-expression profiles from less meaningful ones.
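The two situations described above can be illustrated with a small Python sketch that computes Pearson correlation for two pairs of genes. The expression values are invented for illustration, and the databases discussed here may use different correlation measures or preprocessing.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical expression values across eight conditions.
# Case 1: both genes vary together across all conditions.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gene_b = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.2, 7.9]
# Case 2: both genes are nearly silent except in two shared conditions.
gene_c = [0.1, 0.0, 0.1, 0.0, 9.0, 8.0, 0.0, 0.1]
gene_d = [0.0, 0.1, 0.0, 0.1, 8.5, 9.5, 0.1, 0.0]

r_broad = pearson(gene_a, gene_b)
r_spike = pearson(gene_c, gene_d)
```

Both coefficients come out high, yet only the first pair tracks each other across all conditions; within the two conditions where `gene_c` and `gene_d` are active, their values actually move in opposite directions. The coefficient alone cannot tell these cases apart, which is why looking at the original profiles matters.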

Co-expression analysis has laid the foundation for the system-level understanding of physiological processes. The next steps include development of methodologies to integrate multiple omics data sets, as has been proposed for human and zebrafish (Aerts et al. 2006, Butte and Kohane 2006). Associations between genome, transcriptome, proteome, metabolome and phenome will be considered together to uncover regulatory relationships that cannot be extracted from a single omics data set. This line of study may reveal the function of genes that do not show an apparent co-expression with any other genes. In addition, the next steps include the analysis of time series expression data. The extent of time displacement existing between gene expression and its end-points (e.g. metabolite accumulation, phenotype change) needs to be gauged when relating gene expression to other omics data using classical correlation methods. A time scale of response and re-equilibration of gene expression may include information such as the nature of interaction within the cellular system (Nicholson et al. 2004). Finally, with the development of the methodology, correlation-based analysis will shed new light not only on the static but also on the dynamic aspects of behavior of plant cellular systems.


Patterns of inheritance

The word “expression” can mean different things in different contexts. In molecular biology, “expression” means “transcribed and translated,” or the process of making a protein from the genetic instructions in DNA.
In discussions of phenotypes, sometimes people use the word “expressed” to mean “visible” in the phenotype.
These very different definitions create a lot of confusion about the difference between gene expression and phenotypic appearance, because they can make it sound like a recessive allele is recessive because it must not be transcribed or translated. This is not the case. Often both the dominant and the recessive alleles are expressed (transcribed and translated), but the behavior of the protein encoded by the dominant allele “masks” or “hides” the behavior of the protein encoded by the recessive allele.
Recognizing this distinction is extremely helpful for understanding the behavior of both Mendelian (single gene, dominant/recessive inheritance) and “non-Mendelian” traits (anything other than single gene, dominant/recessive inheritance).

Beyond dominant/recessive traits

Mendel identified the rules of particulate inheritance (inheritance based on genes) using pea plants which have many single-gene traits with a dominant/recessive inheritance pattern. This is the simplest inheritance pattern possible, and most traits are NOT controlled this way. Other (more common) inheritance patterns include:

  • Incomplete dominance: where heterozygotes have an intermediate phenotype in-between the two homozygous phenotypes. An example is petal color in four-o’clock flowers, where homozygotes are either white or red, and heterozygotes are pink. Each R allele contributes one ‘unit’ of petal color, while each r allele contributes no ‘units’ of petal color. So two R alleles result in red, one R allele results in pink, and no R alleles result in white.
  • Co-dominance: where heterozygotes display each phenotype associated with each allele. An example is AB blood type in humans, where the A allele results in one specific type of sugar on a red blood cell, and B results in a different type of sugar on a red blood cell. Two A alleles result in only A-type sugars, two B alleles result in only B-type sugars, and the heterozygote has both A- and B-type sugars on the red blood cell. (Type O results in no sugar; we’ll discuss this more in class.) Though they seem similar at first glance, incomplete dominance and co-dominance are different from each other, and the difference lies in the molecular phenomenon underlying the trait.

  • Quantitative traits: where the trait has a continuous phenotype controlled by additive alleles at multiple genes. This means that the trait is not controlled by just one gene with several alleles, but MULTIPLE genes (polygenic inheritance), each of which can have multiple alleles. An example is human height: we have differences in height down to fractions of an inch, rather than being either 4 ft, 5 ft, or 6 ft tall. Each height allele at each gene controlling height contributes a ‘unit’ of height which is additive. Quantitative traits are in contrast to discrete traits where the trait has only a few possible phenotypes which fall into discrete classes (i.e., peas are either round or wrinkly, and there are no in-between phenotypes).
  • Multiple allelism: where a particular gene has more than two alleles. An example is human blood type (described above) where the single gene controlling blood type can have an A, B, or O allele.

  • Gene-by-gene interactions: where the phenotype associated with one allele depends on the allele(s) present at another gene. This is different from a quantitative trait where alleles at multiple genes are additive. The gene-by-gene inheritance pattern can also be called epistasis. The take-home message on gene-by-gene interactions is that this phenomenon alters the expected phenotypic ratios of a Mendelian dihybrid cross (9:3:3:1) to a different pattern.

  • Pleiotropy: where a single gene influences multiple, seemingly unrelated traits. For example, in the human disorder phenylketonuria (PKU), a single mutation in a single gene can cause intellectual disability, seizures, reduced skin and hair pigmentation, “musty” smelling urine, and a predisposition to eczema.
  • Gene by environment interactions: where the environment plays a role in determining the phenotype controlled by alleles. An example is human height (which is also an example of a quantitative trait) where childhood nutrition plays a role in adult height. We have gotten taller as a species in the last 200 years (mostly) not because of changes in our alleles but due to access to better nutrition in much of the world.
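To see how a gene-by-gene interaction changes the 9:3:3:1 ratio, here is a small Python sketch that enumerates a dihybrid cross under recessive epistasis. The gene names A and C and the phenotype rule (two recessive c alleles hide whatever gene A specifies, giving an albino coat) follow a common textbook-style coat color illustration and are assumptions here, not something stated in the list above.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """All gametes of a two-gene genotype such as ("Aa", "Cc")."""
    return [a + c for a, c in product(genotype[0], genotype[1])]


def phenotype(gamete1, gamete2):
    """Hypothetical recessive epistasis: cc hides whatever gene A specifies."""
    alleles = gamete1 + gamete2
    if "C" not in alleles:  # genotype cc: no pigment at all
        return "albino"
    return "agouti" if "A" in alleles else "black"


parent = ("Aa", "Cc")
offspring = Counter(
    phenotype(g1, g2) for g1, g2 in product(gametes(parent), repeat=2)
)
```

Counting the 16 equally likely gamete pairings gives three phenotype classes instead of four, because the two classes that differ only at gene A collapse into one whenever the genotype at gene C is cc.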


While these types of inheritance ‘violate’ Mendel’s rules for inheritance of single-gene discrete traits, they are all still controlled by the behavior of chromosomes during meiosis. In addition, the single-gene inheritance pattern Mendel discovered is actually pretty rare compared to all these other inheritance patterns described above: most traits are controlled by one or more of the inheritance patterns described above. In class, we’ll predict genotypes, phenotypes, and phenotypic ratios for incomplete dominance and co-dominance inheritance patterns.
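As a preview of that kind of prediction, the following Python sketch counts offspring phenotypes for a cross between two heterozygotes, using the four-o’clock petal color and the A and B blood type alleles from the examples above. The function name and the genotype-to-phenotype tables are written for illustration only.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2, phenotype_of):
    """Count offspring phenotypes over all equally likely allele pairings."""
    counts = Counter()
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "rR" and "Rr" are the same genotype
        counts[phenotype_of[genotype]] += 1
    return counts


# Incomplete dominance: four-o'clock petal color (heterozygote is intermediate).
petal_color = {"RR": "red", "Rr": "pink", "rr": "white"}
# Co-dominance: A and B blood type alleles (heterozygote shows both).
blood_type = {"AA": "A", "AB": "AB", "BB": "B"}

pink_x_pink = cross("Rr", "Rr", petal_color)
ab_x_ab = cross("AB", "AB", blood_type)
```

In both crosses the phenotypic ratio matches the 1:2:1 genotypic ratio, because the heterozygote has its own distinguishable phenotype rather than looking like one of the homozygotes.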


Discussion

In this study, a comprehensive analysis of key genes and pathological processes associated with asthma severity is carried out in expression profiling with 108 samples. The goal of this study is to provide insights into the relationship between disease biology and the development of asthma. The findings address the shortage of objectivity in disease pathological diagnosis and in guiding the clinical treatment applications.

Machine learning feature selection has been widely used due to its objective assessment and optimal accuracy in artificial intelligence (Li et al., 2017; Nidheesh, Abdul Nazeer & Ameer, 2017). The feature genes for the development of asthma were screened out using machine learning feature selection, and all 37 genes associated with asthma development were retained after this step. These feature genes can accurately distinguish different severities of asthma (Fig. 9), which suggests an essential role in asthma. In a previous analysis of this asthma dataset (GSE43696), thyroid peroxidase (TPO) was reported to play an important role in asthma (Voraphani et al., 2014): TPO and its metabolome drive nitrative stress in severe asthma. Similarly, TPO was included in the feature gene set after the screening of feature genes in our study. These gene sets can effectively distinguish severe asthma patients from controls. However, the feature gene contributions to the classification show that TPO is low-ranked within the feature gene set. Thus asthma, a complex disease, is more likely to be the result of multi-gene interactions.

Due to the multiple functions of genes, it is challenging to locate the exact asthma mechanism (Cao et al., 2015; Li et al., 2017; Singh & Sivabalakrishnan, 2015). Hence, WGCNA, informed by biological and medical background, is used to endow these genes with clinical significance and to cluster the feature genes according to specific pathological processes. However, WGCNA is a correlation analysis and cannot solve all problems on its own; it needs to be combined with other appropriate methods (Li et al., 2016).
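The correlation-based core of a weighted co-expression network can be sketched in a few lines of Python. The example below computes an unsigned soft-threshold adjacency, |cor|^β, for every gene pair. It is only an illustration: the gene names and expression values are hypothetical, β = 6 is an assumed power rather than a setting reported for this study, and a real analysis would rely on the WGCNA R package, which also performs module detection and other steps not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(profiles, beta=6):
    """Unsigned adjacency |cor(g, h)| ** beta for every ordered gene pair."""
    genes = list(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression profiles of three genes across five samples.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
adjacency = soft_threshold_adjacency(expression)
```

Raising the correlation to a power keeps strongly correlated pairs close to 1 while pushing weak correlations toward 0, so `g1` and `g2` stay tightly connected and `g1` and `g3` are effectively disconnected.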

This study combines machine learning and WGCNA to improve the assessment of pathogenic mechanisms. After these steps, the feature genes that play a role in asthma severity can be classified into three major pathological processes: hormone secretion regulation, airway remodeling, and regulation of immune response. These pathological processes and the related feature genes can determine the development of asthma. Some of the genes screened out have already been reported to be associated with respiratory diseases, such as superoxide dismutase 2 (SOD2). A previous study identified production of H2O2 as a key driver of reactive oxygen species (ROS) that leads to lung damage in asthma; on that view, SOD2 could promote the development of inflammation, since it is a generator of H2O2. In contrast, in our study SOD2 is identified as an inhibitor of immune responses, as supported by recent research (Seo et al., 2019). Codonopsis lanceolata extract (CLE) has anti-asthmatic and anti-inflammatory effects. Treatment with CLE enhanced the expression of SOD2, which is related to mitochondrial ROS (mROS) scavenging and Th2 cell regulation. This indicates that CLE has the potential to enhance the immune-suppressive property by regulating mROS scavenging through SOD2. Furthermore, previous studies have reported that SOD2 can be used as an anti-inflammatory agent due to its ROS scavenging capacity (Li & Zhou, 2011). The SOD2 expression level is decreased in multiple diseases, including cancer, neurodegenerative diseases, and psoriasis. A reduction in SOD2 mRNA expression from mild to severe asthma was also observed in our study. Therefore, SOD2 should be identified as an inhibitor of immune response. In addition, the above results also support the effectiveness of our method.


