Batch convert miRNA names to Accession IDs?

Does anyone know of tools to convert lists of miR names to their miRBase Accession IDs? I know they exist, but my search keeps pulling up gene ID converters.

EDIT: both @rg255 and @shigeta have provided solutions to the underlying issue, but I am curious to know if something exists for miRNAs that is similar to a site like this.

EDIT 2: miRNA names are of the form 'hsa-let-7a' and Accession IDs are of the form 'MI0000060'


In what species? This website looks like it will have them: mirbase.org/cgi-bin/browse.pl?org=hsa If you are trying to connect them to data, I'd suggest you find the table you want, save it to a .txt file and use R to match them up. The merge() function would do it; df=merge(df1,mirbase,by="miR_names") would be a rough guide.
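The same matching step can be sketched outside R. Below is a minimal Python version that uses only the standard library. The two small tables and the "accession" and "value" column names are hypothetical stand-ins for your own data and for the table saved from miRBase; "miR_names" is the key from the rough guide above. Unlike R's default merge(), this sketch keeps rows that have no match (similar to all.x=TRUE), so unmatched names are easy to spot.

```python
# Hypothetical stand-ins: `measurements` for your own table, `mirbase_rows`
# for the name/accession table saved from miRBase.
measurements = [
    {"miR_names": "hsa-let-7a", "value": 1.0},
    {"miR_names": "not-in-table", "value": 2.0},
]
mirbase_rows = [
    {"miR_names": "hsa-let-7a", "accession": "MI0000060"},
]


def left_merge(left, right, by):
    """Attach the columns of `right` to every row of `left`, matching on `by`.

    Rows of `left` without a match are kept and get None in the new columns.
    """
    index = {row[by]: row for row in right}
    extra_cols = [col for col in (right[0] if right else {}) if col != by]
    merged = []
    for row in left:
        match = index.get(row[by])
        new_row = dict(row)
        for col in extra_cols:
            new_row[col] = match[col] if match else None
        merged.append(new_row)
    return merged


merged = left_merge(measurements, mirbase_rows, by="miR_names")
for row in merged:
    print(row)
```

Rows whose accession comes back as None are the names that need checking by hand, for example because the name changed between miRBase releases.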

Here is a batch converter - http://atlas.dmi.unict.it/mirandola/tools.php - add the text list to the box on the right and it produces a table.


If you want it all, download the miRNA.xls.zip file from miRBase here:

ftp://mirbase.org/pub/mirbase/CURRENT/

It's a spreadsheet that includes these data as columns, plus more information besides, for the entire database.
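Once the spreadsheet is exported as tab-separated text, a batch lookup takes only a few lines. The sketch below is written in Python under stated assumptions: the column names "Accession" and "ID" are guesses at the header row and should be checked against the real file, and the inline text is a one-row stand-in for the export. Names are matched exactly apart from surrounding whitespace, because capitalisation can be meaningful in miRNA names.

```python
import csv
import io

# One-row stand-in for the exported spreadsheet. The header names are
# assumptions; check them against the real file before use.
EXPORTED_TSV = "Accession\tID\nMI0000060\thsa-let-7a\n"


def load_name_to_accession(handle, name_col="ID", acc_col="Accession"):
    """Read tab-separated rows and return a {name: accession} dictionary."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[name_col].strip(): row[acc_col].strip() for row in reader}


def batch_convert(names, table):
    """Split `names` into those found in `table` and those that are missing."""
    found, missing = {}, []
    for raw in names:
        name = raw.strip()
        if name in table:
            found[name] = table[name]
        else:
            missing.append(name)
    return found, missing


table = load_name_to_accession(io.StringIO(EXPORTED_TSV))
found, missing = batch_convert(["hsa-let-7a", " hsa-let-7a ", "no-such-name"], table)
print(found)
print(missing)
```

With a real export, replace io.StringIO(EXPORTED_TSV) with open("your_export.tsv", newline="").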


MatchMiner: a tool for batch navigation among gene and gene product identifiers

MatchMiner is a freely available program package for batch navigation among gene and gene product identifier types commonly encountered in microarray studies and other forms of 'omic' research. The user inputs a list of gene identifiers and then uses the Merge function to find the overlap with a second list of identifiers of either the same or a different type or uses the LookUp function to find corresponding identifiers.


Significant changes occurred with the March 2004 release, please refer to the column descriptions below.

Beginning with the October 2003 release, BLAST and Ortholog/Homolog annotations are being provided in separate files. Proteome BioKnowledge® Library data will no longer be provided (due to licensing issues).

The following columns have been removed:

  • Protein Similarities BLASTP (GenBank NR)
  • Protein Similarities BLASTX (SwissProt/TrEMBL)
  • Orthologs/Homologs
  • All 5 Proteome columns

Conclusion

This review is by no means comprehensive, but is intended to be representative of the currently available Id converters. Several other Id converters that are part of other integrative analysis systems are not reviewed here but might be of interest to researchers, such as Babelomics [20], BioMart [21], ID Converter System [22] and BridgeDB [23]. Many users provide feedback on these tools at internet forums (e.g. http://biostar.stackexchange.com/questions/22/gene-id-conversion-tool). Comparisons made using a test set of Ids to test the performance of different Id converters (e.g. http://www.scribd.com/doc/18966500/Id-Converters-Test) might aid in the selection of an appropriate Id converter. Such comparative analysis is not presented in this review, as the intended use of each Id converter is different and each has its own unique features, which may not be measured by direct comparison. It is, however, recommended that the choice of an Id converter be based on the researcher's conversion needs, for example the availability of the required input and output Id types, an acceptable mapping algorithm and the database update frequency, which are described in this review and summarised in Table 2, as well as other factors that might be of interest for the biological experiment being conducted.



IMPLEMENTATION

The tool was implemented as a web application written in Ruby on Rails. miRBase data from version 10 to 21 were downloaded from the miRBase FTP server (in the form of the officially released txt files) and reorganized in a SQLite database. Probe annotation files from 40 different detection platforms of nine vendors (Supplementary Tables S1 and S2 and Supplementary Figures S1 and S2) were retrieved and used in the application as the reference for the miRNA name to mature sequence correspondence of each single probe (Supplementary Figure S3). The probe annotation files were also stored in the SQLite database. miRiadne is fully compliant with HTML5 and uses the Twitter Bootstrap framework to provide a responsive website for variable desktop sizes and mobile devices.
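To make the idea of keeping several miRBase releases in one SQLite database concrete, here is a minimal sketch in Python (not the Ruby on Rails code of the tool itself). The table layout, the column names and the three rows are hypothetical placeholders, not real miRBase records, and the sketch assumes that an accession stays the same across releases while the name may change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE mirna (
        version   TEXT NOT NULL,  -- release label, e.g. '10' or '21'
        accession TEXT NOT NULL,  -- assumed stable across releases
        name      TEXT NOT NULL,  -- may change between releases
        PRIMARY KEY (version, accession)
    )
    """
)

# Placeholder rows for illustration only, not real miRBase entries.
rows = [
    ("10", "DEMO0000001", "demo-miR-1"),
    ("21", "DEMO0000001", "demo-miR-1-5p"),
    ("21", "DEMO0000002", "demo-miR-2"),
]
conn.executemany("INSERT INTO mirna VALUES (?, ?, ?)", rows)


def name_history(conn, accession):
    """Return (version, name) pairs for one accession, oldest release first."""
    cur = conn.execute(
        "SELECT version, name FROM mirna WHERE accession = ? "
        "ORDER BY CAST(version AS INTEGER)",
        (accession,),
    )
    return cur.fetchall()


def translate_name(conn, name, from_version, to_version):
    """Map a name used in one release to the name used in another, or None."""
    cur = conn.execute(
        """
        SELECT dst.name
        FROM mirna AS src
        JOIN mirna AS dst ON dst.accession = src.accession
        WHERE src.name = ? AND src.version = ? AND dst.version = ?
        """,
        (name, from_version, to_version),
    )
    row = cur.fetchone()
    return row[0] if row else None


print(name_history(conn, "DEMO0000001"))
print(translate_name(conn, "demo-miR-1", "10", "21"))
```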


PROGRAM DESCRIPTION AND METHODS

Overview of miRNet 2.0 framework

The main workflow of miRNet 2.0 is summarized in Figure 1. There are three main steps: data input, network creation and network visual analytics. To maintain a flexible and modular design, we have organized the main functions into 12 modules based on input types. The ‘miRNAs’ module allows users to connect miRNAs with target genes, TFs, ncRNAs etc.; the ‘Genes’ and ‘TFs’ modules link the corresponding inputs to their partners within the context of known interactions among miRNAs, genes and TFs; the ‘SNPs’ module maps SNPs to the above key players themselves or their binding sites. The remaining modules follow a similar procedure by mapping users’ inputs to their corresponding miRNA-associated interaction partners. To start, users must click a circular button on the miRNet homepage to enter the corresponding data upload page. Two general data formats are accepted: a list of miRNAs, SNPs, genes, small molecules etc., or an expression table generated from qPCR, microarray or RNAseq experiments. In the latter case, well-established differential expression analysis will be applied to identify significant miRNAs or genes as new input lists. In the second step, the input lists will be mapped to the underlying knowledgebases to create one or more interaction tables and networks. Many functions are available to allow users to further customize or refine the networks. In the third step, the results are presented as interactive networks for visual exploration. Users can easily search, zoom, highlight or perform functional enrichment analysis on selected regions of interest. In the following sections, we will focus primarily on the new and improved features introduced in version 2.0. Other features can be found in our prior publications (9, 10, 21).

Overview of miRNet 2.0 workflow. Users can upload different data types or select queries from built-in databases to start analysis. The input will be mapped to the underlying knowledgebases to create interaction tables and networks. The visualization page allows users to intuitively explore the networks using different layout algorithms as well as to perform topology or functional analysis.

Knowledgebase update and creation

Knowledgebase for network creation

We have put considerable effort into keeping miRNet's underlying knowledgebases up to date. miRNet 2.0 can automatically recognize different versions of miRBase IDs, as well as link pre-miRNAs to their mature forms based on the miRBaseConverter R package (23). We have updated the miRNA interaction knowledgebase based on the latest releases from major miRNA annotation databases including miRBase (24), miRTarBase (25), TarBase (26), HMDD (27) etc. The human tissue-specific miRNA annotations are based on the TSmiR (28) and IMOTA (17) databases, and the human exosomal miRNA annotations are from ExoCarta (29). The interactions among miRNAs, TFs and genes are obtained from TransmiR 2.0 (30), ENCODE (31), JASPAR (32) and ChEA (33). For miR-SNPs, we have used ADmiRE (34), PolymiRTS (35) and SNP2TFBS (36) to obtain SNP information in miRNA genes, miRNA-binding sites and TF-binding sites. We have also systematically collected the reported xeno-miRNAs together with their putative targeted genes into xeno-miRNet (21), which is now integrated in miRNet 2.0. Finally, we have expanded the miRNA-lncRNA interactions to include all other major ncRNAs including circRNA, ceRNA, pseudogene and sncRNA based on starBase (37). These data can be downloaded from the miRNet ‘Resources’ page as plain text files.

Knowledgebase for network interpretation

For network analysis, it is important to be able to interpret the interactions in addition to visualizing them. Enrichment analysis plays a significant role in this respect. Applying conventional enrichment analyses such as hypergeometric tests to target genes is known to be biased (38, 39). In miRNet 1.0, we implemented an algorithm based on empirical sampling for enrichment analysis using GO, KEGG or Reactome pathways (38). Another effective approach is to perform enrichment analysis directly at the miRNA level (39). To support this type of analysis, we have added six miRNA-set libraries including miRNA–function, miRNA–disease, miRNA–TF, miRNA–cluster, miRNA–family and miRNA–tissue based on TAM 2.0 (40). In summary, miRNet 2.0 provides four query types (all genes, highlighted genes, all miRNAs, highlighted miRNAs), two enrichment algorithms (hypergeometric tests and empirical sampling) and nine annotation libraries (three gene-set libraries and six miRNA-set libraries), representing the most comprehensive support for understanding the collective functions of miRNAs. Their potential applications are showcased in recent studies that compare miRNA changes specific to different tissues in pancreatic ductal adenocarcinoma (41) and identify enriched miRNA families in a study comparing genetic variants between Alzheimer's disease and cancers (42).
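For readers unfamiliar with the conventional test named above, the following Python sketch computes a one-sided hypergeometric enrichment p-value from four counts. It illustrates the textbook test only; it is not miRNet's code, it does not implement the empirical sampling alternative, and the argument names are chosen here for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated items by chance.

    universe  : total number of items that could be drawn
    annotated : items in the universe that belong to the set being tested
    selected  : size of the query list (drawn without replacement)
    overlap   : items in the query list that belong to the tested set
    """
    if not (0 <= annotated <= universe and 0 <= selected <= universe):
        raise ValueError("annotated and selected must lie between 0 and universe")
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    return tail / total


# Toy numbers: 10 items in total, 4 annotated, 3 selected, 2 of them annotated.
p_value = hypergeom_enrichment_p(universe=10, annotated=4, selected=3, overlap=2)
print(p_value)
```

The toy call evaluates (C(4,2)·C(6,1) + C(4,3)·C(6,0)) / C(10,3) = 40/120.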

Enabling flexible user input

Significant efforts have been made to provide an intuitive interface that permits the integration of miRNAs into different types of interaction networks. From the homepage, users can enter their queries by: (a) uploading a list of miRNAs, ncRNAs, genes, TFs or SNPs; (b) selecting a list from our built-in databases such as diseases, small compounds, epigenetic modifiers etc.; (c) uploading a miRNA or gene expression table generated from RT-qPCR, microarray or RNAseq; or (d) uploading multiple queries of different input types. Here, we will introduce new features for several common scenarios.

From miRNAs to networks

In miRNet 1.0, miRNA–target mapping was limited to target genes based on experimentally validated interaction information. However, increasing evidence has shown that miRNAs participate in complex networks through interactions with other functional elements to exert effects on cell biology and human diseases (12). For instance, lncRNAs can act as miRNA ‘sponges’ and compete with target mRNAs, thus increasing the expression level of those mRNAs (43). In version 2.0, users can select one or multiple targets from the ‘Targets’ dropdown list and miRNet will automatically map miRNAs to the selected targets. Users can further include protein-protein interactions (PPI) in the target networks based on several well-established PPI databases (44–46).

From TFs to networks

miRNAs and TFs can cooperate to tune gene expression, or mutually regulate each other in feedback loops (4, 47). Consequently, we have added a new module to allow users to include TFs in their analysis. Users simply upload their TF list; miRNet will automatically map the TFs to all potential targets (miRNAs and/or genes) and return the results as TF–miRNA and/or TF–gene interaction tables. The interactions will then be further integrated into networks for visual exploration. With the updated miRNA module and the addition of the TF module, miRNet 2.0 allows users to easily create miRNA-TF coregulatory networks from either a list of miRNAs or a list of TFs of interest.

From SNPs to networks

Mutations in mature miRNAs or their binding sites could significantly change their targeting abilities and dysregulate the expression of many genes simultaneously, whereas variations in primary or precursor miRNAs could alter the expression levels of mature miRNAs by affecting miRNA processing (48, 49). In miRNet 2.0, we have added a new module to support the analysis of SNPs within the context of miRNA-target gene interactions. Users can upload a list of SNPs from the SNPs upload page. miRNet currently accepts either rsIDs or genomic coordinates based on the human reference genome build GRCh37. The uploaded lists are then mapped to miRNAs and/or their target genes. Following this step, users can visually explore their data in the network visualization page.

Uploading multiple queries

The Multiple Query Types module complements miRNet's single type analysis modules by permitting the identification of novel connections amongst multiple types of user input. The module currently supports ten input types shown in a dialog when users click the central circular button at the home page. After selecting the input types of interest, users simply copy-and-paste their query lists (miRNAs, genes, TFs, lncRNAs, pseudogenes, circRNAs, sncRNAs) or select from picklists (diseases, small compounds and epigenetic modifiers). The uploaded lists are then mapped to the internal knowledgebases and proceed with the workflow as described in other modules.

Enhancing network visual analytics

Network creation and customization

The default networks are created by searching for direct interaction partners in the interaction knowledgebases. These are generally known as first-order interaction networks. When there is a large number of queries (seeds), it is reasonable to focus only on the interactions among those seeds (i.e. zero-order networks). However, many seeds could become orphan nodes when switching directly to zero-order networks. A ‘gentle’ approach is to extract, from the first-order network, a minimal subnetwork that maximally connects those seeds. In miRNet 2.0, we have added support for computing minimum subnetworks based on the prize-collecting Steiner Forest (PCSF) algorithm (50), as well as several other empirical refining methods (available under ‘Network Tools’) based on shortest paths, batch filtering, node degree or betweenness values. The results can be downloaded as pair-wise interaction tables or graph files.
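The difference between first-order and zero-order networks, and why seeds can become orphans, can be shown on a toy edge list. The Python sketch below uses hypothetical node names and plain lists of pairs; it does not implement the PCSF algorithm or any other miRNet refinement.

```python
def first_order_network(edges, seeds):
    """Keep every edge that touches at least one seed (seeds plus direct partners)."""
    seeds = set(seeds)
    return [(a, b) for a, b in edges if a in seeds or b in seeds]


def zero_order_network(edges, seeds):
    """Keep only edges whose two endpoints are both seeds."""
    seeds = set(seeds)
    return [(a, b) for a, b in edges if a in seeds and b in seeds]


def orphan_seeds(edges, seeds):
    """Seeds that have no edge left in the given network."""
    connected = {node for edge in edges for node in edge}
    return sorted(set(seeds) - connected)


# Hypothetical interaction knowledgebase and query list.
knowledgebase = [
    ("miR-A", "gene1"),
    ("miR-B", "gene1"),
    ("miR-A", "miR-C"),
    ("gene2", "gene3"),
]
seeds = ["miR-A", "miR-B", "miR-C"]

first = first_order_network(knowledgebase, seeds)
zero = zero_order_network(knowledgebase, seeds)
print(first)
print(zero)
print(orphan_seeds(zero, seeds))
```

Here miR-B is connected in the first-order network through gene1 but is left without any edge in the zero-order network, which is the situation the minimal-subnetwork approach is meant to avoid.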

Network visualization and layout

miRNet 2.0 provides a wide array of options to help improve visual exploration of miRNA-centric interaction networks. During the network creation stage, users can refine the network by applying different filters on interaction tables or networks. At the network visualization page, users can specify node styles based on their types, reduce node overlap, or perform edge bundling etc. The resulting network can be further improved using different layout algorithms. Over ten network layout algorithms have been implemented, including Force-Atlas, Fruchterman-Reingold, Circular, Graphopt, Large Graph, Random, Circular Bipartite/Tripartite, Linear Bipartite/Tripartite, Concentric and Backbone. The last four algorithms are designed for complex networks consisting of multiple node types (miRNAs, genes, TFs etc.). The bipartite/tripartite layout provides a straightforward abstraction of the relationships between different types of molecular entities by emphasizing the data type of each node (51). When there are multiple node types, we recommend visualizing the network in either circular bipartite/tripartite (Figure 2A) or linear bipartite/tripartite layout (Figure 2B), followed by applying the ‘reduce node overlap’ algorithm. To enable better understanding of a particular key node, we have added the Concentric layout (52). This layout arranges nodes in concentric circles around a node of interest (i.e. the focal node) in the middle (Figure 2C). The order of the circles represents the degree level of their interactions. Arranging nodes in this fashion enables a better understanding of how the focal node relates to the rest of the graph. By default, the focal node is the node with the highest degree value. Users can manually specify the key node by selecting it in the Node Explorer table or by double clicking on it in the network. Another new addition is the Backbone layout, which is very effective in revealing hidden patterns in medium and large networks. The algorithm calculates the layout after applying sparsification to the network by only including the most embedded edges (53). This process helps uncover hidden modules based on edge density by putting more emphasis on the structure of the graph layout (Figure 2D).
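As a rough illustration of the Concentric layout, the Python sketch below assigns each node to a ring around a focal node. It assumes that the ring number is the number of hops from the focal node, which is one reading of "degree level" and may differ from the actual implementation; the edge list is hypothetical, and nodes not reachable from the focal node are simply left out.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def concentric_rings(edges, focal=None):
    """Return (focal, {node: ring}) where ring is the hop count from the focal node.

    If no focal node is given, the node with the highest degree is used;
    ties are broken alphabetically.
    """
    adjacency = build_adjacency(edges)
    if not adjacency:
        return None, {}
    if focal is None:
        focal = max(sorted(adjacency), key=lambda node: len(adjacency[node]))
    ring = {focal: 0}
    queue = deque([focal])
    while queue:  # breadth-first search outwards from the focal node
        node = queue.popleft()
        for neighbour in sorted(adjacency[node]):
            if neighbour not in ring:
                ring[neighbour] = ring[node] + 1
                queue.append(neighbour)
    return focal, ring


# Hypothetical network: "hub" has the highest degree and becomes the focal node.
demo_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "d")]
focal, rings = concentric_rings(demo_edges)
print(focal, rings)
```

Passing focal="a" instead mimics a user picking a different key node; drawing the circles would still require converting each ring into coordinates, which is not covered here.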

Screenshots of the Network Visualization page showing the main features and several network layouts. (A) A typical view of the page. The central panel shows a network in Circular-tripartite layout, and the surrounding panels provide functions for network analysis and customization. For instance, users can perform enrichment analysis or module analysis on this network. An extracted network module was displayed at bottom right. (B) Linear-tripartite layout. (C) Concentric layout with edge bundling. (D) Backbone layout with several modules highlighted in different colors. More details of each layout are described in the main text.

Improving transparency/reproducibility and web APIs

Except for the interactive visualization step, which is executed on users’ browsers, all other data analysis steps including mapping, filtering, network creation and customization are performed by the corresponding R functions on our cloud server. To enable more transparent data analysis, we have released the underlying R package (https://github.com/xia-lab/miRNetR), and added a ‘Download’ page in the web application to allow users to download the R command history and results tables generated during their analysis sessions. The R history contains all function calls with user-selected parameters. We hope that the R package together with the R command history will allow users to track each step of their analysis in a form (R script) that can be easily shared and reproduced, complementing the web-based platform. We have also implemented RESTful APIs to allow tool developers to submit their query lists programmatically as external requests. While offering open access to miRNet 2.0 resources, APIs give a level of abstraction and hide complexity from programmers. The currently available APIs are shown in Table 1. More APIs will be added based on users’ feedback.

List of APIs and programmatic access endpoints on the miRNet server. The API base for miRNet 2.0 is http://api.mirnet.ca, which can be visited to view a detailed documentation

Endpoint | HTTP method | Input | Description
base/table/mir | POST | Organism, miRNA ID type, target type, miRNA list | Get experimentally validated table results of the miRNA-target interactions (forward mapping)
base/table/gene | POST | Organism, gene ID type, gene list | Get experimentally validated table results of the miRNA-gene (mRNA, TF, lncRNA) interactions (reverse mapping)
base/function/mir | POST | Organism, miRNA ID type, target type, miRNA list, algorithm, database | Get functional enrichment results
base/function/gene | POST | Organism, gene ID type, gene list, algorithm, database | Get functional enrichment results
base/graph/mir | POST | Organism, miRNA ID type, target type, miRNA list | Get graph of miRNA-target interactions (JSON format)
base/graph/gene | POST | Organism, gene ID type, gene list | Get graph of miRNA-target interactions (JSON format)
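
A request to one of these endpoints might be assembled as in the sketch below. Only the API base, the `table/mir` path, the POST method and the four kinds of input come from the table; the JSON field names (`org`, `idOpt`, `targetOpt`, `myList`) and the example values are assumptions for illustration and should be checked against the documentation at the API base. The request is built but not sent.

```python
import json
from urllib.request import Request

API_BASE = "http://api.mirnet.ca"  # API base stated in the table caption


def build_table_mir_request(organism, id_type, target_type, mirnas):
    """Build (but do not send) a POST request for the table/mir endpoint."""
    # Field names are hypothetical placeholders, not confirmed by the source.
    payload = {
        "org": organism,
        "idOpt": id_type,
        "targetOpt": target_type,
        "myList": "\n".join(mirnas),
    }
    return Request(
        url=f"{API_BASE}/table/mir",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_table_mir_request(
    "hsa", "mir_id", "gene", ["hsa-mir-101-3p", "hsa-mir-133b"]
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit the query; that step is left out here so the sketch runs without network access.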

The underlying SNPnexus database is kept synchronised with the UCSC human genome annotation database. However, data for some annotation categories comes from different sources.

Sources by category (entries follow the column order hg18, hg19, hg38; some sources list fewer assemblies):

Known SNP information | Ensembl Variation 54, dbSNP 129 | Ensembl Variation 74, dbSNP 138 | Ensembl Variation 90, dbSNP 150

Gene Definition
RefSeq | UCSC hg18 | UCSC hg19 | UCSC hg38
Ensembl | Ensembl 54 | Ensembl 74 | Ensembl 90
UCSC | UCSC hg18 | UCSC hg19 | UCSC hg38
CCDS | UCSC hg18 | UCSC hg19 | UCSC hg38
Vega | UCSC hg18 | UCSC hg19
Acembly | UCSC hg18 | UCSC hg19
H-inv | UCSC hg19

Population data
HapMap | UCSC hg18 | UCSC hg19
1000 Genomes | IGSR GRCh38 (liftover) | IGSR GRCh38
ExAC | ExAC r1

Protein Effect
SIFT | SIFT Human DB (release 63) | Ensembl Variation 90
PolyPhen | PolyPhen-2 (Ensembl Variation 63) | Ensembl Variation 90

Regulatory Elements
FirstEF | UCSC hg18
Transcription Factor Binding Sites | UCSC hg18 | UCSC hg19
Enhancers | UCSC hg18 | UCSC hg19
CpG Islands | UCSC hg18 | UCSC hg19 | UCSC hg38
Other micro and small RNAs | UCSC hg18 | UCSC hg19 | UCSC hg38
miRBASE | release 20 (liftover) | release 20 | release 21
miRNA Target Sites | TargetScan: UCSC hg18 | TargetScan: UCSC hg19 | TarBase: Ensembl Regulation 90
ENCODE regions | Ensembl Regulation 74 | Ensembl Regulation 90
Roadmap Epigenomics | Ensembl Regulation 74 | Ensembl Regulation 90
Ensembl Regulatory Build | Ensembl Regulation 74 | Ensembl Regulation 90

Phenotype/Disease Association
GAD | UCSC hg18 | GAD update Oct 2011
COSMIC | version 68 | version 68 | version 82
GWAS | UCSC hg18 | UCSC hg19 | UCSC hg38
ClinVar | UCSC hg38

Conserved Elements
PhastConsElements | UCSC hg18 | UCSC hg19 | UCSC hg38
GERP++ | GERP update late 2010 | GERP update late 2010

Structural Variations | DGV Build 36 | DGV GRCh 37 | DGV GRCh 38

Neo-epitope prediction
MuPeXI | v1.1
MHCflurry | v0.9.2
NetTepi | v1.0

Non-coding variation scoring
CADD | v1.3
fitCons | v1.01
EIGEN | v1.0
FATHMM | v2.3
GWAVA | v1.0
DeepSEA | v0.94
FunSeq2 | v2.1.6
ReMM | v0.3.0

SNPnexus currently accepts query input data in three different forms (genomic position, chromosomal region or dbSNP id) and two different human genome assemblies. Users can annotate a single SNP, insertion/deletion (InDel) or block substitution by selecting one of the input formats and entering the required data in the graphical interface. Batch queries can be run by uploading an appropriately formatted input file or by pasting the queries into the interface. The formats are explained in more detail below.

Users can annotate a newly discovered variant by entering the following data in the interface: type (Chromosome/Contig/Clone), name, relative position, reference nucleotide/s (Allele1), observed nucleotide/s (Allele2), positive (1) or negative (-1) strand. A one-based coordinate system is used to describe genomic position. Multi-allelic variations are supported: users can provide "/"-separated alleles in the Allele2 field. Here are a few examples on the hg18 assembly:

Type Id Position Allele1 Allele2 Strand
Chromosome 1 100002626 A T 1
Contig NT_023736 2025395 C G/A/T 1
Clone AC105270 154799 A T 1

Insertions and Deletions (InDels) and Block Substitutions. The tool supports insertions and deletions by using - as the placeholder. Users need to set Allele1=- to indicate insertion of Allele2 at the corresponding genomic position. Similarly, Allele2=- denotes deletion of Allele1 from the given genomic position. As with single nucleotide substitutions, the tool also supports block substitutions, where the user provides Allele1 and Allele2 of the same or different lengths. Here are a few examples of insertions, deletions and a block substitution on the hg19 assembly:

Type Id Position Allele1 Allele2 Strand #Comment
Chromosome 3 9798773 C - 1 # 1-nucleotide deletion
Chromosome 3 9798773 CCC - 1 # 3-nucleotide deletion
Chromosome 3 9798773 - G 1 # 1-nucleotide insertion
Chromosome 3 9798773 - GTC 1 # 3-nucleotide insertion
Chromosome 3 9798773 CCCG GT 1 # block substitution
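
The placeholder rules above can be sketched as a small classifier. This is only an illustration of the conventions described in the text (`-` as the null allele, "/"-separated multi-allelic Allele2); the function name and the returned labels are hypothetical and are not part of SNPnexus.

```python
def classify_variant(allele1, allele2):
    """Classify a query by its Allele1/Allele2 fields ('-' is the placeholder)."""
    if allele1 == "-" and allele2 == "-":
        raise ValueError("Allele1 and Allele2 cannot both be '-'")
    if allele1 == "-":
        return "insertion"  # Allele2 is inserted at the given position
    if allele2 == "-":
        return "deletion"  # Allele1 is deleted from the given position
    # Multi-allelic input lists several observed alleles separated by "/".
    observed = allele2.split("/")
    if len(allele1) == 1 and all(len(a) == 1 for a in observed):
        return "single nucleotide substitution"
    return "block substitution"


for a1, a2 in [("C", "-"), ("-", "GTC"), ("CCCG", "GT"), ("C", "G/A/T")]:
    print(a1, a2, classify_variant(a1, a2))
```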

Note that the tool supports multiple nucleotides in place of Allele1 and Allele2. However, for practical reasons, users are discouraged from providing very large blocks that may span more than one adjacent functional region (e.g., adjacent intronic and exonic regions), in which case the predicted function reported by the tool is based on the first functional region.

IUPAC code submission. Finally, users can specify reference and observed nucleotides using the IUPAC nucleotide nomenclature to denote ambiguous nucleotides at a given position, following the translation table shown below:

IUPAC Code Meaning
G G
A A
T T
C C
R G or A
Y T or C
M A or C
K G or T
S G or C
W A or T
H A or C or T
B G or T or C
V G or C or A
D G or A or T
N G or A or T or C

Type Id Position Allele1 Allele2 Strand #Comment
Chromosome 1 100002626 A S 1 # G or C substitution with A
Chromosome 3 9798773 - R 1 # G or A insertion
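
As a minimal sketch of how the translation table can be applied, the snippet below expands an allele written with IUPAC codes into the concrete alleles it stands for. The dictionary mirrors the table above; the function name and the handling of multi-character alleles (all combinations) are assumptions made for illustration.

```python
from itertools import product

# IUPAC code -> nucleotides it may represent (copied from the table above)
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}


def expand_allele(allele):
    """Return every concrete allele encoded by an IUPAC-coded allele string."""
    if allele == "-":  # null placeholder used for insertions/deletions
        return ["-"]
    choices = [IUPAC[base] for base in allele.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand_allele("S"))   # the 'S' substitution from the example
print(expand_allele("R"))   # the 'R' insertion from the example
```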

Users can query for known SNPs in a given chromosomal region by providing the following data: chromosome, start position, end position. The tool will identify and annotate all the known SNPs defined in the selected region. Here are a few examples on the hg18 assembly:

Chromosome Start End
3 9798000 9799000
1 100000000 100050000

Currently, queries for known SNPs are limited to genomic regions of at most 1 Mb.
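
A region query could be checked against this limit before submission, as in the sketch below. It assumes that 1 Mb means 1,000,000 bases and that start and end are one-based and inclusive; both points are assumptions, as is the function name.

```python
MAX_REGION_BP = 1_000_000  # assumed interpretation of the 1 Mb limit


def validate_region(chromosome, start, end):
    """Return the region as a tuple, or raise ValueError if it is not usable."""
    if start < 1 or end < start:
        raise ValueError("start must be >= 1 and end must be >= start")
    length = end - start + 1  # one-based, inclusive coordinates (assumed)
    if length > MAX_REGION_BP:
        raise ValueError(f"region spans {length} bp, above the 1 Mb limit")
    return (str(chromosome), start, end)


print(validate_region(3, 9798000, 9799000))
print(validate_region(1, 100000000, 100050000))
```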

Users can also query for known SNPs by providing the corresponding dbSNP rs identifiers. Here are a few examples of dbSNP rs#:

dbSNP rs#
rs293794
rs1052133
rs3136820
rs2272615
rs2953993
rs1799782
rs25487
rs2248690
rs4918
rs1071592

Note that, depending on the genome assembly, the functional annotation for a given SNP can be quite different. Users are therefore requested to take caution regarding the choice of genome assembly.

SNPnexus allows users to submit batch query when dealing with large numbers of variations. Users can either paste the variants list directly into the designed text space or upload a file containing the queries. Currently we limit the maximum number of variants in a single batch query to 100,000. We only allow batch query using genomic position and/or dbsnp rs# formats. No chromosomal region query data is allowed. Each variant must be on a new line with tab-delimited data in one of the following formats:

< Type Name Position Allele1 Allele2 Strand > # Genomic position data for novel SNPs
< "dbsnp" rs# > # dbSNP rs number for known SNPs

Example of a batch query is shown below, which one can paste directly into the textarea provided in the interface:

Chromosome 1 100002626 A T 1
Contig NT_023736 2025395 A T 1
Clone AC105270 154799 A T 1
dbsnp rs293794
dbsnp rs1052133
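
For larger lists, the batch text can be generated rather than typed. The sketch below writes genomic-position records and dbSNP identifiers in the two tab-delimited formats described above and enforces the stated limit of 100,000 variants; the function and variable names are hypothetical.

```python
MAX_VARIANTS = 100_000  # batch limit stated in the text


def format_batch_query(positions=(), rs_ids=()):
    """Return tab-delimited batch query text, one variant per line."""
    lines = []
    for record in positions:
        if len(record) != 6:
            raise ValueError("expected Type, Name, Position, Allele1, Allele2, Strand")
        lines.append("\t".join(str(field) for field in record))
    # Known SNPs must be preceded by the keyword "dbsnp".
    lines.extend(f"dbsnp\t{rs}" for rs in rs_ids)
    if len(lines) > MAX_VARIANTS:
        raise ValueError("too many variants for a single batch query")
    return "\n".join(lines)


text = format_batch_query(
    positions=[("Chromosome", 1, 100002626, "A", "T", 1)],
    rs_ids=["rs293794", "rs1052133"],
)
print(text)
```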

Alternatively, users can upload batch query files (.txt) like this example. Note that, known SNPs must be preceded by keyword "dbsnp" to be recognized as dbSNP rs#.

Variant Call Format (VCF) is a flexible and extensible standard format for variation data. SNPnexus allows users to upload VCF files (.vcf) containing SNPs, InDels and block substitutions directly to the server. An example input VCF file is shown below:

##fileformat=VCFv4.1
##fileDate=20121001
#CHROM POS ID REF ALT QUAL FILTER INFO
chr3 9798773 rs1052133 C G . . .
chr1 114377568 . A G,T . . .
chr3 9791667 . AGA - . . .
chr16 50763779 . - C . . .
chr20 1230237 . T . . . .
chr20 1234567 . GTC G . . .
chr20 1234568 . T TA . . .

This example shows in order a simple SNP, a variant at which two alternate alleles are called, a deletion of 3 bases (AGA), an insertion of one base (C), a monomorphic reference with no alternate alleles which will eventually be ignored by SNPnexus, a deletion of 2 bases (TC), and an insertion of one base (A).

A VCF file should contain 8 fixed, mandatory columns, as shown by the third header line in the example. SNPnexus only uses the genomic position (CHROM and POS fields) and allele information (REF and ALT fields) from the input; all other information in the file is ignored and has no effect on the SNPnexus annotation. As in the standard SNPnexus input format, NULL values for insertions and deletions can be represented by '-'. Missing values in the VCF file are represented by '.'. SNPnexus will ignore an input line if a missing value occurs in any of the CHROM, POS, REF and ALT fields. Please consult the VCF specification for details of the format.
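
The field handling described above can be sketched as follows. This is an illustration of the stated rules (use only CHROM, POS, REF and ALT; skip a line when any of them is '.'), not SNPnexus's own parser; the function name is hypothetical and real VCF files are assumed to be tab-delimited.

```python
def parse_vcf_line(line):
    """Return (chrom, pos, ref, alts) for a data line, or None if it is skipped."""
    if not line.strip() or line.startswith("#"):
        return None  # blank, meta-information or header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        return None  # too few columns to hold CHROM, POS, ID, REF, ALT
    chrom, pos, _variant_id, ref, alt = fields[:5]
    if "." in (chrom, pos, ref, alt):
        return None  # missing value in a required field
    return chrom, int(pos), ref, alt.split(",")


records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t114377568\t.\tA\tG,T\t.\t.\t.",
    "chr20\t1230237\t.\tT\t.\t.\t.\t.",
]
for record in records:
    print(parse_vcf_line(record))
```

The third record has no alternate allele, so it is skipped in the same way as the monomorphic reference line in the example file.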

The table containing genomic annotations has the following columns:

SNP: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Chromosome: Variant mapped chromosome location
chromPosition: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele: Observed allele
Contig: Variant mapped contig location
contigPosition: Variant start position on contig
Band: SNP cytogenetic location
dbSNP: link to dbSNP, if known

The table containing information on overlapped or nearest genes has the following columns:

SNP: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
ID: SNP presented in genomic position format
Chromosome: Variant mapped chromosome location
chromPosition: Variant start position on chromosome
Overlapped Gene: Name of the gene (HGNC system) to which the variant is overlapped
Type: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Annotation: Summary of whether the variant overlapped with the coding, intronic or untranslated regions of the various transcript isoforms of the gene, as annotated from Ensembl gene system.
Nearest Upstream Gene: If variant is not overlapped with any gene, then the gene whose end position is nearest to the variant on the left (considering the alignment of genes on the positive strand as left-to-right)
Type of Nearest Upstream Gene: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Distance to Nearest Upstream Gene: distance from the end position of the nearest upstream gene.
Nearest Downstream Gene: If variant is not overlapped with any gene, then the gene whose start position is nearest to the variant on the right (considering the alignment of genes on the positive strand as left-to-right)
Type of Nearest Downstream Gene: Gene type, e.g., protein coding, miRNA, non coding, Pseudogene, snoRNA, lincRNA etc.
Distance to Nearest Downstream Gene: distance from the start position of the nearest downstream gene.
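
The overlapped/nearest-gene logic described by these columns can be sketched as below, using the same left-to-right convention on the positive strand. The gene names and coordinates are made up for illustration, and the function is a hypothetical helper rather than the SNPnexus implementation.

```python
def locate_variant(position, genes):
    """genes: list of (name, start, end) tuples on a single chromosome."""
    overlapped = [name for name, start, end in genes if start <= position <= end]
    if overlapped:
        return {"overlapped": overlapped, "upstream": None, "downstream": None}
    # (distance, name) pairs, so min() selects the closest gene on each side.
    left = [(position - end, name) for name, start, end in genes if end < position]
    right = [(start - position, name) for name, start, end in genes if start > position]
    return {
        "overlapped": [],
        "upstream": min(left) if left else None,      # nearest gene end on the left
        "downstream": min(right) if right else None,  # nearest gene start on the right
    }


GENES = [("GENE_A", 100, 200), ("GENE_B", 500, 600), ("GENE_C", 900, 1000)]
print(locate_variant(150, GENES))
print(locate_variant(300, GENES))
```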

The result table containing gene/protein consequences for a particular gene annotation system may have the following columns:

SNP: <dbsnp rs#> or <chromosome/contig/clone id,":",position,":","allele",":",strand>
Allele: Examined alleles <reference allele,"|", observed allele(s) >. For Insertion, reference allele is "-". For other cases, reference allele is the allele found in reference genome sequence. Observed allele(s) can be multi-allelic separated by "|" depending on the input Allele2. If input Allele1 does not match with reference allele, then Allele1 becomes the first observed allele.
Strand: On which strand the variant is observed (1 or -1)
Symbol: Gene symbol
Gene: Gene name in the corresponding annotation system
Transcript: Transcript name in the corresponding annotation system
Entrez gene: Entrez gene id
Predicted function: Predicted function of the SNP/InDel/block substitution based on its location on the transcript. The result is based on the first nucleotide position of the variation. Possible categories: coding, intronic, intronic (splice_site), 5utr, 3utr, 5upstream, 3downstream, non-coding, non-coding intronic, non-coding intronic (splice_site). More detailed information on the predicted function is available on the "Note" column.
cdna_pos: SNP position on cdna, if the predicted function is coding, 3'UTR or 5'UTR
cds_pos: SNP position on cds, if the predicted function is coding
aa_pos: Position of the first amino acid (possibly) affected in the resultant peptide chain, if the predicted function is coding
aa_change: Peptide <reference amino acid(s),">", observed amino acid(s)_1 [,"|", observed amino acid(s)_2, . ] >
Detail (previously Note column): Detailed functional type for the variation. If the variation occurs over a single coding exon of a transcript, the type of the consequences on the corresponding protein is given. Possible values: syn (synonymous), nonsyn (non-synonymous) [stop-gain or stop-loss], frameshift [stop-gain or stop-loss], pepshift (peptide shift, block substitution). Preceded by "*", if the reference protein is found incomplete (missing stop-codon).
However, if the variation occurs over more than one functional regions on the transcript, the corresponding regions are given separated by "-".
splice_dist: Distance to splice junction, if the predicted function is intronic
proteins: reference and observed peptide sequences separated by "|", if the predicted function is coding. Available only in the downloadable text and excel files.

The SIFT result table containing the predicted effect on protein has the following columns:

SNP: SNP name
Allele: <reference allele,"|",observed allele>
Transcript: Transcript name in the Ensembl gene annotation system
Protein: Protein name in the Ensembl gene annotation system
aa_pos: Position of the amino acid affected in the resultant peptide chain
wild_aa: Reference amino acid
mutant_aa: Observed amino acid
Score: SIFT prediction score for non-synonymous substitution of reference amino acid with observed amino acid. Possible real values: 0 to 1.
Prediction: SIFT predicted effect on protein based on the score. Possible values: DAMAGING (score <= 0.05), TOLERATED (score > 0.05)
Confidence: Degree of reliability about the prediction. Possible values: HIGH, LOW

The PolyPhen result table containing the predicted effect on protein has the following columns:

SNP: SNP name
Allele: <reference allele,"|",observed allele>
Transcript: Transcript name in the Ensembl gene annotation system
Protein: Protein name in the Ensembl gene annotation system
aa_pos: Position of the amino acid affected in the resultant peptide chain
wild_aa: Reference amino acid
mutant_aa: Observed amino acid
Score: PolyPhen prediction score for non-synonymous substitution of reference amino acid with observed amino acid. Possible real values: 0 to 1.
Prediction: PolyPhen predicted effect on protein based on the score. Possible values: PROBABLY DAMAGING, POSSIBLY DAMAGING, BENIGN, UNKNOWN

The result table containing the specific HapMap population data has the following columns:

SNP: SNP name
Genotype(1/2/3): Observed Genotype
Count: Number of observed samples with the genotype
Frequency: Percentage of observed samples with the genotype
Allele(1/2): Observed allele
Count: Number of observed samples with the allele
Frequency: Percentage of observed samples with the allele

The result table containing the specific 1000 Genomes Super Population data has the following columns:

SNP: SNP name
Chrom: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele: Observed allele
Frequency: Percentage of observed samples with the allele

The result table containing the specific Exome Aggregation Consortium (ExAC) Population data has the following columns:

SNP: SNP name
Chrom: Variant mapped chromosome location
Position: Variant start position on chromosome
REF Allele: Reference allele
ALT Allele: Alternate allele
Allele Count: Total number of called genotypes
ALT Allele Count: Alternate allele count in genotypes
Frequency: Percentage of alternate allele in genotypes

The Transcription Factor Binding Sites (TFBS) result table has the following columns:

SNP: SNP name
TFBS_id: TFBS id
Chromosome: Chromosome name
chromStart: Start position of the TFBS site in the chromosome
chromEnd: End position of the TFBS site in the chromosome
TFBS_Accession: TFBS accession number. Note that, browsing the link provided in the html and excel file requires free registration with TRANSFAC website.
TFBS_Species: Transcription factor species
TFBS_name: Transcription factor name
SwissProt_Accession: SwissProt accession number

The First exon and promoter prediction result table has the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the prediction in the chromosome
chromEnd: End position of the prediction in the chromosome
FirstEF_Name: Name of the item containing the type of prediction (exon, promoter, CpG window)
Probability: Prediction score. Possible values: 0 to 1000
Strand: + or -

The miRBASE result table has the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the microRNA in the chromosome
chromEnd: End position of the microRNA in the chromosome
Name: microRNA name
Accession: miRBASE accession number
Strand: + or -
Type / Description: miRNA type. Possible values: mature miRNA, miRNA_primary_transcript

The Vista Enhancer prediction result table has the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the Vista element in the chromosome
chromEnd: End position of the Vista element in the chromosome
Vista_Item: Name of the Vista element
Score: Prediction score. Possible values: 900 (Positive-enhancer), 200 (Negative-enhancer)

The CpG Island prediction result table has the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the CpG island in the chromosome
chromEnd: End position of the CpG island in the chromosome
CpG_Island: Name of the CpG Island
Length: Island Length
Cpg%: Percentage of island that is CpG
C/G%: Percentage of island that is C or G
Ratio: Ratio of observed to expected CpG in island

The TargetScan miRNA regulatory sites result table has the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the site in the chromosome
chromEnd: End position of the site in the chromosome
Item_Name: Name of the predicted target site
Score: Prediction scores by TargetScanS. Possible values: 0 to 1000
Strand: + or -

The TarBase miRNA target sites result table has the following columns:

SNP: SNP name
miRNA target site: position of the predicted target site
Strand: + or -
miRNA: miRNA targeting the site

The miRNAs/snoRNAs/scaRNAs result table has the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position in the chromosome
chromEnd: End position in the chromosome
Name: Name of the miRNA/snoRNA/scaRNA
Score: Prediction scores. Possible values: 0 to 1000
Strand: + or -
Type: Type of RNA

The ENCODE and Roadmap Epigenomics result tables have the following columns:

SNP: SNP name
Chromosome: Chromosome name
Region Start: Start position of the regulatory region
Region end: End position of the regulatory region
Feature Type Class: Regulatory feature class
Feature Type: Regulatory feature name
Epigenome: Epigenome or cell name

The Ensembl Regulatory Build result table has the following columns:

SNP: SNP name
Chromosome: Chromosome name
Region Start: Start position of the Roadmap Epigenomics region
Region end: End position of the Roadmap Epigenomics region
Feature Type Class: Regulatory feature class
Epigenome: Epigenome or cell name
Evidence/Activity: Whether it is projected or not (hg19); state of activity (hg38)

The Vertebrate Alignment and Conservation result table contains the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the aligned element in the chromosome
chromEnd: End position of the aligned element in the chromosome
Id: Name of the aligned element
Probability Score: Estimated probability score for conservation as determined from PHAST package. Possible values: 0 to 1000

The Genomic Evolutionary Rate Profiling (GERP++) result table contains the following information:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the aligned element in the chromosome
chromEnd: End position of the aligned element in the chromosome
RS_Score: Rejected Substitutions score for the conserved element as determined from GERP++ package.

The Genetic Association Database (GAD) result table contains the following columns:

SNP: SNP name
GAD Id: GAD id
Association: Confirmed association
Phenotype: Phenotype description
Disease_Class: Type of disease
Gene: Gene name
Reference: Reference of publication of the study
Pubmed: Pubmed id of publication of the study
SNP reported: Whether the known SNP is directly reported in the study. Possible values: Y(yes), N(no)
Associated SNPs: SNPs associated with the disease as reported in the study
Population: Sample population
Entrez gene: Entrez gene id

The COSMIC result table contains the following columns:

SNP: SNP name
Mutation Id: Cosmic mutation id
Sample: Cosmic sample id
Site: Primary affected site
Histology: Primary Histology
Histology Subtype : Subtype of primary histology
Symbol: Gene symbol
Pubmed: Pubmed id of publication of the study

The GWAS catalogue result table contains the following columns:

SNP: SNP name
Catalogue Id: ID of SNP associated with trait
Region: Chromosome band/region of SNP
Genes: Reported Gene(s)
Allele_frequency: Risk Allele Frequency
Trait: Disease or trait assessed in study
Population: Initial sample population for the study
Platform: Platform and [SNPs passing Quality Control]
Pubmed: Pubmed id of publication of the study

The ClinVar result table contains the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the variant
chromEnd: End position of the variant
Variation: Reference to Observed Allele
Type: Type of Variant
Clinical Significance: Whether identified as Pathogenic or Benign or uncertain
Phenotypes: List of phenotypes associated with the variant

Each of the structural variation result tables contains the following columns:

SNP: SNP name
Chromosome: Chromosome name
chromStart: Start position of the structural variation in the chromosome
chromEnd: End position of the structural variation in the chromosome
Reference: Literature reference for the study that included this variant
Pubmed: Pubmed id of publication of the study
Method: Brief description of method/platform
Sample: Description of sample population for the study

The MuPeXI result table contains the following columns:

SNP: SNP name
HLA allele : HLA allele name
Mutant peptide: The extracted mutant peptide.
Normal peptide: The extracted wild type peptide.
Amino acid change: <reference amino acid(s),"/", mutated amino acid(s)>
Gene: Gene symbol
Mutant Affinity : Predicted binding affinity of mutant peptide in nanoMolar units (provided by NetMHCpan).
Normal Affinity: Predicted binding affinity of reference peptide in nanoMolar units (provided by NetMHCpan).
Priority score: MuPeXI Calculated prioritization dependent on HLA binding affinity of mutant and normal peptides, gene expression, and allele frequency.

The NetTepi result table contains the following columns:

SNP: SNP name
HLA allele : HLA allele name
Mutant peptide: The extracted mutant peptide.
Normal peptide: The extracted wild type peptide.
Amino acid change: <reference amino acid(s),"/", mutated amino acid(s)>
Gene: Gene symbol
Mutant Combined score: Combined prediction score for mutant peptide
Normal Combined score: Combined prediction score for reference peptide

The MHCFlurry result table contains the following columns:

SNP: SNP name
HLA allele : HLA allele name
Mutant peptide: The extracted mutant peptide.
Normal peptide: The extracted wild type peptide.
Amino acid change: <reference amino acid(s),"/", mutated amino acid(s)>
Gene: Gene symbol
Mutant Affinity : Predicted affinity of mutant peptide in nanoMolar units.
Normal Affinity: Predicted affinity of reference peptide in nanoMolar units.

The CADD result table contains the following columns:

SNP: SNP name
ID: Query SNP presented in genomic position format
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
Raw Score: "Raw" unaltered CADD-score for the variation. It has relative meaning, with higher values indicating that a variant is more likely to be simulated (or "not observed") and therefore more likely to have deleterious effects.
PHRED: PHRED-like (-10*log10(rank/total)) scaled CADD-score ranking a variant relative to all possible substitutions of the human genome. A score ≥ 10 indicates that it is predicted to be in the 10% most deleterious substitutions that you can do to the human genome, a score ≥ 20 indicates the 1% most deleterious and so on.
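
The scaling formula quoted in the PHRED description can be checked numerically. The sketch below only evaluates -10*log10(rank/total) for hypothetical rank and total values; it does not reproduce how CADD itself computes ranks.

```python
import math


def phred_scaled(rank, total):
    """PHRED-like scaling of a rank among `total` possible substitutions."""
    if not 0 < rank <= total:
        raise ValueError("rank must be between 1 and total")
    return -10 * math.log10(rank / total)


# A variant ranked within the top 10% and the top 1% (hypothetical counts).
print(phred_scaled(100, 1000))
print(phred_scaled(10, 1000))
```

The two calls give values of about 10 and about 20, matching the 10% and 1% thresholds named in the column description.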

The FitCons result table contains the following columns:

SNP: SNP name
ID: Query SNP presented in genomic position format
Chromosome: Chromosome name
Region Start: Start position of the non-coding region
Region End: End position of the non-coding region
Fitness Score: In the range [0-1]. Relative indicator of the potential for interesting genomic function, with higher scores indicating more potential. The range .05 to .35 may be most appealing as nearly all non-coding classes have scores in this range, while nearly all coding classes have scores>.40
P-val: P-val indicating the statistical significance of the Fitness Score.

The EIGEN result table contains the following columns:

SNP: SNP name
ID: Query SNP presented in genomic position format
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
Score: Aggregate functional score for variants of interest (Eigen Score). With genome-wide median score of 0, higher score indicates more likelihood of the variant to be functional.
PC Score: An alternative score which is more sensitive than Eigen score, particularly useful for the noncoding variants. With genome-wide median score of 0, higher score indicates more likelihood of the variant to be functional.

The FATHMM result table contains the following columns:

SNP: SNP name
ID: Query SNP presented in genomic position format
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
Non-coding Score: Given as p-values in the range [0, 1]. Scores above 0.5 are predicted to be deleterious, while those below 0.5 are predicted to be neutral or benign. Scores close to the extremes (0 or 1) are the highest-confidence predictions that yield the highest accuracy.
Non-coding Group: Annotation features used for the prediction score. Maximum 4 features are used labelled between A and D. See publication for more details.
Coding Score: Same as non-coding score.
Coding Group: Annotation features used for the prediction score. Maximum 10 features are used labelled between A and J. See publication for more details.

The GWAVA result table contains the following columns:

SNP: SNP name
ID: Query SNP presented in genomic position format
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Known SNP: Known SNP description as reported in the tool's genome-wide score
Region Score, TSS Score, Unmatched Score: prediction scores from 3 different versions of the classifier, which are all in the range [0-1] with higher scores indicating variants predicted as more likely to be functional. See publication for more details.

The DeepSEA result table contains the following columns:

SNP: SNP name
ID: Query SNP presented in genomic position format
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score
eQTL Probability: The probability of the variant being an eQTL variant given by functional variant prioritization classifier.
GWAS Probability: The probability of the variant being a trait-associated (GWAS) variant given by functional variant prioritization classifier.
HGMD Probability: The probability of the variant being an inherited disease-associated (HGMD) variant given by functional variant prioritization classifier.
Functional Significance Score: A measure in the range [0-1] depicting the significance of magnitude of predicted chromatin effect and evolutionary conservation. Lower score indicates higher likelihood of functional significance of the variant.

The funSeq2 result table contains the following columns:

SNP: SNP name
ID: Query SNP presented in genomic position format
Chromosome: Chromosome name
Position: Variant start position in the chromosome
Variant: <reference allele,"/",observed allele> as reported in the tool's genome-wide score. "." as observed allele indicates any other nucleotide other than reference allele.
Non-coding Score: Given as p-values in the range [0, 1], with higher scores indicating variants predicted as more likely to be functional.

The ReMM result table contains the following columns:

SNP: SNP name
ID: Query SNP presented in genomic position format
Chromosome: Chromosome name
Position: Variant start position in the chromosome
ReMM Score: Potential of the chromosome position in the non-coding region to cause a Mendelian disease if mutated. Given as p-values in the range [0, 1], with higher scores indicating variants predicted as more likely to be deleterious.


GET request

Query parameters

Fields

Species

The combination of “size” and “from” parameters can be used to page through the results of a large query:
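
Paging with these two parameters might look like the sketch below, which only builds the request URLs. The endpoint `http://mygene.info/v3/query` is an assumption, since this text does not name the service, and the helper name is hypothetical; the parameters "q", "size" and "from" are the ones discussed in this section.

```python
from urllib.parse import urlencode

QUERY_URL = "http://mygene.info/v3/query"  # assumed endpoint, not named in this text


def page_url(q, page, size=10):
    """Return the URL for one page of hits (page numbering starts at 0)."""
    params = {"q": q, "size": size, "from": page * size}
    return f"{QUERY_URL}?{urlencode(params)}"


# First three pages of 50 hits each for a simple query.
for page in range(3):
    print(page_url("cdk2", page, size=50))
```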

Fetch_all

Scroll_id

Facets

Facet_size

Species_facet_filter

Entrezonly

Ensemblonly

Callback

Dotfield

Filter

Limit

Email

Query syntax

Examples of query parameter “q”:

Simple queries

Fielded queries

Available fields

This table lists some commonly used fields that can be used for “fielded queries”. Check here for the complete list of available fields.

Genome interval query

When we detect that your query (the “q” parameter) contains a genome interval pattern like this one:

we will do the genome interval query for you. Besides the interval string above, you also need to specify the “species” parameter (the default is human). These are all accepted queries:

As you can see above, the genomic locations can include commas.
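To make the interval format concrete, here is a small client-side sketch that recognises a string of the form `chr<name>:<start>-<end>` and strips the commas from the positions. The exact pattern is an assumption based on the description above, and `parse_interval` is a hypothetical helper; this is not the service's own detection logic.

```python
import re

# Assumed pattern: "chr<name>:<start>-<end>"; positions may contain commas.
INTERVAL_RE = re.compile(r"^chr(\w+):\s*(\d[\d,]*)\s*-\s*(\d[\d,]*)$", re.IGNORECASE)

def parse_interval(q):
    """Return (chromosome, start, end) if q looks like a genome interval, else None."""
    match = INTERVAL_RE.match(q.strip())
    if match is None:
        return None
    chrom, start, end = match.groups()
    # Commas are only thousands separators, so drop them before converting.
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))

print(parse_interval("chr1:151,073,054-151,383,976"))
print(parse_interval("chrX:151073054-151383976"))
print(parse_interval("cdk2"))
```

A check like this can be useful before sending a query, for example to confirm that start is not greater than end.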

Wildcard queries

Wildcard character “*” or “?” is supported in either simple queries or fielded queries:

A wildcard character cannot be the first character of a query term; a leading wildcard is ignored.

Boolean operators and grouping

You can use AND/OR/NOT boolean operators and grouping to form complicated queries:
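Queries that use grouping contain spaces, colons and parentheses, so they need to be URL-encoded before being placed in the “q” parameter. The sketch below shows one way to do that with the Python standard library. The field name “symbol” and the helper `encode_query` are hypothetical and appear only to illustrate the grouping.

```python
from urllib.parse import urlencode, parse_qs

def encode_query(q):
    """Return the URL query string for a single "q" parameter."""
    return urlencode({"q": q})

# Hypothetical fielded terms combined with AND/OR/NOT and parentheses.
boolean_q = "(symbol:cdk2 OR symbol:cdk3) AND NOT taxid:9606"
encoded = encode_query(boolean_q)
print(encoded)

# Decoding the string recovers the original query unchanged.
print(parse_qs(encoded)["q"][0])
```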

Returned object

Faceted queries

If you need to perform a faceted query, you can pass an optional “facets” parameter. For example, if you want to get the facets on species, you can pass “facets=taxid”:

Another useful field to get facets on is “type_of_gene”:

If you need to, you can also pass multiple fields as comma-separated list:

Particularly relevant to species facets (i.e., “facets=taxid”), you can pass a “species_facet_filter” parameter to filter the returned hits on a given species, without changing the scope of the facets (i.e. facet counts will not change). This is useful when you need to get the subset of the hits for a given species after the initial faceted query on species.

You can see that different “hits” are returned in the following queries, while “facets” stays the same:
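The parameter combinations described in this section can be sketched as follows. The helper `faceted_params` is hypothetical; the parameter names “q”, “facets” and “species_facet_filter” and the field names “taxid” and “type_of_gene” are taken from the text above. The value passed as the species filter (“human”) is an assumed example.

```python
from urllib.parse import urlencode

def faceted_params(q, facets, species_facet_filter=None):
    """Build query parameters for a faceted query.

    Multiple facet fields are joined into a comma-separated list.
    """
    params = {"q": q, "facets": ",".join(facets)}
    if species_facet_filter is not None:
        # Filters the returned hits only; facet counts are expected to stay the same.
        params["species_facet_filter"] = species_facet_filter
    return params

# safe="," keeps the comma separator readable instead of encoding it as %2C.
plain = urlencode(faceted_params("cdk2", ["taxid", "type_of_gene"]), safe=",")
filtered = urlencode(faceted_params("cdk2", ["taxid"], "human"), safe=",")
print(plain)
print(filtered)
```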

Scrolling queries

If you want to return ALL results of a very large query (>10,000 results), sometimes the paging method described above can take too long. In these cases, you can use a scrolling query. This is a two-step process that turns off database sorting to allow very fast retrieval of all query results. To begin a scrolling query, you first call the query endpoint as you normally would, but with an extra parameter fetch_all = TRUE. For example, a GET request to:

Returns the following object:

At this point, the first 1000 hits have been returned (of 14,000 total), and a scroll has been set up for your query. To get the next batch of 1000 unordered results, simply execute a GET request to the following address, supplying the _scroll_id from the first step into the scroll_id parameter in the second step:

Your scroll will remain active for 1 minute from the last time you requested results from it. If your scroll expires before you get the last batch of results, you must re-request the scroll_id by setting fetch_all = TRUE as in step 1.
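The two-step process can be sketched as a loop. To keep the example runnable offline, the HTTP call is abstracted behind a `fetch` callable that takes a dict of query parameters and returns the decoded response as a dict; `fetch_all_hits` and `make_fake_fetch` are hypothetical helpers. Treating an empty “hits” list as the end of the scroll is an assumption, since the text does not say how the last batch is signalled.

```python
def fetch_all_hits(fetch, query):
    """Collect every hit of a scrolling query.

    `fetch` is any callable mapping a dict of query parameters to the
    decoded JSON response (for example, a thin wrapper around an HTTP GET).
    """
    # Step 1: normal query plus fetch_all, which returns the first batch and a scroll id.
    first = fetch({"q": query, "fetch_all": "TRUE"})
    hits = list(first.get("hits", []))
    scroll_id = first.get("_scroll_id")

    # Step 2: keep requesting batches with the scroll id until none are left.
    while scroll_id:
        batch = fetch({"scroll_id": scroll_id})
        new_hits = batch.get("hits", [])
        if not new_hits:  # assumed end-of-results signal
            break
        hits.extend(new_hits)
        scroll_id = batch.get("_scroll_id", scroll_id)
    return hits

def make_fake_fetch(total=25, batch=10):
    """Stand-in for the real service: serves `total` integers in batches."""
    state = {"pos": 0}
    def fetch(params):
        start = state["pos"]
        state["pos"] = min(total, start + batch)
        return {"_scroll_id": "demo", "hits": list(range(start, state["pos"]))}
    return fetch

all_hits = fetch_all_hits(make_fake_fetch(), "cdk*")
print(len(all_hits))
```

Because the scroll expires after a minute of inactivity, a real client would request each batch promptly and restart from step 1 if the scroll is lost.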


Gene-based SNP lookup

It is possible to dump all SNPs in a gene with the command

plink --lookup-gene DISC1

which does two things: it writes some gene-centric information to the LOG file, and it lists all the SNPs that feature on common WGAS platforms to an output file. By default, SNPs within 20kb upstream and downstream of the gene are recorded. This window can be changed with an additional command-line option.

In the information written to the LOG file, there is a strong bias towards neuropsychiatrically relevant information, reflecting the research interests of the creator. For example, the output for DISC1 is:

(Note: a few fields are currently redundant or uninformative and will be removed in future releases.)

It is possible to supply a list of genes to look up, with the command

