Chemical structure prediction

I'd like to do chemical structure prediction from known molecular formulas. I'm familiar with de novo protein structure prediction, but are there any programs that will go from a formula to a structure with optimal geometry for arbitrary small chemicals (say, 30 atoms)? I found one tool, but it only selects mol files from a limited database. I need a command-line UNIX program that computes mol files from formulas.

Concise structural formulas (CSF), popular in the past, cannot describe every type of chemical structure; morphine is one example. I found no software that supports CSF (SMILES and InChI are popular, but they are encoded and not easily human-readable). Once the bonds of the structure are known, geometric minimization (e.g., DG-AMMOS) relaxes the structure into a stable, native-like conformation.
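
To illustrate why a formula alone does not pin down a structure, here is a minimal Python sketch (my own illustration, not taken from any of the tools mentioned in this thread). It parses a flat molecular formula and computes the degrees of unsaturation, i.e. the combined number of rings and pi bonds that any candidate structure must contain. The function names `parse_formula` and `degrees_of_unsaturation` are hypothetical, and the parser assumes a simple formula without parentheses, charges or isotopes.

```python
import re
from collections import Counter

# Halogens count like hydrogen in the degrees-of-unsaturation formula.
HALOGENS = ("F", "Cl", "Br", "I")


def parse_formula(formula):
    """Parse a flat formula such as 'C17H19NO3' into element counts.

    Assumption: no parentheses, charges or isotope labels.
    """
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def degrees_of_unsaturation(counts):
    """Rings plus pi bonds: (2C + 2 + N - H - X) / 2.

    Oxygen and sulfur (divalent) do not change the result.
    """
    halogens = sum(counts[x] for x in HALOGENS)
    return (2 * counts["C"] + 2 + counts["N"] - counts["H"] - halogens) / 2


if __name__ == "__main__":
    morphine = parse_formula("C17H19NO3")
    print(dict(morphine), degrees_of_unsaturation(morphine))
```

For morphine (C17H19NO3) this gives 9, but many different connectivities share that formula and that value, so a formula-to-structure program still has to enumerate or search over bond graphs before any geometry optimization can start.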

Have you tried Bioclipse?

Bioclipse is mainly based on the Chemistry Development Kit (CDK), and contains a framework for managing and analyzing chemical compounds. Bioclipse supports editing in 2D, processing large collections of molecules in tables, calculation of various types of properties, and much more cheminformatics functionality. The Jmol application is integrated in Bioclipse as an editor, and provides advanced interactive 3D visualizations.

For reference, there is also USPEX, which works for a wide variety of molecular and continuous structures (mostly inorganic, though).

USPEX can be used to predict stable crystal structures at given P-T conditions, knowing only the chemical composition (or to predict both the stable compositions and structures, given the element types). USPEX can also be used for finding low-energy metastable phases, as well as stable structures of nanoparticles, surface reconstructions, molecular packings in organic crystals, and for searching for materials with desired physical (mechanical, electronic) properties.

Predicting drug side-effects by chemical systems biology

New approaches to predicting ligand similarity and protein interactions can explain unexpected observations of drug inefficacy or side-effects.

Drug-related adverse events affect approximately 2 million patients in the United States each year, resulting in about 100,000 deaths [1]. For example, highly publicized cases of severe adverse reactions recently resulted in a US Food and Drug Administration advisory panel suggesting that the popular pain relievers Percocet and Vicodin be banned [2]. Some adverse events are predictable consequences of the known mechanism of a drug, but others are not predicted and seem to result from 'off-target' pathways.

When developing novel chemical entities (NCEs) for a therapeutic application, knowledge of binding partners and affected biological pathways is useful for predicting both efficacy and side-effects. Traditional drug design has relied heavily on the one drug-one target paradigm [3], but this may overlook system-wide effects that cause the drug to be unsuccessful. Adverse side-effects and lack of efficacy are the two most important reasons a drug will fail clinical trials, each accounting for around 30% of failures [3]. The development of tools that can predict adverse events and system-wide effects might thus reduce the attrition rate. Such tools will most certainly include emerging information about protein-protein interactions, signaling pathways, and pathways of drug action and metabolism. A systems view of the body's responses to a drug threatens the simplicity of the one drug-one target paradigm, but could provide a framework for considering all effects, and not just those that are targeted.

The laboratory assays currently used to evaluate potential adverse drug effects can be costly and time-consuming. For example, an expensive two-year rodent bioassay is the current gold standard for determining the carcinogenicity of a NCE [4]. Some assays are also of doubtful utility - only around 15% of gene knockouts in the standard pharmaceutical model organisms show any fitness defect [3]. Therefore, drugs designed with a single target in mind may prove ineffective, not because they do not interact with the target in the expected way, but because of natural redundancies in pharmacological networks. To compound the problem, protein-ligand studies have found that a single drug can bind targets with vastly different pharmacology and that about 35% of known drugs have two or more targets [5]. It is not surprising that evolutionary relationships might lead to shared drug-binding capabilities in protein paralogs found across a wide range of cell types and biological pathways. These complexities, however, create new opportunities for therapeutic strategies involving the concerted use of drugs with multiple targets to achieve an increased specificity in effect. A recent review by Giordano and Petrelli, for example, describes their approach to developing multi-target drugs for cancer therapy while avoiding drug resistance by targeting multiple tyrosine kinase receptors [6].

Chemical systems biology, or the application of system-wide tools to the analysis of pharmacological responses, can help address the lack of efficacy and undesired off-target effects [3]. Understanding each of these requires the ability to characterize off-target side-effects in silico. In a recent study, Philip Bourne and colleagues (Xie et al. [7]) used a chemical systems biology approach to explain the serious side-effects of a drug that was being trialed for prevention of cardiovascular disease.

Structure-based Drug Metabolism Predictions for Drug Design

Significant progress has been made in structure-based drug design by pharmaceutical companies at different stages of drug discovery, such as identifying new hits, enhancing molecule binding affinity in hit-to-lead, and reducing toxicities in lead optimization. Drug metabolism is a major consideration for modifying drug clearance and also a primary source of drug metabolite-induced toxicity. With major cytochrome P450 structures identified and characterized recently, structure-based drug metabolism prediction has become increasingly attractive. In silico methods based on molecular and quantum mechanics, such as docking, molecular dynamics and ab initio chemical reactivity calculations, bring us closer to understanding drug metabolism and predicting drug–drug interactions. In this study, we review important progress in drug metabolism and common in silico techniques adopted to predict drug regioselectivity, stereoselectivity, reactive metabolites, induction, inhibition and mechanism-based inactivation, as well as their implementation in hit-to-lead drug discovery.

Figure 1. Computational screen of 3D similarity metrics. (A) Example of a structural alignment between a query ACE inhibitor and a hit ACE compound with a distinct chemical scaffold generated by the structural superposition algorithm. 3D chemical similarity metrics were used to measure the molecular shape and overlapping chemical features. (B) Unbiased screen of 28 3D chemical similarity metrics from Shape-it, Align-it, and ROCS programs. Representative ACE inhibitors were used as a query to test the ability of each 3D chemical similarity metric to enrich for class-specific scaffolds to the top rank from a combined set of 206 benchmark compounds consisting of six drug classes. The heatmap shows that the query (green) was retrieved as the top hit for all metrics. Additionally, each metric demonstrated a different ability to enrich for ACE-specific scaffolds (blue) from other drug classes. (C) The percentage of retrieved class-specific scaffolds was plotted against the ranking by each respective similarity score. TPR denotes true positive rate. To determine the performance of each metric, the area under the curve (AUC) was used to compute an enrichment factor (EF). For a list of 3D similarity metrics used, see Figure S2.
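
As a rough illustration of the curve described in panel (C), the following Python sketch builds the true positive rate against the fraction of the ranked list screened and integrates it with the trapezoid rule. It is a minimal sketch under stated assumptions: the function names `enrichment_curve` and `curve_auc` and the toy ranking are hypothetical, and the exact enrichment factor definition used in the figure is not given in the caption, so only the AUC is computed here.

```python
def enrichment_curve(ranked_labels):
    """Return (fraction_screened, tpr) lists for a ranked hit list.

    ranked_labels: booleans ordered from best to worst similarity score,
    True meaning the compound belongs to the query's drug class.
    """
    total_positives = sum(ranked_labels)
    if total_positives == 0:
        raise ValueError("ranking contains no class-specific compounds")
    n = len(ranked_labels)
    xs, ys = [0.0], [0.0]
    found = 0
    for rank, is_positive in enumerate(ranked_labels, start=1):
        found += bool(is_positive)
        xs.append(rank / n)                 # fraction of the list screened
        ys.append(found / total_positives)  # true positive rate so far
    return xs, ys


def curve_auc(xs, ys):
    """Area under the curve by the trapezoid rule."""
    return sum(
        (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2
        for i in range(1, len(xs))
    )


if __name__ == "__main__":
    # Toy ranking: both class members retrieved at the top of four compounds.
    xs, ys = enrichment_curve([True, True, False, False])
    print(curve_auc(xs, ys))
```

A metric that ranks class-specific scaffolds early yields a curve that rises quickly and therefore a larger area; a metric that ranks them last yields a smaller one.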

Yi He

Prediction of function-related structural ensembles of flexible proteins, such as Intrinsically Disordered Proteins (IDPs), using machine learning algorithms, coarse-grained modeling, all-atom modeling and experimental data. This project provides detailed information that is beneficial for drug and protein-based biosensor design.

Simulating Dynamics and Interactions of Large Biomolecular Systems

Using customized physics-based coarse-grained models of proteins and nucleic acids to simulate the dynamics and interactions of large biomolecular complexes on timescales comparable to those of experiments.

Protein Structure Prediction

Developing algorithms and pipelines using a combination of knowledge-based and physics-based protein structure prediction approaches.

Developments of Physics-based Multiscale Computational Models

Constructing coarse-grained computational models based on the free energy landscape derived from quantum or all-atom calculations to simulate different systems of interest.

We are interested in understanding both the micro- and macro-structures of microbial genomes through computational studies and experimental validation, and in understanding why microbial genomes are organized the way they are. We are also interested in applying the knowledge and information gained through such studies to the prediction of pathways and networks in microbes.

Cancer Computational and Systems Biology:

We are interested in developing computational and analysis techniques in support of (a) identification of biomarkers for a number of human cancers, detectable through imaging or analysis of serum/urine samples, (b) understanding the relationships between (computationally identifiable) genomic features and cancer formation and development, and (c) cancer epigenomic studies. Our work involves microarray gene expression data analyses, comparative genome analyses and analyses of other experimental data.

Computational Methods for Protein Structure Prediction and Modeling:

We are interested in developing effective computational methods for protein fold recognition, protein structure prediction and modeling, and protein complex prediction, and in applying these tools to solve real structural biology problems. We are also interested in developing hybrid methods for protein structure solution using information derived from computational tools and partial experimental data, including NMR and X-ray crystallographic data.

Current Opinion in Chemical Biology

Current Opinion in Chemical Biology is a systematic review journal that aims to provide specialists with a unique and educational platform to keep up to date with the expanding volume of information published in the field of Chemical Biology. The journal publishes 6 issues per year covering the following 10 sections, each of which is reviewed once a year: Omics, Biocatalysis and Biotransformation, Bioinorganic Chemistry, Next Generation Therapeutics, Molecular Imaging, Chemical Genetics and Epigenetics, Synthetic Biology, Synthetic Biomolecules, and Mechanistic Biology. There is also a section that changes regularly to reflect hot topics in the field.

Current Opinion in Chemical Biology builds on Elsevier's reputation for excellence in scientific publishing. It is a companion to the new Gold Open Access journal Current Research in Chemical Biology and is part of the Current Opinion and Research (CO+RE) suite of journals. All CO+RE journals leverage the Current Opinion legacy of editorial excellence, high impact, and global reach to ensure they are a widely read resource that is integral to scientists' workflow.

Expertise - Editors and Editorial Board bring depth and breadth of expertise and experience to the journal.

Discoverability - Articles get high visibility and maximum exposure on an industry-leading platform that reaches a vast global audience.

Phylogeny Tools

    PHYLIP is a free package of programs for inferring phylogenies. It is distributed as source code, documentation files, and a number of different types of executables.
    TreeView is a simple program for displaying phylogenies on Apple Macintosh and Windows PCs. It can be used to view PHYLIP generated phylogeny trees.

Greg Nelson, Chemical and Life Sciences Librarian, Brigham Young University

1. Introduction

Regarded as a departure from the “reductionist approach”, where investigators dedicate their efforts to the study of a single gene/protein, systems biology (SB) is considered a “comprehensive approach”. In SB, large networks describing the regulation of entire genomes, metabolic/transporter or signal transduction pathways are analyzed in their totality at different levels of biological organization (Voit et al., 2006). SB blends theory, computational modeling, and high-throughput experimentation (Kell, 2006), and has already led to advances in cell signaling (Blinov et al., 2006), developmental biology (Ochi and Westerfield, 2007), cell physiology (Brandman et al., 2005), and to the understanding of metabolic networks (Covert et al., 2004). Recently, we coined the term systems chemical biology, which integrates bioinformatic and cheminformatic databases and cheminformatic tools with biological network simulations (Oprea et al., 2007). We argued that chemistry awareness is required in order to achieve a systematic understanding of the way small molecules affect biological systems. This concept had a positive impact in the chemistry community, as reflected by the fourteen papers presented at the SCB symposium organized at the American Chemical Society national meeting in Philadelphia one year later.

Other attempts to apply SB technologies include in silico polypharmacology (Mestres et al., 2006; Paolini et al., 2006), and such approaches are deployed in industrial drug discovery (Morphy and Rankovic, 2007; Loging et al., 2007). Furthermore, the chemical biology agenda, as embodied by the NIH Roadmap Molecular Libraries Initiative (MLI) (Austin et al., 2004), enables SCB by extending the study of chemical effects on biological targets towards the entire array of macromolecules and macromolecular networks. These can be further mapped using additional genomic and proteomic tools, in order to gain comprehensive insight into, e.g., phenotypic screening. Via the MLI and its successor, the Molecular Libraries Program (MLP), the effects of hundreds of thousands of small molecules are being investigated on biological systems of varied complexity, from individually screened targets to multiplex screens, phenotypic screens, and other cellular and whole organism assays. Indeed, this unprecedented public effort creates new challenges for advancing chemocentric approaches to systems biology, as increasing amounts of disparate data are being deposited in publicly available databases (see Table 1). As of November 13, 2009, PubChem (PubChem, 2009) features 328,392 MLP-related compounds, of which 296,070 are Ro5-compliant and 152,778 are “active”, all tested on 869 MLP-related (including 515 “confirmatory”) assays from the high-throughput screening centers network.
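
The Ro5 (Lipinski rule of five) filter mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure PubChem uses: the `Descriptors` class, the function names and the descriptor values in the example are hypothetical, and the tolerance of one violation is a common convention that we assume here rather than a criterion stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d):
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(checks)


def is_ro5_compliant(d, max_violations=1):
    """Assumption: a compound passes with at most `max_violations` violations."""
    return ro5_violations(d) <= max_violations


if __name__ == "__main__":
    # Illustrative descriptor values, not measurements of real compounds.
    small = Descriptors(mol_weight=350.0, logp=2.1, h_bond_donors=2, h_bond_acceptors=5)
    large = Descriptors(mol_weight=800.0, logp=6.5, h_bond_donors=7, h_bond_acceptors=12)
    print(is_ro5_compliant(small), is_ro5_compliant(large))
```

Setting `max_violations=0` gives the strict reading of the rule; which reading underlies the compound counts quoted above is not specified in the text.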

Table 1

Public Resources for SCB ( * ):

Entrez Gene:
Structures of biological macromolecules
Structural Genomics Consortium:
Ion Channels:
Biochemical pathway reaction kinetics:
Annotated Biological Models:
Other MLI Initiatives:
NIH Roadmap:

This plethora of small molecule data, in addition to those present in other annotated chemical libraries (e.g., WOMBAT) (see Table 2), has yet to reach the fields of computational biology and systems biology. As cross-system data related to genes, proteins and their modulation via diverse libraries of small molecules become available, a critical need remains unmet: chemistry cognizance is required in order to advance the development of a systems biology, which we believe is vital to the understanding of human health. It is indeed surprising that, with the possible exception of in silico pharmacology (Mestres et al., 2006), none of the computational biology approaches available to date offers any resolution from a cheminformatics perspective. Cheminformatics, an independent research discipline concerned with the application of information retrieval methods to chemical databases that emerged just over a decade ago (Brown, 2005), has become an integral part of the drug discovery decision-making system (Olsson and Oprea, 2001), and is today the main resource for computer-based studies of chemistry-modulated biological systems (Willett, 2008). In parallel to the evolution of molecular pharmacology into polypharmacology, cheminformatics is increasingly applied to profile small molecule bioactivities in silico for arrays of targets (Mestres et al., 2006; Paolini et al., 2006; Fliri et al., 2005), although it has yet to be fully utilized in chemical biology, an emerging discipline that aims at modulating all proteins via small molecules (Schreiber, 2005). Indeed, without chemistry cognizance, one cannot port cheminformatics predictive tools (Olsson and Oprea, 2001), e.g., virtual screening (Varnek and Tropsha, 2008), to systems biology.

Table 2

Sources of Bioactivity Data for SCB ( * ):

Small Molecules:
Drugs and Clinical Candidates:
NLM’s Dailymed:
WHO Essential Drugs:
Toxicology Data:

The increasing availability of data related to genes, proteins and their modulation by small molecules creates a critical need to develop systems chemical biology. There is an unmet requirement to develop a cheminformatics interface, which we believe is vital to the future of systems biology and which will enable the prediction of the effects of chemical structures in the context of biological systems. Fig. 1 illustrates the complexity of this problem and our vision for the contribution of in silico modeling of chemical structures towards modulation of biological pathways.

Contribution of Cheminformatics to Systems Biology. It is expected that computational modeling will afford the prediction of chemical structures active against individual (or multiple) targets, while PBPK approaches will afford estimates of compound distribution and accumulation in target tissues. Knowledge of pathways, in turn, will make it possible to predict the effect of chemicals on the entire system in the context of steering the disease-affected network towards a normal state.

Computational systems chemical biology aims to create a computational infrastructure and a platform to predict systemic effects (ultimately including clinical outcome) of an organic compound entering the body via any of the standard routes of administration (oral, i.v., i.m., etc.). To achieve this goal, one should seek to build rigorous PD/PK models to predict such observables as tissue partitioning, half-life, distribution and clearance, ligand-target interaction and drug efficacy, while taking into account the relevant metabolites of a chemical. In addition, one should seek to predict the specificity of compound interaction with biological targets and simulate the outcome of drug-target interaction at the molecular, cellular and organ level. The latter objective entails the development of network simulators that explicitly take into account the chemical nature of the small molecules (or their combinations) perturbing the network. This endeavor requires the integration of several complementary efforts in various fields contributing to the functional SCB workflow, which incorporates the following tasks:

1) Develop PK/PD models to predict the potential of exogenous small molecules to reach cellular components hosting specific pathways, estimate their concentrations in vivo, and their relationship to specific, understood clinical outcomes.
2) Integrate available data on chemical-target interactions and develop target-specific predictive models of chemical bioactivity using advanced cheminformatics approaches such as Quantitative Structure-Activity Relationship (QSAR) modeling. These models will make it possible to predict plausible targets for exogenous compounds from their chemical structure, as well as to identify compounds in virtual chemical libraries that are predicted to interact with target proteins and pathways.
3) Investigate, using kinetic network simulation technologies, how small molecules perturb a particular pathway, or perhaps several networked pathways, and predict how these perturbations result in (novel) clinical outcomes.

Whereas the comprehensive exploration of SCB requires the consideration of all three of the above major components of the field, we will limit our discussion here to the latter two areas. Several recent reviews provide detailed information concerning PK/PD modeling (e.g., Danhof et al., 2008; Schmidt et al., 2008); in this review, however, we shall consider and illustrate the elements of in silico (multi)target screening and systems biology simulations contributing to the field of SCB.
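
To make the third task concrete, the following Python sketch simulates the smallest possible "network": a single enzymatic step following Michaelis-Menten kinetics, perturbed by a competitive inhibitor. It is an illustrative toy under stated assumptions, not a method from the reviews cited above: the function name `simulate_step`, all parameter values and the use of fixed-step Euler integration are our own choices, and a realistic study would use a dedicated ODE solver and a curated pathway model.

```python
def simulate_step(s0, vmax, km, inhibitor=0.0, ki=1.0, t_end=10.0, dt=0.001):
    """Integrate S -> P with Michaelis-Menten kinetics by explicit Euler steps.

    A competitive inhibitor at concentration `inhibitor` (inhibition
    constant `ki`) raises the apparent Km by the factor (1 + inhibitor / ki).
    Returns the final (substrate, product) concentrations.
    """
    substrate, product = s0, 0.0
    km_apparent = km * (1.0 + inhibitor / ki)
    for _ in range(int(round(t_end / dt))):
        rate = vmax * substrate / (km_apparent + substrate)
        converted = min(rate * dt, substrate)  # never consume more than is left
        substrate -= converted
        product += converted
    return substrate, product


if __name__ == "__main__":
    # Arbitrary illustrative parameters in consistent concentration/time units.
    _, p_free = simulate_step(s0=10.0, vmax=1.0, km=2.0)
    _, p_inhibited = simulate_step(s0=10.0, vmax=1.0, km=2.0, inhibitor=5.0, ki=1.0)
    print(f"product without inhibitor: {p_free:.3f}")
    print(f"product with inhibitor:    {p_inhibited:.3f}")
```

The chemistry enters only through `ki`, which is exactly the kind of quantity that the target-specific models of the second task would need to supply for each compound before a network-level simulation can be run.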

Protein Structure Prediction: Recognition of Primary, Secondary, and Tertiary Structural Features from Amino Acid Sequence

This review attempts a critical stock-taking of the current state of the science aimed at predicting structural features of proteins from their amino acid sequences. At the primary structure level, methods are considered for detection of remotely related sequences and for recognizing amino acid patterns to predict posttranslational modifications and binding sites. The techniques involving secondary structural features include prediction of secondary structure, membrane-spanning regions, and secondary structural class. At the tertiary structural level, methods for threading a sequence into a mainchain fold, homology modeling and assigning sequences to protein families with similar folds are discussed. A literature analysis suggests that, to date, threading techniques are not able to show their superiority over sequence pattern recognition methods. Recent progress in the state of ab initio structure calculation is reviewed in detail.
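
As one concrete instance of the sequence-based techniques mentioned above, the following Python sketch flags candidate membrane-spanning regions with a sliding-window hydropathy average on the Kyte-Doolittle scale. It is a minimal sketch and not necessarily one of the methods evaluated in this review: the function names are hypothetical, and the window of 19 residues and threshold of 1.6 are commonly cited defaults that we assume here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for non-standard residue letters.
    """
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def has_candidate_tm_segment(sequence, window=19, threshold=1.6):
    """True if any window average exceeds the (assumed) threshold."""
    return any(value > threshold for value in hydropathy_profile(sequence, window))


if __name__ == "__main__":
    # Synthetic toy sequences, not real proteins.
    print(has_candidate_tm_segment("L" * 25))
    print(has_candidate_tm_segment("DKE" * 8))
```

Such single-sequence scales are simple to compute but coarse; they illustrate the kind of pattern recognition on which the secondary-structure-level methods discussed here build.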

The analysis shows that many structural features can be predicted from the amino acid sequence much better than just a few years ago and with attendant utility in experimental research. Best prediction can be achieved for new protein sequences that can be assigned to well-studied protein families. For single sequences without homologues, the folding problem has not yet been solved.