Cheminformatics & Pharmainformatics


In this presentation: • Part 1 – Molecular Conventions • Part 2 – Resources • Part 3 – Drug Design • Part 4 – Drug Development

Part 1 – Molecular Conventions

Cheminformatics • Cheminformatics, the combination of chemistry and information technology, is required for the processing and analysis of chemical data • Cheminformatics is relevant to biologists because chemistry data are important in many areas of molecular biology, e.g., in the study of protein interactions and metabolism

Molecular formulae • Molecules can be represented by simple formulae, which give the number and type of atoms • However, a molecular formula does not show how the atoms are connected • Structural formulae provide some information about the arrangement of atoms in a molecule and thus allow isomers to be distinguished
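The limitation of molecular formulae can be illustrated with a small parser that only counts atoms. This is a minimal sketch that assumes simple formulae (element symbols with optional counts, no brackets or charges) and uses rounded standard atomic weights; the function names are illustrative. Ethanol and dimethyl ether, written as condensed structural formulae, reduce to the same atom counts.

```python
import re

# Standard atomic weights in g/mol (rounded); only the elements needed here.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C2H6O' or 'CH3CH2OH'.

    Handles element symbols followed by optional counts; no brackets or charges.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum of atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in parse_formula(formula).items())

# Ethanol and dimethyl ether are connected differently but have the same atom counts.
ethanol = parse_formula("CH3CH2OH")
dimethyl_ether = parse_formula("CH3OCH3")
print(ethanol == dimethyl_ether)              # True: both are C2H6O
print(round(molecular_weight("C2H6O"), 2))    # 46.07
```

Counting atoms discards the connectivity, which is exactly the information a structural formula adds back.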

Structural representation of ethane showing the tetrahedral distribution of coordinated groups about saturated carbon atoms. Panels (a) and (b) show two extreme conformations. The energetically favourable conformation (a), which predominates in nature, has the H atoms on opposite sides of the C-C bond, as far as possible from each other (the staggered conformation). The less favourable conformation (b) has the atoms in the eclipsed conformation. Panels (c) and (d) show the conformations viewed from the end of the molecule.

Structural formulae and full and simplified structural diagrams for some common organic compounds


Structural diagrams • Molecules can be represented using simple graphs, which show atoms as nodes and bonds as links • For organic molecules, further simplification is achieved by assuming that carbon atoms make up the molecular backbone and that the valency of four is satisfied by hydrogen atoms unless otherwise shown • Such diagrams present all molecules as planar shapes and do not indicate the spatial distribution of atoms in 3D
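The hydrogen-suppressed graph described above can be sketched as a list of heavy atoms plus a list of bonds, with hydrogens filled in from the valency. This sketch assumes single bonds only and default valences of 4 (C), 3 (N) and 2 (O); the helper name is hypothetical. Propan-1-ol and propan-2-ol have the same formula (C3H8O) but different graphs.

```python
from collections import Counter

# Assumed default valences for the heavy atoms used in this sketch.
VALENCE = {"C": 4, "N": 3, "O": 2}

def implicit_hydrogens(atoms, bonds):
    """Return how many H atoms each heavy atom needs to satisfy its valency."""
    degree = Counter()
    for a, b in bonds:            # every bond here is single
        degree[a] += 1
        degree[b] += 1
    return [VALENCE[el] - degree[i] for i, el in enumerate(atoms)]

# Nodes are heavy atoms (by index); edges are bonds.
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])   # C-C-C-O
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])   # O on the middle C

for name, (atoms, bonds) in [("propan-1-ol", propan_1_ol), ("propan-2-ol", propan_2_ol)]:
    hs = implicit_hydrogens(atoms, bonds)
    print(name, hs, "total H =", sum(hs))    # both totals are 8
```

Both graphs give eight hydrogens, yet the hydrogen distribution differs ([3, 2, 2, 1] versus [3, 1, 3, 1]), which is how the diagram distinguishes the isomers.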

Chirality • If four different groups are coordinated around a central carbon atom, that carbon is a chiral centre, and a molecule with one such centre is chiral • Chiral molecules exist in two forms, called enantiomers, which are non-superimposable mirror images of each other • Although enantiomers have the same chemical properties (except when interacting with other chiral molecules), many enzymes and other proteins show chiral sensitivity, which is important in drug development and related fields

Multi-chiral configuration • Molecules may contain any number of chiral centers, so a series of stereoisomers may exist; those that are not mirror images of each other are called diastereoisomers • Unlike enantiomers, diastereoisomers may have different chemical properties because of the way different groups interact within the molecule
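With n chiral centres there are at most 2^n stereoisomers. The sketch below, which assumes each centre has already been labelled R or S, enumerates the combinations and classifies a pair of configurations: inverting every centre gives the enantiomer, while inverting only some gives a diastereoisomer. It deliberately ignores meso forms (molecules with internal symmetry have fewer distinct isomers); the function names are illustrative.

```python
from itertools import product

def stereoisomers(n_centres):
    """All R/S label combinations for n chiral centres: at most 2**n stereoisomers.

    Symmetric molecules (meso forms) have fewer distinct isomers; that is not checked here.
    """
    return ["".join(labels) for labels in product("RS", repeat=n_centres)]

def relationship(config_a, config_b):
    """Classify two configurations of the same constitution, compared centre by centre."""
    if config_a == config_b:
        return "identical"
    mirror = config_a.translate(str.maketrans("RS", "SR"))   # invert every centre
    return "enantiomers" if config_b == mirror else "diastereoisomers"

print(stereoisomers(2))              # ['RR', 'RS', 'SR', 'SS']
print(relationship("RR", "SS"))      # enantiomers
print(relationship("RR", "RS"))      # diastereoisomers
```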

DL and RS conventions • The absolute configuration of groups around a chiral carbon atom can be described using a number of conventions • In the DL system, molecules are named D or L according to whether the coordinated groups are arranged in a similar fashion to those in D-glyceraldehyde or L-glyceraldehyde • In the RS (Cahn-Ingold-Prelog) system, the groups around the carbon atom are ranked by priority (higher atomic number first); viewed with the lowest-priority group pointing away, the molecule is named R (rectus) if the remaining groups decrease in priority clockwise and S (sinister) if they decrease anticlockwise
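Given 3D coordinates, the clockwise/anticlockwise test of the RS system can be computed from the sign of a triple product. This is a sketch under stated assumptions: the priorities have already been ranked by the Cahn-Ingold-Prelog rules (the hard part, not shown), the geometry is roughly tetrahedral, and the coordinates below are idealised values chosen for illustration.

```python
def triple_product(a, b, c):
    """a . (b x c) for 3-vectors."""
    cx = (b[1] * c[2] - b[2] * c[1],
          b[2] * c[0] - b[0] * c[2],
          b[0] * c[1] - b[1] * c[0])
    return a[0] * cx[0] + a[1] * cx[1] + a[2] * cx[2]

def rs_label(centre, p1, p2, p3):
    """Assign R or S from the 3D positions of the priority-1, -2 and -3 groups.

    Assumes CIP priorities are already ranked and the lowest-priority group
    sits on the opposite side of the centre from p1..p3 (tetrahedral geometry).
    """
    a, b, c = ([p[i] - centre[i] for i in range(3)] for p in (p1, p2, p3))
    v = triple_product(a, b, c)
    if abs(v) < 1e-9:
        raise ValueError("groups are coplanar; geometry is not tetrahedral")
    # Clockwise 1 -> 2 -> 3, seen with group 4 at the back, gives a negative value.
    return "R" if v < 0 else "S"

# Idealised tetrahedral positions (unit bond length); the lowest-priority group
# would sit at (0, 0, -1), pointing away from a viewer looking down from +z.
centre = (0.0, 0.0, 0.0)
p1 = (0.0, 0.943, 0.333)
p2 = (0.816, -0.471, 0.333)
p3 = (-0.816, -0.471, 0.333)
print(rs_label(centre, p1, p2, p3))  # R: 1 -> 2 -> 3 runs clockwise from this viewpoint
print(rs_label(centre, p1, p3, p2))  # S: swapping two groups inverts the centre
```

Swapping any two groups flips the sign of the triple product, which mirrors the fact that exchanging two substituents converts one enantiomer into the other.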

Representation of a tetrahedrally coordinated saturated carbon atom in an organic molecule: (a) the carbon atom is at the centre of a tetrahedron with four coordinated groups; (b) simplified representation with the central carbon removed; (c) representation of the tetrahedron as a flat image

Chirality representation: (a) the structural formula of glyceraldehyde, CH2OHCHOHCHO, gives no indication of its chirality; (b) if the molecule is represented as a tetrahedron, the D and L enantiomers can be distinguished; (c) these can be shown as 2D graphs using the Fischer convention

Part 2 – Resources

SMILES • SMILES (Simplified Molecular Input Line Entry System) is a system for representing chemical structures as strings, based on a valence model in which all valencies are considered to be satisfied by hydrogen atoms unless otherwise shown • The system has conventions for representing different bond types, cyclic molecules, branches, cis/trans isomers and chirality
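The valence model can be seen in a tiny SMILES reader that handles only a subset of the notation: uppercase organic atoms, the bond symbols -, = and #, branches in parentheses and single-digit ring closures. Aromatic atoms, bracket atoms, charges and stereo marks are deliberately rejected. This is a hypothetical teaching sketch, not a full SMILES implementation (a toolkit such as RDKit would be used in practice). It fills each atom's unused valence with implicit hydrogens and reports the molecular formula.

```python
import re
from collections import Counter

# Default valences for the uncharged organic-subset atoms handled here.
VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1, "Br": 1}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}
TOKEN = re.compile(r"Cl|Br|[CNOSF]|[-=#()]|\d")

def parse_smiles(smiles):
    """Return (atoms, bonds) for a small SMILES subset; bonds are (i, j, order)."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES features in {smiles!r}")
    atoms, bonds = [], []
    prev, order = None, 1              # atom the next bond starts from; its order
    branch_stack, rings = [], {}
    for tok in tokens:
        if tok in VALENCE:
            atoms.append(tok)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, order))
            prev, order = len(atoms) - 1, 1
        elif tok in BOND_ORDER:
            order = BOND_ORDER[tok]
        elif tok == "(":
            branch_stack.append(prev)  # remember the branch point
        elif tok == ")":
            prev = branch_stack.pop()  # return to the branch point
        else:                          # ring-closure digit
            if tok in rings:
                other, first_order = rings.pop(tok)
                bonds.append((other, prev, max(order, first_order)))
            else:
                rings[tok] = (prev, order)
            order = 1
    return atoms, bonds

def molecular_formula(smiles):
    """Fill unused valences with implicit hydrogens; return a Hill-order formula."""
    atoms, bonds = parse_smiles(smiles)
    used = [0] * len(atoms)
    for a, b, order in bonds:
        used[a] += order
        used[b] += order
    hydrogens = [VALENCE[el] - u for el, u in zip(atoms, used)]
    if min(hydrogens) < 0:
        raise ValueError(f"valence exceeded in {smiles!r}")
    counts = Counter(atoms)
    counts["H"] = sum(hydrogens)
    # Hill order: C, then H, then the remaining elements alphabetically.
    keys = ["C", "H"] + sorted(k for k in counts if k not in ("C", "H"))
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in keys if counts[el])

print(molecular_formula("CCO"))        # ethanol: C2H6O
print(molecular_formula("CC(=O)O"))    # acetic acid (branch + double bond): C2H4O2
print(molecular_formula("C1CCCCC1"))   # cyclohexane (ring closure): C6H12
```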

RasMol and Chime • There are several specialized data formats for chemical structures based on the principle of a molecular formula and associated table of connections • Viewing utilities such as RasMol and Chime can interpret these file formats and display interactive molecular structures in a variety of user-defined schemes and colors
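One widely used format of this kind is the MDL molfile, which stores a header, a counts line, an atom table (coordinates and element) and a connection table (pairs of 1-based atom indices with a bond order). The writer below is a minimal sketch of the V2000 fixed-width layout written from the published format description; it omits charges, stereo and property blocks, and the coordinates in the example are rough illustrative values, so check the output against the CTfile specification before relying on it.

```python
def to_molfile(name, atoms, bonds):
    """Write a minimal V2000-style molfile.

    atoms: list of (element, (x, y, z)); bonds: list of (atom1, atom2, order), 1-based.
    """
    lines = [name, "", ""]                 # name, program/timestamp line, comment line
    # Counts line: atoms, bonds, eight zeroed fields, 999 and the version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for element, (x, y, z) in atoms:       # atom table
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {element:<3}" + " 0" + "  0" * 11)
    for a, b, order in bonds:              # connection table
        lines.append(f"{a:3d}{b:3d}{order:3d}" + "  0" * 4)
    lines.append("M  END")
    return "\n".join(lines)

# Ethanol heavy atoms with rough, unoptimised coordinates in angstroms.
atoms = [("C", (0.0, 0.0, 0.0)), ("C", (1.54, 0.0, 0.0)), ("O", (2.05, 1.35, 0.0))]
bonds = [(1, 2, 1), (2, 3, 1)]
print(to_molfile("ethanol", atoms, bonds))
```

Hydrogens are left implicit here; viewers that read the format typically add them from the valence model.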

Chemical structure and databases • Structural information about different molecules can be obtained from a number of comprehensive WWW resources, including Chemical Abstracts On-Line, Chemfinder and MedChem • Each of these resources provides a chemical database that can be searched using a variety of query formats, e.g., systematic name, non-systematic name, formula, molecular weight or CAS registry number • Search results provide physical, chemical and biomedical information with links to other databases and resources • MedChem also provides the SMILES string
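When searching these databases by CAS registry number, a mistyped number can be caught before querying, because the last digit of a CAS number is a check digit: the other digits are multiplied by 1, 2, 3, … counting from the right, and the sum modulo 10 must equal the check digit. The function below is a small illustrative validator (its name is hypothetical); it checks format and checksum only, not whether the number is actually assigned.

```python
def cas_check_digit_ok(cas):
    """Validate the format and check digit of a CAS Registry Number such as '7732-18-5'."""
    parts = cas.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    first, second, check = parts
    if not (2 <= len(first) <= 7 and len(second) == 2 and len(check) == 1):
        return False
    body = first + second
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * i for i, d in enumerate(reversed(body), start=1))
    return total % 10 == int(check)

print(cas_check_digit_ok("7732-18-5"))   # water: True
print(cas_check_digit_ok("64-17-5"))     # ethanol: True
print(cas_check_digit_ok("64-17-6"))     # wrong check digit: False
```

For water, 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5, matching the final digit.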

QSAR • A QSAR (quantitative structure-activity relationship) analysis is a statistical method used to determine how the structural features of a molecule are related to its biological activity • The QSAR approach is particularly useful for categorizing the activities of related molecules with multiple functional groups • Each molecule is broken down into a series of descriptors (molecular properties) and the QSAR determines which descriptors are most likely to promote biological activity • This gives rise to a set of rules that can be used to evaluate the potential activity of new molecules
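One simple form of QSAR is a multiple linear regression of activity on descriptors. The sketch below fits such a model by ordinary least squares (normal equations solved by Gauss-Jordan elimination) and uses it to score a new molecule. The descriptors (logP and number of hydrogen-bond donors) are illustrative choices, and the activities are synthetic values generated from a made-up rule, so the fit should simply recover that rule; real QSAR studies use measured activities, many more descriptors and validation on held-out compounds.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: descriptors are collinear")
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_qsar(descriptors, activities):
    """Least-squares weights w for activity ~ w[0] + w[1]*d1 + w[2]*d2 + ..."""
    X = [[1.0] + list(row) for row in descriptors]            # intercept column first
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, activities)) for i in range(k)]
    return solve(xtx, xty)                                     # normal equations

def predict(weights, descriptor_row):
    return weights[0] + sum(w * d for w, d in zip(weights[1:], descriptor_row))

# Hypothetical descriptors per molecule: (logP, number of H-bond donors).
train = [(1.0, 0), (2.0, 1), (3.0, 0), (1.5, 2), (2.5, 3), (0.5, 1)]
# Synthetic activities generated from a made-up rule, so the fit should recover it.
activity = [1.0 + 0.8 * logp - 0.3 * hbd for logp, hbd in train]

weights = fit_qsar(train, activity)
print([round(w, 3) for w in weights])           # [1.0, 0.8, -0.3]
print(round(predict(weights, (2.0, 1)), 3))     # 2.3 for a new, untested molecule
```

The sign and size of each fitted weight is the "rule": here activity rises with logP and falls with the number of donors.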

Part 3 – Drug Design

Pharmainformatics • Pharmainformatics is the combination of biology, chemistry, mathematics and information technology that is essential for efficient data management, processing and analysis in the pharmaceutical industry

Drugs • Drugs interact with targets in the body, usually proteins, and through these interactions cause physiological responses • The pharmaceutical industry aims to discover drugs with specific beneficial effects to treat human diseases

Gene – drug – life • Knowing a gene’s chemical structure and composition is one thing; understanding its actual function is another • Although sequencing and sequence analysis will help answer questions about aging, diseases, disorders and more, a new discipline of designer drugs is around the corner, waiting to be tapped • Even a single nucleotide polymorphism (SNP, pronounced “snip”), for instance a T at a position in a gene sequence where another person has a C, can spell trouble
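Spotting a SNP like the T/C example above is, at its simplest, a position-by-position comparison of two sequences. The sketch assumes the sequences are already aligned to the same length (real SNP detection also involves alignment, read quality and population frequencies), and the sequences are made-up examples.

```python
def find_snps(seq_a, seq_b):
    """List (position, base_a, base_b) wherever two aligned DNA sequences differ.

    Positions are 1-based; the comparison is case-insensitive.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]

# Hypothetical fragments from two individuals: a T in one where the other has a C.
print(find_snps("ATGCTAGC", "ATGCCAGC"))   # [(5, 'T', 'C')]
```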

Gene – drug – life • Many drugs work in only about 30 percent of the human population • In extreme cases, a drug that saves one person may poison another. For instance, the type 2 diabetes drug Rezulin (troglitazone) has been linked to more than 60 deaths from liver toxicity worldwide • This is where in silico drug design could help, reducing not only design, modeling and testing time but also expenditure on manpower, resources and the various phases of drug design and development

Areas of drug design • Drug design must be viewed from three different dimensions, namely drug design for: • Diseases such as HIV and cancer that have long afflicted large numbers of people • Lifestyle drugs • Drugs for repairing genetic disorders • There is a pressing need to develop drugs for diseases such as hepatitis C, leprosy and malaria, since these diseases are widespread and affect people at large • Other infectious diseases such as tuberculosis and HIV are also highly troublesome

In silico drug design • Earlier, the drug design process took many decades and was carried out haphazardly, without direction, whereas today a systems approach is used, together with tremendous reductions in research and production costs • The surge in bioinformatics solutions has already redefined the way drug trials are done, shifting them from in vitro to in silico • In silico drug design can shorten drug design time, and achieving this will remain the biggest challenge for years to come

Drugs are often poorly soluble in water • Proteins in the body work in a largely aqueous environment (about two-thirds of the human body is water), so they do not behave like rigid bodies, and their behavioural pattern differs from protein to protein • Many drugs do not dissolve readily in water, so designing drugs in silico (on computer chips, without water) should take this into account

Important areas for drug design • The four most important areas of consideration for successful drug design are the • binding sites • molecular shape • molecular size • inhibitory properties of the proteins

Important areas for drug design • The crystallization of membrane proteins for structure determination also plays a vital role in drug design. This area of research is highly challenging and would provide an excellent foundation for further research • Since the dengue virus genome is only about 11 kb (kilobases) long, it would be well suited to carrying out a lot of work quickly and conveniently

Medical applications • Bioinformatics and drug design can be highly useful for the diagnosis and treatment of various neurological disorders. Many neurological disorders have been found to be caused by unusual gene structures such as triplet repeats, in which a unit of three nucleotides (drawn from A, T, G and C), for example CAG in Huntington’s disease, is repeated many times in a gene. The problem becomes more severe as the number of repeats grows. More than eight such repeats are known, and in such cases children may be permanently bedridden or have to use wheelchairs
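Measuring a triplet-repeat expansion computationally amounts to finding the longest uninterrupted tandem run of a three-base unit. The sketch below uses CAG as its default unit and a made-up sequence; it reports the run length in repeat units only and does not encode any disease-specific thresholds, which differ between disorders.

```python
import re

def longest_repeat(sequence, unit="CAG"):
    """Length, in repeat units, of the longest uninterrupted tandem run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical sequence: a run of 10 CAG units, a break, then a run of 3.
seq = "ATG" + "CAG" * 10 + "TAA" + "CAG" * 3
print(longest_repeat(seq))   # 10
```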

Part 4 – Drug Development

Bioinformatics in drug development • Genomics, proteomics, combinatorial chemistry and high-throughput screening (HTS) have all contributed to a massive increase in the amount of data generated by the pharmaceutical industry • The role of bioinformatics is to store and track these data and to provide tools for their analysis, ideally within an automated environment

Bioinformatics in drug development • Specific applications include the modeling of protein interactions with small molecules allowing rational drug design, the association of genotype and drug response patterns (pharmacogenomics), the design and assessment of chemical diversity in combinatorial libraries, and the processing and storage of data from high-throughput screens of lead compounds

Areas of biology

Areas of chemistry

Principles of drug development • Drug development begins with the identification of a suitable target, which must contribute significantly to a human disease • Ideally, altering the activity of this target should have a beneficial effect, thus showing its potential for therapeutic intervention • The next stage of the process is lead discovery, in which compounds showing some of the desired activity of an ideal drug are sought

Principles of drug development • Optimization of lead compounds results in drug candidates that may be registered and submitted for clinical trials, which establish their safety and metabolic behaviour in human subjects

Genetic link to drugs • An early example of the utility of bioinformatics in drug design is cathepsin K, an enzyme that might turn out to be an important target for treating osteoporosis, a crippling disease caused by the breakdown of bone • Analysis of osteoclasts (cells that break down bone in the normal course of bone replenishment) taken from people with bone tumors revealed genes that were overexpressed in these cells and could be overactive in individuals with osteoporosis • These genes matched a previously identified class of enzymes called cathepsins, and efforts are under way to find a drug that blocks the cathepsin K target

Genetic link to drugs • Scientists believe that 99.9 percent of your DNA sequence perfectly matches that of the person sitting beside you. The remaining 0.1 percent varies, and it is these variations that drug companies are interested in • Several years after tests for BRCA1 and BRCA2 were introduced, scientists are still trying to determine exactly to what degree those genes contribute to a woman’s cancer risk

Chemical diversity • Diverse chemical libraries are required for efficient lead discovery if little is known about the binding properties of the drug target • Conversely, focused libraries are required if the structure of the target is known, since this defines a particular set of ligands • Chemical diversity can be defined by comparing molecules on the basis of descriptors (functional groups) and how these fill chemical space • A number of software tools are available for the design and assessment of diverse or focused chemical libraries and for virtual screening against drug targets
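Comparing molecules by descriptors is often done with the Tanimoto (Jaccard) coefficient on sets of features, and a diverse subset can then be chosen greedily with MaxMin picking: each new pick is the molecule whose nearest already-picked neighbour is least similar. These are two common choices, not necessarily what any particular tool uses. The molecule names and feature sets below are hypothetical, standing in for real fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def maxmin_pick(library, k, start=0):
    """Greedy MaxMin diversity selection of k molecules from {name: feature set}."""
    names = list(library)
    picked = [names[start]]
    while len(picked) < min(k, len(names)):
        best, best_dist = None, -1.0
        for name in names:
            if name in picked:
                continue
            # Distance to the closest molecule already in the subset.
            d = min(1.0 - tanimoto(library[name], library[p]) for p in picked)
            if d > best_dist:
                best, best_dist = name, d
        picked.append(best)
    return picked

# Hypothetical library described by functional-group features.
library = {
    "mol_A": {"OH", "COOH", "aromatic_ring"},
    "mol_B": {"OH", "COOH", "aromatic_ring", "Cl"},
    "mol_C": {"NH2", "amide"},
    "mol_D": {"OH", "aromatic_ring"},
}
print(maxmin_pick(library, 3))   # ['mol_A', 'mol_C', 'mol_D']
```

mol_B is left out because it is almost a copy of mol_A (similarity 0.75), which is the behaviour wanted for a diverse library; a focused library would instead keep the molecules most similar to a known ligand.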

Computational screening • Software applications like DOCK and AutoDock match potential ligands to binding sites by calculating steric constraints and interaction energies • These can be used to search chemical databases and find potential drug leads • Some applications consider the ligand and binding site as inflexible structures, rather like pieces of a jigsaw, while others can incorporate flexibility into the molecules by calculating allowable and compatible bond torsions
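The rigid "jigsaw" approach can be sketched as scoring a ligand at a series of trial positions and keeping the best. This is a toy model, not the scoring function of DOCK or AutoDock: it uses a single 12-6 Lennard-Jones-style pair term (minimum of -eps at an assumed contact distance r0 = 4 Å), tries translations only, and explores no rotations or torsional flexibility.

```python
import math

def pair_energy(r, r0=4.0, eps=1.0):
    """Toy 12-6 Lennard-Jones-style term with its minimum (-eps) at distance r0 (angstroms)."""
    if r < 1e-6:
        return math.inf                    # overlapping atoms: sterically forbidden
    ratio = r0 / r
    return eps * (ratio ** 12 - 2 * ratio ** 6)

def score(ligand, site):
    """Sum the pair term over every ligand/binding-site atom pair (lower is better)."""
    return sum(pair_energy(math.dist(a, b)) for a in ligand for b in site)

def rigid_dock(ligand, site, shifts):
    """Translate the rigid ligand by each shift and return (best_score, best_shift)."""
    best_score, best_shift = math.inf, None
    for shift in shifts:
        moved = [tuple(c + s for c, s in zip(atom, shift)) for atom in ligand]
        s = score(moved, site)
        if s < best_score:
            best_score, best_shift = s, shift
    return best_score, best_shift

# One-atom "site" and "ligand", scanned along x from 1.0 to 6.0 angstroms.
site = [(0.0, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0)]
shifts = [(0.5 * i, 0.0, 0.0) for i in range(2, 13)]
print(rigid_dock(ligand, site, shifts))   # (-1.0, (4.0, 0.0, 0.0))
```

Positions that are too close score strongly positive (steric clash) and distant ones score near zero, so the search settles at the contact distance; flexible docking would additionally vary bond torsions inside the ligand.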

Functional genomics • The large-scale functional annotation of genes is known as functional genomics and incorporates areas such as homology searching, structural analysis, expression analysis, large-scale mutagenesis and the analysis of protein interactions • All of these areas are important in drug development

Genome-scale mutagenesis • Genome-scale mutagenesis is a rich source of animal disease models for target identification and validation, and large mutant collections in simple organisms can be used for the rapid high-throughput screening of potential lead compounds

Approaches in functional genomics

Pharmacogenomics • Pharmacogenomics is the study of how variation in the human population correlates with drug response patterns • The analysis of genomic data and its comparison with drug response data allows patients to be clustered into drug response groups, so that appropriate drugs and dose regimens can be administered • Variation is catalogued by analyzing data on mutations (particularly SNPs) and gene expression profiles
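The idea of clustering patients into drug response groups can be sketched, in its simplest form, as grouping patients by genotype at one SNP and summarising the response in each group. The patient records, the SNP identifier "snp_1" and the response values are made-up toy data; real pharmacogenomic studies use large cohorts, many variants and proper statistical testing.

```python
from collections import defaultdict
from statistics import mean

def response_by_genotype(patients, snp):
    """Group patients by genotype at one SNP and return the mean response per group."""
    groups = defaultdict(list)
    for patient in patients:
        groups[patient["genotypes"][snp]].append(patient["response"])
    return {genotype: mean(values) for genotype, values in sorted(groups.items())}

# Toy records: response is a hypothetical 0-1 score of how well the drug worked.
patients = [
    {"id": "P1", "genotypes": {"snp_1": "CC"}, "response": 0.9},
    {"id": "P2", "genotypes": {"snp_1": "CT"}, "response": 0.5},
    {"id": "P3", "genotypes": {"snp_1": "TT"}, "response": 0.1},
    {"id": "P4", "genotypes": {"snp_1": "CC"}, "response": 0.7},
    {"id": "P5", "genotypes": {"snp_1": "TT"}, "response": 0.2},
]
print(response_by_genotype(patients, "snp_1"))
```

In this invented example the CC group responds best and the TT group worst, which is the kind of pattern that would suggest a different drug or dose for TT carriers.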

In-lab vs. out-of-lab effort • Companies and individuals plug into the drug design effort at various points: collecting and storing data, searching databases and interpreting the data • The race is all about who can best mine the massive amount of information • Modeling a drug design or protein structure alone is not sufficient; a lot of information from outside the lab, such as test results and clinical trial data, is also very important • Most of the time should be spent on this aspect to ensure success in drug design and development

Issues of drug design • Even though the human genome has been sequenced, a number of problems still await solutions: technical, legal and social • It is not at all clear how much one must know about a gene in order to patent it • There is also a need to review all failed drugs, i.e., drugs that failed during clinical trials, since their molecular composition and the experimentation process could provide a lot of valuable information

Various aspects connected to successful drug design include supercomputing, modeling of proteins through software, biotechnology, computational methods and analysis, biochemistry, in silico drug design, etc. • It is notable that a drug that works for protein ‘A’ may not work for protein ‘B’, or may behave differently, due to various factors. This is one reason many drugs fail, and hence an integrated (team work) effort is required, with a tremendous amount of information and interaction

At the moment, many patent applications rely on computerized prediction techniques that are often referred to as “in silico” biology • With a full or partial gene sequence, scientists enter the data into a computer program that predicts the amino acid sequence of the resulting protein • By comparing this hypothetical protein with known proteins, the researchers take a guess at what the underlying gene sequence does and how it might be useful in developing, say, a drug or a diagnostic test
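The prediction step described above starts with translating a coding sequence into amino acids using the genetic code. The sketch uses the standard genetic code (NCBI translation table 1), translates a single forward reading frame until the first stop codon, and adds a naive percent-identity comparison without alignment. Real gene-prediction tools also handle introns, both strands, start-codon detection and proper sequence alignment; the sequences below are made up.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def translate(dna, frame=0):
    """Translate a coding-strand DNA/RNA sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")    # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def percent_identity(p, q):
    """Identity over the overlapping length of two ungapped sequences (no alignment)."""
    n = min(len(p), len(q))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(p, q)) / n

predicted = translate("ATGTTTGGCTGA")                    # hypothetical gene fragment
print(predicted)                                        # MFG
print(round(percent_identity(predicted, "MFA"), 1))     # 66.7 against a "known" protein
```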
