Trends in Bioinformatics & Computational Biology (From Genomics To Proteomics). Professor K. Sundaram, Ph.D. Former Professor & Head Department of Crystallography & Biophysics University of Madras. Outline of Presentation. What is Bioinformatics ? Why sudden spurt of interest recently ?
Trends in Bioinformatics & Computational Biology (From Genomics To Proteomics) Professor K. Sundaram, Ph.D. Former Professor & Head Department of Crystallography & Biophysics University of Madras
Outline of Presentation • What is Bioinformatics? • Why the sudden spurt of interest recently? • Scope for Services/Products/Solutions? • What activities are possible? • By Individual -- By Institution -- By Industry • By State -- By Country
What is Bioinformatics ? • The content of this exemplary online course covers major areas of Bioinformatics (as of early 1990’s) • Introduction to Biotechnology • Pairwise Sequence Alignments • Networking • Multiple Alignment • Molecular Phylogenetics • Protein Folding
Bird’s Eye View of Biotechnology • Every Bioinformatician must have some knowledge of the fields of Molecular Biology and Biotechnology • An effective and succinct coverage through annotated illustrations has been created by Access Excellence • Other such resources are also available on the Net • Efforts like these (that organizations or individuals can undertake) are also valuable contributions to Bioinformatics and Biotechnology
Computer Technology in Biotechnology • Computer hardware and software tools are integral parts of most biotechnological procedures - see e.g., this review of 1997 • The recent spurt in bioinformatics has resulted from the sequencing of the entire human genome (see The Human Genome Project) • The whole genomes of the following species have also been sequenced: E. coli, yeast, mouse, mosquito, rice • Many others are in the pipeline • IT is indispensable if all these data are to be preserved and made available to researchers
Analysis of cDNA Microarray Data • Decoding of several genomes has spurred many new areas of intense research activity • One such area is cDNA microarray technology • Biostatisticians and software professionals are sought after for the analysis of cDNA microarray data • Here is a sample of the theory used to normalise cDNA microarray data • Theoretical prediction of the three-dimensional structure of a protein, given its sequence, is another area that has attracted renewed interest in the postgenomic era
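To make the normalisation bullet concrete, here is a minimal sketch of one simple and widely used scheme, global median normalisation of log-ratios. It is not necessarily the theory the slide links to. The function names and the sample intensities are hypothetical, chosen only for illustration.

```python
import math
from statistics import median

def log_ratios(red, green):
    """Return M = log2(R / G) for each spot on a two-channel array."""
    return [math.log2(r / g) for r, g in zip(red, green)]

def median_normalise(m_values):
    """Shift the log-ratios so that their median becomes zero.

    Assumption: most genes are not differentially expressed, so the
    typical log-ratio should be zero after correcting for dye bias.
    """
    centre = median(m_values)
    return [m - centre for m in m_values]

# Made-up intensities for five spots (illustration only).
red = [200.0, 400.0, 800.0, 1600.0, 3200.0]
green = [100.0, 100.0, 100.0, 100.0, 100.0]

m = log_ratios(red, green)
normalised = median_normalise(m)
print(normalised)
```

Real analyses often replace the single global shift with an intensity-dependent correction, but the principle of recentring the log-ratios is the same.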
Protein Tertiary Structure Prediction • Completion of the HGP, an enormous achievement, nonetheless marks just the beginning of a bigger challenge -- the quest to decode the Proteome • The base sequences in genes code for all the proteins • Proteins are responsible for virtually all the functions of a living system • The activity of a protein is determined by its 3D structure • There is a strong belief that the native 3D structure of a protein is determined by its primary sequence (and hence the base sequence in the corresponding gene) • But the rules of the game are not yet fully understood
Ramachandran Plot • G.N. Ramachandran is credited with the discovery of a major first step in Protein Structure Prediction • The disposition of adjacent peptide units is restricted to a few allowed regions as shown in the adjoining Ramachandran (φ, ψ) plot
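The two axes of a Ramachandran plot are the backbone torsion angles φ and ψ. As a minimal sketch, a torsion (dihedral) angle can be computed from four atom positions as below. The function name and test points are hypothetical. The sign is assumed to follow the usual convention: positive when the far bond is rotated clockwise as seen looking along the central bond.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    k0, k2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Simple geometric checks with the central bond along the x axis.
cis = torsion_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = torsion_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
quarter = torsion_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(cis, trans, quarter)
```

For a residue i, φ is the torsion over the atoms C(i-1), N(i), Cα(i), C(i), and ψ is the torsion over N(i), Cα(i), C(i), N(i+1).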
PDB -- The Protein Data Bank • The PDB is a unique database, much sought after by protein structure predictors • Each PDB entry consists basically of the (x, y, z) coordinates of the atoms in a protein, determined experimentally (by X-ray crystallography or NMR) • Experimental determination of structure requires the protein in very pure form (near 100%) • This was very difficult to achieve in the early days but is easier now, thanks to recombinant DNA technology • Similarly, methods of structure solution have improved significantly due to advances in technology
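As a rough sketch of what "coordinates of atoms" looks like in practice, the classic PDB text format stores each atom on a fixed-column ATOM line, with x, y and z in columns 31-38, 39-46 and 47-54. The helper names and the sample atom below are hypothetical. Only a few of the fields are handled, and a production parser would also cover HETATM records, alternate locations and so on.

```python
def format_atom_record(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a (truncated) ATOM line using the fixed-column PDB layout.

    Illustrative helper only: it writes the fields up to the coordinates
    and omits occupancy, temperature factor and element.
    """
    return ("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f"
            % (serial, name, res_name, chain, res_seq, x, y, z))

def parse_atom_record(line):
    """Extract atom name, residue, chain and (x, y, z) from an ATOM line."""
    return {
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "xyz": (float(line[30:38]),       # columns 31-38
                float(line[38:46]),       # columns 39-46
                float(line[46:54])),      # columns 47-54
    }

# A made-up alpha-carbon record (values are for illustration only).
line = format_atom_record(1, "CA", "ALA", "A", 1, 11.104, 6.134, -6.504)
atom = parse_atom_record(line)
print(line)
print(atom)
```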
PDB -- The Protein Data Bank (Contd.) • The PDB currently holds over 19,500 entries and the number keeps increasing rapidly • All the same, there is no hope of experimentally determining the 3D structures of all the proteins of the proteome • This is the reason why 3D structure prediction methods have assumed a new significance • Various techniques are being explored to work out structures theoretically, based either on patterns observed in PDB proteins or ab initio using molecular physics
Protein Structure Prediction • CASP (Critical Assessment of Techniques for Protein Structure Prediction) is a competition organised every two years that makes an objective assessment of various prediction methods • CAFASP is a related activity where fully automated protein structure prediction servers are assessed • Organizers of CASP and CAFASP also conduct a community-wide cooperative program, TMW, in an attempt to determine theoretically the structure of the Ten Most Wanted proteins that are needed to solve scientific/medical problems and for which there exists no hope of obtaining the structure experimentally in the near future
Protein Structure Prediction (Contd.) • Present-day PCs have reached a stage that enables an individual to participate in these ventures effectively • The algorithm of one such program, used to participate in the TMW effort, is described below • An internal coordinates representation is used and standard planar geometry is assumed for the peptide unit. With these assumptions a model of the protein can be simulated given its primary sequence • Cartesian coordinates of all the atoms are obtained by looking at the molecule as a tree-like structure, moving from terminal atoms (leaves) towards the root atom and applying an affine transformation for each bond traversal (see next slide)
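The conversion from internal to Cartesian coordinates can be illustrated with a simpler scheme than the tree traversal described on the slide. The sketch below is not the author's program: it places each new atom from the three preceding ones, given a bond length, a bond angle and a torsion angle. All names, and the bond length of 1.5 and bond angle of 110 degrees, are hypothetical illustration values.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / n, a[1] / n, a[2] / n)

def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Return atom d with |c-d| = bond, angle b-c-d and torsion a-b-c-d.

    Assumes a, b and c are not collinear (otherwise the frame is undefined).
    """
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = _unit(_sub(c, b))
    n = _unit(_cross(_sub(b, a), bc))  # normal to the a-b-c plane
    m = _cross(n, bc)                  # completes the local frame
    local = (-bond * math.cos(theta),
             bond * math.sin(theta) * math.cos(phi),
             bond * math.sin(theta) * math.sin(phi))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))

def build_chain(torsions_deg, bond=1.5, angle_deg=110.0):
    """Build a simple unbranched chain: three seed atoms, then one per torsion."""
    theta = math.radians(angle_deg)
    atoms = [(0.0, 0.0, 0.0),
             (bond, 0.0, 0.0),
             (bond - bond * math.cos(theta), bond * math.sin(theta), 0.0)]
    for t in torsions_deg:
        atoms.append(place_atom(atoms[-3], atoms[-2], atoms[-1],
                                bond, angle_deg, t))
    return atoms

chain = build_chain([180.0, 60.0, -60.0])
for atom in chain:
    print("%8.3f %8.3f %8.3f" % atom)
```

A tree-based method generalises this idea to branched side chains by composing one rotation plus translation per bond.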
Protein Structure Prediction (Contd.) • The starting conformation of a protein can be input in one of two forms: • (1) As a sequence of single-letter amino acid codes and the local dipeptide conformations at each α-carbon atom (see example) • (2) As a sequence of records, one for each residue. The record for each residue contains the residue name followed by the values of the variable torsion angles φ, ψ and the χ's (see example) • Total nonbonded energy is calculated using these formulae
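The formulae the slide refers to are not reproduced in this transcript. Purely as an illustration of what a pairwise nonbonded energy sum looks like, the sketch below uses the common Lennard-Jones 12-6 form. This is an assumption, not the author's energy function, and the parameters `epsilon` and `sigma` are hypothetical placeholders.

```python
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for one pair of atoms at distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def total_nonbonded_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all distinct atom pairs.

    Simplification: every pair is included. Real programs normally skip
    atoms that are bonded or separated by only two bonds, and use
    type-specific parameters.
    """
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += lennard_jones(r, epsilon, sigma)
    return total

# Two atoms at the distance where the pair energy is at its minimum (-epsilon).
r_min = 2.0 ** (1.0 / 6.0)
pair_energy = total_nonbonded_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)])
print(pair_energy)
```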
Protein Structure Prediction (Contd.) • Various strategies are used to generate likely conformations for the protein using a simulated annealing approach adopting the Monte Carlo algorithm • A more detailed description is available in this article
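A minimal sketch of simulated annealing with the Metropolis Monte Carlo acceptance rule, applied to a list of torsion angles, might look like the following. The toy energy function, the geometric cooling schedule and all parameter values are assumptions for illustration; they stand in for the real conformational energy and strategies, which the slide does not detail.

```python
import math
import random

def toy_energy(angles, targets=(-60.0, -45.0)):
    """Hypothetical stand-in energy: zero when each angle hits its target."""
    return sum(1.0 - math.cos(math.radians(a - t))
               for a, t in zip(angles, targets))

def anneal(energy, start, steps=5000, t_start=2.0, t_end=0.01,
           max_step=30.0, seed=0):
    """Simulated annealing over torsion angles (degrees)."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # Monte Carlo move: perturb one randomly chosen torsion angle.
        trial = list(current)
        i = rng.randrange(len(trial))
        moved = trial[i] + rng.uniform(-max_step, max_step)
        trial[i] = (moved + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / temp).
        if delta <= 0.0 or rng.random() < math.exp(-delta / temp):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best

start = [120.0, 150.0]
best_angles, best_energy = anneal(toy_energy, start)
print(best_angles, best_energy)
```

Accepting some uphill moves at high temperature lets the search escape local minima; as the temperature falls, the walk settles into a low-energy conformation.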
Proteomics Research • Once the structure of a protein is obtained, further research can be carried out in areas like functional genomics and proteomics • Several proteins are suspected to engage in function-related specific associations. This can be investigated theoretically, as discussed in this article