Organization of Biological Data and Databases

Pramod Wangikar, Dept. of Chemical Engineering, IIT Bombay

Presentation Transcript


  1. Organization of Biological Data and Databases. Pramod Wangikar, Dept. of Chemical Engineering, IIT Bombay

  2. ORGANIZATION OF BIOLOGICAL DATA: Gene i (Genomics) → m-RNA i (Transcriptomics) → Protein i (Protein Sequence / Proteomics) → Function (enzyme, hormone, etc.); 3-D Structural Database

  3. Primary Structure of Deoxyribonucleic Acid (DNA). [Figure: a six-nucleotide strand with bases A, C, G, T, T, G joined by 5’-3’ phosphate (P) linkages, ending in a free 3’ OH.] Written as pApCpGpTpTpG or ACGTTG.

  4. The Basic Principle of Transcription. [Figure labels: RNA polymerase, double-stranded DNA, RNA (5’ end), nucleotides]

  5. The Code • 64 ways of writing the codon • 20 amino acids. [Figure: adjacent mRNA codons (5’ ... aug gaa uuu ...), amino acids M and F, and the anticodon uac]

  6. The Flow of Genetic Information
     DNA, 5’→3’ (sequence same as RNA): ACTGCACCATGGGGCTCAGCGACGGGGAATGGCACTTGGTG
     DNA, 3’→5’ (sequence complementary to RNA): TGACGTGGTACCCCGAGTCGCTGCCCCTTACCGTGAACCAC
     mRNA, 5’→3’ (initiation signal: the AUG codon): ACUGCACCAUGGGGCUCAGCGACGGGGAAUGGCACUUGGUG
     Protein: Met-Gly-Leu-Ser-Asp-Gly-Glu-Trp-His-Leu-Val
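
The flow from DNA to mRNA to protein on this slide can be traced with a short script. This is a minimal sketch, not a general translator: the function names are hypothetical, the codon table is an assumption that covers only the ten codons appearing in the slide's sequence, and translation simply starts at the first AUG.

```python
# Partial codon table: only the codons used in the slide's example
# (an assumption to keep the sketch short; the full code has 64 codons).
CODON_TABLE = {
    "AUG": "Met", "GGG": "Gly", "CUC": "Leu", "AGC": "Ser", "GAC": "Asp",
    "GAA": "Glu", "UGG": "Trp", "CAC": "His", "UUG": "Leu", "GUG": "Val",
}

# Base-pairing rules used to build the complementary DNA strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement (not reversed), as drawn on the slide."""
    return strand.upper().translate(COMPLEMENT)


def transcribe(coding_strand: str) -> str:
    """mRNA reads the same as the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Translate from the first AUG until the sequence or the table runs out."""
    start = mrna.find("AUG")
    if start < 0:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3])
        if residue is None:  # codon outside the partial table: stop here
            break
        peptide.append(residue)
    return peptide


coding = "ACTGCACCATGGGGCTCAGCGACGGGGAATGGCACTTGGTG"
mrna = transcribe(coding)
print(complement(coding))
print(mrna)
print("-".join(translate(mrna)))
```

Note that the seventh codon, GAA, encodes glutamate (Glu).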

  7. Memory Requirements for Storing Genomes. Two bits per base: 00 = a, 01 = c, 10 = g, 11 = t. Prokaryotic genomes: 0.5-7.0 Mbp. Eukaryotic genomes: 10 Mbp - 1000 Gbp.
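
The two-bit code above implies four bases per byte. The sketch below packs a sequence that way and estimates storage for the genome sizes quoted on the slide. The function names and the bit order within a byte (first base in the highest two bits) are assumptions for illustration; ambiguous bases such as N are not handled.

```python
# Two-bit code from the slide: 00 = a, 01 = c, 10 = g, 11 = t.
ENCODING = {"a": 0b00, "c": 0b01, "g": 0b10, "t": 0b11}
DECODING = "acgt"  # index = two-bit value


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte, first base in the high bits."""
    seq = seq.lower()
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= ENCODING[base] << (6 - 2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n_bases: int) -> str:
    """Recover the first n_bases bases from packed bytes."""
    return "".join(
        DECODING[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11]
        for i in range(n_bases)
    )


def genome_bytes(n_bp: int) -> int:
    """Bytes needed for n_bp bases at two bits each (rounded down to whole bytes)."""
    return (2 * n_bp + 7) // 8


print(pack("acgttg").hex())
print(unpack(pack("acgttg"), 6))
for label, n_bp in [("0.5 Mbp", 500_000), ("7 Mbp", 7_000_000), ("1000 Gbp", 10**12)]:
    print(label, genome_bytes(n_bp), "bytes")
```

At this density the largest prokaryotic genome in the quoted range needs about 1.75 MB, and a 1000 Gbp genome about 250 GB.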

  8. How Much Data Does a Bacterium (E. coli) Contain?

  9. E. coli and Data Size. Numbers are approximate. The data size increases by roughly three orders of magnitude for the human system.

  10. Minimal Life: Self-assembly, Catalysis, Replication, Mutation, Selection. [Figure labels: Environment, Cell Boundary, Monomers, RNA, Growth rate]

  11. Maximal Life: Self-assembly, Catalysis, Replication, Mutation, Selection. [Figure labels: Regulatory & Metabolic Networks, Environment, Metabolites, Interactions, RNA, DNA, Protein, Growth rate, Expression; stem cells, cancer cells, microbes]

  12. Regulation: More biological data. What is regulation? A catalogue of possible scenarios and the respective courses of action. • The information for regulation can be stored in the form of: • Protein-protein interaction • Protein-DNA interaction • Protein-metabolite interaction • Molecular switches, controls, set-points, etc. Genome + Environment: input file. Biological machinery: executable program. Observations: output file. Can we crack the executable program?

  13. Some useful regulatory signals on genes. [Figure labels. DNA: upstream activating sequences (UAS), TATA box, m-RNA expression start & end. mRNA: ribosomal binding site, protein synthesis starts, protein synthesis stops. Protein.]

  14. Minimal Gene Complement of Mycoplasma genitalium

  15. DESCRIPTION OF A LIVING CELL / VIRUS. Genome / Genomics: general capability of the cell. Transcriptomics: readiness of the cell. Proteomics / Protein Map: physiological state of the cell.

  16. Paradigm Shift in the Bioinformatics Age • Conventional Path: Structure, Gene, Function • Bioinformatics Age (Functional Genomics): Gene sequence, Structure of Protein, Function • Proteomics: Protein Map (2D-PAGE, pI, mol. wt.)

  17. Possible Relationships Between Databases: Genome Sequence, Protein Sequence, Proteomics, Transcriptomics, Expression Profile, Protein Structure, Protein Profile, Protein-DNA Interactions, Protein-Protein Interactions, Protein Function, Metabolome, Phenotype

  18. Combinatorial Problems in Biology • Prediction of ORF; gene finding • Prediction of DNA regulatory sites • DNA regulatory Proteins • Protein-Protein interactions • Protein Function • Prediction of Metabolic capability • Prediction of Genetic Regulatory Circuits
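
As an illustration of the first problem in this list, ORF prediction, the sketch below scans the three forward reading frames for an ATG followed by an in-frame stop codon. It is a deliberately naive assumption-laden example (hypothetical function name, forward strand only, no scoring of coding potential), not a real gene finder.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of forward-strand ORFs.

    start is the index of the A in ATG; end is exclusive and includes the
    stop codon. min_codons counts the codons before the stop (ATG included).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and look for the next ATG
    return orfs


example = "CCATGGCTGAATAACC"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

A practical gene finder would also scan the reverse complement and weigh codon usage and regulatory signals, which is where the combinatorial difficulty comes from.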

  19. Biological Databases • Raw databases • Processed databases • Querying in databases.

  20. Raw Databases (Conventional Ones). DNA / Gene / Genome Sequence Databases: EMBL, GenBank, GSDB, etc.; > 10^6 genes, doubles every 18 months. Genome Projects: E. coli, plants, human, mouse, etc. Protein Sequence Databases: PIR, SwissProt, GenBank, etc.; > 10^5 protein sequences, doubles every 21 months. Three-Dimensional Structure Database: Brookhaven Protein Databank (PDB); > 20,000 structures, doubles every 24 months.
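
The doubling times above translate into exponential growth, size(t) = size(0) × 2^(t / doubling time). The sketch below works through that arithmetic; the function names are hypothetical, and it assumes the doubling time stays constant, which the slide does not claim for the long term.

```python
import math


def projected_size(current: float, doubling_months: float, elapsed_months: float) -> float:
    """Size after elapsed_months, assuming a constant doubling time."""
    return current * 2 ** (elapsed_months / doubling_months)


def months_to_reach(current: float, target: float, doubling_months: float) -> float:
    """Months needed to grow from current to target at a constant doubling time."""
    return doubling_months * math.log2(target / current)


# Starting points and doubling times as quoted on the slide.
print(projected_size(1e6, 18, 36))        # gene entries after 3 years
print(projected_size(20_000, 24, 24))     # PDB structures after 2 years
print(months_to_reach(1e5, 1e6, 21))      # months for a tenfold rise in protein sequences
```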

  21. Proteomics Database (SwissProt) • Each protein identified by: pI, mol. wt., mass spectra, microsequencing, peptide mass fingerprint, etc. • Entries for E. coli, yeast, human, etc. Hoogland et al., Nucl. Acids Res. (2000) 28, 286

  22. Clusters of Orthologous Groups (COG) of Proteins: A Processed Database • Compares genes from different genomes. • Forms clusters with similar sequences. • Each COG contains genes connected through vertical evolutionary descent. • 30 genomes (68,571 genes), 2,791 COGs with 45,350 genes. • Assignment of function for genes based on known functions for some members of the cluster. • Highly useful for functional assignments for newly sequenced genomes.

  23. EcoCyc Database: Encyclopedia of E. coli Genes and Metabolism. 4300 genes, 695 enzymes, 595 reactions, 123 pathways. [Figure legend: blue, E. coli only; green, both E. coli and H. influenzae.] Karp et al., Nucl. Acids Res. (1998) 26, 50

  24. Querying in Databases • Based on sequence similarity; returns similar sequences together with a similarity score or expectation value. • Normally a BLAST or FASTA search (local alignment). Can also look for a sequence motif. • Gene names, biological source, functional category, cellular location / role. • Structural features (for known 3-D structures).
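
To make "local alignment" and "similarity score" concrete, the sketch below computes a Smith-Waterman local alignment score. This is the exact dynamic-programming method, swapped in here for illustration; BLAST and FASTA are faster heuristics that approximate it and also report expectation values, which this sketch does not. The function name and the scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A cell never drops below 0: a local alignment may start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Rank a toy "database" of sequences against a query, highest score first.
query = "ACGTACGT"
database = {"seq1": "ACGTTACGT", "seq2": "GGACGTGG", "seq3": "TTTTTTTT"}
ranked = sorted(database, key=lambda name: smith_waterman_score(query, database[name]),
                reverse=True)
print(ranked)
```

A real search tool avoids filling the full matrix for every database entry by first seeding on short exact word matches.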

  25. Bioinformatics: A multidisciplinary effort is required • Generation of biological data • Storage and retrieval of data • Conversion of known biological hypotheses into mathematical/statistical models • Building models from data • Fitting new data to existing models • Searching for patterns in data • Deriving new biological knowledge from data
