Bioinformatics: Data-driven molecular biology
Mikhail Gelfand
A.A. Kharkevich Institute for Information Transmission Problems, RAS, Moscow
II Spanish-Russian Forum on Information and Communication Technologies, Madrid, 21-25 September 2009

Exponential increase of data volume
Legend: red: papers (PubMed); blue: sequence fragments (GenBank); green: nucleotides (GenBank)
Of 18 million papers in PubMed, ~675 thousand have keywords “bioinformat* OR comput*”

>45 thousand Google hits on “genome deciphered”
Top 10 hits:
• bioremediation
  • bacterium Pseudomonas
• agriculture and biotech
  • crop and biofuel plant Sorghum
  • rice
• medicine
  • pathogenic bacterium Staphylococcus
  • SARS (atypical pneumonia) virus
  • Brugia worm (elephantiasis)
• individual genome (medicine)
  • James Watson
• science / model organism
  • macaque
• science / evolution
  • mammoth (mitochondrial)
  • platypus

Sequencing is just the beginning
Bacterial genome:
• several million nucleotides
• 600 to 9,000 genes (~90% of a genome codes for proteins)
• This slide: 0.1% of the Escherichia coli genome
Human genome:
• 3 billion nucleotides, 25-30 thousand genes
• polymorphisms (individual differences): ~1 per 1,000 nucleotides
• differences between human and chimpanzee: ~1 per 100 nucleotides

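The rates above translate into absolute counts with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: the function name is hypothetical, and the E. coli genome size of about 4.6 million nucleotides is an assumption not stated on the slide.

```python
# Rough counts implied by the per-nucleotide rates quoted on the slide.
HUMAN_GENOME_NT = 3_000_000_000   # from the slide: 3 billion nucleotides
ECOLI_GENOME_NT = 4_600_000       # assumption: approximate E. coli genome size


def count_at_rate(genome_nt: int, one_per_nt: int) -> int:
    """Number of positions expected at a rate of one per `one_per_nt` nucleotides."""
    return genome_nt // one_per_nt


# Differences between two people: ~1 per 1,000 nucleotides.
between_individuals = count_at_rate(HUMAN_GENOME_NT, 1_000)

# Differences between human and chimpanzee: ~1 per 100 nucleotides.
human_vs_chimp = count_at_rate(HUMAN_GENOME_NT, 100)

# 0.1% of the E. coli genome, i.e. one nucleotide in every 1,000.
nt_on_slide = count_at_rate(ECOLI_GENOME_NT, 1_000)

print(between_individuals, human_vs_chimp, nt_on_slide)
```

Under these assumptions, two human genomes differ at roughly 3 million positions, human and chimpanzee at roughly 30 million, and the slide background holds a few thousand nucleotides.
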
Not just genomes
Other types of large-scale experiments / datasets:
• State of the genome (gene expression)
  • methylation
  • nucleosome positioning
  • histone modifications
• Transcriptomics, protein abundance (gene expression)
• Protein-protein interactions
  • signaling etc.
  • functional complexes
• Protein-DNA interactions (regulation)
• etc. etc.

Goals
• Functional annotation of genes and proteins
  • biological function
  • regulation (in what conditions)
• Functional annotation of genomes
  • metabolic reconstruction and modeling
  • regulatory networks and development
  • prediction of organism properties from its genome

Applications: biotechnology
• Improvement of production strains (chemistry, pharma, food industry)
  • via modeling of metabolic pathways
• New enzymes (new functions, stress tolerance)
  • via sequencing and functional annotation
• Biofuels
  • fast-growing, stress-tolerant plants; identification of genes
  • microbes as producers of ethanol or fatty acids: targeted genome design

Applications: medicine and pharma
• Personalized medicine
  • identification of predisposing alleles: lifestyle
  • pharmacogenomics (metabolic alleles)
  • diagnostics
• Drug targets (chronic disease)
  • analysis of signaling pathways
• Anti-infectives
  • identification of drug targets
• Drug design; identification of drug candidates
  • modeling of protein structure and interactions of proteins with small molecules

Methods. Integration of data
• Systems biology: integration of diverse datasets for one organism
• Comparative genomics: simultaneous analysis of genomic data for many organisms
• Comparative systems biology: understanding the evolution of gene regulation and expression, signaling etc.
• Comparative structural biology

Bioinformatics in Russia
• Few high-throughput experiments
• Open data
• Collaborations
• Theory (evolution), methods, algorithms
• Highlights:
  • Evolution (IITP RAS) and taxonomy (IPCB MSU)
  • Regulation (FBB MSU, GosNIIGenetika, IITP RAS, ICaG SB RAS)
  • Annotation (FBB MSU, IITP RAS)
  • Protein Structure (IPR RAS, IMB RAS, IPCB MSU, BF MSU)
  • Modeling
    • Metabolism (IPCB MSU, ICaG SB RAS)
    • Regulation (SpBSPU, ICaG SB RAS)
  • Drug design (IBMC RAMS)

Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems (5 years: 2003-2009)
• Molecular evolution
  • Alternative splicing as a driver of evolution in eukaryotes
  • Positive selection
• Comparative genomics of regulation in bacteria
  • Evolution of regulatory pathways
  • Protein-DNA interactions
• Annotation
  • Gene recognition
  • Functional annotation
  • Regulation

Comparative genomics in action: confirmed predictions
• Regulatory mechanisms
  • riboswitches (thiamin: vitamin B1, riboflavin: vitamin B2)
  • antisense regulation of the methionine-cysteine pathway
  • role of the ribosome in zinc homeostasis
• Regulators: NrdR, MtaR/MetR, CmbR, NiaR
• Enzymes: FadE, ThiN, TenA, CobZ, CobX/CbiZ, PduX, NagP, NagB-II
• Microcins (capistruin, Burkholderia thailandensis)
• Transporters
  • ABC-transporters with universal energizing components: Co, Ni, biotin (vitamin H), thiamin (vitamin B1), riboflavin (vitamin B2)
  • other: threonine, methionine, oligogalacturonides, N-acetylglucosamine, corrinoids, niacin, riboflavin, Co
• Regulatory motifs: nitrogen fixation, fatty acid biosynthesis, iron homeostasis, catabolism of chitin and pectin
• Regulatory sites: several dozen

Functional annotation of genomes
First Russian bacterial genome, Acholeplasma laidlawii (2008)
• sequencing and proteomics: Institute of Physico-Chemical Medicine
• annotation: IITP
• ~1.5 Mb; ~1,400 genes
• established function for ~80% of genes; metabolic reconstruction

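As a quick consistency check on the Acholeplasma laidlawii figures, the sketch below derives the average genome span per gene and the approximate number of genes with an established function. The variable names are illustrative, and the inputs are the rounded values from the slide, so the results are rough estimates only.

```python
# Rounded figures from the slide for Acholeplasma laidlawii.
GENOME_NT = 1_500_000        # ~1.5 Mb
GENE_COUNT = 1_400           # ~1,400 genes
ANNOTATED_PERCENT = 80       # function established for ~80% of genes

# Average stretch of genome per gene (coding sequence plus intergenic space).
nt_per_gene = GENOME_NT / GENE_COUNT

# Approximate number of genes with an established function.
annotated_genes = GENE_COUNT * ANNOTATED_PERCENT // 100

print(f"{nt_per_gene:.0f} nt per gene, ~{annotated_genes} annotated genes")
```

About one gene per thousand nucleotides is in line with the dense gene packing of bacterial genomes noted earlier in the talk.
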
Collaborations
Bold: ongoing; * former students
• European Molecular Biology Laboratory *
• Germany
  • Humboldt University, Berlin
  • Munich Technical University
• France
  • Lyon University
• United Kingdom
  • University of East Anglia
• Spain
  • Center for Genome Regulation (Barcelona)
• USA
  • MIT
  • Burnham Institute *
  • Lawrence Berkeley National Laboratory *
  • Stowers Institute *
  • Rutgers University
• China
  • China-Germany Partner Institute of Molecular Genetics (Shanghai)
• Industry
  • Biomax (Germany)
  • Integrated Genomics (USA)