Computational Biology at Carnegie Mellon University: A Quick Tour • Jaime Carbonell, Carnegie Mellon University • December 2008
Computational Biology at CMU: Educational History • 1987 Undergraduate program in Computational Biology established • 1991 Howard Hughes Medical Institute grant to build undergraduate curriculum • 2000 M.S. program in Computational Biology established • 2005 Joint CMU & U. of Pittsburgh Ph.D. program in Computational Biology established
Computational Biology at CMU: History • 2002 NSF large ITR grant (CMU PIs: Reddy & Carbonell) with U. Pitt, MIT, Boston U., NRC Canada: Computational Biolinguistics • 2003 NSF large ITR grant (CMU PI: Murphy) with UCSB, Berkeley, MIT: Bioimage Informatics • 2004-2008 10 small grants from NSF, NIH, Merck, Gates on computational proteomics, viral evolution, HIV-human interactome, …
Curriculum for Comp Bio PhD • Core graduate courses: Molecular Biology; Biochemistry; Biophysics; Advanced Algorithms & Language Tech.; Machine Learning Methods; Computational Genomics; Computational Structural Biology; Cellular and Systems Modeling
Curriculum • Elective courses: Computational Genomics; Computational Structural Biology; Cellular and Systems Modeling; Bioimage Informatics; Computational Neurobiology; Advanced Statistical Learning Methods
Teaching & Advising Faculty • 30 faculty from CMU: 11 Computer Science; 11.5 Biology and Chemistry; 3.5 Bio-Engineering; 3 Statistics and Mathematics; 1 Business School • 36 faculty from Pitt: 19 Medical School; 17 Biology, Chemistry, Physics
Faculty: Computational Genomics (* = primary research area) • Ziv Bar-Joseph* • Jaime Carbonell • Marie Dannie Durand* • Jonathan Minden • Ramamoorthi Ravi • Kathryn Roeder • Roni Rosenfeld • Larry Wasserman • Eric Xing* • Research themes: linguistics methods for elucidating sequence-structure-function relations; machine learning methods for annotation; modeling genome evolution through duplication
Faculty: Computational Structural Biology (Proteomics) • Michael Erdmann • Maria Kurnikova* • Chris Langmead* • John Nagle • Gordon Rule • Robert Swendsen • Jaime Carbonell* • Research themes: homologous structure determination by NMR; improving determination of protein structure and dynamics using sparse data; molecular dynamics of proteins and nucleic acids
Faculty: Cellular and Systems Modeling • Ziv Bar-Joseph* • Omar Ghattas • Philip LeDuc • Russell Schwartz* • Joel Stiles* • Shlomo Ta’asan • Yiming Yang • Eric Xing • Research themes: computational modeling of mechanical properties of cells and tissues; modeling of formation of protein complexes; multi-scale modeling of excitable membranes; discovery of large-scale gene regulatory networks
Faculty: Bioimage Informatics • William Cohen • Bill Eddy • Christos Faloutsos • Jelena Kovacevic • Tom Mitchell* • Robert Murphy* • Eric Xing • Research themes: determining subcellular location from microscope images; generative models of protein traffic; machine learning of patterns of brain activity; statistical analysis of gel images for proteomics
Faculty: Computational Neurobiology • Justin Crowley • Tom Mitchell • Joel Stiles* • David Touretzky* • Nathan Urban • Research themes: development of structure of neuronal circuits; machine learning of patterns of brain activity; multi-scale modeling of excitable membranes
Proteomics • Things to learn about proteins: sequence; activity; partners; structure; functions; expression level; location/motility
Examples of Cool Research • Computational Biolinguistics: Sequence (DNA, protein) → Structure → Function, by analogy with Language (speech, text) → Syntax → Semantics; GPCRs (sensor/channel proteins; Klein, CMU/Pitt); 60% of all targeted drugs affect GPCRs; language (information-theoretic) analysis • Evolutionary analysis (of genes, proteins, …): conservation, replication, poly-functionality (Rosenberg) • Immune system modeling (just starting…): domain/fold polymorphic modeling (Langmead) • Cross-species interactome (just starting…): human-HIV protein-protein interactions (Carbonell, Klein)
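One way to picture the "language analysis" idea is to treat short runs of residues as the "words" of a protein sequence and count them, as one would count word n-grams in text. The snippet below is only a minimal sketch under that assumption: the function name `ngram_counts` and the toy peptide are hypothetical and are not taken from the slides or from any real GPCR.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count overlapping n-grams (length-n substrings) in a sequence."""
    if n <= 0:
        raise ValueError("n must be positive")
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


# Hypothetical toy peptide, used only to illustrate the counting step.
toy_sequence = "MKTLLLTLLV"
bigrams = ngram_counts(toy_sequence, 2)

# Show the most frequent residue pairs in the toy peptide.
print(bigrams.most_common(2))
```

Counts like these could feed an information-theoretic comparison between protein families, but which statistics the CMU group actually used is not stated on the slide.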
Evolutionary Methods for Discovering Sequence-Function Mapping (Rosenfeld) • [Figure: a multiple sequence alignment across human, monkey, mouse, rat, cow, dog, fly, worm, and yeast; distribution of amino acids; conserved properties across rhodopsin]
Subtask: Identifying Chemical Properties Conserved at Each Protein Position • [Figure panels: a single position; results for all rhodopsin positions]
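The subtask above can be sketched as a per-column scan of a multiple sequence alignment: map each residue to a coarse chemical class and flag the columns where that class does not vary. This is an illustrative sketch, not the method from the slides. The four-way property grouping, the entropy threshold, the function names, and the toy alignment are all assumptions.

```python
import math
from collections import Counter

# Coarse, illustrative grouping of the 20 amino acids (an assumption;
# other groupings are equally defensible).
PROPERTY_OF = {}
for prop, residues in {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}.items():
    for residue in residues:
        PROPERTY_OF[residue] = prop


def column_entropy(symbols):
    """Shannon entropy in bits; 0 means every symbol is identical."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def conserved_properties(alignment, max_entropy=0.0):
    """Map column index -> dominant property for property-conserved columns."""
    result = {}
    for i, column in enumerate(zip(*alignment)):
        # Gaps and unknown symbols are skipped rather than scored.
        props = [PROPERTY_OF[r] for r in column if r in PROPERTY_OF]
        if props and column_entropy(props) <= max_entropy:
            result[i] = Counter(props).most_common(1)[0][0]
    return result


# Hypothetical three-sequence alignment (not real rhodopsin data).
toy_alignment = ["MKLDS", "MRIEA", "MKVDT"]
conserved = conserved_properties(toy_alignment)
print(conserved)
```

In the toy alignment the second column varies at the residue level (K, R, K) yet is fully conserved at the property level (all positively charged), which is the kind of signal a property-based analysis is meant to surface.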
Five Classifiers in Gene Identification for Cancer/H5 (Yang)
New Field: Location Proteomics (Langmead) • Can use CD-tagging (developed by Jonathan Jarvik and Peter Berget) to randomly tag many proteins • Isolate separate clones, each of which produces one tagged protein • Use RT-PCR to identify tagged gene in each clone • Collect many live cell images for each clone using spinning disk confocal fluorescence microscopy • Cluster proteins by their location patterns (automatically)
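The last step, automatic clustering of proteins by location pattern, can be illustrated with a small agglomerative (bottom-up) clustering routine. This is a sketch under stated assumptions, not the pipeline used in the project: the two-number feature vectors, the clone names, the Euclidean distance, and the average-linkage rule are all hypothetical choices made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, features):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(features[a], features[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(features, num_clusters):
    """Repeatedly merge the two closest clusters until num_clusters remain."""
    clusters = [[name] for name in sorted(features)]
    while len(clusters) > num_clusters:
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(
            pairs,
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], features),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical image-derived features per clone (made-up numbers).
toy_features = {
    "clone_A": (0.9, 0.1),
    "clone_B": (0.8, 0.2),
    "clone_C": (0.1, 0.9),
    "clone_D": (0.2, 0.8),
}
clusters = agglomerate(toy_features, 2)
print(clusters)
```

Recording the order of merges instead of stopping at a fixed cluster count would give the tree structure that a clustering diagram displays. Real image features would be far higher-dimensional than the two numbers used here.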
Quaternary Fold Predictions (Carbonell & Liu) • Triple beta-spirals [van Raaij et al. Nature 1999] • Virus fibers in adenovirus, reovirus and PRD1 • Double barrel trimer [Benson et al, 2004] • Coat protein of adenovirus, PRD1, STIV, PBCV
Model Organism: Bacteriophage T4 (ultimate targets are HIV, etc.)
Dendritic Clustering for Clone (Murphy) • [Figure: clustering tree labeled by protein name] • Clone isolation and image collection by Jonathan Jarvik; CD-tagged gene identification by Peter Berget; computational analysis of patterns by Xiang Chen and Robert F. Murphy
New Challenge: Functional Genomics • The various genome projects have yielded the complete DNA sequences of many organisms, e.g. human, mouse, yeast, fruit fly. • Human: 3 billion base pairs, 30-40 thousand genes. • Challenge: go from sequence to function, i.e., define the role of each gene and understand how the genome functions as a whole.
Classical Analysis of Transcription Regulation Interactions • “Gel shift”: electrophoretic mobility shift assay (“EMSA”) for DNA-binding proteins • [Figure labels: protein-DNA complex; free DNA probe] • Advantage: sensitive • Disadvantage: requires stable complex; little “structural” information about which protein is binding
Modern Analysis of Transcription Regulation Interactions • Genome-wide location analysis • Advantage: high throughput • Disadvantage: inaccurate
Gene Regulation and Carcinogenesis • [Figure: cell-cycle regulation pathway diagram. Recoverable labels: oncogenic stimuli (e.g. Ras); extracellular stimuli (TGF-β); cell damage; severe DNA damage; time required for DNA repair; p53; p14; p15; p16; p21; Gadd45; PCNA (not cycle specific); cyclins D1, 2, 3; Cdk; E2F; Rb and its phosphorylation; TNF; Fas; transcriptional activation; DNA repair; apoptosis; cell-cycle phases G0, G1, S, G2, M; outcome: cancer]
The Pathogenesis of Cancer • [Figure: panels labeled Normal, BCH, DYS, CIS, SCC]