Bioinformatics A Biologist’s perspective Rob Rutherford. 1. The Biologist’s perspective 2. A survey of tools 3. Training students for the future.
1. The Biologist’s perspective 2. A survey of tools 3. Training students for the future
If the biota, in the course of eons, has built something …..who but a fool would discard seemingly useless parts? To keep every cog and wheel is the first precaution of intelligent tinkering. -Aldo Leopold (1887 - 1948)
Figure 1.18 Careful observation and measurement provide the raw data for science
Productive Tinkerers: PubMed had 400,000 new research articles entered in 2002 (NCBI-NLM, 2003).
“If your experiment needs statistics, you ought to have done a better experiment.”-Rutherford (the other one)
“To consult the statistician after an experiment is finished is … to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.” RA Fisher 1956, University of Adelaide Archives
2 Wings of Bioinformatics • Housekeeping Bioinformatics: representation, storage, and distribution of data • Analytical Bioinformatics: new tools for the discovery of knowledge in data
“The Central Dogma” • DNA: Information Warehouse (4 nucleic acid letters atgc) • RNA: Temporary copy of a gene • Protein: Working Cellular Machine (20 amino acid letters) • Figure: RNA polymerase (PDB)
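The flow on this slide can be sketched in a few lines of Python. This is only a toy illustration, assuming the standard genetic code and a coding-strand DNA input; the function names `transcribe` and `translate` and the example sequence are made up for this sketch and do not come from the talk.

```python
# Toy illustration of the Central Dogma: DNA -> RNA -> protein.
# Assumption: standard genetic code; all names here are hypothetical.

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(dna):
    """Make the temporary RNA copy of a gene (coding strand: T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read the RNA three letters at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "atggcctaa"  # made-up example sequence
    rna = transcribe(gene)
    print(rna, "->", translate(rna))
```

The 4-letter alphabet becomes a 20-letter alphabet through a fixed lookup table, which is why the table is built once and reused.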
A Survey of Problems • Finding Genes and Understanding Genes • Protein Structure and Function • Gene Expression • Networks • Other areas
Human Genes Rutherford
                   10        20        30        40
           ....*....|....*....|....*....|....*....|
consen   1 SPKNTPVVLIPKKGPGKYRPISlvDYKILNKATKKrFSpp 40
1MML    83 SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH-- 117
1HNI_B  54 NPYNTPVFAIKKKDSTKWRKLV--DFRELNKRTQD-FWev 90
1MU2_B  49 NPYNTPTFAIKKKDKNKWRMLI--DFRELNKVTQD-FTei 85
1D1U_A  69 SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH-- 103

                   50        60        70        80
           ....*....|....*....|....*....|....*....|
consen  41 qPGFRPGRSLLNKLKGS-KWFLKLDLKKAFDSIPHDPLLR 79
1MML   118 -PTVPNPYNLLSGLPPShQWYTVLDLKDAFFCLRLHPTSQ 156
1HNI_B  91 qLGIPHPAGL-----KKKKSVTVLDVGDAYFSVPLDEDFR 125
1MU2_B  86 qLGIPHPAGL---AKK--RRITVLDVGDAYFSIPLHEDFR 120
1D1U_A 104 -PTVPNPYNLLSGLPPShQWYTVLDLKDAFFCLRLHPTSQ 142

Cn3D HIV
Finding Conserved Regions/Domains HIV protein Comparing your sequence versus models derived from curated known protein families
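As a rough illustration of what "conserved" means in an alignment, the sketch below lists the columns where every aligned sequence carries the same residue. It is a deliberately simplified stand-in: the curated family models mentioned on the slide use position-specific scores, not a plain identity check. The function name `conserved_columns` is hypothetical, and the rows are the four structure sequences from the first alignment block shown earlier.

```python
# Toy conservation check on a multiple sequence alignment.
# Assumption: a column is "conserved" only if all rows share one residue
# (case-insensitive) and none has a gap. Real domain searches score
# columns with position-specific matrices instead.

SLIDE_ROWS = [
    "SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--",  # 1MML
    "NPYNTPVFAIKKKDSTKWRKLV--DFRELNKRTQD-FWev",  # 1HNI_B
    "NPYNTPTFAIKKKDKNKWRMLI--DFRELNKVTQD-FTei",  # 1MU2_B
    "SPWNTPLLPVKKPGTNDYRPVQ--DLREVNKRVED-IH--",  # 1D1U_A
]


def conserved_columns(rows):
    """Return 1-based column numbers where every row has the same residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    conserved = []
    for pos, column in enumerate(zip(*rows), start=1):
        residues = {c.upper() for c in column}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(pos)
    return conserved


if __name__ == "__main__":
    print(conserved_columns(SLIDE_ROWS))
```

Column numbers are 1-based so they can be read directly against the ruler above the alignment.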
Phylogenetics and Evolution Thanks to Porterfield
Protein Structure • Imaging: experimental X-ray diffraction data • Predicting structure in silico from sequence
Structure is Function: HIV reverse transcriptase • DNA (human genome) • RNA (HIV virus) • Protein (Goodsell, PDB)
Figure 17.0 Ribosome Structural Predictions just from raw protein sequence?
1 ggcacgaggc acggctgtgc aggcacgcat gcaggccagc …. Figure 17.0 Ribosome 1 atctgcacgt ggttatgctg ccggagtttg ggccgccact….
An example: CASP, the community-wide Critical Assessment of techniques for protein Structure Prediction. Every two years, a contest tests protein structure prediction from primary sequence.
Gene Expression • Sequencing RNA (ESTs) • Sequencing bits of ESTs (SAGE) • Automation of in situ • DNA microarray technology
MicroArray One spot for each gene
Microarray Expression Analysis Reference Mixture Specific Organ
Figure: microarray heat map. Axes: Experimental Conditions (SigE, SigH, IdeR, NrpR, H2O2, Iron, NO, NO, SDS, Diamide, Low O2) and 4000 Genes, with Dormancy Genes marked. Legend: Gene turned on, Gene turned off.
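One common way to decide whether a spot counts as "turned on" or "turned off" is to compare the sample intensity with the reference intensity on a log2 scale. The sketch below shows that idea only; the two-fold cutoff, the function names, and the intensity values are assumptions made for illustration and are not taken from the talk.

```python
import math

# Toy two-channel microarray call: sample (e.g. a specific organ or
# condition) versus a reference mixture.
# Assumption: a two-fold change is the cutoff; real analyses also
# normalize the data and test for statistical significance.


def log2_ratio(sample, reference):
    """Log2 of sample over reference; 0 means no change."""
    return math.log2(sample / reference)


def call_gene(sample, reference, fold=2.0):
    """Label one spot as 'on', 'off', or 'unchanged'."""
    ratio = log2_ratio(sample, reference)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "on"
    if ratio <= -cutoff:
        return "off"
    return "unchanged"


if __name__ == "__main__":
    # Made-up intensities: (sample, reference) for three spots.
    spots = {"geneA": (400.0, 100.0), "geneB": (100.0, 400.0), "geneC": (120.0, 100.0)}
    for name, (sample, reference) in spots.items():
        print(name, call_gene(sample, reference))
```

The log scale treats a four-fold increase and a four-fold decrease symmetrically (+2 and -2), which is why heat maps of this kind are usually colored by log ratio.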
Metabolic Pathway Map Building Transcriptional Network Map
Networks • Biochemical Pathways • Signaling Networks • Transcriptional Networks • Computational Neuroscience
Microarrays uncover networks of interactions… Scientific American 2001
Other Opportunities • Organismal Physiology • Populations • Communities • Ecosystems
Same issues in “Macro” Biology • Long history of mathematical modeling • Huge datasets from GPS/GIS and remote sensing
If the biota, in the course of eons, has built something …..who but a fool would discard seemingly useless parts? To keep every cog and wheel is the first precaution of intelligent tinkering. -Aldo Leopold (1887 - 1948)
Dr. Peter Munson, Head of the Mathematical and Statistical Computing Laboratory, Division of Computational Biosciences, National Institutes of Health. Ole’ pre 1976
The Tool Builders • Excellent mathematical skills (algorithms, linear algebra, data structures) • Comfortable in a Linux/Unix environment, with knowledge of Perl and C/C++ • A deep background in 2+ advanced areas of biology, with chemistry prerequisites • Graduate training
The systems biologist. Biologist who is an intelligent and skeptical consumer of large data sets • Probability and Statistics • SQL and database basics • Equilibrium and rates of change (Calculus) • Exposure to system level data • And who knows how and when to collaborate(!)