Welcome to CS374: Algorithms in Biology
Overview • Administrivia • Molecular Biology and Computation • DNA, proteins, cells, evolution • Some examples of CS in biology • Computer Scientists vs Biologists
CS374: Algorithms in Biology, cs374.stanford.edu • Attendance • At most 2 classes may be missed without affecting your grade • Lectures • Most important requirement • Select an available topic & day, and send an email to Serafim and George • Read the papers and meet with Serafim 1-2 weeks before the lecture • Ask George any questions about the papers while preparing your presentation • Schedule a long (2 hr) meeting with Serafim the day before the lecture • Slides are due at noon before the lecture
Scribing • Please sign up on a first-come, first-served basis • Due 1 week after the lecture; edited & distributed 2 weeks after the lecture • George will help you edit • Summaries • Select 1 lecture among the first 10 and 1 lecture among the rest • Find one relevant paper • Write a 1-page summary of the paper, with: • Paper reference • Abstract • Discussion • Ask George if you have questions or want feedback • Have fun!
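Structure of DNA • A double helix; each nucleotide consists of a phosphate group, a sugar, and a nitrogenous base: A, C, G, or T • Bases pair across the two strands: A with T, C with G [Figure: DNA double helix with paired bases; photos captioned "Physicist" and "Ornithologist"]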
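In the double helix, A pairs with T and C pairs with G, so either strand determines the other. Because the two strands run antiparallel, the partner strand read in the usual 5'→3' direction is the *reverse* complement. A minimal sketch, assuming uppercase A/C/G/T input only; `reverse_complement` is an illustrative name, not from any library.

```python
# Watson-Crick pairing: A <-> T, C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read 5'->3' like the input.

    Assumes the input contains only A, C, G, T (any case).
    """
    # Complement each base, then reverse because the strands are antiparallel.
    return strand.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCCG"))  # prints CGGCAT
```

Applying the function twice returns the original strand, which is a handy sanity check.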
DNA to RNA, and genes • DNA: ~3×10⁹ nucleotides long in humans; contains ~22,000 genes • RNA: carries the “message” for “translating”, or “expressing”, one gene • DNA → RNA (transcription) → protein (translation) → 3-D structure (folding)
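Transcription and translation are mechanical enough to express directly in code. A minimal sketch, assuming the input is the coding (sense) strand of DNA, read from its first base, with no introns and only A/C/G/T letters. It uses the standard genetic code, compactly encoded as a 64-letter string ('*' = stop); `transcribe` and `translate` are illustrative names, not from any library.

```python
# Standard genetic code: codons ordered with bases T, C, A, G at
# positions 1, 2, 3; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table is keyed on DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break  # stop codon: end of the protein
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))  # prints AUGGCCUAA MA
```

Given the reading frame, translation is just a table lookup per codon; the hard computational problems lie in finding where the genes are and how the protein folds.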
Structure of proteins • Composed of a chain of amino acids, each of the form H2N-CH(R)-COOH, where the side chain R is one of 20 possible groups • The sequence of amino acids folds to form a complex 3-D structure • The structure of a protein is intimately connected to its function
21st Century [Figure: screens full of raw DNA sequence, e.g. AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGG...]
Computational Biology • Organize & analyze massive amounts of biological data • Enable biologists to use data • Form testable hypotheses • Discover new biology
Some examples of the central role of CS • 1. Sequencing 3×10⁹ nucleotides • Sequencing reads are only ~500 nucleotides long • A big puzzle: ~60 million pieces • Computational fragment assembly • Introduced ~1980 • 1995: assemble DNA pieces up to 1,000,000 nucleotides long • 2000: assemble the whole human genome
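Fragment assembly solves the puzzle by gluing together reads whose ends overlap. The toy greedy sketch below assumes error-free reads from a single strand and no repeats. It repeatedly merges the pair of reads with the longest suffix/prefix overlap (at least a hypothetical `min_len` bases). It illustrates the overlap idea only. Real assemblers must also cope with sequencing errors, repeats, and reads from both strands, and greedy merging can give a wrong answer when the genome contains repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        # Look for b's first min_len bases somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that the rest of a from there matches the start of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by maximal overlap; return the resulting contigs."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining contigs stay separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    toy_reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(toy_reads))  # prints ['ATTAGACCTGCCGGAATAC']
```

The toy reads are made up; with 60 million real reads, the all-pairs overlap loop above would be far too slow, which is one reason assembly became a serious algorithmic problem.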
Complete genomes today More than 300 complete genomes have been sequenced
2. Gene Finding • Where are the genes? • In humans: ~22,000 genes, ~1.5% of human DNA
[Figure: a gene from 5’ to 3’: start codon ATG, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, stop codon TAG/TGA/TAA; introns are bounded by splice sites. Sequence shown: atg caggtg ggtgag cagatg ggtgag cagttg ggtgag caggcc ggtgag tga]
2. Gene Finding • Topics in CS374: • Finding noncoding RNA genes • Finding short words that regulate the expression of genes
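A coding region starts at the start codon ATG and ends at an in-frame stop codon (TAG, TGA, or TAA), so a classic first approximation to gene finding is to scan for open reading frames (ORFs). The sketch below makes several assumptions. It scans one strand only, in all three reading frames. It assumes no introns, so it suits unspliced, prokaryote-like sequences, not the spliced human gene in the figure. The `min_codons` threshold is a hypothetical parameter. Practical gene finders typically rely on probabilistic models of exons, introns, and splice sites instead.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) pairs of ORFs on the given strand.

    An ORF runs from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon. Scanning resumes after each
    ORF, so ATGs nested inside an ORF are not reported separately.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] == "ATG":
                # Walk forward codon by codon looking for a stop.
                for end in range(pos + 3, len(dna) - 2, 3):
                    if dna[end:end + 3] in STOP_CODONS:
                        if (end - pos) // 3 >= min_codons:
                            orfs.append((pos, end + 3))
                        pos = end  # continue scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop: nothing more in this frame
            pos += 3
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGCC"
    for start, end in find_orfs(seq):
        print(start, end, seq[start:end])  # prints 2 14 ATGAAATTTTAG
```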
DNA to RNA, and genes (revisited) • DNA → RNA (transcription) → protein (translation: easy) → 3-D structure (folding)
3. Protein Folding • The amino-acid sequence of a protein determines its 3D fold • The 3D fold of a protein determines its function • Can we predict the 3D fold of a protein given its amino-acid sequence? • Holy grail of compbio: a 35-year-old problem • Molecular dynamics, robotics, machine learning, computational geometry • Topics on Proteins in CS374 • 1. Protein Structure • Protein Structure Comparison • Evolution of Protein Domains • Molecular Dynamics & Drug Targets • Protein Classification • Protein Folding Dynamics • Protein Kinetics • 2. Protein Comparison • Latest multiple alignment tools • Selecting parameters for alignment • Phylogenetic trees
Complete Genomes More than 200 complete genomes have been sequenced
Evolution at the DNA level [Figure: a DNA sequence is passed on to the next generation with mutations, some marked OK and others X; still OK?]
4. Sequence Comparison • Sequence conservation implies function • Sequence comparison is key to: • Finding genes • Determining function • Uncovering the evolutionary processes
Sequence Comparison: Alignment • A query sequence is compared against a database (DB), e.g.:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
• Aligned (| = match, x = mismatch, - = gap):
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
 || |||||||  ||||x|  ||x|||  |||||
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
• Sequence alignment: introduced ~1970 • BLAST: 1990, one of the most cited papers in science • Still a very active area of research
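The classic way to compute an optimal global alignment is dynamic programming (the Needleman-Wunsch approach). The sketch below makes some assumptions. The scores (+1 match, -1 mismatch, -1 per gap, end gaps charged like interior ones) are arbitrary illustrative choices. BLAST itself uses a faster heuristic seed-and-extend search rather than filling this full table.

```python
def needleman_wunsch(x: str, y: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment by dynamic programming.

    Returns (score, aligned_x, aligned_y), with '-' marking gaps.
    """
    n, m = len(x), len(y)
    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(a: str, b: str) -> int:
        return match if a == b else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(x[i - 1], y[j - 1]),
                              score[i - 1][j] + gap,   # gap in y
                              score[i][j - 1] + gap)   # gap in x

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score[n][m], "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    s, a, b = needleman_wunsch("AGGCTATCACCTGACCTCCAGGCCGATGCCC",
                               "TAGCTATCACGACCGCGGTCGATTTGCCCGAC")
    print(s)
    print(a)
    print(b)
```

Under these scores, the alignment shown on the slide scores 24 - 2 - 11 = 11, so the optimum found here is at least 11. The table has (n+1)(m+1) cells, which is why whole-database searches use heuristics such as BLAST.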
Comparison of Human, Mouse, and Rat • Topics on Genomics in CS374 • Indexing Large Databases • Newest BLAST techniques • Repeat Detection • Genomic Rearrangements • Finding the order of shuffles between two genomes
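Indexing a large database lets a search jump straight to candidate matches instead of scanning everything. Below is a sketch of the basic idea behind seed-based search such as BLAST's first stage. It records every length-k word of the database with its positions, then looks up each word of the query. The word length `k` and the function names are illustrative assumptions. BLAST itself adds neighborhood words, score thresholds, and extension of seeds into alignments, none of which appear here.

```python
from collections import defaultdict


def build_kmer_index(db: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word of the database to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(db) - k + 1):
        index[db[pos:pos + k]].append(pos)
    return dict(index)


def find_seeds(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs where a length-k word matches exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


if __name__ == "__main__":
    idx = build_kmer_index("ACGTACGT", k=4)
    print(find_seeds("TACGT", idx, k=4))  # prints [(0, 3), (1, 0), (1, 4)]
```

The index is built once, and each query lookup then costs roughly one dictionary access per query word.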
5. Clustering of Microarrays: Clinical prediction of Leukemia type • 2 types • Acute lymphoid (ALL) • Acute myeloid (AML) • Different treatment & outcomes • Predict type before treatment? • Bone marrow samples: ALL vs AML • Measure amount of each gene
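Once each bone-marrow sample is summarized as a vector of expression levels, one per gene, grouping the samples becomes a clustering problem. The sketch below uses k-means, one of many possible clustering methods and not necessarily the one used in the leukemia study. The four 3-gene "samples" in the usage are made-up toy numbers, not real microarray data.

```python
import math
import random


def kmeans(samples: list[list[float]], k: int, iters: int = 100,
           seed: int = 0) -> list[int]:
    """Cluster expression vectors into k groups; return a cluster label per sample."""
    rng = random.Random(seed)
    # Start from k distinct samples chosen at random as centroids.
    centroids = [list(s) for s in rng.sample(samples, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each sample goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(s, centroids[c])) for s in samples
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical toy data: rows are samples, columns are genes.
    toy = [[9.0, 1.0, 8.5], [8.7, 1.2, 9.1], [1.1, 9.0, 0.9], [0.8, 8.8, 1.3]]
    print(kmeans(toy, k=2))
```

Clustering is unsupervised. It finds groups without using the ALL/AML labels. Predicting the type of a new sample from labeled examples would instead be a classification problem.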
6. Protein networks • Newer research area • Construct networks from multiple data sources • Navigate networks • Compare networks across organisms • Statistics • Machine learning • Graph algorithms • Databases • Topics on Protein Networks in CS374 • 1. Integration: build networks from multiple sources • 2. Alignment: compare networks across species • Mathematical properties: modular, scale-free
7. Human evolution • Topics on Human Population Genetics in CS374 • 1. Evolution: finding fast-evolving genes in human populations • 2. Migration: tracing the migration of humans out of Africa by genetic studies
The abstract submission deadline is 11:59 pm, Sunday, October 1, 2006.
Computer scientists vs Biologists • (almost) Nothing is ever true or false in Biology • Everything is true or false in computer science
Computer scientists vs Biologists • Biologists strive to understand the complicated, messy natural world • Computer scientists seek to build their own clean and organized virtual worlds
Computer scientists vs Biologists • Biologists are obsessed with being the first to discover something • Computer scientists are obsessed with being the first to invent or prove something
Computer scientists vs Biologists • Biologists are comfortable with the idea that all data have errors • Computer scientists are not
Computer scientists vs Biologists • Computer scientists get high-paid jobs after graduation • Biologists typically have to complete one or more 5-year post-docs...
Computer Science is to Biology what Mathematics is to Physics • “Antedisciplinary” Science: What is computational biology? http://compbiol.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pcbi.0010006