CS273 Algorithms for Structure and Motion in Biology. Instructors: Serafim Batzoglou and Jean-Claude Latombe. Teaching Assistant: Sam Gross. Emails: | serafim | latombe | ssgross | @ cs.stanford.edu. Spring 2006 – http://www.stanford.edu/class/cs273/
Range of Bio-CS Interaction: enormous range over space and time • Body system: robotic surgery • Tissue/organs: soft-tissue simulation and surgical training • Cells: simulation of cell interaction • Molecules: molecular structures, similarities, and motions • Genes: sequence alignment • Scope of CS273: the molecule and gene levels
Focus on Proteins • Proteins are the workhorses of all living organisms • They perform many vital functions, e.g.: • Catalysis of reactions • Transport of molecules • Building blocks of muscles • Storage of energy • Transmission of signals • Defense against intruders
Proteins are also of great interest from a computational viewpoint • They are large molecules (from a few hundred to several thousand atoms) • They are made of building blocks (amino acids) drawn from a small “library” of 20 amino acids • They have an unusual kinematic structure: a long serial linkage (the backbone) with short side chains
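To make the "long serial linkage" concrete, here is a minimal forward-kinematics sketch, assuming a toy planar chain: each link has a fixed length and each joint a rotation angle relative to the previous link. Real backbones are three-dimensional and are usually described by torsion angles, so the 2D model and the function name `chain_positions` are illustrative assumptions only.

```python
import math

def chain_positions(lengths, angles):
    """Forward kinematics of a planar serial chain.

    Toy 2D model (assumption): real protein backbones are 3D chains.
    lengths[k] is the length of link k; angles[k] is the joint angle
    (radians) of link k relative to the previous link. Returns the (x, y)
    position of every joint, starting at the origin.
    """
    if len(lengths) != len(angles):
        raise ValueError("need one angle per link")
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        heading += angle  # rotations accumulate along the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points
```

Because rotations accumulate, changing one angle near the start of the chain moves every downstream joint, which is why small local changes in a long backbone can produce large global motions.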
Proteins are associated with many challenging problems • Predict folded structures and motion pathways • Understand why some proteins misfold or only partially fold, causing diseases such as cystic fibrosis, Parkinson's disease, and Creutzfeldt-Jakob disease (related to mad cow disease) • Find structural similarities among proteins and classify proteins • Find functional structural motifs in proteins • Predict how proteins bind to other proteins and to smaller molecules • Design new drugs • Engineer and design proteins and protein-like structures (polymers)
Central Dogma of Molecular Biology: DNA → (transcription) → RNA → (translation) → protein
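The two steps of the dogma can be sketched in a few lines, assuming the standard genetic code, a coding-strand DNA input, and translation that starts at the first base and stops at the first stop codon (no start-codon search, no splicing). The names `transcribe`, `translate`, and `CODON_TABLE` are hypothetical.

```python
import itertools

# Standard genetic code: codons enumerated in T, C, A, G order at each
# position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna: str) -> str:
    """DNA coding strand -> mRNA (same sequence with T replaced by U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from the first base to the first stop."""
    seq = mrna.upper().replace("U", "T")  # reuse the DNA-alphabet table
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate(transcribe("ATGGCCTAA"))` reads the codons AUG, GCC, UAA and yields the two-residue chain `"MA"`.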
Protein Sequence • Long sequence of amino acids (dozens to thousands), also called residues • Dictionary of 20 amino acids (several billion years old)
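Because every protein sequence is a string over the same 20-letter dictionary, a first sanity check on a sequence is whether it uses only the standard one-letter codes. A minimal sketch; the function `residue_composition` is hypothetical, and rejecting nonstandard codes such as `X` or `U` is a design choice here, not a rule.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")

def residue_composition(sequence: str) -> dict:
    """Count residues in a protein sequence written in one-letter codes.

    Raises ValueError on any letter outside the 20-letter standard
    alphabet (nonstandard residues are rejected by choice).
    """
    seq = sequence.upper()
    unknown = set(seq) - STANDARD_AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return dict(Counter(seq))
```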
Protein Sequence: adjacent residues are linked by peptide bonds (partial double-bond character)
Central Dogma of Molecular Biology. Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure
Levels of Protein Structure. Quaternary structure: hemoglobin (4 polypeptide chains)
[Figure: protein structures that are mostly α-helices, mostly β-sheets, or mixed]
Folding: from the unfolded (denatured) state to the folded (native) state, via many pathways and intermediate states
How (we think) a protein folds ... $\Delta G = \Delta H - T\Delta S$ http://www-shakh.harvard.edu/ProFold2.html
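Written out, the folding criterion behind the free-energy equation is the following, with $\Delta G = G_{\text{native}} - G_{\text{unfolded}}$; at the physiological temperature of 37°C, $T \approx 310\,\text{K}$. The second step assumes $\Delta H$ and $\Delta S$ do not depend on temperature, which is only an approximation.

$$\Delta G < 0 \iff \Delta H - T\Delta S < 0 \iff \Delta H < T\Delta S$$

$$\text{If } \Delta H < 0 \text{ and } \Delta S < 0: \quad \Delta H < T\Delta S \iff T < \frac{\Delta H}{\Delta S} = T_m$$

Dividing by the negative $\Delta S$ flips the inequality, so under these assumptions the native state is favored below the melting temperature $T_m$, both states are equally stable at $T = T_m$, and the unfolded state is favored above it.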
Motion of Proteins in the Folded State: HIV-1 protease
Structural variability of the overall ensemble of native ubiquitin structures [Shehu, Kavraki, Clementi, 2005]
Flexible Loop: loop 7 of amylosucrase
Binding • Ligand-protein binding, e.g., inhibitor binding to HIV protease • Protein-protein binding
Binding of Pyruvate to LDH (lactate dehydrogenase; reduction of pyruvate to lactate) [Figure: pyruvate (CH3COCOO−) in the LDH binding site with the coenzyme NADH (nicotinamide adenine dinucleotide); labeled residues GLN-101, ARG-106, ASP-166, ARG-169, HIS-193, ASP-195, THR-245 and a loop]
What is CS273 about? • Algorithms and computational schemes for molecular biology problems • Molecular biology seen by computer scientists
The Shock of Two Cultures • y = f(x) • Biologists like experiments, specifics, and classifications. They would rather know many pairs (x_i, y_i), i.e., facts, and classify them than know f • Computer scientists like simulation, abstractions, and general algorithms. They want to know f, the explanation of the facts, and efficient ways to compute it, but rarely care about any particular (x_i, y_i) • One challenge of computational biology is to fuse these two cultures
Two Views of a BioComputation Class • Where IT resources for biology are available and how to use them • How to design efficient data structures and algorithms for biology
[Figure: Sequence → Structure → Function; sequence similarity and structure similarity]
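Structure similarity is often quantified by the root-mean-square deviation (RMSD) between corresponding atoms after the two structures are optimally superimposed. The sketch shows one standard way to do this, the Kabsch algorithm, assuming NumPy is available and that the atom correspondence (e.g., matching C-alpha atoms) is already known. `kabsch_rmsd` is a hypothetical name, and this is not necessarily the metric used in the course.

```python
import numpy as np

def kabsch_rmsd(P, Q) -> float:
    """RMSD between two conformations after optimal rigid superposition.

    P and Q are (n, 3) arrays of corresponding atom coordinates (assumed
    already matched). Translation is removed by centering; the best
    rotation comes from an SVD of the 3x3 covariance matrix (Kabsch).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (n, 3)")
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # -1 would mean a reflection
    D = np.diag([1.0, 1.0, d])               # force a proper rotation
    R = Vt.T @ D @ U.T                       # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

The sign correction matters: without it, a structure and its mirror image could be reported as identical, even though proteins are chiral.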
Main Ideas Behind CS273 • The information is in the sequence • Sequence → Structure (shape) → Function • Sequence similarity → Structural/functional similarity • Sequences are related by evolution • Biomolecules move and bind to achieve their functions • Deformation → folded structures of proteins • Motion + deformation → multi-molecule complexes • One cannot just “jump” from sequence to function • CS273 is about algorithms for sequence, structure, and motion: - Finding sequence and shape similarities - Relating structure to function - Extracting structure from experimental data - Computing and analyzing motion pathways
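Finding sequence similarity usually starts from alignment. As one illustration, here is the classic Needleman-Wunsch dynamic program for the global alignment score. It assumes a linear gap penalty and simple match/mismatch scores chosen for illustration rather than a substitution matrix such as BLOSUM62, and it is not presented as the course's method; `global_alignment_score` is a hypothetical name.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The default scores are illustrative assumptions. Runs in O(len(a) *
    len(b)) time and keeps only two rows of the DP table.
    """
    n, m = len(a), len(b)
    # prev[j] = best score aligning the current prefix of a with b[:j]
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,              # align a[i-1] with b[j-1]
                          prev[j] + gap,     # gap in b
                          curr[j - 1] + gap) # gap in a
        prev = curr
    return prev[m]
```

For instance, aligning `"AC"` with `"AGC"` needs one gap; with these defaults two matches and one gap give a score of 0.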
Vision Underlying CS273 • Goal of computational biology: low-cost, high-bandwidth in-silico biology • Requirements: reliable models and efficient algorithms • Algorithmic efficiency by exploiting properties of molecules and processes: • Proteins are long kinematic chains • Atoms cannot bunch up together • Forces have relatively short ranges • Computational biology is more than applying computers to biological problems or mimicking nature (e.g., performing molecular dynamics (MD) simulations)
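The last two properties can be exploited directly. One common technique (often called a cell list or grid hashing) finds all atom pairs within a cutoff distance: atoms are bucketed into cubic cells whose side equals the cutoff, so only neighboring cells need to be compared. This is a sketch under those assumptions; `close_pairs` is a hypothetical name, and the near-linear running time holds only because atom density is bounded.

```python
import math
from collections import defaultdict
from itertools import product

def close_pairs(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, of atoms closer than cutoff.

    Atoms are hashed into cubic cells of side `cutoff`, so each atom is
    compared only with atoms in its own cell and the 26 neighboring cells.
    Since atoms cannot bunch up, each cell holds a bounded number of atoms
    and the search is roughly linear instead of O(n^2).
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = tuple(math.floor(c / cutoff) for c in (x, y, z))
        grid[key].append(idx)
    pairs = []
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbors = grid.get((cx + dx, cy + dy, cz + dz), [])
            for i in members:
                for j in neighbors:
                    # i < j ensures each pair is reported exactly once
                    if i < j and math.dist(coords[i], coords[j]) < cutoff:
                        pairs.append((i, j))
    return sorted(pairs)
```

Any two atoms closer than the cutoff differ by less than one cell in every coordinate, so restricting the search to adjacent cells loses no pairs.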
Instructors and TAs • Instructors: • Serafim Batzoglou • Jean-Claude Latombe • TA: • Sam Gross • Emails: | serafim | latombe | ssgross | @ cs.stanford.edu • Class website: http://cs273.stanford.edu
Expected Work • Regular attendance at lectures and active participation • Class scribing (assignments will depend on the number of students) • Exciting programming project: http://www.stanford.edu/class/cs273/project/project.html - Structure prediction - Clustering and distance metrics - Protein design - Something else