1: Introduction Biological Sequence Analysis Spring 2008
Administration • Teachers: Nimrod Rubinstein (rubi@post.tau.ac.il), Dudu Burstein (davidbur@post.tau.ac.il) • Tel: 03-640-9245 • Reception hours: by appointment • Course website: http://bioinfo.tau.ac.il/~intro_bioinfo/mta
Requirements • Home assignments: 25% • Midterm quiz: 25% • Final project: 50% • All assignments must be submitted on time • Do not copy!
Goals • To familiarize students with research topics in sequence analysis in bioinformatics, and with the relevant tools in this field. Prerequisites • Familiarity with topics in molecular biology (cell biology and genetics) • Basic familiarity with computers and the internet
Ask, Ask, Ask!! "A bashful person does not learn."
What is Bioinformatics? • “The analysis of biological information using computers and statistical techniques; the science of developing and utilizing computer databases and algorithms to accelerate and enhance biological research” (www.niehs.nih.gov/dert/trc/glossary.htm)
What do bioinformaticians study? • Bioinformatics today is part of almost every molecular biology research project • To name a few examples…
Example 1 • Compare proteins with similar sequences (for instance, kinases) and understand what the similarities and differences mean
Example 2 • Look at the genome and predict where genes are located (promoters; transcription factor binding sites; introns; exons)
Example 3 • Predict the 3-dimensional structure of a protein from its primary sequence • Ab-initio prediction: extremely difficult!
Example 4 • Correlate gene expression with disease • A gene chip quantifies gene expression in different tissues under different conditions • May be used for personalized medicine
Computational biology: revolutionizing science at the turn of the century
Three bioinformatics studies that had a major impact on science • Classifying life into domains • Predicting drug resistance in HIV and personalizing drug administration • Solving the mystery of anthrax molecular biology
1. Revolutionizing the Classification of Life
In the very beginning • Life was classified as plants and animals • When bacteria were discovered, they were initially classified as plants • Ernst Haeckel (1866) placed all unicellular organisms in a kingdom called Protista, separated from Plantae and Animalia
When electron microscopes were developed, it was found that Protista in fact include both cells with and cells without a nucleus. Also, fungi were found to differ from plants, since they are heterotrophs (they do not synthesize their own food). Thus, life was classified into 5 kingdoms: Plants, Animals, Protists, Fungi, Procaryotes
Later on, plants, animals, protists and fungi were collectively called the Eucarya domain, and the procaryotes were shifted from being a kingdom to being the Bacteria domain. Domains: Bacteria, Eucarya. Kingdoms (within Eucarya): Plants, Animals, Protists, Fungi. Even later, a new domain was discovered…
rRNA was sequenced from a great number of organisms to study phylogeny • The translation apparatus is universal and probably already existed in the “beginning”
Carl R. Woese and rRNA phylogeny
A distance matrix was computed over every pair of organisms. In a very influential paper (1977), Woese and co-workers showed that methanogenic bacteria are as distant from bacteria as they are from eucaryota
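The pairwise distance-matrix idea can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length and uses the fraction of differing positions (p-distance) as a stand-in measure. Both the measure and the three toy sequences are illustrative assumptions, not the method or data of the 1977 study.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(seqs):
    """Return {(name1, name2): distance} for every unordered pair of organisms."""
    return {(m, n): p_distance(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}

# Hypothetical toy alignment (made-up sequences, not real rRNA data).
toy = {
    "organism_A": "ACGUACGUAC",
    "organism_B": "ACGUACGAAC",
    "organism_C": "UCGAACGAUC",
}

for pair, d in distance_matrix(toy).items():
    print(pair, round(d, 2))
```

A tree-building method would then take such a matrix as input; organisms separated by large distances in every comparison end up on deep, separate branches.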
One sentence about methanogenic “bacteria”: “There exists a third kingdom which, to date, is represented solely by the methanogenic bacteria, a relatively unknown class of anaerobes that possess a unique metabolism based on the reduction of carbon dioxide to methane. These "bacteria" appear to be no more related to typical bacteria than they are to eucaryotic cytoplasms.”
From sequence analysis only, it was thus established that life is divided into 3 domains: Bacteria, Archaea, and Eucarya
The rRNA phylogenetic tree
2. Revolutionizing HIV treatment
There are very efficient drugs for AIDS treatment. [Figure: many viruses in blood → drug, a few days → a few viruses in blood → drug, a few more days → many viruses in blood]
Explanation: the virus mutates, and some viruses become resistant to the drug. Solution 1: a combination of drugs (cocktail). Solution 2: do not give drugs to which the virus is already resistant, for example when the patient was infected by a person who receives a specific drug. The question: how does one know to which drugs the virus is already resistant?
Sequences of HIV-1 from patients who were treated with drug A:
AAGACGCATCGATCGATCGATCGTACG
ACGACGCATCGATCGATCGATCGTACG
AAGACACATCGATCGTTCGATCGTACG
Sequences of HIV-1 from patients who were never treated with drug A:
AAGACGCATCGATCGATCGATCTTACG
AAGACGCATCGATCGATCGATCTTACG
AAGACGCATCGATCGATCGATCTTACG
This is an easy example! Position 23 is G in every drug A+ sequence and T in every drug A- sequence.
drug A+:
AAGACGCATCGATCGATCGATCGTACG
ACGACGCATCGATCGATCGATCGTACG
AAGACACATCGATCATTCGATCATACG
drug A-:
AAGACGCATCGATCTATCGATCTTACG
AAGACGCATCGATCTATCGATCTTACG
AAGACGCATCGATCAATCGATCGTACG
This is NOT an easy example! It is an example of a classification problem.
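The difference between the two examples can be checked with a short Python sketch. It looks for alignment columns in which the treated and untreated groups share no nucleotide at all. The function name `discriminating_positions` is a hypothetical helper introduced here for illustration; the sequences are copied from the slides.

```python
def discriminating_positions(group_a, group_b):
    """Return 0-based alignment columns where the two groups share no nucleotide."""
    length = len(group_a[0])
    if any(len(s) != length for s in group_a + group_b):
        raise ValueError("all sequences must have the same length")
    positions = []
    for i in range(length):
        # A column separates the groups if no nucleotide occurs in both of them.
        if {s[i] for s in group_a}.isdisjoint({s[i] for s in group_b}):
            positions.append(i)
    return positions

# Easy example (sequences from the slides).
easy_treated = [
    "AAGACGCATCGATCGATCGATCGTACG",
    "ACGACGCATCGATCGATCGATCGTACG",
    "AAGACACATCGATCGTTCGATCGTACG",
]
easy_untreated = ["AAGACGCATCGATCGATCGATCTTACG"] * 3

# Hard example (sequences from the slides).
hard_treated = [
    "AAGACGCATCGATCGATCGATCGTACG",
    "ACGACGCATCGATCGATCGATCGTACG",
    "AAGACACATCGATCATTCGATCATACG",
]
hard_untreated = [
    "AAGACGCATCGATCTATCGATCTTACG",
    "AAGACGCATCGATCTATCGATCTTACG",
    "AAGACGCATCGATCAATCGATCGTACG",
]

print(discriminating_positions(easy_treated, easy_untreated))  # [22]
print(discriminating_positions(hard_treated, hard_untreated))  # []
```

In the easy example a single column (index 22, i.e. position 23) separates the groups. In the hard example no single column does, so a rule has to combine evidence from several positions, which is what a classifier learns.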
2006: Five machine learning tools were compared: • Decision trees • Linear regression • Linear discriminant analysis • Neural networks • Support vector regression. Result: ~80% accuracy
3. Revolutionizing our understanding of the anthrax molecular mechanism
• Anthrax is a disease whose causative agent is the Gram-positive bacterium Bacillus anthracis • It infects mainly cattle, swine, and horses, but it can also infect humans • Humans are infected through milk or meat from infected animals • In humans, it causes skin problems; in cattle, fatal blood poisoning
• A vaccine was developed by Pasteur • Koch was the first to isolate the bacterium • Airborne anthrax, such as that caused by weaponized strains used for bioterrorism, is almost always fatal in humans (respiratory distress, hemorrhage)
How does the bacterium Bacillus anthracis work? • It secretes three proteins: protective antigen (PA), edema factor (EF), and lethal factor (LF) • A PA monomer first binds to a host-cell surface receptor • This binding triggers proteolytic cleavage (a part of the N terminus is cut off) • The remaining PA monomers oligomerize, forming heptamers
• LF and EF bind the heptamer, and the entire complex is internalized into an endosome • The acidity in the endosome causes a conformational change in the complex, which helps it penetrate the endosome membrane and form a pore • The story continues…
• Researchers from the group of David Baker wanted to know how LF and EF bind to the heptameric PA. They used a bioinformatics method called docking…
This is where the two proteins interact!
Once they had a prediction, they performed mutagenesis experiments. Changing residues in the predicted interface abolished the binding interaction.
How does docking work? Each candidate 3D conformation of the two proteins together is given a score. The conformation with the best score is chosen
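The score-and-select idea can be sketched in Python with a deliberately simplified toy: rigid shapes on a 2D grid, exhaustive enumeration of rotations and shifts, and a made-up score that rewards contacts and penalizes overlaps. The scoring scheme, the grid representation, and all names are assumptions for illustration only, not the method used in the anthrax study.

```python
from itertools import product

def rotate(cells, quarter_turns):
    """Rotate grid cells by 90 degrees counter-clockwise, quarter_turns times."""
    for _ in range(quarter_turns % 4):
        cells = [(-y, x) for x, y in cells]
    return cells

def score(receptor, ligand):
    """Toy score: +1 per edge-adjacent receptor/ligand cell pair, -10 per overlap."""
    receptor = set(receptor)
    total = 0
    for x, y in ligand:
        if (x, y) in receptor:
            total -= 10  # steric clash
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) in receptor:
                total += 1  # favourable contact
    return total

def dock(receptor, ligand, shifts=range(-4, 5)):
    """Score every rigid pose (rotation + shift) and return (best_score, best_pose)."""
    best_score, best_pose = None, None
    for turns, dx, dy in product(range(4), shifts, shifts):
        pose = [(x + dx, y + dy) for x, y in rotate(ligand, turns)]
        s = score(receptor, pose)
        if best_score is None or s > best_score:
            best_score, best_pose = s, pose
    return best_score, best_pose

# Hypothetical U-shaped receptor with a pocket at (1, 1), and a one-cell ligand.
receptor = [(0, 0), (1, 0), (2, 0), (0, 1), (2, 1)]
ligand = [(0, 0)]

print(dock(receptor, ligand))  # (3, [(1, 1)])
```

Even this toy hints at the difficulty: the number of poses grows quickly with finer shifts and rotations, and the result is only as good as the scoring function.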
Challenges: • What is the best score? • How can as many conformations as possible be examined? • How can the flexibility of proteins be taken into account?
Genomics: historical chronicle • 1866: Gregor Mendel, laws of inheritance, “gene” • 1953: Watson and Crick, DNA discovery • 2003: Genome Project
Genome Project 2003
(Slide from Prof. Ron Shamir)
Bioinformatics: the marriage of Computer Science and Biology • Organize, store, analyze, visualize genomic data • Utilizes methods from Computer Science, Mathematics, Statistics and Biology (Slide from Prof. Ron Shamir)
Bioinformatics • At the convergence of two revolutions: the ultra-fast growth of biological data, and the information revolution • 22 Aug 2005: 100,000,000,000 bases • Biology is becoming an information science (Slide from Prof. Ron Shamir)
Bioinformatics: a short CV • Born ~1990 • Has grown rapidly • Experience: an essential part of modern biomedical sciences • Now a separate multidisciplinary scientific area • One of the cornerstones of 21st-century biomedical research (Slide from Prof. Ron Shamir)
The Bioinformatics Actors • Academic research: where it all started • Biotechnology companies • Big pharma companies • National and international centers • Find me gene (gin?)
Bioinformatics in Israel • World-class player in research • Ranked 2-3 in absolute number of papers in the most prestigious and competitive conferences • Maintaining our competitive global position is nontrivial (Slide from Prof. Ron Shamir)