341: Introduction to Bioinformatics
Dr. Nataša Pržulj & Dr. Peter Rice
Department of Computing, Imperial College London
natasha@imperial.ac.uk
Course overview
Motivation:
• Explosion in the availability of biological data:
  • Sequences and microarrays (Dr. Rice)
  • Protein 3D structure
  • Networks, e.g., of protein interactions; expected to be as useful as the sequence data in uncovering new biology (Dr. Pržulj)
• The goal of systems biology:
  • Systems-level understanding of biological systems, e.g., the cell
  • Analyze not only the individual components, but also their interactions and the functioning of the system as a whole
  • E.g., learn new biology from the topology of such interaction networks
• However, biological data analysis research faces considerable challenges:
  • Incomplete and noisy data
  • Computational intractability of many (e.g., graph-theoretic) problems
Course overview
We will cover:
• Sequence analysis (Dr. Peter Rice)
• Microarray analysis (Dr. Peter Rice)
• Graph theoretic aspects:
  • Fundamental topics in graph theory (e.g., basic graph notation, graph representation, and special graph types)
  • Basic graph algorithms (e.g., graph search/traversal algorithms and running time analysis)
  • Important computational complexity concepts (e.g., complexity classes, subgraph isomorphism, and NP-completeness), which pose challenges for analyzing biological networks
• Protein 3D structure
• Biological networks aspects:
  • Basic biological concepts (e.g., DNA, genes, proteins, gene expression, …)
  • Different types of biological networks
  • Experimental techniques for acquiring the data and their biases
  • Public databases and other sources of biological network data
• Existing approaches for analyzing and modeling biological networks:
  • Structural properties of large networks
  • Network models
  • Network clustering
  • Network alignment
  • Software tools for network analysis
• Applications – data analysis: interplay of topology and biology
  • Learn how the above methods have been applied
  • Discuss valuable insights that have been gained into biological function, evolution, complex diseases (e.g., cancer) and drug discovery
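To make the graph representation and traversal topics listed above concrete, here is a minimal Python sketch. It assumes a toy undirected network stored as an adjacency list; the node names `P1` to `P5` and the function name `bfs_distances` are hypothetical and not taken from the course material.

```python
from collections import deque

# Hypothetical toy network: nodes are made-up protein names,
# edges are undirected interactions stored as an adjacency list.
network = {
    "P1": ["P2", "P3"],
    "P2": ["P1", "P4"],
    "P3": ["P1", "P4"],
    "P4": ["P2", "P3", "P5"],
    "P5": ["P4"],
}

def bfs_distances(graph, source):
    """Breadth-first search: shortest-path length (in edges) from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distances:  # first visit yields the shortest distance
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances

distances = bfs_distances(network, "P1")
print(distances)  # {'P1': 0, 'P2': 1, 'P3': 1, 'P4': 2, 'P5': 3}
```

Each node and each edge is examined a constant number of times, so the running time is linear in the number of nodes plus edges.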
Course overview
• Grading scheme:
  • One coursework assignment
    • Given out on Feb 21 by email and posted to the class website
    • Due on Thursday, March 6, by 2pm
  • Written exam
  • The standard DoC grading scheme will be used, as described in the Degree Regulations at https://www.doc.ic.ac.uk/internal/teachingsupport/regulations/index.htm
  • Other departments: we provide coursework and exam marks, and they decide on the weighting for the final grade
Course overview
External students – getting onto DoC CATE etc.:
1) Apply at: https://dbc.doc.ic.ac.uk/externalreg/
2) Then, your department's endorser will approve/reject your application
3) If approved, DoC's External Student Liaison will approve/reject your application
4) If approved (again!):
  ➢ Students will get access to DoC resources (DoC account, CATE, …)
  ➢ No access after a few days? Check the status of the approval and contact the relevant person(s)
● Key dates:
  ➢ Exam registration opens at the end of January for 2-3 weeks
  ➢ Exams for DoC 3rd/4th yr. courses take place at the end of the term in which the course is taught
● If in doubt, read the guidelines available at the link above :-)
Course overview
• Course organization:
  • Lectures
    • Relevant theoretical concepts and examples
  • Tutorials
    • Exercises on the concepts covered in class
  • One coursework assignment
    • Opportunity to solve problems using the methods learned in class
  • Written exam
    • Testing students’ understanding of the concepts learned in lectures
• Tutorial helpers:
  • Anida Sarajlic (a.sarajlic12@imperial.ac.uk)
  • Dr. Noel Malod-Dognin (n.malod-dognin@imperial.ac.uk)
  • Vuk Janjic (v.janjic11@imperial.ac.uk)
Course overview
• Textbooks and readings
• Recommended textbooks:
  • Pevzner and Shamir, “Bioinformatics for Biologists,” Cambridge University Press, 2011.
  • Junker and Schreiber, “Analysis of Biological Networks,” Wiley, 2008.
  • West, “Introduction to Graph Theory,” 2nd edition, Prentice Hall, 2001, or Cormen et al., “Introduction to Algorithms,” 3rd edition, MIT Press, 2009.
  • A list of up-to-date research papers selected by the instructor: see http://www.doc.ic.ac.uk/~natasha/course2012/class_material.html
• Recommended readings:
  • F. Kepes (Author, Editor), “Biological Networks (Complex Systems and Interdisciplinary Science),” World Scientific Publishing Company, 1st edition, 2007.
  • Bornholdt and Schuster (Editors), “Handbook of Graphs and Networks: From the Genome to the Internet,” Wiley, 2003, or Dorogovtsev and Mendes (Authors), “Evolution of Networks: From Biological Nets to the Internet and WWW (Physics),” Oxford University Press, 2003.
  • Chapter 17 from: Chen and Lonardi (Editors), “Biological Data Mining,” Chapman and Hall/CRC Press, 2009.
  • Chapter 4 from: Jurisica and Wigle (Editors), “Knowledge Discovery in Proteomics,” CRC Press, 2005.
  • Mehlhorn and Näher, “LEDA: A Platform for Combinatorial and Geometric Computing,” Cambridge University Press, 1999.
Course overview
• When and where:
  • Fridays, 9-11h (LT 308) and 14-16h (LT 145)
  • Huxley
• Contact:
  • E-mail: natasha@imperial.ac.uk
  • Subject: “341 Bioinformatics”
• Office hours:
  • Fridays after class, 4pm
  • Office: 407 C Huxley
Course overview
• Prerequisites: no formal ones, but
  • General computational/mathematical maturity
  • Basic programming skills are desirable
  • An introduction to biological concepts will be provided
• Course website (curriculum, class material, etc.):
  • http://www.doc.ic.ac.uk/~natasha/course2012/index.html (also linked from CATE)
• Academic code of honor
Topics
• Introduction: biology (Dr. Pržulj, 1 lecture)
• Sequence analysis (Dr. Rice, 2 lectures)
• Microarray analysis (Dr. Rice, 3 lectures)
• Introduction to graph theory (Dr. Pržulj, 2 lectures)
• Protein 3D structure (Dr. Malod-Dognin, 2 lectures)
• Network biology (Dr. Pržulj, 8 lectures):
  • Network properties
  • Network/node centralities
  • Network motifs
  • Network models
  • Network/node clustering
  • Network comparison/alignment
  • Software tools for network analysis
  • Interplay between topology and biology
Course overview • Any questions so far?
Course overview • About you…
Introduction: biology
• Cell: the building block of life
• Cytoplasm and organelles separated by membranes:
  • Mitochondria, nucleus, etc.
Introduction: biology
• Distinguish between:
  • Prokaryotes
    • Single-celled, with no cell nucleus or any other membrane-bound organelles
    • The genetic material in prokaryotes is not membrane-bound
    • The bacteria and the archaea
    • Model organism: E. coli
  • Eukaryotes
    • Have "true" nuclei containing their DNA
    • May be unicellular, as in amoebae
    • May be multicellular, as in plants and animals
    • Model organism: S. cerevisiae (baker’s yeast)
Introduction: biology
• Nucleus contains DNA
  • Deoxyribonucleic acid
  • DNA nucleotides: A, T, C and G; A pairs with T, and C pairs with G
  • DNA structure: double helix
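The pairing of A with T and C with G means that one strand of the double helix determines the other. A minimal Python sketch, assuming the usual convention that the two strands run in opposite directions (so the partner strand is written reversed); the example sequence and the function name `reverse_complement` are made up for illustration.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand):
    """Return the partner strand, written in the conventional 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

strand = "ATGCGT"  # made-up example sequence
partner = reverse_complement(strand)
print(partner)  # ACGCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.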
Introduction: biology
• Chromosomes
• RNA: similar to DNA, except that T is replaced by U and RNA is single-stranded
Introduction: biology
• Main role of DNA: long-term storage of genetic information
• Genes: DNA segments that carry this information
  • Intron: part of a gene that is not translated into protein; it is spliced out of the RNA transcript
  • Exon: part of a gene retained after splicing; the mRNA translated into protein consists only of exon-derived sequences
• Genome: the complete set of an organism's DNA, including all of its genes
• Every cell (except sex cells and mature red blood cells) contains the complete genome of an organism
Introduction: biology
• Codons: sets of three nucleotides
  • 4 nucleotides → 4^3 = 64 possible codons
  • Each codon codes for an amino acid (or signals a stop)
  • 64 codons produce 20 different amino acids
  • More than one codon can code for the same amino acid
• Polypeptide:
  • String of amino acids, composed from a 20-character alphabet
• Proteins:
  • Composed of one or more polypeptide chains (70-3000 amino acids)
  • Sequence of amino acids is defined by a gene
• Gene expression: information transmission from DNA to proteins
• Proteome: total set of proteins in an organism
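The codon arithmetic above can be checked by enumeration. The sketch below assumes the standard genetic code, written as the conventional 64-letter string with bases in T, C, A, G order, where `*` marks a stop codon; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All ordered triplets over a 4-letter alphabet: 4^3 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# How many codons code for each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in codon_table.values() if aa != "*")
stop_codons = sorted(codon for codon, aa in codon_table.items() if aa == "*")

print(len(codons))                 # 64
print(len(codons_per_amino_acid))  # 20
print(stop_codons)                 # ['TAA', 'TAG', 'TGA']
print(codons_per_amino_acid["L"])  # 6 (leucine has six codons)
```

Since 61 codons are shared among 20 amino acids, most amino acids are necessarily encoded by more than one codon.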
Introduction: biology
• The 20 amino acids
Introduction: biology
• Levels of protein structure
Introduction: biology
• Genes vs. proteins
  • Genes: passive; proteins: active
• Protein synthesis: from genes to proteins
  • Transcription (in nucleus)
  • Splicing (eukaryotes)
  • Translation (in cytoplasm)
Introduction: biology
• Transcription (in nucleus)
  • The RNA polymerase enzyme builds an RNA strand from a gene (the DNA is "unzipped")
  • The gene is transcribed to messenger RNA (mRNA)
  • Transcription is regulated by proteins called transcription factors
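At the sequence level, transcription can be sketched in a few lines of Python. This assumes the input is the coding strand of the gene: RNA polymerase reads the opposite (template) strand, so the resulting mRNA has the same sequence as the coding strand with U in place of T. The function name and the example sequence are made up.

```python
def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand: same sequence, with T replaced by U."""
    return coding_strand.upper().replace("T", "U")

mrna = transcribe("ATGGCCTAA")  # made-up example gene fragment
print(mrna)  # AUGGCCUAA
```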
Introduction: biology
• Splicing (eukaryotes)
  • Regions that do not code for proteins (introns) are removed from the transcribed sequence
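Splicing can be sketched as keeping the exon segments of a transcript and discarding everything in between. The sketch assumes the exon positions are already known and given as half-open `(start, end)` index pairs; the sequence, the coordinates and the function name `splice` are invented for illustration.

```python
def splice(pre_mrna, exons):
    """Join the exon segments, given as half-open (start, end) index pairs; the rest is intron."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Made-up transcript: exon 1 + intron + exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUCAG" + "UUUUAA"
exons = [(0, 6), (17, 23)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGGCCUUUUAA
```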
Introduction: biology
• Translation (in cytoplasm)
  • Ribosomes synthesize proteins from mRNA
  • mRNA is decoded and used as a template to guide the synthesis of a chain of amino acids that form a protein
  • Translation: the process of converting the mRNA codon sequences into an amino acid polypeptide chain
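Translation can be sketched as reading the mRNA three nucleotides at a time and looking up each codon. The sketch assumes the standard genetic code (written as the conventional 64-letter string in U, C, A, G order, `*` for stop) and the usual convention that reading starts at the first AUG and ends at the first stop codon; the input sequence and the function name `translate` are made up.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS))

def translate(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    polypeptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        polypeptide.append(amino_acid)
    return "".join(polypeptide)

protein = translate("GGAUGGCCUUUUAAGG")  # codons read: AUG GCC UUU UAA
print(protein)  # MAF
```

The result is written in the one-letter amino acid alphabet: M (methionine), A (alanine), F (phenylalanine).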
Introduction: biology
• Microarrays:
  • Measure mRNA abundance for each gene
  • The amount of transcribed mRNA correlates with gene expression:
    • The rate at which a gene produces the corresponding protein
• It is hard to measure protein level directly!
Introduction: biology
• Every cell* contains the complete genome of an organism
• How is the variety of different tissues encoded and expressed?
* Except sex cells and mature red blood cells
Introduction: biology
• 22,000?
Introduction: biology
• -ome and -omics
  • Genome and genomics
  • Proteome and proteomics
  • …