Bioinformatics and sequence analysis

Bioinformatics and sequence analysis. Michael Nilges, Unité de Bio-Informatique Structurale, Institut Pasteur, Paris, March 2002. Overview: Bioinformatics: a brief overview; Organising knowledge: databanks and databases; Protein sequence analysis; Sequence alignment.


Presentation Transcript


  1. Bioinformatics and sequence analysis. Michael Nilges, Unité de Bio-Informatique Structurale, Institut Pasteur, Paris, March 2002

  2. Overview • Bioinformatics: a brief overview • Organising knowledge: databanks and databases • Protein sequence analysis • Sequence alignment • Multiple alignment and sequence profiles • Phylogenetic trees Bioinformatics lecture March 5, 2002

  3. I. Bioinformatics - a brief overview

  4. What is it? Bioinformatics: deduction of knowledge by computer analysis of biological data. Or: see the 20000 pages on this issue on the WWW.

  5. The data • information stored in the genetic code (DNA) • protein sequences • 3D structures • experimental results from various sources • patient statistics • scientific literature

  6. Algorithmic developments • Important part of research in bioinformatics: methods for • data storage • data retrieval • data analysis

  7. Interdisciplinary research • rapidly developing branch of biology • highly interdisciplinary: using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics • many practical applications in biology and medicine

  8. Computation in biology... • similar to other sciences: • computational physics, computational chemistry • derivation of physics laws from astronomical data • already in the '20s biologists wanted to derive knowledge by induction • reasons for recent development: • development of computers and networks • availability of data (sequences, 3D structures) • amount of data

  9. Why? • An avalanche of data: • Sequences • Function related • Structures • requires computational approaches

  10. Genomics • New way to perform experiments: • accumulation of data: • sequences • structures • function-related • not hypothesis-driven • Hypothesis formed later and tested in silico

  11. Bioinformatics key areas • e.g. homology searches • organisation of knowledge (sequences, structures, functional data)

  12. Structural Bioinformatics • Prediction of structure from sequence • secondary structure • homology modelling, threading • ab initio 3D prediction • Analysis of 3D structure • structure comparison/alignment • prediction of function from structure • molecular mechanics/molecular dynamics • prediction of molecular interactions, docking • Structure databases (RCSB)

  13. Structural Bioinformatics

  14. II. Databases

  15. Organizing knowledge in databanks and databases • Introduction • Sequence databanks and databases • EMBL, SwissProt, TrEMBL • SRS: Sequence Retrieval System • 3D structure database: the RCSB - PDB • Domain databases

  16. Biological databanks and databases • Very fast growth of biological data • Diversity of biological data: • primary sequences • 3D structures • functional data • Database entry usually required for publication • Sequences • Structures • Database entry may replace primary publication • genomic approaches

  17. DNA sequence databases • Three databanks exchange data on a daily basis • Data can be submitted and accessed at any of the three sites • GenBank • www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html • EMBL • www.ebi.ac.uk/embl/index.html • DNA DataBank of Japan (DDBJ) • www.nig.ac.jp/home.html

  18. EMBL database growth

  19. Distribution of entries

  20. EMBL database documentation • Information on • user manual • release notes • feature table definition... • see http://www.ebi.ac.uk/embl/Documentation

  21. EMBL entry for insulin receptor

  22. EMBL entry 2: features

  23. EMBL entry 3: sequence

  24. SwissProt protein sequence database; TrEMBL: translated EMBL • hosted jointly by EBI (European Bioinformatics Institute, an EMBL outstation in Hinxton, UK) and SIB (Swiss Institute of Bioinformatics, in Lausanne and Geneva) • SwissProt is curated (Amos Bairoch) • quality checks • annotations • links to other databases • TrEMBL: automatic translation of EMBL • automatic annotations
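The idea of translating a nucleotide coding sequence into a protein sequence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard genetic code, assumes the input is already a coding sequence in the correct reading frame, and the function name `translate` is hypothetical. It is not a description of how TrEMBL itself is produced.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids listed for codons in the order
# TTT, TTC, TTA, TTG, TCT, ... (third base varies fastest). '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence (hypothetical helper, frame 1 only).

    Translation stops at the first stop codon; codons containing
    unknown characters are reported as 'X'.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy input, not taken from any database entry.
toy_protein = translate("ATGAAATGGTGA")
```

A real pipeline would additionally read the annotated coding regions from the feature table of each nucleotide entry and choose the genetic code appropriate for the organism.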

  25. ExPASy (www.expasy.org): Expert Protein Analysis System

  26. FASTA format • one line header, starting with > • some programs require several characters without space after > • sequence, in free format (no numbers)
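The FASTA rules listed on this slide (a header line starting with `>`, followed by sequence lines in free format) can be read with a short parser. The following is a minimal sketch, not a reference implementation: the function name `parse_fasta` and the toy records are hypothetical, and taking the first space-free token after `>` as the identifier is an assumption that matches the note about programs expecting characters without a space.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {identifier: sequence} for FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Identifier: first whitespace-delimited token after '>'.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Free format: join sequence lines and drop internal spaces.
            records[current_id] += "".join(line.split())
    return records


# Toy records for illustration only (not real database entries).
example = """>toy_seq1 hypothetical example record
MKTAYIAKQR QISFVKSHFS
RQLEERLGLI
>toy_seq2
ACGT ACGT
""".splitlines()

records = parse_fasta(example)
```

Keeping only the first token as the key means the rest of the header (the free-text description) is discarded; a fuller parser would store it alongside the sequence.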

  27. SWISS-PROT entry for insulin receptor (NiceProt view)

  28. Features of insulin receptor

  29. NiceProt Feature Aligner

  30. ClustalW alignment of two domains

  31. Links to other sites (Blast, ...)

  32. The RCSB-PDB (www.rcsb.org/pdb) • Databases for 3D structures of biological macromolecules (proteins, nucleic acids) • RCSB (Research Collaboratory for Structural Bioinformatics) maintains and develops the PDB (Protein Data Bank) • others: • MSD (EBI): msd.ebi.ac.uk • MMDB (NCBI): www.ncbi.nlm.nih.gov/Structure/

  33. www.rcsb.org/pdb

  34. Results of a simple query

  35. View structures

  36. Domain databases • Pfam (A/B): www.sanger.ac.uk/Pfam • SMART: smart.embl-heidelberg.de • ProDom: prodes.toulouse.inra.fr/prodom/doc/prodom.html • DART: www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps • InterPro: www.ebi.ac.uk/interpro/

  37. InterPro • InterPro release 4.0 (Nov 2001) was built from • Pfam 6.6, PRINTS 31.0, PROSITE 16.37, ProDom 2001.2, SMART 3.1, TIGRFAMs 1.2, SWISS-PROT + TrEMBL data. • 4691 entries: 1068 domains, 3532 families, 74 repeats and 15 post-translational modification sites.

  38. Results of InterPro search for spectrin

  39. Spectrin repeat

  40. SMART database

  41. Domain architecture of spectrin beta chain

  42. Pfam home page

  43. Compilations of links to databases • at Institut Pasteur: www.pasteur.fr/recherche/banques • at Infobiogen (Evry): www.infobiogen.fr/services/deambulum/fr • at the European Bioinformatics Institute (EBI): www.ebi.ac.uk/Databases/index.html • at the Swiss Institute of Bioinformatics (SIB): www.expasy.org, www.expasy.org/alinks.html#Proteins

  44. SRS: Sequence Retrieval System • unified way to access and link information in different databases • powerful queries • launch applications (e.g. blast, clustalw...) • temporary and permanent projects • can be reached from the Pasteur databank page • srs.pasteur.fr/cgi-bin/srs6/wgetz

  45. SRS 6 start page
