
MBV3070

MBV3070. Bioinformatics. Reading list for MBV3070 - Bioinformatics. Arthur M. Lesk: Introduction to Bioinformatics. Oxford University Press 2002. 270 pages. In addition: Tom Kristensen: Sekvenssammenstillinger (Sequence alignments). 7 pages.

kareem

Presentation Transcript


  1. MBV3070 Bioinformatics

  2. Reading list for MBV3070 - Bioinformatics
     • Arthur M. Lesk: Introduction to Bioinformatics. Oxford University Press 2002. 270 pages
     • In addition:
         • Tom Kristensen: Sekvenssammenstillinger (Sequence alignments). 7 pages.
         • Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680.
         • Higgins, D.G., Thompson, J.D. and Gibson, T.J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol., 266:383-402.
         • ??? (Gene finding)
         • ???? (Microarrays)

  3. Course schedule
     • Introduction. Sequencing.
     • Databases. Entrez and SRS. Dot plots
     • Pairwise sequence alignment
     • FASTA and BLAST
     • Multiple sequence alignment. ClustalW/ClustalX
     • Motifs, profiles, PSI-BLAST
     • Phylogeny
     • Genomes. Analysis of genomic DNA. Gene finding
     • Microarrays (Ola Myklebost/Ole Chr. Lindgjærde)
     • Protein modelling
     • Protein modelling
     • Protein modelling (Vincent Eijsink)

  4. Useful websites for MBV3070
     • Course home page: http://www.uio.no/studier/emner/matnat/molbio/MBV3070/v04/
     • Textbook home page: http://www.oup.com/uk/lesk/bioinf/

  5. What is bioinformatics?
     The NIH Biomedical Information Science and Technology Initiative Consortium agreed on the following definitions of bioinformatics and computational biology, recognizing that no definition could completely eliminate overlap with other activities or preclude variations in interpretation by different individuals and organizations.
     Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
     Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

  6. Other ways of defining bioinformatics
     • "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information." Fredj Tekaia, Institut Pasteur
     • "The use of computers to store, retrieve, analyze or predict the composition or the structure of biomolecules." Damian Counsell, bioinformatics.org

  7. “For the last three and a half billion years, evolution has been taking notes.”
     “It tries experiments. It wakes up every morning, does a little mutagenesis, changes a nucleotide here and there, and sees how it works. If it’s a success, it keeps the notes. In this notebook, we have all of the information of the greatest experimental tinkerer ever.”
     Dr. Eric Lander, Director of the Whitehead Institute/MIT Center for Genome Research

  8. What does this mean?

  9. Base symbols
     Symbol  Meaning
     A       Adenine
     C       Cytosine
     G       Guanine
     T       Thymine
     U       Uracil
     R       Guanine / Adenine (puRine)
     Y       Cytosine / Thymine (pYrimidine)
     K       Guanine / Thymine (Keto)
     M       Adenine / Cytosine (aMino)
     S       Guanine / Cytosine (Strong)
     W       Adenine / Thymine (Weak)
     B       Guanine / Thymine / Cytosine (not A)
     D       Guanine / Adenine / Thymine (not C)
     H       Adenine / Cytosine / Thymine (not G)
     V       Guanine / Cytosine / Adenine (not T)
     N       Adenine / Guanine / Cytosine / Thymine

  10. Why ambiguous symbols?
      • Sequencing instruments cannot always read the sequence unambiguously
      • Ambiguous symbols are useful in consensus sequences
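
One way to see why ambiguity symbols are handy in consensus sequences is to turn a consensus into a search pattern. The sketch below is illustrative only: the function names `consensus_to_regex` and `find_sites` and the TATAWA example are assumptions made here, not part of the course material. The symbol table itself follows the base symbols listed on the slide.

```python
import re

# Ambiguity codes from the base symbol table (symbol -> bases it may stand for).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def consensus_to_regex(consensus):
    """Turn a consensus sequence with ambiguity codes into a regular expression."""
    parts = []
    for symbol in consensus.upper():
        bases = IUPAC[symbol]
        # Unambiguous symbols stay literal; ambiguous ones become a character class.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)

def find_sites(consensus, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets overlapping matches be reported as well.
    pattern = re.compile(f"(?={consensus_to_regex(consensus)})")
    return [m.start() for m in pattern.finditer(sequence.upper())]

print(consensus_to_regex("GAYR"))                    # GA[CT][AG]
print(find_sites("TATAWA", "ggTATAAAccTATATAgg"))    # [2, 10]
```

A single consensus string such as TATAWA thus stands for both TATAAA and TATATA, which is exactly what a consensus sequence needs to express.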

  11. The genetic code

  12. The genetic code

  13. Amino acid symbols
      1-letter  3-letter  Amino acid
      A         Ala       alanine
      B         Asx       aspartic acid or asparagine
      C         Cys       cysteine
      D         Asp       aspartic acid
      E         Glu       glutamic acid
      F         Phe       phenylalanine
      G         Gly       glycine
      H         His       histidine
      I         Ile       isoleucine
      K         Lys       lysine
      L         Leu       leucine
      M         Met       methionine
      N         Asn       asparagine
      P         Pro       proline
      Q         Gln       glutamine
      R         Arg       arginine
      S         Ser       serine
      T         Thr       threonine
      U         Sec       selenocysteine
      V         Val       valine
      W         Trp       tryptophan
      X         Xaa       unknown or 'other' amino acid
      Y         Tyr       tyrosine
      Z         Glx       glutamic acid or glutamine (or substances such as 4-carboxyglutamic acid and 5-oxoproline that yield glutamic acid on acid hydrolysis of peptides)
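
Converting between the two notations is a plain table lookup. The following is a minimal sketch under the assumption that three-letter residues are separated by hyphens; the helper names `to_one_letter` and `to_three_letter` are made up for illustration.

```python
# Three-letter to one-letter codes, copied from the amino acid symbol table.
THREE_TO_ONE = {
    "Ala": "A", "Asx": "B", "Cys": "C", "Asp": "D", "Glu": "E",
    "Phe": "F", "Gly": "G", "His": "H", "Ile": "I", "Lys": "K",
    "Leu": "L", "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q",
    "Arg": "R", "Ser": "S", "Thr": "T", "Sec": "U", "Val": "V",
    "Trp": "W", "Xaa": "X", "Tyr": "Y", "Glx": "Z",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}

def to_one_letter(peptide, sep="-"):
    """'Met-Thr-Tyr' -> 'MTY' (case-insensitive on input)."""
    return "".join(THREE_TO_ONE[res.capitalize()] for res in peptide.split(sep))

def to_three_letter(peptide, sep="-"):
    """'MTY' -> 'Met-Thr-Tyr'."""
    return sep.join(ONE_TO_THREE[res] for res in peptide.upper())

print(to_one_letter("Met-Thr-Tyr-Glu-Leu"))   # MTYEL
print(to_three_letter("MTYEL"))               # Met-Thr-Tyr-Glu-Leu
```

An unknown symbol raises a `KeyError`, which is usually preferable to silently dropping a residue.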

  14. Two ways of sequencing
      • Shotgun sequencing: This is the strategy chosen by Celera for commercial sequencing of the human genome
      • Ordered sequencing (top down): This strategy was used in the "public" sequencing of the genome, in an international collaboration

  15. Top-down strategy for sequencing

  16. Two ways of sequencing the genome
      BAC to BAC Sequencing: The BAC to BAC approach first creates a crude physical map of the whole genome before sequencing the DNA. Constructing a map requires cutting the chromosomes into large pieces and figuring out the order of these big chunks of DNA before taking a closer look and sequencing all the fragments.
      Whole Genome Shotgun Sequencing: The shotgun sequencing method goes straight to the job of decoding, bypassing the need for a physical map. Therefore, it is much faster.

  17. Fragmenting the genome (figure panels: BAC to BAC Sequencing, Whole Genome Shotgun Sequencing)

  18. Cloning the fragments (figure panels: BAC to BAC Sequencing, Whole Genome Shotgun Sequencing)

  19. Placing the BAC clones on the map (figure panels: BAC to BAC Sequencing, Whole Genome Shotgun Sequencing; this step is not needed in shotgun sequencing)

  20. Subclones from the BAC clones (figure panels: BAC to BAC Sequencing, Whole Genome Shotgun Sequencing; this step is not needed in shotgun sequencing)

  21. Sequencing the clones (figure panels: BAC to BAC Sequencing, Whole Genome Shotgun Sequencing)

  22. Raw sequence from a sequencing instrument

  23. Building contiguous sequences (figure panels: BAC to BAC Sequencing, Whole Genome Shotgun Sequencing)

  24. Assembling individual sequences into larger sequences
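
The idea of assembling overlapping reads into a longer contiguous sequence can be sketched with a toy greedy merger. This is not the method used by any real assembler or by either genome project. It ignores sequencing errors, reverse complements and repeats, and the function names and the `min_len` threshold are assumptions made for this illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

# Three overlapping reads taken from the first 20 bases of the GenBank example.
reads = ["cgttatttaag", "ttaaggtgtt", "ggtgttacat"]
print(greedy_assemble(reads))   # ['cgttatttaaggtgttacat']
```

Real genomes contain repeats longer than a read, which is where a purely greedy strategy breaks down and where the physical map of the BAC to BAC approach helps.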

  25. DNA sequencing 2001

  26. Biological databases
      • Primary databases (archival)
          • GenBank, EMBL, DDBJ, PDB
      • Secondary databases (curated)
          • PIR, SwissProt and everything else

  27. Database Categories List
      http://www3.oup.co.uk/nar/database/c/
      • Genomics Databases (non-vertebrate)
      • Human and other Vertebrate Genomes
      • Human Genes and Diseases
      • Metabolic and Signaling Pathways
      • Microarray Data and other Gene Expression Databases
      • Nucleotide Sequence Databases
      • Other Molecular Biology Databases
      • Protein sequence databases
      • Proteomics Resources
      • RNA sequence databases
      • Structure Databases
      In all 548 databases, 162 more than one year ago

  28. GenBank entry
      LOCUS       LISOD         756 bp    DNA             BCT       30-JUN-1993
      DEFINITION  L.ivanovii sod gene for superoxide dismutase.
      ACCESSION   X64011 S78972
      NID         g44010
      VERSION     X64011.1  GI:44010
      KEYWORDS    sod gene; superoxide dismutase.
      SOURCE      Listeria ivanovii.
        ORGANISM  Listeria ivanovii
                  Bacteria; Firmicutes; Bacillus/Clostridium group; Bacillaceae;
                  Listeria.
      REFERENCE   1  (bases 1 to 756)
        AUTHORS   Haas,A. and Goebel,W.
        TITLE     Cloning of a superoxide dismutase gene from Listeria ivanovii by
                  functional complementation in Escherichia coli and
                  characterization of the gene product
        JOURNAL   Mol. Gen. Genet. 231 (2), 313-322 (1992)
        MEDLINE   92140371
      REFERENCE   2  (bases 1 to 756)
        AUTHORS   Kreft,J.
        TITLE     Direct Submission
        JOURNAL   Submitted (21-APR-1992) J. Kreft, Institut f. Mikrobiologie,
                  Universitaet Wuerzburg, Biozentrum Am Hubland, 8700 Wuerzburg,
                  FRG

  29. GenBank entry (cont.)
      FEATURES             Location/Qualifiers
           source          1..756
                           /organism="Listeria ivanovii"
                           /strain="ATCC 19119"
                           /db_xref="taxon:1638"
           RBS             95..100
                           /gene="sod"
           gene            95..746
                           /gene="sod"
           CDS             109..717
                           /gene="sod"
                           /EC_number="1.15.1.1"
                           /codon_start=1
                           /transl_table=11
                           /product="superoxide dismutase"
                           /protein_id="CAA45406.1"
                           /db_xref="SWISS-PROT:P28763"
                           /translation="MTYELPKLPYTYD…
           terminator      723..746
                           /gene="sod"
      BASE COUNT      247 a    136 c    151 g    222 t
      ORIGIN
              1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
             61 gtaatttctt
      //
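
The sequence block between ORIGIN and // is easy to read programmatically: each line carries a position number followed by blocks of ten bases. The sketch below is a hand-rolled reader written for illustration, not a full GenBank parser, and `read_origin` and `base_count` are hypothetical names. Because the slide shows only the first 70 of the 756 bases, the counts it produces will not match the BASE COUNT line of the full entry.

```python
from collections import Counter

# The part of the sequence shown on the slide (truncated after 70 bases).
FRAGMENT = """\
ORIGIN
        1 cgttatttaa ggtgttacat agttctatgg aaatagggtc tatacctttc gccttacaat
       61 gtaatttctt
//
"""

def read_origin(lines):
    """Join the sequence lines between ORIGIN and // into one string."""
    sequence, in_origin = [], False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("ORIGIN"):
            in_origin = True
        elif stripped.startswith("//"):
            break  # termination line: end of the entry
        elif in_origin:
            # Keep letters only: drops the position number and the spaces.
            sequence.append("".join(ch for ch in stripped if ch.isalpha()))
    return "".join(sequence)

def base_count(sequence):
    """Count each base, in the spirit of the BASE COUNT line."""
    return Counter(sequence.lower())

seq = read_origin(FRAGMENT.splitlines())
print(len(seq))          # 70
print(base_count(seq))
```

Running `base_count` on the complete 756 bp sequence would be a simple consistency check against the BASE COUNT line of the entry.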

  30. EMBL database entry
      EMBL:TRBG361
      ID   TRBG361    standard; RNA; PLN; 1859 BP.
      XX
      AC   X56734; S46826;
      XX
      SV   X56734.1
      XX
      DT   12-SEP-1991 (Rel. 29, Created)
      DT   15-MAR-1999 (Rel. 59, Last updated, Version 9)
      XX
      DE   Trifolium repens mRNA for non-cyanogenic beta-glucosidase
      XX
      KW   beta-glucosidase.
      XX
      OS   Trifolium repens (white clover)
      OC   Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
      OC   Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; Rosidae;
      OC   eurosids I; Fabales; Fabaceae; Papilionoideae; Trifolieae; Trifolium.
      XX

  31. EMBL database entry (cont.)
      RN   [5]
      RP   1-1859
      RX   MEDLINE; 91322517.
      RA   Oxtoby E., Dunn M.A., Pancoro A., Hughes M.A.;
      RT   "Nucleotide and derived amino acid sequence of the cyanogenic
      RT   beta-glucosidase (linamarase) from white clover (Trifolium repens L.).";
      RL   Plant Mol. Biol. 17:209-219(1991).
      XX
      RN   [6]
      RP   1-1859
      RA   Hughes M.A.;
      RT   ;
      RL   Submitted (19-NOV-1990) to the EMBL/GenBank/DDBJ databases.
      RL   M.A. Hughes, UNIVERSITY OF NEWCASTLE UPON TYNE, MEDICAL SCHOOL, NEW CASTLE
      RL   UPON TYNE, NE2 4HH, UK
      XX
      DR   AGDR; X56734; X56734.
      DR   MENDEL; 11000; Trirp;1162;11000.
      DR   SWISS-PROT; P26204; BGLS_TRIRP.
      XX

  32. EMBL database entry (cont.)
      FH   Key             Location/Qualifiers
      FH
      FT   source          1..1859
      FT                   /db_xref="taxon:3899"
      FT                   /organism="Trifolium repens"
      FT                   /tissue_type="leaves"
      FT                   /clone_lib="lambda gt10"
      FT                   /clone="TRE361"
      FT   CDS             14..1495
      FT                   /db_xref="SWISS-PROT:P26204"
      FT                   /note="non-cyanogenic"
      FT                   /EC_number="3.2.1.21"
      FT                   /product="beta-glucosidase"
      FT                   /protein_id="CAA40058.1"
      FT                   /translation="MDFIVAIFALFVISSFTITSTNAVEASTLLDIGNLSRSSFPRGFI
      FT                   FGAGSSAYQFEGAVNEGGRGPSIWDTFTHKYPEKIRDGSNADITVDQYHRYKEDVGIMK
      FT                   DQNMDSYRFSI….
      FT   mRNA            1..1859
      FT                   /evidence=EXPERIMENTAL
      XX
      SQ   Sequence 1859 BP; 609 A; 314 C; 355 G; 581 T; 0 other;
           aaacaaacca aatatggatt ttattgtagc catatttgct ctgtttgtta ttagctcatt        60
           cacaattact tccacaaatg cagttgaagc ttctactctt cttgacatag gtaacctgag       120
           tcggagcagt tttcctcgtg

  33. EMBL database fields
      Note that each line begins with a two-character line code, which indicates the type of information contained in the line. The currently used line types, along with their respective line codes, are listed below:
      ID - identification (begins each entry; 1 per entry)
      AC - accession number (>=1 per entry)
      SV - new sequence identifier (>=1 per entry)
      DT - date (2 per entry)
      DE - description (>=1 per entry)
      KW - keyword (>=1 per entry)
      OS - organism species (>=1 per entry)
      OC - organism classification (>=1 per entry)
      OG - organelle (0 or 1 per entry)
      RN - reference number (>=1 per entry)
      RC - reference comment (>=0 per entry)

  34. EMBL database fields (cont.)
      RP - reference positions (>=1 per entry)
      RX - reference cross-reference (>=0 per entry)
      RA - reference author(s) (>=1 per entry)
      RT - reference title (>=1 per entry)
      RL - reference location (>=1 per entry)
      DR - database cross-reference (>=0 per entry)
      FH - feature table header (0 or 2 per entry)
      FT - feature table data (>=0 per entry)
      CC - comments or notes (>=0 per entry)
      XX - spacer line (many per entry)
      SQ - sequence header (1 per entry)
      bb - (blanks) sequence data (>=1 per entry)
      // - termination line (ends each entry; 1 per entry)
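
Because every line starts with a two-character line code, an entry can be split into fields with very little code. The sketch below is a simplified reader written for this illustration, not the official EMBL parser. The name `parse_embl` and the `"seq"` key used for the blank-coded sequence lines are assumptions.

```python
from collections import defaultdict

# A few lines of the TRBG361 entry shown on the slides.
ENTRY = """\
ID   TRBG361    standard; RNA; PLN; 1859 BP.
XX
AC   X56734; S46826;
XX
SV   X56734.1
XX
DE   Trifolium repens mRNA for non-cyanogenic beta-glucosidase
XX
KW   beta-glucosidase.
XX
OS   Trifolium repens (white clover)
//
"""

def parse_embl(text):
    """Group the lines of one EMBL entry by their two-character line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            break  # termination line ends the entry
        code, content = line[:2], line[2:].strip()
        if code == "XX" or not line.strip():
            continue  # spacer lines carry no data
        if code == "  ":
            code = "seq"  # sequence data lines begin with blanks
        fields[code].append(content)
    return dict(fields)

entry = parse_embl(ENTRY)
accessions = [a.strip() for a in entry["AC"][0].rstrip(";").split(";")]
print(accessions)       # ['X56734', 'S46826']
print(entry["OS"])      # ['Trifolium repens (white clover)']
```

Keeping each field as a list of lines preserves multi-line fields such as OC or RT, which can be joined afterwards if a single string is wanted.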

  35. The feature table
      • The overall goal of the feature table design is to provide an extensive vocabulary for describing features in a flexible framework for manipulating them. The Feature Table documentation represents the shared rules that allow the three databases to exchange data on a daily basis.
      • The range of features to be represented is diverse, including regions which:
          • perform a biological function,
          • affect or are the result of the expression of a biological function,
          • interact with other molecules,
          • affect replication of a sequence,
          • affect or are the result of recombination of different sequences,
          • are a recognizable repeated unit,
          • have secondary or tertiary structure,
          • exhibit variation, or
          • have been revised or corrected.

  36. Feature table terminology
      The format and wording in the feature table use common biological research terminology whenever possible. For example, an item in the new feature table such as:
          Key             Location/Qualifiers
          CDS             23..400
                          /product="alcohol dehydrogenase"
                          /gene="adhI"
      might be read as: The feature CDS is a coding sequence beginning at base 23 and ending at base 400, has a product called 'alcohol dehydrogenase' and corresponds to the gene called 'adhI'.

  37. Feature table terminology (cont.)
      A more complex description:
          Key             Location/Qualifiers
          CDS             join(544..589,688..1032)
                          /product="T-cell receptor beta-chain"
                          /partial
      which might be read as: This feature, which is a partial coding sequence, is formed by joining the indicated elements to form one contiguous sequence encoding a product called T-cell receptor beta-chain.
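
The two location forms used in these examples, a simple range and a join of ranges, can be interpreted with a few lines of code. This is a minimal sketch that handles only `a..b` and `join(a..b,c..d,...)`. Other location operators of the feature table format, such as complement, are deliberately left out, and the function names are hypothetical.

```python
import re

def parse_location(location):
    """Return the 1-based, inclusive (start, end) spans of a simple location."""
    inner = location.strip()
    if inner.startswith("join(") and inner.endswith(")"):
        inner = inner[len("join("):-1]
    spans = []
    for part in inner.split(","):
        match = re.fullmatch(r"\s*(\d+)\.\.(\d+)\s*", part)
        if match is None:
            raise ValueError(f"unsupported location: {location!r}")
        spans.append((int(match.group(1)), int(match.group(2))))
    return spans

def feature_length(location):
    """Total number of bases covered by the feature."""
    return sum(end - start + 1 for start, end in parse_location(location))

def extract(sequence, location):
    """Concatenate the spans of the location into one contiguous sequence."""
    # Feature positions are 1-based and inclusive; Python slices are 0-based.
    return "".join(sequence[start - 1:end] for start, end in parse_location(location))

print(parse_location("23..400"))                     # [(23, 400)]
print(parse_location("join(544..589,688..1032)"))    # [(544, 589), (688, 1032)]
print(feature_length("23..400"))                     # 378
print(extract("abcdefghij", "join(2..3,6..8)"))      # bcfgh
```

The CDS at 23..400 spans 378 bases, a multiple of three as expected for a complete coding sequence. The joined T-cell receptor CDS spans 391 bases, which is consistent with its /partial qualifier.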

  38. Feature key examples
      Key            Description
      conflict       Separate determinations of the "same" sequence differ
      rep_origin     Origin of replication
      protein_bind   Protein binding site on DNA
      CDS            Protein-coding sequence
      misc_RNA       Generic label for an undefined RNA
      insertion_seq  Insertion element
      D-loop         Mitochondrial or other D-loop structure
