
Pfam: multiple sequence alignments and HMM-profiles of protein domains

Pfam: multiple sequence alignments and HMM-profiles of protein domains. Xianhui Li, 03-02-2004. Outline: What is Pfam? What is a Hidden Markov model (the methodology underlying Pfam)? How to use Pfam, and sample output.

Presentation Transcript


  1. Pfam: multiple sequence alignments and HMM-profiles of protein domains Xianhui Li 03-02-2004

  2. Outline • What is Pfam? • What is a Hidden Markov model (the methodology underlying Pfam)? • How to use Pfam, and sample output

  3. Pfam • Pfam is a database of multiple alignments of protein domains or conserved protein regions. • The alignments represent evolutionarily conserved structure that has implications for the protein's function. • Profile hidden Markov models (profile HMMs) built from the Pfam alignments can be very useful for automatically recognizing that a new protein belongs to an existing protein family, even when the homology is weak.

  4. Overview of Pfam Database • Pfam-A contains curated families, each with an associated profile HMM that can be used for alignment and database searching. Each family has: • Annotation: contains several compulsory fields. • Seed alignment: a manually verified multiple alignment of a representative set of sequences. • HMM profile: turns the multiple sequence alignment into a position-specific scoring system. • Full alignment: generated automatically by searching Swiss-Prot with the HMM profile built from the seed alignment, then aligning all detectable members to that profile. • Pfam-B families are clustered automatically rather than curated, which makes Pfam more comprehensive.
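The "position-specific scoring system" idea can be illustrated without the full HMM machinery. The sketch below is a simplification, not Pfam's or HMMER's procedure: it assumes an ungapped toy seed alignment (hypothetical), add-one pseudocounts, and a uniform 1/20 background, and converts each column's residue frequencies into log-odds scores. A real profile HMM additionally has insert and delete states and better background and prior models.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(seed, pseudocount=1.0):
    """Turn an ungapped seed alignment into per-column log-odds scores (bits).

    Assumes a uniform background frequency of 1/20 per amino acid.
    """
    length = len(seed[0])
    if any(len(s) != length for s in seed):
        raise ValueError("all seed sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(seed) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in seed)
        # Pseudocounts keep residues unseen in this column from scoring minus infinity.
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, query):
    """Sum the position-specific scores of an ungapped query of the profile's length."""
    if len(query) != len(profile):
        raise ValueError("query length must match the profile length")
    return sum(column[aa] for column, aa in zip(profile, query))

# Hypothetical toy seed alignment.
seed = ["HKWLE", "HKWVE", "HRWLE", "HKFLD"]
profile = build_profile(seed)
print(round(score(profile, "HKWLE"), 2))  # resembles the seed: positive score
print(round(score(profile, "AAAAA"), 2))  # unrelated: negative score
```

The same residue scores differently in different columns, which is what distinguishes a profile from a single substitution matrix.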

  5. Pfam Sequence Database Coverage [Figure: Pfam coverage of sequences and residues; data from Pfam v2.0 (1998), 527 families] • The current version, Pfam 12.0 (January 2004), contains alignments and models for 7316 protein families, based on the Swiss-Prot 42.5 and SP-TrEMBL 25.6 protein sequence databases.

  6. Markov Model [Figure: Begin and End states plus four states, Emit 1 to Emit 4] • Simplest example: each state emits (or, equivalently, recognizes) a particular element with probability 1. Example sequences: 1234, 234, 14, 121214, 2123334
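When every state emits its own symbol with probability 1, nothing is hidden: the observed sequence is the state path, so its probability is just the product of the transition probabilities along it. The slide's actual transition probabilities are not in the transcript, so this sketch assumes they are estimated by counting steps in the five example sequences (maximum likelihood); the function names are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(examples):
    """Maximum-likelihood transition probabilities: count each observed step, normalize per state."""
    counts = defaultdict(Counter)
    for seq in examples:
        path = ["Begin", *seq, "End"]  # each symbol names the state that emitted it
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }

def sequence_probability(seq, transitions):
    """With certain emissions the path is the sequence itself, so multiply its transitions."""
    path = ["Begin", *seq, "End"]
    p = 1.0
    for current, nxt in zip(path, path[1:]):
        p *= transitions.get(current, {}).get(nxt, 0.0)
    return p

examples = ["1234", "234", "14", "121214", "2123334"]
trans = estimate_transitions(examples)
print(sequence_probability("1234", trans))
print(sequence_probability("43", trans))  # 0.0: no example starts in state 4
```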

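  7. Probabilistic Emission [Figure: Begin and End plus four states emitting A (0.8)/B (0.2), C (0.1)/D (0.9), B (0.7)/C (0.3), and C (0.6)/A (0.4); transition probabilities 0.5, 0.5, 0.9, 0.1, 0.25, 0.75, 0.8, 0.2, and 1.0] • If each state defines a set of emission probabilities for the elements, we can no longer be sure which state we are in given a particular element of a sequence. In BCCD, for example, the first B could have been emitted by either the A/B state or the B/C state.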
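The ambiguity can be made concrete by brute force: try every state path and keep those that can emit BCCD. The model below is a reconstruction under assumptions. The state names (AB, CD, BC, CA, after the symbols each state emits) are hypothetical; the edge targets are inferred from the worked sum on slide 8; and the targets of the 0.9 and 0.25 edges are not recoverable from the transcript, so both are assumed to lead to End.

```python
from itertools import product

START = {"AB": 0.5, "BC": 0.5}
TRANS = {
    "AB": {"BC": 0.1, "End": 0.9},    # assumption: the 0.9 edge leads to End
    "CD": {"End": 1.0},
    "BC": {"CA": 0.75, "End": 0.25},  # assumption: the 0.25 edge leads to End
    "CA": {"CD": 0.8, "CA": 0.2},
}
EMIT = {
    "AB": {"A": 0.8, "B": 0.2},
    "CD": {"C": 0.1, "D": 0.9},
    "BC": {"B": 0.7, "C": 0.3},
    "CA": {"C": 0.6, "A": 0.4},
}

def path_probability(path, seq):
    """Joint probability of one state path and the sequence it emits, ending in End."""
    p = START.get(path[0], 0.0) * EMIT[path[0]].get(seq[0], 0.0)
    for prev, cur, symbol in zip(path, path[1:], seq[1:]):
        p *= TRANS[prev].get(cur, 0.0) * EMIT[cur].get(symbol, 0.0)
    return p * TRANS[path[-1]].get("End", 0.0)

def emitting_paths(seq):
    """Brute force: try every state path of the right length, keep the nonzero ones."""
    results = []
    for path in product(EMIT, repeat=len(seq)):
        p = path_probability(path, seq)
        if p > 0:
            results.append((path, p))
    return results

for path, p in emitting_paths("BCCD"):
    print(" -> ".join(path), round(p, 6))
```

Two different state paths produce the same observed sequence, which is exactly why the states are called hidden.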

  8. Hidden Markov Models (HMM) [Figure: the model from slide 7] • Emission uncertainty means the sequence does not identify a unique path: the states are “hidden”. • The probability of a sequence is the sum of the probabilities of all paths that can produce it: P(BCCD) = 0.5 * 0.2 * 0.1 * 0.3 * 0.75 * 0.6 * 0.8 * 0.9 + 0.5 * 0.7 * 0.75 * 0.6 * 0.2 * 0.6 * 0.8 * 0.9 = 0.000972 + 0.013608 = 0.01458
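Enumerating paths grows exponentially with sequence length; the standard way to compute the same sum is the forward algorithm, a dynamic program over positions. This is a minimal sketch on the slide's model as reconstructed from the worked sum above; the state names are hypothetical, and the targets of the 0.9 and 0.25 edges (not recoverable from the transcript) are assumed to be End, which does not affect BCCD.

```python
START = {"AB": 0.5, "BC": 0.5}
TRANS = {
    "AB": {"BC": 0.1, "End": 0.9},    # assumption: the 0.9 edge leads to End
    "CD": {"End": 1.0},
    "BC": {"CA": 0.75, "End": 0.25},  # assumption: the 0.25 edge leads to End
    "CA": {"CD": 0.8, "CA": 0.2},
}
EMIT = {
    "AB": {"A": 0.8, "B": 0.2},
    "CD": {"C": 0.1, "D": 0.9},
    "BC": {"B": 0.7, "C": 0.3},
    "CA": {"C": 0.6, "A": 0.4},
}

def forward(seq):
    """Forward algorithm: probability of seq summed over all state paths."""
    # alpha[s] = P(symbols seen so far, current state is s)
    alpha = {s: START.get(s, 0.0) * EMIT[s].get(seq[0], 0.0) for s in EMIT}
    for symbol in seq[1:]:
        alpha = {
            s: EMIT[s].get(symbol, 0.0)
            * sum(alpha[r] * TRANS[r].get(s, 0.0) for r in EMIT)
            for s in EMIT
        }
    # Finish by transitioning into End.
    return sum(alpha[s] * TRANS[s].get("End", 0.0) for s in EMIT)

print(round(forward("BCCD"), 5))  # 0.01458
```

For L symbols and N states this costs on the order of L·N² operations instead of N^L, which is what makes scoring long protein sequences against profile HMMs practical.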

  9. HMMs for homology [Figure: profile HMM with start and end states, a row of match states, and insert and delete states] • Homology model: ancestral residue (match) states, insertion states, and deletion states.
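How an alignment maps onto match, insert, and delete states can be sketched with a common textbook heuristic: columns where at most half the entries are gaps become match states, and the remaining columns are insert columns. This is an assumption for illustration; the way HMMER builds Pfam's models may differ, and the toy alignment and function names are hypothetical.

```python
def match_columns(alignment, max_gap_fraction=0.5):
    """Columns whose gap fraction is at most max_gap_fraction become match states."""
    n = len(alignment)
    return [
        col
        for col in range(len(alignment[0]))
        if sum(seq[col] == "-" for seq in alignment) / n <= max_gap_fraction
    ]

def state_path(aligned_seq, match_cols):
    """Trace one aligned sequence through match (M), insert (I) and delete (D) states."""
    match_set = set(match_cols)
    path = []
    k = 0  # number of match columns passed so far
    for col, ch in enumerate(aligned_seq):
        if col in match_set:
            k += 1
            path.append(f"M{k}" if ch != "-" else f"D{k}")
        elif ch != "-":
            path.append(f"I{k}")  # residue inserted after match state k
        # a gap in an insert column visits no state
    return path

alignment = ["ACD-E", "AC--E", "A-DKE", "ACD-E"]  # hypothetical toy alignment
cols = match_columns(alignment)
for seq in alignment:
    print(seq, state_path(seq, cols))
```

Here column 4 (the K) is mostly gaps, so it becomes an insert state, while gaps inside match columns become delete states.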

  10. Profile HMM

  11. Searching Pfam • Web sites: users can search query protein sequences against one, a few, or all Pfam HMMs at http://www.sanger.ac.uk/Pfam, http://genome.wustl.edu/Pfam, or http://www.cgr.ki.se/Pfam. • Software: users can also search locally with the Pfam HMM profiles using the freely available HMMER software package: http://genome.wustle.edu/eddy/hmmer.html#hmmer
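A local search can be scripted around the HMMER command-line tools. This sketch assumes a current HMMER 3 installation, where hmmscan searches sequences against a profile database prepared once with hmmpress (in the HMMER 2 releases of 2004 the equivalent program was hmmpfam). The file names Pfam-A.hmm, my_proteins.fa, and hits.tbl, and the Python helper names, are hypothetical.

```python
import os
import shutil
import subprocess

def hmmscan_command(hmm_db, query_fasta, table_out, use_gathering=True):
    """Build an HMMER 3 hmmscan command line: query sequences vs. a profile HMM database."""
    cmd = ["hmmscan", "--tblout", table_out]  # also write a tabular summary of hits
    if use_gathering:
        cmd.append("--cut_ga")  # use the gathering (GA) thresholds stored in each Pfam HMM
    return cmd + [hmm_db, query_fasta]

def run_local_search(hmm_db, query_fasta, table_out):
    """Press the database once if needed, then run hmmscan (requires HMMER on PATH)."""
    for tool in ("hmmpress", "hmmscan"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH; install HMMER first")
    if not os.path.exists(hmm_db + ".h3m"):  # hmmpress writes .h3m/.h3i/.h3f/.h3p files
        subprocess.run(["hmmpress", hmm_db], check=True)
    subprocess.run(hmmscan_command(hmm_db, query_fasta, table_out), check=True)

if __name__ == "__main__":
    print(" ".join(hmmscan_command("Pfam-A.hmm", "my_proteins.fa", "hits.tbl")))
```

Using the gathering thresholds keeps the reported hits consistent with the cutoffs Pfam curators used to build each family's full alignment.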

  12. Sample Pfam Query Results

  13. Acknowledgements • Some slides adapted from lectures by Larry Hunter at University of Colorado Health Sciences Center • Altmann Lab for critical comments
