Multiple Motifs

Multiple Motifs. Charles Yan, Spring 2006. From Single Motif to Multiple Motifs. A single motif is not sufficient to discriminate a protein family. Multiple motifs have stronger discriminating power. Protein function prediction using multiple motifs.

Presentation Transcript


  1. Multiple Motifs. Charles Yan, Spring 2006

  2. Multiple Motifs

  3. From Single Motif to Multiple Motifs • A single motif is not sufficient to discriminate a protein family. • Multiple motifs have stronger discriminating power.

  4. Multiple Motifs: protein function prediction using multiple motifs • Each protein family is characterized by a set of motifs (instead of a single one). • If a protein contains a set of motifs, it probably belongs to the family that the set of motifs corresponds to.
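
The rule on this slide can be sketched in a few lines of Python. This is a toy illustration only: the family names and patterns in `FAMILY_MOTIFS` are hypothetical, and regular expressions stand in for motifs, which the slides later describe as conserved alignments rather than patterns.

```python
import re

# Hypothetical toy data: each family is characterized by a SET of motifs.
# Regular expressions are used here only as stand-ins for real motifs.
FAMILY_MOTIFS = {
    "family_A": ["GKT", "DE[AV]H", "W..C"],
    "family_B": ["HH", "CP.C"],
}


def matching_families(sequence, family_motifs=FAMILY_MOTIFS):
    """Return the families whose motifs ALL occur somewhere in the sequence."""
    hits = []
    for family, motifs in family_motifs.items():
        # A family is reported only if every one of its motifs is found.
        if all(re.search(motif, sequence) for motif in motifs):
            hits.append(family)
    return hits


print(matching_families("MAGKTLLDEAHQWRRC"))
```

Requiring every motif of a family to be present is what gives the set more discriminating power than any one motif: a short pattern such as `GKT` alone would match many unrelated sequences.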

  5. PRINTS • PRINTS (http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/) is a database of protein fingerprints. • A fingerprint is a group of conserved motifs used to characterize a protein family. • FTP: ftp.bioinf.man.ac.uk/pub/prints • PRINTS is now maintained at the University of Manchester. • PRINTS version 38.0 (16 June 2005) • 1900 fingerprints, encoding 11,435 single motifs

  6. PRINTS • Each fingerprint has been defined and iteratively refined using a SWISS-PROT/TrEMBL composite database. • Two types of fingerprint are represented in the database, simple and composite, depending on their complexity: simple fingerprints are essentially single motifs, while composite fingerprints encode multiple motifs. The bulk of the database entries are of the latter type because discrimination power is greater for multi-component searches. • Usually the motifs do not overlap but are separated along the sequence, though they may be contiguous in 3D space. • Fingerprints can encode protein folds and functionalities more flexibly and powerfully than single motifs can; their full diagnostic potency derives from the mutual context provided by motif neighbors.

  7. PRINTS • A motif is a conserved element corresponding to a region whose function or structure is known. It is likely to be predictive of any subsequent occurrence of such a structural/functional region in any other protein sequence. • A motif is represented as a conserved alignment of multiple sequences. • A fingerprint is a set of motifs used to predict the occurrence of similar motifs, either in an individual sequence or in a database.

  8. PRINTS

  9. PRINTS • The starting point is a multiple sequence alignment of a small number of sequences. • Once a motif, or set of motifs, has been identified, the conserved regions are excised in the form of local alignments. • The motifs are used to scan the database. • Only those sequences that match all motifs are regarded as true matches. • The additional sequence data from the new true set is then used to generate another set of aligned motifs, and the database is searched again. • This is repeated until the search converges.
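
The iterative procedure on this slide can be sketched as follows. This is a minimal toy version under stated assumptions, not the PRINTS implementation: a motif is a list of equal-length segments, a sequence matches a motif when some window reaches an assumed identity `threshold` against any segment, and convergence is taken to mean that the set of true matches stops changing. The function names and the scoring scheme are illustrative choices; the slides do not describe how matches are scored.

```python
def best_window(seq, motif):
    """Return (score, window) for the best window of seq against a motif.

    motif is a list of equal-length segments; the score is the highest
    fraction of identical positions between a window and any segment.
    """
    width = len(motif[0])
    best = (0.0, "")
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        for segment in motif:
            identity = sum(a == b for a, b in zip(window, segment)) / width
            if identity > best[0]:
                best = (identity, window)
    return best


def refine(seed_motifs, database, threshold=0.75, max_iter=20):
    """Scan, keep sequences matching ALL motifs, rebuild motifs, repeat."""
    motifs = [list(m) for m in seed_motifs]
    matched = set()
    for _ in range(max_iter):
        new_matched = set()
        new_motifs = [set(m) for m in motifs]
        for name, seq in database.items():
            hits = [best_window(seq, m) for m in motifs]
            # Only sequences that match every motif count as true matches.
            if all(score >= threshold for score, _ in hits):
                new_matched.add(name)
                for segments, (_, window) in zip(new_motifs, hits):
                    segments.add(window)
        if new_matched == matched:  # no change in the true set: converged
            break
        matched = new_matched
        motifs = [sorted(s) for s in new_motifs]
    return matched, motifs


# Hypothetical toy database: s3 is only found once s2 has extended the motifs.
database = {
    "s1": "GGACDEGGGWWYYGG",
    "s2": "GGACKEGGGWWYFGG",
    "s3": "GGAKKEGGGWWFFGG",
    "s4": "MMMMMMMMMMMM",
}
matched, motifs = refine([["ACDE"], ["WWYY"]], database)
print(sorted(matched))
```

In the toy data, `s3` is too distant from the seed motifs to match in the first pass, but it matches after the windows from `s2` have been added, which is the effect the iteration is meant to achieve.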

  10. PRINTS

  11. PRINTS

  12. PRINTS a) General field

  13. PRINTS b) Summary field • A good fingerprint should exhibit a clear discrimination cut-off: all true positives match all n motifs, there is perhaps some noise, and there are few or no matches at intermediate positions of the summary table.
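
A summary table of this kind can be sketched as a count of sequences by the number of motifs they match. The layout below (rows from n motifs down to 1) and the function name `summary_table` are assumptions made for illustration, not the exact PRINTS format.

```python
from collections import Counter


def summary_table(motifs_matched, n_motifs):
    """Count sequences by how many of the n motifs they match.

    motifs_matched maps a sequence identifier to the number of motifs matched.
    Rows run from n (full matches) down to 1.
    """
    counts = Counter(motifs_matched.values())
    return {k: counts.get(k, 0) for k in range(n_motifs, 0, -1)}


# Hypothetical scan results for a 3-motif fingerprint.
table = summary_table({"a": 3, "b": 3, "c": 1, "d": 3}, n_motifs=3)
for k, count in table.items():
    print(f"{count} sequences matching {k} motifs")
```

In this toy result the cut-off is clear in the sense the slide describes: three sequences match all three motifs, none sit at the intermediate position, and one low-scoring sequence is plausibly noise.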

  14. PRINTS • Motif name • Iteration number • PCODE: the protein identification codes of the initial sequences • ST: the location of the motifs within those sequences • INT: the interval between adjacent motifs; for the first motif, this is simply the distance from the beginning of the sequence to the start of the motif.
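
The relationship between ST and INT can be illustrated with a short calculation. The conventions here are assumptions, since the slide does not spell them out: start positions are taken as 1-based, and the interval is counted as the number of residues between the end of one motif and the start of the next (for the first motif, the number of residues before it).

```python
def intervals(starts, lengths):
    """Derive INT-style intervals from ST-style start positions.

    starts:  1-based start position of each motif, in sequence order (assumed).
    lengths: width of each motif.
    """
    result = []
    prev_end = 0  # position of the last residue of the previous motif
    for start, length in zip(starts, lengths):
        # Residues strictly between the previous motif and this one.
        result.append(start - prev_end - 1)
        prev_end = start + length - 1
    return result


# Hypothetical fingerprint with three motifs of widths 10, 12 and 8.
print(intervals([5, 30, 60], [10, 12, 8]))
```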

  15. PRINTS

  16. PRINTS FPScan • Submit a protein sequence to find the closest matching PRINTS fingerprint(s).

  17. PRINTS

  18. PRINTS

  19. PRINTS

  20. PRINTS

  21. PRINTS GRAPHScan • A graphical view of the result of scanning a fingerprint against a sequence. • Matching motifs are highlighted if they score above the threshold % identity.
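
The idea of highlighting matches above a % identity threshold can be sketched as a sliding-window profile. This is a simplification made for illustration: a single motif segment is compared against each window, whereas a real motif is an alignment of several sequences, and the function names are hypothetical rather than part of GRAPHScan.

```python
def identity_profile(sequence, motif_segment):
    """Percent identity of the motif segment against each window of the sequence."""
    width = len(motif_segment)
    profile = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        matches = sum(a == b for a, b in zip(window, motif_segment))
        profile.append(100.0 * matches / width)
    return profile


def highlighted_positions(profile, threshold):
    """0-based window starts whose % identity is above the threshold."""
    return [i for i, pct in enumerate(profile) if pct > threshold]


profile = identity_profile("GGACDEGG", "ACDE")
print(profile)
print(highlighted_positions(profile, threshold=50.0))
```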

  22. PRINTS

  23. PRINTS

  24. PRINTS MULScan • This facility allows multiple sequences to be scanned against the database. • Results are returned via email.

  25. Related Projects • InterPro - Integrated Resources of Proteins Domains and Functional Sites • BLOCKS - BLOCKS db • Pfam - Protein families db (HMM derived) [Mirror at St. Louis (USA)] • PRINTS - Protein Motif fingerprint db • ProDom - Protein domain db (Automatically generated) • PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins • SBASE - SBASE domain db • SMART - Simple Modular Architecture Research Tool • TIGRFAMs - TIGR protein families db