INTERPRO: An integrated resource of protein families, domains and functional sites.
Increase in submission of raw sequence data leads to increased need for automated methods for protein characterisation
Methods of protein characterisation
• Alternative to BLAST: use hand-curated sequence alignments of protein families or domains
– build diagnostic signatures (methods):
• Patterns
• Profiles
• Hidden Markov Models (HMMs)
Major pattern databases
• Pfam
• PRINTS
• PROSITE
• ProDom
• SMART
• TIGRFAMs
All have individual strengths and weaknesses, and different formats. Solution: integrate them into InterPro.
The InterPro consortium (co-ordinated by EBI):
• PROSITE (A. Bairoch, P. Bucher, N. Hulo, C. Sigrist, L. Cerutti, M. Pagni, L. Falquet)
• PRINTS (T. Attwood, P. Bradley)
• PFAM (R. Durbin, A. Bateman, S. Griffiths-Jones)
• PRODOM (D. Kahn, F. Servant)
• SMART (C. Ponting, R. Copley, N. Dickens)
• TIGRFAMs (D. Haft, O. White)
Creation of InterPro entries:
• Input signatures: PROSITE patterns and profiles, PFAM, PRINTS, ProDom, SMART, TIGRFAMs
• Assignment of AC numbers: IPR000001-IPR005000
Overlapping signatures:
Protein  Signature  Start  End
P49150   PR00018    45     111
P49150   PS50070    44     126
P49150   PF00051    45     126
P49150   PS00021    96     101
P49150   PS00134    133    140
P49150   PS00135    339    350
P49150   PR00722    175    351
Figure labels: PR00018, PS50070, PR00722, PF00051, PS00134, PS00135, PS00021
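The match positions listed for P49150 can be grouped by positional overlap. The sketch below is a minimal illustration only: the slide does not state the rule used to merge signatures into entries, so simple interval overlap is an assumption, and the names `Hit` and `group_overlapping` are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    protein: str
    signature: str
    start: int
    end: int


# Match positions copied from the slide for protein P49150.
HITS = [
    Hit("P49150", "PR00018", 45, 111),
    Hit("P49150", "PS50070", 44, 126),
    Hit("P49150", "PF00051", 45, 126),
    Hit("P49150", "PS00021", 96, 101),
    Hit("P49150", "PS00134", 133, 140),
    Hit("P49150", "PS00135", 339, 350),
    Hit("P49150", "PR00722", 175, 351),
]


def group_overlapping(hits: List[Hit]) -> List[List[Hit]]:
    """Cluster hits whose residue ranges overlap (assumed rule, not InterPro's)."""
    groups: List[List[Hit]] = []
    current: List[Hit] = []
    current_end = -1
    for hit in sorted(hits, key=lambda h: (h.start, h.end)):
        if current and hit.start <= current_end:
            # Hit starts inside the running cluster: extend the cluster.
            current.append(hit)
            current_end = max(current_end, hit.end)
        else:
            if current:
                groups.append(current)
            current = [hit]
            current_end = hit.end
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    for group in group_overlapping(HITS):
        print([h.signature for h in group])
```

Under this assumed rule the seven matches fall into three positional clusters, which is the kind of overlap that suggests several member-database signatures describe the same region.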
Entry relationships in InterPro
• Parent/child: family level
• Contains/found in: domain composition
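The two relationship types can be pictured as links between entry records. The following is a hedged sketch, not InterPro's actual schema: the `Entry` class, the helper functions and the placeholder accessions (`FAMILY`, `SUBFAMILY`, `DOMAIN`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    accession: str
    parent: Optional[str] = None          # parent/child: family level
    contains: List[str] = field(default_factory=list)  # domain composition


# Placeholder entries, invented for illustration only.
ENTRIES: Dict[str, Entry] = {
    "FAMILY": Entry("FAMILY", contains=["DOMAIN"]),
    "SUBFAMILY": Entry("SUBFAMILY", parent="FAMILY"),
    "DOMAIN": Entry("DOMAIN"),
}


def lineage(entries: Dict[str, Entry], accession: str) -> List[str]:
    """Walk parent links upward, starting from the given entry."""
    path = []
    current: Optional[str] = accession
    while current is not None:
        path.append(current)
        current = entries[current].parent
    return path


def found_in(entries: Dict[str, Entry], accession: str) -> List[str]:
    """Return entries that list the given entry under 'contains'."""
    return [e.accession for e in entries.values() if accession in e.contains]


if __name__ == "__main__":
    print(lineage(ENTRIES, "SUBFAMILY"))
    print(found_in(ENTRIES, "DOMAIN"))
```

Storing only `parent` and `contains` and deriving "child" and "found in" on demand keeps each relationship recorded in one place.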
InterPro releases
• April 1999: Alpha release
• November 1999: Beta release
• December 1999: First official release
• June 2000: Release 2.0, integration of ProDom
• March 2001: Release 3.0, integration of SMART
• November 2001: Release 4.0, integration of TIGRFAMs
• May 2002: Release 5.0, 5312 entries and 2.5 million hits in SPTR
Data access
• Webserver
– direct from Oracle database
– www.ebi.ac.uk/interpro
• XML file
– dumped from database and used for:
• SRS
• Condensed graphical view
• Sequence search
– InterProScan
InterProScan
• PROSITE patterns: ppsearch
• PROSITE profiles: pfscan
• PFAM HMMs: hmmpfam
• PRINTS fingerprints: fpscan
• ProDom: BlastProDom.pl
• SMART HMMs: hmmpfam
• TIGRFAMs HMMs: hmmer2.1
• eMotif derived PROSITE pattern
• TMHMM
• SignalP
• GO annotation
• 6-frame translator for DNA sequences
Available as: web version, Perl stand-alone
Applications of InterPro
Diagnostic protein family signature database for:
• Use by the member databases themselves
• Enhancing the functional annotation of TrEMBL entries
• Classification of proteins through text and sequence search tools
• Large-scale classification using GO terms
• Enhancing genome annotation: fly, human, rice, mouse
• Proteome Analysis Database
Automatic annotation of TrEMBL
• Extract conditions from the reference database, InterPro.
• Group SWISS-PROT entries by conditions and extract common annotation.
• Group TrEMBL entries by conditions and add the common annotation to the TrEMBL entries.
Figure labels: InterPro, SWISS-PROT, TrEMBL, RuleBase
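The three steps above can be sketched as follows. This is an illustrative toy under stated assumptions, not the actual RuleBase implementation: a "condition" is taken to be the set of InterPro entries matching a protein, "common annotation" is taken to be the intersection of annotation terms, and all accessions and terms (`SP_1`, `IPR_X`, `Kinase`, ...) are invented placeholders.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Condition = FrozenSet[str]

# Hypothetical curated entries: accession -> (InterPro matches, annotation terms).
SWISSPROT: Dict[str, Tuple[Condition, Set[str]]] = {
    "SP_1": (frozenset({"IPR_X"}), {"Kinase", "ATP-binding"}),
    "SP_2": (frozenset({"IPR_X"}), {"Kinase", "Membrane"}),
    "SP_3": (frozenset({"IPR_Y"}), {"Hydrolase"}),
}

# Hypothetical unannotated entries: accession -> InterPro matches.
TREMBL: Dict[str, Condition] = {
    "TR_1": frozenset({"IPR_X"}),
    "TR_2": frozenset({"IPR_Z"}),
}


def build_rules(curated: Dict[str, Tuple[Condition, Set[str]]]) -> Dict[Condition, Set[str]]:
    """Group curated entries by condition and keep the annotation shared by all."""
    grouped: Dict[Condition, List[Set[str]]] = defaultdict(list)
    for condition, annotation in curated.values():
        grouped[condition].append(annotation)
    rules = {}
    for condition, annotations in grouped.items():
        common = set.intersection(*annotations)
        if common:
            rules[condition] = common
    return rules


def annotate(targets: Dict[str, Condition], rules: Dict[Condition, Set[str]]) -> Dict[str, Set[str]]:
    """Add the common annotation to every target entry whose condition has a rule."""
    return {acc: set(rules.get(condition, set())) for acc, condition in targets.items()}


if __name__ == "__main__":
    rules = build_rules(SWISSPROT)
    print(annotate(TREMBL, rules))
```

Taking the intersection is a deliberately conservative choice for the sketch: a term is transferred only if every curated entry with the same condition carries it, and entries with no matching rule are left unannotated.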
Proteome Analysis Database (4)
GOA project: GO annotation of SPTR via:
• manual
• EC2GO
• SP2GO
• IPR2GO
Future plans
• Complete GO mapping
• Updating references and annotation
• Taxonomy data
• Integration of PIR superfamilies
• General improvements to servlets/database
• InterPro 3D: SCOP/CATH/MSD
Credits
InterPro at EBI: Rolf Apweiler, Nicola Mulder, Wolfgang Fleischmann, Alexander Kanapin, Margaret Biswas, Maria Krestyaninova, David Binns, Sandra Orchard, Robert Vaughn
InterPro Collaborators: Amos Bairoch, Nicolas Hulo, Christian Sigrist, Marco Pagni, Laurent Falquet, Philipp Bucher, Terri Attwood, Paul Bradley, Richard Durbin, Alex Bateman, Sam Griffiths-Jones, Daniel Kahn, Jerome Gouzy, Florence Servant, Emmanuel Courcelle, Richard Copley, Chris Ponting, Dan Haft, Owen White