Duncan Legge EMBL-EBI. Introduction to Protein Signatures & InterPro. Protein Signatures. Protein Signature = an amino acid sequence (not necessarily consecutive) associated with a protein characteristic. Integration of signatures. InterPro. Foundations of InterPro. Manual curation.
Duncan Legge EMBL-EBI
Introduction to Protein Signatures & InterPro Introduction to InterPro
Protein Signatures
• Protein Signature = an amino acid sequence (not necessarily consecutive) associated with a protein characteristic.
Foundations of InterPro
• Integration of signatures
• Manual curation
InterPro Consortium
• Consortium of 11 major signature databases
What value are signatures?
• Better at finding proteins with common function
• Find more distant homologues than BLAST
• Classification of proteins
• Associate proteins that share: function, domains, sequence, structure
• Annotation of protein sequences
• Define conserved regions of a protein, e.g. location and type of domains, key structural or functional sites
Protein Signature Methods
How are protein signatures made?
Protein family/domain → Multiple sequence alignment → Build model → Protein signature → Search → Significant matches → Refine
Significant matches:
E-value 1e-49 ITWKGPVCGLDGKTYRNECALL
E-value 3e-42 AVPRSPVCGSDDVTYANECELK
E-value 5e-39 SVPRSPVCGSDGVTYGTECDLK
E-value 6e-10 HPPPGPVCGTDGLTYDNRCELR
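To make the "alignment to model" step concrete, here is a minimal sketch that counts the residues in each column of the four significant matches listed above and reports which columns are fully conserved. It assumes the four sequences are already aligned without gaps (they are all the same length); the function names `column_profile` and `conserved_pattern` are made up for this illustration. Real signatures are built with the member databases' own tools and statistical models, not with simple counts like these.

```python
from collections import Counter

# The four significant matches shown on the slide (each 22 residues long).
# Assumption: they are already aligned column by column, with no gaps.
matches = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
    "SVPRSPVCGSDGVTYGTECDLK",
    "HPPPGPVCGTDGLTYDNRCELR",
]


def column_profile(seqs):
    """Return one Counter per alignment column with residue counts."""
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("sequences must all have the same length")
    # zip(*seqs) walks the alignment column by column.
    return [Counter(column) for column in zip(*seqs)]


def conserved_pattern(seqs):
    """Show fully conserved columns as the residue, all others as 'x'."""
    out = []
    for counts in column_profile(seqs):
        residue, n = counts.most_common(1)[0]
        out.append(residue if n == len(seqs) else "x")
    return "".join(out)


profile = column_profile(matches)
pattern = conserved_pattern(matches)
print(pattern)
```

The conserved columns (for example the cysteines) are the positions a pattern or model would weight most heavily; the variable columns are where a profile or HMM stores probabilities rather than a single residue.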
Types of Protein signatures (sequence based)
Starting point: multiple protein alignment
Single motif methods: regular expression patterns
C - C - {P} - x(2) - C - [STDNEKPI] - C
• C (a single letter) = must be this AA
• x = any AA
• ( ) = number of AAs
• { } = cannot be..
• [ ] = any of
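The pattern notation above maps almost directly onto ordinary regular expressions. The sketch below translates the slide's pattern into Python `re` syntax and tries it on two short test sequences. The helper name `prosite_to_regex` and both test sequences are invented for this illustration, and the translation only covers the notation shown on the slide (single letters, `x`, `( )`, `{ }`, `[ ]`), not the full pattern syntax used by the member database.

```python
import re

# One pattern element: a literal residue, x, [..] or {..},
# optionally followed by a repeat count such as (2) or (2,4).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a pattern like 'C - {P} - x(2)' into a Python regex."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        token, low, high = m.groups()
        if token == "x":
            core = "."                        # x = any AA
        elif token.startswith("{"):
            core = "[^" + token[1:-1] + "]"   # { } = cannot be
        else:
            core = token                      # literal AA, or [ ] = any of
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("C - C - {P} - x(2) - C- [STDNEKPI] - C")
print(regex)

# Made-up test sequences, not real proteins.
hit = re.search(regex, "AACCAGGCSCAA")
miss = re.search(regex, "CCPGGCSC")   # P in the third position is forbidden
```

A single-motif pattern like this either matches or it does not, which is why the slides go on to contrast it with multiple-motif and full-domain methods.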
Types of Protein signatures (sequence based)
• Single motif methods: regular expression patterns
• Multiple motif methods: identity matrices, fingerprints
• Full domain alignment methods: profiles (profile library), Hidden Markov Models (mathematical model of amino acid probability)
[Diagram: fingerprint motifs numbered 1, 2, 3; Hidden Markov Model states labelled M1-M4, I1-I3, D2-D3]
CONTRIBUTING MEMBER DATABASES (MDBs)
• Models built on either sequence or structural alignments
• Each MDB has its own focus
• Methods: Hidden Markov Models, fingerprints, profiles, patterns, sequence clusters
• Focus: protein features (active sites…), prediction of conserved domains, structural domains, functional annotation of families/domains
A closer look at InterPro
Foundations of InterPro
• Integration of signatures
• Manual curation
InterPro Curation Principles
• To represent MDBs' signatures as closely as possible to what they intended
• To reflect biological reality as accurately as possible in the entries we create, by using types, relationships and GO mapping
• To provide as much information as possible to the end user about the signature, by annotating signatures and providing links to other databases
InterPro Entry
• Links related signatures
• Groups similar signatures together
• Adds extensive annotation
• Linked to other databases
• Structural information and viewers
Link related signatures - relationships
1) Parent - Child (subgroup of more closely related proteins)
• Applies to domains and families
[Diagram: Parent (100): Protein kinase (PFAM). Children: Serine kinase (75; SMART, PROSITE) and Tyrosine kinase (25; SMART, PROSITE). The two children have no proteins in common.]
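The parent - child idea can be pictured with plain sets of matched proteins. The sketch below is only an illustration: it assumes the numbers on the slide (100, 75, 25) are counts of matched proteins, it uses invented protein identifiers, and the helper `is_child_of` is hypothetical. In InterPro itself these relationships are assigned by curators, not by a script like this.

```python
def is_child_of(child, parent):
    """A child matches a proper subset of the proteins its parent matches."""
    return child < parent


# Invented identifiers standing in for matched proteins.
protein_kinase = {f"P{i:03d}" for i in range(100)}        # parent, 100 proteins
serine_kinase = {f"P{i:03d}" for i in range(75)}          # child, 75 proteins
tyrosine_kinase = {f"P{i:03d}" for i in range(75, 100)}   # child, 25 proteins

print(is_child_of(serine_kinase, protein_kinase))
print(is_child_of(tyrosine_kinase, protein_kinase))
# Sibling children share no proteins, as in the slide's diagram.
print(serine_kinase.isdisjoint(tyrosine_kinase))
```

In this toy setup both children are subsets of the parent and do not overlap with each other, which mirrors the "No proteins in common" note on the slide.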
The InterPro entry types
• Family: proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure
• Domain: biological units with defined boundaries
• Repeat: short sequences typically repeated within a protein
• Site: Active Site, Binding Site, Conserved Site, PTM (post-translational modification)
Searching InterPro
• Search by protein ID
• Paste in unknown sequence
InterPro Search Results
• Family
• Link to PDBe
• Domains and sites
• Unintegrated signatures
• Structural data
• Link to InterPro entry
• Links to signature databases
https://www.ebi.ac.uk/Tools/pfa/iprscan/ Select member databases
Caveats
• InterPro entries are based on signatures supplied to us by our member databases
• ...this means no signature, no entry!
We need your feedback!
• Missing/additional references
• Reporting problems
• Requests
ACKNOWLEDGEMENTS
InterPro Team: Craig McAnulla, Amaia Sangrador, Sarah Hunter, Alex Mitchell, Siew-Yit Yong, Maxim Scheremetjew, Phil Jones, Matthew Fraser, Sebastien Pesseat