Learn about InterPro, an integrated resource for protein families, and how it evolved from its vision to become a powerful tool for protein analysis and annotation. Explore its origins, development, and the challenges faced in integrating diverse databases.
2nd IMPACT workshop, 5-6 May, 2010
InterPro: An Introduction
European Bioinformatics Institute, Wellcome Trust Genome Campus
Overview
• What InterPro is
• Where it came from
• What the vision was
• Has it evolved in line with that vision?
• Is it still fit for purpose?
What is InterPro?
According to the User Manual:
“InterPro is an integrated documentation resource for protein families, domains & sites. InterPro combines a number of databases that use different methodologies & a varying degree of biological information on well-characterised proteins to derive protein signatures. By uniting the member databases, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”
Where did it come from?
• The concept of an integrated protein family database emerged almost 20 years ago!
  • at the 1991 BCA spring meeting in Sheffield
  • Amos Bairoch had a poster on PROSITE
  • I had one on a ‘fingerprint’ database…
• We recognised that our approaches were underpinned by similar philosophies
  • to provide meaningful biological information
  • to provide high-quality manual annotation
Where did it come from?
• PROSITE & PRINTS were different
  • but somehow also the same…
  • most importantly, they were complementary
In combination, we gain powerful structural & functional insights
Where did it come from?
• So where next?
  • we had created 30 family fingerprints
  • PROSITE documented 375 families & functional sites
  • PROSITE was way ahead!
  • we were still on the starting blocks…
• Nevertheless, we decided to apply for an EU grant to unite the databases
  • …seemed like a good idea at the time!
What was the vision?
• Naïvely, we wanted to make life easier!
• We aimed to
  • simplify & rationalise protein family analysis
    • ensuring that entries & their linked signatures pointed to related information on the same biological object
  • centralise & streamline the annotation process
  • reduce manual annotation burdens
  • facilitate automatic functional annotation of uncharacterised proteins
How has it evolved?
• The EU proposal was submitted in 1992
  • and was promptly declined!
• Later, in 1995, the EBI was established at Hinxton
• Visiting Fellowship in 1997
  • to help integrate my work more closely with that of EBI
• Rolf, Amos & I decided to try again for an EU grant
  • by then, Profiles, ProDom & Pfam had also been created
  • so it made sense to include them too
• With the bigger picture, the grant succeeded
  • InterPro was born!
How has it evolved?
• Release 0.1 beta was made in October 1999
• It contained 2,423 entries
  • 1,370 PROSITE entries
  • 1,465 Pfam entries
  • 1,157 PRINTS entries
  • 241 preliminary profiles
• Based on Swiss-Prot 38 & TrEMBL 11
[Diagram labels: Prosite, PRINTS, ProDom, Profiles, Pfam, InterPro]
How has it evolved?
“Various factors rendered a step-wise approach to the development of InterPro desirable. First, the scale of the task of amalgamating just the first 3 databases was immense. The rational merging of apparently equivalent database entries that in fact simultaneously define a specific family, domains within that family, or even repeats within those domains, presented an enormous challenge.”
How has it evolved?
[Figure labels: domain, family, super-family, families, sub-families]
• Unravelling the biological relationships is vital!
How has it evolved?
• Clearly, the task of integration was hard
  • understanding the biological relationships being represented within member databases, let alone between them, was proving to be a significant challenge
• Rather than making our lives easier, it was probably making them much harder!
  • …& that was just with 3 databases!
• Today, with 11 sources, life is harder still…
How has it evolved?
• Release 26.0, March 2010
• It contains 20,329 entries
  • 1,023 Gene3D entries
  • 620 HAMAP entries
  • 2,234 Panther entries
  • 2,744 PIRSF entries
  • 1,975 PRINTS entries
  • 1,291 PROSITE regexs
  • 836 PROSITE profiles
  • 11,056 Pfam entries
  • 803 SMART entries
  • 1,095 SUPERFAMILY entries
  • 3,689 TIGRFams
• Release 0.1 beta was made in October 1999
• It contained 2,423 entries
  • 1,370 PROSITE entries
  • 1,465 Pfam entries
  • 1,157 PRINTS entries
  • 241 preliminary profiles
• Based on Swiss-Prot 38 & TrEMBL 11
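As a rough arithmetic check on the two release summaries above, the sketch below totals the per-database counts and compares them with the stated entry totals. It assumes (this is not stated on the slide) that the per-database figures count member signatures integrated into InterPro entries, which would explain why they sum to more than the number of entries. The dictionary and function names are illustrative only.

```python
# Figures copied from the two release summaries above.
# Assumption: per-database counts are member signatures integrated
# into InterPro entries, so several may map to one entry.
RELEASE_0_1_BETA = {
    "entries": 2423,
    "members": {
        "PROSITE": 1370,
        "Pfam": 1465,
        "PRINTS": 1157,
        "preliminary profiles": 241,
    },
}

RELEASE_26_0 = {
    "entries": 20329,
    "members": {
        "Gene3D": 1023,
        "HAMAP": 620,
        "Panther": 2234,
        "PIRSF": 2744,
        "PRINTS": 1975,
        "PROSITE regexs": 1291,
        "PROSITE profiles": 836,
        "Pfam": 11056,
        "SMART": 803,
        "SUPERFAMILY": 1095,
        "TIGRFams": 3689,
    },
}


def member_total(release):
    """Sum the per-database counts listed for a release."""
    return sum(release["members"].values())


def members_per_entry(release):
    """Average number of member-database counts per InterPro entry."""
    return member_total(release) / release["entries"]


# Growth in entry count between the two releases.
fold_growth = RELEASE_26_0["entries"] / RELEASE_0_1_BETA["entries"]

for name, release in [("0.1 beta", RELEASE_0_1_BETA), ("26.0", RELEASE_26_0)]:
    print(f"Release {name}: {member_total(release)} member counts, "
          f"{release['entries']} entries, "
          f"{members_per_entry(release):.2f} per entry")
print(f"Entry count grew {fold_growth:.2f}-fold")
```

On these figures the member counts total 4,233 for release 0.1 beta and 27,366 for release 26.0, and the entry count grows about 8.4-fold between the two releases.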
Is InterPro still fit for purpose?
• The database has grown almost 10-fold in ~11 years
• Why was it created in the first place?
  • to simplify & rationalise protein family analysis
    • ensuring that entries & their linked signatures pointed to related information on the same biological object
  • to centralise & streamline the annotation process
    • & reduce manual annotation burdens
  • to facilitate automatic functional annotation of uncharacterised proteins
  • to make life easier!!
Is InterPro still fit for purpose?
Why separate out structurally & functionally relevant information?
Remember this?
What is InterPro?
A reminder:
“InterPro is an integrated documentation resource for protein families, domains & sites. InterPro combines a number of databases that use different methodologies & a varying degree of biological information on well-characterised proteins to derive protein signatures. By uniting the member databases, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”
Is InterPro still fit for purpose?
• Integration = greater than the sum of the parts
  • a perfect example…
This integrated view is incredibly powerful & informative!
Is InterPro still fit for purpose?
What does it mean?
Is InterPro still fit for purpose?
• Let’s see what the alignments actually look like
  • consider just the first 3 TM domains…
They’re not the same!
They’re still not the same!
Is InterPro still fit for purpose?
• In the process of growing bigger, InterPro has grown massively in complexity
• Its internal convolutions now challenge us to ask, “What does it mean?”
  • what does it all mean to end users?!
  • & what does it all mean to computers?!
Has it evolved in line with its vision?
• With IMPACT, yes, InterPro has an opportunity to realise its original vision
  • it can rationalise protein family analysis
  • it can help to streamline the annotation process
  • it can facilitate functional annotation of proteins
  • it can make life easier
  • but it can only do these things if we’re prepared to empathise, collectively, with its growing pains!
• That’s why this workshop is important
Is InterPro still fit for purpose?
“There is a tremendous amount of information regarding evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it.”
Margaret O. Dayhoff to C. Berkley, February 27th, 1967
That is still InterPro’s unique opportunity!
“To kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact.”
Charles Darwin, 1879
This remains IMPACT’s imperative!
A workshop 5-6 May, 2010
Day 1
09.00-09.30 Registration
09.30-09.35 Domestic
09.35-10.00 InterPro, an introduction (Terri)
10.00-10.30 Single-motif signatures: pros, cons & added-value to InterPro (Nicolas)
10.30-11.00 Multiple-motif signatures: pros, cons & added-value to InterPro (Alex)
11.00-11.30 Coffee
11.30-12.00 Domain-based signatures: pros, cons & added-value to InterPro (Rob)
12.00-12.30 Structural annotation: pros, cons & added-value to InterPro (Corin)
12.30-13.15 InterPro today [including GO mapping] (Sarah)
13.15-14.00 Lunch
14.00-14.30 How InterPro is used to add functional annotation to UniProt (Claire)
14.30-15.30 Hands-on examples
15.30-16.00 Coffee
16.00-17.00 Open discussion/feedback
19.30- Dinner
A workshop 5-6 May, 2010
Day 2
09.30-10.00 Issues with integrating different signatures: domains
10.00-10.30 Issues with integrating different signatures: families and subfamilies
10.30-11.00 Meaningful terms to group signatures and name entries
11.00-11.30 Coffee
11.30-12.00 Unexpected sequences in match lists & how to reconcile them
12.00-12.30 Improving InterPro’s interface to better visualise, integrate & maintain data
12.30-13.00 Open discussions
13.00-13.45 Lunch
13.45-??? Format/outline/organisation of November outreach event
          Future funding
          Reviewer feedback
          Review of EoY deliverables – status report & action plan
          AOB