Demo: Protein Information Resource March 28, 2002 NIH Proteomics Workshop Bethesda, MD Lai-Su Yeh, Ph.D. Protein Scientist, National Biomedical Research Foundation
Database Demo
• NREF Database
  • http://pir.georgetown.edu/pirwww/search/pirnref.shtml
• iProClass Database
  • http://pir.georgetown.edu/iproclass.html
  • iProClass Sequence (A58910), Motif (PCM00487)
• PIR-PSD Database
  • http://pir.georgetown.edu/cgi-bin/pirwww/nbrfget?&uid=A58910
  • PIR Entry (A58910)
• Other Molecular Databases
  • Function: KEGG Enzyme (EC 1.1.1.205), KEGG Pathway (MAP00230); BRENDA (EC 1.1.1.205)
  • Structure: PDB (1AK5), SCOP (1.003.001.006.002.002), CATH (1AK5)
  • Family: Pfam (PF00478), BLOCKS (BL00487), PROSITE (PS00487)
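The cross-references listed on this slide can be held in a small lookup table. The sketch below is illustrative only: the dictionary layout and the helper names `databases_citing` and `psd_entry_url` are assumptions made for this example and are not part of any PIR interface. The identifiers and the nbrfget address are copied from the slide as shown.

```python
from urllib.parse import quote

# Cross-references for PIR entry A58910, copied from the slide.
CROSS_REFS = {
    "Function": {
        "KEGG Enzyme": "EC 1.1.1.205",
        "KEGG Pathway": "MAP00230",
        "BRENDA": "EC 1.1.1.205",
    },
    "Structure": {
        "PDB": "1AK5",
        "SCOP": "1.003.001.006.002.002",
        "CATH": "1AK5",
    },
    "Family": {
        "Pfam": "PF00478",
        "BLOCKS": "BL00487",
        "PROSITE": "PS00487",
    },
}

# Address pattern as it appears on the slide; the server may no longer respond.
PSD_BASE = "http://pir.georgetown.edu/cgi-bin/pirwww/nbrfget"


def databases_citing(identifier: str) -> list[str]:
    """Return the names of all databases that use the given identifier."""
    hits = [
        database
        for entries in CROSS_REFS.values()
        for database, value in entries.items()
        if value == identifier
    ]
    return sorted(hits)


def psd_entry_url(uid: str) -> str:
    """Build a PIR-PSD entry address in the form shown on the slide."""
    return f"{PSD_BASE}?&uid={quote(uid)}"


if __name__ == "__main__":
    print(databases_citing("1AK5"))
    print(psd_entry_url("A58910"))
```

Grouping by Function, Structure, and Family mirrors the slide, so the same identifier (for example the PDB and CATH code 1AK5) can be traced across resources.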
PIR-NREF Search Result (I) • Test Sequence: ftp://nbrfa.georgetown.edu/pir/misc/test.seq
PIR Pattern Search Result (I) • Pattern Match: Sequence vs. PROSITE • http://pir.georgetown.edu/pirwww/search/patmatch.html
PIR Pattern Search Result (II) • Search a Query Pattern Against a Sequence Database
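Pattern matching of this kind can be sketched by translating a PROSITE-style pattern into a regular expression and scanning a sequence with it. This is a minimal illustration, not the PIR implementation: the function names are hypothetical, the toy pattern and sequence are invented for the example (they are not PS00487 or the test sequence), and only the common syntax elements (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, `<`, `>`) are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns conventionally end with "."
    pieces = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


def find_matches(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    toy_pattern = "G-x(2)-[ST]-{P}-C."  # invented pattern, for illustration only
    print(prosite_to_regex(toy_pattern))
    print(find_matches("MAGKLSACDE", toy_pattern))
```

The same two functions cover both directions named on the slides: one sequence against many patterns, or one query pattern against every sequence in a collection.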
PIR-NREF Database (http://pir.georgetown.edu/pirwww/search/pirnref.shtml)
PDB Structure of Molecule: Inosine-5'-Monophosphate Dehydrogenase
Protein Family Classification
Discovery of New Knowledge by Using Information Embedded within Families of Homologous Sequences and Their Structures
• Superfamily, Domain, and Motif Classification
• Superfamily Concept
  • End-to-End Similarity & Same Overall Domain Architecture
• Significance
  • Improve Sensitivity of Protein Identification
  • Provide Complete Clustering for Database Organization
  • Detect and Correct Genome Annotation Errors Systematically
  • Drive Other Annotations
  • Stimulate Evolution, Genomics and Proteomics Research
Protein Family/Superfamily Definitions
• Family
  • A Set of Protein Sequences That Share a Common Evolutionary Ancestor with End-to-End Sequence Similarity (No Major Discrepancy by Standard Multiple Alignment Methods)
  • Have the Same Domain Architecture (Except Incomplete or Alternatively Spliced Sequences)
  • Overall Sequence Identity >=45%
• Superfamily
  • A Set of Protein Families That Share a Common Evolutionary Ancestor with End-to-End Sequence Similarity
  • Have the Same Domain Architecture
  • Overall Sequence Identity >=20%
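The identity thresholds above can be illustrated with a short pairwise check. This is a simplified sketch under stated assumptions: percent identity is computed over columns where neither sequence has a gap (other definitions exist), the two sequences are assumed to be already aligned end to end, and the function names and the toy alignment are hypothetical. Real classification works on whole families and also requires curation, which this does not model.

```python
FAMILY_MIN_IDENTITY = 45.0       # threshold from the slide
SUPERFAMILY_MIN_IDENTITY = 20.0  # threshold from the slide


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def classify_pair(identity: float, same_domain_architecture: bool) -> str:
    """Apply the slide's thresholds to one pair of end-to-end aligned sequences."""
    if not same_domain_architecture:
        return "unclassified"
    if identity >= FAMILY_MIN_IDENTITY:
        return "family"
    if identity >= SUPERFAMILY_MIN_IDENTITY:
        return "superfamily"
    return "unclassified"


if __name__ == "__main__":
    # Toy alignment invented for the example.
    identity = percent_identity("MKT-LLVA", "MKTALIVA")
    print(round(identity, 1), classify_pair(identity, same_domain_architecture=True))
```

Checking the domain architecture first reflects the definitions: identity alone does not place two sequences in the same family or superfamily.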
Protein Domain Definition • Homology Domain • A Recognizable Region of Similarity • Has a Common Ancestry • Found in Diverse Protein Sequences (in >= 2 Superfamilies) • A Sequence Can Belong to Only One Protein Family and Superfamily, but May Contain More Than One Domain
Superfamily-Domain Relationship: ~6000 Superfamilies Have >=1 Domain • Superfamily for Domain Architecture • iProClass Superfamily List • All Superfamilies Containing PF00001
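A query such as "all superfamilies containing PF00001" amounts to inverting a superfamily-to-domain table. The sketch below assumes a toy table: the superfamily labels `SF_A` to `SF_D` and the domain labels `DOM_X` and `DOM_Y` are made up for illustration, and only PF00001 comes from the slide. It also applies the homology domain criterion of occurring in at least two superfamilies.

```python
from collections import defaultdict

# Hypothetical domain architectures; only the identifier PF00001 is from the slide.
ARCHITECTURES = {
    "SF_A": ["PF00001"],
    "SF_B": ["DOM_X", "PF00001"],
    "SF_C": ["DOM_Y", "DOM_X"],
    "SF_D": [],  # a superfamily with no recognized domain
}


def build_domain_index(architectures: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert superfamily -> domains into domain -> sorted superfamilies."""
    index = defaultdict(set)
    for superfamily, domains in architectures.items():
        for domain in domains:
            index[domain].add(superfamily)
    return {domain: sorted(members) for domain, members in index.items()}


def superfamilies_containing(domain: str, index: dict[str, list[str]]) -> list[str]:
    """List every superfamily whose architecture includes the domain."""
    return index.get(domain, [])


def is_homology_domain(domain: str, index: dict[str, list[str]]) -> bool:
    """Criterion from the domain definition: found in >= 2 superfamilies."""
    return len(index.get(domain, [])) >= 2


if __name__ == "__main__":
    index = build_domain_index(ARCHITECTURES)
    print(superfamilies_containing("PF00001", index))
    print(is_homology_domain("DOM_Y", index))
    print(sum(1 for domains in ARCHITECTURES.values() if domains))
```

The last line counts superfamilies with at least one domain, the same kind of tally as the "~6000" figure on the slide.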
PIR Searches and Alignment • BLAST Search • Multiple Alignment & Tree View
PIR Hidden Markov Model • http://pir.georgetown.edu/pirwww/search/pirhmm.html • HMM Model Building & Sequence Search • One Protein Against All HMMs • All Proteins Against One HMM
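The two search modes listed above differ only in what is held fixed. The sketch below shows that structure with a deliberately simpler stand-in for a profile HMM: a gapless position-specific log-odds profile with no insert or delete states. It is not the PIR or any HMM package's method, and every name, the uniform background, the pseudocount, and the toy sequences are assumptions for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_seqs: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Build per-column log-odds scores from gapless, equal-length sequences."""
    width = len(aligned_seqs[0])
    if any(len(seq) != width for seq in aligned_seqs):
        raise ValueError("all training sequences must have the same length")
    background = 1.0 / len(ALPHABET)  # uniform background, an assumption
    total = len(aligned_seqs) + pseudocount * len(ALPHABET)
    profile = []
    for column in range(width):
        counts = Counter(seq[column] for seq in aligned_seqs)
        profile.append({
            residue: math.log(((counts[residue] + pseudocount) / total) / background)
            for residue in ALPHABET
        })
    return profile


def best_score(profile: list[dict[str, float]], sequence: str) -> float:
    """Best gapless window score; -inf if the sequence is shorter than the profile."""
    width = len(profile)
    best = float("-inf")
    for start in range(len(sequence) - width + 1):
        # Residues outside ALPHABET contribute a neutral score of 0.0.
        score = sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        best = max(best, score)
    return best


def one_protein_vs_all_models(sequence, models):
    """Mode 1: rank every model by how well it scores a single protein."""
    scored = [(name, best_score(profile, sequence)) for name, profile in models.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def all_proteins_vs_one_model(profile, proteins):
    """Mode 2: rank every protein by its score against a single model."""
    scored = [(name, best_score(profile, seq)) for name, seq in proteins.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    models = {
        "toy_A": build_profile(["GSGK", "GTGK", "GSGR"]),
        "toy_B": build_profile(["WWYF", "WFYF", "WWYW"]),
    }
    proteins = {"p1": "MAGSGKTL", "p2": "MWWYFTL"}
    print(one_protein_vs_all_models(proteins["p1"], models))
    print(all_proteins_vs_one_model(models["toy_A"], proteins))
```

The first mode answers "which family does this protein belong to?", the second "which proteins in the database belong to this family?".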
Bibliography Submission • View Bibliography Information • View Protein Entry • Submit Citation with Optional Categorization
Bibliography Information Display (I) • From PIR-NREF • From Other Curated Database (e.g., SGD)
Bibliography Information Display (II) • From User Submission • From Computer-Mapping (e.g. Gene Symbol)
Oracle Demo • Java Web Interface for Oracle Database Search: (http://pir.georgetown.edu/iproclass.html) • WebDB Interface to Oracle (WebDB) • Tables/Views/Objects • Functions/Procedures/Packages
Proteomic Bioinformatics • Large-Scale Analysis of Proteomic Data: Homology Search for Pathways