Proteome Analyst: Accelerating Protein Research. www.cs.ualberta.ca/~bioinfo. 1953. 1990. $3 billion and 13 years later… White House, 2000. Courtesy, Reuters.
Proteome Analyst: Accelerating Protein Research www.cs.ualberta.ca/~bioinfo
DNA Sequence
1 cctcgcccgc ctgccgcctt tttgtgcgcg tgtgagtgtg ggccccagcg tgccctcccg
61 ggggtgggtt ccgggcggaa ggcggaggcc cggcgcgcag cccgccgccc gcctgcccgc
121 ggaccgggga gccggggtgc ttggagcggg ggacgccagg cgtgggctgg cggcgggacc
181 aggaggagga ggaggaggag gaggagagcg cgggctggcg cttgcccggg cgcagtcggc
241 ggggaccgag tcgtacttcc tgtgcgaaag gcggcccgac cctaaccgcc accccctccc
301 cctgtctccc tctctgaacc cgcccattgg gggtaggaca ctcagccgtc accgctcgct
361 ctgctggccg ctacctgcag caagataggg ccgccatcgc cgggcgacga cgaggaggag
421 gcggccgccg cagccggggc ccccgccgcc gccggagcga caggtgattt ggcttctgca
481 cagttaggag gagcaccaaa ccgatgggag gttttgtcag ccacacctac aactataaaa
541 gatgaagctg gtaatctagt ccagattcca agtgctgcta cttcaagtgg gcagtatgtt
601 cttccccttc agaatttgca gaatcaacaa atattttccg ttgcaccagg atcagattca
661 tcaaatggta cagtgtccag tgttcaatat caagtgatac cacagatcca gtcagcagat
721 ggtcagcagg ttcaaattgg tttcacaggc tcttcagata atgggggtat aaatcaagaa
781 agcagtcaaa ttcagatcat tcctggctct aatcaaacct tacttgcctc tggaacacct
841 tctgctaaca tccagaatct cataccacag actggtcaag tccaggttca gggagttgca
901 attggtggtt catcttttcc tggtcaaacc caagtagttg ctaatgtgcc tcttggtctg
961 ccaggaaata ttacgtttgt accaatcaat agtgtcgatc tagattcttt gggactctcg
1021 ggcagttctc agacaatgac tgcaggcatt aatgccgacg gacatttgat aaacacagga
1081 caagctatgg atagttcaga caattcagaa aggactggtg agcgggtttc tcctgatatt
1141 aatgaaacta atactgatac agatttattt gtgccaacat cctcttcatc acagttgcct
1201 gttacgatag atagtacagg tatattacaa caaaacacaa atagcttgac tacatctagt
Protein Sequence
>UniProt/Swiss-Prot|P30613|KPYR_HUMAN
MSIQENISSLQLRSWVSKSQRDLAKSILIGAPGGPAGYLRRASVAQLTQELGTAFFQQQQ
LPAAMADTFLEHLCLLDIDSEPVAARSTSIIATIGPASRSVERLKEMIKAGMNIARLNFS
HGSHEYHAESIANVREAVESFAGSPLSYRPVAIALDTKGPEIRTGILQGGPESEVELVKG
SQVLVTVDPAFRTRGNANTVWVDYPNIVRVVPVGGRIYIDDGLISLVVQKIGPEGLVTQV
ENGGVLGSRKGVNLPGAQVDLPGLSEQDVRDLRFGVEHGVDIVFASFVRKASDVAAVRAA
LGPEGHGIKIISKIENHEGVKRFDEILEVSDGIMVARGDLGIEIPAEKVFLAQKMMIGRC
NLAGKPVVCATQMLESMITKPRPTRAETSDVANAVLDGADCIMLSGETAKGNFPVEAVKM
QHAIAREAEAAVYHRQLFEELRRAAPLSRDPTEVTAIGAVEAAFKCCAAAIIVLTTTGRS
AQLLSRYRPRAAVIAVTRSAQAARQVHLCRGVFPLLYREPPEAIWADDVDRRVQFGIESG
KLRGFLRVGDLVIVVTGWRPGSGYTNIMRVLSIS
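The protein sequence on this slide is in FASTA format: a header line starting with ">" followed by lines of residues. As a minimal sketch of how such a record could be read before it is passed to a prediction tool, the Python below splits FASTA text into (header, sequence) pairs and breaks the header on "|". The function names `parse_fasta` and `split_header` are hypothetical and are not part of Proteome Analyst. The three-field header layout is assumed from the single header shown on the slide, and the example embeds only the first two sequence lines.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_header(header: str) -> Dict[str, str]:
    """Split a 'database|accession|entry name' header (layout assumed from the slide)."""
    database, accession, entry_name = header.split("|")
    return {"database": database, "accession": accession, "entry_name": entry_name}


# Header and first two sequence lines copied from the slide (truncated on purpose).
example = """>UniProt/Swiss-Prot|P30613|KPYR_HUMAN
MSIQENISSLQLRSWVSKSQRDLAKSILIGAPGGPAGYLRRASVAQLTQELGTAFFQQQQ
LPAAMADTFLEHLCLLDIDSEPVAARSTSIIATIGPASRSVERLKEMIKAGMNIARLNFS
"""

records = list(parse_fasta(example))
header, sequence = records[0]
fields = split_header(header)
print(fields["accession"], len(sequence))
```

Joining the residue lines into one string makes later steps, such as computing the sequence length or extracting features for a classifier, independent of how the file was wrapped.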
Annotation
• Knowledge of DNA and protein sequences greatly accelerates lab research to discover protein function
• Time- and resource-intensive
• Human bottleneck
Sequence Database Growth
[Figure: number of protein sequences (axis from 0 to 2,000,000) by year (1986 to 2004), comparing unannotated protein sequences (GenPept) with human-annotated protein sequences (SwissProt).]
Proteome Analyst
Proteome Analyst (PA):
• is a free, Web-based tool
• uses machine learning to make predictions, and can explain its predictions
• is very accurate (as measured by, e.g., precision and recall)
Goal: Filter vast amounts of biological data and make meaningful predictions about the function and location of proteins, accelerating protein research.
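Precision and recall, the two accuracy measures this slide names, can be computed for a single predicted class as in the sketch below. The label lists are made-up toy data for illustration only; they are not Proteome Analyst output or results. The function name `precision_recall` is hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall(
    predicted: Sequence[str], actual: Sequence[str], positive: str
) -> Tuple[float, float]:
    """Return (precision, recall) for one class label.

    precision = TP / (TP + FP): of the proteins predicted as `positive`,
                the fraction that really are.
    recall    = TP / (TP + FN): of the proteins that really are `positive`,
                the fraction that were predicted as such.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, made-up labels (hypothetical subcellular locations for five proteins).
actual = ["nucleus", "nucleus", "cytoplasm", "nucleus", "cytoplasm"]
predicted = ["nucleus", "cytoplasm", "cytoplasm", "nucleus", "nucleus"]

precision, recall = precision_recall(predicted, actual, positive="nucleus")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both numbers matters because a classifier can score high on one while failing on the other, for example by predicting a class very rarely (high precision, low recall).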