A Data Analysis Pipeline presented by Jason Phillips. Phytome.
High Level Flow Chart: Retrieve Unigenes → Translate Unigenes → Families
Main Outline • Unigenes (Where'd they come from, where'd they go?) • Translation (methods and procedures) • Building Families (the power of together-ness)
phytome» Unigene • What are? • Where from? • Nine Species • Arabidopsis, a special case • Storage
phytome» Unigene » What Are? Overlapping ESTs combined into a single sequence
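To make "combined ESTs that overlap" concrete, here is a toy sketch that merges two reads when the end of one exactly matches the start of the other. The function name `merge_by_overlap`, the `min_overlap` threshold, and the two example reads are all made up for illustration; real unigene builds tolerate sequencing errors and handle many reads at once, which this does not attempt.

```python
def merge_by_overlap(left, right, min_overlap=3):
    """Merge two reads if a suffix of `left` equals a prefix of `right`.

    Returns the merged sequence, or None when no exact overlap of at
    least `min_overlap` bases exists. Hypothetical helper, exact matches only.
    """
    longest = min(len(left), len(right))
    # Try the longest possible overlap first, then shrink.
    for size in range(longest, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Two invented ESTs that share the five bases "TTAGC".
est_a = "ATGGCCTTAGC"
est_b = "TTAGCGGATAA"
merged = merge_by_overlap(est_a, est_b)
print(merged)
```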
phytome» Unigene » Where From? • TIGR • Other sources?
phytome» Unigene » Arabidopsis Highly annotated... Highly sequenced... Highly translated...
phytome» Unigene » Storage
species   count
-------------------
ghir      24350
mcry       8455
osat      60778
hann      20520
mtru      36976
lesc      31012
ljap      11025
lsat      21960
atha      27170
-------------------
total:   242246
phytome» Translation • Methods • Estwise • Estscan • FrameFinder • Procedure • Numbers
phytome» Translation » methods
HOMOLOGIES via BLAST: EST-WISE (sprot + trembl)
AB INITIO: ESTSCAN, FRAMEFINDER
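For a feel of what any translation method has to decide, the sketch below translates a nucleotide sequence in all six reading frames and picks the longest stretch without a stop codon. This is a naive baseline only, not how ESTSCAN, FRAMEFINDER, or EST-WISE work; the function names are invented for illustration, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate one frame; unknown codons (e.g. containing N) become X."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Return the three forward and three reverse-complement translations."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (seq, reverse) for offset in range(3)]


def longest_open_stretch(seq):
    """Longest peptide between stop codons across all six frames."""
    stretches = (part for frame in six_frames(seq) for part in frame.split("*"))
    return max(stretches, key=len)


print(six_frames("ATGGCCTAA"))
```

The longest open stretch is often not the true protein, which is one reason to compare several methods.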
phytome» Translation » procedure
• EST-WISE (Mac OSX Cluster)
  • blast swiss prot: 10.3 hours, 35 nodes (~15 days)
  • blast trembl: 35.7 hours, 35 nodes (~52 days)
• ESTSCAN (Mustard)
• FrameFinder (Mustard)
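The figures in parentheses are consistent with wall-clock hours multiplied by node count, expressed in days. The helper name `cpu_days` is an assumption for illustration; the hours and node counts are the ones on the slide.

```python
def cpu_days(wall_hours, nodes):
    """Single-node equivalent run time in days (hypothetical helper)."""
    return wall_hours * nodes / 24


# Wall-clock hours and node counts as given on the slide.
for label, hours in [("blast swiss prot", 10.3), ("blast trembl", 35.7)]:
    print(f"{label}: ~{cpu_days(hours, 35):.0f} days")
```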
phytome» Translation » numbers
Unigenes: 242,246
method        translated   not translated
-----------   ----------   --------------
ESTWISE          151,830           90,416
ESTSCAN          226,988           15,258
FRAMEFINDER      242,242                4
phytome» Families • Relationships • Clustering • Numbers
phytome» Families » Relationships
Blast everything against everything: sequences vs. a blastable db of sequences
query       sbjct       e-value
---------   ---------   -------
mtru302     ljap4523    1 29
mtru302     lesc25072   1 26
mtru302     hann20270   5 24
osat59606   osat59606   1 157
osat59606   osat4002    1 96
osat59606   atha25166   1 88
......      .....       . ..
......      .....       . ..
phytome» Families » Relationships
But we have 4 sets of sequences!
data set      blast     sequences
-----------   -------   ---------
nucleotides   tblastx     242,246
estwise       blastp      151,830
estscan       blastp      226,988
framefinder   blastp      242,242
Which method do we trust?
phytome» Families » Relationships
4 data sets...4 family interpretations
BLAST OFF!
tb   ~3 days, 28 nodes (~84 days)
ew   ~1/4 day, 21 nodes (~5 days)
es   ~1/4 day, 21 nodes (~5 days)
ff   ~1/4 day, 21 nodes (~5 days)
phytome» Families » Relationships
BLAST RESULTS
Method   size     no blast   no trans   attrition
------   ------   --------   --------   ---------
tb       242246        153          0         153
ew       151830         22      90416       90438
ff       242242      24563          4       24567
es       226988       1345      15258       16603
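The attrition column can be reproduced from the other figures: "no trans" is the unigene total minus the data set size, and attrition is "no trans" plus "no blast". A minimal check, using only numbers from the slides (the variable and function names are made up):

```python
UNIGENES = 242246  # total from the storage slide

# method: (size, no_blast), copied from the BLAST RESULTS table
RESULTS = {
    "tb": (242246, 153),
    "ew": (151830, 22),
    "ff": (242242, 24563),
    "es": (226988, 1345),
}


def attrition(size, no_blast, total=UNIGENES):
    """Unigenes lost to a method: never translated, or no BLAST hit."""
    no_trans = total - size
    return no_trans + no_blast


for method, (size, no_blast) in RESULTS.items():
    print(method, attrition(size, no_blast))
```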
phytome» Families » Clustering TRIBE MCL gene evalue
phytome» Families » Clustering
BLAST hits (input to tribe mcl):
query       sbjct       evalue
---------   ---------   ------
atha7499    atha8483    6 78
atha7499    atha7503    4 90
osat23081   atha10704   8 78
osat23081   osat36667   8 78
atha1072    atha5059    2 68
atha1072    lsat15421   2 60
atha1072    lsat21190   1 102
atha1072    atha5059    9 54
......      ......      . ..
......      ......      . ..
......      ......      . ..
Families (output of tribe mcl):
fam id   member
------   ---------
....     .......
....     .......
4035     atha7499
4035     atha7503
4035     atha8483
4036     atha10704
4036     osat23081
4036     osat36667
4037     atha1072
4037     atha5059
4037     lsat15421
4037     lsat21190
....     .......
....     ......
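To show the shape of the step from hit pairs to families, the sketch below groups genes by simple single-linkage (union-find) over the query/sbjct pairs listed on this slide. This is a deliberate simplification and not the Markov clustering that TRIBE MCL performs: it ignores e-values entirely and would merge any two families joined by a single hit. The function name `cluster` is invented for illustration.

```python
# Query/sbjct pairs copied from the slide (e-values ignored here).
PAIRS = [
    ("atha7499", "atha8483"),
    ("atha7499", "atha7503"),
    ("osat23081", "atha10704"),
    ("osat23081", "osat36667"),
    ("atha1072", "atha5059"),
    ("atha1072", "lsat15421"),
    ("atha1072", "lsat21190"),
    ("atha1072", "atha5059"),
]


def cluster(pairs):
    """Group genes connected by any chain of hits (single linkage)."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for query, sbjct in pairs:
        parent[find(query)] = find(sbjct)

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())


for members in cluster(PAIRS):
    print(members)
```

On these few pairs the grouping happens to match the three families shown (4035, 4036, 4037); on a full all-against-all result the two approaches would generally differ.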
phytome» Families » Clustering
blast results (tb, ew, es, ff) → TRIBE MCL → families (tb, ew, es, ff)
phytome» Families » Clustering Let's look at some histograms!