Capstone Presentation
Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana
by Irfan Gunduz
I.U. School of Informatics
04/25/04
INTRODUCTION
• Motifs
  • Highly conserved regions across a subset of proteins that share the same function
      >Seq A  YNEDSKH
      >Seq B  YDDDSNH
      >Seq C  YDNDSNH
      >Seq D  YENDSKH
• Motifs can be used to predict
  • A molecule's function
  • A structural feature
  • Family membership
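To make the idea of a conserved region concrete, here is a minimal sketch that collapses the four example sequences from the slide into a PROSITE-style pattern, column by column. The function name `column_pattern` is hypothetical, and the sketch assumes the sequences are already aligned, gap-free, and of equal length.

```python
def column_pattern(aligned):
    """Build a PROSITE-style pattern from equal-length, gap-free sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if len(residues) == 1:
            # Fully conserved column: emit the single residue.
            parts.append(residues[0])
        else:
            # Variable column: emit the set of observed residues.
            parts.append("[" + "".join(residues) + "]")
    return "-".join(parts)


# The four example sequences shown on the slide.
seqs = ["YNEDSKH", "YDDDSNH", "YDNDSNH", "YENDSKH"]
pattern = column_pattern(seqs)
print(pattern)  # Y-[DEN]-[DEN]-D-S-[KN]-H
```

Four of the seven columns (Y, D, S, H) are identical in every sequence, which is the kind of signal a motif finder looks for.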
INTRODUCTION
• Current motif-finding software:
  • MEME
  • PROSITE
  • PRATT, etc.
• Do these tools work with a large number of sequences?
  • Pattern discovery relies on statistical or combinatorial techniques that look for signals
  • The signal becomes harder to separate from noise as the number of sequences increases
• What to do?
Objective
• Develop a computational procedure to find functional motifs from a large number of sequences
COMPUTATIONAL PROCEDURE
Tools
• BLAST (sequence alignment tool)
• BAG (sequence clustering package)
• CLUSTAL W (multiple sequence alignment)
• HMMERII (HMM-based software)
• BLOCK MAKER (block/motif finder)
• LAMA (block comparison tool)
• Perl
COMPUTATIONAL PROCEDURE
1 - Collecting and Clustering Sequences
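The clustering step can be illustrated with a deliberately simplified stand-in: link two sequences when their pairwise score passes a cutoff, then take connected components as clusters. This is not BAG's actual algorithm. The function name, the sequence IDs, the scores, and the threshold below are all hypothetical; in practice the scores would come from an all-against-all BLAST run.

```python
def cluster_by_similarity(ids, scores, threshold):
    """Group IDs into connected components of the 'score >= threshold' graph."""
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise scores (for example, BLAST bit scores).
ids = ["s1", "s2", "s3", "s4", "s5"]
scores = {
    ("s1", "s2"): 250.0,
    ("s2", "s3"): 180.0,
    ("s3", "s4"): 40.0,
    ("s4", "s5"): 300.0,
}
clusters = cluster_by_similarity(ids, scores, threshold=100.0)
print(clusters)  # [['s1', 's2', 's3'], ['s4', 's5']]
```

The weak s3/s4 link falls below the cutoff, so the five sequences split into two clusters.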
COMPUTATIONAL PROCEDURE
2 - Enrichment
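The enrichment step is described later in the deck as adding sequences to each cluster through HMM iterations. A minimal sketch of such a loop follows: build a model from the current members, search the database, add new hits, and stop when nothing new turns up. Everything here is an assumption for illustration. In particular, the "model" is a toy set of shared 3-mers, not a profile HMM, and `enrich`, `build_model`, and `matches` are hypothetical names rather than HMMER calls.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_model(seqs):
    """Toy stand-in for model building: the union of all member 3-mers."""
    model = set()
    for s in seqs:
        model |= kmers(s)
    return model


def matches(model, seq, min_shared=2):
    """Toy stand-in for a model search hit: at least min_shared common 3-mers."""
    return len(model & kmers(seq)) >= min_shared


def enrich(seed_ids, database, build_model, matches, max_rounds=10):
    """Iteratively grow a cluster until a search round adds no new sequence."""
    members = set(seed_ids)
    for _ in range(max_rounds):
        model = build_model([database[i] for i in sorted(members)])
        hits = {i for i, seq in database.items() if matches(model, seq)}
        new = hits - members
        if not new:
            break
        members |= new
    return members


# Hypothetical toy database; real input would be protein sequences.
database = {"a": "YDVFLS", "b": "VFLSFR", "c": "LSFRGE", "d": "GGGGGG"}
enriched = enrich(["a"], database, build_model, matches)
print(sorted(enriched))  # ['a', 'b', 'c']
```

Sequence "c" is only reachable after "b" has joined the cluster, which shows why the search is repeated rather than run once.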
COMPUTATIONAL PROCEDURE
3 - Refinement
4 - Motif Finding
A Case Study with Disease Resistance Genes in Arabidopsis thaliana
Why Disease Resistance Genes?
Background, Disease Resistance Genes
Domain    Probable Function
TIR
CC
KIN
LRR       Recognition of specificity
NB        ATP and GTP binding
Case Study, Arabidopsis thaliana
• 116 sequences annotated as disease resistance proteins or disease resistance protein-like were extracted from the Arabidopsis thaliana genome
• These were clustered into 32 groups
• Between 20 and 640 sequences were added to each cluster after HMM iterations
• After refinement, four clusters were formed for further analysis
Case Study, Arabidopsis thaliana
PFAM Search
Cluster      Domains
Cluster 1    NB-ARC, TIR, Kin, LRR
Cluster 2    NB-ARC, Kin, LRR
Cluster 3    Ser/Thr Kin
Cluster 4    Kin
Case Study, Arabidopsis thaliana
Results, Block Maker (panels: Cluster 1, Cluster 2)
15218608  YDVFLSFRGVDTRQTIVSHL
15218618  YDVFLSFRGEDTRKNIVSHL
15220795  YDVFLSFRGEDTRKTIVSHL
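As a quick check on how conserved this example block is, the sketch below marks each column of the three sequences from the slide as identical ("*") or variable ("."), in the spirit of a CLUSTAL W conservation line. The function name `conservation_line` is hypothetical, and the sequences are assumed to be aligned without gaps, as they appear on the slide.

```python
def conservation_line(seqs):
    """Mark each aligned column '*' if all residues agree, '.' otherwise."""
    return "".join("*" if len(set(col)) == 1 else "." for col in zip(*seqs))


# Block segments and sequence identifiers as shown on the slide.
block = {
    "15218608": "YDVFLSFRGVDTRQTIVSHL",
    "15218618": "YDVFLSFRGEDTRKNIVSHL",
    "15220795": "YDVFLSFRGEDTRKTIVSHL",
}
line = conservation_line(list(block.values()))
variable = [i + 1 for i, mark in enumerate(line) if mark == "."]

for name, seq in block.items():
    print(name, seq)
print(" " * 8, line)
print("variable positions (1-based):", variable)  # [10, 14, 15]
```

Seventeen of the twenty columns are identical across the three sequences.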
Case Study, Arabidopsis thaliana
Results, LAMA and BAG
[Figure: clusters at the whole gene level (Cluster 1, Cluster 2) and clusters at the block level (Cluster 1, Cluster 3, Cluster 2)]
Case Study, Arabidopsis thaliana
Clusters at the whole gene level
• Cluster 1 (RPS4, RPP1, RPP5): TIR-I, TIR-II, Kin1a, Kin2, NBS-B, LRR
• Cluster 2 (RPP8, RPM1): Kin1a, NBS-A, Kin2, NBS-B, NBS-C, GLPL, LRR
Clusters at the block level
• Cluster 1, Cluster 3, Cluster 2
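The block labels on this slide can be compared directly to see which blocks the two gene-level clusters share. The sketch below does only that, with plain list comparisons. It is not how LAMA compares blocks, and the assignment of labels to Cluster 1 and Cluster 2 is an assumption read off the slide layout.

```python
# Block labels per cluster, as read from the slide (assignment is an assumption).
cluster1 = ["TIR-I", "TIR-II", "Kin1a", "Kin2", "NBS-B", "LRR"]
cluster2 = ["Kin1a", "NBS-A", "Kin2", "NBS-B", "NBS-C", "GLPL", "LRR"]

# Keep slide order while splitting labels into shared and cluster-specific.
shared = [b for b in cluster1 if b in cluster2]
only_cluster1 = [b for b in cluster1 if b not in cluster2]
only_cluster2 = [b for b in cluster2 if b not in cluster1]

print("shared:        ", shared)         # ['Kin1a', 'Kin2', 'NBS-B', 'LRR']
print("only cluster 1:", only_cluster1)  # ['TIR-I', 'TIR-II']
print("only cluster 2:", only_cluster2)  # ['NBS-A', 'NBS-C', 'GLPL']
```

Under this reading, the TIR blocks are what separate Cluster 1 from Cluster 2, while the kinase, NBS-B, and LRR blocks are common to both.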
Case Study, Arabidopsis thaliana
Number of Disease Resistance Gene Candidates on Each Chromosome
             CHR-I   CHR-II   CHR-III   CHR-IV   CHR-V
Cluster 1      16       2        6        16       35
Cluster 2      20       0        6         4        9
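The table can be summed per cluster and per chromosome with a few lines. The counts below are copied from the slide; the variable names are only illustrative.

```python
chromosomes = ["CHR-I", "CHR-II", "CHR-III", "CHR-IV", "CHR-V"]
counts = {
    "Cluster 1": [16, 2, 6, 16, 35],
    "Cluster 2": [20, 0, 6, 4, 9],
}

# Row totals: candidates per cluster.
totals = {name: sum(row) for name, row in counts.items()}

# Column totals: candidates per chromosome across both clusters.
per_chromosome = {
    chrom: sum(row[i] for row in counts.values())
    for i, chrom in enumerate(chromosomes)
}

print(totals)          # {'Cluster 1': 75, 'Cluster 2': 39}
print(per_chromosome)  # {'CHR-I': 36, 'CHR-II': 2, 'CHR-III': 12, 'CHR-IV': 20, 'CHR-V': 44}
```

That gives 114 candidates across the two clusters, with chromosome V carrying the most (44) and chromosome II the fewest (2).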
Case Study, Arabidopsis thaliana
New Disease Resistance Gene Candidates
Cluster 2   Cluster 1
GI 15221277, GI 15221280, GI 15217940, GI 15221744, GI 15236505, GI 15242136, GI 15233862
Case Study, Arabidopsis thaliana
To test the effectiveness of the computational procedure
• 792 unique sequences were merged and submitted to MEME and PRATT to detect functional motifs
• Time: more than 9000 minutes (over six days) on a Pentium IV 1.7 GHz machine running Linux
• Result: no known disease resistance gene motifs were detected
Case Study, Arabidopsis thaliana
CONCLUSIONS:
• A sensible combination of existing tools provides an effective mechanism for motif detection in a large set of sequences
• Clustering the sequences first helps improve the performance of other well-known tools
ACKNOWLEDGEMENT
"Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana" by Irfan Gunduz, Sihui Zhao, Mehmet Dalkilic and Sun Kim will be presented at The 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences
Disease Resistance Mechanism
COMPUTATIONAL PROCEDURE
• Refinement
[Figure: clusters labeled A, B, C, D]