
Clustering and Motif Discovery in Kinases of Yeast, Worm and Arabidopsis thaliana

Sihui Zhao


Presentation Transcript


  1. Clustering and Motif Discovery in Kinases of Yeast, Worm and Arabidopsis thaliana Sihui Zhao

  2. Background – Kinase • Protein kinases play a pivotal role in the control of all cellular processes, including cell proliferation, differentiation, adhesion, migration, metabolism and signal transduction • Each genome has a kinase superfamily that makes up ~2% of all its sequences

  3. Background – Structure of the Catalytic Domain • Also called the C-subunit • Conserved across the protein kinase superfamily • Contains 250-300 residues • 12 subdomains

  4. Background – Subdomains of the C-subunit • Two pivotal subdomains (based on PKA): • Subdomain I: sequesters ATP; Gly-X-Gly-X-X-Gly-X-Val • Subdomain VIB: ‘catalytic loop’; His-Arg-Asp-X-Lys-X-X-Asn
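The two consensus patterns above can be searched for directly once they are translated into regular expressions. The following is a minimal Python sketch, assuming hyphen-separated one-letter or three-letter residue codes with X as a wildcard. The helper names `pattern_to_regex` and `find_motif` are hypothetical and are not part of any tool listed in these slides. The lookahead in the regex makes overlapping hits visible too.

```python
import re

# Three-letter -> one-letter residue codes; "X" in a pattern means "any residue".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def pattern_to_regex(pattern: str) -> str:
    """Turn 'Gly-X-Gly-X-X-Gly-X-Val' or 'G-X-G-X-X-G-X-V' into 'G.G..G.V'."""
    parts = []
    for token in pattern.split("-"):
        if token == "X":
            parts.append(".")
        elif token in THREE_TO_ONE:
            parts.append(THREE_TO_ONE[token])
        elif len(token) == 1 and token.isalpha():
            parts.append(token.upper())
        else:
            raise ValueError(f"unrecognized residue token: {token!r}")
    return "".join(parts)


def find_motif(pattern: str, seq: str) -> list[tuple[int, str]]:
    """All (0-based start, matched segment) hits; the lookahead also reports overlaps."""
    regex = re.compile(f"(?=({pattern_to_regex(pattern)}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(seq)]


GLY_LOOP = "Gly-X-Gly-X-X-Gly-X-Val"          # subdomain I
CATALYTIC_LOOP = "His-Arg-Asp-X-Lys-X-X-Asn"  # subdomain VIB

print(pattern_to_regex(GLY_LOOP))  # G.G..G.V
# First row of the BlockMaker block shown in the results (slide 15):
print(find_motif(GLY_LOOP, "SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL"))  # [(18, 'GSGHFGVV')]
```

A regex scan like this only checks exact consensus matches. It cannot score near-misses the way a profile or HMM can, which is one reason the framework relies on HMMER and block tools rather than on fixed patterns.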

  5. Background – Conserved Residues

  6. Background – Motif • A motif is a locally conserved region of a sequence • It is conserved because it is under stronger selective pressure than non-conserved regions • This conservation suggests importance to biological function or structure

  7. Background – Problem & Strategy in Motif Discovery • Motif discovery relies on either statistical or combinatorial pattern-search techniques • Problem: with a huge number of sequences, noise is high relative to signal • Strategy: first use clustering/classification to find sequence families, which lowers the noise ratio

  8. Objectives • Cluster kinase sequences into different families • Find conserved motifs from sequence families

  9. Tools • BLAST – Sequence alignment tool • ClustalW – Multiple alignment tool • HMMER – HMM-based package • BAG package – Sequence clustering package • BlockMaker – Block/Motif discovery tool • LAMA – Alignment tool for Blocks • Perl

  10. Computational Framework – Outline • Collecting kinase sequences and clustering them by similarity • Iterative HMM search – to collect more kinases, especially remotely homologous sequences • Motif discovery – to find blocks in each cluster and merge blocks across multiple clusters

  11. Computational Framework – Collecting and Clustering Sequences • Cluster kinase sequences: extract annotated kinase sequences → all-to-all pairwise comparison → estimate the best score for clustering → cluster sequences using BAG
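The clustering step can be illustrated with a deliberately simplified stand-in. BAG uses its own graph-based clustering. This sketch instead takes the connected components of the graph whose edges are pairs scoring at or above a threshold (single linkage), and it uses a toy shared-k-mer count where the real pipeline would use BLAST scores. `threshold` plays the role of the estimated best score, and `min_size=2` mirrors the slide's clusters of 2 or more sequences. All function names here are hypothetical.

```python
from itertools import combinations


def shared_kmers(a: str, b: str, k: int = 3) -> int:
    """Toy similarity: number of distinct k-mers two sequences share (stand-in for BLAST)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(ka & kb)


def all_to_all(seqs: dict[str, str], score=shared_kmers) -> dict[tuple[str, str], float]:
    """Score every unordered pair of sequences exactly once."""
    return {(a, b): score(seqs[a], seqs[b]) for a, b in combinations(sorted(seqs), 2)}


def threshold_clusters(ids, scores, threshold, min_size=2):
    """Connected components of the 'score >= threshold' graph (single linkage)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), s in scores.items():
        if s >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb  # merge the two components

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)


scores = {("a", "b"): 50, ("b", "c"): 45, ("d", "e"): 60, ("a", "d"): 10}
print(threshold_clusters("abcdef", scores, threshold=40))  # [['a', 'b', 'c'], ['d', 'e']]
```

Single linkage is the simplest graph clustering. A single strong edge can chain two unrelated families together, and that is the kind of artifact a more careful graph method such as BAG is meant to avoid.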

  12. Computational Framework – HMM Iterative Search • Collect more sequences for each cluster: multiple alignment using CLUSTALW → build HMM/profile → search all 3 genomes → add hits (if any) to each cluster
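The iteration itself can be sketched independently of HMMER. In this stand-in, the "profile" is only the pooled set of k-mers of the current cluster, and a database sequence's score is the fraction of its k-mers found in that pool. This replaces CLUSTALW alignment, HMM building and HMMER searching, and it is not how HMMER scores sequences. The sketch is meant to show the loop: each round rebuilds the profile from the enlarged cluster, so a sequence too distant from the seed (C below) can still be recruited through an intermediate hit (B). The names, the toy sequences and the 0.4 threshold are illustrative assumptions.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_profile(members: dict[str, str], k: int = 3) -> set[str]:
    """Stand-in for 'align the cluster, then build an HMM': pool all members' k-mers."""
    profile = set()
    for seq in members.values():
        profile |= kmers(seq, k)
    return profile


def profile_score(profile: set[str], seq: str, k: int = 3) -> float:
    """Fraction of the query's k-mers present in the profile (stand-in for an HMM score)."""
    query = kmers(seq, k)
    return len(query & profile) / len(query) if query else 0.0


def iterative_search(cluster, database, threshold=0.4, k=3, max_rounds=10):
    """Rebuild the profile from the growing cluster and re-search until nothing new is hit."""
    members = dict(cluster)
    for _ in range(max_rounds):
        profile = build_profile(members, k)
        hits = {
            sid: seq for sid, seq in database.items()
            if sid not in members and profile_score(profile, seq, k) >= threshold
        }
        if not hits:  # converged: the profile recruits no new sequences
            break
        members.update(hits)
    return members


db = {"B": "LAGTREWQPSD", "C": "EWQPSDHNFY", "D": "YYYYHHHH"}
print(sorted(iterative_search({"A": "MKVLAGTRE"}, db)))                # ['A', 'B', 'C']
print(sorted(iterative_search({"A": "MKVLAGTRE"}, db, max_rounds=1)))  # ['A', 'B']
```

The same transitive growth is what makes the iteration useful for remote homologs, and also what makes it risky. If a false positive is added in one round, it will pull in its own relatives in later rounds.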

  13. Computational Framework – Motif Discovery • Find blocks and merge them across multiple clusters: block discovery by BlockMaker → all-to-all block comparison by LAMA → clustering blocks using the BAG package → conserved-site detection
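Block-to-block comparison can be sketched roughly in the spirit of LAMA, which compares blocks column by column without gaps and scores column pairs by correlation. This simplified version has several limitations. It uses raw residue frequencies with no pseudocounts or sequence weighting. It scores each aligned column pair by Pearson correlation and takes the mean over the overlap. The function names, `min_overlap=5`, and the scoring details are assumptions, not LAMA's actual parameters. The sample block is the one BlockMaker reported in slide 15.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profiles(block: list[str]) -> list[list[float]]:
    """Residue frequencies (20 values) for each column of an ungapped block."""
    n = len(block)
    return [
        [sum(row[j] == aa for row in block) / n for aa in AMINO_ACIDS]
        for j in range(len(block[0]))
    ]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def compare_blocks(block1: list[str], block2: list[str], min_overlap: int = 5):
    """Slide block2 along block1 without gaps; return (best mean correlation, offset).

    At offset o, column i of block1 is paired with column i - o of block2.
    """
    p1, p2 = column_profiles(block1), column_profiles(block2)
    best_score, best_offset = float("-inf"), None
    for offset in range(min_overlap - len(p2), len(p1) - min_overlap + 1):
        pairs = [(p1[i], p2[i - offset]) for i in range(len(p1)) if 0 <= i - offset < len(p2)]
        if len(pairs) < min_overlap:  # only happens if a block is shorter than min_overlap
            continue
        score = sum(pearson(a, b) for a, b in pairs) / len(pairs)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


BLOCK = [
    "SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL",
    "YNPKYEVDLEKLEILEQLGDGQFGLVNRGLL",
    "YNNDYEIDPVNLEILNPIGSGHFGVVKKGLL",
    "YNEDYEIDLENLEILETLGSGQFGIVKKGYL",
    "YKKQYEIASENLENKSILGSGNFGVVRKGIL",
]
shifted = [row[3:20] for row in BLOCK]  # columns 3..19 of the same block
score, offset = compare_blocks(BLOCK, shifted)
print(offset, round(score, 6))  # 3 1.0
```

The resulting pairwise block scores could then be fed to a clustering step such as the threshold-graph sketch for sequences. That is how blocks found in different sequence clusters get merged.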

  14. Result • 963 kinases identified among ~45,000 sequences (~2%) • 159 clusters of kinase sequences, each containing 2 to 32 sequences • 0 to ~1,000 sequences added to each cluster by the iterative HMM search

  15. Result • 71 sequence clusters sent to BlockMaker • Example BlockMaker output:
      ID c51.seq-1 BLOCK
      AC c51.seq-1; distance from previous block=(79,120)
      DE similar to eukaryotic protein kinase domains
      BL EGL motif=[5,0,17] motomat=[1,1,-10] width=31 seqs=5
      gi|3329644|gb|AAC ( 792) SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL 99
      gi|3329650|gb|AAC ( 154) YNPKYEVDLEKLEILEQLGDGQFGLVNRGLL 92
      gi|3877967|emb|CA ( 836) YNNDYEIDPVNLEILNPIGSGHFGVVKKGLL 79
      gi|3877968|emb|CA ( 842) YNEDYEIDLENLEILETLGSGQFGIVKKGYL 77
      gi|3878749|emb|CA ( 129) YKKQYEIASENLENKSILGSGNFGVVRKGIL 100
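Conserved-site detection on a block like this can be sketched by reading the sequence rows and reporting columns dominated by a single residue. This sketch makes three assumptions. First, rows follow the `name ( offset) segment weight` layout shown above, and the header lines (ID, AC, DE, BL) are skipped. Second, `min_fraction=1.0` means the column is identical in every row. Third, the helper names `parse_block_rows` and `conserved_columns` are hypothetical and are not part of BlockMaker.

```python
import re
from collections import Counter

# One sequence row of the block: name, parenthesized offset, segment, weight.
ROW = re.compile(r"^(\S+)\s*\(\s*(\d+)\)\s+([A-Z]+)\s+(\d+)$")


def parse_block_rows(lines: list[str]) -> list[tuple[str, int, str, int]]:
    """Keep only the sequence rows; ID/AC/DE/BL header lines do not match ROW."""
    rows = []
    for line in lines:
        m = ROW.match(line.strip())
        if m:
            name, offset, segment, weight = m.groups()
            rows.append((name, int(offset), segment, int(weight)))
    return rows


def conserved_columns(segments: list[str], min_fraction: float = 1.0) -> list[tuple[int, str]]:
    """(0-based column, residue) pairs whose most common residue reaches min_fraction."""
    sites = []
    for j, column in enumerate(zip(*segments)):
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) >= min_fraction:
            sites.append((j, residue))
    return sites


BLOCK_TEXT = """\
ID c51.seq-1 BLOCK
AC c51.seq-1; distance from previous block=(79,120)
DE similar to eukaryotic protein kinase domains
BL EGL motif=[5,0,17] motomat=[1,1,-10] width=31 seqs=5
gi|3329644|gb|AAC ( 792) SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL 99
gi|3329650|gb|AAC ( 154) YNPKYEVDLEKLEILEQLGDGQFGLVNRGLL 92
gi|3877967|emb|CA ( 836) YNNDYEIDPVNLEILNPIGSGHFGVVKKGLL 79
gi|3877968|emb|CA ( 842) YNEDYEIDLENLEILETLGSGQFGIVKKGYL 77
gi|3878749|emb|CA ( 129) YKKQYEIASENLENKSILGSGNFGVVRKGIL 100
"""

rows = parse_block_rows(BLOCK_TEXT.splitlines())
print(conserved_columns([segment for _, _, segment, _ in rows]))
# [(5, 'E'), (11, 'L'), (12, 'E'), (18, 'G'), (20, 'G'), (22, 'F'), (23, 'G'), (25, 'V'), (28, 'G'), (30, 'L')]
```

On this block, the fully conserved columns include G18, G20, G23 and V25 (0-based). Together these trace the subdomain I pattern G-X-G-X-X-G-X-V. Lowering `min_fraction` (for example to 0.8) also reports columns that are nearly, but not perfectly, conserved.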

  16. Result • 45 clusters of Blocks after LAMA comparison and BAG clustering

  17. Result – Some Known Conserved Sites Found • Cluster 11, size 29: Subdomain I, G-X-G-X-X-G-X-V • Cluster 16, size 97: Subdomain VIB, H-R-D-X-K-X-X-N

  18. Result – Some New Sites • Cluster 20, size = 8 (alignment and motif) • Known: Arg280 (assembly of the catalytic core) • Unknown: Cys, Trp, Pro • Cluster 31, size = 13 (alignment and motif) • Known: Asp220 (assembly of the catalytic loop) • Unknown: Gly, Thr, Tyr • Cluster 40, size = 7 (alignment and motif) • Known: Glu91 (positioning of the triphosphate group) • Unknown: His, Pro

  19. Conclusion • This computational framework is successful, especially when there is no preliminary information about a large number of sequences • It is efficient • It is not completely automatic

  20. Conclusion • Kinases are clustered by similarity, which provides a way to infer a sequence's function from other members of its family • Some new conserved sites were found, which may indicate the functional specificity of kinases

  21. Acknowledgement • Prof. Sun Kim • Prof. Mehmet Dalkilic • Dr. Irfan Gunduz
