Clustering and Motif Discovery in Kinases of Yeast, Worm and Arabidopsis thaliana

Sihui Zhao

Background: Kinase
• Protein kinases play a pivotal role in the control of all cellular processes
• These include cell proliferation, differentiation, adhesion, migration, metabolism and signal transduction
• Each genome contains a kinase superfamily, ~2% of all sequences

Background: Structure of Catalytic Domain
• Also called C-subunit
• Conserved among the protein kinase superfamily
• Contains 250-300 residues
• 12 subdomains

Background: Subdomains of C-subunit
• Two pivotal subdomains (based on PKA):
• Subdomain I: sequesters ATP; Gly-X-Gly-X-X-Gly-X-Val
• Subdomain VIB: "catalytic loop"; His-Arg-Asp-X-Lys-X-X-Asn
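
The two signatures above can be read as simple sequence patterns in which X stands for any residue. The following is a minimal sketch, not part of the original pipeline, that scans a protein sequence for them with regular expressions. The function name `find_subdomain_motifs` and the toy sequence are hypothetical and chosen only for illustration.

```python
import re

# Patterns transcribed from the slide; "X" (any residue) becomes "." in the regex.
SUBDOMAIN_PATTERNS = {
    "I": re.compile(r"G.G..G.V"),    # Subdomain I: Gly-X-Gly-X-X-Gly-X-Val
    "VIB": re.compile(r"HRD.K..N"),  # Subdomain VIB: His-Arg-Asp-X-Lys-X-X-Asn
}


def find_subdomain_motifs(sequence):
    """Return {subdomain: [(start, matched_text), ...]} using 0-based starts."""
    sequence = sequence.upper()
    hits = {}
    for name, pattern in SUBDOMAIN_PATTERNS.items():
        # finditer reports non-overlapping matches, left to right.
        hits[name] = [(m.start(), m.group()) for m in pattern.finditer(sequence)]
    return hits


# Hypothetical toy sequence (not a real kinase), built to contain both motifs.
toy = "MAAGEGSFGRVLLHRDLKPENAA"
print(find_subdomain_motifs(toy))
```

Exact patterns like these only recover sequences that follow the consensus strictly; kinases with substitutions at the fixed positions would be missed, which is one reason profile-based searches are used later in the framework.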

Background: Conserved Residues (figure not reproduced)

Background: Motif
• A motif is a locally conserved region
• Conserved due to higher selection pressure compared to non-conserved regions
• Important to biological function or structure

Background: Problem & Strategy in Motif Discovery
• Motif discovery relies on either statistical or combinatorial pattern search techniques
• Problem: noise is high relative to signal when the number of sequences is huge
• Strategy: use clustering/classification to find sequence families first, which lowers the noise ratio

Objectives
• Cluster kinase sequences into different families
• Find conserved motifs from sequence families

Tools
• BLAST: sequence alignment tool
• ClustalW: multiple alignment tool
• HMMER: HMM-based package
• BAG package: sequence clustering package
• BlockMaker: block/motif discovery tool
• LAMA: alignment tool for blocks
• Perl

Computational Framework: Outline
• Collecting and clustering kinase sequences based on similarity
• The iterative HMM search: to collect more kinases, especially remotely homologous sequences
• Motif discovery: to find blocks from each cluster and merge blocks across multiple clusters

Computational Framework: Collecting and Clustering Sequences
• Cluster kinase sequences
  1. Extract annotated kinase sequences
  2. All-to-all pairwise comparison
  3. Estimate best score for clustering
  4. Cluster sequences using BAG
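
As an illustration of how a score cutoff turns all-to-all comparison results into clusters, here is a minimal sketch. It is not BAG's algorithm, which the slides do not describe; it uses plain connected components (single linkage) as a simplified stand-in. The function name `cluster_by_score`, the sequence identifiers and the scores are all hypothetical.

```python
def cluster_by_score(ids, pair_scores, cutoff):
    """Group ids into clusters: two ids are linked if their score >= cutoff.

    Simplified stand-in for graph-based clustering (connected components
    via union-find), not the method implemented in the BAG package.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, ties broken alphabetically for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical pairwise similarity scores between four sequences.
scores = {("a", "b"): 120, ("b", "c"): 90, ("c", "d"): 30}
print(cluster_by_score(["a", "b", "c", "d"], scores, cutoff=80))
print(cluster_by_score(["a", "b", "c", "d"], scores, cutoff=100))
```

Raising the cutoff splits clusters and lowering it merges them, which is why the step of estimating a suitable score precedes the clustering itself.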

Computational Framework: HMM Iterative Search
• Collect more sequences for each cluster
  1. Multiple alignment using ClustalW
  2. Build HMM/profile
  3. Search all 3 genomes
  4. Add hits to each cluster, if any
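
The loop structure of the iterative search can be sketched as follows. This is a minimal sketch under stated assumptions: `build_profile` and `search` are hypothetical callables standing in for the alignment/HMM-building and genome-search steps, and the toy versions below use shared 3-mers instead of a real HMM so that the example runs on its own.

```python
def iterative_search(seed_members, build_profile, search, max_rounds=10):
    """Grow a cluster by repeated profile building and searching.

    Stops when a round adds no new members or after max_rounds rounds.
    """
    members = set(seed_members)
    for _ in range(max_rounds):
        profile = build_profile(sorted(members))
        new_hits = set(search(profile)) - members
        if not new_hits:
            break  # converged: the profile finds nothing new
        members |= new_hits
    return members


# --- Toy stand-ins (assumptions for illustration, not real HMM tools) ---
DB = {"s1": "ABCDEF", "s2": "CDEFGH", "s3": "EFGHIJ", "s4": "QRSTUV"}


def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_profile(member_ids):
    # "Profile" here is just the union of 3-mers seen in the members.
    profile = set()
    for sid in member_ids:
        profile |= kmers(DB[sid])
    return profile


def search(profile, min_shared=2):
    # A sequence is a hit if it shares at least min_shared 3-mers.
    return {sid for sid, seq in DB.items()
            if len(kmers(seq) & profile) >= min_shared}


print(sorted(iterative_search({"s1"}, build_profile, search)))
```

In the toy data, `s3` shares nothing with the seed `s1` directly and is only reached after `s2` has been added, which mirrors how iteration can pick up remotely homologous sequences.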

Computational Framework: Motif Discovery
• Find blocks and merge across multiple clusters
  1. Block discovery by BlockMaker
  2. All-to-all block comparison by LAMA
  3. Clustering blocks using BAG package
  4. Conserved sites detection

Result
• 963 kinases from ~45,000 sequences (~2%)
• 159 clusters of kinase sequences containing 2 to 32 sequences each
• 0 to ~1000 sequences added to each cluster after HMM iterative search

Result
• 71 sequence clusters sent to BlockMaker
• Example block:
    ID c51.seq-1 BLOCK
    AC c51.seq-1; distance from previous block=(79,120)
    DE similar to eukaryotic protein kinase domains
    BL EGL motif=[5,0,17] motomat=[1,1,-10] width=31 seqs=5
    gi|3329644|gb|AAC ( 792) SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL  99
    gi|3329650|gb|AAC ( 154) YNPKYEVDLEKLEILEQLGDGQFGLVNRGLL  92
    gi|3877967|emb|CA ( 836) YNNDYEIDPVNLEILNPIGSGHFGVVKKGLL  79
    gi|3877968|emb|CA ( 842) YNEDYEIDLENLEILETLGSGQFGIVKKGYL  77
    gi|3878749|emb|CA ( 129) YKKQYEIASENLENKSILGSGNFGVVRKGIL 100
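
To show what conserved-site detection on such a block can look like, here is a minimal sketch that marks the columns of the block above in which every sequence carries the same residue. The slides do not state how conserved sites were actually detected, so the function `conserved_columns` and its `min_fraction` parameter are assumptions made for illustration.

```python
from collections import Counter

# The five aligned segments of block c51.seq-1, copied from the slide.
BLOCK = [
    "SNFNFEFHKDSLEILEPIGSGHFGVVRRGIL",
    "YNPKYEVDLEKLEILEQLGDGQFGLVNRGLL",
    "YNNDYEIDPVNLEILNPIGSGHFGVVKKGLL",
    "YNEDYEIDLENLEILETLGSGQFGIVKKGYL",
    "YKKQYEIASENLENKSILGSGNFGVVRKGIL",
]


def conserved_columns(aligned, min_fraction=1.0):
    """Return a pattern with the dominant residue at conserved columns.

    A column counts as conserved when its most common residue occurs in at
    least min_fraction of the sequences; other columns are written as ".".
    """
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned segments must have the same width")
    pattern = []
    for col in range(width):
        residue, count = Counter(seq[col] for seq in aligned).most_common(1)[0]
        pattern.append(residue if count / len(aligned) >= min_fraction else ".")
    return "".join(pattern)


print(conserved_columns(BLOCK))        # fully conserved columns only
print(conserved_columns(BLOCK, 0.8))   # tolerate one deviating sequence
```

The fully conserved columns of this block include a glycine-rich stretch that fits the Gly-X-Gly-X-X-Gly-X-Val pattern of Subdomain I.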

Result
• 45 clusters of blocks after LAMA comparison and BAG clustering

Result: Some Found Conserved Sites
• Cluster 11, size 29: Subdomain I, G-X-G-X-X-G-X-V
• Cluster 16, size 97: Subdomain VIB, H-R-D-X-K-X-X-N

Result: Some New Sites
• Cluster 20, size 8 (alignment and motif)
  • Known: Arg280, assembly of catalytic core
  • Unknown: Cys, Trp, Pro
• Cluster 31, size 13 (alignment and motif)
  • Known: Asp220, assembly of catalytic loop
  • Unknown: Gly, Thr, Tyr
• Cluster 40, size 7 (alignment and motif)
  • Known: Glu91, positioning triphosphate group
  • Unknown: His, Pro

Conclusion
• This computational framework is successful
• It is especially useful when no preliminary information is available on a huge number of sequences
• It is efficient
• It is not completely automatic

Conclusion
• Kinases are clustered based on similarity, which provides a way to deduce functions from other family members
• Some new conserved sites are found, which might indicate the specificity of kinase functions

Acknowledgement
• Prof. Sun Kim
• Prof. Mehmet Dalkilic
• Dr. Irfan Gunduz
