Global Annotation of the Protein Kinase Family • Michael Gribskov, University of California, San Diego
Signaling Cascades
Statistics • Arabidopsis: 1028 putative kinases; 58 potentially alternatively spliced; 82% confirmed by full-length cDNA; fewer than 100 experimentally investigated • Rice: 1565 putative kinases • What are the functions of each protein kinase? Functional groupings; substrate prediction; pathway analysis and modeling
Targets • Protein kinase • Protein phosphatase • Membrane transporters • Proteasome complex
Some Receptor Kinases • Class I (EGF receptor) • Class II (Insulin receptor) • Class III (FGF receptor)
Requirements for Functional Clustering • Must handle very large number of objects (over 1200 for plants, over 9000 for all species) • Must deal sensibly with paralogs from functional point of view • Must be based on entire sequence, not just kinase catalytic domain • Must be tolerant to sequence errors and omissions
Orthology vs Paralogy • Relationships between genes in multigene families are complex • Multiple genes may exist before speciation • Genes may be lost and replaced along lineages • “Function space” must be filled [Figure: gene trees for Species A and Species B]
Clustering/Classification Maximum linkage
Clustering/Classification • Pairwise distances • All-against-all BLAST • Uses entire sequence • Alignments not required • Longer matches, i.e. more domains, give better score
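The pairwise-distance step can be illustrated with a minimal Python sketch. The slides do not give the formula used to turn BLAST scores into distances, so the normalization here (one minus the pair score divided by the larger self-hit score) is an assumption chosen only so that longer, multi-domain matches yield smaller distances. The function name `blast_distances` and the `(query, subject, bit_score)` input shape are also hypothetical.

```python
from collections import defaultdict


def blast_distances(hits):
    """Turn all-against-all BLAST hits into pairwise distances.

    hits: iterable of (query, subject, bit_score) tuples, including self-hits.
    Returns {(a, b): distance} with a < b; distances lie in [0, 1].
    The normalization is an assumption, not the authors' formula.
    """
    # Keep the best score seen for each ordered (query, subject) pair.
    best = defaultdict(float)
    for query, subject, score in hits:
        best[(query, subject)] = max(best[(query, subject)], score)

    ids = sorted({q for q, _ in best} | {s for _, s in best})
    dist = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # Use the better of the two search directions.
            pair = max(best.get((a, b), 0.0), best.get((b, a), 0.0))
            # Normalize by the larger self-score, so a protein that lacks
            # domains present in its partner stays at a nonzero distance.
            self_max = max(best.get((a, a), 0.0), best.get((b, b), 0.0))
            if self_max <= 0:
                dist[(a, b)] = 1.0
            else:
                dist[(a, b)] = min(1.0, max(0.0, 1.0 - pair / self_max))
    return dist


example_hits = [
    ("A", "A", 100.0), ("B", "B", 200.0), ("C", "C", 100.0),
    ("A", "B", 40.0), ("B", "A", 50.0),
]
example_dist = blast_distances(example_hits)
```

Pairs with no reported hit, such as A and C in the example, get the maximum distance of 1.0, so no multiple alignment is needed at any point.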
Basic Approach • Maximum linkage clustering up to “natural” limit • Recalculate average distances between groups • Repeat until tree is complete
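The loop described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slides do not define the “natural” limit, so it is replaced here by a hypothetical list of distance cutoffs, and the distance between two groups is taken to be the average over all pairs of member sequences. The names `complete_linkage_groups` and `build_tree` are invented for this sketch.

```python
from itertools import combinations


def leaves(node):
    """Leaf names under a node; leaves are strings, groups are tuples."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def complete_linkage_groups(nodes, dist, cutoff):
    """Maximum (complete) linkage: merge the closest two clusters while the
    largest distance between their members stays within the cutoff."""
    clusters = [[n] for n in nodes]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(dist(a, b) for a in clusters[i] for b in clusters[j])
            if d <= cutoff and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]


def build_tree(names, base, cutoffs):
    """Cluster, recalculate average distances between groups, and repeat.

    base: {(a, b): distance} for every unordered pair of leaf names.
    cutoffs: assumed stand-in for the "natural" limit of each round.
    """
    def lookup(a, b):
        return base[(a, b)] if (a, b) in base else base[(b, a)]

    def avg_dist(x, y):
        pairs = [lookup(a, b) for a in leaves(x) for b in leaves(y)]
        return sum(pairs) / len(pairs)

    nodes = list(names)
    # A final infinite cutoff guarantees the tree is completed.
    for cutoff in list(cutoffs) + [float("inf")]:
        groups = complete_linkage_groups(nodes, avg_dist, cutoff)
        nodes = [g[0] if len(g) == 1 else tuple(g) for g in groups]
        if len(nodes) == 1:
            break
    return nodes[0]


example_base = {
    ("A", "B"): 0.1, ("C", "D"): 0.2,
    ("A", "C"): 0.8, ("A", "D"): 0.9,
    ("B", "C"): 0.7, ("B", "D"): 0.8,
}
example_tree = build_tree(["A", "B", "C", "D"], example_base, cutoffs=[0.3])
```

Each round produces one level of the tree, so a group formed in a round may have more than two children. The pairwise scan is quadratic in the number of clusters per merge, which is acceptable for a sketch but would need a smarter data structure for thousands of sequences.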
Statistics • Class 1: RLKs (transmembrane) and RLCKs • Class 2: “Raf-like” • Class 3: Casein Kinase and CLK • Class 4: Non-TM, Non-Receptor
BLAST Distance: Entire Sequence
BLAST Distance: Non-Kinase Domain
SnRK • Arabidopsis AKIN10 and AKIN11 • Rescue yeast SNF1 deletion • Functional homologs
[Figure: cluster tree of yeast and Arabidopsis SNF1-related kinases, annotated E = 10^-80, scale 50; see Fig. 2. Leaf labels in order: GIN4/ERC47/CLA6/D9719.13/YDR507C, KCC4/YCL024W, HSL1/(SEL2)/NIK1/YKL453/YKL101W, SNF1/CAT1/GLC2/CCR1/PAS14/HAF3/D8035.20/YDR477W, At5g39440, At3g29160/AKIN11, At3g01090/AKIN10, At5g58380, At5g07070, At5g01810, At5g45820, At4g30960, At5g25110, At5g10930, At2g25090, At2g30360, At5g01820/AtSR1, At2g38490, At3g23000/AtSR2, At4g14580, At1g01140, At1g30270, At2g26980, At4g24400, At5g35410/SOS2, At1g48260, At3g17510, At5g57630, At1g60940, At1g10940, At5g08590, At5g63650, At2g23030, At1g78290, At3g50500, At5g66880, At4g33950, At4g40010, At1g29230, At2g34180, At4g18700, At5g45810, KIN1/YD9727.17/YDR122W, KIN2/L8004.3/L2546/YLR096W, KIN4/KIN31/(KIN3)/O5220/YOR233W, YPL141C/LPI5, YPL150W/P2597]
Summary • Functional groups by clustering • Functional assignment by transgenomic comparison • Directed search for functional motifs by motif comparison • Construction of public data resources
Bioinformatics Group • Michael Gribskov • Fariba Fana • Degeng Wang • Sheila Podell • Tobey Tam * • Jason Tchieu * • Hannes Niedner • Douglas Smith • Guangfa Zhang *
Major Contributors • Jeff Harper • Catherine Chan • Alice Harmon • Estelle Hrabak • David Kerk • Shinhan Shiu