Development of classification methods to predict new 14-3-3-binding proteins and phosphopeptides
Fábio M. Marques Madeira
Supervisor: Professor Geoff Barton
7th May 2013
14-3-3s dock onto pairs of tandem phosphoSer/Thr
[Diagram labels: 2R-ohnologue families; P, P; Kinase 1; 14-3-3; Kinase 2]
Hundreds of structurally and functionally diverse targets
The binding specificity of 14-3-3s is determined by overall steric fit and the sequence flanking the phosphoSer/Thr site
Mode I: RSX(pS/T)XP
Mode II: RX(F/Y)X(pS)XP
Mode III: C-terminal X(pS/T)
Johnson et al. (2011) Molecular & Cellular Proteomics 10, M110.005751.
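As a rough illustration of how the Mode I and Mode II consensus motifs could be searched for in a plain protein sequence, here is a minimal sketch using regular expressions. It assumes unmodified one-letter sequences, so S/T stand in for the phosphorylated residue, and it leaves out Mode III, which depends on the C-terminus. The function name, the offsets table and the toy sequence are hypothetical and not taken from the slides.

```python
import re

# Regular-expression readings of the Mode I and Mode II motifs.
# Assumption: "X" (any residue) becomes "." and pS/pT are written as S/T.
# The lookahead (?=...) lets overlapping motifs be reported as well.
MOTIFS = {
    "Mode I": re.compile(r"(?=(RS.[ST].P))"),    # RSX(pS/T)XP
    "Mode II": re.compile(r"(?=(R.[FY].S.P))"),  # RX(F/Y)X(pS)XP
}
# 0-based offset of the candidate phosphosite inside each matched motif.
SITE_OFFSET = {"Mode I": 3, "Mode II": 4}


def find_candidate_sites(sequence):
    """Return (mode, 1-based site position, matched motif) tuples."""
    hits = []
    for mode, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            site = match.start(1) + SITE_OFFSET[mode] + 1
            hits.append((mode, site, match.group(1)))
    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    toy = "MARSASEPGGRAFASAPK"  # hypothetical sequence, not a real protein
    for hit in find_candidate_sites(toy):
        print(hit)
```

A strict consensus match like this only captures the sequence part of the specificity; it says nothing about the steric fit mentioned on the slide.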
ANIA: ANnotation and Integrated Analysis of the 14-3-3 interactome
Development and evaluation of three new classifiers
• Position-specific scoring matrix (PSSM)
• Artificial Neural Network (ANN)
• Support Vector Machines (SVM)
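To make the first of the three classifier types concrete, the sketch below builds a generic log-odds PSSM from equal-length peptides and scores a new peptide against it. It is a textbook-style illustration, not the model presented here: the uniform background, the pseudocount of 1 and the function names are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Build a log-odds matrix from equal-length peptides.

    Assumptions: uniform background frequencies and additive pseudocounts.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_peptide(pssm, peptide):
    """Sum per-position log-odds; unknown residues contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, peptide))


if __name__ == "__main__":
    positives = ["RSASEP", "RSHSYP", "RSLSVP"]  # hypothetical peptides
    pssm = build_pssm(positives)
    print(score_peptide(pssm, "RSASEP"), score_peptide(pssm, "GGGGGG"))
```

A peptide that resembles the training examples receives a higher summed score than an unrelated one, and a cutoff on that score turns the matrix into a binary classifier.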
Defining positive and negative examples for training and testing
Training datasets: Current (Pos 93, Neg); Previous (Pos 76, Neg)
[Diagram labels: 72 Proteins; -N, C-; pS/T, pS/T; 1,192 Likely Neg]
Defining positive and negative examples for training and testing
Training datasets: Current (Pos 93, Neg); Previous (Pos 76, Neg); 1,192 Likely Neg
Blind datasets: Previous (17 Pos, 17 Neg); Current (38 Pos, 38 Neg)
• Sequence redundancy thresholds: 60%, 50% and 40%
• Different motif regions/lengths: [-11:11], [-9:9], [-7:7], [-5:5], [-3:3]
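The motif regions listed above are windows of residues around a candidate pS/T site. A minimal sketch of how such windows might be cut from a sequence follows. Padding with "X" at the termini is an assumption, as are the function names; the slides do not say how sites near the N- or C-terminus were handled.

```python
def extract_window(sequence, site, left=5, right=5, pad="X"):
    """Return residues from site-left to site+right (site is 1-based).

    Positions beyond either terminus are filled with `pad` (an assumption).
    """
    if not 1 <= site <= len(sequence):
        raise ValueError("site lies outside the sequence")
    centre = site - 1
    window = []
    for i in range(centre - left, centre + right + 1):
        window.append(sequence[i] if 0 <= i < len(sequence) else pad)
    return "".join(window)


def candidate_windows(sequence, left=5, right=5):
    """Yield (site, window) for every Ser/Thr in the sequence."""
    for i, aa in enumerate(sequence):
        if aa in "ST":
            yield i + 1, extract_window(sequence, i + 1, left, right)


if __name__ == "__main__":
    toy = "MARSASEPGG"  # hypothetical sequence
    print(extract_window(toy, 6, 3, 3))
    print(list(candidate_windows(toy, 1, 1)))
```

Keeping `left` and `right` as separate parameters means the same helper covers both the symmetric regions and any asymmetric one.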
Development and evaluation of three new classifiers
The area under the curve (AUC) was estimated by jackknife testing
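The sketch below shows one common way to obtain such an estimate. It assumes that the curve is the ROC curve and that jackknife means leave-one-out testing, neither of which the slide states. The `train` and `score` callables are hypothetical placeholders for any of the three classifiers.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def jackknife_scores(examples, labels, train, score):
    """Leave-one-out: score each example with a model trained on the rest."""
    held_out = []
    for i in range(len(examples)):
        rest_x = examples[:i] + examples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        model = train(rest_x, rest_y)
        held_out.append(score(model, examples[i]))
    return held_out


if __name__ == "__main__":
    # Toy one-dimensional data; the "model" is the mean of the positives.
    xs = [1.0, 1.2, 0.9, 5.0, 5.5, 4.8]
    ys = [1, 1, 1, 0, 0, 0]

    def train(data, labs):
        pos = [x for x, y in zip(data, labs) if y == 1]
        return sum(pos) / len(pos)

    def score(model, x):
        return -abs(x - model)

    print(auc(jackknife_scores(xs, ys, train, score), ys))
```

Because every example is scored by a model that never saw it, the resulting AUC is less optimistic than one computed on the training data itself.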
Development and evaluation of three new classifiers
Q - Accuracy
MCC - Matthews Correlation Coefficient
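Both measures follow from the confusion-matrix counts (true/false positives and negatives). The sketch below uses the standard definitions; the counts in the usage example are invented for illustration and are not results from this work.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Q: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, ranging from -1 to +1.

    Returns 0.0 when a marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(accuracy(45, 40, 10, 5), mcc(45, 40, 10, 5))
```

Unlike Q, the MCC stays informative when positives and negatives are very unequal in number, which matters when likely negatives far outnumber positives.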
Amino acid alphabet reduction reduces accuracy
Grouping 20 amino acids into 10 physicochemical classes: Livingston and Barton, 1993; Li et al., 2003
• Overall, alphabet reduction led to lower classification performance, suggesting that some sequence features that influence 14-3-3 binding were lost in the reduction.
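To show what alphabet reduction does to a peptide, here is a minimal sketch that maps the 20 amino acids onto 10 classes. The grouping is purely illustrative and is not claimed to be the one from either cited paper; each class is represented by its first member.

```python
# Illustrative 10-class grouping (an assumption, not the cited schemes).
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every residue to the first residue of its class.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_alphabet(peptide, unknown="X"):
    """Rewrite a peptide in the reduced alphabet."""
    return "".join(REDUCE.get(aa, unknown) for aa in peptide)


if __name__ == "__main__":
    # Two different hypothetical peptides collapse to the same string.
    print(reduce_alphabet("RSASEP"), reduce_alphabet("KTATDP"))
```

Under this grouping R and K collapse to one symbol, so a motif position that specifically prefers R can no longer be told apart from one that accepts K. That is the kind of information loss the bullet above refers to.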
Protein secondary structure, disorder and conservation do not improve the performance of the ANN
• Sequence conservation
• Protein secondary structure by Jpred
• Protein disorder by IUPred, DisEMBL and GlobPlot
P – Positives; N – Negatives (true + likely neg); L – Likely neg only; R – Random neg
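One way extra per-residue information could be fed to a neural network is to append it to a one-hot encoding of the motif window. The sketch below illustrates that idea only; the slides do not describe the actual input encoding, and the function names and example values are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(window):
    """Encode each residue as 20 values; unknown residues become all zeros."""
    vector = []
    for aa in window:
        vector.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vector


def encode(window, per_residue_features=None):
    """Append optional per-residue tracks (e.g. a disorder score per position)."""
    vector = one_hot(window)
    for track in per_residue_features or []:
        if len(track) != len(window):
            raise ValueError("each track needs one value per residue")
        vector.extend(float(v) for v in track)
    return vector


if __name__ == "__main__":
    disorder = [0.1, 0.4, 0.8, 0.9, 0.7, 0.3]  # invented values
    print(len(encode("RSASEP", [disorder])))
```

Each added track lengthens the input vector by one value per residue, so the network has more weights to fit from the same number of training examples.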
Blind testing shows that the PSSM is the best overall predictor
80% overall accuracy
Prediction of new 14-3-3-binding sites using the PSSM
Human proteome
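A proteome-wide scan of this kind can be pictured as scoring the window around every Ser/Thr in every protein and keeping those above a cutoff. The sketch below is a generic outline under that assumption: `score_fn`, the threshold, the "X" padding and the toy proteome are hypothetical, and any trained scoring function could be plugged in.

```python
def scan_proteome(proteome, score_fn, threshold, flank=5, pad="X"):
    """Score the window around every S/T and keep hits at or above threshold.

    proteome: dict mapping protein identifier -> one-letter sequence.
    Returns (protein_id, 1-based site, window, score), best score first.
    """
    hits = []
    for protein_id, seq in proteome.items():
        # Pad both termini so every site has a full-length window.
        padded = pad * flank + seq + pad * flank
        for i, aa in enumerate(seq):
            if aa not in "ST":
                continue
            window = padded[i:i + 2 * flank + 1]
            score = score_fn(window)
            if score >= threshold:
                hits.append((protein_id, i + 1, window, score))
    hits.sort(key=lambda h: h[3], reverse=True)
    return hits


if __name__ == "__main__":
    # Toy scoring rule: reward R at -3 and P at +2 around the central site.
    def toy_score(window, flank=3):
        return (window[flank - 3] == "R") + (window[flank + 2] == "P")

    toy_proteome = {"P1": "MARSASEPGG", "P2": "GGGSGGG"}  # hypothetical
    print(scan_proteome(toy_proteome, toy_score, threshold=2, flank=3))
```

Sorting by score puts the strongest candidates first, which is convenient when only a short list can be followed up experimentally.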
The PSSM predictor outperforms Scansite in terms of accuracy
Scansite includes a set of predictions based on the type I 14-3-3-binding motif: RSX(pS/T)XP
[Panels: Scansite; PSSM]
Conclusions
• New strategy to map negative datasets
• Performance improvement (AUC from ~0.80 to 0.88) and 80% accuracy for the PSSM model (60% redundancy threshold, [-5:5] motif region)
• Large-scale prediction of the human 14-3-3-binding proteome
• The PSSM classifier outperforms Scansite in terms of accuracy
Future work
• Test training of the classifiers using non-symmetrical motif regions, e.g. [-6:3]
• Investigate new machine learning algorithms such as Bayesian classifiers
• Use the PSSM classifier to predict the 14-3-3-binding proteome of model organisms such as Arabidopsis thaliana
• Integrate predictions in ANIA and investigate whether the candidate sites are lynchpin sites conserved across 2R-ohnologue family members
Acknowledgements
• Geoff Barton
• Chris Cole
• All members of the Computational Biology group
• Carol MacKintosh and Michele Tinti