correlating graph-theoretical centrality indices with interface residue propensity or: where do things stick together? Stefan Maetschke Teasdale Group
…a bit more specific • Prediction of interface residues • Protein-RNA interfaces • Machine learning methods • Structural information • Graph-topological features
something for the visual cortex Protein-RNA complex Binding site Contact graph [JMol,1R3E_A] [Terribilini et al. 2006] [Jung Library]
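A contact graph like the one pictured can be sketched as follows: residues are nodes, and two residues are linked when they lie within a distance cutoff of each other. The 8 Å cutoff, the use of one coordinate per residue, and the name `contact_graph` are assumptions for illustration; the slides do not state how the graph was built.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Build an undirected contact graph as an adjacency dict.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-alpha atoms).
    cutoff: distance threshold in Angstrom (assumed value, not from the slides).
    """
    graph = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph

# Toy example (hypothetical data): four residues on a line, 5 Angstrom apart.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
toy_graph = contact_graph(toy_coords)
```

With this spacing only direct neighbours on the line are in contact, so the toy graph is a simple path.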
questions Most predictors are sequence based: • What impact does structural information have on prediction accuracy? • Which features are predictive of interface residues?
obvious features Interface residue… • is on surface => Accessible surface area • has to bind => Physico-chemical properties • must be stabilized => Contact graph topology • prefers flat surface => not really • is conserved => maybe not that much
accessible surface area (ASA) http://www.see.ed.ac.uk/~tduren/research/surface_area/ http://www.ysbl.york.ac.uk/~ccp4mg/ccp4mg_help/analysis.html
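The slide only points to illustrations of ASA. As a rough sketch of the idea, the Shrake-Rupley approach places test points on each atom's probe-expanded sphere and counts those not buried inside a neighbouring atom's expanded sphere. This is one common numerical method, not necessarily the one used in the talk; the point count and function names are assumptions.

```python
import math

def sphere_points(n):
    """Roughly uniform points on the unit sphere (golden-spiral layout)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def accessible_surface_area(atoms, probe=1.4, n_points=200):
    """Shrake-Rupley style ASA per atom, in square Angstrom.

    atoms: list of (x, y, z, radius) tuples.
    probe: probe radius in Angstrom (1.4 is the usual value for water).
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            p = (xi + big_r * ux, yi + big_r * uy, zi + big_r * uz)
            # A test point is buried if it falls inside any other expanded sphere.
            buried = any(
                math.dist(p, (xj, yj, zj)) < rj + probe
                for j, (xj, yj, zj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas
```

Per-residue ASA would then be the sum over the atoms of that residue; an isolated atom gets the full area of its expanded sphere.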
physico-chemical properties • AAIndex database • approx. 400 indices • AUC over 144 protein chains: 4304 binding and 27932 non-binding residues; sequence similarity < 30% Hydrophobicity Inside/Outside Conformation Partition Coefficient
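Ranking indices by AUC can be illustrated with a minimal sketch: treat one index value per residue as a score and measure how well it separates binding from non-binding residues. The rank-sum formulation below is a standard way to compute AUC; whether the talk scored single residues or patches is not stated, and the toy values are hypothetical.

```python
def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) formulation; ties get average ranks.

    scores: one numeric value per residue (e.g. a single AAIndex index).
    labels: 1 for binding, 0 for non-binding.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Hypothetical toy data: four residues, two of them binding.
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC of 0.5 means the index carries no information about binding; values toward 1.0 (or toward 0.0, for an inversely related index) indicate a predictive feature.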
patch type comparison • Naïve Bayes • PSI-BLAST Profiles • AUC • 5-fold cross-validation • RB144 data set
betweenness-centrality (BC) s v t http://en.wikipedia.org/wiki/Image:Graph_betweenness.svg
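The betweenness centrality of a node v sums, over all pairs (s, t), the fraction of shortest s-t paths that pass through v. A minimal sketch using Brandes' algorithm for unweighted, undirected graphs is given below; the adjacency-dict input and the unnormalized output are choices made for this illustration, not details taken from the talk.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs.

    graph: dict mapping node -> iterable of neighbours.
    Returns unnormalized scores, each unordered pair (s, t) counted once.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was visited from both endpoints, so halve the totals.
    return {v: score / 2.0 for v, score in bc.items()}

# Toy path graph a - b - c - d: the inner nodes carry all the through-traffic.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
path_bc = betweenness_centrality(path)
```

On a residue contact graph, nodes are residues, so a high score marks a residue that many shortest contact paths run through.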
BC for contact graph • 1FJG_K • AUC = 0.71 • Red: interface residue • Size: betweenness centrality Histogram: binned BC over RB144
combined features • WRC : distance-weighted retention coefficient • BC : betweenness centrality • ASA : accessible surface area • 5-fold cross-validation, RB144 • Patch sizes: sequential->11, topological->19, spatial->19
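The three patch types can be sketched as three ways of choosing the neighbours of a residue: by position in the sequence, by Euclidean distance, or by distance in the contact graph. The exact definitions used in the talk are not given on the slides, so the functions below are assumptions for illustration only; residues are indexed 0..n-1.

```python
import math
from collections import deque

def sequential_patch(i, n_residues, size):
    """Residues in a sequence window of `size` (assumed odd) centred on residue i."""
    half = size // 2
    return [j for j in range(i - half, i + half + 1) if 0 <= j < n_residues]

def spatial_patch(i, coords, size):
    """The `size` residues closest to residue i in space (including i itself)."""
    order = sorted(range(len(coords)), key=lambda j: math.dist(coords[i], coords[j]))
    return order[:size]

def topological_patch(i, graph, size):
    """The first `size` residues reached by breadth-first search from i.

    graph: contact graph as a dict mapping residue -> set of neighbours.
    """
    seen = [i]
    visited = {i}
    queue = deque([i])
    while queue and len(seen) < size:
        v = queue.popleft()
        for w in sorted(graph[v]):
            if w not in visited and len(seen) < size:
                visited.add(w)
                seen.append(w)
                queue.append(w)
    return seen
```

A patch-based predictor would then concatenate or aggregate the per-residue features (for example WRC, BC and ASA) over the residues in the patch.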
summary • Patch size is critical for sequential patches • Spatial/topological patches perform better • Structural information helps – but not much: +5% • Novelty: centrality indices as predictors • SVM superior to NB • Top prediction accuracy – as far as one can tell • Accuracy in general is still low (MCC < 0.4)
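For reference, the Matthews correlation coefficient (MCC) quoted above is computed from the confusion-matrix counts. A minimal sketch follows; returning 0 when the denominator vanishes is a common convention, assumed here.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Class counts from the RB144 slide: 4304 binding, 27932 non-binding.
# A trivial predictor that labels every residue "non-binding":
trivial = mcc(tp=0, tn=27932, fp=0, fn=4304)
```

With these class sizes the trivial predictor is right for roughly 87% of residues yet scores an MCC of 0, which is why MCC is the more informative figure on such imbalanced data.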
what’s next… • Prediction of disease associated SNPs • Graph-spectral methods • Protein function prediction
acknowledgments • Zheng Yuan – Data sets and much more … • Karin Kassahn – Aminoacyl-tRNA synthetases http://en.wikipedia.org/wiki/Aminoacyl_tRNA_synthetase