Computational Analysis of Protein-DNA Interactions. Changhui (Charles) Yan, Department of Computer Science, Utah State University
Problem I: Identifying amino acid residues involved in protein-DNA interactions from sequence
Materials and Methods • 56 double-stranded DNA-binding proteins previously used in the study of Jones et al. (2003) • Encoding
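The encoding itself is not preserved in this transcript. A common way to turn a sequence into per-residue examples is a sliding window centred on each residue; the sketch below assumes that scheme. The half-width of 5 (an 11-residue window), the padding symbol "X" and the function name encode_windows are illustrative assumptions, not details taken from the slides.

```python
from typing import List

PAD = "X"        # assumed padding symbol for positions beyond either chain end
WINDOW_HALF = 5  # assumed half-width: 5 neighbours on each side (11 residues)


def encode_windows(sequence: str, half: int = WINDOW_HALF) -> List[str]:
    """Return one window string per residue, centred on that residue.

    Windows near the termini are padded with PAD so every window has
    length 2 * half + 1 (hypothetical helper, for illustration only).
    """
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + 2 * half + 1] for i in range(len(sequence))]


if __name__ == "__main__":
    # Each window becomes one example; its label is whether the centre
    # residue contacts DNA.
    for window in encode_windows("MKRIAQ", half=2):
        print(window)
```

Each window would then be paired with a binding / non-binding label for its centre residue.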
Naïve Bayes Classifier • Leave-one-out cross-validation
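A minimal sketch of a Naïve Bayes classifier over window features, assuming each window position's residue is conditionally independent of the others given the class. The slide does not say whether one residue or one protein is held out in leave-one-out cross-validation; this sketch holds out one whole protein at a time, which is an assumption. The class WindowNaiveBayes, the Laplace pseudocount and the 21-symbol alphabet (20 amino acids plus a padding symbol) are hypothetical choices for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, int]  # (window string, label: 1 = binding, 0 = non-binding)
ALPHABET_SIZE = 21         # assumed: 20 amino acids + one padding symbol


class WindowNaiveBayes:
    """Naive Bayes over window positions (hypothetical sketch)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace pseudocount (assumed, not from the slides)

    def fit(self, examples: List[Example]) -> "WindowNaiveBayes":
        self.total = len(examples)
        self.class_counts: Dict[int, int] = defaultdict(int)
        # counts[label][position][residue] = number of training windows
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        for window, label in examples:
            self.class_counts[label] += 1
            for pos, residue in enumerate(window):
                self.counts[label][pos][residue] += 1
        return self

    def log_score(self, window: str, label: int) -> float:
        """log P(label) + sum over positions of log P(residue | label), smoothed."""
        n_c = self.class_counts[label]
        score = math.log((n_c + self.alpha) / (self.total + 2 * self.alpha))
        for pos, residue in enumerate(window):
            count = self.counts[label][pos][residue]
            score += math.log((count + self.alpha) / (n_c + self.alpha * ALPHABET_SIZE))
        return score

    def predict(self, window: str) -> int:
        """Predict 1 (binding) when the binding class has the higher score."""
        return 1 if self.log_score(window, 1) > self.log_score(window, 0) else 0


def leave_one_protein_out(proteins: Dict[str, List[Example]]) -> Dict[str, List[int]]:
    """Hold out each protein in turn, train on the rest, predict the held-out one."""
    predictions = {}
    for held_out, test_examples in proteins.items():
        training = [ex for name, exs in proteins.items() if name != held_out for ex in exs]
        model = WindowNaiveBayes().fit(training)
        predictions[held_out] = [model.predict(window) for window, _ in test_examples]
    return predictions


if __name__ == "__main__":
    toy = {  # tiny made-up data set: 3-residue windows
        "p1": [("KRK", 1), ("LAV", 0)],
        "p2": [("KRR", 1), ("LAV", 0)],
        "p3": [("KRK", 1), ("LAV", 0)],
    }
    print(leave_one_protein_out(toy))
```

Holding out whole proteins keeps windows from the same chain out of both training and test sets, which avoids overly optimistic estimates from near-identical neighbouring windows.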
Predictions in the Context of 3-D Structures: Pit-1 (PDB 1au7) • TP: 30, FP: 16, TN: 86, FN: 14 • CC: 0.51 (ranked 2nd) • Accuracy: 79% • Figure panels: Actual, Predicted
Predictions in the Context of 3-D Structures: λ-Cro (PDB 6cro) • TP: 10, FP: 5, TN: 34, FN: 10 • CC: 0.37 (ranked 19th) • Accuracy: 73% • Figure panels: Predicted, Actual
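The per-protein figures above are derived from the four confusion-matrix counts. A minimal sketch, assuming "CC" denotes the Matthews correlation coefficient (the slides do not spell this out); the function names are hypothetical.

```python
import math


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all residues whose binding label is predicted correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def correlation_coefficient(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient (assumed meaning of CC); 0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Pit-1 counts from the slide: TP=30, FP=16, TN=86, FN=14
    print(round(accuracy(30, 16, 86, 14), 2))
```

With the Pit-1 counts, accuracy(30, 16, 86, 14) evaluates to 116/146 ≈ 0.79. Unlike accuracy, the correlation coefficient stays near 0 for a predictor that labels every residue non-binding, which is why it is used to rank proteins here.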
Predictions Compared With PROSITE Motifs • Predicted binding sites substantially overlap with 34 of the 37 “DNA-binding” PROSITE motifs • In 52 of the 56 proteins, the predictor identifies at least 20% of the DNA-binding residues • 28 of the 56 proteins contain no PROSITE motifs that are annotated as “DNA-binding”
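One way to measure such overlap, assuming motif locations come from scanning each sequence with the PROSITE pattern and overlap is the fraction of motif residues predicted as binding. The slide does not define "substantially overlap" numerically. The function names and the toy pattern are hypothetical, and the translator covers only common PROSITE syntax (x, [..], {..}, (n), (n,m), and the < and > anchors).

```python
import re
from typing import Set

# A core element followed by an optional repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(\[[^\]]+\]|\{[^}]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate common PROSITE pattern syntax into a Python regular expression.

    Constructs such as '>' inside brackets are not supported in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        if high is not None:
            core += "{" + low + "," + high + "}"
        elif low is not None:
            core += "{" + low + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def motif_positions(sequence: str, pattern: str) -> Set[int]:
    """0-based positions covered by non-overlapping matches of the pattern."""
    positions: Set[int] = set()
    for hit in re.finditer(prosite_to_regex(pattern), sequence):
        positions.update(range(hit.start(), hit.end()))
    return positions


def overlap_fraction(predicted: Set[int], motif: Set[int]) -> float:
    """Fraction of motif residues that the predictor labels as binding."""
    return len(predicted & motif) / len(motif) if motif else 0.0


if __name__ == "__main__":
    motif = motif_positions("MAAKAGLLL", "<M-x(2)-[KR]-{P}-G.")
    print(sorted(motif), overlap_fraction({3, 4, 5, 7}, motif))
```

Note that re.finditer reports non-overlapping matches only, whereas a full PROSITE scan can report overlapping hits.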
Comparison With Previous Study* • *Ahmad, S. and Sarai, A. (2005) PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics, 6, 33.
Summary • A simple sequence-based Naïve Bayes classifier predicts interface residues in DNA-binding proteins with 75% accuracy, 37% specificity+, 53% sensitivity+ and a correlation coefficient of 0.29 • Predicted binding sites: • correctly indicate the locations of actual binding sites • substantially overlap with known PROSITE motifs
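A minimal sketch of the two "+" metrics, assuming the definitions commonly used in residue-level binding-site prediction: specificity+ = TP/(TP+FP) and sensitivity+ = TP/(TP+FN), where "+" marks that they are computed for the positive (binding) class. The slides do not say whether the summary pools counts across proteins or averages per-protein values; this sketch pools them, which is an assumption. The helper names are hypothetical.

```python
from typing import Iterable, Tuple

Counts = Tuple[int, int, int, int]  # (TP, FP, TN, FN) for one protein


def pool(confusions: Iterable[Counts]) -> Counts:
    """Sum per-protein confusion matrices into one overall matrix."""
    tp = fp = tn = fn = 0
    for a, b, c, d in confusions:
        tp += a
        fp += b
        tn += c
        fn += d
    return tp, fp, tn, fn


def specificity_plus(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of predicted binding residues that truly bind DNA."""
    return tp / (tp + fp) if tp + fp else 0.0


def sensitivity_plus(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of true binding residues that the predictor finds."""
    return tp / (tp + fn) if tp + fn else 0.0


if __name__ == "__main__":
    # Illustration only: the Pit-1 and lambda-Cro counts, not all 56 proteins.
    pooled = pool([(30, 16, 86, 14), (10, 5, 34, 10)])
    print(pooled, specificity_plus(*pooled), sensitivity_plus(*pooled))
```

Pooling weights large proteins more heavily than averaging per-protein values does, so the two choices can give different summary numbers.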
Problem II: Identification of Helix-Turn-Helix (HTH) DNA-binding motifs
HTH Motifs • Sequences with low sequence similarity can fold into similar HTH structures • Identifying HTH motifs from sequence alone is therefore extremely challenging
Trick 1 • Include more information: • Amino acid sequence • Secondary structure
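One simple way to feed both tracks to a sequence model is to merge each (residue, secondary-structure state) pair into a single symbol from a joint alphabet. This is a sketch under stated assumptions: the 3-state alphabet "HEC" and the function joint_symbols are hypothetical, and the slides do not say that HMM_AA_SS uses a joint alphabet.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def joint_symbols(aa: str, ss: str) -> List[int]:
    """Map each (residue, SS state) pair to one index in a 20 x 3 joint alphabet."""
    if len(aa) != len(ss):
        raise ValueError("sequence and secondary-structure strings must align")
    return [
        AMINO_ACIDS.index(residue) * len(SS_STATES) + SS_STATES.index(state)
        for residue, state in zip(aa, ss)
    ]


if __name__ == "__main__":
    print(joint_symbols("AC", "HE"))  # A in a helix, C in a strand
```

A joint alphabet of 60 symbols needs more training data per symbol than treating the two tracks separately, which is the trade-off behind the factored alternative.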
Hidden Markov Model (HMM) LQQITHIANQL-GLE----KDVVRVWF
Hidden Markov Model (HMM_AA_SS) LQQITHIANQL-GLE----KDVVRVWF HHHEEHEEEHMHE----HHEEMMEH
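A minimal sketch of scoring an aligned residue track together with its secondary-structure track using the forward algorithm, assuming each hidden state emits a residue and an SS label independently, so that the emission probability is e_k(a, s) = emit_aa[k][a] · emit_ss[k][s]. This is a plain HMM, not a full profile HMM with match, insert and delete states, and gap columns are not handled; all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Probs = Dict[str, Dict[str, float]]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values: Sequence[float]) -> float:
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_log_likelihood(aa: str, ss: str, states: List[str], start: Dict[str, float],
                           trans: Probs, emit_aa: Probs, emit_ss: Probs) -> float:
    """log P(aa, ss) via the forward algorithm, with factored AA x SS emissions."""
    if len(aa) != len(ss) or not aa:
        raise ValueError("need two non-empty, equal-length tracks")

    def log_emit(k: str, a: str, s: str) -> float:
        return _log(emit_aa[k].get(a, 0.0)) + _log(emit_ss[k].get(s, 0.0))

    # alpha[k] = log P(symbols up to t, hidden state at t = k)
    alpha = {k: _log(start[k]) + log_emit(k, aa[0], ss[0]) for k in states}
    for a, s in zip(aa[1:], ss[1:]):
        alpha = {
            k: log_emit(k, a, s) + _logsumexp([alpha[j] + _log(trans[j][k]) for j in states])
            for k in states
        }
    return _logsumexp(list(alpha.values()))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long motif.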
Trick 2 • There are similarities among the 20 naturally occurring amino acids • Reduced alphabets
Reduced Alphabets • Schemes for reducing the amino acid alphabet, derived from the BLOSUM50 matrix (Henikoff and Henikoff, 1992) by grouping amino acids and averaging the corresponding similarity matrix elements (Murphy et al. 2000)
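The mechanics of a reduced alphabet can be sketched in two steps: recode each residue by its group label, and score a pair of groups by averaging the similarity-matrix entries over all residue pairs drawn from the two groups. The 4-group scheme below is hypothetical and only illustrates the mechanics; the BLOSUM50-derived groupings are tabulated in Murphy et al. (2000). The function names and the toy matrix are assumptions.

```python
from typing import Dict, Mapping

# Hypothetical 4-group scheme for illustration only (label -> member residues).
EXAMPLE_GROUPS: Dict[str, str] = {
    "h": "LVIMC",
    "s": "AGSTP",
    "a": "FYW",
    "p": "EDNQKRH",
}


def reduce_sequence(sequence: str, groups: Mapping[str, str] = EXAMPLE_GROUPS) -> str:
    """Replace every residue by the label of the group that contains it."""
    lookup = {aa: label for label, members in groups.items() for aa in members}
    return "".join(lookup[aa] for aa in sequence)


def averaged_score(matrix: Mapping[str, Mapping[str, float]], group_a: str, group_b: str) -> float:
    """Mean similarity over all pairs (a, b) with a in group_a and b in group_b."""
    scores = [matrix[a][b] for a in group_a for b in group_b]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(reduce_sequence("MKRIAQ"))
    toy_matrix = {"A": {"A": 4, "G": 0}, "G": {"A": 0, "G": 6}}  # made-up values
    print(averaged_score(toy_matrix, "AG", "AG"))
```

Fewer symbols mean fewer emission parameters to estimate, which can help a model generalise across families whose sequences share little identity.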
Cross-Family Evaluations • True positive: HTH motifs that are correctly identified as such. • False positive: Non-HTH motifs that are incorrectly identified as HTH motifs. • Alphabet: the alphabet used to encode amino acid sequences.
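A sketch of the evaluation loop, assuming "cross-family" means that each HTH family is held out in turn so that the test motifs share no family with the training set, and that a test sequence is called HTH when its model score reaches a threshold. The family names, scores and threshold are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_family_out(
    families: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held-out family, training sequences, test sequences) for each family."""
    for held_out in sorted(families):
        train = [seq for fam, seqs in families.items() if fam != held_out for seq in seqs]
        yield held_out, train, families[held_out]


def count_tp_fp(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """scored: (model score, is_real_HTH) pairs; a score >= threshold is called HTH."""
    tp = sum(1 for score, is_hth in scored if score >= threshold and is_hth)
    fp = sum(1 for score, is_hth in scored if score >= threshold and not is_hth)
    return tp, fp


if __name__ == "__main__":
    for fam, train, test in leave_one_family_out({"f1": ["s1", "s2"], "f2": ["s3"]}):
        print(fam, train, test)
    print(count_tp_fp([(5.0, True), (3.0, False), (1.0, True)], threshold=2.0))
```

Sweeping the threshold and recording (TP, FP) at each value traces the trade-off used to compare encodings or methods.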
Comparisons of HMM_AA_SS with FFAS03 in Cross-Family Evaluations