Prediction of Protein Inter-Domain Linkers Using Compositional Index and Simulated Annealing. Maad Shatnawi and Nazar Zaki, College of Information Technology, United Arab Emirates University (UAEU), UAE. nzaki@uaeu.ac.ae. Amsterdam, The Netherlands, July 06-10, 2013.
Outline • Introduction • Existing methods • Proposed solution • Method • Compositional index • SA optimization • Experimental results • Conclusion and future directions
Introduction • Proteins have two types of segments: domains and linkers • Predicting inter-domain linkers is important because it enables: • Accurate identification of functional domains • Lower computational cost • Downstream tasks such as protein classification, protein-protein interaction (PPI) prediction, fold prediction, transmembrane prediction, etc.
Proposed solution • Our approach consists of two main steps: • Calculation of the compositional index • Employing Simulated Annealing to refine the prediction
Compositional index • Calculate the averaged compositional index values • [Figure: averaged compositional index profile with Domain and Linker (12-35) regions marked, Threshold = 0]
Compositional index (Illustration) >1LGH:B (AERSLSGLTEEEAIAVHDQFKTTFSAFIILAAVAHVLVWVWKPWF) • Window size 5.
Compositional index (Illustration) • A dynamic threshold is needed
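The windowed averaging illustrated above (window size 5) can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the function names `windowed_index` and `classify`, the per-residue table passed as `residue_index`, the truncated window at the sequence ends, and the rule "below the threshold means linker" are all assumptions made for illustration. How each residue's index is derived from the training data is not shown on the slides and is left out here.

```python
def windowed_index(seq, residue_index, window=5):
    """Average per-residue index values over a centered sliding window.

    residue_index: dict mapping an amino-acid letter to its index value
    (hypothetical values; the slides do not list them).
    """
    half = window // 2
    values = [residue_index.get(aa, 0.0) for aa in seq]
    averaged = []
    for i in range(len(values)):
        # Assumption: the window is truncated at the sequence ends.
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged


def classify(averaged, threshold=0.0):
    """Label each position 'L' (linker) or 'D' (domain).

    Assumption: values below the threshold are called linker.
    """
    return ["L" if v < threshold else "D" for v in averaged]


if __name__ == "__main__":
    # Sequence taken from the slide; the index table is made up.
    seq = "AERSLSGLTEEEAIAVHDQFKTTFSAFIILAAVAHVLVWVWKPWF"
    toy_index = {"E": -0.4, "S": -0.3, "T": -0.2, "V": 0.3, "W": 0.4}
    profile = windowed_index(seq, toy_index, window=5)
    print("".join(classify(profile, threshold=0.0)))
```

With a single fixed threshold, every position is judged against the same cut-off, which is the limitation that motivates the per-segment dynamic thresholds.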
Why Simulated Annealing (SA)? • A protein sequence is seen as a set of sequence chunks. • Each chunk has its own dynamic threshold value. • Finding these values is a search problem over a set of dynamic thresholds. • In other terms: partitioning a given set of positive real numbers into k subsets (k is unknown) so as to maximize an objective function. • SA is known to be well suited to partitioning problems. • Customizing SA for this task is straightforward.
SA Customization • SA is a probabilistic search method for the global optimization of a given function in a large search space. • It is inspired by annealing in metallurgy: heating and then slowly cooling a metal to increase the size of its crystals and reduce their defects. • It can avoid being trapped in local optima. • SA algorithms usually outperform greedy algorithms on problems that have numerous local optima.
SA Optimization • Divide each protein sequence into segments; the segment size is set to the average linker size in the dataset. • Start from a random threshold value for each segment (starting 0.1). • Calculate the AA compositional index of the input protein sequence. • Classify each AA as linker or domain by comparing its compositional index value with the threshold of its segment. • Calculate recall and precision. • Randomly increase or decrease the threshold value of a segment. • SA accepts or rejects the transition so as to maximize both the recall and the precision of the linker segment prediction. • [Figure: optimal threshold values for the XYNA_THENE protein sequence in the DomCut dataset, which contains 133 AAs]
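The steps above can be sketched as a small annealing loop. This is a hedged illustration only: the slides do not state the objective function, the acceptance rule, the cooling schedule, or the step size, so the sum of recall and precision, the Metropolis acceptance test, geometric cooling, and every parameter value below are assumptions. The names `anneal_thresholds`, `classify`, and `score` are hypothetical, and residues are assumed to be called linker when their index falls below the segment threshold.

```python
import math
import random


def classify(index, thresholds, seg_len):
    """True = linker. Residue i uses the threshold of segment i // seg_len."""
    return [v < thresholds[i // seg_len] for i, v in enumerate(index)]


def score(pred, truth):
    """Assumed objective: recall + precision at the residue level."""
    tp = sum(p and t for p, t in zip(pred, truth))
    recall = tp / sum(truth) if any(truth) else 0.0
    precision = tp / sum(pred) if any(pred) else 0.0
    return recall + precision


def anneal_thresholds(index, truth, seg_len, steps=2000,
                      t0=1.0, cooling=0.995, step=0.05, seed=0):
    rng = random.Random(seed)
    n_seg = math.ceil(len(index) / seg_len)
    # Assumption: random starting thresholds in a small range around zero.
    current = [rng.uniform(-0.1, 0.1) for _ in range(n_seg)]
    cur_score = score(classify(index, current, seg_len), truth)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(steps):
        # Randomly raise or lower the threshold of one segment.
        cand = list(current)
        cand[rng.randrange(n_seg)] += rng.choice((-step, step))
        cand_score = score(classify(index, cand, seg_len), truth)
        delta = cand_score - cur_score
        # Metropolis rule (assumed): always accept improvements, accept
        # worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
        temp *= cooling  # geometric cooling (assumed)
    return best, best_score
```

Accepting some worsening moves while the temperature is high is what lets the search leave a poor set of thresholds; as the temperature falls, the loop behaves more and more like greedy hill climbing.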
Evaluation Measures • Recall is the proportion of correctly predicted linkers to all of the structure-derived linkers listed in the dataset • Precision is defined as the proportion of correctly predicted linkers to all of the predicted linkers
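As a worked example of the two measures, the sketch below counts true positives (TP), false positives (FP), and false negatives (FN), then computes recall = TP / (TP + FN) and precision = TP / (TP + FP). Counting at the level of individual residues is an assumption; the slides do not say whether predictions are scored per residue or per linker segment. The function name `recall_precision` is hypothetical.

```python
def recall_precision(predicted, actual):
    """Return (recall, precision) for boolean linker labels.

    predicted[i] / actual[i] are True when position i is labeled linker.
    Assumption: scoring is done per residue.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    # Four positions predicted as linker, two of them truly linker:
    # TP = 2, FP = 2, FN = 0.
    predicted = [True, True, True, True]
    actual = [True, True, False, False]
    print(recall_precision(predicted, actual))
```

In this toy case every true linker position is found, but half of the predicted positions are wrong, which is why the two measures are reported together.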
Experimental Results Datasets
Experimental Results Applying the proposed method on Dataset (1)
Experimental Results Applying the proposed method on Dataset (2)
Conclusion • We examined the amino acid compositional index to predict protein inter-domain linker segments from amino acid sequence information. • We employed simulated annealing to improve the prediction by finding the optimal set of threshold values that separate domains from linker segments. • Experimental results show that the proposed method outperformed the currently available approaches for inter-domain linker prediction in terms of recall and precision.
Conclusion • This work can be extended by examining different sliding window sizes in computing the AA compositional index. • Additional SA parameter tuning and the use of dynamic segment sizes can be explored. • Combining the compositional index with other features such as PSSM, AA physicochemical properties, and hydrophobicity can be examined.