
College of Information Technology United Arab Emirates University (UAEU) UAE nzaki@uaeu.ac.ae

Prediction of Protein Inter-Domain Linkers Using Compositional Index and Simulated Annealing. Maad Shatnawi and Nazar Zaki. College of Information Technology, United Arab Emirates University (UAEU), UAE. nzaki@uaeu.ac.ae. Amsterdam, The Netherlands, July 06-10, 2013.


Presentation Transcript


  1. Prediction of Protein Inter-Domain Linkers Using Compositional Index and Simulated Annealing • Maad Shatnawi and Nazar Zaki • College of Information Technology, United Arab Emirates University (UAEU), UAE • nzaki@uaeu.ac.ae • Amsterdam, The Netherlands, July 06-10, 2013

  2. Outline • Introduction • Existing methods • Proposed solution • Method • Compositional index • SA optimization • Experimental results • Conclusion and future directions

  3. Introduction • Proteins have two types of segments: domains and linkers • Predicting inter-domain linkers is important for: • Accurate identification of functional domains • Lower computational cost • Downstream tasks such as protein classification, protein-protein interaction (PPI) prediction, fold prediction, and transmembrane analysis

  4. Existing methods

  5. Proposed solution • Our approach consists of two main steps: • Calculation of the compositional index • Employing Simulated Annealing to refine the prediction

  6. Compositional index Calculate the averaged compositional index values

  7. Compositional index • Calculate the averaged compositional index values • [Figure: Domain and Linker (12-35) regions, Threshold = 0]

  8. Compositional index

  9. Compositional index (Illustration) >1LGH:B (AERSLSGLTEEEAIAVHDQFKTTFSAFIILAAVAHVLVWVWKPWF) • Window size 5.

  10. Compositional index (Illustration) >1LGH:B (AERSLSGLTEEEAIAVHDQFKTTFSAFIILAAVAHVLVWVWKPWF) • Window size 5.

  11. Compositional index (Illustration) >1LGH:B (AERSLSGLTEEEAIAVHDQFKTTFSAFIILAAVAHVLVWVWKPWF) • Window size 5.

  12. Compositional index (Illustration) >1LGH:B (AERSLSGLTEEEAIAVHDQFKTTFSAFIILAAVAHVLVWVWKPWF) • Window size 5.

  13. Compositional index (Illustration) Dynamic threshold is needed
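
The windowed averaging illustrated on slides 9 to 12 can be sketched roughly as follows. The sequence and the window size of 5 come from the slides; everything else is an assumption. The slides do not give the formula for the per-residue index, so the values in `toy_index` are hypothetical placeholders, and `windowed_index` and `classify` are illustrative names rather than the authors' code. The sketch also assumes that windows are truncated at the sequence ends and that values below the threshold are called linker.

```python
from typing import Dict, List


def windowed_index(seq: str, index_by_residue: Dict[str, float],
                   window: int = 5) -> List[float]:
    """Average a per-residue index over a window centred on each position.

    Assumption: windows are truncated at the sequence ends.
    """
    half = window // 2
    # Residues missing from the table get 0.0 (an arbitrary placeholder choice).
    raw = [index_by_residue.get(aa, 0.0) for aa in seq]
    averaged = []
    for i in range(len(raw)):
        lo = max(0, i - half)
        hi = min(len(raw), i + half + 1)
        averaged.append(sum(raw[lo:hi]) / (hi - lo))
    return averaged


def classify(averaged: List[float], threshold: float = 0.0) -> List[str]:
    """Label each position 'L' (linker) or 'D' (domain).

    Assumption: values below the threshold are treated as linker.
    """
    return ["L" if v < threshold else "D" for v in averaged]


# Sequence 1LGH:B from the slides.
SEQ = "AERSLSGLTEEEAIAVHDQFKTTFSAFIILAAVAHVLVWVWKPWF"
# Hypothetical per-residue index values, for illustration only.
toy_index = {"E": -0.3, "S": -0.2, "T": -0.1, "A": 0.1, "L": 0.2, "V": 0.3}

avg = windowed_index(SEQ, toy_index, window=5)
labels = classify(avg, threshold=0.0)
```

With a single global threshold, every position is judged against the same cut-off, which is the limitation that motivates the dynamic threshold on slide 13.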

  14. Why Simulated Annealing (SA)? • A protein sequence is seen as a set of sequence chunks. • Each chunk has its own dynamic threshold value. • This is a search problem over a set of dynamic threshold values. • In other terms: partitioning a given set of positive real numbers into k subsets (k is unknown) so as to maximize an objective function. • SA is known to be well suited to partitioning problems. • Customizing SA for this problem is straightforward.

  15. SA Customization • SA is a probabilistic search method for the global optimization of a given function in a large search space. • It is inspired by annealing, the heating and controlled cooling of a metal to increase the size of its crystals and reduce their defects. • It is able to avoid being trapped in local optima. • SA algorithms usually do better than greedy algorithms on problems that have numerous locally optimal solutions.

  16. SA Optimization • Divide each protein sequence into segments. The segment size was set to the average linker size in the dataset. • Start from a random threshold value for each segment (starting 0.1). • Calculate the AA compositional index of the input protein sequence. • Classify each AA as linker or domain according to its compositional index value with respect to the corresponding segment threshold. • Calculate recall and precision. • Randomly increase or decrease the threshold value of a segment. • SA accepts or rejects the transition in order to maximize both the recall and precision of the linker segment prediction. • [Figure: Optimal threshold values for XYNA_THENE protein sequence in DomCut dataset which contains 133 AAs]
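
The loop on this slide can be sketched as below, under several assumptions the slides do not confirm: the objective is the F1 score (one way to reward recall and precision together), cooling is geometric, each move shifts one segment threshold by a fixed `step_size`, residues below their segment threshold are called linker, and every threshold starts at 0.1. All function and parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def f1_score(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall over per-residue linker labels."""
    tp = sum(p and t for p, t in zip(pred, truth))
    if tp == 0:
        return 0.0
    precision = tp / sum(pred)
    recall = tp / sum(truth)
    return 2 * precision * recall / (precision + recall)


def predict(index: Sequence[float], thresholds: Sequence[float],
            seg_len: int) -> List[bool]:
    """Residue i belongs to segment i // seg_len; True means linker."""
    return [v < thresholds[i // seg_len] for i, v in enumerate(index)]


def anneal_thresholds(index: Sequence[float], truth: Sequence[bool],
                      seg_len: int, steps: int = 2000, t_start: float = 1.0,
                      cooling: float = 0.995, step_size: float = 0.05,
                      seed: int = 0) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    n_seg = math.ceil(len(index) / seg_len)
    current = [0.1] * n_seg  # assumed starting value for every segment
    current_score = f1_score(predict(index, current, seg_len), truth)
    best, best_score = list(current), current_score
    temp = t_start
    for _ in range(steps):
        # Randomly increase or decrease the threshold of one segment.
        cand = list(current)
        k = rng.randrange(n_seg)
        cand[k] += rng.choice((-1, 1)) * step_size
        cand_score = f1_score(predict(index, cand, seg_len), truth)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = list(current), current_score
        temp *= cooling
    return best, best_score


# Toy data, for illustration only: six residues, linker at positions 2 and 3.
toy_index = [0.5, 0.5, 0.2, 0.2, 0.5, 0.5]
toy_truth = [False, False, True, True, False, False]
thresholds, score = anneal_thresholds(toy_index, toy_truth, seg_len=2)
```

Accepting some worse moves early on is what lets the search leave a local optimum; a purely greedy variant would drop the `math.exp` branch and accept only improvements.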

  17. Evaluation Measures • Recall is the proportion of correctly predicted linkers to all of the structure-derived linkers listed in the dataset • Precision is defined as the proportion of correctly predicted linkers to all of the predicted linkers
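
As a small worked example of these two measures, the sketch below counts residue positions. Treating linkers as sets of positions is an assumption made for illustration, and `recall_precision` is a hypothetical helper name.

```python
from typing import Set, Tuple


def recall_precision(predicted: Set[int], actual: Set[int]) -> Tuple[float, float]:
    """Return (recall, precision) for predicted vs. structure-derived linker positions."""
    correct = len(predicted & actual)
    recall = correct / len(actual) if actual else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision


# Hypothetical positions: 3 of the 5 true linker residues are found,
# and 3 of the 4 predicted residues are correct.
recall, precision = recall_precision({3, 4, 5, 6}, {4, 5, 6, 7, 8})
# recall is 0.6 and precision is 0.75
```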

  18. Experimental Results Datasets

  19. Experimental Results Applying the proposed method on Dataset (1)

  20. Experimental Results Applying the proposed method on Dataset (2)

  21. Conclusion • We examined the amino acid compositional index to predict protein inter-domain linker segments from amino acid sequence information. • We employed simulated annealing to improve the prediction by finding the optimal set of threshold values that separate domains from linker segments. • Experimental results show that the proposed method outperformed the currently available approaches for inter-domain linker prediction in terms of recall and precision.

  22. Conclusion • This work can be extended by examining different sliding window sizes in computing the AA compositional index. • Additional SA parameter tuning and the use of dynamic segment sizes can be explored. • Combining the compositional index with other features, such as PSSM, AA physicochemical properties, and hydrophobicity, can be examined.

  23. Thank you
