1 / 26

Protein Local 3D Structure Prediction by Super Granule Support Vector Machines (Super GSVM)

Protein Local 3D Structure Prediction by Super Granule Support Vector Machines (Super GSVM). Dr. Bernard Chen Assistant Professor Department of Computer Science University of Central Arkansas Fall 2009. Goal of the Dissertation.

jenny
Download Presentation

Protein Local 3D Structure Prediction by Super Granule Support Vector Machines (Super GSVM)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Protein Local 3D Structure Prediction by Super Granule Support Vector Machines (Super GSVM) Dr. Bernard Chen Assistant Professor Department of Computer Science University of Central Arkansas Fall 2009

  2. Goal of the Dissertation • The main purpose is trying to obtain and extract protein sequence motifs information which are universally conserved and across protein family boundaries. • And then use these information to do Protein Local 3D Structure Prediction

  3. ResearchFlow Part1 Bioinformatics Knowledge and Dataset Collection Part2 Discovering Protein Sequence Motifs Part3 Motif Information Extraction Part4 Protein Local Tertiary Structure Prediction

  4. Data set

  5. HSSP matrix: 1b25

  6. HSSP matrix: 1b25

  7. HSSP matrix: 1b25

  8. Representation of Segment • Sliding window size: 9 • Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus additional nine corresponding secondary structure information obtained from DSSP. • More than 560,000 segments (413MB) are generated by this method. • DSSP: Obtain 2nd Structure information

  9. ResearchFlow Part1 Bioinformatics Knowledge and Dataset Collection Part2 Discovering Protein Sequence Motifs Part3 Motif Information Extraction Part4 Protein Local Tertiary Structure Prediction

  10. Original dataset Fuzzy C-Means Clustering Information Granule 1 ... Information Granule M New Improved or Greedy K-means Clustering ... New Improved or Greedy K-means Clustering Join Information Final Sequence Motifs Information Granular Computing Model

  11. Reduce Time-complexity Wei’s method: 1285968 sec (15 days) * 6 = 7715568 sec (90 days) Granular Model: 154899 sec + 231720 sec * 6 = 1545219 sec (18 days) (FCM exe time) (2.7 Days)

  12. Comparison of Quality Measures

  13. ResearchFlow Part1 Bioinformatics Knowledge and Dataset Collection Part2 Discovering Protein Sequence Motifs Part3 Motif Information Extraction Part4 Protein Local Tertiary Structure Prediction

  14. Super GSVM-FE Motivation • First, the information we try to generate is about sequence motifs, but the original input data are derived from whole protein sequences by a sliding window technique; • Second, during fuzzy c-means clustering, it has the ability to assign one segment to more than one information granule.

  15. Original dataset Fuzzy C-Means Clustering Information Granule 1 Information Granule M ... Five iterations of traditional K-maens Five iterations of traditional K-maens ... Greedy K-means Clustering Greedy K-means Clustering … For Each Cluster … For Each Cluster ... Ranking SVM Feature Elimination Ranking SVM Feature Elimination For Each Cluster For Each Cluster … … ... Collect Survived Segments Collect Survived Segments ... Greedy K-means Clustering Greedy K-means Clustering Join Information Final Sequence Motifs Information Super GSVM-FE Additional Portion

  16. Extracted Motif Information

  17. ResearchFlow Part1 Bioinformatics Knowledge and Dataset Collection Part2 Discovering Protein Sequence Motifs Part3 Motif Information Extraction Part4 Protein Local Tertiary Structure Prediction

  18. 3D information • 3D information is generated from PDB (Protein Data Bank), • an example of 1a3c PDB file

  19. 3D information • 3D information is generated from PDB (Protein Data Bank), • an example of 1a3c PDB file

  20. Testing Data • The latest release of PISCES includes 4345 PDB files. • Compare with the dataset in our experiment, 2419 PDB files are excluded. • Therefore, we regard our 2710 protein files as the training dataset and 2419 protein files as the independent testing dataset.

  21. Testing Data • We convert the testing dataset by the approach we introduced • more than 490,000 segments are generated as testing dataset.

  22. Training dataset Fuzzy C-Means Clustering Information Granule 1 Information Granule M ... Five iterations of traditional K-means Five iterations of traditional K-means ... Greedy K-means Clustering Greedy K-means Clustering … For Each Cluster … For Each Cluster ... Train Ranking SVM and then Eliminate 20% lower rank members Train Ranking SVM and then Eliminate 20% lower rank members Collect all extracted clusters and Ranking-SVMs Find the closest cluster within a given distance threshold Feed to the belonging SVM If the rank belongs to cluster Independent testing Dataset All Sequence clusters All Ranking SVMs Predict the local 3D structure If not, find the next closest cluster Super GSVM

  23. Prediction Accuracy

  24. Prediction Coverage

  25. Future Works • Incorporate Chou-Fasman parameter for SVM training

  26. Training dataset Fuzzy C-Means Clustering Information Granule 1 Information Granule M ... Five iterations of traditional K-means Five iterations of traditional K-means ... Greedy K-means Clustering Greedy K-means Clustering … For Each Cluster … For Each Cluster ... Build Decision Tree Build Decision Tree Collect all extracted clusters and Ranking-SVMs Find the closest cluster within a given distance threshold Feed to the belonging DT If the rank belongs to cluster Independent testing Dataset All Sequence clusters Test by DT Predict the local 3D structure If not, find the next closest cluster Future Works • For each cluster, instead of building SVM model, we build Decision Tree instead

More Related