k-Nearest-Neighbors Problem

Presentation Transcript


  1. k-Nearest-Neighbors Problem

  2. cRMSD • cRMSD(c, c') is the minimized RMSD between the two sets of atom centers: $\min_T \left[ \frac{1}{n} \sum_{i=1}^{n} \| a_i(c) - T(a_i(c')) \|^2 \right]^{1/2}$, where the minimization is over all possible rigid-body transforms T
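The minimization over rigid-body transforms has a well-known closed-form solution. The sketch below uses the Kabsch algorithm (centroid alignment followed by an SVD-based optimal rotation); the slides do not say which method was used, so this choice, the function name `crmsd`, and the use of NumPy are assumptions for illustration.

```python
import numpy as np

def crmsd(a, b):
    """Minimized RMSD between two (n, 3) arrays of atom centers.

    Sketch using the Kabsch algorithm; not necessarily the method
    used in the presentation.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (n, 3)")

    # The optimal translation superimposes the two centroids.
    a0 = a - a.mean(axis=0)
    b0 = b - b.mean(axis=0)

    # 3x3 covariance matrix between the centered point sets.
    h = b0.T @ a0
    u, _, vt = np.linalg.svd(h)

    # Force a proper rotation (determinant +1), excluding reflections.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Apply the rotation to b and measure the remaining deviation.
    diff = a0 - b0 @ rot.T
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Everything except the 3x3 SVD is a single pass over the n centers, which is consistent with a per-comparison cost that grows linearly with n.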

  3. k-Nearest-Neighbors Complexity • $O(N^2(\log k + L))$ • N: number of protein conformations to be compared • k: number of nearest neighbors • L: time to compare two conformations (cRMSD takes time linear in the number of atom centers) • Solution: reduce L by reducing the number of centers to compare -> m-averaging
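One way to see where the bound comes from is a brute-force search that compares every pair once (cost L each) and keeps each conformation's k best candidates in a bounded heap (cost log k per update). This is a minimal sketch; the function name `k_nearest_neighbors` and the pluggable `distance` argument are illustrative assumptions, not taken from the slides.

```python
import heapq

def k_nearest_neighbors(conformations, k, distance):
    """Return, for each conformation, a sorted list of (distance, index)
    pairs for its k nearest neighbors, using brute-force comparison."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(conformations)
    # One max-heap per conformation, emulated by storing negated distances.
    heaps = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(conformations[i], conformations[j])  # cost L
            for p, q in ((i, j), (j, i)):
                if len(heaps[p]) < k:
                    heapq.heappush(heaps[p], (-d, q))          # cost log k
                elif d < -heaps[p][0][0]:
                    # Closer than the current worst neighbor: replace it.
                    heapq.heapreplace(heaps[p], (-d, q))
    return [sorted((-neg_d, q) for neg_d, q in heap) for heap in heaps]
```

For protein conformations, `distance` would be a cRMSD function; any symmetric dissimilarity works for testing, for example `k_nearest_neighbors([0, 1, 3, 7], 2, lambda x, y: abs(x - y))`.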

  4. m-Averaged Approximation • Cut the backbone into fragments of m Cα atoms • Replace each fragment by the centroid of its Cα atoms
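The two steps on this slide can be sketched as follows, taking m to be the number of Cα atoms per fragment as defined here. How a leftover fragment shorter than m is handled is not stated in the slides; this sketch simply averages whatever atoms remain, and the function name `m_average` is an assumption.

```python
def m_average(ca_coords, m):
    """Replace each run of m consecutive C-alpha positions by its centroid.

    ca_coords: sequence of (x, y, z) tuples in backbone order.
    Assumption: a final fragment with fewer than m atoms is averaged as-is.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    averaged = []
    for start in range(0, len(ca_coords), m):
        fragment = ca_coords[start:start + m]
        centroid = tuple(
            sum(atom[axis] for atom in fragment) / len(fragment)
            for axis in range(3)
        )
        averaged.append(centroid)
    return averaged
```

A chain of n atoms is reduced to roughly n/m points, so a comparison whose cost is linear in the number of centers becomes about m times cheaper.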

  5. Evaluation: Test Sets [Lotan and Schwarzer, 2003] • FOLDTRAJ: random, partially unfolded structures -> good correlation with small m (few long segments) • Park-Levitt set [Park et al., 1997]: compact, native-like structures -> good correlation with large m (many short segments) • Use smaller m on unfolded proteins for greater time savings

  6. Flexible m-averaging • Protein A, 47 residues • 14 < rgyr < 24 • 6 < m < 12 • [Plot; axis label: rgyr]
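Flexible m-averaging needs the radius of gyration of each conformation and some rule for turning it into a value of m. The slide gives only the two ranges, so the sketch below is hypothetical on several points: the unweighted radius-of-gyration formula, the linear interpolation in `choose_m`, the inclusive endpoints, and the `increasing` flag (the slide does not state in which direction m moves with rgyr).

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid
    (all atoms weighted equally, which is an assumption)."""
    n = len(coords)
    centroid = [sum(p[axis] for p in coords) / n for axis in range(3)]
    squared = sum(
        sum((p[axis] - centroid[axis]) ** 2 for axis in range(3))
        for p in coords
    )
    return math.sqrt(squared / n)

def choose_m(rgyr, rg_range=(14.0, 24.0), m_range=(6, 12), increasing=True):
    """Hypothetical rule: map rgyr linearly onto m, clamped to the ranges
    shown on the slide. Set increasing=False to reverse the direction."""
    lo, hi = rg_range
    t = min(1.0, max(0.0, (rgyr - lo) / (hi - lo)))
    if not increasing:
        t = 1.0 - t
    return round(m_range[0] + t * (m_range[1] - m_range[0]))
```

Both functions depend only on a single conformation, so their results can be computed once per conformation and reused across all pairwise comparisons.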

  7. Results • Overhead for calculating the m-averaged structures and the radius of gyration is too high • Without averaging: 28 sec; with any constant m: 1 min • With flexible averaging: 2 min 20 sec • Easily fixed by precalculating rgyr and the m-averaged structures

  8. F U Uses

  9. Conclusions • Flexible m-averaging can save time (without sacrificing accuracy?) • Useful for quickly finding k nearest neighbors and building roadmaps • Precalculate the m-averaged structures and the radius of gyration for greater speedup
