TEXTAL: A System for Automated Model Building Based on Pattern Recognition
Thomas R. Ioerger
Department of Computer Science
Texas A&M University
Main Stages of TEXTAL
• electron density map → CAPRA → C-alpha chains
• C-alpha chains → LOOKUP (build in side-chain and main-chain atoms locally around each CA) → model (initial coordinates)
• model (initial coordinates) → post-processing routines (example: real-space refinement) → model (final coordinates)
• human crystallographer (editing); reciprocal-space refinement/ML DM
[Figure: example feature vectors, e.g. F=<1.72,-0.39,1.04,1.55...> and F=<1.79,-0.43,0.88,1.52...>]
Overview of CAPRA
• goal: predict CA chains from the density map
• not just “tracing”: goes beyond a Bones skeleton
• desired output: 1:1 correspondence with true CAs, spaced ~3.8 Å apart
• based on principles of pattern recognition:
  • a neural network estimates which pseudo-atoms in the trace “look” most like true C-alphas
  • feature extraction captures 3D patterns in the density as input to the network
  • other heuristics link candidates into chains, including geometric analysis of secondary structure
CAPRA: C-Alpha Pattern-Recognition Algorithm
• Tracer: remove lattice points from the map (lowest density first) without breaking connectivity
• Neural network: for each pseudo-atom, extract features, feed them to the network, and predict the distance to the nearest true CA (only about 1 in 10 pseudo-atoms in the trace corresponds to a CA); the network is trained on example points from real maps
• Linking: prefer long chains built from good CA predictions (not in side-chains) that are “structurally plausible” (e.g. linear or helical)
density map → Tracer → pseudo-atoms → Neural Network → predicted distances to true CAs → Linking into C-alpha chains → C-alpha coordinates
Tracer
[Figure: illustration of the tracer on a lattice of grid points (+)]
Feature Extraction
• characterizes 3D patterns in the local density
• features must be “rotation invariant”
• examples:
  • average density in the region
  • standard deviation, kurtosis, ...
  • distance to the center of mass
  • moments of inertia, ratios of moments
  • “spoke angles”
• each feature is calculated over spheres of 3 Å and 4 Å radius
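The statistical features listed above can be sketched in a few lines. This is a minimal illustration, not TEXTAL's code: the map is assumed to be a plain dict from integer lattice indices (i, j, k) to density on a cubic grid, and the function name `sphere_features` is hypothetical. It returns the mean, standard deviation, (non-excess) kurtosis, and the distance from the sphere center to the density-weighted center of mass; moments of inertia and spoke angles are left out for brevity. None of these values change when the density is rotated about the sphere center.

```python
import math


def sphere_features(grid, spacing, center, radius):
    """Rotation-invariant statistics of the density inside one sphere.

    grid:    hypothetical map representation, dict {(i, j, k): density}
    spacing: grid spacing in Å (cubic grid assumed)
    center:  sphere center (x, y, z) in Å
    radius:  sphere radius in Å
    Returns [mean, std, kurtosis, distance to center of mass].
    """
    pts, vals = [], []
    for (i, j, k), rho in grid.items():
        p = (i * spacing, j * spacing, k * spacing)
        if math.dist(p, center) <= radius:
            pts.append(p)
            vals.append(rho)
    if not vals:
        raise ValueError("no grid points inside the sphere")

    n = len(vals)
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / n
    # Non-excess kurtosis; defined as 0 for a perfectly flat region.
    kurtosis = sum((v - mean) ** 4 for v in vals) / (n * var**2) if var > 0 else 0.0

    # Center of mass weighted by positive density only (maps can go negative).
    weights = [max(v, 0.0) for v in vals]
    total = sum(weights)
    if total > 0:
        com = tuple(sum(w * p[d] for w, p in zip(weights, pts)) / total for d in range(3))
        com_dist = math.dist(com, center)
    else:
        com_dist = 0.0

    return [mean, math.sqrt(var), kurtosis, com_dist]
```

To mirror the two sphere sizes on the slide, one would concatenate `sphere_features(grid, s, c, 3.0) + sphere_features(grid, s, c, 4.0)` into a single feature vector.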
[Figure: neural network forward propagation and backward propagation]
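Forward and backward propagation can be illustrated with a minimal pure-Python regression network that maps a feature vector to a predicted distance to the nearest CA. The architecture here (one sigmoid hidden layer, a linear output unit, squared-error loss, per-example gradient descent) and the class name `DistanceNet` are assumptions for illustration; the slides do not specify CAPRA's actual network design.

```python
import math
import random


class DistanceNet:
    """Hypothetical one-hidden-layer net: features -> predicted distance (Å)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Forward propagation: sigmoid hidden layer, then a linear output."""
        h = [
            1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(self.w1, self.b1)
        ]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def train_step(self, x, target, lr=0.05):
        """Backward propagation of E = 0.5 * (y - target)**2 for one example."""
        h, y = self.forward(x)
        err = y - target  # dE/dy
        for j, hj in enumerate(h):
            # Error at hidden unit j, computed with w2[j] before it is updated.
            delta = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * delta
            self.w1[j] = [w - lr * delta * xi for w, xi in zip(self.w1[j], x)]
        self.b2 -= lr * err
        return 0.5 * err * err
```

Training on "example points in real maps" would mean calling `train_step(features, true_distance)` repeatedly over pseudo-atoms whose distance to the nearest true CA is known from a solved structure.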
Selection of Candidate C-alphas
• method:
  • among all pseudo-atoms in the trace,
  • pick candidates in order of lowest predicted distance first,
  • skipping any pseudo-atom closer than 2.5 Å to a candidate already picked
• notes:
  • no 3.8 Å constraint; the spacing between neighboring candidates can be as large as 5 Å
  • does not rely on branch points (though candidates are often near them)
  • candidates are picked in no particular spatial order throughout the map, not sequentially along a chain
  • the initial selection covers the whole map, including side-chains and disconnected regions (e.g. noise in the solvent)
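The selection rule above is a greedy pass over the trace. A minimal sketch, assuming pseudo-atoms are (x, y, z) tuples in Å and the network's predicted distances come as a parallel list; `select_candidates` is a hypothetical name, and the all-pairs distance check stands in for whatever spatial indexing a real implementation would use.

```python
import math


def select_candidates(pseudo_atoms, predicted_dist, min_sep=2.5):
    """Greedy candidate selection (hypothetical helper).

    Visit pseudo-atoms in order of increasing predicted distance to a true CA
    and accept each one unless it lies closer than min_sep Å to a candidate
    accepted earlier. Returns indices of the accepted pseudo-atoms.
    """
    order = sorted(range(len(pseudo_atoms)), key=lambda i: predicted_dist[i])
    chosen = []
    for i in order:
        p = pseudo_atoms[i]
        if all(math.dist(p, pseudo_atoms[j]) >= min_sep for j in chosen):
            chosen.append(i)
    return chosen
```

For example, with points at x = 0, 1, 3 and 6 Å and predicted distances 2.0, 0.5, 1.0 and 3.0, the point at 1 Å is taken first, the points at 3 Å and 0 Å are rejected as too close to it, and the point at 6 Å is accepted.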
Linking into Chains
• initial connectivity of CA candidates is taken from the trace
• the result is an “over-connected” graph, with branches, cycles, ...
• start by computing connected components (islands, or clusters)
• two strategies:
  • for small clusters (≤20 candidates), find the longest internal chain made of “good” atoms
  • for large clusters (>20 candidates), incrementally clip branch points using heuristics
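Computing the clusters and routing them to the two strategies can be sketched with a breadth-first search. This assumes candidates are numbered 0..n-1 and the trace connectivity is given as a list of undirected links; the function names are hypothetical.

```python
from collections import deque


def connected_components(n, links):
    """Connected components (clusters) of an undirected graph on nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    seen = [False] * n
    comps = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        comp, queue = [], deque([start])
        while queue:  # breadth-first search from start
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        comps.append(sorted(comp))
    return comps


def split_by_size(comps, threshold=20):
    """Small clusters get exhaustive search; large clusters get branch clipping."""
    small = [c for c in comps if len(c) <= threshold]
    large = [c for c in comps if len(c) > threshold]
    return small, large
```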
Extracting Chains from Small Clusters
• exhaustive depth-first search of all paths
• scoring function:
  • rewards length
  • penalizes inclusion of points whose neural-net predicted distance to a true CA is high
  • prefers following secondary structure (locally straight or helical)
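The exhaustive search can be sketched as a recursive depth-first enumeration of every simple path in the cluster, keeping the best-scoring one. The scoring weights here (`bad_cutoff`, `bad_penalty`) and the optional `extra` callable, where a secondary-structure bonus could be plugged in, are hypothetical; the slides do not give CAPRA's actual scoring constants. The search is exponential, which is why it is only applied to small clusters.

```python
def best_chain(adj, pred_dist, bad_cutoff=3.0, bad_penalty=2.0, extra=lambda path: 0.0):
    """Exhaustive DFS over all simple paths in a small cluster (hypothetical weights).

    adj:       dict node -> list of neighboring nodes
    pred_dist: predicted distance to a true CA for each node (indexable by node)
    Score = path length - bad_penalty * (number of "bad" atoms) + extra(path).
    """

    def score(path):
        bad = sum(1 for v in path if pred_dist[v] > bad_cutoff)
        return len(path) - bad_penalty * bad + extra(path)

    best, best_score = [], float("-inf")

    def dfs(path, on_path):
        nonlocal best, best_score
        s = score(path)
        if s > best_score:
            best, best_score = list(path), s
        for v in adj[path[-1]]:
            if v not in on_path:
                path.append(v)
                on_path.add(v)
                dfs(path, on_path)
                path.pop()
                on_path.remove(v)

    for start in adj:
        dfs([start], {start})
    return best, best_score
```

In a cluster shaped like a chain 0-1-2-3 with a side branch 1-4, where node 4 has a high predicted distance, the search prefers [0, 1, 2, 3] over the equally long path through node 4.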
Secondary Structure Analysis
• generate all 7-mers (connected fragments of 7 candidate CAs)
• evaluate “straightness”:
  • ratio of the end-to-end distance to the sum of link lengths (1.0 for a perfectly straight fragment)
  • straightness > 0.8 ⇒ potential beta-strand
• evaluate “helicity”:
  • average absolute deviation of the angles and torsions along the 7-mer from ideal values (95° and 50°)
  • helicity < 20° ⇒ potential alpha-helix
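Both measures can be computed directly from the seven CA coordinates. A minimal sketch, assuming coordinates are (x, y, z) tuples in Å; the helper names are hypothetical, and averaging the five angle deviations and four torsion deviations together into one number is an assumption about how the slide's "average" is taken. Torsions use the standard signed dihedral convention, under which a right-handed helix gives positive values, and torsion differences are wrapped into [-180°, 180°).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def straightness(ca):
    """End-to-end distance divided by the summed link lengths (<= 1)."""
    links = sum(math.dist(ca[i], ca[i + 1]) for i in range(len(ca) - 1))
    return math.dist(ca[0], ca[-1]) / links


def bond_angle(a, b, c):
    """Angle at b (degrees) between b->a and b->c."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.degrees(math.atan2(y, x))


def helicity(ca, ideal_angle=95.0, ideal_torsion=50.0):
    """Mean absolute deviation of all angles and torsions from the ideal values."""
    devs = [abs(bond_angle(*ca[i:i + 3]) - ideal_angle) for i in range(len(ca) - 2)]
    for i in range(len(ca) - 3):
        d = (torsion(*ca[i:i + 4]) - ideal_torsion + 180.0) % 360.0 - 180.0
        devs.append(abs(d))
    return sum(devs) / len(devs)


def classify(ca):
    """Thresholds from the slide: straightness > 0.8, helicity < 20."""
    labels = []
    if straightness(ca) > 0.8:
        labels.append("strand")
    if helicity(ca) < 20.0:
        labels.append("helix")
    return labels
```

As a sanity check, an idealized helical CA trace (radius 2.3 Å, rise 1.5 Å, 100° per residue) has angles near 90° and torsions near 50°, so it classifies as a potential helix, while a planar zigzag has torsions of 180° and classifies as a potential strand.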
Handling Large Clusters
• start by breaking cycles (near “bad” atoms)
• clip links at branch points until only linear chains remain
• clip the most “obvious” links first, e.g.:
  • if the other two links are part of secondary structure
  • if the clipped branch has a “bad” atom nearby
  • if the clipped branch is small and the other two are large
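The branch-clipping loop can be sketched as follows, under simplifying assumptions: cycles are already broken (the cluster is a tree or forest), only two of the three listed heuristics are modeled (clip links leading to a "bad" atom first, then the link leading to the smallest branch), and the secondary-structure rule is omitted. The function names and the `bad_cutoff` value are hypothetical.

```python
def branch_size(adj, start, blocked):
    """Count nodes reachable from start without stepping onto blocked."""
    seen, stack, count = {start, blocked}, [start], 0
    while stack:
        u = stack.pop()
        count += 1
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return count


def clip_branches(adj, pred_dist, bad_cutoff=3.0):
    """Clip one link at a time at branch points until every node has degree <= 2.

    Assumes the graph is already acyclic. Ordering heuristic (hypothetical):
    links whose far end is a "bad" atom come first, then the smallest branch.
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    while True:
        branch_points = [u for u in adj if len(adj[u]) > 2]
        if not branch_points:
            return adj
        # False (bad atom) sorts before True, so bad links are clipped first.
        candidates = [
            (pred_dist[v] <= bad_cutoff, branch_size(adj, v, u), u, v)
            for u in branch_points
            for v in adj[u]
        ]
        _, _, u, v = min(candidates)
        adj[u].discard(v)
        adj[v].discard(u)
```

Each iteration removes exactly one link, so the loop always terminates; what remains is a set of linear chains that can then be scored like the small clusters.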
Availability
• TEXTAL web site: http://textal.tamu.edu:12321
  • server-side processing
  • free access to CAPRA
  • beta-testing of TEXTAL
• To contact us, email: textal@tamu.edu
Acknowledgements
• Funding:
  • National Institutes of Health
  • Welch Foundation
• People:
  • Dr. James C. Sacchettini
  • the rest of the TEXTAL Group: Tod Romo, Kreshna Gopal, Reetal Pai