Scientific Benchmarks for Structure Prediction Codes
Jack Snoeyink & Matt O’Meara
Dept. Computer Science, UNC Chapel Hill
With thanks to:
Collaborators
• Brian Kuhlman, UNC Biochem
• Many other members of the RosettaCommons
• Richardson lab, Duke Biochem
Funding
• NIH
• NSF
Key Points…
• Scientific models, esp. for structural molecular biology
  • Models are the lens through which we view data
  • Models are predominantly geometric
  • Computational models are complex
  • Models evolve, so testing becomes crucial
• Focus on statistical/computational models with
  • a sample source, observable local features, a chosen functional form, fit parameters, and visualization/testing methods
• Capture the assumptions and data used to build models in order to:
  • Visualize them when making design decisions while building
  • Fit parameters to ensure the best performance
  • Record them as scientific benchmarks
Case Study: Rosetta protein structure prediction software [B]
Model complexity
• Physical and conceptual models
  • Kept simple to aid understanding
• Statistical and computational models
  • Evolve by combining simple models
  • Even when complex, can still be effective at validation (MolProbity) or prediction (Rosetta)
Computational model life cycle
Spiral development, much like software:
• Discover problematic features in some data
• Create an energy function to adjust them
• Fit parameters to improve results
• Check it into the software as a new option
• Make it the default option if everyone likes it
• Occasionally refactor and rewrite, removing outdated or unused models
But there is less support for testing…
Computational model testing
Our goal: capture the data and assumptions from model building for use in model visualization and testing.
Our computational models
Abstraction: a simple component of a complex computational model consists of:
• One or more sample sources (e.g. PDB files from natives or decoys), giving
• Observable local features (e.g. hydrogen bond distances and angles), having a
• Chosen functional form (e.g. energy from distances and angles) that
• Depends on fitting parameters (e.g. weights for combining terms)
[KMB’03]
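To make the four-part abstraction concrete, here is a minimal Python sketch. Everything in it is hypothetical: the class and function names, the toy distance/angle energy, and its constants (`d0`, `k`) are illustrative assumptions, not Rosetta's hydrogen bond term or the [KMB’03] functional form.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HBondFeature:
    """One observable local feature (hypothetical layout)."""
    distance: float   # acceptor-hydrogen distance, in angstroms
    angle_deg: float  # angle at the hydrogen, in degrees (180 = linear)


def toy_hbond_energy(f: HBondFeature, d0: float = 1.9, k: float = 1.0) -> float:
    """Toy functional form, NOT Rosetta's: a harmonic penalty on the distance
    minus a reward that is largest (1.0) for a linear hydrogen bond."""
    return k * (f.distance - d0) ** 2 - math.cos(math.radians(180.0 - f.angle_deg))


@dataclass
class ModelComponent:
    """Hypothetical container for the parts of one model component."""
    sample_source: str                                # e.g. "natives" or "decoys"
    features: List[HBondFeature]                      # observable local features
    functional_form: Callable[[HBondFeature], float]  # energy of one feature
    weight: float                                     # fitted weight for combining terms

    def weighted_energy(self) -> float:
        # Sum the functional form over all features, then apply the fitted weight.
        return self.weight * sum(self.functional_form(f) for f in self.features)


def total_energy(components: List[ModelComponent]) -> float:
    """Combine components as a weighted sum of terms."""
    return sum(c.weighted_energy() for c in components)


ideal = HBondFeature(distance=1.9, angle_deg=180.0)
hbond_term = ModelComponent("natives", [ideal, ideal], toy_hbond_energy, weight=0.5)
print(total_energy([hbond_term]))  # -1.0
```

Keeping the sample source, features, functional form, and weight together in one record is what lets the same data be reused later for visualization and testing.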
[Figure: Tool schematic. Data sets A, B, . . ., Z → gather features → SQL query → filter, transform, statistics → ggplot2 spec → plots]
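The pipeline in the schematic can be sketched with Python's standard `sqlite3` module. This is an illustrative assumption, not the authors' tool: the table name `hbond_features`, its columns, the helper functions, and the 2.6 Å cutoff are all hypothetical, and the ggplot2 plotting stage is omitted.

```python
import sqlite3
import statistics
from typing import Dict, Iterable, List, Tuple


def gather_features(conn: sqlite3.Connection, sample_source: str,
                    distances: Iterable[float]) -> None:
    """Hypothetical 'gather features' step: one row per feature, tagged by source."""
    conn.executemany(
        "INSERT INTO hbond_features (sample_source, ah_distance) VALUES (?, ?)",
        [(sample_source, d) for d in distances],
    )


def summarize(conn: sqlite3.Connection,
              max_distance: float = 2.6) -> Dict[str, Tuple[int, float]]:
    """SQL query with a filter, then per-source count and mean (the statistics step)."""
    rows = conn.execute(
        "SELECT sample_source, ah_distance FROM hbond_features WHERE ah_distance <= ?",
        (max_distance,),
    ).fetchall()
    by_source: Dict[str, List[float]] = {}
    for source, d in rows:
        by_source.setdefault(source, []).append(d)
    return {s: (len(v), statistics.mean(v)) for s, v in by_source.items()}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hbond_features (sample_source TEXT, ah_distance REAL)")
gather_features(conn, "native", [1.8, 1.9, 2.0, 3.1])  # 3.1 is filtered out later
gather_features(conn, "HEAD", [1.9, 2.1, 2.2])
print(summarize(conn))
```

Tagging every row with its sample source is the design choice that makes side-by-side comparison of sources a single query rather than a separate script per data set.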
Visualization
Implemented tools
• Compare distributions from sample sources
• Tufte’s small multiples via ggplot2
• Kernel density estimation
• Normalization
Opportunities for
• Statistical analysis
• Dimension reduction
• …
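Kernel density estimation, one of the implemented tools listed above, can be sketched in plain Python. This is a textbook Gaussian KDE with Silverman's rule-of-thumb bandwidth; whether the authors' tool uses this kernel or bandwidth rule is an assumption, and the function names and toy data are hypothetical.

```python
import math
import statistics
from typing import List, Optional, Sequence


def silverman_bandwidth(samples: Sequence[float]) -> float:
    """Silverman's rule of thumb: h = 1.06 * sd * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def gaussian_kde(samples: Sequence[float], grid: Sequence[float],
                 bandwidth: Optional[float] = None) -> List[float]:
    """Estimate the density at each grid point as an average of Gaussian bumps."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    scale = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
            for x in grid]


native_distances = [1.8, 1.9, 2.0, 2.1]  # toy data, not real measurements
grid = [i * 0.001 for i in range(4001)]   # 0.0 to 4.0 angstroms
density = gaussian_kde(native_distances, grid)
print(sum(density) * 0.001)  # Riemann sum, close to 1 for a proper density
```

Running the same function once per sample source gives one curve per source, which is what the small-multiples plots would then lay out side by side.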
Normalization [KMB’03]
[Figure: Histogram of H-bond A-H distances in natives]
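One common way to normalize a histogram of distances, assumed here because the slide does not spell out the [KMB’03] procedure, is to divide each bin's count by the volume of its spherical shell. For points scattered uniformly in space, the number found at distance r grows like r², so without this correction longer distances would look preferred simply because more volume is available. A hypothetical sketch:

```python
import math
from typing import List, Sequence, Tuple


def shell_normalized_histogram(distances: Sequence[float], lo: float, hi: float,
                               n_bins: int) -> List[Tuple[float, float]]:
    """Return (bin center, count / (total * shell volume)) for each bin in [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for d in distances:
        if lo <= d < hi:
            # min() guards against a floating-point index of n_bins just below hi.
            counts[min(int((d - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    result = []
    for i, c in enumerate(counts):
        r_in, r_out = lo + i * width, lo + (i + 1) * width
        shell_volume = 4.0 / 3.0 * math.pi * (r_out ** 3 - r_in ** 3)
        density = c / (total * shell_volume) if total else 0.0
        result.append(((r_in + r_out) / 2.0, density))
    return result


# Toy data: one hydrogen bond per bin between 1.5 and 1.9 angstroms.
for center, value in shell_normalized_histogram([1.55, 1.65, 1.75, 1.85], 1.5, 1.9, 4):
    print(f"{center:.2f} {value:.4f}")
```

With equal raw counts in every bin, the normalized values fall as r increases, since each outer shell holds more volume.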
Tool uses…
• Scientific unit tests
  • Sample sources: native, HEAD, ^HEAD
  • Run on a continuous testing server
• Knowledge-based score term creation
  • Sample sources: native, release, experimental
  • Turn exploration into living benchmarks
• Test design hypotheses
  • Sample sources: native, protocol, designs
  • How strange is this geometry?
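A scientific unit test in this sense compares feature distributions from two sample sources, for example native versus HEAD. The sketch below is one hypothetical way to automate that comparison: it uses the two-sample Kolmogorov-Smirnov statistic (a swapped-in choice; the slide names no specific test), and the 0.3 threshold, function names, and data are assumptions.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions that only jump at sample values,
    # so checking every sample value finds the maximum gap.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(fa - fb))
    return gap


def scientific_unit_test(reference: Sequence[float], candidate: Sequence[float],
                         threshold: float = 0.3) -> bool:
    """Pass if the candidate's feature distribution stays close to the reference.
    The threshold is a hypothetical tolerance, not a published value."""
    return ks_statistic(reference, candidate) <= threshold


native = [1.8, 1.9, 1.95, 2.0, 2.1]  # toy A-H distances
head = [1.82, 1.9, 1.97, 2.0, 2.08]
print(scientific_unit_test(native, head))
```

On a continuous testing server, a check like this would run for each feature after every commit, flagging HEAD whenever its distributions drift away from the natives.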