NMRbox: TRD 3

NMRbox: TRD 3 • A probabilistic core as a coherent inference engine • PINE+ Core • Extend functionality through the new core • PINE+: Assignment, use of structure data, RNA • Reproducibility – “revisable executable documentation” • Side-by-side, reproducible, probabilistic and deterministic

Why Bayes? • Bayes: extremizes the log-likelihood • Consistent and extensible determination of parameters: $\arg\min E_{\text{total}}$, with $E_{\text{total}} = E_{\text{empirical}} + \lambda E_{\text{experimental}}$ • Constrain: NOEs, not distances; J’s, not dihedrals; etc. • Consistency + Extensibility • Reproducibility
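As a rough illustration of the weighted sum on this slide, the sketch below minimizes a total energy made of an empirical term plus λ times an experimental term for a single scalar parameter. The quadratic form of both terms, the function names, and all numeric values are assumptions chosen for illustration; the slide does not specify them, and this is not PINE's implementation.

```python
def e_total(x, prior_mean, prior_sd, observed, obs_sd, lam):
    """Total energy: empirical term plus lam times experimental term.

    Both terms are assumed quadratic (Gaussian negative log-densities).
    """
    e_empirical = (x - prior_mean) ** 2 / (2 * prior_sd ** 2)
    e_experimental = (x - observed) ** 2 / (2 * obs_sd ** 2)
    return e_empirical + lam * e_experimental


def argmin_e_total(prior_mean, prior_sd, observed, obs_sd, lam):
    """Closed-form minimizer of e_total for the quadratic case.

    Setting the derivative to zero gives a precision-weighted average
    of the empirical mean and the observed value.
    """
    w_prior = 1.0 / prior_sd ** 2
    w_obs = lam / obs_sd ** 2
    return (w_prior * prior_mean + w_obs * observed) / (w_prior + w_obs)


# Hypothetical numbers: an empirical expectation of -60 (sd 20) and an
# experimental estimate of -45 (sd 5), with the experiment weighted by lam = 2.
best = argmin_e_total(-60.0, 20.0, -45.0, 5.0, 2.0)
```

With λ = 0 the minimizer falls back to the empirical expectation; increasing λ pulls it toward the experimental value.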

What is PINE and PINE Core? • PINE is the probabilistic Bayesian engine for • Data verification (LACS) • Secondary structure determination (PECAN) • Assignment (PISTACHIO) • Peak picking (HIFI) • Bayesian updating (PINE) • Bayesian updating + deterministic constraints (ADAPT-NMR) • RNA assignment (RNA-Pairs) • [Figure: diagram of the LACS, PECAN, PISTACHIO, HIFI, and ADAPT-NMR components]
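The "Bayesian updating" step listed on this slide can be illustrated with a generic discrete Bayes update. This is a minimal sketch of the general technique, not PINE's actual algorithm; the hypothesis labels and probabilities below are made-up values.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses given a prior and a likelihood.

    prior:      dict mapping hypothesis -> prior probability
    likelihood: dict mapping hypothesis -> P(data | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood.get(h, 0.0) for h in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("data has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}


# Hypothetical example: secondary-structure state of one residue.
prior = {"helix": 0.5, "sheet": 0.3, "coil": 0.2}
likelihood = {"helix": 0.8, "sheet": 0.1, "coil": 0.1}
posterior = bayes_update(prior, likelihood)

# The posterior can serve as the prior for the next piece of evidence.
posterior_2 = bayes_update(posterior, {"helix": 0.6, "sheet": 0.2, "coil": 0.2})
```

Chaining updates this way is what lets separate sources of evidence feed one consistent estimate.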

Corollary • In our view, PINE has been a highly successful probabilistic model. The inference core has proven to be very powerful: an excellent setting for building a structure determination protocol for: • Repeatability and reproducibility • Coexistence of probabilistic and deterministic paradigms • Extensibility to support new functions through plugins • Additionally empowering: • Revisable executable documentation • Document as you go, execute any time, reuse.

PINE+ • PINE+ is the collection of applications • Containing a core probabilistic engine • The core will support all existing tools • HIFI, PINE, PECAN, ADAPT-NMR, LACS, RNA-PAIRS • And new assignment tools • NOESY, RNA, structure-based • PINE+ will support probabilistic and deterministic approaches • PINE+ will additionally enable reproducible results • Playback of blocks or the entire project (repeatability)* • Edit & replay of blocks or the entire project (reproducibility)* • Template (reuse)

Key Innovations • The complete process of structure determination is captured and executed using one language, one environment, one paradigm, in open source form, using open technologies. • As the user steps through structure determination steps, a living “executable document” is created – repeatability. • “Blocks” of a structure determination “document” can be modified with new data or process – reproducibility. • A “document” can be used as a “template”; replacing, or modifying blocks as needed – reuse.
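The block, playback, edit-and-replay, and template ideas can be sketched in a few lines. The class `ExecutableDocument`, its methods, and the two toy processing steps are hypothetical names invented for this illustration; they are not the PINE+ interface.

```python
class ExecutableDocument:
    """Ordered list of named blocks; each block is a function plus parameters."""

    def __init__(self):
        self.blocks = []

    def add_block(self, name, func, **params):
        self.blocks.append({"name": name, "func": func, "params": params})
        return self

    def play(self, value):
        """Playback: run every block in order, feeding each output to the next."""
        results = {}
        for block in self.blocks:
            value = block["func"](value, **block["params"])
            results[block["name"]] = value
        return results

    def edited(self, name, **new_params):
        """Edit & replay / template: copy the document, changing one block."""
        if name not in [b["name"] for b in self.blocks]:
            raise KeyError(name)
        clone = ExecutableDocument()
        for block in self.blocks:
            params = dict(block["params"])
            if block["name"] == name:
                params.update(new_params)
            clone.add_block(block["name"], block["func"], **params)
        return clone


# Toy processing steps (hypothetical stand-ins for real analysis tools).
def pick_peaks(intensities, cutoff):
    return [x for x in intensities if x >= cutoff]


def count_peaks(peaks):
    return len(peaks)


doc = ExecutableDocument()
doc.add_block("pick", pick_peaks, cutoff=5.0).add_block("count", count_peaks)

data = [1.0, 6.0, 9.0, 4.0]
first_run = doc.play(data)                             # playback
second_run = doc.play(data)                            # same result again
revised_run = doc.edited("pick", cutoff=3.0).play(data)  # edit one block, replay
```

Returning a modified copy from `edited` leaves the original document intact, so the same recorded steps can serve both as a record and as a template.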

Transition, Transformation, Synergy • TRD 2: NMR-STAR • TRD 1: Virtual Machine • TRD 2: Workflow & Versions • TRD 1: External applications & built-in applications • Current architecture

Impact • Opportunity for • Enabling repeatable and reproducible structure determination • Empowering other packages with probabilistic functionality • Supporting side-by-side deterministic and probabilistic tools • Creating end-to-end open source tools • Built-in probabilistic functions for • Peak picking, backbone assignment, side-chain assignment, secondary structure prediction, verification, NOESY assignment, RNA assignment, pipeline/network definition, scientific documentation, capture and replay, edit and execute.

Progress to date • Things under the hood • Environment • Protocol • Engine • Demonstration • Interface • Desktop PINE • Executable document

Progress in context

Things under the hood • Environment • High performance, dynamic, distributed, parallel, scientific computing • Protocol • JSON for specification and exchange, Pandoc for data transformation • Engine • Inference-as-a-service using a resource-oriented model • Simple (6 components): relation, learner, predictor, transformer, attribute, schema
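To make "JSON for specification and exchange" concrete, here is a hedged sketch of how resources of the six component kinds might be written to JSON and read back. Only the six kind names come from the slide; the field names (`kind`, `name`, `columns`, `uses`), the helper functions, and the example values are hypothetical and do not describe the actual PINE+ protocol.

```python
import json

# The six component kinds named on the slide.
COMPONENTS = ("relation", "learner", "predictor", "transformer", "attribute", "schema")


def make_resource(kind, name, **fields):
    """Build one resource description; reject unknown component kinds."""
    if kind not in COMPONENTS:
        raise ValueError(f"unknown component kind: {kind}")
    return {"kind": kind, "name": name, **fields}


def to_json(resources):
    """Serialize a list of resources for exchange."""
    return json.dumps(resources, indent=2, sort_keys=True)


def from_json(text):
    """Parse a list of resources and validate each component kind."""
    resources = json.loads(text)
    for resource in resources:
        if resource.get("kind") not in COMPONENTS:
            raise ValueError(f"unknown component kind: {resource.get('kind')}")
    return resources


# Hypothetical specification with made-up names and fields.
spec = [
    make_resource("schema", "peak_list", columns=["H", "N", "intensity"]),
    make_resource("learner", "shift_model", uses="peak_list"),
]
round_tripped = from_json(to_json(spec))
```

Validating the kind on both construction and parsing keeps the exchanged documents limited to the six component types.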

Demo • Show interface and interaction • Start a PINE job • Display output of PINE • Show block idea • Show repeatability idea



Dissemination