TRD 2 Update: An annotation scheme to foster reproducible NMR data analysis
Matt Fenwick, Eldon Ulrich, Michael Gryk
Overview of NMR spectral analysis
- peak-picking: distinguishing signal from noise, true positives from false positives
- resonance assignment
- NOESY peak assignment
The analysis is semi-automated:
- software tools
- human intervention required
The human uses a deductive process of reasoning:
- small set of rules/expectations (library)
- deductions may be logically dependent on each other
Problem: Missing Data -> Irreproducible
Much intermediate data is not saved / deposited:
- step order
- logical dependencies
- deductive reasoning
- peculiarities found and their resolutions (unexpected, missing, extra peaks)
Final data:
- resonances, spin systems
- extraneous data: contaminants, noise, artifacts, anomalies ...
Missing Data: Spin Systems & Resonances
NMR experiments are designed to exploit networks of coupled spins (spin systems).
The assignment process has two steps:
(1) assign resonances to spin systems
(2) assign spin systems to residues
Resonances and spin systems are not deposited.
Images are from Protein NMR: A Practical Guide (http://www.protein-nmr.org.uk/)
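The two-step assignment above could be held in a data structure along the following lines. This is a minimal sketch, not the CCPN or NMR-STAR model: the class names, field names, shift values, and the residue label "A17" are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resonance:
    id: int
    shift_ppm: float
    atom: Optional[str] = None  # e.g. "CA"; may stay unknown


@dataclass
class SpinSystem:
    id: int
    resonances: List[Resonance] = field(default_factory=list)
    residue: Optional[str] = None  # filled in by step 2


def assign_resonance(spin_system: SpinSystem, resonance: Resonance) -> None:
    """Step 1: attach a resonance to a spin system."""
    spin_system.resonances.append(resonance)


def assign_residue(spin_system: SpinSystem, residue: str) -> None:
    """Step 2: attach the whole spin system to a residue."""
    spin_system.residue = residue


# Illustrative values only (hypothetical spin system and shifts).
ss52 = SpinSystem(id=52)
assign_resonance(ss52, Resonance(id=1, shift_ppm=52.5, atom="CA"))
assign_resonance(ss52, Resonance(id=2, shift_ppm=19.0, atom="CB"))
assign_residue(ss52, "A17")
```

Keeping the spin system as its own object means the intermediate result of step 1 survives even when step 2 is never completed, which is the data the slide says is currently lost.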
Solution
1. capture process of reasoning
- version control: capture intermediate states
- model of commonly used deductive reasons
- annotate changeset with deductive reasons
2. capture complete final data set
- model for identifying problems
- model for extraneous data
- deposit full results
1. version control: snapshots, commit messages
- snapshots of intermediate states: enable backtracking and inspection of past states
- commit messages describe the difference between consecutive snapshots: summary, purpose, justification, questions, uncertainties
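The snapshot-plus-message idea can be sketched as a small in-memory history. This is an assumption-laden illustration rather than the project's tooling: the `History` class, the state layout (peak ID mapped to a spin-system label), and the messages are hypothetical, and a real setup would more likely use an existing version control system.

```python
import copy
from dataclasses import dataclass


@dataclass
class Snapshot:
    state: dict   # hypothetical layout: peak ID -> spin-system label or None
    message: str  # summary, purpose, justification, open questions


class History:
    def __init__(self):
        self.snapshots = []

    def commit(self, state, message):
        """Store a deep copy so later edits cannot alter past states."""
        self.snapshots.append(Snapshot(copy.deepcopy(state), message))
        return len(self.snapshots) - 1

    def checkout(self, index):
        """Backtrack: return a copy of an earlier state."""
        return copy.deepcopy(self.snapshots[index].state)

    def diff(self, older, newer):
        """Map each changed key to its (old, new) values."""
        a = self.snapshots[older].state
        b = self.snapshots[newer].state
        keys = sorted(set(a) | set(b))
        return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


history = History()
state = {124: None, 125: None}
first = history.commit(state, "Picked peaks in HNCACB; none assigned yet.")
state[124] = "SS52"
second = history.commit(state, "Assigned peak 124 to spin system 52.")
changes = history.diff(first, second)
```

Here `changes` records only what differs between the two snapshots, and the commit message carries the human-readable justification for that difference.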
1. model of NMR deductive reasoning
- start with CCPN data model
- augment with library of common deductive reasons
- use deductive reasons to annotate commits
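One way to picture a library of deductive reasons attached to commits is shown below. It is a sketch under stated assumptions: the two reason names appear in the annotation mock-up in this deck, but their descriptions, the `annotate` function, and the dictionary layout are hypothetical and are not part of the CCPN data model.

```python
# Hypothetical reason library; the descriptions are illustrative guesses.
REASON_LIBRARY = {
    "BMRB statistics": "shifts compared against BMRB chemical shift statistics",
    "chemical shift grouping": "peaks grouped because their shifts coincide",
}


def annotate(message, reasons, library=REASON_LIBRARY):
    """Build a commit annotation, rejecting reasons not in the library."""
    unknown = [r for r in reasons if r not in library]
    if unknown:
        raise ValueError(f"reasons not in library: {unknown}")
    return {"message": message, "reasons": list(reasons)}


annotation = annotate(
    "Typed spin system 52 as alanine.",
    ["BMRB statistics", "chemical shift grouping"],
)
```

Restricting annotations to names in the library keeps them machine-readable, while the library can still be extended by adding a new entry.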
2. model: identify problems
- distinguishing signal from noise; true positives, false positives, false negatives
- facilitates re-interpretation, if additional data is collected, by pointing out trouble spots
Examples:
- unassigned signal peak
- missing CB peaks of Gln sidechain
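Pointing out trouble spots could look roughly like the following. This is a hedged sketch: the `Peak` class, the `kind` labels, and the assignment strings such as "Q17-CB" are hypothetical, and the deck does not specify how problems are actually modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peak:
    id: int
    kind: str                         # "signal", "noise", or "artifact"
    assignment: Optional[str] = None  # hypothetical label, e.g. "Q17-CA"


def trouble_spots(peaks, expected):
    """Report unassigned signal peaks and expected peaks never observed."""
    unassigned = [p.id for p in peaks
                  if p.kind == "signal" and p.assignment is None]
    observed = {p.assignment for p in peaks if p.assignment is not None}
    missing = sorted(set(expected) - observed)
    return {"unassigned_signal": unassigned, "missing_expected": missing}


peaks = [
    Peak(1, "signal", "Q17-CA"),
    Peak(2, "signal"),   # real signal, but no assignment yet
    Peak(3, "noise"),    # recorded, but not a trouble spot
]
report = trouble_spots(peaks, expected=["Q17-CA", "Q17-CB"])
```

In this example the report lists peak 2 as an unassigned signal peak and "Q17-CB" as an expected peak that was not found, which are the two kinds of trouble spot named on the slide.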
2. extraneous data, full results
- collaborate with BMRB: deposit full data sets
- extend NMR-STAR data dictionary
- extend Sparky assignment program
- data to capture: noise & artifact peaks, unassigned spin systems, contaminants, anomalies, ...
Review: Solution
1. process of reasoning
- version control: capture intermediate states
- model of commonly used deductive reasons
- annotate changeset with deductive reasons
2. final data
- model for identifying problems
- model for extraneous data
- deposit full results
Challenges?
- human/computer optimization
  - simple enough for users to apply properly, vs. detailed enough that a program can understand the complete context of an annotation
  - separate layers: use more/less detail as needed
  - (future) tools can increase level of detail without bogging humans down
- future compatibility
  - library of annotations provides “guidance”; extensions can be trivially added by augmenting library
  - if there’s a problem with the library of annotations, can fix by extending (providing a new, similar annotation)
- tooling
  - Sparky
Annotation mock-up (STAR-like format)

data_example
save_assign
  loop_  # tags
    _Tag.ID
    _Tag.Parent_ID
    ...  ...
    24   23
  stop_
  loop_  # reasons used
    _Tag_Reason.ID
    _Tag_Reason.Tag_ID
    _Tag_Reason.Name
    ...  ...  ...
    73   24   "BMRB statistics"
    74   24   "chemical shift grouping"
  stop_
  loop_  # spin-system/amino-acid-type assignment
    _SSAA_Assn.ID
    _SSAA_Assn.SS_ID
    _SSAA_Assn.AA_ID
    ...  ...  ...
    101  52   Alanine
  stop_
  loop_  # peak/spin-system assignment
    _Peak_SS_Assn.ID
    _Peak_SS_Assn.SS_ID
    _Peak_SS_Assn.Peak_ID
    _Peak_SS_Assn.Peak_Spectrum
    ...  ...  ...  ...
    175  52   124  HNCACB
    176  52   125  HNCACB
    177  52   126  HNCACB
    178  52   127  HNCACB
  stop_
save_
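A loop like those in the mock-up could be emitted by a small helper such as the one below. It is a simplified sketch: `format_loop` and `quote` are hypothetical names, and the quoting rule (wrap any value containing whitespace in double quotes) is far cruder than the real STAR syntax.

```python
def quote(value):
    """Double-quote values containing whitespace (simplified rule)."""
    text = str(value)
    return f'"{text}"' if any(c.isspace() for c in text) else text


def format_loop(tags, rows, comment=None):
    """Render one STAR-like loop_ ... stop_ block as text."""
    header = "loop_" + (f"  # {comment}" if comment else "")
    lines = [header]
    lines += [f"  {tag}" for tag in tags]
    for row in rows:
        if len(row) != len(tags):
            raise ValueError("row length does not match number of tags")
        lines.append("  " + " ".join(quote(v) for v in row))
    lines.append("stop_")
    return "\n".join(lines)


text = format_loop(
    ["_Peak_SS_Assn.ID", "_Peak_SS_Assn.SS_ID",
     "_Peak_SS_Assn.Peak_ID", "_Peak_SS_Assn.Peak_Spectrum"],
    [(175, 52, 124, "HNCACB"), (176, 52, 125, "HNCACB")],
    comment="peak/spin-system assignment",
)
print(text)
```

Checking the row length against the tag list catches the most common way a hand-written loop goes wrong, namely a missing or extra column.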
Impact
- reproducibility
- error detection
- error correction
- collaboration
- sharing
- learning
- analysis quality
- amenability to future analysis
Appendix: NMR phenomena: grouping resonances based on chemical shift
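Grouping by chemical shift can be illustrated with a simple tolerance-based pass over picked peaks. This is only a sketch of one common approach, not the method shown on the slide: the function name, the greedy strategy, the tolerances, and the peak values are all assumptions made for illustration.

```python
def group_by_shift(peaks, tol_h=0.03, tol_n=0.3):
    """Greedily group peaks whose amide H and N shifts agree within tolerance.

    peaks: iterable of (peak_id, h_ppm, n_ppm) tuples (hypothetical layout).
    Each group is compared against the shifts of its first member.
    """
    groups = []  # list of (ref_h, ref_n, [peak_ids])
    for peak_id, h, n in peaks:
        for ref_h, ref_n, members in groups:
            if abs(h - ref_h) <= tol_h and abs(n - ref_n) <= tol_n:
                members.append(peak_id)
                break
        else:
            # No existing group matched: start a new candidate spin system.
            groups.append((h, n, [peak_id]))
    return [members for _, _, members in groups]


# Illustrative shifts only.
peaks = [(124, 8.21, 120.5), (125, 8.22, 120.4), (130, 7.95, 118.0)]
groups = group_by_shift(peaks)
```

With these values peaks 124 and 125 fall into one group and peak 130 into another; each group is a candidate spin system, and the tolerances are exactly the kind of analysis decision the annotation scheme is meant to record.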
Appendix: extraneous data: processing artifacts, spurious peaks
Appendix: Library examples
- Asn sidechain
- Ala backbone
- sequential spin systems