Capture Disparities Highlighted by Provenance Datasets
G. Blake Coe+, R. Christopher Doty*, M. David Allen+, Adriane P. Chapman+
+ (gcoe, dmallen, achapman)@mitre.org
* chris.doty@library.gatech.edu
Capture
• When a single system is used, provenance is
  • Complete
    • Obtained by one capture point
    • Options of granularity
  • High quality
    • E.g. can see payload in addition to envelope
Real World Capture is Complicated
[Figure: coordination points for automatic provenance capture. Labeled systems: PLUS, PASS, Taverna]
The Results of Real World Capture Are Complicated
• There may be multiple, independent observations of the same thing
  • Sending system observes transmission of M; receiving system Y observes the receipt of M
• Data or processes may be observed via different technical channels
  • Program generates a file M on disk; months later, user receives email with attachment M
• Disconnection and duplication occur
What do we need when Capture is Messy?
• Ability to identify when multiple, independent observations of the same thing occur
  • And what to do with that information
• Ability to go “up and down” in the provenance abstraction
  • E.g. work over all of the low-level OS provenance, or just the application-generated provenance, etc.
• Ability to function over incomplete provenance graphs
• Impossible to work on these issues when the only provenance datasets are generated via single or homogeneous systems
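One way to picture the first requirement is the file-on-disk versus email-attachment case from the previous slide. The sketch below is only an illustration, not how PLUS or any capture agent works: the `Observation` record, the observer names, and the choice to match on a SHA-256 digest of the payload are all assumptions made for the example. It also assumes each capture point can see the payload and that the copies are byte-identical, which real capture often cannot guarantee.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One capture agent's report about a data item (hypothetical record shape)."""
    observer: str   # which capture point reported it
    channel: str    # technical channel, e.g. "filesystem" or "email"
    payload: bytes  # content of the observed item


def group_by_content(observations):
    """Group observations whose payloads share the same SHA-256 digest."""
    groups = defaultdict(list)
    for obs in observations:
        digest = hashlib.sha256(obs.payload).hexdigest()
        groups[digest].append(obs)
    return dict(groups)


def duplicated(observations):
    """Keep only the groups reported by more than one distinct observer."""
    return {
        digest: group
        for digest, group in group_by_content(observations).items()
        if len({o.observer for o in group}) > 1
    }


# Illustrative data: the same item M seen on disk and later as an attachment.
observations = [
    Observation("program-host", "filesystem", b"contents of M"),
    Observation("mail-client", "email", b"contents of M"),
    Observation("mail-client", "email", b"some other attachment"),
]
matches = duplicated(observations)
```

Here `matches` holds a single group containing the filesystem and email observations of M. What to do with such a group (merge the nodes, link them, or keep both) is the open question the slide raises.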
The Datasets Provided
• Actually 3 Datasets!
  • Complete
  • App-Based
  • User Monitor
• Show the differences in the provenance record when different capture agents are used
• In PROV-XX
• Related across datasets
  • E.g. the first G1Complete is the same set of actions as G1App-Based, which is the same as G1UserMonitor
[Figure: the three datasets, labeled App-Based, User Monitor, Complete]
Complete
• “Uber User View” a.k.a. “what actually happened”
• Based on a use case observed within the Georgia Tech Library system
[Figure: provenance graph of the complete record]
  Process nodes: Browse (App: Firefox, User: Alice); Save links (App: Word, SharePoint, User: Alice); Create Summary (App: Word, SharePoint, User: Alice); View (App: Outlook, User: Alice); View (App: Outlook, User: Bob); Review (App: Word, SharePoint, User: Bob); Publish (App: SharePoint, User: Cathy)
  Data nodes: Web Data (repeated), Notes.txt, Email from Prof, Email from Mom, Summary.doc, Summary.doc’, AboutInstitution
App-Based
• Capture points in
  • SharePoint
  • Firefox
• Notice not all processes observable!
[Figure: provenance graph as seen by the application capture points]
  Process nodes: Browse (App: Firefox, User: Alice); Save links (App: Word, SharePoint, User: Alice); Create Summary (App: Word, SharePoint, User: Alice); Review (App: Word, SharePoint, User: Bob); Publish (App: SharePoint, User: Cathy)
  Data nodes: Web Data (repeated), Notes.txt, Summary.doc, Summary.doc’, AboutInstitution
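The gap between the Complete and App-Based records can be checked with a plain set difference. The sketch below hand-transcribes the process labels from the two slides into tuples. The tuple layout and helper names are assumptions made for illustration and do not reflect the actual dataset format.

```python
# Process nodes transcribed from the slides as (activity, apps, user).
COMPLETE = {
    ("Browse", "Firefox", "Alice"),
    ("Save links", "Word, SharePoint", "Alice"),
    ("Create Summary", "Word, SharePoint", "Alice"),
    ("View", "Outlook", "Alice"),
    ("View", "Outlook", "Bob"),
    ("Review", "Word, SharePoint", "Bob"),
    ("Publish", "SharePoint", "Cathy"),
}

APP_BASED = {
    ("Browse", "Firefox", "Alice"),
    ("Save links", "Word, SharePoint", "Alice"),
    ("Create Summary", "Word, SharePoint", "Alice"),
    ("Review", "Word, SharePoint", "Bob"),
    ("Publish", "SharePoint", "Cathy"),
}


def apps_of(processes):
    """Collect the individual application names mentioned by a set of processes."""
    return {app for _, apps, _ in processes for app in apps.split(", ")}


def missing_processes(reference, captured):
    """Processes present in the reference record but absent from the captured one."""
    return sorted(reference - captured)


def unseen_apps(reference, captured):
    """Applications that appear in the reference record but never in the captured one."""
    return apps_of(reference) - apps_of(captured)


missing = missing_processes(COMPLETE, APP_BASED)
unseen = unseen_apps(COMPLETE, APP_BASED)
```

With these transcriptions, `missing` contains the two Outlook View processes and `unseen` is `{"Outlook"}`, which matches the slide's point that application-level capture points only see the applications they are attached to.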
User Monitor
• Capture Point in SpectorSoft
  • User Monitoring Software
• Notice that all applications are seen, but edges lacking
[Figure: provenance graph as seen by the user monitor]
  Process nodes: Browse (App: Firefox, User: Alice); Save links (App: Word, SharePoint, User: Alice); Create Summary (App: Word, SharePoint, User: Alice); View (App: Outlook, User: Alice); View (App: Outlook, User: Bob); Review (App: Word, SharePoint, User: Bob); Publish (App: SharePoint, User: Cathy)
  Data nodes: Web Data (repeated), Notes.txt, Email from Prof, Email from Mom, Summary.doc, Summary.doc’, AboutInstitution
Conclusions
• Providing a set of related datasets and workload queries to facilitate research on provenance capture and system interoperability
• PLUS, and some capture agents, can be found at https://github.com/plus-provenance/plus
• Datasets can be found at ProvBench: https://github.com/provbench