
Electronic Notebooks: An Interface Component for Semantic Records Systems

James D. Myers, Michael Peterson, K Prasad Saripalli, Tara Talbott. Mathematics and Computational Science Directorate, Pacific Northwest National Laboratory.


Presentation Transcript


  1. Electronic Notebooks: An Interface Component for Semantic Records Systems • James D. Myers, Michael Peterson, K Prasad Saripalli, Tara Talbott • Mathematics and Computational Science Directorate, Pacific Northwest National Laboratory

  2. Outline • Why have an electronic notebook? • The changing science/IT landscape • Semantic repositories • Scientific Annotation Middleware • ENs on semantic repositories • The ELN on SAM

  3. PNNL Electronic Laboratory Notebook (ELN), ~1995+ • Secure shared WWW-based space • Hierarchical Chapters/Pages/Notes • Add/View/Search Notes • File upload, sketch, text, equations, forms, image capture, … • Interactive views of data • Editor/Viewer APIs • Cross-out capability • Digital Signatures/Timestamps • Java Client, Perl and Java (2001+) servers • …

  4. What distinguishes ENs from other tools? • Emphasis on multimedia human-entered information • Chronological, page-oriented display • Master/personal project record • Records functionality: • Non-repudiation: digital signatures and timestamps • Persistence/completeness: write-once/no deletions/audit trail • Standardized lifecycle: signing/witnessing policies, archiving, retention schedules, …
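
The "write-once/no deletions/audit trail" and "cross-out" ideas above can be sketched as an append-only event log. This is a minimal illustration only, not the ELN's implementation; the class name, event names, and fields are all hypothetical.

```python
class WriteOnceNotebook:
    """Append-only notebook: notes are never edited or deleted, only crossed out."""

    def __init__(self):
        self._log = []  # events are only ever appended (hypothetical event format)

    def add_note(self, author, text):
        note_id = sum(1 for e in self._log if e["event"] == "add") + 1
        self._log.append({"event": "add", "note": note_id,
                          "author": author, "text": text})
        return note_id

    def cross_out(self, author, note_id):
        if not any(e["event"] == "add" and e["note"] == note_id for e in self._log):
            raise KeyError(note_id)
        # The cross-out is a new event; the original note stays in the log.
        self._log.append({"event": "cross_out", "note": note_id, "author": author})

    def view(self):
        """Return (note_id, text, crossed_out) for every note, in entry order."""
        crossed = {e["note"] for e in self._log if e["event"] == "cross_out"}
        return [(e["note"], e["text"], e["note"] in crossed)
                for e in self._log if e["event"] == "add"]

    def audit_trail(self):
        """Return a read-only copy of the full event history."""
        return tuple(dict(e) for e in self._log)


notebook = WriteOnceNotebook()
first = notebook.add_note("alice", "Sample heated to 350 K")
second = notebook.add_note("alice", "Sample heated to 305 K (corrected)")
notebook.cross_out("alice", first)
```

Crossing out a note adds an event rather than changing the note, so the audit trail still contains the original text.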

  5. The Systems Science Revolution • Community resources • Bi-directional flow/feedback of information • Partial results being combined to produce new knowledge • Experiment/theory/model comparisons • Multiscale optimizations • Rapid evolution • High complexity • Shifting/emerging disciplinary boundaries • Resources will be distributed, with multiple curators

  6. Advances in Problem Solving Environments/Grids/Semantic Technologies • Multiple applications recording data: • Pedigree/Provenance • Experiment Metadata • Project Organization • Workflow • Categorization • Detected Features • Instrument logs • … • Replica Locations • Endorsements • Community Annotations • … • How do we provide EN capabilities in this larger context?

  7. Semantic Repositories • Use self-describing metadata/relationships • Triple-stores • RDF • OWL • Aggregate information generated by multiple applications • Allow browsing, searching, reasoning across integrated information
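
As a rough illustration of how a triple store aggregates statements from several applications and lets them be searched together, here is a minimal in-memory sketch. It is not an RDF library; the resource names and predicates (`generatedBy`, `author`) are hypothetical, except `hasNote`, which appears later in the talk.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate obj")


class TripleStore:
    """Tiny in-memory triple store; None in a pattern acts as a wildcard."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


store = TripleStore()
# Two hypothetical applications describe the same data file independently.
store.add("data1.dat", "generatedBy", "instrument-A")  # written by an instrument logger
store.add("data1.dat", "hasNote", "note-17")           # written by a notebook client
store.add("note-17", "author", "alice")                # written by a notebook client

# One query sees everything known about the file, whoever recorded it.
about_data1 = store.match(subject="data1.dat")
```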

  8. Scientific Annotation Middleware (SAM): 5-year DOE-funded research project • Develop middleware to create semantic repositories • Enable the sharing of this information among portals and problem solving environments, software agents, scientific applications, and electronic notebooks • With different levels of sophistication • Without global schema • Improve the completeness, accuracy, and availability of the scientific record. http://www.scidac.org/SAM/

  9. SAM Architecture [Diagram: layered stack of Notebook Services, Semantic Services, and Metadata Services; protocols labelled "DAV, DASL, JMS, SAM Extensions" and "DAV, JDBC, GridFTP"; back ends labelled Database, Web, DataGrid]

  10. Web Distributed Authoring and Versioning (WebDAV) • An early web service • Put/Get data with arbitrary properties (dynamic) • Properties can be discovered and accessed independently • DASL, Versioning, Transactions, … • Widely supported (MS Office, databases, file system drivers, …)
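
To make "arbitrary properties" concrete, the sketch below builds the XML request bodies that WebDAV's PROPPATCH (set properties) and PROPFIND (read named properties) methods carry. Only the `DAV:` elements come from the WebDAV standard; the property namespace `http://example.org/props#` and the property names are assumptions for illustration, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"                       # standard WebDAV namespace
EX = "http://example.org/props#"   # hypothetical namespace for custom properties


def proppatch_body(props):
    """Build a PROPPATCH body that sets each custom property to a text value."""
    root = ET.Element(f"{{{DAV}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV}}}set")
    prop_el = ET.SubElement(set_el, f"{{{DAV}}}prop")
    for name, value in props.items():
        ET.SubElement(prop_el, f"{{{EX}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def propfind_body(names):
    """Build a PROPFIND body that asks for the named custom properties only."""
    root = ET.Element(f"{{{DAV}}}propfind")
    prop_el = ET.SubElement(root, f"{{{DAV}}}prop")
    for name in names:
        ET.SubElement(prop_el, f"{{{EX}}}{name}")
    return ET.tostring(root, encoding="unicode")


set_body = proppatch_body({"units": "meters"})
get_body = propfind_body(["units", "numColumns"])
```

A client would send `set_body` with the PROPPATCH method and `get_body` with PROPFIND to the resource's URL; properties can therefore be read without fetching the content itself.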

  11. Binary Format Description (BFD) Language • XML language to describe ASCII, binary, and XML data formats • Generic parser to extract and semantically tag data in files/streams • The meaning of data can be captured, regardless of format, for future use • Data Format Description Language standard
      [Diagram: File Format 1 and BFD Description 1 feed the BFD Parser, which emits XML Format 1; an XSLT Processor with an XSL Stylesheet (reformat) produces XML Format 2]
      Example:
      <XSIL>
        <Param Name="units">meters</Param>
        <Param Name="numColumns" Type="int"/>
        <Vector Name="orbitData">
          <Dim><XBFDvalue-of select="/XSIL/Param[@Name='numColumns']"/></Dim>
          <Dim>4</Dim>
        </Vector>
        <Stream Type="remote" XBFDStreamnumber="0" Encoding="binary"/>
      </XSIL>
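
The idea of a generic, description-driven parser can be sketched as below. This is not the BFD language or its parser: the description is a hypothetical Python list standing in for the XML description, and the field names and layout are invented for illustration.

```python
import struct

# Hypothetical format description: (field name, struct format code, units).
DESCRIPTION = [
    ("numColumns", "i", None),   # 4-byte signed integer
    ("x", "d", "meters"),        # 8-byte float
    ("y", "d", "meters"),        # 8-byte float
]


def parse_with_description(data, description, byte_order="<"):
    """Decode raw bytes using the description and tag each value with its meaning."""
    fmt = byte_order + "".join(code for _, code, _ in description)
    values = struct.unpack(fmt, data[:struct.calcsize(fmt)])
    return {
        name: {"value": value, "units": units}
        for (name, _, units), value in zip(description, values)
    }


# Bytes as a hypothetical instrument might have written them (little-endian).
raw = struct.pack("<idd", 4, 1.5, -2.0)
record = parse_with_description(raw, DESCRIPTION)
```

The parser itself knows nothing about the file; swapping in a different description is enough to read a different format, which is the property the slide emphasizes.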

  12. SAM Metadata Services Layer • Jakarta Slide DAV server plus configurable: • Mapping to Data Store(s) • Property Generation from binary/ASCII/XML files • Dynamic Virtual Translations • Server-generated Properties and Relationships • Timestamp, size, CopyOf [Diagram: clients (ELN, Fortran Application, …) connect via DAV+ and RDF Export; Content on 'Local Disk' carries Prop1 and Prop2; a hastranslation relationship links it to Translated Content produced via the BFD Web Service and XSLT]

  13. SAM Semantic Services Layer • SAM Metadata Layer plus configurable: • Relation-scoped Queries • Translation of DAV Properties to RDF Triples • RDF/GXL Pedigree Generation • …
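
One plausible reading of "Translation of DAV Properties to RDF Triples" is sketched below: each property on a resource becomes a triple whose subject is the resource URL, serialized as N-Triples lines. The function, the example URLs, and the `example.org` namespace are assumptions; SAM's actual mapping is not described in the slides.

```python
def props_to_ntriples(resource_url, props, relations=()):
    """Turn DAV-style properties and relationships into N-Triples lines.

    props:     {(namespace, name): literal string value}
    relations: iterable of ((namespace, name), target URL)
    """
    lines = []
    for (ns, name), value in sorted(props.items()):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{resource_url}> <{ns}{name}> "{escaped}" .')
    for (ns, name), target in sorted(relations):
        lines.append(f'<{resource_url}> <{ns}{name}> <{target}> .')
    return lines


EX = "http://example.org/props#"  # hypothetical property namespace
triples = props_to_ntriples(
    "http://example.org/repo/data1.dat",
    {("DAV:", "getcontentlength"): "2048", (EX, "units"): "meters"},
    relations=[((EX, "hasNote"), "http://example.org/repo/note-17")],
)
```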

  14. Back to ENs… • What is needed to be able to provide • Unstructured human entry of information? • Chronological, page-oriented display? • A master/personal project record? • Records functionality?

  15. Creating Notes • A ‘standard’ ELN client can create notes • Stored as content with a hasNote relationship with pages, notes • Plus…any app can store notes the same way • Page generation – works as before
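
A small sketch of page generation driven by `hasNote` relationships: a page is rendered by collecting the notes related to it and ordering them chronologically, regardless of which application stored them. The data layout, identifiers, and timestamps are invented for illustration.

```python
from datetime import datetime

# Hypothetical stored notes; "source" records which application wrote each one.
notes = {
    "note-1": {"created": datetime(2004, 5, 1, 9, 30), "source": "ELN client",
               "text": "Started calibration run"},
    "note-2": {"created": datetime(2004, 5, 1, 11, 5), "source": "instrument app",
               "text": "Spectrum uploaded"},
    "note-3": {"created": datetime(2004, 5, 2, 8, 0), "source": "ELN client",
               "text": "Reviewed results"},
}

# (subject, relationship, object) triples; order of storage is irrelevant.
relations = [
    ("page-1", "hasNote", "note-2"),
    ("page-1", "hasNote", "note-1"),
    ("page-2", "hasNote", "note-3"),
]


def render_page(page_id, relations, notes):
    """Return the page's notes as display lines, oldest first."""
    note_ids = [o for s, p, o in relations if s == page_id and p == "hasNote"]
    ordered = sorted(note_ids, key=lambda n: notes[n]["created"])
    lines = []
    for note_id in ordered:
        note = notes[note_id]
        stamp = note["created"].strftime("%Y-%m-%d %H:%M")
        lines.append(f"{stamp} [{note['source']}] {note['text']}")
    return lines


page1 = render_page("page-1", relations, notes)
```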

  16. ENs as a Primary View? • Instruments, PSEs, etc. may organize parts of the experiment that an EN should not duplicate • Define other relationships as part of the EN chapter/page/note hierarchy: [Diagram: an EN hierarchy (Notebook1; Chapter1, Chapter2; Page1, Page2) beside a PSE hierarchy (Project; Experiment1, Experiment2; Data1, Data2), the latter "Defined by PSE, Interpreted by EN"]
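
One way an EN could interpret PSE-defined relationships as its own hierarchy is to treat a configured set of relationship names as "contains" links and walk them into an outline. The relationship names `hasExperiment` and `hasData` below are hypothetical, as is the whole function; this is a sketch of the idea rather than the SAM mechanism.

```python
# Relationships a PSE might have recorded (names are assumptions).
triples = [
    ("Project", "hasExperiment", "Experiment1"),
    ("Project", "hasExperiment", "Experiment2"),
    ("Experiment1", "hasData", "Data1"),
    ("Experiment2", "hasData", "Data2"),
    ("Data1", "derivedFrom", "Data2"),  # not part of the hierarchy, so ignored
]

# Configuration: which relationships the notebook view treats as containment.
HIERARCHY_RELATIONS = {"hasExperiment", "hasData"}


def outline(node, triples, relations, depth=0, path=()):
    """Return indented outline lines for node and everything it contains."""
    lines = ["  " * depth + node]
    if node in path:
        # Cycle: show the node once more as a link, but do not descend again.
        return lines
    children = sorted(o for s, p, o in triples if s == node and p in relations)
    for child in children:
        lines.extend(outline(child, triples, relations, depth + 1, path + (node,)))
    return lines


view = outline("Project", triples, HIERARCHY_RELATIONS)
```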

  17. Records? • Digital Signatures, Timestamps, etc. are services that can be exposed as repository services and associated metadata • But • What do we sign (content/metadata)? • Where is the edge of the record? • How deep do we travel through the web of relations? • How do we stop other applications from changing/deleting signed content?

  18. Multiple Options • Simple: • Sign content plus defined subset of metadata • Stop at edge of server • Treat relationship cycles as links • Lock content and metadata subset when signed • Advanced: • Multiple self-describing signatures (e.g. XMLSignature) • Allow records across servers via trust, cached metadata/data • Define fine-grained retention schedules
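
The "simple" option can be sketched as follows: serialize the content hash, a fixed subset of metadata, and the identifiers of related resources (recorded as links, not followed, so cycles cannot cause unbounded traversal) into a canonical form, then sign that. An HMAC from the standard library stands in for a real public-key digital signature here, and the property subset and field names are assumptions.

```python
import hashlib
import hmac
import json

SIGNED_PROPERTIES = ("author", "created")  # hypothetical "defined subset of metadata"


def record_digest(content, metadata, links):
    """Hash a canonical form of content, the signed metadata subset, and link IDs."""
    canonical = json.dumps(
        {
            "content_sha256": hashlib.sha256(content).hexdigest(),
            "metadata": {k: metadata[k] for k in SIGNED_PROPERTIES},
            "links": sorted(links),  # related resources by ID only; never traversed
        },
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hashlib.sha256(canonical).digest()


def sign(key, content, metadata, links):
    # HMAC is a stand-in; a real records system would use a private-key signature.
    return hmac.new(key, record_digest(content, metadata, links), hashlib.sha256).hexdigest()


def verify(key, signature, content, metadata, links):
    return hmac.compare_digest(signature, sign(key, content, metadata, links))


key = b"demo-key"
meta = {"author": "alice", "created": "2004-05-01T09:30:00", "comment": "draft"}
signature = sign(key, b"raw data", meta, ["page-1", "note-17"])
```

Because only the listed properties are signed, other applications can still add or change unsigned metadata without invalidating the signature, while any change to the content or the signed subset is detected.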

  19. SAM Notebook Services Layer • SAM Metadata and Semantic Layers plus: • Notebook Management, Page Display, … • Digital Signatures • Canonicalization • Notarized Timestamps • Data/Signature Migration Capabilities • Notebook API, Notebook Components • Supports ELN 5.1, Annotation Applet, new portal-based EN client • EN Portlets

  20. Collaborations • Collaboratory for Multiscale Chemical Science (CMCS) • SAM as primary data system, pedigree, notebook • NEESgrid/CHEF Portal/NMI Grid User Computing Environments • ELN, SAM as a metadata/pedigree store? • Genomes-To-Life • SAM as annotation/metadata repository, notebook • Internal PNNL Projects • Concept Map Repository, Interface to Lustre, Biological Data Annotation • DOE2000 Notebook Community (1500+ email addresses) • Upgrades to DOE2K Notebooks • E.g. Columbia University Environmental Science Lab Notebooks

  21. A Scientific Content Repository Vision • Notebooks become just one view of the scientific information • Applications contribute data, metadata, and relationships directly • Records functionality provided by middleware, available to multiple applications • Content is stored in multiple repositories managed independently • The scientific record becomes richer and re-integrated

  22. Acknowledgments • Carina Lansing, PNNL • Al Geist, Jens Schwidder, David Jung, ORNL • U.S. Department of Energy: Mathematical, Information and Computational Sciences Division of the Office of Science • Pacific Northwest National Laboratory • Pacific Northwest National Laboratory is a multiprogram national laboratory operated by Battelle Memorial Institute for the U.S. Department of Energy under Contract DE-AC06-76RL0 1830 • Oak Ridge National Laboratory • Oak Ridge National Laboratory is a multiprogram national laboratory operated by UT-Battelle, LLC for the U.S. Department of Energy under Contract DE-AC05-00OR22725
