
Towards Bootstrapping Knowledge-Based Archives*

Bertram Ludäscher, Richard Marciano, Reagan Moore. San Diego Supercomputer Center, {ludaesch,marciano,moore}@sdsc.edu


Presentation Transcript


  1. Towards Bootstrapping Knowledge-Based Archives*
     Bertram Ludäscher, Richard Marciano, Reagan Moore
     San Diego Supercomputer Center, {ludaesch,marciano,moore}@sdsc.edu
     *Towards Self-Validating Knowledge-Based Archives, Bertram Ludäscher, Richard Marciano, Reagan Moore, 11th Workshop on Research Issues in Data Engineering (RIDE), Heidelberg, IEEE Computer Society, April 2001

  2. Archival Processes and Functions
     • Data submission/accessioning:
        • loop: information producer <==> "archival engineer"
     • Ingestion:
        • a sequence of information-preserving transformations is applied to submitted "raw data" => ingestion network
     • Migration:
        • ... as time goes by ...
        • ... migrate to new physical media, maybe data formats, information model ...
        • "easy migration" <=> "good" archival format & model
     • Instantiation/Access:
        • revive/reanimate the archive => queryable collection/database
     • GOAL: preserve information!
        • Right!?

  3. What is it that we try to archive??
     • Information hierarchies:
        • data ... information ... knowledge ... (aka: the big picture!)
        • instance ... schema ... model ... metamodel ... metametamodel ...
        • linear syntax ... data structure ... data model ... conceptual model ...
     • Static vs. dynamic information:
        • extensional data ... intensional/virtual/derived data (facts/rules)
        • data ... functions/programs
     • Managing complexity:
        • layered approach: "protocol stack" (cf. ISO/OSI, "Semantic Web", communication in general) ==going up==> aggregate/abstract

  4. OAIS (Open Archival Information System) Information Model
     • info(rmation)_object ~ data_object + representation_info
     • data_object ~ digital_object + physical_object
     • digital_object ~ [bits]
     • representation_info ~ structure_info + semantic_info
     • representation_info is_interpreted_using representation_info
     • an AIP (archival information package) contains content info_objects + PDI (preservation description information)
     • knowledge-level extension:
        data objects (e.g., RTF/HTML/... formatted objects)
        =wrapping/tagging=> information objects (e.g., XML docs + DTD/Schema)
        =knowledge extraction/semantic annotation=> semantic/conceptual objects (e.g., declarative OO model + rules)
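
The relations on this slide can be read as a small recursive data model. The following Python sketch is one possible rendering, not the OAIS reference model itself: the class and field names mirror the slide's notation, only the digital case of `data_object` is modeled, and all sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RepresentationInfo:
    # representation_info ~ structure_info + semantic_info
    structure_info: str
    semantic_info: str
    # representation_info is_interpreted_using representation_info
    interpreted_using: Optional["RepresentationInfo"] = None


@dataclass
class InformationObject:
    # info_object ~ data_object + representation_info
    # Assumption: only digital objects (digital_object ~ [bits]) are modeled.
    data_object: bytes
    representation_info: RepresentationInfo


@dataclass
class AIP:
    # An AIP contains content info_objects plus PDI.
    content: List[InformationObject]
    pdi: Dict[str, str] = field(default_factory=dict)


def interpretation_chain(rep: Optional[RepresentationInfo]) -> List[str]:
    """Follow is_interpreted_using links and list each level's structure_info."""
    chain = []
    while rep is not None:
        chain.append(rep.structure_info)
        rep = rep.interpreted_using
    return chain


# Hypothetical sample values, for illustration only.
xml_syntax = RepresentationInfo("XML syntax", "markup conventions")
dtd = RepresentationInfo("collection DTD", "element meanings", interpreted_using=xml_syntax)
aip = AIP(content=[InformationObject(b"<bill/>", dtd)], pdi={"provenance": "hypothetical"})
print(interpretation_chain(aip.content[0].representation_info))
```

The recursion in `interpreted_using` is what makes the "how much context can you drag in" question concrete: the chain has to stop somewhere outside the archive.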

  5. Ingestion Networks
     A transformation t is information preserving if there is an inverse transformation t_inv such that, for all d in dom(t): t_inv( t( d ) ) = d.
     • asking for "=" at the level of raw (unwrapped) data may be too strict:
        • => lift to the information level; make sure information is preserved there
        • e.g., mapping back to HTML using XSL(T) can give the same "look and feel" as the raw data; but presentational HTML "noise" (irregularities) is removed
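
The difference between raw-level and information-level preservation can be sketched in a few lines of Python. This is an illustrative toy, not the transformations used in the talk: `t` is a hypothetical tagging step that wraps "key: value" lines as XML, `t_inv` maps back to text, and the information-level check compares the tagged form before and after a round trip.

```python
import xml.etree.ElementTree as ET


def t(raw: str) -> str:
    """Hypothetical tagging step: wrap 'key: value' lines as XML fields."""
    root = ET.Element("record")
    for line in raw.splitlines():
        if not line.strip():
            continue  # blank lines are presentational noise
        key, _, value = line.partition(":")
        elem = ET.SubElement(root, "field", name=key.strip())
        elem.text = value.strip()
    return ET.tostring(root, encoding="unicode")


def t_inv(xml_text: str) -> str:
    """Hypothetical inverse: render the XML fields back as 'key: value' lines."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"{f.get('name')}: {f.text or ''}" for f in root.findall("field"))


def preserves_raw(d: str) -> bool:
    # Strict check from the slide: t_inv(t(d)) == d on the raw data.
    return t_inv(t(d)) == d


def preserves_information(d: str) -> bool:
    # Lifted check: the tagged (information-level) form survives a round trip.
    return t(t_inv(t(d))) == t(d)


noisy = "Title :  Senate Bill 1\n\nSponsor:Smith"
clean = "Title: Senate Bill 1\nSponsor: Smith"
print(preserves_raw(noisy), preserves_information(noisy), preserves_raw(clean))
```

For the noisy input the strict raw-level check fails because spacing and blank lines are normalized away, while the information-level check still holds; this mirrors the HTML "noise" example on the slide.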

  6. Ingestion Network: Senate Collection
     [Figure: ingestion network with stages S0 to S7 over the formats .DOC, .RTF, .HTML, .XML, .OAV, and .TM; edge labels: save-as, Perl, OmniMark, decompose, consolidate, generate, archive. Legend (stages): SIP, AIP, DIP.]

  7. From XML-Based to Knowledge-Based Archives...
     • XML/collection-based archival: save data "as is" plus...
        • ... separate content from presentation
        • ... tag your data (take a lift in the info hierarchy)
        • ... use a self-describing, semistructured data format (XML)
     • Knowledge-based archival: add ...
        • ... conceptual level information
        • ... integrity constraints
        • ... explanations/derivation rules:
           • archiving only results y=f(x) vs. archiving the rules/function "f" (e.g. f = Florida ...)
        • ... knowledge representation (rules) ~ metadata on steroids ...

  8. ... to Self-Validating, Self-Instantiating Knowledge-Based Archives
     • Goal: self-contained archive
     • Limitations: how much context can you drag into your archive to make it self-contained?? (... Dublin Core ... human history)
     • Using open, infrastructure-independent representations ... => make the archive as self-contained as you can ... … pay for …

  9. Maximizing "Self-Containedness"
     • Self-validating archives: add ...
        • ... "executable knowledge" (=rules)
        • "helping (bugging?) the data provider"
        => add the functionality and meaning of DTD (+Schema+IC+...) validation to the AIP => package the validator!
     • Self-instantiating archives: add ...
        • ... "executable ingestion process"
        • "helping the archival engineer (aka archivist)"
        • ... here is looking over your shoulder ...
        => add the functionality of database transformations to the AIP => package the transformers!
     • BUT packaging validators and transformers increases infrastructure dependence!

  10. Towards Bootstrapping Knowledge-Based Archives
     • enable addition of semantic annotations ("knowledge") via logic rules to AIPs
     • add executable specifications of semantics => AIP += KP (knowledge package, i.e., rules)
        • => self-validating archive
     • add executable specifications of the ingestion network => AIP += IN (ingestion network, ... more rules)
        • => self-instantiating archive
     • => a bootstrapping knowledge-based archive with DTD/Schema/IC validation and ingestion transformations all expressed in a declarative logic program
     • from the 2do list: build a prototype (BARON) based on rule languages for domain semantics (self-validation) and ingestion transformations (self-instantiation)
     [Image: Baron von Münchhausen, pulling himself out of the swamp]
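
The "AIP += KP" idea can be sketched as an archive package that carries its integrity rules as plain data and can check its own content. The slides envision a declarative logic rule language for this; the Python below is only a stand-in, and the two rule kinds (`required`, `unique`), the record layout, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Dict[str, str]
# A rule is plain data: (kind, field name). Both rule kinds are hypothetical.
Rule = Tuple[str, str]


@dataclass
class AIP:
    content: List[Record]
    # KP (knowledge package): rules stored alongside the content.
    knowledge_package: List[Rule] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Evaluate the packaged rules against the content; return violations."""
        violations = []
        for kind, name in self.knowledge_package:
            if kind == "required":
                for i, rec in enumerate(self.content):
                    if not rec.get(name):
                        violations.append(f"record {i}: missing {name}")
            elif kind == "unique":
                seen = set()
                for i, rec in enumerate(self.content):
                    value = rec.get(name)
                    if value is None:
                        continue  # absence is the 'required' rule's concern
                    if value in seen:
                        violations.append(f"record {i}: duplicate {name}={value}")
                    seen.add(value)
            else:
                violations.append(f"unknown rule kind: {kind}")
        return violations


# Hypothetical content and rules, for illustration only.
aip = AIP(
    content=[{"id": "1", "sponsor": "A"}, {"id": "1"}],
    knowledge_package=[("required", "sponsor"), ("unique", "id")],
)
print(aip.validate())
```

Keeping the rules as data rather than as code is the point of the sketch: the rules travel with the package, while the small interpreter that evaluates them is exactly the kind of infrastructure dependence the previous slide warns about.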

  11. References
     • Towards Self-Validating Knowledge-Based Archives, Bertram Ludäscher, Richard Marciano, Reagan Moore, 11th Workshop on Research Issues in Data Engineering (RIDE), Heidelberg, IEEE Computer Society, April 2001, SDSC TR-2001-1, January 18, 2001.
     • Knowledge-Based Persistent Archives, Reagan Moore, SDSC TR-2001-7, January 18, 2001.
     • The Senate Legislative Activities Collection (SLA): a Case Study Infrastructure Research to Support Preservation Strategies, Richard Marciano, Bertram Ludäscher, Reagan Moore, SDSC TR-2001-5, January 18, 2001.
     • Reference Model for an Open Archival Information System (OAIS), Draft Recommendation, Consultative Committee for Space Data Systems, CCSDS 650.0-R-1, May 1999.
     • Digital Rosetta Stone: A Conceptual Model for Maintaining Long-term Access to Digital Documents, Alan R. Heminger, Steven B. Robertson.
