Digital Libraries of the Future: Integration through the 5S Framework Sixth National Russian Research Conference Sep. 28 - Oct. 1, 2004. Edward A. Fox Virginia Tech, Blacksburg, VA 24061 USA fox@vt.edu http://fox.cs.vt.edu/talks. Acknowledgements: Sponsors. Conference Organizers and Staff
Digital Libraries of the Future: Integration through the 5S Framework. Sixth National Russian Research Conference, Sep. 28 - Oct. 1, 2004. Edward A. Fox, Virginia Tech, Blacksburg, VA 24061 USA. fox@vt.edu http://fox.cs.vt.edu/talks
Acknowledgements: Sponsors • Conference Organizers and Staff • NSF Grants • CITIDEL: DUE-0121679 • DL-in-a-box: DUE-0136690 • ETANA: ITR-0325579 • GetSmart: DUE-0121741 • OAD: IIS-0086227 • Others • AOL, Capes (Brazilian funding agency) • ASOR, CWRU, ETANA, Vanderbilt U. • ACM, Adobe, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, OCLC, SUN, US Dept. of Ed. (FIPSE)
Acknowledgements: Faculty, Staff • Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, Douglas Knight, Deborah Knox, John Impagliazzo, Gail McMillan, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
Acknowledgements: Students • Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, …
Outline • Vision of the Future: Chatham Report • Integration • 5S Framework • DL Taxonomy • Minimal DL • DL Ontology • Applications of Framework: Language (5SL), Design (5SGraph), Generation (5SGen), Logging • Quality DLs
As data, information, and knowledge play increasingly central roles … digital library research should focus on: • Increasing the scope and scale of information resources and services; • Employing context at the individual, community, and societal levels to improve performance; • Developing algorithms and strategies for transforming data into actionable information; • Demonstrating the integration of information spaces into everyday life; and • Improving availability, accessibility, and, thereby, productivity.
An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions: • Acquisition of new information resources; • Effective access mechanisms that span media type, mode, and language; • Facilities to leverage the utilization of humankind’s knowledge resources; • Assured stewardship over humanity’s scholarly and cultural legacy; and • Efficient and accountable management of systems, services, and resources.
Integration: Rationale • We can read any paper book (ignoring limitations of language, vision, …). • Scholarship requires access, analysis, and synthesis spanning disciplines and sources. • New theories, systems, and services build upon our past accomplishments. • Our “Small World” and the “Internet Age” demand that we, and our computers, work together and interoperate.
Integration: Urgency, Longevity • If we collect, capture, acquire, or produce information, will it be usable in 100 years? • NSF Digital Archiving Program • Library of Congress National Digital Information Infrastructure and Preservation Program
Integration: Standards • Standards don’t exist in many areas. • Standards that do exist create a jumble: • Conversion between (without loss?) • Bridging gaps (Z39.50 -> OAI) • Managing legacy content and systems • Standards in DLs have focused on: • Metadata (e.g., Dublin Core) • Architecture (e.g., handles, repositories)
Integration: Challenges • “Semantic Web” is vision, not reality. • How can we integrate without a theory? • How can we interoperate without a common framework? • How can we have a science of DLs if we lack agreement on definitions (so we can reason and discuss) and measures of quality (so we can compare and improve)?
Motivation • DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc. • DL construction: difficult, ad-hoc, lacking support for tailoring/customization • Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development. • Lack of specific DL models, formalisms, languages
DL Services/Activities Taxonomy • Infrastructure Services: Repository-Building (Creational, Preservational) and Add Value • Repository-Building, Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting • Repository-Building, Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format) • Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language) • Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
5S Model – Informally • Digital libraries are complex information systems that: • help satisfy info needs of users (societies) • provide info services (scenarios) • organize info in usable ways (structures) • present info in usable ways (spaces) • communicate info with users (streams)
5S in Archaeology • Example: Madaba Plains • [Figure: 5S diagram with the labels Streams, Structures, Spaces, Scenarios, Societies, and Regions]
Background: 5S and DL formal definitions and compositions (April 2004 TOIS)
Glossary: Concepts in the Minimal DL and Representing Symbols
The 5S Formal Model • A digital library is a 10-tuple $(Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc)$ in which: • $Streams$ is a set of streams, which are sequences of arbitrary types (e.g., bits, characters, pixels, frames); • $Structs$ is a set of structures, which are tuples $(G, \mathcal{F})$, where $G = (V, E)$ is a directed graph and $\mathcal{F}: (V \cup E) \to L$ is a labeling function; • $Sps$ is a set of spaces, each of which can be a measurable, measure, probability, topological, metric, or vector space.
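A structure as defined on this slide (a directed graph whose vertices and edges all carry labels) can be sketched in a few lines. This is only an illustration: the class name `Structure`, the field names, and the book example are assumptions, not part of the 5S papers.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """A labeled directed graph: G = (V, E) plus a labeling function."""
    vertices: frozenset   # V
    edges: frozenset      # E, as (source, target) pairs
    labels: dict          # labeling function: vertex or edge -> label in L

    def is_well_formed(self) -> bool:
        # Every edge must connect known vertices.
        edges_ok = all(s in self.vertices and t in self.vertices
                       for s, t in self.edges)
        # The labeling function must be defined on all of V union E.
        labeled = all(x in self.labels for x in self.vertices | self.edges)
        return edges_ok and labeled


# Hypothetical example: a book with two chapters.
book = Structure(
    vertices=frozenset({"book", "ch1", "ch2"}),
    edges=frozenset({("book", "ch1"), ("book", "ch2")}),
    labels={
        "book": "Book", "ch1": "Chapter", "ch2": "Chapter",
        ("book", "ch1"): "hasPart", ("book", "ch2"): "hasPart",
    },
)
```

Streams need no special class in such a sketch: any Python sequence (a `str`, `bytes`, or a list of frames) already behaves as a sequence of arbitrary type.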
The 5S Formal Model • $Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios where each $sc_k = \langle e_{1k}(\{p_{1k}\}), e_{2k}(\{p_{2k}\}), \ldots, e_{d_k k}(\{p_{d_k k}\}) \rangle$ is a sequence of events that also can have a number of parameters $\{p_{ik}\}$. Events represent changes in computational states; parameters represent specific locations in a state and respective values. • $St2$ is a set of functions $\Psi: V \times Streams \to (\mathbb{N} \times \mathbb{N})$ that associate nodes of a structure with a pair of natural numbers $(a, b)$ corresponding to a portion (span/segment) of a stream. • $Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections where each DL collection $C_k = \{do_{1k}, do_{2k}, \ldots, do_{f_k k}\}$ is a set of digital objects. Each digital object $do_k = (h_k, Stm1_k, Stt2_k, \Psi_k)$ is a tuple where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$, $\Psi_k \subseteq St2$, and $h_k$ is a handle which represents a unique identifier for the object.
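The idea of a structured stream (a function that ties a node of a structure to a span of a stream) can be illustrated with a small digital object. This is a sketch under stated assumptions: the names `DigitalObject`, `spans`, and `segment` are hypothetical, and the pair (a, b) is read here as a half-open span [a, b), which the slide does not specify.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    handle: str        # unique identifier of the object
    streams: dict      # stream name -> sequence (here: a str)
    nodes: set         # nodes of the object's structure (edges omitted)
    spans: dict        # (node, stream name) -> (a, b)

    def segment(self, node, stream_name):
        """Return the portion of the stream associated with a node."""
        a, b = self.spans[(node, stream_name)]
        # Assumption: (a, b) is interpreted as the half-open span [a, b).
        return self.streams[stream_name][a:b]


# Hypothetical example: one text stream with a title and a body.
article = DigitalObject(
    handle="hdl:example/1",
    streams={"text": "Title. Body text."},
    nodes={"title", "body"},
    spans={("title", "text"): (0, 6), ("body", "text"): (7, 17)},
)
```

A collection is then simply a set (or a dictionary keyed by handle) of such objects.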
The 5S Formal Model • $Cat = \{DM_{C_1}, DM_{C_2}, \ldots, DM_{C_f}\}$ is a set of metadata catalogs for $Coll$ where each metadata catalog $DM_{C_k} = \{(h, mss_{hk})\}$, and $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hk n_{hk}}\}$ is a set of descriptive metadata specifications. Each descriptive metadata specification $ms_{hki}$ is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes. • A repository $Rep = \{(C_i, DM_{C_i})\}$ ($i = 1$ to $f$) is a set of pairs (collection, metadata catalog); it is assumed that there exist operations to manipulate them (e.g., get, store, delete).
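A repository in this sense, a set of (collection, metadata catalog) pairs with get, store, and delete operations, might be sketched as follows. The class and parameter names are assumptions made for illustration; metadata specifications are simplified to flat dictionaries rather than full structures.

```python
class Repository:
    """A set of (collection, metadata catalog) pairs, keyed by collection name."""

    def __init__(self):
        # collection name -> (collection, catalog)
        #   collection: handle -> digital object
        #   catalog:    handle -> list of metadata specifications
        self.pairs = {}

    def store(self, name, handle, obj, specs=()):
        collection, catalog = self.pairs.setdefault(name, ({}, {}))
        collection[handle] = obj
        catalog[handle] = list(specs)

    def get(self, name, handle):
        collection, catalog = self.pairs[name]
        return collection[handle], catalog.get(handle, [])

    def delete(self, name, handle):
        collection, catalog = self.pairs[name]
        collection.pop(handle, None)
        catalog.pop(handle, None)


# Hypothetical usage.
repo = Repository()
repo.store("theses", "hdl:example/1", "full text ...", [{"title": "A Thesis"}])
```

Keeping the catalog keyed by handle mirrors the pairs (h, set of metadata specifications) in the catalog definition.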
The 5S Formal Model • $Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services where each service $Se_k = \{sc_{1k}, \ldots, sc_{s_k k}\}$ is described by a set of related scenarios. • $Soc = (C, R)$ where $C$ is a set of communities and $R$ is a set of relationships among communities. $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services. • Being basically an electronic entity, a member $sm_k$ of $SM$ distinguishes itself from actors by defining or implementing a set of operations $\{op_{1k}, op_{2k}, \ldots, op_{nk}\} \subset sm_k$. Each operation $op_{ik}$ of $sm_k$ is characterized by a triple $(n_{ik}, sig_{ik}, imp_{ik})$, where $n_{ik}$ is the operation’s name, $sig_{ik}$ is the operation’s signature (which includes the operation’s input parameters and output), and $imp_{ik}$ is the operation’s implementation. These operations define the capabilities of a service manager $sm_k$.
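The description of a service manager as a set of operations, each a (name, signature, implementation) triple, maps naturally onto code. The following is a minimal sketch; `Operation`, `ServiceManager`, `execute`, and the toy search operation are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Operation:
    name: str                  # the operation's name
    signature: str             # input parameters and output, as text
    implementation: Callable   # the code that realizes the operation


class ServiceManager:
    """An electronic member of a society, defined by its operations."""

    def __init__(self, name, operations):
        self.name = name
        self.operations = {op.name: op for op in operations}

    def execute(self, op_name, *args):
        return self.operations[op_name].implementation(*args)


def search_impl(query, docs):
    # Toy implementation: case-insensitive substring match.
    return [d for d in docs if query.lower() in d.lower()]


searcher = ServiceManager(
    "search_manager",
    [Operation("search", "(query: str, docs: list) -> list", search_impl)],
)
```

Actors, by contrast, would carry no operations of their own in such a sketch; they only trigger events that execute a manager's operations.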
Motivation • Previous definitions emphasize syntactic aspects, i.e., how digital library concepts are composed or built from previously defined concepts. • Complete a formal DL theory by: • Making explicit the implicit relationships that exist among the DL formal concepts defined in [Gonc04] • Providing a set of axiomatic rules that precisely define and constrain the semantics of the relationships • Categorizing and classifying DL services on the basis of the ontology • Research questions • How should DL services be built from the other DL components? • Which are the fundamental and elementary DL services? • How can services be built/composed from other DL services? • We will explore semantic relations and rules of the DL domain by using ontologies.
Digital Library Formal Ontology • An ontology is a tuple (Ontol_Concepts, Ontol_Rels) where: • Ontol_Concepts is a family of ontological concepts, • Ontol_Rels is a family of relations. • Relations in Ontol_Rels are operationally realized by one or more rules (e.g., first-order logic axioms) which intensionally specify or constrain which elements of a concept can participate in a relation. • Ontol_Rules is a family of rules of a particular ontology.
Digital Library Formal Ontology • Relationships • Intra-Model • Video contains Audio (MM) • Metadata Catalog describes Collection (LIS) • Probabilistic Space is_a Measure Space • Service extends Service (reuse) • Service Manager inherits_from Service Manager (OO) • Inter-Model • Event executes Operation • Actor participates_in Scenario • Service Manager runs Service • Service employs/produces Streams, Structures, Spaces
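Digital Library Formal Ontology • Concepts: $\{Se, Sc, e\}$; Key: $Se$ = service; $Sc$ = scenario; $e$ = event. • Relations: • $contains \subseteq Sc \times e$ • Symbolic Rule. $\forall x, y\ (x\ contains\ y \Leftrightarrow Sc(x) \wedge e(y) \wedge \exists j: (j \in x.Dom \wedge y = x(j)))$ • $precedes \subseteq e \times e \times Sc$; $happens\_before \subseteq e \times e \times Sc$ • Symbolic Rule 1. $\forall x, y, z\ (x\ precedes_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j: (z\ contains\ x \wedge z\ contains\ y \wedge x = z(i) \wedge y = z(j) \wedge i + 1 = j))$ • Symbolic Rule 2. $\forall x, y, z\ (x\ happens\_before_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j: (z\ contains\ x \wedge z\ contains\ y \wedge x = z(i) \wedge y = z(j) \wedge i < j))$ • $includes \subseteq (Se \times Se) \cup (Sc \times Sc)$; $extends \subseteq (Se \times Se) \cup (Sc \times Sc)$ • Symbolic Rule 1. $\forall x, y\ (x\ includes\ y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z: e(z) \wedge y\ contains\ z \Rightarrow x\ contains\ z) \wedge (\forall p, q: e(p) \wedge e(q) \wedge p\ precedes_y\ q \Rightarrow p\ precedes_x\ q))$ • Symbolic Rule 2. $\forall x, y\ (x\ extends\ y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z: e(z) \wedge y\ contains\ z \Rightarrow x\ contains\ z) \wedge (\forall p, q: e(p) \wedge e(q) \wedge p\ happens\_before_y\ q \Rightarrow p\ happens\_before_x\ q))$ • Symbolic Rule 3. x, y (x extends y Se(x) Se(y) y x (x y p, q: Sc(p) Sc(q) p x q y p extends q))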
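The scenario-level relations on this slide can be made concrete by treating a scenario as a tuple of events. The sketch below follows one reading of the rules: contains means the event occurs at some position, precedes means immediately before, happens_before means anywhere before, includes preserves containment and adjacency, and extends preserves containment and order. The function names and example events are assumptions, and the service-level rule for extends is not covered.

```python
def contains(sc, e):
    """Event e occurs at some position of scenario sc."""
    return e in sc


def precedes(sc, x, y):
    """x occurs immediately before y in sc (positions i and i + 1)."""
    return any(sc[i] == x and sc[i + 1] == y for i in range(len(sc) - 1))


def happens_before(sc, x, y):
    """x occurs somewhere before y in sc (positions i < j)."""
    return any(sc[i] == x and sc[j] == y
               for i in range(len(sc)) for j in range(i + 1, len(sc)))


def includes(x, y):
    """x keeps every event of y and every immediate succession of y."""
    events_kept = all(contains(x, e) for e in y)
    adjacency_kept = all(precedes(x, p, q)
                         for p in y for q in y if precedes(y, p, q))
    return events_kept and adjacency_kept


def extends(x, y):
    """x keeps every event of y and the relative order of y's events."""
    events_kept = all(contains(x, e) for e in y)
    order_kept = all(happens_before(x, p, q)
                     for p in y for q in y if happens_before(y, p, q))
    return events_kept and order_kept


# Hypothetical scenarios.
search = ("enter_query", "run_query", "show_results")
logged_search = ("enter_query", "log_query", "run_query", "show_results")
browse_then_search = ("open_page", "enter_query", "run_query", "show_results")
```

Under this reading, `logged_search` extends `search` but does not include it, because inserting `log_query` breaks the immediate succession of `enter_query` and `run_query`; `browse_then_search` both includes and extends `search`.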
Digital Library Formal Ontology • Consistency Rules • Catalog-Collection • A complete catalog has at least one set of metadata specifications for each digital object in the collection it describes (surjective partial function). • In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function). • Scenarios-Society • A scenario $x$ is consistent with regard to a set of service managers $Y$ if each operation executed by each event in the scenario is defined in some service manager $y \in Y$.
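These consistency rules translate into simple checks. The sketch below assumes a collection is a dictionary keyed by handle, a catalog maps handles to lists of metadata specifications, a scenario is a list of operation names (one per event), and each service manager is represented by the set of operation names it defines. All names are illustrative.

```python
def is_complete(catalog, collection):
    """Every digital object has at least one metadata specification."""
    return all(len(catalog.get(h, [])) >= 1 for h in collection)


def is_consistent(catalog, collection):
    """Every catalog entry describes an object present in the collection."""
    return all(h in collection for h in catalog)


def scenario_is_consistent(scenario, service_managers):
    """Every operation executed in the scenario is defined by some manager."""
    defined = set().union(*service_managers.values())
    return all(op in defined for op in scenario)


# Hypothetical data.
collection = {"h1": "object 1", "h2": "object 2"}
partial_catalog = {"h1": [{"title": "A"}]}
oversized_catalog = {"h1": [{"title": "A"}], "h2": [{"title": "B"}],
                     "h3": [{"title": "C"}]}
managers = {"sm_search": {"search"}, "sm_browse": {"rank", "browse"}}
```

Here `partial_catalog` is consistent but not complete (no entry for `h2`), while `oversized_catalog` is complete but not consistent (it describes `h3`, which is not in the collection).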
Digital Library Formal Ontology • Characterizing employs/produces relationships • In the table each service is characterized by • parameters (input, output) • of the initial and final events • of the scenarios that compose those services • All other previous definitions and keys apply here. • That set is complemented with the following definitions:
Services Related Definitions • A query $q$ is the representation of a user interest or information need. • $Hyptxt$ is a hypertext in which an anchor is a node. • A $log\_entry$ is a descriptive metadata specification about an event of a scenario. • Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{Ct}: \{do_i\} \to 2^{Ct}$ is a function that maps a digital object to a set of categories. • A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
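The classifier and cluster definitions can be illustrated with a toy example in which digital objects are plain strings. The keyword-matching rule is an assumption chosen only to keep the example short; the definition itself does not prescribe how categories are assigned.

```python
def make_keyword_classifier(keywords_by_category):
    """Build a classifier: digital object -> set of category labels."""
    def classify(text):
        words = set(text.lower().split())
        # A category applies when the object shares a keyword with it.
        return frozenset(c for c, kws in keywords_by_category.items()
                         if words & kws)
    return classify


def is_cluster(cluster, objects):
    """A cluster is any subset of the set of digital objects."""
    return set(cluster) <= set(objects)


# Hypothetical categories and objects.
classify = make_keyword_classifier({
    "interoperability": {"oai", "z39.50"},
    "preservation": {"emulation", "migration"},
})
objects = {"Harvesting metadata with OAI", "Emulation and migration", "Poster"}
```

The result type is a set rather than a single label, which matches the codomain of the classifier being the power set of the category labels: an object may belong to several categories or to none.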
Applications: A Taxonomy of DL Services • Infrastructure Services: dealing with basic concepts such as collections and catalogs • Repository-Building: create collections (digital objects) and/or catalogs (metadata specifications). • Preservational: generate instances by copying collections (digital objects) or transforming (converting/translating) objects into different formats for preservation purposes • Add_Value: either aggregate value/information to collections (digital objects) or connect objects together. • Information Satisfaction: dealing with higher level societal requirements • KEY in next slide: • Fundamental: minimal set of services or essential to existence of a DL • Composite DL service: takes input from some other service; otherwise the service is called elementary.
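The distinction between composite and elementary services depends only on whether a service takes input from another service, so it can be computed from a dependency table. The table below is hypothetical and is not taken from the slides; it only shows the shape of the computation.

```python
def classify_services(inputs_from):
    """Label each service composite (takes input from others) or elementary."""
    return {name: ("composite" if sources else "elementary")
            for name, sources in inputs_from.items()}


# Hypothetical dependencies: service -> services it takes input from.
inputs_from = {
    "indexing": set(),
    "rating": set(),
    "searching": {"indexing"},
    "recommending": {"rating"},
}
kinds = classify_services(inputs_from)
```

Whether a service is also fundamental is a separate judgment about the minimal DL and is not derivable from the dependency table alone.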
5S model / 5S language (columns: Model; Formal definition; Objective within 5SL) • Streams. Formal definition: sequences of arbitrary types. Objective within 5SL: describe properties of the DL content such as encoding and language for textual material or particular forms of multimedia data. • Structures. Formal definition: labeled directed graphs. Objective within 5SL: specify organizational aspects of the DL (e.g., structural/descriptive metadata, hypertexts, taxonomies, classification schemes). • Spaces. Formal definition: sets of objects and operations on those objects that obey specific constraints. Objective within 5SL: define logical and presentational views of several components. • Scenarios. Formal definition: sequences of events that modify states of a computation in order to accomplish some functional requirement. Objective within 5SL: detail the behavior of the DL services. • Societies. Formal definition: sets of communities and relationships (relations) among them. Objective within 5SL: define managers, responsible for running DL services; actors, that use those services; and relationships.
5SL primitives and implementation (columns: Model; Primitives; 5SL implementation) • Streams Model. Primitives: text; video; audio; picture; software program. 5SL implementation: MIME types. • Structures Model. Primitives: collection; catalog; hypertext; document; metadata; organization tools. 5SL implementation: XML and RDF schemas; Topic maps ML (XTM). • Spaces Model. Primitives: user interface; index; retrieval model. 5SL implementation: MathML, UIML, XSL. • Scenarios Model. Primitives: service; event; condition; action. 5SL implementation: extended UML sequence diagrams; XML serialization. • Societies Model. Primitives: community; service managers; actors; relationships; attributes; operations. 5SL implementation: XML serialization.
Challenges with Approach • The designer should know the 5S theory very well and be very familiar with the syntax and semantics of 5SL to be able to write correct 5SL files. • It is difficult to get the big picture of a digital library just from a textual 5SL file.