Beyond Seamless Access: Meta-data in the Age of Content Integration. Spring 2000 Program Information Technology Interest Group of Association of College & Research Libraries, New England Chapter Univ. of Connecticut May 26, 2000 Amanda Xu Information Architect
Beyond Seamless Access: Meta-data in the Age of Content Integration
Spring 2000 Program, Information Technology Interest Group of Association of College & Research Libraries, New England Chapter
Univ. of Connecticut, May 26, 2000
Amanda Xu, Information Architect
EBSCO, 10 Estes Street, Ipswich, MA 01938
axu@epnet.com
OVERVIEW
• Definitions
  • Meta-data, schemas, and XML linking structures
• Why content integration and analysis?
  • Assumptions about information search and retrieval
• Meta-data applications for content integration and analysis
• How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web?
• Role of librarians and information mediators in the wave of content integration
Definitions (1) Meta-data, What is it? [1/6]
• Definitions:
1) “Data about data” or “information which describes a data set”
2) Data elements and attributes that facilitate the search and retrieval of a set of associated attributes
• Example 1:
  • An address label contains: name, address, city, state, zip
  • Meta-data about the address might include: home or office, access permissions, last updated, internal references
3) A set of semantics that describe the data, classify it, categorize it, and provide instructions on how and where to exploit it
• Example 2:
  • Standard bibliographic information, summaries, indexing terms, and abstracts
Definitions (1) Meta-data, What is it? [2/6]
• Example 3: Simple XML Record
<record>
  <title>The Tao of Pooh</title>
  <author label="personal">Benjamin Hoff</author>
  <date label="1st-published">1982</date>
  <isbn>01400-67477</isbn>
  <publisher>Dutton</publisher>
  <subject label="personal">Winnie the Pooh</subject>
  <subject>Taoism in literature</subject>
  <classification scheme="LCC">PR6025.I65Z68 1983</classification>
</record>
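To show how a program might read such a record, here is a minimal sketch in Python using the standard library's ElementTree parser. The function names `record_fields` and `subjects` are illustrative assumptions, not part of the original slides; the record is the one from Example 3, typed with plain ASCII quotes.

```python
import xml.etree.ElementTree as ET

# The record from Example 3, typed with plain ASCII quotes.
RECORD = """<record>
  <title>The Tao of Pooh</title>
  <author label="personal">Benjamin Hoff</author>
  <date label="1st-published">1982</date>
  <isbn>01400-67477</isbn>
  <publisher>Dutton</publisher>
  <subject label="personal">Winnie the Pooh</subject>
  <subject>Taoism in literature</subject>
  <classification scheme="LCC">PR6025.I65Z68 1983</classification>
</record>"""


def record_fields(xml_text):
    """Return (element name, attributes, text) for each child of <record>."""
    root = ET.fromstring(xml_text)
    return [(child.tag, dict(child.attrib), (child.text or "").strip())
            for child in root]


def subjects(xml_text):
    """Collect every <subject> value, the kind of field a search index uses."""
    root = ET.fromstring(xml_text)
    return [(s.text or "").strip() for s in root.findall("subject")]


fields = record_fields(RECORD)
for name, attrs, text in fields:
    print(name, attrs, text)
```

The element names act as the meta-data labels, and the attributes (`label`, `scheme`) qualify them, which is what lets a program treat the two `<subject>` values differently from the `<title>`.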
Definitions (1) Meta-data, What is it? [3/6]
4) Supports understanding of a document: its structure, relationships, locations, and usage
5) Helps you find things or make things disappear
Where is meta-data?
1) Internally:
• Embedded with markup and with content
• Attached as a resource header (HTML META tag) or package
2) Externally:
• Stored separately from its resource
• Generated on demand, e.g. MS SQL Server or Oracle
• Static, e.g. a bibliographic record
• Dynamically linked using XLink/XPointer/XPath and ISO HyTime
Definitions (1) Meta-data, What is it? [4/6]
Naming Issues:
• Can your meta-data be interchanged and shared with others via computer programs or parsers?
• URI = URN + URL + URC (IETF)
• Namespaces (W3C): qualify elements uniquely and avoid name collisions
• URIs specify the namespaces in use
• XML Namespaces provide a way to make a name unique, but they do not resolve vocabulary ambiguity
Definitions (1) Meta-data, What is it? [5/6]
• Example 4:
• <date> used with three different meanings:
  • From George’s document: <date>9-Sept-1999</date>
  • From Martha’s document: <date>The lovely Deni</date>
  • From Hadley’s document: <date>Large Plump Medjool</date>
• Use namespaces:
  • <george:date>9-Sept-1999</george:date>
  • <martha:date>The lovely Deni</martha:date>
  • <hadley:date>Large Plump Medjool</hadley:date>
• Note: Example from Brian Travis’s <Essential_XML> seminar on 11/02/99, Boston
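The effect of the prefixes can be seen by parsing the three elements together. The sketch below is an illustration only: the slide gives prefixes but no namespace URIs, so the `http://example.org/...` URIs and the `<dates>` wrapper element are assumptions, as are the helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and wrapper element; the slide shows only prefixes.
DOC = """<dates xmlns:george="http://example.org/george"
       xmlns:martha="http://example.org/martha"
       xmlns:hadley="http://example.org/hadley">
  <george:date>9-Sept-1999</george:date>
  <martha:date>The lovely Deni</martha:date>
  <hadley:date>Large Plump Medjool</hadley:date>
</dates>"""


def split_tag(tag):
    """Split ElementTree's "{uri}local" tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


def dates_by_namespace(xml_text):
    """Map each namespace URI to the text of its <date> element."""
    root = ET.fromstring(xml_text)
    found = {}
    for child in root:
        uri, local = split_tag(child.tag)
        if local == "date":
            found[uri] = (child.text or "").strip()
    return found


for uri, value in dates_by_namespace(DOC).items():
    print(uri, "->", value)
```

The parser keeps the three `date` elements apart because each expands to a different qualified name. It still cannot tell which one is a calendar date; that is the vocabulary ambiguity that namespaces alone leave unresolved.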
Definitions (1) Meta-data, What is it? [6/6]
Example 5: Simple Dublin Core Record with DC namespace and qualifiers
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<record xmlns:dc="http://purl.org/dc/elements/1.0/"
        xmlns:dcq="http://purl.org/dc/elements/qualifiers/1.0/">
  <dc:title>The Tao of Pooh</dc:title>
  <dc:creator>Benjamin Hoff</dc:creator>
  <dcq:creatorType>Illustrator</dcq:creatorType>
  <dc:date>1982</dc:date>
  <dc:isbn>01400-67477</dc:isbn>
  <dc:publisher>Dutton</dc:publisher>
  <dc:subject>Winnie the Pooh</dc:subject>
  <dc:subject>Taoism in literature</dc:subject>
</record>
Definitions (2) Schemas, What are they? [1/3]
• How do you know which meta-data vocabularies you are interchanging?
• Schemas (DTDs):
  • describe document elements and structures
  • support validation and parsing
  • schemas support data types (e.g. integer, time, time period), open content models, inheritance, constraints, and namespaces
• Example:
<xsd:schema xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <xsd:element name="state" type="xsd:string"/>
  <xsd:element name="zip" type="xsd:decimal"/>
  <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/>
</xsd:schema>
Note: Example from Brian Travis’s tutorial, “XML and Data-Driven Web Architectures”, Seybold Seminars, Boston, Feb. 11, 2000.
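What data typing buys you can be sketched without a schema processor. The code below is not an XML Schema validator (Python's standard library does not include one); it is a hand-written check, under stated assumptions, that mirrors the three declarations in the example: `state` as a string, `zip` as a decimal, and `country` fixed to "US". The `<address>` wrapper and all function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Lexical form of a decimal: optional sign, digits, optional fraction.
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def is_decimal(text):
    return DECIMAL.fullmatch(text.strip()) is not None


# Element name -> checker, mirroring the two element declarations.
ELEMENT_TYPES = {
    "state": lambda text: True,  # a string type accepts any text
    "zip": is_decimal,
}
# Attribute name -> the only value allowed (use="fixed" value="US").
FIXED_ATTRIBUTES = {"country": "US"}


def validate(xml_text):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        check = ELEMENT_TYPES.get(elem.tag)
        if check is not None and not check(elem.text or ""):
            problems.append(f"<{elem.tag}> has invalid value {elem.text!r}")
        for name, fixed in FIXED_ATTRIBUTES.items():
            if name in elem.attrib and elem.attrib[name] != fixed:
                problems.append(
                    f"{name} must be {fixed!r}, got {elem.attrib[name]!r}")
    return problems


good = '<address country="US"><state>MA</state><zip>01938</zip></address>'
bad = '<address country="CA"><state>MA</state><zip>abc</zip></address>'
print(validate(good))
print(validate(bad))
```

A DTD could say only that `zip` contains character data; the typed declaration lets the receiving system reject `abc` before it reaches the database.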
Definitions (2) Schemas, What are they? [2/3]
How many types of XML vocabularies are there?
Examples:
1) XML Schema
<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema"
           targetNamespace="http://purl.org/metadata/dublin_core"
           version="M.n">...
</xs:schema>
2) RDF
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#"
         xmlns:rdfs="http://www.w3.org/TR/WD-rdf-schema#"
         xmlns:dc=" ">
Definitions (2) Schemas, What are they? [3/3]
3) Schema repositories: industry-specific
• SOAP, BizCodes, XML-RPC, ICE, CDF, WebDAV, XML/ASN.1, XML/EDI, XER, and Z39.50
• BizTalk.org: routing information
<bizTalk>
  <Route>
    <From locationID="206.247.76.187" locationType="IP" handle="72" process="POConf" Path=""/>
    <To locationID="83-627-54204" locationType="DUNS" handle="14" process="PO_Process" Path=""/>
  </Route>
  <body>
    <purchaseOrder xmlns="urn:schemas-toycat-com:PurchaseOrder.biz" PONumber="10-01-2118"></purchaseOrder>
  </body>
</bizTalk>
Note: Example from Brian Travis’s <Essential_XML> seminar on 11/02/99, Boston
Simple Meta-data Interchange Model
[Figure: System A, System B, and System C, each with its own DB, interchange meta-data through a repository that holds schemas and template mappings (Sys A to Sys B, Sys B to Sys C, Sys C to Sys A). Interchange requires agreement on protocol, syntax, and encoding. The example traces an ILL request in XML/EDIFACT through an XML/ASN.1 server, with transport conversions (Direct Transfer to SMTP, SMTP to Direct Transfer) and encoding conversions (ASN.1/BER to XML/BER, XML/BER to XML/EDIFACT, XML/EDIFACT to ASN.1/BER).]
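The template-mapping idea in this model can be sketched as a chain of field crosswalks. Everything below is an assumption made for illustration: the field names, the two mapping tables, and the helper functions are hypothetical, and real systems would also have to convert protocol and encoding, which this sketch ignores.

```python
# Hypothetical crosswalks held in the repository: System A uses plain field
# names, System B uses Dublin Core style names, System C uses MARC-like tags.
A_TO_B = {"title": "dc:title", "author": "dc:creator", "date": "dc:date"}
B_TO_C = {"dc:title": "245a", "dc:creator": "100a", "dc:date": "260c"}


def apply_map(record, mapping):
    """Rename fields according to mapping; report fields with no mapping."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        if field in mapping:
            mapped[mapping[field]] = value
        else:
            unmapped.append(field)
    return mapped, unmapped


def invert(mapping):
    """Build the reverse crosswalk (assumes the mapping is one-to-one)."""
    return {target: source for source, target in mapping.items()}


record_a = {"title": "The Tao of Pooh", "author": "Benjamin Hoff",
            "date": "1982", "shelf": "3B"}
record_b, dropped = apply_map(record_a, A_TO_B)
record_c, _ = apply_map(record_b, B_TO_C)

# Going back from C to A through the inverted maps.
back_b, _ = apply_map(record_c, invert(B_TO_C))
back_a, _ = apply_map(back_b, invert(A_TO_B))

print(record_c)
print("not interchanged:", dropped)
```

Keeping the maps in a shared repository, rather than inside each system, means a new system needs one mapping to a known vocabulary instead of one per partner. Fields with no mapping, like `shelf` here, are reported rather than silently lost.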
Definitions (3) Linking Structures [1/6]
Leveraging XML syntax: link structures link an XML name tag to an external standard reference item, and allow context and non-context queries at the element and attribute level
[Figure labels: Root URI; remote schema (DTD); my element; attlist; my thing; address; XLink root URI; XPointer address]
• Notes:
• XLink specification <http://www.w3.org/TR/xlink>
• XPointer specification <http://www.w3.org/TR/xptr>
Definitions (3) Linking Structures [2/6]
The API to retrieve link information from the linkbase
[Figure: the application sends a linkInfo request to the linkbase]
• Leveraging the application:
• The link structures in which linkInfo participates are returned to the application, which can re-assemble them for different purposes on the fly
Definitions (3) Linking Structures [3/6]
• Leveraging resources and merging links
[Figure: the original doc and the linkbase pass through an API that merges the links and produces a composite doc]
Link structures in which links are merged into the original doc to form a composite document.
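A linkbase and its merge step might look like the sketch below. This is not XLink syntax and not any vendor's API; the link records, the `id` attributes, and the functions `link_info` and `merge_links` are hypothetical names chosen to mirror the "request linkInfo" and "merge the links" steps on the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical linkbase: links are stored apart from the documents they serve.
LINKBASE = [
    {"from": "ref1", "to": "http://example.org/fulltext/123", "role": "full-text"},
    {"from": "ref1", "to": "http://example.org/holdings/123", "role": "holdings"},
    {"from": "ref2", "to": "http://example.org/fulltext/456", "role": "full-text"},
]


def link_info(linkbase, element_id):
    """Return every link that starts at the element with this id."""
    return [link for link in linkbase if link["from"] == element_id]


def merge_links(doc_xml, linkbase):
    """Build a composite document: the original plus its links, inlined."""
    root = ET.fromstring(doc_xml)
    # list() freezes the traversal so newly added <link> nodes are not visited.
    for elem in list(root.iter()):
        for link in link_info(linkbase, elem.get("id")):
            ET.SubElement(elem, "link",
                          {"href": link["to"], "role": link["role"]})
    return root


ORIGINAL = ('<article><citation id="ref1">First cited work</citation>'
            '<citation id="ref3">Another cited work</citation></article>')
composite = merge_links(ORIGINAL, LINKBASE)
print(ET.tostring(composite, encoding="unicode"))
```

Because the links live outside the document, the same original can be merged with a different linkbase for a different audience (for example, one library's holdings links rather than another's) without editing the source.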
Definitions (3) Linking Structures [4/6]
Topic Map:
• “To qualify the content and/or data contained in information objects as topics to enable navigational tools such as indexes, cross-references, citation systems, or glossaries.
• To link topics together in such a way as to enable navigation between them
• To filter an information set to create views adapted to specific users or purposes. For example, such filtering can aid in the management of multilingual documents, management of access modes depending on security criteria, delivery of partial views depending on user profiles and/or knowledge domains, etc.
• To structure unstructured information objects, or to facilitate the creation of topic-oriented user interfaces that provide the effect of merging unstructured information bases with structured ones.”
Note: Quote from Topic Map web site: <http://www.ornl.gov/sgml/sc34/document/0058.htm>
Definitions (3) Linking Structures [5/6]
Leverage Topic Maps
[Figure: a query is matched against a topic map and returns a result set with a category map, which the user can search or navigate. Topic maps filter the DB by knowledge domains, languages, access rights, delivery views/devices, and profiles. Topic maps also attach categories: (1) link and cluster for structured docs, (2) adaptive categories for unstructured docs.]
Definitions (3) Linking Structures [6/6]
Topic association: Example
<topic id="n001" types="city">
  <topicname>
    <basename>New York City</basename>
  </topicname>
  <mention>adr1 adr2 adr3</mention>
</topic>
<topic id="c98991" types="monument">
  <topicname>
    <basename>Brooklyn Bridge</basename>
  </topicname>
  <mention>adr34 adr3462 adr9832</mention>
</topic>
<assoc type="sightseeing" scope="civil-engineering">
  <when-in>n001</when-in>
  <visit>c98991</visit>
</assoc>
<topic id="city" types="topictypes"/>
<topic id="monument" types="topictypes"/>
<topic id="civil-engineering"/>
<topic id="topictypes"/>
Note: Example from Steve R. Newcomb’s tutorial, “Metadata, Schemas, and Linking Structures”, XML World conference, Ottawa, Sept. 13, 1999, updated 5/30/2000.
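To illustrate how an application could navigate such a map, the sketch below resolves the association's topic ids to their base names. It follows the element names of the example rather than any published topic map DTD; the `<topicmap>` root element and the function names are assumptions, and the markup is written with balanced tags so that an XML parser accepts it.

```python
import xml.etree.ElementTree as ET

# The example's topics and association inside a hypothetical <topicmap> root.
TOPIC_MAP = """<topicmap>
  <topic id="n001" types="city">
    <topicname><basename>New York City</basename></topicname>
    <mention>adr1 adr2 adr3</mention>
  </topic>
  <topic id="c98991" types="monument">
    <topicname><basename>Brooklyn Bridge</basename></topicname>
    <mention>adr34 adr3462 adr9832</mention>
  </topic>
  <assoc type="sightseeing" scope="civil-engineering">
    <when-in>n001</when-in>
    <visit>c98991</visit>
  </assoc>
</topicmap>"""


def topic_names(root):
    """Map each topic id to its base name."""
    names = {}
    for topic in root.findall("topic"):
        base = topic.find("topicname/basename")
        if base is not None and base.text:
            names[topic.get("id")] = base.text.strip()
    return names


def associations(root):
    """Return (type, scope, {role: topic name}) for every association."""
    names = topic_names(root)
    result = []
    for assoc in root.findall("assoc"):
        roles = {}
        for role in assoc:
            topic_id = (role.text or "").strip()
            roles[role.tag] = names.get(topic_id, topic_id)
        result.append((assoc.get("type"), assoc.get("scope"), roles))
    return result


def mentions(root, topic_id):
    """List the addresses at which a topic is mentioned."""
    topic = root.find(f"topic[@id='{topic_id}']")
    return (topic.findtext("mention") or "").split()


root = ET.fromstring(TOPIC_MAP)
print(associations(root))
print(mentions(root, "c98991"))
```

Read this way, the association says: when in New York City, visit the Brooklyn Bridge, within the scope of civil engineering. The `mention` addresses are what tie each topic back to places in the underlying documents.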
Why content integration and analysis?
• Assumptions about information search and retrieval
Information retrieval is only the first step in information management. The next step is information analysis and decision support. Information analysis cross-correlates information from multiple, diverse data sources on the net to solve specific problems; decision support detects, analyzes, and raises alerts on topics, trends, and events based on the correlated information.
Notes:
Schatz, Bruce R. 1998. “Information Analysis in the Net: The Interspace of the Twenty-First Century.” In Visualizing Subject Access for 21st Century Information Resources, edited by Pauline Cochrane and Eric E. Johnson. Univ. of Illinois at Urbana-Champaign.
Evans, David A. 1999. “Beyond Information Retrieval Workshop.” 4th Search Engine Conference, April 9, 1999, Boston, MA.
Meta-data applications for content integration and analysis (1 of 3)
• What does this have to do with products for the library world?
• Today:
  • Full-text linking
  • ILL/DocDelivery
  • ILS linking for holdings
  • Publishers’ and authors’ Web sites
  • Linking services
    • Reference linking services provided by CrossRef, SFX, LANL
  • Patent data
• Tomorrow:
  • Users can link directly to any content published by a specific organization simply by highlighting a phrase, sentence, paragraph, or document appearing in any browser, word-processing package, email program, or other application
Meta-data applications for content integration and analysis (2 of 3)
• Interwoven threads for subjects, journal titles, authors, collections
• No document boundary, but an information space in which a deeper understanding of knowledge within and across domains is facilitated for specific problem solving and decision support
[Figure: link bases connect the sources below to article collections, book collections, journal collections, and other media]
• Authors: Who’s Who; Wilson Bibliography; Gale Contemporary Authors; Authority files from LC; Community of Science
• Subjects: UMLS; WordNet; LCSH; Lexicons; Dictionaries
• Journal Titles: Ulrich’s Serials Directory; LC Serials; Gale Directory
Meta-data Applications for Content Integration and Analysis (3 of 3)
• Future: decision support and problem solving
[Figure labels: Websites (reviews, annotations, publisher sites, author pages, email, mailing lists, chat rooms, community pages); meta-data standardization; bi-directional linking; authority control; book directory; collection directory; journal directory; author directory; collections; library holdings; ILL/document delivery; reference linking; site map; knowledge base]
How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? (1 of 2)
• XML is nothing but data interchange. It is the application that makes the data reusable, and thus adds functionality and intelligence to it:
• In the beginning --> Editing
• Generation X --> Look and feel
• Intelligence (SGML/XML)
  --> Semantics:
    • Levels of fragmentation
    • Schema recognition
    • Namespace handling
    • Linking registration and management
  --> Viewing/Personalized delivery
  --> Interactive services, e.g. B2B
  --> Software applications, e.g. re-purposing, concurrent editing
How can XML technologies support the interchange, analysis, and personalized delivery of full-text on the Web? (2 of 2)
XML enables text mining, which has become increasingly fine-grained, subjective, and personal via:
• extracting information
• counting by type (quantifying)
• categorizing/filtering
• discovering trends
• capturing critical details
• assessing trends
Note: Evans, David A. 2000. “Text Mining Workshop.” Fifth Search Engines Conference, Boston, MA.
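Two of these operations, counting by type and filtering, become simple once the text carries markup. The sketch below is only an illustration of that point, not a text-mining method from the cited workshop: the record set is invented (the second and third records are hypothetical), and the function names are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A small invented record set; only the first record comes from the slides.
RECORDS = """<records>
  <record><title>The Tao of Pooh</title><date>1982</date>
    <subject>Winnie the Pooh</subject>
    <subject>Taoism in literature</subject></record>
  <record><title>Hypothetical title B</title><date>1992</date>
    <subject>Taoism in literature</subject></record>
  <record><title>Hypothetical title C</title><date>1999</date>
    <subject>Metadata</subject></record>
</records>"""


def subject_counts(root):
    """Counting by type: how often each subject term occurs."""
    return Counter((s.text or "").strip() for s in root.iter("subject"))


def titles_since(root, year):
    """Filtering: titles of records dated in or after the given year."""
    return [r.findtext("title") for r in root.findall("record")
            if int(r.findtext("date")) >= year]


root = ET.fromstring(RECORDS)
print(subject_counts(root).most_common())
print(titles_since(root, 1990))
```

Without the `<subject>` and `<date>` tags, the same questions would require guessing which words in the full text are subjects or dates; the markup is what makes the counts reliable.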
Role of librarians and information mediators in the wave of content integration
• Every aspect of librarianship is needed
• It is a matter of which parts you would like to participate in