Adding Value to Open Scholarly Content How Services and Search Expose The Value of the Perseus Digital Library
What is Perseus? • Mission • To increase accessibility to and interest in the humanities. • What makes our content interesting? • We give this content away yet still maintain a user base that finds value in its offerings.
Perseus’ Static & Dynamic Services • Perseus gives away its static content. • Perseus also makes its content dynamically accessible. • Allows for interconnections among Perseus’ objects. • This allows us to build up a network of associations between primary and secondary sources of information. • Named Entity Extraction • Morphological Analysis • The more content we have, the more associations between objects we can offer.
Text Services • Increasing the value of Perseus’ texts • The concepts behind the Canonical Text Services protocol (CTS) • CTS will allow us to interconnect our objects. • Intra-connecting: Making associations within our own content • Inter-connecting: Making associations between our content and external services/content. • The role of search • In a time when “scholarly content is increasingly being seen as a public resource,” what is the role of search engines in conceiving and delivering texts?
Goals • By the end of the talk we will see: • A Service for Referencing Text • CTS • The Value of Associations • CTS URNs, a syntax for intra- and inter-connecting texts • Perseus’ other sources of value • Perseus’ logical architecture
Concepts Behind CTS: Author to Edition • Hierarchical Ontology of Text Organization • An author’s works • Get me all works by Julius Caesar • urn:cts:latinLit:stoa0069 • A particular work of an author • Get me Caesar’s The Gallic War • urn:cts:latinLit:stoa0069:stoa002 • An edition or translation of a work • Get me a specific English translation of Caesar’s The Gallic War • urn:cts:latinLit:stoa0069:stoa002:1999_02_0001
Concepts Behind CTS: Edition to Character • A logical component of text from an edition or translation in terms of its citation scheme • Get me Book 1, Chapter 1 of Caesar’s Gallic War from this English translation • urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.1 • A paragraph, quotation, or single character within a text • “All Gaul is divided into three parts” • urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.25:All:0-parts:0 • A range of text • Give me Book 1, Chapter 37 through Book 2, Chapter 5 of Caesar’s Gallic War • urn:cts:latinLit:stoa0069:stoa002:1.37-2.5
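The hierarchy on the last two slides can be illustrated with a small parser. This is a minimal sketch, not the CTS reference implementation: it assumes the colon-delimited layout shown in the example URNs, and it uses a heuristic of its own (a field made only of dotted numbers, optionally a range, is a passage reference) to tell an edition from a citation. The names `CtsUrn`, `parse_cts_urn`, and `PASSAGE` are hypothetical, and substring references such as `1.25:All:0-parts:0` are deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption of this sketch: a passage reference is dotted numbers,
# optionally a range, e.g. "1.1" or "1.37-2.5".
PASSAGE = re.compile(r"^\d+(\.\d+)*(-\d+(\.\d+)*)?$")


@dataclass(frozen=True)
class CtsUrn:
    namespace: str                  # e.g. "latinLit"
    textgroup: str                  # the author, e.g. "stoa0069"
    work: Optional[str] = None      # e.g. "stoa002"
    edition: Optional[str] = None   # e.g. "1999_02_0001"
    passage: Optional[str] = None   # e.g. "1.1" or "1.37-2.5"


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a colon-delimited CTS URN into its hierarchical parts."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, textgroup, rest = parts[2], parts[3], parts[4:]

    work = edition = passage = None
    if rest:
        work = rest.pop(0)
    # The field after the work is an edition unless it looks like a citation.
    if rest and not PASSAGE.match(rest[0]):
        edition = rest.pop(0)
    if rest:
        if len(rest) > 1 or not PASSAGE.match(rest[0]):
            raise ValueError("substring references are not handled in this sketch")
        passage = rest[0]
    return CtsUrn(namespace, textgroup, work, edition, passage)


if __name__ == "__main__":
    for example in (
        "urn:cts:latinLit:stoa0069",
        "urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.1",
        "urn:cts:latinLit:stoa0069:stoa002:1.37-2.5",
    ):
        print(parse_cts_urn(example))
```

Each level of the URN fills one more field of the record, so a shorter URN simply leaves the deeper fields empty. That is what makes the same syntax usable for "all works by an author" and for "this chapter of this translation".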
CTS And Content Delivery • CTS URNs can be thought of as a syntax for a “new and emerging content delivery mechanism” • Through URNs we can “break down the content into component parts, each of which can be manipulated…separately” • CTS adds value to the raw data/content we give away. • Logical referencing • Enables associations
Intra-Connecting Content: The Role of Index Services • Associations between data add value. • Google PageRank • Index services let us construct associations with semantic precision. • Named entity disambiguation • Citations • Morphological Information • Associations add context and increase understanding of the underlying content. • Occurrence of Gaul in a text to its definition • Occurrence of Gaul on this slide to previous examples.
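One way to picture an index service is as a two-way lookup between passages and the things they mention. The sketch below is only an illustration under stated assumptions: the classes `Association` and `AssociationIndex` are hypothetical, and the target identifier `entity:place:gaul` is invented for the example rather than taken from Perseus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    kind: str     # e.g. "named-entity", "citation", "morphology"
    surface: str  # the word as it appears in the text
    target: str   # identifier of what it points to (hypothetical scheme)


class AssociationIndex:
    """Maps passages to associations, and targets back to passages."""

    def __init__(self) -> None:
        self._by_passage = defaultdict(list)
        self._by_target = defaultdict(list)

    def add(self, passage_urn: str, assoc: Association) -> None:
        self._by_passage[passage_urn].append(assoc)
        self._by_target[assoc.target].append(passage_urn)

    def for_passage(self, passage_urn: str) -> list:
        """Everything known about one passage."""
        return list(self._by_passage.get(passage_urn, []))

    def occurrences(self, target: str) -> list:
        """Every passage that mentions the given target."""
        return list(self._by_target.get(target, []))


if __name__ == "__main__":
    index = AssociationIndex()
    gaul = Association("named-entity", "Gaul", "entity:place:gaul")  # invented id
    index.add("urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.1", gaul)
    index.add("urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.25", gaul)
    print(index.occurrences("entity:place:gaul"))
```

Looking up a passage gives the context for a word (Gaul to its definition); looking up a target gives the other places the same entity occurs (Gaul here to the earlier examples).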
Inter-Connecting Content: The Role of Search Engines • Perseus can increase the value of its content even further by connecting its highly-structured data with external services (like search engines) providing less-structured data • We’ve seen this idea before… • Google Earth: Search and display results • Longitude and latitude (Geographic coordinates) • CTS-aware searching: Search and display results • CTS URNs (textual coordinates)
CTS URNs and Search • What Perseus is doing now (experimental): • Using Google Base and CTS-URNs to find Perseus’ highly-structured content with semantic precision. • Search texts at any tier of the hierarchical structure • by expanding or truncating the URN. • Examples: • Get me all works by Julius Caesar visible to this search. • Get me Caesar’s The Gallic War • Get me a Perseus-edition English translation of Caesar’s Gallic War • Get me Book 1, Chapter 1 of Caesar’s The Gallic War from the English Translation
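Searching "at any tier" amounts to treating a truncated URN as a prefix. The following is a minimal sketch of that idea only; it does not model Google Base or any real search API. It assumes colon-delimited URNs that include an edition field, and the helper names (`truncate`, `is_within`, `search`) and the `hypo0001` catalog entry are hypothetical.

```python
# Number of colon-delimited fields to keep for each tier
# (assumes the URN carries an edition field before any passage).
TIER_FIELDS = {"author": 4, "work": 5, "edition": 6}


def truncate(urn: str, tier: str) -> str:
    """Cut a URN back to the given tier of the hierarchy."""
    fields = urn.split(":")
    keep = TIER_FIELDS[tier]
    if len(fields) < keep:
        raise ValueError(f"{urn!r} does not reach the {tier} tier")
    return ":".join(fields[:keep])


def is_within(query: str, candidate: str) -> bool:
    """True if candidate is the query itself or lies beneath it."""
    return candidate == query or candidate.startswith(query + ":")


def search(query: str, catalog: list) -> list:
    """Return every catalog URN at or below the query URN."""
    return [urn for urn in catalog if is_within(query, urn)]


if __name__ == "__main__":
    catalog = [
        "urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.1",
        "urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.25",
        "urn:cts:latinLit:hypo0001:hypo001:ed1:1.1",  # invented entry
    ]
    caesar = truncate(catalog[0], "author")
    print(caesar)
    print(search(caesar, catalog))
```

Matching on `query + ":"` rather than on the bare prefix keeps `stoa0069` from also matching a different author whose identifier merely starts with the same characters.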
CTS as a Value-Added Service • We have a standard mechanism for referencing and retrieving texts • We have a mechanism for tracking our audience. • A syntax for aggregation of content (Shore) • A well-defined API implementing an open standard (Shore) • Handles multi-lingual content • Provides a syntax for datasets of aligned texts. • A notation for semantically precise associations.
Perseus’ Logical Architecture: Identifying Sources of Value • DATA LAYER: TEI-XML texts, databases, raw data. Perseus gives away this raw data under a Creative Commons license. • Perseus as a data source to the community • Perseus understands how to create this data and can help others to do so as well • DOMAIN LAYER: The objects that encapsulate the data and add a set of behaviors. • The knowledge and experience gained while creating this layer, and coming to understand the objects of the domain. • Working in the domain of Classical texts provides Perseus with a unique perspective on the nature of text that others may find useful.
Perseus’ Logical Architecture: Identifying Sources of Value • SERVICE LAYER: The service layer provides an API implementing a series of protocols for each of the types of data Perseus serves. • Others are free to repurpose Perseus’ content through an API that encodes domain knowledge. • The community using the API becomes a source of information and value • DISPLAY LAYER: The user interface. Think widgets, HTML web pages, PDFs, etc. • Convenience & Ease of Use • Expertise: The UI reflects the knowledge about the content gained when building the other layers.
Closing Points • The idea: • Perseus can give away its static data because it adds value through providing semantically rich associations, adding context to the content. • An Example Service: • The Canonical Text Services protocol offers a new way to conceive of, reference, and deliver texts • Associations Add Value: • Perseus’ value stems from these associations: the value is not inherent in the raw data but comes from creating relationships among the data. • Search engines give Perseus the opportunity to create semantically precise associations from less-structured, external content to highly-structured Perseus content. This is accomplished through augmenting search queries with the ‘textual coordinates’ of the CTS URN. • Perseus Offers More Than Services: • In giving away our raw data, we hope to encourage others to create their own associations, increasing our value as a data provider and as service developers. • For the majority of users, however, our value stems from providing highly structured texts with rich associations in a simple user interface.
Resources • People • Blossom, John. “Shoreviews. Content Industry Outlook 2007: Reality Checks.” Shore Communications Inc. 8 Feb. 2007. • Crane, Gregory. Conversations and being in his general vicinity. 2004-Present. • Interconnecting primary and secondary sources • The Perseus Digital Library • Smith, Neel. Conversations and being in his general vicinity. 2000-Present. • “An Architecture for a distributed library incorporating open-source critical editions.” OSCE position paper. • Weaver, Michael. Conversations and being in his general vicinity. 1982-Present. • Slide layout based on his HTML design • Logical layers of an application as a model for business processes (www.dynamicinsight.com) • Relevant Links • CTS: http://digitalclassicist.xwiki.com/xwiki/bin/view/osce/paperReg • Google Base: http://base.google.com • Perseus: http://www.perseus.tufts.edu/