1 / 72

Interoperability

Interoperability. Kevin Hegg and Andreas Knab 2008 ARLIS/NA-VRA Summer Educational Institute July 11, 2008 . The Challenge.

max
Download Presentation

Interoperability

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Interoperability Kevin Hegg and Andreas Knab 2008 ARLIS/NA-VRA Summer Educational Institute July 11, 2008

  2. The Challenge • How do we connect disparate systems and applications so that users can discover, access and exchange digital content and cataloging data from a coherent interface using their preferred set of desktop tools?

  3. iMovie 8

  4. Photoshop Express

  5. Broad Categories: Systems & Tools (1 of 2) • Digital Assess Management (DAM) with Discovery/Access/Presentation (DAP)Examples: Almagest, ARTstor, CONTENTdm, Luna Insight, MDID • Institutional RepositoriesExamples: Digitools, DSpace, Fedora, VITAL • Online image collections and digital libraries — freely accessible contentExamples: American Memory (Library of Congress), Bristol Biomedical Image Archive, Earth Science World Image Bank, Metropolitan Museum of Art, Museum of Modern Art (MOMA), NASA Image eXchange, New York Public Library Digital Gallery • Online image collections and digital libraries — subscriptionExamples: AccuNet/AP Multimedia Archive, Art Museum Image Gallery (Wilson), ARTstor, CAMIO (OCLC), images MD (Current Medicine, Inc.) • Content aggregators/gatewaysExamples: HarvestRoad Hives, IMLS-DCC, MERLOT, OAIster

  6. Broad Categories: Systems & Tools (2 of 2) • Media-sharing communitiesExamples: Flickr, Picasa, Wikimedia Commons, YouTube • Course Management SystemsExamples: ANGEL, Blackboard, Moodle, Sakai, WebCT • Federated search enginesExamples: Central Search (Serials Solution), LibraryFind (Open Source), MetaLib, Muse, WebFeat • Internet search enginesExamples: Google Image Search, Live Search (Microsoft) • Stand-alone applications and browser-based toolsExamples: Amaznode, ARTstor Offline Image Viewer (OIV), Collex, Cross Media Annotation System, Digital Library eXtension Service (DLXS), Image Innovations Image Manager, iTunes University, PowerPoint (Microsoft), Pachyderm, Scholar's Box, VireoCat, VUE • Social networkingExamples: Facebook, MySpace

  7. Data Exchange Mechanisms

  8. Data Exchange Mechanisms • Protocols, standards, specifications, interfaces, guidelines, etc. used to facilitate the orderly discovery and exchange of data • Three methods for exchanging data: • Linking/redirecting: Digital content is stored on remote system. User is directed to remote system to access content (e.g. federated searches) • On request with optional caching: Digital content is downloaded from remote system as needed and presented to user (e.g. MDID remote collections) • Harvesting (bulk import): Entire collection of digital content is copied from remote system in advance and served to user locally (e.g. Allan Kohl’s AICT collection)

  9. Z39.50 • “ISO 23950: Information Retrieval: Application Service Definition and Protocol Specification” • Maintained by Library of Congress • Defines procedures and formats that a client may use to search a remote database, to learn about the results of the search, and to manipulate and retrieve search results • Complicated and difficult to implement • Used commonly by libraries to facilitate federated searches

  10. SRU/SRW • “Search and Retrieval via URL/Web Service” • Maintained by Library of Congress • Companion protocols used to formulate and execute Internet search queries and to retrieve the query results as a record set • Query results are formatted in MARCXML or Dublin Core • Relatively simple and easy to implement • ARTstor’s XML Gateway is built on SRU/SRW

  11. OAI-PMH • “Open Archives Initiative (OAI) Protocol for Metadata Harvesting” • Developed and maintained by The Open Archives Initiative • Allows data providers (repositories) to expose metadata to client applications (harvesters) and facilitates the aggregation of metadata from more than one repository • Fairly easy to implement • LOC’s American Memory repository implements OAI-PMH

  12. OAI-ORE (Beta) • “Open Archives Initiative (OAI) – Object Re-use and Exchange” • Developed and maintained by The Open Archives Initiative • A companion standard to OAI-PMH that will allow repositories to exchange digital objects and applications to consume digital objects residing in repositories • Fairly easy to implement • OAI released public beta June 2008

  13. OKI OSID • “Open Knowledge Initiative (OKI) Open Service Interface Definition” • Developed and maintained by the Open Knowledge Initiative (housed at MIT) • Describes a set of programmatic interfaces used to achieve interoperability among different repositories built on a variety of evolving technologies • Implemented by a variety of systems and tools – including the Museum of Fine Arts, Boston; the National Library of Australia; ARTstor; Sakai; Fedora; Dspace; Pachyderm; and VUE

  14. Proprietary APIs • An API is “the interface that a computer system or application provides in order to allow requests for service to be made of it by other computer programs, and/or to allow data to be exchanged between them”* • APIs are frequently used to facilitate data sharing and digital object reuse • Examples of systems and tools that use APIs to exchange data and/or digital objects: ARTstor, Blackboard, CONTENTdm, Facebook, Flickr, MDID, Photobucket, Photoshop Express, Picasa, PowerPoint, and YouTube * http://en.wikipedia.org/wiki/Application_programming_interface

  15. Data Exchange Overview Data Exchange Overview

  16. Metadata

  17. Metadata • Metadata is “data about data” and is used to structure and describe any kind of content, including of course digital objects • Metadata helps us discover, evaluate, retrieve and use digital objects • Metadata standards are an important component in data exchange • Result sets must be structured and understandable • Metadata standards provide structure and meaning to content • Many data exchange mechanisms support multiple metadata standards and most support Dublin Core

  18. The Challenge • It is very unlikely that two unrelated repositories will catalog data in exactly the same way using the same metadata standards • How does one system or tool process information from another system? How does the user search a remote collection in a targeted • The solution: A simple, generic metadata standard that supports a wide variety of cross-disciplinary resources

  19. Dublin Core Overview • “The Dublin Core metadata element set is a standard for cross-domain information resource description. It provides a simple and standardised set of conventions for describing things online in ways that make them easier to find”* • The Simple DC consists of 15 optional and repeatable DC elements: Title Creator Subject Description Publisher Contributor Date Type Format Identifier Source Language Relation Coverage Rights • The Qualified DC extends or refines DC elements in order to narrow the meaning of the DC elements * http://en.wikipedia.org/wiki/Dublin_Core

  20. Dublin Core Principles • The One-to-One Principle: “Create one metadata description for one and only one resource”* • For example: Do not describe a JPEG of the Mona Lisa as if it were the original painting. Do not confuse the creator of the JPEG with the painter of the original • The Dumb-down Principle: Translate qualified DC to simple DC so that a user can ignore qualifiers and treat a description as if it were unqualified • Appropriate values: Construct your metadata in such a way that it makes sense to a user outside of your context (e.g. to someone who is not a curator or art historian) * Marty Kurth. Basic DC Semantics. http://dublincore.org/resources/training/dc-2006/Tutorial1.pdf (p. 20)

  21. Sample Dublin Core Record* * http://www.pictureaustralia.org/schemas/pa/pa-slvic-example.xml

  22. Crosswalks • A crosswalk “maps the elements in one metadata scheme to the equivalent elements in another scheme”* • The challenge: Many if not most systems and repositories do not use Dublin Core to catalog data • Digital image repositories often use the VRA Core or proprietary schemas to catalog image records • MDID allows the curator to define a custom catalog structure for each collection • ARTstor aggregates data from many sources and thus is not able to use standardized metadata • The solution: Because many data exchange mechanisms support Dublin Core (e.g. SRU/SRW, OAI-PMH, OKI OSID, MDID API), use a crosswalk to map your data to Dublin Core * http://en.wikipedia.org/wiki/Crosswalk_(metadata)

  23. Curators Map MDID Fields to Dublin Core

  24. Dublin Core/VRA Crosswalk* *Claremont Colleges Digital Library (http://ccdl.libraries.claremont.edu/inside/CCDLmetadata.pdf)

  25. Unicode, Encodings, Character Sets

  26. Where do the funny characters come from?

  27. History • ASCII represents characters with numbers between 32 and 127 (7 bits) • Computers use 8 bits, so there is room for an additional 128 characters

  28. More History • Problem: everybody used the additional 128 characters for something different • Solution: ANSI standard for code pages • Examples • Israel: code page 862 • Greece: code page 737 • All code pages still use the same characters below 128, but different ones above

  29. More History • Problem: Only one code page possible at any time • Problem: Some languages use more different characters than fit in 8 bit • Solution: Unicode

  30. Unicode • Each letter or symbol is represented by a code point (a number) • No limits on numbers • Examples • A is U+0041 • ڬ is U+06AC • ♫ is U+266B

  31. Encodings • There are many different ways to store these code points (numbers) in a file • UCS-2/UTF-16 • Stores every character in two bytes • Problem: byte order • UTF-8 • Stores every character in one to six bytes • Compatible with ASCII/ANSI for first 128 characters

  32. So, where do the funny characters come from?

  33. Encodings • If the expected encoding does not match the actual encoding, characters will be interpreted incorrectly or not at all • Make sure to save your content in an encoding that is understood by the program you want to read it with • UTF-8 generally is the best option

  34. Example: Microsoft Excel

  35. Unicode • Reference: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)by Joel Spolskyhttp://www.joelonsoftware.com/articles/Unicode.html

  36. Cataloging Data File Formats

  37. Many options • Databases • Microsoft Access • FileMaker Pro • Spreadsheets • Microsoft Excel • Text based • CSV (comma separated values) • TSV (tab separated values) • XML (structured text)

  38. Options for exchanging data • Usually text based • CSV and TSV are functionally equivalent • Simple spreadsheet format • Multi-valued fields or hierarchies are difficult to represent • XML • Easily handles multi-valued fields or hierarchies • Structure must be understood

  39. Image Formats

  40. Criteria for picking an image format • Lossy vs. Lossless • Compressed vs. Uncompressed • Archival vs. Presentation • Offline vs. Online • Exchangeable vs. Proprietary • Common vs. Uncommon

  41. Lossy vs. Lossless • Tradeoff between file size and image quality • Archival images should always be lossless • Image quality with lossy formats will get progressively worse with editing or processing • Most formats are lossless • JPEG is lossy

  42. Compressed vs. Uncompressed • Tradeoff between file size and processing time • Processing time is usually no longer an issue • Compressed does not mean lossy • Examples: • TIFF can be compressed or uncompressed • JPEG is compressed and lossy • PNG is compressed and lossless

  43. Archival vs. Presentation • Archival images should always be lossless and highest possible quality and resolution • Presentation images need to be small for quick network transfer • Presentation image quality and size can be lower depending on presentation equipment • As equipment gets better, better presentation images can be derived from archival images

  44. Offline vs. Online • Online images need to be smaller and in a format that is supported by a web browser • Offline images can be larger and in formats that may require specific software to view or process • Example • Adobe Photoshop files are great for post-processing, but cannot be delivered in the browser

  45. Exchangeable vs. Proprietary • Proprietary image formats may be harder to exchange with others • Examples • PSD files require Adobe Photoshop • MrSID files require proprietary plug-ins • TIFF or JPEG are universally understood

  46. Common vs. Uncommon • Some file formats or format options are more popular and widely supported than others • Examples • Transparent GIFs work in any browser • Transparent PNGs only work in newer browsers • 8-bit RGB TIFFs work in almost every program • 16-bit RGB TIFFs or grayscale TIFFs do not

  47. Common Tasks for Curators

  48. Reformatting Data in Excel • Cells • A spreadsheet is made up of cells organized in columns and rows, respectively identified by letters (A, B, C, …) and numbers (1, 2, 3, …) • A cell is referred to by its column letter and row number, for example “A1” or “D25” • Formulas in Excel • Always start with an equal sign “=“ • Adjust their cell references when being moved or copied to other cells

  49. Excel User Interface Current cell Formula in current cell Cell value Dragging little square copies cell formula into adjoining cells

  50. Useful Excel Formulas • Combine cell values • CONCATENATE(cell, cell, …) • Or use “&” operator: cell & cell & … • Combines cell values • Extract parts of a cell value • LEFT(cell, length) • RIGHT(cell, length) • MID(cell, start, length)

More Related