1. From ONIX to MARC and Back Again: New Frontiers in Metadata Creation at OCLC
Renee Register
Global Product Manager
OCLC Cataloging & Metadata Services
2. For discussion today
The current models for library and publisher supply chain metadata creation and maintenance are not sustainable!
We must move toward new paradigms for metadata creation and maintenance that include:
Further interoperability and shared metadata
Mechanisms for allowing metadata to “grow up” over time, sharing enhancements as they are made by both library and publishing communities
LC Working Group on the Future of Bibliographic Control recommendation
1 Increase the efficiency of bibliographic production and maintenance
1.1.1 Make more use of bibliographic data earlier in the supply chain
However, library challenges will not be solved by reliance on upstream data alone. The publisher supply chain experiences challenges in metadata creation and management, too!
We need to work together to increase efficiency, consistency and accuracy in bibliographic production and maintenance for both library and publisher markets
3. It all starts with publisher metadata – libraries, retailers, wholesalers and consumers make decisions based on this metadata
4. Publishers create electronic metadata for use in their own tools and systems and also share metadata with supply chain partners
5. Metadata originates with the publisher or material provider responsible for the content
This metadata is used in various ways:
Print catalogs, marketing and advertising
Publisher websites
Publisher inventory systems
Publisher business intelligence
Publisher data feeds to supply chain partners, etc.
Significant investment in staff and systems is required to support publisher metadata needs
6. The metadata ends up on publisher websites
7. And publisher print or PDF catalogs as well as other publisher ordering/inventory systems and tools
8. Publishers also share pre-publication electronic data with the Library of Congress for the creation of CIP records
9. CIP Record (Publication Date April 2009)
10. The same title on the publisher website
11. The publisher supply chain also invests significant staff and resources in the creation, enhancement and manipulation of metadata
Aggregators pull together metadata from multiple publishers and package it in ways that can be of use to materials providers, libraries and end users
Searchable websites with features that make metadata more useable
Data feeds that can be ingested into supplier systems
All have staff who work on the metadata received from publishers and create metadata
12. The same title on the Barnes and Noble website
13. The same title on Amazon
14. Title data on Amazon
15. And more Amazon metadata about this title (Note the inclusion of LCSH)
16. Wholesalers build products and services on publisher metadata
17. Wholesaler ordering tools pull information about content from multiple publishers into one interface
18. Data Aggregators also collect and enhance metadata from multiple publishers
19. The metadata can be combined with business data to assist with buying decisions
20. Libraries use these wholesaler and aggregator tools built on publisher metadata
21. Retailers use metadata from wholesalers and aggregators too
22. Buying decisions (for retailers and libraries) incorporate business intelligence connected to title metadata (Source: Nielsen BookData)
23. Libraries and retailers use metadata sliced and diced using various metadata elements (Source: Nielsen BookData)
24. Categories are an important part of how we sort and make meaning from metadata (Source: Nielsen BookData)
25. Publisher Supply Chain Subjects: BISAC Subject Headings
26. Publisher Supply Chain Subjects: BISAC Subject Headings
27. Publisher Supply Chain Subjects: BISAC Major Categories
28. Publisher Supply Chain Subjects: BIC Standard Subject Categories (U.K.)
29. ONIX is the international standard for the book industry
30. And the book industry encourages best practices for metadata creation
31. From BISG “Best Practices” document
32. But library systems require electronic metadata in MARC format, library terminologies and library-defined input standards
33. So, many library wholesalers maintain separate databases and cataloging staff to provide MARC to library customers
34. The Library of Congress transforms pre-publication metadata into MARC and adds library-specific metadata to create CIP records
Library catalogers retrieve MARC records from LC and other shared resources and create MARC records – usually upon receipt of materials
Library vendors often employ catalogers (in addition to other data staff described earlier) to create MARC records and physical processing staff to perform shelf-ready services
These vendors must maintain (at least) two databases to accommodate different data formats and customer needs
35. Data Silos
Libraries, retailers, wholesalers and aggregators are consumers of publisher direct and publisher supply chain metadata
Parts of the publisher supply chain also use and create library metadata
But library metadata has evolved separately from publisher supply chain metadata
36. Putting It Back Together
37. New Models for Creating and Sharing Metadata
Re-mix and re-use existing metadata
Increase collaboration and cooperation between library and publisher supply-chain communities
Break down barriers between metadata used for selection and acquisitions and metadata used for cataloging, discovery, business intelligence and collection management
Become more involved in upstream metadata creation processes, integrate available metadata into workflows upstream and allow the metadata to evolve over time
38. New Models for Creating and Sharing Metadata
Solutions must be interoperable and easily shared – inside and outside the library community
The library community must extend our expertise, as well as our cooperative and collaborative practices, to include publishers and publisher supply chain partners
39. “Next Generation” Cataloging and Metadata Creation Pilot
Automated capture, crosswalk and enhancement of publisher ONIX metadata
Output in MARC and ONIX to benefit both library and publishing communities
ONIX enriches MARC data and MARC enriches ONIX data
Mapping between library and publisher terminologies
OCLC pilot program with publishers, vendors and libraries
40. “Next Generation” Cataloging and Metadata Creation Pilot
Components of the “Next Gen” process
ONIX to MARC crosswalk
MARC to ONIX crosswalk
Automated record build and add to WorldCat
Enrichment software: rules and hierarchies for data mining and record enrichment using FRBR work sets
Mapping between terminologies: first up – DDC/BISAC Subject Headings
Output files in either MARC or ONIX
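The ONIX to MARC crosswalk component listed above can be sketched roughly as follows. This is an illustration only, not OCLC's mapping: the sample record, the use of ONIX 2.1 reference tags, the target MARC fields (020, 100, 245, 260) and the function name `onix_to_marc` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical ONIX 2.1 <Product> fragment (placeholder data, reference tag names).
SAMPLE_ONIX = """
<Product>
  <RecordReference>example-0001</RecordReference>
  <ProductIdentifier>
    <ProductIDType>15</ProductIDType>
    <IDValue>9780000000002</IDValue>
  </ProductIdentifier>
  <Title>
    <TitleType>01</TitleType>
    <TitleText>An Example Title</TitleText>
  </Title>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <PersonNameInverted>Author, Example</PersonNameInverted>
  </Contributor>
  <Publisher>
    <PublisherName>Example Press</PublisherName>
  </Publisher>
  <PublicationDate>200904</PublicationDate>
</Product>
"""


def onix_to_marc(product_xml):
    """Return a list of (MARC tag, subfield dict) pairs for one ONIX product.

    The element-to-field choices below are assumptions for illustration.
    """
    product = ET.fromstring(product_xml)
    fields = []

    # ISBN-13 (ONIX ProductIDType 15) -> MARC 020 $a
    for ident in product.findall("ProductIdentifier"):
        if ident.findtext("ProductIDType") == "15":
            fields.append(("020", {"a": ident.findtext("IDValue")}))

    # First contributor with role A01 (author) -> MARC 100 $a
    for contributor in product.findall("Contributor"):
        if contributor.findtext("ContributorRole") == "A01":
            name = contributor.findtext("PersonNameInverted")
            fields.append(("100", {"a": name}))
            break

    # Title text -> MARC 245 $a
    title = product.findtext("Title/TitleText")
    if title:
        fields.append(("245", {"a": title}))

    # Publisher name and year of publication -> MARC 260 $b and $c
    publisher = product.findtext("Publisher/PublisherName")
    pub_date = product.findtext("PublicationDate")
    imprint = {}
    if publisher:
        imprint["b"] = publisher
    if pub_date:
        imprint["c"] = pub_date[:4]  # keep only the year (YYYY)
    if imprint:
        fields.append(("260", imprint))

    return fields


for tag, subfields in onix_to_marc(SAMPLE_ONIX):
    print(tag, subfields)
```

A production crosswalk would also need to handle indicators, punctuation conventions, multiple contributors and the many ONIX elements not shown here.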
41. “Next Generation” Data Flow
42. How are we doing?
43. ONIX to MARC Crosswalk
44. Example of ONIX Input
45. OCLC’s ONIX/MARC Mapping
46. OCLC’s ONIX/MARC Mapping
47. Matching in WorldCat and Data Enrichment
48. Example of Enriched ONIX
49. Enrich Existing WorldCat Record
50. Example of Enrichment to Existing WorldCat Record
51. Example of Enrichment to Existing WorldCat Record (Cont.)
52. MARC to ONIX Crosswalk
53. Example of Enriched ONIX Output
54. Add New Records to WorldCat and Enrich New Records
55. We can build a basic MARC format record by mapping ONIX to MARC
But we want to automatically make that record better and more fit for use by libraries by mining existing WorldCat records
Many forthcoming or newly published titles are new iterations of existing works
New editions, paperbacks, audio books, e-books, large print, etc.
We want to make our process “think like a cataloger” for these types of titles and use the metadata in WorldCat records for earlier versions
Earlier editions, hardcover editions, etc.
56. We will do this by mining FRBR work set records for newly added titles that link to an existing work set
The new record will be a hybrid of:
Publisher ONIX metadata that pertains specifically to the new work
ISBN, physical description, imprint, price, etc.
And …
Existing MARC data that pertains to the same intellectual content
LC and Dewey Classification, LCSH, NLM, authority controlled contributor names, etc.
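The hybrid record described above can be sketched as a simple merge. This is a minimal illustration under stated assumptions, not the pilot's enrichment software: records are simplified to lists of (tag, value) pairs, the tag set `WORK_LEVEL_TAGS` and the function name `build_hybrid_record` are hypothetical, and all values are placeholders.

```python
# Assumed MARC tags that describe the intellectual content and can be
# inherited from an earlier record in the same work set:
# 050 LC classification, 060 NLM classification, 082 Dewey,
# 100 authority-controlled name, 650 subject headings.
WORK_LEVEL_TAGS = {"050", "060", "082", "100", "650"}


def build_hybrid_record(new_record, work_set_record):
    """Combine publisher-derived fields with work-level fields from WorldCat.

    Both arguments are lists of (tag, value) pairs. Fields specific to the
    new item (ISBN, imprint, physical description, ...) come from the new
    record; work-level fields come from the existing work set record and
    replace any publisher-derived field with the same tag.
    """
    inherited = [f for f in work_set_record if f[0] in WORK_LEVEL_TAGS]
    inherited_tags = {tag for tag, _ in inherited}
    kept = [f for f in new_record if f[0] not in inherited_tags]
    # sorted() is stable, so repeated tags (e.g. several 650s) keep their order.
    return sorted(kept + inherited, key=lambda field: field[0])


# Record built from ONIX for a new paperback edition (placeholder values).
new_paperback = [
    ("020", "9780000000002"),
    ("100", "Author, Example"),
    ("245", "An Example Title"),
    ("260", "Example Press, 2009"),
    ("300", "320 p."),
]

# Existing record for the earlier hardcover in the same work set.
existing_hardcover = [
    ("020", "9780000000019"),
    ("050", "XX0000"),
    ("082", "000.0"),
    ("100", "Author, Example, 1900-"),
    ("245", "An Example Title"),
    ("650", "Example subject heading"),
    ("650", "Second example heading"),
]

for tag, value in build_hybrid_record(new_paperback, existing_hardcover):
    print(tag, value)
```

In this sketch the paperback keeps its own ISBN, imprint and physical description, while classification, subject headings and the controlled form of the author name are taken from the hardcover record.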
57. Example of “Next Gen” New Record Created from ONIX Data and Mining FRBR Work Set
58. Map Between Terminologies: DDC/BISAC Subject Headings
59. Example of DDC/BISAC Mapping
60. Example of DDC/BISAC Mapping (Cont.)
61. Planned Enhancements:
Additional terminologies mapping
Use WorldCat Identities in enrichment process
62. “Next Generation” Cataloging and Metadata Creation Pilot
Pilot Wrap-up Spring 2009
Complete statistical analysis and compile pilot partner evaluation results
Complete case studies of pilot partners
Share results with pilot partners, advisory board
Share results with library and publisher supply chain communities
Watch this space for pilot results:
http://www.oclc.org/us/en/partnerships/material/nexgen/nextgencataloging.htm
63. “Next Generation” Cataloging and Metadata Creation: Beyond the Pilot
“Productionize” the process so that it can be used routinely to ingest, create and enhance metadata in WorldCat and output enhanced data in MARC and ONIX
Add additional mappings between classification schema and terminologies
64. “Next Generation” Cataloging and Metadata Creation: Beyond the Pilot
Continue to refine enrichment of existing WorldCat records and the addition of records to WorldCat based on publisher data
Integrate WorldCat Identities into the process
Enhance the process based on response from library and publisher communities
Include mechanisms for ongoing delivery of record enhancements to libraries and publishers
65. Contact Information
Renee Register
register@oclc.org
614-764-6107
Maureen Huss
hussm@oclc.org
614-764-4327
66. Questions?