CONTENTdm Interoperability -- Leveraging resources; repurposing collections. ALA Annual, New Orleans, LA, June 23rd, Friday, 9 am to noon. Claire Cocco, Product Manager; Geri Ingram, Customer Service Specialist; DiMeMa, Inc.
CONTENTdm Interoperability -- Leveraging resources; repurposing collections ALA Annual, New Orleans, LA, June 23rd, Friday, 9 am to noon • Claire Cocco, Product Manager • Geri Ingram, Customer Service Specialist • DiMeMa, Inc.
Agenda Part 1 9:00 to 10:15 • Mainstream digital objects into existing workflows • Importing from legacy systems • Exporting • Example of collaborative development for interoperability • METS transform (courtesy of CDL) [BREAK 10:15 to 10:30]
Agenda Part 2 10:30 to 11:30 • Customizing and integrating your CONTENTdm site • Web templates • Custom Queries and Results • Configuration files
Agenda Part 3 11:30 to Noon • Handling Finding Aids • Importing EAD files into CONTENTdm
Setting the context: fully engaged in digital library transformation • Library services and collections expanding to encompass all • Traditional to digital • Licensed • Reformatted • Sharing • Preserving
Leveraging resources • Staff time and skills throughout the organization and/or consortium • Existing metadata in some form • Existing digital collections (images and transcripts)
Why? For better customer service • In order to mainstream your processing and amplify your efforts. • Your digital collections should ultimately be mainstreamed into regular workflows, similar to the ones used for other materials (whether that’s done centrally or in a distributed fashion). • This includes selection, technical processing (cataloging, organizing, importing), integration with site vis-à-vis presentation and archiving.
Mainstreaming processing of digital formats (Part 1 of 3) • Importing from other systems to CONTENTdm • Exporting from CONTENTdm • Example of collaborative development for interoperability • CONTENTdm Standard Export • METS transform for import
I. Importing from other systems to CONTENTdm • Metadata only • When records describe items that are not yet scanned • Replace “null” files at a later time • Metadata AND their digital files
From an OPAC or other database system When you have… • Individual image files cataloged already • And can export from an OPAC or other DBMS Or where you have compound digital objects ready for migration
Migration steps: • Prepare the collection and the import files • Cross-walk metadata to Dublin Core • Configure the CONTENTdm collection fields • Export and prep data in a tab-delimited ASCII file • Import the file to CONTENTdm
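The cross-walk and export steps above can be sketched in a few lines of Python. This is only an illustration: the source column names in CROSSWALK, the sample record, and the choice to write a header row are assumptions, not CONTENTdm requirements. "Filename" is not a Dublin Core element; it is placed last because the import file lists the file name in the final column.

```python
import csv
import io

# Hypothetical crosswalk: source column name -> target field name.
# The source names on the left are invented for illustration.
CROSSWALK = {
    "main_title": "Title",
    "author": "Creator",
    "topic": "Subject",
    "pub_date": "Date",
    "image_file": "Filename",  # not a DC element; kept as the last column
}


def crosswalk_records(records, crosswalk=CROSSWALK):
    """Rename each record's keys to the target field names."""
    return [
        {target: rec.get(source, "") for source, target in crosswalk.items()}
        for rec in records
    ]


def to_tab_delimited(records, fieldnames):
    """Serialize records as tab-delimited text with one header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


source = [
    {
        "main_title": "Canal Street, 1905",
        "author": "Unknown",
        "topic": "Streets",
        "pub_date": "1905",
        "image_file": "canal001.jpg",
    }
]
text = to_tab_delimited(crosswalk_records(source), list(CROSSWALK.values()))
print(text)
```

Keeping the mapping in one dictionary makes it easy to review against the collection's field configuration before any data is exported.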
Data prep: Common problems in tab-delimited data files • Extra data in columns or rows • Extra tabs at end of line • Extra CRs at end of file (there should be only 1 CR) • Carriage returns or tabs inside metadata • Referenced files must exist • 0 (zero) versus O (letter) • The error may occur in a previous record; check a few rows before and after the reported error • File names are required, not full pathnames
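Several of the problems listed above can be caught before import with a small script. The sketch below is not a CONTENTdm tool; it assumes the first row sets the expected column count and that the file name is in the last column. It reports extra tabs, stray line breaks inside metadata (which show up as short rows), blank lines at the end of the file, and full pathnames.

```python
def check_tab_delimited(text, filename_column=-1):
    """Return a list of (line_number, message) problems found in the text."""
    problems = []
    # Treat CR, LF and CRLF line endings alike.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # A file ending with exactly one line break yields one trailing "".
    if lines[-1] == "":
        lines.pop()
    else:
        problems.append((len(lines), "file does not end with a line break"))
    if not lines:
        return problems
    expected = len(lines[0].split("\t"))
    for number, line in enumerate(lines, start=1):
        if line == "":
            problems.append((number, "blank line (extra line break)"))
            continue
        fields = line.split("\t")
        if len(fields) != expected:
            problems.append(
                (number, f"expected {expected} columns, found {len(fields)}")
            )
            continue
        name = fields[filename_column]
        if "/" in name or "\\" in name:
            problems.append((number, "file name contains a path"))
    return problems


sample = "Title\tFilename\nA\ta.jpg\t\nB\tC:\\scans\\b.jpg\n\n"
for number, message in check_tab_delimited(sample):
    print(number, message)
```

A line break inside a metadata value splits one record into two short rows, so the first reported line is often the record just before the real culprit.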
Data prep: Troubleshooting with Excel • Use Microsoft Excel to open the file and view data • Each row should be an item with the last column as filename • Work with small batches to find errors: keep adding items until the record with the error is found • Use Excel’s “CLEAN” function to remove invisible characters • Import images from a directory without using the tab-delimited file, to check for any type of imaging errors
Demo: MARC to DC • Export MARC records to tab-delimited text file (using ILS or MarcEdit) • Format and clean up the text file to conform to your CONTENTdm Collection schema • Import the file (with or without images) to the Collection
Importing compound objects • For documents, postcards, monographs and picture cubes • Can do singly or in batch • Much easier to start with singles, then set up for batch when process is smooth
Migrate compound objects from another database system Where you have many compound digital objects to migrate • Prepare the collection and the import files • Cross-walk metadata to Dublin Core • Configure the CONTENTdm collection fields • Configure folders for scans and transcripts (if appropriate) • Choose an import method based on your data structure • Create tab-delimited ASCII file(s) appropriate to the method • Import the files to CONTENTdm in batches
Multiple compound object wizard • Documented in online tutorial • Today’s demo described in handout • Four import methods for multiple object loading • Compound object (same as single, but upload batched) • Directory Structure (most flexible and efficient) • Object List (useful when NO page-level metadata) • Job List • Time allowing, demonstrate three different object types using 3 of 4 methods
[Flowchart: choosing an import method for multiple compound objects] Decision points: • Are your scan files separated into directories for EACH compound object? If not, create compound object directories. • Are they all the same type of compound object? If not, break up into batches by type. • Do you have one tab-delimited text file containing ALL the objects? • Do you have page-level metadata for the compound objects? • Do you have tab-delimited text files for EACH compound object? • Where no text file exists: create a text file listing all compound objects and object metadata, or create a text file for each compound object. Outcomes: DIRECTORY STRUCTURE or OBJECT LIST.
Every one of the four CONTENTdm compound object importing methods • Requires object-level metadata • Requires preparation • File naming, keeping sort order in mind • Each object has its own directory for scans • May use tab-delimited text file(s) • Accommodates transcripts
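To see why file naming and sort order matter, consider page scans named page1, page2 and page10: a plain alphabetical sort puts page10 before page2. A minimal sketch of one common fix, zero-padding the page number, follows. The file names and the three-digit width are illustrative assumptions only.

```python
import os
import re


def pad_last_number(name, width=3):
    """Zero-pad the last run of digits in the file name, ignoring the extension."""
    stem, ext = os.path.splitext(name)
    stem = re.sub(
        r"\d+(?=\D*$)",  # the final group of digits in the stem
        lambda m: m.group(0).zfill(width),
        stem,
        count=1,
    )
    return stem + ext


names = ["page1.tif", "page10.tif", "page2.tif"]
print(sorted(names))                                # page10 sorts before page2
print(sorted(pad_last_number(n) for n in names))    # pages in reading order
```

Renaming before import is usually simpler than correcting page order object by object afterwards.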
A word about descriptive page-level metadata • Supported by some, but not all, of the four import methods • NOT supported by Object List • At page level, Title is the only required field • Technical metadata can be generated by the Template Creator
More on transcripts • Typescripts and transcripts • Requires a field designated as the data type “Full Text Search” • Inserted into the metadata field of the scanned page • During import • Through use of .txt file found, or • By Template Creator • If OCR Extension in use • Or by “Directory Import” as with early versions of CONTENTdm • Transcripts and typescripts are supported by all four methods (i.e., not considered “metadata” for purposes of this discussion)
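Before a batch import it can help to confirm which scanned pages have a transcript .txt file waiting for them. The sketch below assumes, purely for illustration, that a transcript shares the base name of its scan (p001.tif and p001.txt); check the CONTENTdm documentation for the convention your import method actually expects.

```python
from pathlib import PurePath


def pair_transcripts(file_names):
    """Map each scan file name to its same-named .txt transcript, or None."""
    # Index transcripts by base name (assumed naming convention).
    transcripts = {
        PurePath(n).stem: n for n in file_names if n.lower().endswith(".txt")
    }
    pairs = {}
    for name in sorted(file_names):
        if name.lower().endswith(".txt"):
            continue
        pairs[name] = transcripts.get(PurePath(name).stem)
    return pairs


listing = ["p001.tif", "p001.txt", "p002.tif"]
for scan, transcript in pair_transcripts(listing).items():
    print(scan, "->", transcript or "no transcript")
```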
Demo: Import Multiple Compound Objects • Monograph using Compound Object method • Postcards using Object List method • Documents using Directory Structure method
II. Exporting from CONTENTdm • To ASCII tab-delimited with field headers • To XML: • Standard Dublin Core: only DC • Custom: all fields, including local but not structure • CDM Standard: all fields, including structure
III. Examples of collaboration for interoperability • Web integration through search engines, RSS • OAI harvesting • Enable at collection or server level • Choose to suppress <pagedata> or not • WorldCat registration • Open WorldCat integration
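For readers new to OAI harvesting: a harvester simply issues HTTP requests whose query string carries one of the six OAI-PMH verbs. The sketch below builds such a request URL. The base URL and the set name "postcards" are placeholders, not real CONTENTdm addresses; the verbs and the oai_dc metadata prefix come from the OAI-PMH specification.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    return base_url + "?" + urlencode({"verb": verb, **arguments})


# Placeholder repository address and set name, for illustration only.
url = oai_request(
    "https://example.org/oai", "ListRecords",
    metadataPrefix="oai_dc", set="postcards",
)
print(url)
```

Pasting a URL of this shape into a browser is a quick way to confirm that harvesting is enabled for a collection.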
CONTENTdm and a new METS transform • Info available on USC in July • Code at SourceForge • Windows-oriented
The CONTENTdm to METS conversion tool
What is/are METS? Why is/are METS good? What is 7train? How do I use 7train? What do I get from 7train? How do I get 7train?
What is/are METS? METS (Metadata Encoding and Transmission Standard) is an XML-based standard for encoding metadata to describe objects (digital or otherwise) within a digital library. See http://www.loc.gov/standards/mets/ for more information
What is/are METS? The sections of a METS document: • metsHdr: Metadata about this particular METS (encoder, contact info, etc.) • dmdSec: Descriptive metadata (title, author, subjects, etc.) • amdSec: Metadata for the management of the object (technical details, object history, etc.) • fileSec: A list of files that make up the object • structMap: Description of the structure of the object, i.e. how the files fit together • behaviorSec: What to do with the object (machine-actionable instructions) • On the original slide, yellow elements/tags are required; all others are optional
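To make the section names concrete, here is a minimal sketch that builds a METS skeleton with Python's standard library. It is not the output of 7train or of CONTENTdm; the object label and file names are invented. It includes only a fileSec and a structMap, the latter being the one section the METS schema requires.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(label, file_names):
    """Build a bare METS tree: one file group plus one structural map."""
    root = ET.Element(q("mets"), {"LABEL": label})
    file_grp = ET.SubElement(ET.SubElement(root, q("fileSec")), q("fileGrp"))
    struct_map = ET.SubElement(root, q("structMap"))
    top = ET.SubElement(struct_map, q("div"), {"LABEL": label})
    for order, name in enumerate(file_names, start=1):
        file_id = f"FILE{order:03d}"
        file_el = ET.SubElement(file_grp, q("file"), {"ID": file_id})
        ET.SubElement(
            file_el, q("FLocat"),
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name},
        )
        # Each page div points back to its file by ID.
        page = ET.SubElement(top, q("div"), {"ORDER": str(order), "LABEL": name})
        ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return root


root = build_mets("Postcard", ["front.jpg", "back.jpg"])
print(ET.tostring(root, encoding="unicode"))
```

A real submission would normally add a metsHdr and at least one dmdSec carrying the Dublin Core description.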
Why METS? To be able to add your objects to other collections and increase the visibility of your institution's assets.
What is 7train? 7train is an XSL-based tool for converting XML documents - in this case CONTENTdm exports describing objects managed in the CONTENTdm system - into METS objects suitable for submission to a digital library system, such as the California Digital Library's Online Archive of California. 7train is a platform-independent, standalone tool that was designed to work on any system and to be simple to use.
How does 7train work? It is as easy as dragging your CONTENTdm XML export file onto an executable file.
References & Links 7train Home: http://seventrain.sourceforge.net 7train Download: http://seventrain.sourceforge.net/7train_download.html CONTENTdm: http://www.dimema.com METS: http://www.loc.gov/standards/mets/ XSL: http://www.w3.org/Style/XSL/ The California Digital Library: http://www.cdlib.org The Online Archive of California: http://www.oac.cdlib.org
[Diagram: CONTENTdm interoperability] Labels on the diagram: • CONTENTdm (10K / 50K / Unlimited Objects) • CONTENTdm Multi-Site Server • Existing Libraries, New Libraries • Librarians, Archivists… • Other CONTENTdm sites • OPACs (MARC records) • Web • WorldCat and Open WorldCat (DC) • Regional Union Catalog (XML, DC) • Other digital archives (OAI) • For Library Users
BREAK: 15 minutes • This concludes Part 1 • To come after the break: • Part 2: Customization • Part 3: Finding Aids
Customizing and integrating your CONTENTdm site (Part 2 of 3) • Web templates • Custom Queries and Results • Configuration files
CONTENTdm Web Templates • Customizable for integration • Designed to support broad range of users • Small to large organizations • Beginners to experts • Use out of the box with minimal customization • Basic customization requires minimal HTML skills • Fully customize including advanced extensions • Based on a PHP API (Hypertext Preprocessor and Application Program Interface)
Basic Customizations • Minimal skills needed • Easy to make changes • Global include files • Variables • Recommend all organizations do basic customizations • Header (name/logo), contact e-mail address, colors, about page, home page http://www.contentdm.com/help4/custom/templates.html
Getting Started • Access to Web server docs directory • HTML editor or text editor • Design plan • Logo or other graphics • Backup copy of original files
Customization Demo • http://sr.contentdmdemo.com • Files located in /cdm4 directory • /includes/global_header.php • /client/LOC_global.php • /client/STY_global_style.php • about.php • browse.php • results.php • New logo saved in /cdm4/images/
Advanced Customizations • Experience with HTML, PHP, and JavaScript needed • Customize looks for each collection • University of Nevada, Reno • Web Template extensions • E-commerce (University of Utah, Oregon State University) • Comment forms (SENYLRC, Enoch Pratt Free Library, OSU) • Custom metadata display (University of Oregon) • QuickTime video (Williams College) • http://www.contentdm.com/customers/index.html
Examples of Advanced Customizations • University of Nevada, Reno http://imageserver.library.unr.edu/ • University of Utah http://www.lib.utah.edu/digital/bodmer/ • Oregon State University http://digitalcollections.library.oregonstate.edu/cdm4/client/bracero/ • SENYLRC http://www.hrvh.org/ • Enoch Pratt Free Library http://www.mdch.org/ • Williams College http://contentdm.williams.edu/
Customization Tips • Always make a backup! • Be aware of encoding (UTF-8 vs. ASCII) • See what other users are doing • Share, borrow, and copy ideas and code • http://www.contentdm.com/customers/index.html • Listserv • Document changes • Document which files are edited and what code changes are made to ease upgrading to newer versions
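On the encoding tip: one way to check an edited template before copying it to the server is to test whether its bytes are plain ASCII, valid UTF-8, or something else. The sketch below is a generic check, not part of CONTENTdm. It also flags a UTF-8 byte-order mark, which some editors add silently to the start of a file.

```python
def classify_encoding(data: bytes) -> str:
    """Classify raw file bytes as ASCII, UTF-8 (with or without BOM), or other."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8 with BOM"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


print(classify_encoding(b"About this collection"))
print(classify_encoding("Café menus".encode("utf-8")))
print(classify_encoding("Café menus".encode("latin-1")))
```

To check a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes to the function.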
Custom Queries and Results (CQR) • Create predefined, custom queries • Virtual collections • Guide users to specific results • Integrate with other sites • Multiple options • Simple hyperlink, drop-down list, index box, text box, browse • Easy to use • Wizard generates code to copy and paste into Web pages • Documentation • http://www.contentdm.com/help4/custom/cqr.html • http://www.contentdm.com/USC/tutorials/cqr.pdf
CQR DEMO • Generate code using CQR • Copy and paste into Web pages • May need to change path • Customize as desired
Configuration Files • Customizable files that reside on the server • Stop words • Full text field stop words – fullstop.txt • Automatic hyperlink stop words – stopwords.txt • http://www.contentdm.com/help4/custom/stopwords.html • Image viewer • Customize how images are displayed – imageconf.txt • For all collections or per collection • http://www.contentdm.com/help4/custom/zoompan.html