Redesigning VuFind's Record Handling for MARC and Beyond

VuFind Beyond MARC: discovering everything else
Demian Katz, VuFind Developer
demian.katz@villanova.edu

How VuFind Used to Work
• MARC records were loaded into Solr.
• Data parsed to fields for searching/faceting.
• Full binary record stored in “fullrecord” field.
• Solr was used for retrieving records.
• VuFind’s PHP code made heavy use of “fullrecord” data for building displays.

What’s wrong with that?
• MARC must die.
• Not all searchable documents are MARC.
• Code for pulling data from MARC is ugly.

Redesign Goals
• Centralize MARC-specific code so it can be easily replaced.
• Use stored Solr fields whenever possible.
• Allow arbitrary metadata formats to coexist peacefully.
• Make no assumptions about metadata content.

The Solution: Record Drivers
• A class interface for displaying a document retrieved from Solr.
• A new Solr field tells VuFind which Record Driver to instantiate for each document.
• A default Record Driver can be written to display a document based solely on stored Solr fields.
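The dispatch described above can be sketched roughly as follows. VuFind itself is written in PHP; this is a language-neutral illustration in Python, not VuFind's code. The field name `record_driver`, the `DRIVERS` registry, and `make_driver` are all hypothetical names chosen for the example.

```python
from typing import Dict, Type


class IndexRecord:
    """Default driver: works only from the document's stored Solr fields."""

    def __init__(self, fields: dict):
        self.fields = fields

    def get_title(self) -> str:
        return self.fields.get("title", "")


class MarcRecord(IndexRecord):
    """Driver for documents that also carry a full MARC record."""


# Hypothetical registry: value of the driver field -> driver class.
DRIVERS: Dict[str, Type[IndexRecord]] = {"marc": MarcRecord}


def make_driver(doc: dict, driver_field: str = "record_driver") -> IndexRecord:
    """Instantiate the driver named in the document, else the default driver."""
    driver_class = DRIVERS.get(doc.get(driver_field, ""), IndexRecord)
    return driver_class(doc)
```

Falling back to the default driver for a missing or unknown value means a document with no special handling can still be displayed from its stored fields.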

One Key Design Decision
• What should the Record Driver class contain?
  • Data-oriented methods (getTitle, getAuthor, etc.)
  • Screen-oriented methods (getSearchResult, getStaffView, etc.)

The Answer: All of the Above
interface RecordInterface
    public getSearchResult()
    public getStaffView()
    …
class IndexRecord implements RecordInterface
    protected getAuthor()
    protected getTitle()
    …
class MarcRecord extends IndexRecord
    protected getAuthor()
    protected getTitle()
    …
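A minimal sketch of how that hierarchy might fit together, written in Python rather than VuFind's PHP. The method bodies are assumptions for illustration: the slide only names the methods. In particular, the `fullrecord` value here is a pre-parsed dictionary with made-up keys (`245a`, `100a`) standing in for real MARC parsing.

```python
from abc import ABC, abstractmethod


class RecordInterface(ABC):
    """Screen-oriented methods: the only thing the display code calls."""

    @abstractmethod
    def get_search_result(self) -> str: ...

    @abstractmethod
    def get_staff_view(self) -> str: ...


class IndexRecord(RecordInterface):
    """Builds every screen from stored Solr fields."""

    def __init__(self, fields: dict):
        self.fields = fields

    # Data-oriented helpers (protected in the slide's pseudocode).
    def _get_title(self) -> str:
        return self.fields.get("title", "")

    def _get_author(self) -> str:
        return self.fields.get("author", "")

    def get_search_result(self) -> str:
        return f"{self._get_title()} / {self._get_author()}"

    def get_staff_view(self) -> str:
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.fields.items()))


class MarcRecord(IndexRecord):
    """Overrides only the data helpers; the screens are inherited."""

    def _marc(self) -> dict:
        # Assumption: a dict stands in for a parsed MARC record.
        return self.fields.get("fullrecord", {})

    def _get_title(self) -> str:
        return self._marc().get("245a") or super()._get_title()

    def _get_author(self) -> str:
        return self._marc().get("100a") or super()._get_author()
```

Because the screen methods call the data helpers, a format-specific driver can change where the data comes from without touching the display logic.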

Record Driver Benefits
• Large-scale changes are possible.
• Small-scale changes are easy.
• Allows object-specific behaviors.
• Eases maintenance of local customizations.

Next Problem…
• Where’s the data?
• MARC records traditionally come from an ILS export.
• SolrMarc traditionally takes care of populating VuFind’s Solr index.

Growing the Toolkit
• The toolkit approach is important!
• Problems to solve:
  • Obtain records from remote sources
  • Process harvested files
  • Index arbitrary XML

Tool #1: OAI-PMH Harvester
• Purpose of tool: harvest metadata files from an OAI-PMH server into a directory.
• Key feature: ID manipulation.
• Key feature: delete support.
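To make the two key features concrete, here is a rough Python sketch (not VuFind's harvester) that takes one OAI-PMH ListRecords response, rewrites each identifier with a regular expression, and separates live records from deleted ones. The function name `split_harvest` and its parameters are hypothetical; the namespace and the `status="deleted"` header attribute come from the OAI-PMH 2.0 protocol.

```python
import re
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def split_harvest(response_xml: str, id_search: str, id_replace: str):
    """Return (records, deleted_ids) for one ListRecords response.

    records maps each rewritten ID to the serialized metadata XML.
    """
    root = ET.fromstring(response_xml)
    records, deleted_ids = {}, []
    for record in root.iterfind(".//oai:record", OAI_NS):
        header = record.find("oai:header", OAI_NS)
        if header is None:
            continue
        raw_id = header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        # ID manipulation: rewrite the remote identifier into a local one.
        local_id = re.sub(id_search, id_replace, raw_id.strip())
        # Delete support: deleted records carry a status flag and no metadata.
        if header.get("status") == "deleted":
            deleted_ids.append(local_id)
            continue
        metadata = record.find("oai:metadata", OAI_NS)
        if metadata is not None and len(metadata):
            records[local_id] = ET.tostring(metadata[0], encoding="unicode")
    return records, deleted_ids
```

A caller would then write each entry of `records` to a file in the harvest directory and pass `deleted_ids` on to whatever removes documents from the index.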

Tool #2: Batch Import Scripts
• Purpose of tool: process all metadata files in a directory.
• Easily achieved with Windows batch or Unix shell scripting.
• Several sample scripts ship with VuFind.
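The sample scripts themselves are batch and shell scripts; the loop they perform can be sketched in Python as below. This is an illustration only: `import_directory`, the `import_file` callback, and the `processed` subdirectory are assumptions made for the example, not a description of the shipped scripts.

```python
from pathlib import Path
from typing import Callable, List


def import_directory(directory: Path,
                     import_file: Callable[[Path], None],
                     pattern: str = "*.xml") -> List[Path]:
    """Run import_file on each matching file; move successes to processed/."""
    processed_dir = directory / "processed"
    processed_dir.mkdir(exist_ok=True)
    imported = []
    # sorted() lists the files up front, so moving them is safe mid-loop.
    for path in sorted(directory.glob(pattern)):
        try:
            import_file(path)
        except Exception as error:
            # Leave failed files in place so they can be retried later.
            print(f"Skipping {path.name}: {error}")
            continue
        target = processed_dir / path.name
        path.replace(target)
        imported.append(target)
    return imported
```

Moving successfully imported files out of the way keeps a rerun from indexing the same file twice.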

Tool #3: XSLT Importer
• Purpose of tool: with XSLT, map an XML document to a Solr document based on VuFind’s schema.
• Key feature: PHP integration
• Key feature: Aperture support
• Several sample XSLT documents ship with VuFind (DSpace, OJS, VuDL).
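The kind of mapping such a stylesheet performs can be illustrated without XSLT. The sketch below uses Python's ElementTree in place of an XSLT processor, purely to show the input and output shapes: a Dublin Core style record goes in, and a Solr `<add><doc>` update message comes out. The `FIELD_MAP` contents, the `record_driver` field name, and the function name are assumptions, not VuFind's actual schema or importer.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical mapping: source element -> Solr field name.
FIELD_MAP = {
    DC + "identifier": "id",
    DC + "title": "title",
    DC + "creator": "author",
}


def to_solr_add(source_xml: str, driver_name: str = "index") -> str:
    """Map one source record to a Solr XML update message."""
    source = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # Record which driver should display this document (hypothetical field).
    ET.SubElement(doc, "field", name="record_driver").text = driver_name
    for element in source.iter():
        solr_field = FIELD_MAP.get(element.tag)
        text = (element.text or "").strip()
        if solr_field and text:
            ET.SubElement(doc, "field", name=solr_field).text = text
    return ET.tostring(add, encoding="unicode")
```

Elements with no entry in the mapping are simply dropped, which is also how an XSLT template that matches only selected elements would behave.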

Parting Thoughts
• Understanding Record Drivers gives you a lot of control over VuFind.
• VuFind should be able to index practically anything with a bit of effort.
• Don’t be afraid to build your own tools!

More Information
• VuFind: http://vufind.org
• Demian Katz: demian.katz@villanova.edu
