New Century, New Metadata

New Century, New Metadata Thomas Krichel http://openlib.org/home/krichel University of Surrey, Hitotsubashi University and Long Island University

Why Metadata • Fun • Information retrieval • Support organization of social process

Crisis of Author Self-archiving • Formal archiving • Small • Metadata poor • Informal archiving • Information retrieval difficult • Lack of support infrastructure

Improving formal archiving • Strengthen the metadata provision • Broaden the mission of archiving • Allow usage of archived material in many user services • Better report on archive material usage • Strengthen the relationship with overlay services

Improving Informal Archiving • Build standardized metadata supply format • Harvest that metadata into larger digital libraries • Offer archival backup for papers

Metadata to Support Self-archiving • Simple to compose • Intuitive vocabulary that is specific to the academic process, e.g. “author” instead of “creator” • Widely applicable • All disciplines and publication forms • High quality, i.e. controlled

Metadata Control • Any processing that is done to the metadata before its inclusion in a user service. • Essential in a situation where metadata is harvested.

Types of Control • Syntactic control • Relational control • Retrieval control • Identity control • Verity control • Accession control

Basic Model • Four different record types • Document • Group • Person • Organization

Group and document • There is only one document type. • Groups are used to refine the status of the document. • Group construct meant to be defined by librarians, publishers and other intermediaries.

Person and Institution • Person and institution admit very similar attributes • It is hoped that organizational information will be contributed by intermediaries.

Implementation of Basic Model • RePEc • 100000 documents • 100 groups (series) • 500 authors • 5000 institutions • Example • http://ideas.uqam.ca/EDIRC/data/frbgvus.html • Possible to do the same thing for ReLIS

Basic Grammar • XML syntax • Three groups of XML elements • Nouns: element for items described • Adjectives: elements that describe nouns • Verbs: elements that relate nouns

Modular Design <person><isauthorof> <document><ispublishedby> <organization><hasmember> <person></person> </hasmember></organization> </ispublishedby></document> </isauthorof></person>
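
The nesting above alternates nouns and verbs. The following is a minimal Python sketch of how such a record could be walked, not the project's own software. It assumes that the noun names are the four record types of the basic model and that the three verbs shown on the slide are the only verbs (the complete verb list is still to be done). Any other element is treated as an adjective.

```python
import xml.etree.ElementTree as ET

# Assumption: noun names are the four record types of the basic model.
NOUNS = {"document", "group", "person", "organization"}
# Assumption: only the verbs shown on the slide; the full list is not yet defined.
VERBS = {"isauthorof", "ispublishedby", "hasmember"}

# The nested record from the slide, unchanged apart from whitespace.
MODULAR = (
    "<person><isauthorof><document><ispublishedby>"
    "<organization><hasmember><person></person></hasmember></organization>"
    "</ispublishedby></document></isauthorof></person>"
)


def kind(tag):
    """Classify an element name as noun, verb or adjective."""
    if tag in NOUNS:
        return "noun"
    if tag in VERBS:
        return "verb"
    return "adjective"


def relations(elem, found=None):
    """Collect (subject noun, verb, object noun) triples from a nested record."""
    if found is None:
        found = []
    if kind(elem.tag) == "noun":
        for verb in elem:
            if kind(verb.tag) != "verb":
                continue  # adjectives describe the noun and are skipped here
            for obj in verb:
                if kind(obj.tag) == "noun":
                    found.append((elem.tag, verb.tag, obj.tag))
                    relations(obj, found)  # follow the chain into the object
    return found


triples = relations(ET.fromstring(MODULAR))
for subject, verb, obj in triples:
    print(subject, verb, obj)
```

Each triple reads as a small sentence, which is the point of the grammar: a noun, a verb that relates it, and the noun it is related to.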

Relational Design • <person id="kmarxthered"><email>k.marx@highgate.london.uk</email></person> • <document id="kapital"><title>Das Kapital</title><hasauthor><person id="kmarxthered"/></hasauthor></document>
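
In the relational form, a verb holds only a reference to a noun, and the full record is found through its id. The following Python sketch shows one way a consumer might resolve such a reference. The `<records>` wrapper element and the function name `author_emails` are hypothetical, introduced here only so that both records parse as one XML document. They are not part of the proposed format.

```python
import xml.etree.ElementTree as ET

# Assumption: <records> is a hypothetical wrapper so both slide records
# form one well-formed XML document.
RELATIONAL = """
<records>
  <person id="kmarxthered"><email>k.marx@highgate.london.uk</email></person>
  <document id="kapital">
    <title>Das Kapital</title>
    <hasauthor><person id="kmarxthered"/></hasauthor>
  </document>
</records>
"""

root = ET.fromstring(RELATIONAL)

# Index the full (top-level) records by their id.
records = {rec.get("id"): rec for rec in root}


def author_emails(document):
    """Resolve each hasauthor reference by id and return the authors' emails."""
    emails = []
    for ref in document.findall("hasauthor/person"):
        person = records.get(ref.get("id"))
        if person is not None:  # unresolved ids are silently skipped here
            emails.append(person.findtext("email", default="").strip())
    return emails


emails = author_emails(records["kapital"])
print(emails)
```

Keeping the person record separate means the email address is stored once and can be reused by every document that names the same author.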

Other features • Lang qualifier on all elements: the value is an ISO 639-1 code if it has two letters and the bibliographic variant of ISO 639-2 if it has three letters. • Nouns have id. • Verbs have startdate and enddate qualifiers, and of course have id. • Adjectives can have child elements.
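
The length rule for the lang qualifier can be sketched in a few lines of Python. This is an illustration only: the function name `lang_scheme` is hypothetical, and the check looks at the shape of the value, not at whether the code is actually registered in either standard.

```python
import re


def lang_scheme(code):
    """Name the standard a lang value is read against, by length alone."""
    if re.fullmatch(r"[A-Za-z]{2}", code):
        return "ISO 639-1"
    if re.fullmatch(r"[A-Za-z]{3}", code):
        return "ISO 639-2/B"  # bibliographic variant, e.g. "ger" for German
    raise ValueError(f"lang value must have two or three letters: {code!r}")


print(lang_scheme("de"))
print(lang_scheme("ger"))
```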

Remaining Problems • Resolvability rules for identifiers • Dates and history • Subject classification using the group mechanism • Aliasing of element names

To be done… • Complete list of verbs and adjectives • Schema design • Parsing and validation software. • Conversion with test collection ReLIS.

Collaboration is welcome • Thanks for listening. • Have a happy New Year.
