New Century, New Metadata Thomas Krichel http://openlib.org/home/krichel University of Surrey, Hitotsubashi University and Long Island University
Why Metadata • Fun • Information retrieval • Support organization of social process
Crisis of Author Self-archiving • Formal archiving: small, metadata poor • Informal archiving: information retrieval difficult, lack of support infrastructure
Improving formal archiving • Strengthen the metadata provision • Broaden the mission of archiving • Allow usage of archived material in many user services • Better report on archive material usage • Strengthen the relationship with overlay services
Improving Informal Archiving • Build standardized metadata supply format • Harvest that metadata into larger digital libraries • Offer archival backup for papers
Metadata to Support Self-archiving • Simple to compose • Intuitive vocabulary that is specific to the academic process, e.g. “author” instead of “creator” • Widely applicable • All disciplines and publication forms • High quality, i.e. controlled
Metadata Control • Any processing that is done to the metadata before its inclusion in a user service. • Essential in a situation where metadata is harvested.
Types of Control • Syntactic control • Relational control • Retrieval control • Identity control • Verity control • Accession control
Basic Model • Four different record types • Document • Group • Person • Organization
Group and document • There is only one document type. • Groups are used to refine the status of the document. • Group construct meant to be defined by librarians, publishers and other intermediaries.
Person and Institution • Person and institution admit very similar attributes • It is hoped that organizational information will be contributed by intermediaries.
Implementation of Basic Model • RePEc: 100000 documents, 100 groups (series), 500 authors, 5000 institutions • Example: http://ideas.uqam.ca/EDIRC/data/frbgvus.html • Possible to do the same thing for ReLIS
Basic Grammar • XML syntax • Three groups of XML elements • Nouns: elements for the items described • Adjectives: elements that describe nouns • Verbs: elements that relate nouns
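The three-way grammar can be illustrated with a small Python sketch that labels each element of a record as noun, verb, or adjective. The noun set follows the four record types of the basic model. The verb set contains only the verbs that appear in the deck's examples, and treating every other element as an adjective is an assumption, since the complete list of verbs and adjectives is still to be defined.

```python
import xml.etree.ElementTree as ET

# Nouns: the four record types of the basic model.
NOUNS = {"document", "group", "person", "organization"}
# Verbs: only those seen in the slides' examples (the full list is not yet fixed).
VERBS = {"isauthorof", "ispublishedby", "hasmember", "hasauthor"}


def classify(tag):
    """Return 'noun', 'verb', or 'adjective' for an element name.

    Assumption: any element that is neither a known noun nor a known verb
    is treated as an adjective.
    """
    if tag in NOUNS:
        return "noun"
    if tag in VERBS:
        return "verb"
    return "adjective"


record = ET.fromstring(
    '<document id="kapital"><title>Das Kapital</title>'
    '<hasauthor><person id="kmarxthered"/></hasauthor></document>'
)

# Walk the record in document order and label every element.
roles = [(el.tag, classify(el.tag)) for el in record.iter()]
for tag, role in roles:
    print(f"{tag}: {role}")
```

A real implementation would presumably take these sets from the schema once it exists rather than hard-coding them.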
Modular Design
<person>
  <isauthorof>
    <document>
      <ispublishedby>
        <organization>
          <hasmember>
            <person></person>
          </hasmember>
        </organization>
      </ispublishedby>
    </document>
  </isauthorof>
</person>
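One way to read the nested form is as a chain of noun-verb-noun statements. The following Python sketch flattens the nesting into such triples. The function name `triples` and the hard-coded verb set are illustrative assumptions and not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Verbs seen in the slides' examples; assumed here, not an official list.
VERBS = {"isauthorof", "ispublishedby", "hasmember", "hasauthor"}

NESTED = (
    "<person><isauthorof>"
    "<document><ispublishedby>"
    "<organization><hasmember>"
    "<person></person>"
    "</hasmember></organization>"
    "</ispublishedby></document>"
    "</isauthorof></person>"
)


def triples(noun):
    """Yield (subject, verb, object) tag names for every verb nested under a noun."""
    for child in noun:
        if child.tag in VERBS:
            for obj in child:
                yield (noun.tag, child.tag, obj.tag)
                # The object is itself a noun and may carry further verbs.
                yield from triples(obj)


statements = list(triples(ET.fromstring(NESTED)))
for subject, verb, obj in statements:
    print(subject, verb, obj)
```

Non-verb children (adjectives) are skipped, so the output contains only the relations between nouns.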
Relational Design • <person id="kmarxthered"><email>k.marx@highgate.london.uk</email></person> • <document id="kapital"><title>Das Kapital</title><hasauthor><person id="kmarxthered"/></hasauthor></document>
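In the relational form, an empty noun that carries only an id acts as a pointer to a full record elsewhere. A minimal Python sketch of resolving such a pointer follows. The `<collection>` wrapper element and the helper `author_emails` are hypothetical and exist only so that the two records can be parsed as one document.

```python
import xml.etree.ElementTree as ET

# The two records from the slide, wrapped in a hypothetical <collection> root.
DATA = """
<collection>
  <person id="kmarxthered"><email>k.marx@highgate.london.uk</email></person>
  <document id="kapital">
    <title>Das Kapital</title>
    <hasauthor><person id="kmarxthered"/></hasauthor>
  </document>
</collection>
"""

root = ET.fromstring(DATA)

# Index every top-level record by its id so references can be looked up.
index = {record.get("id"): record for record in root}


def author_emails(document):
    """Follow hasauthor references and return the e-mail of each resolved author."""
    emails = []
    for ref in document.findall("hasauthor/person"):
        target = index.get(ref.get("id"))
        if target is not None:  # unresolved ids are skipped
            emails.append(target.findtext("email", default=""))
    return emails


emails = author_emails(index["kapital"])
print(emails)
```

This sketch only resolves ids within a single file. How identifiers resolve across harvested archives is one of the open problems the deck lists.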
Other features • Lang qualifier on all elements: an ISO 639-1 code if it has two letters, the bibliographic variant of ISO 639-2 if it has three. • Nouns have an id. • Verbs have startdate and enddate qualifiers, and of course have an id. • Adjectives can have child elements.
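The length rule for the lang qualifier could be checked along these lines. This is a sketch only: the function name `lang_standard` is made up for illustration, and the check looks only at the shape of the value, not at whether the code is actually registered in either standard.

```python
def lang_standard(code):
    """Return which standard a lang value is read against, based on its length.

    Two letters   -> ISO 639-1
    Three letters -> ISO 639-2, bibliographic variant
    Anything else -> None (not a valid lang value under this rule)

    Assumption: only the shape is checked; membership in the official
    code lists is not verified.
    """
    if not (code.isascii() and code.isalpha()):
        return None
    if len(code) == 2:
        return "ISO 639-1"
    if len(code) == 3:
        return "ISO 639-2/B"
    return None


for value in ("de", "ger", "german"):
    print(value, "->", lang_standard(value))
```

For German, `de` is the ISO 639-1 code and `ger` is the bibliographic ISO 639-2 code. A stricter validator would also need the code tables themselves to reject the terminologic variant `deu`, which this shape check accepts.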
Remaining Problems • Resolvability rules for identifiers • Dates and history • Subject classification using the group mechanism • Aliasing of element names
To be done… • Complete list of verbs and adjectives • Schema design • Parsing and validation software. • Conversion with test collection ReLIS.
Collaboration is welcome • Thanks for listening. • Have a happy New Year.