Metadata and identifiers for e-journals

Metadata and identifiers for e-journals. Copenhagen 13.-14.3.2000 Juha Hakala Helsinki University Library juha.hakala@helsinki.fi. Contents. Introduction Traditional cataloguing Full-text indexing Embedded metadata + Dublin Core DIEPER choices Identification of e-journals. Introduction.

Presentation Transcript


  1. Metadata and identifiers for e-journals Copenhagen 13.-14.3.2000 Juha Hakala Helsinki University Library juha.hakala@helsinki.fi

  2. Contents • Introduction • Traditional cataloguing • Full-text indexing • Embedded metadata + Dublin Core • DIEPER choices • Identification of e-journals

  3. Introduction • Metadata = structured description of resource • Structure of metadata is defined in a format • simple formats (AltaVista) • complex formats (MARC) • structured formats (Dublin Core) • Choices have important cost and quality implications (good is not free)

  4. Traditional cataloguing • Routinely done for journals (ISSN DB) • Articles indexed only selectively • Finnish article index Arto: 1100 journals; 65,000 articles and 10 man-years of work annually; 40 libraries co-operate in production • Extending MARC cataloguing to all digitised articles is too expensive • Are there any selection criteria for “good material”?

  5. Full-text indexing • Will not replace cataloguing... • In large databases precision is still poor • ...but we should follow what is happening • RDBMSs are becoming document-literate (Oracle interMedia) • new search techniques (e.g. fuzzy searching) • efficient use of language technologies • knowledge management

  6. Embedded metadata (1) • Three issues to solve: • semantics: in which metadata format should my metadata be? • syntax: is it possible / feasible to embed metadata into this document (does the document format allow inclusion of metadata)? • once issues 1 & 2 have been solved: are there tools for creating / harvesting / indexing my metadata?

  7. Embedded metadata - syntax • It must be possible to include metadata in non-compromised form & specify each data element separately • Most document formats do not allow efficient metadata usage • “flat files”, image formats, Word97 • The format must be able to say: “This is a Dublin Core identifier element, and there is an ISBN in it”

  8. Embedded metadata - syntax (2) • HTML 4.0 • META tag enables sophisticated metadata • Explicit specification for how to embed Dublin Core -based metadata (RFC 2731) • XML/RDF • “Resource Description Framework makes data machine understandable” • very versatile, but may be tough to implement
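
The HTML embedding mentioned above can be sketched roughly as follows. This is a minimal illustration, not taken from the slides: it assumes the `DC.`-prefixed `META` names and the `schema.DC` link convention described in RFC 2731, and the namespace URI, the helper name `dc_meta_tags` and the sample record are assumptions made for the example.

```python
from html import escape

# Assumed namespace URI for the Dublin Core element set (not given in the slides).
DC_SCHEMA = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(record):
    """Render a dict of DC element name -> list of values as HTML head tags."""
    lines = [f'<link rel="schema.DC" href="{DC_SCHEMA}">']
    for element, values in record.items():
        for value in values:
            # Escape quotes and angle brackets so the attribute stays well-formed.
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Sample record built from the title slide; repeated elements become repeated tags.
record = {
    "Title": ["Metadata and identifiers for e-journals"],
    "Creator": ["Hakala, Juha"],
    "Language": ["en"],
}
html_head = dc_meta_tags(record)
print(html_head)
```

Each data element is emitted as its own tag, which is the "specify each data element separately" requirement from the previous slide.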

  9. Embedded metadata - semantics • Metadata formats tend to be domain specific, complex and hard to learn • Dublin Core as an alternative: • simple (in its basic form) • generic (no domain dependency) • extensible (local elements possible) • Is there any competition left?

  10. Status of Dublin Core Initiative • maintenance in reliable hands • 15 elements stable (DC 1.1) • syntax for HTML 4.0 stable • core qualifiers under development • proposals published in December 1999 • agreement in DC-AC in March 2000 • will result in 50-60 qualifiers

  11. Tools for Dublin Core • Metadata support in Web indexes becoming more popular • Metadata creation emerging in document management systems • Text editors: XML support in place, RDF yet to come

  12. DIEPER choices • Document format will be XML/RDF • extensible and open document format that will become very popular in the future • Metadata format will be based on DC • DC tags: Identifier, Title, Creator, Contributor, Publisher, Language, Subject • Local tags: e.g. SerialsNumbering, PlaceOfPublication, SizeSourcePrint
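
A record combining DC tags with local tags might look like the sketch below. It is only an illustration under stated assumptions: the slides do not show DIEPER's actual schema, so the local namespace URI, the `dieper` prefix, the lowercase DC element names and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for the local (non-DC) tags; not from the slides.
LOCAL_NS = "http://example.org/dieper/terms#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("dieper", LOCAL_NS)


def build_description(about, dc_fields, local_fields):
    """Build an rdf:RDF element holding one rdf:Description for a resource."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, value in dc_fields.items():
        # DC elements are written in lowercase here (an assumption of this sketch).
        ET.SubElement(desc, f"{{{DC_NS}}}{name.lower()}").text = value
    for name, value in local_fields.items():
        ET.SubElement(desc, f"{{{LOCAL_NS}}}{name}").text = value
    return root


# All values below are invented sample data.
root = build_description(
    "urn:issn:0002-8231",
    {"Title": "Sample journal title", "Language": "en"},
    {"PlaceOfPublication": "Sample City"},
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping the local tags in their own namespace is one way to get the extensibility the slide refers to without altering the DC elements themselves.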

  13. Identifiers for e-journals • Two different issues: • how to identify journals themselves • how to identify articles and possibly sections of articles (table of contents etc.) • Do we need a resolution mechanism (based on DOI or URN)?

  14. E-journals • ISSN must be used, also for digitised journals • digitised version may have the same ISSN as the original paper version • ISSN should not be embedded in issues / articles, since a search on the ISSN would then retrieve every issue and article (recall grows too large) • Broadened scope: serials + integrating resources
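
Since the ISSN is the basis for the identifiers that follow, it can be useful to validate one before embedding it. The sketch below uses the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulus 11, with `X` standing for 10); the function names are illustrative and not part of the presentation.

```python
import re


def issn_check_digit(first_seven):
    """Compute the ISSN check character from the first seven digits."""
    # Weights run from 8 down to 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(issn):
    """Return True if the string is a well-formed ISSN with a correct check character."""
    if not re.fullmatch(r"[0-9]{4}-?[0-9]{3}[0-9Xx]", issn):
        return False
    compact = issn.replace("-", "").upper()
    return issn_check_digit(compact[:7]) == compact[7]


# The ISSN used in the SICI example later in the deck.
print(is_valid_issn("0002-8231"))
```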

  15. Issues & articles • SICI (Serial Item and Contribution Identifier) should be used • ANSI/NISO standard (1996) • http://sunsite.berkeley.edu/SICI/ • Not widely supported yet; e-commerce is likely to change this • need to identify anything that can be sold • SICI generator available

  16. Properties of SICI • Extensible: can identify issue/article/section within article • Can be created automatically (from structured source document) • Complex • 0002-8231(1929)30:1<ZBDMSU>2.0.CO;2-Z • Can be used as URN or DOI
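
To make the "complex" point concrete, the example string above can be split into its parts with a regular expression. This is a rough sketch under assumptions: the segment names (item, contribution, control) and field labels reflect one reading of the SICI layout, the pattern is deliberately loose, and the check character is extracted but not verified.

```python
import re

# Loose pattern for the general shape:
# ISSN(chronology)enumeration<contribution>CSI.DPI.MFI;version-check
SICI_PATTERN = re.compile(
    r"(?P<issn>[0-9]{4}-[0-9]{3}[0-9X])"
    r"\((?P<chronology>[^)]*)\)"
    r"(?P<enumeration>[^<]*)"
    r"<(?P<contribution>[^>]*)>"
    r"(?P<csi>[0-9])\.(?P<dpi>[0-9])\.(?P<mfi>[A-Z]{2})"
    r";(?P<version>[0-9])-(?P<check>[0-9A-Z#])"
)


def parse_sici(sici):
    """Split a SICI string into named parts; raise ValueError if it does not fit."""
    match = SICI_PATTERN.fullmatch(sici)
    if match is None:
        raise ValueError(f"not a recognisable SICI: {sici!r}")
    return match.groupdict()


# The example string from the slide.
parts = parse_sici("0002-8231(1929)30:1<ZBDMSU>2.0.CO;2-Z")
for name, value in parts.items():
    print(f"{name}: {value}")
```

The ISSN at the front is what ties an article-level SICI back to the journal-level identifier.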

  17. URN & DOI • Umbrella systems that provide e.g. persistent linkage between a reference and the resource via a resolution service • DOI is a publisher-driven initiative, URN comes from the Internet community • DOIs can be used as URNs, not vice versa

  18. Digital object identifier • Consists of a prefix and a suffix, separated by a slash • 10.1045/february2000-risher • Suffix may be anything; there is no hint about its content • Prefix identifies the publisher + indicates where to find a resolution service
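
The prefix/suffix structure can be shown with a few lines of code. This is a minimal sketch: it splits at the first slash only, because the suffix is opaque and may itself contain slashes, and it assumes that DOI prefixes start with `10.`, which the slide's example suggests but does not state. The function name is illustrative.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, slash, suffix = doi.partition("/")
    # Assumption: every DOI prefix begins with the directory code "10.".
    if not slash or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return prefix, suffix


# The example from the slide.
prefix, suffix = split_doi("10.1045/february2000-risher")
print(prefix)  # identifies the publisher
print(suffix)  # opaque string chosen by the publisher
```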

  19. Uniform resource name • Consists of three parts: • string urn: • Namespace identifier (NID) • Namespace specific string (NSS) • When NID is known, creating URNs from existing identifiers is trivially easy • No hint on where to find resolution service
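
The three-part structure, and the claim that URNs are trivially created from existing identifiers, can be sketched as below. The helper names are illustrative, the `issn` namespace identifier is used as an example NID, and percent-encoding of reserved characters (which a SICI-based NSS would need for `<` and `>`) is deliberately left out of this sketch.

```python
def make_urn(nid, nss):
    """Build a URN from a namespace identifier and a namespace specific string."""
    return f"urn:{nid.lower()}:{nss}"


def parse_urn(urn):
    """Split a URN into (nid, nss); the NSS may itself contain colons."""
    pieces = urn.split(":", 2)
    if len(pieces) != 3 or pieces[0].lower() != "urn" or not pieces[1] or not pieces[2]:
        raise ValueError(f"not a recognisable URN: {urn!r}")
    # The "urn" prefix and the NID are case-insensitive; the NSS is kept as given.
    return pieces[1].lower(), pieces[2]


# An existing identifier (an ISSN) becomes a URN by simple concatenation.
journal_urn = make_urn("issn", "0002-8231")
print(journal_urn)
print(parse_urn(journal_urn))
```

Nothing in the resulting string says where a resolver lives, which is the point the slide's last bullet makes.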

  20. Business models • DOI: annual payment for each DOI assigned • no decision yet on the size of the payment • flat fee for publisher ID • URN: no price at all • but someone has to pay for the resolution services

  21. DIEPER policy • URNs will be used, in order to enable URN-based resolution services • ISSN/SICI will be used • ISSN International Centre will assist in creation of URN resolution services • ISSN database will be contacted first, in order to get the address of the resolution service
