Use of DDC in HILT 2 and beyond. Gordon Dunsire. Presented to the Dewey Decimal Classification seminar, British Library, Boston Spa, 26 March 2003
Overview • HILT II project • Functional components of a “terminologies” server in the distributed digital library • How high? • Role of DDC • Issues with DDC
HILT II project (1) • High Level Thesaurus • First phase looked at approaches to resolving issues of interoperability of subject vocabularies • Cross-domain agreement • No future in doing nothing • Cross-match of standard vocabularies at high (broad) level feasible
HILT II project (2) • HILT II building a pilot (IE) terminologies server • Centre for Digital Library Research is lead site • Partners are MDA, NCA, NGfL, OCLC, SLIC, and UKOLN; funded by JISC • Test terminology sets: WebDewey (DDC, LCSH); Napier University DDC subject index • So far: sample mappings database; interface development • Cross-related to SCONE (collection description), CC-interop (COPAC/clumps) and SPEIR (portals) projects at CDLR
Functional components • Disambiguation: the "lotus" problem • Landscaping: identification of relevant collections • Resource discovery: item-level retrieval by subject
Disambiguation • Input primary term • Match term against terminology sets (DDC: schedules + Relative Index; LCSH; local vocabularies; etc.) • Return matched sets of term objects (within each terminology; between terminologies) • Remove semantic mismatches • Return set of disambiguated term objects
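The disambiguation steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the HILT implementation: the `TermObject` fields, the `sense` gloss, the sample entries and their DDC numbers are all invented for the example, and the slide does not say how semantic mismatches are removed, so the sketch assumes the user picks one sense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermObject:
    term: str    # label as it appears in the terminology set
    scheme: str  # subject scheme identifier, e.g. "DDC" or "LCSH"
    ddc: str     # DDC number the term is mapped to
    sense: str   # hypothetical gloss used to tell senses apart


# Invented sample data; labels and numbers are illustrative only and
# have not been checked against the DDC schedules or LCSH.
TERMINOLOGY_SETS = {
    "DDC": [
        TermObject("Lotus (plants)", "DDC", "583.2", "botany"),
        TermObject("Lotus automobiles", "DDC", "629.222", "motoring"),
        TermObject("Roses", "DDC", "583.7", "botany"),
    ],
    "LCSH": [
        TermObject("Lotus", "LCSH", "583.2", "botany"),
        TermObject("Lotus automobile", "LCSH", "629.222", "motoring"),
    ],
}


def match_term(primary, sets):
    """Match the primary term against every terminology set.

    Returns one list of matching term objects per terminology.
    """
    needle = primary.casefold()
    return {
        scheme: [t for t in terms if needle in t.term.casefold()]
        for scheme, terms in sets.items()
    }


def disambiguate(matched, wanted_sense):
    """Remove semantic mismatches, keeping only the chosen sense."""
    return [
        t for terms in matched.values() for t in terms
        if t.sense == wanted_sense
    ]


matched = match_term("lotus", TERMINOLOGY_SETS)
botany = disambiguate(matched, "botany")
print([(t.scheme, t.term) for t in botany])
# [('DDC', 'Lotus (plants)'), ('LCSH', 'Lotus')]
```

Matching here is a plain case-insensitive substring test; a real server would presumably use the index structures of each terminology.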
Landscaping (1) • DDC number is contained in each term object • Use longest number of set (highest precision) • Match against DDC index of collection-level descriptions (CLDs) • Until acceptable number of cumulated matches (level of recall), truncate number (decrease precision) and repeat
Landscaping (2) • Two types of DDC-to-CLD mapping: specific subject (special collections) and subject strength (general collections; Conspectus via LCSH/DDC mapping) • Returns set of CLDs with high recall/low precision matching of primary term • Constrained by DDC taxonomies • CLD set is basis of subject-based portal
Landscaping (3) • CLD set filtered by portal parameters (authentication, educational level, etc.) • CLD object contains collection-description (C-D) object (analytic finding-aid; metadata repository; OPAC) • C-D object contains subject scheme identifier (DDC, LCSH, etc.)
Resource discovery • Term object in disambiguated set contains subject scheme identifier • C-D object matches to term objects • Terms or classification numbers from matched term objects are available for each C-D object in landscape • Terms or numbers can be input to C-D target subject or classified indexes for item-level resource discovery
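One way to picture the matching of C-D objects to term objects is the sketch below. It assumes each C-D object records its subject scheme and whether its target offers a classified (number) index; the class names, the `classified` flag and the sample data are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermObject:
    term: str    # subject term
    scheme: str  # subject scheme identifier
    ddc: str     # DDC number mapped to the term


@dataclass(frozen=True)
class CollectionDescription:
    name: str
    scheme: str       # subject scheme used by the target's index
    classified: bool  # hypothetical flag: target has a classified index


def queries_for(cd_objects, terms):
    """For each C-D object, pick the terms or numbers its index accepts."""
    plan = {}
    for cd in cd_objects:
        matching = [t for t in terms if t.scheme == cd.scheme]
        # Classified indexes take numbers; subject indexes take terms.
        plan[cd.name] = [t.ddc if cd.classified else t.term for t in matching]
    return plan


# Invented disambiguated term set and landscape.
terms = [
    TermObject("Lotus (plants)", "DDC", "583.2"),
    TermObject("Lotus", "LCSH", "583.2"),
]
cd_objects = [
    CollectionDescription("University OPAC", "DDC", classified=True),
    CollectionDescription("Local studies finding-aid", "LCSH", classified=False),
]

print(queries_for(cd_objects, terms))
# {'University OPAC': ['583.2'], 'Local studies finding-aid': ['Lotus']}
```

The resulting values are what would be passed to each target for item-level searching.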
How high? • Top 1000 divisions of DDC • Close cross-domain semantic mapping because concepts are broad • But input term likely to be more precise (low level) • Scaling issues for deeper vocabulary sets • High level not generally useful for item-level discovery or special collections
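The loss of precision at the top 1000 level can be illustrated with a small sketch. It assumes the top 1000 divisions correspond to the three digits before the decimal point; the function name and sample numbers are invented for the example.

```python
from collections import Counter


def top_level(ddc):
    """Reduce a DDC number to its three-digit, top 1000 level number."""
    integer_part = ddc.split(".")[0]
    if len(integer_part) != 3 or not integer_part.isdigit():
        raise ValueError(f"not a well-formed DDC number: {ddc!r}")
    return integer_part


# Invented low-level numbers, as an input term might map to.
numbers = ["583.2", "583.4", "583.98", "629.222"]
print(Counter(top_level(n) for n in numbers))
# Counter({'583': 3, '629': 1})
```

Three distinct low-level numbers collapse into a single high-level one, which is why a high-level mapping helps cross-domain agreement but not item-level discovery.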
Role of DDC • Common base of terminology set mappings: all mapped to DDC • Simple mechanism for increasing recall: decimal notation traverses taxonomy • Classification of collection-level descriptions: specifically, and via subject-strength schemes
Issues with DDC • Digital awareness: digital objects not in scope of provenance • Version control: continuous or step-wise updating? • Terminology: U.S. bias in spellings; synonyms • Notation: standard subdivisions and inconsistency • IPR
Thank you • g.dunsire@strath.ac.uk • HILT: http://hilt.cdlr.strath.ac.uk/ • CDLR: http://cdlr.strath.ac.uk