FRBR Work Match activities at DBC: Where are we and where are we going? Author: Hans-Henrik Lund (hhl@dbc.dk). ELAG 2002, Roma, 17.04.2002
What do we have
• A record collection of 16.5 million MARC records from 172 different ’libraries’
• Including:
  • the Danish national bibliography: 1.4 million
  • BNB: 1.3 million
  • LC: 3.3 million
• All converted to danMARC2
What do we want
• Make this collection available to the end user as a ”work” collection (and not as a collection of records).
• We have defined two works as different if the language or the material type is different.
How do we do this
• We have matched the entire database at ”edition/manifestation” level (in clusters). If you want the system to handle orders, it is important to maintain the edition level.
• By making clusters based on manifestation, the logical number of records was reduced from 16.5 million to 12.3 million.
From manifestation to work
• The result of a search is matched on the fly at work level (in the test version).
• An author search for ”Stephen King” yields 362 cataloguing records, 231 manifestation clusters and 102 works.
• The benefit of this approach is that we can change the criteria for a ”work” and test them.
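The on-the-fly grouping could look roughly like the sketch below. It assumes each manifestation cluster in a search result has already been reduced to a small dictionary. The field names (`author`, `title`, `language`, `material`), the use of author plus title as the work identity, and the sample data are illustrative assumptions, not DBC's actual criteria. Only the rule that a different language or material type means a different work comes from the slides.

```python
from collections import defaultdict

def work_key(cluster):
    """Hypothetical work key: author and title identify the work,
    language and material type keep works apart (rule from the slides)."""
    return (
        cluster["author"].upper(),
        cluster["title"].upper(),
        cluster["language"],
        cluster["material"],
    )

def group_into_works(clusters):
    """Group the manifestation clusters of one search result by work key."""
    works = defaultdict(list)
    for cluster in clusters:
        works[work_key(cluster)].append(cluster)
    return dict(works)

# Made-up sample data, for illustration only.
result = [
    {"author": "Author A", "title": "Title X", "language": "eng", "material": "book"},
    {"author": "Author A", "title": "Title X", "language": "eng", "material": "book"},
    {"author": "Author A", "title": "Title X", "language": "dan", "material": "book"},
    {"author": "Author A", "title": "Title X", "language": "eng", "material": "audiobook"},
]
works = group_into_works(result)
print(len(result), "manifestations grouped into", len(works), "works")
```

Because the key is computed per search result, changing the work criteria only means changing `work_key`; nothing stored has to be rebuilt.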
The match program
• The match program works in two phases.
• First, it makes a key. This key works like a hash key. It can be based on the title and/or a known identifier (ISSN, ISBN etc.).
• Second, it takes two records at a time with the same key and compares them according to the rules of the match-script.
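A minimal sketch of the two phases, under stated assumptions: records are plain dictionaries, the key prefers an ISBN and falls back to the title, and `records_match` is a stand-in for the match-script rules (here just title and year). None of these names or rules are taken from the DBC program, and the ISBN values are placeholders.

```python
from collections import defaultdict
from itertools import combinations

def make_key(record):
    """Phase 1: build a blocking key from an identifier, else from the title."""
    if record.get("isbn"):
        return "isbn:" + record["isbn"].replace("-", "")
    return "title:" + record["title"].upper()

def records_match(a, b):
    """Phase 2 stand-in for the match-script rules (assumed rule set)."""
    return a["title"].upper() == b["title"].upper() and a.get("year") == b.get("year")

def find_matches(records):
    """Compare records pairwise, but only within the same key bucket."""
    buckets = defaultdict(list)
    for record in records:
        buckets[make_key(record)].append(record)
    pairs = []
    for candidates in buckets.values():
        for a, b in combinations(candidates, 2):
            if records_match(a, b):
                pairs.append((a["id"], b["id"]))
    return pairs

# Placeholder records, for illustration only.
records = [
    {"id": 1, "isbn": "87-00-00000-1", "title": "Example title", "year": "1980"},
    {"id": 2, "isbn": "8700000001", "title": "Example Title", "year": "1980"},
    {"id": 3, "isbn": None, "title": "Another title", "year": "1990"},
]
matches = find_matches(records)
```

The point of the key is to avoid comparing every record with every other record: the expensive pairwise comparison only runs inside each bucket.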
Normalization of the text
• København’s freds kommité → KOBENHAVNS FREDS KOMMITE
• Hans Krüger → HANS KRYGER
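The two examples can be reproduced with a small normalization function. This is a sketch inferred from the examples only: the mapping of `ø` to `O` and `ü` to `Y`, the removal of accents and punctuation, and the uppercasing are read off the slide, and the real rule set is presumably larger.

```python
import unicodedata

# Letter substitutions seen in the slide examples (assumed to be a subset).
SPECIAL = str.maketrans({"ø": "o", "Ø": "O", "ü": "y", "Ü": "Y"})

def normalize(text):
    """Fold special letters, drop accents and punctuation, uppercase."""
    text = unicodedata.normalize("NFC", text).translate(SPECIAL)
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_accents if c.isalnum() or c.isspace())
    return " ".join(kept.upper().split())

print(normalize("København’s freds kommité"))
print(normalize("Hans Krüger"))
```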
3 different operands
• alike
• not_alike
• alike_or_missing
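One plausible reading of the three operands is sketched below. The slides give only the names, so the treatment of missing values (represented here as `None`) is an assumption: `alike` and `not_alike` are taken to require both values, while `alike_or_missing` tolerates a missing value on either side.

```python
from typing import Optional

def alike(a: Optional[str], b: Optional[str]) -> bool:
    """Both values are present and equal."""
    return a is not None and b is not None and a == b

def not_alike(a: Optional[str], b: Optional[str]) -> bool:
    """Both values are present and differ (assumed reading)."""
    return a is not None and b is not None and a != b

def alike_or_missing(a: Optional[str], b: Optional[str]) -> bool:
    """Values are equal, or at least one of them is missing."""
    return a is None or b is None or a == b
```

Under this reading, `alike_or_missing` suits fields that are often left out of a record, since a missing value does not block the match.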
Logical fields
• A logical field containing data from many subfields
  • maintitle = 245*a | 239*t | 240*a & 240*d & 240*e & 240*f & 240*h
• A logical field containing only parts of a subfield
  • author = 700*a & 700*h:1
  • 100 *a Rifbjerg *h Klaus = 100 *a Rifbjerg *h K.
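The following sketch shows how such logical fields might be evaluated. It assumes that `|` means "use the first alternative that yields data", that `&` concatenates subfields, and that `:1` keeps only the first character; the slides do not spell out these semantics. The record model (a dictionary of tag to subfield dictionary, ignoring repeated fields) is a simplification chosen for the example.

```python
def subfield(record, tag, code):
    """Return the subfield value, or an empty string when it is absent."""
    return record.get(tag, {}).get(code, "")

def main_title(record):
    """maintitle = 245*a | 239*t | 240*a & 240*d & 240*e & 240*f & 240*h"""
    alternatives = [
        [("245", "a")],
        [("239", "t")],
        [("240", code) for code in "adefh"],
    ]
    for parts in alternatives:
        values = [subfield(record, tag, code) for tag, code in parts]
        joined = " ".join(v for v in values if v)
        if joined:
            return joined
    return ""

def author(record, tag="700"):
    """author = *a & *h:1 (surname plus first character of the forename)."""
    surname = subfield(record, tag, "a")
    initial = subfield(record, tag, "h")[:1]
    return " ".join(p for p in (surname, initial) if p)

full = {"100": {"a": "Rifbjerg", "h": "Klaus"}}
short = {"100": {"a": "Rifbjerg", "h": "K."}}
print(author(full, tag="100") == author(short, tag="100"))
```

Reducing the forename to one character is what lets the two Rifbjerg records compare as equal.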
Conversion of text
• 250*a
  • udg:edition + ed:edition + udgave:edition ...
  • first:1 + 1st:1 + second:2 + 3rd:3 ...
  • rev:revised + revideret:revised
• 041*a
  • und: + mul: + mis:
• 260*b
  • Det Schønbergske forlag:schønberg
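The conversion tables can be pictured as per-subfield lookup dictionaries. The sketch below uses only the pairs listed on the slide; the function name `convert`, the tokenization, and the reading of an empty right-hand side (as in `und:`) as "drop this value" are assumptions.

```python
import re

# source:target pairs from the slide, one table per subfield.
EDITION_WORDS = {  # 250*a
    "udg": "edition", "ed": "edition", "udgave": "edition",
    "first": "1", "1st": "1", "second": "2", "3rd": "3",
    "rev": "revised", "revideret": "revised",
}
LANGUAGE_CODES = {"und": "", "mul": "", "mis": ""}  # 041*a
PUBLISHERS = {"det schønbergske forlag": "schønberg"}  # 260*b

def convert(text, table):
    """Replace a whole phrase if listed, else replace word by word."""
    tokens = re.findall(r"\w+", text.lower())
    whole = " ".join(tokens)
    if whole in table:
        return table[whole]
    converted = [table.get(token, token) for token in tokens]
    return " ".join(word for word in converted if word)

print(convert("Second rev. udg.", EDITION_WORDS))
print(convert("Det Schønbergske forlag", PUBLISHERS))
```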
Edition comparison
• We make a temporary field containing only the words recognized from the edition field (after text conversion)
• “EDITION” & ( @digit | “REVISED” | “NEW” )
• 250 00 *a 3. ed. *x 12. reprint. = EDITION 3
• 250 00 *a 3. ed.,4. rep. = EDITION 3
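A sketch of how the temporary edition field might be derived from 250*a. The slide shows only the pattern and two results, so the selection rule used here (take the number or the word REVISED/NEW standing directly next to the edition word, preferring the one before it) is an assumption that happens to reproduce both examples. The word table is a small subset chosen for illustration.

```python
import re

# Words recognized after text conversion (illustrative subset).
WORDS = {
    "ed": "EDITION", "udg": "EDITION", "udgave": "EDITION", "edition": "EDITION",
    "rev": "REVISED", "revised": "REVISED", "revideret": "REVISED",
    "new": "NEW",
}

def edition_key(subfield_a):
    """Return e.g. 'EDITION 3', or None if the pattern is not satisfied."""
    tokens = [WORDS.get(t, t) for t in re.findall(r"\w+", subfield_a.lower())]
    if "EDITION" not in tokens:
        return None
    i = tokens.index("EDITION")
    # Look at the token before and the token after the edition word.
    neighbours = tokens[max(i - 1, 0):i] + tokens[i + 1:i + 2]
    for token in neighbours:
        if token.isdigit() or token in ("REVISED", "NEW"):
            return "EDITION " + token
    return None

print(edition_key("3. ed."))
print(edition_key("3. ed.,4. rep."))
```

Reprint information, whether in *x or after the edition statement in *a, does not reach the key, so a third edition and its fourth reprint compare as the same edition.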
Year comparison
• 260 *c 1980 = 260 *c 1979-1982
• 260 *c 1998-2001 = 260 *c 1999-2002
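Both examples are consistent with treating 260*c as a span of years and calling two values alike when the spans overlap. That interpretation is an assumption drawn from the examples, and the sketch below implements it by extracting every four-digit year from each value.

```python
import re

def year_span(value):
    """Return (first_year, last_year) for a year or a year range, else None."""
    years = [int(y) for y in re.findall(r"\d{4}", value)]
    if not years:
        return None
    return min(years), max(years)

def years_alike(a, b):
    """True when the two year spans overlap (assumed matching rule)."""
    span_a, span_b = year_span(a), year_span(b)
    if span_a is None or span_b is None:
        return False
    return span_a[0] <= span_b[1] and span_b[0] <= span_a[1]

print(years_alike("1980", "1979-1982"))
print(years_alike("1998-2001", "1999-2002"))
```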
Problems
• Different cataloguing practices
• Errors (typing etc.)
• More than one work in the same MARC record
• A CD can contain works by many different artists
Development strategy
• The syntax and features of the match-script have been developed along with the project, in collaboration between the librarian and the programmer.
• The librarian had an online test program for the match-script.
The future?
• It depends on the result of this project
• Perhaps the cost/benefit ratio is not good enough
• Perhaps we will actually make a publication database with records stored as works?
The script language • An example
Some test results
• Boligsikring = 145 manifestations, 29 works
• Mankell and bøger and dansk = 44 manifestations, 19 works
• Verdi and opera and cd = 111 manifestations, 35 works