October 2014 OCLC. Janifer Gatenby. EMEA Program Manager Metadata OCLC. ISNI Annual General Assembly, Frankfurt 2014. ISNI Assignment. Assigned 8 million. Provisional: Possible 701,157. Provisional: Unassigned 9,953,505. ISNI Assignment: Batch loading. Independent matching sources.
October 2014 OCLC. Janifer Gatenby, EMEA Program Manager Metadata, OCLC. ISNI Annual General Assembly, Frankfurt 2014. ISNI Assignment
Assigned 8 million • Provisional: Possible 701,157 • Provisional: Unassigned 9,953,505
ISNI Assignment: Batch loading • Independent matching sources • 3 VIAF sources
ISNI Matching
• Name • Title • Partial title • Rare title word • Date • Publisher • Personal affiliation • Organisation affiliation • ISBN, ISWC, ISAN, DOI + • Other name identifier, e.g. IPI, VIAF, IPD • Instrument • Linked entities • Dewey classification
• Scores are collected from each judge (ice skating style)
• Lowered for common surnames and common titles
• Score > .85 = match
• Score > .6 but < .85 = possible match
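The two thresholds on this slide can be sketched as a small classifier. This is only an illustration: the slide does not say how the judges' scores are combined into one number, nor how a score of exactly .85 or .6 is treated, so the function below takes an already combined score and assumes .85 counts as a possible match and .6 or lower as no match. The names `Outcome` and `classify` are hypothetical.

```python
from enum import Enum

MATCH_THRESHOLD = 0.85     # slide: score > .85 = match
POSSIBLE_THRESHOLD = 0.6   # slide: score > .6 but < .85 = possible match


class Outcome(Enum):
    MATCH = "match"
    POSSIBLE = "possible match"
    NO_MATCH = "no match"


def classify(score: float) -> Outcome:
    """Map a combined matching score to an outcome.

    Assumption: the boundary values are not defined on the slide, so
    exactly 0.85 is treated as a possible match and 0.6 or below as no match.
    """
    if score > MATCH_THRESHOLD:
        return Outcome.MATCH
    if score > POSSIBLE_THRESHOLD:
        return Outcome.POSSIBLE
    return Outcome.NO_MATCH
```

For example, `classify(0.7)` gives `Outcome.POSSIBLE`, which corresponds to the "Provisional: Possible" records counted earlier.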
ISNI Assignment: Batch loading • Unique name • Single source
Central database - Trust
+ % confidence • Publicly accessible: www.isni.org • Assignment is curated • Authoritative • Unique • Trustful • Persistent
Assigned ≈ 8 million • Provisional: Possible ≈ 638,000 • Provisional: Unassigned 9+ million
• Matching algorithms • Data sampling • Anomaly checks • Quality assurance processes • End user input notes
- % confidence • Assignment only if confident
Confidence
• The two main problems for maintaining persistence are:
  • duplicates needing to be merged
  • undifferentiated identities needing to be split
• ISNI errs on the side of making duplicates rather than mixed identities
• Thus the batch load process (usually) makes a provisional record:
  • where there is no match (for fear of making a duplicate assignment)
  • where there is a low confidence match (for fear of making a mixed identity or a duplicate assignment)
  • where a matching record already has another local ID for the same source, regardless of the strength of the match (for fear of making a mixed identity)
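The three provisional cases above can be read as a decision rule. The sketch below is a simplified reading, not the actual batch load program: it ignores the exceptions implied by "(usually)", it assumes "low confidence" means a score not above the .85 match threshold from the matching slide, and the `Candidate` type, the function name, the `"matched"` label and the source codes used in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

MATCH_THRESHOLD = 0.85  # assumption: taken from the matching slide (score > .85 = match)


@dataclass(frozen=True)
class Candidate:
    """Best-scoring existing database record for an incoming batch record."""
    score: float
    sources_with_local_id: FrozenSet[str]  # sources that already hold a local ID on this record


def batch_load_decision(incoming_source: str, best: Optional[Candidate]) -> str:
    """Return "provisional" or "matched" for one incoming batch record."""
    if best is None:
        # No match: stay provisional for fear of a duplicate assignment.
        return "provisional"
    if incoming_source in best.sources_with_local_id:
        # Same source already has another local ID on the record:
        # stay provisional regardless of score, for fear of a mixed identity.
        return "provisional"
    if best.score <= MATCH_THRESHOLD:
        # Low confidence match: stay provisional.
        return "provisional"
    return "matched"
```

With illustrative source codes, `batch_load_decision("LC", Candidate(0.95, frozenset({"LC"})))` stays provisional even though the score is high, because the same source is already represented on the candidate record.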
Procedures for maximizing assignment
• Refinement of matching algorithms
  • E.g. introduced rare title word
  • Now ignoring date of birth 1900
• Re-import program
  • Rematch with new rules
  • Rematch after new data added
• ISNI Quality Team: Data sampling
  • Assessing impact of single source
• Recommendations for program changes
• New criteria
  • Assessing uncommon surname assignment
  • Rules for online rich assignment
Online: Guarantee assignment – Personal Name
• ISNIs will be automatically assigned where there are no possible matches in these cases:
  • There are matches with a database record with a different source
  • A personal name is unique and includes a surname and forename
  • The request includes an “isNot” statement
  • The metadata supplied is considered rich as per these cases:
    • Full date of birth and death supplied
    • Year of birth + 1 title or instrument + 1 related name (co-author or affiliated institution)
    • 1 title or instrument + 1 external URL link of type encyclopaedia or home page (not social network page) + 1 related name (co-author or affiliated institution)
  • The request is resolving a possible match by including a PPN
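The three "rich metadata" cases for personal names can be sketched as a check on a request record. Everything here is an assumption made for illustration: the `PersonRequest` fields are not the ISNI request schema, a "full" date is assumed to be written as YYYY-MM-DD, and `reference_links` is assumed to hold only encyclopaedia or home page URLs that have already been screened (no social network pages).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed form of a "full" date
YEAR_PREFIX = re.compile(r"\d{4}")            # a date that at least starts with a year


@dataclass
class PersonRequest:
    """Hypothetical, simplified view of an online request for a personal name."""
    birth_date: Optional[str] = None          # "1950" or "1950-03-14"
    death_date: Optional[str] = None
    titles: List[str] = field(default_factory=list)
    instruments: List[str] = field(default_factory=list)
    related_names: List[str] = field(default_factory=list)    # co-authors, affiliated institutions
    reference_links: List[str] = field(default_factory=list)  # encyclopaedia or home page URLs only


def is_full_date(value: Optional[str]) -> bool:
    return value is not None and FULL_DATE.fullmatch(value) is not None


def has_year(value: Optional[str]) -> bool:
    return value is not None and YEAR_PREFIX.match(value) is not None


def is_rich(req: PersonRequest) -> bool:
    """True if the request meets any of the three rich-metadata cases."""
    has_work = bool(req.titles or req.instruments)
    has_related = bool(req.related_names)
    # Case 1: full date of birth and death supplied.
    if is_full_date(req.birth_date) and is_full_date(req.death_date):
        return True
    # Case 2: year of birth + 1 title or instrument + 1 related name.
    if has_year(req.birth_date) and has_work and has_related:
        return True
    # Case 3: 1 title or instrument + 1 qualifying URL + 1 related name.
    return has_work and bool(req.reference_links) and has_related
```

A request with only a year of birth and one title is not rich under this reading; adding one co-author makes it qualify under the second case.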
Online: Guarantee assignment – Organisation Name
• ISNIs will be automatically assigned where there are no possible matches in these cases:
  • There are matches with a database record with a different source
  • An organisation name is unique and does not consist only of abbreviations
  • The metadata supplied is considered rich as per these cases:
    • Includes LOCODE & organisation type & organisation URL
  • The request is resolving a possible match by including a PPN
Maximizing assignment
• Enter a request record online (Web page or via API)
• Batch loaded records – passive method
  • Quality Team manual fixes
  • OCLC periodic re-match runs
  • Matches from later batch loading & online activity
• Batch loaded records – active method
  • Resolve possible matches found by the system
  • Search the database for candidate records for merging
  • Enrich a record with URLs to external sources such as author’s web pages, Wikipedia, IMDB, MusicBrainz, Discogs, etc.
Correcting and enriching • These are all the same person. The second has an incorrect date of birth (DOB = 1900).
Enriching • You can add a source note or general note to any database record; your code does not need to be present.
Reporting errors • The general note will trigger an email to the ISNI Quality Team for attention.
Atom Pub API (Machine to machine)
• Requests and replacements (you can replace your existing data citing local identifier)
• Request
  • Atom Pub Header
  • Content = Request in the ISNI XML Request schema
• Documentation
  • ISNI Atom Pub API guidlines.doc
  • ISNI request.xsd (XML schema)
  • ISNI request schema.doc (describes the schema)
  • ISNI response.xsd (XML schema)
  • ISNI response schema.doc (describes the schema)
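The "Atom Pub header + content" structure can be sketched with the Python standard library. This is a generic Atom entry wrapper under stated assumptions, not the ISNI API: the slides do not say which Atom elements the service expects, so the entry carries only the `id`, `title` and `updated` elements that the Atom format itself requires, and the payload is passed in as an opaque string. The function name and the `<Request>` payload in the usage note are placeholders; the real payload must follow ISNI request.xsd and the API guidelines document.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Use an explicit "atom" prefix so the un-namespaced payload does not
# inherit the Atom namespace when the document is parsed again.
ET.register_namespace("atom", ATOM_NS)


def wrap_in_atom_entry(request_xml: str, entry_id: str, updated: str,
                       title: str = "ISNI request") -> bytes:
    """Wrap an XML request payload in a minimal Atom entry.

    request_xml: the request document as a string (placeholder content here).
    entry_id:    an IRI identifying the entry, e.g. a urn:uuid value.
    updated:     an RFC 3339 timestamp, e.g. "2014-10-01T12:00:00Z".
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content",
                            {"type": "application/xml"})
    content.append(ET.fromstring(request_xml))  # raises ParseError on malformed XML
    return ET.tostring(entry, encoding="utf-8", xml_declaration=True)
```

Calling `wrap_in_atom_entry("<Request><placeholder/></Request>", "urn:uuid:00000000-0000-0000-0000-000000000000", "2014-10-01T12:00:00Z")` returns the serialized entry as bytes, ready to be sent as the body of a request to the service.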
What is requested from ISNI Data Contributors?
• Act on notifications (new assignments, changed assignments, errors and queries)
• Assist in reviewing possible matches (exact matches, then possible matches)
• Add a note to any record found with an error
• Supply URI
• Ingest ISNIs
• Keep data up to date (become a RAG or use the services of an existing one)