InterParty: Common Metadata and Linking Public Identities

InterParty: Common Metadata and Linking Public Identities. A presentation to the final InterParty Seminar The Hague 13 June 2003 Robina Clayphan The British Library. Outline. Common Metadata for Public Identities - background and issues Proposed Common Metadata set InterParty Links


Presentation Transcript


  1. InterParty InterParty: Common Metadata and Linking Public Identities A presentation to the final InterParty Seminar The Hague 13 June 2003 Robina Clayphan The British Library

  2. InterParty Outline • Common Metadata for Public Identities - background and issues • Proposed Common Metadata set • InterParty Links • Proposed Link Record

  3. InterParty Common functionality and metadata • InterParty will be a network of InterParty members (IPMs) who have databases containing party metadata • All potential members share a common need for accurate metadata to support the identification of parties • All member databases have identification of parties as a common function • Sharing access to party metadata between databases can substantially reduce costs of data creation and improve data quality • Creating links will add value to the metadata held in separate systems

  4. InterParty What common metadata is required? • We need sufficient metadata to allow “disambiguation” between parties with shared or similar attributes • … because different people use the same name [Diagram labels: “Is known as John Williams”; “Is not the same person as”]

  5. InterParty What common metadata is required? • We need sufficient metadata to allow “collocation” of the same party with different attributes • … and the same person uses different names: in different contexts (e.g. Iain Banks and Iain M Banks); to hide their identity (use of pseudonyms); sometimes it’s simply a matter of language (Mao Tse Tung or Mao Zedong?) [Diagram labels: “Is also known as John Williams”; “Is also known as a member of the group called ‘Sky’”]

  6. InterParty What common metadata is required? • How much is sufficient? The answer is contextual • Metadata about people (as about anything else) is essentially unbounded • A unique identifier may be enough (if you trust its source) • For InterParty - we need enough metadata to make a comparison between parties in different databases in order to make a decision [Diagram label: “The same person?”]

  7. InterParty The InterParty approach • InterParty is concerned with “public identities” not “persons” • InterParty “Common Metadata” is a subset of what may be publicly known [Diagram labels: Person; Public Identity; “Personal” metadata: Publicly known, Not publicly known, Sensitive]

  8. InterParty The nature of Public Identities • One person usually has only one public identity • But some people have more than one, with different attributes • For example, may write under a pseudonym [Diagram labels: Person; Public Identity; Public Identity]

  9. InterParty The nature of Public Identities • Relationships between real persons and public identities are out of scope [Diagram labels: Person; Public Identity; Public Identity]

  10. InterParty InterParty and Public Identities • InterParty is concerned with Public Identities in different namespaces • Within the InterParty network, each Public Identity will require a Public Identity Identifier or “PIDI”. This is a combination of a Namespace and a Unique Identifier within that namespace [Diagram labels: Public Identity with PIDI “Namespace A : Brian Green”; Public Identity with PIDI “Namespace B : 876X5”]
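
A minimal sketch of the PIDI idea described on this slide, assuming a simple two-part value. The class name `PIDI`, its field names and the `namespace:identifier` string form are illustrative assumptions, not part of the InterParty proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIDI:
    """Hypothetical model: a namespace plus an identifier unique within it."""
    namespace: str
    identifier: str

    def __str__(self) -> str:
        # Assumed display form; the slides do not fix a syntax.
        return f"{self.namespace}:{self.identifier}"


# The two PIDIs shown on the slide.
a = PIDI("Namespace A", "Brian Green")
b = PIDI("Namespace B", "876X5")

# The same local identifier in another namespace is a different PIDI.
same_local_id = PIDI("Namespace B", "Brian Green")

print(a, "|", b)
print(a == same_local_id)
```

Pairing the identifier with its namespace is what lets each member keep its own local identifiers without colliding with another member's.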

  11. InterParty Common Metadata and Public Identities • Metadata that IPMs are willing and able to share over the network • Information that is in the public domain • Excluding information that is private or sensitive

  12. InterParty Common Metadata • Designed to be a practicable set of elements: • To enable disambiguation • To enable the creation of Links asserting a relationship between Public Identity records in IPM databases • That IPMs will be willing and able to provide • it is not expected that all IPMs will be able to provide all the elements • Feasibility is the biggest issue • Certainly for the demonstrator we will not be able to achieve what we might see as “the ideal solution”

  13. InterParty Common Metadata “standards” • Need to define appropriate rules and format conventions • The more standardised the Common Metadata (in terms, for example, of controlled “values”) the higher its value – but the higher its cost • To what extent will the “common metadata” need to adhere to common forms of semantic or syntactical expression? • Manual links: only to a limited extent, if its function is primarily for human interpretation • Automated links: algorithmically-based linking would require more standardised “common metadata”

  14. InterParty Metadata questionnaire • Unique identifier? Persistent? • Standard & variant name forms? • Party types? Corporate? Personal? • Pseudonyms - single or multiple public Ids? • Other standard identifiers? • Dates of birth/death? • Dates of incorporation? • Period of activity? • Address/contact details? • Works? • Roles? Author? Composer? Artist? • Associations? • Other distinguishing metadata? Nationality? Citizenship?

  15. InterParty Limitations to common metadata • Is the element there? • Is it held in a discrete field that could be mapped to a common metadata set? • Many elements missing from individual data sets, e.g. addresses, dates, works, roles, etc. • Where data is held there is variable practice in how it is held, e.g. works data contained within broader Notes fields; data held outside the “party file” as links within a given IPM’s working context (library authority file links to bibliographic files) • Options for automated, algorithm-based linking are limited

  16. InterParty Common metadata - conclusions • There are unique party records to be linked • There are sufficient metadata elements in different systems to support judgements about links • Different databases bring different strengths - with potential for enrichment and more accurate identification • examples to be shown in demonstrator • Proposal for a reduced metadata set based on areas of greatest commonality • Assumption that the system is primarily a manual search/edit facility to support accurate identification - with Links themselves built via usage

  17. InterParty Proposed Common Metadata Set: elements defined for the demonstrator system • PIDI: an identifier assigned to a public identity by an IPM that is unique within the domain of that IPM; the unique ID comprises Namespace/identifier to ensure it is unique on the network; must be persistent, though the associated metadata will typically change • Standard Name: the standard, preferred or usual name by which a Public Identity is known • Party Type: nature of the public identity; categories: personal, corporate, unknown • Variant Names: different forms of names belonging to the public identity • Related Ids: may contain names of other public identities linked to the public identity within the IPM’s own system, e.g. pseudonyms • Date of birth: usually 4 digit year of birth

  18. InterParty Proposed Common Metadata Set: elements defined for the demonstrator system • Works: works with which the Public Identity is associated, represented by title; accompanied by date & role of Public Identity if known • Address/contact details: permissible only where Party Type = Corporate • Notes: includes other identifiers, associations, roles, other metadata that may include works, dates, etc. where not recorded in a discrete field for display • InterParty Links themselves: access to the Links is a key element of Common Metadata • Currently, the only mandatory elements are expected to be the PIDI and a Name • BUT more data will be essential to support the task of identification
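
The proposed element set on the two slides above could be sketched as a record type along these lines. This is only an illustration under assumptions: the class name, field names, Python types and the choice to raise `ValueError` are hypothetical; the slides fix only the element names, that PIDI and a Name are expected to be mandatory, the three Party Type categories, and that address details are permissible only for corporate parties.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Party Type categories listed on the slide.
PARTY_TYPES = {"personal", "corporate", "unknown"}


@dataclass
class CommonMetadata:
    """Hypothetical record for one Public Identity's common metadata."""
    pidi: str                      # mandatory, e.g. "NSA:123456" (assumed syntax)
    standard_name: str             # mandatory
    party_type: str = "unknown"
    variant_names: List[str] = field(default_factory=list)
    related_ids: List[str] = field(default_factory=list)
    date_of_birth: Optional[str] = None   # usually a 4 digit year
    works: List[str] = field(default_factory=list)
    address: Optional[str] = None
    notes: str = ""

    def __post_init__(self) -> None:
        if not self.pidi or not self.standard_name:
            raise ValueError("PIDI and a Name are mandatory")
        if self.party_type not in PARTY_TYPES:
            raise ValueError(f"unknown party type: {self.party_type!r}")
        if self.address is not None and self.party_type != "corporate":
            raise ValueError("address is permissible only for corporate parties")


# A record with only the mandatory elements plus one related identity.
record = CommonMetadata(
    pidi="NSA:123456",
    standard_name="Ruth Rendell",
    party_type="personal",
    related_ids=["Barbara Vine"],
)
print(record.standard_name, record.related_ids)
```

Leaving every other element optional mirrors the point that not all IPMs will be able to supply all elements.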

  19. InterParty InterParty Links

  20. InterParty InterParty’s added value proposition - InterParty Links • All metadata is ultimately about expressing relationships that someone claims to exist • e.g. Book A ‘has’ Author B • All the participating databases in InterParty will express such relationships internally • Sometimes the same relationships, sometimes different relationships • InterParty will create value to the extent that it enables new relationships to be expressed between databases... • e.g. Person X in Namespace A ‘is the same as’ Person Y in Namespace B • … and recorded as an “InterParty Link”

  21. InterParty Establishing new relationships • The establishment of InterParty Links will require effort and judgement, hence the need for enough metadata to make a decision about relationships • These Links need to be recorded and made available to others to be of real value, therefore InterParty will need a facility to record the decision and store it as a Link record • For the demonstrator this new Link data will be held centrally as part of the “aggregated metadata” [Diagram label: “Is the same person”]

  22. InterParty InterParty Links • An InterParty Link is the assertion of a relationship between two PIDIs • Links may only be made by the owner of one of the PIDIs… • …and endorsed or disputed by the owner of the other PIDI • The assertion of a relationship between the two PIDIs is held in a single Link record • For the purposes of the demonstrator project, the relationships expressed in a Link will be restricted to “is”, “is complex” and “is not” [Diagram: PIDI “Namespace A : Brian Green” is PIDI “Namespace B : 876X5”]

  23. InterParty Types of Relationship • PIDI 1 “is” PIDI 2: it is asserted that PIDI 1 and PIDI 2 have a functional and reciprocal equivalence for the purposes of InterParty • PIDI 1 “is not” PIDI 2: it is asserted that PIDI 1 does not have a functional equivalence with PIDI 2 despite appearances • PIDI 1 “has a complex relationship with” PIDI 2: it is asserted that PIDI 1 has a partial equivalence or complex relationship with PIDI 2 that is not necessarily reciprocal

  24. InterParty What is a “Complex” assertion? • IPMs hold records for parties and names in different ways - there may not be easy one-to-one relationships between databases • For example: IPM A assigns a single PIDI for Ruth Rendell, with a note that Barbara Vine is a pseudonym of Ruth Rendell; IPM B assigns separate PIDIs for both Ruth Rendell and Barbara Vine (with or without an internal assertion between them); in this case the “complex” relationship must be used • There are numerous other circumstances where IPMs may take a different approach to identification • It is not proposed to define all the relationships covered by “Complex” any more precisely for the demonstrator

  25. InterParty Establishing a Link • The status of a link will relate to how it is established and to what degree the two IPM owners have been involved • There are 4 status types • “Proposed”: the relationship has been asserted by one IPM owner only • “Authorised”: concurring assertions have been made by both IPM owners • “Disputed”: assertions have been made by both IPM owners but they do not concur • “Inferred”: generated automatically based on inference from “is” relationships only
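
The first three status types follow directly from the two owners' assertions, which can be sketched as below. The function name `link_status` and the use of `None` for "no assertion yet" are assumptions for illustration; “Inferred” is left out because it is generated automatically, not from owner assertions.

```python
from typing import Optional


def link_status(assertion_1: Optional[str], assertion_2: Optional[str]) -> str:
    """Derive a status from the two owners' assertions.

    Each argument is "is", "is not" or "is complex", or None if that
    owner has not yet made an assertion (an assumed convention).
    """
    made = [a for a in (assertion_1, assertion_2) if a is not None]
    if not made:
        raise ValueError("a Link needs an assertion from at least one owner")
    if len(made) == 1:
        return "Proposed"        # asserted by one IPM owner only
    if made[0] == made[1]:
        return "Authorised"      # both owners concur
    return "Disputed"            # both asserted, but they do not concur


print(link_status("is", None))
print(link_status("is", "is"))
print(link_status("is", "is not"))
```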

  26. InterParty Inferred Links • An inferred relationship: IF one PIDI “is” a second AND that second PIDI “is” a third, THEN the first “is” the third [Diagram PIDIs: NSA:123456; NSB:Brian Green; NSC:876X54]
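
One way to generate inferred Links is to group PIDIs connected by “is” assertions and propose every pair in a group that has not been asserted directly. The sketch below uses a union-find structure for the grouping; that technique, the function name `infer_is_links`, and the particular pairing of the three example PIDIs are assumptions, since the slide only lists the PIDIs and the IF/AND/THEN pattern.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def infer_is_links(is_links: Iterable[Pair]) -> Set[Pair]:
    """Return pairs implied by chains of "is" links but not asserted directly."""
    links = list(is_links)
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Find the group representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every directly linked pair.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each group.
    groups: Dict[str, Set[str]] = {}
    for pidi in list(parent):
        groups.setdefault(find(pidi), set()).add(pidi)

    asserted = {frozenset(link) for link in links}
    inferred: Set[Pair] = set()
    for members in groups.values():
        for pair in combinations(sorted(members), 2):
            if frozenset(pair) not in asserted:
                inferred.add(pair)
    return inferred


# Assumed pairing of the slide's three PIDIs.
asserted_links = [
    ("NSA:123456", "NSB:Brian Green"),
    ("NSB:Brian Green", "NSC:876X54"),
]
print(infer_is_links(asserted_links))
```

Only “is” links are passed in, matching the restriction that inference is based on “is” relationships only.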

  27. InterParty Outline of a Link Record: elements defined for the Link Record • Link ID: a unique identifier for the Link Record • PIDI 1 and PIDI 2: the Identifiers being linked • Link relationship: the relationship asserted (is, is not, is complex) • Link status: Proposed, Authorised, Disputed, Inferred • Link method: Manual or automatic • Link timestamp: when the record was created or last updated

  28. InterParty Outline of a Link Record: elements defined for the Link Record • Owner Assertion composite: a group of elements which record each Owner IPM’s assertion about the Link, including: Owner ID; PIDI owned; Owner assertion (used to set up or amend Link Relationship); Assertion comment (notes field); Asserted by (name of individual); Assertion timestamp • Comment composite: a group of elements to allow other IPMs to add further notes or comments to the record without directly affecting status of the assertion
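
The Link Record outlined on the two slides above might be modelled roughly as follows. All class names, field names and Python types are assumptions; the slides name the elements and the permitted relationship, status and method values but do not specify a format, and the contents of the Comment composite are guessed here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

RELATIONSHIPS = {"is", "is not", "is complex"}
STATUSES = {"Proposed", "Authorised", "Disputed", "Inferred"}
METHODS = {"manual", "automatic"}


@dataclass
class OwnerAssertion:
    """One Owner IPM's assertion about the Link."""
    owner_id: str
    pidi_owned: str
    assertion: str               # one of RELATIONSHIPS
    asserted_by: str             # name of individual
    timestamp: datetime
    comment: str = ""            # notes field


@dataclass
class Comment:
    """Note from another IPM; does not affect the Link's status (assumed fields)."""
    ipm_id: str
    text: str


@dataclass
class LinkRecord:
    link_id: str
    pidi_1: str
    pidi_2: str
    relationship: str
    status: str
    method: str
    timestamp: datetime          # created or last updated
    owner_assertions: List[OwnerAssertion] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.relationship not in RELATIONSHIPS:
            raise ValueError(f"invalid relationship: {self.relationship!r}")
        if self.status not in STATUSES:
            raise ValueError(f"invalid status: {self.status!r}")
        if self.method not in METHODS:
            raise ValueError(f"invalid method: {self.method!r}")


when = datetime(2003, 6, 13, tzinfo=timezone.utc)
link = LinkRecord(
    link_id="link-0001",                      # hypothetical identifier
    pidi_1="NSA:Brian Green",
    pidi_2="NSB:876X5",
    relationship="is",
    status="Proposed",
    method="manual",
    timestamp=when,
    owner_assertions=[
        OwnerAssertion("IPM-A", "NSA:Brian Green", "is", "example cataloguer", when)
    ],
)
print(link.status, len(link.owner_assertions))
```

Keeping comments in a separate list from owner assertions reflects the stated intent that other IPMs can annotate a record without changing its status.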

  29. InterParty The InterParty network [Diagram labels: User; “Resolution Service”; “Common metadata”; “InterParty Link records”; Metadata A; Metadata B; Metadata C]

  30. InterParty To summarise ... • A limited set of discrete metadata fields • The fields are mostly optional to allow for variable practices among IPMs • The system will be built on human judgements and interpretation of the metadata found in searches across the network • Adding links will make connections that will mutually enrich metadata in any two systems by associating the different strengths of the different systems • Adding links will be a simple procedure • illustrated by the demonstrator

  31. InterParty Thank you
