Bibliographic relations
Erik Thorlund Jepsen, Library advisory officer, The Danish Library Agency
Outline
• Bibliographic relations: Definition
• Why are relations important in the bibliographic universe?
• Typologies
• Relations and FRBR
• Utilization of relations in OPACs
• One-record structures, links, displays...
• Examples
• Work display in Bibliotek.dk
• ’Non-bibliographic’ relations and links
• Perspectives
Relations: Definition
• A relationship between information entities exists when two entities are somehow associated with each other (Vellucci, 1997, p. 105). Though semantically precise, this definition does not provide much guidance for identifying relationships, since associations rely on subjective judgements about the relevance of ‘connecting/relating’ two or more entities.
• For convenience, the concept ”Bibliographic” is treated as ”being related to bibliographic entities and records”
Importance of relations: FRBR terms
• Information about bibliographic relations between two or more bibliographic entities can support the user tasks ‘find’, ‘identify’ and ‘select’. Relations stated in a bibliographic record can potentially:
• Improve users’ understanding of a given entity, which strengthens the identification and selection/deselection of the entity.
• Improve users’ options for finding relevant entities by leading the way from a known (found) entity to related entities that are more relevant in a given situation.
Importance of relations
• Furthermore, information about bibliographic relationships can strengthen users’ understanding of the system (database) at hand, and of the knowledge organization in the system, by:
• Creating groups of entities
• Facilitating navigation in the bibliographic universe (the database/catalogue)
Example: navigation
Importance of linking: Danish example
Analysis of searches in bibliotek.dk (20,506 searches on 20 December 2004)
• Free-text: 7%
• Author: 34%
• Link (author): 5%
• Title: 20%
• Descriptor/keyword: 11%
• Link (descriptor/keyword, “more like this” and “Literature about..”): 15%
• Other (each max. 2%), sum: 8%
Source: Kirsten Larsen, Deputy Head, The Danish Library Centre (DBC)
Importance - GFOD
• User Principle: General guidelines for good practice in display design and criteria for effective screen displays as these relate to legibility, clarity, understandability and navigability
• Content and Arrangement Principle 7. Support navigation from the displayed information to related information
• (This principle is further divided into more specific and, I would add, ambitious principles.)
Bibliographic relationships: identification
• Associations/relations can be identified by analyzing:
• Sets of documents
• Existing information systems
• Standards, rule sets and registration formats
• Empirical studies of how users identify associations among groups of entities and assess their importance
Typology
Vellucci and Tillett: Categories of Relations (shortened description from Vellucci, 1997)
Category (holds between):
• Equivalence relationships (copies, facsimiles, microforms and other similar reproductions)
• Derivative relationships (versions, editions, revisions, translations…)
• Descriptive relationships (annotated editions, commentaries, reviews…)
• Whole-part relationships (selections from anthologies, collections, series, chapters vs. books…)
• Accompanying relationships (supplements, concordances, indexes…)
• Sequential relationships (sequels of a monograph, parts of a series)
• Shared characteristics relationships (common author, publisher, title, subject…)
Content relationships (equivalence, derivative and descriptive) are sometimes hard to distinguish in practice.
FRBR and relations
• Relationships depicted in the high-level diagrams
• Other relationships between Group 1 entities at these levels:
• Work-to-work
• Expression-to-expression
• Expression-to-work
• Manifestation-to-manifestation
• Manifestation-to-item
• Item-to-item
• Whole/part at work, expression, manifestation and item level
• Not meant to be exhaustive!
• Yet relationships are mapped to user tasks (alongside attributes)
FRBR - tasks
• to find entities that correspond to the user's stated search criteria (i.e., to locate either a single entity or a set of entities in a file or database as the result of a search using an attribute or relationship of the entity);
• to identify an entity (i.e., to confirm that the entity described corresponds to the entity sought, or to distinguish between two or more entities with similar characteristics);
• to select an entity that is appropriate to the user's needs (i.e., to choose an entity that meets the user's requirements with respect to content, physical format, etc., or to reject an entity as being inappropriate to the user's needs);
• to acquire or obtain access to the entity described (i.e., to acquire an entity through purchase, loan, etc., or to access an entity electronically through an online connection to a remote computer).
(Functional Requirements for Bibliographic Records, 1998, p.82)
FRBR – additional tasks?
• To relate: a fifth task? “Even more, FRBR reminds us of the importance of bibliographic relationships, and reminds us that we describe things in the bibliographic universe in order to meet specific user tasks: ‘find,’ ‘identify,’ ‘select,’ ‘obtain,’ and I add ‘relate’” (Tillett, 2005, p. 198).
• Yet information about relationships already supports three of the tasks: to find, to identify and to select (e.g. it supports collocation, which is seen as part of “to find”)
• In other words, to relate is a subtask of to find, to identify and to select.
• Incorporating “to relate” as a fifth task could cause a breakdown of the model
• To navigate: a fifth task?
• Yes, probably
Utilization
• Three purposes when cataloguing information about relations and setting up system rules:
• Identification and understanding of relations
• Linking from a found entity to related entities
• Displaying meaningful/useful sets of records
Relations expressed as links
• Relations are expressed as implicit or explicit links, where explicit links are divided into ’directional’ and ’mechanical’ links (hyperlinks) (Vellucci, 1997)
• Hyperlinks are constructed by manual or computational means
• Manual links are static and are commonly used to structure texts or to connect associatively related entities (by topic) (and to connect bibliographic families – added by etj)
• Computational links can be created dynamically at search time and are primarily used to connect ’similar’ entities (e.g. based on shared characteristics – added by etj) (Agosti, 1997)
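A computational link of the shared-characteristics kind can be sketched in a few lines. This is a minimal illustration, not a description of any particular OPAC: the record layout (a dict of lists per record) and the function names `build_index` and `links_for` are assumptions made for the example.

```python
from collections import defaultdict

def build_index(records, fields=("author", "subject")):
    """Map each (field, value) pair to the set of record ids carrying it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for field in fields:
            for value in rec.get(field, []):
                index[(field, value)].add(rec_id)
    return index

def links_for(rec_id, records, index, fields=("author", "subject")):
    """Compute, at display time, links from one record to records sharing a value."""
    links = {}
    for field in fields:
        for value in records[rec_id].get(field, []):
            others = index[(field, value)] - {rec_id}
            if others:
                links[(field, value)] = sorted(others)
    return links

# Hypothetical records, for illustration only
records = {
    "r1": {"author": ["Brøgger, Suzanne"], "subject": ["cats"]},
    "r2": {"author": ["Brøgger, Suzanne"], "subject": ["essays"]},
    "r3": {"author": ["Lyderik, Jan"], "subject": ["cats"]},
}
index = build_index(records)
print(links_for("r1", records, index))
```

Nothing is stored on the record itself: the links are derived from attribute values when the record is shown, which is what distinguishes them from manually maintained static links.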
Utilization and links: examples
• One-record structures (e.g. for accompanying relations)
• Computational links for shared characteristics
• Rules and codes (e.g. for derivative relations)
• Computational solutions for work display
One-record structures
Links: shared characteristics
Rules and codes: example “Reuse+”
• Widened use of a specific field in MARC formats to handle relations in a uniform way.
• 787 Non-specific relationship entry (repeatable), with two subfields:
• $w Record control number (the target record that the current record links to)
• $g Relationship information (textual; optional)
Reuse+ (2)
• To distinguish between the various relationships, and to make them specific, our simple model proposes the use of indicator 2 in 787, as yet undefined. This indicator might take on the following values (and here, a full-scale model would not have to differ): (in parentheses: DC Simple terms for relations)
0 Equivalence (facsimile or reproduction) (IsFormatOf)
1 Simultaneous edition (IsVersionOf)
2 Successive derivation, edition, version (IsVersionOf)
3 Amplification (incl. commentaries, illustrations, criticism etc.) (IsBasedOn)
4 Extraction (abridgements, condensations, excerpts)
5 Recordings of performances
6 Adaptation, modification (change of genre or medium, arrangement) (IsFormatOf)
9 Translations (IsVersionOf)
a Accompanying relationship (supplements of any kind) (IsRequiredBy)
p Part → whole relationship (IsPartOf)
r Review or other descriptive relationship
s Sequential relationship (like successive title of a serial)
u Unspecific relationship, based on shared characteristics of other kinds
(Eversberg, 1998)
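The proposed indicator values amount to a lookup table from code to relationship type. A minimal sketch, assuming a hypothetical in-memory representation of a 787 field: the `Field787` class, the `describe_link` function and the fallback for unknown indicator values are illustrative assumptions, not part of any MARC library or of the Reuse+ proposal.

```python
from dataclasses import dataclass
from typing import Optional

# Indicator 2 values proposed in Reuse+ (Eversberg, 1998), each with the
# DC Simple relation term where the proposal lists one (None otherwise).
REUSE_PLUS_RELATIONS = {
    "0": ("Equivalence", "IsFormatOf"),
    "1": ("Simultaneous edition", "IsVersionOf"),
    "2": ("Successive derivation, edition, version", "IsVersionOf"),
    "3": ("Amplification", "IsBasedOn"),
    "4": ("Extraction", None),
    "5": ("Recordings of performances", None),
    "6": ("Adaptation, modification", "IsFormatOf"),
    "9": ("Translations", "IsVersionOf"),
    "a": ("Accompanying relationship", "IsRequiredBy"),
    "p": ("Part-to-whole relationship", "IsPartOf"),
    "r": ("Review or other descriptive relationship", None),
    "s": ("Sequential relationship", None),
    "u": ("Unspecific relationship", None),
}

@dataclass
class Field787:
    """Hypothetical stand-in for a parsed 787 field."""
    indicator2: str          # relationship code from the table above
    w: str                   # $w: control number of the target record
    g: Optional[str] = None  # $g: optional textual relationship information

def describe_link(field):
    """Return (relationship label, DC term or None, target control number)."""
    # Assumption: an indicator value outside the table is treated as an
    # ordinary non-specific 787 link.
    label, dc_term = REUSE_PLUS_RELATIONS.get(
        field.indicator2, ("Non-specific relationship", None)
    )
    return label, dc_term, field.w

print(describe_link(Field787(indicator2="9", w="000123")))
```

A display layer could use the label to group links on the record page, and the DC term when exporting relations to a Dublin Core description.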
On collocating the work
• “Most users seek particular works, not particular editions. Yet works are published in the form of editions; the fundamental duty of descriptive cataloguing is to organize the resulting chaotic bibliographic universe to facilitate user access to works, and to allow them easily to select the edition of the work sought that best meets their needs…” (Yee, 1997, p.64).
Computational solutions for work display
• FRBR Display Tool (Library of Congress): The FRBR Display Tool was developed to transform bibliographic data found in MARC 21 record files into meaningful displays by grouping them into the work, expression and manifestation FRBR entities. Based on XML technologies, the tool may be altered to meet the needs of individual institutions. It also shows how the theoretical portion of the FRBR model can be used practically to allow librarians to evaluate the consistency of their local bibliographic data.
Work display: Bibliotek.dk (The Danish Union Catalogue)
• An example of an almost fully automatic initiative is the display of editions of a work in the Danish Union Catalogue “Bibliotek.dk”
• Attributes like author and title are used in a best-match algorithm to identify different editions of the work.
• Due to a high level of authority control and the use of original titles, the different expressions of a work will normally be collocated in the search result.
bibliotek.dk - library.dk
• End-user version of the Danish Union Catalogue
• Sponsored by The Danish Library Agency; maintained and developed by The Danish Library Centre (DBC)
• Content:
• The Danish national bibliography
• All titles in public libraries and research libraries in Denmark
• Content is not 100% equivalent to the Union Catalogue (availability matters)
• Works together with a national transportation system: users can request books from any library and pick them up at their own (chosen) library
Adaptation of FRBR in bibliotek.dk
• The records in bibliotek.dk represent manifestations (AACR2/danMARC2).
• The aim is to present these records grouped according to the work they embody
• At one point our definition differs from FRBR: for practical reasons we consider expressions in different languages to be different ’works’. You could also say that in this case we prefer grouping according to the expression of the work.
(Paul B. Jensen, Danish Library Center)
Implementing the work concept
• The work-level display is based on matching and collocating manifestation records on the fly
• This match is based on simple author and title data in normalized form
• From the work level you can expand to the manifestations, select one (or more) and make a request
(Paul B. Jensen, Danish Library Center)
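The matching step can be sketched as grouping manifestation records by a normalized author/title key. This is a simplified illustration under stated assumptions: the slides do not give the actual normalization rules or the best-match algorithm used in bibliotek.dk, so the `normalize` function, the exact-key grouping and the record fields below are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict

def normalize(text):
    """Lowercase, strip combining accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())

def group_by_work(records):
    """Collocate manifestation records that share a normalized (author, title) key."""
    works = defaultdict(list)
    for rec in records:
        key = (normalize(rec["author"]), normalize(rec["title"]))
        works[key].append(rec["id"])
    return dict(works)

# Hypothetical manifestation records, for illustration only
records = [
    {"id": "m1", "author": "Brøgger, Suzanne", "title": "Linda Evangelista Olsen"},
    {"id": "m2", "author": "brøgger, suzanne", "title": "Linda Evangelista Olsen."},
    {"id": "m3", "author": "Brøgger, Suzanne", "title": "Jadekatten"},
]
for key, ids in group_by_work(records).items():
    print(key, ids)
```

Because the key is built from the title as recorded, a translation with a different title lands in a different group, which matches the practical choice described above of treating expressions in different languages as different 'works'. Exact-key grouping is stricter than a best-match approach, so a production system would likely tolerate small variations that this sketch does not.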
Accomplishments
• A more user-friendly interface (as confirmed by a majority of test users)
• A reduction of unnecessary inter-library loans, because it is easier to locate an edition at your local library (or libraries)
(Paul B. Jensen, Danish Library Center)
Challenges (read: problems)
• In principle a traditional AACR2/MARC record does not specify which bibliographic information refers to the work level and which to the expression/manifestation level
• Many bibliographic items contain more than one work:
• Collected plays in one volume (e.g. Shakespeare)
• 3 novels in one volume
• 3 symphonies on one CD
• Etc.
(Paul B. Jensen, Danish Library Center)
Neglect or choose edition
Show editions / Show full record
Other kinds of linkages
• Author pointers: citations, references and links; semantic equivalence (same as similarity below)
• Use-determined: + frequencies
• Similarity-based: co-occurrence of ‘text’ elements, e.g. words in text, citations (bibliographic coupling)
• Third-party pointers: co-citations; articles, books...
Types: Author pointers
[Diagram: an author connects two entities. One is among the entities citing, linking to or referring to other entities; the other is among the entities cited by, referred to by or linked to by other entities.]
Types: Use-determined relations
[Diagram: a user connects two entities, i.e. entities bought or borrowed by the same user.]
Example: use-determined links
Novel. Suzanne Brøgger: Linda Evangelista Olsen / Suzanne Brøgger. 4th printing. - [Copenhagen] : Gyldendal, 2002. - 134 pages.
The cat Linda is presumed to be a reincarnation of the author's mother, who was herself a cat. And that fits well enough with the life mother and cat lead, and their way of affecting their surroundings ....
Previously: 1st edition. 2001. Original edition: 2001. ISBN 87-00-48736-8 : paperback : DKK 175.00.
Others who have borrowed Suzanne Brøgger: Linda Evangelista Olsen have also borrowed:
• Suzanne Brøgger: Ja
• Suzanne Brøgger: En gris som har været oppe at slås kan man ikke stege
• Suzanne Brøgger: Creme fraiche
• Suzanne Brøgger: Jadekatten
• Suzanne Brøgger: Tone
• Jan Lyderik: Tangs saga. Bind 1-2
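An "others who borrowed this also borrowed" list of this kind can be derived from loan data by counting co-loans. A minimal sketch, assuming loans are available as (user, title) pairs; the function name `also_borrowed`, the input shape and the `top_n` cut-off are assumptions for illustration, and nothing here describes how bibliotek.dk actually computes its list.

```python
from collections import Counter

def also_borrowed(loans, title, top_n=5):
    """Rank other titles by how many borrowers of `title` also borrowed them.

    loans: iterable of (user, title) pairs.
    """
    titles_by_user = {}
    for user, borrowed in loans:
        titles_by_user.setdefault(user, set()).add(borrowed)

    counts = Counter()
    for titles in titles_by_user.values():
        if title in titles:
            counts.update(titles - {title})
    return counts.most_common(top_n)

# Hypothetical loan data, for illustration only
loans = [
    ("u1", "Linda Evangelista Olsen"), ("u1", "Jadekatten"),
    ("u2", "Linda Evangelista Olsen"), ("u2", "Jadekatten"), ("u2", "Tone"),
    ("u3", "Tone"),
]
print(also_borrowed(loans, "Linda Evangelista Olsen"))
```

Each user counts once per title, so a heavy borrower who renews the same book repeatedly does not inflate the frequencies.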
Similar entities
Statistically based, e.g. vector space model using tf*idf weights
[Diagram: Entity 1 and Entity 2 overlap in their shared elements.]
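As an illustration of the vector space approach, the sketch below weights terms by tf*idf and compares two entities by the cosine of their vectors. It is one common textbook variant (raw term frequency, idf = log(N/df), whitespace tokenization), chosen here as an assumption; the slides do not specify a particular weighting or tokenization.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf*idf weight} dict per document."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors; only shared terms contribute."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical short texts standing in for entity descriptions
docs = [
    "latent semantic indexing",
    "latent semantic analysis",
    "library loan statistics",
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

The dot product runs over shared terms only, which corresponds to the "shared elements" overlap in the diagram; terms that occur in every document get weight zero and so contribute nothing.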
Similarity: example
Similar pages
Third-party pointers
• Co-citations
• Similar to “others who have borrowed this book have also borrowed these materials”
• But from an author/domain perspective
• Example from Citeseer
Citeseer example
Abstract: Latent Semantic Indexing (LSI) is a technique for representing documents, queries, and terms as vectors in a multidimensional real-valued space. The representations are approximations to the original term space encoding, and are found using the matrix technique of Singular Value Decomposition. In comparison, Multidimensional Scaling (MDS) is a class of data analysis techniques for representing data points as points in a multidimensional real-valued space. The objects are represented so that...
Cited by:
• Automated Modeling and Nonlinear Axis Scaling - Leejay Wu (2005)
Similar documents (at the sentence level):
• 8.5%: Optimizing Ranking Functions: A Connectionist Approach to.. - Bartell (1994)
Active bibliography (related documents):
• 0.2: A Survey of Information Retrieval and Filtering Methods - Faloutsos, Oard (1996)
• 0.2: Document Space Models Using Latent Semantic Analysis - Gotoh, Renals (1997)
• 0.2: Approximating Matrix Multiplication for Pattern Recognition Tasks - Cohen, Lewis (1997)
Similar documents based on text:
• 0.5: Chapter 15: Getting Better Results With Latent Semantic Indexing - Nakov (2000)
• 0.4: Image Retrieval using Latent Semantic Indexing - Pecenovic (1997)
• 0.4: On the Use of Singular Value Decomposition for Text Retrieval - Husbands, Simon, Ding (2000)
Related documents from co-citation:
• 8: Personalized information delivery: an analysis of information filtering methods - Foltz, Dumais - 1992
• 6: Indexing by latent semantic analysis - Deerwester, Dumais et al. - 1990
• 5: Term-Weighting Approaches in Automatic Text Retrieval - Salton, Buckley - 1988
Co-citation + threshold
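"Co-citation + threshold" can be read as: count how often two documents appear together in the same reference list, and keep only pairs at or above a minimum count. A minimal sketch under that reading; the input shape (citing document mapped to its list of cited documents), the function names and the default threshold are assumptions for illustration, not Citeseer's actual procedure.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count, for each pair of cited documents, how many citing documents cite both."""
    counts = Counter()
    for refs in reference_lists.values():
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

def related_by_cocitation(reference_lists, target, threshold=2):
    """Documents co-cited with `target` at least `threshold` times, strongest first."""
    related = []
    for (a, b), n in cocitation_counts(reference_lists).items():
        if n >= threshold and target in (a, b):
            related.append((b if a == target else a, n))
    return sorted(related, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical citing papers p1..p4 and cited documents X, Y, Z
reference_lists = {
    "p1": ["X", "Y", "Z"],
    "p2": ["X", "Y"],
    "p3": ["X", "Z"],
    "p4": ["Y"],
}
print(related_by_cocitation(reference_lists, "X", threshold=2))
```

The relation is established by third parties (the citing authors), not by the two related documents themselves. Bibliographic coupling is the mirror image: two citing documents are related when their own reference lists overlap.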
Perspectives: Designing OPACs and integrated search tools according to relations
• Many possibilities: many types of relationships to display and utilize in different ways:
• Bibliographic families; shared characteristics; whole-part and other bibliographic relations
• Similarity (statistical); co-citations; use-determined (co-use)
• Among others
• Need for careful design of system features/link structures and a lot of testing (measuring not only user satisfaction but, essentially, improved search results)
• In other words: pick the functionalities that work for the user, not the ones you like or are familiar with
• Yet data are essential, and the data determine the value of the functionalities