Learn about Headings Index structure, creation, performance, and more. Explore concepts like Z01 record and Display Text in this informative presentation.
Understanding Indexes: Headings www.exlibrisgroup.com Prepared by Marina Spivakov, 2002; Updated by Jerry Specht, June 2003
Scope of the Lecture
Points for discussion in each index:
• Index structure (Oracle tables)
• Specifying index
• Index creation and update
• Performance issues
NOTE: This PowerPoint discusses Headings in 14.2 (and in general). It is supplemented by NAAUG.INDEX_CH.ppt, which follows and which describes Headings features new in 15.2.
Where to get your own copy: Both this PowerPoint presentation and the following one may be found on the US documentation server (http://support.exlibris-usa.com/D) in the NAAUG_Indexes_2003 directory.
Headings Index
Headings Index
• Headings indexes consist of whole phrases taken from the record, such as author, title, subject, publisher, etc.
Database Tables
• Heading index:
• Z01 – phrase dictionary
• Z02 – pointers to the documents
Z01 record
• Z01 record unique identifier, link to other records
• Filing text (stripped sub-fields, stripped punctuation, leading zeros added to numeric fields, character conversion, etc.)
• Display text
• Authority link
Z01-Z02 link
[Diagram: Z01 record, Z02 record, Bibliographic record]
How to Define the Headings Index?
• Tables to remember:
• tab00.lng defines system index codes & filing procedures
• tab11 defines connections between the bibliographic record fields and the indexes
• tab_filing defines filing procedures
• tab_expand defines expand procedures which have to be activated when the index is created
• tab_character_conversion_line defines character conversion routines
• unicode_to_filing_nn is the character conversion table used for normalization of headings
How to Define the Structure of the Headings Index – Interrelation of Tables
[Diagram: tab00.lng, tab11, tab_filing, tab_expand]
Z01: Display Text and Filing Text
Useful Details
Z01 – Display Text and Filing Text
• Display text: data for the display text is taken directly from the record.
• Filing text: data undergoes filing and character conversion processing.
Z01 – Display Text
If two records generate headings that have a common filing text but different display texts, the system will create two headings, not one.
[Diagram: Bibliographic document 1, Bibliographic document 2, z01, z01]
Z01 – Display Text
In order to achieve normalization of headings in 14.2, the headings themselves must be changed to the same form. The only exception is the suppression of end punctuation, specified in tab00.lng (for example, tab00.eng), col. 4:
• 0 - no suppression
• 1 or space - suppress punctuation at the end of each sub-field when creating a Z01 heading
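The effect of end-punctuation suppression can be illustrated with a minimal Python sketch. It is not ALEPH code: the function names, the punctuation set, and the simplified filing routine are all assumptions made for illustration, and each heading is treated as a single sub-field.

```python
import string

def filing_text(heading: str) -> str:
    """Hypothetical stand-in for a filing routine: lower-case, drop punctuation."""
    kept = "".join(ch for ch in heading.lower() if ch not in string.punctuation)
    return " ".join(kept.split())

def display_text(heading: str, suppress_end_punct: bool) -> str:
    """Display text comes straight from the record; end punctuation is
    stripped only when suppression is switched on (assumed punctuation set)."""
    return heading.rstrip(" .,;:/") if suppress_end_punct else heading

def build_headings(headings, suppress_end_punct):
    """One heading per distinct (filing text, display text) pair."""
    return {(filing_text(h), display_text(h, suppress_end_punct)) for h in headings}

records = ["Twain, Mark.", "Twain, Mark"]
without = build_headings(records, suppress_end_punct=False)
with_suppression = build_headings(records, suppress_end_punct=True)
print(len(without), len(with_suppression))
```

Both records share one filing text, so without suppression the differing display texts yield two headings; with suppression they collapse into one.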
Z01 – Display Text Normalization
[Diagram: Bibliographic document 1, Bibliographic document 2, tab00.lng, z01, z01]
Z01 – Display Text Normalization
[Diagram: Bibliographic document 1, Bibliographic document 2, tab00.lng, z01]
NOTE: Version 15 allows more advanced normalization of headings.
Normalization of Headings – Cataloger’s Assistant
Detect Similar Headings (p_manage_26) reports headings which differ in display text only, i.e. headings which are the same except for punctuation and case differences.
[Example of output file]
Normalization of Headings – Cataloger’s Assistant
Correct allows you to:
• Discover inconsistencies
• Change bibliographic documents without going to the Cataloguing module
• Reindex the documents (creates Z07)
Filing of Headings
• Headings are filed (organized in the index, sorted) according to the filing text of the heading.
• Data for the filing text field is processed in two ways:
• Text goes through the appropriate filing routine.
• Characters go through character conversion.
Filing Routines
From version 14, the filing routines are made up of a group of individual procedures. Filing routines are defined in tab_filing.
tab_filing - Structure
!1 2 3                    4
!!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!>
01 # compress             ’
01 # char_conv            FILING-KEY-01
• Col.1: procedure identifier
• Col.2: alpha of the text
• Col.3: procedure name
• Col.4: procedure parameters
Examples of Filing Procedures
• compress: strips characters listed in col. 4 (e.g., ()[]:,)
• delete_subfield: changes subfield sign to blank (e.g., $$x)
• to_blank: changes characters listed in col. 4 to blanks
Examples of Filing Procedures
• to_lower: changes all characters to lower case
• to_carat: changes subfield sign to two caret (^^) signs in order to achieve hierarchical sorting of headings
• suppress: suppresses all text contained within <<…>>, as well as the signs themselves
Examples of Filing Procedures
• expand_num: for filing numbers numerically, adds leading zeroes to numbers to a fixed length of 7 (e.g. 17 -> 0000017)
• mc_to_mac: changes initial “mc” to “mac” (for interfiling McKay and MacKay)
• non_filing: suppresses initial text according to the non-filing indicator defined in tab11
Examples of Filing Procedures
• compress_blank: strips blanks (e.g. ISBN)
• numbers: compresses a comma and a dot between numbers (e.g., 2,153 changes to 2153)
• non_numeric: deletes all non-numeric characters (e.g. for ISSN)
Examples of Filing Procedures
• abbreviation: compresses a dot between single characters (e.g., I. B. M. changes to I B M, I.B.M. changes to IBM)
• build_filing_key_lc_call_no: special procedure for correct sequencing of LC call numbers
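A few of the procedures above can be approximated in Python to show what they do to a string. This is a rough sketch only: the function bodies are assumptions reconstructed from the one-line descriptions, not the actual ALEPH implementations, and mc_to_mac assumes the text has already been lower-cased.

```python
import re

def expand_num(text: str, width: int = 7) -> str:
    """Pad every run of digits with leading zeros to a fixed width."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)

def mc_to_mac(text: str) -> str:
    """Change an initial 'mc' to 'mac' (input assumed lower-case)."""
    return "mac" + text[2:] if text.startswith("mc") else text

def abbreviation(text: str) -> str:
    """Drop a dot that directly follows a single-character word."""
    return re.sub(r"(?<=\b\w)\.", "", text)

print(expand_num("vol 17"))      # vol 0000017
print(mc_to_mac("mckay"))        # mackay
print(abbreviation("I.B.M."))    # IBM
print(abbreviation("I. B. M."))  # I B M
```

With expand_num applied, "vol 2" sorts before "vol 17" under plain string comparison, which is the point of filing numbers numerically.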
Examples of Filing Procedures
• char_conv: performs character conversion.
• Characters can be:
• - filed as themselves
• - ignored
• - converted to spaces or to one or more different characters
• Examples:
• ü (00FC) -> ue (0075 0065) or u (0075)
• & (0026) -> and (0041 004E 0044)
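The three outcomes of character conversion (filed as itself, ignored, converted) can be sketched in Python with a lookup table keyed by code point. The table below is hypothetical: only the ü and & rows come from the examples above, and the hyphen and apostrophe rows are added purely for illustration.

```python
# Hypothetical conversion table: code point -> replacement string.
# Characters not listed are filed as themselves.
CONVERSION = {
    0x00FC: "ue",   # ü -> ue (a site could map it to "u" instead)
    0x0026: "AND",  # & -> 0041 004E 0044
    0x002D: " ",    # hyphen converted to a space (illustrative)
    0x0027: "",     # apostrophe ignored (illustrative)
}

def char_conv(text: str, table=CONVERSION) -> str:
    """Apply the table character by character."""
    return text.translate(table)

print(char_conv("Müller & Co"))   # Mueller AND Co
print(char_conv("O'Neil-Smith"))  # ONeil Smith
```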
Examples of Filing Procedures – Character Conversion
tab_filing
01 # char_conv FILING-KEY-01
$alephe_unicode/
tab_character_conversion_line
FILING-KEY-01 ##### # line_utf2line_sb unicode_to_filing_01
FILING-KEY-02 ##### # line_utf2line_sb unicode_to_filing_02
FILING-KEY-03 ##### # line_utf2line_sb unicode_to_filing_03
$alephe_unicode/
Character Conversion Tables
• unicode_to_filing_nn is the one actually used by the index creation process.
• unicode_to_filing_nn_source is the raw material, the ‘human interface’ for character conversion definitions. All editing has to be done in this table.
• Process unicode_to_filing_nn_source using UTIL P/3 in order to create unicode_to_filing_nn.
• UTIL P/3 performs an additional translation in order to remove null characters.
IMPORTANT NOTE
• The procedures must be listed in logical order.
• For example, a setup that lists to_blank (changes characters specified in col. 4 to blank) before numbers (compresses a comma and a dot between numbers) is not logical:
• ‘2,153’ has to be turned into ‘2153’ by numbers.
• But here, it will first be changed to ‘2 153’ by to_blank.
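The ordering problem can be reproduced with a small Python sketch that runs two simplified procedures in both orders. The function bodies are assumptions based on the descriptions in this presentation, not the real ALEPH procedures.

```python
import re

def to_blank(text: str, chars: str) -> str:
    """Change each character listed in `chars` to a blank."""
    return text.translate({ord(c): " " for c in chars})

def numbers(text: str) -> str:
    """Compress a comma or dot that sits between two digits."""
    return re.sub(r"(?<=\d)[.,](?=\d)", "", text)

def blank_commas(text: str) -> str:
    """to_blank with ',' as its (hypothetical) col. 4 parameter."""
    return to_blank(text, ",")

def run(text: str, procedures) -> str:
    """Apply the procedures in the order listed, as a filing routine would."""
    for proc in procedures:
        text = proc(text)
    return text

good = run("2,153", [numbers, blank_commas])  # numbers sees the comma first
bad = run("2,153", [blank_commas, numbers])   # comma is already a blank
print(repr(good), repr(bad))
```

In the second order, numbers finds nothing left to compress, so the value files as two separate numbers.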
Filing of Headings – Putting it Together…
[Diagram: tab00.lng, tab_filing, tab_character_conversion_line (FILING-KEY-01 ##### # line_utf2line_sb unicode_to_filing_01), unicode_to_filing_01]
Index Creation and Update
• The headings index is:
• Created by p_manage_02
• Enriched by ue_08
• Updated by ue_01
Note: In the authority libraries the headings are created when the document is updated, before ue_01 indexes it.
Maintenance of the Browse Index
• Alphabetize long headings
• Resequencing
• Delete unlinked headings
What are Long Headings?
• z01-filing-sequence = 69* characters
• z01-display-text = 2000 characters
• * “Effective” length = 34 characters with double-byte
• p_manage_17 (Alphabetize Long Headings) sorts those headings whose display text is longer than 69 characters.
Alphabetize Long Headings
• Before p_manage_17…
• After p_manage_17…
Alphabetize Long Headings – How does it work?
util-g-2
• Last heading (Z01) indexed by p_manage_02 or ue_01
• Last heading (Z01) processed by p_manage_17
• START: last-acc-number
• FINISH: last-long-acc-number
When to run p_manage_17?
• p_manage_17 must be run periodically (e.g. daily) in order to alphabetize long headings that were added since the last time this function was run.
If the rules for filing text creation have been changed…
• Run p_manage_16 (Alphabetize Headings - Setup)
• p_manage_16 recreates the filing text
Unlinked Headings
• What are unlinked headings?
• These are headings which do not have pointers to documents (Z01s without corresponding Z02s).
• How are unlinked headings created?
• When a heading is modified, the existing Z01 is NOT updated. Instead, the Z02 record linking the heading to the bib record is deleted and a NEW Z01 record with a new Z02 is created. Thus, “orphaned”, outdated Z01s can accumulate.
Unlinked Headings
• How to delete unlinked headings?
• Run p_manage_15 (Delete Unlinked Headings) periodically.
NOTE: The job does not delete Z01 records which have an authority link. This preserves cross-references, which are not linked to documents directly (they have no attached Z02 records).
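The selection rule for deleting unlinked headings can be sketched in Python. The record layout and field names below are hypothetical stand-ins for the Z01 and Z02 tables, used only to illustrate the rule, not to describe how p_manage_15 is implemented.

```python
from dataclasses import dataclass

@dataclass
class Z01:
    acc_number: int                   # hypothetical field names
    display_text: str
    has_authority_link: bool = False

def unlinked_to_delete(z01_rows, z02_acc_numbers):
    """Select Z01s with no Z02 pointer, sparing those with an authority link."""
    linked = set(z02_acc_numbers)
    return [
        h for h in z01_rows
        if h.acc_number not in linked and not h.has_authority_link
    ]

headings = [
    Z01(1, "Twain, Mark"),                                    # still linked
    Z01(2, "Twain, Mark, 1835-1910"),                         # orphaned
    Z01(3, "Clemens, Samuel", has_authority_link=True),       # cross-reference
]
doomed = unlinked_to_delete(headings, z02_acc_numbers=[1])
print([h.acc_number for h in doomed])
```

Only the orphaned heading is selected; the cross-reference survives even though no Z02 points to it.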
Performance Issues
Performance Issues
• In order to display the browse list, the system must count the documents which are connected to a heading (Z02 records attached to Z01).
Performance Issues
• Pre-14.2 ALEPH: p_manage_10 updates Z01 (z01_number_of_doc) with the number of documents available for each heading.
• 14.2 and higher: the system allows extensive use of base and denied records (per user profile) functionality. That is why the browse list can benefit from being pre-filtered. The system counts the number of records on the fly.
Performance Issues
• How to speed up the Z02 count when the headings are displayed?
• Count limit: a heading with more records than the number defined in this counter will display with + rather than the number itself.
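A count limit saves work because counting can stop as soon as the limit is exceeded. A minimal Python sketch of that idea follows; the function names are hypothetical, and showing the limit followed by "+" is an assumed display format, not necessarily what ALEPH prints.

```python
from itertools import islice

def capped_count(pointers, limit: int):
    """Read at most limit + 1 pointers; return (count, limit exceeded?)."""
    seen = sum(1 for _ in islice(pointers, limit + 1))
    return min(seen, limit), seen > limit

def browse_label(pointers, limit: int) -> str:
    """Format the count for the browse list (assumed display format)."""
    count, exceeded = capped_count(pointers, limit)
    return f"{count}+" if exceeded else str(count)

print(browse_label(iter(range(5)), limit=10))     # 5
print(browse_label(iter(range(5000)), limit=10))  # 10+
```

For the heading with 5000 pointers only 11 are ever read, which is where the saving comes from.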
Base Filtered Headings (Z0102)
Z0102
• Pre-14.2 problem: The smaller a logical base is, the more work the system has to do in order to find 20 headings which are in the base to show in the Browse list display.
• Solution: There is a new index, Z0102, which ‘divides’ Z01 into sections in accordance with the existing logical bases.
Z0102
Example of Z0102 record:
Z0102 Structure
A Z0102 record is built for each Z01 in a logical base, giving the filing text and sequence. The record does not include pointers to the doc records; this is still done by Z02.
[Diagram: Z01, Z0102]
Z0102
When a logical base is being browsed, the system uses the Z0102 table to “decide” whether to display the heading (Z01) without having to retrieve the documents attached to the heading, read them, and then “decide”.
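The difference a per-base index makes can be sketched in Python. The row layout (base, filing text, heading number) and the function name are hypothetical simplifications of Z0102; the point is only that a browse page can be assembled without reading any document records.

```python
def browse(z0102_rows, base: str, start_filing_text: str, page_size: int = 20):
    """Return the next page of headings for one logical base, using only
    the per-base index rows (no documents are read)."""
    hits = sorted(
        (filing_text, acc_number)
        for row_base, filing_text, acc_number in z0102_rows
        if row_base == base and filing_text >= start_filing_text
    )
    return hits[:page_size]

# Hypothetical index rows: (logical base, filing text, Z01 number)
rows = [
    ("MUSIC", "bach johann sebastian", 11),
    ("LAW", "bankruptcy", 12),
    ("MUSIC", "beethoven ludwig van", 13),
    ("LAW", "contracts", 14),
]
page = browse(rows, base="MUSIC", start_filing_text="b", page_size=20)
print(page)
```

Headings that belong only to other bases are skipped by an index lookup rather than by fetching and inspecting their documents.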