Learn about the structure and procedures of filing and word breaking in Aleph, from the pre-14.x version to 14.1 onwards. Discover the ready-made components provided by Aleph and the tables that identify and define the procedures.
Filing and Word Breaking Procedures
Session Agenda • Pre-14.x • tab_word_breaking table • Structure • Procedures • Special remarks • tab_filing table • Structure • Procedures
Pre-14.x • Various filing and word breaking procedures existed. Each procedure included many parts, but was a closed box. • Each procedure was assigned a code, such as B1, B5, C1, A3, AM, etc. • Each procedure was a separate program, requiring new program development to create new procedures. For example, there was no A3 + AM filing procedure.
From 14.1 onwards • ALEPH provides ready-made components (programs) for creating filing and word breaking procedures • /tab/tab_word_breaking - an ALEPH table which identifies word breaking procedures and defines their component parts • /tab/tab_filing - an ALEPH table which identifies filing procedures and defines their component parts
tab_word_breaking • /tab/tab_word_breaking is an ALEPH table which identifies word breaking procedures and defines their component parts. • Each word breaking procedure is made up of a group of one or more programs.
tab_word_breaking • 1 2 3 4 • !!-!-!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!!!!! • 03 L abbreviation • 03 L numbers • 03 L compress - • 03 L to_blank !@#$%^&*()_+={}[]:";'<>,.?/|\ • col.1: procedure identifier • col.2: alpha of the text • col.3: procedure name • col.4: procedure parameters
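The four-column layout above could be read into a simple structure as in the following minimal Python sketch. It is not ALEPH code: the names Step, parse_line and group_by_procedure are hypothetical, and splitting on whitespace is an assumption made for brevity, whereas the real table uses the fixed column positions marked by the ruler line. Treating lines that start with "!" as non-data lines is likewise an assumption.

```python
from typing import Dict, Iterable, List, NamedTuple


class Step(NamedTuple):
    proc_id: str  # col. 1: procedure identifier
    alpha: str    # col. 2: alpha of the text
    name: str     # col. 3: procedure name (one component program)
    params: str   # col. 4: procedure parameters, may be empty


def parse_line(line: str) -> Step:
    """Split one table line into its four columns (whitespace-based, assumed)."""
    parts = line.rstrip("\n").split(None, 3)
    if len(parts) < 3:
        raise ValueError(f"not a table line: {line!r}")
    params = parts[3] if len(parts) > 3 else ""
    return Step(parts[0], parts[1], parts[2], params)


def group_by_procedure(lines: Iterable[str]) -> Dict[str, List[Step]]:
    """Collect the component programs of each procedure, keeping table order."""
    procedures: Dict[str, List[Step]] = {}
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and the ruler line
        step = parse_line(line)
        procedures.setdefault(step.proc_id, []).append(step)
    return procedures


table = ["03 L abbreviation", "03 L numbers", "03 L compress -"]
for step in group_by_procedure(table)["03"]:
    print(step.name, repr(step.params))
```

Keeping the steps in a list per identifier preserves the order in which the component programs appear in the table.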
Procedures (1) • compress • Strips characters listed in col. 4 • delete_subfield • Changes sub-field sign (e.g., $$x) to blank • to_blank • Changes characters listed in col. 4 to blanks
Procedures (2) • subf_to_sign • Changes second and subsequent sub-field signs to the single character listed in col. 4 • blank_to_carat • Changes blanks to carat (^) • marc21_41 • Separates languages in MARC21 field 041
Procedures (3) • abbreviation • Compresses a dot between single characters (e.g., I. B. M. changes to I B M; I.B.M. changes to IBM) • numbers • Compresses a comma or a dot between numbers (e.g., 2,153 changes to 2153)
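The behaviour described for compress, to_blank, abbreviation and numbers can be approximated in Python as below. This is only a sketch that reproduces the examples given on the slides; the regular expressions are assumptions (ASCII letters only for abbreviation), not the actual ALEPH programs.

```python
import re


def compress(text: str, chars: str) -> str:
    """Strip every character listed in col. 4."""
    return text.translate({ord(c): None for c in chars})


def to_blank(text: str, chars: str) -> str:
    """Change every character listed in col. 4 to a blank."""
    return text.translate({ord(c): " " for c in chars})


def abbreviation(text: str) -> str:
    """Drop the dot that follows a single-letter token."""
    return re.sub(r"\b([A-Za-z])\.", r"\1", text)


def numbers(text: str) -> str:
    """Drop a comma or dot that sits between two digits."""
    return re.sub(r"(?<=\d)[.,](?=\d)", "", text)


print(abbreviation("I. B. M."))  # I B M
print(abbreviation("I.B.M."))    # IBM
print(numbers("2,153"))          # 2153
```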
Procedures (4) • IMPORTANT NOTE • The procedures must be listed in logical order. For example, numbers must be listed before compress or to_blank if a comma or a dot is included in their character lists. • Otherwise, the comma and dot will no longer be present when the numbers procedure is used.
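The effect of ordering can be seen in a small Python sketch. The run function and the PROGRAMS lookup are hypothetical stand-ins for the way the table's lines are applied one after another; the two program bodies are approximations, not ALEPH code.

```python
import re


def numbers(text: str, params: str = "") -> str:
    # Drop a comma or dot that sits between two digits.
    return re.sub(r"(?<=\d)[.,](?=\d)", "", text)


def to_blank(text: str, params: str) -> str:
    # Change every character listed in params to a blank.
    return text.translate({ord(c): " " for c in params})


PROGRAMS = {"numbers": numbers, "to_blank": to_blank}


def run(steps, text: str) -> str:
    """Apply (program name, parameters) pairs in the order listed."""
    for name, params in steps:
        text = PROGRAMS[name](text, params)
    return text


good_order = [("numbers", ""), ("to_blank", ",.")]
bad_order = [("to_blank", ",."), ("numbers", "")]

print(run(good_order, "2,153 copies"))  # 2153 copies
print(run(bad_order, "2,153 copies"))   # 2 153 copies
```

In the second ordering the comma has already become a blank by the time numbers runs, so the number is split into two words.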
Procedures (5) • Reminder • Word breaking procedures are used in tab11, section W. A line can be listed several times in tab11, in order to index the field multiple times, with different word breaking each time. • For example, a word with an apostrophe, O’hara, can be indexed as O’hara, Ohara and O hara: • 11 W 100## abcdq 01 B WRD WAU • 11 W 100## abcdq 04 B WRD WAU
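A rough Python illustration of indexing the same text several times with different word breaking. The three routines are hypothetical; the slides do not say what procedures 01 and 04 contain, and the lower-casing is an assumption made so that the resulting words are comparable.

```python
def keep_apostrophe(text: str) -> str:
    return text


def compress_apostrophe(text: str) -> str:
    return text.replace("'", "")


def blank_apostrophe(text: str) -> str:
    return text.replace("'", " ")


def index_words(text: str, routines) -> set:
    """Union of the words produced by each word breaking routine."""
    words = set()
    for routine in routines:
        words.update(routine(text.lower()).split())
    return words


routines = [keep_apostrophe, compress_apostrophe, blank_apostrophe]
print(sorted(index_words("O'hara", routines)))
# ['hara', 'o', "o'hara", 'ohara']
```

A search for any of these forms would then find the same heading.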
unicode_to_word_gen • Word indexing routines, as well as retrieval routines, use the table defined under instance WORD-FIX in ./alephe/unicode/tab_character_conversion_line. The table is traditionally called unicode_to_word_gen.
unicode_to_word_gen • This table defines equivalencies for characters, for the purpose of creating words in the words file. • All characters naturally retain their unicode value, and are stored in the system in UTF encoding. In order to translate one character into another character (e.g. translating an accented "e" to "e"), you can set an equivalency. The equivalency can be up to 5 characters: • 00E6 0061 0065 #LATIN SMALL LETTER AE
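How such an equivalency line might be interpreted can be sketched in Python. The function name load_equivalencies is hypothetical, and everything about the file format beyond the single example line above (for instance that "#" starts a comment) is an assumption.

```python
def load_equivalencies(lines):
    """Build a code point -> replacement string map from hex-coded lines."""
    table = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if not fields:
            continue
        source, targets = fields[0], fields[1:]
        if len(targets) > 5:
            raise ValueError("an equivalency can be up to 5 characters")
        table[int(source, 16)] = "".join(chr(int(t, 16)) for t in targets)
    return table


table = load_equivalencies(["00E6 0061 0065 #LATIN SMALL LETTER AE"])
print("\u00e6ther".translate(table))  # aether
```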
unicode_to_word_gen • The library's tab_word_breaking table can define different treatment for the same characters. In separate procedures, specific characters can be set to be compressed or to be changed to blank. Characters dealt with in this manner should be left in their natural value, and not translated in this table. • For example, you might want an apostrophe to be treated as a blank, as itself, and as if it were not there at all (e.g. o hara, o'hara, ohara). In order to be able to set the apostrophe in tab_word_breaking both as a compressed character and as a character changed to blank, it must retain its natural value, and NOT be translated in this table.
Special Remarks • 2. When browsing a word index in the OPAC, special characters are always displayed in their converted state. • For example, if the unicode_to_word_gen table converts u-umlaut (ü) to ue, the word will be displayed with ue, and not with the umlaut.
tab_filing - Example • 01 L del_subfield • 01 L to_lower • 01 L abbreviation • 01 L suppress • 01 L compress ' • 01 L to_blank !@#$%^&*()_+- ={}[]:";<>?,./~` • 01 L mc_to_mac • 01 L pack_spaces • 01 L char_conv FILING-KEY-01 • 01 C chi
tab_filing - Structure • 1 2 3 4 • !!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!> • 01 L compress ’ • 01 L char_conv FILING-KEY-01 • col.1: procedure identifier • col.2: alpha of the text • col.3: procedure name • col.4: procedure parameters
tab_filing Procedures (1) • compress • Strips characters listed in col. 4 (e.g., ()[]:,) • delete_subfield • Changes subfield sign (e.g., $$x) to blank • to_blank • Changes characters listed in col. 4 to blanks
tab_filing Procedures (2) • to_lower • Changes all characters to lower case • to_carat • Changes subfield sign to two carat (^^) signs in order to achieve hierarchical sorting of headings • suppress • Suppresses all text contained within <<…>>, as well as the signs themselves
tab_filing Procedures (3) • expand_num • For filing numbers numerically, pads numbers with leading zeroes to a fixed length of 7 (e.g. 17 -> 0000017) • mc_to_mac • Changes initial “mc” to “mac” (for interfiling McKay and MacKay) • non_filing • Suppresses initial text according to the non-filing indicator defined in tab11
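Why expand_num and mc_to_mac help with sorting can be shown with a short Python sketch. Both functions are approximations written for illustration, not ALEPH code. It is assumed here that "initial" means the start of the text and that the text has already been lower-cased (as by to_lower earlier in the procedure).

```python
import re


def expand_num(text: str, width: int = 7) -> str:
    """Pad every run of digits with leading zeroes to a fixed length."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)


def mc_to_mac(text: str) -> str:
    """Change an initial "mc" to "mac" (input assumed lower case)."""
    return "mac" + text[2:] if text.startswith("mc") else text


volumes = ["vol. 17", "vol. 2", "vol. 100"]
print(sorted(volumes))                  # ['vol. 100', 'vol. 17', 'vol. 2']
print(sorted(volumes, key=expand_num))  # ['vol. 2', 'vol. 17', 'vol. 100']
print(mc_to_mac("mckay"), mc_to_mac("mackay"))  # mackay mackay
```

Because the filing key is compared as text, padding makes character order agree with numeric order.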
tab_filing Procedures (4) • compress_blank • Strips blanks (e.g. ISBN) • numbers • Compresses a comma or a dot between numbers (e.g., 2,153 changes to 2153) • non_numeric • Deletes all non-numeric characters (for ISBN, ISSN)
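A minimal Python approximation of compress_blank and non_numeric applied to an ISBN-like string. These are sketches of the described behaviour, not the ALEPH programs. Note that a literal "delete all non-numeric characters" also removes an X check digit; whether ALEPH treats that case specially is not stated on the slides.

```python
import re


def compress_blank(text: str) -> str:
    """Strip all blanks."""
    return text.replace(" ", "")


def non_numeric(text: str) -> str:
    """Delete every character that is not a digit."""
    return re.sub(r"\D", "", text)


print(compress_blank("0 306 40615 2"))  # 0306406152
print(non_numeric("0-306-40615-2"))     # 0306406152
```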
tab_filing Procedures (5) • abbreviation • Compresses a dot between single characters (e.g., I. B. M. changes to I B M, I.B.M. changes to IBM) • build_filing_key_lc_call_no • Special procedure for correct sequencing of LC call numbers
tab_filing Procedures (7) • char_conv • Translates one character into another (up to 5 characters), using the conversion listed in the matching line of tab_character_conversion_line in alephe/unicode • For example: • 01 L char_conv FILING-KEY-01 • refers to the line • FILING-KEY-01 ##### # line_utf2line_sb unicode_to_filing_01
unicode_to_filing_nn_source • This table is used for character conversion for filing. The table must be processed using UTIL P/3 in order to create the unicode_to_filing_nn table. • This latter table is the one actually used by the system. It performs an additional translation in order to remove null characters.
unicode_to_filing_01_source • Examples: • Latin capital letter AE: • 00C6 0041 0045 • Small letter sharp s: • 00DF 0053 005A
IMPORTANT NOTE • The procedures must be listed in logical order. • For example, numbers must be listed before compress or to_blank if a comma or a dot is included in their character lists. • Otherwise, the comma and dot will no longer be present when the numbers procedure is used.
./tab/tab_filing - usage • Filing procedures are used when building filing keys for headings (Z01), index entries (Z11) and sort keys (Z101)
./tab/tab_filing - usage • Note: if no procedure for creation of sort keys has been defined in tab01.lng, the system will use the default filing procedure 99. • Filing procedure 99 MUST be defined in tab_filing, because it sets the default sort order.