Subject Analysis: Computer Assisted Indexing. Bekele Negeri INIS Unit Nuclear Information Specialist (Adapted from A. Nevyjel’s presentation). 07 – 11 October 2013 Vienna, Austria. Subject Indexing Tools. There are two main INIS products used for indexing: WinFibre and CAI
Subject Analysis: Computer Assisted Indexing
Bekele Negeri, INIS Unit, Nuclear Information Specialist
(Adapted from A. Nevyjel’s presentation)
07 – 11 October 2013, Vienna, Austria
INIS Training Seminar
Subject Indexing Tools
There are two main INIS products used for indexing: WinFibre and CAI
• WinFibre – for input preparation, both bibliographic and subject indexing
• CAI (Computer Assisted Indexing) – for subject classification and indexing
The INIS/ETDE Thesaurus and the INIS Subject Category Codes are incorporated in both.
Indexing with FIBRE
Computer-assisted Indexing - CAI
• Kick-off meeting: Jan 2004
• Implementation and customisation: Jun 2004
• Production indexing: from Jun 2004, ongoing
• CAI version 1.0 final acceptance: Aug 2004
• Tuning of the system: from Aug 2004, ongoing
• CAI batch processing for Member States: Dec 2004
• CAI online from remote for Member States: Nov 2007
CAI Thesaurus Extension
• Thesaurus
  • Valid Descriptors: 22,051
  • Forbidden Terms: 8,675
  • Total: 30,726
• CAI
  • Hidden Terms: ~35,000
Terminological Knowledge Base
CAI Thesaurus Extension
“Hidden terms” are character patterns representing the different forms in which a concept appears in the free text; each such concept is indexed by one or more descriptors.
• handled similarly to “forbidden terms”, with one or more USE relations
• CAI internal only
  • not exported to INIS production system
  • not exported to FIBRE
  • not printed in any version of the thesaurus
• support identification of descriptors in the free text
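The lookup idea behind hidden terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function name `suggest_descriptors`, and the whole-word matching rule are hypothetical and are not taken from the CAI software. The sample entries are borrowed from the hidden-term examples in this presentation.

```python
import re

# Hypothetical miniature knowledge base: hidden term -> descriptors (USE relations).
# The real CAI knowledge base holds roughly 35,000 hidden terms; its format is not shown in the slides.
HIDDEN_TERMS = {
    "kampuchea": ["CAMBODIA"],
    "anaesthesia": ["ANESTHESIA"],
    "caesium 137": ["CESIUM 137"],
    "fungus": ["FUNGI"],
}

def suggest_descriptors(text):
    """Return the sorted descriptors whose hidden terms occur as whole words in text."""
    lowered = text.lower()
    found = set()
    for hidden, descriptors in HIDDEN_TERMS.items():
        # Assumed rule: the hidden term must not be embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(hidden) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.update(descriptors)
    return sorted(found)

print(suggest_descriptors("Caesium 137 uptake by fungus in Kampuchea"))
```

Because the hidden terms live only in this internal table, the descriptors returned are ordinary thesaurus terms, which is consistent with hidden terms never being exported or printed.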
Hidden Terms: Compounds and Isotopes
Descriptor          hidden term    free text
MAGNESIUM BORIDES   MgB_2          MgB2
ACETIC ACID         C_2H_4O_2      C2H4O2
CESIUM 137                         Cesium 137, Cesium-137
                    "1"3"7cs       137Cs
                    137 caesium    137 Caesium, 137-Caesium
                    caesium 137    Caesium 137, Caesium-137
                    137 cesium     137 Cesium, 137-Cesium
                    137 cs         137 Cs, 137-Cs
                    cs 137         Cs 137, Cs-137
                    cs"1"3"7       Cs137
                    cs137          Cs137
Hidden Terms: Elementary Particles and Countries
Descriptor           hidden term    free text
ELECTRON NEUTRINOS   #nu#_e         νe
MUON NEUTRINOS       #nu#_#mu#      νμ
TAU NEUTRINOS        #nu#_#tau#     ντ
RHO-770 MESONS       #rho#-770      ρ-770
OMEGA-782 MESONS     #omega#-782    ω-782
Country Names:
Descriptor           hidden term
CAMBODIA             kampuchea
COTE D'IVOIRE        ivory coast
GREECE               hellas
MYANMAR              burma
THAILAND             siam
Hidden Terms: UK/US Spellings
Descriptor                  hidden term
A CENTERS                   a centres
ACTIVITY METERS             activity metres
ANALOG COMPUTERS            analogue computers
ANESTHESIA                  anaesthesia
ARCHAEOLOGY                 archeology
AUSTRIAN ORGANIZATIONS      austrian organisations
BALLISTIC MISSILE DEFENSE   ballistic missile defence
BAYARD-ALPERT GAGES         bayard-alpert gauges
BEAM ANALYZERS              beam analysers
BEHAVIOR                    behaviour
CATALOGS                    catalogues
Hidden Terms: Other Spellings
Descriptor                  hidden term
Singular/Plural
FUNGI                       fungus
FUNGI                       funguses
G MATRIX                    g matrices
G MATRIX                    g matrixes
Reverse Sequence
ATOM-MOLECULE COLLISIONS    atom-molecule scattering
ATOM-MOLECULE COLLISIONS    molecule-atom scattering
ATOM-MOLECULE COLLISIONS    atom-molecule reactions
ATOM-MOLECULE COLLISIONS    molecule-atom reactions
ATOM-MOLECULE COLLISIONS    atom-molecule interactions
ATOM-MOLECULE COLLISIONS    molecule-atom interactions
Further Improvements Necessary
• “+” and “-” signs
  • K+ → KAONS PLUS, KAONS MINUS, POTASSIUM IONS
• Case sensitivity
  • TiN → TIN (instead of TITANIUM NITRIDES)
  • gas → GALLIUM SULFIDES
  • “…who is the …” → WHO (World Health Organization)
• Verbs versus nouns
  • “… this leads us to …” → LEAD
  • “… this leaves it ….” → LEAVES
• Homographic terms
  • Solutions → SOLUTIONS or MATHEMATICAL SOLUTIONS
• Nuclear reactions, e.g. 14N(γ,α)10B
  • Targets
  • Beams
  • Reactions
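The case-sensitivity problem can be illustrated with a small Python sketch. It assumes two hypothetical lookup tables, one for chemical formulas that must match exactly as written and one for ordinary words that match regardless of case. This is not how CAI is implemented; it only shows why folding everything to one case makes “TiN” collide with “tin” and “gas” collide with “GaS”.

```python
import re

# Hypothetical tables built from the examples on this slide.
FORMULAS = {"TiN": "TITANIUM NITRIDES", "GaS": "GALLIUM SULFIDES"}  # matched case-sensitively
WORDS = {"tin": "TIN"}                                               # matched case-insensitively

def match_tokens(text):
    """Suggest descriptors token by token, checking exact-case formulas first."""
    suggestions = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if token in FORMULAS:
            suggestions.append(FORMULAS[token])
        elif token.lower() in WORDS:
            suggestions.append(WORDS[token.lower()])
    return suggestions

print(match_tokens("TiN coatings"))   # formula match
print(match_tokens("tin coatings"))   # ordinary word match
print(match_tokens("the gas flow"))   # no formula match for lower-case "gas"
```

A sketch like this still cannot decide text written entirely in capitals, for example titles, where “TIN” may stand for either the metal or the nitride; such records need a human reviewer.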
INDEXING PROBLEMS
• General terms (energy, physics, materials, uses, etc.)
• Misleading CAI suggestions:
  • Thesaurus terms:
    PRODUCTION and PARTICLE PRODUCTION
    SOLUTION and MATHEMATICAL SOLUTION
    IGNITION and THERMONUCLEAR IGNITION
    WALLS and THERMONUCLEAR REACTOR WALLS
    PLANTS and NUCLEAR POWER PLANTS
    MEMBRANES (classic) and membrane (in brane theory)
    COLOR and COLOR MODEL (elementary particle characteristics)
    TRANSPORT, etc.
INDEXING PROBLEMS
• Chemical compounds / case sensitivity / homonyms:
  INDIUM IONS for “in ions”
  ASTATINE 200 for “at 200 °C”
  VISIBLE RADIATION for “light” in “light weight”
  HELIUM 6 for “consisting of 6 He 3 tubes”
• Temperature, pressure, etc. ranges
• Abbreviations:
  TNA for Thermal Neutron Analysis and TRINONYLAMINE
  MPA for Maximum Permissible Activity and MPa (megapascal)
CAI Batch used by:
• China
• Czech Republic (seldom)
• Georgia (only in 2012)
• Germany
• Iran
• Uzbekistan
• Vietnam
CAI Online in use by:
• Austria
• Bulgaria
• Cuba
• Israel (registering)
• Japan
• Mexico
• Netherlands (seldom)
• Uruguay
CAI online for Member States introduced in July 2007
CAI online and CAI batch are now regular services for Member States
CAI Batch and Online Processing
• Input: MemSt-CC-yymmdd-xxxxxxxxxxx
  • MemSt is a standard prefix (meaning “member state”)
  • CC is the country code
  • yymmdd is the date when the file was generated
  • xxxxxxxxxxx is any additional identification
• Examples
  • MemSt-AR-041203-thisismytestfile
  • MemSt-FR-041212-fileidentification
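A file name following this convention can be checked before submission with a short Python sketch. The regular expression encodes two assumptions that the slide does not state explicitly: the country code is taken to be two upper-case letters (as in the AR and FR examples), and the additional identification is taken to be any non-empty string. The function name `parse_input_name` is hypothetical.

```python
import re
from datetime import datetime

# Assumed shape: MemSt-CC-yymmdd-identification, with a two-letter upper-case country code.
NAME_PATTERN = re.compile(r"^MemSt-(?P<cc>[A-Z]{2})-(?P<date>\d{6})-(?P<ident>.+)$")

def parse_input_name(name):
    """Split a CAI input file name into (country code, generation date, identification)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a valid CAI input file name: {name!r}")
    # strptime also rejects impossible dates such as month 13.
    generated = datetime.strptime(match.group("date"), "%y%m%d").date()
    return match.group("cc"), generated, match.group("ident")

cc, generated, ident = parse_input_name("MemSt-AR-041203-thisismytestfile")
print(cc, generated.isoformat(), ident)
```

With the first example name from the slide, the sketch yields the country code AR, the date 3 December 2004, and the identification “thisismytestfile”.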
CAI Batch Processing
• Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
• These files carry the CAI-suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
• Example:
  • 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …….
• The output file is sent back to the Member State for reviewing
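A reviewer's script could pull the suggested descriptors out of a tag 800 line roughly as follows. This is a minimal sketch that assumes the field looks exactly like the example on the slide (tag, caret, marker string, then descriptors separated by semicolons); the real record layout may differ, and the function name `parse_suggestions` is hypothetical.

```python
MARKER = "##CAI suggestions##;"

def parse_suggestions(line):
    """Return the descriptors from an '800^##CAI suggestions##; ...' line, or [] otherwise."""
    tag, caret, value = line.partition("^")
    if tag.strip() != "800" or not caret:
        return []
    value = value.strip()
    if not value.startswith(MARKER):
        return []
    rest = value[len(MARKER):]
    # Descriptors are assumed to be separated by semicolons; empty pieces are dropped.
    return [part.strip() for part in rest.split(";") if part.strip()]

print(parse_suggestions("800^##CAI suggestions##; CESIUM 137; FUNGI"))
```

Lines with another tag, or tag 800 lines from which the marker has already been deleted, return an empty list in this sketch.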
CAI Batch and Online Processing: Reviewing Process
• Delete all suggested descriptors which are too general
• Add relevant descriptors which were not found
  • numerical values, e.g. pressure ranges, temperature ranges, ...
  • nuclear reactions
  • chemical compounds, alloys, etc.
• CAI cleans up BT/NTs in its own suggestions; clean up the BT/NTs that result from manual additions
• Remove suggestions caused by homographic terms
CAI Batch and Online Processing: Finalisation Process
CAI batch
• When the review of a record is completed: delete “##CAI suggestions##”
• When the review of all records is completed: submit the file to the “INIS Input Box”
CAI online
• When reaching the last record: press the “export and exit” button
• The file goes directly to the INIS production system or, if required, is sent back to the Member State for reviewing
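Before a batch file is submitted, a simple check can list the lines that still carry the marker and therefore have not been reviewed. The sketch below works on plain lines of text because the record structure is not described in the slides; the function name `unreviewed_lines` is hypothetical.

```python
MARKER = "##CAI suggestions##"

def unreviewed_lines(lines):
    """Return the 1-based numbers of lines that still contain the CAI marker."""
    return [number for number, line in enumerate(lines, start=1) if MARKER in line]

sample = [
    "800^##CAI suggestions##; FUNGI",   # not yet reviewed
    "800^FUNGI; CESIUM 137",            # marker deleted after review
]
print(unreviewed_lines(sample))
```

An empty result suggests that every record has been reviewed and the file is ready for the “INIS Input Box”.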
Thank you!