INIS Training Seminar: Subject Analysis, Thesaurus and Computer Assisted Indexing

INIS Training Seminar: Subject Analysis, Thesaurus and Computer Assisted Indexing. Alexander Nevyjel, Head, Content Management Group. 23 – 27 November 2009, Vienna, Austria. Introduction to Subject Analysis.

Presentation Transcript


  1. INIS Training Seminar: Subject Analysis, Thesaurus and Computer Assisted Indexing
     Alexander Nevyjel, Head, Content Management Group
     23 – 27 November 2009, Vienna, Austria

  2. Introduction to Subject Analysis
     • Subject Analysis should be carried out whenever possible by subject specialists with a good knowledge of the subject matter and a familiarity with the subject analysis tools of the respective database (subject categories, thesaurus, subject analysis rules)
     • Steps of Subject Analysis
       • subject classification
       • abstracting
       • subject indexing

  3. Subject Classification
     • The main topic of the document determines the primary subject category
     • If there are other significant topics, one or more secondary subject categories can be assigned in addition

  4. Abstracting
     • Each input item should contain an English abstract (exception: short communications)
     • Abstracts in other languages are optional
     • If an author abstract is available, it should be checked by the subject specialist, and edited, if necessary
     • An abstract should be as informative as possible
     • Emphasize what is novel about the information in the original document

  5. Thesaurus
     “A thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained system language. It is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge.”
     This definition has been adopted by UNESCO: “Guidelines for the establishment and development of monolingual thesauri”, UNESCO, SC/W/255, Paris, September 1973

  6. The Thesaurus and its Structure
     Relationship   Symbol  Cross reference
     hierarchical   BT      broader term (level 1, 2, ...)
     hierarchical   NT      narrower term (level 1, 2, ...)
     affinitive     RT      related term
     preferential   UF      used for (reciprocally USE ...)
     preferential   UF+     used for multiple (reciprocally USE ... AND ...)
     preferential   SF      seen for (reciprocally SEE ... OR ...)
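
The relationship types in this table can be modelled as a small in-memory structure. The following is a minimal sketch, not the INIS thesaurus software: the class and method names are hypothetical, and the sample entries (including the forbidden term "K0S MESONS" and the hierarchy between the kaon descriptors) are illustrative assumptions rather than actual thesaurus content.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    bt: set[str] = field(default_factory=set)     # broader terms (BT)
    nt: set[str] = field(default_factory=set)     # narrower terms (NT)
    rt: set[str] = field(default_factory=set)     # related terms (RT)
    uf: set[str] = field(default_factory=set)     # forbidden terms pointing here (UF, UF+, SF)
    use: list[str] = field(default_factory=list)  # non-empty: this is a forbidden term
    use_mode: str = ""                            # "AND" for USE, "OR" for SEE


class Thesaurus:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def term(self, name: str) -> Term:
        return self.terms.setdefault(name, Term(name))

    def add_broader(self, narrower: str, broader: str) -> None:
        # BT and NT are reciprocal, so both sides are recorded together.
        self.term(narrower).bt.add(broader)
        self.term(broader).nt.add(narrower)

    def add_related(self, a: str, b: str) -> None:
        self.term(a).rt.add(b)
        self.term(b).rt.add(a)

    def add_forbidden(self, forbidden: str, targets: list[str], mode: str = "AND") -> None:
        # mode "AND": USE all targets (UF / UF+); mode "OR": SEE one of them (SF).
        entry = self.term(forbidden)
        entry.use, entry.use_mode = list(targets), mode
        for target in targets:
            self.term(target).uf.add(forbidden)

    def resolve(self, name: str) -> list[str]:
        # A descriptor resolves to itself, a forbidden term to its targets.
        entry = self.terms[name]
        return list(entry.use) if entry.use else [name]

    def all_broader(self, name: str) -> set[str]:
        # Collect broader terms of every level (level 1, 2, ...).
        seen: set[str] = set()
        stack = list(self.terms[name].bt)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.terms[current].bt)
        return seen


# Illustrative entries only (assumed, not taken from the real thesaurus).
th = Thesaurus()
th.add_broader("KAONS NEUTRAL SHORT-LIVED", "KAONS NEUTRAL")
th.add_broader("KAONS NEUTRAL", "KAONS")
th.add_forbidden("K0S MESONS", ["KAONS NEUTRAL SHORT-LIVED"])
print(th.resolve("K0S MESONS"), sorted(th.all_broader("KAONS NEUTRAL SHORT-LIVED")))
```

Recording each relation on both terms at once keeps the reciprocal cross references (BT/NT, UF/USE) consistent by construction.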

  7. Subject Indexing
     Subject indexing means analysing the information content of a piece of literature and expressing the meaningful information content in the language of the database using the controlled vocabulary of the Thesaurus
     • Understanding of the content → subject specialist
     • Familiarity with Thesaurus and indexing rules
     • Select a set of descriptors that describes the subject content of the piece of literature

  8. Procedures for Indexing
     • Carefully read the title and abstract and scan the body of the piece of literature
       • scan the full text (introduction, table of contents, tables, graphs, figures, conclusion) to find information items missing from the abstract or requiring more precision
     • Identify the concept(s) about which the piece of literature contains useful information
     • Translate the concepts into descriptors
     • Avoid overindexing

  9. Proposed Terms (Technical Note 175)
     If no suitable descriptor exists in the Thesaurus for the retrieval of a useful concept, make a proposal for a new one, containing the following:
     • Proposed term
     • Proposed word block of the term (in particular proposed BTs)
     • Potential forbidden terms pointing to this proposed descriptor
     • Scope note when appropriate
     • Explanation and justification for the proposal
     • One or more sample records

  10. The purpose of subject indexing is to enable useful retrieval

  11. Computer-assisted Indexing - CAI
     • Kick-off Meeting: Jan 2004
     • Implementation and Customisation: Jun 2004
     • Production Indexing: from Jun 2004, ongoing
     • CAI version 1.0 final acceptance: Aug 2004
     • Tuning of the system: from Aug 2004, ongoing
     • CAI batch processing for Member States: Dec 2004
     • CAI online from remote for Member States (MS): Nov 2007

  12. CAI Thesaurus extension
     “Hidden terms” are character patterns representing the different appearances of a concept in the free text, which is indexed by one or more descriptors.
     • handled similarly to “forbidden terms”, with one or more USE relations
     • CAI internal only
     • not exported to INIS production system
     • not exported to FIBRE
     • not printed in any appearance of the thesaurus
     • support identification of descriptors in the free text
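
How hidden terms support descriptor identification can be sketched as a lookup from character patterns to descriptors. This is only an illustration under stated assumptions: the slides do not describe the matching algorithm used by CAI, so the lowercasing, the whole-word matching and the function name `suggest_descriptors` are all hypothetical. The sample pairs are taken from the hidden-term slides of this presentation.

```python
import re

# Hidden term (lowercase pattern) -> descriptors reached through its USE relations.
HIDDEN_TERMS = {
    "behaviour": ["BEHAVIOR"],
    "analogue computers": ["ANALOG COMPUTERS"],
    "mossbauer effect": ["MOESSBAUER EFFECT"],
    "burma": ["MYANMAR"],
    "molecule-atom scattering": ["ATOM-MOLECULE COLLISIONS"],
}


def suggest_descriptors(text, hidden_terms=HIDDEN_TERMS):
    """Return descriptors whose hidden terms occur as whole words in the text."""
    lowered = text.lower()  # assumption: matching ignores case
    found = []
    for pattern, descriptors in hidden_terms.items():
        # Lookarounds keep "burma" from matching inside a longer word.
        if re.search(r"(?<!\w)" + re.escape(pattern) + r"(?!\w)", lowered):
            for descriptor in descriptors:
                if descriptor not in found:
                    found.append(descriptor)
    return found


print(suggest_descriptors("Mossbauer effect studies carried out in Burma"))
```

Because a hidden term may carry several USE relations, each pattern maps to a list of descriptors rather than to a single one.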

  13. Hidden Terms: Compounds
     Descriptor            hidden term      free text
     MAGNESIUM BORIDES     MgB_2            MgB2
     MAGNESIUM CARBONATES  MgCO_3           MgCO3
     MAGNESIUM HYDRIDES    MgH_2            MgH2
     IRON BROMIDES         iron dibromide
     IRON BROMIDES         iron tribromide
     ARSENIC IONS          As"3"-           As3-
     ACETYLENE             C_2H_2           C2H2
     ACETALDEHYDE          C_2H_4O          C2H4O
     ACETIC ACID           C_2H_4O_2        C2H4O2
     approx. 1400 hidden terms (expected 3000)

  14. Hidden Terms: Isotopes
     Descriptor   hidden term   free text
     CESIUM 137                 Cesium 137, Cesium-137
                  "1"3"7cs      137Cs
                  137 caesium   137 Caesium, 137-Caesium
                  caesium 137   Caesium 137, Caesium-137
                  137 cesium    137 Cesium, 137-Cesium
                  137 cs        137 Cs, 137-Cs
                  cs 137        Cs 137, Cs-137
                  cs"1"3"7      Cs137
                  cs137         Cs137
     CESIUM 138   "1"3"8"mcs    138mCs
                  cs"1"3"8"m    Cs138m
     approx. 22,400 hidden terms
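
The isotope hidden terms follow a regular pattern, which suggests they could be generated rather than typed by hand. The slides do not say how the list was actually produced, so the following is a sketch under assumptions: the function names are hypothetical, and the superscript encoding (a `"` before each superscripted character) is inferred from the examples in the table.

```python
def superscript(text):
    """Encode text as superscript by prefixing each character with '"'."""
    return "".join('"' + ch for ch in text)


def isotope_hidden_terms(names, symbol, mass, descriptor):
    """Build lowercase spelling variants for one isotope.

    names: element name spellings, e.g. ["cesium", "caesium"]
    symbol: element symbol in lowercase, e.g. "cs"
    mass: mass number as text, optionally with an isomer suffix, e.g. "137" or "138m"
    descriptor: the valid descriptor, which is not repeated as a hidden term
    """
    variants = [superscript(mass) + symbol]
    for name in names:
        variants += [f"{mass} {name}", f"{name} {mass}"]
    variants += [
        f"{mass} {symbol}",
        f"{symbol} {mass}",
        symbol + superscript(mass),
        symbol + mass,
    ]
    own = descriptor.lower()
    return [v for v in variants if v != own]


for term in isotope_hidden_terms(["cesium", "caesium"], "cs", "137", "CESIUM 137"):
    print(term)
```

Hyphenated spellings such as "Cs-137" appear in the free-text column next to the space-separated hidden term, which suggests that hyphens are treated like spaces during matching; that normalisation is assumed here and is not part of the sketch.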

  15. Hidden Terms: Elementary Particles
     Descriptor                 hidden term   free text
     B QUARKS                   bottom quarks
     T QUARKS                   top quarks
     ELECTRON NEUTRINOS         #nu#_e        νe
     MUON NEUTRINOS             #nu#_#mu#     νμ
     TAU NEUTRINOS              #nu#_#tau#    ντ
     RHO-770 MESONS             #rho#-770     ρ-770
     OMEGA-782 MESONS           #omega#-782   ω-782
     KAONS NEUTRAL              K"0           K0
     KAONS NEUTRAL SHORT-LIVED  K"0_S         K0S
     KAONS NEUTRAL LONG-LIVED   K"0_L         K0L
     approx. 300 hidden terms

  16. Hidden Terms: UK/US Spellings
     Descriptor                 hidden term
     A CENTERS                  a centres
     ACTIVITY METERS            activity metres
     ANALOG COMPUTERS           analogue computers
     ANESTHESIA                 anaesthesia
     ARCHAEOLOGY                archeology
     AUSTRIAN ORGANIZATIONS     austrian organisations
     BALLISTIC MISSILE DEFENSE  ballistic missile defence
     BAYARD-ALPERT GAGES        bayard-alpert gauges
     BEAM ANALYZERS             beam analysers
     BEHAVIOR                   behaviour
     CATALOGS                   catalogues
     approx. 800 hidden terms

  17. Hidden Terms: Diacritics and Countries
     Descriptor                 hidden term
     Diacritics:
     BAECKLUND TRANSFORMATION   backlund transformation
     BRUECKNER MODEL            bruckner model
     BRUNSBUETTEL REACTOR       brunsbuttel reactor
     MOESSBAUER EFFECT          mossbauer effect
     Country Names:
     CAMBODIA                   kampuchea
     COTE D'IVOIRE              ivory coast
     GREECE                     hellas
     MYANMAR                    burma
     SYRIA                      syrian arab republic
     THAILAND                   siam
     approx. 250 hidden terms

  18. Hidden Terms: Other Spellings
     Descriptor                 hidden term
     Singular/Plural:
     FUNGI                      fungus
     FUNGI                      funguses
     G MATRIX                   g matrices
     G MATRIX                   g matrixes
     Reverse Sequence:
     ATOM-MOLECULE COLLISIONS   atom-molecule scattering
     ATOM-MOLECULE COLLISIONS   molecule-atom scattering
     ATOM-MOLECULE COLLISIONS   atom-molecule reactions
     ATOM-MOLECULE COLLISIONS   molecule-atom reactions
     ATOM-MOLECULE COLLISIONS   atom-molecule interactions
     ATOM-MOLECULE COLLISIONS   molecule-atom interactions
     approx. 900 hidden terms

  19. CAI Thesaurus Extension
     • Thesaurus
       • Valid Descriptors: 21,826
       • Forbidden Terms: 9,009
     • CAI
       • Hidden Terms: 34,381
     • Total: 65,216 → Terminological Knowledge Base

  20. Further Improvements necessary
     • “+” and “-” signs
       • K+ → KAONS PLUS, KAONS MINUS, POTASSIUM IONS
     • Case sensitivity
       • TiN → TIN (instead of TITANIUM NITRIDES)
       • gas → GALLIUM SULFIDES
       • “… who is the …” → WHO (World Health Organization)
     • Verbs versus Nouns
       • “… this leads us to …” → LEAD
       • “… this leaves it …” → LEAVES
     • Homographic terms
       • Solutions → SOLUTIONS or MATHEMATICAL SOLUTIONS
     • Nuclear Reactions, e.g. 14N(γ,α)10B
       • Targets
       • Beams
       • Reactions
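
The case-sensitivity problems listed here arise when every pattern is compared in lowercase. One possible remedy, shown purely as an illustration and not as the approach chosen for CAI, is to keep a separate set of patterns that must match with exact case and to fall back to case-insensitive lookup only for the rest. The two tables and the function name `suggest_with_case` are hypothetical.

```python
import re

# Patterns that are only meaningful with exact capitalisation (assumed list).
CASE_SENSITIVE = {
    "TiN": "TITANIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
    "WHO": "WHO",
}
# Patterns that may be matched regardless of case (assumed list).
CASE_INSENSITIVE = {
    "tin": "TIN",
}


def suggest_with_case(text):
    """Suggest descriptors, checking exact-case patterns before lowercase ones."""
    suggestions = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if token in CASE_SENSITIVE:
            descriptor = CASE_SENSITIVE[token]
        else:
            descriptor = CASE_INSENSITIVE.get(token.lower())
        if descriptor and descriptor not in suggestions:
            suggestions.append(descriptor)
    return suggestions


print(suggest_with_case("TiN coatings deposited on tin substrates"))
print(suggest_with_case("It is unclear who is the operator of the gas loop"))
```

This does not help with text typed entirely in capitals, nor with the verb/noun and homograph cases on the slide, which need context beyond single tokens.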

  21. CAI-Workflow: Batch Mode Interactive CAI Processing Conventional Processing

  23. CAI Batch and Online Processing
     • Input: MemSt-CC-yymmdd-xxxxxxxxxxx
       • MemSt is a standard prefix (meaning “member state”)
       • CC is the country code
       • yymmdd is the date when the file was generated
       • xxxxxxxxxxx is any additional identification
     • Examples
       • MemSt-AR-041203-thisismytestfile
       • MemSt-FR-041212-fileidentification
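
A file name following this convention can be checked before submission. The sketch below makes several assumptions that the slide does not state: the country code is taken to be two capital letters (as in the AR and FR examples), the additional identification is taken to be any non-empty run of characters without whitespace, and the names `CaiFileName`, `parse_input_name` and `output_name` are hypothetical. The leading underscore for output files is the convention given for CAI batch output.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class CaiFileName(NamedTuple):
    country: str
    generated: date
    identification: str


# MemSt-CC-yymmdd-xxxxxxxxxxx
_NAME = re.compile(r"MemSt-([A-Z]{2})-(\d{6})-(\S+)")


def parse_input_name(name):
    """Split an input file name into its parts, or raise ValueError."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a MemSt-CC-yymmdd-... name: {name!r}")
    country, yymmdd, identification = match.groups()
    # strptime raises ValueError for impossible dates such as month 13.
    generated = datetime.strptime(yymmdd, "%y%m%d").date()
    return CaiFileName(country, generated, identification)


def output_name(input_name):
    """Name of the returned file: the input name with a leading underscore."""
    parse_input_name(input_name)  # validate first
    return "_" + input_name


print(parse_input_name("MemSt-AR-041203-thisismytestfile"))
print(output_name("MemSt-FR-041212-fileidentification"))
```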

  24. CAI Batch Processing
     • Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
     • These files carry the CAI suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
     • Example:
       • 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …….
     • The files are sent back to the member state for reviewing
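
The tag 800 line in the example can be read and rewritten with a few string operations. This is a sketch based only on the single example line: the helper names are hypothetical, and it is an assumption that a reviewed record keeps the same `800^` prefix and semicolon-separated layout once the marker string has been deleted.

```python
MARKER = "##CAI suggestions##"


def parse_tag_800(line):
    """Return (is_suggestion, descriptors) for a line such as '800^...; A; B'."""
    tag, separator, value = line.partition("^")
    if tag != "800" or not separator:
        raise ValueError(f"not a tag 800 line: {line!r}")
    parts = [part.strip() for part in value.split(";")]
    parts = [part for part in parts if part]  # drop empties from trailing ';'
    is_suggestion = bool(parts) and parts[0] == MARKER
    descriptors = parts[1:] if is_suggestion else parts
    return is_suggestion, descriptors


def format_tag_800(descriptors, reviewed):
    """Write the line back; the marker is kept only while review is pending."""
    head = [] if reviewed else [MARKER]
    return "800^" + "; ".join(head + list(descriptors))


pending, suggested = parse_tag_800("800^##CAI suggestions##; CESIUM 137; MYANMAR;")
print(pending, suggested)
print(format_tag_800(suggested, reviewed=True))
```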

  25. CAI Batch and Online Processing: Reviewing Process
     • Delete all suggested descriptors which are too general
     • Add relevant descriptors which were not found
       • numerical values, e.g. pressure ranges, temperature ranges, ...
       • nuclear reactions
       • chemical compounds, alloys, etc.
     • CAI is cleaning up BT/NTs → clean up BT/NTs from manual additions
     • Clean up suggestions from homographic terms
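
One reading of the BT/NT clean-up step is that a descriptor should be dropped when a narrower term of it is already assigned to the same record, and that manually added descriptors need the same treatment. Under that assumption the check can be sketched as follows. The function names and the small hierarchy are hypothetical, chosen only for illustration.

```python
def ancestors(term, broader):
    """All broader terms of a descriptor, across every hierarchy level."""
    seen = set()
    stack = list(broader.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, ()))
    return seen


def remove_redundant_broader(descriptors, broader):
    """Drop descriptors that are broader terms of another assigned descriptor."""
    redundant = set()
    for descriptor in descriptors:
        redundant |= ancestors(descriptor, broader)
    return [d for d in descriptors if d not in redundant]


# Assumed BT relations, for illustration only.
BROADER = {
    "KAONS NEUTRAL SHORT-LIVED": ["KAONS NEUTRAL"],
    "KAONS NEUTRAL": ["KAONS"],
}

assigned = ["KAONS", "KAONS NEUTRAL SHORT-LIVED", "MYANMAR"]
print(remove_redundant_broader(assigned, BROADER))
```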

  26. CAI Batch and Online Processing: Finalisation Process
     CAI batch
     • When the review of a record is completed: delete “##CAI suggestions##”
     • When the review of all records is completed: submit the file to the “INIS Input Box”
     CAI online
     • When reaching the last record: press the “export and exit” button
     • The file goes directly to the INIS production system or, if required, is sent back to the Member State for reviewing

  27. CAI Production Statistics, 01-06-2004 until 31-08-2009

  28. CAI Batch Processing Statistics, 2005 until 31-08-2009

  29. CAI online for Member States, introduced in July 2007
     • Tested by: China, Germany, France, India, Japan, Switzerland, Uruguay
     • Regularly in use by: Argentina, Brazil, China, Czech Republic, Japan, Switzerland
     • CAI online and CAI batch are now regular services for Member States
