Knowledge Organization Research in the last two decades: 1988-2008

Presentation Transcript


  1. Knowledge Organization Research in the last two decades: 1988-2008 • Fidelia Ibekwe-SanJuan • Eric SanJuan

  2. Outline • Previous work • Goal • Data collection • Analysis methodology • Results • Discussion

  3. Previous works • Trend surveys in KO: • McIlwaine & Williamson (1999) • McIlwaine (2003) • Hjorland & Albrechtsen (1999) • Lopez-Huertas (2008) • Saumure & Shiri (2008) • Smiraglia (2009)

  4. Previous works • Personal readings of journals & ISKO proceedings • Query: was a query constructed and submitted to a database in order to retrieve records? • Publications: reading / perusing of full texts? • Records: bibliographic records (titles & abstracts)

  5. Previous works • Major findings: • 1998-2003: McIlwaine & Williamson (1999); McIlwaine (2003) • Classification schemes (UDC, DDC, LCSH, ...) • Bias in classification (gender, culture) • Interoperability of KO vocabularies • Rise of Internet technology and search engines, and their impact on KO • Resource discovery • Emerging trends in expert systems (NLP, ontologies, automatic indexing, ...) • Terminology management problems • Thesauri design • Information visualisation in online context

  6. Previous works • Major findings: • 1989-1998?: Lopez-Huertas (2008); • Mainstream research topics in KO are reformulations of old problems (classification, thesauri) • Recasting them in the web era gives them a new life! • Especially since KO is more & more entwined with sister fields • 2 major driving forces of research in KO: • Demand for quality & interoperability in a multilingual, multicultural world • Managing emergent knowledge in KOS in the semantic web era • Both are reformulations of the multidimensionality of knowledge • Necessitating an inter- and multi-disciplinary effort • etc.

  7. Previous works • Major findings: • 1966-2006 (40 yrs!), pre & post-web era: Saumure & Shiri (2008); • Organizing corporate or business information • Machine-assisted knowledge organization • Information professionals • Interoperability • Cataloging and classification • Classifying the web • Digital preservation and digital libraries • Metadata applications and uses • Cognition • Education • Indexing and abstracting • Thesauri initiatives

  8. Previous works • Major findings: • Saumure & Shiri (2008): 1966-2006 (40 yrs!), pre & post-web era; • Trends between the pre-web era (before 1993, date of the first browser, Mosaic) and the post-web era • KO research focused throughout on mainstream topics • Cataloguing, classification • Pre-web era: more focused on indexing and cataloguing • Post-web era: metadata generation & harvesting, interoperability, thus a more technological thrust

  9. Previous works • Summary • Despite methodological differences in data collection and analysis methods • Important overlaps in findings • Mainstream research is still driving KO (classification research, cataloguing, thesauri, bias, ...) • Reformulations in the web era (interoperability, metadata creation & harvesting, assisted indexing & retrieval, terminology issues, ...)

  10. Goal • Trends survey of research on KO issues over past 2 decades (1988-2008), 21 yrs. • What can we get from automatic data analysis methods? • Can they provide any useful insight?

  11. Goal • Epistemology: • Empiricism (how): methodology - observation of evidence from data • Pragmatism (why): is it useful and for whom? • Some connection with bibliometrics, but the focus is not on mapping authors but on mapping contents • Methodological difference with mainstream data analysis techniques: symbolic (linguistic & terminological) vs bag-of-words approach

  12. Data collection (1) • Issues • ISKO proceedings: not indexed in a machine-processable format (database) • No problem for peer-reviewed journals... • But ambiguity of the KO concept! • At the end of the day... a manual selection of KO & LIS-related journals • Records downloaded from Web of Science (WoS)

  13. Data collection (2) • List of 31 selected journals at http://fidelia1.free.fr/isko2010/data/list-journals.pdf • 931 records, of which 838 came from the journal Knowledge Organization and its predecessor, International Classification • 45 000 words in titles & abstracts • Research trends will therefore mostly portray publications from the KO journal. • Not the entire realm of publications on KO, but we had to be content with that...

  14. Sample record from ISI-WoS
      PT J
      AU RADA, R
         ROSSIMORI, A
         PATON, R
         RECTOR, A
         MAGLIANI, F
         ROBBE, PD
      TI THE GALEN DREAM
      SO INTERNATIONAL CLASSIFICATION
      AB Outlines the origin, needs and principles of GALEN, the Generalized Architecture for Languages, Encyclopedias, and Nomenclatures as applicable to Medicine. Short-term and long-term plans of GALEN have been elaborated to cope with possible developments. ''Milestones'' are given indicating what should be reached when and how much funding will be required for each milestone. In two ''vision'' pictures the situation before and after the introduction of GALEN is shown and the responsibilities at 4 different levels are listed.
      SN 0340-0050
      PY 1992
      VL 19
      IS 4
      BP 188
      EP 191
      UT ISI:A1992KH33900002
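
A record in this field-tagged layout can be read into a dictionary with a few lines of code. The sketch below is illustrative only and is not part of the presented toolchain: it assumes each field starts with a two-letter tag in the first two columns and that continuation lines (for example additional authors) start with blanks. The function name `parse_wos_record` and the shortened `SAMPLE` string are hypothetical.

```python
def parse_wos_record(text):
    """Parse one field-tagged record into {tag: [values]}.

    Assumption: a two-letter tag sits in columns 1-2 and lines that
    start with blanks continue the previous field.
    """
    record = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        if line[:2].strip():
            # A new tag starts this line; the value begins at column 4.
            tag, value = line[:2], line[3:].strip()
            record.setdefault(tag, []).append(value)
        elif tag is not None:
            # Continuation line: one more value for the current tag.
            record[tag].append(line.strip())
    return record


# Shortened version of the sample record shown on the slide.
SAMPLE = """PT J
AU RADA, R
   ROSSIMORI, A
TI THE GALEN DREAM
SO INTERNATIONAL CLASSIFICATION
PY 1992
VL 19"""

record = parse_wos_record(SAMPLE)
```

Keeping every field as a list makes multi-valued fields such as AU uniform with single-valued ones; a wrapped abstract can be rebuilt with `" ".join(record["AB"])`.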

  15. Analysis methodology (1) • Empirical observations of how terminology depicts knowledge artefacts (titles & abstracts) • Terminology engineering • Descriptive text data analysis (automatically propose a partition of the data) • Hierarchical agglomerative clustering • Mapping & Visualisation: • Multidimensional view of domain structure: symbolic & numerical information • TermWatch system (SanJuan & Ibekwe-SanJuan 2006)

  16. Analysis methodology (2) - Corpus split in 2 periods * 1988-1997 * 1998-2008 - Terminology modeling * Automatic extraction of terms * Term variant search - Clustering by semantic relations - Linking clusters by co-occurrence - Mapping & visualization

  17. Analysis methodology (3) - Terminology modeling * Automatic extraction of terms * surface morpho-syntactic properties of terms * rule implementation * extraction of likely candidates * filtering: statistical measures or manual * Problem: statistical measures only work well on massive data
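
The idea of rule-based candidate extraction can be illustrated with a very small sketch. It does not reproduce the morpho-syntactic rules used by the authors: as a stand-in, it simply treats punctuation and a tiny stop list as term boundaries and keeps multi-word runs as candidates. The stop list, the `candidate_terms` name and the `min_words` threshold are all assumptions made for the example.

```python
import re

# Tiny illustrative stop list (an assumption); a real system would rely
# on part-of-speech patterns rather than a word list.
STOPWORDS = {
    "a", "an", "the", "of", "and", "or", "in", "on", "for", "to", "is",
    "are", "as", "by", "with", "from", "that", "this", "be", "at",
}


def candidate_terms(text, min_words=2):
    """Return multi-word runs delimited by stopwords or punctuation."""
    candidates = []
    # Punctuation (anything except word characters, blanks, hyphens)
    # ends a segment, so a candidate never spans a comma or full stop.
    for segment in re.split(r"[^\w\s-]+", text.lower()):
        run = []
        for word in segment.split() + [None]:  # None flushes the last run
            if word is not None and word not in STOPWORDS:
                run.append(word)
            else:
                if len(run) >= min_words:
                    candidates.append(" ".join(run))
                run = []
    return candidates


terms = candidate_terms(
    "The design of a universal classification scheme for the semantic web."
)
```

On running text this crude boundary rule also lets verbs and adjectives through, which is why a filtering step (statistical or manual, as the slide notes) would still be needed afterwards.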

  18. Analysis methodology (4) - Terminology modeling * Term variant search * surface morpho-syntactic operations between terms * spelling variants (WordNet) * synonyms (USE/UF) (WordNet) * likely BT/NT candidates: syntactic information * likely RT: lexico-syntactic information * some errors and noise * but automation always involves a trade-off!

  19. Analysis methodology (5) • Some term variants acquired • Paradigmatic organization • BT/NT: classification scheme / universal classification scheme / generic classification scheme / knowledge classification scheme • USE/UF: Library of Congress – LC • RT: knowledge organisation scheme / knowledge organization tool • The system does not tag these relations as such • They are assumed to be implied by the variations
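
The variants listed above suggest how surface operations between terms can hint at a relation. The sketch below is one possible reading, not the system's actual rule set: it assumes that adding modifiers to the left of a term signals a likely BT/NT pair, that swapping the head word while keeping the modifiers signals a likely RT pair, and that a crude "-isation" to "-ization" fold can stand in for a lexical resource. The names `normalize` and `relation` are hypothetical.

```python
def normalize(term):
    """Lowercase, tokenize and fold -isation/-ization spellings (crude)."""
    return [w.replace("isation", "ization") for w in term.lower().split()]


def relation(a, b):
    """Guess the relation from term a to term b using surface form only."""
    wa, wb = normalize(a), normalize(b)
    if wa == wb:
        return "spelling variant"
    if len(wb) > len(wa) and wb[-len(wa):] == wa:
        # b keeps all of a and adds modifiers on the left.
        return "likely BT/NT"
    if len(wa) == len(wb) and len(wa) > 1 and wa[:-1] == wb[:-1]:
        # Same modifiers, different head word.
        return "likely RT"
    return None


pairs = [
    ("classification scheme", "universal classification scheme"),
    ("knowledge organisation scheme", "knowledge organization tool"),
    ("knowledge organisation", "knowledge organization"),
]
guesses = [relation(a, b) for a, b in pairs]
```

As the slide stresses, such relations are only implied by the variation; a rule like this will produce some errors and noise.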

  20. Analysis methodology (6) • Assumptions behind terminology modeling • Consensus from studies on terminology/lexicography: new terms (denominations of concepts) are mostly created from existing terms • Terms are rarely created ex nihilo • Surface linguistic operations reveal semantic (conceptual?) relations between domain concepts • By studying these operations and visualising how they relate terms • Reveal the conceptual structure of a domain

  21. Analysis methodology (7) • Clustering • 3-tier process: • 1st: group terms by close semantic relations • 2nd: hierarchical clustering by lesser semantic relations (many iterations) • 3rd: link cluster labels by co-occurrence of the labels or of their variants • Visualisation • Thematic maps (Pajek) • Navigation interface (browser)
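
The first and third tiers can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the TermWatch algorithm: tier 1 is approximated by connected components over pairs of closely related terms, tier 3 by counting the documents in which terms of two clusters co-occur, and the iterative second tier is omitted. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def group_terms(terms, related_pairs):
    """Tier 1 (approximation): connected components via union-find."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in related_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), set()).add(t)
    return list(clusters.values())


def link_clusters(clusters, documents):
    """Tier 3 (approximation): count documents where two clusters co-occur."""
    links = {}
    for doc_terms in documents:
        present = [i for i, c in enumerate(clusters) if c & doc_terms]
        for i, j in combinations(present, 2):
            links[(i, j)] = links.get((i, j), 0) + 1
    return links


# Toy data: terms, close relations, and the terms found in each record.
terms = ["classification scheme", "universal classification scheme",
         "thesaurus", "thesaurus construction", "metadata"]
related = [("classification scheme", "universal classification scheme"),
           ("thesaurus", "thesaurus construction")]
documents = [{"classification scheme", "thesaurus"},
             {"universal classification scheme", "thesaurus construction",
              "metadata"},
             {"metadata"}]

clusters = group_terms(terms, related)
links = link_clusters(clusters, documents)
```

The resulting `links` dictionary is a weighted edge list between clusters, which is the kind of structure a network tool such as Pajek can draw as a thematic map.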

  22. Results (1)

  23. Results (2) • Main topics for period 1 (1988-1997) • Global structure: typical « core - peripheral » layout • Knowledge is the structuring pole • Classification • Subjects gravitating around the Knowledge pole: • analysis • online vocabulary control standardization • bibliographic information system • indexing (automatic & manual) • thesaurus construction and usage • information documentation system • translation

  24. Results (3) • In the last decade (1998-2008): • Research network is much more intertwined • No one center but several « core » issues connected to one another • Major topics are intertwined: • KO issues ↔ classification ↔ information theoretic ↔ indexing language ↔ user evaluation • Newer topics: web issues, metadata, knowledge discovery, computer algorithm, ...

  25. Results (4) • 1998-2008, equal divide between: • theoretical research: information science, concept, classification theory, epistemological foundation, ... • user-oriented studies: user librarian, user-defined descriptor, user evaluation • mainstream KO issues: classification, thesaurus, KO, term selection • technology-oriented handling of KO issues: knowledge, system, transfer, knowledge representation, knowledge engineering, knowledge discovery, information processing, computer algorithm, ...; web, web designer, web document; information retrieval, terminology structuring, metadata, metadata quality

  26. Discussion • Evaluation of clusters: an information-theoretic problem. No solution. No gold standard • Goal of the method: precisely to propose a partition of the data • Is it the best one? • Reliance on external criteria: human (expert) evaluation • So a response from the community is needed!

  27. Thank you for listening
