
Assessing the effectiveness of your current search and retrieval function

Anna G. Eslau, Information Specialist, H. Lundbeck A/S; Marianne Lykke Nielsen, Associate Professor, Royal School of Library and Information Science.


Presentation Transcript


  1. Anna G. Eslau, Information Specialist, H. Lundbeck A/S; Marianne Lykke Nielsen, Associate Professor, Royal School of Library and Information Science. Assessing the effectiveness of your current search and retrieval function. Case story evaluating human metadata indexing versus automatic query expansion using a corporate thesaurus. H. Lundbeck A/S, 20-Sep-14

  2. Agenda • Motivation • Case study • Research partners • Purpose • Test design • Findings • Conclusions • Summing up

  3. Motivation
     • A lot of money has been invested, but does our current search and retrieval function perform as expected?
     • An advanced and time-consuming indexing task has been laid upon our end users, but is our current indexing strategy effective?
     • Do we have alternatives to manual indexing that are of equally high quality?

  4. Agenda • Motivation • Case study • Research partners • Purpose • Test design • Findings • Conclusions • Summing up

  5. Case study - Research partners
     • H. Lundbeck A/S
       - Pharmaceutical company
       - 5,000 employees in more than 40 countries
       - Information systems with electronic documents
       - Corporate thesaurus
       - Users and search requests
     • Royal School of Librarianship
       - Thesaurus research expertise
       - Domain knowledge from former research project
     • Ensight A/S
       - Verity K2 search engine and Intelligent Classifier
       - Technical expertise

  6. Purpose of case study
     To evaluate:
     • Information retrieval based on controlled, human indexing (controlled metadata)
     • Information retrieval based on full-text indexing, with thesaurus-based automatic query expansion

  7. Case study – Retrieval system and indexing policy
     • Electronic document management system (EDMS) and bibliographic information system containing research documentation
     • Indexing policy
     • Written indexing policy
     • Mandatory training of indexers
     • Corporate Thesaurus
     • Human, controlled indexing
     • Topical checklist/Faceted indexing
     • Searching by controlled metadata and full text
     • Domain-specific thesaurus containing 5,500 concepts and 16,000 terms

  8. EDMS 1/2 - Indexing

  9. EDMS 2/2 – Searching

  10. Lundbeck Thesaurus 1/3

  11. Lundbeck Thesaurus 2/3

  12. Lundbeck Thesaurus 3/3

  13. Agenda • Motivation • Case study • Research partners • Purpose • Test design • Findings • Conclusions • Summing up

  14. Test design - Retrieval performance of different search strategies
      Three different search strategies were evaluated:
      • Searches based on natural language (words from the original request) in full text
      • Searches based on natural language in full text, expanded with words from the thesaurus (query expansion with synonyms and narrower terms)
      • Searches based on (manually assigned) controlled keywords in selected metadata fields

  15. Test design - Query expansion
      • Search for information about intravenous administration of a drug AND Alzheimer's disease:
        (Intravenous OR IV OR Intravenously OR …) AND (Alzheimer's disease OR Alzheimer's disorders OR Alzheimer type dementia OR …)
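The expansion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function names `expand_term` and `build_query`, and the quoted Boolean output syntax are hypothetical and are not the Verity K2 query syntax or the real structure of the Lundbeck Thesaurus. The terms themselves come from the slide.

```python
# Toy thesaurus: preferred term -> synonyms. The entries are the ones shown on
# the slide; the dictionary layout itself is an assumption for illustration.
THESAURUS = {
    "Intravenous": ["IV", "Intravenously"],
    "Alzheimer's disease": ["Alzheimer's disorders", "Alzheimer type dementia"],
}


def expand_term(term, thesaurus):
    """Return the term followed by its synonyms, without duplicates."""
    variants = []
    for candidate in [term, *thesaurus.get(term, [])]:
        if candidate not in variants:
            variants.append(candidate)
    return variants


def build_query(terms, thesaurus):
    """OR together the variants of each term, then AND the groups."""
    groups = []
    for term in terms:
        variants = " OR ".join(f'"{v}"' for v in expand_term(term, thesaurus))
        groups.append(f"({variants})")
    return " AND ".join(groups)


query = build_query(["Intravenous", "Alzheimer's disease"], THESAURUS)
print(query)
```

A term that is missing from the thesaurus is passed through unchanged, so the same function also covers the plain natural-language strategy.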

  16. Lundbeck Thesaurus

  17. Test design - Test persons and retrieval system
      • Persons
        - Query expansion tests were carried out by the thesaurus manager and did not involve end users
        - Evaluation of search results was carried out by end users: 4 subject experts (medical advisers) who had formerly answered the search requests
      • System
        - The Verity K2 search system was used as the test retrieval system for the query expansion test work
        - The original document management systems were used as the retrieval systems for the metadata searches

  18. Test design - Test thesaurus
      • The Lundbeck Thesaurus was the test thesaurus. The thesaurus formed the basis for query formulations:
        - Synonyms and narrower terms were picked from the thesaurus for the test searches based on expansion of natural language in full-text searches
        - Preferred keywords were picked from the thesaurus for the test searches based on controlled keywords in selected metadata fields

  19. Test design - Test collection
      • 25,384 document objects from two different sources
        - 24,369 document objects from a bibliographical (BRS) information system (internal research reports and published research articles)
        - 1,015 documents from the full-text EDMS system (internal research reports)

  20. Test design - Search requests
      • 10 search requests were selected from a set of searches that had been carried out in real life in the corporate information systems
      Work task 7: You are a medical reviewer. A physician has contacted you. He would like to have data on the use of Citalopram and Reboxetine together to treat resistant depression. He wants any reporting of possible interactions.
      Indicative request: Find reports, papers or case stories that investigate the possible interaction of Citalopram and Reboxetine on resistant depression

  21. Agenda • Motivation • Case study • Research partners • Purpose • Test design • Findings • Conclusions • Summing up

  22. Findings – Performance
      SJ = Search Job, QE = Query Expansion
      • Precision (% relevant docs out of all retrieved docs) went down from 33% to 24% with query expansion
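For readers who want to reproduce this kind of measurement, precision and recall can be computed from sets of document IDs, as in the minimal sketch below. The document IDs and result sets are invented toy data, not the study's results, and the function names are hypothetical. Recall here means the share of all relevant documents that were retrieved.

```python
def precision(retrieved, relevant):
    """Share of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Share of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Toy data (hypothetical IDs): documents judged relevant by the assessors.
relevant = {"d1", "d2", "d3", "d4"}

# Hits for a plain full-text search and for the same search with QE.
plain_hits = {"d1", "d2", "d5", "d6", "d7", "d8"}
expanded_hits = plain_hits | {"d3", "d9", "d10", "d11", "d12", "d13"}

for label, hits in [("plain", plain_hits), ("with QE", expanded_hits)]:
    p = precision(hits, relevant)
    r = recall(hits, relevant)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

In the toy data the expanded search finds one more relevant document but also more irrelevant ones, which is the same trade-off the slide reports.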

  23. Findings – Human indexing problems

  24. Findings – Other metadata
      • Topical retrieval and situational relevance ranking: the importance of contextual parameters
        - Document type
        - Publication year
        - Source
        - Language
        - Author

  25. Findings – Thesaurus
      • Thesaurus
        - Relevant synonyms (acronyms with multiple meanings should be omitted)
        - Logical hierarchies
        - High topical relevance

  26. Findings – Documents and search requests
      • Document collection
        - OCR-scanned documents may contain errors => false positive hits
        - Large (>100 pages) full-text documents lower precision (irrelevant hits)
      • Search requests
        - If people search using very general terms, QE becomes increasingly complicated and extensive the more levels of QE we choose to add
        - Different types of facets result in
          - Different relevance assessment according to document types
          - Different recall in metadata search
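The point about general terms and QE levels can be made concrete with a small sketch that expands a term with narrower terms down to a chosen depth. The hierarchy below is a made-up illustration and is not taken from the Lundbeck Thesaurus; `NARROWER` and `expand_narrower` are hypothetical names.

```python
# Illustrative hierarchy only: broader term -> narrower terms.
NARROWER = {
    "Dementia": ["Alzheimer type dementia", "Vascular dementia"],
    "Alzheimer type dementia": [
        "Early-onset Alzheimer type dementia",
        "Late-onset Alzheimer type dementia",
    ],
}


def expand_narrower(term, narrower, levels):
    """Collect the term plus its narrower terms down to `levels` levels."""
    result = [term]
    frontier = [term]
    for _ in range(levels):
        next_frontier = []
        for current in frontier:
            for child in narrower.get(current, []):
                if child not in result:
                    result.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return result


# Number of search terms produced for 0, 1 and 2 levels of expansion.
sizes = {
    levels: len(expand_narrower("Dementia", NARROWER, levels))
    for levels in range(3)
}
print(sizes)
```

Even in this tiny hierarchy the query grows with every added level. In a thesaurus with thousands of concepts, a general starting term could quickly run into a search engine's limit on the number of search terms.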

  27. Findings – Search software
      • Search software settings are important
        - Stemming
        - Case sensitivity
        - Character sensitivity (())
        - Number of search terms allowed
        - Zoning
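To illustrate why settings such as stemming and case sensitivity matter, the sketch below matches one query term against a sentence under different settings. It is a deliberately crude stand-in: `crude_stem` only strips a trailing "ly" and is not a real stemming algorithm, and none of this reflects how Verity K2 is configured. All names and the sample sentence are hypothetical.

```python
def crude_stem(word):
    """Strip a trailing 'ly'. A toy stand-in for a real stemmer."""
    if word.endswith("ly") and len(word) > 4:
        return word[:-2]
    return word


def normalize(token, case_sensitive, stemming):
    """Apply the chosen settings to a single token."""
    if not case_sensitive:
        token = token.lower()
    if stemming:
        token = crude_stem(token)
    return token


def matches(query_term, text, *, case_sensitive=False, stemming=False):
    """True if any word in the text equals the query term after normalization."""
    target = normalize(query_term, case_sensitive, stemming)
    for raw in text.split():
        token = raw.strip(".,;:()")  # drop surrounding punctuation
        if normalize(token, case_sensitive, stemming) == target:
            return True
    return False


text = "The drug was administered Intravenously."
no_stemming = matches("intravenous", text)
with_stemming = matches("intravenous", text, stemming=True)
case_sensitive = matches("intravenous", text, stemming=True, case_sensitive=True)
print(no_stemming, with_stemming, case_sensitive)
```

The same document is found or missed depending on the settings alone, so the settings need to be fixed and recorded before comparing search strategies.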

  28. Agenda • Motivation • Case study • Research partners • Purpose • Test design • Findings • Conclusions • Summing up

  29. Conclusion – Thesaurus and QE
      • A domain-specific thesaurus is well suited for QE
      • QE improves recall but decreases precision
      • QE with synonyms only is in most cases sufficient

  30. Conclusion - Search result display
      • Users want to see all hits (recall is important)
      • Users request manual sorting of search results by metadata other than topical metadata
      • Ranking based on e.g. zoning is not always useful

  31. Conclusion – Indexing policy
      • It is difficult to obtain complete, accurate and exhaustive human indexing
      • Findings suggest that searching for specific topics should be based on full-text indexing, supported by thesaurus-based query expansion
      • Human indexing should focus on a few important, well-defined topics, e.g. topics used to develop taxonomies for broad browsing
      • Non-topical context metadata are important in assessing document relevance
        - Document type
        - Publication year
        - Source
        - Language
        - Author

  32. Conclusion – Implications for Lundbeck
      • The Lundbeck Thesaurus has been integrated with the bibliographic information system to perform automated QE
      • An EDMS upgrade is planned in which QE should be possible
      • OCR scanning of existing documents is being considered
      • Metadata on document types in the EDMS are being evaluated and revised (simplified)
      • New models for adding metadata are being considered (dictionaries)
      • New indexing tools for the users are being developed (indexing keys)

  33. Agenda • Motivation • Case study • Research partners • Purpose • Test design • Findings • Conclusions • Summing up

  34. Summing up
      • If your current search and retrieval function does NOT perform as expected, your organisation may lose important information
      • You may have an indexing strategy (which is good…), but evaluation may reveal that the resource investments could be used even better
      • Evaluation is important: it may save your organisation money over time
