Efficient Search Strategies and Techniques for Information Retrieval

Learn about different search strategies and techniques for effective information retrieval, including building blocks search, successive facet strategies, and pairwise facets. Improve your search results with practical tips and tricks.

sheppardm

Presentation Transcript


  1. Information Retrieval Strategies and Techniques • Huang Mu-hsuan (黃慕萱), Chap. 6 • Harter, Chap. 7

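  2. Search Strategy vs. Search Heuristics • 1979, Marcia Bates, "Information Search Tactics" • Search strategy • An overall consideration or comprehensive plan for a search problem • e.g., building blocks search, citation pearl growing, etc. • Search heuristics • Actions taken to accomplish a specific purpose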

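  3. Briefsearch • Simple search • The most common type of search • Quick and simple, but often low recall and low precision • Suitable when • The bibliographic data are already known • The topic is clear-cut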

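  4. Building Blocks Search (分區組合檢索法) • Also called "block building" • Procedure • Break the search problem into several facets • Determine the relationships among the facets • Facets are usually related by "AND"; "OR" and "NOT" occur less often • Identify search terms that represent each facet • Combine the terms within a facet with Boolean "OR" (union) for completeness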

  5. Building Blocks Search Strategy--1/4 • Conduct reference interviews • Formulate search objectives • High recall • High precision • Moderate levels of recall and precision • Select database(s) and search system • Identify major concepts or facets and their logical relationships with one another

  6. Building Blocks Search Strategy--2/4 • Identify • search strings that represent the concepts • Words • Full-text phrases • Pieces of words • Descriptors • Identifiers • Codes • Non-semantic bibliographic characteristics • fields to be searched

  7. Building Blocks Search Strategy--3/4 • For each distinct facet of the search, a set of postings will be created for each search string within that facet. The sets are then combined into a single set representing that facet using Boolean OR • Following step #6, the facet sets themselves will be combined with Boolean AND and NOT • Plan alternatives

  8. Building Blocks Search Strategy--4/4 • Formulate the initial statements of the search in the command language of the system • Log on and put the search to the system • Evaluate the intermediate results • Iterate • Use the interactive features of the system to carry out search heuristics: tactics, maneuvers, strategies, tricks, devices, and approaches that try to improve search results

  9. Building blocks approach • Facet A: Term A1 OR Term A2 OR … OR Term Ap • Facet B: Term B1 OR Term B2 OR … OR Term Bq • Facet C: Term C1 OR Term C2 OR … OR Term Cr • Boolean combination of facets (AND, OR, NOT) • Answer set
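
The diagram above can be sketched in code: terms inside a facet are joined with OR, and the facets are then joined with AND, the most common relationship between facets. The function name `build_query` and the sample terms are illustrative assumptions, not part of the slides, and the output is a generic Boolean string rather than the command language of any particular search system.

```python
def build_query(facets):
    """Join the terms in each facet with OR, then join the facets with AND."""
    blocks = []
    for terms in facets:
        # Quote multi-word terms so they read as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


# Hypothetical facets, for illustration only.
facets = [
    ["risk", "risk aversion"],
    ["measurement", "assessment"],
]
query = build_query(facets)
print(query)
```

Keeping each facet as its own list makes it easy to add or drop a term or a whole facet when the search is revised.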

  10. Building Blocks search sample • Topic: Measurement of Risk Tendencies (looking for high recall) • Boolean combination: ((RISK AND MEASUREMENT) OR RISK AVERSION OR BEHAVIORAL DECISION THEORY) NOT INSURANCE
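
The Boolean combination in this sample maps directly onto set operations over postings: AND is intersection, OR is union, and NOT is difference. The sketch below evaluates the same expression on invented document IDs; the postings are assumptions made up for illustration and do not come from any real database.

```python
# Toy postings: each search string maps to a set of document IDs (invented).
postings = {
    "RISK": {1, 2, 3, 4},
    "MEASUREMENT": {2, 3, 7},
    "RISK AVERSION": {4, 5},
    "BEHAVIORAL DECISION THEORY": {6},
    "INSURANCE": {3, 5},
}

# (RISK AND MEASUREMENT): intersection
risk_and_measurement = postings["RISK"] & postings["MEASUREMENT"]

# ... OR RISK AVERSION OR BEHAVIORAL DECISION THEORY: union
facet = (
    risk_and_measurement
    | postings["RISK AVERSION"]
    | postings["BEHAVIORAL DECISION THEORY"]
)

# ... NOT INSURANCE: difference
answer_set = facet - postings["INSURANCE"]
print(sorted(answer_set))
```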

  11. Reviewing the results and searching again • To increase recall • find additional concepts or search terms to add to one or more facets • delete a facet • To increase precision • delete some of the broader or more ambiguous terms in the facets • add an additional facet to be intersected with the others
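
The trade-off on this slide can be checked numerically. The sketch below assumes the standard definitions (recall is the share of relevant documents that were retrieved; precision is the share of retrieved documents that are relevant) and uses invented document IDs. Deleting a facet raises recall and lowers precision in this toy case; adding it back does the reverse.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


# Invented data: documents 1-4 are the relevant ones.
relevant = {1, 2, 3, 4}
facet_a = {1, 2, 3, 4, 5, 6}
facet_b = {1, 2, 3, 5, 7}
facet_c = {1, 2, 8}

three_facets = facet_a & facet_b & facet_c  # narrower search
two_facets = facet_a & facet_b              # facet C deleted

for name, hits in [("three facets", three_facets), ("two facets", two_facets)]:
    print(name, "recall:", recall(hits, relevant),
          "precision:", precision(hits, relevant))
```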

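  12. Successive facet strategies--1/3 (主題層面連續檢索法) • Other names • fewest postings first • most specific concept first • successive fractions (successive searching that does not start from a subject facet) • Building blocks search uses all of the facets • The successive facet strategy tries to use as few facets as possible • After identifying the facets of the search problem, rank them by priority, then decide from the results whether to continue searching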

  13. Successive facet strategies--2/3 • First facet • AND second facet (optional) • AND other facets (optional) • Solution set • Sample: search for "members and activities of 4-H clubs"

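  14. Successive facet strategies--3/3 • When to use • When combining all the facets with Boolean operators is likely to retrieve zero records • When one or two facets of the search problem are quite vague in meaning • When the search problem includes non-subject criteria, such as document type, language, or publication year, which can be treated as the first search concept • When the searcher would rather tolerate false drops than miss relevant articles • When the time and money spent adding further facets would likely exceed the cost of simply printing the search results • When there is little relevant literature and the searcher is willing to look through some less relevant articles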
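
A minimal sketch of the "fewest postings first" variant, under stated assumptions: facets are ANDed one at a time, starting with the smallest set, and the search stops as soon as the result is small enough to review or the next facet would leave zero records. The function name `successive_facets`, the `max_hits` threshold, and the postings for the 4-H clubs sample are all invented for illustration.

```python
def successive_facets(facet_sets, max_hits):
    """AND facets in order of fewest postings, stopping once the set is small enough."""
    ordered = sorted(facet_sets.items(), key=lambda item: len(item[1]))
    used = []
    result = None
    for name, docs in ordered:
        candidate = docs if result is None else result & docs
        if result is not None and not candidate:
            break  # this facet would give zero hits; keep the previous set
        result = candidate
        used.append(name)
        if len(result) <= max_hits:
            break  # few enough records to review directly
    return used, result


# Invented postings for the sample "members and activities of 4-H clubs".
facets = {
    "4-H clubs": {1, 2, 3},
    "members": {2, 3, 4, 5, 6},
    "activities": {3, 7, 8, 9, 10, 11},
}

used_loose, hits_loose = successive_facets(facets, max_hits=5)
used_tight, hits_tight = successive_facets(facets, max_hits=2)
print(used_loose, sorted(hits_loose))
print(used_tight, sorted(hits_tight))
```

With a generous threshold the most specific facet alone is enough; with a tighter one a second facet is added, and the third is never needed.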

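  15. Pairwise Facets--1/3 (主題層面配對法) • Pair the facets two at a time and intersect each pair, then take the union of the results • When to use • All facets are equally important • The facets differ little in specificity or vagueness • Combining all the facets would retrieve zero records • Note: when there are many facets, use 3-4 as the basic unit for intersection to avoid confusion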

  16. Pairwise Facets--2/3 • Building blocks search: A AND B AND C • Pairwise facets search: (A AND B) OR (A AND C) OR (B AND C)

  17. Pairwise Facets--3/3 • Each pair of Facets #1, #2, and #3 is combined with AND to give Solution Sets A, B, and C • Final solution set: A OR B OR C • Sample: A doctoral student wants a high recall bibliography prepared on the relationship between facial musculature and the physiological (autonomic) responding of emotions, e.g., fear.
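
The pairwise combination can be sketched with set operations: intersect every pair of facets, then take the union of those intersections. The function name `pairwise_facets` and the document IDs are assumptions for illustration. The example also shows why the strategy helps recall: the three-way AND keeps one record, while the pairwise form keeps three.

```python
from itertools import combinations


def pairwise_facets(facet_sets):
    """Union of the intersections of every pair of facets."""
    result = set()
    for left, right in combinations(facet_sets, 2):
        result |= left & right
    return result


# Invented postings for three facets.
a = {1, 2, 3, 4}
b = {3, 4, 5}
c = {1, 4, 6}

all_three = a & b & c                  # A AND B AND C
pairwise = pairwise_facets([a, b, c])  # (A AND B) OR (A AND C) OR (B AND C)
print(sorted(all_three), sorted(pairwise))
```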

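  18. Citation Pearl Growing (引用文獻滾雪球法) • Aims at high precision • Start from 100% precision (articles known to be relevant) and work back toward recall • Repeatedly take descriptors, identifiers, and words from the known relevant documents and search again • When to use • The database has no thesaurus or vocabulary list • Emerging disciplines • Often requires many repeated searches; not suitable for beginners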
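
The loop behind pearl growing can be sketched as follows. This is a simplification under a loud assumption: every retrieved record is treated as relevant, whereas a real searcher would judge each record and pick only the useful terms before searching again. The function name `pearl_growing`, the `rounds` limit, and the records are invented for illustration.

```python
def pearl_growing(records, seed_ids, rounds=3):
    """Grow a result set from known relevant records by harvesting their descriptors."""
    found = set(seed_ids)
    terms = set()
    for _ in range(rounds):
        # Harvest descriptors from everything found so far
        # (assumption: all found records count as relevant).
        new_terms = set().union(*(records[i] for i in found)) - terms
        if not new_terms:
            break  # nothing new to search on
        terms |= new_terms
        # Search again: retrieve any record sharing a harvested term.
        found |= {i for i, descs in records.items() if descs & terms}
    return found, terms


# Invented records: document ID -> set of descriptors.
records = {
    1: {"facial expression", "emotion"},
    2: {"emotion", "autonomic response"},
    3: {"autonomic response", "heart rate"},
    4: {"insurance"},
}

found, terms = pearl_growing(records, seed_ids={1})
print(sorted(found))
print(sorted(terms))
```

Starting from one known relevant record, the set grows to three records over successive rounds, while the unrelated record is never reached.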

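  19. Other facet strategies • Multiple Briefsearch • Use different databases to obtain recall as high as possible • Interactive Scanning • most time-consuming and interactive • e.g., using classification codes, natural language • Implied Concepts • Recognize implied concepts and choose different terms according to the subject nature of the database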

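  20. Citation indexing strategies • Build search strategies on the relationship between citing and cited documents • Offer highly interdisciplinary and multidisciplinary approaches to online searching • Search strategies • Cited Publication, Cited Author, Cocited Authors • Humanities citation database (THCI) of the National Science Council Research Center for Humanities (國科會人文學研究中心人文學引用文獻資料庫) http://www.hrc.ntu.edu.tw/index.htm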
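
As a rough illustration of the citing/cited relationship, the sketch below inverts a small table of reference lists into a "cited by" index and counts how often two works are cited together. The paper labels P1-P3 and the cited works A-C are placeholders invented for illustration, and the data structures are an assumption, not the design of any real citation index.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder data: each citing paper maps to the works it cites.
references = {
    "P1": ["A", "B"],
    "P2": ["A", "B", "C"],
    "P3": ["A"],
}

# Cited-publication search: which papers cite a given work?
cited_by = defaultdict(set)
for citing, cited_list in references.items():
    for cited in cited_list:
        cited_by[cited].add(citing)

# Co-citation: how often do two works appear in the same reference list?
cocitation = defaultdict(int)
for cited_list in references.values():
    for pair in combinations(sorted(set(cited_list)), 2):
        cocitation[pair] += 1

print(sorted(cited_by["A"]))
print(cocitation[("A", "B")])
```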

  21. Non-subject, fact, and multiple database searching • Non-subject searching • Document type, year of publication, language, author, corporate source • doublelimiting • Fact searching • Search for a known item • Multiple database searching

  22. Search Heuristics • Language Heuristics • Command Language, Database and File Structure Heuristics • Recall and Precision Heuristics • Heuristics for Increasing Recall • Heuristics for Increasing Precision • Personal Heuristics

  23. Language Heuristics--1/2 • Use natural language searching when: • One or more of the concepts of interest involves a subtle nuance of meaning • One or more of the concepts of interest is highly specific • One or more of the concepts is relatively new and appropriate terms in the controlled vocabulary do not exist • A highly comprehensive search is desired (high recall) • The literature to be searched is "soft"

  24. Language Heuristics--2/2 • Use controlled vocabulary searching when: • The concepts of interest can be expressed precisely and unambiguously in the controlled vocabulary • A limited search retrieving a limited number of highly pertinent items is desired • The literature to be searched is "hard"

  25. Command Language, Database and File Structure Heuristics—1/2 • Know the stop words used by the search system • Know the sort order associated with the binary coding system used by the host computer • Know which fields are searched by default, if search fields are not explicitly specified

  26. Command Language, Database and File Structure Heuristics—2/2 • Know the parsing rule used to index each field searched • Always question null sets • Understand Boolean operations with the null set and make use of this knowledge in reformulating search statements
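
The advice about null sets follows from how Boolean operators behave on an empty set, which Python's built-in sets show directly. The document IDs are invented; the empty set stands for a term that matched nothing, for example because of a misspelling or a stop word.

```python
hits = {1, 2, 3}  # postings for a valid search term (invented IDs)
null = set()      # postings for a term that matched nothing

and_result = hits & null  # AND with a null set wipes out the whole search
or_result = hits | null   # OR silently ignores the null set, hiding the error
not_result = hits - null  # NOT with a null set excludes nothing

print(and_result, or_result, not_result)
```

The OR case is the reason to question null sets at the moment they appear: once a null set is ORed into a facet, the final result gives no sign that one term failed.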

  27. Questions to ask in low recall—1/2 • Am I in the correct database? • Have I overspecified the search problem? • Is there anything done on the topic or problem? Is there a literature on this search problem? • Have sufficient search terms been included to properly represent each concept of the search?

  28. Questions to ask in low recall--2/2 • Were the proximity specifications placed on the search terms too restrictive? • Was Boolean logic used correctly? • Did I make a technical error, e.g., in spelling or command syntax? • Should I be searching in natural language fields? • Have all word forms of search terms been used? Should truncation be employed?

  29. Heuristics for Increasing Recall --1/2 • Use additional synonyms and near synonyms combined with Boolean OR to represent search concepts • Use more generic terms in addition to specific terms to represent search concepts • Use natural language in addition to controlled vocabulary terms • Search additional subject fields

  30. Heuristics for Increasing Recall --2/2 • Delete AND and NOT facets from the formulation • Increase term truncation • Use less restrictive proximity operators, e.g., require that terms appear in the same paragraph rather than the same sentence • Remove any restrictions from the formulation, e.g., language, date of publication, type of publication
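
Truncation can be pictured as prefix matching against the index vocabulary: the shorter the stem, the more word forms are swept in. The sketch below uses an invented five-term index and a hypothetical helper `search_truncated`; real systems use their own truncation symbols and rules.

```python
# Invented index: term -> set of document IDs.
index = {
    "measure": {1},
    "measured": {2},
    "measurement": {3, 4},
    "measurements": {5},
    "meat": {9},
}


def search_truncated(stem):
    """Union the postings of every index term that starts with the stem."""
    hits = set()
    for term, docs in index.items():
        if term.startswith(stem):
            hits |= docs
    return hits


print(sorted(search_truncated("measurement")))  # singular and plural forms only
print(sorted(search_truncated("measur")))       # all forms of the word
print(sorted(search_truncated("mea")))          # too short: also matches "meat"
```

The last call shows the cost of truncating too severely: recall rises, but an unrelated term enters the result as a false drop.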

  31. Questions to ask in low precision--1/2 • Am I in the correct database? • Have I underspecified the search problem? • Do I need to disambiguate a concept of the problem? • Have I used Boolean logic correctly? • Have I included vague or ambiguous terms, or terms that are too generic?

  32. Questions to ask in low precision--2/2 • Should I restrict search terms to elements of a controlled vocabulary? • Were the proximity specifications placed too loosely on the search terms? • Are false drops resulting from concepts having an unintended relationship with one another? • Has a search term been truncated too severely?

  33. Heuristics for Increasing Precision --1/2 • Delete near synonyms and potentially ambiguous terms • Use more specific terms to represent concepts • Use controlled vocabulary terms if a concept is precisely represented by them; delete controlled vocabulary terms that do not describe a concept precisely • If multiple meaning does not appear to be a major problem, search natural language terms that represent the concepts of interest precisely

  34. Heuristics for Increasing Precision --2/2 • If none of the above conditions applies, search fewer subject fields, deleting fields in this approximate order: full text, abstract, title, identifier, and descriptor • Add additional facets with AND and NOT • Decrease term truncation • Use more restrictive proximity operators • Add restrictions to the formulation, e.g., by date of publication, type of publication, language, etc.
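
The effect of a proximity operator can be sketched with a simple word-distance check. The helper `within` and its `max_gap` parameter are hypothetical and only illustrate the idea; actual proximity operators and their syntax differ from system to system.

```python
def within(text, term_a, term_b, max_gap):
    """True if term_a and term_b occur within max_gap words of each other."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


# Invented sentence in which the two terms are seven words apart.
doc = "risk is hard to define but its measurement matters"

loose = within(doc, "risk", "measurement", max_gap=10)  # less restrictive
tight = within(doc, "risk", "measurement", max_gap=2)   # more restrictive
print(loose, tight)
```

Tightening the gap drops this document from the result, which is what raises precision when the two terms are only loosely related in the text.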

  35. Personal Heuristics—1/2 • Be flexible; stay loose; be willing to look at a search in more than one way. Avoid rigidity in thought and action. • Browse samples of retrieved citations to assess relevancy. • Browse samples of retrieved citations to generate additional search terms. • Be heuristic, interactive. Don’t do “fast batch” searching.

  36. Personal Heuristics—2/2 • Evaluate one’s own work critically. • Always be skeptical of search output. • A mindless faith in controlled vocabularies is not always justified. Be critical of the adequacy of artificial languages for the representation of concepts in documents.
