
Dynamic Element Retrieval in a Structured Environment

Dynamic Element Retrieval in a Structured Environment. Crouch, Carolyn J. University of Minnesota Duluth, MN. October 1, 2006. Key problems: retrieval of elements at the desired level of granularity; assigning a rank order to each element that reflects its perceived relevance to the query.

Presentation Transcript


  1. Dynamic Element Retrieval in a Structured Environment • Crouch, Carolyn J. • University of Minnesota Duluth, MN • October 1, 2006

  2. Key Problems • Retrieval of elements at desired level of granularity • Assigning a rank order to each element that reflects its perceived relevance to the query

  3. Retrieval Environment • Vector Space Model • INEX Environment • Flexible Retrieval

  4. Vector Space Model • Document Indexing • Term Weighting • Similarity Coefficients

  5. INEX: Initiative for the Evaluation of XML Retrieval • INEX provides an environment for experiments in structured retrieval • Traditionally contains two types of topics: CO (content-only) and CAS (content-and-structure) • Both INEX 2004 and 2005 use an evaluation measure known as inex-eval • Recall (the proportion of relevant information retrieved) and Precision (the proportion of retrieved items that are relevant)
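
The two definitions on this slide can be illustrated with a minimal sketch. It uses the classic set-based form of precision and recall, not the graded inex-eval measure itself, and the element identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items against a relevant set."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: share of retrieved items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: share of relevant items that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical element identifiers, for illustration only.
retrieved = ["e1", "e2", "e3", "e4"]
relevant = {"e2", "e4", "e7"}
precision, recall = precision_recall(retrieved, relevant)
```

Here two of the four retrieved elements are relevant, and two of the three relevant elements were retrieved.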

  6. Flexible Retrieval System • System processes XML documents • Smart format (Salton’s Magic Automatic Retriever of Text) • Lnu-ltu term weighting

  7. A Method for Flexible Retrieval • Input to Flexible Retrieval • Construction of the Document Tree • Ranking of Elements • Output of Flexible Retrieval

  8. Input to Flexible Retrieval • Preorder traversal • Ranked terminal leaf nodes (paragraphs) • Generate document tree (schema and paragraphs)

  9. Document Tree

  10. Construction of the Document Tree • Schema determines the document tree • Calculate Lnu-ltu term weights

  11. Ranking of Elements • Address ranking issues with Lnu-ltu term weighting • Length and normalization issues • Pivot and slope

  12. Simple structured document

  13. Lnu (weight of element vector formula) $$\mathrm{Lnu} = \frac{\dfrac{1 + \log(\text{term frequency})}{1 + \log(\text{average term frequency})}}{(1 - \text{slope}) + \text{slope} \times \dfrac{\text{number of unique terms}}{\text{pivot}}}$$
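
A minimal sketch of the Lnu formula on this slide, written as a Python function. The function and parameter names are hypothetical, the natural logarithm is an assumption (the slide does not state a base), and the sample values are illustrative only.

```python
import math


def lnu_weight(tf, avg_tf, num_unique_terms, slope, pivot):
    """Lnu weight of one term in an element vector, per the slide's formula."""
    if tf <= 0:
        return 0.0  # a term that does not occur contributes nothing
    # Numerator: log-dampened term frequency relative to the element's average.
    numerator = (1 + math.log(tf)) / (1 + math.log(avg_tf))
    # Denominator: pivoted normalization by the number of unique terms.
    denominator = (1 - slope) + slope * (num_unique_terms / pivot)
    return numerator / denominator


# Illustrative values only; slope and pivot are tuning parameters.
w = lnu_weight(tf=3, avg_tf=1.5, num_unique_terms=40, slope=0.2, pivot=40)
```

When the number of unique terms equals the pivot, the denominator is 1 and the weight reduces to the numerator, which is a convenient sanity check.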

  14. Ltu (weighting of query terms formula) $$\mathrm{Ltu} = \frac{(1 + \log(\text{term frequency})) \times \log(N / n_k)}{(1 - \text{slope}) + \text{slope} \times \dfrac{\text{number of unique terms}}{\text{pivot}}}$$
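
The Ltu formula can be sketched the same way. In this sketch N is read as the number of items in the collection and n_k as the number of items containing term k; the function and parameter names are hypothetical, and the natural logarithm is an assumption.

```python
import math


def ltu_weight(tf, n_items, n_k, num_unique_terms, slope, pivot):
    """Ltu weight of one query term, per the slide's formula."""
    if tf <= 0 or n_k <= 0:
        return 0.0  # absent term, or term unseen in the collection
    # Numerator: log-dampened term frequency times inverse document frequency.
    numerator = (1 + math.log(tf)) * math.log(n_items / n_k)
    # Denominator: the same pivoted normalization used by Lnu.
    denominator = (1 - slope) + slope * (num_unique_terms / pivot)
    return numerator / denominator


# Illustrative values only.
w = ltu_weight(tf=1, n_items=1000, n_k=10, num_unique_terms=5, slope=0.2, pivot=5)
```

With a term frequency of 1 and a normalization factor of 1, the weight is simply log(N / n_k).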

  15. Overview of flexible retrieval 1. Parse to extract leaf nodes from the original XML documents 2. Index leaf nodes and queries using Smart 3. Perform Smart retrieval to get highly correlated leaf nodes
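
Step 1 might look like the following sketch, which walks an XML document in preorder and collects paragraph leaf nodes with a positional path for each. The tag names (`article`, `sec`, `p`), the path format, and the function name are assumptions for illustration; the slides do not show the actual parser.

```python
import xml.etree.ElementTree as ET


def extract_leaves(xml_text, leaf_tag="p"):
    """Preorder walk returning (path, text) for every element tagged leaf_tag."""
    root = ET.fromstring(xml_text)
    leaves = []

    def visit(node, path):
        if node.tag == leaf_tag:
            # Treat the paragraph as a terminal node and keep all of its text.
            leaves.append((path, "".join(node.itertext()).strip()))
            return
        counts = {}  # per-tag sibling counter used to build positional paths
        for child in node:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    visit(root, f"/{root.tag}[1]")
    return leaves


# Hypothetical miniature document.
sample = (
    "<article><sec><p>xml retrieval</p><p>vector space</p></sec>"
    "<sec><p>term weights</p></sec></article>"
)
leaves = extract_leaves(sample)
```

Each leaf keeps its path so that later steps can locate its ancestors in the document tree.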

  16. Overview of flexible retrieval(cont) 4. For each document containing a retrieved leaf node a. Get its document schema b. Generate vector representations for inner nodes (elements) 5. For each term in the query a. Get its inverted file entry and corresponding xpaths b. Find nk at all levels
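
Step 4b can be sketched as a bottom-up aggregation: each leaf's term counts are added to every ancestor on its path. Summing raw term frequencies is an assumption made here for illustration (the slides only say that vectors for inner nodes are generated), and the paths and terms are hypothetical.

```python
from collections import Counter


def build_element_vectors(leaf_terms):
    """Map every element path (leaf or ancestor) to aggregated term counts."""
    vectors = {}
    for path, terms in leaf_terms.items():
        tf = Counter(terms)
        parts = path.strip("/").split("/")
        # Add the leaf's counts to the leaf itself and to each of its ancestors.
        for depth in range(1, len(parts) + 1):
            element = "/" + "/".join(parts[:depth])
            vectors.setdefault(element, Counter()).update(tf)
    return vectors


# Hypothetical leaf nodes with already-tokenized text.
leaf_terms = {
    "/article[1]/sec[1]/p[1]": ["xml", "retrieval"],
    "/article[1]/sec[1]/p[2]": ["xml", "index"],
    "/article[1]/sec[2]/p[1]": ["retrieval"],
}
vectors = build_element_vectors(leaf_terms)
```

The resulting counts per element would then be turned into Lnu weights before being compared with the query.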

  17. Output of Flexible Retrieval • Equivalent to all-element index

  18. Experiments in flexible retrieval • Factors of interest • Experiments and results

  19. Factors of interest • Slope and pivot during Lnu-ltu term weighting • n (the number of paragraphs)

  20. Experiments and Results • Attendant file sizes (dictionary, inverted index, element vectors) reduced by 60%, 50%, and 50%, respectively • 30% to 40% less storage than the all-element index • Is dynamic element retrieval cost-effective?

  21. Conclusion • Similar work (Grabs and Schek) • Exhaustivity dependent • Progress in specificity

  22. Researchers • Grabs and Schek (similar work to flexible retrieval) • Gövert et al. (term weights are multiplied by a collection-dependent augmentation factor as they are propagated up the document tree) • Mass et al. (maintain separate indices for elements at different levels of granularity; this solves the issue of distorted statistics)

  23. Overview of flexible retrieval(cont) 6. Correlate element vectors at each level with query 7. Return ranked list of elements
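
Steps 6 and 7 amount to scoring each element vector against the query and sorting. The sketch below assumes an inner-product similarity, which the slides do not name explicitly; the weights and element paths are hypothetical.

```python
def rank_elements(element_vectors, query_vector):
    """Score each element by inner product with the query, highest first."""
    scored = []
    for element, weights in element_vectors.items():
        score = sum(w * query_vector.get(term, 0.0) for term, w in weights.items())
        if score > 0:
            scored.append((element, score))  # drop elements with no query terms
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weighted vectors (for example, Lnu for elements and ltu for the query).
element_vectors = {
    "/article[1]": {"xml": 0.4, "retrieval": 0.3},
    "/article[1]/sec[1]": {"xml": 0.9},
    "/article[1]/sec[2]": {"index": 0.7},
}
query_vector = {"xml": 1.0, "retrieval": 0.5}
ranking = rank_elements(element_vectors, query_vector)
```

Because elements at every level of the tree are scored together, a section can outrank the article that contains it.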

  24. Table I

                    INEX 2004                 INEX 2005
      article       12,107                    16,440
      sections      69,577                    94,421
      subsections   77,397                    104,746
      paragraphs    1,029,747                 1,378,202
      elements      1,188,828                 1,593,809
      CO Topics     40 Topics (34 assessed)   40 Topics (29 assessed)

  25. Table II. Comparison of All-Element and Flexible Retrieval under Inex-Eval (Generalized): Precision at Rank

              2004                      2005
      Rank    All Element   Flexible    All Element   Flexible
      1       0.3897        0.3971      0.4224        0.4224
      5       0.3088        0.2882      0.3241        0.3413
      10      0.2735        0.2669      0.2991        0.2991
      20      0.2529        0.2390      0.2841        0.2939
      25      0.2456        0.2379      0.2669        0.2800
      50      0.2000        0.1972      0.2364        0.2366
      100     0.1523        0.1501      0.1921        0.1920
      500     0.0697        0.0697      0.0943        0.0949
      1500    0.0353        0.0362      0.0472        0.0483

  26. Table II (cont.): Precision at Various Points of Recall

                  2004                      2005
      Recall      All Element   Flexible    All Element   Flexible
      0.01        0.3395        0.3348      0.3562        0.3693
      0.25        0.0971        0.0951      0.1131        0.1165
      0.50        0.0257        0.0283      0.0385        0.0404
      0.75        0.0017        0.0017      0.0097        0.0095
      1.00        0.0013        0.0013      0.0015        0.0015
      avg prec    0.0625        0.0620      0.0739        0.0750

  27. Table III. Comparison of All-Element and Flexible Retrieval under Inex-Eval (Strict): Precision at Rank

              2004                      2005
      Rank    All Element   Flexible    All Element   Flexible
      1       0.2000        0.2000      0.1481        0.1481
      5       0.1440        0.1200      0.0667        0.0741
      10      0.1240        0.1200      0.0852        0.0778
      20      0.1120        0.1020      0.0815        0.0815
      25      0.1024        0.0992      0.0800        0.0830
      50      0.0898        0.0832      0.0689        0.0681
      100     0.0628        0.0608      0.0511        0.0500
      500     0.0268        0.0259      0.0219        0.0217
      1500    0.0141        0.0143      0.0096        0.0097

  28. Table III (cont.): Precision at Various Points of Recall

                  2004                      2005
      Recall      All Element   Flexible    All Element   Flexible
      0.01        0.2134        0.2115      0.1521        0.1535
      0.25        0.1006        0.1070      0.0540        0.0515
      0.50        0.0411        0.0394      0.0156        0.0191
      0.75        0.0166        0.0159      0.0103        0.0104
      1.00        0.0042        0.0044      0.0046        0.0048
      avg prec    0.0586        0.0577      0.0318        0.0335
