DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT


  1. DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT MAYURI UMRANIKAR

  2. CONTENTS
     • Introduction
     • Retrieval Environment
        - The Vector Space Model
        - INEX Environment
        - Flexible Retrieval System
     • Method Used for Retrieval
        - Document Tree Construction
        - Ranking of Elements
        - Output
     • Experiments
     • Conclusions

  3. INTRODUCTION
     • XML (Extensible Markup Language) is increasingly preferred for representing documents; as the number of such documents grows, the problem of element retrieval arises
     • The focus is on retrieving relevant elements rather than entire documents
     • INEX: the INitiative for the Evaluation of XML Retrieval
     • Flexible retrieval mechanisms
     • Different approaches
     • Term weighting

  4. RETRIEVAL ENVIRONMENT
     • Two factors: the issues that arise when the focus moves from whole documents to their components, and Salton's Vector Space Model
     • Vector Space Model: a term's weight is based on the number of times it occurs in the document
     • Fox's Extended Vector Space Model: incorporates objective identifiers
     • The document vector consists of subvectors
     • The text in each subvector is indexed, weighted, searched and retrieved independently
     • Term weighting: weighting is applied within the subjective subvectors
     • Retrieval is carried out with the SMART experimental retrieval system
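A minimal sketch of the extended vector idea, assuming raw term frequencies as weights and a plain inner product as the similarity (SMART's actual weighting schemes differ). The subvector names `body` and `author` and all helper functions are hypothetical; the point is only that each subvector is indexed and searched independently of the others.

```python
from collections import Counter

def extended_vector(subvectors: dict[str, str]) -> dict[str, Counter]:
    """Index each (hypothetical) subvector independently as raw term frequencies."""
    return {name: Counter(text.lower().split()) for name, text in subvectors.items()}

def inner_product(a: Counter, b: Counter) -> float:
    """Similarity of two term-frequency vectors: sum over shared terms."""
    return float(sum(w * b[t] for t, w in a.items() if t in b))

def subvector_similarity(doc: dict[str, Counter], query: dict[str, Counter], name: str) -> float:
    """Search a single subvector, ignoring all the others."""
    return inner_product(doc.get(name, Counter()), query.get(name, Counter()))
```

For a document with body "xml element retrieval xml" and a query body "xml retrieval", the `body` similarity is 2·1 + 1·1 = 3, while an `author` subvector absent from the query contributes nothing.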

  5. INEX ENVIRONMENT
     • Content-Only (CO) topics ignore document structure; like typical queries, they specify only the content of the search
     • Content-and-Structure (CAS) topics explicitly refer to document structure, so they state the request more exhaustively and specifically
     • CO queries are run directly as posed by the user; CAS queries require additional filtering plus a search of the body portion
     • CAS retrieval returns a rank-ordered list of elements
     • INEX-EVAL evaluates the ranked results with recall and precision, mapping exhaustivity and specificity to a single relevance value (see figure)
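To make the evaluation step concrete, here is a simplified sketch of rank-based precision and recall. It assumes INEX-style 0 to 3 scales for exhaustivity and specificity and a "strict" quantization that counts an element as relevant only when both are 3. INEX-EVAL itself uses a more elaborate precision measure, so this is an illustration, not its implementation.

```python
def strict_quantization(exhaustivity: int, specificity: int) -> int:
    """Assumed strict mapping: only (3, 3) counts as relevant."""
    return 1 if (exhaustivity, specificity) == (3, 3) else 0

def precision_recall_at_k(ranked: list[str], judgments: dict[str, tuple[int, int]], k: int):
    """ranked: element ids in rank order; judgments: id -> (exhaustivity, specificity)."""
    relevant = {eid for eid, (e, s) in judgments.items() if strict_quantization(e, s)}
    hits = sum(1 for eid in ranked[:k] if eid in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With judgments a=(3,3), b=(2,3), c=(3,3), d=(0,0) and ranking a, b, d, c, the top two results give precision 0.5 and recall 0.5; all four give precision 0.5 and recall 1.0.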

  6. FLEXIBLE RETRIEVAL SYSTEM
     • SMART format: documents and topics are translated into SMART format and indexed as extended vectors
     • Subjective subvectors contain content-bearing terms
     • Objective subvectors serve as filters on the results returned for CAS queries
     • Extended vector: a paragraph's content-bearing (subjective) terms are placed in its body subvector
     • Lnu-ltu term weighting
     • Dynamic (flexible) retrieval: builds a tree representation of each document and produces a rank-ordered list of elements using Lnu weights
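The division of labour between subjective and objective subvectors can be sketched as rank-then-filter: results are ranked on the body (subjective) subvector, and an objective subvector then filters them, as for a CAS query. The data layout, the field name `yr`, and the inner-product scoring are assumptions for illustration only.

```python
from collections import Counter

def rank_by_body(paragraphs: dict, query_terms: list[str]) -> list[str]:
    """Rank paragraph ids by inner product of body term frequencies with the query."""
    q = Counter(query_terms)
    scored = []
    for pid, vec in paragraphs.items():
        score = sum(w * q[t] for t, w in vec["body"].items())
        if score > 0:
            scored.append((score, pid))
    scored.sort(key=lambda x: (-x[0], x[1]))  # highest score first, ties by id
    return [pid for _, pid in scored]

def objective_filter(ranked: list[str], paragraphs: dict, field: str, value: str) -> list[str]:
    """Keep only results whose (hypothetical) objective subvector has the required value."""
    return [pid for pid in ranked if value in paragraphs[pid]["objective"].get(field, ())]
```

Filtering after ranking keeps the ranking purely content-based; the objective constraint only removes results, it never reorders them.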

  7. METHOD FOR FLEXIBLE RETRIEVAL
     • Input: given a query Q and the paragraph index, retrieve a rank-ordered list of paragraphs (the terminal nodes)
     • The n top-ranked paragraphs are selected as input
     • This set of paragraphs identifies the documents from which elements are generated and returned as output
     • Building a document tree requires structural information: the terminal nodes (found in the paragraph index) and a pre-order traversal of the document
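The first steps of the method, taking the n top-ranked paragraphs and using them to identify the documents whose trees will be built, might look like the following. Paragraphs are assumed to be identified by (document id, xpath) pairs; this layout is hypothetical, loosely based on the slides' mention of identifiers and xpaths.

```python
def select_documents(ranked_paragraphs: list[tuple[str, str]], n: int) -> dict[str, list[str]]:
    """Group the xpaths of the n top-ranked paragraphs by document id.

    Each resulting entry is one document tree to build, so n is an upper
    bound on the number of trees for the query.
    """
    groups: dict[str, list[str]] = {}
    for doc_id, xpath in ranked_paragraphs[:n]:
        groups.setdefault(doc_id, []).append(xpath)
    return groups
```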

  8. SIMPLE XML DOCUMENT AND ITS SCHEMA
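The slide's example document and schema are not in the transcript. As a stand-in, the sketch below parses a small hypothetical article (the tag names `fm`, `ti`, `bdy`, `sec` and `p` are assumptions) with Python's `xml.etree.ElementTree` and walks it in pre-order, building positional xpaths and picking out the leaf elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are illustrative assumptions.
SAMPLE = """<article>
  <fm><ti>Title</ti></fm>
  <bdy>
    <sec><p>first paragraph</p><p>second paragraph</p></sec>
    <sec><p>third paragraph</p></sec>
  </bdy>
</article>"""

def preorder(elem, path):
    """Yield (xpath, element) in pre-order; positions count same-tag siblings from 1."""
    yield path, elem
    seen = {}
    for child in elem:
        seen[child.tag] = seen.get(child.tag, 0) + 1
        yield from preorder(child, f"{path}/{child.tag}[{seen[child.tag]}]")

def leaf_paths(xml_text: str) -> list[str]:
    """Xpaths of elements with no child elements, in document order."""
    root = ET.fromstring(xml_text)
    return [p for p, e in preorder(root, f"/{root.tag}[1]") if len(e) == 0]
```

In this toy document the title is also a leaf; the method as described treats the paragraphs as the terminal nodes.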

  9. CONSTRUCTION OF DOCUMENT TREE
     • For query Q, the n top-ranked paragraphs are used to build the document trees
     • Leaf elements (terminal nodes) are the paragraph nodes
     • Each leaf is represented by a term-frequency-weighted vector
     • Step 1: gather all leaf (terminal) nodes
     • Step 2: merge the children's vectors to form each parent's vector
     • The document schema determines which children are merged
     • A parent vector contains the unique terms of its children, weighted by term frequency, so it reflects the content of all its children
     • The process is applied recursively until the tree is complete
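The two construction steps can be sketched with term-frequency vectors stored as `Counter` objects. Instead of recursing over a schema, this version (an assumption made for brevity) walks each selected paragraph's xpath and adds the paragraph's vector to every ancestor. That yields the same parent vectors as merging children bottom-up: each parent holds the union of its children's terms with summed frequencies. The xpath-keyed input is hypothetical.

```python
from collections import Counter

def build_tree_vectors(leaf_text: dict[str, str]) -> dict[str, Counter]:
    """Map every node on a path to a selected paragraph to its term-frequency vector."""
    vectors: dict[str, Counter] = {}
    for xpath, text in leaf_text.items():
        tf = Counter(text.lower().split())       # step 1: leaf vector
        steps = xpath.strip("/").split("/")
        # Step 2: the leaf and each of its ancestors accumulate the leaf's frequencies.
        for depth in range(len(steps), 0, -1):
            node = "/" + "/".join(steps[:depth])
            vectors.setdefault(node, Counter()).update(tf)
    return vectors
```

Only nodes that lie on a path to one of the n selected paragraphs are created, which matches building trees from the top-ranked paragraphs alone.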

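  10. RANKING OF ELEMENTS
     • The document tree yields a set of elements
     • Problem for structured retrieval: producing a rank-ordered list of these elements
     • Method used: an all-element index, with a separate representation (and weighting information) for every element of every document
     • Lnu weights suit elements: elements vary in length, and Lnu weights do not require global frequency information
     • Length normalization is essential; without proper normalization the weights are biased by length
     • Pivot: the document length at which the probability of relevance equals the probability of retrieval
     • Slope: the amount by which the normalization is tilted about the pivot
     • Pivoted normalization reduces the difference between the probabilities of retrieval and relevance
     • Lnu term weight: $w_{\mathrm{Lnu}} = \dfrac{(1+\log \mathit{tf}) / (1+\log \overline{\mathit{tf}})}{(1-s) + s \cdot u / p}$, where tf is the term's frequency in the element, $\overline{\mathit{tf}}$ the average term frequency in the element, u the number of unique terms in the element, s the slope and p the pivot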
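The Lnu formula translates directly into code. Natural logarithms are assumed (the slide does not give a base), and `slope` and `pivot` are free parameters; any values used here are placeholders, not the ones used in the experiments.

```python
import math
from collections import Counter

def lnu_weights(tf: Counter, slope: float, pivot: float) -> dict[str, float]:
    """Lnu weights for one element's (non-empty) term-frequency vector.

    Uses only element-internal statistics, so no collection frequencies are needed.
    """
    avg_tf = sum(tf.values()) / len(tf)           # average term frequency in the element
    unique = len(tf)                              # number of unique terms in the element
    norm = (1 - slope) + slope * unique / pivot   # pivoted unique normalization
    return {t: ((1 + math.log(f)) / (1 + math.log(avg_tf))) / norm
            for t, f in tf.items()}
```

Because the weight depends only on the element's own statistics, a parent's Lnu vector can be computed as soon as its merged term-frequency vector exists.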

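  11. LTU WEIGHTING
     • ltu term weight (used for the query): $w_{\mathrm{ltu}} = \dfrac{(1+\log \mathit{tf}) \cdot \log(N/n_k)}{(1-s) + s \cdot u / p}$, where N is the collection size (the number of elements of the type being ranked), n_k the number of those elements containing term k, u the number of unique query terms, s the slope and p the pivot
     • N and n_k depend on the element type and must be obtainable through indexing
     • As we move up the tree, N is obtained by counting the elements of each type
     • n_k is derived from the term's inverted-file entry in the paragraph index, using the identifiers and xpaths it provides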
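A sketch of the query side and of the correlation used for ranking. It follows the standard ltu definition, in which the idf factor log(N/n_k) multiplies the log-tf factor; natural logarithms and the parameter values are assumptions. Terms missing from the document-frequency table are simply skipped.

```python
import math
from collections import Counter

def ltu_weights(query_tf: Counter, df: dict[str, int], n_elements: int,
                slope: float, pivot: float) -> dict[str, float]:
    """ltu query weights: (1 + log tf) * log(N / n_k), pivoted unique normalization.

    n_elements is N (elements of the type being ranked); df[t] is n_k.
    """
    norm = (1 - slope) + slope * len(query_tf) / pivot
    return {t: (1 + math.log(f)) * math.log(n_elements / df[t]) / norm
            for t, f in query_tf.items() if df.get(t)}

def correlation(element_w: dict[str, float], query_w: dict[str, float]) -> float:
    """Inner product of an element's Lnu vector with the ltu query vector."""
    return sum(w * query_w.get(t, 0.0) for t, w in element_w.items())
```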

  12. OUTPUT OF FLEXIBLE RETRIEVAL
     • Select the next leaf node, gather its siblings, construct the document tree, compute Lnu term weights for its elements and correlate them with the ltu-weighted query to produce another rank-ordered list
     • Once the n top-ranked paragraphs are exhausted and the last list is produced, the lists are merged
     • The result is a single set of elements, rank-ordered by correlation with Q
     • Comparison: flexible retrieval and the all-element index give identical results when the n paragraphs input to flexible retrieval include all paragraphs and the same values are used for Lnu-ltu weighting
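The final merge can be sketched with `heapq.merge`, assuming each per-tree list is already sorted by descending correlation and holds (score, element id) pairs. The duplicate check, which keeps an element's highest-scoring occurrence, is a defensive assumption rather than something the slides state.

```python
import heapq

def merge_ranked_lists(lists: list[list[tuple[float, str]]]) -> list[tuple[float, str]]:
    """Merge descending (score, element_id) lists into one ranking by correlation."""
    merged, seen = [], set()
    for score, eid in heapq.merge(*lists, key=lambda x: x[0], reverse=True):
        if eid not in seen:  # keep only the first (highest-scoring) occurrence
            seen.add(eid)
            merged.append((score, eid))
    return merged
```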

  13. ALGORITHM

  14. EXPERIMENTS
     • Paragraph indexing results in a set of extended vectors, each representing one paragraph
     • CO topics: the body subvector, which represents the subjective portion, is what matters, because retrieval depends on an element's content (contained in the body) rather than its type
     • Tree representation

  15. FACTORS OF INTEREST
     • Slope and pivot values for Lnu-ltu weighting
     • Effectiveness of structured retrieval
     • Slope and pivot can be determined empirically and carried over from one collection to another; they are generic
     • n, the number of input paragraphs, sets an upper bound on the number of document trees per query
     • The actual number of trees depends on how many of those paragraphs belong to the same group (document)

  16. EXPERIMENTS DONE
     • All-element and dynamic (flexible) retrieval experiments, with results for body-only retrieval
     • The correlation between each element vector and the query vector is computed over body elements only (Table 1)

  17. RESULTS • Tables

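  18. EQUIVALENT RESULTS
     • Flexible retrieval and the all-element index produce equivalent results
     • Flexible retrieval is more efficient: it needs less file space, and indexing takes half the time
     • Dynamic retrieval costs more on a per-query basis, depending on n; the exact number of trees required is not specified in advance
     • Another factor: computing the value of n_k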

  19. DISCUSSIONS AND CONCLUSIONS
     • Flexible retrieval dynamically produces a rank-ordered list of elements from a single index at the level of the basic indexing node (the paragraph)
     • Basic functions are provided by SMART and the extended vector model
     • The results demonstrate the system's flexible capabilities
     • Next steps: incorporate other subvectors and internal-node weighting
     • INEX judges exhaustivity and specificity; the results are exhaustive, research on specificity is ongoing, and the current results reflect this
     • Flexible retrieval is a better way to retrieve elements than all-element indexing

  20. THANK YOU!!!
