Information Retrieval – and projects we have done.

Group Members: Aditya Tiwari (08005036), Harshit Mittal (08005032), Rohit Kumar Saraf (08005040), Vinay Surana (08005031). Guided by Prof. Pushpak Bhattacharyya.

Presentation Transcript


  1. Information Retrieval – and projects we have done. Group Members: Aditya Tiwari (08005036), Harshit Mittal (08005032), Rohit Kumar Saraf (08005040), Vinay Surana (08005031). Guided by Prof. Pushpak Bhattacharyya

  2. Motivation • The Web, documents, and encyclopedias all hold a tremendous amount of data and information. As published, that information serves only the intent of whoever created or collected it. • However, the same data can serve other uses as well. The need is to mine the right information from the data and use it appropriately.

  3. Information Retrieval

  4. Applications • Web search (Google, Yahoo) • Querying/QA systems such as Watson (developed by IBM) • Spam filtering • Automatic summarization • Cross-lingual retrieval en.wikipedia.org/wiki/Information_retrieval_applications

  5. Information Retrieval • IR is the area of study concerned with searching for documents and for metadata about documents, as well as searching relational databases and the WWW. • The data objects collected can be images, documents, videos, mind maps, or music. en.wikipedia.org/wiki/Information_Retrieval

  6. Wiki Mind Mapping Harshit Mittal (IIT-B) h.mittal83@gmail.com, Aditya Tiwari (IIT-B) adi.tiwari27@gmail.com, Akhil Bhiwal (VIT University) bhiwalakhil@gmail.com

  7. Project Idea • Represent textual information in a graphical form that is easier to understand and more intuitive to read. The visual representation should also summarize the text.

  8. Research Goal • Use phrases to represent semantic information. • Represent the information in a given text hierarchically.

  9. Mind maps • A mind map is a diagram used to represent words, ideas, tasks, or other items linked to and arranged around a central key word or idea. • An example mind map is on the next slide. http://en.wikipedia.org/wiki/Mind_maps

  10. Mind map http://www.spicynodes.org/blog/2010/05/21/stuff-we-like-climate-change-mind-maps/

  11. What’s the difficult part? • We can’t represent the information from an article in a mind map as is; the result would be incoherent and cluttered. • Phrase extraction • General rules of grammar don’t apply here.

  12. Possible Solution • Develop new linguistic rules for representing text in visual form. • Use existing summarization tools to generate a summary and try to represent that in a mind map.

  13. How we did it • Pull the article from the Wikipedia page section by section. • Parse each section sentence by sentence using the Stanford parser. • Extract “relevant” phrases using Tregex (another Stanford tool). • Put these phrases into a mind map, section by section. http://nlp.stanford.edu/software/tregex.shtml
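The last step above, placing extracted phrases into a mind map section by section, can be sketched as a small tree structure. This is a minimal illustration, not the project's code: the `Node` class, the `build_mind_map` and `render` functions, and the sample phrases are all hypothetical, and phrase extraction is assumed to have already happened.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One node of the mind map: a label plus its child nodes (hypothetical type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def build_mind_map(title: str, phrases_by_section: Dict[str, List[str]]) -> Node:
    """Build a three-level tree: article title -> section -> extracted phrase."""
    root = Node(title)
    for section, phrases in phrases_by_section.items():
        section_node = Node(section, [Node(phrase) for phrase in phrases])
        root.children.append(section_node)
    return root


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, one node per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Invented sample input: section names mapped to already-extracted phrases.
sample = {
    "Etymology": ["derived from Indus"],
    "History": ["oldest settlements", "declined"],
}
print("\n".join(render(build_mind_map("India", sample))))
```

A fixed title/section/phrase layout gives exactly three levels, which matches the depth limit the slides mention later; a deeper map would need phrases that nest inside other phrases.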

  14. Extraction of relevant information • Identifying the important subtrees in the parse tree of a sentence. • This was done using a few heuristics, such as: • Presence of a superlative adjective in a noun phrase http://nlp.stanford.edu/software/tregex.shtml

  15. Extraction of relevant information • Presence of a cardinal number in a noun phrase http://nlp.stanford.edu/software/tregex.shtml
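The two noun-phrase heuristics above (a superlative adjective or a cardinal number inside an NP) can be illustrated without the Stanford tools. The sketch below assumes the parse is already available as a Penn Treebank bracketed string; the example trees are hand-written rather than real parser output, and all function names are hypothetical. In Tregex the same idea would presumably be written with the dominance operator, as in `NP << JJS` or `NP << CD`, but the patterns the project actually used are not given in the slides.

```python
import re


def parse_tree(text):
    """Parse a bracketed tree string into (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def subtrees(tree):
    """Yield every subtree in pre-order, starting with the tree itself."""
    yield tree
    for child in tree[1]:
        if isinstance(child, tuple):
            yield from subtrees(child)


def words(tree):
    """Return the leaf words under a subtree, left to right."""
    out = []
    for child in tree[1]:
        out.extend(words(child) if isinstance(child, tuple) else [child])
    return out


def relevant_noun_phrases(tree, tags=("JJS", "CD")):
    """Return the text of every NP that dominates a node tagged JJS or CD."""
    hits = []
    for node in subtrees(tree):
        if node[0] == "NP" and any(sub[0] in tags for sub in subtrees(node)):
            hits.append(" ".join(words(node)))
    return hits


# Hand-written example parses (assumptions, not Stanford parser output).
superlative = parse_tree("(S (NP (PRP It)) (VP (VBZ is) (NP (DT the) (JJS largest) (NN city))))")
cardinal = parse_tree("(S (NP (DT The) (NN river)) (VP (VBZ has) (NP (CD 3) (NNS tributaries))))")
print(relevant_noun_phrases(superlative))
print(relevant_noun_phrases(cardinal))
```

Because nested NPs are each tested separately, an outer NP and an inner NP can both match the same superlative or number; deduplicating or keeping only the smallest match is a design choice left open here.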

  16. Extraction of relevant information • Matching a verb against the bag of verbs considered relevant for a particular article. For example, for the history section, verbs like find, discover, settle, and decline were considered “more useful” than verbs like derive and deduce, which were considered useful for some other section.

  17. Extraction of relevant information • Example: The name India is derived from Indus. http://nlp.stanford.edu/software/tregex.shtml
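The verb-bag heuristic can be sketched as a lookup of verb lemmas per section. This is an illustration under stated assumptions: the verbs come from the slide, but the section name "Etymology" for derive and deduce is a guess (the slide only says "some other section"), the `SECTION_VERBS` table and `relevant_verbs` function are hypothetical, and the sentence is supplied as pre-lemmatized (lemma, POS tag) pairs rather than produced by a real tagger.

```python
# Hand-coded bags of verb lemmas per section; "Etymology" is an assumed section name.
SECTION_VERBS = {
    "History": {"find", "discover", "settle", "decline"},
    "Etymology": {"derive", "deduce"},
}


def relevant_verbs(tagged_lemmas, section):
    """Return the verb lemmas of a sentence that appear in the section's verb bag."""
    bag = SECTION_VERBS.get(section, set())
    return [
        lemma
        for lemma, tag in tagged_lemmas
        if tag.startswith("VB") and lemma in bag
    ]


# "The name India is derived from Indus." as hand-written (lemma, POS tag) pairs.
sentence = [
    ("the", "DT"), ("name", "NN"), ("India", "NNP"),
    ("be", "VBZ"), ("derive", "VBN"), ("from", "IN"), ("Indus", "NNP"),
]
print(relevant_verbs(sentence, "Etymology"))
print(relevant_verbs(sentence, "History"))
```

With these assumptions the example sentence is picked up under "Etymology" and skipped under "History", which mirrors the slide's point that a verb's usefulness depends on the section.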

  18. Code Generated Mind Map

  19. Evaluation http://en.wikipedia.org/wiki/Precision_and_recall

  20. Evaluation • Survey based: • Asking one person to generate 10 questions from a given article. • Asking another person to answer those questions with the help of the mind map. • Repeating the same exercise with the roles reversed for another article.
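The evaluation slides point to precision and recall. A minimal sketch of the standard set-based definitions follows: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. How the survey answers were mapped onto "retrieved" and "relevant" items is not stated in the slides, so the function name and the sample sets below are purely illustrative.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for two collections of item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # items that are both retrieved and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: 4 phrases shown in the map, 8 facts a reader needed.
shown = {"a", "b", "c", "d"}
needed = {"a", "b", "e", "f", "g", "h", "i", "j"}
print(precision_recall(shown, needed))
```

Returning 0.0 for an empty set avoids a division by zero; treating that case as undefined instead would be an equally reasonable choice.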

  21. Observations • Pros: • Extraction of the right information with high accuracy. • The concept of phrase extraction works well. • High precision values were obtained (between 0.5 and 0.75).

  22. Observations • Cons: • Information presented in a low-depth mind map is cluttered. • Low recall values (between 0.2 and 0.4). • Linking node phrases with their apt descriptions is still a problem. • The heuristics defining “important phrases” need to be refined.

  23. Limitations • The bag of words and the Tregex expressions are hand-coded rather than machine-learned. • Garbage phrases are generated. • The hierarchy is limited to 3 levels.

  24. Future work • Using machine learning to determine the important keywords for a given sentence. • Exploring whether patterns in subtree expressions can be found with a machine-learned approach. • Refinement of generated phrases.

  25. References • http://en.wikipedia.org/wiki/Mind_maps • http://en.wikipedia.org/wiki/Precision_and_recall • Tool: Stanford Parser and Stanford Tregex Match http://nlp.stanford.edu/software/tregex.shtml

  26. Vision Based Attribute Segmentation from Lists in Web Pages, by Rohit Kumar Saraf
