Information Retrieval

Explore the challenges of gooseberry picking and information retrieval, and discover ways to improve fruit harvest and search results. Topics include textual analysis, word order, context, metadata, external categorization, and growing thornless gooseberries.

Presentation Transcript


  1. Information Retrieval Liam Quin, Barefoot Computing, Toronto

  2. Agenda • Overview of Information Retrieval • What people want, and how to give it to them • Things people don’t know they want, and how to do them

  3. Chapter One: The Problem gooseberry

  4. Gooseberry Picking Hurts Gooseberries have thorns. Gooseberry pickers in Botswana might not wear shirts (or shoes). When you pick one gooseberry, others fall to the ground. The harvest would be improved if we could retrieve the fallen fruit safely. There are texts on this on the Internet.

  5. Searching for an Answer • Search for “information on texts about gooseberry retrieval” on the web.

  6. The Result • Pages on text retrieval... • ...on information retrieval... • ...and, at the top of the list... Cycling in Cape Gooseberry, Labrador

  7. Lessons • Indexes on words alone aren’t enough • Word order can be important • Relevance ranking is often bogus • Sometimes you have to wear a shirt and shoes.
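
The first lesson can be reproduced with a toy index. The sketch below is only an illustration: the three pages in `PAGES` and the overlap-count ranking are assumptions made for this example, not the behaviour of any particular search engine.

```python
from collections import defaultdict

# Hypothetical three-page corpus, invented for this illustration.
PAGES = {
    "cycling": "Cycling in Cape Gooseberry, Labrador",
    "text-ir": "An introduction to text retrieval and information retrieval",
    "harvest": "Retrieving fallen gooseberry fruit safely",
}


def tokenize(text):
    """Lower-case the text and split on whitespace, dropping commas and full stops."""
    return [word.strip(".,").lower() for word in text.split()]


def build_index(pages):
    """Build a word -> set of page ids index; word order is thrown away."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Rank pages by how many distinct query words they contain."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for page_id in index.get(word, ()):
            scores[page_id] += 1
    # Highest score first; ties broken alphabetically by page id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_index(PAGES)
RESULTS = search(INDEX, "information on texts about gooseberry retrieval")
```

With this corpus the page about text and information retrieval scores highest, and the page about retrieving fallen fruit only ties with the cycling page, because "retrieving" and "retrieval" are different index words.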

  8. How can we Improve? • Better textual analysis • Word order • Context • Metadata • External categorisation (RDF, Topic Maps) • Grow thornless gooseberries

  9. Better Textual Analysis • Part of speech information during indexing • Stemming (boy/boys, foot/feet, run/running, me/mine) • Record more in index (caps, separation) • Co-location analysis (mine next to gold) • Ask the user what she means in the query (mine as in “belonging to me”, or as in “quarry”?) • Thesaurus expansion of queries
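
Stemming and thesaurus expansion can be sketched in a few lines. This is a deliberately naive illustration: the suffix rules, the `IRREGULAR` table, and the `THESAURUS` entries are all assumptions invented here, and a real indexer would use a proper stemming algorithm and a curated thesaurus.

```python
# Irregular forms that suffix stripping cannot handle (illustrative only).
# Note that mapping "mine" to "me" is wrong for the "quarry" sense, which is
# exactly the ambiguity the slide points out.
IRREGULAR = {"feet": "foot", "mine": "me"}

# Hypothetical thesaurus keyed by stem.
THESAURUS = {"pick": {"harvest", "gather"}}


def stem(word):
    """Reduce a word to a crude stem using a lookup table and two suffix rules."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        base = word[:-3]
        # Undo consonant doubling: "running" -> "runn" -> "run".
        if len(base) >= 2 and base[-1] == base[-2]:
            base = base[:-1]
        return base
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def expand_query(query):
    """Stem each query word and add any thesaurus synonyms for the stem."""
    terms = set()
    for word in query.split():
        stemmed = stem(word)
        terms.add(stemmed)
        terms |= THESAURUS.get(stemmed, set())
    return terms
```

For example, `expand_query("gooseberry picking")` also matches documents indexed under "harvest" or "gather". The same `stem` function has to be applied at index time for the expansion to line up with the index.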

  10. Word Order • Give added weight to word order: • information retrieval vs. retrieval of information • Times Square vs. square times • include all words (What If Inc., The Times)
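
One way to give added weight to word order is to reward adjacent query words that appear adjacent, and in the same order, in the document. The scoring below is a minimal sketch; the function names and the `order_bonus` weight are assumptions chosen for illustration. No stop words are removed, so a query such as "The Times" keeps both words.

```python
def tokens(text):
    """Lower-case and split on whitespace, keeping every word."""
    return text.lower().split()


def bigrams(words):
    """Return the set of adjacent word pairs, in order."""
    return set(zip(words, words[1:]))


def score(query, document, order_bonus=2.0):
    """Count shared words, plus a bonus for each query word pair kept in order."""
    query_words, doc_words = tokens(query), tokens(document)
    word_score = float(len(set(query_words) & set(doc_words)))
    order_score = order_bonus * len(bigrams(query_words) & bigrams(doc_words))
    return word_score + order_score
```

Here `score("Times Square", "hotels near Times Square")` is higher than `score("Times Square", "square times tables")`, even though both documents contain both query words.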

  11. Context • Co-location of words helps disambiguate • The containing XML element • Feedback from nearby documents (e.g. on the same website, or in the same chapter or publication) • Domain-specific information at index-time
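
Disambiguation by co-location can be sketched as a vote among nearby words. The cue lists in `SENSE_CUES`, the sense labels, and the window size are assumptions made up for this example; a real system would derive them from a corpus or from domain-specific information.

```python
# Hypothetical cue words for two senses of "mine".
SENSE_CUES = {
    "quarry": {"gold", "coal", "shaft", "ore"},
    "possessive": {"yours", "is", "book"},
}


def guess_sense(words, position, window=3):
    """Guess the sense of words[position] from cue words within the window."""
    before = words[max(0, position - window):position]
    after = words[position + 1:position + 1 + window]
    nearby = before + after
    counts = {
        sense: sum(1 for word in nearby if word in cues)
        for sense, cues in SENSE_CUES.items()
    }
    best = max(counts, key=counts.get)
    # With no cue words nearby there is no evidence, so report nothing.
    return best if counts[best] > 0 else None
```

When the function returns `None`, the indexer has no local evidence and could fall back on the containing element or on nearby documents instead.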

  12. Metadata • Add information to documents • Dublin Core (e.g. Warwick Framework) • The HTML meta element • The HTML rel/rev attributes in links
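
An indexer can collect this metadata while it parses a page. The sketch below uses Python's standard `html.parser` module; the `MetadataParser` class and the sample page are assumptions for illustration, and the `DC.title` name follows the common convention for embedding Dublin Core in HTML meta elements.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collect meta name/content pairs and rel/rev links from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and "name" in attributes:
            self.meta[attributes["name"]] = attributes.get("content", "")
        elif tag in ("a", "link") and ("rel" in attributes or "rev" in attributes):
            # Store (rel, rev, href); a missing attribute is recorded as None.
            self.links.append(
                (attributes.get("rel"), attributes.get("rev"), attributes.get("href"))
            )


# Hypothetical page used only to exercise the parser.
SAMPLE = (
    '<html><head><meta name="DC.title" content="Gooseberry Picking">'
    '<link rel="next" href="ch2.html"></head>'
    '<body><a rev="made" href="mailto:author@example.org">author</a></body></html>'
)

parser = MetadataParser()
parser.feed(SAMPLE)
```

After `feed`, `parser.meta` and `parser.links` hold values that could be stored in the index alongside the words of the page.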

  13. External Categorisation • Use XML schemas to add context information • Document or site-wide information • Resource Description Framework • Topic Maps (ISO 13250) • Categorise the result set [see picture]

  14. Grow thornless gooseberries • Sometimes it’s easier to change the problem than to solve it as stated. • Sometimes people don’t describe the problem that they need solved. • Sometimes it’s easier to solve a more general problem (thornless fruit? Or padded shirts?)

  15. What Most People Want • Find this string or phrase in this element. • That’s all most people ask for. • It’s all they want. • But it’s hardly ever all they need.
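
The basic request, "find this string or phrase in this element", is small enough to sketch with Python's standard `xml.etree.ElementTree` module. The function name, the case-insensitive matching, and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only for this example.
DOC = (
    "<book><title>Gooseberry Picking</title>"
    "<para>Picking a <em>gooseberry</em> hurts.</para>"
    "<para>Wear shoes.</para></book>"
)


def find_in_element(xml_text, element_name, phrase):
    """Return the text of each named element that contains the phrase."""
    root = ET.fromstring(xml_text)
    hits = []
    for element in root.iter(element_name):
        # itertext() includes text inside child elements such as <em>.
        text = "".join(element.itertext())
        if phrase.lower() in text.lower():
            hits.append(text.strip())
    return hits
```

This scans the whole document on every query, which is fine for one file; a retrieval system would answer the same question from an index that records the containing element of each word.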

  16. The real needs • Needs of other staff • Executives who understand the problem • Indirect needs • internal use by software • other departments • private uses by sneaky employees • enabling technologies change perspectives

  17. I didn’t know I could... • Quality control • check for known errors • find unusual words or phrases • phrases not marked up • Analysis • look for unusual markup • co-location (phrase summary)
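
Finding unusual words for quality control can be sketched as a frequency count plus a similarity check: a word that occurs rarely but closely resembles a frequent word is a likely typo. The function name, the `rare` threshold, and the 0.85 similarity cutoff are assumptions chosen for this example.

```python
import difflib
import re
from collections import Counter


def suspicious_words(text, rare=1, cutoff=0.85):
    """Map each rare word to the frequent word it most closely resembles."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    common = [word for word, count in counts.items() if count > rare]
    report = {}
    for word, count in counts.items():
        if count <= rare:
            close = difflib.get_close_matches(word, common, n=1, cutoff=cutoff)
            if close:
                report[word] = close[0]
    return report


# Hypothetical text containing one misspelling.
SAMPLE_TEXT = (
    "The gooseberry harvest was good. The goosebery harvest was late. "
    "Gooseberry pickers rest."
)
REPORT = suspicious_words(SAMPLE_TEXT)
```

Rare words with no close frequent neighbour are left out of the report, so ordinary one-off vocabulary does not flood the reviewer with false alarms.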

  18. Oooh, can you really? • Automatic linking • Glossary • Glossary Index page • Dictionary Samples • Add markup automatically • based on phrases in context
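
Automatic glossary linking can be sketched with a regular expression built from the glossary terms. The `GLOSSARY` entries and link targets below are hypothetical, and the sketch assumes plain text input; running it over text that already contains markup would need an XML-aware pass.

```python
import re

# Hypothetical glossary: lower-case term -> link target.
GLOSSARY = {"information retrieval": "#ir", "stemming": "#stemming"}


def link_glossary(text, glossary):
    """Wrap every whole-word occurrence of a glossary term in a link."""
    # Try longer terms first so a phrase wins over any shorter term inside it.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )

    def replace(match):
        # Glossary keys are lower-case; the original capitalisation is kept.
        target = glossary[match.group(1).lower()]
        return f'<a href="{target}">{match.group(1)}</a>'

    return pattern.sub(replace, text)


LINKED = link_glossary("Stemming helps information retrieval.", GLOSSARY)
```

The same loop could emit other markup instead of links, which is one reading of "add markup automatically, based on phrases in context".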

  19. Summery Summary • You may need more than you thought… • …but it might do more than you expected… • …but...

  20. There is no gooseberry pie for lunch.

  21. Liam Quin Barefoot Computing Toronto http://www.valinor.sorcery.net/~liam/ liam@holoweb.net
