Relevance Ranking and Clustering

Relevance Ranking and Clustering. Small steps towards making the library catalogue more useful. Kent Fitch, 16 Sep 2006. Motivation: help people find what they’re looking for. The problem: a reference librarian often has lots of context when someone walks up to them and says “The Civil War”.

Presentation Transcript


  1. Relevance Ranking and Clustering: Small steps towards making the library catalogue more useful. Kent Fitch, 16 Sep 2006

  2. Motivation
     • Help people find what they’re looking for

  3. The problem
     • A reference librarian often has lots of context when someone walks up to them and says “The Civil War”:
       • Location
       • Age
       • Clothing
       • What’s on the local syllabus
       • Books they’re carrying
       • Past interactions
       • …
     • A computer program has only 13 characters: “The Civil War”

  4. Diversion – improving the context?
     • IP address
       • ANU, DFAT, BHP, Nicholls Primary
     • Search history
       • “spanish history”, “franco”, “gettysburg”
     • Referrer
       • ANU Library, Wikipedia, MySpace
     • Browser
       • Visually impaired user?

  5. Relevance ranking
     “The Civil War”: a record is more relevant if it
     • Occurs in Title/Subject/Author rather than notes/TOC; in the main Title/Author rather than an added entry…
     • Occurs as a phrase or near phrase rather than as scattered words
     • Occurs as an exact match
     • Occurs multiple times (especially the unusual words)
     • Occurs as the only or main words (e.g., as the only subject rather than as 1 of 10)
     • Is a collection level record
     • Is widely held
     • Is held by one of your libraries
     • Is on the shelf at one of your libraries
     • Is available online
     • Is highly rated (sales/reviews) on Amazon or LibraryThing
     • Is widely cited by other books or by credible web pages
     • Is available for inexpensive purchase and quick delivery, new or second hand
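The text-matching criteria on this slide (which field matched, phrase versus scattered words, exact match, repeated and unusual words, being the main words of a field) can be folded into one number per record. A minimal sketch follows; the field weights, the 3x/2x multipliers and the idf formula are assumptions chosen for illustration, not the prototype's actual scoring, and `field_score`/`record_score` are hypothetical names.

```python
import math
import re

# Assumed, illustrative weights: title/subject/author matches count more
# than matches in notes or the table of contents.
FIELD_WEIGHTS = {"title": 4.0, "subject": 3.0, "author": 3.0, "notes": 1.0, "toc": 0.5}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def contains_phrase(words, phrase):
    """True if the phrase occurs as consecutive words, not scattered."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def field_score(query, text, doc_freq, n_docs):
    q, words = tokenize(query), tokenize(text)
    if not q or not words:
        return 0.0
    score = 0.0
    for term in set(q):
        tf = words.count(term)
        if tf:
            # Unusual words (low document frequency) earn a larger weight;
            # sqrt damps the benefit of repeated occurrences.
            idf = math.log(1 + n_docs / (1 + doc_freq.get(term, 0)))
            score += math.sqrt(tf) * idf
    if words == q:
        score *= 3.0  # exact match of the whole field
    elif contains_phrase(words, q):
        score *= 2.0  # phrase match
    # Length normalisation: being the main words of a short field counts
    # for more than being a few words in a long one.
    return score / math.sqrt(len(words))


def record_score(query, record, doc_freq, n_docs):
    return sum(FIELD_WEIGHTS.get(field, 1.0) * field_score(query, text, doc_freq, n_docs)
               for field, text in record.items())


if __name__ == "__main__":
    doc_freq = {"the": 1000, "civil": 50, "war": 80}  # hypothetical counts
    records = {
        "a": {"title": "The Civil War", "notes": "Illustrated."},
        "b": {"title": "A history of Spain", "notes": "Covers the war and the civil service."},
        "c": {"title": "The Civil War in Spain"},
    }
    ranked = sorted(records, key=lambda k: record_score("The Civil War", records[k], doc_freq, 1000),
                    reverse=True)
    print(ranked)  # ['a', 'c', 'b']
```

The query-independent criteria further down the list (holdings, availability, ratings) do not depend on the words typed, so they are better kept as a separate per-work score rather than mixed into this text score.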

  6. Relevance Ranking: two approaches
     • TeraText Gateway
       • Issue a series of searches, one for each successive criterion
       • Very hard to incorporate non-binary factors (such as quality of phrase match, number of holdings, …)
     • Lucene
       • Combine a “score” for each criterion with an innate “score” for each work
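The difference between the two approaches can be sketched in a few lines. This is a hypothetical illustration, not TeraText's or Lucene's API: `Hit`, `tiered_rank` and `combined_score` are invented names, the weights are arbitrary, and the simple additive combination is an assumption (Lucene's own scoring formula is different). The point is that the tiered version can only use yes/no tests, while the combined version lets a graded factor such as holdings move a work up or down.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    in_title: bool         # binary: query words found in the title
    phrase_quality: float  # graded, 0..1: how closely the words form the phrase
    holdings: int          # graded: number of libraries holding the work


def tiered_rank(hits, criteria):
    """Series-of-searches style: hits passing the first criterion come first,
    then hits passing the second, and so on; the rest follow in input order."""
    ranked, seen = [], set()
    for matches in criteria:
        for i, hit in enumerate(hits):
            if i not in seen and matches(hit):
                ranked.append(hit)
                seen.add(i)
    ranked.extend(hit for i, hit in enumerate(hits) if i not in seen)
    return ranked


def combined_score(hit, w_title=2.0, w_phrase=1.5, w_holdings=0.5):
    """Score-combination style: query-dependent criteria plus an innate,
    query-independent score for the work (here, log of holdings)."""
    query_score = w_title * hit.in_title + w_phrase * hit.phrase_quality
    innate_score = w_holdings * math.log1p(hit.holdings)
    return query_score + innate_score


if __name__ == "__main__":
    hits = [
        Hit("Widely held edition", True, 0.9, 450),
        Hit("Single-copy edition", True, 1.0, 1),
    ]
    criteria = [lambda h: h.in_title and h.phrase_quality == 1.0, lambda h: h.in_title]
    print([h.title for h in tiered_rank(hits, criteria)])
    # ['Single-copy edition', 'Widely held edition']
    print([h.title for h in sorted(hits, key=combined_score, reverse=True)])
    # ['Widely held edition', 'Single-copy edition']
```

In the tiered version, a near-perfect phrase match held by 450 libraries still loses to a perfect match held by one, because holdings never enter the ordering.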

  7. Relevance Ranking Example http://ll01.nla.gov.au/

  8. Clustering
     Relevance ranking only takes you so far.
     Relevant to what?
     • English civil war
     • US civil war
     • Spanish civil war
     • Angolan civil war
     • The church and civil wars
     • Post-colonial civil wars
     Relevant to whom?
     • Audience
     • Date published
     • Form
       • Picture book
       • Movie
       • Thesis…

  9. Clustering
     Group results by various criteria:
     • Subjects (hierarchy or parts/facets)
     • Material type/form
     • Genre
     • When published
     • Audience
     • Classification (Dewey, LC)
     • Author
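A minimal sketch of clustering as facet counting, assuming each result record has already been reduced to lists of values per facet (the record layout and `cluster_counts` are hypothetical). Subject headings are split on “--” so that every level of the hierarchy becomes a cluster, which is one way to handle the “hierarchy or parts” option above.

```python
from collections import Counter


def subject_prefixes(heading):
    """'A -- B -- C' -> ['A', 'A -- B', 'A -- B -- C'] so each level is a cluster."""
    parts = [p.strip() for p in heading.split("--")]
    return [" -- ".join(parts[:i]) for i in range(1, len(parts) + 1)]


def cluster_counts(records, facets):
    """Count, per facet, how many result records carry each value."""
    counts = {facet: Counter() for facet in facets}
    for record in records:
        for facet in facets:
            values = record.get(facet, [])
            if facet == "subject":
                values = [p for h in values for p in subject_prefixes(h)]
            counts[facet].update(set(values))  # count each record once per value
    return counts


if __name__ == "__main__":
    records = [
        {"subject": ["United States -- History -- Civil War, 1861-1865"], "form": ["Book"]},
        {"subject": ["Spain -- History -- Civil War, 1936-1939"], "form": ["Book"]},
        {"subject": ["United States -- History -- Civil War, 1861-1865"], "form": ["Movie"]},
    ]
    counts = cluster_counts(records, ["subject", "form"])
    print(counts["subject"]["United States"])    # 2
    print(counts["subject"]["Spain -- History"])  # 1
    print(counts["form"]["Book"], counts["form"]["Movie"])  # 2 1
```

Choosing a cluster then simply filters the result list to records carrying that value; the relevance order within the cluster is kept.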

  10. Extracting data from the MARC record for ranking and clustering
     • What’s a “title”?
     • Deriving ranking and clustering fields
     • Can we use LC/Dewey code names as “subjects”? http://ll01.nla.gov.au/search.jsp?topic=class%253A632%2BPlant%2Binjuries%252C%2Bdiseases%252C%2Bpests
     • Can we reliably set the following?
       • Audience, based on 650 0 $v Juvenile fiction
       • Genre: “percussion xylophone”, based on 048 $a pb01
       • Genre: “bibliography” and “technical report”, based on 008 040308s2003    xraa     bt  f000 0 eng
       • Subject: “United States -- Florida”, based on 043 $a n-us-fl
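The four derivations on this slide can be sketched as small lookups. This is an illustration under stated assumptions, not the prototype's code or any MARC library's API: fields are passed as plain Python values, the function names are hypothetical, and the lookup tables hold only the codes quoted on the slide (`b`/`t` in 008 positions 24-27 for books, `pb` in 048, `n-us-fl` in 043). Treating any `$v` beginning “Juvenile” as a juvenile audience is likewise an assumption; whether such rules are reliable is exactly the question the slide asks.

```python
# Hypothetical, deliberately tiny lookup tables (only the codes quoted on the slide).
NATURE_OF_CONTENTS = {"b": "bibliography", "t": "technical report"}  # 008/24-27 (books)
GEOGRAPHIC_AREAS = {"n-us-fl": "United States -- Florida"}            # 043 $a
INSTRUMENTS = {"pb": "percussion xylophone"}                          # 048 $a, first 2 chars


def audience_from_650(subfields):
    """Guess audience from 650 form subdivisions ($v), e.g. 'Juvenile fiction'."""
    for code, value in subfields:
        if code == "v" and value.lower().startswith("juvenile"):
            return "juvenile"
    return None


def genres_from_008(field_008):
    """Books 008 positions 24-27 hold up to four nature-of-contents codes."""
    return [NATURE_OF_CONTENTS[c] for c in field_008[24:28] if c in NATURE_OF_CONTENTS]


def genre_from_048(code):
    """048 $a: two-letter instrument/voice code followed by a two-digit count."""
    instrument, count = code[:2], code[2:4]
    name = INSTRUMENTS.get(instrument)
    return (name, int(count)) if name and count.isdigit() else None


def subject_from_043(code):
    """043 $a geographic area code, e.g. 'n-us-fl'."""
    return GEOGRAPHIC_AREAS.get(code)


if __name__ == "__main__":
    print(audience_from_650([("a", "Dinosaurs"), ("v", "Juvenile fiction")]))  # juvenile
    print(genres_from_008("040308s2003    xraa     bt  f000 0 eng"))  # ['bibliography', 'technical report']
    print(genre_from_048("pb01"))        # ('percussion xylophone', 1)
    print(subject_from_043("n-us-fl"))   # United States -- Florida
```

Because 008 is positional, the spaces in the example string matter: “bt” must sit at positions 24-25 for the codes to be read correctly.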

  11. Clustering Example http://ll01.nla.gov.au/

  12. Please Help
     http://ll01.nla.gov.au/ is a prototype
     • What do you like and dislike about it?
     • How can it be improved?