Relevance Ranking and Clustering. Small steps towards making the library catalogue more useful Kent Fitch, 16 Sep 2006. Motivation. Help people find what they’re looking for. The problem. A reference librarian often has lots of context when someone walks up to them and says “The Civil War”
Relevance Ranking and Clustering: Small steps towards making the library catalogue more useful. Kent Fitch, 16 Sep 2006
Motivation Help people find what they’re looking for
The problem
• A reference librarian often has lots of context when someone walks up to them and says “The Civil War”
  • Location
  • Age
  • Clothing
  • What’s on the local syllabus
  • Books they’re carrying
  • Past interactions
  • …
• A computer program has 13 characters
Diversion – improving the context?
• IP addr
  • ANU, DFAT, BHP, Nicholls Primary
• Search history
  • “spanish history”, “franco”, “gettysburg”
• Referrer
  • ANU Library, Wikipedia, MySpace
• Browser
  • Visually impaired user?
Relevance ranking
“The Civil War”: more relevant if
• Occurs in Title/Subject/Author rather than notes/TOC; main Title/Author rather than added entry…
• Occurs as a phrase or near phrase rather than as scattered words
• Occurs as an exact match
• Occurs multiple times (especially the unusual words)
• Occurs as the only or main words (e.g., as the only subject rather than as 1 of 10)
• Is a collection level record
• Is widely held
• Is held by one of your libraries
• Is on the shelf at one of your libraries
• Is available online
• Is highly rated (sales/reviews) on Amazon or LibraryThing
• Is widely cited by other books or by credible web pages
• Is available for inexpensive purchase and quick delivery new or second hand
Relevance Ranking
Two approaches
• TeraText Gateway
  • Issue a series of searches, one for each successive criterion
  • Very hard to incorporate non-binary factors (such as quality of phrase match, number of holdings, …)
• Lucene
  • Combine a “score” for each criterion with an innate “score” for each work
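The score-combining approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the prototype's code or Lucene's actual scoring formula: the criterion names, the weights, the `Work` class, and the choice of a logarithm of holdings as the "innate" score are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical per-criterion weights; the values are illustrative only.
WEIGHTS = {
    "title_phrase": 4.0,    # phrase match in the main title
    "subject_phrase": 3.0,  # phrase match in a subject heading
    "notes_words": 0.5,     # scattered words in notes or table of contents
}


@dataclass
class Work:
    title: str
    match_scores: dict  # criterion name -> match quality between 0 and 1
    holdings: int       # number of libraries holding the work


def innate_score(work):
    """Query-independent score: grows with holdings, with diminishing returns."""
    return math.log1p(work.holdings)


def combined_score(work):
    """Weighted sum of per-criterion match quality, boosted by the innate score."""
    match = sum(
        WEIGHTS.get(name, 0.0) * quality
        for name, quality in work.match_scores.items()
    )
    return match * (1.0 + innate_score(work))


def rank(works):
    """Return works ordered from most to least relevant."""
    return sorted(works, key=combined_score, reverse=True)
```

Because match quality and holdings are numbers rather than yes/no tests, non-binary factors fold into a single sort key, which is the difficulty noted for the series-of-searches approach.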
Relevance Ranking Example http://ll01.nla.gov.au/
Clustering
Relevance ranking only takes you so far
Relevant to what?
• English civil war
• US civil war
• Spanish civil war
• Angolan civil war
• The church and civil wars
• Post-colonial civil wars
Relevant to whom?
• Audience
• Date published
• Form
  • Picture book
  • Movie
  • Thesis…
Clustering
Group results by various criteria
• Subjects (hierarchy or parts/facets)
• Material type/form
• Genre
• When published
• Audience
• Classification (Dewey, LC)
• Author
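Grouping results by such criteria amounts to counting, for each clustering field, how many results carry each value. A minimal sketch, assuming each result is a plain dictionary mapping a field name to a list of values; the field names and the `cluster_counts` helper are hypothetical, not part of the prototype:

```python
from collections import Counter


def cluster_counts(results, fields):
    """Count how many results carry each value of each clustering field."""
    counts = {field: Counter() for field in fields}
    for record in results:
        for field in fields:
            # A record may have several values for a field (e.g. two subjects).
            for value in set(record.get(field, [])):
                counts[field][value] += 1
    return counts


# Hypothetical result set for a search on "civil war".
results = [
    {"subject": ["United States -- History -- Civil War"], "form": ["Book"]},
    {"subject": ["Spain -- History -- Civil War"], "form": ["Book"]},
    {"subject": ["United States -- History -- Civil War"], "form": ["Movie"]},
]

counts = cluster_counts(results, ["subject", "form"])
for value, n in counts["form"].most_common():
    print(f"{value} ({n})")
```

Each value and its count can then be shown beside the ranked list as a link that narrows the search to that cluster.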
Extracting data from the MARC record for ranking and clustering
• What’s a “title”?
• Deriving ranking and clustering fields
• Can we use LC/Dewey code names as “subjects”?
  http://ll01.nla.gov.au/search.jsp?topic=class%253A632%2BPlant%2Binjuries%252C%2Bdiseases%252C%2Bpests
• Can we reliably set:
  • “audience” based on 650 0 v Juvenile fiction
  • Genre: “percussion xylophone” based on 048 a pb01
  • Genre: “bibliography” and “technical report” based on 008 040308s2003 xraa bt f000 0 eng
  • Subject: “United States -- Florida” based on 043 a n-us-fl
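The four derivations listed above might be sketched as follows. This is a hedged illustration, not the prototype's method: the record layout (a dictionary of tag to fields, each field a list of (subfield code, value) pairs, with 008 held as one fixed-width string), the tiny lookup tables, and the `derive_fields` helper are all assumptions. Reading the "nature of contents" codes from positions 24-27 of the 008 field follows the MARC 21 layout for books and only works on the full 40-character field with its blanks intact.

```python
# Illustrative excerpts of code tables; the real lists are far longer.
AREA_NAMES = {"n-us-fl": "United States -- Florida"}
INSTRUMENT_NAMES = {"pb": "percussion xylophone"}
CONTENT_GENRES = {"b": "bibliography", "t": "technical report"}


def subfields(record, tag, code):
    """Yield the values of one subfield code across all fields with this tag."""
    for field in record.get(tag, []):
        for sub_code, value in field:
            if sub_code == code:
                yield value


def derive_fields(record):
    """Derive audience, genre and subject clustering values from one record."""
    derived = {"audience": set(), "genre": set(), "subject": set()}

    # 650 subfield v "Juvenile fiction" suggests a juvenile audience.
    for value in subfields(record, "650", "v"):
        if value.strip().lower() == "juvenile fiction":
            derived["audience"].add("juvenile")

    # 048 subfield a: two-letter instrument code followed by a part count.
    for value in subfields(record, "048", "a"):
        name = INSTRUMENT_NAMES.get(value[:2])
        if name:
            derived["genre"].add(name)

    # 008 positions 24-27 (books): one-letter nature-of-contents codes.
    for code in record.get("008", "")[24:28]:
        genre = CONTENT_GENRES.get(code)
        if genre:
            derived["genre"].add(genre)

    # 043 subfield a: geographic area code.
    for value in subfields(record, "043", "a"):
        name = AREA_NAMES.get(value)
        if name:
            derived["subject"].add(name)

    return derived
```

Whether such rules are reliable enough is exactly the open question on the slide: each depends on cataloguers having filled in the coded fields consistently.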
Clustering Example http://ll01.nla.gov.au/
Please Help
http://ll01.nla.gov.au/ is a prototype
• What do you like and dislike about it?
• How can it be improved?