LIS618 lecture 6 Thomas Krichel 2003-10-26
Structure • Probabilistic model • News from the front line • Open WorldCat Pilot • Amazon Search Inside the book
Probabilistic model (outline only) • starts with the assumption that there is a subset of documents that forms the ideal answer set • the query process specifies properties of the answer set • query terms can be used to estimate the probability that a document is part of the answer set • then we start an iterative process with the user to learn more characteristics of the answer set
Recursive method • The similarity of a document to the query can be expressed as odds: • s = (probability that the document is part of the answer set) / (probability that it is not part of the answer set). • If we assume that the probability that a document among the initially retrieved set is relevant is proportional to the appearance in it of index terms that are part of the query, the probability estimate can be refined further in each round.
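The odds-based similarity above can be sketched in a few lines of Python. This is only an illustration under assumptions that the slides do not state: the initial estimates (a fixed 0.5 chance that a query term appears in a relevant document, and a smoothed document frequency for non-relevant ones) follow the common textbook treatment of the binary independence model, and the names `odds`, `term_weight` and `score` are made up for this example.

```python
import math


def odds(p):
    """Return the odds p / (1 - p) for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)


def term_weight(n_docs, doc_freq, p_rel=0.5):
    """Log-odds weight of one query term.

    p_rel: assumed probability that the term occurs in a relevant document.
    q:     estimated probability that it occurs in a non-relevant document,
           taken from the document frequency with light smoothing (assumption).
    """
    q = (doc_freq + 0.5) / (n_docs + 1.0)
    return math.log(odds(p_rel)) - math.log(odds(q))


def score(query_terms, doc_terms, doc_freqs, n_docs):
    """Sum the weights of the query terms that appear in the document."""
    return sum(
        term_weight(n_docs, doc_freqs[t])
        for t in query_terms
        if t in doc_terms and t in doc_freqs
    )


# Toy collection statistics (invented numbers, for illustration only).
n_docs = 1000
doc_freqs = {"radstock": 5, "coal": 100}
query = ["radstock", "coal"]

both = score(query, {"radstock", "coal", "history"}, doc_freqs, n_docs)
coal_only = score(query, {"coal", "mining"}, doc_freqs, n_docs)
neither = score(query, {"cooking"}, doc_freqs, n_docs)
```

In an iterative setting, the user's relevance judgements would replace the fixed `p_rel` and the frequency-based `q` with estimates drawn from the documents marked relevant, and the ranking would be recomputed.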
Open WorldCat pilot • Aims • to increase the visibility of library collections to current and potential patrons • to enhance the image of libraries with administrators and funding agencies • to improve the quality of material accessible from the Web • Pilot ends June 2004
OCLC and Google • OCLC has offered 2 million of the 53 million records in WorldCat to Google for indexing. • Only popular records are included, those held by at least 100 libraries. • Applies only to the 12,000 academic, public and school libraries that contribute to WorldCat. Others have to ask to participate.
Problems of the project • Page ranking too thin • Contents too thin • Too little, much too late? • Google is working on a project to allow full-text access to books. Currently they have agreements for 60,000 books.
Amazon Search Inside the Book • Access it via any Amazon search box: just enter your search terms. • Implied AND; phrase searching seems not to work. • A typical list of titles is returned. However, some titles carry extra information and links. These appear directly below the pricing information and begin with the word "Excerpt." • Click on such a link and you'll see a scanned image of the page with your search term(s) highlighted. • You'll need to be registered with Amazon.com to access the full text. • Amazon uses optical character recognition technology to find words embedded in the scanned images.
Amazon Search Inside the Book: example • Search amazon.com for "radstock coal" • Two books with excerpts are returned. • You could either buy the book from Amazon • or take the reference and check whether it can be found in a library near you, using the Google service described earlier. • OpenURL technology may be helpful.
OpenURL example • Andy Powell has built an OpenURL-based link resolver at http://www.ukoln.ac.uk/distributed-systems/openurl/orlet/ • Click on Go. • Put this link onto the link bar. • Then open a new Amazon query. • When an interesting book is found, resolve it through the resolver.
ISBN in Amazon • The resolver works because the Amazon URL has the ISBN encoded in it… • If Amazon finds that its service is being used to find books in a library, it will not be pleased. • But removing the ISBN would also remove the chance for others to link to Amazon to say "buy the book there". • The full-text search could also be used via Amazon's SOAP API, to build an integrated system.
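The idea of pulling the ISBN out of a product URL and handing it to a link resolver can be sketched as follows. Everything specific here is an assumption for illustration: the sample URL layout, the resolver address `http://resolver.example.org/openurl`, and the helper names are hypothetical, and the `genre` and `isbn` query keys follow the early key=value style of OpenURL rather than any particular resolver's documented interface.

```python
import re
from urllib.parse import urlencode

# A 10-character ISBN appearing as its own path segment (assumed URL layout).
ISBN10_IN_PATH = re.compile(r"/(\d{9}[\dXx])(?=[/?]|$)")


def isbn10_is_valid(isbn):
    """Check the ISBN-10 checksum: weighted digit sum must be divisible by 11."""
    if not re.fullmatch(r"\d{9}[\dXx]", isbn):
        return False
    total = sum(
        (10 - i) * (10 if ch in "Xx" else int(ch))
        for i, ch in enumerate(isbn)
    )
    return total % 11 == 0


def extract_isbn(url):
    """Return the first valid ISBN-10 found in the URL path, or None."""
    for candidate in ISBN10_IN_PATH.findall(url):
        if isbn10_is_valid(candidate):
            return candidate.upper()
    return None


def build_openurl(resolver_base, isbn):
    """Build a book-lookup link for a resolver (hypothetical base URL)."""
    return resolver_base + "?" + urlencode({"genre": "book", "isbn": isbn})


# Hypothetical product URL; the path shape is an assumption, not Amazon's spec.
product_url = "http://www.amazon.com/exec/obidos/ASIN/0306406152/ref=example"
isbn = extract_isbn(product_url)
link = build_openurl("http://resolver.example.org/openurl", isbn)
```

Checking the checksum before building the link guards against ten-digit path segments that are not ISBNs, such as identifiers of products that are not books.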
Authors to lose out? • The US Authors Guild has appealed to its members to block their works from being searchable. They are especially concerned about • reference works • cookbooks and travel books • Some publishers say that they will not accept blocking requests.
http://openlib.org/home/krichel Thank you for your attention!