
Human Expertise and Artificial Intelligence in Vertical Search

Peter Jackson & Khalid Al-Kofahi, Corporate Research & Development


Presentation Transcript


  1. Human Expertise and Artificial Intelligence in Vertical Search • Peter Jackson & Khalid Al-Kofahi • Corporate Research & Development

  2. Horizontal versus Vertical Search

  3. The Paradox of Search • The further you get from keyword indexing and retrieval, the harder it is to explain a search result • Professional searchers demand transparency • Tool versus appliance • You need an ‘explanatory model’ that people can relate to and understand, even if it is actually just a cartoon of the real process • Examples: Basic PageRank, Collaborative Filtering • Such models don’t work so well in vertical domains • Links aren’t always endorsements • Sparsity of data in smaller communities

  4. Recent Trends in Search • Fragmentation of ‘horizontal’ search • Media, location, demographics (Weber & Castillo, 2010) • More sophisticated models of user behavior • Post-click behaviors (Zhong, Wang, et al., 2010) • ‘Practical semantics’ versus Semantic Web • Maps as search results for local, micro-results • Incorporation of domain knowledge into search • Taxonomies, vocabularies, use cases, workflows

  5. The Example of Legal Search • The completeness requirement • Recall as important as precision • Less redundancy than on the Web • The authority requirement • Court superiority, jurisdiction • Highly cited cases and statutes • Supersession by statute or regulation • The multi-topical nature of documents • A case may cover many points of law but be cited for only one • Citations can be negative as well as positive per topic • These factors also apply to scientific documents

  6. Power Law and Legal Topics

  7. Power Law and Westlaw Users

  8. Expert Search • In many verticals, there are at least two sources of expertise available for enhancing search • Editors and authors, who generate useful metadata • Users, who generate clickstreams and other data • Editorial value addition improves recall especially • Helps find both fat neck and long tail documents on a topic • Aggregate user behavior mostly improves precision • Power users find the most relevant and important documents • The model of expert search enables and explains the portfolio of results, rather than individual results

  9. Sources of Evidence: Authors & Editors • [Figure: Burger King Corp. v. Rudzewicz and related cases, with labels for headnotes, KN, text, and citations] • Issue: Long arm jurisdiction • 12 A (Key cases) • 54 B (Highly Relevant)

  10. Sources of Evidence: Authors & Editors • [Figure: Burger King Corp. v. Rudzewicz with headnote and KN pairs (HN1 KN1, HN2 KN2, HN3 KN2, …, HN35 KN14) and sets of cases from ALR, CJS, and AMJUR] • Another set of related cases

  11. Sources of Evidence: Users (I) • [Figure: user sessions 1 to N, each with queries and click, print, and KeyCite actions on cases, including Burger King Corp. v. Rudzewicz] • Link query language to document language via click, print, and cite checking behaviors • Identify documents that are co-clicked, co-printed, etc., with the Burger King case across user sessions
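The co-click idea on this slide can be illustrated with a small sketch. It is not the authors' implementation: the session data, document names, and the helper functions `co_occurrence_counts` and `related_to` are all hypothetical, and a session is simplified to the set of documents a user acted on (clicked, printed, or cite-checked).

```python
from collections import Counter
from itertools import combinations

# Hypothetical session logs: each session is the set of documents a user
# clicked, printed, or cite-checked. Document names are illustrative only.
sessions = [
    {"Burger King v. Rudzewicz", "Case A", "Case B"},
    {"Burger King v. Rudzewicz", "Case A"},
    {"Case B", "Case C"},
    {"Burger King v. Rudzewicz", "Case A", "Case C"},
]

def co_occurrence_counts(sessions):
    """Count how many sessions contain each unordered pair of documents."""
    counts = Counter()
    for docs in sessions:
        for a, b in combinations(sorted(docs), 2):
            counts[(a, b)] += 1
    return counts

def related_to(target, sessions):
    """Rank documents by the number of sessions they share with `target`."""
    related = Counter()
    for (a, b), n in co_occurrence_counts(sessions).items():
        if a == target:
            related[b] += n
        elif b == target:
            related[a] += n
    return related.most_common()

ranked = related_to("Burger King v. Rudzewicz", sessions)
print(ranked)
```

In this toy data "Case A" shares three sessions with the Burger King case and so ranks first. A production system would presumably weight action types differently and normalize for document popularity; the slides do not say how.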

  12. Sources of Evidence: Users (II) • [Figure: user sessions 1 to N with queries and click and print actions on Burger King Corp. v. Rudzewicz] • Original breach of contract and trademark infringement case turned into a civil procedure case about jurisdiction on appeal • In the last 3 months: "personal jurisdiction" 176, "minimum contacts" 50, "forum selection clause" 39, "personal jurisdiction" 39, "forum non conveniens" 32, "choice of law" 29 • User actions: 10417 • Total sessions: 9758

  13. AI & The Ranking Problem • Supervised Machine Learning (Ranker SVM) • Iteratively retrieve and rank documents • Incorporate all available cues: text similarity, classifications, citations, user behavior and query logs • All of this requires lots of data! • Training & Validation • Gold data: hand-crafted research reports covering a variety of legal issues • Each report contains an issue statement, multiple queries, all seminal and highly relevant documents, and some relevant documents • > 100K documents judged against ~400 legal issues • System was also tested by an independent third party
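As a rough illustration of learning to rank from several cues, the sketch below trains a linear scorer with a pairwise hinge loss, which is the core idea behind ranking SVMs. It is a minimal sketch under stated assumptions, not the system described in the slides: the three features, the relevance grades, the training data, and the plain SGD optimizer are all invented for illustration.

```python
import random

# Hypothetical training data for one legal issue: each document has a
# feature vector [text_similarity, citation_signal, user_click_signal]
# and an editorial relevance grade (2 = seminal, 1 = relevant, 0 = not).
docs = [
    ([0.9, 0.8, 0.7], 2),
    ([0.7, 0.4, 0.5], 1),
    ([0.6, 0.5, 0.2], 1),
    ([0.2, 0.1, 0.1], 0),
    ([0.3, 0.0, 0.2], 0),
]

def pairwise_differences(docs):
    """Build (x_hi - x_lo) vectors for every pair with different grades."""
    diffs = []
    for xi, yi in docs:
        for xj, yj in docs:
            if yi > yj:
                diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs

def train_pairwise_hinge(diffs, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Minimize hinge loss max(0, 1 - w.d) plus an L2 penalty by SGD."""
    rng = random.Random(seed)
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            for k in range(len(w)):
                # Hinge term contributes -d[k] only when the margin is violated.
                grad = reg * w[k] - (d[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w

def score(w, x):
    """Linear relevance score for one document's feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

weights = train_pairwise_hinge(pairwise_differences(docs))
ranking = sorted(docs, key=lambda d: score(weights, d[0]), reverse=True)
print([grade for _, grade in ranking])
```

Training on pairs rather than on individual grades means the model only has to order documents correctly for a given issue, which matches how the gold data (graded documents per legal issue) is described.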

  14. Hadoop for Big Data Processing • At launch, query logs contained ~2 billion records • Queries & user actions • Relied on a Hadoop cluster to • Run Extract, Transform, and Load (ETL) processes • Cluster similar queries together • Extract, normalize, and collate citation contexts • Dramatic improvement in processing times • From tens of hours to tens of minutes
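The sketch below mimics the map, shuffle, and reduce phases of such a job in plain Python, grouping similar queries under a shared key. It does not use any Hadoop API, and it makes loud assumptions: the log records are invented, and the `normalize` rule (lowercase, strip punctuation, sort tokens) is a crude stand-in, since the slides do not say how queries were actually clustered.

```python
import re
from collections import defaultdict

# Hypothetical query-log records: (raw query, document acted on).
log_records = [
    ('"personal jurisdiction"', "doc-1"),
    ("Personal  Jurisdiction", "doc-1"),
    ("jurisdiction personal", "doc-2"),
    ('"minimum contacts"', "doc-1"),
    ("minimum contacts", "doc-3"),
]

def normalize(query):
    """Lowercase, drop punctuation, and sort tokens to form a cluster key."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(sorted(tokens))

def map_phase(records):
    """Emit (cluster key, (raw query, document)) pairs, one per record."""
    for query, doc in records:
        yield normalize(query), (query, doc)

def shuffle_phase(pairs):
    """Group mapped values by key, as a MapReduce shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Summarize each cluster: number of queries and distinct documents."""
    return {
        key: {"queries": len(values), "docs": sorted({d for _, d in values})}
        for key, values in groups.items()
    }

clusters = reduce_phase(shuffle_phase(map_phase(log_records)))
print(clusters)
```

Because each record is mapped independently and each key is reduced independently, the same logic can be spread across many cores, which is the data parallelism the slide relies on.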

  15. Hadoop: Typical Speed Ups

  16. Cluster Configuration: Queries • 8 machines, each with 16 cores • Only 14 cores/machine were available for processing • Giving a total of 112 cores • Block size of 64 MB • Each core processes one block at a time • Cluster can process 7 GB at each step • Latest cluster is twice the size: 224 cores • Almost 1 TB of memory and over 1 PB of storage
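The capacity figures on this slide follow from simple arithmetic, reproduced below. The constants come from the slide; the `steps_needed` helper and the 70 GB input are hypothetical additions to show how the per-step capacity translates into processing waves.

```python
MACHINES = 8
USABLE_CORES_PER_MACHINE = 14   # 16 cores per machine, 14 available
BLOCK_SIZE_MB = 64              # each core processes one block at a time

total_cores = MACHINES * USABLE_CORES_PER_MACHINE      # 112 cores
data_per_step_mb = total_cores * BLOCK_SIZE_MB         # 7168 MB
data_per_step_gb = data_per_step_mb / 1024             # 7.0 GB

def steps_needed(input_gb):
    """Number of waves needed for an input size in whole GB (hypothetical helper)."""
    input_mb = input_gb * 1024
    return -(-input_mb // data_per_step_mb)  # ceiling division

print(total_cores, data_per_step_gb, steps_needed(70))
```

With 112 cores each holding one 64 MB block, a single step covers 7 GB, so a 70 GB input would take 10 steps under these assumptions.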

  17. The Power of Expert Search • Leverages expertise of community: authors, editors, & users • We know why documents are linked • We know exactly who our users are • Metadata, authority & aggregated user data all contribute to relevance, importance & popularity • Can still benefit from Power Law phenomena so common on the Web • Can exploit data parallelism to achieve the same kind of scale as horizontal search

  18. Lessons Learned • Vertical search is not just about search • It’s about findability • Includes navigation, recommendations, clustering, faceted classification, etc. • It’s about satisfying a set of well-understood tasks • Usually on enhanced content • Usually for expert customers • Leveraging human value addition is key • None of the human actors set out to improve search • Difficult to design complete solution upfront • Need platform for experimentation and validation at scale

  19. questions? • A relevant paper is downloadable from http://labs.thomsonreuters.com
