
Structured Queries for Legal Search

This paper discusses structured queries and their effectiveness in the legal search domain, using the TREC 2007 Legal Track as a case study.



Presentation Transcript


  1. Structured Queries for Legal Search: TREC 2007 Legal Track • Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell • Language Technologies Institute, School of Computer Science, Carnegie Mellon University • 11/06/2007

  2. Agenda • Introduction • Main task – ad hoc search • Routing task – relevance feedback

  3. What is legal search • Goal: retrieve all documents for production requests. • Production request: describes a set of documents that the plaintiff forces the defendant to produce. • Recall-oriented: missing an important document carries high risk, and finding one carries high value. • Sample request text: All documents discussing, referencing, or relating to company guidelines, strategies, or internal approval for placement of tobacco products in movies that are mentioned as G-rated. [Figure: the Final Query for this request drawn as an operator tree, with operators AND, OR, W/5 over the terms guide, strategy, approval, family, “G rated”, movie, film]

  4. Data set • 7 million business records from tobacco companies and research institutes. • Metadata: title, author, organizations, etc. • OCR text: contains errors. • 50 topics generated from four hypothetical complaints created by lawyers.

  5. Main task – Ad hoc search: Indri query formulation • Without Boolean constraints: #combine(ranking function) • With Boolean constraints: #filreq( #band(boolean constraint) #combine(ranking function) )
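The two query templates on this slide can be sketched as plain string builders. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the example constraint clauses are chosen only to show the shape of the output, since the transcript does not show how the Final Query operators were translated.

```python
def combine(ranking_terms):
    """Wrap the ranking terms in Indri's #combine operator."""
    return "#combine( " + " ".join(ranking_terms) + " )"


def filtered_query(boolean_clauses, ranking_terms):
    """Build #filreq( #band(...) #combine(...) ).

    #filreq keeps only documents that satisfy its first argument
    and ranks the survivors with its second argument.
    """
    constraint = "#band( " + " ".join(boolean_clauses) + " )"
    return "#filreq( " + constraint + " " + combine(ranking_terms) + " )"


# Illustrative inputs only (assumed, not taken from the authors' runs).
ranking_terms = ["guide", "strategy", "approval", "movie", "film"]
boolean_clauses = ["#syn(guide strategy approval)", "#syn(movie film)"]

print(combine(ranking_terms))
print(filtered_query(boolean_clauses, ranking_terms))
```

Keeping the constraint and the ranking function as separate arguments makes it easy to run the same ranking function with and without the Boolean filter, which is the comparison the main task reports.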

  6. Boolean constraint • Translate the Final Query [Figure: Final Query operator tree with AND, OR, W/5 over guide, strategy, approval, family, “G rated”, movie, film]

  7. Ranking functions • Bag of words: (guide strategy approval family G rated movie film) • Respect phrase operators: (guide strategy approval family #1(G rated) movie film) • Group synonyms together: (#syn(guide strategy approval) #syn(family #1(G rated)) #syn(movie film)) [Figure: Final Query operator tree with AND, OR, W/5 over guide, strategy, approval, family, “G rated”, movie, film]
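The three ranking-function variants differ only in how much of the Final Query's structure they keep. The sketch below derives all three from one nested representation of the example query. The representation (lists for OR-groups, tuples for quoted phrases) and the function names are assumptions made for illustration.

```python
# Each inner list is one OR-group of the Final Query;
# a tuple marks a quoted phrase such as "G rated".
groups = [
    ["guide", "strategy", "approval"],
    ["family", ("G", "rated")],
    ["movie", "film"],
]


def render(item):
    """Render a phrase as an exact ordered window #1(...), a word as itself."""
    return "#1(" + " ".join(item) + ")" if isinstance(item, tuple) else item


def bag_of_words(groups):
    """Drop all structure: every word becomes an independent term."""
    words = []
    for group in groups:
        for item in group:
            words.extend(item if isinstance(item, tuple) else [item])
    return "(" + " ".join(words) + ")"


def with_phrases(groups):
    """Keep phrases but ignore the OR-grouping."""
    return "(" + " ".join(render(i) for g in groups for i in g) + ")"


def with_synonyms(groups):
    """Keep phrases and wrap each OR-group in #syn."""
    parts = ["#syn(" + " ".join(render(i) for i in g) + ")" for g in groups]
    return "(" + " ".join(parts) + ")"


for build in (bag_of_words, with_phrases, with_synonyms):
    print(build(groups))
```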

  8. Experiments and findings • Boolean constraints improve recall and precision • Structured queries outperform bag-of-words ones * B is the number of documents matching the Final Query. Its average value is 5000.

  9. Per-topic performance (difference from the median of 29 manual runs) • est_RB • est_PB

  10. Routing task of Legal Track 2007 • Structured queries are known to be hard to construct. • Not with supervision. • Questions • Do weighted queries help? • Do metadata and annotations help? • A definitive answer from Supervised Structured Query Construction

  11. Structured query • #weight( w1 t1 w2 t2 … wn tn)

  12. Supervised Structured Query Construction • Relevance feedback => supervised learning • Train a linear SVM with keyword and keyword.field features • SVM classifier • fi: training weights for terms, chosen to be tfidf/LM scores • Retrieval: #weight( w1 t1 w2 t2 … ) • fi: tfidf/LM scores for terms • Advantages • Given enough training data, we know for sure whether one type of feature helps
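The retrieval step, turning learned per-term weights into a #weight query, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the weights are made-up numbers standing in for the coefficients of a trained linear SVM, and dropping non-positive weights and truncating to the top_k terms are simplifications that the slides do not mention.

```python
def weight_query(term_weights, top_k=10):
    """Build an Indri #weight query from a {term: weight} mapping.

    Terms may be plain keywords or keyword.field features.
    Assumption: only positive weights are kept, largest first.
    """
    positive = [(t, w) for t, w in term_weights.items() if w > 0]
    positive.sort(key=lambda pair: pair[1], reverse=True)
    parts = [f"{w:.2f} {t}" for t, w in positive[:top_k]]
    return "#weight( " + " ".join(parts) + " )"


# Hypothetical weights, standing in for learned SVM coefficients.
learned = {
    "chocolate": 1.8,
    "cigarette": 1.2,
    "candy.title": 0.6,
    "memo": -0.4,
}

print(weight_query(learned))
```

Because every feature type (keyword, field-restricted keyword, annotation) ends up as one weighted term in the same operator, the learned weights can be inspected directly to see which feature types the classifier relied on.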

  13. Example Query • <RequestNumber>13</RequestNumber> • <RequestText>All documents to or from employees of a tobacco company or tobacco organization referring to the marketing, placement, or sale of chocolate candies in the form of cigarettes.</RequestText> • <FinalQuery>(cand! OR chocolate) w/10 cigarette!</FinalQuery>

  14. Annotations • NE: bush.person • sentence: violate.sent • meta: television.title • Feedback query:

  15. Performance • On 39 topics of Legal 2006 (2/3 of judged documents for training, the rest for testing) • On 10 topics of the Legal 2007 routing task

  16. Routing Conclusions • A principled way of constructing structured queries • Annotations • Query term weights • Answers from a supervised learning algorithm • Weights help; annotations help less.

  17. Thank you! Questions?
