1. Structured Queries for Legal Search: TREC 2007 Legal Track. Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
11/06/2007
2. Agenda Introduction
Main task – ad hoc search
Routing task – relevance feedback
3. What is legal search Goal: retrieve all documents for production requests.
Production request: describes a set of documents that the plaintiff forces the defendant to produce.
Recall-oriented: high risk (value) of missing (finding) important documents.
4. Data set 7 million business records from tobacco companies and research institutes.
Metadata: title, author, organizations, etc.
OCR text: contains errors
50 topics generated from four hypothetical complaints created by lawyers
5. Main task – Ad hoc search Indri query formulation
Without boolean constraint
#combine(ranking function)
With boolean constraints
#filreq( #band(boolean constraint) #combine(ranking function) )
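The two formulations above can be sketched as plain string templates. This is only an illustration of how the operators nest: the helper names and the example constraint terms are hypothetical, not the queries used in the submitted runs.

```python
def combine(terms):
    """Ranking function: score documents with Indri's #combine operator."""
    return "#combine( " + " ".join(terms) + " )"


def band(terms):
    """Boolean AND constraint: every term must be present."""
    return "#band( " + " ".join(terms) + " )"


def filreq(constraint, ranking):
    """Filter-require: keep only documents that match the constraint,
    then rank the survivors with the ranking expression."""
    return "#filreq( " + constraint + " " + ranking + " )"


# Hypothetical ranking terms and constraint, for illustration only.
ranking_terms = ["guide", "strategy", "approval", "movie", "film"]
constraint_terms = ["approval", "movie"]

without_boolean = combine(ranking_terms)
with_boolean = filreq(band(constraint_terms), combine(ranking_terms))

print(without_boolean)
print(with_boolean)
```

The constraint only decides which documents are eligible; the ranking expression alone determines their order.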
6. Boolean constraint Translate the Final Query
7. Ranking functions Bag of words
(guide strategy approval family G rated movie film)
Respect phrase operators
(guide strategy approval family #1(G rated) movie film)
Group synonyms together
(#syn(guide strategy approval) #syn(family #1(G rated)) #syn(movie film))
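The three ranking functions differ only in how much query structure they keep. A minimal sketch, assuming the query is available as groups of synonymous items with phrases stored as tuples (this representation and the function names are hypothetical), and wrapping each variant in #combine:

```python
def render(item, keep_phrases):
    """Render one query item; tuples are multi-word phrases."""
    if isinstance(item, tuple):
        words = " ".join(item)
        # #1(...) is Indri's exact ordered-window (phrase) operator.
        return "#1(" + words + ")" if keep_phrases else words
    return item


def ranking_query(groups, keep_phrases=False, group_synonyms=False):
    """Build a #combine query from synonym groups at a chosen level of structure."""
    parts = []
    for group in groups:
        rendered = [render(item, keep_phrases) for item in group]
        if group_synonyms:
            parts.append("#syn(" + " ".join(rendered) + ")")
        else:
            parts.extend(rendered)
    return "#combine( " + " ".join(parts) + " )"


# The example from the slide, as synonym groups.
groups = [
    ["guide", "strategy", "approval"],
    ["family", ("G", "rated")],
    ["movie", "film"],
]

bag_of_words = ranking_query(groups)
with_phrases = ranking_query(groups, keep_phrases=True)
with_synonyms = ranking_query(groups, keep_phrases=True, group_synonyms=True)

for query in (bag_of_words, with_phrases, with_synonyms):
    print(query)
```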
8. Experiments and findings Boolean constraints improve recall and precision
Structured queries outperform bag-of-words ones
9. Per-topic performance (difference from the median of 29 manual runs), est_RB
10. Routing task of Legal track 2007 Structured queries are known to be hard to construct.
Not with supervision
Questions
Do weighted queries help?
Do metadata and annotations help?
A definitive answer from Supervised Structured Query Construction
11. Structured query
#weight( w1 t1 w2 t2 … wn tn)
12. Supervised Structured Query Construction Relevance feedback => supervised learning
Train a linear SVM with keyword and keyword.field features
SVM classifier
fi : training weights for terms, chosen to be tfidf/LM scores
Retrieval: #weight( w1 t1 w2 t2 … )
fi : tfidf/LM scores for terms
Advantages
Given enough training, we know for sure whether a given type of feature helps
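The pipeline on this slide can be sketched end to end: judged documents become training examples, a linear classifier learns one weight per term, and the learned weights are written out as a #weight query. The sketch below rests on several assumptions: a small subgradient-descent loop stands in for a real SVM package, feature values are binary rather than tfidf/LM scores, the toy documents are invented, and dropping non-positive weights is one possible choice, not necessarily the one made in the submitted runs.

```python
def train_linear_svm(examples, epochs=50, lr=0.1, reg=0.01):
    """Toy linear SVM: subgradient descent on L2-regularized hinge loss.

    examples: list of (features, label), where features maps a term to its
    value and label is +1 (relevant) or -1 (non-relevant).
    """
    vocab = sorted({term for feats, _ in examples for term in feats})
    w = {term: 0.0 for term in vocab}
    for _ in range(epochs):
        for feats, label in examples:
            margin = label * sum(w[t] * v for t, v in feats.items())
            for t in vocab:
                grad = reg * w[t]
                if margin < 1.0:
                    # Hinge loss is active: move toward classifying this example.
                    grad -= label * feats.get(t, 0.0)
                w[t] -= lr * grad
    return w


def to_weight_query(w):
    """Emit '#weight( w1 t1 w2 t2 ... )', keeping positive weights only (assumption)."""
    positive = sorted(((wt, t) for t, wt in w.items() if wt > 0), reverse=True)
    body = " ".join("%.2f %s" % (wt, t) for wt, t in positive)
    return "#weight( " + body + " )"


# Invented relevance-feedback examples with binary term features.
examples = [
    ({"chocolate": 1.0, "cigarette": 1.0}, +1),
    ({"candy": 1.0, "cigarette": 1.0}, +1),
    ({"tax": 1.0, "cigarette": 1.0}, -1),
    ({"tax": 1.0, "report": 1.0}, -1),
]

weights = train_linear_svm(examples)
query = to_weight_query(weights)
print(query)
```

A keyword.field feature would fit the same scheme by using a name such as chocolate.title as the term, since Indri restricts a term to a field with that dot syntax.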
13. Example Query <RequestNumber>13</RequestNumber>
<RequestText>All documents to or from employees of a tobacco company or tobacco organization referring to the marketing, placement, or sale of chocolate candies in the form of cigarettes.</RequestText>
<FinalQuery>(cand! OR chocolate) w/10 cigarette!</FinalQuery>
14. Annotations
Feedback query:
15. Performance
16. Routing Conclusions A principled way of constructing structured queries
Annotations
Query term weights
Answers from a supervised learning algorithm
Weights help; annotations help less.
17. Thank you!
Questions?