Query-based Opinion Summarization for Legal Blog Entries

Query-based Opinion Summarization for Legal Blog Entries • Jack G. Conrad, Jochen L. Leidner, Frank Schilder, Ravi Kondadadi • Corporate Technology Research & Development • Twelfth International Conference on Artificial Intelligence & Law (ICAIL 2009) • Barcelona, Spain • 8-12 June 2009

OUTLINE • INTRODUCTION • RELATED WORK • SYSTEM • METHODOLOGY • RESULTS • CONCLUSIONS • FUTURE WORK • AI & LAW PROPOSAL

INTRODUCTION (1/4) — Motivations • Amount and rate of legal information flow increasing • Demands on attorneys for work products very high • Essential for productivity tools to be efficient • Legal blogs provide a more immediate forum • Unmoderated, instantaneous, candid, terse • Contain rich viewpoints, individual or in aggregate • Missing piece: ability to summarize blog entries • Legal professionals busy synthesizing traditional legal materials (cases, statutes, analytical documents) • Pressures due to case load and schedules immense • Increasingly impossible to keep up with information bandwidth • Means of consolidating, summarizing artifacts invaluable

INTRODUCTION (3/4) • Key contributions • First work to perform multi-document opinion-based summarization on legal blog entries • Extends the TAC evaluation of opinion summarization task to assess the accuracy of measured polarity, using expert reviewers • Presents a proposal to the AI & Law community — host a formal track to pursue the topic in a more structured, in-depth manner

INTRODUCTION (4/4) • Opinion Mining for Legal Blogs • Prospective Applications • Monitoring — follow what communities are saying about firms, products, services, topics • Alerting — inform subscribers of unfavorable developments • Profiling — represent litigation patterns of attorneys, courts ... • Tracking — study decisions of judges, reputations of firms ... • Exploration/Education — present law students with contrasting opinions

RELATED WORK (1/2) • [Diagram of three research areas and their venues] • Summarization: TREC, TAC, et al. • Legal Domain: ICAIL, JURIX • Sentiment Analysis: ICWSM • Works shown in the diagram: • Ashley & Aleven (1991 ff.): intelligent tutoring • Hachey & Grover (2006): argumentative zoning • Saravanan & Raman (2006): Conditional Random Fields • Conrad & Schilder (2007): blawg polarity classification • Lerman & McDonald (2009): sentiment-modeled summarizers • Conrad, Leidner, Schilder and Kondadadi (2009): blawg sentiment summarization

RELATED WORK (2/2)— TAC, the Text Analysis Conference (www.nist.gov/tac/) • a new annual international workshop sponsored by NIST • the US National Institute of Standards & Technology • organizers disseminate NLP-type tasks and datasets • participants develop systems that solve the tasks • submit their results to NIST for evaluation • members can also propose new tasks for future workshops • the sentiment summarization pilot task consisted of producing short, coherent sentiment summaries of blog text • Thomson Reuters R&D addressed the task • system produced multi-document summaries

SYSTEM (1/3) — Workflow Diagram for Blawg Opinion Summarization • Sample Query: Has Google been a consistent supporter of Net neutrality? • Sample Target: Google Net Neutrality

SYSTEM (2/3) — FastSum, design and application • TR’s legal blog opinion summarization system • multi-document summarization system • harnesses regression Support Vector Machine (SVM) for ranking candidate sentences • original system extended to sentiment • current system applied to legal domain (blawgs) • Timeline: Summarization (2007) → Sentiment Summarization (2008) → Legal Sentiment Summarization (2009)

SYSTEM (3/3) FastSum Blog Opinion Summarization Processing • Key Modifications • A.1 HTML parsing & clean-up module • B.1 Question sentiment & target analyzer • C.1 Sentence tagger • C.2 Target overlap

METHODOLOGY (1/7) • Application of Thomson Reuters’ legal blog opinion summarization system • Data collection via Web-based queries • submitted to Web Search Engine, Blog Search Engine • Summary generation • via modified FastSum System • Evaluation • human assessment • two assessors rated each summary • measures modeled on TAC metrics

METHODOLOGY (2/7) Blog Search Engines Examined along with Their Properties

METHODOLOGY (5/7) — Evaluation • Metrics used modeled on TAC (et al.) evaluation • Two metrics used: • Responsiveness • Linguistic Quality • Scale: Five-point Likert [1-5] • 5 = high • 1 = low • Scores generally track those of TAC, though task not completely identical

METHODOLOGY (6/7)— Evaluation: Responsiveness Reviewer Guidelines for Responsiveness [1-5]

METHODOLOGY (7/7)— Evaluation: Linguistic Quality Reviewer Guidelines for Linguistic Quality [1-5]

RESULTS (1/2) — Baseline Averages • Scores comparable to those of TAC 2008 (in 2-3 range) • Caveat — we scored for correct sentiment polarity; TAC didn’t • Kappa statistic for inter-rater agreement between the pair of assessors, κ = 0.75
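
The kappa figure above measures how far two assessors agree beyond what chance would produce. The slide does not say which kappa variant was used; the sketch below assumes Cohen's kappa, the usual choice for a pair of raters, and the function name and any ratings passed to it are illustrative only, not data from the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items.

    Assumption: the slide's "Kappa statistic" is Cohen's kappa; the
    source does not name the variant.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal score distributions.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Called with two lists of 1-5 Likert scores, one per assessor, it returns a value of 1.0 for perfect agreement and 0.0 for agreement no better than chance.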

RESULTS (2/2) — Sample FastSum Summary • [Annotated sample summary; annotation labels: Deficient, Topical overlap, Display of sentiment, Useful to researcher]

CONCLUSIONS (1/1) • Amount, rate of legal information flow growing • Summarization, identification of trends increasingly valuable • Forums like TAC-opinion summarization beginning to study topic • For certain legal research, such synopses can be very helpful • Viewpoints, individually or in aggregate, can expand arguments, comprehension of underlying legal issues • First effort to produce automatic opinion summaries for entries in legal blog space • Based on multiple documents • For pre-specified polarity • Trained on general, homogeneous news documents (okay) • Trained on specific heterogeneous legal blogs (better) • Assessed by expert legal reviewers • Baseline scores in the low 2.0s out of 5 (comparable to TAC)

FUTURE WORK (1/1) • Compare to other summarization systems/techniques • From TAC or elsewhere • Test against model summaries and use the nugget pyramid evaluation method • Train the ML component of FastSum on various blog entries, rather than general news • Formalize the effect input data has on result sets, and the impact output length has on results • Incorporate more structure • Qualitative — best template to harness? • Quantitative — optimal length for each section? • Leverage features from the legal domain • E.g., use a legal dictionary to help rank sentences

AI & LAW PROPOSAL (1/1) • For AI & Law (IAAIL) and TAC (NIST) • NIST offers research groups shared task in multi-document summarization • Why not focus on a shared task in the legal domain? • Needs to be assessed by the IAAIL and NIST communities to determine interest • Who would benefit? • Legal practitioners — potentially highly beneficial results • Legal researchers — thanks to valuable testbed • AI & Law Community — can breathe in new life, attract new members • What data collections could be used? • TAC uses the very large BLOG06 collection • Textual Entailment uses the RTE collection; a hybrid also possible

Thank you! Questions?

INTRODUCTION (1/4) — Motivations • Modern legal information environment increasingly dynamic, fast-paced • Blawgs (legal blogs) provide a more immediate forum • Generally unmoderated, instantaneous, candid, terse • Viewpoints to be gleaned, in aggregate or individually, are rich • Missing piece: ability to summarize blog entries • Legal professionals busy simply synthesizing traditional legal materials (cases, statutes, analytical documents) • Pressures due to case load and schedules immense • Increasingly impossible to keep up with information bandwidth • Means of consolidating, summarizing artifacts invaluable

AI & LAW PROPOSAL (1/1) • For AI & Law (IAAIL) and TAC (NIST) • NIST offers research groups shared task in multi-document summarization • Why not focus on such a shared task in the legal domain? • Needs to be assessed by the IAAIL and NIST communities to determine interest • Potentially of great benefit to legal practitioners • Could raise the bar on current baseline system • Could use: • Blog-based data set like BLOG06, as used in TAC 2008, with a legal component • RTE (Recognizing Textual Entailment) data set, again with a legal component • a combination of the two

RELATED WORK (1/2) • Ashley & Aleven (1991 ff.) — produce intelligent tutoring applications to teach law students how to argue in the context of caselaw • Farzindar & Lapalme (2004) — present the LetSum system to summarize Canadian court decisions • Hachey & Grover (2006) — apply argumentative zoning to summarize decisions from the House of Lords • Saravanan & Raman (2006) — use statistical graphical models (CRFs) for legal summarization, while extracting rhetorical roles • Lerman, Blair-Goldensohn, and McDonald (2009) — show users have a strong preference for summarizers that model sentiment over non-sentiment baselines

SYSTEM (4/5) • FastSum’s legal blog opinion summarization system • Sequence of operation • Pre-processing • tokenization • sentence splitting • boilerplate expression removal (e.g., ‘Response by ...’) • Question analysis • sentiment analysis (tagging) • target analysis (matching) • Sentiment Filter • sentences with proper polarity selected; else, filtered out
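
The pre-processing and sentiment-filter steps above can be sketched roughly as follows. This is a toy illustration, not FastSum's code: the regular expressions, the tiny word lists, and every function name are assumptions made for the example, and a real system would use a trained sentence tagger rather than a lexicon lookup.

```python
import re

# Hypothetical boilerplate pattern, modeled on the slide's 'Response by ...' example.
BOILERPLATE = re.compile(r"^(Response by|Posted by)\b", re.IGNORECASE)

# Toy sentiment lexicons (assumptions for illustration only).
POSITIVE = {"support", "supports", "favor", "favors", "praise", "praises"}
NEGATIVE = {"oppose", "opposes", "against", "criticize", "criticizes"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase word tokenizer."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def polarity(sentence):
    """Label a sentence by counting lexicon hits."""
    tokens = tokenize(sentence)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def filter_by_polarity(text, wanted):
    """Drop boilerplate, then keep only sentences with the wanted polarity."""
    sentences = [s for s in split_sentences(text) if not BOILERPLATE.match(s)]
    return [s for s in sentences if polarity(s) == wanted]
```

In this sketch the wanted polarity would come from the question analysis step, which is not modeled here.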

RELATED WORK (2/2) — TAC, the Text Analysis Conference • a new annual international workshop sponsored by NIST • the National Institute of Standards & Technology • organizers disseminate NLP-type tasks and datasets • participants develop systems that solve the tasks • submit their results to NIST for evaluation • members can also propose new tasks for future workshops • the sentiment summarization pilot task consisted of producing short, coherent sentiment summaries of blog text • our system produced multi-document summaries • Related Conferences: • TREC — the Text Retrieval Conference (started in 1992) • DUC — Document Understanding Conference (from 2001-07) • evaluated many automatic summarization systems during period

SYSTEM (5/5) • FastSum’s legal blog opinion summarization system • Sequence of operation (cont.) • Feature extraction • focus largely on correspondence with terms in query • at different levels of granularity: title, description, document • also harness sentence-based features • length, position • Sentence ranker • trained regression SVM on feature set — goal: summary worthiness • Redundancy removal • basic idea — change relative importance of remaining sentences w.r.t. currently selected sentences
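
One possible reading of the redundancy removal idea above is a greedy loop: pick the top-ranked sentence, then discount the scores of the remaining sentences according to how much they overlap with it. The slide does not give FastSum's actual re-weighting formula, so the overlap measure, the `penalty` parameter, and the function names below are assumptions for illustration; the input scores stand in for the output of the regression SVM.

```python
def content_words(sentence):
    """Lowercased word set with surrounding punctuation stripped."""
    words = {w.strip(".,;:!?\"'()").lower() for w in sentence.split()}
    return words - {""}


def select_summary(scored_sentences, max_sentences=2, penalty=0.5):
    """Greedy selection with redundancy discounting.

    scored_sentences: list of (sentence, score) pairs, where the score
    stands in for the ranker's "summary worthiness" prediction.
    """
    remaining = [(score, s, content_words(s)) for s, score in scored_sentences]
    chosen = []
    while remaining and len(chosen) < max_sentences:
        # Take the currently best-scored sentence.
        remaining.sort(key=lambda item: item[0], reverse=True)
        _, best, best_words = remaining.pop(0)
        chosen.append(best)
        # Lower the score of each remaining sentence in proportion to the
        # share of its words already covered by the sentence just chosen.
        rescored = []
        for score, s, words in remaining:
            overlap = len(words & best_words) / len(words) if words else 0.0
            rescored.append((score * (1 - penalty * overlap), s, words))
        remaining = rescored
    return chosen
```

With this scheme a near-duplicate of an already selected sentence falls behind a lower-scored sentence that adds new content.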

SYSTEM (5/5) • FastSum’s legal blog opinion summarization system • Sequence of operation (cont.) • Feature extraction • topic word frequency (title, description) • content word frequency • document frequency • headline frequency • sentence-based features (length, position) • Sentence ranker • trained regression SVM on feature set — goal: summary worthiness • Redundancy removal • basic idea — change relative importance of remaining sentences w.r.t. currently selected sentences • Sample topic:

<topic> <num> D0703A </num> <title> age discrimination </title> <narr> This expose documents the increasing occurrence of age discrimination in the workplace in Canada ... </narr> </topic>
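
A few of the features listed above can be sketched against the sample topic. The snippet parses the topic with Python's standard `xml.etree.ElementTree` and computes, for one candidate sentence, the share of its tokens that also occur in the topic title and narrative, plus its length and position. The feature names and the exact frequency definition are assumptions for the example; the slide does not specify how FastSum normalizes these counts, and the elided narrative text is left as it appears on the slide.

```python
import re
import xml.etree.ElementTree as ET

# The sample topic from the slide; the "..." elision is kept as-is.
TOPIC_XML = """<topic>
<num> D0703A </num>
<title> age discrimination </title>
<narr> This expose documents the increasing occurrence of age discrimination in the workplace in Canada ... </narr>
</topic>"""


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_topic(xml_text):
    """Return a dict mapping each child tag (num, title, narr) to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def topic_word_frequency(sentence, topic_text):
    """Share of sentence tokens that also occur in the given topic field."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    topic_words = set(tokenize(topic_text))
    return sum(t in topic_words for t in tokens) / len(tokens)


def sentence_features(sentence, position, topic):
    """Hypothetical feature vector for one candidate sentence."""
    return {
        "title_freq": topic_word_frequency(sentence, topic["title"]),
        "narr_freq": topic_word_frequency(sentence, topic["narr"]),
        "length": len(tokenize(sentence)),
        "position": position,
    }
```

Feature dictionaries of this kind would then be handed to the sentence ranker, which is outside the scope of this sketch.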
