
Presentation Transcript


  1. An eRulemaking Corpus: Identifying Substantive Issues in Public Comments. Claire Cardie (CS+IS), Cynthia Farina (Law), Matt Rawding (IS), Adil Aijaz (CS). CeRI (Cornell eRulemaking Initiative), Cornell University

  2. Plan for the Talk • Background • E-rulemaking • CeRI FTA Grant Circulars Corpus • Text Categorization Experiments

  3. RulemakingE-Rulemaking Rulemaking: one of the principal methods of making regulatory policy in the US ~4000+ per year “notice and comment” rulemaking: formal public participation phase 10 – 500,000 comments per rule comment length: 1 sentence – 10’s of pages agency legally bound to respond to all substantive issues E-rulemaking = e-notice and e-comment

  4. Current Agency Practice

  5. Goals of Our Current Work Determine the degree to which automatic issue categorization can facilitate analysis of comments by identifying and categorizing “relevant issues”. Framed as a text categorization task: Given a comment set, the automated system should determine, for each sentence in each comment, which of a group of pre-defined issue categories it raises, if any. Builds on the work of Kwon & Hovy (2007) and Kwon et al. (2006)
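
To make the framing concrete, here is a minimal sketch of sentence-level, multi-label issue categorization, assuming an off-the-shelf tf.idf + linear SVM setup (the feature/classifier combination the later results slides refer to). The sentences, issue names, and data are illustrative, not drawn from the FTA corpus or the authors' system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical comment sentences, each paired with the (possibly empty)
# set of issue categories it raises. All names here are made up.
sentences = [
    "Funding for rural paratransit should not be cut.",
    "The eligibility rules for low-income riders are unclear.",
    "Thank you for the opportunity to comment.",  # raises no issue
]
issue_sets = [{"funding"}, {"eligibility", "clarity"}, set()]

binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(issue_sets)   # one binary column per issue

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sentences)   # tf.idf features per sentence

# One binary SVM per issue category; an all-zero prediction row
# corresponds to "no issue raised" (the "NONE" case).
model = OneVsRestClassifier(LinearSVC()).fit(X, Y)
print(binarizer.inverse_transform(model.predict(X)))
```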

  6. Plan for the Talk • Background • CeRI FTA Grant Circulars Corpus • Difficulties • Interannotator agreement results • Text Categorization Experiments

  7. FTA Grant Circulars Rule • Topic: guidance to public and private transportation providers applying for federal aid for elderly, disabled, and low-income persons • 267 comments • shortest: 1 sentence • longest: 1,420 sentences • 11,094 sentences total

  8. FTA Grant Circulars Issue Set • 17 top-level issues • 39 fine-grained issues

  9. Kwon & Hovy (2007) vs.

  10. Difficulties for Text Categorization • Large, hierarchical issue set

  11. FTA Grant Circulars Issue Set • 17 top-level issues • 39 fine-grained issues

  12. Difficulties for Text Categorization • Large, hierarchical issue set • “NONE” category • Skewed distribution across issues: 87% of the sentences come from 6 categories; 13% come from the other 33 categories • Potentially multiple issues per sentence • Even long sentences contain few words, so feature vectors are sparse • Variation in comment quality, scope, vocabulary, and form

  13. The Annotators

  14. Interannotator Agreement • 146 comments used for the study • 6 annotators • 2.66 annotators per comment on average • 41.5 sentences per comment on average • Overlap agreement measure (see the sketch below)
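
The overlap agreement measure is named but not defined in this transcript. A common instantiation for multi-label annotation, sketched below for a pair of annotators, scores each sentence by the intersection-over-union of the two issue sets and averages over sentences; treat this reading, its handling of empty label sets, and its extension to 2.66 annotators per comment as assumptions rather than the paper's exact definition.

```python
def overlap_agreement(ann_a, ann_b):
    """Average per-sentence overlap between two annotators.

    ann_a, ann_b: one set of issue labels per sentence. The
    intersection-over-union reading below is an assumption; the
    paper's exact definition may differ.
    """
    scores = []
    for a, b in zip(ann_a, ann_b):
        if not a and not b:                # both marked "NONE"
            scores.append(1.0)
        else:
            scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores)

# Perfect match, partial match, and agreement on "NONE":
print(overlap_agreement(
    [{"funding"}, {"eligibility", "clarity"}, set()],
    [{"funding"}, {"eligibility"}, set()],
))  # (1.0 + 0.5 + 1.0) / 3 = 0.83...
```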

  15. Plan for the Talk • Background • E-rulemaking • Public comment analysis • CeRI FTA Grant Circulars Corpus • Difficulties • Interannotator agreement results • Text Categorization Experiments

  16. Standard Text Categorization Algorithms • Fine-grained issues (39) • Coarse-grained issues (17)

  17. Cascaded Categorization

  18. Cascaded Categorization

  19. Cascaded Categorization
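
The cascade itself appears only as figures in the original slides. A plausible reading, given the 17 top-level / 39 fine-grained issue hierarchy, is a two-stage pipeline: classify each sentence at the top level first, then route it to a second-stage classifier trained only on that issue's fine-grained children. The sketch below is an illustrative single-label version of that pattern with a made-up hierarchy and data, not the authors' implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical two-level hierarchy: top-level issue -> fine-grained issues.
hierarchy = {
    "funding":     ["funding.match", "funding.allocation"],
    "eligibility": ["eligibility.income", "eligibility.disability"],
}

# Toy training data: (sentence, fine-grained label); the top-level label
# is the prefix before the dot.
train = [
    ("The local match requirement is too high.", "funding.match"),
    ("Allocate more funds to rural providers.", "funding.allocation"),
    ("Income thresholds exclude many riders.", "eligibility.income"),
    ("Disability documentation is burdensome.", "eligibility.disability"),
]
texts, fine = zip(*train)
coarse = [label.split(".")[0] for label in fine]

# Stage 1: one coarse classifier over all sentences.
coarse_clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(texts, coarse)

# Stage 2: one fine-grained classifier per top-level issue, trained only
# on the sentences labeled with that issue.
fine_clfs = {}
for top in hierarchy:
    idx = [i for i, c in enumerate(coarse) if c == top]
    fine_clfs[top] = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(
        [texts[i] for i in idx], [fine[i] for i in idx]
    )

def cascade_predict(sentence):
    top = coarse_clf.predict([sentence])[0]       # stage 1: coarse issue
    return fine_clfs[top].predict([sentence])[0]  # stage 2: restricted label set

print(cascade_predict("Rural providers need a larger funding allocation."))
```

Restricting stage 2 to one branch of the hierarchy shrinks each classifier's label space, which is one motivation for cascading instead of flat 39-way classification.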

  20. Gold Standard Data Set • Simulate the agency comment analysis process: one analyst per rule • Six data sets: one data set per annotator

  21. SVM Results with tf.idf Features
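
The results table is not reproduced in this transcript. For reference, the sketch below shows how per-issue precision, recall, and F1 of the kind summarized on this and the next slide are typically computed for a multi-label run; the indicator matrices are illustrative, not the paper's numbers.

```python
from sklearn.metrics import classification_report

# Rows = sentences, columns = issue categories (binary indicators).
issues = ["funding", "eligibility", "clarity"]
y_true = [[1, 0, 0],
          [0, 1, 1],
          [0, 0, 0]]
y_pred = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

# Per-issue precision/recall/F1 plus micro and macro averages.
print(classification_report(y_true, y_pred, target_names=issues, zero_division=0))
```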

  22. Best-Performing Fine-Grained Issues (Annotator 1)

  23. Progress and Plans • Promising initial results for rule-specific issue categorization of public comments • Annotate comments for more rules • Expert (rulewriter) vs. law student annotation • Integrate automatic text categorization into the annotation interface • Active learning (Purpura, Cardie & Simons, dg.o 2008) • Collaboration with HCI colleagues in InfoSci

  24. The End • For more on: • the hierarchical text categorization method: Cardie et al. (dg.o 2008) • a new structural learning approach for hierarchical classification: Purpura et al. (in preparation) • active learning methods for hierarchical text categorization: Purpura, Cardie & Simons (dg.o 2008)

  25. Minimizing the Costliest Errors* (*Underinclusive errors are the most costly)
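
An underinclusive error misses an issue that a sentence actually raises; because the agency must respond to every substantive issue, these false negatives cost more than false positives. One generic way to encode that asymmetry, not necessarily the authors' method, is to up-weight the positive class or lower the per-issue SVM decision threshold so the classifier errs toward recall:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy binary task: does a sentence raise the (hypothetical) "funding" issue?
texts = [
    "Cut the local match requirement.",
    "Funding levels are inadequate.",
    "Thanks for holding this hearing.",
    "Please extend the comment period.",
]
y = [1, 1, 0, 0]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# Option 1: penalize missed positives more heavily (the 5.0 is illustrative).
clf = LinearSVC(class_weight={0: 1.0, 1: 5.0}).fit(X, y)

# Option 2: keep the model but accept positives below the default 0 threshold.
scores = clf.decision_function(vec.transform(["More funding is needed."]))
threshold = -0.2  # assumed value; in practice tuned on held-out data
print(scores >= threshold)
```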
