Presentation Agenda • Introduction • NSF Project Overview • Current State Of The Art • Our Understanding Of Your Requirements • Design • Implementation / Demo • Progress • Questions?
eRulemaking • CS501 Presentation 1 • The Workgroup
Who We Are • Sam Phillips • MEng in CS • Dan Rassi • Junior in CS • Michael Wang • MEng in CS • Krzysztof Findeisen • Senior in Astro and CS • Raymond McGill • Senior in CIS
Federal Rulemaking • Executive agencies issue over 4,000¹ regulations per year • Preliminary regulations are published daily as Notices of Proposed Rulemaking (NPRMs) • The public can submit feedback on NPRMs • Usually ~100, but up to 500,000, comments per regulation • ¹C. Cardie, C. Farina, & T. Bruce. Using Natural Language Processing to Improve eRulemaking. In Proceedings of the 2006 International Conference on Digital Government Research, San Diego, 2006.
Rules and Comments • Rules tend to be long and address several “issue topics” • Well organized, “written like laws” • Comments vary significantly in type • From individuals in organizations (e.g., the Sierra Club, the NRA) • From professionals (e.g., lawyers, lobbyists, domain experts) • From potential stakeholders (beneficiaries, those potentially harmed) • From the general public • Comments may address anywhere from none to several of the “issue topics”
Putting the “e” in “eRulemaking” • Federal directive to read and consider all comments • Currently, comments are read and sorted by hand • For controversial issues, this is a lot of work! • Natural language processing (NLP) can be used to classify comments • NLP software is “trained” through annotation of a subset of the comments • Ideally, the system can be automated
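As a concrete illustration of the training step, here is a minimal sketch, assuming scikit-learn and invented comments and issue labels; it is not the NLP group’s actual pipeline.

```python
# Sketch: train a multi-label issue classifier on annotated comments.
# Assumes scikit-learn; the comments and issue labels below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

annotated_comments = [
    "The proposed fee schedule is unfair to small carriers.",
    "Please clarify the reporting deadline in section 4.",
]
annotated_issues = [["fees"], ["deadlines", "reporting"]]  # one issue list per comment

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(annotated_issues)

# TF-IDF features + one binary classifier per issue topic.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(annotated_comments, y)

# Suggest issues for an unannotated comment; analysts review before accepting.
suggestion = model.predict(["The deadline in section 4 is too aggressive."])
print(binarizer.inverse_transform(suggestion))
```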
The Project • The Legal Information Institute (LII) is working on automating the sorting process • The proposal is to apply and develop a range of methods from the field of natural language processing (NLP) to create NLP tools that aid agency rule writers in: • organization, analysis, and management of the sometimes overwhelming volume of comments, studies, and other supporting documents associated with a proposed rule; and • analyzing proposed rules to flag possibly relevant mandates from the large number of statutes and Executive Orders that require studies, consultations, or certifications during rulemaking. • C. Cardie, C. Farina, & T. Bruce. Using Natural Language Processing to Improve eRulemaking. In Proceedings of the 2006 International Conference on Digital Government Research, San Diego, 2006.
Related Projects • Carnegie Mellon is working on a set of analysis tools² • Comment statistics • Redundancy • Stakeholder phrases • Correlations between issues • Unknown interest groups • The University of Pittsburgh and the University of Southern California are also working on eRulemaking • ²J. Callan, R. Krishnan, & P. Suen. CMU eRulemaking Project Description. http://erulemaking.cs.cmu.edu/.
Current Analyst Workflow • Analysts receive comments by e-mail • They filter comments for useful statements • They build an issue / comment-summary matrix as they read the comments • Categorize the type of commenter • Organize by section of the regulation • Combine the massive charts, discuss, analyze • If the rule is adopted, analysts publish a statement on how they addressed the comments³ • ³C. Cardie, C. Farina, & T. Bruce. Using Natural Language Processing to Improve eRulemaking. In Proceedings of the 2006 International Conference on Digital Government Research, San Diego, 2006.
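To make the matrix-building step concrete, here is a hedged sketch of an in-memory stand-in for that issue / comment-summary matrix; the issue names, commenter types, and excerpts are invented for illustration, not the agencies’ actual format.

```python
# Sketch: an in-memory stand-in for the analysts' issue / comment-summary matrix.
# Issue names, commenter types, and excerpts are invented for illustration.
from collections import defaultdict

# matrix[issue][commenter_type] -> list of comment summaries
matrix = defaultdict(lambda: defaultdict(list))

def record(issue, commenter_type, summary, reg_section):
    """File one useful statement under its issue, commenter type, and rule section."""
    matrix[issue][commenter_type].append({"summary": summary, "section": reg_section})

record("fees", "stakeholder", "Fee schedule burdens small carriers.", "Sec. 3")
record("deadlines", "lawyer", "Reporting deadline conflicts with statute X.", "Sec. 4")

# Later: combine, discuss, and analyze, e.g. print everything filed under one issue.
for commenter_type, entries in matrix["fees"].items():
    for entry in entries:
        print(commenter_type, entry["section"], entry["summary"])
```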
Current LII Annotator Workflow • Annotators have a set of ~300 comments from the Department of Transportation • Annotators agree a priori on a set of issues • The issue set is relatively large (38 issues) • Annotators associate phrases in each comment with one or more issues (this is “annotating”) • Multiple annotators per comment, for research purposes • Early annotating picks up overlooked issues; Tom Bruce updates the issue set • Annotated comments are delivered to the NLP group
Callisto Demo • Callisto is the software the LII annotators use to annotate comments • Callisto is published by the MITRE Corporation • Although it works, it is not well suited to eRulemaking
Term Dictionary • Rule / Reg.: A proposed rule by a federal agency • Rulemaker / Analyst: A domain expert in the agency • Issue: A logical facet that the Rule impacts • Annotate / Tag (v): To “highlight” text and associate it with a specific issue • Tag (n): The implementation of an annotation as metadata • Flag (n): Non-issue-related metadata (e.g., workflow status)
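One way these terms could be rendered as a data model is sketched below; the class and field names are our illustration under the definitions above, not a finalized schema.

```python
# Sketch: the term dictionary expressed as a data model.
# Class and field names are illustrative; the production schema may differ.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Issue:
    name: str                      # one logical facet that the Rule impacts

@dataclass
class Tag:
    issue: Issue                   # which issue the highlighted text is associated with
    start: int                     # character offset where the highlight begins
    end: int                       # character offset where the highlight ends

@dataclass
class Flag:
    label: str                     # non-issue metadata, e.g. "needs review"

@dataclass
class Comment:
    text: str
    tags: List[Tag] = field(default_factory=list)
    flags: List[Flag] = field(default_factory=list)

@dataclass
class Rule:
    title: str                     # proposed rule by a federal agency
    issues: List[Issue] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)
```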
Requirements • Our understanding of your immediate requirements is: • The system is accessible to any reasonable client system • The system can display several hundred annotated or NLP-processed comments and indicate how each comment is classified • The system must be extensible, so that the LII can continue working towards a production system • The system can display the annotations associated with each comment • The system allows users to add or modify annotations
Requirements • Our understanding of your optional requirements is: • The system can feed comments with changed annotations into the NLP • The system allows users (or a subset thereof) to change the set of issues associated with a regulation (grow/collapse) • The system allows comments to have flags not directly related to issues • The system can handle large numbers of regulations (thousands) and comments per regulation (tens or hundreds of thousands)
Requirements • Our understanding of your long-term requirements is: • The system supports hierarchies of issues • The system blends into the federal department’s workflow • The system must be easy to set up and install
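The issue-hierarchy requirement could be met with something as simple as a parent link on each issue; a minimal sketch under that assumption (names are illustrative only):

```python
# Sketch: issues with an optional parent, so issue sets can form a hierarchy.
# Field and issue names are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Issue:
    name: str
    parent: Optional["Issue"] = None

    def path(self):
        """Return the chain of issue names from the root down to this issue."""
        return (self.parent.path() if self.parent else []) + [self.name]

safety = Issue("safety")
reporting = Issue("incident reporting", parent=safety)
print(reporting.path())   # ['safety', 'incident reporting']
```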
Assumptions • Government agencies work roughly as summarized in the transcripts provided • When government agencies adopt annotation, they will do so similarly to the LII • The LII prefers a solid but feature-sparse prototype to a feature-laden but less extensible version • The LII prefers a system designed for Rulemakers first, with “research” interests as a secondary concern
Design / Implementation • Based on your requirements, we have selected an iterative design process • Several iterations over the whole project • Implement mock-ups to help clarify requirements • Many stakeholders • Full requirements unknown • The UI is underspecified yet very important • Prototype system • Desires may change as practical issues crop up
Design • Our system design is a standard relational database-driven website • Lots of implementation software is available • “Drag and drop” content modules • Minimal retraining of team members • Natural “three-tier” architecture • Front end / middleware / back end can be replaced independently • Simple cross-platform compatibility because of the web interface
Implementation • The website will be written using the Drupal content management system • Designed to produce dynamic websites with minimal administration • The LII was already considering Drupal for another project • The database will run on a MySQL system already present on the LII servers • No installation required • Most content management systems require an SQL-based relational database
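For a rough picture of how the comment and annotation data could be laid out relationally, here is a sketch; sqlite3 stands in locally for the MySQL instance on the LII servers, and all table and column names are illustrative, not the schema Drupal would actually generate.

```python
# Sketch: a possible relational layout for rules, comments, issues, and tags.
# sqlite3 is a local stand-in for MySQL; table/column names are illustrative.
import sqlite3

schema = """
CREATE TABLE rule    (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE issue   (id INTEGER PRIMARY KEY, rule_id INTEGER REFERENCES rule(id), name TEXT);
CREATE TABLE comment (id INTEGER PRIMARY KEY, rule_id INTEGER REFERENCES rule(id), body TEXT);
CREATE TABLE tag     (id INTEGER PRIMARY KEY,
                      comment_id INTEGER REFERENCES comment(id),
                      issue_id   INTEGER REFERENCES issue(id),
                      start_off  INTEGER, end_off INTEGER);
CREATE TABLE flag    (id INTEGER PRIMARY KEY,
                      comment_id INTEGER REFERENCES comment(id), label TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)

# Example query shape: all comments tagged with a given issue.
conn.execute("""SELECT c.body FROM comment c
                JOIN tag t ON t.comment_id = c.id
                WHERE t.issue_id = ?""", (1,))
```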
UI Alpha 0.1 Demo • First Release Alpha Website
Questions? • Any Questions?