REGNET
An Information Infrastructure for Government Regulations
Stanford University
Gloria Lau, Dr. Shawn Kerrigan, Dr. Kincho Law, Dr. Gio Wiederhold
WITS’03, Dec 13th, 2003
Motivation
• Multiple sources of regulations
• E.g. federal, state, local
• Different formats
• Conflicting ideas
Need for a repository
• Locate relevant information
• E.g. small business
Need for analysis tool
• Complexity of regulations
• Multiple sources
• Understanding of regulations & their relationships
Example 1
ADAAG Appendix 4.6.3
… Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries.
CBC 1129B.4.3
… Ramps shall not encroach into any parking space. Exception: 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces …
• CBC allows curb ramps encroaching into accessible parking stall access aisles, while ADA disallows encroachment into any portion of the stall.
Example 2
ADAAG 4.7.2 Slope.
… Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes …
CBC 1127B.5.5 Beveled lip.
The lower end of each curb ramp shall have a ½ inch (13mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments.
• ADAAG focuses on wheelchair traversal; CBC focuses on the visually impaired when using a cane.
Scope
• Repository development
• Shallow parser
• Feature extraction
• Ontology development
• Automated extraction of related provisions
• Feature matching
• Structural matching
• Application to e-rulemaking
• Compliance assistance using a Q&A system
• FOPC logic implementation
• Q&A compliance check
Shallow parser
• Data Source
• Accessibility standards
• US, UK and Scotland
• Drinking water standards in Environmental regulations
• Federal and California
• Current standard: HTML, PDF, hardcopy...
• Our system standard: XML
• Unit of extraction: section
<regElement name="ufas.4.32.1" title="minimum number" asterisk="0">
  <regText> Fixed or built-in seating, ... </regText>
  <ref name="ufas.4.5" num="1" />
  <ref name="ufas.4.32" num="1" />
</regElement>
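A minimal sketch of how a downstream tool might read one extracted section, using the example element from this slide (typed with straight quotes so it is well-formed XML). The function name `read_reg_element` is hypothetical, and reading `num` as a citation count is an assumption; the slides do not define the attribute.

```python
import xml.etree.ElementTree as ET

# The slide's example element, with straight quotes so it parses as XML.
SAMPLE = """
<regElement name="ufas.4.32.1" title="minimum number" asterisk="0">
  <regText> Fixed or built-in seating, ... </regText>
  <ref name="ufas.4.5" num="1" />
  <ref name="ufas.4.32" num="1" />
</regElement>
"""

def read_reg_element(xml_text):
    """Return the section id, title, text, and outgoing references."""
    elem = ET.fromstring(xml_text.strip())
    return {
        "name": elem.get("name"),
        "title": elem.get("title"),
        "text": (elem.findtext("regText") or "").strip(),
        # Assumption: "num" counts how often the section cites the target.
        "refs": {r.get("name"): int(r.get("num", "1")) for r in elem.findall("ref")},
    }

section = read_reg_element(SAMPLE)
print(section["name"], section["refs"])
```

Keeping references as a name-to-count mapping makes it easy to build a cross-reference graph over sections later.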
Automated Translation to Hierarchical Structure
PART 279—Standards For The Management Of Used Oil
Subpart B – Applicability
…
§ 279.12 Prohibitions.
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
(b) Use as a dust suppressant. The use of used oil as a dust suppressant is prohibited, except when such activity takes place in one of the states listed in § 279.82(c).
(c) Burning in particular units. Off-specification used oil fuel may be burned for energy recovery in only the following devices:
(1) Industrial furnaces identified in § 260.10 of this chapter;
(2) Boilers, as defined in § 260.10 of this chapter, that are identified as follows:
(i) Industrial boilers located on the site of a facility engaged in a manufacturing process where substances are transformed into new products, including the component parts of products, by mechanical or chemical processes;
….
[Diagram: the text above translated into a tree with "contains" edges. Levels, top to bottom: 40 CFR 279; Subpart A, Subpart B, … Subpart I; Section 279.10, Section 279.11, Section 279.12; Subsection (a), Subsection (b), Subsection (c). Example node: (a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units …]
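The translation to a hierarchy can be sketched as a stack-based pass over paragraph markers. This is an illustrative sketch, not the parser described in the talk: the names `marker_level` and `build_tree` are hypothetical, the depth rule (a) → (1) → (i) is an assumed marker convention, and ambiguous markers such as "(i)" are read as roman numerals only when a numbered paragraph is already open.

```python
import re

MARKER = re.compile(r"^\((\w+)\)\s*(.*)$")

def marker_level(marker, current_depth):
    """Guess nesting depth from the marker: (a) -> 1, (1) -> 2, (i) -> 3."""
    if marker.isdigit():
        return 2
    # "(i)", "(v)", "(x)" are ambiguous; treat them as roman numerals
    # only when a numbered paragraph is already open.
    if re.fullmatch(r"[ivx]+", marker) and current_depth >= 2:
        return 3
    return 1

def build_tree(section_id, lines):
    root = {"id": section_id, "text": "", "children": []}
    stack = [root]  # stack[d] is the currently open node at depth d
    for line in lines:
        m = MARKER.match(line.strip())
        if not m:
            continue  # sketch only: continuation lines are ignored
        marker, text = m.groups()
        level = marker_level(marker, len(stack) - 1)
        del stack[level:]  # close nodes at this depth or deeper
        parent = stack[-1]
        node = {"id": f"{parent['id']}.{marker}", "text": text, "children": []}
        parent["children"].append(node)
        stack.append(node)
    return root

lines = [
    "(a) Surface impoundment prohibition.",
    "(b) Use as a dust suppressant.",
    "(c) Burning in particular units.",
    "(1) Industrial furnaces identified in § 260.10 of this chapter;",
    "(2) Boilers, as defined in § 260.10 of this chapter;",
    "(i) Industrial boilers located on the site of a facility",
]
tree = build_tree("40.cfr.279.12", lines)
print([child["id"] for child in tree["children"]])
```

The dotted ids follow the same pattern as the `40.cfr.141.11.b` identifier used elsewhere in the slides.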
Feature extraction
• Generic features
• Concepts
• Exceptions
• Definitions
• Domain-specific features
• Glossary terms
• Author-prescribed indices
• Effective dates
• Measurements
• Chemicals, e.g., drinking water contaminants
XML regulation with features added
Original section 141.11.b from the 40 CFR
§ 141.11 Maximum contaminant levels for inorganic chemicals.
(a) The maximum contaminant level for arsenic applies only to community water systems ...
(b) The maximum contaminant level for arsenic is 0.05 milligrams per liter for community water systems until January 23, 2006.
Refined section 141.11.b in XML format
<regElement id="40.cfr.141.11.b" name="">
  <dwc name="arsen" times="1" />
  <concept name="commun water system" times="1" />
  <measurement unit="ppm" size="0.05" quantifier="max" />
  <date to="January 23, 2006" />
  ...
  <regText> The maximum contaminant level for arsenic is 0.05 milligrams per liter for community water systems until January 23, 2006. </regText>
</regElement>
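A rough sketch of how the measurement, date, and chemical tags might be pulled from section text with regular expressions. Everything here is an assumption for illustration: the patterns, the tiny `CONTAMINANTS` list, and the function name `extract_features` are hypothetical, and the sketch does not stem terms (the slide's tags use stemmed forms such as "arsen") or convert units.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Hypothetical patterns; a real extractor would cover many more units.
MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)\s*(milligrams per liter|inch(?:es)?|mm)\b")
UNTIL_DATE = re.compile(rf"\buntil ((?:{MONTHS}) \d{{1,2}}, \d{{4}})")
CONTAMINANTS = {"arsenic", "lead", "nitrate"}  # hypothetical glossary

def extract_features(text):
    """Return measurements, end dates, and known chemicals found in text."""
    lowered = text.lower()
    return {
        "measurements": [(float(size), unit) for size, unit in MEASUREMENT.findall(text)],
        "date_to": UNTIL_DATE.findall(text),
        "chemicals": sorted(c for c in CONTAMINANTS if re.search(rf"\b{c}\b", lowered)),
    }

TEXT = ("The maximum contaminant level for arsenic is 0.05 milligrams per liter "
        "for community water systems until January 23, 2006.")
features = extract_features(TEXT)
print(features)
```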
Similarity Score computation
• Feature matching
• $f_0 = \frac{\sum_{i \in \text{features}} f_i}{\#\,\text{features}}$
• Features
• Concept & index match
• tf × idf vector
• tf = term frequency
• idf = inverse document frequency = $\log(n / n_i)$
• Chemical match
• Measurement match
• Exception match
• Effective date match
• Glossary/definition term match
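The base score can be sketched as follows. The idf formula log(n/n_i) and the averaging over features come from this slide; comparing tf-idf vectors by cosine similarity is an assumption (the slide only says "vector"), and the function names and toy token lists are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: dict of id -> token list. Uses idf = log(n / n_i)."""
    n = len(docs)
    # n_i: number of documents that contain term i
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    return {
        doc_id: {t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
        for doc_id, tokens in docs.items()
    }

def cosine(u, v):
    """Cosine similarity of two sparse vectors (assumed comparison measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def base_score(feature_scores):
    """f_0: plain average of the per-feature match scores."""
    return sum(feature_scores.values()) / len(feature_scores)

docs = {
    "a": ["door", "handle", "grip"],
    "b": ["door", "handle", "lock"],
    "c": ["water", "arsenic", "level"],
}
vecs = tfidf_vectors(docs)
concept_ab = cosine(vecs["a"], vecs["b"])
concept_ac = cosine(vecs["a"], vecs["c"])
f0_ab = base_score({"concept": concept_ab, "measurement": 0.0})
print(concept_ab, concept_ac, f0_ab)
```

A term that appears in every document gets idf 0 and drops out of the comparison, which is the usual behavior of this idf form.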
Score refinements
• Near-tree neighbors
• Self vs. parent-sibling-child (psc), $f_{\text{s-psc}}$
• psc vs. psc, $f_{\text{psc-psc}}$
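One way the near-tree neighbor idea could work, as a sketch under stated assumptions: the slide names the two comparisons (self vs. psc, psc vs. psc) but not how they are combined, so the linear blend, the `weight` parameter, and the function names below are all hypothetical.

```python
from itertools import product
from statistics import mean

def psc(tree, node):
    """Parent, siblings, and children of node. tree: dict of node -> children."""
    parent = next((p for p, kids in tree.items() if node in kids), None)
    siblings = [k for k in tree.get(parent, []) if k != node] if parent is not None else []
    return ([parent] if parent is not None else []) + siblings + tree.get(node, [])

def refine_with_neighbors(a, u, f0, tree_a, tree_u, weight=0.5):
    """Blend f0(a, u) with the mean base score over neighbor comparisons."""
    psc_a, psc_u = psc(tree_a, a), psc(tree_u, u)
    # self vs. psc: each node against the other node's neighbors
    s_psc = [f0(a, nu) for nu in psc_u] + [f0(na, u) for na in psc_a]
    # psc vs. psc: neighbors against neighbors
    psc_psc = [f0(na, nu) for na, nu in product(psc_a, psc_u)]
    neighbor_scores = s_psc + psc_psc
    if not neighbor_scores:
        return f0(a, u)
    return (1 - weight) * f0(a, u) + weight * mean(neighbor_scores)

# Toy data: two small regulation trees and a table of base scores.
tree_a = {"A4": ["A4.1", "A4.2"]}
tree_u = {"U7": ["U7.1", "U7.2"]}
scores = {("A4.1", "U7.1"): 0.1, ("A4", "U7"): 0.8, ("A4.2", "U7.2"): 0.8}
f0 = lambda x, y: scores.get((x, y), 0.0)
refined = refine_with_neighbors("A4.1", "U7.1", f0, tree_a, tree_u)
print(refined)
```

In the toy data the pair starts with a low base score, and similar parents and siblings pull the refined score upward.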
Score refinements
• Reference distribution, $f_{\text{rd}}$
• Not-so-immediate neighbor effect on score
• E.g. f(A5.3, U6.4(a)) updates f(A2.1, U3.3)
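A sketch of the reference-distribution idea using the slide's own section labels. It assumes that A2.1 references A5.3 and U3.3 references U6.4(a), and that the update is a weighted blend; neither the direction of the references nor the update rule is stated on the slide, and `refine_with_references` and `weight` are hypothetical names.

```python
def refine_with_references(a, u, scores, refs_a, refs_u, weight=0.3):
    """Blend the score of (a, u) with the mean score of the sections they reference."""
    base = scores.get((a, u), 0.0)
    pairs = [(ra, ru) for ra in refs_a.get(a, []) for ru in refs_u.get(u, [])]
    if not pairs:
        return base  # no outgoing references on one side: nothing to distribute
    ref_score = sum(scores.get(p, 0.0) for p in pairs) / len(pairs)
    return (1 - weight) * base + weight * ref_score

# Assumed reference links and toy scores for illustration.
refs_a = {"A2.1": ["A5.3"]}
refs_u = {"U3.3": ["U6.4(a)"]}
scores = {("A2.1", "U3.3"): 0.4, ("A5.3", "U6.4(a)"): 0.9}
updated = refine_with_references("A2.1", "U3.3", scores, refs_a, refs_u)
print(updated)
```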
Preliminary results: UFAS vs BS8300
• Phrasing difference between American and British regulations
ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy …
bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip …
• Neighbor similarities imply similarity between the nodes of interest
Preliminary results: e-rulemaking
• Application domain: e-rulemaking
• Comparison between draft of rules and the associated public comments
• ADAAG Chapter 11, rights-of-way draft
• Less than 15 pages
• Over 1400 public comments received within 4 months
• Comments ~10 MB in size; most are several pages long
A new regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed
Preliminary results: e-rulemaking
• Related draft section and public comment
Adaag.1105.4.1 Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians …
Deborah Wood, October 29, 2002 … This often means walk lights that are so short in duration that by the time a person who is blind realizes …
• No identified related section
Donna Ring, September 6, 2002 If you become blind, no amount of electronics … will make you safe … You have to learn modern blindness skills from a good teacher. You have to practice your new skills …
Concern not addressed in the draft
Conclusions
• An infrastructure for
• Repository development
• Shallow parser
• Feature extraction
• Ontology development
• Automated extraction of related provisions
• Feature matching
• Structural matching
• Application to e-rulemaking
• Compliance assistance using a Q&A system
• FOPC logic implementation
• Q&A compliance check
• Future Directions
• Application on other semi-structured documents
• Inconsistency identification