REGNET: An E-Government Infrastructure for Regulation Parsing and Relatedness Analysis
Gloria Lau, Dr. Shawn Kerrigan, Prof. Kincho Law, Prof. Gio Wiederhold
http://eig.stanford.edu/regnet
Contact: glau@stanford.edu, http://eig.stanford.edu/glau
Motivation
• Multiple sources of regulations
• Multiple jurisdictions: federal, state, local, etc.
• Different formats, terminologies, contexts
• Amending rules, conflicting ideas
[Figures: ADAAG in HTML; UK DDA in HTML; IBC in PDF]
Need for a repository
• Locate relevant information
• E.g., small business: penalty fees for violations
Need for an analysis tool
• Complexity of regulations
• Multiple jurisdictions
• Understanding of regulations and their relationships
Example 1: Related Provisions
ADAAG Appendix 4.6.3
… Such a curb ramp opening must be located within the access aisle boundaries, not within the parking space boundaries.
CBC 1129B.4.3
… Ramps shall not encroach into any parking space. Exception: 1. Ramps located at the front of accessible parking spaces may encroach into the length of such spaces …
• CBC allows curb ramps encroaching into accessible parking stall access aisles, while ADA disallows encroachment into any portion of the stall.
Example 2: Related but Conflicting Provisions
ADAAG 4.7.2 Slope.
… Transitions from ramps to walks, gutters, or streets shall be flush and free of abrupt changes …
CBC 1127B.5.5 Beveled lip.
The lower end of each curb ramp shall have a ½ inch (13 mm) lip beveled at 45 degrees as a detectable way-finding edge for persons with visual impairments.
• ADAAG focuses on wheelchair traversal; CBC focuses on the visually impaired when using a cane.
Scope
1. Overview
  • Examples of system capabilities
2. Repository development
3. Relatedness analysis
Overview of System Capabilities: Parsing Original 40CFR 40CFR natural structure
Overview of System Capabilities: Parsing
IBC in 2-column PDF, parsed into an XML hierarchy:
<regulation id="ibc" name="international building code" type="private">
  <regElement id="ibc.1107" name="special occupancies">
    …
    <regElement id="ibc.1107.2" name="assembly area seating">
      <reference id="ibc.1107.2.4.1" times="1" />
      <concept name="assembl area" times="1" />
      …
      <regText>Assembly areas with fixed seating shall comply with Sections … </regText>
      <regElement id="ibc.1107.2.1" name="services"> ... </regElement>
    </regElement>
  </regElement>
</regulation>
Overview of System Capabilities: Feature Parsing Extracted features Usages of features
Overview of System Capabilities: Comparisons Regulation comparison: 40CFR vs. 22CCR
Overview of System Capabilities: E-rulemaking Drafted regulations compared with public comments
Shallow parser
• Data source
  • Americans with Disabilities Act Accessibility Guidelines (ADAAG), Uniform Federal Accessibility Standards (UFAS), Code of Federal Regulations Title 40 (40CFR), UK and Scottish Disability Discrimination Act, etc.
• Current standard: HTML, PDF, hardcopy...
• Our system standard: XML
• Unit of extraction: section/provision
<regElement id="ufas.4.32.1" name="minimum number" asterisk="0">
  <regText> Fixed or built-in seating, ... </regText>
  <ref name="ufas.4.5" num="1" />
  <ref name="ufas.4.32" num="1" />
</regElement>
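The unit of extraction above can be sketched in a few lines of Python using the standard `xml.etree.ElementTree` module. This is only an illustration, not the REGNET parser: the helper name `build_reg_element`, the regular expression for dotted section numbers, and the sample sentence are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical pattern: dotted section numbers such as 4.5 or 4.32.1.
SECTION_REF = re.compile(r"\b\d+(?:\.\d+)+\b")

def build_reg_element(reg, number, name, text):
    """Wrap one provision in a regElement and count its cross-references."""
    elem = ET.Element("regElement", id=f"{reg}.{number}", name=name)
    ET.SubElement(elem, "regText").text = text
    counts = Counter(SECTION_REF.findall(text))
    counts.pop(number, None)  # ignore a mention of the section's own number
    for target, num in sorted(counts.items()):
        ET.SubElement(elem, "ref", name=f"{reg}.{target}", num=str(num))
    return elem

# Invented sample text, used only to exercise the helper.
elem = build_reg_element(
    "ufas", "4.32.1", "minimum number",
    "Seating shall comply with 4.32 and with the floor surface rules in 4.5.",
)
print(ET.tostring(elem, encoding="unicode"))
```

A real parser would also have to resolve references such as "subpart O of parts 264 or 265", which a single pattern like this does not cover.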
Shallow parser: PDF
40cfr.279.12
(a) Surface impoundment prohibition. Used oil shall not be managed in surface impoundments or waste piles unless the units are subject to regulation under parts 264 or 265 of this chapter.
Basic XML format
Shallow parser: HTML
Basic XML format
<regulation id="40.cfr" name="code of federal regulations" type="federal">
  ...
  <regElement id="40.cfr.279.12.c" name="Burning in particular units.">
    ...
    <regElement id="40.cfr.279.12.c.3" name="">
      <reference id="40.cfr.264.O" times="1" />
      ...
      <concept name="waste incinerator" times="1" />
      <regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>
    </regElement>
  </regElement>
</regulation>
Shallow parser: extracting references
<regulation id="40.cfr" name="code of federal regulations" type="federal">
  ...
  <regElement id="40.cfr.279.12.c" name="Burning in particular units">
    ...
    <regElement id="40.cfr.279.12.c.3" name="">
      <reference id="40.cfr.264.O" times="1" />
      ...
      <concept name="waste incinerator" times="1" />
      <regText> Hazardous waste incinerators subject to regulation under subpart O of parts 264 or 265 of this chapter. </regText>
    </regElement>
  </regElement>
</regulation>
Shallow parser: feature extraction
• Non-structural characteristics specific to a corpus
• To aid user retrieval of relevant materials
• For analysis purposes
Shallow parser: feature extraction
• Generic features
  • Concepts - noun phrases
  • Exceptions - negated provisions
  • Definitions - terminologies defined in regulations
• Domain-specific features
  • Glossary terms - definitions from reference guides
  • Author-prescribed indices - concepts from field handbooks
  • Measurements - e.g., 2 inches max, 4 ppm
  • Chemicals - list of drinking water contaminants from EPA
  • Effective dates - provision updates
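As one illustration of a domain-specific feature, measurements such as "2 inches max" or "4 ppm" could be pulled out with a regular expression. The sketch below is a guess at how such an extractor might look, not the method used in REGNET: the unit table, the pattern, and the function name `extract_measurements` are hypothetical, and the output keys simply mirror the `unit`, `magnitude` and `quantifier` attributes of the `measurement` tag.

```python
import re

# Hypothetical unit table; a real corpus would need a much longer one.
UNITS = {"in": "inch", "inch": "inch", "inches": "inch",
         "mm": "mm", "ppm": "ppm", "ft": "foot"}

MEASUREMENT = re.compile(
    r"(?:(?P<pre>at least|at most)\s+)?"            # leading quantifier
    r"(?P<mag>\d+(?:\.\d+)?)\s*"                    # magnitude
    r"(?P<unit>inches|inch|in|mm|ppm|ft)\b"         # unit, longest first
    r"(?:\s+(?P<post>max|min)(?:imum)?\b)?",        # trailing quantifier
    re.IGNORECASE,
)

def extract_measurements(text):
    """Return one dict per measurement found in the text."""
    found = []
    for m in MEASUREMENT.finditer(text):
        pre = (m.group("pre") or "").lower()
        post = (m.group("post") or "").lower()
        quantifier = {"at least": "min", "at most": "max"}.get(pre, post)
        found.append({
            "unit": UNITS[m.group("unit").lower()],
            "magnitude": m.group("mag"),
            "quantifier": quantifier,
        })
    return found

print(extract_measurements("at least 96 in (2440 mm) wide"))
print(extract_measurements("2 inches max, 4 ppm"))
```

The pattern is deliberately naive: it would also match the preposition "in" after a number, so a production extractor needs more context than this.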
Example of definition/glossary tags
Original section 3.5 from the ADAAG
3.5 DEFINITIONS.
Accessible. Describes a site, building, facility, or portion thereof …
Clear. Unobstructed.
Refined section 3.5 in XML format
<regElement name="adaag.3.5" title="definitions" asterisk="0">
  <indexTerm name="facility" num="1" />
  <definition>
    <term> accessible </term>
    <definedAs> Describes a site, building, facility, or portion thereof... </definedAs>
  </definition>
  <definition>
    <term> clear </term>
    <definedAs> Unobstructed. </definedAs>
  </definition>
</regElement>
Example of indexTerm, concept, measurement & exception tags
Original section 4.6.3 from the UFAS
4.6.3* PARKING SPACES. Parking spaces for disabled people shall be at least 96 in (2440 mm) wide and shall have an adjacent access aisle 60 in (1525 mm) wide minimum (see Fig. 9). Parking access aisles shall be part of ... EXCEPTION: … an adjacent access aisle at least 96 in (2440 mm) wide complying with 4.5...
Refined section 4.6.3 in XML format
<regElement name="ufas.4.6.3" title="parking spaces" asterisk="1">
  <concept name="access aisle" num="3" />
  …
  <indexTerm name="accessible circulation route" num="1" />
  <measurement unit="inch" magnitude="96" quantifier="min" />
  <ref name="ufas.4.5" num="1" />
  <regText> Parking spaces for disabled people shall ... </regText>
  <exception> If accessible parking spaces for ... </exception>
</regElement>
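Once a provision is tagged this way, its features can be read back as frequency tables for later comparison. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming the `name` and `num` attributes shown above; the function name `feature_counts` is hypothetical and the sample XML is an abridged copy of the refined section.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the refined UFAS 4.6.3 element, elisions removed.
SECTION = """
<regElement name="ufas.4.6.3" title="parking spaces" asterisk="1">
  <concept name="access aisle" num="3" />
  <indexTerm name="accessible circulation route" num="1" />
  <measurement unit="inch" magnitude="96" quantifier="min" />
  <ref name="ufas.4.5" num="1" />
  <regText>Parking spaces for disabled people shall ...</regText>
</regElement>
"""

def feature_counts(xml_text, tag):
    """Map each feature name under the given tag to its frequency."""
    root = ET.fromstring(xml_text.strip())
    return {e.get("name"): int(e.get("num")) for e in root.iter(tag)}

print(feature_counts(SECTION, "concept"))
print(feature_counts(SECTION, "ref"))
```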
Usages of extracted features revisited Extracted features Usages of features
Relatedness analysis
• Use the structure and referencing of regulations, together with domain knowledge, to obtain a better comparison
• Measure
  • Similarity score $f(A, U) \in (0, 1)$
  • Nodes A and U are provisions from two different regulation trees
Base score f0 computation
• Linear combination of feature matching: $f_0(A, U) = \sum_{i=1}^{N} \alpha_i \, F(A, U, i)$
  • $F(A, U, i)$ = similarity score between Sections (A, U) based on feature i
  • N = total number of features
  • $\alpha_i$ = weight of feature i in the combination
• Feature matching
  • Based on the vector model, using cosine similarity as the distance between feature vectors
  • Similarity between two documents M and N = $\frac{d_M \cdot d_N}{\|d_M\| \, \|d_N\|}$
  • $d_M$ and $d_N$ are document vectors
  • Cosine is normalized, so for non-negative weights it is always between 0 and 1
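The base score can be sketched as a weighted sum of per-feature cosines. The slide only says "linear combination", so the weights and their values here are assumptions, as are the function names and the toy feature vectors; this is an illustration of the idea rather than the REGNET implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_u * norm_v)

def base_score(features_a, features_u, weights):
    """Weighted sum of per-feature cosines; weights are assumed to sum to 1."""
    return sum(
        w * cosine(features_a.get(name, {}), features_u.get(name, {}))
        for name, w in weights.items()
    )

# Toy provisions: each feature maps to a frequency vector (invented values).
a = {"concept": {"access aisle": 3, "parking space": 2}, "ref": {"ufas.4.5": 1}}
u = {"concept": {"access aisle": 1, "curb ramp": 1}, "ref": {"ufas.4.5": 1}}
score = base_score(a, u, {"concept": 0.5, "ref": 0.5})
print(round(score, 3))
```

Because every frequency is non-negative and the assumed weights sum to 1, the combined score stays within the same 0 to 1 range as the individual cosines.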
Example of feature vectors
• Traditional term match
  • Each index term i is assigned a positive, non-binary weight $w_{i,M}$ in each document vector $d_M$
  • Weight selection
    • Frequency of term, or
    • tf × idf model
    • tf = term frequency; term density
    • idf = inverse document frequency = $\log(n/n_i)$; term rarity
  • Excluding stopwords
• Feature = concept
  • Concept vectors are formed per provision based on concept frequency in each provision
  • F(provision 1, provision 2, feature = concept) = cosine between the two concept vectors
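The tf × idf weighting can be sketched as follows, taking tf as the raw term count and idf as log(n/n_i), with n the number of provisions and n_i the number of provisions containing term i. The function name `tfidf_vectors` and the toy corpus are made up for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs maps a provision id to its list of terms (stopwords already removed)."""
    n = len(docs)
    # n_i: number of provisions in which each term appears at least once.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        vectors[doc_id] = {t: f * math.log(n / df[t]) for t, f in tf.items()}
    return vectors

# Invented three-provision corpus.
docs = {
    "a": ["ramp", "ramp", "aisle", "access"],
    "b": ["aisle", "door", "access"],
    "c": ["door", "access"],
}
vectors = tfidf_vectors(docs)
print(vectors["a"])
```

A term that occurs in every provision, like "access" here, gets weight log(1) = 0, which is how the idf factor rewards term rarity.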
Axis dependency: non-Boolean matching
• Vector model assumes mutual independence between axes
• Domain experts do not necessarily agree
  • A measurement of “2 inches max” can be a 70% match to “2 inches”
  • Synonyms exist, e.g., ontology defined for chemicals
• Limitation observed
  • Need flexibility to model domain knowledge, such as a 0%, 50%, 75% or 100% measurement match
Proposed non-Boolean matching model
• Define a feature matching matrix E
  • $E_{ij}$ = % match between features i and j
  • E.g., a 3-dimensional vector space using “2 ppm”, “2 ppm max” and “2 ft” as the first, second and third measurement axes gives a 3×3 matrix E
• Vector space transformation
  • Map feature vectors onto an alternate space via matrix D
  • Cosines are computed on the consolidated frequency vectors
  • E.g., similarity based on measurements = $\frac{d_M^T E \, d_N}{\sqrt{d_M^T E \, d_M} \, \sqrt{d_N^T E \, d_N}}$, where $d_M$ and $d_N$ are the measurement frequency vectors and $E = D^T D$
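Vector space transformation
• Define D such that $E = D^T D$ is fulfilled
• Cosine between the consolidated frequency vectors $D d_M$ and $D d_N$:
  $\cos(D d_M, D d_N) = \frac{(D d_M)^T (D d_N)}{\|D d_M\| \, \|D d_N\|} = \frac{d_M^T D^T D \, d_N}{\sqrt{d_M^T D^T D \, d_M} \, \sqrt{d_N^T D^T D \, d_N}} = \frac{d_M^T E \, d_N}{\sqrt{d_M^T E \, d_M} \, \sqrt{d_N^T E \, d_N}}$
• Reduces to a Boolean cosine when E = I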
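Because the transformed cosine depends on D only through E = DᵀD, it can be computed from E directly. The sketch below does this in plain Python; the function names are hypothetical, and the 0.7 entry is an assumed match value chosen only to echo the "70% match" example, not a number taken from the slides.

```python
import math

def quad(x, E, y):
    """Compute x^T E y for list-based vectors and a square matrix E."""
    return sum(x[i] * E[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def soft_cosine(x, y, E):
    """Cosine between D x and D y, written in terms of E = D^T D."""
    denom = math.sqrt(quad(x, E, x)) * math.sqrt(quad(y, E, y))
    return quad(x, E, y) / denom if denom else 0.0

# Axes: "2 ppm", "2 ppm max", "2 ft". The 0.7 entry is an assumed value.
E = [[1.0, 0.7, 0.0],
     [0.7, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

x = [1, 0, 0]  # provision mentioning "2 ppm" once
y = [0, 1, 0]  # provision mentioning "2 ppm max" once
print(soft_cosine(x, y, E))         # partial match through E
print(soft_cosine(x, y, IDENTITY))  # Boolean cosine: no shared axis
```

A matrix D with E = DᵀD exists only when E is symmetric positive semidefinite, so hand-assigned match percentages need to be checked against that condition.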
Score refinements based on regulation structure
• Neighbor inclusion
  • Diffusion of similarity between clusters of nodes in the tree
  • Self vs. parent-sibling-child (psc): $f_{s\text{-}psc}$
  • psc vs. psc: $f_{psc\text{-}psc}$
Neighbor inclusion: psc vs. psc
• Take a linear combination of neighboring pair scores
• Formulate a neighbor structure matrix N
• Define score matrix $\Pi$, with $\Pi_0$ holding the base scores $f_0$
• We have $\Pi_{psc\text{-}psc} = N_A \Pi_0 N_U^T$
Neighbor inclusion: self vs. psc
• Take a linear combination of neighbor vs. self scores
• Formulate a neighbor structure matrix N
• Define score matrix $\Pi$, with $\Pi_0$ holding the base scores $f_0$
• We have $\Pi_{s\text{-}psc} = \frac{1}{2} (\Pi_0 N_U^T + N_A \Pi_0)$
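Both neighbor-inclusion scores are matrix products, which the following plain-Python sketch works through on two toy trees. Several things here are assumptions rather than facts from the slides: the base score matrix is called `pi0`, each row of N spreads a total weight of 1 evenly over that node's parent, siblings and children, and the trees and base scores are invented.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def neighbor_matrix(parent):
    """parent[i] is the index of node i's parent, or None for the root.
    Row i averages over node i's parent, siblings and children (assumed weighting)."""
    n = len(parent)
    N = [[0.0] * n for _ in range(n)]
    for i in range(n):
        psc = {j for j in range(n) if j != i and (
            parent[i] == j                                          # parent
            or parent[j] == i                                       # child
            or (parent[i] is not None and parent[j] == parent[i])   # sibling
        )}
        for j in psc:
            N[i][j] = 1.0 / len(psc)
    return N

N_A = neighbor_matrix([None, 0, 0])  # tree A: a root with two children
N_U = neighbor_matrix([None, 0])     # tree U: a root with one child
pi0 = [[0.2, 0.0],                   # invented base scores, rows = A, cols = U
       [0.0, 0.8],
       [0.4, 0.6]]

psc_psc = matmul(matmul(N_A, pi0), transpose(N_U))
left = matmul(pi0, transpose(N_U))   # self in A vs. neighbors in U
right = matmul(N_A, pi0)             # neighbors in A vs. self in U
s_psc = [[0.5 * (l + r) for l, r in zip(lr, rr)] for lr, rr in zip(left, right)]
print(psc_psc)
print(s_psc)
```

The same products apply to the reference-based refinement if N is swapped for a matrix built from cross-references instead of tree neighbors.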
Score refinements based on regulation structure
• Reference distribution
  • Diffusion of similarity between referencing nodes and referenced nodes in the tree
  • E.g., f(A5.3, U6.4(a)) updates f(A2.1, U3.3)
Reference distribution: s-ref and ref-ref
• Take a linear combination of reference vs. self and reference vs. reference scores
• Formulate a reference structure matrix R
• Define score matrix $\Pi$, with $\Pi_0$ holding the base scores $f_0$
• We have $\Pi_{ref\text{-}ref} = R_A \Pi_0 R_U^T$ and $\Pi_{s\text{-}ref} = \frac{1}{2} (\Pi_0 R_U^T + R_A \Pi_0)$
Example of results: UFAS vs BS8300
• Phrasing difference between American and British regulations
  ufas.4.13.9 Door Hardware. Handles, pulls, latches, locks, and other operating devices on accessible doors shall have a shape that is easy …
  bs8300.12.5.4.2 Door Furniture. Door handles on hinged and sliding doors in accessible bedrooms should be easy to grip …
• Neighbor similarities imply similarity between the nodes of interest
Example of results: almost identical provisions Regulation comparison: 40CFR vs. CCR
Example of results: e-rulemaking
• Application domain: e-rulemaking
  • Comparison between a draft of rules and the associated public comments
• ADAAG Chapter 11, rights-of-way draft
  • Less than 15 pages
  • Over 1400 public comments received within 4 months
  • Comments ~10 MB in size; most are several pages long
A new regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed
Example of results: e-rulemaking Regulations compared with public comments
Example of results: e-rulemaking
• Related draft section and public comment
  Adaag.1105.4.1 Where signal timing is inadequate for full crossing of all traffic lanes or where the crossing is not signalized, cut-through medians …
  Deborah Wood, October 29, 2002 … This often means walk lights that are so short in duration that by the time a person who is blind realizes …
• No identified related section
  Donna Ring, September 6, 2002 If you become blind, no amount of electronics … will make you safe … You have to learn modern blindness skills from a good teacher. You have to practice your new skills …
  Concern not addressed in the draft
Conclusions
• An infrastructure for
  • Repository for regulations
    • Shallow parser
    • Feature extractions
  • Similarity comparison
    • Base score
    • Score refinements
• Results
  • Comparisons between Federal codes, European codes
  • Application to e-rulemaking
• Future directions
  • Extension of application to other domains of semi-structured documents
  • Conflict analysis?