This study presents REGNET, a relatedness analysis approach using a regulatory repository to compare regulations and aid in e-rulemaking. The approach involves feature extraction in XML, structural and feature matching, and score refinements based on regulation structure. Performance evaluation shows promising results compared to Latent Semantic Indexing.
REGNET: A Relatedness Analysis Approach for Regulation Comparison and E-Rulemaking Applications
Gloria Lau, Haoyi Wang, Kincho Law, Gio Wiederhold
Stanford University
May 16th, 2005
Motivation: regulatory comparison
[Figure: sample regulations in different formats: ADAAG in HTML, UK DDA in HTML, IBC in PDF]
• Multiple sources of regulations
• Multiple jurisdictions: federal, state, local, etc.
• Different formats, terminologies, contexts
• Amending rules, conflicting ideas
Motivation: e-rulemaking
• Increasing amount of electronic data in e-rulemaking
• Example: the Alcohol and Tobacco Tax and Trade Bureau received over 14,000 comments, mostly emails, on a flavored malt beverage proposal within 7 months
• Originally in the Federal Register: "All comments posted on our Web site will show the name of the commenter but will not show street addresses, telephone numbers, or e-mail addresses."
• Later in the Federal Register: due to the "unusually large number of comments received," the Bureau announced that it was difficult to remove all street addresses, telephone numbers, and e-mail addresses "in a timely manner."
Relatedness analysis based on a regulatory repository
• XML regulatory repository with extracted features
• Shallow parser consolidates HTML, PDF, and plain-text regulations into XML
• Features, references, etc.
• Relatedness analysis to help understand regulations and the relationships between them
• Feature matching
• Structural matching
• Application to e-rulemaking
• Comparison of drafted regulations and public comments
Feature Extraction in XML
<regulation id="ibc" name="international building code" type="private">
  <regElement id="ibc.1107" name="special occupancies">
    …
    <regElement id="ibc.1107.2" name="assembly area seating">
      <reference id="ibc.1107.2.4.1" times="1" />
      <concept name="assembl area" times="1" />
      …
      <regText>Assembly areas with fixed seating shall comply … </regText>
      <regElement id="ibc.1107.2.1" name="services">...</regElement>
      <regElement id="ibc.1107.2.2" name="wheelchair …">...</regElement>
    </regElement>
  </regElement>
</regulation>
[Figure: callouts marking the reference element and the concept parse tree]
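To make the extraction step concrete, here is a minimal Python sketch of this kind of shallow parsing. It is not the project's actual parser: the element names follow the XML sample above, while the reference pattern, the crude stemmer, and the concept-counting heuristic are all simplifying assumptions.

import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed format for cross-references such as "1107.2.4.1"
REF_PATTERN = re.compile(r"\b\d{3,4}(?:\.\d+)+\b")

def stem(word):
    # Crude suffix stripping standing in for a real stemmer ("assembly" -> "assembl")
    return re.sub(r"(ies|y|s)$", "", word.lower())

def provision_to_xml(prov_id, name, text):
    elem = ET.Element("regElement", id=prov_id, name=name)
    # References: other provisions cited inside the text, with occurrence counts
    for ref, times in Counter(REF_PATTERN.findall(text)).items():
        ET.SubElement(elem, "reference", id="ibc." + ref, times=str(times))
    # Concepts: here just the most frequent stemmed terms, as a placeholder
    for term, times in Counter(stem(w) for w in re.findall(r"[a-zA-Z]+", text)).most_common(3):
        ET.SubElement(elem, "concept", name=term, times=str(times))
    ET.SubElement(elem, "regText").text = text
    return elem

print(ET.tostring(provision_to_xml("ibc.1107.2", "assembly area seating",
      "Assembly areas with fixed seating shall comply with 1107.2.4.1"), encoding="unicode"))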
Relatedness analysis: structural comparisons
• ADAAG 4.1.6(3)(d) Doors: "(i) Where it is technically infeasible to comply with clear opening width requirements of 4.13.5, a projection ..."
• UFAS 4.14.1 Minimum Number: "Entrances required to be accessible by 4.1 shall be part of an accessible route and shall comply with ..."
• Related elements: door and entrance
Relatedness analysis
• Utilize the computational properties of regulations for a complete comparison
• Measure: degree of relatedness, a similarity score f(A, U) ∈ (0, 1)
• Nodes A and U are provisions from two different regulation trees
Base score f0 computation
• Linear combination of feature matching scores (sketched below):
• f0(A, U) = Σ i=1..N wi · F(A, U, i)
• F(A, U, i) = similarity score between sections A and U based on feature i
• N = total number of features
• wi = weighting coefficient for feature i
• Feature matching
• Based on the vector model, using cosine similarity as the distance between feature vectors
• Non-Boolean features
• A measurement of "2 inches max" can be a 70% match to "2 inches"
• Synonyms exist, e.g., an ontology defined for chemicals
• Perform a vector-space transformation prior to the cosine computation
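A minimal sketch of the base-score computation, assuming sparse term-frequency feature vectors; the two feature types and the weights shown are illustrative, and the vector-space transformation for non-Boolean features and synonyms is omitted.

import math
from collections import Counter

def cosine(u, v):
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def base_score(a_features, u_features, weights):
    # f0(A, U) = sum over features i of w_i * F(A, U, i)
    return sum(w * cosine(a_features[i], u_features[i]) for i, w in enumerate(weights))

# Hypothetical feature vectors: [concepts, measurements] per provision.
# A vector-space transformation would let "2in_max" partially match "2in".
a = [Counter({"door": 2, "width": 1}), Counter({"2in": 1})]
u = [Counter({"entrance": 1, "width": 1}), Counter({"2in_max": 1})]
print(base_score(a, u, weights=[0.7, 0.3]))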
Score refinements based on regulation structure • Neighbor inclusion • Diffusion of similarity between clusters of nodes in the tree
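Read as pseudocode, neighbor inclusion can be sketched as blending each pair's score with the average score of its tree neighbors; the blending factor alpha and the averaging rule are assumptions, not the paper's exact update.

def neighbor_inclusion(scores, neighbors_a, neighbors_u, alpha=0.5):
    # scores: dict mapping (node_a, node_u) -> similarity f(A, U)
    # neighbors_x: dict mapping a node to its tree neighbors (parent, children, siblings)
    refined = {}
    for (a, u), f in scores.items():
        pairs = [(na, nu) for na in neighbors_a.get(a, []) for nu in neighbors_u.get(u, [])]
        avg = sum(scores.get(p, 0.0) for p in pairs) / len(pairs) if pairs else 0.0
        refined[(a, u)] = (1 - alpha) * f + alpha * avg
    return refined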
Score refinements based on regulation structure
• Reference distribution
• Diffusion of similarity between referencing nodes and referenced nodes in the tree
• E.g., if A2.1 references A5.3 and U3.3 references U6.4(a), then f(A5.3, U6.4(a)) updates f(A2.1, U3.3) (sketched below)
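In the same sketchy style, reference distribution can be read as propagating similarity from referenced pairs back to the referencing pair; the factor beta and the max-aggregation are illustrative choices, not the paper's exact rule.

def reference_distribution(scores, refs_a, refs_u, beta=0.3):
    # refs_x: dict mapping a node to the nodes it references
    refined = dict(scores)
    for (a, u), f in scores.items():
        pairs = [(ra, ru) for ra in refs_a.get(a, []) for ru in refs_u.get(u, [])]
        if pairs:
            # e.g. f(A5.3, U6.4(a)) feeding into f(A2.1, U3.3)
            # when A2.1 references A5.3 and U3.3 references U6.4(a)
            refined[(a, u)] = (1 - beta) * f + beta * max(scores.get(p, 0.0) for p in pairs)
    return refined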
Performance evaluation
• Conduct a user survey of rankings of similarity
• 10 randomly chosen sections from the ADAAG and UFAS
• Ranks 1 to 100 in the order of relevance
• Root mean square error (RMSE): RMSE = sqrt((1/n) Σi (ru,i − rm,i)²)
• ru = user-generated ranking vector
• rm = machine-predicted ranking vector
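The RMSE measure itself is straightforward; a small sketch with hypothetical rank vectors:

import numpy as np

def rmse(user_ranks, machine_ranks):
    # Root mean square error between user and machine ranking vectors
    return float(np.sqrt(np.mean((np.asarray(user_ranks) - np.asarray(machine_ranks)) ** 2)))

print(rmse([1, 5, 12, 40], [2, 3, 20, 35]))  # hypothetical ranks for four sections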
Survey results: tabulated RMSEs
• Compared our analysis to Latent Semantic Indexing (LSI)
• RMSEs tabulated over varying structural and feature weighting coefficients
• Average RMSE smaller than LSI's
• Measurement feature performs best
• No improvement observed from the structural comparison
Results of comparisons: ADAAG vs. UFAS
• Related accessible elements: door and entrance
• No ontological information
• Neighbor inclusion reveals higher similarity
• The content of neighbors implies similarity between Section 4.1.6(3)(d) in ADAAG and Section 4.14.1 in UFAS
Results of comparisons: UFAS vs. BS8300
• Terminological differences revealed through neighbor inclusion
Results of comparisons: UFAS vs. Scottish Technical Standards
• Terminological differences revealed through reference distribution
• Stairs and ramps
Application to e-rulemaking
• Application domain: comparison between a draft of rules and the associated public comments
• ADAAG Chapter 11, rights-of-way draft
• Fewer than 15 pages
• Over 1,400 public comments received within 4 months
• Comments ~10 MB in size; most are several pages long
• A new regulation draft can easily generate a huge amount of data that needs to be reviewed and analyzed
• Parsing of the draft and comments
• From HTML to XML
• Recreate the structure of the draft using our shallow parser
• Extract features from the draft and comments
• Treat individual comments as provisions (see the sketch below)
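As noted in the last bullet, a minimal sketch of matching comments to draft provisions, reusing base_score from the earlier sketch; treating each comment as a single pseudo-provision follows the slide, while the ranking step and the names are illustrative.

def rank_provisions_for_comment(comment_features, draft, weights, top_k=3):
    # draft: dict mapping provision id -> feature vectors (one per feature type)
    scored = [(pid, base_score(comment_features, feats, weights))
              for pid, feats in draft.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Uniformly low scores suggest a concern not addressed anywhere in the draft
    return scored[:top_k]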
Application to e-rulemaking
[Figure: drafted regulations compared with public comments]
Results from e-rulemaking application • Related section in draft and public comment
Results from e-rulemaking application • No related provisions identified • Concern not addressed in the draft
Results from e-rulemaking application
• Related section in draft and public comment
• Commenting per provision
• Forward to the right personnel
Results from e-rulemaking application • Related section in draft and public comment • Suggested revision cannot be located automatically • Linguistic analysis can potentially help
Results from e-rulemaking application • Comment on the general intent of the draft • Clustering of comments might help
Conclusions
• Prototype for relatedness comparisons of regulations
• Contextual comparisons
• Domain knowledge
• Structural comparisons
• Performance evaluation, results, and applications
• User survey and comparisons with LSI
• Observations from comparisons between federal, state, and non-profit-mandated codes and European standards
• Application to e-rulemaking
• Compare drafted rules with public comments
• Observations from comparisons based on a rights-of-way draft
Future research directions
• Regulatory comparison
• Regulatory competition
• Cross-border data transfer laws
• Especially in the polyglot countries of the EU
• Regulatory updates
• Track changes in updates
• Track cross-references between regulations
• E-rulemaking
• Automated routing of comments to the person in charge
• Clustering of comments
• Web portal for comment submission per provision, in addition to per draft
• Linguistic analysis to match patterns of suggested revisions embedded in comments