Mapping Regulations to Industry-Specific Taxonomies

Chin Pang Cheng, Gloria T. Lau, Kincho H. Law. Engineering Informatics Group, Stanford University. June 5, 2007.


Presentation Transcript


  1. Mapping Regulations to Industry-Specific Taxonomies • Chin Pang Cheng, Gloria T. Lau, Kincho H. Law • Engineering Informatics Group, Stanford University • June 5, 2007

  2. Motivating Problem • To legal practitioners: • Regulations are hierarchical and well-structured • Regulations are precise and concise • Practitioners are familiar with regulatory organization systems • To industry practitioners: • Regulations are voluminous • Practitioners are not trained to read regulations • Practitioners are more familiar with industry-specific terminology and classification structures

  3. Mapping Regulations to Taxonomies • Possible Cases: • One-Taxonomy-One-Regulation • One-Taxonomy-N-Regulation • N-Taxonomy-One-Regulation • N-Taxonomy-N-Regulation

  4. One-Taxonomy-One-Regulation • Simple keyword latching task • Stemming (e.g. piling → pile, disabled → disable) • Word interval • Concept: “fire alarm system” • Regulation: “… fire alarm and detection system …”
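The stemming and word-interval ideas on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer, and the `max_gap` parameter (how many extra words may sit between consecutive concept words) is an assumed way to model the word interval.

```python
import re

def crude_stem(word):
    """Very rough suffix stripping (an assumption, not a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing "e" so that "pile" and "piling" share the stem "pil".
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

def tokens(text):
    """Split text into alphabetic words and stem each one."""
    return [crude_stem(w) for w in re.findall(r"[A-Za-z]+", text)]

def matches(concept, section_text, max_gap=2):
    """True if the concept's words occur in order, with at most
    max_gap other words between consecutive concept words."""
    concept_toks = tokens(concept)
    section_toks = tokens(section_text)
    if not concept_toks:
        return False
    for start, tok in enumerate(section_toks):
        if tok != concept_toks[0]:
            continue
        pos, ok = start, True
        for want in concept_toks[1:]:
            window = section_toks[pos + 1 : pos + 2 + max_gap]
            if want in window:
                pos = pos + 1 + window.index(want)
            else:
                ok = False
                break
        if ok:
            return True
    return False

text = "... fire alarm and detection system ..."
print(matches("fire alarm system", text))             # interval of 2 words allowed
print(matches("fire alarm system", text, max_gap=0))  # exact phrase only
```

With an interval of two words the concept “fire alarm system” latches onto the regulation text even though “and detection” intervenes; with no interval it does not.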

  5. Inverted Regulations • Each taxonomy concept is hyperlinked • “No Matched Sections” for non-matched OmniClass concepts • See other matched related concepts in that section
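One way to picture the inverted view is as an index from each taxonomy concept to the sections that mention it. The sketch below is an assumption-laden toy: the section ids `S1` and `S2`, their text, and the plain substring matching are all hypothetical, chosen only to show the “No Matched Sections” fallback and the lookup of related concepts in the same section.

```python
from collections import defaultdict

def invert(sections, concepts):
    """Map each concept to the ids of sections that mention it."""
    index = defaultdict(list)
    for sec_id, text in sections.items():
        lowered = text.lower()
        for concept in concepts:
            if concept.lower() in lowered:
                index[concept].append(sec_id)
    return index

def lookup(index, concept):
    """Return matched section ids, or the placeholder for unmatched concepts."""
    return index.get(concept) or "No Matched Sections"

def related_concepts(index, concept):
    """Other concepts matched to at least one of the same sections."""
    mine = set(index.get(concept, []))
    return sorted(c for c, secs in index.items()
                  if c != concept and mine & set(secs))

# Toy data (hypothetical section ids and text).
sections = {
    "S1": "A fire alarm system shall actuate with the sprinkler system.",
    "S2": "Exit signs shall be illuminated.",
}
concepts = ["fire alarm system", "sprinkler system", "elevator"]
index = invert(sections, concepts)
print(lookup(index, "fire alarm system"))
print(lookup(index, "elevator"))
print(related_concepts(index, "fire alarm system"))
```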

  6. One-Taxonomy-N-Regulation [Figure: Alabama (AL) regulation and Arizona (AZ) regulation]

  7. One Regulation as the Base [Figure: AL and AZ regulations]

  8. Similarity Comparison on Sections [Figure: nodes A and U in comparison, from the AL and AZ regulations, with their parent, sibling, child and reference nodes; labels include f0, psc(A), psc(U), ref(U), s-psc, s-ref and psc-psc] • Core from Lau, Law and Wiederhold (2005) • Feature extraction (e.g. concepts, measurements) • Comparison of shared features • Consideration of hierarchical and referential information • Reference: G. Lau, K. Law and G. Wiederhold. “Legal Information Retrieval and Application to E-Rulemaking,” in Proceedings of the 10th International Conference on Artificial Intelligence and Law (ICAIL 2005), Bologna, Italy, pp. 146-154, Jun 6-11, 2005.

  9. Inclusion of Regulation Hierarchy • Terminological differences: revealed by neighbor inclusion

  10. N-Taxonomy-One-Regulation • Multiple taxonomies exist in a single industry • Translation is unavoidable • E.g. in the architecture, engineering and construction (AEC) industry • Industry Foundation Classes (IFC) • CIMsteel Integration Standards (CIS/2) • Automating Equipment Information Exchange (AEX) • UniFormat™, MasterFormat™ • etc. • Possible solution: Merging taxonomy → Unfamiliar taxonomy

  11. Proposed System

  12. Proposed Methodology of Taxonomy Mapping [Figure: concepts from taxonomies T1 and T2 matched to a regulation section; labels shown: sprinkler system, water flow, orifice, alarm, fire alarm system, fire] • Example section: “[F] 903.4.2 Alarms. Approved audible devices shall be connected to every automatic sprinkler system. Such sprinkler water-flow alarm devices shall be activated by water flow equivalent to the flow of a single sprinkler of the smallest orifice size installed in the system. Alarm devices shall be provided on the exterior of the building in an approved location. Where a fire alarm system is installed, actuation of the automatic sprinkler system shall actuate the building fire alarm system.” • Taxonomy mapping: • Mainly done manually at present • Usually by term matching (e.g. fire → fire alarm)

  13. Demonstration in Construction Industry [Figure: example concepts “IfcSlab” and “steel” from Taxonomy 2 (ifcXML) and Taxonomy 1 (OmniClass), linked through the knowledge corpus, the International Building Code (IBC)] • Corpus: carefully selected (in the same domain)

  14. Relatedness Analysis on Concepts • Notation: • a pool of m concepts for a taxonomy • a corpus of N regulation sections • frequency vector $\vec{c}_i$ is an N-by-1 vector storing the occurrence frequencies of concept i among the N documents • frequency matrix C is an N-by-m matrix in which the i-th column vector is $\vec{c}_i$ • Example: with m = 4 and N = 5, C is a 5-by-4 matrix; the entry $C_{4,3} = 3$ means that Concept 3 is matched to Section 4 three times
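The notation above can be made concrete with a small sketch that builds a frequency matrix from a toy corpus. The sections, the concepts, and the use of plain substring counting are illustrative assumptions; the slides do not specify how occurrences are counted.

```python
def frequency_matrix(sections, concepts):
    """Return C as a list of N rows (sections), each holding m counts (concepts)."""
    return [[text.lower().count(c.lower()) for c in concepts]
            for text in sections]

def column(C, i):
    """Frequency vector of concept i: its counts across all N sections."""
    return [row[i] for row in C]

# Toy corpus (hypothetical text), N = 3 sections and m = 2 concepts.
sections = [
    "The sprinkler alarm shall sound. Each alarm is tested.",
    "No relevant terms here.",
    "Sprinkler heads and sprinkler pipes.",
]
concepts = ["sprinkler", "alarm"]

C = frequency_matrix(sections, concepts)
print(C)             # one row per section, one column per concept
print(column(C, 0))  # frequency vector of "sprinkler"
```

Note that the code uses zero-based indices, so `C[3][2]` would correspond to the slide's “Concept 3 is matched to Section 4”.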

  15. Cosine Similarity Measure • Common arithmetic measure of similarity to compare documents in text mining • Finding the angle between two frequency vectors $\vec{c}_i$ and $\vec{c}_j$ in N dimensions, from Taxonomy 1 and 2 respectively • Similarity score lies in [0, 1] • Represented using dot product and magnitude, the similarity score is given by: $\text{Sim}(i, j) = \cos\theta = \frac{\vec{c}_i \cdot \vec{c}_j}{\|\vec{c}_i\| \, \|\vec{c}_j\|}$
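As a minimal sketch of this measure, the function below computes the cosine of the angle between two frequency vectors given as plain Python lists. Returning 0.0 when either vector is all zeros (a concept matched to no section) is an assumption made here to avoid division by zero; the slides do not say how that case is handled.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for unmatched concepts
    return dot / (norm_u * norm_v)

# Frequency vectors of two concepts over N = 2 sections (toy values).
print(cosine_similarity([3, 4], [4, 3]))  # 24 / 25
print(cosine_similarity([1, 0], [0, 1]))  # orthogonal vectors
```

Because occurrence frequencies are non-negative, the score stays within [0, 1].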

  16. Jaccard Similarity Coefficient • Statistical measure of the extent of overlap of two frequency vectors $\vec{c}_i$ and $\vec{c}_j$ in N dimensions, from Taxonomy 1 and 2 • Defined as the size of the intersection divided by the size of the union of the vector dimension sets: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$ • N11 = number of sections both concepts i and j are matched to • N10 = number of sections concept i is matched to but not concept j • N01 = number of sections concept j is matched to but not concept i • For concept relatedness analysis, $\text{Sim}(i, j) = \frac{N_{11}}{N_{11} + N_{10} + N_{01}}$
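The counts N11, N10 and N01 can be read directly off two frequency vectors, treating any positive frequency as a match. The sketch below does this for plain Python lists; returning 0.0 when neither concept is matched anywhere is an assumed convention.

```python
def jaccard_similarity(u, v):
    """N11 / (N11 + N10 + N01) for two equal-length frequency vectors."""
    n11 = sum(1 for a, b in zip(u, v) if a > 0 and b > 0)
    n10 = sum(1 for a, b in zip(u, v) if a > 0 and b == 0)
    n01 = sum(1 for a, b in zip(u, v) if a == 0 and b > 0)
    union = n11 + n10 + n01
    return n11 / union if union else 0.0  # assumed value for an empty union

# Toy vectors over N = 5 sections: both matched in sections 1 and 5.
u = [3, 0, 1, 0, 2]
v = [1, 1, 0, 0, 4]
print(jaccard_similarity(u, v))  # N11 = 2, N10 = 1, N01 = 1
```

Unlike cosine similarity, this measure ignores how often a concept occurs in a section and only looks at whether it occurs.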

  17. Market Basket Model • Probabilistic measure to find item-item correlation used in data mining • Two main elements: (1) set of items; (2) set of baskets • Association rule $\{i_1, i_2, \ldots, i_k\} \to j$ means a basket containing all the items $i_1, \ldots, i_k$ is very likely to contain item j • Confidence of a rule = $\Pr(j \mid i_1, i_2, \ldots, i_k)$ • Interest of a rule = $\left|\Pr(j \mid i_1, i_2, \ldots, i_k) - \Pr(j)\right|$ • Example: • Coca-Cola → Pepsi: low confidence but high interest

  18. Market Basket Model (cont’d) • For concept relatedness analysis • N11 = number of sections both concepts i and j are matched to • N01 = number of sections concept j is matched to but not concept i • N10 = number of sections concept i is matched to but not concept j • N00 = number of sections both concepts i and j are NOT matched to • Probability of concept j is $\Pr(j) = \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}}$ • Confidence of association rule $i \to j$ is $\text{Conf}(i \to j) = \frac{N_{11}}{N_{11} + N_{10}}$ • Forward similarity of concept i and j is the interest: $\text{Sim}(i, j) = \text{Int}(i \to j) = \left|\text{Conf}(i \to j) - \Pr(j)\right|$

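  19. Asymmetry of Market Basket Model • The market basket model is asymmetric: in general $\text{Sim}(i, j) \neq \text{Sim}(j, i)$ • Forward similarity: $\text{Sim}(i, j) = \text{Int}(i \to j) = \left|\frac{N_{11}}{N_{11} + N_{10}} - \frac{N_{11} + N_{01}}{N_{11} + N_{10} + N_{01} + N_{00}}\right|$ • Backward similarity: $\text{Sim}(j, i) = \text{Int}(j \to i) = \left|\frac{N_{11}}{N_{11} + N_{01}} - \frac{N_{11} + N_{10}}{N_{11} + N_{10} + N_{01} + N_{00}}\right|$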
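The asymmetry can be seen on a small example. The sketch below assumes that the interest of a rule i → j is the absolute difference between its confidence and the probability of j, and that a section counts as a basket containing a concept whenever the concept's frequency there is positive. Both the function names and the zero-return convention for undefined cases are illustrative choices.

```python
def contingency(u, v):
    """Count sections by match pattern: (N11, N10, N01, N00)."""
    n11 = n10 = n01 = n00 = 0
    for a, b in zip(u, v):
        if a > 0 and b > 0:
            n11 += 1
        elif a > 0:
            n10 += 1
        elif b > 0:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00

def interest(u, v):
    """Assumed interest of rule i -> j: |Conf(i -> j) - Pr(j)|."""
    n11, n10, n01, n00 = contingency(u, v)
    total = n11 + n10 + n01 + n00
    if total == 0 or n11 + n10 == 0:
        return 0.0  # assumed value when concept i is never matched
    confidence = n11 / (n11 + n10)
    prob_j = (n11 + n01) / total
    return abs(confidence - prob_j)

# Toy vectors over N = 5 sections.
ci = [1, 1, 0, 0, 0]  # concept i matched in 2 sections
cj = [1, 1, 1, 1, 0]  # concept j matched in 4 sections
forward = interest(ci, cj)   # |2/2 - 4/5|
backward = interest(cj, ci)  # |2/4 - 2/5|
print(forward, backward)
```

Here every section that mentions concept i also mentions concept j, but not the other way round, so the forward score is larger than the backward one.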

  20. Evaluation of Accuracy • Root Mean Square Error (RMSE): • Difference between the true values and the predicted values • For Taxonomy 1 of m concepts and Taxonomy 2 of n concepts: $\text{RMSE} = \sqrt{\frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left(\text{true}_{ij} - \text{predicted}_{ij}\right)^2}$ • Precision: • Fraction of predictions that are correct • Recall: • Fraction of correct matches that are predicted
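These three measures can be sketched for an m-by-n table of true relatedness values and an equally sized table of predicted scores. The `threshold` used to turn scores into predicted matches is an assumption introduced for illustration; the slides do not state how predictions are binarized.

```python
import math

def rmse(true, pred):
    """Root mean square error over all m * n concept pairs."""
    m, n = len(true), len(true[0])
    squared = sum((true[i][j] - pred[i][j]) ** 2
                  for i in range(m) for j in range(n))
    return math.sqrt(squared / (m * n))

def precision_recall(true, pred, threshold=0.5):
    """Precision and recall of pairs scoring at or above an assumed threshold."""
    pairs = [(i, j) for i in range(len(true)) for j in range(len(true[0]))]
    predicted = {p for p in pairs if pred[p[0]][p[1]] >= threshold}
    actual = {p for p in pairs if true[p[0]][p[1]] >= threshold}
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

# Toy 2-by-2 example (hypothetical values).
true = [[1, 0], [0, 1]]
pred = [[0.8, 0.6], [0.1, 0.3]]
error = rmse(true, pred)
precision, recall = precision_recall(true, pred)
print(error, precision, recall)
```

In this toy case one of the two predicted matches is correct and one of the two true matches is found, so precision and recall are both one half.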

  21. Evaluation Results • 20 concepts from OmniClass, 20 concepts from ifcXML • Cosine Similarity: • Average performance among the three metrics • Jaccard Similarity: • NOT preferred (unacceptably low recall, though high precision) • Market Basket Model: • Preferred (lowest RMSE, highest recall)

  22. Conclusion • Mapping industry-specific taxonomy to regulation allows industry practitioners to retrieve regulations faster • Four cases: • 1-Taxonomy-1-Regulation: simple keyword latching • 1-Taxonomy-N-Regulation: hierarchy of regulation sections considered • N-Taxonomy-1-Regulation: 3 similarity analysis metrics introduced (cosine similarity, Jaccard similarity, market basket model) • N-Taxonomy-N-Regulation: future step

  23. ~ Thank You ~