Topic: Identifying the Data Schema behind SNOMED CT
Jon Patrick, Centre for Health Informatics Research & Development, University of Sydney
Ming Zhang, Donna Truran, National Centre for Classification in Health
Outline
• Project description
• Research methodology
• Experiments and results
• Conclusion
• Limitations
• Recommendations for future work
Project Description
• Project background
  SNOMED CT: the core content is stored in simple tables
• Project objective
  To discover the conceptual model of SNOMED CT by reverse engineering
Research methodology
• Data preparation
  Transfer the SNOMED CT core content tables into an RDBMS, that is, load the text files into MySQL
• Ontology structure investigation
  Database querying: explicit characteristics
  Programming: implicit characteristics
• Data modelling
  Analyse the different characteristics and features to generate the conceptual data model
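The data preparation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: it uses Python's built-in sqlite3 in place of MySQL so that it runs without a server, and the column names, sample rows and concept identifiers are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited sample; the real core content file has its own
# column names and far more rows.
SAMPLE = (
    "CONCEPTID\tSTATUS\tNAME\tISPRIMITIVE\n"
    "1\t0\tExample root concept\t1\n"
    "2\t0\tExample finding\t1\n"
    "3\t0\tExample procedure\t0\n"
)

def load_concepts(conn, text_file):
    """Create a concepts table and bulk-insert rows from a tab-delimited file."""
    conn.execute(
        "CREATE TABLE concepts ("
        "concept_id INTEGER PRIMARY KEY, status INTEGER, "
        "name TEXT, is_primitive INTEGER)"
    )
    reader = csv.reader(text_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row
    rows = [(int(r[0]), int(r[1]), r[2], int(r[3])) for r in reader]
    conn.executemany("INSERT INTO concepts VALUES (?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
loaded = load_concepts(conn, io.StringIO(SAMPLE))
print(loaded, "concept rows loaded")
```

Once the tables are in a relational database, the explicit characteristics can be explored with ordinary SQL queries.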
Experiment and Result
• Explicit characteristics of the ontology
  Original data overview
  Fully defined and primitive concepts
  Relationship types
  Hierarchy structure
  Multiple inheritance
  Full structure
• Implicit characteristics of the ontology
  Classification principles
  Relationship patterns
Original data model
• 3 data tables:
  Concepts: each clinical idea is recorded as one concept
  Descriptions: one clinical idea can have more than one description in this table
  Relationships: each row represents a relationship between two concepts
Fully defined and primitive concepts
• Primitive concepts: a concept is primitive if its defining characteristics are insufficient to define it, that is, it has more content than is indicated by its attributes and relationships, e.g. clinical finding
• Fully defined concepts: a concept is fully defined if its defining characteristics are sufficient to define it
• “Sufficient” and “insufficient” are determined by SNOMED experts
• Currently 41,244 (11%) concepts are fully defined
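A count like the one above could be obtained with a single grouped query. The sketch below assumes a hypothetical `is_primitive` flag column (1 = primitive, 0 = fully defined) and uses four invented rows in an in-memory SQLite database rather than the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concepts (concept_id INTEGER PRIMARY KEY, is_primitive INTEGER)"
)
# Invented sample: three primitive concepts and one fully defined concept.
conn.executemany("INSERT INTO concepts VALUES (?, ?)", [(1, 1), (2, 1), (3, 0), (4, 1)])

def definition_status_counts(conn):
    """Count primitive and fully defined concepts from the flag column."""
    counts = {"primitive": 0, "fully_defined": 0}
    query = "SELECT is_primitive, COUNT(*) FROM concepts GROUP BY is_primitive"
    for flag, n in conn.execute(query):
        counts["primitive" if flag else "fully_defined"] = n
    return counts

counts = definition_status_counts(conn)
fully_defined_share = counts["fully_defined"] / sum(counts.values())
print(counts, fully_defined_share)
```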
Relationship types
• A relationship links two concepts
• “Laterality” is an example of a relationship type
• The Relationships table holds about 1.4 million records
• 62 relationship types are currently used to represent the relationships between two concepts
Hierarchy structure
• Among the relationship types, “IS_A” represents the hierarchical relationship
• 485,335 records in the Relationships table store the hierarchical information of SNOMED CT
• The main hierarchical features:
  Root level (no parents): one root, “SNOMED CT CONCEPT”
  Middle node level (has parents and children): 80,895 (22%) concepts, of which 25,687 have only 1 child
  Leaf node level (no children): 285,283 (78%) concepts
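The three node levels can be separated with simple set operations over the IS_A records. The sketch below assumes each record is reduced to a (child, parent) pair; the concept names are invented for illustration.

```python
# Hypothetical IS_A edges as (child, parent) pairs.
IS_A = [("B", "A"), ("C", "A"), ("D", "B"), ("E", "B"), ("E", "C")]

def classify_nodes(is_a_edges):
    """Split concepts into root, middle and leaf levels."""
    children = {child for child, _ in is_a_edges}  # concepts with a parent
    parents = {parent for _, parent in is_a_edges}  # concepts with a child
    return {
        "root": parents - children,    # no parents
        "middle": parents & children,  # parents and children
        "leaf": children - parents,    # no children
    }

levels = classify_nodes(IS_A)
print({name: sorted(nodes) for name, nodes in levels.items()})
```

On the real data, the sizes of the three sets would correspond to the root, middle node and leaf node counts reported above.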
Multiple inheritance
• A concept in SNOMED CT may have many children and many parents
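Concepts with more than one parent could be found with a grouped query over the IS_A records. In this sketch the table layout, the column names and the direction of the relationship (source is the child, target is the parent) are assumptions, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relationships (source_id TEXT, type TEXT, target_id TEXT)"
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?)",
    [
        ("B", "IS_A", "A"),
        ("C", "IS_A", "A"),
        ("E", "IS_A", "B"),
        ("E", "IS_A", "C"),          # second parent for E
        ("E", "FINDING_SITE", "D"),  # non-hierarchical row, ignored below
    ],
)

# Concepts that appear as the child in more than one IS_A record.
multi_parent = conn.execute(
    "SELECT source_id, COUNT(*) FROM relationships "
    "WHERE type = 'IS_A' GROUP BY source_id HAVING COUNT(*) > 1"
).fetchall()
print(multi_parent)
```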
Hierarchy structure - example
[Figure: example hierarchy showing the root, middle nodes, leaf nodes and a concept with multiple parents]
Experiment and Result
• Explicit characteristics of the ontology
  Original data overview
  Fully defined and primitive concepts
  Relationship types
  Hierarchy structure
  Multiple inheritance
  Full structure
• Implicit characteristics of the ontology
  Classification principles
  Relationship patterns
Classification principle
• Top-level categories: the 18 direct children of the root
• Each concept belongs to only one top-level category
• So all concepts in SNOMED CT can be divided into 18 groups
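One way to check this principle programmatically is to walk up the IS_A links from every concept and collect the top-level categories reached. The following is a sketch under assumptions, not the study's actual program: the edge list and concept names are invented, and each edge is a (child, parent) pair.

```python
from collections import defaultdict

# Hypothetical IS_A edges as (child, parent) pairs.
IS_A = [
    ("Finding", "Root"), ("Procedure", "Root"),
    ("Fracture", "Finding"), ("Injury", "Finding"),
    ("ArmFracture", "Fracture"), ("ArmFracture", "Injury"),
    ("Biopsy", "Procedure"),
]

def top_level_categories(is_a_edges, root):
    """Map each non-root concept to the set of top-level categories above it."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    top_level = {c for c, ps in parents.items() if root in ps}
    result = {}
    for concept in list(parents):
        found, seen, stack = set(), set(), [concept]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in top_level:
                found.add(node)  # reached a direct child of the root
            else:
                stack.extend(parents.get(node, ()))
        result[concept] = found
    return result

categories = top_level_categories(IS_A, "Root")
print(categories["ArmFracture"])
```

If the principle holds, every set in the result contains exactly one category, even for concepts with multiple parents.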
Relationship patterns
The specific relationship types that occur between any two top-level categories
Relationship patterns
• Pattern: {C1, type, C2}
  C1 is one of the 18 top-level categories
  type is one of the 62 relationship types
  C2 is one of the 18 top-level categories
  There are 18 x 62 x 18 = 20,088 possible patterns
• Each of the 1.4 million relationship records matches one pattern
• To avoid ambiguity, the scope of this study covers only “active” concepts
• The results show that only 78 patterns have instances in the Relationships table
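The pattern count can be illustrated by mapping both ends of each relationship record to its top-level category and collecting the distinct triples. The category mapping, concept names and relationship type names below are invented for illustration; only the 18 x 62 x 18 arithmetic comes from the figures above.

```python
# Hypothetical mapping from concept to its top-level category.
CATEGORY_OF = {
    "ArmFracture": "Clinical finding",
    "LegFracture": "Clinical finding",
    "ArmBiopsy": "Procedure",
    "Arm": "Body structure",
    "Leg": "Body structure",
}

# Hypothetical relationship records as (source, type, target).
RELATIONSHIPS = [
    ("ArmFracture", "FINDING_SITE", "Arm"),
    ("LegFracture", "FINDING_SITE", "Leg"),
    ("ArmBiopsy", "PROCEDURE_SITE", "Arm"),
]

def observed_patterns(relationships, category_of):
    """Return the distinct {C1, type, C2} patterns that occur in the records."""
    return {
        (category_of[source], rel_type, category_of[target])
        for source, rel_type, target in relationships
    }

possible = 18 * 62 * 18  # upper bound on the number of patterns
patterns = observed_patterns(RELATIONSHIPS, CATEGORY_OF)
print(possible, len(patterns))
```

Here the two fracture records collapse into a single pattern, which is how 1.4 million records can reduce to a small number of patterns.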
Data modelling based on patterns
For example: finding the relationships between “clinical finding” and the other top-level categories.
Future Work
• Design a method of defining real-world constraints over the relationships
  E.g. suicide can have slow onset
• Develop storage and maintenance procedures for managing the data, e.g. there is no constraint over the data model as it exists at the moment
• Design a terminology server to deliver SCT to vendors
• Work with vendors to define a transport mechanism so that vendors can install SCT
• Create Internet access to SCT content for ad hoc users
• Start working on systems that demonstrate the value of SCT for clinical and administrative work