Terminology and Metadata Whys and hows. Harold Solbrig Apelon, Inc. Outline. “Terminology” – Why does it matter? Metadata and its relationship to terminology Creating and managing terminological resources Description of Apelon and its role in all of this. Terminology – why does it matter?.
Terminology and Metadata: Whys and hows Harold Solbrig Apelon, Inc.
Outline • “Terminology” – Why does it matter? • Metadata and its relationship to terminology • Creating and managing terminological resources • Description of Apelon and its role in all of this
Terminology – why does it matter? • Information technology (IT) is about _____? • Depending on your perspective, information: • Reduces uncertainty on the part of the receiver • IS the reduction of uncertainty on the part of the receiver • The transfer of information between a sender and a receiver is known as “communication” • The business of IT is accurate, timely and relevant communication.
Communication and Language • Language - a “specification” that enables communication • Semantics - the association between signs or symbols and their intended “meaning” • Syntax - the rules for ordering and structuring the signs into phrases and sentences • Pragmatics - the relationship between signs and symbols and the recipient. Broadly, the shared context.
The Semiotic Triangle • Thought or Reference: symbolised by the Symbol, refers to the Referent • Symbol (e.g. “Rose”, “ClipArt”): stands for the Referent • Referent C. K. Ogden and I. A. Richards. The Meaning of Meaning.
The Communication Process • Sender and receiver each hold their own semiotic triangle: a CONCEPT, symbolised by a Symbol (“Rose”, “ClipArt”), that refers to the Referent the symbol stands for • The message exchanged between them: “I see a ClipArt image of a rose” • The diagram is annotated in turn with Semantics, Syntax, each party's Context, and the Shared Context between them
Shared Context • Impacts how much information can be contained in a symbol. • Chart of Information / Symbol against increasing shared context: No Shared Context, Shared Universe, Shared Sun, Shared Planet, Shared Species, Common Language, Common Culture, Similar Education, Common Profession, Common Specialty
Shared Universe Pioneer 10 & 11 Voyager “Golden Record”
Common Specialty “Interferons are a family of cytokines that exerts antiviral, antitumor and immunomodulatory actions by inducing a complex set of proteins. One of the best known IFN-induced protein is the dsRNA-dependent protein kinase (PKR), that mediates both antiviral and anticellular activities. PKR inhibits translation initiation through the phosphorylation of the alpha subunit of the initiation factor eIF-2 (eIF-2α) and also controls the activation of several transcription factors such as NF-κB, p53, or STATs. …” Marino Estiban. Induction of apoptosis by the dsRNA-dependent protein kinase (PKR): Mechanism of action. Apoptosis, Springer, Volume 5, Number 2, April 2000
The impact of context on communication Shared context: • Allows information to be communicated in larger, more succinct “chunks”. • Drug, analgesic and NSAID are all “chunks”, yet differ markedly in conceptual complexity. • Enables specialized symbol sets: • Contrast the amount of information contained in the formula E = mc² versus that contained in this presentation...
Contextual Formalism The degree of formality in a shared context can vary across a wide spectrum: • Tacit context which is simply presumed • Contextual negotiation preceding the actual message • Rigorous and formal rules and documents describing the form and possible meanings behind every message and phrase.
Factors Affecting the Degree of Contextual Formalism • Number of participating parties • Formalism needs to increase as the number of participants increases • Geographic, cultural and temporal proximity of communicators • The further apart communicators are, the less they can assume • Amount of shared context • The more you have, the more important it becomes to be organized
Factors Affecting the Degree of Contextual Formalism (continued) • The cost of imprecise communication • Poetry and literature - low cost (some may argue actual gain) • Technical and professional - high to very high cost • What is the cost of assuming the units of a thrust specification? • What is the cost of assuming the dose of a prescription? • What is the cost of assuming the century in which the communication originated?
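The thrust question can be made concrete with a minimal sketch in which the unit travels with the number instead of being assumed. The `Thrust` class and the unit codes `"N"` and `"lbf"` are illustrative names chosen for this example, not part of the original presentation.

```python
from dataclasses import dataclass

# Conversion factors to newtons. The unit codes are illustrative choices.
TO_NEWTONS = {
    "N": 1.0,
    "lbf": 4.4482216152605,  # one pound-force expressed in newtons
}


@dataclass(frozen=True)
class Thrust:
    """A thrust value that carries its unit as explicit context."""

    value: float
    unit: str

    def in_newtons(self) -> float:
        # Refuse to guess: an unknown unit is an error, not a default.
        if self.unit not in TO_NEWTONS:
            raise ValueError(f"unknown thrust unit: {self.unit!r}")
        return self.value * TO_NEWTONS[self.unit]
```

A receiver that calls `Thrust(100.0, "lbf").in_newtons()` gets a value in a known unit, and a message with an unrecognised unit fails loudly rather than being silently misread.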
Terminology • Symbols • Their encoding and decoding • Vocabularies, Dictionaries, Enumerations, Codes, ... • Context • Recording and sharing • Glossaries, textbooks, college courses, operations manuals, information models
Terminology in the Digital Era • Multi-layered • We’ll ignore the lower layers – polarity of diodes representing bits, bits representing numbers, characters, …
Terminology in the Digital Era • Focus is on metadata • What is a particular data collection about? • What information can be found in it? • How is that information recorded? • What are the contextual assumptions?
The Communication Process Display Form CONCEPT CONCEPT Symbolises Refers To Refers To Symbolises Decode Encode Stands For Stands For Transform Referent
Metadata and the Communication Process • Metadata describes the forms, databases, encoding processes, etc. • Terminology is the component of metadata that: • Manages symbols and their “meanings” • For users (e.g. what are the possible choices for field ‘x’, and what does each of them mean) • For IT professionals (the Information Model) • Maintains context • What else does a given specialty, department, company, etc. assume is known beyond the simple definition of symbols
Terminology and Metadata • Standard modeling tools (UML, XML Schema, …) have provided a way to communicate the structure and content of data stores and messages. • Models, however, have to include information about their intended context and meaning to allow data sharing across domains. Terminology provides (or is, in some senses) this component.
Terminology and Metadata (continued) • Amongst other things, ISO 11179 provides a model of how terminology and metadata go together • It has the advantage of being (or being in the process of becoming) a standard • ISO 11179 also provides a standard model of terminology content, which would provide a vehicle for interchange in the appropriate contexts. • There are other models of interest as well…
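As a rough illustration of terminology acting as metadata, the sketch below attaches a set of permissible values, each with a stated meaning, to a data element. It is loosely inspired by the value-domain idea in ISO 11179; the class names, field names and the example codes are assumptions made for this example and do not reproduce the standard's metamodel.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class PermissibleValue:
    code: str     # the symbol actually stored in the data
    meaning: str  # what that symbol is intended to mean


@dataclass
class DataElement:
    """A field description that records both its codes and their meanings."""

    name: str
    definition: str
    values: Dict[str, PermissibleValue] = field(default_factory=dict)

    def add_value(self, code: str, meaning: str) -> None:
        self.values[code] = PermissibleValue(code, meaning)

    def decode(self, code: str) -> str:
        # Decoding fails explicitly when a code is outside the value domain.
        if code not in self.values:
            raise ValueError(f"{code!r} is not permitted for {self.name}")
        return self.values[code].meaning


# Illustrative content only: a hypothetical "route" field with two codes.
route = DataElement("route", "Route by which a drug is administered")
route.add_value("PO", "oral")
route.add_value("IV", "intravenous")
```

This answers the user's question from the earlier slide (what are the possible choices for field ‘x’, and what does each mean) directly from the metadata: `route.decode("PO")` yields the recorded meaning, and an unlisted code is rejected.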
Terminology Sounds easy enough – why not just put together a set of tables and get going? Because… • Terminology has to be shared across multiple domains. This, after all, is its raison d'être • The model of the terminology itself has to be shareable. • The semantics of the terminology have to be shareable. • Terminology and knowledge management are inextricably intertwined • Fractal in nature – you can never stop adding • Boundaries are imprecise and expand • This means that there is no such thing as a “small terminology” • The components of terminology can also be viewed as declarative programs. • This means that the rigor of software development is applicable as well.
Terminology (continued) 3) The knowledge behind terminology needs to be shared • Terminology resources depend on specialists (e.g. doctors, physicists, biologists, geneticists, etc…) • Development is expensive • Maintenance is often very expensive.
Prerequisites to Terminology Creation • Know the standards • General standards (SKOS, RDF, OWL, 11179, SBVR, XML, UML, XMI, …) • Domain specific • Example: Medical – HL7, LQS, CTS, CTS-2, UMLS, SNOMED, … • Know the tools • Development: TDE, Protégé, OBO-Edit, FaCT++, Racer, Jena, EVS, LexGrid… • Distribution: DTS, RDF, OWL, SKOS, … • Know the content • General (Dublin Core, Cyc, SUMO, …) • Domain specific (Medical: NCIt, UMLS, ICDs, SNOMED CT, Gene Ontology, …)
Terminology and Workflow • Terminology management includes: • Discovery • Federation • Authoring • Review • Distribution • Adoption
Process (Example Sequencing) • Import • Report • Review • Transform • Author • Translate • Approve • Extract • Load • Post-coordinate • Plan • Federate • Incorporate • Map • Version • Review in Context • Access • Customize • Maintain • Submit • Publish • Subscribe • Process Submissions • Migrate • Reevaluate • Replace
Content Update Applications VOSER Semantic MediaWiki (++) Annotations and Change Requests Status Report Core SME Submission Work Flow
Key Points • Terminology is a critical component for cross-discipline, cross-enterprise information sharing. • Terminology development is a non-trivial task – it needs to be done correctly. • Terminology resources need to be federated, shared and reused. • But… there’s help!
Apelon • Largest provider of terminology products and services • Unique expertise
Employees • Internationally known terminology experts • Regular contributors to industry standards, publications and conferences
Mission • Apelon software and services support the development, maintenance, and practical deployment of structured terminologies • Put another way, we help our customers create, maintain, and leverage standard and enterprise terminologies • It’s all about speaking the same language
Facts Most of the world’s standard healthcare terminology resources have been built and/or are maintained with Apelon tools, including • SNOMED • CPT • ICD-9-CM • NDF-RT • UMLS
Software Products • Terminology Development Environment (TDE) • Distributed Terminology System (DTS) • TermWorks
1 – Terminology Authoring (TDE) • Tools to create and maintain structured terminologies • Improve productivity, data quality and scalability • Enhance the value of enterprise assets • Commercial product – CPT • Internal infrastructure – Kaiser Permanente CMT • Public benefit – SNOMED CT, NDF-RT, NCI Thesaurus Author ICD CPT SNOMED NDF-RT . . .
1 - TDE • Based on Description Logic (DL) • Automated classification • Identifies redundancy • Provably consistent terminology • Collaborative features • Distributed authoring • Workflow • Conflict identification / resolution • Version control • Customizable interface and constraints
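1 – Automatic Classification • Heart part-of Body • Mitral Valve part-of Heart • Cardiac Disease is-a Disease • Cardiac Disease affects Heart • Mitral Stenosis is-a Disease • Mitral Stenosis affects Mitral Valve • Mitral Stenosis is-a Cardiac Disease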
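The idea behind automatic classification can be sketched with a toy rule: a disease is placed under a more general disease when it shares that disease's parents and affects a site that lies within the general disease's site. This is only an illustration under that single assumed rule, not a description-logic reasoner and not how TDE is implemented. The relation tables are a reading of the diagram, and the function names are hypothetical.

```python
# Asserted relationships, read from the diagram. The rule below is a toy.
IS_A = {"Cardiac Disease": {"Disease"}, "Mitral Stenosis": {"Disease"}}
PART_OF = {"Heart": "Body", "Mitral Valve": "Heart"}  # assumed acyclic
AFFECTS = {"Cardiac Disease": "Heart", "Mitral Stenosis": "Mitral Valve"}


def ancestors(concept, is_a):
    """All concepts reachable by following is-a links upward."""
    seen = set()
    stack = list(is_a.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def within(part, whole, part_of):
    """True if `part` is `whole` or a transitive part of it."""
    while part is not None:
        if part == whole:
            return True
        part = part_of.get(part)
    return False


def classify(is_a, part_of, affects):
    """Return the set of inferred (child, parent) is-a pairs."""
    inferred = set()
    for general, general_site in affects.items():
        for specific, specific_site in affects.items():
            if specific == general:
                continue
            known = ancestors(specific, is_a)
            shares_parents = is_a.get(general, set()) <= known
            site_inside = within(specific_site, general_site, part_of)
            if shares_parents and site_inside and general not in known:
                inferred.add((specific, general))
    return inferred


inferred = classify(IS_A, PART_OF, AFFECTS)
```

With the tables above, the only inferred pair is that Mitral Stenosis is-a Cardiac Disease: the mitral valve is part of the heart, so a disease affecting the valve is a disease affecting the heart. Nobody had to assert that link by hand, which is the point of the slide.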
2 – Terminology Deployment • Terminology servers reduce costs of terminology acquisition, integration and management • Applications • EMRs and CDRs: NextGen, VA • Knowledge repositories: CDC, NCI • Healthcare information portals: HKHA • Diagram labels: Customize, Deploy, Applications
2 – What is a Terminology Server? A terminology server is • a networked software component • that centralizes terminology content and reasoning • to provide (complete, consistent and effective) terminology services for other network applications
2 – How is a Terminology Server Used? • By informaticists to create, maintain, localize and map terminologies • By clinical applications and their users to select and record standardized data • By integration engines to map data elements between applications
2 – Examples of Terminology Services • Term/name normalization: What is the SNOMED CT name for heart attack? → Myocardial Infarction • Code translation: What is the ICD-9 code for Myocardial Infarction? → 410.9 • Grouping and aggregation: Is Myocardial Infarction a Cardiac Disease? → Yes • Clinical knowledge: What drug treats Myocardial Infarction? → Streptokinase • Local information: Add L227 as the local code for Serum Calcium. → OK
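The five example services can be mimicked by a small in-memory sketch. The class `ToyTerminologyServer` and its method names are hypothetical and are not the DTS API; the content is limited to the question and answer pairs on the slide.

```python
class ToyTerminologyServer:
    """In-memory stand-in for a terminology server (illustrative only)."""

    def __init__(self):
        self.synonyms = {"heart attack": "Myocardial Infarction"}
        self.codes = {("Myocardial Infarction", "ICD-9"): "410.9"}
        self.parents = {"Myocardial Infarction": {"Cardiac Disease"}}
        self.treatments = {"Myocardial Infarction": {"Streptokinase"}}
        self.local_codes = {}

    def normalize(self, term):
        """Term/name normalization: map a synonym to its preferred name."""
        return self.synonyms.get(term.strip().lower())

    def translate(self, concept, code_system):
        """Code translation: look up the code for a concept in a code system."""
        return self.codes.get((concept, code_system))

    def is_a(self, concept, ancestor):
        """Grouping and aggregation: follow parent links transitively."""
        stack, seen = [concept], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False

    def treated_by(self, concept):
        """Clinical knowledge: drugs recorded as treating the concept."""
        return sorted(self.treatments.get(concept, ()))

    def add_local_code(self, code, concept):
        """Local information: register a site-specific code for a concept."""
        self.local_codes[code] = concept
        return "OK"


server = ToyTerminologyServer()
```

Each method corresponds to one row of the slide. Centralizing these lookups behind one component is what lets many applications share the same answers instead of each keeping its own tables.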
2 – Apelon’s DTS Product • Integrated repository for all terminologies • Varying release cycles → regular releases • Inconsistent data models → common object model • Independent views → integrated view with mappings • Current snapshot → version management • Extensible with local terminology and maps • Subsets • Easy subscription updates (with exception reports) • Desktop editor and webtop browser • Workflow support • Flexible import, export and integration • Open source
Terminology Server Standards • OMG’s Lexicon Query Services (LQS) • AKA TQS • Health Level Seven (and ANSI) Common Terminology Services (CTS) • In ISO Standardization as well • CTS-II • In process • Led by Apelon
DTS and Standards • CTS wrapper for DTS is available • Intel Healthcare SOA using DTS for CTS extensions • Currently ahead of CTS-II • Will be fed back into CTS-II
2 – Knowledge Base (KB) • Clinical (SNOMED CT) • Reimbursement (ICD, CPT, HCPCS) • Pharmaceuticals (Multum, NDF-RT) • Labs (LOINC) • Nursing (NIC, NOC, and NANDA) • Adverse events (MedDRA, COSTART, WHOART) • Extensive crosswalks • Mappings to MeSH and UMLS CUIs • Local additions
2 – Software Architecture • DTS Server • DTS Database • Tomcat (DTS Client) • DTS Editor • DTS Browser • DTS Client Application