Introduction to the caDSR Presented to HL7 Vocab SIG January 24, 2005

Introduction to the caDSR Presented to HL7 Vocab SIG January 24, 2005. Denise Warzel National Cancer Institute, Center for Bioinformatics caDSR Project Officer, Software Development. Presentation Outline. caCORE Overview ISO/IEC 11179 Overview caDSR Implementation and tooling.

  Introduction to the caDSR • Presented to HL7 Vocab SIG, January 24, 2005 • Denise Warzel, National Cancer Institute, Center for Bioinformatics • caDSR Project Officer, Software Development

  2. Presentation Outline • caCORE Overview • ISO/IEC 11179 Overview • caDSR Implementation and tooling

  3. caCORE Components • caCORE is the open-source foundation upon which the NCICB builds its research information management systems • Bioinformatics Objects • Data Standards • Enterprise Vocabulary

  4. caCORE Infrastructure wiring • Public APIs • Domain object metadata • Common data elements (CDEs) • Vocabulary for CDE specification • Dictionary, thesaurus services

  5. Presentation Outline • caCORE Overview • ISO/IEC 11179 Overview • caDSR Implementation and tooling

  6. Terms and Definitions for ISO/IEC 11179 • Administered Item: A registry item for which administrative information is recorded in an Administration Record. • Data Element: A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of attributes. • Data Element Concept: An idea that can be represented in the form of a data element, described independently of any particular representation. • Data Element Representation: The part of a data element having a value domain, datatype, and other representational specifications. • Representation Class: A classification of data elements based upon the type of representational form. • Value Domain: A set of attributes describing representational characteristics of instance data with or without enumerated permissible values. • Conceptual Domain: A set of possible value meanings of a data element expressed without representation. • Value Meaning: A member of the set of finite allowed inventory of notions that can be categorized for a conceptual domain. • Permissible Value: An expression of a value meaning in a specific value domain.

  7. What is ISO/IEC 11179? • ISO/IEC 11179 Parts 1-6: Information technology – Specification and Standardization of data elements • A metamodel for ‘data element’ metadata • Standard by which to convey semantic, syntactic and lexical meaning • Human and machine understandable • Unambiguous

  8. ISO/IEC 11179 Information technology Standard • ISO/IEC 11179 Part 1: Framework for the specification and standardization of data elements • ISO/IEC 11179 Part 2: Classification for data elements • ISO/IEC 11179 Part 3: Registry metamodel and basic attributes • ISO/IEC 11179 Part 4: Rules and Guidelines for the Formulation of Data Elements • ISO/IEC 11179 Part 5: Naming and Identification Principles for Data Elements • ISO/IEC 11179 Part 6: Registration of data elements • Publicly available from: • http://isotc.iso.ch/livelink/livelink/fetch/2000/2489/Ittf_Home/PubliclyAvailableStandards.htm??Redirect=1

  9. Basic Metamodel Components • [Diagram: UML class diagram relating Conceptual Domain, Data Element Concept, Value Domain, and Data Element] • Regions labeled: Perception, Representation • Associations labeled: data_element_concept_conceptual_domain_relationship, expression, specification, representation • Role names shown: +having, +specifying, +specified_by, +represented_by, +representing, +providing_representation_to, +providing_representation_for, +represented_with • Multiplicities shown: 1..1 and 0..*
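The four metamodel components can be pictured as plain records that point at one another. The following is a minimal sketch only, not the caDSR schema or API: the class and field names are invented for illustration, the multiplicities follow the 1..1 / 0..* pattern in the diagram, and the sample values are borrowed from the Chemopreventive Agent example elsewhere in this deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConceptualDomain:
    name: str
    value_meanings: List[str] = field(default_factory=list)


@dataclass
class DataElementConcept:
    name: str
    conceptual_domain: ConceptualDomain  # exactly one (1..1)


@dataclass
class ValueDomain:
    name: str
    conceptual_domain: ConceptualDomain  # exactly one (1..1)
    permissible_values: List[str] = field(default_factory=list)


@dataclass
class DataElement:
    name: str
    concept: DataElementConcept  # exactly one (1..1)
    value_domain: ValueDomain    # exactly one (1..1)


# Illustrative instances; the names come from the slide example, the
# structure around them is an assumption of this sketch.
agent = ConceptualDomain("Agent", ["Eflornithine", "Ursodiol"])
dec = DataElementConcept("Chemopreventive Agent", agent)
vd = ValueDomain("Chemopreventive Agent Name", agent, ["Eflornithine", "Ursodiol"])
de = DataElement("Chemopreventive Agent Name", dec, vd)

print(de.concept.conceptual_domain.name, "|", de.value_domain.name)
```

A Data Element here is simply the pairing of one Data Element Concept (the meaning) with one Value Domain (the representation); many Data Elements may share either side.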

  10. Why ISO/IEC 11179? • Answers the question “What is this datum?” • Provides concrete guidance on the creation and maintenance of discrete data element attributes and metadata (semantics), enabling the formulation of data elements in a consistent, standard manner • “Metadata Repository/Registry” • A framework for data element standardization and registration allows the creation of a shared data environment in much less time and with much less effort than conventional data management methodologies require • Adoption of 11179 allowed us to “get on with it”

  11. ISO/IEC 11179 Administered Items • Derivation_Rule

  12. ISO/IEC Administered Item: Administration Record and Common Attributes • Unique Identifier • Administrative Status • Registration Status • Creation Date • Administrative Note(s) • Effective Date • Change Date(s) • Change Description(s) • Origin • Until Date • Created By • Modified By • Name(s) • Definition(s) • Stewardship Information • Submitter Information • Reference Document(s) • Classifications
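A few of these common attributes can be sketched as a small record type. This is an illustration under stated assumptions, not the caDSR data model: the field names, the identifier `EXAMPLE-0001`, and the rule that an item is "in effect" between its Effective Date and Until Date (with either end optional) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AdministrationRecord:
    unique_identifier: str
    administrative_status: str
    registration_status: str
    creation_date: date
    effective_date: Optional[date] = None
    until_date: Optional[date] = None

    def is_effective(self, on: date) -> bool:
        """True when `on` falls inside the effective/until window.

        A missing effective_date or until_date is treated as an open end
        (an assumption of this sketch).
        """
        starts_ok = self.effective_date is None or self.effective_date <= on
        ends_ok = self.until_date is None or on <= self.until_date
        return starts_ok and ends_ok


# Placeholder values for illustration only.
record = AdministrationRecord(
    unique_identifier="EXAMPLE-0001",
    administrative_status="Released",
    registration_status="Standard",
    creation_date=date(2005, 1, 24),
    effective_date=date(2005, 1, 1),
    until_date=date(2005, 12, 31),
)

print(record.is_effective(date(2005, 6, 1)))
print(record.is_effective(date(2006, 1, 1)))
```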

  13. ISO/IEC 11179 NCICB Extensions • Form • Concept Class • The Concept Class Provides Semantic Linkage • Derivation_Rule

  14. caDSR Implementation of ISO/IEC 11179 Model • Context: caCORE • Classification Schemes: caDSRTraining • Conceptual Domain: Agent • Object: Agent • Property: Chemopreventive • Data Element Concept: Chemopreventive Agent • Representation: Name • Value Domain: Chemopreventive Agent Name • Valid Values: Cyclooxygenase Inhibitor, Doxercalciferol, Eflornithine, …, Ursodiol • Data Element: Chemopreventive Agent Name

  15. NCICB Concept Class Common Attributes • Concept Class • Administered Item attributes + • Concept Unique Identifier • Pointer to an externally defined concept • Concept Definition Source • Names the source terminology/ontology/vocabulary • Concept Relationship • Semantic order of the concepts • NOTE: ISO describes a ‘Concept Relationship’ as a semantic link among two or more concepts. There is a subtlety in our implementation: in caDSR we use concept relationships more as a derivation rule that names the order of the concepts, not as semantic relationships in the ontologic or object-model sense of ‘relationship’. • Object Class, Property, Representation term, Qualifier terms, Value Domains
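The idea of a concept relationship acting as a derivation rule (an ordering of concepts rather than a semantic link) can be sketched as follows. Everything here is hypothetical: the `ConceptRef` type, the `order` field, the placeholder identifiers, and the source name `EXAMPLE_VOCAB` are illustrative and are not taken from caDSR or EVS.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConceptRef:
    concept_id: str         # placeholder, not a real vocabulary code
    definition_source: str  # names the external terminology
    preferred_name: str
    order: int              # position of the concept in the derived name


def derive_name(concepts: List[ConceptRef]) -> str:
    """Join concept names in their stated order to build an item name."""
    ordered = sorted(concepts, key=lambda c: c.order)
    return " ".join(c.preferred_name for c in ordered)


# Deliberately listed out of order to show that `order` drives the result.
concepts = [
    ConceptRef("X-0002", "EXAMPLE_VOCAB", "Agent", order=1),
    ConceptRef("X-0001", "EXAMPLE_VOCAB", "Chemopreventive", order=0),
]

derived = derive_name(concepts)
print(derived)
```

The registry stores only pointers and an ordering; the definitions and any true semantic relationships stay in the external vocabulary.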

  16. Why are vocabularies/ontologies important? • Goal: “Semantically unambiguous interoperability” • Data Element curators are not necessarily vocabulary experts • NCI had a terminology and vocabulary services group: EVS • Semantic integration is achieved by tying standard vocabulary identifier codes to the caDSR metadata • ISO 11179 provides the framework; we were looking for something that could be computed without a human having to read and interpret definitions • We get this by abstracting the curation of concepts out of caDSR and relying instead on external vocabularies

  17. EVS and caDSR Distinctions • caDSR is a metadata repository • Maintains metadata that lets a user locate the correct data element, which defines the characteristics of a datum (an instance of a specific concept) in sufficient detail for it to be collected and stored on a computer • EVS is a terminology server • Provides services for synonymy, mapping between vocabularies, hierarchical structures, Subconcepts, Superconcepts, Roles, Semantic type, etc.

  18. Presentation Outline • caCORE Overview • ISO/IEC 11179 Overview • caDSR Implementation and tooling

  19. caDSR Overview • NCI Data Element Metadata repository and registry • Based on ISO/IEC 11179 • Designed to integrate with the caCORE infrastructure • Supports the development and deployment of Data Elements that are used as metadata descriptors, primarily for NCI-sponsored research, with an ever-widening range of end users • Available as an open-source download

  20. caDSR Tools • Goals of caDSR Tools development: • Simplify development and creation of ISO/IEC 11179 compliant metadata by Data Element Curators and UML Modelers • Simplify consumption of Data Elements by end users and application developers • Enhance reuse of Data Elements for all • Enable semantic consistency across research domains • Support metadata life-cycle and governance processes

  21. caDSR Home Page • [Screenshot: home page with sections for Curators, Developers, and General]

  22. Introduction to caDSR Tools • Access, Develop, Manage, Consume • CDE Browser to Search for and Download • Form Builder to Create user-specified collections of CDEs • Side-by-Side Compare • CDE Curation Tool to Create Data Elements • Admin Tool to Curate and Administer caDSR - “Power Users” • Sentinel Tool (3.0) • Generates end user ‘Alerts’ triggered by metadata changes • Batch Load to import Administered Items • Excel Loader (MS Excel) • UML Loader (XMI) • Case Report Form Loader (MS Excel)

  23. CDE Browser “CONTEXT Browsing” • View, Search, Download • Shopping cart feature • FormBuilder to Build / Download Forms and Data Elements • “Context Browsing” Tree • By Classification Schemes • By Forms • CDE Basic Search Criteria • Google-like search • Sortable search results by clicking on column headings Basic Search

  24. CDE Browser • Advanced Search Criteria • Leverages ISO attributes • Find all with “18254-3” permissible value • Find all with “Gene*” • Find all with “Released” workflow status • Find all with “Standard” Registration status • Etc. Advanced Search
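The advanced search criteria above amount to filtering on registry attributes, with `*` as a wildcard. The sketch below shows that idea over a hypothetical in-memory list; it does not call the CDE Browser or any caDSR API, and the record layout, the sample names, and the status values "Draft" and "Candidate" are invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical records; field names and values are illustrative only.
CDES = [
    {"name": "Gene Symbol", "workflow_status": "Released",
     "registration_status": "Standard", "permissible_values": []},
    {"name": "Gene Ontology Term", "workflow_status": "Draft",
     "registration_status": "Candidate", "permissible_values": []},
    {"name": "Lab Test Code", "workflow_status": "Released",
     "registration_status": "Standard", "permissible_values": ["18254-3"]},
]


def search(cdes, name="*", workflow_status=None,
           registration_status=None, permissible_value=None):
    """Return the CDEs that satisfy every criterion that was supplied."""
    hits = []
    for cde in cdes:
        if not fnmatchcase(cde["name"], name):  # case-sensitive wildcard match
            continue
        if workflow_status and cde["workflow_status"] != workflow_status:
            continue
        if registration_status and cde["registration_status"] != registration_status:
            continue
        if permissible_value and permissible_value not in cde["permissible_values"]:
            continue
        hits.append(cde)
    return hits


gene_hits = [c["name"] for c in search(CDES, name="Gene*")]
released_gene_hits = [c["name"] for c in
                      search(CDES, name="Gene*", workflow_status="Released")]
pv_hits = [c["name"] for c in search(CDES, permissible_value="18254-3")]
print(gene_hits, released_gene_hits, pv_hits)
```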

  25. Form Builder • Create and Manage Forms • Organize CDEs into modules within a Form • Attach documents in PDF or Word format • Classify Forms into groupings for specific end user communities • “Publish” / “Un-Publish” for Browser Catalog visibility • “Printer Friendly” version • Download CDEs

  26. CDE Side-by-Side Compare • Build shopping cart, compare CDE metadata side by side • Download to Excel spreadsheet

  27. Curation Tool • To Create, Edit or Version: • Data Element Concepts • Value Domains • Data Elements • ISO 11179 Wizard • Construct ISO compliant Data Elements by building up the pieces • Builds Names and Definitions from underlying components. • “Get Associated” • Leverage ISO to retrieve related CDEs • “Block Edit” • “shopping cart” • Assign classification schemes • Versioning

  28. Administration Tool • System Administration • User Accounts and Security • Lists of Values (LOVs) used in content creation • Create “Framework”: • Conceptual Domains • Classification Schemes (basis for organizing CDEs in Browser) • Protocols

  29. Sentinel Tool • Create “Alerts” • User-defined triggers based on data element metadata attributes • “Notify me of any change to the Value Domain for any CDE on the Adverse Event Form” • Generates and emails a report of changes matching “Alert” criteria
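The example alert above can be read as a comparison between two snapshots of metadata, restricted to the CDEs on one form. The following is a rough sketch of that logic only; it is not how the Sentinel Tool is implemented, and the snapshot layout, CDE names, and value domain labels are hypothetical. Emailing the report is left out.

```python
def value_domain_changes(before, after, form):
    """List (cde, old_vd, new_vd) for CDEs on `form` whose value domain changed."""
    changes = []
    for cde, old in before.items():
        new = after.get(cde)
        if new is None or form not in old["forms"]:
            continue  # CDE removed, or not on the watched form
        if old["value_domain"] != new["value_domain"]:
            changes.append((cde, old["value_domain"], new["value_domain"]))
    return changes


# Hypothetical snapshots taken before and after a curation session.
before = {
    "Adverse Event Grade": {"forms": ["Adverse Event Form"], "value_domain": "Grade v1.0"},
    "Patient Age": {"forms": ["Demographics Form"], "value_domain": "Age v1.0"},
}
after = {
    "Adverse Event Grade": {"forms": ["Adverse Event Form"], "value_domain": "Grade v2.0"},
    "Patient Age": {"forms": ["Demographics Form"], "value_domain": "Age v2.0"},
}

report = value_domain_changes(before, after, "Adverse Event Form")
for cde, old_vd, new_vd in report:
    print(f"{cde}: value domain changed from {old_vd} to {new_vd}")
```

Only the change on the watched form is reported; the Patient Age change is ignored because it falls outside the alert criteria.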

  30. Batch Loading • Excel Loaders • Formatted MS Excel worksheet • Administered Item • Form • UML Loader • XMI representation of a UML Class Diagram • Class → Object Class • Attribute → Property • Data Element Concept, Value Domain and Data Element derived from the above
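The UML-to-registry mapping can be sketched in a few lines. This is an illustration, not the UML Loader: it takes a class name and attribute list directly instead of parsing XMI, and both the naming pattern for derived items and the use of the attribute datatype as the value domain are assumptions made for the example.

```python
def derive_items(uml_class, attributes):
    """Map one UML class and its (attribute, datatype) pairs to registry items."""
    items = []
    for attr_name, datatype in attributes:
        dec_name = f"{uml_class} {attr_name}"  # Object Class + Property
        items.append({
            "object_class": uml_class,         # Class -> Object Class
            "property": attr_name,             # Attribute -> Property
            "data_element_concept": dec_name,
            "value_domain": datatype,          # assumed: datatype names the value domain
            "data_element": f"{dec_name} {datatype}",
        })
    return items


# Hypothetical UML class with two attributes.
items = derive_items("Gene", [("symbol", "String"), ("length", "Integer")])
for item in items:
    print(item["data_element_concept"], "->", item["data_element"])
```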

  31. Current User Base • Cancer Biomedical Informatics Grid (caBIG) – 820/466/180/61%* • Center for Cancer Research (CCR) – 821/573/506/12% • Clinical Data Interchange Standards Consortium (CDISC) – 3/0 • Center for Cancer Imaging (CIP) – 238/151/148/2% • Cancer Therapy Evaluation Program (CTEP) – 8029/2432/2428/0.1% • Division of Cancer Prevention (DCP) – 427/321/286/11% • National Heart, Lung, and Blood Institute (NHLBI) – 0/0 • Early Detection Research Network (EDRN) – 121/1/1/100% • Divisions of Population Sciences and Cancer Control (PS & CC) – 85/9 • Specialized Programs of Research Excellence (SPOREs) – 719/197/120/39% • Cancer Ontologic Research Environment (caCORE) – 1028/810/810/0% • * Total CDEs in this Context / “Released” workflow status / “Released” and developed by this context / “Reused” from other contexts
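The slide does not say how the "Reused" percentage is calculated. One reading that reproduces most of the listed figures is: released CDEs not developed by the context, divided by all released CDEs in the context. The sketch below applies that guessed formula to the rows it fits; the EDRN row (listed as 100%) and the CTEP row (listed as .1%) do not follow it exactly, so treat the formula as an assumption rather than the presenter's definition.

```python
def reuse_percent(released, released_own):
    """Share of released CDEs that were developed elsewhere (assumed formula)."""
    if released == 0:
        return 0.0
    return 100.0 * (released - released_own) / released


# (released, released and developed by this context), copied from the slide.
rows = {
    "caBIG": (466, 180),
    "CCR": (573, 506),
    "CIP": (151, 148),
    "DCP": (321, 286),
    "SPOREs": (197, 120),
    "caCORE": (810, 810),
}

rounded = {ctx: round(reuse_percent(rel, own)) for ctx, (rel, own) in rows.items()}
for ctx, pct in rounded.items():
    print(f"{ctx}: {pct}%")
```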

  32. Exploring • National Institute of Neurological Disorders and Stroke (NINDS) • National Icelandic Center for Oncology • Cancergrid – UK

  33. Operating Environments • Database Repository • Oracle 9i • Administration Tool • Oracle PL/SQL, Oracle 9i Application Server • CDE Browser • Java, Oracle 9i Application Server • CDE Curation Tool • Jakarta Tomcat

  34. Support • NCICB Help Desk • ncicb@pop.nci.nih.gov and telephone support • Bi-weekly Software meetings • Hosted by Denise Warzel • Teleconference and web-cast • Bi-weekly Content Development Meetings • Hosted by George Komatsoulis • Teleconference and web-cast • Open end user requirements meetings, design reviews and prototyping/feedback sessions • Training • Web-cast and teleconference

  35. Contact Information • caDSR Home Page • http://ncicb.nci.nih.gov/core/caDSR • caDSR Users ListServ • http://list.nih.gov to subscribe to caDSR_Users@list.nih.gov • caDSR Training Home Page • http://ncicb.nci.nih.gov/NCICB/core/caDSR/Training • caDSR Training ListServ • http://list.nih.gov to subscribe to caDSR_Training-L@list.nih.gov

  36. Documentation/Recommended Reading Materials • caDSR Homepage: • http://ncicb.nci.nih.gov/core/caDSR • caCORE User Application Manual: • ftp://ftp1.nci.nih.gov/pub/cacore/NCICBapplications/NCICBAppManual.pdf • caCORE Technical Guide: • ftp://ftp1.nci.nih.gov/pub/cacore/caCORE2.0_Tech_Guide.pdf – caDSR APIs • caDSR API Guide: • ftp://ftp1.nci.nih.gov/pub/cacore/caDSR/caCORE2.0_caDSR_API.pdf • caDSR Business Rules • http://ncicb.nci.nih.gov/NCICB/core/caDSR/BusinessRules • caDSR Content Meetings • http://ncicb.nci.nih.gov/NCICB/core/caDSR/Content • caDSR_Users List serv subscribe: • http://list.nih.gov • Send Request for caDSR Account to: ncicb@pop.nci.nih.gov

  37. caDSR Tools Team • NCICB: Peter Covitz, Denise Warzel • ScenPro: Bill McCurry, Tom Phillips, Robert Harding, Jennifer Brush, Larry Hebel, Smita Hastak • Oracle: Edmond Mulaire, Ram Chilukuri, Prerna Aggarwal, Dan Ladino, Christophe Ludet, Shaji Kakkodi, Jane Jiang • SAIC: Kathleen Gundry, Tommie Curtis, Brenda Maeske

