OASIS Electronic Trial Master File Standard Technical Committee Content Classification Layer. January 20, 2014 9:00 – 10:00 AM PST. Agenda. Roll Call. Meeting Etiquette. Announce your name prior to making comments or suggestions Keep your phone on mute when not speaking (#6)
OASIS Electronic Trial Master File Standard Technical Committee Content Classification Layer January 20, 2014 9:00 – 10:00 AM PST
Meeting Etiquette • Announce your name prior to making comments or suggestions • Keep your phone on mute when not speaking (#6) • Do not put your phone on hold • Hang up and dial in again when finished with your other call • Hold = Elevator Music = very frustrated speakers and participants • Meetings will be recorded and posted • Another reason to keep your phone on mute when not speaking! • Use the join.me “Chat” feature for questions / comments / Votes • We will follow Robert’s Rules of Order From eTMF Std TC to Participants: Hi everyone: remember to keep your phone on mute NOTE: This meeting is being recorded and minutes will be posted on TC page after the meeting
Outreach Subcommittee • Status – New Members: • Oracle – Joined • In Progress: EMC, Kaiser Permanente, Shire, Medtronic • Activities / Milestones
Tech Discussion • Status • Timeline • In parallel with other Tech work from charter
Content Classification System Discussion • Classification System Components: • Classification Categories • Taxonomy, hierarchy • Metadata (‘Tags’) • Characterizes content • Content Model • Published set of classifications, metadata for a domain (e.g., eTMF)
Classification Categories Component: Classification Categories Hierarchy • Hierarchy of categories • Categories, subcategories, content types • Defined relationships with rules: Parent-Child • All categories and content types are required to have unique names and machine codes • Each content type is associated with Metadata Properties (includes core and domain-specific) • Content items are linked to content types • Unique classification and term codes based on Universal Decimal Classification System (UDC) numbering, widely used in libraries worldwide. Human and machine readable; expandable without a fixed limit • Can be described, edited and validated using an OWL editor (such as the open source editor Protégé) • Supports any simple text vocabulary, including TMF Ref Model and other vocabularies • W3C OWL2 and RDF/XML supported [Diagram: Study Digital Content]
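The parent-child and uniqueness rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Taxonomy` and `Node` classes, the dotted example codes, and the category names are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    code: str                      # machine code, must be unique
    name: str                      # human-readable name, must be unique
    parent_code: Optional[str] = None
    child_codes: List[str] = field(default_factory=list)


class Taxonomy:
    """Hypothetical registry enforcing unique names/codes and parent-child links."""

    def __init__(self) -> None:
        self._by_code: Dict[str, Node] = {}
        self._names = set()

    def add(self, code: str, name: str, parent_code: Optional[str] = None) -> Node:
        if code in self._by_code:
            raise ValueError(f"duplicate code: {code}")
        if name in self._names:
            raise ValueError(f"duplicate name: {name}")
        if parent_code is not None and parent_code not in self._by_code:
            raise ValueError(f"unknown parent: {parent_code}")
        node = Node(code, name, parent_code)
        self._by_code[code] = node
        self._names.add(name)
        if parent_code is not None:
            self._by_code[parent_code].child_codes.append(code)
        return node

    def path(self, code: str) -> List[str]:
        """Return the names from the top-level category down to `code`."""
        names = []
        current: Optional[str] = code
        while current is not None:
            node = self._by_code[current]
            names.append(node.name)
            current = node.parent_code
        return list(reversed(names))


taxonomy = Taxonomy()
taxonomy.add("100", "Example Primary Category")
taxonomy.add("100.10", "Example Subcategory", parent_code="100")
taxonomy.add("100.10.11", "Example Content Type", parent_code="100.10")
```

Calling `taxonomy.path("100.10.11")` walks the parent links upward, which is one way a tool could display where a content type sits in the hierarchy.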
Metadata Component • Used to tag or index digital content items Metadata Classes: Core – Comprises four areas: File Properties, Classification, Audit Trail, Business Process. Domain-specific – Metadata for a domain in life sciences such as eTMF, finance, legal administration, or others. Uses standards-based terms from groups like NCI. Org-specific – Metadata that meets an organization's needs; not standards based. General – Obtained from public standards-based vocabulary and terminology resources like Dublin Core. Annotation Properties – Metadata about classification categories and metadata: • Core, Org-Specific metadata [Example shown: Core Metadata – File Properties]
Content Model Component • Contains classification hierarchy and metadata in machine-readable format
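As a rough illustration of what "machine-readable" could look like, the sketch below serializes a tiny hierarchy to RDF/XML using only the Python standard library. Modeling each category as an `owl:Class` linked by `rdfs:subClassOf` is an assumption made for this example, and the base IRI `http://example.org/etmf#`, the function `build_model`, and the labels are hypothetical; the published content model may be structured differently.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace IRIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/etmf#"  # hypothetical base IRI for this sketch

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def build_model(entries):
    """Serialize (code, label, parent_code_or_None) tuples as RDF/XML text."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for code, label, parent in entries:
        cls = ET.SubElement(
            root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + code}
        )
        ET.SubElement(cls, f"{{{RDFS}}}label").text = label
        if parent is not None:
            # Parent-child rule expressed as a subclass relationship (assumption).
            ET.SubElement(
                cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent}
            )
    return ET.tostring(root, encoding="unicode")


xml_text = build_model([
    ("100", "Example Primary Category", None),
    ("100.10", "Example Subcategory", "100"),
])
print(xml_text)
```

A file produced this way could in principle be opened in an OWL editor for inspection, though whether it matches the committee's model would need to be checked against the specification.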
Classification System – Term Sources Term Sourcing Concepts: • Terms adopted by standards bodies should be used first in eTMF model Primary Term Sources for eTMF Classification System: • Internet Standards Dev Orgs: W3C, IETF, ISO, etc. • Required for interoperability of machine code • NIH NCI Thesaurus: Term database for FDA, CDISC, HL7, other orgs • Required for interoperability of clinical / health sciences data Secondary Term Sources for eTMF Classification System: • Industry sources – widely used terms in enterprise content mgmt software, TMF RM *Spec, Table 6, p21
Classification Categories Component: Classification Categories Hierarchy and Numbering [1] • Classification hierarchy and numbering is based on UDC library numbering standard and XML naming • Digital dot notation – Designed for human and machine readability • Each number is also a unique code for naming and ordering in the hierarchy • Primary Categories (PC): Three digit. eTMF: 100-200 • Subcategories (SC): Two digit: 10-99 • Content Types (CT): Two digit: 10-99 • Maximum number of Sub-Category divisions is 5, excluding the 3 digits for the Primary Category [1] Per spec section 2.1.1; 6.0 • Hierarchy Numbering/Naming Considerations: • Flexible, standards-based approach (W3C XML compliant naming*) • Ability to add multiple hierarchy divisions / levels • Proposed: 5 divisions = 100 × 90^5 ≈ 5.9 × 10^11 Content Types • Uniqueness of numbers – usable as machine code identifiers • Machine readable, human readable • No sorting issues, no need for leading zeros*, no special chars • *Leading zeros in XML syntax are ignored: http://www.w3.org/TR/REC-xml/
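The numbering rules on this slide lend themselves to a small validation sketch. The version below assumes that a code is written as a three-digit primary category followed by up to five dot-separated two-digit divisions in the range 10-99 (for example `100.10.11`); that concrete syntax, and the names `parse_code` and `capacity`, are assumptions for illustration rather than the specification's definitions.

```python
import re
from typing import Tuple

# Three-digit primary category, then 0 to 5 two-digit divisions (10-99).
# No segment may start with a zero, so no leading zeros are ever needed.
CODE_PATTERN = re.compile(r"[1-9][0-9]{2}(?:\.[1-9][0-9]){0,5}")


def parse_code(code: str) -> Tuple[int, ...]:
    """Split a dotted classification code into integer segments."""
    if CODE_PATTERN.fullmatch(code) is None:
        raise ValueError(f"not a valid classification code: {code!r}")
    return tuple(int(part) for part in code.split("."))


def capacity(primary_count: int = 100, divisions: int = 5) -> int:
    """Count distinct codes: each two-digit division offers 90 values (10-99)."""
    return primary_count * 90 ** divisions


codes = ["100.20.11", "100.10", "100", "100.10.15"]
ordered = sorted(codes, key=parse_code)
print(ordered)
print(capacity())
```

Because every segment has a fixed width, sorting the codes as plain strings gives the same order as sorting by their numeric segments, which is one reading of the "no sorting issues" point above.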
Classification Categories Component Numbering and Naming Scheme Numbering • Primary Categories and Sub-Categories : • Category Code number • Content Type: • Content Type ID Naming • Primary Categories and Sub-Categories • Simple text-based names • Unique name, 64 char limit • Abbreviation – 16 char limit suggested • Compatible with W3C XML naming standards : No special characters : ( ) < > ? / % # @ ! Example: Classification Categories Hierarchy, Naming, Numbering
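A naming check along these lines might look like the sketch below. It uses only the limits and the special characters listed on this slide; the function name `check_name` and the choice to compare names case-insensitively for uniqueness are assumptions made for illustration.

```python
from typing import Iterable, List, Optional

FORBIDDEN_CHARS = set("()<>?/%#@!")  # characters listed on the slide
MAX_NAME_LEN = 64                    # name limit from the slide
MAX_ABBREV_LEN = 16                  # suggested abbreviation limit


def check_name(
    name: str,
    existing_names: Iterable[str],
    abbreviation: Optional[str] = None,
) -> List[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LEN:
        problems.append(f"name exceeds {MAX_NAME_LEN} characters")
    bad = sorted(FORBIDDEN_CHARS & set(name))
    if bad:
        problems.append("name contains special characters: " + " ".join(bad))
    # Case-insensitive comparison is an assumption, not a stated rule.
    if name.lower() in {existing.lower() for existing in existing_names}:
        problems.append("name is not unique")
    if abbreviation is not None and len(abbreviation) > MAX_ABBREV_LEN:
        problems.append(f"abbreviation exceeds {MAX_ABBREV_LEN} characters")
    return problems


print(check_name("Site Management", ["Regulatory"], abbreviation="SiteMgmt"))
print(check_name("Site/Management", ["Regulatory"]))
```

Returning a list of problems rather than raising on the first one lets an editing tool show every violation at once.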
Classification Categories Component: Modifying Classification Category Entities – General Editing Rules Domain Specific: • Classifications cannot be deleted -> Reserve/Unreserve • Modifications allowed to some annotation properties (see spec) • Codes (Category Codes, CT Type ID) cannot be generated Organization Specific: • Classifications can be deleted • Modifications allowed for classification metadata, annotations • Codes (Category Codes, CT Type ID) can be generated [Table: Classification Category, Content Type Editing Rules*] **Annotation metadata *Spec, Table 6, p21
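The delete and code-generation rules above can be expressed as a small lookup table. This sketch covers only those two actions, since they are the ones stated for both scopes; the names `Scope`, `ALLOWED`, and `is_allowed` are hypothetical, and the Reserve/Unreserve mechanism and the annotation-property rules are deliberately left out because the slide defers to the spec for their details.

```python
from enum import Enum


class Scope(Enum):
    DOMAIN = "domain-specific"
    ORGANIZATION = "organization-specific"


# Only the two actions the slide states for both scopes are recorded here.
ALLOWED = {
    Scope.DOMAIN: {"delete": False, "generate_code": False},
    Scope.ORGANIZATION: {"delete": True, "generate_code": True},
}


def is_allowed(scope: Scope, action: str) -> bool:
    """Look up whether `action` is permitted for classifications in `scope`."""
    try:
        return ALLOWED[scope][action]
    except KeyError:
        raise ValueError(f"no rule recorded for action {action!r}") from None


print(is_allowed(Scope.DOMAIN, "delete"))
print(is_allowed(Scope.ORGANIZATION, "generate_code"))
```

Raising on an unrecorded action, instead of silently returning `False`, avoids implying a rule that the slide does not state.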
Classification Editing Tool – Free, Open Source Protégé (From Stanford University: http://protege.stanford.edu/ ) Protégé Editor: • Edit Classification Taxonomy and Metadata Terms • Validate Taxonomy and Term name compliance • Create valid RDF/XML Ontology
Classification Categories - Summary Proposed Classification System has the following Properties: • Based on Naming and Numbering that is W3C XML compliant • No special characters: ( ) & # @ / … etc. • No leading zeros in classification numbers • Based on Universal Decimal Classification (UDC) system for content classification: • 100-199 : eTMF Domain • UDC system used in 170+ countries worldwide; expandable, human and machine readable, sortable http://en.wikipedia.org/wiki/Universal_Decimal_Classification • Flexible and customizable for organizations, yet interoperable • Domain classifications – Standardized; Organization-specific classifications – Editable • Defined set of rules for editing and modifying the Taxonomy • Any Organization can Modify/Edit taxonomy using open source editors like Protégé
Classification System – Core Terms Content Classification System – Core Terms needed for Architecture – Objectives: • Classification, Subclassification concept • Supports RDF/XML, OWL languages • Non-domain specific, generic terms • Easily understandable by anyone – conveys concept • Conveys hierarchy • No conflicts – not a reserved term in RDF/XML, OWL or other compilers / IDEs • First priority – Source terms from standards bodies
Classification System – Core Terms Content Classification System – Core Terms needed for Architecture • Classification, Subclassification term concept: Proposed Term
Classification System – Core Terms Content Classification System – Core Terms needed for Architecture – Objectives: • Content Type concept • Supports RDF/XML, OWL languages • Non-domain specific, generic terms • Easily understandable by anyone – conveys concept • No conflicts – not a reserved term in RDF/XML, OWL or other compilers / IDEs • First priority – Source terms from standards bodies
Classification System – Core Terms Content Classification System – Core Terms needed for Architecture • Content Type term concept: Proposed Term
Draft Agenda: Next Meeting • Roll call • Reports • Outreach • Tech Discussion: Classification Layer: Core Metadata (Charter item 2, p.2) • New business