http://metacognition.info/presentations/SW-usecases-outcomes-research.ppt Semantic Web use cases in outcomes research • Experiences from building a patient repository and developing standards • Chimezie Ogbuji, Metacognition Inc. (Owner)
Outline • Me • Semantic Web and Semantic Web technologies • RDF, GRDDL, OWL, RIF, and SPARQL • Cleveland Clinic Semantic DB project • Content repository • Data collection workflow • Quality and outcomes reporting • Cohort identification • Use of the system
Me and Semantic Web • I’ve been developing software using standards of the Semantic Web since 2001 • Began working on Cleveland Clinic SemanticDB project in 2003 • Began working in the World Wide Web Consortium (W3C), developing the SPARQL and GRDDL standards in 2007 and 2006, respectively • I contribute to and maintain several open source software projects related to Semantic Web technologies: • RDFLib (https://code.google.com/p/rdflib/) • FuXi (https://code.google.com/p/fuxi/) • Akamu (https://code.google.com/p/akamu/)
The Semantic Web • The Semantic Web • A vision of how the existing WWW can be extended such that machines can interpret the meaning of data involved in protocol interactions • A vision of the founder of the World Wide Web Consortium (W3C) and inventor of the World Wide Web (Tim Berners-Lee) • Semantic Web technologies / standards • A technological roadmap that attempts to realize this vision • Layers of W3C standards (“Layer cake”)
http://www.bnode.org/blog/2009/07/08/the-semantic-web-not-a-piece-of-cake
“Focus” standards • Resource Description Framework (RDF) • Gleaning Resource Descriptions from Dialects of Languages (GRDDL) • SPARQL Protocol and RDF Query Language (SPARQL) • Web Ontology Language (OWL)
RDF • A framework for representing information in the Web. • Motivation • machine interpretable metadata about web resources • mashup of application data • automated processing of web information by software agents • Graph data model (directed, labeled graph) • Nodes and links are labeled with URIs • Some nodes are not labeled (Blank nodes) • Links are called RDF sentences or triples http://www.w3.org/TR/rdf-concepts/
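The graph model above can be sketched with plain Python tuples. This is only an illustration, not how any particular RDF library stores data; every URI under http://example.org/ and the helper names `objects` and `is_blank` are assumptions made for the example.

```python
# A minimal sketch of the RDF graph model using plain Python tuples.
# All URIs under http://example.org/ are hypothetical.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A graph is a set of (subject, predicate, object) triples.
# "_:b0" stands for a blank node: a node that carries no URI label.
graph = {
    (EX + "patient/1", RDF_TYPE, EX + "Patient"),
    (EX + "patient/1", EX + "hasOperation", "_:b0"),
    ("_:b0", RDF_TYPE, EX + "Operation"),
}


def objects(graph, subject, predicate):
    """Follow every link labeled `predicate` that leaves `subject`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def is_blank(node):
    """Blank nodes are written with the conventional '_:' prefix here."""
    return node.startswith("_:")
```

Calling `objects(graph, EX + "patient/1", EX + "hasOperation")` follows one labeled link and returns the blank node that stands for the unnamed operation.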
GRDDL • A protocol for sowing semantics in structured (XML) web content for harvest • Vast amount of latent semantics in web documents • Web content today is primarily built for human consumption http://www.w3.org/TR/grddl/
Faithful Rendition “By specifying a GRDDL transformation, the author of a document states that the transformation will provide a faithful rendition in RDF of information (or some portion of the information) expressed through the XML dialect used in the source document.” • Licenses an interpretation of an XML document that is certified by the author
Architectural value • XML is well suited for messaging, data collection, and structural validation • RDF is well suited for expressive logical assertions, querying, and inference • RDF graphs can be created, updated, deleted, etc. (managed) using a particular XML vocabulary • vocabulary can be specific to a particular purpose • GRDDL facilitates mutually beneficial use of XML and RDF processing and representation
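A rough sketch of the XML-to-RDF step follows. GRDDL transformations are typically written in XSLT; since the Python standard library has no XSLT processor, this example substitutes an ordinary Python function for the transformation. The XML dialect, the namespace, and the property names are all hypothetical.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical purpose-specific XML dialect for a patient record.
RECORD_XML = """
<patientRecord id="record/42">
  <operation id="op/7" name="CABG"/>
  <operation id="op/8" name="Valve repair"/>
</patientRecord>
"""


def transform(xml_text):
    """Stand-in for a GRDDL transformation: XML in, set of RDF triples out."""
    root = ET.fromstring(xml_text.strip())
    record = EX + root.get("id")
    triples = {(record, RDF_TYPE, EX + "PatientRecord")}
    for op in root.findall("operation"):
        op_uri = EX + op.get("id")
        triples.add((record, EX + "hasOperation", op_uri))
        triples.add((op_uri, RDF_TYPE, EX + "Operation"))
        triples.add((op_uri, EX + "name", op.get("name")))
    return triples


triples = transform(RECORD_XML)
```

The XML document remains the unit that is validated and managed, while the derived triples are what gets queried and reasoned over.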
SPARQL • The query language for RDF content • It operates over an RDF dataset • Composed of named RDF graphs and a single RDF graph without a name • Operationally and structurally similar to SQL • Many implementations (including the one we used) build on existing relational database management systems • Translate SPARQL queries into SQL queries Elliott et al. A complete translation from SPARQL into efficient SQL. 2009 http://www.w3.org/TR/sparql11-query/
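To make the SPARQL-to-SQL idea concrete, here is a toy translator for a single basic graph pattern over a one-table triple store in SQLite. It is a sketch under simplifying assumptions, not the algorithm of Elliott et al. and not the implementation used in SemanticDB. The table layout, the function name `bgp_to_sql`, and the prefixed names are hypothetical.

```python
import sqlite3


def bgp_to_sql(patterns, select):
    """Translate triple patterns (terms starting with '?' are variables)
    into one SQL query that self-joins the `triples` table once per pattern."""
    first_seen = {}  # variable -> first column it was bound to
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip("spo", pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant term becomes a parameterized equality test.
                where.append(f"{ref} = ?")
                params.append(term)
    columns = ", ".join(first_seen[v] for v in select)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT DISTINCT {columns} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:record1", "ex:hasOperation", "ex:op1"),
    ("ex:op1", "rdf:type", "ex:CABG"),
    ("ex:record2", "ex:hasOperation", "ex:op2"),
    ("ex:op2", "rdf:type", "ex:ValveRepair"),
])

# SPARQL: SELECT ?rec WHERE { ?rec ex:hasOperation ?op . ?op rdf:type ex:CABG }
sql, params = bgp_to_sql(
    [("?rec", "ex:hasOperation", "?op"), ("?op", "rdf:type", "ex:CABG")],
    select=["?rec"],
)
rows = db.execute(sql, params).fetchall()
```

Each triple pattern maps to one alias of the triples table, and each shared variable maps to a join condition, which is why the two languages line up so naturally.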
OWL • Language for describing and constraining the semantics of an RDF vocabulary • Such constraints (often hierarchical) are called ontologies • An ontology specifies a conceptualization of a particular domain as categories, relationships between them, and constraints on both. • By defining an OWL document for the terms in an RDF graph, additional RDF sentences can be inferred • Additionally, an RDF graph can be determined to be consistent or inconsistent with respect to the ontology • Both tasks can be done by a logical reasoning engine
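The two reasoning tasks above, inferring additional sentences and checking consistency, can be sketched with a tiny forward-chaining loop. It covers only subclass inheritance and class disjointness, a very small fragment of what an OWL reasoner handles, and the class names are hypothetical.

```python
TYPE, SUBCLASS, DISJOINT = "rdf:type", "rdfs:subClassOf", "owl:disjointWith"


def infer(triples):
    """Forward-chain two rules to a fixed point: subClassOf is transitive,
    and instances of a class are also instances of its superclasses."""
    inferred = set(triples)
    while True:
        new = set()
        for (a, p, b) in inferred:
            if p != SUBCLASS:
                continue
            for (s, q, o) in inferred:
                if q == SUBCLASS and s == b:
                    new.add((a, SUBCLASS, o))
                if q == TYPE and o == a:
                    new.add((s, TYPE, b))
        if new <= inferred:
            return inferred
        inferred |= new


def is_consistent(triples):
    """Inconsistent if any resource belongs to two disjoint classes."""
    closed = infer(triples)
    types = {}
    for (s, p, o) in closed:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    return not any(
        a in ts and b in ts
        for (a, p, b) in closed if p == DISJOINT
        for ts in types.values()
    )


ontology = {
    ("ex:CABG", SUBCLASS, "ex:CardiacSurgery"),
    ("ex:CardiacSurgery", SUBCLASS, "ex:Operation"),
    ("ex:Operation", DISJOINT, "ex:Patient"),
}
data = {("ex:op1", TYPE, "ex:CABG")}
closed = infer(ontology | data)
```

Here `closed` gains the sentence that `ex:op1` is an `ex:Operation`, and asserting that `ex:op1` is also an `ex:Patient` would make `is_consistent` return `False`.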
Semantic Database (SDB) • Cleveland Clinic’s Heart and Vascular Institute (HVI) • Challenges: • fragmented gathering and storing of clinical research data • compartmentalization of medical science and practice • clinical knowledge is typically expressed in ambiguous, idiosyncratic terminology • problematic for longitudinal patient data that can feasibly span multiple, geographically separated sources and disciplines • Longitudinal patient record: • patient records from different times, providers, and sites of care that are linked to form a lifelong view of a patient’s health care experience http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/
Project goals • Create a framework for context-free data management • Usable for any domain with nothing (or little) assumed about the domain • Expert-provided, domain-specific knowledge is used to control most aspects of • Data entry • Storage • Display • Retrieval • Formatting for external systems
Components • Content repository • supports data collection, document management, and knowledge representation for use in managing longitudinal clinical data • manages patient record documents as XML and converts them to RDF graphs for downstream semantic processing • Data collection workflow • process of transcribing details of a heart procedure from the EHR into a registry • RDF used as the state machine of a workflow engine Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting. 2012 Ogbuji. A Role for Semantic Web Technologies in Patient Record Data Collection. 2009
Workflow State as RDF Dataset • Each task is an XML document in a content repository • Mirrored into a named RDF graph that shares a web location (the name) with the document • (SPARQL) query is dispatched against a workflow dataset to find tasks in particular states or assigned to particular people • Applications interact with task information and fetch: • JSON and XML representations (for client-side web applications) • XHTML documents that render as faceted views of a collection of tasks • faceted view includes links to subsequent stages in workflow and into other web applications on server
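A simplified picture of the workflow dataset: one named graph per task, keyed by the task document's URL, with a lookup that plays the role of the SPARQL query. The URLs, the `wf:` property names, and the function `find_tasks` are hypothetical; the real system dispatches SPARQL against an RDF store rather than scanning Python sets.

```python
# Named graphs: graph name (the task document's URL) -> set of triples.
# All URLs and property names below are hypothetical.
T1 = "http://example.org/tasks/1"
T2 = "http://example.org/tasks/2"
T3 = "http://example.org/tasks/3"

dataset = {
    T1: {(T1, "wf:state", "wf:Pending"), (T1, "wf:assignedTo", "ex:alice")},
    T2: {(T2, "wf:state", "wf:Complete"), (T2, "wf:assignedTo", "ex:alice")},
    T3: {(T3, "wf:state", "wf:Pending"), (T3, "wf:assignedTo", "ex:bob")},
}

# Roughly the query this stands in for:
#   SELECT ?task WHERE {
#     GRAPH ?task { ?task wf:state wf:Pending ; wf:assignedTo ex:alice }
#   }


def find_tasks(dataset, state=None, assignee=None):
    """Return names of task graphs matching the given state and/or assignee."""
    matches = []
    for name in sorted(dataset):
        graph = dataset[name]
        if state is not None and (name, "wf:state", state) not in graph:
            continue
        if assignee is not None and (name, "wf:assignedTo", assignee) not in graph:
            continue
        matches.append(name)
    return matches


pending_for_alice = find_tasks(dataset, state="wf:Pending", assignee="ex:alice")
```

Because the graph name doubles as the task document's web location, each result can be dereferenced directly to fetch the task's XML, JSON, or XHTML representation.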
Reporting challenges • Reporting places a heavy burden on institutions to produce data in specific formats with precise definitions • Definitions vary across reports • makes it difficult to use the same source data for all reports • Institutions are typically forced to manually abstract the data for each report • This is done separately to conform to the requirements for each report Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting. 2012
Components: reporting • Quality and outcomes reporting • generate outcomes reports both for internal and external consumption • internal reports are generated monthly and external reports are generated quarterly • quarterly reports submitted to Society of Thoracic Surgeons (STS) Adult Cardiac Surgery National Database and American College of Cardiology (ACC) CathPCI Database • submissions are required for certification Pierce et al. SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting. 2012
Cohort identification • SPARQL and RDF datasets are well-suited as infrastructure for a longitudinal patient record data warehouse • HVI software development team partnered with Cycorp to build a cohort identification interface called the Semantic Research Assistant (SRA) • Based on the Cyc inference engine • a powerful reasoning system and knowledge base with built-in capability for natural language (NL) processing, forward-chaining inference, and backward-chaining inference • incorporates Cyc's NL processing to permit a user to compose a cohort selection query by typing an English sentence or sentence fragment Lenat et al. Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries. 2010.
RDF dataset warehouse • CycL to SPARQL • domain-specific medical ontologies in conjunction with the Cyc general ontology are used to convert the NL query into a formal representation and then into SPARQL queries • SPARQL queries are submitted to the SemanticDB RDF store for execution • Cleveland Clinic’s registry of 200,000 patient records comprises an RDF graph of roughly 80 million RDF assertions
Dataset topology • An RDF dataset with no default graph and one named graph per patient record (a patient record graph) • Beyond identifying the cohort, most subsequent query processing happens within a single patient record graph • In our vocabulary, there are instances of PatientRecord, Operation, Patient, MedicalEvent, HospitalEpisode, etc. • PatientRecord resources share a URI with their containing graph
GRAPH operator can be used to optimize the search space • Optimal for the following cohort querying paradigm • Constraints in the first part of the query are cross-graph and those in the second part are intra-graph
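The two-part paradigm can be illustrated with a small simulation: a cross-graph scan selects the cohort, and all follow-up retrieval touches only one patient record graph at a time. The query text, class names, and helper functions are hypothetical and are not taken from SemanticDB; the SPARQL string is shown for shape only and is not executed.

```python
# Hypothetical SPARQL in the shape described above: ?record ranges over all
# named graphs, and the patterns inside GRAPH are confined to that one graph.
COHORT_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?record ?date WHERE {
  GRAPH ?record {
    ?record a ex:PatientRecord .
    ?op a ex:CABG ; ex:date ?date .
  }
}
"""

# One named graph per patient record; the record shares its graph's name.
dataset = {
    "ex:record1": {
        ("ex:record1", "rdf:type", "ex:PatientRecord"),
        ("ex:op1", "rdf:type", "ex:CABG"),
        ("ex:op1", "ex:date", "2010-03-01"),
    },
    "ex:record2": {
        ("ex:record2", "rdf:type", "ex:PatientRecord"),
        ("ex:op2", "rdf:type", "ex:ValveRepair"),
        ("ex:op2", "ex:date", "2011-01-15"),
    },
}


def cohort(dataset, operation_class):
    """Part 1 (cross-graph): scan every patient record graph for a match."""
    return sorted(
        name for name, graph in dataset.items()
        if any(p == "rdf:type" and o == operation_class for (_, p, o) in graph)
    )


def operation_dates(dataset, record):
    """Part 2 (intra-graph): read only the single graph named by `record`."""
    return sorted(o for (_, p, o) in dataset[record] if p == "ex:date")


records = cohort(dataset, "ex:CABG")
dates = {record: operation_dates(dataset, record) for record in records}
```

Once the cohort is fixed, the second part never has to look outside the selected graphs, which is where the reduction in search space comes from.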
Use of system • From 2009 through June of 2011 • over 200 clinical investigations utilized SemanticDB to identify study cohorts and retrieve appropriate data for analysis • studies ranged from relatively simple feasibility assessments to extremely complex investigations of time-related events and competing risks of the patient experiencing a certain outcome after treatment • prior cohort identification and data export queries for studies would have been performed by a skilled database administrator (DBA) interpreting instructions from domain experts • Using SemanticDB and the SRA, a non-technical domain expert performed most of the queries