caTIES 2.0 APIII 2006 Rebecca Crowley Kevin Mitchell
Presentation Overview
• caTIES – Goals
• Tissue Banking Collaboration
• Grid Trust Fabric
• Concept coding and recoding
• Data stewardship, data sharing and honest brokering
• Interoperability within a grid community
University of Pittsburgh
caTIES – Goals
The Cancer Text Information Extraction System (caTIES) pilot project will focus on two important challenges of bioinformatics: (1) information extraction from free text and (2) access to tissue. Specifically, caTIES has four primary goals:
• Extract coded information from free text Surgical Pathology Reports (SPRs) using controlled terminologies to populate caBIG-compliant data structures.
• Provide researchers with the ability to query, browse, and acquire annotated tissue data and physical material across a network of federated sources.
• Provide a collaboration space in which researchers may construct and manage retrospective tissue distribution protocols.
• Pioneer research for distributed text information extraction within the context of caBIG. caTIES modules will be developed as generalized components available in caBIG, to encourage reuse by other caBIG projects that require tissue information extraction.
Tissue Banking Collaboration
• Administrator initiation of a Research Protocol – The IT system's Administrator is responsible for supporting the electronic capture of research information. The Administrator works with Researchers, Health Care Professionals, IRBs and others to establish repositories of electronic data, often categorized by study.
• Researcher case discovery and order generation – In conducting retrospective research studies based on tissue samples, Researchers examine free-text descriptions of those tissues or delegate the responsibility of gathering a tissue collection to Honest Brokers.
• Honest Broker order facilitation – Honest Brokers work with Tissue Bank personnel to acquire tissue and tissue-related materials, and with the courier system to deliver orders to Researchers. These orders often need to maintain a degree of atomicity.
Administrator – Create New Study
Administrator – Assign Organization Role
Administrator – Add User to Study as Role
Researcher Perspective
Researcher - Graphical Search Specification
Honest Broker – Verifies Physical Material
Honest Broker – Relays Order Status back to Researcher
Grid Trust Fabric
• Electronic Components (4 Pillars of security)
• Identity (DN or public key)
  – Isolation
  – Traceability
• Authentication (TLS handshake)
  – Prevent Identity Theft
• Authorization (gridmapfile or Globus+OGSA-AuthZ+Services)
  – Access Control
  – Resource Control
• Audit (logfiles)
  – Troubleshooting
  – Forensics
  – Accounting
Grid Trust Fabric (cont)
• Social Fabric
• Narrative De-Identification, defined by level or kind of De-Identification
• Narrative redactors
• Concept Coders
• Information Extraction to Synoptic Structures
• IRB must endorse federated environment
• Individuals must maintain a level of integrity
Current caTIES Security
Summary of caTIES' current security solution
• User Registration with IMS – GUMS
• User Registration with caTIES System – CTRM
• Authentication and Authorization – GUMS + CTRM
• User Access to caTIES Resources – caTIESClient
User Authentication - GUMS
User Authentication Scenario:
• Users log into the caTIES client with their GUMS username and password.
• The caTIES client securely connects to GUMS with the user's GUMS X.509 certificate and retrieves the GUMS user proxy.
• The caTIES client uses the user proxy to securely connect to the EVS service exposed by caTIES. This is essentially a connectivity check, and any caTIES secured service could be used.
User Authorization - CTRM
• CTRM contains user authorization information: how users are related to organizations. It classifies each user-organization relationship by one of the following roles: Researcher, Institution Honest Broker or Local Administrator.
• The CTRM service is responsible for issuing queries to the CTRM. When a user is authenticated, the caTIES Client sends the user proxy's distinguished name to the CTRM service as a query parameter.
• The CTRM service in turn fetches the user's role from CTRM and returns the role information to the client.
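The lookup described on this slide can be pictured as a map keyed by distinguished name. The following is a minimal sketch under stated assumptions, not the actual CTRM service: the class name CtrmRoleLookupSketch, its methods, and the example DN and organization are all hypothetical, and an in-memory map stands in for the CTRM datastore.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: an in-memory stand-in for the CTRM lookup.
final class CtrmRoleLookupSketch {

    // The three roles named on the slide.
    enum Role { RESEARCHER, INSTITUTION_HONEST_BROKER, LOCAL_ADMINISTRATOR }

    // distinguished name -> (organization -> role)
    private final Map<String, Map<String, Role>> rolesByDn = new HashMap<>();

    // Record that the user identified by this DN holds a role at an organization.
    void assign(String distinguishedName, String organization, Role role) {
        rolesByDn.computeIfAbsent(distinguishedName, k -> new HashMap<>())
                 .put(organization, role);
    }

    // Query by DN, as the client does after authentication.
    // Returns an empty map for a DN that has no registered roles.
    Map<String, Role> rolesFor(String distinguishedName) {
        return Collections.unmodifiableMap(
            rolesByDn.getOrDefault(distinguishedName, Collections.<String, Role>emptyMap()));
    }

    public static void main(String[] args) {
        CtrmRoleLookupSketch ctrm = new CtrmRoleLookupSketch();
        // Example DN and organization are made up for the demonstration.
        ctrm.assign("/O=Example/CN=Jane Doe", "Example Hospital", Role.RESEARCHER);
        System.out.println(ctrm.rolesFor("/O=Example/CN=Jane Doe"));
    }
}
```

Keying on the DN rather than the login name mirrors the slide: the client only ever presents the identity carried by the user proxy.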
De-Identification
• The caTIES De-Identification service scrubs pathology reports, creates de-identified identifiers, and loads the 'De-Identified' caTIES datastore
• The caTIES De-Identification service wraps the De-ID™ software, so the underlying scrubber is easy to switch
• Safe-Harbor method removes HIPAA-mandated identifiers
• Creates tokens for names and preserves temporal relationships
• De-ID will work with adopters as each site comes on-line
• Currently evaluating the open-source Harvard Scrubber as an option
Concept Coding and Recoding
• Changing dimensions necessitate recoding
  – Vocabulary revisions
  – Algorithmic enhancements and bug fixes
  – De-Identification redactor errors
• What is the necessary level of auditing for recoding?
Concept Coded Structured Data
Data stewardship, data sharing and honest brokering
• caTIES maintains data in three databases that are schematically equivalent but differ in their deployment location, security configuration, and the data being held. Each role has access to only a subset of these data sources:
• public datastore – (Researcher)
• private datastore – (Honest Broker)
• central tissue resource manager datastore – (Administrator, Researcher, Honest Broker)
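The role-to-datastore mapping on this slide can be written down as a small access table. This is a sketch that transcribes the three bullets literally; the class DatastoreAccessSketch, the enum names, and the canAccess method are hypothetical and are not part of caTIES, which enforces access through its deployment and security configuration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: the slide's access table as data.
final class DatastoreAccessSketch {

    enum Role { ADMINISTRATOR, RESEARCHER, HONEST_BROKER }

    // PUBLIC = public datastore, PRIVATE = private datastore,
    // CTRM = central tissue resource manager datastore.
    enum Datastore { PUBLIC, PRIVATE, CTRM }

    private static final Map<Datastore, Set<Role>> ALLOWED = new EnumMap<>(Datastore.class);

    static {
        // Transcribed from the slide; nothing beyond the three bullets is assumed.
        ALLOWED.put(Datastore.PUBLIC, EnumSet.of(Role.RESEARCHER));
        ALLOWED.put(Datastore.PRIVATE, EnumSet.of(Role.HONEST_BROKER));
        ALLOWED.put(Datastore.CTRM,
            EnumSet.of(Role.ADMINISTRATOR, Role.RESEARCHER, Role.HONEST_BROKER));
    }

    // True when the slide lists the role next to the datastore.
    static boolean canAccess(Role role, Datastore store) {
        return ALLOWED.get(store).contains(role);
    }

    public static void main(String[] args) {
        System.out.println(canAccess(Role.RESEARCHER, Datastore.PUBLIC));
        System.out.println(canAccess(Role.RESEARCHER, Datastore.PRIVATE));
    }
}
```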
caTIES Model
Three points for Data Access:
Interoperability within a grid community
• MDA - caBIG uses Model Driven Architecture to automatically generate Object Relational Mapping (ORM) middleware.
• Following caBIG's semi-automated guidelines for application development guarantees grid-compliant data services.
• caBIG annotates data and service interfaces with a conceptual ontology. This provides an environment for intelligent discovery and automatic data transformation.
caTIES Development Process
• Design UML Model in Enterprise Architect
• Metadata annotation using NCIT (public model only)
• CDEs are registered in the caDSR in the 'caBIG' context
• Run Model through caCORE SDK to generate API and caTIES Silver Application
• Implement the caTIES Client's functions against the API generated by the SDK
• Utilize caGrid SDK to generate Gold front-end to the caTIES Silver Application
caTIES Phase 2 Grid-Enabled [Public] Model
Development Process Summary
Access to caTIES Public Resources
• Dual Access to caTIES
  – Via caTIES Client
  – Via caGrid Gold API. The caTIES Gold Service provides programmatic access to caTIES' resources. The caGrid Browser implements this API to query resources.
Sample Query

Silver Format

import java.util.List;
import org.hibernate.criterion.DetachedCriteria;
import org.hibernate.criterion.Restrictions;

// Build a detached Hibernate criteria query that selects one report by UUID.
DetachedCriteria p = DetachedCriteria.forClass(PathologyReport.class);
p.add(Restrictions.like("uuid", "e44ddc0f-c589-11da-bbee-5103a71c2a47"));
// appService is the application service handle; obtaining it is not shown on the slide.
List resultList = appService.query(p, PathologyReport.class.getName());
for (int i = 0; i < resultList.size(); i++) {
    PathologyReport pr = (PathologyReport) resultList.get(i);
    pr.getDocumentText();
}

Gold Format

<caBIGXMLQuery name="MyQueryTest3">
  <Target name="edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport">
    <Objects name="edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport">
      <Property name="uuid" predicate="equal" value="e44ddc0f-c589-11da-bbee-5103a71c2a47"/>
    </Objects>
  </Target>
</caBIGXMLQuery>
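Because the Gold query is plain XML, a caller could assemble it from a class name and a single property constraint. The sketch below assumes only the element and attribute names visible in the sample above; the class GoldQuerySketch and its methods are hypothetical helpers, not part of the caGrid API, and the full query schema is likely richer than this one shape.

```java
// Hypothetical illustration only: builds the single-property query shape
// shown in the Gold Format sample.
final class GoldQuerySketch {

    // Returns a caBIGXMLQuery document with one "equal" predicate on one property.
    static String equalsQuery(String queryName, String className,
                              String property, String value) {
        return "<caBIGXMLQuery name=\"" + esc(queryName) + "\">\n"
             + "  <Target name=\"" + esc(className) + "\">\n"
             + "    <Objects name=\"" + esc(className) + "\">\n"
             + "      <Property name=\"" + esc(property)
             + "\" predicate=\"equal\" value=\"" + esc(value) + "\"/>\n"
             + "    </Objects>\n"
             + "  </Target>\n"
             + "</caBIGXMLQuery>";
    }

    // Escapes the characters that are unsafe inside a double-quoted XML attribute.
    static String esc(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(equalsQuery(
            "MyQueryTest3",
            "edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport",
            "uuid",
            "e44ddc0f-c589-11da-bbee-5103a71c2a47"));
    }
}
```

Escaping the attribute values keeps the generated document well-formed even when a search value contains quotes or angle brackets.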
Query run by caTIES Client
Query run through caGrid Browser
Equivalent Results
• Both methods return the same Pathology Report
[Screenshots: caGrid Browser, caTIES Client]
caDSR CDEs CAP Protocols
Shallow Structure Derivation Based on Conceptual Matching
Deep Structure Inference Based on Discourse Reasoning