Connecticut is Data Rich but Information Poor. Our Vision: Connecting the Silos. Topics: How PATH Works; Example of PATH installed as P20WIN; PATH vs Desktop Integrator. PATH Presentation, CT Data Collaborative, June 2014.

PATH Presentation
CT Data Collaborative, June 2014
• How PATH Works
• Example of PATH installed as P20WIN
• PATH vs Desktop Integrator

How PATH Works
• Virtual Data Warehouse
• Identity Resolution across multiple sources that don’t share a Gold Standard Identifier
• HIPAA and FERPA Compliant
• Always transfers Fact data separately from Demographic data or Personally Identifiable Information
• Data Owners control which data is exported to a location outside of their data center
• Data Owners approve all queries

PATH History
Completed Phases
• 2007 - Established in Statute - Public Act 07-02
• 2008 - Initial Development as CHIN, inclusion of 4 initial data sources
• 2009 - Implemented advanced record linkage in a virtual data warehouse
• 2011 - Scalability to 1M+ individuals, ability to add additional data sources and manage metadata without code modifications, unlimited data sources
• 2014 - Implemented for P20WIN: 40M Records, 1.6B Data Elements
Now Available to CT Agencies and Organizations as PATH

Data Categories
• People Records
  • Demographic Information such as Name, Address, SSN, DOB, etc.
  • Also known as PII (Personally Identifiable Information)
• Fact Records
  • Education, Health, Labor, etc.: information about a person BUT without the PII
  • De-Identified or Anonymized Data

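The two data categories can be illustrated with a minimal Python sketch. It assumes a flat agency row and a hypothetical set of PII field names (`PII_FIELDS`); the slides do not define a schema, and `split_record` is an illustrative helper, not part of PATH.

```python
# Hypothetical PII field names; the slides list Name, Address, SSN, DOB as examples.
PII_FIELDS = {"name", "address", "ssn", "dob"}

def split_record(record_index, row):
    """Split one agency row into a People Record (PII) and a Fact Record.

    Both halves carry only the local record index, so the Fact Record
    holds no personally identifiable information.
    """
    people = {k: v for k, v in row.items() if k in PII_FIELDS}
    fact = {k: v for k, v in row.items() if k not in PII_FIELDS}
    people["record_index"] = record_index
    fact["record_index"] = record_index
    return people, fact

row = {"name": "Ann Lee", "dob": "1990-01-02", "ssn": "000-00-0000",
       "degree": "BS", "year": 2012}
people, fact = split_record(7, row)
```

Here `fact` keeps only `degree`, `year` and the record index, which is the property that lets Fact data travel separately from PII.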
Step 1
PATH Remote Software installed at each Participating Agency
• Agency Data Steward uses the PATH Metadata Editor to Identify:
  • Table/Record Schema of Agency Data
  • Data at the Field or Table Level marked Available or Unavailable for Download
  • Common Data Element fields used for linking records - provides Identity Resolution across the different sources
[Diagram: SDE, CCC, CSU and DOL, each with Agency Data and a Metadata Editor & ETL]

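What a Data Steward records in Step 1 might be modeled as follows. This is only a sketch: the `FieldMeta` class, the column names and the Common Data Element names are all hypothetical, since the slides do not show the Metadata Editor's actual format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FieldMeta:
    name: str                             # column name in the agency table
    downloadable: bool                    # marked Available (True) or Unavailable (False) for Download
    common_element: Optional[str] = None  # Common Data Element used for linking, if any

# Hypothetical schema for one agency table.
SCHEMA = [
    FieldMeta("stu_last_nm", False, "last_name"),
    FieldMeta("stu_birth_dt", False, "dob"),
    FieldMeta("grad_year", True),
    FieldMeta("discipline_notes", False),
]

def downloadable_fields(schema):
    """Names of fields that may leave the agency data center."""
    return [f.name for f in schema if f.downloadable]

def linking_fields(schema):
    """Map each Common Data Element to the local column that supplies it."""
    return {f.common_element: f.name for f in schema if f.common_element}
```

In this sketch a field can feed record linkage without being downloadable, which mirrors the separation between linking on PII and exporting only Fact data.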
Step 2
During Remote Initialization the Extract/Transform/Load function of PATH builds a Record Index of the People Records from each Data Source
[Diagram: SDE, CCC, CSU and DOL, each with Agency Data, a Metadata Editor & ETL, and a Record Index]

Step 3
PATH Software installed at a Main Location - for P20WIN this location is DAS/BEST
[Diagram: SDE, CCC, CSU and DOL, each with Agency Data, a Metadata Editor & ETL, and a Record Index; Main @ DAS/BEST with Probabilistic Integrator - Pi and UI, Security, Workflow, Query Engine]

Step 4
During Main Initialization:
• Extracts Common Data Elements from People Records using each Agency’s Record Index
• Sends them to Main & Loads into Memory ONLY
• Combines multiple records for individuals into Clusters via Probabilistic Integration
• Utility Table of Clusters containing only Agency Record Indices remains in memory
• Agency PII flushed from memory
[Diagram: SDE, CCC, CSU and DOL, each with Agency Data, a Metadata Editor & ETL, and a Record Index; Main @ DAS/BEST with Probabilistic Integrator - Pi and UI, Security, Workflow, Query Engine]

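The clustering step can be sketched in Python. The slides do not describe the Probabilistic Integrator's algorithm, so this sketch swaps in a simple weighted field-agreement score with union-find grouping; the weights, threshold and field names are hypothetical.

```python
from itertools import combinations

# Hypothetical agreement weights and match threshold (not from the slides).
WEIGHTS = {"ssn": 0.6, "dob": 0.25, "last_name": 0.15}
THRESHOLD = 0.7

def match_score(a, b):
    """Sum the weights of the fields on which two records agree (missing values never agree)."""
    return sum(w for f, w in WEIGHTS.items()
               if a.get(f) is not None and a.get(f) == b.get(f))

def cluster(records):
    """records maps (agency, record_index) -> Common Data Elements.

    Returns clusters that contain only the (agency, record_index) keys.
    """
    parent = {k: k for k in records}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # Pairwise comparison is O(n^2); fine for a sketch, too slow for 40M records.
    for a, b in combinations(records, 2):
        if match_score(records[a], records[b]) >= THRESHOLD:
            parent[find(a)] = find(b)

    groups = {}
    for k in records:
        groups.setdefault(find(k), set()).add(k)
    return list(groups.values())

records = {
    ("SDE", 1): {"ssn": "111", "dob": "1990-01-02", "last_name": "lee"},
    ("DOL", 9): {"ssn": "111", "dob": "1990-01-02", "last_name": "leigh"},
    ("CSU", 4): {"ssn": None, "dob": "1985-05-05", "last_name": "lee"},
}
clusters = cluster(records)
records.clear()  # PII flushed from memory; only clusters of record indices remain
```

The SDE and DOL records agree on SSN and DOB and end up in one cluster despite the differing surname, while the CSU record stays alone. After `records.clear()`, the only state left is the table of index clusters.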
Step 5
• Use UI features to establish user Roles, Login, etc.
• Use UI features to:
  • Create a Query
  • Approve a Query
  • Schedule a Query
• Use Query Engine to:
  • Build Agency Query Requests
  • Use ONLY Data marked Available for Download in each Query Request
[Diagram: SDE, CCC, CSU and DOL, each with Agency Data, a Metadata Editor & ETL, and a Record Index; Main @ DAS/BEST with Probabilistic Integrator - Pi and UI, Security, Workflow, Query Engine]

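The create / approve / schedule workflow and the filtering of Query Requests might look like the Python sketch below. The `Query` class, `Status` values and `build_agency_requests` helper are hypothetical names; the only rules taken from the slides are that Data Owners approve all queries and that only data marked Available for Download is requested.

```python
from enum import Enum

class Status(Enum):
    CREATED = "created"
    APPROVED = "approved"
    SCHEDULED = "scheduled"

class Query:
    def __init__(self, requested_fields, owners):
        self.requested_fields = requested_fields  # {agency: [field, ...]}
        self.pending = set(owners)                # Data Owners who have not yet approved
        self.status = Status.CREATED

    def approve(self, owner):
        self.pending.discard(owner)
        if not self.pending:
            self.status = Status.APPROVED

    def schedule(self):
        if self.status is not Status.APPROVED:
            raise PermissionError("query needs approval from every Data Owner")
        self.status = Status.SCHEDULED

def build_agency_requests(query, available):
    """Keep only the fields each agency marked Available for Download."""
    return {agency: [f for f in fields if f in available.get(agency, set())]
            for agency, fields in query.requested_fields.items()}

query = Query({"SDE": ["grad_year", "stu_last_nm"], "DOL": ["wage"]},
              owners=["SDE", "DOL"])
available = {"SDE": {"grad_year"}, "DOL": {"wage"}}
query.approve("SDE")
query.approve("DOL")
query.schedule()
requests = build_agency_requests(query, available)
```

The requested `stu_last_nm` field is dropped from the SDE request because it is not marked Available for Download, and `schedule()` refuses to run until every owner has approved.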
Step 6
Query Engine:
• Uses Clusters of Indices to get the needed Agency Record Indices
• Queries only Agency Data marked Available for Download
• Transfers only data marked Available for Download to the Main
• Downloads only Approved Queries
[Diagram: SDE, CCC, CSU and DOL, each with Agency Data, a Metadata Editor & ETL, and a Record Index; Main @ DAS/BEST with Probabilistic Integrator - Pi and UI, Security, Workflow, Query Engine, producing De-identified Integrated Data]

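Step 6 can be sketched as a join over the index clusters. This Python example runs in one process for illustration; in PATH the fact lookups would happen at each agency and be transferred to the Main. The function name, the `person_id` column and the sample data are hypothetical.

```python
def run_query(clusters, agency_facts, agency_requests):
    """Assemble one de-identified row per cluster of record indices.

    clusters:        list of sets of (agency, record_index)
    agency_facts:    {agency: {record_index: fact_record}}
    agency_requests: {agency: [downloadable field, ...]}
    """
    rows = []
    for person_id, members in enumerate(clusters):
        row = {"person_id": person_id}  # opaque ID, not derived from PII
        for agency, record_index in sorted(members):
            fact = agency_facts[agency].get(record_index, {})
            for field in agency_requests.get(agency, []):
                if field in fact:
                    row[f"{agency}.{field}"] = fact[field]
        rows.append(row)
    return rows

clusters = [{("SDE", 1), ("DOL", 9)}, {("CSU", 4)}]
agency_facts = {
    "SDE": {1: {"grad_year": 2008}},
    "DOL": {9: {"wage": 41000}},
    "CSU": {4: {"degree": "BS"}},
}
agency_requests = {"SDE": ["grad_year"], "DOL": ["wage"], "CSU": ["degree"]}
result = run_query(clusters, agency_facts, agency_requests)
```

Each output row combines Fact fields from several agencies under an opaque `person_id`, with no name, SSN or date of birth present.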
PATH Components
• Remote Components
  • Metadata Editor
  • Extract, Transform and Load Module
• Main Components
  • Integration Engine
  • User Interface
  • Security
  • Workflow Module
  • Query Engine with Filtering
[Diagram: agencies, each with Agency Data, a Metadata Editor & ETL, and a Record Index; Main @ DAS/BEST with Probabilistic Integrator - Pi (Integration Engine) and UI, Security, Workflow, Query Engine, producing De-identified Integrated Data]

PATH Functionality
• Security
  • Personally Identifiable Information never written outside of Agency Data Center
  • Encrypted transfer of all data
  • PII & Fact records never transmitted together
  • Audit logs
  • Query Approval Workflow
  • Multiple Secure User Roles
• Ease of Use
  • System Administration
  • Data Management
  • Query Filtering
  • Query results delivered as de-identified data
[Diagram: agencies, each with Agency Data, a Metadata Editor & ETL, and a Record Index; Main @ DAS/BEST with Probabilistic Integrator - Pi (Integration Engine) and UI, Security, Workflow, Query Engine, producing De-identified Integrated Data. Callouts: Data Mgmt, Encrypted Xfer, PII & Facts separate Xfer, No PII, User Roles, Audit logs, Sys Admin, Approval req’d, Query Filtering]

Competitor Components
• Remote Components
  • Metadata Editor
  • Extract, Transform and Load Module
• Main Components
  • Integration Engine
  • User Interface
  • Security
  • Workflow Module
  • Query Engine with Filtering
[Diagram: agencies, each with Agency Data, a Metadata Editor & ETL, and a Record Index; Integration Engine and UI, Security, Workflow, Query Engine, producing De-identified Integrated Data. Callouts: Data Mgmt, Encrypted Xfer, PII & Facts separate Xfer, No PII, User Roles, Audit logs, Sys Admin, Approval req’d, Query Filtering]

Competitor Deficits
Desktop Integration Engine
• Minimal Security
  • No Encrypted Transfer of Data
  • No Audit Logs
  • Transfer of Facts with PII
  • No Secure Logins
  • FTP or Thumb Drive Transfers
  • No Anonymized Data
• No Access Control - No Approval Workflow
• No Chain of Custody Assurance – Possibility for Cherry-Picked Data
[Diagram: four Agency Data sources feeding an Integration Engine through Copies of Agency Data, with PII Visible in the Integrated Data]