Education Data Warehouse Building Blocks: Identity Matching and Data Governance
IPMA, May 21, 2013
AGENDA
• Vision (Marc Baldwin)
• Identity matching (John Sabel)
• Data Governance (Melissa Beard)
• Questions
Protecting Personally Identifiable Information (PII)
• Step 1: Isolate PII data from all other data
  • Link PII data in an isolated environment to create linking IDs.
  • Perform data analysis on the linked data in a different environment. This environment has no PII data.
• Step 2: Redact data
  • FERPA (Family Educational Rights and Privacy Act) requirements.
  • Subject to data sharing agreements.
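Step 1 can be sketched as follows. This is a minimal illustration only: the field names, the `PII_FIELDS` set, and the `split_record` helper are hypothetical, and a random ID stands in for the linking ID that the real identity-matching process would assign.

```python
import uuid

# Hypothetical list of PII columns; the slides do not enumerate them.
PII_FIELDS = {"first_name", "last_name", "birth_date", "ssn"}


def split_record(record, link_id):
    """Split one source record into a PII row and a non-PII row.

    Both rows carry the same link_id, which is the only value they share.
    """
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    non_pii = {k: v for k, v in record.items() if k not in PII_FIELDS}
    pii["link_id"] = link_id
    non_pii["link_id"] = link_id
    return pii, non_pii


# Made-up example record.
record = {
    "first_name": "Ana",
    "last_name": "Diaz",
    "birth_date": "2000-03-04",
    "ssn": "000-00-0000",
    "school_year": 2012,
    "gpa": 3.4,
}

# Assumption: in practice the link ID comes from identity matching, not uuid4.
link_id = uuid.uuid4().hex
pii_row, analysis_row = split_record(record, link_id)
print(sorted(analysis_row))  # ['gpa', 'link_id', 'school_year']
```

Only `analysis_row` would leave the isolated environment; `pii_row` stays behind with the matching data.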
P-20 Data Warehouse: Inputs through Outputs
Personally Identifiable Information (PII) is encapsulated in the MDM Oracle database.
[Diagram labels]
• Input: Sector data providers (DEL, OSPI, SBCTC, PCHEES, ESD, DRS, NSC, L&I); data sets
• Identity matching: Master Data Management (MDM) (Oracle database); PII data; linked IDs only
• Data store: Operational Data Store (SQL Server database); non-PII data (bulk of data)
• Output (business intelligence): Cubes; stars (PII data stripped)
Identity Matching Challenges
• Most education data involves deduplication (i.e., consolidation).
• The number and quality of common identifiers vary between data sources.
  • Public post-secondary instruction data has SSNs, but K12 data does not.
• Idiosyncrasies in data
  • For example, January 1 birth dates are often used when the birth day and month are unknown.
  • Twins
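One way to account for the January 1 idiosyncrasy is to give less weight to birth-date agreement when the date looks like a placeholder. The sketch below is an assumption for illustration, not the presenters' method; the function names and the weights 1.0, 0.5, and 0.0 are made up.

```python
from datetime import date


def is_placeholder_birth_date(d):
    """Return True for January 1 dates, which may stand in for an unknown day and month."""
    return d.month == 1 and d.day == 1


def birth_date_agreement_weight(a, b):
    """Return a hypothetical agreement weight for two birth dates.

    Disagreement scores 0.0, agreement on a January 1 date scores 0.5
    (weaker evidence), and agreement on any other date scores 1.0.
    """
    if a != b:
        return 0.0
    if is_placeholder_birth_date(a):
        return 0.5
    return 1.0


print(birth_date_agreement_weight(date(2000, 1, 1), date(2000, 1, 1)))  # 0.5
print(birth_date_agreement_weight(date(2000, 3, 4), date(2000, 3, 4)))  # 1.0
```

Twins pose the opposite problem: birth date and last name agree, so other fields such as first name have to separate the two records.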
Identity Matching Challenge Matrix
* Example: Linking birth certificate data to hospitalization data.
** Example: Post-secondary instruction data. A single student can be enrolled in multiple colleges, both longitudinally (over time) and at the same time.
Identifiers in the Perfect World
    SELECT K12.*, College.*
    FROM K12
    INNER JOIN College
        ON K12.Bulletproof_Surefire_Global_Student_ID = College.Bulletproof_Surefire_Global_Student_ID
Identifiers in the Other Perfect World
    SELECT K12.*, College.*
    FROM K12
    INNER JOIN College
        ON K12.SSN = College.SSN
Note: Every student has a valid, properly assigned SSN.
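The note matters because an inner join silently drops any student whose SSN is missing on either side. The sketch below runs the SSN join in an in-memory SQLite database; the table contents, the `name` column, and the SSN values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE K12 (name TEXT, SSN TEXT);
    CREATE TABLE College (name TEXT, SSN TEXT);
    -- Made-up rows: Ben Cho has no SSN in the K12 data.
    INSERT INTO K12 VALUES ('Ana Diaz', '111-11-1111'), ('Ben Cho', NULL);
    INSERT INTO College VALUES ('Ana Diaz', '111-11-1111'), ('Ben Cho', '222-22-2222');
    """
)

rows = conn.execute(
    """
    SELECT K12.name, College.name
    FROM K12
    INNER JOIN College
        ON K12.SSN = College.SSN
    """
).fetchall()

# A NULL SSN never equals anything, so Ben Cho is not linked.
print(rows)  # [('Ana Diaz', 'Ana Diaz')]
```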
Addressing Identity Matching Challenges
• Deduplicate each data source first
  • You can then take advantage of source-specific identifiers. For example, K12 data has the State Student Identifier (SSID).
• Merge the deduplicated data source with the rest of the data warehouse.*
* This is itself a deduplication process.
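Deduplicating within a single source by its own identifier might look like the sketch below. The `dedupe_by_key` helper, the field names, and the rule of keeping the first non-empty value per field are assumptions for illustration.

```python
def dedupe_by_key(records, key):
    """Collapse records that share a source-specific identifier.

    For each field, the first non-empty value seen is kept.
    """
    merged = {}
    for rec in records:
        target = merged.setdefault(rec[key], {})
        for field, value in rec.items():
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())


# Made-up K12 rows; two of them share the same SSID.
k12 = [
    {"ssid": "S1", "first": "Ana", "last": "Diaz", "dob": ""},
    {"ssid": "S1", "first": "Ana", "last": "Diaz", "dob": "2000-03-04"},
    {"ssid": "S2", "first": "Ben", "last": "Cho", "dob": "2001-07-09"},
]

k12_students = dedupe_by_key(k12, "ssid")
print(len(k12_students))       # 2
print(k12_students[0]["dob"])  # 2000-03-04
```

The consolidated rows are then what gets matched against the rest of the warehouse, where no shared SSID is available.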
Identity Matching Opportunities
• Use name change data
  • For example, DOH marriage and divorce data.*
* As of 2012, marriage and divorce data contain inferred name changes for females only.
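A rough sketch of how name change data could widen a match: collect every last name a person is known to have used, then compare on the whole set. The record layout and the `known_last_names` helper are hypothetical and do not reflect the DOH data format.

```python
def known_last_names(person, name_changes):
    """Return the set of last names a person is known to have used.

    A name change applies when first name and birth date agree and the old
    last name is already in the set. Changes are read in a single pass, so
    chained changes must be listed in chronological order.
    """
    names = {person["last"]}
    for change in name_changes:
        same_person = (
            change["first"] == person["first"] and change["dob"] == person["dob"]
        )
        if same_person and change["old_last"] in names:
            names.add(change["new_last"])
    return names


# Made-up records.
k12_person = {"first": "Ana", "last": "Diaz", "dob": "2000-03-04"}
college_person = {"first": "Ana", "last": "Reyes", "dob": "2000-03-04"}
name_changes = [
    {"first": "Ana", "dob": "2000-03-04", "old_last": "Diaz", "new_last": "Reyes"},
]

aliases = known_last_names(k12_person, name_changes)
print(college_person["last"] in aliases)  # True
```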
Identity Matching Mechanics
• First, deterministically deduplicate data
  • Always strive first to minimize false positives and then try to minimize false negatives.
  • These matches are then auto-merged.
• Second, use probabilistic techniques to auto-merge additional data
• Last, use probabilistic techniques to create manual review sets
  • These are selectively merged.
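The three passes can be sketched as a single classifier over candidate pairs. This is an illustration under stated assumptions, not the ERDC implementation: the thresholds are made up, and name similarity from Python's `difflib.SequenceMatcher` stands in for whatever probabilistic technique is actually used.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; real values would be tuned against reviewed data.
AUTO_MERGE = 0.90
REVIEW = 0.75


def name_similarity(a, b):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def classify_pair(a, b):
    """Assign a candidate pair to one of the three passes, or to no match."""
    # Pass 1 (deterministic): exact agreement on every identifying field.
    if (a["first"], a["last"], a["dob"]) == (b["first"], b["last"], b["dob"]):
        return "auto-merge (deterministic)"
    # Simplifying assumption: the later passes require equal birth dates.
    if a["dob"] != b["dob"]:
        return "no match"
    score = name_similarity(
        a["first"] + " " + a["last"], b["first"] + " " + b["last"]
    )
    # Pass 2 (probabilistic): a high score is merged automatically.
    if score >= AUTO_MERGE:
        return "auto-merge (probabilistic)"
    # Pass 3 (probabilistic): a middling score goes to a manual review set.
    if score >= REVIEW:
        return "manual review"
    return "no match"


# Made-up records.
base = {"first": "Jonathan", "last": "Smith", "dob": "2000-03-04"}
typo = {"first": "Jonathon", "last": "Smith", "dob": "2000-03-04"}
short = {"first": "Jon", "last": "Smith", "dob": "2000-03-04"}
other = {"first": "Jonathan", "last": "Smith", "dob": "1999-12-31"}

for candidate in (dict(base), typo, short, other):
    print(classify_pair(base, candidate))
```

Running the deterministic pass first, with strict rules, keeps false positives low; the looser passes then recover matches the strict rule missed.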
ERDC Data Governance
• No data warehouse without data governance
• Rules of engagement
• Goal: Link data so it can be shared
• Data contributors
• Data sharing policy workgroup
• Defined set of tasks
• Temporary
• Small group of problem-solvers
P-20W Data Governance Committee Structure
[Organization chart labels]
• Office of Financial Management
• Education Research & Data Center (ERDC)
• ERDC Guidance Committee
• Research Coordination Committee
• Data Custodians Committee
• Data Stewards Committee
[Descriptions shown on the chart]
• Agency directors or deputies from agencies contributing data.
• Experts directly familiar with data from their agency used in research.
• Technical experts responsible for the technical delivery of data to and from the warehouse.
• Policy experts who interact with agency decision-makers, stakeholders, and researchers.
CONTACT INFORMATION
• Marc Baldwin, OFM Assistant Director, Forecasting
  • Marc.baldwin@ofm.wa.gov
  • 360-902-0590
• John Sabel, Education Research Analyst
  • John.sabel@ofm.wa.gov
  • 360-902-0943
• Melissa Beard, Data Governance Coordinator
  • Melissa.beard@ofm.wa.gov
  • 360-902-0584