This presentation provides an overview of the Virginia Longitudinal Data System (VLDS) components, including the CRM, dashboard, workflows, portal, and the Shaker tool.
25th Annual STATS-DC 2012 Data Conference: Virginia Longitudinal Data System (VLDS), July 12th, 2012
Agenda
• Introductions
• Background
• CRM
• Dashboard
• Overview
• Accounts
• Contacts
• Entities (RP, Artifacts, Contracts)
• Workflows
• Portal
• User Interface
• Data Dictionary and Selection Tool
• Data Request Tool
• Shaker
• Questions / Discussion
VLDS Component Overview
• SLDS Components
  • Portal
  • Security
  • Workflow
  • Reporting
  • Lexicon
  • Shaker
  • Data
• Service Oriented Architecture
• Database Agnostic
• Supports Federated or Warehouse Data Models
Data/Communication Graph
[Diagram: Portal/CRM, Lexicon, and Shaker, shown with three Exp DB (exposed database) / DA pairs]
Shaker Ingredients
• 2 parts Data Request Rx
• 2 parts Data Request Parsing
• 1 part Identity Resolution
• 2 parts Query Execution
• Yield: Dataset Output
Shaker Details
• Identity Resolution
  • Source demographics are not constant, so all demographic records are collected in the exposed database
  • The demographic records are grouped by local identifier, ranked, and assigned an alternate identifier
  • The top results are hashed and sent to the Shaker
  • Matching is performed using deterministic fields first, then by a probabilistic algorithm involving:
    • Hashed First Name (modified Jaro-Winkler distance algorithm)
    • Hashed Last Name (modified Jaro-Winkler distance algorithm)
    • Hashed Month of birth
    • Hashed Year of birth
    • Hashed Gender
    • Location (ZIP to FIPS region)
  • Product: ID Map of alternate identifiers
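The identity-resolution steps above can be sketched roughly as follows. This is a minimal illustration, not the VLDS implementation: the record layout (`local_id`, `as_of`, and the demographic field names), the ranking rule (most recent record wins), the keyed SHA-256 hashing, and every function name are assumptions made for the example. Only the deterministic pass is shown, because the slides do not describe how the Jaro-Winkler comparison is modified to work on hashed names.

```python
import hashlib
import hmac
import uuid
from collections import defaultdict

# Hypothetical key shared by the participating agencies (an assumption).
SHARED_KEY = b"example-key"

# Assumed field names for the demographic values that get hashed.
HASHED_FIELDS = ("first_name", "last_name", "birth_month", "birth_year", "gender")


def hash_value(value, key=SHARED_KEY):
    """Normalize a value and return its keyed SHA-256 digest as hex."""
    normalized = str(value).strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def prepare_agency_records(records):
    """Group by local identifier, rank, assign an alternate id, hash the top record.

    Returns (outgoing, private): `outgoing` is what would be sent to the
    Shaker; `private` maps alternate id -> local id and stays with the agency.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["local_id"]].append(rec)

    outgoing, private = [], {}
    for local_id, recs in groups.items():
        top = max(recs, key=lambda r: r["as_of"])  # assumed ranking: newest wins
        alt_id = uuid.uuid4().hex                  # random alternate identifier
        private[alt_id] = local_id
        outgoing.append({
            "alt_id": alt_id,
            "hashes": tuple(hash_value(top[f]) for f in HASHED_FIELDS),
        })
    return outgoing, private


def deterministic_id_map(agency_a, agency_b):
    """Match records whose hashed fields are all equal; return alt_id_a -> alt_id_b."""
    index = {rec["hashes"]: rec["alt_id"] for rec in agency_b}
    return {
        rec["alt_id"]: index[rec["hashes"]]
        for rec in agency_a
        if rec["hashes"] in index
    }
```

In this sketch only hashed values and alternate identifiers leave the agency; the local identifiers stay in the `private` lookup, which is what later lets each exposed database resolve its own alternate identifiers.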
Shaker Details
• Query Execution
  • Each set of alternate identifiers is sent back to its exposed database to retrieve the data requested by the researcher/user
• Dataset Assembly
  • A final identifier is generated to replace the alternate identifiers, thereby achieving our super-secret, double-de-identified directive
  • The final output is a dataset for distribution to a researcher/user or to back an aggregate report
  • Notifications are sent back to the CRM component
  • Then everything is reset for the next data request…
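The query-execution and dataset-assembly steps above could look something like the following sketch. It assumes two agencies and an ID map of the form `alt_id_a -> alt_id_b`; the `fetch_a` and `fetch_b` callables are hypothetical stand-ins for the exposed databases, and the function name and row layout are illustrative only.

```python
import uuid


def assemble_dataset(id_map, fetch_a, fetch_b):
    """Join rows from two exposed databases and swap in a final identifier.

    id_map:  dict mapping agency A alternate ids to agency B alternate ids.
    fetch_*: callables taking a list of alternate ids and returning
             {alt_id: {column: value}} for the ids that database can resolve.
    """
    rows_a = fetch_a(list(id_map.keys()))
    rows_b = fetch_b(list(id_map.values()))

    dataset = []
    for alt_a, alt_b in id_map.items():
        if alt_a not in rows_a or alt_b not in rows_b:
            continue  # keep only individuals found in both sources
        # The final identifier replaces both alternate ids, so the output
        # carries neither the local nor the alternate identifiers.
        row = {"final_id": uuid.uuid4().hex}
        row.update(rows_a[alt_a])
        row.update(rows_b[alt_b])  # on a column-name clash, agency B's value wins
        dataset.append(row)
    return dataset
```

Because the final identifier is freshly generated for each request, two datasets produced from separate requests cannot be linked to each other through it, which is one way to read the "double-de-identified" goal.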
Questions & POCs
Sponsors
• Bethann Canada – bethann.canada@doe.virginia.gov
• Tod Massa – todmassa@schev.edu
• Jeremy Deyo – jeremy.deyo@vec.virginia.gov
Program
• Matt Bryant – matthew.bryant@doe.virginia.gov
Technical
• Ajay Rohatgi (Technical PM) – ajay.rohatgi@vita.virginia.gov
• Will Goldschmidt (Workflow & Portal PM) – will.goldschmidt@vita.virginia.gov
• Kathy Graham (Reporting PM) – kathy.graham@vita.virginia.gov
• Aaron Schroeder (Lexicon & Shaker) – aaron.schroeder@vt.edu
Portal Features (Public Facing)
• General Information
• FAQs
• Aggregated Data Reports
• Links to Agency Reports
• Request for Named User Account (Potentially)
Portal Features (Named Users)
• My VLDS
• Team Member Management
• Research Information Management (Who, What, Where, When, Why)
• Data request and retrieval
• Document management (NDAs, research papers, etc.)
• Ability to check status, modify or cancel account and/or data request
• Help / Training
• Password reset
• Reports
• Data Request Tool (DRT)
• Data Dictionary & Selection Tool
Workflow Features
• Manage Contacts (Researchers)
• Create / Edit / Approve / Disapprove:
  • Research Purposes
  • Restricted Use Data Agreements
  • Data Packages
  • Artifacts (Documents)
• Automated Email Notifications and Tasks
• Document Storage
• Audit Logs
• Integration with Microsoft Outlook