New York State Data Warehouse Environment (NYSHADE): Erie & Niagara Counties Monday, July 21, 2014 1:00-2:30
Introductions • OTDA Staff • Linda Camoin • Richard Umholtz • HUD TA Staff • Chris Pitcher, ICF International • Erie & Niagara County CoC Members
What is a Data Warehouse? • In essence, it is a central database for organizing and analyzing data from more than one source, such as multiple HMIS implementations and/or state mainstream systems • No fixed definition - there are many possible ways to structure a data warehouse
What is a Data Warehouse? • Typical data warehouse components: • Source data: data elements to be collected from defined participants • Unique identifier: a way to de-duplicate records • Common “schema” and process for extraction, transformation, and loading of data (ETL) • A relational database • Security: Secure Sockets Layer (SSL), encryption, firewall • Analysis and reporting software
What is a Data Warehouse? • Data flow: Data Sources (e.g., HMIS implementations and/or state mainstream data systems) → ETL: Extract, Transform, Load → Warehouse (new structure; data re-organized) → Reports
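The extract, transform, load flow on this slide can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the column names, the date format in the source file, the `enrollment` table, and the use of an in-memory SQLite database. It is not the warehouse's actual schema or ETL process.

```python
import csv
import io
import sqlite3

# Hypothetical CSV extract from one source system (columns are assumptions).
SOURCE_CSV = """client_id,entry_date,program
A1,07/01/2014,Shelter
A2,07/03/2014,Outreach
"""


def extract(text):
    """Extract: read rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, source_name):
    """Transform: convert MM/DD/YYYY dates to ISO format and tag each row with its source."""
    out = []
    for r in rows:
        month, day, year = r["entry_date"].split("/")
        out.append((source_name, r["client_id"], f"{year}-{month}-{day}", r["program"]))
    return out


def load(conn, rows):
    """Load: insert the transformed rows into a common warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrollment "
        "(source TEXT, client_id TEXT, entry_date TEXT, program TEXT)"
    )
    conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV), "coc_example"))
rows = conn.execute("SELECT * FROM enrollment ORDER BY client_id").fetchall()
```

Tagging each row with its source is one way to let several HMIS implementations share a single table while still being reported on separately.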
Why an HMIS Data Warehouse? • A good data warehouse will provide: • The RIGHT data • To the RIGHT people • At the RIGHT time • RIGHT NOW • Data in – information out • Multiple data sources can be analyzed in combination
Why an HMIS Data Warehouse? • Most reports require data from many tables • Most HMIS implementations are designed for data input and ease of day-to-day client transactions, but less so for ease of reporting • Data warehouses are designed for ease of data retrieval, analysis, and reporting across large data sets
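To illustrate why a report typically needs data from many tables, the sketch below joins three hypothetical tables (`client`, `enrollment`, `service`) to produce one per-program summary. The table and column names are assumptions for illustration only, not an actual HMIS or warehouse schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical transactional tables, as an input-oriented system might store them.
conn.executescript("""
CREATE TABLE client (client_id TEXT PRIMARY KEY, veteran INTEGER);
CREATE TABLE enrollment (enrollment_id INTEGER PRIMARY KEY, client_id TEXT, program TEXT);
CREATE TABLE service (service_id INTEGER PRIMARY KEY, enrollment_id INTEGER, service_type TEXT);
INSERT INTO client VALUES ('A1', 1), ('A2', 0);
INSERT INTO enrollment VALUES (1, 'A1', 'Shelter'), (2, 'A2', 'Shelter'), (3, 'A1', 'Outreach');
INSERT INTO service VALUES (1, 1, 'Bed night'), (2, 1, 'Meal'), (3, 2, 'Bed night'), (4, 3, 'Referral');
""")

# A single report row per program needs all three tables joined together.
report = conn.execute("""
SELECT e.program,
       COUNT(DISTINCT c.client_id) AS clients,
       COUNT(s.service_id) AS services
FROM enrollment e
JOIN client c ON c.client_id = e.client_id
JOIN service s ON s.enrollment_id = e.enrollment_id
GROUP BY e.program
ORDER BY e.program
""").fetchall()
```

A warehouse can pre-arrange data so that queries like this one are simpler and faster to run than against the day-to-day transactional tables.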
Why an HMIS Data Warehouse? • An effective tool for decision-making support • The potential uses of HMIS/human data warehouses: • Analyzing regional or state demographics, trends, and outcomes • Assessing use of mainstream services by persons experiencing homelessness • Calculating the cost of homelessness • Determining successful interventions to prevent and end homelessness • Informing the development of regional or state 10-year plans to end homelessness
Overview • The New York State Data Warehouse Project is an initiative by the Office of Temporary and Disability Assistance (OTDA) to understand the nature and scope of homelessness across the State of New York. • The Data Warehouse, phased in over a few years, will be created, maintained, and operated by OTDA.
Phase I • Begins with Solutions To End Homelessness Program (STEHP) data from local Homeless Management Information System (HMIS) implementations that serve programs receiving STEHP funds
Phase I • Will require technical efforts to engage each local HMIS implementation • Assess CSV or XML extract capabilities • Assess data quality • Develop policies and procedures for the extraction, exchange, security, privacy, confidentiality, and reporting of the STEHP data
Phase II • Will build upon the structure created in Phase I • Expand the data set to include all data contained within all New York State HMIS implementations • Will require technical efforts to engage the remaining local HMIS implementations that do not have STEHP funds
Phase II • Assess CSV or XML extract capabilities • Assess data quality • Develop policies and procedures for the extraction, exchange, security, privacy, confidentiality, and reporting of the complete HMIS data set
Phase III • Will build upon Phase I and Phase II • Begin to incorporate non-HMIS data sources at the state-level • Programs within and outside of OTDA
Phase III • Assessing the potential and viability of these non-HMIS data sources • Assessing data quality • Assessing applicability to homeless and at-risk client-level data • Assessing the ability of the data partner to contribute to project efforts
Beyond Phase III • Ongoing analysis of the nature and scope of homelessness across the State of New York • Widespread benefits
NYS OTDA Data Warehouse Pilot • To accomplish the Data Warehouse project, OTDA enlisted technical assistance from the US Department of Housing and Urban Development (HUD) in the spring of 2012 • Several pilot discussion meetings were convened with Continuums of Care throughout New York State
NYS OTDA Data Warehouse Pilot • Pilot Communities: • Syracuse/Onondaga County CoC • Utica/Oneida County CoC • Albany City/County CoC • Ithaca/Tompkins County CoC • Ulster County CoC
PPI Definition • Private Personal Information (PPI) is a category of sensitive information that is associated with an individual person. • PPI may be used to: • uniquely identify, contact, or locate a single person • enable disclosure of non-public personal information
Why Do We Need PPI? • To identify a unique individual in receipt of service • Uniqueness is crucial to differentiate between an individual with multiple instances of homelessness vs. several individuals with a single instance each • Allows NY State to capture a more accurate picture of homelessness throughout the state and not just NYC.
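The difference between counting episodes and counting people can be shown with a small sketch. The episode records and identifiers below are made up for illustration, and they assume each record already carries a de-duplicated unique identifier.

```python
from collections import Counter

# Hypothetical episodes after de-duplication: (unique_id, coc, entry_date).
episodes = [
    ("u1", "CoC A", "2014-01-05"),
    ("u1", "CoC B", "2014-03-12"),
    ("u1", "CoC A", "2014-06-20"),
    ("u2", "CoC B", "2014-02-01"),
]

# Count how many episodes belong to each unique individual.
episodes_per_person = Counter(uid for uid, _, _ in episodes)

total_episodes = len(episodes)                 # 4 instances of homelessness
unique_individuals = len(episodes_per_person)  # but only 2 distinct people
repeat_individuals = sum(1 for n in episodes_per_person.values() if n > 1)
```

Without a reliable unique identifier, the same four records could be read as four different people with one episode each.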
What We Are NOT Doing With PPI • PPI is NOT being used to track individuals • PPI is NOT being reported on in any way • PPI is NOT shared with any other entity, including: • Other agencies/providers • BHHS staff • Database Users • Programmers
What We Are NOT Doing With PPI • PPI is NOT stored in plain text • PPI is NOT stored alongside any service data • PPI is NOT available to any system other than the HMIS data warehouse de-duplication process
How Is PPI Used and Maintained? • PPI is immediately stripped from all data files and funneled into the HMIS de-duplication process. • PPI is securely stored in an encrypted format in its own data store • The de-duplication process uses this information to determine if the PPI represents an individual that has previously been brought into the system
How Is PPI Used and Maintained? • PPI that matches previously existing information will not be stored, and a previously generated unique identifier will be returned from the de-duplication process • PPI that is determined to be new will be stored, and a new unique identifier will be generated and returned • Note: The generated unique identifiers will not be traceable to an individual. Nobody, including the individual and state staff, will be aware of any individual’s unique identifier.
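The match-or-create behavior described above can be sketched as follows. This is not the warehouse's actual algorithm: the slides say PPI is stored encrypted but do not say how matching works, so this sketch swaps in a keyed hash (HMAC-SHA256) of normalized fields as the stored fingerprint. The class name `PPIStore`, the choice of fields, and the normalization rules are all assumptions. The identifier is a random UUID, so it is not derived from the PPI.

```python
import hashlib
import hmac
import uuid


class PPIStore:
    """Hypothetical PPI data store: keeps only keyed fingerprints, never plain text."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._ids = {}  # fingerprint -> unique identifier

    def _fingerprint(self, first, last, dob, ssn_last4):
        # Assumed normalization: trim whitespace and ignore case before hashing.
        normalized = "|".join(p.strip().lower() for p in (first, last, dob, ssn_last4))
        return hmac.new(self._key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

    def resolve(self, first, last, dob, ssn_last4):
        """Return (unique_id, is_new). Matching PPI returns the existing identifier."""
        fp = self._fingerprint(first, last, dob, ssn_last4)
        if fp in self._ids:
            return self._ids[fp], False
        new_id = uuid.uuid4().hex  # random, so it cannot be traced back to the PPI
        self._ids[fp] = new_id
        return new_id, True


store = PPIStore(b"example-key-not-for-production")
id_first, new_first = store.resolve("Jane", "Doe", "1980-01-01", "1234")
id_again, new_again = store.resolve(" jane ", "DOE", "1980-01-01", "1234")
id_other, new_other = store.resolve("John", "Roe", "1975-02-02", "9876")
```

Exact matching on a fingerprint will miss misspellings and data-entry variations; a production matching process would likely need fuzzier rules than this sketch shows.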
How Do We Get HMIS Data? • Most HMIS systems currently in use are required to have a data export function. Data can be exported as a .csv or .xml file • These files will be generated from the HMIS system, converted to zip format, and sent to OTDA over a Secure Sockets Layer (SSL) connection • Transfer path: HMIS Data File → HTTPS/SSL Transfer Through State Firewall → OTDA Secure Server
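As a rough sketch of the "export, then convert to zip" step, the function below writes rows to CSV and packages the result as a zip archive in memory. The function name `build_export_zip`, the field names, and the member file name are hypothetical. The HTTPS transfer itself is left out because the slides do not specify the upload endpoint.

```python
import csv
import io
import zipfile


def build_export_zip(rows, fieldnames, member_name="export.csv"):
    """Write rows to CSV and return the bytes of a zip archive containing that file."""
    csv_buf = io.StringIO()
    writer = csv.DictWriter(csv_buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, csv_buf.getvalue())
    return zip_buf.getvalue()


# Hypothetical export rows.
payload = build_export_zip(
    [{"client_id": "A1", "program": "Shelter"}],
    fieldnames=["client_id", "program"],
)
```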
Priorities • Security – Information needs to be safely secured • Timeliness – The CoC experience needs to be treated as a professional and productive interaction • Data Integrity – The information in the incoming data needs to be usable • Identification – Increased recognition of duplicate records across CoCs for more accurate reporting • Adaptability – Need to be able to accommodate both current and future requirements • Reports – HMIS and HUD
HMIS Data Upload Security • A secure single point of access web application available for PCs with internet connectivity • Protected by NYS Directory Services utilizing “Siteminder” security for authentication and authorization • Authorized users will be granted a username and password to access the HMIS Data Warehouse file management system. • HTTPS/SSL encryption ensures security of all user interaction and file transmission. • All PPI information is encrypted through all stages of integration
HMIS Data Upload Interface • Standard File Upload Dialog – Choose File Right From Your PC
Step 1 • Zip file is uploaded to NYS secure server
HMIS Data Upload Interface • Timeliness - User receives instant feedback on status of upload
Step 2 • If the Zip content is XML, then parse it to CSV
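A minimal sketch of parsing XML records to CSV is shown below. The XML layout, tag names, and the helper `xml_to_csv` are assumptions for illustration and do not reproduce the actual HMIS XML export schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML export: one <client> element per record.
SAMPLE_XML = """<export>
  <client><client_id>A1</client_id><program>Shelter</program></client>
  <client><client_id>A2</client_id><program>Outreach</program></client>
</export>"""


def xml_to_csv(xml_text, record_tag, fields):
    """Flatten each <record_tag> element into one CSV row with the given fields."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for rec in root.iter(record_tag):
        # A missing child element becomes an empty cell rather than an error.
        writer.writerow([rec.findtext(f, default="") for f in fields])
    return out.getvalue()


csv_text = xml_to_csv(SAMPLE_XML, "client", ["client_id", "program"])
```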
Step 3 • Move CSV records to encrypted Raw tables
Step 4 • Validate records and move them to secure, encrypted Staging
Step 5 • Review validation results • The entire validation halts if any single threshold is exceeded • Validation can be set to proceed automatically to warehouse integration upon successful validation, or to await manual preview (usually done for a CoC's first upload) • A record-level error report can be generated as needed
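Threshold-based validation could look roughly like the sketch below. The rule names, the idea of expressing thresholds as maximum error rates, and the `ValidationResult` structure are all assumptions, since the slides do not define the thresholds. This sketch finishes counting errors so that a record-level error report is available, then fails the whole upload if any single threshold is exceeded.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    passed: bool
    error_rates: dict
    record_errors: list = field(default_factory=list)  # (record index, rule name)


def validate(records, rules, thresholds):
    """rules: name -> predicate(record) returning True when the record is valid.
    thresholds: name -> maximum allowed error rate (0.0 if not listed)."""
    counts = {name: 0 for name in rules}
    errors = []
    for i, rec in enumerate(records):
        for name, is_valid in rules.items():
            if not is_valid(rec):
                counts[name] += 1
                errors.append((i, name))
    total = len(records) or 1  # avoid division by zero on an empty file
    rates = {name: count / total for name, count in counts.items()}
    # One exceeded threshold is enough to fail the entire upload.
    passed = all(rates[name] <= thresholds.get(name, 0.0) for name in rules)
    return ValidationResult(passed, rates, errors)


# Hypothetical rule: date of birth must be present.
records = [{"dob": "1980-01-01"}, {"dob": ""}, {"dob": "1990-05-05"}, {"dob": "1975-02-02"}]
rules = {"dob_present": lambda r: bool(r["dob"])}
strict = validate(records, rules, {"dob_present": 0.10})
lenient = validate(records, rules, {"dob_present": 0.50})
```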
Step 6 • Purge all existing records for CoC • If the Refresh indicator on the export file dictates that all existing records be cleared out prior to integration, then all existing warehouse records for that CoC will be removed following a successful validation. • Note that if the Refresh indicator does not dictate a purge prior to integration, then a matching algorithm is applied to each record, and counts are used to report how many new vs. modified records are integrated into the warehouse for that load
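The two paths described here (purge then load, or match then count) can be sketched as below. The slides do not describe the matching algorithm, so this sketch assumes the simplest possible one: an exact match on a hypothetical `record_id` within the same CoC. The in-memory dictionary stands in for the warehouse tables.

```python
def integrate(warehouse, coc, incoming, refresh):
    """warehouse: dict keyed by (coc, record_id) -> record dict. Returns load counts."""
    stats = {"purged": 0, "new": 0, "modified": 0, "unchanged": 0}

    if refresh:
        # Refresh indicator set: clear every existing record for this CoC first.
        stale = [key for key in warehouse if key[0] == coc]
        for key in stale:
            del warehouse[key]
        stats["purged"] = len(stale)

    for rec in incoming:
        key = (coc, rec["record_id"])  # assumed matching rule: exact key match
        existing = warehouse.get(key)
        if existing is None:
            stats["new"] += 1
        elif existing != rec:
            stats["modified"] += 1
        else:
            stats["unchanged"] += 1
        warehouse[key] = rec
    return stats


warehouse = {
    ("CoC A", "1"): {"record_id": "1", "program": "Shelter"},
    ("CoC B", "9"): {"record_id": "9", "program": "Outreach"},
}
incoming = [
    {"record_id": "1", "program": "Outreach"},
    {"record_id": "2", "program": "Shelter"},
]
merge_stats = integrate(warehouse, "CoC A", incoming, refresh=False)
refresh_stats = integrate(warehouse, "CoC A", incoming, refresh=True)
```

In both paths, records belonging to other CoCs are left untouched.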
Step 7 • Integrate validated records to warehouse • HMIS secure, encrypted, and validated staging data is sent into the de-duplication process (de-duplication process server) • The PPI data store is queried. If there is a match, the unique ID is returned. If not, a new unique ID is generated • New PPI is sent to the secure PPI data store, which holds encrypted data • HMIS data stripped of PPI is sent to the HMIS data store (HMIS Data Warehouse) with the unique identifier • Each record is removed from Secure Staging as it is integrated into the warehouse
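The integration step, in which PPI is separated from service data and replaced by a unique identifier, might be sketched like this. The field names in `PPI_FIELDS`, the list-based staging area and warehouse, and the stand-in `resolve_id` function are all hypothetical. In particular, `resolve_id` is only a placeholder for the real de-duplication process and keeps PPI in plain text, which the actual system does not do.

```python
PPI_FIELDS = ("first_name", "last_name", "dob")  # hypothetical set of PPI fields


def integrate_staging(staging, warehouse, resolve_id):
    """Drain staging: strip PPI from each record and store the rest with a unique ID."""
    integrated = 0
    while staging:
        record = staging.pop(0)  # each record leaves staging as it is integrated
        ppi = {k: record[k] for k in PPI_FIELDS if k in record}
        service = {k: v for k, v in record.items() if k not in PPI_FIELDS}
        service["unique_id"] = resolve_id(ppi)  # only the identifier travels onward
        warehouse.append(service)
        integrated += 1
    return integrated


# Placeholder for the de-duplication process (illustration only, not secure).
_known = {}


def resolve_id(ppi):
    key = tuple(sorted(ppi.items()))
    return _known.setdefault(key, f"id-{len(_known) + 1}")


staging = [
    {"first_name": "Jane", "last_name": "Doe", "dob": "1980-01-01", "program": "Shelter"},
    {"first_name": "Jane", "last_name": "Doe", "dob": "1980-01-01", "program": "Outreach"},
]
warehouse = []
count = integrate_staging(staging, warehouse, resolve_id)
```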
Step 8 • Remove remaining raw encrypted records • Following a successful integration into the data warehouse, any remaining raw encrypted records for that upload are permanently cleared from the secure staging area • Note: Records are also cleared if the integration is unsuccessful – done after all necessary information is provided to the CoC.
Step 9 • Send final upload statistics • Reports can be emailed to all users for a given CoC that have supplied an email address
HMIS Data Upload Feedback • Receive Detailed Feedback of File Import Results • Total Records On File • # of New Individuals Added, Matched, and in Error • # of Services Added and Invalid • Reasons for file rejection
Step 10 • Utilize updated warehouse for reports • Separate warehouse processes can arrange data to enable expedited reporting runs
Next Steps • OTDA will continue to develop the technical and programmatic aspects of the NYS Data Warehouse • OTDA will work with STEHP recipients to begin Phase I of the NYS Data Warehouse • OTDA will continue convening the NYS Data Warehouse Workgroup
NYS Data Warehouse Workgroup • The Data Warehouse Workgroup consists of invested members of the NYS HMIS community • The Data Warehouse Workgroup assists NYS with the implementation of the NYS Data Warehouse project
NYS Data Warehouse Workgroup • Members will consist of: • Grantees under the NYS Solutions to End Homelessness Program (STEHP) • CoC volunteer agencies • HMIS Administrators • OTDA agency representatives
NYS Data Warehouse Workgroup • Key roles include: • identify implementation concerns • provide feedback • identify training needs • provide overall assistance with the HMIS data warehouse project • provide input on project policy as needed