Privacy, Confidentiality and Data Security (PCDS) in HSR: Best Practices Alan M. Zaslavsky Department of Health Care Policy Harvard Medical School
Privacy, Confidentiality and Data Security (PCDS) • Importance and sensitivity of PCDS • Basic concepts of disclosure risk • Deidentification and reidentification • Disclosure control • Institutional and regulatory frameworks • Common Rule, HIPAA, Data use agreements • File organization, data flow and computer security
This presentation is offered in our department at least annually • Required attendance by all programmers, students, fellows, and project managers with data responsibilities • Presented to faculty at meetings • Shortened version for lower-level staff • Tracking of attendance by personnel manager • Sanction is loss of computer account • Seek to fully involve project management in PCDS issues
Definitions • Privacy: the right of an individual to keep information about herself or himself from others. • Confidentiality: safeguarding, by a recipient, of information about another individual • Disclosure: release (direct or indirect) of information about an identifiable individual
Definitions (continued) • Data security: protections on data to prevent unauthorized access or destruction • Informed consent: a person's agreement to allow personal data to be provided for research and statistical purposes • Research: study producing generalizable knowledge • excludes internal operations, quality assurance
Importance of PCDS Nexus for balance between • benefits of information to society • possible harms of information use to individuals in conducting the research enterprise. One person’s “invasion of privacy” is another’s “essential use of information.”
Inherent conflicts • Law enforcement / legal process • General access to research data • Freedom of Information Act (FOIA) • Commercial use / beneficial products & services? • Prevention of harm • Need to save data for verification, revision
Costs of violations of PCDS • Damage to subjects • Material • Psychological/social • Damage to the research enterprise • Exposure to legal/administrative sanctions for researchers and data providers and their institutions
Direct and indirect identifiers Key: a variable or combination of variables whose value makes a record unique in both the target and the population data. Direct identifier: information that is uniquely associated with a person. Indirect identifier: data that, in combination, are uniquely associated with a person; also information that facilitates such associations.
Direct Identifiers (keys) • Name • Telephone number • Street/e-mail address • Unique features (SSN, Medicare ID, Health plan, Medical record #, Certificate/License, voice prints and fingerprints, photos)
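Re-identification by Matching [Diagram] • De-identification: the original target file (Name plus variables a through l) becomes the anonymized target file (variables a through l) • Re-identification: the anonymized target file (variables a through l) is matched to a population file (variables a through f and m through p, plus Name) on the shared key variables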
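The matching step in the diagram can be sketched in a few lines. This is only an illustration under assumed inputs: the field names (`birthdate`, `gender`, `zip`, `diagnosis`, `name`), the toy records, and the function `reidentify` are all hypothetical and do not come from the presentation.

```python
from collections import defaultdict

def reidentify(anonymized, population, key_vars):
    """Link anonymized records to named population records on shared key variables."""
    # Index the (hypothetical) population file by the values of the key variables.
    index = defaultdict(list)
    for person in population:
        index[tuple(person[v] for v in key_vars)].append(person["name"])

    matches = []
    for record in anonymized:
        candidates = index.get(tuple(record[v] for v in key_vars), [])
        # Only a single candidate counts as an unambiguous re-identification.
        if len(candidates) == 1:
            matches.append((candidates[0], record))
    return matches

# Toy data, invented for illustration only.
anonymized = [
    {"birthdate": "1950-03-14", "gender": "F", "zip": "04101", "diagnosis": "asthma"},
    {"birthdate": "1972-11-02", "gender": "M", "zip": "04330", "diagnosis": "diabetes"},
]
population = [
    {"name": "A. Smith", "birthdate": "1950-03-14", "gender": "F", "zip": "04101"},
    {"name": "B. Jones", "birthdate": "1972-11-02", "gender": "M", "zip": "04330"},
    {"name": "C. Brown", "birthdate": "1972-11-02", "gender": "M", "zip": "04330"},
]
matches = reidentify(anonymized, population, ["birthdate", "gender", "zip"])
```

In this toy case only the first record is linked to a name; the second has two candidates in the population file, so the match is ambiguous and is not reported.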
Data in Combination Variables might be identifying in combination that are not identifying by themselves • Month, day and year of birth • Gender • Zip code
Example of reidentification using three variables • Percentage of records unique in the Maine state voter registration list, by variables used • Birthdate alone: 12% • Birthdate + gender: 29% • Birthdate + Zip (5): 69% • Birthdate + Zip (9): 97% • Source: Sweeney, 1997
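Percentages like those on this slide can be computed for any file by counting how many records carry a combination of values that occurs exactly once. The sketch below assumes records are dictionaries; the function name `percent_unique`, the field names, and the four toy records are hypothetical.

```python
from collections import Counter

def percent_unique(records, variables):
    """Percentage of records whose combination of `variables` occurs exactly once."""
    if not records:
        return 0.0
    combos = [tuple(r[v] for v in variables) for r in records]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return 100.0 * unique / len(records)

# Toy stand-in for a population file, invented for illustration only.
voters = [
    {"birthdate": "1950-03-14", "gender": "F", "zip5": "04101"},
    {"birthdate": "1950-03-14", "gender": "M", "zip5": "04101"},
    {"birthdate": "1972-11-02", "gender": "M", "zip5": "04330"},
    {"birthdate": "1972-11-02", "gender": "M", "zip5": "04401"},
]
for variables in (["birthdate"], ["birthdate", "gender"], ["birthdate", "gender", "zip5"]):
    print(variables, percent_unique(voters, variables))
```

Running the same function on a candidate release file, with the variables an intruder is likely to know, gives a rough first look at which combinations need coarsening.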
Population (External) Data Bases • Voter Registration Lists • Research files • State & Federal Files • Survey files with added administrative data • Information Vendor Files • The unknown: what might an “intruder” know about some or all members of your population?
Identifiable population groups (entire data set highly identifiable) • Rare diseases • Sample drawn from a particular area
Unique/unusual cases: rare values • 110-year-old woman • Man who weighs 350 pounds • Income > $100 million • Verbatim text containing identifying details
Unique/unusual cases: rare combinations of values • 16-year-old widow • 20-year-old Ph.D. • Asian race in rural Midwest • Female/Asian Executive • 60-year-old male married to 30-year-old female • Cause of death = prostate cancer for 30-year-old male
Micro Data Protection 1 • Remove direct identifiers • Restrict geographical detail • Code to remove detail – larger categories, top/bottom coding • Remove, code or edit verbatim comments • Case suppression • Variable suppression
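Several of the steps above (removing direct identifiers, restricting geographic detail, coding to larger categories, top coding) can be sketched as simple record transformations. Everything specific here is an assumption made for illustration: the field names, the 10-year age bands, the age top code of 90, the 3-digit ZIP prefix, and the income cap of 200,000 are not taken from the presentation.

```python
def age_band(age, width=10, top=90):
    """Code age into bands of `width` years, top-coding everything at or above `top`."""
    if age >= top:
        return f"{top}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def top_code(value, cap):
    """Replace any value above `cap` with `cap`."""
    return min(value, cap)

def protect(record):
    """Return a coarsened copy of a record (hypothetical field names)."""
    out = dict(record)
    for field in ("name", "phone"):          # remove direct identifiers
        out.pop(field, None)
    out["zip"] = record["zip"][:3]           # restrict geographic detail
    out["age"] = age_band(record["age"])     # larger categories plus top coding
    out["income"] = top_code(record["income"], 200_000)
    return out

# Toy record combining two of the rare values mentioned earlier.
example = {"name": "A. Smith", "phone": "555-0100", "zip": "04101",
           "age": 110, "income": 100_000_000}
protected = protect(example)
```

The thresholds would in practice be chosen from the distribution of the data and the disclosure risk that remains after coding, not fixed in advance as they are here.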
Micro Data Protection 2 • Special handling (e.g. coding) of data from external sources (esp. area data) • Statistical modification (“noise”) • Sample/subsample • Eliminate link between persons and establishments
Tabular data • Information on individuals deduced from unique cases in tables • Reidentification usually related to small groups, small cell counts • Rounding, cell suppression, complementary suppression might be required
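A minimal sketch of rounding and of primary plus complementary suppression for one table row follows. It assumes that the row total is published, that counts from 1 up to a threshold of 5 are sensitive, and that zeros are not; these rules and the function names are illustrative assumptions, and full complementary suppression across rows and columns is normally left to dedicated software.

```python
def round_to_base(count, base=5):
    """Round a non-negative integer count to the nearest multiple of `base`."""
    return base * ((count + base // 2) // base)

def suppress_row(cells, threshold=5):
    """Suppress small cells in one row; suppressed cells are returned as None.

    Primary suppression hides counts between 1 and threshold - 1. If exactly
    one cell is hidden, it could be recovered by subtracting the others from
    the published row total, so the smallest remaining cell is hidden as well
    (a simple form of complementary suppression).
    """
    suppressed = {i for i, c in enumerate(cells) if 0 < c < threshold}
    if len(suppressed) == 1:
        remaining = [i for i in range(len(cells)) if i not in suppressed]
        if remaining:
            suppressed.add(min(remaining, key=lambda i: cells[i]))
    return [None if i in suppressed else c for i, c in enumerate(cells)]

print(suppress_row([40, 3, 25, 12]))
print([round_to_base(c) for c in [40, 3, 25, 12]])
```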
Technical issues • Highly technical issues in both microdata and tabular nondisclosure • Intersection of stats, math, computer science • Software for detecting disclosure risk • RTI, μ-ARGUS, etc. • Nontechnical variables • Resources and intentions of “intruder”
Disclosure control in released data • Affect us as producers and consumers of data • Masking • Affects analyses if performed on data we receive • Complex to implement on our releases • Limited access data centers
Restricted access data centers • Alternative to fully-deidentified public-use microdata files • Data are held at restricted center • Limited set of researchers submit analyses through intermediaries • Output reviewed for nondisclosure • Only feasible for organizations with substantial, persistent resources • e.g. NCHS, Census
Institutional and regulatory frameworks for PCDS • Common Rule / IRB • HIPAA • Data Use Agreements • State regulations
Common Rule • Governs protection of research subjects in all Federally-funded research • IRB evaluates adherence by researcher • Institutional sanctions for violations • Many institutions extend to all research • Objective: protection of subject from harm • In HSR, often there is no intervention • Typically, commitment to minimal risk of disclosure
Common Rule (continued) • Informed consent • generally required in primary data-collection • appropriate information about use of data • might be waived where impractical to obtain (e.g. intrusive), if risks minimal & rights not injured • Exemption from (full) review • No intervention that could harm subject • Secondary data with no identifiable data • Requires determination by IRB (but less tedious)
Implications for researchers • Commitments are made • To subjects: consent language • To IRB: safeguards promised in IRB application • To funding agencies: in grant application • May involve • Protection of data while used • Limits on duration of use
HIPAA Health Insurance Portability and Accountability Act • Specific rules for electronic transmission of health data • Primarily for efficiency but includes Privacy Rule • Obligations imposed on health care providers • Includes direct providers, health plans and insurers • Research data distinguished from health plan / provider operational functions • Researchers must respect these obligations
Who is Covered by HIPAA? • A health care provider who transmits health information in electronic transactions Example: a physician or hospital who electronically bills for services • A health plan • A health care clearinghouse
HIPAA implications for research • Practical implications of HIPAA • What data providers will be looking for • Need to work around restrictions on content • More elaborate paths for data control • HIPAA provisions for releasing data for research • fully deidentified • limited use dataset • waiver
Option 1: De-identified Health Information • Completely de-identified information (18 elements removed) and no knowledge that remaining information can identify the individual, OR • Statistically “de-identified” information, where a qualified statistician determines that there is a “very small risk” that the information could be used to identify the individual and documents the methods and analysis.
Removal of These Identifiers Makes Information De-identified • Certificate/license #s • VIN and Serial #s, license plate #s • Device identifiers, serial #s • Web URLs • IP address #s • Biometric identifiers (fingerprints) • Full-face and comparable photo images • Unique identifying #s • Names • Geographic info (including city and ZIP) • Elements of dates (except year) • Telephone #s • Fax #s • E-mail address • Social Security # • Medical record, prescription #s • Health plan beneficiary #s • Account #s. If the covered entity has actual knowledge that the remaining information can be used to identify the individual, the information is considered individually identifiable and is therefore generally PHI.
Option 2: Limited Data Set with Data Use Agreement • The Privacy Rule permits limited types of identifiers to be released for research with health information (referred to as a Limited Data Set). • Limited Data Sets can only be used and released in accordance with a Data Use Agreement between the covered entity and the recipient.
Limited Data Set w/ Data Use Agreement • The Limited Data Set CAN contain • Elements of Dates • City and ZIP • Other unique identifiers, characteristics and codes not previously listed as direct identifiers (previous slide) • CANNOT contain other direct identifiers (among the 18)
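The difference between the fully de-identified option and the Limited Data Set, as summarized on these slides, can be sketched as two views of the same record. The field names and the grouping into sets below are hypothetical stand-ins for the identifier categories listed above, dates are assumed to be ISO strings (YYYY-MM-DD), and the regulation contains further conditions that this sketch does not model.

```python
# Hypothetical field names standing in for identifier categories on the slides.
DIRECT_IDENTIFIERS = {
    "name", "street_address", "phone", "fax", "email", "ssn",
    "medical_record_no", "health_plan_no", "account_no", "license_no",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id", "photo",
}
GEO_FIELDS = {"city", "zip"}                      # allowed only in a Limited Data Set
DATE_FIELDS = {"birth_date", "admission_date"}    # full dates only in a Limited Data Set

def release_view(record, mode):
    """Return a copy of `record` for release; mode is 'deidentified' or 'limited'."""
    if mode not in ("deidentified", "limited"):
        raise ValueError(f"unknown mode: {mode}")
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue                              # never released under either option
        if field in GEO_FIELDS:
            if mode == "limited":
                out[field] = value
            continue
        if field in DATE_FIELDS:
            # De-identified view keeps only the year (assumes YYYY-MM-DD strings).
            out[field] = value if mode == "limited" else value[:4]
            continue
        out[field] = value
    return out

# Toy record, invented for illustration only.
patient = {"name": "A. Smith", "ssn": "000-00-0000", "city": "Portland",
           "zip": "04101", "birth_date": "1950-03-14", "diagnosis": "asthma"}
deidentified = release_view(patient, "deidentified")
limited = release_view(patient, "limited")
```

A field-list filter of this kind addresses only the listed elements; free-text fields and rare combinations of the remaining variables still need separate review.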
Option 3: Waiver of Authorization May use or disclose personal information for research if the IRB or Privacy Board determines that: • research involves no more than minimal risk • research does not adversely affect the “rights and welfare” of subjects • the research could not be done without a waiver
Data Use Agreements (DUA) • Between data provider and data user • Restrictions: • access by specific personnel • use for a specific reason • defined duration of retention • Implements commitments made by data provider
State regulations • Vary from state to state • Some are relatively restrictive • Require negotiation with data provider
Iron-clad protection? • Certificate of Confidentiality • Issued by DHHS • Protects data against legal process • Typically for sensitive topics, e.g. illicit drugs • O, Canada!
Data security in complex projects • Multisite projects: special needs • Careful mapping of data flow and access • Minimal identifying information at each stage • Particular care in technical aspects of security
File management for PCDS • General practices of good management • Practices necessary to maintain project continuity • Well-structured directory organization and naming • Include documentation with files • Separate project data from personal directories • Separate datasets from programs • Separate raw data from analytic datasets
We typically follow this presentation with a 15-minute tutorial on good practices for data and file management
Backups • Conflict of privacy/confidentiality (restrict) and data security (maintain) • Basic backup schedule (undeletable) • All Unix files: 4 month retention • PC files: 2 month retention • Project-specific backup: by request • Only possible if material is properly organized • Permanent media, physical security
The backup policy described here was adopted after several months of faculty discussion • Computer system managers wanted longer retention • Faculty concerned about unexpected discovery of material intended to be deleted • Conflicts of DUA requirements with rules regarding retention of data for verification, revision of manuscripts, etc.
General computer security • Proper use of computer accounts, only by authorized individuals • Secure connections for outside access • Remote users • Home or “on road” access via Internet • Applications can be “tunneled” securely • Good practices with passwords • Maintain file permissions to restrict access to authorized users
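On the Unix side, the last point can be checked mechanically. The sketch below assumes POSIX permission bits and a policy that project files should carry no permissions for "other" users; the function names and that policy are illustrative assumptions, not the department's actual tooling.

```python
import os
import stat

def overly_permissive(root):
    """List paths under `root` that grant any permission to 'other' users."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # symbolic links carry no meaningful permissions of their own
            if mode & stat.S_IRWXO:
                found.append(path)
    return sorted(found)

def restrict_to_owner_and_group(path):
    """Clear the read, write and execute bits for 'other' on one path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~stat.S_IRWXO)
```

A typical use would be to run `overly_permissive` on a project directory, review the list, and then call `restrict_to_owner_and_group` on each path that should not be generally readable.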
We follow this up with a training on mechanics of computer security • Permissions, file organization, etc. • More or less fine-grained tools for protection of various files • IT staff included in training • Responsible for implementing security and data retention policies for various project datasets • Teach methods for both Unix and Windows sides of our system
Conclusions • Know your data • Be prepared to accommodate restrictions required by data providers • Maintain general security • Seek guidance for tough situations!