This presentation provides an overview of the scoring approach developed by methodologists and subject matter experts to identify 'safe' tables of frequency counts in administrative crime surveys. It discusses the variables available in the Uniform Crime Reporting Incident-based Survey (UCR) and Homicide Survey, as well as the disclosure issues and control rules for tabular administrative justice data.
Disclosure Control for Tables of Frequency Counts using Administrative Justice Data • Sarah Franklin • October 30th, 2013
Overview of presentation • In 2013, the Canadian Centre for Justice Statistics (CCJS) placed two administrative crime surveys in the Research Data Centres (RDCs) • Methodologists and subject matter experts developed a scoring approach for tables of frequency counts to identify ‘safe’ tables • Each variable in a dataset is assigned a sensitivity score. A table’s overall score is the sum of the variable scores. If the score is below a given threshold, the table is safe.
Uniform Crime Reporting Incident-based Survey (UCR) and the Homicide Survey • Administrative datasets • Mandatory reporting by all police services • Criminal incidents substantiated by the police • UCR is a sample of crime data • not all crime comes to the attention of the police • over 2 million incidents of crime annually • Homicide Survey data more sensitive than UCR • All homicides; 543 homicides in 2012 • Information on incident, victim(s), accused(s)
UCR and Homicide Survey variables available to researchers • most serious violation for the incident of crime (e.g., homicide, robbery, mischief) • geography (region, province, CMA) • location (e.g., residential home, convenience store) • weapon causing injury (e.g., handgun, knife) • relationship between victim and accused • age and sex of victim and/or accused • clearance status (accused charged vs cleared otherwise)
Publicly available STC police-reported crime data UCR and homicide data available to all Canadians: • CANSIM tables (very aggregate) • Tables and graphs appearing in Juristat articles • Custom tabulations upon request
2009 RDC Pilot - Homicide Survey • Homicide Survey was available through RDCs • Results positive: 4 proposals submitted and 3 research reports completed • Researchers commented on the ease of use of the data file, the documentation and the wealth of data/information • Researchers noted that vetting of data tables took too long • RDC analysts noted that the data disclosure rules were difficult to implement and required additional work
Disclosure Issues: What are we concerned about? • Statistics Act, paragraph 17(1)(b): No person […] shall disclose […] any information obtained under this Act in such a manner that it is possible from the disclosure to relate the particulars obtained from any individual return to any identifiable individual person, business or organization. • Main disclosure issues: • Identity disclosure: can identify an individual • Attribute disclosure: learn something new • Group attribute disclosure: learn something about a group • Residual disclosure: disclosure by combining results
RDC disclosure control rules for tabular administrative justice data • Scoring approach developed by the Institut de la Statistique du Québec and used for all STC administrative datasets in the RDCs • assign a sensitivity score to each variable • table’s score = sum of variables’ scores • if the table score is greater than a threshold value, the table cannot be released • in that case, either go back and use more aggregated variables with lower scores, or perform controlled rounding
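The fallback described on this slide (swap detailed variables for more aggregated ones until the table passes) can be sketched as follows. This is only an illustration: the variable names, the detailed/aggregated pairings and every score except the 8 for police service id are hypothetical, and the passing limit of 7 is taken from the scoring slide later in the deck. Controlled rounding is not shown.

```python
# Hypothetical scores for illustration only; the real CCJS scores differ.
SCORES = {
    "age_detailed": 4, "age_aggregated": 2,
    "weapon_detailed": 3, "weapon_aggregated": 1,
    "province": 2,
    "police_service_id": 8,  # stated in the deck: cannot appear in a table
}

# Hypothetical mapping from a detailed variable to its aggregated version.
AGGREGATED_FORM = {
    "age_detailed": "age_aggregated",
    "weapon_detailed": "weapon_aggregated",
}

THRESHOLD = 7  # a table with a score greater than this cannot be released


def score(variables):
    """Table score = sum of the scores of the variables in the table."""
    return sum(SCORES[v] for v in variables)


def coarsen_until_safe(variables):
    """Replace detailed variables with aggregated ones until the table passes.

    Returns the adjusted variable list, or None if no aggregation helps.
    """
    current = list(variables)
    while score(current) > THRESHOLD:
        candidates = [v for v in current if v in AGGREGATED_FORM]
        if not candidates:
            return None  # nothing left to aggregate
        # Aggregate the variable that lowers the table score the most.
        best = max(candidates, key=lambda v: SCORES[v] - SCORES[AGGREGATED_FORM[v]])
        current[current.index(best)] = AGGREGATED_FORM[best]
    return current
```

For example, a table crossing detailed age, detailed weapon and province scores 9 under these made-up values; aggregating age brings it down to 7, at which point it can be released without aggregating weapon as well.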
Reviewed all variables to appear on the RDC files • Identified variables to be excluded due to: • unique identifiers: name of victim/accused, date of birth of victim/accused, fingerprint of accused, incident file identifier • data quality issues: aboriginal variable, firearm information (registered, licensed) • too sensitive: homicide victim was pregnant, blood alcohol level of homicide victim, person accused of homicide has suspected mental or developmental disorder
Aggregated sensitive codes of variables • Incident clearance status (UCR, Homicide Survey): suicide of accused → cleared otherwise • Most serious violation aggregations • Homicide Survey: 1st degree murder, 2nd degree murder → murder; manslaughter, infanticide → other homicides • UCR: sexual violations against children → other sexual assaults
Scored all UCR variables to appear on the RDC files • 0 = not sensitive: region=national; sex of victim/accused; vehicle type; target vehicle; motor vehicle recovered; fraud type; property stolen; location of incident; attempted vs completed violation; most serious weapon status • 1-7 = sensitive but can be used in a table • 8 = sensitive, cannot appear in a table: police service id, exact date of incident • Table threshold: ≤ 7 pass; ≥ 8 fail
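The pass/fail rule on this slide amounts to a sum and a comparison. The sketch below is a minimal illustration, not the CCJS implementation: scores marked "from the slides" are stated in this presentation, while the scores for province and detailed age are hypothetical placeholders chosen within the 1-7 range.

```python
VARIABLE_SCORES = {
    "sex_of_victim": 0,            # from the slides: not sensitive
    "location_of_incident": 0,     # from the slides: not sensitive
    "relationship_aggregated": 3,  # from the slides: aggregated relationship
    "police_service_id": 8,        # from the slides: cannot appear in a table
    "province": 2,                 # hypothetical value
    "age_of_victim_detailed": 4,   # hypothetical value
}

MAX_PASSING_SCORE = 7  # from the slides: <= 7 pass, >= 8 fail


def table_score(variables):
    """A table's score is the sum of the scores of its variables."""
    return sum(VARIABLE_SCORES[v] for v in variables)


def is_safe(variables):
    """A table passes when its score does not exceed the passing limit."""
    return table_score(variables) <= MAX_PASSING_SCORE


if __name__ == "__main__":
    table = ["sex_of_victim", "relationship_aggregated", "province"]
    print(table_score(table), is_safe(table))
```

Because a score of 8 already exceeds the limit on its own, any table that includes a variable such as police service id fails regardless of what else it contains.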
Sensitive variables on the UCR, Homicide surveys • Variables deemed sensitive (score 1-7): • geography (region, province, CMA) • age of victim/accused (aggregated, detailed) • most serious violation (aggregated) • most serious weapon (aggregated, detailed) • clearance status (aggregated, detailed) • level of injury (aggregated, detailed) • relationship of victim and accused (aggregated, detailed)
Aggregated relationship between victim and accused (score=3) Homicide victim was killed by: • Family – spouse • Family – parent • Family – other • Other intimate relationship • Casual acquaintance • Criminal relationship • Stranger • Other • Unknown, n/a
Factors considered when scoring a variable • Scores, thresholds consistent across surveys • Maximum number of dimensions for RDC tables • 8 dimensions for UCR; 3 for Homicide • Homicide data: single year vs 10 year data • Wanted scores to work for all CCJS tables: • UCR scores: passed all CANSIM and Juristat tables • Homicide scores: passed all CANSIM tables but not all Juristat tables
Factors considered when scoring a variable • Principle behind scoring approach: • table is safe as long as sensitive characteristics cannot be attributed to a person or a group • Scrutinized tables with scores < 8 for sensitive characteristics revealed through: • Identity disclosure: examined cells with counts of 1 or 2 • Attribute disclosure: examined full cells, zero cells
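The cell-level review described on this slide can be approximated with a simple scan of a two-way table. The sketch below rests on an assumption: a "full cell" is read here as a cell that holds every record in its row, so that everyone in that group shares the attribute. The function name and the table layout are hypothetical.

```python
def flag_cells(table):
    """Flag cells that merit a closer look in a two-way frequency table.

    `table` maps each row label to a dict of column label -> count.
    Returns a list of (row, column, reason) tuples.
    """
    flags = []
    for row, cells in table.items():
        row_total = sum(cells.values())
        for col, count in cells.items():
            if count in (1, 2):
                # Identity disclosure risk: very few individuals in the cell.
                flags.append((row, col, "small count"))
            elif count == 0:
                # Attribute disclosure risk: nobody in the group has this value.
                flags.append((row, col, "zero cell"))
            elif count == row_total:
                # Attribute disclosure risk: the whole group has this value.
                flags.append((row, col, "full cell"))
    return flags
```

With made-up counts such as `{"A": {"x": 5, "y": 0}, "B": {"x": 1, "y": 4}}`, the scan flags the full cell and the zero cell in row A and the count of 1 in row B. A flag is only a prompt for review; whether the cell actually reveals a sensitive characteristic remains a judgement call.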
Extract of UCR table with score=7 • Sexual violation incidents, victim=female age 25-34, accused=male, Canada, 2011
Status of UCR, Homicide RDC pilots UCR: • Crime data for 2007-2011 available in RDCs • 7 research proposals submitted and accepted • Disclosure control vetting committee for the pilot • ensure disclosure control rules applied correctly • evaluate/fine tune disclosure control approaches Homicide: • Homicide data for 1961-2011 available in RDCs
Pros and cons of scoring • Pros • easy for RDC researchers and CCJS to apply rules • rules are consistently applied • no distortion of data • Cons • determining scores and thresholds is time-consuming • difficult to determine scores if lots of variables or variables have lots of categories • for Homicide, the pass/fail scoring approach for RDCs is very restrictive • not immune to residual disclosure
Conclusion The scoring approach for frequency counts works well: • for crime-reported data and effectively mimics subject matter experts’ judgement when vetting • for census administrative data with an extensive history of published tables that set the standard for releasing tables • when there are a manageable number of variables and categories within variables Once developed, the scoring approach is easy to apply
For more information, please contact / Pour plus de renseignements, veuillez contacter: Sarah Franklin Sarah.Franklin@statcan.gc.ca