Anti-discrimination and privacy protection in released datasets Sara Hajian Josep Domingo-Ferrer
Data mining • There are negative social perceptions about data mining, including concerns about: • Potential privacy invasion • Potential discrimination
Discrimination • Discrimination is unfair or unequal treatment of people based on membership in a category or a minority, without regard to individual merit.
Discrimination • Example: U.S. federal laws prohibit discrimination on the basis of: • Race, color, religion, nationality, sex, marital status, age, pregnancy • In a number of settings: • Credit/insurance scoring • Sale, rental, and financing of housing • Personnel selection and wages • Access to public accommodations, education, nursing homes, adoptions, and health care.
Discrimination • Discrimination can be either direct or indirect: • Direct discrimination occurs when decisions are made based on sensitive attributes. • Indirect discrimination occurs when decisions are made based on non-sensitive attributes which are strongly correlated with biased sensitive ones.
Discrimination in Data mining • Automated data collection and data mining techniques such as classification rule mining have paved the way to automated decision making: • Loan granting/denial • Insurance premium computation • Personnel selection and wages
Discrimination in Data mining • If the training datasets are biased with respect to discriminatory attributes such as gender, race, or religion, discriminatory decisions may ensue. • Anti-discrimination techniques have been introduced in data mining: • Discrimination discovery • Discrimination prevention
Discrimination in Data mining • Discrimination discovery • Consists of supporting the discovery of discriminatory decisions hidden, either directly or indirectly, in a dataset of historical decision records.
Discrimination Discovery • Different measures of the discriminatory power of the mined decision rules can be defined, according to the provisions of different anti-discrimination regulations: • Extended lift (elift) • Selection lift (slift)
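The slide names the two measures without defining them. A minimal sketch, assuming the definitions commonly used in the discrimination discovery literature: for a rule A, B → C, where A is a protected itemset, B is the context and C is the decision, elift = conf(A, B → C) / conf(B → C) and slift = conf(A, B → C) / conf(¬A, B → C). The toy records, the attribute names and the helper functions below are hypothetical and serve only as illustration.

```python
import math
from typing import Dict, List

Record = Dict[str, str]
Itemset = Dict[str, str]


def matches(record: Record, itemset: Itemset) -> bool:
    """True if the record contains every attribute=value pair of the itemset."""
    return all(record.get(attr) == val for attr, val in itemset.items())


def confidence(covered: List[Record], outcome: Itemset) -> float:
    """Fraction of the covered records that also satisfy the outcome."""
    if not covered:
        return 0.0
    return sum(matches(r, outcome) for r in covered) / len(covered)


def ratio(num: float, den: float) -> float:
    """Ratio of two confidences; reported as infinity when the denominator is 0."""
    return num / den if den > 0 else math.inf


def elift(records: List[Record], protected: Itemset,
          context: Itemset, outcome: Itemset) -> float:
    """Extended lift: conf(A,B -> C) / conf(B -> C)."""
    in_context = [r for r in records if matches(r, context)]
    group = [r for r in in_context if matches(r, protected)]
    return ratio(confidence(group, outcome), confidence(in_context, outcome))


def slift(records: List[Record], protected: Itemset,
          context: Itemset, outcome: Itemset) -> float:
    """Selection lift: conf(A,B -> C) / conf(not A,B -> C)."""
    in_context = [r for r in records if matches(r, context)]
    group = [r for r in in_context if matches(r, protected)]
    others = [r for r in in_context if not matches(r, protected)]
    return ratio(confidence(group, outcome), confidence(others, outcome))


def make(gender: str, credit: str, n: int) -> List[Record]:
    """Build n identical toy records (hypothetical data)."""
    return [{"gender": gender, "city": "NYC", "credit": credit} for _ in range(n)]


# Toy dataset: 4 women (3 denied), 6 men (2 denied), all in the same context.
records = (make("female", "deny", 3) + make("female", "grant", 1)
           + make("male", "deny", 2) + make("male", "grant", 4))

A = {"gender": "female"}   # protected itemset
B = {"city": "NYC"}        # context
C = {"credit": "deny"}     # decision

elift_value = elift(records, A, B, C)   # 0.75 / 0.50
slift_value = slift(records, A, B, C)   # 0.75 / (2/6)
print(f"elift = {elift_value:.2f}, slift = {slift_value:.2f}")
```

On this toy data, women in the context are denied credit 1.5 times as often as the context average (elift) and 2.25 times as often as men (slift). Which threshold makes a rule count as discriminatory depends on the applicable regulation.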
Discrimination in Data mining • Discrimination prevention • Consists of inducing patterns that do not lead to discriminatory decisions even if trained from a dataset containing them.
Discrimination Prevention • How can we train an unbiased classifier when the training data is biased? • As for privacy, the challenge is to find an optimal trade-off between (measurable) protection against unfair discrimination, and (measurable) utility of the data/models for data mining.
Discrimination Prevention • Methods: • Transform the source data • Modify the data mining methods • Modify the resulting discriminatory models
The framework • The framework for discrimination prevention can be described in terms of two phases: • Discrimination Measurement • Data Transformation
Data transformation • The purpose is to transform the original data DB so as to remove direct and/or indirect discriminatory biases, with minimum impact • on the data and • on legitimate decision rules, • so that no unfair decision rule can be mined from the transformed data.
Data transformation • As part of this effort, metrics should be developed that specify • which records should be changed, • how many records should be changed, • and how those records should be changed during data transformation.
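To make the three questions (which records, how many, and how) concrete, here is a deliberately simplified sketch. It is an illustration chosen for this write-up and is not presented as the authors' algorithm. It assumes that a rule A, B → C counts as discriminatory when its elift reaches a threshold alpha. It picks records in context B outside the protected group A that do not have decision C (which), and relabels them to C (how), one at a time until elift falls below alpha (how many). All names and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Itemset = Dict[str, str]


def matches(record: Record, itemset: Itemset) -> bool:
    """True if the record contains every attribute=value pair of the itemset."""
    return all(record.get(attr) == val for attr, val in itemset.items())


def conf(records: List[Record], premise: Itemset, outcome: Itemset) -> float:
    """Confidence of the rule premise -> outcome (0.0 if the premise never holds)."""
    covered = [r for r in records if matches(r, premise)]
    if not covered:
        return 0.0
    return sum(matches(r, outcome) for r in covered) / len(covered)


def elift(records: List[Record], protected: Itemset,
          context: Itemset, outcome: Itemset) -> float:
    """Extended lift conf(A,B -> C) / conf(B -> C); infinity if the base is 0."""
    base = conf(records, context, outcome)
    if base == 0:
        return float("inf")
    return conf(records, {**protected, **context}, outcome) / base


def relabel_until_below(records: List[Record], protected: Itemset,
                        context: Itemset, outcome: Itemset,
                        alpha: float) -> Tuple[List[Record], int]:
    """Return a transformed copy of the data and the number of changed records."""
    data = [dict(r) for r in records]      # never modify the original dataset
    changed = 0
    for r in data:
        if elift(data, protected, context, outcome) < alpha:
            break                          # rule is no longer alpha-discriminatory
        # Which records: in the context, outside the protected group, not yet C.
        if (matches(r, context) and not matches(r, protected)
                and not matches(r, outcome)):
            r.update(outcome)              # How: overwrite the decision with C
            changed += 1
    return data, changed


def make(gender: str, credit: str, n: int) -> List[Record]:
    """Build n identical toy records (hypothetical data)."""
    return [{"gender": gender, "city": "NYC", "credit": credit} for _ in range(n)]


records = (make("female", "deny", 3) + make("female", "grant", 1)
           + make("male", "deny", 2) + make("male", "grant", 4))
A, B, C = {"gender": "female"}, {"city": "NYC"}, {"credit": "deny"}

transformed, n_changed = relabel_until_below(records, A, B, C, alpha=1.2)
elift_before = elift(records, A, B, C)
elift_after = elift(transformed, A, B, C)
print(n_changed, round(elift_before, 3), round(elift_after, 3))
```

If the candidate records run out before the threshold is met, the function returns with elift still at or above alpha, so the caller should check the returned data. The number of changed records is a direct, if crude, indicator of the impact on the data that the transformation is meant to minimize.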
Utility measures • Measuring direct discrimination removal • Measuring indirect discrimination removal • Measuring Data Quality • Misses Cost (MC) • Ghost Cost (GC)
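The two data quality measures can be sketched as set operations over mined rules. This assumes the usual definitions: Misses Cost is the percentage of rules extractable from the original data that can no longer be extracted from the transformed data, and Ghost Cost is the percentage of rules extractable from the transformed data that were not extractable from the original. The rule strings below are hypothetical placeholders for whatever a rule miner would return.

```python
from typing import Set


def misses_cost(original_rules: Set[str], transformed_rules: Set[str]) -> float:
    """Percentage of original rules lost after the transformation."""
    if not original_rules:
        return 0.0
    lost = original_rules - transformed_rules
    return 100.0 * len(lost) / len(original_rules)


def ghost_cost(original_rules: Set[str], transformed_rules: Set[str]) -> float:
    """Percentage of rules in the transformed data that are new artifacts."""
    if not transformed_rules:
        return 0.0
    ghosts = transformed_rules - original_rules
    return 100.0 * len(ghosts) / len(transformed_rules)


# Hypothetical rule sets mined from the original and the transformed data.
rules_original = {
    "city=NYC -> credit=deny",
    "job=none -> credit=deny",
    "income=high -> credit=grant",
    "age=young -> credit=deny",
}
rules_transformed = {
    "city=NYC -> credit=deny",
    "job=none -> credit=deny",
    "savings=low -> credit=deny",   # ghost rule: not minable from the original
}

mc = misses_cost(rules_original, rules_transformed)   # 2 of 4 rules lost
gc = ghost_cost(rules_original, rules_transformed)    # 1 of 3 rules is new
print(f"MC = {mc:.1f}%, GC = {gc:.1f}%")
```

Lower values of both measures indicate that the transformation preserved more of the knowledge that can be mined from the data.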