This paper explores the major security threats in machine learning and proposes solutions to mitigate them. It also discusses the challenges of maintaining confidentiality in machine learning models and user data.
SoK: Security and Privacy in Machine Learning Nicolas Papernot∗, Patrick McDaniel∗, Arunesh Sinha†, and Michael P. Wellman† ∗ Pennsylvania State University † University of Michigan {ngp5056,mcdaniel}@cse.psu.edu, {arunesh,wellman}@umich.edu Presented by Jonny Dowdall CSE 914 2 Feb 2019
Objectives • Machine Learning: What is machine learning? Relevant terms. A few math proofs. • Threats: What are the major security threats in machine learning? • Solutions: What can we do about these threats?
Machine Learning An Over-Simplification
Machine Learning A computer’s ability to infer something based on experience rather than being explicitly programmed. An Over-Simplification
Machine Learning Given some input, produce some output. Supervised Learning
Machine Learning Train by providing explicit input and output pairs. Supervised Learning
Machine Learning Identify patterns in unlabeled data. Unsupervised Learning
Machine Learning No training in the conventional sense. Unsupervised Learning
Machine Learning Given a state, perform an action that leads you to an eventual desired state. Reinforcement Learning
Machine Learning No explicitly correct output for any input. Reinforcement Learning
Machine Learning Train by having an agent interact with an environment. Reinforcement Learning
Machine Learning Reward/penalize agent for reaching desired states. Reinforcement Learning
Machine Learning Typically, input data is represented as a vector (or matrix/tensor) of values. Call these input values features. Features
Machine Learning Some function transforms the input features into target output(s). Parameters
Machine Learning Function is typically a weighted sum of input features. Parameters
Machine Learning Call the weights parameters. Parameters
Machine Learning Start with random weights. Loss function
Machine Learning Grab a sample from the training data and compute some (bogus) output value(s). Loss function
Machine Learning The loss function tells us how far predicted output is from ground-truth output. Loss function
Machine Learning The gradient of the loss function tells us how to change our weights. Gradient
Machine Learning Next time we see that training sample, the loss should be smaller. Training
Machine Learning Rinse and repeat with all training samples until loss is low and weights stop changing. Training
Machine Learning Now, use these weights to predict values for new, unobserved inputs. This is our model. Inference
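To make the loop above concrete, here is a minimal sketch of training a linear model with gradient descent. The toy data, squared-error loss, and learning rate are assumptions for illustration, not taken from the slides.

```python
import numpy as np

# Hypothetical toy data: 100 samples, 3 features, linear ground truth plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # input features
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

w = rng.normal(size=3)                       # start with random weights (parameters)
lr = 0.1                                     # learning rate (assumed)

for epoch in range(200):
    y_hat = X @ w                            # weighted sum of input features
    loss = np.mean((y_hat - y) ** 2)         # how far predictions are from ground truth
    grad = 2 * X.T @ (y_hat - y) / len(y)    # gradient of the loss w.r.t. the weights
    w -= lr * grad                           # move the weights against the gradient

# The trained weights are the "model"; use them for inference on new, unseen inputs.
x_new = np.array([1.0, 0.0, -1.0])
print(w, x_new @ w)
```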
Confidentiality/Privacy Uncovering the ML model itself, which may be confidential intellectual property. Uncovering users’ data (model input/output), which may be private. Integrity Exploiting the model to produce outputs that don’t correspond to patterns in the training data. Availability Preventing access to an output or action induced by a model output. Threats (CIA)
White-box vs. Black-box Adversaries • White-box: access to the machine-learning model internals; can see parameters/architecture. • Black-box: only able to interact with the model; can provide input and view output.
Model extraction • Recover model parameters by querying the model on many inputs and observing its outputs (Tramer et al.). • Requires access to class probabilities.
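A minimal sketch of the equation-solving flavor of this attack against a logistic-regression victim: query the model, invert the sigmoid to recover logits, and solve a linear system for the parameters. The victim's hidden weights below are hypothetical stand-ins for a black-box API.

```python
import numpy as np

def victim_predict_proba(x, w_true=np.array([1.5, -2.0, 0.3]), b_true=0.7):
    # Stand-in for the black-box API; the attack only needs query access
    # to class probabilities, never these hidden parameters directly.
    return 1.0 / (1.0 + np.exp(-(x @ w_true + b_true)))

d = 3
rng = np.random.default_rng(1)
queries = rng.normal(size=(d + 1, d))          # d + 1 probing inputs
probs = np.array([victim_predict_proba(q) for q in queries])
logits = np.log(probs / (1 - probs))           # invert the sigmoid

# Solve the linear system [X | 1] [w; b] = logits for the parameters.
A = np.hstack([queries, np.ones((d + 1, 1))])
recovered = np.linalg.solve(A, logits)
print("recovered w, b:", recovered)            # matches w_true, b_true up to float error
```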
Membership Attack • Test whether or not a data point is in the training set. • Exploit differences in the model’s confidence to identify points it was trained on (Shokri). • Generate synthetic data until the model produces an output with very high confidence.
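A simplified sketch of the idea, using a plain confidence threshold rather than the full shadow-model attack of Shokri et al.; the confidences and the cutoff are assumptions.

```python
import numpy as np

def membership_guess(model_confidences: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Guess 'member' when the model is unusually confident on a point.

    model_confidences holds the model's probability for its predicted class on
    each candidate record. The threshold is an assumption; Shokri et al.
    instead train an attack classifier on shadow models rather than picking a
    fixed cutoff.
    """
    return model_confidences >= threshold

# Hypothetical confidences: training members tend to score higher.
confidences = np.array([0.99, 0.62, 0.97, 0.71])
print(membership_guess(confidences))   # [ True False  True False]
```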
Model Inversion • Fredrikson et al. present the model inversion attack. • Given the output of a model, predict the input. • For a medicine dosage prediction task, they show that given access to the model and auxiliary information about the patient’s stable medicine dosage, they can recover genomic information about the patient.
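A stripped-down, white-box sketch of the general inversion idea: start from a blank input and follow the confidence gradient until the model assigns high confidence to a target output. The linear model and step size are hypothetical; the actual attack on the slide additionally uses auxiliary information about the patient.

```python
import numpy as np

# White-box logistic-regression "model" (hypothetical parameters).
w = np.array([1.0, -2.0, 0.5])

def confidence(x):
    return 1.0 / (1.0 + np.exp(-(x @ w)))

# Invert: ascend the confidence gradient to find an input the model maps
# to a high-confidence positive output.
x = np.zeros(3)
for _ in range(100):
    p = confidence(x)
    grad = p * (1 - p) * w          # d(confidence)/dx for a sigmoid of a linear score
    x += 0.5 * grad                 # gradient ascent step

print(x, confidence(x))             # a reconstructed input the model scores near 1.0
```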
Analyzes the privacy guarantees of algorithms. Differential Privacy Framework
An algorithm’s output distribution should not change in a statistically significant way between two versions of the data that differ in only one record (Dwork et al.). Differential Privacy Framework
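Formally, for ε-differential privacy (the (ε, δ) variant adds an additive slack term δ on the right-hand side):

```latex
% A randomized algorithm \mathcal{A} is \varepsilon-differentially private if,
% for all datasets D and D' differing in a single record and all output sets S,
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{A}(D') \in S]
```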
Add randomization to the machine learning pipeline. Solutions?
Randomly add noise to every input while training. Solutions? Local Privacy
E.g., Google Chrome collects user data such that with probability q the reported value is real and with probability 1−q it is random; this still yields meaningful, privacy-preserving statistics (Erlingsson). Solutions? Local Privacy
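A minimal randomized-response sketch of this local-privacy idea; the parameter q = 0.75 and the single-bit setting are assumptions, and Erlingsson et al.'s RAPPOR mechanism is considerably more elaborate.

```python
import random

def randomized_response(true_bit: int, q: float = 0.75) -> int:
    """Report the real bit with probability q, a fair coin flip otherwise.

    The server never learns any individual's true value with certainty,
    yet aggregate statistics remain estimable.
    """
    if random.random() < q:
        return true_bit
    return random.randint(0, 1)

# Aggregate estimate: if p_hat is the fraction of 1s reported, the true rate
# is approximately (p_hat - (1 - q) / 2) / q.
reports = [randomized_response(1) for _ in range(10000)]
p_hat = sum(reports) / len(reports)
print((p_hat - (1 - 0.75) / 2) / 0.75)
```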
It was shown that adding random noise to the loss function during training provides differential privacy (Chaudhuri). Solutions? Perturbing loss
Randomly perturbing gradients before applying parameter updates guarantees even stronger differential privacy (Abadi). Solutions? Perturbing gradients
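A sketch of one noisy update in the style of Abadi et al.'s DP-SGD: clip each example's gradient, add calibrated Gaussian noise, then average. The clip norm, noise multiplier, and learning rate are assumed hyperparameters.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.05, rng=None):
    """One differentially private gradient step on a batch.

    per_example_grads has shape (batch, n_params).
    """
    if rng is None:
        rng = np.random.default_rng()
    # Clip each example's gradient so no single record can dominate the update.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Add Gaussian noise calibrated to the clip norm, then average over the batch.
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=clipped.shape[1])
    noisy_mean = (clipped.sum(axis=0) + noise) / len(clipped)
    return -lr * noisy_mean    # parameter update to apply

grads = np.random.default_rng(0).normal(size=(32, 10))
print(dp_sgd_step(grads))
```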
Randomly perturb output values. Solutions? Noisy output
This method degrades performance. Solutions? Noisy output
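For output perturbation, the standard tool is the Laplace mechanism: add noise scaled to the output's sensitivity divided by ε. The sensitivity and ε values below are assumptions; as the slide notes, smaller ε means more noise and worse utility.

```python
import numpy as np

def noisy_output(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Laplace mechanism: release the output plus noise of scale sensitivity/epsilon.

    sensitivity bounds how much one record can change the output; smaller
    epsilon gives stronger privacy at the cost of a noisier answer.
    """
    return true_value + np.random.laplace(scale=sensitivity / epsilon)

print(noisy_output(0.83, sensitivity=0.01, epsilon=0.5))
```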
“Precisely quantifying the learning algorithm’s sensitivity to training points is necessary to establish differential privacy guarantees. For nonconvex models (e.g., neural nets), current loose bounds on sensitivity require that learning be heavily randomized to protect data—often at the expense of utility.”
Training attack • Poisoning: input/label manipulation • The poisoned training distribution no longer matches the test distribution • Model learns to behave in an unintended way • Example: intentionally label genuine emails as spam
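A minimal sketch of the label-flipping variant of poisoning, mirroring the spam example above; the class names and poisoned fraction are illustrative.

```python
import numpy as np

def flip_labels(y: np.ndarray, target_class: int, poison_class: int, fraction: float, rng=None):
    """Label-flipping poisoning: relabel a fraction of one class as another.

    Corresponds to marking genuine emails as spam so the trained filter
    learns the wrong decision boundary.
    """
    if rng is None:
        rng = np.random.default_rng()
    y_poisoned = y.copy()
    idx = np.flatnonzero(y == target_class)
    chosen = rng.choice(idx, size=int(fraction * len(idx)), replace=False)
    y_poisoned[chosen] = poison_class
    return y_poisoned

y = np.array([0, 0, 0, 0, 1, 1])          # 0 = genuine, 1 = spam
print(flip_labels(y, target_class=0, poison_class=1, fraction=0.5))
```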
Assumption: Poisoned data points are typically outside the expected input distribution. Solution: Reduce the influence of outliers in training. Solutions?
Use a modified PCA algorithm: • Maximize the variance captured from legitimate training samples while reducing the influence of outliers (Rubinstein et al.). Solutions? Reduce outlier sensitivity
Add a regularization term to the loss function, which reduces the model’s sensitivity to outliers (Stempfel and Ralaivola). Solutions? Reduce outlier sensitivity
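A minimal sketch of the mechanism: the L2 penalty below is a generic stand-in for the regularizer, not Stempfel and Ralaivola's specific construction, and the regularization strength lam is an assumed hyperparameter.

```python
import numpy as np

def regularized_loss(w, X, y, lam=0.1):
    """Squared-error loss plus an L2 penalty on the weights.

    The penalty discourages large weights, so single outlying training
    points pull the learned model around less.
    """
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.dot(w, w)

w = np.array([0.5, -0.2])
X = np.array([[1.0, 2.0], [3.0, -1.0]])
y = np.array([0.1, 1.7])
print(regularized_loss(w, X, y))
```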
Inference attack • Adversarial example • Exploit weakness in model • Perturb a sample until it is misclassified • E.g. How can we get past malware detection by changing our malware as little as possible?
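One widely used way to craft such perturbations is the fast gradient sign method (Goodfellow et al.); the sketch below assumes white-box access to the loss gradient and an assumed budget epsilon = 0.1.

```python
import numpy as np

def fgsm_perturb(x, grad_loss_wrt_x, epsilon=0.1):
    """Fast gradient sign method: nudge every feature a small step epsilon
    in the direction that increases the model's loss on the true label.

    grad_loss_wrt_x is the loss gradient at x; for malware, the analogous
    goal is changing as few features as possible while keeping the
    malicious behavior intact.
    """
    return x + epsilon * np.sign(grad_loss_wrt_x)

x = np.array([0.2, -0.4, 1.0])
grad = np.array([0.7, -0.1, 0.0])
print(fgsm_perturb(x, grad))     # [ 0.3 -0.5  1. ]
```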
Inference attack • The more information provided in the output, the easier this is to exploit. • Especially easy when the system is an oracle. • Adversary can issue queries for any chosen input and observe the model output.
Inference attack • Lowd and Meek introduce ACRE learnability: finding the least-cost modification that causes a malicious input to be classified as benign, using a polynomial number of queries to the ML oracle.