
ONN the use of Neural Networks for Data Privacy



Presentation Transcript


  1. Instituto de Investigación en Inteligencia Artificial Consejo Superior de Investigaciones Científicas ONN the use of Neural Networks for Data Privacy Jordi Pont-Tuset Pau Medrano Gracia Jordi Nin Josep Lluís Larriba Pey Victor Muntés i Mulero

  2. Presentation Schema • Motivation • Basic Concepts • Ordered Neural Networks (ONN) • Experimental Results • Conclusions and Future Work

  3. Our Scenario: attribute classification • Classification of attributes: • Identifiers (ID) • Quasi-identifiers • Confidential (C) • Non-Confidential (NC) [Diagram: example table with its attributes labelled ID, C and NC]

  4. Data Privacy and Anonymization • Original Data → Released Data • An external data source allows record linkage through the non-confidential (NC) attributes • Confidential data disclosure! [Diagram: record linkage between the released data and an external data source]

  5. Data Privacy and Anonymization • Anonymization process goal: ensure protection while preserving statistical usefulness • Trade-off: accuracy vs. privacy • Studied in Privacy in Statistical Databases (PSD) and Privacy-Preserving Data Mining (PPDM) [Diagram: record linkage now attempted against the anonymized attributes NC']

  6. Presentation Schema • Motivation • Basic Concepts • Ordered Neural Networks (ONN) • Experimental Results • Conclusions and Future Work

  7. Best Ranked Protection Methods [DT01] • Rank Swapping (RS-p) [Moore96] • Sorts the values of each attribute and swaps them randomly within a restricted range of size p • Microaggregation (MIC-vm-k) [DM02] • Builds small clusters over v variables with at least k elements each • Then, it replaces each value by the centroid of the cluster to which it belongs [DT01] Domingo-Ferrer, J., Torra, V.: A quantitative comparison of disclosure control methods for microdata. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, Elsevier Science (2001) 111-133 [Moore96] Moore, R.: Controlled data swapping techniques for masking public use microdata sets. U.S. Bureau of the Census (unpublished manuscript) (1996) [DM02] Domingo-Ferrer, J., Mateo-Sanz, J.M.: Practical data-oriented microaggregation for statistical disclosure control. IEEE Trans. on KDE 14 (2002) 189-201
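As an illustration of the second method, here is a minimal univariate microaggregation sketch: sort, group into clusters of at least k consecutive values, replace each value by its cluster centroid. The function name and the handling of the last undersized cluster are illustrative assumptions, not details from [DM02].

```python
def microaggregate(values, k):
    """Univariate microaggregation sketch: sort, form clusters of at
    least k consecutive values, replace each value by its cluster mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    clusters = [order[i:i + k] for i in range(0, len(order), k)]
    # Assumption: a trailing cluster smaller than k is merged into the
    # previous one so every cluster has at least k elements.
    if len(clusters) > 1 and len(clusters[-1]) < k:
        clusters[-2].extend(clusters.pop())
    out = [0.0] * len(values)
    for cluster in clusters:
        centroid = sum(values[i] for i in cluster) / len(cluster)
        for i in cluster:
            out[i] = centroid
    return out
```

Note that means, variances across clusters and value order are roughly preserved, while individual values are hidden inside their cluster.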

  8. Our contribution... • We propose a new perturbative protection method for numerical data based on the use of neural networks • Basic idea: learning a pseudo-identity function (quasi-learning ANNs) • Anonymizing numerical data sets

  9. Artificial Neural Networks • Each neuron weights its inputs and applies an activation function: y = f(Σi wi · xi), where f is the sigmoid • For our purpose, we assume ANNs without feedback connections and without layer-bypassing
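The neuron computation on this slide can be sketched as follows (the explicit bias term and the function name are illustrative assumptions):

```python
import math

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of its inputs passed through a
    sigmoid activation (no feedback, no layer-bypassing)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))
```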

  10. Backpropagation Algorithm • Allows the ANN to learn from a predefined set of input-output example pairs • It adjusts the weights of the ANN iteratively • In each iteration, the error at the output layer is computed as the sum of squared differences between outputs and targets • Weights are updated using an iterative steepest-descent method
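A minimal sketch of the backpropagation loop described above, for a 1-input, 1-output network with one hidden sigmoid layer and squared-error loss. Network size, learning rate, epoch count and all names are illustrative assumptions, not the paper's parameterization.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(pairs, hidden=4, rate=0.5, epochs=5000, seed=0):
    """Backpropagation sketch: steepest descent on the squared error
    over a fixed set of (input, target) example pairs."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output
    b2 = rng.uniform(-1, 1)
    for _ in range(epochs):
        for x, t in pairs:
            h = [sigmoid(w1[j] * x + b1[j]) for j in range(hidden)]
            y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
            d_out = (y - t) * y * (1 - y)  # dE/dnet at the output
            for j in range(hidden):
                d_hid = d_out * w2[j] * h[j] * (1 - h[j])
                w2[j] -= rate * d_out * h[j]
                w1[j] -= rate * d_hid * x
                b1[j] -= rate * d_hid
            b2 -= rate * d_out
    def predict(x):
        h = [sigmoid(w1[j] * x + b1[j]) for j in range(hidden)]
        return sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
    return predict
```

The key to ONN is stopping this process while the fit is still imperfect, so the reproduced values stay close to, but not equal to, the originals.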

  11. Presentation Schema • Motivation • Basic Concepts • Ordered Neural Networks (ONN) • Experimental Results • Conclusions and Future Work

  12. Ordered Neural Networks (ONN) • Key idea: inaccurately learning the original data set, using ANNs, in order to reproduce a similar one: • Similar enough to preserve the properties of the original data set • Different enough not to reveal the original confidential values

  13. Ordered Neural Networks (ONN) • How can we learn the original data set? • Trying to learn the original data set with a single neural network is TOO COMPLEX

  14. Ordered Neural Networks (ONN) • How can we learn the original data set? • We could sort each attribute independently in order to simplify the learning process • However, the pattern to be learnt may still be too complex

  15. Ordered Neural Networks (ONN) • How can we learn the original data set? • Reordering each attribute separately means the concept of tuple is lost! • Why are we so keen on preserving the attribute semantics?

  16. Ordered Neural Networks (ONN) • Different approach: • We ignore the attribute semantics, mixing all the values in the database • We sort them to make the learning process easier • We partition the values into several blocks in order to simplify the learning process

  17. ONN General Schema

  18. Vectorization • ONN ignores the attribute semantics to reduce the learning process cost

  19. Sorting • Objective: simplify the learning process and reduce learning time

  20. Partitioning • The size of the data set may be very large • A single ANN would make the learning process very difficult • ONN uses a different ANN for each partition
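The vectorization, sorting and partitioning steps of slides 18-20 can be sketched together. Splitting into consecutive blocks of equal size is an assumption; the paper's exact partitioning strategy may differ.

```python
def vectorize_sort_partition(table, num_partitions):
    """ONN preprocessing sketch: flatten the table into one vector
    (attribute semantics ignored), sort it, and split it into P
    consecutive blocks of roughly equal size."""
    values = sorted(v for row in table for v in row)
    size = -(-len(values) // num_partitions)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]
```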

  21. Normalize • In order to make the learning process possible, it is necessary to normalize the input data • We normalize the values so that their images fit in the range where the slope of the activation function is relevant [FS91] [FS91] Freeman, J.A., Skapura, D.M.: Neural Networks: Algorithms, Applications and Programming Techniques. Addison-Wesley Publishing Company (1991) 1-106
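One possible sketch of this normalization: an affine map of the data into an interval of width b centred at 0.5, where the sigmoid is far from flat. The exact mapping and its relation to the ONN parameter B are assumptions, not taken from the paper.

```python
def normalize(values, b=0.5):
    """Sketch: map values affinely into [0.5 - b/2, 0.5 + b/2], the
    region where the sigmoid's slope is significant (b is assumed to
    play the role of the normalization-range parameter B)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on constant data
    return [0.5 - b / 2 + b * (v - lo) / span for v in values]
```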

  22. Learning Step • Given P partitions, we have one ANN per partition • Each ANN is fed with values coming from all P partitions in order to add noise • Backpropagation adjusts the ANN of partition p1 so that, fed with the first values (a, d, g) of partitions p1 = (a b c), p2 = (d e f) and p3 = (g h i), it outputs a' ≈ a

  23. Learning Step • Given P partitions, we have one ANN per partition • Each ANN is fed with values coming from all P partitions in order to add noise [Animation: the same scheme for the third values (c, f, i), producing c' ≈ c]
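Building the training set described on these two slides can be sketched as follows. The pairing of same-position values across partitions follows the slide diagram; the function name is an assumption.

```python
def training_pairs(partitions, i):
    """Sketch of the ONN learning step's training set for the ANN of
    partition i: each example's input is the j-th value from every
    partition, and its target is the j-th value of partition i."""
    return [([p[j] for p in partitions], partitions[i][j])
            for j in range(len(partitions[i]))]
```

Feeding values from the other partitions alongside the target's own partition is what injects noise into the learning process.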

  24. Protection Step • First, we propagate the original data set through the trained ANNs • Finally, we de-normalize the generated values [Diagram: (a, d, g) propagated through the ANN of p1 yields a', which is then de-normalized]
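The protection step can be sketched as below, with each trained ANN represented by an arbitrary callable. The names and the de-normalization callback are illustrative assumptions.

```python
def protect(partitions, anns, denormalize):
    """Protection step sketch: for each partition i, feed every
    cross-partition input vector through the ANN trained for that
    partition, then undo the normalization on its outputs."""
    protected = []
    for i, part in enumerate(partitions):
        outputs = [anns[i]([p[j] for p in partitions])
                   for j in range(len(part))]
        protected.append([denormalize(v) for v in outputs])
    return protected
```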

  25. Presentation Schema • Motivation • Basic Concepts • Ordered Neural Networks (ONN) • Experimental Results • Conclusions and Future Work

  26. Experiments Setup • Data used in the CASC Project (http://neon.vb.cbs.nl/casc) • Data from the US Census Bureau: • 1080 tuples × 13 attributes = 14,040 values to be protected • We compare our algorithm with the best 5 parameterizations presented in the literature for: • Rank Swapping • Microaggregation • ONN is parameterized ad hoc

  27. Experiments Setup • ONN parameterization: • P: Number of Partitions • B: Normalization Range Size • E: Learning Rate Parameter • C: Activation Function Slope Parameter • H: Number of neurons in the hidden layer

  28. Score: Protection Methods Evaluation • We need a protection quality score that measures: • The difficulty for an intruder to reveal the original data • The information loss in the protected data set • Score = 0.5 IL + 0.5 DR • IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5) • DR = 0.5 DLD + 0.5 ID • IL1 = mean of absolute error • IL2 = mean variation of averages • IL3 = mean variation of variances • IL4 = mean variation of covariances • IL5 = mean variation of correlations • DLD = number of links using DBRL • ID = protected values near the original
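The score on this slide can be computed directly, assuming exactly the five equally weighted IL components and the 0.5/0.5 weights shown:

```python
def score(il_components, dld, id_):
    """Score sketch from the slide: IL averages the five information-loss
    measures (scaled by 100), DR averages distance-based record linkage
    (DLD) and interval disclosure (ID); the score weights IL and DR
    equally. Lower is better on both axes."""
    assert len(il_components) == 5
    il = 100 * sum(0.2 * c for c in il_components)
    dr = 0.5 * dld + 0.5 * id_
    return 0.5 * il + 0.5 * dr
```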

  29. Results [Charts: scores obtained when protecting 7 variables and 13 variables]

  30. Presentation Schema • Motivation • Basic Concepts • Ordered Neural Networks (ONN) • Experimental Results • Conclusions and Future Work

  31. Conclusions & Future Work • The use of ANNs combined with some preprocessing techniques is promising for building protection methods • In our experiments, ONN improves on the protection quality of the best-ranked protection methods in the literature • As future work, we would like to establish a set of criteria to automatically tune the parameters of ONN

  32. Any questions? Contact e-mail: vmuntes@ac.upc.edu DAMA Group Web Site: http://www.dama.upc.edu
