
Privacy Preserving Data Mining




  1. Privacy Preserving Data Mining Yehuda Lindell & Benny Pinkas

  2. Summary • Objective • Various components / tools needed • Algorithm

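  3. Objective • Perform data mining on the union of two private databases • Data stays private, i.e., no party learns anything but the output

  4. Assumptions • Large databases – generic secure-computation solutions are too inefficient to be practical • Semi-honest parties – each party follows the protocol but may try to learn more from the messages it sees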

  5. Classification by Decision Tree Learning • A transaction is a set of <attribute, value> pairs • One attribute is the class attribute; the rest are ordinary attributes • Want to predict the class using only the non-class attributes

  6. Decision Tree • Rooted tree with nodes/edges • Internal Nodes => Attributes • Edges leaving nodes => Possible values • Leaves => Expected Class for transaction • Traverse tree using known attributes • Predict class given leaf node’s value
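
To make the traversal on this slide concrete, here is a minimal sketch in Python. The tree, the attribute names (`outlook`, `humidity`) and the class labels are invented for illustration and do not come from the presentation.

```python
# Hypothetical toy tree: internal nodes name an attribute, outgoing edges are
# that attribute's possible values, and leaves hold the predicted class.
tree = {
    "attribute": "outlook",
    "children": {
        "sunny": {
            "attribute": "humidity",
            "children": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": "no",
    },
}


def predict(node, transaction):
    """Walk from the root to a leaf using the transaction's known attributes."""
    while isinstance(node, dict):          # still at an internal node
        value = transaction[node["attribute"]]
        node = node["children"][value]     # follow the edge for that value
    return node                            # a leaf: the predicted class


print(predict(tree, {"outlook": "sunny", "humidity": "normal"}))
```

Only the attributes that lie on the path from root to leaf are consulted, so a transaction does not need a value for every attribute.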

  7. Constructing Tree • Top-down • At each level – find the attribute that “best” classifies the transactions => gives least overhead • Best => attribute that minimizes entropy (maximizes information gain) • Entropy is a sum of terms of the form -x ln x • Entropy is 0 when all transactions belong to a single class
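
A short worked example of the -x ln x terms, assuming x denotes the fraction of transactions in a class (the numbers are illustrative, not from the slides):

```latex
% All transactions in one class: a single term with x = 1.
H = -1 \ln 1 = 0

% Two classes, evenly split: two terms with x = 1/2.
H = -\tfrac{1}{2}\ln\tfrac{1}{2} - \tfrac{1}{2}\ln\tfrac{1}{2} = \ln 2 \approx 0.693
```

The first case is the lowest possible entropy and the second is the highest for two classes, which is why a split that drives each branch toward a single class is preferred.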

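  8. Entropy calculations • Entropy: $H(T) = \sum -x \ln x$ • $H_C(T)$ => info needed to identify the class of a transaction in $T$: $H_C(T) = \sum_i -\frac{|T(c_i)|}{|T|} \ln \frac{|T(c_i)|}{|T|}$, where $T(c_i)$ is the set of transactions with class $c_i$ and the sum runs over all possible classes • $H_C(T \mid A)$ => info needed to identify the class of a transaction in $T$, given its value $v$ of attribute $A$: $H_C(T \mid A) = \sum_v \frac{|T(v)|}{|T|} H_C(T(v))$, where $T(v)$ is the set of transactions with value $v$ for attribute $A$ • $\mathrm{Gain}(A) = H_C(T) - H_C(T \mid A)$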
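
The entropy and gain quantities on this slide can be sketched on a single, non-private database as follows. The function names and the toy transactions are assumptions made for illustration; natural logarithms are used to match the slide's `ln`.

```python
from collections import Counter
from math import log


def entropy(labels):
    """H_C(T): sum of -x ln x over classes, x = fraction of T in that class."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(transactions, attribute, class_attr):
    """H_C(T | A): entropy of each subset T(v), weighted by |T(v)| / |T|."""
    groups = {}
    for t in transactions:
        groups.setdefault(t[attribute], []).append(t[class_attr])
    n = len(transactions)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def gain(transactions, attribute, class_attr):
    """Gain(A) = H_C(T) - H_C(T | A)."""
    labels = [t[class_attr] for t in transactions]
    return entropy(labels) - conditional_entropy(transactions, attribute, class_attr)


# Hypothetical data: "windy" determines the class, "temp" says nothing about it.
data = [
    {"windy": "t", "temp": "hot", "play": "no"},
    {"windy": "t", "temp": "cool", "play": "no"},
    {"windy": "f", "temp": "hot", "play": "yes"},
    {"windy": "f", "temp": "cool", "play": "yes"},
]
print(gain(data, "windy", "play"), gain(data, "temp", "play"))
```

In this toy data the gain for `windy` equals the full class entropy (ln 2), while the gain for `temp` is zero, so a top-down construction would split on `windy` first.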

  9. Private Computation • P2 provides input x1 to P1's function f1 and obtains f1(x1, y) • The computation is private if a function S1 exists s.t., given only x1 and f1(x1, y), S1 can compute the corresponding view of P1's DB (the desired <att,value> pairs) • Diagram labels: Party 1, f1, x1, f1(x1, y), Party 2, S1, View

  10. Oblivious Evaluation • What if in previous example: Party 2 does not want Party 1 to know what input (x1) it is providing? • Oblivious Evaluation: Receiver obtains P(x) without learning anything else about polynomial P. Sender learns nothing about x.

  11. Oblivious Evaluation (2) – Simplified Version • $r_i$ = receiver's random number, $R_i$ = sender's random number, $x$ = input from receiver, $s$ = receiver's secret key • Receiver → Sender: $(a^{r_i},\ a^{s r_i} a^{x})$ • Sender → Receiver: $(a^{R_i},\ a^{s R_i} a^{P(x)} a^{s r_i})$ • Receiver divides the 2nd element by the 1st element raised to the power $s$ to get $a^{P(x)}$: $a^{P(x)} = a^{s R_i} a^{P(x)} a^{s r_i} / (a^{R_i} \cdot a^{r_i})^{s}$
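
The "divide the 2nd element by the 1st raised to the power s" step can be illustrated with a toy sketch. This is not the slide's exact message flow: it swaps in textbook exponential ElGamal, restricts P to a linear polynomial P(x) = c1·x + c0, and uses a modulus and base chosen only for illustration. All function and constant names are hypothetical, and the parameters are not suitable for real use.

```python
import random

# Toy parameters (assumption): a Mersenne prime modulus and a small base.
P_MOD = 2**127 - 1
A = 3


def keygen():
    """Receiver: secret key s and public key h = a^s."""
    s = random.randrange(1, P_MOD - 1)
    return s, pow(A, s, P_MOD)


def receiver_encrypt(x, h):
    """Receiver hides x as (a^r, a^(s*r) * a^x) with fresh randomness r."""
    r = random.randrange(1, P_MOD - 1)
    return pow(A, r, P_MOD), pow(h, r, P_MOD) * pow(A, x, P_MOD) % P_MOD


def sender_evaluate(ciphertext, h, c1, c0):
    """Sender applies P(x) = c1*x + c0 under encryption, re-randomizing with R."""
    u, v = ciphertext
    big_r = random.randrange(1, P_MOD - 1)
    u_out = pow(u, c1, P_MOD) * pow(A, big_r, P_MOD) % P_MOD
    v_out = pow(v, c1, P_MOD) * pow(h, big_r, P_MOD) * pow(A, c0, P_MOD) % P_MOD
    return u_out, v_out


def receiver_decrypt(ciphertext, s):
    """Divide the 2nd element by the 1st raised to the power s: yields a^P(x)."""
    u, v = ciphertext
    return v * pow(pow(u, s, P_MOD), -1, P_MOD) % P_MOD  # needs Python 3.8+


s, h = keygen()
reply = sender_evaluate(receiver_encrypt(7, h), h, c1=4, c0=9)
print(receiver_decrypt(reply, s) == pow(A, 4 * 7 + 9, P_MOD))
```

The sender only ever sees an encryption of x, and the receiver only recovers a^{P(x)} rather than the coefficients c1 and c0. Recovering P(x) itself from a^{P(x)} requires the value to lie in a small, searchable range, which this sketch does not address.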

  12. Algorithm • Step 1 - Each party computes ID3 – decision tree learning – (O(# attributes)) • Step 2 - Combine results using cryptographic protocols like oblivious evaluation - (O(log(#transactions))) • Result - Each party gains results of data-mining without learning more than necessary

  13. Algorithm (2) • Finding the “best” attribute is the hardest part • Each party computes its “share” of the entropy • For each attribute, combine the values from each party • Results in private computation of entropy (-x ln x) • Choose the attribute that minimizes entropy • Provides maximum information gain • Ensures most efficient tree with least overhead • Use oblivious evaluation
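
The following sketch shows what the two parties jointly compute when choosing the best attribute, but it does so in the clear: the counts are simply added together, whereas the protocol described on the slides would compute the x ln x terms on x = x1 + x2 without revealing either party's counts. The function names and toy databases are assumptions made for illustration. It relies on the identity $|T| \cdot H_C(T \mid A) = \sum_v |T(v)| \ln |T(v)| - \sum_{v,c} |T(v,c)| \ln |T(v,c)|$, which follows from expanding the conditional entropy.

```python
from collections import Counter
from math import log


def local_counts(transactions, attribute, class_attr):
    """One party's local share: counts of (attribute value, class) pairs."""
    return Counter((t[attribute], t[class_attr]) for t in transactions)


def xlnx(x):
    return x * log(x) if x > 0 else 0.0


def conditional_entropy_from_shares(counts1, counts2):
    """H_C(T | A) on the union, written purely in terms of x ln x of summed counts."""
    joint = counts1 + counts2              # x = x1 + x2 for every (value, class) cell
    per_value = Counter()
    for (value, _cls), n in joint.items():
        per_value[value] += n
    total = sum(joint.values())
    return (sum(xlnx(n) for n in per_value.values())
            - sum(xlnx(n) for n in joint.values())) / total


def best_attribute(db1, db2, attributes, class_attr):
    """Attribute minimizing conditional entropy, i.e. maximizing information gain."""
    return min(
        attributes,
        key=lambda a: conditional_entropy_from_shares(
            local_counts(db1, a, class_attr), local_counts(db2, a, class_attr)
        ),
    )


# Hypothetical private databases held by party 1 and party 2.
db1 = [{"windy": "t", "temp": "hot", "play": "no"},
       {"windy": "f", "temp": "hot", "play": "yes"}]
db2 = [{"windy": "t", "temp": "cool", "play": "no"},
       {"windy": "f", "temp": "cool", "play": "yes"}]
print(best_attribute(db1, db2, ["temp", "windy"], "play"))
```

Because every term has the form (x1 + x2) ln(x1 + x2), the only cross-party operation needed is a private evaluation of x ln x on a sum of two locally held counts.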

  14. Discussion of Algorithm • Efficient: • Large Databases accommodated: Algorithm relies on number of possible values for attributes – NOT number of transactions in database • Private: • Each step depends on local computation and private protocol • Uses techniques like oblivious transfer / evaluation to exchange information • Paper proves individual steps are private, AND can predict control flow between steps ONLY based on input/output – so also private

  15. Discussion of Algorithm (2) • Approximate ID3 used instead of actual ID3 – shown to be as secure and provide same information
