1 / 20

Data Security against Knowledge Loss in Collaborative Corporations

This paper explores the challenges and possible approaches to ensure data security against knowledge discovery in collaborative corporations. Topics include data mining in disease outbreak analysis, privacy concerns in sharing insurance data, and collaborative classifier development.

jscheele
Download Presentation

Data Security against Knowledge Loss in Collaborative Corporations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Security against Knowledge Loss *) by Zbigniew W. Ras University of North Carolina, Charlotte, USA

  2. Data Security against Knowledge Discovery Possible Challenge Problems: • Centers for Disease Control (CDC) use data mining to identify trends and patterns in disease outbreaks, such as understanding and predicting the progression of a flu epidemic. • Insurance companies have considerable data that would be useful, but are unwilling to disclose this due to patient privacy concerns. • Alternative approach is to have insurance companies provide knowledge extracted from their data that can not be traced to individual people, but can be used to identify the trends and patterns of interest to the CDC.

  3. Data Security against Knowledge Discovery Collaborative Corporations • Ford and Firestone shared a problem with a jointly produced product - Ford Explorer with Firestone tires. Ford and Firestone may have been able to use association rule techniques to detect problems earlier. This would have required extensive data sharing. • Factors such as trade secrets and agreements with other manufacturers stand in the way of needed data sharing. • Could we obtain the same results by sharing the knowledge, while still preserving the secrecy of each side's data?

  4. Data Security against Knowledge Discovery Possible Approach (developing joint classifier): • Lindell and Pinkas (CRYPTO'00), proposed a method that enabled two parties to build a decision tree without either party learning anything about the other party's data, except what might be revealed through the final decision tree. • Clifton and Du (SIGMOD'02), proposed a method that enabled two parties to build association rules without either party learning anything about the other party's data. Alternative Approach: • Each site develops a classifier independently, these are used jointly to produce the global classifier. It protects individual entities but it has to be shown that the individual classifiers do not release private information.

  5. Ontology S2 S1 r1 r2 qS qS=[a: c, d, b] S KB KBS

  6. Ontology S2 S1 r1 r2 qS qS=[a, c, d : b] S KBS

  7. Data Security against Knowledge Discovery Problem… Give a strategy for identifying minimal number of cells which additionally have to be hidden at site S(part of DIS) in order to guarantee that hidden attribute in S cannot be reconstructed by Distributed Knowledge Discovery. KB S

  8. Give a strategy for identifying the minimal number of cells in Swhich additionally have to be hidden in order to guarantee that attributeacannot be reconstructed by Distributed Knowledge Discovery. Data Security against Knowledge Discovery Problem… S KB

  9. Give a strategy for identifying the minimal number of cells in Swhich additionally have to be hidden in order to guarantee that attributeacannot be reconstructed by Distributed Knowledge Discovery. Data Security against Knowledge Discovery Problem… S KB

  10. Original site S Reconstructed site S

  11. Disclosure Risk of Confidential Data Research Problem Give a strategy that identifies minimum number of attribute values that need to be additionally hidden from Information System S to guarantee that a hidden attribute cannot be reconstructed by Local & Distributed Chase KDD Lab

  12. Chain of predictions by global and local rules Object x6 Confidential data sal=$500 is hidden Global rule r1 = a2b2  sal=$50,000, we hide b2 additionally Due to a local rule r2 = c3  b2, confidential data is restored KDD Lab

  13. Algorithm SCIKD (bottom-up strategy) KB - knowledge base r1 = [ b1 c1 a1 ], r2 = [ c1 f1 a1 ], D – decision attribute KDD Lab

  14. Algorithm SCIKD (bottom-up strategy) {a1}* = {a1} unmarked {b1}* = {b1} unmarked {c1}* = {a1,b1,c1,d1,e1}  {d1} marked {e1}* = {b1, e1} unmarked {f1}* = {d1,f1}  {d1} marked {g1}* = {g1} unmarked {a1, b1}* = {a1, b1} unmarked {a1, e1}* = {a1, b1, e1} unmarked {a1, g1}* = {a1, g1} unmarked {b1, e1}* = {b1, e1} unmarked {b1, g1}* = {b1, g1, e1} unmarked {e1, g1}* = {a1,b1,c1,d1,e1, g1}  {d1} marked {a1, b1, e1}* = {a1, b1, e1} unmarked /maximal subset/ {a1, b1, g1}* = {a1, b1, g1} unmarked /maximal subset/ {b1, e1, g1}* = {e1, g1}*marked {a1, b1, e1, g1}* = {e1, g1}*marked

  15. Data security versus knowledge loss Database d1 - has to be hidden {a1, b1, e1}* = {a1, b1, e1} unmarked /maximal subset/ {a1, b1, g1}* = {a1, b1, g1} unmarked /maximal subset/ KDD Lab

  16. Data security versus knowledge loss Database D1 Database D2 Rk = {rk,i: i I} set of rules extracted from Dk and [ck,i, sk,i] denotes the confidence and support of rule ri for all i  Ik, k=1,2 With Rk we associate the number K(Rk) = { ck,isk,i: i  Ik}. D1 contains more hidden knowledge than D2, if K(R1)  K(R2). KDD Lab

  17. Objective Interestingness Basic Measures for : Domain: card[] Support or Strength: card[  ] Confidence or Certainty Factor: card[]/card[] Coverage Factor: card[]/card[] Leverage: card[] – card[]*card[] Lift: card[]/[card[]*card[]]

  18. Data security versus knowledge loss Database D1 Database D2 Rk = {rk,i: i I} set of rules extracted from Dk and [ck,i, sk,i] denotes the coverage factor and support of rule ri for all i  Ik, k=1,2 With Rk we associate the number K(Rk) = { ck,isk,i : i  Ik}. D1 contains more hidden knowledge than D2, if K(R1)  K(R2). KDD Lab

  19. Data security versus knowledge loss Database D1 Database D2 Action rule r: [(b1, v1→ w1)  (b2, v2→ w2)  … ( bp, vp→ wp)](x)  (d, k1→ k2)(x) The cost of r in D:costD(r) = {D(vi , wi) : 1  i  p} Action rule r is feasible in D, if costD(r) < D(k1, k2). Rk = {rk,i: i I} set of feasible action rules extracted from Dk and [ck,i, sk,i] denotes the confidence and support of action rule ri for all i  Ik, k=1,2 With Rk we associate the number K(Rk) = { ck,isk,i [1/costDk(rk,j)]: i  Ik}. D1 contains more hidden knowledge than D2, if K(R1)  K(R2). KDD Lab

  20. Questions? Thank You KDD Lab

More Related