Privacy Framework for RDF Data Mining. Master’s Thesis Project Proposal. By: Yotam Aron
Overview • Motivation and Goal • Background • Proposed Solution and Design • Example • Conclusion
Motivation • Data mining continues to become more widespread. • Useful for research, public policy, etc. • Want to maintain privacy of participants in the database. • Little work has been done for privacy for semantic web data.
Previous Work • Anonymization • k-Anonymity [1] • Differential privacy systems: PINQ [2], AIRAVAT [3]. • Drawbacks: • Do not apply to semantic web data. • Do not support SPARQL.
Goal • Develop a system that protects dataset participants’ personal data when it is queried through SPARQL. • Integrates well with existing SPARQL endpoints. • Relatively easy for both the user and the administrator to use.
Background • Rule-based Privacy Policies in AIR • Differential Privacy
Rule-based Privacy Policies in AIR [4] • Rules define patterns in a SPARQL query. • If a pattern is matched, the rule infers compliance or non-compliance of the incoming SPARQL query.
AIR Example [5]
AIR Policy (extract):
air:if { :W s:TriplePattern :T . :T log:includes { :X type:F :V } . } ;
air:then [ air:description ("type:F was selected in " q:QUERY) ; air:assert { q:QUERY air:non-compliant-with q:Policy4 . } ] .
Query: SELECT ?s WHERE { ?s type:F ?p }
• AIR will show that the query is non-compliant with Policy4.
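The effect of the rule above can be imitated with a few lines of Python. This is only a rough sketch and not the AIR reasoner: the helper `find_violations` and the table `FORBIDDEN_PREDICATES` are hypothetical names, and the matching is a naive regular expression rather than N3 pattern matching.

```python
import re
from typing import List

# Hypothetical policy table: forbidden predicate -> policy it violates.
FORBIDDEN_PREDICATES = {"type:F": "Policy4"}

# Naive triple pattern: subject, predicate, object separated by whitespace.
TRIPLE = re.compile(r"(\S+)\s+(\S+)\s+(\S+)")


def find_violations(query: str) -> List[str]:
    """Return the policies that a SPARQL query is non-compliant with."""
    body = re.search(r"\{(.*)\}", query, flags=re.DOTALL)
    if body is None:
        return []
    violations = []
    # Assumption: triple patterns are separated by "." and contain no
    # literals or IRIs with dots; a real parser would be needed otherwise.
    for pattern in re.split(r"\s*\.\s*", body.group(1).strip()):
        triple = TRIPLE.fullmatch(pattern.strip())
        if triple and triple.group(2) in FORBIDDEN_PREDICATES:
            violations.append(FORBIDDEN_PREDICATES[triple.group(2)])
    return violations
```

Calling `find_violations("SELECT ?s WHERE {?s type:F ?p}")` reports `Policy4`, mirroring the non-compliance assertion in the AIR extract.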
Differential Privacy Overview • Minimize the probability of a privacy breach. • Maximize statistical accuracy. • The definition requires that, given two similar datasets, a query on those two datasets gives similar results with high probability. • Makes no assumptions about the underlying dataset.
Differential Privacy • Definition: We say a randomized computation M provides ɛ-differential privacy if for any two data sets A and B, and any set of possible outputs S ⊆ Range(M), Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] × exp(ɛ × |A ⊕ B|).
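As a worked instance of the definition, consider a COUNT query f answered with Laplace noise. The derivation below is the standard textbook argument, not something taken from the slides; the symbols f, b, p_A, and p_B are introduced here for illustration.

```latex
% Assumption: M(A) = f(A) + Lap(b) with scale b = 1/\epsilon, and f is a COUNT,
% so |f(A) - f(B)| \le |A \oplus B|.
\[
  p_A(x) = \frac{\epsilon}{2}\, e^{-\epsilon\,|x - f(A)|}
\]
\[
  \frac{p_A(x)}{p_B(x)}
    = e^{\epsilon\,\left(|x - f(B)| - |x - f(A)|\right)}
    \le e^{\epsilon\,|f(A) - f(B)|}
    \le e^{\epsilon\,|A \oplus B|}
\]
```

Integrating the density bound over any output set S gives Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] × exp(ɛ × |A ⊕ B|), which is the inequality in the definition.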
Differential Privacy in Practice • Each user is given a total ɛ value that cannot be exceeded. • Each query qi has some noise value ɛi. In total, the user’s queries must satisfy Σi ɛi ≤ ɛ. • Noise (usually Laplace), which depends on the aggregate function, is added with variance 2(Δf/ɛi)², where Δf is the sensitivity of the aggregate function.
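A minimal Python sketch of this bookkeeping, assuming the standard Laplace mechanism (noise scale equal to sensitivity divided by ɛi) and the usual rule that the per-query values may not sum to more than the user's total ɛ. The names `PrivacyBudget`, `laplace_noise`, and `noisy_answer` are hypothetical and not part of any existing system.

```python
import random


class PrivacyBudget:
    """Tracks how much of a user's total epsilon has been spent."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Spend epsilon on one query, refusing if the total would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total:
            raise PermissionError("privacy budget exceeded")
        self.spent += epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution, whose variance is 2 * scale**2.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_answer(true_value: float, sensitivity: float, epsilon: float,
                 budget: PrivacyBudget, rng: random.Random) -> float:
    """Charge the budget, then add Laplace noise of scale sensitivity / epsilon."""
    budget.charge(epsilon)
    return true_value + laplace_noise(sensitivity / epsilon, rng)
```

With a total budget of 2.0, two queries at ɛi = 1.0 succeed and a third is refused, which is the behaviour the budget rule describes.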
Limitations of Differential Privacy • Only statistical data is protected. • High variance in data yields poor query results. • Theory is not always perfect in practice. • Assumes no collusion among users. • Covert channel attacks [6]. • What value of ɛ to choose?
Example, No DP • Query: SELECT COUNT(Name) WHERE (Age < 25) • Answer: 2
Example, No DP • Same query, dataset with one record removed: SELECT COUNT(Name) WHERE (Age < 25) • Answer: 1 • Big difference in answers!
Example, With DP • Query: SELECT COUNT(Name) WHERE (Age < 25) • Answer: 2 + noise ≈ 2 (with high probability)
Example, With DP • Same query, dataset with one record removed: SELECT COUNT(Name) WHERE (Age < 25) • Answer: 1 + noise ≈ 2 (with high probability) • With high probability, records are indistinguishable!
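The indistinguishability claim in this example can be checked numerically. The sketch below assumes Laplace noise with scale 1/ɛ (COUNT has sensitivity 1) and compares the output densities for true counts of 2 and 1; `laplace_pdf` and `max_density_ratio` are illustrative names.

```python
import math
from typing import Iterable


def laplace_pdf(x: float, mean: float, scale: float) -> float:
    """Density of a Laplace(mean, scale) distribution at x."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)


def max_density_ratio(count_a: float, count_b: float, epsilon: float,
                      grid: Iterable[float]) -> float:
    """Largest ratio of output densities over the grid for two true counts."""
    scale = 1.0 / epsilon  # assumption: COUNT query, sensitivity 1
    return max(
        laplace_pdf(x, count_a, scale) / laplace_pdf(x, count_b, scale)
        for x in grid
    )


# Outputs between -10 and 10 in steps of 0.1.
grid = [i / 10.0 for i in range(-100, 101)]
ratio = max_density_ratio(2, 1, epsilon=0.5, grid=grid)
# The ratio never exceeds exp(epsilon), so no single noisy answer separates
# the two datasets by more than that factor.
assert ratio <= math.exp(0.5) + 1e-9
```

A smaller ɛ pushes the bound closer to 1, which makes the two datasets harder to tell apart at the cost of noisier answers.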
Practical Consequences of DP • An individual’s inclusion in the dataset is not likely a privacy risk. • The answers to the queries can still be useful.
Achieving Differential Privacy in RDF • Current techniques for differential privacy were developed for relational databases. • As a first approximation, reduce the triple store to a relational database. • Improve the mechanism as the project progresses.
Example of RDF-RDBS Reduction
:Person1 foaf:name "Alice" ; foaf:member :DIG ; foaf:age "21" ; foaf:knows :Person2, :Person3 .
:Person2 foaf:name "Bob" ; foaf:member :DIG ; foaf:knows :Person3 .
:Person3 foaf:name "Charlie" ; foaf:age "22" .
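One possible way to carry out such a reduction is to group triples by subject, giving one row per subject and one column per predicate. The Python sketch below is an assumption about what the reduction could look like, since the slide's resulting table is not reproduced here; `triples_to_rows` is a hypothetical helper.

```python
from collections import defaultdict

# (subject, predicate, object) triples from the example above.
TRIPLES = [
    (":Person1", "foaf:name", "Alice"),
    (":Person1", "foaf:member", ":DIG"),
    (":Person1", "foaf:age", "21"),
    (":Person1", "foaf:knows", ":Person2"),
    (":Person1", "foaf:knows", ":Person3"),
    (":Person2", "foaf:name", "Bob"),
    (":Person2", "foaf:member", ":DIG"),
    (":Person2", "foaf:knows", ":Person3"),
    (":Person3", "foaf:name", "Charlie"),
    (":Person3", "foaf:age", "22"),
]


def triples_to_rows(triples):
    """Group triples by subject: one row per subject, one column per predicate.

    Multi-valued predicates (such as foaf:knows) are kept as lists, and
    predicates a subject lacks are simply absent from its row.
    """
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        rows[subject][predicate].append(obj)
    return {subject: dict(columns) for subject, columns in rows.items()}


rows = triples_to_rows(TRIPLES)
```

Each entry of `rows` then behaves like a record in a relational table, so that adding or removing one person corresponds to adding or removing one row.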
Proposed Solution • SPARQL Privacy Insurance Module (SPIM) • Build a layer between the user and the endpoint. • Integrate both AIR and differential privacy. • Integrate a credential-checking system. • Modify an existing differential privacy framework for use with triple stores.
Contributions • Complete privacy protection for triple stores. • Differential privacy sensitivity for SPARQL 1.1 aggregate functions, including count, sum, avg, min, and max.
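To give a feel for what a sensitivity table for aggregates might contain, here is a small Python sketch. It assumes neighbouring datasets differ by one added or removed record and that every value is clamped to a known range [lo, hi]; these are illustrative textbook bounds, not the sensitivities derived in the thesis. `avg` is left out because it is commonly answered as a noisy sum divided by a noisy count rather than given its own bound.

```python
def sensitivity(aggregate: str, lo: float, hi: float) -> float:
    """Upper bound on how much one record can change an aggregate.

    Assumption: values are clamped to [lo, hi] before aggregation.
    """
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    name = aggregate.lower()
    if name == "count":
        return 1.0                      # one record changes a count by 1
    if name == "sum":
        return max(abs(lo), abs(hi))    # largest magnitude one record can add
    if name in ("min", "max"):
        return hi - lo                  # extreme can move across the range
    raise ValueError(f"no sensitivity bound defined for {aggregate!r}")
```

For hospital visit counts clamped to [0, 100], this gives a sensitivity of 1 for `count` and 100 for `sum`, so a sum needs far more noise than a count at the same ɛ.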
[Figure: SPIM architecture diagram. Components: User Interface, TAAC Credential Checking, SPIM Privacy Module, AIR Rule Based Privacy, Policy Files, Differential Privacy Module, User Data, Service Description, SPARQL Endpoint.]
TAAC will: • Verify that the user has permission to access the data. • Send the central module data about the user.
SPIM: • Controls the order of privacy operations. • Interfaces with the SPARQL endpoint.
AIR: • Reasoner that uses rule-based policies to check queries for privacy hazards. • Extracts information for differential privacy.
Policy Files: • Contain the rules for AIR.
Differential Privacy Module: • Checks query limits (based on ɛ use). • Applies noise to statistical data.
User Data: • Contains user ɛ data.
Service Description: • Contains information to be used for the addition of noise.
Miscellaneous: • Interface to SPARQL Endpoint • Transaction File • Improved Differential Privacy Output • Service Description Generator
Potential Extensions: • Robustness against attacks • Concurrency • Optimization for large systems • Customizable UI • Accountability
Sample Scenario • Triple-store data mining in biotechnological applications. • Biofirm provides data about hospitals in the US. • Alice is a PhD student at MIT. • Alice would like to query Biofirm’s database for research purposes. She received her permissions yesterday and is logging in for the first time.
Preprocessing • Biofirm installs SPIM and runs the service description generation code. • Biofirm may need to create the correct interface to its SPARQL endpoint. • Biofirm makes sure the UI is accessible online.
Sample Compliant Query • Alice would like to know the total number of visits that Boston hospitals received. SELECT (SUM(?s) AS ?people) WHERE { ?h a biofirm:Hospital . ?h biofirm:visits ?s . ?h biofirm:location geo:Boston . } • Epsilon value: 1.0
• Alice enters her query into the provided user interface.
• TAAC ensures that Biofirm has given Alice access to its triple store.
• The query request arrives at the SPIM central module.
• Policyrunner is called upon to check the query for triple patterns that are in violation. • No violations are found. • Since this is Alice’s first time, AIR extracts what type of permissions Alice has.
• SPIM creates a profile for Alice. • Gives her an ɛ value (suppose it is 2.0). • Stores it in the triple store.
• SPIM extracts which variables will yield statistical results and will have differential privacy applied.
• The differential privacy module checks that the query will not exceed the given epsilon value.
• This is Alice’s first time, her epsilon value is 2.0, and the epsilon for this query is 1.0. Everything looks good.
• The query is sent to the endpoint. • Results are received.
• The differential privacy module adds noise to the appropriate fields and updates the epsilon values.
• SPIM is ready to return the results.
• Alice receives the results.
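The walkthrough above can be condensed into a short Python sketch of the control flow. Everything here is a hypothetical stand-in: `Spim`, its fields, and `handle` are illustrative names, the credential and policy checks are reduced to a set lookup and a substring test, and the endpoint is any callable returning the true numeric result. It is meant to show the order of operations, not the actual SPIM implementation.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Spim:
    """Hypothetical stand-in for the SPIM central module."""
    authorized_users: Set[str]          # stands in for TAAC credentials
    forbidden_predicates: Set[str]      # stands in for AIR policy files
    endpoint: Callable[[str], float]    # runs the query, returns the true value
    sensitivity: float                  # would come from the service description
    default_epsilon: float = 2.0        # budget given to first-time users
    budgets: Dict[str, float] = field(default_factory=dict)
    rng: random.Random = field(default_factory=random.Random)

    def handle(self, user: str, query: str, epsilon: float) -> float:
        # 1. Credential check (TAAC in the proposal).
        if user not in self.authorized_users:
            raise PermissionError("user has no access to the triple store")
        # 2. Rule-based check (AIR in the proposal), here a naive substring test.
        if any(p in query for p in self.forbidden_predicates):
            raise PermissionError("query is non-compliant with policy")
        # 3. First-time users get a profile with the default budget.
        remaining = self.budgets.setdefault(user, self.default_epsilon)
        # 4. Refuse queries that would exceed the remaining budget.
        if epsilon <= 0 or epsilon > remaining:
            raise PermissionError("epsilon budget exceeded")
        # 5. Send the query to the endpoint and receive the true result.
        true_value = self.endpoint(query)
        # 6. Add Laplace noise (difference of two exponentials), update budget.
        scale = self.sensitivity / epsilon
        noise = self.rng.expovariate(1.0 / scale) - self.rng.expovariate(1.0 / scale)
        self.budgets[user] = remaining - epsilon
        return true_value + noise
```

In Alice's scenario, her first query with ɛ = 1.0 leaves 1.0 of her 2.0 budget, a second such query uses it up, and a third is refused.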
Summary • System will combine rule-based privacy with differential privacy. • Develop differential privacy techniques for semantic web data. • Make the privacy module client- and administrator-friendly.