This project focuses on storing RDF patient records securely in untrusted datastores for efficient retrieval by trusted users, addressing data privacy and integration challenges. Implementation details include securing data, hashing techniques, and wildcard queries. Future work includes processing voluminous RDF data and benchmarking.
Storing Semantic Web Data in an Untrusted Datastore
Scenario
Health service organizations need to store patient records. Two important considerations:
1. Not all clinics and hospitals can afford dedicated hardware and personnel for record keeping.
2. Data integration is a major need for providing effective health services.
Initial System Architecture
[Diagram. Components: Data Sources (Hospital, Clinic), Data Aggregation Module, DataStore, DB, Data Query Module, Clients (Hospital, HMO).]
Type of Data
For this project, we are interested in RDF (Resource Description Framework). The RDF data model is based on a simple binary graph concept: two nodes, the Subject and the Object, joined by an edge labeled with the Predicate.
RDF Instance
Patient record of John Smith:
    <rdf:Description rdf:about="http://.../JohnSmith">
        <info:age> 25 </info:age>
    </rdf:Description>
Graph form: node http://.../JohnSmith, edge info:age, node 25.
Objectives
1. Privacy of stored data.
2. Efficient retrieval of text-based and number-based data.
3. Only trusted users can query.
Definition (trusted user): a person X is trusted if and only if the original data source has authorized X to access the data.
Revised System Architecture
[Diagram. Components: Data Source (Hospital), Data Aggregation Module, DataStore, Secure CoProcessor, Enc. DB, Data Query Module, Client (HMO).]
Implementation Details: Database Schema
Fixed five-attribute tables with columns: EncT, Sub, Pred, Obj, Role
EncT = E_K(Sub, Pred, Obj)
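The five-column layout can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name `triples`, the column types, the idea that Sub/Pred/Obj hold bucket ids rather than plaintext, and the use of Role as a filter are all hypothetical and not confirmed by the slides. EncT is treated as an opaque ciphertext produced elsewhere; no encryption is performed here.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create an in-memory table with the five attributes from the slide."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE triples (
            EncT BLOB    NOT NULL,  -- opaque ciphertext E_K(Sub, Pred, Obj)
            Sub  INTEGER NOT NULL,  -- assumed: bucket id of the subject
            Pred INTEGER NOT NULL,  -- assumed: bucket id of the predicate
            Obj  INTEGER NOT NULL,  -- assumed: bucket id of the object
            Role TEXT    NOT NULL   -- assumed: role allowed to read the row
        )
        """
    )
    return conn


def insert_triple(conn, enc_t: bytes, sub: int, pred: int, obj: int, role: str) -> None:
    """Store one encrypted triple together with its searchable bucket ids."""
    conn.execute(
        "INSERT INTO triples VALUES (?, ?, ?, ?, ?)",
        (enc_t, sub, pred, obj, role),
    )


def candidates(conn, sub: int, role: str) -> list:
    """Return ciphertexts whose subject bucket and role match the query."""
    rows = conn.execute(
        "SELECT EncT FROM triples WHERE Sub = ? AND Role = ?",
        (sub, role),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch the datastore only ever compares bucket ids and role labels; the caller is expected to decrypt the returned EncT values and discard false positives.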
Implementation Details: How to Store Integer Data
Divide the domain of the attribute into buckets, for example with boundaries at 0, 100, 200, 300, 400, 500.
Q: Fixed-size buckets or expanding buckets?
A: Fixed-size: harder to implement, but more secure than expanding buckets.
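A minimal sketch of fixed-width bucketing, assuming a bucket width of 100 (read off the 0 to 500 boundaries on the slide) and non-negative values. The function names are hypothetical. It also assumes that a range query is answered by fetching every bucket that overlaps the range and filtering on the trusted side after decryption; the slides do not spell out this step.

```python
BUCKET_WIDTH = 100  # assumed from the 0, 100, ..., 500 boundaries


def bucket_id(value: int, width: int = BUCKET_WIDTH) -> int:
    """Return the index of the fixed-width bucket that contains value."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    return value // width


def buckets_for_range(low: int, high: int, width: int = BUCKET_WIDTH) -> list:
    """Return every bucket id that may hold a value in [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return list(range(bucket_id(low, width), bucket_id(high, width) + 1))


def refine(decrypted_values: list, low: int, high: int) -> list:
    """Trusted-side filter: drop false positives returned by whole buckets."""
    return [v for v in decrypted_values if low <= v <= high]
```

For example, an age of 25 falls in bucket 0, so the store sees only the bucket id. A query for ages between 90 and 210 touches buckets 0, 1, and 2, and the trusted side then removes values outside the range.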
Implementation Details: How to Store Character Data
Hash("http://.../JohnSmith") = v
Brute-force attack: an attacker tests whether Hash(some_string) == v. If yes, some_string = http://.../JohnSmith.
Solution:
1. Use a keyed hash.
2. Partition the range of the hash function into buckets, for example with boundaries at 0, 100, 200, 300, 400, 500.
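The two-part solution can be sketched with Python's standard hmac and hashlib modules. HMAC-SHA256 is an assumed choice of keyed hash; the slides do not name one. The bucket count of 5, the function names, and the example URI are likewise hypothetical.

```python
import hashlib
import hmac

NUM_BUCKETS = 5        # hypothetical bucket count
HASH_RANGE = 2 ** 256  # HMAC-SHA256 outputs 256-bit values


def keyed_hash(value: str, key: bytes) -> int:
    """Keyed hash of value as an integer; useless to test without the key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def keyed_bucket(value: str, key: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map value to one of num_buckets equal-width slices of the hash range."""
    return keyed_hash(value, key) * num_buckets // HASH_RANGE
```

Usage: `keyed_bucket("http://example.org/JohnSmith", key)` yields a bucket id in 0..4. Without the key, an attacker cannot run the brute-force test from the slide, and because many strings share a bucket, a matching bucket id does not pin down a single value.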
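Challenges
How to perform %LIKE% (wildcard) queries on character data. The query needs a match of the form
    Hash(http://.../JohnSmith) ~ Hash(http://.../John?????)
which an ordinary hash does not support, because similar inputs produce unrelated hash values. Example query:
    SELECT ?x WHERE { ?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John?????" }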
Potential Solution: Storing Character Data for %LIKE% Queries
String S = h t t p : / / . . ./ J o h n S m i t h
The alphabet has 26 characters in total.
Define:
    Ind = index into the string
    AlpPos = index into the English alphabet
    Enc(S) = (x - Ind(h)AlpPos(h)) (x - Ind(t)AlpPos(t)) …
Future Work
1. Process full-fledged RDF data.
2. Benchmark with voluminous data.
3. Port the application to the secure coprocessor (SCP).