A Framework for Developing Privacy Middleware for Cloud Data Services. Mamadou H. Diallo. Outline. Overview/Motivation Approach: A framework for developing privacy middleware Abstract Service Model Privacy Middleware Architecture Data Protection Model Implementation
Outline • Overview/Motivation • Approach: A framework for developing privacy middleware • Abstract Service Model • Privacy Middleware Architecture • Data Protection Model • Implementation • Based on proxy – adaptation of Sahi • Web application: Google Calendar • Google Calendar Service Model • Data protection: cryptographic algorithms • Implementation Status • Implemented Features • Remaining Features
Overview/Motivation • Growth of web-based data services • Some benefits: improved service, accessibility, availability, low cost, etc. • Examples: Google Calendar, Microsoft Live Mesh, Yahoo Briefcase, etc. • Privacy issues • Outsider attacks (Internet hackers) • Insider attacks (dishonest employees) • Lack of support for privacy enforcement in web applications • Current approaches • Assumption: cooperative servers • Algorithms and protocols supported by servers • Drawback: web service providers are not willing to cooperate • Proposed approach: privacy middleware • Assumption: uncooperative servers • Technique: encryption • Advantages: addresses insider attacks, policy-based • Challenges • Service abstraction, service adaptation • Query processing with privacy enforcement • Sharing: key distribution and revocation • Support for other servers
Approach: A Framework for Privacy Middleware • Standard web application architecture • Three logical layers • Client layer – implemented in a browser • Presentation and business logic layer – implemented in a web server • Data layer – implemented as a database
Approach: A Framework for Privacy Middleware • New logical layer: privacy enforcement layer • Implemented in a privacy middleware • Design and implementation based on proxy technology
Abstract Service Model: Data Model • Data modeled as objects • Object: O = {(A1,V1), (A2,V2), …, (An,Vn)}, where (Ai,Vi) is an attribute/value pair and n is the total number of pairs • Granularity of objects: depends on the data type • Event-based: unit = event • File-based: unit = file • Data categories • Structured: examples – events in a calendar, database entries • Unstructured: examples – text documents, video files, audio files • Data types • Ordered data: examples – dates, numerical data • Non-ordered data: examples – text documents, presentation documents • Other data: categorical data (list of choices), boolean data (YES/NO)
Abstract Service Model: Operations • Operations modeled as functions • Function: inputs, processing, outputs • Create/store and modify objects • Inputs: object, privacy policies • Processing: encryption, tagging • Outputs: encrypted object with tags • Fetch/retrieve objects • Inputs: HTML pages with encrypted data • Processing: decryption, un-tagging • Outputs: HTML pages with no encrypted data • Query objects • Inputs: query parameters • Processing: encryption • Outputs: encrypted parameters • Share objects • Inputs: object ID, sharing policies • Processing: encryption • Outputs: encrypted data (object ID, keys and metadata for decrypting the object)
Data Protection Model • Approaches • Based on cryptographic techniques • Encryption/decryption mechanisms • Challenges • Supporting web application services • Issue: accessing encrypted multi-data sets • Examples: • Searching text, range searching, etc. • Sharing personal data, sharing documents, etc. • Collaboration, integration, etc. • Available techniques • Non-efficient encryption • More security vs. poor performance • Example: randomized encryption – all data must be retrieved for each query • Efficient searchable encryption • Less security vs. better performance • Examples: order-preserving encryption, bucketization-based encryption
Data Protection Model • Encryption Strategy • Ordered Data • Order-preserving encryption schemes • Example: order-preserving encryption (OPE) • Non-Ordered Data • Searchable encryption schemes • Example: keyword-based encryption (KDE) • Other Data • May not be encrypted • Examples: categorical data, boolean data • Key Management • Storage and retrieval • Keys and metadata stored on the server – portability • Encrypted using a master key for the owner • Retrieved once for each web session • Representation • XML Schema • Needs to be flattened before storing • Extensibility
Privacy Policies • Definition (illustration) • PP = <PolicyID, CreationDate, ExpirationDate, Statements> • Statement = <Object, Attribute, EncryptionMethod> • Example: Google Calendar • “Hide my meeting with Bob on 01/01/2009” • Encoding: {Policy1, 1/1/2010, 12/31/2010, {Event1, Event1.What, KDE1}, {Event1, Event1.When, OPE1}, {Event1, Event1.Where, KDE1}, {Event1, Event1.Description, KDE1}}, where KDE = keyword-based encryption, OPE = order-preserving encryption • Policy enforcement • Attribute-level: encrypt all attributes or none • Object-level: more flexible, but more challenging (information leakage)
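The policy tuple above can be sketched as a small data structure. This is a minimal illustration, not the framework's actual representation: the class names, the `method_for` helper, and the rule that a policy only applies between its creation and expiration dates are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Statement:
    obj: str        # object identifier, e.g. "Event1"
    attribute: str  # attribute name, e.g. "What"
    method: str     # encryption method label, e.g. "KDE1" or "OPE1"


@dataclass(frozen=True)
class PrivacyPolicy:
    policy_id: str
    creation_date: date
    expiration_date: date
    statements: tuple

    def method_for(self, obj, attribute, today):
        """Return the encryption method for (obj, attribute), or None."""
        # Assumption: a policy is only in force between its two dates.
        if not (self.creation_date <= today <= self.expiration_date):
            return None
        for s in self.statements:
            if s.obj == obj and s.attribute == attribute:
                return s.method
        return None


# The "Hide my meeting with Bob" example from the slide.
policy1 = PrivacyPolicy(
    "Policy1", date(2010, 1, 1), date(2010, 12, 31),
    (
        Statement("Event1", "What", "KDE1"),
        Statement("Event1", "When", "OPE1"),
        Statement("Event1", "Where", "KDE1"),
        Statement("Event1", "Description", "KDE1"),
    ),
)
```

A middleware component could call `policy1.method_for("Event1", "When", date(2010, 6, 1))` to decide which scheme to apply before forwarding a request; attributes with no statement come back as `None` and would be left in plaintext.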
Framework Architecture • Privacy middleware: 7 components • Communication: HTTP messages • Trusted: messages cannot be intercepted by others • Untrusted: messages may be intercepted by others
Implementation • Approach • Proxy-based • Browser independent • Web application: Google calendar • Adapted from Sahi • Sahi • Automation and testing tool for web applications • Open-source application • Based on proxy server technology • Browser independent • Developed in Java and JavaScript • Some Features • Injects JavaScript code into web pages to help record and playback events on the browser • Provides support for • Database based testing • File read/write APIs for data driven testing • HTTP and HTTPS
Google Calendar Model • Data Model • Calendar • A set of events • Event: composed of parameters • Parameters • <What, When, Repeats, Where, Who, Calendar, Description, Attachment> • What: String (non-ordered data) • When: (ordered data) • start/end date: Date • start/end time: (xx:xx am/pm) • Repeats: categorical (daily, weekly, etc.) • Where: String (non-ordered data) • Who (Guests): • Guest id: email • Permission: choices (modify event, invite others, see guest list) • Calendar (owner): String (non-ordered data) • Description: String (non-ordered data)
Google Calendar Services • Query events • Basic query: any text in any parameter, operation (AND) • Advanced: specific parameters, range query, operations (AND, NOT) • Sharing and Invitations • Share a calendar • Publish a calendar (embed, public calendars) • Event invitations (invite guests, allow guests to modify events, allow guests to see the guest list) • Notifications • Types: create, change, cancel invitations • SMS (text messaging): mobile phones • Sync Events • Microsoft Outlook: options (1-way, 2-way) • Other calendars: Apple iCal, Mozilla Sunbird • Mobile devices: Windows Mobile, iPhone, BlackBerry • Others • Support for many languages
Technique 1: Keyword-based Searchable Encryption • Basic Approach • Based on keyword encryption • Use a hash function to bucketize the keywords • Original plaintext • Parse original text into a set of words • W = {W1, W2, …, Wn}, where Wi is a dictionary word • Keyword generation and bucketization • Generate keywords from W: Kw = {Kw1, Kw2, …, Kwm}, where Kwi is a keyword selected from W • Bucketize the keywords using a hash function H: {0,1}* ----> {0,1}^l, giving HV = {HV1, …, HVk} • Encryption • Encrypt W using a non-deterministic encryption scheme, E(W) • Block cipher based encryption • Examples: AES, Blowfish • Encrypt the hash values HV using a deterministic encryption scheme, E(HV) • Example: RSA • Attach E(HV) to E(W) as tags
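The keyword path of this technique (parse, select keywords, bucketize with a hash, produce deterministic tags) can be sketched with the Python standard library. Several things here are assumptions for illustration: the stop-word list, the 16-bit bucket width, and above all the use of HMAC-SHA256 as a stand-in for the deterministic encryption E(HV). The randomized encryption of the text itself, E(W), is omitted because the standard library has no block cipher.

```python
import hashlib
import hmac
import re

# Hypothetical stop-word list; the slides do not say how keywords are selected.
STOP_WORDS = {"a", "an", "the", "with", "on", "at", "of", "and", "to", "in"}


def extract_keywords(text):
    """Parse the text into words W and keep the non-stop-words as Kw."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({w for w in words if w not in STOP_WORDS})


def bucket(keyword, l_bits=16):
    """H: {0,1}* -> {0,1}^l, here the top l bits of SHA-256 (l_bits <= 32)."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - l_bits)


def tag(key, bucket_value):
    """Deterministic keyed tag for a bucket value (stand-in for E(HV))."""
    mac = hmac.new(key, bucket_value.to_bytes(4, "big"), hashlib.sha256)
    return mac.hexdigest()[:16]


def build_tags(key, text, l_bits=16):
    """Tags that would be stored next to the encrypted text E(W)."""
    return sorted({tag(key, bucket(k, l_bits)) for k in extract_keywords(text)})


def query_tag(key, keyword, l_bits=16):
    """Tag the client sends to search for one keyword."""
    return tag(key, bucket(keyword.lower(), l_bits))
```

Because several keywords can fall into the same bucket, a match on a tag only identifies candidates; the client still has to decrypt the candidates and discard false positives.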
Technique 2: Order-preserving Encryption (OPE) • Definition • Deterministic encryption schemes that preserve numerical order • For A, B ⊆ N with |A| <= |B|: • f: A ----> B is order-preserving if for all i, j in A, f(i) > f(j) iff i > j • SE = (K, Enc, Dec) is order-preserving if Enc(k, .) is an order-preserving function for all k output by K • Security • IND-OCPA, a generalization of IND-DCPA – does not work • Based on the approach used to define PRPs • Note: order-preserving functions are injective • POPF-CCA • POPF: pseudorandom order-preserving functions • SE = (K, Enc, Dec), A an adversary against SE • Lazily sample a random order-preserving function (ROPF) • Lazy Sampling • Connection: random order-preserving function & hypergeometric (HG) probability distribution • Use the HG distribution to lazily sample a ROPF and its inverse
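To make the definition concrete, the sketch below samples a random order-preserving function from {0, …, m-1} into {0, …, n-1} by drawing m distinct range points and sorting them. It is a toy under stated assumptions: the whole table is built eagerly rather than lazily sampled with the hypergeometric distribution, and Python's `random.Random` is not a cryptographic generator, so this illustrates the order-preserving property only and provides no security.

```python
import bisect
import random


def sample_ropf(key, domain_size, range_size):
    """Eagerly sample an order-preserving function as a sorted lookup table."""
    if domain_size > range_size:
        raise ValueError("the range must be at least as large as the domain")
    rng = random.Random(key)  # keyed, so the same key gives the same function
    return sorted(rng.sample(range(range_size), domain_size))


def encrypt(table, i):
    """Enc(k, i): the i-th smallest sampled range point."""
    return table[i]


def decrypt(table, c):
    """Dec(k, c): binary search for the ciphertext in the sorted table."""
    i = bisect.bisect_left(table, c)
    if i == len(table) or table[i] != c:
        raise ValueError("not a valid ciphertext")
    return i


def is_order_preserving(values):
    """True if f(0) < f(1) < ... , i.e. f(i) > f(j) iff i > j."""
    return all(a < b for a, b in zip(values, values[1:]))
```

Strictly increasing tables are also injective, which matches the note on the slide.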
Technique 2: OPE of Dates • Approach • Uses an order-preserving symmetric encryption (OPE) scheme • OPE based on the hypergeometric distribution • Maps dates from a domain (D) to a range (R) • Domain D: set of dates • Range R: set of dates • F: D ----> R, where |D| <= |R| • D={D1, D2, …, Dm}, R={R1, R2, …, Rn}, m<=n • Example: • D=[01/01/2009-1:00am, 12/31/2009-1:00am] • R=[01/01/2009-1:00am, 12/31/2011-1:00am] • Plaintext: 06/06/2009 ----------> Cipher: 08/15/2010 • Plaintext: 06/07/2009 ----------> Cipher: 10/25/2010 • OPE • Uses consecutive numbers • Mapping dates to numbers • 1 --------------> 30 min • X --------------> Y min • X = Y min / 30 min • Examples: 3 h 30 min = 7, 1 day = 48
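The date-to-number step can be sketched as follows, using 30-minute slots as on the slide. The function names are hypothetical, and counting slots from the start of the domain (so the first slot is 0) is an assumption; the slide only fixes the slot length.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=30)  # one unit = 30 minutes, as on the slide


def duration_to_slots(duration):
    """X = Y min / 30 min, rounded down to whole slots."""
    return duration // SLOT


def to_slot(moment, domain_start):
    """Number of 30-minute slots between the domain start and the moment."""
    if moment < domain_start:
        raise ValueError("moment lies before the start of the domain")
    return duration_to_slots(moment - domain_start)


def from_slot(slot, start):
    """Inverse mapping: slot number back to a date and time."""
    return start + slot * SLOT
```

The resulting integers are what an order-preserving scheme would encrypt; the ciphertext integer is then converted back into a date in the range R with `from_slot`, which is how a plaintext date ends up looking like another valid date on the server.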
Technique 2: OPE Proposed Improvement • Approach • Use bucketization technique • Domain and Range • D = [SD, ED], where SD = start domain date, ED = end domain date • R = [SR, ER], where SR = start range date, ER = end range date • Process • Bucketization • Break domain and range into smaller ones • D = {D1, D2, …, Dn}, R = {R1, R2, …, Rm}, n<=m • Sub-ranges don’t have to be consecutive • Mapping Buckets • Use a pseudorandom function to deterministically map domain buckets to range buckets • Di -----> Rj • Examples • Domain = January 2009, Range = 2009 • D = {D1, D2, D3}, R = {R1, …, R10} • D1 = [1/1/2009, 1/10/2009], ….. • R1 = [1/1/2009, 2/15/2009], ….. • D1 -----> R4, D2 -----> R10, D3 -----> R1
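One way to read the bucket-mapping step is as a keyed, deterministic, one-to-one assignment of domain buckets to range buckets. The sketch below orders the range buckets by an HMAC-SHA256 value and gives the first n of them to the domain buckets. HMAC as the pseudorandom function and the function name `map_buckets` are assumptions for illustration; the slides do not specify the construction.

```python
import hashlib
import hmac


def map_buckets(key, n_domain, m_range):
    """Return a list where entry i is the range bucket assigned to domain bucket i."""
    if n_domain > m_range:
        raise ValueError("need at least as many range buckets as domain buckets")

    def prf(j):
        # Keyed pseudorandom value for range bucket j.
        return hmac.new(key, j.to_bytes(4, "big"), hashlib.sha256).digest()

    # Sorting by the PRF output gives a keyed permutation of the range buckets.
    permuted = sorted(range(m_range), key=prf)
    return permuted[:n_domain]
```

As in the slide's example (D1 to R4, D2 to R10, D3 to R1), the assignment need not keep the buckets themselves in order; order is only preserved inside each bucket by the underlying OPE scheme.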
Technique 3: Bucketization • Approach • Relation: • R = (V, F), where V is a set of values sorted in increasing order and F is the set of corresponding frequencies of V in R • Domain: • D = {V1, V2, …, Vn}, Vi<Vj for all i<j • Buckets: divide D into k blocks • B = {B1, B2, …, Bk}, |Bi| = |D|/k • Codes: • Used to represent buckets • Set of codes: C = {C1, C2, …, Cl} • Mapping buckets to codes • Requirement: each bucket needs to be mapped to between 1 and l codes • Mapping: C(Bi) = {Ci, …, Cj} (increasing order) • Number of mappings for a bucket: NM(Bi) = C(k,1) + C(k,2)+ … + C(k,k)=N • Number of possible mappings for all buckets: N^k • Bucketization scheme • Select one mapping from N^k • Goal: maximizing privacy
Technique 3: Bucketization • Choosing a mapping • Mapping scheme needs to enforce the privacy definition • Operations on the scheme • Insertion (encryption) • Convert data (Wi) to bucket ID (Bi): Bi(Wi) • Map bucket ID (Bi) to the corresponding code IDs • Result: Wi ---> Bi ---> {Ci, Cj}, size q • Retrieval/Query (decryption) • Find bucket Bi for the data Wi • Generate the q codes for Bi • Search and retrieve the data stored under these codes • Filter out the false positives • Range Query • Find all the buckets in the data range • Generate a query for each bucket • OR the results of the queries after filtering them
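The insertion and range-query steps can be sketched over an integer domain. This is a minimal illustration under several assumptions: buckets have equal width, each bucket gets q codes that are disjoint from every other bucket's codes, each inserted value is stored under one randomly chosen code of its bucket, and the stored value stands in for its ciphertext. The class name `BucketIndex` and the seeded `random.Random` are hypothetical and not suitable for real use.

```python
import random


class BucketIndex:
    def __init__(self, lo, hi, k, q, seed=0):
        self.lo, self.hi = lo, hi
        # Bucket width: ceil(domain size / k), so there are at most k buckets.
        self.width = -(-(hi - lo + 1) // k)
        self._rng = random.Random(seed)
        pool = self._rng.sample(range(1_000_000), k * q)
        # codes[b] is the list of q codes representing bucket b.
        self.codes = [pool[b * q:(b + 1) * q] for b in range(k)]
        # What the server would see: (code, payload) rows.
        self.server_rows = []

    def bucket_of(self, value):
        if not self.lo <= value <= self.hi:
            raise ValueError("value outside the domain")
        return (value - self.lo) // self.width

    def insert(self, value):
        """Wi -> Bi -> one of the codes of Bi; the payload stands in for E(Wi)."""
        code = self._rng.choice(self.codes[self.bucket_of(value)])
        self.server_rows.append((code, value))

    def range_query(self, low, high):
        """Query every bucket overlapping [low, high], then filter client-side."""
        wanted = set()
        for b in range(self.bucket_of(low), self.bucket_of(high) + 1):
            wanted.update(self.codes[b])
        candidates = [v for code, v in self.server_rows if code in wanted]
        # Whole buckets are fetched, so drop the false positives.
        return sorted(v for v in candidates if low <= v <= high)
```

For example, with domain 0 to 99 split into 10 buckets, a query for 15 to 35 fetches everything in buckets 1 through 3 and then discards values such as 12 that fall outside the requested range.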
Implementation Status • Remaining Features • Sharing data • Policy management • Service adapter • Mobile access • More encryption algorithms: bucketization, • Implemented Features • HTTP Proxy Server • HTTP Parser • Operations: create, modify, query events • Two cryptographic algorithms: KDE, OPE
Data Storage Model • Service provider storage • Client application: embeds application-specific queries in HTTP request messages • Used for both storage and retrieval of data • Server: uses HTTP response messages to respond to application requests • HTTP Request Messages • Request message: <request line, headers, empty line, body (optional)> • Methods: HEAD, GET, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT • Data: attribute-value pairs (attribute=value) • Sources: query string (request line URL), data string (body in POST), cookie string (HTTP) • HTTP Response Messages • Response message: <status line, response header fields, content body> • Data: plaintext (content body)
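A proxy that wants to encrypt attribute values first has to pull the attribute=value pairs out of the three sources listed above. The sketch below does this with Python's standard library; the function name `extract_pairs` and the sample URL, body, and cookie are made up for the example and do not reflect the real Google Calendar request format.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlsplit


def extract_pairs(request_target, body="", cookie_header=""):
    """Collect attribute/value pairs from the URL, the POST body, and cookies."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {
        # Query string from the request line URL.
        "query": parse_qsl(urlsplit(request_target).query, keep_blank_values=True),
        # Form-encoded data string from the body of a POST.
        "body": parse_qsl(body, keep_blank_values=True),
        # Name/value pairs from the Cookie header.
        "cookie": [(name, morsel.value) for name, morsel in cookie.items()],
    }


pairs = extract_pairs(
    "/calendar/event?action=CREATE&text=Meeting+with+Bob",
    body="where=Room+5",
    cookie_header="sid=abc123",
)
```

After this step the middleware would replace selected values (for example `text`) with their encrypted forms and re-encode the request before forwarding it to the server.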
Data Model: Representation • Object Hierarchies • Representation: XML tree • Data (attribute/value): resides at the leaf nodes (represented here by a rectangle) • Metadata: internal nodes only • [Figure: file-oriented and event-oriented object hierarchies]
Services: Query Model • Simple query: (structure or content) • Q = set of words = {w1, w2, …, wn}, where wi is a word • Data types: Number, String, Date • Operations: AND, OR, NOT, EXACT • Complex query (content) • General • Q = set of attribute/predicate pairs = {<a1, p1>, …, <ak, pk>}, where ai is the attribute and pi is the predicate • Data types: Number, String, Date • Operations: AND, OR, NOT, EXACT • Range query • Q = set of attribute/predicate pairs = {<a1, p1>, …, <ak, pk>}, where there exists at least one range • Range: defined by two pairs <ai, pl>, <ai, ph>, where pl = lower bound, ph = higher bound • Range data types: Number, Date • Non-range data types: Any • Non-range operations: AND, OR, NOT, EXACT
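The complex and range query forms can be illustrated on plaintext objects. In this sketch a query is a list of (attribute, predicate) pairs combined with AND, and a range is expressed as two pairs on the same attribute, one for the lower bound and one for the higher bound, as in the model above. The helper names are hypothetical, and the sketch shows only the query semantics, not how the middleware rewrites predicates into encrypted parameters.

```python
from datetime import date


def exact(expected):
    """EXACT predicate: the value must equal the expected one."""
    return lambda value: value == expected


def at_least(lower):
    """Lower-bound predicate pl of a range."""
    return lambda value: value >= lower


def at_most(upper):
    """Higher-bound predicate ph of a range."""
    return lambda value: value <= upper


def matches(obj, query):
    """True if the object satisfies every <attribute, predicate> pair (AND)."""
    return all(attr in obj and pred(obj[attr]) for attr, pred in query)


events = [
    {"What": "Meeting with Bob", "When": date(2009, 1, 1)},
    {"What": "Dentist", "When": date(2009, 3, 15)},
    {"What": "Meeting with Bob", "When": date(2009, 6, 6)},
]

# Range query: meetings with Bob in the first quarter of 2009.
query = [
    ("What", exact("Meeting with Bob")),
    ("When", at_least(date(2009, 1, 1))),
    ("When", at_most(date(2009, 3, 31))),
]
hits = [e for e in events if matches(e, query)]
```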
Services: Sharing and Collaboration • Objects • Based on user ID • Example: email address • Can be shared at any internal level of the hierarchy • Examples: a single event, an entire calendar • An object can be shared with multiple users • Example: an event for a meeting – all participants can share it • Policies used to set permissions • Examples: view only the object, edit the object, share the object with others
Sharing and Collaboration: Approach • Key Management (Encryption) • Objects encryption: individual or group • Model: <OwnerID, Object, Kenc> • Examples: <Bob, Meeting 1, K1> • Objects Sharing: individual or group • Model: <Owner, Target, Object, Keys, Policies> • Example: <Bob, Alice, April-Events, K1, P1> • Objects Multiple Sharing • Same objects and same policies • Examples: <Bob, Alice, April-Events, K1, P1><Bob, John, April-Events, K1, P1> • Same objects and different policies • Examples: <Bob, Alice, April-Events, K1, P1><Bob, John, April-Events, K2, P2> • Objective • Minimize the number of encryption keys while enforcing the sharing policies and ensuring the confidentiality of data at the server.
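One simple reading of the multiple-sharing examples is that a key is tied to the (owner, object, policy) combination: targets who receive the same object under the same policy get the same key, while a different policy gets a different key. The sketch below implements only that bookkeeping. The `KeyRing` class and its methods are hypothetical, and this is not presented as the framework's actual key-minimization algorithm.

```python
import secrets


class KeyRing:
    def __init__(self):
        self._keys = {}    # (owner, object, policy) -> key bytes
        self.shares = []   # <Owner, Target, Object, Keys, Policies> tuples

    def key_for(self, owner, obj, policy):
        """Reuse the existing key for this combination, or create a new one."""
        slot = (owner, obj, policy)
        if slot not in self._keys:
            self._keys[slot] = secrets.token_bytes(32)
        return self._keys[slot]

    def share(self, owner, target, obj, policy):
        """Record a sharing tuple and return it."""
        entry = (owner, target, obj, self.key_for(owner, obj, policy), policy)
        self.shares.append(entry)
        return entry

    def key_count(self):
        return len(self._keys)
```

Sharing April-Events from Bob to Alice and to John under P1 yields one key, and adding a share under P2 yields a second key, which mirrors the K1/K2 examples above.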
Sharing and Collaboration: Approach • Objective • Minimize the number of keys while enforcing the sharing policies and providing the confidentiality of data at the server • Approach • Data: set of documents D = {D1, D2, …, Dn} • Document: D = {O1, O2, …, Om} • Model: Dt = {N1, N2, …, Nn} (internal nodes and leaf nodes) • Keys: K = {K1, K2, …, Kn} • Complete encryption: Enc(K)[D] = D* = {N1*, N2*, …, Ni*} • Partial encryption: Enc(K)[D] = D* = {N1, N2, …, Ni} + {N1*, N2*, …, Nj*}, Ni in (NODES*) U (NODES)