
A Framework for Developing Privacy Middleware for Cloud Data Services

A Framework for Developing Privacy Middleware for Cloud Data Services. Mamadou H. Diallo. Outline. Overview/Motivation Approach: A framework for developing privacy middleware Abstract Service Model Privacy Middleware Architecture Data Protection Model Implementation


Presentation Transcript


  1. A Framework for Developing Privacy Middleware for Cloud Data Services Mamadou H. Diallo

  2. Outline • Overview/Motivation • Approach: A framework for developing privacy middleware • Abstract Service Model • Privacy Middleware Architecture • Data Protection Model • Implementation • Based on proxy – adaptation of Sahi • Web application: Google Calendar • Google Calendar Service Model • Data protection: cryptographic algorithms • Implementation Status • Implemented Features • Remaining Features

  3. Overview/Motivation • Increase of web-based data services • Some benefits: improved service, accessibility, availability, low cost, etc. • Examples: Google Calendar, Microsoft Live Mesh, Yahoo Briefcase, etc. • Privacy issues • Outsider attacks (Internet hackers) • Insider attacks (dishonest employees) • Lack of support for privacy enforcement from web applications • Current approaches • Assumption: cooperative servers • Algorithms and protocols – supported by servers • Drawbacks: web service providers not willing to cooperate • Proposed approach: privacy middleware • Assumption: uncooperative servers • Techniques: encryption • Advantages: addresses insider attacks, policy-based • Challenges: • Service abstraction, service adaptation • Query processing – privacy enforcement • Sharing – key distribution and revocation • Support for other servers

  4. Approach: A Framework for Privacy Middleware • Standard web application architecture • Three logical layers • Client layer – implemented in a browser • Presentation and business logic layer – implemented in a web server • Data layer – implemented as a database

  5. Approach: A Framework for Privacy Middleware • New logical layer: privacy enforcement layer • Implemented in a privacy middleware • Design and implementation – based on proxy technology

  6. Abstract Service Model: Data Model • Data modeled as objects • Object: O = {(A1,V1), (A2,V2), …, (An,Vn)}, where (Ai,Vi) is an attribute/value pair and n is the total number of pairs • Granularity of objects: depends on data types • Event-based: unit = event • File-based: unit = file • Data categories • Structured: examples – events in a calendar, database entries • Unstructured: examples – text documents, video files, audio files • Data types • Ordered data: examples – dates, numerical data • Non-ordered data: examples – text documents, presentation documents • Other data: categorical data (list of choices), boolean data (YES/NO)
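The object model on this slide can be sketched in a few lines of Python. This is only an illustration: the attribute names and the `ORDERED` / `CATEGORICAL` sets are assumptions chosen to resemble a calendar event, not part of the original framework.

```python
# Hypothetical encoding of O = {(A1,V1), ..., (An,Vn)} as a Python dict.
event = {
    "What": "Meeting with Bob",     # non-ordered data (free text)
    "When": "2009-01-01T10:00",     # ordered data (date/time)
    "Repeats": "weekly",            # other data (categorical)
}

# Assumed classification of attributes into the slide's data types.
ORDERED = {"When"}
CATEGORICAL = {"Repeats"}


def data_type(attribute: str) -> str:
    """Return the data type class used to pick an encryption strategy."""
    if attribute in ORDERED:
        return "ordered"
    if attribute in CATEGORICAL:
        return "other"
    return "non-ordered"


types = {name: data_type(name) for name in event}
```

Classifying each attribute up front is what lets a middleware pick a different protection scheme per attribute later on.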

  7. Abstract Service Model: Operations • Operations modeled as functions • Function: inputs, processing, outputs • Create/store and modify objects • Inputs: object, privacy policies • Processing: encryption, tagging • Outputs: encrypted object with tags • Fetch/retrieve objects • Inputs: HTML pages with encrypted data • Processing: decryption, un-tagging • Outputs: HTML pages with no encrypted data • Query objects • Inputs: query parameters • Processing: encryption • Outputs: encrypted parameters • Share objects • Inputs: object ID, sharing policies • Processing: encryption • Outputs: encrypted data (object ID, keys and metadata for decrypting the object)

  8. Data Protection Model • Approaches • Based on cryptographic techniques • Encryption/decryption mechanisms • Challenges • Supporting web application services • Issues: accessing encrypted multi-data sets • Examples: • Searching text, searching ranges, etc. • Sharing personal data, sharing documents, etc. • Collaboration, integration, etc. • Available techniques • Inefficient encryption • More security vs. poor performance • Examples: randomized encryption – retrieve all data for each query • Efficient searchable encryption • Less security vs. better performance • Examples: order-preserving encryption, bucketization-based encryption

  9. Data Protection Model • Encryption Strategy • Ordered Data • Order-preserving encryption schemes • Example: order-preserving encryption • Non-Ordered Data • Searchable encryption schemes • Example: keyword-based encryption • Other Data • May not be encrypted • Example: categorical data, boolean data • Key Management • Storage and retrieval • Keys and metadata stored on the server – portability • Encrypted using a master key for the owner • Retrieved once for each web session • Representation • XML Schema • Needs to be flattened before storing • Extensibility

  10. Privacy Policies • Definition (illustration) • PP = <PolicyID, CreationDate, ExpirationDate, Statements> • Statement = <Object, Attribute, EncryptionMethod> • Example: Google Calendar • “Hide my meeting with Bob on 01/01/2009” • Encoding: {Policy1, 1/1/2010, 12/31/2010, {Event1, Event1.What, KDE1}, {Event1, Event1.When, OPE1}, {Event1, Event1.Where, KDE1}, {Event1, Event1.Description, KDE1}}, where KDE = keyword-based encryption, OPE = order-preserving encryption • Policy enforcement • Attribute-level: encrypt all attributes or none • Object-level: more flexible, but more challenging (information leakage)
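The policy tuple above maps naturally onto a small record type. The sketch below is a hypothetical encoding in Python; the class names, field names, and the `method_for` lookup are assumptions made for illustration and are not taken from the original middleware.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    obj: str          # e.g. "Event1"
    attribute: str    # e.g. "Event1.What"
    method: str       # e.g. "KDE1" or "OPE1"


@dataclass(frozen=True)
class PrivacyPolicy:
    policy_id: str
    creation: date
    expiration: date
    statements: Tuple[Statement, ...]

    def method_for(self, obj: str, attribute: str, today: date) -> Optional[str]:
        """Return the encryption method for an attribute, or None if no
        statement applies or the policy is not valid on the given day."""
        if not (self.creation <= today <= self.expiration):
            return None
        for s in self.statements:
            if s.obj == obj and s.attribute == attribute:
                return s.method
        return None


# The slide's example policy, encoded with the assumed types above.
policy = PrivacyPolicy(
    "Policy1", date(2010, 1, 1), date(2010, 12, 31),
    (
        Statement("Event1", "Event1.What", "KDE1"),
        Statement("Event1", "Event1.When", "OPE1"),
        Statement("Event1", "Event1.Where", "KDE1"),
        Statement("Event1", "Event1.Description", "KDE1"),
    ),
)
```

A call such as `policy.method_for("Event1", "Event1.When", date(2010, 6, 1))` looks up which scheme the middleware would apply before forwarding the request.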

  11. Framework Architecture • Privacy middleware: 7 components • Communication: HTTP messages • Trusted: messages cannot be intercepted by others • Untrusted: messages may be intercepted by others

  12. Implementation • Approach • Proxy-based • Browser independent • Web application: Google Calendar • Adapted from Sahi • Sahi • Automation and testing tool for web applications • Open-source application • Based on proxy server technology • Browser independent • Developed in Java and JavaScript • Some Features • Injects JavaScript code into web pages to help record and play back events in the browser • Provides support for • Database-based testing • File read/write APIs for data-driven testing • HTTP and HTTPS

  13. Google Calendar Model • Data Model • Calendar • A set of events • Event: composed of parameters • Parameters • <What, When, Repeats, Where, Who, Calendar, Description, Attachment> • What: String – (non-ordered data) • When: – (ordered data) • start/end date: Date • start/end time: (xx:xx am/pm) • Repeats: categorical (daily, weekly, etc.) • Where: String – (non-ordered data) • Who (Guests): • Guest id: email • Permission: choices (modify event, invite others, see guest list) • Calendar (owner): String – (non-ordered data) • Description: String – (non-ordered data)

  14. Google Calendar Services • Query events • Basic query: any text in any parameter, operation (AND) • Advanced: specific parameters, range query, operations (AND, NOT) • Sharing and Invitations • Sharing a calendar • Publishing a calendar – (embed, public calendars) • Event invitations – (invite guests, allow guests to modify events, allow guests to see the guest list) • Notifications • Types: create, change, cancel invitations • SMS (text messaging): mobile phones • Sync Events • Microsoft Outlook – options (1-way, 2-way) • Other calendars: Apple iCal, Mozilla Sunbird • Mobile devices: Windows Mobile, iPhone, BlackBerry • Others • Support for many languages

  15. Technique 1: Keyword-based Searchable Encryption • Basic Approach • Based on keyword encryption • Use a hash function to bucketize the keywords • Original plaintext • Parse original text into a set of words • W = {W1, W2, …, Wn}, where Wi is a dictionary word • Keyword generation and bucketization • Generate keywords from W: Kw = {Kw1, Kw2, …, Kwm}, where Kwi is a keyword selected from W • Bucketize the keywords using a hash function – H: {0,1}* ----> {0,1}^l, giving HV = {HV1, …, HVk} • Encryption • Encrypt W using a non-deterministic encryption scheme, E(W) • Block-cipher-based encryption • Examples: AES, Blowfish • Encrypt HV using a deterministic encryption scheme, E(HV) • Example: RSA • Tag E(W) with E(HV)
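The keyword and tagging steps can be sketched with the Python standard library alone. This is a simplified illustration, not the presented implementation: it substitutes HMAC-SHA256 for the deterministic encryption of the bucket values (so tags can be compared but not decrypted), it omits the randomized encryption E(W) of the text itself, and the stop-word list, bucket width, and function names are all assumptions.

```python
import hashlib
import hmac

# Hypothetical stop-word list used to select keywords from the parsed words.
STOP_WORDS = {"a", "an", "the", "with", "on", "at", "in"}


def keywords(text: str) -> list:
    """Parse text into words and keep the non-stop-words as keywords."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return sorted({w for w in words if w and w not in STOP_WORDS})


def bucket(word: str, bits: int = 8) -> int:
    """H: {0,1}* -> {0,1}^bits, built here from SHA-256 (an assumption)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << bits)


def tag(key: bytes, bucket_value: int) -> str:
    """Deterministic keyed tag for a bucket value (HMAC stands in for E(HV))."""
    return hmac.new(key, bucket_value.to_bytes(4, "big"), hashlib.sha256).hexdigest()


def tags_for(key: bytes, text: str, bits: int = 8) -> set:
    """Tags to attach to the (separately encrypted) text."""
    return {tag(key, bucket(w, bits)) for w in keywords(text)}


def may_match(key: bytes, query_word: str, stored_tags: set, bits: int = 8) -> bool:
    """True if the query word's tag is present; false positives are possible
    because several keywords can share one bucket."""
    return tag(key, bucket(query_word.lower(), bits)) in stored_tags
```

Because the tag is deterministic, the server can match an encrypted query against stored tags without seeing the keyword, at the cost of revealing which objects share a bucket.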

  16. Technique 2: Order-preserving Encryption (OPE) • Definition • Deterministic encryption schemes: preserve numerical order • For A, B ⊆ N with |A| <= |B| • f: A ----> B is order-preserving if for all i, j in A, f(i) > f(j) iff i > j • SE = (K, Enc, Dec) is order-preserving if Enc(k, ·) is an order-preserving function for all k output by K • Security • IND-OCPA, a generalization of IND-DCPA – does not work • Based on the approach used to define PRPs • Note: order-preserving functions are injective • POPF-CCA • POPF: pseudorandom order-preserving functions • SE = (K, Enc, Dec), A an adversary against SE • Lazily sample a random order-preserving function (ROPF) • Lazy Sampling • Connection: random order-preserving function & hypergeometric (HG) probability distribution • Use the HG distribution to lazily sample a ROPF and its inverse
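To make the definition concrete, the toy sketch below samples a random order-preserving function eagerly by drawing a random subset of the range and sorting it. This is not the lazy hypergeometric sampling named on the slide, and Python's `random` module is not cryptographically secure; the function names and the integer key are assumptions for illustration only.

```python
import random


def sample_opf(key: int, domain_size: int, range_size: int) -> dict:
    """Sample an order-preserving function f: {1..M} -> {1..N}.

    Picking a uniformly random M-element subset of {1..N} and assigning its
    elements in increasing order yields a uniformly random order-preserving
    function. The key only seeds the toy generator.
    """
    if domain_size > range_size:
        raise ValueError("order-preserving maps need |domain| <= |range|")
    rng = random.Random(key)
    images = sorted(rng.sample(range(1, range_size + 1), domain_size))
    return {i + 1: c for i, c in enumerate(images)}


def is_order_preserving(f: dict) -> bool:
    """Check f(i) > f(j) iff i > j by comparing consecutive domain points."""
    xs = sorted(f)
    return all(f[a] < f[b] for a, b in zip(xs, xs[1:]))


f = sample_opf(key=42, domain_size=10, range_size=100)
```

The eager table is only practical for tiny domains; lazy sampling exists precisely to avoid materializing it.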

  17. Technique 2: OPE of Dates • Approach • Uses an order-preserving symmetric encryption (OPE) scheme • OPE based on the hypergeometric distribution • Maps the dates from a domain (D) to a range (R) • Domain D: set of dates • Range R: set of dates • F: D ----> R, where |D| <= |R| • D={D1, D2, …, Dm}, R={D1, D2, …, Dn}, m<=n • Example: • D={01/01/2009-1:00am, 12/31/2009-1:00am} • R={01/01/2009-1:00am, 12/31/2011-1:00am} • Plaintext: 06/06/2009 ----------> Cipher: 08/15/2010 • Plaintext: 06/07/2009 ----------> Cipher: 10/25/2010 • OPE • Uses consecutive numbers • Mapping dates to numbers • 1 --------------> 30 min • X --------------> Y min • X = Y min / 30 min • Examples: 3 h 30 min = 7, 1 day = 48
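The date-to-number step (one unit per 30 minutes) is simple arithmetic and can be checked directly. The sketch below assumes that slots are counted from the start of the domain; the function names are hypothetical.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=30)  # one unit = 30 minutes, as on the slide


def duration_to_slots(duration: timedelta) -> int:
    """X = Y min / 30 min."""
    return duration // SLOT


def to_slot(moment: datetime, domain_start: datetime) -> int:
    """Number of 30-minute slots between the domain start and a moment."""
    return (moment - domain_start) // SLOT


def from_slot(slot: int, start: datetime) -> datetime:
    """Inverse mapping from a slot number back to a date and time."""
    return start + slot * SLOT


start = datetime(2009, 1, 1, 1, 0)                   # 01/01/2009 1:00am
slot = to_slot(datetime(2009, 1, 2, 1, 0), start)    # exactly one day later
```

Once dates are integers, any integer order-preserving scheme can be applied and the result converted back with `from_slot` against the start of the range.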

  18. Technique 2: OPE Proposed Improvement • Approach • Use bucketization technique • Domain and Range • D = [SD, ED], where SD = start domain date, ED = end domain date • R = [SR, ER], where, SR = start range date, ER = end range date • Process • Bucketization • Break domain and range into smaller ones • D = {D1, D2, …, Dn}, R = {R1, R2, …, Rm}, n<=m • Sub-ranges don’t have to be consecutive • Mapping Buckets • Use pseudorandom function to deterministically map domain to range • Di -----> Ri • Examples • Domain= January 2009, Range = 2009 • D = {D1, D2, D3}, R = {R1, …, R10} • D1 = [1/1/2009, 1/10/2009], ….. • R1 = [1/1/2009, 2/15/2009], ….. • D1 -----> R4, D2 -----> R10, D3 -----> R1
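The bucket-mapping step can be sketched with a keyed pseudorandom function. The version below uses HMAC-SHA256 to rank the range buckets and assigns the first n of them to the domain buckets, which gives a deterministic, collision-free mapping. The construction and the name `map_buckets` are assumptions for illustration; the slide does not specify how the pseudorandom function is applied.

```python
import hashlib
import hmac


def map_buckets(key: bytes, n_domain: int, n_range: int) -> dict:
    """Deterministically map domain buckets D1..Dn to distinct range buckets.

    Range bucket indices are ordered by a keyed PRF value, and domain bucket
    Di takes the i-th range bucket in that order, so no two domain buckets
    share a range bucket.
    """
    if n_domain > n_range:
        raise ValueError("need at least as many range buckets as domain buckets")

    def prf(index: int) -> bytes:
        return hmac.new(key, f"range:{index}".encode(), hashlib.sha256).digest()

    range_order = sorted(range(1, n_range + 1), key=prf)
    return {d: range_order[d - 1] for d in range(1, n_domain + 1)}


# The slide's shape: 3 domain buckets (January 2009), 10 range buckets (2009).
mapping = map_buckets(b"demo-key", 3, 10)
```

As in the slide's example (D1 to R4, D2 to R10, D3 to R1), such a mapping need not keep buckets in order; order would then hold only within a bucket.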

  19. Technique 3: Bucketization • Approach • Relation: • R = (V, F), where V is a set of values sorted in increasing order and F is the set of corresponding frequencies of V in R • Domain: • D = {V1, V2, …, Vn}, Vi<Vj for all i<j • Buckets: divide D into k blocks • B = {B1, B2, …, Bk}, |Bi| = |D|/k • Codes: • Used to represent buckets • Set of codes: C = {C1, C2, …, Cl} • Mapping buckets to codes • Requirements: each bucket needs to be mapped to between 1 and l codes • Mapping: C(Bi) = {Ci, …, Cj} (increasing order) • Number of mappings for a bucket: NM(Bi) = C(k,1) + C(k,2) + … + C(k,k) = N • Number of possible mappings for all buckets: N^k • Bucketization scheme • Select one mapping from N^k • Goal: maximizing privacy
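The per-bucket count N on this slide has a closed form, which follows from the binomial theorem. Taking the slide's expression as given:

```latex
N \;=\; \sum_{i=1}^{k} \binom{k}{i} \;=\; 2^{k} - 1,
\qquad
N^{k} \;=\; \left(2^{k} - 1\right)^{k}.
```

For example, with k = 3 there are N = 7 mappings per bucket and 7^3 = 343 mappings overall, so the search space grows very quickly with k.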

  20. Technique 3: Bucketization • Choosing a mapping • Mapping scheme needs to enforce the privacy definition • Operations on the scheme • Insertion (encryption) • Convert data (Wi) to bucket ID (Bi): Bi(Wi) • Map bucket ID (Bi) to corresponding code IDs • Result: Wi ---> Bi ---> {Ci, Cj}, size q • Retrieval/Query (decryption) • Find bucket Bi for the data Wi • Generate q codes for Bi • Search and retrieve all d codes • Filter out the false positives • Range Query • Find all the buckets in the data range • Generate a query for each bucket • OR the results of the queries after filtering them
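The insertion and range-query steps can be sketched as follows. This is a toy model under several assumptions: values are integers, buckets have equal width, each bucket owns q randomly assigned code IDs, and the "server" stores plaintext values as a stand-in for ciphertexts so that the false-positive filtering is visible. The class and method names are hypothetical.

```python
import random
from collections import defaultdict


class BucketStore:
    def __init__(self, low: int, high: int, k: int, q: int, seed: int = 0):
        self.low = low
        self.width = -(-(high - low + 1) // k)       # ceil(|D| / k)
        self.rng = random.Random(seed)               # toy generator, not secure
        code_ids = list(range(k * q))
        self.rng.shuffle(code_ids)                   # hide bucket order in code IDs
        self.codes = [code_ids[b * q:(b + 1) * q] for b in range(k)]
        self.server = defaultdict(list)              # code ID -> stored values

    def bucket_of(self, value: int) -> int:
        return (value - self.low) // self.width

    def insert(self, value: int) -> None:
        """Wi -> Bi -> one of the q codes of Bi."""
        code = self.rng.choice(self.codes[self.bucket_of(value)])
        self.server[code].append(value)

    def range_query(self, lo: int, hi: int):
        """Query every code of every bucket overlapping [lo, hi], then filter."""
        candidates = []
        for b in range(self.bucket_of(lo), self.bucket_of(hi) + 1):
            for code in self.codes[b]:
                candidates.extend(self.server.get(code, []))
        hits = sorted(v for v in candidates if lo <= v <= hi)
        return hits, len(candidates) - len(hits)     # results, false positives


store = BucketStore(low=1, high=100, k=10, q=2)
for v in [3, 7, 12, 25, 26, 40]:
    store.insert(v)
hits, false_positives = store.range_query(5, 25)
```

Here the query touches buckets covering 1 to 30, so the values 3 and 26 come back from the server and are discarded on the client side.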

  21. Implementation Status • Remaining Features • Sharing data • Policy management • Service adapter • Mobile access • More encryption algorithms: bucketization • Implemented Features • HTTP Proxy Server • HTTP Parser • Operations: create, modify, query events • Two cryptographic algorithms: KDE, OPE

  22. Questions?

  23. Data Storage Model • Service provider storage • Client application: embeds application-specific queries in HTTP request messages • Used for both storage and retrieval of data • Server: uses HTTP response messages to respond to application requests • HTTP Request Messages • Request message: <request line, headers, empty line, body (optional)> • Methods: HEAD, GET, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT • Data: attribute-value pairs (attribute=value) • Sources: query string (request line URL), data string (body in POST), cookie string (HTTP header) • HTTP Response Messages • Response message: <status line, response header fields, content body> • Data: plaintext (content body)
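The three sources of attribute-value pairs in a request can be pulled apart with the Python standard library. The sketch below is illustrative only: the URL, parameter names, and cookie values are invented, and a real proxy would take them from the parsed request rather than from string arguments.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlsplit


def extract_pairs(url: str, body: str = "", cookie_header: str = "") -> dict:
    """Collect attribute=value pairs from the query string, a form-encoded
    POST body, and the Cookie header of one HTTP request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {
        "query": parse_qsl(urlsplit(url).query),
        "body": parse_qsl(body),
        "cookie": [(name, morsel.value) for name, morsel in cookie.items()],
    }


# Hypothetical request; none of these names come from a real service.
pairs = extract_pairs(
    "http://example.com/calendar/event?action=CREATE&text=Meeting+with+Bob",
    body="where=Room+5",
    cookie_header="sid=abc123; lang=en",
)
```

These pairs are the points where a privacy middleware would substitute encrypted values before forwarding the request.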

  24. Data Model: Representation • Object Hierarchies • Representation: XML tree • Data (attribute/value): resides at the leaf nodes – (represented here by a rectangle) • Metadata: internal nodes only • [Figure: two example trees, File-Oriented and Event-Oriented]
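An event-oriented hierarchy of this kind can be sketched with `xml.etree.ElementTree`. The element names (`calendar`, `event`, `what`, `when`) are assumptions for illustration; the slide does not give the actual schema.

```python
import xml.etree.ElementTree as ET


def build_event(attributes: dict) -> ET.Element:
    """Build calendar -> event -> one leaf node per attribute/value pair."""
    calendar = ET.Element("calendar")          # internal node (metadata only)
    event = ET.SubElement(calendar, "event")   # internal node (metadata only)
    for name, value in attributes.items():
        ET.SubElement(event, name).text = value  # data lives at the leaves
    return calendar


def leaves(root: ET.Element) -> list:
    """Return (tag, text) for every leaf node, in document order."""
    return [(node.tag, node.text) for node in root.iter() if len(node) == 0]


tree = build_event({"what": "Meeting with Bob", "when": "2009-01-01"})
xml_text = ET.tostring(tree, encoding="unicode")
```

Since only leaves carry data, encrypting the leaf text leaves the tree structure intact for the service to store.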

  25. Services: Query Model • Simple query: (structure or content) • Q = set of words = {w1, w2, …, wn}, where wi is a word • Data types: Number, String, Date • Operations: AND, OR, NOT, EXACT • Complex query (content) • General • Q = set of attribute/predicate pairs = {<a1, p1>, …, <ak, pk>}, where ai is the attribute and pi is the predicate • Data types: Number, String, Date • Operations: AND, OR, NOT, EXACT • Range query • Q = set of attribute/predicate pairs = {<a1, p1>, …, <ak, pk>}, where there exists at least one range • Range: defined by two pairs <ai, pl>, <ai, ph>, where pl = lower bound, ph = higher bound • Range data types: Number, Date • Non-range data types: Any • Non-range operations: AND, OR, NOT, EXACT
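A complex query with one range can be evaluated over plaintext objects as sketched below. The sketch covers only AND-combined exact predicates plus ranges, and the function names and sample events are hypothetical; it shows the query semantics before any encryption is applied.

```python
from datetime import date


def in_range(obj: dict, attribute: str, lower, upper) -> bool:
    """Range predicate defined by the pair <ai, pl>, <ai, ph>."""
    return attribute in obj and lower <= obj[attribute] <= upper


def run_query(objects: list, exact: dict = None, ranges: list = None) -> list:
    """Return objects satisfying every exact predicate AND every range."""
    exact = exact or {}
    ranges = ranges or []
    return [
        o for o in objects
        if all(o.get(a) == v for a, v in exact.items())
        and all(in_range(o, a, lo, hi) for a, lo, hi in ranges)
    ]


events = [
    {"What": "Meeting", "Where": "Room 5", "When": date(2009, 1, 5)},
    {"What": "Meeting", "Where": "Room 7", "When": date(2009, 2, 1)},
    {"What": "Lunch", "Where": "Room 5", "When": date(2009, 1, 9)},
]
january_meetings = run_query(
    events,
    exact={"What": "Meeting"},
    ranges=[("When", date(2009, 1, 1), date(2009, 1, 31))],
)
```

Only number and date attributes are used as ranges here, matching the range data types listed on the slide.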

  26. Services: Sharing and Collaboration • Objects • Based on user ID • Example: email address • Can be shared at any internal level of the hierarchy • Examples: a single event, an entire calendar • An object can be shared with multiple users • Example: an event for a meeting – all participants can share it • Policies used to set permissions • Examples: view only the object, edit the object, share the object with others

  27. Sharing and Collaboration: Approach • Key Management (Encryption) • Object encryption: individual or group • Model: <OwnerID, Object, Kenc> • Example: <Bob, Meeting 1, K1> • Object sharing: individual or group • Model: <Owner, Target, Object, Keys, Policies> • Example: <Bob, Alice, April-Events, K1, P1> • Multiple sharing of objects • Same objects and same policies • Examples: <Bob, Alice, April-Events, K1, P1>, <Bob, John, April-Events, K1, P1> • Same objects and different policies • Examples: <Bob, Alice, April-Events, K1, P1>, <Bob, John, April-Events, K2, P2> • Objective • Minimize the number of encryption keys while enforcing the sharing policies and ensuring the confidentiality of data at the server.

  28. Sharing and Collaboration: Approach • Objective • Minimize the number of keys while enforcing the sharing policies and providing the confidentiality of data at the server • Approach • Data: Set of documents D = {D1, D2, … Dn} • Document: D = {O1, O2, …, Om} • Model: Dt = {N1, N2,…, Nn} (Internal nodes, and leaf nodes) K = {K1, K2, …, Kn} • Complete encryption: Enc(K)[D] = D* = {N1*, N2*,…, Ni*} • Partial encryption: Enc(K)[D] = D* = {N1, N2,…, Ni} + {N1*, N2*,…, Nj*}, Ni in (NODES*) U (NODES)
