This research focuses on securing stored data in distributed repositories, ensuring availability, integrity, and confidentiality even in compromised scenarios. The study introduces a hybrid approach combining pure replication and secret-sharing for improved security and performance. Various protocols and techniques are discussed in detail to achieve this goal.
Responsive Security for Stored Data
Subramanian Lakshmanan, Mustaque Ahamad, H. Venkateswaran
College of Computing, Georgia Institute of Technology
Introduction • Problem definition • A distributed data repository. • Guarantees availability, integrity and confidentiality of stored data in the face of a limited number of compromised nodes. • Better performance. • Organization • Motivation • Existing techniques • Our approach • Related work • System architecture and protocols • A simple analysis
Motivation (cartoon captions) • "At your service!" • "Get his medical records, fast!" • "I don't want the data to be lost, ever! Could prove vital!" • "Hope the records have not been tampered with." • "Hope no one is looking at my tax documents." • "Would anyone know about this?"
Approach (cartoon captions) • "So will I!" • "Having a copy always helps." • "I will pass on what I see to you, real fast!" • "I will be faithful anyway." • "I'm not going to trust any of you guys. No one is going to get all the information." • "Wish I could capture more nodes!" • "I can't even wipe or corrupt the data, let alone leak info!" • "I don't want to talk to all of you, it takes a hell of a lot of time."
Pure replication • Periodic re-encryption • Client has to be present for re-encryption • Compromised server could retain old data [Figure: client C stores E(d,k) on servers S1 to S4; after re-encryption each server holds E(d,k').]
Secret-sharing algorithms • A (b,k) secret-sharing scheme fragments data into k shares so that b shares give no information, b+1 give all the information. • A (b,2b+1) scheme guarantees confidentiality, integrity and availability of data in the face of a maximum of b compromised nodes. • Data shares can be renewed, recovered periodically by the servers, even in the absence of the client, in a purely distributed fashion tolerating a maximum of b malicious nodes.
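A minimal sketch of a (b,k) threshold scheme in the sense used above, assuming Shamir's polynomial construction (listed under related work) over a prime field. The names `split` and `reconstruct` and the modulus `PRIME` are illustrative choices, not taken from the slides.

```python
import secrets

PRIME = 2**127 - 1  # illustrative field modulus (an assumption, not from the slides)

def split(secret, b, k, prime=PRIME):
    """Split secret into k shares: any b+1 reconstruct it, any b reveal nothing."""
    if not (0 <= secret < prime and 0 <= b < k):
        raise ValueError("need 0 <= secret < prime and 0 <= b < k")
    # Random polynomial of degree b whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(b)]
    shares = []
    for x in range(1, k + 1):
        y = 0
        for a in reversed(coeffs):  # Horner evaluation mod prime
            y = (y * x + a) % prime
        shares.append((x, y))
    return shares

def reconstruct(shares, prime=PRIME):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

With b = 2 and k = 5 (a (b,2b+1) scheme), `reconstruct(split(s, 2, 5)[:3])` returns `s` for any three of the five shares. This sketch does not detect corrupted shares; that is the role of the verification string introduced later.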
A pure secret-sharing scheme 1. A write involves talking to all servers 2. Suffers from the problem of related attacks [Figure: under a (1,5) scheme, client C splits data D into shares f1, f2, .., f5 and stores one share on each of servers S1 to S5.]
Our approach • Pure replication • Poor security tolerating malicious faults • Better performance (access cost, availability) • Pure secret-sharing • Better security tolerating malicious faults • Poor performance • Hybrid scheme • Do limited secret-sharing • Replicate the shares • Offer the benefits of both schemes
Related work • Replication for Byzantine fault tolerance • Schneider's state machine approach for fault tolerance • Secure FS, Practical Byzantine fault tolerance at MIT - Castro and Liskov • Quorum systems • Phalanx and Fleet – Reiter et al. • Dynamic quorums – Alvisi et al.
Related work (contd.) • Secret-sharing • Shamir's scheme based on polynomial interpolation • Detecting and recovering corrupted shares – Feldman, Pedersen • Proactive secret-sharing, periodic share renewal and share recovery – Herzberg et al. • PASIS at CMU. • Fragmentation-scattering for intrusion tolerance at LAAS, France. • Data dissemination • Epidemic algorithms for non-malicious environments – Demers et al. • Dissemination in Byzantine environments – Malkhi et al.
Our system • Write along a row • Read along a row • Disseminate along a column • Periodic share renewal • Pure secret sharing: number of rows = 1 • Pure replication: number of columns = 1 [Figure: data D is split into shares f1, f2, f3, one per column of the server matrix.]
Assumptions • n servers S1..Sn. • Requests are authenticated and authorized independently at each server, over secure communication channels. • Compromises of two different servers are not related. • A chosen threshold value b, the number of server failures to be tolerated. • Number of columns c and number of rows r, with rc = n and c > b. • Protocols are designed for the chosen matrix dimensions and the chosen threshold value.
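Read and write protocols
Write(x,v) by client C, using a (b,c) secret-sharing scheme:
1. Split the value into shares: $v \rightarrow v_1, v_2, \ldots, v_c$.
2. Compute the one-way function $h(v_i) = g^{v_i}$ for each share.
3. Form the verification string $VS = h(v_1)|h(v_2)|\ldots|h(v_c)$ and the signature $sig = \{uid(x), ts, v\}_{K_C^{-1}}$, signed with the client's private key.
4. Choose a row k. For m = 1 to c, send {"write", uid(x), ts, $v_m$, VS, sig} to server $S_{k,m}$.
5. Repeat step 4 for a different k until the number of rows contacted, l, satisfies $c - \lfloor b/l \rfloor \ge b+1$, so that at least b+1 columns hold the write on a non-faulty server even if b servers are compromised.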
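Steps 2, 3 and 5 of the write can be sketched as follows. Two assumptions are made here: the one-way function is modular exponentiation with a generator `g` and modulus `p` (the slides only give h(v) = g^v), and the stopping rule is read as c - floor(b/l) >= b+1. Both function names are hypothetical.

```python
def verification_string(shares, g, p):
    """VS as the list h(v_1), ..., h(v_c), with h(v) = g**v mod p (modulus assumed)."""
    return [pow(g, v, p) for v in shares]

def rows_needed(b, c):
    """Smallest number of rows l with c - floor(b / l) >= b + 1 (assumed stopping rule)."""
    if c <= b:
        raise ValueError("the scheme requires c > b")
    l = 1
    # b compromised servers can block at most floor(b / l) columns once
    # l servers per column have received the write.
    while c - b // l < b + 1:
        l += 1
    return l
```

Under this reading, a single row suffices whenever c >= 2b+1, while c = b+1 needs b+1 rows, which matches the trade-off between secret-sharing width and replication depth.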
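Write protocol [Figure: client C splits D into shares f1, f2, .., fc with a (b,c) scheme and sends each share together with $VS = h(f_1)|h(f_2)|\ldots|h(f_c)$ to the servers of l rows, where $c - \lfloor b/l \rfloor \ge b+1$.]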
Read and write protocols (contd.)
Read(x) by client C:
1. Choose a row k. For m = 1 to c, send {"read", uid(x)} to server $S_{k,m}$.
2. Get a list of {ts, VS, $v_m$, sig} from each $S_{k,m}$.
3. Choose the highest timestamp that occurs in b+1 or more replies with the same VS.
4. If no such timestamp exists, repeat from step 1 for a different k.
5. Pick the shares corresponding to this timestamp. Pick b+1 shares that are verified successfully against VS. Reconstruct the data value v from these b+1 shares.
6. Return v if sig is valid; otherwise repeat from step 1 for a different k.
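Step 3 of the read, choosing the highest timestamp backed by b+1 matching verification strings, could look like the sketch below. The reply layout (a tuple of timestamp, VS, share, signature) and the name `pick_version` are assumptions made for illustration; VS must be hashable, for example a tuple or a string.

```python
from collections import Counter

def pick_version(replies, b):
    """Return (ts, vs) for the highest timestamp reported with the same VS
    by at least b + 1 replies, or None if no version has enough support."""
    counts = Counter((ts, vs) for ts, vs, _share, _sig in replies)
    supported = [key for key, n in counts.items() if n >= b + 1]
    if not supported:
        return None  # the caller retries with a different row
    return max(supported, key=lambda key: key[0])
```

Requiring b+1 matching replies means at least one of them comes from a non-faulty server, so a version invented by b colluding servers is never selected.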
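Read protocol [Figure: client C receives (fi', VSi) from the servers of a row, then 1. checks that b+1 verification strings agree, VSi1 = VSi2 = .. = VSib+1; 2. checks each share against the verification string, h(fi') = h(fi) in VS; 3. reconstructs D from b+1 verified shares fi1, fi2, .., fib+1 using the (b,*) scheme.]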
Data dissemination • Disseminate shares along columns • Increases availability and system performance • Better data sharing for shared data • Better support for mobile or roaming client • Replicated copies serve as back-ups
Dissemination protocol
1. Detect/suspect corruption
2. Pull verification string from b+1 servers
3. Check if share is valid using VS
4. Do share recovery if share is corrupted
Remarks:
1. VS is accepted as valid only if it is either heard directly from the client or reported identically by b+1 other servers.
2. Disseminate to other servers only those VS that are accepted as valid.
[Figure: servers in a column hold (f1, VS), (f2, VS), .., (fc, VS); a corrupted share f'2 is replaced by the recovered share f2.]
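The acceptance rule for a verification string and the share check in step 3 can be sketched as below. The function names, the `reports` mapping (server id to reported VS) and the use of modular exponentiation for h are assumptions for illustration.

```python
from collections import Counter

def accepted_vs(reports, b, from_client=None):
    """Accept a VS heard directly from the client, or one reported
    identically by at least b + 1 other servers; otherwise return None."""
    if from_client is not None:
        return from_client
    counts = Counter(reports.values())
    for vs, n in counts.most_common():
        if n >= b + 1:
            return vs
    return None

def share_is_valid(share, index, vs, g, p):
    """Check h(share) against entry `index` of VS, with h(v) = g**v mod p (assumed)."""
    return pow(g, share, p) == vs[index]
```

A server that finds `share_is_valid` false for its own share would then run share recovery, as in step 4.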
Share renewal • Assumption: in any timeframe of length Tv, an adversary can compromise a maximum of b nodes • Question: what happens over a time interval of length 2Tv? • The adversary could compromise more than b nodes over the longer period. • Renew the shares at least once every Tv seconds. • Shares from before a renewal cannot be combined with shares from after it. • Done by the servers in the absence of the client, in a distributed fashion, secure against b compromised nodes.
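One way to see why old and new shares do not combine is the standard proactive renewal idea from the related work: every server adds shares of a random polynomial with zero constant term. The sketch below simulates this in one process and assumes Shamir shares over a prime field; it omits the verification-string update and any defence against servers that misbehave during renewal. All names are hypothetical.

```python
import secrets

PRIME = 2**127 - 1  # illustrative field modulus (assumption)

def eval_poly(coeffs, x, prime=PRIME):
    """Evaluate a polynomial (constant term first) at x, mod prime."""
    y = 0
    for a in reversed(coeffs):
        y = (y * x + a) % prime
    return y

def split(secret, b, k, prime=PRIME):
    """Shamir (b,k) split: degree-b polynomial with the secret as constant term."""
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(b)]
    return [(x, eval_poly(coeffs, x, prime)) for x in range(1, k + 1)]

def renew(shares, b, prime=PRIME):
    """Each server contributes a random degree-b polynomial with constant term 0;
    adding its evaluations changes every share but not the shared secret."""
    renewed = dict(shares)
    for _ in shares:  # one contribution per server
        delta = [0] + [secrets.randbelow(prime) for _ in range(b)]
        for x in renewed:
            renewed[x] = (renewed[x] + eval_poly(delta, x, prime)) % prime
    return sorted(renewed.items())

def reconstruct(shares, prime=PRIME):
    """Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

After `new = renew(old, b)`, any b+1 shares from `new` still reconstruct the secret, whereas a mix of shares from `old` and `new` interpolates to an unrelated value, so shares captured in different periods are of no use together.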
Share renewal (contd.) [Figure: periodic share renewal; the shares (f1, VS), (f2, VS), (f3, VS) are replaced by (f1', VS'), (f2', VS'), (f3', VS').]
Analysis • In any timeframe of length Tv, a server can be compromised with probability p • Expected number of failures = np • Threshold value b and degree of replication r (or c) determine the level of security and performance offered by the system • Time taken to complete a read/write is much less than Tv
Security Metrics • Availability • Probability that a legitimate client can read a data item that has been written successfully. • Confidentiality • Complement of the probability that an adversary can read a data item that has been written successfully. • Integrity • Complement of the probability that any client could be given corrupted or modified data content when a read on a data item is done.
Security metrics (contd.)
1. Availability = probability of finding at least b+1 non-faulty servers, each from a different column:
$$\mathrm{Availability}(b,c) = \sum_{i=b+1}^{c} \binom{c}{i} \left(1-p^{r}\right)^{i} \left(p^{r}\right)^{c-i}$$
2. Confidentiality = 1 - probability of finding at least b+1 malicious servers, each from a different column:
$$\mathrm{Confidentiality}(b,c) = 1 - \sum_{i=b+1}^{c} \binom{c}{i} \left(1-q^{r}\right)^{i} \left(q^{r}\right)^{c-i}, \quad q = 1-p$$
3. Integrity: same as confidentiality, or depends on the strength of the underlying digital signature scheme.
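Reading both metrics as binomial tail sums over the c columns, where a column contains a non-faulty server with probability 1 - p^r and a malicious server with probability 1 - q^r, they can be evaluated numerically with the sketch below. Function names are illustrative, and servers are assumed to fail independently, as stated in the assumptions.

```python
from math import comb

def at_least(b, c, col_prob):
    """P(at least b+1 of c independent columns have an event of probability col_prob)."""
    return sum(comb(c, i) * col_prob**i * (1 - col_prob)**(c - i)
               for i in range(b + 1, c + 1))

def availability(b, c, r, p):
    # A column is usable unless all r of its servers are faulty.
    return at_least(b, c, 1 - p**r)

def confidentiality(b, c, r, p):
    # A column leaks its share if at least one of its r servers is malicious.
    q = 1 - p
    return 1 - at_least(b, c, 1 - q**r)
```

For pure replication (b = 0, c = 1), increasing r raises availability and lowers confidentiality, which is the tension the hybrid scheme is meant to balance.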
Performance metrics • Read cost • Expected number of servers a client needs to contact to read a data item successfully. • Involves collecting b+1 distinct shares that are not corrupted. • Read cost = $(2b+1)/p_r$, where $p_r$ is the probability of a read completing successfully after contacting 2b+1 servers. • Write cost • Number of servers a client needs to contact to write a data item at a confidence level h. • h = probability of success = probability that at least one server from each of b+1 or more columns receives the write.
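The write cost can be explored with a small model. The slides do not give a closed form, so the sketch below assumes that a write is sent to whole rows, that a column "receives the write" once at least one of its contacted servers is non-faulty, and that servers fail independently with probability p. The function names are hypothetical.

```python
from math import comb

def write_confidence(l, b, c, p):
    """P(at least b+1 of c columns reach a non-faulty server) after writing l rows."""
    ok = 1 - p**l  # a column fails only if all l contacted servers are faulty
    return sum(comb(c, i) * ok**i * (1 - ok)**(c - i)
               for i in range(b + 1, c + 1))

def write_cost(h, b, c, r, p):
    """Servers contacted (rows * c) to reach confidence h, or None if r rows do not suffice."""
    for l in range(1, r + 1):
        if write_confidence(l, b, c, p) >= h:
            return l * c
    return None
```

Under these assumptions the cost grows in steps of c servers, so a wider matrix (larger c) makes each additional row of confidence more expensive.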
[Figure: Availability and confidentiality as functions of b for constant c (panels: Availability, Confidentiality).]
[Figure: Access costs as functions of b for constant c (panels: Read cost, Write cost).]
[Figure: Availability and confidentiality as functions of c for constant b (panels: Availability, Confidentiality).]
[Figure: Access costs as functions of c for constant b (panels: Read cost, Write cost).]
[Figure: Availability and confidentiality against threshold value, c = 2b+1.]
[Figure: Access costs against threshold value, c = 2b+1.]
Remarks • When access cost or availability is the most important metric to optimize and confidentiality is not an issue, set r = n, c = 1, b = 0 (pure replication) • When confidentiality is the most important metric to optimize and low performance is acceptable, set r = 1, c = n, b = (c-1)/2 (pure secret-sharing) • Requirements on both security and performance call for a combination of replication and secret-sharing • Example: a target of $1 - 10^{-3.5}$ with an access cost of 22 servers => b = 10, c = 21 • Higher confidentiality => higher access costs and lower availability • Related attacks • Place servers vulnerable to similar attacks in the same column
Future work • Per object customizable security • Intrusion detection and correction • Dynamic inclusion and exclusion of servers • Implementation and experimental evaluation