
Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Feifei Li, Florida State University; Ke Yi, Hong Kong University of Science & Technology; Marios Hadjieleftheriou, AT&T Labs Research; George Kollios, Boston University.


Presentation Transcript


  1. Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams. Feifei Li, Florida State University; Ke Yi, Hong Kong University of Science & Technology; Marios Hadjieleftheriou, AT&T Labs Research; George Kollios, Boston University

  2. Outsourced stream model: stock trading monitoring • Provider: a stock broker • Servers (e.g., Bloomberg) • Clients register queries Q with the servers: sliding window queries and/or one shot queries

  3. Data Publishing Model [HIM02] • Owner: publishes data • Servers: host (or monitor) the data and provide query services • Clients: query the owner’s data through servers. H. Hacigumus, B. R. Iyer, and S. Mehrotra, ICDE 2002

  4. Information Security Issues • The third-party (server) cannot be trusted • Lazy server • Malicious intent • Compromised equipment • Unintentional errors (e.g. bugs)

  5. Problem 1: Injection Select * from T where 5<A<11 client owner Returns 7, 8, 9 server

  6. Problem 2: Drop Select * from T where 5<A<11 client owner Returns 7 9 ri+1 server

  7. Query Authentication: Goals • Query Correctness: results do exist in the owner's database • Query Completeness: no records have been omitted from the result

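  8. General Approach • The owner maintains authenticated structures over the data • Servers return the query results together with a Verification Object (VO) • Clients use the VO to verify the query results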

  9. Sliding Window Query • Tuple-based window: SELECT SUM(stock_price) FROM Stock_trace WHERE stock_name = A in last 100 Trades SLIDES every 1 trade • Time-based window: SELECT SUM(stock_price) FROM Stock_trace WHERE stock_name = A in last 5 Minutes SLIDES every 1 minute • Stream tuples: … 2, A 2, B 4, A 9, C 5, A 8, A 7, C 7, B 2, D (figure labels: xt-n, xt-n+1, xt, xt+1; recent n tuples) • This talk concentrates on the tuple-based window; the generalization to time-based windows is in the paper. For a tuple-based window, the timestamp is simply the arrival id of the tuple.
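A minimal sketch of the tuple-based sliding window SUM from this slide, maintained incrementally as the window slides by one tuple. It assumes each stream tuple is a (price, stock name) pair, as the listed tuples suggest; the function name `sliding_sums` and the window size of 4 are illustrative choices, not taken from the talk.

```python
from collections import deque

def sliding_sums(stream, name, n):
    """Return SUM(price) over the last n tuples for stock `name`,
    reported after every arrival (the window slides by one tuple)."""
    window = deque()
    total = 0
    sums = []
    for price, stock in stream:
        window.append((price, stock))
        if stock == name:
            total += price
        if len(window) > n:
            # The oldest tuple falls out of the window.
            old_price, old_stock = window.popleft()
            if old_stock == name:
                total -= old_price
        sums.append(total)
    return sums

# Tuples listed on the slide, read as (price, stock name): an assumption.
trades = [(2, "A"), (2, "B"), (4, "A"), (9, "C"),
          (5, "A"), (8, "A"), (7, "C"), (7, "B")]
print(sliding_sums(trades, "A", 4))
```

This shows only what the query computes; authenticating the answer is the subject of the following slides.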

  10. One Shot Query • Tuple-based window: SELECT SUM(stock_price) FROM Stock_trace WHERE stock_name = A in last 100 Trades • Stream tuples: … 2, A 2, B 4, A 9, C 5, A 8, A 7, C 7, B (figure labels: xt-n, xt; recent n tuples)

  11. Merkle Hash Tree [M89]: Amortizing Signature Cost • Leaves h1 … h8 are the hashes of messages m1 … m8; each internal node hashes its children, e.g., h12 = H(h1|h2), up to the root h1..8 • Owner: σ = Sign(h1..8, SK); verifier: Ver(h1..8, σ, PK) = valid? • Collision resistant hash function: any change in the tree will lead to a different hash value for the root • Digital signature of the root: no one except the owner could produce the signature • Single signature to sign many messages • Hash function is publicly known. R. C. Merkle, CRYPTO 1989
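The root computation on this slide can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it uses SHA-256 from Python's `hashlib` in place of the hash function H, concatenates child hashes as in h12 = H(h1|h2), and duplicates the last node on odd-sized levels (an assumption; the slide only shows 8 leaves). Signing the root is left out.

```python
import hashlib

def H(data: bytes) -> bytes:
    """Collision-resistant hash; SHA-256 is a stand-in choice."""
    return hashlib.sha256(data).digest()

def merkle_root(messages):
    """Hash the messages into leaves, then hash pairs upward to the root."""
    level = [H(m) for m in messages]
    while len(level) > 1:
        if len(level) % 2:
            # Assumption: pad odd levels by repeating the last node.
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

msgs = [f"m{i}".encode() for i in range(1, 9)]   # m1 .. m8
root = merkle_root(msgs)                          # plays the role of h1..8

# Changing any message changes the root, so one signature covers all eight.
tampered = list(msgs)
tampered[2] = b"m3-modified"
print(merkle_root(tampered) != root)
```

Only `root` would need to be signed by the owner; the hash function itself is public.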

  12. Extends to Range Query: f=2 (f is the fanout) • Query q: Select * from T where 5<A<11 • Sorted leaf values: 1 2 3 4 5 6 9 12, with leaf hashes h1 … h8 and the root signed as σ = Sign(h1..8, SK) • LB(q) = 5 and RB(q) = 12 are the boundary values just outside q • VO: 5, 12, h1..4, σ

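  13. Client Side Verification • Query q: Select * from T where 5<A<11 • Query results: 6, 9 • VO: 5, 12, h1..4, σ • Reconstruct the query subtree: compute h5 … h8 from 5, 6, 9, 12, then h56, h78 and h5..8, and combine with h1..4 from the VO (the subtree below h1..4 is unknown to the client) to obtain h1..8 • Check Ver(h1..8, PK, σ): valid?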

  14. Solution Overview • Sign Every Tuple (with query attribute(s) and timestamp) • Expensive update cost for the data provider • Expensive communication cost between server and clients as VO size is large • But it provides timely answers on a per-tuple basis • Amortize the signing cost by “proof-infusing” on a group of tuples: • A delayed response can often be tolerated. • A query with d query attributes is a query in d+1 dimensions. • N: maximum window size; n: window size for a particular query; b: the delay

  15. Tumbling Merkle Tree (TM-tree) • A Merkle binary search tree is built for every b tuples along the time axis • Each tree is signed as Sign(hroot|t1|tb) • ti: timestamp of the ith tuple

  16. TM-tree Continues • Within each group of b tuples on the time axis, sort by the query attribute A • Build the Merkle tree on the sorted tuples
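The tumbling construction described on the last two slides can be sketched as follows: buffer b tuples, sort them by the query attribute A, build a Merkle root, and form the message hroot|t1|tb that the owner would sign. This is a sketch under stated assumptions: SHA-256 stands in for H, the leaf encoding and the names `tumbling_batches` and `merkle_root` are hypothetical, and the actual signing step is omitted.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [H(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])   # assumption: pad odd levels
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def tumbling_batches(stream, b):
    """For every b tuples (timestamp, a) in arrival order, yield
    (message_to_sign, tuples_sorted_by_a)."""
    batch = []
    for t, a in stream:
        batch.append((t, a))
        if len(batch) == b:
            t1, tb = batch[0][0], batch[-1][0]        # boundary timestamps
            by_a = sorted(batch, key=lambda x: x[1])  # sort by attribute A
            leaves = [f"{a}|{t}".encode() for t, a in by_a]
            # The owner would sign this message: hroot | t1 | tb.
            yield merkle_root(leaves) + f"|{t1}|{tb}".encode(), by_a
            batch = []

stream = list(enumerate([5, 3, 9, 1, 7, 2, 8, 6], start=1))
batches = list(tumbling_batches(stream, 4))
print(len(batches), batches[0][1])
```

Tuples in a group that has not yet reached size b stay unsigned, which is where the delay b in the solution overview comes from.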

  17. Sliding window query on the TM-tree • (1) Initialization: query n/b trees • (2) Window slides • (3) Incremental update: query four boundary trees to find the tuples to be added to the results and the tuples to be removed from the results

  18. Query the TM-tree [Figure: query Q in the time-value plane before and after it shifts by b; labels: sent to clients, remove from results, added to results, false positives]

  19. Correctness and Completeness • Correctness: • Guaranteed by each individual Merkle tree • Completeness: • Completeness in each small Merkle tree is guaranteed by what we have studied in the first part of this talk • Overall completeness: • Check that the results returned are obtained by querying consecutive trees that fall within the query range on time dimension and they completely cover the query range on time dimension. • This is possible as two boundary tuples’ timestamps have been signed in each tree (hence these timestamps have to be included in the VO by the server).
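The overall completeness check on the time dimension can be sketched as a small client-side function. It assumes a tuple-based window, so timestamps are consecutive integer arrival ids, and that the client has already verified each tree's signature over hroot|t1|tb; the name `covers_window` and the input layout are hypothetical.

```python
def covers_window(tree_spans, w_start, w_end):
    """tree_spans: (t1, tb) boundary timestamps taken from the signed
    roots of the trees returned in the VO. Returns True when the trees
    are consecutive and together cover [w_start, w_end]."""
    spans = sorted(tree_spans)
    if not spans:
        return False
    # The first and last trees must reach the two ends of the window.
    if spans[0][0] > w_start or spans[-1][1] < w_end:
        return False
    # No tree may be skipped: each tree starts right after the previous one.
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start != prev_end + 1:
            return False
    return True

print(covers_window([(1, 4), (5, 8), (9, 12)], 3, 10))   # all trees present
print(covers_window([(1, 4), (9, 12)], 3, 10))           # middle tree dropped
```

Because t1 and tb are part of what the owner signs, a server cannot alter them to hide a dropped tree.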

  20. Limitation of TM-tree • Only supports one dimensional query • False positives lead to large VO size, especially when each tuple has non-trivial size.

  21. Merkle kd tree (Mkd-tree) • To get rid of false positives: • We need a multi-dimensional indexing structure • KD-tree: an excellent candidate, with bounded query performance and efficient bulk-loading. • A space-partition structure: partition along each dimension in turn.

  22. Mkd-tree and TMkd-tree • Incorporating Merkle tree into KD-tree: • Leaf node: H(p), where p is the point contained in this node • Index node u with children v, w and dividing line lu: H(hv|hw|lu) • Tumbling Merkle kd-tree (TMkd-tree) • Same idea as the TM-tree, but each small tree is an Mkd-tree. • Boundary trees no longer introduce false positives!
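The two hashing rules on this slide, H(p) for a leaf and H(hv|hw|lu) for an index node, can be sketched recursively. This is an illustrative sketch only: it assumes a median split that cycles through the dimensions, SHA-256 as H, and an ad hoc byte encoding of points and dividing lines; none of these details come from the talk.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def mkd_hash(points, depth=0):
    """Return the root hash of a Merkle kd-tree over `points`
    (a non-empty list of equal-length tuples)."""
    if len(points) == 1:
        return H(repr(points[0]).encode())        # leaf node: H(p)
    axis = depth % len(points[0])                 # partition each dimension in turn
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    line = f"{axis}:{pts[mid][axis]}".encode()    # dividing line l_u
    hv = mkd_hash(pts[:mid], depth + 1)
    hw = mkd_hash(pts[mid:], depth + 1)
    return H(hv + hw + line)                      # index node: H(hv|hw|lu)

# Points as (timestamp, value): d = 1 query attribute plus time gives 2 dimensions.
pts = [(1, 5), (2, 9), (4, 1), (7, 3)]
root = mkd_hash(pts)
print(root.hex()[:16])
```

Including the dividing line in each index node's hash lets a client check that a pruned subtree really lies outside the query rectangle.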

  23. Is this good enough? • Tumbling trees are good for maintaining the update to sliding window queries • They both have space linear in N and log b update cost • But they are expensive for answering one-shot queries (or the initialization of sliding window queries) • A query with window size n has to query n/b trees: linear in n, which could be expensive for large values of n.

  24. Dyadic Merkle kd-tree (DMkd-tree): 1D queries [Figure: dyadic levels of trees covering b, 2b, 4b, …, N+b tuples, with query Q; legend: Merkle tree, Mkd-tree, discarded]

  25. Exponential Merkle kd-tree (EMkd-tree): Multi-dimensional queries [Figure: trees T0, T1, …, Tl and T’0, T’1, …, T’l covering b, 2b, 4b, … tuples, with a new T0 and query Q; legend: materialized Mkd-tree, non-materialized Mkd-tree]

  26. Some Experiments • We use real streams: • World Cup Data (WC) • IP traces from the AT&T network (IP) • We perform the following query: • WC: Query attribute is the response size • IP: Query attribute is the packet size • Hardware: • 2.8GHz Intel Pentium 4 CPU • Linux Machine

  27. Tumbling trees: update cost 1. b=1000 is a sweet spot 2. This delay is small: in real streams it spans less than one or two seconds

  28. Tumbling trees: size They both have size linear in the number of tuples covered by the maximal window size N

  29. Query cost per sliding period, b=1,000: fixed sliding period as b Linear scan of TM-tree at leaf level results in locality which greatly improves its performance

  30. VO size per sliding period, b=1,000: fixed sliding period as b TM-tree incurs roughly 4γb false positives

  31. DMkd-tree, EMkd-tree: update cost

  32. DMkd, EMkd trees: size

  33. One Shot Query Cost

  34. One Shot Query: VO size

  35. Summary • All trees support aggregations • TM-tree and DMkd-tree support only one-dimensional queries • TMkd-tree and EMkd-tree support multi-dimensional queries • Tumbling trees are good for maintaining updates to sliding window queries, while DMkd-tree and EMkd-tree are good for one shot queries.

  36. Thanks! • Questions

  37. Intuition on Authenticating Aggregation Query • Naïve solution: answer query q as a range selection query: linear authentication cost k (k tuples in the range)! • Find the canonical cover: authentication cost log k!
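One way to see why the canonical cover gives a logarithmic cost: a range of k consecutive leaf positions can be covered by a small number of aligned dyadic blocks, each corresponding to one subtree whose aggregate can be authenticated as a unit. The sketch below assumes leaves are numbered from 0 in a complete binary tree; `canonical_cover` is a hypothetical helper, not a function from the paper.

```python
def canonical_cover(lo, hi):
    """Cover leaf positions lo..hi (inclusive) with maximal aligned
    dyadic blocks, returned as (start, end) pairs."""
    cover = []
    while lo <= hi:
        size = 1
        # Double the block while it stays aligned and inside the range.
        while lo % (size * 2) == 0 and lo + size * 2 - 1 <= hi:
            size *= 2
        cover.append((lo, lo + size - 1))
        lo += size
    return cover

# Ten leaves (positions 3..12) are covered by four subtrees.
print(canonical_cover(3, 12))
```

With per-subtree aggregates stored in the tree, the server can return one value per block instead of all k tuples.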

  38. Public key digital signature schemes • KeyGen → (SK, PK) • Sender: Sign(m, SK) → σ • The sender sends m and σ over an insecure channel • Recipient: Ver(m, PK, σ) → valid?

  39. Merkle Tree: Verifying A Single Value • Query: SELECT Airline FROM Flights WHERE price = $600 • Apply Merkle tree to database authentication [DGMS03] • Leaves: tuples t1, t2, t3, t4 with prices $250, $320, $410, $600 • Query result: t4 • Verification Object: h3, h12 and σ (sibling hash values along the query path) • The client checks Ver(hroot, σ, PK) = valid? P. Devanbu, M. Gertz, C. Martel, and S. G. Stubblebine, Journal of Computer Security 2003
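The single-value check on this slide can be sketched as follows: the server sends the sibling hashes along the path from the queried leaf to the root, and the client rehashes upward and compares against the root. This is a sketch under assumptions: SHA-256 stands in for H, the number of leaves is a power of two, and the signature check on the root is assumed to have been done separately. The helper names are hypothetical.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, from leaf hashes up to [root]."""
    levels = [[H(x) for x in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([H(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def make_vo(levels, index):
    """Server side: collect (sibling_hash, sibling_is_left) along the path."""
    vo = []
    for level in levels[:-1]:
        sibling = index ^ 1
        vo.append((level[sibling], sibling < index))
        index //= 2
    return vo

def verify(value, vo, trusted_root):
    """Client side: rehash from the leaf to the root using the VO."""
    h = H(value)
    for sibling_hash, sibling_is_left in vo:
        h = H(sibling_hash + h) if sibling_is_left else H(h + sibling_hash)
    return h == trusted_root

prices = [b"$250", b"$320", b"$410", b"$600"]   # t1 .. t4
levels = build_levels(prices)
root = levels[-1][0]
vo = make_vo(levels, 3)                          # proof for t4: h3, then h12
print(verify(b"$600", vo, root), verify(b"$599", vo, root))
```

For t4 the VO holds two hashes, matching the h3 and h12 shown on the slide; in general it holds one hash per tree level.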

  40. Reduce S/C communication Cost [MNT04] • Aggregation Signature: Condensed RSA • Messages m1 … mk with individual signatures σ1 … σk are sent with a single σ = combine(σ1, …, σk) • Overhead: computation cost of modular multiplication with a large modulus, close to 100 μs. E. Mykletun, M. Narasimha, and G. Tsudik, NDSS'04

  41. Condensed RSA [MNT04] • KeyGen: • Choose two large primes, p and q, p ≠ q • Set n = pq • Compute φ(n) = (p-1)(q-1) • Choose e s.t. 1 < e < φ(n) and e is coprime to φ(n) • Compute d s.t. de ≡ 1 (mod φ(n)) • (d, n) is the secret key and (e, n) is the public key

  42. Condensed RSA [MNT04] • Sign: • Given mi, compute hi = H(mi) • Compute σi = hi^d mod n • Compute σ = σ1·σ2·…·σk mod n • Verify: • Given mi, compute hi = H(mi) • Check that: σ^e ≡ h1·h2·…·hk (mod n)
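A toy sketch of Condensed RSA in its standard form: each message is signed as its hash raised to d mod n, the server multiplies the individual signatures into one value, and the client checks that value raised to e against the product of the hashes. The parameters here are assumptions chosen for readability and are far too small to be secure: p and q are the Mersenne primes 2^31-1 and 2^61-1, e is 65537, and SHA-256 reduced mod n stands in for the hash. The code needs Python 3.8+ for `pow(e, -1, phi)` and `math.prod`.

```python
import hashlib
from math import gcd, prod

# KeyGen with toy parameters (NOT secure sizes).
p, q = 2**31 - 1, 2**61 - 1
n = p * q
phi = (p - 1) * (q - 1)
e = 65537
assert gcd(e, phi) == 1
d = pow(e, -1, phi)              # d * e = 1 (mod phi(n))

def h(m: bytes) -> int:
    """Hash a message into Z_n (illustrative encoding)."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n

def sign(m: bytes) -> int:
    return pow(h(m), d, n)       # individual signature sigma_i

def condense(sigs) -> int:
    return prod(sigs) % n        # sigma = product of sigma_i mod n

def verify(msgs, sigma) -> bool:
    return pow(sigma, e, n) == prod(h(m) for m in msgs) % n

msgs = [b"tuple-1", b"tuple-2", b"tuple-3"]
sigma = condense(sign(m) for m in msgs)
print(verify(msgs, sigma), verify(msgs[:2], sigma))
```

The client receives one signature regardless of how many tuples are in the result, at the price of one modular multiplication per tuple on the server.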

  43. DMkd-tree vs. EMkd-tree

  44. Tool 1: Collision-Resistant Hash Functions • Example SHA1: variable input size → 20 bytes (can also plug in any newer replacement) • Observations: • Computation cost: 2-3 μs (for up to 500 bytes input) • Storage cost: 20 bytes • Hard to find a collision: x1 ≠ x2 with H(x1) = H(x2)

  45. Tool 2: Public Key Digital Signature Schemes • Formally defined by [GMR88] • The message has not been changed in any way • The message is indeed from the sender (corresponding to the public key) • No one except the secret key owner could produce a signature • One such scheme: RSA [RSA78] • Observations • Computation cost: about 3-4 ms for signing and more than 100 μs for verifying • Storage cost: 128 bytes. S. Goldwasser, S. Micali, and R. Rivest, SIAM Journal on Computing 1988; R. Rivest, A. Shamir, and L. Adleman, Commun. ACM 1978

  46. Problem 3: Omission Select * from T where 5<A<11 client owner Returns 7,9 Update server

  47. Roadmap • Solution overview • Efficient authentication of sliding window queries when window slides • Efficient authentication of one shot queries (also the sliding window query initialization) • Experiment • Conclusion

  48. Is this good enough? • Tumbling trees are good for maintaining the update to sliding window queries • They both have space linear in N and log b update cost • But they are expensive for answering one-shot queries (or the initialization of sliding window queries) • A query with window size n has to query n/b trees: linear in n, which could be expensive for large values of n.

  49. Query cost per sliding period, b=1,000: fixed query selectivity as 0.1 Query up to 2/b+2 boundary trees

  50. VO size per sliding period, b=1,000: fixed query selectivity as 0.1 TM-tree incurs roughly (2/b+2)b false positives
