
A Scalable Information Management Middleware for Large Distributed Systems

Presentation Transcript


  1. A Scalable Information Management Middleware for Large Distributed Systems Praveen Yalagandula HP Labs, Palo Alto Mike Dahlin, The University of Texas at Austin

  2. Trends • Large wide-area networked systems • Enterprise networks • IBM • 170 countries • > 330000 employees • Computational Grids • NCSA Teragrid • 10 partners and growing • 100-1000 nodes per site • Sensor networks • Navy Automated Maintenance Environment • About 300 ships in US Navy • 200,000 sensors in a destroyer [3eti.com]

  9. Research Vision • Wide-area Distributed Operating System • Goals: • Ease building applications • Utilize resources efficiently • [Figure: the OS provides services such as Information Management, Data Management, Security, Monitoring, Scheduling, …]

  10. Information Management • Most large-scale distributed applications monitor, query, and react to changes in the system • Examples: job scheduling, system administration and management, service location, sensor monitoring and control, file location service, multicast service, naming and request routing, … • A general information management middleware • Eases design and development • Avoids repetition of the same task by different applications • Provides a framework to explore tradeoffs • Optimizes system performance

  11. Contributions – SDIMS Scalable Distributed Information Management System • Meets key requirements • Scalability • Scale with both nodes and information to be managed • Flexibility • Enable applications to control the aggregation • Autonomy • Enable administrators to control flow of information • Robustness • Handle failures gracefully

  12. SDIMS in Brief • Scalability • Hierarchical aggregation • Multiple aggregation trees • Flexibility • Separate mechanism from policy • API for applications to choose a policy • A self-tuning aggregation mechanism • Autonomy • Preserve organizational structure in all aggregation trees • Robustness • Default lazy re-aggregation upon failures • On-demand fast re-aggregation

  13. Outline • SDIMS: a general information management middleware • Aggregation abstraction • SDIMS Design • Scalability with machines and attributes • Flexibility to accommodate various applications • Autonomy to respect administrative structure • Robustness to failures • Experimental results • SDIMS in other projects • Conclusions and future research directions

  15. Attributes • Information at machines • Machine status information • File information • Multicast subscription information • ……

  16. Aggregation Function • Defined for an attribute • Given values for a set of nodes • Computes aggregate value • Examples • Total users logged in the system • Attribute – numUsers • Aggregation function – summation
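As a rough illustration (in Python, not the SDIMS interface itself), the numUsers example above could use a plain summation over the values reported by a virtual node's children:

    # Sketch of an aggregation function for a 'numUsers'-style attribute:
    # given the values reported by a virtual node's children, return their sum.
    def sum_users(child_values):
        return sum(v for v in child_values if v is not None)

    # e.g. three machines report 2, 0, and 5 logged-in users
    print(sum_users([2, 0, 5]))   # -> 7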

  17. Aggregation Trees • Aggregation tree • Physical machines are leaves • Each virtual node represents a logical group of machines • Administrative domains • Groups within domains • Each virtual node is simulated by some machines • Aggregation function f for attribute A • Computes the aggregated value A_i for a level-i subtree • A_0 = value stored locally at the physical node, or NULL • A_i = f(A_{i-1}^0, A_{i-1}^1, …, A_{i-1}^{k-1}) for a virtual node with k children • [Figure: four machines a, b, c, d; the two level-1 virtual nodes hold A_1 = f(a,b) and f(c,d); the root holds A_2 = f(f(a,b), f(c,d))]
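A minimal sketch of the bottom-up computation defined above, assuming a subtree is represented as nested lists with leaf values at the physical machines (the representation and the helper name aggregate are illustrative only):

    # Compute the aggregate A_i of a subtree by applying the aggregation
    # function f to the children's aggregates; a leaf holds A_0, the value
    # stored locally at that physical machine (or None).
    def aggregate(node, f):
        if not isinstance(node, list):                  # leaf = physical machine
            return node                                 # A_0
        return f([aggregate(child, f) for child in node])

    # Root value f(f(a,b), f(c,d)) for leaves a, b, c, d, with f = min
    tree = [[0.3, 0.6], [0.1, 0.7]]
    print(aggregate(tree, min))                         # -> 0.1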

  18. Example Queries • Job scheduling system • Find the least loaded machine • Find a (nearby) machine with load < 0.5 • File location system • Locate a (nearby) machine with file “foo”

  19. Example – Machine Loads • Attribute: "minLoad" • Value at a machine M with load L is (M, L) • Aggregation function: MIN_LOAD(set of tuples) • [Figure: minLoad aggregation tree with leaves (A, 0.3), (B, 0.6), (C, 0.1), (D, 0.7); level-1 values (A, 0.3) and (C, 0.1); root value (C, 0.1)]

  20. Example – Machine Loads • Query: Tell me the least loaded machine. • Attribute: "minLoad" • Value at a machine M with load L is (M, L) • Aggregation function: MIN_LOAD(set of tuples) • [Figure: same minLoad tree as above; the root value (C, 0.1) answers the query]

  21. Example – Machine Loads • Query: Tell me a (nearby) machine with load < 0.5. • Attribute: "minLoad" • Value at a machine M with load L is (M, L) • Aggregation function: MIN_LOAD(set of tuples) • [Figure: same minLoad tree as above; the query can be answered from the nearest subtree whose aggregate load is below 0.5]
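A possible shape for the MIN_LOAD function over (machine, load) tuples; the exact signature used by SDIMS may differ, so treat this as a sketch:

    # MIN_LOAD sketch: keep the (machine, load) tuple with the smallest load
    # among the values reported by the children.
    def min_load(tuples):
        tuples = [t for t in tuples if t is not None]
        return min(tuples, key=lambda t: t[1]) if tuples else None

    print(min_load([("A", 0.3), ("B", 0.6)]))              # -> ('A', 0.3)
    print(min_load([("A", 0.3), ("C", 0.1), ("D", 0.7)]))  # -> ('C', 0.1)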

  22. Example – File Location • Attribute: "fileFoo" • Value at a machine with id machineId • machineId if file "Foo" exists on the machine • null otherwise • Aggregation function: SELECT_ONE(set of machine ids) • [Figure: fileFoo aggregation tree; machines B and C hold the file, the other leaves are null; each internal node selects one non-null id, and the root value is B]

  23. Example – File Location • Query: Tell me a (nearby) machine with file "Foo". • Attribute: "fileFoo" • Value at a machine with id machineId • machineId if file "Foo" exists on the machine • null otherwise • Aggregation function: SELECT_ONE(set of machine ids) • [Figure: same fileFoo tree as above; the query is answered from the nearest subtree with a non-null aggregate]
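SELECT_ONE only has to return some non-null child value; a hedged sketch (a real implementation might prefer the nearest machine rather than simply the first):

    # SELECT_ONE sketch: return any one non-null machine id from the children.
    def select_one(machine_ids):
        for mid in machine_ids:
            if mid is not None:
                return mid
        return None

    print(select_one([None, "B"]))   # -> 'B'
    print(select_one(["C", None]))   # -> 'C'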

  24. Outline • SDIMS: a general information management middleware • Aggregation abstraction • SDIMS Design • Scalability with machines and attributes • Flexibility to accommodate various applications • Autonomy to respect administrative structure • Robustness to failures • Experimental results • SDIMS in other projects • Conclusions and future research directions

  25. Scalability • To be a basic building block, SDIMS should support • Large number of machines (> 10^4) • Enterprise and global-scale services • Applications with a large number of attributes (> 10^6) • File location system: each file is an attribute → large number of attributes

  26. Scalability Challenge • Single tree for aggregation • Astrolabe, SOMO, Ganglia, etc. • Limited scalability with attributes • Example: file location • [Figure: one aggregation tree carrying every file attribute; leaf nodes hold subsets such as f1, f2 and f4, f5, while the root must aggregate all of f1, f2, …, f7]

  27. Scalability Challenge • Single tree for aggregation • Astrolabe, SOMO, Ganglia, etc. • Limited scalability with attributes • Example: file location • Instead: automatically build multiple trees for aggregation • Aggregate different attributes along different trees • [Figure: same single-tree example as above]

  28. Building Aggregation Trees • Leverage Distributed Hash Tables • A DHT can be viewed as multiple aggregation trees • Distributed Hash Tables (DHT) • Supports hash table interfaces • put (key, value): inserts value for key • get (key): returns values associated with key • Buckets for keys distributed among machines • Several algorithms with different properties • PRR, Pastry, Tapestry, CAN, CHORD, SkipNet, etc. • Load-balancing, robustness, etc.
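A toy, single-process stand-in for the put/get interface mentioned above (a real DHT such as Pastry or Chord partitions the buckets across machines, which this sketch omits):

    # Toy DHT: the hash-table interface, with all buckets held locally.
    class ToyDHT:
        def __init__(self):
            self.buckets = {}

        def put(self, key, value):
            self.buckets.setdefault(key, []).append(value)

        def get(self, key):
            return self.buckets.get(key, [])

    dht = ToyDHT()
    dht.put("fileFoo", "machineB")
    print(dht.get("fileFoo"))   # -> ['machineB']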

  29. DHT - Overview • Machine IDs and keys: long bit vectors • Owner of a key = machine with ID closest to the key • Bit correction for routing • Each machine keeps O(log n) neighbors • [Figure: eight machines with 5-bit IDs (00001, 00110, 01001, 01100, 10010, 10111, 11000, 11101); get(11111) is routed by bit correction to the owner, 11101]
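A rough sketch of the ownership rule, assuming "closest" means longest shared prefix (ties broken arbitrarily); real DHTs refine this with per-hop bit correction and O(log n) routing tables:

    # Return the machine whose ID shares the longest prefix with the key.
    def common_prefix_len(a, b):
        n = 0
        while n < len(a) and a[n] == b[n]:
            n += 1
        return n

    def owner(key, machine_ids):
        return max(machine_ids, key=lambda m: common_prefix_len(m, key))

    machines = ["00001", "00110", "01001", "01100",
                "10010", "10111", "11000", "11101"]
    print(owner("11111", machines))   # -> '11101'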

  30. DHT Trees as Aggregation Trees • [Figure: the routes from all machines (000 … 111) toward key 11111 form a tree; leaves are the physical machines and the internal virtual nodes are 1xx, 11x, and 111]

  31. DHT Trees as Aggregation Trees • Mapping from virtual nodes to real machines • [Figure: same tree for key 11111; each virtual node (1xx, 11x, 111) is simulated by one of the real machines in its subtree]

  32. DHT Trees as Aggregation Trees • [Figure: two aggregation trees over the same machines, one for key 11111 (virtual nodes 1xx, 11x, 111) and one for key 00010 (virtual nodes 0xx, 00x, 000)]

  33. DHT Trees as Aggregation Trees • Aggregate different attributes along different trees • hash("minLoad") = 00010 → aggregate minLoad along the tree for key 00010 • [Figure: the trees for keys 11111 and 00010, as above]

  34. Scalability • Challenge: • Scale with both machines and attributes • Our approach • Build multiple aggregation trees • Leverage well-studied DHT algorithms • Load-balancing • Self-organizing • Locality • Aggregate different attributes along different trees • Aggregate attribute A along the tree for key = hash(A)
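The attribute-to-tree mapping can be shown in a few lines; the hash function and key length here are arbitrary choices for illustration, not the ones SDIMS uses:

    import hashlib

    # Derive a (here, 5-bit) aggregation-tree key by hashing the attribute
    # name; different attributes map to different trees, so no single tree
    # has to carry every attribute.
    def tree_key(attribute, bits=5):
        digest = hashlib.sha1(attribute.encode()).digest()
        value = int.from_bytes(digest[:2], "big") >> (16 - bits)
        return format(value, "0{}b".format(bits))

    print(tree_key("minLoad"))   # a 5-bit key such as '00010' (actual value depends on the hash)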

  35. Outline • SDIMS: a general information management middleware • Aggregation abstraction • SDIMS Design • Scalability with machines and attributes • Flexibility to accommodate various applications • Autonomy to respect administrative structure • Robustness to failures • Experimental results • SDIMS in other projects • Conclusions and future research directions

  36. Flexibility Challenge • When to aggregate? On reads, or on writes? • Attributes have different read-write ratios • [Figure: attributes such as File Location, Total Mem, and CPU Load sit at different points on a read-write-ratio spectrum running from #reads >> #writes to #writes >> #reads; the best policy shifts accordingly, from aggregating on writes, through partial aggregation on writes, to aggregating on reads; existing systems (Astrolabe, Ganglia, DHT-based systems, Sophia, MDS-2) are shown at fixed points on this spectrum]

  37. Flexibility Challenge • When to aggregate? On reads, or on writes? • Attributes have different read-write ratios • Single framework – separate mechanism from policy → allow applications to choose any policy → provide a self-tuning mechanism • [Figure: same read-write-ratio spectrum as above]

  38. API Exposed to Applications • Install • Update • Probe • Install: an aggregation function for an attribute • Function is propagated to all nodes • Arguments up and down specify an aggregation policy • Update: the value of a particular attribute • Aggregation performed according to the chosen policy • Probe: for an aggregated value at some level • If required, aggregation is done • Two modes: one-shot and continuous
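To make the call sequence concrete, here is a hedged, single-node stand-in for the three calls; the argument names (up, down, level, mode) follow the slide, but the exact signatures are assumptions, and the DHT-wide propagation is omitted:

    # Minimal single-node mock of the Install / Update / Probe interface.
    class MiniSDIMS:
        def __init__(self):
            self.funcs, self.policies, self.values = {}, {}, {}

        def install(self, attribute, function, up="all", down=0):
            self.funcs[attribute] = function        # would propagate to all nodes
            self.policies[attribute] = (up, down)   # aggregation policy knobs

        def update(self, attribute, node, value):
            self.values.setdefault(attribute, {})[node] = value

        def probe(self, attribute, level="root", mode="one-shot"):
            return self.funcs[attribute](list(self.values[attribute].values()))

    def min_load(ts):
        ts = [t for t in ts if t is not None]
        return min(ts, key=lambda t: t[1]) if ts else None

    sdims = MiniSDIMS()
    sdims.install("minLoad", min_load, up="all", down=0)   # Update-Up-style policy
    sdims.update("minLoad", node="A", value=("A", 0.3))
    sdims.update("minLoad", node="C", value=("C", 0.1))
    print(sdims.probe("minLoad"))                          # -> ('C', 0.1)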

  39. Flexibility • Policy settings: • Update-Local: Up=0, Down=0 • Update-Up: Up=all, Down=0 • Update-All: Up=all, Down=all

  43. Self-tuning Aggregation • Some apps can forecast their read-write rates • What about others? • Cannot or do not want to specify • Spatial heterogeneity • Temporal heterogeneity • Shruti: dynamically tunes aggregation • Keeps track of read and write patterns

  44. Shruti – Dynamic Adaptation • [Figure: an aggregation tree with nodes labeled R and A, operating under the Update-Up policy (Up=all, Down=0)]

  45. Shruti – Dynamic Adaptation • A lease-based mechanism • Any updates are forwarded until the lease is relinquished • [Figure: the same tree as above (Update-Up, Up=all, Down=0), now with a lease set up between the nodes labeled R and A]

  46. Shruti – In Brief • On each node • Tracks updates and probes • Both local and from neighbors • Sets and removes leases • Grants a lease to a neighbor A • When it gets k probes from A while no updates happen • Relinquishes the lease from a neighbor A • When it gets m updates from A while no probes happen
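A rough per-neighbor sketch of that bookkeeping; the thresholds k and m and the counter-reset rule are assumptions made only for illustration:

    # Shruti-style lease bookkeeping for one neighbor A, as seen by one node.
    class LeaseBookkeeping:
        def __init__(self, k=3, m=3):
            self.k, self.m = k, m          # k probes to grant, m updates to relinquish
            self.probes_since_update = 0
            self.updates_since_probe = 0

        def on_probe_from_A(self):
            self.probes_since_update += 1
            self.updates_since_probe = 0
            # Grant A a lease after k probes with no intervening updates.
            return self.probes_since_update >= self.k

        def on_update_from_A(self):
            self.updates_since_probe += 1
            self.probes_since_update = 0
            # Give up the lease held from A after m updates with no probes.
            return self.updates_since_probe >= self.m

    b = LeaseBookkeeping(k=2, m=2)
    print(b.on_probe_from_A(), b.on_probe_from_A())   # -> False True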

  47. Flexibility • Challenge • Support applications with different read-write behavior • Our approach • Separate mechanism from policy • Let applications specify an aggregation policy • Up and Down knobs in the Install interface • Provide a lease-based self-tuning aggregation strategy

  48. Outline • SDIMS: a general information management middleware • Aggregation abstraction • SDIMS Design • Scalability with machines and attributes • Flexibility to accommodate various applications • Autonomy to respect administrative structure • Robustness to failures • Experimental results • SDIMS in other projects • Conclusions and future research directions

  49. Administrative Autonomy • Systems spanning multiple administrative domains • Allow a domain administrator to control information flow • Prevent an external observer from observing the domain's information • Prevent external failures from affecting operations inside the domain • Challenge: DHT trees might not conform to the administrative domain boundaries • [Figure: machines A, B, C, D in different domains; a plain DHT tree can route one domain's data through a machine outside that domain]

  50. Administrative Autonomy • Our approach: Autonomous DHTs • Two properties • Path locality • Path convergence • Together these ensure that the virtual nodes aggregating a domain's data are hosted on machines in that domain • [Figure: machines A, B, C, D; with an Autonomous DHT, each domain's aggregation stays within the domain]
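As a loose illustration of the path-locality idea only (not the actual Autonomous DHT algorithm), next-hop selection can prefer in-domain neighbors whenever one of them still makes progress toward the key:

    # Prefer an in-domain neighbor that increases the shared prefix with the
    # key, so routes between machines in a domain tend to stay in the domain.
    def common_prefix_len(a, b):
        n = 0
        while n < len(a) and a[n] == b[n]:
            n += 1
        return n

    def next_hop(key, current_id, neighbors, domain_of, local_domain):
        progress = common_prefix_len(current_id, key)
        candidates = [n for n in neighbors if common_prefix_len(n, key) > progress]
        in_domain = [n for n in candidates if domain_of[n] == local_domain]
        pool = in_domain or candidates
        return max(pool, key=lambda n: common_prefix_len(n, key)) if pool else None

    domains = {"0100": "A", "0111": "A", "1100": "B"}
    print(next_hop("0110", "0100", ["0111", "1100"], domains, "A"))   # -> '0111'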
