SCUD: Scalable Counting of Unique Data

Dmitry Kit, Prince Mahajan, Navendu Jain, Praveen Yalagandula*, Mike Dahlin, and Yin Zhang
Laboratory for Advanced Systems Research, The University of Texas at Austin
* Hewlett-Packard Labs, Palo Alto, CA



Problem Description

Counting information about unique data is an important basic operation in large-scale distributed applications. Several applications need it:
• Distributed storage [e.g., Bamboo/OpenDHT]: top-k users in a storage system grouped by activity type (put stores information, get retrieves information).
• Network monitoring [e.g., heavy hitters]: top-k flows by bytes and packets; MAX/MIN/AVG aggregates over incoming flows in an organization.
• Content distribution networks [e.g., Akamai]: counting the number of unique accesses per web page.
• Telematics applications [e.g., BMW Assist]: counting the number of unique cars per highway.

Challenges

• Scalability: a large number of unique items, and a large number of distributed data sources at which information about these items is updated.
• Existing schemes do not scale:
  • Centralized: high bandwidth cost, large processing delay, high response latency.
  • Hierarchical aggregation: the root node and nodes near the root incur O(# items) message and storage cost.

Our Approach

Push query processing into the network.
• Leverage the SDIMS system:
  • Multiple aggregation trees.
  • A flexible API to install aggregation functions and control how aggregate values propagate.
  • Chaining of aggregation functions.
• Use one aggregation function to aggregate information per attribute. Example: the number of accesses made by a client can be aggregated across multiple DHT trees using a basic SUM aggregation for each client.
• Chain the output of the first aggregation function into a second aggregation (e.g., Top-K). The results of the first aggregation reside on a large number of nodes (e.g., 1,000,000), and chaining combines this distributed information. Example: maintain a top-10 list of heavy-hitter flows by source IP in terms of total bytes sent. (A sketch of this chaining appears at the end of this slide.)
• Top-10 aggregation (figure): leaf nodes report per-source byte counts such as <9.2.5.12, 1.5 MB>, <9.2.5.25, 900 KB>, <9.2.5.23, 800 KB>, <9.2.5.20, 760 KB>, <9.2.5.10, 653 KB>, <9.2.5.11, 400 KB>, <9.2.5.56, 257 KB>, <9.2.5.21, 120 KB>, <9.2.5.24, 100 KB>, which are aggregated up the tree into a sorted top-10 list: <9.2.5.89, 4.2 MB>, <9.2.5.15, 2.2 MB>, <9.2.5.12, 1.5 MB>, <9.2.5.67, 1.3 MB>, …

Demo

• Web demo: perform two operations on the system, put and get; specify the address from which requests originate; view the top-10 list of clients who performed a put or a get.
• Details (a sketch of this request flow appears below):
  • Uses the Bamboo DHT as the storage system.
  • When Bamboo receives a "put" or a "get" request, it notifies SDIMS with the new access count (old_count + 1).
  • SDIMS notifies Bamboo of the aggregated top-10 list of client IPs for "put" and "get" operations.
  • Bamboo updates its local top-10 list; if the list changes, it is inserted into SDIMS under a different aggregation.
  • The lists from each root are combined to form the final top-10 list.
• Demo URL: http://z.cs.utexas.edu/users/dkit/bamboo_test.php

Further Information

URL: http://www.cs.utexas.edu/users/ypraveen/sdims
Email: sdims@cs.utexas.edu
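The SUM-then-Top-K chaining described under Our Approach can be illustrated with a small, self-contained sketch. This is not the SDIMS API: the functions below are hypothetical stand-ins for installed aggregation functions, assuming each tree node receives its children's values as plain Python objects.

```python
import heapq
from collections import Counter

# Aggregation 1: SUM -- at each tree node, add up the per-client access
# counts reported by the node's children (each child value is a Counter
# mapping client IP -> count).
def sum_aggregate(child_values):
    total = Counter()
    for counts in child_values:
        total.update(counts)
    return total

# Aggregation 2: Top-K -- chained on the output of the SUM aggregation;
# it keeps only the K heaviest clients, so nodes near the root never hold
# state proportional to the number of unique clients.
def topk_aggregate(child_values, k=10):
    merged = sum_aggregate(child_values)
    top = heapq.nlargest(k, merged.items(), key=lambda pair: pair[1])
    return Counter(dict(top))

# Example: three leaf nodes each report local per-client "get" counts.
leaf_reports = [
    Counter({"128.83.144.30": 3, "128.83.120.172": 1}),
    Counter({"128.83.144.30": 2, "128.83.120.138": 6}),
    Counter({"128.83.120.245": 4}),
]
print(topk_aggregate(leaf_reports, k=2))
# Counter({'128.83.120.138': 6, '128.83.144.30': 5})
```

Because the chained Top-K truncates at every level of the tree, the per-node message and storage cost stays proportional to k rather than to the number of unique items, which is exactly the cost the Challenges section attributes to plain hierarchical aggregation.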

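The demo's request flow (a storage node serves a put or get, reports the new access count, and keeps a local copy of the aggregated top-10 list) might look roughly like the hook below. The sdims_update and sdims_read callables are hypothetical placeholders; the poster does not spell out the actual Bamboo or SDIMS interfaces.

```python
# Hypothetical per-node hook for the demo flow; sdims_update / sdims_read
# stand in for the (unspecified) calls into SDIMS.
access_counts = {}   # local (operation, client IP) -> count on this node
local_top10 = {}     # last aggregated top-10 list seen, per operation

def on_request(op, client_ip, sdims_update, sdims_read):
    """Called by the storage node each time it serves a "put" or a "get"."""
    # 1. Bump the local access count and report it (old_count + 1) to SDIMS.
    key = (op, client_ip)
    access_counts[key] = access_counts.get(key, 0) + 1
    sdims_update(("count", op, client_ip), access_counts[key])

    # 2. Read back the aggregated top-10 client list for this operation type.
    new_top10 = sdims_read(("top10", op))

    # 3. If the list changed, insert it under a different aggregation so the
    #    lists from each tree root can be combined into the final top-10.
    if new_top10 != local_top10.get(op):
        local_top10[op] = new_top10
        sdims_update(("top10-merge", op), new_top10)
```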
Chained aggregation (figure): sensor data arrives as <IP1, count1>, <IP2, count2>, <IP3, count3>, …, <IPn, countn> pairs; Aggregation 1 (SUM) totals the counts per IP, and Aggregation 2 (Top-K), chained on its output, produces the top-K list of <IP, count> pairs.

Telematics example (figure): an aggregation tree (with MIN/MAX/AVG aggregations) over cars/minute on the roads in Houston, Dallas, and San Antonio identifies the major roads for each city (roads with traffic greater than some threshold) and, at the root of the tree, the major roads in Texas.

Demo snapshot (estimated network size: 18 nodes):

Top 10 Gets by client:
Client IP          Number of Gets
127.0.0.1                    1182
128.83.144.30                 243
128.83.120.172                168
128.83.120.245                147
128.83.120.138                120
128.83.120.21                  90
128.83.144.43                  75
128.83.130.11                  48
128.83.120.114                 27
128.83.144.241                 12

Top 10 Puts by client:
Client IP          Number of Puts
127.0.0.1                    1100
128.83.120.138                180
128.83.144.30                 144
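The tables above come from combining the top-10 lists held at the roots of the different aggregation trees. Below is a minimal sketch of that final merge, assuming each client IP is aggregated in exactly one tree (so the merge is a union followed by taking the top k); the example counts reuse values from the tables purely for illustration.

```python
import heapq

def merge_root_lists(root_lists, k=10):
    """Combine the per-tree top-k lists into one final top-k list.

    Each entry in root_lists is the list of (client_ip, count) pairs held at
    the root of one aggregation tree; assuming each client is counted in a
    single tree, the merge is a union followed by taking the overall top k.
    """
    combined = [pair for root_list in root_lists for pair in root_list]
    return heapq.nlargest(k, combined, key=lambda pair: pair[1])

# Example with two roots holding disjoint sets of clients:
print(merge_root_lists([
    [("127.0.0.1", 1182), ("128.83.120.172", 168)],
    [("128.83.144.30", 243), ("128.83.120.245", 147)],
], k=3))
# [('127.0.0.1', 1182), ('128.83.144.30', 243), ('128.83.120.172', 168)]
```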
