410 likes | 552 Views
CIS 6930.008: Internet-Scale Networked Systems. Adriana Iamnitchi (Anda) anda@cse.usf.edu. Contact Info. Email : anda@cse.usf.edu Office : ENB 334 Office hours : Wed 2-4 and by appointment (email me) Course page : http://www.csee.usf.edu/~anda/cis6930.008. CIS 6930.008: Course Goals.
E N D
CIS 6930.008: Internet-Scale Networked Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu
Contact Info Email: anda@cse.usf.edu Office: ENB 334 Office hours: Wed 2-4 and by appointment (email me) Course page: http://www.csee.usf.edu/~anda/cis6930.008 CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
CIS 6930.008: Course Goals • Primary • Gain deep understanding of fundamental issues that affect design of large-scale federated distributed systems • Map primary contemporary research themes • Gain experience in distributes systems research • Secondary • By studying a set of outstanding papers, build knowledge of how to present research • Learn how to read papers & evaluate ideas CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
What I’ll Assume You Know • Basic Internet architecture • IP, TCP, DNS, HTTP • Basic principles of distributed computing • Asynchrony (cannot distinguish between communication failures and latency) • Partial global state knowledge (cannot know everything correctly) • Failures happen. In very large systems, even rare failures happen often • If there are things that don’t make sense, ask! CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Examples of Distributed Systems ATT web Gnutella network The Internet A Sensor Network CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Definition (a version) • A distributed system is a collection of autonomous, programmable, failure-prone entities that are able to communicate through a communication medium that is unreliable. • Entity=a process on a device (PC, PDA, mote) • Communication Medium=Wired or wireless network • “Internet-Scale”: – Spanning multiple institutional or network (DNS) domains • (Much) Larger than “cluster” CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
This semester’s Theme (a proposal) Exploiting Emergent Behavior in Large-Scale Distributed Systems CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Filecules and Small Worlds in a Scientific Workload: Characteristics and Significance
Grid: Resource-Sharing Environment • Users: • 1000s from 10s institutions • Well-established communities • Resources: • Computers, data, instruments, storage, applications • Owned/administered by institutions • Applications: data- and compute-intensive processing • Approach: common infrastructure CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
The Problem • We have now: • Mature grid deployments running in production mode • We do not have yet: • Quantitative characterization of real workloads. • How many files, how much input data per process, etc. • And thus, benchmarks, workload models, reproducible results • Costs: • Local solutions, often replicating work • “Temporary” solutions that become permanent • Far from optimal solutions • Impossible to compare alternatives on relevant workloads CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Still, Why Should We Care? • Impossibility results, high costs: Tradeoffs are necessary • Solution: Select tradeoffs based on • User requirements (of course) • Usage patterns • Patterns exist and can be exploited. Examples: • Zipf distribution for request popularity (web caching) Breslau et al., Infocom’99 • Network topology: Partial Topology Random 30% die Targeted 4% die from Saroiu et al., MMCN 2002 CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
The DØ Experiment • High-energy physics data grid • 72 institutions, 18 countries, 500+ physicists • Detector Data • 1,000,000 Channels • Event rate ~50 Hz • So far, 1.9 PB of data • Data Processing • Signals: physics events • Events about 250 KB, stored in files of ~1GB • Every bit of raw data is accessed for processing/filtering • Past year overall: 0.6 PB • DØ: • … processes PBs/year • … processes 10s TB/day • … uses 25% – 50% remote computing CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Filecules and Small Worlds in Scientific Communities: Characteristics and Significance Joint work with Matei Ripeanu (UBC) and Ian Foster (ANL and UChicago)
New metric: The Data-Sharing Graph GmT(V, E): • V is set of users active during interval T • An edge in E connects users that asked for at least m common files within T “Yellow Submarine” “Les Bonbons” “No 24 in B minor, BWV 869” “Les Bonbons” “Yellow Submarine” “Wood Is a Pleasant Thing to Think About” “Wood Is a Pleasant Thing to Think About” CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
# Existing Edges CCoef = # Possible Edges The DØ Collaboration 6 months of traces (January – June 2002) 300+ users, 2 million requests for 200K files Small average path length Small World! Large clustering coefficient CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Small-World Graphs • Small path length, large clustering coefficient • Typically compared against random graphs • Think of: • “It’s a small world!” • “Six degrees of separation” • Milgram’s experiments in the 60s • Guare’s play “Six Degrees of Separation” CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Food web LANL coauthors Film actors Power grid Web Internet Word co-occurrences Other Small Worlds D. J. Watts and S. H. Strogatz, Collective dynamics of small-world networks. Nature, 393:440-442, 1998 R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, R. Modern Physics 74, 47 (2002). CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
300s, 1file 1800s, 10file 7200s, 50files 3600s, 50files 1800s, 100files Web Data-Sharing Graphs Data-Sharing Relationships in the Web, Iamnitchi, Ripeanu, and Foster, WWW’03 CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
DØ Data-Sharing Graphs 28 days, 1 file 7days, 1file CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
KaZaA Data-Sharing Graphs 2 hours 1 file 28 days 1 file 1 day 2 files 4h 2 files 12h 4 files 7day, 1file Small-World File-Sharing Communities, Iamnitchi, Ripeanu, and Foster, Infocom ‘04 CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Interest-Aware Data Dissemination D0 Web Kazaa Interest-Aware Information Dissemination in Small-World Communities, Iamnitchi and Foster, HPDC’05 CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Current Work: Tagging Communities Tracking User Attention in Collaborative Tagging Communities, Elizeu Santos-Neto, Matei Ripeanu, and Adriana Iamnitchi, Workshop on Contextualized Attention Metadata (CAMA'07), Vancouver, Canada, June 2007. CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
DØ Workload Characterization Joint work with Shyamala Doraimani (USF) and Gabriele Garzoglio (FNAL)
DØ Traces • Traces from January 2003 to May 2005 • 234,000 jobs, 561 users, 34 domains, 1.13 million files accessed • 108 input files per job on average • Detailed data access information about half of these jobs (113,062) CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
File size distribution Expected: log-normal. Why not? Deployment decisions Domain specific Data transformation File popularity distribution Expected: Zipf. Why not? (speculations): Scientific data is uniformly interesting User community is relatively small Contradicts Traditional Models CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Filecules: Intuition CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Filecules: General Characteristics Filecules in High-Energy Physics: Characteristics and Impact on Resource Management, Adriana Iamnitchi, Shyamala Doraimani, Gabriele Garzoglio, HPDC’06 CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Filecules: Size Filecules of different sizes: • Largest filecule:17 TB or 51,841 files • 28% mono-file filecules CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Consequences for Caching • Use filecule membership for prefetching • When a file is missing from the local cache, prefetch the entire filecule • Use time locality in cache replacement • Least Recently Used (classic algorithm) • Implemented: • LRU with files and LRU with filecules • Greedy Request Value: prefetching + job reordering • Does not exploit temporal locality • Prefetching based on cache content • Our variant of LRU with filecules and job reordering E. Otoo, et al. Optimal file-bundle caching algorithms for data-grids. In SC ’04 CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Comparison: Caching Algorithms (1) CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Comparison: Caching Algorithms (2) % of cache change is a measure of transfer costs. CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Summary Part 1 • Revisited traditional workload models • Generalized from file systems, the web, etc. • Some confirmed (temporal locality), some infirmed (file size distribution and popularity) • Compared caching algorithms on D0 data: • Temporal locality is relevant • Filecules guide prefetching CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Summary • Workload characterization based on a HEP grid • Quantify scale (data processed, number of files) • Contradict traditional models • Patterns can guide system design • Filecules: caching, data replication • Small world data sharing: adaptive information dissemination, replica placement CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Administravia:Paper Reviewing (1) • Goals: • Think of what you read • Get used to writing paper reviews • Reviews due by noon before class • Be professional in your writing • Have an eye on the writing style: • Clarity • Beware of traps: learn to use them in writing and detect them in reading • Detect (and stay away from) trivial claims. E.g., 1st sentence in the Introduction: “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…” CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Administravia:Paper Reviewing (2) Follow the form provided when relevant. • State the main contribution of the paper • Critique the main contribution: Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two. Rate how convincing the methodology is. • Do the claims and conclusions follow from the experiments? • Are the assumptions realistic? • Are the experiments well designed? • Are there different experiments that would be more convincing? • Are there other alternatives the authors should have considered? • (And, of course, is the paper free of methodological errors?) CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Administravia:Paper Reviewing (3) • What is the most important limitation of the approach? • What are the three strongest and/or most interesting ideas in the paper? • What are the three most striking weaknesses in the paper? • Name three questions that you would like to ask the authors. • Detail an interesting extension to the work not mentioned in the future work section. • Optional comments on the paper that you’d like to see discussed in class. CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Administravia:Discussion leading • Come prepared! • Prepare discussion outline • Prepare questions: • “What if”s • Unclear aspects of the solution proposed • … • Similar ideas in different contexts • Initiate short brainstorming sessions • Leaders do NOT need to submit paper reviews • Main goals: • Keep discussion flowing • Keep discussion relevant • Engage everybody (I’ll have an eye on this, too) CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Administravia:Projects • Combine with your research if relevant to the class • Get approval from all instructors if you overlap final projects: • Don’t sell the same piece of work twice • You can get more than twice as many results with less than twice as much work • Aim high! • Put one extra month and get a publication out of it • It is doable (we have proofs) • Try ideas that you postponed out of fear: it’s just a class, not your PhD. CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Administravia:Project deadlines (tentative) • January 30: 1-page project proposal • Feb. 26: 3-page literature survey • Know relevant work in your problem area • If implementation project, list tools, similar projects • March 31: 5-page Midterm project due • Have a clear image of what’s possible/doable • Report preliminary results • Last class:In-class project presentation • Demo, if appropriate • May 1: • Final report due CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Next Classed • Lectures on basics of distributed systems • Will start reading papers in about 2 weeks CIS6930.008: Internet-Scale Networked Systems (Spring 2008)
Questions? CIS6930.008: Internet-Scale Networked Systems (Spring 2008)