
Cloud Computing Research at TU Delft (2008–ongoing)



Presentation Transcript


  1. Cloud Computing Research at TU Delft (2008–ongoing). Parallel and Distributed Systems Group, Delft University of Technology, The Netherlands. 3TU. Our team: Undergrad: Gargi Prasad, Arnoud Bakker, Nassos Antoniou, Thomas de Ruiter, … Grad: Siqi Shen, Nezih Yigitbasi, Ozan Sonmez. Staff: Henk Sips, Dick Epema, Alexandru Iosup. Collaborators: Ion Stoica and the Mesos team (UC Berkeley), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), Derrick Kondo, Emmanuel Jeannot (INRIA), ...

  2. TUD Team: 2 Staff, 2+3 PhD, n MSc, ... Our team: Undergrad: Adrian Lascateu, Alexandru Dimitriu (UPB, Romania), … Grad: Vlad Nae (U. Innsbruck, Austria), Siqi Shen, Nezih Yigitbasi (TU Delft, the Netherlands), … Staff: Alexandru Iosup, Dick Epema, Henk Sips (TU Delft), Thomas Fahringer, Radu Prodan (U. Innsbruck), Nicolae Tapus, Mihaela Balint, Vlad Posea (UPB), etc.

  3. What is Cloud Computing? “The path to abundance”: on-demand capacity; pay what you use; great for web apps (EIP, web crawl, DB ops, I/O). VS “The killer cyclone”: • Not-so-great performance for scientific applications [1] • Long-term performance variability [2] • How to manage? Images: Tropical Cyclone Nargis (NASA, ISSS, 04/29/08); http://www.flickr.com/photos/dimitrisotiropoulos/4204766418/. [1] Iosup et al., Performance Analysis of Cloud Computing Services for Many Tasks Scientific Computing, IEEE TPDS, 2011. [2] Iosup et al., On the Performance Variability of Production Cloud Services, CCGrid 2011. Cloud Futures Workshop 2010 – Cloud Computing Support for Massively Social Gaming

  4. What do We Want from Clouds? Good IaaS, PaaS, SaaS • Portability (Virtualisation, no vendor lock-in) • Accountability (lease what you use) • … for eScience • … for Massively Social Gaming Good resource management • Elasticity • Reliability • Efficiency (Scheduling) • Data-aware mechanisms • Being “green”? Performance evaluation (What is “Good”?)

  5. Agenda • Introduction • Cloud Performance Studies • The Cloud Workloads Archive • Massivizing Online Social Games using Clouds • Platform Challenge • Content Challenge • Analytics Challenge • Other Cloud Activities at TUD • Take-Home Message

  6. Cloud Performance Studies • Many-Tasks Scientific Computing • Quantitative definition: J jobs and B bags-of-tasks • Extracted proto-MT users from grid and parallel production environments • Performance Evaluation of Four Commercial Clouds • Amazon EC2, GoGrid, Elastic Hosts, Mosso • Resource acquisition, Single- and Multi-Instance benchmarking • Low compute and networking performance • Clouds vs Other Environments • Order of magnitude better performance needed for clouds • Clouds already good for short-term, deadline-driven scientific computing 1- Iosup et al., Performance Analysis of Cloud Computing Services for Many Tasks Scientific Computing, IEEE TPDS, 2011 (in print) http://www.st.ewi.tudelft.nl/~iosup/cloud-perf10tpds_in-print.pdf 2- Iosup et al., On the Performance Variability of Production Cloud Services, CCGrid 2011, pds.twi.tudelft.nl/reports/2010/PDS-2010-002.pdf

  7. Performance Evaluation of Clouds [1/3]: Tools: C-Meter. Yigitbasi et al., C-Meter: A Framework for Performance Analysis of Computing Clouds, Proc. of CCGRID 2009.

  8. Performance Evaluation of Clouds [2/3]: Low Performance for Sci.Comp. • Evaluated the performance of resources from four production, commercial IaaS clouds: Amazon Elastic Compute Cloud (EC2), Mosso, Elastic Hosts, and GoGrid • GrenchMark for evaluating the performance of cloud resources • C-Meter for complex workloads • Finding: cloud performance is low for scientific computing. S. Ostermann, A. Iosup, N. Yigitbasi, R. Prodan, T. Fahringer, and D. Epema, A Performance Analysis of EC2 Cloud Computing Services for Scientific Computing, Cloudcomp 2009, LNICST 34, pp. 115–131, 2010.

  9. Performance Evaluation of Clouds [3/3]: Cloud Performance Variability • Long-term performance variability of production cloud services • IaaS: Amazon Web Services • PaaS: Google App Engine • Year-long performance information for nine services • Finding: about half of the cloud services investigated in this work exhibit yearly and daily patterns; the impact of performance variability depends on the application. Figure: Amazon S3, GET US HI operations. A. Iosup, N. Yigitbasi, and D. Epema, On the Performance Variability of Production Cloud Services, CCGrid 2011.

  10. Agenda • Introduction • Cloud Performance Studies • The Cloud Workloads Archive • Massivizing Online Social Games using Clouds • Platform Challenge • Content Challenge • Analytics Challenge • Other Cloud Activities at TUD • Take-Home Message

  11. Traces: Sine Qua Non in Comp.Sys.Res. • “My system/method/algorithm is better than yours (on my carefully crafted workload)” • Unrealistic (trivial): Prove that “prioritize jobs from users whose name starts with A” is a good scheduling policy • Realistic? “85% jobs are short”; “10% Writes”; ... • Major problem in Computer Systems research • Workload Trace = recording of real activity from a (real) system, often as a sequence of jobs / requests submitted by users for execution • Main use: compare and cross-validate new job and resource management techniques and algorithms • Major problem: real workload traces from several sources. August 26, 2010

  12. The Cloud Workloads Archive (CWA): What’s in a Name? CWA = Public collection of cloud/data center workload traces and of tools to process these traces; allows us to: • Compare and cross-validate new job and resource management techniques and algorithms, across various workload traces • Determine which (part of a) trace is most interesting for a specific job and resource management technique or algorithm • Design a general model for data center workloads, and validate it with various real workload traces • Evaluate the generality of a particular workload trace, to determine if results are biased towards a particular trace • Analyze the evolution of workload characteristics across long timescales, both intra- and inter-trace

  13. One Format Fits Them All: CWJ, CWJD, CWT, CWTD. A. Iosup, R. Griffith, A. Konwinski, M. Zaharia, A. Ghodsi, I. Stoica, Data Format for the Cloud Workloads Archive, v.3, 13/07/10 • Flat format • Jobs and Tasks • Summary (20 unique data fields) and Detail (60 fields) • Categories of information • Shared with GWA, PWA: Time, Disk, Memory, Net • Jobs/Tasks that change resource consumption profile • MapReduce-specific (two-thirds of the data fields)

  14. CWA Contents: Large-Scale Workloads. Table columns: Trace ID | System | Size (J/T/Obs) | Period | Notes. Complete rows: CWA-04 | Facebook 3 | ?/?/- | 10d/01-2010 | Full detail. CWA-09? | Google | 9K/177K/4M | 7h/2009 | Coarse, Period. Other entries, listed column by column (row alignment not recoverable): Trace IDs CWA-03, CWA-07, CWA-05, CWA-01; Systems Facebook 2, Facebook 4, Facebook, eBay; Sizes 1.1M/-/-, ?/?/-, 61K/10M/-; Periods 5m/2009, 3m/02+2010, 10d/2009; Notes Full detail, 23 Sep 2010, Full detail, Time & IO. Trace IDs CWA-06, CWA-02, CWA-08; Systems Yahoo M, Google 2, Twitter; Size 28K/28M/-; Period 20d/2009; Notes Need help!, 25 Aug 2010, ~Full detail. • Tools • Convert to CWA format • Analyze and model automatically → Report

  15. The Cloud Workloads Archive
      Trace ID   Total IO [MB]   Rd. [MB]   Wr [%]   HDFS Wr [MB]
      CWA-01     10,934          6,805      38%      1,538
      CWA-02     75,546          47,539     37%      8,563
      • Looking for invariants • Wr [%] ~40% of Total IO, but absolute values vary • # Tasks/Job and the ratio M:(M+R) Tasks vary • Understanding workload evolution
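The write-share invariant on this slide can be checked with a few lines of arithmetic. The sketch below assumes that Wr [%] is computed as (Total IO − Rd.) / Total IO, which the slide does not state; under that assumption the two table rows reproduce the reported 38% and 37%. The function and dictionary names are illustrative only.

```python
# Values copied from the slide's table (MB).
traces = {
    "CWA-01": {"total_io_mb": 10_934, "read_mb": 6_805},
    "CWA-02": {"total_io_mb": 75_546, "read_mb": 47_539},
}


def write_share_percent(total_io_mb, read_mb):
    """Writes as a percentage of total I/O.

    Assumption (not stated on the slide): Total IO = Reads + Writes.
    """
    return 100.0 * (total_io_mb - read_mb) / total_io_mb


write_shares = {
    trace_id: round(write_share_percent(**row))
    for trace_id, row in traces.items()
}
print(write_shares)  # {'CWA-01': 38, 'CWA-02': 37}
```

Both traces land near the same write share even though their absolute I/O volumes differ by roughly a factor of seven, which is the kind of invariant the slide is looking for.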

  16. Agenda • Introduction • Cloud Performance Studies • The Cloud Workloads Archive • Massivizing Online Social Games using Clouds • Platform Challenge • Content Challenge • Analytics Challenge • Other Cloud Activities at TUD • Take-Home Message

  17. What’s in a name? MSG, MMOG, MMO, … 250,000,000 active players, 3BN hours/week world-wide. Massively Social Gaming = (online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience • Virtual world: explore, do, learn, socialize, compete+ • Content: graphics, maps, puzzles, quests, culture+ • Game data: player stats and relationships. Image: Romeo and Juliet

  18. FarmVille, a Massively Social Game. Sources: CNN, Zynga. Source: InsideSocialGames.com

  19. MSGs are a Popular, Growing Market • 25,000,000 subscribed players (from 250,000,000+ active) • Over 10,000 MSGs in operation • Subscription market size $7.5B+/year, Zynga $600M+/year. Sources: MMOGChart, own research. Sources: ESA, MPAA, RIAA.

  20. Massivizing Games using Clouds • (Platform Challenge) • Build MSG platform that uses (mostly) cloud resources • Close to players • No upfront costs, no maintenance • Compute platforms: multi-cores, GPUs, clusters, all-in-one! Nae, Iosup, Prodan, Dynamic Resource Provisioning in Massively Multiplayer Online Games, IEEE TPDS, 2011. • (Content Challenge) • Produce and distribute content for 1BN people • Game Analytics → Game statistics • Auto-generated game content. Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award) • (Analytics Challenge) Build cloud-based layer to improve gaming experience • Game Analytics → Ranking / Rating • Game Analytics → Matchmaking / Recommendations. Iosup, Lascateu, Tapus, CAMEO: social networks for MMOGs through continuous analytics and cloud computing, ACM NetGames 2010.

  21. Cloudifying: PaaS for MSGs (Platform Challenge) Build MSG platform that uses (mostly) cloud resources • Close to players • No upfront costs, no maintenance • Compute platforms: multi-cores, GPUs, clusters, all-in-one! • Performance guarantees • Code for various compute platforms—platform profiling • Misprediction=$$$ • What services? • Vendor lock-in? • My data Nae, Iosup, Prodan, Dynamic Resource Provisioning in Massively Multiplayer Online Games, IEEE TPDS, 2011.

  22. Proposed hosting model: dynamic • Using data centers for dynamic resource allocation Massive join Massive leave Massive join • Main advantages: • Significantly lower over-provisioning • Efficient coverage of the world is possible [Source: Nae, Iosup, and Prodan, ACM SC 2008]

  23. Static vs. Dynamic Allocation. Q: What is the penalty for static vs. dynamic allocation? Figure labels: 250%, 25%. [Source: Nae, Iosup, and Prodan, ACM SC 2008]
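To make the static-versus-dynamic question concrete, here is a minimal sketch of how over-provisioning could be compared for the two hosting models. It is not the method of the cited SC 2008 paper: the load values are hypothetical, the dynamic policy assumes a perfect load predictor plus a fixed 25% headroom, and over-provisioning is defined here as unused capacity relative to the capacity actually needed. The numbers it produces are not meant to reproduce the figure labels on the slide.

```python
import math


def over_provisioning_percent(allocated, demand):
    """Unused capacity as a percentage of the capacity actually needed."""
    return 100.0 * (sum(allocated) - sum(demand)) / sum(demand)


def static_allocation(demand):
    # Static hosting: reserve the peak demand for the whole period.
    peak = max(demand)
    return [peak] * len(demand)


def dynamic_allocation(demand, headroom=0.25):
    # Dynamic hosting: follow demand in each interval, plus a safety margin.
    # Assumes a perfect load predictor; a real predictor will mispredict.
    return [math.ceil(d * (1 + headroom)) for d in demand]


# Hypothetical load (servers needed per interval): massive join, leave, join.
demand = [20, 80, 100, 40, 20, 90, 100, 30]

static_pct = over_provisioning_percent(static_allocation(demand), demand)
dynamic_pct = over_provisioning_percent(dynamic_allocation(demand), demand)
print(f"static: {static_pct:.0f}%  dynamic: {dynamic_pct:.0f}%")
```

The gap between the two percentages grows with the burstiness of the load: the deeper the troughs between massive joins, the more capacity a peak-sized static allocation leaves idle.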

  24. Cloudifying: Content, Content, Content (Content Challenge) Produce and distribute content for 1BN people • Game Analytics → Game statistics • Crowdsourcing • Storification • Auto-generated game content • Adaptive game content • Content distribution/Streaming content. A. Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)

  25. (Procedural) Game Content (Generation) • Derived Content: NewsGen, Storification • Game Design: Rules, Mechanics, … • Game Scenarios: Puzzle, Quest/Story, … • Game Systems: Eco, Road Nets, Urban Envs, … • Game Space: Height Maps, Bodies of Water, Placement Maps, … • Game Bits: Texture, Sound, Vegetation, Buildings, Behavior, Fire/Water/Stone/Clouds. Hendricks, Meijer, vd Velden, Iosup, Procedural Game Content Generation: A Survey, Working Paper, 2010

  26. The New Content Generation Process* Only the puzzle concept, and the instance generation and solving algorithms, are produced at development time * A. Iosup, POGGI: Puzzle-Based Online Games on Grid Infrastructures, EuroPar 2009 (Best Paper Award)

  27. Puzzle-Specific Considerations: Generating Player-Customized Content. Puzzle difficulty • Solution size • Solution alternatives • Variation of moves • Skill moves. Player ability • Keep population statistics and generate enough content for most likely cases • Match player ability with puzzle difficulty • Take into account puzzle freshness

  28. Cloudifying: Social Everything! • Social Network=undirected graph, relationship=edge • Community=sub-graph, density of edges between its nodes higher than density of edges outside sub-graph (Analytics Challenge) Build cloud-based layer to Improve gaming experience • Ranking / Rating • Matchmaking / Recommendations • Play Style/Tutoring Organize Gaming Communities • Player Behavior A. Iosup, CAMEO: Continuous Analytics for Massively Multiplayer Online Games on Cloud Resources. ROIA, Euro-Par 2009 Workshops.
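The community definition on this slide can be turned into a small check. The sketch below is one possible reading, not the CAMEO implementation: it interprets "density of edges outside sub-graph" as the density of edges linking members to non-members, and it uses a hypothetical five-player friendship graph. All function and variable names are illustrative.

```python
from itertools import combinations


def edge_densities(edges, nodes, community):
    """Return (internal, external) edge density for a candidate community.

    internal: edges among members / possible member-member pairs
    external: edges from members to non-members / possible such pairs
    """
    community = set(community)
    outside = set(nodes) - community
    edge_set = {frozenset(e) for e in edges}  # undirected: (a, b) == (b, a)

    internal = sum(1 for pair in combinations(community, 2)
                   if frozenset(pair) in edge_set)
    external = sum(1 for a in community for b in outside
                   if frozenset((a, b)) in edge_set)

    possible_internal = len(community) * (len(community) - 1) // 2
    possible_external = len(community) * len(outside)
    return (internal / possible_internal if possible_internal else 0.0,
            external / possible_external if possible_external else 0.0)


def is_community(edges, nodes, community):
    # A sub-graph qualifies when it is denser inside than towards the outside.
    internal, external = edge_densities(edges, nodes, community)
    return internal > external


# Hypothetical graph: a tight trio plus two loosely attached players.
nodes = ["ann", "bob", "cat", "dan", "eve"]
edges = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
         ("cat", "dan"), ("dan", "eve")]

print(is_community(edges, nodes, ["ann", "bob", "cat"]))  # True
print(is_community(edges, nodes, ["ann", "eve"]))         # False
```

Enumerating all member pairs is quadratic in community size, which is fine for illustration but would need an adjacency-based count at the scale of millions of players.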

  29. Continuous Analytics for MMOGs MMOG Data = raw and derivative information from the virtual world (millions of users) Continuous Analytics for MMOGs = Analysis of MMOG data s.t. important events are not lost • Data collection • Data storage • Data analysis • Data presentation • … at MMOG rate and scale

  30. Continuous Analysis for MMOGs: Main Uses By and For Gamers • Support player communities • Understand play patterns (decide future investments) • Prevent and detect cheating or disastrous game exploits (think MMOG economy reset) • Broadcasting of gaming events • Data for advertisement companies (new revenue stream for MMOGs)

  31. The CAMEO Framework* • Address community needs • Can analyze skill level, experience points, rank • Can assess community size dynamically • Using on-demand technology: Cloud Comp. • Dynamic cloud resource allocation, Elastic IP • Data management and storage: Cloud Comp. • Crawl + Store data in the cloud (best performance) • Performance, scalability, robustness: Cloud Comp. * A. Iosup, CAMEO: Continuous Analytics for Massively Multiplayer Online Games on Cloud Resources. ROIA, Euro-Par 2009 Workshops, LNCS 6043, (2010)

  32. CAMEO: Cloud Resource Management Dynamic Analytics Steady Analytics Unexpected Periodic Burst • Snapshot = dataset for a set of players • More machines = more snapshots per time unit

  33. CAMEO: Exploiting Cloud Features • Machines close(r) to server • Traffic dominated by small packets (latency) • Elastic IP to avoid traffic bans (legalese: acting on behalf of real people). A. Iosup, A. Lascateu, N. Tapus, CAMEO: Enabling Social Networks for Massively Multiplayer Online Games through Continuous Analytics and Cloud Computing, ACM NetGames 2010.

  34. Sample Game Analytics Results: Skill Level Distribution in RuneScape • RuneScape: 135M+ open accounts (world record) • Dataset: 3M players (largest measurement, to date) • 1,817,211 over level 100 • Max skill 2,280 • Number of mid- and high-level players is significant: New Content Generation Challenge. Figure labels: MidLevel, HighLevel

  35. Cost of Continuous RuneScape Analytics • Put a price on MMOG analytics (here, $425/month, or less than $0.00015/user/month) • Trade-off accuracy vs. cost, runtime is constant
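The per-user figure on this slide can be sanity-checked with a short calculation. The sketch assumes the user count is the 3M-player dataset mentioned on the RuneScape results slide; this slide itself does not state which user count was used.

```python
monthly_cost_usd = 425   # from the slide
players = 3_000_000      # assumption: the 3M-player RuneScape dataset

cost_per_user = monthly_cost_usd / players
print(f"${cost_per_user:.6f} per user per month")  # $0.000142 per user per month
```

Under that assumption the result is consistent with the slide's bound of less than $0.00015 per user per month.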

  36. Cloud Scheduling: A Provisioning-and-Allocation problem. We have just started working on this problem. Diagram labels: Application; Job; Queue; Queue; Manage; Provision; Before experiment; During experiment; Allocate; When needed; Many other possibilities

  37. Take Home Message: TUD Research in Clouds • Understanding how real clouds work (focus on data-intensive) • Modeling cloud infrastructure (performance, availability) and workloads • Compare clouds with other platforms (grids, parallel production env., p2p, …) • The Cloud Workloads Archive: easy to share cloud workload traces and research associated with them • Complement the Grid Workloads Archive • Scheduling: making clouds work • eScience and gaming applications (cloud application architectures) • MapReduce • Massive Gaming: services on clouds • CAMEO: Massive Game Analytics • Toolkit for Online Social Network analysis • POGGI: game content generation at scale. Publications: 2008: ACM SC. 2009: ROIA, CCGrid, NetGames, EuroPar (Best Paper Award). 2010: IEEE TPDS, Elsevier CCPE, … 2011: ICPE, CCGrid, Book Chapter CAMEO+Clouds, IEEE TPDS, IJAMC, … Graduation (Forecast) 2011-2014: 2+3 PhD, 10+ MSc, n BSc

  38. Thank you for your attention! Questions? Suggestions? Observations? More Info: Alexandru Iosup, A.Iosup@tudelft.nl, http://www.pds.ewi.tudelft.nl/~iosup/ (or google “iosup”). Parallel and Distributed Systems Group, Delft University of Technology • http://www.st.ewi.tudelft.nl/~iosup/research.html • http://www.st.ewi.tudelft.nl/~iosup/research_gaming.html • http://www.st.ewi.tudelft.nl/~iosup/research_cloud.html Do not hesitate to contact me…
