
CS 294-42: Project Suggestions


Presentation Transcript


  1. CS 294-42: Project Suggestions September 14, 2011 Ion Stoica (http://www.cs.berkeley.edu/~istoica/classes/cs294/11/)

  2. Projects • This is a project-oriented class • Reading papers should be a means to a great project, not a goal in itself! • Strongly prefer groups of two • Perfectly fine to have the same project as in CS 262 • Today, I’ll present some suggestions • But, you are free to come up with your own proposal • Main goal: just do a great project

  3. Where I’m Coming From? • Key challenge: maximize economic value of data, i.e., • Extract value from data while reducing costs (e.g., storage, computation)

  4. Where I’m Coming From? • Tools to extract value from big-data • Scalability • Response time • Accuracy • Provide high cluster utilization for heterogeneous workloads • Support diverse SLAs • Predictable performance • Isolation • Consistency

  5. Caveats • Cloud computing is HOT, but there is a lot of NOISE! • Not easy to • differentiate between narrow engineering solutions and fundamental tradeoffs • predict the importance of the problem you solve • Cloud computing is akin to a Gold Rush!

  6. Background: Mesos • Rapid innovation in cloud computing • No single framework (e.g., Dryad, Cassandra, Hypertable, Pregel) optimal for all applications • Running each framework on its own dedicated cluster is • Expensive • Hard to share data • Need to run multiple frameworks on the same cluster

  7. Background: Mesos – Where We Want to Go • Today: static partitioning of the cluster, one framework per partition (akin to uniprogramming) • Mesos: dynamic sharing of a single cluster across frameworks such as Hadoop, Pregel, and MPI (akin to multiprogramming)

  8. Background: Mesos – Solution • Mesos is a common resource sharing layer over which diverse frameworks (e.g., Hadoop, MPI, …) run across the nodes of the cluster

  9. Background: Workload in Datacenters • Interactive (low-latency) jobs: high priority, short response times • Batch jobs: low priority, can tolerate longer response times

  10. Datacenter OS: Resource Management, Scheduling

  11. Hierarchical Scheduler (for Mesos) • Allow administrators to organize users and frameworks into groups • Provide resource guarantees per group • Share available resources (fairly) across groups • Research questions • What is the right abstraction (when using multiple resources)? • How to implement it using resource offers? • What policies are compatible at different levels in the hierarchy?
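
As a starting point for the hierarchical-scheduler idea above, here is a minimal Python sketch of two-level fair sharing: per-group guarantees, the remainder split across groups, and each group's allocation divided across its frameworks. The group names, weights, and CPU-only resource model are illustrative assumptions, not the Mesos design.

```python
# Minimal sketch (not the Mesos implementation) of two-level fair sharing.
# Group names, guarantees, and the CPU-only resource model are illustrative.

def fair_share(capacity, weights):
    """Split `capacity` across entries in proportion to `weights`."""
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}

def hierarchical_allocate(cluster_cpus, groups):
    """groups: {group: {"guarantee": cpus, "frameworks": {name: weight}}}."""
    # Step 1: honor each group's resource guarantee.
    alloc = {g: spec["guarantee"] for g, spec in groups.items()}
    # Step 2: share the remaining capacity equally across groups.
    leftover = cluster_cpus - sum(alloc.values())
    extra = fair_share(leftover, {g: 1.0 for g in groups})
    for g in groups:
        alloc[g] += extra[g]
    # Step 3: within each group, split its allocation across its frameworks.
    return {g: fair_share(alloc[g], spec["frameworks"]) for g, spec in groups.items()}

if __name__ == "__main__":
    groups = {
        "ads":      {"guarantee": 20, "frameworks": {"hadoop": 1, "mpi": 1}},
        "research": {"guarantee": 10, "frameworks": {"spark": 2, "hadoop": 1}},
    }
    print(hierarchical_allocate(100, groups))
    # ads: 20 + 35 = 55 CPUs (27.5 per framework); research: 10 + 35 = 45 (30 spark, 15 hadoop)
```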

  12. Cross Application Resource Management • An app uses many services (e.g., file systems, key-value storage, databases, etc.) • If an app has high priority and the service it uses doesn’t, the app’s SLA (Service Level Agreement) might be violated • Research questions • What is the right abstraction, e.g., resource delegation, priority propagation? • Clean-slate mechanisms vs. incremental deployability • This is also highly challenging in single-node OSes!
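
To make the priority-propagation idea above concrete, here is a hypothetical sketch in which an app tags each request to a shared service with its own priority, so the service orders work accordingly. The class, method, and request names are invented for illustration; the actual research questions (delegation, propagation across many services, incremental deployment) are much broader.

```python
# Hypothetical sketch of priority propagation: each request to a shared service
# carries its caller's priority, so a high-priority app's requests are served first.
import heapq
import itertools

class SharedService:
    def __init__(self):
        self._queue = []
        self._seq = itertools.count()   # tie-breaker so heapq never compares requests

    def submit(self, request, priority):
        # Lower number = more important; the caller's priority travels with the request.
        heapq.heappush(self._queue, (priority, next(self._seq), request))

    def next_request(self):
        return heapq.heappop(self._queue)[2] if self._queue else None

if __name__ == "__main__":
    svc = SharedService()
    svc.submit("batch-analytics-read", priority=10)   # low-priority backend app
    svc.submit("frontend-read", priority=0)           # high-priority frontend app
    print(svc.next_request())                          # -> frontend-read
```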

  13. Resource Management using VMs • Most cluster resource managers use Linux containers (e.g., Mesos) • Thus, schedulers assume no task migration • Research questions: • Develop scheduler for VM environments (e.g., extend DRF) • Tradeoffs between migration, delay, and preemption

  14. Task Granularity Selection (Yanpei Chen) • Problem: the number of tasks per stage in today’s MapReduce apps is (highly) sub-optimal • Research question: • Derive algorithms to pick the number of tasks to optimize various performance metrics, e.g., • utilization, response time, network traffic • subject to various constraints, e.g., • capacity, network
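
A minimal sketch of the kind of cost model such an algorithm might start from: completion time approximated as (number of scheduling waves) × (work per task + per-task overhead), with the task count chosen by search subject to the number of slots. The model and parameters are illustrative assumptions, not the project's actual formulation.

```python
# Illustrative sketch only: pick a task count by minimizing a simple cost model,
#   completion_time ~= waves * (work_per_task + per_task_overhead),
# subject to the number of available slots.
import math

def pick_num_tasks(total_work_s, per_task_overhead_s, slots, max_tasks=10_000):
    """Return (best task count, modeled completion time in seconds)."""
    best_n, best_t = 1, float("inf")
    for n in range(1, max_tasks + 1):
        waves = math.ceil(n / slots)                      # scheduling waves needed
        t = waves * (total_work_s / n + per_task_overhead_s)
        if t < best_t:
            best_n, best_t = n, t
    return best_n, best_t

if __name__ == "__main__":
    # 1 hour of total work, 2 s launch overhead per task, 100 slots
    print(pick_num_tasks(total_work_s=3600, per_task_overhead_s=2, slots=100))
    # -> (100, 38.0): one wave of 100 tasks, 36 s of work + 2 s overhead each
```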

  15. Resource Revocation • Which task should we revoke/preempt? • Two questions • Which slot has the least impact on the giving framework? • Is the slot acceptable to the receiving framework? • Research questions • Identify a feasible slot for the receiving framework with the least impact on the giving framework • Lightweight protocol design
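
A small sketch of the slide's two-question structure, under the simplifying assumption that the impact on the giving framework is proportional to the victim task's lost progress; the task representation and the `acceptable` predicate are illustrative.

```python
# Illustrative sketch: pick a revocation victim that the receiving framework
# accepts while losing as little of the giving framework's completed work as possible.

def pick_victim(running_tasks, acceptable):
    """running_tasks: list of {"slot": str, "framework": str, "progress": 0..1}.
    acceptable: predicate over slots supplied by the receiving framework.
    Return the acceptable task whose revocation loses the least completed work."""
    candidates = [t for t in running_tasks if acceptable(t["slot"])]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t["progress"])

if __name__ == "__main__":
    tasks = [
        {"slot": "node1:slot3", "framework": "hadoop", "progress": 0.9},
        {"slot": "node2:slot1", "framework": "hadoop", "progress": 0.1},
    ]
    # The receiving framework only accepts slots on node2 (e.g., for data locality).
    print(pick_victim(tasks, acceptable=lambda slot: slot.startswith("node2")))
```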

  16. Control Plane Consistency Model • What type of consistency is “good enough” for various control plane functions? • File system metadata (Hadoop) • Routing (Nicira) • Scheduling • Coordinated caching • … • Research questions • What are the trade-offs between performance and consistency? • Develop a generic framework for the control plane

  17. Decentralized vs. Centralized Scheduling • Decentralized schedulers • E.g., Mesos, Hadoop 2.0 • Delegate decisions to apps (i.e., frameworks, jobs) • Advantages: scale and separation of concerns (i.e., apps know best where and which tasks to run) • Centralized schedulers • Know all app requirements • Advantages: can make (globally) optimal decisions • Research challenge: • Evaluate centralized vs. decentralized schedulers • Characterize the class of workloads for which a decentralized scheduler is good enough

  18. Opportunistic Scheduling • Goal: schedule interactive jobs (e.g., <100ms latency) • Existing schedulers: high overhead (e.g., Mesos needs to decide on every offer) • Research challenge: • Tradeoff between utilization and response time • Evaluate hybrid approach

  19. Background: Dominant Resource Fairness • Implements fair (proportional) allocation for multiple types of resources • Key properties • Strategy-proof: users cannot get an advantage by lying about their demands • Sharing incentives: users are incentivized to share a cluster rather than partitioning it
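
For reference, a compact Python sketch of the DRF allocation loop: repeatedly give one task's worth of resources to the user with the lowest dominant share. Real schedulers also deal with placement, offers, and task churn, which are ignored here; the example workload is the two-user scenario from the DRF paper (9 CPUs, 18 GB).

```python
# Compact sketch of DRF's progressive-filling loop (scheduling mechanics omitted).

def drf_allocate(capacity, demands, max_tasks=1000):
    """capacity: {resource: amount}; demands: {user: {resource: per-task demand}}.
    Returns the number of tasks launched per user under Dominant Resource Fairness."""
    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}

    def dominant_share(u):
        return max(tasks[u] * d / capacity[r] for r, d in demands[u].items())

    def fits(u):
        return all(used[r] + d <= capacity[r] for r, d in demands[u].items())

    for _ in range(max_tasks):
        eligible = [u for u in demands if fits(u)]
        if not eligible:
            break
        u = min(eligible, key=dominant_share)   # lowest dominant share goes next
        for r, d in demands[u].items():
            used[r] += d
        tasks[u] += 1
    return tasks

if __name__ == "__main__":
    # Two users sharing 9 CPUs and 18 GB (the example from the DRF paper):
    # A's tasks need (1 CPU, 4 GB), B's need (3 CPU, 1 GB) -> A gets 3 tasks, B gets 2.
    print(drf_allocate({"cpu": 9, "mem": 18},
                       {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}}))
```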

  20. DRF for Non-linear Resources/Demands • DRF assumes resources & demands are additive • E.g., task 1 needs (1 CPU, 1 GB) and task 2 needs (1 CPU, 3 GB) ⇒ together both tasks need (2 CPU, 4 GB) • Sometimes demands are non-linear • E.g., shared memory • Sometimes resources are non-linear • E.g., disk throughput, caches • Research challenge: • A DRF-like scheduler for non-linear resources & demands (there could be two projects here!)

  21. DRF for OSes • DRF designed for clusters using resource offer mechanism • Redesign DRF to support multi-core OSes • Research questions: • Is resource offer best abstraction? • How to best leverage preemption? (in Mesos tasks are not preempted by default) • How to support gang scheduling?

  22. Storage & Data Processing

  23. Resource Isolation for Storage Services • Share storage (e.g., key-value store) between • Frontend, e.g., web services • Backend, e.g., analytics on freshest data • Research challenge • Isolation mechanism: protect front-end performance from back-end workload

  24. “Quicksilver” DB • Goal: interactive queries with bounded error on “unbounded” data • Trade off efficiency against accuracy • Query response time target: < 100ms • Approach: random pre-sampling across different dimensions (columns) • Research question: given a query and an error bound, find • The smallest sample to compute the result • The sample minimizing disk (or memory) access times • (Talk with Sameer if interested)
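
A back-of-envelope sketch of the "smallest sample for a given error bound" question, using a Hoeffding bound for a sample mean and assuming column values lie in a known range; the actual system would more likely serve queries from pre-computed samples rather than this on-the-fly calculation.

```python
# Back-of-envelope sketch: smallest sample size so the sample mean is within
# `error_bound` of the true mean with the requested confidence (Hoeffding bound).
# Assumes values lie in an interval of width `value_range`.
import math

def min_sample_size(value_range, error_bound, confidence=0.95):
    delta = 1.0 - confidence
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * error_bound ** 2))

if __name__ == "__main__":
    # Column values in [0, 100], answer within +/-1 of the true mean, 95% confidence
    print(min_sample_size(value_range=100, error_bound=1.0))   # -> 18445 rows
```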

  25. Split-Privacy DB (1/2) • Partition data & computation into a private part (fprivate over the private DB) and a public part (fpublic over the public DB, stored on the cloud) • Goal: use the cloud without revealing the computation result • Example: • Operation f(x, y) = x + y, where • x: private • y: public • Pick a random number a, and compute x’ = x + a • compute f(x’, y) = r’ = x’ + y • recover the result: r = r’ – a = (x’ – a) + y = x + y
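
The slide's additive-blinding example, written out as a runnable sketch; `public_side_add` merely stands in for whatever computation would run on the public cloud.

```python
# The slide's example as code: the cloud only ever sees x' = x + a and y,
# so it learns neither the private input x nor the final result r.
import random

def public_side_add(x_blinded, y_public):
    """Stands in for the computation running on the public cloud."""
    return x_blinded + y_public

def split_private_add(x_private, y_public):
    a = random.randint(1, 10**9)                        # random blinding value, kept private
    x_blinded = x_private + a                           # x' = x + a, safe to ship to the cloud
    r_blinded = public_side_add(x_blinded, y_public)    # cloud computes r' = x' + y
    return r_blinded - a                                # recover r = r' - a = x + y

if __name__ == "__main__":
    print(split_private_add(42, 100))                   # -> 142
```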

  26. Split-Privacy DB (2/2) • Partition data & computation • Private • Public (stored on cloud) • Example: patient data (private), public clinical and genomics data sets • Goal: use the cloud without revealing the computation result • Research questions: • What types of computation can be implemented? • Is this any more powerful than privacy-preserving computation / data mining?

  27. RDDs as an OS Abstraction • Resilient Distributed Datasets (RDDs) • Fault-tolerant (in-memory) parallel data structures • Allow Spark apps to efficiently reuse data • Design cross-application RDDs • Research questions • RDD reconstruction (track software and platform changes) • Enable users to share intermediate results of queries (identify when two apps compute the same RDD) • RDD cluster-wide caching
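
A toy sketch of the lineage-based recomputation that underlies RDD fault tolerance: if a cached dataset is lost, it is re-derived from its parent and the transformation that produced it. This is not Spark's API, just an illustration of the mechanism the cross-application questions above build on.

```python
# Toy sketch of lineage-based recomputation: a lost cached dataset is rebuilt
# from its parent and the transformation that produced it.

class LineageNode:
    def __init__(self, parent=None, transform=None, source_data=None):
        self.parent = parent
        self.transform = transform
        self.source_data = source_data
        self._cache = None

    def compute(self):
        if self._cache is not None:
            return self._cache
        if self.parent is None:                    # base dataset
            self._cache = list(self.source_data)
        else:                                      # recompute from lineage
            self._cache = [self.transform(x) for x in self.parent.compute()]
        return self._cache

    def evict(self):
        self._cache = None                         # simulate loss (failure, memory pressure)

if __name__ == "__main__":
    base = LineageNode(source_data=range(5))
    doubled = LineageNode(parent=base, transform=lambda x: x * 2)
    print(doubled.compute())   # [0, 2, 4, 6, 8]
    doubled.evict()
    print(doubled.compute())   # same result, recomputed from lineage after the "loss"
```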

  28. Provenance-based Efficient Storage (Peter B and Patrick W) • Reduce storage by deleting data that can be recreated • Generalization of the previous project • Research challenges: • Identify data that can be deterministically recreated, and the code to do so • Use hints? • Tradeoff between re-creation and storage • May take into account access patterns, frequency, and performance

  29. Very-low Latency Streaming • Challenge: stragglers, failures • Approaches to reduce latency: • Redundant computations • Speculative execution • Research questions • What is the theoretical trade-off between response time and accuracy? • Achieve a target latency and accuracy, while minimizing the overhead
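
A minimal sketch of the redundant-computation approach listed above: launch several identical copies of a task and return whichever finishes first, trading extra work for lower tail latency. The task behavior and the two-copy default are illustrative assumptions.

```python
# Illustrative sketch of redundant execution: run identical copies of a task and
# return whichever finishes first.
import random
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def straggly_task(copy_id):
    """A task whose latency varies; sometimes it straggles."""
    time.sleep(random.choice([0.05, 0.05, 0.05, 2.0]))   # 25% chance of a 2 s straggler
    return f"result-from-copy-{copy_id}"

def run_with_redundancy(copies=2):
    pool = ThreadPoolExecutor(max_workers=copies)
    futures = [pool.submit(straggly_task, i) for i in range(copies)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    result = next(iter(done)).result()
    pool.shutdown(wait=False)       # don't wait for the slower, redundant copies
    return result

if __name__ == "__main__":
    print(run_with_redundancy())
```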
