An Overview of Map-Reduce Research

bchristina

Presentation Transcript


  1. An Overview of Map-Reduce Research

  2. Main Themes
     • Designing Efficient Algorithms on Map-Reduce
     • Extensions on Map-Reduce
     • Modeling Map-Reduce Computation

  3. Limitations
     • Selective Access to Data
     • High Communication Cost
     • Redundant and Wasteful Processing
     • Lack of Early Termination
     • Lack of Iteration
     • Quick Retrieval of Approximate Results
     • Load Balancing
     • Lack of Real-time and Interactive Processing
     • Lack of Support for n-way Operations

  4. Extensions on Map-Reduce
     • Interactive Processing
       Techniques: Streaming, Pipelining, In-Memory Processing, Pre-computation
       Systems: Dremel, Tenzing, BlinkDB, M3R, Shark
     • Data Access
       Techniques: Indexing, Partitioning, Co-location, Data Layout
       Systems: Co-Hadoop(*), Hadoop++, HAIL, LIAH, Llama, Cheetah
     • Avoidance of Redundant Processing
       Techniques: Batch Processing of Queries, Result Materialization, Incremental Processing, Result Sharing
       Systems: ReStore, InCoop, MRShare
     • Processing n-way Operations
       Techniques: Spatial / Temporal Joins, Additional MR Phase, Redistribution of Keys, Record Duplication
       Systems: Controlled-Replicate(*), RCCIS(*)
     • Iterative Processing
       Techniques: Looping, Caching, Pipelining, Recursion, Incremental Processing
       Systems: HaLoop, ReDoop, InCoop
     • Query Optimization
       Techniques: Parameter Tuning, Plan Refinement, Operator Reordering, Code Analysis, Data Flow Optimization
       Systems: HadoopDB, Clydesdale, Starfish, AQUA, Adaptive-MR(*)
     • Processing Industry Specific Data
       Techniques: Spatio-Temporal Data, Geo-Spatial Data, Agriculture / Oil & Gas / Energy
       Systems: BLAST(*), Spatial-Hadoop, Hadoop-GIS
     • Fair Work Allocation
       Techniques: Batching, Sampling, Re-partitioning
       Systems: Skew-Tune, Skew-Reduce, Themis
     • Early Termination
       Techniques: Sorting, Sampling
       Systems: EARL, RanKloud
     (*) Contributed by IBM

  5. Designing Efficient Algorithms on Map-Reduce
     • Joins
     • Multi-way Joins
     • Similarity Joins
     • Theta Joins
     • Spatial Joins
     • Interval Joins
     • Entity Resolution
     • Graph Algorithms
     • Machine Learning
     • Computational Geometry

  6. Modeling Computation on Map-Reduce
     • Two main cost components
       • Time spent in communication from map tasks to reduce tasks
       • Time spent in computation as part of reduce tasks
     • These two components involve a trade-off
     • Given: an analytics problem, the input data, and the number of reduce tasks
     • What is the minimum communication cost that any map-reduce algorithm for the given problem and input data must incur?
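
The trade-off between the two cost components can be made concrete with a small simulation. The sketch below is an illustration only and is not taken from the slides: it assumes an all-pairs comparison problem, splits the input into `num_groups` groups, and creates one reduce task per pair of groups. The names `map_all_pairs` and `simulate` are hypothetical. Communication is counted as the number of records sent from map to reduce, and the largest reducer input stands in for per-reducer computation.

```python
from collections import defaultdict
from itertools import combinations


def map_all_pairs(items, num_groups):
    """Map phase: send each item to every reducer whose key contains its group.

    Reducer keys are unordered pairs of group ids (i, j) with i < j.
    This grouping scheme is an assumption made for illustration.
    """
    for idx, item in enumerate(items):
        group = idx % num_groups
        for other in range(num_groups):
            if other != group:
                key = (min(group, other), max(group, other))
                yield key, (idx, item)


def simulate(n, num_groups):
    """Return (communication, max_reducer_input, pairs_covered) for n items."""
    if num_groups < 2:
        raise ValueError("num_groups must be at least 2")
    reducers = defaultdict(list)
    for key, value in map_all_pairs(list(range(n)), num_groups):
        reducers[key].append(value)  # shuffle: group records by reducer key

    # Communication cost: total records shipped from map tasks to reduce tasks.
    communication = sum(len(values) for values in reducers.values())
    # Proxy for computation per reduce task: size of the largest reducer input.
    max_reducer_input = max(len(values) for values in reducers.values())

    # Check that every pair of items meets in at least one reducer.
    covered = set()
    for values in reducers.values():
        ids = sorted(idx for idx, _ in values)
        covered.update(combinations(ids, 2))
    return communication, max_reducer_input, len(covered)


if __name__ == "__main__":
    for g in (2, 3, 4, 6):
        print(g, simulate(12, g))
```

With 12 items, raising the number of groups shrinks each reducer's input (less computation per reduce task) but increases the number of records communicated, since each item is replicated to `num_groups - 1` reducers in this scheme.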

  7. Survey References
     • A Survey of Large-Scale Analytical Query Processing in MapReduce
       Christos Doulkeridis and Kjetil Nørvåg
       In VLDB Journal, 23(3), 2014
     • Distributed Data Management Using MapReduce
       Feng Li, Beng Chin Ooi, M. Tamer Özsu and Sai Wu
       In ACM Computing Surveys, 46(3), 2014
