Low Latency Computations on Massive Data

Low Latency Computations on Massive Data. Ion Stoica, CS Division, UC Berkeley. Fujitsu Symposium, Mountain View, June 5, 2013. Challenges: data grows faster than Moore’s law*; data is dirty (uncurated, no schema, no consistent syntax and semantics); complex questions, e.g.,


Presentation Transcript


  1. Low Latency Computations on Massive Data • Ion Stoica, CS Division, UC Berkeley • Fujitsu Symposium, Mountain View, June 5, 2013

  2. Challenges • Data grows faster than Moore’s law* • Data is dirty • uncurated, no schema, no consistent syntax and semantics • Complex questions, e.g., • Is there a virus outbreak? • Is the building structurally safe? *[IDC report, Kathy Yelick, LBNL]

  3. Low Latency & Massive Data • May not be able to achieve both of them! • Even if all data is in memory, computation may take tens of seconds

  4. Key Insight: Answers don’t always need to be exact • Input often noisy: exact computations do not guarantee exact answers • Error often acceptable if small and bounded • Best scale: ± 0.5 lb error • Speedometers: ± 2.5% error (edmunds.com) • OmniPod Insulin Pump: ± 0.96% error (www.ncbi.nlm.nih.gov/pubmed/22226273)

  5. Error-bounded Computations • Error depends on sample size (S), not on original data size: • $\text{error} \sim 1/\sqrt{S}$ • E.g., the error of a poll of 1,000 people is the “same” for a population of 1M or 100M people • New generation of scale-independent algorithms
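
The claim on slide 5, that error depends on the sample size and not on the size of the original data, can be checked with a small simulation. The following is a minimal sketch and not code from the talk: it assumes the quantity being estimated is a mean, that samples are drawn uniformly with replacement, and that the data is synthetic Gaussian with standard deviation 10. The function names `make_population` and `sample_mean_error` are hypothetical.

```python
import random
import statistics


def make_population(size, seed=1):
    """Synthetic data (assumption): Gaussian values with mean 50, std dev 10."""
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(size)]


def sample_mean_error(population, sample_size, trials=400, seed=0):
    """Empirical spread of the sample-mean estimate over repeated samples.

    Each trial draws `sample_size` values uniformly with replacement and
    records their mean; the standard deviation of those means is returned.
    """
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    small = make_population(10_000)
    large = make_population(500_000)
    for s in (100, 400, 1600):
        # The two error columns should be close to each other for each s,
        # and each 4x increase in s should roughly halve the error.
        print(s, sample_mean_error(small, s), sample_mean_error(large, s))
```

Under these assumptions the error for a given S should come out roughly the same for both populations, and quadrupling S should roughly halve it, which is the 1/√S behavior.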

  6. What Does It Mean? • Can trade between an answer’s latency and accuracy • Rapid data growth is no longer a problem…

  7. What Does It Mean? • Can trade between an answer’s latency and accuracy • Rapid data growth is no longer a problem… • Moore’s Law → error halves every two years
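
The latency/accuracy trade described on these slides can be sketched as a sample-size calculation. This is an illustration under stated assumptions, not the method from the talk: it assumes the answer is a mean, that the standard deviation of the data is known, that a normal approximation holds (z = 1.96 for about 95% confidence), and that latency is proportional to the number of rows scanned. The names `required_sample_size`, `estimated_latency_seconds`, and the `rows_per_second` scan rate are hypothetical.

```python
import math


def required_sample_size(std_dev, max_error, z=1.96):
    """Smallest sample size whose error bound z * std_dev / sqrt(S) is <= max_error."""
    return math.ceil((z * std_dev / max_error) ** 2)


def estimated_latency_seconds(sample_size, rows_per_second):
    """Assumed cost model: latency grows linearly with the rows scanned."""
    return sample_size / rows_per_second


if __name__ == "__main__":
    # Tightening the error bound by 2x needs about 4x the sample, hence 4x the latency.
    for max_error in (1.0, 0.5, 0.25):
        s = required_sample_size(std_dev=10.0, max_error=max_error)
        print(max_error, s, estimated_latency_seconds(s, rows_per_second=1_000_000))
```

Note that the required sample size does not involve the total data size at all, which is why growth in the underlying data does not by itself increase latency in this model.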
