Low Latency Computations on Massive Data
Ion Stoica, CS Division, UC Berkeley
Fujitsu Symposium, Mountain View, June 5, 2013
Challenges
• Data grows faster than Moore’s law*
• Data is dirty
  • uncurated, no schema, no consistent syntax or semantics
• Complex questions, e.g.,
  • Is there a virus outbreak?
  • Is the building structurally safe?
*[IDC report, Kathy Yelick, LBNL]
Low Latency & Massive Data
• May not be able to achieve both of them!
• Even if all data is in memory, computation may take tens of seconds
Key Insight
Answers don’t always need to be exact
• Input is often noisy: exact computations do not guarantee exact answers
• Error is often acceptable if small and bounded
  • Best scale: ± 0.5 lb error
  • Speedometers: ± 2.5% error (edmunds.com)
  • OmniPod Insulin Pump: ± 0.96% error (www.ncbi.nlm.nih.gov/pubmed/22226273)
Error-bounded Computations
• Error depends on sample size (S), not on original data size:
  • error ~ 1/√S
  • E.g., the error of a poll of 1,000 people is the “same” for a population of 1M or 100M people
New generation of scale-independent algorithms
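The scale-independence claim above can be checked with a small simulation. The following is a minimal sketch, not the method used in the talk: it assumes uniform random sampling with replacement and estimates a mean, and the function name `estimate_mean` and the synthetic populations are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def estimate_mean(population, sample_size, rng):
    """Estimate the mean of `population` from a uniform random sample.

    Sampling is with replacement, so the standard error is s / sqrt(S):
    it depends on the sample size S and the spread of the data, not on
    how many rows the population holds.
    """
    sample = rng.choices(population, k=sample_size)
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, std_error


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic populations (an assumption for this demo) that differ
    # in size by a factor of 100 but share the same distribution.
    small = [rng.random() for _ in range(10_000)]
    large = [rng.random() for _ in range(1_000_000)]

    for name, population in (("10K rows", small), ("1M rows", large)):
        for sample_size in (1_000, 4_000):
            estimate, std_error = estimate_mean(population, sample_size, rng)
            print(f"{name}, S={sample_size}: "
                  f"mean ~ {estimate:.4f}, std. error ~ {std_error:.4f}")
```

For a fixed S the reported standard error should be about the same for both populations, and quadrupling S should roughly halve it, which is the 1/√S behavior.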
What Does It Mean?
• Can trade off an answer’s latency against its accuracy
• Rapid data growth is no longer a problem…
• Moore’s Law: error halves every two years
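The latency/accuracy trade-off can be made concrete by turning a target error into a sample size and then into a rough scan time. This is a sketch under stated assumptions: the error bound is the normal-approximation half-width z·σ/√S, and `required_sample_size`, `estimated_latency_s`, and the scan rate used below are hypothetical names and numbers, not figures from the talk.

```python
import math


def required_sample_size(std_dev, target_error, z=1.96):
    """Smallest S such that z * std_dev / sqrt(S) <= target_error.

    z=1.96 corresponds to a ~95% confidence interval under the normal
    approximation (an assumption of this sketch).
    """
    return math.ceil((z * std_dev / target_error) ** 2)


def estimated_latency_s(sample_size, rows_per_second):
    """Rough scan time for the sample at an assumed, constant scan rate."""
    return sample_size / rows_per_second


if __name__ == "__main__":
    assumed_rows_per_second = 1_000_000  # hypothetical scan rate
    for target_error in (0.1, 0.05, 0.01):
        s = required_sample_size(std_dev=1.0, target_error=target_error)
        latency = estimated_latency_s(s, assumed_rows_per_second)
        print(f"target error {target_error}: S={s}, ~{latency:.4f} s")
```

Neither function takes the total data size as an input: the cost of an answer is set by the accuracy requested. Because S grows with the inverse square of the target error, halving the error roughly quadruples the sample that must be scanned.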