
Information Capture and Re-Use

Joe Hellerstein


Presentation Transcript


  1. Information Capture and Re-Use Joe Hellerstein

  2. Scenario
     • Ubiquitous computing is more than clients!
     • sensors and their data feeds are key
     • smart dust (MEMS sensors)
     • biomedical monitoring devices (MEMS sensors)
     • every item of value records its use/misuse (disposable computing)
     • tacit information from human behavior
     • video from surveillance cameras, broadcasts, etc.

  3. There’s a Data Flood Coming

  4. There’s a Data Flood Coming
     • What does it look like?
       • Never ends: interactivity required
       • Big: data reduction/aggregation is key
       • Unpredictable: this scale of devices and nets will not behave nicely
     • Key Technologies:
       • CONTROL:
         • early answers and interactivity
         • online aggregation for data reduction
       • River/Eddy:
         • massively parallel, adaptive dataflow
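The "online aggregation" bullet above can be pictured with a small Python sketch: a running average that reports an estimate and a shrinking error bound while rows are still arriving. This is only an illustration, not the CONTROL implementation. The function name `online_mean`, the 95% z-value, and the synthetic data are assumptions made for the example, and the interval is only meaningful if rows arrive in random order.

```python
import math
import random

def online_mean(values, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while rows stream in.

    Hypothetical helper: assumes rows arrive in random order, so a
    normal-approximation confidence interval is reasonable.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's update)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            yield n, mean, z * math.sqrt(variance / n)

# Synthetic feed, used only to exercise the sketch.
rng = random.Random(0)
data = [rng.uniform(0, 100) for _ in range(10_000)]
estimates = list(online_mean(data))
for n, mean, half_width in estimates:
    print(f"after {n:>6} rows: mean ~ {mean:.2f} +/- {half_width:.2f}")
```

A user watching these lines can stop the job as soon as the interval is tight enough, which is the "early answers and interactivity" idea on the slide.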

  5. CONTROL: Continuous Output and Navigation Technology with Refinement On Line
     • Data-intensive jobs are long-running. How to give early answers and interactivity?
     • Statistical estimators, and their performance implications
     • online query processing algorithms: ripple joins
     • online interactivity over feeds: data “juggle”
     • Appreciate interplay of massive data processing, stats, and UIs
     • Challenges: apply to sequence data, scale up
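As a rough illustration of the ripple join named on this slide, the Python sketch below alternates between two inputs, joins each newly read row against everything already seen from the other side, and scales the match count into a running estimate of the final join size. It is a simplified "square" variant written for this transcript, not the published algorithm in full: the function name `ripple_join`, the sample lists, and the simple scale-up estimator are assumptions, and the estimate is only sensible if both inputs are read in random order.

```python
def ripple_join(r, s, match):
    """Yield (r_seen, s_seen, new_pairs) after each sampling step.

    Hypothetical sketch: reads one row from each input per step and
    joins it against all rows already seen from the other input.
    """
    seen_r, seen_s = [], []
    for i in range(max(len(r), len(s))):
        new_pairs = []
        if i < len(r):
            # New R row against every S row seen so far.
            new_pairs += [(r[i], y) for y in seen_s if match(r[i], y)]
            seen_r.append(r[i])
        if i < len(s):
            # New S row against every R row seen so far (including r[i]).
            new_pairs += [(x, s[i]) for x in seen_r if match(x, s[i])]
            seen_s.append(s[i])
        yield len(seen_r), len(seen_s), new_pairs

# Illustrative inputs only.
r = [3, 1, 4, 1, 5, 9, 2, 6]
s = [5, 3, 5, 8, 9, 7]
pairs = []
count_estimates = []
for r_seen, s_seen, new_pairs in ripple_join(r, s, lambda x, y: x == y):
    pairs.extend(new_pairs)
    # Scale matches in the sampled rectangle up to the full cross product.
    scale = (len(r) * len(s)) / (r_seen * s_seen)
    count_estimates.append(len(pairs) * scale)
    print(f"seen {r_seen}x{s_seen}: estimated join size {count_estimates[-1]:.1f}")
```

Each pair of rows is compared exactly once, so when both inputs are exhausted the running estimate coincides with the exact join size.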

  6. River
     • We built the world’s fastest sorting machine
     • On the “NOW”: 100 Sun workstations + SAN
     • But it only beat the record under ideal conditions!
     • River: performance adaptivity for data flows on clusters
     • simplifies management and programming
     • perfect for sensor-based streams
     • Challenges: deploy over a wide area
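One way to picture "performance adaptivity for data flows" is to compare a static, even split of work with a pull-based scheme in which each node takes the next item whenever it is free. The Python simulation below is a sketch under that assumption and is not the River implementation; the function names, the per-item costs, and the item count are all hypothetical.

```python
import heapq

def static_finish_time(n_items, costs):
    """Finish time when items are divided evenly among workers up front."""
    base, extra = divmod(n_items, len(costs))
    counts = [base + (1 if i < extra else 0) for i in range(len(costs))]
    return max(cost * count for cost, count in zip(costs, counts))

def pull_finish_time(n_items, costs):
    """Finish time when each worker pulls the next item as soon as it is free."""
    free_at = [(0.0, i) for i in range(len(costs))]  # (time worker is free, worker)
    heapq.heapify(free_at)
    handled = [0] * len(costs)
    finish = 0.0
    for _ in range(n_items):
        t, i = heapq.heappop(free_at)   # earliest available worker
        t += costs[i]                   # it spends its per-item cost
        handled[i] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish, handled

# Three healthy nodes and one node that is four times slower (assumed values).
costs = [1.0, 1.0, 1.0, 4.0]
static_time = static_finish_time(1200, costs)
pull_time, handled = pull_finish_time(1200, costs)
print("static split finishes at", static_time)
print("pull-based finishes at  ", pull_time, "with per-node item counts", handled)
```

With the static split the whole job waits for the slow node, which mirrors the slide's point that the sort record held only under ideal conditions. With pulling, the slow node simply ends up handling fewer items.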

  7. Eddy
     • How to order and reorder operators over time
     • key complement to River: adapt not only to the hardware, but to the processing rates
     • Challenges: scale up, consider parallel scheduling
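The idea of reordering operators over time can be sketched in Python as a router that sends each tuple to the filter currently observed to reject the most tuples. This is a deliberately simplified, deterministic stand-in and should not be read as the actual Eddy routing policy: the class name `Eddy`, the moving-average pass rates, the smoothing factor `alpha`, and the two-phase test stream are assumptions made for illustration.

```python
class Eddy:
    """Route each tuple through filters, most selective (lowest pass rate) first."""

    def __init__(self, filters, alpha=0.05):
        self.filters = dict(filters)                 # name -> predicate
        self.rate = {name: 0.5 for name in self.filters}  # observed pass rates
        self.alpha = alpha                           # smoothing factor (assumed)
        self.calls = 0                               # total predicate evaluations

    def route(self, tup):
        order = sorted(self.filters, key=lambda name: (self.rate[name], name))
        for name in order:
            ok = self.filters[name](tup)
            self.calls += 1
            # Exponential moving average keeps the estimate responsive to drift.
            self.rate[name] += self.alpha * ((1.0 if ok else 0.0) - self.rate[name])
            if not ok:
                return False
        return True

def fixed_order_calls(filters, stream):
    """Predicate evaluations for a fixed operator order with short-circuiting."""
    calls = 0
    for tup in stream:
        for _, pred in filters:
            calls += 1
            if not pred(tup):
                break
    return calls

filters = [("x_small", lambda t: t[0] < 10), ("y_small", lambda t: t[1] < 10)]
# Phase 1: x_small rejects most tuples. Phase 2: y_small rejects most tuples.
stream = [(i % 100, 0) for i in range(1000)] + [(0, i % 100) for i in range(1000)]

eddy = Eddy(filters)
eddy_results = [eddy.route(t) for t in stream]
fixed_calls = fixed_order_calls(filters, stream)
print("fixed order evaluations:", fixed_calls)
print("adaptive evaluations:   ", eddy.calls)
```

When the stream's characteristics shift halfway through, the fixed order keeps running the now-unselective filter first, while the adaptive router switches order after a short transition and evaluates fewer predicates overall.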

  8. Telegraph: Putting it Together
     • Want to build next-gen global DB system. Capture and Re-Use Embodied in a vertical solution.
     • Marriage of:
       • CONTROL, River & Eddy
       • OceanStore + optionally-Xactional storage that handle new hardware realities, scale
       • Federation in the wide area via Negotiation/Economics
       • Combinations of browse/query/mine at UI
       • no magic bullet there! CONTROL is key.

  9. Integration with other options
     • Integration
       • Use Oceanic Data Utility for distribution, caching, protection of streams
       • Use negotiation architectures to connect federated and stored streams
       • Be data-intensive backbone to diverse clients
       • Be a scalable platform for tacit knowledge extraction
     • Cooperation
       • Tacit information as a feed
       • Capture/merge classroom feeds
       • Use UI design tools for device-independent, interactive stream-based apps

  10. Plan for Success
     • One Year
       • Implement River/Eddy over parallel cluster, deploy CONTROL modules
       • Deploy data analysis apps over sequence data (MEMS/Web/Video)
     • Three Year
       • Integrate w/ wide area storage & processing
       • Get data-intensive Endeavour apps running on architecture (e.g. tacit knowledge mining)
       • Develop UI tools for interacting with never-ending streams
