
phi

phi: public health for the internet. joe hellerstein, intel research & uc berkeley.





Presentation Transcript


  1. phi public health for the internet joe hellerstein intel research & uc berkeley

  2. agenda • three visions driving φ • building block: the PIER query engine • challenges, synergies

  3. vision 1: shift network security from medicine to public health • security tools focused on “medicine” • vaccines for viruses • improving the world one patient at a time • weakness/opportunity in the “public health” arena • public health: population-focused, community-oriented • epidemiology: incidence, distribution, and control in a population • φ: a new approach • enable population-wide measurement • engage end users: education and prevention • understand risky behaviors, at-risk populations

  4. a center for disease control? • [staniford/paxson/weaver 2002] • am I being targeted? • is this remote host a “bad guy”? • is there a new type of activity? • is there global-scale activity? • who owns the center? what do they control? • this will be unpopular at best • electronic privacy for individuals • the internet as “a broadly surveilled police state”? • dan geer, former CTO of @Stake • provider disincentives • transparency = maintenance cost • and hardly ubiquitous • can monitor the chokepoints (ISPs) • but inside intranets?? • e.g. corporate IT • e.g. berkeley dorms • e.g. grassroots WiFi agglomerations?

  5. energizing the end-users • endpoints are ubiquitous • internet, intranet, hotspot • toward a uniform architecture • end-users will help • populist appeal to home users is timely • enterprise IT can dictate endpoint software • differentiating incentives for endpoint vendors • the connection: peer-to-peer technology • harnessed to the good! • ease of use • built-in scaling • decentralization of trust and liability • p2p technology is ripe. a noble app here with significant uptake?

  6. demo time

  7. vision 2: shared network monitoring • endpoint monitoring becoming a trend • NETI@Home (GA Tech) • DIMES (TAU) • ForNet (Polytechnic) • DShield • DOMINO (Wisconsin) • we share the vision! • but all facing key challenges in getting uptake • what’s in it for the community members? • disincentives: privacy & security risks

  8. a communal approach • enable multiple efforts with a single distributed infrastructure • extensible endpoint “sensors” and visualizations • shared engine connecting them up • a group bands together on the hard systems and crypto • cost-effective data processing and analysis • verifiable data and processing • distributed resource limiting • toolkit of privacy-preserving, distributed dataflow components • a theme: dissemination is as important as collection • attract end-users with visible community information • enable real-time swapping across research teams • there may be much more here (see next vision!) • intel research is prepared to invest in this community • as we did with planetlab

  9. vision 3: the network oracle • imagine that you knew everything about the internet, at every moment • network maps • link loading • point-to-point latency and bandwidth • event detections (e.g., from firewalls) • naming (DNS, ASes, etc.) • end-system software configuration information • router configurations and routing tables • how would this change things? • the design of protocols • the design of networked applications • network and system management (performance and security) • the economy (and policy) of network clients and ISPs • etc.

  10. a dirty (not-so) secret • we’re sneaking up on the oracle already • overlays are a subversive attempt to wrest control from ISPs • overlays compute and disseminate measurements • measurement and functionality appetite growing • everybody’s favorite planetlab exercise: all-pairs ping • detour routing a la RON • custom routing a la i3/ROSE • but this is not being done systematically • every overlay does its own thing, opaquely • granularity of aggregation in time and space not well explored • measurement & dissemination often secondary/implicit • algorithmic/architectural choices abound, little exploration • and the brass ring remains…

  11. wrapping up: 3 visions • multiple rationales to pursue this agenda • commonalities • many networked sensors • many computational agents for data processing • many destinations for result dissemination • decentralized infrastructure: • organic scaling • no centralized maintenance • no single unified repository of raw data (privacy ramifications) • differences (invariably!) • desired data granularities, in time and space • “reach” of querying and dissemination • sensitivity to privacy issues • goal: a shared infrastructure • shared effort to develop and extend it, seeded by intel research • shared bootstrap deployment (planetlab and beyond)

  12. agenda • three visions driving φ • building block: the PIER query engine • challenges, synergies

  13. pier: p2p information exchange & retrieval • a wide-area distributed dataflow engine • designed to scale to thousands or millions of nodes • outfitted with “streaming” relational operators, recursive graph queries • fully extensible dataflow graphs, SQL-like interface for convenience • built on distributed hash table (DHT) overlays • a put()/get() hashtable interface for the Internet • content-based routing, soft-state semantics • pier is DHT-agnostic (CAN, Chord, Bamboo) • a very different design point than DB2, Oracle, etc. • scale = # machines, not necessarily # bytes • relaxed consistency a requirement (not really a dataBASE at all) • organic scaling • data lives in its natural habitat
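
The put()/get() interface with content-based routing and soft-state semantics named on this slide can be illustrated with a toy, single-process stand-in. This is only a sketch under stated assumptions, not PIER's or any DHT's actual API: the class name `ToyDHT`, the `ttl` and `now` parameters, and the modulo-hash placement are all hypothetical simplifications (real overlays such as CAN or Chord route in multiple hops).

```python
import hashlib
from collections import defaultdict


class ToyDHT:
    """Single-process stand-in for a DHT with soft-state put()/get().

    Hypothetical illustration only; not the PIER or Chord/CAN/Bamboo API.
    """

    def __init__(self, node_names):
        # One in-memory store per simulated node.
        self.nodes = {name: defaultdict(list) for name in node_names}
        self.ring = sorted(node_names)

    def owner(self, key):
        # Content-based routing: the key's hash alone decides which node
        # stores it. Modulo placement is an assumption made for brevity.
        digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.ring[digest % len(self.ring)]

    def put(self, key, value, ttl, now):
        # Soft state: each item carries an expiry time and vanishes
        # unless the publisher renews it with another put().
        self.nodes[self.owner(key)][key].append((value, now + ttl))

    def get(self, key, now):
        store = self.nodes[self.owner(key)]
        live = [(v, exp) for v, exp in store[key] if exp > now]
        store[key] = live  # lazily drop expired items
        return [v for v, _ in live]
```

A publisher that stops renewing its items simply disappears from query results once the time-to-live passes, which is one way to read "relaxed consistency" and "no centralized maintenance" in this setting.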

  14. initial pier applications • φ intrusion app • real-time snort aggregation from ~300 planetlab nodes • identification of top-10 attackers (validating DOMINO) • real time joins: “who are my attackers attacking” • plausible end-user visualizations • transitive closures and other graph algorithms • distributed gnutella crawler • distributed web crawler • shortest paths queries (distance vector routing) • improved filesharing for rare items • deployed as hybrid gnutella ultrapeer on 50 planetlab nodes • intercepts gnutella queries, identifies “rare” items and publishes them • 18% decrease in number of unnecessarily empty query results • 66% possible with better “rare item” identification • upshot: reasons to believe the generality is real
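
The two intrusion queries on this slide (top-10 attackers, and the join "who are my attackers attacking") can be sketched over in-memory data. This is a minimal illustration, assuming each node holds a list of hypothetical `(source, destination)` alert pairs; it is not the PIER query plan, and the function names are invented for the example.

```python
from collections import Counter


def local_counts(alerts):
    # Per-node partial aggregate: alerts seen per source address.
    return Counter(src for src, _dst in alerts)


def top_attackers(per_node_alerts, k=10):
    # Merge the per-node partial counts, then keep the k largest.
    total = Counter()
    for alerts in per_node_alerts.values():
        total.update(local_counts(alerts))
    # Sort by count (descending), then by source for deterministic ties.
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def who_are_my_attackers_attacking(per_node_alerts, me):
    # Self-join on source: find sources that hit `me`, then list the
    # other destinations those same sources were seen hitting.
    all_alerts = [a for alerts in per_node_alerts.values() for a in alerts]
    my_attackers = {src for src, dst in all_alerts if dst == me}
    return sorted(
        {(src, dst) for src, dst in all_alerts
         if src in my_attackers and dst != me}
    )
```

Counting locally and merging the partial counts is the design point worth noting: only small summaries, not raw alert logs, need to leave each node.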

  15. pier in the φ context • goal is for pier to serve as an information plane • gather data from “sensors” • perform basic filtering, aggregation, combination • though aggregation can be rather fancy (e.g. wavelet encoding) • disseminate the right “cooked” data to the right people • and do so in a “trusted” way • privacy and security • manageability • but … only a piece of the puzzle • active probing • mapping • backbone monitors • network forensics, tomography • honeypots • etc. • we won’t do all of this ourselves! • gathering playmates
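
The gather, filter, aggregate, disseminate flow described on this slide can be sketched as three small stages. This is an illustrative sketch only: the reading format (`site`, `metric`, `value`), the per-site mean as the aggregate, and the subscription map are all assumptions introduced for the example, not part of PIER.

```python
from collections import defaultdict


def filter_stage(readings, metric):
    # Keep only the sensor readings for one metric.
    return (r for r in readings if r["metric"] == metric)


def aggregate_stage(readings):
    # Reduce raw readings to a per-site mean ("cooked" data).
    totals = defaultdict(lambda: [0.0, 0])
    for r in readings:
        totals[r["site"]][0] += r["value"]
        totals[r["site"]][1] += 1
    return {site: s / n for site, (s, n) in totals.items()}


def disseminate(summary, subscriptions):
    # Send each subscriber only the sites they asked about.
    return {
        user: {site: summary[site] for site in sites if site in summary}
        for user, sites in subscriptions.items()
    }
```

For example, `disseminate(aggregate_stage(filter_stage(readings, "latency")), {"alice": ["a"]})` would hand `alice` only the mean latency for site `a`, never the raw readings.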

  16. agenda • three visions driving φ • building block: the PIER query engine • challenges, synergies

  17. [diagram; labels as extracted] Declarative Queries • Security • Privacy • Quality of Service • General Challenges • Overlay Network • Query Plan • Query Optimization • Multi-Query Optimization • Catalogs • Persistent Storage • Recursion on graphs • Physical Network • Query Dissemination • Replication • Soft-State • Quality of Service • Net-Embedded functions • Resilience • Route Flapping • Efficiency Challenges

  18. current limitations of pier • query per client • no systematic sharing of computation/results across queries • locality control forfeited to dht • difficult to express local gossiping rules • queries, not triggers • alerts currently supported via polling • loose query semantics • network dynamics and timing make guarantees hard • active monitoring • we can do it, but it’s not systematic • security/privacy • we’re attacking many of these now
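
The "queries, not triggers" limitation means an alert has to be emulated by re-running a query and watching for a change. A minimal sketch of that polling pattern follows; `run_query` and `predicate` are hypothetical callables supplied by the caller, and a real deployment would wait between rounds rather than loop immediately.

```python
def poll_alerts(run_query, predicate, rounds):
    """Emulate a trigger by polling: re-run a query each round and
    report only the rounds where the condition newly becomes true."""
    fired = []
    previously_true = False
    for i in range(rounds):
        result = run_query(i)
        now_true = predicate(result)
        if now_true and not previously_true:
            # Rising edge: the condition was false last round, true now.
            fired.append((i, result))
        previously_true = now_true
    return fired
```

Reporting only rising edges avoids repeating the same alert every round, but anything that happens and clears between two polls is missed, which is the cost of polling compared with a true trigger.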

  19. so, is pier the “right” infrastructure? • not today • though many of the decisions seem sound • level of indirection between task specification and execution • non-hierarchical model provides flexibility and simplicity • vs. domain hierarchy (a la ip naming) • vs. data hierarchies (a la xml) • extensible aggregation + relational operators cover a lot of territory • monitoring • routing

  20. potential synergies • design of shared info plane • scenarios & requirements • architectural brickbats • built-in components • complementary components • and requirements for integration • understanding the opportunity • what if the network oracle existed? • fostering the community • leveraging each other’s efforts to get mindshare • resources • if the intel genie granted you a wish… • (think about building/leveraging community)

  21. backup slides

  22. A Note on Structured Data on Networks • Industrial Revolution for Information • Mechanized data generation • Sensing the physical world • Monitoring software, networks, machines • Tracking objects, processes, behaviors • Uniformity of products • Mass Transport of Data and Computation • Data generators and consumers spread over the Internet and the Planet • Happening at both extremes • Compare to hand-generation of text
