PIER & PHI: Overview of Challenges & Opportunities
Ryan Huebsch†, Joe Hellerstein†°, Boon Thau Loo†, Sam Mardanbeigi†, Scott Shenker†‡, Ion Stoica†
p2p@db.cs.berkeley.edu
†UC Berkeley, CS Division ‡International Computer Science Institute, Berkeley CA °Intel Research Berkeley
STREAM DAY 5/7/04
PIER
• P2P Information Exchange & Retrieval
• A wide-area distributed dataflow engine
• Outfitted with relational operators
• Designed to scale to thousands or millions of nodes
• Motivation:
  • It’s an interesting challenge
  • Lowers the barrier to entry for large-scale applications
  • No need for massive server-farm infrastructure
  • Cost is distributed among participants
  • Provides a viable solution where other options are not socially acceptable
• We are NOT trying to be better than other (centralized) solutions; we are trying to be different.
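The slide says PIER is a distributed dataflow engine with relational operators but not how an operator such as GROUP BY spreads across nodes. The sketch below shows one common approach for engines built on an overlay network: each node pre-aggregates its own tuples, then routes each partial result to the node that "owns" the group key on a consistent-hashing ring. The `Ring` class, node names, and SHA-1 ring are illustrative assumptions, not PIER's actual code or API.

```python
import bisect
import hashlib
from collections import Counter, defaultdict


def ring_id(value: str) -> int:
    """Map a string onto a 160-bit identifier ring using SHA-1 (Chord-style assumption)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring standing in for the overlay network."""

    def __init__(self, node_names):
        self._points = sorted((ring_id(n), n) for n in node_names)
        self._ids = [point_id for point_id, _ in self._points]

    def owner(self, key: str) -> str:
        """Return the key's successor: the first node clockwise from its identifier."""
        i = bisect.bisect_left(self._ids, ring_id(key))
        return self._points[i % len(self._points)][1]  # wrap around the ring


def distributed_group_count(ring, local_tables):
    """GROUP BY key, COUNT(*) over tuples that live on many nodes.

    Each node counts its own rows locally, then "sends" each partial count
    to the owner of that key, which merges everything it receives.
    """
    inboxes = defaultdict(Counter)  # owner node -> merged partial counts
    for node, rows in local_tables.items():
        partial = Counter(rows)  # local pre-aggregation on `node`
        for key, count in partial.items():
            inboxes[ring.owner(key)][key] += count  # simulated network send
    return dict(inboxes)
```

Pre-aggregating locally means each node ships one message per distinct key rather than one per tuple, which matters when nodes sit on slow links.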
Challenges
(Figure: a layered stack of Declarative Queries, Query Plan, Overlay Network, and Physical Network, annotated with challenges.)
• General challenges: Security, Privacy, Quality of Service
• Challenges along the stack: Query Optimization, Multi-Query Optimization, Catalogs, Persistent Storage, Recursion, Query Dissemination, Replication, Soft-State, Quality of Service, Resilience, Route Flapping, Efficiency
Applications & Requirements
• File sharing
  • Flooding works for popular items
  • Need something better for rare items
  • May want ‘triggers’ when a new item matches an old search
• Network Monitoring
  • Aggregation & grouping very common
  • Continuous queries with well-defined semantics
• PHI is one use of PIER…
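The file-sharing requirement of a ‘trigger’ that fires when a new item matches an old search can be sketched as an index of stored searches that every newly published item is checked against. This is a hypothetical single-process illustration: `TriggerIndex`, the keyword tokenizer, and the all-keywords-must-match rule are assumptions. In a DHT-based system each keyword's list of searches might instead live on the node owning that keyword's hash.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class StandingSearch:
    keywords: Set[str]              # every keyword must appear in a new item's name
    notify: Callable[[str], None]   # called with the matching item's name


class TriggerIndex:
    """Hypothetical index of stored searches, checked whenever an item is published."""

    def __init__(self) -> None:
        self._by_keyword: Dict[str, List[StandingSearch]] = {}

    def register(self, keywords, notify) -> StandingSearch:
        search = StandingSearch({k.lower() for k in keywords}, notify)
        for k in search.keywords:
            self._by_keyword.setdefault(k, []).append(search)
        return search

    @staticmethod
    def terms(name: str) -> Set[str]:
        # Crude tokenizer: lowercase, treat '.' and '_' as separators.
        return set(name.lower().replace(".", " ").replace("_", " ").split())

    def publish(self, name: str) -> List[StandingSearch]:
        terms = self.terms(name)
        # Only searches sharing at least one keyword with the item are candidates.
        candidates = {id(s): s for t in terms for s in self._by_keyword.get(t, [])}
        fired = []
        for search in candidates.values():
            if search.keywords <= terms:  # conjunctive match
                search.notify(name)
                fired.append(search)
        return fired
```

Indexing searches by keyword keeps the work per new item proportional to the searches that could plausibly match, rather than to every stored search.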
PHI
• Public Health for the Internet
• Community-based monitoring
• The metaphor:
  • Old way: treat computers with medicine (virus protection)
  • New way: monitor the community, like the Center for Disease Control
  • A global CDC has social implications: central repository, privacy, who controls it, who pays for it…
  • PHI wants to create the Center for Disease Control without the Center (of control)
• Motivation is to inform users about the dangers of the Internet
PHI Example
• PIER is currently deployed on 150-300 PlanetLab nodes
  • ~100 sites
  • Some nodes on DSL, 1 Mbps, 10 Mbps, etc.
  • Very unreliable
• SNORT is the primary data source
  • ~2400 rules
  • Tens to thousands of tuples per day per node
  • Schema: time, rule, source socket, destination socket
• Quick Demo:
  • Shows the top ten sources of events across all of PlanetLab (live), i.e., who are the bad guys?
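The demo's "top ten sources of events" query over the schema (time, rule, source socket, destination socket) can be sketched as a global count followed by a top-k. This is an illustrative sketch, not the demo's implementation: the `Event` type, the "host:port" socket format, and grouping by source host rather than full socket are assumptions. Note that merging only each node's local top ten would be approximate; a source seen moderately often everywhere could be missed, so the sketch merges full partial counts.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    """One tuple in the slide's schema; field names are hypothetical."""
    time: float
    rule: int
    src: str  # source socket, assumed to be "host:port"
    dst: str  # destination socket, assumed to be "host:port"


def source_host(socket: str) -> str:
    """Drop the port so repeated connections from one host group together (assumption)."""
    return socket.rsplit(":", 1)[0]


def local_counts(events: Iterable[Event]) -> Counter:
    """Per-node partial aggregate: number of events per source host."""
    return Counter(source_host(e.src) for e in events)


def top_sources(per_node: Dict[str, Iterable[Event]], k: int = 10) -> List[Tuple[str, int]]:
    """Exact global top-k: merge every node's full partial counts, then rank."""
    total = Counter()
    for events in per_node.values():
        total.update(local_counts(events))
    return total.most_common(k)
```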
What’s next…
• PIER
  • Lots of problems, including the meta-problem of which problem to work on
  • No streaming semantics, no language to describe windows, etc.
  • Additional challenges: interaction with soft-state, no synchronized clocks, unknown (changing) network latencies
• PHI
  • Create a complete application
  • Get intrusion data from a variety of sources (including the built-in Windows Firewall)
  • Develop a snazzy visualization
  • Release to the world, first using PlanetLab as the query processor, eventually the world
  • Scale to at least tens of thousands of nodes and explore the design space
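To make "a language to describe windows" concrete, here is a minimal sketch of one possible window semantic, tumbling (fixed, non-overlapping) windows, applied to (timestamp, source) pairs. It is an assumption for illustration only; the slides do not define PIER's window semantics. Because nodes lack synchronized clocks, each node stamps events with its own clock, and the hypothetical `clock_offset` parameter stands in for whatever skew correction a real system might estimate.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_counts(
    events: Iterable[Tuple[float, str]],
    width: float,
    clock_offset: float = 0.0,
) -> Dict[int, Counter]:
    """Count events per source in tumbling windows of `width` seconds.

    Each event is (local_timestamp_seconds, source). Window k covers
    [k * width, (k + 1) * width) after adding the hypothetical clock_offset.
    """
    if width <= 0:
        raise ValueError("window width must be positive")
    windows: Dict[int, Counter] = defaultdict(Counter)
    for t, source in events:
        window_index = int((t + clock_offset) // width)
        windows[window_index][source] += 1
    return dict(windows)
```

Even this simple semantic shows the clock problem: two nodes with skewed clocks can place the same real-world moment in different windows, so merged window results are only as consistent as the offsets.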