IFLOW: Self-managing distributed information flows Brian Cooper Yahoo! Research Joint work with colleagues at Georgia Tech: Vibhore Kumar, Zhongtang Cai, Sangeetha Seshadri, Greg Eisenhauer, Karsten Schwan and others
Overview • Motivation • Case study: inTransit • Architecture • Flow graph deployment/reconfiguration • Experiments • Other aspects of the system
Motivation • Lots of data produced in lots of places • Examples: operational information systems, scientific collaborations, end-user systems, web traffic data
Airline example • Diagram: an airline operational information system. Data sources (flights arriving, flights departing, bags scanned, customers check-in, weather updates, catering updates, FAA updates) feed displays (concourse display, gate display, baggage display, home user display) that support tasks such as shopping for flights, rebooking missed connections, and checking seats
Previous solutions • Tools for managing distributed updates • Pub/sub middleware • Transaction Processing Facilities • In-house solutions • Times have changed • How to handle larger data volumes? • How to seamlessly incorporate new functionality? • How to effectively prioritize service? • How to avoid hand-tuning the system?
Approach • Provide a self-managing distributed data flow graph • Diagram: weather, flight, and check-in data flow through operators that select ATL data, correlate flights and reservations, predict delays, and generate customer messages for delivery to a terminal or the web
Approach • Deploy operators in a network overlay • Middleware should self-manage this deployment • Provide necessary performance, availability • Respond to business-level needs
IFLOW • Two example flow-graph specifications:
    AirlineFlowGraph {
      Sources        -> {FLIGHTS, WEATHER, COUNTERS}
      Sinks          -> {DISPLAY}
      Flow-Operators -> {JOIN-1, JOIN-2}
      Edges          -> {(FLIGHTS, JOIN-1), (WEATHER, JOIN-1), (JOIN-1, JOIN-2), (COUNTERS, JOIN-2), (JOIN-2, DISPLAY)}
      Utility        -> [Customer-Priority, Low Bandwidth Utilization]
    }
    CollaborationFlowGraph {
      Sources        -> {Experiment}
      Sinks          -> {IPaq, X-Window, Immersadesk}
      Flow-Operators -> {Coord, DistBond, RadDist, CoordBond}
      Edges          -> {(Experiment, Coord), (Coord, DistBond), (DistBond, RadDist), (DistBond, RadDist), (RadDist, IPaq), (CoordBond, ImmersaDesk), (CoordBond, X-Window)}
      Utility        -> [Low-Delay, Synchronized-Delivery]
    }
• Diagram: the airline graph joins FLIGHTS, WEATHER, and COUNTERS streams for an overhead display; the collaboration graph feeds a molecular dynamics experiment through coordinate, bond, and radial-distance operators to ImmersaDesk, X-Window, and iPaq clients, all running over the IFLOW middleware [ICAC ’06]
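For concreteness, a minimal Python sketch (hypothetical, not the actual IFLOW API) of how the AirlineFlowGraph specification above could be represented in memory before deployment:

    # Hypothetical in-memory representation of a flow-graph specification;
    # IFLOW's real interfaces may differ.
    from dataclasses import dataclass

    @dataclass
    class FlowGraph:
        sources: list       # stream producers
        sinks: list         # stream consumers
        operators: list     # flow operators (e.g., joins)
        edges: list         # (from, to) pairs forming the graph
        utility: list       # business-level objectives guiding deployment

    airline = FlowGraph(
        sources=["FLIGHTS", "WEATHER", "COUNTERS"],
        sinks=["DISPLAY"],
        operators=["JOIN-1", "JOIN-2"],
        edges=[("FLIGHTS", "JOIN-1"), ("WEATHER", "JOIN-1"),
               ("JOIN-1", "JOIN-2"), ("COUNTERS", "JOIN-2"),
               ("JOIN-2", "DISPLAY")],
        utility=["Customer-Priority", "Low Bandwidth Utilization"],
    )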
Case study • inTransit • Query processing over distributed event streams • Operators are streaming versions of relational operators
Architecture • Diagram: the inTransit distributed stream management infrastructure built on IFLOW. Queries enter the application layer (data-flow parser, flow-graph control), which sits above the middleware layer (ECho pub-sub, Stones, PDS) and the underlay messaging layer [ICDCS ’05]
Application layer • Applications specify data flow graphs • Can specify directly • Can use an SQL-like declarative language, for example:
    STREAM N1.FLIGHTS.TIME, N7.COUNTERS.WAITLISTED, N2.WEATHER.TEMP
    FROM N1.FLIGHTS, N7.COUNTERS, N2.WEATHER
    WHEN N1.FLIGHTS.NUMBER='DL207'
      AND N7.COUNTERS.FLIGHT_NUMBER=N1.FLIGHTS.NUMBER
      AND N2.WEATHER.LOCATION=N1.FLIGHTS.DESTINATION;
• Diagram: the query is parsed into a flow graph with a select on flight 'DL207' and joins over sources N1, N2, and N7, delivered to sink N10 (sketch below)
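As referenced above, a hedged sketch of the flow graph that a data-flow parser might produce from this query, reusing the hypothetical FlowGraph structure from the earlier sketch (operator names such as SELECT-DL207 are illustrative, not IFLOW's):

    query_graph = FlowGraph(
        sources=["N1.FLIGHTS", "N2.WEATHER", "N7.COUNTERS"],
        sinks=["N10"],
        operators=["SELECT-DL207", "JOIN-1", "JOIN-2"],
        edges=[("N1.FLIGHTS", "SELECT-DL207"), ("SELECT-DL207", "JOIN-1"),
               ("N2.WEATHER", "JOIN-1"), ("JOIN-1", "JOIN-2"),
               ("N7.COUNTERS", "JOIN-2"), ("JOIN-2", "N10")],
        utility=[],
    )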
Middleware layer • ECho – pub/sub event delivery • Event channels for data streams • Native operators • E-code for most operators • Library functions for special cases • Stones – operator containers • Queues and actions • Diagram: a join operator consumes events from Channels 1 and 2 and publishes results on Channel 3 (sketch below)
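As a rough illustration of the "queues and actions" idea, a minimal Python sketch of an operator container whose action is a symmetric hash join of two event channels (hypothetical; JoinStone, on_left, and on_right are not the ECho/Stones API):

    # A stone-like container: buffer events from two channels and emit a
    # joined event whenever the key fields match.
    from collections import defaultdict

    class JoinStone:
        def __init__(self, key_left, key_right, publish):
            self.key_left, self.key_right = key_left, key_right
            self.left, self.right = defaultdict(list), defaultdict(list)
            self.publish = publish            # callback to the output channel

        def on_left(self, event):
            k = event[self.key_left]
            self.left[k].append(event)
            for other in self.right.get(k, []):
                self.publish({**event, **other})   # emit joined event

        def on_right(self, event):
            k = event[self.key_right]
            self.right[k].append(event)
            for other in self.left.get(k, []):
                self.publish({**other, **event})

    # Usage: join FLIGHTS and WEATHER events on destination/location.
    stone = JoinStone("DESTINATION", "LOCATION", publish=print)
    stone.on_left({"NUMBER": "DL207", "DESTINATION": "ATL"})
    stone.on_right({"LOCATION": "ATL", "TEMP": 72})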
Middleware layer • PDS – resource monitoring • Nodes update PDS with resource info (e.g., CPU load) • inTransit notified when conditions change
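A minimal Python sketch, under stated assumptions, of threshold-based resource monitoring in the spirit of PDS (ResourceMonitor, update, and on_change are illustrative names, not the PDS interface):

    # Nodes report CPU load; a callback fires when a node's load crosses a
    # configured threshold, which could trigger reconfiguration downstream.
    class ResourceMonitor:
        def __init__(self, threshold, on_change):
            self.threshold = threshold
            self.on_change = on_change
            self.loads = {}

        def update(self, node, cpu_load):
            was_over = self.loads.get(node, 0.0) > self.threshold
            self.loads[node] = cpu_load
            if (cpu_load > self.threshold) != was_over:
                self.on_change(node, cpu_load)   # condition changed

    monitor = ResourceMonitor(threshold=0.8,
                              on_change=lambda n, l: print(f"{n} changed: {l:.2f}"))
    monitor.update("N7", 0.35)    # below threshold, no notification
    monitor.update("N7", 0.92)    # crosses threshold, notification fires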
Flow graph deployment • Where to place operators?
Flow graph deployment • Where to place operators? • Basic idea: cluster physical nodes
Flow graph deployment • Partition flow graph among coordinators • Coordinators represent their cluster • Exhaustive search among coordinators • Diagram: the 'DL207' query graph with its select and join operators not yet assigned to the clusters containing N1, N2, N7, and N10
Flow graph deployment • Coordinator deploys subgraph in its cluster • Uses exhaustive search to find the best deployment (sketch below)
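As referenced above, a minimal Python sketch of exhaustive-search placement within a cluster (the cost function and node names are assumptions; the real deployment algorithm evaluates delay, bandwidth, and utility):

    # Enumerate every assignment of operators to cluster nodes and keep the
    # one with the lowest estimated cost. Feasible only for small clusters,
    # which is why deployment is partitioned among coordinators first.
    from itertools import product

    def best_placement(operators, nodes, cost):
        """cost(mapping) estimates e.g. delay/bandwidth for a given mapping."""
        best, best_cost = None, float("inf")
        for assignment in product(nodes, repeat=len(operators)):
            mapping = dict(zip(operators, assignment))
            c = cost(mapping)
            if c < best_cost:
                best, best_cost = mapping, c
        return best, best_cost

    # Hypothetical usage: place two joins on a three-node cluster,
    # preferring to spread operators across nodes.
    mapping, c = best_placement(
        ["JOIN-1", "JOIN-2"], ["n1", "n2", "n3"],
        cost=lambda m: 1.0 if m["JOIN-1"] == m["JOIN-2"] else 0.5)
    print(mapping, c)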
Flow graph reconfiguration • Resource or load changes trigger reconfiguration • Clusters reconfigure locally • Large changes require inter-cluster reconfiguration
Hierarchical clusters • Coordinators themselves are clustered • Coordinators form a hierarchy • May need to move operators between clusters • Handled by moving up a level in the hierarchy
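A hypothetical Python sketch of how a reconfiguration request might escalate up the coordinator hierarchy when a cluster cannot handle it locally (Coordinator, try_local, and reconfigure are illustrative names, not the IFLOW code):

    # Each coordinator first tries to repair the placement within its own
    # cluster; if that fails, it moves the problem one level up.
    class Coordinator:
        def __init__(self, name, capacity, parent=None):
            self.name, self.capacity, self.parent = name, capacity, parent

        def try_local(self, demand):
            # Stand-in for the local exhaustive search over the cluster.
            return self.capacity >= demand

        def reconfigure(self, demand):
            if self.try_local(demand):
                return f"handled locally by {self.name}"
            if self.parent is not None:
                return self.parent.reconfigure(demand)   # move up a level
            return "best effort at the root"

    root = Coordinator("root", capacity=100)
    leaf = Coordinator("cluster-A", capacity=10, parent=root)
    print(leaf.reconfigure(demand=40))   # escalates to the root coordinator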
What do we optimize? • Basic metrics • Bandwidth used • End-to-end delay • Autonomic metrics • Business value • Infrastructure cost [ICAC ’05]
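One way to picture the autonomic metrics is a net-utility function that trades business value against infrastructure cost; the following Python sketch uses illustrative weights and a made-up model, not the formulation from the ICAC '05 paper:

    # Net utility: value earned by delivered events, discounted by delay,
    # minus a bandwidth-proportional infrastructure cost.
    def utility(events_per_sec, delay_s, bandwidth_mbps,
                value_per_event=0.01, delay_penalty=0.5, cost_per_mbps=0.002):
        business_value = events_per_sec * value_per_event / (1.0 + delay_penalty * delay_s)
        infrastructure_cost = bandwidth_mbps * cost_per_mbps
        return business_value - infrastructure_cost

    # Example: compare two candidate deployments of the same flow graph.
    print(utility(1000, delay_s=0.2, bandwidth_mbps=50))
    print(utility(1000, delay_s=1.5, bandwidth_mbps=20))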
Experiments • Simulations • GT-ITM transit/stub Internet topology (128 nodes) • NS-2 to capture trace of delay between nodes • Deployment simulator reacts to delay • OIS case study • Flight information from Delta airlines • Weather and news streams • Experiments on Emulab (13 nodes)
Approximation penalty • Results chart: flow graphs on the simulator
Impact of reconfiguration • Results chart: 10-node flow graph on the simulator
Impact of reconfiguration • Results chart: 2-node flow graph on Emulab, showing reconfiguration under network congestion and increased processor load
Different utility functions • Results chart: simulator, 128-node network
Query planning • We can optimize the structure of the query graph • A different join order may enable a better mapping • But there are too many plan/deployment possibilities to consider • Use the hierarchy for planning • Plus: stream advertisements to locate sources and deployed operators • Planning algorithms: top-down, bottom-up [IPDPS ‘07]
Planning algorithms • Top down: start from the full plan A ⋈ B ⋈ C ⋈ D and recursively split it into subplans (A ⋈ B and C ⋈ D) until the individual sources A, B, C, D are reached
Planning algorithms • Bottom up: start from the sources A, B, C, D, form small joins such as A ⋈ B near the data, and combine them upward into the full plan A ⋈ B ⋈ C ⋈ D (sketch below)
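As referenced above, a minimal Python sketch of bottom-up planning as a greedy pairwise merge driven by an estimated join cost (illustrative only; the IPDPS '07 algorithms also use the coordinator hierarchy and stream advertisements):

    # Greedily combine the cheapest pair of subplans until one plan covers
    # the whole query.
    from itertools import combinations

    def bottom_up_plan(sources, join_cost):
        """join_cost(a, b) estimates the cost of joining two subplans."""
        plans = [(s,) for s in sources]           # one subplan per source
        while len(plans) > 1:
            a, b = min(combinations(plans, 2), key=lambda p: join_cost(*p))
            plans.remove(a); plans.remove(b)
            plans.append(a + b)                   # merged subplan
        return plans[0]

    # Hypothetical usage: a toy cost that prefers joining similar-sized subplans.
    order = bottom_up_plan(["A", "B", "C", "D"],
                           join_cost=lambda x, y: abs(len(x) - len(y)))
    print(order)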
Query planning • Results chart: 100 queries, each over 5 sources, on a 64-node network
Availability management • Goal is to achieve both: • Performance • Reliability • These goals often conflict! • Spend scarce resources on throughput or availability? • Manage tradeoff using utility function
Fault tolerance • Basic approach: passive standby • Log of messages can be replayed • Periodic “soft-checkpoint” from active to standby • Performance versus availability (fast recovery) • More soft-checkpoints = faster recovery, higher overhead • Choose a checkpoint frequency that maximizes utility (sketch below) • Diagram: when the active join operator fails, the standby replica takes over and replays logged messages [Middleware ’06]
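As referenced above, a Python sketch of choosing a soft-checkpoint interval by maximizing a simple utility that penalizes both checkpoint overhead and expected replay time (the weights and model are illustrative assumptions, not the Middleware '06 formulation):

    # Shorter intervals mean more checkpoint overhead but less log to replay
    # after a failure; pick the interval with the highest combined utility.
    def availability_utility(interval_s, overhead_per_ckpt=0.1, failure_weight=0.01):
        checkpoint_overhead = overhead_per_ckpt / interval_s    # cost of frequent checkpoints
        expected_replay = failure_weight * interval_s / 2       # replay ~half an interval on average
        return -(checkpoint_overhead + expected_replay)         # higher is better

    best = max([0.5, 1, 2, 5, 10, 30], key=availability_utility)
    print(f"best checkpoint interval: {best}s")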
Proactive fault tolerance • Goal: predict system instability
IFLOW beyond inTransit • Diagram: IFLOW provides self-managing information flows not only for inTransit but also for science applications, pub/sub, and other complex infrastructure
Related work • Stream data processing engines • STREAM, Aurora, TelegraphCQ, NiagaraCQ, etc. • Borealis, TRAPP, Flux, TAG • Content-based pub/sub • Gryphon, ARMADA, Hermes • Overlay networks • P2P • Multicast (e.g. Bayeux) • Grid • Other overlay toolkits • P2, MACEDON, GridKit
Conclusions • IFLOW is a general information flow middleware • Self-configuring and self-managing • Based on application-specified performance and utility • inTransit distributed event management infrastructure • Queries over streams of structured data • Resource-aware deployment of query graphs • IFLOW provides utility-driven deployment and reconfiguration • Overall goal • Provide useful abstractions for distributed information systems • Implementation of abstractions is self-managing • Key to scalability, manageability, flexibility
For more information • http://www.brianfrankcooper.net • cooperb@yahoo-inc.com