System S – High-Performance Stream Computing Platform

System S –High-Performance Stream Computing Platform Olivier Verscheure IBM T.J. Watson Research Center

Outline • System S Overview • System S for Energy Trading • System S for Astronomy

Stream processing will be everywhere

Minimizing time to react Process data as it is continuously generated data Extracting and organizing information and intelligence Data Sources Stream Processing System What is Stream Processing? Database/data warehouse

Transaction processing Batch processing Without Stream Processing? Minimizing time to react Process data as it is continuously generated Database/data warehouse data Extracting and organizing information and intelligence Data Sources

Minimizing time to react Process data as it is continuously generated stream data packet Database/data warehouse data operator Extracting and organizing information and intelligence Sensor Network Stream Processing System What Makes a Stream Processing System?

Tooling Developer UI Composition UI Analyst UI Application Interconnection of operators Runtime Environment Job management, resource management, content routing, programming model, object store Hardware Platform Servers, networks, storage, operating system, file system What Makes a Stream Processing System? Stream Processing System

Event/Data Volume & Diversity High Volume Complex Analysis Time Sensitive Time Sensitivity Analysis Complexity Continuous Event and Stream Processing • Stream Processing enables… • high message/data rates, • low (msec-secs) latency, • advanced analysis • Today’s Complex Event Processing (CEP) solutions target… • 10K messages/sec, • secs-minutes latency, • rules-based analysis

System S Stream Processing • New stream computing paradigm • Pull information from anywhere in real time • Ultra-low latency, ultra-high throughput • Scalable

System S: A Closer Look Analytics may be a combination of provided and user-developed/legacy operators System S continually adapts to new inputs, new modalities • This notional System S application… • Calculates VWAP • Calculates P/E, based earningsfrom Edgar • Refines earnings based on encumbrances identified in newsfeeds System S applications can seamlessly process structured (event) and unstructured data

Correlate Transform Annotator Edge Adapters Segmenter Filter Classifier SPADE Building BlocksClassifiers, Annotators, Correlators, Filters, Aggregators

Application Programming Sink Adapters Operator Repository Source Adapters • Consumable • Reusable set of operators • Connectors to external static or streaming data sources and sinks MARIO: Automated Application Composition SPADE: Stream processing dataflow scripting language Platform Optimized Compilation

SPADE SPADE (Stream Processing Application Declarative Engine) is an intermediate language for streaming applications. Simplifies design of applications used by System S Hides complexities of manipulating data streams (e.g., contains generic language support for data types and building block operations) fanning out applications to distributed heterogeneous nodes transporting data through diverse computer infrastructures (ingesting external data, routing intermediate results, looping in feedback, branching, outputing the results, ...)

Basic Promises of SPADE SPADE is easy to use Programmers provide descriptions of stream-based data processing tasks using SPADE’s intermediate language SPADE’s query engine comes up with an execution plan, builds it, and hands it off to System S runtime for deployment SPADE enables rapid application development Customizable operators – do not require low-level coding Support for user defined operators and legacy code SPADE is high performance Optimized code generation

Functor Aggregate Source Sink A simple example [Application] SourceSink trace [Typedefs] typespace sourcesink typedef id_t Integer typedef timestamp_t Long [Program] // virtual schema declaration vstream Sensor (id : id_t, location : Double, light : Float, temperature : Float, timestamp : timestamp_t) // a source stream is generated by a Source operator – in this case tuples come from an input file stream SenSource( schemaFor(Sensor)) := Source( ) [ “file:///SenSource.dat” ] {} // this intermediate stream is produced by an Aggregate operator, using the SenSource stream as input stream SenAggregator ( schemaFor(Sensor) ) := Aggregate( SenSource<count(100),count(1)> ) [ id . location ] { Any(id), Any(location), Max(light), Min(temperature), Avg(timestamp) } // this intermediate stream is produced by a functor operator stream SenFunctor( id: Integer, location: Double, message: String ) := Functor( SenAggregator) [ log(temperature,2.0)>6.0 ] { id, location, “Node ”+toString(id)+ “ at location ”+toString(location) } // result management is done by a sink operator – in this case produced tuples are sent to a socket Null := Sink( SenFunctor) [ “cudp://192.168.0.144:5500/” ] {}

Performance optimization and scalability • Split/Aggregate/Join architectural pattern • High-ingest rate input stream must be split • Aggregate: model creation • Join: correlation • Operator Fusion • Fine-granularity operators • From small parts, make a big one that fits • Code generation • Actual code must match the underlying runtime environment • Number of cores • Interconnect characteristics • Architecture-specific instructions • Compiler-based optimization • Driven by automatic profiling • Driven by incremental learning of application characteristics Logical app view Physical app view

Operator Fusion - Illustration One PE per Operator Spade compiler can generate optimized operator grouping schemes Fuse all except sources and sinks A truly random partitioning

Operating System Transport System S Data Fabric X86 Box X86 Blade FPGA Blade X86 Blade Cell Blade X86 Blade X86 Blade X86 Blade X86 Blade X86 Blade System S Runtime Services Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation Runs on commodity hardware – from single node to blade centers to high performance multi-rack clusters Processing Element Container Processing Element Container Processing Element Container Processing Element Container Processing Element Container

Operating System Transport System S Data Fabric BG Node BG node BG node BG node BG node X86 Blade X86 Blade FPGA Blade X86 Blade X86 Blade Cell Blade X86 Blade X86 Blade X86 Blade X86 Blade System S Runtime Services Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation Adapts to changes in resources, workload, data rates Runs on commodity hardware – from single node to blade centers to high performance multi-rack clusters Processing Element Container Processing Element Container Processing Element Container Processing Element Container Processing Element Container Capable of exploiting specialized hardware

Site B Site B Site C Site C Site A Site A Distributed operation

Source Adapters Sink Adapters Operator Repository Automated, Optimized Composition (SPADE, MARIO) Automated, Optimized Deploy and Management (Scheduler) Advantages of Stream Processing as Parallelization Model • Streams as first class entity • Explicit task and data parallelism • More intuitive approach to multi-core exploitation • Automated composition • Query optimization over well-known operators • Inquiry optimization using semantic tagging of operators and data sources • Operator and data source profiling for better resource management • Reuse of operators across stored and live data • MapReduce is similar programming model with storage as transport

System S Pilots Trading Advantage Manufacturing Improving the quality semi-conductor wafers with dynamic manufacturing tools tuning Identification of and response to opportunities in real-time market data 2.5K events / sec 10 msec latency 5 Million events / sec Millisecond latency Semiconductor Solutions Astrophysics Government World’s largest and first fully digital radio observatory for astrophysics, space and earth sciences, and radio research Detect & respond to phenomena based on large volumes of structured and unstructured information 1.5 Million events / sec

Sneak Preview: IBM InfoSphere Streams Applications Data Warehouse Business Process Management Business Intelligence

Outline • System S Overview • System S for Energy Trading • System S for Astronomy

The Energy Trading Scenario using Stream Computing • Sample application showing power of Stream Computing • Only one of many possible applications/services • Weather conditions and events drive pricing of energy futures • natural events interfering with energy supply • announcements, news stories, … • Energy traders today struggle to integrate info from multiple sources • cannot get it in real time, to inform their trade decisions • they see 8 screens, integrate manually via a spreadsheet • IBM Stream Computing assembles, deploys applications • integrates diverse sources of data • provides timely correlations, analyses

Illustrative Example: Fear and Opportunity in the Gulf • News Flash: • Hurricane Dean Upgraded to Category 5 • Path Projected through Gulf • Oil Stocks Uniformly Down • If you saw it coming, because you watched for more primal data… • Like hurricane path predictions from NOAA • Or even weather satellite and/or sensor data • Real-time equities trade data • If you’ve been accumulating intelligence on the location (and value) of company assets that could be in the path… • If you could apply such analysis before the news cycle… • You could take advantage – in both directions… Same story, viewed 2 hours later. The story’s the same, but… …the live tickers tell a different story Oil company stocks down, based on early fears. Properties with significant assets in path are still down More affected companies have been identified(a bit late, no?) Others are up, showing recovery even beforethe storm hits

Recommendations Based on Hurricane Forecast Capture market data (high volume) Compute portfolio market indicators (low latency) Correlate combined risk and trade VWAP to determine buy/sell recommendations Make recommendations and notify DHTML Result rendering Capture weather sensor data, analyses hurricane predicted path Dynamically updated risk assessment for assets in projected path Estimate impact on portfolios Real-time projections of hurricane path Web Zero platform System S platform

Outline • System S Overview • System S for Energy Trading • System S for Astronomy • Past & current projects

Past Projects • Outlier detection from single tripole • Decomposing combined DOA’s from single tripole • SPADE UDOP’s • Linking against Lapack and Blas libraries • About 50 non-trivial processing elements • Being optimized by SPADE team now • Convolutional resampling (tConvolve) on System S • Mostly built-in operators (soon built-in operators only) • Fully parametrizable using Perl; e.g., # of w planes • Does scale very well!

Outlier detection from single tripole • Receive 3D electric field • Demultiplex 3D electric field • Each UDP packet contains multiplexed electric fields • Compute intensity • I(t)=|Ex(t)|2+|Ey(t)|2+|Ez(t)|2 • Detect outliers • Outlier detected if: mI(t-N:t-1) + T.I(t-N:t-1)  I(t)  mI(t-N:t-1) - T.I(t-N:t-1) • Visualize detected outliers in Matlab in real-time

Field intensity Windowed statistics |Ex|2 ^2 Aggregate Avg 10c,1c Avg mI2 Data Demux |Ey|2 Barrier Seq,  Barrier mI Sqrt(mI2-mI2) UDP Source I mI Aggregate Avg 10c,1c Avg mI, I |Ez|2 mI-T.I mI+T.I U, L Filter out Seq<10 UDP sink Filter out empty lists UIL? Barrier I,{U,L} File sink Outlier detection SPADE Flow of Operators

Field intensity |Ex|2 Data Demux |Ey|2 Barrier Seq,  Source I |Ez|2 Field Intensity

Source Data Demux |Ez|2 |Ey|2 |Ex|2 Barrier Seq,  Field intensity I

Windowed statistics ^2 Aggregate Avg 10c,1c Avg mI2 Barrier mI Sqrt(mI2-mI2) mI Aggregate Avg 10c,1c Avg mI, I mI-T.I mI+T.I U, L Windowed Statistics

Aggregate Avg 10c,1c ^2 Avg Aggregate Avg 10c,1c Avg mI2 mI Barrier mI-T.I mI+T.I mI Sqrt(mI2-mI2) Windowed statistics U, L mI, I

Filter out Seq<10 UDP sink Filter out empty lists UIL? Barrier I,{U,L} File sink Outlier detection Outlier Detection Intensity U, L

U, L Outlier detection Barrier I,{U,L} UIL? Filter out empty lists Filter out Seq<10 Intensity UDP sink File sink

Decomposing combined DOA’s from single tripole

Direction of Arrival (DOA) • Simple case • Two orthogonal waves

Pseudo-Orbital MomentumTwo-wave case, same frequencies

Pseudo-Orbital MomentumTwo-wave case, same phases

System S – High-Performance Stream Computing Platform