Best Practices and Design Patterns for Pushing Fast Data to the Edge
Prasen Palvankar, Director Product Management
Oracle Event Processing Applications
Transportation & Logistics
• Container Tracking
• Vehicle Management
• Passenger Alerts
Financial Services
• Algorithmic Trading
• Fraud Detection
• Risk Management
Telecommunications
• WiFi Off-Loading
• Video Analytics
• Network Management
Public Sector & Healthcare
• Safe Cities
• Medical Device Monitoring
• Military Asset Allocation
Utilities & Oil and Gas
• Outage Intelligence
• Workforce Management
• Real-time Drilling Analysis
Manufacturing & Retail
• Smart Mall
• Quality Control
• Building Management
Why Use Event Processing Infrastructure?
The application meets one or more of the following conditions:
• Requires high-throughput, low-latency processing.
• Has continuously streaming data.
• Needs real-time correlation between multiple incoming data sources.
• Needs time-sensitive alerts, aggregations and calculations.
• Needs to look for patterns in the data stream.
• Does not need to store data that has nothing of interest in it.
• Poses a problem that is more easily solved by analyzing the data before storing it in a database.
Oracle Event Processing (OEP)
• High-Volume, Low-Latency Event Processing Infrastructure
• Time-Sensitive Processing & Pattern Matching
• Light-weight Java Application Server
• Deploy in Data Center or Distributed Locations
Time Management & Pattern Matching
• Pattern Matching
• Continuous Query Language (CQL)
• Detect Absence of Events & Missing Events
  • Event “A” NOT followed by Event “B” within 10 minutes
  • After Event “A”, Event “B” should occur next, but Event “C” occurs instead
Oracle Event Processing Application (OSGi Bundle/Spring Application Context)
[Diagram: input adapters, channels, CQL processors and output adapters, with a database and Coherence]
• Input adapters connect to data sources.
• Channels help control the flow of data and can be tuned for optimal performance.
• CQL processors contain correlation, aggregation and pattern matching business logic.
• Databases and Coherence caches can be referenced directly in CQL processors.
• Output adapters send data and alerts to downstream systems and business processes.
Event Processing Network (EPN)
[Diagram: Adapter, Channel, Processor, Cache, Event Bean, Event Sink]
• Adapters: Receive Data From Input Sources and Convert It To Events
• Channels: Buffer and Control Flow of Events (Sync/Async & Number of Threads)
• Processors: Perform Complex Time and/or Pattern Matching Logic
• Cache: Low-Latency, Fault-Tolerant Data Store / Perform Any Grid Operation
• Event-Beans/Sinks: Perform Any Java Logic
Event Type Repository
• Define Java Objects as Event Types
• Define Events in XML
Adapters Specify a provided adapter and use optional converter class
Adapters Easily create your own adapter in a few lines of Java
Channels
• Declare Channels with an Event Type
• Specify Sources and/or Listeners
• Optional Channel Configuration
Processors Contain CQL Queries
Processors Use CQL Views to Process Data
CQL Operators
Time-Based Stream-to-Relation Window Operators
[NOW]: This time-based range window outputs an instantaneous relation. The smallest granularity of time in OEP is the nanosecond, so all of these events expire 1 nanosecond later.
CQL Operators
Time-Based Stream-to-Relation Window Operators
RANGE: Holds the events in a time window for the specified period of time. By default, the range time unit is the second, so S[range 1] is equivalent to S[range 1 second]. Optionally, use the SLIDE operator to control the output, so that it is only sent at the specified time interval. [slide 30 seconds] means the user wants to look at the output every 30 seconds.
RANGE UNBOUNDED: Events remain in the window indefinitely.
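The RANGE behavior described above can be imitated in plain Java to make the eviction rule concrete. This is a minimal sketch, not the OEP engine: the class name `RangeWindowSum` and its methods are hypothetical, timestamps are assumed to arrive in non-decreasing order, an event is assumed to expire once it is a full range older than the newest event, and SLIDE is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Plain-Java sketch of a time-based RANGE window that keeps a running sum. */
class RangeWindowSum {
    /** One event held in the window (hypothetical shape: timestamp plus amount). */
    private static final class Entry {
        final long timestampMillis;
        final long amount;

        Entry(long timestampMillis, long amount) {
            this.timestampMillis = timestampMillis;
            this.amount = amount;
        }
    }

    private final long rangeMillis;
    private final Deque<Entry> window = new ArrayDeque<>();
    private long sum;

    RangeWindowSum(long rangeMillis) {
        this.rangeMillis = rangeMillis;
    }

    /** Adds an event, evicts expired events, and returns the sum over the window. */
    long onEvent(long timestampMillis, long amount) {
        window.addLast(new Entry(timestampMillis, amount));
        sum += amount;
        // Assumption: an event leaves the window once it is rangeMillis (or more)
        // older than the newest event.
        while (timestampMillis - window.peekFirst().timestampMillis >= rangeMillis) {
            sum -= window.removeFirst().amount;
        }
        return sum;
    }

    /** Number of events currently held in the window. */
    int size() {
        return window.size();
    }
}
```

With a range of 1000 ms, events at 0, 500 and 1000 ms leave only the last two in the window, because the first one has aged out by the time the third arrives.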
CQL Operators
Row-Based Stream-to-Relation Window Operators
ROWS: Holds the specified number of events in the window. Optionally, use the SLIDE operator to control the output or PARTITION BY to separate events from an incoming stream.
ROWS 3 SLIDE 3: Controls the output so that the query executes on every 3 events.
PARTITION BY: Partitions an incoming event stream into separate relations according to the column specified.
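A partitioned row window can be pictured as one small bounded queue per key. The sketch below is plain Java under stated assumptions, not OEP code: `PartitionedRowWindow` is a hypothetical name, the partition key is a string, and the aggregate is fixed to an average.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of [PARTITION BY key ROWS n] with an average per key. */
class PartitionedRowWindow {
    private final int rows;
    private final Map<String, Deque<Double>> partitions = new HashMap<>();

    PartitionedRowWindow(int rows) {
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        this.rows = rows;
    }

    /** Adds a value to its partition and returns the average of that partition's window. */
    double onEvent(String key, double value) {
        Deque<Double> window = partitions.computeIfAbsent(key, k -> new ArrayDeque<>());
        window.addLast(value);
        if (window.size() > rows) {
            window.removeFirst(); // keep only the newest `rows` events for this key
        }
        double total = 0;
        for (double v : window) {
            total += v;
        }
        return total / window.size();
    }
}
```

Each key ages out its own events independently, so a burst of events for one key never pushes another key's events out of its window.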
CQL Operators
Relation-to-Stream Operators
ISTREAM: Only evaluates events as they are inserted into the window. This processes events that are new.
DSTREAM: Only evaluates events as they are deleted from the window. This allows the developer to determine when events are expiring from a time window.
RSTREAM: Outputs all the events as each new event is received. Maintains the entire current state of its input relation and outputs all of its events as insertions at each time step.
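One way to read the three operators is as differences between the window contents at two consecutive time steps. The sketch below is plain Java with hypothetical names, and it simplifies relations to sets of distinct events, which is an assumption made only for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Plain-Java sketch of ISTREAM, DSTREAM and RSTREAM over two window snapshots. */
class RelationToStream {
    /** Events present now that were not present at the previous time step. */
    static <T> Set<T> istream(Set<T> previous, Set<T> current) {
        Set<T> out = new LinkedHashSet<>(current);
        out.removeAll(previous);
        return out;
    }

    /** Events present at the previous time step that are gone now. */
    static <T> Set<T> dstream(Set<T> previous, Set<T> current) {
        Set<T> out = new LinkedHashSet<>(previous);
        out.removeAll(current);
        return out;
    }

    /** The entire current window contents, emitted at every time step. */
    static <T> Set<T> rstream(Set<T> current) {
        return new LinkedHashSet<>(current);
    }
}
```

If the window held e1 and e2 and now holds e2 and e3, then `istream` yields e3, `dstream` yields e1, and `rstream` yields both e2 and e3.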
CQL Pattern Recognition
MATCH_RECOGNIZE
The pattern recognition functionality in Oracle CQL allows you to define conditions on the attributes of incoming events and to identify these conditions by using string names called correlation variables. The pattern to be matched is specified as a regular expression over these correlation variables, and it determines the sequence or order in which the conditions must be satisfied by different incoming events to be recognized as a valid match. Additional clauses such as ALL MATCHES, PARTITION BY, and DURATION give you more control over the way the pattern recognition is performed over the input stream.
CQL Data Cartridges
Spatial Data Cartridge
Oracle Spatial is an option for Oracle Database that provides advanced spatial features to support high-end geographic information systems (GIS) and location-enabled business intelligence solutions (LBS). The Oracle Spatial data cartridge is an optional data cartridge which allows you to write Oracle CQL queries and views that seamlessly interact with Oracle Spatial classes in your Oracle CEP application. Using the Oracle Spatial data cartridge, you can configure Oracle CQL queries that perform the most important geographic domain operations, such as storing spatial data, performing proximity and overlap comparisons on spatial data, and integrating spatial data with the Oracle CEP server by providing the ability to index on spatial data.
Sample CQL Queries

Simple filtering: filter for events meeting specific threshold values
    SELECT * FROM inputChannel [NOW] WHERE eventValue > 10

Running total of up-to-the-moment sales by store: continuously calculate the last hour's sales by store
    SELECT SUM(amount) as salesTotal, storeID
    FROM inputChannel [range 60 minutes]
    GROUP BY storeID

Average of the last 2 stock ticks by symbol
    SELECT AVG(stockPrice) as avgPrice, stockSymbol
    FROM inputChannel [PARTITION BY stockSymbol ROWS 2]
    GROUP BY stockSymbol
Sample CQL Queries

Running Total of Credit Card Transactions for Last 24 Hours
Find the running total of the last 24 hours of credit card transactions by account.
    SELECT accountID, SUM(transactionAmount) as transactionsTotal
    FROM TransactionChannel [PARTITION BY accountID ROWS 1000 RANGE 24 HOURS]
    GROUP BY accountID

Unscheduled Smart Meter Outage
Find smart meters that registered a “last gasp” (i.e., shut down) but were not expected to be serviced.
    RSTREAM( OffAlarmsView NOT IN KnownServiceView )
Pattern Matching
• Powerful concept that allows identification of complex event patterns
• Defined as regular expressions
PATTERN (X+ Y+): 1 or more X events, followed by 1 or more Y events
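Because the pattern is a regular expression over named conditions, its meaning can be illustrated with an ordinary Java regex over a string of labels. This is only a sketch: the conditions chosen here (X means the price fell, Y means the price rose) and the names `PatternSketch`, `classify` and `findMatches` are assumptions for illustration, not part of CQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java sketch of PATTERN (X+ Y+) using a regex over per-event labels. */
class PatternSketch {
    /** Labels each price after the first: 'X' = fell, 'Y' = rose, '-' = unchanged. */
    static String classify(double[] prices) {
        StringBuilder labels = new StringBuilder();
        for (int i = 1; i < prices.length; i++) {
            if (prices[i] < prices[i - 1]) {
                labels.append('X');
            } else if (prices[i] > prices[i - 1]) {
                labels.append('Y');
            } else {
                labels.append('-');
            }
        }
        return labels.toString();
    }

    /** Returns every non-overlapping run of one or more X followed by one or more Y. */
    static List<String> findMatches(String labels) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile("X+Y+").matcher(labels);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }
}
```

For the prices 10, 9, 8, 9, 10, 10, 7, 8 the labels are `XXYY-XY`, which contains two matches: a two-step fall and recovery, then a one-step fall and recovery.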
Sample CQL Queries

Find passengers who are stuck in security when their flight is in the “FINAL BOARDING” process.
    SELECT stuck.reservationLocator, 'STUCK' as state
    FROM PassengerStateEventChannel
    MATCH_RECOGNIZE (
        PARTITION BY reservationLocator
        MEASURES Entered.reservationLocator AS reservationLocator
        PATTERN (CheckIn Entered NotExited*? Final)
        DEFINE
            CheckIn AS state = 'CHECKIN',
            Entered AS state = 'ENTERED',
            NotExited AS state != 'EXITED',
            Final AS state = 'FINAL'
    ) AS stuck
Pattern Match Query
• A CQL view filters events for use by a query.
• The CQL query starts when an OFF event occurs and waits 10 seconds to make sure that it is not followed by an ON event before sending the OFF event downstream.
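The "OFF not followed by ON within 10 seconds" logic can be sketched in plain Java as a table of pending OFF events that are either cancelled or confirmed as time advances. This is an illustration under assumptions, not the CQL query from the slide: the class `AbsenceDetector`, the string states, and the rule that an ON arriving at exactly the timeout is too late are all hypothetical choices, and time is driven by event timestamps rather than a real clock.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch: confirm an OFF only if no ON follows within the timeout. */
class AbsenceDetector {
    private final long timeoutMillis;
    // Device id -> timestamp of the OFF event that is still waiting for confirmation.
    private final Map<String, Long> pendingOff = new LinkedHashMap<>();

    AbsenceDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Feeds one event and returns the devices whose OFF is confirmed as of this time. */
    List<String> onEvent(String deviceId, String state, long timestampMillis) {
        // Confirm expired OFFs first, so a late ON cannot cancel them.
        List<String> confirmed = advanceTo(timestampMillis);
        if ("OFF".equals(state)) {
            pendingOff.putIfAbsent(deviceId, timestampMillis);
        } else if ("ON".equals(state)) {
            pendingOff.remove(deviceId);
        }
        return confirmed;
    }

    /** Moves time forward and returns devices whose OFF has waited the full timeout. */
    List<String> advanceTo(long nowMillis) {
        List<String> confirmed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pendingOff.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                confirmed.add(entry.getKey());
                it.remove();
            }
        }
        return confirmed;
    }
}
```

A device that reports OFF and then ON five seconds later is never forwarded, while a device that stays silent is forwarded as soon as time passes the 10-second mark.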
Sample CQL Queries

Track school buses on a map
    SELECT bus.busId as busId, bus.seq as seq,
           com.oracle.cep.cartridge.spatial.Geometry.createPoint(8307, bus.longitude, bus.latitude) as geom
    FROM BusPosStream as bus

Alert when the bus arrives at the bus stop (determine when the bus is near the bus stop)
    SELECT systimestamp() as incidentTime, bus.busId as busId, busstop.seq as stopSeq
    FROM BusPosGeomStream[NOW] as bus, BusStopRelation as busstop
    WHERE CONTAIN@spatial(busstop.geom, bus.geom, 100.0d) = true and bus.busId = busstop.busId
Data Explosion Web & social networks experienced it first… Infographic by Go-gulf.com
… but enterprises are now facing it too
Utilities deploying smart meters? 200x information flowing to data center!
Source: http://www.oracle.com/us/corporate/press/1704764
Storage is the first obvious problem. Analysis is next.
“Big Data is not the created content, nor is it even its consumption — it is the analysis of all the data surrounding or swirling around it.”
Source: IDC's Digital Universe Study, sponsored by EMC, June 2011, http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
Event Processing: Filtering, Real-time Intelligence for Big Data
[Diagram: social, blog and smart meter data flows into Fast Data event processing; labels: Volume, Velocity, Variety, Greater Value]
Some Challenges Working with Big Data
• Big data ≠ infinite storage. Yes, storage is cheap, but it helps to have clean data, with context and less redundancy.
• Hadoop is batch-oriented and there is inherent latency.
"With the paths that go through Hadoop [at Yahoo!], the latency is about fifteen minutes […] it will never be true real-time."
Raymie Stata, Yahoo! CTO (June 2011): http://www.theregister.co.uk/2011/06/30/yahoo_hadoop_and_realtime/
Get Ahead of the Curve: Use Event Processing Techniques
• Filter out noise (e.g., data ticks with no change), add context (by correlating multiple sources), increase relevance
• Identify certain critical conditions as you insert data into the warehouse
• Move time-critical analysis to the front of the process
Getting Ahead of the Curve: Fast Data and Big Data
• Fast Data: time scale of ms; historical depth: shallow. Example: monitoring of traffic cameras to ensure given license plates are not in use on multiple vehicles.
• Big Data: time scale of minutes; historical depth: deep. Example: analysis of traffic patterns and congestion times for urban planning.
Add “depth” to your fast data by merging the output of MapReduce into stream processing.
Technical Challenge
• 700,000 traffic/sec
• In-memory, but zero data loss
• Continuous user growth
• Expansion of service
Why Oracle Event Processing and Coherence
Extremely high throughput with in-memory technology
• Optimized for high throughput processing
• Tight integration with Oracle Coherence
• Simple thread tuning through the “Channel” architecture
Challenge addressed: 700,000 traffic/sec
Why Oracle Event Processing and Coherence
High throughput AND strict high availability
• Continuous processing for stream data
• Redundancy of the application by OEP
• Availability of data access by Coherence
Challenge addressed: In-memory, but zero data loss
Why Oracle Event Processing and Coherence
Scalability for unexpected requirements
• Linear scale-out, with zero performance loss
• Flexibility of function by using middleware instead of an appliance product
• Expand to real-time customer services
Challenges addressed: Continuous user growth, expansion of service
Oracle Event Processing using Coherence
[Diagram: OEP and Data Grid producing consolidated, in-context data]
Challenges and Benefits
• Handle and correlate events in real-time, including support for multiple patterns:
  • Pre-processing (buffer OEP)
  • Within OEP (to cache reference data)
  • Post OEP (to expose processed events to consuming apps)
• High throughput for storing data
• Aggregation and event querying
• Pattern implementation flexibility combining two complementary technologies
Cache Join Query CQL view that joins a cache containing reference data to an individual streaming event to provide additional context for further processing.
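The join described above amounts to a keyed lookup for each streaming event. The sketch below shows the idea in plain Java with a `java.util.Map` standing in for the cache. All names (`CacheJoinSketch`, `MeterReading`, `EnrichedReading`, the `region` attribute) are hypothetical, and dropping events that have no reference data is an assumption that mirrors an inner join.

```java
import java.util.Map;
import java.util.Optional;

/** Plain-Java sketch of joining a streaming event to cached reference data. */
class CacheJoinSketch {
    /** Hypothetical streaming event. */
    static final class MeterReading {
        final String meterId;
        final double kwh;

        MeterReading(String meterId, double kwh) {
            this.meterId = meterId;
            this.kwh = kwh;
        }
    }

    /** Hypothetical enriched event: the reading plus context from the cache. */
    static final class EnrichedReading {
        final String meterId;
        final double kwh;
        final String region;

        EnrichedReading(String meterId, double kwh, String region) {
            this.meterId = meterId;
            this.kwh = kwh;
            this.region = region;
        }
    }

    /** Looks up reference data by key; events without a match are dropped (inner join). */
    static Optional<EnrichedReading> enrich(MeterReading reading, Map<String, String> regionByMeter) {
        String region = regionByMeter.get(reading.meterId);
        if (region == null) {
            return Optional.empty();
        }
        return Optional.of(new EnrichedReading(reading.meterId, reading.kwh, region));
    }
}
```

The enriched event carries the extra context forward, so later processing steps do not need to repeat the lookup.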
Coherence Integration
• Specify the cache configuration in the Event Processing Network (EPN). Use the same Coherence configuration files as any other Coherence JVM.
• Insert events into the cache without writing any code.
• Easily associate a cache listener with a cache to receive events into the OEP application.
Coherence Integration
• Join to caches in CQL.
• Create an OEP “event-bean” that has a reference to the cache set in the EPN, and use it to perform any Java logic against the cache, such as invoking a Coherence “entry processor”.
Batching OEP Results to a Cache
This technique is useful for aggregating results across a number of OEP instances, using a cache to hold the results. It can reduce the length of time required for the CQL windows, help scale your application, and help implement high availability.
• Use the SLIDE clause to output the CQL results at regular intervals.
• Group by the criteria that you need to aggregate on.
Batching OEP Results to a Cache
Use an invoke operation against the cache to increase the total for a key, or to add the key with its total if it does not exist yet. (The key corresponds to the attributes in the GROUP BY clause.)
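The "increase the total or add it if missing" step can be sketched with `ConcurrentHashMap.merge`, which performs exactly that update atomically per key. This is a stand-in for illustration only: the slide describes invoking an operation against a Coherence cache, whereas the class `CacheAggregator` below is hypothetical and uses an in-process map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Plain-Java sketch of merging batched partial totals into a shared result store. */
class CacheAggregator {
    // Stand-in for the shared cache; keys correspond to the GROUP BY attributes.
    private final Map<String, Long> totals = new ConcurrentHashMap<>();

    /** Applies one batch of partial totals, as emitted by one instance at a SLIDE interval. */
    void applyBatch(Map<String, Long> partialTotals) {
        // merge() adds the key if absent, otherwise combines old and new values atomically.
        partialTotals.forEach((key, partial) -> totals.merge(key, partial, Long::sum));
    }

    /** Returns the accumulated total for a key, or 0 if the key has never been seen. */
    long total(String key) {
        return totals.getOrDefault(key, 0L);
    }
}
```

Two instances can each push their own partial totals for the same key, and the stored value ends up as the sum of both without either instance needing to know about the other.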
Oracle Event Processing (OEP)
• High-Volume, Low-Latency Event Processing Infrastructure
• Event Processing Network (EPN)
• Light-weight Java Application Server (embeddable)
• Easily Customizable
• Integrate with existing infrastructure and other Oracle Products (e.g., Coherence, Business Activity Monitoring, Database, Big Data Appliance, Data Mining, Spatial, NoSQL Database)
• Time Management & Pattern Matching
• Continuously Perform Calculations Over Time Windows
• Partition Event Streams By Key Values
• Perform Complex Pattern Matching
• Adjust Core Business Logic in Real-time without Redeploying