Website Survival: Concealing Back-End Outages with Oracle Coherence and HotCache
Jim Xu, Senior Technology Architect, TELUS
Randy Stafford, Architect At-Large, Oracle Coherence Product Development
Session Agenda
1. TELUS introduction and business challenge
2. Oracle technology addressing the challenge
3. Technical highlights of the implementation
4. HotCache whole product: transformation and more
5. Solution validation through business metrics
Introduction to TELUS
TELUS (TSX: T, NYSE: TU) - Canada’s fastest-growing national telecommunications company
• Headquarters: Burnaby, British Columbia, Canada
• Revenue: $11.7 billion
• EBITDA: $4.0 billion
• Customers: 13.4 million connections, including
  • 7.9 million wireless subscribers
  • 3.2 million wireline network access lines
  • 1.4 million Internet subscribers
  • 865,000 TV customers
• Website: www.telus.com
Business Challenges
Context
• Digital experience serving several million customers
Challenges
• 80% of clients researched online prior to purchase
• 85% of clients preferred to solve problems online
• Slow-responding web pages and frequent unplanned outages seriously degraded the client experience
• Voice of Client indicated that 39% of complaints were related to speed and stability
• Unreliable self-serve hurt web adoption and drove calls to call centers
• Traffic and load increased considerably with subscriber growth
Goals
• Under 3 seconds to render the customer experience
• 99.99% uptime
Journey on Performance Improvement
• The High Availability and Resiliency Program was started in 2011
• A number of enhancements reduced response time from 21 sec to 8 sec in 2012, then to 6 sec in 2013
[Chart: East, West, and National series over Q1 to Q4 of 2012]
Tipping Point
• Impossible to reach the 3 sec and 99.99% uptime targets without an architecture redesign and new technologies
• Extended outages (10-20 hours) during quarterly releases and maintenance windows
• Customer data is collected from multiple data sources across multiple data centers
• Legacy infrastructure requires frequent maintenance
• Caching data is critical
• Coherence 3.7 was introduced, but keeping cached data fresh proved challenging
• A custom cache updater was considered but later discarded due to complexity
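To make the gap between the 99.99% goal and the 10-20 hour release outages concrete, here is a minimal sketch of the downtime arithmetic. The class and method names are illustrative, and the figure of four release outages per year at 10 hours each is an assumption read off "quarterly releases", not a number from the talk.

```java
final class UptimeBudget {
    // Minutes in a 365-day year (leap years ignored for simplicity).
    static final double MINUTES_PER_YEAR = 365.0 * 24.0 * 60.0;

    /** Downtime allowed per year, in minutes, for an availability target such as 0.9999. */
    static double allowedDowntimeMinutes(double availability) {
        return (1.0 - availability) * MINUTES_PER_YEAR;
    }

    /** Availability achieved over one year given the total hours of downtime in it. */
    static double availabilityFor(double downtimeHoursPerYear) {
        return 1.0 - (downtimeHoursPerYear * 60.0) / MINUTES_PER_YEAR;
    }

    public static void main(String[] args) {
        // Assumption: four release outages a year, each at the 10-hour low end of the range.
        double releaseDowntimeHours = 4 * 10.0;
        System.out.printf("Budget at 99.99%%: %.2f minutes/year%n",
                allowedDowntimeMinutes(0.9999));
        System.out.printf("Availability with %.0f h of release outages: %.4f%n",
                releaseDowntimeHours, availabilityFor(releaseDowntimeHours));
    }
}
```

Under these assumptions the yearly budget at 99.99% is under an hour, so a single release outage of the stated length would exhaust it many times over.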
The Solution
• Build an in-memory data grid with Coherence 12c
• Resolve the cache data update issue with HotCache
• Conceal back-end outages to provide 24/7 service reliability
• Improve system performance and maintain a consistent client experience
Technologies:
• Exalogic X3-2 and X4-2
• Coherence Data Grid edition 12.1.2
• Oracle Traffic Director
• GoldenGate
• WebLogic 12c
Stats:
• Cached raw data: 212 GB
• Number of objects: 821 million
Java EE Application Physical Tiering and Scalability
[Diagram: Site 1]
• Web Tier (web servers) and App Tier (app servers): these tiers can scale out…
• Grid Tier (cache servers): the grid tier scales out!
• EIS Tier (database, legacy system, external service): the EIS tier is hard to scale!
Coherence GoldenGate HotCache
[Diagram: External Application and Coherence Application (Read / Write), Coherence, GoldenGate HotCache, GoldenGate, Database]
• Push DB changes to Coherence via GoldenGate and TopLink JPA
• Tables map to entities and caches
• Event-driven and efficient
• Solves the stale cache problem when external apps write to a shared DB
• Allows caching to be leveraged in that class of application
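The event-driven flow described on this slide can be pictured with a small simulation. This is a sketch under stated assumptions, not the HotCache API: `RowChange`, `ChangeApplierSketch`, and the table-to-cache map are hypothetical names, a plain `ConcurrentHashMap` stands in for each Coherence cache, and a simple lookup table stands in for the JPA mapping that the real product uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ChangeApplierSketch {
    enum Op { INSERT, UPDATE, DELETE }

    /** One captured row change (hypothetical stand-in for a record read from a trail). */
    static final class RowChange {
        final String table;
        final Op op;
        final Object primaryKey;
        final Map<String, Object> columns;

        RowChange(String table, Op op, Object primaryKey, Map<String, Object> columns) {
            this.table = table;
            this.op = op;
            this.primaryKey = primaryKey;
            this.columns = columns;
        }
    }

    // Table name -> cache name. Stands in for the JPA mapping of tables to entities and caches.
    private final Map<String, String> tableToCache = new HashMap<>();
    // Cache name -> entries. Each ConcurrentHashMap stands in for one Coherence cache.
    private final Map<String, Map<Object, Map<String, Object>>> caches =
            new ConcurrentHashMap<>();

    void mapTable(String table, String cacheName) {
        tableToCache.put(table, cacheName);
    }

    Map<Object, Map<String, Object>> cache(String cacheName) {
        return caches.computeIfAbsent(cacheName, n -> new ConcurrentHashMap<>());
    }

    /** Applies one change to its mapped cache. Returns false for tables with no mapping. */
    boolean apply(RowChange change) {
        String cacheName = tableToCache.get(change.table);
        if (cacheName == null) {
            return false;
        }
        if (change.op == Op.DELETE) {
            cache(cacheName).remove(change.primaryKey);
        } else {
            // INSERT and UPDATE both overwrite the entry with the latest row image.
            cache(cacheName).put(change.primaryKey, change.columns);
        }
        return true;
    }
}
```

The point of the sketch is that the cache is updated by the change itself, so an external application writing to the shared database does not leave stale entries behind and no polling or time-based expiry is needed.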
Exalogic System Hardware Overview
Fast. Easy. Open.
Coherence on Exalogic
MessageBus and Exabus
• MessageBus: an asynchronous, binary, message-based, event-driven transport layer in Coherence, with pluggable implementations
• Exabus: a native RDMA implementation of MessageBus that bypasses the OS kernel and avoids buffer copies
• Exabus preprocesses messages on I/O threads, avoiding the context switches between Coherence threads that occurred prior to Exabus
• A separate MessageBus per Coherence service, instead of all services sharing the same transport layer as they did prior to MessageBus, allows utilizing the full InfiniBand (IB) bandwidth
Data Grid Server - Exalogic vs Commodity
[Charts: Failover, Latency, Throughput, CPU Utilization]
Data Consolidation
[Diagram: Data Grid, GoldenGate, Billing Account, Customer, User Profile]
• Benefits:
  • Reduce data roundtrips
  • Improve performance
  • Less dependency on legacy data centers
  • Canonical model across multiple source databases
Data Grid Geo-Redundancy
[Diagram: Global Traffic Manager; Data Services and Data Grid at each of East and West]
• Benefits:
  • Replicated infrastructure and data
  • Active-Active to support production
  • Data and services closer to consumers
Project Timeline on Data Grid
• Aggressive timeline for launching the Data Grid
• Closely collaborated with Oracle to resolve technical issues
Manage Object Relationships
• Cached Data:
  • Objects are independent in the grid
  • But they are logically related
  • Object traversal through keys
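Traversal through keys can be sketched as follows. The example assumes two caches, one per object type, where each object stores the keys of its related objects instead of direct references. `Customer`, `BillingAccount`, and their fields are illustrative names, and plain `HashMap` instances stand in for the grid caches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyTraversalSketch {
    static final class Customer {
        final String id;
        final List<String> billingAccountIds; // keys into the billing account cache

        Customer(String id, List<String> billingAccountIds) {
            this.id = id;
            this.billingAccountIds = billingAccountIds;
        }
    }

    static final class BillingAccount {
        final String id;
        final String customerId; // key back into the customer cache

        BillingAccount(String id, String customerId) {
            this.id = id;
            this.customerId = customerId;
        }
    }

    // Each map stands in for one cache in the grid; entries are stored independently.
    final Map<String, Customer> customers = new HashMap<>();
    final Map<String, BillingAccount> billingAccounts = new HashMap<>();

    /** Parent-to-child traversal: look up the customer, then each account by its key. */
    List<BillingAccount> accountsOf(String customerId) {
        List<BillingAccount> result = new ArrayList<>();
        Customer customer = customers.get(customerId);
        if (customer == null) {
            return result;
        }
        for (String accountId : customer.billingAccountIds) {
            BillingAccount account = billingAccounts.get(accountId);
            if (account != null) { // a key with no cached entry is skipped
                result.add(account);
            }
        }
        return result;
    }

    /** Child-to-parent traversal through the key stored on the account. */
    Customer ownerOf(String accountId) {
        BillingAccount account = billingAccounts.get(accountId);
        return account == null ? null : customers.get(account.customerId);
    }
}
```

Because only keys are stored, each object can be updated or evicted on its own, at the cost of an extra lookup per hop.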
Data Transformation with Coherence Live Events
[Diagram: Legacy Schema, HotCache Object/Cache Mapping, Coherence Data Grid key (K) / value (V) entries, Live Events, Canonical Domain Model; entity labels: Addr, AA, BA, BC, Cust, PH, Svc]
Live Events Use Cases in HotCache
• Project HotCache model into desired model
• Duplicate data for denormalization
• Ensure referential integrity in relationship implementations
• Merge data from multiple databases
• Pending Mutations pattern
• Refresh Aggregates when child tables are not replicated
Data Aggregation with Layered Caches
Stage Cache
• JPA entities
• Same data structure as the source database, to simplify the HotCache implementation
• Initial load is not required, and an object can be removed after the target object is updated
• Reduced data grid memory footprint
Target Cache
• Similar to a database view, populated with UI-optimised domain objects
• Denormalized/flattened objects to improve data retrieval performance
• Process object dependencies through Event Interceptors and Entry Processors
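The stage/target layering can be illustrated with a small simulation. It is a sketch only: the row and view classes are hypothetical, `ConcurrentHashMap` stands in for both caches, the `on...Row` methods play the role that an event interceptor would play in the grid, and `Map.compute` plays the role of an entry processor by updating one target entry atomically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LayeredCacheSketch {
    /** Stage object shaped like a hypothetical CUSTOMER table row. */
    static final class CustomerRow {
        final String customerId;
        final String name;

        CustomerRow(String customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }
    }

    /** Stage object shaped like a hypothetical ADDRESS table row. */
    static final class AddressRow {
        final String addressId;
        final String customerId;
        final String city;

        AddressRow(String addressId, String customerId, String city) {
            this.addressId = addressId;
            this.customerId = customerId;
            this.city = city;
        }
    }

    /** Target object: one flattened view per customer; a field is null until its row arrives. */
    static final class CustomerView {
        final String name;
        final String city;

        CustomerView(String name, String city) {
            this.name = name;
            this.city = city;
        }
    }

    final Map<String, Object> stage = new ConcurrentHashMap<>();
    final Map<String, CustomerView> target = new ConcurrentHashMap<>();

    /** Stands in for an interceptor reacting to a customer row arriving in the stage cache. */
    void onCustomerRow(CustomerRow row) {
        String stageKey = "CUSTOMER:" + row.customerId;
        stage.put(stageKey, row);
        // Atomic per-key merge into the target, keeping any city already present.
        target.compute(row.customerId,
                (id, old) -> new CustomerView(row.name, old == null ? null : old.city));
        stage.remove(stageKey); // the stage object is no longer needed once the target is updated
    }

    /** Stands in for an interceptor reacting to an address row arriving in the stage cache. */
    void onAddressRow(AddressRow row) {
        String stageKey = "ADDRESS:" + row.addressId;
        stage.put(stageKey, row);
        target.compute(row.customerId,
                (id, old) -> new CustomerView(old == null ? null : old.name, row.city));
        stage.remove(stageKey);
    }
}
```

Removing each stage entry after the merge is what keeps the stage layer small, which matches the reduced memory footprint noted on the slide.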
Scaling HotCache via Parallel Data Flows
[Diagram: N parallel flows (Extract 1 / Trail 1 / HotCache 1 through Extract N / Trail N / HotCache N) feeding key/value caches in the Coherence Data Grid]
• DB schema must be amenable (related tables in same trail)
• Throughput of one HotCache: 700-3000 TPS, depending on hardware and configuration
• This approach has been tested to 18,000 TPS
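A rough sizing of the parallel flows can be worked out from the figures above. The sketch assumes that throughput adds up linearly across flows, which the slide does not state, and the round-robin assignment of table groups to trails is only one illustrative way to keep related tables in the same trail. All names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParallelFlowSketch {
    /** Flows needed to reach a target rate, assuming throughput adds linearly per flow. */
    static int flowsNeeded(int targetTps, int perFlowTps) {
        if (targetTps < 0 || perFlowTps <= 0) {
            throw new IllegalArgumentException("rates must be positive");
        }
        return (targetTps + perFlowTps - 1) / perFlowTps; // integer ceiling
    }

    /**
     * Assigns each group of related tables to a single trail, round-robin over groups,
     * so that tables in the same group never end up in different trails.
     */
    static Map<String, Integer> assignTrails(List<List<String>> relatedGroups, int trailCount) {
        if (trailCount <= 0) {
            throw new IllegalArgumentException("trailCount must be positive");
        }
        Map<String, Integer> trailOfTable = new LinkedHashMap<>();
        for (int g = 0; g < relatedGroups.size(); g++) {
            int trail = g % trailCount + 1; // trails numbered from 1, as in the diagram
            for (String table : relatedGroups.get(g)) {
                trailOfTable.put(table, trail);
            }
        }
        return trailOfTable;
    }

    public static void main(String[] args) {
        // 18,000 TPS at the stated per-flow range of 700 to 3000 TPS.
        System.out.println(flowsNeeded(18_000, 3_000) + " to " + flowsNeeded(18_000, 700)
                + " flows under the linear-scaling assumption");
    }
}
```

Under that assumption, 18,000 TPS corresponds to somewhere between 6 and 26 flows, depending on where a single HotCache instance falls in the 700-3000 TPS range.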
HotCache High Availability
• Coherence is already HA
• Oracle Clusterware manages redundant GoldenGate HotCache processes
• http://www.oracle.com/technetwork/middleware/goldengate/overview/ha-goldengate-whitepaper-128197.pdf
[Diagram: Active and Passive nodes, each with GoldenGate Manager and HotCache; Oracle Clusterware check(), stop(), start()]
Business Benefits
• No more outages! Supported all major releases and infrastructure maintenance since the initial launch last November
• Enhanced performance at the service level: 2-30x faster
• Reduced dependency on legacy data centers and hardware footprint
• Offered a single view of the customer with data from various legacy systems
Performance Metrics
• Data grid enabled Client Account WS response time: 20 ms vs 99-10294 ms
• Outage Mode: Portal overview page response time of 3.2 s
• Operational Mode: 48% performance gain from staging performance test
[Chart labels: MS, HRS]