Software Stack: Processing, Workflow, Rules
Software Stack: the processing/workflow/rules backbone that supports all other activities
• Flexible, rapidly configurable and easily customized
• Supports all data flow and processing throughout the process
• Cloud-based platforms allowing parallel processing and rapid analytics deployment
Software & Stack
We use a “Federated” model for developing software: co-located teams focus on specific solutions and technologies, and a Core Architecture team ensures best practices and software reuse.
Note: Too detailed for website, for background info.
Diagram: federated development teams
• Sites: Jersey City; San Diego; Boston; Shanghai; New Delhi; plus TopCoder
• Core Architecture, PMO: San Diego, Boston
• Team focus areas shown: Marketing Services; Credit and Risk; Global Markets; Hadoop Processing; Data Visualization; Analytic Engines; High Speed Transactions; UI Design; Common Pool; Data Services; Supply Chain Services; Analytics Support Team; Process/Tools; “Big Batch” Processing; QA Leadership; Ancillary Work; Data Entry Forms; UI Widgets; Design Ideas; Documents; Utilities; UI Configuration/Development; ETL Configuration/Development; QA Staff; Operations
Software & Stack
We dramatically reduce the time it takes to move Signals from development to deployment by using the same infrastructure in both environments.
Note: Too detailed for website, for background info.
REUSE: COMMON INFRASTRUCTURE FOR MODEL DEVELOPMENT/DEPLOYMENT
Shared by both: Common Signals Creation Engine (Python, SQL, Java)
Model Development: Review Historical Data → Merge, Tag, Sample, Split Data → Calculate Variables → Execute Segment Logic → Determine Best Variables → Build Models → Package Models to Deploy
• ETL: Offline Data Structuring (SAS, BIQ, Hadoop, SQL, Python)
• Model Training Tools (SAS, Matlab, CART, NN, Custom)
Model Deployment: Receive Client Data → Verify/Clean Data → Calculate Variables → Execute Model(s) → Execute Business Logic → Respond to Client → Report, Act, Analyze
• ETL: Online Data Structuring (Java, Python, Hadoop, SQL)
• Model Execution (Java, Python)
• User Interfaces
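The reuse idea can be illustrated in a few lines of Python. This is a minimal sketch, not Opera's Common Signals Creation Engine: `calculate_variables`, its fields (`amount`, a list of past amounts) and the toy model are hypothetical. What it shows is that the development path (batch over historical rows) and the deployment path (one live client record) call the same variable function, so a Signal defined during development does not have to be re-implemented before deployment.

```python
from statistics import mean
from typing import Callable, Dict, List

Features = Dict[str, float]


def calculate_variables(record: Dict[str, float], history: List[float]) -> Features:
    """Shared variable logic (hypothetical fields), used by both paths below."""
    avg = mean(history) if history else 0.0
    return {
        "amount": record["amount"],
        "avg_history": avg,
        # Current amount relative to the historical average; 0.0 when there is no history.
        "amount_vs_avg": record["amount"] / avg if avg else 0.0,
    }


def build_training_set(rows: List[dict]) -> List[Features]:
    """Development path: apply the shared function to every historical row."""
    return [calculate_variables(r["record"], r["history"]) for r in rows]


def score_request(record: Dict[str, float], history: List[float],
                  model: Callable[[Features], float]) -> float:
    """Deployment path: identical variable logic, then model execution."""
    return model(calculate_variables(record, history))
```

Because both paths share one function, a feature computed for training and the same feature computed for a live request are identical by construction.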
Software & Stack
We have built the Opera Stack to support a machine learning environment in a Big Data world, while also responding to customers’ needs and requirements.
Note: Too detailed for website, for background info.
SERVICE ORIENTED, HIGHLY SCALABLE PLATFORM (layers from top to bottom; components are either Opera proprietary or best-of-breed commercial and open source)
• DIRECTED ACTIONS: directed actions to humans; directed actions to machines. Flexible output into any insertion point.
• OUTPUT SERVICES: Visualization & Dashboards (BIQ); Simulations; Scores, Decisions, Curricula, Alerts; Mobile Devices; MQ, JMS, Sockets
• ACCESS: Secure Browser (HTML and Flex); Custom Handheld; Web Services; FTP and SFTP
• ANALYSIS SERVICES and ANALYTICS RESULTS DATABASE: OLAP; Memcached; RDBMS. Optimized approaches for high-volume reads.
• MODEL EXECUTION: Decision Rules; Ensemble; Neural Nets; Restricted Boltzmann; K-Nearest Neighbor; SVD; Linear Regression; Matrix Factorization; Global Effects; Factor Similarity; Temporal Distance; Stochastic Gradient Descent; Kalman Filters; … Proprietary libraries and capabilities for critical signal and modeling areas.
• MODEL DEVELOPMENT / CONTINUOUS LEARNING, SIGNALS SELECTION: Mahalanobis Distance; Stepwise Regression; Classification Trees; Mutual Information; Principal Component Analysis; Sensitivity Analysis; Bivariate Charts; Clustering; Covariance
• SIGNALS DATABASE: NoSQL; In-Memory; RDBMS. Optimized approaches for record read speed.
• SIGNALS CREATION: Signals Identification (Time Series, Events, Sparse Matrix); library of algorithms to create Signals (Statistics, Geo-Location)
• DATA STORE: NoSQL; Hadoop; In-Memory; RDBMS. Flexibility; optimized approaches.
• INPUT SERVICES (ETL): Structure (SAS, Talend, Hadoop, Kettle, BIQ); Extract (MQ, Web Services, Sockets, SFTP, SQL, SAP). Flexible interfaces (client-system agnostic; structured, unstructured).
• DATA SOURCES: 3rd Party (including live feeds); Web/Unstructured; Opera Proprietary Databases; Enterprise (Oracle, SAP, Customer Systems, etc.); In Vivo Feedback. Accepts unstructured data; includes Opera-proprietary data, e.g., our Zip+4 Geo data; client-system agnostic; accepts real-time feeds.
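Mutual Information is one of the signal selection techniques listed in the stack. The following is a minimal, generic sketch of that technique (the textbook definition, not Opera's proprietary library): it scores each candidate Signal by the mutual information, in bits, between its values and the target, and ranks candidates by that score. It assumes discrete-valued signals; continuous signals would need to be binned first. The function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(signal: Sequence[Hashable], target: Sequence[Hashable]) -> float:
    """Mutual information I(X; Y) in bits between two equal-length discrete sequences."""
    n = len(signal)
    joint = Counter(zip(signal, target))
    px = Counter(signal)
    py = Counter(target)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), summed over observed cells only.
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_signals(candidates: Dict[str, Sequence[Hashable]],
                 target: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Rank candidate signals by mutual information with the target, highest first."""
    scores = {name: mutual_information(values, target) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A signal that perfectly determines a balanced binary target scores 1 bit; a signal independent of the target scores 0.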
Software & Stack
A common reference model is used consistently across Solutions. It shows the process from data management to final output.
Phases: Build, Execute
• Layers (top to bottom): Directed Actions; Output Services; Access; Analytics; Analysis Services; Signals Creation; Input Services (ETL); Data Sources. All run on a service oriented, highly scalable platform.
• Process steps (top to bottom): Trial and Test; Direct Action to People or Machines; Assemble; Evaluate; Synthesize Patterns; Create and Select Variables; Detect Signals; Capture, Integrate, Transform; Listen
Software & Stack
Quick integration of new data and modification of existing data is critical. We continue to add to a library of connectors, accumulators and transforms that allow us to do this in batch and in real time, and that are adaptable to new data sets and systems.
INPUT SERVICES
• Batch Processing
  • Automatically detects new files and runs them through a configurable series of verification, cleanse, load, and analyze processes
• Real-time Processing
  • Messages are run through a similar verification layer to access the core data and analytics services
BATCH CONNECTORS: Client Hosted: Extract Data → Encrypt/Compress (optional) → FTP → Opera Hosted: File Monitor → Verify, Clean, & Load → Data → Analyze (parallel processing)
REAL TIME CONNECTORS: Client Website / Call Center App → Analysis Request → Transaction Interface (Web Server, TCP/IP Socket, MQ) → Analyze (parallel) against Data → Analysis Response
Data Operations
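The batch path (detect new files, then run each through a configurable series of steps) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not Opera's file monitor: the step functions `verify`, `clean` and `make_loader` are hypothetical stand-ins, "load" appends to an in-memory list rather than an RDBMS, and a step rejects a file by raising `ValueError`. The step list is plain data, which is what makes the series configurable per client.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable, List, Set, Tuple

Step = Callable[[object], object]


def verify(text: str) -> str:
    """Reject empty files (a stand-in for real verification rules)."""
    if not text.strip():
        raise ValueError("empty file")
    return text


def clean(text: str) -> List[str]:
    """Strip whitespace and drop blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def make_loader(store: list) -> Step:
    """Return a load step that appends rows to `store` (in-memory stand-in for a database)."""
    def load(rows: List[str]) -> List[str]:
        store.extend(rows)
        return rows
    return load


def poll_once(inbox: Path, seen: Set[str], steps: List[Step]) -> Tuple[List[str], List[str]]:
    """Run each new file in `inbox` through `steps` in order; return (processed, rejected) names."""
    processed, rejected = [], []
    for path in sorted(inbox.iterdir()):
        if not path.is_file() or path.name in seen:
            continue
        seen.add(path.name)  # never reprocess a file, even one that was rejected
        try:
            data: object = path.read_text()
            for step in steps:
                data = step(data)
            processed.append(path.name)
        except ValueError:
            rejected.append(path.name)
    return processed, rejected


def watch(inbox: Path, steps: List[Step], interval: float = 5.0) -> None:
    """Poll forever; in practice this would run as a long-lived service."""
    seen: Set[str] = set()
    while True:
        poll_once(inbox, seen, steps)
        time.sleep(interval)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        inbox = Path(d)
        (inbox / "a.csv").write_text("x,1\n\n y,2 \n")
        (inbox / "b.csv").write_text("   \n")
        store: list = []
        print(poll_once(inbox, set(), [verify, clean, make_loader(store)]), store)
```

Polling keeps the sketch portable; a production monitor might instead use OS file-system notifications.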
Software & Stack
Getting Signals from Big Data is highly dependent on how that Big Data is stored. We have sophisticated, complementary signal extraction tools housed in a flexible architecture that can readily support additional engines; we are “infrastructure agnostic,” not tied to a single technology.
SIGNAL EXTRACTION (storage type: typical use; engines)
• Object Transaction: fast retrieval of transaction history for longitudinal profiling; Custom, MongoDB
• Relational Database: general reporting and ad hoc analysis; SQL Server, Oracle, MySQL, Netezza
• Network Database: tracking “chains” of relationships, like references; Neo Database, Custom Develop
• Memory Cube: high speed “slice and dice” analysis of a flat set of data; BIQ, Mondrian, Custom
• Distributed Database: processing of large data that can be broken down and processed in pieces (Map/Reduce); Hadoop, Vertica
• Indexed Files: accessing data by a single key value, like a customer profile record; Berkeley DB, CTREE
Client Data and External Signals are accessed through Algorithms (SQL, MapReduce, MDX, API)
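The Map/Reduce access pattern listed for distributed databases (“large data that can be broken down and processed in pieces”) can be shown in plain Python. This sketch is not Hadoop's API; it only mirrors the idea: each chunk is summarized independently (map), and the partial results are merged (reduce). The `(customer_id, amount)` record layout and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Dict, Iterable, List, Tuple

Txn = Tuple[str, float]  # (customer_id, amount): hypothetical record layout


def map_chunk(chunk: Iterable[Txn]) -> Counter:
    """Map phase: total spend per customer within one chunk."""
    totals: Counter = Counter()
    for customer, amount in chunk:
        totals[customer] += amount
    return totals


def reduce_totals(a: Counter, b: Counter) -> Counter:
    """Reduce phase: merge partial totals from two chunks (Counter.update adds values)."""
    merged = Counter(a)
    merged.update(b)
    return merged


def spend_by_customer(chunks: List[List[Txn]]) -> Dict[str, float]:
    """Chunks are independent, so the map calls could run in parallel on separate nodes."""
    partials = map(map_chunk, chunks)
    return dict(reduce(reduce_totals, partials, Counter()))
```

Because the merge is associative, partial totals can be combined in any grouping, which is what lets a framework spread the work across machines.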
Software & Stack
We use a “container approach” that provides access to reusable platform layers while also supporting customizable model and rule execution with dynamic Java and Python plug-ins. Services can support both model development (signal selection, model training) and production (signal creation, model execution).
Note: Too detailed for website, for background info.
ANALYTIC SERVICES
• Real-Time Transactions: a Transaction Router verifies the request type and routes it to an Analytic Service
• Batch Processing: loops through records and passes each to an Analytic Service
• Analytic Service:
  • Signal Creation: retrieve “context” data from the Data Store; calculate variables in-database or in-code; Signals DB
  • Production: combine and execute models; decision rules
  • Dev/Learning: select signals; train models; evaluate performance
• Results DB: save and deliver
Software & Stack
We deliver into any insertion point, batch or real time, through any type of interface, and take in feedback via a closed-loop system that allows analytics to learn and adapt without lag time. Advanced visualization tools allow human judgment to provide oversight.
Note: Too detailed for website, for background info.
RESPONSE & FEEDBACK
• Client Systems: Websites; Operational Databases; Customer/Sales Touch-Points
• Analytics Response: Response Connectors (Real-time, Batch); Results Database
• Automated Feedback (Extract)
• Interactive User Interfaces for Reporting and Scenarios: multiple form factors (desktop, handheld, iPad); custom dashboards; “what-if” scenarios; rule/parameter changes
• Manual Adjustments
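One way a closed loop can let analytics “learn and adapt without lag time” is online learning: each feedback event immediately updates the model. The sketch below uses a logistic model updated by one stochastic gradient descent step per outcome (SGD appears in the stack's model list). It is an assumed illustration, not Opera's feedback mechanism; the class name, feature names and learning rate are hypothetical.

```python
import math
from typing import Dict


class OnlineLogistic:
    """Logistic model updated by one SGD step per feedback event."""

    def __init__(self, learning_rate: float = 0.1):
        self.lr = learning_rate
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def predict(self, features: Dict[str, float]) -> float:
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in features.items())
        # Numerically stable sigmoid: avoids overflow in exp for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def feedback(self, features: Dict[str, float], outcome: int) -> None:
        """Apply one SGD step; for log loss, dLoss/dz = prediction - outcome."""
        error = self.predict(features) - outcome
        self.bias -= self.lr * error
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
```

Each call to `feedback` changes the very next `predict`, so there is no separate retraining cycle between an observed outcome and the adjusted score.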
Software & Stack
Our platform allows rapid implementation with low IT investment. Key to this is the flexibility to address customers’ requirements for both the outbound data streams and the inbound “directed actions.”
Note: Too detailed for website, for background info.
Deployment options: Public Cloud; Opera Cloud; Client Cloud; On Premise
• Data sources: Enterprise Systems (xxx); Third Party Data; Social Media; Data Warehouses; Operational Databases; Enterprise Operations; Websites; POS; Events. Opera adds in new data sources with valuable Signals.
• Extract (Collector): SFTP; SQL; Web Services; MQ; Sockets
• Structure Data into a Structured Data Store. Opera performs all data structuring and integration; little new customer IT required.
• Signals Creation; Signals Database; Signal Select; Develop/Train Models; Model Execution; Results Database; Adaptive Learning Capabilities
• Delivery: Directed Actions to Machines (SFTP, Web Services); Directed Actions to Humans (Web Application Server). Opera creates all interfaces (platform agnostic: desktops, tablets, handhelds, kiosks, more); little new customer IT required.
Case Study: Building the Platform to Manage Daily Feeds for Recommendations (created for Schwan’s and reused for Nissan)
SCHWAN’S (Interfaces → Opera Server; xxx xxx)
• Interfaces: Customers, Products, Sales, Routes, Inventory
• Auto FTP Pull (SFTP): a generic process is configured to continually read and write files from FTP/SFTP servers
• Data Check/Load: a generic process checks all files for errors and calculates statistics on all fields prior to loading into a MySQL RDBMS
• Recommender (KNN, Neural Network): created KNN functions and a Neural Network train/execute function
• Web Server (query tools) and Reports: created both Flex and HTML/PDF reports
• Auto FTP Push (SFTP) to Handhelds
NISSAN (Interfaces → Opera Server; xxx xxx)
• Interfaces: Inventory, Condition Reports, Sales, Auctions
• Auto FTP Pull (SFTP) and Data Check/Load: used the same programs created for Schwan’s to monitor for files, clean them, and load them into an RDBMS (MySQL)
• Pricing (KNN, Kalman): used the KNN function from Schwan’s; created new Kalman train/execute functions
• Web Server (query tools, real-time web service), Reports and Live Dashboard: used the HTML/PDF report tools from Schwan’s; created a new iPad application
• Auto FTP Push (SFTP) to Auction Block
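The case study notes that a KNN function written for Schwan’s recommender was reused for Nissan. As an assumed illustration of what such a function can look like (the source does not describe its internals), here is a user-based k-nearest-neighbor recommender: customers are compared by cosine similarity of their purchase profiles, and products bought by the k most similar customers are scored by similarity times quantity. The profile layout, names and parameters `k` and `n` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List

Profile = Dict[str, float]  # product -> purchase count (hypothetical layout)


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(target: str, profiles: Dict[str, Profile], k: int = 2, n: int = 3) -> List[str]:
    """User-based KNN: score products the target has not bought, using the k nearest customers."""
    mine = profiles[target]
    neighbors = sorted(
        ((cosine(mine, p), cust) for cust, p in profiles.items() if cust != target),
        reverse=True,
    )[:k]
    scores: Dict[str, float] = defaultdict(float)
    for sim, cust in neighbors:
        for product, qty in profiles[cust].items():
            if product not in mine:
                scores[product] += sim * qty
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:n]]
```

The same neighbor search works for pricing if profiles describe vehicles instead of customers, which is one plausible reading of why the function transferred between clients.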
Case Study: Building the Platform for Large Batch Processing (created for Mobiuss and reused for FA Performance Aggregator and Insight Engine)
Note: Too detailed for website, for background info; can’t show how we scaled it.
MOBIUSS (Interfaces → Opera Server; xxx)
• Interfaces: Data Providers (Loans, Borrowers, Properties, Payments, Bonds, Deals, Prices)
• Auto FTP Pull (SFTP): reused the same FTP monitor job from Schwan’s/Nissan
• Data Check/Load: same Data Check/Load process, with an MSSQL interface, into an MSSQL RDBMS
• Forecast Models (run in Hadoop): implemented WoE algorithms and others to create forecast models that are fed to the Intex Pricing Engine. First use of Hadoop, to run 80,000 Deals and Monte Carlo simulations.
• Web Server (query tools, trigger simulators), Reports and Scenarios: used the same Flex framework to build Mobiuss reports
• Auto FTP Push (SFTP)
MSSB (Interfaces → Opera Server; xxx xxx)
• Interfaces: FA, Accounts, Performance, Benchmarks
• Auto FTP Pull (FTP): used the same programs created for Schwan’s to monitor for files
• Data Check/Load: created a Hadoop (HDFS) load process, also used for Insight Engine; data held in MSSQL and HDFS
• Aggregator: created a new aggregation engine to execute complex rollup and benchmarking functions
• Web Server (query tools) and FA Dashboard: used the same Flex framework to create the Performance dashboard; also created a BIQ dataset
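The Mobiuss forecast models are described as using “WoE algorithms.” Weight of Evidence is commonly defined per bin as WoE = ln(%good / %bad), with Information Value IV = Σ (%good − %bad) · WoE summarizing a variable’s predictive strength. The sketch below implements that common credit-scoring definition; it is not the Mobiuss implementation. Assumptions: label 1 means “bad” (e.g. default), and an additive `smoothing` count of 0.5 (an arbitrary choice) keeps empty cells from producing log(0). Some practitioners use ln(%bad / %good), which flips the sign.

```python
import math
from typing import Dict, Hashable, List, Tuple


def weight_of_evidence(bins: List[Hashable], labels: List[int],
                       smoothing: float = 0.5) -> Tuple[Dict[Hashable, float], float]:
    """Return per-bin WoE = ln(%good / %bad) and the total Information Value.

    labels: 1 = bad, 0 = good. `smoothing` is added to every good/bad count.
    """
    good: Dict[Hashable, float] = {}
    bad: Dict[Hashable, float] = {}
    for b, y in zip(bins, labels):
        counts = bad if y == 1 else good
        counts[b] = counts.get(b, 0.0) + 1
    keys = sorted(set(bins), key=str)
    tot_good = sum(good.get(k, 0.0) + smoothing for k in keys)
    tot_bad = sum(bad.get(k, 0.0) + smoothing for k in keys)
    woe: Dict[Hashable, float] = {}
    iv = 0.0
    for k in keys:
        pg = (good.get(k, 0.0) + smoothing) / tot_good  # share of all goods in this bin
        pb = (bad.get(k, 0.0) + smoothing) / tot_bad    # share of all bads in this bin
        woe[k] = math.log(pg / pb)
        iv += (pg - pb) * woe[k]
    return woe, iv
```

Positive WoE marks bins where goods are over-represented; the WoE values typically replace raw bin labels as model inputs.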