
Data Quality Models for High Volume Transaction Streams: A Case Study

Data Quality Models for High Volume Transaction Streams: A Case Study. Joseph Bugajski, Visa International. Robert Grossman, Chris Curry, David Locke & Steve Vejcik, Open Data Group. The Problem: Detect Significant Changes in Visa’s Payments Network.


Presentation Transcript


  1. Data Quality Models for High Volume Transaction Streams: A Case Study. Joseph Bugajski, Visa International. Robert Grossman, Chris Curry, David Locke & Steve Vejcik, Open Data Group.

  2. The Problem: Detect Significant Changes in Visa’s Payments Network
     Diagram labels: Account, Issuing Bank, Merchant, Acquiring Bank.

  3. Visa Payment Network
     • Over 1.59 billion Visa cards in circulation
     • 6,800 transactions per second (peak)
     • Over 20,000 member banks
     • Millions of merchants

  4. The Challenge: Payments Data is Highly Heterogeneous
     • Variation from cardholder to cardholder
     • Variation from merchant to merchant
     • Variation from bank to bank

  5. Observe: If Data Were Homogeneous, Could Use Change Detection Model
     • Sequence of events x[1], x[2], x[3], …
     • Question: is the observed distribution different from the baseline distribution?
     • Use simple CUSUM & Generalized Likelihood Ratio (GLR) tests
     Diagram labels: Baseline Model, Observed Model.
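
The change-detection idea on this slide can be illustrated with a one-sided CUSUM test. This is a minimal sketch, not the implementation used in the project: the function name `cusum_alarm`, the `slack` and `threshold` parameters, and the decline counts are all hypothetical, and the baseline mean is assumed to be known in advance.

```python
def cusum_alarm(xs, baseline_mean, slack, threshold):
    """One-sided CUSUM for an upward shift.

    Returns the 0-based index of the first observation at which the
    cumulative sum exceeds `threshold`, or None if no alarm is raised.
    """
    s = 0
    for i, x in enumerate(xs):
        # Accumulate deviations above (baseline_mean + slack); never go below zero.
        s = max(0, s + (x - baseline_mean - slack))
        if s > threshold:
            return i
    return None


# Hypothetical data: declines per 1,000 transactions in successive windows.
baseline_windows = [100, 110, 90, 100, 100, 90, 110, 100]
shifted_windows = baseline_windows + [150, 160, 150, 170, 160]

no_change = cusum_alarm(baseline_windows, baseline_mean=100, slack=10, threshold=100)
change_at = cusum_alarm(shifted_windows, baseline_mean=100, slack=10, threshold=100)
```

A larger `slack` ignores small drifts, while a larger `threshold` delays the alarm but reduces false positives; both would need tuning per baseline.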

  6. Key Idea: Build 10^4+ Models, One for Each Cell in Data Cube
     • Build separate model for each bank (1000+)
     • Build separate model for each geographical region (6 regions)
     • Build separate model for each different type of merchant (c. 800 types of merchants)
     • For each distinct cube, establish separate baselines for each metric of interest (declines, etc.)
     • Detect changes from baselines
     Diagram labels: 20,000+ separate baselines; axes Geospatial region, Type of Transaction, Bank; Modeling using Cubes of Models (MCM).
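
One way to picture a cube of models is a dictionary keyed by cell, with one small baseline object per key. The sketch below is illustrative only: the class `CellBaseline`, the function `build_cube`, the field names (`bank`, `region`, `merchant_type`, `declined`), and the sample transactions are assumptions, and the baseline is reduced to a simple decline rate.

```python
from collections import defaultdict


class CellBaseline:
    """Running decline-rate baseline for one cell of the cube."""

    def __init__(self):
        self.total = 0
        self.declines = 0

    def update(self, declined):
        self.total += 1
        self.declines += int(declined)

    @property
    def decline_rate(self):
        return self.declines / self.total if self.total else 0.0


def build_cube(transactions):
    """Route each transaction to the baseline of its (bank, region, merchant type) cell."""
    cube = defaultdict(CellBaseline)
    for t in transactions:
        key = (t["bank"], t["region"], t["merchant_type"])
        cube[key].update(t["declined"])
    return cube


# Hypothetical transactions, for illustration only.
sample = [
    {"bank": "bank_a", "region": "EU", "merchant_type": "grocery", "declined": False},
    {"bank": "bank_a", "region": "EU", "merchant_type": "grocery", "declined": True},
    {"bank": "bank_b", "region": "US", "merchant_type": "travel", "declined": False},
]
cube = build_cube(sample)
```

Each cell then holds its own baseline, so a change test can be run per cell rather than on the pooled, heterogeneous stream.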

  7. Greedy Meaningful/Manageable Balancing (GMMB) Algorithm
     • More alerts: alerts more meaningful. To increase alerts, add breakpoint to split cubes, order by number of new alerts, & select one or more new breakpoints.
     • Fewer alerts: alerts more manageable. To decrease alerts, remove breakpoint, order by number of decreased alerts, & select one or more breakpoints to remove.
     Diagram labels: Breakpoint; One model for each cell in data cube.
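
The slide gives only an outline of GMMB, so the following is a rough sketch of the "add breakpoints" direction under strong simplifying assumptions: each candidate breakpoint comes with an estimated number of new alerts, those estimates are treated as additive, and selection stops once a target alert volume is reached. The function name `greedy_add_breakpoints` and the candidate names are hypothetical.

```python
def greedy_add_breakpoints(candidates, current_alerts, target_alerts):
    """Greedily pick breakpoints until the estimated alert count reaches the target.

    candidates: dict mapping a breakpoint name to the estimated number of
    new alerts it would add (assumed additive, which is a simplification).
    Returns (chosen breakpoint names, estimated alert count).
    """
    chosen = []
    alerts = current_alerts
    # Order candidates by number of new alerts, largest first.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    for name, gain in ranked:
        if alerts >= target_alerts or gain <= 0:
            break
        chosen.append(name)
        alerts += gain
    return chosen, alerts


# Hypothetical candidates and alert estimates.
candidates = {
    "split_by_region": 12,
    "split_by_merchant_type": 30,
    "split_by_channel": 5,
}
chosen, estimated = greedy_add_breakpoints(candidates, current_alerts=40, target_alerts=80)
```

The "remove breakpoints" direction would mirror this: rank existing breakpoints by how many alerts their removal eliminates and remove them until the volume is manageable.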

  8. Augustus
     • Open source Augustus data mining platform was used to:
       • Estimate baselines for over 15,000 separate segmented models
       • Score high volume operational data and issue alerts for follow up investigations
     • Augustus is PMML compliant
     • Augustus scales with:
       • Volume of data (terabytes)
       • Real time transaction streams (15,000/sec+)
       • Number of segmented models (10,000+)

  9. Some Results to Date
     • System has been operational for 2.5 years
     • ROI:
       • 5.1x Year 1 (over 6 months)
       • 7.3x Year 2 (12 months)
       • 10.0x Year 3 (12 months)
     • Currently estimating over 15,000 individual baseline models
     • The system has issued alerts for:
       • Merchants using incorrect Merchant Category Code (MCC) - account testing
       • Sales channel variations
       • Incorrect use of merchant city name field
       • Incorrect coding of recurring payments

  10. Summary
     • Used new methodology (Modeling using Cubes of Models) for modeling large, highly heterogeneous data sets
     • This project contributed in part to the development of a Baseline Model in the open source Augustus system
     • Integrated system to generate alerts using Baseline Models with manual investigation process
     • Project is generating over 10x ROI
     • Poster #20 in the Tuesday night Poster Session.
