Data Quality: Opportunities, Data, and Examples
Better and More Data
• Level of analysis
• Take a quick look at what/why use data
• Linking data from disparate and third party sources
• Explore data types
• Typical issues & Tricks
• Cross validation and sourcing
• Reverse Look-up
• GIS layering
• Backfill from text correlated to codes
• Information from operations
• Text analytics
General Organizational Overview
An information business focused on risk taking. Make. Sell. Serve.
Sales and Distribution • Underwriting • Claims
• Producer Segmentation
• Market Planning
• Revenue Forecasting
• Cross sell and Up sell
• Retention and Profitability
• Risk Selection and Pricing
• Portfolio Management
• Premium Adequacy
• Billing and Collections Management
• Payment Accuracy
• Claim Collaboration
  > Fraud Detection
  > Subrogation
  > Risk Transfer
  > 3rd Party Deductible
  > Reinsurance Recoverable
Same Problems – Different Lines of Business
• Personal – Auto, HO, Umbrella
• Small Commercial – BOP, CPP
• Middle Market Commercial – CPP w/GL, CP, Crime, CIM, B&M, WC, Auto
• Large Commercial Accounts
• Commercial Auto
• Workers Comp
• Umbrella/Excess
• Specialty Lines – D&O, EPL, E&O, Farm, FI
Data Types and Forms
• Structured data
• Semi-structured data
• Unstructured data
• Text
• Spatial
• Pictographic
• Graphic
• Voice
• Video
ACTIONS
• Identify Data Systems
• Get right data from right systems
• Overcome internal Organizational Barriers
• Bridge to legacy systems and archived data
• Augment to create rich data mining environment
• Expect the need to negotiate for resources
Multiple Data Systems which must be pulled together for analysis. Great opportunity for cross-validation and sourcing.
• Vendors/Partners
• Medical Data - Bill Review - PPO - Case Management - Paradigm
• Archive, Legacy Systems
• Current System
• Claim Data
• External Data
• Policy
• Multiple Underwriting Systems
• Multiple States
• Billing Systems
• Finance Systems
• CRM Systems, other data
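Cross-validation across systems can be sketched as a field-by-field comparison of records that share a key. The sketch below is only an illustration: the claim numbers, field names, and the two sample extracts are hypothetical, and a real comparison would first need the keys and formats of the two systems to be reconciled.

```python
def cross_validate(primary, secondary, fields):
    """Compare records that share a key across two systems.

    primary, secondary: dicts mapping a shared key (for example a claim
    number) to a dict of field values. Returns the field-level
    disagreements plus the keys found in only one of the systems.
    """
    mismatches = []
    for key in sorted(primary.keys() & secondary.keys()):
        for field in fields:
            a = primary[key].get(field)
            b = secondary[key].get(field)
            if a != b:
                mismatches.append((key, field, a, b))
    only_primary = sorted(primary.keys() - secondary.keys())
    only_secondary = sorted(secondary.keys() - primary.keys())
    return mismatches, only_primary, only_secondary


# Hypothetical extracts from a claim system and a billing system.
claim_system = {
    "C100": {"policy": "P1", "state": "CT"},
    "C101": {"policy": "P2", "state": "NY"},
    "C102": {"policy": "P3", "state": "NJ"},
}
billing_system = {
    "C100": {"policy": "P1", "state": "CT"},
    "C101": {"policy": "P2", "state": "NJ"},
    "C103": {"policy": "P4", "state": "MA"},
}

mismatches, only_claims, only_billing = cross_validate(
    claim_system, billing_system, ["policy", "state"]
)
```

Disagreements point to the fields worth sourcing from a third system, and keys present on only one side point to records that were never linked.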
Some typical external data sources and vendors
• Dun & Bradstreet
• Experian
• Bureau of Labor Statistics
• Market Stance
• AM Best
• Equifax
• US Census
• Claritas
• Melissa Data
• ISO
• GIS vendors
• U&C Data sets
• Code Sets for ICDs and CPTs
• …
Data Glitches – historical and on-going
Systemic changes to data not process related
• Changes in data layout / data types
• Changes in scale / format
• Temporary reversion to defaults
• Missing and default values
• Gaps in time series
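Two of the glitches listed above, gaps in time series and missing or default values, lend themselves to simple automated checks. The following is a minimal sketch, assuming monthly data keyed by (year, month) pairs and a hand-maintained set of suspected default values; the sample dates are hypothetical.

```python
def find_month_gaps(months):
    """Return the (year, month) pairs missing between the first and
    last observed month."""
    seen = set(months)
    if not seen:
        return []
    year, month = min(seen)
    last = max(seen)
    gaps = []
    while (year, month) <= last:
        if (year, month) not in seen:
            gaps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


def flag_defaults(values, suspects):
    """Return the positions of values that are missing (None) or equal
    to a suspected default."""
    return [i for i, v in enumerate(values) if v is None or v in suspects]


# Hypothetical reporting months and birth dates.
gaps = find_month_gaps([(2020, 10), (2020, 11), (2021, 1), (2021, 2)])
flagged = flag_defaults(
    ["1975-03-02", "1900-01-01", None, "1988-07-19"],
    suspects={"1900-01-01", "9999-12-31"},
)
```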
Defining Issues: Sample Source Data
1. Define Issues
MORE ISSUES… Mapping across sources: Same Fact, Different Terms

DataElementConcept
Name: Country Identifiers  Context:  Definition:  Unique ID: 5769  Conceptual Domain:  Maintenance Org.:  Steward:  Classification:  Registration Authority:  Others

Data Elements
ISO 3166 English Name   ISO 3166 French Name   ISO 3166 2-Alpha Code   ISO 3166 3-Alpha Code   ISO 3166 3-Numeric Code
Algeria                 L`Algérie              DZ                      DZA                     012
Belgium                 Belgique               BE                      BEL                     056
China                   Chine                  CN                      CHN                     156
Denmark                 Danemark               DK                      DNK                     208
Egypt                   Egypte                 EG                      EGY                     818
France                  La France              FR                      FRA                     250
. . .                   . . .                  . . .                   . . .                   . . .
Zimbabwe                Zimbabwe               ZW                      ZWE                     716

Value Domain
Name:  Context:  Definition:  Unique ID: 4572  Value Domain:  Maintenance Org.  Steward:  Classification:  Registration Authority:  Others
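One way to reconcile sources that record the same fact in different terms is to map every known representation to a single canonical code. The sketch below uses only the ISO 3166 values listed on this slide and picks the 3-alpha code as the canonical form; that choice, and the function names, are assumptions made for illustration.

```python
# Values listed on the slide: English name, French name, 2-alpha,
# 3-alpha, 3-numeric.
ISO_3166_ROWS = [
    ("Algeria", "L`Algérie", "DZ", "DZA", "012"),
    ("Belgium", "Belgique", "BE", "BEL", "056"),
    ("China", "Chine", "CN", "CHN", "156"),
    ("Denmark", "Danemark", "DK", "DNK", "208"),
    ("Egypt", "Egypte", "EG", "EGY", "818"),
    ("France", "La France", "FR", "FRA", "250"),
    ("Zimbabwe", "Zimbabwe", "ZW", "ZWE", "716"),
]


def build_index(rows):
    """Map every representation (case-insensitive) to the 3-alpha code."""
    index = {}
    for row in rows:
        alpha3 = row[3]
        for term in row:
            index[term.casefold()] = alpha3
    return index


def to_alpha3(value, index):
    """Return the canonical 3-alpha code, or None if the value is unknown."""
    key = str(value).strip()
    if key.isdigit():
        key = key.zfill(3)  # numeric codes may arrive without leading zeros
    return index.get(key.casefold())


INDEX = build_index(ISO_3166_ROWS)
```

For example, `to_alpha3("Belgique", INDEX)`, `to_alpha3(" be ", INDEX)` and `to_alpha3(56, INDEX)` all resolve to the same code, while an unknown value returns `None` so it can be routed to manual review.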
Data Filling
• Manual
• Statistical Imputation
• Temporal
• Spatial
• Spatial-temporal
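As a rough illustration of the statistical and temporal options above, the sketch below fills missing values (represented as None) in three ways: median imputation, carrying the last known value forward, and linear interpolation between known neighbors. The weekly wage series is hypothetical, and spatial filling is not shown.

```python
from statistics import median


def impute_median(values):
    """Statistical imputation: replace each missing value with the
    median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)  # raises StatisticsError if nothing is known
    return [fill if v is None else v for v in values]


def fill_forward(values):
    """Temporal filling: carry the last known value forward."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def interpolate_linear(values):
    """Temporal filling: interpolate interior gaps between known
    neighbors; leading and trailing gaps are left as None."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


weekly_wage = [500, None, None, 650, None]  # hypothetical series
```

Which method is appropriate depends on the field: forward fill suits values that persist until changed, while interpolation suits quantities that drift smoothly over time.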
Deriving Data = Power
• Totals: Household Income
• Trends: Rate of Medical Bill Increases
• Ratios: Claims/Premium, Target/Median
• Friction: Level of inconvenience, ratio of rental to damage
• Sequences: Lawyer-Doctor, Auto-Life Policy
• Circumstances: Minimal Impact Severe Trauma
• Temporal: Loss shortly after adding collision
• Spatial: Distance to Service, proximity of stakeholders
• Logged: Progress Notes, Diaries
• Who did it, When, “Why”
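Two of the derived fields above, the Claims/Premium ratio and the "loss shortly after adding collision" flag, can be sketched in a few lines. The function names and the 30-day window are assumptions chosen for illustration, not thresholds taken from the slides.

```python
from datetime import date


def loss_ratio(incurred_claims, earned_premium):
    """Ratio of claims to premium; None when premium is not positive."""
    if earned_premium <= 0:
        return None
    return incurred_claims / earned_premium


def days_since_coverage_added(coverage_added, loss_date):
    """Number of days between adding a coverage and the reported loss."""
    return (loss_date - coverage_added).days


def loss_shortly_after_adding(coverage_added, loss_date, window_days=30):
    """True when the loss falls within window_days after the coverage
    was added (window_days is an assumed, tunable threshold)."""
    elapsed = days_since_coverage_added(coverage_added, loss_date)
    return 0 <= elapsed <= window_days


# Hypothetical policy: collision added on 1 March, loss on 11 March.
added, loss = date(2021, 3, 1), date(2021, 3, 11)
```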
Deriving Data = Power (Cont’d)
• Behavioral: Deviation from past usage, spike buying
• Experience Profiles: Vendor, Doctor, Premium Audit
• Channel: How applied, How reported, Service Chain
• Legal Jurisdiction: Venue Disposition, Rules
• Demographics: Working, Weekly wage, lost income
• Firmographics: Industry Class Code Vs Injuries Claimed
• Inflation: Wage, Medical, Goods, Auto, COLA
• Gov’t Statistics: Crime Rate, Employment, Traffic
• Other Stats: Rents, Occupancy, Zoning, Mgd Care
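The behavioral item, deviation from past usage, is often expressed as a standardized distance from an entity's own history. The sketch below uses a plain z-score as one possible measure; the choice of measure and the sample counts are assumptions.

```python
from statistics import mean, stdev


def deviation_from_history(history, current):
    """Z-score of the current value against past values.

    Returns None when there is too little history or no variation,
    since the score is undefined in those cases.
    """
    if len(history) < 2:
        return None
    spread = stdev(history)
    if spread == 0:
        return None
    return (current - mean(history)) / spread


# Hypothetical monthly purchase counts, followed by a spike.
past_usage = [10, 12, 11, 9, 13]
z = deviation_from_history(past_usage, 19)
```

A large positive score marks a spike relative to the entity's own baseline, which is a different signal from being high relative to the whole population.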
“Search” versus “Discover”
                           Search (goal-oriented)   Discover (opportunistic)
Structured Data            Data Retrieval           Data Mining
Unstructured Data (Text)   Information Retrieval    Text Mining
Searching: Word Replacement Lists
• Input Value: [Jim]
• Transformed Input Value: [JAMES]
• Stored values Jimmy, Jim, James are each transformed to JAMES
• Returns “Similar Matches”
• All Records Found: Jimmy, Jim, James
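The word replacement search above can be sketched as follows: both the query and the stored values are transformed through the same replacement list before comparison, so [Jim] finds Jimmy, Jim, and James. The replacement list here is a two-entry hypothetical; a production list would be far larger.

```python
# Hypothetical word replacement list: variant -> standard form.
REPLACEMENTS = {"JIM": "JAMES", "JIMMY": "JAMES"}


def normalize(name):
    """Upper-case the name and apply the word replacement list."""
    key = name.strip().upper()
    return REPLACEMENTS.get(key, key)


def search(records, query):
    """Return every record whose transformed value equals the
    transformed input value."""
    target = normalize(query)
    return [r for r in records if normalize(r) == target]


records = ["Jimmy", "Jim", "James", "Janet"]
found = search(records, "Jim")
```

Because the transformation is applied on both sides, the original stored values are returned unchanged; only the comparison uses the standard form.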
Motivation for Text Mining
• Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation)
• Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery.
Structured Numerical or Coded Information: 10%
Unstructured or Semi-structured Information: 90%
Techniques for attacking text data:
• Rules-based
• Statistical Text Analysis and Clustering
• Linguistic and Semantic Clustering
• Support Vector Machines
• Pattern Matching or other statistical algorithms
• Neural Networks
• Combination of methods from above
Text is like a data iceberg
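The simplest of the techniques above, a rules-based approach, can be sketched as a set of named patterns applied to each note. The rule names and keyword patterns below are hypothetical examples, not rules taken from the presentation.

```python
import re

# Hypothetical rules: flag name -> pattern that triggers the flag.
RULES = {
    "attorney_involved": re.compile(
        r"\b(attorney|lawyer|counsel)\b", re.IGNORECASE
    ),
    "subrogation_potential": re.compile(
        r"\b(rear[- ]ended|other driver|third party)\b", re.IGNORECASE
    ),
}


def apply_rules(note, rules=RULES):
    """Return the sorted names of all rules whose pattern occurs in the note."""
    return sorted(name for name, pattern in rules.items() if pattern.search(note))


flags = apply_rules("Insured was rear-ended; claimant has retained an attorney.")
```

Rules like these are transparent and easy to audit, but they miss phrasings nobody anticipated, which is where the statistical and linguistic techniques in the list come in.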
Claims processing – Progress notes and Diaries
CLAIMS ADJUSTER
Service
• Home Office Staff
• Field Office Claim Staff
• Insured Risk Manager
• Agent or Broker
• Medical Management Staff
• Special Investigation Unit
• NICB
• Vendor Management
• Consulting Engineers
• Hearing Representative
• Structured Settlement Unit
• Recovery Staff
• Legal Staff
• Diary forward – “call Dr Jones next week”
• Business Rule – large loss review
• System Reminder – update case reserves
• Correspondence Tracking – legal letter sent
Semantic processing: Named Entity Extraction
• Identify and type language features
• Examples:
• People names
• Company names
• Geographic location names
• Dates
• Monetary amount
• Phone #, zipcodes, SSN, FEIN
• Others… (domain specific)
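For the well-formatted entity types above (dates, monetary amounts, phone numbers, zip codes, SSNs), pattern matching already goes a long way. The sketch below is a simplified illustration assuming US-style formats; the patterns and the sample note are hypothetical, and names of people, companies, and places generally need dictionaries or trained models rather than regular expressions.

```python
import re

# Simplified, US-centric patterns; each uses only non-capturing groups
# so that findall returns the full matched text.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "money": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "zip": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
}


def extract_entities(text, patterns=PATTERNS):
    """Return a dict mapping each entity type to the list of matches found."""
    return {name: pattern.findall(text) for name, pattern in patterns.items()}


note = (
    "Spoke with claimant on 3/14/2019 at 860-555-0123; "
    "reserve raised to $12,500.00. Mail to Hartford, CT 06103."
)
entities = extract_entities(note)
```

Typing each match (date, money, phone) is what turns free text into fields that can be joined back to the structured claim data.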