Models: Do You Trust Them? 2003 CAS Annual Meeting Louise Francis, FCAS, MAAA Louise_Francis@msn.com Francis Analytics and Actuarial Data Mining, Inc.
Overview • Data Quality • Data Cleaning • Software Errors • Model Assumptions • Questions About Key Assumptions Underlying Popular Models in Finance • Option Pricing Theory • Value at Risk • CAPM
Data Mining Models • Advanced modeling techniques applied to large databases • Many records • Many variables • Some uses • Credit scoring • Fraud detection • Pricing
Data Issues • “Misplaced faith in black boxes: Data Mining is sometimes perceived as a black box, where you feed the data in and interesting results and patterns emerge. Such an approach is particularly misleading when no prior knowledge or experience is used to validate the results of the mining exercise” • Exploratory Data Mining and Data Cleaning, by Dasu and Johnson
Data Exploration and Cleaning • The overwhelming majority of the effort in data modeling is expended on understanding and cleaning data • Generally 85% or more of the effort is spent on data issues • This gets the modeler to the point of applying a modeling technique
Dirty Data • A fact of life for actuaries • Even more of a problem when working with large, complex databases • The information for many variables that are not used to produce key financial numbers is inaccurately or incompletely recorded
Examples of Data Problems • Examples are based on actual problems encountered in Data Mining projects • Examples use simulated data
Detecting Unusual Data: Box and Whisker Plot of Workers’ Compensation Payments
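A box and whisker plot flags values that lie beyond its whiskers. A minimal sketch of that check follows, assuming the common Tukey convention of whiskers at 1.5 times the interquartile range (the slide does not say which convention was used). The payment amounts are simulated, and the two planted errors and the function names are hypothetical.

```python
import random
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) whisker limits: quartiles -/+ k * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return the values that fall outside the whisker limits."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Simulated payments (hypothetical, lognormal), not real claim data.
rng = random.Random(0)
payments = [rng.lognormvariate(8.0, 1.0) for _ in range(500)]

# Two planted data errors: a negative payment and a misplaced decimal point.
payments += [-50000.0, 5.0e7]

outliers = flag_outliers(payments)
print(len(outliers), "payments fall outside the whiskers")
```

Because payment data are strongly right-skewed, this rule also flags many legitimate large claims, so the flagged records are candidates for review rather than confirmed errors.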
Data Challenges • Heterogeneity and Diversity of Data • Join Keys • Scale • Metadata
The Fraud Study Data • 1993 AIB closed PIP claims • Dependent Variables • Suspicion Score • Expert assessment of likelihood of fraud or abuse • Predictor Variables • Red flag indicators • Claim file variables • Errors were introduced into data for two variables, suspicion score and claimant age
Data Spheres • Applied to numeric data • Can apply to a number of variables simultaneously to detect outliers • Compute standardized value for each variable, yi • Compute Mahalanobis distance:
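For reference, the standard definition of the Mahalanobis distance is sketched below. The symbols other than y_i are assumptions rather than the slide's own notation: x is a record's vector of variable values, μ the vector of means, Σ the covariance matrix, σ_i the standard deviation of variable i, y the vector of standardized values y_i, and R the correlation matrix.

```latex
D(\mathbf{x})
  = \sqrt{(\mathbf{x}-\boldsymbol{\mu})^{\top}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})}
  = \sqrt{\mathbf{y}^{\top} R^{-1}\,\mathbf{y}},
\qquad y_i = \frac{x_i - \mu_i}{\sigma_i}
```

The two forms agree because Σ = S R S, where S is the diagonal matrix of standard deviations.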
Data Spheres • More typical values on variables will fall at the center of the data sphere • Less typical values and outliers will be in outer layers • Can look at which variables most influence the Mahalanobis distance
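The idea can be sketched for two variables as follows. This is an illustration under stated assumptions, not the procedure used in the study: the data are simulated, the variable pair (claimant age and suspicion score) is borrowed from the fraud example, the planted error is hypothetical, and layering by distance rank is a simplification of the data sphere approach.

```python
import math
import random
import statistics

def standardize(column):
    """Return (value - mean) / sample standard deviation for each value."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    return [(v - mean) / sd for v in column]

def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) record from the data center.

    Works on standardized values, so the covariance matrix reduces to the
    correlation matrix [[1, r], [r, 1]], whose inverse has a closed form.
    """
    zx, zy = standardize(xs), standardize(ys)
    n = len(zx)
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)  # sample correlation
    det = 1.0 - r * r
    return [math.sqrt((a * a - 2.0 * r * a * b + b * b) / det)
            for a, b in zip(zx, zy)]

def sphere_layers(distances, n_layers=4):
    """Assign each record a layer by distance rank: 0 = center, n_layers - 1 = outermost."""
    order = sorted(range(len(distances)), key=distances.__getitem__)
    layers = [0] * len(distances)
    for rank, idx in enumerate(order):
        layers[idx] = rank * n_layers // len(distances)
    return layers

# Simulated data (hypothetical): claimant age and a 0-10 suspicion score.
rng = random.Random(1)
ages = [rng.gauss(40.0, 12.0) for _ in range(300)]
scores = [min(10.0, max(0.0, rng.gauss(3.0, 2.0))) for _ in range(300)]

# Planted error: an impossible age on the last record (index 300).
ages.append(250.0)
scores.append(3.0)

dist = mahalanobis_2d(ages, scores)
layers = sphere_layers(dist)
worst = max(range(len(dist)), key=dist.__getitem__)
print("most unusual record:", worst, "layer", layers[worst])
```

Comparing the standardized age and score for the flagged record shows which variable drives the distance; here it is the age.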
Spreadsheet Errors • A large percentage of spreadsheets contain errors. One study found errors in 86% of spreadsheets • From Raymond Panko, “What We Know About Spreadsheet Errors” • Methods for finding and correcting errors are fairly well developed for programming in computer languages • Such methods are much less frequently applied when the model is in a spreadsheet
Questioning Model Assumptions • Option Pricing Theory
Option Pricing Theory • Option Pricing Formula widely used in finance in pricing options and other derivatives • The formula assumes asset distributions are normal or lognormal • Evidence that asset return data does not follow the normal distribution is widely available • 1976 Fama paper in Journal of the American Statistical Association
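To make the distributional assumption concrete, here is a minimal sketch of a European call price, assuming the formula referred to is the Black-Scholes formula (the slide does not name it). The function name, parameter names, and input values are illustrative. The normal cumulative distribution function appears directly in the price, which is where the lognormal assumption for the asset price enters.

```python
import math
from statistics import NormalDist

def black_scholes_call(spot, strike, rate, vol, years):
    """European call price under Black-Scholes (no dividends).

    Assumes the asset price at expiry is lognormal, i.e. log returns
    are normally distributed with constant volatility `vol`.
    """
    n = NormalDist()  # standard normal
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * n.cdf(d1) - strike * math.exp(-rate * years) * n.cdf(d2)

# Illustrative inputs only: at-the-money call, 5% rate, 20% volatility, 1 year.
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05, vol=0.20, years=1.0)
print(round(price, 2))
```

If actual returns have fatter tails than the normal distribution, the probabilities supplied by `n.cdf` understate the chance of large moves, and the price inherits that error.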
Normal Distribution Assumption • The normality assumption is common in other finance applications • Value at risk • CAPM
Consequences of Assuming Normality • The frequency of extreme events is underestimated – often by a lot • Example: Long-Term Capital Management • “Theoretically, the odds against a loss such as August’s had been prohibitive, such a debacle was, according to mathematicians, an event so freakish as to be unlikely to occur even once over the entire life of the universe and even over numerous repetitions of the universe” • When Genius Failed by Roger Lowenstein, p. 159
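The size of the underestimate can be illustrated with a short calculation. The sketch below compares the probability of a k-standard-deviation daily loss under the normal distribution with the same probability under a fat-tailed alternative. The alternative, a Student t distribution with 3 degrees of freedom rescaled to unit variance, is an assumption chosen for illustration and is not a model proposed in the presentation; the 252-trading-day year is likewise an assumed convention.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumed convention

def normal_tail(k):
    """P(Z < -k) for a standard normal, via erfc for accuracy far in the tail."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def t3_tail(k):
    """P(X < -k) for a Student t with 3 degrees of freedom, rescaled to unit variance.

    Uses the closed-form t(3) distribution function; with unit-variance
    scaling the argument t / sqrt(3) simplifies to k.
    """
    return 0.5 - (math.atan(k) + k / (1.0 + k * k)) / math.pi

for k in (3, 5, 10):
    p_norm = normal_tail(k)
    p_fat = t3_tail(k)
    years = 1.0 / (p_norm * TRADING_DAYS_PER_YEAR)
    print(f"{k}-sigma daily loss: normal {p_norm:.2e} "
          f"(about once per {years:.1e} years), fat-tailed {p_fat:.2e}")
```

Under normality a 10-standard-deviation day is effectively impossible, while under the fat-tailed alternative it is merely rare, which is the gap the Long-Term Capital example describes.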