DATA MINING: AN OVERVIEW BY: JOSEPH CASABONA
OVERVIEW • What is Data Mining? • Introduction to KDD • Type of Data found using Data Mining • The 4 Goals of Data Mining • Case Study: MetLife
WHAT IS DATA MINING? • Definition: The mining or discovery of new information, in the form of patterns or rules, from vast amounts of data • Adds more functionality than a DBMS • Creates relationships within the data • One step in the KDD process
KDD • Stands for "Knowledge Discovery in Databases" • Six-step process that helps us organize existing data and extract new information from it • The six steps are: data selection, cleansing, enrichment, transformation, mining, and report generation.
KDD CONT. • Selection and cleansing grab and validate the data to make sure it is correct, complete, and properly formatted. • Enrichment adds more to the data from other sources. • Transformation then reduces or encodes the data, for example by grouping detailed values into broader categories.
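The pre-mining steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, the field names, and the two lookup tables (`ZIP_TO_STATE`, `ITEM_TO_CATEGORY`) are hypothetical and do not come from the presentation.

```python
# Toy KDD pre-processing pipeline. All data and names here are hypothetical.
raw_records = [
    {"customer": "A", "zip": "10001", "item": "milk", "price": 3.50},
    {"customer": "B", "zip": None, "item": "bread", "price": 2.25},     # incomplete
    {"customer": "C", "zip": "02108", "item": "butter", "price": -1.0},  # invalid
    {"customer": "D", "zip": "02108", "item": "cheese", "price": 6.00},
]

# Hypothetical external source used for enrichment.
ZIP_TO_STATE = {"10001": "NY", "02108": "MA"}
# Hypothetical grouping used for transformation.
ITEM_TO_CATEGORY = {"milk": "dairy", "butter": "dairy",
                    "cheese": "dairy", "bread": "bakery"}


def select(records):
    """Selection: keep only the fields relevant to the analysis."""
    fields = ("zip", "item", "price")
    return [{f: r[f] for f in fields} for r in records]


def cleanse(records):
    """Cleansing: drop records with a missing ZIP or a non-positive price."""
    return [r for r in records if r["zip"] is not None and r["price"] > 0]


def enrich(records):
    """Enrichment: add the state from another source."""
    return [{**r, "state": ZIP_TO_STATE.get(r["zip"], "unknown")} for r in records]


def transform(records):
    """Transformation: reduce each record to state and item category."""
    return [{"state": r["state"], "category": ITEM_TO_CATEGORY[r["item"]]}
            for r in records]


prepared = transform(enrich(cleanse(select(raw_records))))
print(prepared)
```

Only records A and D survive cleansing, and the output of `transform` is what a mining step would then consume.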
DATA MINING • Result is new information the user would not know just by standard querying. • Can be in the form of: • Association Rules • Sequential Patterns • Classification Trees
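As one illustration of the first form listed above, an association rule such as "milk implies bread" is usually scored by its support (how often both sides occur together) and its confidence (how often the right side occurs when the left side does). The sketch below assumes a made-up set of shopping transactions; the function names `count`, `support`, and `confidence` are illustrative helpers, not part of any library.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


print(support({"milk", "bread"}, transactions))       # 3 of 5 transactions
print(confidence({"milk"}, {"bread"}, transactions))  # 3 of the 4 with milk
```

A mining tool would search over many candidate rules and keep those whose support and confidence exceed chosen thresholds; this sketch only scores a single rule.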
THE FOUR GOALS OF DATA MINING • Prediction: Using current data to make predictions about future activities • Identification: "Data patterns can be used to identify the existence of an item, an event, or an activity"
THE FOUR GOALS CONT. • Classification: Breaking the data down into categories based on certain attributes. • Optimization: Using the mined data to optimize the use of resources such as time and money.
DATA MINING EXAMPLES • Most have been consumer-based • Applicable in most industries • Next: Case Study on MetLife
CASE STUDY: METLIFE Company Profile MetLife, Inc. is a leading provider of insurance and other financial services to millions of individual and institutional customers throughout the United States. Established in 1863, MetLife now has offices all over the US and the world, and offers ten different types of insurance and financial services.
CASE STUDY: METLIFE Industry: Insurance and Financial Services How they use Data Mining: Fraud Detection
CASE STUDY: METLIFE • Project first started in 2001 • MetLife set out to build a $50 million relational database • This project would consolidate data from 30 businesses worldwide.
CASE STUDY: METLIFE • Around the same time, it was reported that $30 million of insurance money went to fraudulent claims. • MetLife teamed up with Computer Sciences Corporation (CSC) to • License their data mining tool (called Fraud Investigator), • Develop @First, "an early fraud detection system"
CASE STUDY: METLIFE • By 2003, MetLife's data mining operation was in full swing. • They were able to detect fraud in a fraction of the time that manual review would take • One example is detecting rate evasion
CASE STUDY: METLIFE • Rate evasion is lying about where you live to pay lower premiums. • MetLife used data mining to detect rate evasion by matching ZIP codes with phone numbers to see if the cities matched. • In 2.5 hours, MetLife found 107 fraudulent claims, all linked to a rate-evasion ring in New York and Massachusetts.
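The ZIP-versus-phone check described above can be sketched in a few lines. This is not MetLife's or CSC's actual implementation, which the presentation does not describe. The lookup tables, policy records, and the `flag_mismatches` helper are all hypothetical, and using the phone number's area code to infer a city is an assumption made for illustration.

```python
# Hypothetical reference tables; a real system would use complete
# ZIP-code and area-code reference data.
ZIP_TO_CITY = {"10001": "New York", "02108": "Boston", "12207": "Albany"}
AREA_CODE_TO_CITY = {"212": "New York", "617": "Boston", "518": "Albany"}

# Hypothetical policy records.
policies = [
    {"policy_id": "P1", "zip": "12207", "phone": "212-555-0101"},
    {"policy_id": "P2", "zip": "02108", "phone": "617-555-0102"},
    {"policy_id": "P3", "zip": "10001", "phone": "212-555-0103"},
]


def flag_mismatches(policies):
    """Return IDs of policies whose ZIP city and phone city disagree."""
    flagged = []
    for p in policies:
        zip_city = ZIP_TO_CITY.get(p["zip"])
        phone_city = AREA_CODE_TO_CITY.get(p["phone"][:3])
        # Flag only when both lookups succeed and the cities differ.
        if zip_city and phone_city and zip_city != phone_city:
            flagged.append(p["policy_id"])
    return flagged


print(flag_mismatches(policies))
```

In this toy data only P1 is flagged, because its ZIP code maps to Albany while its area code maps to New York. A mismatch is a reason to investigate rather than proof of fraud, since people move and keep their phone numbers.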