Mining Financial Data: Histograms & Contingency Tables


Presentation Transcript


  1. In memory of Dr. Jan Zytkow (SEP 09 1944 - JAN 16 2001) • ITCS-8265 • Mining Financial Data: Histograms & Contingency Tables • Shishir Gupta • Under the guidance of Dr. Mirsad Hadzikadic

  2. Agenda • Database • Task goals • Tool & technique used • Data preparation and cleaning • Attribute selection • Data transformation • Data Mining/Pattern Evaluation • Knowledge presentation • Pros/Cons • Questions & Demonstration

  3. Database • Financial dataset from PKDD 1999 • Data from a Czech bank • Relational dataset • 8 relations: ACCOUNT, LOAN, DEMOGRAPHIC, ORDER, TRANSACTION, CARD, DISPOSITION, CLIENT

  4. Task Goals • Identify good clients to whom additional services can be offered • Identify bad clients who should be watched carefully to minimize bank losses • Services to offer: • Loan • Credit card

  5. Technique Used - Histogram • SQL statement used:
     SELECT age, COUNT(age) FROM table_x GROUP BY age ORDER BY age
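
To try the histogram query end to end, here is a minimal sketch that runs it against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is an assumption (the slides do not name the RDBMS), and the contents of table_x are made-up toy values.

```python
import sqlite3

# Hypothetical stand-in for table_x; the ages are toy values, not study data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO table_x (age) VALUES (?)",
    [(25,), (31,), (25,), (47,), (31,), (25,)],
)

# The histogram query from the slide: one row per distinct age with its frequency.
rows = conn.execute(
    "SELECT age, COUNT(age) FROM table_x GROUP BY age ORDER BY age"
).fetchall()

# Render a crude text histogram, one '*' per occurrence.
for age, count in rows:
    print(f"{age:>3} | {'*' * count} ({count})")
```

Note that COUNT(age) skips NULL ages; COUNT(*) would count them too.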

  6. Technique Used – C-Tables • SQL statement used:
     SELECT sex, COUNT(sex), age FROM table_x a, table_y b WHERE a.id = b.fid GROUP BY sex, age ORDER BY sex, age
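
The contingency-table query returns one (sex, count, age) row per cell. A minimal sketch, assuming SQLite via Python's sqlite3 and a hypothetical schema in which table_x holds SEX and table_y holds an already bucketed AGE joined on fid, shows how those rows can be pivoted into a two-way table.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Hypothetical schema and toy values, for illustration only.
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, sex TEXT)")
conn.execute("CREATE TABLE table_y (fid INTEGER, age TEXT)")
conn.executemany("INSERT INTO table_x VALUES (?, ?)",
                 [(1, "F"), (2, "M"), (3, "F"), (4, "M"), (5, "F")])
conn.executemany("INSERT INTO table_y VALUES (?, ?)",
                 [(1, "20-39"), (2, "20-39"), (3, "40-59"), (4, "40-59"), (5, "20-39")])

# The C-table query from the slide.
rows = conn.execute(
    "SELECT sex, COUNT(sex), age FROM table_x a, table_y b "
    "WHERE a.id = b.fid GROUP BY sex, age ORDER BY sex, age"
).fetchall()

# Pivot the (sex, count, age) rows: one row per sex, one column per age bucket.
ctable = defaultdict(dict)
for sex, count, age in rows:
    ctable[sex][age] = count

ages = sorted({age for _, _, age in rows})
print("sex  " + "  ".join(ages))
for sex in sorted(ctable):
    # Missing cells (no rows for that combination) are shown as 0.
    print(f"{sex:<4} " + "  ".join(f"{ctable[sex].get(a, 0):>5}" for a in ages))
```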

  7. Technique Used – Correlation • SQL statement used:
     SELECT x, y FROM table_x a, table_y b WHERE a.id = b.fid ORDER BY x, y
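
The correlation query only fetches the (x, y) pairs; the coefficient itself still has to be computed. The slides do not say which coefficient the tool used, so this sketch assumes the Pearson correlation coefficient and computes it in plain Python over pairs fetched from a toy SQLite database.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with toy values: x in table_x, y in table_y keyed by fid.
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, x REAL)")
conn.execute("CREATE TABLE table_y (fid INTEGER, y REAL)")
conn.executemany("INSERT INTO table_x VALUES (?, ?)", [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)])
conn.executemany("INSERT INTO table_y VALUES (?, ?)", [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)])

# The correlation query from the slide.
pairs = conn.execute(
    "SELECT x, y FROM table_x a, table_y b WHERE a.id = b.fid ORDER BY x, y"
).fetchall()

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

r = pearson(pairs)
print(f"r = {r:.3f}")  # y = 2x here, so r is 1.000
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship, though a non-linear one may still exist.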

  8. Tool - Architecture

  9. Tool - Description

  10. Data Cleaning • Missing values: relation DEMOGRAPHIC • Incorrect values: relation TRANSACTION (data reduced by 10% after cleaning)
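
The slide names the kinds of problems and the relations they affect but not the exact rules. The sketch below is therefore built on hypothetical placeholder rules: a field counts as missing if it is empty or "?", and a transaction counts as incorrect if its amount is not a positive number. It also shows how a reduction percentage like the one quoted can be computed.

```python
# Hypothetical cleaning rules; not the rules actually used in the study.
MISSING = {"", "?"}

def drop_missing(rows):
    """Drop rows (dicts of raw strings) in which any field is missing."""
    return [r for r in rows if not any(v.strip() in MISSING for v in r.values())]

def drop_incorrect_amounts(rows, field="amount"):
    """Drop rows whose `field` is not a parseable, positive number."""
    kept = []
    for r in rows:
        try:
            if float(r[field]) > 0:
                kept.append(r)
        except ValueError:
            pass  # an unparseable amount counts as an incorrect value
    return kept

def reduction_pct(before, after):
    """Percentage of rows removed by cleaning."""
    return 100.0 * (len(before) - len(after)) / len(before)

# Toy rows with hypothetical field names, for illustration only.
demographic = [{"id": "1", "population": "12000"}, {"id": "2", "population": "?"}]
transactions = [{"amount": "700.0"}, {"amount": "-5"}, {"amount": "abc"}, {"amount": "100"}]

print(len(drop_missing(demographic)))      # 1
clean = drop_incorrect_amounts(transactions)
print(reduction_pct(transactions, clean))  # 50.0
```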

  11. Data Preparation • Relation CLIENT: SEX and BDATE separated from BIRTHNUMBER • All date fields converted to AGE (reference date 199901)
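
Separating SEX and BDATE from BIRTHNUMBER and turning dates into ages can be sketched as follows. This assumes the PKDD'99 coding of birth numbers (YYMMDD for men, with 50 added to the month for women), YYMMDD-coded dates in the 1900s, and that "Ref 199901" means 1 January 1999; the helper names are hypothetical.

```python
from datetime import date

# Assumption: "Ref 199901" is read as the reference date 1 January 1999.
REF = date(1999, 1, 1)

def parse_yymmdd(value):
    """Parse a YYMMDD-coded date (e.g. 930705); assumes all years are 19YY."""
    s = f"{int(value):06d}"
    return date(1900 + int(s[:2]), int(s[2:4]), int(s[4:]))

def split_birth_number(bn):
    """Split a birth number into (SEX, BDATE), assuming the PKDD'99 coding:
    YYMMDD for men, YYMM+50DD (month plus 50) for women."""
    s = f"{int(bn):06d}"
    month = int(s[2:4])
    sex = "F" if month > 50 else "M"
    if sex == "F":
        month -= 50
    return sex, date(1900 + int(s[:2]), month, int(s[4:]))

def age_at(d, ref=REF):
    """Age in whole years of date d at the reference date."""
    return ref.year - d.year - ((ref.month, ref.day) < (d.month, d.day))

sex, bdate = split_birth_number(706213)
print(sex, bdate, age_at(bdate))     # F 1970-12-13 28
print(age_at(parse_yymmdd(930705)))  # 5
```

The same age_at helper turns any other date field (for example an account opening date) into an age in years at the reference date.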

  12. Data Preparation (cont.) • Creating table definitions • Converting the data into a table-compatible format • Loading the data into the database • Evaluating loading errors and changing attribute definitions accordingly
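
The load-then-fix cycle on this slide can be sketched in Python with the csv and sqlite3 modules. The table definition, the ';' delimiter, and the sample rows are assumptions for illustration; the point is that values failing their declared type are reported per line and column, so the attribute definition can be changed and the data reloaded.

```python
import csv
import io
import sqlite3

# Hypothetical definition for a LOAN-like table: column name -> converter
# for the declared type. Names and types are illustrative only.
LOAN_COLUMNS = {"loan_id": int, "account_id": int, "amount": int, "status": str}
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}

def load(conn, table, columns, text):
    """Create `table`, load ';'-separated text with a header row into it,
    and return (line_no, column, raw_value) for every value that does not
    convert to its declared type. Rows with such values are not inserted.
    `table` is interpolated into SQL, so it must be a trusted name."""
    col_defs = ", ".join(f"{c} {SQL_TYPES[t]}" for c, t in columns.items())
    conn.execute(f"CREATE TABLE {table} ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    errors = []
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        values = []
        for col, convert in columns.items():
            try:
                values.append(convert(row[col]))
            except (ValueError, TypeError):
                errors.append((line_no, col, row[col]))
        if len(values) == len(columns):
            conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", values)
    return errors

# Made-up sample: the second data row has a decimal amount.
SAMPLE = "loan_id;account_id;amount;status\n1;10;5000;A\n2;11;1250.50;C\n"

conn = sqlite3.connect(":memory:")
print(load(conn, "loan", LOAN_COLUMNS, SAMPLE))  # [(3, 'amount', '1250.50')]

# Changing the attribute definition of amount to REAL lets every row load.
fixed = dict(LOAN_COLUMNS, amount=float)
print(load(conn, "loan_fixed", fixed, SAMPLE))   # []
```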

  13. Attribute Selection • Decision relation: LOAN • Decision attribute: STATUS • Classification attributes: all attributes that do not belong to the LOAN relation
      [Figure: example decision tree testing attributes A4?, A1? and A6? (Y/N branches) with leaves Class1 and Class2]
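
A decision attribute derived from LOAN.STATUS, joined to classification attributes from another relation, might look like the sketch below. It assumes the PKDD'99 STATUS codes (A: finished, no problems; B: finished, loan not paid; C: running, OK so far; D: running, client in debt) and an assumed GOOD/BAD split of A and C versus B and D; the slides do not state the mapping used. The frequency column and its values are hypothetical toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy relations; frequency stands in for any non-LOAN classification attribute.
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, frequency TEXT)")
conn.execute("CREATE TABLE loan (loan_id INTEGER PRIMARY KEY, account_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(1, "monthly"), (2, "weekly"), (3, "monthly")])
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [(1, 1, "A"), (2, 2, "D"), (3, 3, "C")])

# One training row per loan: attributes from other relations plus the class
# derived from LOAN.STATUS under the assumed GOOD/BAD mapping.
rows = conn.execute("""
    SELECT a.frequency,
           CASE WHEN l.status IN ('A', 'C') THEN 'GOOD' ELSE 'BAD' END AS loan_class
    FROM loan AS l JOIN account AS a ON a.account_id = l.account_id
    ORDER BY l.loan_id
""").fetchall()
print(rows)  # [('monthly', 'GOOD'), ('weekly', 'BAD'), ('monthly', 'GOOD')]
```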

  14. Data Transformation • Discretization: continuous attributes divided into 4 to 10 buckets • Only transactions performed in 1997 were considered for relation TRANSACTION • Due to resource limitations • The largest number of loans was approved during this period
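
Both transformations can be sketched in a few lines. The slides give the number of buckets but not the binning method, so equal-width binning is an assumption here, as is the YYMMDD coding of transaction dates used by the hypothetical in_1997 filter.

```python
def equal_width_bins(values, k):
    """Discretize numeric values into k equal-width buckets numbered 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:  # all values equal: everything goes into one bucket
        return [0] * len(values)
    # The maximum value would compute to bucket k, so clamp it into bucket k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]

def in_1997(yymmdd):
    """True if a YYMMDD-coded date (e.g. 970315) falls in 1997."""
    return 970101 <= int(yymmdd) <= 971231

# Toy values for illustration only; k would be chosen between 4 and 10.
print(equal_width_bins([0, 10, 20, 30, 40], k=4))                    # [0, 1, 2, 3, 3]
print([d for d in (961231, 970101, 970815, 980102) if in_1997(d)])  # [970101, 970815]
```

Equal-frequency binning is a common alternative when a continuous attribute is heavily skewed, since equal-width buckets can then leave most values in one bucket.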

  15. Data Mining/Pattern Evaluation • Run a histogram on every non-key attribute to study its distribution • Discretize continuous attributes • Build contingency tables to study the relationship between pairs of attributes • If both attributes are continuous, check the strength of the relationship with the correlation function

  16. Knowledge Presentation - 1 • All loans on accounts that a second person is also allowed to dispose of are GOOD LOANS (100%)
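
A statement like "GOOD LOANS (100%)" is read off a contingency table as a row percentage. The sketch below converts counts into row percentages; the counts and row labels are made up for illustration and are not the study's figures.

```python
def row_percentages(ctable):
    """Convert a {row: {col: count}} contingency table into row percentages."""
    out = {}
    for row, cols in ctable.items():
        total = sum(cols.values())
        out[row] = {c: 100.0 * n / total for c, n in cols.items()}
    return out

# Made-up counts, for illustration only.
ctable = {
    "second person": {"GOOD": 12, "BAD": 0},
    "owner only":    {"GOOD": 30, "BAD": 10},
}
pct = row_percentages(ctable)
print(pct["second person"])  # {'GOOD': 100.0, 'BAD': 0.0}
print(pct["owner only"])     # {'GOOD': 75.0, 'BAD': 25.0}
```

A 100% row is only as convincing as the row total behind it, which is why the absolute counts are worth reporting next to the percentage.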

  17. Knowledge Presentation - 2 • Permanent orders of type household & leasing indicate financial stability

  18. Knowledge Presentation - 3 • Accounts with Cash withdrawals are more likely to repay their loans

  19. Knowledge Presentation - 4 • Accounts with low transaction amounts indicate good loans

  20. Knowledge Presentation - 5 • Accounts that are in debt indicate BAD LOANS

  21. Pros • Flexibility to alter the data presentation to understand the nature of the data • Customers with no background in data mining can appreciate the output because of its simplicity • Since results can be stored in a file, subsequent analysis of a subset of the data is very easy

  22. Cons • Lacks the capability for multivariate analysis • Some kind of quantification needs to be added • Performance issues with using an RDBMS

  23. Questions & Demonstration Thank You
