Why the Information Explosion Can Be Bad for Data Mining, and How Data Fusion Provides a Way Out

Written By: Putten, Kok, Gupta. Presented By: Ernesto Ochandio, DSCI 5240, November Dec 7, 2005.


Presentation Transcript


  1. Why the Information Explosion Can Be Bad for Data Mining, and How Data Fusion Provides a Way Out
     Written By: Putten, Kok, Gupta
     Presented By: Ernesto Ochandio
     DSCI 5240
     November Dec 7, 2005

  2. Problem Definition
     • Exponential growth in data capture leads to data fragmentation.
         • POS customer tracking
         • Corporate Data Warehouse
         • Advanced Analytics
     • Increased popularity of personalized messages.
     • Prohibitive attitudinal data costs.

  3. Data Fusion Overview
     • Data fusion is the combination of information from different sources.
     • Also known as: Micro Data Set Merging, Statistical Record Linkage, and Multi-Source Imputation
     • Example:
         • Demographic and psychographic data aggregated at geographical level.
         • Same characteristics for people in the same region.
     • Motivation:
         • Algorithms can create generalized fusions providing richer data sets for use in applications or future data mining projects.
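
The geographic example on this slide can be sketched in a few lines of Python. This is only an illustration: the region names, profile fields, and values below are hypothetical and do not come from the presentation. It shows how a profile aggregated per region is attached to every customer in that region, so that people in the same region end up with the same characteristics.

```python
# Hypothetical region-level profiles (illustrative values, not from the presentation).
region_profiles = {
    "north": {"median_income": 52000, "lifestyle_segment": "suburban families"},
    "south": {"median_income": 38000, "lifestyle_segment": "young urban renters"},
}

# Hypothetical customer records that only carry a region code.
customers = [
    {"customer_id": 1, "region": "north"},
    {"customer_id": 2, "region": "south"},
    {"customer_id": 3, "region": "north"},
]


def fuse_by_region(customers, region_profiles):
    """Attach each region's aggregated profile to every customer in that region."""
    fused = []
    for customer in customers:
        # Customers from an unknown region keep only their original fields.
        profile = region_profiles.get(customer["region"], {})
        fused.append({**customer, **profile})
    return fused


fused_customers = fuse_by_region(customers, region_profiles)
```

Customers 1 and 3 share a region and therefore receive identical profile values, which is exactly the coarseness that record-level fusion tries to improve on.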

  4. Data Fusion Terminology
     • Recipient, Donor, Fused Variables, Common Variables, Critical Common Variables
     [Diagram: Recipient + Donor = Fused Dataset, with the Common Variables and Fused Variables labeled]
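
One way to read these terms is as sets of column names. The sketch below assumes hypothetical columns (none of them are named in the presentation) and treats the Common Variables as the columns present in both data sets, the Fused Variables as the columns only the Donor has, and the Critical Common Variables as the subset of Common Variables that must match exactly.

```python
# Hypothetical column names, chosen only to illustrate the terminology.
recipient_columns = {"age", "gender", "region", "purchases"}
donor_columns = {"age", "gender", "region", "brand_attitude", "media_usage"}

# Common Variables: present in both the Recipient and the Donor.
common_variables = recipient_columns & donor_columns

# Fused Variables: present only in the Donor, to be carried over to the Recipient.
fused_variables = donor_columns - recipient_columns

# Critical Common Variables: an assumed subset of the Common Variables
# on which a Donor must match the Recipient exactly.
critical_common_variables = {"gender"}

# Fused Dataset: the Recipient's own columns plus the Fused Variables.
fused_dataset_columns = recipient_columns | fused_variables
```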

  5. Data Fusion Algorithm
     • Find the best Donor elements that match the Recipient element.
     • Ensure an exact match on the Critical Variables.
     • Limit Donor element usage.
     • Use averages from the Donor set to estimate the Fused variables for the Recipient set.
     [Diagram: Recipient + Donor = Fused Dataset]
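
The four steps on this slide can be sketched as a small nearest-neighbour procedure. This is a minimal sketch under stated assumptions, not the authors' implementation: the slide does not specify a distance measure, the number of donors per recipient, or how usage is limited, so squared Euclidean distance, `k`, and `max_uses` are all choices made here for illustration. The function name `fuse`, its parameters, and the sample records are hypothetical.

```python
from collections import Counter


def fuse(recipients, donors, critical, numeric_common, fused_vars, k=2, max_uses=2):
    """Estimate fused variables for each recipient from its closest donors.

    Assumptions (not from the slides): squared Euclidean distance on the
    numeric common variables, k donors per recipient, and a hard cap of
    max_uses on how often a single donor may be used.
    """
    usage = Counter()  # how many times each donor index has been used
    fused_rows = []
    for recipient in recipients:
        # Keep donors that match exactly on every critical variable
        # and have not yet reached the usage cap.
        candidates = [
            (index, donor)
            for index, donor in enumerate(donors)
            if all(donor[c] == recipient[c] for c in critical)
            and usage[index] < max_uses
        ]
        # Rank the remaining donors by distance on the numeric common variables.
        candidates.sort(
            key=lambda pair: sum(
                (pair[1][v] - recipient[v]) ** 2 for v in numeric_common
            )
        )
        best = candidates[:k]
        for index, _ in best:
            usage[index] += 1
        # Average the chosen donors' values; None when no donor qualifies.
        row = dict(recipient)
        for name in fused_vars:
            row[name] = sum(d[name] for _, d in best) / len(best) if best else None
        fused_rows.append(row)
    return fused_rows


# Hypothetical sample data.
recipients = [
    {"id": "r1", "gender": "F", "age": 30, "income": 40},
    {"id": "r2", "gender": "M", "age": 50, "income": 60},
]
donors = [
    {"gender": "F", "age": 28, "income": 42, "brand_score": 7.0},
    {"gender": "F", "age": 33, "income": 39, "brand_score": 5.0},
    {"gender": "F", "age": 60, "income": 80, "brand_score": 1.0},
    {"gender": "M", "age": 49, "income": 61, "brand_score": 3.0},
]

fused_rows = fuse(recipients, donors, ["gender"], ["age", "income"], ["brand_score"])
```

Because recipients are processed in order, the usage cap makes the result depend on that order: a donor that is used up early is unavailable to later recipients, even if it would have been their best match.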

  6. Conclusion
     • Data fusion increases the value of data mining: it creates more data to mine, reduces costs, and ensures the best possible matches without over-representing elements in the Donor set.
