
The Prospect of the Structure of Data Mining Solution in the Future

The Prospect of the Structure of Data Mining Solution in the Future. Huh, Moon Yul, Song, Kwang Ryeol (SKKU) Kim, Dongwoo (Arsmagna Inc.). 1. Data Mining?.


Presentation Transcript


  1. The Prospect of the Structure of Data Mining Solution in the Future Huh, Moon Yul, Song, Kwang Ryeol (SKKU) Kim, Dongwoo (Arsmagna Inc.)

  2. 1. Data Mining?

  3. "Simply put, data mining is used to discover patterns and relationships in your data in order to help you make better business decisions" - Robert Small, Oracle

  4. Ex1) Mail a promotional offer to all households with two children and a working mother. - Determine the most profitable groups based on purchase history, and mail the promotion to them. Ex2) Set the life insurance policy rate based on age, sex, and whether the person smokes or not. - Analyze lifetime profitability and risk in the existing book of business, and use this analysis to forecast future risk for the policyholder.
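
As a rough illustration of the second half of Ex1, the sketch below sums profit per household group from a purchase history and ranks the groups, highest first. The `Purchase` class, the segment labels, and the profit figures are hypothetical and do not come from the presentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical purchase record: one row of purchase history.
final class Purchase {
    final String segment; // illustrative household group label
    final double profit;  // profit earned on this purchase

    Purchase(String segment, double profit) {
        this.segment = segment;
        this.profit = profit;
    }
}

final class SegmentRanking {
    // Sums profit per segment, then returns the segment names
    // sorted by total profit, highest first.
    static List<String> rankByProfit(List<Purchase> history) {
        final Map<String, Double> totals = new LinkedHashMap<>();
        for (Purchase p : history) {
            totals.merge(p.segment, p.profit, Double::sum);
        }
        List<String> ranked = new ArrayList<>(totals.keySet());
        ranked.sort((a, b) -> Double.compare(totals.get(b), totals.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up data for illustration only.
        List<Purchase> history = Arrays.asList(
            new Purchase("segment-A", 120.0),
            new Purchase("segment-B", 300.0),
            new Purchase("segment-A", 80.0),
            new Purchase("segment-C", 50.0));
        // The promotion would be mailed to the top-ranked groups first.
        System.out.println(rankByProfit(history));
    }
}
```

A real solution would derive the groups themselves from the data (for example by clustering) rather than take them as given labels.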

  5. What is the source of data mining? Data. What is the source of data? People.

  6. For better business decisions, try to understand the behavior of the people concerned.

  7. 2. What is the process or the steps of data mining?

  8. Step 1, Data Preparation: Determine the business goals or problems. Select the target data and appropriate databases.
     Step 2, Data Filtering: Cleanse the data. Apply strategies for missing and noisy data. Data transformation and selection of appropriate subsets of data may be necessary to improve the accuracy of the model prediction.
     Step 3, Data Mining: Apply the mining tool. Obtain the data structures in the form of a decision tree or a set of rules. Obtain the relationships or trends in the data using sophisticated machine learning or statistical methods.
     Step 4, Interpretation and Evaluation: If the discovered knowledge is deemed useful, apply it to the problem. If not, any or all of the previous steps will be repeated. In reality, the KDD process can involve significant iterations of the above steps.
     Step 5, Take Action: Use the knowledge acquired from the KDD process to make productive business decisions.
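
The iterative nature of the steps above can be sketched in a few lines of Java. This is only a toy, under stated assumptions: the "mining" step is a plain mean, "noise" is anything outside a numeric range, and "useful" means enough records survived filtering. The class and method names (`KddSketch`, `prepare`, `filter`, `mine`, `isUseful`, `run`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KddSketch {
    // Data preparation: select the target data (here, one raw column).
    static List<Double> prepare(Double[] rawColumn) {
        return new ArrayList<>(Arrays.asList(rawColumn));
    }

    // Data filtering: drop missing values (null) and values outside
    // [min, max], which this sketch treats as noise.
    static List<Double> filter(List<Double> data, double min, double max) {
        List<Double> clean = new ArrayList<>();
        for (Double v : data) {
            if (v != null && v >= min && v <= max) {
                clean.add(v);
            }
        }
        return clean;
    }

    // Data mining: a stand-in "model", the mean of the clean data.
    static double mine(List<Double> clean) {
        double sum = 0.0;
        for (double v : clean) {
            sum += v;
        }
        return clean.isEmpty() ? Double.NaN : sum / clean.size();
    }

    // Interpretation and evaluation: accept the result only if enough
    // records survived filtering (an assumed usefulness criterion).
    static boolean isUseful(List<Double> clean, int minRecords) {
        return clean.size() >= minRecords;
    }

    // Runs the steps and repeats filtering with a wider range when the
    // result is not deemed useful, up to five attempts.
    static double run(Double[] raw, int minRecords) {
        List<Double> data = prepare(raw);
        double max = 10.0;
        for (int attempt = 0; attempt < 5; attempt++) {
            List<Double> clean = filter(data, 0.0, max);
            if (isUseful(clean, minRecords)) {
                return mine(clean); // "take action" would use this value
            }
            max *= 10.0; // revise the filtering strategy and iterate
        }
        throw new IllegalStateException("no useful result after 5 iterations");
    }

    public static void main(String[] args) {
        Double[] raw = {1.0, null, 3.0, 50.0};
        System.out.println(run(raw, 3));
    }
}
```

The loop in `run` is the point of the sketch: evaluation feeds back into the earlier steps instead of ending the process.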

  9. Academic interest (step 3). Machine-learning techniques: neural nets, decision tree induction, memory-based reasoning, clustering. Other techniques: statistical analysis, data visualization.

  10. Example: PAKDD01 in Hong Kong, April 2001. Of the 143 submissions, about 1/3 are related to classification and 1/3 to decision trees, association rules and neural networks. Most of the time in the data mining process is spent on steps 1 and 2, the data handling part.

  11. 3. Tools

  12. Over 50 products are listed in KDnuggets. http://www.kdnuggets.com/software/suites.html

  13. Features of these tools

  14. 1. Cockpit-style window, menu-oriented (DeltaMiner)

  15. 2. Drag-and-drop (Clementine)

  16. 3. Job flow diagrams (K-Wiz)

  17. 4. Output designs: Some use a separate window for each process (Kensington)

  18. One output sheet for a series of mining processes / intelligent outputs (Jasp)

  19. 5. Client-server systems. 6. Switching to Java technology. 7. Emphasis on visualization.

  20. Too much emphasis on fancy pictures (MineSet confusion matrix)

  21. 4. Future trends: Visual data mining

  22. - The data mining process will be facilitated by a wide variety of data visualizations, which we call visual data mining. - This will be greatly influenced by innovation in communication, computing and software development.

  23. 1. Communication: High-speed networks enable us to easily access vast amounts of data distributed over different locations => Server-client mining solutions.

  24. 2. Computing: - Visualization techniques for very large databases. - Time-consuming computing processes like testing become feasible with faster computing systems. - This will enable us to handle complicated problems that were considered impossible in the past.

  25. 3. Software: Java technology. - Plug-and-play - Easy-to-implement graphics programming - Data flow - Built and designed for a network environment => Server-client visual data mining system

  26. 4. Open question and proposal: Can we build a data analysis tool that evolves as it is used?

  27. 1. Classical statistical packages: - The operations of a statistical package are pre-specified pull-down menus. - Each menu again has its own pre-specified submenus, and so on. => Analysis is conducted under the constraint of the package developer's capability.

  28. 2. Mining solutions: - Consist of many components. - The components are very high level and hide technical complexities that the users are not interested in. => Users can integrate these components and construct a new project or stream that can handle their domain-specific problems.

  29. Is it possible for us to build a new component by integrating some of the existing components?

  30. => The users can realize their concepts as a new component. => The building process will be a recursive one, and an evolutionary data mining solution will be possible. - Can we make this possible? How?
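
One way to picture the recursive building process is the composite pattern: a component assembled from existing components is itself a component, so it can be reused inside a larger one. The sketch below is an assumption-laden illustration in plain Java, not the authors' design. The `MiningComponent` interface, `CompositeComponent`, and the two toy components are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical component contract: values in, values out.
interface MiningComponent {
    List<Double> process(List<Double> input);
}

// A composite is itself a MiningComponent, so it can be nested
// inside another composite (the recursive building step).
final class CompositeComponent implements MiningComponent {
    private final List<MiningComponent> parts;

    CompositeComponent(MiningComponent... parts) {
        this.parts = Arrays.asList(parts);
    }

    @Override
    public List<Double> process(List<Double> input) {
        List<Double> current = input;
        for (MiningComponent part : parts) {
            current = part.process(current); // output of one part feeds the next
        }
        return current;
    }
}

final class CompositionDemo {
    // Toy component: drops negative values.
    static final MiningComponent DROP_NEGATIVE = input -> {
        List<Double> out = new ArrayList<>();
        for (double v : input) {
            if (v >= 0) {
                out.add(v);
            }
        }
        return out;
    };

    // Toy component: divides every value by the largest value.
    static final MiningComponent SCALE_BY_MAX = input -> {
        double max = 0.0;
        for (double v : input) {
            max = Math.max(max, v);
        }
        List<Double> out = new ArrayList<>();
        for (double v : input) {
            out.add(max == 0.0 ? 0.0 : v / max);
        }
        return out;
    };

    public static void main(String[] args) {
        // A new component built from two existing ones...
        MiningComponent cleaned = new CompositeComponent(DROP_NEGATIVE, SCALE_BY_MAX);
        // ...and reused as a building block of a larger one.
        MiningComponent project = new CompositeComponent(cleaned, DROP_NEGATIVE);
        System.out.println(project.process(Arrays.asList(-1.0, 2.0, 4.0)));
    }
}
```

Because every composite satisfies the same interface as its parts, the set of available components can grow as users build and save new ones.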

  31. We can attack this problem using JavaBeans technology.

  32. JavaBeans: a software component architecture. It is the platform-neutral architecture for the Java application environment. It's the ideal choice for developing or assembling network-aware solutions for heterogeneous hardware and operating system environments--within the enterprise or across the Internet. In fact, it's the only component architecture you should consider if you're developing for the Java platform. http://java.sun.com/products/javabeans/

  33. Elements of a component: - Properties: aspects of a component's state. - Events: notifications generated by a component. - Methods: invoked to execute code in a component. [Figure labels: Des. Stat., data, plots]
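
The three elements listed above map onto ordinary Java code. The sketch below shows a minimal bean with one property, a property-change event, and one method, using the standard `java.beans.PropertyChangeSupport` class. The bean itself (`FilterBean`, its `threshold` property, and its `apply` method) is a hypothetical example, and it is declared package-private only so the sketch fits in a single file; a real bean would normally be a public class.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filtering bean, for illustration only.
class FilterBean implements Serializable {
    private static final long serialVersionUID = 1L;

    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private double threshold; // property: part of the component's state

    public FilterBean() {
        // Beans expose a public no-argument constructor.
    }

    public double getThreshold() {
        return threshold;
    }

    public void setThreshold(double newValue) {
        double old = this.threshold;
        this.threshold = newValue;
        // Event: notify listeners that the property changed.
        changes.firePropertyChange("threshold", old, newValue);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }

    // Method: keeps only the values at or above the threshold.
    public List<Double> apply(List<Double> values) {
        List<Double> kept = new ArrayList<>();
        for (double v : values) {
            if (v >= threshold) {
                kept.add(v);
            }
        }
        return kept;
    }
}

final class FilterBeanDemo {
    public static void main(String[] args) {
        FilterBean bean = new FilterBean();
        // Another component (for example a plot) could listen and redraw.
        bean.addPropertyChangeListener(e ->
            System.out.println(e.getPropertyName() + " changed to " + e.getNewValue()));
        bean.setThreshold(2.5);
        System.out.println(bean.apply(Arrays.asList(1.0, 2.5, 4.0)));
    }
}
```

The getter/setter naming and the listener registration methods are what let a builder tool discover the property and wire the event to other components without custom glue code.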

  34. [Figure: diagram of components labeled A, B, C, D, E and A*]

  35. Data mining system. [Figure labels: DB (three), SOURCE, RECEIVER (two), DAC, COMPONENT 1, COMPONENT i, COMPONENT k, INFORMATION BUS]

  36. Example: A data mining solution for company A (the company's use of the Korea statistics office DB). [Figure labels: Statistics Office DB, Component A, Middleware, Data filtering, Clustering, Statistical graphics, Descriptive statistics, Data / Information flow]

  37. Future Refinement. [Figure labels: Statistics Office DB, Component A*, Middleware, New Filtering Technique, Component A, Decision Tree, Visualization Technique, Data / Information flow]
