The Prospect of the Structure of Data Mining Solution in the Future
Huh, Moon Yul, Song, Kwang Ryeol (SKKU); Kim, Dongwoo (Arsmagna Inc.)

1. Data Mining?
"Simply put, data mining is used to discover patterns and relationships in your data in order to help you make better business decisions” - Robert Small, Oracle
Ex1) Mail a promotional offer to all households with two children and a working mother.
- Determine the most profitable groups based on purchase history, and mail the promotion to them.
Ex2) Set the life insurance policy rate based on age, sex, and whether the person smokes or not.
- Analyze lifetime profitability and risk in the existing book of business, and use this analysis to forecast future risk for the policyholder.

What is the source of data mining? Data.
What is the source of data? Persons.
For better business decisions, try to understand the behaviors of the persons concerned
The KDD process:
Step 1. Data Preparation: Determine the business goals or problems. Select the target data and appropriate databases.
Step 2. Data Filtering: Cleanse the data. Apply strategies for missing and noisy data. Data transformation and selection of appropriate subsets of data may be necessary to improve the accuracy of the model prediction.
Step 3. Data Mining: Apply the mining tool. Obtain the data structures in the form of a decision tree or a set of rules. Obtain the relationships or trends in the data using sophisticated machine learning or statistical methods.
Step 4. Interpretation and Evaluation: If the discovered knowledge is deemed useful, apply it to the problem. If not, any or all of the previous steps are repeated. In reality, the KDD process can involve significant iterations of the above steps.
Step 5. Take Action: Use the knowledge acquired from the KDD process to make productive business decisions.
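The iteration between filtering, mining and evaluation can be pictured with a minimal Java sketch. It is not taken from the slides: the class KddLoopSketch, its methods, the use of a plain mean as the "mining" result, and the rule of halving a noise bound on each round are all hypothetical choices made only to show the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoublePredicate;

/** Hypothetical sketch of the iterative KDD loop; every name here is illustrative. */
class KddLoopSketch {

    /** Data filtering: drop missing values (null) and values outside [lo, hi]. */
    static List<Double> filter(List<Double> target, double lo, double hi) {
        List<Double> cleaned = new ArrayList<>();
        for (Double v : target) {
            if (v != null && v >= lo && v <= hi) {
                cleaned.add(v);
            }
        }
        return cleaned;
    }

    /** Data mining stand-in: the "model" is just the mean of the cleaned data. */
    static double mine(List<Double> cleaned) {
        if (cleaned.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : cleaned) {
            sum += v;
        }
        return sum / cleaned.size();
    }

    /**
     * Filter, mine, evaluate. If the result is not judged useful, tighten the
     * noise bound (an assumed revision strategy) and repeat, up to maxRounds times.
     */
    static double run(List<Double> target, double noiseBound,
                      DoublePredicate isUseful, int maxRounds) {
        double bound = noiseBound;
        double result = Double.NaN;
        for (int round = 0; round < maxRounds; round++) {
            result = mine(filter(target, 0.0, bound));
            if (isUseful.test(result)) {
                return result;      // accepted: ready to support a decision
            }
            bound = bound / 2.0;    // revise the filtering step and iterate
        }
        return result;              // last result after maxRounds attempts
    }
}
```

A caller would pass the selected target data, an initial noise bound, and a predicate that encodes the evaluation step; the loop stops as soon as the predicate accepts a result.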
Academic interest (step 3)
- Machine-learning techniques: neural nets, decision tree induction, memory-based reasoning, clustering
- Other techniques: statistical analysis, data visualization

Example: PAKDD01, Hong Kong, April 2001. Of the 143 submitted papers, about 1/3 relate to classification and about 1/3 relate to decision trees, association rules and neural networks.
Most of the time in the data mining process is spent on steps 1 and 2, the data handling part.
Over 50 products are listed in KDnuggets. http://www.kdnuggets.com/software/suites.html
4. Output designs:
- Some use a separate window for each process (Kensington)
- One output sheet for a series of mining processes / intelligent outputs (Jasp)
5. Client-Server systems.
6. Switching to JAVA technology.
7. Emphasize visualization.
Too much emphasis on fancy pictures (MineSet confusion matrix)
- The data mining process will be facilitated by a wide variety of data visualization techniques, which we call visual data mining.
- This will be greatly influenced by innovation in communication, computing and software development.

1. Communication
High-speed networks enable us to easily access vast amounts of data distributed over different locations => Server-client mining solutions.

2. Computing
- Visualization techniques for very large databases.
- Time-consuming computing processes like testing can be achieved with faster computing systems.
- This will enable us to handle complicated problems that were considered impossible in the past.

3. Software: JAVA technology
- Plug-and-play
- Easy-to-implement graphics programming
- Data flow
- Built and designed for network environments
=> Server-client visual data mining system

4. Open question and proposal
Can we build a data analysis tool that evolves as it is used?
1. Classical statistical packages:
- The operations of a statistical package are pre-specified pull-down menus.
- Each menu again has its own pre-specified submenus, and so on.
=> Analysis is conducted under the constraint of the package developer's capability.

2. Mining solutions
- Consist of many components.
- The components are very high level and hide technical complexities that the users are not interested in.
=> Users can integrate these components and construct a new project or stream that can handle their domain-specific problems.

Is it possible for us to build a new component by integrating some of the existing components?
=> The users can realize their concepts as a new component.
=> The building process will be a recursive one, and an evolutionary data mining solution will be possible.
- Can we make this possible? How?
JavaBean: a software component architecture
It is the platform-neutral architecture for the Java application environment. It's the ideal choice for developing or assembling network-aware solutions for heterogeneous hardware and operating system environments, within the enterprise or across the Internet. In fact, it's the only component architecture you should consider if you're developing for the Java platform.
http://java.sun.com/products/javabeans/
Elements of a component
- properties: aspects of a component's state
- events: notifications generated by a component
- methods: invoked to execute code in a component
[Figure: example component labeled "Des. Stat." with "data" and "plots"]
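The three elements can be seen together in a small bean. This is only a sketch under assumptions: the class DescriptiveStatsBean and its "data" and "mean" properties are hypothetical, loosely modeled on a descriptive statistics component, while PropertyChangeSupport and PropertyChangeListener are the standard java.beans classes. The class is left package-private to keep the example in one file; a bean meant for use in a builder tool would normally be public.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.Arrays;

/** Hypothetical descriptive statistics component written in the JavaBeans style. */
class DescriptiveStatsBean {
    private final PropertyChangeSupport support = new PropertyChangeSupport(this);

    // Properties: aspects of the component's state.
    private double[] data = new double[0];
    private double mean = Double.NaN;

    public double[] getData() {
        return data.clone();
    }

    public void setData(double[] newData) {
        this.data = newData.clone();
        compute();
    }

    public double getMean() {
        return mean;
    }

    /** Method: invoked to execute code in the component. */
    public void compute() {
        double oldMean = mean;
        mean = Arrays.stream(data).average().orElse(Double.NaN);
        // Event: notify listeners that the "mean" property changed.
        support.firePropertyChange("mean", oldMean, mean);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        support.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        support.removePropertyChangeListener(listener);
    }
}
```

Another component, for example a plot, could register a listener and redraw whenever the "mean" event arrives, without knowing how the statistic is computed.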
[Figure: component diagram with labels A, B, C, D, E and A*]
Data mining system
[Figure labels: DB (three times); DAC; SOURCE; RECEIVER (twice); COMPONENT 1, COMPONENT i, COMPONENT k; INFORMATION BUS]
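The figure gives only labels, so the following Java sketch is one possible reading of the source, receiver and information bus roles, not the authors' design. The class InformationBusSketch, its methods, and the use of plain strings as messages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical in-process information bus connecting sources to receivers. */
class InformationBusSketch {
    private final List<Consumer<String>> receivers = new ArrayList<>();

    /** A receiver component registers to be told about every message on the bus. */
    void register(Consumer<String> receiver) {
        receivers.add(receiver);
    }

    /**
     * A source component publishes a message; the bus forwards it to every
     * registered receiver and returns how many receivers were notified.
     */
    int publish(String message) {
        for (Consumer<String> receiver : receivers) {
            receiver.accept(message);
        }
        return receivers.size();
    }
}
```

In this reading, components never call each other directly: a source only knows the bus, which is what would let components be added or swapped without touching the others.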
Example: A data mining solution for company A (the company's use of the Korea statistics office DB)
[Figure labels: Statistics Office DB; Component A; Middleware; Data filtering; Clustering; Statistical graphics; Descriptive statistics; Data / Information flow]
Future Refinement
[Figure labels: Statistics Office DB; Component A*; Middleware; New Filtering Technique; Component A; Decision Tree; Visualization Technique; Data / Information flow]
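The idea of a component A* built from the existing Component A plus new parts can be sketched in Java as a composite that is itself a component. This is an illustration under assumptions, not the authors' implementation: the MiningComponent interface, the CompositeComponent class, and the choice of a list of numbers as the data passed between components are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical component contract: take a data set, return a data set. */
interface MiningComponent {
    List<Double> process(List<Double> input);
}

/**
 * A new component assembled from existing ones. Because it implements the same
 * contract as its parts, it can be used as a part of a later composite, which
 * makes the building process recursive.
 */
class CompositeComponent implements MiningComponent {
    private final List<MiningComponent> parts;

    CompositeComponent(MiningComponent... parts) {
        this.parts = new ArrayList<>(Arrays.asList(parts));
    }

    @Override
    public List<Double> process(List<Double> input) {
        List<Double> current = input;
        for (MiningComponent part : parts) {
            current = part.process(current);   // output of one part feeds the next
        }
        return current;
    }
}
```

With this shape, a user could build A from a filtering part and an analysis part, and later build A* as `new CompositeComponent(newFilter, a)` without changing A itself.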