Intelligent Internet Agents for Distributed Data Mining
Yanqing Zhang, Scott Owen, Sushil Prasad and Raj Sunderraman, Department of Computer Science, Georgia State University ({yzhang, sowen, sprasad, raj}@cs.gsu.edu)
George Vachtsevanos, School of Electrical and Computer Engineering, Georgia Institute of Technology (gjv@ece.gatech.edu)
Outline
• Motivation
• Architecture of Intelligent Internet Agents
• Program Libraries of Intelligent Middleware
• Smart Web Search Agents
• Intelligent Soft Computing Agents
• Benefits
• Deliverables
• Conclusion
Motivation
• Distributed Web KDD: mine useful information and knowledge from distributed Web databases
• QoS (Efficiency, Web Speed, User Time): huge amounts of useless data flow on the Internet
• From Data Web to Information Web: upgrade the current data-flow-oriented Internet to a future information-flow-oriented Internet
• Intelligent Web Middleware: reusable, portable and scalable intelligent functionality
• Smart E-Business: use intelligent Web agents to do better E-Business on the Internet
Architecture of Intelligent Internet Agents
• Application Layer: E-Commerce, E-Education, other E-B
• Intelligent Layer: Data Mining, Soft Computing, Expert Systems (ES), etc.
• Network Layer: Backbone, gigaPoPs, other hardware
Program Libraries of Intelligent Middleware
• Binary Association Rule Generator
• Fuzzy Association Rule Generator
• Neural-Net-based Data Classifier and Pattern Generator
• Fuzzy c-means Program for Data Clustering
• Genetic Algorithms for Data Refinement and Optimization
• Granular Neural Nets for Linguistic Data Mining
• XML-based Smart Web Search Sub-Programs
• Connection Programs between Database and Middle Layer
• Local Cache Database Manager
• Local Cache Informationbase Manager
• Basic GUI Programs
• Client-Server Creation and Communication Programs
• Distributed Operation Manager
• Distributed Data Mining Synchronization
• Web Customer Log Miner
• ... and so on
Smart Web Search Agents
• Data Search Engines >> Information Search Agents
- Traditional searching on the Web is done in one of the following three ways:
  - Directories (Yahoo, Lycos, etc.)
  - Search Engines (AltaVista, NorthernLight, etc.)
  - Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.)
- All of these involve keyword searches
- Drawback: not easily personalized; too many results (although many give relevancy factors)
- Smart Search Agents will provide:
  - more personalized searches
  - domain-based searches
  - more efficient searches
- Smart Search Agents will employ:
  - local cache databases (containing frequently asked queries and results; possibly updated periodically, e.g. nightly)
  - local cache information bases (containing mined information and discovered knowledge for efficient personal use)
  - domain-based agents (e.g., Job Search; Sports: NBA Stats; Bibliography: Digital Libraries)
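The local cache database of frequently asked queries can be sketched as a small in-memory cache with a time-based refresh. This is only an illustration under stated assumptions: the class name `QueryCache`, the `fetch` callback (a stand-in for a remote metasearch call) and the 24-hour refresh interval are hypothetical and do not come from the slides.

```python
import time
from typing import Callable, Dict, List, Tuple


class QueryCache:
    """Hypothetical local cache of frequently asked queries and their results."""

    def __init__(self, fetch: Callable[[str], List[str]],
                 max_age_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch              # stand-in for a remote (meta)search call
        self._max_age = max_age_seconds  # assumed "nightly" refresh interval
        self._clock = clock              # injectable clock, useful for testing
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.hits = 0
        self.misses = 0

    def search(self, query: str) -> List[str]:
        # Normalize case and whitespace so equivalent queries share one entry.
        key = " ".join(query.lower().split())
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            self.hits += 1
            return entry[1]
        # Missing or stale entry: go to the remote source and store the result.
        self.misses += 1
        results = self._fetch(key)
        self._entries[key] = (now, results)
        return results
```

A repeated query is answered locally until its entry is older than the refresh interval, which is the saving in Web traffic that the cache is meant to provide.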
Some initial results:
• M. Nagarajan, Metagenie - A metasearch engine for multi-databases, M.S. thesis, GSU (July 1999). Domains: Jobs, Books
• S. Ahmed, EXACT-FINDER: A cache-based meta-search engine, M.S. thesis, GSU (May 2000). Local cache database storing personalized frequently asked queries and results, updated periodically
• R. Sunderraman, ReQueSS: Relational Querying of semi-structured data, ICDE 2000 (demo session), San Diego, CA, March 2000
• X. Li, Querying unified sources of Web data, M.S. thesis, GSU (July 1999). Data wrappers for Web sources (NBA stats/box scores, DBLP Bibliography database)
Intelligent Tools for E-Business
• Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems
• Learning Algorithms, Heuristic Searching
• Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery
• Prediction & Time Series Analysis
• Information Retrieval, Intelligent User Interface
• Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems
Enhancing E-Business Process Through Data Mining
• Quality of discovered knowledge
  - Having the right data
  - Having appropriate data mining tools!
• Traditional Data Mining Tools
  - Simple query and reporting
  - Visualization-driven data exploration tools, OLAP
  - Discovery process is user driven
Intelligent Data Mining Tools
• Automate the process of discovering patterns/knowledge in data
• Require hypothesis, exploration
• Derive business knowledge (patterns) from data
• Combine business knowledge of users with results of discovery algorithms
Intelligent Information Agents
• The Data Mining Problem:
  - Clustering/Classification
  - Association
  - Sequencing
• Viewed as an Optimization Problem
• Tools: Genetic Algorithms
Fuzzy Rule Discovery
• Rule discovery: the discovery of associations between business events, e.g. which items are purchased together
• To support flexible querying and intelligent searching, fuzzy queries are developed to uncover potentially valuable knowledge
• A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query
• The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segments in the data
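A fuzzy query over a linguistic term such as "tall" can be sketched as follows. The ramp-shaped membership function, its breakpoints (160 cm and 190 cm), the 0.5 threshold and the names `ramp_up` and `fuzzy_query` are all assumptions made for illustration; the slides do not specify how the terms are defined.

```python
from typing import Callable, Dict, List, Tuple


def ramp_up(low: float, high: float) -> Callable[[float], float]:
    """Membership function rising linearly from 0 at `low` to 1 at `high`."""
    def mu(x: float) -> float:
        if x <= low:
            return 0.0
        if x >= high:
            return 1.0
        return (x - low) / (high - low)
    return mu


# Assumed breakpoints (in cm) for the linguistic term "tall".
tall = ramp_up(160.0, 190.0)


def fuzzy_query(records: List[Dict], field: str,
                term: Callable[[float], float],
                threshold: float = 0.5) -> List[Tuple[float, Dict]]:
    """Return (membership, record) pairs at or above `threshold`, best match first."""
    scored = [(term(r[field]), r) for r in records]
    matches = [(m, r) for m, r in scored if m >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)
```

Unlike a crisp query such as `height > 180`, each record is returned with a degree of match, so borderline records are ranked lower rather than dropped outright.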
Example of 3 Service Provider's Features: the 3R (Risk-Response-Retention) Model
• Fuzzy Decision Making: match users with dynamic products, services, and pricing
• Fuzzy variables:
  - Loss Ratio (Risk): Low, Medium, High
  - Persistency (Retention): Low, Medium, High
  - Response: Low, Medium, High
• Example rule: Low Risk, High Response, High Retention -> Customer: Preferred
  - Pricing: according to life-time value
  - Cross-Selling: bundle extra liability insurance
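The rule "Low Risk, High Response, High Retention -> Customer: Preferred" can be evaluated as a fuzzy rule whose firing strength is the minimum of the three membership degrees (the common min interpretation of fuzzy AND). The sketch below assumes that each input is already normalized to [0, 1] and uses simple linear `low` and `high` membership functions; these shapes and the function name `preferred_strength` are hypothetical, since the slide does not define them.

```python
def clamp01(x: float) -> float:
    """Clamp a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def low(x: float) -> float:
    """Assumed linear membership for 'Low' on a normalized [0, 1] input."""
    return clamp01(1.0 - x)


def high(x: float) -> float:
    """Assumed linear membership for 'High' on a normalized [0, 1] input."""
    return clamp01(x)


def preferred_strength(risk: float, response: float, retention: float) -> float:
    """Firing strength of: IF risk is Low AND response is High AND retention is High
    THEN customer is Preferred. Fuzzy AND is taken as the minimum."""
    return min(low(risk), high(response), high(retention))
```

For example, a customer with normalized risk 0.2, response 0.9 and retention 0.7 fires the rule with strength min(0.8, 0.9, 0.7) = 0.7, while a high-risk customer fires it only weakly.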
Measuring Performance of Intelligent Agents
• Accuracy: distance or variance measure of the agents' performance relative to their goal, e.g. fuzzy entropy
• Speed: latency of response
• Cost: resources consumed, consequences of failures
• Benefit: payoff for goals achieved
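The slides do not say which definition of fuzzy entropy is meant. One widely used choice is the De Luca-Termini entropy, sketched here in a normalized form (base-2 logarithm, averaged over elements) so that the value lies in [0, 1]; treating this as the intended accuracy measure is an assumption.

```python
import math
from typing import Sequence


def fuzzy_entropy(memberships: Sequence[float]) -> float:
    """Normalized De Luca-Termini fuzzy entropy of a fuzzy set.

    Returns 0.0 for a crisp set (all memberships 0 or 1) and 1.0 when
    every membership is 0.5, i.e. maximal fuzziness.
    """
    if not memberships:
        raise ValueError("memberships must be non-empty")
    total = 0.0
    for mu in memberships:
        if not 0.0 <= mu <= 1.0:
            raise ValueError("membership degrees must lie in [0, 1]")
        # Shannon-style term for mu and its complement; 0 * log(0) is taken as 0.
        for p in (mu, 1.0 - mu):
            if p > 0.0:
                total -= p * math.log2(p)
    return total / len(memberships)
```

A lower value means the agent's classification of items is closer to a crisp decision, which is one way to read "distance from the goal" in the accuracy bullet.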
Performance Assessment, Learning and Optimization
(Diagram: Goals/Objectives, Performance Evaluation Module, Learning/Adaptation)
Examples
• Product Information Clustering
  - Use a GA as the heuristic search engine
  - Apply the GA selection and inversion operators
  - Evaluate information content
  - Estimate system entropy
  - Apply a reinforcement learning strategy
• Dynamic Pricing
  - In addition to the above steps, explore association and sequencing relations
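The first two steps (a GA as the heuristic search engine, driven by selection and inversion operators) can be sketched on a toy problem. Everything specific here is an assumption: the objective (order items so that similar values end up adjacent, a rough proxy for grouping), tournament selection, elitism, and the names `inversion`, `tournament` and `run_ga`. The information-content, entropy and reinforcement-learning steps from the slide are not modeled.

```python
import random
from typing import Callable, List, Sequence, Tuple


def inversion(chromosome: Sequence[int], rng: random.Random) -> List[int]:
    """Inversion operator: reverse a randomly chosen segment of the chromosome."""
    i, j = sorted(rng.sample(range(len(chromosome) + 1), 2))
    return list(chromosome[:i]) + list(reversed(chromosome[i:j])) + list(chromosome[j:])


def tournament(population: List[List[int]],
               fitness: Callable[[Sequence[int]], float],
               rng: random.Random, k: int = 3) -> List[int]:
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def run_ga(values: Sequence[float], generations: int = 200,
           pop_size: int = 30, seed: int = 0) -> Tuple[List[int], float]:
    """Search for an ordering of item indices that keeps similar values adjacent."""
    rng = random.Random(seed)

    def fitness(order: Sequence[int]) -> float:
        # Toy objective (an assumption): negative sum of neighbor differences.
        return -sum(abs(values[a] - values[b]) for a, b in zip(order, order[1:]))

    n = len(values)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [inversion(tournament(population, fitness, rng), rng)
                    for _ in range(pop_size - 1)]
        population = children + [best]  # elitism: the best individual survives
        best = max(population, key=fitness)
    return best, fitness(best)
```

Because inversion only reverses a segment, every child remains a valid permutation of the items, which is why it suits ordering problems better than a bit-flip mutation would.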
The “New Technology” Paradigm
(Figure: Internet-related technologies over time, moving through Euphoria/Optimism, Reality, and Back to Basics)
INFORMATION IS SELLING NOW! Intelligent Agents will give your information product bargaining power
Benefits
• Better QoS:
  - Web users get information (not raw data)
  - Smart agents can make decisions for users
  - Smart agents can save users' surfing time
• Faster Internet:
  - Information flows on the Internet quickly (e.g., 1k information << 100k raw data)
  - Reduce data redundancy on the Internet
  - Reduce Web communication congestion
Deliverables
• Intelligent Middle Layer
  - Data Mining Program Libraries
  - Soft Computing Program Libraries (e.g., Neural Networks, Fuzzy Logic, Genetic Algorithms, Neuro-fuzzy Systems)
• Application Layer
  - Smart Web Search Agents
  - Intelligent Soft Computing Agents
Conclusion
• To make the future Internet more intelligent and more efficient, it is necessary to design relevant "Intelligent Middleware" between the network hardware and high-level Web application systems.
• We will first design a basic intelligent middle layer with basic intelligent functionality, and then implement two Web application systems for distributed data mining and E-Business.