Khaled Shaban, PhD Candidate
Supervisors: Dr. Otman Basir, Dr. Mohammad Kamel
Previous work
• MSc thesis, 2002, “Information Fusion in a Cooperative Multiagent System for Web Information Retrieval”
• K. B. Shaban, O. A. Basir, K. Hassanein, and M. Kamel, "Intelligent Information Fusion Approach in Cooperative Multiagent Systems", World Automation Congress, June 2002.
• K. B. Shaban, O. A. Basir, K. Hassanein, and M. Kamel, "Information Fusion in a Cooperative Multi-agent System for Web Information Retrieval", The Fifth International Conference on Information Fusion, July 2002.
System vision
[Figure: envisioned view of the system. Components shown: users, personal agents, an intermediate “fusion” agent, resource “information retrieval” agents, and the environment (“the Web”).]
Decision Fusion
[Figure: three team structures for decision fusion: (a) Markovian team, (b) centralized team, (c) consensus team. Labels shown: agents A1 to An, Z1 to Zn, R1 to Rn, RG, a decision maker, and the environment.]
Implementation
[Figure: implementation with a personal agent, a fusion agent, and three retrieval agents; the search-engine labels shown are AltaVista (twice) and Excite.]
Current Project “Semantic-based Document Clustering”
Project Goals
• Clustering documents based on semantic similarities of their contents
• Lend ideas to other mining projects
• PhD thesis by 2005/2006!
Document Clustering
[Figure: documents are clustered into document clusters, with high intra-cluster similarity and low inter-cluster similarity.]
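A minimal sketch of what "high intra-cluster, low inter-cluster similarity" means in practice. The Jaccard word-overlap measure and the toy clusters below are assumptions chosen for illustration only; the slides do not say which similarity measure is used.

```python
from itertools import combinations, product

def jaccard(a, b):
    """Word-overlap similarity between two texts (illustrative stand-in measure)."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def intra_similarity(cluster):
    """Average similarity over all pairs of documents inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)

def inter_similarity(c1, c2):
    """Average similarity over all pairs of documents drawn from two clusters."""
    pairs = list(product(c1, c2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)

# Hypothetical toy clusters, not taken from the presentation.
sports = ["team wins match", "team loses match"]
finance = ["bank raises rates", "bank cuts rates"]

print(intra_similarity(sports), intra_similarity(finance))
print(inter_similarity(sports, finance))
```

A good clustering should give intra-cluster values well above the inter-cluster value, as these toy clusters do.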
Applications
• Improve the performance of information retrieval systems
• Improve the organization and viewing of documents
• Accelerate nearest-neighbour search
• Generate directories of hierarchical clusters
• Improve automatic speech recognition systems
Existing Schemes
• Data representation models
  • Documents as bags-of-words (Vector Space Model (VSM))
  • N-grams
  • Latent Semantic Indexing (LSI)
  • Phrase-based
• Similarity measures
  • Euclidean distances
  • Minkowski distances
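The bag-of-words model and the distance measures listed above can be sketched as follows. This is an illustration under simple assumptions (whitespace tokenization, raw term counts), and the function names are hypothetical. The Euclidean distance is the Minkowski distance of order p = 2.

```python
from collections import Counter

def bag_of_words(text):
    """Map a text to term counts (lowercased, whitespace-split); word order is lost."""
    return Counter(text.lower().split())

def minkowski(a, b, p=2):
    """Minkowski distance of order p between two term-count vectors.
    p=1 gives the Manhattan distance, p=2 the Euclidean distance."""
    terms = set(a) | set(b)
    return sum(abs(a[t] - b[t]) ** p for t in terms) ** (1.0 / p)

# Hypothetical example documents.
d1 = bag_of_words("web information retrieval")
d2 = bag_of_words("web document clustering")

print(minkowski(d1, d2, p=1))  # Manhattan
print(minkowski(d1, d2, p=2))  # Euclidean
```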
Existing Schemes, Cont.
• Clustering algorithms
  • Partitioning (k-means & Fuzzy C-means)
  • Geometric (Self-Organizing Maps (SOM), LSI)
  • Probabilistic (Expectation-Maximization (EM), Probabilistic LSI)
• Evaluation methods
  • Entropy
  • F-measure
  • Overall Similarity
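Two of the evaluation methods above, entropy and F-measure, can be sketched for a single cluster as follows. This assumes each document carries a known class label; the function names and inputs are hypothetical, and the formulas are the commonly used definitions rather than anything specified in the slides.

```python
import math
from collections import Counter

def cluster_entropy(labels):
    """Entropy (in bits) of the true class labels inside one cluster.
    0 means the cluster is pure; higher values mean more mixing."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def f_measure(cluster_labels, target, class_size):
    """F-measure of one cluster with respect to one class.
    class_size is the total number of documents of that class in the collection."""
    hits = sum(1 for label in cluster_labels if label == target)
    if hits == 0:
        return 0.0
    precision = hits / len(cluster_labels)
    recall = hits / class_size
    return 2 * precision * recall / (precision + recall)

# Hypothetical cluster: two documents of class "a" and two of class "b".
print(cluster_entropy(["a", "a", "b", "b"]))
print(f_measure(["a", "a", "b"], target="a", class_size=4))
```

Lower entropy and higher F-measure both indicate that clusters line up better with the known classes.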
Shortcomings
• Ignoring meaning produces wrong results!
• Examples:
  • “John eats the apple standing beside the tree” vs. “The apple tree stands beside John’s house” (many shared words, different meanings)
  • “John is an intelligent boy” vs. “John is a brilliant son” (few shared words, similar meanings)
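The shortcoming can be reproduced with a small bag-of-words experiment on the two sentence pairs. This is a sketch that assumes cosine similarity over raw term counts, which is one common choice and not necessarily the measure the author had in mind.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    """Lowercase the sentence and count alphabetic tokens, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Pair 1: overlapping vocabulary, different meanings.
different_meaning = cosine(
    bag_of_words("John eats the apple standing beside the tree"),
    bag_of_words("The apple tree stands beside John's house"),
)
# Pair 2: little shared vocabulary, similar meanings.
same_meaning = cosine(
    bag_of_words("John is an intelligent boy"),
    bag_of_words("John is a brilliant son"),
)

print(different_meaning, same_meaning)
```

Under this model the pair with different meanings scores higher than the paraphrase pair, which is the opposite of what a semantic measure should report.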
Proposed Approach
[Figure: proposed approach. Elements shown: documents, syntactic analysis, parse tree, semantic analysis, knowledge representation scheme, semantic-based document clustering, and the resulting document clusters.]
Proposed Approach - Steps
• Preprocess text
  • Remove tags, hyperlinks, etc.
• Morphological analysis
  • Identify words, punctuation, etc.
• Syntactic analysis
  • Build the grammatical structure of sentences (parse tree)
• Semantic analysis
  • Assign meaning to words
• Discourse integration
• Pragmatic analysis
• Knowledge representation structure
• Clustering using the produced representations
  • New similarity measures
  • New clustering algorithm
• Better document clustering results (hopefully!)
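The first two steps above (preprocessing and identifying words and punctuation) can be sketched with regular expressions. This is a rough illustration under simplifying assumptions: the patterns and function names are hypothetical, and real HTML would normally call for a proper parser. The later steps (parsing, semantic analysis, knowledge representation) are not covered here.

```python
import re

def preprocess(html):
    """Strip markup tags and bare hyperlinks, then normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)        # drop markup tags
    text = re.sub(r"https?://\S+", " ", text)   # drop bare hyperlinks
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

# Hypothetical input page fragment.
raw = '<p>John eats the <a href="http://example.org">apple</a>.</p>'
tokens = tokenize(preprocess(raw))
print(tokens)
```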
Illustration: Parse Trees
• “John eats the apple standing beside the tree.” vs. “The apple tree stands beside John’s house.”
[Figure: parse trees for the two sentences (sent 1, sent 2), broken into clauses and constituents labeled np, vg, adv, prep, n, v, det, and apos.]
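Illustration, Cont.: Knowledge Representations
[Figure: knowledge representations of the two sentences. Phrases shown: “John”, “eats”, “the apple”, “standing beside the tree”, “The apple tree”, “Stands beside”, “John’s house”. Node labels shown: Act 1, Act 2, St 1, Obj 1, Obj 2, Obj 3.]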
Relation to LORNET?
• Findings can be applied to Learning Objects (LO) mining
  • Knowledge Representations
  • Clustering
  • Classification
  • Retrieval
  • Knowledge Sharing
Milestones
• Phase 1: Grad. courses, Lit. review, Proposal, Comp. Exam
• Phase 2: Development, Experimentations, Evaluations
• Phase 3: Reporting, Thesis writing, Defence
• Timeline markers: Jan 03, Jan 04, Jan 05