Personalizing Information Retrieval in CRISs with Fuzzy Sets and Rough Sets
Germán Hurtado Martín (1,2), Chris Cornelis (2), Helga Naessens (1)
(1) University College Ghent, (2) Ghent University (Belgium)
CRIS 2008
Overview
• Problems in CRISs
• Fuzzy sets and rough sets
• PAS project
Problems in CRISs
[Diagram labels: Fuzzy, Term = Term, Rough]
Fuzzy sets and rough sets
• Traditional approach: crisp sets
Young people = {x ∈ People | 0 < age(x) < 27}
Fuzzy sets and rough sets
• Fuzzy approach: fuzzy sets
Young(x) =
  1 if age(x) ≤ 20
  0 if age(x) ≥ 30
  (30 − age(x)) / 10 otherwise
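The difference between the crisp and the fuzzy definition of "young" can be sketched in a few lines of Python. The function names `young_crisp` and `young_fuzzy` are illustrative only; the thresholds are the ones given on the slides.

```python
def young_crisp(age: float) -> bool:
    """Crisp set from the slide: a person is young iff 0 < age < 27."""
    return 0 < age < 27


def young_fuzzy(age: float) -> float:
    """Fuzzy set from the slide: membership degree of 'young' in [0, 1]."""
    if age <= 20:
        return 1.0
    if age >= 30:
        return 0.0
    # Linear decrease from 1 at age 20 to 0 at age 30.
    return (30 - age) / 10


if __name__ == "__main__":
    for age in (18, 25, 27, 29, 35):
        print(age, young_crisp(age), young_fuzzy(age))
```

With the crisp set, a 26-year-old is young and a 27-year-old is not; with the fuzzy set, membership decreases gradually between 20 and 30.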
Fuzzy sets and rough sets
• Rough approach: rough sets
• Upper approximation (R↑A)
A = {Numerical Analysis}
R↑A = {Numerical Analysis, Ex. Sciences, Statistics, ..., Coding Theory}
B = {Compilers}
R↑B = {Compilers, Programming, GCC, YACC}
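A minimal sketch of the crisp upper approximation, assuming the relation R is reflexive and stored as a mapping from each term to the terms related to it. The function name `upper_approximation` and the dictionary layout are assumptions for illustration; the relation entries are chosen to reproduce the R↑B example from the slide.

```python
from typing import Dict, Set


def upper_approximation(related: Dict[str, Set[str]], a: Set[str]) -> Set[str]:
    """Return every term related to at least one term of `a`.

    `related[x]` lists the terms y with (x, y) in R. R is assumed to be
    reflexive, so each term of `a` is always part of the result.
    """
    result: Set[str] = set()
    for x in a:
        result.add(x)                      # reflexivity: x is related to itself
        result |= related.get(x, set())    # all terms related to x
    return result


# Hypothetical thesaurus fragment matching the slide's example for B.
related = {"Compilers": {"Programming", "GCC", "YACC"}}

if __name__ == "__main__":
    print(sorted(upper_approximation(related, {"Compilers"})))
```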
Fuzzy rough sets
• Fuzzy approach on rough sets
• Fuzzy set A
• Fuzzy relation R
• R(x,y): degree in [0,1] to which x is related to y
• Upper approximation
• (R↑A)(y) = sup_x min(R(x,y), A(x))
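The fuzzy upper approximation can be sketched as follows, taking it in its usual sup-min form, (R↑A)(y) = sup over x of min(R(x,y), A(x)); that this is the form intended on the slide is an assumption. The helper `symmetric_reflexive`, the terms k1 to k3, and all degrees are made up for illustration.

```python
from typing import Dict, Iterable, Tuple

Relation = Dict[Tuple[str, str], float]


def symmetric_reflexive(universe: Iterable[str], pairs: Relation) -> Relation:
    """Build a fuzzy relation with R(x,x) = 1 and R(x,y) = R(y,x)."""
    relation: Relation = {(x, x): 1.0 for x in universe}
    for (x, y), degree in pairs.items():
        relation[(x, y)] = degree
        relation[(y, x)] = degree
    return relation


def fuzzy_upper_approximation(
    relation: Relation, a: Dict[str, float], universe: Iterable[str]
) -> Dict[str, float]:
    """(R↑A)(y) = max over x of min(R(x,y), A(x)) on a finite universe."""
    terms = list(universe)
    return {
        y: max(min(relation.get((x, y), 0.0), a.get(x, 0.0)) for x in terms)
        for y in terms
    }


# Hypothetical data: three terms, two related pairs, and a fuzzy set A.
universe = ["k1", "k2", "k3"]
R = symmetric_reflexive(universe, {("k1", "k2"): 0.8, ("k2", "k3"): 0.5})
A = {"k1": 1.0, "k2": 0.3}

if __name__ == "__main__":
    print(fuzzy_upper_approximation(R, A, universe))
```

On a finite universe the supremum is simply a maximum, which is why `max` is used here.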
Fuzzy rough sets: application
• Query expansion
• Returns more results by searching with R↑A instead of A
- Query: “Programming”
- Expanded query: {(“Programming”,1.0), (“C++”,0.8), (“Java”,0.8), (“Algorithm”,0.6)}
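To illustrate why the expanded query returns more results, here is a small sketch that ranks documents against a weighted query. The expanded query is the one from the slide; the documents and the scoring rule (the highest weight among the query terms a document contains) are assumptions made only for this example.

```python
from typing import Dict, List, Set, Tuple

# Expanded query taken from the slide.
expanded_query = {"Programming": 1.0, "C++": 0.8, "Java": 0.8, "Algorithm": 0.6}

# Hypothetical document collection: document id -> index terms.
documents = {
    "doc1": {"Programming", "Compilers"},
    "doc2": {"Java", "Databases"},
    "doc3": {"Statistics"},
    "doc4": {"Algorithm", "C++"},
}


def score(doc_terms: Set[str], query: Dict[str, float]) -> float:
    """Score a document by the best-weighted query term it contains."""
    return max((w for term, w in query.items() if term in doc_terms), default=0.0)


def search(docs: Dict[str, Set[str]], query: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return matching documents, best score first, ties broken by id."""
    hits = [(doc_id, score(terms, query)) for doc_id, terms in docs.items()]
    return sorted((h for h in hits if h[1] > 0), key=lambda h: (-h[1], h[0]))


if __name__ == "__main__":
    print(search(documents, {"Programming": 1.0}))  # original query
    print(search(documents, expanded_query))        # expanded query
```

The original query matches only the document indexed with "Programming", while the expanded query also reaches documents indexed with related terms, at a lower score.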
PAS-project
• What is the PAS-project?
• Personal Alert System (HoGent)
• Goal: to draw researchers’ attention to funding possibilities that match their profiles
• Information about researchers, projects and funding possibilities (grants etc.) → matching/collaboration
• Automation and intelligence
PAS – How does it work?
User profile (filled in by the user):
• Name
• Staff number
• Department(s)
• Group
• Date of creation of the profile
• Last update of the profile
• Percentage research time
• Skills description
• Diplomas
• Publications
• IWETO-keywords
• Free keywords
[Diagram labels: IWETO Thesaurus, HoGent Thesaurus, User, Fill in]
PAS – How does it work?
Messages:
• Reference
• Title
• Content
• Attachment(s)
• Level
• Duration
• Institution
• Deadline
• Address
• Contact person
• IWETO-keywords
• Free keywords
[Diagram labels: IWETO Thesaurus, HoGent Thesaurus, Messages]
PAS – How does it work?
• The IWETO-classification has 641 research fields: 5 at the 1st level, 31 at the 2nd level, 605 at the 3rd level
[Figure: levels 1, 2, 3]
PAS – How does it work?
• By adding “free keywords” we can refine the classification
[Figure: levels 1, 2, 3; weights shown: 0.6, 0.7, 0.8]
PAS – How does it work?
Query: A = {k3}
Expanded query: R↑A = {(k1,0.8), (k3,1.0), …}
M1 → R2
PAS – How does it work?
[Figure: weights shown: 0.6, 0.7, 0.8, 0.7]
PAS – Current implementation
• Prototype that will serve as the skeleton for the final system
• Basic algorithm using weights and their products, plus basic fuzzy rough query expansion [1]
• Basic profiles and messages
• Manual processing of feedback and manual data extraction from text files
[1] P. Srinivasan, M. E. Ruiz, D. H. Kraft, J. Chen: Vocabulary mining for information retrieval: rough sets and fuzzy sets. Information Processing and Management 37(1) (2001) 15-38
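The slides do not spell out how the weights are combined. One plausible reading of "weights and their products", shown here purely as an assumption, is that the relatedness between a keyword and one of its ancestors in the classification is the product of the weights along the path between them. The node names, the `parent` mapping and the function `path_weight` are hypothetical; only the weight values 0.6, 0.7 and 0.8 appear in the slides.

```python
from typing import Dict, Tuple

# Hypothetical hierarchy: child -> (parent, weight of the link).
parent: Dict[str, Tuple[str, float]] = {
    "free_keyword": ("field_level3", 0.8),
    "field_level3": ("field_level2", 0.7),
    "field_level2": ("field_level1", 0.6),
}


def path_weight(parent: Dict[str, Tuple[str, float]], node: str, ancestor: str) -> float:
    """Multiply link weights while walking from `node` up to `ancestor`.

    Returns 0.0 if `ancestor` is not reachable from `node`.
    Assumes the hierarchy is a tree (no cycles).
    """
    weight = 1.0
    current = node
    while current != ancestor:
        if current not in parent:
            return 0.0  # reached a root without meeting the ancestor
        current, link_weight = parent[current]
        weight *= link_weight
    return weight


if __name__ == "__main__":
    print(path_weight(parent, "free_keyword", "field_level2"))
    print(path_weight(parent, "free_keyword", "field_level1"))
```

Under this reading, terms further apart in the hierarchy get a lower degree of relatedness, which could then feed the fuzzy relation R used for query expansion.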
PAS – Future work
• Richer representation of profiles and messages
• Automation of the feedback mechanism
• Dealing with imprecision and words from different thesauri
• Dealing with ambiguity and incomplete profiles
• Tracking research activities for collaboration
• Automatic extraction of information from text files
• Search engine
Thank you