Mining for Interactive Identification of Users’ Information Needs
Rey-Long Liu and Wan-Jung Lin (劉瑞瓏, 林宛蓉)
Dept. of Information Management, Chung Hua University
Outline • Introduction • Information Need Identification (INI): What & Why • Interactive INI • INEED: Incremental Mining for Interactive INI • The profile miner • The information need identifier • Experiment • Conclusion
Introduction • Information Need Identification (INI) for • Information portals • Online service guidance • Internet search engines • People finding • Interactive INI, which needs to consider • Precision (P) • Precision Effectiveness (PE) • Recall (R) • Recall Effectiveness (RE)
[Figure: a category hierarchy rooted at CR, with top-level categories C1 to Cn and subcategories such as C11, C12, C121, C122, C1211, C1212, Cn1, Cn2, Cn21 and Cn22.]
Introduction (Cont.) • Main Challenges • Each information space has its own content and structure. • Each information space is intrinsically dynamic. • Users are often unable (or unwilling) to precisely express their information needs (INs). Their queries are often quite short. • Users prefer simpler and fewer interactions.
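INEED
[Figure: the INEED architecture, with an information provider, an interface, the IN identifier, the profile miner, the information storage, and the category profiles, connected by the numbered flows (0) content & taxonomy, (1) interaction, (2) request, (3) information, and (4) the required information.]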
The Profile Miner
Incremental profile mining
• Given: the document d to be added to category c.
• Effect: updates the profiles of c and its related categories.
• Procedure:
(1) While c is not the root of the text hierarchy, do
  (1.1) For each distinct word w in d, do
    (1.1.1) If w is not a profile term of c, add $\langle w, s_{w,c} \rangle$ to the profile of c (the strength $s_{w,c}$ is not yet known);
  (1.2) For each pair $\langle w, s_{w,c} \rangle$ in the profile of c, do
    (1.2.1) $s_{w,c} \leftarrow P(w \mid c) \cdot \left( B_c / \sum_i P(w \mid c_i) \right)$;
    (1.2.2) For each sibling b of c, update $s_{w,b}$ in the profile of b;
  (1.3) $c \leftarrow$ father of c.
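The procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class and method names (`ProfileMiner`, `p`, `strength`, `add_document`) are hypothetical; $P(w \mid c)$ is assumed to be the relative frequency of w among all word occurrences in the documents of c; the sum in the strength formula is assumed to range over c and its siblings, with $B_c$ being the number of those categories; and a document added to c is assumed to count toward every ancestor the loop climbs through.

```python
from collections import Counter, defaultdict


class ProfileMiner:
    """Hypothetical sketch of INEED-style incremental profile mining."""

    def __init__(self, parent):
        # parent maps every category to its father; the root maps to None
        self.parent = parent
        self.children = defaultdict(list)
        for c, f in parent.items():
            if f is not None:
                self.children[f].append(c)
        self.counts = defaultdict(Counter)  # category -> word -> occurrences
        self.total = Counter()              # category -> total word occurrences
        self.profile = defaultdict(dict)    # category -> {word: s_{w,c}}

    def p(self, w, c):
        # Assumed estimate of P(w|c): relative frequency of w in c's documents
        return self.counts[c][w] / self.total[c] if self.total[c] else 0.0

    def strength(self, w, c):
        # s_{w,c} = P(w|c) * B_c / sum_i P(w|c_i), assuming the c_i are c and
        # its siblings and B_c is how many of them there are
        group = self.children[self.parent[c]]
        denom = sum(self.p(w, ci) for ci in group)
        return self.p(w, c) * len(group) / denom if denom else 0.0

    def add_document(self, words, c):
        while self.parent[c] is not None:          # (1) stop at the root
            # Assumed: a document added to c also counts toward every ancestor
            self.counts[c].update(words)
            self.total[c] += len(words)
            for w in set(words):                   # (1.1) register new profile terms
                self.profile[c].setdefault(w, 0.0)  # placeholder, set in (1.2.1)
            for w in self.profile[c]:              # (1.2)
                self.profile[c][w] = self.strength(w, c)         # (1.2.1)
                for b in self.children[self.parent[c]]:          # (1.2.2)
                    if b != c and w in self.profile[b]:
                        self.profile[b][w] = self.strength(w, b)
            c = self.parent[c]                     # (1.3) move to the father


miner = ProfileMiner({"root": None, "A": "root", "B": "root"})
miner.add_document(["x", "y"], "A")
miner.add_document(["x", "z"], "B")
print(miner.profile["A"]["x"], miner.profile["A"]["y"])  # 1.0 2.0
```

Adding a document to B lowers the strength of "x" in its sibling A from 2.0 to 1.0, because "x" is no longer specific to A; this is why step (1.2.2) must revisit the siblings.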
The Profile Miner (Cont.)
[Figure: updating the profiles of related categories once a document is added. A new document is added to category f, and the s-values of the profile terms are updated in the related categories.]
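The Profile Miner (Cont.) An example: an organizational hierarchy in which each unit is described by its responsibilities:
• Managers: decision making, coordination and integration
• Management Division: internal administration, performance management
• R&D Division: integrated evaluation, process design
• Business Division: market planning, product promotion
• Manufacturing Dept.: product manufacturing, design and manufacturing
• Information Dept.: system planning, R&D and maintenance
• Administration Dept.: operations management
• Customer Dept.: order management, sales analysis
• Marketing Dept.: marketing materials, advertising and publicity
• Quality Assurance Dept.: quality maintenance, product testing
• Computer Integration Section: production information, information utilization
• Information Management Section: system management, office automation
• Cashier Section: receipts and payments
• Accounting Section: account management, budgeting
• Personnel Section: staff recruitment, talent development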
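The Profile Miner (Cont.)
[Figure: the mined profile of each unit (its top three profile terms), shown in the context of example queries.]
• Managers
• Management Division: internal affairs, administration, management
• R&D Division: R&D, production, process
• Business Division: market, planning, sales
• Information Dept.: information, system, implementation
• Marketing Dept.: marketing, advertising, publicity
• Customer Dept.: orders, management, analysis
• Quality Assurance Dept.: quality, management, testing
• Computer Integration Section: production, integration, utilization
Example queries: "Information related to production management?", "Maintaining production quality", "Implementing and maintaining a production management system"
A profile term w of category c should be
• representative: $P(w \mid c)$ is high;
• discriminative: $P(w \mid c) \cdot B_c / \sum_i P(w \mid c_i)$ is high.
Strength of w in c: $s_{w,c} = P(w \mid c) \cdot \left( B_c / \sum_i P(w \mid c_i) \right)$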
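A hypothetical worked example of the strength formula may help. The probabilities below are invented for illustration, and $B_c$ is assumed to count c together with its siblings (here three categories $c_1, c_2, c_3$), which is the reading under which $s_{w,c} = 1$ is the neutral value:

```latex
s_{w,c_1} = P(w \mid c_1) \cdot \frac{B_{c_1}}{\sum_{i=1}^{3} P(w \mid c_i)}
          = 0.06 \cdot \frac{3}{0.06 + 0.02 + 0.01}
          = \frac{0.18}{0.09} = 2

\text{If } P(w \mid c_i) = p \text{ for every } i:\quad
s_{w,c} = p \cdot \frac{B_c}{B_c \, p} = 1
```

Under this assumption, a term spread evenly across sibling categories gets strength 1, and only terms more frequent in c than the sibling average exceed 1, which is consistent with the IN identifier's test $s_{w,c} > 1$.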
The IN Identifier (Cont.)
(1) For each category c, $\mathit{HitScore}_c \leftarrow 0$;
(2) For each pair (w, c), where w is a word in the query Q and c is a category,
  (2.1) If $s_{w,c} > 1$ and $\mathrm{Support}(w, c) \ge \mathit{minSupport}$,
    (2.1.1) $ns \leftarrow (s_{w,c} - 1) / (\text{number of siblings of } c)$;
    (2.1.2) $\mathit{HitScore}_c \leftarrow \mathit{HitScore}_c + ns \cdot \mathrm{TF}(w, Q)$;
(3) $S \leftarrow$ the set of all categories;
(4) While the target category has not been identified and interaction is still allowed, do
  (4.1) Let p1 and p2 be the two pedigrees (in S) with the highest average HitScore;
  (4.2) Let t1 and t2 be the categories with the highest HitScore in p1 and p2, respectively;
  (4.3) Display t1 and t2 (and their basic information) for the user to select;
  (4.4) If either t1 or t2 is exactly the target, return the space under the target;
  (4.5) Else if neither t1 nor t2 is of interest, $S \leftarrow S - \{$the categories under t1 and t2$\}$;
  (4.6) Else if both t1 and t2 are of interest, $g \leftarrow$ ClimbUp(common ancestor of t1 and t2), and return the space under g;
  (4.7) Else
    (4.7.1) Let t be the category that is of interest;
    (4.7.2) If t is a leaf, $g \leftarrow$ ClimbUp(father of t), and return the space under g;
    (4.7.3) Else $S \leftarrow \{$the categories under t$\}$;
(5) Return S;
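Steps (1) and (2), which score every category against the query, can be sketched as below. This is a hedged illustration, not the authors' code: the function name `hit_scores` and its parameters are hypothetical, Support(w, c) is not defined on this slide so the sketch simply takes it as a precomputed input, and TF(w, Q) is taken to be the number of occurrences of w in Q.

```python
from collections import Counter


def hit_scores(query_words, profiles, support, n_siblings, min_support):
    """Hypothetical sketch of steps (1)-(2): score every category for query Q.

    profiles:   {category: {word: s_{w,c}}}
    support:    {(word, category): Support(w, c)}, assumed precomputed
    n_siblings: {category: number of siblings of c}
    """
    tf = Counter(query_words)                     # TF(w, Q)
    scores = {c: 0.0 for c in profiles}           # (1) HitScore_c <- 0
    for w, tf_w in tf.items():                    # (2) every (w, c) pair
        for c, profile in profiles.items():
            s = profile.get(w)
            if s is None or s <= 1:               # (2.1) require s_{w,c} > 1 ...
                continue
            if support.get((w, c), 0) < min_support:  # ... and enough support
                continue
            if n_siblings[c] == 0:                # assumed guard: nothing to divide by
                continue
            ns = (s - 1) / n_siblings[c]          # (2.1.1)
            scores[c] += ns * tf_w                # (2.1.2)
    return scores


profiles = {"A": {"x": 3.0, "y": 0.5}, "B": {"x": 1.0, "y": 2.0}}
support = {("x", "A"): 5, ("y", "A"): 5, ("x", "B"): 5, ("y", "B"): 1}
print(hit_scores(["x", "x", "y"], profiles, support, {"A": 1, "B": 1}, 2))
# {'A': 4.0, 'B': 0.0}
```

In the usage example, "y" does not contribute to B even though $s_{y,B} = 2$, because its support falls below the threshold; dividing by the number of siblings keeps categories with many siblings from being favored merely because their strengths spread more widely.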
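The IN Identifier (Cont.) • Finding two candidate categories for interaction
[Figure: finding two candidate categories, in steps marked (1) to (5): pedigrees p1 and p2, and the categories t1 and t2 selected from them.]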
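Steps (4.1) and (4.2) can be sketched as below, with loudly stated assumptions: the slides do not define a pedigree, so this sketch assumes it is a path from a child of the root down to a leaf; the function names `pedigrees` and `two_candidates` are hypothetical; only categories still in S are averaged; and if two pedigrees share the same best category, the repeat is skipped so that two distinct categories are shown to the user.

```python
def pedigrees(children, root):
    """Assumed: a pedigree is a path from a child of the root down to a leaf."""
    paths = []

    def walk(node, path):
        kids = children.get(node, [])
        if not kids and path:
            paths.append(path)
        for k in kids:
            walk(k, path + [k])

    walk(root, [])
    return paths


def two_candidates(children, root, scores, S):
    """Sketch of steps (4.1)-(4.2): pick t1, t2 from the two best pedigrees."""
    ranked = []
    for p in pedigrees(children, root):
        members = [c for c in p if c in S]       # only categories still in S
        if not members:
            continue
        avg = sum(scores.get(c, 0.0) for c in members) / len(members)
        best = max(members, key=lambda c: scores.get(c, 0.0))
        ranked.append((avg, best))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    picked = []
    for _, best in ranked:                       # assumed: skip a repeated best
        if best not in picked:
            picked.append(best)
        if len(picked) == 2:
            break
    return picked


children = {"R": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"]}
scores = {"A": 2, "A1": 4, "A2": 0, "B": 3, "B1": 2}
print(two_candidates(children, "R", scores, set(scores)))  # ['A1', 'B']
```

Averaging over a whole pedigree, rather than ranking single categories, favors branches whose categories consistently match the query over a branch with one isolated high score.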
The IN Identifier (Cont.)
Function ClimbUp(f), where f is the category at which climbing starts
(1) If f is the root, return f;
(2) While the target category has not been identified and interaction is still allowed, do
  (2.1) $f_{\text{sibling}} \leftarrow$ a sibling of f;
  (2.2) $f_{\text{uncle}} \leftarrow$ a sibling of the father of f;
  (2.3) Display $f_{\text{sibling}}$ and $f_{\text{uncle}}$ (and their basic information) for the user to select;
  (2.4) If either $f_{\text{sibling}}$ or $f_{\text{uncle}}$ is exactly the target, return the target;
  (2.5) Else if neither $f_{\text{sibling}}$ nor $f_{\text{uncle}}$ is of interest, return f;
  (2.6) Else if both $f_{\text{sibling}}$ and $f_{\text{uncle}}$ are of interest,
    (2.6.1) $f \leftarrow$ grandfather of f;
    (2.6.2) If f is the root, return f;
  (2.7) Else if $f_{\text{sibling}}$ is of interest, return the father of f;
  (2.8) Else return $\{f, f_{\text{uncle}}\}$;
(3) Return f;
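ClimbUp can be sketched in Python with a simulated user. This is a minimal sketch under assumptions, not the authors' implementation: the names `climb_up` and `judge` are hypothetical; `judge(category)` stands in for the user and returns "target", "interest" or "none"; the slide says only "a sibling", so the sketch takes the first one listed; it stops when no sibling or uncle exists (for example, when the father of f is the root); and `max_interactions` models "interaction is still allowed" with a default of 5, borrowed from the experimental limit, although in INEED that budget would be shared with the main loop.

```python
def climb_up(f, parent, children, judge, max_interactions=5):
    """Hypothetical sketch of ClimbUp(f).

    parent:   {category: father}, with the root mapped to None
    children: {category: [sub-categories]}
    judge:    simulated user; judge(c) returns "target", "interest" or "none"
    """
    if parent[f] is None:                          # (1) f is the root
        return f
    for _ in range(max_interactions):              # (2) while interaction is allowed
        father = parent[f]
        grandfather = parent[father]
        siblings = [c for c in children.get(father, []) if c != f]
        uncles = ([c for c in children.get(grandfather, []) if c != father]
                  if grandfather is not None else [])
        if not siblings or not uncles:
            break                                  # assumed: stop if a candidate is missing
        f_sibling, f_uncle = siblings[0], uncles[0]  # (2.1)-(2.2), assumed: first one
        a, b = judge(f_sibling), judge(f_uncle)      # (2.3) ask the user
        if a == "target":                          # (2.4)
            return f_sibling
        if b == "target":
            return f_uncle
        if a == "none" and b == "none":            # (2.5)
            return f
        if a == "interest" and b == "interest":    # (2.6)
            f = grandfather                        # (2.6.1)
            if parent[f] is None:                  # (2.6.2)
                return f
        elif a == "interest":                      # (2.7)
            return father
        else:                                      # (2.8) only f_uncle is of interest
            return {f, f_uncle}
    return f                                       # (3)


parent = {"R": None, "A": "R", "B": "R", "A1": "A", "A2": "A",
          "A11": "A1", "A12": "A1"}
children = {"R": ["A", "B"], "A": ["A1", "A2"], "A1": ["A11", "A12"]}
user = {"A12": "interest"}
print(climb_up("A11", parent, children, lambda c: user.get(c, "none")))  # A1
```

Returning a set in case (2.8) mirrors the slide's $\{f, f_{\text{uncle}}\}$: the user's need spans two branches, so neither category alone is a correct generalization.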
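The IN Identifier (Cont.) • Generalization by climbing the hierarchy
[Figure: left, finding two categories for generalization (f, $f_{\text{sibling}}$ and $f_{\text{uncle}}$); right, possible results of generalization, labeled with the ClimbUp steps 2.4, 2.5, 2.6 and 2.7 that produce them.]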
Experiment • Experimental Data • Source: Yahoo! (http://www.yahoo.com) • Coverage: Computers & Internet, Society and Culture, and Science • Size: 214 categories; depth: 8 • Training data: 2216 documents • Test data: 168 queries extracted from another set of site summaries
Experiment (Cont.) • Each system could conduct at most 5 interactions for each query
Experiment (Cont.) • Precision • BruteForce was poor • Interaction improves precision • INEED improved precision by 14%~20% w.r.t. NB • Recall • INEED was good in both precision and recall • BruteForce and CN achieved 100% recall • INEED achieved 100% recall using only 2 interactions
Experiment (Cont.) • Precision-effectiveness • BruteForce was excluded • INEED's improvement w.r.t. NB was larger here (19%~32%): the interactions conducted by INEED were more effective • Recall-effectiveness • INEED performed best • INEED improved recall-effectiveness by 2%~20% w.r.t. NB
Experiment (Cont.) • Precision vs. Recall • BruteForce and CN always achieved 100% recall • INEED performed best (its curve lay closest to the upper-right corner) • When no interaction is allowed, INEED improved recall by 38% w.r.t. NB • INEED's precision improved by 62% in the first interaction (NB's improved by only 29%)
Experiment (Cont.) An example:
Conclusion • Interactive Information Need Identification (interactive INI) as an essential component for • Information portals • Online service guidance • Information retrieval • People finding • Requirements of interactive INI, fulfilled by INEED • Exactly identify the information space that may satisfy the user’s information needs • Effectively interact with the user • Intelligently reduce the user’s load in query formation and result cognition