Evaluating Hierarchical Clustering of Search Results SPIRE 2005, Buenos Aires Departamento de Lenguajes y Sistemas Informáticos UNED, Spain Juan Cigarrán Anselmo Peñas Julio Gonzalo Felisa Verdejo nlp.uned.es
Overview • Scenario • Assumptions • Features of a Good Hierarchical Clustering • Evaluation Measures • Minimal Browsing Area (MBA) • Distillation Factor (DF) • Hierarchy Quality (HQ) • Conclusion
Scenario • Complex information needs • Compile information from different sources • Inspect the whole list of documents • More than 100 documents • Help to • Find the relevant topics • Discriminate relevant from irrelevant documents • Approach • Hierarchical Clustering – Formal Concept Analysis
Problem • How to define and measure the quality of a hierarchical clustering? • How to compare different clustering approaches?
Previous assumptions • Each cluster contains only those documents fully described by its descriptors • [Figure: two hierarchies over the same documents; e.g. Physics {d1} with children Astrophysics {d4} and Nuclear physics {d2, d3}, versus Physics {d1, d2, d3, d4}]
Previous assumptions • ‘Open world’ perspective • [Figure: hierarchy with Physics {d1} and Jokes {d2} sharing the common child Jokes about physics {d3}]
Good Hierarchical Clustering • The content of the clusters • Clusters should not mix relevant with non-relevant information • [Figure: clusters of relevant (+) and irrelevant (−) documents]
Good Hierarchical Clustering • The hierarchical arrangement of the clusters • Relevant information should be in the same path • [Figure: two hierarchies; in the good one, the relevant (+) documents lie along a single path]
Good Hierarchical Clustering • The number of clusters • Number of clusters substantially lower than the number of documents • How clusters are described • Cognitive load of reading a cluster description • Ability to predict the relevance of the information that it contains (not addressed here)
Evaluation Measures • Criterion • Minimize the browsing effort for finding ALL relevant information • Baseline • The original document list returned by a search engine
Evaluation Measures • Consider • Content of clusters • Hierarchical arrangement of clusters • Size of the hierarchy • Cognitive load of reading a document (in the baseline): Kd • Cognitive load of reading a node descriptor (in the hierarchy): Kn • Requirement • Relevance assessments are available
Minimal Browsing Area (MBA) • The minimal set of nodes the user has to traverse to find ALL the relevant documents while minimising the number of irrelevant ones • [Figure: hierarchy with relevant (+) and irrelevant (−) documents, the MBA highlighted]
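As a rough illustration (not the paper's algorithm), the MBA of a hierarchy can be approximated as the union of the root-to-node paths that lead to relevant documents. The tree encoding and all names below are hypothetical:

```python
# Minimal sketch, assuming each node holds the documents exactly described by
# its descriptors; the MBA is approximated as the union of paths from the root
# to every node whose subtree contains at least one relevant document.

def mba(tree, node, relevant):
    """Return the set of nodes in the (approximate) minimal browsing area.

    tree: dict mapping node -> (docs, children); relevant: set of doc ids.
    """
    docs, children = tree[node]
    area = set()
    for child in children:
        area |= mba(tree, child, relevant)  # recurse into each branch
    # keep this node if it holds a relevant doc or leads to one below
    if area or (relevant & set(docs)):
        area.add(node)
    return area

# Toy hierarchy: root 'n0' with two leaf clusters.
tree = {
    "n0": ([], ["n1", "n2"]),
    "n1": (["d1", "d2"], []),   # d1 is relevant, d2 is not
    "n2": (["d3"], []),         # d3 is not relevant
}
print(sorted(mba(tree, "n0", {"d1"})))  # ['n0', 'n1']
```

With this toy input only the branch containing d1 is traversed; the node n2 is discarded, which is the effort saving the measures below try to quantify.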
Distillation Factor (DF) • Ability to isolate relevant information compared with the original document list (a gain when DF > 1) • DF(L) = Precision(MBA) / Precision(L), i.e. the ratio of documents read in the list to documents read in the MBA • Considers only the cognitive load of reading documents
Distillation Factor (DF) • Example • Document list: Doc 1 (+), Doc 2 (−), Doc 3 (+), Doc 4 (+), Doc 5 (−), Doc 6 (−), Doc 7 (+) • Precision = 4/7 • Precision MBA = 4/5 • DF(L) = (4/5) / (4/7) = 7/5 = 1.4
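The example above can be checked directly; DF is just the ratio of the two precisions (function name and argument layout are ours, not the paper's):

```python
# Distillation Factor as described on the slides:
# DF = Precision(MBA) / Precision(original list).
def distillation_factor(list_rel, list_total, mba_rel, mba_total):
    p_list = list_rel / list_total   # precision of the baseline list
    p_mba = mba_rel / mba_total      # precision over the MBA
    return p_mba / p_list

# Slide example: 7 documents in the list (4 relevant);
# the MBA exposes 5 documents (4 relevant).
print(round(distillation_factor(4, 7, 4, 5), 6))  # 1.4
```

A DF of 1.4 means the user inspects 1.4 times fewer documents through the hierarchy than by scanning the ranked list.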
Distillation Factor (DF) • Counterexample: a bad clustering with a good DF • Precision = 4/8, Precision MBA = 4/4, DF = 8/4 = 2 • Solution: extend the DF measure with the cognitive cost of taking browsing decisions → HQ
Hierarchy Quality (HQ) • Assumption: when a node (in the MBA) is explored, all its lower neighbours have to be considered: some will be in turn explored, some will be discarded • Nview: subset of lower neighbours of each node belonging to the MBA • [Figure: example hierarchy and its MBA, with |Nview| = 8]
Hierarchy Quality (HQ) • Kn and Kd are directly related to the retrieval scenario in which the experiments take place • The researcher must tune K = Kn/Kd before conducting the experiment • HQ > 1 indicates an improvement of the clustering over the original list
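The HQ formula itself was shown as a figure and is not recoverable from this text. A plausible form, assumed here only to make the ingredients concrete (baseline cost of reading the whole list versus the cost of reading the MBA's documents plus its |Nview| node descriptors), would be:

```latex
HQ(L) \;=\; \frac{|L| \cdot K_d}{\,|D_{MBA}| \cdot K_d \;+\; |N_{view}| \cdot K_n\,}
       \;=\; \frac{|L|}{\,|D_{MBA}| \;+\; K\,|N_{view}|\,},
\qquad K = \frac{K_n}{K_d}
```

where |L| is the number of documents in the baseline list and |D_MBA| the number of documents read inside the MBA; this reduces towards DF as the node-reading cost K approaches zero, which matches the slides' description of HQ as DF extended with browsing-decision cost.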
Hierarchy Quality (HQ) • Example • [Figure: the MBA example and the counterexample clustering, with relevant (+) and irrelevant (−) documents]
Conclusions and Future Work • Framework for comparing different clustering approaches taking into account: • Content of clusters • Hierarchical arrangement of clusters • Cognitive load of reading document and node descriptions • Adaptable to the retrieval scenario in which experiments take place • Future work • Conduct user studies to compare their results with the automatic evaluation • Results will reflect the quality of the descriptors • Will be used to fine-tune the Kd and Kn parameters