1 / 13

Information Retrieval

Information Retrieval. For the MSc Computer Science Programme. Lecture 7 Introduction to Information Retrieval (Manning et al. 2007) Chapter 17. Dell Zhang Birkbeck , University of London. … (30). agriculture. biology. physics. CS. space. dairy. botany. cell. AI. courses. crops.

wilbur
Download Presentation

Information Retrieval

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Information Retrieval For the MSc Computer Science Programme Lecture 7 Introduction to Information Retrieval (Manning et al. 2007) Chapter 17 Dell Zhang Birkbeck, University of London

  2. … (30) agriculture biology physics CS space ... ... ... ... ... dairy botany cell AI courses crops craft magnetism HCI missions agronomy evolution forestry relativity Yahoo! Hierarchy http://dir.yahoo.com/science

  3. Hierarchical Clustering • Builda tree-like hierarchical taxonomy (dendrogram) from a set of unlabeled documents. • Divisive (top-down) • Start with all documents belong to the same cluster. Eventually each node forms a cluster on its own. • Recursive application of a (flat) partitional clustering algorithm, e.g., kMeans (k=2)  Bi-secting kMeans. • Agglomerative (bottom-up) • Start with each document being a single cluster. Eventually all documents belong to the same cluster.

  4. Dendrogram Clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster. The number of clusters k is not required in advance.

  5. Dendrogram – Example Clusters of News Stories: Reuters RCV1

  6. Dendrogram – Example Clusters of Things that People Want: ZEBO

  7. HAC • Hierarchical Agglomerative Clustering • Starts with each doc in a separate cluster. • Repeat until there is only one cluster: • Among the current clusters, determine the pair of clusters, ci and cj, that are most similar. • (Single-Link, Complete-Link, etc.) • Then mergesci and cj to a single cluster. • The history of merging forms a binary tree or hierarchy.

  8. Single-Link • The similarity between a pair of clusters is defined by the single strongest link (i.e., maximum cosine-similarity) between their members: • After merging ci and cj, the similarity of the resulting cluster to another cluster, ck, is:

  9. HAC – Example

  10. HAC – Example

  11. d3,d4,d5 d4,d5 d3 HAC – Example • As clusters agglomerate, docs are likely to fall into a dendrogram. d3 d5 d1 d4 d2 d1,d2

  12. HAC – Example Single-Link

  13. Take Home Message • Single-Link HAC • Dendrogram

More Related