Efficient Concept Clustering for Ontology Learning using an Event Life Cycle on the Web

By Sangsoo Sung, Seokkyung Chung, Dennis McLeod. Presented by Amir Tahmasebi.


Presentation Transcript


  1. Efficient Concept Clustering for Ontology Learning using an Event Life Cycle on the Web By Sangsoo Sung, Seokkyung Chung, Dennis McLeod Presented by Amir Tahmasebi

  2. Overview • Motivation • Concept clustering • Creating Rough Clusters • Algorithm for creating rough clusters • Similarity computation with rough clusters • Complexity Analysis • Experiment • Conclusion

  3. Motivation • Why do we need ontology learning? • Handcrafted vs. automatically or semi-automatically created ontologies • Pros and cons • The manual approach of extracting semantic meaning cannot scale with the growth of the Web.

  4. Clustering • Definition? • Given a set of terms, we need to distribute the terms into clusters. After clustering, the semantic relations among the terms within each cluster can be determined. • Problems? • Computationally very expensive • Complexity? • O(N²) for N terms, due to the pairwise similarity computations

  5. Clustering • Solution? • Break up the term space into multiple subsets using a cheap division algorithm. • Create rough clusters • This significantly reduces the number of pairwise comparisons.

  6. Creating Rough Clusters • How can we break up the term space? • Using the event life cycle phenomenon • Certain events generate postings on the Web. The volume of these postings starts small, grows, and then gradually diminishes. • But how can we use this phenomenon for clustering? • Terms whose posting volumes peak at the same time are more likely to be related.
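One simple way to make "same posting peak" concrete is sketched below: take each term's posting-volume time series, find the time step with the highest volume, and group terms that peak at the same step. This is an illustration only; the argmax peak, the function names, and the toy counts are assumptions, and the presented method locates change points with the Gallistel algorithm rather than raw peaks.

```python
from collections import defaultdict

def peak_time(series):
    """Index of the time step with the highest posting volume (first one on ties)."""
    return max(range(len(series)), key=lambda t: series[t])

def group_by_peak(term_series):
    """Map each peak time step to the sorted list of terms that peak there."""
    groups = defaultdict(list)
    for term, series in term_series.items():
        groups[peak_time(series)].append(term)
    return {t: sorted(terms) for t, terms in sorted(groups.items())}

# Invented daily posting counts for four terms.
term_series = {
    "earthquake": [1, 2, 40, 25, 9, 3],
    "tsunami":    [0, 1, 35, 30, 8, 2],
    "election":   [2, 5, 9, 20, 45, 30],
    "ballot":     [1, 3, 6, 18, 39, 22],
}
print(group_by_peak(term_series))
# {2: ['earthquake', 'tsunami'], 4: ['ballot', 'election']}
```

Only terms that share a group would later be compared with each other, which is where the savings in pairwise comparisons come from.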

  7. Gallistel change point finding algo.

  8. Gallistel change point finding algo.

  9. Gallistel change point finding algo. • How can the pcp be verified? • By performing an unequal-variance t-test: • Where: • A change point is identified when t is significantly far from zero, i.e., when the null hypothesis of no change is rejected.
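The pcp test can be sketched in Python as follows. Two parts are assumptions rather than details taken from the slides: the hypothetical `putative_change_point` locator, which picks the point where the cumulative record is farthest from the straight line from the origin to its last value (in the spirit of Gallistel et al.'s cumulative-record analysis), and the fixed cut-off `threshold=3.0` on |t|. `welch_t` is the standard unequal-variance statistic, (mean_after − mean_before) / sqrt(s_before²/n_before + s_after²/n_after), using sample variances.

```python
import math
from statistics import mean, variance

def putative_change_point(series):
    """Index where the cumulative record deviates most from the straight line
    joining the origin to the final cumulative value (assumed locator)."""
    cumulative, total = [], 0
    for x in series:
        total += x
        cumulative.append(total)
    n = len(series)
    slope = cumulative[-1] / n
    deviations = [abs(c - slope * (i + 1)) for i, c in enumerate(cumulative)]
    return max(range(n), key=deviations.__getitem__)

def welch_t(before, after):
    """Unequal-variance (Welch) t statistic for the difference in segment means."""
    diff = mean(after) - mean(before)
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    if se == 0:  # both segments constant: any difference is a perfect separation
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se

def verify_change_point(series, threshold=3.0):
    """Split the series just after the pcp and test the two segments.
    threshold is a hypothetical cut-off on |t|; a full test would compare t with
    the t distribution at the Welch-Satterthwaite degrees of freedom."""
    k = putative_change_point(series)
    before, after = series[:k + 1], series[k + 1:]
    if len(before) < 2 or len(after) < 2:
        return k, 0.0, False  # too few points on one side to estimate a variance
    t = welch_t(before, after)
    return k, t, abs(t) > threshold

# Invented weekly posting counts for one term: low volume, then a burst.
k, t, is_change = verify_change_point([1, 2, 1, 2, 9, 11, 10, 12])
print(k, round(t, 2), is_change)  # 3 12.73 True
```

Replacing the fixed threshold with a p-value from the t distribution (for example via SciPy's `scipy.stats.ttest_ind(before, after, equal_var=False)`) would turn this into a proper significance test.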

  10. Gallistel change point finding algo. • A set of terms ω_t, whose elements all have the same change point t, is defined as follows: • A set Ω is also defined as follows: • Where Ω covers the entire time span.
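A minimal sketch of ω_t and Ω, assuming each term already has a verified change point expressed as an integer time-step index. `build_omega` is a hypothetical name; it keeps an (often empty) ω_t for every step in the span so that Ω covers the entire time span.

```python
def build_omega(change_points, span):
    """change_points maps each term to its change point (an index in range(span)).
    Returns Omega as {t: omega_t}, where omega_t is the set of terms whose
    change point is t. Raises KeyError for a change point outside the span."""
    omega = {t: set() for t in range(span)}
    for term, t in change_points.items():
        omega[t].add(term)
    return omega

# Invented change points over a 6-step time span.
omega = build_omega({"earthquake": 2, "tsunami": 2, "aid": 3, "election": 4}, span=6)
print({t: sorted(terms) for t, terms in omega.items()})
# {0: [], 1: [], 2: ['earthquake', 'tsunami'], 3: ['aid'], 4: ['election'], 5: []}
```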

  11. Gallistel change point finding algo. • Then Ω is clustered into overlapping subsets with respect to α and β, which are distance thresholds. • [Figure: distance thresholds α and β around a change point t_0]
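The transcript does not preserve exactly how α and β are applied, so the sketch below is one hypothetical reading: window centres t_0 are placed every β steps, and each rough cluster collects every ω_t with |t − t_0| ≤ α. Under that reading, neighbouring windows share terms whenever β ≤ 2α, which yields overlapping subsets. The parameter roles and the name `rough_clusters` are assumptions.

```python
def rough_clusters(omega, alpha, beta):
    """omega: {t: omega_t}. Hypothetical reading of the thresholds:
    centres t0 are spaced beta steps apart, and each rough cluster is the
    union of every omega_t with |t - t0| <= alpha. Empty clusters are dropped."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    times = sorted(omega)
    clusters = []
    t0 = times[0]
    while t0 <= times[-1]:
        members = set()
        for t in times:
            if abs(t - t0) <= alpha:
                members |= omega[t]
        if members:
            clusters.append((t0, members))
        t0 += beta
    return clusters

# Invented Omega over six time steps.
omega = {0: set(), 1: {"quake"}, 2: {"tsunami", "aid"}, 3: set(), 4: {"vote"}, 5: {"ballot"}}
clusters = rough_clusters(omega, alpha=1, beta=2)
for t0, members in clusters:
    print(t0, sorted(members))
# 0 ['quake']
# 2 ['aid', 'quake', 'tsunami']
# 4 ['ballot', 'vote']
```

Here "quake" lands in two rough clusters, which is the overlap that lets related terms near a window boundary still meet in at least one cluster.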

  12. Cluster refinement using expensive similarity metric • String similarity vs. context-based similarity (pros and cons) • This research focuses on context-based similarity

  13. Cluster refinement using expensive similarity metric • Using the tf-idf vector weighting scheme • Λ_p: the set of all documents within cluster Ω_p • Λ_p is used to generate a tf-idf vector for each candidate term x in Ω_p. • The vector includes the terms that co-occur with x, and the weight of each term is defined as:
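A small sketch of the weighting step, assuming the textbook tf-idf form w_{x,d} = tf(x, d) · log(|Λ_p| / df(x)), where df(x) counts the documents in Λ_p that contain x. The exact weight defined on the slide may differ, and `tfidf_vectors` is a hypothetical helper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: the token lists of the documents in Lambda_p.
    Returns one sparse {term: weight} vector per document, using the assumed
    weight tf(x, d) * log(N / df(x)) with N = len(docs)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({x: count * math.log(n / df[x]) for x, count in tf.items()})
    return vectors

# Invented three-document Lambda_p.
docs = [["quake", "tsunami", "quake"], ["quake", "aid"], ["vote", "aid"]]
vectors = tfidf_vectors(docs)
print([{x: round(w, 3) for x, w in sorted(v.items())} for v in vectors])
# [{'quake': 0.811, 'tsunami': 1.099}, {'aid': 0.405, 'quake': 0.405}, {'aid': 0.405, 'vote': 1.099}]
```

Because df(x) is computed only over Λ_p, the weights are local to one rough cluster, which keeps the expensive step confined to that cluster's documents.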

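  14. Cluster refinement using expensive similarity metric • All elements of v_i are eliminated except the m terms with the highest w_{x,d_i}. • Let ϒ(x) be the centroid vector of all v_i's. • The cosine metric is used to determine the similarity of terms: $\mathrm{sim}(x, y) = \frac{\Upsilon(x) \cdot \Upsilon(y)}{\lVert \Upsilon(x) \rVert \, \lVert \Upsilon(y) \rVert}$ • where · denotes the dot product and ‖·‖ the Euclidean norm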
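The pruning, centroid, and cosine steps can be sketched with sparse dict vectors. The assumptions here: the v_i are the already weighted vectors of the documents in which x occurs, `top_m` keeps the m largest weights, ϒ(x) is the component-wise mean of the pruned vectors, and all function names and toy weights are hypothetical.

```python
import math
from collections import defaultdict

def top_m(vector, m):
    """Keep only the m highest-weighted entries of a sparse {term: weight} vector."""
    return dict(sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:m])

def centroid(vectors):
    """Component-wise mean of sparse vectors; plays the role of Upsilon(x)."""
    total = defaultdict(float)
    for v in vectors:
        for term, w in v.items():
            total[term] += w
    return {term: w / len(vectors) for term, w in total.items()}

def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def term_profile(x, doc_vectors, m):
    """Hypothetical: prune each document vector that contains x to its top m
    terms, then average the pruned vectors into the centroid Upsilon(x)."""
    pruned = [top_m(v, m) for v in doc_vectors if x in v]
    return centroid(pruned) if pruned else {}

# Invented, already weighted document vectors from one rough cluster.
doc_vectors = [
    {"quake": 0.8, "tsunami": 1.1, "coast": 0.3},
    {"tsunami": 0.9, "coast": 0.7, "quake": 0.2},
    {"vote": 1.2, "ballot": 0.9, "poll": 0.4},
]
u_quake = term_profile("quake", doc_vectors, m=2)
u_tsunami = term_profile("tsunami", doc_vectors, m=2)
u_vote = term_profile("vote", doc_vectors, m=2)
print(round(cosine(u_quake, u_tsunami), 3), cosine(u_quake, u_vote))  # 1.0 0.0
```

"quake" and "tsunami" occur in exactly the same documents, so their centroids coincide and the cosine is 1; "vote" shares no surviving terms with them and scores 0.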

  15. Complexity Analysis • Comparing the method with rough clustering (Oa) vs. the method without rough clustering (Ob):

  16. Equations: O(N), O(N²)

  17. Complexity Analysis • Comparing the method with rough clustering (Oa) vs. the method without rough clustering (Ob): • As a result: • C(Oa) = O(N + L²), where L is the number of terms in the rough clusters • C(Ob) = O(N²)
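To make the gap between C(Oa) and C(Ob) concrete, the sketch counts pairwise similarity computations with and without rough clusters. The cluster sizes are hypothetical, and the O(N) term (read here as one cheap pass that assigns every term to rough clusters) is not counted, since it is negligible next to the pairwise work.

```python
def pairs(n):
    """Number of unordered pairs among n items: n(n - 1) / 2."""
    return n * (n - 1) // 2

def comparisons_without_rough_clusters(n_terms):
    """Every term is compared with every other term."""
    return pairs(n_terms)

def comparisons_with_rough_clusters(cluster_sizes):
    """Pairwise comparisons are made only inside each rough cluster."""
    return sum(pairs(size) for size in cluster_sizes)

# Hypothetical numbers: 100,000 terms and 1,000 overlapping rough clusters of 150 terms.
print(comparisons_without_rough_clusters(100_000))    # 4999950000
print(comparisons_with_rough_clusters([150] * 1_000))  # 11175000
```

Because rough clusters overlap, their sizes can sum to more than N; the savings come from each cluster being much smaller than N.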

  18. Experiment

  19. Experiment

  20. Experiment

  21. Conclusion • Given a very large collection with many billions of terms, quantifying all pairwise term similarities is very expensive. This paper presents a new method, based on the event life cycle phenomenon, that divides the term space into rough clusters before the pairwise similarity computations.

  22. Questions?
