Mining Generalized Associations of Semantic Relations from Textual Web Content

This paper presents an approach for extracting semantic relations from textual web content and mining generalized associations among them. The method applies semantic relation extraction techniques and encodes the extracted relations using the Resource Description Framework (RDF) and RDF Schema. The experiments demonstrate the effectiveness of the approach in discovering patterns and associations in text.

Presentation Transcript


  1. Mining Generalized Associations of Semantic Relations from Textual Web Content Tao Jiang, Ah-Hwee Tan, Senior Member, IEEE, and Ke Wang IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 2, 2007. Presenter : Wei-Shen Tai Advisor : Professor Chung-Chian Hsu 2007/1/10

  2. Outline • Introduction • Resource Description Framework and RDF Schema • Semantic relation extraction • Mining generalized associations from RDF metadata • Experiments • Conclusion • Comments

  3. Motivation • Text mining problem • As terms are treated as individual items in such simplistic representations, terms lose their semantic relations and texts lose their original meanings. • Two short text documents with different meanings can be represented in a similar bag of keywords.
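A minimal sketch of the second point, using two made-up sentences that are not from the paper. Once word order is discarded, both sentences reduce to the same bag of keywords even though they state opposite results. The helper name `bag_of_keywords` is an assumption for illustration.

```python
from collections import Counter


def bag_of_keywords(text):
    """Lowercase the text, split on whitespace, and count each term."""
    return Counter(text.lower().split())


# Two hypothetical documents with opposite meanings.
doc_a = "France defeated Italy"
doc_b = "Italy defeated France"

# Both documents map to the same multiset of terms.
same_bag = bag_of_keywords(doc_a) == bag_of_keywords(doc_b)
print(same_bag)
```

The relation between the two teams is exactly what the bag representation loses, which is what the semantic relation representation on the following slides is meant to keep.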

  4. Objective • Semantic relation associations • An intermediate representation that expresses the semantic relations between the concepts in texts.

  5. Major processes • Semantic relation extraction • The extracted relations are encoded in RDF statements. • Semantic relation associations • Meaningful and detailed patterns can be discovered from text using the conceptual graph representation.

  6. Resource Description Framework and RDF Schema • Resource Description Framework (RDF) • For describing and interchanging semantic metadata. • RDF statements • <subject, predicate, object> • {France, Defeat, Italy, World Cup, Quarter Final} • RDF Schema • Defines RDF vocabularies for constructing RDF statements.
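A minimal sketch of the <subject, predicate, object> form, assuming plain strings for each field. The `Statement` type and its field names are illustrative and not taken from the paper. The slides do not say how the France/Italy example maps onto triples, so the sample statements reuse the ones shown on the Overgeneralization slide.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: <subject, predicate, object> (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


# Triples copied from the Overgeneralization slide.
s = Statement("Score", "agent", "F:Inzaghi")
t = Statement("Assist", "agent", "RuiCosta")

# A set of statements extracted from one document forms a relationset.
relationset = frozenset({s, t})
print(s.subject, s.predicate, s.obj)
```

Using an immutable tuple keeps statements hashable, so a document's statements can be stored as a set.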

  7. Term Taxonomy Construction • Term similarity measure • Incremental term taxonomy construction

  8. RDF model • RDF vocabulary • V = {E, P, H, domain, range}, where E = {a, b, c, d, e, f, ab, cd, ef, cdef}, P = {p}, domain = {a, b, ab}, and range = {c, d, e, f, cd, ef, cdef} • Generalized relation hierarchy • e.g. {<a, p, ef>, <b, p, c>} is a relationset, and it is also a generalized relationset of {<a, p, e>, <b, p, c>}.
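The generalization example above can be sketched as follows. Two things are assumptions, not statements from the slide: the hierarchy is inferred from the entity names (a and b under ab, c and d under cd, e and f under ef, cd and ef under cdef), and "generalized relationset" is read in a simplified way, where every relation on one side must be matched by an equal or more general relation on the other.

```python
# Assumed child -> parent hierarchy, inferred from the entity names.
PARENT = {"a": "ab", "b": "ab", "c": "cd", "d": "cd",
          "e": "ef", "f": "ef", "cd": "cdef", "ef": "cdef"}


def ancestors_or_self(entity):
    """Return the entity together with all of its ancestors."""
    chain = [entity]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def covers(general, specific):
    """True if `general` equals `specific` or generalizes its subject/object."""
    (gs, gp, go), (ss, sp, so) = general, specific
    return (gp == sp
            and gs in ancestors_or_self(ss)
            and go in ancestors_or_self(so))


def is_generalized_relationset(general, specific):
    """Simplified check: the sets differ and match each other relation by relation."""
    if general == specific:
        return False
    return (all(any(covers(g, s) for g in general) for s in specific)
            and all(any(covers(g, s) for s in specific) for g in general))


specific = {("a", "p", "e"), ("b", "p", "c")}
general = {("a", "p", "ef"), ("b", "p", "c")}
print(is_generalized_relationset(general, specific))
print(is_generalized_relationset(specific, general))
```

With this hierarchy the first check succeeds because ef is the parent of e, and the reverse check fails because e is not an ancestor of ef.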

  9. Overgeneralization • Example • {<a, p, e>, <b, p, c>}, • {<Score, agent, F:Inzaghi>, <Assist, agent, RuiCosta>} • {<a, p, ef>, <b, p, c>}, • {<Score, agent, AttackPlayer>, <Assist, agent, RuiCosta>} • Definition • A frequent relationset X is overgeneralized if there exists a specialized relationset Y of X with supp(X) = supp(Y).
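A hedged sketch of the definition on a toy collection. The three documents, the second attacking player (`Shevchenko`), and the use of a raw document count for supp are all assumptions made for illustration; only the Inzaghi/RuiCosta relations and the AttackPlayer generalization come from the slide. The minimum-support test implied by "frequent" is left out to keep the sketch short.

```python
# Assumed taxonomy: both players generalize to AttackPlayer.
PARENT = {"F:Inzaghi": "AttackPlayer", "Shevchenko": "AttackPlayer"}


def ancestors_or_self(entity):
    """Return the entity together with all of its ancestors."""
    chain = [entity]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def covers(general, specific):
    """True if `general` equals `specific` or generalizes its subject/object."""
    (gs, gp, go), (ss, sp, so) = general, specific
    return (gp == sp
            and gs in ancestors_or_self(ss)
            and go in ancestors_or_self(so))


def supp(relationset, documents):
    """Count documents in which every relation of the relationset is matched."""
    return sum(
        all(any(covers(r, d) for d in doc) for r in relationset)
        for doc in documents
    )


def is_overgeneralized(x, specializations, documents):
    """X is overgeneralized if some specialization Y has the same support."""
    return any(supp(y, documents) == supp(x, documents) for y in specializations)


# Hypothetical documents, each already reduced to a relationset.
documents = [
    {("Score", "agent", "F:Inzaghi"), ("Assist", "agent", "RuiCosta")},
    {("Score", "agent", "F:Inzaghi"), ("Assist", "agent", "RuiCosta")},
    {("Score", "agent", "Shevchenko")},
]

x = {("Score", "agent", "AttackPlayer"), ("Assist", "agent", "RuiCosta")}
y = {("Score", "agent", "F:Inzaghi"), ("Assist", "agent", "RuiCosta")}
print(supp(x, documents), supp(y, documents), is_overgeneralized(x, [y], documents))

x2 = {("Score", "agent", "AttackPlayer")}
y2 = [{("Score", "agent", "F:Inzaghi")}, {("Score", "agent", "Shevchenko")}]
print(supp(x2, documents), is_overgeneralized(x2, y2, documents))
```

In this toy data the generalized pair adds nothing over the Inzaghi-specific pair, since both are supported by the same two documents, so it is overgeneralized. The single relation with AttackPlayer is not, because neither player alone accounts for all three documents.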

  10. Overgeneralization Reduction • Each node is a unique generalization closure • If a closure and its children have the same support, this closure is not closed and can be pruned. • Such a nonclosed closure is pruned by replacing it with the union of its equal-support children.

  11. GP (Generalized Pattern)-Close Algorithm • GP-Close • Initializes the enumeration tree to contain only the root closure. • Closure-Enumeration • Starting from the root closure of the empty set, the closure enumeration process recursively traverses the closure enumeration tree to discover closed generalization closures.

  12. Experiments • Data sets • The online database of the International Policy Institute for Counter-Terrorism (ICT), including suicide bombing (ICT-SB) and car bombing (ICT-CB) documents. • Analysis of Patterns • 71.8 percent (56 out of 78) of the patterns are commonsense patterns already known by people. • 12.8 percent (10 out of 78) of the patterns are identified as previously unknown and not useful. • 15.4 percent (12 out of 78) of the patterns are previously unknown and potentially useful.
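The percentages above follow directly from the reported counts. A short check, with the category labels shortened for illustration:

```python
# Pattern counts as reported on the slide.
counts = {
    "commonsense": 56,
    "unknown, not useful": 10,
    "unknown, potentially useful": 12,
}

total = sum(counts.values())

# Share of each category, rounded to one decimal place as on the slide.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(total, shares)
```

The three categories account for all 78 patterns, so the shares sum to 100 percent.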

  13. Conclusions • Semantic relation extraction • Discovering knowledge from free-form textual Web content. • GP-Close algorithm • Based on mining closed generalization closures. • Substantially reduces pattern redundancy and improves performance.

  14. Comments • Advantage • A novel idea for semantic relation association extraction. • GP-Close can be applied to reduce the pattern search space. • Drawback • The examples do not use consistent data. • The diagrams of child-closure pruning and sub-tree pruning are confusing to the reader. • Application • Data mining applications in semantic relation association.
