This paper presents a mining approach to extract and mine generalized associations of semantic relations from textual web content. The proposed method uses a Resource Description Framework (RDF) schema and employs semantic relation extraction techniques. The experiments demonstrate the effectiveness of the approach in discovering patterns and associations in text.
Mining Generalized Associations of Semantic Relations from Textual Web Content Tao Jiang, Ah-Hwee Tan, Senior Member, IEEE, and Ke Wang IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 19, NO. 2, 2007. Presenter : Wei-Shen Tai Advisor : Professor Chung-Chian Hsu 2007/1/10
Outline • Introduction • Resource Description Framework and RDF Schema • Semantic relation extraction • Mining generalized associations from RDF metadata • Experiments • Conclusion • Comments
Motivation • Text mining problem • As terms are treated as individual items in such simplistic representations, terms lose their semantic relations and texts lose their original meanings. • Two short text documents with different meanings can be represented in a similar bag of keywords.
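The bag-of-keywords problem above can be illustrated with a minimal sketch. The two sentences and the helper name `bag_of_words` are made up for illustration and do not come from the paper.

```python
from collections import Counter


def bag_of_words(text):
    """Return a multiset of lowercase terms, ignoring word order."""
    return Counter(text.lower().split())


# Two hypothetical documents with opposite meanings.
doc_a = "France defeated Italy"
doc_b = "Italy defeated France"

# Word order is lost, so both documents get the same representation.
same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)
print(same_bag)
```

Because the representation discards who did what to whom, any pattern mined from it cannot distinguish the two events.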
Objective • Semantic relation associations • An intermediate representation that expresses the semantic relations between the concepts in texts.
Major processes • Semantic relation extraction • The extracted relations are encoded in RDF statements. • Semantic relation associations • Meaningful and detailed patterns can be discovered from text using the conceptual graph representation.
Resource Description Framework and RDF Schema • Resource Description Framework (RDF) • For describing and interchanging semantic metadata. • RDF statements • <subject, predicate, object> • {France, Defeat, Italy, World Cup, Quarter Final} • RDF Schema • Defines RDF vocabularies for constructing RDF statements.
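As a rough sketch of the <subject, predicate, object> form, a statement can be modeled as a plain named tuple. The class name `Statement` is an assumption for illustration; it is not an RDF library API, and only the first three terms of the slide's example are used.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement: <subject, predicate, object>."""
    subject: str
    predicate: str
    object: str


# Hypothetical statements; the second one reverses the roles.
statements = [
    Statement("France", "Defeat", "Italy"),
    Statement("Italy", "Defeat", "France"),
]

# Unlike a bag of keywords, the two statements stay distinct.
distinct = statements[0] != statements[1]

# Simple lookup: everything France is recorded as defeating.
defeated_by_france = [
    s.object for s in statements
    if s.subject == "France" and s.predicate == "Defeat"
]
print(distinct, defeated_by_france)
```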
Term Taxonomy Construction • Term similarity measure • Incremental term taxonomy construction
RDF model • RDF vocabulary • ={,P,H, domain, range}, where ={a, b, c, d, e, f, ab, cd, ef, cdef}, P = {p}, domain = {a, b, ab}, and range = {c, d, e, f, cd, ef, cdef} • Generalized relation hierarchy • e.g., {<a, p, ef>, <b, p, c>} is a relationset, and it is also a generalized relationset of {<a, p, e>, <b, p, c>}.
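The generalization example above can be checked with a small sketch. The parent links in `PARENT` are an assumption read off the term names (for example, that ef is the parent of e and f); the slide does not spell out the hierarchy. The set-level test used here (every specific relation is covered, and every general relation covers something) is likewise a simplification, not the paper's formal definition.

```python
# Assumed taxonomy: each term maps to its parent.
PARENT = {
    "a": "ab", "b": "ab",
    "c": "cd", "d": "cd",
    "e": "ef", "f": "ef",
    "cd": "cdef", "ef": "cdef",
}


def ancestors_or_self(term):
    """Return the term together with all of its ancestors."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def generalizes(general, specific):
    """True if `general` equals `specific` or replaces terms by ancestors."""
    gs, gp, go = general
    ss, sp, so = specific
    return (gp == sp
            and gs in ancestors_or_self(ss)
            and go in ancestors_or_self(so))


def is_generalized_relationset(general_set, specific_set):
    covers_all = all(any(generalizes(g, s) for g in general_set)
                     for s in specific_set)
    no_extra = all(any(generalizes(g, s) for s in specific_set)
                   for g in general_set)
    return covers_all and no_extra


specific = {("a", "p", "e"), ("b", "p", "c")}
general = {("a", "p", "ef"), ("b", "p", "c")}
print(is_generalized_relationset(general, specific))
print(is_generalized_relationset(specific, general))
```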
Overgeneralization • Example • {<a, p, e>, <b, p, c>} • {<Score, agent, F:Inzaghi>, <Assist, agent, RuiCosta>} • {<a, p, ef>, <b, p, c>} • {<Score, agent, AttackPlayer>, <Assist, agent, RuiCosta>} • Definition • A frequent relationset X is overgeneralized if there exists a specialized relationset Y of X with supp(X) = supp(Y).
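The definition can be tried out on toy data. Everything below is a sketch under stated assumptions: the documents are invented, `F:Shevchenko` is a hypothetical second attack player, support is an absolute document count, and the frequency threshold is ignored.

```python
# Assumed taxonomy: both players generalize to AttackPlayer.
PARENT = {"F:Inzaghi": "AttackPlayer", "F:Shevchenko": "AttackPlayer"}


def ancestors_or_self(term):
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return set(chain)


def generalizes(general, specific):
    gs, gp, go = general
    ss, sp, so = specific
    return (gp == sp
            and gs in ancestors_or_self(ss)
            and go in ancestors_or_self(so))


def support(relationset, documents):
    """Count documents in which every relation is matched, possibly by a specialization."""
    return sum(
        all(any(generalizes(r, d) for d in doc) for r in relationset)
        for doc in documents
    )


def is_overgeneralized(x, y, documents):
    """x is overgeneralized with respect to its specialization y if supports are equal."""
    return support(x, documents) == support(y, documents)


X = {("Score", "agent", "AttackPlayer"), ("Assist", "agent", "RuiCosta")}
Y = {("Score", "agent", "F:Inzaghi"), ("Assist", "agent", "RuiCosta")}

# Invented documents, each a set of extracted relations.
docs = [
    {("Score", "agent", "F:Inzaghi"), ("Assist", "agent", "RuiCosta")},
    {("Score", "agent", "F:Inzaghi"), ("Assist", "agent", "RuiCosta")},
    {("Score", "agent", "F:Inzaghi")},
]
print(is_overgeneralized(X, Y, docs))

# One more document with a different scorer makes X strictly more frequent.
docs_more = docs + [
    {("Score", "agent", "F:Shevchenko"), ("Assist", "agent", "RuiCosta")}
]
print(is_overgeneralized(X, Y, docs_more))
```

In the first data set X adds no information beyond Y, so reporting X would be redundant; in the second, X captures a pattern that Y misses.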
Overgeneralization Reduction • Each node is a unique generalization closure • If a closure and its children have the same support, this closure is not closed and can be pruned. • Such a nonclosed closure is pruned by replacing it with the union of its equal-support children.
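The replacement rule above can be written down as a literal sketch. It assumes closures are sets of relation labels and that each child arrives with a precomputed support; how children are generated and how support is counted are not given on the slide, so the function `prune_nonclosed` and the labels `r1`, `r2`, `r3` are hypothetical.

```python
def prune_nonclosed(closure, closure_support, children):
    """Replace a nonclosed closure by the union of its equal-support children.

    `children` is a list of (closure, support) pairs. If no child has the
    same support as the parent, the parent is kept unchanged.
    """
    equal = [items for items, supp in children if supp == closure_support]
    if not equal:
        return frozenset(closure)
    return frozenset().union(*equal)


parent = frozenset({"r1"})
children = [
    (frozenset({"r1", "r2"}), 5),  # same support as the parent
    (frozenset({"r1", "r3"}), 3),  # lower support, not merged
]

replaced = prune_nonclosed(parent, 5, children)
kept = prune_nonclosed(parent, 7, children)
print(sorted(replaced), sorted(kept))
```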
GP (Generalized Pattern)-Close Algorithm • GP-Close • Initializes the enumeration tree to contain only the root closure. • Closure-Enumeration • Starting from the root closure of the empty set, the closure enumeration process recursively traverses the closure enumeration tree to discover closed generalization closures.
Experiments • Data sets • The online database of the International Policy Institute for Counter-Terrorism (ICT), including suicide bombing (ICT-SB) and car bombing (ICT-CB) documents. • Analysis of Patterns • 71.8 percent (56 out of 78) of the patterns are commonsense patterns already known by people. • 12.8 percent (10 out of 78) of the patterns are identified as previously unknown and not useful. • 15.4 percent (12 out of 78) of the patterns are previously unknown and potentially useful.
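As a quick arithmetic check, the three reported percentages can be recomputed from the counts on the slide. The dictionary keys are shorthand labels chosen here, not terms from the paper.

```python
# Pattern counts as reported on the slide.
counts = {
    "commonsense": 56,
    "unknown_not_useful": 10,
    "unknown_potentially_useful": 12,
}
total = sum(counts.values())

# Percentage of each category, rounded to one decimal place.
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(total, percentages)
```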
Conclusions • Semantic relation extraction • Discovering knowledge from free-form textual Web content. • GP-Close algorithm • Based on mining closed generalization closures. • Substantially reduces pattern redundancy and improves mining performance.
Comments • Advantage • A novel idea for semantic relation association extraction. • GP-Close is applicable to reducing the pattern search space. • Drawback • The examples do not use consistent data across depictions. • The diagrams of child-closure pruning and sub-tree pruning are confusing to the reader. • Application • Data mining applications in semantic relation association.