Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base • Yi Zeng, Dongsheng Wang, et al. • 2013.11.18
Content • Introduction • Chinese Semantic Knowledge Base Entity Linking • Semantic Knowledge Base Construction • Linking Unambiguous Entities • Stepwise Entity Disambiguation Based on a Chinese Knowledge Base • Experimental Result and Analysis • Conclusion
Introduction • Motivation and requirements • Knowledge bases need to be updated in real time • Facts must be extracted automatically from massive raw texts • -> Identifying entities in advance is essential
Introduction • CASIA_EL (the proposed system) • Knowledge base construction • Entity linking by retrieving “relations” • Ambiguous entities • Similarity measurement • Stepwise disambiguation • Result • Linked 1232 entities from Sina to a Chinese knowledge base • Overall accuracy of 88.5%
Chinese Semantic Knowledge Base Entity Linking • Semantic Knowledge Base Construction • Linking Unambiguous Entities • Stepwise Entity Disambiguation Based on a Chinese Knowledge base
Semantic Knowledge Base Construction • Format • XML -> N3 • TDB triple store
Synset construction from multiple sources (1/2) • Provided • Part of the Baidu Encyclopedia knowledge base • Name, English name, and Chinese name from the infobox knowledge • Not provided: synsets • 1) Nicknames and redirect titles from: • Baidu Encyclopedia • Hudong Encyclopedia • Wikipedia Chinese pages • -> 476,086 pairs of synonyms were added
Synset construction from multiple sources (2/2) • Split Western people's names by “·” into smaller keywords • E.g., “Michael·Jordan” is split into “Michael” and “Jordan” • Both are added as possible labels for “Michael Jordan” • These synsets are represented through “rdfs:label” • This enables searching for an entity by keyword
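The name-splitting step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `name_labels` and `build_label_index`, the `kb:` identifiers, and the in-memory dictionary that stands in for label triples in the triple store are all assumptions made for illustration.

```python
from collections import defaultdict

SEPARATOR = "·"  # middle dot used in transliterated Western names


def name_labels(name):
    """Return the full name plus each part obtained by splitting on the separator."""
    parts = [p.strip() for p in name.split(SEPARATOR) if p.strip()]
    if len(parts) < 2:
        return [name]  # nothing to split, keep the single label
    return [name] + parts


def build_label_index(entities):
    """Map every label to the set of entity IDs it may refer to.

    `entities` is a hypothetical dict of {kb_id: [names]} standing in for
    the label triples stored in the knowledge base.
    """
    index = defaultdict(set)
    for kb_id, names in entities.items():
        for name in names:
            for label in name_labels(name):
                index[label].add(kb_id)
    return dict(index)


if __name__ == "__main__":
    index = build_label_index({"kb:1": ["Michael·Jordan"]})
    print(sorted(index))  # the full name and both parts are searchable labels
```

Indexing the parts as separate labels is what lets a keyword such as “Jordan” retrieve the entity, at the cost of producing more candidates to disambiguate later.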
Linking Unambiguous Entities • Preprocessing • Wrapping marks: <, >, 《, 》, “, ”, etc. • For example, “《霸王别姬》” • First process the mention as it is (syntax marks kept) • Then remove the marks and process again • Retrieving “rdfs:label” can result in • 1) One candidate • Link directly and output the KB_ID • 2) No candidate • Use Google's “did you mean?” function • Start again with the suggestion, or output null • 3) Several candidates • Discussed in detail in the following section
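The three lookup outcomes can be sketched as follows. This is a hedged illustration only: `label_index` is an assumed in-memory mapping from label to KB IDs, the returned status strings are invented for the example, and `suggest` is a hypothetical callable standing in for the “did you mean?” step (no real search API is called).

```python
WRAPPERS = "<>《》“”\"'"  # marks stripped during preprocessing (assumed set)


def strip_wrappers(mention):
    """Remove wrapping marks such as 《 》 from both ends of a mention."""
    return mention.strip().strip(WRAPPERS).strip()


def lookup(mention, label_index):
    """Try the mention as it is first, then again with wrapping marks removed."""
    for form in (mention, strip_wrappers(mention)):
        found = label_index.get(form)
        if found:
            return set(found)
    return set()


def link(mention, label_index, suggest=None):
    """Return (status, result) for the one / none / several candidate cases."""
    found = lookup(mention, label_index)
    if not found and suggest is not None:
        corrected = suggest(mention)  # hypothetical spelling-suggestion hook
        if corrected and corrected != mention:
            found = lookup(corrected, label_index)
    if not found:
        return ("null", None)
    if len(found) == 1:
        return ("linked", next(iter(found)))
    return ("ambiguous", sorted(found))  # handed to the disambiguation step


if __name__ == "__main__":
    demo_index = {"霸王别姬": {"kb:42"}, "Jordan": {"kb:1", "kb:2"}}
    print(link("《霸王别姬》", demo_index))
    print(link("Jordan", demo_index))
```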
Stepwise Entity Disambiguation Based on a Chinese Knowledge base
Stepwise Entity Disambiguation Based on a Chinese Knowledge base • Stepwise Bag-of-Words (S-BOW) • Add the documents of other entities • bag[dst] • Bag of [document of short text] • bag[dkb] • Bag of [document of knowledge base] • Algorithm • 1) sim(dst, dkb) = |bag[dst] ∩ bag[dkb]| • 2) bag'[dst] = bag[dst] ∪ bag[t1] ∪ bag[t2] ∪ … ∪ bag[tn] • 3) sim'(dst, dkb) = |bag'[dst] ∩ bag[dkb]| = |(bag[dst] ∪ bag[t1] ∪ bag[t2] ∪ … ∪ bag[tn]) ∩ bag[dkb]|
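A minimal Python sketch of the S-BOW idea, using sets as bags. Two points are assumptions rather than facts from the slides: the rule for when to fall back to the extended bag (here, when the first pass has no unique best candidate with a non-zero score), and the names `sim`, `pick_unique_best`, and `stepwise_disambiguate`.

```python
def sim(bag_st, bag_kb):
    """sim(dst, dkb) = |bag[dst] ∩ bag[dkb]|, with bags modelled as sets."""
    return len(set(bag_st) & set(bag_kb))


def pick_unique_best(scores):
    """Return the single top-scoring candidate, or None on a tie or all-zero scores."""
    top = max(scores.values(), default=0)
    if top == 0:
        return None
    winners = [kb_id for kb_id, score in scores.items() if score == top]
    return winners[0] if len(winners) == 1 else None


def stepwise_disambiguate(bag_st, candidate_bags, context_bags):
    """Step 1: compare the short text alone. Step 2: retry with the extended bag.

    candidate_bags: {kb_id: bag[dkb]} for each candidate entity.
    context_bags:   [bag[t1], ..., bag[tn]] for the other entities in the text.
    """
    scores = {kb_id: sim(bag_st, bag) for kb_id, bag in candidate_bags.items()}
    best = pick_unique_best(scores)
    if best is not None:
        return best
    # bag'[dst] = bag[dst] ∪ bag[t1] ∪ ... ∪ bag[tn]
    extended = set(bag_st).union(*context_bags)
    scores = {kb_id: sim(extended, bag) for kb_id, bag in candidate_bags.items()}
    return pick_unique_best(scores)


if __name__ == "__main__":
    candidates = {
        "kb:player": {"basketball", "bulls", "nba"},
        "kb:professor": {"machine", "learning", "berkeley"},
    }
    # The short text shares no words with either candidate, so step 2 is needed.
    print(stepwise_disambiguate({"jordan", "great"}, candidates,
                                [{"bulls", "chicago", "nba"}]))
```

Extending the bag only after the first pass fails keeps the unextended short text as the primary evidence and brings in context documents only when that evidence is insufficient.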
Experimental Result and Analysis • Why we select nouns and literal strings • Which words should be considered, and which documents should be added?
Experimental Result and Analysis • The number of candidate entities • The larger the number, the more correct the result
Experimental Result and Analysis • Due to the addition of many synonyms • Many of them have more than 10 (56 entities) • Performs very well • Example • Note: when the number is greater than 9, there are no incorrect disambiguations.
Experimental Result and Analysis • Disambiguation • 161 ambiguous entities were detected (161/1232) • 123 were disambiguated -> correctness is 82.1% (101/123 entities were correctly disambiguated) • Extending the original microblog posts (stepwise) • Another 38 entities were disambiguated • Correctness is 63.2% • The overall entity disambiguation correctness of the proposed algorithm is 77.6% (125/161 entities were correctly disambiguated).
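The reported percentages can be cross-checked with a few lines of arithmetic. The count of correct entities in the stepwise pass (24) is not stated on the slide; it is derived here as 125 - 101, and the variable names are illustrative.

```python
def pct(correct, total):
    """Percentage rounded to one decimal place, as reported on the slide."""
    return round(100 * correct / total, 1)


total_ambiguous = 161                        # ambiguous entities detected
first_correct, first_total = 101, 123        # first pass
overall_correct = 125                        # correct across both passes

second_total = total_ambiguous - first_total      # entities left for the stepwise pass
second_correct = overall_correct - first_correct  # derived, not stated on the slide

if __name__ == "__main__":
    print(pct(first_correct, first_total))       # first-pass correctness
    print(pct(second_correct, second_total))     # stepwise-pass correctness
    print(pct(overall_correct, total_ambiguous)) # overall correctness
```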
Conclusion • For short texts (based on a Chinese knowledge base) • Stepwise method • Adds the documents of other entities that appear within the same context • Compared with many algorithms • Addresses the problem of insufficient information • Relations are retrieved in various ways • The synset is enriched from multiple sources