EDIUM: Improving Entity Disambiguation via User Modelling. Romil Bansal, Sandeep Panem, Manish Gupta, Vasudeva Varma. International Institute of Information Technology, Hyderabad. 14th April 2014.
Introduction (Entity Disambiguation) Entity Disambiguation is the task of finding the correct entity referent in the knowledge base for the given mention.
Introduction (User modelling) • User modelling is the task of categorizing users’ activities so as to customize and adapt the system to each user’s needs. Example: tweets by the user @GameOfThrones (the official HBO Game of Thrones TV series handle).
Motivation • Short text from social media (e.g. Twitter, Facebook) is an important source of information. • Entities are important for detecting and tracking information shared about: • Various products. • Events and locations. • Reputations of companies and people. • Movies, sports, etc. • Named Entity Detection (NED) is difficult in micro-posts because they lack sufficient context. • Entities from a user’s previous tweets can be used to build interest models, which in turn can help disambiguate new entity mentions.
Related Work • Many models [ASMP12, NERT11, EDTL13] have been proposed to disambiguate entities in text, based on the following approaches. • Context-aware entity disambiguation: use the text around the entity for disambiguation. • Popularity-based entity disambiguation: use the likelihood of a candidate entity being the target of the given mention. • We disambiguate entities by combining contextual models with user models built by analyzing the user’s tweeting behavior.
Problem Entity Disambiguation User modelling
The EDIUM System • Self-learn the user’s interests. • Use an existing context-based method for disambiguation. • Add highly confident disambiguations (ratio test, confidence > 90%) from the user’s tweets to the user model. • Cluster the interests based on the semantic similarity between entities.
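The self-learning step on this slide can be sketched roughly as follows. The slide does not define the ratio test, so this sketch assumes that confidence is the top candidate's share of the total context-based score over all candidates for a mention. The names `confident_entity` and `update_user_model` and the plain `Counter` used as a user model are hypothetical, and interest clustering is left out.

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.9  # from the slide: confidence > 90%


def confident_entity(candidates):
    """Return the top candidate if it passes the assumed ratio test, else None.

    candidates: dict mapping candidate entity -> non-negative context-based score.
    Assumption: confidence = top score / sum of all candidate scores.
    """
    total = sum(candidates.values())
    if total <= 0:
        return None
    best, best_score = max(candidates.items(), key=lambda kv: kv[1])
    return best if best_score / total > CONFIDENCE_THRESHOLD else None


def update_user_model(user_model, tweet_mentions):
    """Add confidently disambiguated entities from one tweet to the user model.

    tweet_mentions: list with one candidate-score dict per mention in the tweet.
    """
    for candidates in tweet_mentions:
        entity = confident_entity(candidates)
        if entity is not None:
            user_model[entity] += 1
    return user_model


# Usage: only the first mention is confident enough to enter the model.
model = update_user_model(
    Counter(),
    [{"Game_of_Thrones": 0.95, "A_Game_of_Thrones": 0.05},
     {"Apple_Inc.": 0.5, "Apple": 0.5}],
)
```

Keeping low-confidence disambiguations out of the model is what lets the system learn interests without manual labels, at the cost of a slower start for users whose tweets are mostly ambiguous.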
The EDIUM System • Compute the user-based disambiguation score of each candidate entity from the semantic similarity between the entity and the interest topics in the user model. • Compute the context-based disambiguation score of each candidate entity using the context-based system. • Rank the candidates on both the context and user model scores. • Select the candidate entity with the maximum score as the final disambiguated entity for the given mention.
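As a minimal sketch of the ranking step, the snippet below combines the two scores per candidate and picks the maximum. The slide does not give the combination rule, so the linear interpolation with a weight `alpha` is an assumption made only for illustration, and `rank_candidates` and the example entity names are hypothetical.

```python
def rank_candidates(context_scores, user_scores, alpha):
    """Rank candidate entities by a combined score, best first.

    context_scores, user_scores: dicts mapping candidate entity -> score.
    alpha: weight given to the user model, between 0 and 1.
    Assumption: combined = alpha * user + (1 - alpha) * context.
    """
    combined = {
        entity: alpha * user_scores.get(entity, 0.0) + (1 - alpha) * score
        for entity, score in context_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Usage: the context alone slightly prefers the fruit,
# while the user's interests favour the company.
context = {"Apple_Inc.": 0.4, "Apple_(fruit)": 0.6}
user = {"Apple_Inc.": 0.9, "Apple_(fruit)": 0.1}
best_with_user_model = rank_candidates(context, user, alpha=0.5)[0]
best_context_only = rank_candidates(context, user, alpha=0.0)[0]
```

With `alpha=0.0` the ranking falls back to the context-based system alone, which is the behaviour one would want when the user model carries no useful signal.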
The EDIUM System • Re-calculate the score α based on the similarity of the topics of the user’s new tweet to the topics of the previous m tweets. This reduces the dependence on the user model for entity disambiguation when the user model is incomplete or the user’s tweets are too general. • α is computed from two quantities: the cosine similarity between the tweet categories vector obtained by the system and the tweet categories vector given by the user model; • and the cosine similarity between the tweet categories vector obtained by the system and the tweet categories vector given by the contextual model.
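The two similarities named on this slide are ordinary cosine similarities between category vectors. The sketch below shows only that computation, assuming each vector is stored as a sparse dict from category name to weight; how the two similarities are combined into α is not recoverable from the slide and is deliberately not modelled. The function name and the example categories are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors given as dicts.

    Returns 0.0 if either vector is empty or all zeros.
    """
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage: compare the system's categories for a new tweet with the
# categories suggested by the user model and by the contextual model.
system_categories = {"Television": 1.0, "Fantasy": 1.0}
user_model_categories = {"Television": 2.0, "Fantasy": 2.0}
contextual_categories = {"Sports": 1.0}

sim_user = cosine_similarity(system_categories, user_model_categories)
sim_context = cosine_similarity(system_categories, contextual_categories)
```

Because cosine similarity ignores vector length, a user model built from many tweets and one built from few are compared on the direction of their interests rather than on volume.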
Dataset • We evaluated the performance of EDIUM on a dataset annotated manually by three individuals. • The dataset consists of 200 tweets from each of 20 randomly selected Twitter users.
Results • Entity Disambiguation. Fig. 1: Performance with Wikipedia Miner. Fig. 2: Performance with DBpedia Spotlight.
Observations • The system works better with Wikipedia Miner [WIKM13] than with DBpedia Spotlight [DSSL11]. • The system depends on the underlying contextual modelling system to learn the user’s interests initially. • More precise contextual systems lead to greater improvement in the results.
Conclusion • In this paper, we have modeled entity disambiguation based on the user’s past interest information. • We proposed a way to model the user’s interests using entity linking techniques and then to use that model to improve disambiguation in entity linking systems. • The gain in precision is proportional to the accuracy of the underlying entity linking system.
Future Work • Future work requires more analysis of the user modelling aspect of the system. • Along with the user’s previous tweets, the user’s network and demographic information could also be considered to further improve entity disambiguation.
Thank you! • Questions?
References • [RESE13] Murnane, E. L., Haslhofer, B., Lagoze, C.: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text. In: Proc. of the 22nd Intl. Conf. on World Wide Web (WWW), Republic and Canton of Geneva, Switzerland (2013) • [ASMP12] Meij, E., Weerkamp, W., de Rijke, M.: Adding Semantics to Microblog Posts. In: WSDM 2012, ACM (2012) • [ELFT13] Liu, X., Li, Y., Wu, H., Zhou, M., Wei, F., Lu, Y.: Entity Linking for Tweets. In: Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (2013) • [NERT11] Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Proc. of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2011) • [DUTI10] Michelson, M., Macskassy, S. A.: Discovering Users’ Topics of Interest on Twitter: A First Look. In: Proc. of the 4th Workshop on Analytics for Noisy Unstructured Text Data, ACM (2010) 73–80 • [ABIR10] Wu, Q., Burges, C. J., Svore, K. M., Gao, J.: Adapting Boosting for Information Retrieval Measures. Journal of Information Retrieval 13(3) (Jun 2010) 254–270
• [DSSL11] Mendes, P. N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: Shedding Light on the Web of Documents. In: Proc. of the 7th Intl. Conf. on Semantic Systems, New York, NY, USA, ACM (2011) • [WIKM13] Milne, D., Witten, I. H.: An Open-source Toolkit for Mining Wikipedia. Artificial Intelligence 194 (2013) 222–239 • [EDTL13] Yerva, S. R., Catasta, M., Demartini, G., Aberer, K.: Entity Disambiguation in Tweets Leveraging User Social Profiles. In: Proc. of the 2013 Intl. Conf. on Information Reuse and Integration (IRI), IEEE (2013) 120–128