Discovering Geographical Topics In The Twitter Stream
Presented by Team 9: Liu, Zhi and Karthik Kumar Rangineni
Content
• Introduction
• Related Work
• Model
• Experiment
• Conclusion
University of North Texas, Computer Science & Engineering
Introduction
• Micro-blogging services are now in widespread use.
• Twitter has been used extensively in a number of events and emergencies, ranging from elections, earthquakes and tsunamis to other specific events.
• Recently, Twitter and other online social networking services such as Foursquare, Facebook and Yelp have started supporting location services in their messages.
Introduction
• Location support is provided in two ways:
  1. Explicit method
  2. Implicit method
• These functionalities allow researchers to address an exciting set of questions:
  1) How is information created and shared across geographical locations?
  2) How do spatial and linguistic characteristics of people vary across regions?
  3) How can human mobility be modeled?
Introduction
• Drawbacks of previous methods: they are either complicated or oversimplified.
• Discovering topics and identifying users' interests from these geo-tagged messages is a challenging task, due to the sheer amount of data and the diversity of language variations used on these location sharing services.
• The paper uses Twitter data.
• It presents an algorithm that models diversity in tweets by considering:
  • topical diversity,
  • geographical diversity, and
  • the interest distribution of the user.
Introduction
• The authors customized the model to be sufficiently sparse to allow for a large scale in terms of users and locations.
• The analysis of the data still poses a considerable challenge due to its size and the integration of a range of different attributes.
• To our knowledge, this is the first paper to address scale, location and language modeling in an integrated fashion.
• Furthermore, the authors designed an accurate and scalable inference algorithm.
• The algorithm allows them to discover language patterns and to extract users' interests from geo-tagged messages.
Introduction
• The algorithm discovers language patterns and extracts users' interests from geo-tagged messages.
• In addition, it captures the factors that influence the language used in a tweet from a particular location.
• Example: a tweet in a particular region
Introduction
• The choice of words is clearly influenced by the topic of the tweet.
• Location-specific language causes the same event to be reported quite differently in different locations.
• Different geographical regions have different language variations, and topics have different chances of being discussed in these regions.
Prior Work
• Prior work falls into two groups.
• The first group models only certain aspects of the problem described and ignores the remainder.
• No regional language models are learned, and user preferences are not taken into account.
• Thus, models developed for such data are usually limited and cannot easily be applied to content-rich social media.
• At the other end are rather complex models that lack the ability to scale to industrial size.
• For instance, one proposed model predicts the locations of users in Twitter. It has a global topic matrix, and each region has a different variation of this matrix. However, the inference algorithm is complex.
• Furthermore, the problem of over-parametrization makes it nontrivial to perform inference accurately.
• Previous models also ignore user preferences.
Author Contribution
• The authors propose a model that:
  • models user preferences,
  • is flexible enough to embed all reasonable components of content and geographical locations, and
  • handles real-world datasets consisting of millions of documents and users.
• It uses statistical topic models and sparse coding techniques to uncover different language patterns and common interests shared across the world.
Related Work
• Related research falls into two lines:
  1. A range of papers that use geographical language modeling in general.
  2. A set of works that are specifically tuned for Twitter data.
• The authors are interested in models that combine geographical modeling and language modeling to discover topics from geographical regions.
Some related works by other authors:
• Mei proposed a model based on Probabilistic Latent Semantic Analysis (PLSA). It assumes that each word is drawn either from a universal background topic or from a location- and time-dependent language model.
• Later, Wang introduced a fully Bayesian generative model to incorporate locations. This model:
  • does not use real latitudes and longitudes,
  • has a fixed number of regional labels, and
  • assumes that each term is associated with a location label.
• Sizov proposed a model similar to Wang's. Rather than using a multinomial distribution to generate locations, it uses two Gaussian distributions to generate latitude and longitude respectively.
  • Drawback: it uses Flickr data restricted to the greater London area.
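The background-versus-location word mixture described for Mei's model can be illustrated with a small sketch. This is not the model from that work: the function name `mixture_word_prob`, the mixing weight `lambda_bg` and the toy word distributions are all assumptions made for illustration.

```python
def mixture_word_prob(word, background, local, lambda_bg=0.5):
    """Probability of a word when it is drawn from the background model
    with probability lambda_bg and from the location model otherwise."""
    p_background = background.get(word, 0.0)
    p_local = local.get(word, 0.0)
    return lambda_bg * p_background + (1.0 - lambda_bg) * p_local


# Toy word distributions (hypothetical values, each sums to 1).
background = {"the": 0.6, "game": 0.3, "surf": 0.1}
coastal = {"the": 0.2, "game": 0.1, "surf": 0.7}

# "surf" is rare in the background but common in the coastal model,
# so the mixture gives it an intermediate probability.
p_surf = mixture_word_prob("surf", background, coastal, lambda_bg=0.5)
```

A larger `lambda_bg` pulls every word toward its background probability, so location-specific vocabulary only stands out when the location model receives enough weight.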
• Hao proposed a model built upon Wang's. It introduces the notion of global topics and local topics: more general terms are grouped into global topics, and terms related to local events go to local topics. Inference is performed by Gibbs sampling. Hao evaluated the model based on anecdotal results and some heuristic measurements.
• Although such attempts at modeling language patterns and geographical locations exist, most prior work does not consider users at all.
Model
• Notation
• Mixture of components
• Three parts of each tweet
Model
• Intuitions:
  • Words used in a tweet depend on both the location and the topic of the tweet.
  • Different geographical regions have different language variations.
  • Topics have different chances of being discussed in different regions (e.g. bullfights are unlikely to occur in India; likewise, Spaniards are unlikely to discuss Divali).
  • Users tend to appear in a handful of geographical locations.
Model
• Draw a latent region index
• Draw a topic index
• Draw a location
• For each token w in w_d draw
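The four draws above can be sketched as a small sampler. The distributions on these slides were not preserved, so everything concrete here is an assumption: softmax over additive scores, an isotropic Gaussian per region, and the names `generate_tweet`, `params` and its keys are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn unnormalized log-scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def draw_index(probs, rng):
    """Draw one index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_tweet(user, params, n_tokens, rng):
    # 1. Latent region index, from this user's region scores (assumed form).
    r = draw_index(softmax(params["user_region"][user]), rng)
    # 2. Topic index, from the topic scores of the drawn region.
    z = draw_index(softmax(params["region_topic"][r]), rng)
    # 3. Location: independent Gaussian noise around the region centre.
    (lat, lon), std = params["region_centre"][r], params["region_std"][r]
    location = (rng.gauss(lat, std), rng.gauss(lon, std))
    # 4. Tokens: word scores add a background, a region and a topic part.
    scores = [b + g + t for b, g, t in zip(params["background"],
                                           params["region_words"][r],
                                           params["topic_words"][z])]
    probs = softmax(scores)
    words = [params["vocab"][draw_index(probs, rng)] for _ in range(n_tokens)]
    return r, z, location, words


# Toy parameters (hypothetical): two regions, two topics, four words.
params = {
    "vocab": ["the", "game", "surf", "snow"],
    "user_region": {"alice": [2.0, 0.0]},
    "region_topic": [[1.0, 0.0], [0.0, 1.0]],
    "region_centre": [(34.0, -118.0), (47.0, -122.0)],
    "region_std": [0.5, 0.5],
    "background": [2.0, 1.0, 0.0, 0.0],
    "region_words": [[0.0, 0.0, 1.5, 0.0], [0.0, 0.0, 0.0, 1.5]],
    "topic_words": [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
}
tweet = generate_tweet("alice", params, n_tokens=5, rng=random.Random(0))
```

The order of the draws mirrors the intuitions stated earlier: the user influences the region, the region influences the topic, and both region and topic influence the words.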
Model
• A graphical representation of the model
Model
• Sparse Modeling
  • Location-independent distribution + what is prevalent in a given location
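One way to read "location-independent distribution + what is prevalent in a given location" is as an addition of log-scale scores, where the location part is zero for most words. The sketch below assumes that reading; the softmax normalization, the function `region_word_dist` and all numbers are illustrative assumptions, not values from the paper.

```python
import math


def softmax(scores):
    """Turn unnormalized log-scores into probabilities."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def region_word_dist(background, deviation):
    """Add a sparse per-region deviation to shared background scores."""
    return softmax([b + d for b, d in zip(background, deviation)])


vocab = ["the", "game", "surf", "snow"]
background = [2.0, 1.0, 0.0, 0.0]         # shared by all locations
coastal_deviation = [0.0, 0.0, 1.5, 0.0]  # sparse: only "surf" differs

global_dist = softmax(background)
coastal_dist = region_word_dist(background, coastal_deviation)
```

Because the deviation vector is mostly zeros, each region only stores the few words that set it apart, which is what keeps the parameter count manageable as the number of locations grows.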
Model
• Inference Algorithm: EM steps
  • E-step: Iteratively draw latent region assignments and topic assignments for all tweets.
  • M-step: Maximize the log-likelihood of the model with respect to the model parameters, keeping fixed all region and topic assignments obtained in the E-step.
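The alternation between drawing assignments and maximizing over parameters can be sketched on the location part alone. This is a simplified illustration, not the paper's algorithm: topics and words are left out, regions are assumed to be isotropic 2-D Gaussians with a fixed standard deviation, and the names `e_step`, `m_step` and `fit_regions` are hypothetical.

```python
import math
import random


def log_gauss(point, mean, std):
    """Log-density of an isotropic 2-D Gaussian."""
    d2 = (point[0] - mean[0]) ** 2 + (point[1] - mean[1]) ** 2
    return -d2 / (2 * std * std) - math.log(2 * math.pi * std * std)


def e_step(points, means, std, rng):
    """Draw a region assignment for every point from its posterior."""
    assignments = []
    for p in points:
        logs = [log_gauss(p, m, std) for m in means]
        top = max(logs)
        weights = [math.exp(v - top) for v in logs]
        assignments.append(rng.choices(range(len(means)), weights=weights, k=1)[0])
    return assignments


def m_step(points, assignments, means):
    """With assignments fixed, the maximum-likelihood mean of each
    region is the average of the points assigned to it."""
    new_means = []
    for r, old in enumerate(means):
        members = [p for p, a in zip(points, assignments) if a == r]
        if not members:
            new_means.append(old)  # keep the old mean for an empty region
            continue
        n = len(members)
        new_means.append((sum(p[0] for p in members) / n,
                          sum(p[1] for p in members) / n))
    return new_means


def fit_regions(points, means, std=1.0, n_iter=10, seed=0):
    rng = random.Random(seed)
    assignments = []
    for _ in range(n_iter):
        assignments = e_step(points, means, std, rng)
        means = m_step(points, assignments, means)
    return means, assignments


points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
          (10.0, 10.1), (9.9, 10.0), (10.2, 9.8)]
means, assignments = fit_regions(points, [(1.0, 1.0), (9.0, 9.0)])
```

In the full model the E-step would also draw a topic per tweet and the M-step would update the word and topic parameters; only the region means are updated here to keep the sketch short.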
Model
• Inference Algorithm
  • For each tweet, a latent region r is first drawn, conditioned on the old topic assignments.
  • The topic assignment z for the same tweet is then sampled, conditioned on the newly sampled r.
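The two-stage draw (region first, given the old topic; then topic, given the new region) can be sketched as follows. The sampling distributions themselves were not preserved on these slides, so the score terms used here are assumptions chosen only to show the conditioning order; `resample_tweet` and its arguments are hypothetical names.

```python
import math
import random


def sample_from_log_scores(log_scores, rng):
    """Draw an index with probability proportional to exp(log_score)."""
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def resample_tweet(old_z, log_p_region, log_p_topic_given_region,
                   log_lik_region, log_lik_topic, rng):
    """Resample (r, z) for one tweet.

    log_p_region[r]                 : log prior of region r (assumed)
    log_p_topic_given_region[r][z]  : log p(z | r) (assumed)
    log_lik_region[r]               : log-likelihood of the tweet's
                                      location under region r (assumed)
    log_lik_topic[z]                : log-likelihood of the tweet's
                                      words under topic z (assumed)
    """
    # Stage 1: region, conditioned on the old topic assignment.
    region_scores = [
        log_p_region[r] + log_p_topic_given_region[r][old_z] + log_lik_region[r]
        for r in range(len(log_p_region))
    ]
    new_r = sample_from_log_scores(region_scores, rng)
    # Stage 2: topic, conditioned on the newly sampled region.
    topic_scores = [
        log_p_topic_given_region[new_r][z] + log_lik_topic[z]
        for z in range(len(log_lik_topic))
    ]
    new_z = sample_from_log_scores(topic_scores, rng)
    return new_r, new_z


rng = random.Random(0)
new_r, new_z = resample_tweet(
    old_z=0,
    log_p_region=[math.log(0.7), math.log(0.3)],
    log_p_topic_given_region=[[math.log(0.8), math.log(0.2)],
                              [math.log(0.1), math.log(0.9)]],
    log_lik_region=[-1.0, -4.0],
    log_lik_topic=[-2.0, -2.5],
    rng=rng,
)
```

Working in log-space and subtracting the maximum before exponentiating avoids underflow when the likelihood terms are very negative, which is typical for tweets with many tokens.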
Model
• Inference Algorithm
  • How to obtain the parameter values
Model
• Geographical Location Modeling
  • One region
  • Multiple regions
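The details of the one-region and multiple-region cases were not preserved on this slide. As a rough illustration of why several regions can describe locations better than one, the sketch below compares the log-likelihood of a few points under a single Gaussian and under a two-component mixture. Isotropic Gaussians with a shared standard deviation, the function names and all numbers are assumptions.

```python
import math


def log_gauss_2d(point, mean, std):
    """Log-density of an isotropic 2-D Gaussian."""
    d2 = (point[0] - mean[0]) ** 2 + (point[1] - mean[1]) ** 2
    return -d2 / (2 * std * std) - math.log(2 * math.pi * std * std)


def log_lik_mixture(points, means, weights, std):
    """Total log-likelihood of the points under a Gaussian mixture.
    A single region is the special case of one component."""
    total = 0.0
    for p in points:
        terms = [math.log(w) + log_gauss_2d(p, m, std)
                 for w, m in zip(weights, means)]
        top = max(terms)
        # log-sum-exp over the components, computed stably
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


# Toy locations forming two well-separated groups (hypothetical data).
points = [(0.0, 0.0), (0.5, -0.5), (10.0, 10.0), (9.5, 10.5)]

one_region = log_lik_mixture(points, [(5.0, 5.0)], [1.0], std=1.0)
two_regions = log_lik_mixture(points, [(0.0, 0.0), (10.0, 10.0)],
                              [0.5, 0.5], std=1.0)
```

With a single region the centre sits between the two groups and every point is far from it, so the two-region fit scores a much higher log-likelihood on this toy data.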
Model
• Geographical Location Modeling
  • Bayesian treatment
Model
• Implementation Notes
Experiments
• Location Prediction
  • Baseline
  • Topics
  • Topics + Region
  • Full Model
Experiments
• Evaluation Metric
Experiments
• Location Prediction
Conclusions
• The paper addresses the problem of modeling geographical topical patterns on Twitter by introducing a novel sparse generative model. The model utilizes both statistical topic models and sparse coding techniques to provide a principled method for uncovering different language patterns and common interests shared across the world.
• This approach is vital for applications such as behavior targeting, user profiling, content recommendation and topic tracking, and the method can easily be extended in a number of ways.
Contributions
• An additive generative model of content and locations that incorporates multiple facets of micro-blogging environments in an integral fashion.
• Sparse coding techniques and Bayesian treatments are smoothly embedded in the model, resulting in an efficient and effective implementation.
• The model outperforms several state-of-the-art algorithms in the task of location prediction, and it demonstrates interesting patterns in real-world datasets.
References
• A. Ahmed, E. P. Xing, W. W. Cohen, and R. F. Murphy. Structured correspondence topic models for mining captioned figures in biological literature. In Proceedings of KDD 2009, pages 39–48, New York, NY, USA. ACM.
• A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2:183–202, March 2009.
• C. Chemudugunta, P. Smyth, and M. Steyvers. Modeling general and specific aspects of documents with a probabilistic topic model. In NIPS 2006, pages 241–248.
• Z. Cheng, J. Caverlee, K. Lee, and D. Sui. Exploring millions of footprints in location sharing services. In ICWSM 2011.
• E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of KDD 2011, pages 1082–1090, New York, NY, USA. ACM.
• J. Eisenstein, A. Ahmed, and E. Xing. Sparse additive generative models of text. In Proceedings of ICML 2011, pages 1041–1048, New York, NY, USA, June. ACM.
• J. Eisenstein, B. O’Connor, N. A. Smith, and E. P. Xing. A latent variable model for geographic lexical variation. In Proceedings of EMNLP 2010, pages 1277–1287, Stroudsburg, PA, USA. Association for Computational Linguistics.
Thanks!