Automatic Detection of Tags for Political Blogs

Khairun-nisa Hassanali (nisa@hlt.utdallas.edu) and Vasileios Hatzivassiloglou (vh@hlt.utdallas.edu)
The University of Texas at Dallas

1. Summary
• Tags for political blogs are detected automatically.
• Tags are representative of topics.
• Significant topics are identified automatically using SVMs and other NLP techniques.

2. Political Blogs
• More than 22.6 million Americans maintain web sites with regularly updated commentary (blogs), of which at least 38,500 are specifically dedicated to politics.

3. The Larger Problem
• Given multiple texts from two or more blogs/political sources, answer the following questions:
  • On which subjects do the texts, taken as a whole across each source, agree or disagree?
  • How similar are the sources' positions?
  • What makes them agree or disagree?

4. Why are Tags Needed?
• It is difficult to associate an attitude with a specific topic/subject.
• Many clues are implicit and appear to require deep semantic analysis.
• Tags can serve as a basis for bringing together posts about the same topic.

5. Tag Detection using Support Vector Machines
• Many blogs tag their posts.
• Tags are representative of the topics discussed.
• Training data was collected from "Daily Kos" and "Red State":
  • 100,000 posts from Daily Kos (2003-2010)
  • 70,000 posts from Red State (2007-2010)
  • A total of 787,780 tags
• Used Joachims' SVM Light.

Fig. 1: Tag Detection using Support Vector Machines

Training of SVM classifiers
Input: Blog URLs
Collect data from several blogs that tag data → Preprocess data (parse HTML and rectify errors) → Divide data into posts and index them by their tags → Train the SVMs on the training data
Output: One classifier for each tag

Detection of Tags
Input: Blog URL
Collect data from the blog → Preprocess data (parse HTML and rectify errors) → Divide data into posts → Run all the classifiers on each post
Output: Top five tags associated with each post

6. Tag Detection using Grammatical Techniques
• Use the same SVM-based approach with new features based on grammatical knowledge.
• Proper nouns are frequently topics.
• Place a higher weight on proper and common nouns.
• Identify entities referred to by different names.
• Barack Obama, Obama and Barack Hussein Obama refer to the same person.

Fig. 2: Tag Detection using Grammatical Techniques

Extraction of Tag Nouns
Input: Blog URL
Fetch data from blog → Preprocess data and segment into posts → Perform shallow parsing → Extract noun phrases
Output: Top scoring nouns

Extraction of Tag Entities using Named Entity Recognition and Co-reference Resolution
Input: Blog URL
Fetch data from blog → Preprocess data and segment into posts → Perform co-reference resolution → Extract entities
Output: Top scoring entities

7. Experimental Results
• 2681 posts from Daily Kos and 571 posts from Red State.
• Compared the detected tags to the original tags of each blog post.
• Manually evaluated the relevance of tags on a small portion of the test set.

Fig. 3: Results on Daily Kos
Fig. 4: Results on Red State

8. Conclusion
• A tool for automatic tagging of political blog posts was introduced.
• Political blogs differ from other blogs in that they often revolve around named entities (politicians, organizations and places). Tagging of political blog posts therefore benefits from basic named entity recognition.
• Tag identification using a hybrid approach (statistical and grammatical) yields better results.
• Sood et al. report a precision/recall of 13.11%/22.83%, whereas Wang and Davidson report a precision/recall of 45.25%/23.24%. Our recall is higher, perhaps because of the domain.

9. Future Work
• Compiling a profile for each political entity: what it talks about and what its position is.
• Organizing groups of sources according to perspective.
• A political profile is a summary of a political entity's (politician, political group) stance on different issues.
• Extract the top scoring topics along with the "entities' sentiments" (attitudes towards the topic) and select representative sentences that voice sentiments towards these topics.
• Aggregate information across texts according to specific criteria (poster, source, time), quantitatively compare signatures, and identify which topics are responsible for the differences.
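
The SVM tagging pipeline of Fig. 1 (train one classifier for each tag, run all the classifiers on each post, keep the top-scoring tags) can be illustrated with a small sketch. This is not the authors' implementation: the poster used SVM Light, whereas the sketch below swaps in a toy linear SVM trained by Pegasos-style stochastic subgradient descent so that it runs with the Python standard library alone. The bag-of-words features, the function names (`featurize`, `train_linear_svm`, `train_tag_classifiers`, `detect_tags`), the hyperparameters and the example posts are all assumptions made up for illustration.

```python
import random
import re
from collections import Counter

# Made-up toy posts for illustration only, NOT data from the poster.
# Each entry is (post text, set of tags assigned by the blogger).
TRAINING_POSTS = [
    ("Senate debates the reform bill", {"senate", "healthcare"}),
    ("Insurance premiums climb while hospital costs rise", {"healthcare"}),
    ("Obama rally draws huge crowds before election day", {"obama", "election"}),
    ("Polls show tight race for governor", {"election"}),
    ("Senate votes on the energy bill", {"senate", "energy"}),
    ("Oil drilling and gas prices", {"energy"}),
    ("Unemployment figures and jobs report", {"economy"}),
]


def featurize(text):
    """L2-normalised bag-of-words counts (an assumed feature set)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = sum(c * c for c in counts.values()) ** 0.5 or 1.0
    return {word: c / norm for word, c in counts.items()}


def score(weights, features):
    """Linear decision value w . x for sparse dict vectors."""
    return sum(weights.get(w, 0.0) * v for w, v in features.items())


def train_linear_svm(examples, lam=0.01, epochs=50, seed=0):
    """Toy linear SVM (hinge loss + L2 penalty) via Pegasos-style updates.

    examples: list of (feature_dict, label) with label in {+1, -1}.
    Stand-in for SVM Light; no bias term, no kernel.
    """
    rng = random.Random(seed)
    weights = {}
    step = 0
    for _ in range(epochs):
        order = list(examples)
        rng.shuffle(order)
        for features, label in order:
            step += 1
            eta = 1.0 / (lam * step)            # decaying learning rate
            margin = label * score(weights, features)
            shrink = 1.0 - 1.0 / step           # effect of the L2 penalty
            for word in weights:
                weights[word] *= shrink
            if margin < 1.0:                    # hinge loss is active
                for word, value in features.items():
                    weights[word] = weights.get(word, 0.0) + eta * label * value
    return weights


def train_tag_classifiers(posts):
    """Train one binary classifier per tag (posts with the tag vs. the rest)."""
    vectors = [(featurize(text), tags) for text, tags in posts]
    all_tags = sorted(set().union(*(tags for _, tags in posts)))
    return {
        tag: train_linear_svm([(x, 1 if tag in tags else -1) for x, tags in vectors])
        for tag in all_tags
    }


def detect_tags(text, classifiers, k=5):
    """Run every tag classifier on the post and return the k best-scoring tags."""
    x = featurize(text)
    ranked = sorted(classifiers, key=lambda t: score(classifiers[t], x), reverse=True)
    return ranked[:k]


classifiers = train_tag_classifiers(TRAINING_POSTS)
top_five = detect_tags("Hospital insurance costs keep climbing", classifiers)
```

With `k=5`, `detect_tags` mirrors the "top five tags associated with each post" output of the detection stage. The grammatical variant described in Section 6 could reuse the same structure by changing only `featurize`, for example by giving nouns a larger feature value, although the poster does not specify how that weighting was done.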