1 / 57

Grammatical structures for word-level sentiment detection

Grammatical structures for word-level sentiment detection. Asad B. Sayeed MMCI Cluster of Excellence Saarland University Jordan Boyd-Graber, Bryan Rusk, Amy Weinberg { iSchool / UMIACS, Dept. of CS, CASL} University of Maryland College Park NAACL2011. OUTLINE. Abstract Introduction

harvey
Download Presentation

Grammatical structures for word-level sentiment detection

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grammatical structures for word-level sentiment detection Asad B. SayeedMMCI Cluster of ExcellenceSaarland UniversityJordan Boyd-Graber, Bryan Rusk, Amy Weinberg{iSchool / UMIACS, Dept. of CS, CASL}University of Maryland College ParkNAACL2011

  2. OUTLINE • Abstract • Introduction • Syntactic relatedness tries • Encoding SRTs as a factor graph • Data source • Experiments and discussion • Conclusions and future work

  3. Abstract • Existing work in fine-grained sentiment analysisfocuses on sentences and phrases but ignoresthe contribution of individual words andtheir grammatical connections.

  4. Abstract • This is becauseof a lack of both • (1) annotated data at the wordlevel • (2) algorithms that can leverage syntacticinformation in a principled way

  5. Introduction • Effective sentiment analysis for textsdepends on the ability to extract who(source) is saying what (target).

  6. Introduction • Lloyd Hession, chief security officer at BTRadianz in New York, said that virtualizationalso opens up a slew of potential networkaccess control issues.

  7. Introduction • Source: • Lloyd Hession, BTRadianz, and New York. • Traget: • include all the sources • but also “virtualization”, “network access control”,“network”, and so on • Opinion

  8. Introduction • Searching for all triples{source, target, opinion} in this sentence • We call opinion mining “fine-grained” when it retrieves many different {source, target, opinion} triples per document.

  9. Introduction • Lloyd Hession, chief security officer at BTRadianz in New York, said that virtualizationalso opens up a slew of potential networkaccess control issues. • “LloydHession” is the source of an opinion, “slew of networkissues,” about a target, “virtualization”.

  10. Introduction • We use a sentence’s syntactic structure to build a probabilistic model that encodes whether a word is opinion bearing as a latent variable.

  11. Syntactic relatedness tries • Use the Stanford Parser to produce a dependency graph and considerthe resulting undirected graph structure overwords, and then construct a trie for each possible word in sentence.

  12. Encoding Dependencies in an SRT • SRTs enable us to encode the connections between a possible target word(a object of interest) and a set of related objects. • SRTs are data structures consisting of nodes and edges.

  13. Encoding Dependencies in an SRT • The object of interest is the opinion target, defined as the SRT root node. • Each SRT edge corresponds to a grammatical relationship between words and is labeled with that relationship.

  14. Encoding Dependencies in an SRT • We use the notation to signify that node a has the relationship (“role”) R with b.

  15. Using sentiment flow to label an SRT • goal: • to discriminate between parts of the structure that are relevant to target-opinion word relations and those that are not

  16. Using sentiment flow to label an SRT • We use the term sentiment flow for relevant sentiment-bearing words in the SRT and inert for the remainder of the sentence.

  17. Using sentiment flow to label an SRT • MPQA sentence: • The dominant role of the European climate protectionpolicy has benefits for our economy.

  18. Using sentiment flow to label an SRT • Suppose that an annotator decides that “protection” and “benefits” are directly expressing an opinion about the policy, but “dominant” is ambiguous (it has some negative connotations) • protection, benefits:flow • dominant: inert

  19. Invariant • Invariant: • no node descending from anode labeled inert can be labeled as a partof a sentiment flow.

  20. Invariant • flow label switched to inert will requireall the descendents of that particular node to switchto inert.

  21. Invariant • Inertlabel switched to flow will require all of the ancestorsof that node to switch to flow .

  22. Encoding SRTs as a factor graph • In this section, we develop supervised machine learning tools to produce a labeled SRT from unlabeled, held-out data in a single, unified model.

  23. Sampling labels • A factor graph is a representation of a joint probability distribution in the form of a graph with two types of vertices • variable vertices: Z = {z1 . . . zn } • factor vertices: F = {f1 . . . fm} • Y = {Y1,Y2…Ym} -> subset of Z

  24. Sampling labels • We can then write the relationship as follows:

  25. Sampling labels • Our goal is to discover the values for the variablesthat best explain a dataset. • More specifically,we seek a posterior distribution over latent variablesthat partition words in a sentence into flow and inertgroups.

  26. Sampling labels • g represents a function over features of the given node itself, or “node features.” • f represents a function over a bigram of features taken from the parent node and the given node, or “parent-node” features. • h represents a function over a combination features on the node and features of all its children, or “node-child” features.

  27. Sampling labels • In addition to the latent value associated with each word, we associate each node with features derived from the dependency parse • the word from the sentence itself • the part-of-speech (POS) tag assigned by the Stanford parser • the label of the incoming dependency edge

  28. Sampling labels • scoring function with contributions of each node to the score(label | node):

  29. Data source • Sentiment corpora with sub-sentential annotations such as • Multi-Perspective Question-Answering (MPQA) corpus • J. D. Power and Associates (JDPA) blog post corpus • But most of these annotations are phrase level

  30. Data source • We developed our own annotations to discover such distinctions.

  31. Information technology business press • Focus on a collection of articles from the IT professional magazine, Information Week, from the years 1991 to 2008.

  32. Information technology business press • This consists of 33K articles including news bulletins and opinion columns.

  33. Information technology business press • IT concept target list (59 terms) comes from our application.

  34. Crowdsourced annotation process • There are 75K sentences with IT concept mentions, only a minority of which express relevant opinions(about 219).

  35. Crowdsourced annotation process • We engineered tasks so that only a randomly-selected five or six words(excluded all function words) appear highlighted for classification(called “highlight group”) in order to limit annotator boredom • Three or more users annotated each highlight group

  36. Crowdsourced annotation process • Lloyd Hession, chief security officer at BT Radianz in New York, said that virtualization also opens up a slew of potential networkaccess control issues.

  37. Crowdsourced annotation process • highlighted word class: • “positive”, “negative” • -> “opinion-relevant” • “not opinion-relevant”, “ambiguous” • -> “not opinion-relevant”

  38. Crowdsourced annotation process • Annotators labeled 700 highlight groups • Training groups: 465 SRTs • Testing groups: 196 SRTs

  39. Experiments and discussion • We run every experiment (training a model and testing on held-out data) 10 times and take the mean average and range of all measures. • F-measure is calculated for each run and averaged post hoc.

  40. Experiments • baseline system is the initial setting of the labels for the sampler: • uniform random assignment of flow labels, respecting the invariant.

  41. Experiments • This leads to a large class imbalance in favor of inert. • switch to inert converts all nodes downstream from the root to convert to inert • switch to inert converts all nodes downstream from the root to convert to inert

  42. Discussion • We present a sampling of possible feature-factor combinations in table 1 in order to show trends in the performance of the system.

  43. Discussion • Though these invariant-violating models are unconstrained in the way they label the graph, our invariant-respecting models still outperform them and our SRT invariant allows us to achieve better performance and will be more useful to downstream tasks.

More Related