Predicting the Semantic Orientation of Adjectives. Vasileios Hatzivassiloglou and Kathleen R. McKeown. Presented by Yash Satsangi
Aim • To validate that conjunctions put constraints on the adjectives they conjoin, and that this information can be used to detect the adjectives' semantic orientation • Based on this information, cluster adjectives into two groups: those with positive orientation and those with negative orientation.
Constraints on Conjoined Adjectives • Validate the constraints that conjunctions place on the positive/negative semantic orientation of adjectives • honest ‘and’ peaceful: same orientation • talented ‘but’ irresponsible: opposite orientation • Thus the conjunction carries information about the semantic orientation of the adjectives it joins • Synonyms tend to have the same semantic orientation • Antonyms tend to have opposite semantic orientation (e.g. hot and cold).
Approach • Extract conjunctions of adjectives from the corpus, along with the morphological relations between the adjectives • A log-linear regression model predicts whether two conjoined adjectives have the same or different orientation • A clustering algorithm separates the adjectives into two subsets of opposite orientation.
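The extraction step can be sketched as follows. This is a minimal illustration only: it assumes Penn Treebank style tags (JJ for adjectives, CC for coordinating conjunctions) and matches only directly adjacent adjective-conjunction-adjective sequences. The function name and the sample sentence are hypothetical, and a real parser would cover many more constructions.

```python
def extract_conjoined_adjectives(tagged_tokens):
    """Return (adjective, conjunction, adjective) triples for JJ CC JJ patterns."""
    triples = []
    # Slide a window of three consecutive (word, tag) pairs over the sentence.
    windows = zip(tagged_tokens, tagged_tokens[1:], tagged_tokens[2:])
    for (w1, t1), (w2, t2), (w3, t3) in windows:
        if t1 == "JJ" and t2 == "CC" and t3 == "JJ":
            triples.append((w1.lower(), w2.lower(), w3.lower()))
    return triples


# Hypothetical POS-tagged sentence used only for illustration.
sentence = [
    ("The", "DT"), ("tax", "NN"), ("proposal", "NN"), ("was", "VBD"),
    ("simple", "JJ"), ("and", "CC"), ("well-received", "JJ"),
    ("by", "IN"), ("the", "DT"), ("public", "NN"),
]
triples = extract_conjoined_adjectives(sentence)
```

Lower-casing the words makes counts for the same pair accumulate regardless of capitalization.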
Data • 21-million-word 1987 Wall Street Journal corpus, annotated with part-of-speech tags • Remove adjectives occurring fewer than 20 times and those with no orientation • Manually assign an orientation to each adjective based on its use • The labels were validated by multiple reviewers • Final set: 1,336 adjectives (657 positive, 679 negative), with 96.97% inter-reviewer agreement.
Validating the Hypothesis • Run a parser on the 21-million-word dataset to get 15,048 conjunction tokens involving 9,296 distinct adjective pairs • Each conjunction was classified by three attributes: (1) the conjunction used; (2) the type of modification; (3) the modified noun • Count the percentage of conjunctions in each category whose adjectives have the same or different orientation
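The counting step can be illustrated with a small sketch. The triples and orientation labels below are hypothetical toy data, not figures from the corpus, and only the "conjunction used" attribute is tallied.

```python
from collections import defaultdict


def same_orientation_share(triples, orientation):
    """For each conjunction, return the fraction of pairs with equal orientation."""
    same = defaultdict(int)
    total = defaultdict(int)
    for adj1, conj, adj2 in triples:
        total[conj] += 1
        if orientation[adj1] == orientation[adj2]:
            same[conj] += 1
    return {conj: same[conj] / total[conj] for conj in total}


# Hypothetical toy data: manually assigned orientations and extracted triples.
orientation = {
    "honest": "+", "peaceful": "+", "fair": "+", "legitimate": "+",
    "profitable": "+", "talented": "+", "well-received": "+",
    "corrupt": "-", "brutal": "-", "risky": "-",
    "irresponsible": "-", "simplistic": "-",
}
triples = [
    ("honest", "and", "peaceful"),
    ("corrupt", "and", "brutal"),
    ("fair", "and", "legitimate"),
    ("risky", "and", "profitable"),
    ("talented", "but", "irresponsible"),
    ("simplistic", "but", "well-received"),
]
shares = same_orientation_share(triples, orientation)
```

The same tally could be repeated per modification type or per modified noun by adding those attributes to each triple.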
Validating the Hypothesis • For almost all cases the p-values are low, so the results are statistically significant • There are only very small differences in the behavior of conjunctions across categories • ‘and’ usually joins adjectives of the same orientation • ‘but’ shows the opposite pattern and usually joins adjectives of different orientation
Baseline Method to Predict Links • Simple baseline: labeling every link as same-orientation gives 77.84% accuracy • Adjectives conjoined by ‘but’ are mostly of opposite orientation • Morphological relationships (e.g. adequate-inadequate) carry information as well
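A rule-based predictor combining these three observations might look like the sketch below. The prefix list and function names are assumptions chosen for illustration. The morphology check only catches pairs that differ by a negating prefix, so pairs such as thoughtful/thoughtless would need additional rules.

```python
# Hypothetical list of negating prefixes; not taken from the paper.
NEGATION_PREFIXES = ("un", "in", "im", "il", "ir", "dis", "non")


def morphologically_related(a, b):
    """True if one adjective is the other plus a negating prefix."""
    return any(a == p + b or b == p + a for p in NEGATION_PREFIXES)


def predict_link(a, b, conjunction):
    """Predict whether a conjoined pair has the 'same' or 'different' orientation."""
    if morphologically_related(a, b):
        return "different"  # e.g. adequate / inadequate
    if conjunction == "but":
        return "different"  # 'but' mostly joins opposite orientations
    return "same"           # default: the majority class
```

For instance, `predict_link("adequate", "inadequate", "and")` returns "different" because of the morphology rule, even though the conjunction alone would suggest "same".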
Better Idea: Use a Regression Model • Train a log-linear regression model • x is the vector of observed counts of the adjective pair in the various conjunction categories • To avoid overfitting, they used subsets of the data • Iterative stepwise refinement builds up the final model
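As a rough illustration of how such a model turns counts into a prediction, the sketch below applies a logistic response to a weighted sum of the counts. The weights are hypothetical hand-picked numbers, not fitted values, and the category names are assumptions. A real model would estimate the weights from the labeled pairs.

```python
import math


def same_orientation_probability(counts, weights, bias=0.0):
    """Logistic response y = 1 / (1 + exp(-eta)), with eta = bias + w . x."""
    eta = bias + sum(weights.get(cat, 0.0) * n for cat, n in counts.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical weights per conjunction category (illustration only).
weights = {"and": 0.8, "or": 0.3, "but": -1.5}

# A pair seen three times with 'and', and a pair seen twice with 'but'.
p_and = same_orientation_probability({"and": 3}, weights)
p_but = same_orientation_probability({"but": 2}, weights)

# One minus the probability gives a dissimilarity score between 0 and 1.
dissimilarity_but = 1.0 - p_but
```

With these weights the 'and' pair gets a probability above 0.5 and the 'but' pair a probability below 0.5, matching the direction of the baseline rules.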
Result of Prediction • The log-linear regression model performs slightly better than the baseline • Its predictions are mainly used to group adjectives of the same orientation together
Grouping Adjectives into the Same Pack • The log-linear model generates a dissimilarity score between 0 and 1 for each pair of adjectives • Adjectives and their same/different links thus form a graph • An iterative optimization procedure partitions the graph into clusters • The procedure minimizes an objective function computed from the dissimilarities • Hierarchical Clustering
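One way to picture the partitioning step is a greedy exchange procedure over two clusters. The objective used here (the sum of within-cluster dissimilarities, each cluster scaled by its size) is an assumption made for this sketch, as are the neutral value 0.5 for unlinked pairs and the deterministic starting partition. Greedy moves can stop in a local minimum, so a fuller version would restart from several random partitions.

```python
from itertools import combinations


def objective(clusters, d):
    """Sum over clusters of within-cluster dissimilarity divided by cluster size."""
    total = 0.0
    for cluster in clusters:
        if cluster:
            pairs = combinations(sorted(cluster), 2)
            # Unlinked pairs get a neutral 0.5 (an assumption of this sketch).
            total += sum(d.get(frozenset(p), 0.5) for p in pairs) / len(cluster)
    return total


def exchange_partition(nodes, d):
    """Repeatedly move the single node that most reduces the objective."""
    nodes = sorted(nodes)
    clusters = [set(nodes[0::2]), set(nodes[1::2])]  # arbitrary starting split
    best = objective(clusters, d)
    while True:
        best_move = None
        for i in (0, 1):
            if len(clusters[i]) <= 1:
                continue  # never empty a cluster
            for node in sorted(clusters[i]):
                clusters[i].remove(node)
                clusters[1 - i].add(node)
                score = objective(clusters, d)
                clusters[1 - i].remove(node)
                clusters[i].add(node)
                if score < best - 1e-12:
                    best, best_move = score, (node, i)
        if best_move is None:
            return clusters
        node, i = best_move
        clusters[i].remove(node)
        clusters[1 - i].add(node)


# Hypothetical dissimilarities: 0.1 within an orientation, 0.9 across.
positive = ["fair", "honest", "peaceful"]
negative = ["brutal", "corrupt"]
d = {}
for a, b in combinations(positive + negative, 2):
    same = (a in positive) == (b in positive)
    d[frozenset((a, b))] = 0.1 if same else 0.9
clusters = exchange_partition(positive + negative, d)
```

On this toy graph the procedure moves "brutal" and then "honest", after which no single move lowers the objective.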
Labeling Clusters • The same authors showed in 1995 that the semantically unmarked member of a pair of gradable adjectives is usually the more frequent one • Semantic markedness exhibits a strong correlation with orientation • The unmarked member almost always has positive orientation • So the group with the higher average frequency is labeled as containing the positive terms.
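The labeling rule reduces to comparing two averages, as in this short sketch. The function name and the frequency counts are hypothetical.

```python
def label_clusters(clusters, frequency):
    """Label the cluster with the higher mean corpus frequency as positive."""
    def mean_frequency(cluster):
        return sum(frequency[adj] for adj in cluster) / len(cluster)

    first, second = clusters
    if mean_frequency(first) >= mean_frequency(second):
        return {"positive": first, "negative": second}
    return {"positive": second, "negative": first}


# Hypothetical corpus counts, for illustration only.
frequency = {"fair": 120, "honest": 80, "peaceful": 60, "brutal": 30, "corrupt": 40}
labels = label_clusters(
    [{"brutal", "corrupt"}, {"fair", "honest", "peaceful"}], frequency
)
```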
Evaluating the Clustering of Adjectives • Separate the adjective set A into training and testing groups by selecting a parameter α • α determines the number of links each adjective has in the selected training and test sets • A higher α yields a subset of A in which the adjectives are more densely connected to each other.
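One possible reading of this selection, offered only as an assumption, is that an adjective is kept if it has at least α links to other kept adjectives. Under that reading the subset can be found by repeated pruning, as sketched below with hypothetical names and toy links.

```python
def alpha_subset(links, alpha):
    """Keep adjectives having at least `alpha` links to other kept adjectives."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    kept = set(neighbors)
    changed = True
    while changed:
        changed = False
        # Removing one adjective can push its neighbors below the threshold,
        # so keep pruning until a full pass removes nothing.
        for adj in sorted(kept):
            if len(neighbors[adj] & kept) < alpha:
                kept.discard(adj)
                changed = True
    return kept


# Hypothetical links between adjectives.
links = [
    ("honest", "peaceful"), ("peaceful", "fair"),
    ("honest", "fair"), ("fair", "corrupt"),
]
dense = alpha_subset(links, 2)
```

Raising α shrinks the subset but leaves every remaining adjective with more evidence about its orientation.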
Clustering Results • The highest accuracy was obtained when the number of links was highest • In every case, the ratio of group frequencies correctly identified the positive subgroup
Performance • To measure the performance of the algorithm, a series of simulation experiments was run • Parameter P measures how well each link is predicted independently (precision) • Parameter k is the number of distinct adjectives each adjective appears in conjunction with • Generate a random graph in which each node participates in k links and P% of all links connect nodes of the same orientation, then classify the nodes
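A simulated input of this kind could be generated roughly as follows. The sketch treats P as the fraction of links whose same/different label is correct, and gives each node only approximately k links. Both choices, along with the function name and parameters, are assumptions made for illustration.

```python
import random


def simulate_links(n_nodes, k, p, seed=0):
    """Build a random graph with about k links per node and link precision p."""
    rng = random.Random(seed)
    orientation = {i: rng.choice("+-") for i in range(n_nodes)}
    pairs = set()
    for i in range(n_nodes):
        others = [j for j in range(n_nodes) if j != i]
        # Each node proposes k // 2 partners, so the average degree is near k.
        for j in rng.sample(others, k // 2):
            pairs.add((min(i, j), max(i, j)))
    pairs = sorted(pairs)
    rng.shuffle(pairs)
    n_correct = round(p * len(pairs))
    links = {}
    for index, (i, j) in enumerate(pairs):
        truly_same = orientation[i] == orientation[j]
        # The first n_correct links carry the true label; the rest are flipped.
        links[(i, j)] = truly_same if index < n_correct else not truly_same
    return orientation, links


orientation, links = simulate_links(n_nodes=50, k=6, p=0.8, seed=1)
```

The resulting link labels can then be handed to a clustering routine, and the recovered groups compared against the known orientations.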
Conclusion • A good ‘and’ comprehensive method for classifying the semantic orientation of adjectives • Can be used to find antonyms without access to any semantic information • Can be extended to nouns and verbs.