REACTION Workshop 2011.01.06
Task 1 – Progress Report & Plans
Lisbon, PT and Austin, TX
Mário J. Silva
University of Lisbon, Portugal
Grants (paid by Reaction)
• Sílvio Moreira (BI: Oct 1, 2010 – March 31, 2011)
• João Ramalho (BIC: Jan 1, 2011 – April 31, 2011)
Mining resources
• Development of robust linguistic resources to process different types and genres of texts
  • Knowledge resources about media personalities: recognizing and resolving references to named entities
  • Sentiment lexicons and grammars: detecting the polarity of opinions about media personalities
  • Annotated corpora: training different text classifiers and evaluating classification procedures
Mining resources
• POWER – Political Ontology for Web Entity Retrieval
• SentiLex-PT01 – Sentiment Lexicon for Portuguese
• SentiCorpus-PT09 – Sentiment annotated corpus of user comments to political debates
POWER
POWER is an ontology that formalizes the domain knowledge defining a political landscape, i.e., the political actors and their roles in the political scene, their relationships and interactions. The ontology focuses on describing:
• Politicians
• Political Institutions with different levels of authority (International, National, Regional, ...)
• Political Associations
• Political Affiliations and Endorsements
• Elections
• Mandates
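As a rough illustration of the entity types listed above, the following Python sketch models a few of them as plain data classes. All class names, field names, and values are hypothetical; they are not taken from the POWER ontology itself, and elections and endorsements are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PoliticalAssociation:
    name: str


@dataclass
class PoliticalInstitution:
    name: str
    level: str  # e.g. "International", "National", "Regional"


@dataclass
class Politician:
    name: str
    # Associations the politician is affiliated with (hypothetical field).
    affiliations: List[PoliticalAssociation] = field(default_factory=list)


@dataclass
class Mandate:
    holder: Politician
    institution: PoliticalInstitution
    start_year: int
    end_year: Optional[int] = None  # None while the mandate is ongoing


# Placeholder data only; none of these names come from POWER.
party = PoliticalAssociation("Party A")
parliament = PoliticalInstitution("Parliament A", level="National")
politician = Politician("Politician A", affiliations=[party])
mandate = Mandate(politician, parliament, start_year=2009)
```

An actual ontology would express these as classes and properties in an ontology language rather than as Python objects; the sketch only shows how the listed concepts relate to one another.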
POWER
Currently, the ontology describes, from the Portuguese political scene:
• 587 Political actors
• 17 (editions) of Political Institutions
• 16 Political Associations
• 900 Mandates
• 1 Election
• 6 Candidate Lists
SentiLex-PT01
SentiLex-PT01 is a sentiment lexicon for Portuguese made up of 6,321 adjective lemmas and 25,406 inflected forms.
• The sentiment entries correspond to human predicate adjectives
• The sentiment attributes described in SentiLex-PT01 concern:
  • the predicate polarity,
  • the target of sentiment, and
  • the polarity assignment (which was performed manually or automatically, by JALC)
SentiLex-lem-PT01
6,321 lemmas
abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN
abelhudo.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN
abençoado.PoS=Adj;TG=HUM;POL=1;ANOT=JALC
atrevido.PoS=Adj;TG=HUM;POL=0;ANOT=MAN
bem-educado.PoS=Adj;TG=HUM;POL=1;ANOT=MAN
brega.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC
violento.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC
Recently made publicly available on: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
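The lemma entries follow a simple `lemma.KEY=VALUE;KEY=VALUE` layout, so they can be read with a few lines of Python. This is a minimal sketch, assuming the layout seen in the sample entries holds for the whole file; the function name `parse_lemma_entry` is hypothetical and not part of any SentiLex tooling.

```python
def parse_lemma_entry(line: str) -> dict:
    """Parse one lemma line such as 'abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN'."""
    # The lemma is everything before the first period.
    lemma, _, attrs = line.strip().partition(".")
    entry = {"lemma": lemma.strip()}
    # Attributes are 'KEY=VALUE' pairs separated by semicolons.
    for pair in attrs.split(";"):
        key, _, value = pair.partition("=")
        entry[key.strip()] = value.strip()
    # Polarity is -1, 0, or 1 in the sample entries; store it as an integer.
    entry["POL"] = int(entry["POL"])
    return entry


sample = [
    "abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN",
    "bem-educado.PoS=Adj;TG=HUM;POL=1;ANOT=MAN",
]
entries = [parse_lemma_entry(line) for line in sample]
```

Here `entries[0]["POL"]` is `-1` and `entries[1]["ANOT"]` is `"MAN"`, which makes it easy to separate manually annotated lemmas from those labeled by JALC.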
SentiLex-flex-PT01
25,406 inflected forms
abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN
abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN
abatido,abatido.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=MAN
abatidos,abatido.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=MAN
bem-educada,bem-educado.PoS=Adj;GN=fs;TG=HUM;POL=1;ANOT=MAN
bem-educadas,bem-educado.PoS=Adj;GN=fp;TG=HUM;POL=1;ANOT=MAN
bem-educado,bem-educado.PoS=Adj;GN=ms;TG=HUM;POL=1;ANOT=MAN
bem-educados,bem-educado.PoS=Adj;GN=mp;TG=HUM;POL=1;ANOT=MAN
brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC
brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC
bregas,brega.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=JALC
bregas,brega.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=JALC
Recently made publicly available on: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
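The inflected-form entries add the surface form before the lemma (`form,lemma.KEY=VALUE;...`), which suggests a direct form-to-polarity lookup. The sketch below assumes that layout and also assumes a form carries the same polarity in all of its gender/number readings, as in the sample above; `parse_flex_entry` and `build_polarity_index` are hypothetical names.

```python
def parse_flex_entry(line: str) -> dict:
    """Parse one line such as 'abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN'."""
    # Head is 'form,lemma'; attributes follow the first period.
    head, _, attrs = line.strip().partition(".")
    form, _, lemma = head.partition(",")
    entry = {"form": form.strip(), "lemma": lemma.strip()}
    for pair in attrs.split(";"):
        key, _, value = pair.partition("=")
        entry[key.strip()] = value.strip()
    entry["POL"] = int(entry["POL"])
    return entry


def build_polarity_index(lines) -> dict:
    """Map each inflected form to its polarity (later duplicates overwrite earlier ones)."""
    index = {}
    for line in lines:
        entry = parse_flex_entry(line)
        index[entry["form"]] = entry["POL"]
    return index


sample = [
    "abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN",
    "bem-educados,bem-educado.PoS=Adj;GN=mp;TG=HUM;POL=1;ANOT=MAN",
    "bregas,brega.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=JALC",
    "bregas,brega.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=JALC",
]
index = build_polarity_index(sample)
```

With this index, a tokenized comment can be scanned for known forms without lemmatizing it first, at the cost of discarding the gender/number tag when a form appears more than once.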
SentiCorpus-PT09
SentiCorpus-PT09 is a collection of comments posted by readers of the Público newspaper on a series of 10 news articles, each covering a televised face-to-face debate between the main candidates in the 2009 parliamentary elections.
• The collection is composed of 2,795 comments (~8,000 sentences).
• 3,537 sentences, from 736 comments (27% of the corpus), were manually labeled with sentiment information.
• Sentiment annotation involves different relevant dimensions, such as polarity, opinion target, target mention and verbal irony.
Main findings
• The real challenge in performing opinion mining on user-generated content is correctly identifying the positive opinions
  • Positive opinions are less frequent than negative opinions (20%)
  • Positive opinions are particularly exposed to verbal irony (11%)
• Other opinion mining challenges are related to the entity recognition and co-reference resolution sub-tasks
  • Mentions of human targets are frequently made through pronouns, definite descriptions and nicknames.
  • The most frequent type of mention is the person name, but it only covers 36% of the analyzed cases.
Next steps
April 2011:
• POWER
  • Populating the ontology, using text-mining approaches
  • Internal release
• SentiLex-PT01
  • Exploring other methods and algorithms (SVM, Active Learning) for automatic polarity classification
  • Enlarging the sentiment lexicon (verbs, predicate nouns, idiomatic expressions)
Next steps
August 2011:
• POWER
  • First release to the general public via SPARQL endpoint and web user interface
• SentiCorpus-PT09
  • Publicly available
  • Analysis and (semi-automated) annotation of a collection of documents from industrial and social media, over a period of 6 months