REACTION Workshop 2011.01.06 Task 1 – Progress Report & Plans Lisbon, PT and Austin, TX

Presentation Transcript


  1. REACTION Workshop 2011.01.06
     Task 1 – Progress Report & Plans
     Lisbon, PT and Austin, TX
     Mário J. Silva, University of Lisbon, Portugal

  2. Grants (paid by Reaction)
     • Sílvio Moreira (BI: Oct 1, 2010 – March 31, 2011)
     • João Ramalho (BIC: Jan 1, 2011 – April 31, 2011)

  3. Mining resources
     • Development of robust linguistic resources for processing different types and genres of text:
         • knowledge resources about media personalities: recognizing and resolving references to named entities;
         • sentiment lexicons and grammars: detecting the polarity of opinions about media personalities;
         • annotated corpora: training different text classifiers and evaluating classification procedures.

  4. Mining resources
     • POWER – Political Ontology for Web Entity Retrieval
     • SentiLex-PT01 – Sentiment Lexicon for Portuguese
     • SentiCorpus-PT09 – Sentiment-annotated corpus of user comments on political debates

  5. POWER
     POWER is an ontology that formalizes the domain knowledge defining a political landscape, i.e., the political actors, their roles in the political scene, and their relationships and interactions. The ontology focuses on describing:
     • Politicians
     • Political Institutions with different levels of authority (International, National, Regional, ...)
     • Political Associations
     • Political Affiliations and Endorsements
     • Elections
     • Mandates
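
     To make the listed concepts concrete, here is a minimal Python sketch of how a few POWER entities and their relations could be modeled as plain data classes. It is an illustration under assumptions, not the ontology itself: POWER is an ontology (slide 13 mentions a SPARQL endpoint), and every class, attribute and method name below (`Politician`, `Mandate`, `PoliticalLandscape.active_on`, and so on) is hypothetical. Endorsements are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Hypothetical, simplified rendering of a few POWER concepts as Python classes.
# Names are illustrative; they are not the ontology's actual vocabulary.


@dataclass(frozen=True)
class Politician:
    name: str


@dataclass(frozen=True)
class PoliticalInstitution:
    name: str
    level: str  # level of authority, e.g. "International", "National", "Regional"


@dataclass(frozen=True)
class PoliticalAssociation:
    name: str  # e.g. a party


@dataclass(frozen=True)
class Mandate:
    holder: Politician
    institution: PoliticalInstitution
    start: date
    end: Optional[date] = None  # None while the mandate is still running


@dataclass(frozen=True)
class Affiliation:
    member: Politician
    association: PoliticalAssociation


@dataclass(frozen=True)
class Election:
    name: str
    day: date


@dataclass(frozen=True)
class CandidateList:
    election: Election
    association: PoliticalAssociation
    candidates: tuple[Politician, ...]


@dataclass
class PoliticalLandscape:
    """A toy in-memory store standing in for the ontology's instance data."""
    mandates: list[Mandate] = field(default_factory=list)
    affiliations: list[Affiliation] = field(default_factory=list)
    candidate_lists: list[CandidateList] = field(default_factory=list)

    def mandates_of(self, person: Politician) -> list[Mandate]:
        """All mandates held by the given politician."""
        return [m for m in self.mandates if m.holder == person]

    def active_on(self, day: date) -> list[Mandate]:
        """Mandates whose (inclusive) interval contains the given day."""
        return [
            m for m in self.mandates
            if m.start <= day and (m.end is None or day <= m.end)
        ]

    def associations_of(self, person: Politician) -> list[PoliticalAssociation]:
        """Associations the politician is affiliated with."""
        return [a.association for a in self.affiliations if a.member == person]

    def candidacies_of(self, person: Politician) -> list[CandidateList]:
        """Candidate lists on which the politician appears."""
        return [c for c in self.candidate_lists if person in c.candidates]
```

     In an actual ontology these relations would be properties between individuals and would be queried (e.g. via SPARQL) rather than filtered in Python; the sketch only shows which links the listed concepts imply.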

  6. POWER
     Currently, the ontology describes the following, drawn from the Portuguese political scene:
     • 587 Political actors
     • 17 editions of Political Institutions
     • 16 Political Associations
     • 900 Mandates
     • 1 Election
     • 6 Candidate Lists

  7. SentiLex-PT01
     SentiLex-PT01 is a sentiment lexicon for Portuguese made up of 6,321 adjective lemmas and 25,406 inflected forms.
     • The sentiment entries correspond to predicate adjectives that apply to humans.
     • The sentiment attributes described in SentiLex-PT01 are:
         • the predicate polarity,
         • the target of the sentiment, and
         • how the polarity was assigned (manually, or automatically by JALC).

  8. SentiLex-lem-PT01
     6,321 lemmas
     abatido.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN
     abelhudo.PoS=Adj;TG=HUM;POL=-1;ANOT=MAN
     abençoado.PoS=Adj;TG=HUM;POL=1;ANOT=JALC
     atrevido.PoS=Adj;TG=HUM;POL=0;ANOT=MAN
     bem-educado.PoS=Adj;TG=HUM;POL=1;ANOT=MAN
     brega.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC
     violento.PoS=Adj;TG=HUM;POL=-1;ANOT=JALC
     Recently made publicly available at: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
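
     The lemma entries above follow a regular `lemma.KEY=VALUE;...` layout. Below is a minimal Python sketch of a reader for it, assuming one entry per line in UTF-8 and exactly the four keys shown on the slide (`PoS`, `TG`, `POL`, `ANOT`). The class and function names are hypothetical, and the distributed file may differ in detail (for example, header lines or extra keys).

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed line format, as shown on the slide:
#   <lemma>.PoS=<pos>;TG=<target>;POL=<polarity>;ANOT=<annotation>


@dataclass(frozen=True)
class LemmaEntry:
    lemma: str
    pos: str         # PoS: part of speech, e.g. "Adj"
    target: str      # TG: sentiment target, e.g. "HUM" (human)
    polarity: int    # POL: -1 (negative), 0 (neutral) or 1 (positive)
    annotation: str  # ANOT: "MAN" (manual) or "JALC" (automatic)


def parse_attributes(attrs: str) -> dict[str, str]:
    """Split 'K1=V1;K2=V2;...' into a dictionary."""
    return dict(pair.split("=", 1) for pair in attrs.split(";"))


def parse_lemma_line(line: str) -> LemmaEntry:
    """Parse one lemma line; raise ValueError if the '.' separator is missing."""
    lemma, sep, attrs = line.strip().partition(".")
    if not sep or not lemma:
        raise ValueError(f"malformed entry: {line!r}")
    fields = parse_attributes(attrs)
    return LemmaEntry(
        lemma=lemma,
        pos=fields["PoS"],
        target=fields["TG"],
        polarity=int(fields["POL"]),
        annotation=fields["ANOT"],
    )


def load_lemmas(lines: Iterable[str]) -> dict[str, LemmaEntry]:
    """Index entries by lemma, skipping blank lines."""
    entries = (parse_lemma_line(line) for line in lines if line.strip())
    return {entry.lemma: entry for entry in entries}
```

     For example, `load_lemmas(open(path, encoding="utf-8"))` would index a lemma file at a hypothetical `path`. Keying by lemma assumes each lemma appears once; a duplicate would silently overwrite the earlier entry.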

  9. SentiLex-flex-PT01
     25,406 inflected forms
     abatida,abatido.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=MAN
     abatidas,abatido.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=MAN
     abatido,abatido.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=MAN
     abatidos,abatido.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=MAN
     bem-educada,bem-educado.PoS=Adj;GN=fs;TG=HUM;POL=1;ANOT=MAN
     bem-educadas,bem-educado.PoS=Adj;GN=fp;TG=HUM;POL=1;ANOT=MAN
     bem-educado,bem-educado.PoS=Adj;GN=ms;TG=HUM;POL=1;ANOT=MAN
     bem-educados,bem-educado.PoS=Adj;GN=mp;TG=HUM;POL=1;ANOT=MAN
     brega,brega.PoS=Adj;GN=fs;TG=HUM;POL=-1;ANOT=JALC
     brega,brega.PoS=Adj;GN=ms;TG=HUM;POL=-1;ANOT=JALC
     bregas,brega.PoS=Adj;GN=mp;TG=HUM;POL=-1;ANOT=JALC
     bregas,brega.PoS=Adj;GN=fp;TG=HUM;POL=-1;ANOT=JALC
     Recently made publicly available at: http://xldb.fc.ul.pt/wiki/SentiLex-PT01
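
     In the inflected-form list, each line carries the surface form and its lemma before the attributes, plus a `GN` code (as the examples show: `fs`, `fp`, `ms`, `mp` for feminine or masculine, singular or plural). Because one surface form can appear more than once (`brega` is listed as both feminine and masculine singular), a lookup table should map each form to a list of entries. The Python sketch below assumes one UTF-8 entry per line with the keys shown on the slide; all names are hypothetical, and lower-casing input tokens is an assumption about text normalization.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed line format, as shown on the slide:
#   <form>,<lemma>.PoS=<pos>;GN=<gender/number>;TG=<target>;POL=<polarity>;ANOT=<annotation>


@dataclass(frozen=True)
class FormEntry:
    form: str
    lemma: str
    gender_number: str  # GN: f/m (feminine/masculine) + s/p (singular/plural)
    polarity: int       # POL: -1, 0 or 1
    annotation: str     # ANOT: "MAN" or "JALC"


def parse_form_line(line: str) -> FormEntry:
    """Parse one inflected-form line; raise ValueError if a separator is missing."""
    head, sep, attrs = line.strip().partition(".")
    form, comma, lemma = head.partition(",")
    if not sep or not comma or not form or not lemma:
        raise ValueError(f"malformed entry: {line!r}")
    fields = dict(pair.split("=", 1) for pair in attrs.split(";"))
    return FormEntry(form, lemma, fields["GN"], int(fields["POL"]), fields["ANOT"])


def build_form_index(lines: Iterable[str]) -> dict[str, list[FormEntry]]:
    """Group entries by surface form; one form may have several entries."""
    index: dict[str, list[FormEntry]] = defaultdict(list)
    for line in lines:
        if line.strip():
            entry = parse_form_line(line)
            index[entry.form].append(entry)
    return dict(index)


def form_polarity(index: dict[str, list[FormEntry]], token: str) -> Optional[int]:
    """Polarity of a token if all its entries agree; None if unknown or ambiguous."""
    polarities = {entry.polarity for entry in index.get(token.lower(), [])}
    return polarities.pop() if len(polarities) == 1 else None
```

     Returning `None` when a form's entries disagree keeps the lookup conservative; a fuller system might instead use gender/number agreement with the opinion target to pick one entry.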

  10. SentiCorpus-PT09
      SentiCorpus-PT09 is a collection of comments posted by readers of the Público newspaper on a series of 10 news articles, each covering a televised face-to-face debate between the main candidates in the 2009 parliamentary elections.
      • The collection comprises 2,795 comments (~8,000 sentences).
      • 3,537 sentences from 736 comments (27% of the corpus) were manually labeled with sentiment information.
      • The sentiment annotation covers several relevant dimensions, such as polarity, opinion target, target mention and verbal irony.

  11. Main findings
      • The real challenge in performing opinion mining on user-generated content is correctly identifying positive opinions:
          • Positive opinions are less frequent than negative opinions (20%).
          • Positive opinions are particularly exposed to verbal irony (11%).
      • Other opinion-mining challenges relate to the entity recognition and co-reference resolution sub-tasks:
          • Mentions of human targets are frequently made through pronouns, definite descriptions and nicknames.
          • The most frequent type of mention is the person's name, but it covers only 36% of the analyzed cases.
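
      The proportions reported in the findings are ratios over annotated opinions. The Python sketch below shows one way such figures could be computed from annotation records. The `Opinion` record, its field names and value sets, and the choice of denominators (all opinions for the polarity and mention shares, positive opinions only for the irony rate) are assumptions: the slides do not state the corpus file format or exactly how each percentage was computed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence

# Hypothetical annotation record; field names and value sets are
# illustrative assumptions, not the SentiCorpus-PT09 file format.


@dataclass(frozen=True)
class Opinion:
    polarity: int       # -1 negative, 0 neutral, 1 positive
    target: str         # identifier of the opinion target (e.g. a politician)
    mention_type: str   # e.g. "name", "pronoun", "definite_description", "nickname"
    ironic: bool        # whether verbal irony was annotated


def ratio(part: int, whole: int) -> float:
    """Proportion that returns 0.0 instead of dividing by zero."""
    return part / whole if whole else 0.0


def summarize(opinions: Sequence[Opinion]) -> dict:
    """Polarity shares, irony rate among positive opinions, and mention-type shares."""
    total = len(opinions)
    positive = [o for o in opinions if o.polarity > 0]
    negative = [o for o in opinions if o.polarity < 0]
    mentions = Counter(o.mention_type for o in opinions)
    return {
        "positive_share": ratio(len(positive), total),
        "negative_share": ratio(len(negative), total),
        # Assumed denominator: the irony rate is measured among positive opinions only.
        "ironic_share_of_positive": ratio(sum(o.ironic for o in positive), len(positive)),
        "mention_shares": {kind: ratio(n, total) for kind, n in mentions.most_common()},
    }
```

      Keeping the denominator explicit in each ratio matters here: "11% irony" means something different if it is measured over all opinions rather than over positive ones.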

  12. Next steps (April 2011)
      • POWER
          • Populating the ontology using text-mining approaches
          • Internal release
      • SentiLex-PT01
          • Exploring other methods and algorithms (SVM, Active Learning) for automatic polarity classification
          • Enlarging the sentiment lexicon (verbs, predicate nouns, idiomatic expressions)

  13. Next steps (August 2011)
      • POWER
          • First release to the general public via a SPARQL endpoint and a web user interface
      • SentiCorpus-PT09
          • Made publicly available
          • Analysis and (semi-automated) annotation of a collection of documents from industrial and social media, over a period of 6 months
