REACTION Workshop 2011.01.06
Task 2 – Progress Report & Plans
Lisbon, PT and Austin, TX
Mário J. Silva, University of Lisbon, Portugal
Information Discovery: relationship extraction techniques to support information discovery in journalists’ activities
Information Discovery
Relationship extraction techniques to support information discovery in journalists’ activities
• Entity Ranking: finding the relevant entities for a given topic
• Entity Distillation: finding relevant resources for a given entity
• Attribute Selection: finding a list of key aspects to compare and differentiate a given set of entities
Annotation
NER
Socrates reuniu hoje em Braga com Mesquita Machado e Firmino Marques
(English: Socrates met today in Braga with Mesquita Machado and Firmino Marques)
Mapping
<PERSON>Socrates</PERSON> reuniu hoje em <LOCAL>Braga</LOCAL> com <PERSON>Mesquita Machado</PERSON> e <PERSON>Firmino Marques</PERSON>
Annotated Corpus
<POWER id=1>Socrates</POWER> reuniu hoje em <GeoNetPT id=10>Braga</GeoNetPT> com <POWER id=10>Mesquita Machado</POWER> e <PERSON>Firmino Marques</PERSON>
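To make the annotated-corpus format concrete, the following is a minimal sketch of reading the tagged sentence back into (label, id, text) triples. The regular expression and the function name `extract_entities` are illustrative assumptions, not part of the REACTION prototype; the tag syntax is taken from the slide.

```python
import re

# Matches tags such as <PERSON>...</PERSON> or <POWER id=1>...</POWER>.
# The pattern is an assumption based on the tag syntax shown on the slide.
TAG = re.compile(r"<(?P<label>\w+)(?:\s+id=(?P<id>\d+))?>(?P<text>.*?)</(?P=label)>")


def extract_entities(annotated):
    """Return (label, id or None, surface text) for each tagged span."""
    return [
        (m.group("label"),
         int(m.group("id")) if m.group("id") else None,
         m.group("text"))
        for m in TAG.finditer(annotated)
    ]


sentence = (
    "<POWER id=1>Socrates</POWER> reuniu hoje em "
    "<GeoNetPT id=10>Braga</GeoNetPT> com "
    "<POWER id=10>Mesquita Machado</POWER> e "
    "<PERSON>Firmino Marques</PERSON>"
)
entities = extract_entities(sentence)
```

An entity that was recognized but not grounded in any ontology (here Firmino Marques) keeps its NER label and has no id.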
Analysis
Entity Ranking
Annotated Corpus
Voos da CIA em Portugal (English: CIA flights in Portugal)
Entity Distillation
Luís Amado
José Socrates (Power:1)
Attribute Selection
• XVII Governo Constitucional (Power:20)
• WikiLeaks
Ontology Extension
http://pt.wikipedia.org/wiki/Luís_Amado
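As a rough illustration of entity ranking over an annotated corpus, the sketch below scores each entity by how many topic terms appear in the documents that mention it. The scoring rule, the tiny stopword list, and the toy documents are all assumptions made for the example; the slides do not say how the ranking is computed.

```python
from collections import Counter

# Hypothetical stopword list, only large enough for this example.
STOPWORDS = {"da", "de", "em", "e", "com"}


def rank_entities(topic, documents):
    """Rank entities by the summed topic-term overlap of documents mentioning them.

    `documents` is a list of (text, entity names) pairs, standing in for
    an annotated corpus.
    """
    terms = set(topic.lower().split()) - STOPWORDS
    scores = Counter()
    for text, entities in documents:
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            for entity in set(entities):
                scores[entity] += overlap
    return scores.most_common()


# Toy documents invented for illustration only.
documents = [
    ("voos cia portugal", ["Luís Amado", "José Socrates"]),
    ("cia voos", ["Luís Amado"]),
    ("reuniu hoje em braga", ["Mesquita Machado"]),
]
ranking = rank_entities("Voos da CIA em Portugal", documents)
```

Entities that never co-occur with a topic term are left out of the ranking rather than listed with a zero score.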
First Approach
• NER
  • REMBRANDT (Reconhecimento de Entidades Mencionadas Baseado em Relações e ANálise Detalhada do Texto)
• Mapping (Classification or Grounding)
  • String Matching Methods
  • Ontologies: POWER (task 1); GeoNetPT; Yahoo! GeoPlanet
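A minimal sketch of mapping by string matching could look like the following: a recognized mention is normalized (lowercased, accents stripped) and looked up in a gazetteer, with a fuzzy fallback from the standard library's `difflib`. The `GAZETTEER` contents, the `ground` function, and the 0.8 cutoff are hypothetical; the ontology names and ids only echo the example on the Annotation slide, and the actual matching methods used in the project are not specified.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase and strip accents so 'Sócrates' and 'Socrates' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower().strip()


# Hypothetical gazetteer: NER label -> {name: (ontology, id)}.
GAZETTEER = {
    "PERSON": {"Sócrates": ("POWER", 1), "Mesquita Machado": ("POWER", 10)},
    "LOCAL": {"Braga": ("GeoNetPT", 10)},
}


def ground(mention, ner_label, cutoff=0.8):
    """Return (ontology, id) for a mention, or None if nothing matches."""
    candidates = {
        normalize(name): ref
        for name, ref in GAZETTEER.get(ner_label, {}).items()
    }
    key = normalize(mention)
    if key in candidates:  # exact match after normalization
        return candidates[key]
    # Fuzzy fallback for small spelling variations.
    close = difflib.get_close_matches(key, list(candidates), n=1, cutoff=cutoff)
    return candidates[close[0]] if close else None
```

With this toy gazetteer, `ground("Socrates", "PERSON")` resolves to the POWER entry, while `ground("Firmino Marques", "PERSON")` returns `None`, so that mention would stay a plain PERSON annotation.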
Socrates reuniu hoje em Braga com Mesquita Machado e Firmino Marques.
Prototype: First Release
• April 2011
• To be used in the Web Applications course unit project
• What’s missing?
  • Mapping
  • Interface
  • Evaluation
    • Precision and recall
    • Gold standard (Task 1)
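The planned evaluation can be sketched as a set comparison between system annotations and a gold standard. The tuple layout (surface text, label, id) and the sample annotations below are assumptions for illustration; the real gold standard is to come from Task 1.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of predicted annotations against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations as (surface text, label, id) tuples.
gold = {
    ("Socrates", "POWER", 1),
    ("Braga", "GeoNetPT", 10),
    ("Mesquita Machado", "POWER", 10),
    ("Firmino Marques", "PERSON", None),
}
predicted = {
    ("Socrates", "POWER", 1),
    ("Braga", "GeoNetPT", 10),
    ("Mesquita Machado", "PERSON", None),  # recognized but not grounded
}
precision, recall = precision_recall(predicted, gold)
```

Here two of the three predictions are correct (precision 2/3) and two of the four gold annotations are found (recall 1/2). An exact-match criterion like this one counts a recognized but ungrounded entity as an error; a more lenient scheme could score recognition and mapping separately.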
Prototype: Second Release
• August 2011
• Evaluate and Analyze First Prototype Results
• Improved NER and Mapping
  • Using machine learning
    • Conditional Random Fields
  • Information Content
    • FiGO
Prototype: Third Release
• December 2011
• Containing Modules for:
  • Entity Ranking
  • Entity Distillation
  • Attribute Selection
  • Ontology Extension
• Participate in TREC (Entity Track)
  • http://ilps.science.uva.nl/trec-entity/
Prototype: Fourth Release
• August 2012
• Containing Modules for:
  • Opinion mining
    • Using machine learning to
      • Detect and classify opinionated text
      • Targeting the identified entities and topics