Improving ACE Performance. Edward Loper, Seth Kulick
The ACE Task • Example: "John met his son at the beach." (John: Person, NE; his son: Person, Nom; the beach: Location, NE; relations shown: Soc, At) • Detect and classify entities • Person, Geo-Political Entity, Organization, Facility, Location • Entity Types: • Named Entities: Cisco, George Washington • Nominals: a large crowd, a quaint library • Pronouns: he, it • Detect relations between entities • At, Role, Near, Social, Part
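The task output for the example sentence can be pictured as a small data structure. The following is a minimal sketch, not the system's actual format: the class names `Mention` and `Relation`, the short code `Pro` for pronouns, and the choice of which mentions each relation links are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    entity_type: str   # Person, Geo-Political Entity, Organization, Facility, Location
    mention_type: str  # NE (named), Nom (nominal), Pro (pronoun; hypothetical code)


@dataclass(frozen=True)
class Relation:
    relation_type: str  # At, Role, Near, Social, Part
    arg1: Mention
    arg2: Mention


# Mentions from "John met his son at the beach.", labeled as on the slide.
john = Mention("John", "Person", "NE")
son = Mention("his son", "Person", "Nom")
beach = Mention("the beach", "Location", "NE")

# Assumption: the arguments chosen for each relation are illustrative only.
relations = [
    Relation("Social", john, son),
    Relation("At", john, beach),
]

for rel in relations:
    print(f"{rel.relation_type}({rel.arg1.text}, {rel.arg2.text})")
```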
The U. Penn ACE System • A rapidly developed IE system • Built using TIDES-PennTools • Pipelined Architecture • Easy to construct from existing components • Easy to plug in new components • Statistical Components • Require less hand-tuning • Easy to improve with new training data
Pipeline diagram (stages): Input File, Tokenizing/Preprocessing, NE Tagging, Parsing, Nominal Tagging, Relation Extraction, Coreference, Output File
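A pipelined architecture of this kind can be sketched as a list of functions, each taking the document produced by the previous stage and adding its own annotations. This is an illustrative toy, not the TIDES-PennTools API: the stage functions, the dictionary keys, and the capitalization heuristic are all assumptions.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    # Toy tokenizer: split on whitespace.
    out = dict(doc)
    out["tokens"] = str(doc["text"]).split()
    return out


def tag_named_entities(doc: Document) -> Document:
    # Toy tagger (assumption): treat capitalized tokens as candidate names.
    out = dict(doc)
    out["entities"] = [t.strip(".") for t in doc["tokens"] if t[0].isupper()]
    return out


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage sees only the single output of the stage before it.
    for stage in stages:
        doc = stage(doc)
    return doc


result = run_pipeline(
    {"text": "John met his son at the beach."},
    [tokenize, tag_named_entities],
)
print(result["entities"])
```

Because every stage shares the same signature, swapping in a new component amounts to replacing one entry in the list.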
Improving the ACE System • Improve Pipeline Components • Add new features to existing models • Replace Pipeline Components • New machine learning techniques • Generate New Training Data • Active learning (WordFreak) • Improve the Architecture • Wide Pipeline architecture
Improving Components • Use more informative features • Use features based on richer annotation • PropBank roles • Use PropBank roles as features to improve relation detection. • SuperTAGs • Use supertags instead of part of speech tags, to improve the detection and classification of named entities and nominals.
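Swapping part-of-speech tags for supertags can be pictured as changing which tag sequence feeds the feature extractor. The sketch below is hypothetical: `token_features`, the feature names, and the supertag strings are placeholders invented for illustration, not the features or tag inventory used in the system.

```python
from typing import Dict, List


def token_features(tokens: List[str], tags: List[str], i: int,
                   tag_name: str = "pos") -> Dict[str, float]:
    """Binary features for token i, built from the word and a tag sequence."""
    feats = {
        "word=" + tokens[i].lower(): 1.0,
        f"{tag_name}=" + tags[i]: 1.0,
        "is_capitalized": float(tokens[i][0].isupper()),
    }
    # Context features from the neighboring tags, when they exist.
    if i > 0:
        feats[f"prev_{tag_name}=" + tags[i - 1]] = 1.0
    if i + 1 < len(tokens):
        feats[f"next_{tag_name}=" + tags[i + 1]] = 1.0
    return feats


tokens = ["John", "met", "his", "son"]
pos_tags = ["NNP", "VBD", "PRP$", "NN"]
# Placeholder supertag labels (assumption), richer than plain POS tags.
supertags = ["st_NP", "st_transitive_verb", "st_det", "st_NP"]

pos_feats = token_features(tokens, pos_tags, 0)
super_feats = token_features(tokens, supertags, 1, tag_name="supertag")
print(sorted(super_feats))
```

The same extractor serves both settings, so the two tag sets can be compared by retraining the tagger with only the `tags` argument changed.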
Improving the Architecture • Disadvantages of a simple pipelined architecture: • Interaction between stages is limited • If one stage produces incorrect output, later stages can’t recover. • Wide Pipeline architecture: each component generates multiple weighted outputs. • Increased interaction between stages • Later stages can re-rank the earlier outputs. • We have built a prototype wide pipeline system • NE Classification only
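The wide pipeline idea can be illustrated with a toy example in which an early stage emits several weighted hypotheses and a later stage re-ranks them. This is a sketch under stated assumptions, not the prototype itself: the function names, the weights, and the rule of multiplying the two scores are all invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Labels = Dict[str, str]
Hypothesis = Tuple[Labels, float]


def ne_stage(sentence: str) -> List[Hypothesis]:
    # Instead of a single best answer, return several weighted outputs.
    # Weights here are made up for the example.
    return [
        ({"beach": "Facility"}, 0.55),
        ({"beach": "Location"}, 0.45),
    ]


def relation_stage_score(labels: Labels) -> float:
    # Toy downstream scorer (assumption): an At relation is judged more
    # plausible when its target is labeled Location.
    return 0.9 if labels["beach"] == "Location" else 0.3


def rerank(hypotheses: List[Hypothesis],
           downstream_score: Callable[[Labels], float]) -> List[Hypothesis]:
    # Combine the earlier weight with the later stage's score, then sort.
    rescored = [(labels, weight * downstream_score(labels))
                for labels, weight in hypotheses]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


ranked = rerank(ne_stage("John met his son at the beach."), relation_stage_score)
best_labels = ranked[0][0]
print(best_labels)
```

In a simple pipeline the first stage would commit to its top-weighted output; here the later stage's evidence can overturn that choice.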
Replacing Components • Using improved ML algorithms, can we get better results with less training data? • Ryan McDonald implemented a NE tagger using Conditional Random Fields (CRF). • Outperforms our system’s Maxent NE tagger. • Experiment: Integrating the CRF tagger • Replace the Maxent NE tagger with a CRF tagger. • Exclude BBN training data (about 1/3 of the data) • Evaluate the changes in overall system performance
Integrating CRF: Results • [Charts: Entity Scores and Relation Scores, each comparing Maxent, Maxent + BBN, and CRF] • The CRF tagger significantly improves NE detection, giving a higher entity score. • Better NE detection allows the system to find more relations, giving a higher relation score.
Conclusions • The architecture of the ACE System allows for: • Rapid improvement • Concurrent development • We are working to improve the system… • By improving the existing components. • By adding more sophisticated components. • By improving our training data with active learning. • By improving the basic system architecture.