LaSIE: The Large Scale Information Extraction System
Robert Gaizauskas
Natural Language Processing Group, Department of Computer Science, University of Sheffield

Outline
• History: The Evolution of an IE System
• Applications: Projects using LaSIE
• Demo
AKT Workshop

History: The Evolution of an IE System
• January 1, 1995: LaSIE official “birth date”
• EPSRC 3-year/3-person grant called Large Scale Information Extraction:
  • “to develop GATE, an architecture for combining modules to produce a system to extract information from large-scale running text to data templates in a specified domain” [GATE]
  • “to build and evaluate a particular set of modules and submit it to standard evaluation” [LaSIE]
• Prehistory:
  • TIC/POETIC systems at Sussex 1987-1993
  • CRL MUC-5 System at NMSU 1993(?)

History: Chronology of LaSIE
• September 1995: LaSIE v1.0 participates in MUC-6
  • Four tasks: Named Entity, Coreference, Template Element, Scenario Template (management succession)
  • Scores (P & R): NE: 89 (96); CO: 71/51 (72/59); TE: 70 (80); ST: 49 (56)
• November 1996: GATE v1.0
  • Contains VIE (Vanilla Information Extraction) system
  • VIE = LaSIE v1.5
  • LaSIE v1.5 has essentially the same functionality as LaSIE v1.0, but is embedded in GATE

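The "P & R" scores above combine precision and recall into a single figure. The following is a minimal sketch of that combination, assuming the label refers to the balanced F-measure commonly reported in MUC scoring. The function name `f_measure` and its `beta` parameter are illustrative and are not taken from the slides or from the MUC scoring software.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Combine precision and recall into an F-measure.

    beta = 1.0 weights precision and recall equally (assumed here to
    correspond to the "P & R" figure); beta > 1 favours recall.
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing was found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# The slide lists the coreference score as a pair, 71/51, without saying
# which value is precision and which is recall. With beta = 1 the result
# is the same either way, because the balanced measure is symmetric.
balanced_co = f_measure(71, 51)
print(round(balanced_co, 1))
```

Because the balanced measure is a harmonic mean, it sits closer to the lower of the two values, so a system cannot score well by trading one measure away for the other.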
History: Chronology of LaSIE (cont)
• April 1997: LaSIE v2.0 participates in MUC-7
  • Five tasks: NE, CO, TE, ST and Template Relations (TR), new for MUC-7
  • Scores (P & R): NE: 86 (93); CO: 62 (62); TE: 77 (87); TR: 55 (76); ST: 51 (44)
  • Only site to participate in all 5 tasks (“inside GATE”)
• 1997-2000: LaSIE serves as the basis for a number of IE applications (below)
• October 2000: LaSIE v2.1 (rolled-up changes since v2.0)

History: LaSIE People
• Initial RAs: Hamish Cunningham, Kevin Humphreys, Takahiro Wakao
• Others:
  • Saliha Azzam
  • Mark Hepple
  • Chris Huyck
  • Brian Mitchell
  • Sandy Robertson
  • Pete Rodgers
  • Yorick Wilks

LaSIE: IE Applications
• PASTA
  • Protein Active Site Template Acquisition
  • BBSRC
• EMPathIE
  • Enzyme and Metabolic Pathways IE
  • GlaxoWellcome/Elsevier
• Competitor/Market Intelligence
  • New project launches/person tracking
  • British Gas; Mars Foods
• STOBS
  • Structured Transcription of Broadcast Speech
  • EPSRC
• EXALT
  • Extracting Amendments from Legal Text
• Venns
  • Extracting info from Biographical Dictionaries

LaSIE: Applications (continued)
• TRESTLE
  • Text Retrieval Extraction and Summarisation Technologies for Large Enterprises
  • GlaxoWellcome
• Question Answering
  • TREC-8 and TREC-9 QA Track
• CLARITY
  • Cross-language information retrieval
  • EC FW-5