TIDES IFE-Bio KickOff Meeting

TIDES IFE-Bio KickOff Meeting • David Anderson, Laurie Damianos, David Day, Lynette Hirschman, Robyn Kozierok, Scott Mardis, Tom McEntee, Chad McHenry, Michael Merideth, Keith Miller, Bev Nunan, Jay Ponte, George Wilson, Flo Reeder, Steve Wohlever • October 17, 2001

Agenda • Current Status and Experiments (Laurie) • User Feedback on MiTAP and Exercise (Eric) • Lessons Learned (Laurie) • Architecture Briefing (Jay & Scott) • Geospatial Processing (George) • Schedule (Jay) • Issues and Discussion (All)

Status of MiTAP • Availability: excellent • Available ~100% to users inside and outside the firewall • 12 individual user accounts, 6 group accounts • 8 daily users on average, mostly repeat users • Data capture: rich & dynamic • ~70 working sources, new source added in 30 min • Average 5.8K msgs/day, 1 min latency • 250K msgs total in system • Analysis tools: improving • Messages in 6 languages (with COTS translation) • Sorted into 173 newsgroups • Color-coded tagging (pers/org/loc/disease) • Popup summarization • Product: need to understand how system is being used

MiTAP Activity: Messages and Users Over Time [Chart; annotated events: Attack on America, Aug Experiment]

Performance Summary: Sudan 1999 vs Attack on America 2001

Disease of the Month Experiments

Feedback from Eric • Report on Bio-Threats • Deployment for N2 • MiTAP Status • Utility • Usability • Accessibility

Lessons Learned • Availability: • User accounts for production system • No training needed (instructions available on website) • Stronger security (e.g., intrusion detection) • Better back-up, monitoring of throughput • More processing power • Capture: • Reduced latency on scheduled downloads and spidering, hourly capture of headlines • Distributed capture processing • Better capture of formatted sources • Some badly filtered, excess volume causes backlog • Poor zoning/formatting/decoding of some sources

Lessons Learned (2) • Analysis: • Improved search (e.g., by date/relevance, popups, integrated with news server) • Improved “normalization” of names, regions • Too much data! Need better filtering, topic detection & clustering, summarization • Better MT, support for Arabic • Q&A • Geospatial & temporal visualization • Advanced search • Better information extraction

Lessons Learned (3) • Product: • No environment for preparing reports • Workspace • Drag & drop repository • Editing capabilities • Multidoc summarization • Collaboration feature (chat & shared workspace)

Catalyst Update: Recent work • Usability for developers • Logger • Configuration file refinements • Improvements for distributed systems • Redesign of I/O polling procedures • Explicit synchronization feature for Language Processor developers

[Diagram: Logger (catlogger) attached to a processing pipeline. Labels: Tokenize, MetaData, Documents, Entity Extraction, Entities, Word.Text, Sentence, Sentence Tagger, Word.POS]

In progress • Usability for developers • Monitor (system status capability) • Native XML I/O! (for ease of debugging & for lightweight Catalyst) • Information retrieval • Integration between Catalyst and new IR engine • Pushing stream filters toward archived streams • Documentation

[Diagram: Monitor attached to a processing pipeline. Labels: Tokenize, MetaData, Documents, Entity Extraction, Entities, Word.Text, Sentence, Sentence Tagger, Word.POS]

XML I/O • Present: XML doc → XML to Catalyst → Event Extraction → Catalyst to XML → XML doc • With XML I/O feature: XML doc → Event Extraction → XML doc • Easier to debug!

XML I/O (2) • With XML I/O feature: Catalyst Processes ↔ XML ↔ Wrapper Process around a Non-Catalyst Process • Easier path to integrate existing language processing systems!
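
A minimal sketch of the wrapper idea, assuming a simple XML document layout that the slides do not specify. The element names (`doc`, `text`, `annotations`, `entity`) and the `legacy_tagger` function are hypothetical stand-ins; nothing here reflects the actual Catalyst API or its XML format.

```python
import xml.etree.ElementTree as ET


def legacy_tagger(text):
    """Hypothetical stand-in for an existing non-Catalyst tool.

    Returns (start, end, label) character spans for a fixed list of names.
    """
    spans = []
    for name in ("Seattle", "Toulouse"):
        start = text.find(name)
        if start != -1:
            spans.append((start, start + len(name), "loc"))
    return spans


def wrap(xml_in):
    """Read an XML document, run the legacy tool, and return XML with its output attached."""
    doc = ET.fromstring(xml_in)
    text = doc.findtext("text", default="")
    annotations = ET.SubElement(doc, "annotations")
    for start, end, label in legacy_tagger(text):
        # Standoff annotation: character offsets into the unchanged text
        ET.SubElement(annotations, "entity", start=str(start), end=str(end), type=label)
    return ET.tostring(doc, encoding="unicode")


sample_text = "Flights from Seattle to Toulouse"
xml_out = wrap(f"<doc><text>{sample_text}</text></doc>")
entities = ET.fromstring(xml_out).findall("./annotations/entity")
print(xml_out)
```

Because the wrapper only reads and writes XML, the wrapped tool needs no knowledge of the system on either side of it.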

Archived streams • Filter criteria must be pushed upstream from their origination point toward the indices so that processing is reduced to little more than is absolutely necessary. [Diagram: Question Answering Application. Labels: Coreference, Indices, XML doc, Index Refinement, Candidate Selection, Answer Extraction, filter criteria, Origination point]
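
The effect of pushing a filter toward the index can be illustrated with a toy example. This is a sketch under stated assumptions: the documents, the one-term filter, and the `expensive_extraction` stand-in are all invented for illustration and do not represent the Qanda or Catalyst components named on the slide.

```python
from collections import defaultdict

# Invented sample documents
DOCS = {
    "d1": "anthrax cases reported in Florida",
    "d2": "stock markets recover",
    "d3": "new anthrax vaccine trial announced",
}


def build_index(docs):
    """Inverted index: term -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def expensive_extraction(text, calls):
    """Stand-in for a costly downstream step; records each call."""
    calls.append(text)
    return text.upper()


def answer_without_pushdown(docs, term):
    # Process everything, then filter at the point where the criterion originated
    calls = []
    results = [expensive_extraction(t, calls) for t in docs.values()]
    kept = [r for r in results if term.upper() in r.split()]
    return kept, len(calls)


def answer_with_pushdown(docs, index, term):
    # Apply the filter at the index, so only candidates reach the costly step
    calls = []
    candidates = sorted(index.get(term, set()))
    kept = [expensive_extraction(docs[d], calls) for d in candidates]
    return kept, len(calls)


index = build_index(DOCS)
slow_result, slow_calls = answer_without_pushdown(DOCS, "anthrax")
fast_result, fast_calls = answer_with_pushdown(DOCS, index, "anthrax")
print(slow_calls, fast_calls)
```

Both paths return the same answers; the pushed-down version simply invokes the costly step on fewer documents.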

For the Midterm - 12/12/2001 • Monitor • XML I/O support in the Catalyst library • Lightweight Catalyst design • Documentation

Catalyst collaborations • Qanda • Catalyst-based Qanda used for TREC • Catalyst-based Qanda deployed at AFIWC • Information retrieval • Archived annotation streams (for creating IR indexes) • Seekable streams (for processing IR queries) • Other projects • ACE/Alembic (Information Extraction) • Audio hot-spotting (Speech Retrieval) • Reading-comp (Question Answering)

Document Management • Process scheduling • System linkage • Inter-site cooperation support • User features

Process Scheduling • Problem: MiTAP needs the ability to prioritize sources • ‘Catching up’ on a new source shouldn’t prevent timely processing of an important existing source • Solution: • Preprocessing daemon will notify scheduler of incoming content • Scheduler assigns jobs to available resources based on priority • Status: • Prototype scheduler delivered (Ponte) • Preprocessing daemon rewrite in mid-November (Wohlever)
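
The notify-then-assign design above can be sketched with a priority queue. This is an illustrative sketch only, not the delivered prototype: the `Scheduler` class, the `notify` and `next_job` names, and the numeric priority scale (lower means more important) are assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                        # lower number = more important source (assumed scale)
    seq: int                             # arrival order, used to break ties fairly
    source: str = field(compare=False)
    message_id: str = field(compare=False)


class Scheduler:
    def __init__(self):
        self._queue = []
        self._counter = itertools.count()

    def notify(self, source, message_id, priority):
        """Called by the preprocessing daemon when new content arrives."""
        job = Job(priority, next(self._counter), source, message_id)
        heapq.heappush(self._queue, job)

    def next_job(self):
        """Hand the highest-priority pending job to an available resource."""
        return heapq.heappop(self._queue) if self._queue else None


scheduler = Scheduler()
# A newly added source dumps a backlog first...
for i in range(3):
    scheduler.notify("new-source-backlog", f"backlog-{i}", priority=5)
# ...then an important existing source delivers a message.
scheduler.notify("important-source", "urgent-0", priority=1)

order = []
while (job := scheduler.next_job()) is not None:
    order.append(job.message_id)
print(order)  # ['urgent-0', 'backlog-0', 'backlog-1', 'backlog-2']
```

The important message is processed first even though it arrived last, which is the behavior the problem statement asks for.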

System Linkage • Problem: Ever notice how new features tend to only apply to new content? • MiTAP is not flexible - difficult to: • Reprocess and repost a message that has errors • Find the original source document • Etc. • Currently, retroactive changes require 11th hour hacking (or sometimes 12th hour hacking) • Solution: Keep database of linkage information to make the system more flexible • Status: • Additional information currently being logged • Linkage database - March
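
One possible shape for such a linkage database is sketched below using an in-memory SQLite table. The table layout, column names, and helper functions are hypothetical; the slides do not describe the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE linkage (
           message_id  TEXT PRIMARY KEY,  -- id of the posted message
           source_name TEXT NOT NULL,     -- capture source that produced it
           source_path TEXT NOT NULL,     -- location of the original document
           newsgroup   TEXT NOT NULL,     -- where the message was posted
           pipeline    TEXT NOT NULL      -- processing version that produced it
       )"""
)


def record(message_id, source_name, source_path, newsgroup, pipeline):
    """Log the linkage for one posted message."""
    conn.execute(
        "INSERT INTO linkage VALUES (?, ?, ?, ?, ?)",
        (message_id, source_name, source_path, newsgroup, pipeline),
    )


def original_document(message_id):
    """Find the original source document for a posted message."""
    row = conn.execute(
        "SELECT source_path FROM linkage WHERE message_id = ?", (message_id,)
    ).fetchone()
    return row[0] if row else None


def needs_reprocessing(current_pipeline):
    """List messages produced by an older pipeline, so new features can reach old content."""
    rows = conn.execute(
        "SELECT message_id FROM linkage WHERE pipeline <> ? ORDER BY message_id",
        (current_pipeline,),
    )
    return [r[0] for r in rows]


# Invented example records
record("msg-001", "wire-a", "/archive/wire-a/0001.html", "region.africa", "v1")
record("msg-002", "wire-b", "/archive/wire-b/0002.html", "disease.ebola", "v2")
print(original_document("msg-001"), needs_reprocessing("v2"))
```

With this information on hand, reprocessing and reposting a message becomes a lookup rather than a last-minute hack.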

Inter-site Cooperation Support • Problem: Collaboration with other TIDES contractors who have large legacy systems • Issue of communication more than scalability • Solution: • Linkage database for annotations, similar to the one used for system maintenance • Web client server communication • Path to scalable solution w/richer interactions • Status: • Data management - January • Communications: investigation of relevant protocols and preliminary design - completed • Native XML support for Catalyst - December

User features • Problem: MiTAP helps you find good information, then what? • Solution: • Web accessible support for user views and data organization to assist in reporting and analysis • Automated view construction/feedback incorporating additional TIDES technologies • Status: • Schema for v.1 of workspace developed (Ponte, Anderson) • Supporting code in progress (Ponte) • Prototype - December

Geo-Spatial Normalization - Goal • We have: Text containing place names • We want: Points on maps • Process: • Extract place names • Look up places on a list • Determine Lat-Long • Display • Example: Seattle, 47.6 N 122.317 W • Problems: • Place name not on list • More than one place with same name
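
The lookup step and its two failure modes can be sketched as follows. The `GAZETTEER` contents are illustrative: the Seattle entry comes from the slide, while the other coordinates are approximate values added for the example, and the `resolve` function and its status labels are assumptions.

```python
# Place name -> list of (latitude, longitude); west longitudes are negative.
GAZETTEER = {
    "Seattle": [(47.6, -122.317)],               # from the slide: 47.6 N 122.317 W
    "Toulouse": [(43.6, 1.44)],                  # approximate, illustrative
    "Paris": [(48.86, 2.35), (33.66, -95.56)],   # approximate: Paris, France and Paris, Texas
}


def resolve(place_name):
    """Return (status, result) for one extracted place name.

    status is "resolved", "ambiguous" (more than one place with the same name),
    or "unknown" (place name not on the list).
    """
    candidates = GAZETTEER.get(place_name, [])
    if not candidates:
        return "unknown", None
    if len(candidates) > 1:
        return "ambiguous", candidates
    return "resolved", candidates[0]


for name in ("Seattle", "Paris", "Atlantis"):
    print(name, resolve(name))
```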

Geo-Spatial Normalization - Solution • Solution, Part 1: A significant portion of the references can be resolved using easy methods. • Unambiguous: Seattle, Toulouse • Ambiguous: Paris, Washington • Disambiguated: Paris, Texas; The State of Washington • Solution, Part 2: Use the “easily resolved” references as training data for a machine learning classifier which will distinguish the rest.
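
A toy sketch of Part 2, assuming a simple context-word voting scheme in place of the machine learning classifier, which the slides do not specify. The training sentences, the stopword list, and the `train` and `classify` names are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "in", "of", "a", "near", "from", "and"}


def tokens(text):
    """Lowercase alphabetic tokens with a few stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def train(examples):
    """examples: (referent, context) pairs taken from easily resolved mentions."""
    model = defaultdict(Counter)
    for referent, context in examples:
        model[referent].update(tokens(context))
    return model


def classify(model, candidates, context):
    """Pick the candidate whose training contexts share the most words with this one."""
    words = tokens(context)
    scores = {c: sum(model[c][w] for w in words) for c in candidates}
    best = max(candidates, key=lambda c: scores[c])
    return best if scores[best] > 0 else None   # None: no evidence either way


# Mentions that were already disambiguated in the text serve as training data.
model = train([
    ("Paris, Texas", "the rodeo in Paris, Texas drew ranchers from Lamar County"),
    ("Paris, France", "the Louvre in Paris, France reopened near the Seine"),
])
candidates = ["Paris, Texas", "Paris, France"]
guess_a = classify(model, candidates, "officials in Paris met near the Seine")
guess_b = classify(model, candidates, "a rodeo near Paris")
guess_c = classify(model, candidates, "stock markets fell")
print(guess_a, guess_b, guess_c)
```

A real classifier would need far more training data and better features; the point here is only that the labels come for free from the easily resolved references.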

Geo-Spatial Normalization - Plans • For MidTerm (Dec. 12, 2001) • Detect a significant portion of the “easily resolvable” references • Display with some map tool (Web delivery desirable) • After MidTerm (May, 2002) • Try to find more “easily resolvable” references • Do the machine learning part • Integrate with other mapping tools

IFE-Bio Schedule

Architecture Schedule

Issues and Discussion • How is MiTAP currently being used? • Who are the users? • What are the users doing? • What do users want? • Prioritization of issues • Integrated feasibility experiment versus operational prototype: • Possible deployment vs integration of other TIDES technologies • (Do we need to adjust our priorities?) • Along what dimensions should we optimize? • Availability, capture, analysis, presentation
