
TIDES IFE-Bio KickOff Meeting

TIDES IFE-Bio KickOff Meeting. David Anderson, Laurie Damianos, David Day, Lynette Hirschman, Robyn Kozierok, Scott Mardis, Tom McEntee, Chad McHenry, Michael Merideth, Keith Miller, Bev Nunan, Jay Ponte, George Wilson, Flo Reeder, Steve Wohlever. October 17, 2001.


Presentation Transcript


  1. TIDES IFE-Bio KickOff Meeting. David Anderson, Laurie Damianos, David Day, Lynette Hirschman, Robyn Kozierok, Scott Mardis, Tom McEntee, Chad McHenry, Michael Merideth, Keith Miller, Bev Nunan, Jay Ponte, George Wilson, Flo Reeder, Steve Wohlever. October 17, 2001

  2. Agenda • Current Status and Experiments (Laurie) • User Feedback on MiTAP and Exercise (Eric) • Lessons Learned (Laurie) • Architecture Briefing (Jay & Scott) • Geospatial Processing (George) • Schedule (Jay) • Issues and Discussion (All)

  3. Status of MiTAP • Availability: excellent • Available ~100% to users inside, outside firewall • 12 individual user accounts, 6 group accounts • 8 daily users on average, mostly repeat users • Data capture: rich & dynamic • ~70 working sources, new source added in 30 min • Average 5.8K msgs/day, 1 min latency • 250K msgs total in system • Analysis tools: improving • Messages in 6 languages (with COTS translation) • Sorted into 173 newsgroups • Color coded tagging (pers/org/loc/disease) • Popup summarization • Product: need to understand how system is being used

  4. MiTAP Activity: Messages and Users Over Time [chart; annotated events: Attack on America, Aug Experiment]

  5. Performance Summary: Sudan 1999 vs Attack on America 2001

  6. Disease of the Month Experiments

  7. Feedback from Eric • Report on Bio-Threats • Deployment for N2 • MiTAP Status • Utility • Usability • Accessibility

  8. Lessons Learned • Availability: • User accounts for production system • No training needed (instructions available on website) • Stronger security (e.g., intrusion detection) • Better back-up, monitoring of throughput • More processing power • Capture: • Reduced latency on scheduled downloads and spidering, hourly capture of headlines • Distributed capture processing • Better capture of formatted sources • Some badly filtered, excess volume causes backlog • Poor zoning/formatting/decoding of some sources

  9. Lessons Learned (2) • Analysis: • Improved search (e.g., by date/relevance, popups, integrated with news server) • Improved “normalization” of names, regions • Too much data! Need better filtering, topic detection & clustering, summarization • Better MT, support for Arabic • Q&A • Geospatial & temporal visualization • Advanced search • Better information extraction

  10. Lessons Learned (3) • Product: • No environment for preparing reports • Workspace • Drag & drop repository • Editing capabilities • Multidoc summarization • Collaboration feature (chat & shared workspace)

  11. Catalyst Update: Recent work • Usability for developers • Logger • Configuration file refinements • Improvements for distributed systems • Redesign of I/O polling procedures • Explicit synchronization feature for Language Processor developers

  12. Logger [diagram; labels: catlogger, Tokenize, MetaData, Documents, Entity Extraction, Entities, Word.Text, Sentence, Sentence Tagger, Word.POS]

  13. In progress • Usability for developers • Monitor (system status capability) • Native XML I/O! (for ease of debugging & for lightweight Catalyst) • Information retrieval • Integration between Catalyst and new IR engine • Pushing stream filters toward archived streams • Documentation

  14. Monitor [diagram; labels: Monitor, Tokenize, MetaData, Documents, Entity Extraction, Entities, Word.Text, Sentence, Sentence Tagger, Word.POS]

  15. XML I/O • Present: XML doc, XML to Catalyst, Event Extraction, Catalyst to XML, XML doc • With XML I/O feature: XML doc, Event Extraction, XML doc • Easier to debug!

  16. XML I/O • With XML I/O feature [diagram; labels: Catalyst Processes, XML, Wrapper Process, Non-Catalyst Process] • Easier path to integrate existing language processing systems!

  17. Archived streams • Filter criteria must be pushed upstream from their origination point toward the indices, so that processing is reduced to little more than is absolutely necessary. [diagram; labels: Question Answering Application, Coreference, Indices, XML doc, Index Refinement, Candidate Selection, Answer Extraction, filter criteria, Origination point]
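
The benefit of pushing filter criteria upstream can be illustrated with a toy example. This is only a sketch, not Catalyst code: the documents, the inverted index, and the `expensive_processing` function are hypothetical stand-ins for archived streams, IR indices, and downstream language processing.

```python
# Hypothetical archived documents (stand-ins for archived annotation streams).
documents = {
    1: "anthrax outbreak reported in Florida",
    2: "football scores from the weekend",
    3: "anthrax vaccine supplies reviewed",
    4: "weather forecast for Boston",
}

# Inverted index: term -> ids of documents containing that term.
index = {}
for doc_id, text in documents.items():
    for term in text.lower().split():
        index.setdefault(term, set()).add(doc_id)


def expensive_processing(doc_id, processed):
    """Stand-in for costly downstream processing; records which documents it touched."""
    processed.append(doc_id)
    return documents[doc_id].upper()


def filter_downstream(term):
    """Process every document, then filter at the end of the pipeline."""
    processed = []
    results = [expensive_processing(d, processed) for d in sorted(documents)]
    kept = [r for r in results if term.upper() in r.split()]
    return kept, len(processed)


def filter_upstream(term):
    """Apply the filter at the index, so only matching documents are processed."""
    processed = []
    candidates = sorted(index.get(term, set()))
    kept = [expensive_processing(d, processed) for d in candidates]
    return kept, len(processed)


late_kept, late_cost = filter_downstream("anthrax")
early_kept, early_cost = filter_upstream("anthrax")
print(late_cost, early_cost)
```

Both functions keep the same documents, but the upstream version runs the expensive step only on the documents the index selects.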

  18. For the Midterm - 12/12/2001 • Monitor • XML I/O support in the Catalyst library • Lightweight Catalyst design • Documentation

  19. Catalyst collaborations • Qanda • Catalyst-based Qanda used for TREC • Catalyst-based Qanda deployed at AFIWC • Information retrieval • Archived annotation streams (for creating IR indexes) • Seekable streams (for processing IR queries) • Other projects • ACE/Alembic (Information Extraction) • Audio hot-spotting (Speech Retrieval) • Reading-comp (Question Answering)

  20. Document Management • Process scheduling • System linkage • Inter-site cooperation support • User features

  21. Process Scheduling • Problem: MiTAP needs the ability to prioritize sources • ‘Catching up’ on a new source shouldn’t prevent timely processing of an important existing source • Solution: • Preprocessing daemon will notify scheduler of incoming content • Scheduler assigns jobs to available resources based on priority • Status: • Prototype scheduler delivered (Ponte) • Preprocessing daemon rewrite in mid-November (Wohlever)
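
The scheduling idea on this slide can be sketched with a priority queue. This is a minimal illustration and not the delivered prototype: the `Scheduler` class, the source names, and the priority values are all hypothetical.

```python
import heapq
import itertools


class Scheduler:
    """Toy priority scheduler: a lower priority number means a more urgent job."""

    def __init__(self):
        self._queue = []
        # The counter breaks ties so jobs of equal priority run in arrival order.
        self._counter = itertools.count()

    def notify(self, source, priority):
        """Called by a (hypothetical) preprocessing daemon when content arrives."""
        heapq.heappush(self._queue, (priority, next(self._counter), source))

    def next_job(self):
        """Return the most urgent pending source, or None if nothing is queued."""
        if not self._queue:
            return None
        _, _, source = heapq.heappop(self._queue)
        return source


scheduler = Scheduler()
# A backlog from a newly added source arrives first...
for _ in range(3):
    scheduler.notify("new-source-backlog", priority=5)
# ...followed by one message from an important existing source.
scheduler.notify("important-wire", priority=1)

order = [scheduler.next_job() for _ in range(4)]
print(order)
```

In this sketch the important source is served before the backlog even though it arrived last, which is the behavior the slide asks for.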

  22. System Linkage • Problem: Ever notice how new features tend to only apply to new content? • MiTAP is not flexible - difficult to: • Reprocess and repost a message that has errors • Find the original source document • Etc. • Currently, retroactive changes require 11th hour hacking (or sometimes 12th hour hacking) • Solution: Keep database of linkage information to make the system more flexible • Status: • Additional information currently being logged • Linkage database - March
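
One way to picture a linkage database is a table that ties each posted message back to its source and to the processing version that produced it. The schema below is an assumption made for illustration (table name, column names, and sample rows are hypothetical), using an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per posted message.
conn.execute(
    """CREATE TABLE linkage (
           message_id       TEXT PRIMARY KEY,
           source_url       TEXT NOT NULL,
           raw_path         TEXT NOT NULL,
           pipeline_version INTEGER NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO linkage VALUES (?, ?, ?, ?)",
    [
        ("msg-001", "http://example.org/a", "/archive/a.html", 1),
        ("msg-002", "http://example.org/b", "/archive/b.html", 2),
    ],
)


def find_source(message_id):
    """Find the original source document for a posted message."""
    return conn.execute(
        "SELECT source_url, raw_path FROM linkage WHERE message_id = ?",
        (message_id,),
    ).fetchone()


def needs_reprocessing(current_version):
    """List messages produced by an older pipeline version."""
    rows = conn.execute(
        "SELECT message_id FROM linkage WHERE pipeline_version < ? ORDER BY message_id",
        (current_version,),
    ).fetchall()
    return [row[0] for row in rows]


print(find_source("msg-001"))
print(needs_reprocessing(2))
```

With records like these, reprocessing old content for a new feature becomes a query rather than last-minute hacking.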

  23. Inter-site Cooperation Support • Problem: Collaboration with other TIDES contractors who have large legacy systems • Issue of communication more than scalability • Solution: • Linkage database for annotations, similar to the one used for system maintenance • Web client-server communication • Path to scalable solution w/richer interactions • Status: • Data management - January • Communications: investigation of relevant protocols and preliminary design - completed • Native XML support for Catalyst - December

  24. User features • Problem: MiTAP helps you find good information, then what? • Solution: • Web accessible support for user views and data organization to assist in reporting and analysis • Automated view construction/feedback incorporating additional TIDES technologies • Status: • Schema for v.1 of workspace developed (Ponte, Anderson) • Supporting code in progress (Ponte) • Prototype - December

  25. Geo-Spatial Normalization - Goal • We have: Text containing place names • We want: Points on maps • Process: • Extract place names • Look up places on a list • Determine Lat-Long • Display • Example: Seattle, 47.6 N 122.317 W • Problems: • Place name not on list • More than one place with same name
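
The lookup step and its two problems can be sketched as follows. The gazetteer here is a hypothetical three-entry list; the Seattle coordinates come from the slide, and the Paris coordinates are approximate values added for illustration (west longitudes are negative).

```python
# Hypothetical gazetteer: place name -> list of (full name, latitude, longitude).
GAZETTEER = {
    "Seattle": [("Seattle, Washington, USA", 47.6, -122.317)],
    "Paris": [
        ("Paris, France", 48.857, 2.352),
        ("Paris, Texas, USA", 33.661, -95.556),
    ],
}


def lookup(name):
    """Look a place name up and report which of the three cases applies."""
    entries = GAZETTEER.get(name, [])
    if not entries:
        return ("not_on_list", None)   # problem 1: place name not on list
    if len(entries) > 1:
        return ("ambiguous", entries)  # problem 2: more than one place with same name
    return ("resolved", entries[0])    # a single entry gives a point on the map


for name in ("Seattle", "Paris", "Atlantis"):
    print(name, lookup(name)[0])
```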

  26. Geo-Spatial Normalization - Solution • Part 1: A significant portion of the references can be resolved using easy methods. • Unambiguous: Seattle, Toulouse • Ambiguous: Paris, Washington • Disambiguated: Paris, Texas; The State of Washington • Part 2: Use the “easily resolved” references as training data for a machine learning classifier which will distinguish the rest.
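
Part 1 and the hand-off to Part 2 might look like the sketch below. It is an assumption-laden illustration: the gazetteer, the qualifier-matching rule, and the sample mentions are hypothetical, and the classifier itself is not shown. Easily resolved references are collected as labeled examples; the rest are set aside for the classifier.

```python
# Hypothetical gazetteer: place name -> candidate entries.
GAZETTEER = {
    "Seattle": ["Seattle, Washington"],
    "Toulouse": ["Toulouse, France"],
    "Paris": ["Paris, France", "Paris, Texas"],
}


def resolve_easy(name, qualifier=None):
    """Return a gazetteer entry if the reference is easy to resolve, else None."""
    entries = GAZETTEER.get(name, [])
    if len(entries) == 1:
        return entries[0]  # unambiguous name
    if qualifier is not None:
        matches = [e for e in entries if e.endswith(", " + qualifier)]
        if len(matches) == 1:
            return matches[0]  # disambiguated by an explicit qualifier
    return None


# Each mention: (place name, qualifier found next to it or None, surrounding text).
mentions = [
    ("Seattle", None, "flights to Seattle resumed"),
    ("Paris", "Texas", "a clinic in Paris, Texas reported cases"),
    ("Paris", None, "officials in Paris declined to comment"),
]

training, unresolved = [], []
for name, qualifier, context in mentions:
    entry = resolve_easy(name, qualifier)
    if entry is not None:
        training.append((context, entry))   # labeled example for Part 2
    else:
        unresolved.append((name, context))  # left for the classifier

print(len(training), len(unresolved))
```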

  27. Geo-Spatial Normalization - Plans • For MidTerm (Dec. 12, 2001) • Detect a significant portion of the “easily resolvable” references • Display with some map tool (Web delivery desirable) • After MidTerm (May, 2002) • Try to find more “easily resolvable” references • Do the machine learning part • Integrate with other mapping tools

  28. IFE-Bio Schedule

  29. Architecture Schedule

  30. Issues and Discussion • How is MiTAP currently being used? • Who are the users? • What are the users doing? • What do users want? • Prioritization of issues • Integrated feasibility experiment versus operational prototype: • Possible deployment vs integration of other TIDES technologies • (Do we need to adjust our priorities?) • Along what dimensions should we optimize? • Availability, capture, analysis, presentation
