State Department Cables Information Retrieval System. Fall 2007 LBSC 796 Erica Cooper, Linda Melchor Chris Reed, Jo-Han Rong Dave Rouff, Jess Snyder. Overview. About the collection Nature of the expected users About the search tool Batch evaluation Results User Study Results
State Department Cables Information Retrieval System Fall 2007 LBSC 796 Erica Cooper, Linda Melchor Chris Reed, Jo-Han Rong Dave Rouff, Jess Snyder
Overview • About the collection • Nature of the expected users • About the search tool • Batch evaluation results • User study results • Next steps (features to add, given more time)
About the Collection • The U.S. State Department • Branch of the federal government responsible for U.S. foreign relations, diplomatic policy, and protecting U.S. citizens abroad. • Diplomatic communications from 1973 to 1975 • Behind-the-scenes look at international relations. • World events of the time: end of the Vietnam War, Watergate, George H. W. Bush heads the U.S. Liaison Office in China, then is named DCI
Anticipated User Base • Given the nature of the collection, the IR system will be used by researchers who want to see how U.S. opinion changes as events unfold over time. • Users will be looking for all messages on a topic, not just one. • Users may not know the telegram format for addresses and TAGS.
About the Search Tool • State Cables IR system developed in Java using the following resources • NetBeans IDE • Apache • Lucene toolkit - Information Retrieval tools • Digester - import XML • Two major components • Importing XML formatted messages and building index • User GUI and Index Searcher
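The two components can be illustrated with a minimal sketch that uses only the Java standard library. It is not the project's code: the real system used Lucene for indexing and Digester for XML import, and the `<message>` element, its `id` attribute, and the class name `CableIndexSketch` are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class CableIndexSketch {
    // Inverted index: term -> ids of the messages that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Component 1: parse XML and index the text of each (hypothetical) <message> element. */
    void importXml(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList messages = doc.getElementsByTagName("message");
        for (int i = 0; i < messages.getLength(); i++) {
            Element message = (Element) messages.item(i);
            String id = message.getAttribute("id");
            for (String term : tokenize(message.getTextContent())) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
    }

    /** Component 2: return the ids of messages matching ANY query term (OR semantics). */
    Set<String> search(String query) {
        Set<String> hits = new TreeSet<>();
        for (String term : tokenize(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }

    // Lowercase the text and split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    public static void main(String[] args) {
        String xml = "<cables>"
                + "<message id=\"m1\"><subject>Visit to Tokyo</subject></message>"
                + "<message id=\"m2\"><subject>Baghdad trade talks</subject></message>"
                + "</cables>";
        CableIndexSketch index = new CableIndexSketch();
        index.importXml(xml);
        System.out.println(index.search("tokyo trade")); // prints [m1, m2]
    }
}
```

A toolkit such as Lucene adds relevance scoring, analyzers, and on-disk index storage on top of this basic term-to-document mapping.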
Benefits: Geographic and TAGS abstraction • In the early 1970s, telegram authors were encouraged to be brief, so they left out key terms assumed to be known to the recipient. • [Figure: example search terms paired with abstraction terms (Japan / Tokyo, Africa / XZ, Iraq / IZ) and their hit counts; the individual values are not recoverable from this transcript]
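One way to picture the abstraction step is as query expansion: each user term is looked up in a table and any associated term is added to the query. The sketch below is an assumption about how that might work, not the project's implementation. The three pairs (Japan / Tokyo, Africa / XZ, Iraq / IZ) come from the slide; the direction of the mapping and the names `AbstractionSketch` and `expand` are hypothetical.

```java
import java.util.*;

class AbstractionSketch {
    // Pairs taken from the slide; the lookup direction is an assumption.
    static final Map<String, List<String>> ABSTRACTIONS = new HashMap<>();
    static {
        ABSTRACTIONS.put("japan", Arrays.asList("tokyo")); // geographic abstraction
        ABSTRACTIONS.put("africa", Arrays.asList("xz"));   // TAGS abstraction
        ABSTRACTIONS.put("iraq", Arrays.asList("iz"));     // TAGS abstraction
    }

    /** Returns the user's terms, each followed by any abstraction terms found for it. */
    static List<String> expand(String query) {
        List<String> terms = new ArrayList<>();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) continue;
            terms.add(term);
            terms.addAll(ABSTRACTIONS.getOrDefault(term, Collections.emptyList()));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(expand("Iraq oil")); // prints [iraq, iz, oil]
    }
}
```

With expansion of this kind, a brief telegram that carries only the code IZ can still be found by a user who types "Iraq".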
Batch Evaluation Results • Search terms and abstraction terms are implicitly combined with OR, which increases recall but lowers precision.
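The recall and precision trade-off can be made concrete with a small worked example. The document ids and relevance judgments below are invented toy data, not the batch evaluation results, and the class name `PrecisionRecallSketch` is hypothetical.

```java
import java.util.*;

class PrecisionRecallSketch {
    /** Fraction of retrieved documents that are relevant. */
    static double precision(Set<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) return 0.0;
        return (double) overlap(retrieved, relevant) / retrieved.size();
    }

    /** Fraction of relevant documents that were retrieved. */
    static double recall(Set<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        return (double) overlap(retrieved, relevant) / relevant.size();
    }

    private static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    public static void main(String[] args) {
        // Toy data: four relevant documents in the collection.
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d2", "d3", "d4"));
        // A strict AND query returns few documents, all relevant.
        Set<String> andHits = new HashSet<>(Arrays.asList("d1", "d2"));
        // An OR query returns more documents, including irrelevant ones.
        Set<String> orHits = new HashSet<>(Arrays.asList("d1", "d2", "d3", "d5", "d6", "d7"));

        System.out.printf("AND: precision=%.2f recall=%.2f%n",
                precision(andHits, relevant), recall(andHits, relevant)); // 1.00 and 0.50
        System.out.printf("OR:  precision=%.2f recall=%.2f%n",
                precision(orHits, relevant), recall(orHits, relevant));   // 0.50 and 0.75
    }
}
```

In this toy case the OR query finds one more relevant document (recall rises from 0.50 to 0.75) at the cost of three irrelevant hits (precision falls from 1.00 to 0.50).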
User Study Results • Do novice users find the system easy to learn? • All the volunteers considered the system easy enough for a novice to use. However, one volunteer stated, “A person used to Google would expect more.” • Can users easily learn to formulate effective queries using our system? • Respondents answered yes. • However, observation showed that we should emphasize that Boolean queries can be used: initial result sets were very large because queries were vague.
User Study Results • Are there common mistakes or misunderstandings that can be addressed for a better design? • “AND” should be automatically capitalized so it is understood as a Boolean operator, not a query keyword. • One volunteer said, “It would be nice to be able to see the whole Subject/Title,” so that it is easier to select which messages to read. • What are their expectations for the system? • Users found that the system met their expectations. • One volunteer stated that the system was more “user friendly compared to NARA's current system.”
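The automatic capitalization suggested above could be done as a small preprocessing step before the query reaches the parser (Lucene's classic query syntax recognizes Boolean operators only in upper case). The sketch below is one possible approach, not the project's code; the class name `OperatorNormalizerSketch` is hypothetical, and treating OR and NOT the same way as AND is an assumption that goes beyond the volunteers' comment.

```java
import java.util.*;

class OperatorNormalizerSketch {
    // Operator words to upper-case; only AND was mentioned by the volunteers.
    private static final Set<String> OPERATORS =
            new HashSet<>(Arrays.asList("and", "or", "not"));

    /** Upper-cases standalone and/or/not tokens and leaves all other tokens unchanged. */
    static String normalize(String query) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            if (out.length() > 0) out.append(' ');
            boolean isOperator = OPERATORS.contains(token.toLowerCase(Locale.ROOT));
            out.append(isOperator ? token.toUpperCase(Locale.ROOT) : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("vietnam and ceasefire")); // prints vietnam AND ceasefire
    }
}
```

The trade-off is that users can no longer search for the literal word "and", and this simple version does not skip operator words inside quoted phrases.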
User Study Results • What would they like to see in the design of the system? • Being able to hit the “Enter” key instead of having to click the “Perform Search” button • A limit on the number of hits per search • A wider interface • Summaries of articles in the results • Feedback that tells users their request is being processed • A layout that separates the results to make them easier to read
User Study Results • Were the added features, “geographic abstraction” and “TAGS abstraction,” used? If so, were they useful? • One volunteer used the added features, but could not tell if they worked. • Any suggestions or comments? • Highlight search terms • Jazz it up more visually
Next Steps • Bugs to fix and features to add given additional time • Option to sort results by date rather than score • The telegram DTG (date-time group) is currently interpreted as a string, resulting in string-based sorting. • Warning for large result sets, with an option to cancel the search before committing to wait • Pull the search function out of the button-click/hit-list-click handlers so it persists past the click event • currently the system requires when a message is selected from the hit list • Option to export results to a file • Web GUI • Option to accept or reject proposed abstraction tools • Ability to recognize multi-word search terms
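The DTG sorting bug can be illustrated by parsing each DTG into a date before comparing. The sketch below assumes a DTG layout of day, hour, and minute followed by "Z", a three-letter month, and a two-digit year in the 1900s (for example "051200Z FEB 74"). The deck does not show the actual field format, so the layout, the sample values, and the class name `DtgSortSketch` are all assumptions.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.*;

class DtgSortSketch {
    // Assumed layout: ddHHmm'Z' MMM yy, with two-digit years resolved to 1900-1999.
    private static final DateTimeFormatter DTG = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()                    // accept "FEB" as well as "Feb"
            .appendPattern("ddHHmm'Z' MMM ")
            .appendValueReduced(ChronoField.YEAR, 2, 2, 1900)
            .toFormatter(Locale.ENGLISH);

    static LocalDateTime parse(String dtg) {
        return LocalDateTime.parse(dtg.trim(), DTG);
    }

    /** Returns a copy of the list ordered by the parsed date rather than by string value. */
    static List<String> sortChronologically(List<String> dtgs) {
        List<String> sorted = new ArrayList<>(dtgs);
        sorted.sort(Comparator.comparing(DtgSortSketch::parse));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> dtgs = Arrays.asList("051200Z FEB 74", "011200Z DEC 75", "311200Z JAN 73");

        List<String> asStrings = new ArrayList<>(dtgs);
        Collections.sort(asStrings);
        // String order follows the day-of-month digits, not the date:
        System.out.println(asStrings); // prints [011200Z DEC 75, 051200Z FEB 74, 311200Z JAN 73]

        // Date order:
        System.out.println(sortChronologically(dtgs)); // prints [311200Z JAN 73, 051200Z FEB 74, 011200Z DEC 75]
    }
}
```

An alternative is to store a sortable form of the date (such as `yyyyMMddHHmm`) in the index at import time, so that plain string sorting already yields chronological order.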
Backup GUI Screenshot • …TBD