Text Analysis Using Automated Language Translators CDT John Stanford MAJ Ian McCulloh
Agenda
• Overview and Hypothesis
• Literature Review
• Motivation (Radio Address Case Study)
• Arabic Translation Data
• Conclusions and Recommendations
Overview and Hypothesis
• Text analysis is a useful tool for gathering intelligence.
• A language barrier makes text analysis harder in non-English-speaking regions.
• Hiring human translators to translate texts into English is slow, expensive, and possibly a security issue.
• Hypothesis: Output from automated machine translators such as the Forward Area Language Converter (FALCon) is difficult for the average person to understand, but is just as useful for text analysis as human-translated text.
Literature Review
• This project relates to two ARL projects: FALCon and the ARL Dynamic Network Analysis Lab.
• Language can be modeled mathematically as a network of concepts using an adjacency matrix (Sowa, 1984).
• Preprocessing steps such as stemming, deletion, and thesaurus application prepare a text for analysis (Carley and Diesner, 2004).
• AutoMap, being developed by Carnegie Mellon University, inputs texts and outputs adjacency matrices.
• ORA, also being developed by CMU, inputs the adjacency matrices and outputs the mental models (Carley and Reminga, 2004).
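The preprocessing and adjacency-matrix ideas above can be sketched in a few lines of Python. This is only an illustration, not AutoMap's actual procedure: the delete list, the thesaurus entries, the function names, and the windowed co-occurrence rule are all assumptions made for the example, and stemming is left out for brevity.

```python
# Hypothetical delete list and thesaurus, invented for illustration only.
DELETE_LIST = {"the", "a", "of", "and", "to", "in", "is"}
THESAURUS = {"attack": "violence", "attacks": "violence", "war": "violence"}


def preprocess(text):
    """Lowercase, strip punctuation, apply deletion, then the thesaurus."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [t for t in tokens if t and t not in DELETE_LIST]
    return [THESAURUS.get(t, t) for t in tokens]


def adjacency_matrix(concepts, window=2):
    """Link concepts that fall inside a sliding window of `window` tokens.

    Returns the sorted vocabulary and a symmetric matrix of link counts.
    """
    vocab = sorted(set(concepts))
    index = {c: i for i, c in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for i, source in enumerate(concepts):
        for target in concepts[i + 1 : i + window]:
            if source != target:  # ignore self-links
                matrix[index[source]][index[target]] += 1
                matrix[index[target]][index[source]] += 1
    return vocab, matrix
```

A larger `window` links concepts that are farther apart in the text, which produces a denser network.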
Radio Address Study
• 94 of the President’s weekly radio addresses analyzed
• From after Sep 11th to after the beginning of OIF (15 Sep 2001 to 21 June 2003)
• Concept of ‘violence’ plotted on timeline; high occurrence after Sep 11th and leading up to OIF
[Figure: timeline of the ‘violence’ concept, with axis marks at 15 SEP 2001, 27 JUL 2002, and 21 JUN 2003. Annotated events: 12 SEP 2002, George Bush speaks to UN General Assembly; 20 MAR 2003, United States invades Iraq.]
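A timeline like the one described above could be produced roughly as follows. This is a minimal sketch under stated assumptions: the term set standing in for the ‘violence’ concept is hypothetical, and the measure used here (share of tokens per address) is one plausible choice, not necessarily what the study plotted.

```python
from datetime import date

# Hypothetical terms mapped to the 'violence' concept (an assumption).
VIOLENCE_TERMS = {"war", "attack", "terror", "weapons"}


def concept_rate(text, terms=VIOLENCE_TERMS):
    """Fraction of tokens in `text` that belong to the concept's term set."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in terms for t in tokens) / len(tokens)


def concept_timeline(addresses, terms=VIOLENCE_TERMS):
    """Take (date, text) pairs and return (date, rate) pairs in date order."""
    return sorted((d, concept_rate(text, terms)) for d, text in addresses)
```

Each address contributes one point, so the 94 addresses would give 94 points between 15 Sep 2001 and 21 June 2003.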
Arabic Text Analysis
• Arabic translated using CyberTrans, part of the FALCon package.
• 22 Arabic articles from the Department of State’s news site analyzed (US Dept of State, 2006).
Analysis Results
• Top concepts for the two methods of translation are the same in 16 of the 22 articles.
• Top concept in the human-translated text is in the top three machine-translated concepts for all articles.
• When the methods differ, the human translation isn’t necessarily better.
[Figure panels: Human, Machine]
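The two agreement measures reported above (same top concept, and human top concept within the machine top three) can be computed with a short sketch like this one. The function names and the input layout, one pair of concept lists per article, are assumptions for illustration; the original analysis used AutoMap and ORA output.

```python
from collections import Counter


def top_concepts(concepts, n=3):
    """Return the n most frequent concepts, most frequent first."""
    return [c for c, _ in Counter(concepts).most_common(n)]


def compare_translations(pairs):
    """Compare human and machine concept lists, one pair per article.

    Returns (same_top, human_top_in_machine_top3, total_articles).
    """
    same_top = 0
    in_top3 = 0
    for human, machine in pairs:
        human_top = top_concepts(human, 1)
        machine_top3 = top_concepts(machine, 3)
        if not human_top or not machine_top3:
            continue  # an empty text cannot agree with anything
        if human_top[0] == machine_top3[0]:
            same_top += 1
        if human_top[0] in machine_top3:
            in_top3 += 1
    return same_top, in_top3, len(pairs)
```

Applied to the 22 article pairs, the first two returned counts would correspond to the 16-of-22 and 22-of-22 figures on this slide.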
Conclusions and Recommendations
• Automated text analysis makes it fast and economical to look at trends in local publications of strategically significant regions over either time or space.
• Detailed statistical analysis must still be done on this data.
• Intelligence agencies that have access to large volumes of REDFOR data should run this kind of text analysis to verify that it works as well on REDFOR data as on BLUFOR data.
• FALCon development should continue and possibly be expanded to other languages such as Farsi.
Works Cited
Bush, George. (2001-03). “President Bush’s Radio Addresses by date and topic.” Washington, DC: Office of the Press Secretary. Available from <http://www.whitehouse.gov/news/radio/index.html>.
Carley, Kathleen and Diesner, Jana. (2004). Revealing Social Structure from Texts: Meta-Matrix Text Analysis as a novel method for Network Text Analysis. Causal Mapping for Information Systems and Technology Research: Approaches, Advances, and Illustrations. Harrisburg, PA: Idea Group Publishing.
Sowa, J.F. (1984). Conceptual Structures: Information Processing in Mind and Machine. Reading, MA: Addison-Wesley.
US Dept of State. (2006). “News from Washington.” Washington, DC: Office of the Press Secretary. Available from <http://usinfo.state.gov/usinfo/products/washfile.html>.
Dept of Mathematical Sciences, United States Military Academy
Dynamic Network Analysis Lab, Army Research Lab
Questions?