Caitlin Rowe, Jed Lee, Max Nugmanov & Ruben Vargas. Advisor: David Beskow. Data Fusion for Regionally Aligned Units
Agenda for Project Proposal • Introduce Regionally Aligned Units • Problem Statement • Framing the Problem • Data • Model • Results
Recent Ebola Virus Outbreak • October 2014: U.S. Army Africa deployed 3,200 soldiers to Monrovia, Liberia, as the Joint Force Command for Operation United Assistance, to: • support interagency humanitarian efforts • supervise construction of Ebola treatment units • These soldiers and leaders deployed into an austere, unfamiliar environment and problem set • It is important that soldiers understand the operating environment: • Influential actors • Key infrastructure
Difficult Environments • Hybrid Threats • Humanitarian Assistance • Natural Disaster • Regional Conflict
Problem Statement Influential People? Important Places? We will fuse open-source data to identify influential state and non-state actors and infrastructure in order to help tactical leaders of regionally aligned units understand and engage their areas of interest before and during temporary deployments to austere environments.
Framing the Problem: The Operations Process • Visualize (taken from ADRP 5-0) • Build and Maintain Situational Understanding (taken from ADRP 5-0)
Framing the Problem: OSINT (Open Source Intelligence) • OSINT: “intelligence that is produced from publicly available information and is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement” (congressional findings) • Open-source data contains a wealth of information that can help military leaders understand these complex environments • Creation of the Open Source Center (OSC) under the CIA • Working to put OSINT on equal footing with all other intelligence sources
Data: Global Knowledge Graph • The Global Knowledge Graph (GKG) is part of the GDELT** Project • GDELT is built using an enhanced TABARI algorithm • GKG is an open-source database that “connects the world's people, organizations, locations, themes, counts, and emotions into a single holistic network over the entire planet” **GDELT is the Global Database of Events, Language, and Tone
Data: TABARI Algorithm • Text Analysis By Augmented Replacement Instructions • Machine coding of international event data using pattern recognition and simple grammatical parsing • Designed to extract information from short news articles • Applied to ~80k English articles daily
Persons found in the same article are listed in the Persons field, separated by semicolons, e.g. karnatakajanatapaksha;kabhayachandrajain;janatadal • We assumed that persons found in the same article are “connected” [Figure: The Global Network, with example nodes karnatakajanatapaksha and k abhayachandrajain]
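The co-occurrence assumption can be sketched in Python. This is a minimal illustration, not the team's code. The function name `cooccurrence_edges` is hypothetical, and counting repeated co-occurrences as an edge weight is an assumption, since the slide only says such persons are “connected”.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(persons_fields):
    """Count undirected edges between names that share an article.

    Each element of persons_fields is one article's semicolon-separated
    Persons string. Every pair of distinct names in the same article adds
    1 to that pair's edge weight (the weighting is an assumption).
    """
    edges = Counter()
    for field in persons_fields:
        # Split on ';', drop empty entries, de-duplicate within the article.
        names = sorted({n.strip() for n in field.split(";") if n.strip()})
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1  # sorted names give one canonical key per pair
    return edges

articles = [
    "karnatakajanatapaksha;kabhayachandrajain;janatadal",
    "kabhayachandrajain;janatadal",
]
edges = cooccurrence_edges(articles)
```

Here the pair `("janatadal", "kabhayachandrajain")` gets weight 2 because it appears in both articles. An unweighted network follows by keeping only the keys.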
GDELT image by Kalev Leetaru, from “Foreign Policy”
Identifying Influential People • Step 1: Unix code queries the GKG data set to find a geographic or topical subset (e.g., a Bangladesh subset) • Step 2: Use network centrality models in R to identify influential individuals by name • Step 3: Use Python to programmatically search Wikipedia for two-sentence descriptions [Figure: workflow Linux Query → Network Centrality → Wiki Merge]
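Step 3 might look like the minimal sketch below. It assumes the public MediaWiki API and its TextExtracts `prop=extracts` module, using `exsentences=2` to get a two-sentence description. The function names and the User-Agent string are hypothetical, and the slides do not say which Wikipedia interface was actually used. GKG names are lowercase and run together (e.g. `kabhayachandrajain`), so the sketch leaves out the work of mapping them to exact article titles or falling back to a `list=search` query.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"

def extract_url(title):
    """Build a MediaWiki API URL asking for a page's first two plain-text sentences."""
    params = {
        "action": "query",
        "prop": "extracts",  # provided by the TextExtracts extension
        "exintro": 1,        # lead section only
        "explaintext": 1,    # plain text instead of HTML
        "exsentences": 2,    # the two-sentence description
        "titles": title,
        "format": "json",
    }
    return API + "?" + urlencode(params)

def parse_extract(body):
    """Return the first extract in a JSON response, or None if the page is missing."""
    pages = json.loads(body)["query"]["pages"]
    for page in pages.values():
        if page.get("extract"):
            return page["extract"]
    return None

def fetch_description(title):
    # Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
    req = Request(extract_url(title),
                  headers={"User-Agent": "fusion-sketch/0.1 (example)"})
    with urlopen(req, timeout=10) as resp:
        return parse_extract(resp.read().decode("utf-8"))
```

The network call is kept separate from `parse_extract`. That way the "Wiki Merge" step can be tested on saved responses and rerun without querying Wikipedia again.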
Measuring Network Centrality Betweenness? Degree? Closeness? Eigenvector?
Degree Centrality • Number of neighbors, or edges, for a node • Counts connections with other nodes • Ignores any directions on the edges • Measures local centrality • Computationally fast
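Written as a formula, the degree centrality of node $v$ in a graph with $n$ nodes is $C_D(v) = \deg(v)$, often normalized as $\deg(v)/(n-1)$. Here is a minimal Python sketch. The project used R, and these function names are illustrative only.

```python
from collections import defaultdict

def adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs; edge direction is ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # skip self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree_centrality(adj, normalized=True):
    """Number of neighbors per node, optionally divided by n - 1."""
    n = len(adj)
    scale = 1 / (n - 1) if normalized and n > 1 else 1
    return {v: len(nbrs) * scale for v, nbrs in adj.items()}

star = adjacency([("hub", "a"), ("hub", "b"), ("hub", "c")])
scores = degree_centrality(star)
```

Each node's score needs only its own neighbor list. This is why degree is a local measure and computationally fast.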
Closeness Centrality • Closeness measures a node's distance to all other nodes • It reflects how easily information can spread from that node to all other nodes sequentially
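One common formalization, assumed here rather than taken from the slides, is $C_C(v) = (n-1) / \sum_{u \ne v} d(v,u)$, where $d(v,u)$ is the shortest-path distance. The Abuja network in the case study is disconnected, so the sketch below uses the Wasserman and Faust variant instead. That variant computes closeness over the nodes that $v$ can reach, then scales the result by the fraction of the graph that $v$ reaches.

```python
from collections import deque

def closeness(adj, v):
    """Closeness of node v; adj maps every node to its set of neighbors."""
    # Breadth-first search gives shortest-path distances from v.
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    reached = len(dist) - 1     # nodes reachable from v, excluding v itself
    total = sum(dist.values())  # sum of distances to those nodes
    if total == 0:
        return 0.0              # isolated node (or single-node graph)
    # (reached / total) is closeness within v's component; the second factor
    # scales by the fraction of the other n - 1 nodes that v can reach.
    return (reached / total) * (reached / (len(adj) - 1))

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

On the path a–b–c, `closeness(path, "b")` is 1.0 and `closeness(path, "a")` is 2/3. The middle node can reach everyone fastest.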
Betweenness Centrality • Measures the number of shortest paths that pass through a node • Shows when one particular person occupies a strategically central position in the group • $C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}$, where $g_{jk}$ = the number of geodesics (shortest paths) connecting $j$ and $k$, and $g_{jk}(n_i)$ = the number of those geodesics that actor $i$ is on
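The betweenness sum can be computed without listing every geodesic by using Brandes' algorithm. It counts shortest paths with one breadth-first search per node. This is a minimal sketch for unweighted, undirected graphs, not the R routine used in the project.

```python
from collections import deque

def betweenness(adj):
    """Brandes (2001) betweenness for an unweighted, undirected graph.

    adj maps every node to the set of its neighbors. Returns, for each
    node i, the sum over unordered pairs {j, k} of g_jk(i) / g_jk.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording each
        # node's predecessors on those paths.
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(betweenness(path))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

On a four-node cycle every node scores 0.5. Each opposite pair has two geodesics, and each intermediate node lies on one of them.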
Eigenvector Centrality • Measures the importance of a node in a network using the adjacency matrix • Assigns relative scores to nodes, giving more “points” to nodes that are more influential than others • A node's score depends on its number of connections and on the scores of the nodes it connects to • Connections to more influential people contribute more to a node's own influence • $A\nu = \lambda\nu$, i.e. $\nu_i = \frac{1}{\lambda}\sum_j A_{ij}\nu_j$, where A = adjacency matrix of the graph, λ = constant (the largest eigenvalue of A), ν = eigenvector
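Eigenvector centrality is usually computed by power iteration. You repeatedly multiply a positive starting vector by the adjacency matrix and renormalize until the vector stops changing. This sketch is an assumption, not the project's R code. It iterates with $A + I$, which has the same eigenvectors as $A$ but avoids the oscillation that plain power iteration shows on bipartite graphs such as stars. On a disconnected network like the Abuja case study, the scores tend to concentrate in the component with the largest leading eigenvalue.

```python
import math

def eigenvector_centrality(adj, iters=1000, tol=1e-10):
    """Power iteration for the principal eigenvector of the adjacency matrix.

    Iterates x <- (A + I) x and normalizes to unit length. The shift keeps
    the eigenvectors of A but prevents oscillation on bipartite graphs.
    """
    nodes = list(adj)
    if not nodes:
        return {}
    x = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # (A + I) x: each node's own score plus its neighbors' scores.
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {v: val / norm for v, val in nxt.items()}
        if max(abs(nxt[v] - x[v]) for v in nodes) < tol:
            return nxt
        x = nxt
    return x

star = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
scores = eigenvector_centrality(star)
```

For this star, the center converges to $1/\sqrt{2} \approx 0.707$ and each leaf to $1/\sqrt{6} \approx 0.408$, matching $A\nu = \sqrt{3}\,\nu$.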
Our Primary Algorithm • UNIX grep command for the query • Bootstrap for large datasets • A combination of the two
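The slide does not say how grep and the bootstrap were combined. One plausible reading, assumed here, has two stages. First, raw GKG rows are filtered by keyword, as `grep -i` would. Second, each person's degree is estimated by resampling articles with replacement, optionally drawing fewer articles than the full subset so each replicate stays small. The Persons column index, the function names, and the resampling scheme are all hypothetical.

```python
import random
from collections import defaultdict

PERSONS_COL = 5  # assumed index of the Persons field in a tab-separated GKG row

def grep_rows(lines, keyword):
    """Keep rows that contain keyword anywhere, like `grep -i keyword`."""
    key = keyword.lower()
    return [line for line in lines if key in line.lower()]

def persons_per_article(rows):
    """One set of person names per row, read from the assumed Persons column."""
    articles = []
    for row in rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) > PERSONS_COL:
            articles.append({p for p in cols[PERSONS_COL].split(";") if p})
    return articles

def bootstrap_degree(articles, n_boot=200, sample_size=None, seed=0):
    """Mean degree per person over n_boot resamples drawn with replacement.

    sample_size lets each replicate use fewer articles than the full subset.
    A person absent from a replicate contributes a degree of 0 to it.
    """
    rng = random.Random(seed)
    k = sample_size or len(articles)
    totals = defaultdict(float)
    for _ in range(n_boot):
        neighbors = defaultdict(set)
        for people in rng.choices(articles, k=k):
            for p in people:
                neighbors[p] |= people - {p}
        for p, nbrs in neighbors.items():
            totals[p] += len(nbrs)
    return {p: t / n_boot for p, t in totals.items()}
```

Averaging over replicates also gives a spread for each person's score. That spread shows how stable a ranking is when only part of a very large subset is processed at once.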
Nigeria Case Study (1 of 3) • Boko Haram: extremist group that launched its insurgency in 2009 and seeks to establish an Islamic state in Nigeria • In addition to Nigeria, it also operates in Chad, Cameroon, and Niger • Opposes the Westernization of Nigerian society • Pledged allegiance to the Islamic State of Iraq and Syria (ISIS) on March 7, 2015 • It has destabilized Nigeria, killing over 13,000 civilians and causing 1.5 million people to flee • What if US Regionally Aligned Forces deployed to Nigeria to advise and assist in the fight against Boko Haram?
Nigeria Case Study (2 of 3) • Given data for the entire world over the last 30 days, we selected only the data related to Abuja, Nigeria • This gives us a network with: • 13,527 people • 321,206 connections • a density of 0.004 • This is a disconnected network [Figure: Abuja, Nigeria network: 13,527 people, 321,206 connections]
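The density figure can be checked directly. If the 321,206 connections are assumed to be undirected edges with no self-loops, density is $2m / (n(n-1))$. The sketch also shows one way, breadth-first search, to confirm that a network is disconnected. The function names are illustrative.

```python
from collections import deque

def density(n_nodes, n_edges):
    """Density of a simple undirected graph: 2m / (n(n - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

def count_components(adj):
    """Number of connected components, found by breadth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return components

print(round(density(13527, 321206), 3))  # 0.004 (about 0.0035 before rounding)
```

More than one component means the network is disconnected. Closeness and eigenvector scores then need the component-aware handling discussed on the centrality slides.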
Web App http://data-analytics.net/Apps/fusionNet/
Current Status and Way Forward • Tactical units and leaders are testing the model and web-app • 95th Civil Affairs Brigade and the Communications-Electronics Research, Development and Engineering Center will continue to develop the tool next year • The model has sparked interest in the OSINT Community at the Intelligence and Security Command at Fort Belvoir
Questions? The authors would like to thank the Data Tactics Corporation for their support and collaboration throughout this project.