Mining for Social Processes in Intelligence Data Streams. Robert Savell, Ph.D. SBP ’08, April 1, 2008 (04/01/08). Overview: Introduction: Process Based SNA. Process Detection and the Process Query System (PQS). Experiment: The Alibaba Data Set. Results. Conclusion.
Mining for Social Processes in Intelligence Data Streams. Robert Savell, Ph.D. SBP ’08, April 1, 2008 (04/01/08).
Overview: • Introduction: Process Based SNA. • Process Detection and the Process Query System (PQS). • Experiment: The Alibaba Data Set. • Results. • Conclusion.
Traditional SNA and DSNA are reductionist. They project rich data sets onto graph representations (directed and undirected; reachability and connectedness), then: • Define structure and properties: centrality and prestige; subgroups (clustering). • Analyze role and position: structural equivalence; block models; network-level equivalence.
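As a reminder of what the graph-projection step yields, here is a minimal sketch of one classic structural measure, normalized degree centrality, on an undirected edge list. The function name and toy edges are illustrative assumptions, not part of the original analysis.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph of (u, v) pairs.

    Self-loops are ignored. Each node's degree is divided by (n - 1),
    the maximum possible degree in a simple graph of n nodes.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

# Toy graph (hypothetical): C is connected to everyone else.
print(degree_centrality([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]))
```

Measures like this summarize a static snapshot; the slides that follow argue that such projections discard the process structure in the data.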
Real-World DSNA: Complex Systems on Networks. Real-world social networks are composed of dynamic multimodal systems whose attendant processes and interactions both determine, and are determined by, the network topology.
Process-Based Social Network Analysis: Problem: 1. Identify and track active process threads in transactional datasets. 2. Identify supporting control and communication processes in the social network. 3. Establish the structural roles of agents. 4. Define active individual or group processes and track the state of these processes. Note: paradoxically, added complexity can sometimes simplify the analysis.
Methodology: Process Detection and Tracking. [Figure, from the bottom layer up: underlying (hidden) state spaces for Process 1 to Process n; observations related to state sequences; observations interleaved; observations missed, noise added, and unlabelled (this is what we see), e.g. "a b a c f k h d c b g d b k h a g d a".] Note: the complexity comes from the entanglement of distributed simple processes.
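The observation model in this figure can be sketched as a small generator: several hidden state sequences are interleaved into one stream, some states are dropped, spurious symbols are inserted, and the process labels are discarded. The function name, its parameters, and the noise alphabet are hypothetical choices for illustration only.

```python
import random

def observe(processes, miss_prob=0.2, noise=("x", "y"), noise_prob=0.1, seed=0):
    """Interleave hidden state sequences into one unlabeled observation stream.

    processes: list of state sequences, e.g. [["a", "b", "c"], ["f", "h", "g"]].
    Each state is emitted with probability 1 - miss_prob (missed observations);
    after each step a noise symbol is inserted with probability noise_prob.
    Which process produced a symbol is not recorded (unlabelled stream).
    """
    rng = random.Random(seed)
    cursors = [0] * len(processes)
    stream = []
    while any(c < len(p) for c, p in zip(cursors, processes)):
        # choose at random among processes that still have states to emit
        live = [i for i, p in enumerate(processes) if cursors[i] < len(p)]
        i = rng.choice(live)
        state = processes[i][cursors[i]]
        cursors[i] += 1
        if rng.random() >= miss_prob:
            stream.append(state)
        if rng.random() < noise_prob:
            stream.append(rng.choice(noise))
    return stream

print(observe([["a", "b", "c", "d"], ["f", "h", "g"]]))
```

The detection problem is the inverse: recover the per-process sequences from a stream like the one printed here.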
The Process Query System (PQS): • Sensor: upon query, produces a constrained set of recent email events from the stream. • Subscriber: queries sensors, preprocesses streams, and produces attribute-rich encapsulated observations. • Trafen Engine: partitions the observation set into tracks (evidence of underlying social processes) and produces a maximum-likelihood hypothesis (a collection of tracks and inferred process descriptions). [The current implementation is based on the MHT (multiple hypothesis tracking) algorithm.] • Publisher: formats the output of the Trafen engine. Please refer to www.pqsnet.net.
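To make the data flow between the four PQS components concrete, here is a minimal sketch of the sensor, subscriber, engine, publisher chain. All names are hypothetical, and the greedy "shares an entity" tracker is a deliberately simple stand-in; it is not MHT and not the actual Trafen engine.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    time: int
    entities: frozenset

@dataclass
class Track:
    observations: list = field(default_factory=list)

    def entities(self):
        """Union of all entities seen on this track."""
        return set().union(*(o.entities for o in self.observations))

def sensor(stream, since):
    """Return recent raw events (time >= since) from the stream."""
    return [e for e in stream if e["time"] >= since]

def subscriber(events):
    """Preprocess raw events into encapsulated observations."""
    return [Observation(e["time"], frozenset(e["entities"])) for e in events]

def greedy_tracker(observations):
    """Stand-in for the Trafen engine (NOT MHT): in time order, append each
    observation to the first track sharing an entity, else open a new track."""
    tracks = []
    for obs in sorted(observations, key=lambda o: o.time):
        for track in tracks:
            if track.entities() & obs.entities:
                track.observations.append(obs)
                break
        else:
            tracks.append(Track([obs]))
    return tracks

def publisher(tracks):
    """Format tracks as human-readable summaries."""
    return [f"track {i}: {sorted(t.entities())}, {len(t.observations)} obs"
            for i, t in enumerate(tracks)]
```

A real MHT engine would keep several competing track assignments alive and score them, rather than committing greedily as this sketch does.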
Task: The Alibaba Dataset (Scenario 1) • A simulated SigInt and HumInt collection. • Approximately 800 reports. • 8-month plot window. • 409 named entities. • 98 locations. Ground Truth: a 12-member terrorist cell connected with the Ali Baba network plans to "bake a cake" (build a bomb) targeted to blow up a water treatment facility near London. The plot takes place from April to September 2003. A close-knit association of terrorists and sympathizers from other organizations will fill the air with fake chatter and decoy plots.
Alibaba Scenario 1: discover the plot. Scenario 1: 820 reports, 409 named entities, 98 locations, approx. 8 months. [Figure: Alibaba Scenario 1 ground truth. Lethal characters in green, with their connected component in cyan.]
Alibaba Scenario 1 Ground Truth: Leader: Imad Abdul. Planner: Tarik Mashal. Hacker: Ali Hakem. Financier: Salam Seeweed. Recruiter: Yakib Abbaz. Security: Ramad Raed. Demolitions: Quazi Aziz. Demolitions: Hosni Abdel. Associate: Phil Salwah. Associate: Lu’ay. [Figure: Alibaba terrorist network in green; background connected component in cyan.]
Alibaba Scenario 1: SNA (cluster analysis). Stationary clustering finds some key suspects. Algorithm: extract triads; collect common neighbors; the threat score for a node is proportional to the number of triads containing it. Top 15 suspects: 1-Phil Salwah 2-Abdul 3-Yakib Abbaz 4-Tarik Mashal 5-Qazi 6-Fawzan 7-Alvaka 8-Afia 9-Mazhar 10-Salam 11-Ahlima Amit 12-Wazir Bengazi 13-Raed 14-Saud Uvmyuzik 15-Mahira.
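The triad-based threat score described on this slide can be sketched as a triangle count per node: for each edge, every common neighbor closes a triad, and each node's score is the number of triads it belongs to. This is an illustrative reading of the slide, assuming "triads" means closed triangles; the function names and toy graph are hypothetical.

```python
from collections import defaultdict

def triad_scores(edges):
    """Number of closed triads (triangles) containing each node."""
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)
    scores = {node: 0 for node in nbrs}
    for u in nbrs:
        for v in nbrs[u]:
            if u < v:
                # each common neighbour w closes the triad (u, v, w)
                for w in nbrs[u] & nbrs[v]:
                    if w > v:  # count each triangle once, with u < v < w
                        for node in (u, v, w):
                            scores[node] += 1
    return scores

def top_suspects(edges, k=15):
    """Rank nodes by triad count (ties broken alphabetically)."""
    scores = triad_scores(edges)
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]

edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D"), ("D", "E")]
print(triad_scores(edges))
print(top_suspects(edges, k=2))
```

Node labels must be mutually comparable (here, strings) so that each triangle is counted exactly once.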
Alibaba S1: SNA Results vs. Ground Truth. Stationary clustering: 1-Phil Salwah 2-Abdul 3-Yakib Abbaz 4-Tarik Mashal 5-Qazi 6-Fawzan 7-Alvaka 8-Afia 9-Mazhar 10-Salam 11-Ahlima Amit 12-Wazir Bengazi 13-Raed 14-Saud Uvmyuzik 15-Mahira. Ground truth: Leader: Imad Abdul. Planner: Tarik Mashal. Hacker: Ali Hakem. Financier: Salam Seeweed. Recruiter: Yakib Abbaz. Security: Ramad Raed. Demolitions: Quazi Aziz. Demolitions: Hosni Abdel. Associate: Phil Salwah. Associate: Lu’ay. Result: significant deviations from ground truth.
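One way to quantify the deviation is to score the ranked list against the ground-truth roster. The sketch below uses the two lists from these slides; the alias table that maps the abbreviated names (e.g. "Abdul", "Qazi") to full roster names is an assumption of this example, not something the slides state.

```python
# Top-15 list from the stationary clustering slide.
SUSPECTS = ["Phil Salwah", "Abdul", "Yakib Abbaz", "Tarik Mashal", "Qazi",
            "Fawzan", "Alvaka", "Afia", "Mazhar", "Salam", "Ahlima Amit",
            "Wazir Bengazi", "Raed", "Saud Uvmyuzik", "Mahira"]
# Ground-truth cell members as listed on the ground-truth slide.
GROUND_TRUTH = ["Imad Abdul", "Tarik Mashal", "Ali Hakem", "Salam Seeweed",
                "Yakib Abbaz", "Ramad Raed", "Quazi Aziz", "Hosni Abdel",
                "Phil Salwah", "Lu'ay"]
# ASSUMED alias table: short names on the suspect list -> full roster names.
ALIASES = {"Abdul": "Imad Abdul", "Qazi": "Quazi Aziz",
           "Salam": "Salam Seeweed", "Raed": "Ramad Raed"}

def score(ranked, truth, aliases):
    """Return (hits, precision, recall) of a ranked list against a roster."""
    resolved = [aliases.get(name, name) for name in ranked]
    hits = [name for name in resolved if name in truth]
    return hits, len(hits) / len(ranked), len(hits) / len(truth)

hits, precision, recall = score(SUSPECTS, set(GROUND_TRUTH), ALIASES)
missed = [name for name in GROUND_TRUTH if name not in hits]
print(len(hits), precision, recall, missed)
```

Under this alias table the sketch finds 7 matches among the 15 suspects (recall 0.7 against the 10-name roster), and it misses Ali Hakem, Hosni Abdel, and Lu'ay.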
DSNA: A Process View. What we’d like: full transactional data, e.g. a complete meeting FSM.
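A "complete meeting FSM" can be pictured as a transition table replayed over labelled events. The states and event labels below (propose, arrange, meet, report, decline) are hypothetical, loosely borrowed from the keyword list later in the deck; they are not the author's FSM.

```python
# HYPOTHETICAL meeting FSM: (current state, event label) -> next state.
TRANSITIONS = {
    ("idle", "propose"): "proposed",
    ("proposed", "arrange"): "arranged",
    ("proposed", "decline"): "idle",
    ("arranged", "meet"): "met",
    ("met", "report"): "reported",
}

def run_fsm(events, start="idle"):
    """Replay labelled events through the FSM.

    Returns the list of visited states and the events that had no valid
    transition from the current state (i.e. not explained by this process).
    """
    state, path, unexplained = start, [start], []
    for label in events:
        nxt = TRANSITIONS.get((state, label))
        if nxt is None:
            unexplained.append(label)
        else:
            state = nxt
            path.append(state)
    return path, unexplained

print(run_fsm(["propose", "arrange", "meet", "report"]))
```

With full transactional data every event would map onto a transition like these; the next slides show that the Alibaba data provide far less.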
A Process View I: What we have. Colocation information: Event(d) = {date, location, named entities × 3}.
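The colocation tuple Event(d) maps naturally onto a small immutable record. This sketch assumes "named entities × 3" means at most three entities per event; the class, field names, and example values are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations

@dataclass(frozen=True)
class Event:
    """A colocation event: date, location, and named entities (assumed <= 3)."""
    day: date
    location: str
    entities: tuple

    def __post_init__(self):
        # assumption: the slide's "named entities x 3" is read as an upper bound
        if len(self.entities) > 3:
            raise ValueError("at most three named entities per event")

    def colocated_pairs(self):
        """All unordered entity pairs observed together at this event."""
        return list(combinations(sorted(self.entities), 2))

e = Event(date(2003, 4, 1), "London", ("Abdul", "Yakib Abbaz", "Tarik Mashal"))
print(e.colocated_pairs())
```

Because each event carries only a date, a place, and a few names, any process structure has to be inferred from how these tuples co-occur over time.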
A Process View III: ...and singleton evidence of local state. Some example target/event strings from Alibaba Scenario 1: • 'Abdul tasked Yakib to recruit'. • 'Declining invitation to meet Phil Salwah'. • 'Discussed planning schedule'. • 'Arranged for meeting next week'. • 'Charity fundraiser'. • 'Discussed payment for assisting in baking of cakes'. • 'Informed that deception is in effect'. • 'Discussed training arrangements for baking cake'. • 'Attempted theft of chemicals'. • 'Casing Portsmouth Facility'.
Stages of Process Detection (1): Track Individual Entities. A. Remove broadcasts (minimal information content). 1. Infer a home location for each entity, and track individual trajectories.
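Stage 1 can be sketched as: drop broadcast events, build a date-ordered trajectory per entity, and pick a home location. Two assumptions are made here and are not taken from the slides: broadcasts are identified by a caller-supplied predicate (the deck uses a broadcast FSM, not reproduced), and "home" is simply the entity's most frequently observed location.

```python
from collections import Counter, defaultdict

def track_entities(events, is_broadcast):
    """events: iterable of (day, location, entities) tuples.
    is_broadcast: caller-supplied predicate (hypothetical stand-in for the
    broadcast FSM). Returns (homes, trajectories) keyed by entity name."""
    trajectories = defaultdict(list)
    for day, location, entities in events:
        if is_broadcast((day, location, entities)):
            continue  # broadcasts carry little information; drop them
        for name in entities:
            trajectories[name].append((day, location))
    homes = {}
    for name, traj in trajectories.items():
        traj.sort()  # order each trajectory by day
        # ASSUMED rule: home = most frequently observed location
        homes[name] = Counter(loc for _, loc in traj).most_common(1)[0][0]
    return homes, dict(trajectories)
```

Entities that appear only in broadcasts drop out entirely, which is the intended effect of step A.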
Stages of Process Detection (2): Track Group Coordination Processes. 2. Aggregate trajectories according to the group synchronization FSM.
Weak Process Methodology (Stage 1): Given: a constrained Alibaba corpus of colocation event tuples. The problem: make group and subgroup coordination and broadcast process assignments (i.e., partition the event space). Define a quality measure for the partition:
Results: Alibaba Network Discovery. Ground truth: 1 Leader: Imad Abdul. 2 Planner: Tarik Mashal. 3 Hacker: Ali Hakem. 4 Financier: Salam Seeweed. 5 Recruiter: Yakib Abbaz. 6 Security: Ramad Raed. 7 Associate: Phil Salwah. 8 Demolitions: Quazi Aziz. 9 Demolitions: Hosni Abdel. 9 ******: Omar. 10 Recruitee: Fawzan. 11 Decoy: Ahmet, Ali,… 12 Associate: Lu’ay. 12 ******: Sinan.
Ali Baba Network Discovery 2: Result: The technique successfully assigns significant hierarchical relationships across the net.
Ali Baba Cell Process Signature: downstream control coordination with the top 4 suspects. [Figure annotations: initiation of plot; peak planning period; peak logistical preparation.]
Ali Baba Subgroup Signatures (downstream control). Ali Baba top 4 suspects: high-level coordination; early and persistent meeting events. Decoy plot (ship or port): also lacks upstream coordination; sparse early structure; no meetings.
Ali Baba Role Differentiation I: Leader: Imad Abdul. Downstream control; predominance of home events.
Ali Baba Role Differentiation II: Planner: Tarik Mashal. Downstream control; operational independence (from Abdul); balanced travel and home events.
Ali Baba Role Differentiation III: Financier: Salam Seeweed. Few downstream events (not a subgroup leader); close interaction with Imad Abdul; predominantly home events with Abdul.
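The home-versus-travel contrast used in these role slides can be computed from an entity's trajectory and inferred home location. The sketch covers only that ratio, not the downstream-control measure, and the 0.75 threshold in `describe` is a hypothetical cut-off; the slides assess roles qualitatively.

```python
def event_profile(trajectory, home):
    """Split an entity's (day, location) events into home vs. travel counts."""
    home_n = sum(1 for _, loc in trajectory if loc == home)
    travel_n = len(trajectory) - home_n
    frac = home_n / len(trajectory) if trajectory else 0.0
    return {"home": home_n, "travel": travel_n, "home_fraction": frac}

def describe(profile, high=0.75):
    """Coarse label; `high` is a HYPOTHETICAL threshold, not from the slides."""
    if profile["home_fraction"] >= high:
        return "predominantly home"
    return "mixed home/travel"

p = event_profile([(1, "L"), (2, "L"), (3, "L"), (4, "M")], home="L")
print(p, describe(p))
```

On the slides' reading, a leader-like profile would score "predominantly home" while a planner with balanced travel would fall into the mixed category.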
Stages of Process Detection (4): Track an Evolving Threat. Note: there are too few training examples to do this systematically. But here are some potential keyword mappings. Requisite processes: 1-personnel 2-skill 3-leadership 4-train 5-finance 6-material 7-transport 8-house 9-stealth 10-recon 11-action. Keywords: Recommend Assign Recruit Task Terminate Assassinate Skill Invitation Arrange Assign Reprimand Disengage Propose Meeting Report Plan Train Price Finance Payment Request Material Break and Enter Smuggle Exit Return Trip Housing Deception Sleep Target Case Case Case Target Activate Attack
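A keyword mapping like this one can turn event strings into a "threat spectrum", i.e. a count per requisite-process category. The keyword-to-category assignments below are an assumed, illustrative subset (the slide's full mapping is not recoverable from the text), and matching is naive lower-case substring search. The input strings are the example target/event strings from Alibaba Scenario 1.

```python
CATEGORIES = ["personnel", "skill", "leadership", "train", "finance",
              "material", "transport", "house", "stealth", "recon", "action"]
# ASSUMED illustrative subset of keyword -> category assignments.
KEYWORDS = {
    "recruit": "personnel", "task": "personnel",
    "meet": "leadership", "plan": "leadership", "arrange": "leadership",
    "train": "train",
    "payment": "finance", "fundraiser": "finance",
    "chemical": "material", "theft": "material",
    "deception": "stealth",
    "casing": "recon",
}

def spectrum(reports):
    """Count reports per category; each category counts at most once per report."""
    counts = [0] * len(CATEGORIES)
    for text in reports:
        lower = text.lower()
        hits = {cat for kw, cat in KEYWORDS.items() if kw in lower}
        for cat in hits:
            counts[CATEGORIES.index(cat)] += 1
    return counts

REPORTS = ["Abdul tasked Yakib to recruit", "Declining invitation to meet Phil Salwah",
           "Discussed planning schedule", "Arranged for meeting next week",
           "Charity fundraiser", "Discussed payment for assisting in baking of cakes",
           "Informed that deception is in effect",
           "Discussed training arrangements for baking cake",
           "Attempted theft of chemicals", "Casing Portsmouth Facility"]
print(dict(zip(CATEGORIES, spectrum(REPORTS))))
```

Substring matching is crude (for example, "arrangements" also triggers "arrange"); with few training examples, a hand-built mapping of this kind is about as far as the data support.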
Detecting an aliased plot: Cake Plot vs. Water Plot. Sync events: activity profiles of the suspects associated with each plot are similar but not obviously related. [Figure panels: Cake Plot; Water Plot.]
Aliased plot detection: Cake vs. Water Threat Spectra. Legend: 1-personnel 2-skill 3-leadership 4-train 5-finance 6-material 7-transport 8-house 9-stealth 10-recon 11-action. Cake: blue; Water: red. AND THE WINNER IS…
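The slide compares the two plots' spectra visually. If a single number is wanted, one option (an assumption of this sketch, not the method used in the deck) is the cosine similarity between two 11-bin spectra. Values near 1 indicate the same mix of requisite processes, and values near 0 indicate unrelated mixes.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length spectra (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Cosine similarity ignores overall volume, so a heavily reported decoy plot and a quieter real plot can still be compared by the shape of their activity.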
Conclusion: • Process analysis provides a generic framework for the identification, tracking, and categorization of social organisms. • Process-based techniques have given excellent results so far, even on restricted attribute sets. • This is just the beginning of exploring this methodology within the complex systems framework.
Questions? Thanks to: George Cybenko (postdoctoral advisor), Gary Kuhn (IC advisor), and the Process Query Systems Group. For further examples of PQS applications, please visit www.pqsnet.net.
A hostile network as an autopoietic system (a collection of processes). Sustaining processes (a partial list): • Structural coherence: planning, leadership, synchronization. • Differentiation (from environment): deception, active defense; obsolescence and termination. • Metabolism: financial and material support; transportation and housing. • Sustainability/reproduction: recruitment, reward, indoctrination; fission, merger. • Responsiveness (environmental interaction): plot generation, planning, execution; adaptive strategies, replanning.
DSSP Step I: Partition the event space. • Entity tracking and stream aggregation: • Isolate low-entropy events such as broadcasts via their process signature (broadcast FSM). • Track the spatio-temporal and socio-temporal trajectories of individuals. • Identify trajectory collisions (co-occurrences).
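Trajectory collisions can be sketched as pairwise co-occurrence counts. This example assumes a "collision" means two entities observed in the same (day, location) slot, which merges separate reports from the same place and day; the function name and input format are illustrative.

```python
from collections import Counter
from itertools import combinations

def cooccurrences(events):
    """Count how often each entity pair shares a (day, location) slot.

    events: iterable of (day, location, entities) tuples.
    """
    slots = {}
    for day, location, entities in events:
        slots.setdefault((day, location), set()).update(entities)
    pairs = Counter()
    for members in slots.values():
        for pair in combinations(sorted(members), 2):
            pairs[pair] += 1
    return pairs

print(cooccurrences([(1, "L", ("A", "B")), (1, "L", ("C",)), (2, "M", ("A", "B"))]))
```

Pairs are stored in sorted order, so ("A", "B") and ("B", "A") are always counted together.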
DSSP Step Ia: Derive the network hierarchy from the partition structure. • Identify coordinated entities (subgroups): • Aggregate entities into structurally coherent units. • Establish the hierarchical relationships between units. • Identify the primary communication channels between units.
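As a crude stand-in for "aggregate entities into structurally coherent units", entities can be grouped into the connected components of the graph of pairs that co-occur at least `min_count` times. This thresholded union-find is a swapped-in technique for illustration; it does not establish the hierarchy or communication channels that the slide describes, and the threshold is hypothetical.

```python
def subgroups(pair_counts, min_count=2):
    """Connected components over entity pairs seen together >= min_count times.

    pair_counts: mapping {(a, b): count}, e.g. a collections.Counter.
    Entities whose pairs all fall below the threshold are left out.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in pair_counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)  # union the two components
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    # largest groups first, then alphabetical for a stable order
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

Raising `min_count` splits the network into tighter, more frequently colocated units.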
DSSP Step 2: State Assignments. • Identify process signatures: • Assign event states in the coordination FSM using the hierarchical context defined in Step I. • The distribution of event states defines the weak process signature for the individual and the subgroup. • Qualitatively (for now) assess individual roles by analyzing synchronization-process event distributions.
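Once each event has been assigned an FSM state, the weak process signature described above is a normalized distribution over those states. This minimal sketch assumes the state labels are plain strings. The state names in the example are hypothetical.

```python
from collections import Counter

def process_signature(assignments, states):
    """Normalized distribution of FSM event states for an individual or subgroup.

    assignments: state labels assigned to that entity's events.
    states: the FSM states to report; labels outside `states` still count
    toward the total but get no entry of their own.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    return {s: (counts[s] / total if total else 0.0) for s in states}

print(process_signature(["meet", "meet", "plan", "report"],
                        ["plan", "meet", "report", "travel"]))
```

Comparing these distributions across individuals is one way to make the qualitative role assessment above more systematic.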