Mass Declassification What If?

Mass Declassification: What If? • Jeff Jonas, IBM Distinguished Engineer • Chief Scientist, IBM Entity Analytics • JeffJonas@us.ibm.com • September 23, 2010

The Ask • What emerging technology or innovative approaches come to mind … which may have applicability to this task? • Use your imagination. What if? • Not talking about any specific products • Not focusing on the widely available COTS/GOTS technologies (OCR, document management, case management, workflow, etc.)

The Problem at Hand • Volumes may be beyond human brute-force review (at 5 min/document: 18,382 FTEs working for a year) • Necessitates some form of machine triage • Red: A disclosure risk • Yellow: A possible disclosure risk • Green: No disclosure risk • Reliable machine triage requires substantially better prediction systems • Even then, humans still need advanced means to deal with the large volume of remaining “possibles”
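The Red/Yellow/Green triage can be pictured as thresholding a predicted disclosure-risk score. This is a minimal sketch: the score itself, and the threshold values `RED_THRESHOLD` and `GREEN_THRESHOLD`, are hypothetical assumptions, not part of the talk.

```python
from enum import Enum


class Triage(Enum):
    RED = "disclosure risk"
    YELLOW = "possible disclosure risk"
    GREEN = "no disclosure risk"


# Hypothetical thresholds; a real system would calibrate them against
# historical human dispositions.
RED_THRESHOLD = 0.8
GREEN_THRESHOLD = 0.2


def triage(risk_score: float) -> Triage:
    """Map a predicted disclosure-risk score in [0, 1] to a triage color."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    if risk_score >= RED_THRESHOLD:
        return Triage.RED
    if risk_score <= GREEN_THRESHOLD:
        return Triage.GREEN
    return Triage.YELLOW  # the "possibles" that still need human attention


if __name__ == "__main__":
    for score in (0.95, 0.5, 0.05):
        print(score, triage(score).name)
```

Widening the gap between the two thresholds sends more documents to Yellow, which is exactly the human workload the slide warns about.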

Background • Early 80’s: Founded Systems Research & Development (SRD), a custom software consultancy • 1989 – 2003: Built numerous systems for Las Vegas casinos including a technology known as Non-Obvious Relationship Awareness (NORA) • 2001/2003: Funded by In-Q-Tel • 2005: IBM acquires SRD • Cumulatively: I have had a hand in a number of systems with many billions of rows describing hundreds of millions of entities • Affiliations: • Member, Markle Foundation Task Force on National Security in the Information Age • Senior Associate, Center for Strategic and International Studies (CSIS) • Distinguished Research Faculty (adjunct), Singapore Management University, School of Information Systems • Member, EPIC advisory board • Board Member, US Geospatial Intelligence Foundation (USGIF), the GEOINT organizing body

In Today’s Session • Intro to context accumulating systems • Predictions and data points needed for mass declassification • Strawman architecture • Challenges • Q&A

Context Accumulating Systems

Contextualization: From Pixels to Pictures to Insight [Diagram elements: Observations; Context; Relevance; Consumer (an analyst, a system, the sensor itself, etc.)]

Context, definition of: Better understanding something by taking into account the things around it.

Without Context [Slide shows only: scrila34@msn.com]

Consequences • Algorithms flat-lining (e.g., alert queues) • Enterprise amnesia on the rise • Overwhelmed by false positives and false negatives? You have seen nothing yet • Not enough humans to fix this with brute force • Risk assessment becomes the risk

Context Accumulation [Diagram: scrila34@msn.com seen in context: Job Applicant; Trusted Supplier; Known Terrorist; Stolen Identity]

Puzzle Metaphor Primer • Imagine an ever-growing pile of puzzle pieces of varying sizes, shapes and colors • What it represents is unknown – there is no picture on hand • Is it one puzzle, 15 puzzles, or 1,500 puzzles? • Some pieces are duplicates and some are missing • Some pieces are incomplete, low quality, or have been misinterpreted • Some pieces may even be professionally fabricated lies • Until you take the pieces to the table, you don’t know what you are dealing with

How Context Accumulates • With each new observation, one of three assertions is made: 1) un-associated; 2) near-like neighbors; or 3) connections • Asserted connections must favor the false negative • New observations sometimes reverse earlier assertions • Some observations produce novel discovery • As the working space expands, computational effort increases • The emerging picture helps focus collection interests • Given sufficient observations, there can come a tipping point • Thereafter, confidence improves while computational effort decreases!
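The three assertions can be sketched as a single classification step per observation. This sketch assumes a hypothetical split between "strong" identifiers (SSN, driver's license) that may assert a connection and "weak" features (name, phone, date of birth) that only justify a near-like-neighbor assertion; requiring a strong match before connecting is one simple way to favor the false negative. It does not model later observations reversing earlier assertions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical feature split (an assumption, not from the slides): strong
# identifiers may assert a connection; weak ones only a near-like neighbor.
STRONG_KEYS = ("ssn", "dl")
WEAK_KEYS = ("name", "phone", "dob")


@dataclass
class Entity:
    entity_id: int
    features: dict = field(default_factory=dict)


def _shares(obs: dict, ent: Entity, keys: tuple) -> bool:
    """True if observation and entity carry the same non-empty value for any key."""
    return any(obs.get(k) and obs.get(k) == ent.features.get(k) for k in keys)


def assert_observation(obs: dict, entities: list) -> tuple:
    """Classify a new observation as a connection, near-like neighbor, or un-associated.

    Favors the false negative: only a strong-identifier match asserts a connection.
    Returns (assertion, entity_id or None).
    """
    near: Optional[int] = None
    for ent in entities:
        if _shares(obs, ent, STRONG_KEYS):
            return "connection", ent.entity_id
        if near is None and _shares(obs, ent, WEAK_KEYS):
            near = ent.entity_id
    if near is not None:
        return "near_neighbor", near
    return "unassociated", None


if __name__ == "__main__":
    known = [Entity(1, {"name": "Mark Smith", "ssn": "443-43-0000"})]
    print(assert_observation({"name": "Mark R Smith", "ssn": "443-43-0000"}, known))
    # prints ('connection', 1)
```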

False Negatives Overstate The Universe [Chart: unique identities counted vs. observations, against the true population]

Counting Is Difficult • File 1: Mark Smith, 6/12/1978, 443-43-0000 • File 2: Mark R Smith, (707) 433-0000, DL: 00001234

The Rise and Fall of a Population [Chart: unique identities counted vs. observations, against the true population]

Data Triangulation • New Record: Mark Randy Smith, 443-43-0000, DL: 00001234 • File 1: Mark Smith, 6/12/1978, 443-43-0000 • File 2: Mark R Smith, (707) 433-0000, DL: 00001234
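Data triangulation, and why the count of unique identities can fall as observations grow, can be sketched with incremental entity resolution over a union-find structure. This is an illustrative sketch, not the NORA or IBM implementation: it assumes that a shared SSN or driver's license number is enough to assert "same entity", and it uses the three Mark Smith records from the slides as input.

```python
# Assumption: these strong identifiers are sufficient to assert "same entity".
STRONG_KEYS = ("ssn", "dl")


class UnionFind:
    def __init__(self):
        self.parent = {}

    def add(self, x):
        self.parent.setdefault(x, x)

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def count_entities(records):
    """Resolve records one at a time; return the unique-entity count after each."""
    uf = UnionFind()
    seen = {}  # (key, value) -> index of the first record carrying it
    counts = []
    for i, rec in enumerate(records):
        uf.add(i)
        for key in STRONG_KEYS:
            if key in rec:
                token = (key, rec[key])
                if token in seen:
                    uf.union(seen[token], i)
                else:
                    seen[token] = i
        counts.append(len({uf.find(j) for j in range(i + 1)}))
    return counts


records = [
    {"name": "Mark Smith", "dob": "6/12/1978", "ssn": "443-43-0000"},
    {"name": "Mark R Smith", "phone": "(707) 433-0000", "dl": "00001234"},
    {"name": "Mark Randy Smith", "ssn": "443-43-0000", "dl": "00001234"},  # new record
]

if __name__ == "__main__":
    print(count_entities(records))  # prints [1, 2, 1]
```

The two existing records share no identifier, so they count as two people; the new record carries both the SSN and the DL number, bridging them into one.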

Increasing Accuracy and Performance [Chart: unique identities counted vs. observations, against the true population]

“Expert Counting” is Fundamental to Prediction • Is it 5 people each with 1 account … or is it 1 person with 5 accounts? • If one cannot count … one cannot estimate vector or velocity (direction and speed). • Without vector and velocity … prediction is nearly impossible. • Therefore, if you can’t count, you can’t predict.

Mass Declassification: Predictions

Mass Declassification Predictions • Whose equity is it? • Machine triage – disposition • Queue prioritization

Using What Data Points? FOR EXAMPLE: • 450M target documents • Dirty words • Previous declassifications • Previous declassification denials • FOIAs • Intellipedia • Wikipedia • WikiLeaks • Deceased persons • Publicly available accounts/facts

Open Source Discovery/Scoring • “Height of Pakistan’s Mufasa missile.” • What is 15.5 meters? • New York Times, Sept 21, 2010, C3 “Pakistan unveils Mufasa 7 Warhead” • Wikipedia: Mufasa_7_Warhead
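A crude version of open-source scoring checks which terms from a document already appear in public texts. This minimal sketch uses plain case-insensitive substring matching, an assumption for illustration; a real system would match extracted facts, not raw strings. The inputs in the usage example are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores case and layout."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def open_source_score(doc_terms: list, public_texts: list) -> tuple:
    """Return (score, hits): the share of distinct document terms found in public texts."""
    unique = set(doc_terms)
    corpus = [normalize(t) for t in public_texts]
    hits = {t for t in unique if any(normalize(t) in text for text in corpus)}
    score = len(hits) / len(unique) if unique else 0.0
    return score, hits


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    terms = ["Mufasa 7", "15.5 meters", "Tomatoe"]
    public = ["Hypothetical news item: Mufasa 7 warhead, 15.5  meters tall"]
    print(open_source_score(terms, public))
```

A high score is evidence for the first policy question (is it already in the public domain?), not a disposition on its own.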

Context Accumulation [Diagram: “Mufasa 7 Warhead” connected to: Open Source Reference; FOIA (March 2010); Classified – Asserted Dirty Word]

Context Accumulation + Statistics
Document Element | Total | Declass | Class-Default | Class-Asserted
Author: “Billy K” | 4503 | 1600 | 403 | 0
Codeword: “Tomatoe” | 4818 | 4600 | 218 | 0
Classification: “SI/TK/001” | 23 | 22 | 1 | 0
Actors: “Salam Ahmed” | 782 | 700 | 82 | 0
Declassification dispositions … becoming a force multiplier. The more human dispositions, the more automated dispositions.
Humans | Auto Triage
5,000 | 20
10,000 | 4,000
100,000 | 65,000
1,000,000 | 17,000,000
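Per-element statistics like these can be turned into a document-level prior. The sketch below computes each element's declassification rate (Declass ÷ Total) from the slide's figures and then applies a hypothetical, conservative rule: a document is only as releasable as its least-released element. The rule is an assumption for illustration; note also that rates from small totals (e.g., 23 observations) are noisy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementStats:
    element: str
    total: int
    declass: int
    class_default: int
    class_asserted: int

    @property
    def declass_rate(self) -> float:
        return self.declass / self.total if self.total else 0.0


# Figures as given on the slide.
STATS = [
    ElementStats('Author: "Billy K"', 4503, 1600, 403, 0),
    ElementStats('Codeword: "Tomatoe"', 4818, 4600, 218, 0),
    ElementStats('Classification: "SI/TK/001"', 23, 22, 1, 0),
    ElementStats('Actors: "Salam Ahmed"', 782, 700, 82, 0),
]


def document_prior(elements, stats=STATS):
    """Hypothetical rule: the weakest (least-declassified) known element sets the prior.

    Returns None when no element of the document has any history.
    """
    by_name = {s.element: s for s in stats}
    rates = [by_name[e].declass_rate for e in elements if e in by_name]
    return min(rates) if rates else None


if __name__ == "__main__":
    for s in STATS:
        print(f"{s.element}: {s.declass_rate:.1%}")
    print(document_prior(['Codeword: "Tomatoe"', 'Actors: "Salam Ahmed"']))
```

Each new human disposition updates these counts, which is one way the "more human dispositions, more automated dispositions" effect could arise.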

Policy Questions • What related information is already available in the public domain? • Evidence: Exists in open source • What damage might conceivably result from disclosure, and what benefits might ensue? • Evidence: Same text already released (by same equity holder)

Strawman Architecture

Strawman Architecture [Diagram components: 450M Docs; Feature Extraction & Classification; Context Accumulation; Historical Dispositions; Dirty Words; Predictions(*); Workflow System; Dispositions; Etc.] (*) Recommendations: Equity of, Disposition, Priority
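The strawman's flow (feature extraction, context from historical dispositions and dirty words, a prediction of equity, disposition and priority, then a workflow queue) can be sketched end to end. Everything here is hypothetical: whitespace tokenization stands in for feature extraction and classification, the 0.9 Green cutoff and the "weakest evidence wins" rule are assumptions, and the queue orders Yellow documents so the likelier releases reach humans first.

```python
import heapq
from dataclasses import dataclass

# Hypothetical dirty-word list; a real one would come from the equity holders.
DIRTY_WORDS = {"tomatoe"}


@dataclass
class Prediction:
    doc_id: str
    equity_of: str    # predicted equity holder
    disposition: str  # "RED", "YELLOW", or "GREEN"
    priority: float   # higher means a human sees it sooner


def extract_features(text: str) -> set:
    """Stand-in for feature extraction & classification: lowercase word tokens."""
    return set(text.lower().split())


def predict(doc_id: str, text: str, history: dict) -> Prediction:
    """history maps a feature to (equity holder, past declassification rate)."""
    feats = extract_features(text)
    if feats & DIRTY_WORDS:
        return Prediction(doc_id, "unknown", "RED", 0.0)
    known = sorted(history[f] for f in feats if f in history)
    if not known:
        # No accumulated context: leave it for a human at middling priority.
        return Prediction(doc_id, "unknown", "YELLOW", 0.5)
    # The weakest piece of evidence drives the call (an assumed, conservative rule).
    equity, rate = min(known, key=lambda pair: pair[1])
    if rate >= 0.9:
        return Prediction(doc_id, equity, "GREEN", 0.0)
    return Prediction(doc_id, equity, "YELLOW", rate)  # likelier releases first


def build_queue(predictions: list) -> list:
    """Workflow queue of YELLOW documents, highest priority first."""
    heap = [(-p.priority, p.doc_id) for p in predictions if p.disposition == "YELLOW"]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

Human dispositions from the queue would be written back into `history`, closing the loop the diagram implies.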

Another Idea: Crowd Sourcing • Can you predict which specific people have the privileges and knowledge to evaluate selected documents, so those documents can be routed to them? • Can you publish machine-triage recommendations to a wiki or other form of internal broadcast for community crowd sourcing?
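Routing a document to people with the right privileges and knowledge can be sketched as a filter followed by a ranking. This assumes, hypothetically, that each reviewer has a set of compartment accesses and a set of known topics; only reviewers cleared for every compartment the document carries are eligible, ranked by topic overlap. Reviewer names and topics in the usage are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reviewer:
    name: str
    accesses: frozenset  # compartments the reviewer is cleared for
    topics: frozenset    # subject areas the reviewer knows


def route(doc_compartments: set, doc_topics: set, reviewers: list, k: int = 3) -> list:
    """Return up to k eligible reviewers, best topic overlap first (ties by name)."""
    eligible = [r for r in reviewers if doc_compartments <= r.accesses]
    ranked = sorted(eligible, key=lambda r: (-len(doc_topics & r.topics), r.name))
    return [r.name for r in ranked[:k]]


if __name__ == "__main__":
    team = [
        Reviewer("alice", frozenset({"SI", "TK"}), frozenset({"missiles"})),
        Reviewer("bob", frozenset({"SI"}), frozenset({"missiles", "pakistan"})),
        Reviewer("carol", frozenset({"SI", "TK"}), frozenset({"pakistan", "missiles"})),
    ]
    print(route({"SI", "TK"}, {"missiles", "pakistan"}, team))  # prints ['carol', 'alice']
```

Bob knows the topics but lacks the TK access, so he is never shown the document; access is a hard filter, knowledge only a ranking.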

Another Idea: Better Classification • Using the overall declassification platform to assist in proper classification (real-time) • And, better pre-tagging to assist in future auto-declassification

Challenges

Challenges • Entity extraction is imperfect • Predictions may still not be good enough, often enough • Documents not in English • The user work surface and its distribution • Consequences of an inappropriate release • With super access and super tools, stronger audit and insider-threat protections may be needed • Your contracting cycle and the creation of the system might take until mid-2011, 2012, or 2013

Closing Thoughts

Closing Thoughts • Contextualization is essential to better prediction • There are not enough humans to ask every question every day • “Human attention directing” systems are critical to the mission • The data must find the data, the relevance must find the user

Worst Case Scenario • Rich context enables better hints for users, which results in faster dispositions • Rich context enables improved sequencing of the work

Related Blog Posts • Smart Sensemaking Systems, First and Foremost, Must be Expert Counting Systems • Data Finds Data • Puzzling: How Observations Are Accumulated Into Context • The Fast Last Puzzle Piece • Algorithms At Dead-End: Cannot Squeeze Knowledge Out Of A Pixel • How to Use a Glue Gun to Catch a Liar • It Turns Out Both Bad Data and a Teaspoon of Dirt May Be Good For You • Smart Systems Flip-Flop

Questions? • Blogging at: www.JeffJonas.TypePad.com (Information Management, Privacy, National Security and Triathlons)


The Problem at Hand • 450M documents × 5 min/document = 2.25B minutes • 2.25B minutes ÷ 60 = 37.5M hours • 37.5M hours ÷ 2,040 work hours per year = 18,382 FTEs
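The staffing arithmetic can be checked directly. The constants are the slide's own figures; reading 2,040 as annual work hours per full-time employee is an assumption about the divisor.

```python
DOCUMENTS = 450_000_000
MINUTES_PER_DOCUMENT = 5
HOURS_PER_FTE_YEAR = 2040  # the slide's divisor, read as annual work hours per FTE

minutes = DOCUMENTS * MINUTES_PER_DOCUMENT  # 2,250,000,000 minutes
hours = minutes / 60                        # 37,500,000 hours
ftes = hours / HOURS_PER_FTE_YEAR           # about 18,382.35 FTE-years

print(f"{minutes:,} minutes, {hours:,.0f} hours, {ftes:,.0f} FTE-years")
# prints 2,250,000,000 minutes, 37,500,000 hours, 18,382 FTE-years
```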
