Visual Link Analysis

Visual Link Analysis. Christopher R. Westphal Visual Analytics Inc www.visualanalytics.com. Christopher R. Westphal. CEO of Visual Analytics, Inc (VAI) Over 22 years of experience Experienced with a wide number of domains: Financial Crimes Money Laundering

Presentation Transcript


  1. Visual Link Analysis. Christopher R. Westphal, Visual Analytics Inc, www.visualanalytics.com

  2. Christopher R. Westphal: CEO of Visual Analytics, Inc (VAI). Over 22 years of experience. Experienced with a wide range of domains: Financial Crimes, Money Laundering, Frauds (corporate/insurance), Law Enforcement (RMS), Intelligence.

  3. Prerequisites for Analysis. It’s not rocket science… It’s not nuclear science… …but it does require some intelligence. Need to use your noodle. Need to think “outside the box” (1 + 1 = a). Need to know your data! Need to learn new techniques.

  4. There are no roadmaps to follow... …or existing references to use... …you have to “make-it-up” as you go along.

  5. What does… …a money launderer look like? …a criminal look like? …an insider trader look like? …a genetic disorder in DNA look like? …a manufacturing defect look like? …a terrorist look like?

  6. Interpretation Methods. Typing: 1; Reading: 40; Hearing: 60; Visual: 12,500. • Leverages the human facility to process visual information • 312 times more efficient than reading text

  7. Want to Expose Patterns & Trends. Need to use the right visual presentation for exposing patterns. Many times the pattern is not obvious, and using alternative presentations can help expose the anomaly. How will business processes change once the patterns are found? It’s like trying to find a needle in a haystack.

  8. Finding Patterns in the Data. The method of data presentation is key to exposing hidden patterns. [Figure: Pattern #1, Pattern #2, Pattern #3]

  9. Detecting Patterns: Placement Influences Interpretation. [Figure: the same X, Y, and Z nodes drawn in three different layouts.] Database table:

     Field-X  Field-Y  Field-Z
     X1       Y1       Z1
     X2       Y1       Z2
     X3       Y2       Z2
     X4       Y2       Z3
     X5       Y3       Z3
     X5       Y3       Z4

  10. Example: Unexpected Commonality • Certain “entities” should never be shared (e.g., SSNs) • Data prone to typos and misspellings • Possible misrepresentation and/or falsifying data on forms • Appearance of avoidance by varying information
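
A minimal sketch of the "should never be shared" check, assuming the data can be reduced to (subject, identifier) pairs. The records and the function name `shared_identifiers` are made up for illustration and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical (subject name, SSN) pairs, invented for this example.
records = [
    ("Alice Jones", "100-00-0001"),
    ("Alicia Jones", "100-00-0001"),
    ("Bob Smith", "100-00-0002"),
    ("Carl Reyes", "100-00-0001"),
]


def shared_identifiers(pairs):
    """Return identifiers that are linked to more than one distinct subject."""
    subjects_by_id = defaultdict(set)
    for subject, identifier in pairs:
        subjects_by_id[identifier].add(subject)
    return {
        identifier: sorted(subjects)
        for identifier, subjects in subjects_by_id.items()
        if len(subjects) > 1
    }


shared = shared_identifiers(records)
print(shared)
```

Whether a shared value reflects a typo, a misspelling, or deliberate misrepresentation still has to be judged by the analyst; the code only surfaces the candidates.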

  11. Example: Too Much Commonality • Many patterns are exposed due to repeating behaviors • Too many commonalities may indicate organized behaviors • Subjects perpetrate the same crime at different financial institutions • Only minor changes in their underlying Modus Operandi (MO)

  12. Example: Accumulated Behaviors • Each unique filing looks valid; need to see it collectively (all at once) • Large numbers of discrete actions form the bigger pattern • Easy to avoid detection if each transaction appears legitimate • Individual may be using mules to move money in/out of the accounts
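
One way to look at discrete actions collectively is to total each subject's individually unremarkable transactions. The sketch below assumes a $10,000 threshold and invented (subject, amount) rows; both are illustrative assumptions, not values taken from the slide.

```python
from collections import defaultdict

THRESHOLD = 10_000  # assumed per-transaction threshold, in dollars

# Hypothetical (subject, amount) rows, invented for this example.
transactions = [
    ("subject_a", 9_500), ("subject_a", 9_800), ("subject_a", 9_200),
    ("subject_b", 4_000),
    ("subject_c", 12_000),
]


def accumulated_below_threshold(txns, threshold=THRESHOLD):
    """Total each subject's sub-threshold transactions and keep the
    subjects whose combined total reaches the threshold."""
    totals = defaultdict(lambda: [0, 0])  # subject -> [count, total]
    for subject, amount in txns:
        if amount < threshold:
            totals[subject][0] += 1
            totals[subject][1] += amount
    return {
        subject: (count, total)
        for subject, (count, total) in totals.items()
        if total >= threshold
    }


flagged = accumulated_below_threshold(transactions)
print(flagged)
```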

  13. What are Data? Which of the following are Real-World Objects and which are Conceptual Objects? Organizations, People, Accounts, Weapons, Vehicles, Claims, Addresses, Comms, Passports, ID Numbs, Vessels, Aircraft, Money, Phones, Meetings, Facilities, Transfers, Events, Drugs, Narcotics, Email, Equipment, Cases, Travel.

  14. Be Consistent with Defining Types. Define the type for the entity, not the role. • Would not want to create: • Caller/Callee • Deposit/Withdrawal • From/To • Arrival/Destination • Shipper/Consignee • Seller/Buyer • Prime/Sub • Payor/Payee • Sender/Receiver • Owner/Renter. Example types: Phone, People, Vehicle, Address.
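
A small data-model sketch of "type, not role", under the assumption that roles can be stored on the link between two entities. The class names `Entity` and `Link`, the field names, and the phone numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_type: str  # what the thing is, e.g. "Phone" or "Person"
    value: str


@dataclass(frozen=True)
class Link:
    source: Entity
    target: Entity
    role: str  # the role or direction lives on the link, e.g. "called"


# The same phone is a caller in one record and a callee in another,
# yet it stays a single "Phone" entity.
phone_a = Entity("Phone", "555-0100")
phone_b = Entity("Phone", "555-0199")
calls = [
    Link(phone_a, phone_b, "called"),
    Link(phone_b, phone_a, "called"),
]

entities = {e for link in calls for e in (link.source, link.target)}
print(len(entities))
```

Had the entities been typed as Caller and Callee, the two records would have produced four nodes instead of two, hiding the fact that the calls go both ways between the same pair of phones.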

  15. Technologies vs. Methodologies • Link Visualization is a tool just as Microsoft Word is a tool • Link Analysis does not replace the knowledge of the user • Improves efficiency • Produces better and higher-quality results • Link Analysis does exactly what it is told to do • Link Analysis makes data explicit • Methodology drives the technology • Need to fully understand your data • Need to have an expectation of what you want to see

  16. Connect The Dots…. Simple…Easy…Straightforward…???? • What happens if you… • Don’t know the dots? • Have missing/extra dots? • Mess-up the sequence? • Don’t recognize the threat? Pattern #1: Enters country on student visa; Attends flight-training school; Indirect connections to known terrorist. Pattern #2: Commercial driver’s license; Apply for chemical-hauling permits; Purchase storage containers; Rent transport trucks. …Pattern #X… [Figure: connect-the-dots puzzle with dots labeled by letters and digits.]

  17. Is this an Important Pattern? Depends on Context. Depends on Content. Depends on Data Quality. Depends on Interpretation. Depends on Sources. Depends on Importance. [Figure labels: UNKNOWN, 999-99-9999, 111-11-1111, 000-00-0000, 123-45-6789, NOT DEFINED; Betty, Ronald, John, Ronny, Roger, Ronnie, Pam, Ron, Dutch-Boy George, The Gipper, Mary; 480-07-7456 (shown twice).]
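
Values such as UNKNOWN or 000-00-0000 connect many unrelated records, so one option is to exclude them before building links. The sketch below uses the placeholder values shown on the slide; the function names and the sample input are assumptions for illustration.

```python
# Placeholder values taken from the slide.
PLACEHOLDER_VALUES = {
    "UNKNOWN", "NOT DEFINED",
    "999-99-9999", "111-11-1111", "000-00-0000", "123-45-6789",
}


def is_placeholder(value):
    """True if the value, ignoring case and extra spaces, is a known placeholder."""
    return " ".join(value.upper().split()) in PLACEHOLDER_VALUES


def linkable(values):
    """Keep only the values that are safe to use as link keys."""
    return [v for v in values if not is_placeholder(v)]


# Hypothetical input values.
sample = ["321-54-9876", "unknown", "000-00-0000", " Not  Defined "]
print(linkable(sample))
```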

  18. Is this an Important Pattern? What about this pattern? JOHN SMITH. Common values can short-circuit a network and potentially lead to inconsistencies. Therefore, it is important to decide how to represent your entities and try to manage the “lowest common denominator”. Also need to factor in the degree of transpositions to determine if it reflects an intentional misrepresentation of the facts.
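
As a rough illustration of checking for transpositions, the sketch below tests whether two values differ by exactly one swap of adjacent characters. The function name and the sample digits are hypothetical, and a single swap is only one possible way to measure the "degree" of transposition.

```python
def is_adjacent_transposition(a, b):
    """True if b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


# Hypothetical identifiers.
print(is_adjacent_transposition("123456789", "123465789"))
print(is_adjacent_transposition("123456789", "123456780"))
```

A single swap on one form may be a typo; the same subject showing several different swaps across forms is the kind of pattern the slide suggests examining for intent.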

  19. Is this a Reliable Pattern? [Figure: SUBJECT, PHONE, and ADDRESS entities linked to dated EVENT nodes (01/01/02, 01/05/02, 01/11/02, 05/16/05, 06/03/05, 06/27/05, 07/01/05), grouped as Pattern #1, Pattern #2, and Pattern #3.]

  20. Do these Patterns Make Sense? [Figure: Pattern #1 and Pattern #2, built from SSN, ID NUMBER, ORG, PHONE, and ADDRESS entities.] REPORT SUBJECT = SS Death Master Hit

  21. Is this an Important Pattern? 222334444 INVALID

  22. Which Pattern Is More Valuable? [Figure: Pattern #1 and Pattern #2, two networks of ID NUMBER, PHONE, SUBJECT, ADDRESS, REPORT, and SSN entities.]

  23. What Does this Pattern Tell Us?

  24. Who is the Most Important Person?

  25. Methodologies – What’s Important?!?! A single SAR with a large number of SUBJECTS typically indicates some type of fraud scheme. A SUBJECT has numerous SAR filings utilizing the same ACCOUNT number. [Figure labels: Data, Result Sets]
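
Both patterns on this slide reduce to counting. A minimal sketch, assuming filings can be flattened into (SAR id, subject, account) rows; the rows and function names are invented for illustration.

```python
from collections import Counter

# Hypothetical (sar_id, subject, account) rows.
filings = [
    ("SAR-1", "S1", "ACCT-9"), ("SAR-1", "S2", "ACCT-9"), ("SAR-1", "S3", "ACCT-9"),
    ("SAR-2", "S1", "ACCT-9"),
    ("SAR-3", "S1", "ACCT-9"),
    ("SAR-4", "S4", "ACCT-7"),
]


def subjects_per_sar(rows):
    """Count distinct subjects named on each SAR."""
    seen = {(sar, subject) for sar, subject, _ in rows}
    return Counter(sar for sar, _ in seen)


def sars_per_subject_account(rows):
    """Count distinct SARs for each (subject, account) combination."""
    seen = {(sar, subject, account) for sar, subject, account in rows}
    return Counter((subject, account) for _, subject, account in seen)


by_sar = subjects_per_sar(filings)
by_pair = sars_per_subject_account(filings)
print(by_sar.most_common(1))
print(by_pair.most_common(1))
```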

  26. What Are We Looking For? Single Source Multiple Sources TELEPHONE TERRORISM CRIMINAL

  27. Source Integration Patterns: Property/Income Overlap. Income < $10k with Property > $500k (non-compliant). No Income with Any Property (non-filers).
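
The two overlap rules can be sketched as a join between an income source and a property source. The person keys, amounts, and the function name `integrate` are hypothetical; the $10k and $500k cut-offs are the ones on the slide.

```python
# Hypothetical sources, both keyed by a person identifier.
income = {"P1": 8_000, "P2": 95_000}
property_value = {"P1": 650_000, "P2": 700_000, "P3": 120_000}


def integrate(income_by_person, property_by_person):
    """Split property owners into non-compliant and non-filer lists."""
    non_compliant, non_filers = [], []
    for person, value in property_by_person.items():
        if person not in income_by_person:
            # Owns property but has no income record at all.
            non_filers.append(person)
        elif income_by_person[person] < 10_000 and value > 500_000:
            # Reported income looks too low for the property held.
            non_compliant.append(person)
    return sorted(non_compliant), sorted(non_filers)


non_compliant, non_filers = integrate(income, property_value)
print(non_compliant, non_filers)
```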

  28. Entity Resolution: Are they the same entity? Records drawn from Source A, Source B, and Source C: Jonathan Q. Adams, Boston, MA, 05/29/1968, DL:54321-123; John Quincy Adams, 123 Main Street, Boston, MA, 774-207-0000; Quincy Adams, Bedford, MA, 774-207-0000, 12/05/1965. Techniques: Tokenizing, Standardization, Normalizing, Aliasing, Value-add, Permutation. Anonymous Resolution: md5_128bit = d35ecc61e4cc6810913e5de7fcb5931c
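
A rough sketch of normalizing a value and then hashing it so that sources can compare keys without exchanging the raw value. The normalization rules here are assumptions, and the digest shown on the slide is not reproduced because its input is not given.

```python
import hashlib
import re


def normalize(value):
    """Uppercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", value.upper())
    return " ".join(cleaned.split())


def anonymous_key(value):
    """Return the 128-bit MD5 digest (as hex) of the normalized value."""
    return hashlib.md5(normalize(value).encode("utf-8")).hexdigest()


print(normalize("774-207-0000"))
print(anonymous_key("Boston, MA") == anonymous_key("boston  ma"))
```

Hashing only helps after normalization: two spellings that normalize differently (for example "Jonathan Q. Adams" and "John Quincy Adams") still produce different keys, which is where aliasing and permutation come in.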

  29. Sample Network - Unchanged. Multiple references to the same people based on spelling variations in their names. Different colored boxes show the like/similar entities: MARIA, DAVID, EDISON.

  30. Same Network - Consolidated. Same exact data and information being displayed from the previous diagram. Reduced the network from 14 entities to 3 entities. Much more readable and comprehensible. Proper data-cleaning is important for highly-variable data entry processes. Larger frequency between Edison and David. Bi-directional flow between Edison and David. Transfers flow only from Edison to Maria. No transfer between David and Maria.
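
Consolidation can be sketched as mapping each spelling variant to a canonical name and then re-counting the links. The variants in `ALIASES` and the transfer list are invented; the slide does not list the actual spellings.

```python
from collections import Counter

# Hypothetical spelling variants mapped to canonical names.
ALIASES = {"EDISSON": "EDISON", "DAVE": "DAVID", "MARYA": "MARIA"}

# Hypothetical (sender, receiver) transfers as entered.
transfers = [
    ("EDISSON", "DAVID"), ("EDISON", "DAVE"),
    ("DAVID", "EDISON"),
    ("EDISON", "MARIA"), ("EDISON", "MARYA"),
]


def consolidate(links, aliases):
    """Rewrite both ends of each link to canonical names and count the links."""
    return Counter(
        (aliases.get(src, src), aliases.get(dst, dst)) for src, dst in links
    )


network = consolidate(transfers, ALIASES)
nodes = {name for pair in network for name in pair}
print(sorted(nodes))
print(network)
```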

  31. Network Structures – Interpretation. Highly centric network: shows either source/sink behavior and strong influence/control over the network. Vulnerable and easy to monitor and seize assets. Network may be alien smuggling or various fraudulent activities. More interconnected network: more interconnected nodes provide less overall control over the network. Multiple players act in a distributed fashion, which makes the network harder to monitor or disrupt because there are multiple targets of interest. Network may be narcotics trafficking or gambling operations. Highly distributed structure: shows limited control or oversight across the network. No single control point, and the network can easily reconstitute using alternative entities. Hard to track and trace. Network may be terrorist financing.
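
One way to put a number on "highly centric" versus "distributed" is Freeman's degree centralization, which is 1.0 for a pure star and 0.0 when every node has the same degree. This measure is an illustrative choice, not something the presentation prescribes, and the two edge lists are made up.

```python
from collections import defaultdict


def degree_centralization(edges):
    """Freeman degree centralization of an undirected graph (needs 3+ nodes)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    degrees = [len(adjacent) for adjacent in neighbors.values()]
    n = len(degrees)
    if n < 3:
        raise ValueError("need at least three nodes")
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


# Hypothetical networks: a hub with four spokes, and a five-node ring.
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]

print(degree_centralization(star))
print(degree_centralization(ring))
```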

  32. Data Quality Impacts Analyses. City values as entered in the data: MANHA HAN MANHANHATTAN MANHANTAN MANHANTHAN MANHANTTAN MANHATAN MANHATTAN MANHATTAN K MANHATTAN N Y MANHATTAN NY MANHATTEN MANHATTON MEW YORK N Y NEW Y ORK NEW Y ORK NEW YO RK NEW YOEK NEW YOIK NEW YOK NEW YOKR NEW YOORK NEW YOR NEW YOR K NEW YOR, NEW YORJ NEW YORK NEW YORK 10017-1011 NEW YORK 10031 NEW YORK 725 NEW YORK 806 NEW YORK 806 NEW YORK 987 NEW YORK BK NEW YORK CITY NEW YORK N NEW YORK NEW YORK NEW YORK NY NEW YORK NY NEW YORK NY 10001 NEW YORK NY 10002 NEW YORK NY 10009 NEW YORK NY 10016 NEW YORK NY 10018 NEW YORK NY 10019 NEW YORK NY 10022 NEW YORK NY 10023 NEW YORK NY 10028 NEW YORK NY 10029 NEW YORK NY 10036 NEW YORK NY 10036-3619 NEW YORK QUEENS NEW YORK ROOSEVELT ISLAND NEW YORK STATE NEW YORK Y NEW YORK, NEW YORK, NEW YORK, NEW YORK NEW YORK, NY NEW YORK, NY 10017 NEW YORKCITY NEW YORKD NEW YORKE NEW YORKJ NEW YORKK NEW YORKQ NEW YORKS NEW YORKY NEW YORK| NEW YORL NEW YORY NEW YOTK NEW YOUR NEW YOURK NEW YOYK NEW YRK NEW YROK NEWYORK NY NY NY NY PLAZA NYC Y
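
A small sketch of cleaning such values by fuzzy-matching each one against a short list of canonical names, using `difflib` from the Python standard library. The canonical list, the 0.8 cutoff, and the function name are assumptions; values with ZIP codes or borough names appended would need extra parsing first.

```python
import difflib

# Assumed canonical values to match against.
CANONICAL = ["NEW YORK", "MANHATTAN"]


def canonical_city(raw, cutoff=0.8):
    """Return the closest canonical city name, or None if nothing is close."""
    cleaned = " ".join(raw.upper().split())
    matches = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for value in ["NEW YOEK", "MANHATTEN", "NEWYORK"]:
    print(value, "->", canonical_city(value))
```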

  33. Question? What country is represented by the code SA ? What country is represented by the code ZA ?

  34. Example – Structuring Dentist

  35. Example – Structuring Same Address

  36. Example – Structuring Dental Practice

  37. Example – Structuring More Structuring Date (2004)

  38. Example – Structuring Original Filing (over $10k)

  39. Elderly Abuse Pattern… SAR-MSB where SUBJECT DOB < 1930 Notice anything in common among these SAR-MSBs?

  40. Analysis of the Warranty Data. Show all vehicles with fewer than 100 miles: brand new cars; many will still be on the dealership lot; not realistic mileage for general service repairs. Show labor only: entire cost is based on mechanic time; no parts replaced (not traceable); no work was outsourced (not external). Extracted Set, Repair Type, Review Details: Cigarette Lighters, 1 hour - $45, 1 hour - $60, 1 hour - $50, 1 hour - $65, 1 hour - $45.
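
The two filters can be sketched as a single pass over the claims. The field names and the sample claims are hypothetical; only the under-100-miles and labor-only criteria come from the slide.

```python
# Hypothetical warranty claims.
claims = [
    {"claim": 1, "miles": 42, "labor": 45.0, "parts": 0.0, "outsourced": 0.0},
    {"claim": 2, "miles": 18_500, "labor": 60.0, "parts": 0.0, "outsourced": 0.0},
    {"claim": 3, "miles": 12, "labor": 50.0, "parts": 35.0, "outsourced": 0.0},
    {"claim": 4, "miles": 77, "labor": 65.0, "parts": 0.0, "outsourced": 0.0},
]


def labor_only_on_new_vehicles(rows, max_miles=100):
    """Keep claims on near-new vehicles where the whole cost is labor."""
    return [
        row for row in rows
        if row["miles"] < max_miles
        and row["labor"] > 0
        and row["parts"] == 0
        and row["outsourced"] == 0
    ]


extracted = labor_only_on_new_vehicles(claims)
print([row["claim"] for row in extracted])
```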

  41. Example – Geospatial Filings. All SAR forms filed by banks in Howard County, Maryland. Filing years 2004 – 2005. Approximately 300 filings. Group by CITY/STATE of the subject.

  42. Example – Geospatial Filings. Geo-encoded the centroid of the ZIP code (centroid = approximate middle of the region). Populated GIS viewer with results of encoded addresses.

  43. Example – Geospatial Filings • Filtered out any addresses associated with SAR transactions below $100k • Heavy concentration along I-95 corridor

  44. Example – Geospatial Filings • Zoom-in to the map • Highlight the boundaries for Howard County • Notice: all but a few of the addresses fall outside the county

  45. I-94 - Arrival-Departure Record. Fields: 1. Family Name; 2. First (Given) Name; 3. Birth Date (Day/Month/Year); 4. Country of Citizenship; 5. Sex (Male or Female); 6. Passport Number; 7. Airline and Flight Number; 8. Country Where You Live; 9. City Where You Boarded; 10. City Where Visa Was Issued; 11. Date Visa Issued (Day/Month/Year); 12. Address While in the United States; 13. City and State; 14. Family Name; 15. First (Given) Name; 16. Birth Date (Day/Month/Year); 17. Country of Citizenship. Entity types: FLIGHT, PASSPORT, EVENT, SUBJECT, ADDRESS.

  46. I-94 – Multiple Passport Numbers. Identified a courier with over 50 different passport numbers for over 200 travel events. Generated a timeline to show the number was changed in July (Mexican Passport).
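
Finding a traveler with many passport numbers is a distinct-count per subject. The sketch below assumes travel events reduced to (traveler, passport number) pairs; the data, the threshold, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical travel events: (traveler, passport_number).
events = [
    ("T1", "P-001"), ("T1", "P-002"), ("T1", "P-003"), ("T1", "P-001"),
    ("T2", "P-100"), ("T2", "P-100"),
]


def multi_passport_travelers(rows, min_passports=3):
    """Return travelers who used at least min_passports distinct passport numbers."""
    passports = defaultdict(set)
    for traveler, passport in rows:
        passports[traveler].add(passport)
    return {
        traveler: len(numbers)
        for traveler, numbers in passports.items()
        if len(numbers) >= min_passports
    }


flagged_travelers = multi_passport_travelers(events)
print(flagged_travelers)
```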

  47. I-94 – Multiple Passport Numbers 1) Reverse look-up on address 2) Identified a courier business 3) Expanded to show other I-94 targets

  48. What Type of Data are in a Date? DOW - Day of Week; DOM - Day of Month; DOY - Day of Year; DOQ - Day of Quarter; WOQ - Week of Quarter; WOY - Week of Year; WOM - Week of Month; MOY - Month of Year; QTR-f - Quarter Fiscal; QTR-c - Quarter Calendar; Season/Holiday/Leap Year. Example, JULY 4, 1976: 1976, July, 7th MoY, 4th DoM, Sunday, 2nd DoW, 2nd WoM, 186 DoY, 28 WoY, 3rd QTR-c, 4th QTR-f, Summer, Leap Year, Holiday-US, Banks Closed. What do these dates have in common? 2000 March 7; 2001 February 27; 2002 February 12; 2003 March 4; 2004 February 24; 2005 February 8; 2006 February 28; 2007 February 20; 2008 February 5; 2009 February 24; 2010 February 16.
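
Several of these date attributes can be derived with the Python standard library, as in the sketch below. The function and key names are assumptions. Week-of-year, week-of-month, fiscal quarter, season, and holiday are left out because they depend on conventions (week start day, fiscal calendar, jurisdiction) that the slide does not spell out.

```python
import calendar
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def date_features(d):
    """Break a date into a few of the attributes listed on the slide."""
    return {
        "day_of_week": WEEKDAY_NAMES[d.weekday()],
        "day_of_month": d.day,
        "day_of_year": d.timetuple().tm_yday,
        "month_of_year": d.month,
        "calendar_quarter": (d.month - 1) // 3 + 1,
        "leap_year": calendar.isleap(d.year),
    }


features = date_features(date(1976, 7, 4))
print(features)
```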

  49. Temporal – SAR Filings (Week of Year by Day of Week). Very regular filing (1 per month, except Dec). Represents a very consistent filing behavior. Dates for the first ½ year occurred more often on Mondays. Unusual change for the remainder of the year: jumps around a bit. Reflects the filing behavior of the financial institution.

  50. Temporal – SARC Filings. SAR-CASINO filings that clearly show the individual tends to prefer weekend and holiday gambling time frames. Tells us he is “employed” since he is working during the week. Inactive times correspond to known work periods. [Timeline labels: Long weekend; Period of inactivity; Holiday – 4th July; Long weekend (Labor Day); Period of inactivity.] Holiday break timeframe is quite active and includes 12/25.
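
A weekend preference like this one can be quantified as the share of activity dates that fall on Saturday or Sunday. The dates below are invented for illustration (the filing year is not given on the slide), and holiday handling is omitted.

```python
from datetime import date


def weekend_share(dates):
    """Fraction of the dates that fall on a Saturday or Sunday."""
    if not dates:
        return 0.0
    weekend = sum(1 for d in dates if d.weekday() >= 5)  # 5 = Sat, 6 = Sun
    return weekend / len(dates)


# Hypothetical activity dates.
activity = [
    date(2005, 7, 2),  # Saturday
    date(2005, 7, 3),  # Sunday
    date(2005, 7, 4),  # Monday
    date(2005, 9, 3),  # Saturday
]
share = weekend_share(activity)
print(share)
```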
