Sensitive Information Sweep Using Cornell’s Spider Wyman Miles, Cornell University Kerry Havens, University of Colorado at Boulder Steve Lovaas, Colorado State University
Overview • Quick Background • The Technical Problem (Kerry) • The Organizational Problem (Steve) • Spider (Wyman) • Summary & Questions
What is “Sensitive Information”? • A Growing Concern • A Moving Target • SSN, Credit Card, Driver’s License, Medical Records, Student Information, Proprietary Research,… • Data in Context – Aggregation
Why Are We All Here? • The Front Page! • CDW-G 2006 Survey – more than 3 million college students may have lost personal information in the last year. • Identity theft is the fastest growing crime in the U.S. • By far the biggest culprit? Lost or stolen computers.
Regulations, Standards, & Laws • Federal – HIPAA, FERPA, SarbOx, GLB,… Identity Theft Protection Act? • State – Many states are passing identity theft protection laws; New York & Colorado have a state CISO • Industry – PCI DSS
The Technical Problem: Finding sensitive information in a haystack Kerry Havens University of Colorado at Boulder
SSN Remediation • At CU-Boulder, SSNs were used as student identifiers before 2004 • Colorado House Bill 03-1175, approved in 2003, required institutions to change this practice to protect the privacy of students’ Social Security numbers • CU-Boulder began issuing student IDs (SIDs) to new students in July 2004 and converting SSNs to SIDs in 2005
Where the data is not stored • File type exclusions – fine-tuning • Binary files where the data cannot be read • Input from the community helped with fine-tuning • False positives • International telephone numbers • Examples for web form validation • Why is the department webpage asking for SSNs?
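The file type exclusions above amount to an extension filter applied while walking a directory tree. This is a minimal Python sketch, not Spider's code: SKIP_EXTENSIONS, should_scan, and files_to_scan are hypothetical names, and the right skip list depends on which file types actually produce unreadable binaries or noise in your environment.

```python
import os
from pathlib import Path

# Hypothetical starting skip list: extensions assumed to hold binary or
# noisy data. Tune it from the false positives you actually see.
SKIP_EXTENSIONS = {".exe", ".dll", ".iso", ".jpg", ".png", ".mp3"}


def should_scan(filename: str, skip: set[str] = SKIP_EXTENSIONS) -> bool:
    """True unless the file's extension (case-insensitive) is on the skip list."""
    return Path(filename).suffix.lower() not in skip


def files_to_scan(root: str, skip: set[str] = SKIP_EXTENSIONS):
    """Walk root and yield only the paths that survive the skip list."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if should_scan(name, skip):
                yield os.path.join(dirpath, name)
```

Every extension added to the list trades coverage for speed and fewer false positives, which is why the list needs community feedback rather than a one-time decision.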
OS and File Encoding Problems • HTML encoding problems • Images (pictures) of sensitive data are not found • Examples include PDF • Searching a UNIX filesystem • Preparing the file before searching for private data • For example, using the UNIX strings utility to extract text from text/binary hybrids like .doc or .xls
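Preparing a hybrid file the way strings does can be approximated in Python by pulling out runs of printable characters. A minimal sketch, assuming the strings default minimum of 4 characters; extract_strings is a hypothetical name. It also collects UTF-16LE runs, because legacy Office formats can store text that way and a plain ASCII scan would miss it (GNU strings offers -e l for the same case).

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Rough analogue of `strings`: return readable text found in binary data.

    Collects runs of at least min_len printable ASCII bytes, then runs of
    UTF-16LE characters (a printable ASCII byte followed by a zero byte).
    """
    ascii_run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_run = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_run.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_run.finditer(data)]
    return found
```

The extracted strings can then be fed to the same regular expressions used for plain text files.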
Where the data is stored • Typical file types of discovered data • Gradebooks • Course web pages • Homework assignments • Travel authorization forms • Personal financial documents • Email
Regular Expressions • The naive pattern returns too much data: /\d{3}-\d{2}-\d{4}/ • Instead, search for environment-specific data, in the hope that finding the locally common records will lead us to more: /\b([0-7]\d{2}[-\s]\d{2}[-\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/ • State-specific information can be found at http://www.ssa.gov/employer/stateweb.htm
Regular Expressions • Let’s dissect this… • /\b([0-7]\d{2}[-\s]\d{2}[-\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/ • \b: boundary • [0-7]: first acceptable digit • \d{2}, \d{4}, \d{6}: 2, 4, or 6 digits in a row • [-\s]: delimited by dash or space • (52[1-4]|65[0-3]): Colorado-specific prefix, not delimited
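The dissected pattern can be tried directly with Python's re module. This sketch writes the delimiter class as [-\s] rather than [-|\s]: inside a character class a | is a literal pipe, so the original would also accept 123|45|6789. It also omits the spaces around the top-level |, which would otherwise have to match literally. find_ssn_candidates and the sample text are hypothetical.

```python
import re

# Alternative 1: delimited SSN whose first digit is 0-7.
# Alternative 2: nine undelimited digits starting with a Colorado
# prefix (521-524 or 650-653).
CO_SSN = re.compile(r"\b([0-7]\d{2}[-\s]\d{2}[-\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b")


def find_ssn_candidates(text: str) -> list[str]:
    """Return every substring of text matched by the pattern (outer group)."""
    return [m.group(1) for m in CO_SSN.finditer(text)]


sample = "ids: 123-45-6789, 523123456, 999-12-3456, 303-555-1234, 123456789"
print(find_ssn_candidates(sample))  # → ['123-45-6789', '523123456']
```

Note that the 3-3-4 phone number and the undelimited non-Colorado nine-digit string are not reported; the price is that SSNs from other states written without delimiters are missed too.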
CU Experiences • Pitfalls • Users’ interpretations of the log file • Fine-tuning file extension exceptions and regular expressions • Recommendations • Keep your current environment in mind
The Organizational Problem: a really big haystack Steve Lovaas Network Security Manager Colorado State University
Organizational Vision • Support from the top • Cabinet-level committee driving the project • Spurred by headlines and state mandates • VP for IT who really gets security • Campus PR campaign • Web site • Public meetings • Tied SSN purge to the rollout of a new CSUID in Fall 2006
Using Resources • Project Constraints • Tight timeline • No budget • Not a trivial programming project • Buy / Build / Leverage tools? • Goal: 100% coverage vs. Best Effort • Spider chosen for Windows, Linux, Mac • Manual searching on AIX, mainframe
Ultimate Responsibility • Original thought: deans / dept. heads • Revised edition: individual employees • Developed a personal attestation form for every employee to sign, submitted in bulk by colleges • More work for central IT • Senior VP: Doing the scan and signing the form is a CONDITION OF EMPLOYMENT
Individual Attestation Form • Every employee • 2 choices: • I don’t interact with SSNs in the course of my job • SSNs in all electronic files under my control have been removed or encrypted • VP for IT must approve exceptions
CSU Experiences • Pitfalls • Beta tool for a live project requires quick responses and careful management of user expectations & acceptance • Be careful with deadlines; it’s a lot of work! • Recommendations • Don’t do this kind of project without active support from the very top • Anticipate the need for analysis/parsing tools • Have a supported encryption solution for exceptions
Cornell Spider Wyman Miles Sr. Security Engineer Cornell University
A Brief History of Spider • Early 2005, scan Web for SSNs • Later, scan disk images for SSNs/CCNs • March 2006, debut at BU Security Camp • April 2006, Educause, demand for a Windows version • Version 1.0 in May, 2.0 in June
A Brief History, II • June 2006, major feedback from Steve: bug reports, tests, feature requests • Engine developed that same month: internal incident response • OSX Spider Sept 2006 • Windows Spider rewrite • April 2007, GPL release of all Spiders
Current Spider • SSN, SIN, CCN, NINO discovery in many file types • Various data type validators • Web scanning, back to its roots • Scan for data in unallocated space • Faster. More readable source
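The slide does not say which checks Spider's validators perform. As an illustration only, here are two common ones in Python: the Luhn checksum that valid card numbers satisfy, and the SSA structural rules for SSNs (no 000, 666, or 9xx area number, no 00 group, no 0000 serial). The 13 to 19 digit length bound for card numbers is an assumption, and both function names are hypothetical.

```python
DIGITS = "0123456789"


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of number (separators are ignored)."""
    digits = [int(c) for c in number if c in DIGITS]
    if not 13 <= len(digits) <= 19:  # assumed card-number length range
        return False
    total = 0
    # Double every second digit counting from the rightmost (check) digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def ssn_plausible(ssn: str) -> bool:
    """Reject SSN-shaped strings that break SSA numbering rules."""
    digits = "".join(c for c in ssn if c in DIGITS)
    if len(digits) != 9:
        return False
    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"
```

A regex hit that fails these checks can be dropped as a likely false positive before it reaches the log, which keeps logs shorter for the users who must interpret them.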
Various Spiders • Windows Spider, aka Spider3 • OSX Spider • Engine, general UNIX spider • LinSpider, our oldest version • Spider Simple: Windows Spider preconfigured to skip noisy files
Future Spider • Feature set convergence between Engine, OSX, Windows • Community Development • Possible I2 hosting of distribution and documentation • More documentation! • Client-Server model revisited
Spider at Cornell • Incident response: a compromise has happened, what was at risk? • Pre-emptive • Dan Elswit, CALS Security Officer
Spider in CIT • CIT abandoned SSNs a few years ago, but they remain • Tech support uses Spider Simple to discover lurking SSNs • Manual process
Athletics • Spider Simple • Unique log names to network share • Centralized analysis
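Centralized analysis is easier when every machine's log lands on the share under a name that cannot collide. The slide does not say how Athletics builds its log names; this Python sketch assumes host name, user name, and timestamp are enough, and unique_log_name, log_name_for_this_machine, and the spider prefix are hypothetical. Copying the file to the network share is left out.

```python
import getpass
import re
import socket
from datetime import datetime


def unique_log_name(host: str, user: str, when: datetime, prefix: str = "spider") -> str:
    """Build a log file name that will not collide with other machines' logs."""

    def safe(s: str) -> str:
        # Replace characters that are awkward in Windows/SMB file names.
        return re.sub(r"[^A-Za-z0-9.-]", "_", s)

    return f"{prefix}_{safe(host)}_{safe(user)}_{when:%Y%m%d-%H%M%S}.log"


def log_name_for_this_machine(prefix: str = "spider") -> str:
    """Convenience wrapper using the local host, current user, and time."""
    return unique_log_name(socket.gethostname(), getpass.getuser(), datetime.now(), prefix)
```

Because the host and user are encoded in the name, an analysis script on the share can group findings per machine or per person without opening every file.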
Spider Downloads • http://www.cit.cornell.edu/security/tools
Summary • Purging sensitive information is something we’re going to have to get good at • Get support from the highest levels • Tune regular expressions and file/ext skip lists for your environment • Anticipate parsing needs, exceptions • New Spider features, more users, broader OS support • Spider also for ongoing support, forensics
Questions? • Wyman Miles: • wm63@cornell.edu • Kerry Havens: • Kerry.Havens@Colorado.EDU • Steve Lovaas: • Steven.Lovaas@ColoState.EDU • The Spider users’ list: • cuspider-L@cornell.edu