
Sensitive Information Sweep




  1. Sensitive Information Sweep Using Cornell’s Spider Wyman Miles, Cornell University Kerry Havens, University of Colorado at Boulder Steve Lovaas, Colorado State University

  2. Overview • Quick Background • The Technical Problem (Kerry) • The Organizational Problem (Steve) • Spider (Wyman) • Summary & Questions

  3. What is “Sensitive Information”? • A Growing Concern • A Moving Target • SSN, Credit Card, Driver’s License, Medical Records, Student Information, Proprietary Research,… • Data in Context – Aggregation

  4. Why Are We All Here? • The Front Page! • CDW-G 2006 Survey – more than 3 million college students may have lost personal information in the last year. • Identity theft is the fastest growing crime in the U.S. • By far the biggest culprit? Lost or stolen computers.

  5. Regulations, Standards, & Laws • Federal – HIPAA, FERPA, Sarbanes-Oxley, GLB,… Identity Theft Protection Act? • State – Many states are passing identity theft protection laws; New York & Colorado have a state CISO • Industry – PCI DSS

  6. The Technical Problem: Finding sensitive information in a haystack Kerry Havens University of Colorado at Boulder

  7. SSN Remediation • At CU-Boulder, SSNs were used as a student identifier before 2004 • House Bill 03-1175 was approved in 2003 requiring institutions to change this method to ensure the privacy of a student’s social security number • CU-Boulder started issuing student IDs to new students in July 2004 and converting SSNs to SIDs in 2005

  8. Where the data is not stored • File type exclusions – fine tuning • Binary files where the data cannot be read • Received input from community for fine tuning • False positives • International telephone numbers • Examples for web form validation • Why is the department webpage asking for SSNs?
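
The exclusion idea on this slide (skip file types that cannot hold readable data, and skip binary files) can be sketched in Python. This is a minimal illustration, not Spider's implementation: the extension list, the NUL-byte heuristic, and the names `SKIP_EXTENSIONS`, `looks_binary`, `should_scan`, and `files_to_scan` are all assumptions made for the example.

```python
import os

# Hypothetical skip list; the slides do not give the actual list used.
SKIP_EXTENSIONS = {".exe", ".dll", ".jpg", ".png", ".zip", ".iso"}


def looks_binary(path, sample_size=4096):
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def should_scan(path):
    """Return True if the file is worth searching for sensitive data."""
    ext = os.path.splitext(path)[1].lower()
    if ext in SKIP_EXTENSIONS:
        return False
    return not looks_binary(path)


def files_to_scan(root):
    """Walk a directory tree and yield only the files that pass the filters."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if should_scan(path):
                    yield path
            except OSError:
                # Unreadable files are skipped rather than aborting the sweep.
                continue
```

In practice the skip list would be tuned over time from community feedback, as the slide describes.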

  9. OS and File Encoding Problems • HTML encoding problems • Representations (pictures) of sensitive data are not found • Examples include PDF • Searching a UNIX filesystem • Preparing the file before searching for private data • For example, using strings to extract text from text/binary hybrids like .doc or .xls
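
The preparation step mentioned here (running `strings` over text/binary hybrids before searching) can be approximated in Python. The sketch below is an assumption-laden stand-in for the UNIX utility, not a reimplementation of it: the function name `extract_strings` and the four-character minimum are choices made for the example, and only printable ASCII runs are extracted.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes found in data."""
    # Bytes 0x20 through 0x7e are the printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


# A made-up blob mixing binary bytes and text, standing in for a .doc or .xls.
blob = b"\x00\x01SSN: 521-12-3456\x00\xff\xfeab\x00Name\x00"
print(extract_strings(blob))
```

Text stored in a multi-byte encoding such as UTF-16 would not be caught by this pattern and would need separate handling.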

  10. Where the data is stored • Typical file types of discovered data • Gradebooks • Course web pages • Homework assignments • Travel authorization forms • Personal financial documents • Email

  11. Regular Expressions • Returns too much data: /\d{3}-\d{2}-\d{4}/ • Searching for environment-specific data, in the hope that common data will lead us to more data: /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/ • State-specific information can be found at http://www.ssa.gov/employer/stateweb.htm

  12. Regular Expressions • Let’s dissect this… • /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/

  13. Regular Expressions • Let’s dissect this… • /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/ • \b: Boundary

  14. Regular Expressions • Let’s dissect this… • /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/ • [0-7]: First acceptable digit

  15. Regular Expressions • Let’s dissect this… • /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/ • \d{2}, \d{4}, \d{6}: 2, 4, or 6 digits in a row

  16. Regular Expressions • Let’s dissect this… • /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/ • [-|\s]: Delimited by dash or space

  17. Regular Expressions • Let’s dissect this… • /\b([0-7]\d{2}[-|\s]\d{2}[-|\s]\d{4}|(52[1-4]|65[0-3])\d{6})\b/ • (52[1-4]|65[0-3]): Colorado specific prefix, not delimited
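
The dissection on slides 12 to 17 can be reproduced as a commented, verbose-mode pattern in Python. This is a sketch under stated assumptions rather than the presenters' code: the names `BROAD`, `NARROW`, and `find_candidates` and the sample string are invented for illustration, and the separator class is written as `[-\s]` because the `|` inside the slide's `[-|\s]` is a literal character that would also accept `|` as a delimiter.

```python
import re

# Broad pattern from slide 11: any ddd-dd-dddd run, which returns too much.
BROAD = re.compile(r"\d{3}-\d{2}-\d{4}")

# Narrowed pattern, in verbose mode so each piece can carry its own comment.
NARROW = re.compile(
    r"""
    \b                       # boundary: do not start in the middle of a token
    (?:
        [0-7]\d{2}           # three digits, the first limited to 0-7
        [-\s]\d{2}           # dash or whitespace, then a 2-digit group
        [-\s]\d{4}           # dash or whitespace, then a 4-digit group
      |
        (?:52[1-4]|65[0-3])  # Colorado-specific prefix (per the slides)
        \d{6}                # followed by 6 more digits, no delimiters
    )
    \b                       # boundary: do not end in the middle of a token
    """,
    re.VERBOSE,
)


def find_candidates(text):
    """Return every substring that the narrowed pattern flags."""
    return [match.group(0) for match in NARROW.finditer(text)]


sample = "id 521-12-3456, raw 523123456, phone 900-12-3456, part 1234-56-78901"
print(BROAD.findall(sample))
print(find_candidates(sample))
```

On this sample, the broad pattern also flags `900-12-3456` and a fragment of the part number, while the narrowed pattern keeps only the delimited number with an acceptable first digit and the undelimited number with a Colorado prefix.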

  18. CU Experiences • Pitfalls • Users’ interpretations of the log file • Fine tuning file extension exceptions and regular expressions • Recommendations • Keep current environment in mind

  19. The Organizational Problem: a really big haystack Steve Lovaas Network Security Manager Colorado State University

  20. Organizational Vision • Support from the top • Cabinet-level committee driving the project • Spurred by headlines and state mandates • VP for IT who really gets security • Campus PR campaign • Web site • Public meetings • Tied SSN purge to the rollout of a new CSUID in Fall 2006

  21. Using Resources • Project Constraints • Tight timeline • No budget • Not a trivial programming project • Buy / Build / Leverage tools? • Goal: 100% coverage vs. Best Effort • Spider chosen for Windows, Linux, Mac • Manual searching on AIX, mainframe

  22. Ultimate Responsibility • Original thought: deans / dept. heads • Revised edition: individual employees • Developed a personal attestation form for every employee to sign, submitted in bulk by colleges • More work for central IT • Senior VP: Doing the scan and signing the form is a CONDITION OF EMPLOYMENT

  23. Individual Attestation Form • Every employee • 2 choices: • I don’t interact with SSNs in the course of my job • SSNs in all electronic files under my control have been removed or encrypted • VP for IT must approve exceptions

  24. CSU Experiences • Pitfalls • Beta tool for a live project requires quick response and careful management of user expectations & acceptance • Careful of deadlines, it’s a lot of work! • Recommendations • Don’t do this kind of project without active support from the very top • Anticipate the need for analysis/parsing tools • Have a supported encryption solution for exceptions

  25. Cornell Spider Wyman Miles Sr. Security Engineer Cornell University

  26. A Brief History of Spider • Early 2005, scan Web for SSNs • Later, scan disk images for SSNs/CCNs • March 2006, debut at BU Security Camp • April 2006, Educause, demand for a Windows version • Version 1.0 in May, 2.0 in June

  27. A Brief History, II • June 2006, major feedback from Steve: bug reports, tests, feature requests • Engine developed that same month: internal incident response • OSX Spider Sept 2006 • Windows Spider rewrite • April 2007, GPL release of all Spiders

  28. Current Spider • SSN, SIN, CCN, NINO discovery in many file types • Various data type validators • Web scanning, back to its roots • Scan for data in unallocated space • Faster. More readable source
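
A data type validator reduces false positives by checking that a matched string is plausible, not just correctly shaped. As an illustration only, the sketch below applies the Luhn checksum, a widely used check for credit card numbers. Whether Spider's CCN validator works exactly this way is an assumption; the function name `luhn_valid` and the 13-digit minimum are choices made for the example.

```python
def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        # Assumed minimum length for a card number; shorter runs are rejected.
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a well-known test card number
print(luhn_valid("4111 1111 1111 1112"))  # same number with the last digit changed
```

A regular expression match that fails such a check can be dropped from the log before a person ever has to review it.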

  29. Various Spiders • Windows Spider, aka Spider3 • OSX Spider • Engine, general UNIX spider • LinSpider, our oldest version • Spider Simple: Windows Spider preconfigured to skip noisy files

  30. Future Spider • Feature set convergence between Engine, OSX, Windows • Community Development • Possible I2 hosting of distribution and documentation • More documentation! • Client-Server model revisited

  31. Spider Log

  32. Spider at Cornell • Incident response: a compromise has happened, what was at risk? • Pre-emptive • Dan Elswit, CALS Security Officer

  33. Spider in CIT • CIT abandoned SSNs a few years ago, but they remain • Tech support uses Spider Simple to discover lurking SSNs • Manual process

  34. Athletics • Spider Simple • Unique log names to network share • Centralized analysis
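
Writing each machine's log under a unique name lets many hosts report to one network share without overwriting each other. The slides do not say how the Athletics logs were named, so the scheme below (hostname plus UTC timestamp) is purely hypothetical, as are the function name `unique_log_name` and the share path in the usage line.

```python
import os
import socket
import time


def unique_log_name(share_dir, hostname=None, now=None):
    """Build a per-host, per-run log path under a shared directory."""
    host = hostname or socket.gethostname()
    # time.gmtime(None) uses the current time; pass a number to fix it.
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    return os.path.join(share_dir, f"spider_{host}_{stamp}.log")


# Hypothetical share path and host name, for illustration only.
print(unique_log_name("/mnt/spider_logs", hostname="ath-pc-01", now=0))
```

With names like these, a central analysis script can group results by host and by run without opening each file.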

  35. Spider Downloads • http://www.cit.cornell.edu/security/tools

  36. Summary • Purging sensitive information is something we’re going to have to get good at • Get support from the highest levels • Tune regular expressions and file/ext skip lists for your environment • Anticipate parsing needs, exceptions • New Spider features, more users, broader OS support • Spider also for ongoing support, forensics

  37. Questions? • Wyman Miles: • wm63@cornell.edu • Kerry Havens: • Kerry.Havens@Colorado.EDU • Steve Lovaas: • Steven.Lovaas@ColoState.EDU • The Spider users’ list: • cuspider-L@cornell.edu
