Email Archiving Arvind Srinivasan Gaurav Baone
Imagine this is what happens to your business records at the end of every month ….
SEC 17a-4 • FDA 21 CFR 11 • NASD 3010, 3110 • DoD 5015.2 • HIPAA • Sarbanes-Oxley. If this looks absurd … that’s exactly what we do to email! • Practically every major transaction, project, and contract is recorded in email • Regulators now treat email like hard-copy records • And the courts agree (FRCP, Dec 2006) • Non-compliance fines and legal liabilities are rising . . . ZipLip, Inc.
Just How Much Scalability Does Archiving Require? Assume: • 25,000 employees averaging 70 mails/day • 7 years retention • Result: 4.47 billion emails for the archive system to index and search, versus 4.28 billion web pages indexed by Google (source: Google press release, Feb 17, 2004) • Functionality needs to scale to these volumes
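The 4.47 billion figure can be reproduced with simple arithmetic. The sketch below assumes mail arrives 365 days a year, which is not stated on the slide but is the value that matches its total; the function and constant names are illustrative only.

```python
# Inputs taken from the slide; DAYS_PER_YEAR = 365 is an assumption
# inferred from the slide's stated total of 4.47 billion.
EMPLOYEES = 25_000
MAILS_PER_EMPLOYEE_PER_DAY = 70
RETENTION_YEARS = 7
DAYS_PER_YEAR = 365


def archived_email_count(employees, mails_per_day, years, days_per_year=DAYS_PER_YEAR):
    """Number of emails the archive holds once the retention window is full."""
    return employees * mails_per_day * days_per_year * years


total = archived_email_count(EMPLOYEES, MAILS_PER_EMPLOYEE_PER_DAY, RETENTION_YEARS)
print(f"{total:,} emails (~{total / 1e9:.2f} billion)")
```

Under these assumptions the total is 4,471,250,000 emails, which rounds to the 4.47 billion quoted on the slide.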
Outline • Email Capture Methods • Business Drivers • Archive Functionality • Retention & Deletion • Surveillance & Compliance • E-Discovery • Conclusion
Email Capture Methods • Active Capture Methods – PRO-ACTIVE Archiving • Journaling • Mailbox crawling • SMTP Gateway Capture • Historical Capture Methods – REACTIVE Archiving • Restore from backup tapes • Crawl for PST / NSF files from desktops • Forensic captures
Primary Business Drivers - Regulations and Laws • SEC 17a-4 • NASD 3010 • Gramm-Leach-Bliley Act • HIPAA • Hedge Funds Rule 203(b) • Basel II • CA SB1386 • Sarbanes-Oxley Act • Mutual Funds Rule 38a-1 • NASD 3011 • Investment Advisors Act • UK Freedom of Information Act • US Freedom of Information Act • Canada PIPEDA • Florida Sunshine Law • FRCP • Japan Personal Information Protection Act • DoD 5015.2
Functional Requirements • Retention • Surveillance and Compliance • E-Discovery • Common Theme - Classification
Retention & Deletion Conflicting Requirements: • Laws & Regulations => Retain for “x” years • vs. • Company Liability/Risk and Cost • Real-time Categorization of Mail • Sender/Recipients • Content (Subject, body, attachment) • User Input (Which folder it was found in, Manual Tagging)
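A minimal sketch of real-time categorization using the three signal types listed above (sender/recipients, content, user input). The rule set, category names, domain, and retention periods are all hypothetical placeholders, not values from the presentation or from any regulation.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    sender: str
    recipients: list
    subject: str
    body: str
    folder: str = "Inbox"                    # user input: where the mail was filed
    tags: set = field(default_factory=set)   # user input: manual tags


# Hypothetical rules, evaluated in order: (category, retention_years, predicate).
RULES = [
    # User input: manual tag or folder placement.
    ("legal", 10, lambda m: "legal" in m.tags or m.folder.lower() == "contracts"),
    # Sender/recipients: mail from an assumed regulated desk.
    ("trading", 7, lambda m: m.sender.lower().endswith("@trading.example.com")),
    # Content: keyword in subject or body.
    ("financial", 7, lambda m: "invoice" in (m.subject + " " + m.body).lower()),
]
DEFAULT = ("general", 1)


def categorize(mail):
    """Return (category, retention_years) for the first matching rule."""
    for category, years, predicate in RULES:
        if predicate(mail):
            return category, years
    return DEFAULT


mail = Mail("desk@trading.example.com", ["client@example.org"], "Order", "Filled at 10.5")
print(categorize(mail))
```

First-match ordering keeps the outcome deterministic, which makes it easy to explain why a given mail received a given retention period.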
Retention & Deletion (cont’d) • “A priori” and “a posteriori” based Retention. • Event Driven – Deletion of mail from user folder, Reclassification of mail by end user • Legal Hold – Court orders to retain evidence relating to certain subject matters. • Single Instance Storage • Same Email in Multiple Mailboxes • Same Attachment in Multiple Emails • Significant storage savings.
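One common way to get single-instance storage is to address content by a cryptographic hash, so the same email or attachment is stored once however many mailboxes reference it. The sketch below assumes SHA-256 and an in-memory dictionary; the class and method names are illustrative and do not describe any particular archiving product.

```python
import hashlib


class SingleInstanceStore:
    """Content-addressed store: identical bytes are kept once and reference-counted."""

    def __init__(self):
        self._blobs = {}     # digest -> content bytes
        self._refcount = {}  # digest -> number of mails/mailboxes referencing it

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = content
        self._refcount[digest] = self._refcount.get(digest, 0) + 1
        return digest

    def release(self, digest: str) -> None:
        """Drop one reference; the bytes are deleted only when nothing refers to them."""
        self._refcount[digest] -= 1
        if self._refcount[digest] == 0:
            del self._refcount[digest]
            del self._blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
attachment = b"quarterly-report" * 1000
refs = [store.put(attachment) for _ in range(3)]  # same attachment in three emails
print(store.stored_bytes(), "bytes stored for", len(refs), "references")
```

Reference counting matters for deletion: a copy that one mailbox has released must stay in the archive while another mailbox, or a legal hold, still refers to it.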
Surveillance Conflicting Requirements: • Regulations require review of documents • vs. • Effort spent reviewing the documents. • Real-time Flagging of Mail • Lexical Based – Key words, word associations, wild-cards • Policy Based – e.g., mail from WallStreetJournal.com is a newsletter. • Custom Code – Detect Vacation Responses, Read Receipts, DSNs
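The lexical and policy-based approaches above can be sketched together as follows. The keyword list is a hypothetical lexicon, wildcard matching uses shell-style patterns from Python's `fnmatch` module, and the newsletter policy reuses the domain named on the slide; real deployments would use far richer rules.

```python
import fnmatch
import re

KEYWORD_PATTERNS = ["guarantee*", "insider", "off the record"]  # hypothetical lexicon
NEWSLETTER_DOMAINS = {"wallstreetjournal.com"}                  # policy example from the slide


def is_newsletter(sender):
    """Policy rule: mail from a known newsletter domain is not sent for review."""
    return sender.rsplit("@", 1)[-1].lower() in NEWSLETTER_DOMAINS


def lexical_hits(text):
    """Return the patterns that match; single words may use * and ? wildcards."""
    lowered = text.lower()
    words = re.findall(r"[a-z0-9']+", lowered)
    hits = []
    for pattern in KEYWORD_PATTERNS:
        if " " in pattern:
            # Multi-word phrases are matched as plain substrings.
            if pattern in lowered:
                hits.append(pattern)
        elif any(fnmatch.fnmatchcase(word, pattern) for word in words):
            hits.append(pattern)
    return hits


def flag(sender, text):
    """Apply the policy rule first, then the lexical rules."""
    if is_newsletter(sender):
        return []
    return lexical_hits(text)


print(flag("broker@example.com", "We guaranteed a return, off the record."))
print(flag("news@wallstreetjournal.com", "Insider trading probe widens"))
```

Returning the matched patterns, rather than a bare yes/no, lets a reviewer see exactly which rule fired.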
Surveillance (cont’d) • Real-time Flagging is a categorization problem • Current systems suffer from many false positives. • Transparent and deterministic rules are preferred over black boxes. • Disclaimers (internal and external) tend to get flagged because they contain the very terms that we try to flag. • Use reviewer feedback to adapt the rules.
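One simple way to reduce disclaimer-driven false positives is to remove known disclaimer text before the flagging rules run. The sketch below assumes an exact, case-sensitive list of disclaimers maintained by hand (the sample disclaimer is invented) and only normalizes whitespace; detecting unknown or reworded disclaimers is the harder segmentation problem and is not attempted here.

```python
import re

# Hypothetical list of disclaimers known to the organization.
KNOWN_DISCLAIMERS = [
    "This message is confidential and may contain privileged information.",
]


def _normalize(text):
    """Collapse runs of whitespace so line wrapping does not defeat matching."""
    return re.sub(r"\s+", " ", text).strip()


def strip_disclaimers(body):
    """Return the body with every known disclaimer removed."""
    cleaned = _normalize(body)
    for disclaimer in KNOWN_DISCLAIMERS:
        cleaned = cleaned.replace(_normalize(disclaimer), " ")
    return _normalize(cleaned)


body = "Lunch at noon?\n\nThis message is confidential and may\ncontain privileged information."
print(strip_disclaimers(body))
```

The cleaned text, rather than the raw body, would then be passed to the flagging rules, so a term such as "confidential" or "privileged" is only flagged when the author actually wrote it.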
E-Discovery Conflicting Requirements: • Produce electronic documents to satisfy court orders • vs. • Providing insufficient, irrelevant, or privileged information • Discovery Request • Certain number of custodians • Date range • Pertaining to certain subject matter, usually described by a set of search terms. ┼ Source: Williams v. Taser Int’l, Inc., 2007 WL 1630875 (N.D. Ga. June 4, 2007)
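The three parts of a discovery request (custodians, date range, search terms) map naturally onto a filter over archived mail. The sketch below is an assumption-laden illustration: the record types and field names are invented, and search terms are treated as case-insensitive substrings combined with OR, whereas negotiated terms in practice often involve Boolean and proximity operators.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedMail:
    custodian: str
    sent_on: date
    text: str


@dataclass(frozen=True)
class DiscoveryRequest:
    custodians: frozenset
    start: date
    end: date
    search_terms: tuple


def is_responsive(mail, request):
    """True if the mail matches custodian, date range, and at least one search term."""
    if mail.custodian not in request.custodians:
        return False
    if not (request.start <= mail.sent_on <= request.end):
        return False
    text = mail.text.lower()
    return any(term.lower() in text for term in request.search_terms)


request = DiscoveryRequest(
    custodians=frozenset({"alice", "bob"}),
    start=date(2006, 1, 1),
    end=date(2006, 12, 31),
    search_terms=("merger", "term sheet"),
)
archive = [
    ArchivedMail("alice", date(2006, 3, 14), "Draft Term Sheet attached"),
    ArchivedMail("carol", date(2006, 3, 14), "Merger update"),
    ArchivedMail("bob", date(2007, 1, 2), "Merger closed"),
]
print([m.text for m in archive if is_responsive(m, request)])
```

At the volumes discussed earlier, the same three criteria would be evaluated against a search index rather than by scanning every message.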
E-Discovery (cont’d) • Landmark case: Zubulake v. UBS Warburg (2003) • Primarily driven by the e-discovery amendments to the Federal Rules of Civil Procedure (FRCP), which took effect in December 2006. • Litigants are entitled to obtain electronic information from the adverse party. • Voluntary initial disclosures need to be made pertaining to each litigant • Today, almost all cases involve some form of electronic document as evidence.
E-Discovery (cont’d) • Parties face sanctions if they do not provide all the relevant documents (numerous precedents, e.g., Metrokane vs Built NY 2008). Validation occurs when the receiving party can prove the existence of other documents through hard-copy printouts or other means. • Lawyers from both parties routinely negotiate keywords to define Search Concepts • Manual Review of Documents for Relevance and Privilege. Numerous products cluster similar documents (near-deduplication) and present them together to reviewers to improve efficiency. • Chain of Custody – To prove that the document has not been tampered with or altered.
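Near-deduplication for review can be sketched with word shingles and Jaccard similarity. This is one textbook technique, not necessarily what any product mentioned here uses; the shingle size, the 0.6 threshold, and the greedy first-match clustering are arbitrary assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles; texts shorter than k words yield one short shingle."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(docs, threshold=0.6):
    """Greedy clustering: each document joins the first cluster whose representative is similar enough."""
    clusters = []  # list of (representative shingle set, member indices)
    for index, doc in enumerate(docs):
        doc_shingles = shingles(doc)
        for representative, members in clusters:
            if jaccard(doc_shingles, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((doc_shingles, [index]))
    return [members for _, members in clusters]


docs = [
    "Please review the attached contract before Friday",
    "Please review the attached contract before Friday thanks",
    "Lunch menu for next week",
]
print(cluster_near_duplicates(docs))
```

Each document is compared against the cluster representatives seen so far, so the work is quadratic in the worst case; at archive scale an approximate scheme such as MinHash would typically replace the exact comparison.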
Palin’s e-mail at $15m per request • NBC's price quote for e-mails sent to Todd Palin: $15 million. • AP's price quote for e-mails between state employees and the campaign headquarters of Sen. John McCain: $15 million. • AP's price quote for e-mails between state employees and the National Park Service: $15 million.
Conclusion • Most challenges in archiving can be reduced to a classification problem. • Segmentation Problems: Detect internal and external disclaimers • Detect changes in email behavior through email profile analysis • Understanding mails: Need to develop analysis techniques to understand the contents • Visualization and Grouping of Similar Mails – Control the order in which mails and documents are viewed. • Consistent way of defining Subject Matters – Beyond just a set of keywords. • Extract more metadata about attachments such as images, audio and video files. • And all of the above are required in multiple languages – English, Japanese, Spanish, Chinese, and others.