The Lifecycle of Enterprise Information
Pankaj Mehra
HP Distinguished Technologist
Chief Scientist, HP Labs Russia
Business Records
[Figure: business value of a record over time, annotated at product design, peak of product success, identified in a legal dispute, and deemed evidence. Source for business value indicator: Geoff Moore, General Manager of Tower Software.]
• Businesses record and retain a lot of information
• Old information is generally less valuable
• Information must be retained for a long time
Syrcodis 2008 – Pankaj Mehra, HP Labs
The discipline of Information Lifecycle Management
• Stored data, paper, and business records
• Business processes: people (portals, search) and applications (query)
• Discovery, classification, policy engine, migration, retention, deletion
• Key differentiators
  • Metadata indexing
  • Content indexing
  • Compression
  • Tamper-proofing
  • Time-range search
• Databases, file systems, content repositories, scalable storage
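The "time-range search" differentiator depends on a metadata index that is ordered by time. The sketch below is a minimal in-memory version. It assumes that each record's metadata carries a creation timestamp. The class and field names are hypothetical, and a production index would more likely be a database B-tree than a pair of Python lists.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RecordMeta:
    created: datetime  # creation time, the key used for time-range search
    record_id: str
    source: str        # e.g. "database", "filesystem", "content repository"


class MetadataIndex:
    """Hypothetical in-memory metadata index kept sorted by creation time."""

    def __init__(self) -> None:
        self._times: list = []    # sorted creation times
        self._entries: list = []  # RecordMeta objects, in the same order

    def add(self, meta: RecordMeta) -> None:
        # bisect_right keeps records with equal timestamps in insertion order.
        pos = bisect_right(self._times, meta.created)
        self._times.insert(pos, meta.created)
        self._entries.insert(pos, meta)

    def time_range(self, start: datetime, end: datetime) -> list:
        """Return records created in the half-open interval [start, end)."""
        lo = bisect_left(self._times, start)
        hi = bisect_left(self._times, end)
        return self._entries[lo:hi]
```

Here is a usage example. After adding records from several sources, `idx.time_range(datetime(2007, 1, 1), datetime(2008, 1, 1))` returns only the records created during 2007, in time order.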
To appreciate the full generality of Information Lifecycles, consider:
• A store coupon (price, expiry date)
• A gift card (value, identity)
• An office space lease contract
• An airline ticket
• A child’s X-ray

• Retention / deletion policy
  • 90 years!
• Legal requirements
  • HIPAA
• Privacy and access control
  • Healthcare power-of-attorney
  • Parental rights
• Life and death issues
• Cost and risk of doing business
• Future laws!
Can you figure out the systems implications of all this?
• Comprehensive data capture
• Technology neutral
• De-normalized
• Sufficient indexing
• Disaster tolerance

• Volume of accumulated information (PBs)?
• At what cost and what quality of service?
• In what format and on what media?
• Legal and loyalty cost of data outages and data loss?
• What to keep and for how long?
• What to delete and when?
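The questions "what to keep and for how long" and "what to delete and when" eventually reduce to a keep-or-delete decision for each record. The following is a minimal sketch under stated assumptions. Only the 90-year figure comes from the slide, and it is used here as an illustrative period for the child's X-ray. The other retention periods, the record type names, and the `legal_hold` flag are hypothetical. The flag stands in for a record that has been identified in a legal dispute.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    record_id: str
    record_type: str
    created: date
    legal_hold: bool = False  # hypothetical flag: record is involved in a legal dispute


# Hypothetical retention periods in years; only the 90-year figure appears on the slide.
RETENTION_YEARS = {
    "child_xray": 90,
    "store_coupon": 1,
    "lease_contract": 10,
}


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposition(rec: Record, today: date) -> str:
    """Return 'keep' or 'delete' for a record under the retention table above."""
    if rec.legal_hold:
        return "keep"  # a legal hold overrides the normal retention schedule
    years = RETENTION_YEARS.get(rec.record_type)
    if years is None:
        return "keep"  # no policy yet for this type: keep rather than risk unlawful deletion
    expires = add_years(rec.created, years)
    return "delete" if today >= expires else "keep"
```

Keeping unknown record types by default is a deliberately conservative choice. Deleting too early can carry legal penalties, while keeping data only costs storage.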
ILM Platforms
• Robust and auditable architecture
  • To allow compliance with prevailing laws
• Connectors and data discovery
  • To capture information anywhere (an image in medical equipment, an invoice from a SOAP message, project data fragmented across documents, or a sales order normalized across many tables)
• Scalable, tiered storage with migration
• Data de-duplication and compression
• Tamper-proofing
• Robust classification and analysis algorithms
• Compact and uniformly applied policies
• Access methods resilient against forgotten names and locations
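The de-duplication, compression, and tamper-proofing requirements can be illustrated together. The sketch below is a hypothetical content-addressed store, not the design of any particular ILM product. Content is keyed by its SHA-256 digest, so identical content is stored only once. Each stored blob is compressed with zlib. Every write is appended to a hash-chained log, so altering an earlier log entry is detectable.

```python
import hashlib
import zlib


class DedupStore:
    """Hypothetical content-addressed store with a hash-chained write log."""

    def __init__(self) -> None:
        self.blobs = {}  # SHA-256 hex digest -> zlib-compressed bytes
        self.log = []    # (content digest, chained entry hash) per write

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:  # de-duplication: identical content stored once
            self.blobs[digest] = zlib.compress(data)
        # Tamper evidence: each entry hash covers the previous entry hash and this digest.
        prev = self.log[-1][1] if self.log else ""
        entry = hashlib.sha256((prev + digest).encode()).hexdigest()
        self.log.append((digest, entry))
        return digest

    def get(self, digest: str) -> bytes:
        return zlib.decompress(self.blobs[digest])

    def verify_log(self) -> bool:
        """Recompute the chain; editing an entry without recomputing all later ones fails."""
        prev = ""
        for digest, entry in self.log:
            if hashlib.sha256((prev + digest).encode()).hexdigest() != entry:
                return False
            prev = entry
        return True
```

A hash chain alone cannot stop someone who rewrites the entire log. In practice, the latest entry hash would also be recorded somewhere the store's operators cannot modify.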
To comply with laws
To differentially manage by business value
[Figure: recommended data-protection technique as a function of data loss penalty rate ($/hr, horizontal axis) and data outage penalty rate ($/hr, vertical axis), both running from 500 to 50 000 000 in factors of 10. Regions: tape backup, async mirror, asynchronous batched mirroring, synchronous mirroring, fail-over to secondary site, reconstruct primary site.]
Source: Keeton, et al. (HP Labs), Designing for disasters, FAST’04 conference
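The chart reflects a cost-minimizing view of data protection: for a given pair of penalty rates, choose the technique with the lowest outlay plus expected penalties. The following is a minimal sketch of that comparison. It is a simplified model of my own, not the one in the cited paper. Every outlay, recovery time, data-loss window, and disaster frequency in it is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class Technique:
    name: str
    annual_outlay: float    # $/year for equipment, links, and staff
    recovery_hours: float   # time until service is restored after a disaster
    data_loss_hours: float  # window of recent updates lost in that disaster


# Hypothetical figures for illustration only; not taken from the slide or the paper.
CANDIDATES = [
    Technique("tape backup", 20_000, 72, 168),
    Technique("async mirror", 150_000, 24, 0.1),
    Technique("synchronous mirroring", 600_000, 1, 0),
]


def annual_cost(t, outage_rate, loss_rate, disasters_per_year):
    """Outlay plus expected penalties; both rates are in $/hr, as on the chart's axes."""
    penalty_per_disaster = outage_rate * t.recovery_hours + loss_rate * t.data_loss_hours
    return t.annual_outlay + disasters_per_year * penalty_per_disaster


def cheapest(outage_rate, loss_rate, disasters_per_year=0.01, techniques=CANDIDATES):
    """Pick the technique with the lowest expected annual cost."""
    return min(techniques,
               key=lambda t: annual_cost(t, outage_rate, loss_rate, disasters_per_year))
```

With these placeholder numbers, both penalty rates at $500/hr favour tape backup. Both at $500 000/hr favour the async mirror, and both at $5 000 000/hr favour synchronous mirroring. This matches the chart's general trend of more expensive protection as penalty rates rise.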
Typical Lifecycle of Structured Data
• Production databases → extract, transform, load → data warehouse → individual data marts (decision support)
• Production databases → subset the data → test and development
• Production databases → archive (or delete) → historical archive (long-term retention)
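The "archive (or delete)" step out of the production databases can be pictured as moving aged rows into a separate archive table. The following is a hypothetical sketch using Python's built-in sqlite3 module. The table names, columns, sample rows, and cutoff date are all invented for illustration. A real historical archive would usually live on a separate, cheaper store rather than in the same database.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory production table plus an empty archive table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
    conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "2001-04-02", 10.0), (2, "2007-11-20", 25.5), (3, "2008-01-09", 7.0)],
    )
    conn.commit()
    return conn


def archive_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before `cutoff` (ISO 'YYYY-MM-DD') to orders_archive; return rows moved."""
    with conn:  # one transaction: the copy and the delete commit or roll back together
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?", (cutoff,)
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount
```

Running the copy and the delete in one transaction ensures that a failure part-way through never leaves a row missing from both tables. Dates are stored as ISO strings, so a plain string comparison orders them correctly.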
… not forgetting the other 85% (in files and folders on file servers)
These graphs are typical of “reports” produced by discovery tools, such as Scentric Tao Destiny, Kazeon Discovery Engine, and Intermine FileCensus.
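Discovery reports of this kind summarize what sits on a file server, for example by file type and age. The sketch below is a hypothetical, much-simplified census. It is not how any of the named products work. It walks a directory tree and tallies bytes per file extension and file counts per modification-age bucket. The bucket boundaries are arbitrary choices.

```python
import os
import time
from collections import Counter


def age_bucket(age_days: float) -> str:
    """Coarse age buckets; the boundaries are arbitrary choices for this sketch."""
    if age_days < 30:
        return "<30 days"
    if age_days < 365:
        return "30-365 days"
    return ">1 year"


def file_census(root: str, now=None):
    """Walk `root`; tally bytes per extension and file counts per modification-age bucket."""
    now = time.time() if now is None else now
    bytes_by_ext = Counter()
    files_by_age = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            ext = os.path.splitext(name)[1].lower() or "(none)"
            bytes_by_ext[ext] += st.st_size
            files_by_age[age_bucket((now - st.st_mtime) / 86400)] += 1
    return bytes_by_ext, files_by_age
```

For example, `file_census("/srv/share")` on a hypothetical share returns two counters. `by_ext.most_common(5)` then lists the five file types that use the most space.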
The Lifecycle of Files
[Figure: timeline after file creation. Continuously protect: 0–72 hrs; Optimize: 72 hrs – 2 wks; Archive: months, years, decades.]
• Information in documents typically passes through 3 phases during its life
  • Operational
    • Frequently updated during the 72 hours after creation
  • Transitional
    • Infrequently updated
    • Converted to business record format
  • Archival
    • Static (rarely accessed)
    • Subject to long-term retention management
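The three phases map naturally onto age thresholds. The sketch below uses the 72-hour and 2-week boundaries from the timeline. It assumes that the phase depends only on time since creation, which is a simplification: a real system might also look at actual update activity.

```python
from datetime import datetime, timedelta

OPERATIONAL_WINDOW = timedelta(hours=72)  # from the timeline: 0-72 hrs
TRANSITIONAL_WINDOW = timedelta(weeks=2)  # from the timeline: 72 hrs - 2 wks


def lifecycle_phase(created: datetime, now: datetime) -> str:
    """Return the file's lifecycle phase based only on its age (a simplifying assumption)."""
    age = now - created
    if age < OPERATIONAL_WINDOW:
        return "operational"   # continuously protect
    if age < TRANSITIONAL_WINDOW:
        return "transitional"  # optimize; convert to business record format
    return "archival"          # archive; long-term retention management
```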
Information Lifecycle Management Process
• Discover: find information sources
• Classify: determine categories
• Analyze: extract metadata
• Storage class 1, storage class 2, storage class 3 [normalize, compress, encrypt]
• Disposition? At the end of the retention period, discard
• ILM policies
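The discover, classify, analyze, and store steps can be wired together as a small rule-driven pipeline. The sketch below is hypothetical throughout. It assumes that discovery has already produced a list of file paths, that classification is first-match path patterns, and that analysis extracts only the file extension. The policy table, categories, and retention periods are made up. Storage class 3 is used for long-term business records, matching its normalize, compress, and encrypt role on the slide.

```python
import fnmatch
import os
from dataclasses import dataclass, field

# Hypothetical ILM policies, checked in order; the first matching path pattern wins.
# Each rule: (path pattern, category, storage class, retention in years).
POLICIES = [
    ("*/finance/*.pdf", "business_record", 3, 7),
    ("*/projects/*", "working_document", 1, 2),
    ("*", "unclassified", 2, 1),  # catch-all so every item is classified
]


@dataclass
class Item:
    path: str
    metadata: dict = field(default_factory=dict)


def classify(item: Item):
    """Classify: return (category, storage class, retention years) of the first matching rule."""
    for pattern, category, storage_class, years in POLICIES:
        if fnmatch.fnmatchcase(item.path, pattern):
            return category, storage_class, years
    raise ValueError(f"no policy matches {item.path}")


def analyze(item: Item) -> None:
    """Analyze: extract simple metadata (here, only the lower-cased file extension)."""
    item.metadata["extension"] = os.path.splitext(item.path)[1].lower()


def run_ilm(discovered):
    """Classify, analyze, and place each discovered item; returns {storage class: [paths]}."""
    placement = {}
    for item in discovered:
        category, storage_class, years = classify(item)
        analyze(item)
        item.metadata.update(category=category, retention_years=years)
        placement.setdefault(storage_class, []).append(item.path)
    return placement
```

The `retention_years` value recorded for each item is what a later disposition step would use to decide when the item can be discarded.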
ILM’s 3 Pillars: Business value, Laws, Costs
• Discover
  • In: info sources, apps
  • Out: business records
• Classify
  • Classification rules
• Analyze
  • In: examples (e.g., of insider trading)
  • Out: notifications (e.g., of possible insider-trading instances)
• Storage class 1, storage class 2, storage class 3 [normalize, compress, encrypt]
  • Infrastructure costs, efficiency
• Disposition? Discard
• ILM policies
  • Lifecycle action rules/goals
  • Business value of information
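The slide says only that the analysis step takes examples in and produces notifications, not how it does so. As one simple stand-in, the sketch below flags any message whose word overlap with a known example (Jaccard similarity of token sets) reaches a threshold. This is a basic nearest-example technique chosen for illustration. A real surveillance system would need a far stronger classifier. The example text and the 0.3 threshold are hypothetical.

```python
import re


def tokens(text: str) -> set:
    """Lower-cased word tokens of a message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a or b else 0.0


class ExampleMatcher:
    """Hypothetical nearest-example flagger: notify when a message resembles a known example."""

    def __init__(self, examples, threshold=0.3):
        self.examples = [tokens(e) for e in examples]
        self.threshold = threshold

    def score(self, text: str) -> float:
        """Highest similarity between `text` and any stored example."""
        t = tokens(text)
        return max((jaccard(t, e) for e in self.examples), default=0.0)

    def notifications(self, messages):
        """Messages similar enough to an example to warrant review."""
        return [m for m in messages if self.score(m) >= self.threshold]
```

For instance, with the example "buy shares before the merger is announced", the message "buy the shares before the merger announcement" shares 5 of 8 distinct words (score 0.625) and is flagged. "lunch at noon on friday" scores 0 and is not.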