SVR02: Using Classification for Data Security and Data Management • Clyde Law, Software Design Engineer, Microsoft Corporation
Agenda • Motivation • File Classification Infrastructure (FCI) Overview and Demo • FCI Architecture • Retrieving Properties from Files • Custom File Management Tasks • FCI Extensions • Extensibility Demo
Data Management Challenges • Storage Growth • Storage Costs • Compliance • Security and Information Leakage • Increasing data management needs with disparate data management products
Managing Data by Location Business IT Need per-project file share Ensure business secret files do not leak out Back up files with personal information to encrypted store Expire low business impact files created over three years ago and not touched in the past year
Managing Data using Classification • Mitigate costs and risks • Manage data based on business value • Classify data • Apply policy • File Classification Infrastructure: Classify, Manage, Report, Extend
Introducing the File Classification Infrastructure demo Clyde Law Software Design Engineer File Server Management Team
Benefits of Classification Reduce Cost Manage Risk Find sensitive files on public servers Expire files to reduce storage purchasing needs Extend through IT or ISV solutions Watermark documents with confidential data Encrypt backups of files with personal information Apply rights management to high-secrecy files Comply with retention policies Optimize backup SLAs Replicate only business-related documents Move files to less expensive storage Available in Windows
FCI Architecture: Classification Pipeline • Designed to enable an ecosystem around classification • Comprehensive API for solutions • Extensible classification infrastructure • Diagram labels: Get/Set Property API for external applications; File Classification Extensibility Points
Get/Set Property API • Consume properties by specifying files • Automation-compatible COM API • Works with native code, managed code, or scripts • Available through classification manager object • Set is meant for manual classification • Use extensibility modules instead to extend rule-based automatic classification
Get/Set Property API: Using PowerShell
    # Get an instance of the Classification Manager
    $cm = New-Object -ComObject Fsrm.FsrmClassificationManager

    # Enumerate and display all properties associated with a file
    $props = $cm.EnumFileProperties("P:\foo\bar.txt", 0)
    foreach ($prop in $props) {
        Write-Host $prop.Name = $prop.Value
    }

    # Get and display the value of the "Secrecy" property
    $secrecyProp = $cm.GetFileProperty("P:\foo\bar.txt", "Secrecy", 0)
    Write-Host $secrecyProp.Value

    # Set the value of the "Secrecy" property to "High"
    $cm.SetFileProperty("P:\foo\bar.txt", "Secrecy", "High")
Get/Set Property API: Using native C++
    // Assumes <atlbase.h> and <fsrmpipeline.h> are included and COM is initialized
    // Get an instance of the Classification Manager
    CComPtr<IFsrmClassificationManager> spClassMgr;
    HRESULT hr = CoCreateInstance(CLSID_FsrmClassificationManager,
                                  NULL,
                                  CLSCTX_LOCAL_SERVER,
                                  __uuidof(IFsrmClassificationManager),
                                  reinterpret_cast<void**>(&spClassMgr));
    // ...handle any errors...

    // Get the "PII" property
    CComBSTR bstrFilename(L"P:\\foo\\bar.txt");
    CComBSTR bstrPropName(L"PII");
    CComPtr<IFsrmProperty> spPIIProp;
    hr = spClassMgr->GetFileProperty(bstrFilename,
                                     bstrPropName,
                                     0,
                                     &spPIIProp);
Custom File Management Tasks • Apply policies by running custom commands on files that match specified criteria • Faster than scanning and retrieving properties yourself • No control over the order in which files are processed • Task runs the command in a new process for each file
FCI Extensions • Classification modules • Determine values of properties to apply to files • Available in Windows: • Folder classifier – assigns properties based on file location • Content classifier – assigns properties based on string and regular expression matches in file content • Storage modules • Supply and persist properties associated with files • Available in Windows: • System storage module for all file types • Uses NTFS named stream to store properties • Functions as a cache for fast retrieval • Office 97-2003 and Office 2007 in-file storage
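The two built-in classifiers described above can be pictured with a small sketch. This is portable C++ that only models the idea (location-based and content-based rules); it does not use the FSRM interfaces. The `Properties` alias, the helper names, the `P:\finance\` folder, and the number pattern are all hypothetical, while the "Secrecy" and "PII" property names are borrowed from the earlier slides.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical stand-in for a file's classification properties (name -> value).
// This is not an FSRM type.
using Properties = std::map<std::string, std::string>;

// Folder-classifier idea: the rule applies when the path starts with a folder prefix.
// Note: this comparison is case-sensitive, unlike typical Windows path handling.
bool IsUnderFolder(const std::string& path, const std::string& folderPrefix) {
    assert(!folderPrefix.empty());
    return path.compare(0, folderPrefix.size(), folderPrefix) == 0;
}

// Content-classifier idea: the rule applies when the content matches a regular expression.
bool ContentMatches(const std::string& content, const std::regex& pattern) {
    return std::regex_search(content, pattern);
}

// Apply two illustrative rules; the folder name and the pattern are made up.
Properties Classify(const std::string& path, const std::string& content) {
    Properties props;
    if (IsUnderFolder(path, "P:\\finance\\")) {
        props["Secrecy"] = "High";
    }
    static const std::regex ssnLike("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    if (ContentMatches(content, ssnLike)) {
        props["PII"] = "Yes";
    }
    return props;
}
```

Calling `Classify("P:\\finance\\q3.txt", "SSN 123-45-6789")` in this sketch sets both properties, while a file outside the folder with no matching content gets none.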
Pipeline Anatomy • Each module passes streams of property bags to the next one • Streams can cross processes • Security checks are performed on cross-process data transfers • Most modules are hosted within a separate process • Pipeline stages (from diagram): • Scanner: gets basic file properties • Office Storage [Load]: loads embedded properties • Folder Classifier: classifies based on location • Content Classifier: classifies based on content • Office Storage [Save]: saves embedded properties • Reporting Engine: adds files to report • Phases (from diagram): Discover Data, Extract Properties, Classify Data, Store Properties, Apply Policies • Processes (from diagram): Classification Runtime Process and three Hosting Processes
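As a rough mental model of the pipeline, the sketch below passes one property bag through a list of stages in order. It is a simplification under stated assumptions: everything runs in a single process, `PropertyBag`, `Stage`, `MakeStage`, `BuildSamplePipeline`, and `RunPipeline` are hypothetical names, and the work done by each stage is a placeholder. Only the stage names come from the slide.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical, in-process model of a property bag; not the FSRM interface.
struct PropertyBag {
    std::string path;
    std::map<std::string, std::string> properties;
    std::vector<std::string> visited;  // names of the stages that handled this bag
};

using Stage = std::function<void(PropertyBag&)>;

// Wrap a piece of work so that the stage also records its name in the bag.
Stage MakeStage(const std::string& name, Stage work) {
    return [name, work](PropertyBag& bag) {
        bag.visited.push_back(name);
        if (work) work(bag);
    };
}

// Stage names follow the slide; the work each one does here is a placeholder.
std::vector<Stage> BuildSamplePipeline() {
    std::vector<Stage> stages;
    stages.push_back(MakeStage("Scanner", nullptr));
    stages.push_back(MakeStage("Office Storage [Load]", nullptr));
    stages.push_back(MakeStage("Folder Classifier", [](PropertyBag& bag) {
        if (bag.path.find("\\finance\\") != std::string::npos) {
            bag.properties["Secrecy"] = "High";
        }
    }));
    stages.push_back(MakeStage("Content Classifier", nullptr));
    stages.push_back(MakeStage("Office Storage [Save]", nullptr));
    stages.push_back(MakeStage("Reporting Engine", nullptr));
    return stages;
}

// Hand the same bag to each stage in turn.
void RunPipeline(const std::vector<Stage>& stages, PropertyBag& bag) {
    for (const Stage& stage : stages) {
        assert(stage);
        stage(bag);
    }
}
```

The real pipeline streams many property bags between modules that may live in separate hosting processes; the sketch only shows the ordering and the shared-bag idea.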
Custom Pipeline Modules • Register module by creating a module definition through the Classification Manager • Typically once during installation • Module is a COM server that implements IFsrmClassifierModuleImplementation or IFsrmStorageModuleImplementation • Both native and managed are supported • Pipeline calls OnLoad to initialize module • Module needs to return connector object to connect hosting process • Instructions in MSDN documentation
Classifier Modules: Models for classification • Yes/no • Pipeline asks module whether or not a property value applies to the file • Explicit value • Pipeline asks module what value to assign to a specified property • Controlled by NeedsExplicitValue flag in module definition
Classifier Modules: Classification session call sequence • UseRulesAndDefinitions called at start of session • Module can choose to cache these rules • For each file: • OnBeginFile – specifies the property bag of the file to classify and the rules to classify it with • Module can choose to process file right away • For each rule: • Yes/no – DoesPropertyValueApply • Return TRUE or FALSE • Explicit value – GetPropertyValueToApply • Return value to apply, or return error code FSRM_E_NO_PROPERTY_VALUE if no value should be applied • OnEndFile – indicates end of file processing
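The call sequence above can be sketched as a small driver loop. This is a portable C++ model, not the COM interface: the method names mirror the slide, but the parameter lists, the `Rule` and `FileInfo` structs, `RunSession`, and `NameClassifier` are hypothetical simplifications, and only the yes/no model is shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; the real interfaces are COM-based
// and take different parameters.
struct Rule { std::string property; std::string value; };
struct FileInfo { std::string path; };  // stands in for the property bag

class YesNoClassifier {
public:
    virtual ~YesNoClassifier() {}
    virtual void UseRulesAndDefinitions(const std::vector<Rule>& rules) = 0;
    virtual void OnBeginFile(const FileInfo& file) = 0;
    virtual bool DoesPropertyValueApply(const Rule& rule) = 0;
    virtual void OnEndFile() = 0;
};

// Driver that mimics the order in which the pipeline calls the module.
std::vector<std::string> RunSession(YesNoClassifier& module,
                                    const std::vector<Rule>& rules,
                                    const std::vector<FileInfo>& files) {
    std::vector<std::string> applied;
    module.UseRulesAndDefinitions(rules);          // once, at start of session
    for (const FileInfo& file : files) {
        module.OnBeginFile(file);                  // once per file
        for (const Rule& rule : rules) {           // once per rule
            if (module.DoesPropertyValueApply(rule)) {
                applied.push_back(file.path + ": " + rule.property + "=" + rule.value);
            }
        }
        module.OnEndFile();                        // end of file processing
    }
    return applied;
}

// Example module: the "Secrecy" rule applies when the path contains "secret".
class NameClassifier : public YesNoClassifier {
public:
    void UseRulesAndDefinitions(const std::vector<Rule>& rules) override { rules_ = rules; }
    void OnBeginFile(const FileInfo& file) override { current_ = file.path; }
    bool DoesPropertyValueApply(const Rule& rule) override {
        return rule.property == "Secrecy" && current_.find("secret") != std::string::npos;
    }
    void OnEndFile() override { current_.clear(); }
private:
    std::vector<Rule> rules_;   // cached rules, as the slide suggests a module may do
    std::string current_;       // path of the file currently being classified
};
```

An explicit-value module would replace the boolean call with one that returns the value to apply, or signals that no value applies.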
Storage Modules • Supply or persist properties associated with file • Two types supported: InFile and Database • Cache is reserved for the built-in System Cache Module • Capabilities field in module definition determines whether module is instantiated for loading and/or saving properties • Separate instances created for load and save • LoadProperties – provide property values by calling SetFileProperty in the property bag • SaveProperties – retrieve properties in the property bag and persist them
Accessing File Contents • Modules should never open files directly • May not have proper permissions • Stream state may not be consistent with metadata • Use GetFileStreamInterface in the property bag • Supports ILockBytes and IStream interfaces • Takes care of getting the right permissions • Ensures last access and last modified times are unchanged • Ensures changes are properly committed (for storage modules)
PowerShell Host Classifier • Included in Windows SDK • Presents itself as a classifier to FCI that hosts PowerShell scripts to do the actual classification • Create custom classifiers without compiling and registering your own modules • Simpler to build, but has slower performance • Intended for in-house IT solutions and prototyping • More information at http://blogs.technet.com/filecab/archive/2009/08/14/using-windows-powershell-scripts-for-file-classification.aspx
Putting it all together demo Clyde Law Software Design Engineer File Server Management Team
Developer Opportunities: Call to action • FCI provides many avenues to be part of end-to-end data lifecycle management solutions • Classifiers – provide classification based on content, identity, regulations, etc. • Data management products – leverage classification in solutions for backup, archival, leakage prevention, etc. • Storage modules – provide property storage for new file formats • Flexible COM API • Native code, managed code, or scripting • PowerShell support enables fast deployment of solutions
Additional Resources • FCI Overview • http://microsoft.com/fci/ • Microsoft TechNet • http://technet.microsoft.com/en-us/library/dd758765%28WS.10%29.aspx • http://technet.microsoft.com/en-us/library/dd758756%28WS.10%29.aspx • Developing for FCI • Windows SDK • http://msdn.microsoft.com/en-us/windows/bb980924.aspx • FSRM API Documentation on MSDN • http://msdn.microsoft.com/en-us/library/bb972746%28VS.85%29.aspx • FCI Code Gallery • http://code.msdn.microsoft.com/fci/
Contact Us • Storage Team Blog • http://blogs.technet.com/filecab/default.aspx • E-mail • FCI Team • fciext@microsoft.com • Clyde Law, Developer • claw@microsoft.com • Matthias Wollnik, Program Manager • mwollnik@microsoft.com
YOUR FEEDBACK IS IMPORTANT TO US! Please fill out session evaluation forms online at MicrosoftPDC.com
Learn More On Channel 9 • Expand your PDC experience through Channel 9 • Explore videos, hands-on labs, sample code and demos through the new Channel 9 training courses channel9.msdn.com/learn Built by Developers for Developers….
Appendix: Property aggregation and conflict resolution • Values from Storage: • Precedence: In-file > Database > Cache • Values from Classification Rules: • Default values applied once if not already present • Can also choose to explicitly aggregate or overwrite existing values • Ordered lists, Booleans, Multi-choice lists, and Multi-strings can be aggregated • Rule modes: • [Default] Apply only if there is no value stored in the file • [Ignore Existing] Apply and ignore (replace) values from Storage and Default rules • [Consider Existing] Apply but aggregate with values from Storage and Default rules
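The precedence and rule modes on this slide can be illustrated with a short model. This is a sketch under assumptions, not FCI's implementation: it handles only a multi-string property, treats aggregation as a set union (the slide does not say how each type is aggregated), and folds stored and default values into a single "existing" value. The names `Source`, `RuleMode`, `StoredValue`, `PreferHigherPrecedence`, and `ApplyRule` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Storage precedence from the slide, lowest to highest: Cache < Database < In-file.
enum class Source { Cache = 0, Database = 1, InFile = 2 };
enum class RuleMode { Default, IgnoreExisting, ConsiderExisting };

using Values = std::set<std::string>;  // models a multi-string property value

struct StoredValue {
    bool present;    // false when the storage module supplied nothing
    Source source;
    Values values;
};

// Keep the value that comes from the higher-precedence storage source.
StoredValue PreferHigherPrecedence(const StoredValue& a, const StoredValue& b) {
    if (!a.present) return b;
    if (!b.present) return a;
    return static_cast<int>(a.source) >= static_cast<int>(b.source) ? a : b;
}

// Combine the existing value with a rule's value according to the rule mode.
Values ApplyRule(const StoredValue& existing, RuleMode mode, const Values& ruleValues) {
    switch (mode) {
    case RuleMode::Default:
        // Apply only if there is no value already.
        return existing.present ? existing.values : ruleValues;
    case RuleMode::IgnoreExisting:
        // Replace whatever was there.
        return ruleValues;
    case RuleMode::ConsiderExisting: {
        // Aggregate; set union is an assumption made for this sketch.
        Values merged = existing.present ? existing.values : Values();
        merged.insert(ruleValues.begin(), ruleValues.end());
        return merged;
    }
    }
    return ruleValues;  // not reached; keeps compilers quiet
}
```

With an in-file value of "legal" and a cached value of "hr", the in-file value wins; a Default rule then leaves it alone, an Ignore Existing rule replaces it, and a Consider Existing rule merges the rule's values into it.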
Appendix: Property bags • Property bag object holds the metadata of a file being classified • The object flows through the classification pipeline • Each pipeline module can assign property values • Property bag contents (from diagram): • File System Info: Relative Path, Creation Time, etc. • Properties • Messages • Read Stream • Write Stream • Current Context: Module Type, Rule, etc. • Each property (from diagram): • Name • Type • Assigned Values and Sources: From Storage Modules, From Default and CE (Consider Existing) Rules, From IE (Ignore Existing) Rules • Aggregated Value • Aggregated Sources
Appendix: Connecting a module to the pipeline
    STDMETHODIMP CCustomModule::OnLoad(
        __in IFsrmPipelineModuleDefinition *pDefinition,
        __deref_out IFsrmPipelineModuleConnector **ppModuleConnector
        )
    {
        // ...perform module initialization...

        // Create the connector
        CComPtr<IFsrmPipelineModuleConnector> spConnector;
        HRESULT hr = CoCreateInstance(CLSID_FsrmPipelineModuleConnector,
                                      NULL,
                                      CLSCTX_LOCAL_SERVER,
                                      __uuidof(IFsrmPipelineModuleConnector),
                                      reinterpret_cast<void**>(&spConnector));
        // ...handle any errors...

        CComQIPtr<IFsrmPipelineModuleImplementation> spModuleImpl = GetControllingUnknown();
        if (spModuleImpl == NULL) {
            // ...handle error...
        }

        // Bind the connector to the module
        hr = spConnector->Bind(pDefinition, spModuleImpl);
        // ...handle any errors...

        // Return the connector
        *ppModuleConnector = spConnector.Detach();
        return hr;
    }