380 likes | 450 Views
Integrating SQL Server FileTables , Property Search, and FTS/Semantic Search . Bob Beauchemin Developer Skills Partner SQLskills. About Me. www.sqlskills.com /blogs/ bobb. bobb@sqlskills.com. @ bobbeauch. Bob Beauchemin. Outline. FileTable The substrate – Filestream Storage
E N D
Integrating SQL Server FileTables, Property Search, and FTS/Semantic Search Bob Beauchemin Developer Skills Partner SQLskills
About Me... www.sqlskills.com /blogs/bobb bobb@sqlskills.com @bobbeauch Bob Beauchemin
Outline • FileTable • The substrate – Filestream Storage • FileTable Implementation • FileTable and Full-text Search • Full-text Search Enhancements • Property-Based Search • Semantic Search
LOBs - File System or Database? • Storing large binary objects in databases is suboptimal • Large objects take buffers in database memory • Updating large objects cause database fragmentation • In file system however, "update" is delete and insert • "Before image" in an update is not deleted immediately • Storing all related data in a database adds • Transactional consistency • Integrated, point-in-time backup and restore • Single storage and query vehicle • Break-even point is approximately 256kb
Filestream Implementation • Filestream is a database/file system hybrid • Enable filestream at an OS level (per-instance) • Filestreamfilegroups declared using DDL • Filestream storage is tied to a database • The filegroup is mapped to a directory • Sparse tx log, metadata, data live here • Must be NTFS file system • Caution: Files deletable from file system with appropriate permissions • VARBINARY(MAX) columns defined with FILESTREAM attribute • Filestream storage not available for other large types • Column's data is stored in the file system
Programming with Filestreams • Filestream columns available through T-SQL • If SQL is used, access as varbinary(max) • A few limitations – WRITE method not available • Streaming I/O • ODBC and OLE DB • Use OpenSQLFilestream • ADO.NET • System.Data.SqlTypes.SqlFileStream • No direct EF and LINQ integration • Uses T-SQL internally to access
Filestream Enhancements • SQL Server 2008 R2 • Snapshot Isolation Level Support • SQL Server 2012 • Multiple Containers per Filegroup • Performance Improvement for T-SQL and File I/O • Up to 5x read improvement • Linear scaling with multiple threads • Full Always-On Support • Access through virtual network name • Transparent reconnect • No support for Filestream and Database Mirroring • MAXSIZE can be specified for Filestream containers
Filestream Wrap-up • The good… • Data on file system, also just a column to SQL • Integration with existing tables • Integrated with backup, restore, SQL security • Managed almost completely by SQL DBA • Can use transactional streaming IO for performance • Limitations • Specific APIs for file I/O (Not Win32 API) • Always require SQL to get file path, transaction • Not useable for existing programs • File names are statically synthesized • Flat namespace • No notion of file extension (suffix) • No renames
FileTable • FileTable • Built over the Filestream feature • Allows access by existing applications • Win32 API access • Provides data read and write capabilities • Remote access using SMB protocol • Metadata manipulation handled by FileTable • Other capabilities handled by the file system • Hierarchical namespace
FileTable • FileTable is a fixed format table (SQL 2012) • Uses filestream storage and container • Includes file system properties as columns • HierarchyID columns presents data as "synthesized" hierarchical file system share • Available as file system share or T-SQL table • Can be maintained through the share directly • Some columns maintainable through T-SQL • Preserves filename and suffix • Allows non-transactional access through share
FileTable Setup • Enable Filestream feature - prerequisite • ALTER database to • Allow non-transactional access • Specify directory name for share • CREATE tablename AS FILETABLE • Specify filetable_directory (share subdirectory) • Specify collation for Name column • No partitioned FileTables • File access at: • \\machine\share\dbdirectory\ftdirectory • Share is instance share name • Dbdirectory is name specified on ALTER DATABASE • Ftdirectory is name specified on CREATE [tablename] AS FILETABLE
T-SQL Access • Insert • Many columns use defaults • Update • Can update stream without changing datetimes • No WRITE method • Delete • Fails on directories containing directories or files • BCP • Can disable system constraints and use BCP without constraints • SELECT INTO does not create a FileTable
Programming Functions • T-SQL built-ins allow portable code • GetFileNamespacePath() • Accessor on each row • GetFileNamespacePath(1) gets full UNC pathname • FileTableRootPath() • Gets root path of the table • FileTableRootPath('tablename') gets root path with table name included • GetPathLocator() • Retrieves hierarchyid value for UNC name
Triggers • Triggers fire with SQL or File I/O • Some trigger limitations • No INSTEAD OF triggers • Can't update any FileTables including itself • For in-place update of stream, before image not available • Rollback does not roll back stream updates • Possible to get sharing violations on the stream • After image must be aware of sharing modes
FileTable Share Behaviors • Share is real Windows Share • Even antivirus programs think it's real • Not available if SQL Server not running • Uses SMB protocol – open TCP 139, 445 for remote file access • Cannot layer FileTable on existing share • Some limitations • Memory mapped files not supported • Can't open with Notepad, Paint files locally • 15 level subdirectory limit • Windows Explorer limit 260 byte names • NTFS (and FileTable) allow more
Concurrency • READ COMMITTED semantics • For non-transactional file I/O • During RCSI or SI transactions • TSQL or transacted Win32 reads on stream fail unless they use READCOMMITTEDLOCK • FTS indexing always works • Snapshots and Always-On read-only secondaries • Same behavior as RCSI/SI
FTS Overview • Provides ability to search against character-based data • Character-based columns • Varbinary • For binary document formats • Includes Filestream storage data and FileTables • XML • Search functionality includes • Simple terms - with Boolean operators and fuzzy matching • Prefix searches • Inflectional forms • Words near other words or phrases • Ranking documents by relevance • Weighted values
FTS SQL Server 2012 Components • Word breakers updated • English, Danish, Polish, Turkish • Third party replaced with MS components • You can revert to previous versions • New word breakers • Czech, Greek • Stoplists, Thesaurus, Filters same as SQL Server 2008 • Office 2010 filters must be installed manually • Property Lists added
FTS 2012 Performance • Scales up to 350 million documents • Index-building speed • Master merge parallelization • More granular build • More granular locks during build
FTS 2012 Query Performance • Query orders of magnitude faster than 2008 • Worst case query response time < 3 sec • Predicate folding • e.g. CT(1) AND CT(2) -> CT(1 AND 2) • Faster search time using FTS STVF • Query parallelization • Optimized for SELECT TOP N
SQL Server 2012 Property Lists • Property-based search • Define property lists (DB object) • Property lists are per-index • Must update index content if added to existing index • Used with form of CONTAINS, CONTAINSTABLE • IFilter must support extracting properties of interest • Troubleshoot with TF 7603
Using Search Property Lists • CREATE SEARCH PROPERTY LIST • Empty list or from existing • ALTER SEARCH PROPERTY LIST • Add or remove search properties • Windows Property Sets/Properties have well-known IDs • http://msdn.microsoft.com/en-us/library/dd561977(v=VS.85).aspx • Associate with FULLTEXT INDEX • In CREATE or ALTER FULLTEXT INDEX
SQL Server 2012 NEAR Predicate • Custom proximity term specifies "how close" • Specify how many search terms away • Specify if order of search terms matters • Original NEAR referred to as "generic proximity"
Semantic Search – SQL Server 2012 • Search the meaning of the document • Not just keywords • Implements common search pattern • Search by keyword (full-text) • Most relevant keyphrases (document summary) • Finds closest documents by relevance • Reason (keyphrases) for document similarity
Semantic Indexes • Uses two internal indexes • Tag Index • Uses variation of TF-IDX algorithm • Term Frequency / Inverse Document Frequency • Document Similarity Index • Uses variation of cosine similarity algorithm • Only supports unigrams in SQL Server 2012
Implementing Semantic Search • SemanticsDB must be installed • Language Model – subset of FTS languages • Specify in full-text index specification • SEMANTIC_STATISTICS, per-column • Built as part of full-text index build
Programming Semantic Search • Three table-valued functions • SEMANTICKEYPHRASETABLE • Keyphrases by column & document (scored) • SEMANTICSIMILARITYDETAILSTABLE • Find most similar documents to exemplar (scored) • SEMANTICSIMILARITYTABLE • Why two documents are similar (scored)
SQL Server 2012 Diagnostics • Additional XEvents • FTS and Semantic Index building events • Includes FTS internal execution tree/recompiles • FileTableXEvents • Additions To DMVs • Includes Semantic Search Information • FileTable/Filestream • IO Handles • IO Requests • Non-transacted handles
Summary • Filestream Expanded in SQL Server 2012 • Supported by Always On • FileTable added • Based on Filestream • Adds Win32 and non-transaction access • Allows hybrid file-database applications • In the database • Support Full-Text and Semantic Search • Full-text enhancements in SQL Server 2012 • Performance • Semantic Search • Property Search • Customized NEAR
References • My blog entries on FileTable • http://www.sqlskills.com/BLOGS/BOBB/post/SQL-Server-2012-FileTables-in-T-SQL-part-1-functions-and-methods.aspx and others • Pro Full-Text Search in SQL Server 2008 • Michael Coles and Hilary Cotter • Blog: SQL Server FTS Team Blog • http://blogs.msdn.com/b/sqlfts • SQL Server 2012 Books Online • http://msdn.microsoft.com/en-us/library/cc645577(SQL.110).aspx
Track Resources Hands-On Labs @sqlserver @ms_teched SQL Server 2012 Eval Copy Get Certified! mva • Microsoft Virtual Academy
Resources Learning TechNet • Connect. Share. Discuss. • Microsoft Certification & Training Resources http://northamerica.msteched.com www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers • http://microsoft.com/technet http://microsoft.com/msdn
Required Slide Complete an evaluation on CommNet and enter to win!
Required Slide • *delete this box when your slide is finalized • Your MS Tag will be inserted here during the final scrub. MS Tag Scan the Tag to evaluate this session now on myTechEd Mobile
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.