1 / 37

Integrating SQL Server FileTables , Property Search, and FTS/Semantic Search

Integrating SQL Server FileTables , Property Search, and FTS/Semantic Search . Bob Beauchemin Developer Skills Partner SQLskills. About Me. www.sqlskills.com /blogs/ bobb. bobb@sqlskills.com. @ bobbeauch. Bob Beauchemin. Outline. FileTable The substrate – Filestream Storage

cachez
Download Presentation

Integrating SQL Server FileTables , Property Search, and FTS/Semantic Search

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Integrating SQL Server FileTables, Property Search, and FTS/Semantic Search Bob Beauchemin Developer Skills Partner SQLskills

  2. About Me... www.sqlskills.com /blogs/bobb bobb@sqlskills.com @bobbeauch Bob Beauchemin

  3. Outline • FileTable • The substrate – Filestream Storage • FileTable Implementation • FileTable and Full-text Search • Full-text Search Enhancements • Property-Based Search • Semantic Search

  4. LOBs - File System or Database? • Storing large binary objects in databases is suboptimal • Large objects take buffers in database memory • Updating large objects cause database fragmentation • In file system however, "update" is delete and insert • "Before image" in an update is not deleted immediately • Storing all related data in a database adds • Transactional consistency • Integrated, point-in-time backup and restore • Single storage and query vehicle • Break-even point is approximately 256kb

  5. Filestream Implementation • Filestream is a database/file system hybrid • Enable filestream at an OS level (per-instance) • Filestreamfilegroups declared using DDL • Filestream storage is tied to a database • The filegroup is mapped to a directory • Sparse tx log, metadata, data live here • Must be NTFS file system • Caution: Files deletable from file system with appropriate permissions • VARBINARY(MAX) columns defined with FILESTREAM attribute • Filestream storage not available for other large types • Column's data is stored in the file system

  6. Programming with Filestreams • Filestream columns available through T-SQL • If SQL is used, access as varbinary(max) • A few limitations – WRITE method not available • Streaming I/O • ODBC and OLE DB • Use OpenSQLFilestream • ADO.NET • System.Data.SqlTypes.SqlFileStream • No direct EF and LINQ integration • Uses T-SQL internally to access

  7. Filestream Enhancements • SQL Server 2008 R2 • Snapshot Isolation Level Support • SQL Server 2012 • Multiple Containers per Filegroup • Performance Improvement for T-SQL and File I/O • Up to 5x read improvement • Linear scaling with multiple threads • Full Always-On Support • Access through virtual network name • Transparent reconnect • No support for Filestream and Database Mirroring • MAXSIZE can be specified for Filestream containers

  8. Filestream Wrap-up • The good… • Data on file system, also just a column to SQL • Integration with existing tables • Integrated with backup, restore, SQL security • Managed almost completely by SQL DBA • Can use transactional streaming IO for performance • Limitations • Specific APIs for file I/O (Not Win32 API) • Always require SQL to get file path, transaction • Not useable for existing programs • File names are statically synthesized • Flat namespace • No notion of file extension (suffix) • No renames

  9. FileTable • FileTable • Built over the Filestream feature • Allows access by existing applications • Win32 API access • Provides data read and write capabilities • Remote access using SMB protocol • Metadata manipulation handled by FileTable • Other capabilities handled by the file system • Hierarchical namespace

  10. FileTable • FileTable is a fixed format table (SQL 2012) • Uses filestream storage and container • Includes file system properties as columns • HierarchyID columns presents data as "synthesized" hierarchical file system share • Available as file system share or T-SQL table • Can be maintained through the share directly • Some columns maintainable through T-SQL • Preserves filename and suffix • Allows non-transactional access through share

  11. FileTable Setup • Enable Filestream feature - prerequisite • ALTER database to • Allow non-transactional access • Specify directory name for share • CREATE tablename AS FILETABLE • Specify filetable_directory (share subdirectory) • Specify collation for Name column • No partitioned FileTables • File access at: • \\machine\share\dbdirectory\ftdirectory • Share is instance share name • Dbdirectory is name specified on ALTER DATABASE • Ftdirectory is name specified on CREATE [tablename] AS FILETABLE

  12. FileTable Columns

  13. T-SQL Access • Insert • Many columns use defaults • Update • Can update stream without changing datetimes • No WRITE method • Delete • Fails on directories containing directories or files • BCP • Can disable system constraints and use BCP without constraints • SELECT INTO does not create a FileTable

  14. Programming Functions • T-SQL built-ins allow portable code • GetFileNamespacePath() • Accessor on each row • GetFileNamespacePath(1) gets full UNC pathname • FileTableRootPath() • Gets root path of the table • FileTableRootPath('tablename') gets root path with table name included • GetPathLocator() • Retrieves hierarchyid value for UNC name

  15. Triggers • Triggers fire with SQL or File I/O • Some trigger limitations • No INSTEAD OF triggers • Can't update any FileTables including itself • For in-place update of stream, before image not available • Rollback does not roll back stream updates • Possible to get sharing violations on the stream • After image must be aware of sharing modes

  16. FileTable Share Behaviors • Share is real Windows Share • Even antivirus programs think it's real • Not available if SQL Server not running • Uses SMB protocol – open TCP 139, 445 for remote file access • Cannot layer FileTable on existing share • Some limitations • Memory mapped files not supported • Can't open with Notepad, Paint files locally • 15 level subdirectory limit • Windows Explorer limit 260 byte names • NTFS (and FileTable) allow more

  17. Concurrency • READ COMMITTED semantics • For non-transactional file I/O • During RCSI or SI transactions • TSQL or transacted Win32 reads on stream fail unless they use READCOMMITTEDLOCK • FTS indexing always works • Snapshots and Always-On read-only secondaries • Same behavior as RCSI/SI

  18. FTS Overview • Provides ability to search against character-based data • Character-based columns • Varbinary • For binary document formats • Includes Filestream storage data and FileTables • XML • Search functionality includes • Simple terms - with Boolean operators and fuzzy matching • Prefix searches • Inflectional forms • Words near other words or phrases • Ranking documents by relevance • Weighted values

  19. FTS SQL Server 2012 Components • Word breakers updated • English, Danish, Polish, Turkish • Third party replaced with MS components • You can revert to previous versions • New word breakers • Czech, Greek • Stoplists, Thesaurus, Filters same as SQL Server 2008 • Office 2010 filters must be installed manually • Property Lists added

  20. FTS 2012 Performance • Scales up to 350 million documents • Index-building speed • Master merge parallelization • More granular build • More granular locks during build

  21. FTS 2012 Query Performance • Query orders of magnitude faster than 2008 • Worst case query response time < 3 sec • Predicate folding • e.g. CT(1) AND CT(2) -> CT(1 AND 2) • Faster search time using FTS STVF • Query parallelization • Optimized for SELECT TOP N

  22. SQL Server 2012 Property Lists • Property-based search • Define property lists (DB object) • Property lists are per-index • Must update index content if added to existing index • Used with form of CONTAINS, CONTAINSTABLE • IFilter must support extracting properties of interest • Troubleshoot with TF 7603

  23. Using Search Property Lists • CREATE SEARCH PROPERTY LIST • Empty list or from existing • ALTER SEARCH PROPERTY LIST • Add or remove search properties • Windows Property Sets/Properties have well-known IDs • http://msdn.microsoft.com/en-us/library/dd561977(v=VS.85).aspx • Associate with FULLTEXT INDEX • In CREATE or ALTER FULLTEXT INDEX

  24. SQL Server 2012 NEAR Predicate • Custom proximity term specifies "how close" • Specify how many search terms away • Specify if order of search terms matters • Original NEAR referred to as "generic proximity"

  25. Semantic Search – SQL Server 2012 • Search the meaning of the document • Not just keywords • Implements common search pattern • Search by keyword (full-text) • Most relevant keyphrases (document summary) • Finds closest documents by relevance • Reason (keyphrases) for document similarity

  26. Semantic Indexes • Uses two internal indexes • Tag Index • Uses variation of TF-IDX algorithm • Term Frequency / Inverse Document Frequency • Document Similarity Index • Uses variation of cosine similarity algorithm • Only supports unigrams in SQL Server 2012

  27. Implementing Semantic Search • SemanticsDB must be installed • Language Model – subset of FTS languages • Specify in full-text index specification • SEMANTIC_STATISTICS, per-column • Built as part of full-text index build

  28. Programming Semantic Search • Three table-valued functions • SEMANTICKEYPHRASETABLE • Keyphrases by column & document (scored) • SEMANTICSIMILARITYDETAILSTABLE • Find most similar documents to exemplar (scored) • SEMANTICSIMILARITYTABLE • Why two documents are similar (scored)

  29. SQL Server 2012 Diagnostics • Additional XEvents • FTS and Semantic Index building events • Includes FTS internal execution tree/recompiles • FileTableXEvents • Additions To DMVs • Includes Semantic Search Information • FileTable/Filestream • IO Handles • IO Requests • Non-transacted handles

  30. Summary • Filestream Expanded in SQL Server 2012 • Supported by Always On • FileTable added • Based on Filestream • Adds Win32 and non-transaction access • Allows hybrid file-database applications • In the database • Support Full-Text and Semantic Search • Full-text enhancements in SQL Server 2012 • Performance • Semantic Search • Property Search • Customized NEAR

  31. References • My blog entries on FileTable • http://www.sqlskills.com/BLOGS/BOBB/post/SQL-Server-2012-FileTables-in-T-SQL-part-1-functions-and-methods.aspx and others • Pro Full-Text Search in SQL Server 2008 • Michael Coles and Hilary Cotter • Blog: SQL Server FTS Team Blog • http://blogs.msdn.com/b/sqlfts • SQL Server 2012 Books Online • http://msdn.microsoft.com/en-us/library/cc645577(SQL.110).aspx

  32. Track Resources Hands-On Labs @sqlserver @ms_teched SQL Server 2012 Eval Copy Get Certified! mva • Microsoft Virtual Academy

  33. Resources Learning TechNet • Connect. Share. Discuss. • Microsoft Certification & Training Resources http://northamerica.msteched.com www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers • http://microsoft.com/technet http://microsoft.com/msdn

  34. Required Slide Complete an evaluation on CommNet and enter to win!

  35. Required Slide • *delete this box when your slide is finalized • Your MS Tag will be inserted here during the final scrub. MS Tag Scan the Tag to evaluate this session now on myTechEd Mobile

  36. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related