470 likes | 604 Views
Introduction to Full-Text Searching in SQL Server 2012. Adolfo J. Socorro, Ph.D. IT Impact, Inc. asocorro@itimpact.com. Outline. What can we do with FTS? How to install FTS FTS components Creating FTS indexes How to query with FTS FILESTREAM and FileTable. FTS Basics.
E N D
Introductionto Full-Text Searching in SQL Server 2012 Adolfo J. Socorro, Ph.D.IT Impact, Inc. asocorro@itimpact.com
Outline • What can we do with FTS? • How to install FTS • FTS components • Creating FTS indexes • How to query with FTS • FILESTREAM and FileTable
FTS Basics FTS allows searching against character-based data • char • varchar • nchar • nvarchar • text • ntext • image • xml • varbinary • varbinary(max)
A First Look • Let’s run some simple examples to get a feel for FTS!
LIKE vs FTS • LIKE works on character patterns only • Cannot use the LIKE predicate to query formatted binary data • FTS is much faster against large amounts of unstructured text data
Supported SQL Server Editions • Enterprise • Business Intelligence • Standard • Web • Express with Advanced Services Available since at least SQL Server 2000
Language Support • 50+ languages • Language-specific components • Word breakers and stemmers • Stoplists • Thesaurus files
FTS Indexes • One index per table or indexed view • Must have a unique, single-column, non-nullable index on the table • Grouped within the same database into one or more full-text catalogs (“containers”)
Full-Text Catalogs • A logical construct • A way to manage FT indexes together
Index Population • Population: the addition of data to full-text indexes
CONTAINS • Precise or prefix matches to single words and phrases • Proximity matches • Logical operations between conditions: AND, OR, AND NOT • Optional use of inflectional formsand thesaurus
FREETEXT • Matching the meaning, but not the exact wording, of specified words or phrases • Always uses inflectional forms and thesaurus
CONTAINSTABLE AND FREETEXTTABLE • Return a relevance ranking value (RANK) and full-text key (KEY) for each row • The actual RANK values are unimportant and typically differ each time the query is run • ISABOUT/WEIGHT influence the rankingin CONTAINSTABLE
Stoplists • A mechanism to discard commonly occurring strings that do not help the search
Thesaurus • Nicknames: Robert/Bob • Common misspellings: calendar/calender • Homophones: Geoff/Jeff • Technical terms: proc/procedure Very powerful if you log searches and learn what users are commonly searching for
Thesaurus • One file per language
Filters • Extract textual information from the document (removing the formatting) • Send the text to the word-breaker component for the language associated with the column • Need to manually install Office 2010 and PDF filters
Why Store in the Database? • Integrating unstructured data into the relational database provides significant benefits: • Integrated storage and data management capabilities (e.g., backup) • Ease of administration and policy management • Full-text search
FILESTREAM • A database/file system hybrid • FILESTREAM is an attribute that can be assigned to a varbinary(max) column • Allows storing BLOB data in the file system • Not restricted to the 2 GB limitSQL Server imposes on BLOBs
FILESTREAM • SQL Server buffer pool is not used • Isolation semantics are governed byDatabase Engine transaction isolation levels
FILESTREAM • All data access must be transactional • Must use specific APIs for file I/O • Do not edit the files directly!
When to Use FILESTREAM • Objects that are being stored are, on average, larger than 1 MB • Store smaller objects in the database • Fast read access is important • You are using a middle tier for application logic
FileTables • A special, fixed-schema kind of table • Builds on top of existing FILESTREAM capabilities • Store files and documents in in the database, but access them from Windows applications as if they were stored in the file system (WIN32 API)
FileTables • Hierarchical namespace • Includes file system properties as columns • Preserves full file names • Non-transactional access through the FS
FileTables • Calls to create or change a file or directory through the Windows share are intercepted by a SQL Server component and reflected in the corresponding relational data in the FileTable
FileTables vs FILESTREAM • File and directory hierarchy maintained in the database • Windows application compatibility • Relational access to file attributes • Both are available in all editions
Wrap Up • Advanced searching on character-based data, including documents • FTS setup, components, and queries • FILESTREAM • FileTables
Other Topics • Document-property search • Semantic search • Optimizations • Query plans and execution traces
References • Posts and presentations by Bob Beauchemin • http://www.sqlskills.com/blogs/bobb/ • Blog: SQL Server FTS Team Blog • http://blogs.msdn.com/b/sqlfts • SQL Server 2012 Books Online • http://msdn.microsoft.com/en-us/library/cc645577(SQL.110).aspx
Filter Packs • Adobe PDF Filter • http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=4025&fileID=3941 • Office 2010 Filters • http://www.microsoft.com/en-us/download/details.aspx?id=17062