1 / 6

Full-Text Search: Efficient and Powerful Solution for Querying Substrings

Learn about the limitations of using the "LIKE" operator for substring searches in databases and the advantages of implementing full-text search. Discover how inverted indexing and native support improve efficiency and explore alternative middleware solutions like Lucene, Solr, and Elastic.

jparent
Download Presentation

Full-Text Search: Efficient and Powerful Solution for Querying Substrings

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS122B: Projects in Databases and Web Applications Winter 2019 Notes 11: Full-Text Search Professor Chen Li Department of Computer Science UC Irvine

  2. Search using “LIKE” • SELECT * FROM movies where title LIKE “%rain%” • Limitations: • “rain” will match “how to train your dragon” • “Chance Meatballs” cannot find “Cloudy with a Chance of Meatballs 2” • Reasons? String is treated a substring not a “word” or “a bag of words”

  3. Solution: full-text search • Most DBs support native full-text search • Example: CREATE TABLE ft ( entryID INT AUTO_INCREMENT, entry text, PRIMARY KEY (entryID), FULLTEXT (entry)); SELECT entryID FROM ft WHERE MATCH (entry) AGAINST ('golf');

  4. Under the hood: Inverted index 2 4 8 16 32 64 128 1 2 3 5 8 13 21 34 Cat Dog Bird 13 16

  5. DB full-text search: good and bad • Good: • “Native” – search happens at the place of the data, no separate copy, and no effort to do sync • Bad: • very limited support (e.g., ranking, stemming, tokenization, stop words, multi-language support) and efficiency

  6. Alternative solutions • Use a separate middleware solution just for search • Lucene, Solr, and Elastic • Bad: • Second data copy; needs to sync with DB data • Good: • Very powerful

More Related