60 likes | 68 Views
Learn about the limitations of using the "LIKE" operator for substring searches in databases and the advantages of implementing full-text search. Discover how inverted indexing and native support improve efficiency and explore alternative middleware solutions like Lucene, Solr, and Elastic.
E N D
CS122B: Projects in Databases and Web Applications Winter 2019 Notes 11: Full-Text Search Professor Chen Li Department of Computer Science UC Irvine
Search using “LIKE” • SELECT * FROM movies where title LIKE “%rain%” • Limitations: • “rain” will match “how to train your dragon” • “Chance Meatballs” cannot find “Cloudy with a Chance of Meatballs 2” • Reasons? String is treated a substring not a “word” or “a bag of words”
Solution: full-text search • Most DBs support native full-text search • Example: CREATE TABLE ft ( entryID INT AUTO_INCREMENT, entry text, PRIMARY KEY (entryID), FULLTEXT (entry)); SELECT entryID FROM ft WHERE MATCH (entry) AGAINST ('golf');
Under the hood: Inverted index 2 4 8 16 32 64 128 1 2 3 5 8 13 21 34 Cat Dog Bird 13 16
DB full-text search: good and bad • Good: • “Native” – search happens at the place of the data, no separate copy, and no effort to do sync • Bad: • very limited support (e.g., ranking, stemming, tokenization, stop words, multi-language support) and efficiency
Alternative solutions • Use a separate middleware solution just for search • Lucene, Solr, and Elastic • Bad: • Second data copy; needs to sync with DB data • Good: • Very powerful