Building High-Performance Real-Time Search in a Rails App with Redis

Name: 李华顺 (Jason Lee) • Twitter: @huacnlee • GitHub: http://github.com/huacnlee • 者也, Taobao MED

Search engine projects currently available

But I am not going to talk about them!

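Background
• Built the site 者也 (zheye.org);
• Needed an efficient search feature similar to Quora's;
• Developed with Ruby on Rails, with MongoDB as the database;
• Chinese-language search, which requires word segmentation;
• Needs character-by-character (as-you-type) matching.

Requirements for this search feature
• Return results the instant a key is pressed;
• MongoDB support;
• No complex queries needed; a single field serves as the search condition;
• Character-by-character matching;
• Word segmentation and fuzzy matching;
• Real-time updates;
• Sorting.

Why not Sphinx or other open-source projects
• Query speed cannot keep up with the need to respond on every keystroke
• No ready-made component is available for MongoDB
• Character-by-character matching is required
• The index must be updated in real time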

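Initial implementation: SET, KEYS *keyword*, MGET

    class Ask
      # Store each record as JSON under a key derived from its title.
      after_create do
        key = "quora:#{self.title.downcase}"
        $redis.set(key, { :id => self.id, :title => self.title, :type => self.type }.to_json)
      end

      # Remove the key when the record is destroyed.
      before_destroy do
        $redis.del("quora:#{self.title_was.downcase}")
      end

      def self.search(text, limit = 10)
        words = RMMSeg.split(text)
        # KEYS scans the whole keyspace; the pattern only matches words in the order given.
        keys = $redis.keys("*#{words.collect(&:downcase).join("*")}*")[0, limit]
        return [] if keys.empty?

        items = $redis.mget(*keys).map { |r| JSON.parse(r) }
        # Sort by type, descending.
        items.sort { |a, b| b['type'] <=> a['type'] }
      end
    end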

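Problems
• Gets slower and slower once the data grows past 100,000 records
• Segmented search only matches when the words are entered in the same order as in the title
• No way to sort results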
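The word-order problem follows directly from how the glob pattern is built. The sketch below reproduces it without a Redis server: `File.fnmatch` stands in for the pattern matching done by KEYS, and the stored keys are made-up samples, not data from the talk.

```ruby
# Build the same kind of glob pattern the first version passed to KEYS.
def key_pattern(words)
  "*#{words.map(&:downcase).join('*')}*"
end

# Stand-in for keys stored in Redis (hypothetical sample titles).
STORED_KEYS = [
  "quora:ruby on rails performance",
  "quora:python tools"
].freeze

# File.fnmatch applies glob matching, similar in spirit to a KEYS pattern.
def matching_keys(words)
  pattern = key_pattern(words)
  STORED_KEYS.select { |key| File.fnmatch(pattern, key) }
end

puts matching_keys(%w[Ruby Rails]).inspect # words in title order: one match
puts matching_keys(%w[Rails Ruby]).inspect # same words reversed: no match
```

Because the words are joined into a single pattern, "Rails Ruby" can never match a title in which "Ruby" comes first.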

How can it be improved?

Using Redis features
• Keyword index: Sets (SADD, SREM, SINTER, SUNION)
• Prefix-match index: Sorted Sets (ZADD, ZRANK, ZRANGE)
• Entity data: Hashes (HSET, HDEL, HMGET)

The index structure of Redis-Search

Demo data:

Ask
    { 'id' : 1, 'title' : 'Ruby on Rails 为什么如此高效？', 'score' : 4 }            # Why is Ruby on Rails so efficient?
    { 'id' : 2, 'title' : 'Ruby 编程入门应该看什么书籍？', 'score' : 20 }            # Which books should a Ruby beginner read?
    { 'id' : 3, 'title' : 'Ruby 和 Python 哪个更好?', 'score' : 13 }                 # Which is better, Ruby or Python?
    { 'id' : 4, 'title' : '做 Python 开发应该用什么开发工具比较好？', 'score' : 5 }  # Which tools are best for Python development?

Topic
    { 'id' : 1, 'name' : 'Ruby', 'score' : 5 }
    { 'id' : 2, 'name' : 'Rails', 'score' : 18 }
    { 'id' : 3, 'name' : 'Rubies', 'score' : 10 }
    { 'id' : 4, 'name' : 'Rake', 'score' : 4 }
    { 'id' : 5, 'name' : 'Python', 'score' : 2 }

prefix_index_enable = true

Indexes

Keyword index (Sets)
    topic:ruby    [1]
    topic:rails   [2]
    topic:rubies  [3]
    topic:rake    [4]
    ask:rails     [1]
    ask:ruby      [1,2,3]
    ask:python    [3,4]
    ask:什么      [1,2,4]
    ......

Score index for sorting
    ask:_score_:1    4
    ask:_score_:2    20
    ask:_score_:3    13
    ask:_score_:4    5
    topic:_score_:1  5
    topic:_score_:2  18
    topic:_score_:3  10
    topic:_score_:4  4
    ......

Prefix-match index (Sorted Sets)
    r, ra, rai, rail, rails*, rak, rake*, ru, rub, rubi, rubie, rubies*, ruby*
• Entries ending in * are actual words
• Entries are stored in automatically sorted order

Index: the actual data (Hashes)

Topic
    topic:1  { 'id' : 1, 'name' : 'Ruby' }
    topic:2  { 'id' : 2, 'name' : 'Rails' }
    topic:3  { 'id' : 3, 'name' : 'Rubies' }
    topic:4  { 'id' : 4, 'name' : 'Rake' }
    topic:5  { 'id' : 5, 'name' : 'Python' }

Ask
    ask:1  { 'id' : 1, 'title' : 'Ruby on Rails 为什么如此高效？' }
    ask:2  { 'id' : 2, 'title' : 'Ruby 编程入门应该看什么书籍？' }
    ask:3  { 'id' : 3, 'title' : 'Ruby 和 Python 哪个更好?' }
    ask:4  { 'id' : 4, 'title' : '做 Python 开发应该用什么开发工具比较好？' }
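To make the three structures concrete, here is a minimal in-memory sketch that builds them for the Topic demo data. Plain Ruby `Set` and `Hash` objects stand in for Redis Sets, score keys, and Hashes; `build_index` and its return shape are illustrative assumptions, not the gem's actual code.

```ruby
require "set"

Topic = Struct.new(:id, :name, :score)

TOPICS = [
  Topic.new(1, "Ruby", 5),
  Topic.new(2, "Rails", 18),
  Topic.new(3, "Rubies", 10),
  Topic.new(4, "Rake", 4),
  Topic.new(5, "Python", 2)
].freeze

# Builds the keyword index, score index, and entity data for one model type.
def build_index(type, records)
  keywords = Hash.new { |hash, key| hash[key] = Set.new } # stands in for Redis Sets
  scores   = {}                                           # stands in for score keys
  hashes   = {}                                           # stands in for Redis Hashes

  records.each do |record|
    # Topic names are single words; a title would first go through a word segmenter.
    keywords["#{type}:#{record.name.downcase}"] << record.id
    scores["#{type}:_score_:#{record.id}"] = record.score
    hashes["#{type}:#{record.id}"] = { "id" => record.id, "name" => record.name }
  end

  { keywords: keywords, scores: scores, hashes: hashes }
end

TOPIC_INDEX = build_index("topic", TOPICS)
puts TOPIC_INDEX[:keywords]["topic:rake"].to_a.inspect
puts TOPIC_INDEX[:scores]["topic:_score_:2"]
```

Keeping the display fields in a separate structure is what lets a search return results without touching the original database.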

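Prefix-match search process

Input: r, ru, rub, ruby

1. Look up the position of the input in the prefix index (key name omitted; positions are shown 1-based here, while ZRANK itself returns a 0-based rank):
    redis> ZRANK r

    Position  Entry
    1         r
    2         ra
    3         rai
    4         rail
    5         rails*
    6         rak
    7         rake*
    8         ru
    9         rub
    10        rubi
    11        rubie
    12        rubies*
    13        ruby*

    r → 1, ru → 8, rub → 9, ruby → 13

2. Fetch the entries from that position up to 100 positions further, and keep the ones ending in *:
    redis> ZRANGE 1 100+1

    r        → [rails, rake, rubies, ruby]
    ru, rub  → [rubies, ruby]
    ruby     → [ruby]

3. Take the union of the keyword sets:
    redis> SUNIONSTORE topic:rubies+ruby topic:rubies topic:ruby

4. Sort by score; the ids are returned to redis-search:
    redis> SORT topic:rubies+ruby BY topic:_score_:* DESC LIMIT 0 10

    r        → [2,3,1,4]
    ru, rub  → [3,1]
    ruby     → [1]

5. Fetch the entity data:
    redis> HMGET topic 2 3 1 4

Result (for the input r):
    { 'id' : 2, 'name' : 'Rails', 'score' : 18 }
    { 'id' : 3, 'name' : 'Rubies', 'score' : 10 }
    { 'id' : 1, 'name' : 'Ruby', 'score' : 5 }
    { 'id' : 4, 'name' : 'Rake', 'score' : 4 }

Source of the prefix-index algorithm: http://antirez.com/post/autocomplete-with-redis.html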
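The same flow can be traced without a Redis server. In this sketch a sorted Ruby array stands in for the sorted set, a binary search stands in for ZRANK, and a `Set` union plus `sort_by` stand in for SUNIONSTORE and SORT. The helper names (`build_prefix_index`, `complete_words`, `complete`) are assumptions made for illustration and are not the gem's API.

```ruby
require "set"

# Sorted prefix list; a trailing "*" marks a complete word, as in the demo index.
def build_prefix_index(words)
  entries = Set.new
  words.each do |word|
    (1...word.length).each { |len| entries << word[0, len] } # proper prefixes
    entries << "#{word}*"                                    # the actual word
  end
  entries.sort
end

# Locate the input (like ZRANK), read a window after it (like ZRANGE),
# and keep only the starred entries that still share the prefix.
def complete_words(entries, prefix, range = 100)
  start = entries.bsearch_index { |entry| entry >= prefix }
  return [] if start.nil?

  entries[start, range + 1]
    .take_while { |entry| entry.start_with?(prefix) }
    .select { |entry| entry.end_with?("*") }
    .map { |entry| entry.chomp("*") }
end

TOPIC_IDS    = { "rails" => Set[2], "rake" => Set[4], "rubies" => Set[3], "ruby" => Set[1] }.freeze
TOPIC_SCORES = { 1 => 5, 2 => 18, 3 => 10, 4 => 4 }.freeze
PREFIXES     = build_prefix_index(TOPIC_IDS.keys)

# Union the id sets of all matched words, then sort by score, highest first.
def complete(prefix, limit = 10)
  words = complete_words(PREFIXES, prefix)
  ids = words.reduce(Set.new) { |union, word| union | TOPIC_IDS.fetch(word, Set.new) }
  ids.sort_by { |id| -TOPIC_SCORES.fetch(id, 0) }.first(limit)
end

puts complete("r").inspect  # ids for rails, rake, rubies, ruby ordered by score
puts complete("ru").inspect # ids for rubies and ruby ordered by score
```

Each keystroke costs one position lookup and one bounded range read, regardless of how many records are indexed, which is why this approach stays responsive as the data grows.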

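Segmented search process

Input: "Ruby", "Ruby 什么", "Ruby 什么书籍"

1. Segment the input into words:
    "Ruby"          → [ruby]
    "Ruby 什么"     → [ruby, 什么]
    "Ruby 什么书籍" → [ruby, 什么, 书籍]

2. Intersect the keyword sets (inside Redis):
    redis> SINTERSTORE ask:ruby+什么+书籍 ask:ruby ask:什么 ask:书籍

    [ruby]             → [1,2,3]
    [ruby, 什么]       → [1,2]
    [ruby, 什么, 书籍] → [2]

3. Sort by score; the ids are returned to redis-search:
    redis> SORT ask:ruby+什么+书籍 BY ask:_score_:* DESC LIMIT 0 10

    [ruby]             → [2,3,1]
    [ruby, 什么]       → [2,1]
    [ruby, 什么, 书籍] → [2]

4. Fetch the entity data:
    redis> HMGET ask 2 3 1

Result (for the input "Ruby"):
    { 'id' : 2, 'title' : 'Ruby 编程入门应该看什么书籍？', 'score' : 20 }
    { 'id' : 3, 'title' : 'Ruby 和 Python 哪个更好?', 'score' : 13 }
    { 'id' : 1, 'title' : 'Ruby on Rails 为什么如此高效？', 'score' : 4 }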
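A small in-memory sketch of the intersection step, using the Ask demo data. Ruby `Set` intersection stands in for SINTERSTORE and `sort_by` for SORT; the input is assumed to be already segmented into words, and the `query` helper here is hypothetical rather than the gem's `Redis::Search.query`.

```ruby
require "set"

# Keyword index for Ask, taken from the demo data.
ASK_KEYWORDS = {
  "ruby"   => Set[1, 2, 3],
  "rails"  => Set[1],
  "python" => Set[3, 4],
  "什么"   => Set[1, 2, 4],
  "书籍"   => Set[2]
}.freeze

ASK_SCORES = { 1 => 4, 2 => 20, 3 => 13, 4 => 5 }.freeze

# Intersect the id sets of every word, then order by score, highest first.
# An unknown word contributes an empty set, so the whole query matches nothing.
def query(words, limit = 10)
  sets = words.map { |word| ASK_KEYWORDS.fetch(word.downcase, Set.new) }
  return [] if sets.empty?

  sets.reduce(:&).sort_by { |id| -ASK_SCORES.fetch(id, 0) }.first(limit)
end

puts query(%w[Ruby]).inspect
puts query(%w[Ruby 什么]).inspect
puts query(%w[什么 Ruby]).inspect # word order no longer matters
```

Because set intersection is commutative, the word-order limitation of the first implementation disappears.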

so...

Redis-Search ActiveRecord

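Redis-Search features
• On an iMac, searches over 1,000,000+ records respond in under 10 ms per query;
• Search index updated in real time;
• Chinese word-segmentation search (rmmseg-cpp);
• Prefix-match search;
• No-SQL: no need to query the original database;
• Search by Hanyu Pinyin (chinese_pinyin);
• ActiveRecord and Mongoid support.

Limitations of Redis-Search
• Can only search on a single field (alias search will be added later);
• Limited sorting options (only one at present);
• Additional conditions can only use =, not > or <;
• Pinyin search can be slightly off in some cases involving homophones.

Use cases
• Article search;
• User search;
• Country and city matching;
• Friend matching;
• Category and tag matching;
• Other name matching (for example: shop names, addresses, brands, books, films, music...);
• Related-content matching.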

How to use it?

Installation

Gemfile

    gem 'redis', '>= 2.1.1'
    gem 'chinese_pinyin', '0.4.1'
    gem 'rmmseg-cpp-huacnlee', '0.2.9'
    gem 'redis-namespace', '~> 1.1.0'
    gem 'redis-search', '0.7.0'

    shell> bundle install

Configuration

config/initializers/redis_search.rb

    require "redis"
    require "redis-namespace"
    require "redis-search"

    redis = Redis.new(:host => "127.0.0.1", :port => "6379")
    redis.select(3)
    # Set a namespace to avoid conflicts with other projects
    redis = Redis::Namespace.new("your_app_name:search", :redis => redis)
    Redis::Search.configure do |config|
      config.redis = redis
      # Prefix-match threshold: choose it based on the longest content you need
      # to prefix-match; the shorter, the better
      config.complete_max_length = 100
      # Whether to enable pinyin search
      config.pinyin_match = true
    end

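Model configuration

    class User
      include Mongoid::Document
      include Redis::Search

      field :name
      field :tagline
      field :email
      field :followers_count, :type => Integer, :default => 0
      field :sex, :type => Integer, :default => 0

      # Enable the search index for this model
      # title_field          the field used for searching
      # prefix_index_enable  whether to use character-by-character (prefix) matching
      # score_field          the field used for sorting
      # condition_fields     additional condition fields
      # ext_fields           fields stored in the Hash; redis-search does not query the
      #                      original database, so any field needed for display must be listed here
      redis_search_index(:title_field => :name,
                         :prefix_index_enable => true,
                         :score_field => :followers_count,
                         :condition_fields => [:sex],
                         :ext_fields => [:email, :tagline])
    end

Once configured, Redis-Search automatically updates the index and the Hash data in Redis whenever a record is created, updated, or destroyed, so you do not need to handle updates yourself.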
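What "automatically updates" has to involve can be sketched in memory: on create the words are indexed, on destroy they are removed, and on update the stale entries are removed before the new title is indexed. `MiniIndex` below is a hypothetical stand-in written for illustration; it splits on whitespace instead of using a word segmenter and does not reflect the gem's internals.

```ruby
require "set"

# In-memory stand-in for the Redis index (hypothetical class, not the gem's API).
class MiniIndex
  attr_reader :keywords, :hashes

  def initialize
    @keywords = Hash.new { |hash, key| hash[key] = Set.new } # word => ids
    @hashes = {}                                             # id => display fields
  end

  # After create: index every word and store the fields needed for display.
  def add(id, title, ext = {})
    words(title).each { |word| @keywords[word] << id }
    @hashes[id] = ext.merge("id" => id, "title" => title)
  end

  # Before destroy: drop the id from every keyword set it belonged to.
  def remove(id)
    record = @hashes.delete(id)
    return if record.nil?

    words(record["title"]).each do |word|
      @keywords[word].delete(id)
      @keywords.delete(word) if @keywords[word].empty?
    end
  end

  # After update: remove the stale entries first, then index the new title.
  def update(id, title, ext = {})
    remove(id)
    add(id, title, ext)
  end

  private

  # Naive whitespace split; a real index would use a Chinese word segmenter.
  def words(title)
    title.downcase.split
  end
end

index = MiniIndex.new
index.add(1, "Ruby on Rails", "email" => "a@example.com")
index.update(1, "Python tools")
puts index.keywords.keys.inspect # only words from the new title remain
```

Removing the old words on update matters: without it, a renamed record would still be found under its previous title.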

Queries

Prefix-match search:

    rails c> Redis::Search.complete('User', 'hua', :conditions => { :sex => 1 }, :limit => 20)

Regular segmented search:

    rails c> Redis::Search.query('Ask', 'Ruby敏捷开发', :conditions => { :state => 1 }, :limit => 20)

Project: http://github.com/huacnlee/redis-search

Thanks
