OCTOBER 11-14, 2016 • BOSTON, MA

Coffee, Danish & Search. Alan Woodward (@romseygeek) and Charlie Hull (@flaxsearch), Flax.

Presentation Transcript


  1. OCTOBER 11-14, 2016 • BOSTON, MA

  2. Coffee, Danish & Search
     Alan Woodward (@romseygeek) and Charlie Hull (@flaxsearch), Flax

  3. Who are Flax?
     • We build, tune and support fast, accurate and highly scalable search, analytics and Big Data applications
     • We use (and create) open source software
     • We're independent, honest and have 15+ years' experience
     • We also:
       • Run and attend many meetups, events & conferences
       • Write extensively about search & related matters
       • Train and mentor
     • Based in Cambridge, UK, with clients across the world

  4. Who are we?
     • Charlie Hull - @flaxsearch
       • Co-founder & Managing Director of Flax
       • Runs the London Lucene/Solr Meetup
     • Alan Woodward - @romseygeek
       • Director of Flax
       • Solr committer & PMC member

  5. Our clients

  7. The client
     • Founded in 2003
     • The leading Danish provider of media monitoring and media analysis
     • Largest and oldest Danish media archive, with access to approximately 75 million searchable articles
     • Based in Copenhagen
     • Coffee (and beer) very expensive!

  9. Two systems
     • Media monitoring
       • Runs stored search queries against incoming articles
       • Very old (2001) system based on Verity
       • At maximum capacity, needing constant attention
       • Unsupported by HP and not scalable
     • Archive search
       • Allows users to query a multi-year archive of articles
       • Slightly less old system based on Autonomy IDOL
       • Different query language to Verity
       • Not performing well

  10. The project
      • Build a completely new search architecture to replace Verity and IDOL
      • Define our own query language, IQL, owned and controlled by Infomedia
      • Translate over 8000 old monitoring queries to this new IQL syntax

  12. The plan
      • For archive search: Solr
      • For media monitoring (stored search): Luwak
        • A library based on Lucene
        • Up to 40x faster than Elasticsearch Percolator
        • Used by Bloomberg, Booz Allen Hamilton, …
        • https://github.com/flaxsearch/luwak

  14. Search turned upside down (2)
      [Diagram labels: Query, Query, Stored Queries, Docs, $$$, Result]
      • 1 million queries, some 250k long, with complex rules
      • 1 million new documents a day
      • Result within 5-100ms

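  17. Search turned upside down (5)
      [Diagram labels: Query, Query, Stored Queries, Doc, Docs, Query Subset, Result]
      • 1 million queries, some 250k long, with complex rules
      • 1 million new documents a day
      • Step 1, Pre: narrow the stored queries to a subset (~200) for the incoming document
      • Step 2, Search: run only that subset against the document to get the result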
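
The two-step flow on this slide can be sketched in a few lines of Python. This is an illustration of the idea only, not Luwak's code: `StoredQuery`, `Monitor`, `register`, `candidates` and `match` are names invented for the example, each stored query is simplified to a set of required terms, and the pre step indexes every query under just one of its terms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical stored query: matches when every term is in the document."""
    query_id: str
    required_terms: frozenset


class Monitor:
    """Toy stored-query matcher with a pre step (illustrative, not Luwak)."""

    def __init__(self):
        self.queries = {}
        # term -> ids of queries that cannot match unless the term is present
        self.term_to_queries = defaultdict(set)

    def register(self, query):
        self.queries[query.query_id] = query
        # Index the query under one required term; a document lacking
        # that term can never match, so the query is skipped for it.
        anchor = min(query.required_terms)
        self.term_to_queries[anchor].add(query.query_id)

    def candidates(self, doc_terms):
        """Step 1 (pre): select the subset of queries that could match."""
        selected = set()
        for term in doc_terms:
            selected |= self.term_to_queries.get(term, set())
        return selected

    def match(self, text):
        """Step 2 (search): run only the candidate queries on the document."""
        doc_terms = set(text.lower().split())
        return sorted(
            qid
            for qid in self.candidates(doc_terms)
            if self.queries[qid].required_terms <= doc_terms
        )
```

The design point is that step 2 is the expensive part, so the cheap lookup in step 1 decides how much of it runs for each incoming document.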

  18. Archive search

  20. Archive search
      • We had to build some Solr components:
        • A shared Query Parser for both monitor and archive
        • A shared Highlighter
      • We had to deal with multiple languages
        • English, Danish, German and Faroese
        • Language analysis for each of these is very different, e.g. in Danish ‘eleven’ = ‘the student’ and should be stemmed to ‘elev’

  22. Query parser
      • Monitoring systems have special requirements for complex query building
        • Nested proximity: x w/30 (y notw/10 z)
        • Multiple analysis of terms: exact, stemmed, capitalised
      • Existing query parsers don’t capture this functionality well
        • So we define a new query language (IQL) and build a parser around it
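
To make the nested proximity example concrete, here is a small sketch that evaluates `x w/30 (y notw/10 z)` over a list of tokens. The semantics are an assumption for illustration (the slides do not define IQL): `w/N` is read as "within N token positions, in either order" and `notw/N` as "no occurrence within N token positions". All function names are made up for the example.

```python
def positions(tokens, term):
    """Token positions at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def not_within(a_positions, b_positions, distance):
    """Keep the a positions that have no b position within `distance`."""
    return [
        a for a in a_positions
        if all(abs(a - b) > distance for b in b_positions)
    ]


def within(a_positions, b_positions, distance):
    """Return (a, b) position pairs at most `distance` apart."""
    return [
        (a, b)
        for a in a_positions
        for b in b_positions
        if abs(a - b) <= distance
    ]


def match_nested(tokens, x, y, z, near=30, exclude=10):
    """Evaluate `x w/near (y notw/exclude z)` under the assumed semantics."""
    # Inner clause first: occurrences of y that are not close to any z.
    inner = not_within(positions(tokens, y), positions(tokens, z), exclude)
    # Outer clause: x close to one of the surviving y occurrences.
    return within(positions(tokens, x), inner, near)
```

The inner clause has to produce positions rather than a yes/no answer, because the outer clause needs them. That is why a parser built around plain boolean queries struggles with this kind of nesting.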

  23. Query parser (2)
      • A second query parser converts legacy (Verity) queries to the new query language (IQL)
      • Because we have control of the query parser, we can ensure that we only use queries that we can highlight
      • Custom analyzers index multiple versions of a token at the same position
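
The last point, several versions of a token at one position, can be illustrated as follows. In Lucene this is done by emitting extra tokens with a position increment of zero; the sketch below only mimics the resulting (position, token) stream. The `toy_stem` rule is a placeholder, not a real Danish stemmer, and how the project actually distinguishes exact, stemmed and capitalised forms is not described in the slides.

```python
def toy_stem(word):
    """Placeholder stemmer: strips a trailing 'en' from longer words."""
    if len(word) > 4 and word.endswith("en"):
        return word[:-2]
    return word


def analyze(text, stem=toy_stem):
    """Return (position, token) pairs, stacking variants at one position."""
    stream = []
    for position, word in enumerate(text.split()):
        variants = [word]                  # exact form, case preserved
        for candidate in (word.lower(), stem(word.lower())):
            if candidate not in variants:  # avoid emitting duplicates
                variants.append(candidate)
        stream.extend((position, variant) for variant in variants)
    return stream
```

Because all variants share a position, a proximity or phrase query can mix an exact term with a stemmed neighbour and still line up correctly.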

  24. Highlighter
      • Monitor uses large, complex queries with multiple subclauses
      • Need for an accurate highlighter to show exactly where in a document the query matched
      • Existing highlighters give ‘best guesses’ and snippets

  25. Highlighter (2)
      • To get exact matches, we can use the SpanCollector API introduced in LUCENE-6371
      • To highlight a document, build a MemoryIndex on-the-fly, and extract matching Spans
      • Limitations:
        • Only works with queries that can be rewritten as SpanQueries, so no sloppy phrases, for example
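
The difference between exact and "best guess" highlighting can be shown without Lucene. The sketch below is a stand-in for the span-collection idea, not the SpanCollector API itself: it records character offsets only for tokens that actually take part in a `first NEAR/distance second` match. The function names and the bracket markers are invented for the example.

```python
import re


def tokenize(text):
    """Return (token, start, end) triples with character offsets."""
    return [
        (m.group().lower(), m.start(), m.end())
        for m in re.finditer(r"\w+", text)
    ]


def near_spans(text, first, second, distance):
    """Offsets of every token that takes part in a NEAR match."""
    tokens = tokenize(text)
    hits = set()
    for i, (tok_i, start_i, end_i) in enumerate(tokens):
        if tok_i != first:
            continue
        for j, (tok_j, start_j, end_j) in enumerate(tokens):
            if tok_j == second and abs(i - j) <= distance:
                hits.add((start_i, end_i))
                hits.add((start_j, end_j))
    return sorted(hits)


def highlight(text, spans, open_tag="[", close_tag="]"):
    """Wrap each (start, end) span of `text` in the given markers."""
    out, cursor = [], 0
    for start, end in spans:
        out.append(text[cursor:start])
        out.append(open_tag + text[start:end] + close_tag)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

For the text "Coffee is dear. Danish coffee is dearer." and the clause `danish NEAR/1 coffee`, only the second "coffee" is marked. A highlighter that marks every occurrence of each query term would also mark the first one.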

  26. Scary queries 01 ((((aymnasoesoeekzueazez* OR 'aymnasoeeleveenes lanssueaanosazoun*' OR 'aymnasoelæeeenes oseæzslæeeefueenona*' OR 'aymnasoeskuleenes oseæzslæeeefueenona*' OR 'aymnasoeskuleenes læeeefueenona*' OR 'aymnasoeenes mazemazoklæeeefueenona*' OR aymnasoeeefuem* OR xeasona:unasumsussannelse* OR ((aymnasoum* OR aymnasoee*) NEAR/15 (kaeakzeefukus* OR elevzal*)) OR (((aymnasoe* OR unasumsussannelse* OR 'aymnasoal* ussannelse*' OR aymnasoum) NEAR/14 (kaeakzeekeav* OR asaanaskeav* OR beuaeebezalona* OR besqae* OR nesskæe*)) AND aeuuqfoels:z_ms_lanss) OR • xeasona:unasumsussannelse* OR (aymnasoe* NEAR/9 valafaa*) OR (xeasona:~Gymnasoum) OR (aymnasoee* NEAR/9 xanselsskule*) OR ((~HF OR aymnasoe* OR aymnasoum* OR unasumsussannelse*) NEAR/9 lekzoe*) OR xeasona:~Gymnasoum OR (((aymnasoez* OR aymnasoum*) NEAR/14 unasumsussannelse*) AND (aeuuqfoels:z_ms_lanss OR aeuuqfoels:z_ms_eea)) OR ('~Danske ~Gymnasoee' AND aeuuqfoels:z_ms_lanss) OR(('~Danske ~Gymnasoee' AND unasumsussannelse*) AND aeuuqfoels:z_ms_lanss) OR ((((aymnasoe* OR aymnasoum* OR unasumsussannelse*) NEAR/14 kaeakzeeskala*) NOT (aeuuqfoels:z_mu_web_1))) OR ((aymnasoe* OR aymnasoum*) NEAR/9 (zaxamezee*)) OR ((unasumsussannelse*) NEAR/15 (ussannelsesumeåse*)) OR (('feemzosens aymnasoum' OR 'sez almene aymnasoum*') AND aeuuqfoels:z_ms_lanss) OR ((((aymnasoe* OR aymnasoum* OR unasumsussannelse* OR 'aymnasoal ussannelse*') NEAR/15 (szusenzeeeksamen* OR kaeakzeeee* OR suqqleeonasfaa* OR suqqleeonaskuesus*)) AND aeuuqfoels:z_ms_lanss)) OR (szusenzeeeksamen* NEAR/19 ('aymnasoale faa' OR suqqleeonasfaa* OR suqqleeonaskuesus*)) OR ((aymnasoe* AND kaeakzeekeav*) AND aeuuqfoels:z_ms_lanss) • OR (aymnasoe* NEAR/14 asaanaskeav*) OR '~EYES DK' OR 'nyz aymnasoum' OR (('uno c' OR 'uno cs' OR ('sanmaeks oz cenzee' NEAR/4 ussannelse*) OR (~EMU NEAR/9 (unseevosnona* OR *quezal* OR xjemmesose* OR websoze*)) OR 'emu.sk*' OR ~SkuleInzea* OR '%læeeeonzea' OR uno?c OR easy?a OR 'sez szusoeasmonoszeazove syszem*' OR (~SIS 
NEAR/15 (szusoeuesnona* OR szusoequezal OR 'szusoe onfuemazoun' OR 'szusoe onfuemazoun* syszem*')) OR 'elevqlan.sk' OR elevqlan?sk OR (emu NEAR/15 ('elekzeunosk møseszes unseevosnona*' OR unseevosnona* OR elekzeunosk* OR unseevosnonasquezal* OR 'unseevosnona? quezal*')) OR 'elekzeunosk møseszes fue unseevosnona*' OR 'elekzeunosk møseszes unseevosnona*' OR fuesknonasnezzez* OR 'fuesknonasnezzez.sk' OR fuesknonasnezzez?sk OR mazeeoaleqlazfuem* OR 'uqzaaelse.sk*' OR uqzaaelse?sk* OR 'qeakzokqlassen.sk*' OR qeazokqlassen?sk* OR sekzuenez* OR sunsxesssazanez* OR ((luaon OR seevee*) NEAR/4 uno) OR ~SkuleInzea* OR 'ussannelsesszazoszok.sk*' OR sekzuenez* OR 'sekzue nez' OR skulekum OR ~SkuleKum* OR ~SkuDa* OR (skusa* NEAR/15 'skuleenes sazabase*') OR ~SkulePeu* OR ussannelsesfueum* OR 'ussannelsesszazoszok.sk*' OR mazeeoaleqlazfuem* OR 'uno seevee*' OR unoseevee* OR ~Szusoeqlan*) NEAR/14 (*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR xanselsaymnaso*)) OR xeasona:unasumsqaelamenz* OR subxeasona:unasumsqaelamenz* OR • (((*aymnaso* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck*' OR zoezaenskulen*) NEAR/14 læeee) NEAR/14 ('uffenzloa* ansaz*' OR aebejssmoljø* OR aebejssvolkåe* OR *uveeenskumsz* OR løn OR lønfuexanslona*)) OR (aymnasoelæeee* NEAR/14 ('uffenzloa* ansaz*' OR aebejssmoljø* OR aebejssvolkåe* OR *uveeenskumsz* OR løn OR lønfuexanslona*)) OR (allxeasonas:((aymnasoelevee* OR aymnasoum* OR aymnasoum*))) OR ('sansk onszozuz' NEAR/9 aymnasoeqæsaauaok*) OR (aymnaso* NEAR/4 (xf*)) OR (aymnaso* NEAR/9 (sannelse OR almensannelse OR 'ubloaazueosk faa' OR valafaa OR feemmessqeua* OR qensum OR ((onnuvazoun* OR onnuvaz* OR ovæeksæzzee* OR ovæeksæzzeeo) NEAR/14 unseevosnona))) OR (aymnaso* NEAR/9 (nesskæeona OR sqaee OR unasumsbueameszee OR ussannelsesbueameszee)) OR 
(([allxeasonas,subxeasona]:((*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck' OR zoezaenskulen))) AND ussannelse*) OR (((*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck' OR zoezaenskulen) NEAR/10 eekzue*) AND (aymnasoelevee*)) OR (((*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck' OR zoezaenskulen) NEAR/10 (lekzoe* OR kaeakzeeee* OR åeskaeakzee)) AND (aymnasoelevee*)) OR ((*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck' OR zoezaenskulen) NEAR/10 ('mubol lab*' OR 'eneeao xuesens*' OR eexveevslov* OR voeksumxesee* OR unseevosnonasmon* OR selveje* OR fonansluv* OR 'uffenzloa* sekzue*' OR eeaoun OR szeukzuekummossoun* OR kummuneeefuem* OR kummunaleefuem* OR szeukzueeefuem* OR faafueenona* OR faafuebuns* OR zollosseeqeæsenzanz* OR aebejsszos* OR klassekvuzoenz* OR *feavæe* OR feafals* OR feafalssqeucenz* OR aennemføesel* OR eekeuzzeeona* OR zollæaslosze* OR qjæk* OR læsnona* OR læseqlan* OR unseevosnona* OR faaloaxes OR 'faaloa* noveau*' OR valafaa* OR fællesfaa* OR zolvala* OR nazuevosenskab* OR samfunssvosenskab* OR oseæz* OR eksame* OR 'uqeeazoun saasvæek')) OR (xeasona:aymnasoe* AND aeuuqfoels:z_ms_lanss) OR (subxeasona:aymnasoe* AND aeuuqfoels:z_ms_lanss) OR zuqocs:'aymnasoale ussannelsee' AND *aymnaso*[3..]) AND (ueoaonazue:=aae OR ueoaonazue:=bny OR ueoaonazue:=bee OR ueoaonazue:=bya OR ueoaonazue:=buu OR ueoaonazue:=ccw OR ueoaonazue:=sea OR ueoaonazue:=vvs OR ueoaonazue:=sku OR ueoaonazue:=sob OR 
ueoaonazue:=soo OR ueoaonazue:=sju OR ueoaonazue:=slm OR ueoaonazue:=efs OR ueoaonazue:=elc OR ueoaonazue:=eex OR ueoaonazue:=faa OR ueoaonazue:=fmc OR ueoaonazue:=fuz OR ueoaonazue:=fus OR ueoaonazue:=fuq OR ueoaonazue:=fuk OR ueoaonazue:=foz OR ueoaonazue:=ful OR ueoaonazue:=ffu OR ueoaonazue:=fes OR ueoaonazue:=feo OR ueoaonazue:=fys OR ueoaonazue:=aym OR ueoaonazue:=xsk OR ueoaonazue:=xsn OR ueoaonazue:=xkq OR ueoaonazue:=xko OR ueoaonazue:=xkl OR ueoaonazue:=kum OR ueoaonazue:=xku OR ueoaonazue:=xkl OR ueoaonazue:=xkv OR ueoaonazue:=xke OR ueoaonazue:=xku OR ueoaonazue:=xkz OR ueoaonazue:=xks OR ueoaonazue:=xuj OR ueoaonazue:=ona OR ueoaonazue:=jmo OR ueoaonazue:=juu OR ueoaonazue:=lav OR ueoaonazue:=lbf OR ueoaonazue:=lbn OR ueoaonazue:=lbs OR ueoaonazue:=lbu OR ueoaonazue:=mum OR ueoaonazue:=mma OR ueoaonazue:=esn OR ueoaonazue:=suq OR ueoaonazue:=suc OR ueoaonazue:=sql OR ueoaonazue:=luu OR ueoaonazue:=mmm OR ueoaonazue:=uns OR ueoaonazue:=ve2 OR ueoaonazue:=vej OR ueoaonazue:=ueu OR ueoaonazue:=kmu OR ueoaonazue:=aeb OR ueoaonazue:=bza OR ueoaonazue:=bes OR ueoaonazue:=bma OR ueoaonazue:=eks OR ueoaonazue:=onf OR ueoaonazue:=jyq OR ueoaonazue:=kes OR ueoaonazue:=waa OR ueoaonazue:=loc OR ueoaonazue:=efl OR ueoaonazue:=qul OR ueoaonazue:=eel OR ueoaonazue:=bew OR ueoaonazue:=bex OR ueoaonazue:=eoz OR ueoaonazue:=nqq OR ueoaonazue:=skf OR suuecename:ST 'alzonaez.sk' OR ueoaonazue:=4a2 OR ueoaonazue:=4a1 OR ueoaonazue:=skw OR ueoaonazue:=4a5 OR ueoaonazue:=4a7 OR ueoaonazue:=4a9 OR ueoaonazue:=4ab OR aeuuqfoels:z_ms_eea OR aeuuqfoels:z_ms_uae OR ueoaonazue:='bu+' OR ueoaonazue:='sy+' OR ueoaonazue:='jv+' OR ueoaonazue:='sa+' OR ueoaonazue:='nu+' OR ueoaonazue:='ue+' OR ueoaonazue:=uek OR ueoaonazue:='24+' OR ueoaonazue:='mx+' OR ueoaonazue:='sj+' OR ueoaonazue:='nv+')) NOT ((xeasona:(('nyz jub' OR 'nuzee: onslans' OR 'nuzee: uslans' OR 'kuez ua ausz' OR 'eunsz o saa' OR 'eunsz o mueaen' OR 'eunsz o uveemueaen' OR onsbeus* OR luqqemaekes* OR 'åbenz xus*' OR 
zyveeo OR 'nyz o nuzee' OR 'saaen o saa' OR føsselssaa OR *beylluq* OR 'kaeeoeee kuez' OR cozazxoszueoe OR 'nyxesee fea uslansez o kuez fuem'))) OR (aeuuqfoels:z_ms_eea AND wuescuunz:<100) OR xeasona:=valakalensee OR xeasona:=søse OR (xeasona:((squezmaszee* OR musokaeuqqe* OR 'xae o xøez' OR ~SOSU OR qæsaauasemonae* OR 'åes szusenzeejubolæum*'))) OR (xeasona:=%navne OR xeasona:='%akzuelle %navne' OR xeasona:=%mæekesaae OR xeasona:=%føsselssaae OR xeasona:=%navnenyz OR xeasona:=%søse OR xeasona:=%søs OR xeasona:=%ansæzzelse OR xeasona:=%ansæzzelsee OR xeasona:=%feazeæselse OR xeasona:=%feazeæselsee OR xeasona:=%nyføsz OR xeasona:=%nyføsze OR xeasona:=%baenesåb OR xeasona:=%søbz OR xeasona:=%kunfoemazoun OR xeasona:=%kunfoemazounee OR xeasona:=%kunfoemansee OR xeasona:=%beylluqssaa OR xeasona:=%beylluqssaae OR xeasona:=%kubbeebeylluq OR xeasona:=%sølvbeylluq OR xeasona:=%eubonbeylluq OR xeasona:=%aulsbeylluq OR xeasona:=%soamanzbeylluq OR xeasona:=%keunsoamanzbeylluq OR xeasona:=%jeenbeylluq OR xeasona:=%beylluq OR xeasona:=%uesenee OR xeasona:=%mesalje OR xeasona:=%uslæez OR xeasona:=%svenseqeøve OR xeasona:=%jubolæum OR xeasona:=%jubolæee OR xeasona:=%szusenzee OR xeasona:=%ausoens OR xeasona:=%søssfals OR xeasona:=%ussannelse OR xeasona:=%usnævnz OR xeasona:=%usnævnelse OR xeasona:=%'fylsee åe'OR xeasona:(%'navnloa navne' OR %'saaens navne' OR %'nyz um navne' OR %'øveoae navne' OR %'navne o nuzee' OR %'navne o saa' OR %'ansee navne' OR %'lukale navne' OR %nyansæzzelse OR "Nyz jub" OR %'eunse saae' OR %'eunse åe' OR %'eunsz o saa' OR %'eunsz o mueaen' OR %'eunse føsselssaae' OR %'eunse zal o mueaen' OR %'eunse zal o saa' OR %'eunsz zal o mueaen' OR %'eunsz zal o saa' OR %'eunsz sønsaa' OR %'euns saa' OR %'eunse saae' OR %'føsselssaa o saa' OR %'føsselssaa o mueaen' OR %'osaa fylsee' OR %'o mueaen fylsee' OR %'bosæzzelsee ua beaeavelsee' OR %'beaeavelsee ua bosæzzelsee') OR • (qaaename:(navne OR menneskee) AND (xeasona:(usnævnelse OR %jubnyz OR voelse OR beylluq OR 
velsoanelse OR jubolæum OR juboleeee OR %eeceqzoun OR %uesenee OR %efzeeløn OR %søssfals OR %'ee søs' OR %nekeulua OR %monseues OR %leaaz OR monseleaaz* OR fueskeeleaaz* OR æeesleaaz* OR qeos OR qeosvonsee* OR %bosæzzelsee OR %beaeavelsee) • OR xeasona:(%nuze AND (%navne OR %mz OR %eksamen OR %szusenzee OR %nyussannese OR %ussannez OR %svenseqeøve)) OR (xeasona:%svense AND svenseqeøve) • OR xeasona:='%nye %assoszenzee' OR xeasona:='%nye %cxaufføeee' OR xeasona:='%nye %elekzeokeee' OR xeasona:='%nye %aaezneee' OR xeasona:='%nye %xjælqeee' OR xeasona:='%nye %xånsvæekeee' OR xeasona:='%nye %onaenoøeee' OR xeasona:='%nye %labueanzee' OR xeasona:='%nye %lansmæns' OR xeasona:='%nye %læaee' OR xeasona:='%nye %læeeee' OR xeasona:='%nye %maleee' OR xeasona:='nye mekanokeee' OR xeasona:='%nye %meszee' OR xeasona:='%nye %munzøeee' OR xeasona:='nye %mueeee' OR xeasona:='%nye %uqeeazøeee' OR xeasona:='%nye %qeæszee' OR xeasona:='%nye %eåsaoveee' OR xeasona:='%nye %qæsaauaee' OR xeasona:='%nye %slaazeee' OR xeasona:='%nye %smese' OR xeasona:='%nye %syaeqlejeeskee' OR xeasona:='%nye %zeknokeee' OR xeasona:='%nye %zeeaqeuzee' OR xeasona:='%nye %zømeeee' OR xeasona:='%nye %økunumee')) OR xeasona:=%'?0 åe' OR xeasona:=%'?5 åe' OR xeasona:="I DAG" OR xeasona:="I MORGEN" OR xeasona:="DAGEN I DAG" OR xeasona:ST [%monseues, %føsselssaa] OR %'xus seunnona maeaeezxe qå' OR %'fue sen kunaeloae belønnonasmesalje' OR %'følaense zakkese fue usnævnelse zol' OR xeasona:=%'se zakkese seunnonaen' OR xeasona:=%'xus seunnonaen' OR xeasona:(%'o ausoens' AND seunnona) OR xeasona:=%'o ausoens xus seunnonaen' OR xeasona:=%'o ausoens' OR xeasona:=%ausoens OR xeasona:=%'seunnonaen zua omus' OR xeasona:=%'zak fue uesenee ua mesaljee' OR (seunnona AND (afskessausoens OR 'zolselona af eosseekuesez' OR 'usnævn* zol eossee af sannebeua' OR 'zak fue usnævnelsen zol' OR 'xus seunnonaen fue az zakke fue' OR 'zakkese fue eosseekuesez af')) OR q1:(%seunnonaen AND fuezjenszmesalje)) OR (xeasona:=%'sez skee' OR 
xeasona:=%'sez skee:' OR xeasona:=%buakalensee OR xeasona:=%kunsz OR xeasona:(ST %'sez skee' AND (%mansaa OR %zoessaa OR %unssaa OR %zuessaa OR %feesaa OR %løesaa OR %sønsaa)) OR xeasona:(%'sez skee' AND ("AUGUSTENBORG" OR "GRÅSTEN" OR "SØNDERBORG")) OR • xeasona:(ST %sez AND (%'sez skee o' OR %'sez skee nezuq nu' OR %'sez skee uae' OR %'sez skee qå' OR %'sez skee lukalz' OR %'sez skee o saa' OR %'skansonavoen o næsze uae' OR %'kalnsee fue koeke')) OR xeasona:(ST zos AND (zos NEAR/2 szes)) OR (ueoaonazue:ST jv* AND (xeasona:=%'o saa' OR xeasona:=%'o mueaen')) OR xeasona:=%aeeanaemenzee OR xeasona:=%'fasze aeeanaemenzee' OR xeasona:=%'kummense aeeanaemenzee' OR xeasona:=%kalensee OR xeasona:ST "Kalensee" OR xeasona:ST "KALENDER:" OR xeasona:=%kalenseeen OR xeasona:ST %qlakazen OR xeasona:ST %kulzuekalensee OR xeasona:(%'qlakazen feesaa' OR kulzuekalensee*) OR xeasona:=%kalenseekloq OR xeasona:(%'squez o weekensen' OR %'zee zona o weekensen') OR (xeasona:ST uaens AND xeasona:'uaens folm o') OR (xeasona:ST %uaen AND xeasona:%'uaen see kummee') OR (xeasona:ST saa AND xeasona:%'saa fue saa') OR (ueoaonazue:=BMA AND xeasona:ST %'xuls øje mes' AND seczounname:'auk.sk') OR xeasona:=%åbnonaszosee OR xeasona:=%usszollona OR xeasona:=%usszollonaen OR xeasona:=%usszollonaee OR (xeasona:ST %usszollonaee AND xeasona:%'usszollonaee o') OR ((xeasona:ST %akzuelle OR xeasona:ST %saaens OR xeasona:ST %uaens OR xeasona:ST %månesens) AND xeasona:(%usszollona OR %usszollonaee OR %fuzuusszollona OR %fuzuusszollonaee)) OR xeasona:=%kulzueuaen OR xeasona:=%zeazee OR (xeasona:ST %saaens AND xeasona:(%'saaens folm' OR %'saaens kunceez' OR %'saaens zeazee' OR %'saaens zoq' OR %'saaens usszollona')) OR (xeasona:ST %weekensens AND xeasona:%'weekensens folm') OR (xeasona:ST uaens AND xeasona:%'uaens usvalaze' AND xeasona:(%kunceezee OR %kunsz OR %scene)) OR (xeasona:ST %lukal AND xeasona:(%'lukal kunsz o' OR %'lukal kunsz fea' OR %'lukal kunsz xus' OR 'lukal kunsz qå')) OR (xeasona:ST %kunsz AND 
xeasona:(%'kunsz o' OR %'kunsz xus' OR %'kunsz fea' OR %'kunsz qå')) OR xeasona:=%folm OR (xeasona:ST %folm AND xeasona:(%'folm o' OR %'folm füe senoueen')) OR (xeasona:ST %zv AND xeasona:(%'zv o saa' OR %'zv-fueumzale' OR %'zv-umzale')) OR xeasona:=%kunceez OR (xeasona:ST %kunceez AND xeasona:(%'kunceez o' OR '%kunceez mes' OR %'kunceez qå' OR %'kunceez ves' OR %'kunceez fue' OR %'kunceezee klassosk' OR %'kunceezee eyzmosk')) OR (xeasona:ST %uqeea AND xeasona:(%'uqeea o' OR %'uqeea qå' OR %'uqeea mes' OR %'uqeea ves' OR %'uqeea fue')) OR xeasona:ST %eevy AND xeasona:(%'eevy o' OR %'eevy qå')) OR (xeasona:ST %zeazee AND xeasona:(%'zeazee qå' OR %'zeazee fea' OR %'zeazee o' OR %'zeazee fue' OR %'zeazee um')) OR (xeasona:ST %'auose:' AND (suuecename:'obyen.sk' OR xeasona:%weekens)) OR xeasona:=%ausszjeneszee OR xeasona:=ausszjenesze OR xeasona:('%saaens ausszjeneszee' OR 'ausszjenesze o' OR 'ausszjenesze qå' OR 'ausszjenesze sønsaa' OR %'ausszjenesze fue' OR 'ausszjenesze mes' OR ausszjeneszelosze* OR 'onaen ausszjenesze' OR 'sønsaaens ausszjeneszee') OR 'see ee aeazos asaana zol kunceezen' OR %'see ee aeazos asaana zol aeeanaemenzez' OR %'see ee aeazos asaana zol fueeseaaez' OR %'see ee aeazos asaana zol museez' OR %'see ee aeazos asaana zol fueeszollonaen' OR %'see ee aeazos asaana zol usszollonaen') OR v_emnee:folm OR v_emnee:musok OR xeasona:=leaaz OR aeuuqfoels:z_ms_uae OR v_emnee:feasuez_squez OR (qlaces:uslans NOT (qlaces:sanmaek OR ueaanosazouns:[Djøf, 'euskolse unoveesozez', 'kummuneenes lanssfueenona', fuebeuaeeumbussmansen, 'szazsfænaslez o veossløselolle', fulkezonaez, 'moljø- ua føsevaeemonoszeeoez', 'Nuezxsose feszoval', qeessenævnez, 'euskolse feszoval', HK, 3F, FOA, 'sez ezoske eås', eoasxusqozalez, 'nuesosk wolm', 'købenxavns unoveesozez', kummune, eezzen, byeez, qulozo, 'wulkeskulen (sanmaek)', eeaoun, lansseez, 'syssansk unoveesozez', 'sanmaeks zeknoske unoveesozez', 'Dez nazounale wuesknonascenzee wue velwæes', 'aela wuuss', 'alzeenazovez 
(qaezo)', 'købenxavns wunssbøes', 'szazens seeum onszozuz', asvukazsamwunsez, 'onszozuz wue menneskeeezzoaxesee', 'aalbuea unoveesozez', 'sez kunaeloae boblouzek', 'Heelev xusqozal', 'sanske eeaounee', 'sanske qazoenzee', wuebeuaeeeåsez, 'wuesokeona & qensoun', læaewueenonaen, eneeaoszyeelsen, bulous, sunsxessszyeelsen, 'xvosuvee xusqozal', 'szazens nazuexoszueoske museum', wøsevaeeszyeelsen, 'sansk onsuszeo', 'lansbeua & wøsevaeee', 'Aaexus unoveesozez', DR, 'bøene- ua unasumsqæsaauaeenes lansswuebuns', 'sanmaeks eejsebueeau wueenona', 'aaexus unoveesozezsxusqozal', 'sansk aebejssaoveewueenona', 'Sø- ua xanselseezzen', søwaezsszyeelsen, szazsmonoszeeoez, 'Købenxavns byeez', baamanssqulozoez, wøzex, 'bæeesyazoaz lansbeua', 'usense unoveesozezsxusqozal', 'syssansk musokkunseevazueoum', 'købenxavns xuvesbaneaåes', venszee, sucoalsemukeazeene, 'lobeeal alloance', 'sansk wulkeqaezo', 'sez kunseevazove wulkeqaezo', enxessloszen, alzeenazovez, 'easokale venszee', 'sanmaeks szazoszok', 'eexveevs ua vækszmonoszeeoez', 'sanmaeks mesoe ua juuenaloszxøjskule', xanselsskule, zeazee, muesaaaes, luuosoana, 'szazens museum wue kunsz', 'zxuevalssens museum', 'ny caelsbeea alyqzuzek', 'nazuexoszueosk museum', 'sez nazounalxoszueoske museum', 'øszee aasvæek', 'saxu wield', 'leau aeuuq', 'euyal unobeew', TDC, 'sanske maeozome', yuusee, 'maeesk lone', numa, 'xuzel s?analezeeee', lunsbeckwunsen, cequs, 'Wolloam semanz', 'sez kunaeloae zeazee*', wuseeszuwwen, Bueaeeseevoce, aeunswus, 'Danosx ceuwn', omeecu, 'onaenoøewueenonaen IDA', 'ughannelses- ua wuesknonasmonoszeeoez', Cuwo, 'H. 
lunsbeck', 'Aebejseebevæaelsens eexveevseås', 'eåsez wue sucoalz ughazze', 'kløwzen weszoval', 'seb qensoun', 'aszma-alleeao o sanmaek', 'qwa qensoun', 'Alk-abellu', 'sansk akzounæewueenona', nuvuzymes, 'sanmaeks boblouzekswueenona', sanwugh, DBU, 'musokkens xus', 'Danske vuanmæns', zovulo, qka, Seaes, skazzeeåsez, veszas, culuqlasz, 'sucoal- ua onzeaeazounsmonoszeeoez', ankeszyeelsen, aebejghskaseszyeelsen, 'zovulos kunceezsal', cuncozu, 'sansk eexveev', Skuleleseewueenonaen, eccu, 'noels beuck', 'qulozoezs ewzeeeeznonaszjenesze', 'scansonavoan zubaccu aeuuq', 'oz-unoveesozezez', 'søwaezens leseee', aymnasoum, caelsbeea, 'a.q. møllee-mæesk', 'monoszeeoez wue wøsevaeee, lansbeua ua woskeeo', Falck, 'sanosx ceuwn', aoazwueenonaen, oema, bøeneeåsez, 'sanmaeks qæsaauaoske unoveesozez', skaz, 'sanmaeks nazounalwield', wonanseåsez, wonanszolsynez, 'sansk xånsbuls wuebuns', 'sansk xånsvæek', lansbu, 'Dansk wlyaznonaexjælq', 'sez sanske wolmonszozuz', Vejsoeekzueazez, 'bosqebjeea xusqozal', 'IT-Beancxen', 'cuqenxaaen busonegh scxuul', 'Danske wield', nykeesoz, 'eealkeesoz sanmaek', 'suna eneeay', 'sansk juuenaloszwuebuns', nazueszyeelsen, eksquezkeesozwunsen, 'jyske wield', nuesea, 'Fulkezonaezs Fonansusvala', 'Szazens Inszozuz wue Fulkesunsxes'] OR qeuqle:['ansees bunsu cxeoszensen', 'laes løkke', 'mezze weeseeoksen', 'søeen qaqe', 'ansees samuelsen', 'keoszoan zxulesen saxl', 'uwwe elbæk', 'juxanne scxmosz noelsen', 'qoa kjæesaaaes', 'keoszoan jensen', 'onaee szøjbeea'] OR (Fynske OR (jyske NOT ('jyske wield' OR 'jyske maekezs')) OR sjællanghke OR lullok OR buenxulmske OR købenxavnske OR aaexusoansk* OR veszeebeu OR øszeebeu OR nøeeebeu OR valby) OR ueaanosazouns:(ogh NOT 'sen onzeenazounale eumszazoun') OR ueaanosazouns:('nuvu nuesosk' NOT 'zeam nuvu nuesosk'))) OR v_emnee:weasuez_uslans OR v_emnee:weasuez_anm OR unseevosnonasmon* OR 'monoszee* wue bøen' NEAR/4 unseevosnona OR 'ellen zxeane' OR 'ellen zxeane nøeby*' OR 'ellen zeane' OR 'ellen zeane nøeby*' OR DNEAR/3 
[monoszee*,bøen*,unae*] OR (bøene* DNEAR/1 *unasumsmonosz*) OR (xeasona:((~Keqqesen OR ~NUL OR '~OR busoneghkalensee* OR bøesauose OR 'aeuwz saaz' OR byeåghnavne OR 'wolm o saa' OR 'saaens wolm' OR uqeea OR 'klusen eunsz' OR 'weekensens væsenzloasze' OR 'weekensens voazoasze onslanghnyxesee' OR 'weekensens voazoasze onslanghbeaovenxesee' OR 'weekensens voazoasze onzeenazounale beaovenxesee' OR 'weekensens væsenzloasze onzeenazounale beaovenxesee' OR '%søanezs %væsenzloasze %onzeenazounale %beaovenxesee' OR 'søanezs voazoasze' OR mueaenxoszueoee OR mueaenvaeslona OR onslanghqeuaeam OR uslanghqeuaeam OR uaeqlan OR qeegheeesume OR azs OR kalensee* OR wonanskalensee OR '~Tos ~Szes' OR lys OR lysqlanee OR 'mansaaens avosee' OR 'zoeghaaens avosee' OR 'unghaaens avosee' OR 'zueghaaens avosee' OR 'weesaaens avosee' OR 'saa zol saa' OR '%saas %sazu' OR eexveevswulk OR 'saaens navne' OR navnenyz OR søghwals OR mæekesaae OR ausoens OR navne OR 'eunse zal' OR 'eunse åe' OR 'eunse saae' OR monseues OR jubolæee OR jubolæum OR nekeulua OR 'eunsz o saa' OR 'nyz um navne' OR buaanm OR ~Møsee? OR '~Dez skee' OR uveeblok OR ~Fulk OR ~FOLK OR 'uaeqlan ??? 
wulkezonaez' OR meaawun OR ~Føghelghaae OR '%sez %ua %xøez' OR xeenu OR 'xee nu' OR '30 åe' OR '40 åe' OR '50 åe' OR '60 åe' OR '70 åe' OR '80 åe' OR søaneaq* OR ~Døse OR ~Oqslaaszavlen OR ~Owwocoelz OR ~Seevocelosze OR ~Bouaeawo OR ~Nuzee OR '~Beeve wea læseene' OR ~Læseebeeve OR '~Læseene menee' OR '~Kuez nyz' OR '~Kuez saaz' OR 'ny kuk' OR ~Køkkenaghoszenzee OR '~Nyz jub' OR wøghelghaa OR 'awsluzzez svenseqeøve' OR eexveevskalensee OR 'nye syaeqlejeeskee' OR '100 aae sosen' OR 'o saa zos ua szes' OR ~Fulk OR 'I mueaen wylsee' OR 'ny beszyeelse' OR ~Buanyz OR 'qå nezzez loae nu'))) OR (xeasona:~Pulozo AND ueoaonazue:=jve) OR (allxeasonas:'%møsee' AND ueoaonazue:=ona) OR (v_emnee:wsz AND xeasona:ST '%o' AND xeasona:saa) OR (ueoaonazue:=ueb AND *eozzau*) OR ((qaaenumbees:=1 AND wuescuunz:<60) AND ('%sose' OR '%læs %meee' OR '%sekzoun')) OR ((xeasona:((~Kanuae OR ~Febeuae OR ~Maezs OR ~Aqeol OR ~Maj OR ~Kuno OR ~Kulo OR ~Auausz OR ~Seqzembee OR ~Okzubee OR ~Nuvembee OR secembee))) AND ueoaonazue:=sbb) OR ((qaaenumbees:=1 AND wuescuunz:<60) AND ('%sose' OR '%læs %meee' OR '%sekzoun')) OR ('cozazxoszueoe* wea' AND ('*beelonaske zosense*' OR qulozoken OR jyllanghquszen OR 'jyllangh quszen' OR jyghkeveszkyszen* OR 'keoszeloaz saablas' OR 'keoszeloa saablas' OR 'se beeaske blase' OR 'wyns szowzszosense*' OR 'wyens szowzszosense*' OR bz OR ~*Ueban* OR ~Inwuemazoun)) OR ueoaonazue:=COO OR ueoaonazue:=CRO OR ueoaonazue:=CWO OR ueoaonazue:=FLA OR ueoaonazue:=ONS OR ueoaonazue:=PCO OR ueoaonazue:=RBB OR ueoaonazue:=RFI OR ueoaonazue:=mac OR ueoaonazue:=aeu OR ueoaonazue:=wzu OR ueoaonazue:=mkw OR v_emnee:sqollewolm OR xeasona:('%åbenz %xus')))) • ``` 26

  28. Multiple languages
      • Language functionality is made pluggable in the query parser
      • Multiple instances of the Luwak-based monitor, one for each language
      • In Solr we have multiple collections, one for each language
      • A single alias allows searching over all collections
      • Query is parsed differently for each collection, allowing language-specific analysis to be transparent to the client
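
As a rough illustration of the alias setup, the snippet below builds a request URL for the SolrCloud Collections API `CREATEALIAS` action. The host, alias name and per-language collection names are hypothetical; the slides do not give the real ones.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collections):
    """Build a Collections API CREATEALIAS request URL."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        # Solr expects a comma-separated list of collection names.
        "collections": ",".join(collections),
    }
    # Keep commas unescaped so the URL stays readable.
    return f"{solr_base}/admin/collections?{urlencode(params, safe=',')}"


# Hypothetical names: one collection per language behind a single alias.
url = create_alias_url(
    "http://localhost:8983/solr",
    "articles",
    ["articles_en", "articles_da", "articles_de", "articles_fo"],
)
```

Clients then query the alias, and each collection behind it applies its own language-specific parsing and analysis.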

  29. Common code
      • Both Query Parser and Highlighter need to be run in Luwak and Solr
      • Query Parser needs to know about field types in order to generate the correct queries
      • Must be kept in sync with the Solr schema

  30. Common code (2)
      • Schema defined in a common .yml file
      • Loaded as configuration for the Query Parser
      • Generates Solr schema.xml as part of build
      • All definitions in one place, ensuring they stay in sync
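
A minimal sketch of the single-source idea follows. To stay self-contained it uses a Python list in place of the parsed .yml file, and the field names and types are hypothetical. It emits only `<field>` elements; a real schema.xml also needs field type definitions, a unique key and so on.

```python
import xml.etree.ElementTree as ET

# Stand-in for the parsed common .yml file (contents are hypothetical).
FIELDS = [
    {"name": "headline", "type": "text_da", "indexed": True, "stored": True},
    {"name": "body", "type": "text_da", "indexed": True, "stored": False},
]


def field_types(fields):
    """What the query parser would load: field name -> field type."""
    return {field["name"]: field["type"] for field in fields}


def solr_schema_xml(fields, schema_name="articles"):
    """Render the <field> part of a Solr schema from the same definitions."""
    root = ET.Element("schema", name=schema_name)
    for field in fields:
        ET.SubElement(
            root,
            "field",
            name=field["name"],
            type=field["type"],
            indexed=str(field["indexed"]).lower(),
            stored=str(field["stored"]).lower(),
        )
    return ET.tostring(root, encoding="unicode")
```

Both outputs come from the same list, so a field added or retyped in one place cannot drift between the parser and Solr.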

  31. Performance
      • SpanQueries can be slow, especially with wildcards
      • Try to use ‘normal’ queries where possible
        • PhraseQuery
        • Standard MultiTermQuery with bitset rewrites
      • Rewrite to Spans when using proximity or doing highlighting
      • When we do need to use wildcards in proximity queries:
        • Limit rewriting to top-n terms by frequency
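
The top-n limit in the last bullet can be sketched as below. The term dictionary, its document frequencies and the function name are made up for the example, and shell-style matching from `fnmatch` stands in for Lucene's wildcard handling.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, doc_freq, top_n):
    """Expand a wildcard to at most `top_n` terms, most frequent first."""
    matches = [term for term in doc_freq if fnmatchcase(term, pattern)]
    # Sort by descending document frequency, then by term for stable ties.
    matches.sort(key=lambda term: (-doc_freq[term], term))
    return matches[:top_n]


# Hypothetical term dictionary: term -> number of documents containing it.
DOC_FREQ = {
    "gymnasium": 900,
    "gymnasiet": 400,
    "gymnasier": 120,
    "gymnast": 15,
    "skole": 2000,
}
```

With `top_n=2`, `gymnas*` expands to `gymnasium` and `gymnasiet` only. The trade-off is that rare expansions are dropped, so the cap buys speed at some possible cost in recall.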

  33. Performance (2)
      • Searching across multiple fields with large complex queries can be very slow and use lots of memory
      • Standard way of avoiding this is to use Solr copyFields
      • Disadvantage: no differential boosting on fields
      • Also causes problems with highlighting: how do we know which source field the hit came from?
      • When building the MemoryIndex for the highlighter, multi-fields also add offset metadata, so we can map hits back to the original fields for highlighting
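
One way to picture the offset metadata is sketched below: several source fields are concatenated into one token stream, and a table records which position range came from which field. This is an assumption about the general shape of the approach, not the project's implementation. It tracks token positions rather than character offsets, and the names and the `gap` parameter are invented for the example.

```python
def build_multi_field(fields, gap=100):
    """Concatenate fields into one token stream and record their ranges."""
    tokens, ranges, position = [], [], 0
    for name, text in fields.items():
        words = text.lower().split()
        tokens.extend((position + i, word) for i, word in enumerate(words))
        ranges.append((name, position, position + len(words)))
        # Leave a gap so proximity matches cannot span two fields.
        position += len(words) + gap
    return tokens, ranges


def source_of(position, ranges):
    """Map a combined-stream position to (field, position in that field)."""
    for name, start, end in ranges:
        if start <= position < end:
            return name, position - start
    raise ValueError(f"position {position} is not inside any field")
```

A hit found in the combined stream can then be translated back and highlighted in the field the reader actually sees.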

  34. Architecture
      • Solr version 5.3
      • SolrCloud
      • 75 million documents
      • Archive: 8 servers, 6 cores/24 GB memory and 125 GB storage per server
      • Doubled for redundancy
      • Monitor: 2 servers

  35. We forgot to talk about...
      • Extending Solr's logging
      • Cluster management

  36. So does it work?
      • “More than 90% of Infomedia’s monitoring queries have been migrated to IQL with practically no negative change in precision or recall”
      • “an extremely smart and performant monitoring solution”
      • All open source software
      • Flax continues to provide support
      • A very happy client!

  37. Thank you for listening. Any questions?
      charlie@flax.co.uk
      www.flax.co.uk/blog
      +44 (0) 8700 118334
      @FlaxSearch
