
Seznam.cz

Seznam.cz. The number one Czech Internet company. Jiří Materna, Head of Research. What is Seznam.cz. Internet portal with dozens of high-quality services: Web search (web search engine successfully competing with Google), Specialized search (Czech companies, e-shops)


Presentation Transcript


  1. Seznam.cz: the number one Czech Internet company. Jiří Materna, Head of Research

  2. What is Seznam.cz
     Internet portal with dozens of high-quality services:
     - Web search (web search engine successfully competing with Google)
     - Specialized search (Czech companies, e-shops)
     - E-mail (the most popular free e-mail in the Czech market)
     - News (covering business, politics, lifestyle, sport, weather, TV schedules, etc.)
     - Entertainment (online video and music streaming)
     - Online maps (more detailed than Google Maps)
     - Sklik.cz (advertising system)
     - And others…
     @JiriMaterna

  3. Seznam.cz in numbers
     - More than 1,000 employees
     - Revenue: 2.8 billion CZK (108 million EUR)
     - 2.4 million people visit Seznam.cz every day
     - 1.5 billion crawled web pages:
       - 45 % English
       - 37 % Czech
       - 7.7 % Slovak
       - 2.3 % German
       - 8 % others
     - 500 queries per second in peak hours

  4. Search engine architecture

  5. Query Expander
     - Query understanding
     - Graph representation:
       - AND
       - OR
       - optional
       - other relations
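To make the graph representation on slide 5 more concrete, here is a minimal Python sketch of an expanded query built from AND, OR, and optional nodes and checked against a document's set of terms. The slides do not describe the actual data structure, so the class names (`Term`, `And`, `Or`, `Opt`), the `matches` function, and the example query are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


@dataclass(frozen=True)
class Term:
    word: str


@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Opt:
    # Hypothetical "optional" node: never blocks a match.
    child: "Node"


Node = Union[Term, And, Or, Opt]


def matches(node: Node, doc_terms: Set[str]) -> bool:
    """Return True if the document's terms satisfy the query graph."""
    if isinstance(node, Term):
        return node.word in doc_terms
    if isinstance(node, And):
        return all(matches(child, doc_terms) for child in node.children)
    if isinstance(node, Or):
        return any(matches(child, doc_terms) for child in node.children)
    if isinstance(node, Opt):
        return True  # optional parts may be absent
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Assumed expansion of the query "cheap hotels prague":
# a synonym is OR-ed in and the location is treated as optional.
query = And((
    Or((Term("cheap"), Term("inexpensive"))),
    Term("hotels"),
    Opt(Term("prague")),
))
```

In this sketch an optional node only affects matching; a real system would presumably also use it as a ranking signal, which the slides do not cover.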

  6. Search aggregators
     - Deduplication
     - Document sub-results
     - SERP restrictions
     - Caching

  7. Ranking
     - RC-Rank – boosted oblivious regression trees
     - Hundreds of features
     - Our own quality measure
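As background for slide 7: an oblivious tree applies the same (feature, threshold) test to every node on a given level, so a document's leaf is simply the bit pattern of the test outcomes. The sketch below shows how such trees could be evaluated and summed in a boosted ensemble. It is not RC-Rank itself; the class `ObliviousTree`, the function `ensemble_score`, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


class ObliviousTree:
    """A regression tree with one (feature index, threshold) test per level."""

    def __init__(self, splits: List[Tuple[int, float]], leaf_values: List[float]):
        if len(leaf_values) != 2 ** len(splits):
            raise ValueError("need exactly 2**depth leaf values")
        self.splits = splits
        self.leaf_values = leaf_values

    def predict(self, features: Sequence[float]) -> float:
        index = 0
        for feature_id, threshold in self.splits:
            # Each level contributes one bit to the leaf index.
            index = (index << 1) | int(features[feature_id] > threshold)
        return self.leaf_values[index]


def ensemble_score(trees: List[ObliviousTree], features: Sequence[float]) -> float:
    """Boosted ensemble: the score is the sum of the individual tree outputs."""
    return sum(tree.predict(features) for tree in trees)


# Two toy trees over two made-up features.
tree_a = ObliviousTree([(0, 0.5), (1, 10.0)], [0.0, 1.0, 2.0, 3.0])
tree_b = ObliviousTree([(1, 5.0)], [-0.5, 0.5])
score = ensemble_score([tree_a, tree_b], [0.7, 3.0])
```

Because every level shares one test, evaluation needs no pointer chasing, which is a common reason to choose oblivious trees when scoring many documents per query.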

  8. Index & Indexer
     - Indexing: complete, daily, fresh
     - Data structures:
       - word barrel – stores the inverted index
       - document barrel – stores document features
       - title barrel – stores processed web page content and metadata
       - others – query site barrel, site barrel, link barrel, qds barrel, query url barrel, …
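The word barrel on slide 8 is described as storing the inverted index. As a rough illustration of that idea only, the following Python sketch builds a toy inverted index (word to sorted list of document IDs) and answers an all-words query by intersecting posting lists. The function names, the whitespace tokenizer, and the sample documents are assumptions, not Seznam.cz's actual barrel format.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each lowercased word to the sorted IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Merge-style intersection of two sorted posting lists."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def search_all(index: Dict[str, List[int]], words: List[str]) -> List[int]:
    """Return IDs of documents that contain every query word."""
    if not words:
        return []
    result = index.get(words[0].lower(), [])
    for word in words[1:]:
        result = intersect(result, index.get(word.lower(), []))
    return result


docs = {
    1: "Prague hotels and maps",
    2: "cheap hotels in Brno",
    3: "Prague weather and cheap hotels",
}
index = build_inverted_index(docs)
```

For example, `search_all(index, ["cheap", "hotels"])` walks the two posting lists once each, which is why keeping them sorted matters.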

  9. Downloader & document database
     - Hadoop, Giraph, YARN
     - 50 million documents downloaded every day
     - 1.5 billion documents stored out of 50 billion known documents
     - Duplicate detection
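Slide 9 mentions duplicate detection without saying how it is done. One simple approach, shown here purely as an assumption, is to fingerprint the normalized page text and keep only the first document seen for each fingerprint. This catches exact duplicates after whitespace and case normalization, not near-duplicates. The function names and sample URLs are hypothetical.

```python
import hashlib
import re
from typing import Iterable, List, Tuple


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_duplicates(docs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return URLs of the first document seen for each distinct fingerprint."""
    seen = set()
    unique_urls = []
    for url, text in docs:
        fp = fingerprint(text)
        if fp not in seen:
            seen.add(fp)
            unique_urls.append(url)
    return unique_urls


pages = [
    ("http://example.com/a", "Hello   World"),
    ("http://example.com/b", "hello world\n"),
    ("http://example.com/c", "Something else"),
]
unique = drop_duplicates(pages)
```

With tens of billions of known documents, the set of fingerprints would have to be distributed across machines rather than held in one in-memory `set` as it is here.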

  10. Possible models of cooperation
      - Joint projects
      - Providing our technology
      - Sharing data (MetaCentrum)
      - …

  11. Jiří Materna, Head of Research, jiri.materna@firma.seznam.cz. Thank you for your attention. @JiriMaterna
