Modern search box with suggestions for most popular CMSs and all other (non-CMS-based) sites. SaaS.
Modern web technologies become more complex day by day. Site owners outsource more and more features to web SaaS providers.
Typical site:
• Forum: http://getsatisfaction.com
• Communities: www.disqus.com
• Social login: http://www.janrain.com
• Chat: http://www.liveperson.com
• Email newsletters: http://mailchimp.com/
What’s next to move to SaaS?
What other common site features cause difficulties for site owners? Which of them can be outsourced and used as SaaS?
We interviewed many friends and met people who run online communities, e-commerce sites and blogs.
Their answers vary significantly, from recommendation systems to spam control, but the most common answer is site search.
Site owners want to improve the site search experience!
What needs to be improved in the search text input box?
Compare Facebook with any osCommerce site.
• Facebook: fast and relevant suggestions while typing (apps, pages, people with photos).
• osCommerce-based site: no suggestions, poor results… You probably don’t use this kind of old search box anymore.
The main idea
Now imagine: you can add a modern search box with suggestions to your site with a single line of JavaScript.
The main idea: details
• The modern search box searches across different fields while the user is typing. Available fields: categories, product titles, articles etc. The results are automatically split by field.
• The user can:
  • get more results from any field,
  • or click on a found product to go directly to the corresponding site page.
• The goal is to avoid leading the user to the standard CMS search results page and to provide highly relevant results while typing.
• This should improve search box performance, user experience and, finally, sales and other online business metrics.
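The "results split by field" behavior can be illustrated with a minimal sketch. It is not the service's implementation: the sample records, the `suggest` function and the `per_field` limit are all assumptions made for the example, and a real deployment would query the crawled index instead of an in-memory list.

```python
from collections import OrderedDict

# Hypothetical sample records; a real service would read these from its index.
RECORDS = [
    {"field": "categories", "text": "Planes", "url": "/planes"},
    {"field": "product titles", "text": "F16 RC Plane", "url": "/products/f16-rc-plane"},
    {"field": "product titles", "text": "Plane glue", "url": "/products/plane-glue"},
    {"field": "articles", "text": "How to pick an RC plane", "url": "/articles/pick-rc-plane"},
    {"field": "product titles", "text": "Toy car", "url": "/products/toy-car"},
]


def suggest(query, records, per_field=3):
    """Return matches for the text typed so far, grouped by field."""
    needle = query.strip().lower()
    groups = OrderedDict()
    if not needle:
        return groups
    for record in records:
        words = record["text"].lower().split()
        # A record matches when any of its words starts with the typed text.
        if any(word.startswith(needle) for word in words):
            bucket = groups.setdefault(record["field"], {"items": [], "more": 0})
            if len(bucket["items"]) < per_field:
                bucket["items"].append(record)
            else:
                # Counted so the UI can offer "more results from this field".
                bucket["more"] += 1
    return groups
```

Calling `suggest("pla", RECORDS, per_field=1)` groups the hits under categories, product titles and articles, and reports one extra product title that did not fit into the suggestion box.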
A lot of sites need this
• Let’s take a look at forum engines: it is difficult to find relevant results there. Imagine you can add a modern full-text search box with suggestions that searches by
  • Topic title
  • Topic text
  • Comments
• and provides results sorted by comment count. No additional software needs to be installed.
A lot of sites need this
Even a simple full-text blog search can be done better with the modern search box.
From the site owner’s point of view:
• The site owner enters the site URL into the index using the administrator interface of the SaaS search box: www.my-planes-shop.com
• The site owner gets a line of JS code to include in the page of the site that contains the search box: <script>var searchElementID = "srch_box" ….</script>
Behind the scenes: steps 1-3
1. Get the main page (example: www.my-planes-shop.com). After we get the site URL, we download and parse its main page.
2. Detect the CMS (example: “This is OpenCart”). Almost all CMSs (OpenCart, Joomla, WordPress etc.) have fingerprints such as class names, IDs and script names that identify them unambiguously.
3. Detect fields (title, comments, text, image...). Example of fields detected: product name, product description, product image. After we have detected the CMS, we detect the fields available on its pages. For example, we know that in OpenCart the product title is marked with the first H1 tag and the product image is marked with <img id="product_img">. Even if we can’t do this automatically, the site owner can “teach” our system how to parse fields, or add special marks to the HTML code, such as HTML5 Microdata or other semantic markup.
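Step 2 can be sketched as a simple pattern match over the downloaded main page. The fingerprint list below is illustrative only and is an assumption of this example, not the project's actual rule set; a production list would be larger and checked against real installations of each CMS. The function name `detect_cms` is also hypothetical.

```python
import re

# Illustrative fingerprints only (assumed for this sketch, not an official list).
CMS_FINGERPRINTS = {
    "WordPress": [r"/wp-content/", r"/wp-includes/"],
    "Joomla": [r'<meta name="generator" content="Joomla', r"/media/jui/"],
    "OpenCart": [r"catalog/view/theme/", r"index\.php\?route="],
}


def detect_cms(html):
    """Return the CMS whose fingerprints match the page most often, or None."""
    scores = {}
    for cms, patterns in CMS_FINGERPRINTS.items():
        # Count how many of this CMS's patterns occur anywhere in the page.
        scores[cms] = sum(1 for p in patterns if re.search(p, html, re.IGNORECASE))
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Scoring by the number of matched patterns, rather than stopping at the first hit, makes the guess a little more robust when a page happens to contain a path that resembles another CMS.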
Behind the scenes: steps 4-6
4. Crawl the site and extract the fields. We crawl all the pages of the site, extract the fields and save them. Example: 10963 pages contain product name, product description and product image.
5. Make the index. We produce a full-text index and rank the results. Example: the word “plane” is found 913 times in product name and 2301 times in product description.
6. Ready to serve requests. The customer’s search box is ready and provides results. Example: found “plane” in products.
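The per-field counts in step 5 come from a field-aware full-text index. The deck names Lucene for this job; the sketch below is only a stand-in in plain Python that shows the idea, with hypothetical helper names (`tokenize`, `build_index`, `field_counts`) and made-up page data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map term -> field -> {url: occurrences} for the extracted fields."""
    index = defaultdict(lambda: defaultdict(dict))
    for url, fields in pages.items():
        for field, text in fields.items():
            for term in tokenize(text):
                postings = index[term][field]
                postings[url] = postings.get(url, 0) + 1
    return index


def field_counts(index, term):
    """Count how many times the term occurs in each field across the site."""
    per_field = index.get(term.lower(), {})
    return {field: sum(postings.values()) for field, postings in per_field.items()}
```

For two sample pages where "plane" appears once in a product name and three times across descriptions, `field_counts(index, "plane")` returns `{"name": 1, "description": 3}`, which is the same kind of summary as the "913 times in product name" figure on the slide.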
Main trick
The main technical trick is to extract fields from the web pages of CMSs (and other sites). It can be done by parsing the HTML for marks such as DOM element classes and IDs:
<div class="article">In mathematics, a plane is a flat…</div>
If you can’t find any marks, you parse by the element’s position in the DOM: “to get the article text, find <div> No. 5 in the <body> section.”
This works nicely, but if the CMS changes its HTML markup, everything breaks and needs to be fixed.
However, HTML5 brings us semantic markup and Microdata, and Google says it will use them in its search algorithms, so extracting structured data from web pages should get easier in the future.
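A minimal sketch of mark-based extraction, using only Python's standard `html.parser`. The deck names Apache Tika or jSoup for parsing, so this is a substitute chosen to keep the example self-contained. The class `FieldExtractor`, the function `extract_fields` and the default marks (`article`, `product_img`, first `<h1>`) are assumptions taken from the examples on the slides; the positional fallback ("<div> No. 5") is not covered here.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class FieldExtractor(HTMLParser):
    """Collect the first <h1> text, the text of a marked <div>, and an image URL."""

    def __init__(self, article_class="article", image_id="product_img"):
        super().__init__()
        self.article_class = article_class
        self.image_id = image_id
        self.fields = {"title": "", "text": "", "image": None}
        self._capture = None      # name of the field currently being captured
        self._depth = 0           # nesting depth inside the captured element
        self._title_done = False  # only the first <h1> counts as the title

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("id") == self.image_id:
            self.fields["image"] = attrs.get("src")
        if tag in VOID_TAGS:
            return
        if self._capture:
            self._depth += 1
        elif tag == "h1" and not self._title_done:
            self._capture, self._depth = "title", 1
        elif tag == "div" and self.article_class in (attrs.get("class") or "").split():
            self._capture, self._depth = "text", 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._capture:
            return
        self._depth -= 1
        if self._depth == 0:
            if self._capture == "title":
                self._title_done = True
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self.fields[self._capture] += data


def extract_fields(html):
    """Parse one page and return its title, article text and image URL."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return {k: v.strip() if isinstance(v, str) else v for k, v in parser.fields.items()}
```

Because the extractor keys on a class name and an ID, it fails in exactly the way the slide warns about: if the CMS theme renames those marks, the fields come back empty and the rules need to be updated.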
More features of the SaaS search box
• Indicate the site URL pattern to index. For example, you want to search only the pages under www.mysite.com/products.
• Indicate the fields that will be presented in search results. Search only in product title and description. Matches in titles will be presented first.
• Rank by parameters. Forum topics with a higher comment count are ranked higher.
• Manually customized ranking. For the search term “Plane”, show the “F16 RC Plane sale 50% off” page.
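The three ranking rules above (title matches first, higher comment count first, manually pinned pages on top) can be combined in one sort key. This is a sketch under assumptions: the `rank` function, the `pinned` mapping from search term to URL, and the result dictionaries are hypothetical and only illustrate the ordering logic.

```python
def rank(results, query, pinned=None):
    """Order results: pinned page, then title matches, then by comment count."""
    pinned = pinned or {}
    q = query.lower()
    pinned_url = pinned.get(q)

    def sort_key(item):
        is_pinned = item["url"] == pinned_url
        in_title = q in item["title"].lower()
        # False sorts before True, so the flags are negated;
        # the comment count is negated so that larger counts come first.
        return (not is_pinned, not in_title, -item.get("comments", 0))

    return sorted(results, key=sort_key)
```

With `pinned={"plane": "/sale"}`, a search for "Plane" puts the sale page first, then topics with "plane" in the title ordered by comment count, and finally pages that only match outside the title.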
More features of the SaaS search box
• User behavior rank system. Products that attract more clicks while searching are shown higher in search results.
• Provide synonyms. Plane = airplane = jet. Search results will be the same.
• Construct the search box suggestion view. Search results will be presented with product image, product rating and title.
• Default search results. The most popular products are shown right after the user clicks in the search box (before any letter is entered).
• Reports and statistics. The most searched word is “Plane”; the top clicked product is “F16 RC plane”.
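Synonyms and click-based ranking can be sketched together. The synonym group is the one from the slide; everything else (the `expand` and `search` functions, the product list and the click counts) is assumed for illustration and does not describe how the service stores its data.

```python
SYNONYMS = [{"plane", "airplane", "jet"}]  # example group from the slide


def expand(term, groups=SYNONYMS):
    """Return the term together with every synonym in its group."""
    term = term.lower()
    for group in groups:
        if term in group:
            return set(group)
    return {term}


def search(term, products, clicks):
    """Match any synonym in the title; products with more clicks come first."""
    terms = expand(term)
    hits = [p for p in products if terms & set(p["title"].lower().split())]
    return sorted(hits, key=lambda p: -clicks.get(p["url"], 0))
```

Since every member of a group expands to the same set, `search("plane", ...)` and `search("airplane", ...)` return identical lists, which matches the "search results will be the same" requirement.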
Site search competitors
• Google Custom Search (now: Enterprise Google Site Search), http://www.google.com/cse/
  Pluses
  + It’s Google
  + Nice spell-check, synonyms
  + Nice search landing page customization
  + Search and rank by fields (attributes)
  Minuses
  - Very simple suggestions
  - $5 per 1000 queries
  - If you want to customize, it becomes much more complex and programmer-oriented
  - No CMS support by default
• Google Commerce Search, http://www.google.com/commercesearch/
  Pluses
  + Has most of the functionality described in this presentation
  Minuses
  - Only for e-commerce
  - Complex integration
  - Data provided manually or via API
  - Large enterprise-oriented (can be purchased via Google sales force)
• Celebros, http://www.celebros.com/
  + Suggestions
  + Complete search solution (not only search box, but also search landing page)
  - Only e-commerce
  - Only enterprise
  - Hosted SaaS is just an option. Complete software integration is preferable.
  - A lot of products for e-commerce (no focus on search)
  - Looks like manual data upload or DB API integration is required
Site search competitors
Some more
• GetWebSearch, http://getwebsitesearch.com
  Pluses
  + Nice-looking site
  + Easy integration
  + Easy demo on your site
  + Clear pricing model ($59 monthly basic, $499 enterprise)
  Minuses
  - No search box suggestions at all
  - Very poor search options
• Sli-systems, http://www.sli-systems.com
  Pluses
  + Nice suggestion box
  Minuses
  - No focus on search (a lot of SEO and other products)
  - Integration via salesman
  - Enterprise-oriented
  - Integration process not described (looks like manual data upload, or DB, API integration is required)
And some more
• https://www.indexdepot.com/en/: looks nice, but its suggestions are too simple
• http://www.websolr.com/ and http://www.solrhq.com/: both focus on hosted Solr
• http://www.isys-search.com: technology
• http://www.searchfit.com: for e-commerce
• http://www.cxense.com: technology
Why we are better
• Easy integration. No APIs or DB connectors, no cron jobs to update the index, no Lucene maintenance etc. Your site alone is enough.
• CMS support. We know how to parse your CMS, so there is next to nothing to tune.
• Mobile ready. A search box for mobile versions of the site works out of the box.
• Complete focus on search box suggestions for the web. By staying focused on our niche, we can implement useful features faster and better than competitors with wide product lines.
• Not only e-commerce. Competitors focus on e-commerce now, but there is much potential in forums, helpdesks, CMSs, knowledge bases etc.
• Clear SMB-oriented pricing model. Sales per site with clear and understandable prices, avoiding enterprise-style sales.
Is it extremely complex to develop? The answer is NO.
At SMB scale (tens of thousands of pages) searching is relatively simple, and Google can’t use all the power of its ranking algorithms and scalability.
We use Apache’s stack of search technologies:
• Apache Lucene: full-text search, ranking, snippets, highlighting.
• Apache Nutch: fast crawler.
• Apache Solr: suggestions, cache, replication, sharding.
• Apache Tika or jsoup (a non-Apache library): HTML parsing.
As you can see, this is a highly technological project that uses a lot of open source, but in fact it involves no extremely difficult algorithms or low-level system coding.
Does anybody need the “SaaS Search Box” if Solr can be used directly?
• Integration. It usually takes about 2-3 weeks to research, implement, test and deploy Solr.
• Maintenance. You need to think about constant index updates, incremental updates from the DB and so on.
• Customization for your business. Any customization, such as adding new synonyms, brings you to the command line and configuration files.
• Business features. Solr doesn’t provide the reports and tools your business needs, such as popular searches, promoting products for these searches, and targeting.
• Shared web hosting. A lot of SMBs are still on shared web hosting plans; they simply don’t have root access to the server to install Solr.
• CMSs. A lot of SMBs and enterprises use a CMS and do not develop any custom features.
A simple Solr integration can cost about $3,000 (programmer’s time), and maintenance can cost about $300 monthly (server maintenance plus index update management).
Business
• The search box is a paid solution.
• We can definitely do it for under $5 per 1000 queries (Google’s price).
• The pricing is completely clear: pay as users click on the search results.
  • Up to 100 clicks per day: FREE (for sites under 1000 pages, crawled once every two days)
  • 100-1000 clicks per day: $9.95
  • 1000-5000 clicks per day: $19.95
• Alternative business model: the simple search box is free, and advanced features are paid. Paid features can be:
  • Customize search fields
  • Customize results view
  • Customize URL patterns to search
• Alternative business model 2: the search box is free, and an advertisement block is shown in the suggestion area.
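The click-based tiers can be written down as a small lookup. Several details are not stated on the slide and are assumptions of this sketch: the billing period, that each tier includes its upper bound, and that a site with 1000 or more pages and low traffic starts at the first paid tier. The names `TIERS`, `FREE_TIER_MAX_PAGES` and `price_for` are hypothetical.

```python
# (upper bound of clicks per day, price in USD) as listed on the slide.
TIERS = [(100, 0.0), (1000, 9.95), (5000, 19.95)]
FREE_TIER_MAX_PAGES = 1000  # the free tier is for sites under 1000 pages


def price_for(clicks_per_day, pages):
    """Return the listed price for a site, or None above the listed tiers."""
    for upper, price in TIERS:
        if clicks_per_day <= upper:
            if price == 0.0 and pages >= FREE_TIER_MAX_PAGES:
                # Assumption: larger sites skip the free tier.
                continue
            return price
    return None  # more than 5000 clicks per day is not priced on the slide
```

For example, `price_for(50, 500)` falls into the free tier, while `price_for(50, 2000)` is charged at the first paid tier under the assumption above.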
Marketing
• Contact owners of CMS-powered sites
  • Via CMS communities
  • Directly via email
  • Via advertisement on official CMS sites
  • Via official CMS plug-ins and extensions
• Provide a “Search Powered By …” link in the search box
• Publications and advertisement on webmaster portals (communities, blogs etc.)
• Standard internet marketing (PPC advertisement etc.)
Team
Andrey Uglev
I’m a project manager with a strong technical background (programmer). 31 years old. I have been involved in Internet business for 7 years. I'm always trying to be on the cutting edge of technology, platforms, UI and business trends. My interest now is in open source big data solutions (mostly the Hadoop stack) and the Apache Lucene stack.
2 more engineers are highly interested in joining if the project gets investment.
NOW:
• Mail.ru Group (MAIL.RUGRP, LSE): project manager at the Social Networks BU (42,000,000 daily unique users total). http://mail.ru/
IN THE PAST:
• Octoline: startup, SaaS IP PBX. http://www.octoline.ru/
• Drivers Ed: own startup, online drivers ed courses (USA). Sold to http://www.driverseddirect.com/
• VeniVidi: own startup, social network for travelers. Sold in 2011. http://venividi.ru/
Prototype
• To try the prototype, please visit: http://188.40.134.20:9999/war/
• What’s implemented:
  • Crawling product pages from OpenCart
  • Extracting product title, description and image
  • Saving results to Lucene
  • Search box with suggestions by title and description. The image is provided by the search box too.
Contacts
Thanks for your interest!
Andrey Uglev
Email: andrew.uglev@gmail.com
Skype: cher8080
Tel. +7 903 9765110