Elasticsearch Query Optimization - A Step-by-Step Guide

A step-by-step guide to optimizing Elasticsearch queries: scaling shards, tuning the scroll-context limit, weighing async search, and avoiding expensive query features.

inextures

  1. Elasticsearch Query Optimization - A Step-by-Step Guide
     Elasticsearch is a popular open-source, distributed, RESTful search and analytics engine built on Apache Lucene. It is designed for horizontal scalability, reliability, and easy management, and it lets you store, search, and analyze large volumes of data quickly and in near real time.
     How to Optimize Elasticsearch Queries for Better Performance

  2. Approach 1: Scaling Shards for Enhanced Performance
     Elasticsearch excels at handling large volumes of data, and one of its key performance levers is shard configuration. Shards are the units across which data is distributed and processed in parallel. This section explains the benefits of increasing the number of shards for large datasets and shows the effect on a real query.
     The Importance of Shards: Each Elasticsearch index is split into shards, which are spread across the nodes of a cluster. Increasing the number of primary shards can bring several benefits:
     • Parallel processing: a search runs on every shard of the index, so more shards (on enough nodes and CPU cores) let a single query be processed in parallel. This can significantly improve search and indexing performance.
     • Better resource utilization: spreading data across multiple shards lets the cluster use the CPU and memory of more nodes.
     • Scalability: as data grows, more shards let you spread the load over additional nodes instead of overburdening a few large shards.
     • High availability: combined with replica shards (copies of each primary), data can be stored redundantly across nodes, improving availability and resilience.
     Note that the number of primary shards is fixed when an index is created; changing it later requires reindexing or the _split/_shrink APIs. Each shard also carries its own overhead, so more shards are not automatically better.
     Real-World Impact: To test this, we ran the same Elasticsearch query against two indices, one with a single shard and one with three shards. The response time improved dramatically.
     1. Before increasing shards
     Settings of index:

  3. Search output:
     Response time: 1m 28s
     2. After increasing the number of shards
     Settings of index:

  4. Search output: Response Time: 4s
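Because the primary shard count can only be chosen when an index is created, reproducing the comparison above means creating a new index or splitting an existing one. The following is a minimal Python sketch that assumes an unsecured cluster at http://localhost:9200 and uses hypothetical index names. It only builds the HTTP requests with the standard library, and urllib.request.urlopen(req) would send them. The split helper follows the documented _split requirements: writes on the source must be blocked first, and the target shard count must be a larger multiple of the source's.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth/TLS


def _json_request(method, path, body):
    """Build (but do not send) an HTTP request with a JSON body."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def create_index_request(index, primary_shards, replicas=1):
    """PUT /<index> with an explicit shard layout; primaries are fixed after creation."""
    settings = {"index": {"number_of_shards": primary_shards,
                          "number_of_replicas": replicas}}
    return _json_request("PUT", f"/{index}", {"settings": settings})


def split_index_requests(source, target, source_shards, target_shards):
    """Requests for the _split API: block writes on the source, then split it."""
    if target_shards <= source_shards or target_shards % source_shards != 0:
        raise ValueError("target shard count must be a larger multiple of the source's")
    block_writes = _json_request("PUT", f"/{source}/_settings",
                                 {"settings": {"index.blocks.write": True}})
    split = _json_request("POST", f"/{source}/_split/{target}",
                          {"settings": {"index.number_of_shards": target_shards}})
    return [block_writes, split]


# Usage (nothing is sent until urlopen is called):
# urllib.request.urlopen(create_index_request("products-3-shards", 3))
```

Creating a fresh index and reindexing into it is the more general option. _split avoids the reindex but can only multiply the existing shard count.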

  5. Approach 2: Updating search.max_open_scroll_context on the Elasticsearch Cluster
     The next recommendation is to adjust the cluster's search.max_open_scroll_context setting. It caps the number of scroll contexts that can be open at the same time on each node. Scroll contexts make it possible to page through a large number of search results, but each one consumes system resources, particularly memory.
     Some key things to know about this setting:
     • A scroll context is opened when a search request uses the scroll API to retrieve results in batches. It stays open until it is cleared or its keep-alive expires.
     • Each scroll context uses resources on the node that holds it open: memory for the search context, plus disk space and file handles for segments that must be kept alive while the context exists.
     • The default limit is 500 per node; a value between 512 and 1024 is reasonable for most clusters.
     Setting this limit controls resource usage and keeps the system from being overwhelmed by too many open scroll contexts. This matters most when many scrolling searches run concurrently. For example, with search.max_open_scroll_context set to 500, each node allows up to 500 open scroll contexts. Once the limit is reached, new scroll requests on that node are rejected until existing contexts are cleared or expire.
     To configure search.max_open_scroll_context (transient cluster settings are deprecated, so only the persistent setting is used here):
     curl -x "" -X PUT "localhost:9200/_cluster/settings" \
       -H 'Content-Type: application/json' \
       -d '{ "persistent": { "search.max_open_scroll_context": 1024 } }'
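Raising the limit only helps if clients also release their contexts. The Python sketch below shows a scroll loop that always clears its context with the clear-scroll API (DELETE /_search/scroll) when it finishes, instead of leaving it open until the keep-alive expires. The send callable is a hypothetical transport: any function that takes an HTTP method, a path, and a JSON body and returns the parsed response. This lets the logic be exercised without a live cluster. For new code, note that Elastic's documentation now recommends search_after with a point in time (PIT) over scroll for deep pagination.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: send(method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def scroll_all(send: Send, index: str, query: dict,
               page_size: int = 1000, keep_alive: str = "1m") -> Iterator[dict]:
    """Yield every hit for `query`, then always release the scroll context."""
    # The initial search opens the scroll context on the shards it touches.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = resp.get("_scroll_id")
    try:
        while True:
            hits = resp["hits"]["hits"]
            if not hits:  # an empty page means the scroll is exhausted
                break
            yield from hits
            # Each follow-up request also renews the context's keep-alive.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Runs on exhaustion, on error, or when the consumer stops early,
        # so the context no longer counts against max_open_scroll_context.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Clearing contexts explicitly means the limit reflects genuinely concurrent scrolls rather than abandoned ones waiting for their keep-alive to expire.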

  6. Approach 3: Async Search – A Cautionary Tale
     Async search lets you submit search requests that run in the background. You can monitor their progress and retrieve partial results as they become available, and after the search finishes, its results can be kept for later retrieval.
     Here, we used the _async_search API instead of the _search API. This can help with long-running searches, but async search does not guarantee results every time: sometimes it returns no response at all. So it is not an optimal approach in our case. Its disadvantages include:
     • Queue overflow: async searches are queued and run concurrently. If the queue fills up because of too many requests, new searches may be rejected.
     • Timeout: if a search does not finish within wait_for_completion_timeout, only partial results (or none) are returned, and stored results are deleted once keep_alive expires. With a large result set, the query may not finish in time.
     • Memory constraints: async search holds results in memory before returning them. With too many hits, this can exceed the available memory and fail.
     • Thread pool saturation: async searches run on the search thread pool. Many large searches can saturate it and limit capacity for other queries.
     Suggestions: Here are some suggestions for improving Elasticsearch query performance:
     • Avoid highlighting in the query, as it takes a long time for large documents.

  7. o The use of highlighting in Elasticsearch queries, especially with large documents or extensive result sets, can hurt performance. Highlighting requires extra processing to find and mark up the parts of each document that match the query, and on large volumes of data this can be resource-intensive and time-consuming.
     • Avoid wildcard queries in search.
     o Wildcard queries, especially with a leading wildcard such as *phone, can lead to inefficient searches and increased resource consumption because Elasticsearch may have to examine every term in the field.
     Originally published by: Elasticsearch Query Optimization - A Step-by-Step Guide
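The async search workflow from Approach 3 can be sketched in Python as follows. It submits the search with a bounded wait_for_completion_timeout, polls GET /_async_search/<id> until is_running is false, and deletes the stored result afterwards. As before, send is a hypothetical transport. The poll limit is an assumption that makes the "no response" case explicit: the function returns None instead of waiting forever. If the search completes within the first wait, no id is returned because the result is not stored, so there is nothing to delete.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: send(method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def async_search(send: Send, index: str, query: dict, poll_interval: float = 1.0,
                 max_polls: int = 60, wait: str = "2s",
                 keep_alive: str = "5m") -> Optional[dict]:
    """Submit an async search and poll it; return the final response or None."""
    resp = send("POST",
                f"/{index}/_async_search"
                f"?wait_for_completion_timeout={wait}&keep_alive={keep_alive}",
                {"query": query})
    search_id = resp.get("id")  # absent when the search finished within `wait`
    try:
        polls = 0
        while resp.get("is_running") and polls < max_polls:
            time.sleep(poll_interval)
            resp = send("GET",
                        f"/_async_search/{search_id}?wait_for_completion_timeout={wait}",
                        None)
            polls += 1
        if resp.get("is_running"):
            return None  # gave up: the caller must handle "no result"
        return resp["response"]
    finally:
        # Free the stored result instead of keeping it until keep_alive expires.
        if search_id is not None:
            send("DELETE", f"/_async_search/{search_id}", None)
```

Bounding both the per-request wait and the number of polls keeps a slow search from tying up the client indefinitely. The is_partial flag in each poll response can be checked if partial results are acceptable.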
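Both suggestions can be applied when user input is turned into a query body. The Python sketch below is one possible approach, not the author's. It uses a match query for plain terms and a prefix query for trailing-only wildcards, rejects leading wildcards, and caps highlighting to one short fragment when highlighting is needed at all. The helper names and fragment limits are illustrative assumptions. A prefix query benefits from a text field mapped with index_prefixes, which the mapping helper shows.

```python
WILDCARDS = "*?"


def prefix_friendly_mapping(field):
    """Mapping for a text field that also indexes term prefixes (defaults: 2-5 chars)."""
    return {"mappings": {"properties": {field: {"type": "text", "index_prefixes": {}}}}}


def build_search_body(field, user_input, highlight=False):
    """Turn raw user input into a search body that avoids expensive patterns."""
    # Leading wildcards force Elasticsearch to examine every term of the field.
    if user_input and user_input[0] in WILDCARDS:
        raise ValueError("leading wildcards are not allowed: " + user_input)

    if not any(ch in WILDCARDS for ch in user_input):
        # Plain text: an analyzed full-text match, no pattern matching at all.
        query = {"match": {field: user_input}}
    elif user_input.endswith("*") and not any(ch in WILDCARDS for ch in user_input[:-1]):
        # "lap*" -> prefix query, which can use index_prefixes if mapped.
        query = {"prefix": {field: user_input[:-1]}}
    else:
        # Inner wildcards ("la?top") still need a wildcard query; keep these rare.
        query = {"wildcard": {field: {"value": user_input}}}

    body = {"query": query}
    if highlight:
        # Only highlight on request, and cap the fragments produced per document.
        body["highlight"] = {
            "fields": {field: {"fragment_size": 100, "number_of_fragments": 1}}
        }
    return body
```

Prefix and wildcard values are not analyzed, so input should be lowercased before it is matched against a text field that uses the standard analyzer.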
