
Accelerating search engine query processing using the GPU


Presentation Transcript


  1. Accelerating search engine query processing using the GPU Sudhanshu Khemka

  2. Prominent Document Scoring Models

  3. The Vector Space Model • Treats each document as a vector with one component corresponding to each term in the dictionary • The weight of a component is calculated using the tf-idf weighting scheme, where tf is the total number of occurrences of the term in the document and idf is the inverse document frequency of the term • As the query is also a mini document, the model represents the query as a vector • The similarity between two vectors is then the cosine of the angle between them: sim(d1, d2) = V(d1) · V(d2) / (|V(d1)| |V(d2)|)
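The tf-idf weighting and cosine similarity described on this slide can be sketched in plain Python (a CPU sketch; the function names and the exact idf variant, log(N/df), are illustrative choices, not taken from the slides):

```python
import math
from collections import Counter

def tfidf_vector(doc_tokens, df, n_docs):
    """Build a sparse tf-idf vector (term -> weight) for one document.

    tf is the raw count of the term in the document; idf is taken as
    log(N / df(t)), one common form of inverse document frequency.
    """
    tf = Counter(doc_tokens)
    return {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}

def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)
```

Since the query is treated as a mini document, it is vectorized with the same function and compared against each document vector.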

  4. The Language model based approach to IR • Builds a probabilistic language model for each document d and ranks documents based on P(d|q) • The formula is simplified using Bayes' rule: P(d|q) = P(q|d)P(d) / P(q) • P(q) is the same for all documents, and P(d) is treated as uniform across all documents. Thus ranking by P(d|q) is equivalent to ranking by P(q|d) • P(q|d) can be estimated using a number of different methods. For example, using the maximum likelihood estimate and the unigram assumption: P(q|d) = ∏ over terms t in q of tf(t, d) / |d|
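The maximum-likelihood unigram estimate can be sketched as follows (function names are illustrative; note that a real system would smooth the estimate, e.g. with Jelinek-Mercer or Dirichlet smoothing, so that a single unseen query term does not zero out the whole score):

```python
from collections import Counter

def query_likelihood(query_tokens, doc_tokens):
    """P(q|d) under a unigram language model with maximum
    likelihood estimates: the product over query terms of
    tf(t, d) / |d|.  Unsmoothed, so any term absent from the
    document drives the score to zero.
    """
    tf = Counter(doc_tokens)
    length = len(doc_tokens)
    p = 1.0
    for t in query_tokens:
        p *= tf[t] / length  # Counter returns 0 for unseen terms
    return p
```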

  5. My research

  6. A lot of research has been done to develop efficient CPU algorithms that improve query response time • We look at the task of improving query response time from a different perspective • Instead of focusing solely on writing efficient algorithms for the CPU, we shift our focus to the processor itself and formulate the following question: • “Can we accelerate search engine query processing using the GPU?”

  7. Why the GPU? • The GPU’s programming model is highly suitable for processing data in parallel • It allows programmers to define a grid of thread blocks; each thread in a thread block can execute a subset of the operations in parallel • This is useful for information retrieval, as the score of each document can be computed in parallel

  8. Past work done • Ding et al., in their paper “Using Graphics Processors for High Performance IR Query Processing,” implement a variant of the vector space model, Okapi BM25, on the GPU and demonstrate promising results • Okapi BM25: score(d, q) = Σ over terms t in q of idf(t) · tf(t, d)(k1 + 1) / (tf(t, d) + k1(1 − b + b · |d| / avgdl)) • In particular, they provide data-parallel algorithms for inverted list intersection, list compression, and top-k scoring
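A plain-Python sketch of Okapi BM25 scoring, using one common formulation of the idf term and the usual default parameters k1 = 1.2, b = 0.75 (the exact variant used by Ding et al. may differ; function and parameter names are illustrative):

```python
import math
from collections import Counter

def bm25_score(query_tokens, doc_tokens, df, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query.

    For each query term t present in the document:
      idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * |d| / avgdl))
    where |d| is the document length and avgdl the average
    document length over the collection.
    """
    tf = Counter(doc_tokens)
    dl = len(doc_tokens)
    score = 0.0
    for t in query_tokens:
        if tf[t] == 0:
            continue  # term absent: contributes nothing
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
        norm = tf[t] + k1 * (1 - b + b * dl / avgdl)
        score += idf * tf[t] * (k1 + 1) / norm
    return score
```

The length normalization in the denominator is what distinguishes BM25 from plain tf-idf: longer documents need proportionally more term occurrences to reach the same score.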

  9. My contribution • Propose an efficient implementation of the second ranking model, the LM based approach to document scoring, on the GPU • Method: • Apply a divide and conquer approach, as we need to compute P(q|d) for each document in the collection • Each block in the GPU would calculate the scores of a subset of the total documents, sort the scores, and transfer the results to an array in the global memory of the GPU • After all the blocks have written their sorted scores to the array in global memory, we would use a parallel merge algorithm to merge the results and return the top k documents • Satish et al., in their paper “Designing Efficient Sorting Algorithms for Manycore GPUs,” provide an implementation of merge sort that is the fastest reported in the literature
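The proposed block-wise score, sort, and merge pipeline can be sketched on the CPU as follows: each slice of the collection stands in for one GPU thread block, and `heapq.merge` stands in for the parallel merge kernel. This is a structural sketch under those stand-in assumptions, not the GPU implementation itself; names and the partitioning scheme are illustrative.

```python
import heapq
from itertools import islice

def topk_blockwise(score_fn, docs, query, block_size, k):
    """Score documents block by block, sort each block's scores
    in descending order (the per-block GPU work), then merge the
    sorted runs and return the ids of the top-k documents (the
    parallel-merge step, emulated here with heapq.merge).
    """
    runs = []
    for start in range(0, len(docs), block_size):
        block = docs[start:start + block_size]
        run = sorted(
            ((score_fn(query, d), start + i) for i, d in enumerate(block)),
            reverse=True,
        )
        runs.append(run)
    merged = heapq.merge(*runs, reverse=True)  # runs already descending
    return [doc_id for _, doc_id in islice(merged, k)]
```

Because each run is already sorted, the final merge only needs to examine the heads of the runs, which is what makes a parallel merge (rather than a full re-sort) sufficient for the top-k step.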
