String Searching In Parallel By Sowmya Padmanabhan. Final Term Project Presentation for Parallel Processing, Dr. Charles Fulton.
One way to parallelize is: • Consider a huge text document (something like an electronically available encyclopedia) that you want to search for several words, phrases, or sentences at the same time. • We call each item we are searching for a “search_string”. • Rather than having one processor look for all the search_strings in the document, we can take advantage of parallel processing and have, say, 10 processors look for 10 different search_strings simultaneously, so the whole set of searches finishes much sooner.
One way to parallelize is: • My first program accomplishes this objective. • The document searched is an actual document, a collection of William Shakespeare’s works downloaded from an online resource, consisting of approximately 400 million characters. • My program can handle up to 450 million characters.
Second Way to Parallelize • Think of this scenario: I have to search the huge electronic document (again, imagine an encyclopedia) for just one word, phrase, or sentence at a time. • How do I take advantage of parallel processing? Simple! Divide the whole document into as many equal parts as there are processors. Let’s call these “sub-documents” and allot one sub-document to each processor. Now, what do we do with these sub-documents?
Second Way to Parallelize • Yes, you are right! • Have each processor search for the search_string only in the sub-document it has been allotted. • Sounds great! So, how do I code it? • Using MPI_Scatter, of course! • Note: This program works when the number of processors is 10 or more; with fewer processors, the buffer for the MPI_Scatter command is exceeded.
Comparison of Times • See Table of Comparisons.
Algorithm for String Searching
#include <string.h>

/* Counts how many times search_string occurs in string (overlapping matches included). */
int string_searching_algo (char *string, char *search_string) {
    int i, j, k;
    int count = 0, occurrences = 0;
    const int len_search_string = strlen ( search_string );
    const int len_given_string = strlen ( string );
    /* Try every starting position i at which search_string could still fit. */
    for (i = 0; i <= (len_given_string - len_search_string); i++ ) {
        count = 0;
        /* Compare character by character, stopping at the first mismatch. */
        for (j = i, k = 0; k < len_search_string; j++, k++) {
            if ( *(string + j) != *(search_string + k) ) {
                break;
            } else {
                count++;
            }
            /* All characters matched: one more occurrence found. */
            if ( count == len_search_string ) {
                occurrences++;
            }
        }
    }
    return occurrences;
}
Conclusion • String searching in parallel saves a lot of time compared with single-processor searching, especially when the document is extremely large. • One way to parallelize is to have several processors search for different strings in one document in parallel; the second way is to have several processors search for the same string in different portions (sub-documents) of the same document in parallel.
One Problem, However… • The second program, which uses MPI_Scatter, has one drawback: when an occurrence of the search_string straddles two sub-documents (one portion of it lies at the end of one sub-document and the rest at the beginning of the next sub-document, held by another processor), the program does not give proper results, because neither processor sees the whole occurrence.