Chapter 12: Searching in Web applications The first examples use a search form embedded in a Web page to query the deptstore database, which contains the following table. The actual table contains many more records.
Run the products1.cgi program on the Web site. Search for the word rack. Original query hard-coded into returned search form.
Things to note: • This search form returns every product containing rack as a substring. • The search form was returned with the results so the user can do another search without hitting the back button. • The original query is "hardcoded" into the returned search form as an added convenience. • By "hardcoded," we mean that the user's original data was coded into the HTML in the value attribute, which gives the text area its initial value when the page loads.
The program is very straightforward. It either prints the initial search page or performs a search.
    ### app logic ###################################
    if($formHash{"request"} eq "search") {
        &search;
    }
    else {
        &search_page;
    }
The initial page contains only the search form, which we print with a helper function.
    sub search_page {
        print<<TOP;
    <html><head><title>Search our sports store</title></head><body>
    TOP
        &searchForm;   # no argument --> no hardcoded value for search field
        print<<BOTTOM;
    </body></html>
    BOTTOM
    }
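The slides do not show the helper itself, so the following is only a minimal sketch of what a form-printing helper of this kind might look like. The names `search_form_html` and `escape_html`, the `action` and `method` values, and the hidden-field layout are assumptions; only the field names `request` and `searchstring` come from the program above. The sketch returns the markup rather than printing it, and it escapes the user's text before hardcoding it into the `value` attribute.

```perl
use strict;
use warnings;

# Escape the characters that are special inside HTML attribute values.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: build the search form, hardcoding the previous
# query (if any) into the value attribute of the text field.
sub search_form_html {
    my ($previous) = @_;
    $previous = '' unless defined $previous;   # no argument --> empty field
    my $value = escape_html($previous);
    return <<"FORM";
<form action="products1.cgi" method="get">
  <input type="text" name="searchstring" value="$value">
  <input type="hidden" name="request" value="search">
  <input type="submit" value="Search">
</form>
FORM
}

print search_form_html('rack');
```

Escaping matters here because a query containing a double quote would otherwise end the `value` attribute early and corrupt the returned page.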
When the search form is submitted, the meat of the &search function is handled by the DBI module and MySQL.
    sub search {
        $searchstring = $formHash{"searchstring"};
        my $sql = "SELECT item FROM products WHERE item LIKE '%$searchstring%'";
        my $qObj = $dbhandle -> prepare($sql) or ...
        $qObj-> execute() or ...
        my @row; my @matches = ();
        while(@row = $qObj->fetchrow_array()){
            push @matches, $row[0];
        }
        # print the matches in the page
        # hardcode search string into returned form
        &searchForm($formHash{"searchstring"});
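Because the query above interpolates the user's text directly into the SQL string, a search containing a single quote would break the statement. One common alternative, sketched here as an assumption and not as the method used in products1.cgi, is DBI's `?` placeholder with the value passed to `execute`. The helper name `build_substring_query` is hypothetical, and the sketch stops short of connecting to a database so that it runs on its own.

```perl
use strict;
use warnings;

# Build the SQL text and the bind value for a substring search.
# The SQL contains a ? placeholder instead of the user's text.
sub build_substring_query {
    my ($searchstring) = @_;
    # Escape LIKE's own wildcards (and the escape character itself) so
    # they match literally; backslash is MySQL's default LIKE escape.
    (my $literal = $searchstring) =~ s/([\\%_])/\\$1/g;
    my $sql = "SELECT item FROM products WHERE item LIKE ?";
    return ($sql, "%$literal%");
}

my ($sql, $pattern) = build_substring_query("50% off");
print "$sql\n$pattern\n";

# With a connected DBI handle (not created in this sketch), the calls
# would look like:
#   my $qObj = $dbhandle->prepare($sql);
#   $qObj->execute($pattern);
```

The database driver then handles quoting of the bound value, so the program never has to splice user text into the statement.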
It is usually desirable to provide refinement options in a search form. Execute products2.cgi on the Web site.
• SQL's LIKE operator does allow minimal pattern matching.
    LIKE '%z%'   # matches strings that contain a z
    LIKE '%ing'  # matches strings that end with ing
    LIKE '___'   # matches any three-character string
    LIKE '_a_'   # matches any three-character string with an a in the middle
Many database systems also offer the RLIKE (Regular expression LIKE) extension to SQL.
    SELECT item FROM products WHERE item RLIKE 'regular_expression'
• Most metacharacters work in RLIKE patterns, but the special location markers and character classes (\A, \Z, \b, \d, \w, etc.) currently don't.
• Instead of the beginning and end of string markers, \A and \Z, we have to use ^ and $, respectively. (These also work in Perl patterns.)
• Further, you can usually construct a character class manually to use in place of a built-in one: for example, [a-zA-Z0-9_] in place of the word class \w.
So we can implement the following using SQL.
    SELECT item FROM products WHERE item RLIKE '^rack$'
• This pattern is equivalent to /\Arack\Z/.
• The problem is that this matches only the exact string rack, not rack as part of a longer string.
• For whole word searches, we do want to match something like bicycle rack.
• We just want to rule out stuff like track shoes.
In Perl, a whole-word search is trivial using word boundary markers:
    /\brack\b/
• Using RLIKE in SQL, we will have to construct our own pattern to replace the unavailable \b.
• While we're at it, we might as well also match something like rack-o-lamb, which is still in the spirit of whole word matches.
• That is, a whole-word search will mean that the word stands apart from other words by some non-word character.
    (^|[^a-zA-Z0-9])   # beginning of string or not a word character
    ($|[^a-zA-Z0-9])   # end of string or not a word character
Below is how we implement the whole word search in products2.cgi.
    $searchstring = $formHash{"searchstring"};
    if($formHash{"wholeword"}) {
        $searchstring = '(^|[^a-zA-Z0-9])'
                        .$searchstring.
                        '($|[^a-zA-Z0-9])';
    }
• If the whole-word search option was chosen (checkbox), we simply concatenate our contrived word boundary pattern on each side of the user's query.
    my $sql = "SELECT item FROM products WHERE item RLIKE '$searchstring'";
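To see what the contrived boundary pattern accepts and rejects, it can be tried directly in Perl, where `^`, `$`, alternation, and bracketed character classes behave the same way for this simple case. This is only a sketch: the helper name `whole_word_pattern` is hypothetical, and Perl's regex engine stands in for MySQL's RLIKE.

```perl
use strict;
use warnings;

# Wrap a query in the hand-built word-boundary pattern from the slides.
sub whole_word_pattern {
    my ($query) = @_;
    return '(^|[^a-zA-Z0-9])' . $query . '($|[^a-zA-Z0-9])';
}

my $pattern = whole_word_pattern('rack');
for my $item ('rack', 'bicycle rack', 'rack-o-lamb', 'track shoes') {
    my $verdict = $item =~ /$pattern/ ? 'match' : 'no match';
    print "$item: $verdict\n";
}
```

The first three items match and track shoes does not, because the t immediately before rack is a word character.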
Really, the only other difference between products2.cgi and the first one is that the function which prints the search form needs to know more information to be able to return the search form hardcoded with the original search criteria. • That is, when more and more refinement options are added to a search form, the user should get the same form back as a matter of convenience for them.
How can we provide more elaborate search refinements? • One option is to allow users to enter regular expressions into search fields. Below, we search for rack or sack. • Patterns actually work in our search forms since we pass the user's data straight to the SQL statement. Try some! • Most search engines don't feature that capability since regular expressions elude most of the population. One easy way to disable that is simply to pre-process user data, escaping all metacharacters so that they are taken literally. • Many do feature search refinements using special words and characters like (and, or, +, -).
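The pre-processing step mentioned above can be sketched with Perl's built-in `quotemeta`, which puts a backslash in front of every non-word character. The helper names `matches_as_pattern` and `matches_literally` are hypothetical, and the sketch only shows the Perl side: a string destined for an SQL statement would still need to be quoted for SQL as well.

```perl
use strict;
use warnings;

# Treat the user's input as a regular expression.
sub matches_as_pattern {
    my ($item, $input) = @_;
    return $item =~ /$input/ ? 1 : 0;
}

# Treat the user's input as plain text by escaping all metacharacters.
sub matches_literally {
    my ($item, $input) = @_;
    my $literal = quotemeta($input);   # 'rack|sack' becomes 'rack\|sack'
    return $item =~ /$literal/ ? 1 : 0;
}

for my $item ('bicycle rack', 'sack lunch', 'rack|sack combo') {
    printf "%-16s pattern=%d literal=%d\n", $item,
        matches_as_pattern($item, 'rack|sack'),
        matches_literally($item, 'rack|sack');
}
```

As a pattern, rack|sack matches all three items; once escaped, it matches only the item that contains the literal text rack|sack.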
The search applications with elaborate search refinements such as seen on the previous slide allow the user simply to manipulate a variety of HTML form elements. • Of course, the refinement options must be transformed from submitted form data to database queries. • There are several options, depending upon the complexity needed: • Construct advanced SQL queries. (This book only features a small subset of SQL). • Use the SQL RLIKE command and transform the submitted options into regular expressions usable in SQL. • Do relatively simple SQL queries on the database and then further filter the returned records using regular expressions within the Perl program (or whatever language you are using).
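The third option, filtering the returned records inside the Perl program, can be sketched with `grep`. In this sketch the helper name `filter_whole_word` is hypothetical and the `@rows` array is a stand-in for records that a simple LIKE query might return; no database is involved.

```perl
use strict;
use warnings;

# Keep only the records in which the query appears as a whole word.
# Inside Perl the real \b word boundary marker is available.
sub filter_whole_word {
    my ($query, @records) = @_;
    my $literal = quotemeta($query);   # take the query literally
    return grep { /\b$literal\b/i } @records;
}

# Stand-in for rows returned by a query such as LIKE '%rack%'.
my @rows = ('bicycle rack', 'track shoes', 'rack-o-lamb', 'Racket');
print "$_\n" for filter_whole_word('rack', @rows);
```

This keeps the SQL simple at the cost of pulling more rows out of the database than are finally displayed.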
There are a couple of common practices regarding Web searches which are fairly easy to implement. • The first is secondary processing of the results returned from the database. • The second is to limit the number of returned matches to a fixed number per page, and to supply links to deliver the next 10 matches, for example.
The secondary processing involves:
• Testing for the search string in each returned search match.
• Using the substitution operator to replace the search string with the search string together with some extra HTML formatting applied.
    $match =~ s/($searchstring)/<span>$1<\/span>/ig;
• In this case we simply put the search string in a span container, whose style rule specifies red text.
• Note that the grouping parentheses capture the search string into the special $1 variable.
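The substitution can be wrapped in a small function and tried on sample data. This is a sketch with a hypothetical helper name, `highlight`. It adds one step that the slide does not show: the search string is passed through `quotemeta` first, so that a query containing metacharacters is highlighted as literal text.

```perl
use strict;
use warnings;

# Wrap every occurrence of the search string in a span container,
# preserving the capitalization found in the matched text.
sub highlight {
    my ($match, $searchstring) = @_;
    return $match if $searchstring eq '';      # nothing to highlight
    my $literal = quotemeta($searchstring);    # treat the query as plain text
    $match =~ s/($literal)/<span>$1<\/span>/ig;
    return $match;
}

print highlight('Bicycle Rack', 'rack'), "\n";
print highlight('learn c++ now', 'c++'), "\n";
```

Because of the `i` modifier the match is case-insensitive, and because `$1` holds the text actually matched, Rack keeps its capital R in the output.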
Limiting the number of matched results returned per page is significantly more complicated.
• See products4.cgi
• The details are somewhat complicated, so you will have to examine the code carefully.
• The search results are stored in an array, so it boils down to which "chunk" of the array to return. A submitted name=value pair of the form start=11 tells the program where to begin the returned chunk.
• Then the link for the next ten matches would take the form
    <a href="products4.cgi? . . . &start=21">
    Next 10 matches</a>
• A sample run of this program is shown on the next slide.
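The "chunk" idea can be sketched with an array slice. This is not the code from products4.cgi; the function names `page_of_matches` and `next_start` are hypothetical, and the sketch assumes, as in the start=11 example, that positions are counted from 1.

```perl
use strict;
use warnings;

# Return the chunk of @$matches that begins at the 1-based position $start.
sub page_of_matches {
    my ($matches, $start, $per_page) = @_;
    my $first = $start - 1;                      # convert to a 0-based index
    return () if $first < 0 || $first > $#$matches;
    my $last = $first + $per_page - 1;
    $last = $#$matches if $last > $#$matches;    # the final page may be short
    return @{$matches}[$first .. $last];
}

# The start value for the "next" link, or undef when no matches remain.
sub next_start {
    my ($matches, $start, $per_page) = @_;
    my $next = $start + $per_page;
    return $next <= @$matches ? $next : undef;
}

my @matches = map { "item $_" } 1 .. 25;
print join(', ', page_of_matches(\@matches, 21, 10)), "\n";
```

With 25 matches and 10 per page, start=11 yields items 11 through 20 and a next link with start=21; the page at start=21 holds the last five items and needs no next link.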
Sometimes searching in a Web application only entails searching through a site comprised of a bunch of static HTML files.
• To that end, one first needs to be able to "scan" a directory to obtain the list of files and other directories it contains.
• Fortunately, that is extremely easy in Perl.
    opendir(DIRECTORYHANDLE, path_to_directory);
    @array = readdir(DIRECTORYHANDLE);
    closedir(DIRECTORYHANDLE);
• The readdir function returns an array of strings consisting of all the names of the files and directories it finds.
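A runnable variation on the three calls above is sketched below. The helper name `list_directory` is hypothetical. It uses a lexical directory handle instead of a bareword one, checks whether `opendir` succeeded, and drops the `.` and `..` entries that `readdir` also reports. The temporary directory exists only so that the sketch runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the sorted names in a directory, without the "." and ".." entries.
sub list_directory {
    my ($path) = @_;
    opendir(my $dh, $path) or die "Cannot open $path: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return sort @names;
}

# Build a throwaway directory holding two files and one subdirectory.
my $dir = tempdir(CLEANUP => 1);
for my $name ('index.html', 'about.html') {
    open(my $fh, '>', "$dir/$name") or die "Cannot create $name: $!";
    close($fh);
}
mkdir "$dir/chapter1" or die "Cannot create chapter1: $!";

print "$_\n" for list_directory($dir);
```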
Goal: Produce a list containing only chapters and examples. Skip any other files or folders. See examples.cgi
When we scan the main directory to make the outer list, we "grep out" only those names containing "chapter".
    opendir(MAINDIR, $mainDir) or &errorPage;
    @maincontents = readdir(MAINDIR);
    closedir(MAINDIR);
    @chapterdirs = grep(/chapter/, @maincontents);
• We then loop over the chapter directories, reading in the files from each one. When we scan a given chapter directory, we "grep out" only those names containing "example".
    foreach $chapdir (sort @chapterdirs){
        opendir(DIR, "$mainDir$chapdir");
        @files = readdir(DIR);
        closedir(DIR);
        @examples = grep(/example/, @files);
When scanning directories, it is often desirable to determine the nature of each item, beyond just its name.
• For example, is each item a file or a directory?
• Below is a sampling of file test operators with a sample usage.
    if (-d $directoryItem) {
        # scan the directory to find more files
    }
Common tasks enabled by the file test operators are constructing site maps or conducting searches through a whole site of static Web pages.
• The most versatile way to do this is recursively. Below is pseudocode for a recursive search of a Web site comprised of static Web pages.
    sub recursiveSearch
        Open a directory.
        Scan its contents into a @contents array.
        For each $item in @contents
            if ($item is a directory)
                call recursiveSearch on $item.   (This is the recursive step.)
            else
                $item is a file that we open and search for the user's query
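The pseudocode can be turned into a short runnable sketch. This is not the code behind sitesearch.cgi; the function name `recursive_search` and the sample directory tree are assumptions made for illustration. The sketch uses the `-d` and `-f` file test operators, skips symbolic links to directories so that it cannot loop forever, and returns the paths of the files that contain the query.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the paths of all files under $dir whose contents contain $query.
sub recursive_search {
    my ($dir, $query) = @_;
    my $literal = quotemeta($query);       # take the query literally
    my @hits;
    opendir(my $dh, $dir) or return;       # skip unreadable directories
    my @contents = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    for my $item (sort @contents) {
        my $path = "$dir/$item";
        if (-d $path && !-l $path) {
            push @hits, recursive_search($path, $query);   # the recursive step
        }
        elsif (-f $path) {
            open(my $fh, '<', $path) or next;
            my $found = grep { /$literal/i } <$fh>;        # count matching lines
            close($fh);
            push @hits, $path if $found;
        }
    }
    return @hits;
}

# Build a small throwaway site so the sketch runs anywhere.
my $site = tempdir(CLEANUP => 1);
mkdir "$site/music" or die "Cannot create music: $!";
my %pages = ("$site/index.html" => "Welcome\n", "$site/music/bands.html" => "Ozzy rules\n");
for my $path (sort keys %pages) {
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    print {$fh} $pages{$path};
    close($fh);
}

print "$_\n" for recursive_search($site, 'ozzy');
```

Each call handles one directory and hands every subdirectory to a fresh call, so the depth of the recursion follows the depth of the directory tree.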
It is instructive to trace how the recursion unfolds given a fairly large directory structure. Go to sitesearch.cgi (fig. 12.12) on the Web site and search for "ozzy". You will see which files in this site contain that word.
Putting a whole Web search in a page is very easy. • Do a search on the search engine you want to embed in your page. • Observe the query string. • http://www.google.com/search?q=wildebeest&btnG=Google+Search • Construct a form in your page which, when submitted, produces an identical query string. • It is customary to give the search engine advance credit by putting their icon next to your search utility.
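A form that reproduces the query string shown above might be sketched as follows. The field names `q` and `btnG` and the button text are read off the sample URL; the function name `web_search_form` is hypothetical, and whether the search engine still accepts this exact query string is an assumption.

```perl
use strict;
use warnings;

# Return a form whose submission produces a query string of the form
# ?q=...&btnG=Google+Search on the search engine's own URL.
sub web_search_form {
    return <<'FORM';
<form action="http://www.google.com/search" method="get">
  <input type="text" name="q">
  <input type="submit" name="btnG" value="Google Search">
</form>
FORM
}

print web_search_form();
```

With the GET method the browser builds the query string itself from the named form elements, which is why matching the names in the observed URL is all that is needed.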