Learn to index, query, and analyze data using Lucene. Create your own StopWord file, run various analyzers, and review search results.
Lucene Lab 2 030209
General IR Process
1st, Index:
  -- Start indexing (step through all files)
  -- Tokenize & stem each file
  -- Index
2nd, Query/Search:
  -- User enters a (roughly) natural-language query
  -- Tokenize & stem the query
  -- Run the query against the index
  -- Results
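The two phases above can be illustrated with a minimal, stdlib-only Java sketch. It is a toy, not Lucene: the class and method names (ToyIrProcess, tokenize, buildIndex, search) are hypothetical, stemming is omitted, and a query matches only documents containing every query term. The key point it shows is that the same tokenizer runs in both phases.

```java
import java.util.*;

// Toy illustration (not Lucene) of the general IR process:
// 1st, index each file; 2nd, tokenize the query the same way and search.
public class ToyIrProcess {

    // Shared tokenizer: lowercase, then split on runs of characters other than a-z.
    // (Stemming is deliberately left out of this sketch.)
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Phase 1: build an inverted index mapping each term to the IDs of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : tokenize(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Phase 2: tokenize the query and keep only documents that contain every query term.
    static Set<Integer> search(Map<String, Set<Integer>> index, String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> postings = index.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(postings);
            else result.retainAll(postings);
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "Vacation policy for all employees",
            "Travel policy: employees must book flights early",
            "Holiday schedule");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(search(index, "employees Policy")); // prints [0, 1]
        System.out.println(search(index, "travel policy"));    // prints [1]
    }
}
```

Because there is no stemming here, a query for "employee" would not match "employees"; a stemming step in both phases is what closes that gap.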
Lucene Process
1st, Index:
  -- IndexWriter.java
  -- StandardAnalyzer.java or another analyzer
  -- Index
2nd, Query/Search:
  -- User enters a (roughly) natural-language query
  -- Tokenize & stem the query
  -- Run the query against the index
  -- Results
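The analyzer is the step that decides which terms end up in the index. Roughly, Lucene's SimpleAnalyzer splits text at non-letters and lowercases it, and the StopAnalyzer does the same and then drops stop words. The sketch below approximates those two behaviors with the JDK only; it is not Lucene code, the names (AnalyzerSketch, simpleAnalyze, stopAnalyze) are hypothetical, the stop set is a small illustrative subset rather than Lucene's default list, and StandardAnalyzer's grammar-based tokenization is not modeled.

```java
import java.util.*;

// Stdlib-only approximation (not Lucene) of two analyzer behaviors.
public class AnalyzerSketch {

    // Approximates a letter tokenizer plus lowercasing: any non-letter ends the current token.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    // Same tokens as simpleAnalyze, minus any token found in the stop set.
    static List<String> stopAnalyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : simpleAnalyze(text)) {
            if (!stopWords.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative subset of common English stop words (assumed, not Lucene's list).
        Set<String> stop = Set.of("a", "an", "and", "in", "of", "the", "to");
        String text = "The Policy on Use of E-mail in 2009";
        System.out.println(simpleAnalyze(text));     // [the, policy, on, use, of, e, mail, in]
        System.out.println(stopAnalyze(text, stop)); // [policy, on, use, e, mail]
    }
}
```

Note that "2009" disappears and "E-mail" splits into two terms under both chains; differences like these are why the same documents can produce different search results depending on the analyzer used to index them.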
Lucene Lab
All of the following will be run against the policies directory.

1) Create your own StopWord file and run it with the StopAnalyzer. Export the results to an XML file.
   Send the following to Jeff by the beginning of class on Wednesday:
   • the source file
   • the XML file
   • your StopWord file

2) Compile the SearchFiles.java program and run it against your indices. Do this for:
   -- indexing with the StandardAnalyzer
   -- indexing with the SimpleAnalyzer
   -- indexing with the StopAnalyzer
   -- indexing with the StopAnalyzer with your stop words
   For each of the above, do one run with the 'Streaming' option and one with the 'Paging' option. The \docs\demo2.html file briefly discusses the difference. Review the usage statement in the source code to see how to select between the two.
   Take a screenshot of the results. This portion of the Lab/Homework will therefore have a total of 8 screenshots: one for the Streaming option and one for the Paging option for each of the four indices above.

**REMEMBER: The SearchFiles program must use THE SAME ANALYZER as the one that created the index being searched.** For example, when you search the index created with the StopAnalyzer, your SearchFiles program must also invoke the StopAnalyzer in order to get appropriate results.
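The same-analyzer rule can be demonstrated without Lucene. The stdlib-only sketch below assumes a stop-word file with one word per line (blank lines ignored), builds a tiny "index" with a stop-word analyzer, and then runs the same query through the same analyzer and through one without stop words. Everything here is hypothetical and illustrative: the class and method names, the file name mentioned in the comment, and the rule that every query term must be present. How your own stop words are actually passed to Lucene's StopAnalyzer depends on the Lucene version, so check its constructors.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.*;

// Stdlib-only sketch (not Lucene): why the search side must use the indexing analyzer.
public class SameAnalyzerDemo {

    // Assumed stop-word file format: one word per line; blank lines are skipped.
    static Set<String> loadStopWords(Reader source) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String w = line.trim().toLowerCase();
                if (!w.isEmpty()) words.add(w);
            }
        }
        return words;
    }

    // Lowercase, split on anything other than a-z, and drop stop words.
    // Passing an empty stop set gives an analyzer with no stop filtering.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) out.add(t);
        }
        return out;
    }

    // Hypothetical matching rule for the sketch: every analyzed query term must be indexed.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) throws IOException {
        // In practice this could be your own file, e.g. new FileReader("mystopwords.txt") (hypothetical name).
        Set<String> stop = loadStopWords(new StringReader("the\nof\nand\n"));

        // "Index" one document with the stop-word analyzer.
        Set<String> indexed = new HashSet<>(analyze("Statement of the Privacy Policy", stop));

        String query = "the privacy policy";
        System.out.println(matches(indexed, analyze(query, stop)));     // same analyzer: true
        System.out.println(matches(indexed, analyze(query, Set.of()))); // different analyzer: false
    }
}
```

With the mismatched analyzer the query keeps the term "the", which was never indexed, so the document is missed. A real engine may combine query terms differently (for example, scoring partial matches), so the symptom can be worse ranking rather than zero hits, but the cause is the same.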