Information Retrieval Homework #1
Members: Wesley, Lbr, Shuang
CSIE, NCU
Outline
• Introduction
• Stemming Algorithm
• Suffix Tree
• Performance
• Conclusion
Stemming Algorithm (optional)
• Goal of stemming: improve performance and use fewer resources by reducing the number of unique words
• Ex. “computable”, “computation”, “computability” can be reduced to one shared stem
• Porter Algorithm (most commonly accepted)
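To make the goal of stemming concrete, here is a minimal sketch of suffix stripping. It is not the Porter Algorithm, which applies several ordered rule phases; the suffix list and the function name `naive_stem` are assumptions chosen only to cover the three example words.

```python
# Hypothetical suffix list, chosen only for the slide's three examples.
SUFFIXES = ("ability", "ation", "able")


def naive_stem(word, min_stem_len=3):
    """Strip the longest matching suffix, keeping at least min_stem_len letters."""
    word = word.lower()
    # Try longer suffixes first so "ability" wins over shorter candidates.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


words = ["computable", "computation", "computability"]
stems = {naive_stem(w) for w in words}
print(stems)  # three distinct words collapse into a single index term
```

With a rule set like this, the index stores one entry instead of three, which is where the saving in unique words comes from.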
Suffix Tree Library
• libsfxdisk-1.2 is a fast indexing library based on a suffix tree
• It supports storing, retrieving, deleting, and dumping/loading the database
Indexing (Optional) [flow diagram; box labels: Dir Name, DirReader, StopWords, Stem, File Name, FileReader, SuffixTree, Filter, Delete, Index File]
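One plausible reading of the indexing diagram is: read a directory, read each file, filter out stop words, then add the remaining terms to the index. The sketch below follows that reading under stated assumptions: the function names, the stop-word list, and the use of a plain dictionary in place of the suffix tree are all hypothetical, and the optional stemming step is left out.

```python
import os
import re

# Hypothetical stop-word list; the original list is not given in the slides.
STOP_WORDS = {"the", "a", "of", "and", "is"}


def tokenize(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in stop_words]


def index_directory(dir_name, stop_words=STOP_WORDS):
    """Map each remaining term to the set of file names that contain it."""
    index = {}
    for file_name in sorted(os.listdir(dir_name)):
        path = os.path.join(dir_name, file_name)
        if not os.path.isfile(path):
            continue  # skip subdirectories in this simple sketch
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for term in tokenize(handle.read(), stop_words):
                index.setdefault(term, set()).add(file_name)
    return index
```

Calling `index_directory("docs")` on a hypothetical `docs` folder returns a dictionary such as `{"retrieval": {"a.txt"}, ...}` that a search step can query directly.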
Searching [flow diagram; box labels: Key Word, SearchEngine, Index, Print Out Results]
Performance
• Total indexing time: long; one file takes about one minute
• Average searching time: very quick
• http://140.115.156.49/~wesley/IR.html
Future
• Add a stemming scheme
• Limit indexing time
• Additional search operators: AND, OR
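The planned AND and OR searches could be sketched as set operations over per-term results. This assumes the index can return a set of file names for each key word; the function `boolean_search` and the sample index are hypothetical and are not part of the existing SearchEngine.

```python
def boolean_search(index, terms, mode="AND"):
    """Combine the per-term result sets with intersection (AND) or union (OR)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    if mode == "AND":
        return set.intersection(*postings)
    if mode == "OR":
        return set.union(*postings)
    raise ValueError("mode must be 'AND' or 'OR'")


# Hypothetical index: term -> files containing it.
sample_index = {
    "suffix": {"a.txt", "b.txt"},
    "tree": {"b.txt", "c.txt"},
}

print(sorted(boolean_search(sample_index, ["suffix", "tree"], "AND")))
print(sorted(boolean_search(sample_index, ["suffix", "tree"], "OR")))
```

AND narrows the result to files containing every key word, while OR widens it to files containing any of them; a term missing from the index contributes an empty set, so an AND query containing it returns nothing.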