This project covers modern information retrieval techniques: indexing with stop word removal, Porter's stemming algorithm, per-word counting, and hit rate calculation. The indexer inserts data into three database tables (Doc, Word, and Vector). Indexing time was measured on two hardware configurations, and the resulting index file sizes are reported. Searching runs against a MySQL server on PII 300 + FreeBSD with a 2G HD (4xxx rpm).
Modern Information Retrieval
Team members: 張中原 楊惟仁 謝坤龍
Indexing
• Stop word removal
• Porter's stemming
• Count each word
• Calculate hit rate (word_count / num_of_words)
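The indexing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's PHP code: the stop word list is hypothetical, only step 1a of Porter's algorithm (plural endings) is implemented rather than the full stemmer, and num_of_words is assumed to be the number of words left after stop word removal, which the slides do not specify.

```python
import re
from collections import Counter

# Hypothetical stop word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def stem_step_1a(word):
    """Apply only step 1a of Porter's algorithm (plural endings)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


def index_document(text):
    """Return {stem: (word_count, hitrate)} for one document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stop words first, then stem what remains.
    kept = [stem_step_1a(t) for t in tokens if t not in STOP_WORDS]
    num_of_words = len(kept)  # assumption: counted after stop word removal
    if num_of_words == 0:
        return {}
    counts = Counter(kept)
    # hitrate = word_count / num_of_words, as on the slide
    return {w: (c, c / num_of_words) for w, c in counts.items()}
```

For example, index_document("The cats and the ponies chase cats") keeps four words after stop word removal, so the stem "cat" gets a count of 2 and a hit rate of 0.5.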
Insert into Database
• 3 tables: Doc, Word, Vector
• Doc table
• Word table
Insert into Database
• Vector table
• Indexing time
  AMD 1G + Windows XP + PHP -> 3~4 minutes
  PII 300 + FreeBSD + PHP -> 6~7 minutes
• Index file size (bytes)
  665084504 Oct 31 00:56 vectorlist.ISD
  814268416 Oct 31 00:56 vectorlist.ISM
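One way the three tables might fit together is sketched below. The slides name the tables but not their columns, so every column name here (doc_id, path, word_id, word, word_count, hitrate) is an assumption. The sketch also uses Python's built-in sqlite3 with an in-memory database as a stand-in for the MySQL server and PHP code the project actually used.

```python
import sqlite3


def create_index_db():
    """Create an in-memory database with hypothetical Doc/Word/Vector schemas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Doc  (doc_id  INTEGER PRIMARY KEY, path TEXT UNIQUE);
        CREATE TABLE Word (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
        CREATE TABLE Vector (
            doc_id     INTEGER REFERENCES Doc(doc_id),
            word_id    INTEGER REFERENCES Word(word_id),
            word_count INTEGER,
            hitrate    REAL,
            PRIMARY KEY (doc_id, word_id)
        );
    """)
    return conn


def insert_document(conn, path, word_stats):
    """Insert one document; word_stats maps word -> (word_count, hitrate)."""
    doc_id = conn.execute(
        "INSERT INTO Doc (path) VALUES (?)", (path,)
    ).lastrowid
    for word, (count, hitrate) in word_stats.items():
        # Reuse the existing Word row if this word was already seen.
        conn.execute("INSERT OR IGNORE INTO Word (word) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT word_id FROM Word WHERE word = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO Vector VALUES (?, ?, ?, ?)",
            (doc_id, word_id, count, hitrate),
        )
    conn.commit()
    return doc_id
```

In this layout the Vector table holds one row per (document, word) pair, which would be consistent with vectorlist being by far the largest index file, though the slides do not confirm the actual schema.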
Searching
• MySQL server running on PII 300 + FreeBSD + 2G HD (4xxx rpm)