Modern Information Retrieval Team members: 張中原 楊惟仁 謝坤龍
Indexing • Remove stop words • Apply Porter’s stemming • Count each word • Calculate hitrate (word_count / num_of_words)
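The indexing steps above can be sketched roughly as follows. This is a minimal Python illustration, not the team's PHP code. The stop-word list, the tokenizer, and the function names are assumptions made for the example. `crude_stem` is only a placeholder that strips a few suffixes; it is not Porter's algorithm. The sketch also assumes that num_of_words is counted after stop words are removed, which the slides do not specify.

```python
import re
from collections import Counter

# Assumed, tiny stop-word list; the slides do not say which list was used.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def crude_stem(word):
    """Placeholder for Porter's stemmer: strips a few common suffixes only."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_document(text):
    """Return {stem: hitrate} where hitrate = word_count / num_of_words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop stop words first, then stem what remains.
    kept = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    counts = Counter(kept)
    total = len(kept)  # assumption: stop words are excluded from the total
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}
```

For example, `index_document("The indexing of indexed documents")` maps both "indexing" and "indexed" to the stem "index", so that stem receives a hitrate of 2/3 and "document" receives 1/3.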
Insert into Database • 3 tables: Doc, Word, Vector • Doc Table • Word Table
Insert into Database • Vector Table • Indexing time: AMD 1G + Windows XP + PHP -> 3~4 minutes; PII 300 + FreeBSD + PHP -> 6~7 minutes • Index file size: 665084504 Oct 31 00:56 vectorlist.ISD; 814268416 Oct 31 00:56 vectorlist.ISM
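One possible shape for the three tables (Doc, Word, Vector) is sketched below. It uses Python's built-in sqlite3 module in place of MySQL so that it runs on its own. The slides name only the tables, so every column name here (doc_id, name, word_id, word, hitrate) and both function names are assumptions for illustration.

```python
import sqlite3


def create_schema(conn):
    """Create the Doc, Word and Vector tables (column names are hypothetical)."""
    conn.executescript(
        """
        CREATE TABLE Doc (
            doc_id INTEGER PRIMARY KEY,
            name   TEXT NOT NULL
        );
        CREATE TABLE Word (
            word_id INTEGER PRIMARY KEY,
            word    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE Vector (
            doc_id  INTEGER NOT NULL REFERENCES Doc(doc_id),
            word_id INTEGER NOT NULL REFERENCES Word(word_id),
            hitrate REAL NOT NULL,
            PRIMARY KEY (doc_id, word_id)
        );
        """
    )


def insert_document(conn, name, hitrates):
    """Store one document and its {word: hitrate} vector; return the new doc_id."""
    doc_id = conn.execute("INSERT INTO Doc (name) VALUES (?)", (name,)).lastrowid
    for word, rate in hitrates.items():
        # Reuse the existing Word row if this word was seen in an earlier document.
        conn.execute("INSERT OR IGNORE INTO Word (word) VALUES (?)", (word,))
        word_id = conn.execute(
            "SELECT word_id FROM Word WHERE word = ?", (word,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO Vector (doc_id, word_id, hitrate) VALUES (?, ?, ?)",
            (doc_id, word_id, rate),
        )
    conn.commit()
    return doc_id
```

In this layout the Vector table holds one row per (document, word) pair, so it grows much faster than the other two tables. That would be consistent with vectorlist being the large index file listed above.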
Searching • MySQL Server running on PII300 + FreeBSD + 2G HD(4xxx rpm)