Galago for Information Retrieval
Galago: a Java-based information retrieval toolkit
http://www.galagosearch.org/index.html
Download both binary and source at:
http://code.google.com/p/galagosearch/downloads/list
Tasks
[Diagram labels: Document, Query, Parse (twice), Index, Evaluation]
Tasks - Index

Example document:
<Html>
<Head> Hello </Head>
<Body> I love information retrieval so much! </Body>
</Html>

Doc-001: hello; i; love; information……    Index with words
Doc-001: hello; i; love; informate……      Index with stemmed words
Doc-001: html; head; body……               Index with extents

galago.bat build D:\test_collection\Wiki\index D:\test_collection\Wiki\wiki-small.corpus
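The word index and the extent index listed above can be illustrated with a small Python sketch. This is not Galago's code: `ExtentTokenizer` and `build_index` are hypothetical names, the tokenization rule (lowercase runs of letters and digits) is an assumption, and stemming is left out. A stemmed index would presumably apply a stemmer to each token before it is added.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class ExtentTokenizer(HTMLParser):
    """Collects lowercase word tokens and the token span covered by each tag."""

    def __init__(self):
        super().__init__()
        self.tokens = []    # word tokens in document order
        self.extents = []   # (tag, start_token, end_token), end exclusive
        self._open = []     # stack of (tag, start_token) for tags not yet closed

    def handle_starttag(self, tag, attrs):
        self._open.append((tag, len(self.tokens)))

    def handle_endtag(self, tag):
        # Close the most recent open tag with the same name, if there is one.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, start = self._open.pop(i)
                self.extents.append((tag, start, len(self.tokens)))
                break

    def handle_data(self, data):
        self.tokens.extend(re.findall(r"[a-z0-9]+", data.lower()))


def build_index(docs):
    """Return (postings, extents).

    postings: term -> {doc name: [token positions]}
    extents:  tag  -> {doc name: [(start, end)]}
    """
    postings = defaultdict(dict)
    extents = defaultdict(dict)
    for name, html in docs.items():
        parser = ExtentTokenizer()
        parser.feed(html)
        parser.close()
        for pos, term in enumerate(parser.tokens):
            postings[term].setdefault(name, []).append(pos)
        for tag, start, end in parser.extents:
            extents[tag].setdefault(name, []).append((start, end))
    return postings, extents


docs = {
    "Doc-001": "<Html> <Head> Hello </Head> "
               "<Body> I love information retrieval so much! </Body> </Html>"
}
postings, extents = build_index(docs)
print(sorted(postings))
print(dict(extents))
```

Storing token positions in the postings and token ranges in the extents lets a query be restricted to a field, for example by checking whether a term position falls inside a `body` range.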
How the Galago index works

Index folder
    documentLengths
    documentNames
    parts (folder)
        postings
        stemmedPostings
        extents
How the Galago index works

Commands for inspecting an index part (postings, stemmedPostings, extents):
    galago.bat dump-keys
    galago.bat dump-index
Tasks - Search

Text folder: D:\test_collection\Wiki\wiki-small.corpus
Index folder: D:\test_collection\Wiki\index

galago.bat search D:\test_collection\Wiki\index D:\test_collection\Wiki\wiki-small.corpus

The search interface is then available at http://localhost:XXXX
IR experiment: information retrieval tasks + evaluation (TREC style)

Query:
<parameters>
<query>
<number>1</number>
<text> test query </text>
</query>
</parameters>

Retrieved and ranked results:
1 Q0 Zend_Framework_a5ef 1 -14.29629326 galago
1 Q0 KULA-LP_fb81 2 -15.90760040 galago
1 Q0 KKJK_7f20 3 -15.92886543 galago
1 Q0 WNUZ_d533 4 -15.93414688 galago
1 Q0 KZEN_8c0c 5 -15.94256783 galago

Judgements:
1 Q0 KULA-LP_fb81 1
1 Q0 WNUZ_d533 1
1 Q0 KBON_0027 1
1 Q0 Nicky_Wroe_5d39 1
1 Q0 Chemult_(Amtrak_station)_ac76 1

Eval result:
num_ret 1 100
num_rel 1 5
num_rel_ret 1 3
map 1 0.2667
ndcg 1 0.4622
ndcg15 1 0.4622
R-prec 1 0.4000
IR experiment

Batch search task: send multiple queries to Galago.

galago.bat batch-search --index=D:\test_collection\Wiki\index --count=100 D:\test_collection\Wiki\test.query

Query:
<parameters>
<query>
<number>1</number>
<text> test query </text>
</query>
</parameters>
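A query file in the format shown above can be generated rather than typed by hand when there are many queries. The following Python sketch assumes only the element names that appear on the slide (`parameters`, `query`, `number`, `text`); the function name `build_query_parameters` and the second query are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_query_parameters(queries):
    """Build a <parameters> element holding one <query> per (number, text) pair."""
    root = ET.Element("parameters")
    for number, text in queries:
        query = ET.SubElement(root, "query")
        ET.SubElement(query, "number").text = str(number)
        ET.SubElement(query, "text").text = text
    return root


# The first query is the one from the slide; the second is illustrative only.
queries = [(1, "test query"), (2, "information retrieval")]
root = build_query_parameters(queries)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# To save the file for batch-search (the file name is a placeholder):
# with open("test.query", "w", encoding="utf-8") as f:
#     f.write(xml_text)
```

Building the file through an XML library rather than string concatenation means characters such as `&` or `<` in a query text are escaped automatically.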
IR experiment

Save your results in a file, e.g. wiki.query.eval

Retrieved and ranked results:
1 Q0 Zend_Framework_a5ef 1 -14.29629326 galago
1 Q0 KULA-LP_fb81 2 -15.90760040 galago
1 Q0 KKJK_7f20 3 -15.92886543 galago
1 Q0 WNUZ_d533 4 -15.93414688 galago
1 Q0 KZEN_8c0c 5 -15.94256783 galago
IR experiment

You need a judgment file, e.g. wiki.query.judgments

Judgements:
1 Q0 KULA-LP_fb81 1
1 Q0 WNUZ_d533 1
1 Q0 KBON_0027 1
1 Q0 Nicky_Wroe_5d39 1
1 Q0 Chemult_(Amtrak_station)_ac76 1

You can then evaluate your retrieval and ranking performance:

galago.bat eval D:\test_collection\Wiki\wiki.query.eval D:\test_collection\Wiki\wiki.query.judgments
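To see what the evaluation measures, the Python sketch below computes average precision and R-precision from the ranking and judgment lines shown on the slides. It uses the textbook definitions, which may differ in detail from what Galago's eval implements, and all function names are hypothetical. Only the five ranking lines shown on the slides are used, not a full result list, so the average precision here covers just that short list.

```python
def parse_ranking(lines):
    """Parse 'query Q0 doc rank score run' lines into {query: [doc, ...]} by rank."""
    ranked = {}
    for line in lines:
        query, _, doc, rank, _score, _run = line.split()
        ranked.setdefault(query, []).append((int(rank), doc))
    return {q: [doc for _, doc in sorted(pairs)] for q, pairs in ranked.items()}


def parse_judgments(lines):
    """Parse 'query Q0 doc relevance' lines into {query: set of relevant docs}."""
    relevant = {}
    for line in lines:
        query, _, doc, rel = line.split()
        if int(rel) > 0:
            relevant.setdefault(query, set()).add(doc)
    return relevant


def average_precision(docs, relevant):
    """Sum of precision at each relevant hit, divided by the number of relevant docs."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(docs, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(docs, relevant):
    """Precision within the top R results, where R is the number of relevant docs."""
    r = len(relevant)
    return sum(doc in relevant for doc in docs[:r]) / r if r else 0.0


ranking_lines = [
    "1 Q0 Zend_Framework_a5ef 1 -14.29629326 galago",
    "1 Q0 KULA-LP_fb81 2 -15.90760040 galago",
    "1 Q0 KKJK_7f20 3 -15.92886543 galago",
    "1 Q0 WNUZ_d533 4 -15.93414688 galago",
    "1 Q0 KZEN_8c0c 5 -15.94256783 galago",
]
judgment_lines = [
    "1 Q0 KULA-LP_fb81 1",
    "1 Q0 WNUZ_d533 1",
    "1 Q0 KBON_0027 1",
    "1 Q0 Nicky_Wroe_5d39 1",
    "1 Q0 Chemult_(Amtrak_station)_ac76 1",
]

ranking = parse_ranking(ranking_lines)
judgments = parse_judgments(judgment_lines)
ap = average_precision(ranking["1"], judgments["1"])
rp = r_precision(ranking["1"], judgments["1"])
print("average precision (top 5 only):", ap)
print("R-precision:", rp)
```

With five relevant documents, R-precision looks only at the top five results, so the five lines shown are enough to compute it.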
Advanced: read and update the source code!

You can make it better by updating org.galagosearch.core.tools.App.java
And ……