Information Retrieval (3). Prof. Dragomir R. Radev radev@umich.edu. SI650 Winter 2010. … 5. Evaluation of IR systems Reference collections TREC …. Relevance. Difficult to change: fuzzy, inconsistent Methods: exhaustive, sampling, pooling, search-based. Contingency table.
Information Retrieval (3) Prof. Dragomir R. Radev radev@umich.edu
SI650 Winter 2010 … 5. Evaluation of IR systems Reference collections TREC …
Relevance • Difficult to change: fuzzy, inconsistent • Methods: exhaustive, sampling, pooling, search-based
Contingency table
                retrieved     not retrieved
relevant        w = tp        x = fn          n1 = w + x
not relevant    y = fp        z = tn
                n2 = w + y                    N
Precision and Recall • Recall: R = w / (w + x) • Precision: P = w / (w + y)
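The two ratios can be computed directly from the contingency counts. The following is a minimal sketch; the function name `precision_recall` and the example counts are hypothetical and chosen only for illustration.

```python
def precision_recall(w, x, y):
    """Return (precision, recall) from contingency-table counts.

    w: relevant and retrieved (tp)
    x: relevant but not retrieved (fn)
    y: retrieved but not relevant (fp)
    """
    precision = w / (w + y) if (w + y) else 0.0
    recall = w / (w + x) if (w + x) else 0.0
    return precision, recall


# Hypothetical counts: 30 relevant documents retrieved,
# 20 relevant documents missed, 10 non-relevant documents retrieved.
p, r = precision_recall(w=30, x=20, y=10)
print(f"P = {p:.2f}, R = {r:.2f}")  # P = 0.75, R = 0.60
```

Note that z (the true negatives) does not appear in either measure.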
Exercise Go to Google (www.google.com) and search for documents on Tolkien’s “Lord of the Rings”. Try different ways of phrasing the query: e.g., Tolkien, “JRR Tolkien”, +“JRR Tolkien” +“Lord of the Rings”, etc. For each query, compute the precision (P) based on the first 10 documents returned by the search engine. Note! Before starting the exercise, have a clear idea of what a relevant document for your query should look like. Try different information needs. Later, try different queries.
Interpolated average precision (e.g., 11pt) Interpolation – what is precision at recall=0.5?
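To make the interpolation question concrete, here is a small sketch. It assumes one common convention: the interpolated precision at recall level r is the highest precision observed at any recall greater than or equal to r. The function name and the example ranking are hypothetical.

```python
def eleven_point_interpolated(ranking, num_relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0.

    ranking: list of booleans, True where the ranked document is relevant.
    num_relevant: total number of relevant documents for the query.
    """
    # Record (recall, precision) after each position in the ranking.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / num_relevant, hits / rank))

    interpolated = []
    for i in range(11):
        level = i / 10
        # Best precision at any point whose recall reaches this level.
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates, default=0.0))
    return interpolated


# Hypothetical ranking of 10 documents; 4 relevant documents exist in total.
ranking = [True, False, True, False, False, True, False, False, False, True]
values = eleven_point_interpolated(ranking, num_relevant=4)
print("P at recall 0.5:", round(values[5], 3))
print("11-pt average:", round(sum(values) / 11, 3))
```

In this ranking the raw precision exactly when recall first reaches 0.5 is 2/3, and no later point does better, so the interpolated value at recall 0.5 is also 2/3.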
Issues • Why not use accuracy A=(w+z)/N? • Average precision • Average P at given “document cutoff values” • Report when P=R • F measure: F = (b^2 + 1)PR / (b^2 P + R) • F1 measure: F1 = 2/(1/R+1/P) : harmonic mean of P and R
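A short sketch of the F measure, together with one common answer to the accuracy question: when almost every document is non-relevant, a system that retrieves nothing still scores near-perfect accuracy. The function names and all counts below are hypothetical.

```python
def f_measure(p, r, b=1.0):
    """F = (b^2 + 1) P R / (b^2 P + R); b = 1 gives the harmonic mean F1."""
    if p == 0 and r == 0:
        return 0.0
    return (b * b + 1) * p * r / (b * b * p + r)


def accuracy(w, x, y, z):
    """A = (w + z) / N, where N = w + x + y + z."""
    return (w + z) / (w + x + y + z)


# F1 agrees with the harmonic-mean form 2 / (1/R + 1/P).
p, r = 0.75, 0.6
print(round(f_measure(p, r), 4), round(2 / (1 / r + 1 / p), 4))

# Hypothetical collection of 1,000,000 documents with 10 relevant ones.
# A system that retrieves nothing has w = 0, x = 10, y = 0, z = 999990.
print(accuracy(w=0, x=10, y=0, z=999_990))  # 0.99999, yet recall is 0
```

With b > 1 the measure weights recall more heavily; with b < 1 it favors precision.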
Kappa • N: number of items (index i) • n: number of categories (index j) • k: number of annotators
Kappa (cont’d) • P(A) = 370/400 = 0.925 • P(-) = (10+20+70+70)/800 = 0.2125 • P(+) = (10+20+300+300)/800 = 0.7875 • P(E) = 0.2125 * 0.2125 + 0.7875 * 0.7875 = 0.665 • K = (0.925-0.665)/(1-0.665) = 0.776 • Kappa higher than 0.67 is tentatively acceptable; higher than 0.8 is good
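The arithmetic above can be reproduced with a few lines of code. This sketch covers only the two-judge, two-category case with pooled marginals, which is what the numbers on the slide correspond to. The 2x2 layout of the counts (300 joint "relevant", 70 joint "not relevant", 20 and 10 disagreements) is inferred from the sums shown and is an assumption, as is the function name.

```python
def kappa_two_judges(yes_yes, yes_no, no_yes, no_no):
    """Kappa for two judges and two categories, using pooled marginals."""
    n = yes_yes + yes_no + no_yes + no_no
    # Observed agreement P(A).
    p_agree = (yes_yes + no_no) / n
    # Pooled category proportions over all 2n individual judgments.
    p_yes = (2 * yes_yes + yes_no + no_yes) / (2 * n)
    p_no = (2 * no_no + yes_no + no_yes) / (2 * n)
    # Expected chance agreement P(E).
    p_expected = p_yes ** 2 + p_no ** 2
    return (p_agree - p_expected) / (1 - p_expected)


# Counts inferred from the slide's arithmetic (assumed layout).
print(round(kappa_two_judges(300, 20, 10, 70), 3))  # 0.776
```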
Sample TREC query
<top>
<num> Number: 305
<title> Most Dangerous Vehicles
<desc> Description: Which are the most crashworthy, and least crashworthy, passenger vehicles?
<narr> Narrative: A relevant document will contain information on the crashworthiness of a given vehicle or vehicles that can be used to draw a comparison with other vehicles. The document will have to describe/compare vehicles, not drivers. For instance, it should be expected that vehicles preferred by 16-25 year-olds would be involved in more crashes, because that age group is involved in more crashes. I would view number of fatalities per 100 crashes to be more revealing of a vehicle's crashworthiness than the number of crashes per 100,000 miles, for example.
</top>
LA031689-0177 FT922-1008 LA090190-0126 LA101190-0218 LA082690-0158 LA112590-0109 FT944-136 LA020590-0119 FT944-5300 LA052190-0048 LA051689-0139 FT944-9371 LA032390-0172 LA042790-0172 LA021790-0136 LA092289-0167 LA111189-0013 LA120189-0179 LA020490-0021 LA122989-0063 LA091389-0119 LA072189-0048 FT944-15615 LA091589-0101 LA021289-0208
<DOCNO> LA031689-0177 </DOCNO> <DOCID> 31701 </DOCID> <DATE><P>March 16, 1989, Thursday, Home Edition </P></DATE> <SECTION><P>Business; Part 4; Page 1; Column 5; Financial Desk </P></SECTION> <LENGTH><P>586 words </P></LENGTH> <HEADLINE><P>AGENCY TO LAUNCH STUDY OF FORD BRONCO II AFTER HIGH RATE OF ROLL-OVER ACCIDENTS </P></HEADLINE> <BYLINE><P>By LINDA WILLIAMS, Times Staff Writer </P></BYLINE> <TEXT> <P>The federal government's highway safety watchdog said Wednesday that the Ford Bronco II appears to be involved in more fatal roll-over accidents than other vehicles in its class and that it will seek to determine if the vehicle itself contributes to the accidents. </P> <P>The decision to do an engineering analysis of the Ford Motor Co. utility-sport vehicle grew out of a federal accident study of the Suzuki Samurai, said Tim Hurd, a spokesman for the National Highway Traffic Safety Administration. NHTSA looked at Samurai accidents after Consumer Reports magazine charged that the vehicle had basic design flaws. </P> <P>Several Fatalities </P> <P>However, the accident study showed that the "Ford Bronco II appears to have a higher number of single-vehicle, first event roll-overs, particularly those involving fatalities," Hurd said. The engineering analysis of the Bronco, the second of three levels of investigation conducted by NHTSA, will cover the 1984-1989 Bronco II models, the agency said. </P> <P>According to a Fatal Accident Reporting System study included in the September report on the Samurai, 43 Bronco II single-vehicle roll-overs caused fatalities, or 19 of every 100,000 vehicles. There were eight Samurai fatal roll-overs, or 6 per 100,000; 13 involving the Chevrolet S10 Blazers or GMC Jimmy, or 6 per 100,000, and six fatal Jeep Cherokee roll-overs, for 2.5 per 100,000. After the accident report, NHTSA declined to investigate the Samurai. </P> ... 
</TEXT> <GRAPHIC><P> Photo, The Ford Bronco II "appears to have a higher number of single-vehicle, first event roll-overs," a federal official said. </P></GRAPHIC> <SUBJECT> <P>TRAFFIC ACCIDENTS; FORD MOTOR CORP; NATIONAL HIGHWAY TRAFFIC SAFETY ADMINISTRATION; VEHICLE INSPECTIONS; RECREATIONAL VEHICLES; SUZUKI MOTOR CO; AUTOMOBILE SAFETY </P> </SUBJECT> </DOC>
TREC (cont’d) • http://trec.nist.gov/tracks.html • http://trec.nist.gov/presentations/presentations.html
Most used reference collections • Generic retrieval: OHSUMED, CRANFIELD, CACM • Text classification: Reuters, 20newsgroups • Question answering: TREC-QA • Web: DOTGOV, wt100g • Blogs: Buzzmetrics datasets • TREC ad hoc collections, 2-6 GB • TREC Web collections, 2-100GB
Comparing two systems • Comparing A and B • One query? • Average performance? • Need: A to consistently outperform B [this slide: courtesy James Allan]
The sign test • Example 1: • A > B (12 times) • A = B (25 times) • A < B (3 times) • p ≈ 0.035 (two-sided, ties ignored; significant at the 5% level) • Example 2: • A > B (18 times) • A < B (9 times) • p ≈ 0.122 (two-sided; not significant at the 5% level) • http://www.fon.hum.uva.nl/Service/Statistics/Sign_Test.html [this slide: courtesy James Allan]
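The p-values in both examples can be checked with an exact binomial computation. This is a minimal sketch assuming a two-sided test in which ties are dropped and, under the null hypothesis, each remaining query is equally likely to favor either system. The function name is hypothetical.

```python
from math import comb


def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test p-value; ties must already be excluded."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    # Probability of k or fewer "wins" out of n under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(round(sign_test_p(12, 3), 4))  # Example 1: 0.0352
print(round(sign_test_p(18, 9), 4))  # Example 2: 0.1221
```

The 25 ties in Example 1 carry no information about which system is better, so only the 15 non-tied queries enter the computation.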
Other tests • Student t-test: takes into account the actual performances, not just which system is better • http://www.fon.hum.uva.nl/Service/Statistics/Student_t_Test.html • http://www.socialresearchmethods.net/kb/stat_t.php • Wilcoxon Matched-Pairs Signed-Ranks Test • http://www.fon.hum.uva.nl/Service/Statistics/Signed_Rank_Test.html
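Unlike the sign test, the paired t-test uses the size of each per-query difference. The sketch below computes only the t statistic and its degrees of freedom; looking up the p-value requires a t-distribution table or a statistics package. The per-query scores are made-up illustration values, not results from any real system, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-query scores.

    Assumes at least two queries and differences that are not all equal.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical average-precision scores for systems A and B on five queries.
a = [0.45, 0.50, 0.62, 0.38, 0.71]
b = [0.40, 0.48, 0.55, 0.39, 0.60]
t, df = paired_t_statistic(a, b)
print(f"t = {t:.2f} with {df} degrees of freedom")
```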
IR Winter 2010 … 6. Automated indexing/labeling Compression …
Indexing methods • Manual: e.g., Library of Congress subject headings, MeSH • Automatic: e.g., TF*IDF based
Medicine
CLASS R - MEDICINE
Subclass R
R5-920 Medicine (General)
R5-130.5 General works
R131-687 History of medicine. Medical expeditions
R690-697 Medicine as a profession. Physicians
R702-703 Medicine and the humanities. Medicine and disease in relation to history, literature, etc.
R711-713.97 Directories
R722-722.32 Missionary medicine. Medical missionaries
R723-726 Medical philosophy. Medical ethics
R726.5-726.8 Medicine and disease in relation to psychology. Terminal care. Dying
R727-727.5 Medical personnel and the public. Physician and the public
R728-733 Practice of medicine. Medical practice economics
R735-854 Medical education. Medical schools. Research
R855-855.5 Medical technology
R856-857 Biomedical engineering. Electronics. Instrumentation
R858-859.7 Computer applications to medicine. Medical informatics
R864 Medical records
R895-920 Medical physics. Medical radiology. Nuclear medicine
Automatic methods • TF*IDF: pick terms with the highest TF*IDF scores • Centroid-based: pick terms that appear in the centroid with high scores • The maximal marginal relevance principle (MMR) • Related to summarization, snippet generation
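A minimal sketch of the first method, picking the highest-scoring TF*IDF terms for one document. It assumes one simple weighting variant (raw term frequency times log(N/df)); many other variants exist. The function name and the tiny three-document collection are hypothetical.

```python
import math
from collections import Counter


def top_tfidf_terms(docs, doc_index, k=3):
    """Return the k terms of docs[doc_index] with the highest TF*IDF.

    docs: list of documents, each a list of tokens.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    tf = Counter(docs[doc_index])
    scores = {term: tf[term] * math.log(n / df[term]) for term in tf}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


# Hypothetical toy collection.
docs = [
    "ford bronco rollover accident study".split(),
    "suzuki samurai accident study".split(),
    "ford bronco bronco rollover fatalities".split(),
]
print(top_tfidf_terms(docs, doc_index=2, k=2))  # ['fatalities', 'bronco']
```

Here "fatalities" ranks first because it occurs in only one document, while "bronco" ranks second because it is repeated within the document despite appearing elsewhere.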
Compression • Methods • Fixed length codes • Huffman coding • Ziv-Lempel codes
Fixed length codes • Binary representations • ASCII • Representational power (2^k symbols where k is the number of bits)
Variable length codes • Alphabet:
A .-      N -.      0 -----
B -...    O ---     1 .----
C -.-.    P .--.    2 ..---
D -..     Q --.-    3 ...--
E .       R .-.     4 ....-
F ..-.    S ...     5 .....
G --.     T -       6 -....
H ....    U ..-     7 --...
I ..      V ...-    8 ---..
J .---    W .--     9 ----.
K -.-     X -..-
L .-..    Y -.--
M --      Z --..
• Demo: • http://www.scphillips.com/morse/
Most frequent letters in English • Most frequent letters: • E T A O I N S H R D L U • Demo: • http://www.amstat.org/publications/jse/secure/v7n2/count-char.cfm • Also: bigrams: • TH HE IN ER AN RE ND AT ON NT
Huffman coding • Developed by David Huffman (1952) • Average of 5 bits per character (37.5% compression relative to 8-bit fixed-length codes) • Based on frequency distributions of symbols • Algorithm: iteratively build a tree of symbols, starting with the two least frequent symbols and repeatedly merging the two least frequent nodes until a single tree remains
[Figure: example Huffman code tree over the symbols a through j; each branch is labeled 0 or 1]
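The tree-building procedure can be sketched with a priority queue. This is an illustrative implementation, not the code behind the example tree: the resulting codewords depend on how ties are broken, so they will generally differ from the figure. All function names are hypothetical, and the sample string is used only as convenient input.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {sym: "0" for sym in freq}
    # Heap entries: (weight, tie_breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent nodes under a new parent.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # prefix-free, so the first match is correct
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)


text = "peter_piper_picked_a_peck_of_pickled_peppers"
codes = huffman_code(text)
bits = encode(text, codes)
print(len(bits) / len(text), "bits per character on average")
print(decode(bits, codes) == text)  # True
```

Because no codeword is a prefix of another, the decoder needs no separators, but a single flipped, inserted, or deleted bit can throw off everything that follows it.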
Exercise • Consider the bit string: 01101101111000100110001110100111000110101101011101 • Use the Huffman code from the example to decode it. • Try inserting, deleting, and switching some bits at random locations and try decoding.
Extensions • Word-based • Domain/genre dependent models
Ziv-Lempel coding • Two types - one is known as LZ77 (used in GZIP) • Code: set of triples <a,b,c> • a: how far back in the decoded text to look for the upcoming text segment • b: how many characters to copy • c: new character to add to complete segment
<0,0,p> p
<0,0,e> pe
<0,0,t> pet
<2,1,r> peter
<0,0,_> peter_
<6,1,i> peter_pi
<8,2,r> peter_piper
<6,3,c> peter_piper_pic
<0,0,k> peter_piper_pick
<7,1,d> peter_piper_picked
<7,1,a> peter_piper_picked_a
<9,2,e> peter_piper_picked_a_pe
<9,2,_> peter_piper_picked_a_peck_
<0,0,o> peter_piper_picked_a_peck_o
<0,0,f> peter_piper_picked_a_peck_of
<17,5,l> peter_piper_picked_a_peck_of_pickl
<12,1,d> peter_piper_picked_a_peck_of_pickled
<16,3,p> peter_piper_picked_a_peck_of_pickled_pep
<3,2,r> peter_piper_picked_a_peck_of_pickled_pepper
<0,0,s> peter_piper_picked_a_peck_of_pickled_peppers
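A decoder for this triple format takes only a few lines. The sketch below follows the <a,b,c> definition given earlier (look back a characters, copy b characters, then append c) and replays the triples from the example. The function name is hypothetical.

```python
def lz77_decode(triples):
    """Decode a list of (back, length, char) triples into text."""
    out = []
    for back, length, char in triples:
        start = len(out) - back
        # Copy one character at a time so a copy may overlap its own output.
        for i in range(length):
            out.append(out[start + i])
        out.append(char)
    return "".join(out)


triples = [
    (0, 0, "p"), (0, 0, "e"), (0, 0, "t"), (2, 1, "r"), (0, 0, "_"),
    (6, 1, "i"), (8, 2, "r"), (6, 3, "c"), (0, 0, "k"), (7, 1, "d"),
    (7, 1, "a"), (9, 2, "e"), (9, 2, "_"), (0, 0, "o"), (0, 0, "f"),
    (17, 5, "l"), (12, 1, "d"), (16, 3, "p"), (3, 2, "r"), (0, 0, "s"),
]
print(lz77_decode(triples))  # peter_piper_picked_a_peck_of_pickled_peppers
```

Twenty triples reproduce the 44-character string; whether that is a net saving depends on how many bits each triple takes to store.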
Links on text compression • Data compression: • http://www.data-compression.info/ • Calgary corpus: • http://links.uwaterloo.ca/calgary.corpus.html • Huffman coding: • http://www.compressconsult.com/huffman/ • http://en.wikipedia.org/wiki/Huffman_coding • LZ • http://en.wikipedia.org/wiki/LZ77
100 alternative search engines • http://rss.slashdot.org/~r/Slashdot/slashdot/~3/83468703/article.pl
Readings • 2: MRS9 • 3: MRS13, MRS14 • 4: MRS15, MRS16