Information Retrieval
February 10, 2003
Handout #4
Course Information
• Instructor: Dragomir R. Radev (radev@si.umich.edu)
• Office: 3080, West Hall Connector
• Phone: (734) 615-5225
• Office hours: M&F 11-12
• Course page: http://tangra.si.umich.edu/~radev/650/
• Class meets on Mondays, 1-4 PM in 409 West Hall
Suffix trees
1234567890123456789012345678901234567890123456789012345678901234567
This is a text. A text has many words. Words are made from letters.
[Figure: Patricia tree built over the text above. Numeric labels recovered from the figure: 60, 50, 28, 19, 11, 40, 33. Character labels: l, d, a, m, n, t, ‘ ‘, e x t ., w, o r d s .]
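The idea of indexing a text by its word-initial suffixes can be sketched with a sorted suffix array, which is a simpler stand-in for the Patricia tree on the slide and not the same data structure. The function names `word_suffix_array` and `find_prefix` are hypothetical, and offsets are 0-based, whereas the ruler on the slide counts from 1.

```python
import re

TEXT = "This is a text. A text has many words. Words are made from letters."

def word_suffix_array(text):
    """Return 0-based offsets of word-initial suffixes, sorted lexicographically."""
    starts = [m.start() for m in re.finditer(r"\w+", text)]
    return sorted(starts, key=lambda i: text[i:])

def find_prefix(text, suffix_array, prefix):
    """Return the offsets of all indexed suffixes that begin with `prefix`."""
    # Binary search for the first suffix whose leading characters are >= prefix.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + len(prefix)] < prefix:
            lo = mid + 1
        else:
            hi = mid
    # Matching suffixes are contiguous in the sorted order; collect them.
    hits = []
    while lo < len(suffix_array) and text.startswith(prefix, suffix_array[lo]):
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)

sa = word_suffix_array(TEXT)
print(find_prefix(TEXT, sa, "text"))  # offsets of both occurrences of "text"
```

Because the suffixes are sorted, every query is a binary search followed by a short scan. A Patricia tree answers the same prefix queries by walking branches character by character instead.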
Sequential string searching
• Boyer-Moore algorithm
• Example: search for “cats” in “the catalog of all cats”
• Some preprocessing of the pattern is needed.
• Demos:
  http://www.blarg.com/~doyle/pages/bmi.html
  http://www-sr.informatik.uni-tuebingen.de/~buehler/BM/BM.html
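A minimal sketch of the search on the slide's example, using the Horspool simplification of Boyer-Moore: it keeps only a bad-character shift table and omits the good-suffix rule of the full algorithm. The function name `horspool_search` is hypothetical.

```python
def horspool_search(text, pattern):
    """Return the 0-based index of the first match of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Preprocessing: for each character of the pattern except the last, record
    # its distance from the end of the pattern (rightmost occurrence wins).
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the table entry for the text character aligned with the
        # last pattern position; characters not in the pattern skip m places.
        pos += shift.get(text[pos + m - 1], m)
    return -1

print(horspool_search("the catalog of all cats", "cats"))
```

The preprocessing step is the shift table: it lets the search jump over several text characters after a mismatch instead of advancing one position at a time.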
LSI (latent semantic indexing)
• Dimensionality reduction
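As a sketch of what dimensionality reduction means here: LSI is usually formulated as a truncated singular value decomposition of the term-document matrix. The symbols below are notation assumed for illustration and do not come from the slides: A is the term-document matrix, k is the number of retained dimensions, and q is a query vector in term space.

```latex
A = U \Sigma V^{\top}, \qquad
A_k = U_k \Sigma_k V_k^{\top} \quad (k \ll \operatorname{rank}(A)), \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q
```

Here U_k and V_k keep only the first k columns of U and V, and Σ_k keeps the k largest singular values. Documents and the folded-in query q̂ are then compared in the k-dimensional space rather than in the full term space.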
Text tiling
• Change in cohesion = topic boundary
[Figure: plot of cohesion across the text] Example from Manning and Schuetze
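The idea that a drop in cohesion marks a topic boundary can be sketched as follows. This is a simplified illustration and not the full TextTiling procedure: it scores each gap between sentences by the cosine similarity of the word counts on either side and reports the gap with the lowest score. The function names, the block size, and the toy sentences are assumptions made for the example.

```python
import math
import re
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_cohesion(sentences, block=2):
    """Score every gap between adjacent sentences; low score = low cohesion."""
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        # Pool up to `block` sentences on each side of the gap.
        left = sum(bags[max(0, gap - block):gap], Counter())
        right = sum(bags[gap:gap + block], Counter())
        scores.append(cosine(left, right))
    return scores

def lowest_cohesion_gap(scores):
    """Return the number of sentences before the least cohesive gap."""
    return scores.index(min(scores)) + 1

sentences = [
    "Cats chase mice.",
    "Cats sleep all day.",
    "Stocks fell sharply.",
    "Stocks and bonds fell.",
]
scores = gap_cohesion(sentences)
print(scores, lowest_cohesion_gap(scores))
```

In this toy input the two sentences about cats share no words with the two about stocks, so the gap after the second sentence has the lowest cohesion.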