Information Retrieval Ch 23.2
Information retrieval • Goal: finding documents relevant to a user's query • Example: search engines on the World Wide Web • Characteristics of an IR system • Document collection • Query language • Result set • Presentation of the result set
Evaluating IR systems • Precision • (relevant documents in result set) / (all documents in result set) • Recall • (relevant documents in result set) / (all relevant documents in the collection)
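The two ratios above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `precision_recall` and the document IDs are hypothetical, and returning 0.0 for an empty result set or an empty relevant set is an assumption made here to avoid division by zero.

```python
def precision_recall(result_set, relevant):
    """Return (precision, recall) for a result set, given the relevant documents."""
    result_set, relevant = set(result_set), set(relevant)
    hits = len(result_set & relevant)  # relevant documents that were actually returned
    # Assumption: report 0.0 when a denominator would be zero.
    precision = hits / len(result_set) if result_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 of them relevant,
# and 6 relevant documents in the whole collection.
returned = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d7", "d8", "d9"}
p, r = precision_recall(returned, relevant)
print(p, r)  # 0.75 0.5
```

Returning more documents tends to raise recall and lower precision, which is why the two measures are usually reported together.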
Presentation of result sets • Relevance feedback • The user indicates which documents are relevant • Document classification • Uses a preexisting taxonomy of topics • Ch 18 • Document clustering • The tree of categories is created from scratch • Ch 20.3 • Agglomerative clustering: repeatedly merge the two nearest documents or clusters • K-means clustering: assign documents to k categories
K-means clustering • Step 1: pick k documents at random to represent the k categories • Step 2: assign every document to the closest category • Step 3: compute the mean of each cluster and use the k means as the new representatives of the k categories • Repeat steps 2 and 3 until convergence (the assignments stop changing)
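The steps of the k-means slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the slides' own implementation: documents are assumed to be already represented as numeric tuples, closeness is measured by squared Euclidean distance, and the names `k_means`, `seed`, and `max_iters` are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def k_means(docs, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    # Pick k documents at random to represent the k categories.
    centers = rng.sample(docs, k)
    assignment = None
    for _ in range(max_iters):
        # Assign every document to the closest category.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(d, centers[c]))
            for d in docs
        ]
        # Convergence: stop when no document changes category.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each category as the mean of its cluster.
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = mean(members)
    return assignment, centers


# Hypothetical data: two well-separated groups of 2-D "document" vectors.
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignment, centers = k_means(docs, k=2)
print(assignment)
```

The result depends on the random initial choice in general; on this small data set the two groups are far enough apart that each ends up in its own category.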
Implementing IR systems • Lexicon • Stop words • Inverted file • Vector space model
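To make the lexicon, stop-word, and inverted-file items concrete, here is a small sketch. Everything in it is an assumption for illustration: the stop-word list, whitespace tokenization, the function names `build_inverted_index` and `search`, and the choice to treat a query as a conjunction (AND) of its terms.

```python
from collections import defaultdict

# Illustrative stop-word list (an assumption; real systems use longer lists).
STOP_WORDS = {"the", "a", "of", "and", "in", "on", "is"}


def build_inverted_index(docs):
    """Map each non-stop word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every non-stop query word."""
    terms = [w for w in query.lower().split() if w not in STOP_WORDS]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical three-document collection.
docs = {1: "the cat sat on the mat", 2: "the dog sat", 3: "cat and dog"}
index = build_inverted_index(docs)
lexicon = sorted(index)  # the lexicon is the set of indexed words
print(lexicon)
print(search(index, "cat dog"))
```

The keys of the index play the role of the lexicon, and each posting set lets a query be answered without scanning the documents themselves.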
Vector Space Model • Transform each document into a vector of term counts • Di = ABC, Dj = BBC • Over the terms (A, B, C): Di = (1, 1, 1), Dj = (0, 2, 1) • Measure the distance between two documents • Dist(Di, Dj) = sqrt((1-0)^2 + (1-2)^2 + (1-1)^2) = sqrt(2) ≈ 1.41 • Retrieve the documents with the smallest distance
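The worked example on this slide can be reproduced with a short sketch. It assumes, as the slide's example does, that each character is one term and that the vocabulary is (A, B, C); the helper names `to_vector` and `euclidean_distance` are hypothetical.

```python
import math
from collections import Counter


def to_vector(doc, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(doc)  # assumption: every character is one term
    return [counts[term] for term in vocabulary]


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


vocabulary = ["A", "B", "C"]
d_i = to_vector("ABC", vocabulary)   # [1, 1, 1]
d_j = to_vector("BBC", vocabulary)   # [0, 2, 1]
dist = euclidean_distance(d_i, d_j)  # sqrt(2), about 1.41
print(d_i, d_j, dist)
```

For retrieval, the query would be turned into a vector in the same way and the documents ranked by increasing distance to it.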