Rohan Sah: Data Indexing
Focus
• Used Lucene (indexing library) via Eclipse
• Store data
• Categories
• Types
• File names
• Easy recall

==city.txt==
Chicago city Illinois
New York city popular hot dogs

Index
Chicago | city | Illinois
New York | city | popular | hot dogs
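To make "store data, easy recall" concrete, here is a minimal sketch of a term-to-file lookup. It is not the Lucene code used in the project: a plain `HashMap` stands in for the index, the class and method names (`RecallSketch`, `addLine`, `filesFor`) are hypothetical, and it assumes the fields on each line are tab-separated.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for an index: maps each term to the files it came from. */
class RecallSketch {
    // Lower-cased term -> sorted set of file names that contain it.
    private final Map<String, Set<String>> termToFiles = new HashMap<>();

    /** Indexes every tab-separated field of one line (assumed format). */
    void addLine(String fileName, String line) {
        for (String field : line.split("\t")) {
            String term = field.trim().toLowerCase(Locale.ROOT);
            if (!term.isEmpty()) {
                termToFiles.computeIfAbsent(term, k -> new TreeSet<>()).add(fileName);
            }
        }
    }

    /** Recall: one map lookup instead of rescanning the source files. */
    Set<String> filesFor(String term) {
        String key = term.trim().toLowerCase(Locale.ROOT);
        return termToFiles.getOrDefault(key, Collections.<String>emptySet());
    }
}
```

After calling `addLine("city.txt", ...)` for each line of the file, `filesFor("Illinois")` returns the file names that mention the term without reading the file again.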
Why?
• Mentor: data mining and context guesses
• Over text, guess nouns and the descriptions related to them
• (Astronaut: space, moon, NASA, etc.)
• Indexing allows easier recall
• Running searches over extremely large files takes too much time
• Simpler format
Algorithm
• 2-pass algorithm

==city.txt== (multiple list/text files)
Chicago city Illinois
New York city popular hot dogs

• Make preliminary index
• Store category names in a hash map (removes duplicates)
• Use the terms in the hash map to query the preliminary index
• Create final index
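The two passes above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: plain Java collections replace both Lucene indexes, the first tab-separated field of each line is taken to be the category, and the names (`TwoPassIndexSketch`, `Entry`, `buildPreliminaryIndex`, `buildFinalIndex`) are hypothetical. The third input line in `main` is made up to show a duplicate category being merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the 2-pass build; collections stand in for the Lucene indexes. */
class TwoPassIndexSketch {

    /** One list line: a category followed by its descriptions. */
    static final class Entry {
        final String category;
        final List<String> descriptions;

        Entry(String category, List<String> descriptions) {
            this.category = category;
            this.descriptions = descriptions;
        }
    }

    /** Pass 1: every non-empty line becomes one preliminary entry. */
    static List<Entry> buildPreliminaryIndex(List<String> lines) {
        List<Entry> prelim = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] fields = line.split("\t");
            List<String> descriptions = new ArrayList<>();
            for (int i = 1; i < fields.length; i++) {
                String d = fields[i].trim();
                if (!d.isEmpty()) descriptions.add(d);
            }
            prelim.add(new Entry(fields[0].trim(), descriptions));
        }
        return prelim;
    }

    /** Pass 2: query the preliminary index once per unique category and merge the hits. */
    static Map<String, Set<String>> buildFinalIndex(List<Entry> prelim) {
        // Category names are map keys, so duplicates collapse automatically.
        Map<String, Set<String>> finalIndex = new LinkedHashMap<>();
        for (Entry e : prelim) {
            finalIndex.putIfAbsent(e.category, new LinkedHashSet<String>());
        }
        for (Map.Entry<String, Set<String>> slot : finalIndex.entrySet()) {
            for (Entry e : prelim) { // linear scan standing in for an index query
                if (e.category.equals(slot.getKey())) {
                    slot.getValue().addAll(e.descriptions);
                }
            }
        }
        return finalIndex;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Chicago\tcity\tIllinois",
                "New York\tcity\tpopular hot dogs",
                "Chicago\tcity\tLake Michigan"); // made-up duplicate category
        System.out.println(buildFinalIndex(buildPreliminaryIndex(lines)));
    }
}
```

The sketch keeps the whole preliminary index and the category map in memory at the same time, which is the resource cost the two-pass design pays for duplicate-free categories.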
Shortcomings
• List format requires large storage
• Hash map and preliminary index consume resources
• Specific syntax for lists: [category] \t [description] \t [description]…
• Index can be specific
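The strictness of the list syntax can be illustrated with a small parser that rejects anything other than tab-separated fields. This is a sketch under assumptions: the class and method names (`ListLineParser`, `ParsedLine`, `parse`) are hypothetical, and requiring at least one description per line is a guess about the format.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical strict parser for lines of the form [category] \t [description] \t ... */
class ListLineParser {

    /** Result of parsing one well-formed line. */
    static final class ParsedLine {
        final String category;
        final List<String> descriptions;

        ParsedLine(String category, List<String> descriptions) {
            this.category = category;
            this.descriptions = Collections.unmodifiableList(descriptions);
        }
    }

    /** Parses one line, or throws IllegalArgumentException if it breaks the syntax. */
    static ParsedLine parse(String line) {
        // Limit -1 keeps trailing empty fields so they can be rejected below.
        String[] fields = line.split("\t", -1);
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected [category] \\t [description]: " + line);
        }
        List<String> descriptions = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            if (field.isEmpty()) {
                throw new IllegalArgumentException("empty field " + i + " in: " + line);
            }
            if (i > 0) descriptions.add(field);
        }
        return new ParsedLine(fields[0].trim(), descriptions);
    }
}
```

A line that separates fields with spaces instead of tabs is read as a single field and rejected, which shows how little tolerance the format leaves for hand-edited lists.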