Hyper-Searching the Web
Basic search engine
• Examples: AltaVista, InfoSeek, HotBot, Lycos, Excite, Google, etc.
• Maintains an index of every word found
• Works in three stages: crawling, indexing, and returning results
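The "index of every word found" is usually an inverted index: a map from each word to the pages that contain it. Below is a minimal sketch, assuming the crawl has already produced a dict of page text keyed by URL (the crawling step itself is not shown, and the names `build_index` and `search` are hypothetical).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    # Indexing step: map every word to the set of URLs that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Returning results: pages that contain every query word (boolean AND).
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


pages = {  # hypothetical crawl output
    "a.html": "The jaguar is a big cat",
    "b.html": "Jaguar cars are fast",
    "c.html": "A cat sleeps",
}
index = build_index(pages)
print(search(index, "jaguar cat"))  # {'a.html'}
```

Note that the index only knows words, not meanings: a query for "jaguar" returns both the animal page and the car page.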
Basic search engine
• Different engines use different ranking systems
  - Most use heuristics (the easiest solution), e.g. counting how many times the query keywords appear on a page
  - Google uses PageRank
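The keyword-counting heuristic can be sketched in a few lines. This is an illustration of the general idea, not any particular engine's formula; the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(text: str, query: str) -> int:
    # Heuristic: total number of occurrences of the query words in the page.
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in set(tokenize(query)))


def rank(pages: dict[str, str], query: str) -> list[tuple[str, int]]:
    # Keep pages with at least one hit; highest count first, URL as tie-break.
    scored = [(url, keyword_score(text, query)) for url, text in pages.items()]
    scored = [(url, score) for url, score in scored if score > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(rank({"a": "cat cat dog", "b": "cat", "c": "dog"}, "cat"))
# [('a', 2), ('b', 1)]
```

The weakness is visible in the scoring itself: a page that simply repeats a keyword many times outranks a more useful page, which is one reason link-based methods such as PageRank were attractive.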
Basic search engine
• Has no idea of the searcher’s intent, so the “best” result is hard to achieve
• Problems with synonymy (e.g. “car” and “automobile”) and polysemy (e.g. “jaguar”)
• One solution: store semantic relations
  - only helps with synonymy
• Cannot identify concepts or author intent
  - e.g. IBM’s site does not say “computer”
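A minimal sketch of why stored semantic relations help with synonymy but not polysemy, assuming a small hand-written synonym table (`SYNONYMS` is hypothetical) used for query expansion:

```python
import re

# Hypothetical stored semantic relations: each word maps to its synonyms.
SYNONYMS: dict[str, set[str]] = {
    "car": {"automobile"},
    "automobile": {"car"},
}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand_query(query: str) -> set[str]:
    # Add every stored synonym of every query word.
    terms = tokenize(query)
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def matching_pages(pages: dict[str, str], query: str) -> list[str]:
    # A page matches if it shares any word with the expanded query.
    terms = expand_query(query)
    return sorted(url for url, text in pages.items() if terms & tokenize(text))


pages = {
    "dealer": "Used automobile for sale",
    "zoo": "The jaguar is a big cat",
    "motors": "Jaguar car dealership",
}
print(matching_pages(pages, "car"))     # ['dealer', 'motors']  synonymy solved
print(matching_pages(pages, "jaguar"))  # ['motors', 'zoo']     both senses mixed
```

Expansion lets "car" find the "automobile" page, but a single word with two meanings still returns both senses, because the table says nothing about which meaning the searcher intended.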
Cluster search engine
• Example: Clusty
• Clusters results into categories/themes
• Can show results that another search engine would rank lower
  - because words have different meanings, it can surface the less-searched-for senses
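The slide does not describe Clusty’s actual clustering method, so the sketch below swaps in a simple stand-in: single-pass “leader” clustering of result snippets by Jaccard similarity of their word sets. The stopword list and the 0.2 threshold are arbitrary assumptions.

```python
import re

STOPWORDS = {"the", "a", "is", "of", "and", "for", "in"}  # assumed, minimal


def terms(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_results(snippets: list[str], threshold: float = 0.2) -> list[list[str]]:
    # Each cluster keeps the union of its members' words plus the members.
    clusters: list[tuple[set[str], list[str]]] = []
    for snippet in snippets:
        words = terms(snippet)
        best_index, best_sim = None, threshold
        for i, (cluster_words, _) in enumerate(clusters):
            sim = jaccard(words, cluster_words)
            if sim >= best_sim:
                best_index, best_sim = i, sim
        if best_index is None:
            clusters.append((set(words), [snippet]))  # start a new theme
        else:
            cluster_words, members = clusters[best_index]
            cluster_words |= words
            members.append(snippet)
    return [members for _, members in clusters]


results = [
    "Jaguar big cat rainforest habitat",
    "Jaguar cat diet rainforest",
    "Jaguar car dealer prices",
    "Used jaguar car prices",
]
print(cluster_results(results))  # animal snippets in one group, car snippets in another
```

Because each sense gets its own group, the car results stay visible even if the animal pages dominate the overall ranking.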
Meta-search engine
• Examples: Dogpile, Surfwax, Copernic, etc.
• Sends the searcher’s query to a database of search engines
• Claimed to be no better than its database; the referenced search engines are often small, free, commercial
• Users can create their own on Google, with up to 5,000 URLs as the “database”
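A minimal sketch of the fan-out-and-merge idea. The “engines” here are hypothetical stand-in functions (no real search service is called), and merging by Borda-style rank points is an assumption; the slide does not say how Dogpile or the others combine results.

```python
from collections import defaultdict
from typing import Callable

Engine = Callable[[str], list[str]]  # hypothetical: query -> ranked URLs


def meta_search(query: str, engines: list[Engine], depth: int = 10) -> list[str]:
    # Send the same query to every engine and award rank-based points.
    scores: dict[str, float] = defaultdict(float)
    for engine in engines:
        for position, url in enumerate(engine(query)[:depth]):
            scores[url] += depth - position  # 1st place earns the most points
    # Highest total first; URL as a deterministic tie-break.
    return sorted(scores, key=lambda url: (-scores[url], url))


engine_a: Engine = lambda q: ["x", "y", "z"]  # hypothetical engine
engine_b: Engine = lambda q: ["y", "w"]       # hypothetical engine
print(meta_search("cat", [engine_a, engine_b]))  # ['y', 'x', 'w', 'z']
```

The merge can only reorder what the underlying engines return, which matches the claim that a meta-search engine is no better than its database.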
“Smarter” meta-search engine
• Example: Clever project (not yet available online)
• Includes clustering and linguistic analysis
  - e.g. a query for “cat” is split into senses: Cat – feline, Cat – power, Cat – equipment, Cat – scans, etc.
The Clever Project
• Uses hyperlinks to locate hubs and authorities: “a respected authority is a page that is referred to by many good hubs; a useful hub is a location that points to many valuable authorities”
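The quoted definition is deliberately circular: hub and authority scores reinforce each other. In Kleinberg’s HITS formulation, which the Clever project built on, it can be written as below (the notation is chosen here for illustration: $q \to p$ means page $q$ links to page $p$, and $A$ is the adjacency matrix with $A_{pq} = 1$ when $p$ links to $q$).

```latex
a(p) = \sum_{q \,:\, q \to p} h(q), \qquad
h(p) = \sum_{q \,:\, p \to q} a(q)

% Matrix form, normalizing after each update:
\mathbf{a} \leftarrow \frac{A^{\top}\mathbf{h}}{\lVert A^{\top}\mathbf{h} \rVert}, \qquad
\mathbf{h} \leftarrow \frac{A\,\mathbf{a}}{\lVert A\,\mathbf{a} \rVert}
```

Substituting one update into the other gives $\mathbf{a} \propto A^{\top}A\,\mathbf{a}$ and $\mathbf{h} \propto A A^{\top}\mathbf{h}$, so under the usual conditions repeated updating converges to the principal eigenvectors of $A^{\top}A$ and $AA^{\top}$.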
The Clever Project
• Obtains a list of web pages from a standard index and follows hyperlinks to enlarge its own database
  - the resulting collection is the “root set”
  - each page gets a numerical hub score and authority score
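A minimal sketch of the collection step, assuming the link graph is available as a dict of out-links (`out_links`, `expand_root_set` and `initial_scores` are hypothetical names). It follows links in both directions, matching the “forwards & backwards” point in the Clever vs. Google comparison, and gives every page the same starting guess.

```python
def expand_root_set(start_pages: set[str], out_links: dict[str, set[str]]) -> set[str]:
    # Start from the pages returned by a standard index.
    expanded = set(start_pages)
    # Forward: add pages the start pages point to.
    for page in start_pages:
        expanded |= out_links.get(page, set())
    # Backward: add pages that point into the start pages.
    for page, targets in out_links.items():
        if targets & start_pages:
            expanded.add(page)
    return expanded


def initial_scores(pages: set[str]) -> tuple[dict[str, float], dict[str, float]]:
    # Every page starts with hub = authority = 1.0; iteration refines these.
    return {p: 1.0 for p in pages}, {p: 1.0 for p in pages}


graph = {"a": {"b"}, "c": {"a"}, "d": {"e"}}  # hypothetical link graph
collection = expand_root_set({"a"}, graph)
print(sorted(collection))  # ['a', 'b', 'c']
```

Page "d" is left out because it neither links to nor is linked from the starting page.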
The Clever Project
• Computes scores in a way similar to PageRank: it starts with guesses and recalculates them repeatedly
  - useful by-product: it clusters sites
• Accounts for competition, because competitors do not have to acknowledge each other through hyperlinks (a hub can link to both)
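The “guesses and repeated recalculation” can be sketched as the standard HITS iteration: start every score at 1, update authorities from hubs and hubs from authorities, normalize, and repeat. This is a sketch of the published HITS update, not Clever’s actual code; the fixed iteration count is an assumption in place of a convergence test.

```python
import math


def hits(links: dict[str, set[str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}  # initial guesses
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link to it.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}
        # Hub: sum of authority scores of the pages it links to.
        new_hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}
    return hub, auth


# Hypothetical graph: h1 and h2 point to both a1 and a2; h3 points only to a1.
links = {"h1": {"a1", "a2"}, "h2": {"a1", "a2"}, "h3": {"a1"}}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # a1: cited by the most good hubs
```

Note that a1 and a2 never link to each other, yet both get high authority scores because shared hubs point to them, which is the competitor situation the slide describes.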
Clever vs. Google
GOOGLE
  - gives initial rankings
  - ranks pages independently of queries
  - faster
  - looks forward: “link to link”
CLEVER
  - builds root sets per keyword
  - sets page priority through query context
  - looks forwards and backwards: “hub and authority”
  - sometimes too broad, e.g. Fallingwater
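For contrast with the per-query Clever computation, here is a minimal power-iteration sketch of PageRank. It runs once over the whole link graph with no query involved, which is why the ranking is query-independent and fast at search time. The damping factor 0.85 is the commonly cited default and an assumption here; spreading a dangling page’s rank evenly is one common convention, not necessarily Google’s.

```python
def pagerank(links: dict[str, set[str]], damping: float = 0.85, iterations: int = 100) -> dict[str, float]:
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initial guess: uniform
    for _ in range(iterations):
        # Every page gets a baseline share from random jumps.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, set())
            if targets:
                # Pass rank forward, split evenly over outgoing links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical graph: a and b both link to c; c links back to a.
ranks = pagerank({"a": {"c"}, "b": {"c"}, "c": {"a"}})
print(max(ranks, key=ranks.get))  # c
```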