Internet Resources Discovery (IRD) Concrete Learning Agents T.Sharon-A.Frank
Concrete Learning Agents
• Ahoy! - homepage finder
  • Finds the homepage of any person by name and organization.
• ShopBot - robot for comparison shopping
  • Finds where a user can buy a product in any pre-learned domain.
• ILA - Internet Learning Agent
  • Learns to understand the content of semi-structured pages in terms of internal concepts.
Ahoy! Homepage Finder
• Personal homepages are a relatively new resource to be located on the Web.
• Search engines do a poor job of finding personal homepages because they are hard to define and locate.
• Ahoy! does it much better.
• Ahoy! implements a new search method: DRS - Dynamic Reference Sifting.
Dynamic Reference Sifting (DRS)
• How can recall and precision be improved?
• The DRS architecture is proposed as a way to provide high recall and precision in an automatic page-finding system.
• DRS components:
  • Candidate References Source
  • Cross Filter
  • Heuristic-based filter
  • Buckets
  • URL generator
  • URL pattern extractor
DRS Components (1)
• Candidate References Source
  • Comprehensive web indexes, like AltaVista.
  • E-mail services, like WhoWhere, Bigfoot, IAF.
• Cross Filter
  • Filters candidates based on some orthogonal reference source, like e-mail address directories.
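The cross-filtering idea can be sketched in a few lines. This is only an illustration under stated assumptions: candidates are plain URLs, the orthogonal source is a list of e-mail addresses for the target person, and the function names `email_domains` and `cross_filter` are hypothetical, not part of Ahoy!.

```python
from urllib.parse import urlparse

def email_domains(addresses):
    """Collect the domain part of each e-mail address (assumed input format)."""
    return {addr.rsplit("@", 1)[1].lower() for addr in addresses if "@" in addr}

def cross_filter(candidate_urls, addresses):
    """Keep candidates whose host matches a domain seen in the e-mail directory."""
    domains = email_domains(addresses)
    kept = []
    for url in candidate_urls:
        host = (urlparse(url).hostname or "").lower()
        # Accept an exact match or a host that is a subdomain of the e-mail domain.
        if any(host == d or host.endswith("." + d) for d in domains):
            kept.append(url)
    return kept
```

A candidate survives here only when an independent source agrees on where the person lives on the network, which is the sense in which the two sources are orthogonal.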
DRS Components (2)
• Heuristic-based filter
  • Filters candidates using domain-specific knowledge and heuristics.
  • For homepages - looks for the words: “homepage”, “my homepage”, “personal page”, etc.
  • For names - uses a nicknames database and templates like “Sharon, Taly”, etc.
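A minimal sketch of such a heuristic filter follows. The cue words and the "Last, First" template come from the slide; the tiny nickname table and the function names are assumptions made for illustration, not Ahoy!'s actual database or code.

```python
# Cue words taken from the slide above.
HOMEPAGE_CUES = ("homepage", "my homepage", "personal page")

# Illustrative nickname table only; a real system would use a much larger database.
NICKNAMES = {"william": ["bill", "will"], "robert": ["bob", "rob"]}

def name_variants(first, last):
    """Expand a name into 'first last' and 'last, first' forms, including nicknames."""
    firsts = [first.lower()] + NICKNAMES.get(first.lower(), [])
    last = last.lower()
    variants = []
    for f in firsts:
        variants.append(f"{f} {last}")
        variants.append(f"{last}, {f}")
    return variants

def looks_like_homepage(title, first, last):
    """Accept a page title that contains both a homepage cue and a name variant."""
    text = title.lower()
    has_cue = any(cue in text for cue in HOMEPAGE_CUES)
    has_name = any(variant in text for variant in name_variants(first, last))
    return has_cue and has_name
```

For example, `looks_like_homepage("Bob Smith's Homepage", "Robert", "Smith")` is accepted through the nickname "bob", while a title with the right name but no cue word is rejected.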
DRS Components (3)
• Buckets
  • Ranks and labels the candidates into buckets of matches and near misses.
• URL generator
  • Tries to synthesize new candidate URLs if everything else fails.
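Bucketing can be illustrated with a short sketch. The slide does not say which criteria separate a match from a near miss, so the rule below (both name and institution agree versus only one of them) is an assumption, as is the function name `bucket_candidates`.

```python
def bucket_candidates(candidates):
    """Sort (url, name_ok, institution_ok) triples into labeled buckets.

    Assumed rule: a 'match' satisfies both checks, a 'near_miss' exactly one;
    candidates that satisfy neither are dropped.
    """
    buckets = {"match": [], "near_miss": []}
    for url, name_ok, institution_ok in candidates:
        if name_ok and institution_ok:
            buckets["match"].append(url)
        elif name_ok or institution_ok:
            buckets["near_miss"].append(url)
    return buckets
```

Returning labeled buckets rather than a single ranked list lets the caller decide whether near misses are worth showing when no full match exists.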
Example: URL Generator
DRS Components (4)
• URL pattern extractor
  • Extracts patterns from successful queries, to be used in the URL generator.
  • For each successful hit, saves: name, institution, URL.
  • Learns institutions' server names and homepage paths.
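The interplay between the pattern extractor and the URL generator can be sketched as follows. The slide lists name, institution, and URL as the saved fields; the extra `username` field, the `{user}` placeholder syntax, and all function names are assumptions introduced for this illustration.

```python
from urllib.parse import urlparse

def extract_pattern(url, username):
    """Abstract a successful URL into (server, path template with a {user} slot)."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.path.replace(username, "{user}")

def learn(hits):
    """hits: iterable of (name, institution, url, username) records (assumed shape)."""
    learned = {}
    for _name, institution, url, username in hits:
        server, template = extract_pattern(url, username)
        entry = learned.setdefault(institution, {"servers": set(), "paths": set()})
        entry["servers"].add(server)
        entry["paths"].add(template)
    return learned

def generate_urls(learned, institution, username):
    """Synthesize candidate URLs from every learned server/path pair."""
    entry = learned.get(institution, {"servers": set(), "paths": set()})
    return sorted(
        f"http://{server}{path.format(user=username)}"
        for server in entry["servers"]
        for path in entry["paths"]
    )
```

After one hit at `http://www.cs.example.edu/~jdoe/index.html` for user `jdoe`, the generator would propose `http://www.cs.example.edu/~asmith/index.html` for user `asmith` at the same institution.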
Ahoy! Flow
• User inputs target name and institution.
• Institutional DB provides server names; MetaCrawler provides raw references; e-mail services provide user names.
• Raw references are filtered and bucketed.
• Success?
  • YES: URL patterns are extracted and stored; references are returned.
  • NO: URLs are generated using server name, username, and stored URL patterns.
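The control flow above can be sketched as a single function that takes each stage as a callable. This is a hedged outline, not Ahoy!'s code: the function name `ahoy_search`, the callable signatures, and the `"match"` bucket key are all hypothetical, and returning the generated URLs directly on failure is an assumption.

```python
def ahoy_search(name, institution, fetch_references, filter_and_bucket,
                generate_urls, store_patterns):
    """Run one search: filter references, then either learn from success or guess URLs."""
    references = fetch_references(name, institution)
    buckets = filter_and_bucket(references, name, institution)
    if buckets["match"]:
        # Success: remember URL patterns for later, then return the matches.
        store_patterns(name, institution, buckets["match"])
        return buckets["match"]
    # Failure: fall back to URLs synthesized from stored patterns.
    return generate_urls(name, institution)
```

Passing the stages in as parameters keeps the sketch independent of any particular search service or pattern store.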
Ahoy! Search Example
Ahoy! Example: Success
Ahoy! Example Details
Search Engines Results
Ahoy! Evaluation
• Recall:
• Precision:
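For reference, the two measures can be computed with the standard information-retrieval definitions: precision is the fraction of returned pages that are relevant, and recall is the fraction of relevant pages that were returned. The helper below is a generic sketch with a hypothetical name; it does not reproduce Ahoy!'s measured results.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned items against the relevant set."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

With four returned items of which two are among three relevant ones, precision is 2/4 and recall is 2/3.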
ILA - Internet Learning Agent
• Translation problem: how to interpret a source's response in terms of the agent's internal concepts?
• Search engines can’t understand the information contained in the returned source response.
• ILA, as a learning agent, parses the response and uses heuristics to learn its format and data fields.
• ILA uses learning by comparison.
/etc/passwd - Sample
daemon:*:1:1:Mr Background:/:/dev/null
sys:*:2:2::/:/bin/true
bin:*:3:3::/bin:/bin/true
gibuy:bncKACcgNpmFA:49:3:,,,,:/u/opers/gibuy:/bin/tcsh
ariel:zNdAzJUj2G6vs:105:100:Ariel J. Frank,CS 019,035318407,03749454,:/u/opers/ariel:/bin/tcsh
taly:pxEi5OQD/4N3E:1991:180:Sharon Taly:/u/grad/taly:/bin/Tcsh
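Learning by comparison on a colon-separated source like the one above can be sketched as follows: the agent already knows some facts about a few objects, looks for those values in the parsed response, and votes on which field position carries which internal concept. The record contents, concept names, exact-match rule, and function name below are assumptions for illustration, not ILA's actual algorithm.

```python
def learn_field_mapping(lines, known_facts, separator=":"):
    """Map each internal concept to the field index where its known value appears most.

    lines: raw records, one per known object.
    known_facts: one dict per record, mapping concept name -> known value.
    """
    votes = {}
    for line, facts in zip(lines, known_facts):
        tokens = line.split(separator)
        for concept, value in facts.items():
            for index, token in enumerate(tokens):
                # Assumed comparison rule: case-insensitive exact match on the whole field.
                if token.lower() == value.lower():
                    counts = votes.setdefault(concept, {})
                    counts[index] = counts.get(index, 0) + 1
    return {concept: max(counts, key=counts.get) for concept, counts in votes.items()}

# Hypothetical records in the same colon-separated layout as the sample.
lines = [
    "jdoe:x:1001:100:Jane Doe:/home/jdoe:/bin/sh",
    "asmith:x:1002:100:Alex Smith:/home/asmith:/bin/sh",
]
known_facts = [
    {"username": "jdoe", "full_name": "Jane Doe", "home_dir": "/home/jdoe"},
    {"username": "asmith", "full_name": "Alex Smith", "home_dir": "/home/asmith"},
]
mapping = learn_field_mapping(lines, known_facts)
```

Once the mapping is learned, the agent can read the same concepts out of records for objects it knows nothing about, which is the point of the translation step.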