(c) Wolfgang Hürst, Albert-Ludwigs-University

Web Search – Summer Term 2006
III. Web Search - Introduction (Cont.)
Jeff Dean, Google's Systems Lab: http://www.researchchannel.org/prog/displayevent.asp?rid=2459

IR vs. Web Search
[Figure: information need, query, data / documents, information]
The initial problem is similar to traditional IR ... but the basic conditions and characteristics differ significantly:
- The number of users is huge. Really huge.
- The web is huge. Really huge.
- Big variety in users
- Big variety in data
- Users don't cooperate (short queries, ...)
- Document authors don't cooperate (spam, ...)

Classic IR vs. Web Search: Documents
- Huge amount of data, continuous growth, high rate of change
- Huge variability and heterogeneity:
  - Quality, credibility and reputation of the source
  - Static vs. dynamic documents
  - Different media types (text, pictures, audio, video)
  - Different formats (HTML, Flash, PDF, ...)
  - Miscellaneous topics
  - Continuous text vs. note form / keywords
  - Different languages, encodings
- Spam and advertisements
- Web-specific characteristics:
  - Hypertext, linking
  - Broken links
  - Unstructured, not always conforming to standards
- Redundancy (syntactic and semantic)
- Distributed (documents must be collected automatically)
- Different popularity and access frequency

Classic IR vs. Web Search: Users
- Different needs and aims, e.g. users might want
  - to learn something ("informational")
  - to go to a particular site ("navigational")
  - to do something, e.g. shopping, downloading, ... ("transactional")
  - to do other, miscellaneous things, e.g. finding hubs, "exploratory search", ...
- Different premises, qualifications, languages, ...
- Different network connections / bandwidths
- Imprecise, unspecific queries: short, ambiguous, inexact, incorrect, no use of operators or special syntax

Classic IR vs. Web Search: Bottom line
- Different characteristics that cause lots of problems
- But there's also good news: we can take advantage of some of these characteristics (e.g. links, statistics, ...)

References
[1] A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, S. Raghavan: "Searching the Web", ACM Transactions on Internet Technology, Vol. 1, No. 1, Aug. 2001. Chapter 1 (Introduction, general architecture)
[2] S. Brin, L. Page: "The Anatomy of a Large-Scale Hypertextual Web Search Engine", WWW 1998. Chapter 1 (Introduction), Chapter 4.1 (Google Architecture Overview)

General Web Search Engine Architecture
[Figure (cf. [1], Fig. 1): client, queries, results, query engine, ranking; WWW, crawler(s), crawl control, page repository; indexer module, collection analysis module; indexes (text, structure, utility); usage feedback]
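The crawler side of this architecture can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the dictionary `WEB` is a hypothetical stand-in for the WWW (no network access), `fetch` and `crawl` are hypothetical names, and crawl control is reduced to a first-in-first-out frontier with a page budget, whereas a real crawl-control module decides more carefully which links to visit next (cf. [1]). The crawler stores page text in the page repository and keeps the outgoing links as a simple structure index.

```python
from collections import deque

# Hypothetical stand-in for the WWW: url -> (page text, outgoing links).
WEB = {
    "http://a.example/": ("Web search introduction", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("Crawling the web", ["http://a.example/", "http://d.example/"]),
    "http://c.example/": ("Page repository", ["http://d.example/"]),
    "http://d.example/": ("Indexing and ranking", ["http://e.example/missing"]),
}


def fetch(url):
    """Crawler: download one page (here: look it up in the simulated WEB)."""
    return WEB.get(url)


def crawl(seeds, max_pages):
    """Crawl control + crawler: breadth-first traversal with a page budget.

    Returns the page repository (url -> text) and a structure index
    (url -> outgoing links) for the indexer module to work on.
    """
    frontier = deque(seeds)  # simplifying assumption: plain FIFO order
    seen = set(seeds)
    repository, structure = {}, {}
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:  # broken link: nothing to store
            continue
        text, links = page
        repository[url] = text
        structure[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository, structure
```

For example, `crawl(["http://a.example/"], max_pages=2)` stops after http://a.example/ and http://b.example/; with a larger budget all four pages are stored and the broken link to http://e.example/missing is skipped.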

Recap: IR System & Tasks Involved
[Figure: information need, user interface, query, query processing (parsing & term processing), logical view of the information need, searching, ranking, result representation, results; documents, select data for indexing, parsing & term processing, index; performance evaluation]
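The tasks in this recap can be sketched end to end. The following is a minimal Python sketch, assuming a tiny hypothetical stopword list and plain term-frequency scoring as a stand-in for a real ranking function: the same `process_terms` function (a hypothetical name) handles parsing and term processing on both the document side and the query side, `build_index` creates an inverted index, and `search` performs Boolean AND matching followed by ranking.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}  # hypothetical, tiny list


def process_terms(text):
    """Parsing & term processing: tokenize, lowercase, drop stopwords.

    Applied identically to documents (indexing side) and queries (query side).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(process_terms(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, query):
    """Searching + ranking: AND-match all query terms, rank by summed tf."""
    terms = process_terms(query)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    scores = {d: sum(p[d] for p in postings) for d in matches}
    # Highest score first; ties broken by doc_id for a stable result list.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Applying identical term processing on both sides is the point of this design: if documents were lowercased but queries were not, the query "Web" would never match the indexed term "web".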

The Google Search Engine
- Founded 1998 by two Stanford students, Sergey Brin and Larry Page (research project since 1996)
- Originally an academic research project that later became a commercial product
- Distinguishing features (then!?):
  - Special (and better) ranking
  - Speed
  - Size

Architecture of the First Google Search Engine
[Figure (cf. [2], Fig. 1): URL server, crawlers, store server, repository, indexer, anchors, URL resolver, links, PageRank, doc index, lexicon, barrels, sorters, DumpLexicon, searcher]
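The path from ANCHORS through the URL RESOLVER and LINKS to PAGERANK can be sketched as follows. In [2], the URL resolver converts relative URLs from the anchors into absolute URLs and then into docIDs, and produces a database of links (pairs of docIDs) that is used to compute PageRank. This Python sketch is a simplification under stated assumptions: `resolve_anchors` and `pagerank` are hypothetical names, docIDs are consecutive integers, the links database is a plain list of pairs, and PageRank is computed by power iteration in a common normalized variant (teleport term (1-d)/N, ranks summing to 1, pages without out-links spreading their rank evenly) rather than in the exact notation of [2].

```python
from urllib.parse import urljoin


def resolve_anchors(anchors, doc_ids):
    """URL resolver: turn (source_url, href) anchors into (from, to) docID links.

    Relative hrefs are made absolute against the source URL; unseen URLs
    get the next free docID (hypothetical scheme: consecutive integers).
    """
    links = []
    for source_url, href in anchors:
        target_url = urljoin(source_url, href)
        for url in (source_url, target_url):
            if url not in doc_ids:
                doc_ids[url] = len(doc_ids)
        links.append((doc_ids[source_url], doc_ids[target_url]))
    return links


def pagerank(links, n, d=0.85, iterations=50):
    """Power iteration over the links database; the ranks sum to 1."""
    out_links = [[] for _ in range(n)]
    for src, dst in links:
        out_links[src].append(dst)
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1 - d) / n] * n  # random-jump share for every page
        for src in range(n):
            targets = out_links[src]
            if targets:
                share = d * ranks[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for dst in range(n):
                    new[dst] += d * ranks[src] / n
        ranks = new
    return ranks
```

For example, with the anchors `("http://a.example/index.html", "b.html")`, `("http://a.example/index.html", "http://c.example/")`, `("http://a.example/b.html", "/index.html")` and `("http://c.example/", "http://a.example/index.html")`, the resolver assigns docIDs 0, 1, 2 and the page http://a.example/index.html, the only one with two incoming links, ends up with the highest rank (about 0.49).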

Schedule
Web Search:
- Introduction
- Crawling
- Page Repository
- Indexing
- Ranking (PageRank, HITS)
- Exercises for web search basics
- Advanced / additional web search topics
In parallel:
- Programming project (Lucene)
