Search Engines Thomas Haidlas
Introduction • Directories and meta-search engines • How search engines work • What influences the ranking
Directories • Hand-constructed hierarchy of topics (e.g. Yahoo!) • Human editors select, index and classify the pages • Cover only a small part of the web • Rarely updated • No ranking
Directories II • No full-text search across the listed pages • Searches cover only the editors' reviews • Sometimes partner with search engines to increase coverage
Meta-Search Engines • Queries with rare keywords often require more than one web search engine • The same query is submitted to many engines in parallel • Duplicate entries are eliminated • The results are shown in a uniform format • No harvesting or indexing of their own
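The meta-search steps above (parallel submission, duplicate elimination, uniform output) can be sketched as follows. This is only an illustration: `engine_a` and `engine_b` are hypothetical stand-ins for real search engines, and the URL normalization rule is an assumption, not something the slides specify.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit

# Hypothetical engines: each maps a query to a list of (title, url) hits.
def engine_a(query):
    return [("Example", "http://example.com/"), ("Foo", "http://foo.test/page")]

def engine_b(query):
    return [("Example site", "HTTP://EXAMPLE.COM"), ("Bar", "http://bar.test/")]

def normalize(url):
    """Build a comparison key: lower-case host plus path without trailing slash."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")

def meta_search(query, engines):
    # Submit the same query to all engines in parallel.
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    seen, merged = set(), []
    for hits in result_lists:
        for title, url in hits:
            key = normalize(url)
            if key not in seen:  # eliminate duplicate entries
                seen.add(key)
                merged.append({"title": title, "url": url})  # uniform format
    return merged

results = meta_search("search engines", [engine_a, engine_b])
```

The first engine to report a URL wins, so the order of `engines` decides which title is kept for a duplicate.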
How search engines work • Harvesting • Indexing • Analyzing requests • Ranking
Harvesting • Programs (robots, gatherers or crawlers) visit web sites and gather the web pages for indexing • Start with an initial page • Follow hyperlinks (<a href=…>) • Sometimes more than 2 sub-levels are visited • These programs are started periodically
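A minimal sketch of such a harvester, assuming a breadth-first strategy and a configurable depth limit. To stay runnable without network access it crawls a hypothetical in-memory web (`FAKE_WEB`); a real robot would pass an HTTP fetch function instead.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_depth=2):
    """Harvest pages breadth-first from start_url; returns {url: html}."""
    harvested = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in harvested:
            continue  # already visited
        html = fetch(url)
        if html is None:
            continue  # page not reachable
        harvested[url] = html
        if depth < max_depth:  # follow links only down to max_depth
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                queue.append((urljoin(url, href), depth + 1))
    return harvested

# Hypothetical site used instead of real HTTP requests.
FAKE_WEB = {
    "http://site.test/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://site.test/a.html": '<a href="/c.html">C</a>',
    "http://site.test/b.html": "no links here",
    "http://site.test/c.html": '<a href="/d.html">D</a>',
    "http://site.test/d.html": "three levels below the start page",
}
pages = crawl("http://site.test/", FAKE_WEB.get, max_depth=2)
```

With `max_depth=2` the page `d.html` is never fetched, because it sits three levels below the start page.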
Harvesting II • Problems: • Links are not found inside • Frames • Image maps • A search engine starts many robots => heavy traffic
Robot Exclusion • Two methods: • Meta tags: <meta name="robots" content="noindex,nofollow"> • robots.txt:
    User-agent: Scooter
    Disallow: /privat/geht_dich_gar_nix_an.html
    Allow: /allesOffen
Robot Exclusion II • robots.txt (Example 2):
    User-agent: *
    Allow: /allesOffen
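How a well-behaved robot might evaluate the first robots.txt example can be sketched with Python's standard `urllib.robotparser`. The host `example.com` and the second robot name `OtherBot` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the first robots.txt example, one directive per line.
ROBOTS_TXT = """\
User-agent: Scooter
Disallow: /privat/geht_dich_gar_nix_an.html
Allow: /allesOffen
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The robot named in the file must skip the disallowed page ...
blocked = parser.can_fetch(
    "Scooter", "http://example.com/privat/geht_dich_gar_nix_an.html")
# ... but may fetch anything below /allesOffen.
allowed = parser.can_fetch("Scooter", "http://example.com/allesOffen/index.html")
# A robot that no rule group mentions is not restricted by this file.
other = parser.can_fetch(
    "OtherBot", "http://example.com/privat/geht_dich_gar_nix_an.html")
```

Robot exclusion is a convention: the file only has an effect if the robot chooses to check it before fetching.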
Indexing • The index table receives the harvesting results • The index table contains keywords • The table is held in main memory => fast access
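One common way to build such a keyword table is an inverted index that maps each keyword to the pages containing it. The sketch below assumes this structure and a very simple tokenizer; the slides do not say how the index table is actually organized, and the page contents are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map every keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

# Hypothetical harvesting results: URL -> page text.
HARVESTED = {
    "http://site.test/a": "Search engines rank pages",
    "http://site.test/b": "Directories list pages by topic",
}
index = build_index(HARVESTED)
```

Because the index is an in-memory dictionary, looking up a keyword such as `index["pages"]` takes roughly constant time regardless of how many pages were harvested.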
Analysing Requests • The search string is compared with the index table • The search string is a single word => easy processing • The search string uses truncation or Boolean operators => complex processing • If the search string is found in the index, the page is added to the hit list
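The difference between the easy and the complex case can be illustrated with a small query evaluator. The query syntax used here is an assumption: a trailing `*` marks truncation, and `AND`/`OR` are evaluated strictly left to right without precedence or parentheses. The index contents are hypothetical.

```python
# Hypothetical index table: keyword -> set of page ids.
INDEX = {
    "search": {"p1", "p2"},
    "searching": {"p3"},
    "engine": {"p1", "p3"},
    "ranking": {"p2"},
}

def lookup(term, index):
    """Resolve one term; a trailing '*' means truncation (prefix match)."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, pages in index.items():  # must scan all keywords
            if word.startswith(prefix):
                hits |= pages
        return hits
    return set(index.get(term, set()))  # single word: one direct lookup

def evaluate(query, index):
    """Evaluate 'a AND b OR c' from left to right and return the hit list."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = lookup(tokens[0], index)
    for op, term in zip(tokens[1::2], tokens[2::2]):
        hits = lookup(term, index)
        if op == "and":
            result &= hits
        elif op == "or":
            result |= hits
        else:
            raise ValueError(f"unknown operator: {op}")
    return result

single = evaluate("search", INDEX)
combined = evaluate("search* AND engine", INDEX)
```

A single word costs one dictionary lookup, while truncation has to scan every keyword and Boolean operators add set operations on top, which is why these queries are more expensive to process.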
Ranking • Influences on the ranking: • How many keywords are found • Keyword frequency • Keyword position: • Domain/URL • Document name
Ranking II • Headline • Early in the text • Meta tags • Ranking for cash • PageRank • Click frequency / Hit Popularity Engine
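The keyword-based factors (how many keywords match, how often, and where) could be combined into one score roughly as follows. The weights, the page fields and the 20-word "early in the text" window are all hypothetical; real engines do not publish their values.

```python
import re

# Hypothetical weights for each keyword position.
WEIGHTS = {"url": 5.0, "title": 4.0, "headline": 3.0, "early": 2.0, "body": 1.0}

def words(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(page, keywords, early_window=20):
    """Sum position bonuses and relative frequency over all keywords."""
    body = words(page["body"])
    total = 0.0
    for keyword in (k.lower() for k in keywords):
        if keyword in page["url"].lower():
            total += WEIGHTS["url"]       # keyword in domain/URL
        if keyword in words(page["title"]):
            total += WEIGHTS["title"]     # keyword in document name
        if keyword in words(page["headline"]):
            total += WEIGHTS["headline"]  # keyword in headline
        if keyword in body[:early_window]:
            total += WEIGHTS["early"]     # keyword early in the text
        if body:
            total += WEIGHTS["body"] * body.count(keyword) / len(body)
    return total

strong = {"url": "http://crawler.test/", "title": "Crawler basics",
          "headline": "What a crawler does", "body": "A crawler visits pages."}
weak = {"url": "http://misc.test/", "title": "Notes",
        "headline": "Misc", "body": "Many words come first. " * 10 + "crawler"}
strong_score = score(strong, ["crawler"])
weak_score = score(weak, ["crawler"])
```

Pages that carry the keyword in prominent positions score far higher than pages that mention it once deep in the text, which is also what makes these factors easy to manipulate.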
Ranking for cash • Capitalist principle • Paying money => high ranking • The content is not relevant • Additional income
Ranking for cash II • Not used on its own • Mostly used by e-commerce companies • Second method: • Pay for faster indexing
PageRank (Google) • Evaluation by the internet community (web admins) • Relation between the quality of a page and the number of links that point to it • Links from popular web sites count for more
PageRank (Google) II • Disadvantages: • New web sites rank poorly • Queries with many Boolean connectives and keywords are not easy to process
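The link-based idea can be sketched with the commonly published PageRank iteration. The slides give no formula, so the damping factor of 0.85, the fixed iteration count and the handling of pages without outgoing links are assumptions; the link graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively distribute rank along links; returns {page: rank}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small base share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page without links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: "new" links out, but nobody links to it yet.
LINKS = {"hub": ["a", "b"], "a": ["hub"], "b": ["hub"], "new": ["hub"]}
ranks = pagerank(LINKS)
```

In this graph `hub` ends up with the highest rank because every other page links to it, while `new` keeps only the base share, which illustrates the disadvantage for new web sites.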
Hit Popularity Engine • The index already exists and is pre-sorted • A click on a link counts as a vote for the site concerned => the "click" is recorded in the database • Pages with many "clicks" are more popular • Developed by "Direct Hit"
Hit Popularity Engine II • This method is usually combined with others • Disadvantage: • New web sites rank poorly
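A rough sketch of click-based re-ranking, assuming an in-memory counter in place of the database. The class and method names are hypothetical and not taken from Direct Hit.

```python
from collections import Counter

class HitPopularity:
    """Counts clicks per URL and re-orders a pre-sorted hit list."""

    def __init__(self):
        self.clicks = Counter()  # stands in for the click database

    def record_click(self, url):
        """A click on a result counts as one vote for that page."""
        self.clicks[url] += 1

    def rerank(self, hits):
        """Most-clicked pages first; ties keep their pre-sorted order."""
        return sorted(hits, key=lambda url: -self.clicks[url])

engine = HitPopularity()
for url in ["b", "b", "c"]:
    engine.record_click(url)
reranked = engine.rerank(["a", "b", "c"])
```

Python's `sorted` is stable, so pages without any clicks stay in the order produced by the other ranking methods, which matches the note that this method is usually combined with others.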
Ranking Manipulation • Why? • Commercial interest • Done by: • Search engine optimizers (SEOs) • Purpose: • To boost the page rank
Link farm • Many domains are registered • Programs generate thousands of pages that link to one another • Each page contains keywords • Some of these pages are even arranged in complex structures
Forwarding • An intermediate page contains the search terms • Redirects via HTML meta tags and simple JavaScript can be recognized • SEOs complicate the forwarding instructions => not recognized
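Recognizing the simplest kind of forwarding, an HTML meta refresh, might look like the sketch below. It assumes the usual `content="0; url=…"` form and deliberately ignores JavaScript redirects, which is where the more complicated forwarding instructions escape detection. The function name and the example page are hypothetical.

```python
from html.parser import HTMLParser

class RefreshDetector(HTMLParser):
    """Remembers the target of a <meta http-equiv="refresh"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://target.test/".
        _, _, rest = (attributes.get("content") or "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.target = rest[4:].strip()

def find_meta_redirect(html):
    """Return the redirect target of a page, or None if no meta refresh is found."""
    detector = RefreshDetector()
    detector.feed(html)
    return detector.target

doorway = ('<html><head><meta http-equiv="refresh" '
           'content="0; url=http://target.test/"></head>'
           '<body>cheap flights cheap hotels</body></html>')
redirect = find_meta_redirect(doorway)
```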
IP Delivery • A normal site is indexed by the robots • Afterwards, the contents of the site are swapped
IP Cloaking • Server programs determine who sent the request • Robot requests: "cloaked" content designed to influence the ranking is delivered • Human visitors do not see the "cloaked" content
Other simple tricks • Links in guestbooks • Particularly effective with high-ranking guestbooks • "Blind text" • Text in the background color
Trade in web links • Paying for links • Partnership => commission
Summary • Select suitable tools • The WWW is dynamic => keep up with new developments • Assess rankings correctly
Thank You!
Sources • [1] www.suchfibel.de • [2] Jo Bager, Orientierungslose Infosammler, c't 23/99 • [3] Stefan Karzauninkat, Zielfahndung, c't 23/99 • [4] Sven Lennartz, Ich bin wichtig, c't 23/99 • [5] Stefan Karzauninkat, Google zugemüllt, c't 1/03 • [6] www.google.com/webmasters • [7] Dr. Wolfgang Sander-Beuermann, Schatzsucher, c't 13/98 • [8] Arno Dittmar, Suchmaschinen und Anfragen im WWW • [9] Ralf Rudolf, Suchmaschinen und Anfragen im WWW