
AlltheWeb


Donna

Presentation Transcript


    Slide 1:AlltheWeb

    Torbjørn Kanestrøm, January 30th, 2003. Name: Frode Lundgren (pronunciation), Software Engineering Manager at Fast Search & Transfer and the lead engineer on AlltheWeb.

    Slide 2:Agenda

    Who is FAST? What do we do? Libraries: relevant projects we have done. What is AlltheWeb? Under the Hood: Phrasing & Lemmatization. Take a tour of AlltheWeb: simple searches (Web, News, Multimedia, FTP), Advanced Web Search, Results Page. Q & A.

    Slide 3:Who is FAST?

    San Francisco, Tokyo, Boston, Norway, Munich, Rome, London, Paris. Fast Search & Transfer (FAST): founded 1997; public company (Oslo Stock Exchange, June 2001); one of the fastest-growing companies in Europe; profitable; 200 employees; 40+ PhDs; 12 offices worldwide. The company was founded in 1997 as an outgrowth of academic research and development at the Norwegian University of Science and Technology. That research produced a significant breakthrough in content delivery and in the search and retrieval of internet and enterprise data and information. We are headquartered in Oslo, Norway, and have our US headquarters just outside Boston in Wellesley, MA. We currently have over 175 employees and have been experiencing strong revenue growth and profitability; as an example, last year's revenues grew over 600% in a very challenging market. And we have been profitable in the first two quarters of 2002. We have been rapidly increasing our market share amid growing awareness of our search solutions.

    Slide 4:What we do…

    Common Technology Platform

    Slide 5:FAST Solutions

    Enterprise, Portals, Partners

    Slide 6:FAST Customers & Partners

    FAST is the creator of the real-time integrated search and filter technology solutions that work behind the scenes at some of the world's best-known companies, which have the world's most demanding search problems. The following are some of the names of the people we work with today. In enterprise we work directly with such recognizable brand names as IBM, Dell and Reuters. We also power search for the US government website FirstGov. FAST powers over 75% of internet search in Europe and 35% in the US. FAST powers mission-critical applications for some of the largest companies in the world, including Reuters, IBM and Banca IMI, generating measurable ROI and performance. FAST search delivers over 100 million search results every day. Our partners, such as BV, who is speaking today, are integrating and embedding FAST technology and providing search solutions to their own roster of enterprise customers.

    Slide 7:A few selected projects we have done, relevant to every librarian

    Slide 8:Questia

    Slide 9:Questia – the online library

    Slide 10:Nordic Web Archive

    The Nordic Web Archive is a cooperation between the Nordic national libraries (Finland, Sweden, Denmark, Norway, Iceland). The project started in 2000, with a datacenter built deep inside a mountain in northern Norway. It collects and archives web documents of national interest and importance: everything published in the national domains (.NO, .DK, .FI, etc.), everything written on the web in the respective languages, and everything referring to one of the countries (a city, company, person, etc.). It is a continuous project designed to scale indefinitely, and it is available to the research community rather than as a public site.
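    The first selection criterion on this slide, the national domains, amounts to a simple scope rule for a crawler. The Python sketch below checks only that rule. The TLD set and the function name `in_national_domain` are illustrative assumptions; the language and country-reference criteria would need language identification and entity recognition and are not modeled.

```python
from urllib.parse import urlparse

# Hypothetical scope set: country-code TLDs of the five participating countries.
NORDIC_TLDS = {"no", "dk", "fi", "se", "is"}

def in_national_domain(url: str) -> bool:
    """Return True if the URL's host ends in one of the Nordic country-code TLDs.

    Only the "national domain" criterion is checked; the language and
    "refers to the country" criteria are out of scope for this sketch.
    """
    host = urlparse(url).hostname or ""   # hostname is already lowercased
    tld = host.rsplit(".", 1)[-1]          # last dot-separated label
    return tld in NORDIC_TLDS

print(in_national_domain("http://www.example.no/"))  # True
print(in_national_domain("http://example.com/"))     # False
```

    A URL that does not parse to a host (for example a bare string) yields an empty hostname and is treated as out of scope.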

    Slide 11:Elsevier Engineering Information

    Compendex® is the most comprehensive interdisciplinary engineering database in the world with almost seven million records referencing 5,000 engineering journals and conference materials dating from 1970. The database is updated weekly.

    Slide 12:Combining scientific classification of the “deep web” and proprietary publications

    “FAST’s core search technology has enabled us to provide the best scientific search results, period.” (John Regazzi, Managing Director, Elsevier Science). Web, Server, XML. 120M web pages; 17M Elsevier Science publications; scientific classification; grouping and identification of related articles; leading science index; understanding content; scientific navigation. Scirus.com: the web’s science search.

    Slide 13: What is AlltheWeb?

    Slide 14:What is AlltheWeb?

    Showcase for FAST technology: test new search features with a real, live audience. Several million queries per day: 40% North America, 30% Europe, and 30% rest of world. Integrated interface for searching 2.1+ billion web pages, PDF docs, MS Word docs, and Flash objects; continuously refreshed news from 5,000+ global and local news sources; 150 million images and videos; 130 million FTP files; 2 million MP3 files. Targeted at advanced searches.

    Slide 15:What makes AlltheWeb different?

    Versatility: searching in 49 languages; six separate catalogues (Web, News, Pictures, Videos, MP3, FTP); fully customizable front end (the only major search site that is XHTML/CSS compliant). Solid index: 2.5 billion web objects (pages, pictures, videos, MP3s, etc.); one of the fastest refresh cycles (every 7–14 days). Advanced search features: Boolean search, embedded content selectors, domain & IP filtering, file format and size filtering, and much more.
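    Boolean search is commonly evaluated as set operations over an inverted index. The sketch below is a generic illustration, not AlltheWeb's implementation; the `must`/`should`/`must_not` parameter names and the whitespace tokenizer are assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: map each lowercase term to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():   # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return index

def boolean_query(index, all_ids, must=(), should=(), must_not=()):
    """(AND of `must`) AND (OR of `should`, if given) AND NOT (any of `must_not`)."""
    result = set(all_ids)
    for term in must:                        # AND: intersect posting sets
        result &= index.get(term, set())
    if should:                               # OR: union of posting sets
        result &= set().union(*(index.get(t, set()) for t in should))
    for term in must_not:                    # NOT: subtract posting sets
        result -= index.get(term, set())
    return result

docs = {1: "fast search engine", 2: "fast car", 3: "search the web"}
index = build_index(docs)
print(boolean_query(index, docs, must=["search"], must_not=["engine"]))  # {3}
```

    Starting from the full id set lets a query consisting only of `must_not` terms still return a sensible result.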

    Slide 16: Under the Hood - Phrasing & Lemmatization

    Slide 17:Under the Hood: Phrasing/Anti-Phrasing

    Phrasing: known phrases are matched as a phrase, e.g. New York → “New York”. Based on common phrases, names, movie names, geographic names, etc. Can detect multiple phrases within the same query. Anti-phrasing: removes words irrelevant to the query (Who is…, What is…). The two combine to create a better query: Who is George Bush? → “George Bush”; What is the age of the earth? → “the age of the earth”; How do I get to train station in New York? → “get to” “train station” in “New York”.
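    A minimal Python sketch of how phrasing and anti-phrasing can combine, assuming tiny hand-written lists (`ANTI_PHRASES`, `PHRASES`, both hypothetical) in place of the large phrase, name and place-name dictionaries the slide describes. It strips one leading question pattern, then greedily quotes the longest known phrase at each position; with these toy lists it reproduces the three example rewrites on the slide.

```python
# Hypothetical toy lists; a production system would use large curated dictionaries.
ANTI_PHRASES = [("how", "do", "i"), ("who", "is"), ("what", "is")]
PHRASES = {("new", "york"), ("george", "bush"), ("train", "station"),
           ("get", "to"), ("the", "age", "of", "the", "earth")}
MAX_PHRASE_LEN = max(len(p) for p in PHRASES)

def rewrite_query(query: str) -> str:
    tokens = query.replace("?", " ").split()
    lower = [t.lower() for t in tokens]
    # Anti-phrasing: drop one leading question pattern such as "Who is".
    for prefix in ANTI_PHRASES:
        if tuple(lower[:len(prefix)]) == prefix:
            tokens, lower = tokens[len(prefix):], lower[len(prefix):]
            break
    # Phrasing: at each position, quote the longest known phrase (2+ words).
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 1, -1):
            if tuple(lower[i:i + n]) in PHRASES:
                out.append('"' + " ".join(tokens[i:i + n]) + '"')
                i += n
                break
        else:  # no phrase starts here: keep the single word as-is
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(rewrite_query("How do I get to train station in New York ?"))
# "get to" "train station" in "New York"
```

    Greedy longest-match is only one reasonable choice; it keeps the sketch short but can pick a worse segmentation when phrases overlap.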

    Slide 18:Under the Hood: Lemmatization

    Lemmatization improves recall: literal matching finds only a fraction of the candidate documents for a query. Ratio of full forms to base forms: English, 2; German, French, Spanish, 5–10; Russian, Polish, 40+. Typical cases: singular/plural variation, case marking, etc. Stemming vs. lemmatization: traditional stemming rewrites a term according to rules, e.g. walking → walk, which can easily result in “false” stemmings, e.g. Bobby Browning → Bobby Brown. With lemmatization, the rewriting of terms is controlled by language-sensitive dictionaries; these are very comprehensive, representing about 20 “man-years” of work.
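    To make the contrast concrete, here is a toy Python sketch. `naive_stem` is a hypothetical suffix-stripping rule set (not any real stemmer), and `LEXICON` is a hypothetical two-entry dictionary standing in for the comprehensive language-sensitive dictionaries the slide mentions. The rule-based stemmer reproduces the Browning → Brown error, while the dictionary leaves unknown words alone and can expand a query term to all of its full forms to improve recall.

```python
# Hypothetical suffix rules: applied blindly, with no dictionary check.
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Strip the first matching suffix if enough of the word remains.
        if word.lower().endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical toy lexicon: base form -> all full forms.
LEXICON = {
    "walk": ["walk", "walks", "walked", "walking"],
    "mouse": ["mouse", "mice"],
}
FORM_TO_LEMMA = {form: lemma for lemma, forms in LEXICON.items() for form in forms}

def lemmatize(word: str) -> str:
    # Words absent from the dictionary (e.g. the surname "Browning") are left untouched.
    return FORM_TO_LEMMA.get(word.lower(), word)

def expand(word: str) -> list[str]:
    """All full forms sharing the word's lemma, for widening a query (recall)."""
    lemma = FORM_TO_LEMMA.get(word.lower())
    return LEXICON[lemma] if lemma else [word]

print(naive_stem("Browning"), lemmatize("Browning"))  # Brown Browning
print(expand("walked"))  # ['walk', 'walks', 'walked', 'walking']
```

    The dictionary also handles irregular forms such as "mice" → "mouse", which suffix rules cannot reach.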

    Slide 19: Take a Tour

    Slide 20:AlltheWeb Home Page

    Slide 21:Simple Search (Web/News)

    Web- and News Search; Picture-, Video- and MP3 Search; FTP Search. “WebSearch University”

    Slide 22:Simple Search (Rich Media)

    Web- and News Search; Picture-, Video- and MP3 Search; FTP Search

    Slide 23:Simple Search (FTP)

    Web- and News Search; Picture-, Video- and MP3 Search; FTP Search

    Slide 24:Advanced Web Search

    Embedded Content: exclude or include pages based on the embedded content on these pages. Specific date range and document depth.

    File Type: limits results to PDF, MS Word, and Macromedia Flash files.

    Slide 25:Advanced Web Search (cont.)

    Region Filter: limit results to different regions. Presentation: how many search results to list per page.
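    The advanced-search options on slides 24 and 25 behave like independent filters applied before pagination. The sketch below assumes a hypothetical `Page` record and a hypothetical `advanced_search` function; the field names, the region labels, and reading "document depth" as a numeric depth below the site root are assumptions, not AlltheWeb's actual schema or parameters.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Page:
    url: str
    file_type: str   # e.g. "html", "pdf", "doc", "swf"
    modified: date
    depth: int       # hypothetical: depth below the site root
    region: str      # hypothetical region label, e.g. "europe"
    embedded: set[str] = field(default_factory=set)  # e.g. {"flash", "javascript"}

def advanced_search(pages, *, include_embedded=(), exclude_embedded=(),
                    date_from=None, date_to=None, max_depth=None,
                    file_types=None, regions=None, per_page=10, page_no=1):
    """Apply every given filter, then return one page of results."""
    def keep(p: Page) -> bool:
        if not set(include_embedded) <= p.embedded:   # must contain all of these
            return False
        if set(exclude_embedded) & p.embedded:        # must contain none of these
            return False
        if date_from is not None and p.modified < date_from:
            return False
        if date_to is not None and p.modified > date_to:
            return False
        if max_depth is not None and p.depth > max_depth:
            return False
        if file_types and p.file_type not in file_types:
            return False
        if regions and p.region not in regions:
            return False
        return True

    hits = [p for p in pages if keep(p)]
    start = (page_no - 1) * per_page                  # results per page ("Presentation")
    return hits[start:start + per_page]
```

    Each filter is skipped when its parameter is left at the default, so a caller sets only the options chosen on the form.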

    Slide 26:The Result Page

    Search Bar: click tabs to send the query to other catalogs. Query Rewriting: did we rewrite your query? Gives you full control!

    Slide 27:www.AllTheWeb.com

    Has all the advanced search features and functions that you can find on all other major web search engines, combined. And we innovate at a faster pace and invest more in R&D than ever before.

    Slide 28: AlltheWeb Q&A
