
Restricted Search Engine

ESSI2 Project, June 2002. Restricted Search Engine. Laurent Balat, Christophe Decis, Thomas Forey, Sebastien Leclercq. Supervisor: Johny BOND. Introduction (1). What is a search engine? 3 types: disciplinary, global, thematic. Internet users spend more than 50% of their time searching!


Presentation Transcript


  1. ESSI2 Project, June 2002 • Restricted Search Engine • Laurent Balat, Christophe Decis, Thomas Forey, Sebastien Leclercq • Supervisor: Johny BOND

  2. Introduction (1) • What is a search engine? • 3 types: disciplinary, global, thematic • Internet users spend more than 50% of their time searching!

  3. Introduction (2) • Lots of pages can’t be reached. [Diagram labels: WEB, Indexable WEB, Google]

  4. How does it work? • The search engine is composed of two parts. • First part: web site processing (the spider). [Diagram labels: WEB, Spider, Processing, HTML processing unit, PDF unit, DOC unit, Indexing, Constraint, DATABASE]

  5. How does it work? • User part architecture [Diagram labels: User, Query Interface, Query engine, DATABASE]

  6. Constraints • Domain restriction. • Search depth. • Theme: words accepted or not. • Document type. • Time delay.
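
The constraint kinds listed on this slide could be grouped into one configuration object that the spider consults. The sketch below is only an illustration: every field name, default value and method is hypothetical, since the slides name the constraints but not how they were represented.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Constraints:
    # All names and defaults are assumptions made for this example.
    allowed_domain: str = "example.org"        # domain restriction
    max_depth: int = 3                         # search depth
    allowed_types: frozenset = frozenset({"text/html", "application/pdf"})  # document type
    delay_seconds: float = 1.0                 # time delay between two downloads
    accepted_words: frozenset = frozenset()    # theme: words accepted
    rejected_words: frozenset = frozenset()    # theme: words refused

    def allows_link(self, url: str, depth: int) -> bool:
        """True if the URL is inside the allowed domain and not too deep."""
        host = urlparse(url).hostname or ""
        in_domain = host == self.allowed_domain or host.endswith("." + self.allowed_domain)
        return in_domain and depth <= self.max_depth

    def allows_type(self, content_type: str) -> bool:
        """True if the MIME type (ignoring parameters such as charset) is accepted."""
        return content_type.split(";")[0].strip().lower() in self.allowed_types
```

For instance, `Constraints(allowed_domain="example.org", max_depth=2).allows_link("http://www.example.org/a", 1)` accepts the link, while a link on another host or at depth 3 is refused.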

  7. The Spider Part [Diagram labels: Stack, priority queue, link, Check if link already visited, HTTP HEAD, Check type data in constraints, Download, Error download, page, Push page, data]
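
The steps named in the spider diagram (visited check, HTTP HEAD type check, download, pushing new links) can be sketched as a small crawl loop. This is a minimal sketch under stated assumptions, not the project's actual code: `head` and `download` are hypothetical callables supplied by the caller, the priority of a link is assumed to be its depth, and the time delay constraint is left out.

```python
import heapq


def crawl(start_url, head, download, allowed_types, max_depth):
    """Visit pages starting from start_url and return a list of (url, data).

    head(url)     -> content type string, or None on error (stands in for HTTP HEAD)
    download(url) -> (data, links); may raise OSError on a failed download
    """
    visited = set()
    pending = [(0, start_url)]          # priority queue ordered by depth (assumption)
    pages = []
    while pending:
        depth, url = heapq.heappop(pending)
        if url in visited or depth > max_depth:
            continue                    # link already visited, or too deep
        visited.add(url)
        content_type = head(url)
        if content_type is None or content_type not in allowed_types:
            continue                    # error, or type refused by the constraints
        try:
            data, links = download(url)
        except OSError:
            continue                    # download error: skip this page
        pages.append((url, data))
        for link in links:
            if link not in visited:
                heapq.heappush(pending, (depth + 1, link))
    return pages
```

Passing the two network operations in as functions keeps the loop testable without any real HTTP traffic.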

  8. Document Processing • Analysing the document type • Sending it to the appropriate unit • Extracting words and links • Trying to resolve bad links
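
As an illustration of these steps, the sketch below dispatches a document to a unit according to its type and shows what an HTML unit might do. It assumes that "resolving bad links" includes turning relative links into absolute ones and dropping fragments; the names `HtmlUnit`, `process_html`, `UNITS` and `dispatch` are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class HtmlUnit(HTMLParser):
    """Collects lowercase words and absolute links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, href))
                self.links.append(absolute)

    def handle_data(self, data):
        self.words.extend(w.lower() for w in re.findall(r"[A-Za-z0-9]+", data))


def process_html(base_url, html_text):
    unit = HtmlUnit(base_url)
    unit.feed(html_text)
    unit.close()
    return unit.words, unit.links


# Other units (PDF, DOC) would be registered here under their MIME type.
UNITS = {"text/html": process_html}


def dispatch(content_type, base_url, body):
    """Send the document to the unit for its type; unknown types yield nothing."""
    unit = UNITS.get(content_type)
    return unit(base_url, body) if unit else ([], [])
```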

  9. Indexing • Binary search tree: quick to build, efficient to use • Check constraints: start list and stop list.
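
A binary search tree keyed on the word, filtered by a start list and a stop list, might look like the sketch below. The semantics are assumptions: the stop list rejects words, and the start list, when given, is treated as the only set of words accepted. Class and method names are hypothetical.

```python
class Node:
    def __init__(self, word):
        self.word = word
        self.urls = set()       # pages in which the word was found
        self.left = None
        self.right = None


class WordIndex:
    def __init__(self, stop_list=(), start_list=None):
        self.root = None
        self.stop_list = set(stop_list)
        self.start_list = set(start_list) if start_list is not None else None

    def _accepted(self, word):
        if word in self.stop_list:
            return False
        return self.start_list is None or word in self.start_list

    def add(self, word, url):
        """Insert the word (if the constraints accept it) and record the URL."""
        if not self._accepted(word):
            return
        if self.root is None:
            self.root = Node(word)
        node = self.root
        while node.word != word:
            side = "left" if word < node.word else "right"
            child = getattr(node, side)
            if child is None:
                child = Node(word)
                setattr(node, side, child)
            node = child
        node.urls.add(url)

    def lookup(self, word):
        """Return the set of URLs recorded for the word (empty if absent)."""
        node = self.root
        while node is not None and node.word != word:
            node = node.left if word < node.word else node.right
        return set(node.urls) if node else set()
```

This tree is not balanced, so words inserted in sorted order would degrade it to a linked list; a balanced variant would avoid that.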

  10. Database • MySQL database. • General structure: keywords, web links, and the correspondence between keywords and links.
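
The three-part structure (keywords, links, and the correspondence between them) maps naturally onto three tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; table names, column names and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one table per box of the slide's diagram.
SCHEMA = """
CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT UNIQUE NOT NULL);
CREATE TABLE links    (id INTEGER PRIMARY KEY, url  TEXT UNIQUE NOT NULL);
CREATE TABLE keyword_link (
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    link_id    INTEGER NOT NULL REFERENCES links(id),
    PRIMARY KEY (keyword_id, link_id)
);
"""


def open_database(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, word, url):
    """Store the word, the URL, and their correspondence (each at most once)."""
    conn.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
    conn.execute("INSERT OR IGNORE INTO links (url) VALUES (?)", (url,))
    conn.execute(
        "INSERT OR IGNORE INTO keyword_link (keyword_id, link_id) "
        "SELECT k.id, l.id FROM keywords k, links l WHERE k.word = ? AND l.url = ?",
        (word, url),
    )
```

Keeping the correspondence in its own table lets one keyword point to many pages and one page carry many keywords without duplicating either.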

  11. User interface and query engine • The web page is generated by a CGI script. • The query engine queries the database. • The results are then formatted.
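
A query engine of this kind could be sketched as two functions: one that asks the database for pages matching every query word, and one that formats the result as HTML for the generated page. This is only an illustration: it assumes hypothetical tables `keywords(id, word)`, `links(id, url)` and `keyword_link(keyword_id, link_id)`, and uses `sqlite3` in place of MySQL so that it runs without a server.

```python
import sqlite3
from html import escape


def search(conn, words):
    """Return the URLs linked to every word in `words`, sorted alphabetically."""
    words = sorted({w.lower() for w in words})
    if not words:
        return []
    marks = ",".join("?" for _ in words)
    rows = conn.execute(
        f"SELECT l.url FROM links l "
        f"JOIN keyword_link kl ON kl.link_id = l.id "
        f"JOIN keywords k ON k.id = kl.keyword_id "
        f"WHERE k.word IN ({marks}) "
        f"GROUP BY l.id HAVING COUNT(DISTINCT k.word) = ? "
        f"ORDER BY l.url",
        (*words, len(words)),
    ).fetchall()
    return [url for (url,) in rows]


def format_results(urls):
    """Render the result list as an HTML fragment, escaping each URL."""
    if not urls:
        return "<p>No results.</p>"
    items = "".join(
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>' for u in urls
    )
    return f"<ul>{items}</ul>"
```

The query words are passed as bound parameters rather than pasted into the SQL text, which protects the database from malformed or hostile input typed into the search form.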

  12. Demonstration (1) • Filling the database

  13. Demonstration (2) • How to search pages?

  14. Conclusion • Results and perspectives • Original search engine. • Easy to improve by adding units to process different file formats (ps, doc, xls, …). • Teamwork and division of tasks. • This project showed us how to use the different tools seen this year.

  15. References • http://www.w3c.org • http://www.mysql.com • http://www.sgi.com/tech/stl • http://www.searchengineshowdown.com
