
HUMANS do it better!

dora
Presentation Transcript


  1. HUMANS do it better! dmoz: The Open Directory Project

  2. What is dmoz? • dmoz stands for Directory MOZilla • Also known as the Open Directory Project (ODP) • Searchable directory, similar to Yahoo! • Administered by Netscape as a non-commercial entity

  3. Who maintains dmoz? • Data maintained by “expert” volunteers • Anyone can become an editor • 47,083 editors • ODP categorizes “quality” information • 378,028 categories

  4. Interface features • Simple • No ads • Browseable directory • Regular and advanced search • http://www.dmoz.org/

  5. Web coverage • dmoz - 3,260,681 documents • Google - 2,073,418,204 documents

  6. dmoz directory structure

  7. RDF Format
     <RDF xmlns:r="http://www.w3.org/TR/RDF/"
          xmlns:d="http://purl.org/dc/elements/1.0/"
          xmlns="http://directory.mozilla.org/rdf">
       <Topic r:id="Top">
         <tag catid="1"/>
         <d:Title>Top</d:Title>
       </Topic>
       <Topic r:id="Top/Arts">
         <tag catid="2"/>
         <d:Title>Arts</d:Title>
         <link r:resource="http://www3.bc.sympatico.ca/PHILLIPSHOTGLASS/GlassPage.html"/>
       </Topic>
       <ExternalPage about="http://www3.bc.sympatico.ca/PHILLIPSHOTGLASS/GlassPage.html">
         <d:Title>John phillips Blown glass</d:Title>
         <d:Description>A small display of glass by John Phillips</d:Description>
       </ExternalPage>
       <Topic r:id="Top/Computers">
         <tag catid="4"/>
         <d:Title>Computers</d:Title>
         <link r:resource="http://www.cs.tcd.ie/FME/"/>
         <link r:resource="http://pages.whowhere.com/computers/pnyhlen/Timeline.html"/>
       </Topic>
       <ExternalPage about="http://www.cs.tcd.ie/FME/">
         <d:Title>FME HUB</d:Title>
         <d:Description>Formal Methods Europe (FME) is a European organization supported by the Commission of the European Union (via ESSI of the ESPRIT programme), with the mission of promoting and supporting the industrial use of formal methods for computer systems development.</d:Description>
       </ExternalPage>
       <ExternalPage about="http://pages.whowhere.com/computers/pnyhlen/Timeline.html">
         <d:Title>Computer Timeline</d:Title>
         <d:Description>A brief description of the eras in computing.</d:Description>
       </ExternalPage>
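As a rough illustration of how data in this format might be read, the following Python sketch parses a trimmed copy of the excerpt with the standard library. The function name `parse_dmoz_rdf`, the `o` namespace prefix, and the closing `</RDF>` tag added to make the sample well-formed are assumptions for this example, not part of the dmoz tooling.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the slide; the prefix "o" is our own label
# for the default (unprefixed) namespace.
NS = {
    "r": "http://www.w3.org/TR/RDF/",
    "d": "http://purl.org/dc/elements/1.0/",
    "o": "http://directory.mozilla.org/rdf",
}

# Trimmed sample from the slide, closed with </RDF> so it parses on its own.
SAMPLE = """<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://directory.mozilla.org/rdf">
  <Topic r:id="Top/Arts">
    <tag catid="2"/>
    <d:Title>Arts</d:Title>
    <link r:resource="http://www3.bc.sympatico.ca/PHILLIPSHOTGLASS/GlassPage.html"/>
  </Topic>
  <ExternalPage about="http://www3.bc.sympatico.ca/PHILLIPSHOTGLASS/GlassPage.html">
    <d:Title>John phillips Blown glass</d:Title>
    <d:Description>A small display of glass by John Phillips</d:Description>
  </ExternalPage>
</RDF>"""


def parse_dmoz_rdf(rdf_text):
    """Return (topics, pages): topic id -> list of URLs, URL -> title/description."""
    root = ET.fromstring(rdf_text)

    pages = {}
    for page in root.findall("o:ExternalPage", NS):
        pages[page.get("about")] = {
            "title": page.findtext("d:Title", default="", namespaces=NS),
            "description": page.findtext("d:Description", default="", namespaces=NS),
        }

    topics = {}
    for topic in root.findall("o:Topic", NS):
        # Prefixed attributes are stored under "{namespace-uri}name".
        topic_id = topic.get("{%s}id" % NS["r"])
        topics[topic_id] = [
            link.get("{%s}resource" % NS["r"])
            for link in topic.findall("o:link", NS)
        ]
    return topics, pages


topics, pages = parse_dmoz_rdf(SAMPLE)
for topic_id, urls in topics.items():
    for url in urls:
        print(topic_id, "->", pages[url]["title"])
```

Each Topic lists its pages by URL, and each ExternalPage carries the title and description, so joining the two on the URL recovers the directory listing.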

  8. Using dmoz data • Data is freely available for download • http://dmoz.org/rdf.html • http://dmoz.org/license.html • Must provide attribution and back-link • No Warranty

  9. dmoz data • Many sites use dmoz data: AOL Search, Google, Lycos, HotBot, and over 200 others • Some sites add enhancements and extensions: Google adds page rank; Lycos adds targeted ads

  10. Searching dmoz • Boolean: implicitly AND; supports AND, OR, ANDNOT; allows shorthand (+, |, -) • Wildcard search (pup*) • Phrasal search • Mixed searches • Field based queries
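The shorthand can be pictured as a small rewriting step. The Python sketch below is only an illustration: it assumes that +, | and - map to AND, OR and ANDNOT in the order the slide lists them, that the shorthand character is attached directly to its term, and that unmarked terms default to AND. The function name `expand_shorthand` is hypothetical and does not come from dmoz.

```python
import re

# Assumed mapping, following the order in which the slide lists the
# operators and their shorthand forms.
SHORTHAND = {"+": "AND", "|": "OR", "-": "ANDNOT"}

# An optional shorthand character followed by a quoted phrase or a bare term.
TOKEN = re.compile(r'([+|-]?)("[^"]*"|\S+)')


def expand_shorthand(query):
    """Return a list of (operator, term) pairs for a shorthand query.

    Terms without a shorthand prefix get the implicit AND.
    Wildcards such as pup* are passed through unchanged.
    """
    return [
        (SHORTHAND.get(prefix, "AND"), term)
        for prefix, term in TOKEN.findall(query)
    ]


print(expand_shorthand('+"Chinese calendar" +"year of the ram"'))
print(expand_shorthand("morgan horse -pup*"))
```

Returning (operator, term) pairs instead of a flat string keeps the operator of the first term, which matters when a query starts with an exclusion.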

  11. Search relevance • Queries are performed against fields in the RDF database: for documents, the title, description, and URL; for categories, the title and terms/keywords • Keywords are chosen manually, so they are potentially more relevant • Results are clustered by category and ranked according to the number of matches within a given category • Ranking shows some inconsistency, and the method doesn't seem to be publicly documented • Some documents are flagged with a star and appear at the top of a directory listing (these do not seem to get special promotion in search results)
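The clustering and ranking described on this slide can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the dmoz implementation: the function name `cluster_by_category`, the alphabetical tie-break, and the sample hits are all made up for illustration.

```python
from collections import defaultdict


def cluster_by_category(hits):
    """Group matching documents by category and rank the categories.

    hits: iterable of (category, url) pairs that matched the query.
    Returns a list of (category, [urls]) with the most matches first;
    ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    clusters = defaultdict(list)
    for category, url in hits:
        clusters[category].append(url)
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical hits, not real dmoz results.
hits = [
    ("Top/Society/Holidays", "http://example.org/a"),
    ("Top/Science/Astronomy", "http://example.org/b"),
    ("Top/Society/Holidays", "http://example.org/c"),
]
for category, urls in cluster_by_category(hits):
    print(len(urls), category)
```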

  12. Relevance feedback • Not directly supported • Web forms for reporting feedback • http://dmoz.org/cgi-bin/feedback.cgi

  13. Engine • Uses I-Search (http://www.etymon.com/Isearch/): open source; modules may be added to enable searching of different document types • dmoz extensions to I-Search: RDF parsing module; special search module, to return sub-records

  14. More about I-Search • Supports many different kinds of queries: vector search (or at least some sort of weighted keyword search); Soundex (looks for "similar" words, English and similar only); Boolean search; geographic search (hits within a given x1,y1,x2,y2 box); field searches (for structured documents, like RDF) • Thesaurus expansion and stopword lists supported • Queries are translated into RPN and pushed onto a stack • Operations/operands are handled in a generic fashion • Has a number of options for searching (for exact terms): dictionary (hash table); binary search of sorted index
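To make the RPN point concrete, here is a minimal Python sketch of evaluating a Boolean query in postfix order with a stack. It is not I-Search code: the function name `eval_rpn`, the use of Python sets for posting lists, and the tiny index are assumptions made for the example.

```python
def eval_rpn(tokens, index):
    """Evaluate a Boolean query given in reverse Polish notation.

    tokens: list of terms and operators, e.g. ["a", "b", "AND"].
    index:  dict mapping each term to the set of document ids containing it.
    """
    operators = {
        "AND": set.intersection,
        "OR": set.union,
        "ANDNOT": set.difference,
    }
    stack = []
    for token in tokens:
        if token in operators:
            # Binary operator: pop the two most recent operands.
            right = stack.pop()
            left = stack.pop()
            stack.append(operators[token](left, right))
        else:
            # Operand: push the documents that contain this term.
            stack.append(set(index.get(token, ())))
    if len(stack) != 1:
        raise ValueError("malformed RPN query")
    return stack[0]


# Hypothetical inverted index: term -> document ids.
index = {"chinese": {1, 2, 3}, "calendar": {2, 3, 4}, "ram": {3, 5}}

# (chinese AND calendar) ANDNOT ram
print(eval_rpn(["chinese", "calendar", "AND", "ram", "ANDNOT"], index))
```

Because every operator is looked up in one table and applied the same way, adding a new operator only means adding an entry, which is one reading of "handled in a generic fashion".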

  15. dmoz vs. UNCA Library Catalog • UNCA Library Catalog has a fixed vocabulary • Library catalog created by trained professionals; dmoz uses “expert” volunteers • Both use field-based queries • dmoz always searches the same fields

  16. dmoz vs. Google • Google uses dmoz’s data • Google is a search engine (good for finding specific information) • dmoz is a directory (good for finding general information) • Google adds page ranking to dmoz documents

  17. Query 1: When is the next year of the Ram on the Chinese calendar?
      Search: +"Chinese calendar" +"year of the ram"
      • Documents returned: Google 10; dmoz 0; Library 0
      • No dead links
      • No overlap
      • Relevance: Google 70%; dmoz N/A; Library N/A
      Search: +"Chinese calendar"
      • Documents returned: Google 15,200; dmoz 10, 7 categories; Library 2
      • No dead links
      • Overlap: 4 pages (Google/dmoz)
      • Relevance: Google 30%; dmoz 30%; Library 50%

  18. Query 2: According to Douglas Adams, author of "Hitchhiker's Guide to the Galaxy," what is the answer to the question: "What is the meaning of life?"
      Search: "douglas adams" hitchhiker guide galaxy "meaning of life"
      • Documents returned: Google ~364; dmoz 0; Library 0
      • No dead links
      • No overlap
      • Relevance: Google 60%; dmoz N/A; Library N/A
      Search: "meaning of life" answer
      • Documents returned: Google 49,700; dmoz 1; Library 0
      • No dead links
      • No overlap
      • Relevance: Google 0%; dmoz 0%; Library 0%

  19. Query 3: Find Morgan horse breeders in North Carolina
      Search: morgan horse breeders north carolina
      • Documents returned: Google 1140; dmoz 0; Library 0
      • No dead links
      • No overlap
      • Relevance: Google 40%; dmoz N/A; Library N/A

  20. Questions?
