1 / 16

Crawler policy document

Crawler policy document. 6 th TF-LSD Meeting Limerick 2.6.2002 Peter Gietz peter@daasi.de. Status. Originally part of the specs of the SUDALIS crawler Implementation in the SUDALIS crawler is on ist way Could be made Part of Deliverable D Comments needed. Root DSE Attributes.

darryle
Download Presentation

Crawler policy document

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Crawler policy document 6th TF-LSD Meeting Limerick 2.6.2002 Peter Gietz peter@daasi.de

  2. Status • Originally part of the specs of the SUDALIS crawler • Implementation in the SUDALIS crawler is on ist way • Could be made Part of Deliverable D • Comments needed

  3. Root DSE Attributes ( sudalis-attributetypes.1 NAME ´supportedCrawlerPolicies´ EQUALITY objectIdentifierMatch SYNTAX numericOID USAGE directoryOperation ) • Just in case that there will be different crawlerpolicy formats

  4. Root DSE Attributes contd. ( sudalis-attributetypes.2 NAME ´indexAreas´ EQUALITY distinguishedNameMatch SYNTAX DN USAGE directoryOperation ) • Pointer to the subtrees that are to be indexed

  5. Object class indexSubentry (sudalis-objectclasses.1 NAME ´indexSubentry´ DESC ´defines index crawler policy’ SUP ldapSubentry STRUCTURAL MUST ( cn ) MAY ( indexCrawlerDN $ indexCrawlerAuthMethod $ indexObjectClasses $ indexAttributes $ indexFilter $ indexAreaLevels $ indexCrawlerVisitFrequency $ indexDescription )

  6. Attr. indexCrawlerDN ( sudalis-attributetypes.3 NAME ´indexCrawlerDN´ EQUALITY distinguishedNameMatch SYNTAX DN USAGE directoryOperation ) • Defines for which crawler(s) the policy is meant

  7. Attr. indexCrawlerAuthMethod ( sudalis-attributetypes.4 NAME ´indexCrawlerAuthMethod´ SYNTAX directoryString EQUALITY caseIgnoreMatch USAGE directoryOperation ) • Defines the authentication method the crawler has to use

  8. Attr. indexObjectClasses ( sudalis-attributetypes.5 NAME ´indexObjectClasses´ SYNTAX OID EQUALITY objectIdentifierMatch USAGE directoryOperation ) • Defines which object class attributes to include in the index. No Filter criteria!

  9. Attr. indexAttributes ( sudalis-attributetypes.6 NAME ´indexAttributes´ SYNTAX OID EQUALITY objectIdentifierMatch USAGE directoryOperation ) • Defines which attributes to crawl.

  10. Attr. indexFilter ( sudalis-attributetypes.7 NAME ´indexFilter´ SYNTAX directoryString EQUALITY caseExactMatch SINGLE-VALUE USAGE directoryOperation) • Filter that should be used by the crawler

  11. Attr. indexAreaLevels ( sudalis-attributetypes.8 NAME ´indexAreaLevels´ SYNTAX INTEGER EQUALITY integerMatch SINGLE-VALUE USAGE directoryOperation ) • Number of hierarchy levels to crawl • If 0 -> crawler go away from this subtree

  12. Attr. indexCrawlerVisitFrequency ( sudalis-attributetypes.9 NAME ´indexCrawlerVisitFrequency´ SYNTAX INTEGER EQUALITY integerMatch SINGLE-VALUE USAGE directoryOperation ) • Defines maximum frequency of crawler visits

  13. Attr. indexDescription ( sudalis-attributetypes.10 NAME ´indexDescription´ SYNTAX directoryString EQUALITY caseExactMatch SINGLE-VALUE USAGE directoryOperation ) • Human readable description

  14. Crawler Policy and Access control • Crawler policy is interpreted by client • Access control is interpreted by server • Access control may be used to enforce crawler policy

  15. Crawler registration • A crawler can register to a server by providing the following data: • Name of the Crawler • Description of the index the crawler collects data for • URI where to access the index • Pointer to a privacy statement about how the data will be used. This statement should comply to the P3P standard (http://www.w3.org/P3P/) • Email address of the crawler manager • Method and needed data (public key) for encrypted email (PGP or S/MIME)

  16. Crawler registration contd. • The data fromthe crawler will be entered in a dedicated entry, together with additional information: • Date of registration • Pointer to the person who made the decision • Date of last visit of the crawler • … • Password

More Related