Contact Data in the RIPE Database Shane Kerr RIPE NCC <shane@ripe.net>
Background & Goal
• Certain kinds of data have caused problems
  • Domain objects (heavy use by ccTLDs)
  • Person objects (heavy use by ccTLDs, etc.)
• Cleanups have been made in the past
  • Consistency fixes
  • Deletions of unnecessary data, one-time and ongoing
• Small numbers of “inconsistencies” are not a problem
• Goal: perform some measurement of data quality
Contact Data
• Contacts are:
  • Referenced by resources recorded in the Database
  • Administrative or technical
• Contacts have:
  • Name
  • Postal address
  • Phone number
  • E-mail address
Focus on e-mail
• Name: impossible to check
• Postal address and phone number: difficult to check
• E-mail address: possible to check
• Sadly, “e-mail:” is optional for person objects
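As a rough illustration of what "optional" means in practice, the sketch below pulls the "e-mail:" values out of the text of a database object. The helper name `attribute_values` and both sample objects are invented for this example, and continuation lines are ignored for brevity.

```python
# Minimal sketch: collect the values of one attribute from an object's text.
# The sample objects below are made up and do not exist in any database.

def attribute_values(object_text: str, name: str) -> list[str]:
    """Return every value of attribute `name` in an "attribute: value" object."""
    values = []
    for line in object_text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip().lower() == name:
            values.append(value.strip())
    return values

PERSON_WITH_EMAIL = """\
person:  Example Person
address: Example Street 1
phone:   +31 20 000 0000
e-mail:  contact@example.net
nic-hdl: EP1-EXAMPLE
"""

PERSON_WITHOUT_EMAIL = """\
person:  Another Person
address: Example Street 2
phone:   +31 20 000 0001
nic-hdl: AP1-EXAMPLE
"""

print(attribute_values(PERSON_WITH_EMAIL, "e-mail"))
print(attribute_values(PERSON_WITHOUT_EMAIL, "e-mail"))
```

An object for which this returns an empty list has no address to check at all, which is the "no e-mail" case in the results.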
Checking the addresses
• Unique e-mail addresses extracted (about 500,000 in all)
• Syntax check to remove garbage and invalid TLDs
• Unique domains extracted (about 280,000 in all)
• DNS checked
  • Algorithm from RFC 2821
  • MX lookups, with fallback to A lookups
• SMTP checked
  • VRFY unreliable
  • Use RSET, MAIL, RCPT for each e-mail address
  • Minimise connections (only 140,000 unique IP addresses)
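The steps above might be sketched roughly as follows. This is not the code used for the measurement: the address pattern, the function names, and the injected `lookup_mx` / `lookup_a` resolver callables are all assumptions made for illustration. The `connection` argument is expected to behave like an `smtplib.SMTP` object.

```python
import re
from typing import Callable, Iterable

# Deliberately loose syntax check (an assumption; the slides give no exact rule):
# one "@", no whitespace, and a dotted domain part.
ADDRESS_RE = re.compile(r"^[^@\s]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)$")

def unique_domains(addresses: Iterable[str], known_tlds: set[str]) -> set[str]:
    """Drop garbage and unknown TLDs; return the lower-cased unique domains."""
    domains = set()
    for address in addresses:
        match = ADDRESS_RE.match(address.strip())
        if not match:
            continue
        domain = match.group(1).lower()
        if domain.rsplit(".", 1)[1] in known_tlds:
            domains.add(domain)
    return domains

def mail_hosts(domain: str,
               lookup_mx: Callable[[str], list[tuple[int, str]]],
               lookup_a: Callable[[str], bool]) -> list[str]:
    """MX hosts in preference order; with no MX, fall back to the domain's A record."""
    mx_records = lookup_mx(domain)  # hypothetical resolver: [(preference, host), ...]
    if mx_records:
        return [host for _, host in sorted(mx_records)]
    return [domain] if lookup_a(domain) else []

def probe(connection, sender: str, recipients: Iterable[str]) -> dict[str, int]:
    """Send RSET, MAIL, RCPT per recipient on one connection; return RCPT reply codes.

    DATA is never sent, so no mail is delivered.
    """
    connection.ehlo_or_helo_if_needed()
    codes = {}
    for recipient in recipients:
        connection.rset()
        connection.mail(sender)
        code, _ = connection.rcpt(recipient)
        codes[recipient] = code
    return codes
```

Grouping all recipients that share a mail host into a single `probe` call is one way to keep the number of connections down.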
Interpreting the Results
• 20% of e-mail addresses can never be reached
• The remaining 80% may still fail
  • Depends on mail software and configuration
  • Impossible to check further without delivering mail
  • Even delivered mail may never be read
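One possible way to turn probe results into these two groups is sketched below. The slides do not say how reply codes were mapped, so the rules here are assumptions: a missing mail host or a permanent (5xx) reply to RCPT counts as unreachable, a 2xx reply counts only as possibly reachable, and anything else is left undetermined.

```python
from typing import Optional

def classify(rcpt_code: Optional[int]) -> str:
    """Map an SMTP reply code for RCPT to a coarse reachability label.

    `None` stands for "no mail host could be found or contacted".
    """
    if rcpt_code is None:
        return "unreachable"
    if 500 <= rcpt_code <= 599:
        return "unreachable"         # permanent failure
    if 200 <= rcpt_code <= 299:
        return "possibly reachable"  # accepted, but delivery may still fail
    return "undetermined"            # e.g. 4xx transient failures

for code in (250, 550, 451, None):
    print(code, classify(code))
```

Note that "possibly reachable" is the strongest label available here: a server that accepts RCPT can still bounce or discard the message later.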
inetnum results
[Charts: inetnum objects and IP addresses, each broken down into reachable, non-reachable, and no e-mail]
• A significant percentage of inetnum objects have no valid e-mail address.
• A much smaller percentage of actual IP addresses has no valid e-mail address, but still a significant amount.
• Most of these are because the “e-mail:” attribute is optional in the person object.
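The difference between counting objects and counting IP addresses can be illustrated with a small sketch. The ranges and labels below are made-up sample data, the function names are invented, and overlapping (nested) ranges are not handled, so addresses covered by more than one object would be counted more than once.

```python
from collections import Counter
from ipaddress import IPv4Address

def range_size(inetnum: str) -> int:
    """Number of addresses in a "first - last" inetnum range."""
    first, last = (int(IPv4Address(part.strip())) for part in inetnum.split("-"))
    return last - first + 1

def shares(results):
    """Return (share per label by object count, share per label by address count)."""
    by_object, by_address = Counter(), Counter()
    for inetnum, label in results:
        by_object[label] += 1
        by_address[label] += range_size(inetnum)
    total_objects = sum(by_object.values())
    total_addresses = sum(by_address.values())
    return (
        {label: n / total_objects for label, n in by_object.items()},
        {label: n / total_addresses for label, n in by_address.items()},
    )

# Made-up sample: one large reachable range, one small range without e-mail.
sample = [
    ("192.0.2.0 - 192.0.2.255", "reachable"),
    ("198.51.100.0 - 198.51.100.7", "no e-mail"),
]
per_object, per_address = shares(sample)
print(per_object)
print(per_address)
```

In this sample half of the objects lack an e-mail address, but they cover only 8 of 264 addresses, which mirrors the pattern described on the slide.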
Conclusions & Questions
• Many networks have no reachable contacts
  • “e-mail:” being optional is a significant reason
• Is this a problem? If so, how big of a problem?
• Possible actions:
  • Make “e-mail:” mandatory
  • Check e-mail reachability on person creation/update
  • Put a “remark:” on networks with unreachable contacts
  • Return parent networks if contacts unreachable
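The last proposed action could work along the following lines. This is only a sketch of the idea, not an existing database feature: the function names are invented, and the `networks` mapping from range to "has a reachable contact" is a hypothetical input with made-up data.

```python
from ipaddress import IPv4Address
from typing import Optional

def parse_range(inetnum: str) -> tuple[int, int]:
    """Convert a "first - last" range into a pair of integers."""
    first, last = (int(IPv4Address(part.strip())) for part in inetnum.split("-"))
    return first, last

def nearest_reachable(query: str, networks: dict[str, bool]) -> Optional[str]:
    """Return the smallest range that encloses `query` and has a reachable contact."""
    query_first, query_last = parse_range(query)
    best = None  # (size, inetnum) of the best candidate so far
    for inetnum, reachable in networks.items():
        first, last = parse_range(inetnum)
        encloses = first <= query_first and query_last <= last
        if reachable and encloses and (best is None or last - first < best[0]):
            best = (last - first, inetnum)
    return best[1] if best else None

# Made-up hierarchy: only the outermost network has a reachable contact.
networks = {
    "192.0.2.0 - 192.0.2.255": True,
    "192.0.2.0 - 192.0.2.127": False,
    "192.0.2.0 - 192.0.2.63": False,
}
print(nearest_reachable("192.0.2.0 - 192.0.2.63", networks))
```

A range that itself has a reachable contact is returned unchanged, so the fallback to a parent only happens when it is needed.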