Overview of Archival Resource Key (ARK) Tools. 1 July 2005. John Kunze, California Digital Library.
ARK Summary
Instead of one Name Authority: Assigning Authority + Mapping Authorities

http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
\___________________/ \__/ \___/ \______/ \____________/
    (replaceable)      |     |      |      4 Qualifier
         |         ARK Label |      |      (NMA-supported)
         |                   |      |
1 Name Mapping Authority     |   3 Name (NAA-assigned)
  Hostport (NMAH)            |
         2 Name Assigning Authority Number (NAAN)

1 = current service provider; identity inert; replaceable
2 = organization that originally assigned the id
3 = name originally assigned to the abstract object, often opaque
4 = extension disclosing object hierarchy & variants, often non-opaque
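The anatomy above can be sketched in code. The following is a minimal Python illustration, not part of any ARK tool: the names `ArkParts`, `ARK_RE`, and `parse_ark` are made up for this example, and the pattern assumes the simple URL shape shown on the slide (no trailing `?` inflection).

```python
import re
from typing import NamedTuple


class ArkParts(NamedTuple):
    """The four numbered parts from the slide (hypothetical container)."""
    nmah: str       # 1: Name Mapping Authority Hostport, replaceable
    naan: str       # 2: Name Assigning Authority Number
    name: str       # 3: Name assigned by the NAA
    qualifier: str  # 4: NMA-supported qualifier, may be empty


# Assumed shape: optional "http://host/" prefix, then "ark:/NAAN/Name[Qualifier]".
ARK_RE = re.compile(
    r"^(?:https?://(?P<nmah>[^/]+)/)?"
    r"ark:/(?P<naan>[^/]+)/(?P<name>[^/.?]+)(?P<qualifier>[^?]*)$"
)


def parse_ark(ark: str) -> ArkParts:
    """Split an ARK (with or without its hostport) into its labeled parts."""
    m = ARK_RE.match(ark)
    if m is None:
        raise ValueError(f"not an ARK: {ark!r}")
    return ArkParts(
        nmah=m["nmah"] or "",
        naan=m["naan"],
        name=m["name"],
        qualifier=m["qualifier"].lstrip("/"),
    )
```

Calling `parse_ark` on the slide's example URL gives NMAH `foobar.zaf.org`, NAAN `12025`, name `654xz321`, and qualifier `s3/f8.05v.tiff`. The same ARK without a hostport parses to an empty NMAH, which reflects that part 1 is replaceable.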
ARK usage
Two ARKs accessing the same thing
    http://loc.gov/ark:/12025/654xz321
    http://rutgers.edu/ark:/12025/654xz321
Access to metadata: add a ‘?’
    http://loc.gov/ark:/12025/654xz321?
Access to support statement: add ‘??’
    http://loc.gov/ark:/12025/654xz321??
• These 3 kinds of access (object, metadata, support statement) are the minimal requirements to be an ARK
• An archive that can’t do all 3: is it trustworthy?
• Is an ARK persistent? Maybe. Have to ask.
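A small Python sketch of the three access forms and of swapping the hostport. The helper names are hypothetical, and the code only manipulates strings; it does not contact any server.

```python
def metadata_url(ark_url: str) -> str:
    """Append a single '?' to request metadata."""
    return ark_url.rstrip("?") + "?"


def support_url(ark_url: str) -> str:
    """Append '??' to request the provider's support statement."""
    return ark_url.rstrip("?") + "??"


def rehost(ark_url: str, new_nmah: str) -> str:
    """Replace the hostport while keeping everything from 'ark:/' onward."""
    start = ark_url.index("ark:/")  # raises ValueError if this is not an ARK
    return f"http://{new_nmah}/{ark_url[start:]}"
```

For example, `rehost("http://loc.gov/ark:/12025/654xz321", "rutgers.edu")` yields the second URL on the slide, which shows why two ARKs that differ only in hostport name the same thing.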
Persistence and opaqueness
• Do ARKs have to be this ugly (opaque)?

http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
\___________________/ \__/ \___/ \______/ \____________/
        NMAH          Label NAAN   Name     Qualifier

• No, but they encourage it. Persistence is all about managing associations between strings and things
• And the landscape is littered with links that were required to die for political, legal, or social reasons
  • The appearance, deliberate or even accidental, of once-true assertions that are now misleading, infringing, or offensive makes it hard for our descendants to continue managing them
• Pain of managing opaque ids is mitigated by the certainty of having strongly bound metadata
A hostname may also break
• Did it break because it appears to assert a branding that is no longer relevant? Have to pay attention to this.
• Semantic rot is inevitable in all ids
• The more opaque, the more protected
• Non-opaque ids are very useful ad hoc metadata containers; in the tradeoff, consider the more regular and complete metadata promised by ARKs
• Non-opaque service label extensions to opaque base ARKs are suitable; e.g., “thumb”, “hi-res”
When the hostname breaks
• Use low-tech, file lookup (like old /etc/hosts)
• Or use MAPTR algorithm in client or plug-in
• Resolver discovery using vanilla DNS and script:

    use strict;
    use warnings;
    use Net::DNS;                           # include simple DNS package

    my $qtype = "NAPTR";                    # initialize query type
    my $naa = shift                         # get NAAN script argument
        or die "usage: $0 NAAN\n";
    my $mad = Net::DNS::Resolver->new;      # mapping authority discovery

    maptr("$naa.ark.arpa");                 # call maptr - that's it

    sub maptr {                             # recursive maptr algorithm
        my $dname = shift;                  # domain name as argument
        my $query = $mad->query($dname, $qtype);
        return if !$query || !$query->answer;   # non-productive query
        foreach my $rr ($query->answer) {
            next if $rr->type ne $qtype;    # skip records of wrong type
            # Use the NAPTR accessors; splitting rdatastr on whitespace
            # leaves the quotes on the flags field, so "" never matches.
            my $flags       = lc $rr->flags;
            my $replacement = $rr->replacement;
            if ($flags eq "") {
                maptr($replacement);        # recurse
            } elsif ($flags eq "h") {
                print "$replacement\n";     # candidate NMAH
            }
        }
    }
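The low-tech file lookup from the first bullet could look something like the Python sketch below. The file format (one NAAN per line followed by candidate hostports, with `#` comments, in the spirit of /etc/hosts) is an assumption made up for this example, as are the function names and the sample table.

```python
def load_nmah_table(text: str) -> dict[str, list[str]]:
    """Parse lines of the form 'NAAN hostport [hostport ...]' (assumed format)."""
    table: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        naan, *hosts = line.split()
        table.setdefault(naan, []).extend(hosts)
    return table


def candidate_urls(ark: str, table: dict[str, list[str]]) -> list[str]:
    """Return one URL per known mapping authority for a hostless 'ark:/...' string."""
    naan = ark.split("ark:/", 1)[1].split("/", 1)[0]
    return [f"http://{host}/{ark}" for host in table.get(naan, [])]


# Hypothetical table contents, reusing the hosts from the usage slide.
SAMPLE_TABLE = """
# NAAN   candidate NMAHs
12025    loc.gov rutgers.edu
"""
```

With the sample table, `candidate_urls("ark:/12025/654xz321", load_nmah_table(SAMPLE_TABLE))` returns the two URLs from the usage slide; an unknown NAAN returns an empty list, at which point a client could fall back to the DNS-based discovery.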
ARK lexical goodies
• Hyphens ignored
  • Neutralizes harm done by typesetters
• Too many search results? Providers may disclose (or not)…
  • Sub-object hierarchy using reserved ‘/’
  • Variant objects using reserved ‘.’
• Usual %hh (hex encoding) as an escape
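A Python sketch of these lexical rules, under stated assumptions: hyphens are dropped only from the part after `ark:/` (hostnames may legitimately contain them), and each `.`-separated suffix within a `/`-separated component is treated as a variant. Both function names are hypothetical.

```python
from urllib.parse import unquote


def strip_hyphens(ark_url: str) -> str:
    """Remove hyphens from the ARK itself, leaving any hostport untouched."""
    head, label, rest = ark_url.partition("ark:/")
    if not label:
        raise ValueError(f"not an ARK: {ark_url!r}")
    return head + label + rest.replace("-", "")


def split_qualifier(qualifier: str) -> list[tuple[str, list[str]]]:
    """Split a qualifier into (component, variants) pairs.

    '/' separates hierarchy levels and '.' separates variants. %hh escapes
    are decoded only after splitting, so an escaped '/' or '.' stays literal.
    """
    levels = []
    for component in qualifier.split("/"):
        if not component:
            continue
        base, *variants = component.split(".")
        levels.append((unquote(base), [unquote(v) for v in variants]))
    return levels
```

For the running example, `split_qualifier("s3/f8.05v.tiff")` gives two levels: `s3` with no variants, then `f8` with variants `05v` and `tiff`.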
ARK namespaces reserved
12025 National Library of Medicine
12026 Library of Congress
12027 National Agricultural Library
13030 California Digital Library
13038 World Intellectual Property Organization
20775 University of California San Diego
29114 University of California San Francisco
28722 University of California Berkeley
15230 Rutgers University Libraries
13960 Internet Archive
64269 Digital Curation Centre
62624 New York University Libraries
67531 University of North Texas Libraries
27927 Ithaka Electronic-Archiving Initiative
12148 National Library of France
Reserve a namespace by email to ark@cdlib.org
The Their Stuff problem is easier
• We can’t do much about Their Stuff except defensively test and fix Our links to it
• Not worth Our ARKs: we can’t vouch for the objects
• Indirect naming may help (e.g., PURL, SFX, etc.)
• So get a link validator, staff to replace dead URLs, and figure out how much effort you’ll expend
• Email Them (external providers), if appropriate, but if They don’t maintain their ids, no scheme will help
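A link validator can start out very small. The Python sketch below is one possible shape, not a recommendation of a particular tool: `probe` and `dead_links` are names invented here, and treating any status of 400 or above as dead is an assumed policy. Some servers reject HEAD requests, so a production checker would likely retry with GET.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0):
    """Return the HTTP status for a HEAD request, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:   # server answered with an error status
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        return None                         # DNS failure, refused, bad URL, ...


def dead_links(urls, probe=probe):
    """Return (url, status) pairs for links that look dead.

    The probe is injectable so the policy can be exercised without a network.
    """
    dead = []
    for url in urls:
        status = probe(url)
        if status is None or status >= 400:
            dead.append((url, status))
    return dead
```

The output of `dead_links` is the work list for the staff who replace dead URLs; its length over time gives a rough measure of the effort involved.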
Our Stuff: Solutions for persistent identifier problems
• Identifier maintenance is different from but deeply implicated in collection management
• Recall: an identifier is [a string and] an association between a string and a thing
• If you maintain object metadata, you already maintain ids (assuming your object has an id)
• Natural to maintain redirection info as one more column of metadata, and ask your DB admin to nightly recreate web server redirect config files
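The nightly job in the last bullet might be sketched as follows in Python. Everything specific here is an assumption for illustration: the `objects` table and its columns, the sample rows, and the choice of Apache-style `Redirect` directives as the config format.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    """Build a hypothetical metadata table; 'location' is the extra column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (ark TEXT PRIMARY KEY, title TEXT, location TEXT)"
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?)",
        [
            ("ark:/13030/demo0001", "Sample object",
             "http://content.example.org/obj/42"),
            ("ark:/13030/demo0002", "Object with no current location", None),
        ],
    )
    return conn


def redirect_config(conn: sqlite3.Connection) -> str:
    """Emit one redirect directive per object that has a current location."""
    rows = conn.execute(
        "SELECT ark, location FROM objects "
        "WHERE location IS NOT NULL ORDER BY ark"
    )
    return "".join(f"Redirect temp /{ark} {location}\n" for ark, location in rows)
```

Writing the returned text to a file that the web server includes, then reloading the server, would complete the nightly cycle. Moving an object then means updating one column.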
Opaque identifier tools
• Non-opaque identifier strings are chosen deliberately to assert some things that are true at the time of assignment
• Opaque identifier strings are best chosen by automated means, such as
  • NOID (nice opaque identifier)
  • Or UUID/GUID (universally unique identifier)
    • Sequence of hex encodings of your computer’s MAC address, current time, and sometimes a random number
    • No need to ask permission or register yourself
    • Looks like something found in nature, but actually it’s based on IEEE and hardware vendor registries
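For the UUID option, Python's standard `uuid` module is enough to see the idea. The wrapper name `mint_uuid` is made up for this sketch. Note that `uuid1()` falls back to a random node value when no hardware address can be read, which corresponds to the "sometimes a random number" case.

```python
import uuid


def mint_uuid(time_based: bool = True) -> uuid.UUID:
    """Mint a version 1 (node + timestamp) or version 4 (random) UUID."""
    return uuid.uuid1() if time_based else uuid.uuid4()


identifier = mint_uuid()
# The 'node' field is the 48-bit hardware address (or a random stand-in)
# and 'time' is the 60-bit timestamp that make a version 1 UUID unique.
print(identifier, hex(identifier.node), identifier.time)
```

No registration step is involved: any machine can call `mint_uuid()` and get a string that is opaque to readers yet very unlikely to collide with anyone else's.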
Nice opaque identifiers (NOID)
• A noid minter is a lightweight database for generating, tracking, and binding unique ids
• The noid tool creates minters and accepts commands that operate them
• Open source, available at www.cpan.org
• Can mint in random or sequential order, with or without a check character guaranteeing against the most common transcription errors
• Anyone can run a noid minter, maintain associations via bindings to arbitrary elements (assertions), and set up a resolver (including rule-based)
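The check character can be illustrated with a short Python sketch. It follows the scheme as described in the noid documentation (position-weighted sum over a 29-character alphabet, modulo 29); treat the details as an assumption to verify against the tool itself. The function names are hypothetical.

```python
# Digits plus consonants, with vowels and 'l' left out; 29 symbols in total.
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"


def check_char(text: str) -> str:
    """Compute the check character for an id that lacks one.

    Each character's alphabet index (0 for characters outside the alphabet,
    such as '/') is multiplied by its 1-based position; the sum modulo 29
    selects the check character.
    """
    total = sum(
        position * max(XDIGITS.find(ch), 0)
        for position, ch in enumerate(text, start=1)
    )
    return XDIGITS[total % len(XDIGITS)]


def is_valid(identifier: str) -> bool:
    """True if the final character matches the check character of the rest."""
    return len(identifier) > 1 and check_char(identifier[:-1]) == identifier[-1]
```

Under this scheme `check_char("13030/f54x54g1")` is `"1"`, which agrees with the id `13030/f54x54g11` shown on the Using NOID slide. Swapping two adjacent characters, as in `13030/f54x45g11`, makes `is_valid` return `False`.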
Using NOID
• Identifiers minted according to a template:
    noid dbcreate f5.reedeedk long 13030
  which produces as first minted id
    13030/f54x54g11
• Noid is scheme-independent
  • Can be used to mint DOIs, URNs, URLs, lotto numbers, etc.
• We (at CDL) use it to mint random ARKs with check chars
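To make the template concrete, here is a Python sketch that produces ids of the same shape as `f5.reedeedk`. It is not the noid tool: it reads `e` as an extended digit, `d` as a decimal digit, and a trailing `k` as a check character, and it draws characters independently at random instead of tracking what has been minted as a real minter does. The `mint` function and its parameters are invented for this example.

```python
import random

XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"  # extended digits ('e')
DIGITS = "0123456789"                      # decimal digits ('d')


def check_char(text: str) -> str:
    """Position-weighted sum modulo 29, mapped into the extended digits."""
    total = sum(
        position * max(XDIGITS.find(ch), 0)
        for position, ch in enumerate(text, start=1)
    )
    return XDIGITS[total % len(XDIGITS)]


def mint(naan: str, template: str, rng: random.Random) -> str:
    """Mint one id shaped like 'prefix.mask', e.g. 'f5.reedeedk'."""
    prefix, mask = template.split(".", 1)
    body = prefix
    for code in mask[1:]:              # mask[0] names the minting order
        if code == "e":
            body += rng.choice(XDIGITS)
        elif code == "d":
            body += rng.choice(DIGITS)
        elif code != "k":
            raise ValueError(f"unsupported mask character: {code!r}")
    identifier = f"{naan}/{body}"
    if mask.endswith("k"):             # check character covers NAAN and name
        identifier += check_char(identifier)
    return identifier
```

A call such as `mint("13030", "f5.reedeedk", random.Random())` returns a 15-character string that starts with `13030/f5` and ends in a check character, the same layout as the id on the slide. Nothing in it is ARK-specific, which is the sense in which minting is scheme-independent.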
ARK Documentation
• ARK specification
    http://www.ietf.org/internet-drafts/draft-kunze-ark-09.txt
• ARK information sites
    http://www.cdlib.org/inside/diglib/ark/
    http://ark.nlm.nih.gov/
• Overview article
    http://www.infotoday.com/cilmag/feb04/primers.shtml
• Background paper
    http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze