Uniform Resource Identifiers
Jacek Kopecký
WSML Working Group
June 2004
Overview
• History of URIs
• URI syntax
• URI references and their resolution
• Good practices for creating URIs
• Interesting issues
Jacek Kopecký, jacek.kopecky@deri.org
URI History
• Universal Resource Identifiers (RFC 1630, June 1994)
• Uniform Resource Locators and Names
• RFC 2396, August 1998
• 2396bis in development
• Originally “Universal”, later “Uniform” as a compromise
• “Universal” again preferred by TimBL
URLs and URNs
• Locators (addresses) vs. Names
• URNs not easily dereferencable
• URNs can be made dereferencable by infrastructure
• URLs perceived as less persistent
• URLs and URNs drifting towards middle ground
  • http://www.w3.org/DesignIssues/NameMyth.html
• No point in making the distinction any more
Uniform Resource Identifiers
• URIs “identify” “resources”
• Identification doesn’t imply interaction
• Resource is a sameness of characteristics over time
  • Latest blog rant
  • Latest blog rant on politics
  • Blog rant on politics from 2004-6-22
• Resource need not be accessible when URI is created
  • Pictures from my future trip to London will be at http://jacek.cz/photos/2004-08-london
URI Syntax
• According to 2396bis
  • http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
• Examples
  • http://www.ietf.org/rfc/rfc2396.txt
  • mailto:John.Doe@example.com
  • news:comp.infosystems.www.servers.unix
  • telnet://melvyl.ucop.edu/
• URI Syntax - simplified
  • scheme: [//authority] [/path] [?query] [#fragid]
• Relative URI without “scheme:”
• Dot path segments (‘.’ and ‘..’) treated specially
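The simplified syntax above can be tried out with the component-splitting regular expression given in Appendix B of RFC 2396. The following is only a sketch: the helper name `split_uri` and the dictionary keys are illustrative choices, not something the slides or the RFC define.

```python
import re

# Regular expression from RFC 2396, Appendix B, for splitting a URI reference.
URI_PATTERN = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference):
    """Split a URI reference into five components (hypothetical helper).

    Components that are absent come back as None; the path is always a
    string, possibly empty.
    """
    match = URI_PATTERN.match(reference)  # every part is optional, so this always matches
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragid": match.group(9),
    }


for example in (
    "http://www.ietf.org/rfc/rfc2396.txt",
    "mailto:John.Doe@example.com",
    "telnet://melvyl.ucop.edu/",
    "../g?y#s",  # a relative reference: no scheme, no authority
):
    print(example, split_uri(example))
```

Keeping "absent" (None) apart from "empty" matters later, because reference resolution treats an undefined component differently from an empty one.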
URI Syntax cont’d
• Reserved characters (like /:?#@$&+* )
• Many allowed characters
• Rest of Unicode percent-encoded from UTF-8
  • http://google.com/search?q=kopeck%C3%BD
• Percent-encoding allowed characters creates equivalent URIs
• But namespaces compared char-by-char
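A short Python sketch of the two points above, using the standard `urllib.parse` module: non-ASCII characters are percent-encoded from their UTF-8 bytes, and percent-encoding an allowed character gives an equivalent URI that still differs when compared char-by-char. The `http://example.com/ns/...` URIs are made up for illustration.

```python
from urllib.parse import quote, unquote

# "ý" is U+00FD, which is the two bytes C3 BD in UTF-8.
encoded = quote("kopecký")
print(encoded)           # kopeck%C3%BD
print(unquote(encoded))  # kopecký

# Percent-encoding an allowed character ("a" as %61) yields an equivalent URI...
plain = "http://example.com/ns/a"
escaped = "http://example.com/ns/%61"
print(unquote(escaped) == plain)  # True

# ...but a char-by-char comparison, as used for namespaces, sees two different strings.
print(escaped == plain)  # False
```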
URI Reference Resolution
• Resolving URI A against base URI B
• Going from the left, keep as much from B as is undefined in A
• First part of A replaces that part from B
• Path resolution special
  • If A has absolute path, that is taken
  • Relative path from A resolved against path from B, removing dot segments from result
• Everything after first part of A taken from A
• Fragment always taken from A
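The path part of the procedure above can be sketched in a few lines of Python. This is a simplified illustration rather than the full algorithm from the specification: it only handles paths, it assumes the base path is absolute (starts with ‘/’), and the function names `remove_dot_segments` and `resolve_path` are chosen here for illustration.

```python
def remove_dot_segments(path):
    """Remove '.' and '..' segments from an absolute path (one starting with '/')."""
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            # "." is dropped; a trailing "." leaves a trailing slash.
            if is_last:
                output.append("")
        elif segment == "..":
            # ".." removes the previous segment, but never climbs above the root.
            if len(output) > 1:
                output.pop()
            if is_last:
                output.append("")
        else:
            output.append(segment)
    return "/".join(output)


def resolve_path(base_path, ref_path):
    """Resolve the path of reference A against the path of base B."""
    if ref_path == "":
        return base_path      # nothing defined in A: keep B's path
    if ref_path.startswith("/"):
        merged = ref_path     # absolute path in A is taken as is
    else:
        # Relative path: replace everything after the last "/" of B's path.
        merged = base_path[: base_path.rfind("/") + 1] + ref_path
    return remove_dot_segments(merged)


print(resolve_path("/b/c/d", "g"))          # /b/c/g
print(resolve_path("/b/c/d", "../g"))       # /b/g
print(resolve_path("/b/c/d", "../../../g")) # /g
```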
URI Ref. Resolution Examples
• Base URI: http://a/b/c/d?e#f
• g = http://a/b/c/g
• . = http://a/b/c/
• ./ = http://a/b/c/
• ./g = http://a/b/c/g
• .. = http://a/b/
• ../ = http://a/b/
• ../g = http://a/b/g
• ../../g = http://a/g
• ../../../g = http://a/g
URI Ref. Resolution Examples
• Base URI: http://a/b/c/d?e#f
• /./g = http://a/g
• //g = http://g
• #s = http://a/b/c/d?e#s
• g#s = http://a/b/c/g#s
• ?y = http://a/b/c/d?y
• g?y = http://a/b/c/g?y
• g?y#s = http://a/b/c/g?y#s
• g:h = g:h
• ./g:h = http://a/b/c/g:h
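Examples like these can be checked with `urljoin` from Python's standard `urllib.parse` module, which on recent Python 3 versions resolves dot segments in the same way. The snippet below is a sketch for experimenting, not a conformance test; the `EXAMPLES` list is simply a selection of the references from the slides.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d?e#f"

# (reference, expected result) pairs
EXAMPLES = [
    ("g", "http://a/b/c/g"),
    (".", "http://a/b/c/"),
    ("..", "http://a/b/"),
    ("../../../g", "http://a/g"),
    ("/./g", "http://a/g"),
    ("//g", "http://g"),
    ("#s", "http://a/b/c/d?e#s"),
    ("?y", "http://a/b/c/d?y"),
    ("g?y#s", "http://a/b/c/g?y#s"),
    ("g:h", "g:h"),                   # "g" is read as a scheme, so nothing is inherited
    ("./g:h", "http://a/b/c/g:h"),    # the leading "./" keeps "g:h" a path segment
]

for reference, expected in EXAMPLES:
    resolved = urljoin(BASE, reference)
    print(f"{reference:12} = {resolved}  (matches: {resolved == expected})")
```

The last two rows show why the ‘./’ prefix is useful: it is the way to write a relative path whose first segment contains a colon.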
Base URIs
• Necessary when resolving URI references
• Explicit base URI embedded in content
  • <link xml:base="http://example.com/bar/" href="x.html" />
• URI of the document
  • Usual in HTML files on the web
• App-dependent base URI default
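A minimal Python sketch of the first two cases: an explicit `xml:base` on the element wins, otherwise the URI of the document is used. It deliberately looks only at the element itself (a full implementation would also consult `xml:base` on ancestor elements), and the function name `resolve_href` and the document URI `http://example.org/docs/index.xml` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the fixed XML namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_href(element, document_uri):
    """Resolve an element's href against its xml:base, else the document URI."""
    # xml:base may itself be relative, so it is resolved against the document URI;
    # joining with an empty string returns the document URI unchanged.
    base = urljoin(document_uri, element.get(XML_BASE, ""))
    return urljoin(base, element.get("href"))


document_uri = "http://example.org/docs/index.xml"  # hypothetical document location

with_base = ET.fromstring('<link xml:base="http://example.com/bar/" href="x.html" />')
without_base = ET.fromstring('<link href="x.html" />')

print(resolve_href(with_base, document_uri))     # http://example.com/bar/x.html
print(resolve_href(without_base, document_uri))  # http://example.org/docs/x.html
```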
URI Equivalence
• Do two URIs identify the same resource?
• Comparing without accessing the resources
• Various applications for URI comparison
  • Increasing cache efficiency
  • Comparing the namespaces of two symbols
• Algorithms must avoid false positives
• False negatives unavoidable
  • http://weather.example.com/innsbruck
  • http://jacek.cz/innsbruckweather redirects to the above
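One way to compare URIs without accessing the resources is to normalize both and compare the results as strings. The Python sketch below applies only two conservative steps (lower-casing scheme and host, and tidying percent-encoding), so it should not produce false positives but will certainly produce false negatives. The function name `normalize` is illustrative, and the code assumes the authority contains no user information, since that part would be case-sensitive.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Characters that never need percent-encoding.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def _normalize_escapes(text):
    """Decode escapes of unreserved characters; upper-case the hex of the rest."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def normalize(uri):
    """Conservative syntax-based normalization (sketch, not a full algorithm)."""
    parts = urlsplit(uri)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # assumes no case-sensitive userinfo in the authority
        _normalize_escapes(parts.path),
        _normalize_escapes(parts.query),
        _normalize_escapes(parts.fragment),
    ))


# Equivalent spellings compare equal after normalization.
print(normalize("HTTP://Example.COM/%7Ejacek") == normalize("http://example.com/~jacek"))

# A redirect cannot be detected without accessing the resource: a false negative.
print(normalize("http://jacek.cz/innsbruckweather")
      == normalize("http://weather.example.com/innsbruck"))
```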
Uses of URIs
• Addresses on the Web
• Namespaces in XML QNames
• Namespaces in QNames in other languages
• Identifiers of things and concepts (e.g. RDF)
• Unique keys (e.g. MIME message ID)
QName
• Introduced in XML Namespaces
• Name of an XML namespace-qualified element
• RDF uses QNames for brevity of URI notation
• XML Schema expanded use of QNames to further things (6 symbol spaces)
• Every following language uses QNames as identifiers
• Number of independent symbol spaces
• => Turning QNames into URIs is cumbersome
• Should have been as simple as in RDF (IMHO)
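For comparison, the RDF approach is plain concatenation of the namespace URI and the local name. A minimal Python sketch, where the function name `qname_to_uri` and the prefix table are illustrative assumptions (the two namespace URIs are the usual Dublin Core and RDF ones):

```python
# Prefix-to-namespace bindings, as they would be declared in a document.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def qname_to_uri(qname, namespaces):
    """Expand prefix:local into a URI by concatenation, the way RDF does."""
    prefix, _, local_name = qname.partition(":")
    return namespaces[prefix] + local_name


print(qname_to_uri("dc:title", NAMESPACES))  # http://purl.org/dc/elements/1.1/title
print(qname_to_uri("rdf:type", NAMESPACES))  # http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```

This only works cleanly when the namespace URI ends with ‘/’ or ‘#’ and when there is a single symbol space; with several independent symbol spaces the same QName can name different things, which is why the mapping gets cumbersome.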
Creating URIs for Web Resources
• Versioning approach for persistence
  • http://w3.org/TR/soap vs.
  • http://w3.org/TR/soap12 vs.
  • http://w3.org/TR/2003/REC-soap12-part1-20030624/
• Simple, memorable URIs
  • http://jacek.cz/blog
  • Scribbled on a napkin
  • Correcting spelling and case helps – mod_speling
  • Making the “www.” prefix optional (both ways) helps
  • Content negotiation – drop .html (.php, .asp)
• URI changes harmful
Creating Example URIs
• http://example.com
• http://example.net
• http://example.org
• Reserved for precisely this purpose
• Or use your own domain (deri.org, wsmo.org)
• http://foo.com not good
Creating URIs for Namespaces
• Dereferencable, ending with ‘/’ or ‘#’
• Canonical URIs – no unnecessary dot segments or percent-encoding
  • Namespaces compared char-by-char
• Namespace document
  • Preferably in the language that uses the namespace – enables automatic discovery
  • With human-oriented descriptions
  • To allow for the above, don’t share namespace URIs for schema and WSDL
Creating URIs for Concepts
• Group concepts in a common, dereferencable namespace
• Each concept identified by its fragID
• In RDF/XML, namespace ends with ‘#’
• Namespace document describes the concepts
• Two problems
  • FragIDs depend on media types
  • Can http://example.com/#car identify a car?
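The split between the namespace document and the concept's fragID can be shown with `urldefrag` from Python's standard library. This is only an illustration of the naming pattern, using the example URI from the slide; the second concept name `bike` is made up.

```python
from urllib.parse import urldefrag

# The part before "#" is what gets dereferenced; the fragID names the concept.
namespace_document, concept = urldefrag("http://example.com/#car")
print(namespace_document)  # http://example.com/
print(concept)             # car

# Minting another concept URI in the same namespace is simple concatenation.
namespace = "http://example.com/#"
print(namespace + "bike")  # http://example.com/#bike
```

Because only the part before ‘#’ is used to retrieve the namespace document, what `car` means is left to the media type of that document, which is the first of the two problems listed above.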
Fragment IDs in URIs
• Fragment ID identifies a secondary resource
• Interpretation of fragment IDs depends on media type
  • In HTML <a name="foo">
  • In XML <element xml:id="foo"/>
  • No meaning in JPEG
• xml:id in development
  • So far language-dependent (often DTD) solutions
• Fragment IDs should mean the same thing across media types with content negotiation
Range of HTTP URIs?
• Open W3C TAG issue
• Can an http: URI identify a car?
• Can I say http://jacek.cz/dragstar/ is my motorbike?
• TimBL doesn’t seem to think so
• Is it necessary to distinguish between a thing and a description of that thing?
Other Interesting Issues
• data: URI scheme – the URI is the resource
  • RFC 2397
  • data:image/gif;base64,R0lGODdhMAAwAPAA…
• mailto: scheme a misnomer
  • URIs specify identifiers, not actions
• uuid: scheme for unique identifiers
  • Good for transient identification in closed systems
• Mismatches between perceived and intended meaning of a resource
  • http://w3.org/tr/soap
• Should URIs be human-readable?
  • http://www.bscw.semanticweb.org/bscw/bscw.cgi/0/21621
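To make “the URI is the resource” concrete, here is a small Python sketch that packs some bytes into a data: URI and reads them back. It handles only the base64 form without extra parameters, and the helper names `make_data_uri` and `read_data_uri` are hypothetical.

```python
import base64


def make_data_uri(payload, media_type):
    """Embed the bytes of a resource directly in a base64 data: URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return "data:" + media_type + ";base64," + encoded


def read_data_uri(uri):
    """Recover (media type, bytes) from a base64 data: URI; no network access needed."""
    header, _, encoded = uri.partition(",")
    media_type = header[len("data:"):].split(";")[0]
    return media_type, base64.b64decode(encoded)


uri = make_data_uri(b"Hello", "text/plain")
print(uri)                 # data:text/plain;base64,SGVsbG8=
print(read_data_uri(uri))  # ('text/plain', b'Hello')
```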
Main Points
• Cool URIs don’t change
• URIs can be (and are) scribbled on napkins
• URIs don’t (necessarily) point to documents
• Dereferencable URIs also good as names
• URLs, URNs obsolete
References
• http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
• http://www.ietf.org/rfc/rfc2396.txt
• http://www.w3.org/Provider/Style/URI
• http://www.w3.org/DesignIssues/Architecture.html
• http://www.w3.org/DesignIssues/Axioms.html
• http://www.w3.org/DesignIssues/NameMyth.html
Hope it Helped
• Thanks for your attention
• Questions? Comments?
• jacek.kopecky@deri.org