Network Working Group                                     T. Berners-Lee
Request for Comments: 2396                                       MIT/LCS
Updates: 1808, 1738                                          R. Fielding
Category: Standards Track                                    U.C. Irvine
                                                             L. Masinter
                                                       Xerox Corporation
                                                             August 1998


           Uniform Resource Identifiers (URI): Generic Syntax

Status of this Memo

This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

IESG Note

This paper describes a "superset" of operations that can be applied
to URI. It consists of both a grammar and a description of basic
functionality for URI. To understand what is a valid URI, both the
grammar and the associated description have to be studied. Some of
the functionality described is not applicable to all URI schemes, and
some operations are only possible when certain media types are
retrieved using the URI, regardless of the scheme used.

Abstract

A Uniform Resource Identifier (URI) is a compact string of characters
for identifying an abstract or physical resource. This document
defines the generic syntax of URI, including both absolute and
relative forms, and guidelines for their use; it revises and replaces
the generic definitions in RFC 1738 and RFC 1808.

This document defines a grammar that is a superset of all valid URI,
such that an implementation can parse the common components of a URI
reference without knowing the scheme-specific requirements of every
possible identifier type. This document does not define a generative
grammar for URI; that task will be performed by the individual
specifications of each URI scheme.

Berners-Lee, et. al.        Standards Track                     [Page 1]

RFC 2396                   URI Generic Syntax                August 1998

1. Introduction

Uniform Resource Identifiers (URI) provide a simple and extensible
means for identifying a resource. This specification of URI syntax
and semantics is derived from concepts introduced by the World Wide
Web global information initiative, whose use of such objects dates
from 1990 and is described in "Universal Resource Identifiers in WWW"
[RFC1630]. The specification of URI is designed to meet the
recommendations laid out in "Functional Recommendations for Internet
Resource Locators" [RFC1736] and "Functional Requirements for Uniform
Resource Names" [RFC1737].

This document updates and merges "Uniform Resource Locators"
[RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order
to define a single, generic syntax for all URI. It excludes those
portions of RFC 1738 that defined the specific syntax of individual
URL schemes; those portions will be updated as separate documents, as
will the process for registration of new URI schemes. This document
does not discuss the issues and recommendations for dealing with
characters outside of the US-ASCII character set [ASCII]; those
recommendations are discussed in a separate document.

All significant changes from the prior RFCs are noted in Appendix G.

1.1. Overview of URI

URI are characterized by the following definitions:

   Uniform
      Uniformity provides several benefits: it allows different types
      of resource identifiers to be used in the same context, even
      when the mechanisms used to access those resources may differ;
      it allows uniform semantic interpretation of common syntactic
      conventions across different types of resource identifiers; it
      allows introduction of new types of resource identifiers
      without interfering with the way that existing identifiers are
      used; and, it allows the identifiers to be reused in many
      different contexts, thus permitting new applications or
      protocols to leverage a pre-existing, large, and widely-used
      set of resource identifiers.

   Resource
      A resource can be anything that has identity. Familiar
      examples include an electronic document, an image, a service
      (e.g., "today's weather report for Los Angeles"), and a
      collection of other resources. Not all resources are network
      "retrievable"; e.g., human beings, corporations, and bound
      books in a library can also be considered resources.

      The resource is the conceptual mapping to an entity or set of
      entities, not necessarily the entity which corresponds to that
      mapping at any particular instance in time. Thus, a resource
      can remain constant even when its content (the entities to
      which it currently corresponds) changes over time, provided
      that the conceptual mapping is not changed in the process.

   Identifier
      An identifier is an object that can act as a reference to
      something that has identity. In the case of URI, the object is
      a sequence of characters with a restricted syntax.

Having identified a resource, a system may perform a variety of
operations on the resource, as might be characterized by such words
as "access", "update", "replace", or "find attributes".

1.2. URI, URL, and URN

A URI can be further classified as a locator, a name, or both. The
term "Uniform Resource Locator" (URL) refers to the subset of URI
that identify resources via a representation of their primary access
mechanism (e.g., their network "location"), rather than identifying
the resource by name or by some other attribute(s) of that resource.
The term "Uniform Resource Name" (URN) refers to the subset of URI
that are required to remain globally unique and persistent even when
the resource ceases to exist or becomes unavailable.

The URI scheme (Section 3.1) defines the namespace of the URI, and
thus may further restrict the syntax and semantics of identifiers
using that scheme. This specification defines those elements of the
URI syntax that are either required of all URI schemes or are common
to many URI schemes. It thus defines the syntax and semantics that
are needed to implement a scheme-independent parsing mechanism for
URI references, such that the scheme-dependent handling of a URI can
be postponed until the scheme-dependent semantics are needed. We use
the term URL below when describing syntax or semantics that only
apply to locators.

Although many URL schemes are named after protocols, this does not
imply that the only way to access the URL's resource is via the named
protocol. Gateways, proxies, caches, and name resolution services
might be used to access some resources, independent of the protocol
of their origin, and the resolution of some URL may require the use
of more than one protocol (e.g., both DNS and HTTP are typically used
to access an "http" URL's resource when it can't be found in a local
cache).

A URN differs from a URL in that its primary purpose is persistent
labeling of a resource with an identifier. That identifier is drawn
from one of a set of defined namespaces, each of which has its own
name structure and assignment procedures. The "urn" scheme has
been reserved to establish the requirements for a standardized URN
namespace, as defined in "URN Syntax" [RFC2141] and its related
specifications.

Most of the examples in this specification demonstrate URL, since
they allow the most varied use of the syntax and often have a
hierarchical namespace. A parser of the URI syntax is capable of
parsing both URL and URN references as a generic URI; once the scheme
is determined, the scheme-specific parsing can be performed on the
generic URI components. In other words, the URI syntax is a superset
of the syntax of all URI schemes.

1.3. Example URI

The following examples illustrate URI that are in common use.

      ftp://ftp.is.co.za/rfc/rfc1808.txt
         -- ftp scheme for File Transfer Protocol services

      gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
         -- gopher scheme for Gopher and Gopher+ Protocol services

      http://www.math.uio.no/faq/compression-faq/part1.html
         -- http scheme for Hypertext Transfer Protocol services

      mailto:mduerst@ifi.unizh.ch
         -- mailto scheme for electronic mail addresses

      news:comp.infosystems.www.servers.unix
         -- news scheme for USENET news groups and articles

      telnet://melvyl.ucop.edu/
         -- telnet scheme for interactive services via the TELNET Protocol

1.4. Hierarchical URI and Relative Forms

An absolute identifier refers to a resource independent of the
context in which the identifier is used. In contrast, a relative
identifier refers to a resource by describing the difference within a
hierarchical namespace between the current context and an absolute
identifier of the resource.

Some URI schemes support a hierarchical naming system, where the
hierarchy of the name is denoted by a "/" delimiter separating the
components in the scheme. This document defines a scheme-independent
"relative" form of URI reference that can be used in conjunction with
a "base" URI (of a hierarchical scheme) to produce another URI. The
syntax of hierarchical URI is described in Section 3; the relative
URI calculation is described in Section 5.

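As a rough illustration of how a relative reference combines with a
base URI, the sketch below uses urljoin from Python's standard
urllib.parse module. The base URI "http://a/b/c/d;p?q" and the helper
name resolve are assumptions made for this example only. urljoin
follows later revisions of the resolution rules, so it should be read
as an approximation of the calculation in Section 5, not a definition
of it.

```python
from urllib.parse import urljoin

# Hypothetical base URI, chosen only for illustration.
BASE = "http://a/b/c/d;p?q"


def resolve(reference: str, base: str = BASE) -> str:
    """Combine a (possibly relative) URI reference with a base URI."""
    return urljoin(base, reference)


if __name__ == "__main__":
    # Each reference describes a "difference" from the base context.
    for ref in ("g", "./g", "../g", "/g", "//g"):
        print(f"{ref:6} -> {resolve(ref)}")
```

The shorter the reference, the more of the base it inherits: "g"
replaces only the last path segment, "/g" replaces the whole path, and
"//g" replaces the authority as well.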
1.5. URI Transcribability

The URI syntax was designed with global transcribability as one of
its main concerns. A URI is a sequence of characters from a very
limited set, i.e. the letters of the basic Latin alphabet, digits,
and a few special characters. A URI may be represented in a variety
of ways: e.g., ink on paper, pixels on a screen, or a sequence of
octets in a coded character set. The interpretation of a URI depends
only on the characters used and not how those characters are
represented in a network protocol.

The goal of transcribability can be described by a simple scenario.
Imagine two colleagues, Sam and Kim, sitting in a pub at an
international conference and exchanging research ideas. Sam asks Kim
for a location to get more information, so Kim writes the URI for the
research site on a napkin. Upon returning home, Sam takes out the
napkin and types the URI into a computer, which then retrieves the
information to which Kim referred.

There are several design concerns revealed by the scenario:

   o  A URI is a sequence of characters, which is not always
      represented as a sequence of octets.

   o  A URI may be transcribed from a non-network source, and thus
      should consist of characters that are most likely to be able to
      be typed into a computer, within the constraints imposed by
      keyboards (and related input devices) across languages and
      locales.

   o  A URI often needs to be remembered by people, and it is easier
      for people to remember a URI when it consists of meaningful
      components.

These design concerns are not always in alignment. For example, it
is often the case that the most meaningful name for a URI component
would require characters that cannot be typed into some systems. The
ability to transcribe the resource identifier from one medium to
another was considered more important than having its URI consist of
the most meaningful of components. In local and regional contexts
and with improving technology, users might benefit from being able to
use a wider range of characters; such use is not defined in this
document.

1.6. Syntax Notation and Common Elements

This document uses two conventions to describe and define the syntax
for URI. The first, called the layout form, is a general description
of the order of components and component separators, as in

      <first>/<second>;<third>?<fourth>

The component names are enclosed in angle-brackets and any characters
outside angle-brackets are literal separators. Whitespace should be
ignored. These descriptions are used informally and do not define
the syntax requirements.

The second convention is a BNF-like grammar, used to define the
formal URI syntax. The grammar is that of [RFC822], except that "|"
is used to designate alternatives. Briefly, rules are separated from
definitions by an equals sign "=", indentation is used to continue a
rule definition over more than one line, literals are quoted with "",
parentheses "(" and ")" are used to group elements, optional elements
are enclosed in "[" and "]" brackets, and elements may be preceded
with <n>* to designate n or more repetitions of the following
element; n defaults to 0.

Unlike many specifications that use a BNF-like grammar to define the
bytes (octets) allowed by a protocol, the URI grammar is defined in
terms of characters. Each literal in the grammar corresponds to the
character it represents, rather than to the octet encoding of that
character in any particular coded character set. How a URI is
represented in terms of bits and bytes on the wire is dependent upon
the character encoding of the protocol used to transport it, or the
charset of the document which contains it.

The following definitions are common to many elements:

      alpha    = lowalpha | upalpha

      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"

      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

      alphanum = alpha | digit

The complete URI syntax is collected in Appendix A.

2. URI Characters and Escape Sequences

URI consist of a restricted set of characters, primarily chosen to
aid transcribability and usability both in computer systems and in
non-computer communications. Characters used conventionally as
delimiters around URI were excluded. The restricted set of
characters consists of digits, letters, and a few graphic symbols
chosen from those common to most of the character encodings and
input facilities available to Internet users.

      uric          = reserved | unreserved | escaped

Within a URI, characters are either used as delimiters, or to
represent strings of data (octets) within the delimited portions.
Octets are either represented directly by a character (using the
US-ASCII character for that octet [ASCII]) or by an escape encoding.
This representation is elaborated below.

2.1. URI and non-ASCII characters

The relationship between URI and characters has been a source of
confusion for characters that are not part of US-ASCII. To describe
the relationship, it is useful to distinguish between a "character"
(as a distinguishable semantic entity) and an "octet" (an 8-bit
byte). There are two mappings, one from URI characters to octets, and
a second from octets to original characters:

      URI character sequence->octet sequence->original character sequence

A URI is represented as a sequence of characters, not as a sequence
of octets. That is because URI might be "transported" by means that
are not through a computer network, e.g., printed on paper, read over
the radio, etc.

A URI scheme may define a mapping from URI characters to octets;
whether this is done depends on the scheme. Commonly, within a
delimited component of a URI, a sequence of characters may be used to
represent a sequence of octets. For example, the character "a"
represents the octet 97 (decimal), while the character sequence "%",
"0", "a" represents the octet 10 (decimal).

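The first mapping, from URI characters to octets, can be sketched as
follows. This is a minimal illustration that assumes the common
convention described above (a US-ASCII character stands for its own
octet, and "%" followed by two hexadecimal digits stands for the octet
with that value). The function name uri_chars_to_octets is
hypothetical and not part of any scheme definition.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"


def uri_chars_to_octets(component: str) -> bytes:
    """Map the characters of one URI component to the octets they represent."""
    octets = bytearray()
    i = 0
    while i < len(component):
        ch = component[i]
        if ch == "%":
            # An escape is "%" plus exactly two hexadecimal digits.
            pair = component[i + 1:i + 3]
            if len(pair) != 2 or any(c not in HEX_DIGITS for c in pair):
                raise ValueError(f"malformed escape at index {i}")
            octets.append(int(pair, 16))
            i += 3
        else:
            # Any other character stands for its own US-ASCII octet.
            if ord(ch) > 0x7F:
                raise ValueError(f"non-ASCII character at index {i}")
            octets.append(ord(ch))
            i += 1
    return bytes(octets)
```

With this sketch, the character "a" yields the single octet 97 and the
sequence "%0a" yields the single octet 10, matching the example in the
text.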
There is a second translation for some resources: the sequence of
octets defined by a component of the URI is subsequently used to
represent a sequence of characters. A 'charset' defines this mapping.
There are many charsets in use in Internet protocols. For example,
UTF-8 [UTF-8] defines a mapping from sequences of octets to sequences
of characters in the repertoire of ISO 10646.

In the simplest case, the original character sequence contains only
characters that are defined in US-ASCII, and the two levels of
mapping are simple and easily invertible: each 'original character'
is represented as the octet for the US-ASCII code for it, which is,
in turn, represented as either the US-ASCII character, or else the
"%" escape sequence for that octet.

For original character sequences that contain non-ASCII characters,
however, the situation is more difficult. Internet protocols that
transmit octet sequences intended to represent character sequences
are expected to provide some way of identifying the charset used, if
there might be more than one [RFC2277]. However, there is currently
no provision within the generic URI syntax to accomplish this
identification. An individual URI scheme may require a single
charset, define a default charset, or provide a way to indicate the
charset used.

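To make the difficulty concrete, the sketch below decodes one escaped
component under two different charsets. The component "%C3%A9" and the
helper name interpret are assumptions chosen for illustration;
unquote_to_bytes is the standard urllib.parse function that performs
the first mapping (URI characters to octets).

```python
from urllib.parse import unquote_to_bytes


def interpret(component: str, charset: str) -> str:
    """Apply both mappings: URI characters -> octets -> original characters."""
    octets = unquote_to_bytes(component)  # first mapping
    return octets.decode(charset)         # second mapping, charset-dependent


if __name__ == "__main__":
    component = "%C3%A9"  # hypothetical escaped component
    # The same two octets yield different character sequences.
    print(ascii(interpret(component, "utf-8")))
    print(ascii(interpret(component, "iso-8859-1")))
```

The octets are identical in both calls; only the assumed charset
differs, which is why a receiver that is not told the charset cannot
recover the original characters reliably.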
It is expected that a systematic treatment of character encoding
within URI will be developed as a future modification of this
specification.

2.2. Reserved Characters

Many URI include components consisting of, or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.

      reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                    "$" | ","

The "reserved" syntax class above refers to those characters that are
allowed within a URI, but which may not be allowed within a
particular component of the generic URI syntax; they are used as
delimiters of the components described in Section 3.

Characters in the "reserved" set are not reserved in all contexts.
The set of characters actually reserved within any given URI
component is defined by that component. In general, a character is
reserved if the semantics of the URI changes when the character is
replaced with its escaped US-ASCII encoding.

2.3. Unreserved Characters

Data characters that are allowed in a URI but do not have a reserved
purpose are called unreserved. These include upper and lower case
letters, decimal digits, and a limited set of punctuation marks and
symbols.

      unreserved  = alphanum | mark

      mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

Unreserved characters can be escaped without changing the semantics
of the URI, but this should not be done unless the URI is being used
in a context that does not allow the unescaped character to appear.

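The reserved and unreserved classes translate directly into character
sets. The following sketch is one possible rendering; the constant
names and the classify helper are hypothetical, and the category
"other" simply covers everything outside the two classes (including
"%" and the excluded characters).

```python
import string

# Character classes transcribed from the grammar rules above.
RESERVED = set(";/?:@&=+$,")
MARK = set("-_.!~*'()")
ALPHANUM = set(string.ascii_letters + string.digits)
UNRESERVED = ALPHANUM | MARK


def classify(ch: str) -> str:
    """Return the syntax class of a single character."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    return "other"
```

A caller could use classify to decide whether a data character may
appear as-is ("unreserved"), needs a component-specific decision
("reserved"), or has to be escaped ("other").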
2.4. Escape Sequences

Data must be escaped if it does not have a representation using an
unreserved character; this includes data that does not correspond to
a printable character of the US-ASCII coded character set, or that
corresponds to any US-ASCII character that is disallowed, as
explained below.

2.4.1. Escaped Encoding

An escaped octet is encoded as a character triplet, consisting of the
percent character "%" followed by the two hexadecimal digits
representing the octet code. For example, "%20" is the escaped
encoding for the US-ASCII space character.

      escaped     = "%" hex hex
      hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                            "a" | "b" | "c" | "d" | "e" | "f"

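A minimal sketch of the escaped encoding follows. It assumes the
caller already holds the data for a single component as octets, and it
takes a hypothetical safe parameter naming any reserved characters
that the component allows unescaped. The function name escape_octets
is an assumption, not an established API.

```python
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def escape_octets(data: bytes, safe: str = "") -> str:
    """Encode octets as URI characters, escaping all but unreserved ones."""
    out = []
    for octet in data:
        ch = chr(octet)
        if ch in UNRESERVED or ch in safe:
            out.append(ch)               # representable directly
        else:
            out.append(f"%{octet:02X}")  # "%" followed by two hex digits
    return "".join(out)
```

For instance, escape_octets(b"Los Angeles") produces "Los%20Angeles",
and passing safe="/" leaves path separators untouched.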
2.4.2. When to Escape and Unescape

A URI is always in an "escaped" form, since escaping or unescaping a
completed URI might change its semantics. Normally, the only time
escape encodings can safely be made is when the URI is being created
from its component parts; each component may have its own set of
characters that are reserved, so only the mechanism responsible for
generating or interpreting that component can determine whether or
not escaping a character will change its semantics. Likewise, a URI
must be separated into its components before the escaped characters
within those components can be safely decoded.

In some cases, data that could be represented by an unreserved
character may appear escaped; for example, some of the unreserved
"mark" characters are automatically escaped by some systems. If the
given URI scheme defines a canonicalization algorithm, then
unreserved characters may be unescaped according to that algorithm.
For example, "%7e" is sometimes used instead of "~" in an http URL
path, but the two are equivalent for an http URL.

Because the percent "%" character always has the reserved purpose of
being the escape indicator, it must be escaped as "%25" in order to
be used as data within a URI. Implementers should be careful not to
escape or unescape the same string more than once, since unescaping
an already unescaped string might lead to misinterpreting a percent
data character as another escaped character, or vice versa in the
case of escaping an already escaped string.

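The hazard of escaping or unescaping twice can be demonstrated with
the standard quote and unquote functions from Python's urllib.parse.
The sample strings "100%" and "%41" and the helper name escape_once
are assumptions chosen purely for illustration.

```python
from urllib.parse import quote, unquote


def escape_once(data: str) -> str:
    """Escape raw component data exactly once ("%" becomes "%25")."""
    return quote(data, safe="")


raw = "100%"                  # hypothetical raw data containing a percent
once = escape_once(raw)       # correct escaped form
twice = escape_once(once)     # escaped again by mistake

literal = "%41"               # raw data that merely looks like an escape
escaped_literal = escape_once(literal)
decoded_once = unquote(escaped_literal)   # recovers the raw data
decoded_twice = unquote(decoded_once)     # misreads the data as an escape

if __name__ == "__main__":
    print(once, twice)
    print(escaped_literal, decoded_once, decoded_twice)
```

In both directions the second pass silently changes the data, which is
why the escaping state of a string has to be tracked by the code that
builds or splits the URI.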
2.4.3. Excluded US-ASCII Characters

Although they are disallowed within the URI syntax, we include here a
description of those US-ASCII characters that have been excluded and
the reasons for their exclusion.

The control characters in the US-ASCII coded character set are not
used within a URI, both because they are non-printable and because
they are likely to be misinterpreted by some control mechanisms.

      control     = <US-ASCII coded characters 00-1F and 7F hexadecimal>

The space character is excluded because significant spaces may
disappear and insignificant spaces may be introduced when URI are
transcribed or typeset or subjected to the treatment of word-
processing programs. Whitespace is also used to delimit URI in many
contexts.

      space       = <US-ASCII coded character 20 hexadecimal>

The angle-bracket "<" and ">" and double-quote (") characters are
excluded because they are often used as the delimiters around URI in
text documents and protocol fields. The character "#" is excluded
because it is used to delimit a URI from a fragment identifier in URI
references (Section 4). The percent character "%" is excluded because
it is used for the encoding of escaped characters.

      delims      = "<" | ">" | "#" | "%" | <">

Other characters are excluded because gateways and other transport
agents are known to sometimes modify such characters, or they are
used as delimiters.

      unwise      = "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"

Data corresponding to excluded characters must be escaped in order to
be properly represented within a URI.

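The four excluded classes can be checked mechanically. The sketch
below scans raw component data (before any escaping) and reports each
excluded US-ASCII character with its class; the function name
find_excluded and the tuple layout are assumptions made for this
example. It deliberately ignores non-ASCII characters, which this
section does not classify.

```python
from typing import List, Tuple

# Excluded classes transcribed from the rules above.
CONTROL = {chr(c) for c in range(0x00, 0x20)} | {chr(0x7F)}
SPACE = {" "}
DELIMS = set('<>#%"')
UNWISE = set("{}|\\^[]`")

_CLASSES = (
    ("control", CONTROL),
    ("space", SPACE),
    ("delims", DELIMS),
    ("unwise", UNWISE),
)


def find_excluded(data: str) -> List[Tuple[int, str, str]]:
    """List (index, character, class) for each excluded character in data."""
    found = []
    for index, ch in enumerate(data):
        for name, members in _CLASSES:
            if ch in members:
                found.append((index, ch, name))
                break
    return found
```

An empty result means the data contains none of the excluded US-ASCII
characters; any entry marks a character that has to be escaped before
the data is placed in a URI.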
3. URI Syntactic Components

The URI syntax is dependent upon the scheme. In general, absolute
URI are written as follows:

      <scheme>:<scheme-specific-part>

An absolute URI contains the name of the scheme being used (<scheme>)
followed by a colon (":") and then a string (the <scheme-specific-
part>) whose interpretation depends on the scheme.

The URI syntax does not require that the scheme-specific-part have
any general structure or set of semantics which is common among all
URI. However, a subset of URI do share a common syntax for
representing hierarchical relationships within the namespace. This
"generic URI" syntax consists of a sequence of four main components:

      <scheme>://<authority><path>?<query>

each of which, except <scheme>, may be absent from a particular URI.
For example, some URI schemes do not allow an <authority> component,
and others do not use a <query> component.

      absoluteURI   = scheme ":" ( hier_part | opaque_part )

URI that are hierarchical in nature use the slash "/" character for
separating hierarchical components. For some file systems, a "/"
character (used to denote the hierarchical structure of a URI) is the
delimiter used to construct a file name hierarchy, and thus the URI
path will look similar to a file pathname. This does NOT imply that
the resource is a file or that the URI maps to an actual filesystem
pathname.

      hier_part     = ( net_path | abs_path ) [ "?" query ]

      net_path      = "//" authority [ abs_path ]

      abs_path      = "/"  path_segments

URI that do not make use of the slash "/" character for separating
hierarchical components are considered opaque by the generic URI
parser.

      opaque_part   = uric_no_slash *uric

      uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                      "&" | "=" | "+" | "$" | ","

We use the term <path> to refer to both the <abs_path> and
<opaque_part> constructs, since they are mutually exclusive for any
given URI and can be parsed as a single component.

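The four-component layout lends itself to a scheme-independent
splitter. The regular expression below is a rough sketch of that idea,
not a validator: it only separates <scheme>, <authority>, <path>, and
<query> at their delimiters and does not check the characters inside
each part. The names Parts, split_uri, and is_opaque are hypothetical.
Fragment identifiers are rejected here because "#" is not part of a
URI proper.

```python
import re
from typing import NamedTuple, Optional


class Parts(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]


_GENERIC = re.compile(
    r"(?:(?P<scheme>[^:/?#]+):)?"      # <scheme> ":"
    r"(?://(?P<authority>[^/?#]*))?"   # "//" <authority>
    r"(?P<path>[^?#]*)"                # <path> (abs_path or opaque_part)
    r"(?:\?(?P<query>[^#]*))?"         # "?" <query>
)


def split_uri(uri: str) -> Parts:
    """Split a URI into its generic components without validating them."""
    match = _GENERIC.fullmatch(uri)
    if match is None:
        raise ValueError(f"cannot split {uri!r} (fragment present?)")
    return Parts(match["scheme"], match["authority"],
                 match["path"], match["query"])


def is_opaque(parts: Parts) -> bool:
    """True when the scheme-specific part does not start with a slash."""
    return (parts.scheme is not None
            and parts.authority is None
            and not parts.path.startswith("/"))
```

Applied to the examples of Section 1.3, the ftp URI splits into an
authority and an absolute path, while the mailto URI has no authority
and is reported as opaque.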
3.1. Scheme Component

Just as there are many different methods of access to resources,
there are a variety of schemes for identifying such resources. The
URI syntax consists of a sequence of components separated by reserved
characters, with the first component defining the semantics for the
remainder of the URI string.

Scheme names consist of a sequence of characters beginning with a
lower case letter and followed by any combination of lower case
letters, digits, plus ("+"), period ("."), or hyphen ("-"). For
resiliency, programs interpreting URI should treat upper case letters
as equivalent to lower case in scheme names (e.g., allow "HTTP" as
well as "http").

      scheme        = alpha *( alpha | digit | "+" | "-" | "." )

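The scheme rule and the case-insensitivity recommendation can be
sketched as a small check-and-normalize step. The pattern mirrors the
grammar rule above; the function name normalize_scheme is an
assumption for this example.

```python
import re

# alpha *( alpha | digit | "+" | "-" | "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def normalize_scheme(scheme: str) -> str:
    """Validate a scheme name and fold it to lower case."""
    if _SCHEME.fullmatch(scheme) is None:
        raise ValueError(f"not a valid scheme name: {scheme!r}")
    return scheme.lower()
```

Folding to lower case before comparison lets "HTTP" and "http" select
the same scheme-specific handling.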
Relative URI references are distinguished from absolute URI in that
they do not begin with a scheme name. Instead, the scheme is
inherited from the base URI, as described in Section 5.2.

3.2. Authority Component

Many URI schemes include a top hierarchical element for a naming
authority, such that the namespace defined by the remainder of the
URI is governed by that authority. This authority component is
typically defined by an Internet-based server or a scheme-specific
registry of naming authorities.

      authority     = server | reg_name

The authority component is preceded by a double slash "//" and is
terminated by the next slash "/", question-mark "?", or by the end of
the URI. Within the authority component, the characters ";", ":",
"@", "?", and "/" are reserved.

An authority component is not required for a URI scheme to make use
of relative references. A base URI without an authority component
implies that any relative reference will also be without an authority
component.

3.2.1. Registry-based Naming Authority

The structure of a registry-based naming authority is specific to the
URI scheme, but constrained to the allowed characters for an
authority component.

      reg_name      = 1*( unreserved | escaped | "$" | "," |
                          ";" | ":" | "@" | "&" | "=" | "+" )

3.2.2. Server-based Naming Authority
|
|||
|
|
|||
|
URL schemes that involve the direct use of an IP-based protocol to a
|
|||
|
specified server on the Internet use a common syntax for the server
|
|||
|
component of the URI's scheme-specific data:
|
|||
|
|
|||
|
<userinfo>@<host>:<port>
|
|||
|
|
|||
|
where <userinfo> may consist of a user name and, optionally, scheme-
|
|||
|
specific information about how to gain authorization to access the
|
|||
|
server. The parts "<userinfo>@" and ":<port>" may be omitted.
|
|||
|
|
|||
|
server = [ [ userinfo "@" ] hostport ]
|
|||
|
|
|||
|
The user information, if present, is followed by a commercial at-sign
|
|||
|
"@".
|
|||
|
|
|||
|
userinfo = *( unreserved | escaped |
|
|||
|
";" | ":" | "&" | "=" | "+" | "$" | "," )
|
|||
|
|
|||
|
Some URL schemes use the format "user:password" in the userinfo
|
|||
|
field. This practice is NOT RECOMMENDED, because the passing of
|
|||
|
authentication information in clear text (such as URI) has proven to
|
|||
|
be a security risk in almost every case where it has been used.
|
|||
|
|
|||
|
The host is a domain name of a network host, or its IPv4 address as a
|
|||
|
set of four decimal digit groups separated by ".". Literal IPv6
|
|||
|
addresses are not supported.

   hostport      = host [ ":" port ]
   host          = hostname | IPv4address
   hostname      = *( domainlabel "." ) toplabel [ "." ]
   domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
   toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
   IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
   port          = *digit

Hostnames take the form described in Section 3 of [RFC1034] and
Section 2.1 of [RFC1123]: a sequence of domain labels separated by
".", each domain label starting and ending with an alphanumeric
character and possibly also containing "-" characters. The rightmost
domain label of a fully qualified domain name will never start with a
digit, thus syntactically distinguishing domain names from IPv4
addresses, and may be followed by a single "." if it is necessary to
distinguish between the complete domain name and any local domain.
To actually be "Uniform" as a resource locator, a URL hostname should
be a fully qualified domain name. In practice, however, the host
component may be a local domain literal.

   Note: A suitable representation for including a literal IPv6
   address as the host part of a URL is desired, but has not yet been
   determined or implemented in practice.
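
The host productions translate directly into regular expressions.
The sketch below is an illustration under stated assumptions, not a
normative parser: the names HOSTNAME_RE, IPV4_RE, and classify_host
are hypothetical, and no check is made that each IPv4 digit group is
in the range 0-255, since the grammar itself does not require it.

```python
import re

ALPHANUM = "[A-Za-z0-9]"
# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = rf"{ALPHANUM}(?:(?:{ALPHANUM}|-)*{ALPHANUM})?"
# toplabel = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL = rf"[A-Za-z](?:(?:{ALPHANUM}|-)*{ALPHANUM})?"
# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME_RE = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")
# IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
IPV4_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def classify_host(host):
    """Return "IPv4address", "hostname", or None if neither rule matches."""
    if IPV4_RE.fullmatch(host):
        return "IPv4address"
    if HOSTNAME_RE.fullmatch(host):
        return "hostname"
    return None
```

Because the rightmost label of a hostname must begin with a letter,
the two rules never match the same string.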

The port is the network port number for the server. Most schemes
designate protocols that have a default port number. Another port
number may optionally be supplied, in decimal, separated from the
host by a colon. If the port is omitted, the default port number is
assumed.
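
As an informal illustration of the <userinfo>@<host>:<port> layout,
the following Python sketch splits a server-based authority into its
three parts. The function name split_server is an assumption made for
this example. A missing part is reported as None, so that it can be
told apart from a part that is present but empty.

```python
def split_server(authority):
    """Split "<userinfo>@<host>:<port>" into (userinfo, host, port).

    userinfo and port are None when their separator is absent.
    """
    userinfo = None
    hostport = authority
    if "@" in authority:
        # The userinfo production excludes "@", so the first "@" separates.
        userinfo, hostport = authority.split("@", 1)
    host, colon, port = hostport.partition(":")
    if not colon:
        port = None
    elif any(ch not in "0123456789" for ch in port):
        # port = *digit: only decimal digits (possibly none) are allowed.
        raise ValueError("port must consist of decimal digits: %r" % port)
    return userinfo, host, port
```

The port is returned as a string rather than an integer, because the
grammar allows it to be empty.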

3.3. Path Component

The path component contains data, specific to the authority (or the
scheme if there is no authority component), identifying the resource
within the scope of that scheme and authority.

   path          = [ abs_path | opaque_part ]

   path_segments = segment *( "/" segment )
   segment       = *pchar *( ";" param )
   param         = *pchar

   pchar         = unreserved | escaped |
                   ":" | "@" | "&" | "=" | "+" | "$" | ","

The path may consist of a sequence of path segments separated by a
single slash "/" character. Within a path segment, the characters
"/", ";", "=", and "?" are reserved. Each path segment may include a
sequence of parameters, indicated by the semicolon ";" character.
The parameters are not significant to the parsing of relative
references.
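
The segment and param rules can be pictured with a short Python
sketch. It is illustrative only; the function name split_path and the
tuple layout it returns are assumptions, and no validation of pchar
is attempted.

```python
def split_path(abs_path):
    """Split an abs_path into a list of (segment_name, [params]) tuples."""
    if not abs_path.startswith("/"):
        raise ValueError("abs_path must begin with a slash")
    result = []
    # abs_path = "/" path_segments, so the leading slash is not a separator.
    for segment in abs_path[1:].split("/"):
        # segment = *pchar *( ";" param )
        name, *params = segment.split(";")
        result.append((name, params))
    return result
```

For example, the path "/b/c/d;p" yields three segments, of which only
the last carries a parameter.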

3.4. Query Component

The query component is a string of information to be interpreted by
the resource.

   query         = *uric

Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.

4. URI References

The term "URI-reference" is used here to denote the common usage of a
resource identifier. A URI reference may be absolute or relative,
and may have additional information attached in the form of a
fragment identifier. However, "the URI" that results from such a
reference includes only the absolute URI after the fragment
identifier (if any) is removed and after any relative URI is resolved
to its absolute form. Although it is possible to limit the
discussion of URI syntax and semantics to that of the absolute
result, most usage of URI is within general URI references, and it is
impossible to obtain the URI from such a reference without also
parsing the fragment and resolving the relative form.

   URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]

The syntax for relative URI is a shortened form of that for absolute
URI, where some prefix of the URI is missing and certain path
components ("." and "..") have a special meaning when, and only when,
interpreting a relative path. The relative URI syntax is defined in
Section 5.

4.1. Fragment Identifier

When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.

   fragment      = *uric

The semantics of a fragment identifier is a property of the data
resulting from a retrieval action, regardless of the type of URI used
in the reference. Therefore, the format and interpretation of
fragment identifiers are dependent on the media type [RFC2046] of the
retrieval result. The character restrictions described in Section 2
for URI also apply to the fragment in a URI-reference. Individual
media types may define additional restrictions or structure within
the fragment for specifying different types of "partial views" that
can be identified within that media type.

A fragment identifier is only meaningful when a URI reference is
intended for retrieval and the result of that retrieval is a document
for which the identified fragment is consistently defined.

4.2. Same-document References

A URI reference that does not contain a URI is a reference to the
current document. In other words, an empty URI reference within a
document is interpreted as a reference to the start of that document,
and a reference containing only a fragment identifier is a reference
to the identified fragment of that document. Traversal of such a
reference should not result in an additional retrieval action.
However, if the URI reference occurs in a context that is always
intended to result in a new request, as in the case of HTML's FORM
element, then an empty URI reference represents the base URI of the
current document and should be replaced by that URI when transformed
into a request.
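
The separation of the fragment identifier, and the test for a
same-document reference, can be sketched in Python as follows. The
function names split_fragment and is_same_document are assumptions
made for this illustration; the sketch does not cover the FORM-style
contexts described above, where an empty reference stands for the
base URI.

```python
def split_fragment(reference):
    """Return (uri_part, fragment); fragment is None when "#" is absent."""
    uri_part, crosshatch, fragment = reference.partition("#")
    return uri_part, (fragment if crosshatch else None)


def is_same_document(reference):
    """True when the reference contains no URI, only an optional fragment."""
    uri_part, _fragment = split_fragment(reference)
    return uri_part == ""
```

An application that finds is_same_document() true can scroll to the
identified fragment without issuing another retrieval.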

4.3. Parsing a URI Reference

A URI reference is typically parsed according to the four main
components and fragment identifier in order to determine what
components are present and whether the reference is relative or
absolute. The individual components are then parsed for their
subparts and, if not opaque, to verify their validity.

Although the BNF defines what is allowed in each component, it is
ambiguous in terms of differentiating between an authority component
and a path component that begins with two slash characters. The
greedy algorithm is used for disambiguation: the left-most matching
rule soaks up as much of the URI reference string as it is capable of
matching. In other words, the authority component wins.

Readers familiar with regular expressions should see Appendix B for a
concrete parsing example and test oracle.
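
For convenience, the regular expression given in Appendix B can be
applied in Python roughly as shown below. The wrapper function
parse_reference and the dictionary keys are assumptions made for this
sketch. A component whose separator does not appear is reported as
None (undefined) rather than as an empty string.

```python
import re

# The regular expression from Appendix B of this document.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def parse_reference(reference):
    """Break a URI reference into its four components and fragment."""
    match = URI_RE.match(reference)  # every group is optional, so this matches
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }
```

Applied to "//a/b", the expression assigns "a" to the authority and
"/b" to the path, which shows the authority component winning over a
path that begins with two slashes.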

5. Relative URI References

It is often the case that a group or "tree" of documents has been
constructed to serve a common purpose; the vast majority of URI in
these documents point to resources within the tree rather than
outside of it. Similarly, documents located at a particular site are
much more likely to refer to other resources at that site than to
resources at remote sites.

Relative addressing of URI allows document trees to be partially
independent of their location and access scheme. For instance, it is
possible for a single set of hypertext documents to be simultaneously
accessible and traversable via each of the "file", "http", and "ftp"
schemes if the documents refer to each other using relative URI.
Furthermore, such document trees can be moved, as a whole, without
changing any of the relative references. Experience within the WWW
has demonstrated that the ability to perform relative referencing is
necessary for the long-term usability of embedded URI.

The syntax for relative URI takes advantage of the <hier_part> syntax
of <absoluteURI> (Section 3) in order to express a reference that is
relative to the namespace of another hierarchical URI.

   relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

A relative reference beginning with two slash characters is termed a
network-path reference, as defined by <net_path> in Section 3. Such
references are rarely used.

A relative reference beginning with a single slash character is
termed an absolute-path reference, as defined by <abs_path> in
Section 3.

A relative reference that does not begin with a scheme name or a
slash character is termed a relative-path reference.

   rel_path      = rel_segment [ abs_path ]

   rel_segment   = 1*( unreserved | escaped |
                       ";" | "@" | "&" | "=" | "+" | "$" | "," )

Within a relative-path reference, the complete path segments "." and
".." have special meanings: "the current hierarchy level" and "the
level above this hierarchy level", respectively. Although this is
very similar to their use within Unix-based filesystems to indicate
directory levels, these path components are only considered special
when resolving a relative-path reference to its absolute form
(Section 5.2).

Authors should be aware that a path segment which contains a colon
character cannot be used as the first segment of a relative URI path
(e.g., "this:that"), because it would be mistaken for a scheme name.
It is therefore necessary to precede such segments with other
segments (e.g., "./this:that") in order for them to be referenced as
a relative path.
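
The distinction between the kinds of reference described above can be
sketched in Python. This is an illustration rather than a validating
parser: the name classify_reference and the returned labels are
assumptions, and empty or fragment-only references (Section 4.2) are
not treated separately.

```python
import re

# A reference that begins with "<scheme>:" is taken to be absolute.
SCHEME_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def classify_reference(reference):
    """Label a URI reference by the form of its leading characters."""
    if SCHEME_PREFIX.match(reference):
        return "absolute"
    if reference.startswith("//"):
        return "network-path"
    if reference.startswith("/"):
        return "absolute-path"
    return "relative-path"
```

Note how "this:that" is read as an absolute URI with the scheme
"this", while "./this:that" is read as a relative path.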

It is not necessary for all URI within a given scheme to be
restricted to the <hier_part> syntax, since the hierarchical
properties of that syntax are only necessary when relative URI are
used within a particular document. Documents can only make use of
relative URI when their base URI fits within the <hier_part> syntax.
It is assumed that any document which contains a relative reference
will also have a base URI that obeys the syntax. In other words,
relative URI cannot be used within a document that has an unsuitable
base URI.

Some URI schemes do not allow a hierarchical syntax matching the
<hier_part> syntax, and thus cannot use relative references.

5.1. Establishing a Base URI

The term "relative URI" implies that there exists some absolute "base
URI" against which the relative reference is applied. Indeed, the
base URI is necessary to define the semantics of any relative URI
reference; without it, a relative reference is meaningless. In order
for relative URI to be usable within a document, the base URI of that
document must be known to the parser.

The base URI of a document can be established in one of four ways,
listed below in order of precedence. The order of precedence can be
thought of in terms of layers, where the innermost defined base URI
has the highest precedence. This can be visualized graphically as:

   .----------------------------------------------------------.
   |  .----------------------------------------------------.  |
   |  |  .----------------------------------------------.  |  |
   |  |  |  .----------------------------------------.  |  |  |
   |  |  |  |  .----------------------------------.  |  |  |  |
   |  |  |  |  |       <relative_reference>       |  |  |  |  |
   |  |  |  |  `----------------------------------'  |  |  |  |
   |  |  |  | (5.1.1) Base URI embedded in the       |  |  |  |
   |  |  |  |         document's content             |  |  |  |
   |  |  |  `----------------------------------------'  |  |  |
   |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
   |  |  |         (message, document, or none).        |  |  |
   |  |  `----------------------------------------------'  |  |
   |  | (5.1.3) URI used to retrieve the entity            |  |
   |  `----------------------------------------------------'  |
   | (5.1.4) Default Base URI is application-dependent        |
   `----------------------------------------------------------'

5.1.1. Base URI within Document Content

Within certain document media types, the base URI of the document can
be embedded within the content itself such that it can be readily
obtained by a parser. This can be useful for descriptive documents,
such as tables of content, which may be transmitted to others through
protocols other than their usual retrieval context (e.g., E-Mail or
USENET news).

It is beyond the scope of this document to specify how, for each
media type, the base URI can be embedded. It is assumed that user
agents manipulating such media types will be able to obtain the
appropriate syntax from that media type's specification. An example
of how the base URI can be embedded in the Hypertext Markup Language
(HTML) [RFC1866] is provided in Appendix D.

A mechanism for embedding the base URI within MIME container types
(e.g., the message and multipart types) is defined by MHTML
[RFC2110]. Protocols that do not use the MIME message header syntax,
but which do allow some form of tagged metainformation to be included
within messages, may define their own syntax for defining the base
URI as part of a message.

5.1.2. Base URI from the Encapsulating Entity

If no base URI is embedded, the base URI of a document is defined by
the document's retrieval context. For a document that is enclosed
within another entity (such as a message or another document), the
retrieval context is that entity; thus, the default base URI of the
document is the base URI of the entity in which the document is
encapsulated.

5.1.3. Base URI from the Retrieval URI

If no base URI is embedded and the document is not encapsulated
within some other entity (e.g., the top level of a composite entity),
then, if a URI was used to retrieve the base document, that URI shall
be considered the base URI. Note that if the retrieval was the
result of a redirected request, the last URI used (i.e., that which
resulted in the actual retrieval of the document) is the base URI.

5.1.4. Default Base URI

If none of the conditions described in Sections 5.1.1--5.1.3 apply,
then the base URI is defined by the context of the application.
Since this definition is necessarily application-dependent, failing
to define the base URI using one of the other methods may result in
the same content being interpreted differently by different types of
application.

It is the responsibility of the distributor(s) of a document
containing relative URI to ensure that the base URI for that document
can be established. It must be emphasized that relative URI cannot
be used reliably in situations where the document's base URI is not
well-defined.
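
The order of precedence in Sections 5.1.1 through 5.1.4 amounts to
taking the first base URI that is defined. A minimal Python sketch
follows; the function name establish_base_uri and its four parameters
are hypothetical, and how each candidate is obtained is left to the
application and media type.

```python
def establish_base_uri(embedded=None, encapsulating=None,
                       retrieval=None, default=None):
    """Return the highest-precedence base URI that is defined.

    embedded      -- base URI found in the document's content (5.1.1)
    encapsulating -- base URI of the enclosing entity (5.1.2)
    retrieval     -- last URI used to retrieve the entity (5.1.3)
    default       -- application-dependent default (5.1.4)
    """
    for candidate in (embedded, encapsulating, retrieval, default):
        if candidate is not None:
            return candidate
    # Without a base URI, relative references cannot be interpreted.
    raise ValueError("no base URI can be established")
```

Raising an error when nothing is defined is one possible design
choice; it reflects the statement that a relative reference is
meaningless without a base URI.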

5.2. Resolving Relative References to Absolute Form

This section describes an example algorithm for resolving URI
references that might be relative to a given base URI.

The base URI is established according to the rules of Section 5.1 and
parsed into the four main components as described in Section 3. Note
that only the scheme component is required to be present in the base
URI; the other components may be empty or undefined. A component is
undefined if its preceding separator does not appear in the URI
reference; the path component is never undefined, though it may be
empty. The base URI's query component is not used by the resolution
algorithm and may be discarded.

For each URI reference, the following steps are performed in order:

1) The URI reference is parsed into the potential four components and
   fragment identifier, as described in Section 4.3.

2) If the path component is empty and the scheme, authority, and
   query components are undefined, then it is a reference to the
   current document and we are done. Otherwise, the reference URI's
   query and fragment components are defined as found (or not found)
   within the URI reference and not inherited from the base URI.

3) If the scheme component is defined, indicating that the reference
   starts with a scheme name, then the reference is interpreted as an
   absolute URI and we are done. Otherwise, the reference URI's
   scheme is inherited from the base URI's scheme component.

   Due to a loophole in prior specifications [RFC1630], some parsers
   allow the scheme name to be present in a relative URI if it is the
   same as the base URI scheme. Unfortunately, this can conflict
   with the correct parsing of non-hierarchical URI. For backwards
   compatibility, an implementation may work around such references
   by removing the scheme if it matches that of the base URI and the
   scheme is known to always use the <hier_part> syntax. The parser
   can then continue with the steps below for the remainder of the
   reference components. Validating parsers should mark such a
   malformed relative reference as an error.

4) If the authority component is defined, then the reference is a
   network-path and we skip to step 7. Otherwise, the reference
   URI's authority is inherited from the base URI's authority
   component, which will also be undefined if the URI scheme does not
   use an authority component.

5) If the path component begins with a slash character ("/"), then
   the reference is an absolute-path and we skip to step 7.

6) If this step is reached, then we are resolving a relative-path
   reference. The relative path needs to be merged with the base
   URI's path. Although there are many ways to do this, we will
   describe a simple method using a separate string buffer.

   a) All but the last segment of the base URI's path component is
      copied to the buffer. In other words, any characters after the
      last (right-most) slash character, if any, are excluded.

   b) The reference's path component is appended to the buffer
      string.

   c) All occurrences of "./", where "." is a complete path segment,
      are removed from the buffer string.

   d) If the buffer string ends with "." as a complete path segment,
      that "." is removed.

   e) All occurrences of "<segment>/../", where <segment> is a
      complete path segment not equal to "..", are removed from the
      buffer string. Removal of these path segments is performed
      iteratively, removing the leftmost matching pattern on each
      iteration, until no matching pattern remains.

   f) If the buffer string ends with "<segment>/..", where <segment>
      is a complete path segment not equal to "..", that
      "<segment>/.." is removed.

   g) If the resulting buffer string still begins with one or more
      complete path segments of "..", then the reference is
      considered to be in error. Implementations may handle this
      error by retaining these components in the resolved path (i.e.,
      treating them as part of the final URI), by removing them from
      the resolved path (i.e., discarding relative levels above the
      root), or by avoiding traversal of the reference.

   h) The remaining buffer string is the reference URI's new path
      component.

7) The resulting URI components, including any inherited from the
   base URI, are recombined to give the absolute form of the URI
   reference. Using pseudocode, this would be

      result = ""

      if scheme is defined then
          append scheme to result
          append ":" to result

      if authority is defined then
          append "//" to result
          append authority to result

      append path to result

      if query is defined then
          append "?" to result
          append query to result

      if fragment is defined then
          append "#" to result
          append fragment to result

      return result

   Note that we must be careful to preserve the distinction between a
   component that is undefined, meaning that its separator was not
   present in the reference, and a component that is empty, meaning
   that the separator was present and was immediately followed by the
   next component separator or the end of the reference.
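
The pseudocode of step 7 maps directly onto Python if an undefined
component is represented by None and an empty component by the empty
string. That representation, and the function name recombine, are
assumptions made for this sketch.

```python
def recombine(scheme, authority, path, query, fragment):
    """Recombine URI components; None means undefined, "" means empty."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path  # the path is never undefined, though it may be empty
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

Testing against None rather than for truthiness is what preserves an
empty query or fragment, so that a trailing "?" or "#" survives the
round trip.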

The above algorithm is intended to provide an example by which the
output of implementations can be tested -- implementation of the
algorithm itself is not required. For example, some systems may find
it more efficient to implement step 6 as a pair of segment stacks
being merged, rather than as a series of string pattern replacements.
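
One way to realize step 6 without string pattern replacement is to
operate on a list of segments. The Python sketch below is such a
variant and is offered only as an illustration. The function name
merge_paths is hypothetical, the base path is assumed to begin with a
slash, and for the error case of step 6g the sketch chooses to retain
the leading ".." segments in the result.

```python
def merge_paths(base_path, ref_path):
    """Merge a relative path with a base path (step 6, list-based variant)."""
    # a) keep the base path up to and including its last slash
    buffer = base_path[: base_path.rfind("/") + 1]
    # b) append the reference's path
    buffer += ref_path
    lead = "/" if buffer.startswith("/") else ""
    segments = buffer[len(lead):].split("/")
    # c) drop every complete "." segment that is followed by a slash
    segments = [s for s in segments[:-1] if s != "."] + segments[-1:]
    # d) a trailing "." segment becomes an empty final segment
    if segments[-1] == ".":
        segments[-1] = ""
    # e) remove the leftmost "<segment>/../" until no match remains
    i = 0
    while i + 2 < len(segments):
        if segments[i] != ".." and segments[i + 1] == "..":
            del segments[i : i + 2]
            i = 0  # restart so that the leftmost match is always taken
        else:
            i += 1
    # f) remove a trailing "<segment>/.."
    if len(segments) >= 2 and segments[-1] == ".." and segments[-2] != "..":
        segments[-2:] = [""]
    # g) any leading ".." segments are retained as part of the path
    # h) the joined segments form the new path component
    return lead + "/".join(segments)
```

With the base path "/b/c/d;p", the reference path "../g" merges to
"/b/g", and "../../../g" merges to "/../g" because the extra ".." is
retained.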

   Note: Some WWW client applications will fail to separate the
   reference's query component from its path component before merging
   the base and reference paths in step 6 above. This may result in
   a loss of information if the query component contains the strings
   "/../" or "/./".

Resolution examples are provided in Appendix C.

6. URI Normalization and Equivalence

In many cases, different URI strings may actually identify the
identical resource. For example, the host names used in URL are
actually case insensitive, and the URL <http://www.XEROX.com> is
equivalent to <http://www.xerox.com>. In general, the rules for
equivalence and definition of a normal form, if any, are scheme
dependent. When a scheme uses elements of the common syntax, it will
also use the common syntax equivalence rules, namely that the scheme
and hostname are case insensitive and a URL with an explicit ":port",
where the port is the default for the scheme, is equivalent to one
where the port is elided.
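
The common syntax equivalence rules can be illustrated with a short
Python sketch that reduces a URL to a comparable form. The function
name normalize and the DEFAULT_PORTS table are assumptions made for
this example (the table lists only two well-known defaults), and the
Appendix B regular expression is reused to split the URL. Any further
scheme-dependent rules are outside its scope.

```python
import re

# The regular expression from Appendix B of this document.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

# Illustrative table only; a real application would cover its own schemes.
DEFAULT_PORTS = {"http": "80", "ftp": "21"}


def normalize(url):
    """Lower-case the scheme and host, and elide a default ":port"."""
    match = URI_RE.match(url)
    scheme, authority = match.group(2), match.group(4)
    # Path, query, and fragment are copied through unchanged.
    tail = match.group(5) + (match.group(6) or "") + (match.group(8) or "")
    result = ""
    if scheme is not None:
        scheme = scheme.lower()
        result += scheme + ":"
    if authority is not None:
        userinfo, at, hostport = authority.rpartition("@")
        host, colon, port = hostport.partition(":")
        if colon and port == DEFAULT_PORTS.get(scheme):
            colon = port = ""
        result += "//" + userinfo + at + host.lower() + colon + port
    return result + tail
```

Two URLs can then be compared by comparing their normalized strings.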

7. Security Considerations

A URI does not in itself pose a security threat. Users should beware
that there is no general guarantee that a URL, which at one time
located a given resource, will continue to do so. Nor is there any
guarantee that a URL will not locate a different resource at some
later point in time, due to the lack of any constraint on how a given
authority apportions its namespace. Such a guarantee can only be
obtained from the person(s) controlling that namespace and the
resource in question. A specific URI scheme may include additional
semantics, such as name persistence, if those semantics are required
of all naming authorities for that scheme.

It is sometimes possible to construct a URL such that an attempt to
perform a seemingly harmless, idempotent operation, such as the
retrieval of an entity associated with the resource, will in fact
cause a possibly damaging remote operation to occur. The unsafe URL
is typically constructed by specifying a port number other than that
reserved for the network protocol in question. The client
unwittingly contacts a site that is in fact running a different
protocol. The content of the URL contains instructions that, when
interpreted according to this other protocol, cause an unexpected
operation. An example has been the use of a gopher URL to cause an
unintended or impersonating message to be sent via a SMTP server.

Caution should be used when using any URL that specifies a port
number other than the default for the protocol, especially when it is
a number within the reserved space.

Care should be taken when a URL contains escaped delimiters for a
given protocol (for example, CR and LF characters for telnet
protocols) that these are not unescaped before transmission. This
might violate the protocol, but avoids the potential for such
characters to be used to simulate an extra operation or parameter in
that protocol, which might lead to an unexpected and possibly harmful
remote operation being performed.
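
As a purely illustrative aid, an application might flag URLs that
carry escaped CR or LF characters before handing them to a
line-oriented protocol. The Python sketch below does only that; the
name has_escaped_line_break is an assumption, and the set of
delimiters that matter depends on the protocol in use.

```python
import re

# "%0D" and "%0A" are the escaped forms of CR and LF (hex digits in any case).
ESCAPED_CRLF = re.compile(r"%0[ad]", re.IGNORECASE)


def has_escaped_line_break(url):
    """True when the URL contains an escaped CR or LF character."""
    return ESCAPED_CRLF.search(url) is not None
```

Such a check only detects the condition; what to do with a flagged
URL remains an application decision.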

It is clearly unwise to use a URL that contains a password which is
intended to be secret. In particular, the use of a password within
the 'userinfo' component of a URL is strongly discouraged except
in those rare cases where the 'password' parameter is intended to be
public.

8. Acknowledgements

This document was derived from RFC 1738 [RFC1738] and RFC 1808
[RFC1808]; the acknowledgements in those specifications still apply.
In addition, contributions by Gisle Aas, Martin Beet, Martin Duerst,
Jim Gettys, Martijn Koster, Dave Kristol, Daniel LaLiberte, Foteos
Macrides, James Marshall, Ryan Moats, Keith Moore, and Lauren Wood
are gratefully acknowledged.

9. References

[RFC2277] Alvestrand, H., "IETF Policy on Character Sets and
          Languages", BCP 18, RFC 2277, January 1998.

[RFC1630] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
          Unifying Syntax for the Expression of Names and Addresses
          of Objects on the Network as used in the World-Wide Web",
          RFC 1630, June 1994.

[RFC1738] Berners-Lee, T., Masinter, L., and M. McCahill, Editors,
          "Uniform Resource Locators (URL)", RFC 1738, December 1994.

[RFC1866] Berners-Lee T., and D. Connolly, "HyperText Markup Language
          Specification -- 2.0", RFC 1866, November 1995.

[RFC1123] Braden, R., Editor, "Requirements for Internet Hosts --
          Application and Support", STD 3, RFC 1123, October 1989.

[RFC822]  Crocker, D., "Standard for the Format of ARPA Internet Text
          Messages", STD 11, RFC 822, August 1982.

[RFC1808] Fielding, R., "Relative Uniform Resource Locators", RFC
          1808, June 1995.

[RFC2046] Freed, N., and N. Borenstein, "Multipurpose Internet Mail
          Extensions (MIME) Part Two: Media Types", RFC 2046,
          November 1996.

[RFC1736] Kunze, J., "Functional Recommendations for Internet
          Resource Locators", RFC 1736, February 1995.

[RFC2141] Moats, R., "URN Syntax", RFC 2141, May 1997.

[RFC1034] Mockapetris, P., "Domain Names - Concepts and Facilities",
          STD 13, RFC 1034, November 1987.

[RFC2110] Palme, J., and A. Hopmann, "MIME E-mail Encapsulation of
          Aggregate Documents, such as HTML (MHTML)", RFC 2110, March
          1997.

[RFC1737] Sollins, K., and L. Masinter, "Functional Requirements for
          Uniform Resource Names", RFC 1737, December 1994.

[ASCII]   US-ASCII. "Coded Character Set -- 7-bit American Standard
          Code for Information Interchange", ANSI X3.4-1986.

[UTF-8]   Yergeau, F., "UTF-8, a transformation format of ISO 10646",
          RFC 2279, January 1998.
|
|||
|
|
|||
|
10. Authors' Addresses
|
|||
|
|
|||
|
Tim Berners-Lee
|
|||
|
World Wide Web Consortium
|
|||
|
MIT Laboratory for Computer Science, NE43-356
|
|||
|
545 Technology Square
|
|||
|
Cambridge, MA 02139
|
|||
|
|
|||
|
Fax: +1(617)258-8682
|
|||
|
EMail: timbl@w3.org
|
|||
|
|
|||
|
|
|||
|
Roy T. Fielding
|
|||
|
Department of Information and Computer Science
|
|||
|
University of California, Irvine
|
|||
|
Irvine, CA 92697-3425
|
|||
|
|
|||
|
Fax: +1(949)824-1715
|
|||
|
EMail: fielding@ics.uci.edu
|
|||
|
|
|||
|
|
|||
|
Larry Masinter
|
|||
|
Xerox PARC
|
|||
|
3333 Coyote Hill Road
|
|||
|
Palo Alto, CA 94304
|
|||
|
|
|||
|
Fax: +1(415)812-4333
|
|||
|
EMail: masinter@parc.xerox.com
|
|||
|
|
|||
|
A. Collected BNF for URI
|
|||
|
|
|||
|
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
|
|||
|
absoluteURI = scheme ":" ( hier_part | opaque_part )
|
|||
|
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
|
|||
|
|
|||
|
hier_part = ( net_path | abs_path ) [ "?" query ]
|
|||
|
opaque_part = uric_no_slash *uric
|
|||
|
|
|||
|
uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
|
|||
|
"&" | "=" | "+" | "$" | ","
|
|||
|
|
|||
|
net_path = "//" authority [ abs_path ]
|
|||
|
abs_path = "/" path_segments
|
|||
|
rel_path = rel_segment [ abs_path ]
|
|||
|
|
|||
|
rel_segment = 1*( unreserved | escaped |
|
|||
|
";" | "@" | "&" | "=" | "+" | "$" | "," )
|
|||
|
|
|||
|
scheme = alpha *( alpha | digit | "+" | "-" | "." )
|
|||
|
|
|||
|
authority = server | reg_name
|
|||
|
|
|||
|
reg_name = 1*( unreserved | escaped | "$" | "," |
|
|||
|
";" | ":" | "@" | "&" | "=" | "+" )
|
|||
|
|
|||
|
server = [ [ userinfo "@" ] hostport ]
|
|||
|
userinfo = *( unreserved | escaped |
|
|||
|
";" | ":" | "&" | "=" | "+" | "$" | "," )
|
|||
|
|
|||
|
hostport = host [ ":" port ]
|
|||
|
host = hostname | IPv4address
|
|||
|
hostname = *( domainlabel "." ) toplabel [ "." ]
|
|||
|
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
|
|||
|
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
|
|||
|
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
|
|||
|
port = *digit
|
|||
|
|
|||
|
path = [ abs_path | opaque_part ]
|
|||
|
path_segments = segment *( "/" segment )
|
|||
|
segment = *pchar *( ";" param )
|
|||
|
param = *pchar
|
|||
|
pchar = unreserved | escaped |
|
|||
|
":" | "@" | "&" | "=" | "+" | "$" | ","
|
|||
|
|
|||
|
query = *uric
|
|||
|
|
|||
|
fragment = *uric
|
|||
|
|
|||
|
uric = reserved | unreserved | escaped
|
|||
|
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
|
|||
|
"$" | ","
|
|||
|
unreserved = alphanum | mark
|
|||
|
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
|
|||
|
"(" | ")"
|
|||
|
|
|||
|
escaped = "%" hex hex
|
|||
|
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
|
|||
|
"a" | "b" | "c" | "d" | "e" | "f"
|
|||
|
|
|||
|
alphanum = alpha | digit
|
|||
|
alpha = lowalpha | upalpha
|
|||
|
|
|||
|
lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
|
|||
|
"j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
|
|||
|
"s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
|
|||
|
upalpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
|
|||
|
"J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
|
|||
|
"S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
|
|||
|
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
|
|||
|
"8" | "9"
|
|||
|
|
|||
|
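As an illustration of how the character-class rules in the collected BNF can be applied, the following Python sketch transcribes the "scheme" and "uric" productions into regular expressions. The names SCHEME_RE, URIC_RUN_RE, is_scheme, and is_uric_run are hypothetical and chosen only for this example; the sketch checks isolated components and is not a full URI validator.

```python
import re

# Character classes transcribed from the collected BNF (names are illustrative).
ALPHANUM = "A-Za-z0-9"
MARK = r"\-_.!~*'()"            # mark
UNRESERVED = ALPHANUM + MARK    # unreserved = alphanum | mark
RESERVED = r";/?:@&=+$,"        # reserved
ESCAPED = r"%[0-9A-Fa-f]{2}"    # escaped = "%" hex hex

# scheme = alpha *( alpha | digit | "+" | "-" | "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")

# uric = reserved | unreserved | escaped ; query and fragment are both *uric
URIC_RUN_RE = re.compile(r"(?:[%s%s]|%s)*" % (RESERVED, UNRESERVED, ESCAPED))


def is_scheme(text):
    """Return True if text matches the 'scheme' production."""
    return SCHEME_RE.fullmatch(text) is not None


def is_uric_run(text):
    """Return True if text matches *uric (a valid query or fragment body)."""
    return URIC_RUN_RE.fullmatch(text) is not None


print(is_scheme("http"), is_scheme("1http"))        # True False
print(is_uric_run("a%20b"), is_uric_run("a b"))     # True False
```

A space or a "%" that is not followed by two hex digits fails the check, which matches the requirement that such characters be escaped.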
B. Parsing a URI Reference with a Regular Expression
|
|||
|
|
|||
|
As described in Section 4.3, the generic URI syntax is not sufficient
|
|||
|
to disambiguate the components of some forms of URI. Since the
|
|||
|
"greedy algorithm" described in that section is identical to the
|
|||
|
disambiguation method used by POSIX regular expressions, it is
|
|||
|
natural and commonplace to use a regular expression for parsing the
|
|||
|
potential four components and fragment identifier of a URI reference.
|
|||
|
|
|||
|
The following line is the regular expression for breaking down a URI
|
|||
|
reference into its components.
|
|||
|
|
|||
|
      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9
|
|||
|
|
|||
|
The numbers in the second line above are only to assist readability;
|
|||
|
they indicate the reference points for each subexpression (i.e., each
|
|||
|
paired parenthesis). We refer to the value matched for subexpression
|
|||
|
<n> as $<n>. For example, matching the above expression to
|
|||
|
|
|||
|
http://www.ics.uci.edu/pub/ietf/uri/#Related
|
|||
|
|
|||
|
results in the following subexpression matches:
|
|||
|
|
|||
|
      $1 = http:
      $2 = http
      $3 = //www.ics.uci.edu
      $4 = www.ics.uci.edu
      $5 = /pub/ietf/uri/
      $6 = <undefined>
      $7 = <undefined>
      $8 = #Related
      $9 = Related
|
|||
|
|
|||
|
where <undefined> indicates that the component is not present, as is
|
|||
|
the case for the query component in the above example. Therefore, we
|
|||
|
can determine the value of the four components and fragment as
|
|||
|
|
|||
|
      scheme    = $2
      authority = $4
      path      = $5
      query     = $7
      fragment  = $9
|
|||
|
|
|||
|
and, going in the opposite direction, we can recreate a URI reference
|
|||
|
from its components using the algorithm in step 7 of Section 5.2.
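The regular expression above can be exercised directly. The following Python sketch applies it to the example reference and then rebuilds the reference from its components, following the recomposition order outlined in step 7 of Section 5.2. The function names split_uri_reference and join_uri_reference are hypothetical, and Python's None stands in for <undefined>.

```python
import re

# The regular expression from this appendix, unchanged.
URI_REFERENCE_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(reference):
    """Split a URI reference into its four components and fragment.

    Components that are not present are returned as None.
    """
    match = URI_REFERENCE_RE.match(reference)  # every group is optional
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


def join_uri_reference(parts):
    """Recreate a URI reference from its components."""
    result = ""
    if parts["scheme"] is not None:
        result += parts["scheme"] + ":"
    if parts["authority"] is not None:
        result += "//" + parts["authority"]
    result += parts["path"]
    if parts["query"] is not None:
        result += "?" + parts["query"]
    if parts["fragment"] is not None:
        result += "#" + parts["fragment"]
    return result


parts = split_uri_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related")
print(parts["scheme"], parts["authority"], parts["path"])
print(parts["query"], parts["fragment"])
print(join_uri_reference(parts))
```

Keeping "not present" (None) distinct from "present but empty" (an empty string) matters when recomposing: a reference such as "g?" has an empty query and must keep its "?".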
|
|||
|
|
|||
|
C. Examples of Resolving Relative URI References
|
|||
|
|
|||
|
Within an object with a well-defined base URI of
|
|||
|
|
|||
|
http://a/b/c/d;p?q
|
|||
|
|
|||
|
the relative URI would be resolved as follows:
|
|||
|
|
|||
|
C.1. Normal Examples
|
|||
|
|
|||
|
      g:h           =  g:h
      g             =  http://a/b/c/g
      ./g           =  http://a/b/c/g
      g/            =  http://a/b/c/g/
      /g            =  http://a/g
      //g           =  http://g
      ?y            =  http://a/b/c/?y
      g?y           =  http://a/b/c/g?y
      #s            =  (current document)#s
      g#s           =  http://a/b/c/g#s
      g?y#s         =  http://a/b/c/g?y#s
      ;x            =  http://a/b/c/;x
      g;x           =  http://a/b/c/g;x
      g;x?y#s       =  http://a/b/c/g;x?y#s
      .             =  http://a/b/c/
      ./            =  http://a/b/c/
      ..            =  http://a/b/
      ../           =  http://a/b/
      ../g          =  http://a/b/g
      ../..         =  http://a/
      ../../        =  http://a/
      ../../g       =  http://a/g
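The normal examples can be reproduced with a short resolver. The Python sketch below is one possible reading of the resolution steps in Section 5.2, not a reference implementation. It assumes the base URI has a hierarchical path beginning with "/", it handles the "." and ".." rules with a segment stack rather than the string substitutions described in the text, and it returns same-document references (such as "#s") unchanged for the caller to interpret. The names resolve, _split, and _merge_paths are hypothetical.

```python
import re

_SPLIT_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def _split(reference):
    """Return (scheme, authority, path, query, fragment); None if absent."""
    m = _SPLIT_RE.match(reference)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def _merge_paths(base_path, ref_path):
    """Merge a relative path with the base path and remove dot segments."""
    # Keep everything in the base path up to and including the last slash.
    buffer = base_path[: base_path.rfind("/") + 1] + ref_path
    segments = buffer.split("/")
    out = []
    for index, segment in enumerate(segments):
        last = index == len(segments) - 1
        if segment == ".":
            if last:
                out.append("")      # keep the trailing slash
        elif segment == ".." and len(out) > 1 and out[-1] != "..":
            out.pop()               # drop the preceding complete segment
            if last:
                out.append("")
        else:
            out.append(segment)     # also retains ".." that cannot be removed
    return "/".join(out)


def resolve(base, reference):
    """Resolve a URI reference against an absolute, hierarchical base URI."""
    scheme, authority, path, query, fragment = _split(reference)
    if path == "" and scheme is None and authority is None and query is None:
        return reference            # reference to the current document
    if scheme is not None:
        return reference            # already an absolute URI
    base_scheme, base_authority, base_path, _, _ = _split(base)
    if authority is None:
        authority = base_authority
        if not path.startswith("/"):
            path = _merge_paths(base_path, path)
    result = base_scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


BASE = "http://a/b/c/d;p?q"
for ref in ("g", "../g", "?y", "g;x?y#s", "../../../g"):
    print(ref, "=", resolve(BASE, ref))
```

Because excess ".." segments are retained rather than discarded, "../../../g" yields "http://a/../g" instead of being silently rewritten.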
|
|||
|
|
|||
|
C.2. Abnormal Examples
|
|||
|
|
|||
|
Although the following abnormal examples are unlikely to occur in
|
|||
|
normal practice, all URI parsers should be capable of resolving them
|
|||
|
consistently. Each example uses the same base as above.
|
|||
|
|
|||
|
An empty reference refers to the start of the current document.
|
|||
|
|
|||
|
<> = (current document)
|
|||
|
|
|||
|
Parsers must be careful in handling the case where there are more
|
|||
|
relative path ".." segments than there are hierarchical levels in the
|
|||
|
base URI's path. Note that the ".." syntax cannot be used to change
|
|||
|
the authority component of a URI.
|
|||
|
|
|||
|
../../../g = http://a/../g
|
|||
|
../../../../g = http://a/../../g
|
|||
|
|
|||
|
In practice, some implementations strip leading relative symbolic
|
|||
|
elements (".", "..") after applying a relative URI calculation, based
|
|||
|
on the theory that compensating for obvious author errors is better
|
|||
|
than allowing the request to fail. Thus, the above two references
|
|||
|
will be interpreted as "http://a/g" by some implementations.
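The lenient behavior described above can be sketched as a post-processing step on the resolved path. The Python function below is hypothetical and assumes its input is an absolute path beginning with "/"; it illustrates what some implementations do and is not required by this specification.

```python
def strip_leading_dot_segments(path):
    """Drop leading "." and ".." segments from an absolute path.

    Assumes path begins with "/"; only segments at the very start of the
    path are affected.
    """
    segments = path.split("/")[1:]          # skip the empty string before "/"
    while segments and segments[0] in (".", ".."):
        segments.pop(0)
    return "/" + "/".join(segments)


print(strip_leading_dot_segments("/../g"))       # /g
print(strip_leading_dot_segments("/../../g"))    # /g
```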
|
|||
|
|
|||
|
Similarly, parsers must avoid treating "." and ".." as special when
|
|||
|
they are not complete components of a relative path.
|
|||
|
|
|||
|
/./g = http://a/./g
|
|||
|
/../g = http://a/../g
|
|||
|
g. = http://a/b/c/g.
|
|||
|
.g = http://a/b/c/.g
|
|||
|
g.. = http://a/b/c/g..
|
|||
|
..g = http://a/b/c/..g
|
|||
|
|
|||
|
Less likely are cases where the relative URI uses unnecessary or
|
|||
|
nonsensical forms of the "." and ".." complete path segments.
|
|||
|
|
|||
|
./../g = http://a/b/g
|
|||
|
./g/. = http://a/b/c/g/
|
|||
|
g/./h = http://a/b/c/g/h
|
|||
|
g/../h = http://a/b/c/h
|
|||
|
g;x=1/./y = http://a/b/c/g;x=1/y
|
|||
|
g;x=1/../y = http://a/b/c/y
|
|||
|
|
|||
|
All client applications remove the query component from the base URI
|
|||
|
before resolving relative URI. However, some applications fail to
|
|||
|
separate the reference's query and/or fragment components from a
|
|||
|
relative path before merging it with the base path. This error is
|
|||
|
rarely noticed, since typical usage of a fragment never includes the
|
|||
|
hierarchy ("/") character, and the query component is not normally
|
|||
|
used within relative references.
|
|||
|
|
|||
|
g?y/./x = http://a/b/c/g?y/./x
|
|||
|
g?y/../x = http://a/b/c/g?y/../x
|
|||
|
g#s/./x = http://a/b/c/g#s/./x
|
|||
|
g#s/../x = http://a/b/c/g#s/../x
|
|||
|
|
|||
|
Some parsers allow the scheme name to be present in a relative URI if
|
|||
|
it is the same as the base URI scheme. This is considered to be a
|
|||
|
loophole in prior specifications of partial URI [RFC1630]. Its use
|
|||
|
should be avoided.
|
|||
|
|
|||
|
http:g = http:g ; for validating parsers
|
|||
|
| http://a/b/c/g ; for backwards compatibility
|
|||
|
|
|||
|
D. Embedding the Base URI in HTML documents
|
|||
|
|
|||
|
It is useful to consider an example of how the base URI of a document
|
|||
|
can be embedded within the document's content. In this appendix, we
|
|||
|
describe how documents written in the Hypertext Markup Language
|
|||
|
(HTML) [RFC1866] can include an embedded base URI. This appendix
|
|||
|
does not form a part of the URI specification and should not be
|
|||
|
considered as anything more than a descriptive example.
|
|||
|
|
|||
|
HTML defines a special element "BASE" which, when present in the
|
|||
|
"HEAD" portion of a document, signals that the parser should use the
|
|||
|
BASE element's "HREF" attribute as the base URI for resolving any
|
|||
|
relative URI. The "HREF" attribute must be an absolute URI. Note
|
|||
|
that, in HTML, element and attribute names are case-insensitive. For
|
|||
|
example:
|
|||
|
|
|||
|
<!doctype html public "-//IETF//DTD HTML//EN">
|
|||
|
<HTML><HEAD>
|
|||
|
<TITLE>An example HTML document</TITLE>
|
|||
|
<BASE href="http://www.ics.uci.edu/Test/a/b/c">
|
|||
|
</HEAD><BODY>
|
|||
|
... <A href="../x">a hypertext anchor</A> ...
|
|||
|
</BODY></HTML>
|
|||
|
|
|||
|
A parser reading the example document should interpret the given
|
|||
|
relative URI "../x" as representing the absolute URI
|
|||
|
|
|||
|
<http://www.ics.uci.edu/Test/a/x>
|
|||
|
|
|||
|
regardless of the context in which the example document was obtained.
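For illustration only, the following Python sketch reads the example document, picks up the BASE element's HREF, and resolves the anchor's relative URI against it. It uses the standard library's html.parser and urllib.parse.urljoin. Note that urljoin implements later resolution rules than those in this document, so it may differ from Appendix C in some edge cases, although it agrees for this example. The class name BaseAndLinks is hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BaseAndLinks(HTMLParser):
    """Collect the first BASE href and every A href in a document."""

    def __init__(self):
        super().__init__()
        self.base = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases element and attribute names, which matches
        # HTML's case-insensitive treatment of them.
        attributes = dict(attrs)
        if tag == "base" and self.base is None and "href" in attributes:
            self.base = attributes["href"]
        elif tag == "a" and "href" in attributes:
            self.hrefs.append(attributes["href"])


DOCUMENT = """<!doctype html public "-//IETF//DTD HTML//EN">
<HTML><HEAD>
<TITLE>An example HTML document</TITLE>
<BASE href="http://www.ics.uci.edu/Test/a/b/c">
</HEAD><BODY>
... <A href="../x">a hypertext anchor</A> ...
</BODY></HTML>
"""

parser = BaseAndLinks()
parser.feed(DOCUMENT)
resolved = [urljoin(parser.base, href) for href in parser.hrefs]
print(resolved)   # ['http://www.ics.uci.edu/Test/a/x']
```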
|
|||
|
|
|||
|
E. Recommendations for Delimiting URI in Context
|
|||
|
|
|||
|
URI are often transmitted through formats that do not provide a clear
|
|||
|
context for their interpretation. For example, there are many
|
|||
|
occasions when URI are included in plain text; examples include text
|
|||
|
sent in electronic mail, USENET news messages, and, most importantly,
|
|||
|
printed on paper. In such cases, it is important to be able to
|
|||
|
delimit the URI from the rest of the text, and in particular from
|
|||
|
punctuation marks that might be mistaken for part of the URI.
|
|||
|
|
|||
|
In practice, URI are delimited in a variety of ways, but usually
|
|||
|
within double-quotes "http://test.com/", angle brackets
|
|||
|
<http://test.com/>, or just using whitespace
|
|||
|
|
|||
|
http://test.com/
|
|||
|
|
|||
|
These wrappers do not form part of the URI.
|
|||
|
|
|||
|
In the case where a fragment identifier is associated with a URI
|
|||
|
reference, the fragment would be placed within the brackets as well
|
|||
|
(separated from the URI with a "#" character).
|
|||
|
|
|||
|
In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
|
|||
|
need to be added to break long URI across lines. The whitespace
|
|||
|
should be ignored when extracting the URI.
|
|||
|
|
|||
|
No whitespace should be introduced after a hyphen ("-") character.
|
|||
|
Because some typesetters and printers may (erroneously) introduce a
|
|||
|
hyphen at the end of line when breaking a line, the interpreter of a
|
|||
|
URI containing a line break immediately after a hyphen should ignore
|
|||
|
all unescaped whitespace around the line break, and should be aware
|
|||
|
that the hyphen may or may not actually be part of the URI.
|
|||
|
|
|||
|
Using <> angle brackets around each URI is especially recommended as
|
|||
|
a delimiting style for URI that contain whitespace.
|
|||
|
|
|||
|
The prefix "URL:" (with or without a trailing space) was recommended
|
|||
|
as a way to help distinguish a URL from other bracketed
|
|||
|
designators, although this is not common in practice.
|
|||
|
|
|||
|
For robustness, software that accepts user-typed URI should attempt
|
|||
|
to recognize and strip both delimiters and embedded whitespace.
|
|||
|
|
|||
|
For example, the text:
|
|||
|
|
|||
|
      Yes, Jim, I found it under "http://www.w3.org/Addressing/",
      but you can probably pick it up from <ftp://ds.internic.
      net/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
      ietf/uri/historical.html#WARNING>.
|
|||
|
|
|||
|
contains the URI references
|
|||
|
|
|||
|
http://www.w3.org/Addressing/
|
|||
|
ftp://ds.internic.net/rfc/
|
|||
|
http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING
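The extraction described in this appendix can be sketched in Python as follows. The sketch assumes that every double-quoted or angle-bracketed span in the text is a URI reference, which will not hold for arbitrary prose, and it does not attempt to recognize URI delimited only by whitespace. It removes embedded whitespace and an optional "URL:" prefix. The names DELIMITED_RE and extract_uri_references are hypothetical.

```python
import re

# A span in angle brackets or in double quotes; both may cross line breaks.
DELIMITED_RE = re.compile(r'<([^<>]*)>|"([^"]*)"')


def extract_uri_references(text):
    """Return the delimited URI references found in text, in order."""
    found = []
    for match in DELIMITED_RE.finditer(text):
        candidate = match.group(1)
        if candidate is None:
            candidate = match.group(2)
        # Drop an optional "URL:" prefix, then all embedded whitespace.
        candidate = re.sub(r"^\s*URL:", "", candidate)
        found.append(re.sub(r"\s+", "", candidate))
    return found


TEXT = '''Yes, Jim, I found it under "http://www.w3.org/Addressing/",
but you can probably pick it up from <ftp://ds.internic.
net/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
ietf/uri/historical.html#WARNING>.'''

for reference in extract_uri_references(TEXT):
    print(reference)
```

Removing the whitespace after extraction is what allows the two references broken across lines to be recovered intact.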
|
|||
|
|
|||
|
F. Abbreviated URLs
|
|||
|
|
|||
|
The URL syntax was designed for unambiguous reference to network
|
|||
|
resources and extensibility via the URL scheme. However, as URL
|
|||
|
identification and usage have become commonplace, traditional media
|
|||
|
(television, radio, newspapers, billboards, etc.) have increasingly
|
|||
|
used abbreviated URL references. That is, a reference consisting of
|
|||
|
only the authority and path portions of the identified resource, such
|
|||
|
as
|
|||
|
|
|||
|
www.w3.org/Addressing/
|
|||
|
|
|||
|
or simply the DNS hostname on its own. Such references are primarily
|
|||
|
intended for human interpretation rather than machine, with the
|
|||
|
assumption that context-based heuristics are sufficient to complete
|
|||
|
the URL (e.g., most hostnames beginning with "www" are likely to have
|
|||
|
a URL prefix of "http://"). Although there is no standard set of
|
|||
|
heuristics for disambiguating abbreviated URL references, many client
|
|||
|
implementations allow them to be entered by the user and
|
|||
|
heuristically resolved. It should be noted that such heuristics may
|
|||
|
change over time, particularly when new URL schemes are introduced.
|
|||
|
|
|||
|
Since an abbreviated URL has the same syntax as a relative URL path,
|
|||
|
abbreviated URL references cannot be used in contexts where relative
|
|||
|
URLs are expected. This limits the use of abbreviated URLs to places
|
|||
|
where there is no defined base URL, such as dialog boxes and off-line
|
|||
|
advertisements.
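As a purely illustrative example of such a heuristic, the Python sketch below completes an abbreviated URL only when its hostname begins with "www". The function name complete_abbreviated_url is hypothetical, and real clients apply richer, non-standardized rules that may change over time.

```python
def complete_abbreviated_url(text):
    """Guess a full URL for an abbreviated reference, or return None.

    Only one heuristic is applied: a hostname beginning with "www." is
    assumed to name an HTTP server.
    """
    if "://" in text:
        return text                      # already carries a scheme prefix
    host = text.split("/", 1)[0]
    if host.lower().startswith("www."):
        return "http://" + text
    return None                          # no guess; leave it to the caller


print(complete_abbreviated_url("www.w3.org/Addressing/"))
```

Such a function is only appropriate where no base URL is defined, since the same string would otherwise be a valid relative path.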
|
|||
|
|
|||
|
G. Summary of Non-editorial Changes
|
|||
|
|
|||
|
G.1. Additions
|
|||
|
|
|||
|
Section 4 (URI References) was added to stem the confusion regarding
|
|||
|
"what is a URI" and how to describe fragment identifiers given that
|
|||
|
they are not part of the URI, but are part of the URI syntax and
|
|||
|
parsing concerns. In addition, it provides a reference definition
|
|||
|
for use by other IETF specifications (HTML, HTTP, etc.) that have
|
|||
|
previously attempted to redefine the URI syntax in order to account
|
|||
|
for the presence of fragment identifiers in URI references.
|
|||
|
|
|||
|
Section 2.4 was rewritten to clarify a number of misinterpretations
|
|||
|
and to leave room for fully internationalized URI.
|
|||
|
|
|||
|
Appendix F on abbreviated URLs was added to describe the shortened
|
|||
|
references often seen in television and magazine advertisements and
|
|||
|
explain why they are not used in other contexts.
|
|||
|
|
|||
|
G.2. Modifications from both RFC 1738 and RFC 1808
|
|||
|
|
|||
|
Changed to URI syntax instead of just URL.
|
|||
|
|
|||
|
Confusion regarding the terms "character encoding", the URI
|
|||
|
"character set", and the escaping of characters with %<hex><hex>
|
|||
|
equivalents has (hopefully) been reduced. Many of the BNF rule names
|
|||
|
regarding the character sets have been changed to more accurately
|
|||
|
describe their purpose and to encompass all "characters" rather than
|
|||
|
just US-ASCII octets. Unless otherwise noted here, these
|
|||
|
modifications do not affect the URI syntax.
|
|||
|
|
|||
|
Both RFC 1738 and RFC 1808 refer to the "reserved" set of characters
|
|||
|
as if URI-interpreting software were limited to a single set of
|
|||
|
characters with a reserved purpose (i.e., as meaning something other
|
|||
|
than the data to which the characters correspond), and that this set
|
|||
|
was fixed by the URI scheme. However, this has not been true in
|
|||
|
practice; any character that is interpreted differently when it is
|
|||
|
escaped is, in effect, reserved. Furthermore, the interpreting
|
|||
|
engine on an HTTP server is often dependent on the resource, not just
|
|||
|
the URI scheme. The description of reserved characters has been
|
|||
|
changed accordingly.
|
|||
|
|
|||
|
The plus "+", dollar "$", and comma "," characters have been added to
|
|||
|
those in the "reserved" set, since they are treated as reserved
|
|||
|
within the query component.
|
|||
|
|
|||
|
The tilde "~" character was added to those in the "unreserved" set,
|
|||
|
since it is extensively used on the Internet in spite of the
|
|||
|
difficulty of transcribing it with some keyboards.
|
|||
|
|
|||
|
The syntax for URI scheme has been changed to require that all
|
|||
|
schemes begin with an alpha character.
|
|||
|
|
|||
|
The "user:password" form in the previous BNF was changed to a
|
|||
|
"userinfo" token, and the possibility that it might be
|
|||
|
"user:password" made scheme specific. In particular, the use of
|
|||
|
passwords in the clear is not even suggested by the syntax.
|
|||
|
|
|||
|
The question-mark "?" character was removed from the set of allowed
|
|||
|
characters for the userinfo in the authority component, since testing
|
|||
|
showed that many applications treat it as reserved for separating the
|
|||
|
query component from the rest of the URI.
|
|||
|
|
|||
|
The semicolon ";" character was added to those stated as being
|
|||
|
reserved within the authority component, since several new schemes
|
|||
|
are using it as a separator within userinfo to indicate the type of
|
|||
|
user authentication.
|
|||
|
|
|||
|
RFC 1738 specified that the path was separated from the authority
|
|||
|
portion of a URI by a slash. RFC 1808 followed suit, but with a
|
|||
|
fudge of carrying around the separator as a "prefix" in order to
|
|||
|
describe the parsing algorithm. RFC 1630 never had this problem,
|
|||
|
since it considered the slash to be part of the path. In writing
|
|||
|
this specification, it was found to be impossible to accurately
|
|||
|
describe and retain the difference between the two URI
|
|||
|
<foo:/bar> and <foo:bar>
|
|||
|
without either considering the slash to be part of the path (as
|
|||
|
corresponds to actual practice) or creating a separate component just
|
|||
|
to hold that slash. We chose the former.
|
|||
|
|
|||
|
G.3. Modifications from RFC 1738
|
|||
|
|
|||
|
The definition of specific URL schemes and their scheme-specific
|
|||
|
syntax and semantics has been moved to separate documents.
|
|||
|
|
|||
|
The URL host was defined as a fully-qualified domain name. However,
|
|||
|
many URLs are used without fully-qualified domain names (in contexts
|
|||
|
for which the full qualification is not necessary), without any host
|
|||
|
(as in some file URLs), or with a host of "localhost".
|
|||
|
|
|||
|
The URL port is now *digit instead of 1*digit, since systems are
|
|||
|
expected to handle the case where the ":" separator between host and
|
|||
|
port is supplied without a port.
|
|||
|
|
|||
|
The recommendations for delimiting URI in context (Appendix E) have
|
|||
|
been adjusted to reflect current practice.
|
|||
|
|
|||
|
G.4. Modifications from RFC 1808
|
|||
|
|
|||
|
RFC 1808 (Section 4) defined an empty URL reference (a reference
|
|||
|
containing nothing aside from the fragment identifier) as being a
|
|||
|
reference to the base URL. Unfortunately, that definition could be
|
|||
|
interpreted, upon selection of such a reference, as a new retrieval
|
|||
|
action on that resource. Since the normal intent of such references
|
|||
|
is for the user agent to change its view of the current document to
|
|||
|
the beginning of the specified fragment within that document, not to
|
|||
|
make an additional request of the resource, a description of how to
|
|||
|
correctly interpret an empty reference has been added in Section 4.
|
|||
|
|
|||
|
The description of the mythical Base header field has been replaced
|
|||
|
with a reference to the Content-Location header field defined by
|
|||
|
MHTML [RFC2110].
|
|||
|
|
|||
|
RFC 1808 described various schemes as either having or not having the
|
|||
|
properties of the generic URI syntax. However, the only requirement
|
|||
|
is that the particular document containing the relative references
|
|||
|
have a base URI that abides by the generic URI syntax, regardless of
|
|||
|
the URI scheme, so the associated description has been updated to
|
|||
|
reflect that.
|
|||
|
|
|||
|
The BNF term <net_loc> has been replaced with <authority>, since the
|
|||
|
latter more accurately describes its use and purpose. Likewise, the
|
|||
|
authority is no longer restricted to the IP server syntax.
|
|||
|
|
|||
|
Extensive testing of current client applications demonstrated that
|
|||
|
the majority of deployed systems do not use the ";" character to
|
|||
|
indicate trailing parameter information, and that the presence of a
|
|||
|
semicolon in a path segment does not affect the relative parsing of
|
|||
|
that segment. Therefore, parameters have been removed as a separate
|
|||
|
component and may now appear in any path segment. Their influence
|
|||
|
has been removed from the algorithm for resolving a relative URI
|
|||
|
reference. The resolution examples in Appendix C have been modified
|
|||
|
to reflect this change.
|
|||
|
|
|||
|
Implementations are now allowed to work around malformed relative
|
|||
|
references that are prefixed by the same scheme as the base URI, but
|
|||
|
only for schemes known to use the <hier_part> syntax.
|
|||
|
|
|||
|
H. Full Copyright Statement
|
|||
|
|
|||
|
Copyright (C) The Internet Society (1998). All Rights Reserved.
|
|||
|
|
|||
|
This document and translations of it may be copied and furnished to
|
|||
|
others, and derivative works that comment on or otherwise explain it
|
|||
|
or assist in its implementation may be prepared, copied, published
|
|||
|
and distributed, in whole or in part, without restriction of any
|
|||
|
kind, provided that the above copyright notice and this paragraph are
|
|||
|
included on all such copies and derivative works. However, this
|
|||
|
document itself may not be modified in any way, such as by removing
|
|||
|
the copyright notice or references to the Internet Society or other
|
|||
|
Internet organizations, except as needed for the purpose of
|
|||
|
developing Internet standards in which case the procedures for
|
|||
|
copyrights defined in the Internet Standards process must be
|
|||
|
followed, or as required to translate it into languages other than
|
|||
|
English.
|
|||
|
|
|||
|
The limited permissions granted above are perpetual and will not be
|
|||
|
revoked by the Internet Society or its successors or assigns.
|
|||
|
|
|||
|
This document and the information contained herein is provided on an
|
|||
|
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
|
|||
|
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
|
|||
|
BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
|
|||
|
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
|
|||
|
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.