uri(3tcl)         Tcl Uniform Resource Identifier Management         uri(3tcl)

______________________________________________________________________________

NAME
       uri - URI utilities

SYNOPSIS
       package require Tcl  8.2

       package require uri  ?1.2.7?

       uri::setQuirkOption option ?value?

       uri::split url ?defaultscheme?

       uri::join ?key value?...

       uri::resolve base url

       uri::isrelative url

       uri::geturl url ?options...?

       uri::canonicalize uri

       uri::register schemeList script

______________________________________________________________________________

DESCRIPTION
       This package does two things.

       First,  it provides a number of commands for manipulating URLs/URIs and
       fetching data specified by them. For fetching data this package  analy-
       ses  the  requested  URL/URI  and then dispatches it to the appropriate
       package (http, ftp, ...) for actual retrieval.   Currently  these  com-
       mands are defined for the schemes http, https, ftp, mailto, news, ldap,
       ldaps and file.  The package uri::urn adds scheme urn.

       Second, it provides regular expressions  for  a  number  of  registered
       URL/URI  schemes.  Registered  schemes  are currently ftp, ldap, ldaps,
       file, http, https, gopher, mailto, news, wais and prospero.  The  pack-
       age uri::urn adds scheme urn.

       The  commands  of the package conform to RFC 3986 (https://www.rfc-edi-
       tor.org/rfc/rfc3986.txt), with the exception of a loophole arising from
       RFC  1630 and described in RFC 3986 Sections 5.2.2 and 5.4.2. The loop-
       hole allows a relative URI to include a scheme if it is the same as the
       scheme  of  the  base URI against which it is resolved. RFC 3986 recom-
       mends avoiding this usage.

COMMANDS
       uri::setQuirkOption option ?value?
              uri::setQuirkOption is an  accessor  command  for  a  number  of
              "quirk options".  The command has the same semantics as the com-
              mand set: when called with one argument  it  reads  an  existing
              value; with two arguments it writes a new value.  The value of a
              "quirk option" is boolean: the value false requests  conformance
              with  RFC  3986, while true requests use of the quirk.  See sec-
              tion QUIRK OPTIONS for discussion of the different  options  and
              their purpose.
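
              The read and write forms described above can be sketched as
              follows. This is a minimal illustration that assumes version
              1.2.7 of the package is installed; the variable name saved is
              chosen only for this example.

```tcl
package require uri 1.2.7

# One argument: read the current value of the quirk option.
set saved [uri::setQuirkOption NoInitialSlash]

# Two arguments: write a new value (false requests RFC 3986 conformance).
uri::setQuirkOption NoInitialSlash 0
puts "NoInitialSlash is now [uri::setQuirkOption NoInitialSlash]"

# Restore the earlier value so that other code is not affected.
uri::setQuirkOption NoInitialSlash $saved
```

              Saving and restoring the value is a design choice of the
              sketch: quirk options are global to the package, so a change
              made in one place affects every later call.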

       uri::split url ?defaultscheme?
              uri::split takes a url, parses it and then returns a list of
              key/value pairs, suitable for array set, containing the
              constituents of the url. If the scheme is missing from the url,
              it defaults to the value of defaultscheme if that was specified,
              and to http otherwise. Currently the schemes http, https, ftp,
              mailto, news, ldap, ldaps and file are supported by the package
              itself. See section EXTENDING on how to expand that range.

              The set of constituents of a URL (that is, the set of keys in
              the returned dictionary) depends on the scheme of the URL. The
              only key that is always present is therefore scheme. For the
              following schemes the constituents and their keys are known:

              ftp    user, pwd, host, port, path, type, pbare.  The  pbare  is
                     optional.

              http(s)
                     user, pwd, host, port, path, query, fragment, pbare.  The
                     pbare is optional.

              file   path, host. The host is optional.

              mailto user, host. The host is optional.

              ldap(s)
                     host, port, dn, attrs, scope, filter, extensions

              news   Either message-id or newsgroup-name.

              For discussion of the boolean pbare see  options  NoInitialSlash
              and NoExtraKeys in QUIRK OPTIONS.

              The  constituents  are  returned  as slices of the argument url,
              without removal of percent-encoding  ("url-encoding")  or  other
              adaptations.   Notably, on Windows(R) the path in scheme file is
              not a valid local filename.  See EXAMPLES for more information.
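
              As an illustration of the key/value result, the sketch below
              splits an http URL and loads the constituents into an array.
              The host www.example.com and the array name parts are
              assumptions made for the example.

```tcl
package require uri

# Split a URL and load its constituents into an array.
array set parts [uri::split {http://www.example.com:8080/docs/index.html?lang=en}]

# The key "scheme" is always present; the other keys depend on the scheme.
foreach key [lsort [array names parts]] {
    puts [format "%-10s %s" $key $parts($key)]
}
```

              Iterating over array names, rather than over a fixed list of
              keys, keeps the code tolerant of optional keys such as pbare.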

       uri::join ?key value?...
              uri::join  takes  a  list  of  key/value  pairs  (generated   by
              uri::split, for example) and returns the canonical URL they rep-
              resent. Currently the schemes http, https,  ftp,  mailto,  news,
              ldap,  ldaps  and  file are supported by the package itself. See
              section EXTENDING on how to expand that range.

              The arguments are expected to be slices of  a  valid  URL,  with
              percent-encoding  ("url-encoding") and any other necessary adap-
              tations.  Notably, on Windows the path in scheme file is  not  a
              valid local filename.  See EXAMPLES for more information.
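
              A short sketch of both ways of calling the command: with
              explicit key/value pairs, and with the result of uri::split.
              The host name is an assumption of the example, and the value of
              path is written without an initial "/" because the sketch
              relies on the default quirk options.

```tcl
package require uri

# Build a URL from its constituents. With the default quirk options the
# value of path for http is given without an initial "/".
set url [uri::join scheme http host www.example.com path docs/index.html]
puts $url

# Pass the result of uri::split back to uri::join.
# (eval/linsert is used so that the sketch also runs on Tcl before 8.5.)
set again [eval [linsert [uri::split $url] 0 uri::join]]
puts $again
```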

       uri::resolve base url
              uri::resolve  resolves  the  specified  url relative to base, in
              conformance with RFC 3986. In other words: a non-relative url is
              returned unchanged, whereas for a relative url the missing parts
              are taken from base and prepended to it. The result of this  op-
              eration  is returned. For an empty url the result is base, with-
              out its URI fragment (if any).  The  command  is  available  for
              schemes http, https, ftp, and file.
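
              The three cases described above can be tried with the sketch
              below. The URLs are illustrative, and version 1.2.7 is assumed
              because earlier versions may treat an empty url differently.

```tcl
package require uri 1.2.7

set base http://www.example.com/a/b/c.html#top

# A relative url takes its missing parts from base.
set relative [uri::resolve $base ../d.html]
puts $relative

# A non-relative url is returned unchanged.
set absolute [uri::resolve $base http://other.example.org/x.html]
puts $absolute

# An empty url yields base without its fragment.
set empty [uri::resolve $base {}]
puts $empty
```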

       uri::isrelative url
              uri::isrelative determines whether the specified url is absolute
              or relative.  The command is available for a url of any scheme.
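
              A minimal sketch that classifies a few illustrative URLs; the
              result of the command is used directly as a condition.

```tcl
package require uri

foreach url {
    http://www.example.com/index.html
    ../images/logo.png
    mailto:someone@example.com
} {
    if {[uri::isrelative $url]} {
        puts "$url is relative"
    } else {
        puts "$url is absolute"
    }
}
```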

       uri::geturl url ?options...?
              uri::geturl parses the specified url and then dispatches the
              request to the package appropriate for the scheme found in the
              URL. The command first tries to load a package whose name is
              that of the scheme (including possible capitalization) followed
              by ::geturl and, if that fails, a package with the same name as
              the scheme itself (including possible capitalization). It
              further assumes that whatever package was loaded provides a
              geturl command in the namespace of the same name as the package
              itself. This command is called with the given url and all given
              options. Currently geturl does not handle any options itself.

              Note: file-URLs are an exception to the  rule  described  above.
              They are handled internally.

              The results of the command cannot be specified here. They depend
              on the geturl command of the package the request was dispatched
              to.

       uri::canonicalize uri
              uri::canonicalize  returns  the  canonical  form  of a URI.  The
              canonical form of a URI is one where  relative  path  specifica-
              tions,  i.e.  "."  and "..", have been resolved.  The command is
              available for all URI schemes that have uri::split and uri::join
              commands.  The  command  returns  a canonicalized URI if the URI
              scheme has a path component (i.e. http, https, ftp,  and  file).
              For  schemes  that have uri::split and uri::join commands but no
              path component (i.e. mailto, news, ldap, and ldaps), the command
              returns the uri unchanged.
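
              The two behaviors can be compared with the following sketch,
              which uses illustrative URLs and the default quirk options.

```tcl
package require uri

# "." and ".." segments are resolved for schemes with a path component.
set withPath [uri::canonicalize http://www.example.com/a/./b/../c/index.html]
puts $withPath

# Schemes without a path component are returned unchanged.
set withoutPath [uri::canonicalize mailto:someone@example.com]
puts $withoutPath
```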

       uri::register schemeList script
              uri::register registers the first element of schemeList as a new
              scheme and the remaining elements as aliases for this scheme. It
              creates  the namespace for the scheme and executes the script in
              the new namespace. The script has to declare variables  contain-
              ing  regular  expressions  relevant  to the scheme. At least the
              variable schemepart has to be declared as that one  is  used  to
              extend the variables keeping track of the registered schemes.
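
              A hedged sketch of a registration follows. The scheme demo, its
              alias dm and both regular expressions are hypothetical and far
              simpler than the patterns of the built-in schemes. The variable
              url is declared as well, following the convention described in
              section SCHEMES.

```tcl
package require uri

# Register the hypothetical scheme "demo" with the alias "dm".
# The script runs in the new namespace uri::demo and must at least
# declare the variable schemepart.
uri::register {demo dm} {
    variable schemepart {//[a-z0-9.-]+(/[a-z0-9._/-]*)?}
    variable url        "demo:$schemepart"
}

# The new scheme is now listed among the registered schemes.
puts [expr {[lsearch -exact $uri::schemes demo] >= 0}]
```

              Note that registration only provides pattern information;
              uri::split and uri::join need separate commands, as described
              in section EXTENDING.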

SCHEMES
       In addition to the commands mentioned above, this package provides
       regular expressions to recognize URLs for a number of URL schemes.

       For each supported scheme a namespace of the same name  as  the  scheme
       itself  is provided inside of the namespace uri containing the variable
       url whose contents are a regular expression to recognize URLs  of  that
       scheme.  Additional variables may contain regular expressions for parts
       of URLs for that scheme.

       The variable uri::schemes contains a list of  all  registered  schemes.
       Currently  these  are  ftp,  ldap,  ldaps,  file,  http, https, gopher,
       mailto, news, wais and prospero.
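
       As a sketch of how these variables might be used, the code below tests
       whether a whole string is an http URL. Wrapping the pattern in a
       non-capturing group and anchoring it is a choice made for this
       example, not something the package requires; the URL is illustrative.

```tcl
package require uri

# All registered schemes.
puts $uri::schemes

# uri::http::url holds a regular expression that recognizes http URLs.
# It is wrapped in a non-capturing group and anchored, so that the whole
# string has to match.
set candidate http://www.example.com/index.html
set matched [regexp -- "^(?:${uri::http::url})\$" $candidate]
if {$matched} {
    puts "$candidate looks like an http URL"
}
```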

EXTENDING
       Extending the range of schemes supported by uri::split and uri::join is
       easy because neither command handles the request itself: each
       dispatches it to another command in the uri namespace, using the scheme
       of the URL as the criterion.

       uri::split  and  uri::join  call  Split[string  totitle  <scheme>]  and
       Join[string totitle <scheme>] respectively.

       The provision of split and join commands is sufficient  to  extend  the
       commands  uri::canonicalize  and uri::geturl (the latter subject to the
       availability of a suitable package with a  geturl  command).   In  con-
       trast,  to extend the command uri::resolve to a new scheme, the command
       itself must be modified.

       To extend the range of schemes for which pattern information is  avail-
       able, use the command uri::register.

       An  example of a package that provides both commands and pattern infor-
       mation for a new scheme is uri::urn, which adds scheme urn.
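
       The sketch below adds split and join commands for a hypothetical
       scheme sample, with URLs of the form sample:name@value. The scheme,
       its keys name and value, and the calling conventions noted in the
       comments are assumptions of this example; they are not specified in
       this manual, so check them against the package source before relying
       on them.

```tcl
package require uri

# Assumption: the split command receives the url without its leading
# "sample:" prefix, and uri::split itself adds the key scheme.
proc uri::SplitSample {url} {
    set at [string first @ $url]
    if {$at < 0} {
        return [list name $url value {}]
    }
    return [list name  [string range $url 0 [expr {$at - 1}]] \
                 value [string range $url [expr {$at + 1}] end]]
}

# Assumption: the join command receives the same key/value pairs that
# were passed to uri::join, including the key scheme.
proc uri::JoinSample {args} {
    array set parts {name {} value {}}
    array set parts $args
    return "sample:$parts(name)@$parts(value)"
}

puts [uri::split sample:alpha@beta]
puts [uri::join scheme sample name alpha value beta]
```

       The command names follow from string totitle sample, which yields
       Sample.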

QUIRK OPTIONS
       The value of a "quirk option" is boolean: the value false requests con-
       formance with RFC 3986, while true requests use of the quirk.  Use com-
       mand uri::setQuirkOption to access the values of quirk options.

       Quirk options are useful both for allowing backwards compatibility when
       a  command  specification  changes, and for adding useful features that
       are not included in RFC specifications.  The  following  quirk  options
       are currently defined:

       NoInitialSlash
              This  quirk  option  concerns  the leading character of path (if
              non-empty) in the schemes http, https, and ftp.

              RFC 3986 defines path in an absolute URI to have an initial "/",
              unless  the  value  of  path is the empty string. For the scheme
              file, all versions of package uri follow this rule.   The  quirk
              option NoInitialSlash does not apply to scheme file.

              For  the  schemes  http,  https, and ftp, versions of uri before
              1.2.7 define the path NOT to include an initial "/".   When  the
              quirk option NoInitialSlash is true (the default), this behavior
              is also used in version 1.2.7.  To use instead values of path as
              defined by RFC 3986, set this quirk option to false.

              This  setting  does  not affect RFC 3986 conformance.  If NoIni-
              tialSlash is true, then the value of path in the  schemes  http,
              https, or ftp, cannot distinguish between URIs in which the full
              "RFC 3986 path" is the empty string "" or a single slash "/" re-
              spectively.   The  missing  information  is recorded in an addi-
              tional uri::split key pbare.

              The boolean pbare is defined when quirk  options  NoInitialSlash
              and  NoExtraKeys  have  values  true and false respectively.  In
              this case, if the value of path is the empty string "", pbare is
              true  if  the  full "RFC 3986 path" is "", and pbare is false if
              the full "RFC 3986 path" is "/".

              Using the quirk option NoInitialSlash is a matter of preference.
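
              The effect on the value of path can be observed with the sketch
              below, which splits the same illustrative URL under both
              settings and then restores the previous value of the option.

```tcl
package require uri 1.2.7

set saved [uri::setQuirkOption NoInitialSlash]
set url   http://www.example.com/docs/index.html

# Quirk enabled: path is reported without its initial "/".
uri::setQuirkOption NoInitialSlash 1
array set quirky [uri::split $url]
puts "quirk on:  $quirky(path)"

# Quirk disabled: path is reported as defined by RFC 3986.
uri::setQuirkOption NoInitialSlash 0
array set strict [uri::split $url]
puts "quirk off: $strict(path)"

uri::setQuirkOption NoInitialSlash $saved
```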

       NoExtraKeys
              This  quirk option permits full backward compatibility with ver-
              sions of uri before 1.2.7, by omitting the uri::split key  pbare
              described  above (see quirk option NoInitialSlash).  The outcome
              is greater backward compatibility of the uri::split command, but
              an  inability to distinguish between URIs in which the full "RFC
              3986 path" is the empty string "" or a single slash "/"  respec-
              tively - i.e. a minor non-conformance with RFC 3986.

              If  the quirk option NoExtraKeys is false (the default), command
              uri::split returns an additional key  pbare,  and  the  commands
              comply  with  RFC 3986. If the quirk option NoExtraKeys is true,
              the key pbare is not defined and there is not  full  conformance
              with RFC 3986.

              Using  the  quirk option NoExtraKeys is NOT recommended, because
              if set to true it will reduce conformance with  RFC  3986.   The
              option is included only for compatibility with code, written for
              earlier versions of uri, that needs values  of  path  without  a
              leading "/", AND ALSO cannot tolerate unexpected keys in the re-
              sults of uri::split.
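
              Code that must work with either setting can test for the key
              before using it, as in this sketch. The helper showPath is
              hypothetical and exists only for the example.

```tcl
package require uri 1.2.7

# Report path and, when it is defined, pbare for a url.
proc showPath {url} {
    array set parts [uri::split $url]
    if {[info exists parts(pbare)]} {
        return "path='$parts(path)' pbare=$parts(pbare)"
    }
    return "path='$parts(path)' (no pbare key)"
}

set savedSlash [uri::setQuirkOption NoInitialSlash]
set savedKeys  [uri::setQuirkOption NoExtraKeys]
uri::setQuirkOption NoInitialSlash 1

# NoExtraKeys false: pbare separates an empty path from a path "/".
uri::setQuirkOption NoExtraKeys 0
puts [showPath http://www.example.com]
puts [showPath http://www.example.com/]

# NoExtraKeys true: the key pbare is not returned.
uri::setQuirkOption NoExtraKeys 1
puts [showPath http://www.example.com/]

uri::setQuirkOption NoInitialSlash $savedSlash
uri::setQuirkOption NoExtraKeys    $savedKeys
```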

       HostAsDriveLetter
              When handling the scheme file on the Windows platform,  versions
              of  uri  before  1.2.7 use the host field to represent a Windows
              drive letter and the colon that follows it, and the  path  field
              to  represent  the filename path after the colon.  Such URIs are
              invalid, and are not recognized by any RFC. When the  quirk  op-
              tion  HostAsDriveLetter  is  true, this behavior is also used in
              version 1.2.7.  To use file URIs on Windows that conform to  RFC
              3986, set this quirk option to false (the default).

              Using  this  quirk is NOT recommended, because if set to true it
              will cause the uri commands to expect and produce invalid  URIs.
              The option is included only for compatibility with legacy code.

       RemoveDoubleSlashes
              When  a  URI  is canonicalized by uri::canonicalize, its path is
              normalized by removal of segments "." and "..".  RFC  3986  does
              not mandate the removal of empty segments "" (i.e. the merger of
              double slashes, which is a feature of filename normalization but
              not  of  URI  path  normalization):  it  treats URIs with excess
              slashes as referring to different resources.  When the quirk op-
              tion  RemoveDoubleSlashes  is true (the default), empty segments
              will be removed from path.  To prevent removal, and thereby con-
              form to RFC 3986, set this quirk option to false.

              Using  this  quirk is a matter of preference.  A URI with double
              slashes in its path was most likely  generated  by  error,  cer-
              tainly  so  if  it  has a straightforward mapping to a file on a
              server.  In some cases it may be better to sanitize the URI;  in
              others,  to  keep the URI and let the server handle the possible
              error.
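
              The difference between the two settings can be seen by
              canonicalizing a URL whose path contains an empty segment. The
              URL in this sketch is illustrative.

```tcl
package require uri 1.2.7

set saved [uri::setQuirkOption RemoveDoubleSlashes]
set url   http://www.example.com/a//b.html

# Quirk enabled: empty segments are removed from path.
uri::setQuirkOption RemoveDoubleSlashes 1
set merged [uri::canonicalize $url]
puts "quirk on:  $merged"

# Quirk disabled: empty segments are kept, as in RFC 3986.
uri::setQuirkOption RemoveDoubleSlashes 0
set kept [uri::canonicalize $url]
puts "quirk off: $kept"

uri::setQuirkOption RemoveDoubleSlashes $saved
```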

   BACKWARD COMPATIBILITY
       To behave as similarly as possible to  versions  of  uri  earlier  than
       1.2.7, set the following quirk options:

       o      uri::setQuirkOption NoInitialSlash 1

       o      uri::setQuirkOption NoExtraKeys 1

       o      uri::setQuirkOption HostAsDriveLetter 1

       o      uri::setQuirkOption RemoveDoubleSlashes 0

       In code that can tolerate the return by uri::split of an additional key
       pbare, set

       o      uri::setQuirkOption NoExtraKeys 0

       in order to achieve greater compliance with RFC 3986.

   NEW DESIGNS
       For new projects, the following settings are recommended:

       o      uri::setQuirkOption NoInitialSlash 0

       o      uri::setQuirkOption NoExtraKeys 0

       o      uri::setQuirkOption HostAsDriveLetter 0

       o      uri::setQuirkOption RemoveDoubleSlashes 0|1
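
       One way to apply these settings in a single place is a loop over
       option/value pairs, as sketched here. RemoveDoubleSlashes may be 0 or
       1; the value 0 is chosen purely for illustration.

```tcl
package require uri 1.2.7

# Settings recommended for new projects.
foreach {option value} {
    NoInitialSlash      0
    NoExtraKeys         0
    HostAsDriveLetter   0
    RemoveDoubleSlashes 0
} {
    uri::setQuirkOption $option $value
}
```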

   DEFAULT VALUES
       The default values for package uri version 1.2.7 are intended to  be  a
       compromise between backwards compatibility and improved features.  Dif-
       ferent default values may be chosen in future versions of package uri.

       o      uri::setQuirkOption NoInitialSlash 1

       o      uri::setQuirkOption NoExtraKeys 0

       o      uri::setQuirkOption HostAsDriveLetter 0

       o      uri::setQuirkOption RemoveDoubleSlashes 1

EXAMPLES
       A Windows(R) local filename such as "C:\Other Files\startup.txt" is not
       suitable for use as the path element of a URI in the scheme file.

       The  Tcl command file normalize will convert the backslashes to forward
       slashes.  To generate a valid path for the scheme file, the  normalized
       filename  must  be  prepended with "/", and then any characters that do
       not match the regexp bracket expression

                  [a-zA-Z0-9$_.+!*'(,)?:@&=-]

       must be percent-encoded.

       The result in this example is "/C:/Other%20Files/startup.txt" which  is
       a valid value for path.

              % uri::join path /C:/Other%20Files/startup.txt scheme file

              file:///C:/Other%20Files/startup.txt

              % uri::split file:///C:/Other%20Files/startup.txt

              path /C:/Other%20Files/startup.txt scheme file

       On  UNIX(R)  systems filenames begin with "/" which is also used as the
       directory separator.  The only action needed to convert a filename to a
       valid path is percent-encoding.
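
       The conversion steps described in this section can be sketched as a
       small procedure. The name fileNameToUriPath is hypothetical, and two
       choices are assumptions of the sketch: the separator "/" is kept
       between segments, and other characters are encoded as UTF-8 bytes.
       The procedure needs Tcl 8.4 or later for file normalize.

```tcl
# Convert a local filename to a value of path for the scheme file.
proc fileNameToUriPath {filename} {
    # Convert backslashes to forward slashes and make the name absolute.
    set normalized [file normalize $filename]

    # On Windows the normalized name starts with a drive letter: add "/".
    if {![string match /* $normalized]} {
        set normalized /$normalized
    }

    set segments {}
    foreach segment [split $normalized /] {
        set encoded {}
        foreach char [split $segment {}] {
            if {[regexp {^[a-zA-Z0-9$_.+!*'(,)?:@&=-]$} $char]} {
                # The character is allowed as it stands.
                append encoded $char
            } else {
                # Percent-encode each UTF-8 byte of the character.
                foreach byte [split [encoding convertto utf-8 $char] {}] {
                    append encoded [format %%%02X [scan $byte %c]]
                }
            }
        }
        lappend segments $encoded
    }
    return [join $segments /]
}

puts [fileNameToUriPath "/Other Files/startup.txt"]
```

       The result can then be passed to uri::join as the value of path,
       together with scheme file.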

CREDITS
       Original code (regular expressions) by Andreas Kupries.  Modularisation
       by Steve Ball, also the split/join/resolve functionality. RFC 3986 con-
       formance by Keith Nash.

BUGS, IDEAS, FEEDBACK
       This  document,  and the package it describes, will undoubtedly contain
       bugs and other problems.  Please report such in the category uri of the
       Tcllib  Trackers  [http://core.tcl.tk/tcllib/reportlist].   Please also
       report any ideas for enhancements  you  may  have  for  either  package
       and/or documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that  attachments  are  strongly  preferred  over  inlined
       patches.  Attachments  can  be  made  by  going to the Edit form of the
       ticket immediately after its creation, and  then  using  the  left-most
       button in the secondary navigation bar.

KEYWORDS
       fetching  information,  file,  ftp,  gopher, http, https, ldap, mailto,
       news, prospero, rfc 1630, rfc 2255, rfc 2396, rfc 3986, uri, url, wais,
       www

CATEGORY
       Networking

tcllib                               1.2.7                           uri(3tcl)
