uritools — RFC 3986 compliant replacement for urlparse
This module defines RFC 3986 compliant replacements for the most commonly used functions of the Python 2.7 Standard Library urlparse and Python 3 urllib.parse modules.
>>> from uritools import urisplit, uriunsplit, urijoin, uridefrag
>>> parts = urisplit('foo://user@example.com:8042/over/there?name=ferret#nose')
>>> parts
SplitResult(scheme='foo', authority='user@example.com:8042', path='/over/there', query='name=ferret', fragment='nose')
>>> parts.scheme
'foo'
>>> parts.authority
'user@example.com:8042'
>>> parts.userinfo
'user'
>>> parts.host
'example.com'
>>> parts.port
'8042'
>>> uriunsplit(parts[:3] + ('name=swallow&type=African', 'beak'))
'foo://user@example.com:8042/over/there?name=swallow&type=African#beak'
>>> urijoin('http://www.cwi.nl/~guido/Python.html', 'FAQ.html')
'http://www.cwi.nl/~guido/FAQ.html'
>>> uridefrag('http://pythonhosted.org/uritools/index.html#constants')
DefragResult(uri='http://pythonhosted.org/uritools/index.html', fragment='constants')
For various reasons, the Python 2 urlparse module is not compliant with current Internet standards, does not include Unicode support, and is generally unusable with proprietary URI schemes. Python 3’s urllib.parse improves on Unicode support, but the other issues still remain. As stated in Lib/urllib/parse.py:
RFC 3986 is considered the current standard and any future changes
to urlparse module should conform with it. The urlparse module is
currently not entirely compliant with this RFC due to defacto
scenarios for parsing, and for backward compatibility purposes,
some parsing quirks from older RFCs are retained.
This module aims to provide fully RFC 3986 compliant replacements for the most commonly used functions found in urlparse and urllib.parse, plus additional functions for conveniently composing URIs from their individual components.
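To make the limitations above concrete, the snippet below uses only the standard library's urllib.parse (not uritools). It shows that an absent query and an empty query are reported identically, so the empty "?" is lost on recombination, and that urljoin() returns a relative reference unchanged when the base uses a scheme missing from urllib.parse.uses_relative. The foo scheme and the example.com URIs are arbitrary examples.

```python
from urllib.parse import urljoin, urlsplit

# RFC 3986 distinguishes an undefined query (no '?') from an empty one
# ('?' followed by nothing); urllib.parse reports both as ''.
without_query = urlsplit('http://example.com/path')
empty_query = urlsplit('http://example.com/path?')
print(repr(without_query.query), repr(empty_query.query))

# Recombining therefore drops the empty query's '?'.
print(empty_query.geturl())

# urljoin() only resolves references against schemes listed in
# urllib.parse.uses_relative; for any other scheme it returns the
# reference unchanged.
print(urljoin('foo://example.com/over/there', 'here'))
```

Under RFC 3986 reference resolution, the last call would instead produce foo://example.com/over/here.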
URI Decomposition
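uritools.uridefrag(uristring)
    Remove an existing fragment component from a URI string.

    The return value is an instance of a subclass of collections.namedtuple with the following read-only attributes:

    ========= ===== ======================================================================
    Attribute Index Value
    ========= ===== ======================================================================
    uri       0     Absolute URI or relative URI reference without the fragment identifier
    fragment  1     Fragment identifier, or None if not present
    ========= ===== ======================================================================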
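A minimal sketch of the splitting rule behind uridefrag(), using a hypothetical defrag_sketch() function and DefragSketch tuple rather than uritools itself. The fragment begins at the first "#", and keeping None distinct from the empty string preserves the difference between a URI with no fragment and one ending in a bare "#".

```python
from typing import NamedTuple, Optional


class DefragSketch(NamedTuple):
    """Hypothetical stand-in for the (uri, fragment) result described above."""
    uri: str
    fragment: Optional[str]


def defrag_sketch(uristring: str) -> DefragSketch:
    # A fragment may not itself contain an unencoded '#' (RFC 3986,
    # Section 3.5), so the first '#' always starts the fragment.
    uri, sep, fragment = uristring.partition('#')
    # No '#' at all means "no fragment" (None), which differs from a
    # '#' followed by nothing (the empty string).
    return DefragSketch(uri, fragment if sep else None)


print(defrag_sketch('http://pythonhosted.org/uritools/index.html#constants'))
print(defrag_sketch('http://example.com/#'))
print(defrag_sketch('http://example.com/'))
```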
uritools.urisplit(uristring)
    Split a well-formed URI string into a tuple with five components corresponding to a URI’s general structure:

        <scheme>://<authority>/<path>?<query>#<fragment>

    The return value is an instance of a subclass of collections.namedtuple with the following read-only attributes:

    ========= ===== ===================================================================================
    Attribute Index Value
    ========= ===== ===================================================================================
    scheme    0     URI scheme, or None if not present
    authority 1     Authority component, or None if not present
    path      2     Path component, always present but may be empty
    query     3     Query component, or None if not present
    fragment  4     Fragment identifier, or None if not present
    userinfo        Userinfo subcomponent of authority, or None if not present
    host            Host subcomponent of authority, or None if not present
    port            Port subcomponent of authority as a (possibly empty) string, or None if not present
    ========= ===== ===================================================================================
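A minimal sketch of how such a five-component split can be done with the regular expression given in RFC 3986, Appendix B (rewritten with non-capturing groups), plus a second pattern for the userinfo, host and port subcomponents. The names split_sketch and SplitSketch are hypothetical and this is not uritools' implementation; it illustrates why absent components come back as None while the path is always a (possibly empty) string.

```python
import re
from typing import NamedTuple, Optional, Tuple

# RFC 3986, Appendix B, with non-capturing groups so that groups 1-5 are
# scheme, authority, path, query and fragment.
URI_RE = re.compile(
    r'^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?')

# authority = [ userinfo "@" ] host [ ":" port ]; the non-greedy host group
# keeps the colons inside a bracketed IPv6 literal such as "[::1]".
AUTHORITY_RE = re.compile(r'(?:(.*)@)?(.*?)(?::([0-9]*))?')


class SplitSketch(NamedTuple):
    """Hypothetical stand-in for the result described above."""
    scheme: Optional[str]
    authority: Optional[str]
    path: str
    query: Optional[str]
    fragment: Optional[str]

    def split_authority(self) -> Tuple[Optional[str], Optional[str], Optional[str]]:
        if self.authority is None:
            return (None, None, None)
        # For newline-free input, fullmatch() always succeeds because the
        # host group can absorb any remaining text.
        return AUTHORITY_RE.fullmatch(self.authority).groups()

    @property
    def userinfo(self) -> Optional[str]:
        return self.split_authority()[0]

    @property
    def host(self) -> Optional[str]:
        return self.split_authority()[1]

    @property
    def port(self) -> Optional[str]:
        return self.split_authority()[2]


def split_sketch(uristring: str) -> SplitSketch:
    # The pattern matches any newline-free string; optional groups that did
    # not participate come back as None, while the path group is always a str.
    return SplitSketch(*URI_RE.match(uristring).groups())


parts = split_sketch('foo://user@example.com:8042/over/there?name=ferret#nose')
print(parts)
print(parts.userinfo, parts.host, parts.port)
```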
URI Composition
uritools.uricompose(scheme=None, authority=None, path='', query=None, fragment=None, userinfo=None, host=None, port=None, encoding='utf-8')
    Compose a URI string from its individual components.

    authority may be a Unicode string, a bytes object, or a three-item iterable specifying the userinfo, host and port subcomponents. If both authority and any of the userinfo, host or port keyword arguments are given, the keyword argument overrides the corresponding authority subcomponent.

    If query is a mapping object or a sequence of two-element tuples, it is converted to a string of name=value pairs separated by &.

    The returned value is of type str.
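The query conversion described above can be sketched with the standard library. compose_query() is a hypothetical helper, and percent-encoding names and values with urllib.parse.quote(..., safe='') is an assumption; uricompose() may escape a different set of characters.

```python
from collections.abc import Mapping
from urllib.parse import quote


def compose_query(query):
    """Hypothetical helper: join (name, value) pairs as 'name=value' with '&'.

    Assumptions: names and values go through str() and are then
    percent-encoded with urllib.parse.quote(..., safe=''), so '&', '='
    and '/' inside them are escaped.
    """
    if isinstance(query, Mapping):
        query = query.items()
    return '&'.join(
        quote(str(name), safe='') + '=' + quote(str(value), safe='')
        for name, value in query
    )


print(compose_query({'name': 'swallow', 'type': 'African'}))
print(compose_query([('q', 'fish & chips'), ('q', 'a=b'), ('page', 2)]))
```

Escaping "&" and "=" inside names and values is what keeps the pair structure unambiguous when the query is split again later.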
uritools.urijoin(base, ref, strict=False)
    Convert a URI reference relative to a base URI to its target URI string.

    If strict is False, a scheme in the reference is ignored if it is identical to the base URI’s scheme.
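The resolution that urijoin() performs is specified in RFC 3986, Section 5.2. Below is a minimal, self-contained transcription of that algorithm (parsing with the Appendix B pattern, path merging, remove_dot_segments, and the optional strict check) for illustration. The function names are hypothetical and this is not uritools' code. Unlike urllib.parse.urljoin(), it resolves references for any scheme.

```python
import re

# RFC 3986, Appendix B: groups 1-5 are scheme, authority, path, query, fragment.
URI_RE = re.compile(
    r'^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?')


def remove_dot_segments(path):
    """RFC 3986, Section 5.2.4, with the output buffer kept as a segment list."""
    output = []
    while path:
        if path.startswith('../'):        # rule A
            path = path[3:]
        elif path.startswith('./'):       # rule A
            path = path[2:]
        elif path.startswith('/./'):      # rule B: "/./x" -> "/x"
            path = path[2:]
        elif path == '/.':                # rule B
            path = '/'
        elif path.startswith('/../'):     # rule C: "/../x" -> "/x", drop a segment
            path = path[3:]
            if output:
                output.pop()
        elif path == '/..':               # rule C
            path = '/'
            if output:
                output.pop()
        elif path in ('.', '..'):         # rule D
            path = ''
        else:                             # rule E: move one segment to output
            end = path.find('/', 1 if path.startswith('/') else 0)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return ''.join(output)


def merge(base_authority, base_path, ref_path):
    """RFC 3986, Section 5.2.3."""
    if base_authority is not None and base_path == '':
        return '/' + ref_path
    return base_path[:base_path.rfind('/') + 1] + ref_path


def recompose(scheme, authority, path, query, fragment):
    """RFC 3986, Section 5.3: emit only the components that are defined."""
    uri = ''
    if scheme is not None:
        uri += scheme + ':'
    if authority is not None:
        uri += '//' + authority
    uri += path
    if query is not None:
        uri += '?' + query
    if fragment is not None:
        uri += '#' + fragment
    return uri


def join_sketch(base, ref, strict=False):
    """Hypothetical urijoin()-like helper following RFC 3986, Section 5.2.2."""
    bscheme, bauthority, bpath, bquery, _ = URI_RE.match(base).groups()
    scheme, authority, path, query, fragment = URI_RE.match(ref).groups()
    if not strict and scheme == bscheme:
        scheme = None  # non-strict mode ignores a scheme equal to the base's
    if scheme is not None:
        return recompose(scheme, authority, remove_dot_segments(path), query, fragment)
    if authority is not None:
        return recompose(bscheme, authority, remove_dot_segments(path), query, fragment)
    if path == '':
        return recompose(bscheme, bauthority, bpath,
                         query if query is not None else bquery, fragment)
    if not path.startswith('/'):
        path = merge(bauthority, bpath, path)
    return recompose(bscheme, bauthority, remove_dot_segments(path), query, fragment)


print(join_sketch('http://www.cwi.nl/~guido/Python.html', 'FAQ.html'))
print(join_sketch('http://a/b/c/d;p?q', 'http:g', strict=True))
print(join_sketch('http://a/b/c/d;p?q', 'http:g'))
```

Run against the examples in RFC 3986, Section 5.4 with base http://a/b/c/d;p?q, it yields, for instance, http://a/g for ../../../g, and the strict flag decides whether http:g stays as-is or resolves to http://a/b/c/g.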
uritools.uriunsplit(parts)
    Combine the elements of a five-item iterable into a URI string.
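RFC 3986, Section 5.3 describes how the five components are recombined: each delimiter is emitted only for a component that is defined, so an empty but present query still produces a "?". A minimal sketch of that rule, with a hypothetical unsplit_sketch() that expects None for undefined components and a string path:

```python
def unsplit_sketch(parts):
    """Hypothetical helper: recombine (scheme, authority, path, query, fragment).

    Follows RFC 3986, Section 5.3; None marks an undefined component and the
    path is assumed to be a string (possibly empty).
    """
    scheme, authority, path, query, fragment = parts
    pieces = []
    if scheme is not None:
        pieces.append(scheme + ':')
    if authority is not None:
        pieces.append('//' + authority)
    pieces.append(path)
    if query is not None:
        pieces.append('?' + query)
    if fragment is not None:
        pieces.append('#' + fragment)
    return ''.join(pieces)


print(unsplit_sketch(('foo', 'user@example.com:8042', '/over/there',
                      'name=swallow&type=African', 'beak')))
# An empty (but defined) query keeps its '?':
print(unsplit_sketch(('http', 'example.com', '/path', '', None)))
```

Unpacking into five names accepts any five-item iterable, such as a list or tuple.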
URI Encoding
uritools.uridecode(uristring, encoding='utf-8', errors='strict')
    Decode a URI string or string component.

    If encoding is set to None, return the percent-decoded uristring as a bytes object. Otherwise, replace any percent-encodings and decode uristring using the codec registered for encoding, returning a Unicode string.
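A minimal sketch of the two decoding modes, built on the standard library's urllib.parse.unquote_to_bytes(). decode_sketch() is a hypothetical name and this is not uritools' implementation.

```python
from urllib.parse import unquote_to_bytes


def decode_sketch(uristring, encoding='utf-8', errors='strict'):
    """Hypothetical helper mirroring the uridecode() behavior described above."""
    # Replace every %XX escape with the octet it denotes.
    raw = unquote_to_bytes(uristring)
    if encoding is None:
        return raw
    # Interpret the resulting octets with the requested codec.
    return raw.decode(encoding, errors)


print(decode_sketch('caf%C3%A9'))
print(decode_sketch('caf%C3%A9', encoding=None))
print(decode_sketch('caf%E9', encoding='latin-1'))
```

Decoding only after all escapes have been replaced matters for multi-byte encodings such as UTF-8, where one character spans several %XX escapes.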
uritools.uriencode(uristring, safe='', encoding='utf-8', errors='strict')
    Encode a URI string or string component.

    If uristring is a bytes object, replace any characters not in UNRESERVED or safe with their corresponding percent-encodings and return the result as a bytes object. Otherwise, first encode uristring using the codec registered for encoding, then replace characters with their percent-encodings as described above.
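A minimal sketch of the encoding rule above: convert text to bytes, then keep only unreserved octets and those listed in safe, escaping everything else as %XX with uppercase hex digits (as RFC 3986, Section 2.1 recommends). encode_sketch() is hypothetical, assumes safe contains only ASCII characters, and always returns bytes.

```python
import string

# RFC 3986, Section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED_CHARS = string.ascii_letters + string.digits + '-._~'


def encode_sketch(uristring, safe='', encoding='utf-8', errors='strict'):
    """Hypothetical uriencode()-like helper; always returns bytes."""
    if not isinstance(uristring, bytes):
        # Text is first converted to octets with the requested codec.
        uristring = uristring.encode(encoding, errors)
    # Octets that may appear literally (safe is assumed to be ASCII).
    keep = frozenset((UNRESERVED_CHARS + safe).encode('ascii'))
    # Every other octet becomes %XX with uppercase hex digits.
    return b''.join(
        bytes([octet]) if octet in keep else b'%%%02X' % octet
        for octet in uristring
    )


print(encode_sketch('fish & chips'))
print(encode_sketch('/path/to file', safe='/'))
print(encode_sketch('café'))
```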
Character Constants
uritools.GEN_DELIMS
    A string containing all general delimiting characters specified in RFC 3986.

uritools.RESERVED
    A string containing all reserved characters specified in RFC 3986.

uritools.SUB_DELIMS
    A string containing all subcomponent delimiting characters specified in RFC 3986.

uritools.UNRESERVED
    A string containing all unreserved characters specified in RFC 3986.
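For reference, these are the character sets the constants are documented to contain, spelled out from RFC 3986, Sections 2.2 and 2.3. This sketch does not claim to match the character order of uritools' own strings, and classify() is a hypothetical helper.

```python
import string

# RFC 3986, Section 2.2
GEN_DELIMS = ':/?#[]@'
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS

# RFC 3986, Section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = string.ascii_letters + string.digits + '-._~'


def classify(char):
    """Hypothetical helper: report the RFC 3986 class of a single character."""
    if char in GEN_DELIMS:
        return 'gen-delim'
    if char in SUB_DELIMS:
        return 'sub-delim'
    if char in UNRESERVED:
        return 'unreserved'
    return 'other'  # must be percent-encoded when it appears as data


print([classify(c) for c in '/&~ '])
```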
Structured Parse Results
The result objects from the uridefrag() and urisplit() functions are instances of subclasses of collections.namedtuple. These objects contain the attributes described in the function documentation, as well as some additional convenience methods.
class uritools.DefragResult
    Class to hold uridefrag() results.

    getfragment(default=None, encoding='utf-8', errors='strict')
        Return the decoded fragment identifier, or default if the original URI did not contain a fragment component.

    geturi()
        Return the recombined version of the original URI as a string.
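A minimal sketch of how the two DefragResult methods can be layered on a namedtuple. DefragResultSketch is hypothetical, and decoding the fragment with urllib.parse.unquote() is an assumption about what "decoded" means here.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class DefragResultSketch(NamedTuple):
    """Hypothetical stand-in for DefragResult, for illustration only."""
    uri: str
    fragment: Optional[str]

    def geturi(self) -> str:
        # Re-attach the fragment whenever one was present, even an empty one.
        if self.fragment is None:
            return self.uri
        return self.uri + '#' + self.fragment

    def getfragment(self, default=None, encoding='utf-8', errors='strict'):
        # Assumption: "decoded" means percent-decoded with the given codec.
        if self.fragment is None:
            return default
        return unquote(self.fragment, encoding=encoding, errors=errors)


result = DefragResultSketch('http://example.com/page', 'caf%C3%A9')
print(result.getfragment())
print(result.geturi())
print(DefragResultSketch('http://example.com/page', None).getfragment('top'))
```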
class uritools.SplitResult
    Base class to hold urisplit() results.

    getfragment(default=None, encoding='utf-8', errors='strict')
        Return the decoded fragment identifier, or default if the original URI did not contain a fragment component.

    gethost(default=None)
        Return the decoded host subcomponent of the URI authority as a string or an ipaddress address object, or default if the original URI did not contain a host.

    getpath(encoding='utf-8', errors='strict')
        Return the normalized decoded URI path.

    getport(default=None)
        Return the port subcomponent of the URI authority as an int, or default if the original URI did not contain a port or if the port was empty.

    getquery(default=None, encoding='utf-8', errors='strict')
        Return the decoded query string, or default if the original URI did not contain a query component.

    getquerydict(encoding='utf-8', errors='strict')
        Split the query component into individual name=value pairs and return a dictionary of query variables. The dictionary keys are the unique query variable names and the values are lists of values for each name.

    getquerylist(encoding='utf-8', errors='strict')
        Split the query component into individual name=value pairs and return a list of (name, value) tuples.

    getscheme(default=None)
        Return the URI scheme in canonical (lowercase) form, or default if the original URI did not contain a scheme component.

    geturi()
        Return the recombined version of the original URI as a string.

    getuserinfo(default=None, encoding='utf-8', errors='strict')
        Return the decoded userinfo subcomponent of the URI authority, or default if the original URI did not contain a userinfo field.

    transform(ref, strict=False)
        Transform a URI reference relative to self into a SplitResult representing its target URI.
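A sketch of what getquerylist(), getquerydict() and getport() compute, written as hypothetical free functions over the raw query and port strings. The parsing rules (split on "&" only, skip empty items, a name without "=" gets None, "+" left untouched) are assumptions for illustration and may not match uritools exactly.

```python
from collections import defaultdict
from urllib.parse import unquote


def querylist_sketch(query, encoding='utf-8', errors='strict'):
    """Hypothetical getquerylist()-like helper over a raw query string."""
    pairs = []
    for item in (query or '').split('&'):
        if not item:
            continue  # assumption: empty items such as in 'a=1&&b=2' are skipped
        name, sep, value = item.partition('=')
        pairs.append((
            unquote(name, encoding=encoding, errors=errors),
            # assumption: a name without '=' is paired with None
            unquote(value, encoding=encoding, errors=errors) if sep else None,
        ))
    return pairs


def querydict_sketch(query, encoding='utf-8', errors='strict'):
    """Group the pairs by name; each value is the list of values for that name."""
    grouped = defaultdict(list)
    for name, value in querylist_sketch(query, encoding, errors):
        grouped[name].append(value)
    return dict(grouped)


def port_sketch(port, default=None):
    """Return a raw port string as an int, or default if it is None or empty."""
    return int(port) if port else default


print(querylist_sketch('name=swallow&type=African'))
print(querydict_sketch('q=a&q=b&flag'))
print(port_sketch('8042'), port_sketch(''), port_sketch(None, 80))
```

Collecting values into lists is what lets a dictionary represent repeated names such as q=a&q=b without losing data.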