2 @setfilename ../../info/url.info
3 @settitle URL Programmer's Manual
9 @c @setchapternewpage odd
14 %\global\baselineskip 30pt % for printing in double space
16 @dircategory Emacs lisp libraries
18 * URL: (url). URL loading package.
22 This is the manual for the @code{url} Emacs Lisp library.
24 Copyright @copyright{} 1993--1999, 2002, 2004--2016 Free Software
28 Permission is granted to copy, distribute and/or modify this document
29 under the terms of the GNU Free Documentation License, Version 1.3 or
30 any later version published by the Free Software Foundation; with no
31 Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
32 and with the Back-Cover Texts as in (a) below. A copy of the license
33 is included in the section entitled ``GNU Free Documentation License''.
35 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
36 modify this GNU manual.''
42 @title URL Programmer's Manual
43 @subtitle First Edition, URL Version 2.0
44 @author William M. Perry @email{wmperry@@gnu.org}
45 @author David Love @email{fx@@gnu.org}
47 @vskip 0pt plus 1filll
61 * Introduction:: About the @code{url} library.
62 * URI Parsing:: Parsing (and unparsing) URIs.
63 * Retrieving URLs:: How to use this package to retrieve a URL.
64 * Supported URL Types:: Descriptions of URL types currently supported.
65 * General Facilities:: URLs can be cached, accessed via a gateway
66 and tracked in a history list.
67 * Customization:: Variables you can alter.
68 * GNU Free Documentation License:: The license for this documentation.
78 @cindex uniform resource identifier
79 @cindex uniform resource locator
81 A @dfn{Uniform Resource Identifier} (URI) is a specially-formatted
82 name, such as an Internet address, that identifies some name or
83 resource. The format of URIs is described in RFC 3986, which updates
84 and replaces the earlier RFCs 2732, 2396, 1808, and 1738. A
85 @dfn{Uniform Resource Locator} (URL) is an older but still-common
86 term, which basically refers to a URI corresponding to a resource that
87 can be accessed (usually over a network) in a specific way.
89 Here are some examples of URIs (taken from RFC 3986):
92 ftp://ftp.is.co.za/rfc/rfc1808.txt
93 http://www.ietf.org/rfc/rfc2396.txt
94 ldap://[2001:db8::7]/c=GB?objectClass?one
95 mailto:John.Doe@@example.com
96 news:comp.infosystems.www.servers.unix
98 telnet://192.0.2.16:80/
99 urn:oasis:names:specification:docbook:dtd:xml:4.1.2
102 This manual describes the @code{url} library, an Emacs Lisp library
103 for parsing URIs and retrieving the resources to which they refer.
104 (The library is so-named for historical reasons; nowadays, the ``URI''
105 terminology is regarded as the more general one, and ``URL'' is
106 technically obsolete despite its widespread vernacular usage.)
111 A URI consists of several @dfn{components}, each having a different
112 meaning. For example, the URI
115 http://www.gnu.org/software/emacs/
119 specifies the scheme component @samp{http}, the hostname component
120 @samp{www.gnu.org}, and the path component @samp{/software/emacs/}.
123 The format of URIs is specified by RFC 3986. The @code{url} library
124 provides the Lisp function @code{url-generic-parse-url}, a (mostly)
standards-compliant URI parser, as well as the function
@code{url-recreate-url}, which converts a parsed URI back into a URI string.
129 @defun url-generic-parse-url uri-string
130 This function returns a parsed version of the string @var{uri-string}.
133 @defun url-recreate-url uri-obj
134 @cindex unparsing URLs
135 Given a parsed URI, this function returns the corresponding URI string.
139 The return value of @code{url-generic-parse-url}, and the argument
140 expected by @code{url-recreate-url}, is a @dfn{parsed URI}: a CL
141 structure whose slots hold the various components of the URI@.
142 @xref{Top,the CL Manual,,cl,GNU Emacs Common Lisp Emulation}, for
143 details about CL structures. Most of the other functions in the
144 @code{url} library act on parsed URIs.
147 * Parsed URIs:: Format of parsed URI structures.
148 * URI Encoding:: Non-@acronym{ASCII} characters in URIs.
152 @section Parsed URI structures
154 Each parsed URI structure contains the following slots:
158 The URI scheme (a string, e.g., @code{http}). @xref{Supported URL
159 Types}, for a list of schemes that the @code{url} library knows how to
process. This slot can also be @code{nil}, if the URI is not fully specified.
164 The user name (a string), or @code{nil}.
167 The user password (a string), or @code{nil}. The use of this URI
168 component is strongly discouraged; nowadays, passwords are transmitted
169 by other means, not as part of a URI.
172 The host name (a string), or @code{nil}. If present, this is
173 typically a domain name or IP address.
176 The port number (an integer), or @code{nil}. Omitting this component
usually means to use the ``standard'' port associated with the URI scheme.
181 The combination of the ``path'' and ``query'' components of the URI (a
182 string), or @code{nil}. If the query component is present, it is the
183 substring following the first @samp{?} character, and the path
184 component is the substring before the @samp{?}. The meaning of these
185 components is scheme-dependent; they do not necessarily refer to a
189 The fragment component (a string), or @code{nil}. The fragment
190 component specifies a ``secondary resource'', such as a section of a
194 This is @code{t} if the URI is fully specified, i.e., the
195 hierarchical components of the URI (the hostname and/or username
196 and/or password) are preceded by @samp{//}.
206 @findex url-attributes
208 These slots have accessors named @code{url-@var{part}}, where
209 @var{part} is the slot name. For example, the accessor for the
210 @code{host} slot is the function @code{url-host}. The @code{url-port}
211 accessor returns the default port for the URI scheme if the parsed
URI's @code{port} slot is @code{nil}.
214 The slots can be set using @code{setf}. For example:
217 (setf (url-port url) 80)
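
As a small sketch of how parsing, the accessors, and @code{setf} fit
together, the following parses the URI used earlier in this chapter,
reads the port (which falls back to the default for @code{http}), then
stores an explicit port and rebuilds the string. The port number 8080
is an arbitrary value chosen for illustration.

@example
(require 'url)

(let ((url (url-generic-parse-url "http://www.gnu.org/software/emacs/")))
  (list (url-type url)
        (url-host url)
        ;; No port was given, so this is the default port for http.
        (url-port url)
        ;; Store an explicit, non-default port and unparse the result.
        (progn
          (setf (url-port url) 8080)
          (url-recreate-url url))))
     @result{} ("http" "www.gnu.org" 80
         "http://www.gnu.org:8080/software/emacs/")
@end example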
221 @section URI Encoding
223 @cindex percent encoding
224 The @code{url-generic-parse-url} parser does not obey RFC 3986 in
225 one respect: it allows non-@acronym{ASCII} characters in URI strings.
227 Strictly speaking, RFC 3986 compatible URIs may only consist of
228 @acronym{ASCII} characters; non-@acronym{ASCII} characters are
229 represented by converting them to UTF-8 byte sequences, and performing
230 @dfn{percent encoding} on the bytes. For example, the o-umlaut
character is converted to the UTF-8 byte sequence @samp{\xC3\xB6},
then percent encoded to @samp{%C3%B6}. (Certain ``reserved''
233 @acronym{ASCII} characters must also be percent encoded when they
234 appear in URI components.)
236 The function @code{url-encode-url} can be used to convert a URI
237 string containing arbitrary characters to one that is properly
238 percent-encoded in accordance with RFC 3986.
240 @defun url-encode-url url-string
This function returns a properly URI-encoded version of
242 @var{url-string}. It also performs @dfn{URI normalization},
243 e.g., converting the scheme component to lowercase if it was
244 previously uppercase.
247 To convert between a string containing arbitrary characters and a
248 percent-encoded all-@acronym{ASCII} string, use the functions
249 @code{url-hexify-string} and @code{url-unhex-string}:
251 @defun url-hexify-string string &optional allowed-chars
This function performs percent-encoding on @var{string}, and returns
the result.
255 If @var{string} is multibyte, it is first converted to a UTF-8 byte
256 string. Each byte corresponding to an allowed character is left
257 as-is, while all other bytes are converted to a three-character
258 sequence: @samp{%} followed by two upper-case hex digits.
260 @vindex url-unreserved-chars
261 @cindex unreserved characters
262 The allowed characters are specified by @var{allowed-chars}. If this
263 argument is @code{nil}, the allowed characters are those specified as
264 @dfn{unreserved characters} by RFC 3986 (see the variable
265 @code{url-unreserved-chars}). Otherwise, @var{allowed-chars} should
be a vector whose @var{n}-th element is non-@code{nil} if character
@var{n} is allowed.
270 @defun url-unhex-string string &optional allow-newlines
271 This function replaces percent-encoding sequences in @var{string} with
272 their character equivalents, and returns the resulting string.
274 If @var{allow-newlines} is non-@code{nil}, it allows the decoding of
275 carriage returns and line feeds, which are normally forbidden in URIs.
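
The following sketch illustrates both directions of the conversion.
The 256-element vector named @code{allowed} is built here only for
illustration; it permits @samp{/} in addition to the unreserved
characters. Note that @code{url-unhex-string} yields raw bytes, so the
example decodes them as UTF-8 to get characters back.

@example
(require 'url-util)

;; A space is not an unreserved character, so it is encoded.
(url-hexify-string "a b")
     @result{} "a%20b"

;; U+00F6 (o-umlaut) is first converted to the UTF-8 bytes C3 B6.
(url-hexify-string "\u00f6")
     @result{} "%C3%B6"

;; Allow "/" in addition to the unreserved characters.
(let ((allowed (make-vector 256 nil)))
  (dolist (char (cons ?/ url-unreserved-chars))
    (aset allowed char t))
  (url-hexify-string "/a b/c" allowed))
     @result{} "/a%20b/c"

;; Decode the percent-encoding, then decode the bytes as UTF-8.
(equal (decode-coding-string (url-unhex-string "%C3%B6") 'utf-8)
       "\u00f6")
     @result{} t
@end example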
278 @node Retrieving URLs
279 @chapter Retrieving URLs
281 The @code{url} library defines the following three functions for
282 retrieving the data specified by a URL@. The actual retrieval protocol
283 depends on the URL's URI scheme, and is performed by lower-level
284 scheme-specific functions. (Those lower-level functions are not
285 documented here, and generally should not be called directly.)
287 In each of these functions, the @var{url} argument can be either a
288 string or a parsed URL structure. If it is a string, that string is
289 passed through @code{url-encode-url} before using it, to ensure that
290 it is properly URI-encoded (@pxref{URI Encoding}).
292 @defun url-retrieve-synchronously url &optional silent no-cookies timeout
293 This function synchronously retrieves the data specified by @var{url},
294 and returns a buffer containing the data. The return value is
295 @code{nil} if there is no data associated with the URL (as is the case
296 for @code{dired}, @code{info}, and @code{mailto} URLs).
298 If the optional argument @var{silent} is non-@code{nil}, progress
299 messages are suppressed. If the optional argument @var{no-cookies} is
300 non-@code{nil}, cookies are not stored or sent. If the optional
301 argument @var{timeout} is non-@code{nil}, it should be a number that
302 says (in seconds) how long to wait for a response before giving up.
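
A minimal sketch of synchronous retrieval follows. The helper name
@code{my-url-fetch-body} is hypothetical, the 10-second timeout is an
arbitrary choice, and the code assumes an HTTP-style response in which
a blank line separates the headers from the body.

@example
(require 'url)

(defun my-url-fetch-body (url)
  "Return the body retrieved from URL as a string, or nil if none."
  ;; SILENT is t, NO-COOKIES is nil, and TIMEOUT is 10 seconds.
  (let ((buffer (url-retrieve-synchronously url t nil 10)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            ;; The headers end at the first blank line.
            (when (re-search-forward "\r?\n\r?\n" nil t)
              (buffer-substring-no-properties (point) (point-max))))
        ;; The caller owns the returned buffer, so clean it up.
        (kill-buffer buffer)))))
@end example

It could then be called as @code{(my-url-fetch-body "http://www.gnu.org/")}.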
305 @defun url-retrieve url callback &optional cbargs silent no-cookies
306 This function retrieves @var{url} asynchronously, calling the function
307 @var{callback} when the object has been completely retrieved. The
308 return value is the buffer into which the data will be inserted, or
309 @code{nil} if the process has already completed.
311 The callback function is called this way:
314 (apply @var{callback} @var{status} @var{cbargs})
318 where @var{status} is a plist representing what happened during the
319 retrieval, with most recent events first, or an empty list if no
320 events have occurred. Each pair in the plist is one of:
323 @item (:redirect @var{redirected-to})
This means that the request was redirected to the URL
@var{redirected-to}.
327 @item (:error (@var{error-symbol} . @var{data}))
328 This means that an error occurred. If so desired, the error can be
329 signaled with @code{(signal @var{error-symbol} @var{data})}.
332 When the callback function is called, the current buffer is the one
333 containing the retrieved data (if any). The buffer also contains any
334 MIME headers associated with the data retrieval.
336 If the optional argument @var{silent} is non-@code{nil}, progress
337 messages are suppressed. If the optional argument @var{no-cookies} is
338 non-@code{nil}, cookies are not stored or sent.
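
As an illustration, a callback might inspect @var{status} along the
following lines. The function name @code{my-url-report} and its
@var{label} argument (passed through @var{cbargs}) are hypothetical.

@example
(require 'url)

(defun my-url-report (status label)
  "Report the outcome in STATUS of retrieving the resource LABEL."
  (let ((err (plist-get status :error))
        (redirect (plist-get status :redirect)))
    (cond (err
           (message "%s: failed with %S" label err))
          (redirect
           (message "%s: redirected to %s" label redirect))
          (t
           ;; The current buffer holds the headers and the data.
           (message "%s: %d characters retrieved"
                    label (buffer-size))))))
@end example

A retrieval using this callback, with progress messages suppressed,
could be started like this:

@example
(url-retrieve "http://www.gnu.org/" #'my-url-report
              '("GNU home page") t)
@end example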
341 @defun url-queue-retrieve url callback &optional cbargs silent no-cookies
342 This function acts like @code{url-retrieve}, but with limits on the
343 number of concurrently-running network processes. The option
344 @code{url-queue-parallel-processes} controls the number of concurrent
processes, and the option @code{url-queue-timeout} sets a timeout in
seconds.
348 To use this function, you must @code{(require 'url-queue)}.
351 @vindex url-queue-parallel-processes
352 @defopt url-queue-parallel-processes
353 The value of this option is an integer specifying the maximum number
354 of concurrent @code{url-queue-retrieve} network processes. If the
355 number of @code{url-queue-retrieve} calls is larger than this number,
356 later ones are queued until earlier ones are finished.
359 @vindex url-queue-timeout
360 @defopt url-queue-timeout
361 The value of this option is a number specifying the maximum lifetime
362 of a @code{url-queue-retrieve} network process, once it is started.
If a process is not finished by then, it is killed and removed from
the queue.
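
The following sketch queues several retrievals at once. The function
name @code{my-url-queue-fetch} is hypothetical; each URL is handed to
the callback through @var{cbargs}.

@example
(require 'url-queue)

(defun my-url-queue-fetch (urls)
  "Queue each URL in the list URLS and report its size on arrival."
  (dolist (url urls)
    (url-queue-retrieve
     url
     (lambda (status url)
       (if (plist-get status :error)
           (message "%s: failed" url)
         (message "%s: %d characters" url (buffer-size)))
       (kill-buffer (current-buffer)))
     ;; CBARGS carries the URL to the callback; SILENT is t.
     (list url)
     t)))
@end example

With the two options set to small illustrative values, it might be
used as follows:

@example
(setq url-queue-parallel-processes 2
      url-queue-timeout 10)
(my-url-queue-fetch '("http://www.gnu.org/"
                      "http://www.gnu.org/software/emacs/"))
@end example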
367 @node Supported URL Types
368 @chapter Supported URL Types
370 This chapter describes functions and variables affecting URL retrieval
371 for specific schemes.
374 * http/https:: Hypertext Transfer Protocol.
375 * file/ftp:: Local files and FTP archives.
376 * info:: Emacs "Info" pages.
377 * mailto:: Sending email.
378 * news/nntp/snews:: Usenet news.
379 * rlogin/telnet/tn3270:: Remote host connectivity.
380 * irc:: Internet Relay Chat.
381 * data:: Embedded data URLs.
382 * nfs:: Networked File System.
383 * ldap:: Lightweight Directory Access Protocol.
384 * man:: Unix man pages.
385 * Tramp:: Schemes supported via Tramp.
389 @section @code{http} and @code{https}
391 The @code{http} scheme refers to the Hypertext Transfer Protocol. The
392 @code{url} library supports HTTP version 1.1, specified in RFC 2616.
393 Its default port is 80.
395 The @code{https} scheme is a secure version of @code{http}, with
transmission via SSL@. It is defined in RFC 2818, and its default port
397 is 443. When using @code{https}, the @code{url} library performs SSL
398 encryption via the @code{ssl} library, by forcing the @code{ssl}
399 gateway method to be used. @xref{Gateways in general}.
401 @defopt url-honor-refresh-requests
402 If this option is non-@code{nil} (the default), the @code{url} library
403 honors the HTTP @samp{Refresh} header, which is used by servers to
direct clients to reload documents from the same URL or a different
405 one. If the value is @code{nil}, the @samp{Refresh} header is
406 ignored; any other value means to ask the user on each request.
411 * HTTP language/coding::
413 * Dealing with HTTP documents::
419 @findex url-cookie-delete
420 @defun url-cookie-list
421 This command creates a @file{*url cookies*} buffer listing the current
422 cookies, if there are any. You can remove a cookie using the
423 @kbd{C-k} (@code{url-cookie-delete}) command.
426 @defun url-cookie-delete-cookies &optional regexp
This function takes a regular expression as its parameter and deletes
all cookies from domains that match it. If @var{regexp} is @code{nil},
it deletes all cookies.
432 @defopt url-cookie-file
433 The file in which cookies are stored, defaulting to @file{cookies} in
434 the directory specified by @code{url-configuration-directory}.
437 @defopt url-cookie-confirmation
438 Specifies whether confirmation is required to accept cookies.
441 @defopt url-cookie-multiple-line
442 Specifies whether to put all cookies for the server on one line in the
443 HTTP request to satisfy broken servers like
444 @url{http://www.hotmail.com}.
447 @defopt url-cookie-trusted-urls
448 A list of regular expressions matching URLs from which to accept
452 @defopt url-cookie-untrusted-urls
453 A list of regular expressions matching URLs from which to reject
457 @defopt url-cookie-save-interval
458 The number of seconds between automatic saves of cookies to disk.
463 @node HTTP language/coding
464 @subsection Language and Encoding Preferences
466 HTTP allows clients to express preferences for the language and
467 encoding of documents which servers may honor. For each of these
468 variables, the value is a string; it can specify a single choice, or
469 it can be a comma-separated list.
471 Normally, this list is ordered by descending preference. However, each
472 element can be followed by @samp{;q=@var{priority}} to specify its
473 preference level, a decimal number from 0 to 1; e.g., for
474 @code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
en;q=0.7"}}. An element that has no @samp{;q} specification has
preference level 1.
478 @defopt url-mime-charset-string
479 @cindex character sets
480 @cindex coding systems
481 This variable specifies a preference for character sets when documents
482 can be served in more than one encoding.
484 HTTP allows specifying a series of MIME charsets which indicate your
485 preferred character set encodings, e.g., Latin-9 or Big5, and these
486 can be weighted. The default series is generated automatically from
487 the associated MIME types of all defined coding systems, sorted by the
488 coding system priority specified in Emacs. @xref{Recognize Coding, ,
489 Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
492 @defopt url-mime-language-string
493 @cindex language preferences
494 A string specifying the preferred language when servers can serve
495 files in several languages. Use RFC 1766 abbreviations, e.g.,
496 @samp{en} for English, @samp{de} for German.
498 The string can be @code{"*"} to get the first available language (as
499 opposed to the default).
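
For example, a program could bind the language preference around a
single request, as in this sketch. The preference string is the one
shown earlier in this section, and whether it is honored is up to the
server.

@example
(require 'url)

;; Prefer German, then British English, then any other English.
(let ((url-mime-language-string "de, en-gb;q=0.8, en;q=0.7"))
  (url-retrieve-synchronously "http://www.gnu.org/" t))
@end example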
502 @node HTTP URL Options
503 @subsection HTTP URL Options
505 HTTP supports an @samp{OPTIONS} method describing things supported by
508 @defun url-http-options url
509 Returns a property list describing options available for URL@. The
510 property list members are:
514 A list of symbols specifying what HTTP methods the resource
519 A list of numbers specifying what DAV protocol/schema versions are
A list of the supported DASL search types (string form).
527 A list of the units available for use in partial document fetches.
The @dfn{Platform for Privacy Preferences} description for the resource.
532 Currently this is just the raw header contents.
537 @node Dealing with HTTP documents
538 @subsection Dealing with HTTP documents
540 HTTP URLs are retrieved into a buffer containing the HTTP headers
541 followed by the body. Since the headers are quasi-MIME, they may be
542 processed using the MIME library. @xref{Top,, Emacs MIME,
543 emacs-mime, The Emacs MIME Manual}.
546 @section file and ftp
549 @cindex File Transfer Protocol
550 @cindex compressed files
The @code{ftp} and @code{file} schemes are defined in RFC 1738. The
554 @code{url} library treats @samp{ftp:} and @samp{file:} as synonymous.
555 Such URLs have the form
558 ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
559 file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
563 If the URL specifies a local file, it is retrieved by reading the file
564 contents in the usual way. If it specifies a remote file, it is
565 retrieved using either the Tramp or the Ange-FTP package.
566 @xref{Remote Files,,, emacs, The GNU Emacs Manual}.
568 When retrieving a compressed file, it is automatically uncompressed
569 if it has the file suffix @file{.z}, @file{.gz}, @file{.Z},
570 @file{.bz2}, or @file{.xz}. (The list of supported suffixes is
571 hard-coded, and cannot be altered by customizing
572 @code{jka-compr-compression-info-list}.)
574 @defopt url-directory-index-file
575 This option specifies the filename to look for when a @code{file} or
576 @code{ftp} URL specifies a directory. The default is
577 @file{index.html}. If this file exists and is readable, it is viewed.
578 Otherwise, Emacs visits the directory using Dired.
585 @findex Info-goto-node
587 The @code{info} scheme is non-standard. Such URLs have the form
590 info:@var{file}#@var{node}
594 and are retrieved by invoking @code{Info-goto-node} with argument
595 @samp{(@var{file})@var{node}}. If @samp{#@var{node}} is omitted, the
596 @samp{Top} node is opened.
603 A @code{mailto} URL specifies an email message to be sent to a given
604 email address. For example, @samp{mailto:foo@@bar.com} specifies
605 sending a message to @samp{foo@@bar.com}. The ``retrieval method''
606 for such URLs is to open a mail composition buffer in which the
607 appropriate content (e.g., the recipient address) has been filled in.
609 As defined in RFC 6068, a @code{mailto} URL can have the form
612 @samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
616 where an arbitrary number of @var{header}s can be added. If the
617 @var{header} is @samp{body}, then @var{contents} is put in the message
618 body; otherwise, a @var{header} header field is created with
619 @var{contents} as its contents. Note that the @code{url} library does
620 not perform any checking of @var{header} or @var{contents}, so you
621 should check them before sending the message.
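
As a sketch of how such a URL might be assembled, the hypothetical
helper below percent-encodes the header contents with
@code{url-hexify-string} (@pxref{URI Encoding}) before joining the
parts. The choice of the @samp{subject} and @samp{body} headers is
only an example.

@example
(require 'url-util)

(defun my-mailto-url (mailbox subject body)
  "Return a mailto URL for MAILBOX with SUBJECT and BODY fields."
  ;; Percent-encode the contents so that characters such as spaces,
  ;; `?' and `&' cannot be mistaken for URL syntax.
  (concat "mailto:" mailbox
          "?subject=" (url-hexify-string subject)
          "&body=" (url-hexify-string body)))

(my-mailto-url "foo@@bar.com" "Hello there" "First line")
     @result{} "mailto:foo@@bar.com?subject=Hello%20there&body=First%20line"
@end example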
623 @defopt url-mail-command
624 @vindex mail-user-agent
The value of this variable is the function called whenever the
@code{url} library needs to send mail. This should normally be left
at its default, which is the
627 standard mail-composition command @code{compose-mail}. @xref{Sending
628 Mail,,, emacs, The GNU Emacs Manual}.
If the document containing the @code{mailto} URL itself has a
632 known URL, Emacs automatically inserts an @samp{X-Url-From} header
633 field into the mail buffer, specifying that URL.
635 @node news/nntp/snews
636 @section @code{news}, @code{nntp} and @code{snews}
643 The @code{news}, @code{nntp}, and @code{snews} schemes, defined in RFC
644 1738, are used for reading Usenet newsgroups. For compatibility with
645 non-standard-compliant news clients, the @code{url} library allows
646 host and port fields to be included in @code{news} URLs, even though
647 they are properly only allowed for @code{nntp} and @code{snews}.
649 @code{news} and @code{nntp} URLs have the following form:
652 @item news:@var{newsgroup}
653 Retrieves a list of messages in @var{newsgroup};
654 @item news:@var{message-id}
655 Retrieves the message with the given @var{message-id};
657 Retrieves a list of all available newsgroups;
658 @item nntp://@var{host}:@var{port}/@var{newsgroup}
659 @itemx nntp://@var{host}:@var{port}/@var{message-id}
660 @itemx nntp://@var{host}:@var{port}/*
661 Similar to the @samp{news} versions.
664 The default port for @code{nntp} (and @code{news}) is 119. The
665 difference between an @code{nntp} URL and a @code{news} URL is that an
@code{nntp} URL may specify an article by its number. The
667 @samp{snews} scheme is the same as @samp{nntp}, except that it is
668 tunneled through SSL and has default port 563.
670 These URLs are retrieved via the Gnus package.
672 @cindex environment variable
674 @defopt url-news-server
675 This variable specifies the default news server from which to fetch
676 news, if no server was specified in the URL@. The default value,
677 @code{nil}, means to use the server specified by the standard
678 environment variable @samp{NNTPSERVER}, or @samp{news} if that
679 environment variable is unset.
682 @node rlogin/telnet/tn3270
683 @section rlogin, telnet and tn3270
687 @cindex terminal emulation
688 @findex terminal-emulator
690 These URL schemes are defined in RFC 1738, and are used for logging in
691 via a terminal emulator. They have the form
694 telnet://@var{user}:@var{password}@@@var{host}:@var{port}
698 but the @var{password} component is ignored. By default, the
699 @code{telnet} scheme is handled via Tramp (@pxref{Tramp}).
701 To handle rlogin, telnet and tn3270 URLs, a @code{rlogin},
702 @code{telnet} or @code{tn3270} (the program names and arguments are
703 hardcoded) session is run in a @code{terminal-emulator} buffer.
704 Well-known ports are used if the URL does not specify a port.
709 @cindex Internet Relay Chat
714 The @code{irc} scheme is defined in the Internet Draft at
715 @url{http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt} (which
716 was never approved as an RFC). Such URLs have the form
719 irc://@var{host}:@var{port}/@var{target},@var{needpass}
723 and are retrieved by opening an @acronym{IRC} session using the
724 function specified by @code{url-irc-function}.
726 @defopt url-irc-function
727 The value of this option is a function, which is called to open an IRC
728 connection for @code{irc} URLs. This function must take five
729 arguments, @var{host}, @var{port}, @var{channel}, @var{user} and
730 @var{password}. The @var{channel} argument specifies the channel to
731 join immediately, and may be @code{nil}.
733 The default is @code{url-irc-rcirc}, which uses the Rcirc package.
734 Other options are @code{url-irc-erc} (which uses ERC) and
735 @code{url-irc-zenirc} (which uses ZenIRC).
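
A custom handler only needs to accept the five arguments described
above. The following sketch uses a hypothetical function,
@code{my-url-irc-describe}, which merely reports the request instead
of opening a connection.

@example
(require 'url-irc)

(defun my-url-irc-describe (host port channel user password)
  "Describe the IRC connection requested by an irc URL."
  (message "IRC: %s:%s, channel %s, user %s, password %s"
           host port (or channel "none") (or user "none")
           (if password "given" "not given")))

(setq url-irc-function #'my-url-irc-describe)
@end example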
742 The @code{data} scheme, defined in RFC 2397, contains MIME data in
743 the URL itself. Such URLs have the form
746 data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
750 @var{media-type} is a MIME @samp{Content-Type} string, possibly
751 including parameters. It defaults to
752 @samp{text/plain;charset=US-ASCII}. The @samp{text/plain} can be
753 omitted but the charset parameter supplied. If @samp{;base64} is
754 present, the @var{data} are base64-encoded.
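
As an illustration of the two forms, the following sketch builds one
@code{data} URL that relies on the default media type and one that
carries base64-encoded data. The sample strings are arbitrary.

@example
;; Simple text can follow the comma as-is, using the default
;; media type.
(concat "data:," "Hello")
     @result{} "data:,Hello"

;; With ;base64, the data part is the base64 encoding of the bytes.
(concat "data:text/plain;base64,"
        (base64-encode-string "Hello, world"))
     @result{} "data:text/plain;base64,SGVsbG8sIHdvcmxk"
@end example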
759 @cindex Network File System
762 The @code{nfs} scheme, defined in RFC 2224, is similar to @code{ftp}
763 except that it points to a file on a remote host that is handled by an
764 NFS automounter on the local host. Such URLs have the form
767 nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
770 @defvar url-nfs-automounter-directory-spec
772 A string saying how to invoke the NFS automounter. Certain @samp{%}
773 sequences are recognized:
777 The hostname of the NFS server;
779 The port number of the NFS server;
781 The username to use to authenticate;
783 The password to use to authenticate;
785 The filename on the remote server;
790 Each can be used any number of times.
795 @cindex Lightweight Directory Access Protocol
797 The LDAP scheme is defined in RFC 2255.
801 @cindex @command{man}
802 @cindex Unix man pages
805 The @code{man} scheme is a non-standard one. Such URLs have the form
808 @samp{man:@var{page-spec}}
812 and are retrieved by passing @var{page-spec} to the Lisp function
816 @section URL Types Supported via Tramp
818 @vindex url-tramp-protocols
819 Some additional URL types are supported by passing them to Tramp
820 (@pxref{Top, The Tramp Manual,, tramp, The Tramp Manual}). These
821 protocols are listed in the @code{url-tramp-protocols} variable, which
822 you can customize. The default value includes the following
827 The file transfer protocol. @xref{file/ftp}.
The secure shell protocol. @xref{Inline methods,,, tramp, The Tramp Manual}.
The secure file copy protocol. @xref{External methods,,, tramp, The Tramp Manual}.
841 The remote sync protocol.
847 @node General Facilities
848 @chapter General Facilities
853 * Gateways in general::
858 @section Disk Caching
860 @cindex Persistent Cache
863 The disk cache stores retrieved documents locally, whence they can be
864 retrieved more quickly. When requesting a URL that is in the cache,
865 the library checks to see if the page has changed since it was last
866 retrieved from the remote machine. If not, the local copy is used,
867 saving the transmission over the network.
868 @cindex Cleaning the cache
869 @cindex Clearing the cache
870 @cindex Cache cleaning
871 Currently the cache isn't cleared automatically.
872 @c Running the @code{clean-cache} shell script
@c first is recommended, to allow for future cleaning of the cache. This
874 @c shell script will remove all files that have not been accessed since it
875 @c was last run. To keep the cache pared down, it is recommended that this
876 @c script be run from @i{at} or @i{cron} (see the manual pages for
877 @c crontab(5) or at(1) for more information)
879 @defopt url-automatic-caching
880 Setting this variable non-@code{nil} causes documents to be cached
884 @defopt url-cache-directory
This variable specifies the directory in which to store the cache
files. It defaults to the @file{cache} subdirectory of
@code{url-configuration-directory}.
890 @defopt url-cache-creation-function
891 The cache relies on a scheme for mapping URLs to files in the cache.
892 This variable names a function which sets the type of cache to use.
893 It takes a URL as argument and returns the absolute file name of the
894 corresponding cache file. The two supplied possibilities are
895 @code{url-cache-create-filename-using-md5} and
896 @code{url-cache-create-filename-human-readable}.
899 @defun url-cache-create-filename-using-md5 url
900 Creates a cache file name from @var{url} using MD5 hashing.
This creates entries with very few cache collisions and is fast.
904 (url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
905 @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
909 @defun url-cache-create-filename-human-readable url
910 Creates a cache file name from @var{url} more obviously connected to
911 @var{url} than for @code{url-cache-create-filename-using-md5}, but
912 more likely to conflict with other files.
914 (url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
915 @result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
919 @defun url-cache-expired
920 This function returns non-@code{nil} if a cache entry has expired (or is absent).
921 The arguments are a URL and optional expiration delay in seconds
(default @code{url-cache-expire-time}).
925 @defopt url-cache-expire-time
926 This variable is the default number of seconds to use for the
927 expire-time argument of the function @code{url-cache-expired}.
930 @defun url-fetch-from-cache
931 This function takes a URL as its argument and returns a buffer
932 containing the data cached for that URL.
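
These two functions can be combined to consult the cache before going
to the network. The sketch below uses a hypothetical helper name,
@code{my-url-cached-buffer}, and simply returns @code{nil} when the
entry is absent or too old.

@example
(require 'url)
(require 'url-cache)

(defun my-url-cached-buffer (url &optional max-age)
  "Return a buffer with the cached data for URL, or nil.
Return nil if the entry is absent or older than MAX-AGE seconds;
MAX-AGE defaults to `url-cache-expire-time'."
  (unless (url-cache-expired url max-age)
    (url-fetch-from-cache url)))
@end example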
935 @c Fixme: never actually used currently?
936 @c @defopt url-standalone-mode
937 @c @cindex Relying on cache
938 @c @cindex Cache only mode
939 @c @cindex Standalone mode
940 @c If this variable is non-@code{nil}, the library relies solely on the
941 @c cache for fetching documents and avoids checking if they have changed
942 @c on remote servers.
945 @c With a large cache of documents on the local disk, it can be very handy
946 @c when traveling, or any other time the network connection is not active
947 @c (a laptop with a dial-on-demand PPP connection, etc.). Emacs/W3 can rely
948 @c solely on its cache, and avoid checking to see if the page has changed
949 @c on the remote server. In the case of a dial-on-demand PPP connection,
950 @c this will keep the phone line free as long as possible, only bringing up
951 @c the PPP connection when asking for a page that is not located in the
952 @c cache. This is very useful for demonstrations as well.
955 @section Proxies and Gatewaying
957 @c fixme: check/document url-ns stuff
958 @cindex proxy servers
960 @cindex environment variables
962 Proxy servers are commonly used to provide gateways through firewalls
963 or as caches serving some more-or-less local network. Each protocol
964 (HTTP, FTP, etc.)@: can have a different gateway server. Proxying is
conventionally configured, in the same way for many different programs, through
966 environment variables of the form @code{@var{protocol}_proxy}, where
967 @var{protocol} is one of the supported network protocols (@code{http},
968 @code{ftp} etc.). The library recognizes such variables in either
969 upper or lower case. Their values are of one of the forms:
971 @item @code{@var{host}:@var{port}}
973 @item Simply a host name.
977 The @code{NO_PROXY} environment variable specifies URLs that should be
978 excluded from proxying (on servers that should be contacted directly).
979 This should be a comma-separated list of hostnames, domain names, or a
980 mixture of both. Asterisks can be used as wildcards, but other
981 clients may not support that. Domain names may be indicated by a
982 leading dot. For example:
984 NO_PROXY="*.aventail.com,home.com,.seanet.com"
986 @noindent says to contact all machines in the @samp{aventail.com} and
987 @samp{seanet.com} domains directly, as well as the machine named
988 @samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
989 and @code{no_proxy} are also tried, in that order.
991 Proxies may also be specified directly in Lisp.
993 @defopt url-proxy-services
994 This variable is an alist of URL schemes and proxy servers that
995 gateway them. The items are of the form @w{@code{(@var{scheme}
. @var{host}:@var{portnumber})}}, which says that the URL @var{scheme} is
997 gatewayed through @var{portnumber} on the specified @var{host}. An
998 exception is the pseudo scheme @code{"no_proxy"}, which is paired with
999 a regexp matching host names not to be proxied. This variable is
1000 initialized from the environment as above.
1003 (setq url-proxy-services
1004 '(("http" . "proxy.aventail.com:80")
1005 ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
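As a rough check of which hosts a @code{"no_proxy"} regexp excludes, it
can be tried against candidate host names with @code{string-match-p}.
The host names below are made up for illustration.

@example
;; Same regexp as the "no_proxy" entry above.
(let ((no-proxy "^.*\\(aventail\\|seanet\\)\\.com"))
  (list
   ;; Non-nil: this host would be contacted directly.
   (string-match-p no-proxy "www.seanet.com")
   ;; nil: this host would go through the proxy.
   (string-match-p no-proxy "www.gnu.org")))
@end example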
1009 @node Gateways in general
1010 @section Gateways in General
1014 The library provides a general gateway layer through which all
1015 networking passes. It can both control access to the network and
1016 provide access through gateways in firewalls. This may make direct
1017 connections in some cases and pass through some sort of gateway in
1018 others.@footnote{Proxies (which only operate over HTTP) are
1019 implemented using this.} The library's basic function responsible for
1020 making connections is @code{url-open-stream}.
1022 @defun url-open-stream name buffer host service
1023 @cindex opening a stream
1024 @cindex stream, opening
1025 Open a stream to @var{host}, possibly via a gateway. The other
1026 arguments are as for @code{open-network-stream}. This will not make a
1027 connection if @code{url-gateway-unplugged} is non-@code{nil}.
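A minimal sketch of a call, assuming that the host @samp{www.gnu.org}
is reachable on port 80 and that the function is provided by the
@code{url-gw} feature, as in current Emacs sources.  The process and
buffer names are arbitrary.

@example
(require 'url-gw)

;; Open a connection named "example" whose output goes to the
;; buffer "*example-stream*", then close it again.
(let ((proc (url-open-stream "example"
                             (get-buffer-create "*example-stream*")
                             "www.gnu.org" 80)))
  (when (processp proc)
    (delete-process proc)))
@end example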
1030 @defvar url-gateway-local-host-regexp
1031 This is a regular expression that matches local hosts that do not
1032 require the use of a gateway. If @code{nil}, all connections are made
1033 through the gateway.
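For instance, the following setting would treat every host under the
made-up domain @samp{example.com} as local:

@example
;; Hosts whose names end in ".example.com" bypass the gateway.
(setq url-gateway-local-host-regexp "\\.example\\.com\\'")
@end example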
1036 @defvar url-gateway-method
1037 This variable controls which gateway method is used. It may be useful
1038 to bind it temporarily in some applications. It has values taken from
1039 a list of symbols. Possible values are:
1043 @cindex @command{telnet}
1044 Use this method if you must first telnet and log into a gateway host,
1045 and then run telnet from that host to connect to outside machines.
1048 @cindex @command{rlogin}
1049 This method is identical to @code{telnet}, but uses @command{rlogin}
1050 to log into the remote machine without having to send the username and
1051 password over the wire every time.
1055 Use if the firewall has a @sc{socks} gateway running on it. The
1056 @sc{socks} v5 protocol is defined in RFC 1928.
1059 @c This probably shouldn't be documented
1060 @c Fixme: why not? -- fx
1063 This method uses Emacs's built-in networking directly. This is the
1064 default. It can be used only if there is no firewall blocking access.
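Binding @code{url-gateway-method} temporarily, as suggested above, might
look like the following sketch.  It assumes a direct connection to
@samp{www.gnu.org} is possible; the URL is only an example.

@example
(require 'url)

;; Force a direct connection for this one retrieval only; the
;; global value of `url-gateway-method' is left untouched.
(let ((url-gateway-method 'native))
  (url-retrieve-synchronously "https://www.gnu.org/"))
@end example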
1068 The following variables control the gateway methods.
1070 @defopt url-gateway-telnet-host
1071 The gateway host to telnet to. Once logged in there, you then telnet
1072 out to the hosts you want to connect to.
1074 @defopt url-gateway-telnet-parameters
1075 This should be a list of parameters to pass to the @command{telnet} program.
1077 @defopt url-gateway-telnet-password-prompt
1078 This is a regular expression that matches the password prompt when
1081 @defopt url-gateway-telnet-login-prompt
1082 This is a regular expression that matches the username prompt when
1085 @defopt url-gateway-telnet-user-name
1086 The username to log in with.
1088 @defopt url-gateway-telnet-password
1089 The password to send when logging in.
1091 @defopt url-gateway-prompt-pattern
1092 This is a regular expression that matches the shell prompt.
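A telnet gateway could be configured along the following lines.  The
host and user name are placeholders, and the password is deliberately
not stored in the configuration here.

@example
;; Hypothetical gateway host and account.
(setq url-gateway-method 'telnet
      url-gateway-telnet-host "gateway.example.com"
      url-gateway-telnet-user-name "jrhacker")
@end example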
1095 @defopt url-gateway-rlogin-host
1096 Host to @samp{rlogin} to before telnetting out.
1098 @defopt url-gateway-rlogin-parameters
1099 Parameters to pass to @samp{rsh}.
1101 @defopt url-gateway-rlogin-user-name
1102 User name to use when logging in to the gateway.
1104 @defopt url-gateway-prompt-pattern
1105 This is a regular expression that matches the shell prompt.
1108 @defopt socks-server
1109 This specifies the default server; it takes the form
1110 @w{@code{("Default server" @var{server} @var{port} @var{version})}}
1111 where @var{version} can be either 4 or 5.
1113 @defvar socks-password
1114 If this is @code{nil} then you will be asked for the password,
1115 otherwise it will be used as the password for authenticating you to
1116 the @sc{socks} server.
1118 @defvar socks-username
1119 This is the username to use when authenticating yourself to the
1120 @sc{socks} server. By default this is your login name.
1122 @defvar socks-timeout
1123 This controls how long, in seconds, to wait for responses from the
1124 @sc{socks} server; it is 5 by default.
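Putting the preceding variables together, a @sc{socks} setup might look
like this sketch.  The server name and port are assumptions chosen for
the example.

@example
;; Route connections through a SOCKS v5 server (placeholder host).
(setq url-gateway-method 'socks
      socks-server '("Default server" "socks.example.com" 1080 5))
@end example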
1126 @c fixme: these have been effectively commented-out in the code
1127 @c @defopt socks-server-aliases
1128 @c This is a list of server aliases. It is a list of aliases of the form
1129 @c @var{(alias hostname port version)}.
1131 @c @defopt socks-network-aliases
1132 @c This is a list of network aliases. Each entry in the list takes the form
1133 @c @var{(alias (network))} where @var{alias} is a string that names the
1134 @c @var{network}. The networks can contain a pair (not a dotted pair) of
1135 @c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
1136 @c address and a netmask, a domain name or a unique hostname or @sc{ip}
1139 @c @defopt socks-redirection-rules
1140 @c This is a list of redirection rules. Each rule takes the form
1141 @c @var{(Destination network Connection type)} where @var{Destination
1142 @c network} is a network alias from @code{socks-network-aliases} and
1143 @c @var{Connection type} can be @code{nil} in which case a direct
1144 @c connection is used, or it can be an alias from
1145 @c @code{socks-server-aliases} in which case that server is used as a
1148 @defopt socks-nslookup-program
1149 @cindex @command{nslookup}
1150 This is the @samp{nslookup} program. It is @code{"nslookup"} by default.
1154 * Suppressing network connections::
1156 @c * Broken hostname resolution::
1158 @node Suppressing network connections
1159 @subsection Suppressing Network Connections
1161 @cindex network connections, suppressing
1162 @cindex suppressing network connections
1164 @cindex HTML ``bugs''
1165 In some circumstances it is desirable to suppress making network
1166 connections. A typical case is when rendering HTML in a mail user
1167 agent, when external URLs should not be activated, particularly to
1168 avoid ``bugs'' which ``call home'' by fetching single-pixel images and the
1169 like. To arrange this, bind the following variable for the duration
1172 @defvar url-gateway-unplugged
1173 If this variable is non-@code{nil}, new network connections are never
1174 opened by the URL library.
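One way to bind the variable around a piece of rendering code is
sketched below.  The function @code{my-render-offline} and its argument
are hypothetical names, not part of the library.

@example
(require 'url)

(defun my-render-offline (render-function)
  "Call RENDER-FUNCTION with URL network connections suppressed."
  ;; The binding lasts only for the duration of the call.
  (let ((url-gateway-unplugged t))
    (funcall render-function)))
@end example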
1177 @c @node Broken hostname resolution
1178 @c @subsection Broken Hostname Resolution
1180 @c @cindex hostname resolver
1181 @c @cindex resolver, hostname
1182 @c Some C libraries do not include the hostname resolver routines in
1183 @c their static libraries. If Emacs was linked statically, and was not
1184 @c linked with the resolver libraries, it will not be able to get to any
1185 @c machines off the local network. This is characterized by being able
1186 @c to reach someplace with a raw ip number, but not its hostname
1187 @c (@url{http://129.79.254.191/} works, but
1188 @c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
1189 @c SunOS4 and Ultrix, but is probably rare now. If Emacs can't be
1190 @c rebuilt linked against the resolver library, it can use the external
1191 @c @command{nslookup} program instead.
1193 @c @defopt url-gateway-broken-resolution
1194 @c @cindex @code{nslookup} program
1195 @c @cindex program, @code{nslookup}
1196 @c If non-@code{nil}, this variable says to use the program specified by
1197 @c @code{url-gateway-nslookup-program} to do hostname resolution.
1200 @c @defopt url-gateway-nslookup-program
1201 @c The name of the program to do hostname lookup if Emacs can't do it
1202 @c directly. This program should expect a single argument on the command
1203 @c line---the hostname to resolve---and should produce output similar to
1204 @c the standard Unix @command{nslookup} program:
1206 @c Name: www.cs.indiana.edu
1207 @c Address: 129.79.254.191
1214 @findex url-do-setup
1215 The library can maintain a global history list tracking URLs accessed.
1216 URL completion can be done from it. The history mechanism is set up
1217 automatically via @code{url-do-setup} when it is configured to be on.
1218 Note that the size of the history list is currently not limited.
1220 @vindex url-history-hash-table
1221 The history ``list'' is actually a hash table,
1222 @code{url-history-hash-table}. It contains access times keyed by URL
1223 strings. The times are in the format returned by @code{current-time}.
1225 @defun url-history-update-url url time
1226 This function updates the history table with an entry for @var{url}
1227 accessed at the given @var{time}.
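As a small illustration, the following records an access and then reads
the stored time back from the hash table.  The URL is arbitrary, and
the sketch assumes these definitions are provided by the
@code{url-history} feature.

@example
(require 'url-history)

;; Record that the URL was accessed just now.
(url-history-update-url "https://www.gnu.org/" (current-time))

;; Look up the access time stored for that URL.
(gethash "https://www.gnu.org/" url-history-hash-table)
@end example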
1230 @defopt url-history-track
1231 If non-@code{nil}, the library will keep track of all the URLs
1232 accessed. If it is @code{t}, the list is saved to disk at the end of
1233 each Emacs session. The default is @code{nil}.
1236 @defopt url-history-file
1237 The file storing the history list between sessions. It defaults to
1238 @file{history} in @code{url-configuration-directory}.
1241 @defopt url-history-save-interval
1242 @findex url-history-setup-save-timer
1243 The number of seconds between automatic saves of the history list.
1244 Default is one hour. Note that if you change this variable directly,
1245 rather than using Custom, after @code{url-do-setup} has been run, you
1246 need to run the function @code{url-history-setup-save-timer}.
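For example, changing the interval from Lisp rather than through Custom
could be done as follows; the value of 30 minutes is only an
illustration.

@example
(require 'url-history)

;; Save the history every 30 minutes instead of every hour.
(setq url-history-save-interval 1800)
;; Restart the timer so that the new interval takes effect.
(url-history-setup-save-timer)
@end example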
1249 @defun url-history-parse-history &optional fname
1250 Parses the history file @var{fname} (default @code{url-history-file})
1251 and sets up the history list.
1254 @defun url-history-save-history &optional fname
1255 Saves the current history to file @var{fname} (default
1256 @code{url-history-file}).
1259 @defun url-completion-function string predicate function
1260 You can use this function to do completion of URLs from the history.
1264 @chapter Customization
1266 @cindex environment variables
1267 The following environment variables affect the @code{url} library's
1268 operation at startup.
1273 @vindex url-temporary-directory
1274 If this is defined, @code{url-temporary-directory} is initialized from
1278 The following user options affect the general operation of
1281 @defopt url-configuration-directory
1282 @cindex configuration files
1283 The value of this variable specifies the name of the directory where
1284 the @code{url} library stores its various configuration files, cache
1287 The default value specifies a subdirectory named @file{url/} in the
1288 standard Emacs user data directory specified by the variable
1289 @code{user-emacs-directory} (normally @file{~/.emacs.d}). However,
1290 the old default was @file{~/.url}, and this directory is used instead
1296 Specifies the types of debug messages which are logged to
1297 the @file{*URL-DEBUG*} buffer.
1298 @code{t} means log all messages.
1299 A number means log all messages and show them with @code{message}.
1300 It may also be a list of the types of messages to be logged.
1302 @defopt url-personal-mail-address
1304 @defopt url-privacy-level
1306 @defopt url-uncompressor-alist
1308 @defopt url-passwd-entry-func
1310 @defopt url-standalone-mode
1312 @defopt url-bad-port-list
1314 @defopt url-max-password-attempts
1316 @defopt url-temporary-directory
1318 @defopt url-show-status
1320 @defopt url-confirmation-func
1321 The function to use for asking yes or no questions. This is normally
1322 either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
1323 function taking a single argument (the prompt) and returning @code{t}
1324 only if an affirmative answer is given.
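For example, to require a fully typed answer for confirmations:

@example
;; Ask for "yes" or "no" in full rather than a single keystroke.
(setq url-confirmation-func #'yes-or-no-p)
@end example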
1326 @defopt url-gateway-method
1327 @c fixme: describe gatewaying
1328 A symbol specifying the type of gateway support to use for connections
1329 from the local machine. The supported methods are:
1333 Run telnet in a subprocess to connect;
1335 Rlogin to another machine to connect;
1337 Connect through a socks server;
1345 @defopt url-user-agent
1346 The User Agent string used for sending @acronym{HTTP}/@acronym{HTTPS}
1347 requests. The value should be one of: @code{nil}, which means that no
1348 @samp{User-Agent} header is generated; @code{default}, which means
1349 that a string is generated based on the setting of
1350 @code{url-privacy-level}; a string; or a function of no arguments that
1353 The default is @code{default}, which means that the
1354 @w{@samp{User-Agent: @var{package-name} URL/Emacs}} string will be
1355 generated, where @var{package-name} is the value of
1356 @code{url-package-name} and its version, if they are non-@code{nil}.
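Two of the documented kinds of value are sketched below.  The client
name @samp{ExampleClient/1.0} is invented for the example.

@example
;; Send a fixed string of your own choosing.
(setq url-user-agent "ExampleClient/1.0")

;; Or send no User-Agent header at all.
(setq url-user-agent nil)
@end example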
1359 @node GNU Free Documentation License
1360 @appendix GNU Free Documentation License
1361 @include doclicense.texi
1363 @node Function Index
1364 @unnumbered Command and Function Index
1367 @node Variable Index
1368 @unnumbered Variable Index
1372 @unnumbered Concept Index