(uniquify-rationalize): New fun. Store the fix-list in uniquify-managed.

diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi

index 52330b090faf97889aea6881009c2c492bd84986..9a7549d76550997b3ac3bc5b19f34abb7ea90155 100644
--- a/lispref/nonascii.texi
+++ b/lispref/nonascii.texi
@@ -1,6 +1,6 @@
  @c -*-texinfo-*-
  @c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1998, 1999 Free Software Foundation, Inc. 
+@c Copyright (C) 1998, 1999 Free Software Foundation, Inc.
  @c See the file elisp.texi for copying conditions.
  @setfilename ../info/characters
  @node Non-ASCII Characters, Searching and Matching, Text, Top
@@ -25,7 +25,7 @@ characters and how they are stored in strings and buffers.
  * Translation of Characters::   Translation tables are used for conversion.
  * Coding Systems::          Coding systems are conversions for saving files.
  * Input Methods::           Input methods allow users to enter various
-                                non-ASCII characters without speciak keyboards.
+                                non-ASCII characters without special keyboards.
  * Locales::                 Interacting with the POSIX locale.
  @end menu
  
@@ -176,13 +176,19 @@ If this is non-@code{nil}, it overrides @code{nonascii-insert-offset}.
  @defun string-make-unibyte string
  This function converts the text of @var{string} to unibyte
  representation, if it isn't already, and returns the result.  If
-@var{string} is a unibyte string, it is returned unchanged.
+@var{string} is a unibyte string, it is returned unchanged.  Multibyte
+character codes are converted to unibyte according to
+@code{nonascii-translation-table} or, if that is @code{nil}, using
+@code{nonascii-insert-offset}.  If the lookup in the translation table
+fails, this function takes just the low 8 bits of each character.
  @end defun
  
  @defun string-make-multibyte string
  This function converts the text of @var{string} to multibyte
  representation, if it isn't already, and returns the result.  If
  @var{string} is a multibyte string, it is returned unchanged.
+The function @code{unibyte-char-to-multibyte} is used to convert
+each unibyte character to a multibyte character.
  @end defun
  
  @node Selecting a Representation
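
As a rough illustration of the two conversions documented in the hunk above, the sketch below round-trips an ASCII-only string. The helper name `demo-unibyte-roundtrip` is hypothetical and not part of Emacs; results for non-ASCII text depend on `nonascii-translation-table` and `nonascii-insert-offset`, so only the ASCII case is shown.

```elisp
;; Hypothetical helper, not part of Emacs: convert TEXT to unibyte and back.
(defun demo-unibyte-roundtrip (text)
  "Return a list (UNIBYTE-IS-MULTIBYTE SAME-TEXT) for TEXT."
  (let* ((uni (string-make-unibyte text))      ; force unibyte representation
         (multi (string-make-multibyte uni)))  ; convert back again
    (list (multibyte-string-p uni)   ; a unibyte result is not multibyte
          (string= uni multi))))     ; ASCII characters survive unchanged

(demo-unibyte-roundtrip "abc")
```

Only ASCII input is used here because the mapping of other characters is exactly what the added text says is configurable.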
@@ -221,7 +227,10 @@ treating each byte as a character.  This means that the value may have
  more characters than @var{string} has.
  
  If @var{string} is already a unibyte string, then the value is
-@var{string} itself.
+@var{string} itself.  Otherwise it is a newly created string, with no
+text properties.  If @var{string} is multibyte, any characters it
+contains of charset @code{eight-bit-control} or @code{eight-bit-graphic}
+are converted to the corresponding single byte.
  @end defun
  
  @defun string-as-multibyte string
@@ -230,7 +239,11 @@ treating each multibyte sequence as one character.  This means that the
  value may have fewer characters than @var{string} has.
  
  If @var{string} is already a multibyte string, then the value is
-@var{string} itself.
+@var{string} itself.  Otherwise it is a newly created string, with no
+text properties.  If @var{string} is unibyte and contains any individual
+8-bit bytes (i.e.@: not part of a multibyte form), they are converted to
+the corresponding multibyte character of charset @code{eight-bit-control}
+or @code{eight-bit-graphic}.
  @end defun
  
  @node Character Codes
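
The identity rules stated for `string-as-unibyte` and `string-as-multibyte` can be checked with `eq`. This is a minimal sketch under the assumption that the input is ASCII-only; `demo-as-conversions` is a hypothetical name used for illustration.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun demo-as-conversions (text)
  "Check the identity rules of the string-as-* functions on TEXT."
  (let* ((uni (string-make-unibyte text))
         (multi (string-as-multibyte uni)))
    (list (eq uni (string-as-unibyte uni))       ; already unibyte: same object
          (eq multi (string-as-multibyte multi)) ; already multibyte: same object
          (eq uni multi))))                      ; a conversion makes a new string

(demo-as-conversions "abc")
```

Comparing with `eq` rather than `equal` is deliberate: the documentation distinguishes returning the argument itself from returning a newly created string.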
@@ -371,12 +384,12 @@ values is the character set's dimension.
  @end example
  @end defun
  
-@defun make-char charset &rest byte-values
-This function returns the character in character set @var{charset}
-identified by @var{byte-values}.  This is roughly the inverse of
-@code{split-char}.  Normally, you should specify either one or two
-@var{byte-values}, according to the dimension of @var{charset}.  For
-example,
+@defun make-char charset &optional code1 code2
+This function returns the character in character set @var{charset} whose
+position codes are @var{code1} and @var{code2}.  This is roughly the
+inverse of @code{split-char}.  Normally, you should specify either one
+or both of @var{code1} and @var{code2} according to the dimension of
+@var{charset}.  For example,
  
  @example
  (make-char 'latin-iso8859-1 72)
@@ -405,7 +418,10 @@ For example:
  @end example
  
  The character sets @sc{ascii}, @sc{eight-bit-control}, and
-@sc{eight-bit-graphic} don't have corresponding generic characters.
+@sc{eight-bit-graphic} don't have corresponding generic characters.  If
+@var{charset} is one of them and you don't supply @var{code1},
+@code{make-char} returns the character code corresponding to the
+smallest code in @var{charset}.
  
  @node Scanning Charsets
  @section Scanning for Character Sets
@@ -446,10 +462,14 @@ apply to all other coding systems.
  
  @defun make-translation-table &rest translations
  This function returns a translation table based on the argument
-@var{translations}.  Each element of
-@var{translations} should be a list of the form @code{(@var{from}
-. @var{to})}; this says to translate the character @var{from} into
-@var{to}.
+@var{translations}.  Each element of @var{translations} should be a
+list of elements of the form @code{(@var{from} . @var{to})}; this says
+to translate the character @var{from} into @var{to}.
+
+The arguments and the forms in each argument are processed in order,
+and if a previous form already translates @var{to} to some other
+character, say @var{to-alt}, @var{from} is also translated to
+@var{to-alt}.
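
The ordering rule in the added paragraph can be sketched with two arguments, where the second translates into a character that the first already translates. The variable name `demo-table` is hypothetical.

```elisp
;; The first argument maps a -> b; the second maps c -> a.
(defvar demo-table
  (make-translation-table '((?a . ?b)) '((?c . ?a)))
  "Hypothetical example table, not part of Emacs.")

(list (aref demo-table ?a)    ; ?b
      (aref demo-table ?c))   ; also ?b, because ?a was already translated to ?b
```
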
  
  You can also map one whole character set into another character set with
  the same dimension.  To do this, you specify a generic character (which
@@ -597,12 +617,15 @@ you will want to find out afterwards which coding system was chosen.
  @defvar buffer-file-coding-system
  This variable records the coding system that was used for visiting the
  current buffer.  It is used for saving the buffer, and for writing part
-of the buffer with @code{write-region}.  When those operations ask the
-user to specify a different coding system,
-@code{buffer-file-coding-system} is updated to the coding system
-specified.
-
-However, @code{buffer-file-coding-system} does not affect sending text
+of the buffer with @code{write-region}.  If the text to be written
+cannot be safely encoded using the coding system specified by this
+variable, these operations select an alternative encoding by calling
+the function @code{select-safe-coding-system} (@pxref{User-Chosen
+Coding Systems}).  If selecting a different encoding requires asking
+the user to specify a coding system, @code{buffer-file-coding-system}
+is updated to the newly selected coding system.
+
+@code{buffer-file-coding-system} does @emph{not} affect sending text
  to a subprocess.
  @end defvar
  
@@ -615,9 +638,10 @@ When a command to save the buffer starts out to use
  @code{buffer-file-coding-system} (or @code{save-buffer-coding-system}),
  and that coding system cannot handle
  the actual text in the buffer, the command asks the user to choose
-another coding system.  After that happens, the command also updates
-@code{buffer-file-coding-system} to represent the coding system that the
-user specified.
+another coding system (by calling @code{select-safe-coding-system}).
+After that happens, the command also updates
+@code{buffer-file-coding-system} to represent the coding system that
+the user specified.
  @end defvar
  
  @defvar last-coding-system-used
@@ -721,22 +745,41 @@ systems used for I/O to a subprocess.
  @node User-Chosen Coding Systems
  @subsection User-Chosen Coding Systems
  
-@defun select-safe-coding-system from to &optional preferred-coding-system
-This function selects a coding system for encoding the text between
-@var{from} and @var{to}, asking the user to choose if necessary.
-
-The optional argument @var{preferred-coding-system} specifies a coding
-system to try first.  If that one can handle the text in the specified
-region, then it is used.  If this argument is omitted, the current
-buffer's value of @code{buffer-file-coding-system} is tried first.
-
-If the region contains some multibyte characters that the preferred
-coding system cannot encode, this function asks the user to choose from
-a list of coding systems which can encode the text, and returns the
-user's choice.
-
-One other kludgy feature: if @var{from} is a string, the string is the
-target text, and @var{to} is ignored.
+@cindex select safe coding system
+@defun select-safe-coding-system from to &optional default-coding-system accept-default-p
+This function selects a coding system for encoding specified text,
+asking the user to choose if necessary.  Normally the specified text
+is the text in the current buffer between @var{from} and @var{to},
+defaulting to the whole buffer if they are @code{nil}.  If @var{from}
+is a string, the string specifies the text to encode, and @var{to} is
+ignored.
+
+If @var{default-coding-system} is non-@code{nil}, that is the first
+coding system to try; if that can handle the text,
+@code{select-safe-coding-system} returns that coding system.  It can
+also be a list of coding systems; then the function tries each of them
+one by one.  After trying all of them, it next tries the user's most
+preferred coding system (@pxref{Recognize Coding,
+prefer-coding-system, the description of @code{prefer-coding-system},
+emacs, GNU Emacs Manual}), and after that the current buffer's value
+of @code{buffer-file-coding-system} (if it is not @code{undecided}).
+
+If one of those coding systems can safely encode all the specified
+text, @code{select-safe-coding-system} chooses it and returns it.
+Otherwise, it asks the user to choose from a list of coding systems
+which can encode all the text, and returns the user's choice.
+
+The optional argument @var{accept-default-p}, if non-@code{nil},
+should be a function to determine whether the coding system selected
+without user interaction is acceptable.  If this function returns
+@code{nil}, the silently selected coding system is rejected, and the
+user is asked to select a coding system from a list of possible
+candidates.
+
+@vindex select-safe-coding-system-accept-default-p
+If the variable @code{select-safe-coding-system-accept-default-p} is
+non-@code{nil}, its value overrides the value of
+@var{accept-default-p}.
  @end defun
  
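
A call using the new argument list might look like the sketch below. It assumes that the `accept-default-p` function receives the silently selected coding system as its single argument; `demo-accept-any` is a hypothetical predicate, and the text is ASCII-only so that no prompt should be needed.

```elisp
;; Hypothetical predicate, not part of Emacs: accept any silent choice.
(defun demo-accept-any (coding-system)
  "Return non-nil so that CODING-SYSTEM is always accepted."
  (or coding-system t))

;; FROM is a string, so TO is ignored and the string itself is checked.
;; iso-latin-1 is tried first as DEFAULT-CODING-SYSTEM.
(select-safe-coding-system "plain ASCII text" nil
                           'iso-latin-1 #'demo-accept-any)
```

If `select-safe-coding-system-accept-default-p` is non-`nil`, its value is used in place of the predicate passed here, as the added text describes.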
    Here are two functions you can use to let the user specify a coding
@@ -770,6 +813,18 @@ don't change these variables; instead, override them using
  @code{coding-system-for-read} and @code{coding-system-for-write}
  (@pxref{Specifying Coding Systems}).
  
+@defvar auto-coding-regexp-alist
+This variable is an alist of text patterns and corresponding coding
+systems.  Each element has the form @code{(@var{regexp}
+. @var{coding-system})}; a file whose first few kilobytes match
+@var{regexp} is decoded with @var{coding-system} when its contents are
+read into a buffer.  The settings in this alist take priority over
+@code{coding:} tags in the files and the contents of
+@code{file-coding-system-alist} (see below).  The default value is set
+so that Emacs automatically recognizes mail files in Babyl format and
+reads them with no code conversions.
+@end defvar
+
  @defvar file-coding-system-alist
  This variable is an alist that specifies the coding systems to use for
  reading and writing particular files.  Each element has the form
@@ -1156,10 +1211,11 @@ how Emacs interacts with these features.
  
  @defvar locale-coding-system
  @tindex locale-coding-system
+@cindex keyboard input decoding on X
  This variable specifies the coding system to use for decoding system
-error messages, for encoding the format argument to
-@code{format-time-string}, and for decoding the return value of
-@code{format-time-string}.
+error messages and---on the X Window System only---keyboard input, for
+encoding the format argument to @code{format-time-string}, and for
+decoding the return value of @code{format-time-string}.
  @end defvar
  
  @defvar system-messages-locale