X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/651f374c99a2bcf5e2ed326a26cf0d89a5c204f5..aef88a00d364bbb208acff2d9b66b2a1eb6cf8f5:/lispref/nonascii.texi diff --git a/lispref/nonascii.texi b/lispref/nonascii.texi index 2af367a0f8..0922db4fac 100644 --- a/lispref/nonascii.texi +++ b/lispref/nonascii.texi @@ -1,12 +1,13 @@ @c -*-texinfo-*- @c This is part of the GNU Emacs Lisp Reference Manual. -@c Copyright (C) 1998, 1999, 2002, 2003, 2004, -@c 2005 Free Software Foundation, Inc. +@c Copyright (C) 1998, 1999, 2001, 2002, 2003, 2004, +@c 2005, 2006, 2007, 2008 Free Software Foundation, Inc. @c See the file elisp.texi for copying conditions. @setfilename ../info/characters @node Non-ASCII Characters, Searching and Matching, Text, Top @chapter Non-@acronym{ASCII} Characters @cindex multibyte characters +@cindex characters, multi-byte @cindex non-@acronym{ASCII} characters This chapter covers the special issues relating to non-@acronym{ASCII} @@ -95,7 +96,6 @@ default value to @code{nil} early in startup. @end defvar @defun position-bytes position -@tindex position-bytes Return the byte-position corresponding to buffer position @var{position} in the current buffer. This is 1 at the start of the buffer, and counts upward in bytes. If @var{position} is out of @@ -103,7 +103,6 @@ range, the value is @code{nil}. @end defun @defun byte-to-position byte-position -@tindex byte-to-position Return the buffer position corresponding to byte-position @var{byte-position} in the current buffer. If @var{byte-position} is out of range, the value is @code{nil}. @@ -113,6 +112,13 @@ out of range, the value is @code{nil}. Return @code{t} if @var{string} is a multibyte string. @end defun +@defun string-bytes string +@cindex string, number of bytes +This function returns the number of bytes in @var{string}. +If @var{string} is a multibyte string, this can be greater than +@code{(length @var{string})}. 
+@end defun + @node Converting Representations @section Converting Text Representations @@ -354,7 +360,6 @@ valid character. @end defun @defun charset-plist charset -@tindex charset-plist This function returns the charset property list of the character set @var{charset}. Although @var{charset} is a symbol, this is not the same as the property list of that symbol. Charset properties are used for @@ -370,7 +375,7 @@ This command displays a list of characters in the character set @section Characters and Bytes @cindex bytes and characters -@cindex introduction sequence +@cindex introduction sequence (of character) @cindex dimension (of character set) In multibyte representation, each character occupies one or more bytes. Each character set has an @dfn{introduction sequence}, which is @@ -389,7 +394,6 @@ dimension is always 1 or 2. @end defun @defun charset-bytes charset -@tindex charset-bytes This function returns the number of bytes used to represent a character in character set @var{charset}. @end defun @@ -404,6 +408,7 @@ set's introduction sequence: @node Splitting Characters @section Splitting Characters +@cindex character as bytes The functions in this section convert between characters and the byte values used to represent them. For most purposes, there is no need to @@ -429,6 +434,7 @@ returns a list consisting of the symbol @code{unknown} and @var{character}. @end example @end defun +@cindex generate characters in charsets @defun make-char charset &optional code1 code2 This function returns the character in character set @var{charset} whose position codes are @var{code1} and @var{code2}. This is roughly the @@ -484,7 +490,7 @@ of the text in question. @defun charset-after &optional pos This function return the charset of a character in the current buffer at position @var{pos}. If @var{pos} is omitted or @code{nil}, it -defauls to the current value of point. If @var{pos} is out of range, +defaults to the current value of point. 
If @var{pos} is out of range, the value is @code{nil}. @end defun @@ -578,12 +584,14 @@ coding systems that don't specify any other translation table. @defvar translation-table-for-input Self-inserting characters are translated through this translation -table before they are inserted. This variable automatically becomes -buffer-local when set. +table before they are inserted. Search commands also translate their +input through this table, so they can compare more reliably with +what's in the buffer. @code{set-buffer-file-coding-system} sets this variable so that your keyboard input gets translated into the character sets that the buffer -is likely to contain. +is likely to contain. This variable automatically becomes +buffer-local when set. @end defvar @node Coding Systems @@ -650,7 +658,9 @@ coding system, you'll get Unicode characters (of charset @code{iso-latin-2} and decode the result with the same coding system, you'll get Latin-2 characters. -@cindex end of line conversion +@cindex EOL conversion +@cindex end-of-line conversion +@cindex line end conversion @dfn{End of line conversion} handles three different conventions used on various systems for representing end of line in files. The Unix convention is to use the linefeed character (also called newline). The @@ -717,8 +727,8 @@ operation finishes the job of choosing a coding system. Very often you will want to find out afterwards which coding system was chosen. @defvar buffer-file-coding-system -This variable records the coding system that was used for visiting the -current buffer. It is used for saving the buffer, and for writing part +This buffer-local variable records the coding system that was used to visit +the current buffer. It is used for saving the buffer, and for writing part of the buffer with @code{write-region}. 
If the text to be written cannot be safely encoded using the coding system specified by this variable, these operations select an alternative encoding by calling @@ -803,6 +813,32 @@ If that is valid, it returns @var{coding-system}. Otherwise it signals an error with condition @code{coding-system-error}. @end defun +@defun coding-system-eol-type coding-system +This function returns the type of end-of-line (a.k.a.@: @dfn{eol}) +conversion used by @var{coding-system}. If @var{coding-system} +specifies a certain eol conversion, the return value is an integer 0, +1, or 2, standing for @code{unix}, @code{dos}, and @code{mac}, +respectively. If @var{coding-system} doesn't specify eol conversion +explicitly, the return value is a vector of coding systems, each one +with one of the possible eol conversion types, like this: + +@lisp +(coding-system-eol-type 'latin-1) + @result{} [latin-1-unix latin-1-dos latin-1-mac] +@end lisp + +@noindent +If this function returns a vector, Emacs will decide, as part of the +text encoding or decoding process, what eol conversion to use. For +decoding, the end-of-line format of the text is auto-detected, and the +eol conversion is set to match it (e.g., DOS-style CRLF format will +imply @code{dos} eol conversion). For encoding, the eol conversion is +taken from the appropriate default coding system (e.g., +@code{default-buffer-file-coding-system} for +@code{buffer-file-coding-system}), or from the default eol conversion +appropriate for the underlying platform. +@end defun + @defun coding-system-change-eol-conversion coding-system eol-type This function returns a coding system which is like @var{coding-system} except for its eol conversion, which is specified by @code{eol-type}. @@ -855,8 +891,9 @@ decreasing priority. But if @var{highest} is non-@code{nil}, then the return value is just one coding system, the one that is highest in priority. 
-If the region contains only @acronym{ASCII} characters, the value -is @code{undecided} or @code{(undecided)}, or a variant specifying +If the region contains only @acronym{ASCII} characters except for such +ISO-2022 control characters as @code{ESC}, the value is +@code{undecided} or @code{(undecided)}, or a variant specifying end-of-line conversion, if that can be deduced from the text. @end defun @@ -1068,15 +1105,15 @@ for decoding (in case @var{operation} does decoding), and @var{encoding-system} is the coding system for encoding (in case @var{operation} does encoding). -The argument @var{operation} should be a symbol, any one of -@code{insert-file-contents}, @code{write-region}, +The argument @var{operation} is a symbol, one of @code{write-region}, @code{start-process}, @code{call-process}, @code{call-process-region}, -or @code{open-network-stream}. These are the names of the Emacs I/O -primitives that can do coding system conversion. +@code{insert-file-contents}, or @code{open-network-stream}. These are +the names of the Emacs I/O primitives that can do character code and +eol conversion. The remaining arguments should be the same arguments that might be given -to that I/O primitive. Depending on the primitive, one of those -arguments is selected as the @dfn{target}. For example, if +to the corresponding I/O primitive. Depending on the primitive, one +of those arguments is selected as the @dfn{target}. For example, if @var{operation} does file I/O, whichever argument specifies the file name is the target. For subprocess primitives, the process name is the target. For @code{open-network-stream}, the target is the service name @@ -1084,7 +1121,19 @@ or port number. Depending on @var{operation}, this function looks up the target in @code{file-coding-system-alist}, @code{process-coding-system-alist}, -or @code{network-coding-system-alist}. 
If the target is found in the +alist, @code{find-operation-coding-system} returns its association in +the alist; otherwise it returns @code{nil}. + +If @var{operation} is @code{insert-file-contents}, the argument +corresponding to the target may be a cons cell of the form +@code{(@var{filename} . @var{buffer})}. In that case, @var{filename} +is a file name to look up in @code{file-coding-system-alist}, and +@var{buffer} is a buffer that contains the file's contents (not yet +decoded). If @code{file-coding-system-alist} specifies a function to +call for this file, and that function needs to examine the file's +contents (as it usually does), it should examine the contents of +@var{buffer} instead of reading the file. @end defun @node Specifying Coding Systems @@ -1116,9 +1165,9 @@ of the right way to use the variable: (insert-file-contents filename)) @end example -When its value is non-@code{nil}, @code{coding-system-for-read} takes -precedence over all other methods of specifying a coding system to use for -input, including @code{file-coding-system-alist}, +When its value is non-@code{nil}, this variable takes precedence over +all other methods of specifying a coding system to use for input, +including @code{file-coding-system-alist}, @code{process-coding-system-alist} and @code{network-coding-system-alist}. @end defvar @@ -1143,8 +1192,8 @@ decoding functions (@pxref{Explicit Encoding}). @node Explicit Encoding @subsection Explicit Encoding and Decoding -@cindex encoding text -@cindex decoding text +@cindex encoding in coding systems +@cindex decoding in coding systems All the operations that transfer text in and out of Emacs have the ability to use a coding system to encode or decode the text. @@ -1172,7 +1221,7 @@ encoding by binding @code{coding-system-for-write} to @code{no-conversion}. Here are the functions to perform explicit encoding or decoding. 
The -decoding functions produce sequences of bytes; the encoding functions +encoding functions produce sequences of bytes; the decoding functions are meant to operate on sequences of bytes. All of these functions discard text properties. @@ -1396,7 +1445,6 @@ to use in language-related features. These Emacs variables control how Emacs interacts with these features. @defvar locale-coding-system -@tindex locale-coding-system @cindex keyboard input decoding on X This variable specifies the coding system to use for decoding system error messages and---on X Window system only---keyboard input, for @@ -1405,7 +1453,6 @@ decoding the return value of @code{format-time-string}. @end defvar @defvar system-messages-locale -@tindex system-messages-locale This variable specifies the locale to use for generating system error messages. Changing the locale can cause messages to come out in a different language or in a different orthography. If the variable is @@ -1414,7 +1461,6 @@ usual POSIX fashion. @end defvar @defvar system-time-locale -@tindex system-time-locale This variable specifies the locale to use for formatting time values. Changing the locale can cause messages to appear according to the conventions of a different language. If the variable is @code{nil}, the