X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/e8757f091a502b858912a4c267210e009227d6e6..a7fecaa0c5f8247c3b3747506201ec2a2ecbe292:/doc/emacs/mule.texi diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi index 1dfae79c78..1600f19499 100644 --- a/doc/emacs/mule.texi +++ b/doc/emacs/mule.texi @@ -1,11 +1,10 @@ @c This is part of the Emacs manual. -@c Copyright (C) 1997, 1999-2012 Free Software Foundation, Inc. +@c Copyright (C) 1997, 1999-2014 Free Software Foundation, Inc. @c See file emacs.texi for copying conditions. @node International @chapter International Character Set Support @c This node is referenced in the tutorial. When renaming or deleting @c it, the tutorial needs to be adjusted. (TUTORIAL.de) -@cindex MULE @cindex international scripts @cindex multibyte characters @cindex encoding of characters @@ -90,7 +89,6 @@ value to make sure Emacs interprets keyboard input correctly; see @menu * International Chars:: Basic concepts of multibyte characters. -* Disabling Multibyte:: Controlling whether to use multibyte characters. * Language Environments:: Setting things up for the language you use. * Input Methods:: Entering text characters not on your keyboard. * Select Input Method:: Specifying your choice of input methods. @@ -244,79 +242,6 @@ Character code properties: customize what to show decomposition: (65 768) ('A' '`') @end smallexample -@c FIXME? Does this section even belong in the user manual? -@c Seems more appropriate to the lispref? -@node Disabling Multibyte -@section Disabling Multibyte Characters - - By default, Emacs starts in multibyte mode: it stores the contents -of buffers and strings using an internal encoding that represents -non-@acronym{ASCII} characters using multi-byte sequences. Multibyte -mode allows you to use all the supported languages and scripts without -limitations. - -@cindex turn multibyte support on or off - Under very special circumstances, you may want to disable multibyte -character support, for a specific buffer. -When multibyte characters are disabled in a buffer, we call -that @dfn{unibyte mode}. In unibyte mode, each character in the -buffer has a character code ranging from 0 through 255 (0377 octal); 0 -through 127 (0177 octal) represent @acronym{ASCII} characters, and 128 -(0200 octal) through 255 (0377 octal) represent non-@acronym{ASCII} -characters. - - To edit a particular file in unibyte representation, visit it using -@code{find-file-literally}. @xref{Visiting}. You can convert a -multibyte buffer to unibyte by saving it to a file, killing the -buffer, and visiting the file again with @code{find-file-literally}. -Alternatively, you can use @kbd{C-x @key{RET} c} -(@code{universal-coding-system-argument}) and specify @samp{raw-text} -as the coding system with which to visit or save a file. @xref{Text -Coding}. Unlike @code{find-file-literally}, finding a file as -@samp{raw-text} doesn't disable format conversion, uncompression, or -auto mode selection. - -@c Not a single file in Emacs uses this feature. Is it really worth -@c mentioning in the _user_ manual? Also, this duplicates somewhat -@c "Loading Non-ASCII" from the lispref. -@cindex Lisp files, and multibyte operation -@cindex multibyte operation, and Lisp files -@cindex unibyte operation, and Lisp files -@cindex init file, and non-@acronym{ASCII} characters - Emacs normally loads Lisp files as multibyte. -This includes the Emacs initialization -file, @file{.emacs}, and the initialization files of packages -such as Gnus. However, you can specify unibyte loading for a -particular Lisp file, by adding an entry @samp{coding: raw-text} in a file -local variables section. @xref{Specify Coding}. -Then that file is always loaded as unibyte text. -@ignore -@c I don't see the point of this statement: -The motivation for these conventions is that it is more reliable to -always load any particular Lisp file in the same way. -@end ignore -You can also load a Lisp file as unibyte, on any one occasion, by -typing @kbd{C-x @key{RET} c raw-text @key{RET}} immediately before -loading it. - -@c See http://debbugs.gnu.org/11226 for lack of unibyte tooltip. -@vindex enable-multibyte-characters -The buffer-local variable @code{enable-multibyte-characters} is -non-@code{nil} in multibyte buffers, and @code{nil} in unibyte ones. -The mode line also indicates whether a buffer is multibyte or not. -@xref{Mode Line}. With a graphical display, in a multibyte buffer, -the portion of the mode line that indicates the character set has a -tooltip that (amongst other things) says that the buffer is multibyte. -In a unibyte buffer, the character set indicator is absent. Thus, in -a unibyte buffer (when using a graphical display) there is normally -nothing before the indication of the visited file's end-of-line -convention (colon, backslash, etc.), unless you are using an input -method. - -@findex toggle-enable-multibyte-characters -You can turn off multibyte support in a specific buffer by invoking the -command @code{toggle-enable-multibyte-characters} in that buffer. - @node Language Environments @section Language Environments @cindex language environments @@ -919,19 +844,6 @@ pattern, are decoded correctly. Unlike the previous two, this variable does not override any @samp{-*-coding:-*-} tag. -@c FIXME? This seems somewhat out of place. Move to the Rmail section? -@vindex rmail-decode-mime-charset -@vindex rmail-file-coding-system - When you get new mail in Rmail, each message is translated -automatically from the coding system it is written in, as if it were a -separate file. This uses the priority list of coding systems that you -have specified. If a MIME message specifies a character set, Rmail -obeys that specification. For reading and saving Rmail files -themselves, Emacs uses the coding system specified by the variable -@code{rmail-file-coding-system}. The default value is @code{nil}, -which means that Rmail files are not translated (they are read and -written in the Emacs internal character code). - @node Specify Coding @section Specifying a File's Coding System @@ -995,7 +907,7 @@ decoding. (You can still use an unsuitable coding system if you enter its name at the prompt.) @c It seems that select-message-coding-system does this. -@c Both sendmail.el and smptmail.el call it; i.e. smtpmail.el still +@c Both sendmail.el and smptmail.el call it; i.e., smtpmail.el still @c obeys sendmail-coding-system. @vindex sendmail-coding-system When you send a mail message (@pxref{Sending Mail}), @@ -1040,12 +952,16 @@ decoding it using coding system @var{right} instead. @findex set-buffer-file-coding-system The command @kbd{C-x @key{RET} f} (@code{set-buffer-file-coding-system}) sets the file coding system for -the current buffer---in other words, it says which coding system to -use when saving or reverting the visited file. You specify which -coding system using the minibuffer. If you specify a coding system -that cannot handle all of the characters in the buffer, Emacs warns -you about the troublesome characters when you actually save the -buffer. +the current buffer (i.e., the coding system to use when saving or +reverting the file). You specify which coding system using the +minibuffer. You can also invoke this command by clicking with +@kbd{Mouse-3} on the coding system indicator in the mode line +(@pxref{Mode Line}). + + If you specify a coding system that cannot handle all the characters +in the buffer, Emacs will warn you about the troublesome characters, +and ask you to choose another coding system, when you try to save the +buffer (@pxref{Output Coding}). @cindex specify end-of-line conversion You can also use this command to specify the end-of-line conversion @@ -1213,6 +1129,21 @@ In the default language environment, non-@acronym{ASCII} characters in file names are not encoded specially; they appear in the file system using the internal Emacs representation. +@cindex file-name encoding, MS-Windows +@vindex w32-unicode-filenames + When Emacs runs on MS-Windows versions that are descendants of the +NT family (Windows 2000, XP, Vista, Windows 7, and Windows 8), the +value of @code{file-name-coding-system} is largely ignored, as Emacs +by default uses APIs that allow to pass Unicode file names directly. +By contrast, on Windows 9X, file names are encoded using +@code{file-name-coding-system}, which should be set to the codepage +(@pxref{Coding Systems, codepage}) pertinent for the current system +locale. The value of the variable @code{w32-unicode-filenames} +controls whether Emacs uses the Unicode APIs when it calls OS +functions that accept file names. This variable is set by the startup +code to @code{nil} on Windows 9X, and to @code{t} on newer versions of +MS-Windows. + @strong{Warning:} if you change @code{file-name-coding-system} (or the language environment) in the middle of an Emacs session, problems can result if you have already visited files whose names were encoded using @@ -1320,7 +1251,7 @@ scripts.@footnote{If you run Emacs on X, you may need to inform the X server about the location of the newly installed fonts with commands such as: @c FIXME? I feel like this may be out of date. -@c Eg the intlfonts tarfile is ~ 10 years old. +@c E.g., the intlfonts tarfile is ~ 10 years old. @example xset fp+ /usr/local/share/emacs/fonts @@ -1566,7 +1497,7 @@ no font appear as a hollow box. If you use Latin-1 characters but your terminal can't display Latin-1, you can arrange to display mnemonic @acronym{ASCII} sequences -instead, e.g.@: @samp{"o} for o-umlaut. Load the library +instead, e.g., @samp{"o} for o-umlaut. Load the library @file{iso-ascii} to do this. @vindex latin1-display @@ -1588,15 +1519,13 @@ the range 0240 to 0377 octal (160 to 255 decimal) to handle the accented letters and punctuation needed by various European languages (and some non-European ones). Note that Emacs considers bytes with codes in this range as raw bytes, not as characters, even in a unibyte -buffer, i.e.@: if you disable multibyte characters. However, Emacs -can still handle these character codes as if they belonged to -@emph{one} of the single-byte character sets at a time. To specify -@emph{which} of these codes to use, invoke @kbd{M-x -set-language-environment} and specify a suitable language environment -such as @samp{Latin-@var{n}}. - - For more information about unibyte operation, see -@ref{Disabling Multibyte}. +buffer, i.e., if you disable multibyte characters. However, Emacs can +still handle these character codes as if they belonged to @emph{one} +of the single-byte character sets at a time. To specify @emph{which} +of these codes to use, invoke @kbd{M-x set-language-environment} and +specify a suitable language environment such as @samp{Latin-@var{n}}. +@xref{Disabling Multibyte, , Disabling Multibyte Characters, elisp, +GNU Emacs Lisp Reference Manual}. @vindex unibyte-display-via-language-environment Emacs can also display bytes in the range 160 to 255 as readable @@ -1764,7 +1693,7 @@ directionality when they are displayed. The default value is Each paragraph of bidirectional text can have its own @dfn{base direction}, either right-to-left or left-to-right. (Paragraph @c paragraph-separate etc have no influence on this? -boundaries are empty lines, i.e.@: lines consisting entirely of +boundaries are empty lines, i.e., lines consisting entirely of whitespace characters.) Text in left-to-right paragraphs begins on the screen at the left margin of the window and is truncated or continued when it reaches the right margin. By contrast, text in @@ -1801,4 +1730,6 @@ jump when point traverses reordered bidirectional text. Similarly, a highlighted region covering a contiguous range of character positions may look discontinuous if the region spans reordered text. This is normal and similar to the behavior of other programs that support -bidirectional text. +bidirectional text. If you set @code{visual-order-cursor-movement} to +a non-@code{nil} value, cursor motion by the arrow keys follows the +visual order on screen (@pxref{Moving Point, visual-order movement}).