@c This is part of the Emacs manual.
-@c Copyright (C) 1997, 1999, 2000, 2001 Free Software Foundation, Inc.
+@c Copyright (C) 1997, 1999, 2000, 2001, 2002, 2003, 2004,
+@c 2005 Free Software Foundation, Inc.
@c See file emacs.texi for copying conditions.
@node International, Major Modes, Frames, Top
@chapter International Character Set Support
@cindex Dutch
@cindex Spanish
Emacs supports a wide variety of international character sets,
-including European variants of the Latin alphabet, as well as Chinese,
-Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA,
-Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts. These features
-have been merged from the modified version of Emacs known as MULE (for
-``MULti-lingual Enhancement to GNU Emacs'')
+including European and Vietnamese variants of the Latin alphabet, as
+well as Cyrillic, Devanagari (for Hindi and Marathi), Ethiopic, Greek,
+Han (for Chinese and Japanese), Hangul (for Korean), Hebrew, IPA,
+Kannada, Lao, Malayalam, Tamil, Thai, Tibetan, and Vietnamese scripts.
+These features have been merged from the modified version of Emacs
+known as MULE (for ``MULti-lingual Enhancement to GNU Emacs'').
Emacs also supports various encodings of these characters used by
other internationalized software, such as word processors and mailers.
You can insert non-@acronym{ASCII} characters or search for them. To do that,
you can specify an input method (@pxref{Select Input Method}) suitable
for your language, or use the default input method set up when you set
-your language environment. (Emacs input methods are part of the Leim
-package, which must be installed for you to be able to use them.) If
+your language environment. If
your keyboard can produce non-@acronym{ASCII} characters, you can select an
appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs
will accept those characters. Latin-1 characters can also be input by
that cover the whole spectrum of characters.
* Defining Fontsets:: Defining a new fontset.
* Undisplayable Characters:: When characters don't display.
-* Single-Byte Character Support::
- You can pick one European character set
- to use without multibyte characters.
+* Single-Byte Character Support:: You can pick one European character set
+ to use without multibyte characters.
* Charsets:: How Emacs groups its internal character codes.
@end menu
@cindex Euro sign
@cindex UTF-8
@quotation
-Chinese-BIG5, Chinese-CNS, Chinese-GB, Cyrillic-ALT, Cyrillic-ISO,
-Cyrillic-KOI8, Czech, Devanagari, Dutch, English, Ethiopic, German,
-Greek, Hebrew, IPA, Japanese, Korean, Lao, Latin-1, Latin-2, Latin-3,
-Latin-4, Latin-5, Latin-8 (Celtic), Latin-9 (updated Latin-1, with the
-Euro sign), Polish, Romanian, Slovak, Slovenian, Spanish, Thai, Tibetan,
-Turkish, UTF-8 (for a setup which prefers Unicode characters and files
-encoded in UTF-8), and Vietnamese.
+Belarusian, Brazilian Portuguese, Bulgarian, Chinese-BIG5,
+Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Croatian, Cyrillic-ALT,
+Cyrillic-ISO, Cyrillic-KOI8, Czech, Devanagari, Dutch, English,
+Ethiopic, French, Georgian, German, Greek, Hebrew, IPA, Italian,
+Japanese, Kannada, Korean, Lao, Latin-1, Latin-2, Latin-3,
+Latin-4, Latin-5, Latin-6, Latin-7, Latin-8 (Celtic),
+Latin-9 (updated Latin-1 with the Euro sign), Latvian,
+Lithuanian, Malayalam, Polish, Romanian, Russian, Slovak,
+Slovenian, Spanish, Swedish, Tajik, Tamil, Thai, Tibetan,
+Turkish, UTF-8 (for a setup which prefers Unicode characters and
+files encoded in UTF-8), Ukrainian, Vietnamese, Welsh, and
+Windows-1255 (for a setup which prefers Hebrew characters and
+files encoded in Windows-1255).
@end quotation
@cindex fonts for various scripts
To display the script(s) used by your language environment on a
graphical display, you need to have a suitable font. If some of the
characters appear as empty boxes, you should install the GNU Intlfonts
-package, which includes fonts for all supported scripts.@footnote{If
+package, which includes fonts for most supported scripts.@footnote{If
you run Emacs on X, you need to inform the X server about the location
of the newly installed fonts with the following commands:
because it stops waiting for more characters to combine, and starts
searching for what you have already entered.
+ To find out how to input the character after point using the current
+input method, type @kbd{C-u C-x =}. @xref{Position Info}.
+
@vindex input-method-verbose-flag
@vindex input-method-highlight-flag
The variables @code{input-method-highlight-flag} and
possible characters to type next is displayed in the echo area (but
not when you are in the minibuffer).
-@cindex Leim package
- Input methods are implemented in the separate Leim package: they are
-available only if the system administrator used Leim when building
-Emacs. If Emacs was built without Leim, you will find that no input
-methods are defined.
-
@node Select Input Method
@section Selecting an Input Method
actual keyboard layout. To specify which layout your keyboard has, use
the command @kbd{M-x quail-set-keyboard-layout}.
+@findex quail-show-key
+ You can use the command @kbd{M-x quail-show-key} to show what key (or
+key sequence) to type in order to input the character following point,
+using the selected keyboard layout. The command @kbd{C-u C-x =} also
+shows that information in addition to the other information about the
+character.
+
@findex list-input-methods
To display a list of all the supported input methods, type @kbd{M-x
list-input-methods}. The list gives information about each input
@cindex international files from DOS/Windows systems
A special class of coding systems, collectively known as
@dfn{codepages}, is designed to support text encoded by MS-Windows and
-MS-DOS software. To use any of these systems, you need to create it
-with @kbd{M-x codepage-setup}. @xref{MS-DOS and MULE}. After
-creating the coding system for the codepage, you can use it as any
-other coding system. For example, to visit a file encoded in codepage
-850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename}
-@key{RET}}.
+MS-DOS software. The names of these coding systems are
+@code{cp@var{nnnn}}, where @var{nnnn} is a 3- or 4-digit number of the
+codepage. You can use these encodings just like any other coding
+system; for example, to visit a file encoded in codepage 850, type
+@kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename}
+@key{RET}}@footnote{
+In the MS-DOS port of Emacs, you need to create a @code{cp@var{nnn}}
+coding system with @kbd{M-x codepage-setup}, before you can use it.
+@xref{MS-DOS and MULE}.}.
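+
+  As a minimal sketch of what such a coding system does, assuming a
+build in which @code{cp850} is already defined, the following
+expression decodes the single byte 130 (octal 202), which codepage 850
+assigns to a lowercase e with an acute accent.  The byte value is
+chosen only for illustration.
+
+@smallexample
+;; Illustrative only: decode one cp850-encoded byte into a character.
+;; "\202" is a unibyte string holding the raw byte 130.
+(decode-coding-string "\202" 'cp850)
+@end smallexample
+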
In addition to converting various representations of non-@acronym{ASCII}
characters, a coding system can perform end-of-line conversion. Emacs
-@code{china-iso-8bit}, you can execute this Lisp expression:
+@code{chinese-iso-8bit}, you can execute this Lisp expression:
@smallexample
-(modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit)
+(modify-coding-system-alist 'file "\\.txt\\'" 'chinese-iso-8bit)
@end smallexample
@noindent
If Emacs recognizes the encoding of a file incorrectly, you can
reread the file using the correct coding system by typing @kbd{C-x
-@key{RET} c @var{coding-system} @key{RET} M-x revert-buffer
+@key{RET} r @var{coding-system}
@key{RET}}. To see what coding system Emacs actually used to decode
the file, look at the coding system mnemonic letter near the left edge
of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}.
Specify coding system @var{coding} for the immediately following
command.
+@item C-x @key{RET} r @var{coding} @key{RET}
+Revisit the current file using the coding system @var{coding}.
+
@item C-x @key{RET} k @var{coding} @key{RET}
Use coding system @var{coding} for keyboard input.
@item C-x @key{RET} X @var{coding} @key{RET}
Use coding system @var{coding} for transferring @emph{one}
selection---the next one---to or from the window system.
+
+@item M-x recode-region
+Convert the region from a previous coding system to a new one.
@end table
@kindex C-x RET f
variable to a good choice of default coding system for that language
environment.
+@kindex C-x RET r
+@findex revert-buffer-with-coding-system
+ If you visit a file with a wrong coding system, you can correct this
+with @kbd{C-x @key{RET} r} (@code{revert-buffer-with-coding-system}).
+This visits the current file again, using a coding system you specify.
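+
+  The same command can also be called from Lisp.  A minimal sketch,
+assuming the current buffer visits a file that was really encoded in
+@code{iso-latin-1} (the coding system name is only an illustration):
+
+@smallexample
+;; Illustrative only: reread the visited file, decoding it as
+;; iso-latin-1 instead of whatever Emacs guessed.
+(revert-buffer-with-coding-system 'iso-latin-1)
+@end smallexample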
+
@kindex C-x RET t
@findex set-terminal-coding-system
The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system})
The default for translation of process input and output depends on the
current language environment.
+@findex recode-region
+ If a piece of text has already been inserted into a buffer using the
+wrong coding system, you can decode it again using @kbd{M-x
+recode-region}. This prompts you for the old coding system and the
+desired coding system, and acts on the text in the region.
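+
+  Called from Lisp, the function takes the region bounds followed by
+the two coding systems.  The sketch below assumes the argument order
+@var{start} @var{end} @var{new-coding} @var{coding}, and the coding
+system names are only illustrations:
+
+@smallexample
+;; Illustrative only: the region was really UTF-8 but had been
+;; decoded as iso-latin-1; decode it again as UTF-8.
+;; Assumed argument order: START END NEW-CODING CODING.
+(recode-region (region-beginning) (region-end) 'utf-8 'iso-latin-1)
+@end smallexample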
+
@vindex file-name-coding-system
@cindex file names with non-@acronym{ASCII} characters
@findex set-file-name-coding-system
name, or it may get an error. If such a problem happens, use @kbd{C-x
C-w} to specify a new file name for that buffer.
+@findex recode-file-name
+ If a mistake occurs when encoding a file name, use the command
+@kbd{M-x recode-file-name} to change the file name's coding
+system. This prompts for an existing file name, its old coding
+system, and the coding system to which you wish to convert.
+
@vindex locale-coding-system
@cindex decoding non-@acronym{ASCII} keyboard input on X
The variable @code{locale-coding-system} specifies a coding system
non-standard ``extended'' versions of ISO-8859 character sets by using the
function @code{standard-display-8bit} in the @code{disp-table} library.
- There are several ways you can input single-byte non-@acronym{ASCII}
+ There are three ways to input single-byte non-@acronym{ASCII}
characters:
@itemize @bullet
@cindex 8-bit input
+@item
+You can use an input method for the selected language environment.
+@xref{Input Methods}. When you use an input method in a unibyte buffer,
+the non-@acronym{ASCII} character you specify with it is converted to unibyte.
+
@item
If your keyboard can generate character codes 128 (decimal) and up,
representing non-@acronym{ASCII} characters, you can type those character codes
directly.
-On a windowing terminal, you should not need to do anything special to
-use these keys; they should simply work. On a text-only terminal, you
+On a window system, you should not need to do anything special to use
+these keys; they should simply work. On a text-only terminal, you
should use the command @code{M-x set-keyboard-coding-system} or the
-variable @code{keyboard-coding-system} to specify which coding
-system your keyboard uses (@pxref{Specify Coding}). Enabling this
-feature will probably require you to use @kbd{ESC} to type Meta
-characters; however, on a console terminal or in @code{xterm}, you can
-arrange for Meta to be converted to @kbd{ESC} and still be able type
-8-bit characters present directly on the keyboard or using
-@kbd{Compose} or @kbd{AltGr} keys. @xref{User Input}.
-
-@item
-You can use an input method for the selected language environment.
-@xref{Input Methods}. When you use an input method in a unibyte buffer,
-the non-@acronym{ASCII} character you specify with it is converted to unibyte.
+variable @code{keyboard-coding-system} to specify which coding system
+your keyboard uses (@pxref{Specify Coding}). Enabling this feature
+will probably require you to use @kbd{ESC} to type Meta characters;
+however, on a console terminal or in @code{xterm}, you can arrange for
+Meta to be converted to @kbd{ESC} and still be able to type 8-bit
+characters present directly on the keyboard or using @kbd{Compose} or
+@kbd{AltGr} keys. @xref{User Input}.
@kindex C-x 8
@cindex @code{iso-transl} library
@cindex compose character
@cindex dead character
@item
-For Latin-1 only, you can use the
-key @kbd{C-x 8} as a ``compose character'' prefix for entry of
-non-@acronym{ASCII} Latin-1 printing characters. @kbd{C-x 8} is good for
-insertion (in the minibuffer as well as other buffers), for searching,
-and in any other context where a key sequence is allowed.
+For Latin-1 only, you can use the key @kbd{C-x 8} as a ``compose
+character'' prefix for entry of non-@acronym{ASCII} Latin-1 printing
+characters. @kbd{C-x 8} is good for insertion (in the minibuffer as
+well as other buffers), for searching, and in any other context where
+a key sequence is allowed.
@kbd{C-x 8} works by loading the @code{iso-transl} library. Once that
-library is loaded, the @key{ALT} modifier key, if you have one, serves
-the same purpose as @kbd{C-x 8}; use @key{ALT} together with an accent
-character to modify the following letter. In addition, if you have keys
-for the Latin-1 ``dead accent characters,'' they too are defined to
-compose with the following character, once @code{iso-transl} is loaded.
-Use @kbd{C-x 8 C-h} to list the available translations as mnemonic
-command names.
-
-@item
-@cindex @code{iso-acc} library
-@cindex ISO Accents mode
-@findex iso-accents-mode
-@cindex Latin-1, Latin-2 and Latin-3 input mode
-For Latin-1, Latin-2 and Latin-3, @kbd{M-x iso-accents-mode} enables
-a minor mode that works much like the @code{latin-1-prefix} input
-method, but does not depend on having the input methods installed. This
-mode is buffer-local. It can be customized for various languages with
-@kbd{M-x iso-accents-customize}.
+library is loaded, the @key{ALT} modifier key, if the keyboard has
+one, serves the same purpose as @kbd{C-x 8}: use @key{ALT} together
+with an accent character to modify the following letter. In addition,
+if the keyboard has keys for the Latin-1 ``dead accent characters,''
+they too are defined to compose with the following character, once
+@code{iso-transl} is loaded.
+
+Use @kbd{C-x 8 C-h} to list all the available @kbd{C-x 8} translations.
@end itemize
@node Charsets