@c This is part of the Emacs manual.
-@c Copyright (C) 1997, 1999, 2000, 2001 Free Software Foundation, Inc.
+@c Copyright (C) 1997, 1999, 2000, 2001, 2002, 2003, 2004,
+@c 2005 Free Software Foundation, Inc.
@c See file emacs.texi for copying conditions.
@node International, Major Modes, Frames, Top
@chapter International Character Set Support
@cindex Dutch
@cindex Spanish
Emacs supports a wide variety of international character sets,
-including European variants of the Latin alphabet, as well as Chinese,
-Cyrillic, Devanagari (Hindi and Marathi), Ethiopic, Greek, Hebrew, IPA,
-Japanese, Korean, Lao, Thai, Tibetan, and Vietnamese scripts. These features
-have been merged from the modified version of Emacs known as MULE (for
-``MULti-lingual Enhancement to GNU Emacs'')
+including European and Vietnamese variants of the Latin alphabet, as
+well as Cyrillic, Devanagari (for Hindi and Marathi), Ethiopic, Greek,
+Han (for Chinese and Japanese), Hangul (for Korean), Hebrew, IPA,
+Kannada, Lao, Malayalam, Tamil, Thai, and Tibetan scripts.
+These features have been merged from the modified version of Emacs
+known as MULE (for ``MULti-lingual Enhancement to GNU Emacs'').
Emacs also supports various encodings of these characters used by
other internationalized software, such as word processors and mailers.
You can insert non-@acronym{ASCII} characters or search for them. To do that,
you can specify an input method (@pxref{Select Input Method}) suitable
for your language, or use the default input method set up when you set
-your language environment. (Emacs input methods are part of the Leim
-package, which must be installed for you to be able to use them.) If
+your language environment. If
your keyboard can produce non-@acronym{ASCII} characters, you can select an
appropriate keyboard coding system (@pxref{Specify Coding}), and Emacs
will accept those characters. Latin-1 characters can also be input by
that cover the whole spectrum of characters.
* Defining Fontsets:: Defining a new fontset.
* Undisplayable Characters:: When characters don't display.
-* Single-Byte Character Support::
- You can pick one European character set
- to use without multibyte characters.
+* Single-Byte Character Support:: You can pick one European character set
+ to use without multibyte characters.
* Charsets:: How Emacs groups its internal character codes.
@end menu
@cindex Euro sign
@cindex UTF-8
@quotation
-Chinese-BIG5, Chinese-CNS, Chinese-GB, Cyrillic-ALT, Cyrillic-ISO,
-Cyrillic-KOI8, Czech, Devanagari, Dutch, English, Ethiopic, German,
-Greek, Hebrew, IPA, Japanese, Korean, Lao, Latin-1, Latin-2, Latin-3,
-Latin-4, Latin-5, Latin-8 (Celtic), Latin-9 (updated Latin-1, with the
-Euro sign), Polish, Romanian, Slovak, Slovenian, Spanish, Thai, Tibetan,
-Turkish, UTF-8 (for a setup which prefers Unicode characters and files
-encoded in UTF-8), and Vietnamese.
+Belarusian, Brazilian Portuguese, Bulgarian, Chinese-BIG5,
+Chinese-CNS, Chinese-EUC-TW, Chinese-GB, Croatian, Cyrillic-ALT,
+Cyrillic-ISO, Cyrillic-KOI8, Czech, Devanagari, Dutch, English,
+Ethiopic, French, Georgian, German, Greek, Hebrew, IPA, Italian,
+Japanese, Kannada, Korean, Lao, Latin-1, Latin-2, Latin-3,
+Latin-4, Latin-5, Latin-6, Latin-7, Latin-8 (Celtic),
+Latin-9 (updated Latin-1 with the Euro sign), Latvian,
+Lithuanian, Malayalam, Polish, Romanian, Russian, Slovak,
+Slovenian, Spanish, Swedish, Tajik, Tamil, Thai, Tibetan,
+Turkish, UTF-8 (for a setup which prefers Unicode characters and
+files encoded in UTF-8), Ukrainian, Vietnamese, Welsh, and
+Windows-1255 (for a setup which prefers Hebrew characters and
+files encoded in Windows-1255).
@end quotation
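+
+ To select a language environment from your init file rather than
+interactively, you can call @code{set-language-environment} with one
+of the names listed above. The Latin-9 environment used here is only
+an example; substitute the name that suits you:
+
+@smallexample
+;; @r{Select the Latin-9 language environment at startup.}
+;; @r{The environment name is only an illustrative choice.}
+(set-language-environment "Latin-9")
+@end smallexample
+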
@cindex fonts for various scripts
To display the script(s) used by your language environment on a
graphical display, you need to have a suitable font. If some of the
characters appear as empty boxes, you should install the GNU Intlfonts
-package, which includes fonts for all supported scripts.@footnote{If
+package, which includes fonts for most supported scripts.@footnote{If
you run Emacs on X, you need to inform the X server about the location
of the newly installed fonts with the following commands:
because it stops waiting for more characters to combine, and starts
searching for what you have already entered.
+ To find out how to input the character after point using the current
+input method, type @kbd{C-u C-x =}. @xref{Position Info}.
+
@vindex input-method-verbose-flag
@vindex input-method-highlight-flag
The variables @code{input-method-highlight-flag} and
the command @kbd{M-x quail-set-keyboard-layout}.
@findex quail-show-key
- You can use the command @kbd{M-x quail-show-key} to show what key
-(or key sequence) to type in order to input the character following
-point, using the selected keyboard layout.
+ You can use the command @kbd{M-x quail-show-key} to show what key (or
+key sequence) to type in order to input the character following point,
+using the selected keyboard layout. The command @kbd{C-u C-x =} also
+shows this, along with other information about the character.
@findex list-input-methods
To display a list of all the supported input methods, type @kbd{M-x
@cindex international files from DOS/Windows systems
A special class of coding systems, collectively known as
@dfn{codepages}, is designed to support text encoded by MS-Windows and
-MS-DOS software. To use any of these systems, you need to create it
-with @kbd{M-x codepage-setup}. @xref{MS-DOS and MULE}. After
-creating the coding system for the codepage, you can use it as any
-other coding system. For example, to visit a file encoded in codepage
-850, type @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename}
-@key{RET}}.
+MS-DOS software. The names of these coding systems are
+@code{cp@var{nnnn}}, where @var{nnnn} is the 3- or 4-digit codepage
+number. You can use these encodings just like any other coding
+system; for example, to visit a file encoded in codepage 850, type
+@kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename}
+@key{RET}}@footnote{
+In the MS-DOS port of Emacs, you need to create a @code{cp@var{nnnn}}
+coding system with @kbd{M-x codepage-setup} before you can use it.
+@xref{MS-DOS and MULE}.}.
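+
+ The same thing can be done from Lisp by binding the variable
+@code{coding-system-for-read} around a call to @code{find-file}. The
+following sketch assumes a file named @file{dosfile.txt}, a
+hypothetical name, that is encoded in codepage 850:
+
+@smallexample
+;; @r{Decode this one file as codepage 850, overriding whatever}
+;; @r{Emacs would otherwise detect.  The file name is hypothetical.}
+(let ((coding-system-for-read 'cp850))
+  (find-file "dosfile.txt"))
+@end smallexample
+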
In addition to converting various representations of non-@acronym{ASCII}
characters, a coding system can perform end-of-line conversion. Emacs
-@code{china-iso-8bit}, you can execute this Lisp expression:
+@code{chinese-iso-8bit}, you can execute this Lisp expression:
@smallexample
-(modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit)
+(modify-coding-system-alist 'file "\\.txt\\'" 'chinese-iso-8bit)
@end smallexample
@noindent
If Emacs recognizes the encoding of a file incorrectly, you can
reread the file using the correct coding system by typing @kbd{C-x
-@key{RET} c @var{coding-system} @key{RET} M-x revert-buffer
+@key{RET} r @var{coding-system}
@key{RET}}. To see what coding system Emacs actually used to decode
the file, look at the coding system mnemonic letter near the left edge
of the mode line (@pxref{Mode Line}), or type @kbd{C-h C @key{RET}}.
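+
+ @kbd{C-x @key{RET} r} runs the command
+@code{revert-buffer-with-coding-system}, which you can also call from
+Lisp. For instance, assuming you know the file is really encoded in
+UTF-8 (the coding system is only an example):
+
+@smallexample
+;; @r{Reread the current buffer's file, decoding it as UTF-8.}
+;; @r{As with revert-buffer, Emacs may ask for confirmation.}
+(revert-buffer-with-coding-system 'utf-8)
+@end smallexample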
@item C-x @key{RET} X @var{coding} @key{RET}
Use coding system @var{coding} for transferring @emph{one}
selection---the next one---to or from the window system.
+
+@item M-x recode-region
+Re-decode the region, which was decoded using the wrong coding system,
+using the correct coding system instead.
@end table
@kindex C-x RET f
The default for translation of process input and output depends on the
current language environment.
+@findex recode-region
+ If a piece of text has already been inserted into a buffer using the
+wrong coding system, you can decode it again using @kbd{M-x
+recode-region}. This prompts you for the proper coding system, then
+for the wrong coding system that was actually used, and re-decodes the
+text in the region.
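+
+ For example, if some text was really encoded in UTF-8 but Emacs
+decoded it as Latin-1 (both coding systems are only illustrative), you
+could re-decode the region with the following Lisp expression. It
+passes the proper coding system first and the wrongly used one second,
+in the same order as the interactive prompts:
+
+@smallexample
+;; @r{Arguments: START, END, the proper coding system, and the}
+;; @r{coding system that was wrongly used to decode the text.}
+(recode-region (region-beginning) (region-end) 'utf-8 'iso-latin-1)
+@end smallexample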
+
@vindex file-name-coding-system
@cindex file names with non-@acronym{ASCII} characters
@findex set-file-name-coding-system
name, or it may get an error. If such a problem happens, use @kbd{C-x
C-w} to specify a new file name for that buffer.
+@findex recode-file-name
+ If a mistake occurs when encoding a file name, use the command
+@kbd{M-x recode-file-name} to change the file name's coding
+system. This prompts for an existing file name, its old coding
+system, and the coding system to which you wish to convert.
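+
+ As an illustration, suppose a file's name was encoded using Latin-1
+when it should have been encoded in UTF-8. The file name and both
+coding systems below are hypothetical placeholders:
+
+@smallexample
+;; @r{Arguments: FILE, its current (wrong) coding system, and the}
+;; @r{coding system to convert the name to.  Names are hypothetical.}
+(recode-file-name "~/notes/old-name.txt" 'iso-latin-1 'utf-8)
+@end smallexample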
+
@vindex locale-coding-system
@cindex decoding non-@acronym{ASCII} keyboard input on X
The variable @code{locale-coding-system} specifies a coding system
Meta to be converted to @kbd{ESC} and still be able type 8-bit
characters present directly on the keyboard or using @kbd{Compose} or
@kbd{AltGr} keys. @xref{User Input}.
+
+@kindex C-x 8
+@cindex @code{iso-transl} library
+@cindex compose character
+@cindex dead character
+@item
+For Latin-1 only, you can use the key @kbd{C-x 8} as a ``compose
+character'' prefix for entry of non-@acronym{ASCII} Latin-1 printing
+characters. @kbd{C-x 8} is good for insertion (in the minibuffer as
+well as other buffers), for searching, and in any other context where
+a key sequence is allowed.
+
+@kbd{C-x 8} works by loading the @code{iso-transl} library. Once that
+library is loaded, the @key{ALT} modifier key, if the keyboard has
+one, serves the same purpose as @kbd{C-x 8}: use @key{ALT} together
+with an accent character to modify the following letter. In addition,
+if the keyboard has keys for the Latin-1 ``dead accent characters,''
+they too are defined to compose with the following character, once
+@code{iso-transl} is loaded.
+
+Use @kbd{C-x 8 C-h} to list all the available @kbd{C-x 8} translations.
@end itemize
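+
+ If you want the @key{ALT} and dead-key bindings provided by
+@code{iso-transl} to be available before you first type @kbd{C-x 8},
+you can load the library explicitly from your init file:
+
+@smallexample
+;; @r{Load iso-transl at startup instead of on first use of C-x 8.}
+(require 'iso-transl)
+@end smallexample
+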
@node Charsets