@c This is part of the Emacs manual.
@c Copyright (C) 1997, 1999, 2000, 2001, 2002, 2003, 2004,
-@c 2005, 2006, 2007, 2008 Free Software Foundation, Inc.
+@c 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
@c See file emacs.texi for copying conditions.
@node International, Major Modes, Frames, Top
@chapter International Character Set Support
* Fontsets:: Fontsets are collections of fonts
that cover the whole spectrum of characters.
* Defining Fontsets:: Defining a new fontset.
+* Modifying Fontsets:: Modifying an existing fontset.
* Undisplayable Characters:: When characters don't display.
* Unibyte Mode:: You can pick one European character set
to use without multibyte characters.
The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
to multibyte characters, coding systems, and input methods.
+@kindex C-x =
+@findex what-cursor-position
+ The command @kbd{C-x =} (@code{what-cursor-position}) shows
+information about the character at point. In addition to the
+character position, which was described in @ref{Position Info}, this
+command displays how the character is encoded. For instance, it
+displays the following line in the echo area for the character
+@samp{c}:
+
+@smallexample
+Char: c (99, #o143, #x63) point=28062 of 36168 (78%) column=53
+@end smallexample
+
+ The four values after @samp{Char:} describe the character that
+follows point, first by showing it and then by giving its character
+code in decimal, octal and hex. For a non-@acronym{ASCII} multibyte
+character, these are followed by @samp{file} and the character's
+representation, in hex, in the buffer's coding system, if that coding
+system encodes the character safely and with a single byte
+(@pxref{Coding Systems}). If the character's encoding is longer than
+one byte, Emacs shows @samp{file ...}.
+
+ However, if the character displayed is in the range 0200 through
+0377 octal, it may actually stand for an invalid UTF-8 byte read from
+a file. In Emacs, that byte is represented as a sequence of 8-bit
+characters, but all of them together display as the original invalid
+byte, in octal code. In this case, @kbd{C-x =} shows @samp{part of
+display ...} instead of @samp{file}.
+
+@cindex character set of character at point
+@cindex font of character at point
+@cindex text properties at point
+@cindex face at point
+ With a prefix argument (@kbd{C-u C-x =}), this command displays a
+detailed description of the character in a window:
+
+@itemize @bullet
+@item
+The character set name, and the codes that identify the character
+within that character set; @acronym{ASCII} characters are identified
+as belonging to the @code{ascii} character set.
+
+@item
+The character's syntax and categories.
+
+@item
+The character's encodings, both internally in the buffer, and externally
+if you were to save the file.
+
+@item
+What keys to type to input the character in the current input method
+(if it supports the character).
+
+@item
+If you are running Emacs on a graphical display, the font name and
+glyph code for the character. If you are running Emacs on a text-only
+terminal, the code(s) sent to the terminal.
+
+@item
+The character's text properties (@pxref{Text Properties,,,
+elisp, the Emacs Lisp Reference Manual}), including any non-default
+faces used to display the character, and any overlays containing it
+(@pxref{Overlays,,, elisp, the same manual}).
+@end itemize
+
+ Here's an example showing the Latin-1 character A with grave accent,
+in a buffer whose coding system is @code{utf-8-unix}:
+
+@smallexample
+ character: @`A (192, #o300, #xc0)
+preferred charset: unicode (Unicode (ISO10646))
+ code point: 0xC0
+ syntax: w which means: word
+ category: j:Japanese l:Latin v:Vietnamese
+ buffer code: #xC3 #x80
+ file code: not encodable by coding system undecided-unix
+ display: by this font (glyph code)
+ xft:-unknown-DejaVu Sans Mono-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1 (#x82)
+
+Character code properties: customize what to show
+ name: LATIN CAPITAL LETTER A WITH GRAVE
+ general-category: Lu (Letter, Uppercase)
+ decomposition: (65 768) ('A' '̀')
+ old-name: LATIN CAPITAL LETTER A GRAVE
+
+There are text properties here:
+ auto-composed t
+@end smallexample
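+
+ Similar information can be obtained from Lisp. The following is a
+minimal sketch rather than a description of how the command is
+implemented; it assumes point is just before the character of
+interest, and uses @code{#xC0}, the code of the character shown in the
+example above:
+
+@example
+;; Show the detailed description of the character at point,
+;; like C-u C-x =.
+(describe-char (point))
+
+;; Return the character code of the character following point.
+(char-after)
+
+;; Return the bytes that encode a character in a coding system.
+(encode-coding-char #xC0 'utf-8)
+     @result{} "\303\200"
+@end example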
+
@node Enabling Multibyte
@section Enabling Multibyte Characters
codepage. You can use these encodings just like any other coding
system; for example, to visit a file encoded in codepage 850, type
@kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename}
-@key{RET}}@footnote{
-In the MS-DOS port of Emacs, you need to create a @code{cp@var{nnn}}
-coding system with @kbd{M-x codepage-setup}, before you can use it.
-@iftex
-@xref{MS-DOS and MULE,,,emacs-extra,Specialized Emacs Features}.
-@end iftex
-@ifnottex
-@xref{MS-DOS and MULE}.
-@end ifnottex
-}.
+@key{RET}}.
In addition to converting various representations of non-@acronym{ASCII}
characters, a coding system can perform end-of-line conversion. Emacs
A font typically defines shapes for a single alphabet or script.
Therefore, displaying the entire range of scripts that Emacs supports
requires a collection of many fonts. In Emacs, such a collection is
-called a @dfn{fontset}. A fontset is defined by a list of fonts, each
-assigned to handle a range of character codes.
+called a @dfn{fontset}. A fontset is defined by a list of font specs,
+each assigned to handle a range of character codes, and may fall back
+on another fontset for characters which are not covered by the fonts
+it specifies.
Each fontset has a name, like a font. However, while fonts are
stored in the system and the available font names are defined by the
installation instructions have information on additional font
support.}
- Emacs creates two fontsets automatically: the @dfn{standard fontset}
-and the @dfn{startup fontset}. The standard fontset is most likely to
-have fonts for a wide variety of non-@acronym{ASCII} characters;
-however, this is not the default for Emacs to use. (By default, Emacs
-tries to find a font that has bold and italic variants.) You can
-specify use of the standard fontset with the @samp{-fn} option. For
-example,
+ Emacs creates three fontsets automatically: the @dfn{standard
+fontset}, the @dfn{startup fontset} and the @dfn{default fontset}.
+The default fontset is most likely to have fonts for a wide variety of
+non-@acronym{ASCII} characters, and is the default fallback for the
+other two fontsets; it is also the fallback if you set a default font
+rather than a fontset. However, it does not specify font family names,
+so results can be somewhat unpredictable if you use it directly. You
+can specify use of a specific fontset with the @samp{-fn} option. For
+example,
@example
emacs -fn fontset-standard
You can also specify a fontset with the @samp{Font} resource (@pxref{X
Resources}).
+ If no fontset is specified for use, then Emacs uses an
+@acronym{ASCII} font, with @samp{fontset-default} as a fallback for
+characters the font does not cover. The standard fontset is only used if
+explicitly requested, despite its name.
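+
+ As an illustration only, a fontset can also be requested from your
+init file by naming it in the @code{font} frame parameter. This
+sketch assumes a graphical display on which @samp{fontset-standard}
+has been created:
+
+@example
+;; Use the standard fontset for frames created from now on.
+(add-to-list 'default-frame-alist '(font . "fontset-standard"))
+@end example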
+
A fontset does not necessarily specify a font for every character
code. If a fontset specifies no font for a certain character, or if it
specifies a font that does not exist on your system, then it cannot
@section Defining fontsets
@vindex standard-fontset-spec
+@vindex w32-standard-fontset-spec
+@vindex ns-standard-fontset-spec
@cindex standard fontset
- Emacs creates a standard fontset automatically according to the value
+ When running on X, Emacs creates a standard fontset automatically according to the value
of @code{standard-fontset-spec}. This fontset's name is
@example
@noindent
or just @samp{fontset-standard} for short.
+ On GNUstep and Mac, @samp{fontset-standard} is created using the
+value of @code{ns-standard-fontset-spec}, and on Windows it is
+created using the value of @code{w32-standard-fontset-spec}.
+
Bold, italic, and bold-italic variants of the standard fontset are
created automatically. Their names have @samp{bold} instead of
@samp{medium}, or @samp{i} instead of @samp{r}, or both.
@cindex startup fontset
- If you specify a default @acronym{ASCII} font with the @samp{Font} resource or
-the @samp{-fn} argument, Emacs generates a fontset from it
-automatically. This is the @dfn{startup fontset} and its name is
-@code{fontset-startup}. It does this by replacing the @var{foundry},
-@var{family}, @var{add_style}, and @var{average_width} fields of the
-font name with @samp{*}, replacing @var{charset_registry} field with
-@samp{fontset}, and replacing @var{charset_encoding} field with
-@samp{startup}, then using the resulting string to specify a fontset.
+ Emacs generates a fontset automatically, based on any default
+@acronym{ASCII} font that you specify with the @samp{Font} resource or
+the @samp{-fn} argument, or the default font that Emacs found when it
+started. This is the @dfn{startup fontset} and its name is
+@code{fontset-startup}. Emacs does this by replacing the
+@var{charset_registry} field of the font name with @samp{fontset}, and
+the @var{charset_encoding} field with @samp{startup}, then using the
+resulting string to specify a fontset.
For instance, if you start Emacs this way,
window frame:
@example
--*-*-medium-r-normal-*-14-140-*-*-*-*-fontset-startup
+-*-courier-medium-r-normal-*-14-140-*-*-*-*-fontset-startup
@end example
+ The startup fontset will use the font that you specify, or a variant
+with a different registry and encoding, for all the characters that
+are supported by that font, and fall back on @samp{fontset-default}
+for other characters.
+
With the X resource @samp{Emacs.Font}, you can specify a fontset name
just like an actual font name. But be careful not to specify a fontset
name in a wildcard resource like @samp{Emacs*Font}---that wildcard
@xref{Font X}, for more information about font naming in X.
+@node Modifying Fontsets
+@section Modifying Fontsets
+@cindex fontsets, modifying
+@findex set-fontset-font
+
+ Fontsets do not always have to be created from scratch. If only
+minor changes are required, it may be easier to modify an existing
+fontset. Modifying @samp{fontset-default} also affects other
+fontsets that use it as a fallback, so it can be an effective way of
+fixing problems with the fonts that Emacs chooses for a particular
+script.
+
+ Fontsets can be modified using the function @code{set-fontset-font}.
+You specify the character, charset, script, or range of characters
+whose font is to be changed, and a font spec for the font to be used.
+Some examples are:
+
+@example
+;; Use Liberation Mono for latin-3 charset.
+(set-fontset-font "fontset-default" 'iso-8859-3 "Liberation Mono")
+
+;; Prefer a big5 font for han characters.
+(set-fontset-font "fontset-default" 'han (font-spec :registry "big5")
+ nil 'prepend)
+
+;; Use DejaVu Sans Mono as a fallback in fontset-startup before
+;; resorting to fontset-default.
+(set-fontset-font "fontset-startup" nil "DejaVu Sans Mono" nil 'append)
+
+;; Use MyPrivateFont for the Unicode private use area.
+(set-fontset-font "fontset-default" '(#xe000 . #xf8ff) "MyPrivateFont")
+
+@end example
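+
+ To check the effect of such changes, you can inspect fontsets from
+Lisp. The following is only a sketch; it assumes a graphical display,
+and the values returned depend on the fonts installed on your system.
+The character code @code{#xC0} is chosen here just as an example:
+
+@example
+;; List the names of all fontsets defined in this session.
+(fontset-list)
+
+;; Return the font that fontset-default specifies for the
+;; character with code #xC0, or nil if it specifies none.
+(fontset-font "fontset-default" #xC0)
+@end example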
+
+
@node Undisplayable Characters
@section Undisplayable Characters