X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/8dd59f01de203f3f02c3f898a7015bb522a0e4bc..d3b8292706d967e022e96ccd458cad47e095e0fd:/doc/emacs/mule.texi
diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index 504e28acc8..a622722f1c 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -1,9 +1,11 @@
 @c This is part of the Emacs manual.
 @c Copyright (C) 1997, 1999, 2000, 2001, 2002, 2003, 2004,
-@c 2005, 2006, 2007, 2008 Free Software Foundation, Inc.
+@c 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
 @c See file emacs.texi for copying conditions.
 @node International, Major Modes, Frames, Top
 @chapter International Character Set Support
+@c This node is referenced in the tutorial. When renaming or deleting
+@c it, the tutorial needs to be adjusted. (TUTORIAL.de)
 @cindex MULE
 @cindex international scripts
 @cindex multibyte characters
@@ -101,6 +103,7 @@ to make sure Emacs interprets keyboard input correctly; see
 * Fontsets:: Fontsets are collections of fonts
 that cover the whole spectrum of characters.
 * Defining Fontsets:: Defining a new fontset.
+* Modifying Fontsets:: Modifying an existing fontset.
 * Undisplayable Characters:: When characters don't display.
 * Unibyte Mode:: You can pick one European character set
 to use without multibyte characters.
@@ -140,6 +143,95 @@ language, to make it convenient to type them.
  The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
 to multibyte characters, coding systems, and input methods.
 
+@kindex C-x =
+@findex what-cursor-position
+ The command @kbd{C-x =} (@code{what-cursor-position}) shows
+information about the character at point. In addition to the
+character position, which was described in @ref{Position Info}, this
+command displays how the character is encoded.
For instance, it
+displays the following line in the echo area for the character
+@samp{c}:
+
+@smallexample
+Char: c (99, #o143, #x63) point=28062 of 36168 (78%) column=53
+@end smallexample
+
+ The four values after @samp{Char:} describe the character that
+follows point, first by showing it and then by giving its character
+code in decimal, octal and hex. For a non-@acronym{ASCII} multibyte
+character, these are followed by @samp{file} and the character's
+representation, in hex, in the buffer's coding system, if that coding
+system encodes the character safely and with a single byte
+(@pxref{Coding Systems}). If the character's encoding is longer than
+one byte, Emacs shows @samp{file ...}.
+
+ However, if the character displayed is in the range 0200 through
+0377 octal, it may actually stand for an invalid UTF-8 byte read from
+a file. In Emacs, that byte is represented as a sequence of 8-bit
+characters, but all of them together display as the original invalid
+byte, in octal code. In this case, @kbd{C-x =} shows @samp{part of
+display ...} instead of @samp{file}.
+
+@cindex character set of character at point
+@cindex font of character at point
+@cindex text properties at point
+@cindex face at point
+ With a prefix argument (@kbd{C-u C-x =}), this command displays a
+detailed description of the character in a window:
+
+@itemize @bullet
+@item
+The character set name, and the codes that identify the character
+within that character set; @acronym{ASCII} characters are identified
+as belonging to the @code{ascii} character set.
+
+@item
+The character's syntax and categories.
+
+@item
+The character's encodings, both internally in the buffer, and externally
+if you were to save the file.
+
+@item
+What keys to type to input the character in the current input method
+(if it supports the character).
+
+@item
+If you are running Emacs on a graphical display, the font name and
+glyph code for the character.
If you are running Emacs on a text-only
+terminal, the code(s) sent to the terminal.
+
+@item
+The character's text properties (@pxref{Text Properties,,,
+elisp, the Emacs Lisp Reference Manual}), including any non-default
+faces used to display the character, and any overlays containing it
+(@pxref{Overlays,,, elisp, the same manual}).
+@end itemize
+
+ Here's an example showing the Latin-1 character A with grave accent,
+in a buffer whose coding system is @code{utf-8-unix}:
+
+@smallexample
+ character: @`A (192, #o300, #xc0)
+preferred charset: unicode (Unicode (ISO10646))
+ code point: 0xC0
+ syntax: w which means: word
+ category: j:Japanese l:Latin v:Vietnamese
+ buffer code: #xC3 #x80
+ file code: not encodable by coding system undecided-unix
+ display: by this font (glyph code)
+ xft:-unknown-DejaVu Sans Mono-normal-normal-normal-*-13-*-*-*-m-0-iso10646-1 (#x82)
+
+Character code properties: customize what to show
+ name: LATIN CAPITAL LETTER A WITH GRAVE
+ general-category: Lu (Letter, Uppercase)
+ decomposition: (65 768) ('A' '̀')
+ old-name: LATIN CAPITAL LETTER A GRAVE
+
+There are text properties here:
+ auto-composed t
+@end smallexample
+
 @node Enabling Multibyte
 @section Enabling Multibyte Characters
 
@@ -620,16 +712,7 @@ MS-DOS software. The names of these coding systems are
 codepage. You can use these encodings just like any other coding
 system; for example, to visit a file encoded in codepage 850, type
 @kbd{C-x @key{RET} c cp850 @key{RET} C-x C-f @var{filename}
-@key{RET}}@footnote{
-In the MS-DOS port of Emacs, you need to create a @code{cp@var{nnn}}
-coding system with @kbd{M-x codepage-setup}, before you can use it.
-@iftex
-@xref{MS-DOS and MULE,,,emacs-extra,Specialized Emacs Features}.
-@end iftex
-@ifnottex
-@xref{MS-DOS and MULE}.
-@end ifnottex
-}.
+@key{RET}}.
 
  In addition to converting various representations of non-@acronym{ASCII}
 characters, a coding system can perform end-of-line conversion.
Emacs
@@ -1251,8 +1334,10 @@ non-graphic characters.
  A font typically defines shapes for a single alphabet or script.
 Therefore, displaying the entire range of scripts that Emacs supports
 requires a collection of many fonts. In Emacs, such a collection is
-called a @dfn{fontset}. A fontset is defined by a list of fonts, each
-assigned to handle a range of character codes.
+called a @dfn{fontset}. A fontset is defined by a list of font specs,
+each assigned to handle a range of character codes, and may fall back
+on another fontset for characters which are not covered by the fonts
+it specifies.
 
  Each fontset has a name, like a font. However, while fonts are
 stored in the system and the available font names are defined by the
@@ -1265,13 +1350,14 @@ use for them has no font for those characters.@footnote{The Emacs
 installation instructions have information on additional font
 support.}
 
- Emacs creates two fontsets automatically: the @dfn{standard fontset}
-and the @dfn{startup fontset}. The standard fontset is most likely to
-have fonts for a wide variety of non-@acronym{ASCII} characters;
-however, this is not the default for Emacs to use. (By default, Emacs
-tries to find a font that has bold and italic variants.) You can
-specify use of the standard fontset with the @samp{-fn} option. For
-example,
+ Emacs creates three fontsets automatically: the @dfn{standard
+fontset}, the @dfn{startup fontset} and the @dfn{default fontset}.
+The default fontset is most likely to have fonts for a wide variety of
+non-@acronym{ASCII} characters and is the default fallback for the
+other two fontsets, and if you set a default font rather than fontset.
+However it does not specify font family names, so results can be
+somewhat random if you use it directly. You can specify use of a
+specific fontset with the @samp{-fn} option.
For example,
 
 @example
 emacs -fn fontset-standard
@@ -1281,6 +1367,11 @@ emacs -fn fontset-standard
  You can also specify a fontset with the @samp{Font} resource (@pxref{X
 Resources}).
 
+ If no fontset is specified for use, then Emacs uses an
+@acronym{ASCII} font, with @samp{fontset-default} as a fallback for
+characters the font does not cover. The standard fontset is only used if
+explicitly requested, despite its name.
+
  A fontset does not necessarily specify a font for every character
 code. If a fontset specifies no font for a certain character, or if it
 specifies a font that does not exist on your system, then it cannot
@@ -1291,8 +1382,10 @@ empty box instead.
 @section Defining fontsets
 
 @vindex standard-fontset-spec
+@vindex w32-standard-fontset-spec
+@vindex ns-standard-fontset-spec
 @cindex standard fontset
- Emacs creates a standard fontset automatically according to the value
+ When running on X, Emacs creates a standard fontset automatically according to the value
 of @code{standard-fontset-spec}. This fontset's name is
 
 @example
@@ -1302,19 +1395,23 @@ of @code{standard-fontset-spec}. This fontset's name is
 @noindent
 or just @samp{fontset-standard} for short.
 
+ On GNUstep and Mac, fontset-standard is created using the value of
+@code{ns-standard-fontset-spec}, and on Windows it is
+created using the value of @code{w32-standard-fontset-spec}.
+
  Bold, italic, and bold-italic variants of the standard fontset are
 created automatically. Their names have @samp{bold} instead of
 @samp{medium}, or @samp{i} instead of @samp{r}, or both.
 
 @cindex startup fontset
- If you specify a default @acronym{ASCII} font with the @samp{Font} resource or
-the @samp{-fn} argument, Emacs generates a fontset from it
-automatically. This is the @dfn{startup fontset} and its name is
-@code{fontset-startup}.
It does this by replacing the @var{foundry},
-@var{family}, @var{add_style}, and @var{average_width} fields of the
-font name with @samp{*}, replacing @var{charset_registry} field with
-@samp{fontset}, and replacing @var{charset_encoding} field with
-@samp{startup}, then using the resulting string to specify a fontset.
+ Emacs generates a fontset automatically, based on any default
+@acronym{ASCII} font that you specify with the @samp{Font} resource or
+the @samp{-fn} argument, or the default font that Emacs found when it
+started. This is the @dfn{startup fontset} and its name is
+@code{fontset-startup}. It does this by replacing the
+@var{charset_registry} field with @samp{fontset}, and replacing
+@var{charset_encoding} field with @samp{startup}, then using the
+resulting string to specify a fontset.
 
  For instance, if you start Emacs this way,
 
@@ -1327,9 +1424,14 @@ Emacs generates the following fontset and uses it for the initial X
 window frame:
 
 @example
--*-*-medium-r-normal-*-14-140-*-*-*-*-fontset-startup
+-*-courier-medium-r-normal-*-14-140-*-*-*-*-fontset-startup
 @end example
 
+ The startup fontset will use the font that you specify or a variant
+with a different registry and encoding for all the characters which
+are supported by that font, and fallback on @samp{fontset-default} for
+other characters.
+
  With the X resource @samp{Emacs.Font}, you can specify a fontset name
 just like an actual font name. But be careful not to specify a fontset
 name in a wildcard resource like @samp{Emacs*Font}---that wildcard
@@ -1414,6 +1516,41 @@ call this function explicitly to create a fontset.
 
  @xref{Font X}, for more information about font naming in X.
 
+@node Modifying Fontsets
+@section Modifying Fontsets
+@cindex fontsets, modifying
+@findex set-fontset-font
+
+ Fontsets do not always have to be created from scratch. If only
+minor changes are required it may be easier to modify an existing
+fontset.
Modifying @samp{fontset-default} will also affect other
+fontsets that use it as a fallback, so can be an effective way of
+fixing problems with the fonts that Emacs chooses for a particular
+script.
+
+Fontsets can be modified using the function @code{set-fontset-font},
+specifying a character, a charset, a script, or a range of characters
+to modify the font for, and a font-spec for the font to be used. Some
+examples are:
+
+@example
+;; Use Liberation Mono for latin-3 charset.
+(set-fontset-font "fontset-default" 'iso-8859-3 "Liberation Mono")
+
+;; Prefer a big5 font for han characters
+(set-fontset-font "fontset-default" 'han (font-spec :registry "big5")
+ nil 'prepend)
+
+;; Use DejaVu Sans Mono as a fallback in fontset-startup before
+;; resorting to fontset-default.
+(set-fontset-font "fontset-startup" nil "DejaVu Sans Mono" nil 'append)
+
+;; Use MyPrivateFont for the Unicode private use area.
+(set-fontset-font "fontset-default" '(#xe000 . #xf8ff) "MyPrivateFont")
+
+@end example
+
+
 @node Undisplayable Characters
 @section Undisplayable Characters