@c This is part of the Emacs manual.
-@c Copyright (C) 1997, 1999, 2000, 2001, 2002, 2003, 2004,
-@c 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+@c Copyright (C) 1997, 1999-2011 Free Software Foundation, Inc.
@c See file emacs.texi for copying conditions.
@node International, Major Modes, Frames, Top
@chapter International Character Set Support
incorrectly, refer to @ref{Undisplayable Characters}, which describes
possible problems and explains how to solve them.
+@item
+Characters from scripts whose natural ordering of text is from right
+to left are reordered for display (@pxref{Bidirectional Editing}).
+These scripts include Arabic, Hebrew, Syriac, Thaana, and a few
+others.
+
@item
You can insert non-@acronym{ASCII} characters or search for them. To do that,
you can specify an input method (@pxref{Select Input Method}) suitable
* Unibyte Mode:: You can pick one European character set
to use without multibyte characters.
* Charsets:: How Emacs groups its internal character codes.
+* Bidirectional Editing:: Support for right-to-left scripts.
@end menu
@node International Chars
Character code properties: customize what to show
name: LATIN CAPITAL LETTER A WITH GRAVE
general-category: Lu (Letter, Uppercase)
- decomposition: (65 768) ('A' '̀')
+ decomposition: (65 768) ('A' '`')
old-name: LATIN CAPITAL LETTER A GRAVE
There are text properties here:
@cindex Intlfonts package, installation
To display the script(s) used by your language environment on a
graphical display, you need to have a suitable font. If some of the
-characters appear as empty boxes, you should install the GNU Intlfonts
-package, which includes fonts for most supported scripts.@footnote{If
-you run Emacs on X, you need to inform the X server about the location
-of the newly installed fonts with the following commands:
+characters appear as empty boxes or hex codes, you should install the
+GNU Intlfonts package, which includes fonts for most supported
+scripts.@footnote{If you run Emacs on X, you need to inform the X
+server about the location of the newly installed fonts with the
+following commands:
@example
xset fp+ /usr/local/share/emacs/fonts
the end-of-line conversion, and leave the character code conversion to
be deduced from the text itself.
+@cindex @code{raw-text}, coding system
The coding system @code{raw-text} is good for a file which is mainly
@acronym{ASCII} text, but may contain byte values above 127 which are
not meant to encode non-@acronym{ASCII} characters. With
encountered, and has the usual three variants to specify the kind of
end-of-line conversion to use.
+@cindex @code{no-conversion}, coding system
In contrast, the coding system @code{no-conversion} specifies no
character code conversion at all---none for non-@acronym{ASCII} byte values and
none for end of line. This is useful for reading or writing binary
@code{no-conversion}, and also suppresses other Emacs features that
might convert the file contents before you see them. @xref{Visiting}.
+@cindex @code{emacs-internal}, coding system
The coding system @code{emacs-internal} (or @code{utf-8-emacs},
which is equivalent) means that the file contains non-@acronym{ASCII}
characters stored with the internal Emacs encoding. This coding
@section Specifying a File's Coding System
If Emacs recognizes the encoding of a file incorrectly, you can
-reread the file using the correct coding system by typing @kbd{C-x
-@key{RET} r @var{coding-system} @key{RET}}. To see what coding system
-Emacs actually used to decode the file, look at the coding system
-mnemonic letter near the left edge of the mode line (@pxref{Mode
-Line}), or type @kbd{C-h C @key{RET}}.
+reread the file using the correct coding system with @kbd{C-x
+@key{RET} r} (@code{revert-buffer-with-coding-system}). This command
+prompts for the coding system to use. To see what coding system Emacs
+actually used to decode the file, look at the coding system mnemonic
+letter near the left edge of the mode line (@pxref{Mode Line}), or
+type @kbd{C-h C} (@code{describe-coding-system}).
@vindex coding
You can specify the coding system for a particular file in the file
@table @kbd
@item C-x @key{RET} f @var{coding} @key{RET}
-Use coding system @var{coding} for saving or revisiting the visited
-file in the current buffer.
+Use coding system @var{coding} to save or revisit the visited file in
+the current buffer (@code{set-buffer-file-coding-system}).
@item C-x @key{RET} c @var{coding} @key{RET}
Specify coding system @var{coding} for the immediately following
-command.
+command (@code{universal-coding-system-argument}).
@item C-x @key{RET} r @var{coding} @key{RET}
-Revisit the current file using the coding system @var{coding}.
+Revisit the current file using the coding system @var{coding}
+(@code{revert-buffer-with-coding-system}).
@item M-x recode-region @key{RET} @var{right} @key{RET} @var{wrong} @key{RET}
Convert a region that was decoded using coding system @var{wrong},
@table @kbd
@item C-x @key{RET} x @var{coding} @key{RET}
Use coding system @var{coding} for transferring selections to and from
-other window-based applications.
+other window-based applications (@code{set-selection-coding-system}).
@item C-x @key{RET} X @var{coding} @key{RET}
Use coding system @var{coding} for transferring @emph{one}
-selection---the next one---to or from another window-based application.
+selection---the next one---to or from another window-based application
+(@code{set-next-selection-coding-system}).
@item C-x @key{RET} p @var{input-coding} @key{RET} @var{output-coding} @key{RET}
Use coding systems @var{input-coding} and @var{output-coding} for
-subprocess input and output in the current buffer.
-
-@item C-x @key{RET} c @var{coding} @key{RET}
-Specify coding system @var{coding} for the immediately following
-command.
+subprocess input and output in the current buffer
+(@code{set-buffer-process-coding-system}).
@end table
@kindex C-x RET x
and from a particular subprocess by giving the command in the
corresponding buffer.
- You can also use @kbd{C-x @key{RET} c} just before the command that
-runs or starts a subprocess, to specify the coding system to use for
-communication with that subprocess.
+ You can also use @kbd{C-x @key{RET} c}
+(@code{universal-coding-system-argument}) just before the command that
+runs or starts a subprocess, to specify the coding system for
+communicating with that subprocess. @xref{Text Coding}.
The default for translation of process input and output depends on the
current language environment.
specified above, whose value is nonempty is the one that determines
the text representation.)
-@vindex x-select-request-type
- The variable @code{x-select-request-type} specifies a selection data
-type of selection to request from the X server. The default value is
-@code{nil}, which means Emacs tries @code{COMPOUND_TEXT} and
-@code{UTF8_STRING}, and uses whichever result seems more appropriate.
-You can explicitly specify the data type by setting the variable to
-one of the symbols @code{COMPOUND_TEXT}, @code{UTF8_STRING},
-@code{STRING} and @code{TEXT}.
-
@node File Name Coding
@section Coding Systems for File Names
@table @kbd
@item C-x @key{RET} F @var{coding} @key{RET}
Use coding system @var{coding} for encoding and decoding file
-@emph{names}.
+@emph{names} (@code{set-file-name-coding-system}).
@end table
@vindex file-name-coding-system
@table @kbd
@item C-x @key{RET} k @var{coding} @key{RET}
-Use coding system @var{coding} for keyboard input.
+Use coding system @var{coding} for keyboard input
+(@code{set-keyboard-coding-system}).
@item C-x @key{RET} t @var{coding} @key{RET}
-Use coding system @var{coding} for terminal output.
+Use coding system @var{coding} for terminal output
+(@code{set-terminal-coding-system}).
@end table
@kindex C-x RET t
explicitly requested, despite its name.
A fontset does not necessarily specify a font for every character
-code. If a fontset specifies no font for a certain character, or if it
-specifies a font that does not exist on your system, then it cannot
-display that character properly. It will display that character as an
-empty box instead.
+code. If a fontset specifies no font for a certain character, or if
+it specifies a font that does not exist on your system, then it cannot
+display that character properly. It will display that character as a
+hex code, a thin space, or an empty box instead. (@xref{Text Display, ,
+glyphless characters}, for details.)
@node Defining Fontsets
@section Defining fontsets
point before it and type @kbd{C-u C-x =} (@pxref{International
Chars}).
-@ignore
- arch-tag: 310ba60d-31ef-4ce7-91f1-f282dd57b6b3
-@end ignore
+@node Bidirectional Editing
+@section Bidirectional Editing
+@cindex bidirectional editing
+@cindex right-to-left text
+
+ Emacs supports editing text written in scripts, such as Arabic and
+Hebrew, whose natural ordering of horizontal text for display is from
+right to left. However, digits and Latin text embedded in these
+scripts are still displayed left to right. It is also not uncommon to
+have small portions of text in Arabic or Hebrew embedded in an otherwise
+Latin document, e.g., as comments and strings in a program source
+file. For these reasons, text that uses these scripts is actually
+@dfn{bidirectional}: a mixture of runs of left-to-right and
+right-to-left characters.
+
+ This section describes the facilities and options provided by Emacs
+for editing bidirectional text.
+
+@cindex logical order
+@cindex visual order
+ Emacs stores right-to-left and bidirectional text in the so-called
+@dfn{logical} (or @dfn{reading}) order: the buffer or string position
+of the first character you read precedes that of the next character.
+Reordering of bidirectional text into the @dfn{visual} order happens
+at display time. As a result, character positions no longer increase
+monotonically with their positions on display. Emacs implements the
+Unicode Bidirectional Algorithm, described in Unicode Standard
+Annex #9, to reorder bidirectional text for display.
+
+@vindex bidi-display-reordering
+ The buffer-local variable @code{bidi-display-reordering} controls
+whether text in the buffer is reordered for display. If its value is
+non-@code{nil}, Emacs reorders characters that have right-to-left
+directionality when they are displayed. The default value is
+@code{nil}.
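+
+  As a minimal sketch, assuming you want to turn reordering on
+yourself rather than rely on the default, you could evaluate one of
+the following forms, for example in your init file:
+
+@example
+;; Enable reordering in the current buffer only.
+(setq bidi-display-reordering t)
+
+;; Enable reordering by default in all buffers.
+(setq-default bidi-display-reordering t)
+@end example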
+
+ Each paragraph of bidirectional text can have its own @dfn{base
+direction}, either right-to-left or left-to-right. (Paragraph
+boundaries are defined by the regular expressions
+@code{paragraph-start} and @code{paragraph-separate}, see
+@ref{Paragraphs}.) Text in left-to-right paragraphs begins at the
+left margin of the window and is truncated or continued when it
+reaches the right margin. By contrast, text in right-to-left
+paragraphs begins at the right margin and is continued or truncated at
+the left margin.
+
+@vindex bidi-paragraph-direction
+ Emacs determines the base direction of each paragraph dynamically,
+based on the text at the beginning of the paragraph. However,
+sometimes a buffer may need to force a certain base direction for its
+paragraphs. The variable @code{bidi-paragraph-direction}, if
+non-@code{nil}, disables the dynamic determination of the base
+direction, and instead forces all paragraphs in the buffer to have the
+direction specified by its buffer-local value. The value can be either
+@code{right-to-left} or @code{left-to-right}. Any other value is
+interpreted as @code{nil}.
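+
+  For example, a sketch that forces left-to-right paragraphs in the
+current buffer, and then restores dynamic detection, might look like
+this:
+
+@example
+;; Force every paragraph in this buffer to be left-to-right.
+(setq bidi-paragraph-direction 'left-to-right)
+
+;; Go back to determining the direction from the text.
+(setq bidi-paragraph-direction nil)
+@end example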
+
+@cindex LRM
+@cindex RLM
+ Alternatively, you can control the base direction of a paragraph by
+inserting special formatting characters in front of the paragraph.
+The special character @code{RIGHT-TO-LEFT MARK}, or @sc{rlm}, forces
+the right-to-left direction on the following paragraph, while
+@code{LEFT-TO-RIGHT MARK}, or @sc{lrm}, forces the left-to-right
+direction. (You can use @kbd{C-x 8 @key{RET}} to insert these characters.)
+In a GUI session, the @sc{lrm} and @sc{rlm} characters display as
+blanks.
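+
+  These characters can also be inserted from Lisp by code point. The
+following sketch assumes that point is at the beginning of the
+paragraph whose direction you want to force:
+
+@example
+;; Evaluate one of these forms, not both.
+(insert #x200F)   ; U+200F RIGHT-TO-LEFT MARK
+(insert #x200E)   ; U+200E LEFT-TO-RIGHT MARK
+@end example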
+
+ Because characters are reordered for display, Emacs commands that
+operate in the logical order or on stretches of buffer positions may
+produce unusual effects. For example, the @kbd{C-f} and @kbd{C-b}
+commands move point in the logical order, so the cursor will sometimes
+jump when point traverses reordered bidirectional text. Similarly, a
+highlighted region covering a contiguous range of character positions
+may look discontinuous if the region spans reordered text. This is
+normal and similar to the behavior of other programs that support
+bidirectional text.