-@c -*-texinfo-*-
+@c -*- mode: texinfo; coding: utf-8 -*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1998-1999, 2001-2014 Free Software Foundation, Inc.
+@c Copyright (C) 1998-1999, 2001-2015 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@node Non-ASCII Characters
@chapter Non-@acronym{ASCII} Characters
@code{#x110000..#x3FFFFF}, which it uses for representing characters
that are not unified with Unicode and @dfn{raw 8-bit bytes} that
cannot be interpreted as characters. Thus, a character codepoint in
-Emacs is a 22-bit integer number.
+Emacs is a 22-bit integer.
@cindex internal representation of characters
@cindex characters, representation in buffers and strings
characters.
@end defun
-@c FIXME: Should `@var{character}' be `@var{byte}'?
+@c FIXME: Should '@var{character}' be '@var{byte}'?
@defun byte-to-string byte
@cindex byte to string
This function returns a unibyte string containing a single byte of
@defun multibyte-char-to-unibyte char
This converts the multibyte character @var{char} to a unibyte
character, and returns that character. If @var{char} is neither
-@acronym{ASCII} nor eight-bit, the function returns -1.
+@acronym{ASCII} nor eight-bit, the function returns @minus{}1.
@end defun
@defun unibyte-char-to-multibyte char
@item canonical-combining-class
Corresponds to the @code{Canonical_Combining_Class} Unicode property.
-The value is an integer number. For unassigned codepoints, the value
+The value is an integer. For unassigned codepoints, the value
is zero.
@cindex bidirectional class of characters
the brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses
@samp{small}. }; the other elements are characters that give the
compatibility decomposition sequence of this character. For
-unassigned codepoints, the value is the character itself.
+characters that don't have decomposition sequences, and for unassigned
+codepoints, the value is a list with a single member, the character
+itself.
@item decimal-digit-value
Corresponds to the Unicode @code{Numeric_Value} property for
characters whose @code{Numeric_Type} is @samp{Decimal}. The value is
-an integer number. For unassigned codepoints, the value is
-@code{nil}, which means @acronym{NaN}, or ``not-a-number''.
+an integer, or @code{nil} if the character has no decimal digit value.
+For unassigned codepoints, the value is @code{nil}, which means
+@acronym{NaN}, or not a number.
@item digit-value
Corresponds to the Unicode @code{Numeric_Value} property for
characters whose @code{Numeric_Type} is @samp{Digit}. The value is an
-integer number. Examples of such characters include compatibility
-subscript and superscript digits, for which the value is the
-corresponding number. For unassigned codepoints, the value is
-@code{nil}, which means @acronym{NaN}.
+integer. Examples of such characters include compatibility subscript
+and superscript digits, for which the value is the corresponding
+number. For characters that don't have any numeric value, and for
+unassigned codepoints, the value is @code{nil}, which means
+@acronym{NaN}.
@item numeric-value
Corresponds to the Unicode @code{Numeric_Value} property for
characters whose @code{Numeric_Type} is @samp{Numeric}. The value of
-this property is an integer or a floating-point number. Examples of
-characters that have this property include fractions, subscripts,
-superscripts, Roman numerals, currency numerators, and encircled
-numbers. For example, the value of this property for the character
-@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. For
-unassigned codepoints, the value is @code{nil}, which means
-@acronym{NaN}.
+this property is a number. Examples of characters that have this
+property include fractions, subscripts, superscripts, Roman numerals,
+currency numerators, and encircled numbers. For example, the value of
+this property for the character @code{U+2155} (@sc{vulgar fraction one
+fifth}) is @code{0.2}. For characters that don't have any numeric
+value, and for unassigned codepoints, the value is @code{nil}, which
+means @acronym{NaN}.
@cindex mirroring of characters
@item mirrored
(@pxref{Bidirectional Display}). For unassigned codepoints, the value
is @code{nil}.
+@item paired-bracket
+Corresponds to the Unicode @code{Bidi_Paired_Bracket} property. The
+value of this property is the codepoint of a character's @dfn{paired
+bracket}, or @code{nil} if the character is not a bracket character.
+This establishes a mapping between characters that are treated as
+bracket pairs by the Unicode Bidirectional Algorithm; Emacs uses this
+property when it decides how to reorder parentheses, braces, and other
+similar characters for display (@pxref{Bidirectional Display}).
+
+@item bracket-type
+Corresponds to the Unicode @code{Bidi_Paired_Bracket_Type} property.
+For characters whose @code{paired-bracket} property is non-@code{nil},
+the value of this property is a symbol, either @code{o} (for opening
+bracket characters) or @code{c} (for closing bracket characters). For
+characters whose @code{paired-bracket} property is @code{nil}, the
+value is the symbol @code{n} (None). Like @code{paired-bracket}, this
+property is used for bidirectional display.
+
@item old-name
Corresponds to the Unicode @code{Unicode_1_Name} property. The value
-is a string. Unassigned codepoints, and characters that have no value
-for this property, the value is @code{nil}.
+is a string. For unassigned codepoints, and for characters that have
+no value for this property, the value is @code{nil}.
@item iso-10646-comment
Corresponds to the Unicode @code{ISO_Comment} property. The value is
-a string. For unassigned codepoints, the value is an empty string.
+either a string or @code{nil}. For unassigned codepoints, the value
+is @code{nil}.
@item uppercase
Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property.
@defun get-char-code-property char propname
This function returns the value of @var{char}'s @var{propname} property.
-@c FIXME: Use ‘?\s’ instead of ‘? ’ for the space character in the
-@c first example? --xfq
@example
@group
-(get-char-code-property ? 'general-category)
+(get-char-code-property ?\s 'general-category)
@result{} Zs
@end group
@group
-(get-char-code-property ?1 'general-category)
+(get-char-code-property ?1 'general-category)
@result{} Nd
@end group
@group
(get-char-code-property ?\u2163 'numeric-value)
@result{} 4
@end group
+@group
+(get-char-code-property ?\( 'paired-bracket)
+ @result{} 41 ;; closing parenthesis
+@end group
+@group
+(get-char-code-property ?\) 'bracket-type)
+ @result{} c
+@end group
@end example
@end defun
system (@pxref{Coding Systems}).
@end defun
-@c TODO: Explain the properties here and add indexes such as ‘charset property’.
+@c TODO: Explain the properties here and add indexes such as 'charset property'.
@defun charset-plist charset
This function returns the property list of the character set
@var{charset}. Although @var{charset} is a symbol, this is not the
@node Scanning Charsets
@section Scanning for Character Sets
+@cindex scanning for character sets
+@cindex character set, searching
Sometimes it is useful to find out which character set a particular
character belongs to. One use for this is in determining which coding
The optional argument @var{accept-default-p}, if non-@code{nil},
should be a function to determine whether a coding system selected
-without user interaction is acceptable. @code{select-safe-coding-system}
+without user interaction is acceptable. @code{select-safe-coding-system}
calls this function with one argument, the base coding system of the
selected coding system. If @var{accept-default-p} returns @code{nil},
@code{select-safe-coding-system} rejects the silently selected coding
@cindex file contents, and default coding system
@defopt auto-coding-regexp-alist
This variable is an alist of text patterns and corresponding coding
-systems. Each element has the form @code{(@var{regexp}
+systems. Each element has the form @code{(@var{regexp}
. @var{coding-system})}; a file whose first few kilobytes match
@var{regexp} is decoded with @var{coding-system} when its contents are
read into a buffer. The settings in this alist take priority over
@node Specifying Coding Systems
@subsection Specifying a Coding System for One Operation
+@cindex specify coding system
+@cindex force coding system for operation
+@cindex coding system for operation
You can specify the coding system for a specific operation by binding
the variables @code{coding-system-for-read} and/or
@example
@group
(decode-coding-string "Gr\374ss Gott" 'latin-1)
- @result{} #("Gr@"uss Gott" 0 9 (charset iso-8859-1))
+ @result{} #("Grüss Gott" 0 9 (charset iso-8859-1))
@end group
@end example
@end defun
@section Locales
@cindex locale
- POSIX defines a concept of ``locales'' which control which language
+ In POSIX, locales control which language
to use in language-related features. These Emacs variables control
how Emacs interacts with these features.
@cindex keyboard input decoding on X
This variable specifies the coding system to use for decoding system
error messages and---on X Window system only---keyboard input, for
+sending batch output to the standard output and error streams, for
encoding the format argument to @code{format-time-string}, and for
decoding the return value of @code{format-time-string}.
@end defvar