-@c -*-texinfo-*-
+@c -*- mode: texinfo; coding: utf-8 -*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1998-1999, 2001-2015 Free Software Foundation, Inc.
+@c Copyright (C) 1998-1999, 2001-2016 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@node Non-ASCII Characters
@chapter Non-@acronym{ASCII} Characters
belong to the same character.
@end defun
+@cindex convert file byte to buffer position
+@cindex convert buffer position to file byte
+ The following two functions are useful when a Lisp program needs to
+map buffer positions to byte offsets in a file visited by the buffer.
+
+@defun bufferpos-to-filepos position &optional quality coding-system
+This function is similar to @code{position-bytes}, but instead of the
+byte position in the current buffer, it returns the offset, from the
+beginning of the current buffer's file, of the byte that corresponds
+to the given character @var{position} in the buffer. The conversion
+requires knowing how the text is encoded in the buffer's file; this is
+what the @var{coding-system} argument is for, defaulting to the value
+of @code{buffer-file-coding-system}. The optional argument
+@var{quality} specifies how accurate the result should be; it should
+be one of the following:
+
+@table @code
+@item exact
+The result must be accurate. The function may need to encode and
+decode a large part of the buffer.
+@item approximate
+The value can be an approximation. The function may avoid expensive
+processing and return an inexact result.
+@item nil
+If the exact result needs expensive processing, the function will
+return @code{nil} rather than an approximation. This is the default
+if the argument is omitted.
+@end table
+@end defun
+
+@defun filepos-to-bufferpos byte &optional quality coding-system
+This function returns the buffer position corresponding to a file
+position specified by @var{byte}, a zero-based byte offset from the
+file's beginning. The function performs the conversion opposite to
+what @code{bufferpos-to-filepos} does. Optional arguments
+@var{quality} and @var{coding-system} have the same meaning and values
+as for @code{bufferpos-to-filepos}.
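+
+As a rough illustration, the following sketch converts a buffer
+position to a file offset and a file offset back to a buffer
+position. It assumes the text would be stored as UTF-8 without a
+byte order mark and with Unix line endings, and therefore passes
+@code{utf-8-unix} explicitly rather than relying on
+@code{buffer-file-coding-system}; the sample text is arbitrary.
+
+@example
+@group
+(with-temp-buffer
+  ;; Two 2-byte characters (U+00E9 and U+00E8) followed by "a".
+  (insert "\u00e9\u00e8a")
+  ;; Position 3 is just before "a", after 2 + 2 bytes of UTF-8.
+  (list (bufferpos-to-filepos 3 'exact 'utf-8-unix)
+        (filepos-to-bufferpos 4 'exact 'utf-8-unix)))
+ @result{} (4 3)
+@end group
+@end example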
+@end defun
+
@defun multibyte-string-p string
Return @code{t} if @var{string} is a multibyte string, @code{nil}
otherwise. This function also returns @code{nil} if @var{string} is
characters.
@end defun
-@c FIXME: Should `@var{character}' be `@var{byte}'?
+@c FIXME: Should '@var{character}' be '@var{byte}'?
@defun byte-to-string byte
@cindex byte to string
This function returns a unibyte string containing a single byte of
@end example
@end defun
+@defun char-from-name string &optional ignore-case
+This function returns the character whose Unicode name is @var{string}.
+If @var{ignore-case} is non-@code{nil}, case is ignored in @var{string}.
+This function returns @code{nil} if @var{string} does not name a character.
+
+@example
+;; U+03A3
+(= (char-from-name "GREEK CAPITAL LETTER SIGMA") #x03A3)
+ @result{} t
+@end example
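+
+The effect of @var{ignore-case}, and the @code{nil} return value,
+can be sketched as follows. The lower-case spelling and the name
+@samp{NO SUCH CHARACTER NAME} are made up for illustration.
+
+@example
+@group
+;; Without ignore-case, the lower-case spelling does not match.
+(char-from-name "greek capital letter sigma")
+ @result{} nil
+(= (char-from-name "greek capital letter sigma" t) #x03A3)
+ @result{} t
+(char-from-name "NO SUCH CHARACTER NAME")
+ @result{} nil
+@end group
+@end example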
+@end defun
+
@defun get-byte &optional pos string
This function returns the byte at character position @var{pos} in the
current buffer. If the current buffer is unibyte, this is literally
the brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses
@samp{small}. }; the other elements are characters that give the
compatibility decomposition sequence of this character. For
-unassigned codepoints, the value is the character itself.
+characters that don't have decomposition sequences, and for unassigned
+codepoints, the value is a list with a single member, the character
+itself.
@item decimal-digit-value
Corresponds to the Unicode @code{Numeric_Value} property for
characters whose @code{Numeric_Type} is @samp{Decimal}. The value is
-an integer. For unassigned codepoints, the value is
-@code{nil}, which means @acronym{NaN}, or ``not-a-number''.
+an integer, or @code{nil} if the character has no decimal digit value.
+For unassigned codepoints, the value is @code{nil}, which means
+@acronym{NaN}, or ``not a number''.
@item digit-value
Corresponds to the Unicode @code{Numeric_Value} property for
characters whose @code{Numeric_Type} is @samp{Digit}. The value is an
-integer. Examples of such characters include compatibility
-subscript and superscript digits, for which the value is the
-corresponding number. For unassigned codepoints, the value is
-@code{nil}, which means @acronym{NaN}.
+integer. Examples of such characters include compatibility subscript
+and superscript digits, for which the value is the corresponding
+number. For characters that don't have any numeric value, and for
+unassigned codepoints, the value is @code{nil}, which means
+@acronym{NaN}.
@item numeric-value
Corresponds to the Unicode @code{Numeric_Value} property for
characters whose @code{Numeric_Type} is @samp{Numeric}. The value of
-this property is a number. Examples of
-characters that have this property include fractions, subscripts,
-superscripts, Roman numerals, currency numerators, and encircled
-numbers. For example, the value of this property for the character
-@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. For
-unassigned codepoints, the value is @code{nil}, which means
-@acronym{NaN}.
+this property is a number. Examples of characters that have this
+property include fractions, subscripts, superscripts, Roman numerals,
+currency numerators, and encircled numbers. For example, the value of
+this property for the character @code{U+2155} (@sc{vulgar fraction one
+fifth}) is @code{0.2}. For characters that don't have any numeric
+value, and for unassigned codepoints, the value is @code{nil}, which
+means @acronym{NaN}.
@cindex mirroring of characters
@item mirrored
@item old-name
Corresponds to the Unicode @code{Unicode_1_Name} property. The value
-is a string. Unassigned codepoints, and characters that have no value
-for this property, the value is @code{nil}.
+is a string. For unassigned codepoints, and characters that have no
+value for this property, the value is @code{nil}.
@item iso-10646-comment
Corresponds to the Unicode @code{ISO_Comment} property. The value is
-a string. For unassigned codepoints, the value is an empty string.
+either a string or @code{nil}. For unassigned codepoints, the value
+is @code{nil}.
@item uppercase
Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property.
@result{} Nd
@end group
@group
-;; subscript 4
-(get-char-code-property ?\u2084 'digit-value)
+;; U+2084
+(get-char-code-property ?\N@{SUBSCRIPT FOUR@}
+ 'digit-value)
@result{} 4
@end group
@group
-;; one fifth
-(get-char-code-property ?\u2155 'numeric-value)
+;; U+2155
+(get-char-code-property ?\N@{VULGAR FRACTION ONE FIFTH@}
+ 'numeric-value)
@result{} 0.2
@end group
@group
-;; Roman IV
-(get-char-code-property ?\u2163 'numeric-value)
+;; U+2163
+(get-char-code-property ?\N@{ROMAN NUMERAL FOUR@}
+ 'numeric-value)
@result{} 4
@end group
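+@group
+;; Sketch: ?a is an arbitrary character that has neither a
+;; decomposition sequence nor a decimal digit value.
+(get-char-code-property ?a 'decomposition)
+ @result{} (97)
+(get-char-code-property ?a 'decimal-digit-value)
+ @result{} nil
+@end group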
@group
@cindex null bytes, and decoding text
@defvar inhibit-null-byte-detection
If this variable has a non-@code{nil} value, null bytes are ignored
-when detecting the encoding of a region or a string. This allows to
-correctly detect the encoding of text that contains null bytes, such
-as Info files with Index nodes.
+when detecting the encoding of a region or a string. This allows the
+encoding of text that contains null bytes, such as Info files with
+Index nodes, to be detected correctly.
@end defvar
@defvar inhibit-iso-escape-detection
The optional argument @var{accept-default-p}, if non-@code{nil},
should be a function to determine whether a coding system selected
-without user interaction is acceptable. @code{select-safe-coding-system}
+without user interaction is acceptable. @code{select-safe-coding-system}
calls this function with one argument, the base coding system of the
selected coding system. If @var{accept-default-p} returns @code{nil},
@code{select-safe-coding-system} rejects the silently selected coding
@cindex file contents, and default coding system
@defopt auto-coding-regexp-alist
This variable is an alist of text patterns and corresponding coding
-systems. Each element has the form @code{(@var{regexp}
+systems. Each element has the form @code{(@var{regexp}
. @var{coding-system})}; a file whose first few kilobytes match
@var{regexp} is decoded with @var{coding-system} when its contents are
read into a buffer. The settings in this alist take priority over
@example
@group
(decode-coding-string "Gr\374ss Gott" 'latin-1)
- @result{} #("Gr@"uss Gott" 0 9 (charset iso-8859-1))
+ @result{} #("Grüss Gott" 0 9 (charset iso-8859-1))
@end group
@end example
@end defun
@section Locales
@cindex locale
- POSIX defines a concept of ``locales'' which control which language
+ In POSIX, locales control which language
to use in language-related features. These Emacs variables control
how Emacs interacts with these features.
@cindex keyboard input decoding on X
This variable specifies the coding system to use for decoding system
error messages and---on X Window system only---keyboard input, for
+sending batch output to the standard output and error streams, for
encoding the format argument to @code{format-time-string}, and for
decoding the return value of @code{format-time-string}.
@end defvar