X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/4d36e5246e3d182b84f5d776e730a81e03fff06a..89ce83b20249dfb4e45f09dfdddf4c4b66d82968:/doc/lispref/nonascii.texi diff --git a/doc/lispref/nonascii.texi b/doc/lispref/nonascii.texi index 50e50ff39a..fd2ce3248f 100644 --- a/doc/lispref/nonascii.texi +++ b/doc/lispref/nonascii.texi @@ -1,6 +1,6 @@ -@c -*-texinfo-*- +@c -*- mode: texinfo; coding: utf-8 -*- @c This is part of the GNU Emacs Lisp Reference Manual. -@c Copyright (C) 1998-1999, 2001-2015 Free Software Foundation, Inc. +@c Copyright (C) 1998-1999, 2001-2016 Free Software Foundation, Inc. @c See the file elisp.texi for copying conditions. @node Non-ASCII Characters @chapter Non-@acronym{ASCII} Characters @@ -123,6 +123,45 @@ In other words, the value does not change for all byte positions that belong to the same character. @end defun +@cindex convert file byte to buffer position +@cindex convert buffer position to file byte + The following two functions are useful when a Lisp program needs to +map buffer positions to byte offsets in a file visited by the buffer. + +@defun bufferpos-to-filepos position &optional quality coding-system +This function is similar to @code{position-bytes}, but instead of byte +position in the current buffer it returns the offset from the +beginning of the current buffer's file of the byte that corresponds to +the given character @var{position} in the buffer. The conversion +requires to know how the text is encoded in the buffer's file; this is +what the @var{coding-system} argument is for, defaulting to the value +of @code{buffer-file-coding-system}. The optional argument +@var{quality} specifies how accurate the result should be; it should +be one of the following: + +@table @code +@item exact +The result must be accurate. The function may need to encode and +decode a large part of the buffer. +@item approximate +The value can be an approximation. The function may avoid expensive +processing and return an inexact result. 
+@item nil +If the exact result needs expensive processing, the function will +return @code{nil} rather than an approximation. This is the default +if the argument is omitted. +@end table +@end defun + +@defun filepos-to-bufferpos byte &optional quality coding-system +This function returns the buffer position corresponding to a file +position specified by @var{byte}, a zero-base byte offset from the +file's beginning. The function performs the conversion opposite to +what @code{bufferpos-to-filepos} does. Optional arguments +@var{quality} and @var{coding-system} have the same meaning and values +as for @code{bufferpos-to-filepos}. +@end defun + @defun multibyte-string-p string Return @code{t} if @var{string} is a multibyte string, @code{nil} otherwise. This function also returns @code{nil} if @var{string} is @@ -248,7 +287,7 @@ unibyte string, it is returned unchanged. Use this function for characters. @end defun -@c FIXME: Should `@var{character}' be `@var{byte}'? +@c FIXME: Should '@var{character}' be '@var{byte}'? @defun byte-to-string byte @cindex byte to string This function returns a unibyte string containing a single byte of @@ -381,6 +420,18 @@ codepoint can have. @end example @end defun +@defun char-from-name string &optional ignore-case +This function returns the character whose Unicode name is @var{string}. +If @var{ignore-case} is non-@code{nil}, case is ignored in @var{string}. +This function returns @code{nil} if @var{string} does not name a character. + +@example +;; U+03A3 +(= (char-from-name "GREEK CAPITAL LETTER SIGMA") #x03A3) + @result{} t +@end example +@end defun + @defun get-byte &optional pos string This function returns the byte at character position @var{pos} in the current buffer. If the current buffer is unibyte, this is literally @@ -474,32 +525,36 @@ inside @samp{<..>} brackets, but the tag names in Emacs do not include the brackets; e.g., Unicode specifies @samp{<small>} where Emacs uses @samp{small}. 
}; the other elements are characters that give the compatibility decomposition sequence of this character. For -unassigned codepoints, the value is the character itself. +characters that don't have decomposition sequences, and for unassigned +codepoints, the value is a list with a single member, the character +itself. @item decimal-digit-value Corresponds to the Unicode @code{Numeric_Value} property for characters whose @code{Numeric_Type} is @samp{Decimal}. The value is -an integer. For unassigned codepoints, the value is -@code{nil}, which means @acronym{NaN}, or ``not-a-number''. +an integer, or @code{nil} if the character has no decimal digit value. +For unassigned codepoints, the value is @code{nil}, which means +@acronym{NaN}, or ``not a number''. @item digit-value Corresponds to the Unicode @code{Numeric_Value} property for characters whose @code{Numeric_Type} is @samp{Digit}. The value is an -integer. Examples of such characters include compatibility -subscript and superscript digits, for which the value is the -corresponding number. For unassigned codepoints, the value is -@code{nil}, which means @acronym{NaN}. +integer. Examples of such characters include compatibility subscript +and superscript digits, for which the value is the corresponding +number. For characters that don't have any numeric value, and for +unassigned codepoints, the value is @code{nil}, which means +@acronym{NaN}. @item numeric-value Corresponds to the Unicode @code{Numeric_Value} property for characters whose @code{Numeric_Type} is @samp{Numeric}. The value of -this property is a number. Examples of -characters that have this property include fractions, subscripts, -superscripts, Roman numerals, currency numerators, and encircled -numbers. For example, the value of this property for the character -@code{U+2155} (@sc{vulgar fraction one fifth}) is @code{0.2}. For -unassigned codepoints, the value is @code{nil}, which means -@acronym{NaN}. +this property is a number. 
Examples of characters that have this +property include fractions, subscripts, superscripts, Roman numerals, +currency numerators, and encircled numbers. For example, the value of +this property for the character @code{U+2155} (@sc{vulgar fraction one +fifth}) is @code{0.2}. For characters that don't have any numeric +value, and for unassigned codepoints, the value is @code{nil}, which +means @acronym{NaN}. @cindex mirroring of characters @item mirrored @@ -540,12 +595,13 @@ property is used for bidirectional display. @item old-name Corresponds to the Unicode @code{Unicode_1_Name} property. The value -is a string. Unassigned codepoints, and characters that have no value -for this property, the value is @code{nil}. +is a string. For unassigned codepoints, and characters that have no +value for this property, the value is @code{nil}. @item iso-10646-comment Corresponds to the Unicode @code{ISO_Comment} property. The value is -a string. For unassigned codepoints, the value is an empty string. +either a string or @code{nil}. For unassigned codepoints, the value +is @code{nil}. @item uppercase Corresponds to the Unicode @code{Simple_Uppercase_Mapping} property. @@ -578,18 +634,21 @@ This function returns the value of @var{char}'s @var{propname} property. @result{} Nd @end group @group -;; subscript 4 -(get-char-code-property ?\u2084 'digit-value) +;; U+2084 +(get-char-code-property ?\N@{SUBSCRIPT FOUR@} + 'digit-value) @result{} 4 @end group @group -;; one fifth -(get-char-code-property ?\u2155 'numeric-value) +;; U+2155 +(get-char-code-property ?\N@{VULGAR FRACTION ONE FIFTH@} + 'numeric-value) @result{} 0.2 @end group @group -;; Roman IV -(get-char-code-property ?\u2163 'numeric-value) +;; U+2163 +(get-char-code-property ?\N@{ROMAN NUMERAL FOUR@} + 'numeric-value) @result{} 4 @end group @group @@ -1294,9 +1353,9 @@ operates on the contents of @var{string} instead of bytes in the buffer. 
@cindex null bytes, and decoding text @defvar inhibit-null-byte-detection If this variable has a non-@code{nil} value, null bytes are ignored -when detecting the encoding of a region or a string. This allows to -correctly detect the encoding of text that contains null bytes, such -as Info files with Index nodes. +when detecting the encoding of a region or a string. This allows the +encoding of text that contains null bytes to be correctly detected, +such as Info files with Index nodes. @end defvar @defvar inhibit-iso-escape-detection @@ -1375,7 +1434,7 @@ alternatives described above. The optional argument @var{accept-default-p}, if non-@code{nil}, should be a function to determine whether a coding system selected -without user interaction is acceptable. @code{select-safe-coding-system} +without user interaction is acceptable. @code{select-safe-coding-system} calls this function with one argument, the base coding system of the selected coding system. If @var{accept-default-p} returns @code{nil}, @code{select-safe-coding-system} rejects the silently selected coding @@ -1437,7 +1496,7 @@ don't change these variables; instead, override them using @cindex file contents, and default coding system @defopt auto-coding-regexp-alist This variable is an alist of text patterns and corresponding coding -systems. Each element has the form @code{(@var{regexp} +systems. Each element has the form @code{(@var{regexp} . @var{coding-system})}; a file whose first few kilobytes match @var{regexp} is decoded with @var{coding-system} when its contents are read into a buffer. The settings in this alist take priority over @@ -1817,7 +1876,7 @@ original text: @example @group (decode-coding-string "Gr\374ss Gott" 'latin-1) - @result{} #("Gr@"uss Gott" 0 9 (charset iso-8859-1)) + @result{} #("Grüss Gott" 0 9 (charset iso-8859-1)) @end group @end example @end defun @@ -1951,7 +2010,7 @@ and @ref{Invoking the Input Method}. 
@section Locales @cindex locale - POSIX defines a concept of ``locales'' which control which language + In POSIX, locales control which language to use in language-related features. These Emacs variables control how Emacs interacts with these features. @@ -1959,6 +2018,7 @@ how Emacs interacts with these features. @cindex keyboard input decoding on X This variable specifies the coding system to use for decoding system error messages and---on X Window system only---keyboard input, for +sending batch output to the standard output and error streams, for encoding the format argument to @code{format-time-string}, and for decoding the return value of @code{format-time-string}. @end defvar