* Selecting a Representation:: Treating a byte sequence as unibyte or multi.
* Character Codes:: How unibyte and multibyte relate to
codes of individual characters.
-* Character Sets:: The space of possible characters codes
+* Character Sets:: The space of possible character codes
is divided into various character sets.
* Chars and Bytes:: More information about multibyte encodings.
* Splitting Characters:: Converting a character to its byte sequence.
@defun position-bytes position
@tindex position-bytes
-Return the byte-position corresponding to buffer position @var{position}
-in the current buffer. If @var{position} is out of range, the value
-is @code{nil}.
+Return the byte-position corresponding to buffer position
+@var{position} in the current buffer. This is 1 at the start of the
+buffer, and counts upward in bytes. If @var{position} is out of
+range, the value is @code{nil}.
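+
+As a minimal sketch, assuming a fresh temporary buffer that holds only
+@acronym{ASCII} text (so each character occupies exactly one byte):
+
+@example
+(with-temp-buffer
+  (insert "abc")
+  (list (position-bytes (point-min))  ; start of buffer
+        (position-bytes (point-max))  ; just past the third byte
+        (position-bytes 0)))          ; out of range
+     @result{} (1 4 nil)
+@end example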
@end defun
@defun byte-to-position byte-position
0 through 127 are completely legitimate in both representations.
@defun char-valid-p charcode &optional genericp
-This returns @code{t} if @var{charcode} is valid for either one of the two
-text representations.
+This returns @code{t} if @var{charcode} is valid (either for unibyte
+text or for multibyte text).
@example
(char-valid-p 65)
special purposes within Emacs.
@end defun
+@deffn Command list-charset-chars charset
+This command displays a list of characters in the character set
+@var{charset}.
+@end deffn
+
@node Chars and Bytes
@section Characters and Bytes
@cindex bytes and characters
coding systems (@pxref{Coding Systems}) are capable of representing all
of the text in question.
+@defun charset-after &optional pos
+This function returns the charset of the character in the current buffer
+at position @var{pos}. If @var{pos} is omitted or @code{nil}, it
+defaults to the current value of point. If @var{pos} is out of range,
+the value is @code{nil}.
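+
+For illustration, a sketch assuming a temporary buffer that contains
+only the single @acronym{ASCII} character @samp{a}:
+
+@example
+(with-temp-buffer
+  (insert "a")
+  (list (charset-after (point-min))    ; charset of the character a
+        (charset-after (point-max))))  ; no character at end of buffer
+     @result{} (ascii nil)
+@end example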
+@end defun
+
@defun find-charset-region beg end &optional translation
This function returns a list of the character sets that appear in the
current buffer between positions @var{beg} and @var{end}.
own particular translation tables; there are also default translation
tables which apply to all other coding systems.
+ For instance, the coding-system @code{utf-8} has a translation table
+that maps characters of various charsets (e.g.,
+@code{latin-iso8859-@var{x}}) into Unicode character sets. This way,
+it can encode Latin-2 characters into UTF-8. Meanwhile,
+@code{unify-8859-on-decoding-mode} operates by specifying
+@code{standard-translation-table-for-decode} to translate
+Latin-@var{x} characters into corresponding Unicode characters.
+
@defun make-translation-table &rest translations
This function returns a translation table based on the argument
@var{translations}. Each element of @var{translations} should be a
Self-inserting characters are translated through this translation
table before they are inserted. This variable automatically becomes
buffer-local when set.
+
+@code{set-buffer-file-coding-system} sets this variable so that your
+keyboard input gets translated into the character sets that the buffer
+is likely to contain.
@end defvar
@node Coding Systems
conversion, but some of them leave the choice unspecified---to be chosen
heuristically for each file, based on the data.
+ In general, a coding system doesn't guarantee roundtrip identity:
+decoding a byte sequence using a coding system, then encoding the
+resulting text in the same coding system, can produce a different byte
+sequence. However, the following coding systems do guarantee that the
+byte sequence will be the same as what you originally decoded:
+
+@quotation
+chinese-big5 chinese-iso-8bit cyrillic-iso-8bit emacs-mule
+greek-iso-8bit hebrew-iso-8bit iso-latin-1 iso-latin-2 iso-latin-3
+iso-latin-4 iso-latin-5 iso-latin-8 iso-latin-9 iso-safe
+japanese-iso-8bit japanese-shift-jis korean-iso-8bit raw-text
+@end quotation
+
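+ As a sketch of such a roundtrip, assuming the single byte 233 (octal
+351) as input, decoding and then re-encoding with @code{iso-latin-1}
+gives back the original byte sequence:
+
+@example
+(let* ((bytes "\351")
+       (text (decode-coding-string bytes 'iso-latin-1)))
+  ;; @r{Compare the re-encoded bytes with the original bytes.}
+  (string= (encode-coding-string text 'iso-latin-1) bytes))
+     @result{} t
+@end example
+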
+ Encoding buffer text and then decoding the result can also fail to
+reproduce the original text. For instance, if you encode Latin-2
+characters with @code{utf-8} and decode the result using the same
+coding system, you'll get Unicode characters (of charset
+@code{mule-unicode-0100-24ff}). If you encode Unicode characters with
+@code{iso-latin-2} and decode the result with the same coding system,
+you'll get Latin-2 characters.
+
@cindex end of line conversion
@dfn{End of line conversion} handles three different conventions used
on various systems for representing end of line in files. The Unix
uses one to encode the buffer contents.
You can specify the coding system to use either explicitly
-(@pxref{Specifying Coding Systems}), or implicitly using the defaulting
+(@pxref{Specifying Coding Systems}), or implicitly using a default
mechanism (@pxref{Default Coding Systems}). But these methods may not
completely specify what to do. For example, they may choose a coding
system such as @code{undefined} which leaves the character code
@var{encoding-system} is the coding system for encoding (in case
@var{operation} does encoding).
-The argument @var{operation} should be a symbol, one of
-@code{insert-file-contents}, @code{write-region}, @code{call-process},
-@code{call-process-region}, @code{start-process}, or
-@code{open-network-stream}. These are the names of the Emacs I/O primitives
-that can do coding system conversion.
+The argument @var{operation} should be a symbol, any one of
+@code{insert-file-contents}, @code{write-region},
+@code{start-process}, @code{call-process}, @code{call-process-region},
+or @code{open-network-stream}. These are the names of the Emacs I/O
+primitives that can do coding system conversion.
The remaining arguments should be the same arguments that might be given
to that I/O primitive. Depending on the primitive, one of those
target. For @code{open-network-stream}, the target is the service name
or port number.
-This function looks up the target in @code{file-coding-system-alist},
-@code{process-coding-system-alist}, or
-@code{network-coding-system-alist}, depending on @var{operation}.
+Depending on @var{operation}, this function looks up the target in
+@code{file-coding-system-alist}, @code{process-coding-system-alist},
+or @code{network-coding-system-alist}.
@end defun
@node Specifying Coding Systems
@example
;; @r{Read the file with no character code conversion.}
;; @r{Assume @acronym{crlf} represents end-of-line.}
-(let ((coding-system-for-write 'emacs-mule-dos))
+(let ((coding-system-for-read 'emacs-mule-dos))
(insert-file-contents filename))
@end example