@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990-1995, 1998-1999, 2001-2012
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-1999, 2001-2014 Free Software
+@c Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@node Lisp Data Types
@chapter Lisp Data Types
@menu
* Integer Type:: Numbers without fractional parts.
-* Floating Point Type:: Numbers with fractional parts and with a large range.
+* Floating-Point Type:: Numbers with fractional parts and with a large range.
* Character Type:: The representation of letters, numbers and
control characters.
* Symbol Type:: A multi-use object that refers to a function,
@node Integer Type
@subsection Integer Type
- The range of values for integers in Emacs Lisp is @minus{}536870912 to
-536870911 (30 bits; i.e.,
+ The range of values for an integer depends on the machine. The
+minimum range is @minus{}536,870,912 to 536,870,911 (30 bits; i.e.,
@ifnottex
--2**29
+@minus{}2**29
@end ifnottex
@tex
@math{-2^{29}}
@end tex
to
@ifnottex
-2**29 - 1)
+2**29 @minus{} 1)
@end ifnottex
@tex
@math{2^{29}-1})
@end tex
-on typical 32-bit machines. (Some machines provide a wider range.)
-Emacs Lisp arithmetic functions do not check for overflow. Thus
-@code{(1+ 536870911)} is @minus{}536870912 if Emacs integers are 30 bits.
+but many machines provide a wider range.
+Emacs Lisp arithmetic functions do not check for integer overflow. Thus
+@code{(1+ 536870911)} is @minus{}536,870,912 if Emacs integers are 30 bits.
The read syntax for integers is a sequence of (base ten) digits with an
optional sign at the beginning and an optional period at the end. The
@example
@group
--1 ; @r{The integer -1.}
+-1 ; @r{The integer @minus{}1.}
1 ; @r{The integer 1.}
1. ; @r{Also the integer 1.}
+1 ; @r{Also the integer 1.}
@noindent
As a special exception, if a sequence of digits specifies an integer
too large or too small to be a valid integer object, the Lisp reader
-reads it as a floating-point number (@pxref{Floating Point Type}).
+reads it as a floating-point number (@pxref{Floating-Point Type}).
For instance, if Emacs integers are 30 bits, @code{536870912} is read
as the floating-point number @code{536870912.0}.
@xref{Numbers}, for more information.
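
  As a rough illustration of the points above, the following sketch
uses the standard variables @code{most-positive-fixnum} and
@code{most-negative-fixnum} instead of hard-coding a 30-bit range.
The comment on the last form assumes the unchecked wraparound
described in this section; it may not hold on every build.

@example
@group
(integerp 1.)     ; @r{A trailing period still reads as an integer.}
     @result{} t
(eq +1 1)         ; @r{A leading plus sign does not change the value.}
     @result{} t
;; @r{If overflow is unchecked, as described above, this form}
;; @r{wraps around to the value of @code{most-negative-fixnum}.}
(1+ most-positive-fixnum)
@end group
@end example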
-@node Floating Point Type
-@subsection Floating Point Type
+@node Floating-Point Type
+@subsection Floating-Point Type
- Floating point numbers are the computer equivalent of scientific
-notation; you can think of a floating point number as a fraction
+ Floating-point numbers are the computer equivalent of scientific
+notation; you can think of a floating-point number as a fraction
together with a power of ten. The precise number of significant
figures and the range of possible exponents is machine-specific; Emacs
uses the C data type @code{double} to store the value, and internally
this records a power of 2 rather than a power of 10.
- The printed representation for floating point numbers requires either
+ The printed representation for floating-point numbers requires either
a decimal point (with at least one digit following), an exponent, or
-both. For example, @samp{1500.0}, @samp{15e2}, @samp{15.0e2},
-@samp{1.5e3}, and @samp{.15e4} are five ways of writing a floating point
+both. For example, @samp{1500.0}, @samp{+15e2}, @samp{15.0e+2},
+@samp{+1500000e-3}, and @samp{.15e4} are five ways of writing a floating-point
number whose value is 1500. They are all equivalent.
@xref{Numbers}, for more information.
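
  The equivalence of these spellings can be checked directly.  This is
a small sketch added for illustration; it relies only on the standard
predicates @code{floatp} and @code{=}.

@example
@group
(floatp +15e2)    ; @r{An exponent alone is enough to make a float.}
     @result{} t
(floatp 1500)     ; @r{No decimal point or exponent: an integer.}
     @result{} nil
(= 1500.0 15.0e+2)
     @result{} t
(= +1500000e-3 .15e4)
     @result{} t
@end group
@end example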
control characters, Emacs provides several types of escape syntax that
you can use to specify non-@acronym{ASCII} text characters.
-@cindex unicode character escape
- You can specify characters by their Unicode values.
-@code{?\u@var{nnnn}} represents a character that maps to the Unicode
-code point @samp{U+@var{nnnn}} (by convention, Unicode code points are
-given in hexadecimal). There is a slightly different syntax for
-specifying characters with code points higher than
-@code{U+@var{ffff}}: @code{\U00@var{nnnnnn}} represents the character
-whose code point is @samp{U+@var{nnnnnn}}. The Unicode Standard only
-defines code points up to @samp{U+@var{10ffff}}, so if you specify a
-code point higher than that, Emacs signals an error.
-
- This peculiar and inconvenient syntax was adopted for compatibility
-with other programming languages. Unlike some other languages, Emacs
-Lisp supports this syntax only in character literals and strings.
-
@cindex @samp{\} in character constant
@cindex backslash in character constants
-@cindex octal character code
- The most general read syntax for a character represents the
-character code in either octal or hex. To use octal, write a question
-mark followed by a backslash and the octal character code (up to three
-octal digits); thus, @samp{?\101} for the character @kbd{A},
-@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
-character @kbd{C-b}. Although this syntax can represent any
-@acronym{ASCII} character, it is preferred only when the precise octal
-value is more important than the @acronym{ASCII} representation.
-
-@example
-@group
-?\012 @result{} 10 ?\n @result{} 10 ?\C-j @result{} 10
-?\101 @result{} 65 ?A @result{} 65
-@end group
-@end example
-
- To use hex, write a question mark followed by a backslash, @samp{x},
-and the hexadecimal character code. You can use any number of hex
-digits, so you can represent any character code in this way.
-Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
-character @kbd{C-a}, and @code{?\xe0} for the Latin-1 character
+@cindex unicode character escape
+ Firstly, you can specify characters by their Unicode values.
+@code{?\u@var{nnnn}} represents a character with Unicode code point
+@samp{U+@var{nnnn}}, where @var{nnnn} is (by convention) a hexadecimal
+number with exactly four digits. The backslash indicates that the
+subsequent characters form an escape sequence, and the @samp{u}
+specifies a Unicode escape sequence.
+
+ There is a slightly different syntax for specifying Unicode
+characters with code points higher than @code{U+@var{ffff}}:
+@code{?\U00@var{nnnnnn}} represents the character with code point
+@samp{U+@var{nnnnnn}}, where @var{nnnnnn} is a six-digit hexadecimal
+number. The Unicode Standard only defines code points up to
+@samp{U+@var{10ffff}}, so if you specify a code point higher than
+that, Emacs signals an error.
+
+ Secondly, you can specify characters by their hexadecimal character
+codes. A hexadecimal escape sequence consists of a backslash,
+@samp{x}, and the hexadecimal character code. Thus, @samp{?\x41} is
+the character @kbd{A}, @samp{?\x1} is the character @kbd{C-a}, and
+@code{?\xe0} is the character
@iftex
@samp{@`a}.
@end iftex
@ifnottex
@samp{a} with grave accent.
@end ifnottex
+You can use any number of hex digits, so you can represent any
+character code in this way.
+
+@cindex octal character code
+ Thirdly, you can specify characters by their character code in
+octal. An octal escape sequence consists of a backslash followed by
+up to three octal digits; thus, @samp{?\101} for the character
+@kbd{A}, @samp{?\001} for the character @kbd{C-a}, and @code{?\002}
+for the character @kbd{C-b}. Only characters up to octal code 777 can
+be specified this way.
+
+ These escape sequences may also be used in strings. @xref{Non-ASCII
+in Strings}.
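
  The following sketch, added for illustration, shows the three escape
styles naming the same characters.  Each form evaluates to the
character's integer code.

@example
@group
?\u0041 @result{} 65      ?\x41 @result{} 65      ?\101 @result{} 65
?\u00e0 @result{} 224     ?\xe0 @result{} 224     ?\340 @result{} 224
?\U0001F600 @result{} 128512
@end group
@end example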
@node Ctl-Char Syntax
@subsubsection Control-Character Syntax
@end quotation
Here are several examples of symbol names. Note that the @samp{+} in
-the fifth example is escaped to prevent it from being read as a number.
-This is not necessary in the fourth example because the rest of the name
+the fourth example is escaped to prevent it from being read as a number.
+This is not necessary in the sixth example because the rest of the name
makes it invalid as a number.
@example
@node Non-ASCII in Strings
@subsubsection Non-@acronym{ASCII} Characters in Strings
- You can include a non-@acronym{ASCII} international character in a
-string constant by writing it literally. There are two text
-representations for non-@acronym{ASCII} characters in Emacs strings
-(and in buffers): unibyte and multibyte (@pxref{Text
-Representations}). If the string constant is read from a multibyte
-source, such as a multibyte buffer or string, or a file that would be
-visited as multibyte, then Emacs reads the non-@acronym{ASCII}
-character as a multibyte character and automatically makes the string
-a multibyte string. If the string constant is read from a unibyte
-source, then Emacs reads the non-@acronym{ASCII} character as unibyte,
-and makes the string unibyte.
-
- Instead of writing a non-@acronym{ASCII} character literally into a
-multibyte string, you can write it as its character code using a hex
-escape, @samp{\x@var{nnnnnnn}}, with as many digits as necessary.
-(Multibyte non-@acronym{ASCII} character codes are all greater than
-256.) You can also specify a character in a multibyte string using
-the @samp{\u} or @samp{\U} Unicode escape syntax (@pxref{General
-Escape Syntax}). In either case, any character which is not a valid
-hex digit terminates the construct. If the next character in the
-string could be interpreted as a hex digit, write @w{@samp{\ }}
-(backslash and space) to terminate the hex escape---for example,
+ There are two text representations for non-@acronym{ASCII}
+characters in Emacs strings: multibyte and unibyte (@pxref{Text
+Representations}). Roughly speaking, unibyte strings store raw bytes,
+while multibyte strings store human-readable text. Each character in
+a unibyte string is a byte, i.e., its value is between 0 and 255. By
+contrast, each character in a multibyte string may have a value
+between 0 and 4194303 (@pxref{Character Type}).  In both cases,
+characters above 127 are non-@acronym{ASCII}.
+
+ You can include a non-@acronym{ASCII} character in a string constant
+by writing it literally. If the string constant is read from a
+multibyte source, such as a multibyte buffer or string, or a file that
+would be visited as multibyte, then Emacs reads each
+non-@acronym{ASCII} character as a multibyte character and
+automatically makes the string a multibyte string. If the string
+constant is read from a unibyte source, then Emacs reads the
+non-@acronym{ASCII} character as unibyte, and makes the string
+unibyte.
+
+ Instead of writing a character literally into a multibyte string,
+you can write it as its character code using an escape sequence.
+@xref{General Escape Syntax}, for details about escape sequences.
+
+ If you use any Unicode-style escape sequence @samp{\uNNNN} or
+@samp{\U00NNNNNN} in a string constant (even for an @acronym{ASCII}
+character), Emacs automatically assumes that the string is multibyte.
+
+ You can also use hexadecimal escape sequences (@samp{\x@var{n}}) and
+octal escape sequences (@samp{\@var{n}}) in string constants.
+@strong{But beware:} If a string constant contains hexadecimal or
+octal escape sequences, and these escape sequences all specify unibyte
+characters (i.e., codes less than 256), and there are no other literal
+non-@acronym{ASCII} characters or Unicode-style escape sequences in
+the string, then Emacs automatically assumes that it is a unibyte
+string. That is to say, it assumes that all non-@acronym{ASCII}
+characters occurring in the string are 8-bit raw bytes.
+
+ In hexadecimal and octal escape sequences, the escaped character
+code may contain a variable number of digits, so the first subsequent
+character which is not a valid hexadecimal or octal digit terminates
+the escape sequence. If the next character in a string could be
+interpreted as a hexadecimal or octal digit, write @w{@samp{\ }}
+(backslash and space) to terminate the escape sequence. For example,
@w{@samp{\xe0\ }} represents one character, @samp{a} with grave
accent. @w{@samp{\ }} in a string constant is just like
backslash-newline; it does not contribute any character to the string,
-but it does terminate the preceding hex escape. Using any hex escape
-in a string (even for an @acronym{ASCII} character) automatically
-forces the string to be multibyte.
-
- You can represent a unibyte non-@acronym{ASCII} character with its
-character code, which must be in the range from 128 (0200 octal) to
-255 (0377 octal). If you write all such character codes in octal and
-the string contains no other characters forcing it to be multibyte,
-this produces a unibyte string.
+but it does terminate any preceding hexadecimal or octal escape.
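
  To make the rules above concrete, here is a short sketch using the
standard functions @code{multibyte-string-p} and @code{length}.  The
results shown follow from the rules as stated in this section.

@example
@group
(multibyte-string-p "\u00e0")   ; @r{Unicode escape: multibyte.}
     @result{} t
(multibyte-string-p "\xe0")     ; @r{Hex escape below 256: unibyte.}
     @result{} nil
(multibyte-string-p "\340")     ; @r{Octal escape below 256: unibyte.}
     @result{} nil
(length "\xe0\ b")              ; @r{Two characters: #xe0, then @samp{b}.}
     @result{} 2
(length "\xe0b")                ; @r{One character, with code #xe0b.}
     @result{} 1
@end group
@end example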
@node Nonprinting Characters
@subsubsection Nonprinting Characters in Strings
special purposes. A char-table can also specify a single value for
a whole character set.
+@cindex @samp{#^} read syntax
The printed representation of a char-table is like a vector
-except that there is an extra @samp{#^} at the beginning.
+except that there is an extra @samp{#^} at the beginning.@footnote{You
+may also encounter @samp{#^^}, used for ``sub-char-tables''.}
@xref{Char-Tables}, for special functions to operate on char-tables.
Uses of char-tables include:
derived from ``subroutine''.) Most primitive functions evaluate all
their arguments when they are called. A primitive function that does
not evaluate all its arguments is called a @dfn{special form}
-(@pxref{Special Forms}).@refill
+(@pxref{Special Forms}).
It does not matter to the caller of a function whether the function is
primitive. However, this does matter if you try to redefine a primitive
redefinition of primitive functions}.
The term @dfn{function} refers to all Emacs functions, whether written
-in Lisp or C. @xref{Function Type}, for information about the
+in Lisp or C@. @xref{Function Type}, for information about the
functions written in Lisp.
Primitive functions have no read syntax and print in hash notation
@item custom-variable-p
@xref{Variable Definitions, custom-variable-p}.
-@item display-table-p
-@xref{Display Tables, display-table-p}.
-
@item floatp
@xref{Predicates on Numbers, floatp}.
Here we describe functions that test for equality between two
objects. Other functions test equality of contents between objects of
-specific types, e.g.@: strings. For these predicates, see the
+specific types, e.g., strings. For these predicates, see the
appropriate chapter describing the data type.
@defun eq object1 object2
the same object, and @code{nil} otherwise.
If @var{object1} and @var{object2} are integers with the same value,
-they are considered to be the same object (i.e.@: @code{eq} returns
+they are considered to be the same object (i.e., @code{eq} returns
@code{t}). If @var{object1} and @var{object2} are symbols with the
same name, they are normally the same object---but see @ref{Creating
-Symbols} for exceptions. For other types (e.g.@: lists, vectors,
+Symbols} for exceptions. For other types (e.g., lists, vectors,
strings), two arguments with the same contents or elements are not
necessarily @code{eq} to each other: they are @code{eq} only if they
are the same object, meaning that a change in the contents of one will