X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/ac1a0ce1c6ba60a3faddc64463cb7a697b9d8fd2..eaa8c21089bd18af88dff80ae92c5eedcf3d7dda:/doc/lispref/objects.texi

diff --git a/doc/lispref/objects.texi b/doc/lispref/objects.texi
index 7d40f0ff93..4e8182ccf3 100644
--- a/doc/lispref/objects.texi
+++ b/doc/lispref/objects.texi
@@ -1,7 +1,7 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990-1995, 1998-1999, 2001-2012
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-1999, 2001-2014 Free Software
+@c Foundation, Inc.
 @c See the file elisp.texi for copying conditions.
 @node Lisp Data Types
 @chapter Lisp Data Types
@@ -136,7 +136,7 @@ latter are unique to Emacs Lisp.

 @menu
 * Integer Type::        Numbers without fractional parts.
-* Floating Point Type:: Numbers with fractional parts and with a large range.
+* Floating-Point Type:: Numbers with fractional parts and with a large range.
 * Character Type::      The representation of letters, numbers and
                         control characters.
 * Symbol Type::         A multi-use object that refers to a function,
@@ -161,24 +161,24 @@ latter are unique to Emacs Lisp.
 @node Integer Type
 @subsection Integer Type

- The range of values for integers in Emacs Lisp is @minus{}536870912 to
-536870911 (30 bits; i.e.,
+ The range of values for an integer depends on the machine. The
+minimum range is @minus{}536,870,912 to 536,870,911 (30 bits; i.e.,
 @ifnottex
--2**29
+@minus{}2**29
 @end ifnottex
 @tex
 @math{-2^{29}}
 @end tex
 to
 @ifnottex
-2**29 - 1)
+2**29 @minus{} 1)
 @end ifnottex
 @tex
 @math{2^{29}-1})
 @end tex
-on typical 32-bit machines. (Some machines provide a wider range.)
-Emacs Lisp arithmetic functions do not check for overflow. Thus
-@code{(1+ 536870911)} is @minus{}536870912 if Emacs integers are 30 bits.
+but many machines provide a wider range.
+Emacs Lisp arithmetic functions do not check for integer overflow. Thus
+@code{(1+ 536870911)} is @minus{}536,870,912 if Emacs integers are 30 bits.
 The read syntax for integers is a sequence of (base ten) digits with an
 optional sign at the beginning and an optional period at the end. The
@@ -187,7 +187,7 @@ leading @samp{+} or a final @samp{.}.

 @example
 @group
--1    ; @r{The integer -1.}
+-1    ; @r{The integer @minus{}1.}
 1     ; @r{The integer 1.}
 1.    ; @r{Also the integer 1.}
 +1    ; @r{Also the integer 1.}
@@ -197,26 +197,26 @@ leading @samp{+} or a final @samp{.}.
 @noindent
 As a special exception, if a sequence of digits specifies an integer
 too large or too small to be a valid integer object, the Lisp reader
-reads it as a floating-point number (@pxref{Floating Point Type}).
+reads it as a floating-point number (@pxref{Floating-Point Type}).
 For instance, if Emacs integers are 30 bits, @code{536870912} is read
 as the floating-point number @code{536870912.0}.

 @xref{Numbers}, for more information.

-@node Floating Point Type
-@subsection Floating Point Type
+@node Floating-Point Type
+@subsection Floating-Point Type

- Floating point numbers are the computer equivalent of scientific
-notation; you can think of a floating point number as a fraction
+ Floating-point numbers are the computer equivalent of scientific
+notation; you can think of a floating-point number as a fraction
 together with a power of ten. The precise number of significant
 figures and the range of possible exponents is machine-specific; Emacs
 uses the C data type @code{double} to store the value, and internally
 this records a power of 2 rather than a power of 10.

- The printed representation for floating point numbers requires either
+ The printed representation for floating-point numbers requires either
 a decimal point (with at least one digit following), an exponent, or
-both. For example, @samp{1500.0}, @samp{15e2}, @samp{15.0e2},
-@samp{1.5e3}, and @samp{.15e4} are five ways of writing a floating point
+both. For example, @samp{1500.0}, @samp{+15e2}, @samp{15.0e+2},
+@samp{+1500000e-3}, and @samp{.15e4} are five ways of writing a floating-point
 number whose value is 1500. They are all equivalent.

 @xref{Numbers}, for more information.
@@ -351,51 +351,48 @@ following text.)
 control characters, Emacs provides several types of escape syntax that
 you can use to specify non-@acronym{ASCII} text characters.

-@cindex unicode character escape
- You can specify characters by their Unicode values.
-@code{?\u@var{nnnn}} represents a character that maps to the Unicode
-code point @samp{U+@var{nnnn}} (by convention, Unicode code points are
-given in hexadecimal). There is a slightly different syntax for
-specifying characters with code points higher than
-@code{U+@var{ffff}}: @code{\U00@var{nnnnnn}} represents the character
-whose code point is @samp{U+@var{nnnnnn}}. The Unicode Standard only
-defines code points up to @samp{U+@var{10ffff}}, so if you specify a
-code point higher than that, Emacs signals an error.
-
- This peculiar and inconvenient syntax was adopted for compatibility
-with other programming languages. Unlike some other languages, Emacs
-Lisp supports this syntax only in character literals and strings.
-
 @cindex @samp{\} in character constant
 @cindex backslash in character constants
-@cindex octal character code
- The most general read syntax for a character represents the
-character code in either octal or hex. To use octal, write a question
-mark followed by a backslash and the octal character code (up to three
-octal digits); thus, @samp{?\101} for the character @kbd{A},
-@samp{?\001} for the character @kbd{C-a}, and @code{?\002} for the
-character @kbd{C-b}. Although this syntax can represent any
-@acronym{ASCII} character, it is preferred only when the precise octal
-value is more important than the @acronym{ASCII} representation.
-
-@example
-@group
-?\012 @result{} 10         ?\n @result{} 10         ?\C-j @result{} 10
-?\101 @result{} 65         ?A @result{} 65
-@end group
-@end example
-
- To use hex, write a question mark followed by a backslash, @samp{x},
-and the hexadecimal character code. You can use any number of hex
-digits, so you can represent any character code in this way.
-Thus, @samp{?\x41} for the character @kbd{A}, @samp{?\x1} for the
-character @kbd{C-a}, and @code{?\xe0} for the Latin-1 character
+@cindex unicode character escape
+ Firstly, you can specify characters by their Unicode values.
+@code{?\u@var{nnnn}} represents a character with Unicode code point
+@samp{U+@var{nnnn}}, where @var{nnnn} is (by convention) a hexadecimal
+number with exactly four digits. The backslash indicates that the
+subsequent characters form an escape sequence, and the @samp{u}
+specifies a Unicode escape sequence.
+
+ There is a slightly different syntax for specifying Unicode
+characters with code points higher than @code{U+@var{ffff}}:
+@code{?\U00@var{nnnnnn}} represents the character with code point
+@samp{U+@var{nnnnnn}}, where @var{nnnnnn} is a six-digit hexadecimal
+number. The Unicode Standard only defines code points up to
+@samp{U+@var{10ffff}}, so if you specify a code point higher than
+that, Emacs signals an error.
+
+ Secondly, you can specify characters by their hexadecimal character
+codes. A hexadecimal escape sequence consists of a backslash,
+@samp{x}, and the hexadecimal character code. Thus, @samp{?\x41} is
+the character @kbd{A}, @samp{?\x1} is the character @kbd{C-a}, and
+@code{?\xe0} is the character
 @iftex
 @samp{@`a}.
 @end iftex
 @ifnottex
 @samp{a} with grave accent.
 @end ifnottex
+You can use any number of hex digits, so you can represent any
+character code in this way.
+
+@cindex octal character code
+ Thirdly, you can specify characters by their character code in
+octal. An octal escape sequence consists of a backslash followed by
+up to three octal digits; thus, @samp{?\101} for the character
+@kbd{A}, @samp{?\001} for the character @kbd{C-a}, and @code{?\002}
+for the character @kbd{C-b}. Only characters up to octal code 777 can
+be specified this way.
+
+ These escape sequences may also be used in strings. @xref{Non-ASCII
+in Strings}.

 @node Ctl-Char Syntax
 @subsubsection Control-Character Syntax
@@ -568,8 +565,8 @@ Lisp, upper case and lower case letters are distinct.
 @end quotation

 Here are several examples of symbol names. Note that the @samp{+} in
-the fifth example is escaped to prevent it from being read as a number.
-This is not necessary in the fourth example because the rest of the name
+the fourth example is escaped to prevent it from being read as a number.
+This is not necessary in the sixth example because the rest of the name
 makes it invalid as a number.

 @example
@@ -1026,40 +1023,53 @@ but the newline is ignored if escaped."
 @node Non-ASCII in Strings
 @subsubsection Non-@acronym{ASCII} Characters in Strings

- You can include a non-@acronym{ASCII} international character in a
-string constant by writing it literally. There are two text
-representations for non-@acronym{ASCII} characters in Emacs strings
-(and in buffers): unibyte and multibyte (@pxref{Text
-Representations}). If the string constant is read from a multibyte
-source, such as a multibyte buffer or string, or a file that would be
-visited as multibyte, then Emacs reads the non-@acronym{ASCII}
-character as a multibyte character and automatically makes the string
-a multibyte string. If the string constant is read from a unibyte
-source, then Emacs reads the non-@acronym{ASCII} character as unibyte,
-and makes the string unibyte.
-
- Instead of writing a non-@acronym{ASCII} character literally into a
-multibyte string, you can write it as its character code using a hex
-escape, @samp{\x@var{nnnnnnn}}, with as many digits as necessary.
-(Multibyte non-@acronym{ASCII} character codes are all greater than
-256.) You can also specify a character in a multibyte string using
-the @samp{\u} or @samp{\U} Unicode escape syntax (@pxref{General
-Escape Syntax}). In either case, any character which is not a valid
-hex digit terminates the construct. If the next character in the
-string could be interpreted as a hex digit, write @w{@samp{\ }}
-(backslash and space) to terminate the hex escape---for example,
+ There are two text representations for non-@acronym{ASCII}
+characters in Emacs strings: multibyte and unibyte (@pxref{Text
+Representations}). Roughly speaking, unibyte strings store raw bytes,
+while multibyte strings store human-readable text. Each character in
+a unibyte string is a byte, i.e., its value is between 0 and 255. By
+contrast, each character in a multibyte string may have a value
+between 0 to 4194303 (@pxref{Character Type}). In both cases,
+characters above 127 are non-@acronym{ASCII}.
+
+ You can include a non-@acronym{ASCII} character in a string constant
+by writing it literally. If the string constant is read from a
+multibyte source, such as a multibyte buffer or string, or a file that
+would be visited as multibyte, then Emacs reads each
+non-@acronym{ASCII} character as a multibyte character and
+automatically makes the string a multibyte string. If the string
+constant is read from a unibyte source, then Emacs reads the
+non-@acronym{ASCII} character as unibyte, and makes the string
+unibyte.
+
+ Instead of writing a character literally into a multibyte string,
+you can write it as its character code using an escape sequence.
+@xref{General Escape Syntax}, for details about escape sequences.
+
+ If you use any Unicode-style escape sequence @samp{\uNNNN} or
+@samp{\U00NNNNNN} in a string constant (even for an @acronym{ASCII}
+character), Emacs automatically assumes that it is multibyte.
+
+ You can also use hexadecimal escape sequences (@samp{\x@var{n}}) and
+octal escape sequences (@samp{\@var{n}}) in string constants.
+@strong{But beware:} If a string constant contains hexadecimal or
+octal escape sequences, and these escape sequences all specify unibyte
+characters (i.e., less than 256), and there are no other literal
+non-@acronym{ASCII} characters or Unicode-style escape sequences in
+the string, then Emacs automatically assumes that it is a unibyte
+string. That is to say, it assumes that all non-@acronym{ASCII}
+characters occurring in the string are 8-bit raw bytes.
+
+ In hexadecimal and octal escape sequences, the escaped character
+code may contain a variable number of digits, so the first subsequent
+character which is not a valid hexadecimal or octal digit terminates
+the escape sequence. If the next character in a string could be
+interpreted as a hexadecimal or octal digit, write @w{@samp{\ }}
+(backslash and space) to terminate the escape sequence. For example,
 @w{@samp{\xe0\ }} represents one character, @samp{a} with grave
 accent. @w{@samp{\ }} in a string constant is just like
 backslash-newline; it does not contribute any character to the string,
-but it does terminate the preceding hex escape. Using any hex escape
-in a string (even for an @acronym{ASCII} character) automatically
-forces the string to be multibyte.
-
- You can represent a unibyte non-@acronym{ASCII} character with its
-character code, which must be in the range from 128 (0200 octal) to
-255 (0377 octal). If you write all such character codes in octal and
-the string contains no other characters forcing it to be multibyte,
-this produces a unibyte string.
+but it does terminate any preceding hex escape.

 @node Nonprinting Characters
 @subsubsection Nonprinting Characters in Strings
@@ -1167,8 +1177,10 @@ inherit from, a default value, and a small number of extra slots to
 use for special purposes. A char-table can also specify a single value for
 a whole character set.

+@cindex @samp{#^} read syntax
 The printed representation of a char-table is like a vector
-except that there is an extra @samp{#^} at the beginning.
+except that there is an extra @samp{#^} at the beginning.@footnote{You
+may also encounter @samp{#^^}, used for ``sub-char-tables''.}
 @xref{Char-Tables}, for special functions to operate on char-tables.
 Uses of char-tables include:

@@ -1289,7 +1301,7 @@ called @dfn{subrs} or @dfn{built-in functions}. (The word ``subr'' is
 derived from ``subroutine''.) Most primitive functions evaluate all
 their arguments when they are called. A primitive function that does
 not evaluate all its arguments is called a @dfn{special form}
-(@pxref{Special Forms}).@refill
+(@pxref{Special Forms}).

 It does not matter to the caller of a function whether the function is
 primitive. However, this does matter if you try to redefine a primitive
@@ -1300,7 +1312,7 @@ may still use the built-in definition. Therefore, @strong{we discourage
 redefinition of primitive functions}.

 The term @dfn{function} refers to all Emacs functions, whether written
-in Lisp or C. @xref{Function Type}, for information about the
+in Lisp or C@. @xref{Function Type}, for information about the
 functions written in Lisp.

 Primitive functions have no read syntax and print in hash notation
@@ -1924,7 +1936,7 @@ This function returns a symbol naming the primitive type of

 Here we describe functions that test for equality between two
 objects. Other functions test equality of contents between objects of
-specific types, e.g.@: strings. For these predicates, see the
+specific types, e.g., strings. For these predicates, see the
 appropriate chapter describing the data type.

 @defun eq object1 object2
@@ -1932,10 +1944,10 @@ This function returns @code{t} if @var{object1} and @var{object2} are
 the same object, and @code{nil} otherwise.
 If @var{object1} and @var{object2} are integers with the same value,
-they are considered to be the same object (i.e.@: @code{eq} returns
+they are considered to be the same object (i.e., @code{eq} returns
 @code{t}). If @var{object1} and @var{object2} are symbols with the
 same name, they are normally the same object---but see @ref{Creating
-Symbols} for exceptions. For other types (e.g.@: lists, vectors,
+Symbols} for exceptions. For other types (e.g., lists, vectors,
 strings), two arguments with the same contents or elements are not
 necessarily @code{eq} to each other: they are @code{eq} only if they
 are the same object, meaning that a change in the contents of one will