Make verify-visited-file-modtime default to the current buffer.
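
As a rough check of the width and flag behavior documented in the
Formatting Strings hunk below, the following Emacs Lisp sketch can be
evaluated in a scratch buffer.  It is an illustration only and is not
part of the patch; the values in the comments assume the flag
semantics described in that text.

```elisp
;; Illustration only: width and flag handling in `format', assuming
;; the semantics described in the Formatting Strings text.
(format "%5d" 123)    ; => "  123"   width pads with spaces on the left
(format "%06d" 123)   ; => "000123"  `0' flag pads with zeros
(format "%-6d" 123)   ; => "123   "  `-' flag pads on the right
(format "%+d" 123)    ; => "+123"    `+' flag forces a sign
(format "% d" 123)    ; => " 123"    space flag reserves a sign column
(format "%#x" 255)    ; => "0xff"    `#' flag adds the hex prefix
(format "%#o" 8)      ; => "010"     `#' flag adds a leading zero
;; A width is a minimum, so a longer string is never truncated.
(format "%7s" "specification") ; => "specification"
```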

diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi

index f119b3ab84b61497942d3e24592239d3d256f5af..94d2765a833dcc570c67e008914e45e21e41ddde 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -1,7 +1,8 @@
  @c -*-texinfo-*-
  @c This is part of the GNU Emacs Lisp Reference Manual.
  @c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2001,
-@c   2002, 2003, 2004, 2005, 2006, 2007  Free Software Foundation, Inc.
+@c   2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+@c   Free Software Foundation, Inc.
  @c See the file elisp.texi for copying conditions.
  @setfilename ../../info/strings
  @node Strings and Characters, Lists, Numbers, Top
@@ -31,7 +32,7 @@ keyboard character events.
  * String Conversion::         Converting to and from characters and strings.
  * Formatting Strings::        @code{format}: Emacs's analogue of @code{printf}.
  * Case Conversion::           Case conversion functions.
-* Case Tables::                      Customizing case conversion.
+* Case Tables::               Customizing case conversion.
  @end menu
  
  @node String Basics
@@ -39,7 +40,8 @@ keyboard character events.
  
    Characters are represented in Emacs Lisp as integers;
  whether an integer is a character or not is determined only by how it is
-used.  Thus, strings really contain integers.
+used.  Thus, strings really contain integers.  @xref{Character Codes},
+for details about character representation in Emacs.
  
    The length of a string (like any array) is fixed, and cannot be
  altered once the string exists.  Strings in Lisp are @emph{not}
@@ -54,24 +56,19 @@ and @code{aset} (@pxref{Array Functions}).
  
    There are two text representations for non-@acronym{ASCII} characters in
  Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text
-Representations}).  An @acronym{ASCII} character always occupies one byte in a
-string; in fact, when a string is all @acronym{ASCII}, there is no real
-difference between the unibyte and multibyte representations.
-For most Lisp programming, you don't need to be concerned with these two
-representations.
-
-  Sometimes key sequences are represented as strings.  When a string is
-a key sequence, string elements in the range 128 to 255 represent meta
-characters (which are large integers) rather than character
-codes in the range 128 to 255.
-
-  Strings cannot hold characters that have the hyper, super or alt
-modifiers; they can hold @acronym{ASCII} control characters, but no other
-control characters.  They do not distinguish case in @acronym{ASCII} control
-characters.  If you want to store such characters in a sequence, such as
-a key sequence, you must use a vector instead of a string.
-@xref{Character Type}, for more information about the representation of meta
-and other modifiers for keyboard input characters.
+Representations}).  For most Lisp programming, you don't need to be
+concerned with these two representations.
+
+  Sometimes key sequences are represented as unibyte strings.  When a
+unibyte string is a key sequence, string elements in the range 128 to
+255 represent meta characters (which are large integers) rather than
+character codes in the range 128 to 255.  Strings cannot hold
+characters that have the hyper, super or alt modifiers; they can hold
+@acronym{ASCII} control characters, but no other control characters.
+They do not distinguish case in @acronym{ASCII} control characters.
+If you want to store such characters in a sequence, such as a key
+sequence, you must use a vector instead of a string.  @xref{Character
+Type}, for more information about keyboard input characters.
  
    Strings are useful for holding regular expressions.  You can also
  match regular expressions against strings with @code{string-match}
@@ -103,8 +100,8 @@ otherwise.
  @end defun
  
  @defun string-or-null-p object
-This function returns @code{t} if @var{object} is a string or nil,
-@code{nil} otherwise.
+This function returns @code{t} if @var{object} is a string or
+@code{nil}.  It returns @code{nil} otherwise.
  @end defun
  
  @defun char-or-string-p object
@@ -129,9 +126,8 @@ This function returns a string made up of @var{count} repetitions of
       @result{} ""
  @end example
  
-  Other functions to compare with this one include @code{char-to-string}
-(@pxref{String Conversion}), @code{make-vector} (@pxref{Vectors}), and
-@code{make-list} (@pxref{Building Lists}).
+  Other functions to compare with this one include @code{make-vector}
+(@pxref{Vectors}) and @code{make-list} (@pxref{Building Lists}).
  @end defun
  
  @defun string &rest characters
@@ -157,11 +153,11 @@ index @var{start} up to (but excluding) the character at the index
  @end example
  
  @noindent
-Here the index for @samp{a} is 0, the index for @samp{b} is 1, and the
-index for @samp{c} is 2.  Thus, three letters, @samp{abc}, are copied
-from the string @code{"abcdefg"}.  The index 3 marks the character
-position up to which the substring is copied.  The character whose index
-is 3 is actually the fourth character in the string.
+In the above example, the index for @samp{a} is 0, the index for
+@samp{b} is 1, and the index for @samp{c} is 2.  The index 3---which
+is the fourth character in the string---marks the character position
+up to which the substring is copied.  Thus, @samp{abc} is copied from
+the string @code{"abcdefg"}.
  
  A negative number counts from the end of the string, so that @minus{}1
  signifies the index of the last character of the string.  For example:
@@ -258,38 +254,32 @@ returns an empty string.
  @end example
  
  @noindent
-The @code{concat} function always constructs a new string that is
-not @code{eq} to any existing string, except when the result is empty
-(since empty strings are canonicalized to save space).
-
-In Emacs versions before 21, when an argument was an integer (not a
-sequence of integers), it was converted to a string of digits making up
-the decimal printed representation of the integer.  This obsolete usage
-no longer works.  The proper way to convert an integer to its decimal
-printed form is with @code{format} (@pxref{Formatting Strings}) or
-@code{number-to-string} (@pxref{String Conversion}).
+This function always constructs a new string that is not @code{eq} to
+any existing string, except when the result is the empty string (to
+save space, Emacs makes only one empty multibyte string).
  
  For information about other concatenation functions, see the
  description of @code{mapconcat} in @ref{Mapping Functions},
  @code{vconcat} in @ref{Vector Functions}, and @code{append} in @ref{Building
-Lists}.
+Lists}.  For concatenating individual command-line arguments into a
+string to be used as a shell command, see @ref{Shell Arguments,
+combine-and-quote-strings}.
  @end defun
  
  @defun split-string string &optional separators omit-nulls
-This function splits @var{string} into substrings at matches for the
-regular expression @var{separators}.  Each match for @var{separators}
-defines a splitting point; the substrings between the splitting points
-are made into a list, which is the value returned by
-@code{split-string}.
+This function splits @var{string} into substrings based on the regular
+expression @var{separators} (@pxref{Regular Expressions}).  Each match
+for @var{separators} defines a splitting point; the substrings between
+splitting points are made into a list, which is returned.
  
-If @var{omit-nulls} is @code{nil}, the result contains null strings
-whenever there are two consecutive matches for @var{separators}, or a
-match is adjacent to the beginning or end of @var{string}.  If
-@var{omit-nulls} is @code{t}, these null strings are omitted from the
-result.
+If @var{omit-nulls} is @code{nil} (or omitted), the result contains
+null strings whenever there are two consecutive matches for
+@var{separators}, or a match is adjacent to the beginning or end of
+@var{string}.  If @var{omit-nulls} is @code{t}, these null strings are
+omitted from the result.
  
-If @var{separators} is @code{nil} (or omitted),
-the default is the value of @code{split-string-default-separators}.
+If @var{separators} is @code{nil} (or omitted), the default is the
+value of @code{split-string-default-separators}.
  
  As a special case, when @var{separators} is @code{nil} (or omitted),
  null strings are always omitted from the result.  Thus:
@@ -357,6 +347,10 @@ practice:
  (split-string "ooo" "\\|o+" t)
       @result{} ("o" "o" "o")
  @end example
+
+If you need to split a string that is a shell command, where
+individual arguments could be quoted, see @ref{Shell Arguments,
+split-string-and-unquote}.
  @end defun
  
  @defvar split-string-default-separators
@@ -437,9 +431,9 @@ For technical reasons, a unibyte and a multibyte string are
  @code{equal} if and only if they contain the same sequence of
  character codes and all these codes are either in the range 0 through
  127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}).
-However, when a unibyte string gets converted to a multibyte string,
-all characters with codes in the range 160 through 255 get converted
-to characters with higher codes, whereas @acronym{ASCII} characters
+However, when a unibyte string is converted to a multibyte string, all
+characters with codes in the range 160 through 255 are converted to
+characters with higher codes, whereas @acronym{ASCII} characters
  remain unchanged.  Thus, a unibyte string and its conversion to
  multibyte are only @code{equal} if the string is all @acronym{ASCII}.
  Character codes 160 through 255 are not entirely proper in multibyte
@@ -545,7 +539,7 @@ be a list of strings or symbols rather than an actual alist.
  @xref{Association Lists}.
  @end defun
  
-  See also the @code{compare-buffer-substrings} function in
+  See also the function @code{compare-buffer-substrings} in
  @ref{Comparing Text}, for a way to compare text in buffers.  The
  function @code{string-match}, which matches a regular expression
  against a string, can be used for a kind of string comparison; see
@@ -556,52 +550,20 @@ against a string, can be used for a kind of string comparison; see
  @section Conversion of Characters and Strings
  @cindex conversion of strings
  
-  This section describes functions for conversions between characters,
-strings and integers.  @code{format} (@pxref{Formatting Strings})
-and @code{prin1-to-string}
-(@pxref{Output Functions}) can also convert Lisp objects into strings.
-@code{read-from-string} (@pxref{Input Functions}) can ``convert'' a
-string representation of a Lisp object into an object.  The functions
-@code{string-make-multibyte} and @code{string-make-unibyte} convert the
-text representation of a string (@pxref{Converting Representations}).
+  This section describes functions for converting between characters,
+strings and integers.  @code{format} (@pxref{Formatting Strings}) and
+@code{prin1-to-string} (@pxref{Output Functions}) can also convert
+Lisp objects into strings.  @code{read-from-string} (@pxref{Input
+Functions}) can ``convert'' a string representation of a Lisp object
+into an object.  The functions @code{string-make-multibyte} and
+@code{string-make-unibyte} convert the text representation of a string
+(@pxref{Converting Representations}).
  
    @xref{Documentation}, for functions that produce textual descriptions
  of text characters and general input events
  (@code{single-key-description} and @code{text-char-description}).  These
  are used primarily for making help messages.
  
-@defun char-to-string character
-@cindex character to string
-This function returns a new string containing one character,
-@var{character}.  This function is semi-obsolete because the function
-@code{string} is more general.  @xref{Creating Strings}.
-@end defun
-
-@defun string-to-char string
-@cindex string to character
-  This function returns the first character in @var{string}.  If the
-string is empty, the function returns 0.  The value is also 0 when the
-first character of @var{string} is the null character, @acronym{ASCII} code
-0.
-
-@example
-(string-to-char "ABC")
-     @result{} 65
-
-(string-to-char "xyz")
-     @result{} 120
-(string-to-char "")
-     @result{} 0
-@group
-(string-to-char "\000")
-     @result{} 0
-@end group
-@end example
-
-This function may be eliminated in the future if it does not seem useful
-enough to retain.
-@end defun
-
  @defun number-to-string number
  @cindex integer to string
  @cindex integer to decimal
@@ -662,21 +624,41 @@ this function returns 0.
  
  @findex string-to-int
  @code{string-to-int} is an obsolete alias for this function.
+@end defun
+
+@defun char-to-string character
+@cindex character to string
+This function returns a new string containing one character,
+@var{character}.  This function is semi-obsolete because the function
+@code{string} is more general.  @xref{Creating Strings}.
+@end defun
+
+@defun string-to-char string
+  This function returns the first character in @var{string}.  This is
+mostly identical to @code{(aref string 0)}, except that it returns 0
+if the string is empty.  (The value is also 0 when the first character
+of @var{string} is the null character, @acronym{ASCII} code 0.)  This
+function may be eliminated in the future if it does not seem useful
+enough to retain.
  @end defun
  
    Here are some other functions that can convert to or from a string:
  
  @table @code
  @item concat
-@code{concat} can convert a vector or a list into a string.
+This function converts a vector or a list into a string.
  @xref{Creating Strings}.
  
  @item vconcat
-@code{vconcat} can convert a string into a vector.  @xref{Vector
+This function converts a string into a vector.  @xref{Vector
  Functions}.
  
  @item append
-@code{append} can convert a string into a list.  @xref{Building Lists}.
+This function converts a string into a list.  @xref{Building Lists}.
+
+@item byte-to-string
+This function converts a byte of character data into a unibyte string.
+@xref{Converting Representations}.
  @end table
  
  @node Formatting Strings
@@ -685,10 +667,10 @@ Functions}.
  @cindex formatting strings
  @cindex strings, formatting them
  
-  @dfn{Formatting} means constructing a string by substitution of
-computed values at various places in a constant string.  This constant string
-controls how the other values are printed, as well as where they appear;
-it is called a @dfn{format string}.
+  @dfn{Formatting} means constructing a string by substituting
+computed values at various places in a constant string.  This constant
+string controls how the other values are printed, as well as where
+they appear; it is called a @dfn{format string}.
  
    Formatting is often useful for computing messages to be displayed.  In
  fact, the functions @code{message} and @code{error} provide the same
@@ -823,58 +805,80 @@ operation} error.
  
  @cindex field width
  @cindex padding
-  A specification can have a @dfn{width}, which is a signed decimal
-number between the @samp{%} and the specification character.  If the
-printed representation of the object contains fewer characters than
-this width, @code{format} extends it with padding.  The padding goes
-on the left if the width is positive (or starts with zero) and on the
-right if the width is negative.  The padding character is normally a
-space, but it's @samp{0} if the width starts with a zero.
-
-  Some of these conventions are ignored for specification characters
-for which they do not make sense.  That is, @samp{%s}, @samp{%S} and
-@samp{%c} accept a width starting with 0, but still pad with
-@emph{spaces} on the left.  Also, @samp{%%} accepts a width, but
-ignores it.  Here are some examples of padding:
+  A specification can have a @dfn{width}, which is a decimal number
+between the @samp{%} and the specification character.  If the printed
+representation of the object contains fewer characters than this
+width, @code{format} extends it with padding.  The width specifier is
+ignored for the @samp{%%} specification.  Any padding introduced by
+the width specifier normally consists of spaces inserted on the left:
  
  @example
-(format "%06d is padded on the left with zeros" 123)
-     @result{} "000123 is padded on the left with zeros"
-
-(format "%-6d is padded on the right" 123)
-     @result{} "123    is padded on the right"
+(format "%5d is padded on the left with spaces" 123)
+     @result{} "  123 is padded on the left with spaces"
  @end example
  
  @noindent
  If the width is too small, @code{format} does not truncate the
  object's printed representation.  Thus, you can use a width to specify
  a minimum spacing between columns with no risk of losing information.
+In the following two examples, @samp{%7s} specifies a minimum width
+of 7.  In the first case, the string inserted in place of @samp{%7s}
+has only 3 letters, and needs 4 blank spaces as padding.  In the
+second case, the string @code{"specification"} is 13 letters wide but
+is not truncated.
  
-  In the following three examples, @samp{%7s} specifies a minimum
-width of 7.  In the first case, the string inserted in place of
-@samp{%7s} has only 3 letters, it needs 4 blank spaces as padding.  In
-the second case, the string @code{"specification"} is 13 letters wide
-but is not truncated.  In the third case, the padding is on the right.
-
-@smallexample
+@example
  @group
  (format "The word `%7s' actually has %d letters in it."
          "foo" (length "foo"))
       @result{} "The word `    foo' actually has 3 letters in it."
-@end group
-
-@group
  (format "The word `%7s' actually has %d letters in it."
          "specification" (length "specification"))
       @result{} "The word `specification' actually has 13 letters in it."
  @end group
+@end example
+
+@cindex flags in format specifications
+  Immediately after the @samp{%} and before the optional width
+specifier, you can also put certain @dfn{flag characters}.
+
+  The flag @samp{+} inserts a plus sign before a positive number, so
+that it always has a sign.  A space character as flag inserts a space
+before a positive number.  (Otherwise, positive numbers start with the
+first digit.)  These flags are useful for ensuring that positive
+numbers and negative numbers use the same number of columns.  They are
+ignored except for @samp{%d}, @samp{%e}, @samp{%f} and @samp{%g}; if
+both flags are used, @samp{+} takes precedence.
+
+  The flag @samp{#} specifies an ``alternate form'' which depends on
+the format in use.  For @samp{%o}, it ensures that the result begins
+with a @samp{0}.  For @samp{%x} and @samp{%X}, it prefixes the result
+with @samp{0x} or @samp{0X}.  For @samp{%e}, @samp{%f}, and @samp{%g},
+the @samp{#} flag means include a decimal point even if the precision
+is zero.
  
+  The flag @samp{-} causes the padding inserted by the width
+specifier, if any, to be inserted on the right rather than the left.
+The flag @samp{0} ensures that the padding consists of @samp{0}
+characters instead of spaces, inserted on the left.  These flags are
+ignored for specification characters for which they do not make sense:
+@samp{%s}, @samp{%S} and @samp{%c} accept the @samp{0} flag, but still
+pad with @emph{spaces} on the left.  If both @samp{-} and @samp{0} are
+present and valid, @samp{-} takes precedence.
+
+@example
  @group
+(format "%06d is padded on the left with zeros" 123)
+     @result{} "000123 is padded on the left with zeros"
+
+(format "%-6d is padded on the right" 123)
+     @result{} "123    is padded on the right"
+
  (format "The word `%-7s' actually has %d letters in it."
          "foo" (length "foo"))
       @result{} "The word `foo    ' actually has 3 letters in it."
  @end group
-@end smallexample
+@end example
  
  @cindex precision in format specifications
    All the specification characters allow an optional @dfn{precision}
@@ -888,25 +892,6 @@ shows only the first three characters of the representation for
  @var{object}.  Precision has no effect for other specification
  characters.
  
-@cindex flags in format specifications
-  Immediately after the @samp{%} and before the optional width and
-precision, you can put certain ``flag'' characters.
-
-  @samp{+} as a flag inserts a plus sign before a positive number, so
-that it always has a sign.  A space character as flag inserts a space
-before a positive number.  (Otherwise, positive numbers start with the
-first digit.)  Either of these two flags ensures that positive numbers
-and negative numbers use the same number of columns.  These flags are
-ignored except for @samp{%d}, @samp{%e}, @samp{%f}, @samp{%g}, and if
-both flags are used, the @samp{+} takes precedence.
-
-  The flag @samp{#} specifies an ``alternate form'' which depends on
-the format in use.  For @samp{%o} it ensures that the result begins
-with a @samp{0}.  For @samp{%x} and @samp{%X}, it prefixes the result
-with @samp{0x} or @samp{0X}.  For @samp{%e}, @samp{%f}, and @samp{%g},
-the @samp{#} flag means include a decimal point even if the precision
-is zero.
-
  @node Case Conversion
  @comment node-name, next, previous, up
  @section Case Conversion in Lisp
@@ -929,15 +914,15 @@ arguments.
  @acronym{ASCII} codes 88 and 120 respectively.
  
  @defun downcase string-or-char
-This function converts a character or a string to lower case.
+This function converts @var{string-or-char}, which should be either a
+character or a string, to lower case.
  
-When the argument to @code{downcase} is a string, the function creates
-and returns a new string in which each letter in the argument that is
-upper case is converted to lower case.  When the argument to
-@code{downcase} is a character, @code{downcase} returns the
-corresponding lower case character.  This value is an integer.  If the
-original character is lower case, or is not a letter, then the value
-equals the original character.
+When @var{string-or-char} is a string, this function returns a new
+string in which each letter in the argument that is upper case is
+converted to lower case.  When @var{string-or-char} is a character,
+this function returns the corresponding lower case character (an
+integer); if the original character is lower case, or is not a letter,
+the return value is equal to the original character.
  
  @example
  (downcase "The cat in the hat")
@@ -949,16 +934,15 @@ equals the original character.
  @end defun
  
  @defun upcase string-or-char
-This function converts a character or a string to upper case.
-
-When the argument to @code{upcase} is a string, the function creates
-and returns a new string in which each letter in the argument that is
-lower case is converted to upper case.
+This function converts @var{string-or-char}, which should be either a
+character or a string, to upper case.
  
-When the argument to @code{upcase} is a character, @code{upcase}
-returns the corresponding upper case character.  This value is an integer.
-If the original character is upper case, or is not a letter, then the
-value returned equals the original character.
+When @var{string-or-char} is a string, this function returns a new
+string in which each letter in the argument that is lower case is
+converted to upper case.  When @var{string-or-char} is a character,
+this function returns the corresponding upper case character (an
+integer); if the original character is upper case, or is not a letter,
+the return value is equal to the original character.
  
  @example
  (upcase "The cat in the hat")
@@ -972,9 +956,9 @@ value returned equals the original character.
  @defun capitalize string-or-char
  @cindex capitalization
  This function capitalizes strings or characters.  If
-@var{string-or-char} is a string, the function creates and returns a new
-string, whose contents are a copy of @var{string-or-char} in which each
-word has been capitalized.  This means that the first character of each
+@var{string-or-char} is a string, the function returns a new string
+whose contents are a copy of @var{string-or-char} in which each word
+has been capitalized.  This means that the first character of each
  word is converted to upper case, and the rest are converted to lower
  case.
  
@@ -982,8 +966,8 @@ The definition of a word is any sequence of consecutive characters that
  are assigned to the word constituent syntax class in the current syntax
  table (@pxref{Syntax Class Table}).
  
-When the argument to @code{capitalize} is a character, @code{capitalize}
-has the same result as @code{upcase}.
+When @var{string-or-char} is a character, this function does the same
+thing as @code{upcase}.
  
  @example
  @group
@@ -1077,13 +1061,13 @@ equivalent).  (For ordinary @acronym{ASCII}, this would map @samp{a} into
  @samp{A} and @samp{A} into @samp{a}, and likewise for each set of
  equivalent characters.)
  
-  When you construct a case table, you can provide @code{nil} for
+  When constructing a case table, you can provide @code{nil} for
  @var{canonicalize}; then Emacs fills in this slot from the lower case
  and upper case mappings.  You can also provide @code{nil} for
  @var{equivalences}; then Emacs fills in this slot from
  @var{canonicalize}.  In a case table that is actually in use, those
-components are non-@code{nil}.  Do not try to specify @var{equivalences}
-without also specifying @var{canonicalize}.
+components are non-@code{nil}.  Do not try to specify
+@var{equivalences} without also specifying @var{canonicalize}.
  
    Here are the functions for working with case tables:
  
@@ -1118,7 +1102,7 @@ of an abnormal exit via @code{throw} or error (@pxref{Nonlocal
  Exits}).
  @end defmac
  
-  Some language environments may modify the case conversions of
+  Some language environments modify the case conversions of
  @acronym{ASCII} characters; for example, in the Turkish language
  environment, the @acronym{ASCII} character @samp{I} is downcased into
  a Turkish ``dotless i''.  This can interfere with code that requires