@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2003
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2001,
+@c 2002, 2003, 2004, 2005, 2006, 2007 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/strings
@node Strings and Characters, Lists, Numbers, Top
and other modifiers for keyboard input characters.
Strings are useful for holding regular expressions. You can also
-match regular expressions against strings (@pxref{Regexp Search}). The
-functions @code{match-string} (@pxref{Simple Match Data}) and
-@code{replace-match} (@pxref{Replacing Match}) are useful for
-decomposing and modifying strings based on regular expression matching.
+match regular expressions against strings with @code{string-match}
+(@pxref{Regexp Search}). The functions @code{match-string}
+(@pxref{Simple Match Data}) and @code{replace-match} (@pxref{Replacing
+Match}) are useful for decomposing and modifying strings after
+matching regular expressions against them.
Like a buffer, a string can contain text properties for the characters
in it, as well as the characters themselves. @xref{Text Properties}.
otherwise.
@end defun
+@defun string-or-null-p object
+This function returns @code{t} if @var{object} is a string or
+@code{nil}, @code{nil} otherwise.
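+For example, @code{nil} passes this test, but other symbols do not:
+
+@example
+(string-or-null-p "abc")
+     @result{} t
+(string-or-null-p nil)
+     @result{} t
+(string-or-null-p 'abc)
+     @result{} nil
+@end example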
+@end defun
+
@defun char-or-string-p object
This function returns @code{t} if @var{object} is a string or a
character (i.e., an integer), @code{nil} otherwise.
whenever there are two consecutive matches for @var{separators}, or a
match is adjacent to the beginning or end of @var{string}. If
@var{omit-nulls} is @code{t}, these null strings are omitted from the
-result list.
+result.
If @var{separators} is @code{nil} (or omitted),
the default is the value of @code{split-string-default-separators}.
@result{} ("two" "words")
@end example
-The result is not @samp{("" "two" "words" "")}, which would rarely be
-useful. If you need such a result, use an explict value for
+The result is not @code{("" "two" "words" "")}, which would rarely be
+useful. If you need such a result, use an explicit value for
@var{separators}:
@example
-(split-string " two words " split-string-default-separators)
+(split-string " two words "
+              split-string-default-separators)
@result{} ("" "two" "words" "")
@end example
@end defun
@defvar split-string-default-separators
-The default value of @var{separators} for @code{split-string}, initially
-@w{@samp{"[ \f\t\n\r\v]+"}}.
+The default value of @var{separators} for @code{split-string}. Its
+usual value is @w{@code{"[ \f\t\n\r\v]+"}}.
@end defvar
@node Modifying Strings
@code{clear-string}:
@defun clear-string string
-This clears the contents of @var{string} to zeros
-and may change its length.
+This makes @var{string} a unibyte string and clears its contents to
+zeros. It may also change @var{string}'s length.
@end defun
@need 2000
This function works like @code{assoc}, except that @var{key} must be a
string, and comparison is done using @code{compare-strings}. If
@var{case-fold} is non-@code{nil}, it ignores case differences.
+Unlike @code{assoc}, this function can also match elements of the alist
+that are strings rather than conses. In particular, @var{alist} can
+be a list of strings rather than an actual alist.
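+For example:
+
+@example
+(assoc-string "foo" '(("bar" . 1) ("FOO" . 2)) t)
+     @result{} ("FOO" . 2)
+(assoc-string "foo" '("bar" "FOO") t)
+     @result{} "FOO"
+(assoc-string "foo" '("bar" "FOO"))
+     @result{} nil
+@end example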
@xref{Association Lists}.
@end defun
- See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for
-a way to compare text in buffers. The function @code{string-match},
-which matches a regular expression against a string, can be used
-for a kind of string comparison; see @ref{Regexp Search}.
+ See also the @code{compare-buffer-substrings} function in
+@ref{Comparing Text}, for a way to compare text in buffers. The
+function @code{string-match}, which matches a regular expression
+against a string, can be used for a kind of string comparison; see
+@ref{Regexp Search}.
@node String Conversion
@comment node-name, next, previous, up
@cindex conversion of strings
This section describes functions for conversions between characters,
-strings and integers. @code{format} and @code{prin1-to-string}
+strings and integers. @code{format} (@pxref{Formatting Strings})
+and @code{prin1-to-string}
(@pxref{Output Functions}) can also convert Lisp objects into strings.
@code{read-from-string} (@pxref{Input Functions}) can ``convert'' a
string representation of a Lisp object into an object. The functions
@xref{Documentation}, for functions that produce textual descriptions
of text characters and general input events
(@code{single-key-description} and @code{text-char-description}). These
-functions are used primarily for making help messages.
+are used primarily for making help messages.
@defun char-to-string character
@cindex character to string
arguments @var{objects} are the computed values to be formatted.
The characters in @var{string}, other than the format specifications,
-are copied directly into the output; starting in Emacs 21, if they have
-text properties, these are copied into the output also.
+are copied directly into the output, including their text properties,
+if any.
@end defun
@cindex @samp{%} in format
@end group
@end example
+ Since @code{format} interprets @samp{%} characters as format
+specifications, you should @emph{never} pass an arbitrary string as
+the first argument. This is particularly true when the string is
+generated by some Lisp code. Unless the string is @emph{known} to
+never include any @samp{%} characters, pass @code{"%s"}, described
+below, as the first argument, and the string as the second, like this:
+
+@example
+ (format "%s" @var{arbitrary-string})
+@end example
+
If @var{string} contains more than one format specification, the
format specifications correspond to successive values from
@var{objects}. Thus, the first format specification in @var{string}
uses the first such value, the second format specification uses the
second such value, and so on. Any extra format specifications (those
-for which there are no corresponding values) cause unpredictable
-behavior. Any extra values to be formatted are ignored.
+for which there are no corresponding values) cause an error. Any
+extra values to be formatted are ignored.
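+For example, the third value here is ignored, while a missing value
+signals an error:
+
+@example
+(format "%s and %s" "apples" "pears" "plums")
+     @result{} "apples and pears"
+(format "%s and %s" "apples")
+     @error{} Not enough arguments for format string
+@end example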
Certain format specifications require values of particular types. If
you supply a value that doesn't fit the requirements, an error is
by their contents alone, with no @samp{"} characters, and symbols appear
without @samp{\} characters.
-Starting in Emacs 21, if the object is a string, its text properties are
+If the object is a string, its text properties are
copied into the output. The text properties of the @samp{%s} itself
are also copied, but those of the object take priority.
@end group
@end example
-@cindex numeric prefix
@cindex field width
@cindex padding
- All the specification characters allow an optional ``width'', which
+ All the specification characters allow an optional ``width,'' which
is a digit-string between the @samp{%} and the character. If the
printed representation of the object contains fewer characters than
this width, then it is padded. The padding is on the left if the
-prefix is positive (or starts with zero) and on the right if the
-prefix is negative. The padding character is normally a space, but if
+width is positive (or starts with zero) and on the right if the
+width is negative. The padding character is normally a space, but if
the width starts with a zero, zeros are used for padding. Some of
these conventions are ignored for specification characters for which
-they do not make sense. That is, %s, %S and %c accept a width
-starting with 0, but still pad with @emph{spaces} on the left. Also,
-%% accepts a width, but ignores it. Here are some examples of
-padding:
+they do not make sense. That is, @samp{%s}, @samp{%S} and @samp{%c}
+accept a width starting with 0, but still pad with @emph{spaces} on
+the left. Also, @samp{%%} accepts a width, but ignores it. Here are
+some examples of padding:
@example
(format "%06d is padded on the left with zeros" 123)
@end group
@end smallexample
+@cindex precision in format specifications
All the specification characters allow an optional ``precision''
before the character (after the width, if present). The precision is
a decimal-point @samp{.} followed by a digit-string. For the
-floating-point specifications (%e, %f, %g), the precision specifies
-how many decimal places to show; if zero, the decimal-point itself is
-also omitted. For %s and %S, the precision truncates the string to
-the given width, so @code{"%.3s"} shows only the first three
-characters of the representation for @var{object}. Precision is
-ignored for other specification characters.
-
-Immediately after the % and before the optional width and precision,
-you can put certain ``flag'' characters.
-
-A space @var{" "} inserts a space for positive numbers (otherwise
+floating-point specifications (@samp{%e}, @samp{%f}, @samp{%g}), the
+precision specifies how many decimal places to show; if zero, the
+decimal-point itself is also omitted. For @samp{%s} and @samp{%S},
+the precision truncates the string to that many characters, so
+@samp{%.3s} shows only the first three characters of the
+representation for @var{object}. Precision is ignored for other
+specification characters.
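+Here are some examples of precision:
+
+@example
+(format "%.2f" 3.14159)
+     @result{} "3.14"
+(format "%.0f" 3.0)
+     @result{} "3"
+(format "%.3s" "abcdef")
+     @result{} "abc"
+@end example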
+
+@cindex flags in format specifications
+Immediately after the @samp{%} and before the optional width and
+precision, you can put certain ``flag'' characters.
+
+A space character inserts a space for positive numbers (otherwise
nothing is inserted for positive numbers). This flag is ignored
-except for %d, %e, %f, %g.
+except for @samp{%d}, @samp{%e}, @samp{%f}, and @samp{%g}.
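+For example:
+
+@example
+(format "% d" 42)
+     @result{} " 42"
+(format "% d" -42)
+     @result{} "-42"
+@end example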
-The flag @var{"#"} indicates ``alternate form''. For %o it ensures
-that the result begins with a 0. For %x and %X the result is prefixed
-with ``0x'' or ``0X''. For %e, %f, and %g a decimal point is always
-shown even if the precision is zero.
+The flag @samp{#} indicates ``alternate form.'' For @samp{%o} it
+ensures that the result begins with a 0. For @samp{%x} and @samp{%X}
+the result is prefixed with @samp{0x} or @samp{0X}. For @samp{%e},
+@samp{%f}, and @samp{%g} a decimal point is always shown even if the
+precision is zero.
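+Here are some examples of the @samp{#} flag:
+
+@example
+(format "%#o" 8)
+     @result{} "010"
+(format "%#x" 255)
+     @result{} "0xff"
+(format "%#.0f" 3.0)
+     @result{} "3."
+@end example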
@node Case Conversion
@comment node-name, next, previous, up
canonical equivalent character (which should be either @samp{a} for both
of them, or @samp{A} for both of them).
- The extra table @var{equivalences} is a map that cyclicly permutes
+ The extra table @var{equivalences} is a map that cyclically permutes
each equivalence class (of characters with the same canonical
equivalent). (For ordinary @acronym{ASCII}, this would map @samp{a} into
@samp{A} and @samp{A} into @samp{a}, and likewise for each set of