X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/a10f6c69acc8ff6f268352293a4cf3cd0f4e4563..171920a66145032a61fab9458ec6104000ff8dd8:/lispref/strings.texi diff --git a/lispref/strings.texi b/lispref/strings.texi index 1ca59d81f6..d0504684f8 100644 --- a/lispref/strings.texi +++ b/lispref/strings.texi @@ -1,6 +1,7 @@ @c -*-texinfo-*- @c This is part of the GNU Emacs Lisp Reference Manual. -@c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc. +@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2003 +@c Free Software Foundation, Inc. @c See the file elisp.texi for copying conditions. @setfilename ../info/strings @node Strings and Characters, Lists, Numbers, Top @@ -13,8 +14,8 @@ A string in Emacs Lisp is an array that contains an ordered sequence of characters. Strings are used as names of symbols, buffers, and -files, to send messages to users, to hold text being copied between -buffers, and for many other purposes. Because strings are so important, +files; to send messages to users; to hold text being copied between +buffers; and for many other purposes. Because strings are so important, Emacs Lisp has many functions expressly for manipulating them. Emacs Lisp programs use strings more often than individual characters. @@ -25,57 +26,70 @@ keyboard character events. * Basics: String Basics. Basic properties of strings and characters. * Predicates for Strings:: Testing whether an object is a string or char. * Creating Strings:: Functions to allocate new strings. +* Modifying Strings:: Altering the contents of an existing string. * Text Comparison:: Comparing characters or strings. -* String Conversion:: Converting characters or strings and vice versa. -* Formatting Strings:: @code{format}: Emacs's analog of @code{printf}. -* Character Case:: Case conversion functions. -* Case Table:: Customizing case conversion. +* String Conversion:: Converting to and from characters and strings. 
+* Formatting Strings:: @code{format}: Emacs's analogue of @code{printf}. +* Case Conversion:: Case conversion functions. +* Case Tables:: Customizing case conversion. @end menu @node String Basics @section String and Character Basics - Strings in Emacs Lisp are arrays that contain an ordered sequence of -characters. Characters are represented in Emacs Lisp as integers; -whether an integer was intended as a character or not is determined only -by how it is used. Thus, strings really contain integers. - - The length of a string (like any array) is fixed and independent of -the string contents, and cannot be altered. Strings in Lisp are -@emph{not} terminated by a distinguished character code. (By contrast, -strings in C are terminated by a character with @sc{ASCII} code 0.) -This means that any character, including the null character (@sc{ASCII} -code 0), is a valid element of a string.@refill - - Since strings are considered arrays, you can operate on them with the -general array functions. (@xref{Sequences Arrays Vectors}.) For -example, you can access or change individual characters in a string -using the functions @code{aref} and @code{aset} (@pxref{Array -Functions}). - - Each character in a string is stored in a single byte. Therefore, -numbers not in the range 0 to 255 are truncated when stored into a -string. This means that a string takes up much less memory than a -vector of the same length. + Characters are represented in Emacs Lisp as integers; +whether an integer is a character or not is determined only by how it is +used. Thus, strings really contain integers. + + The length of a string (like any array) is fixed, and cannot be +altered once the string exists. Strings in Lisp are @emph{not} +terminated by a distinguished character code. (By contrast, strings in +C are terminated by a character with @acronym{ASCII} code 0.) 
+ + Since strings are arrays, and therefore sequences as well, you can +operate on them with the general array and sequence functions. +(@xref{Sequences Arrays Vectors}.) For example, you can access or +change individual characters in a string using the functions @code{aref} +and @code{aset} (@pxref{Array Functions}). + + There are two text representations for non-@acronym{ASCII} characters in +Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text +Representations}). An @acronym{ASCII} character always occupies one byte in a +string; in fact, when a string is all @acronym{ASCII}, there is no real +difference between the unibyte and multibyte representations. +For most Lisp programming, you don't need to be concerned with these two +representations. Sometimes key sequences are represented as strings. When a string is a key sequence, string elements in the range 128 to 255 represent meta -characters (which are extremely large integers) rather than keyboard -events in the range 128 to 255. +characters (which are large integers) rather than character +codes in the range 128 to 255. Strings cannot hold characters that have the hyper, super or alt -modifiers; they can hold @sc{ASCII} control characters, but no other -control characters. They do not distinguish case in @sc{ASCII} control -characters. @xref{Character Type}, for more information about -representation of meta and other modifiers for keyboard input -characters. +modifiers; they can hold @acronym{ASCII} control characters, but no other +control characters. They do not distinguish case in @acronym{ASCII} control +characters. If you want to store such characters in a sequence, such as +a key sequence, you must use a vector instead of a string. +@xref{Character Type}, for more information about the representation of meta +and other modifiers for keyboard input characters. + + Strings are useful for holding regular expressions. 
You can also +match regular expressions against strings with @code{string-match} +(@pxref{Regexp Search}). The functions @code{match-string} +(@pxref{Simple Match Data}) and @code{replace-match} (@pxref{Replacing +Match}) are useful for decomposing and modifying strings after +matching regular expressions against them. Like a buffer, a string can contain text properties for the characters in it, as well as the characters themselves. @xref{Text Properties}. +All the Lisp primitives that copy text from strings to buffers or other +strings also copy the properties of the characters being copied. @xref{Text}, for information about functions that display strings or copy them into buffers. @xref{Character Type}, and @ref{String Type}, for information about the syntax of characters and strings. +@xref{Non-ASCII Characters}, for functions to convert between text +representations and to encode and decode character codes. @node Predicates for Strings @section The Predicates for Strings @@ -84,12 +98,12 @@ For more information about general sequence and array predicates, see @ref{Sequences Arrays Vectors}, and @ref{Arrays}. @defun stringp object - This function returns @code{t} if @var{object} is a string, @code{nil} +This function returns @code{t} if @var{object} is a string, @code{nil} otherwise. @end defun @defun char-or-string-p object - This function returns @code{t} if @var{object} is a string or a +This function returns @code{t} if @var{object} is a string or a character (i.e., an integer), @code{nil} otherwise. @end defun @@ -100,7 +114,7 @@ character (i.e., an integer), @code{nil} otherwise. putting strings together, or by taking them apart. @defun make-string count character - This function returns a string made up of @var{count} repetitions of +This function returns a string made up of @var{count} repetitions of @var{character}. If @var{count} is negative, an error is signaled. @example @@ -115,8 +129,17 @@ putting strings together, or by taking them apart. 
@code{make-list} (@pxref{Building Lists}). @end defun +@defun string &rest characters +This returns a string containing the characters @var{characters}. + +@example +(string ?a ?b ?c) + @result{} "abc" +@end example +@end defun + @defun substring string start &optional end - This function returns a new string which consists of those characters +This function returns a new string which consists of those characters from @var{string} in the range from (and including) the character at the index @var{start} up to (but excluding) the character at the index @var{end}. The first character is at index zero. @@ -136,7 +159,7 @@ position up to which the substring is copied. The character whose index is 3 is actually the fourth character in the string. A negative number counts from the end of the string, so that @minus{}1 -signifies the index of the last character of the string. For example: +signifies the index of the last character of the string. For example: @example @group @@ -150,7 +173,7 @@ In this example, the index for @samp{e} is @minus{}3, the index for @samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1. Therefore, @samp{e} and @samp{f} are included, and @samp{g} is excluded. -When @code{nil} is used as an index, it stands for the length of the +When @code{nil} is used for @var{end}, it stands for the length of the string. Thus, @example @@ -175,10 +198,22 @@ of @var{string}. But we recommend @code{copy-sequence} for this purpose (@pxref{Sequence Functions}). -A @code{wrong-type-argument} error is signaled if either @var{start} or -@var{end} is not an integer or @code{nil}. An @code{args-out-of-range} -error is signaled if @var{start} indicates a character following -@var{end}, or if either integer is out of range for @var{string}. +If the characters copied from @var{string} have text properties, the +properties are copied into the new string also. @xref{Text Properties}. + +@code{substring} also accepts a vector for the first argument. 
+For example: + +@example +(substring [a b (c) "d"] 1 3) + @result{} [b (c)] +@end example + +A @code{wrong-type-argument} error is signaled if @var{start} is not +an integer or if @var{end} is neither an integer nor @code{nil}. An +@code{args-out-of-range} error is signaled if @var{start} indicates a +character following @var{end}, or if either integer is out of range +for @var{string}. Contrast this function with @code{buffer-substring} (@pxref{Buffer Contents}), which returns a string containing a portion of the text in @@ -186,18 +221,27 @@ the current buffer. The beginning of a string is at index 0, but the beginning of a buffer is at index 1. @end defun +@defun substring-no-properties string &optional start end +This works like @code{substring} but discards all text properties from +the value. Also, @var{start} may be omitted or @code{nil}, which is +equivalent to 0. Thus, @w{@code{(substring-no-properties +@var{string})}} returns a copy of @var{string}, with all text +properties removed. +@end defun + @defun concat &rest sequences @cindex copying strings @cindex concatenating strings This function returns a new string consisting of the characters in the -arguments passed to it. The arguments may be strings, lists of numbers, -or vectors of numbers; they are not themselves changed. If -@code{concat} receives no arguments, it returns an empty string. +arguments passed to it (along with their text properties, if any). The +arguments may be strings, lists of numbers, or vectors of numbers; they +are not themselves changed. If @code{concat} receives no arguments, it +returns an empty string. @example (concat "abc" "-def") @result{} "abc-def" -(concat "abc" (list 120 (+ 256 121)) [122]) +(concat "abc" (list 120 121) [122]) @result{} "abcxyz" ;; @r{@code{nil} is an empty sequence.} (concat "abc" nil "-def") @@ -209,36 +253,144 @@ or vectors of numbers; they are not themselves changed. 
If @end example @noindent -The second example above shows how characters stored in strings are -taken modulo 256. In other words, each character in the string is -stored in one byte. - The @code{concat} function always constructs a new string that is not @code{eq} to any existing string. -When an argument is an integer (not a sequence of integers), it is -converted to a string of digits making up the decimal printed -representation of the integer. @string{Don't use this feature---it -exists for historical compatibility only, and we plan to change it by -and by.} If you wish to convert an integer to a decimal number in this -way, use @code{format} (@pxref{Formatting Strings}) or +In Emacs versions before 21, when an argument was an integer (not a +sequence of integers), it was converted to a string of digits making up +the decimal printed representation of the integer. This obsolete usage +no longer works. The proper way to convert an integer to its decimal +printed form is with @code{format} (@pxref{Formatting Strings}) or @code{number-to-string} (@pxref{String Conversion}). -@example -@group -(concat 137) - @result{} "137" -(concat 54 321) - @result{} "54321" -@end group -@end example - For information about other concatenation functions, see the description of @code{mapconcat} in @ref{Mapping Functions}, -@code{vconcat} in @ref{Vectors}, and @code{append} in @ref{Building +@code{vconcat} in @ref{Vector Functions}, and @code{append} in @ref{Building Lists}. @end defun +@defun split-string string &optional separators omit-nulls +This function splits @var{string} into substrings at matches for the +regular expression @var{separators}. Each match for @var{separators} +defines a splitting point; the substrings between the splitting points +are made into a list, which is the value returned by +@code{split-string}. 
+ +If @var{omit-nulls} is @code{nil}, the result contains null strings +whenever there are two consecutive matches for @var{separators}, or a +match is adjacent to the beginning or end of @var{string}. If +@var{omit-nulls} is @code{t}, these null strings are omitted from the +result list. + +If @var{separators} is @code{nil} (or omitted), +the default is the value of @code{split-string-default-separators}. + +As a special case, when @var{separators} is @code{nil} (or omitted), +null strings are always omitted from the result. Thus: + +@example +(split-string " two words ") + @result{} ("two" "words") +@end example + +The result is not @samp{("" "two" "words" "")}, which would rarely be +useful. If you need such a result, use an explicit value for +@var{separators}: + +@example +(split-string " two words " + split-string-default-separators) + @result{} ("" "two" "words" "") +@end example + +More examples: + +@example +(split-string "Soup is good food" "o") + @result{} ("S" "up is g" "" "d f" "" "d") +(split-string "Soup is good food" "o" t) + @result{} ("S" "up is g" "d f" "d") +(split-string "Soup is good food" "o+") + @result{} ("S" "up is g" "d f" "d") +@end example + +Empty matches do count, except that @code{split-string} will not look +for a final empty match when it already reached the end of the string +using a non-empty match or when @var{string} is empty: + +@example +(split-string "aooob" "o*") + @result{} ("" "a" "" "b" "") +(split-string "ooaboo" "o*") + @result{} ("" "" "a" "b" "") +(split-string "" "") + @result{} ("") +@end example + +However, when @var{separators} can match the empty string, +@var{omit-nulls} is usually @code{t}, so that the subtleties in the +three previous examples are rarely relevant: + +@example +(split-string "Soup is good food" "o*" t) + @result{} ("S" "u" "p" " " "i" "s" " " "g" "d" " " "f" "d") +(split-string "Nice doggy!" 
"" t) + @result{} ("N" "i" "c" "e" " " "d" "o" "g" "g" "y" "!") +(split-string "" "" t) + @result{} nil +@end example + +Somewhat odd, but predictable, behavior can occur for certain +``non-greedy'' values of @var{separators} that can prefer empty +matches over non-empty matches. Again, such values rarely occur in +practice: + +@example +(split-string "ooo" "o*" t) + @result{} nil +(split-string "ooo" "\\|o+" t) + @result{} ("o" "o" "o") +@end example +@end defun + +@defvar split-string-default-separators +The default value of @var{separators} for @code{split-string}. Its +usual value is @w{@samp{"[ \f\t\n\r\v]+"}}. +@end defvar + +@node Modifying Strings +@section Modifying Strings + + The most basic way to alter the contents of an existing string is with +@code{aset} (@pxref{Array Functions}). @code{(aset @var{string} +@var{idx} @var{char})} stores @var{char} into @var{string} at index +@var{idx}. Each character occupies one or more bytes, and if @var{char} +needs a different number of bytes from the character already present at +that index, @code{aset} signals an error. + + A more powerful function is @code{store-substring}: + +@defun store-substring string idx obj +This function alters part of the contents of the string @var{string}, by +storing @var{obj} starting at index @var{idx}. The argument @var{obj} +may be either a character or a (smaller) string. + +Since it is impossible to change the length of an existing string, it is +an error if @var{obj} doesn't fit within @var{string}'s actual length, +or if any new character requires a different number of bytes from the +character currently present at that point in @var{string}. +@end defun + + To clear out a string that contained a password, use +@code{clear-string}: + +@defun clear-string string +This clears the contents of @var{string} to zeros. +It may also change @var{string}'s length and convert it to +a unibyte string. 
+@end defun + +@need 2000 @node Text Comparison @section Comparison of Characters and Strings @cindex string equality @@ -251,16 +403,17 @@ in case if @code{case-fold-search} is non-@code{nil}. @example (char-equal ?x ?x) @result{} t -(char-to-string (+ 256 ?x)) - @result{} "x" -(char-equal ?x (+ 256 ?x)) - @result{} t +(let ((case-fold-search nil)) + (char-equal ?x ?X)) + @result{} nil @end example @end defun @defun string= string1 string2 This function returns @code{t} if the characters of the two strings -match exactly; case is significant. +match exactly. Symbols are also allowed as arguments, in which case +their print names are used. +Case is always significant, regardless of @code{case-fold-search}. @example (string= "abc" "abc") @@ -270,6 +423,26 @@ match exactly; case is significant. (string= "ab" "ABC") @result{} nil @end example + +The function @code{string=} ignores the text properties of the two +strings. When @code{equal} (@pxref{Equality Predicates}) compares two +strings, it uses @code{string=}. + +For technical reasons, a unibyte and a multibyte string are +@code{equal} if and only if they contain the same sequence of +character codes and all these codes are either in the range 0 through +127 (@acronym{ASCII}) or 160 through 255 (@code{eight-bit-graphic}). +However, when a unibyte string gets converted to a multibyte string, +all characters with codes in the range 160 through 255 get converted +to characters with higher codes, whereas @acronym{ASCII} characters +remain unchanged. Thus, a unibyte string and its conversion to +multibyte are only @code{equal} if the string is all @acronym{ASCII}. +Character codes 160 through 255 are not entirely proper in multibyte +text, even though they can occur. As a consequence, the situation +where a unibyte and a multibyte string are @code{equal} without both +being all @acronym{ASCII} is a technical oddity that very few Emacs +Lisp programmers ever get confronted with. @xref{Text +Representations}. 
@end defun @defun string-equal string1 string2 @@ -279,19 +452,21 @@ match exactly; case is significant. @cindex lexical comparison @defun string< string1 string2 @c (findex string< causes problems for permuted index!!) -This function compares two strings a character at a time. First it -scans both the strings at once to find the first pair of corresponding -characters that do not match. If the lesser character of those two is +This function compares two strings a character at a time. It +scans both the strings at the same time to find the first pair of corresponding +characters that do not match. If the lesser character of these two is the character from @var{string1}, then @var{string1} is less, and this function returns @code{t}. If the lesser character is the one from @var{string2}, then @var{string1} is greater, and this function returns @code{nil}. If the two strings match entirely, the value is @code{nil}. -Pairs of characters are compared by their @sc{ASCII} codes. Keep in -mind that lower case letters have higher numeric values in the -@sc{ASCII} character set than their upper case counterparts; numbers and +Pairs of characters are compared according to their character codes. +Keep in mind that lower case letters have higher numeric values in the +@acronym{ASCII} character set than their upper case counterparts; digits and many punctuation characters have a lower numeric value than upper case -letters. +letters. An @acronym{ASCII} character is less than any non-@acronym{ASCII} +character; a unibyte non-@acronym{ASCII} character is always less than any +multibyte non-@acronym{ASCII} character (@pxref{Text Representations}). @example @group @@ -320,13 +495,48 @@ no characters is less than any other string. (string< "abc" "ab") @result{} nil (string< "" "") - @result{} nil + @result{} nil @end group @end example + +Symbols are also allowed as arguments, in which case their print names +are used. 
@end defun @defun string-lessp string1 string2 @code{string-lessp} is another name for @code{string<}. +@end defun + +@defun compare-strings string1 start1 end1 string2 start2 end2 &optional ignore-case +This function compares the specified part of @var{string1} with the +specified part of @var{string2}. The specified part of @var{string1} +runs from index @var{start1} up to index @var{end1} (@code{nil} means +the end of the string). The specified part of @var{string2} runs from +index @var{start2} up to index @var{end2} (@code{nil} means the end of +the string). + +The strings are both converted to multibyte for the comparison +(@pxref{Text Representations}) so that a unibyte string and its +conversion to multibyte are always regarded as equal. If +@var{ignore-case} is non-@code{nil}, then case is ignored, so that +upper case letters can be equal to lower case letters. + +If the specified portions of the two strings match, the value is +@code{t}. Otherwise, the value is an integer which indicates how many +leading characters agree, and which string is less. Its absolute value +is one plus the number of characters that agree at the beginning of the +two strings. The sign is negative if @var{string1} (or its specified +portion) is less. +@end defun + +@defun assoc-string key alist &optional case-fold +This function works like @code{assoc}, except that @var{key} must be a +string, and comparison is done using @code{compare-strings}. If +@var{case-fold} is non-@code{nil}, it ignores case differences. +Unlike @code{assoc}, this function can also match elements of the alist +that are strings rather than conses. In particular, @var{alist} can +be a list of strings rather than an actual alist. +@xref{Association Lists}. @end defun See also @code{compare-buffer-substrings} in @ref{Comparing Text}, for @@ -340,10 +550,13 @@ for a kind of string comparison; see @ref{Regexp Search}. 
@cindex conversion of strings This section describes functions for conversions between characters, -strings and integers. @code{format} and @code{prin1-to-string} +strings and integers. @code{format} (@pxref{Formatting Strings}) +and @code{prin1-to-string} (@pxref{Output Functions}) can also convert Lisp objects into strings. @code{read-from-string} (@pxref{Input Functions}) can ``convert'' a -string representation of a Lisp object into an object. +string representation of a Lisp object into an object. The functions +@code{string-make-multibyte} and @code{string-make-unibyte} convert the +text representation of a string (@pxref{Converting Representations}). @xref{Documentation}, for functions that produce textual descriptions of text characters and general input events @@ -352,30 +565,16 @@ functions are used primarily for making help messages. @defun char-to-string character @cindex character to string - This function returns a new string with a length of one character. -The value of @var{character}, modulo 256, is used to initialize the -element of the string. - -This function is similar to @code{make-string} with an integer argument -of 1. (@xref{Creating Strings}.) This conversion can also be done with -@code{format} using the @samp{%c} format specification. -(@xref{Formatting Strings}.) - -@example -(char-to-string ?x) - @result{} "x" -(char-to-string (+ 256 ?x)) - @result{} "x" -(make-string 1 ?x) - @result{} "x" -@end example +This function returns a new string containing one character, +@var{character}. This function is semi-obsolete because the function +@code{string} is more general. @xref{Creating Strings}. @end defun @defun string-to-char string @cindex string to character This function returns the first character in @var{string}. If the string is empty, the function returns 0. 
The value is also 0 when the -first character of @var{string} is the null character, @sc{ASCII} code +first character of @var{string} is the null character, @acronym{ASCII} code 0. @example @@ -385,8 +584,10 @@ first character of @var{string} is the null character, @sc{ASCII} code @result{} 120 (string-to-char "") @result{} 0 +@group (string-to-char "\000") @result{} 0 +@end group @end example This function may be eliminated in the future if it does not seem useful @@ -396,16 +597,18 @@ enough to retain. @defun number-to-string number @cindex integer to string @cindex integer to decimal -This function returns a string consisting of the printed +This function returns a string consisting of the printed base-ten representation of @var{number}, which may be an integer or a floating -point number. The value starts with a sign if the argument is +point number. The returned value starts with a minus sign if the argument is negative. @example (number-to-string 256) @result{} "256" +@group (number-to-string -23) @result{} "-23" +@end group (number-to-string -23.5) @result{} "-23.5" @end example @@ -416,15 +619,25 @@ negative. See also the function @code{format} in @ref{Formatting Strings}. @end defun -@defun string-to-number string +@defun string-to-number string &optional base @cindex string to number This function returns the numeric value of the characters in -@var{string}, read in base ten. It skips spaces and tabs at the -beginning of @var{string}, then reads as much of @var{string} as it can -interpret as a number. (On some systems it ignores other whitespace at -the beginning, not just spaces and tabs.) If the first character after -the ignored whitespace is not a digit or a minus sign, this function -returns 0. +@var{string}. If @var{base} is non-@code{nil}, it must be an integer +between 2 and 16 (inclusive), and integers are converted in that base. +If @var{base} is @code{nil}, then base ten is used. 
Floating point +conversion only works in base ten; we have not implemented other +radices for floating point numbers, because that would be much more +work and does not seem useful. If @var{string} looks like an integer +but its value is too large to fit into a Lisp integer, +@code{string-to-number} returns a floating point result. + +The parsing skips spaces and tabs at the beginning of @var{string}, +then reads as much of @var{string} as it can interpret as a number in +the given base. (On some systems it ignores other whitespace at the +beginning, not just spaces and tabs.) If the first character after +the ignored whitespace is neither a digit in the given base, nor a +plus or minus sign, nor the leading dot of a floating point number, +this function returns 0. @example (string-to-number "256") @@ -435,12 +648,29 @@ returns 0. @result{} 0 (string-to-number "-4.5") @result{} -4.5 +(string-to-number "1e5") + @result{} 100000.0 @end example @findex string-to-int @code{string-to-int} is an obsolete alias for this function. @end defun + Here are some other functions that can convert to or from a string: + +@table @code +@item concat +@code{concat} can convert a vector or a list into a string. +@xref{Creating Strings}. + +@item vconcat +@code{vconcat} can convert a string into a vector. @xref{Vector +Functions}. + +@item append +@code{append} can convert a string into a list. @xref{Building Lists}. +@end table + @node Formatting Strings @comment node-name, next, previous, up @section Formatting Strings @@ -448,8 +678,8 @@ returns 0. @cindex strings, formatting them @dfn{Formatting} means constructing a string by substitution of -computed values at various places in a constant string. This string -controls how the other values are printed as well as where they appear; +computed values at various places in a constant string. This constant string +controls how the other values are printed, as well as where they appear; it is called a @dfn{format string}. 
Formatting is often useful for computing messages to be displayed. In @@ -458,10 +688,14 @@ formatting feature described here; they differ from @code{format} only in how they use the result of formatting. @defun format string &rest objects - This function returns a new string that is made by copying -@var{string} and then replacing any format specification +This function returns a new string that is made by copying +@var{string} and then replacing any format specification in the copy with encodings of the corresponding @var{objects}. The arguments @var{objects} are the computed values to be formatted. + +The characters in @var{string}, other than the format specifications, +are copied directly into the output; if they have text properties, +these are copied into the output also. @end defun @cindex @samp{%} in format @@ -480,35 +714,36 @@ For example: @end example If @var{string} contains more than one format specification, the -format specifications correspond with successive values from +format specifications correspond to successive values from @var{objects}. Thus, the first format specification in @var{string} uses the first such value, the second format specification uses the second such value, and so on. Any extra format specifications (those -for which there are no corresponding values) cause unpredictable -behavior. Any extra values to be formatted are ignored. +for which there are no corresponding values) cause an error. Any +extra values to be formatted are ignored. - Certain format specifications require values of particular types. -However, no error is signaled if the value actually supplied fails to -have the expected type. Instead, the output is likely to be -meaningless. + Certain format specifications require values of particular types. If +you supply a value that doesn't fit the requirements, an error is +signaled. 
Here is a table of valid format specifications: @table @samp @item %s Replace the specification with the printed representation of the object, -made without quoting. Thus, strings are represented by their contents -alone, with no @samp{"} characters, and symbols appear without @samp{\} -characters. +made without quoting (that is, using @code{princ}, not +@code{prin1}---@pxref{Output Functions}). Thus, strings are represented +by their contents alone, with no @samp{"} characters, and symbols appear +without @samp{\} characters. -If there is no corresponding object, the empty string is used. +If the object is a string, its text properties are +copied into the output. The text properties of the @samp{%s} itself +are also copied, but those of the object take priority. @item %S Replace the specification with the printed representation of the object, -made with quoting. Thus, strings are enclosed in @samp{"} characters, -and @samp{\} characters appear where necessary before special characters. - -If there is no corresponding object, the empty string is used. +made with quoting (that is, using @code{prin1}---@pxref{Output +Functions}). Thus, strings are enclosed in @samp{"} characters, and +@samp{\} characters appear where necessary before special characters. @item %o @cindex integer to octal @@ -520,9 +755,10 @@ Replace the specification with the base-ten representation of an integer. @item %x +@itemx %X @cindex integer to hexadecimal Replace the specification with the base-sixteen representation of an -integer. +integer. @samp{%x} uses lower case and @samp{%X} uses upper case. @item %c Replace the specification with the character which is the value given. @@ -537,13 +773,13 @@ point number. @item %g Replace the specification with notation for a floating point number, -using either exponential notation or decimal-point notation whichever +using either exponential notation or decimal-point notation, whichever is shorter. 
@item %% -A single @samp{%} is placed in the string. This format specification is -unusual in that it does not use a value. For example, @code{(format "%% -%d" 30)} returns @code{"% 30"}. +Replace the specification with a single @samp{%}. This format +specification is unusual in that it does not use a value. For example, +@code{(format "%% %d" 30)} returns @code{"% 30"}. @end table Any other format character results in an @samp{Invalid format @@ -557,26 +793,29 @@ operation} error. @result{} "The name of this buffer is strings.texi." (format "The buffer object prints as %s." (current-buffer)) - @result{} "The buffer object prints as #<buffer strings.texi>." + @result{} "The buffer object prints as strings.texi." -(format "The octal value of %d is %o, +(format "The octal value of %d is %o, and the hex value is %x." 18 18 18) - @result{} "The octal value of 18 is 22, + @result{} "The octal value of 18 is 22, and the hex value is 12." @end group @end example -@cindex numeric prefix @cindex field width @cindex padding - All the specification characters allow an optional numeric prefix -between the @samp{%} and the character. The optional numeric prefix -defines the minimum width for the object. If the printed representation -of the object contains fewer characters than this, then it is padded. -The padding is on the left if the prefix is positive (or starts with -zero) and on the right if the prefix is negative. The padding character -is normally a space, but if the numeric prefix starts with a zero, zeros -are used for padding. + All the specification characters allow an optional ``width'', which +is a digit-string between the @samp{%} and the character. If the +printed representation of the object contains fewer characters than +this width, then it is padded. The padding is on the left if the +width is positive (or starts with zero) and on the right if the +width is negative. The padding character is normally a space, but if +the width starts with a zero, zeros are used for padding.
Some of +these conventions are ignored for specification characters for which +they do not make sense. That is, @samp{%s}, @samp{%S} and @samp{%c} +accept a width starting with 0, but still pad with @emph{spaces} on +the left. Also, @samp{%%} accepts a width, but ignores it. Here are +some examples of padding: @example (format "%06d is padded on the left with zeros" 123) @@ -586,10 +825,9 @@ are used for padding. @result{} "123    is padded on the right" @end example - @code{format} never truncates an object's printed representation, no -matter what width you specify. Thus, you can use a numeric prefix to -specify a minimum spacing between columns with no risk of losing -information. +If the width is too small, @code{format} does not truncate the +object's printed representation. Thus, you can use a width to specify +a minimum spacing between columns with no risk of losing information. In the following three examples, @samp{%7s} specifies a minimum width of 7. In the first case, the string inserted in place of @samp{%7s} has @@ -597,41 +835,72 @@ only 3 letters, so 4 blank spaces are inserted for padding. In the second case, the string @code{"specification"} is 13 letters wide but is not truncated. In the third case, the padding is on the right. -@smallexample +@smallexample @group (format "The word `%7s' actually has %d letters in it." "foo" (length "foo")) - @result{} "The word `    foo' actually has 3 letters in it." + @result{} "The word `    foo' actually has 3 letters in it." @end group @group (format "The word `%7s' actually has %d letters in it." - "specification" (length "specification")) - @result{} "The word `specification' actually has 13 letters in it." + "specification" (length "specification")) + @result{} "The word `specification' actually has 13 letters in it." @end group @group (format "The word `%-7s' actually has %d letters in it." "foo" (length "foo")) - @result{} "The word `foo    ' actually has 3 letters in it."
+ @result{} "The word `foo    ' actually has 3 letters in it." @end group @end smallexample -@node Character Case -@comment node-name, next, previous, up -@section Character Case -@cindex upper case -@cindex lower case -@cindex character case +@cindex precision in format specifications + All the specification characters allow an optional ``precision'' +before the character (after the width, if present). The precision is +a decimal-point @samp{.} followed by a digit-string. For the +floating-point specifications (@samp{%e}, @samp{%f}, @samp{%g}), the +precision specifies how many decimal places to show; if zero, the +decimal-point itself is also omitted. For @samp{%s} and @samp{%S}, +the precision truncates the string to the given width, so +@samp{%.3s} shows only the first three characters of the +representation for @var{object}. Precision is ignored for other +specification characters. + +@cindex flags in format specifications +Immediately after the @samp{%} and before the optional width and +precision, you can put certain ``flag'' characters. + +A space character inserts a space for positive numbers (otherwise +nothing is inserted for positive numbers). This flag is ignored +except for @samp{%d}, @samp{%e}, @samp{%f}, @samp{%g}. + +The flag @samp{#} indicates ``alternate form''. For @samp{%o} it +ensures that the result begins with a 0. For @samp{%x} and @samp{%X} +the result is prefixed with @samp{0x} or @samp{0X}. For @samp{%e}, +@samp{%f}, and @samp{%g} a decimal point is always shown even if the +precision is zero. + +@node Case Conversion +@comment node-name, next, previous, up +@section Case Conversion in Lisp +@cindex upper case +@cindex lower case +@cindex character case +@cindex case conversion in Lisp The character case functions change the case of single characters or -of the contents of strings. The functions convert only alphabetic -characters (the letters @samp{A} through @samp{Z} and @samp{a} through -@samp{z}); other characters are not altered.
The functions do not -modify the strings that are passed to them as arguments. +of the contents of strings. The functions normally convert only +alphabetic characters (the letters @samp{A} through @samp{Z} and +@samp{a} through @samp{z}, as well as non-@acronym{ASCII} letters); other +characters are not altered. You can specify a different case +conversion mapping by specifying a case table (@pxref{Case Tables}). + + These functions do not modify the strings that are passed to them as +arguments. The examples below use the characters @samp{X} and @samp{x} which have -@sc{ASCII} codes 88 and 120 respectively. +@acronym{ASCII} codes 88 and 120 respectively. @defun downcase string-or-char This function converts a character or a string to lower case. @@ -663,7 +932,7 @@ lower case is converted to upper case. When the argument to @code{upcase} is a character, @code{upcase} returns the corresponding upper case character. This value is an integer. If the original character is upper case, or is not a letter, then the -value equals the original character. +value returned equals the original character. @example (upcase "The cat in the hat") @@ -685,17 +954,21 @@ case. The definition of a word is any sequence of consecutive characters that are assigned to the word constituent syntax class in the current syntax -table (@xref{Syntax Class Table}). +table (@pxref{Syntax Class Table}). When the argument to @code{capitalize} is a character, @code{capitalize} has the same result as @code{upcase}. @example +@group (capitalize "The cat in the hat") @result{} "The Cat In The Hat" +@end group +@group (capitalize "THE 77TH-HATTED CAT") @result{} "The 77th-Hatted Cat" +@end group @group (capitalize ?x) @@ -704,57 +977,88 @@ has the same result as @code{upcase}. 
@end example @end defun -@node Case Table -@section The Case Table +@defun upcase-initials string-or-char +If @var{string-or-char} is a string, this function capitalizes the +initials of the words in @var{string-or-char}, without altering any +letters other than the initials. It returns a new string whose +contents are a copy of @var{string-or-char}, in which each word has +had its initial letter converted to upper case. - You can customize case conversion by installing a special @dfn{case -table}. A case table specifies the mapping between upper case and lower -case letters. It affects both the string and character case conversion -functions (see the previous section) and those that apply to text in the -buffer (@pxref{Case Changes}). You need a case table if you are using a -language which has letters other than the standard @sc{ASCII} letters. +The definition of a word is any sequence of consecutive characters that +are assigned to the word constituent syntax class in the current syntax +table (@pxref{Syntax Class Table}). - A case table is a list of this form: +When the argument to @code{upcase-initials} is a character, +@code{upcase-initials} has the same result as @code{upcase}. @example -(@var{downcase} @var{upcase} @var{canonicalize} @var{equivalences}) +@group +(upcase-initials "The CAT in the hAt") + @result{} "The CAT In The HAt" +@end group @end example +@end defun -@noindent -where each element is either @code{nil} or a string of length 256. The -element @var{downcase} says how to map each character to its lower-case -equivalent. The element @var{upcase} maps each character to its -upper-case equivalent. If lower and upper case characters are in -one-to-one correspondence, use @code{nil} for @var{upcase}; then Emacs -deduces the upcase table from @var{downcase}. + @xref{Text Comparison}, for functions that compare strings; some of +them ignore case differences, or can optionally ignore case differences. 
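 As a rough side-by-side illustration of the case conversion functions described in this section, the following sketch applies each of them to the same string and then returns the original, assuming the default case and syntax tables. The variable name @code{s} is chosen only for this example.

@example
@group
;; Each function returns a new string; the argument S is not modified.
(let ((s "the CAT in the hat"))
  (list (upcase s)
        (capitalize s)
        (upcase-initials s)
        s))
     @result{} ("THE CAT IN THE HAT"
         "The Cat In The Hat"
         "The CAT In The Hat"
         "the CAT in the hat")
@end group
@end example

Note the difference on the word @samp{CAT}: @code{capitalize} lowers the remaining letters of each word, whereas @code{upcase-initials} leaves them as they were.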
+ +@node Case Tables +@section The Case Table + + You can customize case conversion by installing a special @dfn{case +table}. A case table specifies the mapping between upper case and lower +case letters. It affects both the case conversion functions for Lisp +objects (see the previous section) and those that apply to text in the +buffer (@pxref{Case Changes}). Each buffer has a case table; there is +also a standard case table which is used to initialize the case table +of new buffers. + + A case table is a char-table (@pxref{Char-Tables}) whose subtype is +@code{case-table}. This char-table maps each character into the +corresponding lower case character. It has three extra slots, which +hold related tables: + +@table @var +@item upcase +The upcase table maps each character into the corresponding upper +case character. +@item canonicalize +The canonicalize table maps all of a set of case-related characters +into a particular member of that set. +@item equivalences +The equivalences table maps each one of a set of case-related characters +into the next character in that set. +@end table + + In simple cases, all you need to specify is the mapping to lower-case; +the three related tables will be calculated automatically from that one. For some languages, upper and lower case letters are not in one-to-one correspondence. There may be two different lower case letters with the same upper case equivalent. In these cases, you need to specify the -maps for both directions. +maps for both lower case and upper case. - The element @var{canonicalize} maps each character to a canonical + The extra table @var{canonicalize} maps each character to a canonical equivalent; any two characters that are related by case-conversion have -the same canonical equivalent character. +the same canonical equivalent character. 
For example, since @samp{a} +and @samp{A} are related by case-conversion, they should have the same +canonical equivalent character (which should be either @samp{a} for both +of them, or @samp{A} for both of them). - The element @var{equivalences} is a map that cyclicly permutes each -equivalence class (of characters with the same canonical equivalent). -(For ordinary @sc{ASCII}, this would map @samp{a} into @samp{A} and -@samp{A} into @samp{a}, and likewise for each set of equivalent -characters.) + The extra table @var{equivalences} is a map that cyclically permutes +each equivalence class (of characters with the same canonical +equivalent). (For ordinary @acronym{ASCII}, this would map @samp{a} into +@samp{A} and @samp{A} into @samp{a}, and likewise for each set of +equivalent characters.) When you construct a case table, you can provide @code{nil} for -@var{canonicalize}; then Emacs fills in this string from @var{upcase} -and @var{downcase}. You can also provide @code{nil} for -@var{equivalences}; then Emacs fills in this string from +@var{canonicalize}; then Emacs fills in this slot from the lower case +and upper case mappings. You can also provide @code{nil} for +@var{equivalences}; then Emacs fills in this slot from @var{canonicalize}. In a case table that is actually in use, those components are non-@code{nil}. Do not try to specify @var{equivalences} without also specifying @var{canonicalize}. - Each buffer has a case table. Emacs also has a @dfn{standard case -table} which is copied into each buffer when you create the buffer. -Changing the standard case table doesn't affect any existing buffers. - Here are the functions for working with case tables: @defun case-table-p object @@ -764,7 +1068,7 @@ table. @defun set-standard-case-table table This function makes @var{table} the standard case table, so that it will -apply to any buffers created subsequently. +be used in any buffers created subsequently. 
@end defun @defun standard-case-table @@ -780,22 +1084,22 @@ This sets the current buffer's case table to @var{table}. @end defun The following three functions are convenient subroutines for packages -that define non-@sc{ASCII} character sets. They modify a string -@var{downcase-table} provided as an argument; this should be a string to -be used as the @var{downcase} part of a case table. They also modify -the standard syntax table. @xref{Syntax Tables}. +that define non-@acronym{ASCII} character sets. They modify the specified +case table @var{case-table}; they also modify the standard syntax table. +@xref{Syntax Tables}. Normally you would use these functions to change +the standard case table. -@defun set-case-syntax-pair uc lc downcase-table +@defun set-case-syntax-pair uc lc case-table This function specifies a pair of corresponding letters, one upper case and one lower case. @end defun -@defun set-case-syntax-delims l r downcase-table +@defun set-case-syntax-delims l r case-table This function makes characters @var{l} and @var{r} a matching pair of case-invariant delimiters. @end defun -@defun set-case-syntax char syntax downcase-table +@defun set-case-syntax char syntax case-table This function makes @var{char} case-invariant, with syntax @var{syntax}. @end defun @@ -805,7 +1109,6 @@ This command displays a description of the contents of the current buffer's case table. @end deffn -@cindex ISO Latin 1 -@pindex iso-syntax -You can load the library @file{iso-syntax} to set up the standard syntax -table and define a case table for the 8-bit ISO Latin 1 character set. +@ignore + arch-tag: 700b8e95-7aa5-4b52-9eb3-8f2e1ea152b4 +@end ignore