X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/a9ab721e7fdbedd15e520f74c9c90154a8c36669..3df14aa6898dce9eeb8af1f1e35dac7924bdbaac:/doc/lispref/strings.texi

diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
index 2b8911277c..e6b00f06f7 100644
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -1,11 +1,9 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990-1995, 1998-1999, 2001-2011
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-1999, 2001-2014 Free Software
+@c Foundation, Inc.
 @c See the file elisp.texi for copying conditions.
-@setfilename ../../info/strings
-@node Strings and Characters, Lists, Numbers, Top
-@comment node-name, next, previous, up
+@node Strings and Characters
 @chapter Strings and Characters
 @cindex strings
 @cindex character arrays
@@ -37,26 +35,31 @@ keyboard character events.
 @node String Basics
 @section String and Character Basics

-  Characters are represented in Emacs Lisp as integers;
-whether an integer is a character or not is determined only by how it is
-used. Thus, strings really contain integers. @xref{Character Codes},
-for details about character representation in Emacs.
+  A character is a Lisp object which represents a single character of
+text. In Emacs Lisp, characters are simply integers; whether an
+integer is a character or not is determined only by how it is used.
+@xref{Character Codes}, for details about character representation in
+Emacs.

-  The length of a string (like any array) is fixed, and cannot be
-altered once the string exists. Strings in Lisp are @emph{not}
-terminated by a distinguished character code. (By contrast, strings in
-C are terminated by a character with @acronym{ASCII} code 0.)
+  A string is a fixed sequence of characters. It is a type of
+sequence called an @dfn{array}, meaning that its length is fixed and
+cannot be altered once it is created (@pxref{Sequences Arrays
+Vectors}). Unlike in C, Emacs Lisp strings are @emph{not} terminated
+by a distinguished character code.

   Since strings are arrays, and therefore sequences as well, you can
-operate on them with the general array and sequence functions.
-(@xref{Sequences Arrays Vectors}.) For example, you can access or
+operate on them with the general array and sequence functions documented
+in @ref{Sequences Arrays Vectors}. For example, you can access or
 change individual characters in a string using the functions @code{aref}
-and @code{aset} (@pxref{Array Functions}).
+and @code{aset} (@pxref{Array Functions}). However, note that
+@code{length} should @emph{not} be used for computing the width of a
+string on display; use @code{string-width} (@pxref{Size of Displayed
+Text}) instead.

-  There are two text representations for non-@acronym{ASCII} characters in
-Emacs strings (and in buffers): unibyte and multibyte (@pxref{Text
-Representations}). For most Lisp programming, you don't need to be
-concerned with these two representations.
+  There are two text representations for non-@acronym{ASCII}
+characters in Emacs strings (and in buffers): unibyte and multibyte.
+For most Lisp programming, you don't need to be concerned with these
+two representations. @xref{Text Representations}, for details.

   Sometimes key sequences are represented as unibyte strings. When a
 unibyte string is a key sequence, string elements in the range 128 to
@@ -88,7 +91,7 @@ for information about the syntax of characters and strings.
 representations and to encode and decode character codes.

 @node Predicates for Strings
-@section The Predicates for Strings
+@section Predicates for Strings

   For more information about general sequence and array predicates,
 see @ref{Sequences Arrays Vectors}, and @ref{Arrays}.
@@ -265,7 +268,7 @@ string to be used as a shell command, see @ref{Shell Arguments,
 combine-and-quote-strings}.
 @end defun

-@defun split-string string &optional separators omit-nulls
+@defun split-string string &optional separators omit-nulls trim
 This function splits @var{string} into substrings based on the regular
 expression @var{separators} (@pxref{Regular Expressions}). Each match
 for @var{separators} defines a splitting point; the substrings between
@@ -347,6 +350,11 @@ practice:
      @result{} ("o" "o" "o")
 @end example

+If the optional argument @var{trim} is non-@code{nil}, it should be a
+regular expression to match text to trim from the beginning and end of
+each substring. If trimming makes the substring empty, it is treated
+as null.
+
 If you need to split a string into a list of individual command-line
 arguments suitable for @code{call-process} or @code{start-process},
 see @ref{Shell Arguments, split-string-and-unquote}.
@@ -410,8 +418,15 @@ in case if @code{case-fold-search} is non-@code{nil}.
 @defun string= string1 string2
 This function returns @code{t} if the characters of the two strings
 match exactly. Symbols are also allowed as arguments, in which case
-their print names are used.
-Case is always significant, regardless of @code{case-fold-search}.
+the symbol names are used. Case is always significant, regardless of
+@code{case-fold-search}.
+
+This function is equivalent to @code{equal} for comparing two strings
+(@pxref{Equality Predicates}). In particular, the text properties of
+the two strings are ignored; use @code{equal-including-properties} if
+you need to distinguish between strings that differ only in their text
+properties. However, unlike @code{equal}, if either argument is not a
+string or symbol, @code{string=} signals an error.

 @example
 (string= "abc" "abc")
@@ -422,10 +437,6 @@ Case is always significant, regardless of @code{case-fold-search}.
      @result{} nil
 @end example

-The function @code{string=} ignores the text properties of the two
-strings. When @code{equal} (@pxref{Equality Predicates}) compares two
-strings, it uses @code{string=}.
-
   For technical reasons, a unibyte and a multibyte string are
 @code{equal} if and only if they contain the same sequence of
 character codes and all these codes are either in the range 0 through
@@ -505,26 +516,44 @@ are used.
 @code{string-lessp} is another name for @code{string<}.
 @end defun

+@defun string-prefix-p string1 string2 &optional ignore-case
+This function returns non-@code{nil} if @var{string1} is a prefix of
+@var{string2}; i.e., if @var{string2} starts with @var{string1}. If
+the optional argument @var{ignore-case} is non-@code{nil}, the
+comparison ignores case differences.
+@end defun
+
+@defun string-suffix-p suffix string &optional ignore-case
+This function returns non-@code{nil} if @var{suffix} is a suffix of
+@var{string}; i.e., if @var{string} ends with @var{suffix}. If the
+optional argument @var{ignore-case} is non-@code{nil}, the comparison
+ignores case differences.
+@end defun
+
 @defun compare-strings string1 start1 end1 string2 start2 end2 &optional ignore-case
-This function compares the specified part of @var{string1} with the
+This function compares a specified part of @var{string1} with a
 specified part of @var{string2}. The specified part of @var{string1}
-runs from index @var{start1} up to index @var{end1} (@code{nil} means
-the end of the string). The specified part of @var{string2} runs from
-index @var{start2} up to index @var{end2} (@code{nil} means the end of
-the string).
-
-The strings are both converted to multibyte for the comparison
-(@pxref{Text Representations}) so that a unibyte string and its
-conversion to multibyte are always regarded as equal. If
-@var{ignore-case} is non-@code{nil}, then case is ignored, so that
-upper case letters can be equal to lower case letters.
+runs from index @var{start1} (inclusive) up to index @var{end1}
+(exclusive); @code{nil} for @var{start1} means the start of the
+string, while @code{nil} for @var{end1} means the length of the
+string. Likewise, the specified part of @var{string2} runs from index
+@var{start2} up to index @var{end2}.
+
+The strings are compared by the numeric values of their characters.
+For instance, @var{string1} is considered ``smaller than'' @var{string2} if
+its first differing character has a smaller numeric value. If
+@var{ignore-case} is non-@code{nil}, characters are converted to
+lower-case before comparing them. Unibyte strings are converted to
+multibyte for comparison (@pxref{Text Representations}), so that a
+unibyte string and its conversion to multibyte are always regarded as
+equal.

 If the specified portions of the two strings match, the value is
 @code{t}. Otherwise, the value is an integer which indicates how many
-leading characters agree, and which string is less. Its absolute value
-is one plus the number of characters that agree at the beginning of the
-two strings. The sign is negative if @var{string1} (or its specified
-portion) is less.
+leading characters agree, and which string is less. Its absolute
+value is one plus the number of characters that agree at the beginning
+of the two strings. The sign is negative if @var{string1} (or its
+specified portion) is less.
 @end defun

 @defun assoc-string key alist &optional case-fold
@@ -545,7 +574,6 @@ against a string, can be used for a kind of string comparison; see
 @ref{Regexp Search}.

 @node String Conversion
-@comment node-name, next, previous, up
 @section Conversion of Characters and Strings
 @cindex conversion of strings

@@ -567,9 +595,8 @@ are used primarily for making help messages.
 @cindex integer to string
 @cindex integer to decimal
 This function returns a string consisting of the printed base-ten
-representation of @var{number}, which may be an integer or a floating
-point number. The returned value starts with a minus sign if the argument is
-negative.
+representation of @var{number}. The returned value starts with a
+minus sign if the argument is negative.

 @example
 (number-to-string 256)
@@ -593,20 +620,18 @@ See also the function @code{format} in @ref{Formatting Strings}.
 This function returns the numeric value of the characters in
 @var{string}. If @var{base} is non-@code{nil}, it must be an integer
 between 2 and 16 (inclusive), and integers are converted in that base.
-If @var{base} is @code{nil}, then base ten is used. Floating point
+If @var{base} is @code{nil}, then base ten is used. Floating-point
 conversion only works in base ten; we have not implemented other
-radices for floating point numbers, because that would be much more
+radices for floating-point numbers, because that would be much more
 work and does not seem useful. If @var{string} looks like an integer
 but its value is too large to fit into a Lisp integer,
-@code{string-to-number} returns a floating point result.
+@code{string-to-number} returns a floating-point result.

 The parsing skips spaces and tabs at the beginning of @var{string},
 then reads as much of @var{string} as it can interpret as a number in
 the given base. (On some systems it ignores other whitespace at the
-beginning, not just spaces and tabs.) If the first character after
-the ignored whitespace is neither a digit in the given base, nor a
-plus or minus sign, nor the leading dot of a floating point number,
-this function returns 0.
+beginning, not just spaces and tabs.) If @var{string} cannot be
+interpreted as a number, this function returns 0.

 @example
 (string-to-number "256")
@@ -661,7 +686,6 @@ This function converts a byte of character data into a unibyte string.
 @end table

 @node Formatting Strings
-@comment node-name, next, previous, up
 @section Formatting Strings
 @cindex formatting strings
 @cindex strings, formatting them
@@ -764,15 +788,15 @@ integer. @samp{%x} uses lower case and @samp{%X} uses upper case.
 Replace the specification with the character which is the value given.

 @item %e
-Replace the specification with the exponential notation for a floating
-point number.
+Replace the specification with the exponential notation for a
+floating-point number.

 @item %f
-Replace the specification with the decimal-point notation for a floating
-point number.
+Replace the specification with the decimal-point notation for a
+floating-point number.

 @item %g
-Replace the specification with notation for a floating point number,
+Replace the specification with notation for a floating-point number,
 using either exponential notation or decimal-point notation, whichever
 is shorter.

@@ -894,7 +918,6 @@ shows only the first three characters of the representation for
 characters.

 @node Case Conversion
-@comment node-name, next, previous, up
 @section Case Conversion in Lisp
 @cindex upper case
 @cindex lower case
@@ -1107,7 +1130,7 @@ Exits}).
 @acronym{ASCII} characters; for example, in the Turkish language
 environment, the @acronym{ASCII} character @samp{I} is downcased into
 a Turkish ``dotless i''. This can interfere with code that requires
-ordinary ASCII case conversion, such as implementations of
+ordinary @acronym{ASCII} case conversion, such as implementations of
 @acronym{ASCII}-based network protocols. In that case, use the
 @code{with-case-table} macro with the variable @var{ascii-case-table},
 which stores the unmodified case table for the @acronym{ASCII}