@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2001,
-@c 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-1999, 2001-2012
+@c Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../../info/searching
@node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top
* POSIX Regexps:: Searching POSIX-style for the longest match.
* Match Data:: Finding out which part of the text matched,
after a string or regexp search.
-* Search and Replace:: Commands that loop, searching and replacing.
+* Search and Replace:: Commands that loop, searching and replacing.
* Standard Regexps:: Useful regexps for finding sentences, pages,...
@end menu
@var{string}. If successful, it sets point to the end of the occurrence
found, and returns the new value of point. If no match is found, the
value and side effects depend on @var{noerror} (see below).
-@c Emacs 19 feature
In the following example, point is initially at the beginning of the
line. Then @code{(search-forward "fox")} moves point after the last
find a match. Invalid arguments cause errors regardless of
@var{noerror}.
-If @var{repeat} is supplied (it must be a positive number), then the
-search is repeated that many times (each time starting at the end of the
-previous time's match). If these successive searches succeed, the
-function succeeds, moving point and returning its new value. Otherwise
-the search fails, with results depending on the value of
-@var{noerror}, as described above.
+If @var{repeat} is a positive number @var{n}, it serves as a repeat
+count: the search is repeated @var{n} times, each time starting at the
+end of the previous match. If these successive searches succeed, the
+function succeeds, moving point and returning its new value.
+Otherwise the search fails, with results depending on the value of
+@var{noerror}, as described above. If @var{repeat} is a negative
+number -@var{n}, it serves as a repeat count of @var{n} for a search
+in the opposite (backward) direction.
@end deffn
@deffn Command search-backward string &optional limit noerror repeat
This function searches backward from point for @var{string}. It is
-just like @code{search-forward} except that it searches backwards and
-leaves point at the beginning of the match.
+like @code{search-forward}, except that it searches backwards rather
+than forwards. Backward searches leave point at the beginning of the
+match.
@end deffn
@deffn Command word-search-forward string &optional limit noerror repeat
Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
-(including the empty string), from which it follows that @samp{c[ad]*r}
+(including the empty string). It follows that @samp{c[ad]*r}
matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
You can also include character ranges in a character alternative, by
To include @samp{^} in a character alternative, put it anywhere but at
the beginning.
-The beginning and end of a range of multibyte characters must be in
-the same character set (@pxref{Character Sets}). Thus,
-@code{"[\x8e0-\x97c]"} is invalid because character 0x8e0 (@samp{a}
-with grave accent) is in the Emacs character set for Latin-1 but the
-character 0x97c (@samp{u} with diaeresis) is in the Emacs character
-set for Latin-2. (We use Lisp string syntax to write that example,
-and a few others in the next few paragraphs, in order to include hex
-escape sequences in them.)
-
If a range starts with a unibyte character @var{c} and ends with a
multibyte character @var{c2}, the range is divided into two parts: one
is @samp{@var{c}..?\377}, the other is @samp{@var{c1}..@var{c2}}, where
@var{c1} is the first character of the charset to which @var{c2}
belongs.
-You cannot always match all non-@acronym{ASCII} characters with the regular
-expression @code{"[\200-\377]"}. This works when searching a unibyte
-buffer or string (@pxref{Text Representations}), but not in a multibyte
-buffer or string, because many non-@acronym{ASCII} characters have codes
-above octal 0377. However, the regular expression @code{"[^\000-\177]"}
-does match all non-@acronym{ASCII} characters (see below regarding @samp{^}),
-in both multibyte and unibyte representations, because only the
-@acronym{ASCII} characters are excluded.
-
-A character alternative can also specify named
-character classes (@pxref{Char Classes}). This is a POSIX feature whose
-syntax is @samp{[:@var{class}:]}. Using a character class is equivalent
-to mentioning each of the characters in that class; but the latter is
-not feasible in practice, since some classes include thousands of
+A character alternative can also specify named character classes
+(@pxref{Char Classes}). This is a POSIX feature whose syntax is
+@samp{[:@var{class}:]}. Using a character class is equivalent to
+mentioning each of the characters in that class; but the latter is not
+feasible in practice, since some classes include thousands of
different characters.
@item @samp{[^ @dots{} ]}
mentioned as one of the characters not to match. This is in contrast to
the handling of regexps in programs such as @code{grep}.
+You can specify named character classes, just as in character
+alternatives. For instance, @samp{[^[:ascii:]]} matches any
+non-@acronym{ASCII} character. @xref{Char Classes}.
+
@item @samp{^}
@cindex beginning of line in regexp
When matching a buffer, @samp{^} matches the empty string, but only at the
This matches graphic characters---everything except @acronym{ASCII} control
characters, space, and the delete character.
@item [:lower:]
-This matches any lower-case letter, as determined by
-the current case table (@pxref{Case Tables}).
+This matches any lower-case letter, as determined by the current case
+table (@pxref{Case Tables}). If @code{case-fold-search} is
+non-@code{nil}, this also matches any upper-case letter.
@item [:multibyte:]
This matches any multibyte character (@pxref{Text Representations}).
@item [:nonascii:]
@item [:unibyte:]
This matches any unibyte character (@pxref{Text Representations}).
@item [:upper:]
-This matches any upper-case letter, as determined by
-the current case table (@pxref{Case Tables}).
+This matches any upper-case letter, as determined by the current case
+table (@pxref{Case Tables}). If @code{case-fold-search} is
+non-@code{nil}, this also matches any lower-case letter.
@item [:word:]
This matches any character that has word syntax (@pxref{Syntax Class
Table}).
@node Regexp Backslash
@subsubsection Backslash Constructs in Regular Expressions
+@cindex backslash in regular expressions
For the most part, @samp{\} followed by any character matches only
that character. However, there are several exceptions: certain
For example, @samp{c[ad]\@{1,2\@}r} matches the strings @samp{car},
@samp{cdr}, @samp{caar}, @samp{cadr}, @samp{cdar}, and @samp{cddr}, and
nothing else.@*
-@samp{\@{0,1\@}} or @samp{\@{,1\@}} is equivalent to @samp{?}. @*
-@samp{\@{0,\@}} or @samp{\@{,\@}} is equivalent to @samp{*}. @*
+@samp{\@{0,1\@}} or @samp{\@{,1\@}} is equivalent to @samp{?}.@*
+@samp{\@{0,\@}} or @samp{\@{,\@}} is equivalent to @samp{*}.@*
@samp{\@{1,\@}} is equivalent to @samp{+}.
@item \( @dots{} \)
@cindex @samp{\S} in regexp
matches any character whose syntax is not @var{code}.
+@cindex category, regexp search for
@item \c@var{c}
matches any character whose category is @var{c}. Here @var{c} is a
character that represents a category: thus, @samp{c} for Chinese
characters or @samp{g} for Greek characters in the standard category
-table.
+table. You can see a list of all currently defined categories with
+@kbd{M-x describe-categories @key{RET}}. You can also use the
+@code{define-category} function to define your own categories in
+addition to the standard ones (@pxref{Categories}).
@item \C@var{c}
matches any character whose category is not @var{c}.
If the optional argument @var{paren} is non-@code{nil}, then the
returned regular expression is always enclosed by at least one
parentheses-grouping construct. If @var{paren} is @code{words}, then
-that construct is additionally surrounded by @samp{\<} and @samp{\>}.
+that construct is additionally surrounded by @samp{\<} and @samp{\>};
+alternatively, if @var{paren} is @code{symbols}, then that construct
+is additionally surrounded by @samp{\_<} and @samp{\_>}
+(@code{symbols} is often appropriate when matching
+programming-language keywords and the like).
This simplified definition of @code{regexp-opt} produces a
regular expression which is equivalent to the actual value
can't avoid another intervening search, you must save and restore the
match data around it, to prevent it from being overwritten.
+ Note that any function may overwrite the match data unless it is
+explicitly documented not to do so. Consequently, functions that run
+implicitly in the background (@pxref{Timers}, and @ref{Idle Timers})
+should usually save and restore the match data explicitly, for
+instance with @code{save-match-data}.
+
@menu
-* Replacing Match:: Replacing a substring that was matched.
+* Replacing Match:: Replacing a substring that was matched.
* Simple Match Data:: Accessing single items of match data,
- such as where a particular subexpression started.
+ such as where a particular subexpression started.
* Entire Match Data:: Accessing the entire match data at once, as a list.
* Saving Match Data:: Saving and restoring the match data.
@end menu
@code{sentence-end-without-period} and
@code{sentence-end-without-space}.
@end defun
-
-@ignore
- arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f
-@end ignore