@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990-1995, 1998-1999, 2001-2012
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-1999, 2001-2015 Free Software
+@c Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@node Searching and Matching
@chapter Searching and Matching
@group
(word-search-forward "Please find the ball, boy.")
- @result{} 36
+ @result{} 39
---------- Buffer: foo ----------
He said "Please! Find
times. Point is positioned at the end of the last match.
@findex word-search-regexp
-Internal, @code{word-search-forward} and related functions use the
+Internally, @code{word-search-forward} and related functions use the
function @code{word-search-regexp} to convert @var{string} to a
regular expression that ignores punctuation.
@end deffn
@deffn Command word-search-forward-lax string &optional limit noerror repeat
This command is identical to @code{word-search-forward}, except that
-the end of @var{string} need not match a word boundary, unless @var{string} ends
-in whitespace. For instance, searching for @samp{ball boy} matches
-@samp{ball boyee}, but does not match @samp{aball boy}.
+the beginning or the end of @var{string} need not match a word
+boundary, unless @var{string} begins or ends in whitespace.
+For instance, searching for @samp{ball boy} matches @samp{ball boyee},
+but does not match @samp{balls boy}.
@end deffn
@deffn Command word-search-backward string &optional limit noerror repeat
@deffn Command word-search-backward-lax string &optional limit noerror repeat
This command is identical to @code{word-search-backward}, except that
-the end of @var{string} need not match a word boundary, unless @var{string} ends
-in whitespace.
+the beginning or the end of @var{string} need not match a word
+boundary, unless @var{string} begins or ends in whitespace.
@end deffn
@node Searching and Case
@node Syntax of Regexps
@subsection Syntax of Regular Expressions
+@cindex regexp syntax
+@cindex syntax of regular expressions
Regular expressions have a syntax in which a few characters are
special constructs and the rest are @dfn{ordinary}. An ordinary
therefore @samp{f} is a regular expression that matches the string
@samp{f} and no other string. (It does @emph{not} match the string
@samp{fg}, but it does match a @emph{part} of that string.) Likewise,
-@samp{o} is a regular expression that matches only @samp{o}.@refill
+@samp{o} is a regular expression that matches only @samp{o}.
Any two regular expressions @var{a} and @var{b} can be concatenated. The
result is a regular expression that matches a string if @var{a} matches
some amount of the beginning of that string and @var{b} matches the rest of
-the string.@refill
+the string.
As a simple example, we can concatenate the regular expressions @samp{f}
and @samp{o} to get the regular expression @samp{fo}, which matches only
@node Regexp Special
@subsubsection Special Characters in Regular Expressions
+@cindex regexp, special characters in
Here is a list of the characters that are special in a regular
expression.
is a special character that matches any single character except a newline.
Using concatenation, we can make regular expressions like @samp{a.b}, which
matches any three-character string that begins with @samp{a} and ends with
-@samp{b}.@refill
+@samp{b}.
@item @samp{*}
@cindex @samp{*} in regexp
matches upper-case letters. Note that a range like @samp{[a-z]} is
not affected by the locale's collation sequence, it always represents
a sequence in @acronym{ASCII} order.
-@c This wasn't obvious to me, since eg the grep manual "Character
+@c This wasn't obvious to me, since, e.g., the grep manual "Character
@c Classes and Bracket Expressions" specifically notes the opposite
@c behavior. But by experiment Emacs seems unaffected by LC_COLLATE
@c in this regard.
@samp{\\}. To write a Lisp string that contains the characters
@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
@samp{\}. Therefore, the read syntax for a regular expression matching
-@samp{\} is @code{"\\\\"}.@refill
+@samp{\} is @code{"\\\\"}.
@end table
@strong{Please note:} For historical compatibility, special characters
meanings make no sense. For example, @samp{*foo} treats @samp{*} as
ordinary since there is no preceding expression on which the @samp{*}
can act. It is poor practice to depend on this behavior; quote the
-special character anyway, regardless of where it appears.@refill
+special character anyway, regardless of where it appears.
As a @samp{\} is not special inside a character alternative, it can
never remove the special meaning of @samp{-} or @samp{]}. So you
For the most part, @samp{\} followed by any character matches only
that character. However, there are several exceptions: certain
-two-character sequences starting with @samp{\} that have special
-meanings. (The character after the @samp{\} in such a sequence is
-always ordinary when used on its own.) Here is a table of the special
-@samp{\} constructs.
+sequences starting with @samp{\} that have special meanings. Here is
+a table of the special @samp{\} constructs.
@table @samp
@item \|
specifies an alternative.
Two regular expressions @var{a} and @var{b} with @samp{\|} in
between form an expression that matches anything that either @var{a} or
-@var{b} matches.@refill
+@var{b} matches.
Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
-but no other string.@refill
+but no other string.
@samp{\|} applies to the largest possible surrounding expressions. Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
-@samp{\|}.@refill
+@samp{\|}.
If you need full backtracking capability to handle multiple uses of
@samp{\|}, use the POSIX regular expression functions (@pxref{POSIX
their number implicitly, based on their position, which can be
inconvenient. This construct allows you to force a particular group
number. There is no particular restriction on the numbering,
-e.g.@: you can have several groups with the same number in which case
-the last one to match (i.e.@: the rightmost match) will win.
+e.g., you can have several groups with the same number in which case
+the last one to match (i.e., the rightmost match) will win.
Implicitly numbered groups always get the smallest integer larger than
the one of any previous group.
matches the empty string, but only at the beginning or
end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
@samp{foo} as a separate word. @samp{\bballs?\b} matches
-@samp{ball} or @samp{balls} as a separate word.@refill
+@samp{ball} or @samp{balls} as a separate word.
@samp{\b} matches at the beginning or end of the buffer (or string)
regardless of what text appears next to it.
These functions operate on regular expressions.
+@cindex quote special characters in regexp
@defun regexp-quote string
This function returns a regular expression whose only exact match is
@var{string}. Using this regular expression in @code{looking-at} will
@end example
@end defun
+@cindex optimize regexp
@defun regexp-opt strings &optional paren
This function returns an efficient regular expression that will match
any of the strings in the list @var{strings}. This is useful when you
guarantee that its result is absolutely the most efficient form
possible. A hand-tuned regular expression can sometimes be slightly
more efficient, but is almost never worth the effort.}.
-@c See eg http://debbugs.gnu.org/2816
+@c E.g., see http://debbugs.gnu.org/2816
If the optional argument @var{paren} is non-@code{nil}, then the
returned regular expression is always enclosed by at least one
point. That can be quite slow if it has to search a long distance.
You can bound the time required by specifying @var{limit}, which says
not to search before @var{limit}. In this case, the match that is
-found must begin at or after @var{limit}.
-
-If @var{greedy} is non-@code{nil}, this function extends the match
-backwards as far as possible, stopping when a single additional
-previous character cannot be part of a match for regexp. When the
-match is extended, its starting position is allowed to occur before
-@var{limit}.
+found must begin at or after @var{limit}. Here's an example:
@example
@group
@end group
@end example
+If @var{greedy} is non-@code{nil}, this function extends the match
+backwards as far as possible, stopping when a single additional
+previous character cannot be part of a match for regexp. When the
+match is extended, its starting position is allowed to occur before
+@var{limit}.
+
@c http://debbugs.gnu.org/5689
As a general recommendation, try to avoid using @code{looking-back}
wherever possible, since it is slow. For this reason, there are no
full backtracking specified by the POSIX standard for regular expression
matching. They continue backtracking until they have tried all
possibilities and found all matches, so they can report the longest
-match, as required by POSIX. This is much slower, so use these
+match, as required by POSIX@. This is much slower, so use these
functions only when you really need the longest match.
The POSIX search and match functions do not properly support the
the current buffer is the one in which you performed the last search.
Then this function edits the buffer, replacing the matched text with
@var{replacement}. It leaves point at the end of the replacement
-text, and returns @code{t}.
+text.
If you performed the last search on a string, pass the same string as
@var{string}. Then this function returns a new string, in which the
@item @samp{\?}
This stands for itself (for compatibility with @code{replace-regexp}
-and related commands; @pxref{Regexp Replacement,,, emacs, The GNU
+and related commands; @pxref{Regexp Replace,,, emacs, The GNU
Emacs Manual}).
@end table
may save and restore the match data (@pxref{Saving Match Data}) around
the call to functions that could perform another search. Or use the
functions that explicitly do not modify the match data;
-e.g. @code{string-match-p}.
+e.g., @code{string-match-p}.
@c This is an old comment and presumably there is no prospect of this
@c changing now. But still the advice stands.
@item exit-prefix
Like @code{exit}, but add the key that was pressed to
-@code{unread-comment-events}.
+@code{unread-command-events} (@pxref{Event Input Misc}).
@item act-and-exit
Answer this question ``yes'', and give up on the entire series of