@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990-1995, 1998-1999, 2001-2012
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-1999, 2001-2015 Free Software
+@c Foundation, Inc.
@c See the file elisp.texi for copying conditions.
-@setfilename ../../info/searching
-@node Searching and Matching, Syntax Tables, Non-ASCII Characters, Top
+@node Searching and Matching
@chapter Searching and Matching
@cindex searching
@group
(word-search-forward "Please find the ball, boy.")
- @result{} 36
+ @result{} 39
---------- Buffer: foo ----------
He said "Please! Find
times. Point is positioned at the end of the last match.
@findex word-search-regexp
-Internal, @code{word-search-forward} and related functions use the
+Internally, @code{word-search-forward} and related functions use the
function @code{word-search-regexp} to convert @var{string} to a
regular expression that ignores punctuation.
@end deffn
@deffn Command word-search-forward-lax string &optional limit noerror repeat
This command is identical to @code{word-search-forward}, except that
-the end of @var{string} need not match a word boundary, unless @var{string} ends
-in whitespace. For instance, searching for @samp{ball boy} matches
-@samp{ball boyee}, but does not match @samp{aball boy}.
+the beginning or the end of @var{string} need not match a word
+boundary, unless @var{string} begins or ends in whitespace.
+For instance, searching for @samp{ball boy} matches @samp{ball boyee},
+but does not match @samp{balls boy}.
@end deffn
@deffn Command word-search-backward string &optional limit noerror repeat
@deffn Command word-search-backward-lax string &optional limit noerror repeat
This command is identical to @code{word-search-backward}, except that
-the end of @var{string} need not match a word boundary, unless @var{string} ends
-in whitespace.
+the beginning or the end of @var{string} need not match a word
+boundary, unless @var{string} begins or ends in whitespace.
@end deffn
@node Searching and Case
@node Syntax of Regexps
@subsection Syntax of Regular Expressions
+@cindex regexp syntax
+@cindex syntax of regular expressions
Regular expressions have a syntax in which a few characters are
special constructs and the rest are @dfn{ordinary}. An ordinary
therefore @samp{f} is a regular expression that matches the string
@samp{f} and no other string. (It does @emph{not} match the string
@samp{fg}, but it does match a @emph{part} of that string.) Likewise,
-@samp{o} is a regular expression that matches only @samp{o}.@refill
+@samp{o} is a regular expression that matches only @samp{o}.
Any two regular expressions @var{a} and @var{b} can be concatenated. The
result is a regular expression that matches a string if @var{a} matches
some amount of the beginning of that string and @var{b} matches the rest of
-the string.@refill
+the string.
As a simple example, we can concatenate the regular expressions @samp{f}
and @samp{o} to get the regular expression @samp{fo}, which matches only
@node Regexp Special
@subsubsection Special Characters in Regular Expressions
+@cindex regexp, special characters in
Here is a list of the characters that are special in a regular
expression.
is a special character that matches any single character except a newline.
Using concatenation, we can make regular expressions like @samp{a.b}, which
matches any three-character string that begins with @samp{a} and ends with
-@samp{b}.@refill
+@samp{b}.
@item @samp{*}
@cindex @samp{*} in regexp
matches upper-case letters. Note that a range like @samp{[a-z]} is
not affected by the locale's collation sequence, it always represents
a sequence in @acronym{ASCII} order.
-@c This wasn't obvious to me, since eg the grep manual "Character
+@c This wasn't obvious to me, since, e.g., the grep manual "Character
@c Classes and Bracket Expressions" specifically notes the opposite
@c behavior. But by experiment Emacs seems unaffected by LC_COLLATE
@c in this regard.
@samp{\\}. To write a Lisp string that contains the characters
@samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
@samp{\}. Therefore, the read syntax for a regular expression matching
-@samp{\} is @code{"\\\\"}.@refill
+@samp{\} is @code{"\\\\"}.
@end table
@strong{Please note:} For historical compatibility, special characters
meanings make no sense. For example, @samp{*foo} treats @samp{*} as
ordinary since there is no preceding expression on which the @samp{*}
can act. It is poor practice to depend on this behavior; quote the
-special character anyway, regardless of where it appears.@refill
+special character anyway, regardless of where it appears.
As a @samp{\} is not special inside a character alternative, it can
never remove the special meaning of @samp{-} or @samp{]}. So you
For the most part, @samp{\} followed by any character matches only
that character. However, there are several exceptions: certain
-two-character sequences starting with @samp{\} that have special
-meanings. (The character after the @samp{\} in such a sequence is
-always ordinary when used on its own.) Here is a table of the special
-@samp{\} constructs.
+sequences starting with @samp{\} that have special meanings. Here is
+a table of the special @samp{\} constructs.
@table @samp
@item \|
specifies an alternative.
Two regular expressions @var{a} and @var{b} with @samp{\|} in
between form an expression that matches anything that either @var{a} or
-@var{b} matches.@refill
+@var{b} matches.
Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
-but no other string.@refill
+but no other string.
@samp{\|} applies to the largest possible surrounding expressions. Only a
surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
-@samp{\|}.@refill
+@samp{\|}.
If you need full backtracking capability to handle multiple uses of
@samp{\|}, use the POSIX regular expression functions (@pxref{POSIX
their number implicitly, based on their position, which can be
inconvenient. This construct allows you to force a particular group
number. There is no particular restriction on the numbering,
-e.g.@: you can have several groups with the same number in which case
-the last one to match (i.e.@: the rightmost match) will win.
+e.g., you can have several groups with the same number in which case
+the last one to match (i.e., the rightmost match) will win.
Implicitly numbered groups always get the smallest integer larger than
the one of any previous group.
matches the empty string, but only at the beginning or
end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
@samp{foo} as a separate word. @samp{\bballs?\b} matches
-@samp{ball} or @samp{balls} as a separate word.@refill
+@samp{ball} or @samp{balls} as a separate word.
@samp{\b} matches at the beginning or end of the buffer (or string)
regardless of what text appears next to it.
an @code{invalid-regexp} error is signaled.
@node Regexp Example
-@comment node-name, next, previous, up
@subsection Complex Regexp Example
Here is a complicated regexp which was formerly used by Emacs to
These functions operate on regular expressions.
+@cindex quote special characters in regexp
@defun regexp-quote string
This function returns a regular expression whose only exact match is
@var{string}. Using this regular expression in @code{looking-at} will
succeed only if the next characters in the buffer are @var{string};
using it in a search function will succeed if the text being searched
-contains @var{string}.
+contains @var{string}. @xref{Regexp Search}.
This allows you to request an exact string match or search when calling
a function that wants a regular expression.
@end example
@end defun
+@cindex optimize regexp
@defun regexp-opt strings &optional paren
This function returns an efficient regular expression that will match
any of the strings in the list @var{strings}. This is useful when you
need to make matching or searching as fast as possible---for example,
-for Font Lock mode.
+for Font Lock mode@footnote{Note that @code{regexp-opt} does not
+guarantee that its result is absolutely the most efficient form
+possible. A hand-tuned regular expression can sometimes be slightly
+more efficient, but is almost never worth the effort.}.
+@c E.g., see http://debbugs.gnu.org/2816
If the optional argument @var{paren} is non-@code{nil}, then the
returned regular expression is always enclosed by at least one
(but not as efficient):
@example
-(defun regexp-opt (strings paren)
+(defun regexp-opt (strings &optional paren)
(let ((open-paren (if paren "\\(" ""))
(close-paren (if paren "\\)" "")))
(concat open-paren
shy groups (@pxref{Regexp Backslash}).
@end defun
+@c Supposedly an internal regexp-opt function, but table.el uses it at least.
+@defun regexp-opt-charset chars
+This function returns a regular expression matching a character in the
+list of characters @var{chars}.
+
+@example
+(regexp-opt-charset '(?a ?b ?c ?d ?e))
+ @result{} "[a-e]"
+@end example
+@end defun
+
+@c Internal functions: regexp-opt-group
+
@node Regexp Search
@section Regular Expression Searching
@cindex regular expression searching
succeed only starting with the first character following point. The
result is @code{t} if so, @code{nil} otherwise.
-This function does not move point, but it updates the match data, which
-you can access using @code{match-beginning} and @code{match-end}.
+This function does not move point, but it does update the match data.
@xref{Match Data}. If you need to test for a match without modifying
the match data, use @code{looking-at-p}, described below.
@end defun
@defun looking-back regexp &optional limit greedy
-This function returns @code{t} if @var{regexp} matches text before
-point, ending at point, and @code{nil} otherwise.
+This function returns @code{t} if @var{regexp} matches the text
+immediately before point (i.e., ending at point), and @code{nil} otherwise.
Because regular expression matching works only going forward, this is
implemented by searching backwards from point for a match that ends at
point. That can be quite slow if it has to search a long distance.
You can bound the time required by specifying @var{limit}, which says
not to search before @var{limit}. In this case, the match that is
-found must begin at or after @var{limit}.
-
-If @var{greedy} is non-@code{nil}, this function extends the match
-backwards as far as possible, stopping when a single additional
-previous character cannot be part of a match for regexp. When the
-match is extended, its starting position is allowed to occur before
-@var{limit}.
+found must begin at or after @var{limit}. Here's an example:
@example
@group
@result{} nil
@end group
@end example
+
+If @var{greedy} is non-@code{nil}, this function extends the match
+backwards as far as possible, stopping when a single additional
+previous character cannot be part of a match for regexp. When the
+match is extended, its starting position is allowed to occur before
+@var{limit}.
+
+@c http://debbugs.gnu.org/5689
+As a general recommendation, try to avoid using @code{looking-back}
+wherever possible, since it is slow. For this reason, there are no
+plans to add a @code{looking-back-p} function.
@end defun
@defun looking-at-p regexp
@node POSIX Regexps
@section POSIX Regular Expression Searching
+@cindex backtracking and POSIX regular expressions
The usual regular expression functions do backtracking when necessary
to handle the @samp{\|} and repetition constructs, but they continue
this only until they find @emph{some} match. Then they succeed and
full backtracking specified by the POSIX standard for regular expression
matching. They continue backtracking until they have tried all
possibilities and found all matches, so they can report the longest
-match, as required by POSIX. This is much slower, so use these
+match, as required by POSIX@. This is much slower, so use these
functions only when you really need the longest match.
The POSIX search and match functions do not properly support the
@cindex case in replacements
@defun replace-match replacement &optional fixedcase literal string subexp
-This function replaces the text in the buffer (or in @var{string}) that
-was matched by the last search. It replaces that text with
-@var{replacement}.
+This function performs a replacement operation on a buffer or string.
-If you did the last search in a buffer, you should specify @code{nil}
-for @var{string} and make sure that the current buffer when you call
-@code{replace-match} is the one in which you did the searching or
-matching. Then @code{replace-match} does the replacement by editing
-the buffer; it leaves point at the end of the replacement text, and
-returns @code{t}.
+If you did the last search in a buffer, you should omit the
+@var{string} argument or specify @code{nil} for it, and make sure that
+the current buffer is the one in which you performed the last search.
+Then this function edits the buffer, replacing the matched text with
+@var{replacement}. It leaves point at the end of the replacement
+text.
-If you did the search in a string, pass the same string as @var{string}.
-Then @code{replace-match} does the replacement by constructing and
-returning a new string.
+If you performed the last search on a string, pass the same string as
+@var{string}. Then this function returns a new string, in which the
+matched text is replaced by @var{replacement}.
If @var{fixedcase} is non-@code{nil}, then @code{replace-match} uses
the replacement text without case conversion; otherwise, it converts
@table @asis
@item @samp{\&}
@cindex @samp{&} in replacement
-@samp{\&} stands for the entire text being replaced.
+This stands for the entire text being replaced.
-@item @samp{\@var{n}}
+@item @samp{\@var{n}}, where @var{n} is a digit
@cindex @samp{\@var{n}} in replacement
-@samp{\@var{n}}, where @var{n} is a digit, stands for the text that
-matched the @var{n}th subexpression in the original regexp.
-Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
-If the @var{n}th subexpression never matched, an empty string is substituted.
+This stands for the text that matched the @var{n}th subexpression in
+the original regexp. Subexpressions are those expressions grouped
+inside @samp{\(@dots{}\)}. If the @var{n}th subexpression never
+matched, an empty string is substituted.
@item @samp{\\}
@cindex @samp{\} in replacement
-@samp{\\} stands for a single @samp{\} in the replacement text.
+This stands for a single @samp{\} in the replacement text.
+
+@item @samp{\?}
+This stands for itself (for compatibility with @code{replace-regexp}
+and related commands; @pxref{Regexp Replace,,, emacs, The GNU
+Emacs Manual}).
@end table
-These substitutions occur after case conversion, if any,
-so the strings they substitute are never case-converted.
+@noindent
+Any other character following @samp{\} signals an error.
+
+The substitutions performed by @samp{\&} and @samp{\@var{n}} occur
+after case conversion, if any. Therefore, the strings they substitute
+are never case-converted.
If @var{subexp} is non-@code{nil}, that says to replace just
subexpression number @var{subexp} of the regexp that was matched, not
query the match data immediately after searching, before calling any
other function that might perform another search. Alternatively, you
may save and restore the match data (@pxref{Saving Match Data}) around
-the call to functions that could perform another search.
+the call to functions that could perform another search. Or use the
+functions that explicitly do not modify the match data;
+e.g., @code{string-match-p}.
+@c This is an old comment and presumably there is no prospect of this
+@c changing now. But still the advice stands.
A search which fails may or may not alter the match data. In the
-past, a failing search did not do this, but we may change it in the
-future. So don't try to rely on the value of the match data after
-a failing search.
+current implementation, it does not, but we may change it in the
+future. Don't try to rely on the value of the match data after a
+failing search.
@defun match-string count &optional in-string
This function returns, as a string, the text matched in the last search
you should omit @var{in-string} or pass @code{nil} for it; but you
should make sure that the current buffer when you call
@code{match-string} is the one in which you did the searching or
-matching.
+matching. Failure to follow this advice will lead to incorrect results.
The value is @code{nil} if @var{count} is out of range, or for a
subexpression inside a @samp{\|} alternative that wasn't used or a
@end defun
@defun match-beginning count
-This function returns the position of the start of text matched by the
+This function returns the position of the start of the text matched by the
last regular expression searched for, or a subexpression of it.
If @var{count} is zero, then the value is the position of the start of
@defun match-data &optional integers reuse reseat
This function returns a list of positions (markers or integers) that
-record all the information on what text the last search matched.
+record all the information on the text that the last search matched.
Element zero is the position of the beginning of the match for the
whole expression; element one is the position of the end of the match
for the expression. The next two elements are the positions of the
If @var{reseat} is non-@code{nil}, all markers on the @var{match-list} list
are reseated to point to nowhere.
+@c TODO Make it properly obsolete.
@findex store-match-data
@code{store-match-data} is a semi-obsolete alias for @code{set-match-data}.
@end defun
@node Saving Match Data
@subsection Saving and Restoring the Match Data
- When you call a function that may do a search, you may need to save
+ When you call a function that may search, you may need to save
and restore the match data around that call, if you want to preserve the
match data from an earlier search for later use. Here is an example
that shows the problem that arises if you fail to save the match data:
@group
(re-search-forward "The \\(cat \\)")
@result{} 48
-(foo) ; @r{Perhaps @code{foo} does}
- ; @r{more searching.}
+(foo) ; @r{@code{foo} does more searching.}
(match-end 0)
@result{} 61 ; @r{Unexpected result---not 48!}
@end group
@code{replace-regexp-in-string} calls @var{rep} for each match,
passing the text of the match as its sole argument. It collects the
value @var{rep} returns and passes that to @code{replace-match} as the
-replacement string. The match-data at this point are the result
+replacement string. The match data at this point are the result
of matching @var{regexp} against a substring of @var{string}.
@end defun
If @var{from-string} contains upper-case letters, then
@code{perform-replace} binds @code{case-fold-search} to @code{nil}, and
-it uses the @code{replacements} without altering the case of them.
+it uses the @var{replacements} without altering their case.
Normally, the keymap @code{query-replace-map} defines the possible
user responses for queries. The argument @var{map}, if
Prefix keys are not supported; each key binding must be for a
single-event key sequence. This is because the functions don't use
@code{read-key-sequence} to get the input; instead, they read a single
-event and look it up ``by hand.''
+event and look it up ``by hand''.
@end itemize
@end defvar
@table @code
@item act
-Do take the action being considered---in other words, ``yes.''
+Do take the action being considered---in other words, ``yes''.
@item skip
-Do not take action for this question---in other words, ``no.''
+Do not take action for this question---in other words, ``no''.
@item exit
-Answer this question ``no,'' and give up on the entire series of
-questions, assuming that the answers will be ``no.''
+Answer this question ``no'', and give up on the entire series of
+questions, assuming that the answers will be ``no''.
+
+@item exit-prefix
+Like @code{exit}, but add the key that was pressed to
+@code{unread-command-events} (@pxref{Event Input Misc}).
@item act-and-exit
-Answer this question ``yes,'' and give up on the entire series of
-questions, assuming that subsequent answers will be ``no.''
+Answer this question ``yes'', and give up on the entire series of
+questions, assuming that subsequent answers will be ``no''.
@item act-and-show
-Answer this question ``yes,'' but show the results---don't advance yet
+Answer this question ``yes'', but show the results---don't advance yet
to the next question.
@item automatic
Answer this question and all subsequent questions in the series with
-``yes,'' without further user interaction.
+``yes'', without further user interaction.
@item backup
Move back to the previous place that a question was asked about.
Enter a recursive edit to deal with this question---instead of any
other action that would normally be taken.
+@item edit-replacement
+Edit the replacement for this question in the minibuffer.
+
@item delete-and-edit
Delete the text being considered, then enter a recursive edit to replace
it.
@item recenter
-Redisplay and center the window, then ask the same question again.
+@itemx scroll-up
+@itemx scroll-down
+@itemx scroll-other-window
+@itemx scroll-other-window-down
+Perform the specified window scroll operation, then ask the same
+question again. Only @code{y-or-n-p} and related functions use this
+answer.
@item quit
Perform a quit right away. Only @code{y-or-n-p} and related functions
@defvar multi-query-replace-map
This variable holds a keymap that extends @code{query-replace-map} by
providing additional keybindings that are useful in multi-buffer
-replacements.
+replacements. The additional ``bindings'' are:
+
+@table @code
+@item automatic-all
+Answer this question and all subsequent questions in the series with
+``yes'', without further user interaction, for all remaining buffers.
+
+@item exit-current
+Answer this question ``no'', and give up on the entire series of
+questions for the current buffer. Continue to the next buffer in the
+sequence.
+@end table
@end defvar
@defvar replace-search-function
the end of a sentence, including the whitespace following the
sentence. (All paragraph boundaries also end sentences, regardless.)
-If the value is @code{nil}, the default, then the function
-@code{sentence-end} has to construct the regexp. That is why you
+If the value is @code{nil}, as it is by default, then the function
+@code{sentence-end} constructs the regexp. That is why you
should always call the function @code{sentence-end} to obtain the
regexp to be used to recognize the end of a sentence.
@end defopt
if non-@code{nil}. Otherwise it returns a default value based on the
values of the variables @code{sentence-end-double-space}
(@pxref{Definition of sentence-end-double-space}),
-@code{sentence-end-without-period} and
+@code{sentence-end-without-period}, and
@code{sentence-end-without-space}.
@end defun