@c This is part of the Emacs manual.
-@c Copyright (C) 1985, 86, 87, 93, 94, 95, 97, 2000
+@c Copyright (C) 1985, 86, 87, 93, 94, 95, 97, 2000, 2001
@c Free Software Foundation, Inc.
@c See file emacs.texi for copying conditions.
@node Search, Fixit, Display, Top
When you are satisfied with the place you have reached, you can type
@key{RET}, which stops searching, leaving the cursor where the search
brought it. Also, any command not specially meaningful in searches
-stops the searching and is then executed. Thus, typing @kbd{C-a} would
-exit the search and then move to the beginning of the line. @key{RET}
-is necessary only if the next command you want to type is a printing
-character, @key{DEL}, @key{RET}, or another control character that is
+stops the searching and is then executed. Thus, typing @kbd{C-a}
+would exit the search and then move to the beginning of the line.
+@key{RET} is necessary only if the next command you want to type is a
+printing character, @key{DEL}, @key{RET}, or another character that is
special within searches (@kbd{C-q}, @kbd{C-w}, @kbd{C-r}, @kbd{C-s},
-@kbd{C-y}, @kbd{M-y}, @kbd{M-r}, or @kbd{M-s}).
+@kbd{C-y}, @kbd{M-y}, @kbd{M-r}, @kbd{M-s}, and some other
+meta-characters).
Sometimes you search for @samp{FOO} and find it, but not the one you
-expected to find. There was a second @samp{FOO} that you forgot about,
-before the one you were aiming for. In this event, type another @kbd{C-s}
-to move to the next occurrence of the search string. This can be done any
-number of times. If you overshoot, you can cancel some @kbd{C-s}
-characters with @key{DEL}.
+expected to find. There was a second @samp{FOO} that you forgot
+about, before the one you were aiming for. In this event, type
+another @kbd{C-s} to move to the next occurrence of the search string.
+You can repeat this any number of times. If you overshoot, you can
+cancel some @kbd{C-s} characters with @key{DEL}.
After you exit a search, you can search for the same string again by
typing just @kbd{C-s C-s}: the first @kbd{C-s} is the key that invokes
case-sensitive. If you delete the upper-case character from the search
string, it ceases to have this effect. @xref{Search Case}.
+ To search for a newline, type @kbd{C-j}. To search for another
+control character, such as control-S or carriage return, you must quote
+it by typing @kbd{C-q} first. This function of @kbd{C-q} is analogous
+to its use for insertion (@pxref{Inserting Text}): it causes the
+following character to be treated the way any ``ordinary'' character is
+treated in the same context. You can also specify a character by its
+octal code: enter @kbd{C-q} followed by a sequence of octal digits.
+
+@cindex searching for non-ASCII characters
+@cindex input method, during incremental search
+ To search for non-ASCII characters, you must use an input method
+(@pxref{Input Methods}). If an input method is turned on in the
+current buffer when you start the search, you can also use it while
+you type the search string. Emacs indicates this by including the
+input method mnemonic in its prompt, like this:
+
+@example
+I-search [@var{im}]:
+@end example
+
+@noindent
+@findex isearch-toggle-input-method
+@findex isearch-toggle-specified-input-method
+where @var{im} is the mnemonic of the active input method. You can
+toggle (enable or disable) the input method while you type the search
+string with @kbd{C-\} (@code{isearch-toggle-input-method}). You can
+turn on a specific (non-default) input method with @kbd{C-^}
+(@code{isearch-toggle-specified-input-method}), which prompts for the
+name of the input method. Note that the input method you turn on
+during incremental search is turned on in the current buffer as well.
+
If a search is failing and you ask to repeat it by typing another
-@kbd{C-s}, it starts again from the beginning of the buffer. Repeating
-a failing reverse search with @kbd{C-r} starts again from the end. This
-is called @dfn{wrapping around}. @samp{Wrapped} appears in the search
-prompt once this has happened. If you keep on going past the original
-starting point of the search, it changes to @samp{Overwrapped}, which
-means that you are revisiting matches that you have already seen.
+@kbd{C-s}, it starts again from the beginning of the buffer.
+Repeating a failing reverse search with @kbd{C-r} starts again from
+the end. This is called @dfn{wrapping around}, and @samp{Wrapped}
+appears in the search prompt once this has happened. If you keep on
+going past the original starting point of the search, it changes to
+@samp{Overwrapped}, which means that you are revisiting matches that
+you have already seen.
@cindex quitting (in search)
The @kbd{C-g} ``quit'' character does special things during searches;
search is now successful and waiting for more input, so a second @kbd{C-g}
will cancel the entire search.
- To search for a newline, type @kbd{C-j}. To search for another
-control character, such as control-S or carriage return, you must quote
-it by typing @kbd{C-q} first. This function of @kbd{C-q} is analogous
-to its use for insertion (@pxref{Inserting Text}): it causes the
-following character to be treated the way any ``ordinary'' character is
-treated in the same context. You can also specify a character by its
-octal code: enter @kbd{C-q} followed by a sequence of octal digits.
-
You can change to searching backwards with @kbd{C-r}. If a search fails
because the place you started was too late in the file, you should do this.
Repeated @kbd{C-r} keeps looking for more occurrences backwards. A
The character @kbd{M-y} copies text from the kill ring into the search
string. It uses the same text that @kbd{C-y} as a command would yank.
-@kbd{mouse-2} in the echo area does the same.
+@kbd{Mouse-2} in the echo area does the same.
@xref{Yanking}.
When you exit the incremental search, it sets the mark to where point
activating it, and does so only if the mark is not already active.
@cindex lazy search highlighting
- By default, Isearch uses @dfn{lazy highlighting}. All matches for
-the current search string in the buffer after the point where searching
-starts are highlighted. The extra highlighting makes it easier to
-anticipate where the cursor will end up each time you press @kbd{C-s} or
-@kbd{C-r} to repeat a pending search. Highlighting of these additional
-matches happens in a deferred fashion so as not to rob Isearch of its
-usual snappy response.
-@vindex isearch-lazy-highlight-cleanup
-By default the highlighting of matches is cleared when you end the
-search. Customize the variable @code{isearch-lazy-highlight-cleanup} to
-avoid cleaning up automatically. The command @kbd{M-x
-isearch-lazy-highlight-cleanup} can be used to clean up manually.
@vindex isearch-lazy-highlight
-Customize the variable @code{isearch-lazy-highlight} to turn off this
-feature.
+ When you pause for a little while during incremental search, it
+highlights all other possible matches for the search string. This
+makes it easier to anticipate where you can get to by typing @kbd{C-s}
+or @kbd{C-r} to repeat the search. The short delay before highlighting
+other matches helps indicate which match is the current one.
+If you don't like this feature, you can turn it off by setting
+@code{isearch-lazy-highlight} to @code{nil}.
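+
+ For instance, a line like the following in your init file would turn
+the feature off (a minimal sketch; it assumes you want lazy
+highlighting disabled in every session):
+
+@example
+(setq isearch-lazy-highlight nil)
+@end example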
+
+@vindex isearch-lazy-highlight-face
+@cindex faces for highlighting search matches
+ You can control the appearance of the highlighted matches by
+customizing the faces @code{isearch} (used for the current match) and
+@code{isearch-lazy-highlight-face} (used for the other matches).
@vindex isearch-mode-map
To customize the special characters that incremental search understands,
Then Emacs redisplays the window in which the search was done, to show
its new position of point.
-@ignore
- The three dots at the end of the search string, normally used to indicate
-that searching is going on, are not displayed in slow style display.
-@end ignore
-
@vindex search-slow-speed
The slow terminal style of display is used when the terminal baud rate is
less than or equal to the value of the variable @code{search-slow-speed},
-initially 1200.
+initially 1200. See @code{baud-rate} in @ref{Display Custom}.
@vindex search-slow-window-lines
The number of lines to use in slow terminal search display is controlled
@node Regexps, Search Case, Regexp Search, Search
@section Syntax of Regular Expressions
-@cindex regexp syntax
+@cindex syntax of regexps
Regular expressions have a syntax in which a few characters are
special constructs and the rest are @dfn{ordinary}. An ordinary
character and nothing else. The special characters are @samp{$},
@samp{^}, @samp{.}, @samp{*}, @samp{+}, @samp{?}, @samp{[}, @samp{]} and
@samp{\}. Any other character appearing in a regular expression is
-ordinary, unless a @samp{\} precedes it.
+ordinary, unless a @samp{\} precedes it. (When you use regular
+expressions in a Lisp program, each @samp{\} must be doubled; see the
+example near the end of this section.)
For example, @samp{f} is not a special character, so it is ordinary, and
therefore @samp{f} is a regular expression that matches the string
@item *?, +?, ??
@cindex non-greedy regexp matching
are non-greedy variants of the operators above. The normal operators
-@samp{*}, @samp{+}, @samp{?} are @dfn{greedy} in that they match as much
-as they can, while if you append a @samp{?} after them, it makes them
-non-greedy: they will match as little as possible.
+@samp{*}, @samp{+}, @samp{?} are @dfn{greedy} in that they match as
+much as they can, as long as the overall regexp can still match. With
+a following @samp{?}, they are non-greedy: they will match as little
+as possible.
+
+Thus, both @samp{ab*} and @samp{ab*?} can match the string @samp{a}
+and the string @samp{abbbb}; but if you try to match them both against
+the text @samp{abbb}, @samp{ab*} will match it all (the longest valid
+match), while @samp{ab*?} will match just @samp{a} (the shortest
+valid match).
+
+@item \@{@var{n}\@}
+is a postfix operator that specifies repetition @var{n} times---that
+is, the preceding regular expression must match exactly @var{n} times
+in a row. For example, @samp{x\@{4\@}} matches the string @samp{xxxx}
+and nothing else.
@item \@{@var{n},@var{m}\@}
-is another postfix operator that specifies an interval of iteration:
-the preceding regular expression must match between @var{n} and
-@var{m} times. If @var{m} is omitted, then there is no upper bound
-and if @samp{,@var{m}} is omitted, then the regular expression must match
-exactly @var{n} times. @*
-@samp{\@{0,1\@}} is equivalent to @samp{?}. @*
-@samp{\@{0,\@}} is equivalent to @samp{*}. @*
-@samp{\@{1,\@}} is equivalent to @samp{+}. @*
-@samp{\@{@var{n}\@}} is equivalent to @samp{\@{@var{n},@var{n}\@}}.
+is a postfix operator that specifies repetition between @var{n} and
+@var{m} times---that is, the preceding regular expression must match
+at least @var{n} times, but no more than @var{m} times. If @var{m} is
+omitted, then there is no upper limit, but the preceding regular
+expression must match at least @var{n} times. @*
+@samp{\@{0,1\@}} is equivalent to @samp{?}. @*
+@samp{\@{0,\@}} is equivalent to @samp{*}. @*
+@samp{\@{1,\@}} is equivalent to @samp{+}.
@item [ @dots{} ]
is a @dfn{character set}, which begins with @samp{[} and is terminated
and @samp{-}.
To include @samp{^} in a set, put it anywhere but at the beginning of
-the set.
+the set. (At the beginning, it complements the set---see below.)
When you use a range in case-insensitive search, you should write both
ends of the range in upper case, or both in lower case, or both should
@item [^ @dots{} ]
@samp{[^} begins a @dfn{complemented character set}, which matches any
character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches
-all characters @emph{except} letters and digits.
+all characters @emph{except} ASCII letters and digits.
@samp{^} is not special in a character set unless it is the first
character. The character following the @samp{^} is treated as if it
This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature that is assigned as a
second meaning to the same @samp{\( @dots{} \)} construct. In practice
-there is almost no conflict between the two meanings.
+there is usually no conflict between the two meanings; when there is
+a conflict, you can use a ``shy'' group.
@item \(?: @dots{} \)
-is another grouping construct (often called ``shy'') that serves the same
-first two purposes, but not the third:
-it cannot be referred to later on by number. This is only useful
-for mechanically constructed regular expressions where grouping
-constructs need to be introduced implicitly and hence risk changing the
-numbering of subsequent groups.
+@cindex shy group, in regexp
+specifies a ``shy'' group that does not record the matched substring;
+you can't refer back to it with @samp{\@var{d}}. This is useful
+in mechanically combining regular expressions, so that you
+can add groups for syntactic purposes without interfering with
+the numbering of the groups that were written by the user.
@item \@var{d}
matches the same text that matched the @var{d}th occurrence of a
@item \s@var{c}
matches any character whose syntax is @var{c}. Here @var{c} is a
-character that represents a syntax code: thus, @samp{w} for word
-constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
-etc. Represent a character of whitespace (which can be a newline) by
-either @samp{-} or a space character.
+character that designates a particular syntax class: thus, @samp{w}
+for word constituent, @samp{-} or @samp{ } for whitespace, @samp{.}
+for ordinary punctuation, etc. @xref{Syntax}.
@item \S@var{c}
matches any character whose syntax is not @var{c}.
+
+@cindex categories of characters
+@cindex characters which belong to a specific language
+@findex describe-categories
+@item \c@var{c}
+matches any character that belongs to the category @var{c}. For
+example, @samp{\cc} matches Chinese characters, @samp{\cg} matches
+Greek characters, etc. For the description of the known categories,
+type @kbd{M-x describe-categories @key{RET}}.
+
+@item \C@var{c}
+matches any character that does @emph{not} belong to category
+@var{c}.
@end table
The constructs that pertain to words and syntax are controlled by the
setting of the syntax table (@pxref{Syntax}).
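+
+ If you want to try some of these constructs, you can evaluate
+expressions like the following, for instance in the @samp{*scratch*}
+buffer. This is only a sketch: it uses the Lisp function
+@code{string-match}, so each @samp{\} is doubled, and the functions
+@code{match-end} and @code{match-string} report on the match found.
+
+@example
+(progn (string-match "ab*" "abbb") (match-end 0))
+     @result{} 4
+(progn (string-match "ab*?" "abbb") (match-end 0))
+     @result{} 1
+(progn (string-match "\\(?:foo\\)\\(bar\\)" "foobar")
+       (match-string 1 "foobar"))
+     @result{} "bar"
+@end example
+
+@noindent
+The first two show that the greedy @samp{ab*} matches all of
+@samp{abbb} while @samp{ab*?} matches just @samp{a}; the third shows
+that the shy group is not counted, so group 1 is @samp{bar}.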
- Here is a complicated regexp, used by Emacs to recognize the end of a
-sentence together with any whitespace that follows. It is given in Lisp
-syntax to enable you to distinguish the spaces from the tab characters. In
-Lisp syntax, the string constant begins and ends with a double-quote.
-@samp{\"} stands for a double-quote as part of the regexp, @samp{\\} for a
-backslash as part of the regexp, @samp{\t} for a tab and @samp{\n} for a
-newline.
+ Here is a complicated regexp, stored in @code{sentence-end} and used
+by Emacs to recognize the end of a sentence together with any
+whitespace that follows. We show it in Lisp syntax to distinguish the
+spaces from the tab characters. In Lisp syntax, the string constant
+begins and ends with a double-quote. @samp{\"} stands for a
+double-quote as part of the regexp, @samp{\\} for a backslash as part
+of the regexp, @samp{\t} for a tab, and @samp{\n} for a newline.
@example
-"[.?!][]\"')]*\\($\\|\t\\| \\)[ \t\n]*"
+"[.?!][]\"')]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
@end example
@noindent
-This contains four parts in succession: a character set matching period,
-@samp{?}, or @samp{!}; a character set matching close-brackets, quotes,
-or parentheses, repeated any number of times; an alternative in
-backslash-parentheses that matches end-of-line, a tab, or two spaces;
-and a character set matching whitespace characters, repeated any number
-of times.
+This contains four parts in succession: a character set matching
+period, @samp{?}, or @samp{!}; a character set matching
+close-brackets, quotes, or parentheses, repeated zero or more times; a
+set of alternatives within backslash-parentheses that matches either
+end-of-line, a space at the end of a line, a tab, or two spaces; and a
+character set matching whitespace characters, repeated any number of
+times.
To enter the same regexp interactively, you would type @key{TAB} to
enter a tab, and @kbd{C-j} to enter a newline. You would also type
single backslashes as themselves, instead of doubling them for Lisp syntax.
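+
+ As a quick check of how the parts fit together, you could apply the
+same regexp to a short sample string with @code{string-match} (a
+sketch; the sample text is arbitrary). The match begins at the period
+and ends after the newline that follows it:
+
+@example
+(string-match "[.?!][]\"')]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
+              "Hi.\nThere")
+     @result{} 2
+(match-end 0)
+     @result{} 4
+@end example
+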
+@ignore
+@c I commented this out because it is missing vital information
+@c and therefore useless. For instance, what do you do to *use* the
+@c regular expression when it is finished? What jobs is this good for?
+@c -- rms
+
+@findex re-builder
+@cindex authoring regular expressions
+ For convenient interactive development of regular expressions, you
+can use the @kbd{M-x re-builder} command. It provides a convenient
+interface for creating regular expressions, by giving immediate visual
+feedback. The buffer from which @code{re-builder} was invoked becomes
+the target for the regexp editor, which pops in a separate window. At
+all times, all the matches in the target buffer for the current
+regular expression are highlighted. Each parenthesized sub-expression
+of the regexp is shown in a distinct face, which makes it easier to
+verify even very complex regexps. (On displays that don't support
+colors, Emacs blinks the cursor around the matched text, as it does
+for matching parens.)
+@end ignore
+
@node Search Case, Replace, Regexps, Search
@section Searching and Case
-@vindex case-fold-search
Incremental searches in Emacs normally ignore the case of the text
they are searching through, if you specify the text in lower case.
Thus, if you specify searching for @samp{foo}, then @samp{Foo} and
well as to string search. The effect ceases if you delete the
upper-case letter from the search string.
+ Typing @kbd{M-c} within an incremental search toggles the case
+sensitivity of that search. The effect does not extend beyond the
+current incremental search to the next one, but it does override the
+effect of including an upper-case letter in the current search.
+
+@vindex case-fold-search
If you set the variable @code{case-fold-search} to @code{nil}, then
all letters must match exactly, including case. This is a per-buffer
variable; altering the variable affects only the current buffer, but
@samp{bar}, not all of them, then you cannot use an ordinary
@code{replace-string}. Instead, use @kbd{M-%} (@code{query-replace}).
This command finds occurrences of @samp{foo} one by one, displays each
-occurrence and asks you whether to replace it. A numeric argument to
-@code{query-replace} tells it to consider only occurrences that are
-bounded by word-delimiter characters. This preserves case, just like
-@code{replace-string}, provided @code{case-replace} is non-@code{nil},
-as it normally is.
+occurrence and asks you whether to replace it. Aside from querying,
+@code{query-replace} works just like @code{replace-string}. It
+preserves case, like @code{replace-string}, provided
+@code{case-replace} is non-@code{nil}, as it normally is. A numeric
+argument means consider only occurrences that are bounded by
+word-delimiter characters.
@kindex C-M-%
@findex query-replace-regexp
- Aside from querying, @code{query-replace} works just like
-@code{replace-string}, and @code{query-replace-regexp} works just like
-@code{replace-regexp}. This command is run by @kbd{C-M-%}.
+ @kbd{C-M-%} performs regexp search and replace (@code{query-replace-regexp}).
- The things you can type when you are shown an occurrence of @var{string}
-or a match for @var{regexp} are:
+ The characters you can type when you are shown a match for the string
+or regexp are:
@ignore @c Not worth it.
@kindex SPC @r{(query-replace)}
occurrence of @var{string}. When done, exit the recursive editing level
with @kbd{C-M-c} to proceed to the next occurrence.
+@item e
+to edit the replacement string in the minibuffer. When you exit the
+minibuffer by typing @key{RET}, the minibuffer contents replace the
+current occurrence of the pattern. They also become the new
+replacement string for any further occurrences.
+
@item C-l
to redisplay the screen. Then you must type another character to
specify what to do with this occurrence.
@section Other Search-and-Loop Commands
Here are some other commands that find matches for a regular
-expression. They all operate from point to the end of the buffer, and
-all ignore case in matching, if the pattern contains no upper-case
-letters and @code{case-fold-search} is non-@code{nil}.
+expression. They all ignore case in matching, if the pattern contains
+no upper-case letters and @code{case-fold-search} is non-@code{nil}.
+Aside from @code{occur}, all operate on the text from point to the end
+of the buffer, or on the active region in Transient Mark mode.
@findex list-matching-lines
@findex occur
-@findex count-matches
+@findex how-many
@findex delete-non-matching-lines
@findex delete-matching-lines
@findex flush-lines
@table @kbd
@item M-x occur @key{RET} @var{regexp} @key{RET}
-Display a list showing each line in the buffer that contains a match for
-@var{regexp}. A numeric argument specifies the number of context lines
-to print before and after each matching line; the default is none.
-To limit the search to part of the buffer, narrow to that part
-(@pxref{Narrowing}).
+Display a list showing each line in the buffer that contains a match
+for @var{regexp}. To limit the search to part of the buffer, narrow
+to that part (@pxref{Narrowing}). A numeric argument @var{n}
+specifies that @var{n} lines of context are to be displayed before and
+after each matching line.
@kindex RET @r{(Occur mode)}
The buffer @samp{*Occur*} containing the output serves as a menu for
@item M-x list-matching-lines
Synonym for @kbd{M-x occur}.
-@item M-x count-matches @key{RET} @var{regexp} @key{RET}
-Print the number of matches for @var{regexp} after point.
+@item M-x how-many @key{RET} @var{regexp} @key{RET}
+Print the number of matches for @var{regexp} that exist in the buffer
+after point. In Transient Mark mode, if the region is active, the
+command operates on the region instead.
@item M-x flush-lines @key{RET} @var{regexp} @key{RET}
-Delete each line that follows point and contains a match for
-@var{regexp}.
+Delete each line that contains a match for @var{regexp}, operating on
+the text after point. In Transient Mark mode, if the region is
+active, the command operates on the region instead.
@item M-x keep-lines @key{RET} @var{regexp} @key{RET}
-Delete each line that follows point and @emph{does not} contain a match
-for @var{regexp}.
+Delete each line that @emph{does not} contain a match for
+@var{regexp}, operating on the text after point. In Transient Mark
+mode, if the region is active, the command operates on the region
+instead.
@end table
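+
+ For example, to delete every empty line in the buffer from a Lisp
+program, you could move to the beginning of the buffer and call
+@code{flush-lines} with a regexp that matches an empty line (a sketch;
+@samp{^$} does not match lines that contain only spaces or tabs):
+
+@example
+(goto-char (point-min))
+(flush-lines "^$")
+@end example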
- In addition, you can use @code{grep} from Emacs to search a collection
-of files for matches for a regular expression, then visit the matches
-either sequentially or in arbitrary order. @xref{Grep Searching}.
+ You can also search multiple files under control of a tags table
+(@pxref{Tags Search}) or through the Dired @kbd{A} command
+(@pxref{Operating on Files}), or ask the @code{grep} program to do it
+(@pxref{Grep Searching}).