X-Git-Url: https://code.delx.au/gnu-emacs/blobdiff_plain/5aa5a37c77f5fffe064d41194761344642257c83..719d83adfe41463938cbd125323fee575f9d6c05:/man/search.texi diff --git a/man/search.texi b/man/search.texi index e53d501d00..5ce8617f2a 100644 --- a/man/search.texi +++ b/man/search.texi @@ -1,5 +1,5 @@ @c This is part of the Emacs manual. -@c Copyright (C) 1985, 86, 87, 93, 94, 95, 97, 2000 +@c Copyright (C) 1985, 86, 87, 93, 94, 95, 97, 2000, 2001 @c Free Software Foundation, Inc. @c See file emacs.texi for copying conditions. @node Search, Fixit, Display, Top @@ -70,19 +70,20 @@ you want to erase. If you do not want to wait for this to happen, use When you are satisfied with the place you have reached, you can type @key{RET}, which stops searching, leaving the cursor where the search brought it. Also, any command not specially meaningful in searches -stops the searching and is then executed. Thus, typing @kbd{C-a} would -exit the search and then move to the beginning of the line. @key{RET} -is necessary only if the next command you want to type is a printing -character, @key{DEL}, @key{RET}, or another control character that is +stops the searching and is then executed. Thus, typing @kbd{C-a} +would exit the search and then move to the beginning of the line. +@key{RET} is necessary only if the next command you want to type is a +printing character, @key{DEL}, @key{RET}, or another character that is special within searches (@kbd{C-q}, @kbd{C-w}, @kbd{C-r}, @kbd{C-s}, -@kbd{C-y}, @kbd{M-y}, @kbd{M-r}, or @kbd{M-s}). +@kbd{C-y}, @kbd{M-y}, @kbd{M-r}, @kbd{M-s}, and some other +meta-characters). Sometimes you search for @samp{FOO} and find it, but not the one you -expected to find. There was a second @samp{FOO} that you forgot about, -before the one you were aiming for. In this event, type another @kbd{C-s} -to move to the next occurrence of the search string. This can be done any -number of times. 
If you overshoot, you can cancel some @kbd{C-s} -characters with @key{DEL}. +expected to find. There was a second @samp{FOO} that you forgot +about, before the one you were aiming for. In this event, type +another @kbd{C-s} to move to the next occurrence of the search string. +You can repeat this any number of times. If you overshoot, you can +cancel some @kbd{C-s} characters with @key{DEL}. After you exit a search, you can search for the same string again by typing just @kbd{C-s C-s}: the first @kbd{C-s} is the key that invokes @@ -111,13 +112,45 @@ entirely, returning point to where it was when the search started. case-sensitive. If you delete the upper-case character from the search string, it ceases to have this effect. @xref{Search Case}. + To search for a newline, type @kbd{C-j}. To search for another +control character, such as control-S or carriage return, you must quote +it by typing @kbd{C-q} first. This function of @kbd{C-q} is analogous +to its use for insertion (@pxref{Inserting Text}): it causes the +following character to be treated the way any ``ordinary'' character is +treated in the same context. You can also specify a character by its +octal code: enter @kbd{C-q} followed by a sequence of octal digits. + +@cindex searching for non-ASCII characters +@cindex input method, during incremental search + To search for non-ASCII characters, you must use an input method +(@pxref{Input Methods}). If an input method is turned on in the +current buffer when you start the search, you can use it while you +type the search string also. Emacs indicates that by including the +input method mnemonic in its prompt, like this: + +@example +I-search [@var{im}]: +@end example + +@noindent +@findex isearch-toggle-input-method +@findex isearch-toggle-specified-input-method +where @var{im} is the mnemonic of the active input method. You can +toggle (enable or disable) the input method while you type the search +string with @kbd{C-\} (@code{isearch-toggle-input-method}). 
You can +turn on a certain (non-default) input method with @kbd{C-^} +(@code{isearch-toggle-specified-input-method}), which prompts for the +name of the input method. Note that the input method you turn on +during incremental search is turned on in the current buffer as well. + If a search is failing and you ask to repeat it by typing another -@kbd{C-s}, it starts again from the beginning of the buffer. Repeating -a failing reverse search with @kbd{C-r} starts again from the end. This -is called @dfn{wrapping around}. @samp{Wrapped} appears in the search -prompt once this has happened. If you keep on going past the original -starting point of the search, it changes to @samp{Overwrapped}, which -means that you are revisiting matches that you have already seen. +@kbd{C-s}, it starts again from the beginning of the buffer. +Repeating a failing reverse search with @kbd{C-r} starts again from +the end. This is called @dfn{wrapping around}, and @samp{Wrapped} +appears in the search prompt once this has happened. If you keep on +going past the original starting point of the search, it changes to +@samp{Overwrapped}, which means that you are revisiting matches that +you have already seen. @cindex quitting (in search) The @kbd{C-g} ``quit'' character does special things during searches; @@ -131,14 +164,6 @@ been found are discarded from the search string. With them gone, the search is now successful and waiting for more input, so a second @kbd{C-g} will cancel the entire search. - To search for a newline, type @kbd{C-j}. To search for another -control character, such as control-S or carriage return, you must quote -it by typing @kbd{C-q} first. This function of @kbd{C-q} is analogous -to its use for insertion (@pxref{Inserting Text}): it causes the -following character to be treated the way any ``ordinary'' character is -treated in the same context. You can also specify a character by its -octal code: enter @kbd{C-q} followed by a sequence of octal digits. 
- You can change to searching backwards with @kbd{C-r}. If a search fails because the place you started was too late in the file, you should do this. Repeated @kbd{C-r} keeps looking for more occurrences backwards. A @@ -166,7 +191,7 @@ search remains case-insensitive. The character @kbd{M-y} copies text from the kill ring into the search string. It uses the same text that @kbd{C-y} as a command would yank. -@kbd{mouse-2} in the echo area does the same. +@kbd{Mouse-2} in the echo area does the same. @xref{Yanking}. When you exit the incremental search, it sets the mark to where point @@ -175,21 +200,20 @@ there. In Transient Mark mode, incremental search sets the mark without activating it, and does so only if the mark is not already active. @cindex lazy search highlighting - By default, Isearch uses @dfn{lazy highlighting}. All matches for -the current search string in the buffer after the point where searching -starts are highlighted. The extra highlighting makes it easier to -anticipate where the cursor will end up each time you press @kbd{C-s} or -@kbd{C-r} to repeat a pending search. Highlighting of these additional -matches happens in a deferred fashion so as not to rob Isearch of its -usual snappy response. -@vindex isearch-lazy-highlight-cleanup -By default the highlighting of matches is cleared when you end the -search. Customize the variable @code{isearch-lazy-highlight-cleanup} to -avoid cleaning up automatically. The command @kbd{M-x -isearch-lazy-highlight-cleanup} can be used to clean up manually. @vindex isearch-lazy-highlight -Customize the variable @code{isearch-lazy-highlight} to turn off this -feature. + When you pause for a little while during incremental search, it +highlights all other possible matches for the search string. This +makes it easier to anticipate where you can get to by typing @kbd{C-s} +or @kbd{C-r} to repeat the search. The short delay before highlighting +other matches helps indicate which match is the current one. 
+If you don't like this feature, you can turn it off by setting +@code{isearch-lazy-highlight} to @code{nil}. + +@vindex isearch-lazy-highlight-face +@cindex faces for highlighting search matches + You can control how the highlighting of matches looks by +customizing the faces @code{isearch} (used for the current match) and +@code{isearch-lazy-highlight-face} (used for the other matches). @vindex isearch-mode-map To customize the special characters that incremental search understands, @@ -210,15 +234,10 @@ on the screen. Then Emacs redisplays the window in which the search was done, to show its new position of point. -@ignore - The three dots at the end of the search string, normally used to indicate -that searching is going on, are not displayed in slow style display. -@end ignore - @vindex search-slow-speed The slow terminal style of display is used when the terminal baud rate is less than or equal to the value of the variable @code{search-slow-speed}, -initially 1200. +initially 1200. See @code{baud-rate} in @ref{Display Custom}. @vindex search-slow-window-lines The number of lines to use in slow terminal search display is controlled @@ -359,7 +378,7 @@ Search}. @node Regexps, Search Case, Regexp Search, Search @section Syntax of Regular Expressions -@cindex regexp syntax +@cindex syntax of regexps Regular expressions have a syntax in which a few characters are special constructs and the rest are @dfn{ordinary}. An ordinary @@ -367,7 +386,9 @@ character is a simple regular expression which matches that same character and nothing else. The special characters are @samp{$}, @samp{^}, @samp{.}, @samp{*}, @samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}. Any other character appearing in a regular expression is -ordinary, unless a @samp{\} precedes it. +ordinary, unless a @samp{\} precedes it. (When you use regular +expressions in a Lisp program, each @samp{\} must be doubled; see the +example near the end of this section.)
For example, @samp{f} is not a special character, so it is ordinary, and therefore @samp{f} is a regular expression that matches the string @@ -429,20 +450,31 @@ preceding expression either once or not at all. For example, @item *?, +?, ?? @cindex non-greedy regexp matching are non-greedy variants of the operators above. The normal operators -@samp{*}, @samp{+}, @samp{?} are @dfn{greedy} in that they match as much -as they can, while if you append a @samp{?} after them, it makes them -non-greedy: they will match as little as possible. +@samp{*}, @samp{+}, @samp{?} are @dfn{greedy} in that they match as +much as they can, as long as the overall regexp can still match. With +a following @samp{?}, they are non-greedy: they will match as little +as possible. + +Thus, both @samp{ab*} and @samp{ab*?} can match the string @samp{a} +and the string @samp{abbbb}; but if you try to match them both against +the text @samp{abbb}, @samp{ab*} will match it all (the longest valid +match), while @samp{ab*?} will match just @samp{a} (the shortest +valid match). + +@item \@{@var{n}\@} +is a postfix operator that specifies repetition @var{n} times---that +is, the preceding regular expression must match exactly @var{n} times +in a row. For example, @samp{x\@{4\@}} matches the string @samp{xxxx} +and nothing else. @item \@{@var{n},@var{m}\@} -is another postfix operator that specifies an interval of iteration: -the preceding regular expression must match between @var{n} and -@var{m} times. If @var{m} is omitted, then there is no upper bound -and if @samp{,@var{m}} is omitted, then the regular expression must match -exactly @var{n} times. @* -@samp{\@{0,1\@}} is equivalent to @samp{?}. @* -@samp{\@{0,\@}} is equivalent to @samp{*}. @* -@samp{\@{1,\@}} is equivalent to @samp{+}. @* -@samp{\@{@var{n}\@}} is equivalent to @samp{\@{@var{n},@var{n}\@}}. 
+is a postfix operator that specifies repetition between @var{n} and +@var{m} times---that is, the preceding regular expression must match +at least @var{n} times, but no more than @var{m} times. If @var{m} is +omitted, then there is no upper limit, but the preceding regular +expression must match at least @var{n} times.@* @samp{\@{0,1\@}} is +equivalent to @samp{?}. @* @samp{\@{0,\@}} is equivalent to +@samp{*}. @* @samp{\@{1,\@}} is equivalent to @samp{+}. @item [ @dots{} ] is a @dfn{character set}, which begins with @samp{[} and is terminated @@ -472,7 +504,7 @@ set, or put it after a range. Thus, @samp{[]-]} matches both @samp{]} and @samp{-}. To include @samp{^} in a set, put it anywhere but at the beginning of -the set. +the set. (At the beginning, it complements the set---see below.) When you use a range in case-insensitive search, you should write both ends of the range in upper case, or both in lower case, or both should @@ -482,7 +514,7 @@ is somewhat ill-defined, and it may change in future Emacs versions. @item [^ @dots{} ] @samp{[^} begins a @dfn{complemented character set}, which matches any character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches -all characters @emph{except} letters and digits. +all characters @emph{except} ASCII letters and digits. @samp{^} is not special in a character set unless it is the first character. The character following the @samp{^} is treated as if it @@ -561,15 +593,16 @@ To record a matched substring for future reference. This last application is not a consequence of the idea of a parenthetical grouping; it is a separate feature that is assigned as a second meaning to the same @samp{\( @dots{} \)} construct. In practice -there is almost no conflict between the two meanings. +there is usually no conflict between the two meanings; when there is +a conflict, you can use a ``shy'' group. 
@item \(?: @dots{} \) -is another grouping construct (often called ``shy'') that serves the same -first two purposes, but not the third: -it cannot be referred to later on by number. This is only useful -for mechanically constructed regular expressions where grouping -constructs need to be introduced implicitly and hence risk changing the -numbering of subsequent groups. +@cindex shy group, in regexp +specifies a ``shy'' group that does not record the matched substring; +you can't refer back to it with @samp{\@var{d}}. This is useful +in mechanically combining regular expressions, so that you +can add groups for syntactic purposes without interfering with +the numbering of the groups that were written by the user. @item \@var{d} matches the same text that matched the @var{d}th occurrence of a @@ -639,46 +672,79 @@ matches any character that is not a word-constituent. @item \s@var{c} matches any character whose syntax is @var{c}. Here @var{c} is a -character that represents a syntax code: thus, @samp{w} for word -constituent, @samp{-} for whitespace, @samp{(} for open parenthesis, -etc. Represent a character of whitespace (which can be a newline) by -either @samp{-} or a space character. +character that designates a particular syntax class: thus, @samp{w} +for word constituent, @samp{-} or @samp{ } for whitespace, @samp{.} +for ordinary punctuation, etc. @xref{Syntax}. @item \S@var{c} matches any character whose syntax is not @var{c}. + +@cindex categories of characters +@cindex characters which belong to a specific language +@findex describe-categories +@item \c@var{c} +matches any character that belongs to the category @var{c}. For +example, @samp{\cc} matches Chinese characters, @samp{\cg} matches +Greek characters, etc. For the description of the known categories, +type @kbd{M-x describe-categories @key{RET}}. + +@item \C@var{c} +matches any character that does @emph{not} belong to category +@var{c}. 
@end table The constructs that pertain to words and syntax are controlled by the setting of the syntax table (@pxref{Syntax}). - Here is a complicated regexp, used by Emacs to recognize the end of a -sentence together with any whitespace that follows. It is given in Lisp -syntax to enable you to distinguish the spaces from the tab characters. In -Lisp syntax, the string constant begins and ends with a double-quote. -@samp{\"} stands for a double-quote as part of the regexp, @samp{\\} for a -backslash as part of the regexp, @samp{\t} for a tab and @samp{\n} for a -newline. + Here is a complicated regexp, stored in @code{sentence-end} and used +by Emacs to recognize the end of a sentence together with any +whitespace that follows. We show it in Lisp syntax to distinguish the +spaces from the tab characters. In Lisp syntax, the string constant +begins and ends with a double-quote. @samp{\"} stands for a +double-quote as part of the regexp, @samp{\\} for a backslash as part +of the regexp, @samp{\t} for a tab, and @samp{\n} for a newline. @example -"[.?!][]\"')]*\\($\\|\t\\| \\)[ \t\n]*" +"[.?!][]\"')]*\\($\\| $\\|\t\\| \\)[ \t\n]*" @end example @noindent -This contains four parts in succession: a character set matching period, -@samp{?}, or @samp{!}; a character set matching close-brackets, quotes, -or parentheses, repeated any number of times; an alternative in -backslash-parentheses that matches end-of-line, a tab, or two spaces; -and a character set matching whitespace characters, repeated any number -of times. +This contains four parts in succession: a character set matching +period, @samp{?}, or @samp{!}; a character set matching +close-brackets, quotes, or parentheses, repeated zero or more times; a +set of alternatives within backslash-parentheses that matches either +end-of-line, a space at the end of a line, a tab, or two spaces; and a +character set matching whitespace characters, repeated any number of +times.
To enter the same regexp interactively, you would type @key{TAB} to enter a tab, and @kbd{C-j} to enter a newline. You would also type single backslashes as themselves, instead of doubling them for Lisp syntax. +@ignore +@c I commented this out because it is missing vital information +@c and therefore useless. For instance, what do you do to *use* the +@c regular expression when it is finished? What jobs is this good for? +@c -- rms + +@findex re-builder +@cindex authoring regular expressions + For convenient interactive development of regular expressions, you +can use the @kbd{M-x re-builder} command. It provides a convenient +interface for creating regular expressions, by giving immediate visual +feedback. The buffer from which @code{re-builder} was invoked becomes +the target for the regexp editor, which pops in a separate window. At +all times, all the matches in the target buffer for the current +regular expression are highlighted. Each parenthesized sub-expression +of the regexp is shown in a distinct face, which makes it easier to +verify even very complex regexps. (On displays that don't support +colors, Emacs blinks the cursor around the matched text, as it does +for matching parens.) +@end ignore + @node Search Case, Replace, Regexps, Search @section Searching and Case -@vindex case-fold-search Incremental searches in Emacs normally ignore the case of the text they are searching through, if you specify the text in lower case. Thus, if you specify searching for @samp{foo}, then @samp{Foo} and @@ -692,6 +758,12 @@ the search case-sensitive. Thus, searching for @samp{Foo} does not find well as to string search. The effect ceases if you delete the upper-case letter from the search string. + Typing @kbd{M-c} within an incremental search toggles the case +sensitivity of that search. The effect does not extend beyond the +current incremental search to the next one, but it does override the +effect of including an upper-case letter in the current search. 
+ +@vindex case-fold-search If you set the variable @code{case-fold-search} to @code{nil}, then all letters must match exactly, including case. This is a per-buffer variable; altering the variable affects only the current buffer, but @@ -840,20 +912,19 @@ Replace some matches for @var{regexp} with @var{newstring}. @samp{bar}, not all of them, then you cannot use an ordinary @code{replace-string}. Instead, use @kbd{M-%} (@code{query-replace}). This command finds occurrences of @samp{foo} one by one, displays each -occurrence and asks you whether to replace it. A numeric argument to -@code{query-replace} tells it to consider only occurrences that are -bounded by word-delimiter characters. This preserves case, just like -@code{replace-string}, provided @code{case-replace} is non-@code{nil}, -as it normally is. +occurrence and asks you whether to replace it. Aside from querying, +@code{query-replace} works just like @code{replace-string}. It +preserves case, like @code{replace-string}, provided +@code{case-replace} is non-@code{nil}, as it normally is. A numeric +argument means consider only occurrences that are bounded by +word-delimiter characters. @kindex C-M-% @findex query-replace-regexp - Aside from querying, @code{query-replace} works just like -@code{replace-string}, and @code{query-replace-regexp} works just like -@code{replace-regexp}. This command is run by @kbd{C-M-%}. + @kbd{C-M-%} performs regexp search and replace (@code{query-replace-regexp}). - The things you can type when you are shown an occurrence of @var{string} -or a match for @var{regexp} are: + The characters you can type when you are shown a match for the string +or regexp are: @ignore @c Not worth it. @kindex SPC @r{(query-replace)} @@ -916,6 +987,12 @@ to delete the occurrence, and then enter a recursive editing level as in occurrence of @var{string}. When done, exit the recursive editing level with @kbd{C-M-c} to proceed to the next occurrence. 
+@item e +to edit the replacement string in the minibuffer. When you exit the +minibuffer by typing @key{RET}, the minibuffer contents replace the +current occurrence of the pattern. They also become the new +replacement string for any further occurrences. + @item C-l to redisplay the screen. Then you must type another character to specify what to do with this occurrence. @@ -946,13 +1023,14 @@ copy, or link files by replacing regexp matches in file names. @section Other Search-and-Loop Commands Here are some other commands that find matches for a regular -expression. They all operate from point to the end of the buffer, and -all ignore case in matching, if the pattern contains no upper-case -letters and @code{case-fold-search} is non-@code{nil}. +expression. They all ignore case in matching, if the pattern contains +no upper-case letters and @code{case-fold-search} is non-@code{nil}. +Aside from @code{occur}, all operate on the text from point to the end +of the buffer, or on the active region in Transient Mark mode. @findex list-matching-lines @findex occur -@findex count-matches +@findex how-many @findex delete-non-matching-lines @findex delete-matching-lines @findex flush-lines @@ -960,11 +1038,11 @@ letters and @code{case-fold-search} is non-@code{nil}. @table @kbd @item M-x occur @key{RET} @var{regexp} @key{RET} -Display a list showing each line in the buffer that contains a match for -@var{regexp}. A numeric argument specifies the number of context lines -to print before and after each matching line; the default is none. -To limit the search to part of the buffer, narrow to that part -(@pxref{Narrowing}). +Display a list showing each line in the buffer that contains a match +for @var{regexp}. To limit the search to part of the buffer, narrow +to that part (@pxref{Narrowing}). A numeric argument @var{n} +specifies that @var{n} lines of context are to be displayed before and +after each matching line. 
@kindex RET @r{(Occur mode)} The buffer @samp{*Occur*} containing the output serves as a menu for @@ -976,18 +1054,24 @@ moves point to the original of the chosen occurrence. @item M-x list-matching-lines Synonym for @kbd{M-x occur}. -@item M-x count-matches @key{RET} @var{regexp} @key{RET} -Print the number of matches for @var{regexp} after point. +@item M-x how-many @key{RET} @var{regexp} @key{RET} +Print the number of matches for @var{regexp} that exist in the buffer +after point. In Transient Mark mode, if the region is active, the +command operates on the region instead. @item M-x flush-lines @key{RET} @var{regexp} @key{RET} -Delete each line that follows point and contains a match for -@var{regexp}. +Delete each line that contains a match for @var{regexp}, operating on +the text after point. In Transient Mark mode, if the region is +active, the command operates on the region instead. @item M-x keep-lines @key{RET} @var{regexp} @key{RET} -Delete each line that follows point and @emph{does not} contain a match -for @var{regexp}. +Delete each line that @emph{does not} contain a match for +@var{regexp}, operating on the text after point. In Transient Mark +mode, if the region is active, the command operates on the region +instead. @end table - In addition, you can use @code{grep} from Emacs to search a collection -of files for matches for a regular expression, then visit the matches -either sequentially or in arbitrary order. @xref{Grep Searching}. + You can also search multiple files under control of a tags table +(@pxref{Tags Search}) or through the Dired @kbd{A} command +(@pxref{Operating on Files}), or ask the @code{grep} program to do it +(@pxref{Grep Searching}).
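The patch documents several regexp behaviors:

- greedy versus non-greedy repetition (`ab*` against `ab*?` on `abbb`);
- the new exact-count interval `\{n\}` and the interval shorthands `\{0,1\}`, `\{0,\}`, `\{1,\}`;
- the revised sentence-end regexp.

What follows is only a rough sketch that checks those claims with Python's `re` module as a stand-in. It is an assumption that Python's semantics match Emacs's for these particular patterns. The syntax certainly differs: Emacs puts a backslash before grouping, alternation and interval braces, and Python does not. So the patterns are translated, not copied. The last alternative of the sentence-end regexp is written with two spaces, as the patch's prose describes. The helpers `equivalent` and `sentence_end` are hypothetical and exist only for illustration.

```python
import re

# Python's re module stands in for Emacs regexps here. Emacs writes
# grouping, alternation and intervals with a backslash (\( \) \| \{ \}),
# while Python writes them bare: ( ) | { }.

# Greedy versus non-greedy: both patterns can match "a" and "abbbb" in full,
# but against "abbb" the greedy one takes the longest valid match and the
# non-greedy one the shortest.
both_match = all(
    re.fullmatch(p, s) for p in (r"ab*", r"ab*?") for s in ("a", "abbbb")
)
greedy = re.match(r"ab*", "abbb").group()
lazy = re.match(r"ab*?", "abbb").group()

# Emacs "x\{4\}" corresponds to Python "x{4}": exactly four repetitions.
four_xs = re.fullmatch(r"x{4}", "xxxx") is not None
five_xs = re.fullmatch(r"x{4}", "xxxxx") is not None


def equivalent(p, q, samples=("", "a", "aa", "aaa")):
    """Hypothetical helper: True if p and q accept the same sample strings."""
    return all(
        (re.fullmatch(p, s) is None) == (re.fullmatch(q, s) is None)
        for s in samples
    )


# The interval forms and their shorthand equivalents.
interval_table = {
    "{0,1} vs ?": equivalent(r"a{0,1}", r"a?"),
    "{0,} vs *": equivalent(r"a{0,}", r"a*"),
    "{1,} vs +": equivalent(r"a{1,}", r"a+"),
}

# The sentence-end regexp translated to Python syntax. re.MULTILINE makes
# "$" match at the end of any line, as "$" does inside the Emacs group.
SENTENCE_END = re.compile(r"[.?!][]\"')]*($| $|\t|  )[ \t\n]*", re.MULTILINE)


def sentence_end(text):
    """Hypothetical helper: return the first sentence-end match, or None."""
    m = SENTENCE_END.search(text)
    return m.group() if m else None


if __name__ == "__main__":
    print(greedy, lazy)                              # abbb a
    print(four_xs, five_xs)                          # True False
    print(interval_table)
    print(repr(sentence_end("Hello world.  Next")))  # '.  '
    print(repr(sentence_end("e.g. one")))            # None
```

The `e.g. one` case shows why the regexp asks for two spaces, end-of-line, or a tab. A single space after a period, as in an abbreviation, does not end a sentence.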