code.delx.au - gnu-emacs/blob - man/search.texi
Proofreading fixes from Danny Colascione <qtmstr@optonline.net>.
1 @c This is part of the Emacs manual.
2 @c Copyright (C) 1985, 86, 87, 93, 94, 95, 97, 2000, 2001
3 @c Free Software Foundation, Inc.
4 @c See file emacs.texi for copying conditions.
5 @node Search, Fixit, Display, Top
6 @chapter Searching and Replacement
7 @cindex searching
8 @cindex finding strings within text
9
10 Like other editors, Emacs has commands for searching for occurrences of
11 a string. The principal search command is unusual in that it is
12 @dfn{incremental}; it begins to search before you have finished typing the
13 search string. There are also nonincremental search commands more like
14 those of other editors.
15
16 Besides the usual @code{replace-string} command that finds all
17 occurrences of one string and replaces them with another, Emacs has a fancy
18 replacement command called @code{query-replace} which asks interactively
19 which occurrences to replace.
20
21 @menu
22 * Incremental Search:: Search happens as you type the string.
23 * Nonincremental Search:: Specify entire string and then search.
24 * Word Search:: Search for sequence of words.
25 * Regexp Search:: Search for match for a regexp.
26 * Regexps:: Syntax of regular expressions.
27 * Search Case:: To ignore case while searching, or not.
28 * Replace:: Search, and replace some or all matches.
29 * Other Repeating Search:: Operating on all matches for some regexp.
30 @end menu
31
32 @node Incremental Search, Nonincremental Search, Search, Search
33 @section Incremental Search
34
35 @cindex incremental search
36 An incremental search begins searching as soon as you type the first
37 character of the search string. As you type in the search string, Emacs
38 shows you where the string (as you have typed it so far) would be
39 found. When you have typed enough characters to identify the place you
40 want, you can stop. Depending on what you plan to do next, you may or
41 may not need to terminate the search explicitly with @key{RET}.
42
43 @c WideCommands
44 @table @kbd
45 @item C-s
46 Incremental search forward (@code{isearch-forward}).
47 @item C-r
48 Incremental search backward (@code{isearch-backward}).
49 @end table
50
51 @kindex C-s
52 @findex isearch-forward
53 @kbd{C-s} starts an incremental search. @kbd{C-s} reads characters from
54 the keyboard and positions the cursor at the first occurrence of the
55 characters that you have typed. If you type @kbd{C-s} and then @kbd{F},
56 the cursor moves to just after the first @samp{F}. Type an @kbd{O}, and
57 the cursor moves to just after the first @samp{FO}. After another @kbd{O}, the
58 cursor is after the first @samp{FOO} after the place where you started the
59 search. At each step, the buffer text that matches the search string is
60 highlighted, if the terminal can do that, and the current search
61 string is updated in the echo area.
62
63 If you make a mistake in typing the search string, you can cancel
64 characters with @key{DEL}. Each @key{DEL} cancels the last character of
65 the search string. This does not happen until Emacs is ready to read another
66 input character; first it must either find, or fail to find, the character
67 you want to erase. If you do not want to wait for this to happen, use
68 @kbd{C-g} as described below.
69
70 When you are satisfied with the place you have reached, you can type
71 @key{RET}, which stops searching, leaving the cursor where the search
72 brought it. Also, any command not specially meaningful in searches
73 stops the searching and is then executed. Thus, typing @kbd{C-a}
74 would exit the search and then move to the beginning of the line.
75 @key{RET} is necessary only if the next command you want to type is a
76 printing character, @key{DEL}, @key{RET}, or another character that is
77 special within searches (@kbd{C-q}, @kbd{C-w}, @kbd{C-r}, @kbd{C-s},
78 @kbd{C-y}, @kbd{M-y}, @kbd{M-r}, @kbd{M-s}, and some other
79 meta-characters).
80
81 Sometimes you search for @samp{FOO} and find it, but not the one you
82 expected to find. There was a second @samp{FOO} that you forgot
83 about, before the one you were aiming for. In this event, type
84 another @kbd{C-s} to move to the next occurrence of the search string.
85 You can repeat this any number of times. If you overshoot, you can
86 cancel some @kbd{C-s} characters with @key{DEL}.
87
88 After you exit a search, you can search for the same string again by
89 typing just @kbd{C-s C-s}: the first @kbd{C-s} is the key that invokes
90 incremental search, and the second @kbd{C-s} means ``search again.''
91
92 To reuse earlier search strings, use the @dfn{search ring}. The
93 commands @kbd{M-p} and @kbd{M-n} move through the ring to pick a search
94 string to reuse. These commands leave the selected search ring element
95 in the minibuffer, where you can edit it. Type @kbd{C-s} or @kbd{C-r}
96 to terminate editing the string and search for it.
97
98 If your string is not found at all, the echo area says @samp{Failing
99 I-Search}. The cursor is after the place where Emacs found as much of your
100 string as it could. Thus, if you search for @samp{FOOT}, and there is no
101 @samp{FOOT}, you might see the cursor after the @samp{FOO} in @samp{FOOL}.
102 At this point there are several things you can do. If your string was
103 mistyped, you can rub some of it out and correct it. If you like the place
104 you have found, you can type @key{RET} or some other Emacs command to
105 ``accept what the search offered.'' Or you can type @kbd{C-g}, which
106 removes from the search string the characters that could not be found (the
107 @samp{T} in @samp{FOOT}), leaving those that were found (the @samp{FOO} in
108 @samp{FOOT}). A second @kbd{C-g} at that point cancels the search
109 entirely, returning point to where it was when the search started.
110
111 An upper-case letter in the search string makes the search
112 case-sensitive. If you delete the upper-case character from the search
113 string, it ceases to have this effect. @xref{Search Case}.
114
115 To search for a newline, type @kbd{C-j}. To search for another
116 control character, such as control-S or carriage return, you must quote
117 it by typing @kbd{C-q} first. This function of @kbd{C-q} is analogous
118 to its use for insertion (@pxref{Inserting Text}): it causes the
119 following character to be treated the way any ``ordinary'' character is
120 treated in the same context. You can also specify a character by its
121 octal code: enter @kbd{C-q} followed by a sequence of octal digits.
122
123 @cindex searching for non-ASCII characters
124 @cindex input method, during incremental search
125 To search for non-ASCII characters, you must use an input method
126 (@pxref{Input Methods}). If an input method is turned on in the
127 current buffer when you start the search, you can use it while you
128 type the search string also. Emacs indicates that by including the
129 input method mnemonic in its prompt, like this:
130
131 @example
132 I-search [@var{im}]:
133 @end example
134
135 @noindent
136 @findex isearch-toggle-input-method
137 @findex isearch-toggle-specified-input-method
138 where @var{im} is the mnemonic of the active input method. You can
139 toggle (enable or disable) the input method while you type the search
140 string with @kbd{C-\} (@code{isearch-toggle-input-method}). You can
141 turn on a certain (non-default) input method with @kbd{C-^}
142 (@code{isearch-toggle-specified-input-method}), which prompts for the
143 name of the input method. Note that the input method you turn on
144 during incremental search is turned on in the current buffer as well.
145
146 If a search is failing and you ask to repeat it by typing another
147 @kbd{C-s}, it starts again from the beginning of the buffer.
148 Repeating a failing reverse search with @kbd{C-r} starts again from
149 the end. This is called @dfn{wrapping around}, and @samp{Wrapped}
150 appears in the search prompt once this has happened. If you keep on
151 going past the original starting point of the search, it changes to
152 @samp{Overwrapped}, which means that you are revisiting matches that
153 you have already seen.
154
155 @cindex quitting (in search)
156 The @kbd{C-g} ``quit'' character does special things during searches;
157 just what it does depends on the status of the search. If the search has
158 found what you specified and is waiting for input, @kbd{C-g} cancels the
159 entire search. The cursor moves back to where you started the search. If
160 @kbd{C-g} is typed when there are characters in the search string that have
161 not been found---because Emacs is still searching for them, or because it
162 has failed to find them---then the search string characters which have not
163 been found are discarded from the search string. With them gone, the
164 search is now successful and waiting for more input, so a second @kbd{C-g}
165 will cancel the entire search.
166
167 You can change to searching backwards with @kbd{C-r}. If a search fails
168 because the place you started was too late in the file, you should do this.
169 Repeated @kbd{C-r} keeps looking for more occurrences backwards. A
170 @kbd{C-s} starts going forwards again. @kbd{C-r} in a search can be canceled
171 with @key{DEL}.
172
173 @kindex C-r
174 @findex isearch-backward
175 If you know initially that you want to search backwards, you can use
176 @kbd{C-r} instead of @kbd{C-s} to start the search, because @kbd{C-r} as
177 a key runs a command (@code{isearch-backward}) to search backward. A
178 backward search finds matches that are entirely before the starting
179 point, just as a forward search finds matches that begin after it.
180
181 The characters @kbd{C-y} and @kbd{C-w} can be used in incremental
182 search to grab text from the buffer into the search string. This makes
183 it convenient to search for another occurrence of text at point.
184 @kbd{C-w} copies the word after point as part of the search string,
185 advancing point over that word. Another @kbd{C-s} to repeat the search
186 will then search for a string including that word. @kbd{C-y} is similar
187 to @kbd{C-w} but copies all the rest of the current line into the search
188 string. Both @kbd{C-y} and @kbd{C-w} convert the text they copy to
189 lower case if the search is currently not case-sensitive; this is so the
190 search remains case-insensitive.
191
192 The character @kbd{M-y} copies text from the kill ring into the search
193 string. It uses the same text that @kbd{C-y} as a command would yank.
194 @kbd{Mouse-2} in the echo area does the same.
195 @xref{Yanking}.
196
197 When you exit the incremental search, it sets the mark to where point
198 @emph{was}, before the search. That is convenient for moving back
199 there. In Transient Mark mode, incremental search sets the mark without
200 activating it, and does so only if the mark is not already active.
201
202 @cindex lazy search highlighting
203 @vindex isearch-lazy-highlight
204 When you pause for a little while during incremental search, it
205 highlights all other possible matches for the search string. This
206 makes it easier to anticipate where you can get to by typing @kbd{C-s}
207 or @kbd{C-r} to repeat the search. The short delay before highlighting
208 other matches helps indicate which match is the current one.
209 If you don't like this feature, you can turn it off by setting
210 @code{isearch-lazy-highlight} to @code{nil}.
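
As a minimal sketch, assuming you keep such settings in your init file,
the following line turns the feature off:

@example
;; Disable highlighting of the other matches during incremental search.
(setq isearch-lazy-highlight nil)
@end example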
211
212 @vindex isearch-lazy-highlight-face
213 @cindex faces for highlighting search matches
214 You can control how this highlighting looks by customizing the faces
215 @code{isearch} (used for the current match) and
216 @code{isearch-lazy-highlight-face} (for all the other matches).
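
A minimal sketch of such a customization from Lisp, using the face names
given above. The colors are arbitrary placeholders, and both faces must
already be defined in your Emacs for these calls to succeed:

@example
;; Background of the current match.
(set-face-background 'isearch "magenta")
;; Background of all the other matches (face name as given in this manual).
(set-face-background 'isearch-lazy-highlight-face "paleturquoise")
@end example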
217
218 @vindex isearch-mode-map
219 To customize the special characters that incremental search understands,
220 alter their bindings in the keymap @code{isearch-mode-map}. For a list
221 of bindings, look at the documentation of @code{isearch-mode} with
222 @kbd{C-h f isearch-mode @key{RET}}.
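
For instance, the following sketch gives an extra key the same meaning
as @kbd{C-y} inside a search. The choice of @kbd{C-o} is only an
illustration, and the command name @code{isearch-yank-line} is an
assumption here; check the documentation of @code{isearch-mode} for the
names your version uses.

@example
;; Make C-o copy the rest of the current line into the search string.
(define-key isearch-mode-map "\C-o" 'isearch-yank-line)
@end example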
223
224 @subsection Slow Terminal Incremental Search
225
226 Incremental search on a slow terminal uses a modified style of display
227 that is designed to take less time. Instead of redisplaying the buffer at
228 each place the search gets to, it creates a new single-line window and uses
229 that to display the line that the search has found. The single-line window
230 comes into play as soon as point moves outside of the text that is already
231 on the screen.
232
233 When you terminate the search, the single-line window is removed.
234 Emacs then redisplays the window in which the search was done, to show
235 the new position of point.
236
237 @vindex search-slow-speed
238 The slow terminal style of display is used when the terminal baud rate is
239 less than or equal to the value of the variable @code{search-slow-speed},
240 initially 1200. See @code{baud-rate} in @ref{Display Custom}.
241
242 @vindex search-slow-window-lines
243 The number of lines to use in slow terminal search display is controlled
244 by the variable @code{search-slow-window-lines}. Its normal value is 1.
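
A sketch of adjusting both variables in an init file. The values shown
are arbitrary examples, not recommendations:

@example
;; Use the slow-terminal display at 2400 baud or less.
(setq search-slow-speed 2400)
;; Use a three-line window for the slow-terminal display.
(setq search-slow-window-lines 3)
@end example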
245
246 @node Nonincremental Search, Word Search, Incremental Search, Search
247 @section Nonincremental Search
248 @cindex nonincremental search
249
250 Emacs also has conventional nonincremental search commands, which require
251 you to type the entire search string before searching begins.
252
253 @table @kbd
254 @item C-s @key{RET} @var{string} @key{RET}
255 Search for @var{string}.
256 @item C-r @key{RET} @var{string} @key{RET}
257 Search backward for @var{string}.
258 @end table
259
260 To do a nonincremental search, first type @kbd{C-s @key{RET}}. This
261 enters the minibuffer to read the search string; terminate the string
262 with @key{RET}, and then the search takes place. If the string is not
263 found, the search command signals an error.
264
265 The way @kbd{C-s @key{RET}} works is that the @kbd{C-s} invokes
266 incremental search, which is specially programmed to invoke nonincremental
267 search if the argument you give it is empty. (Such an empty argument would
268 otherwise be useless.) @kbd{C-r @key{RET}} also works this way.
269
270 However, nonincremental searches performed using @kbd{C-s @key{RET}} do
271 not call @code{search-forward} right away. The first thing done is to see
272 if the next character is @kbd{C-w}, which requests a word search.
273 @ifinfo
274 @xref{Word Search}.
275 @end ifinfo
276
277 @findex search-forward
278 @findex search-backward
279 Forward and backward nonincremental searches are implemented by the
280 commands @code{search-forward} and @code{search-backward}. These
281 commands may be bound to keys in the usual manner. The ability to
282 reach them through the incremental search commands exists for
283 historical reasons, and to avoid the need to find suitable key sequences
284 for them.
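
As an illustration, assuming that @kbd{C-c s} and @kbd{C-c r} are free
in your configuration, you could bind the two commands as shown below.
The second form is a sketch of calling @code{search-forward} from Lisp
on made-up buffer text; its third argument asks for @code{nil} rather
than an error when the string is not found.

@example
;; Hypothetical key choices; any free key sequence works.
(global-set-key "\C-cs" 'search-forward)
(global-set-key "\C-cr" 'search-backward)

(with-temp-buffer
  (insert "foo bar foo")
  (goto-char (point-min))
  ;; A successful search returns the position after the match;
  ;; a failed one returns nil because the third argument is t.
  (list (search-forward "bar" nil t)
        (search-forward "quux" nil t)))
     @result{} (8 nil)
@end example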
285
286 @node Word Search, Regexp Search, Nonincremental Search, Search
287 @section Word Search
288 @cindex word search
289
290 Word search searches for a sequence of words without regard to how the
291 words are separated. More precisely, you type a string of many words,
292 using single spaces to separate them, and the string can be found even
293 if there are multiple spaces, newlines, or other punctuation characters
294 between these words.
295
296 Word search is useful for editing a printed document made with a text
297 formatter. If you edit while looking at the printed, formatted version,
298 you can't tell where the line breaks are in the source file. With word
299 search, you can search without having to know them.
300
301 @table @kbd
302 @item C-s @key{RET} C-w @var{words} @key{RET}
303 Search for @var{words}, ignoring details of punctuation.
304 @item C-r @key{RET} C-w @var{words} @key{RET}
305 Search backward for @var{words}, ignoring details of punctuation.
306 @end table
307
308 Word search is a special case of nonincremental search and is invoked
309 with @kbd{C-s @key{RET} C-w}. This is followed by the search string,
310 which must always be terminated with @key{RET}. Being nonincremental,
311 this search does not start until the argument is terminated. It works
312 by constructing a regular expression and searching for that; see
313 @ref{Regexp Search}.
314
315 Use @kbd{C-r @key{RET} C-w} to do backward word search.
316
317 @findex word-search-forward
318 @findex word-search-backward
319 Forward and backward word searches are implemented by the commands
320 @code{word-search-forward} and @code{word-search-backward}. These
321 commands may be bound to keys in the usual manner. The ability to
322 reach them through the incremental search commands exists for historical
323 reasons, and to avoid the need to find suitable key sequences for them.
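
The following sketch calls @code{word-search-forward} from Lisp on a
temporary buffer to show that punctuation and line breaks between the
words do not prevent a match. The buffer text is made up for the
example, and the optional arguments (no bound, return @code{nil} on
failure) are used in the same way as for @code{search-forward}.

@example
(with-temp-buffer
  (insert "hello,\n   world")
  (goto-char (point-min))
  ;; A plain string search for "hello world" fails here,
  ;; but the word search succeeds.
  (list (search-forward "hello world" nil t)
        (and (word-search-forward "hello world" nil t) t)))
     @result{} (nil t)
@end example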
324
325 @node Regexp Search, Regexps, Word Search, Search
326 @section Regular Expression Search
327 @cindex regular expression
328 @cindex regexp
329
330 A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
331 denotes a class of alternative strings to match, possibly infinitely
332 many. In GNU Emacs, you can search for the next match for a regexp
333 either incrementally or not.
334
335 @kindex C-M-s
336 @findex isearch-forward-regexp
337 @kindex C-M-r
338 @findex isearch-backward-regexp
339 Incremental search for a regexp is done by typing @kbd{C-M-s}
340 (@code{isearch-forward-regexp}). This command reads a search string
341 incrementally just like @kbd{C-s}, but it treats the search string as a
342 regexp rather than looking for an exact match against the text in the
343 buffer. Each time you add text to the search string, you make the
344 regexp longer, and the new regexp is searched for. Invoking @kbd{C-s}
345 with a prefix argument (its value does not matter) is another way to do
346 a forward incremental regexp search. To search backward for a regexp,
347 use @kbd{C-M-r} (@code{isearch-backward-regexp}), or @kbd{C-r} with a
348 prefix argument.
349
350 All of the control characters that do special things within an
351 ordinary incremental search have the same function in incremental regexp
352 search. Typing @kbd{C-s} or @kbd{C-r} immediately after starting the
353 search retrieves the last incremental search regexp used; that is to
354 say, incremental regexp and non-regexp searches have independent
355 defaults. They also have separate search rings that you can access with
356 @kbd{M-p} and @kbd{M-n}.
357
358 If you type @key{SPC} in incremental regexp search, it matches any
359 sequence of whitespace characters, including newlines. If you want
360 to match just a space, type @kbd{C-q @key{SPC}}.
361
362 Note that adding characters to the regexp in an incremental regexp
363 search can make the cursor move back and start again. For example, if
364 you have searched for @samp{foo} and you add @samp{\|bar}, the cursor
365 backs up in case the first @samp{bar} precedes the first @samp{foo}.
366
367 @findex re-search-forward
368 @findex re-search-backward
369 Nonincremental search for a regexp is done by the functions
370 @code{re-search-forward} and @code{re-search-backward}. You can invoke
371 these with @kbd{M-x}, or bind them to keys, or invoke them by way of
372 incremental regexp search with @kbd{C-M-s @key{RET}} and @kbd{C-M-r
373 @key{RET}}.
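
A small sketch of calling @code{re-search-forward} from Lisp. The buffer
contents are invented for the example, and @code{match-string} is used
to retrieve the text that matched.

@example
(with-temp-buffer
  (insert "alpha beta42 gamma")
  (goto-char (point-min))
  ;; Find the first run of lower-case letters followed by digits.
  (when (re-search-forward "[a-z]+[0-9]+" nil t)
    (match-string 0)))
     @result{} "beta42"
@end example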
374
375 If you use the incremental regexp search commands with a prefix
376 argument, they perform ordinary string search, like
377 @code{isearch-forward} and @code{isearch-backward}. @xref{Incremental
378 Search}.
379
380 @node Regexps, Search Case, Regexp Search, Search
381 @section Syntax of Regular Expressions
382 @cindex syntax of regexps
383
384 Regular expressions have a syntax in which a few characters are
385 special constructs and the rest are @dfn{ordinary}. An ordinary
386 character is a simple regular expression which matches that same
387 character and nothing else. The special characters are @samp{$},
388 @samp{^}, @samp{.}, @samp{*}, @samp{+}, @samp{?}, @samp{[}, @samp{]} and
389 @samp{\}. Any other character appearing in a regular expression is
390 ordinary, unless a @samp{\} precedes it. (When you use regular
391 expressions in a Lisp program, each @samp{\} must be doubled; see the
392 example near the end of this section.)
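
As a brief sketch of the doubling rule, the regexp @samp{\$}, which
matches a literal dollar sign, is written with two backslashes inside a
Lisp string. The test string is made up for illustration:

@example
;; The Lisp reader turns "\\$" into the two-character regexp \$.
(string-match "\\$" "cost: $42")
     @result{} 6
@end example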
393
394 For example, @samp{f} is not a special character, so it is ordinary, and
395 therefore @samp{f} is a regular expression that matches the string
396 @samp{f} and no other string. (It does @emph{not} match the string
397 @samp{ff}.) Likewise, @samp{o} is a regular expression that matches
398 only @samp{o}. (When case distinctions are being ignored, these regexps
399 also match @samp{F} and @samp{O}, but we consider this a generalization
400 of ``the same string,'' rather than an exception.)
401
402 Any two regular expressions @var{a} and @var{b} can be concatenated. The
403 result is a regular expression which matches a string if @var{a} matches
404 some amount of the beginning of that string and @var{b} matches the rest of
405 the string.@refill
406
407 As a simple example, we can concatenate the regular expressions @samp{f}
408 and @samp{o} to get the regular expression @samp{fo}, which matches only
409 the string @samp{fo}. Still trivial. To do something nontrivial, you
410 need to use one of the special characters. Here is a list of them.
411
412 @table @kbd
413 @item .@: @r{(Period)}
414 is a special character that matches any single character except a newline.
415 Using concatenation, we can make regular expressions like @samp{a.b}, which
416 matches any three-character string that begins with @samp{a} and ends with
417 @samp{b}.@refill
418
419 @item *
420 is not a construct by itself; it is a postfix operator that means to
421 match the preceding regular expression repetitively as many times as
422 possible. Thus, @samp{o*} matches any number of @samp{o}s (including no
423 @samp{o}s).
424
425 @samp{*} always applies to the @emph{smallest} possible preceding
426 expression. Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
427 @samp{fo}. It matches @samp{f}, @samp{fo}, @samp{foo}, and so on.
428
429 The matcher processes a @samp{*} construct by matching, immediately,
430 as many repetitions as can be found. Then it continues with the rest
431 of the pattern. If that fails, backtracking occurs, discarding some
432 of the matches of the @samp{*}-modified construct in case that makes
433 it possible to match the rest of the pattern. For example, in matching
434 @samp{ca*ar} against the string @samp{caaar}, the @samp{a*} first
435 tries to match all three @samp{a}s; but the rest of the pattern is
436 @samp{ar} and there is only @samp{r} left to match, so this try fails.
437 The next alternative is for @samp{a*} to match only two @samp{a}s.
438 With this choice, the rest of the regexp matches successfully.@refill
439
440 @item +
441 is a postfix operator, similar to @samp{*} except that it must match
442 the preceding expression at least once. So, for example, @samp{ca+r}
443 matches the strings @samp{car} and @samp{caaaar} but not the string
444 @samp{cr}, whereas @samp{ca*r} matches all three strings.
445
446 @item ?
447 is a postfix operator, similar to @samp{*} except that it can match the
448 preceding expression either once or not at all. For example,
449 @samp{ca?r} matches @samp{car} or @samp{cr}; nothing else.
450
451 @item *?, +?, ??
452 @cindex non-greedy regexp matching
453 are non-greedy variants of the operators above. The normal operators
454 @samp{*}, @samp{+}, @samp{?} are @dfn{greedy} in that they match as
455 much as they can, as long as the overall regexp can still match. With
456 a following @samp{?}, they are non-greedy: they will match as little
457 as possible.
458
459 Thus, both @samp{ab*} and @samp{ab*?} can match the string @samp{a}
460 and the string @samp{abbbb}; but if you try to match them both against
461 the text @samp{abbb}, @samp{ab*} will match it all (the longest valid
462 match), while @samp{ab*?} will match just @samp{a} (the shortest
463 valid match).
464
465 @item \@{@var{n}\@}
466 is a postfix operator that specifies repetition @var{n} times---that
467 is, the preceding regular expression must match exactly @var{n} times
468 in a row. For example, @samp{x\@{4\@}} matches the string @samp{xxxx}
469 and nothing else.
470
471 @item \@{@var{n},@var{m}\@}
472 is a postfix operator that specifies repetition between @var{n} and
473 @var{m} times---that is, the preceding regular expression must match
474 at least @var{n} times, but no more than @var{m} times. If @var{m} is
475 omitted, then there is no upper limit, but the preceding regular
476 expression must match at least @var{n} times.@* @samp{\@{0,1\@}} is
477 equivalent to @samp{?}. @* @samp{\@{0,\@}} is equivalent to
478 @samp{*}. @* @samp{\@{1,\@}} is equivalent to @samp{+}.
479
480 @item [ @dots{} ]
481 is a @dfn{character set}, which begins with @samp{[} and is terminated
482 by @samp{]}. In the simplest case, the characters between the two
483 brackets are what this set can match.
484
485 Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
486 @samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
487 (including the empty string), from which it follows that @samp{c[ad]*r}
488 matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
489
490 You can also include character ranges in a character set, by writing the
491 starting and ending characters with a @samp{-} between them. Thus,
492 @samp{[a-z]} matches any lower-case ASCII letter. Ranges may be
493 intermixed freely with individual characters, as in @samp{[a-z$%.]},
494 which matches any lower-case ASCII letter or @samp{$}, @samp{%} or
495 period.
496
497 Note that the usual regexp special characters are not special inside a
498 character set. A completely different set of special characters exists
499 inside character sets: @samp{]}, @samp{-} and @samp{^}.
500
501 To include a @samp{]} in a character set, you must make it the first
502 character. For example, @samp{[]a]} matches @samp{]} or @samp{a}. To
503 include a @samp{-}, write @samp{-} as the first or last character of the
504 set, or put it after a range. Thus, @samp{[]-]} matches both @samp{]}
505 and @samp{-}.
506
507 To include @samp{^} in a set, put it anywhere but at the beginning of
508 the set. (At the beginning, it complements the set---see below.)
509
510 When you use a range in case-insensitive search, you should write both
511 ends of the range in upper case, or both in lower case, or both should
512 be non-letters. The behavior of a mixed-case range such as @samp{A-z}
513 is somewhat ill-defined, and it may change in future Emacs versions.
514
515 @item [^ @dots{} ]
516 @samp{[^} begins a @dfn{complemented character set}, which matches any
517 character except the ones specified. Thus, @samp{[^a-z0-9A-Z]} matches
518 all characters @emph{except} ASCII letters and digits.
519
520 @samp{^} is not special in a character set unless it is the first
521 character. The character following the @samp{^} is treated as if it
522 were first (in other words, @samp{-} and @samp{]} are not special there).
523
524 A complemented character set can match a newline, unless newline is
525 mentioned as one of the characters not to match. This is in contrast to
526 the handling of regexps in programs such as @code{grep}.
527
528 @item ^
529 is a special character that matches the empty string, but only at the
530 beginning of a line in the text being matched. Otherwise it fails to
531 match anything. Thus, @samp{^foo} matches a @samp{foo} that occurs at
532 the beginning of a line.
533
534 @item $
535 is similar to @samp{^} but matches only at the end of a line. Thus,
536 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
537
538 @item \
539 has two functions: it quotes the special characters (including
540 @samp{\}), and it introduces additional special constructs.
541
542 Because @samp{\} quotes special characters, @samp{\$} is a regular
543 expression that matches only @samp{$}, and @samp{\[} is a regular
544 expression that matches only @samp{[}, and so on.
545 @end table
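
The following sketch checks several of the examples above from Lisp
with @code{string-match} and @code{match-string}. The test strings are
taken from the text or made up for illustration, and
@code{case-fold-search} is bound to @code{nil} so that letter case is
significant.

@example
(let ((case-fold-search nil))
  (list
   ;; Greedy versus non-greedy: longest and shortest valid match.
   (progn (string-match "ab*" "abbb") (match-string 0 "abbb"))
   (progn (string-match "ab*?" "abbb") (match-string 0 "abbb"))
   ;; + requires at least one repetition; * and ? do not.
   (string-match "ca+r" "cr")
   (string-match "ca*r" "cr")
   (string-match "ca?r" "cr")
   ;; A character set combined with a postfix operator.
   (string-match "c[ad]*r" "caddaar")
   ;; A complemented character set can match a newline.
   (string-match "[^a-z]" "ab\ncd")))
     @result{} ("abbb" "a" nil 0 0 0 2)
@end example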
546
547 Note: for historical compatibility, special characters are treated as
548 ordinary ones if they are in contexts where their special meanings make no
549 sense. For example, @samp{*foo} treats @samp{*} as ordinary since there is
550 no preceding expression on which the @samp{*} can act. It is poor practice
551 to depend on this behavior; it is better to quote the special character anyway,
552 regardless of where it appears.@refill
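
A brief sketch of this behavior, using a made-up test string. Both
forms find the literal text @samp{*foo}, but only the second states the
intent explicitly:

@example
;; Unquoted leading *, treated as ordinary.
(string-match "*foo" "a*foo")
     @result{} 1
;; Quoted *, the recommended form.
(string-match "\\*foo" "a*foo")
     @result{} 1
@end example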
553
554 For the most part, @samp{\} followed by any character matches only that
555 character. However, there are several exceptions: two-character
556 sequences starting with @samp{\} that have special meanings. The second
557 character in the sequence is always an ordinary character when used on
558 its own. Here is a table of @samp{\} constructs.
559
560 @table @kbd
561 @item \|
562 specifies an alternative. Two regular expressions @var{a} and @var{b}
563 with @samp{\|} in between form an expression that matches some text if
564 either @var{a} matches it or @var{b} matches it. It works by trying to
565 match @var{a}, and if that fails, by trying to match @var{b}.
566
567 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
568 but no other string.@refill
569
570 @samp{\|} applies to the largest possible surrounding expressions. Only a
571 surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
572 @samp{\|}.@refill
573
574 Full backtracking capability exists to handle multiple uses of @samp{\|}.
575
576 @item \( @dots{} \)
577 is a grouping construct that serves three purposes:
578
579 @enumerate
580 @item
581 To enclose a set of @samp{\|} alternatives for other operations.
582 Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
583
584 @item
585 To enclose a complicated expression for the postfix operators @samp{*},
586 @samp{+} and @samp{?} to operate on. Thus, @samp{ba\(na\)*} matches
587 @samp{bananana}, etc., with any (zero or more) number of @samp{na}
588 strings.@refill
589
590 @item
591 To record a matched substring for future reference.
592 @end enumerate
593
594 This last application is not a consequence of the idea of a
595 parenthetical grouping; it is a separate feature that is assigned as a
596 second meaning to the same @samp{\( @dots{} \)} construct. In practice
597 there is usually no conflict between the two meanings; when there is
598 a conflict, you can use a ``shy'' group.
599
600 @item \(?: @dots{} \)
601 @cindex shy group, in regexp
602 specifies a ``shy'' group that does not record the matched substring;
603 you can't refer back to it with @samp{\@var{d}}. This is useful
604 in mechanically combining regular expressions, so that you
605 can add groups for syntactic purposes without interfering with
606 the numbering of the groups that were written by the user.
607
608 @item \@var{d}
609 matches the same text that matched the @var{d}th occurrence of a
610 @samp{\( @dots{} \)} construct.
611
612 After the end of a @samp{\( @dots{} \)} construct, the matcher remembers
613 the beginning and end of the text matched by that construct. Then,
614 later on in the regular expression, you can use @samp{\} followed by the
615 digit @var{d} to mean ``match the same text matched the @var{d}th time
616 by the @samp{\( @dots{} \)} construct.''
617
618 The strings matching the first nine @samp{\( @dots{} \)} constructs
619 appearing in a regular expression are assigned numbers 1 through 9 in
620 the order that the open-parentheses appear in the regular expression.
621 So you can use @samp{\1} through @samp{\9} to refer to the text matched
622 by the corresponding @samp{\( @dots{} \)} constructs.
623
624 For example, @samp{\(.*\)\1} matches any newline-free string that is
625 composed of two identical halves. The @samp{\(.*\)} matches the first
626 half, which may be anything, but the @samp{\1} that follows must match
627 exactly the same text.
628
629 If a particular @samp{\( @dots{} \)} construct matches more than once
630 (which can easily happen if it is followed by @samp{*}), only the last
631 match is recorded.
632
633 @item \`
634 matches the empty string, but only at the beginning
635 of the buffer or string being matched against.
636
637 @item \'
638 matches the empty string, but only at the end of
639 the buffer or string being matched against.
640
641 @item \=
642 matches the empty string, but only at point.
643
644 @item \b
645 matches the empty string, but only at the beginning or
646 end of a word. Thus, @samp{\bfoo\b} matches any occurrence of
647 @samp{foo} as a separate word. @samp{\bballs?\b} matches
648 @samp{ball} or @samp{balls} as a separate word.@refill
649
650 @samp{\b} matches at the beginning or end of the buffer
651 regardless of what text appears next to it.
652
653 @item \B
654 matches the empty string, but @emph{not} at the beginning or
655 end of a word.
656
657 @item \<
658 matches the empty string, but only at the beginning of a word.
659 @samp{\<} matches at the beginning of the buffer only if a
660 word-constituent character follows.
661
662 @item \>
663 matches the empty string, but only at the end of a word. @samp{\>}
664 matches at the end of the buffer only if the contents end with a
665 word-constituent character.
666
667 @item \w
668 matches any word-constituent character. The syntax table
669 determines which characters these are. @xref{Syntax}.
670
671 @item \W
672 matches any character that is not a word-constituent.
673
674 @item \s@var{c}
675 matches any character whose syntax is @var{c}. Here @var{c} is a
676 character that designates a particular syntax class: thus, @samp{w}
677 for word constituent, @samp{-} or @samp{ } for whitespace, @samp{.}
678 for ordinary punctuation, etc. @xref{Syntax}.
679
680 @item \S@var{c}
681 matches any character whose syntax is not @var{c}.
682
683 @cindex categories of characters
684 @cindex characters which belong to a specific language
685 @findex describe-categories
686 @item \c@var{c}
687 matches any character that belongs to the category @var{c}. For
688 example, @samp{\cc} matches Chinese characters, @samp{\cg} matches
689 Greek characters, etc. For the description of the known categories,
690 type @kbd{M-x describe-categories @key{RET}}.
691
692 @item \C@var{c}
693 matches any character that does @emph{not} belong to category
694 @var{c}.
695 @end table
696
697 The constructs that pertain to words and syntax are controlled by the
698 setting of the syntax table (@pxref{Syntax}).
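
As a rough illustration of a few constructs from the table above, here
is a small Emacs Lisp sketch using @code{string-match} and
@code{match-string}. The sample strings are invented for this example,
the backslashes are doubled because the regexps are written as Lisp
string constants, and the word-boundary results assume the standard
syntax table.

@example
;; Back reference: \1 must repeat the text matched by group 1.
(string-match "\\(.*\\)\\1" "abcabc")       ; returns 0
(match-string 1 "abcabc")                   ; returns "abc"

;; The shy group is not numbered, so \1 refers to the `b' group.
(string-match "\\(?:a\\)\\(b\\)\\1" "abb")  ; returns 0

;; \b limits the match to `foo' as a separate word.
(string-match "\\bfoo\\b" "a foo b")        ; returns 2
(string-match "\\bfoo\\b" "foobar")         ; returns nil
@end example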
699
700 Here is a complicated regexp, stored in @code{sentence-end} and used
701 by Emacs to recognize the end of a sentence together with any
702 whitespace that follows. We show its Lisp syntax to distinguish the
703 spaces from the tab characters. In Lisp syntax, the string constant
704 begins and ends with a double-quote. @samp{\"} stands for a
705 double-quote as part of the regexp, @samp{\\} for a backslash as part
706 of the regexp, @samp{\t} for a tab, and @samp{\n} for a newline.
707
708 @example
709 "[.?!][]\"')]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
710 @end example
711
712 @noindent
713 This contains four parts in succession: a character set matching
714 period, @samp{?}, or @samp{!}; a character set matching
715 close-brackets, quotes, or parentheses, repeated zero or more times; a
716 set of alternatives within backslash-parentheses that matches either
717 end-of-line, a space at the end of a line, a tab, or two spaces; and a
718 character set matching whitespace characters, repeated any number of
719 times.
720
721 To enter the same regexp interactively, you would type @key{TAB} to
722 enter a tab, and @kbd{C-j} to enter a newline. You would also type
723 single backslashes as themselves, instead of doubling them for Lisp syntax.
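
To try this regexp from Lisp, a minimal sketch might look like the
following. The sample text is invented, the regexp is bound to a local
variable rather than read from @code{sentence-end}, and the last
alternative is written with two spaces, as the description above
states.

@example
(let ((end-re "[.?!][]\"')]*\\($\\| $\\|\t\\|  \\)[ \t\n]*")
      (text "He left.  She stayed."))
  (string-match end-re text)   ; matches the period and both spaces
  (match-end 0))               ; returns 10, the start of "She"
@end example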
724
725 @ignore
726 @c I commented this out because it is missing vital information
727 @c and therefore useless. For instance, what do you do to *use* the
728 @c regular expression when it is finished? What jobs is this good for?
729 @c -- rms
730
731 @findex re-builder
732 @cindex authoring regular expressions
733 For convenient interactive development of regular expressions, you
734 can use the @kbd{M-x re-builder} command. It provides a convenient
735 interface for creating regular expressions, by giving immediate visual
736 feedback. The buffer from which @code{re-builder} was invoked becomes
737 the target for the regexp editor, which pops in a separate window. At
738 all times, all the matches in the target buffer for the current
739 regular expression are highlighted. Each parenthesized sub-expression
740 of the regexp is shown in a distinct face, which makes it easier to
741 verify even very complex regexps. (On displays that don't support
742 colors, Emacs blinks the cursor around the matched text, as it does
743 for matching parens.)
744 @end ignore
745
746 @node Search Case, Replace, Regexps, Search
747 @section Searching and Case
748
749 Incremental searches in Emacs normally ignore the case of the text
750 they are searching through, if you specify the text in lower case.
751 Thus, if you specify searching for @samp{foo}, then @samp{Foo} and
752 @samp{foo} are also considered matches. Regexps, and in particular
753 character sets, are included: @samp{[ab]} would match @samp{a} or
754 @samp{A} or @samp{b} or @samp{B}.@refill
755
756 An upper-case letter anywhere in the incremental search string makes
757 the search case-sensitive. Thus, searching for @samp{Foo} does not find
758 @samp{foo} or @samp{FOO}. This applies to regular expression search as
759 well as to string search. The effect ceases if you delete the
760 upper-case letter from the search string.
761
762 Typing @kbd{M-c} within an incremental search toggles the case
763 sensitivity of that search. The effect does not extend beyond the
764 current incremental search to the next one, but it does override the
765 effect of including an upper-case letter in the current search.
766
767 @vindex case-fold-search
768 If you set the variable @code{case-fold-search} to @code{nil}, then
769 all letters must match exactly, including case. This is a per-buffer
770 variable; altering the variable affects only the current buffer, but
771 there is a default value which you can change as well. @xref{Locals}.
772 This variable applies to nonincremental searches also, including those
773 performed by the replace commands (@pxref{Replace}) and the minibuffer
774 history matching commands (@pxref{Minibuffer History}).
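
From Lisp, the effect of @code{case-fold-search} can be sketched as
follows. One point to keep in mind: the rule that an upper-case letter
makes a search case-sensitive belongs to the interactive search
commands; a function such as @code{string-match} looks only at the
variable. The strings here are invented for the example.

@example
(let ((case-fold-search t))
  (string-match "foo" "Foo"))      ; returns 0
(let ((case-fold-search nil))
  (string-match "foo" "Foo"))      ; returns nil

;; Change the default value for all buffers, for example in ~/.emacs:
(setq-default case-fold-search nil)
@end example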
775
776 @node Replace, Other Repeating Search, Search Case, Search
777 @section Replacement Commands
778 @cindex replacement
779 @cindex search-and-replace commands
780 @cindex string substitution
781 @cindex global substitution
782
783 Global search-and-replace operations are not needed as often in Emacs
784 as they are in other editors@footnote{In some editors,
785 search-and-replace operations are the only convenient way to make a
786 single change in the text.}, but they are available. In addition to the
787 simple @kbd{M-x replace-string} command which is like that found in most
788 editors, there is a @kbd{M-x query-replace} command which asks you, for
789 each occurrence of the pattern, whether to replace it.
790
791 The replace commands normally operate on the text from point to the
792 end of the buffer; however, in Transient Mark mode, when the mark is
793 active, they operate on the region. The replace commands all replace
794 one string (or regexp) with one replacement string. It is possible to
795 perform several replacements in parallel using the command
796 @code{expand-region-abbrevs} (@pxref{Expanding Abbrevs}).
797
798 @menu
799 * Unconditional Replace:: Replacing all matches for a string.
800 * Regexp Replace:: Replacing all matches for a regexp.
801 * Replacement and Case:: How replacements preserve case of letters.
802 * Query Replace:: How to use querying.
803 @end menu
804
805 @node Unconditional Replace, Regexp Replace, Replace, Replace
806 @subsection Unconditional Replacement
807 @findex replace-string
808 @findex replace-regexp
809
810 @table @kbd
811 @item M-x replace-string @key{RET} @var{string} @key{RET} @var{newstring} @key{RET}
812 Replace every occurrence of @var{string} with @var{newstring}.
813 @item M-x replace-regexp @key{RET} @var{regexp} @key{RET} @var{newstring} @key{RET}
814 Replace every match for @var{regexp} with @var{newstring}.
815 @end table
816
817 To replace every instance of @samp{foo} after point with @samp{bar},
818 use the command @kbd{M-x replace-string} with the two arguments
819 @samp{foo} and @samp{bar}. Replacement happens only in the text after
820 point, so if you want to cover the whole buffer you must go to the
821 beginning first. All occurrences up to the end of the buffer are
822 replaced; to limit replacement to part of the buffer, narrow to that
823 part of the buffer before doing the replacement (@pxref{Narrowing}).
824 In Transient Mark mode, when the region is active, replacement is
825 limited to the region (@pxref{Transient Mark}).
826
827 When @code{replace-string} exits, it leaves point at the last
828 occurrence replaced. It sets the mark to the prior position of point
829 (where the @code{replace-string} command was issued); use @kbd{C-u
830 C-@key{SPC}} to move back there.
831
832 A numeric argument restricts replacement to matches that are surrounded
833 by word boundaries. The argument's value doesn't matter.
834
835 @node Regexp Replace, Replacement and Case, Unconditional Replace, Replace
836 @subsection Regexp Replacement
837
838 The @kbd{M-x replace-string} command replaces exact matches for a
839 single string. The similar command @kbd{M-x replace-regexp} replaces
840 any match for a specified pattern.
841
842 In @code{replace-regexp}, the @var{newstring} need not be constant: it
843 can refer to all or part of what is matched by the @var{regexp}.
844 @samp{\&} in @var{newstring} stands for the entire match being replaced.
845 @samp{\@var{d}} in @var{newstring}, where @var{d} is a digit, stands for
846 whatever matched the @var{d}th parenthesized grouping in @var{regexp}.
847 To include a @samp{\} in the text to replace with, you must enter
848 @samp{\\}. For example,
849
850 @example
851 M-x replace-regexp @key{RET} c[ad]+r @key{RET} \&-safe @key{RET}
852 @end example
853
854 @noindent
855 replaces (for example) @samp{cadr} with @samp{cadr-safe} and @samp{cddr}
856 with @samp{cddr-safe}.
857
858 @example
859 M-x replace-regexp @key{RET} \(c[ad]+r\)-safe @key{RET} \1 @key{RET}
860 @end example
861
862 @noindent
863 performs the inverse transformation.
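
The same two substitutions can be tried from Lisp. This sketch assumes
the function @code{replace-regexp-in-string} is available in your
Emacs; the input strings are made up, and the backslashes are doubled
for Lisp syntax.

@example
(replace-regexp-in-string "c[ad]+r" "\\&-safe" "(cadr x) (cddr y)")
     ; returns "(cadr-safe x) (cddr-safe y)"
(replace-regexp-in-string "\\(c[ad]+r\\)-safe" "\\1" "(cadr-safe x)")
     ; returns "(cadr x)"
@end example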
864
865 @node Replacement and Case, Query Replace, Regexp Replace, Replace
866 @subsection Replace Commands and Case
867
868 If the first argument of a replace command is all lower case, the
869 command ignores case while searching for occurrences to
870 replace---provided @code{case-fold-search} is non-@code{nil}. If
871 @code{case-fold-search} is set to @code{nil}, case is always significant
872 in all searches.
873
874 @vindex case-replace
875 In addition, when the @var{newstring} argument is all or partly lower
876 case, replacement commands try to preserve the case pattern of each
877 occurrence. Thus, the command
878
879 @example
880 M-x replace-string @key{RET} foo @key{RET} bar @key{RET}
881 @end example
882
883 @noindent
884 replaces a lower case @samp{foo} with a lower case @samp{bar}, an
885 all-caps @samp{FOO} with @samp{BAR}, and a capitalized @samp{Foo} with
886 @samp{Bar}. (These three alternatives---lower case, all caps, and
887 capitalized---are the only ones that @code{replace-string} can
888 distinguish.)
889
890 If upper-case letters are used in the replacement string, they remain
891 upper case every time that text is inserted. If upper-case letters are
892 used in the first argument, the second argument is always substituted
893 exactly as given, with no case conversion. Likewise, if either
894 @code{case-replace} or @code{case-fold-search} is set to @code{nil},
895 replacement is done without case conversion.
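
The three case patterns can also be observed from Lisp. This is only a
sketch: it uses @code{replace-regexp-in-string} rather than the
interactive commands themselves, so @code{case-replace} plays no part;
instead, a non-@code{nil} optional fourth argument turns case
conversion off. The input string is invented.

@example
(let ((case-fold-search t))
  (replace-regexp-in-string "foo" "bar" "foo Foo FOO"))
     ; returns "bar Bar BAR"
(let ((case-fold-search t))
  (replace-regexp-in-string "foo" "bar" "foo Foo FOO" t))
     ; returns "bar bar bar"
@end example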
896
897 @node Query Replace,, Replacement and Case, Replace
898 @subsection Query Replace
899 @cindex query replace
900
901 @table @kbd
902 @item M-% @var{string} @key{RET} @var{newstring} @key{RET}
903 @itemx M-x query-replace @key{RET} @var{string} @key{RET} @var{newstring} @key{RET}
904 Replace some occurrences of @var{string} with @var{newstring}.
905 @item C-M-% @var{regexp} @key{RET} @var{newstring} @key{RET}
906 @itemx M-x query-replace-regexp @key{RET} @var{regexp} @key{RET} @var{newstring} @key{RET}
907 Replace some matches for @var{regexp} with @var{newstring}.
908 @end table
909
910 @kindex M-%
911 @findex query-replace
912 If you want to change only some of the occurrences of @samp{foo} to
913 @samp{bar}, not all of them, then you cannot use an ordinary
914 @code{replace-string}. Instead, use @kbd{M-%} (@code{query-replace}).
915 This command finds occurrences of @samp{foo} one by one, displays each
916 occurrence and asks you whether to replace it. Aside from querying,
917 @code{query-replace} works just like @code{replace-string}. It
918 preserves case, like @code{replace-string}, provided
919 @code{case-replace} is non-@code{nil}, as it normally is. A numeric
920 argument means consider only occurrences that are bounded by
921 word-delimiter characters.
922
923 @kindex C-M-%
924 @findex query-replace-regexp
925 @kbd{C-M-%} performs regexp search and replace (@code{query-replace-regexp}).
926
927 The characters you can type when you are shown a match for the string
928 or regexp are:
929
930 @ignore @c Not worth it.
931 @kindex SPC @r{(query-replace)}
932 @kindex DEL @r{(query-replace)}
933 @kindex , @r{(query-replace)}
934 @kindex RET @r{(query-replace)}
935 @kindex . @r{(query-replace)}
936 @kindex ! @r{(query-replace)}
937 @kindex ^ @r{(query-replace)}
938 @kindex C-r @r{(query-replace)}
939 @kindex C-w @r{(query-replace)}
940 @kindex C-l @r{(query-replace)}
941 @end ignore
942
943 @c WideCommands
944 @table @kbd
945 @item @key{SPC}
946 to replace the occurrence with @var{newstring}.
947
948 @item @key{DEL}
949 to skip to the next occurrence without replacing this one.
950
951 @item , @r{(Comma)}
952 to replace this occurrence and display the result. You are then asked
953 for another input character to say what to do next. Since the
954 replacement has already been made, @key{DEL} and @key{SPC} are
955 equivalent in this situation; both move to the next occurrence.
956
957 You can type @kbd{C-r} at this point (see below) to alter the replaced
958 text. You can also type @kbd{C-x u} to undo the replacement; this exits
959 the @code{query-replace}, so if you want to do further replacement you
960 must use @kbd{C-x @key{ESC} @key{ESC} @key{RET}} to restart
961 (@pxref{Repetition}).
962
963 @item @key{RET}
964 to exit without doing any more replacements.
965
966 @item .@: @r{(Period)}
967 to replace this occurrence and then exit without searching for more
968 occurrences.
969
970 @item !
971 to replace all remaining occurrences without asking again.
972
973 @item ^
974 to go back to the position of the previous occurrence (or what used to
975 be an occurrence), in case you changed it by mistake. This works by
976 popping the mark ring. Only one @kbd{^} in a row is meaningful, because
977 only one previous replacement position is kept during @code{query-replace}.
978
979 @item C-r
980 to enter a recursive editing level, in case the occurrence needs to be
981 edited rather than just replaced with @var{newstring}. When you are
982 done, exit the recursive editing level with @kbd{C-M-c} to proceed to
983 the next occurrence. @xref{Recursive Edit}.
984
985 @item C-w
986 to delete the occurrence, and then enter a recursive editing level as in
987 @kbd{C-r}. Use the recursive edit to insert text to replace the deleted
988 occurrence of @var{string}. When done, exit the recursive editing level
989 with @kbd{C-M-c} to proceed to the next occurrence.
990
991 @item e
992 to edit the replacement string in the minibuffer. When you exit the
993 minibuffer by typing @key{RET}, the minibuffer contents replace the
994 current occurrence of the pattern. They also become the new
995 replacement string for any further occurrences.
996
997 @item C-l
998 to redisplay the screen. Then you must type another character to
999 specify what to do with this occurrence.
1000
1001 @item C-h
1002 to display a message summarizing these options. Then you must type
1003 another character to specify what to do with this occurrence.
1004 @end table
1005
1006 Some other characters are aliases for the ones listed above: @kbd{y},
1007 @kbd{n} and @kbd{q} are equivalent to @key{SPC}, @key{DEL} and
1008 @key{RET}.
1009
1010 Aside from this, any other character exits the @code{query-replace},
1011 and is then reread as part of a key sequence. Thus, if you type
1012 @kbd{C-k}, it exits the @code{query-replace} and then kills to end of
1013 line.
1014
1015 To restart a @code{query-replace} once it is exited, use @kbd{C-x
1016 @key{ESC} @key{ESC}}, which repeats the @code{query-replace} because it
1017 used the minibuffer to read its arguments. @xref{Repetition, C-x ESC
1018 ESC}.
1019
1020 See also @ref{Transforming File Names}, for Dired commands to rename,
1021 copy, or link files by replacing regexp matches in file names.
1022
1023 @node Other Repeating Search,, Replace, Search
1024 @section Other Search-and-Loop Commands
1025
1026 Here are some other commands that find matches for a regular
1027 expression. They all ignore case in matching, if the pattern contains
1028 no upper-case letters and @code{case-fold-search} is non-@code{nil}.
1029 Aside from @code{occur}, all operate on the text from point to the end
1030 of the buffer, or on the active region in Transient Mark mode.
1031
1032 @findex list-matching-lines
1033 @findex occur
1034 @findex how-many
1035 @findex delete-non-matching-lines
1036 @findex delete-matching-lines
1037 @findex flush-lines
1038 @findex keep-lines
1039
1040 @table @kbd
1041 @item M-x occur @key{RET} @var{regexp} @key{RET}
1042 Display a list showing each line in the buffer that contains a match
1043 for @var{regexp}. To limit the search to part of the buffer, narrow
1044 to that part (@pxref{Narrowing}). A numeric argument @var{n}
1045 specifies that @var{n} lines of context are to be displayed before and
1046 after each matching line.
1047
1048 @kindex RET @r{(Occur mode)}
1049 The buffer @samp{*Occur*} containing the output serves as a menu for
1050 finding the occurrences in their original context. Click @kbd{Mouse-2}
1051 on an occurrence listed in @samp{*Occur*}, or position point there and
1052 type @key{RET}; this switches to the buffer that was searched and
1053 moves point to the original of the chosen occurrence.
1054
1055 @item M-x list-matching-lines
1056 Synonym for @kbd{M-x occur}.
1057
1058 @item M-x how-many @key{RET} @var{regexp} @key{RET}
1059 Print the number of matches for @var{regexp} that exist in the buffer
1060 after point. In Transient Mark mode, if the region is active, the
1061 command operates on the region instead.
1062
1063 @item M-x flush-lines @key{RET} @var{regexp} @key{RET}
1064 Delete each line that contains a match for @var{regexp}, operating on
1065 the text after point. In Transient Mark mode, if the region is
1066 active, the command operates on the region instead.
1067
1068 @item M-x keep-lines @key{RET} @var{regexp} @key{RET}
1069 Delete each line that @emph{does not} contain a match for
1070 @var{regexp}, operating on the text after point. In Transient Mark
1071 mode, if the region is active, the command operates on the region
1072 instead.
1073 @end table
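
These commands can also be called from Lisp. The sketch below uses a
temporary buffer with invented contents to show @code{flush-lines} and
@code{keep-lines} operating on the text after point.

@example
(with-temp-buffer
  (insert "# note\nkeep one\n# other\nkeep two\n")
  (goto-char (point-min))
  (flush-lines "^#")          ; delete the lines starting with #
  (buffer-string))            ; returns "keep one\nkeep two\n"

(with-temp-buffer
  (insert "# note\nkeep one\n# other\nkeep two\n")
  (goto-char (point-min))
  (keep-lines "^#")           ; delete every other line
  (buffer-string))            ; returns "# note\n# other\n"
@end example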
1074
1075 You can also search multiple files under control of a tags table
1076 (@pxref{Tags Search}) or through the Dired @kbd{A} command
1077 (@pxref{Operating on Files}), or ask the @code{grep} program to do it
1078 (@pxref{Grep Searching}).