@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2004, 2005
+@c Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../info/syntax
@node Syntax Tables, Abbrevs, Searching and Matching, Top
@end deffn
@deffn {Syntax class} @w{word constituent}
-@dfn{Word constituents} (designated by @samp{w}) are parts of normal
-English words and are typically used in variable and command names in
-programs. All upper- and lower-case letters, and the digits, are typically
-word constituents.
+@dfn{Word constituents} (designated by @samp{w}) are parts of words in
+human languages, and are typically used in variable and command names
+in programs. All upper- and lower-case letters, and the digits, are
+typically word constituents.
@end deffn
@deffn {Syntax class} @w{symbol constituent}
@dfn{Punctuation characters} (designated by @samp{.}) are those
characters that are used as punctuation in English, or are used in some
way in a programming language to separate symbols from one another.
-Most programming language modes, including Emacs Lisp mode, have no
+Some programming language modes, such as Emacs Lisp mode, have no
characters in this class since the few characters that are not symbol or
-word constituents all have other uses.
+word constituents all have other uses. Other programming language modes,
+such as C mode, use punctuation syntax for operators.
@end deffn
@deffn {Syntax class} @w{open parenthesis character}
@end deffn
@deffn {Syntax class} @w{generic comment delimiter}
-A @dfn{generic comment delimiter} character starts or ends a special
-kind of comment. @emph{Any} generic comment delimiter matches
-@emph{any} generic comment delimiter, but they cannot match a comment
-starter or comment ender; generic comment delimiters can only match each
-other.
+A @dfn{generic comment delimiter} (designated by @samp{!}) starts
+or ends a special kind of comment. @emph{Any} generic comment delimiter
+matches @emph{any} generic comment delimiter, but they cannot match
+a comment starter or comment ender; generic comment delimiters can only
+match each other.
This syntax class is primarily meant for use with the
@code{syntax-table} text property (@pxref{Syntax Properties}). You can
@end deffn
@deffn {Syntax class} @w{generic string delimiter}
-A @dfn{generic string delimiter} character starts or ends a string.
-This class differs from the string quote class in that @emph{any}
-generic string delimiter can match any other generic string delimiter;
-but they do not match ordinary string quote characters.
+A @dfn{generic string delimiter} (designated by @samp{|}) starts or ends
+a string. This class differs from the string quote class in that @emph{any}
+generic string delimiter can match any other generic string delimiter; but
+they do not match ordinary string quote characters.
This syntax class is primarily meant for use with the
@code{syntax-table} text property (@pxref{Syntax Properties}). You can
In this section we describe functions for creating, accessing and
altering syntax tables.
-@defun make-syntax-table
-This function creates a new syntax table. It inherits the syntax for
-letters and control characters from the standard syntax table. For
-other characters, the syntax is copied from the standard syntax table.
+@defun make-syntax-table &optional table
+This function creates a new syntax table, with all values initialized
+to @code{nil}. If @var{table} is non-@code{nil}, it becomes the
+parent of the new syntax table, otherwise the standard syntax table is
+the parent. Like all char-tables, a syntax table inherits from its
+parent. Thus the original syntax of all characters in the returned
+syntax table is determined by the parent. @xref{Char-Tables}.
Most major mode syntax tables are created in this way.
@end defun
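+
+The following sketch illustrates this inheritance; the choice of
+@samp{_} is arbitrary and serves only as an illustration. The new
+table overrides one entry and takes every other entry from its parent,
+here the standard syntax table:
+
+@example
+@group
+(let ((table (make-syntax-table)))
+  ;; @r{Override one entry; all others come from the parent.}
+  (modify-syntax-entry ?_ "w" table)
+  (with-syntax-table table
+    (list (string (char-syntax ?_))
+          (string (char-syntax ?\()))))
+     @result{} ("w" "(")
+@end group
+@end example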
@defun copy-syntax-table &optional table
This function constructs a copy of @var{table} and returns it. If
@var{table} is not supplied (or is @code{nil}), it returns a copy of the
-current syntax table. Otherwise, an error is signaled if @var{table} is
+standard syntax table. Otherwise, an error is signaled if @var{table} is
not a syntax table.
@end defun
the table for this character is discarded.
An error is signaled if the first character of the syntax descriptor is not
-one of the twelve syntax class designator characters. An error is also
+one of the seventeen syntax class designator characters. An error is also
signaled if @var{char} is not a character.
@example
@exdent @r{Examples:}
;; @r{Put the space character in class whitespace.}
-(modify-syntax-entry ?\ " ")
+(modify-syntax-entry ?\s " ")
@result{} nil
@end group
@example
@group
-(string (char-syntax ?\ ))
+(string (char-syntax ?\s))
@result{} " "
@end group
@section Syntax Properties
@kindex syntax-table @r{(text property)}
-When the syntax table is not flexible enough to specify the syntax of a
-language, you can use @code{syntax-table} text properties to override
-the syntax table for specific character occurrences in the buffer.
-@xref{Text Properties}.
+When the syntax table is not flexible enough to specify the syntax of
+a language, you can use @code{syntax-table} text properties to
+override the syntax table for specific character occurrences in the
+buffer. @xref{Text Properties}. You can use Font Lock mode to set
+@code{syntax-table} text properties. @xref{Setting Syntax
+Properties}.
The valid values of @code{syntax-table} text property are:
@item @code{(@var{syntax-code} . @var{matching-char})}
A cons cell of this format specifies the syntax for this
-occurrence of the character.
+occurrence of the character. (@pxref{Syntax Table Internals})
@item @code{nil}
If the property is @code{nil}, the character's syntax is determined from
have certain syntax classes.
@defun skip-syntax-forward syntaxes &optional limit
-This function moves point forward across characters having syntax classes
-mentioned in @var{syntaxes}. It stops when it encounters the end of
-the buffer, or position @var{limit} (if specified), or a character it is
-not supposed to skip.
+This function moves point forward across characters having syntax
+classes mentioned in @var{syntaxes} (a string of syntax code
+characters). It stops when it encounters the end of the buffer, or
+position @var{limit} (if specified), or a character it is not supposed
+to skip.
If @var{syntaxes} starts with @samp{^}, then the function skips
characters whose syntax is @emph{not} in @var{syntaxes}.
@section Parsing Balanced Expressions
Here are several functions for parsing and scanning balanced
-expressions, also known as @dfn{sexps}, in which parentheses match in
-pairs. The syntax table controls the interpretation of characters, so
-these functions can be used for Lisp expressions when in Lisp mode and
-for C expressions when in C mode. @xref{List Motion}, for convenient
+expressions, also known as @dfn{sexps}. Basically, a sexp is either a
+balanced parenthetical grouping, or a symbol name (a sequence of
+characters whose syntax is either word constituent or symbol
+constituent). However, characters whose syntax is expression prefix
+are treated as part of the sexp if they appear next to it.
+
+ The syntax table controls the interpretation of characters, so these
+functions can be used for Lisp expressions when in Lisp mode and for C
+expressions when in C mode. @xref{List Motion}, for convenient
higher-level functions for moving over balanced expressions.
+ A syntax table only describes how each character changes the state
+of the parser, rather than describing the state itself. For example,
+a string delimiter character toggles the parser state between
+``in-string'' and ``in-code'' but the characters inside the string do
+not have any particular syntax to identify them as such. Thus
+(note that 15 is the syntax code for generic string delimiters),
+
+@example
+(put-text-property 1 9 'syntax-table '(15 . nil))
+@end example
+
+@noindent
+does not tell Emacs that the first eight characters of the current buffer
+are a string, but rather that they are all string delimiters. As a
+result, Emacs treats them as four consecutive empty string constants.
+
+ Every time you use the parser, you give it a starting state as
+well as a starting position. If you omit the starting state, the
+default is ``top level in parenthesis structure,'' as it would be at
+the beginning of a function definition. (This is the case for
+@code{forward-sexp}, which blindly assumes that the starting point is
+in such a state.)
+
@defun parse-partial-sexp start limit &optional target-depth stop-before state stop-comment
This function parses a sexp in the current buffer starting at
@var{start}, not scanning past @var{limit}. It stops at position
string, or the end of a comment or a string, whichever comes first.
@cindex parse state
-The fifth argument @var{state} is a nine-element list of the same form
+The fifth argument @var{state} is a ten-element list of the same form
as the value of this function, described below. (It is OK to omit the
-last element of the nine.) The return value of one call may be used to
-initialize the state of the parse on another call to
+last two elements of this list.) The return value of one call may be
+used to initialize the state of the parse on another call to
@code{parse-partial-sexp}.
-The result is a list of nine elements describing the final state of
+The result is a list of ten elements describing the final state of
the parse:
@enumerate 0
-@item
+@item
The depth in parentheses, counting from 0.
-@item
+@item
@cindex innermost containing parentheses
The character position of the start of the innermost parenthetical
grouping containing the stopping point; @code{nil} if none.
-@item
+@item
@cindex previous complete subexpression
The character position of the start of the last complete subexpression
terminated; @code{nil} if none.
-@item
+@item
@cindex inside string
Non-@code{nil} if inside a string. More precisely, this is the
character that will terminate the string, or @code{t} if a generic
string delimiter character should terminate it.
-@item
+@item
@cindex inside comment
@code{t} if inside a comment (of either style),
or the comment nesting level if inside a kind of comment
that can be nested.
-@item
+@item
@cindex quote character
@code{t} if point is just after a quote character.
-@item
+@item
The minimum parenthesis depth encountered during this scan.
@item
-What kind of comment is active: @code{nil} for a comment of style ``a'',
-@code{t} for a comment of style ``b'', and @code{syntax-table} for
-a comment that should be ended by a generic comment delimiter character.
+What kind of comment is active: @code{nil} for a comment of style
+``a'' or when not inside a comment, @code{t} for a comment of style
+``b'', and @code{syntax-table} for a comment that should be ended by a
+generic comment delimiter character.
@item
The string or comment start position. While inside a comment, this is
the position where the comment began; while inside a string, this is the
position where the string began. When outside of strings and comments,
this element is @code{nil}.
+
+@item
+Internal data for continuing the parsing. The meaning of this
+data is subject to change; it is used if you pass this list
+as the @var{state} argument to another call.
+
@end enumerate
-Elements 0, 3, 4, 5 and 7 are significant in the argument @var{state}.
+Elements 0, 3, 4, 5, 7 and 9 are significant in the argument
+@var{state}.
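+
+As a small sketch of how these elements can be read (assuming a
+temporary buffer, which uses the standard syntax table), the following
+parse stops inside an unterminated string nested two parentheses deep:
+
+@example
+@group
+(with-temp-buffer
+  (insert "(a (b \"c")
+  (let ((state (parse-partial-sexp (point-min) (point-max))))
+    ;; @r{Depth, innermost open paren, string terminator, string start.}
+    (list (nth 0 state) (nth 1 state)
+          (nth 3 state) (nth 8 state))))
+     @result{} (2 4 34 7)
+@end group
+@end example
+
+@noindent
+Here 34 is the character code of @samp{"}, the character that would
+terminate the string.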
@cindex indenting with parentheses
This function is most often used to compute indentation for languages
before count is used up, @code{nil} is returned.
@end defun
-@defvar parse-sexp-ignore-comments
+@defvar multibyte-syntax-as-symbol
+@tindex multibyte-syntax-as-symbol
+If this variable is non-@code{nil}, @code{scan-sexps} treats all
+non-@acronym{ASCII} characters as symbol constituents regardless
+of what the syntax table says about them. (However, text properties
+can still override the syntax.)
+@end defvar
+
+@defopt parse-sexp-ignore-comments
@cindex skipping comments
If the value is non-@code{nil}, then comments are treated as
whitespace by the functions in this section and by @code{forward-sexp}.
+@end defopt
-In older Emacs versions, this feature worked only when the comment
-terminator is something like @samp{*/}, and appears only to end a
-comment. In languages where newlines terminate comments, it was
-necessary make this variable @code{nil}, since not every newline is the
-end of a comment. This limitation no longer exists.
-@end defvar
+@vindex parse-sexp-lookup-properties
+The behavior of @code{parse-partial-sexp} is also affected by
+@code{parse-sexp-lookup-properties} (@pxref{Syntax Properties}).
You can use @code{forward-comment} to move forward or backward over
one comment or several comments.
@defun forward-comment count
-This function moves point forward across @var{count} comments (backward,
-if @var{count} is negative). If it finds anything other than a comment
-or whitespace, it stops, leaving point at the place where it stopped.
-It also stops after satisfying @var{count}.
+This function moves point forward across @var{count} complete comments
+(that is, including the starting delimiter and the terminating
+delimiter if any), plus any whitespace encountered on the way. It
+moves backward if @var{count} is negative. If it encounters anything
+other than a comment or whitespace, it stops, leaving point at the
+place where it stopped. This includes (for instance) finding the end
+of a comment when moving forward and expecting the beginning of one.
+The function also stops immediately after moving over the specified
+number of complete comments. If @var{count} comments are found as
+expected, with nothing except whitespace between them, it returns
+@code{t}; otherwise it returns @code{nil}.
+
+This function cannot tell whether the ``comments'' it traverses are
+embedded within a string. If they look like comments, it treats them
+as comments.
@end defun
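+
+As a rough sketch of the return value and of where point ends up
+(assuming the Emacs Lisp mode syntax table, in which @samp{;} starts a
+comment and a newline ends it):
+
+@example
+@group
+(with-temp-buffer
+  (insert ";; one\n;; two\n(foo)")
+  (goto-char (point-min))
+  (with-syntax-table emacs-lisp-mode-syntax-table
+    ;; @r{Skip both comments; point stops before the open parenthesis.}
+    (list (forward-comment 2) (point))))
+     @result{} (t 15)
+@end group
+@end example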
To move forward over all comments and whitespace following point, use
Lisp programs don't usually work with the elements directly; the
Lisp-level syntax table functions usually work with syntax descriptors
(@pxref{Syntax Descriptors}). Nonetheless, here we document the
-internal format.
+internal format. This format is used mostly when manipulating
+syntax properties.
Each element of a syntax table is a cons cell of the form
@code{(@var{syntax-code} . @var{matching-char})}. The @sc{car},
@tab
9 @ @ escape
@tab
-14 @ @ comment-fence
+14 @ @ generic comment
@item
@tab
-15 @ string-fence
+15 @ generic string
@end multitable
For example, the usual syntax value for @samp{(} is @code{(4 . 41)}.
@samp{3} @ @ @code{(lsh 1 18)}
@end multitable
+@defun string-to-syntax desc
+This function returns the internal form @code{(@var{syntax-code} .
+@var{matching-char})} corresponding to the syntax descriptor @var{desc}.
+@end defun
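+
+For instance, assuming the syntax codes listed in the table above, a
+couple of descriptors convert as follows (a brief sketch, not an
+exhaustive list):
+
+@example
+@group
+(string-to-syntax "()")
+     @result{} (4 . 41)
+(string-to-syntax "w")
+     @result{} (2)
+@end group
+@end example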
+
+@defun syntax-after pos
+This function returns the syntax code of the character in the buffer
+after position @var{pos}, taking account of syntax properties as well
+as the syntax table. If @var{pos} is outside the buffer's accessible
+portion (@pxref{Narrowing, accessible portion}), this function returns
+@code{nil}.
+@end defun
+
+@defun syntax-class syntax
+This function returns the syntax class of the syntax code
+@var{syntax}. (It masks off the high 16 bits that hold the flags
+encoded in the syntax descriptor.) If @var{syntax} is @code{nil}, it
+returns @code{nil}; this is so evaluating the expression
+
+@example
+(syntax-class (syntax-after pos))
+@end example
+
+@noindent
+where @code{pos} is outside the buffer's accessible portion, will
+yield @code{nil} without signaling an error or producing a wrong
+syntax class code.
+@end defun
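+
+As a minimal sketch, again assuming a temporary buffer that uses the
+standard syntax table, @code{syntax-after} and @code{syntax-class}
+combine as follows; position 10 lies outside the buffer, so the last
+element is @code{nil}:
+
+@example
+@group
+(with-temp-buffer
+  (insert "(x)")
+  (list (syntax-after 1)
+        (syntax-class (syntax-after 1))
+        (syntax-class (syntax-after 10))))
+     @result{} ((4 . 41) 4 nil)
+@end group
+@end example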
+
@node Categories
@section Categories
@cindex categories of characters
initialized by copying from the standard categories table, so that the
standard categories are available in all modes.
- Each category has a name, which is an @sc{ascii} printing character in
+ Each category has a name, which is an @acronym{ASCII} printing character in
the range @w{@samp{ }} to @samp{~}. You specify the name of a category
when you define it with @code{define-category}.
@code{t}, that means category @var{cat} is a member of the set, and that
character @var{c} belongs to category @var{cat}.
+For the next three functions, the optional argument @var{table}
+defaults to the current buffer's category table.
+
@defun define-category char docstring &optional table
This function defines a new category, with name @var{char} and
-documentation @var{docstring}.
-
-The new category is defined for category table @var{table}, which
-defaults to the current buffer's category table.
+documentation @var{docstring}, for the category table @var{table}.
@end defun
@defun category-docstring category &optional table
@end example
@end defun
-@defun get-unused-category table
+@defun get-unused-category &optional table
This function returns a category name (a character) which is not
currently defined in @var{table}. If all possible categories are in use
in @var{table}, it returns @code{nil}.
@defun copy-category-table &optional table
This function constructs a copy of @var{table} and returns it. If
@var{table} is not supplied (or is @code{nil}), it returns a copy of the
-current category table. Otherwise, an error is signaled if @var{table}
+standard category table. Otherwise, an error is signaled if @var{table}
is not a category table.
@end defun
@end defun
@defun char-category-set char
-This function returns the category set for character @var{char}. This
-is the bool-vector which records which categories the character
-@var{char} belongs to. The function @code{char-category-set} does not
-allocate storage, because it returns the same bool-vector that exists in
-the category table.
+This function returns the category set for character @var{char} in the
+current buffer's category table. This is the bool-vector which
+records which categories the character @var{char} belongs to. The
+function @code{char-category-set} does not allocate storage, because
+it returns the same bool-vector that exists in the category table.
@example
(char-category-set ?a)
But if @var{reset} is non-@code{nil}, then it deletes @var{category}
instead.
@end defun
+
+@deffn Command describe-categories &optional buffer-or-name
+This function describes the category specifications in the current
+category table. It inserts the descriptions in a buffer, and then
+displays that buffer. If @var{buffer-or-name} is non-@code{nil}, it
+describes the category table of that buffer instead.
+@end deffn
+
+@ignore
+ arch-tag: 4d914e96-0283-445c-9233-75d33662908c
+@end ignore