@c -*-texinfo-*-
@c This is part of the GNU Emacs Lisp Reference Manual.
-@c Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1998, 1999, 2000, 2001,
-@c 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
-@c Free Software Foundation, Inc.
+@c Copyright (C) 1990-1995, 1998-2011 Free Software Foundation, Inc.
@c See the file elisp.texi for copying conditions.
@setfilename ../../info/text
@node Text, Non-ASCII Characters, Markers, Top
position stored in a register.
* Base 64:: Conversion to or from base 64 encoding.
* MD5 Checksum:: Compute the MD5 "message digest"/"checksum".
+* Parsing HTML:: Parsing HTML and XML.
* Atomic Changes:: Installing several buffer changes "atomically".
* Change Hooks:: Supplying functions to be run when text is changed.
@end menu
@defvar interprogram-cut-function
This variable provides a way of communicating killed text to other
programs, when you are using a window system. Its value should be
-@code{nil} or a function of one required and one optional argument.
+@code{nil} or a function of one required argument.
If the value is a function, @code{kill-new} and @code{kill-append} call
-it with the new first element of the kill ring as the first argument.
-The second, optional, argument has the same meaning as the @var{push}
-argument to @code{x-set-cut-buffer} (@pxref{Definition of
-x-set-cut-buffer}) and only affects the second and later cut buffers.
+it with the new first element of the kill ring as the argument.
The normal use of this function is to set the window system's primary
-selection (and first cut buffer) from the newly killed text.
+selection from the newly killed text.
@xref{Window System Selections}.
@end defvar
@cindex hooks for changing a character
@kindex modification-hooks @r{(text property)}
If a character has the property @code{modification-hooks}, then its
-value should be a list of functions; modifying that character calls all
-of those functions. Each function receives two arguments: the beginning
-and end of the part of the buffer being modified. Note that if a
-particular modification hook function appears on several characters
-being modified by a single primitive, you can't predict how many times
-the function will be called.
+value should be a list of functions; modifying that character calls
+all of those functions before the actual modification. Each function
+receives two arguments: the beginning and end of the part of the
+buffer being modified. Note that if a particular modification hook
+function appears on several characters being modified by a single
+primitive, you can't predict how many times the function will
+be called.
+Furthermore, insertion will not modify any existing character, so this
+hook will only be run when removing some characters, replacing them
+with others, or changing their text-properties.
If these functions modify the buffer, they should bind
@code{inhibit-modification-hooks} to @code{t} around doing so, to
@code{point-left} functions are called first, followed by all the
@code{point-entered} functions.
-It is possible with @code{char-after} to examine characters at various
+It is possible to use @code{char-after} to examine characters at various
buffer positions without moving point to those positions. Only an
actual change in the value of point runs these hook functions.
+The variable @code{inhibit-point-motion-hooks} can inhibit running the
+@code{point-left} and @code{point-entered} hooks, see @ref{Inhibit
+point motion hooks}.
+
+@item composition
+@kindex composition @r{(text property)}
+This text property is used to display a sequence of characters as a
+single glyph composed from components. But the value of the property
+itself is completely internal to Emacs and should not be manipulated
+directly by, for instance, @code{put-text-property}.
+
+@end table
+
@defvar inhibit-point-motion-hooks
-When this variable is non-@code{nil}, @code{point-left} and
-@code{point-entered} hooks are not run, and the @code{intangible}
-property has no effect. Do not set this variable globally; bind it with
-@code{let}.
+@anchor{Inhibit point motion hooks} When this variable is
+non-@code{nil}, @code{point-left} and @code{point-entered} hooks are
+not run, and the @code{intangible} property has no effect. Do not set
+this variable globally; bind it with @code{let}.
@end defvar
@defvar show-help-function
Manual}) provides an example.
@end defvar
-@item composition
-@kindex composition @r{(text property)}
-This text property is used to display a sequence of characters as a
-single glyph composed from components. But the value of the property
-itself is completely internal to Emacs and should not be manipulated
-directly by, for instance, @code{put-text-property}.
-
-@end table
-
@node Format Properties
@subsection Formatted Text Properties
coding instead.
@end defun
+@node Parsing HTML
+@section Parsing HTML
+@cindex parsing html
+
+@defun libxml-parse-html-region start end &optional base-url
+This function provides HTML parsing via the @code{libxml2} library.
+It parses ``real world'' HTML and tries to return a sensible parse tree
+regardless.
+
+In addition to @var{start} and @var{end} (specifying the start and end
+of the region to act on), it takes an optional parameter,
+@var{base-url}, which is used to expand relative URLs in the document,
+if any.
+
+Here's an example demonstrating the structure of the parsed data you
+get out. Given this HTML document:
+
+@example
+<html><hEad></head><body width=101><div class=thing>Foo<div>Yes
+@end example
+
+You get this parse tree:
+
+@example
+(html
+ (head)
+ (body
+ (:width . "101")
+ (div
+ (:class . "thing")
+ (text . "Foo")
+ (div
+ (text . "Yes\n")))))
+@end example
+
+It's a simple tree structure, where the @code{car} for each node is
+the name of the node, and the @code{cdr} is the value, or the list of
+values.
+
+Attributes are coded the same way as child nodes, but with @samp{:} as
+the first character.
+@end defun
+
+@cindex parsing xml
+@defun libxml-parse-xml-region start end &optional base-url
+
+This is much the same as @code{libxml-parse-html-region} above, but
+operates on XML instead of HTML, and is correspondingly stricter about
+syntax.
+@end defun
+
@node Atomic Changes
@section Atomic Change Groups
@cindex atomic changes
- In data base terminology, an @dfn{atomic} change is an indivisible
+ In database terminology, an @dfn{atomic} change is an indivisible
change---it can succeed entirely or it can fail entirely, but it
cannot partly succeed. A Lisp program can make a series of changes to
one or several buffers as an @dfn{atomic change group}, meaning that