pod
xml
XML Parser and Document Modeling
classes
Class | Description |
---|---|
XAttr | XAttr models an XML attribute in an element. |
XDoc | XML document encapsulates the root element and document type. |
XDocType | XML document type declaration (but not the whole DTD). |
XElem | Models an XML element: its name, attributes, and children nodes. |
XNode | |
XNs | Models an XML namespace URI. |
XParser | XParser is a simple, lightweight XML parser. |
XPi | XML processing instruction node. |
XText | XText represents the character data inside an element. |
enums
errs
Class | Description |
---|---|
XErr | XML exception. |
XIncompleteErr | Incomplete document exception indicates that the end of stream was reached before the end of the document was parsed. |
docs
Overview
The xml pod provides the core APIs for working with XML:
XElem
: Provides a standard in-memory representation of an XML element tree. It is similar to the W3C DOM, but Fantom-centric.
XParser
: A non-validating XML parser. It may be used in two modes: to read an entire XML document into memory, or as a pull parser.
The features supported by XParser:
- All element, attribute, processing instruction, and character data productions are supported
- CDATA sections are supported
- Namespaces are supported at both the element and attribute level
- Doctype declarations provide access to the public and system identifiers
- DTDs are ignored (as such you can't use internal or external entity declarations)
- No access to comments is provided by the XML parser
- Character data consisting only of whitespace is always ignored
DOM
XML documents are modeled in memory using the following classes:
XDoc
: models the entire document, providing access to the root element, doctype, and processing instructions declared before the root element
XElem
: models an element name, attributes, and children nodes
XText
: models character data in an element
XPi
: models a processing instruction
XAttr
: models an attribute name/value pair
XNs
: models the prefix and URI of an XML namespace
XML documents are structured as a tree of nodes using the classes listed above. Typically the tree is built by parsing an XML document from an input stream, but you can also construct a document in memory using the APIs and it-blocks:

    doc := XDoc
    {
      XElem("root")
      {
        XElem("a") { addAttr("flags", "0"); XText("blah"), },
        XElem("b") { XElem("c"), },
        XElem("m")
        {
          XText("I "),
          XElem("i") { XText("mean"), },
          XText(" it!"),
        },
      },
    }
    doc.write(Env.cur.out)

    // would print the following to the console
    <?xml version='1.0' encoding='UTF-8'?>
    <root>
     <a flags='0'>blah</a>
     <b>
      <c/>
     </b>
     <m>I <i>mean</i> it!</m>
    </root>

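Once a tree exists, whether built by hand or parsed, it can be walked with the accessors on XElem. The sketch below is only an illustration: it assumes the accessors XElem.elem, XElem.get, and XElem.text (with XText.val) from the xml pod API reference, none of which are described in this overview, and the class name DomSketch and the sample document are made up for the example.

```fantom
using xml

// Hypothetical script class, not part of the xml pod
class DomSketch
{
  // Summarize the <a> child of the root element as one line of text
  static Str describe(Str xml)
  {
    doc := XParser(xml.in).parseDoc
    XElem a := doc.root.elem("a")   // child element by name (assumed accessor)
    flags := a.get("flags")         // attribute value by name (assumed accessor)
    text := a.text.val              // character data of the text child (assumed accessor)
    return "$a.name flags=$flags text=$text"
  }

  static Void main()
  {
    echo(describe("<root><a flags='0'>blah</a><b><c/></b></root>"))
  }
}
```

If the accessors behave as assumed, main prints the element name followed by its flags attribute and its text content.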
Namespaces
The XNs class is used to model XML namespaces. The default namespace is indicated with the empty prefix "". Both XElem and XAttr define a set of methods for working with qualified names:

    ns := XNs("x", `urn:foo`)
    e := XElem("name", ns)
    e.ns     => ns
    e.prefix => "x"
    e.uri    => `urn:foo`
    e.name   => "name"
    e.qname  => "x:name"

If you construct a document by hand, then you are responsible for creating each element with the correct XNs and ensuring that the appropriate namespace attributes are declared:

    nsDef := XNs("", `http://foo/default`)
    nsBar := XNs("b", `http://foo/bar`)
    root := XElem("root", nsBar)
    {
      XAttr.makeNs(nsDef),
      XAttr.makeNs(nsBar),
      XElem("elem", nsDef),
      XElem("elem", nsBar),
    }
    root.write(Env.cur.out)

    // would print the following to the console
    <b:root xmlns='http://foo/default' xmlns:b='http://foo/bar'>
     <elem/>
     <b:elem/>
    </b:root>

Writing
All the XML node classes support a write method which takes an OutStream. During debugging it is often convenient to write to standard out:

    node.write(Env.cur.out)
    Env.cur.out.flush

To write an XML document from memory to a file:

    out := `test.xml`.toFile.out
    doc.write(out)
    out.close

If you are generating an XML document on the fly you might want to use OutStream directly. It supports escaping XML control characters via the writeXml method:

    // escape markup in text
    out.writeChars("<text>").writeXml(text).writeChars("</text>")

    // escape markup and quotes
    out.writeChars("attr='").writeXml(attrVal, OutStream.xmlEscQuotes).writeChars("'")

You can use Str.toXml to generate a string with XML markup escaped. However, when streaming you should prefer OutStream, which is more efficient.
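To make the two escaping routes concrete, here is a minimal sketch that wraps the same text once through OutStream.writeXml and once through Str.toXml. The class EscapeSketch and its method names are hypothetical, and backing the stream with StrBuf.out is just a convenience for the example. The two routes may treat quote characters differently, so the sample text avoids them.

```fantom
// Hypothetical helper class, not part of the xml pod
class EscapeSketch
{
  // Streaming route: escape while writing to an OutStream (here backed by a StrBuf)
  static Str textElem(Str text)
  {
    buf := StrBuf()
    buf.out.writeChars("<text>").writeXml(text).writeChars("</text>")
    return buf.toStr
  }

  // String route: build the escaped string first, then concatenate
  static Str textElemViaStr(Str text)
  {
    return "<text>" + text.toXml + "</text>"
  }

  static Void main()
  {
    echo(textElem("a < b & c"))
    echo(textElemViaStr("a < b & c"))
  }
}
```

The streaming route avoids building an intermediate escaped string, which is the efficiency argument made above.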
Parsing
The XParser class is used to parse XML input streams into XElems. The easiest way to do this is to parse the entire document into memory using the parseDoc method:

    // parse and close input stream
    doc := XParser(in).parseDoc

    // parse a file
    doc := XParser(`test.xml`.toFile.in).parseDoc

    // parse a string
    doc := XParser("<foo/>".in).parseDoc

The code above parses the document entirely into memory. This tends to be the easiest way to work with XML documents. However, it can create efficiency problems when parsing large documents, especially when mapping the XElems into other data structures. To support more efficient parsing of XML streams, XParser may also be used to read elements off the input stream one at a time. This is similar to the SAX API, except that you pull events instead of having them pushed to you.
To perform pull parsing, use the next method to iterate through the document. This tokenizes the stream into XElem, XText, and XPi chunks. Each call to next advances to the next token and returns its node type. You may also check the type of the current token using nodeType. You may access the current token using elem, text, or pi.
XParser maintains a stack of XElems for you from the root element down to the current element. You may check the depth of the stack using the depth method. Get the current element at any position in the stack via elemAt.
It is very important to understand that the XElem at a given depth is only valid until the parser returns elemEnd for that depth. After that the element will be reused. An XText instance is only valid until the next call to next. You can make a safe copy of nodes using copy. You can also use parseElem to read the current element and its descendants into memory during pull parsing.
Example of pull parsing:

    parser := XParser(
      "<root>
         <a>text</a>
       </root>".in)
    while (parser.next != null)
      echo("$parser.nodeType $parser.depth $parser.elem")

    // prints the following to the console
    elemStart 0 <root>
    elemStart 1 <a>
    text 1 <a>
    elemEnd 1 <a>
    elemEnd 0 <root>

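Pull parsing and parseElem can be combined to process a large document one record at a time. The sketch below is illustrative only. The class PullSketch, the method itemNames, and the sample document are hypothetical. It also assumes that parseElem accepts a Bool argument controlling whether the stream is closed, and that XElem.get returns an attribute value by name; check both against the XParser and XElem API reference.

```fantom
using xml

// Hypothetical example: collect the name attribute of each child of the root
class PullSketch
{
  static Str[] itemNames(Str xml)
  {
    names := Str[,]
    parser := XParser(xml.in)
    while (parser.next != null)
    {
      // an element start at depth 1 marks the beginning of one record
      if (parser.nodeType == XNodeType.elemStart && parser.depth == 1)
      {
        // read the record and its descendants into a standalone tree;
        // false is assumed to leave the stream open for further pulls
        XElem item := parser.parseElem(false)
        names.add(item.get("name"))
      }
    }
    parser.close
    return names
  }

  static Void main()
  {
    echo(itemNames("<items><item name='a'><x/></item><item name='b'/></items>"))
  }
}
```

Because parseElem builds a fresh tree for each record, the returned XElem is not subject to the reuse caveat described above, and only one record is held in memory at a time.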