1 files changed, 542 insertions, 0 deletions
diff --git a/test/rexml/data/documentation.xml b/test/rexml/data/documentation.xml
new file mode 100644
index 0000000000..a1ad6e878b
--- /dev/null
+++ b/test/rexml/data/documentation.xml
@@ -0,0 +1,542 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet type="text/css" href="http://www.germane-software.com/repositories/public/documentation/documentation.css"?>
+<?xml-stylesheet alternative="yes" type="text/css" href="file:/home/ser/Work/documentation/documentation.css"?>
+<?xml-stylesheet alternative="yes" type="text/xsl" href="http://www.germane-software.com/repositories/public/documentation/paged.xsl"?>
+<!DOCTYPE documentation SYSTEM "http://www.germane-software.com/repositories/public/documentation/documentation.dtd">
+<documentation>
+  <head>
+    <title>REXML</title>
+
+    <banner href="img/rexml.png" />
+
+    <version>@ANT_VERSION@</version>
+
+    <date>@ANT_DATE@</date>
+
+    <home>http://www.germane-software.com/software/rexml</home>
+
+    <base>rexml</base>
+
+    <language>ruby</language>
+
+    <author email="ser@germane-software.com"
+    href="http://www.ser1.net/" jabber="seanerussell@gmail.com">Sean
+    Russell</author>
+  </head>
+
+  <overview>
+    <purpose lang="en">
+      <p>REXML is a conformant XML processor for the Ruby programming
+      language. REXML passes 100% of the OASIS non-validating tests and
+      includes full XPath support. It is reasonably fast, and is implemented
+      in pure Ruby. Best of all, it has a clean, intuitive API. REXML is
+      included in the standard library of Ruby.</p>
+
+      <p>This software is distributed under the <link href="LICENSE.txt">Ruby
+      license</link>.</p>
+    </purpose>
+
+    <general>
+      <p>REXML arose out of a desire for a straightforward XML API, and is an
+      attempt at an API that doesn't require constant referencing of
+      documentation to do common tasks. "Keep the common case simple, and the
+      uncommon, possible."</p>
+
+      <p>REXML avoids the DOM API, which violates the maxim of simplicity. It
+      does provide <em>a</em> DOM model, but one that is Ruby-ized. It is an
+      XML API oriented for Ruby programmers, not for XML programmers coming
+      from Java.</p>
+
+      <p>One of the common differences is that the Ruby API relies on block
+      enumeration rather than iterators. For example, the Java code:</p>
+
+      <example>for (Enumeration e=parent.getChildren(); e.hasMoreElements(); ) {
+  Element child = (Element)e.nextElement(); // Do something with child
+}</example>
+
+      <p>in Ruby becomes:</p>
+
+      <example>parent.each_child { |child|
+  # Do something with child
+}</example>
+
+      <p>Can't you feel the peace and contentment in this block of code? Ruby
+      is the language Buddha would have programmed in.</p>
+
+      <p>One last thing. If you use and like this software, and you're in a
+      position of power in a company in Western Europe and are looking for a
+      software architect or developer, drop me a line. I took a lot of French
+      classes in college (all of which I've forgotten), and I lived in Munich
+      long enough that I was pretty fluent by the time I left, and I'd love to
+      get back over there.</p>
+    </general>
+
+    <features lang="en">
+      <item>Four intuitive parsing APIs.</item>
+
+      <item>Intuitive, powerful, and reasonably fast tree parsing API (a-la
+      DOM)</item>
+
+      <item>Fast stream parsing API (a-la SAX)<footnote>This is not a SAX
+      API.</footnote></item>
+
+      <item>SAX2-based API<footnote>In addition to the native REXML streaming
+      API. This is slower than the native REXML API, but does a lot more work
+      for you.</footnote></item>
+
+      <item>Pull parsing API.</item>
+
+      <item>Small</item>
+
+      <item>Reasonably fast (for interpreted code)</item>
+
+      <item>Native Ruby</item>
+
+      <item>Full XPath support<footnote>Currently only available for the tree
+      API</footnote></item>
+
+      <item>XML 1.0 conformant<footnote>REXML passes all of the non-validating
+      OASIS tests. There are probably places where REXML isn't conformant, but
+      I try to fix them as they're reported.</footnote></item>
+
+      <item>ISO-8859-1, UNILE, UTF-16 and UTF-8 input and output; also,
+      support for any encoding that iconv supports.</item>
+
+      <item>Documentation</item>
+    </features>
+  </overview>
+
+  <operation lang="en">
+    <subsection title="Installation">
+      <p>You don't <em>have</em> to install anything; if you're running
+      Ruby 1.8 or later, REXML is included. However, if you
+      choose to upgrade from the REXML distribution, run the command:
+      <code>ruby bin/install.rb</code>. By the way, you really should look at
+      these sorts of files before you run them as root. They could contain
+      anything, and since (in Ruby, at least) they tend to be mercifully
+      short, it doesn't hurt to glance over them. If you want to uninstall
+      REXML, run <code>ruby bin/install.rb -u</code>.</p>
+    </subsection>
+
+    <subsection title="Unit tests">
+      <p>If you have Test::Unit installed, you can run the unit test cases.
+      Run the command: <code>ruby bin/suite.rb</code>; it runs against the
+      distribution, not against the installed version.</p>
+    </subsection>
+
+    <subsection title="Benchmarks">
+      <p>There is a benchmark suite in <code>benchmarks/</code>. To run the
+      benchmarks, change into that directory and run <code>ruby
+      comparison.rb</code>. If you have nothing else installed, only the
+      benchmarks for REXML will be run. However, if you have any of the
+      following installed, benchmarks for those tools will also be run:</p>
+
+      <list>
+        <item>NQXML</item>
+
+        <item>XMLParser</item>
+
+        <item>Electric XML (you must copy <code>EXML.jar</code> into the
+        <code>benchmarks</code> directory and compile
+        <code>flatbench.java</code> before running the test)</item>
+      </list>
+
+      <p>The results will be written to <code>index.html</code>.</p>
+    </subsection>
+
+    <subsection title="General Usage">
+      <p>Please see <link href="docs/tutorial.html">the Tutorial</link>.</p>
+
+      <p>The API documentation is available <link
+      href="http://www.germane-software.com/software/XML/rexml/doc">on-line</link>,
+      or it can be downloaded as an archive <link
+      href="http://www.germane-software.com/software/archives/rexml_api_@ANT_VERSION@.tgz">in
+      tgz format (~70Kb)</link> or (if you're a masochist) <link
+      href="http://www.germane-software.com/software/archives/rexml_api_@ANT_VERSION@.zip">in
+      zip format (~280Kb)</link>. The best solution is to download and install
+      Dave Thomas' most excellent <link
+      href="http://rdoc.sourceforge.net">rdoc</link> and generate the API docs
+      yourself; then you'll be sure to have the latest API docs and won't have
+      to keep downloading the doc archive.</p>
+
+      <p>The unit tests in <code>test/</code> and the benchmarking code in
+      <code>benchmarks/</code> provide additional examples of using REXML. The
+      Tutorial provides examples with commentary. The documentation unpacks
+      into <link href="doc/index.html"><code>rexml/doc</code></link>.</p>
+
+      <p>Kouhei Sutou maintains a <link
+      href="http://www.germane-software.com/software/rexml_doc_ja/current/index.html">Japanese
+      version</link> of the REXML API docs. <link
+      href="http://www.germane-software.com/software/rexml_doc_ja/current/japanese_documentation.html">Kou's
+      documentation page</link> contains links to binary archives for various
+      versions of the documentation.</p>
+    </subsection>
+  </operation>
+
+  <status>
+    <subsection title="Speed and Completeness">
+      <p>Unfortunately, NQXML is the only package REXML can be compared
+      against; XMLParser uses expat, which is a native library, and really is
+      a different beast altogether. So in comparing NQXML and REXML you can
+      look at four things: speed, size, completeness, and API.</p>
+
+      <p><link href="benchmarks/index.html">Benchmarks</link></p>
+
+      <p>REXML is faster than NQXML in some things, and slower than NQXML in a
+      couple of things. You can see this for yourself by running the supplied
+      benchmarks. Most of the places where REXML is slower are because of the
+      convenience methods<footnote>For example,
+      <code>element.elements[index]</code> isn't really an array operation;
+      index can be an Integer or an XPath, and this feature is relatively time
+      expensive.</footnote>. On the positive side, most of the convenience
+      methods can be bypassed if you know what you are doing. Check the <link
+      href="benchmarks/index.html"> benchmark comparison page</link> for a
+      <em>general</em> comparison. You can look at the benchmark code yourself
+      to decide how many grains of salt to take the results with.</p>
+
+      <p>The sizes of the XML parsers are close<footnote>As measured with
+      <code>ruby -nle 'print unless /^\s*(#.*|)$/' *.rb | wc -l</code>
+      </footnote>. NQXML 1.1.3 has 1580 non-blank, non-comment lines of code;
+      REXML 2.0 has 2340<footnote>REXML started out with about 1200, but that
+      number has been steadily increasing as features are added. XPath
+      accounts for 541 lines of that code, so the core REXML has about 1800
+      LOC.</footnote>.</p>
+
+      <p>REXML is a conformant XML 1.0 parser. It supports multiple language
+      encodings, and internal processing uses the required UTF-8 and UTF-16
+      encodings. It passes 100% of the OASIS non-validating tests.
+      Furthermore, it provides a full implementation of XPath, a SAX2 and a
+      PullParser API.</p>
+    </subsection>
+
+    <subsection title="XPath">
+      <p>As of release 2.0, XPath 1.0 is fully implemented.</p>
+
+      <p>I fully expect bugs to crop up from time to time, so if you see any
+      bogus XPath results, please let me know. That said, since I'm now
+      following the XPath grammar and spec fairly closely, I suspect that you
+      won't be surprised by REXML's XPath very often, and it should become
+      rock solid fairly quickly.</p>
+
+      <p>Check the "bugs" section for known problems; there are little bits of
+      XPath here and there that are not yet implemented, but I'll get to them
+      soon.</p>
+
+      <p>Namespace support is rather odd, but it isn't my fault. I can only do
+      so much and still conform to the specs. In particular, XPath attempts to
+      help as much as possible. Therefore, in the trivial cases, you can pass
+      namespace prefixes to Element.elements[...] and so on -- in these cases,
+      XPath will use the namespace environment of the base element you're
+      starting your XPath search from. However, if you want to do something
+      more complex, like pass in your own namespace environment, you have to
+      use the XPath first(), each(), and match() methods. Also, default
+      namespaces <em>force</em> you to use the XPath methods, rather than the
+      convenience methods, because there is no way for XPath to know what the
+      mappings for the default namespaces should be. This is exactly why I
+      loathe namespaces -- a pox on the person(s) who thought them up!</p>
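+
+      <p>A minimal sketch of passing your own namespace environment to the
+      XPath methods, assuming a made-up document whose elements live in the
+      default namespace <code>urn:example</code>. The prefix <code>ns</code>
+      is arbitrary and exists only in the mapping handed to XPath:</p>
+
+      <example><![CDATA[require "rexml/document"
+
+# Hypothetical document; "urn:example" is an illustrative namespace URI.
+doc = REXML::Document.new('<root xmlns="urn:example"><item>one</item><item>two</item></root>')
+
+# The caller supplies the prefix-to-URI mapping as the third argument.
+mapping = { "ns" => "urn:example" }
+
+# first() returns the first matching node.
+first = REXML::XPath.first(doc, "//ns:item", mapping)
+puts first.text
+
+# each() yields every matching node to the block.
+REXML::XPath.each(doc, "//ns:item", mapping) { |item| puts item.text }
+
+# match() returns all matching nodes as an Array.
+puts REXML::XPath.match(doc, "//ns:item", mapping).size]]></example>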
+    </subsection>
+
+    <subsection title="Namespaces">
+      <p>Namespace support is now fairly stable. One thing to be aware of is
+      that REXML is not (yet) a validating parser. This means that some
+      invalid namespace declarations are not caught.</p>
+    </subsection>
+
+    <subsection title="Mailing list">
+      <p>There is a low-volume mailing list dedicated to REXML. To subscribe,
+      send an empty email to <link
+      href="mailto:ser-rexml-subscribe@germane-software.com">ser-rexml-subscribe@germane-software.com</link>.
+      This list is more or less spam proof. To unsubscribe, similarly send a
+      message to <link
+      href="mailto:ser-rexml-unsubscribe@germane-software.com">ser-rexml-unsubscribe@germane-software.com</link>.</p>
+    </subsection>
+
+    <subsection title="RSS">
+      <p>An <link
+      href="http://www.germane-software.com/projects/rexml/timeline?ticket=on&amp;max=50&amp;daysback=90&amp;format=rss">RSS
+      file</link> for REXML is now being generated from the change log. This
+      allows you to be alerted to bug fixes and feature additions via "pull".
+      <link href="http://www.germane-software.com/software/rexml/rss.xml">Another
+      RSS</link> is available which contains a single item: the release notice
+      for the most recent release. This is an abuse of the RSS
+      mechanism, which was intended to be a distribution system for headlines
+      linked back to full articles, but it works. The headline for REXML is
+      the version number, and the description is the change log. The links all
+      link back to the REXML home page. The URL for the RSS itself is
+      http://www.germane-software.com/software/rexml/rss.xml.</p>
+
+      <p>The <link href="release.html">changelog itself is here</link>.</p>
+
+      <p>For those who are interested, there's a <link
+      href="docs/sloccount.txt">SLOCCount</link> (by David A. Wheeler) file
+      with stats on the REXML source code. Note that the SLOCCount output
+      includes the files in the test/, benchmarks/, and bin/ directories, as
+      well as the main source code for REXML itself.</p>
+    </subsection>
+
+    <subsection title="Applications that use REXML">
+      <list>
+        <item><link
+        href="http://www.pablotron.org/software/raggle/">Raggle</link> is a
+        console-based RSS aggregator.</item>
+
+        <item><link
+        href="http://www.zweknu.org/technical/index.rhtml?s=p|10/">getrss</link>
+        is an RSS aggregator.</item>
+
+        <item>Ned Konz's <link
+        href="http://www.bikenomad.microship.com/ruby/">ruby-htmltools</link>
+        uses REXML.</item>
+
+        <item>Hiroshi NAKAMURA's <link
+        href="http://www.ruby-lang.org/en/raa-list.rhtml?name=SOAP4R">SOAP4R</link>
+        package can use REXML as the XML processor.</item>
+
+        <item>Chris Morris' <link href="http://clabs.org/clxmlserial.htm">XML
+        Serializer</link>. XML Serializer provides a serialization mechanism
+        for Ruby that provides a bidirectional mapping between Ruby classes
+        and XML documents.</item>
+
+        <item>Much of the <link href="http://www.rubyxml.com">RubyXML</link>
+        site is generated with scripts that use REXML. RubyXML is a great
+        place to find information about the intersection between Ruby and
+        XML.</item>
+      </list>
+    </subsection>
+
+    <bugs lang="en">
+      <p>You can submit bug reports and feature requests, and view the list of
+      known bugs, at the <link
+      href="http://www.germane-software.com/projects/rexml">REXML bug report
+      page</link>. Please do submit bug reports. If you really want your bug
+      fixed fast, include an runit or Test::Unit method (or methods) that
+      illustrates the problem. At the very least, send me some XML that REXML
+      doesn't process properly.</p>
+
+      <p>You don't have to send an entire test suite -- just the unit test
+      methods. If you don't send me a unit test, I'll have to write one
+      myself, which will mean that your bug will take longer to fix.</p>
+
+      <p>When submitting bug reports, please include the version of Ruby and
+      of REXML that you're using, and the operating system you're running on.
+      Just run:
+      <code>ruby -vrrexml/rexml -e 'p REXML::VERSION, RUBY_PLATFORM'</code>
+      and paste the results in your bug report. Include your email if you
+      want a response about the bug.</p>
+
+      <item>Attributes are not handled internally as nodes, so you can't
+      perform node functions on them. This will have to change. It'll also
+      probably mean that, rather than returning attribute values, XPath will
+      return the Attribute nodes.</item>
+
+      <item>Some of the XPath <em>functions</em> are untested<footnote>Mike
+      Stok has been testing, debugging, and implementing some of these
+      Functions (and he's been doing a good job) so there's steady improvement
+      in this area.</footnote>. Any XPath functions that don't work are also
+      bugs... please report them. If you send a unit test that illustrates the
+      problem, I'll try to fix the problem within a couple of days (if I can)
+      and send you a patch, personally.</item>
+
+      <item>Accessing prefixes for which there is no defined namespace in an
+      XPath should throw an exception. It currently doesn't -- it just fails
+      to match.</item>
+    </bugs>
+
+    <todo lang="en">
+      <item>Reparsing a tree with a pull/SAX parser</item>
+
+      <item>Better namespace support in SAX</item>
+
+      <item>Lazy tree parsing</item>
+
+      <item>Segregate parsers, for optimized minimal distributions</item>
+
+      <item>XML &lt;-&gt; Ruby</item>
+
+      <item>Validation support</item>
+
+      <item>True XML character support</item>
+
+      <item>Add XPath support for streaming APIs</item>
+
+      <item status="request">XQuery support</item>
+
+      <item status="request">XUpdate support</item>
+
+      <item>Make sure namespaces are supported in pull parser</item>
+
+      <item status="request">Add document start and entity replacement events
+      in pull parser</item>
+
+      <item>Better stream parsing exception handling</item>
+
+      <item>I'd like to hack XMLRPC4R to use REXML, for my own
+      purposes.</item>
+    </todo>
+  </status>
+
+  <faq>
+    <q>REXML is hanging while parsing one of my XML files.</q>
+
+    <a>Your XML is probably malformed. Some malformed XML, especially XML that
+    contains literal '&lt;' embedded in the document, causes REXML to hang.
+    REXML should be throwing an exception, but it doesn't; this is a bug. I'm
+    aware that it is an extremely annoying bug, and it is one I'm trying to
+    solve in a way that doesn't significantly reduce REXML's parsing
+    speed.</a>
+
+    <q>I'm using the XPath '//foo' on an XML branch node X, and keep getting
+    all of the 'foo' elements in the entire document. Why? Shouldn't it return
+    only the 'foo' element descendants of X?</q>
+
+    <a>No. XPath specifies that '/' returns the document root, regardless of
+    the context node. '//' also starts at the document root. If you want to
+    limit your search to a branch, you need to use the self:: axis. E.g.,
+    'self::node()//foo', or the shorthand './/foo'.</a>
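+
+    <p>A minimal sketch of the difference, assuming a small made-up document
+    in which the branch <code>x</code> holds one <code>foo</code> and its
+    sibling <code>y</code> holds two. Per the answer above, the first count
+    should cover the whole document and the second only the branch:</p>
+
+    <example><![CDATA[require "rexml/document"
+
+# Hypothetical document, for illustration only.
+doc = REXML::Document.new("<root><x><foo/></x><y><foo/><foo/></y></root>")
+x = doc.root.elements["x"]
+
+# '//foo' is anchored at the document root, whatever the context node is.
+everywhere = REXML::XPath.match(x, "//foo")
+
+# './/foo' is anchored at the context node, so only the branch is searched.
+in_branch = REXML::XPath.match(x, ".//foo")
+
+puts "//foo: #{everywhere.size}, .//foo: #{in_branch.size}"]]></example>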
+
+    <q>I want to parse a document both as a tree, and as a stream. Can I do
+    this?</q>
+
+    <a>Yes, and no. There is no mechanism that directly supports this in
+    REXML. However, aside from writing your own traversal layer, there is a
+    way of doing this. To turn a tree into a stream, just turn the branch you
+    want to process as a stream back into a string, and re-parse it with your
+    preferred API. E.g.: <code>pp = PullParser.new( some_element.to_s )</code>. The other
+    direction is more difficult; you basically have to build a tree from the
+    events. REXML will have one of these builders, eventually, but it doesn't
+    currently exist.</a>
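+
+    <p>A minimal sketch of the tree-to-stream direction, assuming the pull
+    parser that ships as <code>rexml/parsers/pullparser</code>. The document
+    and the choice of branch are made up for illustration:</p>
+
+    <example><![CDATA[require "rexml/document"
+require "rexml/parsers/pullparser"
+
+# Hypothetical document; only the "b" branch is re-parsed as a stream.
+doc = REXML::Document.new("<a><b><c>text</c></b><d/></a>")
+some_element = doc.root.elements["b"]
+
+# Serialize the branch back to a string and hand it to the pull parser.
+parser = REXML::Parsers::PullParser.new(some_element.to_s)
+
+# Pull events one at a time and report the ones of interest.
+while parser.has_next?
+  event = parser.pull
+  if event.start_element?
+    puts "start: #{event[0]}"
+  elsif event.text?
+    puts "text: #{event[0]}"
+  elsif event.end_element?
+    puts "end: #{event[0]}"
+  end
+end]]></example>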
+
+    <q>Why is Element.elements indexed off of '1' instead of '0'?</q>
+
+    <a>Because of XPath. The XPath specification states that the index of the
+    first child node is '1'. Although it may be counter-intuitive to base
+    elements on 1, it is more undesirable to have element.elements[0] ==
+    element.elements[ 'node()[1]' ]. Since I can't change the XPath
+    specification, the result is that Element.elements[1] is the first child
+    element.</a>
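+
+    <p>A minimal sketch of the 1-based indexing, assuming a made-up document
+    with two child elements. The integer index and the XPath position
+    predicate are expected to name the same node:</p>
+
+    <example><![CDATA[require "rexml/document"
+
+# Hypothetical document, for illustration only.
+root = REXML::Document.new("<r><a/><b/></r>").root
+
+# XPath position predicate for the first child element.
+puts REXML::XPath.first(root, "*[1]").name
+
+# Integer indexes follow the same 1-based convention.
+first_child = root.elements[1]
+second_child = root.elements[2]
+puts first_child.name
+puts second_child.name]]></example>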
+
+    <q>Why isn't REXML a validating parser?</q>
+
+    <a>Because validating parsers must include code that parses and interprets
+    DTDs. I hate DTDs. REXML supports the barest minimum of DTD parsing, and
+    even that isn't complete. There is DTD parsing code in the works, but I
+    only work on it when I'm really, really bored. Rumor has it that a
+    contributor is working on a DTD parser for REXML; rest assured that any
+    such contribution will be included with REXML as soon as it is
+    available.</a>
+
+    <q>I'm trying to create an ISO-8859-1 document, but when I add text to the
+    document it isn't being properly encoded.</q>
+
+    <a>Regardless of what the encoding of your document is, when you add text
+    programmatically to a REXML document you <em>must</em> ensure that you are
+    only adding UTF-8 to the tree. In particular, you can't add ISO-8859-1
+    encoded text that contains characters at or above 0x80 to REXML trees --
+    you must convert it to UTF-8 before doing so. Luckily, this is easy:
+    <code>text.unpack('C*').pack('U*')</code> will do the trick. 7-bit ASCII
+    is already valid UTF-8, so if your text is plain ASCII you won't need to
+    worry about this.</a>
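+
+    <p>A minimal sketch of that conversion, assuming a made-up ISO-8859-1
+    byte string and a Ruby recent enough to provide <code>String#b</code>
+    and <code>String#encode</code>; the element name is illustrative:</p>
+
+    <example><![CDATA[require "rexml/document"
+
+# Hypothetical input: the bytes of "caf" followed by 0xE9 (e-acute in ISO-8859-1).
+latin1 = "caf\xE9".b
+
+# Treat each byte as a code point and re-pack the code points as UTF-8.
+utf8 = latin1.unpack("C*").pack("U*")
+
+# On Ruby 1.9 and later the same conversion can be written with String#encode.
+also_utf8 = latin1.dup.force_encoding("ISO-8859-1").encode("UTF-8")
+
+# Only the converted text is added to the tree.
+doc = REXML::Document.new("<drink/>")
+doc.root.add_text(utf8)
+puts doc.root.text == also_utf8]]></example>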
+
+    <q>How do I get the tag name of an Element?</q>
+
+    <a>You take a look at the APIs, and notice that <code>Element</code>
+    includes <code>Namespace</code>. Then you click on the
+    <code>Namespace</code> link and look at the methods that
+    <code>Element</code> includes from <code>Namespace</code>. One of these is
+    <code>name()</code>. Another is <code>expanded_name()</code>. Yet another
+    is <code>prefix()</code>. Then, you email the author of rdoc and ask him
+    to extend rdoc so that it lists methods in the API that are included from
+    other files, so that you don't have to do all of that looking around for
+    your method.</a>
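+
+    <p>A minimal sketch of those three methods, assuming a made-up element
+    whose prefix <code>x</code> is bound to <code>urn:x</code>:</p>
+
+    <example><![CDATA[require "rexml/document"
+
+# Hypothetical document, for illustration only.
+root = REXML::Document.new('<x:a xmlns:x="urn:x"/>').root
+
+puts root.name            # local name, without the prefix
+puts root.prefix          # namespace prefix
+puts root.expanded_name   # prefix and local name together]]></example>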
+  </faq>
+
+  <credits>
+    <p>I've had help from a number of resources; if I haven't listed you here,
+    it means that I just haven't gotten around to adding you, or that I'm a
+    dork and have forgotten. In either case, feel free to write me and
+    complain.</p>
+
+    <list>
+      <item>Mike Stok has been very active, not only sending fixes for bugs
+      (especially in Functions), but also providing unit tests and making
+      sure REXML runs under Ruby 1.7. He also sent the most awesome
+      hand-knitted tea cozy, with "REXML" and the Ruby knitted into it.</item>
+
+      <item>Kouhei Sutou translated the REXML API documentation to Japanese!
+      Links are in the API docs section of the main documentation. He has also
+      contributed a large number of bug reports and patches to fix bugs in
+      REXML.</item>
+
+      <item>Erik Terpstra heard my pleas and submitted several logos for
+      REXML. After sagely procrastinating for several weeks, I finally forced
+      my poor slave of a wife to pick one (this is what we call "delegation").
+      She did, with caveats; Erik quickly made the changes, and the result is
+      what you now see at the top of this page. He also supplied a <link
+      href="img/rexml_50p.png">smaller version</link> that you can include
+      with your projects that use REXML, if you'd like.</item>
+
+      <item>Ernest Ellingson contributed the source code for turning UTF16 and
+      UNILE encodings into UTF8, which allowed REXML to get the 100% OASIS
+      valid tests rating.</item>
+
+      <item>Ian Macdonald provided me with a comprehensive, well written RPM
+      spec file.</item>
+
+      <item>Oliver M. Bolzer is maintaining a Debian package distribution of
+      REXML. He also has provided good feedback and bug reports about
+      namespace support.</item>
+
+      <item>Michael Granger supplied a patch for REXML that makes the unit
+      tests pass under Ruby 1.7.</item>
+
+      <item>James Britt contributed code that makes
+      Document.parse_stream easier to use by allowing it to be passed a
+      Source, File, or String.</item>
+
+      <item>Tobias Reif: Numerous bug reports, and suggestions for
+      improvement.</item>
+
+      <item>Stefan Scholl, who provided a lot of feedback and bug reports
+      while I was trying to get ISO-8859-1 support working.</item>
+
+      <item>Steven E Lumos for volunteering information about XPath
+      particulars.</item>
+
+      <item>Fumitoshi UKAI provided some bug fixes for CData metacharacter
+      quoting.</item>
+
+      <item>TAKAHASHI Masayoshi, for information on UTF.</item>
+
+      <item>Robert Feldt: Bug reports and suggestions/recommendations about
+      improving REXML. Testing is one of the most important aspects of
+      software development.</item>
+
+      <item><link
+      href="http://www.themindelectric.com/exml/index.html">Electric
+      XML</link>: This was, after all, the inspiration for REXML. Originally,
+      I was just going to do a straight port, and although REXML doesn't in
+      any way, shape or form resemble Electric XML, still the basic framework
+      and philosophy were inspired by E-XML. And I still use E-XML in my Java
+      projects.</item>
+
+      <item><link
+      href="http://www.io.com/~jimm/downloads/nqxml/index.html">NQXML</link>:
+      While I may complain about the NQXML API, I wrote a few applications
+      using it that wouldn't have been written otherwise, and it was very
+      useful to me. It also encouraged me to write REXML. Never complain about
+      free software *slap*.</item>
+
+      <item>See my <link
+      href="http://www.germane-software.com/~ser/technology.html">technologies
+      page</link> for a more comprehensive list of computer technologies that
+      I depend on for my day-to-day work.</item>
+
+      <item>rdoc, an excellent JavaDoc analog<footnote>When I was first
+      working on REXML, rdoc wasn't, IMO, very good, so I wrote API2XML.
+      API2XML was good enough for a while, and then there was a flurry of work
+      on rdoc, and it quickly surpassed API2XML in features. Since I was never
+      really interested in maintaining a JavaDoc analog, I stopped support of
+      API2XML, and am now recommending that people use
+      rdoc.</footnote>.</item>
+
+      <item>Many, many other people who've submitted bug reports, suggestions,
+      and positive feedback. You're all co-developers!</item>
+    </list>
+  </credits>
+</documentation>