XML importing script

What does it do?

This XML importing script imports an external XML data file (anything from a small amount of data to a complete XML database), lets you process the data, and then lets you display it in any way you want. This is a DOM script that will work in 5th generation browsers like Internet Explorer 5+, Mozilla/Netscape 6+, Opera 7+, Chrome/Safari 1.2+, Konqueror 3.3+, OmniWeb 5.1+, iCab 3+, NetFront 3.4+ and ICEbrowser. Espial claim that it also works in Escape/Evo, but that browser is not actually capable of running scripts like this. Tkhtml Hv3 supports XMLHttpRequest, but only returns a text response; it does not create a DOM for the returned document, so it cannot work with this script.

To download the script(s), see the script license, and check details like browser compatibility, use the links on the navigation panel at the top of this page.

I have also written a page describing how to import non-XML-based data into JavaScript, where the technique actually supports a few more browsers than this one. However, it is not designed for importing XML databases, so it serves a different purpose to this script.

This script builds on the monumental script created by PPK. His script is well commented showing how it works, so I will not reproduce it here. The major difference between our two scripts is the level of browser support. I have managed to extend the script to support Opera, Safari/Chrome, Konqueror, OmniWeb, Internet Explorer 5 on Mac (with a minor proviso), iCab, NetFront and ICEbrowser, while avoiding errors in browsers that partially support PPK's script (such as Safari).

Demonstration

This demonstration and associated XML file are shamelessly taken from PPK's page, in tribute to his inspiring script, and also to show that the two scripts are essentially compatible, as they simply provide an interface to the document object of the XML data file.

Test it here: load and process PPK's XML file (copied to my server due to the inherent restrictions of importing XML files). As with PPK's example, you can also view the XML document separately (the only difference is that I also include a blank stylesheet [and for my own personal satisfaction; an XML declaration] - see below for explanation). As with PPK's script, the XML document is processed, and the data converted into HTML and displayed as a table. This is a trivial use of a very powerful script, but it serves to demonstrate that the XML data has actually been loaded by JavaScript.

I have also written a script that imports RSS feeds and interprets them, and displays them using HTML. To import the feeds, it uses this script, since RSS feeds are just XML.

How is this possible?

There are several ways to import XML. None of them are truly cross browser, so a combination must be used. The two main techniques have a normal constructor for standards based browsers, and an ActiveX version for Internet Explorer. The syntax used to load the document is the same in both versions, and the functionality is comparable.

Browser	XML DOM	XMLHttpRequest	Iframes
IE Win	Yes	HTTP or HTTPS	No
Gecko	Yes	Yes	Yes
Safari/Chrome	No	1.2+	1.2+
Opera	No	7.6+	Yes
IE Mac	Error	Error	Yes
iCab	No	3.0.3+	Yes
NetFront	No	3.4+	No
ICE	Error	Error	Yes
Other	?	?	?

XML DOM creates an XML document object, then makes it load an XML file. It uses either the createDocument method of the document.implementation object (followed by a call to load()), or the Microsoft.XMLDOM ActiveX object.
XMLHttpRequest creates an HTTP request using GET or POST and uses it to load an XML file. It uses the XMLHttpRequest constructor, or the Microsoft.XMLHTTP ActiveX object.
The iframe technique loads an XML page in an iframe then parses it as a document.

These techniques are equivalent for most purposes, except that the second can pass POST information in the request, allowing for more than 4KB of encoded data to be sent to the server (either can pass GET information in the requested URL string). Since the iframe technique cannot use POST without creating history entries, I restrict myself to GET in all situations. The second technique does offer superior error handling (in case the XML file does not load), and can handle synchronous/asynchronous loading better.

As well as that, the second technique is being included in the upcoming W3C specification, and is the basis of most AJAX (the annoying buzzword, not the bathroom cleaner) applications. It offers an extra advantage that it can deal with plain text files, and raw source code instead of just providing a DOM interface.
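As a rough sketch of the second technique (not the code used in the script itself), a GET request can be wired up as below. The function name requestXml and its two callback parameters are made up for this illustration, and it assumes you already have a request object from either the XMLHttpRequest constructor or the Microsoft.XMLHTTP ActiveX object.

```javascript
// Hypothetical helper: load an XML file with GET and hand over its DOM.
// xhr     - a request object (native or ActiveX), created by the caller
// onLoad  - called with the XML document object on success
// onError - called with the HTTP status if the file did not load as XML
function requestXml(xhr, url, onLoad, onError) {
  xhr.open('GET', url, true); // true = asynchronous
  xhr.onreadystatechange = function () {
    if (xhr.readyState != 4) { return; } // 4 = request complete
    var doc = xhr.responseXML;
    if (xhr.status == 200 && doc && doc.documentElement) {
      onLoad(doc);
    } else {
      onError(xhr.status);
    }
  };
  xhr.send(null); // GET requests have no body
}
```

The status check is what gives this technique its better error handling: a missing file reaches onError instead of silently producing an empty document.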

PPK's example uses only the first technique.

Newer versions of ICEbrowser also support their own version of XMLHttpRequest, created using window.createRequest() or new XMLHttpRequest(), but they do not provide a responseXML DOM, and are useless here. It would be possible to kludge this script to force ICEbrowser to use the iframe like it used to, but the DOM of the iframe page also seems to be broken. So it is better to just wait until it supports XMLHttpRequest properly.

Note: If you want to submit form data to construct the XML data URL, I have demonstrated how in a previous email.

I could just use XMLHttpRequest and one of the ActiveX techniques, which would extend support to a few more browsers, as well as making the script a lot easier, since repeated 'if' tests are not required, and no browsers produce errors. However, I am not satisfied with just that. I start the same way as PPK, but using XMLHttpRequest instead of document.implementation. Failing that, I try new ActiveXObject('Microsoft.XMLHTTP'), with a check to prevent problems with ICEbrowser, then fall back to new ActiveXObject('Microsoft.XMLDOM') if XMLHTTP is not available. Either ActiveX method could be used; the latter is perhaps a little easier and will also work with file:// URLs, but it has broken implementations on many installs. I also use try-catch to prevent IE 5 Mac from failing when it tries to use its crippled ActiveXObject. I did try the same for ICEbrowser, but apparently it fails to use try-catch properly for this sort of situation.
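The order of attempts described above can be sketched roughly as follows. This is a simplified illustration, not the script's actual code: the name createXmlLoader and the shape of the object it returns are invented here, the window object is passed in as a parameter so the function can be tried outside a browser, and the ICEbrowser check is left out.

```javascript
// Hypothetical helper: pick the first loading technique that is available.
// Returns { type, object }, or null if the iframe fallback is needed.
function createXmlLoader(win) {
  // 1. Native constructor
  if (typeof win.XMLHttpRequest != 'undefined') {
    return { type: 'XMLHttpRequest', object: new win.XMLHttpRequest() };
  }
  // 2 and 3. ActiveX controls: XMLHTTP first, then XMLDOM
  if (typeof win.ActiveXObject != 'undefined') {
    var progIds = ['Microsoft.XMLHTTP', 'Microsoft.XMLDOM'];
    for (var i = 0; i < progIds.length; i++) {
      try {
        return { type: progIds[i], object: new win.ActiveXObject(progIds[i]) };
      } catch (e) {
        // Control missing or crippled (as in IE 5 Mac); try the next one
      }
    }
  }
  return null; // nothing worked; the caller should use a hidden iframe
}
```

In a page you would call createXmlLoader(window) and check the type property, since the XMLDOM object is loaded with load() while the two request objects use open() and send().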

At this stage, I have done almost the same as PPK, apart from my error avoiding. However (and here is the critical part), what I do next is I check if both of those techniques failed and if they did, I create a hidden iframe, and load the XML document into it. Normally, I would use onload, but I found that only Opera 7.5+ uses it here, despite the fact that IE 5 Mac and ICEbrowser also use this portion of the script. I still need to wait for the XML to load, so I run an interval timer to check if the XML document is loaded (the timer actually continues to run to allow you to load multiple XML files if you want to). In the usual spirit of my scripts, this can load multiple documents simultaneously without any interference between them. It does not reuse the existing requests, and I do not intend to make it operate like that. (Yes, that can make it a little memory hungry, but it prevents conflicts when there are multiple requests - if you want more control, please write it yourself.)
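The waiting step for the iframe can be sketched with an interval timer like this. It is only an illustration under a few assumptions: the name pollForDocument and its parameters are invented, getDocument stands for whatever code reads the document out of the hidden iframe, and unlike the script itself this version uses one timer per request and stops it once the document turns up.

```javascript
// Hypothetical helper: poll until the iframe's XML document is available.
// win         - the window object (supplies setInterval and clearInterval)
// getDocument - returns the iframe's document, or null if it is not there yet
// onReady     - called once with the document
function pollForDocument(win, getDocument, onReady, intervalMs) {
  var timer = win.setInterval(function () {
    var doc = null;
    try {
      doc = getDocument();
    } catch (e) {
      // The document is not accessible yet; check again on the next tick
    }
    if (doc && doc.documentElement) {
      win.clearInterval(timer);
      onReady(doc);
    }
  }, intervalMs);
  return timer;
}
```

Because each call keeps its own timer and callback in a closure, several documents can be loaded at the same time without interfering with each other.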

There are two restrictions. Firstly, the XML file must be in the same domain as the page that calls it. Secondly, if you want to support (Internet) Explorer Mac, you need to make sure your XML has a stylesheet. If you do not, then instead of loading the XML as XML, it converts it into an HTML document, filled with excessive CSS and JavaScript to allow it to expand/collapse, and only the first part including the CSS seems to be available to scripts (sigh). You would be hard pressed to make any sense out of it at all. By adding a stylesheet, even a completely empty one, you override this behaviour, and force it to use plain XML, as it thinks you are trying to make an XML web page. So for best results, I suggest using a blank stylesheet for all imported XML files. It will have no effect on any of the other browsers:

<?xml-stylesheet type="text/css" href="blank.css"?>

One of my readers was having problems with Internet Explorer Mac freezing when running this. Apparently it helps to have a comment in the CSS file so that it is not empty; this still has no effect on the other browsers:

/* nothing to see here folks! ;) */

Konqueror 3.2- and OmniWeb 4.5-5.0 actually have a similar behaviour to IE 5 Mac, but I could not find any way to avoid it. Even using a full stylesheet, which caused the XML to render as a real web page, had no effect at all. They still make complete garbage available to script (an HTML page with the XML data in the body with all the XML tags completely removed - why?!). This is a bug in the KHTML/WebKit engines that they use, and is fixed in more recent versions of the engine. Any script you write can check for this faulty behaviour by checking if the root element is HTML (assuming your root XML element is not <html>) and aborting the script at that point if that is the case.

if( XMLdocument.documentElement.tagName.toUpperCase() == 'HTML' ) { return; }

Special case: Internet Explorer users on Windows who have ActiveX disabled using a third party product such as Zone Alarm will find that this script produces errors. This is because Internet Explorer fails to use the try...catch statement properly when ActiveX is disabled in this way (it works correctly if ActiveX is disabled using Internet Explorer's preferences). It is not possible to detect this Internet Explorer bug, so users will simply have to accept it. One of my readers has worked out how to access the DOM from the iframe, but it is better that they just use IE 7, which has a native XMLHttpRequest constructor.

Important note about Safari. Safari 1.2 treats XML as if it were HTML. Although it uses the XML parser, if it sees any tags in the XML that it recognises as HTML tags, it will think they are HTML tags, and will treat them as such. For example, if you use <link> tags in your XML, Safari will ignore any contents, as HTML LINK elements are not allowed to have contents. Konqueror (the browser Safari was made from) uses its own XML parser and does not suffer from this problem. Newer versions of Safari have fixed this.

Note that IE on Pocket PC can import and process XML using an ActiveX control similar to the one used by IE on Windows (if it is installed - not by default on many devices), but it has extremely limited DOM support, so it cannot do anything useful with the XML once it has retrieved it. As a result, I recommend you simply ignore this browser (Opera is available for PocketPC users if they want to use this feature). Version 2.0.5+ of this script supports the ActiveX control used by Pocket IE.

Why not use iframes for everything?

Well, iframes have limitations. By adding an iframe, I add extra unnecessary HTML to the document. The browser also spends unnecessary time attempting to render the XML, even though it is in a hidden iframe. These two make this the slowest technique, although the speed difference is negligible. However, there is a more important problem. It is not possible for me to tell if the XML document has loaded, only that it is available, because XML documents do not have the intrinsic onload event. This could mean that it is beginning to load, but has not yet completed loading, so it is possible that you could begin working on a document before you have a complete set of data. I was worried about the effect that this might have, but when I tried it with a 100 KB XML file, it worked perfectly, so I must assume that the technique is safe enough to use.

Note: My RSS feed parsing script (which uses this XML importing script to load the RSS feed) does show problems created by this iframe loading delay, with large and slow RSS feeds. To combat this, I introduced the option to delay using the XML data for a few seconds in browsers that use the iframe (personally, I found between 2 and 5 seconds to be about right for large files). Even loading much larger XML files is now possible without any problems. This delay option is part of the XML importing script, so you can also use it if you are loading large or slow XML files.
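The idea behind that delay can be sketched as below. The names here (deliverDocument, usedIframe, delaySeconds) are invented for the illustration and are not the option names used by the script; check the script's own documentation for those.

```javascript
// Hypothetical helper: hand the document to the callback, waiting first
// if it came from the iframe technique and a delay was asked for.
// win - the window object (supplies setTimeout)
function deliverDocument(win, doc, callback, usedIframe, delaySeconds) {
  if (usedIframe && delaySeconds > 0) {
    // Give a large or slow file time to finish loading in the iframe
    win.setTimeout(function () { callback(doc); }, delaySeconds * 1000);
  } else {
    // XMLHttpRequest and XML DOM report completion themselves; no wait needed
    callback(doc);
  }
}
```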

Besides which, natively importing is just cleaner and less hackish, and I am not going to make browsers use something that feels like a hack when they have a much better inbuilt method.

Loading 100 KB in all cases took about 1 second, and processing took less than half a minute. By the time the file got to be 500 KB, the rendering time took too long for it to be usable (nearly five minutes). However, I was rendering a table with the entire contents of the XML file, which takes a lot of adjustments as its contents increase, so this may not be a typical example.

But why use it?

Firstly, why use XML at all? XML is a storage medium, like a table (or tables) in a database. You can even use multiple XML files and process the various 'tables' to produce a useful output. Unlike many database mediums, XML is not limited to any single platform or processing language. But why client side XML? Well, most database work is best handled on the server, so advanced scripting is not required. However, not everyone has access to server side databases. And sometimes, it would be nice to be able to carry and process a database on a PDA. Using XML data files allows you to reduce server load by getting the client to do some of the work. This is by no means an ideal solution to your databasing needs, as it is far less accessible than server side processing, but it does provide you with an option if you need to load more page data after the page has loaded, or you don't have access to server side processing.

If you do have access to server side processing, it is by far the better technique, as it causes no browser compatibility or accessibility problems. Loading XML and transforming it using XSLT is as easy as this in PHP 5:

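<?php

// Load the XML source document
$xml = new DOMDocument();
$xml->load('source.xml');

// Load the XSLT stylesheet
$xsl = new DOMDocument();
$xsl->load('style.xsl');

// Apply the stylesheet to the source and output the result
$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);

$transformation = $xslt->transformToXml($xml);
echo $transformation;

?>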

Method taken from the PHP XSL introduction page, thanks to Michael Kalua for pointing out the technique to me.

Old browsers with no DOM support

Obviously, this script will not work in older browsers that do not support the DOM. However, a slightly worse problem is that it will produce errors in older browsers that do not support the try-catch control structure. OK, so these browsers are used by fewer than 2% of people now, but I like to be nice to them just in case. Simply put the following just before the script tag that loads my script:

<script type="text/javascript">
window.onerror = function () { return true; }
</script>

This will not do anything useful for them, but at least they will not see any error messages either.
