DOM nodes and tree
The DOM tree
The following text is a snippet of HTML taken from a regular HTML document.
<p title="The test paragraph">This is a sample of some <b>HTML you might<br>have</b> in your document</p>
In your browser, this renders as follows (hold your mouse over the paragraph to see the title; most browsers display it as a tooltip, some display it in the status bar):
This is a sample of some HTML you might
have in your document
The DOM tree views this (simplified) as follows:
                                              P
                               _______________|______________
                              |                              |
                          childNodes                    attributes
                ______________|___________                   |
               |              |           |     title = 'The test paragraph'
'This is a sample of some '   B   ' in your document'
                              |
                          childNodes
                    __________|_______
                   |          |       |
           'HTML you might'   BR   'have'
Of course, the tree also extends above the 'P' from window.document, through the HTML element, down through the body element, then through any other container elements to the paragraph.
The parts of the DOM tree are known as nodes. The 'P', 'B' and 'BR' nodes are element nodes, childNodes and attributes are collections, the title='The test paragraph' pair is an attribute node, and the text strings are text nodes.
Referencing the element nodes
- Konqueror incorrectly requires the getElementsByTagName parameter to be in lower case when using XHTML strict doctypes, but served as text/html.
- iCab 3 fails to keep the object returned by getElementsByTagName updated.
- Tkhtml Hv3 adds all text nodes that appear before the opening <body> tag, as separate text nodes into the body's childNodes collection.
- Escape/Evo 5 fails to use the childNodes collections, and will abort the script. Ignore this browser.
- Early IE5 Mac did not provide the childNodes.length property. This was fixed automatically.
If you are worried, use something like this:
for( var x = 0; node.childNodes[x]; x++ )
Using the DOM, there are several ways that we could reference the paragraph. We can use getElementsByTagName to reference all paragraphs, then choose the one we want. If the paragraph were to be given an id, we could also use getElementById:
document.getElementById('id of paragraph')
document.getElementsByTagName('p')[indexOfParagraph]
If we assigned the paragraph an id so that we could use getElementById, the id='elementID' pair would appear in the attributes collection, alongside title='The test paragraph' in the tree diagram above. Note that if the document is served with an XML based content-type header, getElementsByTagName becomes case sensitive.
NOTE: getElementsByTagName does not return a true collection; it returns an object with element index and 'length' properties. This object keeps itself up to date, so if an element it references is deleted or added, it will automatically change its item references to reflect that change.
We could even walk the entire DOM tree from the document object, for example:
window.document.childNodes[0].childNodes[1].childNodes[4]
In this case, window.document.childNodes[0] should be the HTML element, as it is the first tag in the document (assuming there is no doctype tag), and window.document.childNodes[0].childNodes[1] should be the body tag, as the head element will be the first child of the HTML element. Alternatively, there is a shortcut to the HTML element: document.documentElement so we could use this:
window.document.documentElement.childNodes[1].childNodes[4]
There is also a shortcut to the BODY element: document.body so we could use this:
window.document.body.childNodes[4]
Those last three examples are based on a simple page structure, where the paragraph is a direct child of the body element. None of these would be correct in the current document, as the document structure is far more complex, also using DIV elements as parents of the paragraph.
The techniques used in those examples can be unreliable. Most browsers will correctly view the blank space between tags as a text node containing only white space characters (such as space, line-break or tab), even if the blank space is not rendered, such as a gap in between a <tr> tag and a <td> tag or a blank gap in between <p> tags. However, some browsers (mainly Internet Explorer 8-, and 9+ in quirks mode) will not view this empty space as a text node at all.
This means that the childNodes collection will be different lengths in these different browsers. If you are trying to walk the DOM tree to the next element node, for example, it may be worth checking each entry in the childNodes collection to see if its nodeType is 1, or to use node.getElementsByTagName.
Because of this, and the fact that the structure of the DOM tree is designed to change as elements are moved, added or removed, the only reliable way to reference an element is using its ID:
var theParagraph = document.getElementById('id of element')
The first entry of the childNodes collection can be accessed using the shortcut firstChild, and the last can be accessed using lastChild. node.nextSibling references the next entry in the parent node's childNodes collection and node.previousSibling references the previous one. To reference the parent node, we use node.parentNode. Note also that all element nodes have the getElementsByTagName method to help reference elements within them. This means that from any node, it is possible to reference any of the other nodes around it.
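Because some browsers include white space text nodes and others do not, it can help to combine these shortcuts with a nodeType check. The following is only a sketch of that idea; the function names nextElement and firstElementChildOf are my own, not part of the DOM, and they only assume that each node has the nodeType, nextSibling and firstChild properties described above.

```javascript
// Return the next sibling that is an element node (nodeType 1),
// skipping text nodes and anything else in between.
// Returns null if there is no such sibling.
function nextElement(node) {
  var candidate = node.nextSibling;
  while (candidate && candidate.nodeType !== 1) {
    candidate = candidate.nextSibling;
  }
  return candidate || null;
}

// Return the first child that is an element node, or null.
function firstElementChildOf(node) {
  var candidate = node.firstChild;
  while (candidate && candidate.nodeType !== 1) {
    candidate = candidate.nextSibling;
  }
  return candidate || null;
}

// In a browser, this could be used as (hypothetical id):
//   var para = document.getElementById('id of element');
//   var following = nextElement(para);
```

This gives the same result whether or not the browser has created text nodes for the blank space between tags.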
Referencing the attribute node
- Tkhtml Hv3 does not support the attributes collection.
To reference the title='The test paragraph' attribute pair, we use the attributes collection. Depending on the browser, this collection may be filled up in a variety of different ways, and many empty attribute pairs may exist in the collection. To find the correct attribute, we search through the attributes collection for an attribute whose nodeName matches what we want. The nodeName may be in any case in HTML documents (typically upper case) and should be case sensitive in XHTML and XML if served using an XML based MIME type.
for( var x = 0; x < theParagraph.attributes.length; x++ ) {
  if( theParagraph.attributes[x].nodeName.toLowerCase() == 'title' ) {
    window.alert( 'The value of the \'title\' attribute is: ' +
      theParagraph.attributes[x].nodeValue );
  }
}
An easy way to check the attribute node
- NetFront gets the case wrong when retrieving attribute values (align is returned as 'Center' instead of 'center').
- Opera 7-8 will retrieve resolved values instead of specified values for attributes like 'href' and 'src'.
- Many browsers (particularly Internet Explorer 7-) will have trouble retrieving values for style and class, as well as event handlers.
If all you want to do is to check the value of an attribute, not manually edit its entry, it is easier to just use getAttribute.
window.alert( 'The value of the \'title\' attribute is: ' +
theParagraph.getAttribute('title') );
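Attribute names may be treated as case sensitive by some browsers. For example, bgcolor may need to be written as bgColor. In HTML documents, standards-compliant browsers ignore the case of the name.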
Note that according to the specification, getAttribute should always return a string. However, this makes it impossible to differentiate between empty attributes and unspecified attributes. For this reason, browsers will return null for unspecified attributes, even though this is wrong. Opera 7-8 returns an empty string; this was changed to null in Opera 9. As a result, code that checks for attributes by testing against null will fail in Opera 7 and 8, because a string will never equate to null. There is no need to test against null. Just check if getAttribute failed to retrieve a value:
if(!element.getAttribute('attribute_name'))
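As a minimal sketch of that check, the helper below returns a fallback value whenever getAttribute gives back null or an empty string. The name attributeOr and its fallback parameter are my own for illustration, not part of the DOM. It deliberately treats empty and unspecified attributes in the same way, which is the trade-off described above.

```javascript
// Return the attribute value, or the fallback if the browser reports
// null (most browsers) or an empty string (Opera 7-8) for it.
function attributeOr(element, name, fallback) {
  var value = element.getAttribute(name);
  return value ? value : fallback;
}

// In a browser, this could be used as:
//   var title = attributeOr(theParagraph, 'title', 'no title set');
```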
Changing the attribute
- Internet Explorer 7- (and some minor browsers) cannot set values for style, class or event handlers.
- Internet Explorer 8, and 9+ in quirks mode cannot set values for event handlers.
- Opera 7.0-7.1 could not set the align attribute.
The attributes of an element can be set or changed using setAttribute:
element.setAttribute('attributeName','attributeValue')
theParagraph.setAttribute('align','center')
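Attribute names may be treated as case sensitive by some browsers. For example, bgcolor may need to be written as bgColor. In HTML documents, standards-compliant browsers ignore the case of the name.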
You can also remove attributes, with a few exceptions, using removeAttribute:
theParagraph.removeAttribute('attributeName')
- NetFront 3.2- cannot move the existing paragraph to a new location (version 3.3 may also fail, if the device does not have much memory), but it gets the rest of the example right.
Reading and writing problematic attributes
Internet Explorer 7- (and some minor browsers) cannot set values for style, class or event handlers, using setAttribute. Internet Explorer 8 has fixed most of these, but still cannot set event handlers. Internet Explorer 9 can now set these attributes in standards mode. A few more browsers also have trouble reading these attributes using getAttribute. Internet Explorer generally returns the wrong type of result, such as an object instead of a string, when using getAttribute for these attributes. The DOM does provide a few alternatives to allow the functionality to be approximated in these browsers.
The class is available as a read/write string called className - this is discussed in the DOM CSS chapter of this tutorial.
The event handler attributes are available as referenced functions (this is not the case for handlers added using DOM events), with their names matching the attribute name; element.onclick. These are read/write but must be written as a reference to a function, not as a direct string. They can also be written as a string using the Function constructor:
element.onclick = new Function(codeAsAString);
They may also be read as a string using the toString method of the function, but note that it will normally contain the anonymous function wrapper, and may not be available at all in browsers running on devices with limited capabilities, such as mobile phones. Note also that it will not be available at all if the attribute is not present:
var functioncode = element.onclick.toString();
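Since calling toString on a handler that is not present will throw an error, it may be worth guarding the read. This is only a sketch; the function name handlerSource is hypothetical, and it assumes nothing more than the handler being exposed as a property of the element, as described above.

```javascript
// Return the source of an event handler property as a string,
// or an empty string if no handler function is assigned.
function handlerSource(element, handlerName) {
  var handler = element[handlerName];
  if (typeof handler !== 'function') {
    return '';
  }
  // Normally includes the anonymous function wrapper.
  return handler.toString();
}

// In a browser, this could be used as:
//   var functioncode = handlerSource(element, 'onclick');
```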
The string value of the style attribute is available as a read/write string called cssText, which is a property of the style object, which itself is a property of the element. Note, however, that it is not supported very well; Safari does not support it up to version 1.1 (reading it produced the value null), Mozilla versions prior to 1.0 could not write to it, and iCab 3-, NetFront and Escape/Evo do not support it at all. To avoid problems with its use, a combination of cssText and getAttribute/setAttribute can be used. To read it:
var cssString;
cssString = element.style.cssText;
if( typeof(cssString) != 'string' ) {
  cssString = element.getAttribute('style');
}
To write it, simply set both versions, and the browser will use whichever one works:
var cssString = 'color:lime;font-weight:bold;';
element.style.cssText = cssString;
element.setAttribute('style',cssString);
Note that this will then prevent it from being read correctly if other styles are changed individually. If this will cause a problem, check if cssText is supported first:
var cssString = 'color:lime;font-weight:bold;';
if( typeof(element.style.cssText) == 'string' ) {
  element.style.cssText = cssString;
}
element.setAttribute('style',cssString);
Referencing the text nodes
- Mozilla/Firefox/Netscape 6+ and Opera 9.2x- will split very long text nodes into multiple smaller text nodes.
To give a full example, I will try to reference the text node 'HTML you might'. To do this, I will go through the second entry of the childNodes collection of the 'P'. This will be a reference to the 'B'. I will then look at the firstChild (equivalent to the first entry in the childNodes collection) of the 'B' to reference the text node.
window.alert( 'The value of the text node is:\n' +
theParagraph.childNodes[1].firstChild.nodeValue );
Also important to note is that although the specifications say that no matter how much text exists between tags, it should all be in one text node, in practice this is not always the case. In Opera 7-9.2x and Mozilla/Netscape 6+, if the text is larger than a specific maximum size, it is split into multiple text nodes. These text nodes will be next to each other in the childNodes collection of the parent element.
In Opera 7-9.2x, this maximum text node size is 32 KB. In Mozilla/Firefox/Netscape 6+, it is 4 KB. Although the normalize() method of the parent node(s) should be able to replace the multiple text nodes with a single text node containing all the text, this only works in Mozilla/Firefox/Netscape 6+. In Opera 7-9.2x it puts all of the text into a single node and then truncates that node to 32 KB, so the contents of all except the first node are lost. Running the normalize method can crash Internet Explorer 6 and does not exist in Internet Explorer 5 on Windows.
For this reason, I do not recommend trying to normalize. It is better to manipulate the contents of text nodes separately. In fact, you can create your own text nodes and add them to the childNodes collection. Although to the DOM, they will still appear as separate nodes, they will appear as a single piece of text in the document. Basically, you need to be aware that your text may be split into several nodes, if it is 4 KB or over, or if you have added extra text nodes in yourself. In order to get that text in a single variable, you may need to look through every child node, and if they are consecutive text nodes append them together to get the total string.
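Appending consecutive text nodes together could be sketched as follows. The function name textRuns is my own, not part of the DOM. It only reads the childNodes collection, and the nodeType and nodeValue of each child, so it does not depend on normalize.

```javascript
// Look through every child node of the parent, and join each run of
// consecutive text nodes (nodeType 3) into a single string.
// Returns an array with one string per run.
function textRuns(parent) {
  var runs = [];
  var current = null; // text collected for the run in progress
  for (var i = 0; i < parent.childNodes.length; i++) {
    var child = parent.childNodes[i];
    if (child.nodeType === 3) {
      current = (current === null ? '' : current) + child.nodeValue;
    } else if (current !== null) {
      // A non-text node ends the current run.
      runs.push(current);
      current = null;
    }
  }
  if (current !== null) {
    runs.push(current);
  }
  return runs;
}

// In a browser, this could be used as:
//   var pieces = textRuns(theParagraph);
```

If a browser has split 'This is a sample of some ' into two adjacent text nodes, the first entry of the returned array will still contain the whole string.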
Changing the text of text nodes
- KHTML/WebKit Konqueror 3.4-, Safari 1.2- and OmniWeb 4.5-5.0 do not always reflow the page when changing the text of text nodes.
- Tkhtml Hv3 versions before September 2007 cannot change the value of existing text nodes.
Once you have a reference to the text node, you can read or write its contents using its nodeValue.
theParagraph.childNodes[1].lastChild.nodeValue = 'want to change';
Creating new nodes and removing existing ones
- NetFront 3.2- cannot create or insert new nodes (version 3.3 often crashes with scripts that use this, if it is running on a lower memory device).
This is what the DOM was truly created for. In order to create new nodes, we use a couple of methods of the document object to create the node. We then insert the node into the main DOM tree, at which point the browser will display it. We can also move existing nodes (like the test paragraph) simply by inserting them into the DOM tree somewhere else.
Note that when creating element nodes, the element name must be in lower case. Although in theory it should not be case sensitive with HTML, I have noticed some problems in Konqueror when using upper case with strict doctypes - see the top of this page. It will be case sensitive with XHTML (in all compliant browsers, not just Konqueror), and must be in lower case.
var theNewParagraph = document.createElement('p');
var theTextOfTheParagraph = document.createTextNode('Some content.');
theNewParagraph.appendChild(theTextOfTheParagraph);
document.getElementById('someElementId').appendChild(theNewParagraph);
We could also use insertBefore instead of appendChild, to place the new element before an existing entry in the childNodes collection instead of at the end. Using replaceChild, we could also overwrite existing nodes. It is also possible to copy a node using cloneNode(true). This returns a copy of the node but does not automatically add it into the childNodes collection. Using element.removeChild(referenceToChildNode), we can remove existing nodes.
Test it here: create a new paragraph.
How about something even more complicated? What about adding HTML elements within the new element, instead of just plain text? Here, I will recreate the test sentence from above, one piece at a time.
//three elements are required: p, b, br
var theNewParagraph = document.createElement('p');
var theBoldBit = document.createElement('b');
var theBR = document.createElement('br');
//set up theNewParagraph
theNewParagraph.setAttribute('title','The test paragraph');
//prepare the text nodes
var theText1 = document.createTextNode('This is a sample of some ');
var theText2 = document.createTextNode('HTML you might');
var theText3 = document.createTextNode('have');
var theText4 = document.createTextNode(' in your document');
//put together the bold bit
theBoldBit.appendChild(theText2);
theBoldBit.appendChild(theBR);
theBoldBit.appendChild(theText3);
//put together the whole paragraph
theNewParagraph.appendChild(theText1);
theNewParagraph.appendChild(theBoldBit);
theNewParagraph.appendChild(theText4);
//insert it into the document somewhere
document.getElementById('someElementId').appendChild(theNewParagraph);
Test it here: recreate the test paragraph.
In case you were wondering how I managed to make those new paragraphs end up just above the links you clicked on, this is how.
The link you clicked on is in a paragraph. The paragraph is in a div (although this technique would work anywhere). The script is run in the event handler for the link. Therefore, in the handler function, 'this' refers to the link. The parentNode of the link is the paragraph - this.parentNode - and the parentNode of the paragraph is the div - this.parentNode.parentNode. I want the div to insert the new paragraph node I have created just above the paragraph the link is in, so I want to say this:
theDIV.insertBefore(theNewParagraph,theCurrentParagraph);
In JavaScript, this would be:
this.parentNode.parentNode.insertBefore(theNewParagraph,this.parentNode);
As for making them disappear when you click on them, when creating these paragraphs, I also assign an onclick event handler function that uses this.parentNode to reference the div, and then uses removeChild to delete the paragraph:
theNewParagraph.onclick = function () { this.parentNode.removeChild(this); };
Note that nodes belong to the document they were created in. So for example, if your page uses frames, and you create a paragraph node in one frame then attempt to add it to the document in another frame, it will not work. In theory you can use the document.importNode method to create a copy of it in the new document, but that method does not exist in Internet Explorer. If a script in one frame needs to create a node and append it to a document in another frame, it must use the document object for the destination frame when creating the node:
var newP = parent.frames['leftfr'].document.createElement('p');
parent.frames['leftfr'].document.body.appendChild(newP);
Using document fragments
- Internet Explorer 5.x on Windows, NetFront 3.3- and Tkhtml Hv3 do not support document fragments.
- Internet Explorer 5 on Mac cannot add text nodes to document fragments, and cannot append the fragment's contents to a document.
It is also possible to deal with multiple nodes at the same time. Say for example that you want to create 10 paragraphs, and add them all to the document at the same time as each other, instead of one at a time. This can be done using a document fragment. The benefit of using this is that it creates fewer document reflows, and as a result it can improve performance for big changes.
A document fragment is like a normal element (such as a div), except that it cannot become a part of the document itself. If you try to append a document fragment to any part of a document, instead of appending the fragment, the browser will add the child nodes of the fragment. For example, you create 10 paragraphs, append them to a document fragment, then append the document fragment to the body of a document. Instead of appending the fragment to the body, it will add the 10 paragraphs as children of the body.
var frag = document.createDocumentFragment();
for( var i = 0, p; i < 10; i++ ) {
  p = document.createElement('p');
  p.appendChild(document.createTextNode('Paragraph '+(i+1)));
  frag.appendChild(p);
}
document.body.appendChild(frag);
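The same approach can be wrapped in a reusable function. This is only a sketch; the name appendParagraphs and its parameters are hypothetical. The document object is passed in as a parameter so that the nodes are created by the document they will be added to, which also matters when working with frames.

```javascript
// Create one paragraph per string, collect them in a document fragment,
// then add them all to the parent in a single appendChild call.
// Returns the number of paragraphs that were created.
function appendParagraphs(doc, parent, texts) {
  var frag = doc.createDocumentFragment();
  for (var i = 0; i < texts.length; i++) {
    var p = doc.createElement('p');
    p.appendChild(doc.createTextNode(texts[i]));
    frag.appendChild(p);
  }
  // The browser adds the children of the fragment, not the fragment itself.
  parent.appendChild(frag);
  return texts.length;
}

// In a browser, this could be used as:
//   appendParagraphs(document, document.body, ['First', 'Second', 'Third']);
```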
Last modified: 19 March 2011