Using TreeWalker to query non-element nodes

25 Feb 2021 treewalker

I was recently working on a project where we needed to frequently extract and then perform computation on all the text nodes in a DOM tree.

If it had been normal HTML elements we'd needed, we'd have had no problem - querySelectorAll() is great for that. But that can find only elements - not other types of node, e.g. comment nodes or text nodes.

What we needed was a TreeWalker.

TreeWalker?

As MDN puts it, a TreeWalker, which sounds like something out of A Game of Thrones:

...represents a subset of nodes and a current position within them.

Here's a simple example, to get all elements anywhere within body and add a class, "foo":

let walker = document.createTreeWalker(document.body, NodeFilter.SHOW_ELEMENT),
    currNode;
while(currNode = walker.nextNode()) currNode.classList.add('foo');

There, we create a new TreeWalker, telling it the container node to look within and what sort of nodes we're interested in (the latter takes the form of a static constant on the built-in NodeFilter object). To get HTML elements, we need SHOW_ELEMENT, but there are other options.

We can then iterate over the walker to do whatever we want with the captured elements, each time moving on to the next node via nextNode(). That returns null once there are no more nodes, which is what ends the while loop.

We don't have to go to the next node; there are a bunch of traversal methods available to TreeWalkers.
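As a rough sketch of a few of those other methods: firstChild(), nextSibling(), parentNode() and previousNode() are all standard TreeWalker methods, and each returns the node it moved to, or null if there was nowhere to go (in which case the walker stays where it was). The helper name tourWalker is made up purely for illustration.

```javascript
// Hypothetical helper (not from any library): tries a few TreeWalker
// traversal methods in turn and records the nodeName the walker landed
// on after each step, or null if that step had nowhere to go.
function tourWalker(root) {
    const walker = document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT);
    const name = node => (node ? node.nodeName : null);

    const steps = {};
    steps.firstChild = name(walker.firstChild());     // down to the first child element
    steps.nextSibling = name(walker.nextSibling());   // across to the next sibling element
    steps.parentNode = name(walker.parentNode());     // up to the parent element
    steps.previousNode = name(walker.previousNode()); // back one node in document order
    return steps;
}

// Usage (in a browser):
// console.log(tourWalker(document.body));
```

The walker keeps track of its position in currentNode, so each call starts from wherever the previous one left off.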

Selecting non-element nodes

So far, we've used TreeWalker to do a job that querySelectorAll('*') could have done, and could have done more succinctly - namely, to get all elements.

How would we adapt it to select, say, comment nodes, or text nodes? Simple - we just use a different constant for parameter 2:

NodeFilter.SHOW_COMMENT - for comment nodes
NodeFilter.SHOW_TEXT - for text nodes

...and there's a bunch of other possibilities, too, though some are deprecated.
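Here's a minimal sketch of that in practice. The helper name collectNodes is my own invention for illustration; the only real APIs used are createTreeWalker(), nextNode() and the NodeFilter constants.

```javascript
// Hypothetical helper: gathers every node under `root` that matches the
// given whatToShow value into a plain array.
function collectNodes(root, whatToShow) {
    const walker = document.createTreeWalker(root, whatToShow);
    const found = [];
    let node;
    // nextNode() returns null when the walker runs out of nodes
    while ((node = walker.nextNode())) found.push(node);
    return found;
}

// Usage (in a browser):
// const textNodes = collectNodes(document.body, NodeFilter.SHOW_TEXT);
// const comments = collectNodes(document.body, NodeFilter.SHOW_COMMENT);
//
// The constants are bit flags, so they can be combined with bitwise OR:
// const both = collectNodes(document.body, NodeFilter.SHOW_TEXT | NodeFilter.SHOW_COMMENT);
```

Bear in mind that SHOW_TEXT will also pick up whitespace-only text nodes (e.g. the line breaks between tags), so you may want to weed those out afterwards.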

An (inelegant) alternative for getting, say, all text nodes, would be a multi-loop situation like this:

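// Include body itself so its own direct text-node children aren't missed
let els = [document.body, ...document.body.querySelectorAll('*')],
    tNodes = [];
els.forEach(el =>
    // nodeType 3 is a text node (Node.TEXT_NODE)
    tNodes.push([...el.childNodes].filter(node => node.nodeType == 3))
);
// Flatten the per-element arrays into a single list of text nodes
tNodes = tNodes.flat();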

Applying filters

createTreeWalker() takes an optional, third param, where we can implement a node filter to filter the nodes. This is an object that must implement an acceptNode() method, where we do our filtering. So to accept only divs with the class "foo":

let walker = document.createTreeWalker(
    document.body,
    NodeFilter.SHOW_ELEMENT,
    {acceptNode: el => el.matches('div.foo')}
);

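(Actually, to be completely correct, our acceptNode method is supposed to return another static constant: NodeFilter.FILTER_ACCEPT, NodeFilter.FILTER_REJECT or NodeFilter.FILTER_SKIP. With a TreeWalker, FILTER_REJECT also rules out the rejected node's descendants, whereas FILTER_SKIP passes over only the node itself. Returning a boolean seems to work as well, but the constants make the intent explicit.)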
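For completeness, here's a sketch of the same filter written with the constants. The names divFooFilter and makeDivFooWalker are made up for illustration. I've assumed FILTER_SKIP is the right choice for non-matches, on the basis that we still want to find a div.foo nested inside an element that doesn't itself match.

```javascript
// Hypothetical filter object: accepts div.foo elements, skips the rest.
// FILTER_SKIP (rather than FILTER_REJECT) means the walker still looks
// inside non-matching elements for matching descendants.
const divFooFilter = {
    acceptNode(el) {
        return el.matches('div.foo')
            ? NodeFilter.FILTER_ACCEPT
            : NodeFilter.FILTER_SKIP;
    }
};

// Hypothetical wrapper: builds a walker over `root` using the filter above.
function makeDivFooWalker(root) {
    return document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT, divFooFilter);
}

// Usage (in a browser):
// const walker = makeDivFooWalker(document.body);
// let node;
// while ((node = walker.nextNode())) console.log(node);
```

If you swapped FILTER_SKIP for FILTER_REJECT here, any div.foo sitting inside a non-matching parent would never be reached.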

Performance

A quick note on performance. Various Stack Overflow answers suggest that TreeWalkers can, in some cases, be much faster than other node-retrieval/iteration approaches.

However, as with all questions of optimisation, results vary massively depending on a plethora of factors. I've done some basic benchmarking and found cases where TreeWalkers were slower than alternatives such as querySelectorAll(), and others where there was little difference.

It's something to bear in mind, though; if you've got expensive DOM traversal operations going on, a TreeWalker may offer some optimisation.
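If you want to measure this on your own pages, a crude timing harness along these lines might be a starting point. To be clear, this isn't the benchmark I ran; the function names (averageMs, viaTreeWalker, viaQuerySelectorAll) are invented for this sketch, and it relies only on the standard performance.now().

```javascript
// Hypothetical micro-benchmark helper: calls `fn` `runs` times and returns
// the average duration per call in milliseconds. Crude, but OK for a first look.
function averageMs(fn, runs = 100) {
    const start = performance.now();
    for (let i = 0; i < runs; i++) fn();
    return (performance.now() - start) / runs;
}

// Counts every element under `root` by stepping a TreeWalker.
function viaTreeWalker(root) {
    const walker = document.createTreeWalker(root, NodeFilter.SHOW_ELEMENT);
    let count = 0;
    while (walker.nextNode()) count++;
    return count;
}

// Counts every element under `root` via querySelectorAll().
function viaQuerySelectorAll(root) {
    return root.querySelectorAll('*').length;
}

// Usage (in a browser console):
// console.log('TreeWalker:', averageMs(() => viaTreeWalker(document.body)));
// console.log('querySelectorAll:', averageMs(() => viaQuerySelectorAll(document.body)));
```

Treat any numbers from something this simple with suspicion; they'll shift with the browser, the size and shape of the DOM, and what you actually do with each node.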

---

That's it; I hope you found this mini-guide useful!
