Thursday, February 7, 2008

Programming functionally: A proto-XPath implementation in Javascript

This has been sitting on my hard drive for a while. I thought I'd clean it up and post it! Well ... at least I've posted it.


If you don't know what "functional programming" is, wikipedia has a fairly technical overview. Loosely, "object" is to "object-oriented programming" as "function" is to "functional programming".

I wrote this to give a taste of the kind of power functional programming can bring to bear on a problem, as well as to document the basics of my psuedo-xpath implementation for myself. Some love to hate Javascript, but I think this comes more from the hostile browser environment than the language, which is quite nice. Note that, while I'm heavily using Javascript's functional features, I'm not making any effort to be 'purely functional'. (If I were using a purist style, I would not be using variable assignments and avoiding for/while loops, instead using recursion and if statements to the same effect.)

JS makes it quite easy to pass functions to as arguments, one of the core distinctions of programming in a functional style:
function doSomethingWithPi(arg) {
arg(3.1415);
}
function eat(x) {
alert(x + " sure was yummy!");
}
doSomethingWithPi(eat); // ------> "3.1415 sure was yummy!"

Simple enough. Now consider:
function siblingAxis(f,node) {
var v = node.nextSibling;
while (v != null) {
f(v);
v = v.nextSibling;
}
}

This function, when given a function & a DOM node, calls that function for every sibling of that node (until out of siblings). If you're familiar with the XPath spec, you already know why the function is called "siblingAxis". Here's another, very similar function -- it calls the provided function once for each child of the given node:
function childAxis(f,node) {
var v = node.firstChild;
if (v != null) {
f(v); // call the function for the first child node
siblingAxis(f,v); // do the same for all its siblings...
}
}

And here's one that takes two functions; the first acts as a test for the given DOM node and the second is only called if the test succeeds with the given node.
function test(test_f, f, node) {
if (test_f(node)) {
f(node);
}
}

Now, if we could wire our test function up to our childAxis function, we would have the beginnings of something fairly powerful. The trick is to curry the function test. It's easier to show what this means rather than try to explain it.
function test(test_f, f) {
return function(node) {
if (test_f(node)) {
f(node);
}
}
}

Now, test(test_f,action_f) does not perform its test or action right away. Instead it returns a new function, which expects a node and in turn runs the test against the node, and the other function if it's successful. Javascript implicitly keeps test_f and action_f arguments intact and available in the returned function. This is called a closure in FP jargon. By currying test, we can wire it up with e.g., childAxis:
function isLink(node) {
return (node.nodeType == 1 && node.localName.toLowerCase() == "a");
}

function doSomethingWithAllLinkChildren(f, node) {
childAxis(test(isLink,f), node); // easy peasy
}

It only makes sense to do the exact same thing with the axis functions themselves. This means that each axis function takes a "do something with nodes on this axis" function, and returns a function that, from a starting node, actually traverses the given axis and does the something.
function siblingAxis(f) {
return function(node) {
var v = node.nextSibling;
while (v != null) {
f(v);
v = v.nextSibling;
}
}
}

function childAxis(f) {
return function(node) {
var v = node.firstChild;
if (v != null) {
f(v); // call the function for the first child node
siblingAxis(f)(v); // siblingAxis is curried: siblingAxis(f,v) becomes siblingAxis(f)(v)
}
}
}

What have we gained from this? Well, here's a function findem that alerts with the children of each span found within a div child of the given node. In xpath terms, we're after "/div/span/*" (with the starting node treated as the root):
function nameTest(nm) {
return function(node) {
return (node.nodeType == 1 && node.localName.toLowerCase() == nm.toLowerCase());
}
}

function nodealert(node) {
alert("found " + node.localName);
}

function findem(node) {
var path = childAxis(test(nameTest("div"),(childAxis(test(nameTest("span"),childAxis(nodealert)))));
path(node);
}

Let's make another small change to test, currying it again:
function test(test_f) {
return function(f) {
return function(node) {
if (test_f(node)) {
f(node);
}
}
}
}

Now, we can use a handy bit of non-javascript notation and say that f ⋅ g is shorthand for function(x) { f(g(x)) }, and rewrite findem in this style:
function findem(node) {
var path = childAxis⋅test(nameTest("div"))⋅childAxis⋅test(nameTest("span"))⋅childAxis⋅nodealert;
path(node);
}

Observe the wonderful correspondence between our functions and the various components of the path they represent ("/div/span/*"):
/     ---> childAxis
div ---> test(nameTest("div"))
/ ---> childAxis
span ---> test(nameTest("span"))
/ ---> childAxis
* ---> (n/a)


At this point, having XPath (or something like it) up and running in javascript is a very straightforward job.

  • Each axis (child, sibling, descendant, etc) can be represented by : function someAxis(f) { return function(node) { /* traverse the axis from the node, calling 'f' along the way */ } }

  • Each test (both "node tests" and other predicates) can represented as a function that takes a node and returns a boolean, then gets wrapped by test. We'd need to write helper functions such as nameTest above to construct the actual test functions.

  • The input path is parsed and translated into a list of axis and test functions. Note that all of them have the form function(f) { return function(node) { /*...*/ } }

  • Combine the list of functions into the form a(b(c(d(ff))) where ff is the 'action' function that you want to be called for any nodes returned by the path. This has the effect of calling the outer function(f) { /*...*/} for each node step with f as the next node step, and results in a function awaiting a node...

  • Call it with the root node; your ff will be called for every succeeding node!


Click here to see the complete example of the proto-XPath implementation that I've been using along the way. It adds parent,self, and descendant axes; it also insert an implicit test at each step of evaluation to ensure that each node is visited only once. (This prevents repeat visits during paths like "//div//div"). The parsing code is hokey but works.


Notes:

  • I'm using the term "currying" a bit loosely to refer both to the transformation of a function f(a,b,c) { /* ... */ } into f(a) { return f(b) { return f(c) { /*...*/ } } }, and to calling a function written in this style with partial arguments to return a new function, as in f(a_argument)(b_argument).


  • The code I've written is not purely functional, nor even close to it: However, it makes good use of some functional programming features, and is a nice functional/imperative compromise for javascript.


  • There is still quite a bit of work to get a full implementation of XPath up and running, as XPath is a large language. However, I believe the remaing work -- if you were interested in a full implementation of XPath -- consists mostly of straightforward extension of the sketch laid out here.

No comments: