The last time I needed to manipulate a large XML document, I
remembered Paul Graham's comment in
What Made Lisp Different
that programs communicating with s-expressions is an idea recently reinvented as XML.
I began to wonder whether I could just use s-expressions instead of having to
deal with XML.
Step 0: Define an s-expression representation for XML.
(tagname (@ attr "value" attr2 "value2")
  (tagname2)
  (tagname3 "data"))
If attributes are optional, an extra token (@) is needed
to distinguish the attribute list
from the first nested tag.
If attributes are mandatory, an extra token (nil)
is needed whenever a tag has
no attributes.
Most XML documents I've used have more tags
without attributes than with them, so I opted for
using @.
Since @ can't be a tag name, if the first thing in the
list (after the tag name) is a list whose car is @, then
that list holds the XML attributes for that tag.
I dubbed this representation SML (S-expression Meta Language).
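To make the attribute rule concrete, here is a minimal sketch in Python rather than Lisp. It assumes an SML node is mirrored as a nested Python list, with the string "@" standing in for the @ symbol; the names `split_node` and `ATTR_MARKER` are made up for this illustration and are not part of any of the tools mentioned in this post.

```python
# Hypothetical Python mirror of the SML layout: a node is a list
# [tag, optional ["@", name, value, ...], child, ...].
ATTR_MARKER = "@"


def split_node(node):
    """Return (tag, attrs, children) for one SML node."""
    tag, rest = node[0], node[1:]
    attrs = {}
    # The attribute list is only recognized directly after the tag name.
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == ATTR_MARKER:
        pairs = rest[0][1:]
        # Names and values alternate: attr "value" attr2 "value2"
        attrs = dict(zip(pairs[0::2], pairs[1::2]))
        rest = rest[1:]
    return tag, attrs, rest


doc = ["tagname", ["@", "attr", "value", "attr2", "value2"],
       ["tagname2"],
       ["tagname3", "data"]]

tag, attrs, children = split_node(doc)
```

A tag without attributes, such as `["tagname2"]`, simply yields an empty attribute dictionary, which is the case the @ convention is meant to keep cheap.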
UPDATE: I came up with a
simpler representation.
Step 1: Convert XML to s-expressions.
This seems like
a job for Perl. It's great at manipulating data formats.
So I wrote
xml2sexp.pl
which works great.
But it feels like a hack, because there
may be XML syntax that it doesn't handle. XSLT
was designed for transforming XML, so it's
a good fit for this job too. Naturally, I did some
Googling and found this
xml2sexp.xsl, but it's incomplete: it can't even convert itself.
So I decided to write my own. Yikes! Now I'm back to writing
XML, which I was trying to avoid! I can't think of a programming
language that is more unpleasant than XML. But it was a chance
to learn XSLT, so I wrote
xml2sexp.xsl too.
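For readers who want to see the shape of the conversion without reading Perl or XSLT, here is a rough Python sketch of the same idea. It is not a port of xml2sexp.pl or xml2sexp.xsl; it assumes the standard library's `xml.etree.ElementTree` parser, trims surrounding whitespace from text, and ignores comments, processing instructions, and namespace prefixes. The function names `element_to_sml` and `sml_to_text` are invented for this example.

```python
import xml.etree.ElementTree as ET


def element_to_sml(elem):
    """Convert an ElementTree element into a nested-list SML node."""
    node = [elem.tag]
    if elem.attrib:
        attr_list = ["@"]
        for name, value in elem.attrib.items():
            attr_list += [name, value]
        node.append(attr_list)
    # Text before the first child (whitespace-only text is dropped).
    if elem.text and elem.text.strip():
        node.append(elem.text.strip())
    for child in elem:
        node.append(element_to_sml(child))
        # Text that follows a child element inside the same parent.
        if child.tail and child.tail.strip():
            node.append(child.tail.strip())
    return node


def quote(text):
    """Render a Python string as a double-quoted s-expression string."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def sml_to_text(node):
    """Render a nested-list SML node as s-expression text."""
    if isinstance(node, str):
        return quote(node)
    tag, rest = node[0], node[1:]
    parts = [tag]
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == "@":
        pairs = rest[0][1:]
        attr_parts = ["@"]
        # Attribute names stay bare symbols; values become quoted strings.
        for name, value in zip(pairs[0::2], pairs[1::2]):
            attr_parts += [name, quote(value)]
        parts.append("(" + " ".join(attr_parts) + ")")
        rest = rest[1:]
    parts += [sml_to_text(child) for child in rest]
    return "(" + " ".join(parts) + ")"


xml_source = ('<tagname attr="value" attr2="value2">'
              '<tagname2/><tagname3>data</tagname3></tagname>')
sml = element_to_sml(ET.fromstring(xml_source))
print(sml_to_text(sml))
```

The output is printed on a single line, so it is correct but not pretty, which is the same complaint that motivates a pretty-printer.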
Step 2: Convert SML back to XML.
Now I'm in the Lisp world, so I can use my Lisp of choice,
which happens to be
Arc at the moment. So I wrote an Arc library,
sml.arc,
to convert SML back to XML. There's also a function to pretty-print
the SML, since the SML produced by the conversion from XML is
pretty ugly.
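The reverse direction is short enough to sketch. What follows is a Python illustration of the idea, not the code in sml.arc: it assumes SML nodes are nested Python lists with "@" marking the attribute list, and it leans on `escape` and `quoteattr` from the standard library's `xml.sax.saxutils` for escaping. The name `sml_to_xml` is made up for this example.

```python
from xml.sax.saxutils import escape, quoteattr


def sml_to_xml(node):
    """Serialize a nested-list SML node as an XML string."""
    if isinstance(node, str):
        # Character data: escape &, < and >.
        return escape(node)
    tag, rest = node[0], node[1:]
    attrs = ""
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == "@":
        pairs = rest[0][1:]
        # quoteattr adds the surrounding quotes and escapes the value.
        attrs = "".join(
            f" {name}={quoteattr(value)}"
            for name, value in zip(pairs[0::2], pairs[1::2])
        )
        rest = rest[1:]
    if not rest:
        return f"<{tag}{attrs}/>"
    body = "".join(sml_to_xml(child) for child in rest)
    return f"<{tag}{attrs}>{body}</{tag}>"


doc = ["tagname", ["@", "attr", "value", "attr2", "value2"],
       ["tagname2"],
       ["tagname3", "data"]]
print(sml_to_xml(doc))
```

Empty elements are written in the self-closing form, and no indentation is added, so the result is compact rather than readable.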
Adios, XML! I'll never need to deal with you again. I can
just use SML whenever I need to work with XML files.