Step 0: Define an s-expression representation for XML.
(tagname (@ attr "value" attr2 "value2") (tagname2) (tagname3 "data"))
If the attributes are optional, then that requires an extra token (@) to distinguish between attributes and the first nested tag.
If the attributes are not optional, then that requires an extra token (nil) when there are no attributes specified.
In most XML documents I've used, tags without attributes outnumber tags with them, so I opted for @.
Since @ can't be a tag name, if the first thing in the list (after the tag name) is a list whose car is @ then it is the XML attributes for that tag. I dubbed this representation SML (S-expression Meta Language).
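To make the rule concrete, here is a minimal sketch in Python (not the Arc or Perl code from this post). It assumes an SML node is modeled as a nested Python list, and the helper name `split_node` is made up for illustration.

```python
def split_node(node):
    """Split an SML node into (tag, attributes, children).

    Assumed encoding: [tag, optional ["@", name1, value1, ...], child, ...].
    """
    tag, rest = node[0], node[1:]
    attrs = {}
    # The attribute list, if present, is the first item after the tag name
    # and is itself a list whose first element (its car) is "@".
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == "@":
        pairs = rest[0][1:]
        attrs = dict(zip(pairs[0::2], pairs[1::2]))
        rest = rest[1:]
    return tag, attrs, rest


sml = ["tagname", ["@", "attr", "value", "attr2", "value2"],
       ["tagname2"], ["tagname3", "data"]]
tag, attrs, children = split_node(sml)
```

Because @ is never a valid tag name, a single check on the first item is enough, and tags without attributes carry no extra token.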
UPDATE: I came up with a simpler representation.
Step 1: Convert XML to s-expressions.
This seems like a job for Perl. It's great at manipulating data formats. So I wrote xml2sexp.pl, which works great.
But it seems like a hack because there might be some XML syntax that it doesn't handle. XSLT was designed for transforming XML so it's a good choice for this also. So of course, I did some Googling and found this xml2sexp.xsl, but it's not complete. It can't even convert itself. So I decided to write my own. Yikes! Now I'm back to writing XML, which I was trying to avoid! I can't think of a programming language that is more unpleasant than XML. But it was a chance to learn XSLT, so I wrote xml2sexp.xsl too.
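For comparison, the same conversion can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree` parser. This is not xml2sexp.pl or xml2sexp.xsl, only an illustration of the idea. The function names `quote` and `to_sml` are made up, and the sketch assumes whitespace-only text can be dropped. It ignores comments, processing instructions, and namespace prefixes.

```python
import xml.etree.ElementTree as ET


def quote(text):
    """Render text as a double-quoted s-expression string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sml(elem):
    """Convert an ElementTree element to an SML string."""
    parts = [elem.tag]
    if elem.attrib:
        # Attributes go in a leading (@ ...) list, only when present.
        pairs = " ".join(f"{name} {quote(value)}" for name, value in elem.attrib.items())
        parts.append(f"(@ {pairs})")
    if elem.text and elem.text.strip():
        parts.append(quote(elem.text.strip()))
    for child in elem:
        parts.append(to_sml(child))
        # Text that follows a child element is stored in child.tail.
        if child.tail and child.tail.strip():
            parts.append(quote(child.tail.strip()))
    return "(" + " ".join(parts) + ")"


doc = '<tagname attr="value" attr2="value2"><tagname2/><tagname3>data</tagname3></tagname>'
print(to_sml(ET.fromstring(doc)))
# prints: (tagname (@ attr "value" attr2 "value2") (tagname2) (tagname3 "data"))
```

Relying on a real XML parser avoids the worry about unhandled syntax, at the cost of depending on whatever that parser chooses to expose.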
Step 2: Convert SML back to XML.
Now I'm in the Lisp world, so I can use my Lisp of choice, which happens to be Arc at the moment. So I wrote an Arc library, sml.arc, to convert SML back to XML. There's also a function to pretty-print the SML, since the SML created by the conversion from XML is pretty ugly SML.
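The reverse direction is a short recursive walk. The sketch below is in Python rather than Arc and is not the code in sml.arc. It assumes SML nodes are nested Python lists with strings as text, and the function name `to_xml` is made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(node):
    """Convert an SML node (nested list) or a text string to an XML string."""
    if isinstance(node, str):
        return escape(node)  # escapes &, < and > in text content
    tag, rest = node[0], node[1:]
    attrs = ""
    # A leading list whose first element is "@" holds the attributes.
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == "@":
        pairs = rest[0][1:]
        attrs = "".join(f" {name}={quoteattr(value)}"
                        for name, value in zip(pairs[0::2], pairs[1::2]))
        rest = rest[1:]
    if not rest:
        return f"<{tag}{attrs}/>"  # no children: self-closing tag
    body = "".join(to_xml(child) for child in rest)
    return f"<{tag}{attrs}>{body}</{tag}>"


sml = ["tagname", ["@", "attr", "value", "attr2", "value2"],
       ["tagname2"], ["tagname3", "data"]]
print(to_xml(sml))
# prints: <tagname attr="value" attr2="value2"><tagname2/><tagname3>data</tagname3></tagname>
```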
Adios, XML! I'll never need to deal with you again. I can just use SML whenever I need to work with XML files.