Friday, February 26, 2010

The pain of XML in Web 2.0

I have continued to think about the end-to-end XML story across the database, the middle tier, and the browser client. I have talked to many organizations that work with standard industry XML documents (HL7, OAGIS, ACORD, etc.) where the XML view unifies the data of their entire enterprise. These organizations carry that XML data model from the message queues through data storage to the middle tier. But what does it look like when they want to expose this data to the web tier? Some products handle this well, such as Lotus Forms, which is based on XML-centric standards like XForms. But what about Web 2.0 libraries like Dojo or jQuery?

I took the download sample I described here and tried to visualize its data in Web 2.0 web pages. The format of the XML of interest is:



First, I looked at the Dojo bar chart code and wrote a server-side XQuery program that generated HTML containing something like the following inside the JavaScript:



This works because XQuery can return a sequence of primitive types. In this case, I'm just returning a string and inserting it into the JavaScript code that expects value/text values. But what if I want to have a REST endpoint serve up XML directly and have the browser consume it?

The Dojo DataGrid can read from a DataStore, which can be hooked to an XmlStore. This means I can use a browser-side control to read from my server-side XML. All seems good until you get into the details. Here are snippets of the code to make this "work":



What are some of the issues with this? First, the XmlStore has to map the XML data to a simpler format for the DataGrid to understand. That is why I had to manually tell the XmlStore to promote all the attribute values to similarly named elements. Second, the XmlStore does let you drill down to something other than the root item for the data, but it only lets you pick the name of an element (you'll see I specified "month"). For any complex industry-specific data, that likely wouldn't be sufficient. What if I had multiple month elements in different parts of the XML tree? I'd end up with a table that combined months that meant different things. What I'd really want is XPath as the root selector. Third, even though the Store abstraction is nice for handling multiple data formats, if I wanted to combine data from different parts of the XML tree, or from multiple trees, what I would really like is XPath in the DataGrid formatter function itself.
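The name-versus-path problem can be sketched in plain JavaScript without any library. This is only an illustration, not how the XmlStore is implemented: the tree below is a hypothetical in-memory stand-in for an XML document, and the element names "downloadStats" and "forecast", the attribute names, and the helper functions selectByName and selectByPath are all assumptions made up for the example. Only "month" and "monthByMonthDownloadStats" come from the sample discussed in this post.

```javascript
// Hypothetical stand-in for an XML document (names and values are assumptions).
const doc = {
  name: "downloadStats",
  attrs: {},
  children: [
    {
      name: "monthByMonthDownloadStats",
      attrs: {},
      children: [
        { name: "month", attrs: { name: "2010-01", downloads: "120" }, children: [] },
        { name: "month", attrs: { name: "2010-02", downloads: "95" }, children: [] },
      ],
    },
    {
      name: "forecast",
      attrs: {},
      children: [
        { name: "month", attrs: { name: "2010-03", expected: "110" }, children: [] },
      ],
    },
  ],
};

// Name-based selection: collect every descendant with the given element
// name, no matter where it sits in the tree.
function selectByName(node, name) {
  const hits = [];
  for (const child of node.children) {
    if (child.name === name) hits.push(child);
    hits.push(...selectByName(child, name));
  }
  return hits;
}

// Path-based selection: follow an explicit list of child element names
// from the root, similar to a simple XPath child-axis path.
function selectByPath(node, path) {
  let current = [node];
  for (const step of path) {
    current = current.flatMap((n) => n.children.filter((c) => c.name === step));
  }
  return current;
}

// Selecting by name mixes actual and forecast months into one result.
const byName = selectByName(doc, "month");

// Selecting by path returns only the months under monthByMonthDownloadStats.
const byPath = selectByPath(doc, ["monthByMonthDownloadStats", "month"]);
```

Here byName holds three elements with two different meanings, while byPath holds only the two download months, which is the behavior an XPath root selector would give.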

Assuming this might be easier in the other very popular JavaScript query library, I went off and investigated jQuery. I quickly found articles that talked about jQuery and XML. I patterned the next part of the article after this example. So, rewriting, I ended up with:



Now, with jQuery, I'm actually able to do a little more "native" XML query. You'll see that I can access attributes directly. You'll see that I can navigate only to the months or the monthByMonthDownloadStats. However, as someone who knows XQuery, this syntax seems very unnatural (I'm sure it's very clear to JavaScript and/or CSS writers). Unnaturalness aside, this seems more verbose. In XQuery I can write this like:



With this I get all of the same benefits that jQuery has, plus more: I'm almost sure jQuery wouldn't support the rich Functions and Operators of XPath 2.0 or the mixed XML content common in document-centric XML approaches. In my opinion, XQuery mixes the construction of the content with the query of the input much better (I believe that if we showed a date comparison, for example, jQuery would come off even worse). Of course, the benefit of jQuery over XQuery is that XQuery doesn't run in the browser. I had to run the previous XQuery sample on the server. That is a pretty big benefit.

I think the summary of all of this, if you stayed with me this long, is that Web 2.0 technology in the browser isn't really ready to handle the complex XML documents that exist within most enterprises. This means that if you want to marry Web 2.0 with the enterprise XML data, you'll need to write data conversions that simplify the data, essentially extending the presentation tier across the browser and middle tier, or use a feature like the Web 2.0 Feature Pack to do this for you. Also, you'll need to learn two languages (arguably three if you consider jQuery a language) and programming styles when dealing with the XML data.
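To give a feel for the kind of data conversion meant here, the following is a minimal sketch in plain JavaScript. It is not what the Web 2.0 Feature Pack does; it only assumes that the month elements have already been selected from the enterprise XML and are represented as simple objects. The attribute names and values, and the helper toRow, are hypothetical.

```javascript
// Hypothetical month elements, already selected from the enterprise XML.
// The attribute names and values are assumptions for illustration.
const months = [
  { name: "month", attrs: { name: "2010-01", downloads: "120" } },
  { name: "month", attrs: { name: "2010-02", downloads: "95" } },
];

// Convert one element into a flat row: each attribute becomes a field of
// the same name, and purely numeric strings become numbers.
function toRow(element) {
  const row = {};
  for (const [key, value] of Object.entries(element.attrs)) {
    row[key] = /^-?\d+(\.\d+)?$/.test(value) ? Number(value) : value;
  }
  return row;
}

// The simplified rows are what a grid or chart in the browser would consume.
const rows = months.map(toRow);
const json = JSON.stringify(rows);
```

The point of the sketch is that this simplification step is extra presentation-tier code that has to live somewhere, whether in the middle tier or in the browser.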

Given I look at WebSphere XML Strategy, I'm not sure I'm happy with this answer. I am currently looking towards other solutions to this issue. Given I'm rather new to Web 2.0, feel free to point out other things I didn't consider in the Web 2.0 space for XML processing (outside of XForms of course).

8 comments:

Anonymous said...

As far as I understand, XQuery in the Browser (http://www.xqib.org/) would solve the problem.

Andrew Spyker said...

@Anonymous

Yes. I intend to talk about other options outside of JavaScript libraries eventually. However, while XQIB might make this better, it's not available by default in every browser (as JavaScript is):

From http://www.zorba-xquery.com/index.php/xquery-in-the-browser-xqib/

"XQIB is a browser plugin which embeds Zorba"

Andrew Spyker said...

Interesting related discussion:

http://n3.nabble.com/Using-dojo-query-with-XML-files-td419926.html

Andrew Spyker said...

Another conversation on this blog on XMLToday.org:

http://www.xmltoday.org/content/pain-xml-web-20

Joel said...

It's very possible the enterprise space is a lot different from the consumer space, but I don't know anyone that's developing RIAs and sending XML to the browser - most are sending JSON. I think I'd rather write a streaming XML->JSON transcoder and front my documents that way, as dealing with XML in the client is a PITA even with the help of Dojo/jQuery.

And you make a good point that using one of those libraries really is like learning another language - it takes a definite time investment to understand those libraries and use them well enough to avoid pitfalls.

Andrew Spyker said...

@Joel

I think it's less about enterprise vs. consumer space. It's more about whether you're looking at the problem top down or bottom up. I think if you start by developing the presentation and drive that down, you'll be sending JSON and likely adapting that data to whatever enterprise data exists. If instead you already have a well-established data model and are adding a presentation interface, you would like to send parts of that data to the client and process it with a data model similar to the existing data. However, due to the pain, you can't, and you end up creating not only an enterprise set of services for interconnecting servers, but also another set of services that "transcode" the full data model to the subset needed in the presentation.

Anonymous said...

The Zorba XQuery Processor provides a JSon serializer and functions which automatically transform XDM to JSon(ML). See http://www.zorba-xquery.com/doc/zorba-latest/zorba/html/converters.html for more information.

Andrew Spyker said...

@Anonymous re: Zorba

I don't think Zorba's JSON support really adds much to the situation I described -- trying to address XML data from the popular JavaScript Web 2.0 libraries. As I mentioned in the article, the Web 2.0 Feature Pack provides something similar on the server side, but I was trying to look at ways to do this under Web 2.0 libraries in the browser.