Sunday, May 30, 2010

The Cure for XML in Web 2.0?

Earlier, I blogged about the Pain of XML in Web 2.0 and alluded to not being happy with the answer I ended up with. I'm happy to say that I'll finally be talking about a possible solution. As you can see here, I have submitted a paper to the Balisage 2010 Conference, entitled "Where XForms meets the glass: Bridging between data and interaction design", along with Charles Wiecha and Rahul Akolkar.

Here is the abstract:

XForms offers a model-view framework for XML applications. Some developers take a data-centric approach, developing XForms applications by first specifying abstract operations on data and then gradually giving those operations a concrete user interface using XForms widgets. Other developers start from the user interface and develop the MVC model only as far as is needed to support the desired user experience. Tools and design methods suitable for one group may be unhelpful (at best) for the other. We explore a way to bridge this divide by working within the conventions of existing Ajax frameworks such as Dojo.


Interested? Let me know and we can get a review copy of the paper to you. I have talked to many clients who want to integrate their metadata-driven, XML-dominant data with their Web 2.0 work in Dojo and who run into the impedance-mismatch wall. Hopefully that wall will be coming down soon.

BTW, if you'd like to attend this conference to hear about this topic and many others on the jam-packed agenda, here is a great link to use to convince your management to let you join us in Montreal.

Thursday, May 27, 2010

XQuery: Powerful, Simple, Cool .. "Demo"

At IBM Impact this year, I gave talks about the XML Feature Pack as well as a basic introduction to XPath 2.0, XSLT 2.0, and XQuery 1.0. I think one of the most useful parts of my talk was when I demoed code in XQuery. People really saw the light (how simple and fully featured XQuery is) once they saw the code in a useful application. Also, people experienced with XPath 1.0 appreciated the new features, and people with XSLT 1.0 experience appreciated the syntax (closer to imperative coding). The application I used in the demo was the download stats program I have blogged about before. Let me take a second to do the same "demo" here.

First, I have an XML input file of all the downloads over a certain time period. That XML file could come from a web service, a JMS message, or be loaded from an XML database. The data looks something like:


<?xml version="1.0" encoding="UTF-8"?>
<downloads>
    <download>
        <transaction>1</transaction>
        <userid>user1</userid>
        <uniqueCustomerId>uid-1</uniqueCustomerId>
        <filename>xml_and_import_repositories.zip</filename>
        <name>Mr. Andrew Spyker</name>
        <email>user@email.com</email>
        <companyname>IBM</companyname>
        <datedownloaded>2009-11-20</datedownloaded>
    </download>
    <!-- more download records repeating -->
</downloads>


First I want to quickly get rid of all downloads that have "education" in the filename. Next I want to split the downloads that come from IBM'ers (email or company has some version of IBM in it) from the downloads that come from clients. Of those groups, I want to quickly group repeat downloaders (by uniqueCustomerId). I won't include it here, but I've shown how to write some of this with Java and DOM in the past. Suffice it to say that that code is very complex (imagine all the loops through the data you'd write for each of these steps). Let's look at these steps in XQuery:


(: Quickly get rid of education downloads :)
declare variable $allNonEducationDownloads := /downloads/download[not(contains(filename, '/education/'))];

(: Split the IBM downloads from non-IBM downloads :)
declare variable $allIBMDownloads :=
    $allNonEducationDownloads[contains(upper-case(email), 'IBM')] |
    $allNonEducationDownloads[contains(upper-case(companyname), 'IBM')] |
    $allNonEducationDownloads[contains(upper-case(companyname), 'INTERNATIONAL BUSINESS MACHINES')];

(: Get the unique IBM downloader ids :)
declare variable $allIBMUniqueIds := distinct-values($allIBMDownloads/uniqueCustomerId);

(: Get the non-IBM downloads :)
declare variable $allNonIBMDownloads := $allNonEducationDownloads except $allIBMDownloads;

(: Get the unique non-IBM downloader ids :)
declare variable $allNonIBMUniqueIds := distinct-values($allNonIBMDownloads/uniqueCustomerId);


I think the most powerful line of the above code is the one with the "except" expression. In that one line, I can express that we want to take all the downloads and remove the IBM downloads, which leaves us with the non-IBM downloads. I think it's quite impressive that XQuery expresses the above in about the same number of lines as the English I used to describe the requirements.
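If you haven't seen "except" before, it works on sequences of nodes by node identity. Here's a tiny standalone illustration of what it does (this snippet is just for illustration, not part of the download program):

(: 'except' keeps the nodes in the left sequence that aren't in the right one :)
let $all := (<d id="1"/>, <d id="2"/>, <d id="3"/>)
let $some := $all[@id = "2"]
return $all except $some
(: result: the d elements with id 1 and 3 :)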

Additionally, since you are telling the runtime what you want to do instead of how you want to do it, our runtime can aggressively optimize the data access in ways that we couldn't if we had to try to understand what Java byte codes were doing on top of the DOM programming model. Also, since XQuery is functional (the above variables are final), we could spread this work across multiple cores more safely than with imperative code, as we can guarantee there are no side effects. This is why, as a performance guy, I think declarative languages are a key to the future of performance.

Back to the code. For people used to XPath 1.0 and its lack of built-in schema types, dealing with things as simple as dates was problematic (they were just strings). Here are a few functions that show that, with schema awareness, XPath 2.0 and XQuery 1.0 are much more powerful than before:


declare function my:downloadsInDateRange($downloads, $startDate as xs:date, $endDate as xs:date) {
    $downloads[xs:date(datedownloaded) >= $startDate and xs:date(datedownloaded) <= $endDate]
};

declare function my:codeDownloadsInDateRange($downloads, $startDate as xs:date, $endDate as xs:date) {
    let $onlyCodeDownloads := my:onlyCodeDownloads($downloads)
    return my:downloadsInDateRange($onlyCodeDownloads, $startDate, $endDate)
};


These two functions give me a quick way to look for "code" downloads within a date range. In the first function, it's very easy to see that it takes the downloads and returns only the subset whose datedownloaded falls on or after the start date and on or before the end date. In the second function, you can see it's easy to call the first function. At this point, I think most Java programmers might be saying "this isn't what I expected based on my previous work with XSLT". While XSLT is a great language for transformation (XSLT 2.0 even better), I think XQuery gets a little closer to a general-purpose language, with the ability to declare functions and variables in a more terse syntax.
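One note: the my:onlyCodeDownloads helper called above isn't shown in this post. A minimal sketch of what it might look like, assuming a "code" download is one whose filename contains 'repositories' (the same test used later in my:downloadsByUniqid), would be:

(: hypothetical helper: keep only the downloads that look like code (repository zips) :)
declare function my:onlyCodeDownloads($downloads) {
    $downloads[contains(filename, 'repositories')]
};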

Finally, let's cover two more important and powerful features - FLWOR expressions and output construction. Once I have sliced and diced the data, I need to output it as an XML report. XQuery gives you a very nice way to mix XML and declarative code, as shown below:


declare function my:downloadsByUniqid($uniqid, $downloads) {
    for $id in $uniqid
    let
        $allDownloadsByUniqueId := $downloads[uniqueCustomerId = $id],
        $allCodeDownloadsByUniqueId := $downloads[uniqueCustomerId = $id and (contains(filename, 'repositories'))]
    return
        <downloadById id="{ $id }" codeDownloads="{ count($allCodeDownloadsByUniqueId) }">
            <name>{ data($allDownloadsByUniqueId[1]/name) }</name>
            <companyName>{ data($allDownloadsByUniqueId[1]/companyname) }</companyName>
            <codeDownloads>
            {
                for $download in $allCodeDownloadsByUniqueId
                order by $download/datedownloaded
                return
                    <download>
                        <filename>{ data($download/filename) }</filename>
                        <datedownloaded>{ data($download/datedownloaded) }</datedownloaded>
                    </download>
            }
            </codeDownloads>
        </downloadById>
};



This shows how you can create new XML documents and quickly mix in XQuery code. Some people I've talked to think this looks like a scripting language in terms of simplicity. Also, you'll see a for ($id in $uniqid), a let ($allDownloadsByUniqueId and others), and a return (the downloadById element). These three parts make up part of what people call FLWOR (pronounced "flower"), which stands for for, let, where, order by, return. The FLWOR expression is a very powerful construct -- able to do all the sorts of joins of data you're used to in SQL (a small sketch of a join follows below) -- but in this example I've chosen to show how it can simplify code in the general case where joining data wasn't the focus. For Java people, think of it as a much more powerful looping construct that integrates all the power of SQL for XML.
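To give a feel for the SQL-like join I mentioned, here is a small sketch of a FLWOR expression joining the downloads against a second document. The regions.xml document and its region/company/name elements are hypothetical -- they are not part of the download report program:

(: hypothetical join: pair each non-IBM download with a sales region by company name :)
for $download in $allNonIBMDownloads,
    $region in doc("regions.xml")/regions/region
where $region/company = $download/companyname
order by $region/name, $download/datedownloaded
return
    <regionalDownload region="{ data($region/name) }">
        <company>{ data($download/companyname) }</company>
        <filename>{ data($download/filename) }</filename>
    </regionalDownload>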

In the end, I have a 200-line program that takes all the download reports, organizes them by unique IBM vs. unique non-IBM ids, and produces a month-by-month summary. I'd be surprised if you could come up with anything shorter and more maintainable using Java and DOM. I hope this "demo" encourages you to consider using XQuery in your next project where you need to work with data.
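Since XQuery 1.0 has no group-by clause, the month-by-month piece can be done with distinct-values over the year-month portion of the download date. This is just a rough sketch of that idea, not the exact code from the 200-line program:

(: hypothetical sketch: group downloads by year-month, e.g. "2009-11" :)
declare function my:downloadsByMonth($downloads) {
    for $month in distinct-values(
        for $d in $downloads return substring(string($d/datedownloaded), 1, 7))
    order by $month
    return
        <month id="{ $month }" downloads="{ count($downloads[starts-with(string(datedownloaded), $month)]) }"/>
};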

Finally, if you find people trying to convince you that XQuery isn't capable enough to be a general-purpose language, take a look at a complete ray tracer written in XQuery in a mere 300 lines of code (a real statement of XQuery's power and brevity).

PS. You can download this XQuery program here and some sample input here. You can run them by getting the XML Feature Pack thin client here. The thin client is a general-purpose, Java-based XQuery processor that you can use for evaluation and in production when used with the WebSphere Application Server. All you need to do is download the thin client, unzip it, and run the command below:


.\executeXQuery.bat -input downloads-fake.xml summary.xq

Wednesday, May 26, 2010

Why the -outputfile switch in the XML Thin Client is useful

A simple tip...

I was recently working with a set of files that contained non-English Unicode characters, trying to process the data with XSLT 2.0 and XQuery 1.0. I was using the Thin Client for XML that is part of the XML Feature Pack, which offers J2SE and command-line invocation options for XSLT and XQuery when used in a WebSphere environment.

I did something like:


.\executeXSLT.bat -input input.xml stylesheet.xslt > temp.xml
.\executeXQuery.bat -input temp.xml query.xq > final.xml


And this resulted in something like:


... executeXSLT "works" fine ...
... executeXQuery "fails" with ...
An invalid XML character (Unicode: 0x[8D,3F,E6,8D]) was found in the element content of the document.
An invalid XML character (Unicode: 0x[8D,3F,E6,8D]) was found in the element content of the document.


I figured something was wrong with the encodings in the XSLT output method, or with the xml encoding of the files themselves, or -- worse yet -- with our processor. After some quick thinking, my excellent team had me replace the output redirection (where my OS and console got a chance to see/mess with the data between the processor and temp.xml) with the -outputfile option (which lets the processor write directly to the file), like:


.\executeXSLT.bat -input input.xml -outputfile temp.xml stylesheet.xslt
.\executeXQuery.bat -input temp.xml -outputfile final.xml query.xq


Problem solved. No corruption of the data.

Lesson learned: keep all the data inside the processor and don't introduce things (like the Windows console) into the pipeline that won't honor (or even know about) the encoding.

Monday, May 3, 2010

New CEA demo videos..

Here are some more demos of the WebSphere Application Server Feature Pack for Communications Enabled Applications (CEA). We'll be using some of these in the IBM Impact 2010 sessions that I referenced here.

The first one shows some of the contact center widgets (like click-to-call and then cobrowsing) on the iPhone:



Here is coshopping between a user on an iPhone and a user on a desktop:


The next is a shorter, HD version of our JavaScript widget walkthrough:

Saturday, May 1, 2010

WWW2010/FutureWeb Conference Summary

I had the opportunity to attend the FutureWeb part of the WWW2010 Conference this past week in Raleigh, NC. This conference was quite amazing, both in its scope/influence and in the fact that it was in my hometown.

I was able to hear technical giants like Sir Tim Berners-Lee, Vint Cerf, Danah Boyd, and Doc Searls. I was also able to meet up with many people locally (including Paul Jones) as well as folks from across the world working to move the internet into the future.

The content was as technical as it was social and political. While it was interesting to hear about the Semantic Web and HTML5 and all the cool new areas for search/data mining, it was equally valuable to hear about the impact the Web is having on education, healthcare, and media, to name a few. I also heard about the work many of the conference attendees are doing to change government processes for the better, and how involved that can be, with the web spanning countries in ways no other technology can or does.

Some reflections on the technical content:

1) Facebook was bashed (a lot). I actually learned that, yet again, Facebook had opted me into sharing information without my understanding. The key takeaway from all of this bashing was that Facebook (and all web technologies) have become a critical part of our culture. The information we all produce to create value for sites like Facebook/Twitter/etc. needs to be treated with care. Marketing folks salivate at the opportunities that this community-created content provides. However, just because we can share and use such data in ways that benefit our companies doesn't mean we should.

2) Adobe/Apple were bashed (a lot). The value of open standards on the web is clear. Some of the stories shared by the panelists were quite interesting -- talk about how the internet was just a radical idea that would never compete with the "serious networks" of the time proves how valuable standards can be and how they have, and will continue to, change the world.

3) There was a great presentation by Carl Malamud on "Rules for Radicals" that documented 10 rules for making large changes to government and technology, but the rules apply equally well elsewhere - I can apply them to working within a large corporation. Note that while takeaways #1 and #2 got a lot of press, the fact is there were many iPads, MacBooks, and Facebook-borne meetups. Carl's presentation showed that we need to work to effect change from within these communities. Here is a quick video summary of the rules.

4) I've had it on my TODO list for some time now to look at the building blocks of the Semantic Web. I needed to understand how RDF/RDFa and SPARQL relate to XML and XQuery. I'm starting to form some opinions now, based on what I heard at the conference and the work I've done this week playing with the technologies. I can say with certainty that this Web 3.0 (the web for machines, vs. Web 2.0, which was the web for humans) and its related technologies - RDF and SPARQL - are not going away. I can also say that RDF/SPARQL doesn't compete with XML/XQuery. I can see that we'll need to bridge the gap between these worlds as we look to unleash not only the XML stored in many enterprises but also relational data. We'll also need to do this quickly, as this world is moving fast, and those who don't embrace Web 3.0 will be as left behind as those who are still moving towards Web 2.0. An example of this speed that impressed me was the creation of a Facebook Open Graph Protocol vocabulary that was peer-edited during a session on Thursday and then live by Friday. Amazing.

5) Twitter is a business tool. I've known this for some time and had success stories, but given the audience of this conference (passionate web technologists), I saw the value of Twitter magnified by at least an order of magnitude. Every academic attendee was communicating via Twitter. I used it to find the IBM attendees and collaborate with them in ways I'm sure I would have missed otherwise. I used it to meet people I had never met before (it even led to a lunch out with Doc Searls and Kathy Gill, and another with a local company that is working with SIP technologies). If it weren't for Twitter, I'd say the value of the collaboration at this conference would have been decreased by that same order of magnitude. Another funny story that proves Raleigh is well connected: a fight broke out on Twitter between two bars trying to earn our patronage for a dinner on the town. If you're a business that isn't paying attention to Twitter, are you losing the cost of a few beers or worse?

I'm sure there are more takeaways I'll remember, but for now that's a good starting point. If you were at FutureWeb and had other big takeaways, post them in the comments.

PS. I got to meet a bunch of great local XML/XQuery folks at the XQuery meet-up I organized. I look forward to collaborating with these folks locally in the future.