Wednesday, May 26, 2010

Why the -outputfile switch in XML Thin Client is useful

A simple tip...

I was recently working with a set of files that contained non-English Unicode characters, processing the data with XSLT 2.0 and XQuery 1.0. I was using the Thin Client for XML, which is part of the XML Feature Pack and offers J2SE and command-line invocation options for XSLT and XQuery when used in a WebSphere environment.

I did something like:


.\executeXSLT.bat -input input.xml stylesheet.xslt > temp.xml
.\executeXQuery.bat -input temp.xml query.xq > final.xml


And this resulted in something like:


... executeXSLT "works" fine ...
... executeXQuery "fails" with ...
An invalid XML character (Unicode: 0x[8D,3F,E6,8D]) was found in the element content of the document
.
An invalid XML character (Unicode: 0x[8D,3F,E6,8D]) was found in the element content of the document
.


I figured something was wrong with the encoding in the XSLT output method, or with the XML encoding of the files themselves, or (worse yet) with our processor. After some quick thinking, my excellent team had me replace the output redirection (where my OS and console got a chance to see and mess with the data between the processor and temp.xml) with the -outputfile option (which lets the processor write directly to the file), like:


.\executeXSLT.bat -input input.xml -outputfile temp.xml stylesheet.xslt
.\executeXQuery.bat -input temp.xml -outputfile final.xml query.xq


Problem solved. No corruption of the data.

Lesson learned: keep all the data inside the processor, and don't introduce things (like the Windows Console) into the pipeline that won't honor (or know) the encoding.
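To make the failure mode concrete, here is a minimal Python sketch of what can happen when text passes through a layer that uses a different encoding than the file claims. It is not what the Thin Client or the Windows Console literally does. The sample text, the function name through_console, and the choice of cp1252 as the console code page are all assumptions for illustration, since the original data and console settings are not shown here.

```python
# Sketch only: simulates a pipeline stage that re-encodes text with a
# legacy code page. cp1252 is an assumed console encoding, and the sample
# strings are hypothetical.

def through_console(text: str, console_encoding: str = "cp1252") -> bytes:
    """Return the bytes that land in the file if a console-like layer
    re-encodes the text, substituting '?' for unrepresentable characters."""
    return text.encode(console_encoding, errors="replace")


def is_valid_utf8(data: bytes) -> bool:
    """Check whether the bytes can be read back as UTF-8, as a parser would
    try to do for a file declaring encoding="UTF-8"."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


cjk = "<name>日本語</name>"
accented = "<name>café</name>"

# Written directly to the file: the UTF-8 bytes round-trip cleanly.
print(cjk.encode("utf-8").decode("utf-8") == cjk)

# Characters the code page cannot represent are silently replaced.
print(through_console(cjk))

# Characters the code page can represent come out as non-UTF-8 bytes,
# so a reader expecting UTF-8 rejects the file.
print(is_valid_utf8(through_console(accented)))
```

Either outcome (lost characters or bytes that no longer match the declared encoding) is enough to make the next step in the pipeline fail, which is why writing the file directly from the processor avoids the problem.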
