I was recently working with a set of files that contained non-English Unicode characters, processing the data with XSLT 2.0 and XQuery 1.0. I was using the Thin Client for XML, part of the XML Feature Pack, which offers J2SE and command-line invocation options for XSLT and XQuery when used in a WebSphere environment.
I did something like:
.\executeXSLT.bat -input input.xml stylesheet.xslt > temp.xml
.\executeXQuery.bat -input temp.xml query.xq > final.xml
And this resulted in something like:
... executeXSLT "works" fine ...
... executeXQuery "fails" with ...
An invalid XML character (Unicode: 0x[8D,3F,E6,8D]) was found in the element content of the document.
I figured something was wrong with the encoding in the XSLT output method, with the XML encoding of the files themselves, or, worse yet, with our processor. After some quick thinking, my excellent team had me replace the output redirection (where my OS and console get a chance to see and mess with the data between the processor and temp.xml) with the -outputfile option (which lets the processor write directly to the file), like:
.\executeXSLT.bat -input input.xml -outputfile temp.xml stylesheet.xslt
.\executeXQuery.bat -input temp.xml -outputfile final.xml query.xq
Problem solved. No corruption of the data.
Lesson learned: keep all the data inside the processor, and don't introduce anything into the pipeline (like the Windows console) that won't honor (or doesn't know) the encoding.
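To illustrate the kind of damage I mean, here is a minimal Python sketch. It is not a confirmed diagnosis of what my console did: it assumes the intermediate stage re-encodes text in a legacy code page (cp1252 is my guess for illustration) and replaces anything it cannot represent with '?'. The function names and the sample string are hypothetical.

```python
def simulate_console_redirect(text, console_encoding="cp1252"):
    """Simulate a stage that re-encodes text in a legacy code page (assumed cp1252)."""
    # Characters the code page cannot represent are replaced with '?' (0x3F).
    return text.encode(console_encoding, errors="replace")


def write_directly(text):
    """Simulate the processor writing its own UTF-8 output straight to a file."""
    return text.encode("utf-8")


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: "café" followed by three Japanese characters.
sample = "<doc>caf\u00e9 \u65e5\u672c\u8a9e</doc>"

redirected = simulate_console_redirect(sample)
direct = write_directly(sample)

print("redirected bytes valid UTF-8:", is_valid_utf8(redirected))
print("direct bytes valid UTF-8:", is_valid_utf8(direct))
```

In this sketch the redirected bytes no longer decode as UTF-8, so a parser that trusts a UTF-8 declaration would reject them, while the directly written bytes round-trip unchanged. That matches the spirit of the fix above: let the processor write the file itself.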