Writing XML as UTF-8 successfully


Today I ran into a very interesting problem when trying to rewrite an XML file.

I have three ways to do this, and I want to know which is best and what causes the problem.

I.

File file = new File(REAL_XML_PATH);
try {
    FileWriter fileWriter = new FileWriter(file);
    XMLOutputter xmlOutput = new XMLOutputter();

    xmlOutput.output(document, System.out);
    xmlOutput.output(document, fileWriter);

    fileWriter.close();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

In this case I have a big problem with my app. After writing text in my own language to the file, I can't read anything back: the file's encoding has changed to ANSI, and I get this error:

javax.servlet.ServletException: javax.servlet.jsp.JspException: Invalid argument looking up property: "document.rootElement.children[0].children"

II.

File file = new File(REAL_XML_PATH);
XMLOutputter output = new XMLOutputter();
try {
    output.output(document, new FileOutputStream(file));
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

In this case I have no problems. The encoding doesn't change, and both reading and writing work.

I also found this article: http://tripoverit.blogspot.com/2007/04/javas-utf-8-and-unicode-writing-is.html




Well, this looks like the problem:

FileWriter fileWriter = new FileWriter(file); 

That will always use the platform default encoding, which is rarely what you want. Suppose your default encoding is ISO-8859-1. If your document declares itself to be encoded in UTF-8, but you actually write everything in ISO-8859-1, then your file will be invalid if it contains any non-ASCII characters: you'll end up writing them out with their single-byte ISO-8859-1 representation, which isn't valid UTF-8.
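To make the mismatch concrete, here is a small sketch using only the JDK (the class and method names are hypothetical, chosen for illustration). It encodes é with ISO-8859-1, as a default-encoding FileWriter would on such a platform, and then decodes those bytes as UTF-8, which is what a reader trusting the XML declaration does.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Hypothetical helper: encode with ISO-8859-1, then decode as UTF-8,
    // mimicking a file written in one charset but read in another.
    static String misreadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        // This String constructor replaces malformed UTF-8 input with U+FFFD.
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // U+00E9, e with acute accent (non-ASCII)

        // ISO-8859-1 uses one byte (0xE9); UTF-8 uses two (0xC3 0xA9).
        System.out.println(text.getBytes(StandardCharsets.ISO_8859_1).length); // 1
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);      // 2

        // A lone 0xE9 is not a valid UTF-8 sequence, so the character is lost.
        System.out.println(misreadAsUtf8(text).equals("\uFFFD")); // true

        // Plain ASCII is identical in both encodings, which is why the bug
        // only shows up once non-ASCII text is written.
        System.out.println(misreadAsUtf8("abc")); // abc
    }
}
```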

I would actually provide a stream to XMLOutputter rather than a Writer. That way there's no room for conflict between the encoding declared by the file and the encoding used by the writer. So just change your code to:

FileOutputStream fileOutput = new FileOutputStream(file);
...
xmlOutput.output(document, fileOutput);

... as I now see you've done in your second bit of code. So yes, this is the preferred approach. Here, the stream makes no assumptions about the encoding to use, because it's just going to handle binary data. The XML writing code gets to decide what that binary data will be, and it can make sure that the character encoding it really uses matches the declaration at the start of the file.
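JDOM isn't part of the JDK, so the sketch below swaps in the JDK's built-in javax.xml.transform serializer to illustrate the same principle rather than XMLOutputter itself: the serializer is handed a raw OutputStream, so it alone chooses the bytes, and the UTF-8 it declares is the UTF-8 it emits. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class StreamOutputDemo {
    // Serialize to a byte stream: the same encoding setting drives both the
    // XML declaration and the actual bytes, so they cannot disagree.
    static void write(Document doc, OutputStream out) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    // Build a tiny document containing a non-ASCII character.
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        root.setTextContent("h\u00e9llo");
        doc.appendChild(root);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(sample(), buffer);
        // Decoding as UTF-8 recovers the text intact.
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```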

You should also clean up your exception handling - don't just print a stack trace and continue on failure, and call close in a finally block instead of at the end of the try block. If you can't genuinely handle an exception, either let it propagate up the stack directly (potentially adding throws clauses to your method) or catch it, log it and then rethrow either the exception or a more appropriate one wrapping the cause.
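A minimal sketch of that advice, again using the JDK's javax.xml.transform serializer in place of JDOM and a hypothetical saveXml helper: try-with-resources (Java 7+) closes the stream on every path, much like close() in a finally block; IOException propagates through the throws clause; and the serializer's checked TransformerException is wrapped in an IOException that keeps the original as its cause.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class SaveXmlDemo {
    // Hypothetical helper: no printStackTrace-and-continue. I/O failures
    // propagate; serializer failures are wrapped with their cause attached.
    static void saveXml(Document doc, Path path) throws IOException {
        // The stream is closed automatically, even if transform() throws.
        try (OutputStream out = Files.newOutputStream(path)) {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.transform(new DOMSource(doc), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IOException("Could not write XML to " + path, e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("root")).setTextContent("caf\u00e9");

        Path file = Files.createTempFile("demo", ".xml");
        saveXml(doc, file);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

The caller then decides how to report a failed save, instead of the save method silently carrying on with a half-written file.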


If I remember correctly, you can make your XMLOutputter use a "pretty" format with new XMLOutputter(Format.getPrettyFormat()), so approach I should work too.

Format.getPrettyFormat() is documented as follows:

Returns a new Format object that performs whitespace beautification with 2-space indents, uses the UTF-8 encoding, doesn't expand empty elements, includes the declaration and encoding, and uses the default entity escape strategy. Tweaks can be made to the returned Format instance without affecting other instances.