Problem: Issues with Double Byte support on the Java platform.
Symptoms: Accented characters, such as the German umlauts ä, ö, ü and the accented letters of Spanish and Portuguese, are garbled in the output stream. A Java application receives data over a socket through an InputStreamReader created without an explicit charset, so the reader falls back to the platform default. Its getEncoding() method reports "Cp1252":
/* java.net. */ Socket Sock = ...;
InputStreamReader is = new InputStreamReader(Sock.getInputStream());
System.out.println("Character encoding = " + is.getEncoding());
// Prints "Character encoding = Cp1252"
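That value doesn't necessarily match the code page the console reports. On Windows, Java's default encoding follows the system's ANSI code page (1252 on Western European systems), while chcp reports the console's OEM code page. For example: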
C:\>chcp
Active code page: 850
The application may receive the byte 0x81, which in code page 850 represents the character ü. The program decodes that byte with code page 1252, which doesn't define any character at that value, so a question mark appears instead.
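A minimal sketch of the mismatch, assuming a standard JDK build that includes the extended charsets (code page 850 is available under the alias "Cp850"). It decodes the same single byte 0x81 with both code pages. The class DecodeDemo and its helper decode are illustrative names, not part of any library:

```java
import java.nio.charset.Charset;

public class DecodeDemo {
    // Decode raw bytes with the named charset; bytes the charset does not
    // define are replaced with the Unicode replacement character.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0x81 };
        String asCp850 = decode(data, "Cp850");   // 0x81 is ü in code page 850
        String asCp1252 = decode(data, "Cp1252"); // 0x81 is undefined in code page 1252

        // Print code points in hex so the result does not depend on the console's code page.
        System.out.printf("Cp850:  U+%04X%n", (int) asCp850.charAt(0));
        System.out.printf("Cp1252: U+%04X%n", (int) asCp1252.charAt(0));
    }
}
```

With code page 850 the byte comes back as U+00FC (ü). With code page 1252 it typically comes back as the replacement character U+FFFD, which a code page 850 console cannot display and so prints as "?".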
Solution:
You can work around this problem by telling the JVM to use code page 850, that is, by adding another command-line option:
java.exe -Dfile.encoding=Cp850 ...
Or, in a batch file, keep the encoding in a variable:
set ENC=...
java.exe -Dfile.encoding=%ENC% ...
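A related approach, sketched here as an alternative rather than the post's own fix, is to name the charset when constructing the InputStreamReader, so decoding no longer depends on file.encoding at all. The helper openReader is hypothetical, and a ByteArrayInputStream stands in for Sock.getInputStream() so the sketch runs without a network peer. It assumes the sender writes code page 850:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class ExplicitCharsetDemo {
    // Hypothetical helper: wrap any byte stream (e.g. a socket's input stream)
    // in a reader that uses an explicit charset instead of the JVM default.
    static BufferedReader openReader(InputStream in, String charsetName) {
        return new BufferedReader(new InputStreamReader(in, Charset.forName(charsetName)));
    }

    // Read one line from the given bytes using the named charset.
    static String readLine(byte[] bytes, String charsetName) throws IOException {
        try (BufferedReader reader = openReader(new ByteArrayInputStream(bytes), charsetName)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // "Grün" as a code page 850 sender would encode it: 0x81 is ü.
        byte[] wire = { 'G', 'r', (byte) 0x81, 'n' };
        String text = readLine(wire, "Cp850");

        // Shows what -Dfile.encoding (or the platform) set as the default.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("third char = U+" + Integer.toHexString(text.charAt(2)));
    }
}
```

Passing the charset explicitly keeps the decoding correct regardless of how the JVM was launched. The -Dfile.encoding option instead changes the default for every reader and writer in the process.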
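The console shows the same effect. Here the file 1251.txt is encoded in code page 1251 (Cyrillic), but type displays its bytes using the active code page, 850, so the Cyrillic letters come out as unrelated Latin characters: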
> chcp 850
Active code page: 850
> type 1251.txt
abcde xyz
ÓßÔÒõ ²■
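The type output above can be reproduced in Java by encoding text with one code page and decoding the same bytes with another. This is a sketch under an assumption: the Cyrillic sample "абвгд эю" is inferred from the displayed characters (bytes E0 to E4, 20, FD, FE in code page 1251), not taken from the original file. Unicode escapes keep the source independent of javac's encoding, and misread is an illustrative name:

```java
import java.nio.charset.Charset;

public class CodePageMismatchDemo {
    // Encode text with one code page, then decode those bytes with another,
    // which is what happens when a file's encoding and the console disagree.
    static String misread(String text, String writtenAs, String readAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(readAs));
    }

    public static void main(String[] args) {
        // Assumed sample: "абвгд эю" written as Unicode escapes.
        String cyrillic = "\u0430\u0431\u0432\u0433\u0434 \u044D\u044E";
        String shown = misread(cyrillic, "windows-1251", "Cp850");

        // Print code points so the check does not depend on the console itself.
        shown.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

The decoded string is ÓßÔÒõ ²■, the same characters the console prints for the second line of 1251.txt. The ASCII first line is unaffected because both code pages agree on bytes below 0x80.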
Some pointers relevant to this:
> http://illegalargumentexception.blogspot.com/2009/04/i18n-unicode-at-windows-command-prompt.html
> http://stackoverflow.com/questions/1336930/how-do-you-specify-a-java-file-encoding-value-consistent-with-the-underlying-wind