KnowIT: i18n support for German Characters in Java

Problem: Character encoding issues with accented (non-ASCII) characters on the Java platform.

Symptoms: Umlauts and other accented characters such as ä, ö, ü in German, Spanish, or Portuguese text are garbled in the output stream. A Java application receives data over a socket using an InputStreamReader, and the reader reports "Cp1252" from its getEncoding() method:

/* java.net. */ Socket Sock = ...;

InputStreamReader is = new InputStreamReader(Sock.getInputStream());

System.out.println("Character encoding = " + is.getEncoding());

// Prints "Character encoding = Cp1252"

That doesn't necessarily match what the system reports as its code page. For example:

C:\>chcp

Active code page: 850

The application may receive the byte 0x81, which in code page 850 represents the character ü. The program interprets that byte using code page 1252, which doesn't define any character at that value, so the output shows a question mark instead.
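The mismatch can be reproduced without a socket. The following is a minimal sketch, assuming a JDK that ships the IBM850 charset (the Java name for code page 850); the class and method names are made up for illustration. It decodes the single byte 0x81 both ways and prints the resulting code points rather than the characters, so the result does not depend on the console's own code page.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the original application.
final class CodePageMismatchDemo {

    // Decode one byte using the named charset.
    static String decode(byte b, String charsetName) {
        return new String(new byte[] { b }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte received = (byte) 0x81;

        String as850 = decode(received, "IBM850");        // code page 850
        String as1252 = decode(received, "windows-1252"); // code page 1252

        // U+00FC is u-umlaut; U+FFFD is the Unicode replacement character.
        System.out.printf("IBM850:       U+%04X%n", (int) as850.charAt(0));
        System.out.printf("windows-1252: U+%04X%n", (int) as1252.charAt(0));
    }
}
```

Under code page 1252 the byte decodes to the replacement character U+FFFD, which a console that cannot display it typically renders as "?".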

Solution:

You can work around this problem by making the JVM's default encoding match code page 850, i.e., by adding another command-line option:

java.exe -Dfile.encoding=Cp850 ...

Or, to avoid hard-coding the value, put it in an environment variable first:

set ENC=...

java.exe -Dfile.encoding=%ENC% ...
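If you control the code that creates the reader, another option is to name the charset explicitly instead of relying on the JVM default. This is only a sketch under assumptions: the sender really does use code page 850, the JDK provides the IBM850 charset, and readAll is a hypothetical helper. A ByteArrayInputStream stands in for the socket's input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
final class ExplicitCharsetReader {

    // Read the whole stream, decoding bytes with the named charset
    // rather than the platform default.
    static String readAll(InputStream in, String charsetName) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, Charset.forName(charsetName))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for Sock.getInputStream(): 'f', u-umlaut (0x81 in cp850), 'r'.
        InputStream fake = new ByteArrayInputStream(new byte[] { 0x66, (byte) 0x81, 0x72 });
        String text = readAll(fake, "IBM850");

        // Print code points so the output is independent of the console encoding.
        for (char ch : text.toCharArray()) {
            System.out.printf("U+%04X ", (int) ch);
        }
        System.out.println();
    }
}
```

With the charset passed in, the reader decodes 0x81 as ü (U+00FC) regardless of what file.encoding is set to.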

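The same kind of mismatch can be seen at the command line when a file saved in one code page (here, judging by its name, Windows-1251) is displayed while a different code page is active: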

> chcp 850

Active code page: 850

> type 1251.txt

abcde xyz

ÓßÔÒõ ²■
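The garbled second line can be imitated in Java. This sketch assumes (it is not stated above) that 1251.txt holds the Cyrillic letters абвгд эюя encoded in Windows-1251; the class and method names are invented for illustration. The text is encoded with one charset and the same bytes are decoded with another, which is what the console does when the active code page differs from the file's encoding.

```java
import java.nio.charset.Charset;

// Hypothetical demo class for illustration only.
final class ConsoleGarbleDemo {

    // Encode text with one charset, then decode the same bytes with another.
    static String reinterpret(String text, String writtenAs, String displayedAs) {
        byte[] bytes = text.getBytes(Charset.forName(writtenAs));
        return new String(bytes, Charset.forName(displayedAs));
    }

    public static void main(String[] args) {
        // Assumed file content: Cyrillic a, be, ve, ghe, de, space, e, yu, ya
        // (written with escapes so the source file stays pure ASCII).
        String cyrillic = "\u0430\u0431\u0432\u0433\u0434 \u044D\u044E\u044F";

        String shown = reinterpret(cyrillic, "windows-1251", "IBM850");

        // Print code points so the result does not depend on the console.
        for (char ch : shown.toCharArray()) {
            System.out.printf("U+%04X ", (int) ch);
        }
        System.out.println();

        // Plain ASCII survives because both code pages agree below 0x80.
        System.out.println(reinterpret("abcde xyz", "windows-1251", "IBM850"));
    }
}
```

The code points printed for the first line correspond to ÓßÔÒõ ²■ followed by a non-breaking space, matching the console output above, while the ASCII line comes through unchanged.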

Some pointers relevant to this:

> http://illegalargumentexception.blogspot.com/2009/04/i18n-unicode-at-windows-command-prompt.html

> http://stackoverflow.com/questions/1336930/how-do-you-specify-a-java-file-encoding-value-consistent-with-the-underlying-wind

Friday, October 23, 2009
