Changing the encoding of strings?

Called with no arguments, getBytes() always encodes the string in the platform’s default charset, which is probably UTF-8 for you. The documentation says:

"Encodes this String into a sequence of bytes using the platform’s default charset, storing the result into a new byte array."
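To check which charset that actually is on a given machine, something like the following sketch may help. The class name DefaultCharsetDemo and the helper usesDefault are made up for illustration; Charset.defaultCharset() is the standard JDK call.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class DefaultCharsetDemo {
    // Hypothetical helper: true if the no-arg getBytes() produces the same
    // bytes as explicitly encoding with the platform's default charset.
    static boolean usesDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // The result depends on the JVM version and configuration.
        System.out.println("Default charset: " + Charset.defaultCharset());
        // "\u00f3" is ó, written as an escape so the source file's own
        // encoding cannot affect the result.
        System.out.println(usesDefault("\u00f3"));
    }
}
```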

So you are essentially trying to decode UTF-8 bytes with charsets other than UTF-8. No wonder you don’t get the expected results.
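A minimal sketch of that mismatch, assuming the bytes were produced as UTF-8 (the class name MismatchDemo and the helper roundTrip are illustrative, not from the question):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MismatchDemo {
    // Hypothetical helper: encode with one charset, decode with another.
    static String roundTrip(String s, Charset encodeWith, Charset decodeWith) {
        return new String(s.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String s = "\u00f3"; // ó, escaped so the source file encoding is irrelevant

        // UTF-8 encodes ó as the two bytes C3 B3. ISO-8859-1 decodes every
        // byte as one character, so the result is the two characters Ã and ³.
        String garbled = roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());  // 2

        // With matching charsets the original string comes back.
        String intact = roundTrip(s, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(s));  // true
    }
}
```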

Although the round trip is rather pointless, you can get what you want by passing the desired charset to getBytes as well, so that the string is encoded with the same charset you then decode it with.

    System.out.println(new String("ó".getBytes(StandardCharsets.UTF_16), StandardCharsets.UTF_16));
    System.out.println(new String("ó".getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1));
    // US-ASCII cannot represent ó, so getBytes substitutes '?' and this line prints "?"
    System.out.println(new String("ó".getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII));
    System.out.println(new String("ó".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8));
    System.out.println(new String("ó".getBytes(StandardCharsets.UTF_16BE), StandardCharsets.UTF_16BE));
    System.out.println(new String("ó".getBytes(StandardCharsets.UTF_16LE), StandardCharsets.UTF_16LE));

You also seem to have some misunderstanding about encodings. An encoding is not just about the number of bytes a character takes. Two encodings that use the same number of bytes per character are not necessarily compatible with each other, because they can map the same character to different byte values. Also, UTF-8 does not always use one byte per character. It is a variable-length encoding: ASCII characters take one byte, but ó, for example, takes two.
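A short sketch may make this concrete. It counts the encoded bytes of a few characters and shows that an equal byte count does not imply equal bytes (the class name ByteCountDemo and its helpers are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteCountDemo {
    // Hypothetical helper: number of bytes s occupies in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    // Hypothetical helper: true if both charsets produce identical bytes for s.
    static boolean sameBytes(String s, Charset a, Charset b) {
        return Arrays.equals(s.getBytes(a), s.getBytes(b));
    }

    public static void main(String[] args) {
        // UTF-8 is variable-length: a, ó and € take 1, 2 and 3 bytes.
        System.out.println(encodedLength("a", StandardCharsets.UTF_8));       // 1
        System.out.println(encodedLength("\u00f3", StandardCharsets.UTF_8));  // 2
        System.out.println(encodedLength("\u20ac", StandardCharsets.UTF_8));  // 3

        // ó is two bytes in both UTF-8 (C3 B3) and UTF-16BE (00 F3),
        // yet the byte values differ, so the encodings are not interchangeable.
        System.out.println(encodedLength("\u00f3", StandardCharsets.UTF_16BE)); // 2
        System.out.println(sameBytes("\u00f3", StandardCharsets.UTF_8,
                                     StandardCharsets.UTF_16BE));              // false
    }
}
```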
