getBytes() with no arguments always returns the bytes encoded in the platform’s default charset, which is probably UTF-8 for you.
From the documentation of getBytes(): "Encodes this String into a sequence of bytes using the platform’s default charset, storing the result into a new byte array."
So you are essentially trying to decode a bunch of UTF-8 bytes with non-UTF-8 charsets. No wonder you don’t get expected results.
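To make the mismatch concrete, here is a minimal sketch that encodes "ó" as UTF-8 explicitly (standing in for what getBytes() does on a UTF-8 platform) and then decodes those bytes with other charsets. The class and method names (MismatchDemo, decodeUtf8BytesAs) are made up for illustration, and the character is written as the escape \u00f3 so the example does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MismatchDemo {
    // Hypothetical helper: encode as UTF-8, then decode with the given charset.
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8); // "ó" becomes 0xC3 0xB3
        return new String(utf8, charset);
    }

    public static void main(String[] args) {
        String s = "\u00f3"; // ó

        // Same charset on both sides: the round trip gives back the original.
        System.out.println(s.equals(decodeUtf8BytesAs(s, StandardCharsets.UTF_8))); // true

        // ISO-8859-1 maps each byte to one character: 0xC3 -> Ã, 0xB3 -> ³
        System.out.println(decodeUtf8BytesAs(s, StandardCharsets.ISO_8859_1));

        // UTF-16BE reads the two bytes as the single code unit U+C3B3.
        System.out.println(decodeUtf8BytesAs(s, StandardCharsets.UTF_16BE));
    }
}
```

None of the mismatched decoders fails. Each one simply interprets the same two bytes under its own rules, which is why you see unexpected characters rather than an exception.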
Though kind of pointless, you can get what you want by passing the desired charset to getBytes, so that the string is encoded with the same charset that you then decode it with:
// Requires: import java.nio.charset.StandardCharsets;
System.out.println(new String("ó".getBytes(StandardCharsets.UTF_16), StandardCharsets.UTF_16));
System.out.println(new String("ó".getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1));
// US-ASCII cannot represent ó, so getBytes substitutes '?' and this line prints "?".
System.out.println(new String("ó".getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII));
System.out.println(new String("ó".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8));
System.out.println(new String("ó".getBytes(StandardCharsets.UTF_16BE), StandardCharsets.UTF_16BE));
System.out.println(new String("ó".getBytes(StandardCharsets.UTF_16LE), StandardCharsets.UTF_16LE));
You also seem to have some misunderstanding about encodings. An encoding is not just about the number of bytes that a character takes. Two encodings using the same number of bytes for a character doesn’t mean that they are compatible with each other. Also, it is not always one byte per character in UTF-8. UTF-8 is a variable-length encoding that uses one to four bytes per code point.
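As a rough illustration of both points, the sketch below prints the encoded bytes of a few characters under different charsets. The class name ByteCountDemo and the helpers encodedLength and hex are invented for this example. Characters are written as Unicode escapes to avoid depending on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteCountDemo {
    // Hypothetical helper: number of bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Hypothetical helper: space-separated uppercase hex dump of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF)); // mask so the byte is treated as unsigned
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // UTF-8 is variable-length: 1, 2 and 3 bytes for these three characters.
        System.out.println(encodedLength("a", StandardCharsets.UTF_8));      // 1
        System.out.println(encodedLength("\u00f3", StandardCharsets.UTF_8)); // 2 (ó)
        System.out.println(encodedLength("\u20ac", StandardCharsets.UTF_8)); // 3 (euro sign)

        // Same byte count for ó, completely different bytes.
        System.out.println(hex("\u00f3".getBytes(StandardCharsets.UTF_8)));    // C3 B3
        System.out.println(hex("\u00f3".getBytes(StandardCharsets.UTF_16BE))); // 00 F3
    }
}
```

Both UTF-8 and UTF-16BE use two bytes for ó, yet the bytes differ, so decoding one with the other cannot give back the original character.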