Identical strings from different data files won’t match in R

This was a great puzzler. To diagnose the problem, charToRaw() was the answer:

> charToRaw(q)
 [1] 31 39 33 31 c2 a0 38 30 74 68 c2 a0 41 6e 6e 69 76 65
[19] 72 73 61 72 79
> charToRaw(z)
 [1] 31 39 33 31 20 38 30 74 68 20 41 6e 6e 69 76 65 72 73
[19] 61 72 79

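Oh! Different! Where z has an ordinary space (byte 20), q has the two bytes c2 a0, which is the UTF-8 encoding of a non-breaking space (U+00A0). The two look identical when printed, and given that both came from plain ole’ CSVs I loaded, I never would have guessed. The encoding marks differ too: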

> Encoding(q)
[1] "UTF-8"
> Encoding(z)
[1] "unknown"
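
For anyone who wants to reproduce this without the original files, here is a minimal sketch. The values of q and z below are my reconstruction from the byte dumps above, not the actual CSV contents, and nbsp_positions is just an illustrative name.

```r
# Hypothetical reconstruction of the two strings from the byte dumps above.
q <- "1931\u00a080th\u00a0Anniversary"  # non-breaking spaces (U+00A0)
z <- "1931 80th Anniversary"            # ordinary spaces (U+0020)

q == z               # FALSE, even though both print the same

# The bytes at positions 5 and 6 of q are the UTF-8 form of U+00A0.
charToRaw(q)[5:6]    # c2 a0

# Locate the offending characters by code point (160 is U+00A0).
nbsp_positions <- which(utf8ToInt(q) == 160L)
nbsp_positions       # 5 10
```

Looking at code points with utf8ToInt() is sometimes easier than reading raw bytes, since each character maps to exactly one number.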

In the end, I used iconv() on q to make it work:

> iconv(q, from = 'UTF-8', to = 'ASCII//TRANSLIT') == z
[1] TRUE
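
An alternative worth considering, since the help page for iconv() describes //TRANSLIT as implementation-dependent and it would also rewrite any other non-ASCII characters: replace only the non-breaking spaces. This is a sketch, not what I originally ran; q and z are reconstructed from the byte dumps above and q_clean is an illustrative name.

```r
# Hypothetical reconstruction of the two strings from the byte dumps above.
q <- "1931\u00a080th\u00a0Anniversary"  # non-breaking spaces (U+00A0)
z <- "1931 80th Anniversary"            # ordinary spaces (U+0020)

# Swap each non-breaking space for an ordinary space; fixed = TRUE treats
# the pattern as a literal string rather than a regular expression.
q_clean <- gsub("\u00a0", " ", q, fixed = TRUE)

q_clean == z         # TRUE
```

This leaves accented letters and other legitimate non-ASCII text untouched, which matters if the column holds more than plain English.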

This has been a weird journey, and I hope this helps someone else who is as baffled as I was, and that they learn a few new functions along the way.
