Why don't UTF-8 patterns work inside Perl's substitute (s) and match (m) operators in one-liners?

perl -C63 -n -e "print if m{Текст на юникоде}" file

-C63 turns on a set of flags that tell Perl that its input and output (STDIN, STDOUT, STDERR, and the files it opens) and the command-line arguments in @ARGV are in UTF-8.

perl -Mutf8 -n -e "print if m{Текст на юникоде}" file

-Mutf8 tells the Perl compiler that your source code is in UTF8.

-C63 affects how Perl sees the data in file. -Mutf8 affects how Perl sees the code in your -e option. Used alone, each option decodes only one side, so decoded characters end up being compared with raw bytes and the match fails. For Perl to interpret both the input file and the source code as UTF-8, you need both options.

$ perl -Mutf8 -C63 -n -e "print if m{Текст на юникоде}" file
Текст на юникоде

Update: Oh, and I should probably add that the simplest option works as well (but for all the wrong reasons!)

$ perl -n -e "print if m{Текст на юникоде}" file
Текст на юникоде

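In this case, it works because Perl interprets both the input and the source code as being made up of single-byte Latin-1 characters, so the two byte sequences happen to be identical. Anything that depends on real characters (case-insensitive matching, character classes, length) will go wrong. Please don’t do this 🙂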
