input.regex in Hive

Your first pattern does not match the entire string. Its field-matching parts are [^ ]*, which matches zero or more characters other than a space, so it cannot match the last field, which contains spaces.

The second regex has the same problem: its \S+ patterns match one or more non-whitespace characters, so the final \S+ cannot match the whole last field either.
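
To illustrate the point, here is a minimal Python sketch. The sample value is made up (it is not taken from the question), and Python's re module is used only for demonstration; Hive applies Java regex, which should behave the same way for these constructs on plain ASCII input.

```python
import re

# Hypothetical last field containing spaces (an assumption for illustration).
last_field = "some free text"

# [^ ]* and \S+ both stop at the first space, so neither spans the whole field.
space_class = re.fullmatch(r"[^ ]*", last_field)
non_ws = re.fullmatch(r"\S+", last_field)

# .+ accepts spaces, so it covers the entire field.
greedy = re.fullmatch(r".+", last_field)

print(space_class, non_ws, greedy.group(0))
```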

You may use either of these patterns:

^(\S+)\t+(\S+)\t+(\S+)\t+(.+)
^([^\t]*)\t+([^\t]*)\t+([^\t]*)\t+(.*)

The [^\t]* matches any field in a tab-delimited text since it matches zero or more chars other than a tab.
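
The two suggested patterns can be compared with a short Python sketch. The rows below are hypothetical sample data, and Python's re module stands in for the Java regex engine that Hive uses, so treat this as an illustration rather than a Hive test.

```python
import re

# The two patterns suggested above, copied verbatim.
STRICT = re.compile(r"^(\S+)\t+(\S+)\t+(\S+)\t+(.+)")
LENIENT = re.compile(r"^([^\t]*)\t+([^\t]*)\t+([^\t]*)\t+(.*)")

# Hypothetical tab-delimited row; only the last field contains spaces.
row = "2020-01-01\tINFO\tserver1\tdisk usage at 91 percent"
strict_fields = STRICT.match(row).groups()
lenient_fields = LENIENT.match(row).groups()
print(strict_fields)
print(lenient_fields)

# Hypothetical row whose second field contains a space:
# \S+ rejects it, while [^\t]* accepts anything that is not a tab.
spaced = "2020-01-01\tINFO level\tserver1\tmessage"
strict_spaced = STRICT.match(spaced)
lenient_spaced = LENIENT.match(spaced).groups()
print(strict_spaced)
print(lenient_spaced)
```

The [^\t]* variant is the more forgiving choice when any field, not just the last one, may contain spaces.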
