Instead of writing a complex regex for split() that describes the separators, it's often simpler to write a regex that matches the tokens themselves and collect them with find():
String input = "[02 Nov 2020 17:31:00,117] [12' for queue: 'weblogic.kernel.Default (self-tuning)'] [ ] [0EyKACgWsiME2VoobHq2EuBc49xq_NsuBmnV0dhrgPWrKotJgQ8-!-438943902!1604338067478] [Server] DEBUG tm.BitronixTransactionManager - shutting down journal - resetting servers";
// Requires java.util.regex.Matcher and java.util.regex.Pattern
for (Matcher m = Pattern.compile("\\[[^\\]]*\\]|-.*|\\S+").matcher(input); m.find(); ) {
    System.out.println(m.group());
}
Output
[02 Nov 2020 17:31:00,117]
[12' for queue: 'weblogic.kernel.Default (self-tuning)']
[ ]
[0EyKACgWsiME2VoobHq2EuBc49xq_NsuBmnV0dhrgPWrKotJgQ8-!-438943902!1604338067478]
[Server]
DEBUG
tm.BitronixTransactionManager
- shutting down journal - resetting servers
Explanation
\[[^\]]*\]   Match a `[ ]` enclosed token
|            OR
-.*          Match a `-` and the rest of the line (`.` does not match line breaks)
|            OR
\S+          Match a block of non-whitespace characters (must be last)
The \S+ alternative must be last; otherwise it would consume the [ or - characters that lead in to the other two token types.
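To see why the order matters, here is a small sketch that runs both orderings on a shortened input. The class name AlternationOrderDemo, the helper tokens(), and the sample string are made up for illustration; only the regex comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {
    // Hypothetical helper: collect every match of regex in input, in order.
    public static List<String> tokens(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "[a b] X - y z"; // made-up sample, not real log data

        // \S+ last: bracketed and dash-led tokens stay intact.
        System.out.println(tokens("\\[[^\\]]*\\]|-.*|\\S+", input));
        // prints [[a b], X, - y z]

        // \S+ first: it matches wherever a token starts, so the other
        // two alternatives are never reached.
        System.out.println(tokens("\\S+|\\[[^\\]]*\\]|-.*", input));
        // prints [[a, b], X, -, y, z]
    }
}
```

Java's regex engine tries the alternatives left to right at each position and takes the first one that succeeds, which is why the most general alternative belongs at the end.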
This should also perform much better, because there are no lookaheads or lookbehinds, and there is little to no backtracking.
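If you tokenize many log lines, it may also help to compile the pattern once and reuse it, and to return the tokens as a list instead of printing them. A minimal sketch, assuming one log entry per string; the class name LogLineTokenizer and the method tokenize() are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineTokenizer {
    // The regex from the answer, compiled once. Pattern instances are
    // immutable and safe to share between threads; Matcher instances are not.
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*\\]|-.*|\\S+");

    // Hypothetical helper: return the tokens of one log line, in order.
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Shortened version of the log line from the question.
        String line = "[Server] DEBUG tm.BitronixTransactionManager - shutting down journal - resetting servers";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

Whether this makes a measurable difference depends on how many lines you process; for a single line the inline version above is just as good.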