Is it possible to avoid matching literals in a string and only match at the start?

Instead of writing a complex regex for split() to separate the tokens, it's often simpler to match the tokens themselves:

String input = "[02 Nov 2020 17:31:00,117] [12' for queue: 'weblogic.kernel.Default (self-tuning)'] [            ] [0EyKACgWsiME2VoobHq2EuBc49xq_NsuBmnV0dhrgPWrKotJgQ8-!-438943902!1604338067478] [Server] DEBUG tm.BitronixTransactionManager - shutting down journal - resetting servers";

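// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
// Each find() yields the next token: a [...] group, a "- ..." tail, or a run of non-space characters.
for (Matcher m = Pattern.compile("\\[[^\\]]*\\]|-.*|\\S+").matcher(input); m.find(); ) {
    System.out.println(m.group());
}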

Output

[02 Nov 2020 17:31:00,117]
[12' for queue: 'weblogic.kernel.Default (self-tuning)']
[            ]
[0EyKACgWsiME2VoobHq2EuBc49xq_NsuBmnV0dhrgPWrKotJgQ8-!-438943902!1604338067478]
[Server]
DEBUG
tm.BitronixTransactionManager
- shutting down journal - resetting servers
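
If the tokens are needed as a list rather than printed, the same loop can be wrapped in a small helper. This is only a sketch: the class name LogTokenizer, the method tokenize, and the shortened sample line are made up for illustration and are not part of the original question. The pattern is the one from the answer, compiled once so it can be reused across lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogTokenizer {
    // Same regex as above, compiled once; Pattern objects are immutable and safe to share.
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*\\]|-.*|\\S+");

    // Hypothetical helper: returns the tokens in the order they appear in the line.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Shortened sample line, assumed for illustration only.
        String line = "[02 Nov 2020 17:31:00,117] [Server] DEBUG "
                + "tm.BitronixTransactionManager - shutting down journal - resetting servers";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

With that sample line, tokenize returns five tokens: the two bracketed groups, DEBUG, the class name, and the message starting at the first standalone "-".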

Explanation

\[[^\]]*\]    Match a `[ ]` enclosed token
|             OR
-.*           Match a `-` and the rest of the input (excl. linebreaks)
|             OR
\S+           Match a block of non-space characters (must be last)

The \S+ alternative must come last. Java tries the alternatives from left to right at each position, so if \S+ came first it would consume the [ or - characters that lead in to the other two token types.
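
The effect of the ordering can be checked with a small comparison. This is a sketch, not part of the original answer: the class name AlternationOrderDemo, the helper tokens, and the short test string are assumptions chosen for illustration. It also shows that a "-" inside a token (x-1) is left alone, because \S+ has already started matching at the x.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationOrderDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> tokens(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "[a b] x-1 - y z";

        // \S+ last: brackets and the "- ..." tail are recognized first.
        System.out.println(tokens("\\[[^\\]]*\\]|-.*|\\S+", input));
        // prints [[a b], x-1, - y z]

        // \S+ first: it wins at every position, so the other alternatives never match.
        System.out.println(tokens("\\S+|\\[[^\\]]*\\]|-.*", input));
        // prints [[a, b], x-1, -, y, z]
    }
}
```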

This should also perform much better, because there are no lookaheads or lookbehinds, and there is little to no backtracking.
