Squeeze the most performance out of xpressive with these tips and tricks.
Compiling a regex (dynamic or static) is far more expensive
than executing a match or search. If you have the option, prefer to compile
a pattern into a basic_regex<>
object once and reuse it rather than recreating it over and over.
Since basic_regex<> objects are not mutated by any of the regex algorithms, they are completely
thread-safe once their initialization (and that of any grammars of which
they are members) completes. The easiest way to reuse your patterns is to
simply make your basic_regex<>
objects "static const".
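The match_results<> object caches dynamically allocated memory, so if you run many regex searches it is far better to reuse the same match_results<> object.
Caveat: match_results<> objects are not thread-safe, so don't go wild reusing them across threads.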
This is a corollary to the previous tip. If you are doing multiple searches,
you should prefer the regex algorithms that accept a match_results<>
object over the ones that don't, and you should reuse the same match_results<>
object each time. If you don't provide a match_results<>
object, a temporary one will be created for you and discarded when the algorithm
returns. Any memory cached in the object will be deallocated and will have
to be reallocated the next time.
xpressive provides overloads of the regex_match() and regex_search()
algorithms that operate on C-style null-terminated strings. You should prefer
the overloads that take iterator ranges. When you pass a null-terminated
string to a regex algorithm, the end iterator is calculated immediately by
calling strlen. If you already
know the length of the string, you can avoid this overhead by calling the
regex algorithms with a [begin, end) pair.
On average, static regexes execute about 10 to 15% faster than their dynamic counterparts. It's worth familiarizing yourself with the static regex dialect.
The optimize flag tells the
regex compiler to spend some extra time analyzing the pattern. It can cause
some patterns to execute faster, but it increases the time to compile the
pattern, and often increases the amount of memory consumed by the pattern.
If you plan to reuse your pattern, optimize
is usually a win. If you will only use the pattern once, don't use optimize.
Keep the following tips in mind to avoid stepping in potholes with xpressive.
With static regexes, you can create grammars by nesting regexes inside one another. When compiling the outer regex, both the outer and inner regex objects, and all the regex objects to which they refer either directly or indirectly, are modified. For this reason, it's dangerous for global regex objects to participate in grammars. It's best to build regex grammars from a single thread. Once built, the resulting regex grammar can be executed from multiple threads without problems.
This is a pitfall common to many regular expression engines. Some patterns
can cause exponentially bad performance. Often these patterns involve one
quantified term nested within another quantifier, such as
"(a*)*", although in many cases,
the problem is harder to spot. Beware of patterns that have nested quantifiers.