Finding the intersection between two regexes

Ben Evans benjamin.john.evans at gmail.com
Tue Apr 22 17:20:08 BST 2014


This piece of anecdotal evidence is now a good ~8 years out of date,
but I found that there were some surprising performance regressions
for a complex, combined regex versus multiple runs with simple ones.

As ever, the moral of the story is: if performance matters, always
measure, and get a friend to sanity-check your results.
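
A minimal sketch of that kind of measurement follows. It is in Python
rather than Perl, and the patterns, input text and function names are all
made up for illustration; none of them come from the original anecdote.
It only shows the shape of the comparison (several simple regexes run one
after another versus a single alternation), not what the result will be
on any particular engine or data set.

```python
import re
import timeit

# Hypothetical patterns and input, chosen only for illustration.
SIMPLE_PATTERNS = [r"\bfoo\d+\b", r"\bbar[a-z]+\b", r"\bbaz_\w+\b"]
TEXT = "foo123 barxyz baz_one nothing here " * 1000

# One compiled regex per simple pattern ...
simple = [re.compile(p) for p in SIMPLE_PATTERNS]
# ... and one combined regex that is an alternation of all of them.
combined = re.compile("|".join("(?:" + p + ")" for p in SIMPLE_PATTERNS))


def count_separately(text):
    """Scan the text once per simple regex and add up the matches."""
    return sum(len(rx.findall(text)) for rx in simple)


def count_combined(text):
    """Scan the text once with the combined alternation."""
    return len(combined.findall(text))


def measure(repeats=5):
    """Return the best-of-N wall time for each approach, in seconds."""
    t_sep = min(timeit.repeat(lambda: count_separately(TEXT),
                              number=1, repeat=repeats))
    t_comb = min(timeit.repeat(lambda: count_combined(TEXT),
                               number=1, repeat=repeats))
    return t_sep, t_comb


if __name__ == "__main__":
    # Check both approaches agree before trusting any timing.
    assert count_separately(TEXT) == count_combined(TEXT)
    t_sep, t_comb = measure()
    print("separate: %.6fs  combined: %.6fs" % (t_sep, t_comb))
```

Checking that both approaches return the same answer before comparing
times is the cheap part of the "sanity check"; the two counts only agree
here because the sample patterns never match overlapping text.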

Cheers,

Ben

On Mon, Apr 21, 2014 at 9:45 AM, Dirk Koopman <djk at tobit.co.uk> wrote:
> On 21/04/14 03:14, Mark Fowler wrote:
>>
>> On Sunday, April 20, 2014, David Cantrell <david at cantrell.org.uk> wrote:
>>
>>> Can anyone point me at some code on the CPAN that, given two regexes,
>>> can figure out whether there are any bits of text that will be matched
>>> by both?
>>
>>
>>
>> I'm not sure I understand the question here, or moreover why you want to
>> do this. Is it just an intellectual exercise?
>>
>> If it's just a matter of wanting a single Perl regular expression that can
>> match something iff both of these other regular expressions would match,
>> surely you can just do this by inserting the second regular expression at
>> the beginning of the first, encapsulated in a zero-width positive lookahead
>> assertion (with suitable variable-length doodads to pad if they're not
>> anchoring at the same place in the string).
>>
>> What the link is talking about seems to be converting a regular expression
>> down into a finite state machine and then combining that finite state
>> machine with another finite state machine (i.e. non-deterministic, being
>> turned back into deterministic with maths). I can see how that's possible
>> for a strict regular expression, but as you say, not for a true Perl
>> non-regular regular expression.
>>
>> So...why do you want to do this?
>>
>
> This may be related to the question I asked recently about turning (up to) a
> few hundred REGEXes into one giant REGEX. The goal being to test all those
> disparate REGEXes in the most efficient way possible on a string.
>
> Dirk
>
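
For reference, the lookahead combination Mark describes in the quoted text
above can be sketched roughly as follows. This is an illustration in
Python rather than Perl, and the helper name both_match is hypothetical.
It assumes neither pattern uses numbered backreferences or inline global
flags, since wrapping the patterns would change group numbering. It also
only tests whether one given string is matched by both regexes; it does
not decide whether any such string exists, which was David's original
question.

```python
import re


def both_match(pattern_a, pattern_b):
    """Build one regex that matches only if both patterns match somewhere.

    The lookahead (?=...) checks pattern_b without consuming input, and
    [\\s\\S]*? is the padding that lets each pattern match at its own
    position in the string.
    """
    pad = r"[\s\S]*?"
    return re.compile(
        r"\A(?=" + pad + "(?:" + pattern_b + "))" + pad + "(?:" + pattern_a + ")"
    )


if __name__ == "__main__":
    # Hypothetical example patterns: three digits, and letters before an "@".
    rx = both_match(r"\d{3}", r"[a-z]+@")
    print(bool(rx.search("call 555 or mail bob@example.com")))
    print(bool(rx.search("call 555")))
```

The padding is written as [\s\S]*? rather than .*? so that the combined
regex does not need a dot-matches-newline flag, which would also change
how "." behaves inside the two original patterns.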

