Finding the intersection between two regexes

David Cantrell david at cantrell.org.uk
Tue Apr 22 12:16:14 BST 2014


On Sun, Apr 20, 2014 at 10:14:48PM -0400, Mark Fowler wrote:
> On Sunday, April 20, 2014, David Cantrell <david at cantrell.org.uk> wrote:
> > Can anyone point me at some code on the CPAN that, given two regexes,
> > can figure out whether there are any bits of text that will be matched
> > by both?
> I'm not sure I understand the question here, or moreover why you want to do
> this..is it just an intellectual exercise?

I do actually have a use for it, which would help to explain the
question.

A large part of Number::Phone is based on data in google's
libphonenumber project. That has, for most countries, regular
expressions that match valid fixed lines and valid mobiles. For some
countries those two regexes can both match some of the same numbers.
Here's the data:
  http://goo.gl/hTBAhZ

If you look at the data for Barbados, they have for fixed lines:
  246[2-9]\d{6}

and for mobiles:
  246(?:(?:2[346]|45|82)\d|25[0-4])\d{4}

then some strings will match both expressions - 2462303333, for example.
But if you look at the data for Jamaica there are no strings that match
both regexes.

At the moment I detect these overlaps (and then throw the regexes away
as being unfit for my purpose) by just going through each country's
number space. This is practical for NANP countries as I can do it
all with only about a million comparisons in the worst possible case. It
would be impractical to apply this to the whole world though.

-- 
David Cantrell | Bourgeois reactionary pig


More information about the london.pm mailing list