These aren't the characters you're looking for...

Abigail abigail at abigail.be
Tue Aug 19 11:29:13 BST 2008


On Tue, Aug 19, 2008 at 11:17:01AM +0100, Robin Barker wrote:
> 
> From: Andy Wardley
> > I mistakenly wrote this the other day:
> >
> >     [\s^\n]
> >
> > What I wanted was to match a whitespace character that wasn't a newline.
> >
> > Of course, I could just write this:
> >
> >     [ \t]
> >
> > But that doesn't include the Unicode whitespace characters which \s would
> > normally match.  So I ended up writing this:
> >
> >     [ \t\x{85}\x{2028}\x{2029}]
> >
> > Second: am I missing something obvious?  Is there a better way to do it?
> 
> You could use 
> 	[[:blank:]] 
> (see perlre), but my experience is that [:...:] does not behave as I expect with unicode (maybe my expectations are wrong).

Without Unicode semantics, [:blank:] is equivalent to [ \t].

And I wouldn't use any of the POSIX classes, as their behaviour depends on
whether Unicode semantics are in effect when doing the matching. Better
is to use Unicode properties; they will always match the same set of
characters.
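
A minimal sketch of that difference, assuming a perl without the
unicode_strings feature or a locale in effect. The variable names are made
up for illustration; U+00A0 is used because it is whitespace that fits in a
single byte, so the same character can be stored both ways:

```perl
use strict;
use warnings;

# U+00A0 (NO-BREAK SPACE), stored first as a plain byte string.
my $nbsp = "\xA0";

# On a byte string the POSIX class is matched with native rules,
# while \p{...} is matched with Unicode rules.
my $posix_bytes = $nbsp =~ /[[:space:]]/ ? 1 : 0;
my $prop_bytes  = $nbsp =~ /\p{Space}/   ? 1 : 0;

# Same character, now stored internally as UTF-8.
utf8::upgrade($nbsp);

my $posix_utf8 = $nbsp =~ /[[:space:]]/ ? 1 : 0;
my $prop_utf8  = $nbsp =~ /\p{Space}/   ? 1 : 0;

print "POSIX class: $posix_bytes then $posix_utf8\n";
print "property:    $prop_bytes then $prop_utf8\n";
```

The POSIX class should change its answer once the string is upgraded,
while the property should give the same answer both times.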

> 
> You could also do a negative look ahead 
> 	(?!\n)\s
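
A small sketch of the lookahead suggestion, for comparison. The second
pattern, [^\S\n], is a common idiom that was not mentioned in this thread;
it is included here only as an assumption-free alternative spelling of
"whitespace, but not newline". The sample string and variable names are
invented for the example:

```perl
use strict;
use warnings;

# Sample text: space, tab, newline, U+2028 LINE SEPARATOR, and a letter.
my $text = " \t\n\x{2028}x";

# Lookahead form: any \s character that is not a newline.
my @lookahead = $text =~ /((?!\n)\s)/g;

# Double-negative class: "not (non-whitespace or newline)".
my @negated = $text =~ /([^\S\n])/g;

printf "lookahead matched %d, negated class matched %d\n",
    scalar @lookahead, scalar @negated;
```

Both patterns should pick out the space, the tab and U+2028, and skip the
newline and the letter.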


Or define your own property:

    sub IsMySpace {<<'--'}
    +utf8::IsSpace
    -000A
    --

    /\p{IsMySpace}/


Abigail

