These aren't the characters you're looking for...

Andy Wardley abw at wardley.org
Tue Aug 19 12:24:17 BST 2008


Abigail wrote:
> You are also missing quite a number of characters that would match \s, but
> aren't included in [ \t\x{85}\x{2028}\x{2029}]. \s matches 25 characters,
> including \r, and \cL. NEXT LINE (\x{85}) and NO-BREAK SPACE (\x{A0}) only
> match with Unicode semantics.

I thought that might be the case.  I was working from my old Camel book which
claimed only those three, but then found reference to a whole class of
whitespace-ish things in the 5.10 unicode/perlre docs.  Thanks for the
clarification.

> You might want to use:
> 
>     (?!\n)[\h\v]

In this case, I specifically want to exclude the vertical tab (not that I'm
ever likely to come across it).

What does \h match?
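
For reference, \h is the horizontal whitespace class and \v the vertical one, both available from Perl 5.10. A rough sketch of how the suggested pattern treats a few characters might look like this. The variable name and the sample list are only illustrative, and the script prints whatever the local perl reports rather than assuming a result:

```perl
use strict;
use warnings;

# Abigail's suggestion: any horizontal or vertical whitespace except \n.
# \h and \v need Perl 5.10 or later.
my $suggested = qr/(?!\n)[\h\v]/;

# Illustrative sample characters only, not the full whitespace list.
my @samples = (
    [ 'SPACE'           => " "      ],
    [ 'TAB'             => "\t"     ],
    [ 'LINE FEED'       => "\n"     ],
    [ 'VERTICAL TAB'    => "\x{0B}" ],
    [ 'FORM FEED'       => "\f"     ],
    [ 'CARRIAGE RETURN' => "\r"     ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    # One column per class, then the combined pattern.
    printf "%-16s \\h: %d  \\v: %d  suggested: %d\n",
        $name,
        ($char =~ /\A\h\z/         ? 1 : 0),
        ($char =~ /\A\v\z/         ? 1 : 0),
        ($char =~ /\A$suggested\z/ ? 1 : 0);
}
```

Since \v includes the vertical tab, the suggested pattern accepts it, which is the case I want to avoid.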

> Alternatively, you can use:
> 
>    [^\S\n]
> 
> but that suffers from the problem code points \x{85} and \x{A0}.

I think that's the simplest solution that's Good Enough.  If I add in
\r and \x{85} (which I want to exclude) then NO-BREAK SPACE is the only thing
it won't accept but should.  I think I can live with that.

     [^\S\n\r\x{85}]
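
A quick way to check what that class accepts is a small script along these lines. It is only a sketch: the names $hspace and accepts() are made up for the example, and the NO-BREAK SPACE result depends on the Perl version and on whether the string is stored as UTF-8 internally, so the script prints both cases instead of assuming an answer.

```perl
use strict;
use warnings;

# The class from above: whitespace, minus \n, \r and NEXT LINE.
my $hspace = qr/[^\S\n\r\x{85}]/;

# Returns 1 if the single character in $char is accepted, 0 otherwise.
# (Hypothetical helper, for illustration only.)
sub accepts {
    my ($char) = @_;
    return $char =~ /\A$hspace\z/ ? 1 : 0;
}

my @samples = (
    [ 'SPACE'           => " "      ],
    [ 'TAB'             => "\t"     ],
    [ 'LINE FEED'       => "\n"     ],
    [ 'CARRIAGE RETURN' => "\r"     ],
    [ 'NEXT LINE'       => "\x{85}" ],
    [ 'NO-BREAK SPACE'  => "\x{A0}" ],
);

for my $sample (@samples) {
    my ($name, $char) = @$sample;
    my $upgraded = $char;
    utf8::upgrade($upgraded);    # same character, UTF-8 internal form
    printf "%-16s native: %d  upgraded: %d\n",
        $name, accepts($char), accepts($upgraded);
}
```

The two columns should only differ for characters that \s treats differently under byte and Unicode semantics, which is the NO-BREAK SPACE case mentioned above.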

Thanks everyone.
A




More information about the london.pm mailing list