These aren't the characters you're looking for...

Andy Wardley abw at wardley.org
Tue Aug 19 10:46:45 BST 2008


I mistakenly wrote this the other day:

     [\s^\n]

What I wanted was to match a whitespace character that wasn't a newline.

Of course, it doesn't work.  The '^' must be the first character inside the
brackets for it to work as a character class negatorificator; anywhere else
it is just a literal caret.  And you can't mix "inny" classes with "outy"
classes.  That's just not allowed.
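
To make the failure concrete, here is a minimal sketch of what that class
actually accepts.  The helper name matches_class is made up for illustration
only.  Because the '^' is not in first position, the class is a plain positive
one: any whitespace character, a literal caret, or a newline.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the single character matches the
# mistaken class, 0 otherwise.
sub matches_class {
    my ($char) = @_;
    return $char =~ /[\s^\n]/ ? 1 : 0;
}

print matches_class(" "),  "\n";   # space: matched via \s
print matches_class("\n"), "\n";   # newline: still matched, not excluded
print matches_class("^"),  "\n";   # caret: matched as a literal
print matches_class("a"),  "\n";   # ordinary letter: not matched
```

So the newline the class was supposed to exclude gets through, and a stray
caret is accepted as well.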

Of course, I could just write this:

     [ \t]

But that doesn't include the Unicode whitespace characters which \s would
normally match.  So I ended up writing this:

     [ \t\x{85}\x{2028}\x{2029}]

First question: is it safe to match a regex containing Unicode code points
against a non-Unicode string?  I'm sure it is, and it seems to work OK, but my
subconscious woke me up at 3am this morning to remind me to check.  My Camel
is a little old (3rd ed - 5.6.0) and talks of problems in Unicode processing
that "will probably be fixed by the time you read this".  Can I tell my
subconscious to stop worrying and go back to snuggle-bunny land?
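
For anyone who wants to poke at the first question, a rough sketch of a quick
check follows.  It is not necessarily the test behind "seems to work OK"; the
helper name has_hspace is invented here, and the sketch assumes a Perl recent
enough (5.8.1 or later) to provide utf8::is_utf8.  It matches the class above
against a plain string with the UTF-8 flag off and against a string holding a
wide character.

```perl
use strict;
use warnings;

# The class from above, compiled once.
my $class = qr/[ \t\x{85}\x{2028}\x{2029}]/;

# Hypothetical helper: 1 if the string contains one of those characters.
sub has_hspace {
    my ($str) = @_;
    return $str =~ $class ? 1 : 0;
}

my $bytes = "foo\tbar";          # no wide characters, UTF-8 flag off
my $chars = "foo\x{2028}bar";    # contains U+2028, so stored as UTF-8

print "flag on \$bytes: ", (utf8::is_utf8($bytes) ? "on" : "off"), "\n";
print "bytes with tab:    ", has_hspace($bytes),   "\n";
print "chars with U+2028: ", has_hspace($chars),   "\n";
print "bytes, no space:   ", has_hspace("foobar"), "\n";
```

This only exercises a handful of strings on one Perl version, so it is
evidence rather than proof.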

Second: am I missing something obvious?  Is there a better way to do it?

A


