[OT] Encode woes

Fri Sep 25 10:16:23 BST 2009

On 25 Sep 2009, at 10:09, Philip Newton wrote:

> On Fri, Sep 25, 2009 at 09:54, Dirk Koopman <djk at tobit.co.uk> wrote:
>> Dirk Koopman wrote:
>>>
>>> Now, is there a reasonably reliable way of determining what we have, on a
>>> string by string basis, to at least tell whether we are dealing with utf8 or
>>> iso-8859 (not caring which variant) so that I can drive Encode appropriately
>>> to avoid crashes of the above type.  Or how do I completely switch off utf8
>>> encoding/decoding - everywhere - in an 80,000 line perl app.
>>
>> As no-one seems interested in this, or maybe no-one else has had these
>> problems themselves, can anyone suggest a better mailing list to poll?
>
> I was going to suggest Encode::is_utf8 and/or utf8::is_utf8, but I
> wasn't sure whether it would actually solve your problem so I thought
> I'd rather stay quiet and hope someone with real-world experience in
> utf8 woes would pipe up.
>
> Cheers,
> Philip
> -- 
> Philip Newton <philip.newton at gmail.com>
>

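http://search.cpan.org/perldoc?Encode::Detect might be of some use to
you. As a general rule, if you have is_utf8 in your code you have a
bug. It does not do what you think it does: it reports whether Perl's
internal UTF8 flag is set on the string, not whether the data is
valid UTF-8 or which encoding the bytes arrived in.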
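
If Encode::Detect is not an option, one common fallback is to attempt a
strict UTF-8 decode and treat failure as "probably iso-8859". The sketch
below assumes the input is raw, undecoded bytes; decode_guess is a
made-up helper name, and ISO-8859-1 is an assumed default for the
fallback, not something taken from your app. It is a heuristic, not real
detection: any byte sequence decodes as ISO-8859-1, so it cannot tell
the 8859 variants apart.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: returns (decoded text, name of the encoding used).
sub decode_guess {
    my ($bytes) = @_;

    # Strict UTF-8: FB_CROAK makes decode() die on malformed input,
    # LEAVE_SRC stops it from modifying $bytes in place.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return ($text, 'UTF-8') if defined $text;

    # Assumed fallback: every byte maps to a character in ISO-8859-1,
    # so this decode cannot fail.
    return (decode('ISO-8859-1', $bytes), 'ISO-8859-1');
}

# The same word, once as UTF-8 bytes and once as Latin-1 bytes.
for my $bytes ("caf\xc3\xa9", "caf\xe9") {
    my ($text, $enc) = decode_guess($bytes);
    printf "%-10s -> %d characters\n", $enc, length $text;
}
```

Pure ASCII input passes the strict UTF-8 decode, which is harmless
since ASCII is identical in both encodings. Note that
utf8::is_utf8("caf\xc3\xa9") is false for the first input even though
those bytes are perfectly valid UTF-8, which is why is_utf8 is no use
for this job.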