[OT] Encode woes

Dirk Koopman djk at tobit.co.uk
Fri Sep 25 08:54:19 BST 2009


Dirk Koopman wrote:
> It appears that, with the increasing prevalence of Perl 5.10, Perl is
> getting pickier about whether strings are utf8 or not.
> 
> I have a well established, networked, app that has upwards of 250 nodes 
> and about 4000 users at one time (on certain weekends double that) all 
> over the world. These users are running mainly Windows-based clients
> (which may include quite a lot of Windows telnet). The nominal character
> set is ASCII, as interpreted by the client's host operating system.
> 
> To date, I have managed to avoid the tribulations of Encode and utf8 et
> al. But I am now getting occasional errors, on Perl 5.10, of the ilk:-
> 
>  Wide character in null operation at /spider/perl/DXDupe.pm line 47.
>  at /spider/perl/DXDupe.pm line 47
>     DXDupe::find('X14163|UA0KEF|RZ6HV|�������  �������') called at 
> /spider/perl/Spot.pm line 420
> 
> And also something similar on print or syswrite.
> 
> Studying the data, what I am receiving is a mixture of utf8 and 
> iso-8859-*, the reason for this being that older perls happily take what 
> they are given and just pass it along. Some clients emit utf8, others
> iso-8859 and yet others (running Win95/98) some kind of
> codepage. In addition, there are older, usually Windows-based, packages
> acting as nodes, together with yet more clients that are also adding 
> data to this network in who knows what character set.
> 
> Up until recently, this has not been a problem because the important 
> stuff is in 7 bit ascii and the remarks section (the usual source of 
> problems), if it is unreadable, doesn't matter 'cos you can't translate 
> it anyway.
> 
> Now, is there a reasonably reliable way of determining what we have, on
> a string-by-string basis, to at least tell whether we are dealing with
> utf8 or iso-8859 (not caring which variant), so that I can drive Encode
> appropriately and avoid crashes of the above type? Or, failing that, how
> do I completely switch off utf8 encoding/decoding - everywhere - in an
> 80,000-line Perl app?
> 
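One common heuristic for the per-string question is to attempt a strict UTF-8 decode and fall back to ISO-8859-1 when that fails, then encode back to a single fixed encoding wherever data leaves the program. Perl's "Wide character" warning means a character with an ordinal above 255 reached an operation that expects bytes, so the rule of thumb is: decode once on input, encode once on output. The sketch below shows that pattern in Python rather than Perl. In Perl, the decode step would be `Encode::decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC)` inside an `eval`, falling back to `Encode::decode('iso-8859-1', $octets)`. The function names, the choice of UTF-8 for output and the MD5-based dedupe key are illustrative assumptions, not how DXDupe actually works.

```python
import hashlib
from typing import Tuple


def decode_line(raw: bytes) -> Tuple[str, str]:
    """Guess whether one incoming line is UTF-8 or ISO-8859-*, and decode it.

    Strict UTF-8 is tried first: plain 7-bit ASCII is valid UTF-8, and
    8-bit legacy text rarely forms well-formed multi-byte sequences by
    accident. ISO-8859-1 maps every byte to a character, so the fallback
    cannot fail, although codepage text (e.g. cp1251) will come out as
    the wrong letters.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"


def encode_for_wire(text: str) -> bytes:
    """Turn text back into bytes at an output boundary (socket, file, hash).

    Byte-oriented operations then only ever see bytes, never 'wide'
    characters. UTF-8 as the output encoding is an assumption.
    """
    return text.encode("utf-8")


def dupe_key(raw: bytes) -> str:
    """Hypothetical dedupe key computed from normalised bytes."""
    text, _ = decode_line(raw)
    return hashlib.md5(encode_for_wire(text)).hexdigest()
```

Because both byte forms decode to the same string, the UTF-8 and ISO-8859-1 encodings of "café" produce the same `dupe_key`. A spot that was relayed through nodes using different encodings would therefore still be recognised as a duplicate.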

As no-one seems interested in this, or maybe no-one else has had these
problems, can anyone suggest a better mailing list to poll?

Dirk


More information about the london.pm mailing list