UTF-8 + HTML::Template + CGI::Fast

Mark Fowler mark at twoshortplanks.com
Fri Dec 4 15:19:59 GMT 2009


On Fri, Dec 4, 2009 at 11:49 AM, Philip Potter
<philip.g.potter at gmail.com> wrote:

> I don't see how you're supposed to guess what encoding the user agent
> used if it won't tell you. Does anyone else have any ideas?

Let's assume you're sane: you've told your webserver to serve the page the
form lives on as utf-8, and its Content-Type header says charset=utf-8.
Most browsers will then submit the form data to you as utf-8.  Some will
not (they are broken).
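Here is a minimal sketch of that "sane" setup. It assumes a plain CGI-style script that prints its own headers. The form's action URL, the field name and the form_page_response sub are made up for illustration. The point is that the charset=utf-8 header and the bytes actually sent have to agree.

```perl
use strict;
use warnings;
use Encode ();

# Build a complete CGI-style response (headers + body) as a byte string.
# The charset=utf-8 parameter tells the browser how to read the page,
# and well-behaved browsers then submit the form back as UTF-8 too.
sub form_page_response {
    my ($title) = @_;    # a Perl character string, assumed HTML-safe here
    my $html = <<"HTML";
<html><head><title>$title</title></head>
<body>
<form method="post" action="/submit">
  <input type="text" name="comment">
  <input type="submit">
</form>
</body></html>
HTML
    # Turn the characters into UTF-8 bytes so the body matches the header.
    my $body = Encode::encode("UTF-8", $html);
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" . $body;
}

binmode STDOUT;    # we print raw bytes, no output layer re-encoding them
print form_page_response("Caf\x{e9}");
```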

Your choices are:

a) If it isn't valid utf-8, treat it as latin-1 (very wrong, but probably what the user meant)

eval { $string = Encode::decode("utf8", $string, Encode::FB_CROAK); }

(strictly, a failed decode just leaves $string as the raw bytes, and Perl
will effectively treat those as your default encoding, which is _probably_
latin-1)

b) Display an error message (most correct)

eval { $string = Encode::decode("utf8", $string, Encode::FB_CROAK); 1 }
  or handle_errors();

c) Put a \x{fffd} (the Unicode replacement character) wherever there's a
byte sequence you can't decode (slightly less correct, but might just work)

$string = Encode::decode("utf8", $string, Encode::FB_DEFAULT);
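A rough, self-contained sketch that runs all three options on the same input, so you can see how they differ. The sub names are made up for illustration. Adding Encode::LEAVE_SRC is an extra precaution so Encode can't modify the input string in place. Spelling out the latin-1 fallback in a) with an explicit decode is also an assumption about how you'd want to do it.

```perl
use strict;
use warnings;
use Encode ();

# Croak on malformed input, and never modify the caller's bytes in place.
my $CROAK = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# a) try utf8; if the bytes aren't valid, decode them as latin-1 instead
sub decode_or_latin1 {
    my ($bytes) = @_;
    my $chars = eval { Encode::decode("utf8", $bytes, $CROAK) };
    return defined $chars ? $chars : Encode::decode("iso-8859-1", $bytes);
}

# b) try utf8; on failure return (undef, error) so the caller can complain
sub decode_or_error {
    my ($bytes) = @_;
    my $chars;
    my $ok = eval { $chars = Encode::decode("utf8", $bytes, $CROAK); 1 };
    return $ok ? ($chars, undef) : (undef, "not valid UTF-8: $@");
}

# c) never fail: malformed input is replaced with U+FFFD
sub decode_or_replace {
    my ($bytes) = @_;
    return Encode::decode("utf8", $bytes, Encode::FB_DEFAULT());
}

# "caf\xC3\xA9" is "café" as utf-8; "caf\xE9" is the same word sent as
# latin-1 by a broken browser.
for my $bytes ("caf\xC3\xA9", "caf\xE9") {
    my (undef, $err) = decode_or_error($bytes);
    printf "a) %vX  b) %s  c) %vX\n",
        decode_or_latin1($bytes),
        defined $err ? "error" : "ok",
        decode_or_replace($bytes);
}
```

For the latin-1 bytes, a) still yields "café", b) reports an error, and c) yields "caf" followed by U+FFFD.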

See "perldoc Encode", in particular the section on "Handling Malformed Data".

Note that in the above examples I've used "utf8" rather than "UTF-8".
In Encode, "utf8" is the lax variant and "UTF-8" the strict one, and the
lax one is probably what you want here.

I'm not even going to get into a conversation about normalised forms here.

Mark.


More information about the london.pm mailing list