character set detection?

Dominic Mitchell dom at happygiraffe.net
Sun Jan 7 11:15:42 GMT 2007


Dirk Koopman wrote:
> Is there a way of, reasonably reliably, determining what the character
> set of a lump of text is?

Not really, no.  Like Jesse said, Encode::Guess might be a good start.
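Encode::Guess works from a list of "suspect" encodings that you supply, so it is really guessing by elimination. Below is a rough Python sketch of that elimination idea. It is not Encode::Guess itself and not the statistical Mozilla approach. The function name `guess_by_elimination` and the default suspect list are made up for illustration. The sketch works in three steps:

1. A byte-order mark is treated as decisive.
2. Pure 7-bit data is reported as ASCII.
3. Otherwise, every suspect that decodes the bytes strictly survives.

```python
import codecs
from typing import List, Sequence

# BOMs are checked in this order because the UTF-32-LE BOM starts with
# the same two bytes as the UTF-16-LE BOM. The codec names returned
# ("utf-32", "utf-16", "utf-8-sig") strip the BOM when decoding.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Hypothetical default suspects; choose ones that suit your data.
DEFAULT_SUSPECTS = ("utf-8", "shift_jis", "euc_jp")


def guess_by_elimination(data: bytes,
                         suspects: Sequence[str] = DEFAULT_SUSPECTS) -> List[str]:
    """Return the encodings that could have produced `data`.

    One survivor is a confident guess, several mean the text is
    ambiguous, and an empty list means no suspect fits.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return [name]
    # Pure 7-bit data decodes under almost anything, so call it ASCII.
    if all(b < 0x80 for b in data):
        return ["ascii"]
    survivors = []
    for name in suspects:
        try:
            data.decode(name)  # strict mode: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        survivors.append(name)
    return survivors


print(guess_by_elimination(b"caf\xe9", suspects=("utf-8", "latin-1")))
```

The final call prints `['latin-1']`, because the lone `\xe9` byte is not valid UTF-8. A single-byte encoding such as latin-1 accepts every byte sequence, so it can never be eliminated. Elimination can only narrow the field, which is why it isn't reliable on its own. The browser approach described below adds character-frequency statistics on top.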

If you want to do what the browser does, Mozilla's algorithm is described here:

   http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

There's a Python implementation of it as well.

   http://chardet.feedparser.org/

-Dom


More information about the london.pm mailing list