Language recognition

Matt Lawrence matt.lawrence at virgin.net
Mon Oct 8 17:44:54 BST 2007


Peter Hickman wrote:
> Looking at the public Twitter feeds I note that although they are in
> UTF-8 they do not indicate the language that they are in. I realise
> that this would be somewhat difficult. But just how difficult?
>
> Given the UTF-8 entities (is that the correct term?) is there an easy
> way to tell which language it might be from, or at least which script?
>
> I'm sure something could be hacked up but rather than some ad hoc rules
> it would appear that this could be derived from Unicode itself.
>
> Any pointers?
>
See the "Scripts" section of man perlunicode.
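
As a rough illustration of the "which script?" half of the question, here is a minimal sketch in Python rather than Perl. It does not use the real Unicode Script property (Python's standard library does not expose it); instead it assumes that the first word of a character's Unicode name is a usable stand-in for its script, which is only an approximation. The helper names guess_script and dominant_script are made up for this example.

```python
import unicodedata
from collections import Counter


def guess_script(ch):
    """Approximate a character's script from the first word of its Unicode name.

    Assumption: names such as 'LATIN SMALL LETTER A' or 'CYRILLIC SMALL
    LETTER A' begin with the script. This is a heuristic, not the Unicode
    Script property itself.
    """
    if not ch.isalpha():
        return None  # ignore digits, punctuation, whitespace and symbols
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def dominant_script(text):
    """Return the most common approximate script among the letters in text."""
    counts = Counter(s for s in map(guess_script, text) if s)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    for sample in ["Hello, world", "Привет, мир", "こんにちは"]:
        print(sample, "->", dominant_script(sample))
```

Note that this only narrows things down to a script, not a language: Latin script alone covers English, French, German and many others, so telling those apart needs something beyond the character data.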

Matt


