Parse-text-from-HTML CPAN module ?

Stephen Collyer scollyer at netspinner.co.uk
Fri Dec 9 16:18:29 GMT 2005


Jonathan Stowe wrote:
> On Fri, 2005-12-09 at 11:10, Stephen Collyer wrote:
> 
>>I have a search-related requirement to take some arbitrary HTML,
>>parse out the text and stem it/apply stop words and so on. Now,
>>I can cook something up myself with the usual set of modules, but
>>this sounds like such a common requirement that someone will
>>already have done it and packaged it up, in a nice reusable form.
>>
>>Does anyone know if there's a nice, Pure Perl implementation of
>>this that I can pick up and use with no further brain-power required?
>>(I'm wondering if there's something in the WWW::Mechanize area that
>>is suitable, as that seems to have grown a lot since I last looked).
> 
> 
> Getting just the text is a piece of piss with HTML::Parser:

It is, but if someone's been kind enough to do the whole
job for me, and package it up nicely and put it on CPAN,
then I'd be a fool to write it all again, wouldn't I?
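
For reference, here is a minimal sketch of the roll-your-own route
mentioned above: HTML::Parser pulls out the text, then the words are
lower-cased and filtered against a stop list. The sub names
(html_to_text, index_terms) and the tiny %STOP list are illustrative
assumptions, not anything from CPAN or from the earlier post. Stemming
is left out; a CPAN stemmer such as Lingua::Stem could presumably be
applied to the list that index_terms returns.

```perl
use strict;
use warnings;
use HTML::Parser;

# Tiny illustrative stop list (an assumption; a real one would be far longer).
my %STOP = map { $_ => 1 } qw(a an and the of to in is it);

# Return the visible text of an HTML string, skipping <script> and <style>.
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' passes the handler text with entities already decoded.
        # A space is appended after each run so that words on either side
        # of a tag do not run together (this also splits a word that has
        # inline markup in the middle of it).
        text_h => [ sub { $text .= $_[0] . ' ' }, 'dtext' ],
    );
    $parser->unbroken_text(1);                   # one event per text run
    $parser->ignore_elements(qw(script style));  # drop non-visible content
    $parser->parse($html);
    $parser->eof;
    return $text;
}

# Lower-case the text, split it into words, and drop the stop words.
sub index_terms {
    my ($html) = @_;
    my @words = lc(html_to_text($html)) =~ /[a-z0-9]+/g;
    return grep { !$STOP{$_} } @words;
}

print join(' ', index_terms('<p>The cat &amp; the <b>hat</b></p>')), "\n";
```

The word pattern here is ASCII-only, which is a simplification; real
search text would likely need Unicode-aware tokenising.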

-- 
Regards

Stephen Collyer
Netspinner Ltd


More information about the london.pm mailing list