Web scraping frameworks?

Hernan Lopes hernanlopes at gmail.com
Tue Mar 4 22:54:26 GMT 2014


In that case, the following example might be worth a look (it compares
HTML::TreeBuilder::LibXML with HTML::TreeBuilder::XPath):

https://gist.github.com/hernan604/8466937

But remember that HTML::TreeBuilder::LibXML will accept more HTML5 tags than
HTML::TreeBuilder::XPath =)
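
For the "named divs to hash elements" idea quoted below, here is a rough
sketch of the handful of app-specific lines James describes, using plain
HTML::TreeBuilder. The div ids, the column names, the %FIELD_FOR_DIV map and
the extract_row helper are all made up for illustration; adjust them to
whatever the real page and schema look like.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical mapping: div id on the page => key in the resulting hash.
my %FIELD_FOR_DIV = (
    'product-name'  => 'name',
    'product-price' => 'price',
);

# Parse a chunk of HTML and return a hashref of the mapped fields.
# Divs that are missing from the page are simply left out of the hash.
sub extract_row {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my %row;
    for my $div_id (sort keys %FIELD_FOR_DIV) {
        # In scalar context look_down returns the first matching element.
        my $el = $tree->look_down(_tag => 'div', id => $div_id) or next;
        $row{ $FIELD_FOR_DIV{$div_id} } = $el->as_trimmed_text;
    }
    $tree->delete;    # free the element tree explicitly
    return \%row;
}

# Inline sample page so the sketch runs without a network fetch.
my $html = '<html><body>'
         . '<div id="product-name"> Widget </div>'
         . '<div id="product-price">9.99</div>'
         . '</body></html>';

my $row = extract_row($html);
print "$_ => $row->{$_}\n" for sort keys %$row;
```

In real use the HTML would come from something like LWP::Simple's get($url),
and the hashref could probably be handed more or less straight to a
DBIx::Class resultset's create(), assuming the keys match your column names.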



On Tue, Mar 4, 2014 at 7:35 PM, James Laver <james.laver at gmail.com> wrote:

>
> On 4 Mar 2014, at 22:10, DAVID HODGKINSON <davehodg at gmail.com> wrote:
>
> > For what I'm thinking, a way of relating named divs (and lists of) on
> > a page to the hash elements needed for poking into DBIx::Class.
> >
> > As for Web::Scraper, it's Miyagawa-ware, so definitely worth looking
> > at.
>
> Sounds like what you actually want is a handful of app-specific lines of
> code around HTML::TreeBuilder. You can fetch with LWP (maybe LWP::Simple if
> your needs are small) or WWW::Mechanize for more complex stuff, or whatever
> else.
>
> FWIW, last time I got involved in web scraping, this approach worked quite
> well and while it's not immediately reusable, it's pretty straightforward.
>
> James
>

