Web scraping frameworks?

Joel Bernstein joel at fysh.org
Fri Mar 7 13:02:54 GMT 2014


On 7 March 2014 12:48, Dave Hodgkinson <davehodg at gmail.com> wrote:

> Installing HTML::TreeBuilder::LibXML seemed like a good idea but didn't
> make any difference.
>

https://metacpan.org/pod/HTML::TreeBuilder::LibXML#BENCHMARK suggests it
ought to increase speed considerably - what does your benchmark look like?

Can you paste your benchmark code? Are you using local HTML so as to
discount network I/O and LWP overhead? Does your scraping perform better if
you use CSS selectors rather than XPath expressions? Does it make much
difference if you scrape more/fewer selectors - that is, are your scrapes
slow due to their complexity or due to fixed overhead in the library?
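
For illustration, a benchmark along those lines might look something like
the sketch below. It is only a rough example, not anyone's actual code: the
generated HTML, the XPath expression and the count_matches helper are all
made up for the purpose, and it assumes both HTML::TreeBuilder::XPath and
HTML::TreeBuilder::LibXML are installed. It parses an in-memory document so
that no network or LWP time is measured, and compares only the two tree
backends on a single XPath query. It does not cover the CSS-versus-XPath
question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use HTML::TreeBuilder::XPath;
use HTML::TreeBuilder::LibXML;

# Hypothetical in-memory document, so no network I/O or LWP time is measured.
my $rows = join '',
    map { qq{<li class="item"><a href="/page/$_">Item $_</a></li>\n} } 1 .. 500;
my $html = "<html><body><ul>\n$rows</ul></body></html>";

# Parse $html with the given tree class and count the nodes matching $xpath.
sub count_matches {
    my ($class, $xpath) = @_;
    my $tree = $class->new;
    $tree->parse($html);
    $tree->eof;
    my @nodes = $tree->findnodes($xpath);
    $tree->delete if $tree->can('delete');    # release the tree if supported
    return scalar @nodes;
}

# Illustrative query only; substitute the expressions your scraper uses.
my $xpath = '//li[@class="item"]/a';

# Run each variant for at least 2 CPU seconds and print a comparison table.
cmpthese(-2, {
    pure_perl => sub { count_matches('HTML::TreeBuilder::XPath',  $xpath) },
    libxml    => sub { count_matches('HTML::TreeBuilder::LibXML', $xpath) },
});
```

Varying the size of the document and the number of queries per parse should
give some idea of whether the time goes on fixed per-document overhead or
grows with the complexity of the scrape.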

/joel

