Web scraping frameworks?

Dave Hodgkinson davehodg at gmail.com
Fri Mar 7 12:58:14 GMT 2014


Web::Scraper::LibXML is about 5x faster. I'll take that.
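A minimal sketch of the switch described above. It assumes Web::Scraper::LibXML behaves as the drop-in replacement its synopsis describes: it exports the same scraper/process DSL but builds the document with HTML::TreeBuilder::LibXML (libxml2) instead of HTML::TreeBuilder::XPath, whose queries go through the pure-Perl XML::XPathEngine. That may also explain why simply installing HTML::TreeBuilder::LibXML changed nothing. As far as I know, that module only takes over from HTML::TreeBuilder::XPath if you call HTML::TreeBuilder::LibXML->replace_original(). The HTML string and the field names (title, links) are hypothetical and only for illustration.

```perl
use strict;
use warnings;

# Drop-in swap: was "use Web::Scraper;". Same DSL, libxml2-backed tree.
use Web::Scraper::LibXML;

# Hypothetical sample page standing in for a real download.
my $html = '<html><body><h1>Title</h1>'
         . '<a href="/a">A</a><a href="/b">B</a></body></html>';

# Declare what to extract: the <h1> text and the text of every <a>.
my $s = scraper {
    process 'h1', title     => 'TEXT';
    process 'a',  'links[]' => 'TEXT';
};

# scrape() accepts a reference to an HTML string.
my $res = $s->scrape(\$html);

print "$res->{title}\n";
print join(',', @{ $res->{links} }), "\n";
```

Because only the `use` line changes, the scraper definitions written against Web::Scraper should carry over unchanged, which makes it cheap to compare both backends on the same pages.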



On Fri, Mar 7, 2014 at 12:48 PM, Dave Hodgkinson <davehodg at gmail.com> wrote:

> 85% of the time is in XML::XPathEngine
>
>
> On Fri, Mar 7, 2014 at 12:40 PM, Dave Hodgkinson <davehodg at gmail.com> wrote:
>
>> He's not touched the repo for a couple of years and even then just for
>> cosmetic things. I don't hold out much hope there.
>>
>> I get the feeling I'm missing an XS something somewhere. Suppose I could
>> profile it.
>>
>>
>>
>>
>> On Fri, Mar 7, 2014 at 12:29 PM, Hernan Lopes <hernanlopes at gmail.com> wrote:
>>
>>> ask miyagawa =)
>>>
>>>
>>> On Fri, Mar 7, 2014 at 8:48 AM, Dave Hodgkinson <davehodg at gmail.com>
>>> wrote:
>>>
>>> > OK, so I've worked out the DSL and am successfully scraping a page.
>>> >
>>> > It's taking a second to parse each page. Seems a bit much.
>>> >
>>> > Installing HTML::TreeBuilder::LibXML seemed like a good idea but didn't
>>> > make any difference.
>>> >
>>> > Any ideas on switches I can flip to make things faster?
>>> >
>>> >
>>> > On Tue, Mar 4, 2014 at 9:44 PM, Dave Cross <dave at dave.org.uk> wrote:
>>> >
>>> > > On 04/03/14 21:33, DAVID HODGKINSON wrote:
>>> > >
>>> > >>
>>> > >> Does something exist?
>>> > >>
>>> > >> If it doesn't does anyone want to help make it happen?
>>> > >>
>>> > >> I *really* don't want to have to write the code all over again ten
>>> > >> times...
>>> > >>
>>> > >
>>> > > Something like Web::Scraper, perhaps?
>>> > >
>>> > >   https://metacpan.org/pod/Web::Scraper
>>> > >
>>> > > Dave...
>>> > >
>>> > >
>>> >
>>>
>>
>>
>


More information about the london.pm mailing list