Web scraping frameworks?

Dave Hodgkinson davehodg at gmail.com
Fri Mar 7 12:48:01 GMT 2014


85% of the time is spent in XML::XPathEngine.


On Fri, Mar 7, 2014 at 12:40 PM, Dave Hodgkinson <davehodg at gmail.com> wrote:

> He's not touched the repo for a couple of years and even then just for
> cosmetic things. I don't hold out much hope there.
>
> I get the feeling I'm missing an XS module somewhere. I suppose I could
> profile it.
>
> On Fri, Mar 7, 2014 at 12:29 PM, Hernan Lopes <hernanlopes at gmail.com> wrote:
>
>> Ask miyagawa (the Web::Scraper author) =)
>>
>>
>> On Fri, Mar 7, 2014 at 8:48 AM, Dave Hodgkinson <davehodg at gmail.com> wrote:
>>
>> > OK, so I've worked out the DSL and am successfully scraping a page.
>> >
>> > It's taking about a second to parse each page, which seems a bit much.
>> >
>> > Installing HTML::TreeBuilder::LibXML seemed like a good idea but didn't
>> > make any difference.
>> >
>> > Any ideas on switches I can flip to make things faster?
>> >
>> >
>> > On Tue, Mar 4, 2014 at 9:44 PM, Dave Cross <dave at dave.org.uk> wrote:
>> >
>> > > On 04/03/14 21:33, DAVID HODGKINSON wrote:
>> > >
>> > >>
>> > >> Does something exist?
>> > >>
>> > >> If it doesn't does anyone want to help make it happen?
>> > >>
>> > >> I *really* don't want to have to write the code all over again ten
>> > >> times...
>> > >>
>> > >
>> > > Something like Web::Scraper, perhaps?
>> > >
>> > >   https://metacpan.org/pod/Web::Scraper
>> > >
>> > > Dave...


More information about the london.pm mailing list