Web scraping frameworks?

Dave Hodgkinson davehodg at gmail.com
Fri Mar 7 19:19:51 GMT 2014


HTTP::Async dropped in nicely. The remote end appears to be throttling somehow,
but I doubt most sites will.
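For anyone landing on this thread, here is a minimal sketch of the combination it arrives at: HTTP::Async fetching pages concurrently, and Web::Scraper::LibXML used as a drop-in replacement for Web::Scraper. This is an illustrative assumption, not the scraper Dave actually wrote (that code wasn't posted). The selectors, the `slots` value of 5 and taking URLs from the command line are all hypothetical choices.

```perl
#!/usr/bin/env perl
# Hypothetical sketch: concurrent fetching with HTTP::Async, parsing with
# Web::Scraper::LibXML (same DSL as Web::Scraper, libxml2-backed tree builder).
use strict;
use warnings;
use HTTP::Async;
use HTTP::Request;
use Web::Scraper::LibXML;    # the only change from "use Web::Scraper;"

# Illustrative selectors only; the real scraper's DSL was not posted.
my $scraper = scraper {
    process 'title', 'title'      => 'TEXT';
    process 'h1',    'headings[]' => 'TEXT';
};

# URLs to fetch come from the command line (an assumption for this sketch).
my @urls = @ARGV;

if (@urls) {
    # 'slots' caps how many requests are in flight at once.
    my $async = HTTP::Async->new( slots => 5 );
    $async->add( HTTP::Request->new( GET => $_ ) ) for @urls;

    # Responses arrive in completion order, not in the order they were added.
    while ( my $response = $async->wait_for_next_response ) {
        unless ( $response->is_success ) {
            warn 'Request failed: ', $response->status_line, "\n";
            next;
        }
        my $data = $scraper->scrape( $response->decoded_content );
        printf "%s\n", $data->{title} // '(no title)';
    }
}
```

The scraper definition itself is untouched by the switch to the LibXML backend. If the remote end starts throttling, lowering `slots` may be the first knob worth trying.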


On Fri, Mar 7, 2014 at 1:28 PM, Dave Hodgkinson <davehodg at gmail.com> wrote:

> I'll give a talk!
>
> Apropos previous discussions, I'll also try HTTP::Async instead of my
> usual route 1. I think it fits better with the approach I'm taking at the
> moment.
>
>
>
>
> On Fri, Mar 7, 2014 at 1:11 PM, Leo Lapworth <leo at cuckoo.org> wrote:
>
>> Hi Dave,
>>
>> When you've finished please could you write a blog post?
>>
>> It would be a better way of sharing what you are doing (and you'd share
>> with more people), and we'd also get a summary rather than blow-by-blow
>> updates.
>>
>> Thanks
>>
>> Leo
>>
>>
>>
>> On 7 March 2014 12:58, Dave Hodgkinson <davehodg at gmail.com> wrote:
>>
>> >  Web::Scraper::LibXML is about 5x faster. I'll take that.
>> >
>> >
>> >
>> > On Fri, Mar 7, 2014 at 12:48 PM, Dave Hodgkinson <davehodg at gmail.com> wrote:
>> >
>> > > 85% of the time is in XML::XPathEngine
>> > >
>> > >
>> > > On Fri, Mar 7, 2014 at 12:40 PM, Dave Hodgkinson <davehodg at gmail.com> wrote:
>> > >
>> > >> He's not touched the repo for a couple of years and even then just for
>> > >> cosmetic things. I don't hold out much hope there.
>> > >>
>> > >> I get the feeling I'm missing an XS something somewhere. Suppose I could
>> > >> profile it.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Fri, Mar 7, 2014 at 12:29 PM, Hernan Lopes <hernanlopes at gmail.com> wrote:
>> > >>
>> > >>> ask miyagawa =)
>> > >>>
>> > >>>
>> > >>> On Fri, Mar 7, 2014 at 8:48 AM, Dave Hodgkinson <davehodg at gmail.com> wrote:
>> > >>>
>> > >>> > OK, so I've worked out the DSL and am successfully scraping a page.
>> > >>> >
>> > >>> > It's taking a second to parse each page. Seems a bit much.
>> > >>> >
>> > >>> > Installing HTML::TreeBuilder::LibXML seemed like a good idea but didn't
>> > >>> > make any difference.
>> > >>> >
>> > >>> > Any ideas on switches I can flip to make things faster?
>> > >>> >
>> > >>> >
>> > >>> > On Tue, Mar 4, 2014 at 9:44 PM, Dave Cross <dave at dave.org.uk> wrote:
>> > >>> >
>> > >>> > > On 04/03/14 21:33, DAVID HODGKINSON wrote:
>> > >>> > >
>> > >>> > >>
>> > >>> > >> Does something exist?
>> > >>> > >>
>> > >>> > >> If it doesn't does anyone want to help make it happen?
>> > >>> > >>
>> > >>> > >> I *really* don't want to have to write the code all over again ten
>> > >>> > >> times...
>> > >>> > >>
>> > >>> > >
>> > >>> > > Something like Web::Scraper, perhaps?
>> > >>> > >
>> > >>> > >   https://metacpan.org/pod/Web::Scraper
>> > >>> > >
>> > >>> > > Dave...
>> > >>> > >
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>
>> > >>
>> > >
>> >
>>
>
>


More information about the london.pm mailing list