Web scraping frameworks?

Joel Bernstein joel at fysh.org
Fri Mar 7 13:04:10 GMT 2014


Can you show your numbers?


On 7 March 2014 13:58, Dave Hodgkinson <davehodg at gmail.com> wrote:

>  Web::Scraper::LibXML is about 5x faster. I'll take that.
>
>
>
> On Fri, Mar 7, 2014 at 12:48 PM, Dave Hodgkinson <davehodg at gmail.com>
> wrote:
>
> > 85% of the time is in XML::XPathEngine
> >
> >
> > On Fri, Mar 7, 2014 at 12:40 PM, Dave Hodgkinson <davehodg at gmail.com>
> > wrote:
> >
> >> He's not touched the repo for a couple of years and even then just for
> >> cosmetic things. I don't hold out much hope there.
> >>
> >> I get the feeling I'm missing an XS something somewhere. Suppose I could
> >> profile it.
> >>
> >>
> >>
> >>
> >> On Fri, Mar 7, 2014 at 12:29 PM, Hernan Lopes <hernanlopes at gmail.com>
> >> wrote:
> >>
> >>> ask miyagawa =)
> >>>
> >>>
> >>> On Fri, Mar 7, 2014 at 8:48 AM, Dave Hodgkinson <davehodg at gmail.com>
> >>> wrote:
> >>>
> >>> > OK, so I've worked out the DSL and am successfully scraping a page.
> >>> >
> >>> > It's taking a second to parse each page. Seems a bit much.
> >>> >
> >>> > Installing HTML::TreeBuilder::LibXML seemed like a good idea but didn't
> >>> > make any difference.
> >>> >
> >>> > Any ideas on switches I can flip to make things faster?
> >>> >
> >>> >
> >>> > On Tue, Mar 4, 2014 at 9:44 PM, Dave Cross <dave at dave.org.uk> wrote:
> >>> >
> >>> > > On 04/03/14 21:33, DAVID HODGKINSON wrote:
> >>> > >
> >>> > >>
> >>> > >> Does something exist?
> >>> > >>
> >>> > >> If it doesn't, does anyone want to help make it happen?
> >>> > >>
> >>> > >> I *really* don't want to have to write the code all over again ten
> >>> > >> times...
> >>> > >>
> >>> > >
> >>> > > Something like Web::Scraper, perhaps?
> >>> > >
> >>> > >   https://metacpan.org/pod/Web::Scraper
> >>> > >
> >>> > > Dave...
> >>> > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>
>

