Web scraping frameworks?

Hernan Lopes hernanlopes at gmail.com
Tue Mar 4 23:42:56 GMT 2014


When someone goes really deep into web scraping, they will encounter the
problems I cited.
I'm not sure how to handle all those situations with Web::Scraper::LibXML.
Examples of that would be great.


On Tue, Mar 4, 2014 at 8:39 PM, Hernan Lopes <hernanlopes at gmail.com> wrote:

> Yeah, it's by Mr. Miyagawa.
>
> I find HTML::TreeBuilder::LibXML more complete than
> HTML::TreeBuilder::XPath, because HTML::TreeBuilder::XPath seems unable to
> parse certain tags by default, and of course being from Miyagawa is
> a huge +++. Good that he took some time to dig into the scraping area.
>
>
> On Tue, Mar 4, 2014 at 8:21 PM, Pierre M <piemas25 at gmail.com> wrote:
>
>> > But remember HTML::TreeBuilder::LibXML will accept more html5 tags
>> > than HTML::TreeBuilder::XPath =)
>> Would you say that HTML::TreeBuilder::LibXML is always better than
>> HTML::TreeBuilder::XPath?
>> I notice that HTML::TreeBuilder::LibXML depends on HTML::TreeBuilder::XPath
>> as well as on XML::LibXML and on Web::Scraper, which is surprising.
>>
>> What is the advantage of LibXML over XPath?
>>
>>
>> Oh, I just found Web::Scraper::LibXML - also by Miyagawa.
>>
>
>


More information about the london.pm mailing list