XML::LibXML and HTML (in >=v1.67)

Robin Berjon robin at berjon.com
Wed Apr 1 19:07:46 BST 2009


On Apr 1, 2009, at 09:11, mirod wrote:
> The only problem I found was with tags like '<table 1>', which get
> output by the as_XML method as '<table 1="1">', which is not quite
> well-formed XML. This doesn't prevent you from using XPath on it
> with HTML::TreeBuilder::XPath, though.

It's more than "not quite well-formed": it's simply invalid XML :)
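A quick way to see this is to hand the as_XML output to a strict XML parser. The sketch below uses Python's standard xml.etree.ElementTree (an expat-based parser, standing in here for XML::LibXML). The is_well_formed helper is hypothetical. XML attribute names may not begin with a digit, so the first string should be rejected, while the same markup with a legal name parses.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if Python's expat-based parser accepts markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # expat reports e.g. "not well-formed (invalid token)"
        return False

# The as_XML output quoted above: the attribute name "1" starts with a digit,
# which the XML Name production does not allow.
print(is_well_formed('<table 1="1"></table>'))   # False
# Same structure, but with a legal attribute name.
print(is_well_formed('<table a1="1"></table>'))  # True
```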

If you want to understand HTML documents the way browsers do, à la
HTML5, then you will have documents that, in the general case, simply
cannot be converted to XML, because there are more possible HTML DOMs
than there are XML DOMs: an HTML parser will happily produce an
attribute named "1", but XML has no way to serialize it. That's fine,
though, because it doesn't prevent you from using any XML tools so
long as you stick to the abstract (DOM) level rather than the
serialized markup.
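To make "stick to the abstract level" concrete, here is a rough Python sketch. Python's standard html.parser stands in for HTML::TreeBuilder, and the ElementCollector class is hypothetical. The HTML parser reports an attribute named "1" without complaint, and querying the resulting structure works fine. Only serializing it as XML would be a problem.

```python
from html.parser import HTMLParser

class ElementCollector(HTMLParser):
    """Hypothetical toy 'DOM': records each start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        self.elements.append((tag, dict(attrs)))

collector = ElementCollector()
collector.feed("<table 1><tr><td>cell</td></tr></table>")
collector.close()

# Query at the abstract level: find elements carrying an attribute named "1".
matches = [tag for tag, attrs in collector.elements if "1" in attrs]
print(collector.elements)  # [('table', {'1': None}), ('tr', {}), ('td', {})]
print(matches)             # ['table']
```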

-- 
Robin Berjon - http://berjon.com/
     Feel like hiring me? Go to http://robineko.com/



