XML::LibXML and HTML (in >=v1.67)

peter@dragonstaff.com peter at dragonstaff.com
Wed Apr 1 11:17:38 BST 2009


Quoting Dave Cross <dave at dave.org.uk>:
> Toby Wintermute wrote:
> What you're trying to parse isn't XML. Therefore you shouldn't expect
> to be able to parse it with an XML parser.
>> Alternatively.. what do YOU use to parse real-world websites that are
>> often not totally valid?

A similar problem arises when writing an XML editor: to do syntax
highlighting you have to be able to parse incomplete or inconsistent XML.
I use a combination of a C++ lexer and Perl regexp matching of XML tokens.
The lexer is scintilla/src/LexHTML.cxx from the Scintilla source at
http://www.scintilla.org/ScintillaDownload.html
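For anyone curious what the regexp side of this looks like, here is a rough sketch in Python. It is a hypothetical illustration, not the logic in LexHTML.cxx and not my editor's code. The tokenizer switches between a "text" state and an "inside a tag" state. Unterminated comments, CDATA sections and quoted attribute values run to the end of the input instead of failing, so half-typed markup still gets coloured. The token kinds (`tagopen`, `attr`, `value` and so on) are names made up for the example.

```python
import re

# Hypothetical sketch of a tolerant, regexp-driven XML tokenizer for syntax
# highlighting. It is NOT the algorithm in Scintilla's LexHTML.cxx.
# Unterminated comments, CDATA sections and quoted values run to the end of
# the input (\Z) instead of raising, so half-typed markup is still tokenized.
TEXT_PATTERNS = [
    ("comment", re.compile(r"<!--.*?(?:-->|\Z)", re.S)),
    ("cdata", re.compile(r"<!\[CDATA\[.*?(?:\]\]>|\Z)", re.S)),
    ("tagopen", re.compile(r"</?[A-Za-z_][\w.:-]*")),
    ("entity", re.compile(r"&[A-Za-z#]\w*;?")),
    ("text", re.compile(r"[^<&]+|[<&]")),  # catch-all: a stray '<' or '&' is plain text
]
TAG_PATTERNS = [
    ("tagclose", re.compile(r"/?>")),
    ("space", re.compile(r"\s+")),
    ("attr", re.compile(r"[A-Za-z_][\w.:-]*")),
    ("equals", re.compile(r"=")),
    ("value", re.compile(r"\"[^\"]*(?:\"|\Z)|'[^']*(?:'|\Z)")),
    ("error", re.compile(r".", re.S)),  # catch-all: any other character inside a tag
]


def tokenize(src):
    """Return (kind, text) pairs that cover src exactly; never raises on bad markup."""
    tokens = []
    pos = 0
    in_tag = False  # True between a tag name such as "<a" and its closing ">"
    while pos < len(src):
        patterns = TAG_PATTERNS if in_tag else TEXT_PATTERNS
        for kind, rx in patterns:
            m = rx.match(src, pos)
            if m:
                break  # each list ends with a catch-all, so some pattern always matches
        tokens.append((kind, m.group()))
        pos = m.end()  # every pattern consumes at least one character
        if kind == "tagopen":
            in_tag = True
        elif kind == "tagclose":
            in_tag = False
    return tokens


if __name__ == "__main__":
    # Note the missing "</b>": the tokenizer just keeps going.
    for kind, text in tokenize('<p class="warn">Hi &amp; <b>bye'):
        print(f"{kind:8} {text!r}")
```

Because the tokens always join back into the original text, a highlighter can map each kind straight to a colour without ever having to reject the buffer.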

Regards, Peter
http://perl.dragonstaff.co.uk