Advice on HTML editing

Peter Corlett abuse at cabal.org.uk
Thu Apr 29 11:31:25 BST 2010


On 29 Apr 2010, at 11:14, Victoria Conlan wrote:
[...]
> I have <bunch of rather complicated web pages>.
> I want to
> 
> loop through them
> 	pick out all the image tags
> 	change the path of some (particular ones, not arbitrary!)
> 	save the new html (possibly with a backup, but not essential)
> (then some other stuff)

For this sort of thing, HTML::Parser should suffice. It does nothing with the parsed HTML unless you give it handlers, so give it a default handler that prints the text it has just parsed, and then it's a case of writing a suitable start handler to look for the appropriate IMG tags, mutate them and print them.

By way of example of such a filter, below is one of my old throwaway scripts. I used it to add a comment after each closing </div> tag giving the attributes of the corresponding <div> tag, so as to have some chance of understanding the excessive div-itis that a web designer had just handed me.

#!/usr/bin/env perl
use warnings;
use strict;

use HTML::Parser;

# Edit the files named on the command line in place, keeping .bak backups.
local $^I = '.bak';

my @divs;

sub start_h {
  my($text, $tagname, $attr) = @_;
  print $text;
  return unless $tagname eq 'div';
  push @divs, $attr;
}

sub end_h {
  my($text, $tagname) = @_;
  print $text;
  return unless $tagname eq 'div';
  my $attr = pop @divs;
  return unless defined $attr and scalar keys %$attr;
  print "[%# ";
  print join " ", map { sprintf '%s="%s"', $_, $attr->{$_} } sort keys %$attr;
  print " %]";
}

my $p = HTML::Parser->new(
  api_version => 3,
  default_h => [sub { print shift }, 'text'], # print by default
  start_h => [\&start_h, 'text, tagname, attr'],
  end_h => [\&end_h, 'text, tagname'],
  comment_h => [ sub { printf '[%%# %s %%]', shift }, 'text' ],
);
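while(my $line = <>) {
  $p->parse($line);
  next unless eof; # true on the last line of each input file
  # Flush and reset per file, while print still goes to that file's output.
  $p->eof;
  @divs = ();
}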
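
Applied to the original IMG problem, the same filter structure might look something like the sketch below. The contents of %prefix_map and the names new_src and rewrite_img_paths are made up for illustration, so substitute your own rule for which paths change. It also assumes the src values contain no HTML entities; a tag it can't patch is passed through untouched.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

use HTML::Parser;

# Hypothetical mapping from old image path prefixes to new ones.
my %prefix_map = ('/old/images/' => '/static/img/');

# Return the new path for an image, or nothing to leave it alone.
sub new_src {
  my($src) = @_;
  for my $old (sort keys %prefix_map) {
    return $prefix_map{$old} . substr($src, length $old)
      if index($src, $old) == 0;
  }
  return;
}

# Parse $html, rewrite the src of matching IMG tags, return the new HTML.
sub rewrite_img_paths {
  my($html) = @_;
  my $out = '';

  my $start_h = sub {
    my($text, $tagname, $attr) = @_;
    if ($tagname eq 'img' and defined $attr->{src}) {
      my $old = $attr->{src};
      my $new = new_src($old);
      if (defined $new) {
        $new =~ s/"/&quot;/g; # the value is re-emitted in double quotes
        # Patch only the src attribute within this tag's original text,
        # so every other attribute keeps its original spelling.
        $text =~ s{(?<![-\w])(src\s*=\s*)(["']?)\Q$old\E\2}{$1"$new"}i;
      }
    }
    $out .= $text;
  };

  my $p = HTML::Parser->new(
    api_version => 3,
    default_h => [sub { $out .= shift }, 'text'], # copy everything else
    start_h => [$start_h, 'text, tagname, attr'],
  );
  $p->parse($html);
  $p->eof;
  return $out;
}

print rewrite_img_paths(
  qq{<p><img src="/old/images/a.png" alt="A"> <img src="/other/b.png"></p>\n}
);
```

To run it over real files, either slurp each file and pass it through rewrite_img_paths, or move the start handler into the in-place loop of the script above and print instead of appending to $out.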



