From daveh at hodgkinson.org Tue May 21 12:31:54 2013
From: daveh at hodgkinson.org (Dave Hodgkinson)
Date: Tue, 21 May 2013 12:31:54 +0100
Subject: Quarantining crap HTML?
Message-ID: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org>

In keeping with the spirit of the list, this isn't directly a perl question but it might be part of the solution.

I'm picking up HTML from another site, and that HTML is pretty crappy.

Is there any way of quarantining it so it doesn't bugger up the rest of the page?

From jerome.eteve at gmail.com Tue May 21 12:42:32 2013
From: jerome.eteve at gmail.com (Jérôme Étévé)
Date: Tue, 21 May 2013 12:42:32 +0100
Subject: Quarantining crap HTML?
In-Reply-To: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org>
References: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org>
Message-ID:

What about parsing it with a lax XHTML parser and rendering it?

On 21 May 2013 12:31, Dave Hodgkinson wrote:
> In keeping with the spirit of the list, this isn't directly a perl question
> but it might be part of the solution.
>
> I'm picking up HTML from another site, and that HTML is pretty crappy.
>
> Is there any way of quarantining it so it doesn't bugger up the rest of the
> page?

-- 
Jerome Eteve
+44(0)7738864546
http://www.eteve.net/

From joel at fysh.org Tue May 21 12:45:54 2013
From: joel at fysh.org (Joel Bernstein)
Date: Tue, 21 May 2013 13:45:54 +0200
Subject: Quarantining crap HTML?
In-Reply-To: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org>
References: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org>
Message-ID:

OK, so assuming (you didn't mention but it's sort of implied if you squint a bit) you mean you're inserting it into another HTML page, I wonder:

a) in what way is it crappy?
b) how are you inserting it?
c) what are you trying to avoid? dodgy formatting? broken formatting? malicious code execution? ...?

This question as it stands is too vague to be answered.
But if you give sufficient info we can give it a go.

/joel

On 21 May 2013 13:31, Dave Hodgkinson wrote:
> In keeping with the spirit of the list, this isn't directly a perl question
> but it might be part of the solution.
>
> I'm picking up HTML from another site, and that HTML is pretty crappy.
>
> Is there any way of quarantining it so it doesn't bugger up the rest of the
> page?

From ben at vinnerd.com Tue May 21 12:57:52 2013
From: ben at vinnerd.com (Ben Vinnerd)
Date: Tue, 21 May 2013 12:57:52 +0100
Subject: Quarantining crap HTML?
In-Reply-To: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org>
References: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org>
Message-ID:

You could try putting it in

On 05/21/2013 01:57 PM, Ben Vinnerd wrote:
> You could try putting it in
>
> On 05/21/2013 01:57 PM, Ben Vinnerd wrote:
>> You could try putting it in

> Upon sleeping on it, this was the direction I was headed in.
>
> The problem is the HTML is user-generated and we know where that
> leads.

If I were using that approach, I'd host the HTML on a different domain (to use the Same Origin Policy to protect my site against JS attacks from the HTML) and cover it with anti-evil HTTP headers (to stop people including frame buster scripts).

http://tools.ietf.org/html/draft-ietf-websec-x-frame-options-00

(Not that that would be the first approach I'd consider, I'd tend towards parsing the HTML, running it through a whitelist to determine what attributes were acceptable or not and then spitting out something valid and non-evil though.)

-- 
David Dorward
http://dorward.co.uk/

From th.j.v.hoesel at gmail.com Wed May 22 18:57:14 2013
From: th.j.v.hoesel at gmail.com (Th. J. van Hoesel)
Date: Wed, 22 May 2013 19:57:14 +0200
Subject: Quarantining crap HTML?
In-Reply-To: <519D00A6.90701@tobit.co.uk>
References: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org> <519B650C.7050705@philip-skinner.co.uk> <323F3718-DF28-4D08-99AA-237F64359978@me.com> <519D00A6.90701@tobit.co.uk>
Message-ID:

On 22 May 2013, at 19:30, Dirk Koopman wrote:

> On 22/05/13 16:29, DAVID HODGKINSON wrote:
>>
>> Upon sleeping on it, this was the direction I was headed in.
>>
>> The problem is the HTML is user-generated and we know where that
>> leads.
>>
>
> Carefully constructed, efficient and well tested code?
>

Yay! Handcrafted code is soooo much more compact and efficient compared to the stuff that today's frameworks vomit

From me at philip-skinner.co.uk Thu May 23 08:09:29 2013
From: me at philip-skinner.co.uk (Philip Skinner)
Date: Thu, 23 May 2013 09:09:29 +0200
Subject: Quarantining crap HTML?
In-Reply-To:
References: <10EE4B6A-059A-49AC-8D3B-051592B1DAAB@hodgkinson.org> <519B650C.7050705@philip-skinner.co.uk> <323F3718-DF28-4D08-99AA-237F64359978@me.com>
Message-ID: <519DC0A9.3030302@philip-skinner.co.uk>

On 05/22/2013 07:53 PM, David Dorward wrote:
> On 22 May 2013, at 16:29, DAVID HODGKINSON wrote:
>
>> On 21 May 2013, at 13:14, Philip Skinner wrote:
>>> You can specify the content of an iframe using a javascript call in
>>> the src:
>>>
>
>> Upon sleeping on it, this was the direction I was headed in.
>>
>> The problem is the HTML is user-generated and we know where that
>> leads.
>
> If I were using that approach, I'd host the HTML on a different domain
> (to use the Same Origin Policy to protect my site against JS attacks
> from the HTML) and cover it with anti-evil HTTP headers (to stop
> people including frame buster scripts).
>
> http://tools.ietf.org/html/draft-ietf-websec-x-frame-options-00
>
> (Not that that would be the first approach I'd consider, I'd tend
> towards parsing the HTML, running it through a whitelist to determine
> what attributes were acceptable or not and then spitting out something
> valid and non-evil though.)
>

Plus remember to set a restrictive P3P policy on the domain/subdomain hosting that stuff.

From paulm at paulm.com Fri May 24 01:31:52 2013
From: paulm at paulm.com (Paul Makepeace)
Date: Thu, 23 May 2013 17:31:52 -0700
Subject: npm, PyPi overtake CPAN
Message-ID:

http://modulecounts.com/

... with Rubygems screaming ahead since overtaking CPAN a couple of years ago. And the hugeness of Maven Central.

I'm sure there's plenty of caveats etc but the gradient is probably what's most interesting here; CPAN is relatively static compared with, well, all the others.

From aaron.trevena at gmail.com Fri May 24 05:43:28 2013
From: aaron.trevena at gmail.com (Aaron Trevena)
Date: Fri, 24 May 2013 05:43:28 +0100
Subject: npm, PyPi overtake CPAN
In-Reply-To:
References:
Message-ID:

On 24 May 2013 01:31, Paul Makepeace wrote:
> http://modulecounts.com/
>
> ... with Rubygems screaming ahead since overtaking CPAN a couple of years
> ago. And the hugeness of Maven Central.
>
> I'm sure there's plenty of caveats etc but the gradient is probably what's
> most interesting here; CPAN is relatively static compared with, well, all
> the others.

I had a look at this in a bit more depth before I got snowed under at work
http://blogs.perl.org/users/hashbangperl/2013/03/comparing-apples-and-oranges---rubygems-vs-cpan-part-2.html
- I'll try and finish writing it up in some upcoming time I have sitting in airports next month.
A

-- 
Aaron J Trevena, BSc Hons
http://www.aarontrevena.co.uk
LAMP System Integration, Development and Consulting

From aaron.trevena at gmail.com Fri May 24 06:18:26 2013
From: aaron.trevena at gmail.com (Aaron Trevena)
Date: Fri, 24 May 2013 06:18:26 +0100
Subject: npm, PyPi overtake CPAN
In-Reply-To:
References:
Message-ID:

On 24 May 2013 05:43, Aaron Trevena wrote:
> On 24 May 2013 01:31, Paul Makepeace wrote:
>> http://modulecounts.com/
>>
>> ... with Rubygems screaming ahead since overtaking CPAN a couple of years
>> ago. And the hugeness of Maven Central.
>>
> I had a look at this in a bit more depth before I got snowed
> under at work http://blogs.perl.org/users/hashbangperl/2013/03/comparing-apples-and-oranges---rubygems-vs-cpan-part-2.html

A couple of things worth mentioning are firstly that several issues mentioned in that blog and elsewhere are being addressed
http://www.dagolden.com/index.php/2098/the-annotated-lancaster-consensus/
and also if you look at rubygems uploads it's an astonishingly high proportion of undocumented version 0.001 abandonware.

A.

-- 
Aaron J Trevena, BSc Hons
http://www.aarontrevena.co.uk
LAMP System Integration, Development and Consulting

From james.laver at gmail.com Fri May 24 07:30:25 2013
From: james.laver at gmail.com (James Laver)
Date: Fri, 24 May 2013 07:30:25 +0100
Subject: npm, PyPi overtake CPAN
In-Reply-To:
References:
Message-ID: <25427D54-047B-47C8-A689-E82E114C6623@gmail.com>

On 24 May 2013, at 01:31, Paul Makepeace wrote:

> I'm sure there's plenty of caveats etc but the gradient is probably what's
> most interesting here; CPAN is relatively static compared with, well, all
> the others.

How about the caveat of utility? Whilst npm has a reasonable SNR and gems has so many modules that there are enough useful ones hidden there, pypi is mostly full of crap and not useful when you want to achieve something.
That said, egg basket makes it remarkably easy to host your own mini-pypi server for darkpan you've generated

From evdb at ecclestoad.co.uk Fri May 24 09:20:39 2013
From: evdb at ecclestoad.co.uk (Edmund von der Burg)
Date: Fri, 24 May 2013 09:20:39 +0100
Subject: npm, PyPi overtake CPAN
In-Reply-To: <25427D54-047B-47C8-A689-E82E114C6623@gmail.com>
References: <25427D54-047B-47C8-A689-E82E114C6623@gmail.com>
Message-ID:

CPAN is excellent for its uniformity. The names of modules are predictable, the documentation is consistently presented, the search good, publishing fairly straightforward and installing trivial.

pypi (I find) is appalling. Names are all over the place, searching bad, docs ugly, installing reasonable-ish, publishing confusing and then unsatisfying.

NPM is overwhelmingly superb with regard to publishing. The steps to publish are extensions of the steps for creating the module in the first place, so it is very little extra work. And once published the modules are instantly available (no waiting on mirror syncing) so you can publish and then install as a way to work - which really encourages publishing smaller bits of code. The docs are also good as the convention is becoming a README.md that explains most of it, and a separate site for the details if required.

Both pypi and npm allow you to install from repo urls, which is most handy. I'm not sure if cpan can do this, although I guess there is no reason why not.

NPM is great, I believe, because it is the youngest and has cherry picked the best bits from the others.

Can't comment on Ruby.

Cheers,
Edmund

On 24 May 2013 07:30, James Laver wrote:
> On 24 May 2013, at 01:31, Paul Makepeace wrote:
>
> > I'm sure there's plenty of caveats etc but the gradient is probably what's
> > most interesting here; CPAN is relatively static compared with, well, all
> > the others.
>
> How about the caveat of utility?
> Whilst npm has a reasonable SNR and gems
> has so many modules that there are enough useful ones hidden there, pypi is
> mostly full of crap and not useful when you want to achieve something.
>
> That said, egg basket makes it remarkably easy to host your own mini-pypi
> server for darkpan you've generated
>

From stommepoes at stommepoes.nl Fri May 24 10:59:03 2013
From: stommepoes at stommepoes.nl (Mallory van Achterberg)
Date: Fri, 24 May 2013 11:59:03 +0200
Subject: npm, PyPi overtake CPAN
In-Reply-To:
References: <25427D54-047B-47C8-A689-E82E114C6623@gmail.com>
Message-ID: <20130524095902.GA23738@jkva-vps.colo.transip.net>

I was at the PyGrunn meeting in Groningen recently. Holger Krekel (https://twitter.com/hpk42) gave a talk about the problems of pypi and compared it to CPAN, and then showed some new work he was doing to help fix the multiple issues. He loves CPAN's searching, testing, multiple mirrors etc.

He has slides but I haven't seen them published anywhere, but the praise for CPAN was great and mighty, and it's good for Python to see someone basically wants to make the Cheeseshop more like CPAN :)

-Mallory

From nick at ccl4.org Fri May 24 11:18:51 2013
From: nick at ccl4.org (Nicholas Clark)
Date: Fri, 24 May 2013 11:18:51 +0100
Subject: npm, PyPi overtake CPAN
In-Reply-To: <20130524095902.GA23738@jkva-vps.colo.transip.net>
References: <25427D54-047B-47C8-A689-E82E114C6623@gmail.com> <20130524095902.GA23738@jkva-vps.colo.transip.net>
Message-ID: <20130524101850.GU5011@plum.flirble.org>

On Fri, May 24, 2013 at 11:59:03AM +0200, Mallory van Achterberg wrote:
> I was at the PyGrunn meeting in Groningen recently. Holger Krekel
> (https://twitter.com/hpk42) gave a talk about the problems of pypi
> and compared it to CPAN, and then showed some new work he was doing
> to help fix the multiple issues. He loves CPAN's searching, testing,
> multiple mirrors etc.

Mirrors matter.
(Well, mirrors are important, but that doesn't have the alliteration) because

1) in an online environment, just a URL to a master site is a single point of failure. Start to chain enough dependencies, and however reliable they are at building and testing, it doesn't work if you can't download the code because one server is down
2) in a secured environment it is easi*er* to make a mirror and ship it in
3) Sometimes I am on a $DHH plane. Or in a tunnel. Or in data-roaming-multiple-limb-loss land. So having a mirror is useful.

> He has slides but I haven't seen them published anywhere, but the
> praise for CPAN was great and mighty, and it's good for Python to
> see someone basically wants to make the Cheeseshop more like CPAN :)

Yes, it is nice to know that Perl is doing at least some things right.

I was also surprised to notice that the new (<8 weeks old) official mirror client for PyPI, bandersnatch, is written in Python 2.7.

"Official", because the description and link are from here
https://pypi.python.org/mirrors
to here
https://pypi.python.org/pypi/bandersnatch

Nicholas Clark

From david at cantrell.org.uk Fri May 24 13:47:04 2013
From: david at cantrell.org.uk (David Cantrell)
Date: Fri, 24 May 2013 13:47:04 +0100
Subject: npm, PyPi overtake CPAN
In-Reply-To: <20130524101850.GU5011@plum.flirble.org>
References: <25427D54-047B-47C8-A689-E82E114C6623@gmail.com> <20130524095902.GA23738@jkva-vps.colo.transip.net> <20130524101850.GU5011@plum.flirble.org>
Message-ID: <20130524124704.GH27556@bytemark.barnyard.co.uk>

On Fri, May 24, 2013 at 11:18:51AM +0100, Nicholas Clark wrote:
> Mirrors matter. (Well, mirrors are important, but that doesn't have the
> alliteration) because
>
> 1) in an online environment, just a URL to a master site is a single point of
> failure.
> Start to chain enough dependencies, and however reliable they are
> at building and testing, it doesn't work if you can't download the code
> because one server is down
> 2) in a secured environment it is easi*er* to make a mirror and ship it in
> 3) Sometimes I am on a $DHH plane. Or in a tunnel. Or in
> data-roaming-multiple-limb-loss land. So having a mirror is useful.

4) you can set up funky distorting mirrors, eg cpXXXan or Pinto and can easily use the public toolchain for private code.

-- 
David Cantrell | Minister for Arbitrary Justice

Human Rights left unattended may be removed, destroyed, or damaged by the security services.

From aaron.trevena at gmail.com Fri May 24 13:57:09 2013
From: aaron.trevena at gmail.com (Aaron Trevena)
Date: Fri, 24 May 2013 13:57:09 +0100
Subject: [off topic] Any londoners know a good 'puter repair place near the city/bishopsgate or docklands?
Message-ID:

Hi L.pmers,

My sister in law (trained dancer and personal trainer with almost 0 IT expertise ) has a laptop that won't boot and no backups of her important info on the hard disk - any recommendations of somewhere that can give a good honest appraisal of what's wrong with the laptop and back up the data for her would be much appreciated

Somewhere near Spitalfields would probably be especially helpful

Cheers,

A.

-- 
Aaron J Trevena, BSc Hons
http://www.aarontrevena.co.uk
LAMP System Integration, Development and Consulting

From david at cantrell.org.uk Fri May 24 15:03:10 2013
From: david at cantrell.org.uk (David Cantrell)
Date: Fri, 24 May 2013 15:03:10 +0100
Subject: [off topic] Any londoners know a good 'puter repair place near the city/bishopsgate or docklands?
In-Reply-To:
References:
Message-ID: <20130524140309.GI27556@bytemark.barnyard.co.uk>

On Fri, May 24, 2013 at 01:57:09PM +0100, Aaron Trevena wrote:
> My sister in law (trained dancer and personal trainer with almost 0 IT
> expertise ) has a laptop that won't boot and no backups of her
> important info on the hard disk - any recommendations of somewhere
> that can give a good honest appraisal of what's wrong with the laptop
> and back up the data for her would be much appreciated
>
> Somewhere near Spitalfields would probably be especially helpful

If it's a Mac then go to the shop on Cheshire St.

-- 
David Cantrell | http://www.cantrell.org.uk/david

I apologize if I offended you personally, I intended to do it professionally.
    -- Steve Champeon, on the nanog list

From aaron.trevena at gmail.com Fri May 24 15:12:27 2013
From: aaron.trevena at gmail.com (Aaron Trevena)
Date: Fri, 24 May 2013 15:12:27 +0100
Subject: [off topic] Any londoners know a good 'puter repair place near the city/bishopsgate or docklands?
In-Reply-To: <20130524140309.GI27556@bytemark.barnyard.co.uk>
References: <20130524140309.GI27556@bytemark.barnyard.co.uk>
Message-ID:

On 24 May 2013 15:03, David Cantrell wrote:
> On Fri, May 24, 2013 at 01:57:09PM +0100, Aaron Trevena wrote:
>> My sister in law (trained dancer and personal trainer with almost 0 IT
>> expertise ) has a laptop that won't boot and no backups of her
>> important info on the hard disk - any recommendations of somewhere
>> that can give a good honest appraisal of what's wrong with the laptop
>> and back up the data for her would be much appreciated
>>
>> Somewhere near Spitalfields would probably be especially helpful
>
> If it's a Mac then go to the shop on Cheshire St.

Afraid it's 'doze on a 5 year old samsung laptop :(

A.
-- 
Aaron J Trevena, BSc Hons
http://www.aarontrevena.co.uk
LAMP System Integration, Development and Consulting

From francesco.nidito at gmail.com Fri May 24 15:25:17 2013
From: francesco.nidito at gmail.com (Francesco Nidito)
Date: Fri, 24 May 2013 15:25:17 +0100
Subject: [off topic] Any londoners know a good 'puter repair place near the city/bishopsgate or docklands?
In-Reply-To:
References: <20130524140309.GI27556@bytemark.barnyard.co.uk>
Message-ID:

On 24 May 2013 15:12, Aaron Trevena wrote:
> On 24 May 2013 15:03, David Cantrell wrote:
> > On Fri, May 24, 2013 at 01:57:09PM +0100, Aaron Trevena wrote:
> >> My sister in law (trained dancer and personal trainer with almost 0 IT
> >> expertise ) has a laptop that won't boot and no backups of her
> >> important info on the hard disk - any recommendations of somewhere
> >> that can give a good honest appraisal of what's wrong with the laptop
> >> and back up the data for her would be much appreciated
> >>
> >> Somewhere near Spitalfields would probably be especially helpful
> >
> > If it's a Mac then go to the shop on Cheshire St.
>
> Afraid it's 'doze on a 5 year old samsung laptop :(

Did you already try to remove the hd from the laptop, put it in an external hd enclosure and try to recover data from there?

From 2013 at denny.me Fri May 24 15:35:16 2013
From: 2013 at denny.me (Denny)
Date: Fri, 24 May 2013 15:35:16 +0100
Subject: [off topic] Any londoners know a good 'puter repair place near the city/bishopsgate or docklands?
In-Reply-To:
References: <20130524140309.GI27556@bytemark.barnyard.co.uk>
Message-ID: <1369406116.23291.36.camel@serenity>

On Fri, 2013-05-24 at 15:25 +0100, Francesco Nidito wrote:
> > > On Fri, May 24, 2013 at 01:57:09PM +0100, Aaron Trevena wrote:
> > >> My sister in law (trained dancer and personal trainer with almost 0 IT
> > >> expertise ) has a laptop that won't boot and no backups of her
> > >> important info on the hard disk - any recommendations of somewhere
> > >> that can give a good honest appraisal of what's wrong with the laptop
> > >> and back up the data for her would be much appreciated
>
> Did you already try to remove the hd from the laptop, put it in an external
> hd enclosure and try to recover data from there?

People with "almost 0 IT expertise" don't generally do that sort of thing. I would assume that's why Aaron is asking for professional recommendations.

Regards,
Denny

From dominic at thoreau-online.net Fri May 24 16:47:16 2013
From: dominic at thoreau-online.net (Dominic Thoreau)
Date: Fri, 24 May 2013 16:47:16 +0100
Subject: [off topic] Any londoners know a good 'puter repair place near the city/bishopsgate or docklands?
In-Reply-To: <1369406116.23291.36.camel@serenity>
References: <20130524140309.GI27556@bytemark.barnyard.co.uk> <1369406116.23291.36.camel@serenity>
Message-ID:

On 24 May 2013 15:35, Denny <2013 at denny.me> wrote:
> On Fri, 2013-05-24 at 15:25 +0100, Francesco Nidito wrote:
>
> > Did you already try to remove the hd from the laptop, put it in an external
> > hd enclosure and try to recover data from there?
>
> People with "almost 0 IT expertise" don't generally do that sort of
> thing. I would assume that's why Aaron is asking for professional
> recommendations.

To summarise: The laptop is near Spitalfields. Aaron, on the other hand, is in Cornwall. Hence his asking for local information.

-- 
Unde venistis vos manebit, donec completa est. -- Tenax D.
From francesco.nidito at gmail.com Fri May 24 17:26:50 2013
From: francesco.nidito at gmail.com (Francesco Nidito)
Date: Fri, 24 May 2013 17:26:50 +0100
Subject: [off topic] Any londoners know a good 'puter repair place near the city/bishopsgate or docklands?
In-Reply-To:
References: <20130524140309.GI27556@bytemark.barnyard.co.uk> <1369406116.23291.36.camel@serenity>
Message-ID:

On 24 May 2013 16:47, Dominic Thoreau wrote:
> On 24 May 2013 15:35, Denny <2013 at denny.me> wrote:
>
> > On Fri, 2013-05-24 at 15:25 +0100, Francesco Nidito wrote:
> >
> > > Did you already try to remove the hd from the laptop, put it in an external
> > > hd enclosure and try to recover data from there?
> >
> > People with "almost 0 IT expertise" don't generally do that sort of
> > thing. I would assume that's why Aaron is asking for professional
> > recommendations.
>
> To summarise: The laptop is near Spitalfields. Aaron, on the other hand,
> is in Cornwall. Hence his asking for local information.

Sorry, I actually misunderstood this... :-(

From philippe.bruhat at free.fr Sat May 25 21:07:04 2013
From: philippe.bruhat at free.fr (Philippe Bruhat (BooK))
Date: Sat, 25 May 2013 22:07:04 +0200
Subject: [off topic] Any londoners know a good 'puter repair place near the city/bishopsgate or docklands?
In-Reply-To: <1369406116.23291.36.camel@serenity>
References: <20130524140309.GI27556@bytemark.barnyard.co.uk> <1369406116.23291.36.camel@serenity>
Message-ID: <20130525200704.GQ5022@zlott>

On Fri, May 24, 2013 at 03:35:16PM +0100, Denny wrote:
> On Fri, 2013-05-24 at 15:25 +0100, Francesco Nidito wrote:
> >
> > Did you already try to remove the hd from the laptop, put it in an external
> > hd enclosure and try to recover data from there?
>
> People with "almost 0 IT expertise" don't generally do that sort of
> thing. I would assume that's why Aaron is asking for professional
> recommendations.
>

But if it comes to this, and the disk is really in bad shape, I would heartily recommend GNU ddrescue.

http://www.gnu.org/software/ddrescue/ddrescue.html

-- 
Philippe Bruhat (BooK)

A substitute is never as good as the genuine article.
(Moral from Groo The Wanderer #67 (Epic))