Speed v Version

Peter Edwards peter at dragonstaff.co.uk
Thu Jun 2 09:29:41 BST 2011


On 1 June 2011 09:46, Dirk Koopman <djk at tobit.co.uk> wrote:

> I am contemplating providing "encouragement" to a customer to upgrade from
> 5.8.7 to something more modern. One of the overriding issues is "speed". The
> customer is fixated with "speed".
>
> Unfortunately one of the major things the customer's clients do is
> "replicate" their ISAM data into databases, usually MS-SQL via DBI and
> DBD::ODBC. The ISAM data is always on Linux (and a few Unix) boxes. The
> replication is a batch process that must complete overnight. Currently it is
> a fairly close run thing.
>
> I don't suppose anyone has done any speed benchmarking on the various perls
> to date? Still less on newer DBIs etc? All the customer's modules are circa
> 2005.
>


I had to deal with a similar situation where a customer had an old 4GL
package using (slow) C-ISAM databases. When they're still using green-screen
systems for parts of the system there's not much you can do, particularly if
the records are mastered there.
If you have access to the old system's code, one way is to add a layer
between the app and the C-ISAM db that shadow-copies all writes to a
transaction log; you can then replay that log against your SQL db without
taking the old system off-line.
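
A rough sketch of that shadow layer in Perl (the wrapped writer object, its
write_record method and the JSON-lines log format are all my invention for
illustration, not anything from the real system):

```perl
package ShadowWriter;
use strict;
use warnings;
use IO::Handle;
use JSON::PP qw(encode_json);

# Wrap the existing writer object; every write also goes to a
# transaction log, one JSON document per line.
sub new {
    my ($class, %args) = @_;
    open my $log, '>>', $args{log_file}
        or die "open $args{log_file}: $!";
    $log->autoflush(1);    # don't lose log lines in a buffer on a crash
    return bless { inner => $args{inner}, log => $log }, $class;
}

sub write_record {
    my ($self, $key, $record) = @_;
    # Log first, so a replay can't miss a write the app saw succeed.
    print { $self->{log} } encode_json({ key => $key, rec => $record }), "\n";
    return $self->{inner}->write_record($key, $record);
}

1;
```

Replaying is then just a loop over the log file turning each line back into
the equivalent INSERT/UPDATE via DBI.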

For speeding up the copy, you're almost certainly bottlenecked on disk I/O
rather than Perl speed:
1. One option is to use mirrored disks: take one mirror off-line, sync the
dbs from it, then return the disk to the mirror (ZFS or mirrored RAID would
both work).
2. Another is to copy all the C-ISAM db files at once off-server to, say, a
farm of AWS nodes and run the copy to SQL in parallel.
3. Yet another option (which I ended up using) is to fetch the keys of all
the C-ISAM records, along with a "changed since last sync" flag held in each
record, then diff the record IDs against what you've previously synced to
SQL and copy across the missing records. For existing records where the
changed flag is set, select those records and copy them over.
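
The diff in option 3 is basically a couple of hash lookups. A minimal
sketch, assuming you can pull ID/flag pairs off the C-ISAM side and the
already-synced IDs out of SQL (records_to_copy and its arguments are names
I've made up):

```perl
use strict;
use warnings;

# $isam_keys: { record_id => changed_flag } read from the C-ISAM side.
# $synced:    { record_id => 1 } for IDs already in SQL, e.g. built from
#             $dbh->selectall_arrayref('SELECT id FROM some_table') via DBI.
sub records_to_copy {
    my ($isam_keys, $synced) = @_;
    # Copy anything SQL hasn't seen yet, plus anything flagged as changed.
    return [ sort { $a <=> $b }
             grep { !$synced->{$_} || $isam_keys->{$_} }
             keys %$isam_keys ];
}
```

On the real system the changed flag gets cleared after a successful sync, so
the next run only touches new and modified records.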
By far the fastest route for the copy itself is to dump to a flat file in
CSV format and then feed that to your target DB's bulk loader, as people
have said. I was using that for tens of millions of timesheet rows and it
was fast enough.
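
The dump step needs nothing heavier than correct CSV quoting (Text::CSV_XS
is nicer if you have it; this hand-rolled version is just to show the
shape):

```perl
use strict;
use warnings;

# Minimal RFC 4180-style escaping: quote fields containing commas,
# quotes or newlines, doubling any embedded quotes.
sub csv_line {
    my @fields = @_;
    return join(',', map {
        my $f = defined $_ ? $_ : '';
        if ($f =~ /[",\n]/) {
            $f =~ s/"/""/g;
            $f = qq{"$f"};
        }
        $f;
    } @fields) . "\n";
}

# Dump rows (arrayref of arrayrefs) to a flat file for a bulk loader.
sub dump_rows_to_csv {
    my ($file, $rows) = @_;
    open my $out, '>', $file or die "open $file: $!";
    print {$out} csv_line(@$_) for @$rows;
    close $out or die "close $file: $!";
    return scalar @$rows;
}
```

Then point the target's bulk loader at the file (BULK INSERT or bcp on
MS-SQL, LOAD DATA INFILE on MySQL); the win is skipping per-row statement
overhead in DBI.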

Regards, Peter


More information about the london.pm mailing list