Arbyte Slides

Tue Dec 9 16:53:20 GMT 2008

On (01/12/08 14:57), Simon Wistow wrote:
> You're right about Gearman in that it's "Not Reliable" but potentially 
> that's a poor choice of words - "Not Guaranteed" is probably a better 
> way of putting it

I explained the sense of reliable in my talk. I note that it is the
same as used in TheSchwartz POD: "TheSchwartz - reliable job
queue". Non-reliable job systems are of course useful and have
advantages for some things but that is not what I currently need. The
slides were not designed for web use but as a complement to what I was
saying, which is the way I think they should be. I posted them anyway
due to popular demand.

> As for it not working - ping me off list or add an RT ticket (if you 
> haven't already) and I'll take a look.

I'd need to do some more work to produce a useful bug report. If I
want something like Gearman in future, though, I might do this.

> As for the TheSchwartz you say it's not easily scalable - it uses 
> Data::ObjectDriver and hence has inbuilt support for sharding.

I didn't know that. You should certainly promote this in the POD.

> Also, you say it doesn't have batching after submission - unless I'm 
> misunderstanding you that's not actually true. If you look in the docs 
> for TheSchwartz::Job you'll find the coalesce param to new()
> 
> http://search.cpan.org/~bradfitz/TheSchwartz-1.07/lib/TheSchwartz/Job.pm#coalesce
> 
> Which allows batching.

Yes, I saw that feature but it is not quite what I meant.  As you say,
the coalesce key is a parameter to new. It is set at the time of job
submission. It must be set by some process outside of TheSchwartz that
does not have access to the job queue. By batching after submission I
mean that jobs can be grouped in the main queue. The batch a job is
assigned to may change depending on what else is in the queue,
including jobs that are added after it. As an example of why you might
want this: given, say, 10 machines and 100 jobs of roughly equal
runtime, you would want them in groups of about 10. Given 1000
jobs you would want groups of 100. This is assuming there is some
overlap in the data used by each job and that this can be exploited to
improve cache hit rates by intelligent grouping.
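
As a rough sketch of the idea (in Python rather than Perl, and not
Arbyte's actual code: the names Job, data_key and rebatch are made up
for illustration), batches are recomputed from whatever is in the
queue at the time, so the group size follows the queue length and a
job's batch can change when later jobs arrive:

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Job:
    job_id: int
    data_key: str  # hypothetical: identifies the data this job reads


def rebatch(queue, machines):
    """Group the jobs currently queued into roughly one batch per machine.

    Called again whenever the queue changes, so a job's batch is not
    fixed at submission time.
    """
    if not queue:
        return []
    size = ceil(len(queue) / machines)
    # Sort by data key so jobs that share data end up next to each other,
    # which is the assumed source of the improved cache hit rate.
    ordered = sorted(queue, key=lambda job: (job.data_key, job.job_id))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

With 10 machines this gives groups of 10 for 100 queued jobs and
groups of 100 for 1000 queued jobs. A coalesce key fixed in new()
cannot do this, because it is chosen before the rest of the queue is
known.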

Also, if you just have a simple key that can be set before submission
you still need something to do that. In this case the JobBuffers
encapsulate this job specific batching and continue to provide a
consistent interface.

> Anyway, I've been thinking of writing something very similar to
> Arbyte so I'm looking forward to it. I notice you mentioned
> something about JobRunner::Simple that fork()s - one of the things
> I wanted was something that ran the various parts of Gearman (the
> injector, the gearmand and a number of workers) or TheSchwartz (the
> injector, the DB and a number of workers) all within the same
> process for testing purposes - is that the same kind of
> functionality that you're looking at providing?

Arbyte can indeed be run all within one process, which has proven
useful both during development of Arbyte itself and of applications
that run on it. It's a lot easier to debug and profile an application
that exists within one process than to have to restart services. This
is running everything in one process without forking at all, though,
so it is perhaps not what you mean. It runs one job at a time itself.
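
To illustrate the shape of that (again in Python, with a hypothetical
InProcessRunner class that is not part of Arbyte), a single-process
runner is little more than an in-memory queue drained by a loop, with
no fork and no separate daemon:

```python
from collections import deque


class InProcessRunner:
    """Hypothetical runner: queue and workers share one process."""

    def __init__(self):
        self._queue = deque()
        self.results = []

    def submit(self, func, *args):
        # Jobs are kept in memory rather than sent to a server or DB.
        self._queue.append((func, args))

    def run(self):
        # One job at a time, in submission order, without forking.
        while self._queue:
            func, args = self._queue.popleft()
            self.results.append(func(*args))
        return self.results
```

Because every job runs in the calling process, a debugger or profiler
attached to that process sees the whole application.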

What else were you considering for your Arbyte-like system? The main
aim of Arbyte was to make it easy to interchange other systems while
allowing users to add things they needed.

-- 
Alistair MacLeod
PGP Key: http://www.biscuitsfruit.org.uk/~alistair/pubkey.asc