[OT] select and sysread problem on solaris

Paul Johnson paul at pjcj.net
Thu Sep 11 02:12:21 BST 2008


I'm looking for a little help in solving a problem which has me stumped,
and I couldn't think of anywhere better to come.  That's not the problem
by the way, but I'll take answers to that as well.

I have about 210 named pipes (FIFOs) and three processes which are
running a select over a third of the pipes each, and then calling
sysread on the pipe before writing out the data to log files.

This has been working well in production for almost two years handling
many GB of data daily.

Recently, another thirty or so pipes have been added to this group and
very occasionally I am noticing a problem whereby select will indicate
that a pipe is ready for reading and sysread will attempt to read from
the pipe, but there is actually nothing there to be read, and so the
sysread call hangs waiting for input.
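
A minimal sketch of the select-then-sysread pattern described above, with
the read end switched to non-blocking mode so that a spurious "ready"
cannot hang the reader.  This is not the production code: the helper names
set_nonblocking and read_available are made up for illustration, and an
anonymous pipe stands in for one of the FIFOs.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use IO::Select;

# Hypothetical helper: put a handle into non-blocking mode.
sub set_nonblocking {
    my ($fh) = @_;
    my $flags = fcntl($fh, F_GETFL, 0) or die "fcntl F_GETFL: $!";
    fcntl($fh, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";
    return $fh;
}

# Hypothetical helper: returns the data read, "" on EOF, or undef if
# select said "ready" but nothing was actually there.
sub read_available {
    my ($fh) = @_;
    my $n = sysread($fh, my $buf, 65536);
    return $buf if defined $n;                      # 0 bytes means EOF
    return undef if $!{EAGAIN} || $!{EWOULDBLOCK};  # nothing to read
    die "sysread: $!";
}

# An anonymous pipe stands in for one of the named pipes.
pipe(my $r, my $w) or die "pipe: $!";
set_nonblocking($r);
my $sel = IO::Select->new($r);

syswrite($w, "hello\n") or die "syswrite: $!";
for my $fh ($sel->can_read(1)) {
    my $data = read_available($fh);
    print defined $data ? "read: $data" : "nothing to read after all\n";
}
```

With the handle in non-blocking mode, an empty pipe makes sysread fail with
EAGAIN instead of waiting, so the loop can log the event and go back to
select.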

Reproducing this problem is difficult, but I currently have the system
in such a state.  The pipe on which the sysread call is waiting is one
of the new pipes.

I can only think of four possible explanations here:

 1.  My code is broken.  I don't think this is the case but don't want
     to rule it out.

 2.  Some other process has read the data in between the select returning
     and the sysread being called.  lsof shows no unexpected processes
     accessing the pipe at the moment and no one should have been on the
     system to have run cat or anything.  last shows nothing suspicious.

 3.  Perl's select is broken.

 4.  The OS is broken.

Is my assumption correct that if select tells you there is something to
be read then there should be something there to be read?  Can anyone
think of any other possibilities?
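
Explanation 2 can be illustrated with a small sketch.  It assumes nothing
about the real system: an anonymous pipe stands in for the FIFO and a
duplicated handle ($other, a made-up name) stands in for the second
reader.  It only shows that readiness reported by select describes the
moment select returned, not the moment of the later read.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $r, my $w) or die "pipe: $!";
open(my $other, '<&', $r) or die "dup: $!";   # stands in for a second reader

syswrite($w, "data\n") or die "syswrite: $!";

my $sel = IO::Select->new($r);
my $ready_before = () = $sel->can_read(0);    # data is waiting

# The "other" reader drains the pipe before we get to it.
sysread($other, my $stolen, 4096);

my $ready_after = () = $sel->can_read(0);     # a sysread on $r would now block

print "ready before: $ready_before, ready after: $ready_after\n";
```

If something like this is happening, a blocking sysread issued after the
first select would wait for the next write to the pipe.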

What is curious to me is that the process writing to the named pipe is
hung.  Is the pipe locked somehow until the sysread call has returned?

Unless I can think of anything better to do, tomorrow I will try to send
some data to the named pipe that is being read to see if that will allow
the sysread to return.  If it does, I should be able to tell whether any
data has been lost from the named pipe, which might indicate that
another process had read it.
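
One way to run that test is sketched below, under the assumption that a
short marker line is acceptable in the log.  The function name poke_fifo
and the marker text are hypothetical.  The FIFO is opened with O_NONBLOCK
so that the test itself cannot hang if no reader has the pipe open.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_NONBLOCK);

# Hypothetical helper: write $marker to the FIFO at $path.
# Returns the number of bytes written, or undef if the FIFO could not be
# opened (for example because no process has it open for reading) or the
# write failed.
sub poke_fifo {
    my ($path, $marker) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_NONBLOCK) or return;
    my $n = syswrite($fh, $marker);
    close $fh;
    return $n;
}

# Usage (path is a placeholder):
#   my $n = poke_fifo('/path/to/fifo', "MARKER " . time . "\n");
#   warn "could not write marker: $!" unless defined $n;
```

If the marker shows up in the log directly after the last line logged
before the hang, nothing was lost in between; a gap would point to
another reader.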

I am running perl-5.8.8 on Solaris 8.  The program writing to the named
pipe is a Java program which is writing to STDOUT.  That program has been
called using system by a Perl wrapper which has reopened STDOUT to the
named pipe.  The program reading from the named pipe is using PERLIO.
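
For readers unfamiliar with that arrangement, a wrapper of this kind might
look roughly like the sketch below.  It is an illustration only, not the
actual wrapper: the function name run_with_stdout_to is invented, and
restoring STDOUT afterwards is an addition for tidiness.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd with STDOUT redirected to $path (a FIFO in
# the setup described above; any writable path works).  Returns the
# child's exit code, or -1 if the command could not be started.
sub run_with_stdout_to {
    my ($path, @cmd) = @_;
    open(my $saved, '>&', \*STDOUT) or die "dup STDOUT: $!";
    open(STDOUT, '>', $path)        or die "open $path: $!";
    my $status = system(@cmd);      # child inherits the redirected STDOUT
    open(STDOUT, '>&', $saved)      or die "restore STDOUT: $!";
    return $status == -1 ? -1 : $status >> 8;
}

# Usage (path and command are placeholders):
#   run_with_stdout_to('/path/to/fifo', 'java', 'SomeProgram');
```

Note that opening a FIFO for writing in this way blocks until some process
has it open for reading.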

I'm open to any hints, suggestions or solutions.

Thanks for reading this far.  Unless you just skipped to the bottom.

-- 
Paul Johnson - paul at pjcj.net
http://www.pjcj.net

