Email List: Xaustin-group-lX

select issue

To: "'yyyyyyyyyyyyyy@xxxxxxxxxxxxx'" <yyyyyyyyyyyyyy@xxxxxxxxxxxxx>
Subject: select issue
From: "Green, Paul" <yyyyyyyyyy@xxxxxxxxxxx>
Date: Fri, 21 May 2004 12:13:14 -0400
Hello, I am the architect of the POSIX implementation on the Stratus VOS
operating system. Over the past 8 years we have implemented many of the
features of the POSIX standards on VOS.  We have been able to deliver many
new capabilities to our customers by porting open-source software, such as
GNU gcc, Apache, Perl, OpenSSL, Samba, etc., and licensed software such as
MQ Series, to the VOS POSIX environment.  None of this would have been
possible without the hard work that each of you has put into making the
POSIX standard useful, comprehensive, and accurate.  As we have implemented
our POSIX capabilities on top of a proprietary operating system, the work
you have done to insulate features and capabilities from underlying
mechanisms is especially vital to us.  On behalf of our company and our
customers, thank you.


Recently, while porting OpenSSH 3.7.1p2 to our POSIX environment, I
discovered a compatibility problem between our implementation of the select
function and the version that is implemented on other systems.  It seems
that the source code of OpenSSH calculates the number of words needed to
allocate an exact-length byte string for fd_sets rather than taking
sizeof(fd_set) and allocating that. The OpenSSH source code allocated 4
bytes for the used portion of the fd_set; our fd_set variables are 128
bytes. The source code assumed that select only ever referenced the first
nfds bits; we both referenced and set all bits in the fd_set.  Poof, a
corrupted heap.
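
To make the size mismatch concrete, here is a minimal sketch of the kind of
exact-length arithmetic described above. It is not the OpenSSH code; the
helper names fdset_bytes_needed and undersized_for_fd_set are hypothetical,
and the 4-byte word size is an assumption chosen to match the 4 bytes
reported above for a small nfds.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper: bytes needed to hold nfds descriptor bits when the
 * set is stored as an array of words that are word_bytes bytes wide. */
static size_t fdset_bytes_needed(int nfds, size_t word_bytes)
{
    assert(nfds >= 0 && word_bytes > 0);
    size_t word_bits = word_bytes * CHAR_BIT;
    size_t words = ((size_t)nfds + word_bits - 1) / word_bits; /* round up */
    return words * word_bytes;
}

/* Nonzero when a buffer sized by the helper above is smaller than a real
 * fd_set, i.e. when a select() that writes the whole fd_set would overrun. */
static int undersized_for_fd_set(int nfds, size_t word_bytes)
{
    return fdset_bytes_needed(nfds, word_bytes) < sizeof(fd_set);
}
```

With 4-byte words and nfds no larger than 32, the helper yields 4 bytes,
far short of a 128-byte fd_set. Allocating sizeof(fd_set) instead avoids
the question entirely.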

I fully understand that passing anything other than an fd_set to the select
function places OpenSSH in the category of a non-POSIX-compliant program.
My concern, and the reason for this note, is to ensure that our
implementation of select matches the standard, and that the standard is
clear and unambiguous.

I believe I have discovered an ambiguity.

The question is, what is the behavior of select with respect to clearing the
bits in the fd_set?  Should select clear *all* bits in the set, or just the
first nfds bits, or what?

I went to the POSIX standard looking for guidance and was disappointed to
discover that it does not seem to completely resolve the issue.  My
reference for the statements I'm about to make is the select.html page from
the online manual at www.opengroup.org/onlinepubs.

I have no quibble with the first part of the description:


"The nfds argument specifies the range of descriptors to be tested. The
first nfds descriptors shall be checked in each set; that is, the
descriptors from zero through nfds-1 in the descriptor sets shall be
examined."

"If the readfds argument is not a null pointer, it points to an object of
type fd_set that on input specifies the file descriptors to be checked for
being ready to read, and on output indicates which file descriptors are
ready to read."


These sentences make it reasonably clear that select() is to examine and set
only the first nfds bits. They also make it clear that the reference is to an
object of type fd_set, not some cleverly-allocated byte string.

My concern is for the following sentence:


"If the timeout interval expires without the specified condition being true
for any of the specified file descriptors, the objects pointed to by the
readfds, writefds, and errorfds arguments shall have all bits set to 0."


What does the phrase "all bits" mean?  Does it mean the "first nfds bits" or
does it mean "all bits"?
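
The two readings can be sketched as two different clearing routines. This is
only an illustration of the ambiguity, not any system's actual select()
internals; the function names clear_all and clear_first_nfds are
hypothetical.

```c
#include <assert.h>
#include <sys/select.h>

/* Reading 1: "all bits" means every bit in the fd_set object. */
static void clear_all(fd_set *set)
{
    FD_ZERO(set);
}

/* Reading 2: "all bits" means only descriptors 0 through nfds-1;
 * any bit at or above nfds is left exactly as the caller set it. */
static void clear_first_nfds(fd_set *set, int nfds)
{
    assert(nfds >= 0 && nfds <= FD_SETSIZE);
    for (int fd = 0; fd < nfds; fd++)
        FD_CLR(fd, set);
}
```

A caller that sets a bit at or above nfds sees it survive under the second
reading and vanish under the first. A caller that allocated fewer than
sizeof(fd_set) bytes is overrun by the first reading.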

When a POSIX-compliant program passes an fd_set to select, there is no data
corruption issue, but there is still the issue of which bits are zeroed, and
which bits are left unmodified.

Our implementation, written from the standard, cleared all bits.  A test
case we just wrote reveals that Linux and Solaris, at least, clear only the
first nfds bits.  (Actually, they probably work on a word-by-word basis, not
a bit or byte basis.)  This explains why OpenSSH died with a corrupted heap
when we built and ran it.
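
A probe along these lines can show which behavior a given system has. This
is a minimal sketch, not the test case mentioned above: the function name
bit_beyond_nfds_survives and the descriptor number 100 are assumptions made
for illustration. It passes a genuine fd_set, so it stays within the
standard, and it uses an empty pipe so that select() is certain to time out.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if a bit at or above nfds is still set
 * after select() times out, 0 if it was cleared, -1 if the probe failed. */
static int bit_beyond_nfds_survives(void)
{
    const int high_fd = 100;   /* arbitrary; assumed unused by the caller */
    int p[2];

    assert(high_fd < FD_SETSIZE);
    if (pipe(p) != 0)
        return -1;
    /* Make high_fd a valid descriptor so FD_SET on it is well defined. */
    if (p[0] >= high_fd || p[1] >= high_fd || dup2(p[0], high_fd) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(p[0], &readfds);      /* inside the range 0 .. nfds-1 */
    FD_SET(high_fd, &readfds);   /* outside that range */

    struct timeval tv = {0, 0};  /* zero timeout: poll and return at once */
    int n = select(p[0] + 1, &readfds, NULL, NULL, &tv);

    /* Nothing was written to the pipe, so n should be 0 (timed out). */
    int result = (n == 0) ? (FD_ISSET(high_fd, &readfds) ? 1 : 0) : -1;

    close(high_fd);
    close(p[0]);
    close(p[1]);
    return result;
}
```

A result of 1 means the implementation left the bits beyond nfds alone; a
result of 0 means it cleared the whole fd_set. Because implementations may
work a word at a time, the probed bit is placed well above nfds rather than
right next to it.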

We will change our implementation to more closely match the other systems.
I'd just like to see tighter language in the POSIX standard so that other
implementers don't make the same mistake I made.  I think it would be a bonus
if the language in the standard were consistent with legacy implementations.

Thanks
PG
--
Paul Green, Senior Technical Consultant, Stratus Technologies.
Voice: +1 978-461-7557; FAX: +1 978-461-3610; AIM: PaulGreen
