Email List: Xaustin-group-lX

link(2) and symlinks

To: yyyyyyyyyyyyyy@xxxxxxxxxxxxx
Subject: link(2) and symlinks
From: Eric Blake <yyyy@xxxxxxx>
Date: Wed, 16 Mar 2005 06:56:34 -0700

Issues were recently raised on the GNU coreutils mailing list about the
behavior of ln(1) and link(1), based on the underlying link(2), in the
presence of symlinks (several messages into the thread starting here:
http://lists.gnu.org/archive/html/bug-coreutils/2005-03/msg00083.html).
It is probably worth an aardvark or two, but I would like some discussion
first.

To begin with, the specification for link(2) (XSH page 692, line 22935) is
ambiguous: it is not clear whether a "pathname naming an existing file"
implies dereferencing a symlink before the link is created.  For example,
Paul Eggert informed me that Solaris 9 allows the creation of a hardlink
to an existing symlink (even if the symlink is dangling or loops), while
OpenBSD 3.4 resolves the symlink and creates the hardlink to the
underlying file (or fails with ELOOP or ENOENT).  Should both
implementation choices be allowed, and the wording improved to make this
clearer?
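
A quick way to see which of the two behaviors a given system has is a probe
along the following lines.  This is only a sketch: it assumes that link(1)
calls link(2) directly and that mktemp -d is available (neither is
guaranteed on every system), and the file names target, sym, dangling,
hard1 and hard2 are made up for the illustration.

```shell
#!/bin/sh
# Sketch: probe whether link(1), and so link(2), follows a symlink source.
# Assumes mktemp -d and link(1) exist; file names here are hypothetical.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch target
ln -s target sym          # symlink to an existing file
ln -s missing dangling    # dangling symlink

# probe SRC DST: report what a hard link made from SRC turns out to be.
probe() {
  if link "$1" "$2" 2>/dev/null; then
    if [ -L "$2" ]; then
      echo "linked the symlink itself"
    else
      echo "linked the resolved file"
    fi
  else
    echo "link failed"
  fi
}

existing=$(probe sym hard1)
broken=$(probe dangling hard2)
echo "symlink to existing file: $existing"
echo "dangling symlink:         $broken"

cd / && rm -rf "$dir"
```

A system that behaves as described for Solaris 9 would be expected to report
"linked the symlink itself" in both cases, while one that behaves as
described for OpenBSD 3.4 would report "linked the resolved file" for the
first case and "link failed" for the dangling one.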

From here, consider the files:
$ touch A
$ ln -s A B
$ ln -s B C

The implementation of link(1) (XCU page 547, line 21112) relies on the
ambiguity of link(2).  So, on Solaris 9, "link C D" creates D as a
hardlink to C, whereas on OpenBSD 3.4, "link C D" creates D as a hardlink
to A, because that is the behavior of their respective link(2).  Thus Solaris
link(1) provides functionality not possible with a compliant ln(1), but
OpenBSD link(1) is redundant with ln(1).  Furthermore, if you then run "rm
A", Solaris has preserved the metadata (C is a symlink), but lost the data
(the contents of A are gone), while OpenBSD has preserved the data.  Are
both implementations acceptable, and should the wording of link(1) be
touched up to mention this difference?
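
The "rm A" consequence can be checked with a sketch like the one below.  It
recreates A, B and C in a scratch directory, assumes mktemp -d and link(1)
are available, and uses the literal string "data" as stand-in file contents.

```shell
#!/bin/sh
# Sketch: does the data in A survive "rm A" through a link made by link(1)?
# Assumes mktemp -d and link(1) exist; "data" is placeholder content.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

echo data > A
ln -s A B
ln -s B C

if link C D 2>/dev/null; then
  rm A
  # If D is a hard link to A, its contents are still readable.
  # If D is a hard link to the symlink C, it now dangles and cat fails.
  if [ "$(cat D 2>/dev/null)" = data ]; then
    outcome="data preserved (D is a hard link to the file)"
  else
    outcome="data lost (D is a hard link to a symlink)"
  fi
else
  outcome="link failed"
fi
echo "$outcome"

cd / && rm -rf "$dir"
```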

On the other hand, the wording for ln(1) (XCU page 549 line 21198) seems
like it was trying to prevent hard links to symlinks, but remains
ambiguous.  The standard is clear that "ln B E" must dereference B before
calling link(2), so that creating E as a hardlink to B is non-compliant.
Yet this is the behavior of GNU ln on systems that support hardlinks to
symlinks, and also of Solaris /bin/ln, so the wording in step 3 is
incompatible with historic implementations.  Then there is the question of
whether "using the object that source_file references" implies a single
dereference, even if the result is still a symlink, or chasing the symlink
until the actual file (or a dangling link or loop) is found.
Running "ln C D" with Solaris /usr/bin/xpg4/ln makes D a hardlink to B
(only one level of dereferencing), while FreeBSD /bin/ln makes D a
hardlink to A (full symlink resolution was performed).  If POSIX is going
to require ln to chase links, should it require chasing the link all the
way rather than relying on the ambiguity of link(2)?
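
To find out how far a particular ln chases the chain, the inode of D can be
compared against those of A, B and C.  The following is a sketch only: it
assumes mktemp -d is available and that "ls -di" prints the inode number of
a symlink operand itself rather than of its target; the inode helper is a
hypothetical name introduced for the illustration.

```shell
#!/bin/sh
# Sketch: report which file in the chain C -> B -> A "ln C D" hard-links to.
# Assumes mktemp -d exists and that "ls -di" does not follow symlinks.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch A
ln -s A B
ln -s B C

# inode FILE: print the inode number of FILE itself (hypothetical helper).
inode() {
  set -- $(ls -di "$1")
  echo "$1"
}

if ln C D 2>/dev/null; then
  d=$(inode D)
  if [ "$d" = "$(inode A)" ]; then
    depth="A (full symlink resolution)"
  elif [ "$d" = "$(inode B)" ]; then
    depth="B (one level of dereferencing)"
  elif [ "$d" = "$(inode C)" ]; then
    depth="C (no dereferencing)"
  else
    depth="unknown"
  fi
else
  depth="ln failed"
fi
echo "ln C D made D a hard link to: $depth"

cd / && rm -rf "$dir"
```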

--
Life is short - so eat dessert first!

Eric Blake             yyyy@xxxxxxx
