Possible bug with open between unshare(CLONE_NEWNS) calls

From: Boris Burkov <boris@bur.io>
To: linux-fsdevel@vger.kernel.org
Cc: daan.j.demeyer@gmail.com
Subject: Possible bug with open between unshare(CLONE_NEWNS) calls
Date: Wed, 15 Jan 2025 10:56:08 -0800
Message-ID: <20250115185608.GA2223535@zen.localdomain>

Hello,

If we run the following C code:

unshare(CLONE_NEWNS);
int fd = open("/dev/loop0", O_RDONLY);
unshare(CLONE_NEWNS);

Then after the second unshare, the mount hierarchy created by the first
unshare is fully dereferenced and gets torn down, leaving the file
pointed to by fd with a broken dentry.

Specifically, subsequent calls to d_path on its path resolve to
"/loop0". I was able to confirm this with drgn, and it has caused an
unexpected failure in mkosi/systemd-repart attempting to mount a btrfs
filesystem through such an fd, since btrfs uses d_path to resolve the
source device file path fully.

I confirmed that this is due to the mount namespace from the first
unshare going away by:
1. tracing the copy_root path in the kernel with printk and bpftrace
2. rewriting my test program to fork after the first unshare to keep
   that namespace referenced. In this case, the fd is not broken after
   the second unshare.
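
For anyone who wants to try this, here is a minimal sketch of a
standalone reproducer. It is not my original test program. The helper
names fd_path() and repro() and the REPRO_MAIN build guard are made up
for illustration, and it assumes that readlink on /proc/self/fd/<fd>
reports the same result as d_path in the kernel. It needs CAP_SYS_ADMIN
for unshare(CLONE_NEWNS).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve fd to a path through /proc/self/fd/<fd>.
 * Returns 0 on success, -1 on error.
 */
int fd_path(int fd, char *buf, size_t len)
{
	char link[64];
	ssize_t n;

	if (len == 0)
		return -1;
	snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
	n = readlink(link, buf, len - 1);	/* readlink does not NUL-terminate */
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 0;
}

/*
 * Hypothetical helper: open dev between two unshare(CLONE_NEWNS) calls
 * and print the path the fd resolves to before and after the second one.
 */
int repro(const char *dev)
{
	char before[PATH_MAX], after[PATH_MAX];
	int fd;

	if (unshare(CLONE_NEWNS) < 0) {
		perror("first unshare");
		return -1;
	}
	fd = open(dev, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	if (fd_path(fd, before, sizeof(before)) < 0 ||
	    unshare(CLONE_NEWNS) < 0 ||
	    fd_path(fd, after, sizeof(after)) < 0) {
		perror("repro");
		close(fd);
		return -1;
	}
	printf("before: %s\nafter:  %s\n", before, after);
	close(fd);
	return 0;
}

#ifdef REPRO_MAIN
int main(void)
{
	return repro("/dev/loop0") ? 1 : 0;
}
#endif
```

Build with something like "cc -DREPRO_MAIN repro.c" and run it as root.
If the behavior described above is present, I would expect the "after"
line to differ from the "before" line, but I have only confirmed the
"/loop0" result with drgn, not with this sketch.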

My question is:
Is this expected behavior with respect to mount reference counts and
namespace teardown?

If I mount a filesystem and a running program holds an open file
descriptor in that filesystem, I would expect unmounting it to fail
with EBUSY. It stands to reason that the automatic unmount that happens
when the mount namespace of the first unshare is torn down should
respect similar semantics: either return EBUSY or at least behave like
a lazy umount and not wreck the still-referenced mount objects.

If this behavior seems like a bug to people better versed in the
expected behavior of namespaces, I would be happy to work on a fix.

Thanks,
Boris
