| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-06-15 03:59:45 +0530 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-06-15 03:59:45 +0530 |
| commit | 7e0e7bd60d4a812b694c477716597fcb038b00cb | |
| tree | 4ff61d47485803e7dacab1c8ddef0a4c11b512da /Documentation | |
| parent | ff8747aacaff8266dd751b8a8648fb728dcc3b21 | |
| parent | aa5c4fe3ba0cb2af90bbcfa7a8ef4fefcd5c2370 | |
Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Reduce pipe->mutex contention by pre-allocating pages outside the
lock in anon_pipe_write().
anon_pipe_write() called alloc_page() once per page while holding
pipe->mutex. The allocation can sleep doing direct reclaim and runs
memcg charging, which extends the critical section and stalls any
concurrent reader on the same mutex. Now up to 8 pages are
pre-allocated before the mutex is taken, leftovers are recycled
into the per-pipe tmp_page[] cache before unlock, and any remainder
is released after unlock, keeping the allocator out of the critical
section on both sides. On a writers x readers sweep with 64KB
writes against a 1 MB pipe throughput improves 6-28% and average
write latency drops 5-22%; under memory pressure - when the cost of
holding the mutex across reclaim is highest - throughput improves
21-48% and latency drops 17-33%. The microbenchmark is added to
selftests.
- uaccess/sockptr: fix the ignored_trailing logic in
copy_struct_to_user() to behave as documented and the usize check
in copy_struct_from_sockptr() for user pointers, and add
copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr()
helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).
- bpf: add a sleepable bpf_real_inode() kfunc that resolves the real
inode backing a dentry via d_real_inode(). On overlayfs the inode
attached to the dentry doesn't carry the underlying device
information; this is used by the filesystem restriction BPF program
that was merged into systemd.
- docs: add guidelines for submitting new filesystems, motivated by
the maintenance burden abandoned and untestable filesystems impose
on VFS developers, blocking infrastructure work like folio
conversions and iomap migration.
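
As an illustration of the pre-allocation item above (not part of the pull message itself), the locking pattern can be sketched in userspace C. This is only an analogy under stated assumptions: `demo_pipe`, `demo_pipe_write()`, the ring and cache sizes, and the use of malloc() with a pthread mutex are all hypothetical stand-ins, not the kernel's anon_pipe_write() code. Only the batch size of 8 comes from the message.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PREALLOC_MAX 8    /* batch size named in the pull message */
#define DEMO_CACHE_SLOTS  2    /* assumed size of the small reuse cache */
#define DEMO_RING_SLOTS   16   /* assumed ring capacity */
#define DEMO_BUF_SIZE     4096 /* assumed buffer size */

/* Hypothetical userspace stand-in for a pipe; not a kernel structure. */
struct demo_pipe {
	pthread_mutex_t lock;
	void *ring[DEMO_RING_SLOTS];    /* buffers holding queued data */
	size_t filled;                  /* number of ring slots in use */
	void *cache[DEMO_CACHE_SLOTS];  /* spare buffers kept for later writes */
};

/* Queue up to `want` buffers; returns how many were attached to the ring. */
size_t demo_pipe_write(struct demo_pipe *p, size_t want)
{
	void *bufs[DEMO_PREALLOC_MAX];
	size_t n, next = 0, attached = 0;

	if (want > DEMO_PREALLOC_MAX)
		want = DEMO_PREALLOC_MAX;

	/* 1. Allocate before taking the lock; the allocator may be slow. */
	for (n = 0; n < want; n++) {
		bufs[n] = malloc(DEMO_BUF_SIZE);
		if (!bufs[n])
			break;
	}

	pthread_mutex_lock(&p->lock);

	/* 2. Critical section: only pointer moves, no allocator calls. */
	while (next < n && p->filled < DEMO_RING_SLOTS) {
		p->ring[p->filled++] = bufs[next++];
		attached++;
	}

	/* 3. Recycle leftovers into empty cache slots before unlocking. */
	for (size_t i = 0; i < DEMO_CACHE_SLOTS && next < n; i++) {
		if (!p->cache[i])
			p->cache[i] = bufs[next++];
	}

	pthread_mutex_unlock(&p->lock);

	/* 4. Free whatever is still unused, again outside the lock. */
	while (next < n)
		free(bufs[next++]);

	return attached;
}
```

The sketch leaves out drawing from the cache on later writes and any real data copy; its only point is that the allocator and the free path both run outside the locked region.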
Fixes:
- libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
and drop the now-redundant assignments in callers. This began as a
one-line dma-buf fix for a path_noexec() warning; a pseudo
filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo()
callers were audited: the only visible effect is on dma-buf where
SB_I_NOEXEC silences the warning.
- Handle set_blocksize() failures in legacy filesystems (bfs, hpfs,
qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a
device with a sector size > PAGE_SIZE crashed roughly half of them;
the rest had the same missing error handling pattern. Plus a
follow-up releasing the superblock buffer_head when setting the
minix v3 block size fails.
- mount: honour SB_NOUSER in the new mount API.
- fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by
switching the process-group paths of send_sigio() and send_sigurg()
from read_lock(&tasklist_lock) to RCU, matching the single-PID
path.
- vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing
delegated NFS mounts (fsopen() in a container with the mount
performed by a privileged daemon) that broke when non-init
s_user_ns was tied to FS_USERNS_MOUNT.
- selftests/namespaces: fix a hang in nsid_test where an unreaped
grandchild kept the TAP pipe write-end open, a waitpid(-1) race in
listns_efault_test, and a false FAIL on kernels without listns()
where the tests should SKIP.
- filelock: fix the break_lease() stub signature for
CONFIG_FILE_LOCKING=n.
- init/initramfs_test: wait for the async initramfs unpacking before
running; the test and do_populate_rootfs() share the parser state.
- fs/coredump: reduce redundant log noise in
validate_coredump_safety().
- iomap: pass the correct length to fserror_report_io() in
__iomap_write_begin().
- backing-file: fix the backing_file_open() kerneldoc.
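
As an illustration of the set_blocksize() item above (not part of the pull message itself), the missing error handling pattern can be sketched with a userspace mock. Everything here is hypothetical: `demo_sb`, `demo_set_blocksize()`, `demo_fill_super()`, and the validity rule (power of two, between 512 and an assumed 4096-byte page) are stand-ins chosen for the example, not the kernel's actual interfaces or limits.

```c
#include <errno.h>

#define DEMO_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical stand-in for a superblock. */
struct demo_sb {
	unsigned int blocksize;
	int mounted;
};

/*
 * Hypothetical block-size setter: rejects sizes that are below 512,
 * above the assumed page size, or not a power of two.
 */
static int demo_set_blocksize(struct demo_sb *sb, unsigned int size)
{
	if (size < 512 || size > DEMO_PAGE_SIZE || (size & (size - 1)))
		return -EINVAL;
	sb->blocksize = size;
	return 0;
}

/* Mount path that propagates the failure instead of carrying on. */
int demo_fill_super(struct demo_sb *sb, unsigned int sector_size)
{
	int err = demo_set_blocksize(sb, sector_size);

	if (err)
		return err;  /* bail out before reading any on-disk structures */

	sb->mounted = 1;
	return 0;
}
```

With this shape, a call such as `demo_fill_super(&sb, 8192)` fails cleanly with -EINVAL rather than continuing with an unusable block size.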
Cleanups:
- initramfs: refactor the cpio hex header parsing to use hex2bin()
instead of the hand-rolled simple_strntoul() which is reverted, and
extend the initramfs KUnit tests to cover header fields with 0x
prefixes.
- Replace __get_free_pages() and friends with kmalloc()/kzalloc()
across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2,
isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the
do_mounts init code - part of the larger work of replacing page
allocator calls with kmalloc().
- Use clear_and_wake_up_bit() in unlock_buffer() and
journal_end_buffer_io_sync() instead of open-coding the sequence.
- Drop unused VFS exports: unexport drop_super_exclusive(), remove
start_removing_user_path_at(), and fold __start_removing_path()
into start_removing_path().
- fs/read_write: narrow the __kernel_write() export with
EXPORT_SYMBOL_FOR_MODULES().
- vfs: uapi: retire octal and hex constants in favor of (1 << n) for
the O_ flags. Finding a free bit for a new flag across the
architectures was needlessly hard with the mixed bases.
- dcache: add extra sanity checks of dead dentries in dentry_free()
via a new DENTRY_WARN_ONCE() that also prints d_flags.
- iov_iter: use kmemdup_array() in dup_iter() to harden the
allocation against multiplication overflow.
- fs/pipe: write to ->poll_usage only once.
- vfs: remove an always-taken if-branch in find_next_fd().
- dcache: use kmalloc_flex() for struct external_name in __d_alloc().
- namei: use QSTR() instead of QSTR_INIT() in path_pts().
- sync_file_range: delete dead S_ISLNK code.
- Comment fixes: retire a stale comment in fget_task_next() and fix
assorted spelling mistakes"
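
To illustrate the O_ flags cleanup mentioned above, here is a small sketch of why a uniform (1 << n) notation makes free bits easier to spot. The `DEMO_*` constants and `demo_lowest_free_bit()` are hypothetical and are not the real O_* values of any architecture; they only show that octal, hex, and shift spellings can name the same bit.

```c
#include <stdint.h>

/* Hypothetical flag values, NOT the real O_* constants. */
#define DEMO_A_OCTAL  0100        /* bit 6 written in octal */
#define DEMO_B_HEX    0x800       /* bit 11 written in hex */
#define DEMO_A_SHIFT  (1u << 6)   /* the same bit 6, index visible */
#define DEMO_B_SHIFT  (1u << 11)  /* the same bit 11, index visible */

/* Return the lowest bit index not set in `used`, or -1 if all 32 are taken. */
int demo_lowest_free_bit(uint32_t used)
{
	for (int bit = 0; bit < 32; bit++) {
		if (!(used & (UINT32_C(1) << bit)))
			return bit;
	}
	return -1;
}
```

With the shift form the bit index can be read directly from the source, whereas the octal and hex forms have to be converted by hand before flag tables from different architectures can be compared.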
* tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits)
backing-file: fix backing_file_open() kerneldoc parameter
iomap: pass the correct len to fserror_report_io in __iomap_write_begin
vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
bpf: add bpf_real_inode() kfunc
fs/read_write: Do not export __kernel_write() to the entire world
libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
mount: honour SB_NOUSER in the new mount API
fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
selftests/pipe: add pipe_bench microbenchmark
fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
fs: retire stale comment in fget_task_next()
fs: fix spelling mistakes in comment
bfs: replace get_zeroed_page() with kzalloc()
binfmt_misc: replace __get_free_page() with kmalloc()
configfs: replace __get_free_pages() with kzalloc()
fs/namespace: use __getname() to allocate mntpath buffer
fs/select: replace __get_free_page() with kmalloc()
...
Diffstat (limited to 'Documentation')
| -rw-r--r-- | Documentation/filesystems/adding-new-filesystems.rst | 195 |
| -rw-r--r-- | Documentation/filesystems/index.rst | 1 |
| -rw-r--r-- | Documentation/filesystems/porting.rst | 1 |
3 files changed, 196 insertions, 1 deletions
diff --git a/Documentation/filesystems/adding-new-filesystems.rst b/Documentation/filesystems/adding-new-filesystems.rst
new file mode 100644
index 0000000000000..a3d0bf16f73a0
--- /dev/null
+++ b/Documentation/filesystems/adding-new-filesystems.rst
@@ -0,0 +1,195 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _adding_new_filesystems:
+
+Adding New Filesystems
+======================
+
+This document describes what is involved in adding a new filesystem to the
+Linux kernel.
+
+Every filesystem merged into the kernel becomes the collective responsibility
+of the VFS maintainers and the wider filesystem development community.
+Experience has shown that filesystems which become unmaintained impose a
+significant and ongoing burden: they are hard or impossible to test, they
+block infrastructure changes because someone must update or preserve old APIs
+for code that nobody is actively looking after, and they accumulate unfixed
+bugs. The requirements and expectations described here are informed by this
+experience and are intended to ensure that new filesystems enter the kernel
+on a sustainable footing.
+
+
+Do You Need a New In-Kernel Filesystem?
+---------------------------------------
+
+Before proposing a new in-kernel filesystem, consider whether one of the
+alternatives might be more appropriate.
+
+ - If an existing in-kernel filesystem covers the same use case, improving it
+   is generally preferred over adding a new implementation. The kernel
+   community favors incremental improvement over parallel implementations.
+
+ - If the filesystem serves a niche audience or has a small user base, a FUSE
+   (Filesystem in Userspace) implementation may be a better fit. FUSE
+   filesystems avoid the long-term kernel maintenance commitment and can be
+   developed and released on their own schedule.
+
+ - If kernel-level performance, reliability, or integration is genuinely
+   required, make the case explicitly. Explain who the users are, what the
+   use case is, and why a FUSE implementation would not be sufficient.
+
+
+Technical Requirements
+----------------------
+
+New filesystems must use current kernel interfaces and practices.
+Submitting a filesystem built on outdated APIs creates an unacceptable
+maintenance debt and is likely to face pushback during review.
+
+Use modern VFS interfaces
+  Do not use interfaces listed in
+  :ref:`Documentation/process/deprecated.rst <deprecated>`.
+
+  Use folios rather than raw page operations for page cache management and
+  iomap rather than buffer heads for block mapping and I/O. See
+  ``Documentation/filesystems/iomap/index.rst`` for iomap documentation.
+
+  Block-based filesystems that need functionality not currently provided by
+  iomap should be prepared to explain why adding that functionality to iomap
+  is infeasible, rather than reimplementing their own block mapping layer.
+
+  Network filesystems should consider using the netfs library
+  (``Documentation/filesystems/netfs_library.rst``), or be prepared to explain
+  why it is not a good fit.
+
+Provide userspace utilities
+  A ``mkfs`` tool is expected so that the filesystem can be created and used
+  by testers and users. A ``fsck`` tool is strongly recommended; while not
+  strictly required for every filesystem type, the ability to verify
+  consistency and repair corruption is an important part of a mature
+  filesystem.
+
+Be testable
+  The filesystem must be testable in a meaningful way. The
+  `fstests <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git>`_
+  framework (also known as xfstests) is the standard testing infrastructure
+  for Linux filesystems and its use is highly recommended. At a minimum,
+  there must be a credible and documented way to test the filesystem and
+  detect regressions. When submitting, include a summary of test results
+  indicating which tests pass, fail, or are not applicable.
+
+Provide documentation
+  A documentation file under ``Documentation/filesystems/`` describing the
+  filesystem, its on-disk format, mount options, and any notable design
+  decisions is recommended.
+
+
+Community and Maintainership Expectations
+-----------------------------------------
+
+Merging a filesystem is a long-term commitment. The kernel community
+needs confidence that the filesystem will be actively maintained after it
+is merged.
+
+Identified maintainers
+  The submission must include a ``MAINTAINERS`` entry with at least one
+  maintainer (``M:``), a mailing list (``L:``), and a git tree (``T:``).
+  Having two or more maintainers is strongly preferred so that coverage
+  does not depend on a single person. The maintainers are expected to be
+  the primary points of contact for the filesystem going forward.
+
+Demonstrated commitment
+  A track record of maintaining kernel code -- for example, in other
+  subsystems -- significantly strengthens the case for a new filesystem.
+  Maintainers who are already known and trusted within the community face
+  less friction during review.
+
+Sustained backing
+  Major filesystems in Linux have organizational or corporate support behind
+  their development. Filesystems that depend entirely on volunteer effort
+  face higher scrutiny about their long-term viability.
+
+Responsiveness
+  The maintainer is expected to respond to bug reports, address review
+  feedback, and adapt the filesystem to VFS infrastructure changes such as
+  folio conversions, iomap migration, and mount API updates. Unresponsive
+  maintainership is one of the primary reasons filesystems end up on the
+  path to deprecation.
+
+User base
+  Clearly describe who the users of this filesystem are and the scale of the
+  user base. Filesystems with a very small or unclear user base face a
+  harder path to acceptance and a higher risk of future deprecation.
+
+Building your track record
+  A practical way to demonstrate many of the qualities above is to maintain
+  the filesystem out-of-tree for a period before requesting a merge. This
+  shows sustained commitment, builds a visible user base, and gives reviewers
+  confidence that the code and its maintainer will persist after merging.
+  That said, it is recognized that for some filesystems the user base grows
+  significantly only after upstreaming, so a compelling case for expected
+  adoption can substitute for a large existing user base.
+
+
+Submission Process
+------------------
+
+This section covers what is specific to filesystem submissions, over and
+above the normal submission advice in
+:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` and
+:ref:`Documentation/process/submit-checklist.rst <submitchecklist>`.
+
+ - Send patches to the linux-fsdevel mailing list
+   (``linux-fsdevel@vger.kernel.org``). CC the relevant VFS maintainers as
+   listed in the ``MAINTAINERS`` file under
+   ``FILESYSTEMS (VFS and infrastructure)``.
+
+ - Structure the submission logically. It is neither acceptable to send one
+   large patch containing the entire filesystem, nor is a replay of the full
+   development history helpful to reviewers. Instead, split the series by
+   topic -- for example: superblock and mount handling, inode operations,
+   directory operations, address space operations, and so on -- so that each
+   patch is reviewable in isolation.
+
+ - Separate any filesystem-specific ioctls into their own patches with
+   dedicated justification. Interfaces beyond those already common across
+   other filesystems will receive additional scrutiny because they are hard
+   to maintain and may conflict with future generic interfaces.
+
+ - Expect thorough review. Filesystem code interacts deeply with the VFS,
+   memory management, and block layers, so reviewers will examine the code
+   carefully. Address all review feedback and be prepared for multiple
+   revision cycles.
+
+ - It may be appropriate to mark the filesystem as experimental in its Kconfig
+   help text for the first few releases to set expectations while the code
+   stabilizes in-tree.
+
+
+Ongoing Obligations
+-------------------
+
+Merging is not the finish line. Maintaining a filesystem in the kernel is an
+ongoing commitment.
+
+ - Adapt to VFS infrastructure changes. The VFS layer evolves continuously;
+   maintainers are expected to keep up with conversions such as folio
+   migration, iomap adoption, and mount API updates.
+
+ - Maintain test coverage. As test suites evolve, the filesystem's test
+   results should be kept current.
+
+ - Handle security issues and regression promptly. Both those reported
+   by ordinary users and those reported by test bots and fuzzing tools.
+   The filesystem must handle corrupted input gracefully without corrupting
+   memory, hanging, or crashing the kernel.
+
+ - Engage with the wider filesystem community. Participate on linux-fsdevel,
+   share approaches to common problems, and look for opportunities to reuse
+   shared infrastructure. It is inappropriate to develop in isolation on a
+   private list and surface patches only at merge time.
+
+ - Filesystems that become unmaintained -- where the maintainer stops
+   responding, infrastructure changes go unadapted, and testing becomes
+   impossible -- are candidates for deprecation and eventual removal from
+   the kernel.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index fc7254d01a2b2..1f71cf1595476 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -43,6 +43,7 @@ algorithms work.
    caching/index
 
    porting
+   adding-new-filesystems
 
 Filesystem support layers
 =========================
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index fdf074429cd3a..f546b1d3897fa 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1297,7 +1297,6 @@ Several functions are renamed:
 - kern_path_locked -> start_removing_path
 - kern_path_create -> start_creating_path
 - user_path_create -> start_creating_user_path
-- user_path_locked_at -> start_removing_user_path_at
 - done_path_create -> end_creating_path
 
 ---
