author: Linus Torvalds <torvalds@linux-foundation.org> 2026-06-15 03:59:45 +0530
committer: Linus Torvalds <torvalds@linux-foundation.org> 2026-06-15 03:59:45 +0530
commit: 7e0e7bd60d4a812b694c477716597fcb038b00cb
tree: 4ff61d47485803e7dacab1c8ddef0a4c11b512da /Documentation
parent: ff8747aacaff8266dd751b8a8648fb728dcc3b21
parent: aa5c4fe3ba0cb2af90bbcfa7a8ef4fefcd5c2370
Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:

 "Features:

  - Reduce pipe->mutex contention by pre-allocating pages outside the lock in anon_pipe_write(). anon_pipe_write() called alloc_page() once per page while holding pipe->mutex. The allocation can sleep doing direct reclaim and runs memcg charging, which extends the critical section and stalls any concurrent reader on the same mutex. Now up to 8 pages are pre-allocated before the mutex is taken, leftovers are recycled into the per-pipe tmp_page[] cache before unlock, and any remainder is released after unlock, keeping the allocator out of the critical section on both sides. On a writers x readers sweep with 64KB writes against a 1 MB pipe throughput improves 6-28% and average write latency drops 5-22%; under memory pressure - when the cost of holding the mutex across reclaim is highest - throughput improves 21-48% and latency drops 17-33%. The microbenchmark is added to selftests.

  - uaccess/sockptr: fix the ignored_trailing logic in copy_struct_to_user() to behave as documented and the usize check in copy_struct_from_sockptr() for user pointers, and add copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr() helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).

  - bpf: add a sleepable bpf_real_inode() kfunc that resolves the real inode backing a dentry via d_real_inode(). On overlayfs the inode attached to the dentry doesn't carry the underlying device information; this is used by the filesystem restriction BPF program that was merged into systemd.

  - docs: add guidelines for submitting new filesystems, motivated by the maintenance burden abandoned and untestable filesystems impose on VFS developers, blocking infrastructure work like folio conversions and iomap migration.

 Fixes:

  - libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() and drop the now-redundant assignments in callers. This began as a one-line dma-buf fix for a path_noexec() warning; a pseudo filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo() callers were audited: the only visible effect is on dma-buf where SB_I_NOEXEC silences the warning.

  - Handle set_blocksize() failures in legacy filesystems (bfs, hpfs, qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a device with a sector size > PAGE_SIZE crashed roughly half of them; the rest had the same missing error handling pattern. Plus a follow-up releasing the superblock buffer_head when setting the minix v3 block size fails.

  - mount: honour SB_NOUSER in the new mount API.

  - fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by switching the process-group paths of send_sigio() and send_sigurg() from read_lock(&tasklist_lock) to RCU, matching the single-PID path.

  - vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing delegated NFS mounts (fsopen() in a container with the mount performed by a privileged daemon) that broke when non-init s_user_ns was tied to FS_USERNS_MOUNT.

  - selftests/namespaces: fix a hang in nsid_test where an unreaped grandchild kept the TAP pipe write-end open, a waitpid(-1) race in listns_efault_test, and a false FAIL on kernels without listns() where the tests should SKIP.

  - filelock: fix the break_lease() stub signature for CONFIG_FILE_LOCKING=n.

  - init/initramfs_test: wait for the async initramfs unpacking before running; the test and do_populate_rootfs() share the parser state.

  - fs/coredump: reduce redundant log noise in validate_coredump_safety().

  - iomap: pass the correct length to fserror_report_io() in __iomap_write_begin().

  - backing-file: fix the backing_file_open() kerneldoc.

 Cleanups:

  - initramfs: refactor the cpio hex header parsing to use hex2bin() instead of the hand-rolled simple_strntoul() which is reverted, and extend the initramfs KUnit tests to cover header fields with 0x prefixes.

  - Replace __get_free_pages() and friends with kmalloc()/kzalloc() across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2, isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the do_mounts init code - part of the larger work of replacing page allocator calls with kmalloc().

  - Use clear_and_wake_up_bit() in unlock_buffer() and journal_end_buffer_io_sync() instead of open-coding the sequence.

  - Drop unused VFS exports: unexport drop_super_exclusive(), remove start_removing_user_path_at(), and fold __start_removing_path() into start_removing_path().

  - fs/read_write: narrow the __kernel_write() export with EXPORT_SYMBOL_FOR_MODULES().

  - vfs: uapi: retire octal and hex constants in favor of (1 << n) for the O_ flags. Finding a free bit for a new flag across the architectures was needlessly hard with the mixed bases.

  - dcache: add extra sanity checks of dead dentries in dentry_free() via a new DENTRY_WARN_ONCE() that also prints d_flags.

  - iov_iter: use kmemdup_array() in dup_iter() to harden the allocation against multiplication overflow.

  - fs/pipe: write to ->poll_usage only once.

  - vfs: remove an always-taken if-branch in find_next_fd().

  - dcache: use kmalloc_flex() for struct external_name in __d_alloc().

  - namei: use QSTR() instead of QSTR_INIT() in path_pts().

  - sync_file_range: delete dead S_ISLNK code.

  - Comment fixes: retire a stale comment in fget_task_next() and fix assorted spelling mistakes"

* tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits)
  backing-file: fix backing_file_open() kerneldoc parameter
  iomap: pass the correct len to fserror_report_io in __iomap_write_begin
  vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
  filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
  vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
  bpf: add bpf_real_inode() kfunc
  fs/read_write: Do not export __kernel_write() to the entire world
  libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
  libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
  mount: honour SB_NOUSER in the new mount API
  fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
  selftests/pipe: add pipe_bench microbenchmark
  fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
  fs: retire stale comment in fget_task_next()
  fs: fix spelling mistakes in comment
  bfs: replace get_zeroed_page() with kzalloc()
  binfmt_misc: replace __get_free_page() with kmalloc()
  configfs: replace __get_free_pages() with kzalloc()
  fs/namespace: use __getname() to allocate mntpath buffer
  fs/select: replace __get_free_page() with kmalloc()
  ...
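The pre-allocation pattern in the first Features item (allocate before locking, recycle leftovers into a spare cache before unlocking, free the rest after unlocking) can be illustrated outside the kernel. This is a minimal userspace sketch, assuming a pthread mutex in place of pipe->mutex and malloc()/free() in place of the page allocator. Every name in it (`toy_pipe`, `toy_pipe_write()`, `take_buffer()`, `TMP_CACHE`, `RING_SLOTS`) is hypothetical and it is not the code in fs/pipe.c; only the cap of 8 pre-allocated buffers comes from the pull message.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ      4096
#define MAX_PREALLOC 8   /* "up to 8 pages", as in the pull message */
#define TMP_CACHE    2   /* hypothetical size of the spare-buffer cache */
#define RING_SLOTS   16  /* hypothetical ring capacity */

struct toy_pipe {
	pthread_mutex_t mutex;
	void *ring[RING_SLOTS];
	size_t ring_len[RING_SLOTS];
	unsigned int head, tail;    /* head - tail == occupied slots */
	void *tmp_page[TMP_CACHE];  /* spares kept for later writers */
};

/* Under the lock: take a pre-allocated buffer first, then a cached spare. */
static void *take_buffer(struct toy_pipe *p, void **pages, size_t *npages)
{
	if (*npages)
		return pages[--*npages];
	for (int i = 0; i < TMP_CACHE; i++) {
		if (p->tmp_page[i]) {
			void *pg = p->tmp_page[i];

			p->tmp_page[i] = NULL;
			return pg;
		}
	}
	return NULL;
}

/* Copies up to len bytes into the pipe; returns the number of bytes written. */
size_t toy_pipe_write(struct toy_pipe *p, const char *buf, size_t len)
{
	void *pages[MAX_PREALLOC];
	size_t want = (len + PAGE_SZ - 1) / PAGE_SZ;
	size_t npages = 0, written = 0;

	if (want > MAX_PREALLOC)
		want = MAX_PREALLOC;

	/* 1. Allocate before locking, so a slow allocation cannot stall readers. */
	while (npages < want) {
		void *pg = malloc(PAGE_SZ);

		if (!pg)
			break;
		pages[npages++] = pg;
	}

	pthread_mutex_lock(&p->mutex);

	/* 2. The critical section only copies into buffers that already exist. */
	while (written < len && p->head - p->tail < RING_SLOTS) {
		size_t chunk = len - written < PAGE_SZ ? len - written : PAGE_SZ;
		void *pg = take_buffer(p, pages, &npages);

		if (!pg)
			break;  /* out of buffers: return a short count */
		memcpy(pg, buf + written, chunk);
		p->ring[p->head % RING_SLOTS] = pg;
		p->ring_len[p->head % RING_SLOTS] = chunk;
		p->head++;
		written += chunk;
	}

	/* 3. Recycle leftovers into the spare cache before unlocking. */
	for (int i = 0; i < TMP_CACHE && npages; i++)
		if (!p->tmp_page[i])
			p->tmp_page[i] = pages[--npages];

	pthread_mutex_unlock(&p->mutex);

	/* 4. Whatever did not fit in the cache is freed after unlocking. */
	while (npages)
		free(pages[--npages]);
	return written;
}
```

The point of the shape is that the allocator runs on both sides of the critical section but never inside it. A real pipe write also merges small writes into a partially filled last buffer, blocks when the ring is full, and handles signals; this sketch models none of that and simply returns a short count when it runs out of buffers or slots.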
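The ignored_trailing fix concerns the size rules for extensible structs passed between kernel and userspace. As a rough model of the behaviour described in the copy_struct_to_user() kerneldoc (copy the common prefix, zero any extra bytes a newer userspace expects, and report whether non-zero kernel bytes were cut off for an older userspace), the sketch below uses plain memcpy() instead of a user copy and returns nothing, so the -EFAULT path is not modelled. `model_copy_struct_to_user()` and `any_nonzero()` are hypothetical names, not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if any of the n bytes at p is non-zero (the kernel has memchr_inv() for this). */
static bool any_nonzero(const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (p[i])
			return true;
	return false;
}

/*
 * Hypothetical model of copy_struct_to_user()'s size rules:
 *   usize == ksize: plain copy
 *   usize >  ksize: copy ksize bytes, zero the rest of dst
 *   usize <  ksize: copy usize bytes, flag any non-zero bytes left behind
 */
static void model_copy_struct_to_user(void *dst, size_t usize,
				      const void *src, size_t ksize,
				      bool *ignored_trailing)
{
	size_t common = usize < ksize ? usize : ksize;

	memcpy(dst, src, common);
	if (usize > ksize)
		memset((unsigned char *)dst + ksize, 0, usize - ksize);
	if (ignored_trailing)
		*ignored_trailing = usize < ksize &&
			any_nonzero((const unsigned char *)src + usize,
				    ksize - usize);
}
```

Note that ignored_trailing can only become true when userspace passed a smaller struct than the kernel's; passing NULL skips the report, which the sketch also allows.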
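To see why mixed bases made free bits hard to spot, compare the octal spellings of the generic O_ flags (the values in include/uapi/asm-generic/fcntl.h before this change; several architectures override them) with the bit each one occupies. The struct, the table, and both helper functions below are hypothetical illustrations, and the exact spelling used by the updated uapi headers is not reproduced here.

```c
#include <stddef.h>

/* One generic O_ flag: its historical octal spelling and the bit it occupies. */
struct flag_bit {
	const char *name;
	unsigned int octal;
	int bit;
};

/* Generic (asm-generic) values; some architectures use different numbers. */
static const struct flag_bit flags[] = {
	{ "O_CREAT",     00000100,  6 },
	{ "O_EXCL",      00000200,  7 },
	{ "O_NOCTTY",    00000400,  8 },
	{ "O_TRUNC",     00001000,  9 },
	{ "O_APPEND",    00002000, 10 },
	{ "O_NONBLOCK",  00004000, 11 },
	{ "O_DSYNC",     00010000, 12 },
	{ "FASYNC",      00020000, 13 },
	{ "O_DIRECT",    00040000, 14 },
	{ "O_LARGEFILE", 00100000, 15 },
	{ "O_DIRECTORY", 00200000, 16 },
	{ "O_NOFOLLOW",  00400000, 17 },
	{ "O_NOATIME",   01000000, 18 },
	{ "O_CLOEXEC",   02000000, 19 },
};
#define NFLAGS (sizeof(flags) / sizeof(flags[0]))
#define ACCMODE_MASK 00000003u  /* O_RDONLY/O_WRONLY/O_RDWR use bits 0-1 */

/* Returns 1 if every octal constant equals (1u << bit), else 0. */
static int octal_matches_shift(void)
{
	for (size_t i = 0; i < NFLAGS; i++)
		if (flags[i].octal != 1u << flags[i].bit)
			return 0;
	return 1;
}

/* Lowest bit not used by the access mode or any flag in the table above. */
static int lowest_free_bit(void)
{
	unsigned int used = ACCMODE_MASK;

	for (size_t i = 0; i < NFLAGS; i++)
		used |= 1u << flags[i].bit;
	for (int b = 0; b < 32; b++)
		if (!(used & (1u << b)))
			return b;
	return -1;
}
```

With shift counts, the gap at bits 2 to 5 in this table is visible at a glance, while with octal each constant has to be converted first. Whether such a gap is actually usable on every architecture is a separate question that the sketch does not answer.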
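The dup_iter() hardening replaces an open-coded count * size allocation with a helper that cannot silently under-allocate when the product wraps. A userspace sketch of the idea, assuming malloc() in place of kmalloc() and the GCC/Clang builtin __builtin_mul_overflow() in place of the kernel's overflow helpers; `demo_memdup_array()` is a hypothetical name, not kmemdup_array() itself, and the kernel helper reaches the "allocation fails" outcome by its own means.

```c
#include <stdlib.h>
#include <string.h>

/* Duplicates count elements of elem_size bytes each, or returns NULL. */
static void *demo_memdup_array(const void *src, size_t count, size_t elem_size)
{
	size_t bytes;
	void *dst;

	/*
	 * A plain count * elem_size could wrap to a small value, so the
	 * memcpy() below would overrun the buffer. Detect the wrap and
	 * refuse instead.
	 */
	if (__builtin_mul_overflow(count, elem_size, &bytes))
		return NULL;
	dst = malloc(bytes);
	if (dst)
		memcpy(dst, src, bytes);
	return dst;
}
```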
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/filesystems/adding-new-filesystems.rst 195
-rw-r--r-- Documentation/filesystems/index.rst 1
-rw-r--r-- Documentation/filesystems/porting.rst 1
3 files changed, 196 insertions, 1 deletions
diff --git a/Documentation/filesystems/adding-new-filesystems.rst b/Documentation/filesystems/adding-new-filesystems.rst
new file mode 100644
index 0000000000000..a3d0bf16f73a0
--- /dev/null
+++ b/Documentation/filesystems/adding-new-filesystems.rst
@@ -0,0 +1,195 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _adding_new_filesystems:
+
+Adding New Filesystems
+======================
+
+This document describes what is involved in adding a new filesystem to the
+Linux kernel.
+
+Every filesystem merged into the kernel becomes the collective responsibility
+of the VFS maintainers and the wider filesystem development community.
+Experience has shown that filesystems which become unmaintained impose a
+significant and ongoing burden: they are hard or impossible to test, they
+block infrastructure changes because someone must update or preserve old APIs
+for code that nobody is actively looking after, and they accumulate unfixed
+bugs. The requirements and expectations described here are informed by this
+experience and are intended to ensure that new filesystems enter the kernel
+on a sustainable footing.
+
+
+Do You Need a New In-Kernel Filesystem?
+---------------------------------------
+
+Before proposing a new in-kernel filesystem, consider whether one of the
+alternatives might be more appropriate.
+
+ - If an existing in-kernel filesystem covers the same use case, improving it
+ is generally preferred over adding a new implementation. The kernel
+ community favors incremental improvement over parallel implementations.
+
+ - If the filesystem serves a niche audience or has a small user base, a FUSE
+ (Filesystem in Userspace) implementation may be a better fit. FUSE
+ filesystems avoid the long-term kernel maintenance commitment and can be
+ developed and released on their own schedule.
+
+ - If kernel-level performance, reliability, or integration is genuinely
+ required, make the case explicitly. Explain who the users are, what the
+ use case is, and why a FUSE implementation would not be sufficient.
+
+
+Technical Requirements
+----------------------
+
+New filesystems must use current kernel interfaces and practices.
+Submitting a filesystem built on outdated APIs creates an unacceptable
+maintenance debt and is likely to face pushback during review.
+
+Use modern VFS interfaces
+ Do not use interfaces listed in
+ :ref:`Documentation/process/deprecated.rst <deprecated>`.
+
+ Use folios rather than raw page operations for page cache management and
+ iomap rather than buffer heads for block mapping and I/O. See
+ ``Documentation/filesystems/iomap/index.rst`` for iomap documentation.
+
+ Block-based filesystems that need functionality not currently provided by
+ iomap should be prepared to explain why adding that functionality to iomap
+ is infeasible, rather than reimplementing their own block mapping layer.
+
+ Network filesystems should consider using the netfs library
+ (``Documentation/filesystems/netfs_library.rst``), or be prepared to explain
+ why it is not a good fit.
+
+Provide userspace utilities
+ A ``mkfs`` tool is expected so that the filesystem can be created and used
+ by testers and users. A ``fsck`` tool is strongly recommended; while not
+ strictly required for every filesystem type, the ability to verify
+ consistency and repair corruption is an important part of a mature
+ filesystem.
+
+Be testable
+ The filesystem must be testable in a meaningful way. The
+ `fstests <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git>`_
+ framework (also known as xfstests) is the standard testing infrastructure
+ for Linux filesystems and its use is highly recommended. At a minimum,
+ there must be a credible and documented way to test the filesystem and
+ detect regressions. When submitting, include a summary of test results
+ indicating which tests pass, fail, or are not applicable.
+
+Provide documentation
+ A documentation file under ``Documentation/filesystems/`` describing the
+ filesystem, its on-disk format, mount options, and any notable design
+ decisions is recommended.
+
+
+Community and Maintainership Expectations
+-----------------------------------------
+
+Merging a filesystem is a long-term commitment. The kernel community
+needs confidence that the filesystem will be actively maintained after it
+is merged.
+
+Identified maintainers
+ The submission must include a ``MAINTAINERS`` entry with at least one
+ maintainer (``M:``), a mailing list (``L:``), and a git tree (``T:``).
+ Having two or more maintainers is strongly preferred so that coverage
+ does not depend on a single person. The maintainers are expected to be
+ the primary points of contact for the filesystem going forward.
+
+Demonstrated commitment
+ A track record of maintaining kernel code -- for example, in other
+ subsystems -- significantly strengthens the case for a new filesystem.
+ Maintainers who are already known and trusted within the community face
+ less friction during review.
+
+Sustained backing
+ Major filesystems in Linux have organizational or corporate support behind
+ their development. Filesystems that depend entirely on volunteer effort
+ face higher scrutiny about their long-term viability.
+
+Responsiveness
+ The maintainer is expected to respond to bug reports, address review
+ feedback, and adapt the filesystem to VFS infrastructure changes such as
+ folio conversions, iomap migration, and mount API updates. Unresponsive
+ maintainership is one of the primary reasons filesystems end up on the
+ path to deprecation.
+
+User base
+ Clearly describe who the users of this filesystem are and the scale of the
+ user base. Filesystems with a very small or unclear user base face a
+ harder path to acceptance and a higher risk of future deprecation.
+
+Building your track record
+ A practical way to demonstrate many of the qualities above is to maintain
+ the filesystem out-of-tree for a period before requesting a merge. This
+ shows sustained commitment, builds a visible user base, and gives reviewers
+ confidence that the code and its maintainer will persist after merging.
+ That said, it is recognized that for some filesystems the user base grows
+ significantly only after upstreaming, so a compelling case for expected
+ adoption can substitute for a large existing user base.
+
+
+Submission Process
+------------------
+
+This section covers what is specific to filesystem submissions, over and
+above the normal submission advice in
+:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` and
+:ref:`Documentation/process/submit-checklist.rst <submitchecklist>`.
+
+ - Send patches to the linux-fsdevel mailing list
+ (``linux-fsdevel@vger.kernel.org``). CC the relevant VFS maintainers as
+ listed in the ``MAINTAINERS`` file under
+ ``FILESYSTEMS (VFS and infrastructure)``.
+
+ - Structure the submission logically. Sending one large patch containing
+ the entire filesystem is not acceptable, and a replay of the full
+ development history is not helpful to reviewers either. Instead, split the series by
+ topic -- for example: superblock and mount handling, inode operations,
+ directory operations, address space operations, and so on -- so that each
+ patch is reviewable in isolation.
+
+ - Separate any filesystem-specific ioctls into their own patches with
+ dedicated justification. Interfaces beyond those already common across
+ other filesystems will receive additional scrutiny because they are hard
+ to maintain and may conflict with future generic interfaces.
+
+ - Expect thorough review. Filesystem code interacts deeply with the VFS,
+ memory management, and block layers, so reviewers will examine the code
+ carefully. Address all review feedback and be prepared for multiple
+ revision cycles.
+
+ - It may be appropriate to mark the filesystem as experimental in its Kconfig
+ help text for the first few releases to set expectations while the code
+ stabilizes in-tree.
+
+
+Ongoing Obligations
+-------------------
+
+Merging is not the finish line. Maintaining a filesystem in the kernel is an
+ongoing commitment.
+
+ - Adapt to VFS infrastructure changes. The VFS layer evolves continuously;
+ maintainers are expected to keep up with conversions such as folio
+ migration, iomap adoption, and mount API updates.
+
+ - Maintain test coverage. As test suites evolve, the filesystem's test
+ results should be kept current.
+
+ - Handle security issues and regressions promptly, both those reported
+ by ordinary users and those reported by test bots and fuzzing tools.
+ The filesystem must handle corrupted input gracefully without corrupting
+ memory, hanging, or crashing the kernel.
+
+ - Engage with the wider filesystem community. Participate on linux-fsdevel,
+ share approaches to common problems, and look for opportunities to reuse
+ shared infrastructure. It is inappropriate to develop in isolation on a
+ private list and surface patches only at merge time.
+
+ - Filesystems that become unmaintained -- where the maintainer stops
+ responding, infrastructure changes go unadapted, and testing becomes
+ impossible -- are candidates for deprecation and eventual removal from
+ the kernel.
diff --git a/Documentation/filesystems/index.rst b/Documentation/filesystems/index.rst
index fc7254d01a2b2..1f71cf1595476 100644
--- a/Documentation/filesystems/index.rst
+++ b/Documentation/filesystems/index.rst
@@ -43,6 +43,7 @@ algorithms work.
caching/index
porting
+ adding-new-filesystems
Filesystem support layers
=========================
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index fdf074429cd3a..f546b1d3897fa 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1297,7 +1297,6 @@ Several functions are renamed:
- kern_path_locked -> start_removing_path
- kern_path_create -> start_creating_path
- user_path_create -> start_creating_user_path
-- user_path_locked_at -> start_removing_user_path_at
- done_path_create -> end_creating_path
---