aboutsummaryrefslogtreecommitdiffstats
path: root/fs/jfs
AgeCommit message (Collapse)AuthorFilesLines
2026-06-15Merge tag 'vfs-7.2-rc1.misc' of ↵Linus Torvalds2-9/+10
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Reduce pipe->mutex contention by pre-allocating pages outside the lock in anon_pipe_write(). anon_pipe_write() called alloc_page() once per page while holding pipe->mutex. The allocation can sleep doing direct reclaim and runs memcg charging, which extends the critical section and stalls any concurrent reader on the same mutex. Now up to 8 pages are pre-allocated before the mutex is taken, leftovers are recycled into the per-pipe tmp_page[] cache before unlock, and any remainder is released after unlock, keeping the allocator out of the critical section on both sides. On a writers x readers sweep with 64KB writes against a 1 MB pipe throughput improves 6-28% and average write latency drops 5-22%; under memory pressure - when the cost of holding the mutex across reclaim is highest - throughput improves 21-48% and latency drops 17-33%. The microbenchmark is added to selftests. - uaccess/sockptr: fix the ignored_trailing logic in copy_struct_to_user() to behave as documented and the usize check in copy_struct_from_sockptr() for user pointers, and add copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr() helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC). - bpf: add a sleepable bpf_real_inode() kfunc that resolves the real inode backing a dentry via d_real_inode(). On overlayfs the inode attached to the dentry doesn't carry the underlying device information; this is used by the filesystem restriction BPF program that was merged into systemd. - docs: add guidelines for submitting new filesystems, motivated by the maintenance burden abandoned and untestable filesystems impose on VFS developers, blocking infrastructure work like folio conversions and iomap migration. Fixes: - libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() and drop the now-redundant assignments in callers. This began as a one-line dma-buf fix for a path_noexec() warning; a pseudo filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo() callers were audited: the only visible effect is on dma-buf where SB_I_NOEXEC silences the warning. - Handle set_blocksize() failures in legacy filesystems (bfs, hpfs, qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a device with a sector size > PAGE_SIZE crashed roughly half of them; the rest had the same missing error handling pattern. Plus a follow-up releasing the superblock buffer_head when setting the minix v3 block size fails. - mount: honour SB_NOUSER in the new mount API. - fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by switching the process-group paths of send_sigio() and send_sigurg() from read_lock(&tasklist_lock) to RCU, matching the single-PID path. - vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing delegated NFS mounts (fsopen() in a container with the mount performed by a privileged daemon) that broke when non-init s_user_ns was tied to FS_USERNS_MOUNT. - selftests/namespaces: fix a hang in nsid_test where an unreaped grandchild kept the TAP pipe write-end open, a waitpid(-1) race in listns_efault_test, and a false FAIL on kernels without listns() where the tests should SKIP. - filelock: fix the break_lease() stub signature for CONFIG_FILE_LOCKING=n. - init/initramfs_test: wait for the async initramfs unpacking before running; the test and do_populate_rootfs() share the parser state. - fs/coredump: reduce redundant log noise in validate_coredump_safety(). - iomap: pass the correct length to fserror_report_io() in __iomap_write_begin(). - backing-file: fix the backing_file_open() kerneldoc. Cleanups: - initramfs: refactor the cpio hex header parsing to use hex2bin() instead of the hand-rolled simple_strntoul() which is reverted, and extend the initramfs KUnit tests to cover header fields with 0x prefixes. - Replace __get_free_pages() and friends with kmalloc()/kzalloc() across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2, isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the do_mounts init code - part of the larger work of replacing page allocator calls with kmalloc(). - Use clear_and_wake_up_bit() in unlock_buffer() and journal_end_buffer_io_sync() instead of open-coding the sequence. - Drop unused VFS exports: unexport drop_super_exclusive(), remove start_removing_user_path_at(), and fold __start_removing_path() into start_removing_path(). - fs/read_write: narrow the __kernel_write() export with EXPORT_SYMBOL_FOR_MODULES(). - vfs: uapi: retire octal and hex constants in favor of (1 << n) for the O_ flags. Finding a free bit for a new flag across the architectures was needlessly hard with the mixed bases. - dcache: add extra sanity checks of dead dentries in dentry_free() via a new DENTRY_WARN_ONCE() that also prints d_flags. - iov_iter: use kmemdup_array() in dup_iter() to harden the allocation against multiplication overflow. - fs/pipe: write to ->poll_usage only once. - vfs: remove an always-taken if-branch in find_next_fd(). - dcache: use kmalloc_flex() for struct external_name in __d_alloc(). - namei: use QSTR() instead of QSTR_INIT() in path_pts(). - sync_file_range: delete dead S_ISLNK code. - Comment fixes: retire a stale comment in fget_task_next() and fix assorted spelling mistakes" * tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits) backing-file: fix backing_file_open() kerneldoc parameter iomap: pass the correct len to fserror_report_io in __iomap_write_begin vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags bpf: add bpf_real_inode() kfunc fs/read_write: Do not export __kernel_write() to the entire world libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() mount: honour SB_NOUSER in the new mount API fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling selftests/pipe: add pipe_bench microbenchmark fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write fs: retire stale comment in fget_task_next() fs: fix spelling mistakes in comment bfs: replace get_zeroed_page() with kzalloc() binfmt_misc: replace __get_free_page() with kmalloc() configfs: replace __get_free_pages() with kzalloc() fs/namespace: use __getname() to allocate mntpath buffer fs/select: replace __get_free_page() with kmalloc() ...
2026-05-27jfs: replace __get_free_page() with kmalloc()Mike Rapoport (Microsoft)1-8/+8
jfs_readdir() allocates dirent_buf with __get_free_page(). kmalloc() is a better API for such use and it also provides better scalability and more debugging possibilities. Replace use of __get_free_page() with kmalloc(). Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260523-b4-fs-v1-9-275e36a83f0e@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-21jfs: handle set_blocksize failuresChristoph Hellwig1-1/+2
jfs uses buffer_heads, which don't handle block size > PAGE_SIZE well. Without this, mounting we will hit the BUG_ON(offset >= folio_size(folio)); in folio_set_bh on the first __bread_gfp call. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260511071701.2456211-5-hch@lst.de Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-11fs: Fix return in jfs_mkdir and orangefs_mkdirHongling Zeng1-1/+1
Return NULL instead of passing to ERR_PTR while err is zero Fixes these smatch warnings: - fs/jfs/namei.c:311 jfs_mkdir() warn: passing zero to 'ERR_PTR' - fs/orangefs/namei.c:369 orangefs_mkdir() warn: passing zero to 'ERR_PTR' Fixes: 88d5baf69082 ("Change inode_operations.mkdir to return struct dentry *") Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn> Link: https://patch.msgid.link/20260501071058.1243245-1-zenghongling@kylinos.cn Reviewed-by: Jori Koolstra <jkoolstra@xs4all.nl> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-15Merge tag 'jfs-7.1' of github.com:kleikamp/linux-shaggyLinus Torvalds10-30/+344
Pull jfs updates from Dave Kleikamp: "More robust data integrity checking and some fixes" * tag 'jfs-7.1' of github.com:kleikamp/linux-shaggy: jfs: avoid -Wtautological-constant-out-of-range-compare warning again JFS: always load filesystem UUID during mount jfs: hold LOG_LOCK on umount to avoid null-ptr-deref jfs: Set the lbmDone flag at the end of lbmIODone jfs: fix corrupted list in dbUpdatePMap jfs: add dmapctl integrity check to prevent invalid operations jfs: add dtpage integrity check to prevent index/pointer overflows jfs: add dtroot integrity check to prevent index out-of-bounds
2026-03-16jfs: avoid -Wtautological-constant-out-of-range-compare warning againArnd Bergmann1-5/+2
The comparison of an __s8 value against DTPAGEMAXSLOT is still trivially true, causing a harmless (default disabled) warning with clang: fs/jfs/jfs_dtree.c:4419:25: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 4419 | p->header.freelist >= DTPAGEMAXSLOT)) { | ~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~ I previously worked around two of these in commit 7833570dae83 ("jfs: avoid -Wtautological-constant-out-of-range-compare warning"), but now a new one has come up, so address the same way by dropping the redundant range check. Fixes: 119e448bb50a ("jfs: add dtpage integrity check to prevent index/pointer overflows") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11JFS: always load filesystem UUID during mountJoão Paredes1-1/+2
The filesystem UUID was only being loaded into super_block sb when an external journal device was in use. When mounting without an external journal, the UUID remained unset, which prevented the computation of a filesystem ID (fsid), which could be confirmed via `stat -f -c "%i"` and thus user space could not use fanotify correctly. A missing filesystem ID causes fanotify to return ENODEV when marking the filesystem for events like FAN_CREATE, FAN_DELETE, FAN_MOVED_TO, and FAN_MOVED_FROM. As a result, applications relying on fanotify could not monitor these events on JFS filesystems without an external journal. Moved the UUID initialization so it is always performed during mount, ensuring the superblock UUID is consistently available. Signed-off-by: João Paredes <joaommp@yahoo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11jfs: hold LOG_LOCK on umount to avoid null-ptr-derefHelen Koike3-9/+24
write_special_inodes() function iterate through the log->sb_list and access the sbi fields, which can be set to NULL concurrently by umount. Fix concurrency issue by holding LOG_LOCK and checking for NULL. Reported-by: syzbot+e14b1036481911ae4d77@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=e14b1036481911ae4d77 Signed-off-by: Helen Koike <koike@igalia.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11jfs: Set the lbmDone flag at the end of lbmIODoneEdward Adam Davis1-11/+7
In lbmRead(), the I/O event waited for by wait_event() finishes before it goes to sleep, and the lbmIODone() prematurely sets the flag to lbmDONE, thus ending the wait. This causes wait_event() to return before lbmREAD is cleared (because lbmDONE was set first), the premature return of wait_event() leads to the release of lbuf before lbmIODone() returns, thus triggering the use-after-free vulnerability reported in [1]. Moving the operation of setting the lbmDONE flag to after clearing lbmREAD in lbmIODone() avoids the use-after-free vulnerability reported in [1]. [1] BUG: KASAN: slab-use-after-free in rt_spin_lock+0x88/0x3e0 kernel/locking/spinlock_rt.c:56 Call Trace: blk_update_request+0x57e/0xe60 block/blk-mq.c:1007 blk_mq_end_request+0x3e/0x70 block/blk-mq.c:1169 blk_complete_reqs block/blk-mq.c:1244 [inline] blk_done_softirq+0x10a/0x160 block/blk-mq.c:1249 Allocated by task 6101: lbmLogInit fs/jfs/jfs_logmgr.c:1821 [inline] lmLogInit+0x3d0/0x19e0 fs/jfs/jfs_logmgr.c:1269 open_inline_log fs/jfs/jfs_logmgr.c:1175 [inline] lmLogOpen+0x4e1/0xfa0 fs/jfs/jfs_logmgr.c:1069 jfs_mount_rw+0xe9/0x670 fs/jfs/jfs_mount.c:257 jfs_fill_super+0x754/0xd80 fs/jfs/super.c:532 Freed by task 6101: kfree+0x1bd/0x900 mm/slub.c:6876 lbmLogShutdown fs/jfs/jfs_logmgr.c:1864 [inline] lmLogInit+0x1137/0x19e0 fs/jfs/jfs_logmgr.c:1415 open_inline_log fs/jfs/jfs_logmgr.c:1175 [inline] lmLogOpen+0x4e1/0xfa0 fs/jfs/jfs_logmgr.c:1069 jfs_mount_rw+0xe9/0x670 fs/jfs/jfs_mount.c:257 jfs_fill_super+0x754/0xd80 fs/jfs/super.c:532 Reported-by: syzbot+1d38eedcb25a3b5686a7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1d38eedcb25a3b5686a7 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11jfs: fix corrupted list in dbUpdatePMapYun Zhou2-2/+4
This patch resolves the "list_add corruption. next is NULL" Oops reported by syzkaller in dbUpdatePMap(). The root cause is uninitialized synclist nodes in struct metapage and struct TxBlock, plus improper list node removal using list_del() (which leaves nodes in an invalid state). This fixes the following Oops reported by syzkaller. list_add corruption. next is NULL. ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:28! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 UID: 0 PID: 122 Comm: jfsCommit Not tainted syzkaller #0 PREEMPT_{RT,(full)} Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/02/2025 RIP: 0010:__list_add_valid_or_report+0xc3/0x130 lib/list_debug.c:27 Code: 4c 89 f2 48 89 d9 e8 0c 88 a4 fc 90 0f 0b 48 c7 c7 20 de 3d 8b e8 fd 87 a4 fc 90 0f 0b 48 c7 c7 c0 de 3d 8b e8 ee 87 a4 fc 90 <0f> 0b 48 89 df e8 13 c3 7d fd 42 80 7c 2d 00 00 74 08 4c 89 e7 e8 RSP: 0018:ffffc9000395fa20 EFLAGS: 00010246 RAX: 0000000000000022 RBX: 0000000000000000 RCX: 270c5dfadb559700 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 00000000000f0000 R08: 0000000000000000 R09: 0000000000000000 R10: dffffc0000000000 R11: fffff5200072bee9 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000004 R15: 1ffff92000632266 FS: 0000000000000000(0000) GS:ffff888126ef9000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000056341fdb86c0 CR3: 0000000040a18000 CR4: 00000000003526f0 Call Trace: <TASK> __list_add_valid include/linux/list.h:96 [inline] __list_add include/linux/list.h:158 [inline] list_add include/linux/list.h:177 [inline] dbUpdatePMap+0x7e4/0xeb0 fs/jfs/jfs_dmap.c:577 txAllocPMap+0x57d/0x6b0 fs/jfs/jfs_txnmgr.c:2426 txUpdateMap+0x81e/0x9c0 fs/jfs/jfs_txnmgr.c:2364 txLazyCommit fs/jfs/jfs_txnmgr.c:2665 [inline] jfs_lazycommit+0x3f1/0xa10 fs/jfs/jfs_txnmgr.c:2734 kthread+0x711/0x8a0 kernel/kthread.c:463 ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- Reported-by: syzbot+4d0a0feb49c5138cac46@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4d0a0feb49c5138cac46 Tested-by: syzbot+4d0a0feb49c5138cac46@syzkaller.appspotmail.com Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11jfs: add dmapctl integrity check to prevent invalid operationsYun Zhou1-3/+111
Add check_dmapctl() to validate dmapctl structure integrity, focusing on preventing invalid operations caused by on-disk corruption. Key checks: - nleafs bounded by [0, LPERCTL] (maximum leaf nodes per dmapctl). - l2nleafs bounded by [0, L2LPERCTL] and consistent with nleafs (nleafs must be 2^l2nleafs). - leafidx must be exactly CTLLEAFIND (expected leaf index position). - height bounded by [0, L2LPERCTL >> 1] (valid tree height range). - budmin validity: NOFREE only if nleafs=0; otherwise >= BUDMIN. - Leaf nodes fit within stree array (leafidx + nleafs <= CTLTREESIZE). - Leaf node values are either non-negative or NOFREE. Invoked in dbAllocAG(), dbFindCtl(), dbAdjCtl() and dbExtendFS() when accessing dmapctl pages, catching corruption early before dmap operations trigger invalid memory access or logic errors. This fixes the following UBSAN warning. [58245.668090][T14017] ------------[ cut here ]------------ [58245.668103][T14017] UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:2641:11 [58245.668119][T14017] shift exponent 110 is too large for 32-bit type 'int' [58245.668137][T14017] CPU: 0 UID: 0 PID: 14017 Comm: 4c1966e88c28fa9 Tainted: G E 6.18.0-rc4-00253-g21ce5d4ba045-dirty #124 PREEMPT_{RT,(full)} [58245.668174][T14017] Tainted: [E]=UNSIGNED_MODULE [58245.668176][T14017] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [58245.668184][T14017] Call Trace: [58245.668200][T14017] <TASK> [58245.668208][T14017] dump_stack_lvl+0x189/0x250 [58245.668288][T14017] ? __pfx_dump_stack_lvl+0x10/0x10 [58245.668301][T14017] ? __pfx__printk+0x10/0x10 [58245.668315][T14017] ? lock_metapage+0x303/0x400 [jfs] [58245.668406][T14017] ubsan_epilogue+0xa/0x40 [58245.668422][T14017] __ubsan_handle_shift_out_of_bounds+0x386/0x410 [58245.668462][T14017] dbSplit+0x1f8/0x200 [jfs] [58245.668543][T14017] dbAdjCtl+0x34c/0xa20 [jfs] [58245.668628][T14017] dbAllocNear+0x2ee/0x3d0 [jfs] [58245.668710][T14017] dbAlloc+0x933/0xba0 [jfs] [58245.668797][T14017] ea_write+0x374/0xdd0 [jfs] [58245.668888][T14017] ? __pfx_ea_write+0x10/0x10 [jfs] [58245.668966][T14017] ? __jfs_setxattr+0x76e/0x1120 [jfs] [58245.669046][T14017] __jfs_setxattr+0xa01/0x1120 [jfs] [58245.669135][T14017] ? __pfx___jfs_setxattr+0x10/0x10 [jfs] [58245.669216][T14017] ? mutex_lock_nested+0x154/0x1d0 [58245.669252][T14017] ? __jfs_xattr_set+0xb9/0x170 [jfs] [58245.669333][T14017] __jfs_xattr_set+0xda/0x170 [jfs] [58245.669430][T14017] ? __pfx___jfs_xattr_set+0x10/0x10 [jfs] [58245.669509][T14017] ? xattr_full_name+0x6f/0x90 [58245.669546][T14017] ? jfs_xattr_set+0x33/0x60 [jfs] [58245.669636][T14017] ? __pfx_jfs_xattr_set+0x10/0x10 [jfs] [58245.669726][T14017] __vfs_setxattr+0x43c/0x480 [58245.669743][T14017] __vfs_setxattr_noperm+0x12d/0x660 [58245.669756][T14017] vfs_setxattr+0x16b/0x2f0 [58245.669768][T14017] ? __pfx_vfs_setxattr+0x10/0x10 [58245.669782][T14017] filename_setxattr+0x274/0x600 [58245.669795][T14017] ? __pfx_filename_setxattr+0x10/0x10 [58245.669806][T14017] ? getname_flags+0x1e5/0x540 [58245.669829][T14017] path_setxattrat+0x364/0x3a0 [58245.669840][T14017] ? __pfx_path_setxattrat+0x10/0x10 [58245.669859][T14017] ? __se_sys_chdir+0x1b9/0x280 [58245.669876][T14017] __x64_sys_lsetxattr+0xbf/0xe0 [58245.669888][T14017] do_syscall_64+0xfa/0xfa0 [58245.669901][T14017] ? lockdep_hardirqs_on+0x9c/0x150 [58245.669913][T14017] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [58245.669927][T14017] ? exc_page_fault+0xab/0x100 [58245.669937][T14017] entry_SYSCALL_64_after_hwframe+0x77/0x7f Reported-by: syzbot+4c1966e88c28fa96e053@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4c1966e88c28fa96e053 Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11jfs: add dtpage integrity check to prevent index/pointer overflowsYun Zhou2-4/+107
Add check_dtpage() to validate dtpage_t integrity, focusing on preventing index/pointer overflows from on-disk corruption. Key checks: - maxslot must be exactly DTPAGEMAXSLOT (128) as defined for dtpage slot array. - freecnt bounded by [0, DTPAGEMAXSLOT-1] (slot[0] reserved for header). - freelist validity: -1 when freecnt=0; 1~DTPAGEMAXSLOT-1 when non-zero, with linked list checks (no duplicates, proper termination via next=-1). - stblindex bounds: must be within range that avoids overlapping with stbl itself (stblindex < DTPAGEMAXSLOT - stblsize). - nextindex bounded by stbl size (stblsize << L2DTSLOTSIZE). stbl entries validity: within 1~DTPAGEMAXSLOT-1, no duplicates(excluding invalid entries marked as -1). Invoked when loading dtpage (in BT_GETPAGE macro context) to catch corruption early before directory operations trigger out-of-bounds access. Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-11jfs: add dtroot integrity check to prevent index out-of-boundsYun Zhou3-0/+92
Add check_dtroot() to validate dtroot_t integrity, focusing on preventing index/pointer overflows from on-disk corruption. Key checks: - freecnt bounded by [0, DTROOTMAXSLOT-1] (slot[0] reserved for header). - freelist validity: -1 when freecnt=0; 1~DTROOTMAXSLOT-1 when non-zero, with linked list checks (no duplicates, proper termination via next=-1). - stbl bounds: nextindex within stbl array size; entries within 0~8, no duplicates (excluding idx=0). Invoked in copy_from_dinode() when loading directory inodes, catching corruption early before directory operations trigger out-of-bounds access. This fixes the following UBSAN warning. [ 101.832754][ T5960] ------------[ cut here ]------------ [ 101.832762][ T5960] UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:3713:8 [ 101.832792][ T5960] index -1 is out of range for type 'struct dtslot[128]' [ 101.832807][ T5960] CPU: 2 UID: 0 PID: 5960 Comm: 5f7f0caf9979e9d Tainted: G E 6.18.0-rc4-00250-g2603eb907f03 #119 PREEMPT_{RT,(full [ 101.832817][ T5960] Tainted: [E]=UNSIGNED_MODULE [ 101.832819][ T5960] Hardware name: QEMU Ubuntu 25.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 101.832823][ T5960] Call Trace: [ 101.832833][ T5960] <TASK> [ 101.832838][ T5960] dump_stack_lvl+0x189/0x250 [ 101.832909][ T5960] ? __pfx_dump_stack_lvl+0x10/0x10 [ 101.832925][ T5960] ? __pfx__printk+0x10/0x10 [ 101.832934][ T5960] ? rt_mutex_slowunlock+0x493/0x8a0 [ 101.832959][ T5960] ubsan_epilogue+0xa/0x40 [ 101.832966][ T5960] __ubsan_handle_out_of_bounds+0xe9/0xf0 [ 101.833007][ T5960] dtInsertEntry+0x936/0x1430 [jfs] [ 101.833094][ T5960] dtSplitPage+0x2c8b/0x3ed0 [jfs] [ 101.833177][ T5960] ? __pfx_rt_mutex_slowunlock+0x10/0x10 [ 101.833193][ T5960] dtInsert+0x109b/0x6000 [jfs] [ 101.833283][ T5960] ? rt_mutex_slowunlock+0x493/0x8a0 [ 101.833296][ T5960] ? __pfx_rt_mutex_slowunlock+0x10/0x10 [ 101.833307][ T5960] ? rt_spin_unlock+0x161/0x200 [ 101.833315][ T5960] ? __pfx_dtInsert+0x10/0x10 [jfs] [ 101.833391][ T5960] ? txLock+0xaf9/0x1cb0 [jfs] [ 101.833477][ T5960] ? dtInitRoot+0x22a/0x670 [jfs] [ 101.833556][ T5960] jfs_mkdir+0x6ec/0xa70 [jfs] [ 101.833636][ T5960] ? __pfx_jfs_mkdir+0x10/0x10 [jfs] [ 101.833721][ T5960] ? generic_permission+0x2e5/0x690 [ 101.833760][ T5960] ? bpf_lsm_inode_mkdir+0x9/0x20 [ 101.833776][ T5960] vfs_mkdir+0x306/0x510 [ 101.833786][ T5960] do_mkdirat+0x247/0x590 [ 101.833795][ T5960] ? __pfx_do_mkdirat+0x10/0x10 [ 101.833804][ T5960] ? getname_flags+0x1e5/0x540 [ 101.833815][ T5960] __x64_sys_mkdir+0x6c/0x80 [ 101.833823][ T5960] do_syscall_64+0xfa/0xfa0 [ 101.833832][ T5960] ? lockdep_hardirqs_on+0x9c/0x150 [ 101.833840][ T5960] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f [ 101.833847][ T5960] ? exc_page_fault+0xab/0x100 [ 101.833856][ T5960] entry_SYSCALL_64_after_hwframe+0x77/0x7f Signed-off-by: Yun Zhou <yun.zhou@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-03-06treewide: change inode->i_ino from unsigned long to u64Jeff Layton3-3/+3
On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds4-8/+8
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook5-10/+10
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'jfs-7.0' of github.com:kleikamp/linux-shaggyLinus Torvalds3-4/+7
Pull jfs updates from Dave Kleikamp: "Just a handful of minor jfs fixes" * tag 'jfs-7.0' of github.com:kleikamp/linux-shaggy: jfs: avoid -Wtautological-constant-out-of-range-compare warning jfs: Add missing set_freezable() for freezable kthread jfs: nlink overflow in jfs_rename
2026-02-09Merge tag 'vfs-7.0-rc1.misc' of ↵Linus Torvalds1-7/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains a mix of VFS cleanups, performance improvements, API fixes, documentation, and a deprecation notice. Scalability and performance: - Rework pid allocation to only take pidmap_lock once instead of twice during alloc_pid(), improving thread creation/teardown throughput by 10-16% depending on false-sharing luck. Pad the namespace refcount to reduce false-sharing - Track file lock presence via a flag in ->i_opflags instead of reading ->i_flctx, avoiding false-sharing with ->i_readcount on open/close hot paths. Measured 4-16% improvement on 24-core open-in-a-loop benchmarks - Use a consume fence in locks_inode_context() to match the store-release/load-consume idiom, eliminating a hardware fence on some architectures - Annotate cdev_lock with __cacheline_aligned_in_smp to prevent false-sharing - Remove a redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu() that never fires since the caller already verifies it, eliminating a 100% mispredicted branch - Fix a 100% mispredicted likely() in devcgroup_inode_permission() that became wrong after a prior code reorder Bug fixes and correctness: - Make insert_inode_locked() wait for inode destruction instead of skipping, fixing a corner case where two matching inodes could exist in the hash - Move f_mode initialization before file_ref_init() in alloc_file() to respect the SLAB_TYPESAFE_BY_RCU ordering contract - Add a WARN_ON_ONCE guard in try_to_free_buffers() for folios with no buffers attached, preventing a null pointer dereference when AS_RELEASE_ALWAYS is set but no release_folio op exists - Fix select restart_block to store end_time as timespec64, avoiding truncation of tv_sec on 32-bit architectures - Make dump_inode() use get_kernel_nofault() to safely access inode and superblock fields, matching the dump_mapping() pattern API modernization: - Make posix_acl_to_xattr() allocate the buffer internally since every single caller was doing it anyway. Reduces boilerplate and unnecessary error checking across ~15 filesystems - Replace deprecated simple_strtoul() with kstrtoul() for the ihash_entries, dhash_entries, mhash_entries, and mphash_entries boot parameters, adding proper error handling - Convert chardev code to use guard(mutex) and __free(kfree) cleanup patterns - Replace min_t() with min() or umin() in VFS code to avoid silently truncating unsigned long to unsigned int - Gate LOOKUP_RCU assertions behind CONFIG_DEBUG_VFS since callers already check the flag Deprecation: - Begin deprecating legacy BSD process accounting (acct(2)). The interface has numerous footguns and better alternatives exist (eBPF) Documentation: - Fix and complete kernel-doc for struct export_operations, removing duplicated documentation between ReST and source - Fix kernel-doc warnings for __start_dirop() and ilookup5_nowait() Testing: - Add a kunit test for initramfs cpio handling of entries with filesize > PATH_MAX Misc: - Add missing <linux/init_task.h> include in fs_struct.c" * tag 'vfs-7.0-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) posix_acl: make posix_acl_to_xattr() alloc the buffer fs: make insert_inode_locked() wait for inode destruction initramfs_test: kunit test for cpio.filesize > PATH_MAX fs: improve dump_inode() to safely access inode fields fs: add <linux/init_task.h> for 'init_fs' docs: exportfs: Use source code struct documentation fs: move initializing f_mode before file_ref_init() exportfs: Complete kernel-doc for struct export_operations exportfs: Mark struct export_operations functions at kernel-doc exportfs: Fix kernel-doc output for get_name() acct(2): begin the deprecation of legacy BSD process accounting device_cgroup: remove branch hint after code refactor VFS: fix __start_dirop() kernel-doc warnings fs: Describe @isnew parameter in ilookup5_nowait() fs/namei: Remove redundant DCACHE_MANAGED_DENTRY check in __follow_mount_rcu fs: only assert on LOOKUP_RCU when built with CONFIG_DEBUG_VFS select: store end_time as timespec64 in restart block chardev: Switch to guard(mutex) and __free(kfree) namespace: Replace simple_strtoul with kstrtoul to parse boot params dcache: Replace simple_strtoul with kstrtoul in set_dhash_entries ...
2026-02-02jfs: avoid -Wtautological-constant-out-of-range-compare warningArnd Bergmann1-2/+2
A recent change for the range check started triggering a clang warning: fs/jfs/jfs_dtree.c:2906:31: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 2906 | if (stbl[i] < 0 || stbl[i] >= DTPAGEMAXSLOT) { | ~~~~~~~ ^ ~~~~~~~~~~~~~ fs/jfs/jfs_dtree.c:3111:30: error: result of comparison of constant 128 with expression of type 's8' (aka 'signed char') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 3111 | if (stbl[0] < 0 || stbl[0] >= DTPAGEMAXSLOT) { | ~~~~~~~ ^ ~~~~~~~~~~~~~ Both the old and the new check were useless, but the previous version apparently did not lead to the warning. Remove the extraneous range check for simplicity. Fixes: cafc6679824a ("jfs: replace hardcoded magic number with DTPAGEMAXSLOT constant") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2026-01-16posix_acl: make posix_acl_to_xattr() alloc the bufferMiklos Szeredi1-7/+2
Without exception all caller do that. So move the allocation into the helper. This reduces boilerplate and removes unnecessary error checking. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260115122341.556026-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12jfs: add setlease file operationJeff Layton2-0/+4
Add the setlease file_operation to jfs_file_operations and jfs_dir_operations, pointing to generic_setlease. A future patch will change the default behavior to reject lease attempts with -EINVAL when there is no setlease file operation defined. Add generic_setlease to retain the ability to set leases on this filesystem. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-12-ea4dec9b67fa@kernel.org Acked-by: Richard Weinberger <richard@nod.at> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-02jfs: Add missing set_freezable() for freezable kthreadHaotian Zhang1-0/+1
The jfsIOWait() thread calls try_to_freeze() but lacks set_freezable(), causing it to remain non-freezable by default. This prevents proper freezing during system suspend. Add set_freezable() to make the thread freezable as intended. Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-12-02jfs: nlink overflow in jfs_renameJori Koolstra1-2/+4
If nlink is maximal for a directory (-1) and inside that directory you perform a rename for some child directory (not moving from the parent), then the nlink of the first directory is first incremented and later decremented. Normally this is fine, but when nlink = -1 this causes a wrap around to 0, and then drop_nlink issues a warning. After applying the patch syzbot no longer issues any warnings. I also ran some basic fs tests to look for any regressions. Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> Reported-by: syzbot+9131ddfd7870623b719f@syzkaller.appspotmail.com Closes: https://syzbot.org/bug?extid=9131ddfd7870623b719f Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-12-01Merge tag 'vfs-6.19-rc1.inode' of ↵Linus Torvalds3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...
2025-10-29jfs: Rename _inline to avoid conflict with clang's '-fms-extensions'Nathan Chancellor1-3/+3
Building fs/jfs with clang and '-fms-extensions' errors with: In file included from fs/jfs/jfs_unicode.c:8: fs/jfs/jfs_incore.h:86:13: error: type name does not allow function specifier to be specified 86 | unchar _inline[128]; | ^ fs/jfs/jfs_incore.h:86:20: error: expected member name or ';' after declaration specifiers 86 | unchar _inline[128]; | ~~~~~~~~~~~~~~^ '-fms-extensions' in clang enables several other Microsoft specific keywords such as _inline [1], presumably for compatibility with MSVC, as Microsoft's documentation [2] mentions: For compatibility with previous versions, _inline and _forceinline are synonyms for __inline and __forceinline, respectively Rename the _inline array in 'struct jfs_inode_info' to _inline_sym to avoid this conflict, which is not a large workaround as this member is only ever referred to via the i_inline macro. Link: https://github.com/llvm/llvm-project/blob/249883d0c5883996bed038cd82a8999f342994c9/clang/include/clang/Basic/TokenKinds.def#L744-L79 [1] Link: https://learn.microsoft.com/en-us/cpp/c-language/inline-functions [2] Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Link: https://patch.msgid.link/20251023-jfs-fix-conflict-with-clang-ms-ext-v1-1-e219d59a1e68@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-10-20Coccinelle-based conversion to use ->i_state accessorsMateusz Guzik3-4/+4
All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed. The script: @@ expression inode, flags; @@ - inode->i_state & flags + inode_state_read(inode) & flags @@ expression inode, flags; @@ - inode->i_state &= ~flags + inode_state_clear(inode, flags) @@ expression inode, flag1, flag2; @@ - inode->i_state &= ~flag1 & ~flag2 + inode_state_clear(inode, flag1 | flag2) @@ expression inode, flags; @@ - inode->i_state |= flags + inode_state_set(inode, flags) @@ expression inode, flags; @@ - inode->i_state = flags + inode_state_assign(inode, flags) @@ expression inode, flags; @@ - flags = inode->i_state + flags = inode_state_read(inode) @@ expression inode, flags; @@ - READ_ONCE(inode->i_state) & flags + inode_state_read(inode) & flags Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-03Merge tag 'jfs-6.18' of github.com:kleikamp/linux-shaggyLinus Torvalds5-13/+19
Pull jfs updates from Dave Kleikamp: "A few fixes and cleanups for JFS" * tag 'jfs-6.18' of github.com:kleikamp/linux-shaggy: jfs: replace hardcoded magic number with DTPAGEMAXSLOT constant JFS: Remove redundant 0 value initialization JFS: Remove unnecessary parentheses jfs: fix uninitialized waitqueue in transaction manager jfs: Verify inode mode when loading from disk
2025-09-18jfs: replace hardcoded magic number with DTPAGEMAXSLOT constantZheng Yu1-2/+2
Replace hardcoded value 127 with DTPAGEMAXSLOT constant in boundary checks within jfs_readdir() and dtReadFirst(). This improves code maintainability and ensures consistency with the defined maximum slot value. Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-09-18JFS: Remove redundant 0 value initializationLiao Yuanhong1-1/+0
The jfs_log struct is already zeroed by kzalloc(). It's redundant to initialize dummy_log->base to 0. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-09-18JFS: Remove unnecessary parenthesesLiao Yuanhong1-5/+5
When using &, it's unnecessary to have parentheses afterward. Remove redundant parentheses to enhance readability. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-09-18jfs: fix uninitialized waitqueue in transaction managerShaurya Rane1-4/+5
The transaction manager initialization in txInit() was not properly initializing TxBlock[0].waitor waitqueue, causing a crash when txEnd(0) is called on read-only filesystems. When a filesystem is mounted read-only, txBegin() returns tid=0 to indicate no transaction. However, txEnd(0) still gets called and tries to access TxBlock[0].waitor via tid_to_tblock(0), but this waitqueue was never initialized because the initialization loop started at index 1 instead of 0. This causes a 'non-static key' lockdep warning and system crash: INFO: trying to register non-static key in txEnd Fix by ensuring all transaction blocks including TxBlock[0] have their waitqueues properly initialized during txInit(). Reported-by: syzbot+c4f3462d8b2ad7977bea@syzkaller.appspotmail.com Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-09-18jfs: Verify inode mode when loading from diskTetsuo Handa1-1/+7
The inode mode loaded from corrupted disk can be invalid. Do like what commit 0a9e74051313 ("isofs: Verify inode mode when loading from disk") does. Reported-by: syzbot <syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-09-13treewide: remove MIGRATEPAGE_SUCCESSDavid Hildenbrand1-4/+4
At this point MIGRATEPAGE_SUCCESS is misnamed for all folio users, and now that we remove MIGRATEPAGE_UNMAP, it's really the only "success" return value that the code uses and expects. Let's just get rid of MIGRATEPAGE_SUCCESS completely and just use "0" for success. Link: https://lkml.kernel.org/r/20250811143949.1117439-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> [mm] Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> [jfs] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Byungchul Park <byungchul@sk.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Eugenio Pé rez <eperezma@redhat.com> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mathew Brost <matthew.brost@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Lance Yang <lance.yang@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-31Merge tag 'jfs-6.17' of github.com:kleikamp/linux-shaggyLinus Torvalds5-69/+96
Pull jfs updates from Dave Kleikamp: "Fixes and cleanups for JFS filesystem" * tag 'jfs-6.17' of github.com:kleikamp/linux-shaggy: jfs: fix metapage reference count leak in dbAllocCtl jfs: stop using write_cache_pages jfs: truncate good inode pages when hard link is 0 jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage() jfs: Regular file corruption check jfs: upper bound check of tree index in dbAllocAG
2025-07-29jfs: fix metapage reference count leak in dbAllocCtlZheng Yu1-1/+3
In dbAllocCtl(), read_metapage() increases the reference count of the metapage. However, when dp->tree.budmin < 0, the function returns -EIO without calling release_metapage() to decrease the reference count, leading to a memory leak. Add release_metapage(mp) before the error return to properly manage the metapage reference count and prevent the leak. Fixes: a5f5e4698f8abbb25fe4959814093fb5bfa1aa9d ("jfs: fix shift-out-of-bounds in dbSplit") Signed-off-by: Zheng Yu <zheng.yu@northwestern.edu> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-28Merge tag 'vfs-6.17-rc1.fileattr' of ↵Linus Torvalds2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fileattr updates from Christian Brauner: "This introduces the new file_getattr() and file_setattr() system calls after lengthy discussions. Both system calls serve as successors and extensible companions to the FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR system calls which have started to show their age in addition to being named in a way that makes it easy to conflate them with extended attribute related operations. These syscalls allow userspace to set filesystem inode attributes on special files. One of the usage examples is the XFS quota projects. XFS has project quotas which could be attached to a directory. All new inodes in these directories inherit project ID set on parent directory. The project is created from userspace by opening and calling FS_IOC_FSSETXATTR on each inode. This is not possible for special files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left with empty project ID. Those inodes then are not shown in the quota accounting but still exist in the directory. This is not critical but in the case when special files are created in the directory with already existing project quota, these new inodes inherit extended attributes. This creates a mix of special files with and without attributes. Moreover, special files with attributes don't have a possibility to become clear or change the attributes. This, in turn, prevents userspace from re-creating quota project on these existing files. In addition, these new system calls allow the implementation of additional attributes that we couldn't or didn't want to fit into the legacy ioctls anymore" * tag 'vfs-6.17-rc1.fileattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: tighten a sanity check in file_attr_to_fileattr() tree-wide: s/struct fileattr/struct file_kattr/g fs: introduce file_getattr and file_setattr syscalls fs: prepare for extending file_get/setattr() fs: make vfs_fileattr_[get|set] return -EOPNOTSUPP selinux: implement inode_file_[g|s]etattr hooks lsm: introduce new hooks for setting/getting inode fsxattr fs: split fileattr related helpers into separate file
2025-07-28Merge tag 'vfs-6.17-rc1.mmap_prepare' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series coverts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. We additionally do not update resctl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, syfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_*mmap() users to .mmap_prepare() fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28Merge tag 'vfs-6.17-rc1.misc' of ↵Linus Torvalds1-7/+9
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc VFS updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add ext4 IOCB_DONTCACHE support This refactors the address_space_operations write_begin() and write_end() callbacks to take const struct kiocb * as their first argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the filesystem's buffered I/O path. Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and advertises support via the FOP_DONTCACHE file operation flag. Additionally, the i915 driver's shmem write paths are updated to bypass the legacy write_begin/write_end interface in favor of directly calling write_iter() with a constructed synchronous kiocb. Another i915 change replaces a manual write loop with kernel_write() during GEM shmem object creation. Cleanups: - don't duplicate vfs_open() in kernel_file_open() - proc_fd_getattr(): don't bother with S_ISDIR() check - fs/ecryptfs: replace snprintf with sysfs_emit in show function - vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() - filelock: add new locks_wake_up_waiter() helper - fs: Remove three arguments from block_write_end() - VFS: change old_dir and new_dir in struct renamedata to dentrys - netfs: Remove unused declaration netfs_queue_write_request() Fixes: - eventpoll: Fix semi-unbounded recursion - eventpoll: fix sphinx documentation build warning - fs/read_write: Fix spelling typo - fs: annotate data race between poll_schedule_timeout() and pollwake() - fs/pipe: set FMODE_NOWAIT in create_pipe_files() - docs/vfs: update references to i_mutex to i_rwsem - fs/buffer: remove comment about hard sectorsize - fs/buffer: remove the min and max limit checks in __getblk_slow() - fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable - fs_context: fix parameter name in infofc() macro - fs: Prevent file descriptor table allocations exceeding INT_MAX" * tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits) netfs: Remove unused declaration netfs_queue_write_request() eventpoll: fix sphinx documentation build warning ext4: support uncached buffered I/O mm/pagemap: add write_begin_get_folio() helper function fs: change write_begin/write_end interface to take struct kiocb * drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter drm/i915: Use kernel_write() in shmem object create eventpoll: Fix semi-unbounded recursion vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable fs/buffer: remove the min and max limit checks in __getblk_slow() fs: Prevent file descriptor table allocations exceeding INT_MAX fs: Remove three arguments from block_write_end() fs/ecryptfs: replace snprintf with sysfs_emit in show function fs: annotate suspected data race between poll_schedule_timeout() and pollwake() docs/vfs: update references to i_mutex to i_rwsem fs/buffer: remove comment about hard sectorsize fs_context: fix parameter name in infofc() macro VFS: change old_dir and new_dir in struct renamedata to dentrys proc_fd_getattr(): don't bother with S_ISDIR() check ...
2025-07-16fs: change write_begin/write_end interface to take struct kiocb *Taotao Chen1-7/+9
Change the address_space_operations callbacks write_begin() and write_end() to take struct kiocb * as the first argument instead of struct file *. Update all affected function prototypes, implementations, call sites, and related documentation across VFS, filesystems, and block layer. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-14jfs: stop using write_cache_pagesChristoph Hellwig1-3/+5
Stop using the obsolete write_cache_pages and use writeback_iter directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-14jfs: truncate good inode pages when hard link is 0Lizhi Xu1-1/+1
The fileset value of the inode copy from the disk by the reproducer is AGGR_RESERVED_I. When executing evict, its hard link number is 0, so its inode pages are not truncated. This causes the bugon to be triggered when executing clear_inode() because nrpages is greater than 0. Reported-by: syzbot+6e516bb515d93230bc7b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6e516bb515d93230bc7b Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-14jfs: jfs_xtree: replace XT_GETPAGE macro with xt_getpage()Suchit Karunakaran1-64/+78
Replace legacy XT_GETPAGE macro with an inline function that returns a xtpage_t pointer and update all instances of XT_GETPAGE in jfs_xtree.c Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com> Simplified xt_getpage by removing size and rc arguments. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-14jfs: Regular file corruption checkEdward Adam Davis1-0/+3
The reproducer builds a corrupted file on disk with a negative i_size value. Add a check when opening this file to avoid subsequent operation failures. Reported-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=630f6d40b3ccabc8e96e Tested-by: syzbot+630f6d40b3ccabc8e96e@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-14jfs: upper bound check of tree index in dbAllocAGArnaud Lecomte1-0/+6
When computing the tree index in dbAllocAG, we never check if we are out of bounds realative to the size of the stree. This could happen in a scenario where the filesystem metadata are corrupted. Reported-by: syzbot+cffd18309153948f3c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cffd18309153948f3c3e Tested-by: syzbot+cffd18309153948f3c3e@syzkaller.appspotmail.com Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-07-04tree-wide: s/struct fileattr/struct file_kattr/gChristian Brauner2-4/+4
Now that we expose struct file_attr as our uapi struct rename all the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-17fs: convert simple use of generic_file_*_mmap() to .mmap_prepare()Lorenzo Stoakes1-1/+1
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"), the f_op->mmap() hook has been deprecated in favour of f_op->mmap_prepare(). We have provided generic .mmap_prepare() equivalents, so update all file systems that specify these directly in their file_operations structures. This updates 9p, adfs, affs, bfs, fat, hfs, hfsplus, hostfs, hpfs, jffs2, jfs, minix, omfs, ramfs and ufs file systems directly. It updates generic_ro_fops which impacts qnx4, cramfs, befs, squashfs, frebxfs, qnx6, efs, romfs, erofs and isofs file systems. There are remaining file systems which use generic hooks in a less direct way which we address in a subsequent commit. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/c7dc90e44a9e75e750939ea369290d6e441a18e6.1750099179.git.lorenzo.stoakes@oracle.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-10new helper: set_default_d_op()Al Viro1-1/+1
... to be used instead of manually assigning to ->s_d_op. All in-tree filesystem converted (and field itself is renamed, so any out-of-tree ones in need of conversion will be caught by compiler). Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-05-31Merge tag 'mm-stable-2025-05-31-14-50' of ↵Linus Torvalds1-0/+106
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...
2025-05-12jfs: implement migrate_folio for jfs_metapage_aopsShivank Garg1-0/+106
Add the missing migrate_folio operation to jfs_metapage_aops to fix warnings during memory compaction. These warnings were introduced by commit 7ee3647243e5 ("migrate: Remove call to ->writepage") which added explicit warnings when filesystems don't implement migrate_folio. System reports following warnings: jfs_metapage_aops does not implement migrate_folio WARNING: CPU: 0 PID: 6870 at mm/migrate.c:955 fallback_migrate_folio mm/migrate.c:953 [inline] WARNING: CPU: 0 PID: 6870 at mm/migrate.c:955 move_to_new_folio+0x70e/0x840 mm/migrate.c:1007 Implement metapage_migrate_folio() which handles both single and multiple metapages per page configurations. [shivankg@amd.com: change comment style] Link: https://lkml.kernel.org/r/1967593d-8084-4a4a-b384-35d5adc54eb4@amd.com [akpm@linux-foundation.org: fix build] [shivankg@amd.com: remove redundant NULL check in __metapage_migrate_folio()] Link: https://lkml.kernel.org/r/a67db238-0ca6-4725-abb2-dc092de87e1b@amd.com Link: https://lkml.kernel.org/r/20250430100150.279751-3-shivankg@amd.com Fixes: 35474d52c605 ("jfs: Convert metapage_writepage to metapage_write_folio") Signed-off-by: Shivank Garg <shivankg@amd.com> Reported-by: syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67faff52.050a0220.379d84.001b.GAE@google.com Tested-by: syzbot+8bb6fd945af4e0ad9299@syzkaller.appspotmail.com Cc: Alistair Popple <apopple@nvidia.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-03jfs: fix array-index-out-of-bounds read in add_missing_indicesAditya Dutt1-3/+15
stbl is s8 but it must contain offsets into slot which can go from 0 to 127. Added a bound check for that error and return -EIO if the check fails. Also make jfs_readdir return with error if add_missing_indices returns with an error. Reported-by: syzbot+b974bd41515f770c608b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com./bug?extid=b974bd41515f770c608b Signed-off-by: Aditya Dutt <duttaditya18@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-04-03jfs: Fix null-ptr-deref in jfs_ioc_trimDylan Wolff1-1/+2
[ Syzkaller Report ] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000087: 0000 [#1 KASAN: null-ptr-deref in range [0x0000000000000438-0x000000000000043f] CPU: 2 UID: 0 PID: 10614 Comm: syz-executor.0 Not tainted 6.13.0-rc6-gfbfd64d25c7a-dirty #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Sched_ext: serialise (enabled+all), task: runnable_at=-30ms RIP: 0010:jfs_ioc_trim+0x34b/0x8f0 Code: e7 e8 59 a4 87 fe 4d 8b 24 24 4d 8d bc 24 38 04 00 00 48 8d 93 90 82 fe ff 4c 89 ff 31 f6 RSP: 0018:ffffc900055f7cd0 EFLAGS: 00010206 RAX: 0000000000000087 RBX: 00005866a9e67ff8 RCX: 000000000000000a RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000001 RBP: dffffc0000000000 R08: ffff88807c180003 R09: 1ffff1100f830000 R10: dffffc0000000000 R11: ffffed100f830001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000438 FS: 00007fe520225640(0000) GS:ffff8880b7e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005593c91b2c88 CR3: 000000014927c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die_body+0x61/0xb0 ? die_addr+0xb1/0xe0 ? exc_general_protection+0x333/0x510 ? asm_exc_general_protection+0x26/0x30 ? jfs_ioc_trim+0x34b/0x8f0 jfs_ioctl+0x3c8/0x4f0 ? __pfx_jfs_ioctl+0x10/0x10 ? __pfx_jfs_ioctl+0x10/0x10 __se_sys_ioctl+0x269/0x350 ? __pfx___se_sys_ioctl+0x10/0x10 ? do_syscall_64+0xfb/0x210 do_syscall_64+0xee/0x210 ? syscall_exit_to_user_mode+0x1e0/0x330 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fe51f4903ad Code: c3 e8 a7 2b 00 00 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d RSP: 002b:00007fe5202250c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fe51f5cbf80 RCX: 00007fe51f4903ad RDX: 0000000020000680 RSI: 00000000c0185879 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe520225640 R13: 000000000000000e R14: 00007fe51f44fca0 R15: 00007fe52021d000 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:jfs_ioc_trim+0x34b/0x8f0 Code: e7 e8 59 a4 87 fe 4d 8b 24 24 4d 8d bc 24 38 04 00 00 48 8d 93 90 82 fe ff 4c 89 ff 31 f6 RSP: 0018:ffffc900055f7cd0 EFLAGS: 00010206 RAX: 0000000000000087 RBX: 00005866a9e67ff8 RCX: 000000000000000a RDX: 0000000000000001 RSI: 0000000000000004 RDI: 0000000000000001 RBP: dffffc0000000000 R08: ffff88807c180003 R09: 1ffff1100f830000 R10: dffffc0000000000 R11: ffffed100f830001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000438 FS: 00007fe520225640(0000) GS:ffff8880b7e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005593c91b2c88 CR3: 000000014927c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Kernel panic - not syncing: Fatal exception [ Analysis ] We believe that we have found a concurrency bug in the `fs/jfs` module that results in a null pointer dereference. There is a closely related issue which has been fixed: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d6c1b3599b2feb5c7291f5ac3a36e5fa7cedb234 ... but, unfortunately, the accepted patch appears to still be susceptible to a null pointer dereference under some interleavings. To trigger the bug, we think that `JFS_SBI(ipbmap->i_sb)->bmap` is set to NULL in `dbFreeBits` and then dereferenced in `jfs_ioc_trim`. This bug manifests quite rarely under normal circumstances, but is triggereable from a syz-program. Reported-and-tested-by: Dylan J. Wolff<wolffd@comp.nus.edu.sg> Reported-and-tested-by: Jiacheng Xu <stitch@zju.edu.cn> Signed-off-by: Dylan J. Wolff<wolffd@comp.nus.edu.sg> Signed-off-by: Jiacheng Xu <stitch@zju.edu.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-04-03jfs: validate AG parameters in dbMount() to prevent crashesVasiliy Kovalev1-1/+5
Validate db_agheight, db_agwidth, and db_agstart in dbMount to catch corrupted metadata early and avoid undefined behavior in dbAllocAG. Limits are derived from L2LPERCTL, LPERCTL/MAXAG, and CTLTREESIZE: - agheight: 0 to L2LPERCTL/2 (0 to 5) ensures shift (L2LPERCTL - 2*agheight) >= 0. - agwidth: 1 to min(LPERCTL/MAXAG, 2^(L2LPERCTL - 2*agheight)) ensures agperlev >= 1. - Ranges: 1-8 (agheight 0-3), 1-4 (agheight 4), 1 (agheight 5). - LPERCTL/MAXAG = 1024/128 = 8 limits leaves per AG; 2^(10 - 2*agheight) prevents division to 0. - agstart: 0 to CTLTREESIZE-1 - agwidth*(MAXAG-1) keeps ti within stree (size 1365). - Ranges: 0-1237 (agwidth 1), 0-348 (agwidth 8). UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:1400:9 shift exponent -335544310 is negative CPU: 0 UID: 0 PID: 5822 Comm: syz-executor130 Not tainted 6.14.0-rc5-syzkaller #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:231 [inline] __ubsan_handle_shift_out_of_bounds+0x3c8/0x420 lib/ubsan.c:468 dbAllocAG+0x1087/0x10b0 fs/jfs/jfs_dmap.c:1400 dbDiscardAG+0x352/0xa20 fs/jfs/jfs_dmap.c:1613 jfs_ioc_trim+0x45a/0x6b0 fs/jfs/jfs_discard.c:105 jfs_ioctl+0x2cd/0x3e0 fs/jfs/ioctl.c:131 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Cc: stable@vger.kernel.org Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+fe8264911355151c487f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fe8264911355151c487f Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-03-27Merge tag 'jfs-6.14' of github.com:kleikamp/linux-shaggyLinus Torvalds7-40/+50
Pull jfs updates from David Kleikamp: "Various bug fixes and cleanups for JFS" * tag 'jfs-6.14' of github.com:kleikamp/linux-shaggy: jfs: add index corruption check to DT_GETPAGE() fs/jfs: consolidate sanity checking in dbMount jfs: add sanity check for agwidth in dbMount jfs: Prevent copying of nlink with value 0 from disk inode fs/jfs: Prevent integer overflow in AG size calculation fs/jfs: cast inactags to s64 to prevent potential overflow jfs: Fix uninit-value access of imap allocated in the diMount() function jfs: fix slab-out-of-bounds read in ea_get() jfs: add check read-only before truncation in jfs_truncate_nolock() jfs: add check read-only before txBeginAnon() call jfs: reject on-disk inodes of an unsupported type jfs: Remove reference to bh->b_page jfs: Delete a couple tabs in jfs_reconfigure()
2025-03-11jfs: add index corruption check to DT_GETPAGE()Roman Smirnov1-1/+2
If the file system is corrupted, the header.stblindex variable may become greater than 127. Because of this, an array access out of bounds may occur: ------------[ cut here ]------------ UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:3096:10 index 237 is out of range for type 'struct dtslot[128]' CPU: 0 UID: 0 PID: 5822 Comm: syz-executor740 Not tainted 6.13.0-rc4-syzkaller-00110-g4099a71718b0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:231 [inline] __ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429 dtReadFirst+0x622/0xc50 fs/jfs/jfs_dtree.c:3096 dtReadNext fs/jfs/jfs_dtree.c:3147 [inline] jfs_readdir+0x9aa/0x3c50 fs/jfs/jfs_dtree.c:2862 wrap_directory_iterator+0x91/0xd0 fs/readdir.c:65 iterate_dir+0x571/0x800 fs/readdir.c:108 __do_sys_getdents64 fs/readdir.c:403 [inline] __se_sys_getdents64+0x1e2/0x4b0 fs/readdir.c:389 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> ---[ end trace ]--- Add a stblindex check for corruption. Reported-by: syzbot <syzbot+9120834fc227768625ba@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=9120834fc227768625ba Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-27Change inode_operations.mkdir to return struct dentry *NeilBrown1-4/+4
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-20fs/jfs: consolidate sanity checking in dbMountDave Kleikamp1-28/+9
Sanity checks have been added to dbMount as individual if clauses with identical error handling. Move these all into one clause. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-20jfs: add sanity check for agwidth in dbMountEdward Adam Davis1-0/+4
The width in dmapctl of the AG is zero, it trigger a divide error when calculating the control page level in dbAllocAG. To avoid this issue, add a check for agwidth in dbAllocAG. Reported-and-tested-by: syzbot+7c808908291a569281a9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7c808908291a569281a9 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-20jfs: Prevent copying of nlink with value 0 from disk inodeEdward Adam Davis1-1/+1
syzbot report a deadlock in diFree. [1] When calling "ioctl$LOOP_SET_STATUS64", the offset value passed in is 4, which does not match the mounted loop device, causing the mapping of the mounted loop device to be invalidated. When creating the directory and creating the inode of iag in diReadSpecial(), read the page of fixed disk inode (AIT) in raw mode in read_metapage(), the metapage data it returns is corrupted, which causes the nlink value of 0 to be assigned to the iag inode when executing copy_from_dinode(), which ultimately causes a deadlock when entering diFree(). To avoid this, first check the nlink value of dinode before setting iag inode. [1] WARNING: possible recursive locking detected 6.12.0-rc7-syzkaller-00212-g4a5df3796467 #0 Not tainted -------------------------------------------- syz-executor301/5309 is trying to acquire lock: ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diFree+0x37c/0x2fb0 fs/jfs/jfs_imap.c:889 but task is already holding lock: ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diAlloc+0x1b6/0x1630 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(imap->im_aglock[index])); lock(&(imap->im_aglock[index])); *** DEADLOCK *** May be due to missing lock nesting notation 5 locks held by syz-executor301/5309: #0: ffff8880422a4420 (sb_writers#9){.+.+}-{0:0}, at: mnt_want_write+0x3f/0x90 fs/namespace.c:515 #1: ffff88804755b390 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: inode_lock_nested include/linux/fs.h:850 [inline] #1: ffff88804755b390 (&type->i_mutex_dir_key#6/1){+.+.}-{3:3}, at: filename_create+0x260/0x540 fs/namei.c:4026 #2: ffff888044548920 (&(imap->im_aglock[index])){+.+.}-{3:3}, at: diAlloc+0x1b6/0x1630 #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diNewIAG fs/jfs/jfs_imap.c:2460 [inline] #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diAllocExt fs/jfs/jfs_imap.c:1905 [inline] #3: ffff888044548890 (&imap->im_freelock){+.+.}-{3:3}, at: diAllocAG+0x4b7/0x1e50 fs/jfs/jfs_imap.c:1669 #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diNewIAG fs/jfs/jfs_imap.c:2477 [inline] #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diAllocExt fs/jfs/jfs_imap.c:1905 [inline] #4: ffff88804755a618 (&jfs_ip->rdwrlock/1){++++}-{3:3}, at: diAllocAG+0x869/0x1e50 fs/jfs/jfs_imap.c:1669 stack backtrace: CPU: 0 UID: 0 PID: 5309 Comm: syz-executor301 Not tainted 6.12.0-rc7-syzkaller-00212-g4a5df3796467 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_deadlock_bug+0x483/0x620 kernel/locking/lockdep.c:3037 check_deadlock kernel/locking/lockdep.c:3089 [inline] validate_chain+0x15e2/0x5920 kernel/locking/lockdep.c:3891 __lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752 diFree+0x37c/0x2fb0 fs/jfs/jfs_imap.c:889 jfs_evict_inode+0x32d/0x440 fs/jfs/inode.c:156 evict+0x4e8/0x9b0 fs/inode.c:725 diFreeSpecial fs/jfs/jfs_imap.c:552 [inline] duplicateIXtree+0x3c6/0x550 fs/jfs/jfs_imap.c:3022 diNewIAG fs/jfs/jfs_imap.c:2597 [inline] diAllocExt fs/jfs/jfs_imap.c:1905 [inline] diAllocAG+0x17dc/0x1e50 fs/jfs/jfs_imap.c:1669 diAlloc+0x1d2/0x1630 fs/jfs/jfs_imap.c:1590 ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56 jfs_mkdir+0x1c5/0xba0 fs/jfs/namei.c:225 vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257 do_mkdirat+0x264/0x3a0 fs/namei.c:4280 __do_sys_mkdirat fs/namei.c:4295 [inline] __se_sys_mkdirat fs/namei.c:4293 [inline] __x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Reported-by: syzbot+355da3b3a74881008e8f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=355da3b3a74881008e8f Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-20fs/jfs: Prevent integer overflow in AG size calculationRand Deeb1-1/+1
The JFS filesystem calculates allocation group (AG) size using 1 << l2agsize in dbExtendFS(). When l2agsize exceeds 31 (possible with >2TB aggregates on 32-bit systems), this 32-bit shift operation causes undefined behavior and improper AG sizing. On 32-bit architectures: - Left-shifting 1 by 32+ bits results in 0 due to integer overflow - This creates invalid AG sizes (0 or garbage values) in sbi->bmap->db_agsize - Subsequent block allocations would reference invalid AG structures - Could lead to: - Filesystem corruption during extend operations - Kernel crashes due to invalid memory accesses - Security vulnerabilities via malformed on-disk structures Fix by casting to s64 before shifting: bmp->db_agsize = (s64)1 << l2agsize; This ensures 64-bit arithmetic even on 32-bit architectures. The cast matches the data type of db_agsize (s64) and follows similar patterns in JFS block calculation code. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rand Deeb <rand.sec96@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-20fs/jfs: cast inactags to s64 to prevent potential overflowRand Deeb1-2/+2
The expression "inactags << bmp->db_agl2size" in the function dbFinalizeBmap() is computed using int operands. Although the values (inactags and db_agl2size) are derived from filesystem parameters and are usually small, there is a theoretical risk that the shift could overflow a 32-bit int if extreme values occur. According to the C standard, shifting a signed 32-bit int can lead to undefined behavior if the result exceeds its range. In our case, an overflow could miscalculate free blocks, potentially leading to erroneous filesystem accounting. To ensure the arithmetic is performed in 64-bit space, we cast "inactags" to s64 before shifting. This defensive fix prevents any risk of overflow and complies with kernel coding best practices. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rand Deeb <rand.sec96@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-19jfs: Fix uninit-value access of imap allocated in the diMount() functionZhongqiu Han1-1/+1
syzbot reports that hex_dump_to_buffer is using uninit-value: ===================================================== BUG: KMSAN: uninit-value in hex_dump_to_buffer+0x888/0x1100 lib/hexdump.c:171 hex_dump_to_buffer+0x888/0x1100 lib/hexdump.c:171 print_hex_dump+0x13d/0x3e0 lib/hexdump.c:276 diFree+0x5ba/0x4350 fs/jfs/jfs_imap.c:876 jfs_evict_inode+0x510/0x550 fs/jfs/inode.c:156 evict+0x723/0xd10 fs/inode.c:796 iput_final fs/inode.c:1946 [inline] iput+0x97b/0xdb0 fs/inode.c:1972 txUpdateMap+0xf3e/0x1150 fs/jfs/jfs_txnmgr.c:2367 txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline] jfs_lazycommit+0x627/0x11d0 fs/jfs/jfs_txnmgr.c:2733 kthread+0x6b9/0xef0 kernel/kthread.c:464 ret_from_fork+0x6d/0x90 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Uninit was created at: slab_post_alloc_hook mm/slub.c:4121 [inline] slab_alloc_node mm/slub.c:4164 [inline] __kmalloc_cache_noprof+0x8e3/0xdf0 mm/slub.c:4320 kmalloc_noprof include/linux/slab.h:901 [inline] diMount+0x61/0x7f0 fs/jfs/jfs_imap.c:105 jfs_mount+0xa8e/0x11d0 fs/jfs/jfs_mount.c:176 jfs_fill_super+0xa47/0x17c0 fs/jfs/super.c:523 get_tree_bdev_flags+0x6ec/0x910 fs/super.c:1636 get_tree_bdev+0x37/0x50 fs/super.c:1659 jfs_get_tree+0x34/0x40 fs/jfs/super.c:635 vfs_get_tree+0xb1/0x5a0 fs/super.c:1814 do_new_mount+0x71f/0x15e0 fs/namespace.c:3560 path_mount+0x742/0x1f10 fs/namespace.c:3887 do_mount fs/namespace.c:3900 [inline] __do_sys_mount fs/namespace.c:4111 [inline] __se_sys_mount+0x71f/0x800 fs/namespace.c:4088 __x64_sys_mount+0xe4/0x150 fs/namespace.c:4088 x64_sys_call+0x39bf/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:166 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ===================================================== The reason is that imap is not properly initialized after memory allocation. It will cause the snprintf() function to write uninitialized data into linebuf within hex_dump_to_buffer(). Fix this by using kzalloc instead of kmalloc to clear its content at the beginning in diMount(). Signed-off-by: Zhongqiu Han <quic_zhonhan@quicinc.com> Reported-by: syzbot+df6cdcb35904203d2b6d@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/67b5d07e.050a0220.14d86d.00e6.GAE@google.com/ Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-19jfs: fix slab-out-of-bounds read in ea_get()Qasim Ijaz1-4/+9
During the "size_check" label in ea_get(), the code checks if the extended attribute list (xattr) size matches ea_size. If not, it logs "ea_get: invalid extended attribute" and calls print_hex_dump(). Here, EALIST_SIZE(ea_buf->xattr) returns 4110417968, which exceeds INT_MAX (2,147,483,647). Then ea_size is clamped: int size = clamp_t(int, ea_size, 0, EALIST_SIZE(ea_buf->xattr)); Although clamp_t aims to bound ea_size between 0 and 4110417968, the upper limit is treated as an int, causing an overflow above 2^31 - 1. This leads "size" to wrap around and become negative (-184549328). The "size" is then passed to print_hex_dump() (called "len" in print_hex_dump()), it is passed as type size_t (an unsigned type), this is then stored inside a variable called "int remaining", which is then assigned to "int linelen" which is then passed to hex_dump_to_buffer(). In print_hex_dump() the for loop, iterates through 0 to len-1, where len is 18446744073525002176, calling hex_dump_to_buffer() on each iteration: for (i = 0; i < len; i += rowsize) { linelen = min(remaining, rowsize); remaining -= rowsize; hex_dump_to_buffer(ptr + i, linelen, rowsize, groupsize, linebuf, sizeof(linebuf), ascii); ... } The expected stopping condition (i < len) is effectively broken since len is corrupted and very large. This eventually leads to the "ptr+i" being passed to hex_dump_to_buffer() to get closer to the end of the actual bounds of "ptr", eventually an out of bounds access is done in hex_dump_to_buffer() in the following for loop: for (j = 0; j < len; j++) { if (linebuflen < lx + 2) goto overflow2; ch = ptr[j]; ... } To fix this we should validate "EALIST_SIZE(ea_buf->xattr)" before it is utilised. Reported-by: syzbot <syzbot+4e6e7e4279d046613bc5@syzkaller.appspotmail.com> Tested-by: syzbot <syzbot+4e6e7e4279d046613bc5@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=4e6e7e4279d046613bc5 Fixes: d9f9d96136cb ("jfs: xattr: check invalid xattr size more strictly") Cc: stable@vger.kernel.org Signed-off-by: Qasim Ijaz <qasdev00@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-19jfs: add check read-only before truncation in jfs_truncate_nolock()Vasiliy Kovalev1-1/+1
Added a check for "read-only" mode in the `jfs_truncate_nolock` function to avoid errors related to writing to a read-only filesystem. Call stack: block_write_begin() { jfs_write_failed() { jfs_truncate() { jfs_truncate_nolock() { txEnd() { ... log = JFS_SBI(tblk->sb)->log; // (log == NULL) If the `isReadOnly(ip)` condition is triggered in `jfs_truncate_nolock`, the function execution will stop, and no further data modification will occur. Instead, the `xtTruncate` function will be called with the "COMMIT_WMAP" flag, preventing modifications in "read-only" mode. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+4e89b5368baba8324e07@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=4e89b5368baba8324e07 Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-19jfs: add check read-only before txBeginAnon() callVasiliy Kovalev1-0/+10
Added a read-only check before calling `txBeginAnon` in `extAlloc` and `extRecord`. This prevents modification attempts on a read-only mounted filesystem, avoiding potential errors or crashes. Call trace: txBeginAnon+0xac/0x154 extAlloc+0xe8/0xdec fs/jfs/jfs_extent.c:78 jfs_get_block+0x340/0xb98 fs/jfs/inode.c:248 __block_write_begin_int+0x580/0x166c fs/buffer.c:2128 __block_write_begin fs/buffer.c:2177 [inline] block_write_begin+0x98/0x11c fs/buffer.c:2236 jfs_write_begin+0x44/0x88 fs/jfs/inode.c:299 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+4e89b5368baba8324e07@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=4e89b5368baba8324e07 Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-19jfs: reject on-disk inodes of an unsupported typeDmitry Antipov1-2/+11
Syzbot has reported the following BUG: kernel BUG at fs/inode.c:668! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 3 UID: 0 PID: 139 Comm: jfsCommit Not tainted 6.12.0-rc4-syzkaller-00085-g4e46774408d9 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 RIP: 0010:clear_inode+0x168/0x190 Code: 4c 89 f7 e8 ba fe e5 ff e9 61 ff ff ff 44 89 f1 80 e1 07 80 c1 03 38 c1 7c c1 4c 89 f7 e8 90 ff e5 ff eb b7 0b e8 01 5d 7f ff 90 0f 0b e8 f9 5c 7f ff 90 0f 0b e8 f1 5c 7f RSP: 0018:ffffc900027dfae8 EFLAGS: 00010093 RAX: ffffffff82157a87 RBX: 0000000000000001 RCX: ffff888104d4b980 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffffc900027dfc90 R08: ffffffff82157977 R09: fffff520004fbf38 R10: dffffc0000000000 R11: fffff520004fbf38 R12: dffffc0000000000 R13: ffff88811315bc00 R14: ffff88811315bda8 R15: ffff88811315bb80 FS: 0000000000000000(0000) GS:ffff888135f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005565222e0578 CR3: 0000000026ef0000 CR4: 00000000000006f0 Call Trace: <TASK> ? __die_body+0x5f/0xb0 ? die+0x9e/0xc0 ? do_trap+0x15a/0x3a0 ? clear_inode+0x168/0x190 ? do_error_trap+0x1dc/0x2c0 ? clear_inode+0x168/0x190 ? __pfx_do_error_trap+0x10/0x10 ? report_bug+0x3cd/0x500 ? handle_invalid_op+0x34/0x40 ? clear_inode+0x168/0x190 ? exc_invalid_op+0x38/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? clear_inode+0x57/0x190 ? clear_inode+0x167/0x190 ? clear_inode+0x168/0x190 ? clear_inode+0x167/0x190 jfs_evict_inode+0xb5/0x440 ? __pfx_jfs_evict_inode+0x10/0x10 evict+0x4ea/0x9b0 ? __pfx_evict+0x10/0x10 ? iput+0x713/0xa50 txUpdateMap+0x931/0xb10 ? __pfx_txUpdateMap+0x10/0x10 jfs_lazycommit+0x49a/0xb80 ? _raw_spin_unlock_irqrestore+0x8f/0x140 ? lockdep_hardirqs_on+0x99/0x150 ? __pfx_jfs_lazycommit+0x10/0x10 ? __pfx_default_wake_function+0x10/0x10 ? __kthread_parkme+0x169/0x1d0 ? __pfx_jfs_lazycommit+0x10/0x10 kthread+0x2f2/0x390 ? __pfx_jfs_lazycommit+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x4d/0x80 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> This happens when 'clear_inode()' makes an attempt to finalize an underlying JFS inode of unknown type. According to JFS layout description from https://jfs.sourceforge.net/project/pub/jfslayout.pdf, inode types from 5 to 15 are reserved for future extensions and should not be encountered on a valid filesystem. So add an extra check for valid inode type in 'copy_from_dinode()'. Reported-by: syzbot+ac2116e48989e84a2893@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ac2116e48989e84a2893 Fixes: 79ac5a46c5c1 ("jfs_lookup(): don't bother with . or ..") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-13jfs: Remove reference to bh->b_pageMatthew Wilcox (Oracle)1-1/+1
Buffer heads are attached to folios, not to pages. Also flush_dcache_page() is now deprecated in favour of flush_dcache_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-02-13jfs: Delete a couple tabs in jfs_reconfigure()Dan Carpenter1-2/+2
This is just a small white space cleanup. The conversion to the new mount api accidentally added an extra indent on these lines. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2025-01-27Pass parent directory inode and expected name to ->d_revalidate()Al Viro1-1/+2
->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-21Merge tag 'jfs-6.13' of github.com:kleikamp/linux-shaggyLinus Torvalds3-1/+22
Pull jfs updates from Dave Kleikamp: "A few more patches to add sanity checks in jfs" * tag 'jfs-6.13' of github.com:kleikamp/linux-shaggy: jfs: add a check to prevent array-index-out-of-bounds in dbAdjTree jfs: xattr: check invalid xattr size more strictly jfs: fix array-index-out-of-bounds in jfs_readdir jfs: fix shift-out-of-bounds in dbSplit jfs: array-index-out-of-bounds fix in dtReadFirst
2024-11-18Merge tag 'vfs-6.13.mount.api' of ↵Linus Torvalds2-218/+242
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount api conversions from Christian Brauner: "Convert adfs, affs, befs, hfs, hfsplus, jfs, and hpfs to the new mount api" * tag 'vfs-6.13.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: efs: fix the efs new mount api implementation ubifs: Convert ubifs to use the new mount API hpfs: convert hpfs to use the new mount api jfs: convert jfs to use the new mount api hfsplus: convert hfsplus to use the new mount api hfs: convert hfs to use the new mount api befs: convert befs to use the new mount api affs: convert affs to use the new mount api adfs: convert adfs to use the new mount api
2024-10-29jfs: add a check to prevent array-index-out-of-bounds in dbAdjTreeNihar Chaithanya1-0/+3
When the value of lp is 0 at the beginning of the for loop, it will become negative in the next assignment and we should bail out. Reported-by: syzbot+412dea214d8baa3f7483@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=412dea214d8baa3f7483 Tested-by: syzbot+412dea214d8baa3f7483@syzkaller.appspotmail.com Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-10-29jfs: xattr: check invalid xattr size more strictlyArtem Sadovnikov1-1/+1
Commit 7c55b78818cf ("jfs: xattr: fix buffer overflow for invalid xattr") also addresses this issue but it only fixes it for positive values, while ea_size is an integer type and can take negative values, e.g. in case of a corrupted filesystem. This still breaks validation and would overflow because of implicit conversion from int to size_t in print_hex_dump(). Fix this issue by clamping the ea_size value instead. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Cc: stable@vger.kernel.org Signed-off-by: Artem Sadovnikov <ancowi69@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-10-29jfs: fix array-index-out-of-bounds in jfs_readdirGhanshyam Agrawal1-0/+8
The stbl might contain some invalid values. Added a check to return error code in that case. Reported-by: syzbot+0315f8fe99120601ba88@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0315f8fe99120601ba88 Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-10-29jfs: fix shift-out-of-bounds in dbSplitGhanshyam Agrawal1-0/+3
When dmt_budmin is less than zero, it causes errors in the later stages. Added a check to return an error beforehand in dbAllocCtl itself. Reported-by: syzbot+b5ca8a249162c4b9a7d0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b5ca8a249162c4b9a7d0 Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-10-29jfs: array-index-out-of-bounds fix in dtReadFirstGhanshyam Agrawal1-0/+7
The value of stbl can be sometimes out of bounds due to a bad filesystem. Added a check with appopriate return of error code in that case. Reported-by: syzbot+65fa06e29859e41a83f3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=65fa06e29859e41a83f3 Signed-off-by: Ghanshyam Agrawal <ghanshyam1898@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-10-22jfs: Fix sanity check in dbMountDave Kleikamp1-1/+1
MAXAG is a legitimate value for bmp->db_numag Fixes: e63866a47556 ("jfs: fix out-of-bounds in dbNextAG() and diAlloc()") Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-10-08jfs: convert jfs to use the new mount apiEric Sandeen2-218/+242
Convert the jfs filesystem to use the new mount API. Tested by comparing random mount & remount options before and after the change. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/20240926171947.682881-1-sandeen@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-19Merge tag 'jfs-6.12' of github.com:kleikamp/linux-shaggyLinus Torvalds4-7/+19
Pull jfs updates from David Kleikamp: "A few fixes for jfs" * tag 'jfs-6.12' of github.com:kleikamp/linux-shaggy: jfs: Fix uninit-value access of new_ea in ea_buffer jfs: check if leafidx greater than num leaves per dmap tree jfs: Fix uaf in dbFreeBits jfs: fix out-of-bounds in dbNextAG() and diAlloc() jfs: UBSAN: shift-out-of-bounds in dbFindBits
2024-09-04jfs: Fix uninit-value access of new_ea in ea_bufferZhao Mengmeng1-0/+2
syzbot reports that lzo1x_1_do_compress is using uninit-value: ===================================================== BUG: KMSAN: uninit-value in lzo1x_1_do_compress+0x19f9/0x2510 lib/lzo/lzo1x_compress.c:178 ... Uninit was stored to memory at: ea_put fs/jfs/xattr.c:639 [inline] ... Local variable ea_buf created at: __jfs_setxattr+0x5d/0x1ae0 fs/jfs/xattr.c:662 __jfs_xattr_set+0xe6/0x1f0 fs/jfs/xattr.c:934 ===================================================== The reason is ea_buf->new_ea is not initialized properly. Fix this by using memset to empty its content at the beginning in ea_get(). Reported-by: syzbot+02341e0daa42a15ce130@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=02341e0daa42a15ce130 Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-08-27jfs: check if leafidx greater than num leaves per dmap treeEdward Adam Davis1-1/+4
syzbot report a out of bounds in dbSplit, it because dmt_leafidx greater than num leaves per dmap tree, add a checking for dmt_leafidx in dbFindLeaf. Shaggy: Modified sanity check to apply to control pages as well as leaf pages. Reported-and-tested-by: syzbot+dca05492eff41f604890@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=dca05492eff41f604890 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-08-27jfs: Fix uaf in dbFreeBitsEdward Adam Davis1-2/+9
[syzbot reported] ================================================================== BUG: KASAN: slab-use-after-free in __mutex_lock_common kernel/locking/mutex.c:587 [inline] BUG: KASAN: slab-use-after-free in __mutex_lock+0xfe/0xd70 kernel/locking/mutex.c:752 Read of size 8 at addr ffff8880229254b0 by task syz-executor357/5216 CPU: 0 UID: 0 PID: 5216 Comm: syz-executor357 Not tainted 6.11.0-rc3-syzkaller-00156-gd7a5aa4b3c00 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 __mutex_lock_common kernel/locking/mutex.c:587 [inline] __mutex_lock+0xfe/0xd70 kernel/locking/mutex.c:752 dbFreeBits+0x7ea/0xd90 fs/jfs/jfs_dmap.c:2390 dbFreeDmap fs/jfs/jfs_dmap.c:2089 [inline] dbFree+0x35b/0x680 fs/jfs/jfs_dmap.c:409 dbDiscardAG+0x8a9/0xa20 fs/jfs/jfs_dmap.c:1650 jfs_ioc_trim+0x433/0x670 fs/jfs/jfs_discard.c:100 jfs_ioctl+0x2d0/0x3e0 fs/jfs/ioctl.c:131 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 Freed by task 5218: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2252 [inline] slab_free mm/slub.c:4473 [inline] kfree+0x149/0x360 mm/slub.c:4594 dbUnmount+0x11d/0x190 fs/jfs/jfs_dmap.c:278 jfs_mount_rw+0x4ac/0x6a0 fs/jfs/jfs_mount.c:247 jfs_remount+0x3d1/0x6b0 fs/jfs/super.c:454 reconfigure_super+0x445/0x880 fs/super.c:1083 vfs_cmd_reconfigure fs/fsopen.c:263 [inline] vfs_fsconfig_locked fs/fsopen.c:292 [inline] __do_sys_fsconfig fs/fsopen.c:473 [inline] __se_sys_fsconfig+0xb6e/0xf80 fs/fsopen.c:345 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [Analysis] There are two paths (dbUnmount and jfs_ioc_trim) that generate race condition when accessing bmap, which leads to the occurrence of uaf. Use the lock s_umount to synchronize them, in order to avoid uaf caused by race condition. Reported-and-tested-by: syzbot+3c010e21296f33a5dc16@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-08-23jfs: fix out-of-bounds in dbNextAG() and diAlloc()Jeongjun Park2-3/+3
In dbNextAG() , there is no check for the case where bmp->db_numag is greater or same than MAXAG due to a polluted image, which causes an out-of-bounds. Therefore, a bounds check should be added in dbMount(). And in dbNextAG(), a check for the case where agpref is greater than bmp->db_numag should be added, so an out-of-bounds exception should be prevented. Additionally, a check for the case where agno is greater or same than MAXAG should be added in diAlloc() to prevent out-of-bounds. Reported-by: Jeongjun Park <aha310510@gmail.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-08-23jfs: UBSAN: shift-out-of-bounds in dbFindBitsRemington Brasga1-1/+1
Fix issue with UBSAN throwing shift-out-of-bounds warning. Reported-by: syzbot+e38d703eeb410b17b473@syzkaller.appspotmail.com Signed-off-by: Remington Brasga <rbrasga@uci.edu> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-08-07fs: Convert aops->write_begin to take a folioMatthew Wilcox (Oracle)1-2/+2
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07fs: Convert aops->write_end to take a folioMatthew Wilcox (Oracle)1-2/+2
Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-23Merge tag 'jfs-6.11' of github.com:kleikamp/linux-shaggyLinus Torvalds7-173/+193
Pull jfs updates from David Kleikamp: "Folio conversion from Matthew Wilcox and a few various fixes" * tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy: jfs: don't walk off the end of ealist jfs: Fix shift-out-of-bounds in dbDiscardAG jfs: Fix array-index-out-of-bounds in diFree jfs: fix null ptr deref in dtInsertEntry jfs: Remove use of folio error flag fs: Remove i_blocks_per_page jfs: Change metapage->page to metapage->folio jfs: Convert force_metapage to use a folio jfs: Convert inc_io to take a folio jfs: Convert page_to_mp to folio_to_mp jfs; Convert __invalidate_metapages to use a folio jfs: Convert dec_io to take a folio jfs: Convert drop_metapage and remove_metapage to take a folio jfs; Convert release_metapage to use a folio jfs: Convert insert_metapage() to take a folio jfs: Convert __get_metapage to use a folio jfs: Convert metapage_writepage to metapage_write_folio jfs: Convert metapage_read_folio to use folio APIs
2024-06-27jfs: don't walk off the end of ealistlei lu1-4/+19
Add a check before visiting the members of ea to make sure each ea stays within the ealist. Signed-off-by: lei lu <llfamsec@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-06-26jfs: Fix shift-out-of-bounds in dbDiscardAGPei Li1-0/+2
When searching for the next smaller log2 block, BLKSTOL2() returned 0, causing shift exponent -1 to be negative. This patch fixes the issue by exiting the loop directly when negative shift is found. Reported-by: syzbot+61be3359d2ee3467e7e4@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=61be3359d2ee3467e7e4 Signed-off-by: Pei Li <peili.dev@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-06-26jfs: Fix array-index-out-of-bounds in diFreeJeongjun Park1-1/+4
Reported-by: syzbot+241c815bda521982cb49@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-06-26jfs: fix null ptr deref in dtInsertEntryEdward Adam Davis1-0/+2
[syzbot reported] general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 5061 Comm: syz-executor404 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 RIP: 0010:dtInsertEntry+0xd0c/0x1780 fs/jfs/jfs_dtree.c:3713 ... [Analyze] In dtInsertEntry(), when the pointer h has the same value as p, after writing name in UniStrncpy_to_le(), p->header.flag will be cleared. This will cause the previously true judgment "p->header.flag & BT-LEAF" to change to no after writing the name operation, this leads to entering an incorrect branch and accessing the uninitialized object ih when judging this condition for the second time. [Fix] After got the page, check freelist first, if freelist == 0 then exit dtInsert() and return -EINVAL. Reported-by: syzbot+bba84aef3a26fb93deb9@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-06-04jfs: xattr: fix buffer overflow for invalid xattrGreg Kroah-Hartman1-1/+3
When an xattr size is not what is expected, it is printed out to the kernel log in hex format as a form of debugging. But when that xattr size is bigger than the expected size, printing it out can cause an access off the end of the buffer. Fix this all up by properly restricting the size of the debug hex dump in the kernel log. Reported-by: syzbot+9dfe490c8176301c1d06@syzkaller.appspotmail.com Cc: Dave Kleikamp <shaggy@kernel.org> Link: https://lore.kernel.org/r/2024051433-slider-cloning-98f9@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-27jfs: Remove use of folio error flagMatthew Wilcox (Oracle)1-22/+25
Store the blk_status per folio (if we can have multiple metapages per folio) instead of setting the folio error flag. This will allow us to reclaim a precious folio flag shortly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-27jfs: Change metapage->page to metapage->folioMatthew Wilcox (Oracle)3-22/+22
Convert all the users to operate on a folio. Saves sixteen calls to compound_head(). We still use sizeof(struct page) in print_hex_dump, otherwise it will go into the second and third pages of the folio which won't exist for jfs folios (since they are not large). This needs a better solution, but finding it can be postponed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-27jfs: Convert force_metapage to use a folioMatthew Wilcox (Oracle)1-9/+8
Convert the mp->page to a folio and operate on it. That lets us convert metapage_write_one() to take a folio. Replaces five calls to compound_head() with one. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-27jfs: Convert inc_io to take a folioMatthew Wilcox (Oracle)1-7/+8
All their callers now have a folio, so pass it in. Remove mp_anchor() as inc_io() was the last user. No savings here, just cleaning up some remnants. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-27jfs: Convert page_to_mp to folio_to_mpMatthew Wilcox (Oracle)1-10/+12
Access folio->private directly instead of testing the page private flag. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-27jfs; Convert __invalidate_metapages to use a folioMatthew Wilcox (Oracle)1-6/+6
Retrieve a folio from the page cache instead of a page. Saves a couple of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-27jfs: Convert dec_io to take a folioMatthew Wilcox (Oracle)1-15/+17
This means also converting the two handlers to take a folio. Saves four calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-24jfs: Convert drop_metapage and remove_metapage to take a folioMatthew Wilcox (Oracle)1-12/+12
All callers now have a folio, so pass it in instead of the page. Removes a couple of calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-24jfs; Convert release_metapage to use a folioMatthew Wilcox (Oracle)1-12/+10
Convert mp->page to a folio and remove 7 hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-24jfs: Convert insert_metapage() to take a folioMatthew Wilcox (Oracle)1-18/+13
Both of its callers now have a folio, so convert this function. Use folio_attach_private() instead of manually setting folio->private. This also gets the expected refcount of the folio correct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-24jfs: Convert __get_metapage to use a folioMatthew Wilcox (Oracle)1-14/+14
Remove four hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-24jfs: Convert metapage_writepage to metapage_write_folioMatthew Wilcox (Oracle)1-34/+41
Implement writepages rather than writepage by using write_cache_pages() to call metapage_write_folio(). Use bio_add_folio_nofail() as we know we just allocated the bio. Replace the call to SetPageError (which is never checked) with a call to mapping_set_error (which ... might be checked somewhere?) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-05-24jfs: Convert metapage_read_folio to use folio APIsMatthew Wilcox (Oracle)1-22/+13
Use bio_add_folio_nofail() as we just allocated the bio and know it cannot fail. Other than that, this is a 1:1 conversion from page APIs to folio APIs. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-03-27fs,block: yield devices earlyChristian Brauner1-2/+2
Currently a device is only really released once the umount returns to userspace due to how file closing works. That ultimately could cause an old umount assumption to be violated that concurrent umount and mount don't fail. So an exclusively held device with a temporary holder should be yielded before the filesystem is gone. Add a helper that allows callers to do that. This also allows us to remove the two holder ops that Linus wasn't excited about. Link: https://lore.kernel.org/r/20240326-vfs-bdev-end_holder-v1-1-20af85202918@kernel.org Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-13Merge tag 'fs_for_v6.9-rc1' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, isofs, udf, and quota updates from Jan Kara: "A lot of material this time: - removal of a lot of GFP_NOFS usage from ext2, udf, quota (either it was legacy or replaced with scoped memalloc_nofs_*() API) - removal of BUG_ONs in quota code - conversion of UDF to the new mount API - tightening quota on disk format verification - fix some potentially unsafe use of RCU pointers in quota code and annotate everything properly to make sparse happy - a few other small quota, ext2, udf, and isofs fixes" * tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (26 commits) udf: remove SLAB_MEM_SPREAD flag usage quota: remove SLAB_MEM_SPREAD flag usage isofs: remove SLAB_MEM_SPREAD flag usage ext2: remove SLAB_MEM_SPREAD flag usage ext2: mark as deprecated udf: convert to new mount API udf: convert novrs to an option flag MAINTAINERS: add missing git address for ext2 entry quota: Detect loops in quota tree quota: Properly annotate i_dquot arrays with __rcu quota: Fix rcu annotations of inode dquot pointers isofs: handle CDs with bad root inode but good Joliet root directory udf: Avoid invalid LVID used on mount quota: Fix potential NULL pointer dereference quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_sem quota: Set nofs allocation context when acquiring dqio_sem ext2: Remove GFP_NOFS use in ext2_xattr_cache_insert() ext2: Drop GFP_NOFS use in ext2_get_blocks() ext2: Drop GFP_NOFS allocation from ext2_init_block_alloc_info() udf: Remove GFP_NOFS allocation in udf_expand_file_adinicb() ...
2024-03-11Merge tag 'vfs-6.9.super' of ↵Linus Torvalds3-15/+15
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull block handle updates from Christian Brauner: "Last cycle we changed opening of block devices, and opening a block device would return a bdev_handle. This allowed us to implement support for restricting and forbidding writes to mounted block devices. It was accompanied by converting and adding helpers to operate on bdev_handles instead of plain block devices. That was already a good step forward but ultimately it isn't necessary to have special purpose helpers for opening block devices internally that return a bdev_handle. Fundamentally, opening a block device internally should just be equivalent to opening files. So now all internal opens of block devices return files just as a userspace open would. Instead of introducing a separate indirection into bdev_open_by_*() via struct bdev_handle bdev_file_open_by_*() is made to just return a struct file. Opening and closing a block device just becomes equivalent to opening and closing a file. This all works well because internally we already have a pseudo fs for block devices and so opening block devices is simple. There's a few places where we needed to be careful such as during boot when the kernel is supposed to mount the rootfs directly without init doing it. Here we need to take care to ensure that we flush out any asynchronous file close. That's what we already do for opening, unpacking, and closing the initramfs. So nothing new here. The equivalence of opening and closing block devices to regular files is a win in and of itself. But it also has various other advantages. We can remove struct bdev_handle completely. Various low-level helpers are now private to the block layer. Other helpers were simply removable completely. A follow-up series that is already reviewed build on this and makes it possible to remove bdev->bd_inode and allows various clean ups of the buffer head code as well. All places where we stashed a bdev_handle now just stash a file and use simple accessors to get to the actual block device which was already the case for bdev_handle" * tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) block: remove bdev_handle completely block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access bdev: remove bdev pointer from struct bdev_handle bdev: make struct bdev_handle private to the block layer bdev: make bdev_{release, open_by_dev}() private to block layer bdev: remove bdev_open_by_path() reiserfs: port block device access to file ocfs2: port block device access to file nfs: port block device access to files jfs: port block device access to file f2fs: port block device access to files ext4: port block device access to file erofs: port device access to file btrfs: port device access to file bcachefs: port block device access to file target: port block device access to file s390: port block device access to file nvme: port block device access to file block2mtd: port device access to files bcache: port block device access to files ...
2024-03-11Merge tag 'vfs-6.9.misc' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. - Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. - Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ...
2024-02-27jfs: remove SLAB_MEM_SPREAD flag usageChengming Zhou1-1/+1
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1 (see [1]), so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete explicitly to avoid confusion for users. Here we can just remove all its users, which has no any functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz [1] Link: https://lore.kernel.org/r/20240224134925.829677-1-chengming.zhou@linux.dev Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25jfs: port block device access to fileChristian Brauner3-15/+15
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-23-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25bdev: open block device as filesChristian Brauner1-1/+1
Add two new helpers to allow opening block devices as files. This is not the final infrastructure. This still opens the block device before opening a struct a file. Until we have removed all references to struct bdev_handle we can't switch the order: * Introduce blk_to_file_flags() to translate from block specific to flags usable to pen a new file. * Introduce bdev_file_open_by_{dev,path}(). * Introduce temporary sb_bdev_handle() helper to retrieve a struct bdev_handle from a block device file and update places that directly reference struct bdev_handle to rely on it. * Don't count block device openes against the number of open files. A bdev_file_open_by_{dev,path}() file is never installed into any file descriptor table. One idea that came to mind was to use kernel_tmpfile_open() which would require us to pass a path and it would then call do_dentry_open() going through the regular fops->open::blkdev_open() path. But then we're back to the problem of routing block specific flags such as BLK_OPEN_RESTRICT_WRITES through the open path and would have to waste FMODE_* flags every time we add a new one. With this we can avoid using a flag bit and we have more leeway in how we open block devices from bdev_open_by_{dev,path}(). Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-1-adbd023e19cc@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-08quota: Properly annotate i_dquot arrays with __rcuJan Kara2-2/+2
Dquots pointed to from i_dquot arrays in inodes are protected by dquot_srcu. Annotate them as such and change .get_dquots callback to return properly annotated pointer to make sparse happy. Fixes: b9ba6f94b238 ("quota: remove dqptr_sem") Signed-off-by: Jan Kara <jack@suse.cz>
2024-01-29Revert "jfs: fix shift-out-of-bounds in dbJoin"Dave Kleikamp1-7/+1
This reverts commit cca974daeb6c43ea971f8ceff5a7080d7d49ee30. The added sanity check is incorrect. BUDMIN is not the wrong value and is too small. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-01-02jfs: Add missing set_freezable() for freezable kthreadKevin Hao1-0/+2
The kernel thread function jfs_lazycommit() and jfs_sync() invoke the try_to_freeze() in its loop. But all the kernel threads are no-freezable by default. So if we want to make a kernel thread to be freezable, we have to invoke set_freezable() explicitly. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2024-01-02jfs: fix array-index-out-of-bounds in diNewExtEdward Adam Davis1-0/+3
[Syz report] UBSAN: array-index-out-of-bounds in fs/jfs/jfs_imap.c:2360:2 index -878706688 is out of range for type 'struct iagctl[128]' CPU: 1 PID: 5065 Comm: syz-executor282 Not tainted 6.7.0-rc4-syzkaller-00009-gbee0e7762ad2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 diNewExt+0x3cf3/0x4000 fs/jfs/jfs_imap.c:2360 diAllocExt fs/jfs/jfs_imap.c:1949 [inline] diAllocAG+0xbe8/0x1e50 fs/jfs/jfs_imap.c:1666 diAlloc+0x1d3/0x1760 fs/jfs/jfs_imap.c:1587 ialloc+0x8f/0x900 fs/jfs/jfs_inode.c:56 jfs_mkdir+0x1c5/0xb90 fs/jfs/namei.c:225 vfs_mkdir+0x2f1/0x4b0 fs/namei.c:4106 do_mkdirat+0x264/0x3a0 fs/namei.c:4129 __do_sys_mkdir fs/namei.c:4149 [inline] __se_sys_mkdir fs/namei.c:4147 [inline] __x64_sys_mkdir+0x6e/0x80 fs/namei.c:4147 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7fcb7e6a0b57 Code: ff ff 77 07 31 c0 c3 0f 1f 40 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd83023038 EFLAGS: 00000286 ORIG_RAX: 0000000000000053 RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007fcb7e6a0b57 RDX: 00000000000a1020 RSI: 00000000000001ff RDI: 0000000020000140 RBP: 0000000020000140 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000286 R12: 00007ffd830230d0 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [Analysis] When the agstart is too large, it can cause agno overflow. [Fix] After obtaining agno, if the value is invalid, exit the subsequent process. Reported-and-tested-by: syzbot+553d90297e6d2f50dbc7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Modified the test from agno > MAXAG to agno >= MAXAG based on linux-next report by kernel test robot (Dan Carpenter). Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21jfs: fix shift-out-of-bounds in dbJoinManas Ghandat1-1/+7
Currently while joining the leaf in a buddy system there is shift out of bound error in calculation of BUDSIZE. Added the required check to the BUDSIZE and fixed the documentation as well. Reported-by: syzbot+411debe54d318eaed386@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=411debe54d318eaed386 Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21jfs: fix uaf in jfs_evict_inodeEdward Adam Davis1-3/+3
When the execution of diMount(ipimap) fails, the object ipimap that has been released may be accessed in diFreeSpecial(). Asynchronous ipimap release occurs when rcu_core() calls jfs_free_node(). Therefore, when diMount(ipimap) fails, sbi->ipimap should not be initialized as ipimap. Reported-and-tested-by: syzbot+01cf2dbcbe2022454388@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21jfs: fix array-index-out-of-bounds in dbAdjTreeManas Ghandat1-29/+31
Currently there is a bound check missing in the dbAdjTree while accessing the dmt_stree. To add the required check added the bool is_ctl which is required to determine the size as suggest in the following commit. https://lore.kernel.org/linux-kernel-mentees/f9475918-2186-49b8-b801-6f0f9e75f4fa@oracle.com/ Reported-by: syzbot+39ba34a099ac2e9bd3cb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21jfs: fix slab-out-of-bounds Read in dtSearchManas Ghandat1-0/+5
Currently while searching for current page in the sorted entry table of the page there is a out of bound access. Added a bound check to fix the error. Dave: Set return code to -EIO Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202310241724.Ed02yUz9-lkp@intel.com/ Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21UBSAN: array-index-out-of-bounds in dtSplitRootOsama Muhammad1-1/+1
Syzkaller reported the following issue: oop0: detected capacity change from 0 to 32768 UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dtree.c:1971:9 index -2 is out of range for type 'struct dtslot [128]' CPU: 0 PID: 3613 Comm: syz-executor270 Not tainted 6.0.0-syzkaller-09423-g493ffd6605b2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:151 [inline] __ubsan_handle_out_of_bounds+0xdb/0x130 lib/ubsan.c:283 dtSplitRoot+0x8d8/0x1900 fs/jfs/jfs_dtree.c:1971 dtSplitUp fs/jfs/jfs_dtree.c:985 [inline] dtInsert+0x1189/0x6b80 fs/jfs/jfs_dtree.c:863 jfs_mkdir+0x757/0xb00 fs/jfs/namei.c:270 vfs_mkdir+0x3b3/0x590 fs/namei.c:4013 do_mkdirat+0x279/0x550 fs/namei.c:4038 __do_sys_mkdirat fs/namei.c:4053 [inline] __se_sys_mkdirat fs/namei.c:4051 [inline] __x64_sys_mkdirat+0x85/0x90 fs/namei.c:4051 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fcdc0113fd9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffeb8bc67d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000102 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fcdc0113fd9 RDX: 0000000000000000 RSI: 0000000020000340 RDI: 0000000000000003 RBP: 00007fcdc00d37a0 R08: 0000000000000000 R09: 00007fcdc00d37a0 R10: 00005555559a72c0 R11: 0000000000000246 R12: 00000000f8008000 R13: 0000000000000000 R14: 00083878000000f8 R15: 0000000000000000 </TASK> The issue is caused when the value of fsi becomes less than -1. The check to break the loop when fsi value becomes -1 is present but syzbot was able to produce value less than -1 which cause the error. This patch simply add the change for the values less than 0. The patch is tested via syzbot. Reported-and-tested-by: syzbot+d4b1df2e9d4ded6488ec@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=d4b1df2e9d4ded6488ec Signed-off-by: Osama Muhammad <osmtendev@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-21FS:JFS:UBSAN:array-index-out-of-bounds in dbAdjTreeOsama Muhammad1-0/+3
Syzkaller reported the following issue: UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:2867:6 index 196694 is out of range for type 's8[1365]' (aka 'signed char[1365]') CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867 dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834 dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331 dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline] dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402 txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534 txUpdateMap+0x342/0x9e0 txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline] jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732 kthread+0x2d3/0x370 kernel/kthread.c:388 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> ================================================================================ Kernel panic - not syncing: UBSAN: panic_on_warn set ... CPU: 1 PID: 109 Comm: jfsCommit Not tainted 6.6.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 panic+0x30f/0x770 kernel/panic.c:340 check_panic_on_warn+0x82/0xa0 kernel/panic.c:236 ubsan_epilogue lib/ubsan.c:223 [inline] __ubsan_handle_out_of_bounds+0x13c/0x150 lib/ubsan.c:348 dbAdjTree+0x474/0x4f0 fs/jfs/jfs_dmap.c:2867 dbJoin+0x210/0x2d0 fs/jfs/jfs_dmap.c:2834 dbFreeBits+0x4eb/0xda0 fs/jfs/jfs_dmap.c:2331 dbFreeDmap fs/jfs/jfs_dmap.c:2080 [inline] dbFree+0x343/0x650 fs/jfs/jfs_dmap.c:402 txFreeMap+0x798/0xd50 fs/jfs/jfs_txnmgr.c:2534 txUpdateMap+0x342/0x9e0 txLazyCommit fs/jfs/jfs_txnmgr.c:2664 [inline] jfs_lazycommit+0x47a/0xb70 fs/jfs/jfs_txnmgr.c:2732 kthread+0x2d3/0x370 kernel/kthread.c:388 ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 </TASK> Kernel Offset: disabled Rebooting in 86400 seconds.. The issue is caused when the value of lp becomes greater than CTLTREESIZE which is the max size of stree. Adding a simple check solves this issue. Dave: As the function returns a void, good error handling would require a more intrusive code reorganization, so I modified Osama's patch at use WARN_ON_ONCE for lack of a cleaner option. The patch is tested via syzbot. Reported-by: syzbot+39ba34a099ac2e9bd3cb@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=39ba34a099ac2e9bd3cb Signed-off-by: Osama Muhammad <osmtendev@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-11-07Merge tag 'vfs-6.7.fsid' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fanotify fsid updates from Christian Brauner: "This work is part of the plan to enable fanotify to serve as a drop-in replacement for inotify. While inotify is availabe on all filesystems, fanotify currently isn't. In order to support fanotify on all filesystems two things are needed: (1) all filesystems need to support AT_HANDLE_FID (2) all filesystems need to report a non-zero f_fsid This contains (1) and allows filesystems to encode non-decodable file handlers for fanotify without implementing any exportfs operations by encoding a file id of type FILEID_INO64_GEN from i_ino and i_generation. Filesystems that want to opt out of encoding non-decodable file ids for fanotify that don't support NFS export can do so by providing an empty export_operations struct. This also partially addresses (2) by generating f_fsid for simple filesystems as well as freevxfs. Remaining filesystems will be dealt with by separate patches. Finally, this contains the patch from the current exportfs maintainers which moves exportfs under vfs with Chuck, Jeff, and Amir as maintainers and vfs.git as tree" * tag 'vfs-6.7.fsid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: create an entry for exportfs fs: fix build error with CONFIG_EXPORTFS=m or not defined freevxfs: derive f_fsid from bdev->bd_dev fs: report f_fsid from s_dev for "simple" filesystems exportfs: support encoding non-decodeable file handles by default exportfs: define FILEID_INO64_GEN* file handle types exportfs: make ->encode_fh() a mandatory method for NFS export exportfs: add helpers to check if filesystem can encode/decode file handles
2023-11-02Merge tag 'jfs-6.7' of https://github.com/kleikamp/linux-shaggyLinus Torvalds7-29/+54
Pull jfs updates from Dave Kleikamp: "Minor stability improvements" * tag 'jfs-6.7' of https://github.com/kleikamp/linux-shaggy: jfs: define xtree root and page independently jfs: fix array-index-out-of-bounds in diAlloc jfs: fix array-index-out-of-bounds in dbFindLeaf fs/jfs: Add validity check for db_maxag and db_agpref fs/jfs: Add check for negative db_l2nbperpage
2023-10-30Merge tag 'vfs-6.7.ctime' of ↵Linus Torvalds5-23/+25
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode time accessor updates from Christian Brauner: "This finishes the conversion of all inode time fields to accessor functions as discussed on list. Changing timestamps manually as we used to do before is error prone. Using accessors function makes this robust. It does not contain the switch of the time fields to discrete 64 bit integers to replace struct timespec and free up space in struct inode. But after this, the switch can be trivially made and the patch should only affect the vfs if we decide to do it" * tag 'vfs-6.7.ctime' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (86 commits) fs: rename inode i_atime and i_mtime fields security: convert to new timestamp accessors selinux: convert to new timestamp accessors apparmor: convert to new timestamp accessors sunrpc: convert to new timestamp accessors mm: convert to new timestamp accessors bpf: convert to new timestamp accessors ipc: convert to new timestamp accessors linux: convert to new timestamp accessors zonefs: convert to new timestamp accessors xfs: convert to new timestamp accessors vboxsf: convert to new timestamp accessors ufs: convert to new timestamp accessors udf: convert to new timestamp accessors ubifs: convert to new timestamp accessors tracefs: convert to new timestamp accessors sysv: convert to new timestamp accessors squashfs: convert to new timestamp accessors server: convert to new timestamp accessors client: convert to new timestamp accessors ...
2023-10-30Merge tag 'vfs-6.7.xattr' of ↵Linus Torvalds2-2/+2
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs xattr updates from Christian Brauner: "The 's_xattr' field of 'struct super_block' currently requires a mutable table of 'struct xattr_handler' entries (although each handler itself is const). However, no code in vfs actually modifies the tables. This changes the type of 's_xattr' to allow const tables, and modifies existing file systems to move their tables to .rodata. This is desirable because these tables contain entries with function pointers in them; moving them to .rodata makes it considerably less likely to be modified accidentally or maliciously at runtime" * tag 'vfs-6.7.xattr' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (30 commits) const_structs.checkpatch: add xattr_handler net: move sockfs_xattr_handlers to .rodata shmem: move shmem_xattr_handlers to .rodata overlayfs: move xattr tables to .rodata xfs: move xfs_xattr_handlers to .rodata ubifs: move ubifs_xattr_handlers to .rodata squashfs: move squashfs_xattr_handlers to .rodata smb: move cifs_xattr_handlers to .rodata reiserfs: move reiserfs_xattr_handlers to .rodata orangefs: move orangefs_xattr_handlers to .rodata ocfs2: move ocfs2_xattr_handlers and ocfs2_xattr_handler_map to .rodata ntfs3: move ntfs_xattr_handlers to .rodata nfs: move nfs4_xattr_handlers to .rodata kernfs: move kernfs_xattr_handlers to .rodata jfs: move jfs_xattr_handlers to .rodata jffs2: move jffs2_xattr_handlers to .rodata hfsplus: move hfsplus_xattr_handlers to .rodata hfs: move hfs_xattr_handlers to .rodata gfs2: move gfs2_xattr_handlers_max to .rodata fuse: move fuse_xattr_handlers to .rodata ...
2023-10-28exportfs: make ->encode_fh() a mandatory method for NFS exportAmir Goldstein1-0/+1
Rename the default helper for encoding FILEID_INO32_GEN* file handles to generic_encode_ino32_fh() and convert the filesystems that used the default implementation to use the generic helper explicitly. After this change, exportfs_encode_inode_fh() no longer has a default implementation to encode FILEID_INO32_GEN* file handles. This is a step towards allowing filesystems to encode non-decodeable file handles for fanotify without having to implement any export_operations. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20231023180801.2953446-3-amir73il@gmail.com Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28jfs: fix log->bdev_handle null ptr deref in lbmStartIOLizhi Xu1-1/+5
When sbi->flag is JFS_NOINTEGRITY in lmLogOpen(), log->bdev_handle can't be inited, so it value will be NULL. Therefore, add the "log ->no_integrity=1" judgment in lbmStartIO() to avoid such problems. Reported-and-tested-by: syzbot+23bc20037854bb335d59@syzkaller.appspotmail.com Signed-off-by: Lizhi Xu <lizhi.xu@windriver.com> Link: https://lore.kernel.org/r/20231009094557.1398920-1-lizhi.xu@windriver.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28jfs: Convert to bdev_open_by_dev()Jan Kara3-16/+18
Convert jfs to use bdev_open_by_dev() and pass the handle around. CC: Dave Kleikamp <shaggy@kernel.org> CC: jfs-discussion@lists.sourceforge.net Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-24-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-18jfs: convert to new timestamp accessorsJeff Layton5-23/+25
Convert to using the new inode timestamp accessor functions. Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-46-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-13jfs: define xtree root and page independentlyDave Kleikamp6-23/+32
In order to make array bounds checking sane, provide a separate definition of the in-inode xtree root and the external xtree page. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Tested-by: Manas Ghandat <ghandatmanas@gmail.com>
2023-10-09jfs: move jfs_xattr_handlers to .rodataWedson Almeida Filho2-2/+2
This makes it harder for accidental or malicious changes to jfs_xattr_handlers at runtime. Cc: Dave Kleikamp <shaggy@kernel.org> Cc: jfs-discussion@lists.sourceforge.net Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com> Link: https://lore.kernel.org/r/20230930050033.41174-17-wedsonaf@gmail.com Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-04jfs: fix array-index-out-of-bounds in diAllocManas Ghandat1-1/+4
Currently there is not check against the agno of the iag while allocating new inodes to avoid fragmentation problem. Added the check which is required. Reported-by: syzbot+79d792676d8ac050949f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=79d792676d8ac050949f Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-10-04jfs: fix array-index-out-of-bounds in dbFindLeafManas Ghandat1-4/+10
Currently while searching for dmtree_t for sufficient free blocks there is an array out of bounds while getting element in tp->dm_stree. To add the required check for out of bound we first need to determine the type of dmtree. Thus added an extra parameter to dbFindLeaf so that the type of tree can be determined and the required check can be applied. Reported-by: syzbot+aea1ad91e854d0a83e04@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=aea1ad91e854d0a83e04 Signed-off-by: Manas Ghandat <ghandatmanas@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-10-03fs/jfs: Add validity check for db_maxag and db_agprefJuntong Deng1-0/+6
Both db_maxag and db_agpref are used as the index of the db_agfree array, but there is currently no validity check for db_maxag and db_agpref, which can lead to errors. The following is related bug reported by Syzbot: UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:639:20 index 7936 is out of range for type 'atomic_t[128]' Add checking that the values of db_maxag and db_agpref are valid indexes for the db_agfree array. Reported-by: syzbot+38e876a8aa44b7115c76@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=38e876a8aa44b7115c76 Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-10-03fs/jfs: Add check for negative db_l2nbperpageJuntong Deng1-1/+2
l2nbperpage is log2(number of blks per page), and the minimum legal value should be 0, not negative. In the case of l2nbperpage being negative, an error will occur when subsequently used as shift exponent. Syzbot reported this bug: UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:799:12 shift exponent -16777216 is negative Reported-by: syzbot+debee9ab7ae2b34b0307@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=debee9ab7ae2b34b0307 Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-08-31Merge tag 'jfs-6.6' of github.com:kleikamp/linux-shaggyLinus Torvalds4-2/+9
Pull jfs updates from Dave Kleikamp: "A few small fixes" * tag 'jfs-6.6' of github.com:kleikamp/linux-shaggy: jfs: validate max amount of blocks before allocation. jfs: remove redundant initialization to pointer ip jfs: fix invalid free of JFS_IP(ipimap)->i_imap in diUnmount FS: JFS: (trivial) Fix grammatical error in extAlloc fs/jfs: prevent double-free in dbUnmount() after failed jfs_remount()
2023-08-30Merge tag '6.6-rc-smb3-client-fixes-part1' of ↵Linus Torvalds4-134/+7
git://git.samba.org/sfrench/cifs-2.6 Pull smb client updates from Steve French: - fixes for excessive stack usage - multichannel reconnect improvements - DFS fix and cleanup patches - move UCS-2 conversion code to fs/nls and update cifs and jfs to use them - cleanup patch for compounding, one to fix confusing function name - inode number collision fix - reparse point fixes (including avoiding an extra unneeded query on symlinks) and a minor cleanup - directory lease (caching) improvement * tag '6.6-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (24 commits) fs/jfs: Use common ucs2 upper case table fs/smb/client: Use common code in client fs/smb: Swing unicode common code from smb->NLS fs/smb: Remove unicode 'lower' tables SMB3: rename macro CIFS_SERVER_IS_CHAN to avoid confusion [SMB3] send channel sequence number in SMB3 requests after reconnects cifs: update desired access while requesting for directory lease smb: client: reduce stack usage in smb2_query_reparse_point() smb: client: reduce stack usage in smb2_query_info_compound() smb: client: reduce stack usage in smb2_set_ea() smb: client: reduce stack usage in smb_send_rqst() smb: client: reduce stack usage in cifs_demultiplex_thread() smb: client: reduce stack usage in cifs_try_adding_channels() smb: cilent: set reparse mount points as automounts smb: client: query reparse points in older dialects smb: client: do not query reparse points twice on symlinks smb: client: parse reparse point flag in create response smb: client: get rid of dfs code dep in namespace.c smb: client: get rid of dfs naming in automount code smb: client: rename cifs_dfs_ref.c to namespace.c ...
2023-08-30fs/jfs: Use common ucs2 upper case tableDr. David Alan Gilbert4-134/+7
Use the UCS-2 upper case tables from nls, that are shared with smb. This code in JFS is hard to test, so we're only reusing the same tables (which are identical), not trying to reuse the rest of the helper functions. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-29Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ...
2023-08-29jfs: validate max amount of blocks before allocation.Alexei Filippov1-0/+5
The lack of checking bmp->db_max_freebud in extBalloc() can lead to shift out of bounds, so this patch prevents undefined behavior, because bmp->db_max_freebud == -1 only if there is no free space. Signed-off-by: Aleksei Filippov <halip0503@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+5f088f29593e6b4c8db8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=01abadbd6ae6a08b1f1987aa61554c6b3ac19ff2
2023-08-29jfs: remove redundant initialization to pointer ipColin Ian King1-1/+1
The pointer ip is being initialized with a value that is never read, it is being re-assigned later on. The assignment is redundant and can be removed. Cleans up clang scan warning: fs/jfs/namei.c:886:16: warning: Value stored to 'ip' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-08-28Merge tag 'v6.6-vfs.ctime' of ↵Linus Torvalds8-23/+23
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
2023-08-06vfs: get rid of old '->iterate' directory operationLinus Torvalds1-1/+2
All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got convered to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-02fs: add CONFIG_BUFFER_HEADChristoph Hellwig1-0/+1
Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-24jfs: convert to ctime accessor functionsJeff Layton8-23/+23
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-53-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-18jfs: fix invalid free of JFS_IP(ipimap)->i_imap in diUnmountLiu Shixin via Jfs-discussion1-0/+1
syzbot found an invalid-free in diUnmount: BUG: KASAN: double-free in slab_free mm/slub.c:3661 [inline] BUG: KASAN: double-free in __kmem_cache_free+0x71/0x110 mm/slub.c:3674 Free of addr ffff88806f410000 by task syz-executor131/3632 CPU: 0 PID: 3632 Comm: syz-executor131 Not tainted 6.1.0-rc7-syzkaller-00012-gca57f02295f1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106 print_address_description+0x74/0x340 mm/kasan/report.c:284 print_report+0x107/0x1f0 mm/kasan/report.c:395 kasan_report_invalid_free+0xac/0xd0 mm/kasan/report.c:460 ____kasan_slab_free+0xfb/0x120 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1724 [inline] slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1750 slab_free mm/slub.c:3661 [inline] __kmem_cache_free+0x71/0x110 mm/slub.c:3674 diUnmount+0xef/0x100 fs/jfs/jfs_imap.c:195 jfs_umount+0x108/0x370 fs/jfs/jfs_umount.c:63 jfs_put_super+0x86/0x190 fs/jfs/super.c:194 generic_shutdown_super+0x130/0x310 fs/super.c:492 kill_block_super+0x79/0xd0 fs/super.c:1428 deactivate_locked_super+0xa7/0xf0 fs/super.c:332 cleanup_mnt+0x494/0x520 fs/namespace.c:1186 task_work_run+0x243/0x300 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x664/0x2070 kernel/exit.c:820 do_group_exit+0x1fd/0x2b0 kernel/exit.c:950 __do_sys_exit_group kernel/exit.c:961 [inline] __se_sys_exit_group kernel/exit.c:959 [inline] __x64_sys_exit_group+0x3b/0x40 kernel/exit.c:959 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] JFS_IP(ipimap)->i_imap is not setting to NULL after free in diUnmount. If jfs_remount() free JFS_IP(ipimap)->i_imap but then failed at diMount(). JFS_IP(ipimap)->i_imap will be freed once again. Fix this problem by setting JFS_IP(ipimap)->i_imap to NULL after free. Reported-by: syzbot+90a11e6b1e810785c6ff@syzkaller.appspotmail.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-07-18FS: JFS: (trivial) Fix grammatical error in extAllocImmad Mir1-1/+1
There is a grammatical error in one the commnents in extAlloc function. Signed-off-by: Immad Mir <mirimmad17@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-07-18fs/jfs: prevent double-free in dbUnmount() after failed jfs_remount()Andrew Kanner1-0/+1
Syzkaller reported the following issue: ================================================================== BUG: KASAN: double-free in slab_free mm/slub.c:3787 [inline] BUG: KASAN: double-free in __kmem_cache_free+0x71/0x110 mm/slub.c:3800 Free of addr ffff888086408000 by task syz-executor.4/12750 [...] Call Trace: <TASK> [...] kasan_report_invalid_free+0xac/0xd0 mm/kasan/report.c:482 ____kasan_slab_free+0xfb/0x120 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1807 slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x71/0x110 mm/slub.c:3800 dbUnmount+0xf4/0x110 fs/jfs/jfs_dmap.c:264 jfs_umount+0x248/0x3b0 fs/jfs/jfs_umount.c:87 jfs_put_super+0x86/0x190 fs/jfs/super.c:194 generic_shutdown_super+0x130/0x310 fs/super.c:492 kill_block_super+0x79/0xd0 fs/super.c:1386 deactivate_locked_super+0xa7/0xf0 fs/super.c:332 cleanup_mnt+0x494/0x520 fs/namespace.c:1291 task_work_run+0x243/0x300 kernel/task_work.c:179 resume_user_mode_work include/linux/resume_user_mode.h:49 [inline] exit_to_user_mode_loop+0x124/0x150 kernel/entry/common.c:171 exit_to_user_mode_prepare+0xb2/0x140 kernel/entry/common.c:203 __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline] syscall_exit_to_user_mode+0x26/0x60 kernel/entry/common.c:296 do_syscall_64+0x49/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK> Allocated by task 13352: kasan_save_stack mm/kasan/common.c:45 [inline] kasan_set_track+0x3d/0x60 mm/kasan/common.c:52 ____kasan_kmalloc mm/kasan/common.c:371 [inline] __kasan_kmalloc+0x97/0xb0 mm/kasan/common.c:380 kmalloc include/linux/slab.h:580 [inline] dbMount+0x54/0x980 fs/jfs/jfs_dmap.c:164 jfs_mount+0x1dd/0x830 fs/jfs/jfs_mount.c:121 jfs_fill_super+0x590/0xc50 fs/jfs/super.c:556 mount_bdev+0x26c/0x3a0 fs/super.c:1359 legacy_get_tree+0xea/0x180 fs/fs_context.c:610 vfs_get_tree+0x88/0x270 fs/super.c:1489 do_new_mount+0x289/0xad0 fs/namespace.c:3145 do_mount fs/namespace.c:3488 [inline] __do_sys_mount fs/namespace.c:3697 [inline] __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 13352: kasan_save_stack mm/kasan/common.c:45 [inline] kasan_set_track+0x3d/0x60 mm/kasan/common.c:52 kasan_save_free_info+0x27/0x40 mm/kasan/generic.c:518 ____kasan_slab_free+0xd6/0x120 mm/kasan/common.c:236 kasan_slab_free include/linux/kasan.h:177 [inline] slab_free_hook mm/slub.c:1781 [inline] slab_free_freelist_hook+0x12e/0x1a0 mm/slub.c:1807 slab_free mm/slub.c:3787 [inline] __kmem_cache_free+0x71/0x110 mm/slub.c:3800 dbUnmount+0xf4/0x110 fs/jfs/jfs_dmap.c:264 jfs_mount_rw+0x545/0x740 fs/jfs/jfs_mount.c:247 jfs_remount+0x3db/0x710 fs/jfs/super.c:454 reconfigure_super+0x3bc/0x7b0 fs/super.c:935 vfs_fsconfig_locked fs/fsopen.c:254 [inline] __do_sys_fsconfig fs/fsopen.c:439 [inline] __se_sys_fsconfig+0xad5/0x1060 fs/fsopen.c:314 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] JFS_SBI(ipbmap->i_sb)->bmap wasn't set to NULL after kfree() in dbUnmount(). Syzkaller uses faultinject to reproduce this KASAN double-free warning. The issue is triggered if either diMount() or dbMount() fail in jfs_remount(), since diUnmount() or dbUnmount() already happened in such a case - they will do double-free on next execution: jfs_umount or jfs_remount. Tested on both upstream and jfs-next by syzkaller. Reported-and-tested-by: syzbot+6a93efb725385bc4b2e9@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000471f2d05f1ce8bad@google.com/T/ Link: https://syzkaller.appspot.com/bug?extid=6a93efb725385bc4b2e9 Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-06-29Merge tag 'jfs-6.5' of github.com:kleikamp/linux-shaggyLinus Torvalds4-1/+22
Pull jfs updates from David Kleikamp: "Minor bug fixes and cleanups" * tag 'jfs-6.5' of github.com:kleikamp/linux-shaggy: FS: JFS: Check for read-only mounted filesystem in txBegin FS: JFS: Fix null-ptr-deref Read in txBegin fs: jfs: Fix UBSAN: array-index-out-of-bounds in dbAllocDmapLev fs: jfs: (trivial) Fix typo in dbInitTree function jfs: jfs_dmap: Validate db_l2nbperpage while mounting
2023-06-26Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds1-6/+6
Pull block updates from Jens Axboe: - NVMe pull request via Keith: - Various cleanups all around (Irvin, Chaitanya, Christophe) - Better struct packing (Christophe JAILLET) - Reduce controller error logs for optional commands (Keith) - Support for >=64KiB block sizes (Daniel Gomez) - Fabrics fixes and code organization (Max, Chaitanya, Daniel Wagner) - bcache updates via Coly: - Fix a race at init time (Mingzhe Zou) - Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye) - use page pinning in the block layer for dio (David) - convert old block dio code to page pinning (David, Christoph) - cleanups for pktcdvd (Andy) - cleanups for rnbd (Guoqing) - use the unchecked __bio_add_page() for the initial single page additions (Johannes) - fix overflows in the Amiga partition handling code (Michael) - improve mq-deadline zoned device support (Bart) - keep passthrough requests out of the IO schedulers (Christoph, Ming) - improve support for flush requests, making them less special to deal with (Christoph) - add bdev holder ops and shutdown methods (Christoph) - fix the name_to_dev_t() situation and use cases (Christoph) - decouple the block open flags from fmode_t (Christoph) - ublk updates and cleanups, including adding user copy support (Ming) - BFQ sanity checking (Bart) - convert brd from radix to xarray (Pankaj) - constify various structures (Thomas, Ivan) - more fine grained persistent reservation ioctl capability checks (Jingbo) - misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan, Jordy, Li, Min, Yu, Zhong, Waiman) * tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits) scsi/sg: don't grab scsi host module reference ext4: Fix warning in blkdev_put() block: don't return -EINVAL for not found names in devt_from_devname cdrom: Fix spectre-v1 gadget block: Improve kernel-doc headers blk-mq: don't insert passthrough request into sw queue bsg: make bsg_class a static const structure ublk: make ublk_chr_class a static const structure aoe: make aoe_class a static const structure block/rnbd: make all 'class' structures const block: fix the exclusive open mask in disk_scan_partitions block: add overflow checks for Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h block: fix signed int overflow in Amiga partition support block: add capacity validation in bdev_add_partition() block: fine-granular CAP_SYS_ADMIN for Persistent Reservation block: disallow Persistent Reservation on partitions reiserfs: fix blkdev_put() warning from release_journal_dev() block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() block: document the holder argument to blkdev_get_by_path ...
2023-06-26Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds1-1/+1
Pull splice updates from Jens Axboe: "This kills off ITER_PIPE to avoid a race between truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory corruption. Instead, we either use (a) filemap_splice_read(), which invokes the buffered file reading code and splices from the pagecache into the pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads into it and then pushes the filled pages into the pipe; or (c) handle it in filesystem-specific code. Summary: - Rename direct_splice_read() to copy_splice_read() - Simplify the calculations for the number of pages to be reclaimed in copy_splice_read() - Turn do_splice_to() into a helper, vfs_splice_read(), so that it can be used by overlayfs and coda to perform the checks on the lower fs - Make vfs_splice_read() jump to copy_splice_read() to handle direct-I/O and DAX - Provide shmem with its own splice_read to handle non-existent pages in the pagecache. We don't want a ->read_folio() as we don't want to populate holes, but filemap_get_pages() requires it - Provide overlayfs with its own splice_read to call down to a lower layer as overlayfs doesn't provide ->read_folio() - Provide coda with its own splice_read to call down to a lower layer as coda doesn't provide ->read_folio() - Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs and random files as they just copy to the output buffer and don't splice pages - Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3, ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation - Make cifs use filemap_splice_read() - Replace pointers to generic_file_splice_read() with pointers to filemap_splice_read() as DIO and DAX are handled in the caller; filesystems can still provide their own alternate ->splice_read() op - Remove generic_file_splice_read() - Remove ITER_PIPE and its paraphernalia as generic_file_splice_read was the only user" * tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-06-23FS: JFS: Check for read-only mounted filesystem in txBeginImmad Mir1-0/+5
This patch adds a check for read-only mounted filesystem in txBegin before starting a transaction potentially saving from NULL pointer deref. Signed-off-by: Immad Mir <mirimmad17@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-06-23FS: JFS: Fix null-ptr-deref Read in txBeginImmad Mir1-0/+5
Syzkaller reported an issue where txBegin may be called on a superblock in a read-only mounted filesystem which leads to NULL pointer deref. This could be solved by checking if the filesystem is read-only before calling txBegin, and returning with appropiate error code. Reported-By: syzbot+f1faa20eec55e0c8644c@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=be7e52c50c5182cc09a09ea6fc456446b2039de3 Signed-off-by: Immad Mir <mirimmad17@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-06-22fs: jfs: Fix UBSAN: array-index-out-of-bounds in dbAllocDmapLevYogesh1-0/+3
Syzkaller reported the following issue: UBSAN: array-index-out-of-bounds in fs/jfs/jfs_dmap.c:1965:6 index -84 is out of range for type 's8[341]' (aka 'signed char[341]') CPU: 1 PID: 4995 Comm: syz-executor146 Not tainted 6.4.0-rc6-syzkaller-00037-gb6dad5178cea #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348 dbAllocDmapLev+0x3e5/0x430 fs/jfs/jfs_dmap.c:1965 dbAllocCtl+0x113/0x920 fs/jfs/jfs_dmap.c:1809 dbAllocAG+0x28f/0x10b0 fs/jfs/jfs_dmap.c:1350 dbAlloc+0x658/0xca0 fs/jfs/jfs_dmap.c:874 dtSplitUp fs/jfs/jfs_dtree.c:974 [inline] dtInsert+0xda7/0x6b00 fs/jfs/jfs_dtree.c:863 jfs_create+0x7b6/0xbb0 fs/jfs/namei.c:137 lookup_open fs/namei.c:3492 [inline] open_last_lookups fs/namei.c:3560 [inline] path_openat+0x13df/0x3170 fs/namei.c:3788 do_filp_open+0x234/0x490 fs/namei.c:3818 do_sys_openat2+0x13f/0x500 fs/open.c:1356 do_sys_open fs/open.c:1372 [inline] __do_sys_openat fs/open.c:1388 [inline] __se_sys_openat fs/open.c:1383 [inline] __x64_sys_openat+0x247/0x290 fs/open.c:1383 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f1f4e33f7e9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc21129578 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1f4e33f7e9 RDX: 000000000000275a RSI: 0000000020000040 RDI: 00000000ffffff9c RBP: 00007f1f4e2ff080 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f1f4e2ff110 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The bug occurs when the dbAllocDmapLev()function attempts to access dp->tree.stree[leafidx + LEAFIND] while the leafidx value is negative. To rectify this, the patch introduces a safeguard within the dbAllocDmapLev() function. A check has been added to verify if leafidx is negative. If it is, the function immediately returns an I/O error, preventing any further execution that could potentially cause harm. Tested via syzbot. Reported-by: syzbot+853a6f4dfa3cf37d3aea@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=ae2f5a27a07ae44b0f17 Signed-off-by: Yogesh <yogi.kernel@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-06-20fs: jfs: (trivial) Fix typo in dbInitTree functionWonguk Lee1-1/+1
While trying to fix the jfs UBSAN problem reported in syzkaller, (https://syzkaller.appspot.com/bug?id=01abadbd6ae6a08b1f1987aa61554c6b3ac19ff2) I found the typo in the comment of dbInitTree function and fix it. Signed-off-by: Wonguk Lee <wonguk.lee1023@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-06-20jfs: jfs_dmap: Validate db_l2nbperpage while mountingSiddh Raman Pant2-0/+8
In jfs_dmap.c at line 381, BLKTODMAP is used to get a logical block number inside dbFree(). db_l2nbperpage, which is the log2 number of blocks per page, is passed as an argument to BLKTODMAP which uses it for shifting. Syzbot reported a shift out-of-bounds crash because db_l2nbperpage is too big. This happens because the large value is set without any validation in dbMount() at line 181. Thus, make sure that db_l2nbperpage is correct while mounting. Max number of blocks per page = Page size / Min block size => log2(Max num_block per page) = log2(Page size / Min block size) = log2(Page size) - log2(Min block size) => Max db_l2nbperpage = L2PSIZE - L2MINBLOCKSIZE Reported-and-tested-by: syzbot+d2cd27dcf8e04b232eb2@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=2a70a453331db32ed491f5cbb07e81bf2d225715 Cc: stable@vger.kernel.org Suggested-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Siddh Raman Pant <code@siddh.me> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig1-1/+1
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: use the holder as indication for exclusive opensChristoph Hellwig1-3/+3
The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05block: introduce holder opsChristoph Hellwig1-1/+1
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-02jfs: Use unsigned variable for length calculationsKees Cook1-3/+3
To avoid confusing the compiler about possible negative sizes, switch "ssize" which can never be negative from int to u32. Seen with GCC 13: ../fs/jfs/namei.c: In function 'jfs_symlink': ../include/linux/fortify-string.h:57:33: warning: '__builtin_memcpy' pointer overflow between offset 0 and size [-2147483648, -1] [-Warray-bounds=] 57 | #define __underlying_memcpy __builtin_memcpy | ^ ... ../fs/jfs/namei.c:950:17: note: in expansion of macro 'memcpy' 950 | memcpy(ip->i_link, name, ssize); | ^~~~~~ Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Chinner <dchinner@redhat.com> Cc: jfs-discussion@lists.sourceforge.net Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Jeff Xu <jeffxu@chromium.org> Message-Id: <20230204183355.never.877-kees@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-31jfs: logmgr: use __bio_add_page to add single page to bioJohannes Thumshirn1-2/+2
The JFS IO code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/9fb5ed86d19f6e0b6f64dfc4109a48ff8ff24497.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-24splice: Use filemap_splice_read() instead of generic_file_splice_read()David Howells1-1/+1
Replace pointers to generic_file_splice_read() with calls to filemap_splice_read(). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-29-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-24Merge tag 'pull-write-one-page' of ↵Linus Torvalds1-5/+34
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs write_one_page removal from Al Viro: "write_one_page series" * tag 'pull-write-one-page' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mm,jfs: move write_one_page/folio_write_one to jfs ocfs2: don't use write_one_page in ocfs2_duplicate_clusters_by_page ufs: don't flush page immediately for DIRSYNC directories
2023-03-12mm,jfs: move write_one_page/folio_write_one to jfsChristoph Hellwig1-5/+34
The last remaining user of folio_write_one through the write_one_page wrapper is jfs, so move the functionality there and hard code the call to metapage_writepage. Note that the use of the pagecache by the JFS 'metapage' buffer cache is a bit odd, and we could probably do without VM-level dirty tracking at all, but that's a change for another time. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-03-06fs: drop unused posix acl handlersChristian Brauner1-4/+0
Remove struct posix_acl_{access,default}_handler for all filesystems that don't depend on the xattr handler in their inode->i_op->listxattr() method in any way. There's nothing more to do than to simply remove the handler. It's been effectively unused ever since we introduced the new posix acl api. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-01Merge tag 'jfs-6.3' of https://github.com/kleikamp/linux-shaggyLinus Torvalds1-1/+2
Pull jfs update from Dave Kleikamp: "Just one simple sanity check" * tag 'jfs-6.3' of https://github.com/kleikamp/linux-shaggy: fs/jfs: fix shift exponent db_agl2size negative
2023-02-20Merge tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull legacy dio update from Jens Axboe: "We only have a few file systems that use the old dio code, make them select it rather than build it unconditionally" * tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux: fs: build the legacy direct I/O code conditionally fs: move sb_init_dio_done_wq out of direct-io.c
2023-01-26fs: build the legacy direct I/O code conditionallyChristoph Hellwig1-0/+1
Add a new LEGACY_DIRECT_IO config symbol that is only selected by the file systems that still use the legacy blockdev_direct_IO code, so that kernels without support for those file systems don't need to build the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230125065839.191256-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-19quota: port to mnt_idmapChristian Brauner1-2/+2
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port inode_init_owner() to mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port acl to mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port xattr to mnt_idmapChristian Brauner1-2/+2
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->fileattr_set() to pass mnt_idmapChristian Brauner2-2/+2
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->set_acl() to pass mnt_idmapChristian Brauner3-3/+3
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->rename() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mknod() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mkdir() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->symlink() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->create() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner2-6/+6
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-03fs/jfs: fix shift exponent db_agl2size negativeLiu Shixin via Jfs-discussion1-1/+2
As a shift exponent, db_agl2size can not be less than 0. Add the missing check to fix the shift-out-of-bounds bug reported by syzkaller: UBSAN: shift-out-of-bounds in fs/jfs/jfs_dmap.c:2227:15 shift exponent -744642816 is negative Reported-by: syzbot+0be96567042453c0c820@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-12-13Merge tag 'mm-stable-2022-12-13' of ↵Linus Torvalds1-6/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - More userfaultfs work from Peter Xu - Several convert-to-folios series from Sidhartha Kumar and Huang Ying - Some filemap cleanups from Vishal Moola - David Hildenbrand added the ability to selftest anon memory COW handling - Some cpuset simplifications from Liu Shixin - Addition of vmalloc tracing support by Uladzislau Rezki - Some pagecache folioifications and simplifications from Matthew Wilcox - A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it - Miguel Ojeda contributed some cleanups for our use of the __no_sanitize_thread__ gcc keyword. This series should have been in the non-MM tree, my bad - Naoya Horiguchi improved the interaction between memory poisoning and memory section removal for huge pages - DAMON cleanups and tuneups from SeongJae Park - Tony Luck fixed the handling of COW faults against poisoned pages - Peter Xu utilized the PTE marker code for handling swapin errors - Hugh Dickins reworked compound page mapcount handling, simplifying it and making it more efficient - Removal of the autonuma savedwrite infrastructure from Nadav Amit and David Hildenbrand - zram support for multiple compression streams from Sergey Senozhatsky - David Hildenbrand reworked the GUP code's R/O long-term pinning so that drivers no longer need to use the FOLL_FORCE workaround which didn't work very well anyway - Mel Gorman altered the page allocator so that local IRQs can remnain enabled during per-cpu page allocations - Vishal Moola removed the try_to_release_page() wrapper - Stefan Roesch added some per-BDI sysfs tunables which are used to prevent network block devices from dirtying excessive amounts of pagecache - David Hildenbrand did some cleanup and repair work on KSM COW breaking - Nhat Pham and Johannes Weiner have implemented writeback in zswap's zsmalloc backend - Brian Foster has fixed a longstanding corner-case oddity in file[map]_write_and_wait_range() - sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang Chen - Shiyang Ruan has done some work on fsdax, to make its reflink mode work better under xfstests. Better, but still not perfect - Christoph Hellwig has removed the .writepage() method from several filesystems. They only need .writepages() - Yosry Ahmed wrote a series which fixes the memcg reclaim target beancounting - David Hildenbrand has fixed some of our MM selftests for 32-bit machines - Many singleton patches, as usual * tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits) mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio mm: mmu_gather: allow more than one batch of delayed rmaps mm: fix typo in struct pglist_data code comment kmsan: fix memcpy tests mm: add cond_resched() in swapin_walk_pmd_entry() mm: do not show fs mm pc for VM_LOCKONFAULT pages selftests/vm: ksm_functional_tests: fixes for 32bit selftests/vm: cow: fix compile warning on 32bit selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem mm,thp,rmap: fix races between updates of subpages_mapcount mm: memcg: fix swapcached stat accounting mm: add nodes= arg to memory.reclaim mm: disable top-tier fallback to reclaim on proactive reclaim selftests: cgroup: make sure reclaim target memcg is unprotected selftests: cgroup: refactor proactive reclaim code to reclaim_until() mm: memcg: fix stale protection of reclaim target memcg mm/mmap: properly unaccount memory on mas_preallocate() failure omfs: remove ->writepage jfs: remove ->writepage ...
2022-12-12Merge tag 'jfs-6.2' of https://github.com/kleikamp/linux-shaggyLinus Torvalds9-22/+31
Pull jfs updates from David Kleikamp: "Assorted JFS fixes for 6.2" * tag 'jfs-6.2' of https://github.com/kleikamp/linux-shaggy: jfs: makes diUnmount/diMount in jfs_mount_rw atomic jfs: Fix a typo in function jfs_umount fs: jfs: fix shift-out-of-bounds in dbDiscardAG jfs: Fix fortify moan in symlink jfs: remove redundant assignments to ipaimap and ipaimap2 jfs: remove unused declarations for jfs fs/jfs/jfs_xattr.h: Fix spelling typo in comment MAINTAINERS: git://github -> https://github.com for kleikamp fs/jfs: replace ternary operator with min_t() fs: jfs: fix shift-out-of-bounds in dbAllocAG
2022-12-11jfs: remove ->writepageChristoph Hellwig1-6/+1
->writepage is a very inefficient method to write back data, and only used through write_cache_pages or a a fallback when no ->migrate_folio method is present. Set ->migrate_folio to the generic buffer_head based helper, and remove the ->writepage implementation. Link: https://lkml.kernel.org/r/20221202102644.770505-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-10jfs: makes diUnmount/diMount in jfs_mount_rw atomicOleg Kanatov2-1/+5
jfs_mount_rw can call diUnmount and then diMount. These calls change the imap pointer. Between these two calls there may be calls of function jfs_lookup(). The jfs_lookup() function calls jfs_iget(), which, in turn calls diRead(). The latter references the imap pointer. That may cause diRead() to refer to a pointer freed in diUnmount(). This commit makes the calls to diUnmount()/diMount() atomic so that nothing will read the imap pointer until the whole remount is completed. Signed-off-by: Oleg Kanatov <okanatov@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-11-10jfs: Fix a typo in function jfs_umountOleg Kanatov1-1/+1
When closing the block allocation map, an incorrect pointer was NULL'ed. This commit fixes that. Signed-off-by: Oleg Kanatov <okanatov@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-10-27fs: jfs: fix shift-out-of-bounds in dbDiscardAGHoi Pok Wu1-0/+5
This should be applied to most URSAN bugs found recently by syzbot, by guarding the dbMount. As syzbot feeding rubbish into the bmap descriptor. Signed-off-by: Hoi Pok Wu <wuhoipok@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-10-27jfs: Fix fortify moan in symlinkDr. David Alan Gilbert1-1/+1
JFS has in jfs_incore.h: /* _inline may overflow into _inline_ea when needed */ /* _inline_ea may overlay the last part of * file._xtroot if maxentry = XTROOTINITSLOT */ union { struct { /* 128: inline symlink */ unchar _inline[128]; /* 128: inline extended attr */ unchar _inline_ea[128]; }; unchar _inline_all[256]; and currently the symlink code copies into _inline; if this is larger than 128 bytes it triggers a fortify warning of the form: memcpy: detected field-spanning write (size 132) of single field "ip->i_link" at fs/jfs/namei.c:950 (size 18446744073709551615) when it's actually OK. Copy it into _inline_all instead. Reported-by: syzbot+5fc38b2ddbbca7f5c680@syzkaller.appspotmail.com Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-10-20fs: rename current get acl methodChristian Brauner2-2/+2
The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. The current inode operation for getting posix acls takes an inode argument but various filesystems (e.g., 9p, cifs, overlayfs) need access to the dentry. In contrast to the ->set_acl() inode operation we cannot simply extend ->get_acl() to take a dentry argument. The ->get_acl() inode operation is called from: acl_permission_check() -> check_acl() -> get_acl() which is part of generic_permission() which in turn is part of inode_permission(). Both generic_permission() and inode_permission() are called in the ->permission() handler of various filesystems (e.g., overlayfs). So simply passing a dentry argument to ->get_acl() would amount to also having to pass a dentry argument to ->permission(). We should avoid this unnecessary change. So instead of extending the existing inode operation rename it from ->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that passes a dentry argument and which filesystems that need access to the dentry can implement instead of ->get_inode_acl(). Filesystems like cifs which allow setting and getting posix acls but not using them for permission checking during lookup can simply not implement ->get_inode_acl(). This is intended to be a non-functional change. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-19fs: pass dentry to set acl methodChristian Brauner3-3/+4
The current way of setting and getting posix acls through the generic xattr interface is error prone and type unsafe. The vfs needs to interpret and fixup posix acls before storing or reporting it to userspace. Various hacks exist to make this work. The code is hard to understand and difficult to maintain in it's current form. Instead of making this work by hacking posix acls through xattr handlers we are building a dedicated posix acl api around the get and set inode operations. This removes a lot of hackiness and makes the codepaths easier to maintain. A lot of background can be found in [1]. Since some filesystem rely on the dentry being available to them when setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode operation. But since ->set_acl() is required in order to use the generic posix acl xattr handlers filesystems that do not implement this inode operation cannot use the handler and need to implement their own dedicated posix acl handlers. Update the ->set_acl() inode method to take a dentry argument. This allows all filesystems to rely on ->set_acl(). As far as I can tell all codepaths can be switched to rely on the dentry instead of just the inode. Note that the original motivation for passing the dentry separate from the inode instead of just the dentry in the xattr handlers was because of security modules that call security_d_instantiate(). This hook is called during d_instantiate_new(), d_add(), __d_instantiate_anon(), and d_splice_alias() to initialize the inode's security context and possibly to set security.* xattrs. Since this only affects security.* xattrs this is completely irrelevant for posix acls. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-10-18jfs: remove redundant assignments to ipaimap and ipaimap2Colin Ian King1-2/+0
The pointers ipaimap and ipaimap2 are re-assigned with values a second time with the same values when they were initialized. The re-assignments are redundant and can be removed. Cleans up two clang scan build warnings: fs/jfs/jfs_umount.c:42:16: warning: Value stored to 'ipaimap' during its initialization is never read [deadcode.DeadStores] fs/jfs/jfs_umount.c:43:16: warning: Value stored to 'ipaimap2' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-10-18jfs: remove unused declarations for jfsGaosheng Cui2-6/+0
extRealloc(), xtRelocate(), xtDelete() and extFill() have been removed since commit e471e5942c00 ("fs/jfs: Remove dead code"), so remove them. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-10-18fs/jfs/jfs_xattr.h: Fix spelling typo in commentJiangshan Yi1-1/+1
Fix spelling typo in comment. Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-10-18fs/jfs: replace ternary operator with min_t()Jiangshan Yi1-4/+2
Fix the following coccicheck warning: fs/jfs/super.c:748: WARNING opportunity for min(). fs/jfs/super.c:788: WARNING opportunity for min(). min_t() macro is defined in include/linux/minmax.h. It avoids multiple evaluations of the arguments when non-constant and performs strict type-checking. Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-10-18fs: jfs: fix shift-out-of-bounds in dbAllocAGDongliang Mu1-6/+16
Syzbot found a crash : UBSAN: shift-out-of-bounds in dbAllocAG. The underlying bug is the missing check of bmp->db_agl2size. The field can be greater than 64 and trigger the shift-out-of-bounds. Fix this bug by adding a check of bmp->db_agl2size in dbMount since this field is used in many following functions. The upper bound for this field is L2MAXL2SIZE - L2MAXAG, thanks for the help of Dave Kleikamp. Note that, for maintenance, I reorganized error handling code of dbMount. Reported-by: syzbot+15342c1aa6a00fb7a438@syzkaller.appspotmail.com Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2022-08-03Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds2-4/+16
Pull folio updates from Matthew Wilcox: - Fix an accounting bug that made NR_FILE_DIRTY grow without limit when running xfstests - Convert more of mpage to use folios - Remove add_to_page_cache() and add_to_page_cache_locked() - Convert find_get_pages_range() to filemap_get_folios() - Improvements to the read_cache_page() family of functions - Remove a few unnecessary checks of PageError - Some straightforward filesystem conversions to use folios - Split PageMovable users out from address_space_operations into their own movable_operations - Convert aops->migratepage to aops->migrate_folio - Remove nobh support (Christoph Hellwig) * tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits) fs: remove the NULL get_block case in mpage_writepages fs: don't call ->writepage from __mpage_writepage fs: remove the nobh helpers jfs: stop using the nobh helper ext2: remove nobh support ntfs3: refactor ntfs_writepages mm/folio-compat: Remove migration compatibility functions fs: Remove aops->migratepage() secretmem: Convert to migrate_folio hugetlb: Convert to migrate_folio aio: Convert to migrate_folio f2fs: Convert to filemap_migrate_folio() ubifs: Convert to filemap_migrate_folio() btrfs: Convert btrfs_migratepage to migrate_folio mm/migrate: Add filemap_migrate_folio() mm/migrate: Convert migrate_page() to migrate_folio() nfs: Convert to migrate_folio btrfs: Convert btree_migratepage to migrate_folio mm/migrate: Convert expected_page_refs() to folio_expected_refs() mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio() ...
2022-08-02jfs: stop using the nobh helperChristoph Hellwig1-3/+15
The nobh mode is an obscure feature to save lowlevel for large memory 32-bit configurations while trading for much slower performance and has been long obsolete. Switch to the regular buffer head based helpers instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-29jfs: Remove check for PageUptodateMatthew Wilcox (Oracle)1-1/+1
Pages returned from read_mapping_page() are always uptodate, so this check is unnecessary. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-26attr: port attribute changes to new typesChristian Brauner1-2/+2
Now that we introduced new infrastructure to increase the type safety for filesystems supporting idmapped mounts port the first part of the vfs over to them. This ports the attribute changes codepaths to rely on the new better helpers using a dedicated type. Before this change we used to take a shortcut and place the actual values that would be written to inode->i_{g,u}id into struct iattr. This had the advantage that we moved idmappings mostly out of the picture early on but it made reasoning about changes more difficult than it should be. The filesystem was never explicitly told that it dealt with an idmapped mount. The transition to the value that needed to be stored in inode->i_{g,u}id appeared way too early and increased the probability of bugs in various codepaths. We know place the same value in struct iattr no matter if this is an idmapped mount or not. The vfs will only deal with type safe vfs{g,u}id_t. This makes it massively safer to perform permission checks as the type will tell us what checks we need to perform and what helpers we need to use. Fileystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to inode->i_{g,u}id since they are different types. Instead they need to use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the vfs{g,u}id into the filesystem. The other nice effect is that filesystems like overlayfs don't need to care about idmappings explicitly anymore and can simply set up struct iattr accordingly directly. Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1] Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-26quota: port quota helpers mount idsChristian Brauner1-2/+2
Port the is_quota_modification() and dqout_transfer() helper to type safe vfs{g,u}id_t. Since these helpers are only called by a few filesystems don't introduce a new helper but simply extend the existing helpers to pass down the mount's idmapping. Note, that this is a non-functional change, i.e. nothing will have happened here or at the end of this series to how quota are done! This a change necessary because we will at the end of this series make ownership changes easier to reason about by keeping the original value in struct iattr for both non-idmapped and idmapped mounts. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we are have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping. Since struct iattr uses an anonymous union with overlapping types as supported by the C standard, filesystems that haven't converted to ia_vfs{g,u}id won't see any difference and things will continue to work as before. In other words, no functional changes intended with this change. Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> CC: linux-fsdevel@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Seth Forshee <sforshee@digitalocean.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>