aboutsummaryrefslogtreecommitdiffstats
path: root/fs/exfat
AgeCommit message (Collapse)AuthorFilesLines
2026-06-15exfat: bound uniname advance in exfat_find_dir_entry()Bryam Vargas1-5/+8
In exfat_find_dir_entry(), each TYPE_EXTEND (file name) entry advances the output pointer by a fixed amount while the loop guard only tracks the accumulated name length: if (++order == 2) uniname = p_uniname->name; else uniname += EXFAT_FILE_NAME_LEN; len = exfat_extract_uni_name(ep, entry_uniname); name_len += len; unichar = *(uniname+len); *(uniname+len) = 0x0; uniname grows by EXFAT_FILE_NAME_LEN (15) per name entry, but name_len grows only by the actual extracted length, which is shorter when a name fragment contains an early NUL. The only guard is `name_len >= MAX_NAME_LENGTH`, so a crafted directory with many short name fragments lets uniname run far past the p_uniname->name[MAX_NAME_LENGTH + 3] buffer while name_len stays small, causing an out-of-bounds read and write at *(uniname+len). The sibling extractor exfat_get_uniname_from_ext_entry() already stops on a short fragment (the lockstep `len != EXFAT_FILE_NAME_LEN` guard added in commit d42334578eba ("exfat: check if filename entries exceeds max filename length")); exfat_find_dir_entry() never got the equivalent. Track the per-entry write offset as a count and reject a fragment once the offset, or the offset plus the extracted length, would exceed MAX_NAME_LENGTH, before forming the output pointer. Fixes: ca06197382bd ("exfat: add directory operations") Cc: stable@vger.kernel.org Suggested-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: add swap_activate supportJan Polensky3-0/+10
Commit 07d67f3e9083 ("exfat: add iomap buffered I/O support") converted exfat buffered I/O to iomap, but did not add a .swap_activate handler to the address_space_operations. swapon(2) on an exfat swapfile then fails with EINVAL, which causes LTP swap tests to fail. Add exfat_iomap_swap_activate() and hook it into exfat_aops so exfat uses iomap_swapfile_activate() for swapfile activation. Fixes: 614f71ca1bdf ("exfat: add iomap buffered I/O support") Closes: https://lore.kernel.org/all/20260603110212.3020276-1-japo@linux.ibm.com/ Signed-off-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: preserve benign secondary entries during rename and moveRochan Avlur3-28/+116
Commit 8258ef28001a ("exfat: handle unreconized benign secondary entries") added cluster freeing for benign secondary entries inside exfat_remove_entries(). However, exfat_remove_entries() is also called from the rename and move paths (exfat_rename_file and exfat_move_file), where the old entry set is being relocated rather than deleted. This causes benign secondary entries such as vendor extension entries to be silently destroyed on rename or cross-directory move, violating the exFAT spec requirement (section 8.2) that implementations preserve unrecognized benign secondary entries. Fix this by adding a free_benign parameter to exfat_remove_entries() so callers can suppress cluster freeing during relocation, and extending exfat_init_ext_entry() to copy trailing benign secondary entries from the old entry set into the new one internally. Also clean up the error paths to delete newly allocated entries on failure. Fixes: 8258ef28001a ("exfat: handle unreconized benign secondary entries") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/CAG7tbBV--waov7XVu2FHQEc6paR92dufS=em9DW5Kzsrpu3iQg@mail.gmail.com/ Signed-off-by: Rochan Avlur <rochan.avlur@gmail.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: serialize truncate against in-flight DIONamjae Jeon1-0/+6
exfat_setattr() did not call inode_dio_wait() before performing a size change, leaving a window where a concurrent in-flight DIO write could be operating on clusters that the truncate is about to free. Add inode_dio_wait() before the truncate_setsize()/exfat_truncate() sequence so that any in-flight DIO completes before cluster freeing begins. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: add support for SEEK_HOLE and SEEK_DATA in llseekNamjae Jeon2-5/+52
Adds exfat_file_llseek() that implements these whence values via the iomap layer (iomap_seek_hole() and iomap_seek_data()) using the existing exfat_read_iomap_ops. Unlike many other modern filesystems, exFAT does not support sparse files with unallocated clusters (holes). In exFAT, clusters are always fully allocated once they are written or preallocated. In addition, exFAT maintains a separate "Valid Data Length" (valid_size) that is distinct from the logical file size. This affects how holes are reported during seeking. In exfat_iomap_begin(), ranges where the offset is greater than or equal to ei->valid_size are mapped as IOMAP_UNWRITTEN, while ranges below valid_size are mapped as IOMAP_MAPPED. This mapping behavior is used by the iomap seek functions to correctly report SEEK_HOLE and SEEK_DATA positions. - Ranges with offset >= ei->valid_size are mapped as IOMAP_HOLE. - Ranges with offset < ei->valid_size are mapped as IOMAP_MAPPED. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: add iomap direct I/O supportNamjae Jeon6-220/+119
Add iomap-based direct I/O support to the exfat filesystem. This replaces the previous exfat_direct_IO() implementation that used blockdev_direct_IO() with iomap_dio_rw() interface. Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: add iomap buffered I/O supportNamjae Jeon7-130/+367
Add full buffered I/O support using the iomap framework to the exfat filesystem. This will replaces the old exfat_get_block(), exfat_write_begin(), exfat_write_end(), and exfat_block_truncate_page() with their iomap equivalents. Buffered writes now use iomap_file_buffered_write(), read uses iomap_bio_read_folio() and iomap_bio_readahead(), and writeback is handled through iomap_writepages(). Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: fix implicit declaration of brelse()Namjae Jeon1-0/+1
exfat_cluster_walk() calls brelse(bh) without including the header that declares the function, causing the following build error: fs/exfat/exfat_fs.h:542:9: error: implicit declaration of function ‘brelse’ [-Werror=implicit-function-declaration] Fix this by adding the missing buffer_head.h in exfat_fs.h. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: add data_start_bytes and exfat_cluster_to_phys_bytes() helperNamjae Jeon2-0/+9
This caches the data area start offset in bytes (data_start_bytes) and introduces a helper function exfat_cluster_to_phys_bytes() to compute the physical byte position of a given cluster. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: add support for multi-cluster allocationNamjae Jeon6-31/+26
Currently exfat_map_cluster() allocates and returns only one cluster at a time even when more clusters are needed. This causes multiple FAT walks and repeated allocation calls during large sequential writes or when using iomap for writes. This change exfat_map_cluster() and exfat_alloc_cluster() to be able to allocate multiple contiguous clusters. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: add exfat_file_open()Namjae Jeon1-0/+9
Add exfat_file_open() to handle file open operation for exFAT. This change is a preparation step before introducing iomap-based direct IO support. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: add balloc parameter to exfat_map_cluster() for iomap supportNamjae Jeon1-2/+5
In preparation for supporting the iomap infrastructure, we need to know whether a new cluster was allocated or not in exfat_map_cluster(). Add an optional 'bool *balloc' output parameter. When a new cluster is allocated, *balloc is set to true. Pass NULL from exfat_get_block() to preserve the existing behavior. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: replace unsafe macros with static inline functionsNamjae Jeon8-86/+147
The current exFAT driver relies on various macros for unit conversions between clusters, blocks, sectors, and directory entries. These macros are structurally unsafe as they lack type enforcement and are prone to potential integer overflows during bit-shift operations, especially on 64-bit architectures. Replace all arithmetic macros with static inline functions to provide strict type checking and explicit casting. Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: simplify exfat_lookup()Al Viro1-43/+13
1) d_splice_alias() handles ERR_PTR() for inode just fine 2) no need to even look for existing aliases in case of directory inodes; just punt to d_splice_alias(), it'll do the right thing 3) no need to bother with 'd_unhashed(alias)' case - d_find_alias() would've returned that only in case of a directory, and d_splice_alias() will handle that just fine on its own. 4) exfat_d_anon_disconn() is entirely pointless now - we only get to evaluating it in case dentry->d_parent == alias->d_parent and alias being a non-directory. But in that case IS_ROOT(alias) can't possibly be true - that would've reqiured alias == alias->d_parent, i.e alias == dentry->d_parent and dentry->d_parent is guaranteed to be a directory. So exfat_d_anon_disconn() would always return false when it's called, which makes && !exfat_d_anon_disconn(alias) a no-op. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: fix potential use-after-free in exfat_find_dir_entry()Michael Bommarito1-1/+3
In exfat_find_dir_entry(), the buffer_head obtained from exfat_get_dentry() is released with brelse(bh) before the fall-through TYPE_EXTEND branch reads the directory entry through ep (which points into bh->b_data): brelse(bh); if (entry_type == TYPE_EXTEND) { ... len = exfat_extract_uni_name(ep, entry_uniname); ... } After brelse() drops our reference, nothing guarantees that the underlying page backing bh->b_data remains valid for the subsequent exfat_extract_uni_name() read. This is the same pattern fixed in commit fc961522ddbd ("exfat: Fix potential use after free in exfat_load_upcase_table()"). Move brelse(bh) so it runs after ep is no longer dereferenced on each branch. Confirmed on QEMU x86_64 with CONFIG_KASAN=y + CONFIG_DEBUG_PAGEALLOC=y + CONFIG_PAGE_POISONING=y on linux-next, using a crafted exFAT image (long filename with same-hash collisions forcing the TYPE_EXTEND path). With a debug-only invalidate_bdev() inserted between brelse(bh) and the ep read to make the stale-deref window deterministic, the unpatched kernel faults: BUG: KASAN: use-after-free in exfat_find_dir_entry+0x133b/0x15a0 BUG: unable to handle page fault for address: ffff88801a5fa0c2 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI RIP: 0010:exfat_find_dir_entry+0x1188/0x15a0 With this patch applied, the same instrumented harness completes cleanly under the same sanitizer stack. I have not reproduced a crash on an uninstrumented kernel under ordinary reclaim; the instrumented A/B establishes the lifetime violation and that the patch closes it, not an unaided triggerability claim. Fixes: ca06197382bd ("exfat: add directory operations") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-06-15exfat: fix handling of damaged volume in exfat_create_upcase_table()David Timber1-6/+13
When the size of the upcase table is set to zero in the dentry for any reason(e.g. corrupted media or misbehaving device), an integer overflow causes the module to loop indefinitely. If the size of the upcase table is read zero, do not attempt to load the table. Instead, fallback to loading the default upcase table. If the size of the upcase table is zero or no upcase table is found, raise exfat_fs_error() to mark the volume read-only. Signed-off-by: David Timber <dxdt@dev.snart.me> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-05-11exfat: Implement fileattr_get for case sensitivityChuck Lever3-2/+19
Report exFAT's case sensitivity behavior via the FS_XFLAG_CASEFOLD flag. exFAT compares names through the volume's upcase table; in practice that table folds case, and case is preserved at rest. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Roland Mainz <roland.mainz@nrubsig.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://github.com/exfatprogs/exfatprogs/issues/313 Link: https://patch.msgid.link/20260507-case-sensitivity-v14-4-e62cc8200435@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-04-13Merge tag 'exfat-for-7.1-rc1' of ↵Linus Torvalds10-218/+265
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Implement FALLOC_FL_ALLOCATE_RANGE to add support for preallocating clusters without zeroing, helping to reduce file fragmentation - Add a unified block readahead helper for FAT chain conversion, bitmap allocation, and directory entry lookups - Optimize exfat_chain_cont_cluster() by caching buffer heads to minimize mark_buffer_dirty() and mirroring overhead during NO_FAT_CHAIN to FAT_CHAIN conversion - Switch to truncate_inode_pages_final() in evict_inode() to prevent BUG_ON caused by shadow entries during reclaim - Fix a 32-bit truncation bug in directory entry calculations by ensuring proper bitwise coercion - Fix sb->s_maxbytes calculation to correctly reflect the maximum possible volume size for a given cluster size, resolving xfstests generic/213 - Introduced exfat_cluster_walk() helper to traverse FAT chains by a specified step, handling both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN modes - Introduced exfat_chain_advance() helper to advance an exfat_chain structure, updating both the current cluster and remaining size - Remove dead assignments and fix Smatch warnings * tag 'exfat-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: use exfat_chain_advance helper exfat: introduce exfat_chain_advance helper exfat: remove NULL cache pointer case in exfat_ent_get exfat: use exfat_cluster_walk helper exfat: introduce exfat_cluster_walk helper exfat: fix incorrect directory checksum after rename to shorter name exfat: fix s_maxbytes exfat: fix passing zero to ERR_PTR() in exfat_mkdir() exfat: fix error handling for FAT table operations exfat: optimize exfat_chain_cont_cluster with cached buffer heads exfat: drop redundant sec parameter from exfat_mirror_bh exfat: use readahead helper in exfat_get_dentry exfat: use readahead helper in exfat_allocate_bitmap exfat: add block readahead in exfat_chain_cont_cluster exfat: add fallocate FALLOC_FL_ALLOCATE_RANGE support exfat: Fix bitwise operation having different size exfat: Drop dead assignment of num_clusters exfat: use truncate_inode_pages_final() at evict_inode()
2026-04-03exfat: use exfat_chain_advance helperChi Zhiling2-74/+25
Replace open-coded cluster chain walking logic with exfat_chain_advance() across exfat_readdir, exfat_find_dir_entry, exfat_count_dir_entries, exfat_search_empty_slot and exfat_check_dir_empty. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03exfat: introduce exfat_chain_advance helperChi Zhiling1-0/+21
Introduce exfat_chain_advance() to walk a exfat_chain structure by a given step, updating both ->dir and ->size fields atomically. This helper handles both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN modes with proper boundary checking. Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03exfat: remove NULL cache pointer case in exfat_ent_getChi Zhiling1-14/+9
Since exfat_get_next_cluster has been updated, no callers pass a NULL pointer to exfat_ent_get, so remove the handling logic for this case. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03exfat: use exfat_cluster_walk helperChi Zhiling2-45/+13
Replace the custom exfat_walk_fat_chain() function and open-coded FAT chain walking logic with the exfat_cluster_walk() helper across exfat_find_location, __exfat_get_dentry_set, and exfat_map_cluster. Suggested-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03exfat: introduce exfat_cluster_walk helperChi Zhiling1-1/+22
Introduce exfat_cluster_walk() to walk the FAT chain by a given step, handling both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN modes. Also redefine exfat_get_next_cluster as a thin wrapper around it for backward compatibility. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-04-03exfat: fix incorrect directory checksum after rename to shorter nameChi Zhiling1-0/+1
When renaming a file in-place to a shorter name, exfat_remove_entries marks excess entries as DELETED, but es->num_entries is not updated accordingly. As a result, exfat_update_dir_chksum iterates over the deleted entries and computes an incorrect checksum. This does not lead to persistent corruption because mark_inode_dirty() is called afterward, and __exfat_write_inode later recomputes the checksum using the correct num_entries value. Fix by setting es->num_entries = num_entries in exfat_init_ext_entry. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-31exfat: fix s_maxbytesDavid Timber3-3/+10
With fallocate support, xfstest unit generic/213 fails with QA output created by 213 We should get: fallocate: No space left on device Strangely, xfs_io sometimes says "Success" when something went wrong -fallocate: No space left on device +fallocate: File too large because sb->s_maxbytes is set to the volume size. To be in line with other non-extent-based filesystems, set to max volume size possible with the cluster size of the volume. Signed-off-by: David Timber <dxdt@dev.snart.me> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-26fs: Rename generic_file_fsync() to simple_fsync()Jan Kara1-1/+1
The implementation is now really basic so rename generic_file_fsync() simple_fsync() and __generic_file_fsync() to simple_fsync_noflush(). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-56-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-26exfat: Drop pointless invalidate_inode_buffers() callJan Kara1-1/+0
EXFAT never calls mark_buffer_dirty_inode() and thus invalidate_inode_buffers() never has anything to evict. Drop the pointless call. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-49-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-26exfat: fix passing zero to ERR_PTR() in exfat_mkdir()Yang Wen1-3/+4
Detected by Smatch. namei.c:890 exfat_mkdir() warn: passing zero to 'ERR_PTR' Signed-off-by: Yang Wen <anmuxixixi@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05exfat: fix error handling for FAT table operationsChi Zhiling5-10/+16
Fix three error handling issues in FAT table operations: 1. Fix exfat_update_bh() to properly return errors from sync_dirty_buffer 2. Fix exfat_end_bh() to properly return errors from exfat_update_bh() and exfat_mirror_bh() 3. Fix ignored return values from exfat_chain_cont_cluster() in inode.c and namei.c These fixes ensure that FAT table write errors are properly propagated to the caller instead of being silently ignored. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05exfat: optimize exfat_chain_cont_cluster with cached buffer headsChi Zhiling1-12/+37
When converting files from NO_FAT_CHAIN to FAT_CHAIN format, profiling reveals significant time spent in mark_buffer_dirty() and exfat_mirror_bh() operations. This overhead occurs because each FAT entry modification triggers a full block dirty marking and mirroring operation. For consecutive clusters that reside in the same block, optimize by caching the buffer head and performing dirty marking only once at the end of the block's modifications. Performance improvements for converting a 30GB file: | Cluster Size | Before Patch | After Patch | Speedup | |--------------|--------------|-------------|---------| | 512 bytes | 4.243s | 1.866s | 2.27x | | 4KB | 0.863s | 0.236s | 3.66x | | 32KB | 0.069s | 0.034s | 2.03x | | 256KB | 0.012s | 0.006s | 2.00x | Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05exfat: drop redundant sec parameter from exfat_mirror_bhChi Zhiling1-7/+4
The sector offset can be obtained from bh->b_blocknr, so drop the redundant sec parameter from exfat_mirror_bh(). Also clean up the function to use exfat_update_bh() helper. No functional changes. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05exfat: use readahead helper in exfat_get_dentryChi Zhiling1-38/+14
Replace the custom exfat_dir_readahead() function with the unified exfat_blk_readahead() helper in exfat_get_dentry(). This removes the duplicate readahead implementation and uses the common interface, also reducing code complexity. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05exfat: use readahead helper in exfat_allocate_bitmapChi Zhiling1-12/+6
Use the newly added exfat_blk_readahead() helper in exfat_allocate_bitmap() to simplify the code. This eliminates the duplicate inline readahead logic and uses the unified readahead interface. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-05exfat: add block readahead in exfat_chain_cont_clusterChi Zhiling2-2/+46
When a file cannot allocate contiguous clusters, exfat converts the file from NO_FAT_CHAIN to FAT_CHAIN format. For large files, this conversion process can take a significant amount of time. Add simple readahead to read all the FAT blocks in advance, as these blocks are consecutive, significantly improving the conversion performance. Test in an empty exfat filesystem: dd if=/dev/zero of=/mnt/file bs=1M count=30k dd if=/dev/zero of=/mnt/file2 bs=1M count=1 time cat /mnt/file2 >> /mnt/file | cluster size | before patch | after patch | | ------------ | ------------ | ----------- | | 512 | 47.667s | 4.316s | | 4k | 6.436s | 0.541s | | 32k | 0.758s | 0.071s | | 256k | 0.117s | 0.011s | Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-04exfat: add fallocate FALLOC_FL_ALLOCATE_RANGE supportDavid Timber1-0/+41
Currently, the Linux (ex)FAT drivers do not employ any cluster allocation strategy to keep fragmentation at bay. As a result, when multiple processes are competing for new clusters to expand files in exfat filesystem on Linux simultaneously, the files end up heavily fragmented. HDDs are most impacted, but this could also have some negative impact on various forms of flash memory depending on the type of underlying technology. For instance, modern digital cameras produce multiple media files for a single video stream. If the application does not take the fragmentation issue into account or the system is under memory pressure, the kernel end up allocating clusters in said files in a interleaved manner. Demo script: for (( i = 0; i < 4; i += 1 )); do dd if=/dev/urandom iflag=fullblock bs=1M count=64 of=frag-$i & done for (( i = 0; i < 4; i += 1 )); do wait done filefrag frag-* Result - Linux kernel native exfat, async mount: 780 extents found 740 extents found 809 extents found 712 extents found Result - Linux kernel native exfat, sync mount: 1852 extents found 1836 extents found 1846 extents found 1881 extents found Result - Windows XP: 3 extents found 3 extents found 3 extents found 2 extents found Windows kernel, on the other hand, regardless of the underlying storage interface or the medium, seems to space out clusters for each file. Similar strategy has to be employed by Linux fat filesystems for efficient utilisation of storage backend. In the meantime, userspace applications like rsync may use fallocate to combat this issue. This patch may introduce a regression-like behaviour to some niche filesystem-agnostic applications that use fallocate and proceed to non-sequentially write to the file. Examples: - libtorrent's use of posix_fallocate() and the first fragment from a peer is near the end of the file - "Download accelerators" that do partial content requests(HTTP 206) in multiple threads writing to the same file The delay incurred in such use cases is documented in WinAPI. Patches that add the ioctl equivalents to the WinAPI function SetFileValidData() and `fsutil file queryvaliddata ...` will follow. Signed-off-by: David Timber <dxdt@dev.snart.me> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-04exfat: Fix bitwise operation having different sizePhilipp Hahn1-1/+1
cpos has type loff_t (long long), while s_blocksize has type u32. The inversion wil happen on u32, the coercion to s64 happens afterwards and will do 0-left-paddding, resulting in the upper bits getting masked out. Cast s_blocksize to loff_t before negating it. Found by static code analysis using Klocwork. Signed-off-by: Philipp Hahn <phahn-oss@avm.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-03-04exfat: Drop dead assignment of num_clustersPhilipp Hahn1-1/+0
num_clusters is not used anywhere afterwards. Remove assignment. Found by static code analysis using Klocwork. Signed-off-by: Philipp Hahn <phahn-oss@avm.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-26exfat: use truncate_inode_pages_final() at evict_inode()Yang Wen1-1/+1
Currently, exfat uses truncate_inode_pages() in exfat_evict_inode(). However, truncate_inode_pages() does not mark the mapping as exiting, so reclaim may still install shadow entries for the mapping until the inode teardown completes. In older kernels like Linux 5.10, if shadow entries are present at that point,clear_inode() can hit BUG_ON(inode->i_data.nrexceptional); To align with VFS eviction semantics and prevent this situation, switch to truncate_inode_pages_final() in ->evict_inode(). Other filesystems were updated to use truncate_inode_pages_final() in ->evict_inode() by commit 91b0abe36a7b ("mm + fs: store shadow entries in page cache")'. Signed-off-by: Yang Wen <anmuxixixi@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds1-2/+1
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds1-1/+1
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook3-4/+4
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12exfat: add blank line after declarationsWilliam Hansen-Baird2-0/+2
Add a blank line after variable declarations in fatent.c and file.c. This improves readability and makes code style more consistent across the exfat subsystem. Signed-off-by: William Hansen-Baird <william.hansen.baird@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: remove unnecessary else after return statementWilliam Hansen-Baird1-2/+3
Else-branch is unnecessary after return statement in if-branch. Remove to enhance readability and reduce indentation. Signed-off-by: William Hansen-Baird <william.hansen.baird@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: support multi-cluster for exfat_get_clusterChi Zhiling3-8/+53
This patch introduces a count parameter to exfat_get_cluster, which serves as an input parameter for the caller to specify the desired number of clusters, and as an output parameter to store the length of consecutive clusters. This patch can improve read performance by reducing the number of get_block calls in sequential read scenarios. speacially in small cluster size. According to my test data, the performance improvement is approximately 10% when read FAT_CHAIN file with 512 bytes of cluster size. 454 MB/s -> 511 MB/s Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: return the start of next cache in exfat_cache_lookupChi Zhiling1-12/+37
Change exfat_cache_lookup to return the cluster number of the last cluster before the next cache (i.e., the end of the current cache range) or the given 'end' if there is no next cache. This allows the caller to know whether the next cluster after the current cache is cached. The function signature is changed to accept an 'end' parameter, which is the upper bound of the search range. The function now stops early if it finds a cache that starts within the current cache's tail, meaning caches are contiguous. The return value is the cluster number at which the next cache starts (minus one) or the original 'end' if no next cache is found. The new behavior is illustrated as follows: cache: [ccccccc-------ccccccccc] search: [..................] return: ^ Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: tweak cluster cache to support zero offsetChi Zhiling1-2/+2
The current cache mechanism does not support reading clusters starting from a file offset of zero. This patch enables that feature in preparation for subsequent reads of contiguous clusters from offset zero. 1. support finding clusters with zero offset. 2. allow clusters with zero offset to be cached. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: support multi-cluster for exfat_map_clusterChi Zhiling1-13/+17
This patch introduces a parameter 'count' to support fetching multiple clusters in exfat_map_cluster. The returned 'count' indicates the number of consecutive clusters, or 0 when the input cluster offset is past EOF. And the 'count' is also an input parameter for the caller to specify the required number of clusters. Only NO_FAT_CHAIN files enable multi-cluster fetching in this patch. After this patch, the time proportion of exfat_get_block has decreased, The performance data is as follows: Cluster size: 512 bytes Sequential read of a 30GB NO_FAT_CHAIN file: 2.4GB/s -> 2.5 GB/s proportion of exfat_get_block: 10.8% -> 0.02% Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: remove handling of non-file types in exfat_map_clusterChi Zhiling1-17/+1
Yuezhang said: "exfat_map_cluster() is only used for files. The code in this 'else' block is never executed and can be cleaned up." Suggested-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: reuse cache to improve exfat_get_clusterChi Zhiling1-1/+3
Since exfat_ent_get supports cache buffer head, we can use this option to reduce sb_bread calls when fetching consecutive entries. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: reduce the number of parameters for exfat_get_cluster()Chi Zhiling3-24/+11
Remove parameter 'fclus' and 'allow_eof': - The fclus parameter is changed to a local variable as it is not needed to be returned. - The passed allow_eof parameter was always 1, remove it and the associated error handling. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: remove the unreachable warning for cache miss casesChi Zhiling1-12/+1
The cache_id remains unchanged on a cache miss; its value is always exactly what was set by cache_init. Therefore, checking this value again is meaningless. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: remove the check for infinite cluster chain loopChi Zhiling1-10/+0
The infinite cluster chain loop check is not work because the loop will terminate when fclus reaches the parameter cluster, and the parameter cluster value is never greater than ei->valid_size. The following relationship holds: 'fclus' < 'cluster' ≤ ei->valid_size ≤ sb->num_clusters The check would only be triggered if a cluster number greater than sb->num_clusters is passed, but no caller currently does this. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: improve exfat_find_last_clusterChi Zhiling1-1/+3
Since exfat_ent_get support cache buffer head, let's apply it to exfat_find_last_cluster. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: improve exfat_count_num_clustersChi Zhiling1-1/+3
Since exfat_ent_get support cache buffer head, let's apply it to exfat_count_num_clusters. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: support reuse buffer head for exfat_ent_getChi Zhiling3-18/+27
This patch is part 2 of cached buffer head for exfat_ent_get, it introduces an argument for exfat_ent_get, and make sure this routine releases buffer head refcount when any error return. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: add cache option for __exfat_ent_getChi Zhiling1-7/+13
When multiple entries are obtained consecutively, these entries are mostly stored adjacent to each other. this patch introduces a "last" parameter to cache the last opened buffer head, and reuse it when possible, which reduces the number of sb_bread() calls. When the passed parameter "last" is NULL, it means cache option is disabled, the behavior unchanged as it was. Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: reduce unnecessary writes during mmap writeYuling Dong1-9/+6
During mmap write, exfat_page_mkwrite() currently extends valid_size to the end of the VMA range. For a large mapping, this can push valid_size far beyond the page that actually triggered the fault, resulting in unnecessary writes. valid_size only needs to extend to the end of the page being written. Signed-off-by: Yuling Dong <yuling-dong@qq.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-02-12exfat: improve error code handling in exfat_find_empty_entry()Haotian Zhang1-2/+2
Change the type of 'ret' from unsigned int to int in exfat_find_empty_entry(). Although the implicit type conversion (int -> unsigned int -> int) does not cause actual bugs in practice, using int directly is more appropriate for storing error codes returned by exfat_alloc_cluster(). This improves code clarity and consistency with standard error handling practices. Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2026-01-12exfat: add setlease file operationJeff Layton2-0/+4
Add the setlease file_operation to exfat_file_operations and exfat_dir_operations, pointing to generic_setlease. A future patch will change the default behavior to reject lease attempts with -EINVAL when there is no setlease file operation defined. Add generic_setlease to retain the ability to set leases on this filesystem. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-7-ea4dec9b67fa@kernel.org Acked-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-03exfat: fix remount failure in different process environmentsYuezhang Mo1-4/+15
The kernel test robot reported that the exFAT remount operation failed. The reason for the failure was that the process's umask is different between mount and remount, causing fs_fmask and fs_dmask are changed. Potentially, both gid and uid may also be changed. Therefore, when initializing fs_context for remount, inherit these mount options from the options used during mount. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202511251637.81670f5c-lkp@intel.com Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03exfat: fix divide-by-zero in exfat_allocate_bitmapNamjae Jeon1-1/+1
The variable max_ra_count can be 0 in exfat_allocate_bitmap(), which causes a divide-by-zero error in the subsequent modulo operation (i % max_ra_count), leading to a system crash. When max_ra_count is 0, it means that readahead is not used. This patch load the bitmap without readahead. Fixes: 9fd688678dd8 ("exfat: optimize allocation bitmap loading time") Reported-by: Jiaming Zhang <r772577952@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03exfat: validate the cluster bitmap bits of directoryNamjae Jeon5-9/+46
Syzbot created this issue by testing an image that did not have the root cluster bitmap bit marked. After accessing a file through the root directory via exfat_lookup, when creating a file again with mkdir, the root cluster bit can be allocated for direcotry, which can cause the root cluster to be zeroed out and the same entry can be allocated in the same cluster. This patch improved this issue by adding exfat_test_bitmap to validate the cluster bits of the root directory and directory. And the first cluster bit of the root directory should never be unset except when storage is corrupted. This bit is set to allow operations after mount. Reported-by: syzbot+5216036fc59c43d1ee02@syzkaller.appspotmail.com Tested-by: syzbot+5216036fc59c43d1ee02@syzkaller.appspotmail.com Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03exfat: zero out post-EOF page cache on file extensionYuezhang Mo1-0/+5
xfstests generic/363 was failing due to unzeroed post-EOF page cache that allowed mmap writes beyond EOF to become visible after file extension. For example, in following xfs_io sequence, 0x22 should not be written to the file but would become visible after the extension: xfs_io -f -t -c "pwrite -S 0x11 0 8" \ -c "mmap 0 4096" \ -c "mwrite -S 0x22 32 32" \ -c "munmap" \ -c "pwrite -S 0x33 512 32" \ $testfile This violates the expected behavior where writes beyond EOF via mmap should not persist after the file is extended. Instead, the extended region should contain zeros. Fix this by using truncate_pagecache() to truncate the page cache after the current EOF when extending the file. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-12-03exfat: fix refcount leak in exfat_findShuhao Fu1-10/+10
Fix refcount leaks in `exfat_find` related to `exfat_get_dentry_set`. Function `exfat_get_dentry_set` would increase the reference counter of `es->bh` on success. Therefore, `exfat_put_dentry_set` must be called after `exfat_get_dentry_set` to ensure refcount consistency. This patch relocate two checks to avoid possible leaks. Fixes: 82ebecdc74ff ("exfat: fix improper check of dentry.stream.valid_size") Fixes: 13940cef9549 ("exfat: add a check for invalid data size") Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-11-17Merge tag 'vfs-6.18-rc7.fixes' of ↵Linus Torvalds1-1/+4
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix unitialized variable in statmount_string() - Fix hostfs mounting when passing host root during boot - Fix dynamic lookup to fail on cell lookup failure - Fix missing file type when reading bfs inodes from disk - Enforce checking of sb_min_blocksize() calls and update all callers accordingly - Restore write access before closing files opened by open_exec() in binfmt_misc - Always freeze efivarfs during suspend/hibernate cycles - Fix statmount()'s and listmount()'s grab_requested_mnt_ns() helper to actually allow mount namespace file descriptor in addition to mount namespace ids - Fix tmpfs remount when noswap is specified - Switch Landlock to iput_not_last() to remove false-positives from might_sleep() annotations in iput() - Remove dead node_to_mnt_ns() code - Ensure that per-queue kobjects are successfully created * tag 'vfs-6.18-rc7.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: landlock: fix splats from iput() after it started calling might_sleep() fs: add iput_not_last() shmem: fix tmpfs reconfiguration (remount) when noswap is set fs/namespace: correctly handle errors returned by grab_requested_mnt_ns power: always freeze efivarfs binfmt_misc: restore write access before closing files opened by open_exec() block: add __must_check attribute to sb_min_blocksize() virtio-fs: fix incorrect check for fsvq->kobj xfs: check the return value of sb_min_blocksize() in xfs_fs_fill_super isofs: check the return value of sb_min_blocksize() in isofs_fill_super exfat: check return value of sb_min_blocksize in exfat_read_boot_sector vfat: fix missing sb_min_blocksize() return value checks mnt: Remove dead code which might prevent from building bfs: Reconstruct file type when loading from disk afs: Fix dynamic lookup to fail on cell lookup failure hostfs: Fix only passing host root in boot stage with new mount fs: Fix uninitialized 'offp' in statmount_string()
2025-11-05exfat: check return value of sb_min_blocksize in exfat_read_boot_sectorYongpeng Yang1-1/+4
sb_min_blocksize() may return 0. Check its return value to avoid accessing the filesystem super block when sb->s_blocksize is 0. Cc: stable@vger.kernel.org # v6.15 Fixes: 719c1e1829166d ("exfat: add super block operations") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Link: https://patch.msgid.link/20251104125009.2111925-3-yangyongpeng.storage@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-15exfat: fix out-of-bounds in exfat_nls_to_ucs2()Jeongjun Park4-8/+5
Since the len argument value passed to exfat_ioctl_set_volume_label() from exfat_nls_to_utf16() is passed 1 too large, an out-of-bounds read occurs when dereferencing p_cstring in exfat_nls_to_ucs2() later. And because of the NLS_NAME_OVERLEN macro, another error occurs when creating a file with a period at the end using utf8 and other iocharsets. So to avoid this, you should remove the code that uses NLS_NAME_OVERLEN macro and make the len argument value be the length of the label string, but with a maximum length of FSLABEL_MAX - 1. Reported-by: syzbot+98cc76a76de46b3714d4@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=98cc76a76de46b3714d4 Fixes: d01579d590f7 ("exfat: Add support for FS_IOC_{GET,SET}FSLABEL") Suggested-by: Pali Rohár <pali@kernel.org> Signed-off-by: Jeongjun Park <aha310510@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-10-15exfat: fix improper check of dentry.stream.valid_sizeJaehun Gou1-1/+5
We found an infinite loop bug in the exFAT file system that can lead to a Denial-of-Service (DoS) condition. When a dentry in an exFAT filesystem is malformed, the following system calls — SYS_openat, SYS_ftruncate, and SYS_pwrite64 — can cause the kernel to hang. Root cause analysis shows that the size validation code in exfat_find() does not check whether dentry.stream.valid_size is negative. As a result, the system calls mentioned above can succeed and eventually trigger the DoS issue. This patch adds a check for negative dentry.stream.valid_size to prevent this vulnerability. Co-developed-by: Seunghun Han <kkamagui@gmail.com> Signed-off-by: Seunghun Han <kkamagui@gmail.com> Co-developed-by: Jihoon Kwon <jimmyxyz010315@gmail.com> Signed-off-by: Jihoon Kwon <jimmyxyz010315@gmail.com> Signed-off-by: Jaehun Gou <p22gone@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-10-03Merge tag 'exfat-for-6.18-rc1' of ↵Linus Torvalds10-35/+360
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Add support for FS_IOC_{GET,SET}FSLABEL ioctl - Two small clean-up patches - Optimizes allocation bitmap loading time on large partitions with small cluster sizes - Allow changes for discard, zero_size_dir, and errors options via remount - Validate that the clusters used for the allocation bitmap are correctly marked as in-use during mount, preventing potential data corruption from reallocating the bitmap's own space - Uses ratelimit to avoid too many error prints on I/O error path * tag 'exfat-for-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: Add support for FS_IOC_{GET,SET}FSLABEL exfat: combine iocharset and utf8 option setup exfat: support modifying mount options via remount exfat: optimize allocation bitmap loading time exfat: Remove unnecessary parentheses exfat: drop redundant conversion to bool exfat: validate cluster allocation bits of the allocation bitmap exfat: limit log print for IO error
2025-09-30exfat: Add support for FS_IOC_{GET,SET}FSLABELEthan Ferguson5-1/+226
Add support for reading / writing to the exfat volume label from the FS_IOC_GETFSLABEL and FS_IOC_SETFSLABEL ioctls Co-developed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Ethan Ferguson <ethan.ferguson@zetier.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-09-30exfat: combine iocharset and utf8 option setupSang-Heon Jeon1-9/+15
Currently, exfat utf8 mount option depends on the iocharset option value. After exfat remount, utf8 option may become inconsistent with iocharset option. If the options are inconsistent; (specifically, iocharset=utf8 but utf8=0) readdir may reference uninitalized NLS, leading to a null pointer dereference. Extract and combine utf8/iocharset setup logic into exfat_set_iocharset(). Then Replace iocharset setup logic to exfat_set_iocharset to prevent utf8/iocharset option inconsistentcy after remount. Reported-by: syzbot+3e9cb93e3c5f90d28e19@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=3e9cb93e3c5f90d28e19 Signed-off-by: Sang-Heon Jeon <ekffu200098@gmail.com> Fixes: acab02ffcd6b ("exfat: support modifying mount options via remount") Tested-by: syzbot+3e9cb93e3c5f90d28e19@syzkaller.appspotmail.com Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-09-30exfat: support modifying mount options via remountYuezhang Mo1-6/+38
Before this commit, all exfat-defined mount options could not be modified dynamically via remount, and no error was returned. After this commit, these three exfat-defined mount options (discard, zero_size_dir, and errors) can be modified dynamically via remount. While other exfat-defined mount options cannot be modified dynamically via remount because their old settings are cached in inodes or dentries, modifying them will be rejected with an error. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-09-30exfat: optimize allocation bitmap loading timeNamjae Jeon1-0/+13
Loading the allocation bitmap is very slow if user set the small cluster size on large partition. For optimizing it, This patch uses sb_breadahead() read the allocation bitmap. It will improve the mount time. The following is the result of about 4TB partition(2KB cluster size) on my target. without patch: real 0m41.746s user 0m0.011s sys 0m0.000s with patch: real 0m2.525s user 0m0.008s sys 0m0.008s Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-09-30exfat: Remove unnecessary parenthesesLiao Yuanhong1-1/+1
When using &, it's unnecessary to have parentheses afterward. Remove redundant parentheses to enhance readability. Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-09-30exfat: drop redundant conversion to boolXichao Zhao1-1/+1
The result of integer comparison already evaluates to bool. No need for explicit conversion. No functional impact. Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-09-30exfat: validate cluster allocation bits of the allocation bitmapNamjae Jeon1-12/+60
syzbot created an exfat image with cluster bits not set for the allocation bitmap. exfat-fs reads and uses the allocation bitmap without checking this. The problem is that if the start cluster of the allocation bitmap is 6, cluster 6 can be allocated when creating a directory with mkdir. exfat zeros out this cluster in exfat_mkdir, which can delete existing entries. This can reallocate the allocated entries. In addition, the allocation bitmap is also zeroed out, so cluster 6 can be reallocated. This patch adds exfat_test_bitmap_range to validate that clusters used for the allocation bitmap are correctly marked as in-use. Reported-by: syzbot+a725ab460fc1def9896f@syzkaller.appspotmail.com Tested-by: syzbot+a725ab460fc1def9896f@syzkaller.appspotmail.com Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-09-30exfat: limit log print for IO errorChi Zhiling1-5/+6
For exFAT filesystems with 4MB read_ahead_size, removing the storage device when the read operation is in progress, which cause the last read syscall spent 150s [1]. The main reason is that exFAT generates excessive log messages [2]. After applying this patch, approximately 300,000 lines of log messages were suppressed, and the delay of the last read() syscall was reduced to about 4 seconds. [1]: write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000120> read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000032> write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 131072 <0.000119> read(4, 0x7fccf28ae000, 131072) = -1 EIO (Input/output error) <150.186215> [2]: [ 333.696603] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5) [ 333.697378] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5) [ 333.698156] exFAT-fs (vdb): error, failed to access to FAT (entry 0x0000d780, err:-5) Signed-off-by: Chi Zhiling <chizhiling@kylinos.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-09-15exfat_find(): constify qstr argumentAl Viro1-1/+1
Nothing outside of fs/dcache.c has any business modifying dentry names; passing &dentry->d_name as an argument should have that argument declared as a const pointer. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-08-01exfat: add cluster chain loop check for dirYuezhang Mo4-11/+48
An infinite loop may occur if the following conditions occur due to file system corruption. (1) Condition for exfat_count_dir_entries() to loop infinitely. - The cluster chain includes a loop. - There is no UNUSED entry in the cluster chain. (2) Condition for exfat_create_upcase_table() to loop infinitely. - The cluster chain of the root directory includes a loop. - There are no UNUSED entry and up-case table entry in the cluster chain of the root directory. (3) Condition for exfat_load_bitmap() to loop infinitely. - The cluster chain of the root directory includes a loop. - There are no UNUSED entry and bitmap entry in the cluster chain of the root directory. (4) Condition for exfat_find_dir_entry() to loop infinitely. - The cluster chain includes a loop. - The unused directory entries were exhausted by some operation. (5) Condition for exfat_check_dir_empty() to loop infinitely. - The cluster chain includes a loop. - The unused directory entries were exhausted by some operation. - All files and sub-directories under the directory are deleted. This commit adds checks to break the above infinite loop. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-08-01exfat: fdatasync flag should be same like generic_write_sync()Zhengxu Zhang1-3/+2
Test: androbench by default setting, use 64GB sdcard. the random write speed: without this patch 3.5MB/s with this patch 7MB/s After patch "11a347fb6cef", the random write speed decreased significantly. the .write_iter() interface had been modified, and check the differences with generic_file_write_iter(), when calling generic_write_sync() and exfat_file_write_iter() to call vfs_fsync_range(), the fdatasync flag is wrong, and make not use the fdatasync mode, and make random write speed decreased. So use generic_write_sync() instead of vfs_fsync_range(). Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Signed-off-by: Zhengxu Zhang <zhengxu.zhang@unisoc.com> Acked-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-07-28Merge tag 'vfs-6.17-rc1.mmap_prepare' of ↵Linus Torvalds1-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull mmap_prepare updates from Christian Brauner: "Last cycle we introduce f_op->mmap_prepare() in c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"). This is preferred to the existing f_op->mmap() hook as it does require a VMA to be established yet, thus allowing the mmap logic to invoke this hook far, far earlier, prior to inserting a VMA into the virtual address space, or performing any other heavy handed operations. This allows for much simpler unwinding on error, and for there to be a single attempt at merging a VMA rather than having to possibly reattempt a merge based on potentially altered VMA state. Far more importantly, it prevents inappropriate manipulation of incompletely initialised VMA state, which is something that has been the cause of bugs and complexity in the past. The intent is to gradually deprecate f_op->mmap, and in that vein this series coverts the majority of file systems to using f_op->mmap_prepare. Prerequisite steps are taken - firstly ensuring all checks for mmap capabilities use the file_has_valid_mmap_hooks() helper rather than directly checking for f_op->mmap (which is now not a valid check) and secondly updating daxdev_mapping_supported() to not require a VMA parameter to allow ext4 and xfs to be converted. Commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") handles the nasty edge-case of nested file systems like overlayfs, which introduces a compatibility shim to allow f_op->mmap_prepare() to be invoked from an f_op->mmap() callback. This allows for nested filesystems to continue to function correctly with all file systems regardless of which callback is used. Once we finally convert all file systems, this shim can be removed. As a result, ecryptfs, fuse, and overlayfs remain unaltered so they can nest all other file systems. We additionally do not update resctl - as this requires an update to remap_pfn_range() (or an alternative to it) which we defer to a later series, equally we do not update cramfs which needs a mixed mapping insertion with the same issue, nor do we update procfs, hugetlbfs, syfs or kernfs all of which require VMAs for internal state and hooks. We shall return to all of these later" * tag 'vfs-6.17-rc1.mmap_prepare' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: update porting, vfs documentation to describe mmap_prepare() fs: replace mmap hook with .mmap_prepare for simple mappings fs: convert most other generic_file_*mmap() users to .mmap_prepare() fs: convert simple use of generic_file_*_mmap() to .mmap_prepare() mm/filemap: introduce generic_file_*_mmap_prepare() helpers fs/xfs: transition from deprecated .mmap hook to .mmap_prepare fs/ext4: transition from deprecated .mmap hook to .mmap_prepare fs/dax: make it possible to check dev dax support without a VMA fs: consistently use can_mmap_file() helper mm/nommu: use file_has_valid_mmap_hooks() helper mm: rename call_mmap/mmap_prepare to vfs_mmap/mmap_prepare
2025-07-28Merge tag 'vfs-6.17-rc1.misc' of ↵Linus Torvalds2-13/+14
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc VFS updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Add ext4 IOCB_DONTCACHE support This refactors the address_space_operations write_begin() and write_end() callbacks to take const struct kiocb * as their first argument, allowing IOCB flags such as IOCB_DONTCACHE to propagate to the filesystem's buffered I/O path. Ext4 is updated to implement handling of the IOCB_DONTCACHE flag and advertises support via the FOP_DONTCACHE file operation flag. Additionally, the i915 driver's shmem write paths are updated to bypass the legacy write_begin/write_end interface in favor of directly calling write_iter() with a constructed synchronous kiocb. Another i915 change replaces a manual write loop with kernel_write() during GEM shmem object creation. Cleanups: - don't duplicate vfs_open() in kernel_file_open() - proc_fd_getattr(): don't bother with S_ISDIR() check - fs/ecryptfs: replace snprintf with sysfs_emit in show function - vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() - filelock: add new locks_wake_up_waiter() helper - fs: Remove three arguments from block_write_end() - VFS: change old_dir and new_dir in struct renamedata to dentrys - netfs: Remove unused declaration netfs_queue_write_request() Fixes: - eventpoll: Fix semi-unbounded recursion - eventpoll: fix sphinx documentation build warning - fs/read_write: Fix spelling typo - fs: annotate data race between poll_schedule_timeout() and pollwake() - fs/pipe: set FMODE_NOWAIT in create_pipe_files() - docs/vfs: update references to i_mutex to i_rwsem - fs/buffer: remove comment about hard sectorsize - fs/buffer: remove the min and max limit checks in __getblk_slow() - fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable - fs_context: fix parameter name in infofc() macro - fs: Prevent file descriptor table allocations exceeding INT_MAX" * tag 'vfs-6.17-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (24 commits) netfs: Remove unused declaration netfs_queue_write_request() eventpoll: fix sphinx documentation build warning ext4: support uncached buffered I/O mm/pagemap: add write_begin_get_folio() helper function fs: change write_begin/write_end interface to take struct kiocb * drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter drm/i915: Use kernel_write() in shmem object create eventpoll: Fix semi-unbounded recursion vfs: Remove unnecessary list_for_each_entry_safe() from evict_inodes() fs/libfs: don't assume blocksize <= PAGE_SIZE in generic_check_addressable fs/buffer: remove the min and max limit checks in __getblk_slow() fs: Prevent file descriptor table allocations exceeding INT_MAX fs: Remove three arguments from block_write_end() fs/ecryptfs: replace snprintf with sysfs_emit in show function fs: annotate suspected data race between poll_schedule_timeout() and pollwake() docs/vfs: update references to i_mutex to i_rwsem fs/buffer: remove comment about hard sectorsize fs_context: fix parameter name in infofc() macro VFS: change old_dir and new_dir in struct renamedata to dentrys proc_fd_getattr(): don't bother with S_ISDIR() check ...
2025-07-16fs: change write_begin/write_end interface to take struct kiocb *Taotao Chen2-13/+14
Change the address_space_operations callbacks write_begin() and write_end() to take struct kiocb * as the first argument instead of struct file *. Update all affected function prototypes, implementations, call sites, and related documentation across VFS, filesystems, and block layer. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-19fs: replace mmap hook with .mmap_prepare for simple mappingsLorenzo Stoakes1-4/+6
Since commit c84bf6dd2b83 ("mm: introduce new .mmap_prepare() file callback"), the f_op->mmap() hook has been deprecated in favour of f_op->mmap_prepare(). This callback is invoked in the mmap() logic far earlier, so error handling can be performed more safely without complicated and bug-prone state unwinding required should an error arise. This hook also avoids passing a pointer to a not-yet-correctly-established VMA avoiding any issues with referencing this data structure. It rather provides a pointer to the new struct vm_area_desc descriptor type which contains all required state and allows easy setting of required parameters without any consideration needing to be paid to locking or reference counts. Note that nested filesystems like overlayfs are compatible with an .mmap_prepare() callback since commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems"). In this patch we apply this change to file systems with relatively simple mmap() hook logic - exfat, ceph, f2fs, bcachefs, zonefs, btrfs, ocfs2, orangefs, nilfs2, romfs, ramfs and aio. Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/f528ac4f35b9378931bd800920fee53fc0c5c74d.1750099179.git.lorenzo.stoakes@oracle.com Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-06-10new helper: set_default_d_op()Al Viro1-2/+2
... to be used instead of manually assigning to ->s_d_op. All in-tree filesystem converted (and field itself is renamed, so any out-of-tree ones in need of conversion will be caught by compiler). Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-05-26exfat: do not clear volume dirty flag during syncYuezhang Mo1-23/+7
xfstests generic/482 tests the file system consistency after each FUA operation. It fails when run on exfat. exFAT clears the volume dirty flag with a FUA operation during sync. Since s_lock is not held when data is being written to a file, sync can be executed at the same time. When data is being written to a file, the FAT chain is updated first, and then the file size is updated. If sync is executed between updating them, the length of the FAT chain may be inconsistent with the file size. To avoid the situation where the file system is inconsistent but the volume dirty flag is cleared, this commit moves the clearing of the volume dirty flag from exfat_fs_sync() to exfat_put_super(), so that the volume dirty flag is not cleared until unmounting. After the move, there is no additional action during sync, so exfat_fs_sync() can be deleted. Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-05-26exfat: fix double free in delayed_freeNamjae Jeon1-0/+1
The double free could happen in the following path. exfat_create_upcase_table() exfat_create_upcase_table() : return error exfat_free_upcase_table() : free ->vol_utbl exfat_load_default_upcase_table : return error exfat_kill_sb() delayed_free() exfat_free_upcase_table() <--------- double free This patch set ->vol_util as NULL after freeing it. Reported-by: Jianzhou Zhao <xnxc22xnxc22@qq.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-29exfat: call bh_read in get_block only when necessarySungjong Seo1-81/+76
With commit 11a347fb6cef ("exfat: change to get file size from DataLength"), exfat_get_block() can now handle valid_size. However, most partial unwritten blocks that could be mapped with other blocks are being inefficiently processed separately as individual blocks. Except for partial unwritten blocks that require independent processing, let's handle them simply as before. Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-29exfat: fix potential wrong error return from get_blockSungjong Seo1-0/+2
If there is no error, get_block() should return 0. However, when bh_read() returns 1, get_block() also returns 1 in the same manner. Let's set err to 0, if there is no error from bh_read() Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Cc: stable@vger.kernel.org Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-27exfat: fix missing shutdown checkYuezhang Mo1-2/+27
xfstests generic/730 test failed because after deleting the device that still had dirty data, the file could still be read without returning an error. The reason is the missing shutdown check in ->read_iter. I also noticed that shutdown checks were missing from ->write_iter, ->splice_read, and ->mmap. This commit adds shutdown checks to all of them. Fixes: f761fcdd289d ("exfat: Implement sops->shutdown and ioctl") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-27exfat: fix the infinite loop in exfat_find_last_cluster()Yuezhang Mo1-1/+1
In exfat_find_last_cluster(), the cluster chain is traversed until the EOF cluster. If the cluster chain includes a loop due to file system corruption, the EOF cluster cannot be traversed, resulting in an infinite loop. If the number of clusters indicated by the file size is inconsistent with the cluster chain length, exfat_find_last_cluster() will return an error, so if this inconsistency is found, the traversal can be aborted without traversing to the EOF cluster. Reported-by: syzbot+f7d147e6db52b1e09dba@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f7d147e6db52b1e09dba Tested-by: syzbot+f7d147e6db52b1e09dba@syzkaller.appspotmail.com Fixes: 31023864e67a ("exfat: add fat entry operations") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-27exfat: fix random stack corruption after get_blockSungjong Seo1-6/+33
When get_block is called with a buffer_head allocated on the stack, such as do_mpage_readpage, stack corruption due to buffer_head UAF may occur in the following race condition situation. <CPU 0> <CPU 1> mpage_read_folio <<bh on stack>> do_mpage_readpage exfat_get_block bh_read __bh_read get_bh(bh) submit_bh wait_on_buffer ... end_buffer_read_sync __end_buffer_read_notouch unlock_buffer <<keep going>> ... ... ... ... <<bh is not valid out of mpage_read_folio>> . . another_function <<variable A on stack>> put_bh(bh) atomic_dec(bh->b_count) * stack corruption here * This patch returns -EAGAIN if a folio does not have buffers when bh_read needs to be called. By doing this, the caller can fallback to functions like block_read_full_folio(), create a buffer_head in the folio, and then call get_block again. Let's do not call bh_read() with on-stack buffer_head. Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Cc: stable@vger.kernel.org Tested-by: Yeongjin Gil <youngjin.gil@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-27exfat: remove count used cluster from exfat_statfs()Yuezhang Mo2-12/+0
The callback function statfs() is called only after the file system is mounted. During the process of mounting the exFAT file system, the number of used clusters has been counted, so the condition "sbi->used_clusters == EXFAT_CLUSTERS_UNTRACKED" is always false and should be deleted. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-27exfat: support batch discard of clusters when freeing clustersYuezhang Mo2-14/+29
If the discard mount option is enabled, the file's clusters are discarded when the clusters are freed. Discarding clusters one by one will significantly reduce performance. Poor performance may cause soft lockup when lots of clusters are freed. This commit improves performance by discarding contiguous clusters in batches. Measure the performance by: # truncate -s 80G /mnt/file # time rm /mnt/file Without this commit: real 4m46.183s user 0m0.000s sys 0m12.863s With this commit: real 0m1.661s user 0m0.000s sys 0m0.017s Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-24Merge tag 'vfs-6.15-rc1.async.dir' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs async dir updates from Christian Brauner: "This contains cleanups that fell out of the work from async directory handling: - Change kern_path_locked() and user_path_locked_at() to never return a negative dentry. This simplifies the usability of these helpers in various places - Drop d_exact_alias() from the remaining place in NFS where it is still used. This also allows us to drop the d_exact_alias() helper completely - Drop an unnecessary call to fh_update() from nfsd_create_locked() - Change i_op->mkdir() to return a struct dentry Change vfs_mkdir() to return a dentry provided by the filesystems which is hashed and positive. This allows us to reduce the number of cases where the resulting dentry is not positive to very few cases. The code in these places becomes simpler and easier to understand. - Repack DENTRY_* and LOOKUP_* flags" * tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: doc: fix inline emphasis warning VFS: Change vfs_mkdir() to return the dentry. nfs: change mkdir inode_operation to return alternate dentry if needed. fuse: return correct dentry for ->mkdir ceph: return the correct dentry on mkdir hostfs: store inode in dentry after mkdir if possible. Change inode_operations.mkdir to return struct dentry * nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked() nfs/vfs: discard d_exact_alias() VFS: add common error checks to lookup_one_qstr_excl() VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry VFS: repack LOOKUP_ bit flags. VFS: repack DENTRY_ flags.
2025-03-05exfat: add a check for invalid data sizeYuezhang Mo1-0/+5
Add a check for invalid data size to avoid corrupted filesystem from being further corrupted. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-05exfat: short-circuit zero-byte writes in exfat_file_write_iterEric Sandeen1-1/+1
When generic_write_checks() returns zero, it means that iov_iter_count() is zero, and there is no work to do. Simply return success like all other filesystems do, rather than proceeding down the write path, which today yields an -EFAULT in generic_perform_write() via the (fault_in_iov_iter_readable(i, bytes) == bytes) check when bytes == 0. Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Reported-by: Noah <kernel-org-10@maxgrass.eu> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-05exfat: fix soft lockup in exfat_clear_bitmapNamjae Jeon3-7/+16
bitmap clear loop will take long time in __exfat_free_cluster() if data size of file/dir enty is invalid. If cluster bit in bitmap is already clear, stop clearing bitmap go to out of loop. Fixes: 31023864e67a ("exfat: add fat entry operations") Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-03-05exfat: fix just enough dentries but allocate a new cluster to dirYuezhang Mo1-1/+1
This commit fixes the condition for allocating cluster to parent directory to avoid allocating new cluster to parent directory when there are just enough empty directory entries at the end of the parent directory. Fixes: af02c72d0b62 ("exfat: convert exfat_find_empty_entry() to use dentry cache") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2025-02-27Change inode_operations.mkdir to return struct dentry *NeilBrown1-4/+4
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have complete control of sequencing on the actual filesystem (e.g. on a different server) and may find that the inode created for a mkdir request already exists in the icache and dcache by the time the mkdir request returns. For example, if the filesystem is mounted twice the directory could be visible on the other mount before it is on the original mount, and a pair of name_to_handle_at(), open_by_handle_at() calls could instantiate the directory inode with an IS_ROOT() dentry before the first mkdir returns. This means that the dentry passed to ->mkdir() may not be the one that is associated with the inode after the ->mkdir() completes. Some callers need to interact with the inode after the ->mkdir completes and they currently need to perform a lookup in the (rare) case that the dentry is no longer hashed. This lookup-after-mkdir requires that the directory remains locked to avoid races. Planned future patches to lock the dentry rather than the directory will mean that this lookup cannot be performed atomically with the mkdir. To remove this barrier, this patch changes ->mkdir to return the resulting dentry if it is different from the one passed in. Possible returns are: NULL - the directory was created and no other dentry was used ERR_PTR() - an error occurred non-NULL - this other dentry was spliced in This patch only changes file-systems to return "ERR_PTR(err)" instead of "err" or equivalent transformations. Subsequent patches will make further changes to some file-systems to return a correct dentry. Not all filesystems reliably result in a positive hashed dentry: - NFS, cifs, hostfs will sometimes need to perform a lookup of the name to get inode information. Races could result in this returning something different. Note that this lookup is non-atomic which is what we are trying to avoid. Placing the lookup in filesystem code means it only happens when the filesystem has no other option. - kernfs and tracefs leave the dentry negative and the ->revalidate operation ensures that lookup will be called to correctly populate the dentry. This could be fixed but I don't think it is important to any of the users of vfs_mkdir() which look at the dentry. The recommendation to use d_drop();d_splice_alias() is ugly but fits with current practice. A planned future patch will change this. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: NeilBrown <neilb@suse.de> Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-30Merge tag 'pull-revalidate' of ↵Linus Torvalds1-8/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs d_revalidate updates from Al Viro: "Provide stable parent and name to ->d_revalidate() instances Most of the filesystem methods where we care about dentry name and parent have their stability guaranteed by the callers; ->d_revalidate() is the major exception. It's easy enough for callers to supply stable values for expected name and expected parent of the dentry being validated. That kills quite a bit of boilerplate in ->d_revalidate() instances, along with a bunch of races where they used to access ->d_name without sufficient precautions" * tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: 9p: fix ->rename_sem exclusion orangefs_d_revalidate(): use stable parent inode and name passed by caller ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller nfs: fix ->d_revalidate() UAF on ->d_name accesses nfs{,4}_lookup_validate(): use stable parent inode passed by caller gfs2_drevalidate(): use stable parent inode and name passed by caller fuse_dentry_revalidate(): use stable parent inode and name passed by caller vfat_revalidate{,_ci}(): use stable parent inode passed by caller exfat_d_revalidate(): use stable parent inode passed by caller fscrypt_d_revalidate(): use stable parent inode passed by caller ceph_d_revalidate(): propagate stable name down into request encoding ceph_d_revalidate(): use stable parent inode passed by caller afs_d_revalidate(): use stable name and parent inode passed by caller Pass parent directory inode and expected name to ->d_revalidate() generic_ci_d_compare(): use shortname_storage ext4 fast_commit: make use of name_snapshot primitives dissolve external_name.u into separate members make take_dentry_name_snapshot() lockless dcache: back inline names with a struct-wrapped array of unsigned long make sure that DNAME_INLINE_LEN is a multiple of word size
2025-01-27exfat_d_revalidate(): use stable parent inode passed by callerAl Viro1-7/+1
... no need to bother with ->d_lock and ->d_parent->d_inode. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-27Pass parent directory inode and expected name to ->d_revalidate()Al Viro1-1/+2
->d_revalidate() often needs to access dentry parent and name; that has to be done carefully, since the locking environment varies from caller to caller. We are not guaranteed that dentry in question will not be moved right under us - not unless the filesystem is such that nothing on it ever gets renamed. It can be dealt with, but that results in boilerplate code that isn't even needed - the callers normally have just found the dentry via dcache lookup and want to verify that it's in the right place; they already have the values of ->d_parent and ->d_name stable. There is a couple of exceptions (overlayfs and, to less extent, ecryptfs), but for the majority of calls that song and dance is not needed at all. It's easier to make ecryptfs and overlayfs find and pass those values if there's a ->d_revalidate() instance to be called, rather than doing that in the instances. This commit only changes the calling conventions; making use of supplied values is left to followups. NOTE: some instances need more than just the parent - things like CIFS may need to build an entire path from filesystem root, so they need more precautions than the usual boilerplate. This series doesn't do anything to that need - these filesystems have to keep their locking mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem a-la v9fs). One thing to keep in mind when using name is that name->name will normally point into the pathname being resolved; the filename in question occupies name->len bytes starting at name->name, and there is NUL somewhere after it, but it the next byte might very well be '/' rather than '\0'. Do not ignore name->len. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-12-31exfat: fix the infinite loop in __exfat_free_cluster()Yuezhang Mo1-0/+10
In __exfat_free_cluster(), the cluster chain is traversed until the EOF cluster. If the cluster chain includes a loop due to file system corruption, the EOF cluster cannot be traversed, resulting in an infinite loop. This commit uses the total number of clusters to prevent this infinite loop. Reported-by: syzbot+1de5a37cb85a2d536330@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1de5a37cb85a2d536330 Tested-by: syzbot+1de5a37cb85a2d536330@syzkaller.appspotmail.com Fixes: 31023864e67a ("exfat: add fat entry operations") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-12-31exfat: fix the new buffer was not zeroed before writingYuezhang Mo1-0/+6
Before writing, if a buffer_head marked as new, its data must be zeroed, otherwise uninitialized data in the page cache will be written. So this commit uses folio_zero_new_buffers() to zero the new buffers before ->write_end(). Fixes: 6630ea49103c ("exfat: move extend valid_size into ->page_mkwrite()") Reported-by: syzbot+91ae49e1c1a2634d20c0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=91ae49e1c1a2634d20c0 Tested-by: syzbot+91ae49e1c1a2634d20c0@syzkaller.appspotmail.com Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-12-31exfat: fix the infinite loop in exfat_readdir()Yuezhang Mo1-1/+2
If the file system is corrupted so that a cluster is linked to itself in the cluster chain, and there is an unused directory entry in the cluster, 'dentry' will not be incremented, causing condition 'dentry < max_dentries' unable to prevent an infinite loop. This infinite loop causes s_lock not to be released, and other tasks will hang, such as exfat_sync_fs(). This commit stops traversing the cluster chain when there is unused directory entry in the cluster to avoid this infinite loop. Reported-by: syzbot+205c2644abdff9d3f9fc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=205c2644abdff9d3f9fc Tested-by: syzbot+205c2644abdff9d3f9fc@syzkaller.appspotmail.com Fixes: ca06197382bd ("exfat: add directory operations") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-12-17exfat: fix exfat_find_empty_entry() not returning error on failureYuezhang Mo1-2/+2
On failure, "dentry" is the error code. If the error code indicates that there is no space, a new cluster may need to be allocated; for other errors, it should be returned directly. Only on success, "dentry" is the index of the directory entry, and it needs to be converted into the directory entry index within the cluster where it is located. Fixes: 8a3f5711ad74 ("exfat: reduce FAT chain traversal") Reported-by: syzbot+6f6c9397e0078ef60bce@syzkaller.appspotmail.com Tested-by: syzbot+6f6c9397e0078ef60bce@syzkaller.appspotmail.com Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: reduce FAT chain traversalYuezhang Mo3-9/+32
Before this commit, ->dir and ->entry of exfat_inode_info record the first cluster of the parent directory and the directory entry index starting from this cluster. The directory entry set will be gotten during write-back-inode/rmdir/ unlink/rename. If the clusters of the parent directory are not continuous, the FAT chain will be traversed from the first cluster of the parent directory to find the cluster where ->entry is located. After this commit, ->dir records the cluster where the first directory entry in the directory entry set is located, and ->entry records the directory entry index in the cluster, so that there is almost no need to access the FAT when getting the directory entry set. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: code cleanup for exfat_readdir()Yuezhang Mo1-22/+2
For the root directory and other directories, the clusters allocated to them can be obtained from exfat_inode_info, and there is no need to distinguish them. And there is no need to initialize atime/ctime/mtime/size in exfat_readdir(), because exfat_iterate() does not use them. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: remove argument 'p_dir' from exfat_add_entry()Yuezhang Mo1-10/+4
The output of argument 'p_dir' of exfat_add_entry() is not used in either exfat_mkdir() or exfat_create(), remove the argument. Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: move exfat_chain_set() out of __exfat_resolve_path()Yuezhang Mo1-34/+26
__exfat_resolve_path() mixes two functions. The first one is to resolve and check if the path is valid. The second one is to output the cluster assigned to the directory. The second one is only needed when need to traverse the directory entries, and calling exfat_chain_set() so early causes p_dir to be passed as an argument multiple times, increasing the complexity of the code. This commit moves the call to exfat_chain_set() before traversing directory entries. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: add exfat_get_dentry_set_by_ei() helperYuezhang Mo3-36/+21
This helper gets the directory entry set of the file for the exfat inode which has been created. It's used to remove all the instances of the pattern it replaces making the code cleaner, it's also a preparation for changing ->dir to record the cluster where the directory entry set is located and changing ->entry to record the index of the directory entry within the cluster. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: rename argument name for exfat_move_file and exfat_rename_fileYuezhang Mo1-12/+12
In this exfat implementation, the relationship between inode and ei is ei=EXFAT_I(inode). However, in the arguments of exfat_move_file() and exfat_rename_file(), argument 'inode' indicates the parent directory, but argument 'ei' indicates the target file to be renamed. They do not have the above relationship, which is not friendly to code readers. So this commit renames 'inode' to 'parent_inode', making the argument name match its role. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: remove unnecessary read entry in __exfat_rename()Yuezhang Mo1-16/+4
To determine whether it is a directory, there is no need to read its directory entry, just use S_ISDIR(inode->i_mode). Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: fix file being changed by unaligned direct writeYuezhang Mo1-0/+10
Unaligned direct writes are invalid and should return an error without making any changes, rather than extending ->valid_size and then returning an error. Therefore, alignment checking is required before extending ->valid_size. Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Co-developed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: fix uninit-value in __exfat_get_dentry_setNamjae Jeon1-0/+1
There is no check if stream size and start_clu are invalid. If start_clu is EOF cluster and stream size is 4096, It will cause uninit value access. because ei->hint_femp.eidx could be 128(if cluster size is 4K) and wrong hint will allocate next cluster. and this cluster will be same with the cluster that is allocated by exfat_extend_valid_size(). The previous patch will check invalid start_clu, but for clarity, initialize hint_femp.eidx to zero. Cc: stable@vger.kernel.org Reported-by: syzbot+01218003be74b5e1213a@syzkaller.appspotmail.com Tested-by: syzbot+01218003be74b5e1213a@syzkaller.appspotmail.com Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: fix out-of-bounds access of directory entriesYuezhang Mo1-4/+16
In the case of the directory size is greater than or equal to the cluster size, if start_clu becomes an EOF cluster(an invalid cluster) due to file system corruption, then the directory entry where ei->hint_femp.eidx hint is outside the directory, resulting in an out-of-bounds access, which may cause further file system corruption. This commit adds a check for start_clu, if it is an invalid cluster, the file or directory will be treated as empty. Cc: stable@vger.kernel.org Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Co-developed-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro3-3/+3
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-09-23exfat: resolve memory leak from exfat_create_upcase_table()Daniel Yang1-1/+4
If exfat_load_upcase_table reaches end and returns -EINVAL, allocated memory doesn't get freed and while exfat_load_default_upcase_table allocates more memory, leading to a memory leak. Here's link to syzkaller crash report illustrating this issue: https://syzkaller.appspot.com/text?tag=CrashReport&x=1406c201980000 Reported-by: syzbot+e1c69cadec0f1a078e3d@syzkaller.appspotmail.com Fixes: a13d1a4de3b0 ("exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper") Cc: stable@vger.kernel.org Signed-off-by: Daniel Yang <danielyangkang@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-23exfat: move extend valid_size into ->page_mkwrite()Yuezhang Mo1-25/+45
It is not a good way to extend valid_size to the end of the mmap area by writing zeros in mmap. Because after calling mmap, no data may be written, or only a small amount of data may be written to the head of the mmap area. This commit moves extending valid_size to exfat_page_mkwrite(). In exfat_page_mkwrite() only extend valid_size to the starting position of new data writing, which reduces unnecessary writing of zeros. If the block is not mapped and is marked as new after being mapped for writing, block_write_begin() will zero the page cache corresponding to the block, so there is no need to call zero_user_segment() in exfat_file_zeroed_range(). And after moving extending valid_size to exfat_page_mkwrite(), the data written by mmap will be copied to the page cache but the page cache may be not mapped to the disk. Calling zero_user_segment() will cause the data written by mmap to be cleared. So this commit removes calling zero_user_segment() from exfat_file_zeroed_range() and renames exfat_file_zeroed_range() to exfat_extend_valid_size(). Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-18exfat: fix memory leak in exfat_load_bitmap()Yuezhang Mo1-5/+5
If the first directory entry in the root directory is not a bitmap directory entry, 'bh' will not be released and reassigned, which will cause a memory leak. Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-18exfat: Implement sops->shutdown and ioctlDongliang Cui5-0/+96
We found that when writing a large file through buffer write, if the disk is inaccessible, exFAT does not return an error normally, which leads to the writing process not stopping properly. To easily reproduce this issue, you can follow the steps below: 1. format a device to exFAT and then mount (with a full disk erase) 2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192 3. eject the device You may find that the dd process does not stop immediately and may continue for a long time. The root cause of this issue is that during buffer write process, exFAT does not need to access the disk to look up directory entries or the FAT table (whereas FAT would do) every time data is written. Instead, exFAT simply marks the buffer as dirty and returns, delegating the writeback operation to the writeback process. If the disk cannot be accessed at this time, the error will only be returned to the writeback process, and the original process will not receive the error, so it cannot be returned to the user side. When the disk cannot be accessed normally, an error should be returned to stop the writing process. Implement sops->shutdown and ioctl to shut down the file system when underlying block device is marked dead. Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-17exfat: do not fallback to buffered writeYuezhang Mo5-74/+15
After commit(11a347fb6cef exfat: change to get file size from DataLength), the remaining area or hole had been filled with zeros before calling exfat_direct_IO(), so there is no need to fallback to buffered write, and ->i_size_aligned is no longer needed, drop it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-17exfat: drop ->i_size_ondiskYuezhang Mo5-25/+12
->i_size_ondisk is no longer used by exfat_write_begin() after commit(11a347fb6cef exfat: change to get file size from DataLength), drop it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-08-07fs: Convert aops->write_begin to take a folioMatthew Wilcox (Oracle)2-7/+6
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07fs: Convert aops->write_end to take a folioMatthew Wilcox (Oracle)2-3/+3
Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-17Merge tag 'exfat-for-6.11-rc1' of ↵Linus Torvalds3-11/+15
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Fix deadlock issue reported by syzbot - Handle idmapped mounts * tag 'exfat-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix potential deadlock on __exfat_get_dentry_set exfat: handle idmapped mounts
2024-07-15exfat: fix potential deadlock on __exfat_get_dentry_setSungjong Seo1-1/+1
When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array is allocated in __exfat_get_entry_set. The problem is that the bh-array is allocated with GFP_KERNEL. It does not make sense. In the following cases, a deadlock for sbi->s_lock between the two processes may occur. CPU0 CPU1 ---- ---- kswapd balance_pgdat lock(fs_reclaim) exfat_iterate lock(&sbi->s_lock) exfat_readdir exfat_get_uniname_from_ext_entry exfat_get_dentry_set __exfat_get_dentry_set kmalloc_array ... lock(fs_reclaim) ... evict exfat_evict_inode lock(&sbi->s_lock) To fix this, let's allocate bh-array with GFP_NOFS. Fixes: a3ff29a95fde ("exfat: support dynamic allocate bh for exfat_entry_set_cache") Cc: stable@vger.kernel.org # v6.2+ Reported-by: syzbot+412a392a2cd4a65e71db@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000fef47e0618c0327f@google.com Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-07-15exfat: handle idmapped mountsMichael Jeanson2-10/+14
Pass the idmapped mount information to the different helper functions. Adapt the uid/gid checks in exfat_setattr to use the vfsuid/vfsgid helpers. Based on the fat implementation in commit 4b7899368108 ("fat: handle idmapped mounts") by Christian Brauner. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-07-02exfat: Convert to new uid/gid option parsing helpersEric Sandeen1-4/+4
Convert to new uid/gid option parsing helpers Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/dda575de-11a7-4139-8a25-07957d311ed3@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-25exfat: zero the reserved fields of file and stream extension dentriesYuezhang Mo1-0/+2
From exFAT specification, the reserved fields should initialize to zero and should not use for any purpose. If create a new dentry set in the UNUSED dentries, all fields had been zeroed when allocating cluster to parent directory. But if create a new dentry set in the DELETED dentries, the reserved fields in file and stream extension dentries may be non-zero. Because only the valid bit of the type field of the dentry is cleared in exfat_remove_entries(), if the type of dentry is different from the original(For example, a dentry that was originally a file name dentry, then set to deleted dentry, and then set as a file dentry), the reserved fields is non-zero. So this commit initializes the dentry to 0 before createing file dentry and stream extension dentry. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-31exfat: fix timing of synchronizing bitmap and inodeYuezhang Mo1-4/+3
Commit(f55c096f62f1 exfat: do not zero the extended part) changed the timing of synchronizing bitmap and inode in exfat_cont_expand(). The change caused xfstests generic/013 to fail if 'dirsync' or 'sync' is enabled. So this commit restores the timing. Fixes: f55c096f62f1 ("exfat: do not zero the extended part") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-21Merge tag 'exfat-for-6.9-rc1' of ↵Linus Torvalds4-374/+291
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Improve dirsync performance by syncing on a dentry-set rather than on a per-directory entry * tag 'exfat-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: remove duplicate update parent dir exfat: do not sync parent dir if just update timestamp exfat: remove unused functions exfat: convert exfat_find_empty_entry() to use dentry cache exfat: convert exfat_init_ext_entry() to use dentry cache exfat: move free cluster out of exfat_init_ext_entry() exfat: convert exfat_remove_entries() to use dentry cache exfat: convert exfat_add_entry() to use dentry cache exfat: add exfat_get_empty_dentry_set() helper exfat: add __exfat_get_dentry_set() helper
2024-03-19exfat: remove duplicate update parent dirYuezhang Mo1-1/+2
For renaming, the directory only needs to be updated once if it is in the same directory. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: do not sync parent dir if just update timestampYuezhang Mo1-11/+8
When sync or dir_sync is enabled, there is no need to sync the parent directory's inode if only for updating its timestamp. 1. If an unexpected power failure occurs, the timestamp of the parent directory is not updated to the storage, which has no impact on the user. 2. The number of writes will be greatly reduced, which can not only improve performance, but also prolong device life. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: remove unused functionsYuezhang Mo3-64/+4
exfat_count_ext_entries() is no longer called, remove it. exfat_update_dir_chksum() is no longer called, remove it and rename exfat_update_dir_chksum_with_entry_set() to it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: convert exfat_find_empty_entry() to use dentry cacheYuezhang Mo1-82/+40
Before this conversion, each dentry traversed needs to be read from the storage device or page cache. There are at least 16 dentries in a sector. This will result in frequent page cache searches. After this conversion, if all directory entries in a sector are used, the sector only needs to be read once. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: convert exfat_init_ext_entry() to use dentry cacheYuezhang Mo3-77/+33
Before this conversion, in exfat_init_ext_entry(), to init the dentries in a dentry set, the sync times is equals the dentry number if 'dirsync' or 'sync' is enabled. That affects not only performance but also device life. After this conversion, only needs to be synchronized once if 'dirsync' or 'sync' is enabled. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: move free cluster out of exfat_init_ext_entry()Yuezhang Mo2-5/+3
exfat_init_ext_entry() is an init function, it's a bit strange to free cluster in it. And the argument 'inode' will be removed from exfat_init_ext_entry(). So this commit changes to free the cluster in exfat_remove_entries(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: convert exfat_remove_entries() to use dentry cacheYuezhang Mo3-115/+90
Before this conversion, in exfat_remove_entries(), to mark the dentries in a dentry set as deleted, the sync times is equals the dentry numbers if 'dirsync' or 'sync' is enabled. That affects not only performance but also device life. After this conversion, only needs to be synchronized once if 'dirsync' or 'sync' is enabled. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: convert exfat_add_entry() to use dentry cacheYuezhang Mo3-33/+22
After this conversion, if "dirsync" or "sync" is enabled, the number of synchronized dentries in exfat_add_entry() will change from 2 to 1. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: add exfat_get_empty_dentry_set() helperYuezhang Mo2-0/+82
This helper is used to lookup empty dentry set. If there are no enough empty dentries at the input location, this helper will return the number of dentries that need to be skipped for the next lookup. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: add __exfat_get_dentry_set() helperYuezhang Mo2-22/+43
Since exfat_get_dentry_set() invokes the validate functions of exfat_validate_entry(), it only supports getting a directory entry set of an existing file, doesn't support getting an empty entry set. To remove the limitation, add this helper. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-12mm, slab: remove last vestiges of SLAB_MEM_SPREADLinus Torvalds2-2/+2
Yes, yes, I know the slab people were planning on going slow and letting every subsystem fight this thing on their own. But let's just rip off the band-aid and get it over and done with. I don't want to see a number of unnecessary pull requests just to get rid of a flag that no longer has any meaning. This was mainly done with a couple of 'sed' scripts and then some manual cleanup of the end result. Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-01Merge tag 'exfat-for-6.8-rc7' of ↵Linus Torvalds1-14/+21
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fix from Namjae Jeon: - Fix ftruncate failure when allocating non-contiguous clusters * tag 'exfat-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix appending discontinuous clusters to empty file
2024-02-25Merge tag 'pull-fixes.pathwalk-rcu-2' of ↵Linus Torvalds3-19/+16
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU pathwalk fixes from Al Viro: "We still have some races in filesystem methods when exposed to RCU pathwalk. This series is a result of code audit (the second round of it) and it should deal with most of that stuff. Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate(). Up to maintainers (a note for NTFS folks - when documentation says that a method may not block, it *does* imply that blocking allocations are to be avoided. Really)" [ More explanations for people who aren't familiar with the vagaries of RCU path walking: most of it is hidden from filesystems, but if a filesystem actively participates in the low-level path walking it needs to make sure the fields involved in that walk are RCU-safe. That "actively participate in low-level path walking" includes things like having its own ->d_hash()/->d_compare() routines, or by having its own directory permission function that doesn't just use the common helpers. Having a ->d_revalidate() function will also have this issue. Note that instead of making everything RCU safe you can also choose to abort the RCU pathwalk if your operation cannot be done safely under RCU, but that obviously comes with a performance penalty. One common pattern is to allow the simple cases under RCU, and abort only if you need to do something more complicated. So not everything needs to be RCU-safe, and things like the inode etc that the VFS itself maintains obviously already are. But these fixes tend to be about properly RCU-delaying things like ->s_fs_info that are maintained by the filesystem and that got potentially released too early. - Linus ] * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4_get_link(): fix breakage in RCU mode cifs_get_link(): bail out in unsafe case fuse: fix UAF in rcu pathwalks procfs: make freeing proc_fs_info rcu-delayed procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() nfs: fix UAF on pathwalk running into umount nfs: make nfs_set_verifier() safe for use in RCU pathwalk afs: fix __afs_break_callback() / afs_drop_open_mmap() race hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper affs: free affs_sb_info with kfree_rcu() rcu pathwalk: prevent bogus hard errors from may_lookup() fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
2024-02-25exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helperAl Viro3-19/+16
That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem that is in process of getting shut down. Besides, having nls and upcase table cleanup moved from ->put_super() towards the place where sbi is freed makes for simpler failure exits. Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-18exfat: fix appending discontinuous clusters to empty fileYuezhang Mo1-14/+21
Eric Hong found that when using ftruncate to expand an empty file, exfat_ent_set() will fail if discontinuous clusters are allocated. The reason is that the empty file does not have a cluster chain, but exfat_ent_set() attempts to append the newly allocated cluster to the cluster chain. In addition, exfat_find_last_cluster() only supports finding the last cluster in a non-empty file. So this commit adds a check whether the file is empty. If the file is empty, exfat_find_last_cluster() and exfat_ent_set() are no longer called as they do not need to be called. Fixes: f55c096f62f1 ("exfat: do not zero the extended part") Reported-by: Eric Hong <erichong@qnap.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-18exfat: fix zero the unwritten part for dio readYuezhang Mo1-4/+3
For dio read, bio will be leave in flight when a successful partial aio read have been setup, blockdev_direct_IO() will return -EIOCBQUEUED. In the case, iter->iov_offset will be not advanced, the oops reported by syzbot will occur if revert iter->iov_offset with iov_iter_revert(). The unwritten part had been zeroed by aio read, so there is no need to zero it in dio read. Reported-by: syzbot+fd404f6b03a58e8bc403@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fd404f6b03a58e8bc403 Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08exfat: do not zero the extended partYuezhang Mo2-21/+70
Since the read operation beyond the ValidDataLength returns zero, if we just extend the size of the file, we don't need to zero the extended part, but only change the DataLength without changing the ValidDataLength. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08exfat: change to get file size from DataLengthYuezhang Mo4-19/+231
In stream extension directory entry, the ValidDataLength field describes how far into the data stream user data has been written, and the DataLength field describes the file size. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08exfat: using ffs instead of internal logicJohn Sanpe2-28/+16
Replaced the internal table lookup algorithm with ffs of the bitops library with better performance. Use it to increase the single processing length of the exfat_find_free_bitmap function, from single-byte search to long type. Signed-off-by: John Sanpe <sanpeqf@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08exfat: using hweight instead of internal logicJohn Sanpe1-27/+21
Replace the internal table lookup algorithm with the hweight library, which has instruction set acceleration capabilities. Use it to increase the length of a single calculation of the exfat_find_free_bitmap function to the long type. Signed-off-by: John Sanpe <sanpeqf@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-11-03exfat: fix ctime is not updatedYuezhang Mo1-0/+1
Commit 4c72a36edd54 ("exfat: convert to new timestamp accessors") removed attr_copy() from exfat_set_attr(). It causes xfstests generic/221 to fail. In xfstests generic/221, it tests ctime should be updated even if futimens() update atime only. But in this case, ctime will not be updated if attr_copy() is removed. attr_copy() may also update other attributes, and removing it may cause other bugs, so this commit restores to call attr_copy() in exfat_set_attr(). Fixes: 4c72a36edd54 ("exfat: convert to new timestamp accessors") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-11-03exfat: fix setting uninitialized time to ctime/atimeYuezhang Mo1-2/+2
An uninitialized time is set to ctime/atime in __exfat_write_inode(). It causes xfstests generic/003 and generic/192 to fail. And since there will be a time gap between setting ctime/atime to the inode and writing back the inode, so ctime/atime should not be set again when writing back the inode. Fixes: 4c72a36edd54 ("exfat: convert to new timestamp accessors") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31exfat: support create zero-size directoryYuezhang Mo4-8/+20
This commit adds mount option 'zero_size_dir'. If this option enabled, don't allocate a cluster to directory when creating it, and set the directory size to 0. On Windows, a cluster is allocated for a directory when it is created, so the mount option is disabled by default. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31exfat: support handle zero-size directoryYuezhang Mo1-7/+22
After repairing a corrupted file system with exfatprogs' fsck.exfat, zero-size directories may result. It is also possible to create zero-size directories in other exFAT implementation, such as Paragon ufsd dirver. As described in the specification, the lower directory size limits is 0 bytes. Without this commit, sub-directories and files cannot be created under a zero-size directory, and it cannot be removed. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31exfat: add ioctls for accessing attributesJan Cincera7-32/+128
Add GET and SET attributes ioctls to enable attribute modification. We already do this in FAT and a few userspace utils made for it would benefit from this also working on exFAT, namely fatattr. Signed-off-by: Jan Cincera <hcincera@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-18exfat: convert to new timestamp accessorsJeff Layton6-35/+47
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-31-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-29Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ...
2023-08-28Merge tag 'v6.6-vfs.super' of ↵Linus Torvalds2-21/+20
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...
2023-08-28Merge tag 'v6.6-vfs.ctime' of ↵Linus Torvalds4-23/+18
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
2023-08-10exfat: free the sbi and iocharset in ->kill_sbChristoph Hellwig1-10/+18
As a rule of thumb everything allocated to the fs_context and moved into the super_block should be freed by ->kill_sb so that the teardown handling doesn't need to be duplicated between the fill_super error path and put_super. Implement an exfat-specific kill_sb method to do that and share the code with the mount contex free helper for the mount error handling case. Signed-off-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230809220545.1308228-11-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10exfat: don't RCU-free the sbiChristoph Hellwig2-13/+4
There are no RCU critical sections for accessing any information in the sbi, so drop the call_rcu indirection for freeing the sbi. Signed-off-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230809220545.1308228-10-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09fs: pass the request_mask to generic_fillattrJeff Layton1-1/+1
generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06vfs: get rid of old '->iterate' directory operationLinus Torvalds1-1/+2
All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got convered to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-02fs: add CONFIG_BUFFER_HEADChristoph Hellwig1-0/+1
Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-15exfat: release s_lock before calling dir_emit()Sungjong Seo1-15/+12
There is a potential deadlock reported by syzbot as below: ====================================================== WARNING: possible circular locking dependency detected 6.4.0-next-20230707-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor330/5073 is trying to acquire lock: ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: mmap_read_lock_killable include/linux/mmap_lock.h:151 [inline] ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: get_mmap_lock_carefully mm/memory.c:5293 [inline] ffff8880218527a0 (&mm->mmap_lock){++++}-{3:3}, at: lock_mm_and_find_vma+0x369/0x510 mm/memory.c:5344 but task is already holding lock: ffff888019f760e0 (&sbi->s_lock){+.+.}-{3:3}, at: exfat_iterate+0x117/0xb50 fs/exfat/dir.c:232 which lock already depends on the new lock. Chain exists of: &mm->mmap_lock --> mapping.invalidate_lock#3 --> &sbi->s_lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sbi->s_lock); lock(mapping.invalidate_lock#3); lock(&sbi->s_lock); rlock(&mm->mmap_lock); Let's try to avoid above potential deadlock condition by moving dir_emit*() out of sbi->s_lock coverage. Fixes: ca06197382bd ("exfat: add directory operations") Cc: stable@vger.kernel.org #v5.7+ Reported-by: syzbot+1741a5d9b79989c10bdc@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/00000000000078ee7e060066270b@google.com/T/#u Tested-by: syzbot+1741a5d9b79989c10bdc@syzkaller.appspotmail.com Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-07-13exfat: check if filename entries exceeds max filename lengthNamjae Jeon1-2/+7
exfat_extract_uni_name copies characters from a given file name entry into the 'uniname' variable. This variable is actually defined on the stack of the exfat_readdir() function. According to the definition of the 'exfat_uni_name' type, the file name should be limited 255 characters (+ null teminator space), but the exfat_get_uniname_from_ext_entry() function can write more characters because there is no check if filename entries exceeds max filename length. This patch add the check not to copy filename characters when exceeding max filename length. Cc: stable@vger.kernel.org Cc: Yuezhang Mo <Yuezhang.Mo@sony.com> Reported-by: Maxim Suhanov <dfirblog@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-07-13exfat: convert to ctime accessor functionsJeff Layton4-19/+15
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-38-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-11exfat: use kvmalloc_array/kvfree instead of kmalloc_array/kfreegaoming1-3/+3
The call stack shown below is a scenario in the Linux 4.19 kernel. Allocating memory failed where exfat fs use kmalloc_array due to system memory fragmentation, while the u-disk was inserted without recognition. Devices such as u-disk using the exfat file system are pluggable and may be insert into the system at any time. However, long-term running systems cannot guarantee the continuity of physical memory. Therefore, it's necessary to address this issue. Binder:2632_6: page allocation failure: order:4, mode:0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null) Call trace: [242178.097582] dump_backtrace+0x0/0x4 [242178.097589] dump_stack+0xf4/0x134 [242178.097598] warn_alloc+0xd8/0x144 [242178.097603] __alloc_pages_nodemask+0x1364/0x1384 [242178.097608] kmalloc_order+0x2c/0x510 [242178.097612] kmalloc_order_trace+0x40/0x16c [242178.097618] __kmalloc+0x360/0x408 [242178.097624] load_alloc_bitmap+0x160/0x284 [242178.097628] exfat_fill_super+0xa3c/0xe7c [242178.097635] mount_bdev+0x2e8/0x3a0 [242178.097638] exfat_fs_mount+0x40/0x50 [242178.097643] mount_fs+0x138/0x2e8 [242178.097649] vfs_kern_mount+0x90/0x270 [242178.097655] do_mount+0x798/0x173c [242178.097659] ksys_mount+0x114/0x1ac [242178.097665] __arm64_sys_mount+0x24/0x34 [242178.097671] el0_svc_common+0xb8/0x1b8 [242178.097676] el0_svc_handler+0x74/0x90 [242178.097681] el0_svc+0x8/0x340 By analyzing the exfat code,we found that continuous physical memory is not required here,so kvmalloc_array is used can solve this problem. Cc: stable@vger.kernel.org Signed-off-by: gaoming <gaoming20@hihonor.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-07-10exfat: convert to simple_rename_timestampJeff Layton1-3/+2
A rename potentially involves updating 4 different inode timestamps. Convert to the new simple_rename_timestamp helper function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230705190309.579783-10-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-10exfat: ensure that ctime is updated whenever the mtime isJeff Layton1-4/+4
When removing entries from a directory, the ctime must also be updated alongside the mtime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230705190309.579783-4-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-05-24splice: Use filemap_splice_read() instead of generic_file_splice_read()David Howells1-1/+1
Replace pointers to generic_file_splice_read() with calls to filemap_splice_read(). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-29-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-03-01Merge tag 'exfat-for-6.3-rc1' of ↵Linus Torvalds8-60/+101
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Handle vendor extension and allocation entries as unrecognized benign secondary entries - Fix wrong ->i_blocks on devices with non-512 byte sector - Add the check to avoid returning -EIO from exfat_readdir() at current position exceeding the directory size - Fix a bug that reach the end of the directory stream at a position not aligned with the dentry size - Redefine DIR_DELETED as 0xFFFFFFF7, the bad cluster number - Two cleanup fixes and fix cluster leakage in error handling * tag 'exfat-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix the newly allocated clusters are not freed in error handling exfat: don't print error log in normal case exfat: remove unneeded code from exfat_alloc_cluster() exfat: handle unreconized benign secondary entries exfat: fix inode->i_blocks for non-512 byte sector size device exfat: redefine DIR_DELETED as the bad cluster number exfat: fix reporting fs error when reading dir beyond EOF exfat: fix unexpected EOF while reading dir
2023-02-28exfat: fix the newly allocated clusters are not freed in error handlingYuezhang Mo1-10/+8
In error handling 'free_cluster', before num_alloc clusters allocated, p_chain->size will not updated and always 0, thus the newly allocated clusters are not freed. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-28exfat: don't print error log in normal caseYuezhang Mo1-2/+3
When allocating a new cluster, exFAT first allocates from the next cluster of the last cluster of the file. If the last cluster of the file is the last cluster of the volume, allocate from the first cluster. This is a normal case, but the following error log will be printed. It makes users confused, so this commit removes the error log. [1960905.181545] exFAT-fs (sdb1): hint_cluster is invalid (262130) Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-28exfat: remove unneeded code from exfat_alloc_cluster()Yuezhang Mo1-8/+1
In the removed code, num_clusters is 0, nothing is done in exfat_chain_cont_cluster(), so it is unneeded, remove it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27exfat: handle unreconized benign secondary entriesNamjae Jeon3-25/+81
Sony PXW-Z280 camera add vendor allocation entries to directory of pictures. Currently, linux exfat does not support it and the file is not visible. This patch handle vendor extension and allocation entries as unreconized benign secondary entries. As described in the specification, it is recognized but ignored, and when deleting directory entry set, the associated clusters allocation are removed as well as benign secondary directory entries. Reported-by: Barócsi Dénes <admin@tveger.hu> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27exfat: fix inode->i_blocks for non-512 byte sector size deviceYuezhang Mo4-9/+5
inode->i_blocks is not real number of blocks, but 512 byte ones. Fixes: 98d917047e8b ("exfat: add file operations") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Tested-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27exfat: redefine DIR_DELETED as the bad cluster numberSungjong Seo1-1/+1
When a file or a directory is deleted, the hint for the cluster of its parent directory in its in-memory inode is set as DIR_DELETED. Therefore, DIR_DELETED must be one of invalid cluster numbers. According to the exFAT specification, a volume can have at most 2^32-11 clusters. However, DIR_DELETED is wrongly defined as 0xFFFF0321, which could be a valid cluster number. To fix it, let's redefine DIR_DELETED as 0xFFFFFFF7, the bad cluster number. Fixes: 1acf1a564b60 ("exfat: add in-memory and on-disk structures and headers") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27exfat: fix reporting fs error when reading dir beyond EOFYuezhang Mo1-1/+1
Since seekdir() does not check whether the position is valid, the position may exceed the size of the directory. We found that for a directory with discontinuous clusters, if the position exceeds the size of the directory and the excess size is greater than or equal to the cluster size, exfat_readdir() will return -EIO, causing a file system error and making the file system unavailable. Reproduce this bug by: seekdir(dir, dir_size + cluster_size); dirent = readdir(dir); The following log will be printed if mount with 'errors=remount-ro'. [11166.712896] exFAT-fs (sdb1): error, invalid access to FAT (entry 0xffffffff) [11166.712905] exFAT-fs (sdb1): Filesystem has been set read-only Fixes: 1e5654de0f51 ("exfat: handle wrong stream entry size in exfat_readdir()") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27exfat: fix unexpected EOF while reading dirYuezhang Mo1-4/+1
If the position is not aligned with the dentry size, the return value of readdir() will be NULL and errno is 0, which means the end of the directory stream is reached. If the position is aligned with dentry size, but there is no file or directory at the position, exfat_readdir() will continue to get dentry from the next dentry. So the dentry gotten by readdir() may not be at the position. After this commit, if the position is not aligned with the dentry size, round the position up to the dentry size and continue to get the dentry. Fixes: ca06197382bd ("exfat: add directory operations") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-20Merge tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linuxLinus Torvalds1-0/+1
Pull legacy dio update from Jens Axboe: "We only have a few file systems that use the old dio code, make them select it rather than build it unconditionally" * tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux: fs: build the legacy direct I/O code conditionally fs: move sb_init_dio_done_wq out of direct-io.c
2023-01-26fs: build the legacy direct I/O code conditionallyChristoph Hellwig1-0/+1
Add a new LEGACY_DIRECT_IO config symbol that is only selected by the file systems that still use the legacy blockdev_direct_IO code, so that kernels without support for those file systems don't need to build the code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230125065839.191256-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-19fs: port ->rename() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mkdir() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->create() to pass mnt_idmapChristian Brauner1-1/+1
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->getattr() to pass mnt_idmapChristian Brauner2-3/+3
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->setattr() to pass mnt_idmapChristian Brauner2-4/+4
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-12-15Merge tag 'exfat-for-6.2-rc1' of ↵Linus Torvalds5-145/+187
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat update from Namjae Jeon: - simplify and remove some redundant directory entry code - optimize the size of exfat_entry_set_cache and its allocation policy - improve the performance for creating files and directories * tag 'exfat-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: reuse exfat_find_location() to simplify exfat_get_dentry_set() exfat: fix overflow in sector and cluster conversion exfat: remove i_size_write() from __exfat_truncate() exfat: remove argument 'size' from exfat_truncate() exfat: remove unnecessary arguments from exfat_find_dir_entry() exfat: remove unneeded codes from __exfat_rename() exfat: remove call ilog2() from exfat_readdir() exfat: replace magic numbers with Macros exfat: rename exfat_free_dentry_set() to exfat_put_dentry_set() exfat: move exfat_entry_set_cache from heap to stack exfat: support dynamic allocate bh for exfat_entry_set_cache exfat: reduce the size of exfat_entry_set_cache exfat: hint the empty entry which at the end of cluster chain exfat: simplify empty entry hint
2022-12-13exfat: reuse exfat_find_location() to simplify exfat_get_dentry_set()Yuezhang Mo1-13/+4
In exfat_get_dentry_set(), part of the code is the same as exfat_find_location(), reuse exfat_find_location() to simplify exfat_get_dentry_set(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-13exfat: fix overflow in sector and cluster conversionYuezhang Mo1-1/+1
According to the exFAT specification, there are at most 2^32-11 clusters in a volume. so using 'int' is not enough for cluster index, the return value type of exfat_sector_to_cluster() should be 'unsigned int'. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-11extfat: remove ->writepageChristoph Hellwig1-7/+2
Patch series "start removing writepage instances v2". The VM doesn't need or want ->writepage for writeback and is fine with just having ->writepages as long as ->migrate_folio is implemented. This series removes all ->writepage instances that use block_write_full_page directly and also have a plain mpage_writepages based ->writepages. This patch (of 7): ->writepage is a very inefficient method to write back data, and only used through write_cache_pages or a a fallback when no ->migrate_folio method is present. Set ->migrate_folio to the generic buffer_head based helper, and remove the ->writepage implementation. Link: https://lkml.kernel.org/r/20221202102644.770505-1-hch@lst.de Link: https://lkml.kernel.org/r/20221202102644.770505-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Bob Copeland <me@bobcopeland.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Jan Kara <jack@suse.com> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-12-12exfat: remove i_size_write() from __exfat_truncate()Yuezhang Mo3-7/+5
The file/directory size is updated into inode by i_size_write() before __exfat_truncate() is called, so it is redundant to re-update by i_size_write() in __exfat_truncate(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: remove argument 'size' from exfat_truncate()Yuezhang Mo3-4/+4
argument 'size' is not used in exfat_truncate(), remove it. Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: remove unnecessary arguments from exfat_find_dir_entry()Yuezhang Mo3-15/+10
This commit removes argument 'num_entries' and 'type' from exfat_find_dir_entry(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: remove unneeded codes from __exfat_rename()Yuezhang Mo1-8/+1
The code gets the dentry, but the dentry is not used, remove the code. Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: remove call ilog2() from exfat_readdir()Yuezhang Mo2-7/+12
There is no need to call ilog2() for the conversions between cluster and dentry in exfat_readdir(), because these conversions can be replaced with EXFAT_DEN_TO_CLU()/EXFAT_CLU_TO_DEN(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-12-12exfat: replace magic numbers with MacrosYuezhang Mo3-10/+10
Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>