author: Mark Harmstone <mark@harmstone.com> 2026-04-22 15:03:35 +0100
committer: David Sterba <dsterba@suse.com> 2026-05-24 03:05:26 +0200
commit: 551e510a97a487218d5f22d61d1a3388ef1171ac
tree: 1306a5cf44663d96054064190283add1a8105bca
parent: 964f569c14d7778c8f29ce81ec35d7a8fca31adf
btrfs: don't force DIO writes to be serialized

Before btrfs switched to the new mount API in 2023, we were setting
SB_NOSEC in btrfs_mount_root(). This flag tells the VFS that the
filesystem may have files which don't have security xattrs, enabling it
to do some optimizations.

Unfortunately this was missed in the transition, meaning that IS_NOSEC
will always return false for a btrfs inode. As a result,
btrfs_direct_write() calls will always get the inode lock exclusively,
so DIO writes to the same file will be serialized.

On my machine, this one-line change results in a ~59% improvement in DIO
throughput:

Before patch:

  test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
  ...
  fio-3.39
  Starting 32 processes
  test: Laying out IO file (1 file / 1024MiB)
  Jobs: 32 (f=32): [w(32)][100.0%][w=764MiB/s][w=195k IOPS][eta 00m:00s]
  test: (groupid=0, jobs=32): err= 0: pid=586: Wed Apr 22 13:03:04 2026
    write: IOPS=202k, BW=787MiB/s (826MB/s)(46.1GiB/60012msec); 0 zone resets
     bw (  KiB/s): min=498714, max=1199892, per=100.00%, avg=806659.03, stdev=4229.94, samples=3808
     iops        : min=124677, max=299971, avg=201661.82, stdev=1057.49, samples=3808
    cpu          : usr=0.32%, sys=1.27%, ctx=8329204, majf=0, minf=1163
    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
       issued rwts: total=0,12094328,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=64

  Run status group 0 (all jobs):
    WRITE: bw=787MiB/s (826MB/s), 787MiB/s-787MiB/s (826MB/s-826MB/s), io=46.1GiB (49.5GB), run=60012-60012msec

After patch:

  test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
  ...
  fio-3.39
  Starting 32 processes
  test: Laying out IO file (1 file / 1024MiB)
  Jobs: 32 (f=32): [w(32)][100.0%][w=1255MiB/s][w=321k IOPS][eta 00m:00s]
  test: (groupid=0, jobs=32): err= 0: pid=572: Wed Apr 22 13:13:46 2026
    write: IOPS=320k, BW=1250MiB/s (1311MB/s)(73.3GiB/60003msec); 0 zone resets
     bw (  MiB/s): min=  619, max= 2289, per=100.00%, avg=1251.28, stdev= 9.64, samples=3808
     iops        : min=158538, max=586025, avg=320320.80, stdev=2468.97, samples=3808
    cpu          : usr=0.35%, sys=11.50%, ctx=1584847, majf=0, minf=1160
    IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
       issued rwts: total=0,19203309,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=64

  Run status group 0 (all jobs):
    WRITE: bw=1250MiB/s (1311MB/s), 1250MiB/s-1250MiB/s (1311MB/s-1311MB/s), io=73.3GiB (78.7GB), run=60003-60003msec

The script to reproduce this:

  #!/bin/bash

  mkfs.btrfs -f /dev/nvme0n1
  mount /dev/nvme0n1 /mnt/test
  mkdir /mnt/test/nocow
  chattr +C /mnt/test/nocow
  fio /root/test.fio

  # cat /root/test.fio
  [global]
  rw=randwrite
  ioengine=io_uring
  iodepth=64
  size=1g
  direct=1
  startdelay=20
  force_async=4
  ramp_time=5
  runtime=60
  group_reporting=1
  numjobs=32
  time_based
  disk_util=0
  clat_percentiles=0
  disable_lat=1
  disable_clat=1
  disable_slat=1
  filename=/mnt/test/nocow/fiofile

  [test]
  name=test
  bs=4k
  stonewall

This was on a VM with 8 cores and 8GB of RAM, with a real NVMe exposed
through PCI passthrough. The figures for XFS and ext4 in comparison are
both about 3GB/s.

Fixes: ad21f15b0f79 ("btrfs: switch to the new mount API")
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
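The headline figure in the message can be cross-checked from the fio output. The following is a minimal sketch, not part of the patch: both helper names are hypothetical, and the only inputs are the bandwidth and average IOPS values quoted above.

```c
/* Hypothetical helpers for sanity-checking the quoted fio figures. */

/* Relative improvement of `after` over `before`, in percent. */
static double improvement_pct(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Bandwidth in MiB/s implied by an IOPS figure at a fixed block size in KiB. */
static double iops_to_mib_per_s(double iops, double block_kib)
{
	return iops * block_kib / 1024.0;
}
```

With the reported bandwidths, `improvement_pct(787.0, 1250.0)` comes to roughly 58.8, which matches the "~59%" in the message. `iops_to_mib_per_s(201661.82, 4.0)` gives roughly 787.7 MiB/s, consistent with the reported 787MiB/s at bs=4k.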
Diffstat (limited to 'fs')
-rw-r--r-- fs/btrfs/super.c | 1
1 file changed, 1 insertion, 0 deletions
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b26aa9169e838..64514d600eec7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1873,6 +1873,7 @@ static int btrfs_get_tree_super(struct fs_context *fc)
 	fs_info->fs_devices = fs_devices;
 	mutex_unlock(&uuid_mutex);
+	fc->sb_flags |= SB_NOSEC;
 	sb = sget_fc(fc, btrfs_fc_test_super, set_anon_super_fc);
 	if (IS_ERR(sb)) {
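
To illustrate why the superblock flag matters for DIO locking, here is a simplified userspace model rather than kernel code. Every `model_` name and flag value is a hypothetical stand-in. The "write must end within i_size" condition is an assumption about btrfs_direct_write() that the commit message does not state, and the real VFS applies further checks (for example setuid/setgid mode bits) that are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag values for the model; not the kernel's definitions. */
#define MODEL_SB_NOSEC (1u << 0) /* superblock opted in to the NOSEC hint */
#define MODEL_S_NOSEC  (1u << 0) /* inode cached as needing no security work */

struct model_sb {
	uint32_t s_flags;
};

struct model_inode {
	struct model_sb *sb;
	uint32_t i_flags;
	int64_t i_size;
};

/* The per-inode hint is only cached when the superblock carries the flag. */
static void model_mark_nosec(struct model_inode *inode)
{
	if (inode->sb->s_flags & MODEL_SB_NOSEC)
		inode->i_flags |= MODEL_S_NOSEC;
}

/* Stand-in for IS_NOSEC(): stays false forever if the sb flag is missing. */
static bool model_is_nosec(const struct model_inode *inode)
{
	return (inode->i_flags & MODEL_S_NOSEC) != 0;
}

/*
 * True when a DIO write may take the inode lock shared, so that writers
 * to the same file can run concurrently. Assumed conditions: the write
 * stays within the current file size and the inode has the NOSEC hint.
 */
static bool model_dio_write_can_share_lock(const struct model_inode *inode,
					   int64_t pos, int64_t len)
{
	return pos + len <= inode->i_size && model_is_nosec(inode);
}
```

In this model, an inode on a superblock without `MODEL_SB_NOSEC` never qualifies for the shared lock, whatever the write offset, which mirrors the serialization described in the commit message. Once the flag is set and the hint is cached, overwrites within the file size qualify.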