| author | Mark Harmstone <mark@harmstone.com> | 2026-04-22 15:03:35 +0100 |
|---|---|---|
| committer | David Sterba <dsterba@suse.com> | 2026-05-24 03:05:26 +0200 |
| commit | 551e510a97a487218d5f22d61d1a3388ef1171ac (patch) | |
| tree | 1306a5cf44663d96054064190283add1a8105bca /fs | |
| parent | 964f569c14d7778c8f29ce81ec35d7a8fca31adf (diff) | |
| download | linux-next-history-551e510a97a487218d5f22d61d1a3388ef1171ac.tar.gz | |
btrfs: don't force DIO writes to be serialized
Before btrfs switched to the new mount API in 2023, we were setting
SB_NOSEC in btrfs_mount_root(). This flag tells the VFS that the
filesystem may have files which don't have security xattrs, enabling it
to do some optimizations.
Unfortunately this was missed in the transition, so IS_NOSEC always
returns false for a btrfs inode. As a result, btrfs_direct_write()
always takes the inode lock exclusively, and DIO writes to the same
file are serialized.
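The chain of cause and effect described above can be sketched as a small user-space model. This is not kernel code: every `model_`/`MODEL_` name is hypothetical, the real VFS and btrfs checks involve further conditions that are omitted here, and the only behaviour taken from the commit message is that a missing SB_NOSEC on the superblock keeps IS_NOSEC false, which in turn forces the exclusive inode lock for DIO writes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's SB_NOSEC and S_NOSEC bits. */
#define MODEL_SB_NOSEC (1u << 0)
#define MODEL_S_NOSEC  (1u << 0)

struct model_sb {
	uint32_t s_flags;               /* superblock flags set at mount time */
};

struct model_inode {
	uint32_t i_flags;               /* per-inode cached flags */
	const struct model_sb *sb;      /* owning superblock */
	bool has_security_xattr;        /* assumption: known after a lookup */
};

/*
 * Simplified model: the "no security xattrs" result is only cached on
 * the inode when the superblock opted in via the NOSEC flag.
 */
static void model_cache_nosec(struct model_inode *inode)
{
	if (!inode->has_security_xattr &&
	    (inode->sb->s_flags & MODEL_SB_NOSEC))
		inode->i_flags |= MODEL_S_NOSEC;
}

/* Model of the IS_NOSEC() test on an inode. */
static bool model_is_nosec(const struct model_inode *inode)
{
	return (inode->i_flags & MODEL_S_NOSEC) != 0;
}

enum model_lock { MODEL_LOCK_EXCLUSIVE, MODEL_LOCK_SHARED };

/*
 * Model of the lock choice for a DIO write: the shared lock is only an
 * option when the inode is known to be NOSEC. Other conditions that the
 * real code checks are deliberately left out.
 */
static enum model_lock model_dio_write_lock(const struct model_inode *inode)
{
	return model_is_nosec(inode) ? MODEL_LOCK_SHARED
				     : MODEL_LOCK_EXCLUSIVE;
}
```

In this model, an inode on a superblock without the flag can never reach MODEL_LOCK_SHARED, however many times model_cache_nosec() runs, which mirrors the serialization described above.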
On my machine, this one-line change results in a ~59% improvement in DIO
throughput:
Before patch:
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
...
fio-3.39
Starting 32 processes
test: Laying out IO file (1 file / 1024MiB)
Jobs: 32 (f=32): [w(32)][100.0%][w=764MiB/s][w=195k IOPS][eta 00m:00s]
test: (groupid=0, jobs=32): err= 0: pid=586: Wed Apr 22 13:03:04 2026
write: IOPS=202k, BW=787MiB/s (826MB/s)(46.1GiB/60012msec); 0 zone resets
bw ( KiB/s): min=498714, max=1199892, per=100.00%, avg=806659.03, stdev=4229.94, samples=3808
iops : min=124677, max=299971, avg=201661.82, stdev=1057.49, samples=3808
cpu : usr=0.32%, sys=1.27%, ctx=8329204, majf=0, minf=1163
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,12094328,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=787MiB/s (826MB/s), 787MiB/s-787MiB/s (826MB/s-826MB/s), io=46.1GiB (49.5GB), run=60012-60012msec
After patch:
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
...
fio-3.39
Starting 32 processes
test: Laying out IO file (1 file / 1024MiB)
Jobs: 32 (f=32): [w(32)][100.0%][w=1255MiB/s][w=321k IOPS][eta 00m:00s]
test: (groupid=0, jobs=32): err= 0: pid=572: Wed Apr 22 13:13:46 2026
write: IOPS=320k, BW=1250MiB/s (1311MB/s)(73.3GiB/60003msec); 0 zone resets
bw ( MiB/s): min= 619, max= 2289, per=100.00%, avg=1251.28, stdev= 9.64, samples=3808
iops : min=158538, max=586025, avg=320320.80, stdev=2468.97, samples=3808
cpu : usr=0.35%, sys=11.50%, ctx=1584847, majf=0, minf=1160
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,19203309,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=1250MiB/s (1311MB/s), 1250MiB/s-1250MiB/s (1311MB/s-1311MB/s), io=73.3GiB (78.7GB), run=60003-60003msec
The script to reproduce that:
#!/bin/bash
mkfs.btrfs -f /dev/nvme0n1
mount /dev/nvme0n1 /mnt/test
mkdir /mnt/test/nocow
chattr +C /mnt/test/nocow
fio /root/test.fio
# cat /root/test.fio
[global]
rw=randwrite
ioengine=io_uring
iodepth=64
size=1g
direct=1
startdelay=20
force_async=4
ramp_time=5
runtime=60
group_reporting=1
numjobs=32
time_based
disk_util=0
clat_percentiles=0
disable_lat=1
disable_clat=1
disable_slat=1
filename=/mnt/test/nocow/fiofile
[test]
name=test
bs=4k
stonewall
This was on a VM with 8 cores and 8GB of RAM, with a real NVMe exposed
through PCI passthrough. The figures for XFS and ext4 in comparison are
both about 3GB/s.
Fixes: ad21f15b0f79 ("btrfs: switch to the new mount API")
Signed-off-by: Mark Harmstone <mark@harmstone.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'fs')
| -rw-r--r-- | fs/btrfs/super.c | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b26aa9169e838..64514d600eec7 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1873,6 +1873,7 @@ static int btrfs_get_tree_super(struct fs_context *fc)
 	fs_info->fs_devices = fs_devices;
 	mutex_unlock(&uuid_mutex);
+	fc->sb_flags |= SB_NOSEC;
 	sb = sget_fc(fc, btrfs_fc_test_super, set_anon_super_fc);
 	if (IS_ERR(sb)) {
