path: root/include
Age | Commit message | Author | Files | Lines
2 daysMerge branch 'next' of ↵Mark Brown1-3/+3
https://git.kernel.org/pub/scm/linux/kernel/git/melver/linux.git
2 daysMerge branch 'for-next/kspp' of ↵Mark Brown1-0/+65
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git
2 daysMerge branch 'bitmap-for-next' of https://github.com/norov/linux.gitMark Brown7-35/+64
2 daysMerge branch 'next' of ↵Mark Brown1-3/+32
https://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git
2 daysMerge branch 'slab/for-next' of ↵Mark Brown3-82/+268
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-3/+0
https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git
2 daysMerge branch 'kunit' of ↵Mark Brown2-0/+124
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
2 daysMerge branch 'for-next' of ↵Mark Brown3-1/+51
https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git
2 daysMerge branch 'gpio/for-next' of ↵Mark Brown7-160/+264
https://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown3-268/+381
https://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown4-7/+20
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
2 daysMerge branch 'for-next' of ↵Mark Brown2-0/+4
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git
2 daysMerge branch 'for-next' of ↵Mark Brown3-16/+45
https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git # Conflicts: # tools/testing/selftests/cgroup/test_memcontrol.c
2 daysMerge branch 'spmi-next' of ↵Mark Brown1-1/+4
https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
2 daysMerge branch 'next' of https://github.com/awilliam/linux-vfio.gitMark Brown1-1/+20
2 daysMerge branch 'togreg' of ↵Mark Brown5-70/+44
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git # Conflicts: # drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_buffer.c
2 daysMerge branch 'icc-next' of ↵Mark Brown4-0/+504
https://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc.git
2 daysMerge branch 'next' of ↵Mark Brown2-7/+22
https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git
2 daysMerge branch 'char-misc-next' of ↵Mark Brown1-86/+0
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git # Conflicts: # drivers/gpib/cb7210/cb7210.c
2 daysMerge branch 'tty-next' of ↵Mark Brown6-348/+187
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
2 daysMerge branch 'next' of ↵Mark Brown1-6/+53
https://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git # Conflicts: # drivers/thunderbolt/property.c
2 daysMerge branch 'usb-next' of ↵Mark Brown2-15/+22
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
2 daysMerge branch 'driver-core-next' of ↵Mark Brown9-96/+162
https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git
2 daysMerge branch 'for-leds-next' of ↵Mark Brown2-2/+30
https://git.kernel.org/pub/scm/linux/kernel/git/lee/leds.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-0/+2
https://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown2-3/+10
https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-0/+9
https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git
2 daysMerge branch 'next' of https://github.com/kvm-x86/linux.gitMark Brown3-8/+46
# Conflicts: # arch/x86/include/asm/tdx.h
2 daysMerge branch 'next' of ↵Mark Brown3-18/+13
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
2 daysMerge branch 'for-next' of ↵Mark Brown5-3/+10
https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
2 daysMerge branch 'edac-for-next' of ↵Mark Brown1-0/+3
https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git
2 daysMerge branch 'next' of ↵Mark Brown3-3/+48
https://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
2 daysMerge branch 'master' of ↵Mark Brown24-65/+240
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git # Conflicts: # drivers/cpufreq/Kconfig.x86 # drivers/cpufreq/Makefile
2 daysMerge branch 'for-next' of ↵Mark Brown3-5/+105
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git # Conflicts: # drivers/acpi/acpi_apd.c
2 daysMerge branch 'for-next' of ↵Mark Brown1-0/+2
https://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox.git
2 daysMerge branch 'next' of ↵Mark Brown2-8/+8
https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git
2 daysMerge branch 'next' of ↵Mark Brown5-7/+59
https://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux.git
2 daysMerge branch 'for-next-tpm' of ↵Mark Brown1-9/+12
https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd.git
2 daysMerge branch 'master' of git://git.code.sf.net/p/tomoyo/tomoyo.gitMark Brown5-11/+48
2 daysMerge branch 'next' of ↵Mark Brown5-4/+46
https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-4/+6
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git
2 daysMerge branch 'for-mfd-next' of ↵Mark Brown10-287/+822
https://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git
2 daysMerge branch 'next' of ↵Mark Brown1-1/+1
https://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc.git
2 daysMerge branch 'for-next' of ↵Mark Brown17-1376/+670
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown17-84/+2594
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-1/+4
https://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394.git
2 daysMerge branch 'for-next' of ↵Mark Brown5-4/+22
https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
2 daysMerge branch 'for-next' of ↵Mark Brown4-58/+4
https://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev.git
2 daysMerge branch 'for-linux-next' of ↵Mark Brown2-2/+6
https://gitlab.freedesktop.org/drm/rust/kernel.git # Conflicts: # rust/kernel/alloc/kbox.rs
2 daysMerge branch 'drm-xe-next' of https://gitlab.freedesktop.org/drm/xe/kernel.gitMark Brown1-16/+15
2 daysMerge branch 'for-linux-next' of ↵Mark Brown4-41/+103
https://gitlab.freedesktop.org/drm/misc/kernel.git
2 daysMerge branch 'drm-next' of https://gitlab.freedesktop.org/drm/kernel.gitMark Brown56-543/+1367
# Conflicts: # drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
2 daysMerge branch 'master' of ↵Mark Brown7-346/+53
https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
2 daysMerge branch 'spi-nor/next' of ↵Mark Brown1-2/+5
https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git
2 daysMerge branch 'nand/next' of ↵Mark Brown3-4/+37
https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git
2 daysMerge branch 'master' of ↵Mark Brown1-0/+6
https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git
2 daysMerge branch 'for-next' of ↵Mark Brown12-61/+416
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
2 daysMerge branch 'main' of ↵Mark Brown72-436/+1747
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
2 daysMerge branch 'for-next' of ↵Mark Brown5-14/+82
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
2 daysMerge branch 'devel' into for-nextLinus Walleij4-1/+89
2 daysworkqueue: Add warnings and fallback if system_{unbound}_wq is usedMarco Crivellari1-0/+1
Many users have already transitioned to the newly introduced workqueues (system_percpu_wq, system_dfl_wq), but new users still pick the older system_wq and system_unbound_wq. This change pushes the transition forward by warning when the old workqueues are used.
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2 daysMerge branch 'next' of ↵Mark Brown1-0/+16
https://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm.git
2 daysMerge branch 'linux-next' of ↵Mark Brown25-29/+41
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
2 daysMerge branch 'next' of git://linuxtv.org/media-ci/media-pending.gitMark Brown10-104/+347
2 daysMerge branch 'hwmon-next' of ↵Mark Brown2-0/+28
https://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-2/+0
https://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git
2 daysMerge branch 'next' of ↵Mark Brown5-16/+131
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
2 daysMerge branch 'fs-next' of linux-nextMark Brown44-275/+1740
# Conflicts: # fs/btrfs/defrag.c
2 daysMerge branch 'riscv-soc-for-next' of ↵Mark Brown1-0/+220
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
2 daysMerge branch 'mips-next' of ↵Mark Brown2-48/+0
https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-3/+1
https://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k.git
2 daysMerge branch 'clk-next' of ↵Mark Brown2-0/+22
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
2 daysMerge branch 'for-next' of https://github.com/Xilinx/linux-xlnx.gitMark Brown1-1/+3
2 daysMerge branch 'ti-next' of ↵Mark Brown1-5/+6
https://git.kernel.org/pub/scm/linux/kernel/git/ti/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-64/+17
https://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux.git
2 daysMerge branch 'sunxi/for-next' of ↵Mark Brown1-2/+2
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux.git
2 daysMerge branch 'for-linux-next' of ↵Mark Brown4-18/+41
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-13/+23
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown2-0/+6
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git
2 daysMerge branch 'for-next' of ↵Mark Brown13-21/+769
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown1-0/+24
https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown2-0/+111
https://git.kernel.org/pub/scm/linux/kernel/git/frank.li/linux.git
2 daysMerge branch 'for-next' of ↵Mark Brown4-1/+429
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl.git
2 daysMerge branch 'for-next' of ↵Mark Brown7-7/+74
https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
2 daysMerge branch 'mm-nonmm-unstable' of ↵Mark Brown1-6/+6
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
2 daysMerge branch 'mm-unstable' of ↵Mark Brown23-230/+292
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
2 daysMerge branch 'mm-nonmm-stable' of ↵Mark Brown11-343/+67
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
2 daysMerge branch 'mm-stable' of ↵Mark Brown11-32/+91
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
3 daysMerge branch 'fixes' of ↵Mark Brown1-2/+2
https://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git
3 daysMerge branch 'hyperv-fixes' of ↵Mark Brown3-8/+10
https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git
3 daysMerge branch 'char-misc-linus' of ↵Mark Brown1-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
3 daysMerge branch 'tty-linus' of ↵Mark Brown2-1/+13
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
3 daysMerge branch 'master' of ↵Mark Brown1-1/+2
https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git
3 daysMerge branch 'arm/fixes' of ↵Mark Brown1-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
3 daysMerge branch 'clang-fixes-for-next' of ↵Mark Brown4-0/+18
https://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux.git
3 daysMerge branch 'mm-hotfixes-unstable' of ↵Mark Brown2-16/+0
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
3 daysMerge remote-tracking branch 'spi/for-7.2' into spi-nextMark Brown3-5/+105
3 daysfirmware: samsung: acpm: remove compile-testing stubsArnd Bergmann1-14/+0
Sashiko reported an inconsistent use of NULL vs ERR_PTR() returns in the stub helpers in exynos-acpm-protocol.h. Since this only happens in dead code for COMPILE_TEST=y, it is not really a bug.
Having stub functions that return NULL is a common way to define optional interfaces, where callers still work when the feature is disabled. This clearly does not work for acpm, because some callers dereference the NULL pointer when compile testing.
CONFIG_EXYNOS_ACPM_PROTOCOL already supports compile-testing itself, and both drivers using it clearly require the support, so dropping the stubs simplifies the option space without losing any build coverage.
Remove the stub functions entirely and adjust the one Kconfig dependency to require EXYNOS_ACPM_PROTOCOL unconditionally.
Fixes: 6837c006d4e7 ("firmware: exynos-acpm: add empty method to allow compile test")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org
Link: https://lore.kernel.org/all/a7994860-24a3-4f87-84bf-109ed653dda4@linaro.org/
Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260529134454.2147446-1-arnd@kernel.org
[krzk: Rebase on difference in devm_acpm_get_by_node()]
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
3 daysfirmware: samsung: acpm: Add devm_acpm_get_by_phandle helperTudor Ambarus1-0/+6
Introduce devm_acpm_get_by_phandle() to standardize how consumer drivers acquire a handle to the ACPM IPC interface. Enforce the use of the "samsung,acpm-ipc" property name across the SoC and simplify the boilerplate code in client drivers.
The first consumer of this helper is the Exynos ACPM Thermal Management Unit (TMU) driver. The TMU utilizes a hybrid management approach: direct register access from the Application Processor (AP) is restricted to the interrupt pending (INTPEND) registers for event identification. High-level functional tasks, such as sensor initialization, threshold programming, and temperature reads, are delegated to the ACPM firmware via this IPC interface.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://patch.msgid.link/20260515-acpm-tmu-helpers-v2-6-8ca011d5a965@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
3 daysfirmware: samsung: acpm: Add TMU protocol supportTudor Ambarus1-0/+18
The Thermal Management Unit (TMU) on the Google GS101 SoC is managed through a hybrid model shared between the kernel and the Alive Clock and Power Manager (ACPM) firmware.
Add the protocol helpers required to communicate with the ACPM for thermal operations, including initialization, threshold configuration, temperature reading, and system suspend/resume handshakes.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://patch.msgid.link/20260515-acpm-tmu-helpers-v2-5-8ca011d5a965@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
3 daysfirmware: samsung: acpm: Make acpm_ops const and access via pointerTudor Ambarus1-2/+2
Replace the embedded `struct acpm_ops` inside `struct acpm_handle` with a pointer to a `const struct acpm_ops`.
Previously, the operations structure was embedded directly within the handle and populated dynamically at runtime via `acpm_setup_ops()`. This resulted in mutable function pointers and unnecessary per-instance memory overhead.
By defining `exynos_acpm_driver_ops` statically as a `const` structure, the function pointers are now safely housed in the read-only `.rodata` section. This improves security by preventing function pointer overwrites, saves memory, and slightly reduces initialization overhead in `acpm_probe()`.
Consequently, update all consumer drivers (clk, mfd) to access the operations via the new pointer indirection (`->ops->`).
Finally, fix the previously empty kernel-doc description for the ops member to reflect its new pointer nature.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://patch.msgid.link/20260515-acpm-tmu-helpers-v2-4-8ca011d5a965@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
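The embedded-versus-pointer change described in this commit message can be illustrated with a small userspace sketch. It is not the ACPM code: every `demo_*` name, the `set_rate` callback and its signature are hypothetical stand-ins chosen for illustration. Only the general pattern (one statically defined `const` ops table, reached through a pointer in each handle) follows the commit text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real ACPM types; names are illustrative only. */
struct demo_handle;

struct demo_dvfs_ops {
	int (*set_rate)(struct demo_handle *h, unsigned int clk_id,
			unsigned long rate);
};

struct demo_ops {
	struct demo_dvfs_ops dvfs;
};

struct demo_handle {
	const struct demo_ops *ops;	/* pointer to one shared, read-only table */
	unsigned long last_rate;	/* per-instance state stays in the handle */
};

static int demo_set_rate(struct demo_handle *h, unsigned int clk_id,
			 unsigned long rate)
{
	(void)clk_id;
	h->last_rate = rate;
	return 0;
}

/*
 * Defined once at file scope. A const object with static storage duration
 * is typically placed in a read-only section by the toolchain.
 */
static const struct demo_ops demo_driver_ops = {
	.dvfs = { .set_rate = demo_set_rate },
};

/* Initialization only stores a pointer; no per-handle copy of the table. */
static void demo_handle_init(struct demo_handle *h)
{
	h->ops = &demo_driver_ops;
	h->last_rate = 0;
}

/* A consumer goes through the extra indirection: handle->ops->dvfs. */
static int demo_consumer_set_rate(struct demo_handle *h, unsigned long rate)
{
	assert(h->ops != NULL);
	return h->ops->dvfs.set_rate(h, 0, rate);
}
```

In this sketch every handle shares the same table, so adding handles costs one pointer each, and an attempt to assign through `h->ops` is rejected at compile time because the pointee is `const`.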
3 daysfirmware: samsung: acpm: Drop redundant _ops suffix in acpm_ops membersTudor Ambarus1-2/+2
Rename the `dvfs_ops` and `pmic_ops` members of `struct acpm_ops` to `dvfs` and `pmic` respectively.
Since these members are housed within the `acpm_ops` structure and utilize the `acpm_*_ops` types, the `_ops` suffix on the variable names creates unnecessary redundancy (e.g., `handle.ops.dvfs_ops`). This cleanup removes the stuttering, leading to cleaner consumer code.
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Peter Griffin <peter.griffin@linaro.org>
Acked-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/linux-samsung-soc/CADrjBPqzKpcd9vuCmNUptCUPyPpPbHcc19-7kN-1c0RpW1e5DQ@mail.gmail.com/T/#mcce154a7e0c6cd1ca6cd5a1e37541ed7a85a84d4 [1]
Link: https://patch.msgid.link/20260515-acpm-tmu-helpers-v2-3-8ca011d5a965@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
3 daysnext-20260522/vfs-braunerMark Brown21-74/+223
# Conflicts: # fs/fuse/dev.c
3 daysMerge branch '9p-next' of https://github.com/martinetd/linuxMark Brown1-0/+2
3 daysMerge trace/for-nextSteven Rostedt (Google)3-3/+6
3 daysMerge ring-buffer/for-nextSteven Rostedt (Google)54-111/+493
3 daysMerge branch 'acpica' into linux-nextRafael J. Wysocki22-27/+33
* acpica: (27 commits) ACPICA: add boundary checks in two places ACPICA: Add package limit checks in parser functions ACPICA: Update version to 20260408 ACPICA: Update the copyright year to 2026 ACPICA: Remove spurious precision from format used to dump parse trees ACPICA: Enhance OEM ID and Table ID validation in acpi_ex_load_table_op() ACPICA: Fix NULL pointer dereference in acpi_ns_custom_package() ACPICA: Enhance buffer validation in acpi_ut_walk_aml_resources() ACPICA: Add validation for node in acpi_ns_build_normalized_path() ACPICA: validate handler object type in two places ACPICA: Improve argument parsing in acpi_ps_get_next_simple_arg() ACPICA: Fix integer overflow in acpi_ex_opcode_3A_1T_1R() (mid_op) ACPICA: Prevent adding invalid references ACPICA: add boundary checks in acpi_ps_get_next_field() ACPICA: validate byte_count in acpi_ps_get_next_package_length() ACPICA: Fix use-after-free in acpi_ds_terminate_control_method() ACPICA: fix I2C LVR item count in the conversion table ACPICA: Mention the LVR bits ACPICA: Change LVR to 8 bit value ACPICA: Fetch LVR I2C resource descriptor ...
3 daysMerge branches 'thermal' and 'thermal-intel' into linux-nextRafael J. Wysocki1-1/+2
* thermal:
  thermal: sysfs: remove space before tab in macro
  thermal: core: Add WQ_UNBOUND to alloc_workqueue() users
  thermal/core: Populate max_state before setting up cooling dev sysfs
  thermal/core: Split __thermal_cooling_device_register() into two functions
  thermal: hwmon: Use extra_groups for adding temperature attributes
  thermal: hwmon: Register a hwmon device for each thermal zone
  thermal: hwmon: Fix critical temperature attribute removal
  thermal/core: Use the thermal class pointer as init guard
  thermal/core: Allocate the thermal class dynamically
  thermal/core: Add dedicated release callback for thermal zones
  thermal/core: Add dedicated release callback for cooling devices
  thermal: core: Simplify unregistration of governors
  thermal: core: Remove dead code from two functions
* thermal-intel:
  thermal: intel: int340x: Check return value of ptc_create_groups()
  thermal: intel: int340x: Fix potential shift overflow in ptc_mmio_write()
3 daysMerge branches 'pm-cpufreq' and 'pm-cpuidle' into linux-nextRafael J. Wysocki1-1/+4
* pm-cpufreq:
  cpufreq: governor: Fix stale prev_cpu_nice spike when enabling ignore_nice_load
  cpufreq: governor: Fix data races on per-CPU idle/nice baselines
  cpufreq: intel_pstate: Improve warning message on HWP-disabled hybrid CPUs
  cpufreq: elanfreq: Drop support for AMD Elan SC4*
  cpufreq: clean up dead dependencies on X86 in Kconfig
  cpufreq: conservative: Simplify frequency limit handling
  cpufreq: Avoid redundant target() calls for unchanged limits
  cpufreq: Fix typo in comment
  cpufreq: intel_pstate: Sync policy->cur during CPU offline
  cpufreq: Documentation: fix sampling_down_factor range
  cpufreq: Fix hotplug-suspend race during reboot
  cpufreq: pcc: fix use-after-free and double free in _OSC evaluation
* pm-cpuidle:
  intel_idle: Drop C-states redundant when PC6 is disabled
  intel_idle: Introduce a helper for checking PC6
  intel_idle: Add constants for MSR_PKG_CST_CONFIG_CONTROL
3 daysMerge branch 'acpi-driver' into linux-nextRafael J. Wysocki1-0/+2
* acpi-driver: ACPI: video: Do not initialise device_id_scheme directly ACPI: video: Switch over to devres-based resource management ACPI: video: Use devm for video->entry and backlight cleanup ACPI: video: Use devm action for freeing video devices ACPI: video: Use devm action for video bus object cleanup ACPI: video: Rearrange probe and remove code ACPI: video: Reduce the number of auxiliary device dereferences ACPI: PAD: Switch over to devres-based resource management ACPI: PAD: Fix teardown ordering in acpi_pad_remove() ACPI: PAD: Pass struct device pointer to acpi_pad_notify() ACPI: PAD: Rearrange acpi_pad_notify() ACPI: thermal: Switch over to devres-based resource management ACPI: HED: Switch over to devres-based resource management ACPI: HED: Refine guarding against adding a second instance ACPI: battery: Switch over to devres-based resource management ACPI: AC: Switch over to devres-based resource management ACPI: NFIT: core: Use devm_acpi_install_notify_handler() ACPI: bus: Introduce devm_acpi_install_notify_handler() ACPI: PMIC: Replace mutex_lock/unlock() with guard()/scoped_guard()
3 daysMerge branch 'acpi-driver-devm'Rafael J. Wysocki1-0/+2
Merge updates that introduce devm_acpi_install_notify_handler() and convert some drivers for core ACPI devices previously using acpi_dev_install_notify_handler() to devres-based resource management. * acpi-driver-devm: ACPI: video: Switch over to devres-based resource management ACPI: video: Use devm for video->entry and backlight cleanup ACPI: video: Use devm action for freeing video devices ACPI: video: Use devm action for video bus object cleanup ACPI: video: Rearrange probe and remove code ACPI: video: Reduce the number of auxiliary device dereferences ACPI: PAD: Switch over to devres-based resource management ACPI: PAD: Fix teardown ordering in acpi_pad_remove() ACPI: PAD: Pass struct device pointer to acpi_pad_notify() ACPI: PAD: Rearrange acpi_pad_notify() ACPI: thermal: Switch over to devres-based resource management ACPI: HED: Switch over to devres-based resource management ACPI: HED: Refine guarding against adding a second instance ACPI: battery: Switch over to devres-based resource management ACPI: AC: Switch over to devres-based resource management ACPI: NFIT: core: Use devm_acpi_install_notify_handler() ACPI: bus: Introduce devm_acpi_install_notify_handler()
3 daysMerge back earlier cpufreq material for 7.2Rafael J. Wysocki1-1/+4
3 daysMerge branch 'nfsd-next' of ↵Mark Brown18-157/+582
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux # Conflicts: # fs/exfat/file.c
3 daysMerge branch 'dev' of ↵Mark Brown1-0/+28
https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
3 daysMerge branch 'slab/for-7.2/alloc_bulk' into slab/for-nextVlastimil Babka (SUSE)1-2/+4
3 daysmm/slab: improve kmem_cache_alloc_bulkChristoph Hellwig1-2/+4
The kmem_cache_alloc_bulk return value is weird. It returns the number of allocated objects, but based on the implementations and the handling in the callers, that number must always be either 0 or the requested count. This assumption is not documented anywhere, which confuses automated review tools.
Fix this by returning a bool that indicates whether the allocation succeeded, and by adding a kerneldoc comment explaining the API.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> # skbuff
Link: https://patch.msgid.link/20260528093437.2519248-2-hch@lst.de
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
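The all-or-nothing contract described in this commit message can be sketched in userspace C. This is not the slab allocator: `demo_alloc_bulk()` and `demo_free_bulk()` are hypothetical helpers built on `malloc()`, and they only illustrate why a `bool` return is enough when a bulk allocation either fills every slot or leaves none allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of an all-or-nothing bulk allocation.
 * Returns true only if all nr objects were allocated; on failure every
 * object allocated so far is freed and its slot is reset to NULL.
 * obj_size is assumed to be non-zero, since malloc(0) may return NULL.
 */
static bool demo_alloc_bulk(size_t obj_size, size_t nr, void **objs)
{
	size_t i;

	assert(objs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		objs[i] = malloc(obj_size);
		if (!objs[i])
			goto undo;
	}
	return true;

undo:
	/* Slot i failed; release slots i-1 down to 0. */
	while (i--) {
		free(objs[i]);
		objs[i] = NULL;
	}
	return false;
}

/* Release every object of a successful bulk allocation. */
static void demo_free_bulk(size_t nr, void **objs)
{
	for (size_t i = 0; i < nr; i++) {
		free(objs[i]);
		objs[i] = NULL;
	}
}
```

A caller of such a helper checks a single condition, `if (!demo_alloc_bulk(size, nr, objs))`, and never needs to handle a partially filled array, which is the property the commit message says was previously implicit.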
3 daysMerge branch 'dev' of ↵Mark Brown2-24/+4
https://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat.git
3 daysMerge branch 'for-next' of ↵Mark Brown2-20/+901
https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
3 daysMerge branch into tip/master: 'x86/cache'Ingo Molnar1-7/+11
# New commits in x86/cache: 1cfa74c683ea ("fs/resctrl: Document tasks file behaviour for task id 0 and idle tasks") 9a1646211f8c ("fs/resctrl: Document that automatic counter assignment is best effort") 3aec86e4ea01 ("fs/resctrl: Continue counter allocation after failure") ee3d4c81d89c ("fs/resctrl: Add monitor property 'mbm_cntr_assign_fixed'") f52abe650241 ("fs/resctrl: Disallow the software controller when MBM counters are assignable") 94a1206522d1 ("x86,fs/resctrl: Create 'event_filter' files read only if they're not configurable") 7625632fed43 ("fs/resctrl: Tidy up the error path in resctrl_mkdir_event_configs()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'sched/core'Ingo Molnar7-6/+113
# New commits in sched/core: 5ad278dd20bd ("sched: Remove sched_class::pick_next_task()") b3a2dfa8b42e ("sched/fair: Add newidle balance to pick_task_fair()") e05777c44e53 ("sched/debug: Collapse subsequent CONFIG_SCHED_CLASS_EXT sections") 775570022345 ("sched: Use {READ,WRITE}_ONCE() for preempt_dynamic_mode") 333f6f0e11ac ("sched/debug: Use char * instead of char (*)[]") 25139c11693a ("sched/fair: Fix RCU usage in NOHZ exit path on CPU offline") 9e005ed21152 ("sched/topology: Allow multiple domains to claim sched_domain_shared") dd29c017aed6 ("sched/rt: Have RT_PUSH_IPI be default off for non PREEMPT_RT") 04f80f8b12a0 ("sched: Switch rq->next_class on proxy_resched_idle()") 61ea17a63719 ("sched/fair: Add SIS_UTIL support to select_idle_capacity()") bf6aa722198d ("sched/fair: Reject misfit pulls onto busy SMT siblings on asym-capacity") 25a32e400a14 ("sched/fair: Prefer fully-idle SMT cores in asym-capacity idle selection") fdfe5a8cd873 ("sched/fair: Attach sched_domain_shared to sd_asym_cpucapacity") c9d93a73ce87 ("sched/fair: Drop redundant RCU read lock in NOHZ kick path") acbdbab75ff4 ("sched: Unify SMT active check via sched_smt_active()") 3dbb362f90f3 ("sched/fair: Add sched_smt_active check for fastpaths") 5bc6ab2d42e5 ("sched: Simplify ifdeffery around cpu_smt_mask") 815c5cb76a3e ("topology: Introduce cpu_smt_mask for CONFIG_SCHED_SMT=n") 6d2051403d6c ("sched/fair: Update util_est after updating util_avg during dequeue") ea19506013ad ("sched/clock: Provide !HAVE_UNSTABLE_SCHED_CLOCK stub for sched_clock_stable()") 95f44886afec ("sched/cputime: Drop now-stale mul_u64_u64_div_u64() over-approximation guard") eecd5e117cfa ("sched/deadline: Fix replenishment logic for non-deferred servers") c2e390197ad1 ("sched/rt: Update default bandwidth for real-time tasks to ONE") c99b8593b060 ("sched/cache: Fix stale preferred_llc for a new task") a7660ce1590f ("sched/cache: Fix has_multi_llcs iff at least one partition has multiple LLCs") 5beff4f08727 ("sched/cache: Fix 
cache aware scheduling enabling for multi LLCs system") 9f7c745850b4 ("sched/cache: Fix race condition during sched domain rebuild") d6b9afab44e2 ("sched/cache: Fix checking active load balance by only considering the CFS task") 03755348b8e7 ("sched/cache: Fix unpaired account_llc_enqueue/dequeue") 91d07324c930 ("sched/cache: Annotate lockless accesses to mm->sc_stat.cpu") 9f23469401b0 ("sched/cache: Fix potential NULL mm pointer access") d943b86dfbf4 ("sched/cache: Fix rcu warning when accessing sd_llc domain") c1e7fe5e75ed ("sched/cache: Add user control to adjust the aggressiveness of cache-aware scheduling") 808915f982c2 ("sched/cache: Avoid cache-aware scheduling for memory-heavy processes") 7030513a0877 ("sched/cache: Calculate the LLC size and store it in sched_domain") 7b34bb1ca324 ("sched/cache: Skip cache-aware scheduling for single-threaded processes") deee5e27d5b6 ("sched/cache: Disable cache aware scheduling for processes with high thread counts") a2b4cf39d9d3 ("sched/cache: Allow only 1 thread of the process to calculate the LLC occupancy") 4ac4d6549a65 ("sched: Use trace_call__<tp>() to save a static branch") 067a31358143 ("sched/cache: Allow the user space to turn on and off cache aware scheduling") d59f4fd1d303 ("sched/cache: Enable cache aware scheduling for multi LLCs NUMA node") 5b1d5e6db20a ("sched/cache: Respect LLC preference in task migration and detach") 714059f79ff0 ("sched/cache: Handle moving single tasks to/from their preferred LLC") e4c9a4cb244a ("sched/cache: Add migrate_llc_task migration type for cache-aware balancing") f38cc2f0d8a3 ("sched/cache: Prioritize tasks preferring destination LLC during balancing") 9a5e22fbb0c8 ("sched/cache: Check local_group only once in update_sg_lb_stats()") 15ad45fb80ca ("sched/cache: Count tasks prefering destination LLC in a sched group") 82c960aee304 ("sched/cache: Calculate the percpu sd task LLC preference") a8d0ca0b7f2f ("sched/cache: Introduce per CPU's tasks LLC preference counter") 
46afe3af7ead ("sched/cache: Track LLC-preferred tasks per runqueue") 47d8696b95f7 ("sched/cache: Assign preferred LLC ID to processes") b5ea300a17e3 ("sched/cache: Make LLC id continuous") 23b2b5ccc45c ("sched/cache: Introduce helper functions to enforce LLC migration policy") f025ef275388 ("sched/cache: Record per LLC utilization to guide cache aware scheduling decisions") b4606faab318 ("sched/cache: Limit the scan number of CPUs when calculating task occupancy") df0d98475954 ("sched/cache: Introduce infrastructure for cache-aware load balancing") abb12b9b52cf ("x86/topology: Add paramter to split LLC") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'locking/core'Ingo Molnar3-12/+21
# New commits in locking/core:
88331c4ec23a ("seqlock: Allow UBSAN_ALIGNMENT to fail optimizing")
a9e4e50519e9 ("locking/rtmutex: Annotate API and implementation")
03240f5de2dd ("selftests/membarrier: Add rseq stress test for CFS throttle interactions")
a5959728548c ("sched/membarrier: Modernize membarrier_global_expedited with cleanup guards")
89976cd73739 ("sched/membarrier: Use per-CPU mutexes for targeted commands")
b00192d78bb4 ("locking/barrier: Use correct parameter names")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'locking/context'Ingo Molnar1-8/+22
# New commits in locking/context: f45c5c4adb27 ("compiler-context-analysis: Bump required Clang version to 23") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'irq/msi'Ingo Molnar1-1/+1
# New commits in irq/msi: 3661d5f40376 ("genirq/msi: Fix typos in msi_domain_ops comment") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'irq/drivers'Ingo Molnar2-3/+3
# New commits in irq/drivers: e61654fbc3bc ("irqchip/gic-v4: Don't advertise VLPIs if no ITS is probed") 5fd6f2154734 ("irqchip/gic-v3-its: Use FIELD_MODIFY()") 2ee2a685ee83 ("irqchip/econet-en751221: Support MIPS 34Kc VEIC mode") 02bea6ff684b ("dt-bindings: interrupt-controller: econet: Add CPU interrupt mapping") 5b9cb104594f ("irqchip/meson-gpio: Add support for Amlogic A9 SoCs") f51c99a0e502 ("dt-bindings: interrupt-controller: Add support for Amlogic A9 SoCs") e8d3dcdf9f57 ("irqchip/meson-gpio: Use the correct register in meson_s4_gpio_irq_set_type()") 8b9db6739610 ("irqchip/starfive: Fix error check for devm_platform_ioremap_resource()") 76841b0ea8be ("irqchip/qcom: Unify user-visible "Qualcomm" name") 5a59e82f95d3 ("irqchip/gic: Replace __ASSEMBLY__ with __ASSEMBLER__") 96c0c9b48850 ("irqchip/starfive: Implement irq_set_type() and irq_ack() callbacks") 5d1b12880fd8 ("irqchip/starfive: Increase the interrupt source number up to 64") 2f59ca185497 ("irqchip/starfive: Use devm_ interfaces to simplify resource release") ac2005bba8d9 ("irqchip/starfive: Rename jh8100 to jhb100") a540d544db1c ("dt-bindings: interrupt-controller: Repurpose binding for unreleased jh8100 for jhb100") d3587cc4a5e6 ("irqchip/aspeed-intc: Remove AST2700-A0 support") 46e39ee92d14 ("irqchip/ast2700-intc: Add KUnit tests for route resolution") 07825e41519a ("irqchip/ast2700-intc: Add AST2700-A2 support") 51561ad8c89c ("dt-bindings: interrupt-controller: Describe AST2700-A2 hardware instead of A0") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'irq/core'Ingo Molnar3-3/+7
# New commits in irq/core: 171cc0d9eed1 ("genirq/proc: Speed up /proc/interrupts iteration") 61b51a167c52 ("genirq/proc: Runtime size the chip name") 7603e0575d8a ("genirq: Expose irq_find_desc_at_or_after() in core code") 1d9c4745bfb6 ("genirq: Add rcuref count to struct irq_desc") 34594da7650d ("genirq/proc: Increase default interrupt number precision to four") 2d62735f1d4a ("genirq: Calculate precision only when required") 4892e5e71ec9 ("genirq: Cache the condition for /proc/interrupts exposure") 3ba92f6a2820 ("genirq/manage: Make NMI cleanup RT safe") b99dc723b12e ("genirq: Expose nr_irqs in core code") cca5e6fa791b ("scripts/gdb: Update x86 interrupts to the array based storage") d6b70b16b4e7 ("x86/irq: Move IOAPIC misrouted and PIC/APIC error counts into irq_stats") 8713f2e596a1 ("x86/irq: Suppress unlikely interrupt stats by default") 2b57c69917ee ("x86/irq: Make irqstats array based") 0179464391af ("genirq/proc: Utilize irq_desc::tot_count to avoid evaluation") 95c33a64f203 ("genirq/proc: Avoid formatting zero counts in /proc/interrupts") 115bbf0c1b60 ("x86/irq: Optimize interrupts decimals printing") c2c7983c93f5 ("genirq/proc: Size interrupt directory names for 10-digit interrupt numbers") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'timers/merge'Ingo Molnar6-25/+62
# New commits in timers/merge: 3eb4923e6851 ("clocksource: Add devm_clocksource_register_*() helpers") c8d32a0389fb ("timers: Fix flseep() typo in kernel-doc comment") 5d330d652d7a ("hrtimer: Fix the bogus return type of __hrtimer_start_range_ns()") 3af1f49f415d ("hrtimer: Return ktime_t from hrtimer_get_next_event()/hrtimer_next_event_without()") 33d4bfc49613 ("clocksource: Clean up clocksource_update_freq() functions") ed3b3c497668 ("alarmtimer: Remove stale return description from alarm_handle_timer()") b00385b8d081 ("selftests/posix_timers: Use CLOCK_THREAD_CPUTIME_ID for ITIMER_PROF measurements") cab0cd0130eb ("scripts/timers: Add timer_migration_tree.py") 5a7dfbcbbdb6 ("timers/migration: Handle capacity in connect tracepoints") 098cbaad8e57 ("timers/migration: Split per-capacity hierarchies") 3ba25488380f ("timers/migration: Track CPUs in a hierarchy") ff65875f80d1 ("timers/migration: Abstract out hierarchy to prepare for CPU capacity awareness") ed78a7019419 ("alarmtimer: Remove unused interfaces") 12e4311aa5b2 ("netfilter: xt_IDLETIMER: Switch to alarm_start_timer()") 9fa2e38ab749 ("power: supply: charger-manager: Switch to alarm_start_timer()") 7dda99952ced ("fs/timerfd: Use the new alarm/hrtimer functions") f4b58f61da79 ("alarmtimer: Convert posix timer functions to alarm_start_timer()") 183d00b72713 ("alarmtimer: Provide alarm_start_timer()") acc071343d29 ("posix-timers: Switch to hrtimer_start_expires_user()") cfb7fe3fdd4c ("posix-timers: Handle the timer_[re]arm() return value") 6fdb2677a594 ("posix-timers: Expand timer_[re]arm() callbacks with a boolean return value") b40c927345a9 ("hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers") bd5956166d20 ("hrtimer: Provide hrtimer_start_range_ns_user()") 68ed094971b0 ("clocksource/drivers/timer-of: Make the code compatible with modules") 2423405880c2 ("clocksource/drivers/mmio: Make the code compatible with modules") fed9f727cc3f ("clocksource/drivers/sun5i: Handle error returns from 
devm_reset_control_get_optional_exclusive()") 045a9dac7eb7 ("clocksource/drivers/timer-rtl-otto: Make rttm_cs variable static") b385caf91868 ("dt-bindings: timer: fsl,imxgpt: add compatible string fsl,imx25-epit") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 dayscrypto: af_alg - Drop support for off-CPU cryptographyDemi Marie Obenour1-1/+13
AF_ALG is deprecated and exposed to unprivileged userspace. Only use the least buggy algorithm implementations: the pure software ones. This removes one of the main advantages of AF_ALG, which is the ability to use it with off-CPU accelerators. However, using off-CPU accelerators has huge overheads, both in performance and attack surface. I have yet to see real-world, performance-critical workloads where using an accelerator via AF_ALG is actually a win over doing cryptography in userspace. If using an off-CPU accelerator really does turn out to be a win, a new API should be developed that is actually a good fit for it. Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 daysnet: Remove support for AIO on socketsDemi Marie Obenour2-5/+1
The only user of msg->msg_iocb was AF_ALG, but that's deprecated. It can be removed entirely at the cost of only supporting synchronous operations. This doesn't break userspace, which will silently block (for a bounded amount of time) in io_submit instead of operating asynchronously. This also makes struct msghdr smaller, helping every other caller of sendmsg(). Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 dayscrypto: hisilicon/qm - support doorbell enable controlZongyu Wu1-0/+12
The driver notifies the hardware of pending tasks through a doorbell, which is currently enabled by default. Doorbells sent while the hardware is being reset can cause the hardware to process them and trigger new errors; for example, a virtual machine may keep sending doorbells while the physical machine is resetting the device. Therefore, the driver disables the doorbell while the hardware is unavailable and re-enables it once hardware initialization completes. Any task sent during the unavailability period returns an error. The hardware allows the PF to disable doorbells for all functions, while a VF can only disable its own doorbell. When the PF is reset, it disables doorbells for all functions. When a VF is reset, it only disables its own doorbell and does not affect tasks on other functions. Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 dayscrypto: hisilicon/qm - support function-level error resetZhushuai Yin1-0/+1
When executing operations on crypto devices, hardware errors are inevitable. For certain errors, a full device reset is required to recover. However, in certain cases, only a specific function may fail, while other functions can still operate normally. A system-wide RAS reset in such cases would unnecessarily impact functioning components. This patch introduces function-level granularity handling, enabling targeted resets of only the error-reporting functions without affecting other operational functions. Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com> Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 dayscrypto: hisilicon/qm - place the interrupt status interface after the PM ↵Zhushuai Yin1-1/+0
usage counter To avoid accessing memory of a suspended device, and since the counter interface used by PM involves sleep operations, the counter interface cannot be placed in the interrupt top half. Therefore, the interface for acquiring the interrupt status in the RAS reset flow that resides in the interrupt context needs to be moved to the bottom half for processing. Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com> Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 dayscrypto: hisilicon/qm - allow VF devices to query hardware isolation statusZhushuai Yin1-0/+1
Previously, a VF device could not obtain the isolation status and isolation threshold of the device. The accelerator driver can now query the device isolation status and threshold via the VF device using the fault query sysfs interface under uacce. Note that only the PF device supports isolation policy configuration, while the VF device is limited to read-only query operations. Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com> Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 dayserr.h: use __always_inline on all error pointer helpersArnd Bergmann1-6/+6
While testing randconfig builds on s390, I came across a link failure with CONFIG_DMA_SHARED_BUFFER disabled: ERROR: modpost: "dma_buf_put" [drivers/iommu/iommufd/iommufd.ko] undefined! The problem here is that IS_ERR() is not inlined and dead code elimination fails as a consequence. The err.h helpers all turn into a trivial assignment of a bit mask and should never result in a function call, so force them to always be inline. This should generally result in better object code aside from avoiding the link failure above. Link: https://lore.kernel.org/20260526101851.2495110-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Ansuel Smith <ansuelsmth@gmail.com> Cc: Bjorn Andersson <andersson@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
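As an illustration of the helpers this commit touches, here is a minimal userspace sketch of the error-pointer pattern. It is not the kernel header: the `demo_` names are hypothetical, the sketch assumes the common convention that the top 4095 addresses encode negative errno values, and `__attribute__((always_inline))` is the GCC/Clang attribute assumed to stand in for the kernel's `__always_inline`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: errno values occupy the top 4095 addresses. */
#define DEMO_MAX_ERRNO 4095

/* Force inlining so the helper never becomes an out-of-line call. */
#define demo_always_inline static inline __attribute__((always_inline))

/* Encode a negative errno value as a pointer. */
demo_always_inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value back out of an error pointer. */
demo_always_inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer falls inside the reserved errno range. */
demo_always_inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

Each helper reduces to a cast or a single comparison. When such a helper is guaranteed to be inlined, the compiler can see through it and drop branches that are provably dead, which is the property the link failure described above depended on.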
3 daysfooAndrew Morton11-343/+67
3 daysmm: document the folio refcount a little betterMatthew Wilcox (Oracle)1-0/+18
Expand the documentation of folio_ref_count() to talk about expected, temporary and spurious refcounts as well as the concept of freezing. Link: https://lore.kernel.org/20260526200032.353868-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayslib: split codetag_lock_module_list()Bart Van Assche1-1/+2
Letting a function argument indicate whether a lock or unlock operation should be performed is incompatible with compile-time analysis of locking operations by sparse and Clang. Hence, split codetag_lock_module_list() into two functions: a function that locks cttype->mod_lock and another function that unlocks cttype->mod_lock. No functionality has been changed. See also commit 916cc5167cc6 ("lib: code tagging framework"). Link: https://lore.kernel.org/20260324214226.3684605-1-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
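The difference between the two shapes can be sketched in plain C. This is a toy model, not the kernel code: the commit message does not name the two new functions, so every identifier below (`demo_cttype`, `demo_module_list_lock()`, and so on) is hypothetical, and a C11 atomic flag stands in for the real lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a type that owns a module-list lock. */
struct demo_cttype {
	atomic_bool mod_lock;	/* toy spinlock: true while held */
};

static struct demo_cttype demo_type;	/* static storage: starts unlocked */

/* Old shape: the boolean decides at run time whether this locks or unlocks. */
static void demo_lock_module_list(struct demo_cttype *t, bool lock)
{
	if (lock) {
		while (atomic_exchange(&t->mod_lock, true))
			;	/* spin until the previous holder releases */
	} else {
		atomic_store(&t->mod_lock, false);
	}
}

/* New shape: each function has exactly one, statically known, effect. */
static void demo_module_list_lock(struct demo_cttype *t)
{
	while (atomic_exchange(&t->mod_lock, true))
		;	/* spin until the previous holder releases */
}

static void demo_module_list_unlock(struct demo_cttype *t)
{
	atomic_store(&t->mod_lock, false);
}

/* Helper for inspection only; not part of either shape. */
static bool demo_module_list_is_locked(struct demo_cttype *t)
{
	return atomic_load(&t->mod_lock);
}
```

With the old shape, a static analyzer has to know the value of `lock` at every call site to decide whether the lock is held afterwards. With the split shape, the function name alone carries that information, so each function can be annotated as acquiring or releasing the lock.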
3 daysmm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAXJP Kobryn (Meta)1-4/+4
compact_gap() returns 2 << order, which is used as watermark headroom in __compaction_suitable() and as a reclaim target in kswapd. The computed value scales exponentially by order. For order-9 THP allocations this evaluates to 1024 pages, but the compaction free scanner's working set is bounded by COMPACT_CLUSTER_MAX (32 pages). The scanner stops isolating free pages once it matches the migration batch. The current gap over-reserves by 32x. On fragmented production hosts, kswapd will try and reclaim up to the gap, but it only reaches that threshold 18% of the time, causing reclaim to continue a majority of the time. The over-sized gap also causes 46% of order-9 compaction suitability checks to fail unnecessarily - the zone has sufficient free pages for the scanner to operate, but not enough to clear the inflated threshold. Cap compact_gap() at COMPACT_CLUSTER_MAX to align the watermark headroom with the scanner's actual capacity. Orders 0-4 are unaffected since their gap is <= 32. A/B test on ~100 Instagram production hosts (64GB, 60s measurement):

Metric                                  Unpatched (43 hosts)  Patched (59 hosts)
pgscan_kswapd (mean/host)               ~1.6M                 ~449K
reclaim efficiency (steal/scan)         83.8%                 91.0%
compaction success (success/stall)      2.1%                  28.3%
THP success (alloc/alloc+fallback)      4.9%                  17.2%
forced lru_add_drain (mean/host)        ~107K                 ~64K

Link: https://lore.kernel.org/20260519200851.141955-1-jp.kobryn@linux.dev Signed-off-by: JP Kobryn (Meta) <jp.kobryn@linux.dev> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
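The arithmetic in this commit message can be checked with a small standalone sketch. It only mirrors the formula and the cap value quoted above. The `demo_` names are hypothetical, and the real kernel function may be written differently.

```c
/* Cap value quoted in the commit message (32 pages). */
#define DEMO_COMPACT_CLUSTER_MAX 32UL

/* Behaviour before the change: headroom doubles with every order. */
static unsigned long demo_compact_gap_uncapped(unsigned int order)
{
	return 2UL << order;
}

/* Capped variant: never ask for more than the scanner's batch size. */
static unsigned long demo_compact_gap_capped(unsigned int order)
{
	unsigned long gap = 2UL << order;

	return gap < DEMO_COMPACT_CLUSTER_MAX ? gap : DEMO_COMPACT_CLUSTER_MAX;
}
```

For order 9 the uncapped value is 2 << 9 = 1024 pages against a cap of 32, which is the 32x over-reservation mentioned above. For order 4 the uncapped value is already 2 << 4 = 32, so orders 0-4 give the same result either way.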
3 daysmm/alloc_tag: replace fixed-size early PFN array with dynamic linked listHao Ge1-2/+2
Pages allocated before page_ext is available have their codetag left uninitialized. Track these early PFNs and clear their codetag in clear_early_alloc_pfn_tag_refs() to avoid "alloc_tag was not set" warnings when they are freed later. Currently a fixed-size array of 8192 entries is used, with a warning if the limit is exceeded. However, the number of early allocations depends on the number of CPUs and can be larger than 8192. Replace the fixed-size array with a dynamically allocated linked list of pfn_pool structs. Each node is allocated via alloc_page() and mapped to a pfn_pool containing a next pointer, an atomic slot counter, and a PFN array that fills the remainder of the page. The tracking pages themselves are allocated via alloc_page(), which would trigger __pgalloc_tag_add() -> alloc_tag_add_early_pfn() and recurse indefinitely. Introduce __GFP_NO_CODETAG (reuses the %__GFP_NO_OBJ_EXT bit) and pass gfp_flags through pgalloc_tag_add() so that the early path can skip recording allocations that carry this flag. Link: https://lore.kernel.org/20260506022256.32664-1-hao.ge@linux.dev Signed-off-by: Hao Ge <hao.ge@linux.dev> Suggested-by: Suren Baghdasaryan <surenb@google.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
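The data structure described above (a page-sized node holding a next pointer, an atomic slot counter, and a PFN array filling the rest of the page) can be sketched in userspace C. This is an illustrative model only: all `demo_` names and the 4096-byte page size are assumptions, `calloc()` stands in for alloc_page(), the list head is not updated atomically, and the __GFP_NO_CODETAG recursion guard is not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* One tracking node; the whole struct is meant to occupy one page. */
struct demo_pfn_pool {
	struct demo_pfn_pool *next;	/* older, already full node (or NULL) */
	atomic_size_t count;		/* number of slots handed out so far */
	unsigned long pfns[];		/* fills the remainder of the page */
};

#define DEMO_POOL_SLOTS \
	((DEMO_PAGE_SIZE - offsetof(struct demo_pfn_pool, pfns)) / sizeof(unsigned long))

static struct demo_pfn_pool *demo_pool_head;

/* Record one PFN, growing the list by one page when the head is full. */
static int demo_record_pfn(unsigned long pfn)
{
	struct demo_pfn_pool *pool = demo_pool_head;
	size_t slot;

	if (!pool || atomic_load(&pool->count) >= DEMO_POOL_SLOTS) {
		pool = calloc(1, DEMO_PAGE_SIZE);	/* stands in for alloc_page() */
		if (!pool)
			return -1;
		atomic_init(&pool->count, 0);
		pool->next = demo_pool_head;
		demo_pool_head = pool;
	}
	slot = atomic_fetch_add(&pool->count, 1);	/* claim the next free slot */
	pool->pfns[slot] = pfn;
	return 0;
}

/* Record n consecutive PFNs; returns how many were stored. */
static size_t demo_record_range(unsigned long first, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (demo_record_pfn(first + i))
			break;
	return i;
}

/* Number of page-sized nodes currently in the list. */
static size_t demo_pool_nodes(void)
{
	size_t nodes = 0;

	for (struct demo_pfn_pool *p = demo_pool_head; p; p = p->next)
		nodes++;
	return nodes;
}

/* Total number of PFNs recorded across all nodes. */
static size_t demo_pool_total(void)
{
	size_t total = 0;

	for (struct demo_pfn_pool *p = demo_pool_head; p; p = p->next)
		total += atomic_load(&p->count);
	return total;
}
```

Calling `demo_record_range(0, 10000)` stores more entries than the old fixed limit of 8192 by chaining additional page-sized nodes, which is the situation the fixed-size array could only warn about.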
3 daysmm/truncate: use folio_split() in truncate_inode_partial_folio()Zi Yan1-23/+2
After READ_ONLY_THP_FOR_FS is removed, an FS either supports large folios or it does not. folio_split() can be used on an FS with large folio support without worrying about getting a THP on an FS without large folio support. When READ_ONLY_THP_FOR_FS was present, a PMD large pagecache folio could appear in an FS without large folio support after khugepaged or madvise(MADV_COLLAPSE) created it. During truncate_inode_partial_folio(), such a PMD large pagecache folio is split, and if the FS does not support large folios, it has to be split into order-0 folios and cannot be split non-uniformly into folios of various orders. try_folio_split_to_order() was added to handle this situation by checking folio_check_splittable(..., SPLIT_TYPE_NON_UNIFORM) to detect whether the large folio was created due to READ_ONLY_THP_FOR_FS and the FS does not support large folios. Now that READ_ONLY_THP_FOR_FS is removed, all large pagecache folios are created in FSes supporting large folios, so this function is no longer needed and all large pagecache folios can be split non-uniformly.
Link: https://lore.kernel.org/20260517135416.1434539-10-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysfs: remove nr_thps from struct address_spaceZi Yan1-5/+0
Now that the filemap_nr_thps*() functions are removed, the related field, address_space->nr_thps, is no longer needed. Remove it. This shrinks struct address_space by 8 bytes on 64-bit systems which may increase the number of inodes we can cache. Link: https://lore.kernel.org/20260517135416.1434539-8-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Barry Song <baohua@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm: fs: remove filemap_nr_thps*() functions and their usersZi Yan1-29/+0
They are used by READ_ONLY_THP_FOR_FS to handle writes to FSes without large folio support, so that read-only THPs created in these FSes are not seen by the FSes when the underlying fd becomes writable. Now read-only PMD THPs only appear in a FS with large folio support and the supported orders include PMD_ORDER. READ_ONLY_THP_FOR_FS was using mapping->nr_thps, inode->i_writecount, and smp_mb() to prevent writes to a read-only THP and collapsing writable folios into a THP. In collapse_file(), mapping->nr_thps is increased, then smp_mb(), and if inode->i_writecount > 0, collapse is stopped, while do_dentry_open() first increases inode->i_writecount, then a full memory fence, and if mapping->nr_thps > 0, all read-only THPs are truncated. Now this mechanism can be removed along with READ_ONLY_THP_FOR_FS code, since a dirty folio check has been added after try_to_unmap() in collapse_file() to prevent dirty folios from being collapsed as clean. Link: https://lore.kernel.org/20260517135416.1434539-7-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Barry Song <baohua@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
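The handshake being removed here follows a classic pattern: each side publishes its own counter, issues a full fence, and only then reads the other side's counter. The following userspace sketch models that ordering with C11 atomics. The `demo_` names are hypothetical, `atomic_thread_fence(memory_order_seq_cst)` is used as an approximation of smp_mb(), and the back-out and truncate steps are simplified placeholders rather than the kernel's logic.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int demo_nr_thps;		/* stands in for mapping->nr_thps */
static atomic_int demo_i_writecount;	/* stands in for inode->i_writecount */

/* Collapse side: publish the THP first, then look for writers. */
static bool demo_collapse_file(void)
{
	atomic_fetch_add_explicit(&demo_nr_thps, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* plays the role of smp_mb() */
	if (atomic_load_explicit(&demo_i_writecount, memory_order_relaxed) > 0) {
		/* A writer exists: stop the collapse and undo the count (sketch only). */
		atomic_fetch_sub_explicit(&demo_nr_thps, 1, memory_order_relaxed);
		return false;
	}
	return true;
}

/* Open side: publish the writer first, then look for read-only THPs. */
static bool demo_open_for_write(void)
{
	atomic_fetch_add_explicit(&demo_i_writecount, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* full memory fence */
	if (atomic_load_explicit(&demo_nr_thps, memory_order_relaxed) > 0) {
		/* Placeholder for truncating all read-only THPs in the mapping. */
		atomic_store_explicit(&demo_nr_thps, 0, memory_order_relaxed);
		return true;
	}
	return false;
}
```

Because both sides write before the fence and read after it, at least one of them is guaranteed to observe the other's update when they race. Either the collapse sees the writer and stops, or the open sees the THP and truncates it.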
3 daysmm/khugepaged: remove READ_ONLY_THP_FOR_FS check in hugepage_enabled()Zi Yan1-1/+1
Remove the READ_ONLY_THP_FOR_FS gate so that khugepaged for file-backed PMD-sized hugepages is enabled by the global transparent hugepage control. khugepaged can still be enabled by per-size control for anon and shmem when the global control is off. Add shmem_hpage_pmd_enabled() stub for !CONFIG_SHMEM to remove IS_ENABLED(SHMEM) in hugepage_enabled(). Clean up hugepage_enabled() by moving anon code to anon_hpage_enabled(). Link: https://lore.kernel.org/20260517135416.1434539-5-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Nico Pache <npache@redhat.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Barry Song <baohua@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/khugepaged: remove READ_ONLY_THP_FOR_FS checkZi Yan1-0/+27
Patch series "Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files", v6. This patch (of 14): collapse_file() requires FSes supporting large folio with at least PMD_ORDER, so replace the READ_ONLY_THP_FOR_FS check with that. MADV_COLLAPSE ignores shmem huge config, so exclude the check for shmem. While at it, replace VM_BUG_ON with VM_WARN_ON_ONCE. Add a helper function mapping_pmd_folio_support() for FSes supporting large folio with at least PMD_ORDER. Link: https://lore.kernel.org/20260517135416.1434539-1-ziy@nvidia.com Link: https://lore.kernel.org/20260517135416.1434539-2-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Nico Pache <npache@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Barry Song <baohua@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/khugepaged: improve tracepoints for mTHP ordersNico Pache1-12/+22
Add the order to the mm_collapse_huge_page<_swapin,_isolate> tracepoints to give better insight into which order is being operated on. Link: https://lore.kernel.org/20260522150009.121603-10-npache@redhat.com Signed-off-by: Nico Pache <npache@redhat.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Barry Song <baohua@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Rientjes <rientjes@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam R.
Howlett <liam@infradead.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Rafael Aquini <raquini@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shivank Garg <shivankg@amd.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Takashi Iwai (SUSE) <tiwai@suse.de> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Usama Arif <usama.arif@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/khugepaged: add per-order mTHP collapse failure statisticsNico Pache1-0/+3
Add three new mTHP statistics to track collapse failures for different orders when encountering swap PTEs, excessive none PTEs, and shared PTEs: - collapse_exceed_swap_pte: Counts when mTHP collapse fails due to encountering a swap PTE. - collapse_exceed_none_pte: Counts when mTHP collapse fails due to exceeding the none PTE threshold for the given order. - collapse_exceed_shared_pte: Counts when mTHP collapse fails due to encountering a shared PTE. These statistics complement the existing THP_SCAN_EXCEED_* events by providing per-order granularity for mTHP collapse attempts. The stats are exposed via sysfs under `/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/` for each supported hugepage size. As we currently do not support collapsing mTHPs that contain a swap or shared entry, those statistics keep track of how often we are encountering failed mTHP collapses due to these restrictions. We will add support for mTHP collapse for anonymous pages next; let's also track when this happens at the PMD level within the per-mTHP stats.
Link: https://lore.kernel.org/20260522150009.121603-9-npache@redhat.com Signed-off-by: Nico Pache <npache@redhat.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam R. 
Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Rafael Aquini <raquini@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shivank Garg <shivankg@amd.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Takashi Iwai (SUSE) <tiwai@suse.de> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Usama Arif <usama.arif@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/khugepaged: generalize alloc_charge_folio()Dev Jain1-0/+2
Pass order to alloc_charge_folio() and update mTHP statistics. Link: https://lore.kernel.org/20260522150009.121603-3-npache@redhat.com Signed-off-by: Dev Jain <dev.jain@arm.com> Co-developed-by: Nico Pache <npache@redhat.com> Signed-off-by: Nico Pache <npache@redhat.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Usama Arif <usama.arif@linux.dev> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Barry Song <baohua@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Rientjes <rientjes@google.com> Cc: Gregory Price <gourry@gourry.net> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam R. 
Howlett <liam@infradead.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Rafael Aquini <raquini@redhat.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shivank Garg <shivankg@amd.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Takashi Iwai (SUSE) <tiwai@suse.de> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yang Shi <yang@os.amperecomputing.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysuserfaultfd: make functions that are not used outside uffd staticMike Rapoport (Microsoft)1-36/+0
After merging fs/userfaultfd.c into mm/userfaultfd.c, several functions that were previously shared between the two files are now only used within mm/userfaultfd.c. Make them static and remove their declarations from include/linux/userfaultfd_k.h. Link: https://lore.kernel.org/20260523173759.3964908-3-rppt@kernel.org Assisted-by: Copilot:claude-opus-4-6 Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/mglru: use folio_mark_accessed to replace folio_set_activeBarry Song (Xiaomi)1-1/+1
MGLRU gives high priority to folios mapped in page tables. As a result, folio_set_active() is invoked for all folios read during page faults. In practice, however, readahead can bring in many folios that are never accessed via page tables. A previous attempt by Lei Liu proposed introducing a separate LRU for readahead[1] to make readahead pages easier to reclaim, but that approach is likely over-engineered. Before commit 4d5d14a01e2c ("mm/mglru: rework workingset protection"), folios with PG_active were always placed in the youngest generation, leading to over-protection and increased refaults. After that commit, PG_active folios are placed in the second youngest generation, which is still too optimistic given the presence of readahead. In contrast, the classic active/inactive scheme is more conservative. This patch switches to using folio_mark_accessed() and starts prefaulted file folios in the second oldest generation instead of the active generations. We should also adjust the following accordingly:
- WORKINGSET_ACTIVATE: aligned with setting active for refaulted workingset folios;
- lru_gen_folio_seq(): place (pre)faulted file folios into the second oldest generation;
- promote second-scanned folios to workingset in folio_check_references(): we now have to depend on folio_lru_refs() > 1, since we previously relied on PG_referenced being set during the first scan, but PG_referenced is now set earlier.
Test setup: a kernel build on x86 inside a memcg with a 1GB memory limit, using 20 threads.
w/o patch:
  real 1m50.764s  user 25m32.305s  sys 4m0.012s
  pswpin: 1333245  pswpout: 4366443  pgpgin: 6962592  pgpgout: 17780712
  swpout_zero: 1019603  swpin_zero: 14764  refault_file: 287794  refault_anon: 1347963
w/ patch:
  real 1m48.879s  user 25m29.224s  sys 3m37.421s
  pswpin: 568480  pswpout: 2322657  pgpgin: 4073416  pgpgout: 9613408
  swpout_zero: 593275  swpin_zero: 9118  refault_file: 262505  refault_anon: 577550
active/inactive LRU:
  real 1m49.928s  user 25m28.196s  sys 3m40.740s
  pswpin: 463452  pswpout: 2309119  pgpgin: 4438856  pgpgout: 9568628
  swpout_zero: 743704  swpin_zero: 7244  refault_file: 562555  refault_anon: 470694
Lance and Xueyuan made a huge contribution to this patch through testing. Link: https://lore.kernel.org/20260526130938.66253-1-baohua@kernel.org Link: https://lore.kernel.org/linux-mm/20250916072226.220426-1-liulei.rjpt@vivo.com/ [1] Signed-off-by: Barry Song (Xiaomi) <baohua@kernel.org> Tested-by: Lance Yang <lance.yang@linux.dev> Tested-by: Xueyuan Chen <xueyuan.chen21@gmail.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Kairui Song <kasong@tencent.com> Cc: Qi Zheng <qi.zheng@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: wangzicheng <wangzicheng@honor.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Lei Liu <liulei.rjpt@vivo.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon: fix missing parens in macro argumentsMaksym Shcherba1-4/+4
Patch series "mm/damon: fix macro arguments and clarify quota goals doc", v2. This patch (of 2): The DAMON iterator macros do not wrap their pointer arguments with parentheses. This can cause build failures when the argument is a complex expression due to operator precedence issues. Add missing parentheses around the arguments in the following macros to prevent potential build failures: - damon_for_each_region() - damon_for_each_region_from() - damon_for_each_region_safe() - damos_for_each_quota_goal() Link: https://lore.kernel.org/20260521202020.126500-1-maksym.shcherba@lnu.edu.ua Link: https://lore.kernel.org/20260521202020.126500-2-maksym.shcherba@lnu.edu.ua Signed-off-by: Maksym Shcherba <maksym.shcherba@lnu.edu.ua> Reviewed-by: SeongJae Park <sj@kernel.org> Assisted-by: Antigravity:Gemini-3.1-Pro Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
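The precedence problem described above can be sketched with stand-in macros. The struct and macro names below are hypothetical and are not the DAMON definitions; the sketch only shows why an unparenthesized pointer argument breaks once the caller passes an expression such as `targets + 1`.

```c
#include <stddef.h>

/* Hypothetical stand-in type, not a real DAMON structure. */
struct target_sketch {
	int nr_regions;
};

/* Unparenthesized argument: `t` is pasted next to `->` as-is. */
#define NR_REGIONS_UNSAFE(t)	t->nr_regions
/* Parenthesized argument: the whole expression is evaluated first. */
#define NR_REGIONS_SAFE(t)	(t)->nr_regions

static int second_target_regions(struct target_sketch *targets)
{
	/*
	 * NR_REGIONS_UNSAFE(targets + 1) would expand to
	 * `targets + 1->nr_regions`, which fails to build because
	 * `->` binds tighter than `+`.
	 */
	return NR_REGIONS_SAFE(targets + 1);
}
```

With the parenthesized form, callers can pass any pointer-valued expression without wrapping it themselves.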
3 daysmm/damon/core: hide damon_destroy_region()SeongJae Park1-1/+0
damon_destroy_region() is being used by only DAMON core, but exposed to DAMON API callers. Exposing something that is not really being used by others will only increase the maintenance cost. Hide it. Link: https://lore.kernel.org/20260522154026.80546-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon/core: hide damon_insert_region()SeongJae Park1-11/+0
damon_insert_region() is being used by only DAMON core, but exposed to DAMON API callers. Exposing something that is not really being used by others will only increase the maintenance cost. Hide it. Link: https://lore.kernel.org/20260522154026.80546-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon/core: hide damon_add_region()SeongJae Park1-1/+0
damon_add_region() is being used by only DAMON core, but exposed to DAMON API callers. Exposing something that is not really being used by others will only increase the maintenance cost. Hide it. Link: https://lore.kernel.org/20260522154026.80546-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/vma: eliminate mmap_action->error_hook, introduce error_filterLorenzo Stoakes1-6/+3
Rather than providing a hook, simplify things by providing the ability to filter errors. This allows us to more carefully validate the value provided and thus ensure only a valid error code is specified, and simplifies the interface. This way, we eliminate all hooks but mmap_prepare and allow only mmap actions to be specified (which core mm controls). This significantly improves robustness and eliminates any unnecessary code duplication in driver mmap hooks. We also update the /dev/mem logic (the only user) to use mmap_action->error_filter instead. Link: https://lore.kernel.org/e770b28427937057fa953ac380a134b24acd8bb4.1779462249.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
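As a rough illustration of filtering through a field instead of a hook, the sketch below assumes the field holds a replacement error code that is applied only when the action fails. The struct, the helper names and the errno bound are hypothetical and do not reflect the actual mmap_action layout.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the kernel's upper bound on errno values. */
#define SKETCH_MAX_ERRNO 4095

/* Hypothetical, simplified action; the real mmap_action has other fields. */
struct action_sketch {
	int error_filter;	/* 0: report errors unchanged */
};

/* A usable filter is either unset (0) or a negative errno in range. */
static int error_filter_valid(int filter)
{
	return filter == 0 || (filter < 0 && filter >= -SKETCH_MAX_ERRNO);
}

/* Map the result of an action through the filter, if one is set. */
static int apply_error_filter(const struct action_sketch *action, int err)
{
	assert(error_filter_valid(action->error_filter));
	if (err && action->error_filter)
		return action->error_filter;
	return err;
}
```

Because the filter is plain data, core code can validate it up front, which a callback would not allow.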
3 daysmm/vma: remove mmap_action->success_hookLorenzo Stoakes1-10/+0
This hook was introduced to work around code that seemed to absolutely require access to a VMA pointer upon mmap(). However, providing this hook leaves a backdoor to drivers getting access to the very thing mmap_prepare eliminates - a pointer to the VMA. Let's solve this contradiction by removing it. The key intended user was hugetlb, however it seems that the best course now is to avoid allowing all drivers the ability to work around mmap_prepare, and find a different solution there. Link: https://lore.kernel.org/2521c19866f3f10f9085d094cc4f06769042be71.1779462249.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysdrivers/char/mem: eliminate unnecessary use of success_hookLorenzo Stoakes1-0/+5
Patch series "remove mmap_action success, error hooks", v2. The mmap_action->success_hook was a strange beast added to enable code which appeared to absolutely require access to a VMA pointer to work correctly. Primarily this was for hugetlb, however a different approach will be taken there, as clearly more work is required to figure out a sensible way of converting hugetlb to use mmap_prepare. The other user was the memory char driver, specifically /dev/zero which has the unusual property of explicitly setting file-backed VMAs anonymous. Providing the success hook was always foolish, as it allowed drivers a way to work around the restriction that they should not access a pointer to a not-yet-correctly-initialised VMA - which defeats the purpose of the mmap_prepare work. We can achieve the same thing in memory char driver without needing the success hook, so this series removes that, then removes the success hook altogether. The error hook is also unnecessary - the motivation for this was for functions which need to filter the error code when performing an mmap action in order to avoid breaking userspace. We can achieve this by just providing a field for the error code. Doing this means we don't have to worry about the hook doing anything odd. We also add a check to ensure the error code is in fact valid. Again the memory char driver is the only current user of this, so this series updates it to use that. After this change mmap_action has no custom hooks at all, which seems rather more cromulent than before. This patch (of 3): /dev/zero, uniquely, marks memory mapped there as anonymous. This is currently achieved using the mmap_action->success_hook. However this hook circumvents the abstraction of VMA initialisation so it's preferable to do things a different way. To achieve this, this patch firstly defaults the VMA descriptor's vm_ops field to the dummy VMA operations, which is what file-backed VMAs default this field to. 
That way, we can detect whether a driver sets this field to NULL in order to mark it anonymous. We then introduce vma_desc_set_anonymous() to do this explicitly, and invoke it in mmap_zero_prepare(). This way, any driver which does not explicitly set desc->vm_ops, retains the dummy vm_ops as they would previously. We also update set_vma_user_defined_fields() to make clear that we are either setting vma->vm_ops to what is provided by the driver (or defaulting to dummy_vm_ops if not set), or setting the VMA anonymous. This lays the groundwork for removing the success hook. Link: https://lore.kernel.org/cover.1779462249.git.ljs@kernel.org Link: https://lore.kernel.org/5d1e8bd29d6e070218ba7a03461df562e372b91e.1779462249.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
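The default-then-clear scheme can be sketched as follows. All types and helper names here are hypothetical simplifications; only the idea (the descriptor starts with dummy operations, and a NULL vm_ops marks the mapping anonymous) comes from the description above.

```c
#include <stddef.h>

/* Hypothetical stand-in for an operations table. */
struct vm_ops_sketch {
	int placeholder;
};

/* Stand-in for the dummy operations that file-backed mappings default to. */
static const struct vm_ops_sketch dummy_ops_sketch = { 0 };

/* Hypothetical, heavily reduced descriptor. */
struct desc_sketch {
	const struct vm_ops_sketch *vm_ops;
};

/* Core code initialises the descriptor before the driver sees it. */
static void desc_init(struct desc_sketch *desc)
{
	desc->vm_ops = &dummy_ops_sketch;
}

/* A driver asks for an anonymous mapping explicitly. */
static void desc_set_anonymous(struct desc_sketch *desc)
{
	desc->vm_ops = NULL;
}

/* NULL can only mean "anonymous", because the default is non-NULL. */
static int desc_is_anonymous(const struct desc_sketch *desc)
{
	return desc->vm_ops == NULL;
}
```

A driver that never touches vm_ops keeps the dummy operations, so existing behaviour is preserved without any hook.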
3 daysmm/damon/sysfs: setup damon_filter->memcg_id from pathSeongJae Park1-0/+1
Find and set the memcg_id for damon_filter from the user-passed memory cgroup path when updating the DAMON input parameters. Link: https://lore.kernel.org/20260518234119.97569-27-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon/core: introduce DAMON_FILTER_TYPE_MEMCGSeongJae Park1-0/+6
The memory cgroup that a piece of memory belongs to is another data attribute that can be useful to monitor. Introduce a new DAMON filter type, namely DAMON_FILTER_TYPE_MEMCG, for monitoring of this attribute. Link: https://lore.kernel.org/20260518234119.97569-23-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon: trace probe_hitsSeongJae Park1-0/+38
Introduce a new tracepoint for exposing the per-region per-probe positive sample count via tracefs. Link: https://lore.kernel.org/20260518234119.97569-19-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon/core: introduce damon_ops->apply_probesSeongJae Park1-0/+4
Extend damon_operations struct with a new callback, namely apply_probes. The callback will be invoked for data attributes monitoring. More specifically, the callback will apply damon_probe objects to each region and update the per-region per-probe counters for the number of encountered probe-positive samples. Link: https://lore.kernel.org/20260518234119.97569-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon/core: introduce damon_region->probe_hitsSeongJae Park1-0/+4
Add an array for the per-region per-probe positive samples count. For simple and efficient implementation, add a limit to the number of data probes and set the array to support only the limited number of counters. Link: https://lore.kernel.org/20260518234119.97569-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
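A minimal sketch of a fixed-size per-region counter array follows. The struct and function names are hypothetical, the limit of four is taken from the series cover letter, and the sketch assumes probe verdicts are counted independently of the access check result.

```c
#include <assert.h>
#include <string.h>

/* Limit of four probes, as stated in the series cover letter. */
#define SKETCH_MAX_PROBES 4

/* Hypothetical, reduced region; the real damon_region has more fields. */
struct region_sketch {
	unsigned int nr_accesses;
	unsigned int probe_hits[SKETCH_MAX_PROBES];
};

/* Account one sample: the access check result plus each probe's verdict. */
static void account_sample(struct region_sketch *r, int accessed,
			   const int *probe_positive, int nr_probes)
{
	int i;

	assert(nr_probes <= SKETCH_MAX_PROBES);
	if (accessed)
		r->nr_accesses++;
	for (i = 0; i < nr_probes; i++)
		if (probe_positive[i])
			r->probe_hits[i]++;
}

/* Reset both kinds of counters together at each aggregation interval. */
static void reset_aggregation(struct region_sketch *r)
{
	r->nr_accesses = 0;
	memset(r->probe_hits, 0, sizeof(r->probe_hits));
}
```

Embedding a small fixed array keeps the region self-contained and avoids a per-region allocation, at the cost of the hard limit on the number of probes.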
3 daysmm/damon/core: introduce damon_filterSeongJae Park1-0/+36
Define a data structure for constructing damon_probe's attributes check, namely damon_filter. It is very similar to damos_filter but works only for monitoring purposes. Also embed that into damon_probe, implement essential handling of the link, with fundamental helpers. Link: https://lore.kernel.org/20260518234119.97569-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon/core: embed damon_probe objects in damon_ctxSeongJae Park1-0/+9
Let damon_probe objects be able to be installed on a given damon_ctx, by adding a linked list header for storing the objects. Add initialization and cleanup of the new field with helper functions, too. Link: https://lore.kernel.org/20260518234119.97569-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon/core: introduce struct damon_probeSeongJae Park1-0/+9
Patch series "mm/damon: introduce data attributes monitoring". TL; DR ====== Extend DAMON for monitoring general data attributes other than accesses. The short term motivation is lightweight page type (e.g., belonging cgroup) aware monitoring. In the long term, this will help extend DAMON to multiple access event capture primitives (e.g., page faults and PMU) and eventually pivot DAMON into a "Data Attributes Monitoring and Operations eNgine". Background: High Cost of Page Level Properties Monitoring ========================================================= DAMON was initially introduced as a Data Access MONitor. It has been extended for not only access monitoring but also data access-aware system operations (DAMOS). But still the monitoring part is only for data accesses. Data access patterns are useful information, but some users need more holistic views. Particularly, users want to see the access pattern information together with the types of the memory. For example, users who work on using huge pages efficiently want to know how much of DAMON-found hot/cold regions are backed by huge pages. Users who run multiple workloads with different cgroups want to know how much of DAMON-found hot/cold regions belong to specific cgroups. To meet this demand, we developed a DAMOS extension for page level properties based monitoring [1], which landed in 6.14. Using the feature, users can specify the page level data properties that they are interested in, in a flexible format that uses DAMOS filters. Then, DAMON applies the filters to each folio of the entire DAMON region and lets users know how many bytes of memory in each DAMON region passed the given filters. This gives page level detailed and deterministic information to users. But, because the operation is done at page level, the overhead is proportional to the memory size. It was useful for test or debugging purposes on a small number of machines. 
But it was obviously too heavy to be always enabled on all machines running the real user workloads. For real world workloads, it was recommended to use the feature with user-space controlled sampling approaches. For example, users could do the page level monitoring only once per hour, on a randomly selected one percent of the machines of their fleet. If the runtime is long enough and the fleet is big enough, it should provide statistically meaningful data. But users are too busy to implement such controls on their own. Data Attributes Monitoring ========================== Extend DAMON to monitor not only data accesses, but also general data attributes. Do the extension while keeping the main promise of DAMON, the bounded and best-effort minimum overhead. Allow users to specify what data attributes in addition to the data access they want to monitor. Users can install one 'data probe' per data attribute of their interest for this purpose. The 'data probe' should be able to be applied to any memory, and determine if the given memory has the appropriate data attribute. E.g., if memory of physical address 42 belongs to cgroup A. Each 'data probe' is configured with filters that are very similar to the DAMOS filters. When DAMON checks if each sampling address memory of each region is accessed since the last check, it applies data probes if registered. Just as it counts the access check-positive samples in nr_accesses, it counts the positive samples of each data probe in another per-region counter array, namely 'probe_hits'. When DAMON resets nr_accesses every aggregation interval, it resets 'probe_hits' together. Users can read 'probe_hits' just before the values are reset. In this way, users can know how many hot/cold memory regions have data attributes of their interest. E.g., 30 percent of this system's hot memory belongs to cgroup A, and 80 percent of the cgroup A-belonging hot memory is backed by huge pages. 
Patches Sequence ================ The first eight patches implement the core feature, interface and the working support. Patch 1 introduces data probe data structure, namely damon_probe. Patch 2 extends damon_ctx for installing data probes. Patch 3 introduces another data structure for filters of each data probe, namely damon_filter. Patch 4 updates damon_ctx commit function to handle the probes. Patch 5 extends damon_region for the per-region per-probe positive samples counter, namely probe_hits. Patch 6 extends damon_operations for applying probes on the underlying DAMON operations implementation. Patch 7 updates kdamond_fn() to invoke the probes applying callback. Patch 8 finally implements the probes support on paddr ops. Ten changes for user interface (patches 9-18) come next. Patches 9-13 implement sysfs directories and files for setting data probes, namely probes directory, probe directory, filters directory, filter directory and filter directory internal files, respectively. Patch 14 connects the user inputs that are made via the sysfs files to DAMON core. The following three patches (patches 15-17) implement sysfs directories and files for showing the probe_hits to users, namely probes directory, probe directory and hits files, respectively. Patch 18 introduces a new tracepoint for showing the probe_hits via tracefs. Patch 19 adds a selftest for the sysfs files. Patches 20 and 21 document the design and usage of the new feature, respectively. Seven additional patches (patches 22-28) for monitoring belonging memory cgroup follow. Depending on the feedback, this part might be separated to another series in future. Patch 22 defines the DAMON filter type for the new attribute, namely DAMON_FILTER_TYPE_MEMCG. Patch 23 adds the support on paddr ops. Patch 24 updates the sysfs interface for setup of the target memcg. Patch 25 moves code for easy reuse of the filter target memcg setup. Patch 26 connects the user input to the core layer. 
Finally, patches 27 and 28 update the design and usage documents for the memcg attribute monitoring support. Discussion ========== This allows the page properties monitoring with overhead that is low enough to be always enabled on real world workloads. Because the sampling time for access check is reused for data attributes check, the upper-bounded and best-effort minimum overhead of DAMON is kept. Because the sampling memory for access check is reused for data attributes check, additional overhead is minimal. Still DAMOS-based page level properties monitoring should be useful, because it provides deterministic page level information. When in doubt about the sampling based information, running the DAMOS-based one together and comparing the results would be useful, for debugging and tuning. Future Works: Mid Term ======================== This version of the implementation limits the maximum number of data probes to four. I will try to find a way to remove the limit in future. I personally think it should be enough for common use cases, though, and am therefore not giving it high priority at the moment. Future Works: Long Term ======================= There are user requests for extending DAMON with detailed access information, for example, per-CPUs/threads/read/writes monitoring. For that, I was working [2] on extending DAMON to use page fault events as another access check primitive, and making the infrastructure flexible for future use of yet another access check primitive. Actually there is another ongoing work [3] for extending DAMON with PMU events. The motivation of the work is reducing the overhead, though. In my work [2], I was introducing a new interface for access sampling primitives control. Now I think this data probe interface can be used for that, too. That is, data access becomes just one type of data attribute. Also, pg_idle-confirmed access, page fault-confirmed access, and PMU event-confirmed access will be different types of data attributes. 
The regions adjustment mechanism is currently working based on the access information. That's because DAMON is designed for data access monitoring. That is, data access information is the primary interest, and therefore DAMON adjusts regions in a way that can best-present the information. Once data access becomes just one of the data attributes, there is no reason to treat data access as that special. There might be some users who are not interested in access at all but want to know the location of memory of a specific type. The data probes interface will allow doing that. Further, we could extend the interface to let users set any data attribute as the 'primary' attribute. Then, DAMON will split and merge regions in a way that can best-present the 'primary' attributes. DAMOS will also be extended, to specify targets based on not only the data access pattern, but all user-registered data attributes. From this stage, we may be able to call DAMON a "Data Attributes Monitoring and Operations eNgine". This patch (of 28): Introduce a data structure for data attribute probe. It is just a linked list header at this step. It will be extended in a way that it can determine if a given memory has a specific data attribute. Link: https://lore.kernel.org/20260518234119.97569-1-sj@kernel.org Link: https://lore.kernel.org/20260518234119.97569-2-sj@kernel.org Link: https://lore.kernel.org/20250106193401.109161-1-sj@kernel.org [1] Link: https://lore.kernel.org/20251208062943.68824-1-sj@kernel.org/ [2] Link: https://lore.kernel.org/20260423004211.7037-1-akinobu.mita@gmail.com [3] Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. 
Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm, swap: merge zeromap into swap tableKairui Song1-1/+0
By allocating one additional bit in the swap table entry's flags field alongside the count, we can store the zeromap inline. On 64-bit systems, the zeromap is stored in the swap table, which avoids a separate zeromap allocation and reduces the allocated memory. That is the happy path. For certain 32-bit archs, there might not be enough bits in the swap table to contain both PFN and flags. Therefore, conditionally let each cluster have a zeromap field at build time, and use that instead. If the swapfile cluster is not fully used, it will still save memory for zeromap. The empty cluster does not allocate a zeromap. In the worst case, all clusters are fully populated, and we will use memory similar to the previous zeromap implementation. A few macros were moved to different headers for build time struct definition. [akpm@linux-foundation.org: swap_cluster_alloc_table(): remove unused local `ret'] [akpm@linux-foundation.org: fix unused label `err_free'] Link: https://lore.kernel.org/20260517-swap-table-p4-v5-12-88ae43e064c7@tencent.com Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Youngjun Park <youngjun.park@lge.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
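The idea of keeping the zero-filled marker next to the count can be sketched with plain bit operations. The bit widths, macro names and helpers below are hypothetical and do not reflect the actual swap table entry layout; they only show one flag bit sharing a word with a small counter.

```c
#include <stdint.h>

/* Hypothetical layout: low bits hold the count, the next bit marks zero-filled. */
#define SKETCH_COUNT_BITS	3
#define SKETCH_COUNT_MASK	((1u << SKETCH_COUNT_BITS) - 1)
#define SKETCH_ZERO_FLAG	(1u << SKETCH_COUNT_BITS)

/* Mark the slot as zero-filled without disturbing the count. */
static inline uint32_t entry_set_zero(uint32_t flags)
{
	return flags | SKETCH_ZERO_FLAG;
}

/* Clear the zero-filled marker, again leaving the count alone. */
static inline uint32_t entry_clear_zero(uint32_t flags)
{
	return flags & ~SKETCH_ZERO_FLAG;
}

static inline int entry_is_zero(uint32_t flags)
{
	return (flags & SKETCH_ZERO_FLAG) != 0;
}

static inline unsigned int entry_count(uint32_t flags)
{
	return flags & SKETCH_COUNT_MASK;
}
```

Storing the flag inline means no separate bitmap has to be allocated, which is the saving the commit describes for 64-bit systems.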
3 daysmm/memcg: remove no longer used swap cgroup arrayKairui Song1-47/+0
Now that all swap cgroup records are stored in the swap cluster directly, the static array is no longer needed. Link: https://lore.kernel.org/20260517-swap-table-p4-v5-11-88ae43e064c7@tencent.com Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Youngjun Park <youngjun.park@lge.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmemcgv1: don't compile swap functions when CONFIG_SWAP=nAndrew Morton1-5/+10
Stub these out to save some dead code and to fix a build error with the upcoming "mm/memcg, swap: store cgroup id in cluster table directly". Link: https://lore.kernel.org/202605281711.bSeZlErK-lkp@intel.com Cc: David Hildenbrand <david@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/memcg, swap: store cgroup id in cluster table directlyKairui Song2-6/+8
Drop the usage of the swap_cgroup_ctrl, and use the dynamic cluster table instead. The per-cluster memcg table is 1024 / 512 bytes on most archs, and does not need RCU protection: the cgroup data is only read and written under the cluster lock. That keeps things simple, lets the allocation use plain kmalloc with immediate kfree (no deferred free), and keeps fragmentation acceptable. [akpm@linux-foundation.org: fix CONFIG_SWAP=n build] Link: https://lore.kernel.org/20260517-swap-table-p4-v5-10-88ae43e064c7@tencent.com Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Youngjun Park <youngjun.park@lge.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm, swap: delay and unify memcg lookup and charging for swapinKairui Song1-3/+3
Instead of checking the cgroup private ID during page table walk in swap_pte_batch(), move the memcg lookup into __swap_cache_add_check() under the cluster lock. The first pre-alloc check is speculative and skips the memcg check since the post-alloc stable check ensures all slots covered by the folio belong to the same memcg. It is very rare for contiguous and aligned entries across a contiguous region of a page table of the same process or shmem mapping to belong to different memcgs. This also prepares for recording the memcg info in the cluster's table. Also make the order check and fallback more compact. There should be no user-observable behavior change. Link: https://lore.kernel.org/20260517-swap-table-p4-v5-8-88ae43e064c7@tencent.com Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Youngjun Park <youngjun.park@lge.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/memcg, swap: tidy up cgroup v1 memsw swap helpersKairui Song2-10/+8
The cgroup v1 swap helpers always operate on swap cache folios whose swap entry is stable: the folio is locked and in the swap cache. There is no need to pass the swap entry or page count as separate parameters when they can be derived from the folio itself. Simplify the redundant parameters and add sanity checks to document the required preconditions. Also rename memcg1_swapout to __memcg1_swapout to indicate it requires special calling context: the folio must be isolated and dying, and the call must be made with interrupts disabled. No functional change. Link: https://lore.kernel.org/20260517-swap-table-p4-v5-6-88ae43e064c7@tencent.com Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Youngjun Park <youngjun.park@lge.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/huge_memory: move THP gfp limit helper into headerKairui Song1-0/+30
Shmem has some special requirements for THP GFP and has to limit it in certain zones or provide a more lenient fallback. We'll use this helper for generic swap THP allocation, which needs to support shmem. For a typical GFP_HIGHUSER_MOVABLE swap-in, this helper is basically a no-op. But it's necessary for certain shmem users, mostly drivers. No feature change. Link: https://lore.kernel.org/20260517-swap-table-p4-v5-3-88ae43e064c7@tencent.com Signed-off-by: Kairui Song <kasong@tencent.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Youngjun Park <youngjun.park@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm: rejig pageblock mask definitionsBrendan Jackman1-3/+3
- Add a PAGEBLOCK_ prefix to the names to avoid polluting the "global namespace" too much. - This new prefix makes MIGRATETYPE_AND_ISO_MASK look pretty long. Well, that global mask only exists for quite a specific purpose, and is quite a weird thing to have a name for anyway. So drop it and take advantage of the newly-defined PAGEBLOCK_ISO_MASK. Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-3-dacdf5402be8@google.com Signed-off-by: Brendan Jackman <jackmanb@google.com> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Len Brown <lenb@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm: introduce for_each_free_list()Brendan Jackman1-3/+6
Patch series "mm: misc cleanups from __GFP_UNMAPPED series". In v2 of the __GFP_UNMAPPED series [0], we realised that some of the patches could potentially be merged as independent cleanups. These are all independent of one another; if you think some are useful cleanups and others are pointless churn, it should be fine to just pick whatever subset you prefer. No functional change intended. This patch (of 4): There are a couple of places that iterate over the freelists with awareness of the data structures' layout. Ideally, code outside of mm should not be aware of the page allocator's freelists at all. This patch doesn't hide them completely; it's just a meek incremental step in that direction: provide a macro to iterate over them without needing to be aware of the actual struct fields. Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-0-dacdf5402be8@google.com Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-1-dacdf5402be8@google.com Link: https://lore.kernel.org/all/20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com/ [0] Signed-off-by: Brendan Jackman <jackmanb@google.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Len Brown <lenb@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
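A rough illustration of the idea behind an iteration macro of this kind follows. It is a userspace sketch only: the struct layout, the constants, and the macro name and argument order are all hypothetical, and the real for_each_free_list() may look quite different.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ORDERS_SKETCH 4	/* hypothetical number of orders */
#define NR_TYPES_SKETCH  3	/* hypothetical number of migrate types */

struct list_head_sketch { struct list_head_sketch *next, *prev; };
struct free_area_sketch { struct list_head_sketch free_list[NR_TYPES_SKETCH]; };
struct zone_sketch { struct free_area_sketch free_area[NR_ORDERS_SKETCH]; };

/*
 * Visit every free list of a zone without the caller spelling out the
 * struct fields. Note: a break in the body only leaves the inner loop.
 */
#define for_each_free_list_sketch(list, zone, order, type)		\
	for ((order) = 0; (order) < NR_ORDERS_SKETCH; (order)++)	\
		for ((type) = 0;					\
		     (type) < NR_TYPES_SKETCH &&			\
		     ((list) = &(zone)->free_area[order].free_list[type], 1); \
		     (type)++)

/* Example caller: count the lists, checking each pointer on the way. */
static size_t count_free_lists(struct zone_sketch *zone)
{
	struct list_head_sketch *list;
	size_t order, type, n = 0;

	assert(zone);
	for_each_free_list_sketch(list, zone, order, type) {
		assert(list == &zone->free_area[order].free_list[type]);
		n++;
	}
	return n;
}
```

The caller only names the cursor variables; if the layout behind the macro changes, only the macro needs updating.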
3 daysmm/mmu_notifier: fix a begin vs. start typo in the invalidate range commentTakahiro Itazuri1-2/+2
Fix a goof in the block comment for invalidate_range_{start,end}() where start() is incorrectly referred to as begin(). No functional change intended. [seanjc@google.com: split to separate patch, write changelog] Link: https://lore.kernel.org/20260513163546.1176742-1-seanjc@google.com Signed-off-by: Takahiro Itazuri <itazur@amazon.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysdrivers/base/memory: make memory block get/put explicitMuchun Song1-2/+5
Rename the memory block lookup helper to make the acquired reference explicit, add memory_block_put() to wrap put_device(), remove find_memory_block(), and use memory_block_get() as the single block-id based lookup interface. This makes it clearer to callers that a successful lookup holds a reference that must be dropped, reducing the chance of forgetting the matching put and leaking the memory block device reference. Link: https://lore.kernel.org/linux-mm/7887915D-E598-42B3-9AFE-BFFBACE8DE2D@linux.dev/#t Link: https://lore.kernel.org/20260512072635.3969576-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Tested-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> #s390 Cc: Richard Cheng <icheng@nvidia.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Doug Anderson <dianders@chromium.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
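The get/put pairing described above can be sketched in userspace as follows. Everything here is hypothetical (a plain integer refcount and a static table instead of the device model); it only illustrates that a successful lookup returns a reference the caller must drop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted memory block. */
struct block_sketch {
	unsigned long id;
	int refcount;	/* the registry itself holds one reference */
};

static struct block_sketch blocks_sketch[] = { {0, 1}, {1, 1}, {2, 1} };

/* Look up a block by ID; on success the caller owns one reference. */
static struct block_sketch *block_get_sketch(unsigned long id)
{
	size_t i;

	for (i = 0; i < sizeof(blocks_sketch) / sizeof(blocks_sketch[0]); i++) {
		if (blocks_sketch[i].id == id) {
			blocks_sketch[i].refcount++;
			return &blocks_sketch[i];
		}
	}
	return NULL;
}

/* Drop a reference obtained from block_get_sketch(). */
static void block_put_sketch(struct block_sketch *blk)
{
	assert(blk && blk->refcount > 1);
	blk->refcount--;
}
```

Naming the lookup "get" makes the matching "put" hard to overlook at the call site, which is the point of the rename.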
3 daysmm/bootmem_info: remove call to kmemleak_free_part_phys()David Hildenbrand (Arm)1-1/+0
The call to kmemleak_free_part_phys() was added in 2022 in commit dd0ff4d12dd2 ("bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem"). In 2025, commit b2aad24b5333 ("mm/memmap: prevent double scanning of memmap by kmemleak") started to use MEMBLOCK_ALLOC_NOLEAKTRACE when allocating the memmap to skip the kmemleak_alloc_phys() in the buddy. So remove the call to kmemleak_free_part_phys(). If this would still be required for other purposes, either free_reserved_page() should take care of it, or selected users. Link: https://lore.kernel.org/20260511-bootmem_info_prep-v1-4-3fb0be6fc688@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Lance Yang <lance.yang@linux.dev> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon: replace damon_rand() with a per-ctx lockless PRNGJiayuan Chen1-7/+21
damon_rand() on the sampling_addr hot path called get_random_u32_below(), which takes a local_lock_irqsave() around a per-CPU batched entropy pool and periodically refills it with ChaCha20. At elevated nr_regions counts (20k+), the lock_acquire / local_lock pair plus __get_random_u32_below() dominate kdamond perf profiles. Replace the helper with a lockless lfsr113 generator (struct rnd_state) held per damon_ctx and seeded from get_random_u64() in damon_new_ctx(). kdamond is the single consumer of a given ctx, so no synchronization is required. Range mapping uses traditional reciprocal multiplication, similar to get_random_u32_below(); for spans larger than U32_MAX (only reachable on 64-bit) the slow path combines two u32 outputs and uses mul_u64_u64_shr() at 64-bit width. On 32-bit the slow path is dead code and gets eliminated by the compiler. The new helper takes a ctx parameter; damon_split_regions_of() and the kunit tests that call it directly are updated accordingly. lfsr113 is a linear PRNG and MUST NOT be used for anything security-sensitive. DAMON's sampling_addr is not exposed to userspace and is only consumed as a probe point for PTE accessed-bit sampling, so a non-cryptographic PRNG is appropriate here. Tested with paddr monitoring and max_nr_regions=20000: kdamond CPU usage reduced from ~72% to ~50% of one core. Link: https://lore.kernel.org/20260505145212.108644-1-jiayuan.chen@linux.dev Link: https://lore.kernel.org/damon/20260426173346.86238-1-sj@kernel.org/T/#m4f1fd74112728f83a41511e394e8c3fef703039c Link: https://lore.kernel.org/20260509011816.85145-1-sj@kernel.org Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Shu Anzai <shu17az@gmail.com> Cc: Quanmin Yan <yanquanmin1@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
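The range mapping mentioned above can be illustrated with the usual multiply-and-shift trick for 32-bit spans. This is a minimal sketch, not the DAMON helper itself: map_to_span() is a hypothetical name, the random input is passed in rather than drawn from lfsr113, and the 64-bit slow path is omitted.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a uniformly distributed 32-bit value into [0, span) without a
 * division: take the high 32 bits of the 64-bit product rnd * span.
 * The result is very slightly biased when span does not divide 2^32,
 * which is acceptable for choosing a sampling address.
 */
static uint32_t map_to_span(uint32_t rnd, uint32_t span)
{
	assert(span > 0);
	return (uint32_t)(((uint64_t)rnd * span) >> 32);
}
```

For a start address and a span, a sampling address would then be start + map_to_span(rnd, span); the smallest input maps to 0 and the largest to span - 1.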
3 daysfooAndrew Morton11-32/+91
3 daysinclude: remove unused cnt32_to_63.hCosta Shulyupin1-104/+0
All users have been removed over time as ARM and other architectures switched to generic sched_clock. The last user was microblaze, removed in commit 839396ab88e4 ("microblaze: timer: Use generic sched_clock implementation"). Assisted-by: Claude:claude-opus-4-6 Link: https://lore.kernel.org/20260515183429.1503740-1-costa.shul@redhat.com Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Nicolas Pitre <npitre@baylibre.com> Cc: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysraid6: hide internalsChristoph Hellwig2-88/+27
Split out two new headers from the public pq.h: - lib/raid/raid6/algos.h contains the algorithm lists private to lib/raid/raid6 - include/linux/raid/pq_tables.h contains the tables also used by async_tx providers. The public include/linux/pq.h is now limited to the public interface for the consumers of the RAID6 PQ API. [hch@lst.de: remove duplicate ccflags-y line] Link: https://lore.kernel.org/20260527074539.2292913-2-hch@lst.de Link: https://lore.kernel.org/20260518051804.462141-10-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # kunit only on arm64 Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Sterba <dsterba@suse.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Li Nan <linan122@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Song Liu <song@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysraid6: warn when using less than four devicesChristoph Hellwig1-0/+2
Quoting H. Peter Anvin who came up with the RAID6 P/Q algorithm, and who wrote the initial implementation, then still part of the md driver: The RAID-6 code has *never* supported only 3 units, and if it ever worked for *any* of the implementations it was purely by accident. Speaking as the original author I should know; this was deliberate as in some cases the degenerate case (3) would have required extra trays in the code to no user benefit. While md never allowed less than 4 devices, btrfs does. This new warning will trigger for such file systems, but given how it already causes havoc that is a good thing. If btrfs wants to fix this, it should switch to transparently use three-way mirroring underneath, which will work as P and Q are copies of the single data device by the definition of the Linux RAID 6 P/Q algorithm. Link: https://lore.kernel.org/20260518051804.462141-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # kunit only on arm64 Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Sterba <dsterba@suse.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Li Nan <linan122@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Song Liu <song@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
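The remark that P and Q are copies of the single data device follows from the syndrome definition, P = D_0 ^ D_1 ^ ... and Q = g^0*D_0 ^ g^1*D_1 ^ ... over GF(2^8) with g = {02}. The sketch below computes both for one byte position; it is a plain reference loop with hypothetical names, not the optimized library code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply by the generator {02} in GF(2^8), reduction polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t x)
{
	return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0));
}

/*
 * Compute the P and Q syndromes for one byte position across ndata
 * data devices, using Horner's rule from the highest device down.
 */
static void pq_syndrome_byte(const uint8_t *data, size_t ndata,
			     uint8_t *p, uint8_t *q)
{
	uint8_t wp, wq;
	size_t i;

	assert(ndata > 0);
	wp = data[ndata - 1];
	wq = wp;
	for (i = ndata - 1; i-- > 0; ) {
		wp ^= data[i];
		wq = gf_mul2(wq) ^ data[i];
	}
	*p = wp;
	*q = wq;
}
```

With a single data device the loop body never runs, so P and Q both equal the data byte, which is why three-way mirroring gives the same result for a three-device array.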
3 daysraid6: improve the public interfaceChristoph Hellwig1-9/+10
Stop directly calling into function pointers from users of the RAID6 PQ API, and instead provide exported functions with proper documentation and, where applicable, asserts for the API guarantees. Link: https://lore.kernel.org/20260518051804.462141-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # kunit only on arm64 Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Sterba <dsterba@suse.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Li Nan <linan122@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Song Liu <song@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysraid6: remove raid6_get_zero_pageChristoph Hellwig1-6/+0
Just open code it as in other places in the kernel. Link: https://lore.kernel.org/20260518051804.462141-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # kunit only on arm64 Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Sterba <dsterba@suse.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Li Nan <linan122@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Song Liu <song@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysraid6: remove unused defines in pq.hChristoph Hellwig1-6/+0
These are not used anywhere in the kernel. Link: https://lore.kernel.org/20260518051804.462141-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # kunit only on arm64 Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Sterba <dsterba@suse.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Li Nan <linan122@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Song Liu <song@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysraid6: remove __KERNEL__ ifdefsChristoph Hellwig1-90/+0
With the test code ported to kernel space, none of this is required. Link: https://lore.kernel.org/20260518051804.462141-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # kunit only on arm64 Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Sterba <dsterba@suse.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Li Nan <linan122@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Song Liu <song@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysraid6: turn the userspace test harness into a kunit testChristoph Hellwig1-3/+0
Patch series "cleanup the RAID6 P/Q library", v3. This series cleans up the RAID6 P/Q library to match the recent updates to the RAID 5 XOR library and other CRC/crypto libraries. This includes providing properly documented external interfaces, hiding the internals, using static_call instead of indirect calls and turning the user space test suite into an in-kernel kunit test which is also extended to improve coverage. Note that this changes registration so that non-priority algorithms are not registered, which greatly helps with the benchmark time at boot time. I'd like to encourage all architecture maintainers to see if they can further optimize this by registering as few algorithms as possible when there is a clear benefit in optimized or more unrolled implementations. This patch (of 18): Currently the raid6 code can be compiled as userspace code to run the test suite. Convert that to be a kunit case with minimal changes to avoid mutating global state so that we can drop this requirement. Note that this is not a good kunit test case yet and will need a lot more work, but that is deferred until the raid6 code is moved to its new place, which is easier if the userspace makefile doesn't need adjustments for the new location first.
Link: https://lore.kernel.org/20260518051804.462141-1-hch@lst.de Link: https://lore.kernel.org/20260518051804.462141-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> # kunit only on arm64 Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Mason <clm@fb.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Sterba <dsterba@suse.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Li Nan <linan122@huawei.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Song Liu <song@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayskcov: allow simultaneous KCOV_ENABLE/KCOV_REMOTE_ENABLEJann Horn2-2/+3
Allow the same userspace thread to simultaneously collect normal coverage in syscall context (KCOV_ENABLE) and remote coverage of asynchronous work created by the thread (KCOV_REMOTE_ENABLE). With this, remote KCOV coverage becomes useful for generic fuzzing and not just fuzzing of specific data injection interfaces. This requires that the task_struct::kcov_* fields are separated into ones that are used by the task that generates coverage, and ones that are used by the task that requested remote coverage. To split this up: - Split task_struct::kcov into kcov and kcov_remote. kcov_task_exit() now has to clean up both separately. - Only use task_struct::kcov_mode on the task that generates coverage. - Only reset task_struct::kcov_handle on the task that requested remote coverage. After this change, fields used by the task that generates coverage are: - kcov_mode - kcov_size - kcov_area - kcov - kcov_sequence - kcov_softirq Fields used by the task that requested remote coverage are: - kcov_remote - kcov_handle [jannh@google.com: remove unused constant KCOV_MODE_REMOTE, per Dmitry] Link: https://lore.kernel.org/20260515-kcov-simultaneous-remote-v2-1-56fde1cfa509@google.com [jannh@google.com: update documentation on remote coverage collection] Link: https://lore.kernel.org/20260519-kcov-docs-v1-1-5bb22f4cb20c@google.com [jannh@google.com: move and reword sentence on simultaneous normal/remote collection] Link: https://lore.kernel.org/20260520-kcov-docs-v2-1-819f78778763@google.com Link: https://lore.kernel.org/20260505-kcov-simultaneous-remote-v1-1-a670ba7cefd2@google.com Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysllist: make locking comments consistentPhilipp Stanner1-2/+2
llist's locking requirement table has a legend which claims that all operations not needing a lock are marked with '-', whereas in truth some table entries use just whitespace. Add the '-' in all appropriate places. Link: https://lore.kernel.org/20260507094918.23910-2-phasta@kernel.org Signed-off-by: Philipp Stanner <phasta@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayskcov: refactor common handle ID into kcov_common_handle_idJann Horn3-17/+15
Store common handle IDs in "struct kcov_common_handle_id", which consumes no space in non-KCOV builds. This cleanup removes #ifdef boilerplate code from subsystems that integrate with KCOV (in particular in usbip_common.h and skbuff.h, see the diffstat). This should also make it easier to add KCOV remote coverage to more subsystems in the future. Link: https://lore.kernel.org/20260430-kcov-refactor-common-handle-v1-1-23a0c7a0ba38@google.com Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Hongren (Zenithal) Zheng <i@zenithal.me> Cc: Jann Horn <jannh@google.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Valentina Manea <valentina.manea.m@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
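One way a wrapper struct can cost nothing in builds without the feature is an empty struct with no-op accessors. The sketch below relies on the GNU C extension that gives an empty struct size zero; SKETCH_KCOV, the struct name, and the accessors are all hypothetical and are not the names used by the patch.

```c
#include <assert.h>
#include <stdint.h>

#ifdef SKETCH_KCOV
/* Feature enabled: the wrapper really stores the handle. */
struct handle_id_sketch { uint64_t id; };

static inline void handle_id_set(struct handle_id_sketch *h, uint64_t id)
{
	assert(h);
	h->id = id;
}

static inline uint64_t handle_id_get(const struct handle_id_sketch *h)
{
	return h->id;
}
#else
/* Feature disabled: empty struct (GNU C: sizeof is 0), no-op accessors. */
struct handle_id_sketch { };

static inline void handle_id_set(struct handle_id_sketch *h, uint64_t id)
{
	assert(h);
	(void)id;
}

static inline uint64_t handle_id_get(const struct handle_id_sketch *h)
{
	(void)h;
	return 0;
}
#endif

/* A user embeds the wrapper unconditionally, with no #ifdef of its own. */
struct packet_sketch {
	uint32_t len;
	struct handle_id_sketch kcov_id;
};
```

Because the #ifdef lives in one place, structures such as packet_sketch and their users stay free of conditional compilation.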
3 daysuaccess: minimize INLINE_COPY_USER-related ifdeferyYury Norov1-14/+7
Now that we've got the same config selecting inline vs outline copy_to_user() and copy_from_user(), we can simplify the corresponding logic in the uaccess.h. Link: https://lore.kernel.org/20260425020857.356850-4-ynorov@nvidia.com Fixes: 1f9a8286bc0c ("uaccess: always export _copy_[from|to]_user with CONFIG_RUST") Signed-off-by: Yury Norov <ynorov@nvidia.com> Tested-by: Alice Ryhl <aliceryhl@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysuaccess: unify inline vs outline copy_{from,to}_user() selectionYury Norov2-8/+7
The kernel allows arches to select between inline and outline implementations of copy_{from,to}_user() by defining INLINE_COPY_FROM_USER and INLINE_COPY_TO_USER individually. However, all arches always enable or disable them together. Without a real use case for one helper being inlined while the other is outlined, having independent controls is excessive and error prone. Switch the codebase to the single unified INLINE_COPY_USER control. Link: https://lore.kernel.org/20260425020857.356850-3-ynorov@nvidia.com Signed-off-by: Yury Norov <ynorov@nvidia.com> Tested-by: Alice Ryhl <aliceryhl@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysinit.h: discard exitcall symbols earlyArnd Bergmann1-1/+1
Any __exitcall() and built-in module_exit() handler is marked as __used, which leads to the code being included in the object file and later discarded at link time. As far as I can tell, this was originally added at the same time as initcalls were marked the same way, to prevent them from getting dropped with gcc-3.4, but it was never actually necessary to keep exit functions around. Mark them as __maybe_unused instead, which lets the compiler treat the exitcalls as entirely unused, and make better decisions about dropping or specializing static functions called from these. Link: https://lore.kernel.org/all/acruxMNdnUlyRHiy@google.com/ Link: https://lore.kernel.org/20260331142846.3187706-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolas Schier <nsc@kernel.org> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Marco Elver <elver@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayshighmem-internal.h: fix typo in the comment for kunmap_atomic()Zhouyi Zhou1-1/+1
Replace `PREEMP_RT` with `PREEMPT_RT` in the header comment to match the correct kernel configuration name. Link: https://lore.kernel.org/20260505021125.1941691-1-zhouzhouyi@gmail.com Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon/core: remove damon_set_region_biggest_system_ram_default()SeongJae Park1-5/+0
Now nobody is using damon_set_region_biggest_system_ram_default(). Remove it. Link: https://lore.kernel.org/20260429041232.90257-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon: introduce damon_set_region_system_rams_default()SeongJae Park1-0/+5
Patch series "mm/damon/reclaim,lru_sort: monitor all system rams by default". DAMON_RECLAIM and DAMON_LRU_SORT set the biggest 'System RAM' resource of the system as the default monitoring target address range. The main intention behind the design is to minimize the overhead coming from monitoring of non-System RAM areas. This could result in an odd setup when there are multiple discrete System RAMs of considerable sizes. For example, there are System RAMs each having 500 GiB size. In this case, only the first 500 GiB will be set as the monitoring region by default. This is particularly common on NUMA systems. Hence the modules allow users to set the monitoring target address range using the module parameters if the default setup doesn't work for them. In other words, the current design trades ease of setup for lower overhead. However, because DAMON utilizes the sampling based access check and the adaptive regions adjustment mechanisms, the overhead from the monitoring of non-System RAM areas should be negligible in most setups. Meanwhile, the setup complexity is causing real headaches for users who need to run those modules on various types of systems. That is, the current tradeoff is not a good deal. Set the physical address range that can cover all System RAM areas of the system as the default monitoring regions for DAMON_RECLAIM and DAMON_LRU_SORT. Technically speaking, this is changing documented behavior. However, it makes no sense to believe there is a real use case that really depends on the old weird default behavior. If the old default behavior was working for them in the reasonable way, this change will only add a negligible amount of monitoring overhead. If it didn't work, the users may already be using manual monitoring regions setup, and they will not be affected by this change. Patches Sequence ================ Patch 1 introduces a new core function that will be used for the new default monitoring target region setup. 
Patch 2 and 3 update DAMON_RECLAIM and DAMON_LRU_SORT to use the new function instead of the old one, respectively. Patch 4 removes the old core function that was replaced by the new one, as there is no more user of it. Patch 5 updates DAMON_STAT to use the new one instead of its in-house nearly-duplicate self implementation of the functionality. Finally patches 6 and 7 update the DAMON_RECLAIM and DAMON_LRU_SORT user documentation for the new behaviors, respectively. This patch (of 7): damon_set_region_biggest_system_ram_default() sets the monitoring target region as the caller requested. If the caller didn't specify the region, it finds the biggest System RAM of the system and sets it as the target region. When there are more than one considerable size of System RAM resources in the system, the default target setup makes no sense. Introduce a variant, namely damon_set_region_system_rams_default(). It sets a physical address range that covers all System RAM resources as the default target region. Link: https://lore.kernel.org/20260429041232.90257-1-sj@kernel.org Link: https://lore.kernel.org/20260429041232.90257-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
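The difference between the two defaults can be sketched in a few lines. This is a simplified illustration rather than the DAMON implementation: `demo_range`, `demo_biggest()` and `demo_cover_all()` are hypothetical names, and the ranges are passed in as an array instead of being read from the kernel's resource tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical physical address range; 'end' is exclusive. */
struct demo_range {
	uint64_t start;
	uint64_t end;
};

/* Old-style default: the single biggest range (first one wins on a tie). */
static struct demo_range demo_biggest(const struct demo_range *r, size_t n)
{
	struct demo_range best;

	assert(r && n > 0);
	best = r[0];
	for (size_t i = 1; i < n; i++) {
		if (r[i].end - r[i].start > best.end - best.start)
			best = r[i];
	}
	return best;
}

/* New-style default: one range from the lowest start to the highest end. */
static struct demo_range demo_cover_all(const struct demo_range *r, size_t n)
{
	struct demo_range span;

	assert(r && n > 0);
	span = r[0];
	for (size_t i = 1; i < n; i++) {
		if (r[i].start < span.start)
			span.start = r[i].start;
		if (r[i].end > span.end)
			span.end = r[i].end;
	}
	return span;
}
```

For two discrete 500 GiB ranges, `demo_biggest()` returns only the first one, while `demo_cover_all()` returns a span that includes both, along with any hole between them.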
3 daysmm: skip KASAN tagging for page-allocated page tablesMuhammad Usama Anjum1-1/+1
Page tables are always accessed via the linear mapping with a match-all tag, so HW-tag KASAN never checks them. For page-allocated tables (PTEs and PGDs etc), avoid the tag setup and poisoning overhead by using __GFP_SKIP_KASAN. SLUB-backed page tables are unchanged for now. (They aren't widely used and require more SLUB-related skip logic, so leave that for later.) Link: https://lore.kernel.org/20260429102704.680174-4-dev.jain@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ben Segall <bsegall@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayskasan: skip HW tagging for all kernel thread stacksMuhammad Usama Anjum1-1/+1
HW-tag KASAN never checks kernel stacks because stack pointers carry the match-all tag, so setting/poisoning tags is pure overhead. - Add __GFP_SKIP_KASAN to THREADINFO_GFP so every stack allocator that uses it skips tagging (fork path plus arch users) - Add __GFP_SKIP_KASAN to GFP_VMAP_STACK for the fork-specific vmap stacks. - When reusing cached vmap stacks, skip kasan_unpoison_range() if HW tags are enabled. Software KASAN is unchanged; this only affects tag-based KASAN. Link: https://lore.kernel.org/20260429102704.680174-3-dev.jain@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ben Segall <bsegall@google.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysvmalloc: add __GFP_SKIP_KASAN supportMuhammad Usama Anjum1-3/+3
Patch series "kasan: hw_tags: Disable tagging for stack and page-tables", v4. Stacks and page tables are always accessed with the match-all tag, so assigning a new random tag at every allocation and setting an invalid tag at deallocation time just adds overhead without improving detection. With __GFP_SKIP_KASAN the page keeps its poison tag and KASAN_TAG_KERNEL (match-all tag) is stored in the page flags while keeping the poison tag in the hardware. The benefit is that 256 tag-setting instructions per 4 kB page aren't needed at allocation and deallocation time. Thus match-all pointers still work, while non-match tags (other than the poison tag) still fault. __GFP_SKIP_KASAN only skips for KASAN_HW_TAGS mode, so coverage is unchanged. Benchmark: The benchmark has two modes. In thread mode, the child process forks and creates N threads. In pgtable mode, the parent maps and faults a specified memory size and then forks repeatedly with children exiting immediately. Thread benchmark: 2000 iterations, 2000 threads: 2.575 s → 2.229 s (~13.4% faster) The pgtable samples: - 2048 MB, 2000 iters 19.08 s → 17.62 s (~7.6% faster) This patch (of 3): For allocations that will be accessed only with match-all pointers (e.g., kernel stacks), setting tags is wasted work. If the caller already set __GFP_SKIP_KASAN, skip tag setting of vmalloc pages. Before this patch, __GFP_SKIP_KASAN wasn't being used with vmalloc APIs, so it wasn't being checked. Now it is being checked and acted upon. Other KASAN modes are unchanged because __GFP_SKIP_KASAN is ignored for them in the page allocator, and in vmalloc too we ignore this flag for them. This is a preparatory patch for optimizing kernel stack allocations.
Link: https://lore.kernel.org/20260429102704.680174-1-dev.jain@arm.com Link: https://lore.kernel.org/20260429102704.680174-2-dev.jain@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com> Co-developed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Co-developed-by: Dev Jain <dev.jain@arm.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ben Segall <bsegall@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
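The figures quoted in the series description can be checked with a little arithmetic. The sketch below assumes a 16-byte tag granule (as in arm64 MTE) and one tag store per granule, neither of which is spelled out in the message; `DEMO_PAGE_SIZE`, `DEMO_TAG_GRANULE` and both helper functions are hypothetical names.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE   4096u
#define DEMO_TAG_GRANULE 16u	/* assumption: 16-byte granule, as in arm64 MTE */

/* Tag stores needed to (re)tag one page, at one store per granule. */
static unsigned int demo_tag_stores_per_page(void)
{
	return DEMO_PAGE_SIZE / DEMO_TAG_GRANULE;
}

/* Relative runtime reduction in percent: (before - after) / before. */
static double demo_faster_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Under these assumptions a 4 kB page needs 4096 / 16 = 256 tag stores, and the quoted runtimes work out to roughly 13.4% (2.575 s to 2.229 s) and 7.6% (19.08 s to 17.62 s), matching the message.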
3 daysmm/damon/core: introduce damon_ctx->pausedSeongJae Park1-0/+2
Patch series "mm/damon: let DAMON be paused and resumed", v2. DAMON utilizes a few mechanisms that enhance itself over time. Adaptive regions adjustment, goal-based DAMOS quota auto-tuning and monitoring intervals auto-tuning are examples of such self-training mechanisms. It also adds access frequency stability information (age) to the monitoring results, which makes it enhanced over time. Sometimes users have to stop DAMON. In this case, DAMON's internal state that was enhanced over the last execution simply goes away. A restarted DAMON has to train itself and enhance its output from scratch. This makes DAMON less useful in such cases. Three such use cases are introduced below. Investigation of DAMON. It is best to do the investigation online, especially when it is a production environment. DAMON therefore provides features for such online investigations, including DAMOS stats, monitoring result snapshot exposure, and multiple tracepoints. When those are insufficient, and there are additional clues that could be interfered with by DAMON, users have to temporarily stop DAMON to collect the additional clues. It is not very useful since many of DAMON's internal clues are gone when DAMON is stopped. The loss of the monitoring results that improved over time is also problematic, especially in production environments. Monitoring of workloads that have different user-known phases. For example, in Android, applications are known to have very different access patterns and behaviors when they are running in the foreground and in the background. It can therefore be useful to separate monitoring of apps based on whether they are running in the foreground or in the background. Having two DAMON threads per application that are paused and resumed on the app's foreground/background switches can be useful for the purpose. But such pause/resume of the execution is not supported. Tests of DAMON. A few DAMON selftests are using drgn to dump the internal DAMON status.
The tests show if the dumped status is the same as what the test code expected. Because DAMON keeps running and modifying its internal status, there are chances of data races that can cause false test results. Stopping DAMON can avoid the race. But, since the internal state of DAMON is dropped, the test coverage will be limited. Let DAMON execution be paused and resumed without loss of the internal state, to overcome the limitations. For this, introduce a new DAMON context parameter, namely 'pause'. API callers can update it while the context is running, using the online parameters update functions (damon_commit_ctx() and damon_call()). Once it is set, the kdamond_fn() main loop will do only limited work, excluding the monitoring and DAMOS work, while sleeping for a sampling interval per round of work. The limited work includes handling of the online parameters update. Hence users can unset the 'pause' parameter again. Once it is unset, the kdamond_fn() main loop will do all the work again (resumed). Under the paused state, it also does stop condition checks and handling of them, so that paused DAMON can also be stopped if needed. Expose the feature to the user space via DAMON sysfs interface. Also, update existing drgn-based tests to test and use the feature. Tests ===== I confirmed the feature functionality using real time tracing ('perf trace' or 'trace-cmd stream') of damon:damon_aggregated DAMON tracepoint. By pausing and resuming the DAMON execution, I was able to see the trace stop and continue as expected. Note that the pause feature support is added to DAMON user-space tool (damo) after v3.1.9. Users can use '--pause_ctx' command line option of damo for that, and I actually used it for my test. The extended drgn-based selftests are also testing a part of the functionality. Patches Sequence ================ Patch 1 introduces the new core API for the pause feature. Patch 2 extends DAMON sysfs interface for the new parameter.
Patches 3-5 update design, usage and ABI documents for the new sysfs file, respectively. The following five patches are for tests. Patch 6 implements a new kunit test for the pause parameter online commitment. Patches 7 and 8 extend DAMON selftest helpers to support the new feature. Patch 9 extends selftest to test the commitment of the feature. Finally, patch 10 updates existing selftest to be safe from the race condition using the pause/resume feature. This patch (of 10): DAMON supports only start and stop of the execution. When it is stopped, its internal data that it self-trained goes away. It will be useful if the execution can be paused and resumed with the previous self-trained data. Introduce per-context API parameter, 'paused', for the purpose. The parameter can be set and unset while DAMON is running and paused, using the online parameters commit helper functions (damon_commit_ctx() and damon_call()). Once 'paused' is set, the kdamond_fn() main loop does only limited work, sleeping for the sampling interval in between. The limited work includes the handling of the online parameters update, so that users can unset the 'pause' and resume the execution when they want. It also keeps checking the DAMON stop conditions and handling them, so that DAMON can be stopped while paused if needed. Link: https://lore.kernel.org/20260427151231.113429-1-sj@kernel.org Link: https://lore.kernel.org/20260427151231.113429-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
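The paused behaviour described above can be modelled as a small state machine. This is a userspace sketch, not kdamond_fn(): `demo_ctx`, its counters and `demo_loop_step()` are hypothetical, the ordering of the checks is an assumption, and the sampling-interval sleep is only marked with a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a monitoring context. */
struct demo_ctx {
	bool paused;			/* mirrors the 'paused' parameter */
	bool stop_requested;
	unsigned long nr_monitor_passes;
	unsigned long nr_param_updates;
};

/*
 * One main-loop iteration. 'new_paused' is an optional online parameter
 * update (NULL means no update). Returns false when the loop should exit.
 */
static bool demo_loop_step(struct demo_ctx *ctx, const bool *new_paused)
{
	assert(ctx != NULL);

	/* Stop conditions are still checked while paused. */
	if (ctx->stop_requested)
		return false;

	/* Online parameter updates are still handled while paused. */
	if (new_paused) {
		ctx->paused = *new_paused;
		ctx->nr_param_updates++;
	}

	/* A real loop would sleep for one sampling interval here. */

	if (ctx->paused)
		return true;	/* skip monitoring and scheme work */

	ctx->nr_monitor_passes++;	/* state accumulated so far is kept */
	return true;
}
```

Pausing leaves `nr_monitor_passes` untouched instead of resetting it, which is the point of the feature: the accumulated state survives until the context is resumed or stopped.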
3 daysmm: limit filemap_fault readahead to VMA boundariesFrederick Mayle1-0/+2
When a file mapping covers a strict subset of a file, an access to the mapping can trigger readahead of file pages outside the mapped region. Readahead is meant to prefetch pages likely to be accessed soon, but these pages aren't accessible via the same means, so it is fair to say we don't have a good indicator they'll be accessed soon. Take an ELF file for example: an access to the end of a program's read-only segment isn't a sign that nearby file contents will be accessed next (they are likely to be mapped discontiguously, or not at all). The pressure from loading these pages into the cache can evict more useful pages. To improve the behavior, make three changes: * Introduce a new readahead_control field, max_index, as a hard limit on the readahead. The existing file_ra_state->size can't be used as a limit; it is more of a hint and can be increased by various heuristics. * Set readahead_control->max_index to the end of the VMA in all of the readahead paths that can be triggered from a fault on a file mapping (both "sync" and "async" readahead). * Limit the read-around range start to the VMA's start. Note that these changes only affect readahead triggered in the context of a fault; they do not affect readahead triggered by read syscalls. If a user mixes the two types of accesses, the behavior is expected to be the following: if a fault causes readahead and places a PG_readahead marker and then a read(2) syscall hits the PG_readahead marker, the resulting async readahead *will not* be limited to the VMA end. Conversely, if a read(2) syscall places a PG_readahead marker and then a fault hits the marker, the async readahead *will* be limited to the VMA end. There is an edge case that the above motivation glosses over: A single file mapping might be backed by multiple VMAs. For example, a whole file could be mapped RW, then part of the mapping made RO using mprotect.
This patch would hurt performance of a sequential faulted read of such a mapping, the degree depending on how fragmented the VMAs are. A usage pattern like that is likely rare and already suffering from sub-optimal performance because, e.g., the fragmented VMAs limit the fault-around, so each VMA boundary in a sequential faulted read would cause a minor fault. Still, this patch would make it worse. See a previous discussion of this topic at [1]. Tested by mapping and reading a small subset of a large file, then using the cachestat syscall to verify the number of cached pages didn't exceed the mapping size. In practical scenarios, the effect depends on the specific file and usage. Sometimes there is no effect at all, but, for some ELF files in Android, we see ~20% fewer pages pulled into the cache. A comprehensive performance evaluation hasn't been done, but, in addition to the anecdotal memory savings mentioned above, a benchmark was run with fio 3.38, showing neutral-looking results: /data/local/tmp/fio --version fio --name=mmap_test --ioengine=mmap --rw=read --bs=4k \ --offset=1G --size=1G --filesize=3G --numjobs=1 \ --filename=testfile.bin Before: 4366.6 MiB/s (avg of 3459, 4592, 4613, 4697, 4472) After: 4444.0 MiB/s (avg of 4633, 4655, 4511, 4571, 3850) +1.7% Same, with --ioengine=mmap --rw=randread Before: 445.6 MiB/s (avg of 446, 447, 442, 452, 441) After: 447.0 MiB/s (avg of 447, 446, 446, 451, 445) +0.3% Same, with --ioengine=psync --rw=read Before: 3086.6 MiB/s (avg of 3122, 3094, 3066, 3094, 3057) After: 3084.6 MiB/s (avg of 3039, 3103, 3103, 3084, 3094) -0.06% Same, with --ioengine=psync --rw=randread Before: 2226.4 MiB/s (avg of 2256, 2183, 2207, 2265, 2221) After: 2231.4 MiB/s (avg of 2236, 2241, 2236, 2193, 2251) +0.2% Link: https://lore.kernel.org/20260427030148.653228-1-fmayle@google.com Link: https://lore.kernel.org/all/ivnv2crd3et76p2nx7oszuqhzzah756oecn5yuykzqfkqzoygw@yvnlkhjjssoz/ [1] Signed-off-by: Frederick Mayle <fmayle@google.com>
Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Kalesh Singh <kaleshsingh@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
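The clamping described in this commit message can be sketched as below. It is a simplified model rather than the filemap code: `demo_window` and `demo_clamp_readahead()` are hypothetical, page indexes are plain integers, and the requested window is assumed to overlap the VMA because the faulting page lies inside it. Only the name `max_index` is taken from the message.

```c
#include <assert.h>

/* Hypothetical readahead window in page indexes; 'last' is inclusive. */
struct demo_window {
	unsigned long start;
	unsigned long last;
};

/*
 * Clamp a requested window so it never starts before the first page
 * mapped by the VMA and never extends past max_index (the last page
 * mapped by the VMA).
 */
static struct demo_window demo_clamp_readahead(struct demo_window want,
					       unsigned long vma_first,
					       unsigned long max_index)
{
	assert(vma_first <= max_index);
	assert(want.start <= max_index && want.last >= vma_first);

	if (want.start < vma_first)
		want.start = vma_first;
	if (want.last > max_index)
		want.last = max_index;
	return want;
}
```

A window that already lies inside the VMA passes through unchanged, so mappings that cover the whole file see no difference.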