path: root/tools
Age | Commit message | Author | Files | Lines
2 days | Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git | Mark Brown | 2 | -307/+266
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc.git | Mark Brown | 31 | -179/+831
2 days | Merge branch 'for-next/kspp' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git | Mark Brown | 1 | -0/+1
2 days | Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git | Mark Brown | 6 | -122/+1338
2 days | Merge branch 'slab/for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git | Mark Brown | 3 | -17/+12
2 days | Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git | Mark Brown | 1 | -2/+11
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching.git | Mark Brown | 4 | -108/+150
2 days | Merge branch 'kunit' of https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git | Mark Brown | 3 | -2/+20
2 days | Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git | Mark Brown | 9 | -101/+181
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git (Conflicts: tools/testing/selftests/cgroup/test_memcontrol.c) | Mark Brown | 6 | -32/+54
2 days | Merge branch 'next' of https://github.com/awilliam/linux-vfio.git | Mark Brown | 11 | -53/+539
2 days | Merge branch 'togreg' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git (Conflicts: drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_buffer.c) | Mark Brown | 4 | -16/+71
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git | Mark Brown | 18 | -420/+1531
2 days | Merge branch 'next' of https://github.com/kvm-x86/linux.git (Conflicts: arch/x86/include/asm/tdx.h) | Mark Brown | 27 | -78/+703
2 days | Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git | Mark Brown | 2 | -6/+2
2 days | Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git | Mark Brown | 2 | -7/+4
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git | Mark Brown | 1 | -8/+52
2 days | Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux.git | Mark Brown | 3 | -0/+121
2 days | Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git (Conflicts: drivers/cpufreq/Kconfig.x86, drivers/cpufreq/Makefile) | Mark Brown | 22 | -689/+2282
2 days | Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm.git | Mark Brown | 4 | -0/+118
2 days | Merge branch 'master' of https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git | Mark Brown | 2 | -2/+0
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git | Mark Brown | 160 | -1100/+9386
2 days | Merge branch 'main' of https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git | Mark Brown | 59 | -682/+3489
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git | Mark Brown | 2 | -1/+77
2 days | Merge branch 'linux-next' of https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git | Mark Brown | 11 | -11/+11
2 days | Merge branch 'docs-next' of git://git.lwn.net/linux.git | Mark Brown | 3 | -11/+66
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/hid/hid.git | Mark Brown | 1 | -7/+0
2 days | Merge branch 'fs-next' of linux-next (Conflicts: fs/btrfs/defrag.c) | Mark Brown | 20 | -696/+936
2 days | Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git | Mark Brown | 6 | -13/+13
2 days | Merge branch 'for-next/core' of https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux | Mark Brown | 8 | -16/+276
2 days | Merge branch 'perf-tools-next' of https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git | Mark Brown | 221 | -3984/+10938
2 days | Merge branch 'mm-unstable' of https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Mark Brown | 54 | -1847/+2125
2 days | Merge branch 'mm-nonmm-stable' of https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Mark Brown | 7 | -63/+425
2 days | Merge branch 'mm-stable' of https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Mark Brown | 19 | -204/+608
2 days | Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock.git | Mark Brown | 4 | -12/+9
2 days | Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git | Mark Brown | 1 | -62/+43
3 days | perf test: Add truncated perf.data robustness test | Arnaldo Carvalho de Melo | 1 | -0/+86

Add a shell test that verifies perf report handles truncated perf.data files gracefully, exiting with an error code rather than crashing with SIGSEGV or SIGABRT.

The test records a simple workload, then truncates the resulting perf.data at four offsets that exercise different parsing stages:

  8 bytes:   file header magic only
  64 bytes:  partial file header (attr section incomplete)
  256 bytes: into the first events (partial event headers)
  75% size:  mid-stream truncation (partial event data)

For each truncation, perf report is run and the exit code is checked:

- Exit code 0 (success) fails the test: a truncated file should never parse without error.
- Crash signals are detected portably via kill -l, which maps the signal number to a name on the running system. This handles architectures where signal numbers differ (e.g. SIGBUS is 7 on x86/ARM but 10 on MIPS/SPARC). Core-dump and fatal signals (KILL, ILL, ABRT, BUS, FPE, SEGV, TRAP, SYS) fail the test.
- Higher exit codes (200+) are perf's own negative-errno returns (e.g. -EINVAL = 234) and are expected.

This exercises the bounds checking, minimum-size validation, and error propagation added by the preceding patches in this series.

Testing it:

  root@number:~# perf test truncat
  84: Test that perf report handles truncated perf.data gracefully (no crash, no segfault — clean error exit).: Ok
  root@number:~# perf test -vv truncat
  84: Test that perf report handles truncated perf.data gracefully (no crash, no segfault — clean error exit).:
  --- start ---
  test child forked, pid 62890
  ---- end(0) ----
  84: Test that perf report handles truncated perf.data gracefully (no crash, no segfault — clean error exit).: Ok
  root@number:~#

Changes in v2:

- Add SIGKILL to the list of fatal signals so OOM kills from resource exhaustion bugs are detected (Reported-by: sashiko-bot@kernel.org)

Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
[ Fixed the SPDX on the line where 'perf test' expects the test description, reviewed by Ian Rogers ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
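The "-EINVAL = 234" figure in the message above follows from exit statuses being truncated to 8 bits. A minimal sketch of that arithmetic, assuming EINVAL is 22 as on Linux; the helper name is hypothetical and not part of perf:

```c
#include <assert.h>

/*
 * Exit status a parent observes when a process returns a negative errno:
 * only the low 8 bits of the value survive.
 */
static int exit_status_from_errno(int err)
{
	assert(err > 0 && err < 256);
	/* Conversion to unsigned char is defined as reduction modulo 256. */
	return (unsigned char)(-err);
}
```

With err = 22 this gives 256 - 22 = 234, which is why the test treats codes in the 200+ range as clean error exits.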
3 days | perf session: Snapshot event->header.size in process_user_event() | Arnaldo Carvalho de Melo | 1 | -14/+13

On native-endian files, events are read from MAP_SHARED memory. Multiple reads of event->header.size can return different values if the file is concurrently modified, allowing an attacker to bypass bounds checks performed on an earlier read.

Snapshot header.size into a local variable at function entry using READ_ONCE() to prevent compiler rematerialization, and use it for all size-dependent arithmetic within the function. This ensures every bounds calculation uses the same value that was validated by the reader.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
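The snapshot pattern described above can be sketched in userspace C. This is an illustration only: READ_ONCE_U16 is a stand-in for the kernel-style READ_ONCE(), and struct ev_header and payload_len() are hypothetical names, not perf's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for READ_ONCE(): the volatile access forces one load. */
#define READ_ONCE_U16(x) (*(const volatile uint16_t *)&(x))

struct ev_header {		/* hypothetical, shaped like an event header */
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Return the payload length, or -1 if the claimed size is invalid. */
static long payload_len(const struct ev_header *hdr, size_t avail)
{
	uint16_t size;

	assert(hdr != NULL);
	size = READ_ONCE_U16(hdr->size);	/* single snapshot */

	/* Every check and computation below uses the same local copy. */
	if (size < sizeof(*hdr) || size > avail)
		return -1;
	return (long)(size - sizeof(*hdr));
}
```

Because the validated value and the value used in arithmetic are the same local variable, a concurrent change to the mapped file cannot slip in between the check and the use.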
3 days | perf kwork: Bounds check work->cpu before indexing cpus_runtime[] | Arnaldo Carvalho de Melo | 2 | -6/+40

work->cpu comes from sample->cpu which is (u32)-1 when PERF_SAMPLE_CPU is absent. Stored as int, this becomes -1 which passes the signed BUG_ON(work->cpu >= MAX_NR_CPUS) but causes an out-of-bounds access on cpus_runtime[-1].

Replace the BUG_ON in top_calc_total_runtime() with an unsigned bounds check that skips entries with invalid CPU values, counting them for a summary warning.

Guard the same index in profile_event_match() (bitmap OOB), top_calc_idle_time(), top_calc_irq_runtime(), top_calc_cpu_usage(), and top_calc_load_runtime().

Also guard against division by zero in top_calc_cpu_usage() when no runtime was accumulated.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Yang Jihong <yangjihong@bytedance.com>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
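The difference between the signed and the unsigned check can be shown with a small sketch. The MAX_NR_CPUS value and all function names below are assumptions made for illustration; they do not reproduce the perf kwork code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NR_CPUS 4096	/* illustrative value, assumed for this sketch */

/* Signed test: -1 is "less than" the limit, so it is accepted. */
static int signed_check_accepts(int cpu)
{
	return !(cpu >= MAX_NR_CPUS);
}

/* Unsigned test: -1 converts to UINT_MAX and is rejected. */
static int unsigned_check_accepts(int cpu)
{
	return (unsigned int)cpu < MAX_NR_CPUS;
}

/* Sum runtimes, skipping and counting entries whose CPU is out of range. */
static uint64_t total_runtime(const uint64_t *cpus_runtime, const int *cpus,
			      size_t n, size_t *nr_skipped)
{
	uint64_t total = 0;

	assert(cpus_runtime && cpus && nr_skipped);
	*nr_skipped = 0;
	for (size_t i = 0; i < n; i++) {
		if (!unsigned_check_accepts(cpus[i])) {
			(*nr_skipped)++;	/* reported later as a summary */
			continue;
		}
		total += cpus_runtime[cpus[i]];
	}
	return total;
}
```

A single unsigned comparison covers both negative values and values at or above the limit, which is why it can replace the signed BUG_ON.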
3 days | perf session: Bound nr_cpus_avail and validate sample CPU | Arnaldo Carvalho de Melo | 2 | -1/+117

Several downstream consumers (timechart, kwork, sched) use fixed-size arrays indexed by CPU. A crafted perf.data can supply arbitrary CPU values that index past these arrays, causing out-of-bounds access.

Validate sample.cpu against min(nr_cpus_avail, MAX_NR_CPUS) in perf_session__deliver_event() before any tool callback runs. The cap at MAX_NR_CPUS protects fixed-size downstream arrays; the true nr_cpus_avail is preserved in env for header parsing (e.g. process_cpu_topology) which needs the real count. Fall back to MAX_NR_CPUS when HEADER_NRCPUS is missing (truncated files, pipe mode, pre-2017 perf).

Only validate when PERF_SAMPLE_CPU is set in sample_type. When absent, evsel__parse_sample() leaves sample.cpu as (u32)-1, a sentinel that downstream tools (script, inject) check to identify events without CPU info. Clamping it to 0 would break those checks.

Inline evlist__parse_sample() into perf_session__deliver_event() so the evsel lookup needed for sample_type checking reuses the same evsel that parsed the sample, avoiding a second evlist__event2evsel() call on every event.

For pipe-mode streams where HEADER_NRCPUS may arrive late or not at all, the MAX_NR_CPUS fallback ensures the bounds check is still effective against the fixed-size downstream arrays.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf session: Check for decompression buffer size overflow | Arnaldo Carvalho de Melo | 1 | -0/+13

On 32-bit systems, sizeof(struct decomp) + decomp_len can wrap size_t when comp_mmap_len is large. The preceding patch validates comp_mmap_len alignment but does not cap the upper bound, so two additions can still overflow:

1. decomp_len += decomp_last_rem: on 32-bit, adding a u64 to size_t silently truncates, producing a corrupted decomp_len that would bypass the subsequent overflow check and result in an undersized buffer allocation.

2. sizeof(struct decomp) + decomp_len: the final addition could overflow on systems with small size_t.

Add explicit overflow checks before each addition as defense-in-depth.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
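The two guarded additions listed above can be sketched as follows. The function name and parameters are hypothetical; header_size stands in for sizeof(struct decomp).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute header_size + len + rem into *out.
 * Returns false instead of wrapping if either addition would overflow size_t.
 */
static bool decomp_alloc_size(size_t header_size, size_t len, uint64_t rem,
			      size_t *out)
{
	assert(out != NULL);

	/* First addition: len + rem must fit in size_t (u64 may be wider). */
	if (rem > SIZE_MAX - len)
		return false;
	len += (size_t)rem;

	/* Second addition: header_size + len must fit as well. */
	if (header_size > SIZE_MAX - len)
		return false;

	*out = header_size + len;
	return true;
}
```

Each check is phrased as a subtraction from SIZE_MAX so that the test itself cannot overflow.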
3 days | perf tools: Harden compressed event processing | Arnaldo Carvalho de Melo | 2 | -1/+54

Add several hardening checks to the compressed event decompression pipeline:

1. Guard against decomp_last_rem underflow: check that decomp_last->head does not exceed decomp_last->size before subtracting. A u64 underflow here would produce a huge decomp_len, causing an oversized mmap allocation.

2. Validate comp_mmap_len from the HEADER_COMPRESSED feature section: reject values that are not 4K-aligned or smaller than 4096. The downstream decompression path checks allocation sizes against SIZE_MAX, which handles 32-bit safety.

3. Validate COMPRESSED event header size: reject events where header.size is too small to contain the fixed struct fields, preventing underflow in the payload size calculation.

4. Validate COMPRESSED2 event data_size: check that data_size does not exceed the available payload (header.size minus the fixed struct fields) for the newer compressed format.

5. Reject compressed events when the HEADER_COMPRESSED feature is missing from the file header, which means no decompression context was initialized.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf session: Add byte-swap handler for PERF_RECORD_COMPRESSED2 | Arnaldo Carvalho de Melo | 1 | -0/+9

PERF_RECORD_COMPRESSED2 events carry a data_size field that must be byte-swapped when reading cross-endian perf.data files. Without a swap handler, reading COMPRESSED2 events on a different-endian machine would misinterpret data_size as a garbage value, causing the decompression path to read the wrong number of bytes.

The compressed payload itself is a raw byte stream and needs no swapping.

Fixes: 208c0e16834472bb ("perf record: Add 8-byte aligned event type PERF_RECORD_COMPRESSED2")
Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chun-Tse Shao <ctshao@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf header: Validate bitmap size before allocating in do_read_bitmap() | Arnaldo Carvalho de Melo | 1 | -5/+29

do_read_bitmap() reads a u64 bit count from the file and passes it to bitmap_zalloc() without checking it against the remaining section size. A crafted perf.data could trigger a large allocation that would only fail later when the per-element reads exceed section bounds.

Additionally, bitmap_zalloc() takes an int parameter, so a crafted size with bits set above bit 31 (e.g. 0x100000040) would pass the section bounds check but truncate when passed to bitmap_zalloc(), allocating a much smaller buffer than the subsequent read loop expects.

Reject size values that exceed INT_MAX, and check that the data needed (BITS_TO_U64(size) u64 values) fits in the remaining section before allocating.

Switch from bitmap_zalloc() to calloc() of u64 units so the allocation size matches the u64 read/write granularity and avoids unsigned long vs u64 mismatch on 32-bit architectures.

Fix do_write_bitmap() to use memcpy to read u64-sized chunks from the unsigned long bitmap, preventing out-of-bounds reads on 32-bit systems where sizeof(unsigned long) is 4 but the bitmap is stored in u64 units.

Fix process_mem_topology() minimum section size: the check used nr * 2 * sizeof(u64) per node, but do_read_bitmap() reads an additional u64 for the bitmap size, so the minimum is 3 * sizeof(u64).

Fix memory leak in process_mem_topology() error paths: replace free(nodes) with memory_node__delete_nodes() to free per-node bitmaps allocated by do_read_bitmap().

Currently used by process_mem_topology() for HEADER_MEM_TOPOLOGY.

Fixes: a881fc56038a ("perf header: Sanity check HEADER_MEM_TOPOLOGY")
Closes: https://lore.kernel.org/linux-perf-users/20260414224622.2AE69C19425@smtp.kernel.org/
Closes: https://lore.kernel.org/linux-perf-users/20260410223242.DD76FC19421@smtp.kernel.org/
Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf header: Sanity check HEADER_EVENT_DESC attr.size before swap | Arnaldo Carvalho de Melo | 1 | -0/+54

read_event_desc() reads nre (event count), sz (attr size), and nr (IDs per event) from the file and uses them to control allocations and loops without validating them against the section size. A crafted perf.data could trigger large allocations or many loop iterations before __do_read() eventually rejects the reads.

Add bounds checks in read_event_desc():

- Reject sz smaller than PERF_ATTR_SIZE_VER0.
- Require at least one event (nre > 0).
- Check that nre events fit in the remaining section, using the minimum per-event footprint of sz + sizeof(u32).
- Pre-swap attr->size to native byte order, then reject values below PERF_ATTR_SIZE_VER0 or above sz before calling perf_event__attr_swap() to prevent heap out-of-bounds access.
- Handle ABI0 (attr.size == 0): substitute PERF_ATTR_SIZE_VER0, and on native-endian files write the value back so free_event_desc() does not treat the zero as its end-of-array sentinel (it iterates while attr.size != 0). The swap path skips the write-back because perf_event__attr_swap() has its own ABI0 fallback that sets VER0 after swapping.
- Check that nr IDs fit in the remaining section before allocating.

Fixes: b30b61729246 ("perf tools: Fix a problem when opening old perf.data with different byte order")
Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf header: Validate feature section size and add read path bounds checking | Arnaldo Carvalho de Melo | 1 | -9/+57

Harden feature section parsing against crafted perf.data files:

1. perf_header__process_sections() reads the feature section table and passes each section's offset and size directly to the processing callbacks without validating them against the actual file size. A crafted section size would make all downstream bounds checks against ff->size ineffective since they compare against the untrusted, inflated bound. Add an fstat() check with S_ISREG() guard and verify that each section's offset + size does not extend past EOF.

2. __do_read_buf() validates reads against ff->size (section size), but __do_read_fd() had no such check, so a malformed perf.data with an understated section size could cause reads past the end of the current section into the next section's data. Add the bounds check in __do_read(), the common caller of both helpers, so it is enforced uniformly for both the fd and buf paths. Track the section-relative offset in __do_read_fd() so the check works for the fd path. Reject negative sizes which on 32-bit can occur when a u32 >= 0x80000000 is passed as ssize_t.

3. do_read_string() relied on file data being null-padded. Add explicit null-termination (buf[len-1] = '\0') after reading and validate length (>= 1, fits within section) before allocating, so callers like process_cpu_topology() never receive an unterminated string.

4. Initialize feat_fd.offset to 0 (section-relative) instead of section->offset (file-absolute) so the bounds tracking is consistent with __do_read()'s section-relative comparison. Adjust process_build_id() to use lseek() for its file-absolute offset needs since it cannot rely on ff->offset for that.

5. Propagate ff->size to perf_file_section__fprintf_info() so its reads are also bounded.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
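Item 2 above, a single bounds check using a section-relative offset and rejecting negative sizes, might look roughly like this on a memory buffer. struct section_reader and bounded_read() are hypothetical stand-ins, not perf's struct feat_fd or __do_read(); ssize_t assumes a POSIX system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t (POSIX) */

struct section_reader {		/* hypothetical stand-in for the section state */
	const unsigned char *buf;
	size_t size;		/* section size */
	size_t offset;		/* section-relative read position */
};

/* Copy size bytes out of the section, or return -1 without reading. */
static int bounded_read(struct section_reader *r, void *dst, ssize_t size)
{
	assert(r && dst && r->offset <= r->size);

	/* Negative sizes and reads past the section end are both rejected. */
	if (size < 0 || (size_t)size > r->size - r->offset)
		return -1;

	memcpy(dst, r->buf + r->offset, (size_t)size);
	r->offset += (size_t)size;
	return 0;
}
```

Comparing against size - offset rather than computing offset + size avoids an overflow in the check itself.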
3 days | perf header: Validate f_attr.ids section before use in perf_session__read_header() | Arnaldo Carvalho de Melo | 1 | -1/+76

perf_session__read_header() reads f_attr.ids.size from the perf.data file and divides it by sizeof(u64) to compute nr_ids, which is declared as int. No validation is performed on the value before it is used to allocate arrays and drive a read loop.

On 32-bit architectures, a crafted f_attr.ids.size of 0x100000000 (4 GB) produces nr_ids = 0x20000000, but the allocation size 1 * 0x20000000 * 8 overflows size_t to 0, so zalloc(0) returns a valid pointer. The subsequent loop writes 0x20000000 IDs into that zero-length buffer, corrupting the heap.

On 64-bit, the u64-to-int truncation silently drops high bits, processing fewer IDs than the file claims. While not exploitable, this is a data integrity issue.

Add validation before using f_attr.ids:

- Cap nr_attrs (attrs.size / attr_size) to MAX_NR_ATTRS (1 << 16) with overflow-safe u64 comparison before assigning to int
- Reject ids.size not aligned to sizeof(u64)
- Cap ids.size / sizeof(u64) to MAX_IDS_PER_ATTR (1 << 24) to prevent int truncation and size_t overflow on 32-bit
- Reject ids sections that extend past the end of the file, guarded by S_ISREG() so non-regular files (block devices, pipes) are not falsely rejected

Also fix perf_header__getbuffer64() to set errno = EIO when readn() returns 0 (EOF). Without this, the out_errno path in perf_session__read_header() returns -errno which is 0 (success) on truncated files, causing downstream NULL dereferences.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf header: Propagate feature section processing errors | Arnaldo Carvalho de Melo | 1 | -13/+38

perf_session__read_header() discards the return value from perf_header__process_sections(), so any error from a feature section processor (process_nrcpus, process_compressed, etc.) is silently ignored and the session opens as if nothing went wrong.

This defeats the validation added by subsequent commits in this series: a crafted perf.data that fails a feature section check would still be processed with partially-initialized state.

Check the return value and fail the session if any feature section processor returns an error.

For truncated files (data.size == 0, i.e. recording was interrupted before the header was finalized), skip feature section processing entirely and clear the feature bitmap so tools use their "feature not present" fallbacks instead of accessing uninitialized env fields.

Change the feature processor stubs for optional libraries (libtraceevent, libbpf) from returning -1 to returning 0, so that perf.data files containing these features can still be opened on builds without the optional library: the feature is simply skipped rather than causing a fatal error.

Also propagate evlist__prepare_tracepoint_events() failure as -ENOMEM, since the function can fail due to strdup() allocation failure inside evsel__prepare_tracepoint_event().

Fixes: 1c0b04d12ae9 ("perf tools: Add perf_session__read_header function")
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf tools: Bounds check perf_event_attr fields against attr.size before printing | Arnaldo Carvalho de Melo | 2 | -50/+114

perf_event_attr__fprintf() accessed all struct fields unconditionally, but attrs from older perf.data files or BPF-captured syscall payloads may have a smaller size than the current struct. Fields beyond the recorded size contain uninitialized or zero-filled data.

Add size-guarded macros (PRINT_ATTRn, PRINT_ATTRn_bf) that compare each field's offset against attr->size before accessing it.

Guard the bitfield block (disabled, inherit, ... defer_output) with attr_size >= 48. These bitfields share a single __u64 at offset 40, which is within PERF_ATTR_SIZE_VER0 for validated perf.data attrs, but BPF-captured attrs from perf trace can have a smaller size when the tracee passes a minimal struct to sys_perf_event_open.

Also fix the BPF trace path: when perf trace intercepts sys_perf_event_open via BPF, the program copies PERF_ATTR_SIZE_VER0 bytes when the tracee passes size=0, but leaves the size field as 0. Set attr->size to PERF_ATTR_SIZE_VER0 in the augmented syscall handler so the bounds checks match the actual copied size.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
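A size-guarded field access of the kind described above could be sketched as follows. struct mini_attr and ATTR_HAS are hypothetical and much smaller than the real perf_event_attr and PRINT_ATTRn macros; this sketch checks that the whole field ends within the recorded size, which is an assumption about the exact comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct mini_attr {		/* hypothetical, far smaller than perf_event_attr */
	uint32_t type;
	uint32_t size;		/* bytes of this struct the producer filled in */
	uint64_t config;
	uint64_t sample_period;
};

/* True if the whole field lies within the recorded size. */
#define ATTR_HAS(a, field) \
	(offsetof(struct mini_attr, field) + sizeof((a)->field) <= (a)->size)

/* Print only the fields the recorded size covers; return how many. */
static int mini_attr_fprintf(FILE *fp, const struct mini_attr *a)
{
	int printed = 0;

	assert(fp && a);
	if (ATTR_HAS(a, config)) {
		fprintf(fp, "config: %llu\n", (unsigned long long)a->config);
		printed++;
	}
	if (ATTR_HAS(a, sample_period)) {
		fprintf(fp, "sample_period: %llu\n",
			(unsigned long long)a->sample_period);
		printed++;
	}
	return printed;
}
```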
3 days | perf header: Validate null-termination in PERF_RECORD_EVENT_UPDATE string fields | Arnaldo Carvalho de Melo | 2 | -7/+242

strdup(ev->unit) and strdup(ev->name) read until '\0' with no guarantee the string is null-terminated within event->header.size. The dump_trace fprintf path has the same problem with %s.

Validate before either path runs. This is the same class of bug fixed for MMAP/MMAP2/COMM/CGROUP by perf_event__check_nul().

Also harden the event_update swap handler to:

- Validate SCALE event size before swapping the double at offset 24, which exceeds the 24-byte min_size.
- Validate CPUS event size before accessing the cpu_map type/nr/long_size fields, which also start at the min_size boundary.
- Swap CPUS variant fields (type, nr, long_size) so the processing path sees native byte order.

Add validation in perf_event__process_event_update() for all event update variants (UNIT, NAME, SCALE, CPUS) before dump_trace or processing.

Validate CPUS nr against payload size for both PERF_CPU_MAP__CPUS and PERF_CPU_MAP__MASK types on the fprintf (dump_trace) path:

- CPUS: check nr does not exceed available cpu entries
- MASK: check nr does not exceed available mask entries for both mask32 (long_size == 4) and mask64 (long_size == 8) layouts, with underflow guards on the offsetof subtraction

Fix a missing break before the default case in the CPUS switch path.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
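Checking that a string is terminated inside the event, before handing it to strdup() or %s, can be done with memchr(). This is a generic sketch with hypothetical helper names; it is not the body of perf_event__check_nul().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Bytes available for a string that starts at str_off inside an event. */
static size_t str_avail(size_t event_size, size_t str_off)
{
	return event_size > str_off ? event_size - str_off : 0;
}

/* True if a '\0' occurs within the first avail bytes of s. */
static bool str_is_terminated(const char *s, size_t avail)
{
	assert(s != NULL);
	return avail > 0 && memchr(s, '\0', avail) != NULL;
}
```

The subtraction is guarded so that an event shorter than the string's offset yields zero available bytes rather than a wrapped value.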
3 days | perf session: Add byte-swap and bounds check for PERF_RECORD_BPF_METADATA events | Arnaldo Carvalho de Melo | 1 | -1/+88

PERF_RECORD_BPF_METADATA has no entry in perf_event__swap_ops[], so its nr_entries field is never byte-swapped when reading a cross-endian perf.data file. Downstream processing in perf_event__fprintf_bpf_metadata() loops over nr_entries, so a foreign-endian value causes out-of-bounds reads.

Add a swap handler that byte-swaps nr_entries after validating that header.size is large enough. The entries[] array contains only char arrays (key/value strings), so no per-entry swap is needed, but ensure NUL-termination on the writable cross-endian path.

Validate header.size, nr_entries, and string NUL-termination in the common event delivery path so that native-endian files with malicious values are also rejected.

Snapshot nr_entries via READ_ONCE() before validation, because the event is on a MAP_SHARED mmap that could theoretically change between the bounds check and the loop.

Changes in v2:

- Snapshot event->header.size via READ_ONCE() into a local variable to prevent a double-fetch underflow in the max_entries calculation (Reported-by: sashiko-bot@kernel.org)
- Write back clamped nr_entries to the event on the swap path, consistent with NAMESPACES and STAT_CONFIG handlers; without writeback the native path sees the inflated nr and skips the event entirely (Reported-by: sashiko-bot@kernel.org)

Fixes: ab38e84ba9a8 ("perf record: collect BPF metadata from existing BPF programs")
Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf auxtrace: Harden auxtrace_error event handling | Arnaldo Carvalho de Melo | 3 | -9/+61

Fix four issues in PERF_RECORD_AUXTRACE_ERROR handling:

1. auxtrace_error_name() takes a signed int parameter, but e->type is __u32. A crafted value like 0xFFFFFFFF converts to -1, passes the bounds check, and causes a negative array index. Fix by changing the parameter to unsigned int.

2. The msg field is printed via %s without a length bound. The min_size table only guarantees fields up to msg (offset 48), so a truncated event has zero msg bytes within the event boundary. Compute the available msg length from header.size, cap at sizeof(e->msg), and use %.*s.

3. fmt >= 2 adds machine_pid and vcpu fields after msg[64]. Older files may have fmt >= 2 but an event size that doesn't include these fields. Add a size check in the swap handler to downgrade fmt before the conditional field access, and a matching size guard in the fprintf path for native-endian events (which are mmap'd read-only and can't be modified in place).

4. python_process_auxtrace_error() had the same issues: msg was passed to tuple_set_string() unbounded, and machine_pid/vcpu were accessed unconditionally without checking fmt or event size. Apply the same bounds checks.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
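Item 2 above, bounding a possibly unterminated msg with %.*s, can be sketched like this. struct aux_err is a deliberately simplified, hypothetical layout (the real msg field sits at offset 48 according to the message), and format_msg() is not a perf function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct aux_err {		/* hypothetical simplified layout */
	uint16_t size;		/* total event size claimed by the header */
	char msg[64];		/* may lack a terminating NUL */
};

/* Format msg into out, never reading past the event or past msg[]. */
static int format_msg(const struct aux_err *e, char *out, size_t outlen)
{
	size_t off = offsetof(struct aux_err, msg);
	size_t avail;

	assert(e && out);
	/* Bytes of msg that actually lie inside the event. */
	avail = e->size > off ? e->size - off : 0;
	if (avail > sizeof(e->msg))
		avail = sizeof(e->msg);

	/* The precision limits how many bytes %s may read. */
	return snprintf(out, outlen, "%.*s", (int)avail, e->msg);
}
```

With a precision given, printf-family functions stop at that many bytes even if no NUL is found, so the unterminated buffer is safe to print.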
3 days | perf cpumap: Reject RANGE_CPUS with start_cpu > end_cpu | Arnaldo Carvalho de Melo | 1 | -17/+45

cpu_map__from_range() computes nr_cpus as end_cpu - start_cpu + 1. When a crafted perf.data has start_cpu > end_cpu, this wraps to a huge value, causing perf_cpu_map__empty_new() to attempt a massive allocation. Return NULL when the range is inverted.

Also clamp any_cpu to boolean (0 or 1) since it is added to the allocation count: a crafted value > 1 would inflate the map size.

Harden cpu_map__from_mask() to reject unsupported long_size values (anything other than 4 or 8), preventing misinterpretation of the mask data layout.

Snapshot mmap'd fields via READ_ONCE() into locals to prevent TOCTOU re-reads: the data pointer references MAP_SHARED mmap'd memory that could theoretically change between reads on a FUSE-backed file:

- cpu_map__from_range(): snapshot start_cpu, end_cpu, any_cpu
- cpu_map__from_entries(): snapshot nr and each cpu[i] element
- cpu_map__from_mask(): snapshot long_size (before validation, closing the check-then-read gap), mask_nr
- perf_record_cpu_map_data__read_one_mask(): add u16 long_size parameter so callers pass the validated copy instead of re-reading data->mask32_data.long_size from mmap'd memory

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 days | perf header: Byte-swap build ID event pid and bounds check section entries | Arnaldo Carvalho de Melo | 2 | -5/+72

perf_header__read_build_ids() swaps the event header fields for cross-endian perf.data files but not bev.pid. This causes perf_session__findnew_machine() to look up the wrong machine for guest VM build IDs, misattributing them. Swap bev.pid alongside the header fields.

Also add a build_id_swap callback for stream-mode build ID events, and validate NUL-termination of build_id.filename on the native-endian delivery path (perf_session__process_user_event); events with unterminated filenames are skipped.

Harden perf_header__read_build_ids() against crafted perf.data files:

- Add overflow check on offset + size to prevent wrap past ULLONG_MAX.
- Reject bev.header.size == 0 which would loop forever.
- Reject bev.header.size > remaining section to prevent reading past the section boundary.
- Guard memcmp(filename, "nel.kallsyms]", 13) with len >= 13 to avoid reading uninitialized stack memory on short filenames.
- Force NUL-termination of filename before passing it to functions like machine__findnew_dso() that use strlen/strcmp.

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
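Two of the checks listed above, rejecting zero or oversized record sizes while walking a section and guarding the 13-byte memcmp(), might be sketched as below. The record format (a bare 16-bit native-endian size prefix) and both function names are hypothetical simplifications of the build ID section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count size-prefixed records in a section, or return -1 if malformed. */
static int count_records(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);
	while (off < len) {
		uint16_t size;

		if (len - off < sizeof(size))
			return -1;	/* not even a size field left */
		memcpy(&size, buf + off, sizeof(size));
		if (size < sizeof(size))
			return -1;	/* covers size == 0, which would never advance */
		if (size > len - off)
			return -1;	/* record runs past the section */
		off += size;
		n++;
	}
	return n;
}

/* Compare against the 13-byte tail only when that many bytes exist. */
static bool has_kallsyms_tail(const char *filename, size_t len)
{
	return len >= 13 && memcmp(filename, "nel.kallsyms]", 13) == 0;
}
```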
3 daysperf session: Validate nr fields against event size on both swap and common ↵Arnaldo Carvalho de Melo1-19/+234
paths Several event types use an nr field to control iteration over variable-length arrays. The swap handlers byte-swap and loop using these fields without bounds checks, and the native processing path trusts them as well. Add bounds checks on both paths for: - PERF_RECORD_THREAD_MAP: validate nr against payload, return -1 on the swap path. On the native path, reject with -EINVAL. - PERF_RECORD_NAMESPACES: clamp nr on the swap path (safe because each entry is indexed by type; missing entries just won't be resolved). Skip the event on the native path. - PERF_RECORD_CPU_MAP: clamp nr for CPUS and MASK sub-types on the swap path. Add bounds checks for mask64 which previously had no nr validation. Skip the event on the native path. - PERF_RECORD_STAT_CONFIG: clamp nr on the swap path (safe because each config entry is self-describing via its tag). Skip the event on the native path. The swap path (cross-endian, writable MAP_PRIVATE mapping) can safely clamp by writing back to the event. The native path (read-only MAP_SHARED mapping) must skip instead of clamping because writing to the mmap'd event would segfault. Also fix stat_config swap range: change size += 1 to size += sizeof(event->stat_config.nr) for clarity. The old +1 happened to work because mem_bswap_64 processes 8-byte chunks, but the intent is to include the 8-byte nr field in the swap range. Changes in v2: - Document that PERF_RECORD_NAMESPACES max_nr includes trailing sample_id space when sample_id_all is present — harmless on the swap path because both per-element bswap_64 and swap_sample_id_all() perform the same u64 byte swap (Reported-by: sashiko-bot@kernel.org) Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
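The clamp-versus-reject split between the swap path and the native path can be illustrated with a few small helpers. All names here are hypothetical; the sketch only assumes that an event has a fixed-size prefix followed by nr equally sized entries.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many entries of entry_size bytes fit in an
 * event of event_size bytes after a fixed_size prefix.
 */
static uint64_t max_entries(size_t event_size, size_t fixed_size,
			    size_t entry_size)
{
	if (entry_size == 0 || event_size < fixed_size)
		return 0;
	return (event_size - fixed_size) / entry_size;
}

/* Swap path sketch: the mapping is writable, so nr can be clamped. */
static uint64_t clamp_nr(uint64_t nr, uint64_t max)
{
	return nr > max ? max : nr;
}

/* Native path sketch: the mapping is read-only, so reject instead. */
static int check_nr(uint64_t nr, uint64_t max)
{
	return nr > max ? -1 : 0;
}
```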
3 daysperf session: Validate HEADER_ATTR attr.size before swappingArnaldo Carvalho de Melo5-15/+166
Harden PERF_RECORD_HEADER_ATTR handling against crafted perf.data: - Validate attr.size: must be >= PERF_ATTR_SIZE_VER0, a multiple of sizeof(u64), and fit within the event payload. - Copy only min(attr.size, sizeof(struct perf_event_attr)) bytes into a local attr, zeroing the rest so legacy files don't leak adjacent event data into new fields. - Keep the original attr.size so perf_event__synthesize_attr() uses it for both allocation and ID-array placement. Fix perf_event__synthesize_attr() to use attr->size (not the compiled sizeof) for event allocation and layout, so perf inject correctly re-synthesizes attrs from files recorded by a different perf version. Without this, the ID array destination pointer (computed via perf_record_header_attr_id()) would be inconsistent with the allocation when attr->size differs from sizeof. Also fix the parse-no-sample-id-all test to set attr.size, which is now validated, and improve error handling in read_attr() for short reads and invalid attr sizes. Handle ABI0 pipe/inject events where attr.size is 0: use a local attr_size variable set to PERF_ATTR_SIZE_VER0 for both the bounded copy and ID array position, instead of writing back to the event. Native-endian files may be MAP_SHARED (read-only mmap), so writing to the event buffer would SIGSEGV. The swap path handles ABI0 in perf_event__attr_swap() which writes to the MAP_PRIVATE copy. header.size alignment is now validated centrally in perf_session__process_event() (see "Add minimum event size and alignment validation"). Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
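The three attr.size conditions named above (at least PERF_ATTR_SIZE_VER0, a multiple of sizeof(u64), and within the payload) can be expressed as one predicate. This is a sketch with a hypothetical function name; it assumes the caller has already substituted the VER0 size for the ABI0 case where attr.size is 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as PERF_ATTR_SIZE_VER0 in the perf_event UAPI header. */
#define ATTR_SIZE_VER0_SKETCH 64u

/*
 * Hypothetical predicate: attr_size must cover the first ABI version,
 * be a whole number of u64 words, and fit inside the event payload.
 */
static bool attr_size_valid(uint32_t attr_size, size_t payload_size)
{
	if (attr_size < ATTR_SIZE_VER0_SKETCH)
		return false;
	if (attr_size % sizeof(uint64_t))
		return false;
	return attr_size <= payload_size;
}
```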
3 daysperf session: Use bounded copy for PERF_RECORD_TIME_CONVArnaldo Carvalho de Melo1-1/+8
session->time_conv = event->time_conv copies sizeof(struct perf_record_time_conv) bytes unconditionally, but older kernels emit shorter TIME_CONV events without the time_cycles, time_mask, cap_user_time_zero, and cap_user_time_short fields. For a 32-byte event (the original format), this reads 24 bytes past the event boundary into adjacent mmap'd data. The garbage values end up in session->time_conv and can cause incorrect TSC conversion if cap_user_time_zero happens to be non-zero. Replace the struct assignment with a bounded memcpy capped at event->header.size, zeroing the remainder so extended fields default to off when absent. Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
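A bounded copy that zeroes the remainder, as this commit describes, can be sketched generically. `bounded_copy()` is a hypothetical name; in the commit's terms dst would be session->time_conv and src_size would be event->header.size.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most dst_size bytes from a source that
 * may be shorter than the destination struct, and leave any fields the
 * source does not contain zeroed.
 */
static void bounded_copy(void *dst, size_t dst_size,
			 const void *src, size_t src_size)
{
	size_t n = src_size < dst_size ? src_size : dst_size;

	memset(dst, 0, dst_size);	/* absent fields default to zero */
	memcpy(dst, src, n);		/* never read past the source */
}
```

Zeroing first means a short, older-format event leaves the extended fields off rather than filled with whatever follows the event in the mapping.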
3 daysperf session: Add validated swap infrastructure with null-termination checksArnaldo Carvalho de Melo1-81/+325
Change swap callbacks from void to int return so handlers can propagate errors. All 28 existing handlers are converted to return 0 on success, -1 on error. Three new handlers (KSYMBOL, BPF_EVENT, HEADER_FEATURE) are added returning int from the start, with sample_id_all handling for the kernel event types. event_swap() propagates the return to its callers (process_event and peek_event), which skip events that fail to swap. Add perf_event__check_nul() for null-termination enforcement on the common event delivery path for MMAP, MMAP2, COMM, CGROUP, and KSYMBOL events. Events with unterminated strings are skipped — native-endian files are mapped read-only, so writing a NUL byte in place would segfault. Swap handler hardening: - Use strnlen bounded by event size (instead of strlen) in COMM/MMAP/MMAP2/CGROUP swap handlers, returning -1 on unterminated strings. - Bounds check text_poke old_len+new_len before computing the sample_id offset, returning -1 on overflow. Use offsetof() for the native-path check in machines__deliver_event() since sizeof() includes struct padding past the flexible array. - Fix PERF_RECORD_SWITCH sample_id_all: non-CPU_WIDE SWITCH events have sample_id immediately after the 8-byte header, not at sizeof(struct perf_record_switch) which is the CPU_WIDE variant size. - Fix perf_event__time_conv_swap(): decouple time_cycles and time_mask into independent per-field event_contains() checks, so each field is only swapped when the event is large enough to contain it. The original code guarded both fields under a single time_cycles check, which would swap time_mask on a short event that contains time_cycles but not time_mask. - Handle ABI0 (attr.size == 0) in perf_event__attr_swap() by substituting PERF_ATTR_SIZE_VER0, so bswap_safe() correctly swaps VER0 fields instead of skipping everything. - peek_events: on swap failure, advance past the malformed entry instead of aborting the loop. 
Note: the nr-field bounds checks for namespaces, thread_map, cpu_map, and stat_config arrays are added by a subsequent patch ("perf session: Validate nr fields against event size on both swap and common paths"). The HEADER_ATTR attr.size validation is added by ("perf session: Validate HEADER_ATTR attr.size before swapping"). By establishing the int-returning swap infrastructure first, all subsequent hardening patches can use direct error returns from day one — no poison values, no workarounds for void return. Changes in v2: - peek_events: abort instead of skip for AUXTRACE events on validation failure — skipping only header.size would land inside the raw trace payload, causing subsequent iterations to misparse data as events (Reported-by: sashiko-bot@kernel.org) Fixes: 9aa0bfa370b2 ("perf tools: Handle PERF_RECORD_KSYMBOL") Fixes: 45178a928a4b ("perf tools: Handle PERF_RECORD_BPF_EVENT") Fixes: e9def1b2e74e ("perf tools: Add feature header record to pipe-mode") Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
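The NUL-termination check on string fields can be sketched as below. The commit uses strnlen bounded by the event size; this sketch uses memchr() as a portable standard-C equivalent, and `string_is_terminated()` is a hypothetical name, not perf_event__check_nul() itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if a NUL byte occurs within the first
 * max_len bytes of str, i.e. the string ends inside the event.
 * max_len would be the event size minus the offset of the string.
 */
static bool string_is_terminated(const char *str, size_t max_len)
{
	return max_len > 0 && memchr(str, '\0', max_len) != NULL;
}
```

Because the check only reads, it works on a read-only mapping; an event that fails it is skipped rather than patched in place.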
3 daysperf session: Fix swap_sample_id_all() crash on crafted eventsArnaldo Carvalho de Melo1-3/+11
swap_sample_id_all() calls BUG_ON(size % sizeof(u64)) which kills perf on any event where the sample_id_all tail is not 8-byte aligned. A crafted perf.data can trigger this trivially. Replace BUG_ON with a bounds check: skip the swap if the data pointer is past the end of the event, and only swap when there are bytes remaining. Note: the strlen calls in string-field swap handlers (comm, mmap, mmap2, cgroup) are replaced with bounded strnlen by the next patch in this series ("perf session: Add validated swap infrastructure with null-termination checks"). Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
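Replacing an assertion with a bounds check on the tail might look like the following sketch. The names are hypothetical, and rounding the tail down to whole 8-byte words is this sketch's own choice, not necessarily what the patch does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable 64-bit byte swap used by the sketch. */
static uint64_t bswap64_sketch(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
	    ((v >> 8) & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) |
	    ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/*
 * Hypothetical helper: byte-swap the u64 words between tail_off and
 * the end of the event. Returns the number of words swapped; returns
 * 0 without touching memory if tail_off is at or past the end.
 */
static size_t swap_tail(unsigned char *event, size_t event_size,
			size_t tail_off)
{
	size_t words, i;

	if (tail_off >= event_size)
		return 0;

	words = (event_size - tail_off) / sizeof(uint64_t);
	for (i = 0; i < words; i++) {
		unsigned char *p = event + tail_off + i * sizeof(uint64_t);
		uint64_t v;

		memcpy(&v, p, sizeof(v));	/* avoids unaligned access */
		v = bswap64_sketch(v);
		memcpy(p, &v, sizeof(v));
	}
	return words;
}
```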
3 daysperf session: Fix PERF_RECORD_READ swap and dump for variable-length eventsArnaldo Carvalho de Melo1-17/+44
The kernel dynamically sizes PERF_RECORD_READ based on attr.read_format: only the fields enabled by PERF_FORMAT_TOTAL_TIME_ENABLED, PERF_FORMAT_TOTAL_TIME_RUNNING, PERF_FORMAT_ID, and PERF_FORMAT_LOST are emitted, packed with no gaps. perf_event__read_swap() unconditionally byte-swapped time_enabled, time_running, and id at their fixed struct offsets, causing out-of-bounds access on smaller events and swapping the wrong bytes when not all format fields are present. It also swapped sample_id_all at a fixed offset past the full struct, which is wrong for shorter events. Replace the individual field swaps with a single mem_bswap_64() over the entire tail from value onward. Since every field after pid/tid is u64 regardless of which combination is present, this correctly handles any read_format combination and any trailing sample_id_all fields. Similarly, dump_read() accessed optional fields via fixed struct offsets, displaying values from wrong positions when not all format bits are set. Walk the packed u64 array sequentially instead, with bounds checks against event->header.size. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
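Walking the packed u64 array sequentially, with a bounds check before each optional field, can be sketched as follows. The struct and function names are hypothetical; the bit values mirror the PERF_FORMAT_* flags, and the field order assumed is the non-group read layout (value, time_enabled, time_running, id, lost).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit values match PERF_FORMAT_* in the perf_event UAPI header. */
#define FMT_TOTAL_TIME_ENABLED	(1u << 0)
#define FMT_TOTAL_TIME_RUNNING	(1u << 1)
#define FMT_ID			(1u << 2)
#define FMT_LOST		(1u << 4)

/* Hypothetical container for the decoded fields. */
struct read_fields_sketch {
	uint64_t value, time_enabled, time_running, id, lost;
};

/* Take the next word if one is left, otherwise report truncation. */
static int take_u64(const uint64_t *tail, size_t nr_words, size_t *idx,
		    uint64_t *dst)
{
	if (*idx >= nr_words)
		return -1;
	*dst = tail[(*idx)++];
	return 0;
}

/*
 * Hypothetical parser: fields appear back to back, and only those
 * enabled in read_format are present. Returns -1 on a short event.
 */
static int parse_read_tail(const uint64_t *tail, size_t nr_words,
			   uint64_t read_format,
			   struct read_fields_sketch *out)
{
	size_t idx = 0;

	memset(out, 0, sizeof(*out));
	if (take_u64(tail, nr_words, &idx, &out->value))
		return -1;
	if ((read_format & FMT_TOTAL_TIME_ENABLED) &&
	    take_u64(tail, nr_words, &idx, &out->time_enabled))
		return -1;
	if ((read_format & FMT_TOTAL_TIME_RUNNING) &&
	    take_u64(tail, nr_words, &idx, &out->time_running))
		return -1;
	if ((read_format & FMT_ID) &&
	    take_u64(tail, nr_words, &idx, &out->id))
		return -1;
	if ((read_format & FMT_LOST) &&
	    take_u64(tail, nr_words, &idx, &out->lost))
		return -1;
	return 0;
}
```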
3 daysperf zstd: Fix multi-iteration decompression and error handlingArnaldo Carvalho de Melo1-4/+16
zstd_decompress_stream() has two bugs in its multi-iteration loop:

1. After each ZSTD_decompressStream() call, the code advances output.dst by output.pos but doesn't reset output.pos to 0. ZSTD interprets output.pos relative to output.dst, so the next iteration writes at (dst + pos) + pos = dst + 2*pos, skipping a gap and potentially writing out of bounds.

2. On ZSTD_decompressStream() error, the loop executes break and returns output.pos (which is > 0 if some bytes were decompressed before the error). The caller checks !decomp_size and skips the error, silently accepting truncated or corrupted data.

Fix both by removing the output buffer adjustment: ZSTD correctly accumulates output.pos across calls without it. Return 0 on decompression error so the caller detects it. Add a no-progress guard to prevent infinite loops if the output buffer fills before all input is consumed.

Note: the compressed event data_size is validated against header.size by a subsequent patch in this series ("perf tools: Harden compressed event processing").

Reported-by: sashiko-bot@kernel.org # Running on a local machine
Reviewed-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
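The loop shape described here (let pos accumulate, return 0 on error, stop on no progress) can be illustrated without libzstd. Everything below is a stand-in: `mock_step()` is not the zstd API, it only imitates the convention that a step writes at dst + pos and advances pos. Breaking out on no progress is this sketch's choice.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for streaming buffers; pos is relative to dst/src. */
struct out_buf_sketch { unsigned char *dst; size_t size; size_t pos; };
struct in_buf_sketch { const unsigned char *src; size_t size; size_t pos; };

/*
 * Mock "decompressor": copies at most 4 bytes per call, writing at
 * dst + pos and advancing both positions. Returns nonzero on error
 * (this mock never fails).
 */
static int mock_step(struct out_buf_sketch *out, struct in_buf_sketch *in)
{
	size_t n = in->size - in->pos;
	size_t room = out->size - out->pos;

	if (n > 4)
		n = 4;
	if (n > room)
		n = room;
	memcpy(out->dst + out->pos, in->src + in->pos, n);
	out->pos += n;
	in->pos += n;
	return 0;
}

/*
 * Loop sketch: dst is never adjusted between calls, so pos alone
 * tracks how much output exists. Returns 0 on error so the caller
 * cannot mistake partial output for success.
 */
static size_t stream_all(unsigned char *dst, size_t dst_size,
			 const unsigned char *src, size_t src_size)
{
	struct out_buf_sketch out = { dst, dst_size, 0 };
	struct in_buf_sketch in = { src, src_size, 0 };

	while (in.pos < in.size) {
		size_t in_before = in.pos, out_before = out.pos;

		if (mock_step(&out, &in))
			return 0;
		/* No-progress guard: output full, input remaining. */
		if (in.pos == in_before && out.pos == out_before)
			break;
	}
	return out.pos;
}
```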
3 daysperf zstd: Fix compression error path in zstd_compress_stream_to_records()Arnaldo Carvalho de Melo2-3/+30
The error fallback does memcpy(dst, src, src_size) intending to store uncompressed data when compression fails, but this has three bugs: 1. dst has been advanced past the record header (and potentially past earlier compressed records), so the copy writes to the wrong offset in the output buffer. 2. src still points to the start of the input, not to the remaining uncompressed data at src + input.pos. On a second or later iteration, previously compressed data would be duplicated. 3. No check that dst_size >= src_size — if the remaining output space is smaller, this is an out-of-bounds write. Replace with return -1 after resetting the ZSTD compression context via ZSTD_initCStream(). The -1 propagates through zstd_compress() -> record__pushfn() -> perf_mmap__push() to the recording loop, which breaks out and terminates recording. Add an out_child_no_flush label in __cmd_record() so the mmap-read failure path skips the final record__mmap_read_all() flush — retrying the same read that just failed would just fail again, and the flush is only useful when the mmap data is intact but the control path (auxtrace, switch_output) had an error. Consolidate all error paths through a single 'reset' label to ensure the compression context is always reset on failure — including the output-buffer-full path, where a bare return without resetting would leave stale stream state that corrupts output if the caller retries. Also guard against process_header() writing the event header before the buffer-full check: add a sizeof(perf_event_header) pre-check so the callback never writes past the output buffer. Guard against ZSTD making no progress: if output.pos is zero after ZSTD_compressStream(), calling process_header(record, 0) would re-trigger header initialization, double-subtracting the header size from dst_size and underflowing the unsigned counter. 
Also fix two pre-existing issues in the same function: - Add a dst_size guard before subtracting the record header size: if the output buffer is nearly full, the unsigned dst_size -= size underflows to a huge value, causing ZSTD_compressStream to write past the buffer boundary. - Check the ZSTD_initCStream() return value and log an error if the context reset itself fails. Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
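The dst_size guard mentioned above comes down to checking before an unsigned subtraction. A minimal sketch, with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: reserve header_size bytes from the remaining
 * output space. Refuses when the space is too small, because an
 * unsigned "remaining -= header_size" would wrap to a huge value.
 */
static bool reserve_header(size_t *remaining, size_t header_size)
{
	if (*remaining < header_size)
		return false;
	*remaining -= header_size;
	return true;
}
```

On failure the remaining count is left untouched, so the caller can take its error path with consistent state.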
3 daysperf tools: Fix event_contains() macro to verify full field extentArnaldo Carvalho de Melo5-5/+12
event_contains() checked whether a field's start offset was within the event (header.size > offsetof), but not whether the full field fit. A crafted event with header.size = offsetof(field) + 1 would pass the check, but an 8-byte access (bswap_64, direct read) would overrun the event boundary by up to 7 bytes. Fix the macro to verify the complete field: header.size >= offsetof(field) + sizeof(field) Also update all callers that check event_contains(time_cycles) but access later fields (time_mask, cap_user_time_zero, cap_user_time_short) to check for cap_user_time_short — the last field accessed — so the entire extended block is verified: tsc.c, arm-spe.c, cs-etm.c, jitdump.c. Note: session.c's perf_event__time_conv_swap() also guards on time_cycles but accesses time_mask — a pre-existing issue not introduced by this macro change. It is fixed by a later patch in this series ("perf session: Add validated swap infrastructure with null-termination checks"), which decouples time_cycles and time_mask into independent per-field event_contains() checks. The struct assignment overread (session->time_conv = event->time_conv copies sizeof on a potentially shorter event) is separately fixed by "perf session: Use bounded copy for PERF_RECORD_TIME_CONV". Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
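The difference between the old and new checks is easy to see side by side. The macros below take the struct type explicitly to stay within standard C, so they are an illustration rather than the event_contains() macro itself; the struct is a hypothetical stand-in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event layout: 8-byte header plus two u64 fields. */
struct ev_sketch {
	uint64_t header;
	uint64_t a;
	uint64_t b;
};

/* Old-style check: only the first byte of the field is inside. */
#define FIELD_STARTS(type, size, mem) \
	((size) > offsetof(type, mem))

/* New-style check: the complete field is inside the event. */
#define FIELD_FITS(type, size, mem) \
	((size) >= offsetof(type, mem) + sizeof(((type *)0)->mem))
```

With an event size of 17, field b (at offset 16) passes the old check but fails the new one, which is exactly the up-to-7-byte overrun the commit describes.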
3 daysperf session: Bounds-check one_mmap event pointer in peek_eventArnaldo Carvalho de Melo2-3/+28
perf_session__peek_event() computes an event pointer directly from file_offset when one_mmap is active, without verifying that file_offset and the subsequent event->header.size fall within the mapped region. A corrupted perf.data file could cause out-of-bounds memory reads. Add one_mmap_size to the session struct and validate both the header and full event fit within the mmap before dereferencing. Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
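Validating first the header and then the full event against the mapped size can be sketched as below. The header struct is a stand-in with the same field widths as struct perf_event_header, and `peek_sketch()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the 8-byte event header. */
struct hdr_sketch {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/*
 * Hypothetical helper: return a pointer to the event at offset off
 * only if both its header and its full header.size bytes lie inside
 * the mapping; otherwise return NULL.
 */
static const unsigned char *peek_sketch(const unsigned char *map,
					size_t map_size, size_t off)
{
	struct hdr_sketch h;

	/* Step 1: the header itself must be inside the mapping. */
	if (off > map_size || map_size - off < sizeof(h))
		return NULL;
	memcpy(&h, map + off, sizeof(h));

	/* Step 2: the whole event must be inside the mapping. */
	if (h.size < sizeof(h) || h.size > map_size - off)
		return NULL;
	return map + off;
}
```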
3 daysperf session: Add minimum event size and alignment validationArnaldo Carvalho de Melo1-33/+220
Add a per-type minimum size table (perf_event__min_size[]) and enforce it before swap and processing, so that both cross-endian and native-endian paths are protected from accessing fields past the event boundary. The table uses offsetof() for types with trailing variable-length fields (filenames, strings, msg arrays) and sizeof() for fixed-size types. Zero entries mean no minimum beyond the 8-byte header already enforced by the reader. Undersized events are skipped with a warning in process_event and rejected in peek_event — both checked before the swap handler runs, preventing OOB access on crafted event fields. Also reject events whose header.size is not 8-byte aligned. The kernel aligns all event sizes to sizeof(u64) — see perf_event_comm_event() (ALIGN), perf_event_mmap_event(), perf_event_cgroup(), perf_event_ksymbol() (IS_ALIGNED loops), and perf_event_text_poke() (ALIGN) in kernel/events/core.c. An unaligned size means the file is corrupted or crafted; reject early so downstream code that divides by sizeof(u64) to compute array element counts gets exact results. Three legacy user events are exempted from the alignment check: TRACING_DATA (66) had a 12-byte struct before commit b39c915a4f36 ("libperf event: Ensure tracing data is multiple of 8 sized") added padding, COMPRESSED (81) carries raw ZSTD output (already superseded by COMPRESSED2 with PERF_ALIGN), and HEADER_FEATURE (80) uses do_write_string() with a 4-byte length prefix. Also guard event_swap() against crafted event types >= PERF_RECORD_HEADER_MAX to prevent OOB reads on the perf_event__swap_ops[] array. 
Changes in v2: - Fix double-skip for unsupported event types: return 0 instead of event->header.size in perf_session__process_event() for HEADER_MAX, since reader__read_event() already advances by event->header.size (Reported-by: sashiko-bot@kernel.org) - Exempt TRACING_DATA, COMPRESSED, and HEADER_FEATURE from the alignment check — these legacy user events predate the 8-byte alignment rule (Reported-by: sashiko-bot@kernel.org) - peek_event: return 0 (skip) for unknown event types instead of -1 (error), consistent with process_event which already skips unsupported types gracefully (Reported-by: sashiko-bot@kernel.org) Reported-by: sashiko-bot@kernel.org # Running on a local machine Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
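A per-type minimum size table combined with an alignment check might be structured as in this sketch. The event types, structs, and table are hypothetical stand-ins, not perf_event__min_size[]; the sketch also omits the alignment exemptions and simply returns -1 for out-of-range types to show the array guard.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types and layouts for the sketch. */
enum { EV_NONE = 0, EV_FIXED = 1, EV_STRING = 2, EV_TYPE_MAX };

struct ev_hdr_sketch {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct ev_fixed_sketch {
	struct ev_hdr_sketch header;
	uint64_t a, b;
};

struct ev_string_sketch {
	struct ev_hdr_sketch header;
	uint32_t pid, tid;
	char name[];		/* trailing variable-length field */
};

/* sizeof() for fixed types, offsetof() up to the variable part. */
static const size_t min_size_sketch[EV_TYPE_MAX] = {
	[EV_FIXED]  = sizeof(struct ev_fixed_sketch),
	[EV_STRING] = offsetof(struct ev_string_sketch, name),
};

static int validate_event(const struct ev_hdr_sketch *h)
{
	/* Guard the table lookup against out-of-range types. */
	if (h->type >= EV_TYPE_MAX)
		return -1;
	/* Every event is at least a header and 8-byte aligned. */
	if (h->size < sizeof(*h) || h->size % sizeof(uint64_t))
		return -1;
	/* A zero table entry means no minimum beyond the header. */
	if (h->size < min_size_sketch[h->type])
		return -1;
	return 0;
}
```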
3 daysMerge branch 'fs-current' of linux-nextMark Brown1-0/+1
3 daysMerge branch 'mm-hotfixes-unstable' of ↵Mark Brown1-0/+128
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
3 daysnext-20260522/vfs-braunerMark Brown19-696/+929
# Conflicts: # fs/fuse/dev.c
3 daysMerge ring-buffer/for-nextSteven Rostedt (Google)50-88/+709
3 daysMerge branch 'acpica' into linux-nextRafael J. Wysocki10-10/+10
* acpica: (27 commits)
  ACPICA: add boundary checks in two places
  ACPICA: Add package limit checks in parser functions
  ACPICA: Update version to 20260408
  ACPICA: Update the copyright year to 2026
  ACPICA: Remove spurious precision from format used to dump parse trees
  ACPICA: Enhance OEM ID and Table ID validation in acpi_ex_load_table_op()
  ACPICA: Fix NULL pointer dereference in acpi_ns_custom_package()
  ACPICA: Enhance buffer validation in acpi_ut_walk_aml_resources()
  ACPICA: Add validation for node in acpi_ns_build_normalized_path()
  ACPICA: validate handler object type in two places
  ACPICA: Improve argument parsing in acpi_ps_get_next_simple_arg()
  ACPICA: Fix integer overflow in acpi_ex_opcode_3A_1T_1R() (mid_op)
  ACPICA: Prevent adding invalid references
  ACPICA: add boundary checks in acpi_ps_get_next_field()
  ACPICA: validate byte_count in acpi_ps_get_next_package_length()
  ACPICA: Fix use-after-free in acpi_ds_terminate_control_method()
  ACPICA: fix I2C LVR item count in the conversion table
  ACPICA: Mention the LVR bits
  ACPICA: Change LVR to 8 bit value
  ACPICA: Fetch LVR I2C resource descriptor
  ...
3 daysMerge branches 'pm-sleep', 'pm-powercap' and 'pm-tools' into linux-nextRafael J. Wysocki1-1/+1
* pm-sleep:
  PM: hibernate: Use flexible array for CRC uncompressed buffers
  PM: hibernate: make LZ4 available for hibernation compression
  PM: sleep: Use complete() in device_pm_sleep_init()
  PM: hibernate: call preallocate_image() after freeze prepare

* pm-powercap:
  powercap: intel_rapl: Fix memory leak in rapl_add_package_cpuslocked()

* pm-tools:
  PM: tools: pm-graph: fix ValueError when parsing incomplete device properties
3 daysMerge branch 'nfsd-next' of ↵Mark Brown1-0/+7
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux # Conflicts: # fs/exfat/file.c
3 daysMerge branch 'slab/for-7.2/alloc_bulk' into slab/for-nextVlastimil Babka (SUSE)2-12/+9
3 daysmm/slab: improve kmem_cache_alloc_bulkChristoph Hellwig2-12/+9
The kmem_cache_alloc_bulk() return value is odd. It returns the number of allocated objects, yet the implementations and the handling in the callers mean that number is always either 0 or the requested count. That assumption is not documented anywhere, which confuses automated review tools. Fix this by returning a bool that indicates whether the allocation succeeded, and by adding a kerneldoc comment explaining the API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> # skbuff
Link: https://patch.msgid.link/20260528093437.2519248-2-hch@lst.de
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
3 daysMerge branch 'vfs.fixes' of ↵Mark Brown1-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
3 daystools/mm/slabinfo: remove redundant slab->partial assignmentXuewen Wang1-1/+0
slab->partial is assigned by get_obj("partial") and then immediately overwritten by get_obj_and_str("partial", &t). Remove the first redundant assignment. Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn> Link: https://patch.msgid.link/20260518062159.80664-4-wangxuewen@kylinos.cn Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
3 daystools/mm/slabinfo: remove dead assignment in get_obj_and_str()Xuewen Wang1-3/+2
The assignment `x = NULL` sets the local parameter variable instead of `*x`, which is a no-op since `*x` was already set to NULL on the line above. Remove the dead assignment. Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Link: https://patch.msgid.link/20260518062159.80664-3-wangxuewen@kylinos.cn Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
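Why assigning to the parameter is a no-op while assigning through it is not can be shown with two tiny functions. Both names are hypothetical and unrelated to the slabinfo source.

```c
#include <stddef.h>

/* Assigns to the local copy of the pointer; the caller sees nothing. */
static void set_param_only(char **x)
{
	x = NULL;
	(void)x;	/* silence unused-value warnings in the sketch */
}

/* Assigns through the pointer; the caller's variable becomes NULL. */
static void set_pointee(char **x)
{
	*x = NULL;
}
```

C passes arguments by value, so only the dereferenced form can change what the caller holds.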
3 daystools/mm/slabinfo: Fix trace disable logic inversionXuewen Wang1-1/+1
The disable trace path in slab_debug() had a logic error where it would set trace=1 instead of trace=0. This made trace functionality permanently enabled once turned on for any slab cache. Fixes: a87615b8f9e2 ("SLUB: slabinfo upgrade") Cc: stable@vger.kernel.org Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn> WARNING: From:/Signed-off-by: email address mismatch: 'From: wangxuewen <18810879172@163.com>' != 'Signed-off-by: wangxuewen <wangxuewen@kylinos.cn>' Link: https://patch.msgid.link/20260518062159.80664-2-wangxuewen@kylinos.cn Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
3 daysMerge branch into tip/master: 'objtool/core'Ingo Molnar19-660/+1300
# New commits in objtool/core: 2d3bb398861a ("objtool/klp: Cache dont_correlate() result") fe6a87e0abac ("objtool: Improve and simplify prefix symbol detection") f7ceffd21a8a ("objtool/klp: Fix kCFI prefix finding/cloning") fc0bb9915bce ("objtool: Grow __cfi_* prefix symbols for all CFI+CALL_PADDING") cca84cb12908 ("objtool/klp: Fix position-dependent checksums for non-relocated jumps/calls") 3ee67629b2b7 ("objtool: Add insn_sym() helper") 5d6a03eeb717 ("objtool/klp: Add correlation debugging output") 6016dd33a10a ("objtool/klp: Rewrite symbol correlation algorithm") 873a2208ea31 ("objtool/klp: Calculate object checksums") 225d16dd510d ("klp-build: Validate short-circuit prerequisites") 3b8e56b86faa ("objtool/klp: Remove "objtool --checksum"") d4888d58041d ("klp-build: Use "objtool klp checksum" subcommand") e10764614ad6 ("objtool/klp: Add "objtool klp checksum" subcommand") a5b661233262 ("objtool: Consolidate file decoding into decode_file()") 30cae58cdc13 ("objtool/klp: Extricate checksum calculation from validate_branch()") 6282e9f46b4f ("objtool: Add is_cold_func() helper") 8eebd5731133 ("objtool: Add is_alias_sym() helper") ff0cf5efef40 ("objtool/klp: Handle Clang .data..Lanon anonymous data sections") 9e4512d7de5a ("objtool/klp: Create empty checksum sections for function-less object files") ac999926774a ("objtool: Include libsubcmd headers directly from source tree") 8d4cbb6d0caf ("objtool/klp: Don't set sym->file for section symbols") b6480aaedf3c ("klp-build: Remove redundant SRC and OBJ variables") e950d2a10a30 ("klp-build: Print "objtool klp diff" command in verbose mode") df0d7bb04a27 ("klp-build: Reject patches to realmode") d8c3e262361b ("klp-build: Reject patches to vDSO") f3048888ea62 ("klp-build: Fix patch cleanup on interrupt") 96524543740e ("klp-build: Suppress excessive fuzz output by default") b3ece3019e8e ("klp-build: Validate patch file existence") 946d3510fe19 ("klp-build: Don't use errexit") ba77fe55781a ("klp-build: Fix checksum comparison 
for changed offsets") cc39ccce7d5b ("klp-build: Fix hang on out-of-date .config") a375e327b63e ("objtool: Fix reloc hash collision in find_reloc_by_dest_range()") 5f49ec82b9f6 ("objtool/klp: Fix reloc corruption in convert_reloc_sym_to_secsym()") 51e1dfce24c8 ("objtool/klp: Don't correlate .rodata.cst* constant pool objects") d5b0f025281f ("objtool/klp: Fix pointer comparisons for rodata objects") 8fdc3585b3b0 ("objtool/klp: Simplify reloc symbol conversion") 3e01ab44af20 ("objtool: Move mark_rodata() to elf.c") 3787e82a4e3a ("objtool/klp: Fix relocation conversion failures for R_X86_64_NONE") da4326573ae8 ("objtool/klp: Fix kCFI trap handling") 62a7a01fde87 ("objtool/klp: Fix extraction of text annotations for alternatives") 479ac5260e7e ("objtool/klp: Fix XXH3 state memory leak") 98377f3ba7c0 ("objtool/klp: Fix cloning of zero-length section symbols") c4c02d4450b5 ("objtool/klp: Fix handling of zero-length .altinstr_replacement sections") def5b60dcd22 ("objtool/klp: Fix --debug-checksum for duplicate symbol names") 0333b7399587 ("objtool: Replace iterator callback with for_each_sym_by_mangled_name()") 3de711fba73a ("objtool/klp: Fix create_fake_symbols() skipping entsize-based sections") e872b3f13922 ("objtool/klp: Improve local label check") 76eb0f8639fb ("objtool/klp: Don't report uncorrelated functions as new") 0a7823d1d70d ("objtool/klp: Don't correlate __initstub__ symbols") 710c4c254688 ("objtool/klp: Don't correlate absolute symbols") 8edec016255d ("objtool/klp: Don't correlate __ADDRESSABLE() symbols") ff529864e738 ("objtool/klp: Fix .data..once static local non-correlation") 84c304a534b8 ("objtool/klp: Fix is_uncorrelated_static_local() for Clang") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'locking/core'Ingo Molnar2-2/+954
# New commits in locking/core:
88331c4ec23a ("seqlock: Allow UBSAN_ALIGNMENT to fail optimizing")
a9e4e50519e9 ("locking/rtmutex: Annotate API and implementation")
03240f5de2dd ("selftests/membarrier: Add rseq stress test for CFS throttle interactions")
a5959728548c ("sched/membarrier: Modernize membarrier_global_expedited with cleanup guards")
89976cd73739 ("sched/membarrier: Use per-CPU mutexes for targeted commands")
b00192d78bb4 ("locking/barrier: Use correct parameter names")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysMerge branch into tip/master: 'timers/merge'Ingo Molnar1-27/+28
# New commits in timers/merge: 3eb4923e6851 ("clocksource: Add devm_clocksource_register_*() helpers") c8d32a0389fb ("timers: Fix flseep() typo in kernel-doc comment") 5d330d652d7a ("hrtimer: Fix the bogus return type of __hrtimer_start_range_ns()") 3af1f49f415d ("hrtimer: Return ktime_t from hrtimer_get_next_event()/hrtimer_next_event_without()") 33d4bfc49613 ("clocksource: Clean up clocksource_update_freq() functions") ed3b3c497668 ("alarmtimer: Remove stale return description from alarm_handle_timer()") b00385b8d081 ("selftests/posix_timers: Use CLOCK_THREAD_CPUTIME_ID for ITIMER_PROF measurements") cab0cd0130eb ("scripts/timers: Add timer_migration_tree.py") 5a7dfbcbbdb6 ("timers/migration: Handle capacity in connect tracepoints") 098cbaad8e57 ("timers/migration: Split per-capacity hierarchies") 3ba25488380f ("timers/migration: Track CPUs in a hierarchy") ff65875f80d1 ("timers/migration: Abstract out hierarchy to prepare for CPU capacity awareness") ed78a7019419 ("alarmtimer: Remove unused interfaces") 12e4311aa5b2 ("netfilter: xt_IDLETIMER: Switch to alarm_start_timer()") 9fa2e38ab749 ("power: supply: charger-manager: Switch to alarm_start_timer()") 7dda99952ced ("fs/timerfd: Use the new alarm/hrtimer functions") f4b58f61da79 ("alarmtimer: Convert posix timer functions to alarm_start_timer()") 183d00b72713 ("alarmtimer: Provide alarm_start_timer()") acc071343d29 ("posix-timers: Switch to hrtimer_start_expires_user()") cfb7fe3fdd4c ("posix-timers: Handle the timer_[re]arm() return value") 6fdb2677a594 ("posix-timers: Expand timer_[re]arm() callbacks with a boolean return value") b40c927345a9 ("hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers") bd5956166d20 ("hrtimer: Provide hrtimer_start_range_ns_user()") 68ed094971b0 ("clocksource/drivers/timer-of: Make the code compatible with modules") 2423405880c2 ("clocksource/drivers/mmio: Make the code compatible with modules") fed9f727cc3f ("clocksource/drivers/sun5i: Handle error returns from 
devm_reset_control_get_optional_exclusive()") 045a9dac7eb7 ("clocksource/drivers/timer-rtl-otto: Make rttm_cs variable static") b385caf91868 ("dt-bindings: timer: fsl,imxgpt: add compatible string fsl,imx25-epit") Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 daysnet: Remove support for AIO on socketsDemi Marie Obenour1-1/+0
The only user of msg->msg_iocb was AF_ALG, but that's deprecated. It can be removed entirely at the cost of only supporting synchronous operations. This doesn't break userspace, which will silently block (for a bounded amount of time) in io_submit instead of operating asynchronously. This also makes struct msghdr smaller, helping every other caller of sendmsg(). Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
3 daysselftests/memfd: remove unused variable 'sig' in fuse_testKonstantin Khorenko1-1/+1
fuse_test.c: In function 'sealing_thread_fn': fuse_test.c:165:13: warning: unused variable 'sig' [-Wunused-variable] 165 | int sig, r; | ^~~ Remove unused 'sig' to fix -Wunused-variable warning. Link: https://lore.kernel.org/20260524193732.48853-3-eva.kurchatova@virtuozzo.com Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/memfd: fix -Wmaybe-uninitialized warning in memfd_testKonstantin Khorenko1-2/+2
Patch series "selftests/memfd: fix compilation warnings". This patchset fixes warnings about unused but initialized variables, and unused dummy buffer passed to pwrite() syscall in the tests. This patch (of 2): memfd_test.c: In function 'mfd_fail_grow_write.part.0': memfd_test.c:685:13: warning: '<unknown>' may be used uninitialized [-Wmaybe-uninitialized] 685 | l = pwrite(fd, buf, mfd_def_size * 8, 0); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pwrite() is declared with attribute 'access (read_only, 2, 3)', so GCC knows it reads from the buffer. malloc() returns uninitialized memory, hence the warning. Use calloc() to zero-initialize the buffer. The actual contents don't matter here since the test verifies that pwrite() fails on a sealed memfd. Link: https://lore.kernel.org/20260524193732.48853-1-eva.kurchatova@virtuozzo.com Link: https://lore.kernel.org/20260524193732.48853-2-eva.kurchatova@virtuozzo.com Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com> Signed-off-by: Eva Kurchatova <eva.kurchatova@virtuozzo.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: clarify alternate unmapping in compaction_testSayali Patil1-0/+3
Add a comment explaining that every other entry in the list is unmapped to intentionally create fragmentation with locked pages before invoking check_compaction(). Link: https://lore.kernel.org/da5e0a8d5152e54152c0d2f456aac2fac35af291.1779296493.git.sayalip@linux.ibm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: move hwpoison setup into run_test() and silence modprobe ↵Sayali Patil1-21/+41
output for memory-failure category run_vmtests.sh contains special handling to ensure the hwpoison_inject module is available for the memory-failure tests. This logic was implemented outside of run_test(), making the setup category-specific but managed globally. Move the hwpoison_inject handling into run_test() and restrict it to the memory-failure category so that: 1. the module is checked and loaded only when memory-failure tests run, 2. the test is skipped if the module or the debugfs interface (/sys/kernel/debug/hwpoison/) is not available. 3. the module is unloaded after the test if it was loaded by the script. This localizes category-specific setup and makes the test flow consistent with other per-category preparations. While updating this logic, fix the module availability check. The script previously used: modprobe -R hwpoison_inject The -R option prints the resolved module name to stdout, causing every run to print: hwpoison_inject in the test output, even when no action is required, introducing unnecessary noise. Replace this with: modprobe -n hwpoison_inject which verifies that the module is loadable without producing output, keeping the selftest logs clean and consistent. Also, ensure that skipped tests do not override a previously recorded failure. A skipped test currently sets exitcode to ksft_skip even if a prior test has failed, which can mask failures in the final exit status. Update the logic to only set exitcode to ksft_skip when no failure has been recorded. 
Link: https://lore.kernel.org/93441f34f7ef5add47d1a130d03daa79e21b5050.1779296493.git.sayalip@linux.ibm.com Fixes: ff4ef2fbd101 ("selftests/mm: add memory failure anonymous page test") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
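The exit-status rule described above (a skip must not mask an earlier failure) can be sketched as follows. This is only an illustration, not the code in run_vmtests.sh: record_result() is a hypothetical helper name, and the only value taken from kselftest is the skip status 4.

```shell
# Sketch only: record_result() is a hypothetical name, not a helper
# from run_vmtests.sh.
ksft_skip=4   # kselftest exit status for a skipped test
exitcode=0    # overall status reported at the end of the run

record_result() {
    ret=$1
    if [ "$ret" -eq 0 ]; then
        return 0                      # pass: leave exitcode untouched
    elif [ "$ret" -eq "$ksft_skip" ]; then
        # Record a skip only while no failure has been seen.
        if [ "$exitcode" -eq 0 ]; then
            exitcode=$ksft_skip
        fi
    else
        exitcode=1                    # a failure always wins
    fi
}
```

With this ordering, a failing test followed by a skipped test leaves exitcode at 1, while a run that only skips ends with 4.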
3 daysselftests/mm: use ksft_exit_skip() instead of KSFT_SKIP in uffd-stressSayali Patil1-2/+1
When nr_pages_per_cpu evaluates to zero, the test is skipped by printing a message and returning KSFT_SKIP manually. Replace this with ksft_exit_skip(), which prints the skip message and exits with the correct skip status in a single helper, making the code consistent with other selftests. Link: https://lore.kernel.org/88202b56-1dc5-43e2-9d1f-a0823a9531f0@linux.ibm.com Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: skip uffd-stress test when nr_pages_per_cpu is zeroSayali Patil1-3/+3
uffd-stress currently fails when the computed nr_pages_per_cpu evaluates to zero: nr_pages_per_cpu = bytes / page_size / nr_parallel This can occur on systems with large hugepage sizes (e.g. 1GB) and a high number of CPUs, where the total allocated memory is sufficient overall but not enough to provide at least one page per cpu. In such cases, the failure is due to insufficient test resources rather than incorrect kernel behaviour. Update the test to treat this condition as a test skip instead of reporting an error. Link: https://lore.kernel.org/0707e9a0f1b3dd904c4a069b91db317f9c160faa.1779296493.git.sayalip@linux.ibm.com Fixes: db0f1c138f18 ("selftests/mm: print some details when uffd-stress gets bad params") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
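A small worked example of the integer division above may help. The function name and the sample numbers are illustrative assumptions and are not taken from uffd-stress itself.

```shell
# Sketch only: mirrors the formula
#   nr_pages_per_cpu = bytes / page_size / nr_parallel
# using integer division, as the C test does.
nr_pages_per_cpu() {
    bytes=$1
    page_size=$2
    nr_parallel=$3
    echo $(( bytes / page_size / nr_parallel ))
}

# Hypothetical machine: 16 GiB of test memory, 32 CPUs.
gib=$(( 1 << 30 ))
mib=$(( 1 << 20 ))
nr_pages_per_cpu $(( 16 * gib )) "$gib" 32          # 1 GiB pages
nr_pages_per_cpu $(( 16 * gib )) $(( 2 * mib )) 32  # 2 MiB pages
```

With 1 GiB pages there are only 16 pages for 32 CPUs, so the quotient truncates to 0 and the test has nothing to run on; with 2 MiB pages the same memory gives 256 pages per CPU.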
3 daysselftests/mm: skip uffd-wp-mremap if UFFD write-protect is unsupportedSayali Patil1-0/+13
The uffd-wp-mremap test requires the UFFD_FEATURE_PAGEFAULT_FLAG_WP capability. On systems where userfaultfd write-protect is not supported, uffd_register() fails and the test reports failures. Check for the required feature at startup and skip the test when the UFFD_FEATURE_PAGEFAULT_FLAG_WP capability is not present, preventing false failures on unsupported configurations. Before patch: running ./uffd-wp-mremap ------------------------ [INFO] detected THP size: 256 KiB [INFO] detected THP size: 512 KiB [INFO] detected THP size: 1024 KiB [INFO] detected THP size: 2048 KiB [INFO] detected hugetlb page size: 2048 KiB [INFO] detected hugetlb page size: 1048576 KiB 1..24 [RUN] test_one_folio(size=65536, private=false, swapout=false, hugetlb=false) not ok 1 uffd_register() failed [RUN] test_one_folio(size=65536, private=true, swapout=false, hugetlb=false) not ok 2 uffd_register() failed [RUN] test_one_folio(size=65536, private=false, swapout=true, hugetlb=false) not ok 3 uffd_register() failed [RUN] test_one_folio(size=65536, private=true, swapout=true, hugetlb=false) not ok 4 uffd_register() failed [RUN] test_one_folio(size=262144, private=false, swapout=false, hugetlb=false) not ok 5 uffd_register() failed [RUN] test_one_folio(size=524288, private=false, swapout=false, hugetlb=false) not ok 6 uffd_register() failed . . . Bail out! 
24 out of 24 tests failed Totals: pass:0 fail:24 xfail:0 xpass:0 skip:0 error:0 [FAIL] not ok 1 uffd-wp-mremap # exit=1 After patch: running ./uffd-wp-mremap ------------------------ 1..0 # SKIP uffd-wp feature not supported [SKIP] ok 1 uffd-wp-mremap # SKIP Link: https://lore.kernel.org/c3c5af76d71d5f4446f773f4de94882efc33ebe4.1779296493.git.sayalip@linux.ibm.com Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: ensure destination is hugetlb-backed in hugetlb-mremapSayali Patil1-7/+4
The hugetlb-mremap selftest reserves the destination address using an anonymous base-page mapping before calling mremap() with MREMAP_FIXED, while the source region is hugetlb-backed. Remapping a hugetlb mapping into a base-page VMA may fail with: mremap: Device or resource busy This is observed on powerpc hash MMU systems where slice constraints and page size incompatibilities prevent the remap. Ensure the destination region is created using MAP_HUGETLB by updating the FLAGS macro to include MAP_HUGETLB | MAP_SHARED, so that both source and destination VMAs are hugetlb-backed and compatible. Also use the macro for the mmap() calls to avoid repeating the flag combination. This ensures the test reliably exercises hugetlb mremap instead of failing due to a VMA type mismatch. Link: https://lore.kernel.org/367644df45c65098f23e3945c6a80f4b8a8964a6.1779296493.git.sayalip@linux.ibm.com Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftest/mm: register existing mapping with userfaultfd in hugetlb-mremapSayali Patil1-16/+5
Previously, register_region_with_uffd() created a new anonymous mapping and overwrote the address supplied by the caller before registering the range with userfaultfd. As a result, userfaultfd was applied to an unrelated anonymous mapping instead of the hugetlb region used by the test. Remove the extra mmap() and register the caller-provided address range directly using UFFDIO_REGISTER_MODE_MISSING, so that faults are generated for the hugetlb mapping used by the test. This ensures userfaultfd operates on the actual hugetlb test region and validates the expected fault handling. Before patch: running ./hugetlb-mremap ------------------------- TAP version 13 1..1 Map haddr: Returned address is 0x7eaa40000000 Map daddr: Returned address is 0x7daa40000000 Map vaddr: Returned address is 0x7faa40000000 Address returned by mmap() = 0x7fff9d000000 Mremap: Returned address is 0x7faa40000000 First hex is 0 First hex is 3020100 ok 1 Read same data Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 [PASS] ok 1 hugetlb-mremap After patch: running ./hugetlb-mremap ------------------------- TAP version 13 1..1 Map haddr: Returned address is 0x7eaa40000000 Map daddr: Returned address is 0x7daa40000000 Map vaddr: Returned address is 0x7faa40000000 Registered memory at address 0x7eaa40000000 with userfaultfd Mremap: Returned address is 0x7faa40000000 First hex is 0 First hex is 3020100 ok 1 Read same data Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 [PASS] ok 1 hugetlb-mremap Link: https://lore.kernel.org/13845da872ed174316173e8996dbb5f181994017.1779296493.git.sayalip@linux.ibm.com Fixes: 12b613206474 ("mm, hugepages: add hugetlb vma mremap() test") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador 
<osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: free dynamically allocated PMD-sized buffers in ↵Sayali Patil1-6/+16
split_huge_page_test Dynamically allocated buffers of PMD size for file-backed THP operations (file_buf1 and file_buf2) were not freed on the success path and some failure paths. Since the function is called repeatedly in a loop for each split order, this can cause significant memory leaks. On architectures with large PMD sizes, repeated leaks could exhaust system memory and trigger the OOM killer during test execution. Ensure all allocated buffers are freed to maintain stable repeated test runs. Link: https://lore.kernel.org/060c673b376bbeeed2b1fb1d48a825e846654191.1779296493.git.sayalip@linux.ibm.com Fixes: 035a112e5fd5 ("selftests/mm: make file-backed THP split work by writing PMD size data") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: size tmpfs according to PMD page size in split_huge_page_testSayali Patil1-1/+4
The split_file_backed_thp() test mounts a tmpfs with a fixed size of "4m". This works on systems with smaller PMD page sizes, but fails on configurations where the PMD huge page size is larger (e.g. 16MB). On such systems, the fixed 4MB tmpfs is insufficient to allocate even a single PMD-sized THP, causing the test to fail. Fix this by sizing the tmpfs dynamically based on the runtime pmd_pagesize, allocating space for two PMD-sized pages. Before patch: running ./split_huge_page_test /tmp/xfs_dir_YTrI5E -------------------------------------------------- TAP version 13 1..55 ok 1 Split zero filled huge pages successful ok 2 Split huge pages to order 0 successful ok 3 Split huge pages to order 2 successful ok 4 Split huge pages to order 3 successful ok 5 Split huge pages to order 4 successful ok 6 Split huge pages to order 5 successful ok 7 Split huge pages to order 6 successful ok 8 Split huge pages to order 7 successful ok 9 Split PTE-mapped huge pages successful Please enable pr_debug in split_huge_pages_in_file() for more info. Failed to write data to testing file: Success (0) Bail out! Error occurred Planned tests != run tests (55 != 9) Totals: pass:9 fail:0 xfail:0 xpass:0 skip:0 error:0 [FAIL] After patch: running ./split_huge_page_test /tmp/xfs_dir_bMvj6o -------------------------------------------------- TAP version 13 1..55 ok 1 Split zero filled huge pages successful ok 2 Split huge pages to order 0 successful ok 3 Split huge pages to order 2 successful ok 4 Split huge pages to order 3 successful ok 5 Split huge pages to order 4 successful ok 6 Split huge pages to order 5 successful ok 7 Split huge pages to order 6 successful ok 8 Split huge pages to order 7 successful ok 9 Split PTE-mapped huge pages successful Please enable pr_debug in split_huge_pages_in_file() for more info. Please check dmesg for more information ok 10 File-backed THP split to order 0 test done Please enable pr_debug in split_huge_pages_in_file() for more info. 
Please check dmesg for more information ok 11 File-backed THP split to order 1 test done Please enable pr_debug in split_huge_pages_in_file() for more info. Please check dmesg for more information ok 12 File-backed THP split to order 2 test done ... ok 55 Split PMD-mapped pagecache folio to order 7 at in-folio offset 128 passed Totals: pass:55 fail:0 xfail:0 xpass:0 skip:0 error:0 [PASS] ok 1 split_huge_page_test /tmp/xfs_dir_bMvj6o Link: https://lore.kernel.org/33e1bc10753fe82d1217613d8cd496020778cf2b.1779296493.git.sayalip@linux.ibm.com Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
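The sizing rule above (room for two PMD-sized pages) can be sketched in shell. The test itself is C; tmpfs_size_opt() is a hypothetical helper written only for illustration, and the sysfs path in the comment is one place the PMD size can be read from.

```shell
# Sketch only: build a tmpfs "size=" mount option large enough for two
# PMD-sized pages. tmpfs_size_opt() is a hypothetical helper name.
# The PMD size in bytes could come from, for example,
#   /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
tmpfs_size_opt() {
    pmd_pagesize=$1
    echo "size=$(( 2 * pmd_pagesize ))"
}

tmpfs_size_opt $(( 2 * 1024 * 1024 ))    # 2 MB PMD  -> size=4194304
tmpfs_size_opt $(( 16 * 1024 * 1024 ))   # 16 MB PMD -> size=33554432
```

For a 2 MB PMD the result is the 4 MB that was previously hard-coded, which is why the fixed value only worked on such systems.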
3 daysselftests/mm: fix cgroup task placement and drop memory.current checks in ↵Sayali Patil1-24/+18
hugetlb_reparenting_test.sh The test currently moves the calling shell ($$) into the target cgroup before executing write_to_hugetlbfs. This results in the shell and any intermediate allocations being charged to the cgroup, introducing noise and nondeterminism in accounting. It also requires moving the shell back to the root cgroup after execution. Spawn a helper process that joins the target cgroup and exec()'s write_to_hugetlbfs. This ensures that only the workload is accounted to the cgroup and avoids unintended charging from the shell. The test currently validates both hugetlb usage and memory.current. However, memory.current includes internal memcg allocations and per-CPU batched accounting (MEMCG_CHARGE_BATCH), which are not synchronized and can vary across systems, leading to non-deterministic results. Since hugetlb memory is accounted via hugetlb.<size>.current, memory.current is not a reliable indicator here. Drop memory.current checks and rely only on hugetlb controller statistics for stable and accurate validation. Link: https://lore.kernel.org/fb57491ba83cb0a499c72922e1579b61bee514db.1779296493.git.sayalip@linux.ibm.com Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
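The helper-process approach described above can be sketched as follows. run_in_cgroup() is a hypothetical name and the script's actual implementation may differ; the sketch assumes a cgroup v2 style cgroup.procs file inside the given directory.

```shell
# Sketch only: run a command inside a cgroup without moving the calling
# shell. run_in_cgroup() is a hypothetical helper name.
run_in_cgroup() {
    cgroup_dir=$1
    shift
    # The child shell writes its own PID to cgroup.procs and then
    # exec()s the workload, so the workload keeps that PID and is the
    # only task charged to the cgroup.
    sh -c 'echo $$ > "$1/cgroup.procs" && shift && exec "$@"' \
        sh "$cgroup_dir" "$@"
}
```

Because the workload replaces the helper via exec, nothing has to be moved back to the root cgroup afterwards; the calling shell never left it.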
3 daysselftests/mm: fix hugetlb pathname construction in hugetlb_reparenting_test.shSayali Patil1-2/+12
The hugetlb_reparenting_test.sh script constructs hugetlb cgroup memory interface file names based on the configured huge page size. The script formats the size only in MB units, which causes mismatches on systems using larger huge pages where the kernel exposes normalized units (e.g. "1GB" instead of "1024MB"). As a result, the test fails to locate the corresponding cgroup files when 1GB huge pages are configured. Update the script to detect the huge page size and select the appropriate unit (MB or GB) so that the constructed paths match the kernel's hugetlb controller naming. Also print an explicit "Fail" message when a test failure occurs to improve result visibility. Link: https://lore.kernel.org/837ce751965c93f74c95d89587debf1e93281364.1779296493.git.sayalip@linux.ibm.com Fixes: e487a5d513cb ("selftest/mm: make hugetlb_reparenting_test tolerant to async reparenting") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: restore default nr_hugepages value via exit trap in ↵Sayali Patil1-2/+2
hugetlb_reparenting_test.sh The test modifies nr_hugepages during execution, restores it from cleanup(), and then reconfigures it again in setup, which is invoked multiple times across the test flow. This can lead to repeated allocation/freeing of hugepages. With set -e, failures in cleanup (e.g., rmdir/umount) can also cause early exit before restoring the original value at the end. Move restoration of the original nr_hugepages value to a trap handler registered for EXIT, INT, and TERM signals so it is always restored on all exit paths. This also avoids unnecessary allocation churn across repeated cleanup/setup cycles. Link: https://lore.kernel.org/29db637c3c6ba6c168f6b33f59f059a0b39c35c8.1779296493.git.sayalip@linux.ibm.com Fixes: 585a9145886a ("selftests/mm: restore default nr_hugepages value during cleanup in hugetlb_reparenting_test.sh") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Acked-by: Zi Yan <ziy@nvidia.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: fix hugetlb pathname construction in charge_reserved_hugetlb.shSayali Patil1-13/+29
The charge_reserved_hugetlb.sh script assumes hugetlb cgroup memory interface file names use the "<size>MB" format (e.g. hugetlb.1024MB.current). This assumption breaks on systems with larger huge pages such as 1GB, where the kernel exposes normalized units: hugetlb.1GB.current hugetlb.1GB.max hugetlb.1GB.rsvd.max ... As a result, the script attempts to access files like hugetlb.1024MB.current, which do not exist when the kernel reports the size in GB. Normalize the huge page size and construct the pathname using the appropriate unit (MB or GB), matching the hugetlb controller naming. Link: https://lore.kernel.org/04b6b49e4a2acf46319f627caf82b09e6dc1ad7f.1779296493.git.sayalip@linux.ibm.com Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting") Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
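The unit selection described above might look like the sketch below. hugetlb_size_label() is a hypothetical helper, its input is assumed to be the huge page size in kB (the unit /proc/meminfo uses for Hugepagesize), and the KB fallback branch is an assumption added for completeness rather than something the commit describes.

```shell
# Sketch only: turn a huge page size in kB into the label used in
# hugetlb cgroup file names. hugetlb_size_label() is a hypothetical name.
hugetlb_size_label() {
    kb=$1
    if [ "$kb" -ge 1048576 ]; then
        echo "$(( kb / 1048576 ))GB"
    elif [ "$kb" -ge 1024 ]; then
        echo "$(( kb / 1024 ))MB"
    else
        echo "${kb}KB"            # assumed fallback for sub-MB sizes
    fi
}

echo "hugetlb.$(hugetlb_size_label 2048).current"      # hugetlb.2MB.current
echo "hugetlb.$(hugetlb_size_label 1048576).current"   # hugetlb.1GB.current
```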
3 daysselftests/mm: restore default nr_hugepages value via exit trap in ↵Sayali Patil1-2/+1
charge_reserved_hugetlb.sh Patch series "selftests/mm: fix failures and robustness improvements", v7. Powerpc systems with a 64K base page size exposed several issues while running mm selftests. Some tests assume specific hugetlb configurations, use incorrect interfaces, or fail instead of skipping when the required kernel features are not available. This series fixes these issues and improves test robustness. This patch (of 13): cleanup() resets nr_hugepages to 0 on every invocation, while the test reconfigures it again in the next iteration. This leads to repeated allocation and freeing of large numbers of hugepages, especially when the original value is high. Additionally, with set -e, failures in earlier cleanup steps (e.g., rmdir or umount returning EBUSY while background activity is still ongoing) can cause the script to exit before restoring the original value, leaving the system in a modified state. Introduce a trap on EXIT, INT, and TERM to restore the original nr_hugepages value once at script termination. This avoids unnecessary allocation churn and ensures the original value is reliably restored on all exit paths. Link: https://lore.kernel.org/cover.1779296493.git.sayalip@linux.ibm.com Link: https://lore.kernel.org/5b8fbb29cd6ceffe6752e0af104f60cec072aa10.1779296493.git.sayalip@linux.ibm.com Fixes: 7d695b1c3695 ("selftests/mm: save and restore nr_hugepages value") Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> Acked-by: Zi Yan <ziy@nvidia.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
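A minimal sketch of the restore-on-exit pattern follows. It is not the code from the script: run_with_restore() is a hypothetical wrapper, the file path is passed in as a parameter (for the real setting it would be /proc/sys/vm/nr_hugepages), and the sketch routes INT and TERM through a plain exit rather than registering one handler for all three as the commit describes.

```shell
# Sketch only: save a tunable, run a command, and restore the original
# value on every exit path. run_with_restore() is a hypothetical name.
run_with_restore() {
    nr_file=$1
    shift
    (
        orig=$(cat "$nr_file")
        # Restore exactly once, whenever this subshell exits.
        trap 'echo "$orig" > "$nr_file"' EXIT
        # Convert signals into a normal exit so the EXIT trap still runs.
        trap 'exit 1' INT TERM
        "$@"
        exit $?
    )
}
```

Restoring in one place at termination, rather than in every cleanup() call, is what avoids the repeated allocation and freeing of huge pages between iterations.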
3 daysselftests/mm: fix incorrect mmap() error handling with NULL instead of ↵Hongfu Li4-5/+5
MAP_FAILED mmap() returns MAP_FAILED, which is defined as (void *)-1, on error, not NULL. Several selftests incorrectly check the return value of mmap() using !ptr or ptr == NULL, which would erroneously treat MAP_FAILED as a valid pointer since MAP_FAILED is non-zero and non-NULL. This can lead to segfaults when mmap() actually fails under memory pressure. Link: https://lore.kernel.org/20260513025223.592766-1-lihongfu@kylinos.cn Signed-off-by: Hongfu Li <lihongfu@kylinos.cn> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: run_vmtests.sh: drop detection and setup of HugeTLBMike Rapoport (Microsoft)1-118/+7
All the tests that use HugeTLB can detect and setup HugeTLB pages on their own. Drop detection and setup of HugeTLB from run_vmtests.sh. Link: https://lore.kernel.org/20260511162840.375890-56-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: run_vmtests.sh: free memory if available memory is lowMike Rapoport (Microsoft)1-12/+6
Currently when running THP and HugeTLB tests, if HAVE_HUGEPAGES is set run_test() drops caches, compacts memory and runs the test. But if HAVE_HUGEPAGES is not set it skips the tests entirely, even if THP tests have nothing to do with HAVE_HUGEPAGES. Replace the check if HAVE_HUGEPAGES is set with a check of how much memory is available. If there is less than 256 MB of available memory, drop caches and run compaction and then continue to run a test regardless of HAVE_HUGEPAGES value. Link: https://lore.kernel.org/20260511162840.375890-55-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
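The availability check described above could be sketched like this. The function names are hypothetical and the structure is an assumption; only the 256 MB threshold, the drop-caches step and the compaction step come from the commit message. free_memory() needs root and is defined but not called here.

```shell
# Sketch only: decide whether to reclaim memory before a test.
# All function names are hypothetical.
LOW_MEM_KB=$(( 256 * 1024 ))   # 256 MB threshold, in kB

# Print MemAvailable in kB from a meminfo-style file.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed when the given amount (in kB) is below the threshold.
low_on_memory() {
    [ "$1" -lt "$LOW_MEM_KB" ]
}

# Drop caches and trigger compaction (requires root).
free_memory() {
    echo 3 > /proc/sys/vm/drop_caches
    echo 1 > /proc/sys/vm/compact_memory
}
```

A caller would run free_memory only when low_on_memory "$(mem_available_kb)" succeeds, and then run the test either way.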
3 daysselftests/mm: va_high_addr_switch.sh: drop huge pages setupMike Rapoport (Microsoft)1-40/+1
Since va_high_addr_switch takes care of setting up huge pages, there is no need to set them up in the va_high_addr_switch.sh wrapper script. Link: https://lore.kernel.org/20260511162840.375890-54-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: va_high_addr_switch: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-1/+1
va_high_addr_switch skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-53-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: uffd-wp-mremap: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-4/+4
uffd-wp-mremap skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-52-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: uffd-unit-tests: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-5/+28
uffd-unit-tests skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Replace exit() calls with _exit() to avoid restoring HugeTLB settings in the middle of test. Link: https://lore.kernel.org/20260511162840.375890-51-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
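The reason for switching to _exit() can be illustrated with a minimal sketch. It assumes only standard POSIX behaviour: a forked child inherits the parent's atexit() handlers, exit() runs them, and _exit() does not. The names demo_restore() and demo_child_runs_handler() are hypothetical stand-ins and are not taken from the selftests; the pipe exists only so the parent can observe whether the handler ran in the child.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a "restore settings" atexit() callback. */
static int notify_fd = -1;
static int handler_registered;

static void demo_restore(void)
{
	/* Report that the handler ran; a real callback would restore settings. */
	if (notify_fd >= 0 && write(notify_fd, "R", 1) != 1)
		return;
}

/* Returns 1 if the handler ran in the child, 0 if it did not, -1 on error. */
static int demo_child_runs_handler(int use_plain_exit)
{
	int pipefd[2];
	ssize_t n;
	pid_t pid;
	char c;

	/* Register once in the parent; forked children inherit the handler. */
	if (!handler_registered) {
		if (atexit(demo_restore))
			return -1;
		handler_registered = 1;
	}

	if (pipe(pipefd))
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	if (pid == 0) {
		close(pipefd[0]);
		notify_fd = pipefd[1];
		if (use_plain_exit)
			exit(0);	/* runs inherited atexit() handlers */
		_exit(0);		/* skips them */
	}

	close(pipefd[1]);
	n = read(pipefd[0], &c, 1);	/* 0 bytes (EOF) if the handler never ran */
	close(pipefd[0]);
	waitpid(pid, NULL, 0);

	return n == 1;
}
```

Calling demo_child_runs_handler(1) exercises the exit() path and demo_child_runs_handler(0) the _exit() path. In a test that forks helpers, the first behaviour would restore the HugeTLB pool while the parent is still running, which is what the commit avoids.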
3 daysselftests/mm: uffd-stress: use hugetlb_save and alloc huge pagesMike Rapoport (Microsoft)1-3/+6
uffd-stress skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-50-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: thuge-gen: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-83/+12
thuge-gen skips tests if there are no free huge pages prepared by a wrapper script and shm limits in /proc are too low. Replace custom detection of huge pages with the library functions and add setup of HugeTLB pages and shm limits to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-49-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: protection_keys: use library code for HugeTLB setupMike Rapoport (Microsoft)1-36/+14
protection_keys open codes setup of HugeTLB pages. Replace it with the library functions from hugepage_setup. Replace exit() calls with _exit() to avoid restoring HugeTLB settings in the middle of test. Link: https://lore.kernel.org/20260511162840.375890-48-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: pagemap_ioctl: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-4/+9
pagemap-ioctl skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-47-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: migration: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-0/+21
migration skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Since kselftest_harness runs fixture setup and the tests in child processes, use HUGETLB_SETUP_DEFAULT_PAGES() that defines a constructor that runs in the main process and add verification that there are enough free huge pages to the tests that use them. Reset signal handlers to defaults in FIXTURE_SETUP() so that sending SIGTERM and SIGHUP during the tests won't cause restoration of HugeTLB settings. Link: https://lore.kernel.org/20260511162840.375890-46-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
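The constructor approach described above can be sketched as follows. This is an illustration only: DEMO_SETUP_DEFAULT_PAGES() and demo_in_main_process() are hypothetical names modelled on the idea behind HUGETLB_SETUP_DEFAULT_PAGES(), not its actual definition, and the sketch relies on the GCC/Clang constructor attribute. The constructor runs once in the main process before main(), so child processes forked later by a harness inherit its recorded state without running it again.

```c
#include <sys/types.h>
#include <unistd.h>

/* State recorded by the constructor in the main process. */
static pid_t demo_setup_pid;
static unsigned long demo_requested_pages;

/*
 * Hypothetical macro: defines a constructor that runs before main().
 * A real test would save the HugeTLB state and resize the pool here.
 */
#define DEMO_SETUP_DEFAULT_PAGES(nr)					\
	static void __attribute__((constructor)) demo_setup(void)	\
	{								\
		demo_setup_pid = getpid();				\
		demo_requested_pages = (nr);				\
	}

DEMO_SETUP_DEFAULT_PAGES(4)

/* True only in the process that ran the constructor, not in forked children. */
static int demo_in_main_process(void)
{
	return getpid() == demo_setup_pid;
}
```

A test body running in a forked child could then verify that enough free huge pages exist rather than attempting setup itself, leaving save and restore to the main process.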
3 daysselftests/mm: hugetlb-vmemmap: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-0/+3
hugetlb-vmemmap test fails if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-45-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb-soft-offline: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-37/+8
hugetlb-soft-offline test uses open coded access to /proc to determine availability of huge pages and fails if there are not enough free huge pages. Replace open coded access to /proc with hugepage helpers and add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-44-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb-shm: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-0/+22
hugetlb-shm test fails if there are no free huge pages prepared by a wrapper script and shm limits in /proc are too low. Add setup of HugeTLB pages and shm limits to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-43-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb-mremap: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-1/+12
hugetlb-mremap test fails if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-42-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb-mmap: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-4/+11
hugetlb-mmap test fails if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-41-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb_madv_vs_map: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-5/+2
hugetlb_madv_vs_map test skips testing if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-40-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb-madvise: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-8/+2
hugetlb-madvise test skips testing if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-39-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb_fault_after_madv: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-6/+2
hugetlb_fault_after_madv test skips testing if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-38-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugepage_dio: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-8/+2
hugepage_dio test fails if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-37-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hmm-tests: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-10/+12
hmm-tests skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Since kselftest_harness runs fixture setup and the tests in child processes, use HUGETLB_SETUP_DEFAULT_PAGES() that defines a constructor that runs in the main process and add verification that there are enough free huge pages to the tests that use them. Replace exit() calls with _exit() to avoid restoring HugeTLB settings in the middle of test and use SIGKILL to kill a child process in hmm_cow_in_device test to avoid interference with signal handlers in hugepage_restore_settings(). Link: https://lore.kernel.org/20260511162840.375890-36-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: gup_test: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-0/+15
gup_test fails to run HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-35-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: gup_longterm: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-1/+1
gup_longterm skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-34-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: cow: add setup of HugeTLB pagesMike Rapoport (Microsoft)1-3/+3
cow test skips HugeTLB tests if there are no free huge pages prepared by a wrapper script. Add setup of HugeTLB pages to the test and make sure that the original settings are restored on the test exit. Link: https://lore.kernel.org/20260511162840.375890-33-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: compaction_test: use HugeTLB helpers ...Mike Rapoport (Microsoft)1-98/+17
... instead of open coded access of HugeTLB parameters via /proc. Link: https://lore.kernel.org/20260511162840.375890-32-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: vm_util: add helpers to set and restore shm limitsMike Rapoport (Microsoft)2-0/+37
hugetlb-shm and thuge-gen tests require that limits defined by /proc/sys/kernel/{shmmax,shmall} should be higher than certain values. Add helpers that allow setting these limits and restoring their settings on a test exit. They will be used later in hugetlb-shm and thuge-gen. Link: https://lore.kernel.org/20260511162840.375890-31-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: move read_file(), read_num() and write_num() to vm_utilMike Rapoport (Microsoft)4-45/+42
These are useful helpers for writing and reading sysfs and proc files. Make them available to the tests that don't use thp_settings. While at it, make write_num() use "%lu" instead of "%ld" to match the 'unsigned long num' argument type. Link: https://lore.kernel.org/20260511162840.375890-30-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugepage_settings: add APIs for HugeTLB setup and teardownMike Rapoport (Microsoft)2-23/+213
A lot of tests require free HugeTLB pages. Some need just a few default huge pages, some need a certain amount of memory available as HugeTLB, and some just skip lots of tests if huge pages of all supported sizes are not available. This all resulted in a huge mess in run_vmtests.sh that sets up some huge pages, adjusts them later and restores some of the settings if the stars align. Add APIs that allow saving the state of HugeTLB and setting up the desired amount of HugeTLB pages. Saving the state also registers atexit() callback and signal handler that will ensure restoration of HugeTLB state. Since many tests use both HugeTLB and THP, the atexit() callbacks and signal handler are restoring both. For kselftest_harness tests that run fixture setups and test in child processes add a constructor that will save and restore settings in the main process. Link: https://lore.kernel.org/20260511162840.375890-29-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugepage_settings: rename and rework get_free_hugepages()Mike Rapoport (Microsoft)7-28/+28
... to hugetlb_free_default_pages() for consistency with hugetlb_nr_default_pages(). Make hugetlb_free_default_pages() use hugetlb_sysfs_path() helper instead of parsing /proc/meminfo. Link: https://lore.kernel.org/20260511162840.375890-28-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugepage_settings: add APIs to get and set nr_hugepagesMike Rapoport (Microsoft)2-0/+48
Add APIs that allow reading and writing of /sys/kernel/mm/hugepages/hugepages-NkB/nr_hugepages to detect and change the amount of HugeTLB pages of different sizes. Link: https://lore.kernel.org/20260511162840.375890-27-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
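The per-size sysfs path named in this commit can be built as in the sketch below. The function name sketch_nr_hugepages_path() and the choice to pass the page size in bytes are assumptions made for illustration; only the path layout /sys/kernel/mm/hugepages/hugepages-NkB/nr_hugepages comes from the commit message.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the nr_hugepages path for a huge page size
 * given in bytes. Returns 0 on success, -1 if the buffer is too small.
 */
static int sketch_nr_hugepages_path(char *buf, size_t len,
				    unsigned long size_bytes)
{
	int n = snprintf(buf, len,
			 "/sys/kernel/mm/hugepages/hugepages-%lukB/nr_hugepages",
			 size_bytes / 1024);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a 2 MiB huge page this yields the hugepages-2048kB directory; reading or writing the resulting file requires the usual privileges.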
3 daysselftests/mm: hugepage_settings: use unsigned long in detect_hugetlb_page_sizeMike Rapoport (Microsoft)5-7/+7
... instead of size_t to avoid type mismatch in 32 and 64 bit builds. Link: https://lore.kernel.org/20260511162840.375890-26-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: move HugeTLB helpers to hugepage_settingsMike Rapoport (Microsoft)16-71/+88
Move library functions that abstract HugeTLB /proc and /sysfs access from vm_util to hugepage_settings. This will help creating common helpers that save and restore HugeTLB and THP settings. Link: https://lore.kernel.org/20260511162840.375890-25-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: rename thp_settings.[ch] to hugepage_settings.[ch]Mike Rapoport (Microsoft)14-17/+17
... for upcoming addition of HugeTLB helpers. Link: https://lore.kernel.org/20260511162840.375890-24-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: add atexit() and signal handlers to thp_settingsMike Rapoport (Microsoft)5-78/+38
khugepaged registers atexit() and signal handlers that ensure that THP settings are restored regardless of how the test exited. Make these handlers available for all users of thp_settings. The call to thp_save_settings() installs thp_restore_settings as the atexit() callback and makes sure that signals that kill a process would still call exit() and atexit() callback. Update child process in khugepaged tests using thp_settings to use _exit() instead of exit() to avoid altering THP settings in the middle of a test. Remove redundant THP cleanup from folio_split_race_test.c. Link: https://lore.kernel.org/20260511162840.375890-23-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
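The save-and-restore pattern described here can be sketched roughly as follows. This is not the thp_settings code: all sketch_* names are hypothetical, an in-memory integer stands in for the real sysfs state, and plain signal() is used for brevity where the real code may use a different mechanism.

```c
#include <signal.h>
#include <stdlib.h>

static int sketch_live_setting = 1;	/* stands in for a sysfs THP knob */
static int sketch_saved_setting;
static int sketch_handlers_installed;

/* atexit() callback: put the saved value back. */
static void sketch_restore_settings(void)
{
	sketch_live_setting = sketch_saved_setting;
}

/*
 * Turn a fatal signal into a normal exit() so that atexit() callbacks run.
 * exit() is not async-signal-safe in general; a test helper may accept that
 * trade-off, but production code should not copy it blindly.
 */
static void sketch_signal_handler(int sig)
{
	(void)sig;
	exit(EXIT_FAILURE);
}

/* Save the current state and arrange for it to be restored on any exit. */
static int sketch_save_settings(void)
{
	sketch_saved_setting = sketch_live_setting;
	if (sketch_handlers_installed)
		return 0;
	if (atexit(sketch_restore_settings) != 0)
		return -1;
	if (signal(SIGINT, sketch_signal_handler) == SIG_ERR ||
	    signal(SIGTERM, sketch_signal_handler) == SIG_ERR)
		return -1;
	sketch_handlers_installed = 1;
	return 0;
}
```

A child created with fork() inherits the atexit() registration, which is why the commit switches such children to _exit(): it terminates without running the callback, so the parent's settings are not restored in the middle of a test.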
3 daysselftests/mm: va_high_addr_switch: use kselftest frameworkMike Rapoport (Microsoft)1-21/+20
Convert va_high_addr_switch test to use kselftest framework for reporting and tracking successful and failing runs. Link: https://lore.kernel.org/20260511162840.375890-22-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: uffd-unit-tests: use kselftest frameworkMike Rapoport (Microsoft)1-53/+52
Convert uffd-unit-tests to use kselftest framework for reporting and tracking successful and failing runs. Link: https://lore.kernel.org/20260511162840.375890-21-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: uffd-stress: use kselftest frameworkMike Rapoport (Microsoft)1-21/+19
Convert uffd-stress test to use kselftest framework for reporting and tracking successful and failing runs. Link: https://lore.kernel.org/20260511162840.375890-20-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: uffd-common: use kselftest frameworkMike Rapoport (Microsoft)1-9/+8
Update err() and errexit() to use ksft_print_msg() and ksft_exit_fail(). This is a preparatory change required to update the userfaultfd tests to use the kselftest framework. Link: https://lore.kernel.org/20260511162840.375890-19-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Luiz Capitulino <luizcap@redhat.com> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: protection_keys: use kselftest frameworkMike Rapoport (Microsoft)2-18/+23
Convert protection_keys test to use kselftest framework for reporting and tracking successful and failing runs. Adjust dprintf0() printouts to use "#" in the beginning of the line for TAP compatibility. Link: https://lore.kernel.org/20260511162840.375890-18-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: protection_keys: use descriptive test names in the outputMike Rapoport (Microsoft)1-24/+31
Replace the numeric test index in TAP output with the actual test function name. Use a structure containing function pointer and its name rather than only the function pointer in the pkey_tests array. Link: https://lore.kernel.org/20260511162840.375890-17-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
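A table that pairs each test function with its name can look like the sketch below. The struct, macro, and test names are hypothetical and the function signature is simplified to void(void); the actual pkey_tests entries are not reproduced here. The stringizing operator (#fn) is one common way to keep the name and the function pointer in sync.

```c
#include <stdio.h>

/* Hypothetical table entry: a test function together with its name. */
struct sketch_pkey_test {
	void (*func)(void);
	const char *name;
};

/* #fn turns the function identifier into a string literal. */
#define SKETCH_TEST(fn) { .func = fn, .name = #fn }

static int sketch_calls;

static void sketch_test_alpha(void) { sketch_calls++; }
static void sketch_test_beta(void) { sketch_calls++; }

static const struct sketch_pkey_test sketch_tests[] = {
	SKETCH_TEST(sketch_test_alpha),
	SKETCH_TEST(sketch_test_beta),
};

#define SKETCH_NR_TESTS (sizeof(sketch_tests) / sizeof(sketch_tests[0]))

/* Run every test and report it by name rather than by index. */
static size_t sketch_run_all(void)
{
	size_t i;

	for (i = 0; i < SKETCH_NR_TESTS; i++) {
		sketch_tests[i].func();
		printf("ok %zu %s\n", i + 1, sketch_tests[i].name);
	}
	return i;
}
```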
3 daysselftests/mm: ksm_tests: use kselftest frameworkMike Rapoport (Microsoft)1-99/+81
Convert ksm_tests to use kselftest framework for reporting and tracking successful and failing runs. Link: https://lore.kernel.org/20260511162840.375890-16-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests-mm-khugepaged-use-ksefltest-framework-fixMike Rapoport1-1/+1
make the output TAP-compatible Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: khugepaged: use kselftest frameworkMike Rapoport (Microsoft)1-189/+132
Convert khugepaged tests to use kselftest framework for reporting and tracking successful and failing runs. The conversion is mostly about replacing printf()/perror() + exit() pairs with their ksft_ counterparts. The nice colored success and failure indications are left intact. Replace the progress report in collapse_compound_extreme() with a single ksft_print_msg() to avoid headache with formatting and make the test output more concise. [ziy@nvidia.com: update for "Remove CONFIG_READ_ONLY_THP_FOR_FS and enable file THP for writable files", v6] https://lore.kernel.org/20260517135416.1434539-1-ziy@nvidia.com Link: https://lore.kernel.org/20260511162840.375890-15-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Cc: Li Wang <li.wang@linux.dev> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: khugepaged: group tests in an arrayMike Rapoport (Microsoft)1-7/+36
Currently khugepaged decides whether a test can run using the TEST() macro, which checks which mem_ops and collapse_context are set by the command line arguments. For better compatibility with the kselftest framework, add an array of 'struct test_case's and redefine the TEST() macro to conditionally add enabled tests to that array. Then execute the enabled tests by looping over the test_case array. Link: https://lore.kernel.org/20260511162840.375890-14-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
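Collecting enabled tests into an array first and running them in a second pass can be sketched as below. Every name is hypothetical: the real TEST() macro takes different arguments and derives the condition from mem_ops and collapse_context, which are replaced here by plain booleans.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TESTS 8

/* Hypothetical test descriptor. */
struct sketch_test_case {
	const char *name;
	void (*fn)(void);
};

static struct sketch_test_case sketch_cases[SKETCH_MAX_TESTS];
static size_t sketch_nr_cases;

/* Append func to the array only when its condition holds. */
#define SKETCH_TEST(enabled, func)					\
	do {								\
		if ((enabled) && sketch_nr_cases < SKETCH_MAX_TESTS)	\
			sketch_cases[sketch_nr_cases++] =		\
				(struct sketch_test_case){ #func, func };	\
	} while (0)

static int sketch_ran;

static void sketch_collapse_full(void) { sketch_ran++; }
static void sketch_collapse_empty(void) { sketch_ran++; }

/* Register the enabled tests, then run them; returns how many ran. */
static size_t sketch_run_enabled(bool want_full, bool want_empty)
{
	size_t i;

	sketch_nr_cases = 0;
	SKETCH_TEST(want_full, sketch_collapse_full);
	SKETCH_TEST(want_empty, sketch_collapse_empty);

	for (i = 0; i < sketch_nr_cases; i++)
		sketch_cases[i].fn();
	return sketch_nr_cases;
}
```

Knowing the number of enabled tests before any of them runs is what makes it possible to announce a test plan up front, which is presumably the compatibility benefit the commit refers to.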
3 daysselftests/mm: hugetlb-read-hwpoison: use kselftest frameworkMike Rapoport (Microsoft)1-60/+55
Convert hugetlb-read-hwpoison test to use kselftest framework for reporting and tracking successful and failing runs. Link: https://lore.kernel.org/20260511162840.375890-13-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Li Wang <li.wang@linux.dev> Reviewed-by: Li Wang <li.wang@linux.dev> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb_madv_vs_map: use kselftest frameworkMike Rapoport (Microsoft)1-9/+9
Convert hugetlb_madv_vs_map test to use kselftest framework for reporting and tracking successful and failing runs. Link: https://lore.kernel.org/20260511162840.375890-12-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb-madvise: use kselftest frameworkMike Rapoport (Microsoft)1-122/+82
Convert hugetlb-madvise test to use kselftest framework for reporting and tracking successful and failing runs. While on it fix the check for base page size detection to actually use base_page_size. Link: https://lore.kernel.org/20260511162840.375890-11-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb-vmemmap: use kselftest frameworkMike Rapoport (Microsoft)1-24/+18
Convert hugetlb-vmemmap test to use kselftest framework for reporting and tracking successful and failing runs. Link: https://lore.kernel.org/20260511162840.375890-10-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: hugetlb-shm: use kselftest frameworkMike Rapoport (Microsoft)1-25/+22
Convert hugetlb-shm test to use kselftest framework for reporting and tracking successful and failing runs. Link: https://lore.kernel.org/20260511162840.375890-9-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: rename hugepage-* tests to hugetlb-*Mike Rapoport (Microsoft)8-12/+16
hugepage could mean both THP and HugeTLB these days. Rename hugepage-* tests for HugeTLB to hugetlb-* to avoid confusion. Make sure that Makefile update keeps alphabetical ordering of the TEST_GEN_FILES entries. Keep old binary names in .gitignore because Linus prefers it this way. Link: https://lore.kernel.org/20260511162840.375890-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Li Wang <li.wang@linux.dev> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: merge map_hugetlb into hugepage-mmapMike Rapoport (Microsoft)4-115/+82
Both tests create a hugetlb mapping, fill it with data and verify the data; the only difference is that one uses file-backed memory and the other uses anonymous memory. Merge both tests into a single file. Link: https://lore.kernel.org/20260511162840.375890-7-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: run_vmtests.sh: don't gate THP and KSM tests on HAVE_HUGEPAGESMike Rapoport (Microsoft)1-13/+9
HAVE_HUGEPAGES indicates availability of free HugeTLB pages. It should not be used to gate KSM test that merges transparent huge pages or split_huge_page_test. Remove check for HAVE_HUGEPAGES when running these tests. Link: https://lore.kernel.org/20260511162840.375890-6-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Li Wang <li.wang@linux.dev> Reviewed-by: Li Wang <li.wang@linux.dev> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: migration: properly cleanup fork()ed processesMike Rapoport (Microsoft)1-12/+35
Several migration tests use fork() to create worker processes. These processes are later killed, but nothing collects their exit status and they remain as zombies in the system. Add a helper function that kills the worker processes, waitpid()s for them and verifies the exit status. Replace the loops that call kill() for each process with a call to that helper. Link: https://lore.kernel.org/20260511162840.375890-5-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reported-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
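The kill-then-reap pattern described in this commit can be sketched roughly as follows. This is not the helper from the patch: the names spawn_idle_workers() and kill_and_reap_workers() are hypothetical, and the sketch assumes the workers are plain fork()ed children that are expected to die from SIGKILL.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: fork n children that sleep until killed; returns how many started. */
static int spawn_idle_workers(pid_t *pids, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			/* Child: do nothing until the parent kills us. */
			for (;;)
				pause();
		}
		pids[i] = pid;
	}
	return i;
}

/*
 * Hypothetical: kill every worker, then wait for each one so that no zombie
 * is left behind, and check that it really terminated because of SIGKILL.
 * Returns 0 if all workers were reaped as expected, -1 otherwise.
 */
static int kill_and_reap_workers(const pid_t *pids, int n)
{
	int i, status, ret = 0;

	for (i = 0; i < n; i++)
		if (kill(pids[i], SIGKILL))
			ret = -1;

	for (i = 0; i < n; i++) {
		if (waitpid(pids[i], &status, 0) != pids[i]) {
			ret = -1;
			continue;
		}
		if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL)
			ret = -1;
	}
	return ret;
}
```

Without the waitpid() loop the children would stay in the process table as zombies until the test binary itself exits, which is the problem the commit message describes.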
3 daysselftests/mm: migration: make nthreads represent number of working threadsMike Rapoport (Microsoft)1-31/+16
Fixture setup sets self->nthreads to number of available CPUs minus 1 and then each test creates 'self->nthreads - 1' threads or processes, so essentially nthreads counts the worker tasks and the main task. Make nthreads represent the number of spawned tasks to simplify thread/process creation and teardown. While on it, make the fixture setup skip the tests if there are not enough CPUs or NUMA nodes instead of checking this in each test. Link: https://lore.kernel.org/20260511162840.375890-4-rppt@kernel.org Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: migration: don't assume huge page is TWOMEGMike Rapoport (Microsoft)1-12/+32
migration tests presume that both THP and HugeTLB huge pages are 2MB. Add dynamic detection of huge page size with read_pmd_pagesize() for THP and with default_huge_page_size() for HugeTLB. Link: https://lore.kernel.org/20260511162840.375890-3-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Li Wang <li.wang@linux.dev> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Sarthak Sharma <sarthak.sharma@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
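As a rough illustration of detecting the HugeTLB default page size at run time instead of hard-coding 2MB, the sketch below parses the "Hugepagesize:" line of a /proc/meminfo-style stream. The function names are hypothetical stand-ins and not the selftest helpers named in the commit; the sketch assumes the usual "Hugepagesize: N kB" format.

```c
#include <stdio.h>

/*
 * Hypothetical parser: scan a /proc/meminfo-style stream for the
 * "Hugepagesize: N kB" line and return the size in bytes, or 0 if the
 * line is not present.
 */
static unsigned long parse_default_hugetlb_size(FILE *f)
{
	char line[128];
	unsigned long kb;

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "Hugepagesize: %lu kB", &kb) == 1)
			return kb * 1024;
	}
	return 0;
}

/* Hypothetical wrapper: read the value from the running system. */
static unsigned long default_hugetlb_size_bytes(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	unsigned long size;

	if (!f)
		return 0;
	size = parse_default_hugetlb_size(f);
	fclose(f);
	return size;
}
```

A test would then size its mappings from the returned value and skip when it is 0, rather than assuming a fixed TWOMEG constant.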
3 daysselftests/mm: hugetlb-read-hwpoison: add SIGBUS handlerMike Rapoport (Microsoft)1-0/+6
Patch series "make MM selftests more CI friendly", v4.

There's a lot of dancing around HugeTLB settings in run_vmtests.sh. Some tests need just a few default huge pages, some require at least 256 MB, and some just skip lots of tests if huge pages of all supported sizes are not available.

The goal of this set is to make tests deal with HugeTLB setup and teardown. There are already convenient helpers that allow easy reading and writing of /proc and sysfs files, so adding a few APIs that will detect and update HugeTLB settings shouldn't be a big deal. But these nice helpers use the kselftest framework, and many of the HugeTLB (and even THP) tests don't, so as a result this patchset also includes a lot of churn for conversion of those tests to the kselftest framework (patches 7-19).

The series breaks down as follows:
patches 1-5: small fixes
patch 6: merge of hugetlb mmap tests
patch 7: renaming of hugepage-* to hugetlb-*
patches 8-21: mechanical conversion to kselftest framework
patches 22-28: extension of thp_settings to hugepage_settings to also include HugeTLB helpers
patches 29-30: add helpers for setting up SHM limits in hugetlb-shm and thuge-gen tests
patches 31-53: integrate the new APIs in all the tests that use HugeTLB
patches 54-55: drop HugeTLB setup from run_vmtests.sh

This patch (of 55): Injection of a memory error with madvise() causes SIGBUS, which terminates the hugetlb-read-hwpoison test prematurely. Add a dummy SIGBUS handler to allow the test to continue regardless of SIGBUS.
Link: https://lore.kernel.org/20260511162840.375890-1-rppt@kernel.org Link: https://lore.kernel.org/20260511162840.375890-2-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Li Wang <li.wang@linux.dev> Reviewed-by: Li Wang <li.wang@linux.dev> Tested-by: Luiz Capitulino <luizcap@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Leon Romanovsky <leon@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Nico Pache <npache@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: add writable-file collapse tests for khugepagedZi Yan1-26/+85
collapse_file() now supports collapsing clean pagecache folios from writable files, so add corresponding tests. Note that madvise(MADV_COLLAPSE) works for dirty pagecache folios from writable files, because collapse_single_pmd() triggers a synchronous writeback when first attempt of collapse_file() fails. That writeback makes dirty folios clean and the retry of collapse_file() succeeds. Link: https://lore.kernel.org/20260517135416.1434539-15-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: remove READ_ONLY_THP_FOR_FS code from guard-regionsZi Yan1-14/+4
Any file system with large folio support whose supported orders include PMD_ORDER can be used. There is no need to open the file read-only. Link: https://lore.kernel.org/20260517135416.1434539-13-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: remove READ_ONLY_THP_FOR_FS in khugepagedZi Yan2-41/+96
Change the requirement to a file system with large folio support and the supported order needs to include PMD_ORDER. Also add tests of opening a file with read write permission and populating folios with writes. Reuse the XFS image from split_huge_page_test. Link: https://lore.kernel.org/20260517135416.1434539-12-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Jan Kara <jack@suse.cz> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/proc: add /proc/pid/smaps tearing testsSuren Baghdasaryan1-45/+133
Add tearing tests for /proc/pid/smaps file. New tests reuse the same logic as with maps file but skipping all the data except for the VMA addresses, which are the only part relevant for the tearing tests. Skip PROCMAP_QUERY parts of the tests because smaps does not implement that ioctl. Link: https://lore.kernel.org/20260426062718.1238437-4-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <liam@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/proc: ensure the test is performed at the right page boundarySuren Baghdasaryan1-19/+100
When running tearing tests we need to ensure the pages we use include VMAs that were mapped by the child process for this test. Currently we always use the first two pages, checking VMAs at their boundaries and this works, however once we add tests for /proc/pid/smaps, the first two pages might not contain the VMAs that child modifies. Locate the page that contains the first VMA mapped by the child and use that and the next page for the test. Link: https://lore.kernel.org/20260426062718.1238437-3-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Liam R. Howlett <liam@infradead.org> Cc: Jann Horn <jannh@google.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/damon/sysfs.sh: test pause file existenceSeongJae Park1-0/+1
sysfs.sh DAMON selftest is not testing the existence of the 'pause' sysfs file. Add the test. Link: https://lore.kernel.org/20260522154026.80546-15-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/damon/sysfs.sh: test addr_unit file existenceSeongJae Park1-0/+1
sysfs.sh DAMON selftest is not testing the existence of addr_unit sysfs file. Add the test. Link: https://lore.kernel.org/20260522154026.80546-14-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/damon/sysfs.sh: test monitoring intervals goal dirSeongJae Park1-0/+12
sysfs.sh DAMON selftest is not testing monitoring intervals goal directory. Add the test. Link: https://lore.kernel.org/20260522154026.80546-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/damon/sysfs.py: stop kdamonds before failingSeongJae Park1-0/+4
When an assertion is failed, sysfs.py DAMON selftest immediately exits the test program leaving the DAMON running behind. Many of the following tests need to start DAMON on their own. But because DAMON that was started by sysfs.py is still running, those start attempts fail, and the tests are failed or skipped. Update sysfs.py to stop DAMON before exiting the test program due to the assertion failure. Link: https://lore.kernel.org/20260522154026.80546-12-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/vma: eliminate mmap_action->error_hook, introduce error_filterLorenzo Stoakes1-6/+3
Rather than providing a hook, simplify things by providing the ability to filter errors. This allows us to more carefully validate the value provided and thus ensure only a valid error code is specified, and simplifies the interface. This way, we eliminate all hooks but mmap_prepare and allow only mmap actions to be specified (which core mm controls). This significantly improves robustness and eliminates any unnecessary code duplication in driver mmap hooks. We also update the /dev/mem logic (the only user) to use mmap_action->error_filter instead. Link: https://lore.kernel.org/e770b28427937057fa953ac380a134b24acd8bb4.1779462249.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/vma: remove mmap_action->success_hookLorenzo Stoakes1-10/+0
This hook was introduced to work around code that seemed to absolutely require access to a VMA pointer upon mmap(). However, providing this hook leaves a backdoor to drivers getting access to the very thing mmap_prepare eliminates - a pointer to the VMA. Let's solve this contradiction by removing it. The key intended user was hugetlb, however it seems that the best course now is to avoid allowing all drivers the ability to work around mmap_prepare, and find a different solution there. Link: https://lore.kernel.org/2521c19866f3f10f9085d094cc4f06769042be71.1779462249.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysdrivers/char/mem: eliminate unnecessary use of success_hookLorenzo Stoakes1-0/+1
Patch series "remove mmap_action success, error hooks", v2. The mmap_action->success_hook was a strange beast added to enable code which appeared to absolutely require access to a VMA pointer to work correctly. Primarily this was for hugetlb, however a different approach will be taken there, as clearly more work is required to figure out a sensible way of converting hugetlb to use mmap_prepare. The other user was the memory char driver, specifically /dev/zero which has the unusual property of explicitly setting file-backed VMAs anonymous. Providing the success hook was always foolish, as it allowed drivers a way to workaround the restriction that they should not access a pointer to a not-yet-correctly-initialised VMA - which defeats the purpose of the mmap_prepare work. We can achieve the same thing in memory char driver without needing the success hook, so this series removes that, then removes the success hook altogether. The error hook is also unnecessary - the motivation for this was for functions which need to filter the error code when performing an mmap action in order to avoid breaking userspace. We can achieve this by just providing a field for the error code. Doing this means we don't have to worry about the hook doing anything odd. We also add a check to ensure the error code is in fact valid. Again the memory char driver is the only current user of this, so this series updates it to use that. After this change mmap_action has no custom hooks at all, which seems rather more cromulent than before. This patch (of 3): /dev/zero, uniquely, marks memory mapped there as anonymous. This is currently achieved using the mmap_action->success_hook. However this hook circumvents the abstraction of VMA initialisation so it's preferable to do things a different way. To achieve this, this patch firstly defaults the VMA descriptor's vm_ops field to the dummy VMA operations, which is what file-backed VMAs default this field to. 
That way, we can detect whether a driver sets this field to NULL in order to mark it anonymous. We then introduce vma_desc_set_anonymous() to do this explicitly, and invoke it in mmap_zero_prepare(). This way, any driver which does not explicitly set desc->vm_ops, retains the dummy vm_ops as they would previously. We also update set_vma_user_defined_fields() to make clear that we are either setting vma->vm_ops to what is provided by the driver (or defaulting to dummy_vm_ops if not set), or setting the VMA anonymous. This lays the groundwork for removing the success hook. Link: https://lore.kernel.org/cover.1779462249.git.ljs@kernel.org Link: https://lore.kernel.org/5d1e8bd29d6e070218ba7a03461df562e372b91e.1779462249.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David Hildenbrand <david@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm/split_huge_page_test.c: close fd on write errorWei Yang1-1/+1
When the write() to /proc/sys/vm/drop_caches in create_pagecache_thp_and_fd() returns an error, the code does "goto err_out_unlink", which leaves fd open. Use "goto err_out_close" to close the fd. Link: https://lore.kernel.org/20260520020336.28914-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Lance Yang <lance.yang@linux.dev> Cc: "Liam R. Howlett" <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/damon/sysfs.sh: test probes dirSeongJae Park1-0/+48
Add simple existence tests for data probes sysfs directories and files. Link: https://lore.kernel.org/20260518234119.97569-20-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daystools/mm/page-types: fix kpageflags option argument in getopt_longYe Liu1-1/+1
The --kpageflags option requires an argument to specify the kpageflags file path, but has_arg was set to 0 (no_argument) in the long options table. Change it to 1 (required_argument) so getopt_long correctly parses the argument. Link: https://lore.kernel.org/20260513022120.58033-4-ye.liu@linux.dev Signed-off-by: Ye Liu <liuye@kylinos.cn> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
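The effect of the has_arg field can be seen in a small stand-alone sketch. The option table below is a reduced, hypothetical one (the short option letter 'F' and the function name are assumptions made for illustration), not the table from page-types.c.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical, reduced option table: --kpageflags takes a mandatory path. */
static const struct option long_opts[] = {
	{ "kpageflags", required_argument, NULL, 'F' },
	{ NULL, 0, NULL, 0 },
};

/*
 * Return the path passed via --kpageflags (or -F), or NULL if the option
 * was not given. With no_argument in the table, getopt_long() would leave
 * optarg NULL for the long form and treat the path as a separate
 * non-option argument.
 */
static const char *parse_kpageflags(int argc, char **argv)
{
	const char *path = NULL;
	int c;

	optind = 1;	/* restart scanning so the function can be called again */
	while ((c = getopt_long(argc, argv, "F:", long_opts, NULL)) != -1) {
		if (c == 'F')
			path = optarg;
	}
	return path;
}
```

Both "--kpageflags PATH" and "--kpageflags=PATH" are accepted once the entry is declared with required_argument.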
3 daystools/mm/page-types: fix ternary operator precedence in sigbus handlerYe Liu1-1/+1
The ternary operator (?:) has lower precedence than addition (+), so the expression `off + sigbus_addr ? sigbus_addr - ptr : 0` was parsed as `(off + sigbus_addr) ? (sigbus_addr - ptr) : 0` rather than the intended `off + (sigbus_addr ? sigbus_addr - ptr : 0)`. Add explicit parentheses to ensure the correct evaluation order. Link: https://lore.kernel.org/20260513022120.58033-3-ye.liu@linux.dev Signed-off-by: Ye Liu <liuye@kylinos.cn> Acked-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
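The two parses can be compared directly in a small sketch. Plain unsigned long values stand in for the pointers used in page-types.c, and both function names are hypothetical; the first function spells out, with explicit parentheses, how the compiler reads the unparenthesized expression.

```c
/*
 * How `off + sigbus_addr ? sigbus_addr - ptr : 0` is actually parsed:
 * the whole sum becomes the condition and `off` is lost from the result.
 */
static unsigned long offset_as_parsed(unsigned long off,
				      unsigned long sigbus_addr,
				      unsigned long ptr)
{
	return (off + sigbus_addr) ? (sigbus_addr - ptr) : 0;
}

/* The intended meaning: add the conditional delta to `off`. */
static unsigned long offset_as_intended(unsigned long off,
					unsigned long sigbus_addr,
					unsigned long ptr)
{
	return off + (sigbus_addr ? sigbus_addr - ptr : 0);
}
```

For example, with off = 4096, sigbus_addr = 0x1010 and ptr = 0x1000, the first form yields 16 while the second yields 4112.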
3 daystools/mm/page-types: fix typo in madvise() error messageYe Liu1-2/+2
Patch series "tools/mm/page-types: Fix misc bugs". This series fixes three issues in tools/mm/page-types.c: 1. Fix two typos in madvise() error messages ("madvice" -> "madvise") 2. Fix operator precedence bug in the sigbus handler where the ternary operator binds looser than addition, producing incorrect offset calculation when sigbus_addr is non-NULL 3. Fix --kpageflags option declaration in getopt_long: has_arg should be 1 (required_argument) since the option requires a file path This patch (of 3): Two error messages incorrectly spelled the madvise() function name as "madvice". Fix the typo in both occurrences. Link: https://lore.kernel.org/20260513022120.58033-1-ye.liu@linux.dev Link: https://lore.kernel.org/20260513022120.58033-2-ye.liu@linux.dev Signed-off-by: Ye Liu <liuye@kylinos.cn> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: ksm-functional-tests: fix partial write handlingVineet Agarwal1-8/+11
Update write() checks to properly detect and handle partial writes. Previously, the write() calls used <= 0 to detect failure. This condition is never true for partial writes (ret > 0 but ret < len), so partial writes were silently treated as success. Fix this by verifying that write() returns the full expected length and treating any mismatch as failure. Link: https://lore.kernel.org/20260504081638.683223-1-agarwal.vineet2006@gmail.com Signed-off-by: Vineet Agarwal <agarwal.vineet2006@gmail.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
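A minimal sketch of the stricter check, assuming the caller wants an all-or-nothing write (the helper name write_exact() is hypothetical and not taken from the selftest):

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: succeed only if write() stored all len bytes.
 * A check of the form `ret <= 0` would let a short write (0 < ret < len)
 * pass as success; comparing against len catches it.
 */
static int write_exact(int fd, const void *buf, size_t len)
{
	ssize_t ret = write(fd, buf, len);

	return (ret == (ssize_t)len) ? 0 : -1;
}
```

This treats a short write as a failure rather than retrying the remainder, which matches the behavior described in the commit message.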
3 daysselftests/mm: check file initialization writes in split_huge_page_testVineet Agarwal1-3/+7
create_pagecache_thp_and_fd() fills the backing file for the pagecache THP tests using repeated write() calls, but the return value is never checked. If a write fails or completes only partially, the test may continue with an incompletely initialized file and produce misleading results. Check the result of write() and fail the test if the expected number of bytes was not written. [akpm@linux-foundation.org: remove unneeded local, per David] Link: https://lore.kernel.org/da82de92-29d8-457c-9f65-40fc4900b922@kernel.org Link: https://lore.kernel.org/20260512074924.27721-1-agarwal.vineet2006@gmail.com Signed-off-by: Vineet Agarwal <agarwal.vineet2006@gmail.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Vineet Agarwal <agarwal.vineet2006@gmail.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: fix mmap() return value check in run_migration_benchmarkHongfu Li1-1/+1
mmap() returns MAP_FAILED on error, not NULL. The current check uses !buffer->ptr, which evaluates to false when mmap() fails (since MAP_FAILED is (void *)-1, not 0), so the error path is never taken. Link: https://lore.kernel.org/20260512101305.139509-1-lihongfu@kylinos.cn Signed-off-by: Hongfu Li <lihongfu@kylinos.cn> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Dev Jain <dev.jain@arm.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
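The difference between the two checks can be shown with a short sketch. The wrapper name map_anon_buffer() is hypothetical; it converts the MAP_FAILED convention into a NULL return so that callers can use an ordinary pointer test.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical wrapper: map len bytes of anonymous memory and return NULL
 * on failure. The comparison must be against MAP_FAILED, i.e. (void *)-1;
 * testing the raw mmap() result with `!ptr` never detects an error.
 */
static void *map_anon_buffer(size_t len)
{
	void *ptr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return ptr == MAP_FAILED ? NULL : ptr;
}
```

On Linux a zero-length request is an easy way to provoke the failure path, since mmap() rejects it with EINVAL.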
3 daysfooAndrew Morton19-204/+608
3 daystools headers UAPI: sync linux/taskstats.h for procacct.cWang Yaxin1-0/+128
After commit 9b93f7e32774 ("tools/getdelays: use the static UAPI headers from tools/include/uapi"), the Makefile was changed to use -I../include/uapi/ instead of -I../../usr/include to ensure tools always use the up-to-date UAPI headers. However, only linux/taskstats.h was added to tools/include/uapi/ in commit e5bbb35a07b3 ("tools headers UAPI: sync linux/taskstats.h"), but linux/acct.h was missing. This causes procacct.c to fail to compile with:
  procacct.c:234:37: error: 'AGROUP' undeclared (first use in this function)
  gcc -I../include/uapi/ getdelays.c -o getdelays
  gcc -I../include/uapi/ procacct.c -o procacct
  procacct.c: In function `print_procacct':
  procacct.c:234:37: error: `AGROUP' undeclared (first use in this function) did you mean `NOGROUP'?
  234 | , t->version >= 12 ? (t->ac_flag & AGROUP ? 'P' : 'T') : '?'
      | ^~~~~~
      | NOGROUP
  procacct.c:234:37: note: each undeclared ident
because procacct.c uses the AGROUP macro defined in linux/acct.h. Add the missing linux/acct.h to complete the static UAPI header set. Link: https://lore.kernel.org/20260527213558929EhiHHy9EDTMjmg3uuDOMi@zte.com.cn Fixes: 9b93f7e32774 ("tools/getdelays: use the static UAPI headers from tools/include/uapi") Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/perf_events: fix mmap() error check in sigtrap_threadsHongfu Li1-1/+1
In sigtrap_threads(), the return value of mmap() is checked against NULL. mmap() returns MAP_FAILED, which is (void *)-1, not NULL, when it fails. Since MAP_FAILED is non-zero and non-NULL, the condition "p == NULL" will never be true on failure, causing the program to proceed with an invalid pointer and segfault if mmap() actually fails under memory pressure. Link: https://lore.kernel.org/20260513025838.594945-1-lihongfu@kylinos.cn Signed-off-by: Hongfu Li <lihongfu@kylinos.cn> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mickael Salaun <mic@digikod.net> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Kyle Huey <khuey@kylehuey.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayskselftest/filelock: add a .gitignore fileMark Brown1-0/+1
Tell git to ignore the generated binary for the test. Link: https://lore.kernel.org/20260226-selftest-filelock-ktap-v4-3-db8ae192ff42@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayskselftest/filelock: report each test in oftlocks separatelyMark Brown1-51/+39
The filelock test checks four different things but only reports an overall status. Convert it to use ksft_test_result() for these individual tests. Each test depends on the previous ones, so we still bail out if any of them fail, but we get a bit more information from UIs parsing the results. Link: https://lore.kernel.org/20260226-selftest-filelock-ktap-v4-2-db8ae192ff42@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 dayskselftest/filelock: use ksft_perror()Mark Brown1-2/+2
Patch series "selftests/filelock: Make output more kselftestish", v4. This series makes the output from the ofdlocks test a bit easier for tooling to work with, and also ignores the generated file while we're here. This patch (of 3): The ofdlocks test reports some errors via perror(), which does not produce KTAP output. Convert to ksft_perror(), which does. Link: https://lore.kernel.org/20260226-selftest-filelock-ktap-v4-0-db8ae192ff42@kernel.org Link: https://lore.kernel.org/20260226-selftest-filelock-ktap-v4-1-db8ae192ff42@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/acct: add taskstats TGID retention testYiyang Chen3-2/+381
Add a kselftest for the taskstats TGID aggregation fix. The test creates a worker thread, snapshots TGID taskstats while the worker is still alive, lets the worker exit, and then verifies that the TGID CPU total does not regress after the thread has been reaped. The pass/fail check intentionally keys off ac_utime + ac_stime only, which is the primary user-visible regression fixed by the taskstats change and is less sensitive to scheduling noise than context-switch counters. Link: https://lore.kernel.org/0d55354911c54cd1b9f10a09f6fd378af85c8d43.1776094300.git.cyyzero16@gmail.com Signed-off-by: Yiyang Chen <cyyzero16@gmail.com> Acked-by: Balbir Singh <balbirs@nvidia.com> Cc: Dr. Thomas Orgis <thomas.orgis@uni-hamburg.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Wang Yaxin <wang.yaxin@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daystools/accounting/getdelays: fix -Wformat-truncation warning in format_timespecYiyang Chen1-7/+1
Reproduce with GCC 13.3.0: $ cd tools/accounting $ make This emits: getdelays.c: In function `format_timespec': getdelays.c:218:67: warning: `:' directive output may be truncated writing 1 byte into a region of size between 0 and 16 [-Wformat-truncation=] 218 | snprintf(buffer, sizeof(buffer), "%04d-%02d-%02dT%02d:%02d:%02d", | getdelays.c:218:9: note: `snprintf' output between 20 and 72 bytes into a destination of size 32 The problem is that %04d and %02d specify minimum field widths only. GCC cannot prove that formatting tm_year + 1900 and the other struct tm fields will always fit in the fixed 32-byte buffer, so it warns about possible truncation. Fix this by replacing the manual snprintf() formatting with strftime("%Y-%m-%dT%H:%M:%S", ...). That matches the data we already have in struct tm, keeps the intended timestamp format, and avoids the warning when building tools/accounting with GCC. Link: https://lore.kernel.org/87d9723e0b59d816ee2e4bd7cddd58a54c6c9f91.1776956545.git.cyyzero16@gmail.com Signed-off-by: Yiyang Chen <cyyzero16@gmail.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
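A minimal sketch of the strftime() approach described above. format_utc() is a hypothetical stand-in for format_timespec(), and it assumes UTC via gmtime_r() so that the output is deterministic; the real function may well use local time.

```c
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical stand-in for format_timespec(): render `ts` as
 * YYYY-MM-DDTHH:MM:SS (UTC is an assumption of this sketch).
 * Returns the number of characters written, or 0 if `buf` is too small.
 */
static size_t format_utc(const struct timespec *ts, char *buf, size_t len)
{
	struct tm tm;

	if (!gmtime_r(&ts->tv_sec, &tm))
		return 0;

	/* strftime() never writes past `len` and returns 0 on overflow. */
	return strftime(buf, len, "%Y-%m-%dT%H:%M:%S", &tm);
}
```

Because strftime() takes the buffer size and reports overflow through its return value, there is no format-width reasoning left for the compiler to warn about.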
3 daysselftests/mm: add kmemleak verbose dedup testBreno Leitao2-0/+223
Add a regression test for the per-scan verbose dedup added in the preceding commit. The test loads samples/kmemleak's helper module (CONFIG_SAMPLE_KMEMLEAK=m) to generate orphan allocations, several of which share an allocation backtrace, runs four kmemleak scans with verbose printing enabled, then walks dmesg looking for two "unreferenced object" reports within a single scan that share an identical backtrace - which would mean dedup failed to collapse them. The test is intentionally permissive on detection but strict on regressions: - PASS when no duplicates are observed, regardless of whether the dedup summary line ("... and N more object(s) with the same backtrace") was actually emitted. Per-CPU chunk reuse, slab freelist pointers, kernel stack residue and CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN can all keep most of the orphans "still referenced" or reported across many separate scans, so the dedup path may have nothing to fold within one scan. That is not a regression. - PASS reports whether dedup actually fired, so a passing run on a well-behaved environment is still informative. - FAIL when two same-backtrace reports land in a single scan (clear dedup regression). - FAIL when kmemleak's own per-scan tally counts leaks but the verbose path emits zero "unreferenced object" lines - that catches a regression in the verbose printer itself, which would otherwise pass the duplicate check trivially. - SKIP when kmemleak is absent, disabled at runtime, or the helper module is not built. The dmesg parser anchors stack-frame matching to the indentation kmemleak uses for them (4+ spaces under "kmemleak: ") so unrelated kmemleak warnings landing between reports do not get lumped into the backtrace key and mask a duplicate. Link: https://lore.kernel.org/20260506-kmemleak_dedup-v3-2-2d36aafc34da@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/cgroup: include slab in test_percpu_basic memory checkLi Wang1-5/+6
test_percpu_basic() currently compares memory.current against only memory.stat:percpu after creating 1000 child cgroups. Observed failure: #./test_kmem ok 1 test_kmem_basic ok 2 test_kmem_memcg_deletion ok 3 test_kmem_proc_kpagecgroup ok 4 test_kmem_kernel_stacks ok 5 test_kmem_dead_cgroups memory.current 11530240 percpu 8440000 not ok 6 test_percpu_basic That assumption is too strict: child cgroup creation also allocates slab-backed metadata, so memory.current is expected to be larger than percpu alone. One visible path is: cgroup_mkdir() cgroup_create() cgroup_addrm_file() cgroup_add_file() __kernfs_create_file() __kernfs_new_node() kmem_cache_zalloc() These kernfs allocations are charged as slab and show up in memory.stat:slab. Update the check to compare memory.current against (percpu + slab) within MAX_VMSTAT_ERROR, and print slab/delta in the failure message to improve diagnostics. Link: https://lore.kernel.org/20260501022058.18024-3-li.wang@linux.dev Signed-off-by: Li Wang <li.wang@linux.dev> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Sayali Patil <sayalip@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
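The relaxed check described above amounts to comparing memory.current against the sum of the two stat counters within a tolerance. A minimal sketch, assuming a hypothetical helper charges_match() that is not taken from the test itself:

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: true when memory.current is within `max_error`
 * bytes of (percpu + slab). All values are in bytes.
 */
static bool charges_match(long current, long percpu, long slab,
			  long max_error)
{
	long delta = labs(current - (percpu + slab));

	return delta <= max_error;
}
```

With the figures from the failure log (memory.current 11530240, percpu 8440000), the gap is 3090240 bytes, so a check against percpu alone only passes with a very large tolerance. Once the slab charge is added to the right-hand side, the remaining delta is what the tolerance has to cover.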
3 daysselftests/cgroup: fix hardcoded page size in test_percpu_basicLi Wang1-1/+1
Patch series "selftests/cgroup: Fix false positive failures in test_percpu_basic", v2. This patch series addresses two separate issues that cause false positive failures in the test_percpu_basic test within the cgroup kmem selftests. The first issue stems from a hardcoded assumption about the system page size, which breaks the test on architectures with larger page sizes. The second issue is an overly strict memory check that fails to account for the slab metadata allocated during cgroup creation. This patch (of 2): MAX_VMSTAT_ERROR uses a hardcoded page size of 4096, which assumes 4K pages. This causes test_percpu_basic to fail on systems where the kernel is configured with a larger page size, such as aarch64 systems using 16K or 64K pages, where the maximum permissible discrepancy between memory.current and percpu charges is proportionally larger. Replace the hardcoded 4096 with sysconf(_SC_PAGESIZE) to correctly derive the page size at runtime regardless of the underlying architecture or kernel configuration. Link: https://lore.kernel.org/20260501022058.18024-1-li.wang@linux.dev Link: https://lore.kernel.org/20260501022058.18024-2-li.wang@linux.dev Signed-off-by: Li Wang <li.wang@linux.dev> Acked-by: Waiman Long <longman@redhat.com> Reviewed-by: Sayali Patil <sayalip@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
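A sketch of deriving the tolerance from the runtime page size, as the message above describes. The batch size of 64 pages per CPU and the name CHARGE_BATCH_PAGES are assumptions made for illustration; only the use of sysconf(_SC_PAGESIZE) in place of 4096 comes from the patch description.

```c
#include <sys/sysinfo.h>
#include <unistd.h>

/* Assumed per-CPU charge batch, in pages (illustrative value). */
#define CHARGE_BATCH_PAGES 64

/*
 * Upper bound, in bytes, on the discrepancy tolerated between
 * memory.current and the memory.stat counters.
 */
static long max_vmstat_error(void)
{
	/* Query the page size at runtime instead of assuming 4096. */
	return sysconf(_SC_PAGESIZE) * CHARGE_BATCH_PAGES * get_nprocs();
}
```

On a 64K-page system this bound is 16 times larger than on a 4K-page system with the same CPU count, which matches the proportional scaling the message calls for.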
3 daysselftests/mm: khugepaged: initialize file contents via mmapVineet Agarwal1-3/+15
file_setup_area() currently allocates anonymous memory, fills it, and writes it into the backing file used for collapse testing. Instead of copying data through write(), resize the file with ftruncate(), map it directly with MAP_SHARED, and initialize the mapped area in place. This simplifies the setup path and avoids the need for explicit partial write handling. Link: https://lore.kernel.org/20260429115816.98824-1-agarwal.vineet2006@gmail.com Signed-off-by: Vineet Agarwal <agarwal.vineet2006@gmail.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Tested-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Dev Jain <dev.jain@arm.com> Cc: Lance Yang <lance.yang@linux.dev> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
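The ftruncate() plus MAP_SHARED setup described above can be sketched roughly as below. fill_file_via_mmap() is a hypothetical helper, not the function from khugepaged.c, and the fill pattern is arbitrary.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: size the file behind `fd` to `len` bytes and
 * fill it with `byte` through a shared mapping.
 * Returns 0 on success, -1 on error.
 */
static int fill_file_via_mmap(int fd, size_t len, unsigned char byte)
{
	void *p;

	if (ftruncate(fd, (off_t)len) < 0)
		return -1;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Stores through a MAP_SHARED mapping land in the file itself. */
	memset(p, byte, len);

	return munmap(p, len);
}
```

Compared with a write() loop, there is no short-write case to handle: the mapping either covers the whole range or mmap() fails.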
3 daysselftests/damon/sysfs.py: pause DAMON before dumping statusSeongJae Park1-0/+38
The sysfs.py test commits DAMON parameters, dumps the internal DAMON state, and checks whether the parameters were committed as expected using the dumped state. While the dump is in progress, DAMON keeps running and can make internal changes, including adding and removing regions. This can race with the dump and produce false test results. Pause DAMON execution during the state dumping to avoid such races. Link: https://lore.kernel.org/20260427151231.113429-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/damon/sysfs.py: check pause on assert_ctx_committed()SeongJae Park1-0/+1
Extend sysfs.py tests to confirm damon_ctx->pause can be set using the pause sysfs file. Link: https://lore.kernel.org/20260427151231.113429-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/damon/drgn_dump_damon_status: dump pauseSeongJae Park1-0/+1
drgn_dump_damon_status does not dump the damon_ctx->pause parameter value, so it cannot be tested. Dump it for future tests. Link: https://lore.kernel.org/20260427151231.113429-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/damon/_damon_sysfs: support pause file stagingSeongJae Park1-1/+9
_damon_sysfs, the Python module that tests use to control the DAMON sysfs interface, does not support the newly added pause file. Add support for the file, for future tests and use of the feature. Link: https://lore.kernel.org/20260427151231.113429-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/madvise: reject invalid process_madvise() advice for zero-length vectorsfujunjie1-0/+28
process_madvise() used to validate the advice while walking each imported iovec. If the vector has zero total length, vector_madvise() does not enter the loop and can return success without checking whether the advice value is valid. For a local mm, such as process_madvise(PIDFD_SELF, ...), the remote-only process_madvise_remote_valid() check is skipped. As a result, an invalid advice can be reported as success when the vector has zero total length. This differs from madvise(), which rejects an invalid advice before returning success for a zero-length range. Validate the generic madvise behavior at the syscall-facing entry points before any vector walk. In process_madvise(), do this before the remote-only advice restriction so unsupported advice is rejected with the same priority for local and remote mm. Use an errno-returning helper for address/length validation, and handle zero-length ranges explicitly at the call sites. Requests with valid advice and zero total length remain a noop and continue to return 0. Add a selftest that covers invalid advice with a zero-length iovec and an empty vector, while also checking that a request with valid advice and zero length still succeeds. Link: https://lore.kernel.org/tencent_C3AEB0E769C5F4F9370F9411B69B7F8B2907@qq.com Fixes: 021781b01275 ("mm/madvise: unrestrict process_madvise() for current process") Signed-off-by: fujunjie <fujunjie1@qq.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysmm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme actionAsier Gutierrez1-5/+6
This patch set introduces a new action: DAMOS_COLLAPSE.

For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged must be running, since these actions rely on hugepage_madvise to add a new slot. This slot should be picked up by khugepaged, which eventually collapses the pages (or not, if we are using DAMOS_NOHUGEPAGE). If THP is not enabled, khugepaged will not be running, and therefore no collapse will happen.

DAMOS_COLLAPSE eventually calls madvise_collapse, which will collapse the address range synchronously. In cases where there is a large VMA (databases, for example), DAMOS_COLLAPSE allows us to collapse only the hot region, and not the entire VMA.

This new action may be required to support autotuning with hugepage as a goal[1].

=========
Benchmarks:
=========

MySQL
=====

Tests were performed on an ARM physical server with MariaDB 10.5 and sysbench. A read-only benchmark was performed with Gaussian row hitting, which follows a normal distribution.

T n, D h: THP set to never, DAMON action set to hugepage
T m, D h: THP set to madvise, DAMON action set to hugepage
T n, D c: THP set to never, DAMON action set to collapse

Memory consumption. Lower is better.

+------------------+----------+----------+----------+
|                  | T n, D h | T m, D h | T n, D c |
+------------------+----------+----------+----------+
| Total memory use | 2.13     | 2.20     | 2.20     |
| Huge pages       | 0        | 1.3      | 1.27     |
+------------------+----------+----------+----------+

Performance in TPS (Transactions Per Second). Higher is better.

T n, D h: 18225.58
T m, D h: 18252.93
T n, D c: 18270.21

Performance counters

I got the number of L1 D/I TLB accesses and the number of D/I TLB accesses that triggered a page walk. I divided the second by the first to get the percentage of page walks per TLB access. The lower the better.

+---------------+--------------+--------------+--------------+
|               | T n, D h     | T m, D h     | T n, D c     |
+---------------+--------------+--------------+--------------+
| L1 DTLB       | 127248242753 | 125431020479 | 125327001821 |
| L1 ITLB       | 80332558619  | 79346759071  | 79298139590  |
| DTLB walk     | 75011087     | 52800418     | 55895794     |
| ITLB walk     | 71577076     | 71505137     | 67262140     |
| DTLB % misses | 0.058948623  | 0.042095183  | 0.044599961  |
| ITLB % misses | 0.089100954  | 0.090117275  | 0.084821839  |
+---------------+--------------+--------------+--------------+

Masim
=====

I used masim with the "demo" configuration, but changing the times to 100 seconds for the initial phase and 50 seconds for the rest of the phases.

Memory consumption:

+------------------+----------+----------+----------+
|                  | T n, D h | T m, D h | T n, D c |
+------------------+----------+----------+----------+
| Total memory use | 2.38 GB  | 2.36 GB  | 2.37 GB  |
| Huge pages       | 0        | 190 MB   | 188 MB   |
+------------------+----------+----------+----------+

Performance:

THP never, DAMOS_HUGEPAGE
initial phase: 40,491 accesses/msec, 100001 msecs run
low phase 0: 39,658 accesses/msec, 50002 msecs run
high phase 0: 41,678 accesses/msec, 50000 msecs run
low phase 1: 39,625 accesses/msec, 50003 msecs run
high phase 1: 41,658 accesses/msec, 50002 msecs run
low phase 2: 39,642 accesses/msec, 50002 msecs run
high phase 2: 41,640 accesses/msec, 50001 msecs run

THP madvise, DAMOS_HUGEPAGE
initial phase: 51,977 accesses/msec, 100000 msecs run
low phase 0: 86,953 accesses/msec, 50000 msecs run
high phase 0: 94,812 accesses/msec, 50000 msecs run
low phase 1: 101,017 accesses/msec, 50000 msecs run
high phase 1: 94,841 accesses/msec, 50000 msecs run
low phase 2: 100,993 accesses/msec, 50000 msecs run
high phase 2: 94,791 accesses/msec, 50001 msecs run

THP never, DAMOS_COLLAPSE
initial phase: 93,678 accesses/msec, 100001 msecs run
low phase 0: 101,475 accesses/msec, 50000 msecs run
high phase 0: 98,589 accesses/msec, 50000 msecs run
low phase 1: 101,531 accesses/msec, 50001 msecs run
high phase 1: 98,506 accesses/msec, 50001 msecs run
low phase 2: 101,458 accesses/msec, 50001 msecs run
high phase 2: 98,555 accesses/msec, 50000 msecs run

Memory consumption dynamic (how quickly collapses occur):

It shows, for each elapsed time in seconds, how many huge pages are allocated.

+----+----------+----------+
|    | T m, D h | T n, D c |
+----+----------+----------+
| 5  | 32       | 188      |
| 10 | 48       | 188      |
| 15 | 64       | 188      |
| 20 | 96       | 188      |
| 30 | 112      | 188      |
| 35 | 144      | 188      |
| 40 | 160      | 188      |
| 45 | 190      | 188      |
| 50 | 190      | 188      |
| 55 | 190      | 188      |
| 60 | 190      | 188      |
+----+----------+----------+

=========

- We can see that DAMOS "hugepage" action works only when THP is set to madvise. "collapse" action works even when THP is set to never.
- Performance for "collapse" action is slightly lower than "hugepage" action and THP madvise. This is due to the fact that collapses occur synchronously. With "hugepage" they may occur during page faults.
- Memory consumption is slightly lower for "collapse" than "hugepage" with THP madvise. This is because khugepaged collapses all VMAs, while the "collapse" action only collapses the VMAs in the hot region.
- There is an improvement in TLB utilization when collapses are triggered through the "hugepage" or "collapse" actions. The number of TLB misses is lower.
- "collapse" action is performed synchronously, which means that page collapses happen earlier and more rapidly. This can be useful or not, depending on the scenario.
- "hugepage" action may trigger a VMA split in some scenarios, since it needs to change the flag of the VMA to THP enabled. This may lead to additional overhead.

Collapse action just adds a new option to choose the correct system balance.
Link: https://lore.kernel.org/20260426231619.107231-5-sj@kernel.org Link: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/ [1] Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Cheng-Han Wu <hank20010209@gmail.com> Cc: David Hildenbrand <david@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam R. Howlett <liam@infradead.org> Cc: Liew Rui Yan <aethernet65535@gmail.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
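The "% misses" rows in the performance counter table of the commit message above follow from dividing page walks by TLB accesses. A small sketch of that arithmetic, with walk_percent() as a hypothetical helper name:

```c
/*
 * Hypothetical helper: percentage of TLB accesses that triggered a
 * page walk.
 */
static double walk_percent(unsigned long long walks,
			   unsigned long long accesses)
{
	return 100.0 * (double)walks / (double)accesses;
}
```

For the "T n, D h" column, walk_percent(75011087, 127248242753) gives roughly 0.0589, which agrees with the DTLB value reported in the table.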
3 daysselftests/mm: simplify byte pattern checking in mremap_testDev Jain1-99/+10
The original version of mremap_test (7df666253f26: "kselftests: vm: add mremap tests") validated remapped contents byte-by-byte and printed a mismatch index in case the byte streams didn't match. That was rather inefficient, even when the test passed. Later, commit 7033c6cc9620 ("selftests/mm: mremap_test: optimize execution time from minutes to seconds using chunkwise memcmp") used memcmp() on bigger chunks, falling back to byte-wise scanning to find the problematic index only if it discovered a problem. However, the implementation is overly complicated (e.g., get_sqrt() is currently not optimal) and we don't really have to report the exact index: whoever debugs the failing test can figure that out. Let's simplify by just comparing both byte streams with memcmp() and not detecting the exact failed index. Link: https://lore.kernel.org/20260415044509.579428-1-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reported-by: Sarthak Sharma <sarthak.sharma@arm.com> Tested-by: Sarthak Sharma <sarthak.sharma@arm.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: David Laight <david.laight.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
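The simplified comparison in the mremap_test change above boils down to a single memcmp() over the whole region. A minimal sketch, with contents_preserved() as a hypothetical name rather than the function used in the test:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true when the remapped region still holds
 * exactly the bytes saved before the mremap() call.
 */
static bool contents_preserved(const void *saved, const void *remapped,
			       size_t len)
{
	/* One call over the whole range; no mismatch index is reported. */
	return memcmp(saved, remapped, len) == 0;
}
```

The trade-off is the one the message states: a failure no longer says where the first differing byte is, in exchange for much less code.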
3 daysselftests/mm: run the MAP_DROPPABLE selftestAnthony Yznaga2-1/+9
The test was not being run by the selftest framework so it was never noticed that it would fail with an assertion failure on configs without support for MAP_DROPPABLE. Update the test so that it is skipped instead when MAP_DROPPABLE is not supported, and add it to the mmap category so that the test is run by the framework. Link: https://lore.kernel.org/20260416033939.49981-4-anthony.yznaga@oracle.com Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Mark Brown <broonie@kernel.org> Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/mm: verify droppable mappings cannot be lockedAnthony Yznaga1-9/+75
For configs that support MAP_DROPPABLE verify that a mapping created with MAP_DROPPABLE cannot be locked via mlock(), and that it will not be locked if it's created after mlockall(MCL_FUTURE). Link: https://lore.kernel.org/20260416033939.49981-3-anthony.yznaga@oracle.com Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftests/cgroup: test_zswap: wait for asynchronous writebackLi Wang1-2/+26
zswap writeback is asynchronous, but test_zswap.c checks writeback counters immediately after reclaim/trigger paths. On some platforms (e.g. ppc64le), this can race with background writeback and cause spurious failures even when behavior is correct. Add wait_for_writeback() to poll get_cg_wb_count() with a bounded timeout, and use it in: test_zswap_writeback_one() when writeback is expected test_no_invasive_cgroup_shrink() for the wb_group check This keeps the original before/after assertion style while making the tests robust against writeback completion latency. No test behavior change, selftest stability improvement only. Link: https://lore.kernel.org/20260424040059.12940-9-li.wang@linux.dev Signed-off-by: Li Wang <li.wang@linux.dev> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Jiayuan Chen <jiayuan.chen@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Yosry Ahmed <yosry@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
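A bounded polling loop of the kind described above might look like the sketch below. Everything here is illustrative: wait_for_increase() and the fake counter are hypothetical, and in the selftest the role of the callback is played by get_cg_wb_count().

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns the counter, or < 0 on error. */
typedef long (*counter_fn)(void *ctx);

/*
 * Poll until the counter rises above `baseline`, an error occurs, or
 * `max_polls` reads have been made. Returns the last value read.
 */
static long wait_for_increase(counter_fn read_counter, void *ctx,
			      long baseline, int max_polls,
			      long interval_ms)
{
	struct timespec ts = {
		.tv_sec = interval_ms / 1000,
		.tv_nsec = (interval_ms % 1000) * 1000000L,
	};
	long now = baseline;
	int i;

	for (i = 0; i < max_polls; i++) {
		now = read_counter(ctx);
		if (now < 0 || now > baseline)
			break;
		nanosleep(&ts, NULL);
	}
	return now;
}

/* Demo counter: reads as 0 for the first `delay` reads, then as 1. */
struct fake_counter {
	int reads;
	int delay;
};

static long fake_read(void *ctx)
{
	struct fake_counter *c = ctx;

	return ++c->reads > c->delay ? 1 : 0;
}
```

The caller keeps the before/after assertion style: it records the baseline, triggers reclaim, then asserts on the value that wait_for_increase() returns. The poll limit bounds how long a genuinely broken writeback path can stall the test.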
3 daysselftest/cgroup: fix zswap attempt_writeback() on 64K pagesize systemLi Wang1-4/+4
In attempt_writeback(), a memsize of 4M only covers 64 pages on 64K page size systems. When memory.reclaim is called, the kernel prefers reclaiming clean file pages (binary, libc, linker, etc.) over swapping anonymous pages. With only 64 pages of anonymous memory, the reclaim target can be largely or entirely satisfied by dropping file pages, resulting in very few or zero anonymous pages being pushed into zswap. This causes zswap_usage to be extremely small or zero, making zswap_usage/4 insufficient to create meaningful writeback pressure. The test then fails because no writeback is triggered. On 4K page size systems this is not an issue because 4M covers 1024 pages, and file pages are a small fraction of the reclaim target. Fix this by: - Always allocating 1024 pages regardless of page size. This ensures enough anonymous pages to reliably populate zswap and trigger writeback, while keeping the original 4M allocation on 4K systems. - Setting zswap.max to zswap_usage/4 instead of zswap_usage/2 to create stronger writeback pressure, ensuring reclaim reliably triggers writeback even on large page size systems. === Error Log === # uname -rm 6.12.0-211.el10.ppc64le ppc64le # getconf PAGESIZE 65536 # ./test_zswap TAP version 13 1..7 ok 1 test_zswap_usage ok 2 test_swapin_nozswap ok 3 test_zswapin not ok 4 test_zswap_writeback_enabled ... 
Link: https://lore.kernel.org/20260424040059.12940-8-li.wang@linux.dev Signed-off-by: Li Wang <li.wang@linux.dev> Acked-by: Yosry Ahmed <yosry@kernel.org> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Jiayuan Chen <jiayuan.chen@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
3 daysselftest/cgroup: fix zswap test_no_invasive_cgroup_shrink on large pagesize ↵Li Wang1-21/+49
system test_no_invasive_cgroup_shrink sets up two cgroups: wb_group, which is expected to trigger zswap writeback, and a control group (renamed to zw_group), which should only have pages sitting in zswap without any writeback. There are two problems with the current test: 1) The data patterns are reversed. wb_group uses allocate_bytes(), which writes only a single byte per page — trivially compressible, especially by zstd — so compressed pages fit within zswap.max and writeback is never triggered. Meanwhile, the control group uses getrandom() to produce hard-to-compress data, but it is the group that does *not* need writeback. 2) The test uses fixed sizes (10K zswap.max, 10MB allocation) that are too small on systems with large PAGE_SIZE (e.g. 64K), failing to build enough memory pressure to trigger writeback reliably. Fix both issues by: - Swapping the data patterns: fill wb_group pages with partially random data (getrandom for page_size/4 bytes) to resist compression and trigger writeback, and fill zw_group pages with simple repeated data to stay compressed in zswap. - Making all size parameters PAGE_SIZE-aware: set allocation size to PAGE_SIZE * 1024, memory.zswap.max to PAGE_SIZE, and memory.max to allocation_size / 2 for both cgroups. - Allocating memory inline instead of via cg_run() so the pages remain resident throughout the test. === Error Log === # getconf PAGESIZE 65536 # ./test_zswap TAP version 13 ... 
ok 5 test_zswap_writeback_disabled ok 6 # SKIP test_no_kmem_bypass not ok 7 test_no_invasive_cgroup_shrink Link: https://lore.kernel.org/20260424040059.12940-7-li.wang@linux.dev Signed-off-by: Li Wang <li.wang@linux.dev> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Jiayuan Chen <jiayuan.chen@linux.dev> Cc: Waiman Long <longman@redhat.com> Cc: Yosry Ahmed <yosry@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
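The "partially random" fill described in the commit above can be sketched as follows. fill_hard_to_compress() is a hypothetical helper; zero-filling the remaining three quarters of each page is an assumption of this sketch, not something the message specifies.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Hypothetical helper: make the first quarter of every page random so
 * the page resists compression; the rest is zeroed (an assumption).
 * Returns 0 on success, -1 on error.
 */
static int fill_hard_to_compress(unsigned char *buf, size_t len,
				 size_t pagesize)
{
	size_t chunk = pagesize / 4;
	size_t off;

	memset(buf, 0, len);
	for (off = 0; off + pagesize <= len; off += pagesize) {
		size_t got = 0;

		/* getrandom() may return fewer bytes than requested. */
		while (got < chunk) {
			ssize_t r = getrandom(buf + off + got,
					      chunk - got, 0);

			if (r < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			got += (size_t)r;
		}
	}
	return 0;
}
```

Scaling the random portion with the page size keeps the compressed size per page roughly proportional on 4K and 64K systems, which is what lets a PAGE_SIZE-sized zswap.max build pressure on both.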
3 daysselftests/cgroup: replace hardcoded page size values in test_zswapLi Wang1-20/+25
test_zswap uses hardcoded values of 4095 and 4096 throughout as page stride and page size, which are only correct on systems with a 4K page size. On architectures with larger pages (e.g., 64K on arm64 or ppc64), these constants cause memory to be touched at sub-page granularity, leading to inefficient access patterns and incorrect page count calculations, which can cause test failures. Replace all hardcoded 4095 and 4096 values with a global pagesize variable initialized from sysconf(_SC_PAGESIZE) at startup, and remove the redundant local sysconf() calls scattered across individual functions. No functional change on 4K page size systems. Link: https://lore.kernel.org/20260424040059.12940-6-li.wang@linux.dev Signed-off-by: Li Wang <li.wang@linux.dev> Acked-by: Yosry Ahmed <yosry@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Waiman Long <longman@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
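A sketch of the page-stride pattern described above, using a global initialized once from sysconf(_SC_PAGESIZE). The names init_pagesize() and touch_every_page() are hypothetical and not taken from test_zswap.c.

```c
#include <stddef.h>
#include <unistd.h>

/* Set once at startup; init_pagesize() must run before any use. */
static size_t pagesize;

static void init_pagesize(void)
{
	pagesize = (size_t)sysconf(_SC_PAGESIZE);
}

/*
 * Write one byte into every page of `buf` so each page is faulted in.
 * Returns the number of pages touched.
 */
static size_t touch_every_page(char *buf, size_t len)
{
	size_t off, touched = 0;

	for (off = 0; off < len; off += pagesize) {
		buf[off] = 'a';
		touched++;
	}
	return touched;
}
```

With a fixed stride of 4096 on a 64K-page system, the same loop would write 16 times into each page and any page count derived from the iteration count would be off by that factor.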