| author | Linus Torvalds <torvalds@linux-foundation.org> | 2026-06-19 10:14:34 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2026-06-19 10:14:34 -0700 |
| commit | a552c81ff4a16738ca5a44a177d552eb38d552ce (patch) | |
| tree | 82800368fc5bc70e728875edb52777521f082ca8 /tools | |
| parent | c98d767b34574be82b74d77d02264a830ae1cadd (diff) | |
| parent | e3d8707358ea76b78bdec9928937bb9a797f2c8f (diff) | |
| download | ath-a552c81ff4a16738ca5a44a177d552eb38d552ce.tar.gz | |
Merge tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "selftests/mm: clean up build output and verbosity" (Li Wang)
Remove some noise from the MM selftests build
- "mm: Free contiguous order-0 pages efficiently" (Ryan Roberts)
Speed up the freeing of a batch of 0-order pages by first scanning
them for coalescing opportunities. This is applicable to vfree() and
to the releasing of frozen pages
- "mm/damon: introduce DAMOS failed region quota charge ratio"
(SeongJae Park)
Address a DAMOS usability issue: The DAMOS quota often exhausts
prematurely because it charges for all memory attempted, causing slow
and inconsistent performance when actions fail on unreclaimable
memory.
To fix this, a new feature lets users set a smaller, flexible quota
charge ratio (via a numerator and denominator) for failed regions.
Since failed actions cause less overhead, reducing their quota cost
ensures more predictable and efficient DAMOS processing
- "selftests/cgroup: improve zswap tests robustness and support large
page sizes" (Li Wang)
Fix various spurious failures and improve the overall robustness of
the cgroup zswap selftests
- "fix MAP_DROPPABLE not supported errno" (Anthony Yznaga)
Fix an issue in the mlock selftests on arm32
- "mm: huge_memory: clean up defrag sysfs with shared" (Breno Leitao)
Some maintenance work in the huge_memory code
- "treewide: fixup gfp_t printks" (Brendan Jackman)
Use the special vprintf() gfp_t conversion in various places
- "mm: Fix vmemmap optimization accounting and initialization" (Muchun
Song)
Fix several bugs in the vmemmap optimization, mainly around incorrect
page accounting and memmap initialization in the DAX and memory
hotplug paths. It also fixes pageblock migratetype initialization and
struct page initialization for ZONE_DEVICE compound pages
- "mm/damon: repost non-hotfix reviewed patches in damon/next tree"
A sprinkle of unrelated minor bugfixes for DAMON
- "mm: remove page_mapped()" (David Hildenbrand)
Remove this function from the tree, replacing it with folio_mapped()
- "mm/damon: let DAMON be paused and resumed" (SeongJae Park)
Allow DAMON to be paused and resumed without losing its current state
- "kasan: hw_tags: Disable tagging for stack and page-tables" (Muhammad
Usama Anjum)
Simplify and speed up kasan by removing its ineffective tagging of
stacks and page tables
- "mm/damon/reclaim,lru_sort: monitor all system rams by default"
(SeongJae Park)
Simplify deployment on diverse hardware like NUMA systems by updating
DAMON_RECLAIM and DAMON_LRU_SORT to automatically monitor the
physical address range covering all System RAM areas by default,
replacing the overly restrictive behavior that only targeted the
single largest memory block to save on negligible overhead
- "mm/damon/sysfs: document filters/ directory as deprecated" (SeongJae
Park)
Update some DAMON docs
- "mm: use spinlock guards for zone lock" (Dmitry Ilvokhin)
Switch zone->lock handling over to using the guard() mechanisms
- "mm/filemap: tighten mmap_miss hit accounting" (fujunjie)
Fix a flaw where the mmap_miss counter over-credited page cache hits
during fault-arounds and page-fault retries. This results in
significant reduction of redundant synchronous mmap readahead I/O,
drastically cutting down execution time and gigabytes read for sparse
random or strided memory access workloads
- "selftests/cgroup: Fix false positive failures in test_percpu_basic"
(Li Wang)
Fix a couple of false-positives in the cgroup kmem selftests
- "mm/damon/reclaim: support monitoring intervals auto-tuning"
(SeongJae Park)
Add a new parameter to DAMON permitting DAMON_RECLAIM to
automatically tune DAMON's sampling and aggregation intervals
- "mm/damon/stat: add kdamond_pid parameter" (SeongJae Park)
Change DAMON_STAT to provide the pid of its kdamond
- "mm/kmemleak: dedupe verbose scan output" (Breno Leitao)
Remove large amounts of duplicated backtraces from the verbose-mode
kmemleak output
- "mm: remove CONFIG_HAVE_BOOTMEM_INFO_NODE (Part 1)" (David
Hildenbrand)
Reduce our use of CONFIG_HAVE_BOOTMEM_INFO_NODE, with a view to
removing it entirely in a later series
- "mm/damon: validate min_region_size to be power of 2" (Liew Rui Yan)
Prevent users from passing a non-power-of-2 value of `addr_unit', as
this later results in undesirable behavior
- "mm: document read_pages and simplify usage" (Frederick Mayle)
- "tools/mm/page-types: Fix misc bugs" (Ye Liu)
Fix three issues in tools/mm/page-types.c
- "mm: misc cleanups from __GFP_UNMAPPED series" (Brendan Jackman)
Implement several cleanups in the page allocator and related code
- "mm, swap: swap table phase IV: unify allocation" (Kairui Song)
Unify the allocation and charging of anon and shmem swap in folios,
provide better synchronization, consolidate the metadata management
(hence dropping the static array and map), and improve performance
- "mm/damon: introduce data attributes monitoring" (SeongJae Park)
Extend DAMON to monitor general data attributes other than accesses
- "mm/vmalloc: free unused pages on vrealloc() shrink" (Shivam Kalra)
Implement the TODO in vrealloc() to unmap and free unused pages when
shrinking across a page boundary
- "mm/damon: documentation and comment fixes" (niecheng)
- "remove mmap_action success, error hooks" (Lorenzo Stoakes)
Eliminate custom hooks from mmap_action by removing the problematic
success_hook which allowed drivers to improperly access uninitialized
VMAs. It replaces the error_hook with a simple error-code field and
updates the memory char driver accordingly
- "mm/damon: minor improvements for code readability and tests"
(SeongJae Park)
- "mm/damon: fix macro arguments and clarify quota goals doc" (Maksym
Shcherba)
- "userfaultfd: merge fs/userfaultfd.c into mm/userfaultfd.c" (Mike
Rapoport)
- "mm/mglru: improve reclaim loop and dirty folio" (Kairui Song and
others)
Clean up and slightly improve MGLRU's reclaim loop and dirty
writeback handling. Large performance improvements are measured
- "use vma locks for proc/pid/{smaps|numa_maps} reads" (Suren
Baghdasaryan)
Use per-vma locks when reading /proc/pid/smaps and numa_maps to
reduce contention on the central mmap_lock
- "refactors thpsize_shmem_enabled_store() and thpsize_shmem_enabled_show()"
(Ran Xiaokai)
Some cleanup work in the THP code
- "selftests/memfd: fix compilation warnings" (Konstantin Khorenko)
Fix a few build glitches in the memfd selftest code.
- "memcg: shrink obj_stock_pcp and cache multiple objcgs" (Shakeel
Butt)
Resolve a 68% performance regression caused by NUMA-node cache
thrashing around struct obj_stock_pcp by shrinking its existing
fields and expanding it into a multi-slot array that caches up to
five obj_cgroup pointers per CPU, allowing per-node variants of the
same memcg to coexist within a single 64-byte cache line.
- "zram: writeback fixes" (Sergey Senozhatsky)
Address a couple of unrelated zram writeback issues
- "mm: switch THP shrinker to list_lru" (Johannes Weiner)
Resolve NUMA-awareness issues and streamline callsite interaction by
refactoring and extending the list_lru API to completely replace the
complex, open-coded deferred split queue for Transparent Huge Pages
- "mm: improve large folio readahead for exec memory" (Usama Arif)
Improve large-folio readahead on systems like 64K-page arm64 by
preventing the mmap_miss check from permanently disabling
target-oriented VM_EXEC readahead, and by generalizing the
force_thp_readahead gate to support mappings with any usefully large
maximum folio order under the cache cap.
- "userfaultfd/pagemap: pre-existing fixes" (Kiryl Shutsemau)
Fix a bunch of minor issues in the userfaultfd/pagemap, all of which
were flagged by Sashiko review of proposed new material
- "mm/sparse-vmemmap: Provide generic vmemmap_set_pmd() and
vmemmap_check_pmd()" (Muchun Song)
Provide generic versions of these two functions so the four
arch-specific implementations can be removed.
- "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap
device" (Youngjun Park)
Address a uswsusp-vs-swapoff race and reduce the swap device
reference taking/releasing frequency.
- "mm/hmm: A fix and a selftest" (Dev Jain)
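The DAMOS failed-region quota charge ratio described in the list above can be pictured with a small sketch. It assumes that bytes on which the action succeeded are charged in full, while bytes on which it failed are charged at `num / denom`. The helper name, the treatment of a zero denominator, and the exact formula are illustrative assumptions, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: quota bytes to charge for one
 * DAMOS attempt. `tried` is the number of bytes the action was
 * attempted on, `applied` the number of bytes on which it succeeded.
 */
static uint64_t damos_charge_sketch(uint64_t tried, uint64_t applied,
				    uint64_t num, uint64_t denom)
{
	uint64_t failed;

	assert(applied <= tried);
	failed = tried - applied;

	/* Assumption: a zero denominator means "charge failures in full". */
	if (denom == 0)
		return tried;

	/* Successful bytes cost full price, failed bytes cost num/denom. */
	return applied + failed * num / denom;
}
```

With the 1/4096 ratio that the DAMON sysfs selftest in this merge commits (`fail_charge_num=1`, `fail_charge_denom=4096`), an attempt that fails on all of 409600 bytes would be charged 100 bytes under this model, so the quota is no longer exhausted by regions that could not be reclaimed.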
* tag 'mm-stable-2026-06-18-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
selftests/mm/hmm-tests: test pagemap reads of PMD device-private entries
fs/proc/task_mmu: do not warn on seeing non-migration pmd entry
lib/test_hmm: check alloc_page_vma() return value and handle OOM
mm/compaction: cap compact_gap() at COMPACT_CLUSTER_MAX
mm/swap: remove redundant swap device reference in alloc/free
mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device
mm/filemap: use folio_next_index() for start
vmalloc: fix NULL pointer dereference in is_vm_area_hugepages()
sparc/mm: drop vmemmap_check_pmd helper and use generic code
loongarch/mm: drop vmemmap_check_pmd helper and use generic code
riscv/mm: drop vmemmap_pmd helpers and use generic code
arm64/mm: drop vmemmap_pmd helpers and use generic code
mm/sparse-vmemmap: provide generic vmemmap_set_pmd() and vmemmap_check_pmd()
rust: page: mark Page::nid as inline
userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks
userfaultfd: gate must_wait writability check on pte_present()
mm/huge_memory: preserve pmd_swp_uffd_wp on device-private PMD downgrade
fs/proc/task_mmu: fix hugetlb self-deadlock in pagemap_scan_pte_hole()
fs/proc/task_mmu: use huge_page_size() in pagemap_scan_hugetlb_entry()
fs/proc/task_mmu: fix make_uffd_wp_huge_pte() prot-update race
...
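The idea behind "mm: Free contiguous order-0 pages efficiently" in the pull message, scanning a batch of order-0 pages for coalescing opportunities before freeing, can be sketched in userspace as a scan for runs of consecutive page frame numbers. The function and callback names are hypothetical, and the sketch ignores the alignment and maximum-order constraints a real page allocator has to respect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: release `nr` pages starting at `start_pfn`. */
typedef void (*free_run_fn)(unsigned long start_pfn, size_t nr, void *ctx);

/*
 * Walk an array of page frame numbers and hand each maximal run of
 * consecutive PFNs to `free_run` in one call instead of one call per
 * page. Returns the number of runs found.
 */
static size_t free_pfns_in_runs(const unsigned long *pfns, size_t n,
				free_run_fn free_run, void *ctx)
{
	size_t runs = 0, i = 0;

	assert(pfns || n == 0);

	while (i < n) {
		size_t len = 1;

		/* Extend the run while the next PFN is physically adjacent. */
		while (i + len < n && pfns[i + len] == pfns[i] + len)
			len++;

		if (free_run)
			free_run(pfns[i], len, ctx);
		runs++;
		i += len;
	}

	return runs;
}
```

For the PFN batch `{10, 11, 12, 20, 21, 30}` the scan finds three runs, so the callback is invoked three times rather than six.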
Diffstat (limited to 'tools')
28 files changed, 965 insertions, 301 deletions
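One of the tools/mm/page-types.c fixes in the diff that follows adds parentheses around a conditional expression. In C, `+` binds more tightly than `?:`, so `off + sigbus_addr ? a : b` is parsed as `(off + sigbus_addr) ? a : b` and `off` never reaches the result. The sketch below reproduces both forms with plain `long` values standing in for the tool's offsets and pointers; the function names are hypothetical.

```c
#include <assert.h>

/* How the unparenthesized pre-fix expression is parsed by the compiler. */
static long end_as_parsed(long off, long sigbus_addr, long ptr)
{
	return (off + sigbus_addr) ? sigbus_addr - ptr : 0;
}

/* The fixed form: the conditional is evaluated first, then added to off. */
static long end_as_intended(long off, long sigbus_addr, long ptr)
{
	return off + (sigbus_addr ? sigbus_addr - ptr : 0);
}

/* Self-check: the two forms disagree whenever off is non-zero. */
static void precedence_demo(void)
{
	assert(end_as_parsed(4096, 1512, 1000) == 512);
	assert(end_as_intended(4096, 1512, 1000) == 4608);
}
```

In the pre-fix form the reported SIGBUS offset is relative to the current mapping window rather than to the file, which is why the fix matters for any window that does not start at offset zero.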
diff --git a/tools/mm/page-types.c b/tools/mm/page-types.c index d7e5e8902af86..7fc5a8be5997f 100644 --- a/tools/mm/page-types.c +++ b/tools/mm/page-types.c @@ -997,10 +997,10 @@ static void walk_file_range(const char *name, int fd, /* turn off readahead */ if (madvise(ptr, len, MADV_RANDOM)) - fatal("madvice failed: %s", name); + fatal("madvise failed: %s", name); if (sigsetjmp(sigbus_jmp, 1)) { - end = off + sigbus_addr ? sigbus_addr - ptr : 0; + end = off + (sigbus_addr ? sigbus_addr - ptr : 0); fprintf(stderr, "got sigbus at offset %lld: %s\n", (long long)end, name); goto got_sigbus; @@ -1015,7 +1015,7 @@ got_sigbus: /* turn off harvesting reference bits */ if (madvise(ptr, len, MADV_SEQUENTIAL)) - fatal("madvice failed: %s", name); + fatal("madvise failed: %s", name); if (pagemap_read(buf, (unsigned long)ptr / page_size, nr_pages) != nr_pages) @@ -1261,7 +1261,7 @@ static const struct option opts[] = { { "no-summary", 0, NULL, 'N' }, { "hwpoison" , 0, NULL, 'X' }, { "unpoison" , 0, NULL, 'x' }, - { "kpageflags", 0, NULL, 'F' }, + { "kpageflags", 1, NULL, 'F' }, { "help" , 0, NULL, 'h' }, { NULL , 0, NULL, 0 } }; diff --git a/tools/testing/selftests/cgroup/lib/cgroup_util.c b/tools/testing/selftests/cgroup/lib/cgroup_util.c index a7b3380d88d77..2596c12cd8645 100644 --- a/tools/testing/selftests/cgroup/lib/cgroup_util.c +++ b/tools/testing/selftests/cgroup/lib/cgroup_util.c @@ -144,7 +144,7 @@ int cg_read_strcmp_wait(const char *cgroup, const char *control, int cg_read_strstr(const char *cgroup, const char *control, const char *needle) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; if (cg_read(cgroup, control, buf, sizeof(buf))) return -1; @@ -174,7 +174,7 @@ long cg_read_long_fd(int fd) long cg_read_key_long(const char *cgroup, const char *control, const char *key) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; char *ptr; if (cg_read(cgroup, control, buf, sizeof(buf))) @@ -210,7 +210,7 @@ long cg_read_key_long_poll(const char *cgroup, const char *control, long 
cg_read_lc(const char *cgroup, const char *control) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; const char delim[] = "\n"; char *line; long cnt = 0; @@ -262,7 +262,7 @@ int cg_write_numeric(const char *cgroup, const char *control, long value) static int cg_find_root(char *root, size_t len, const char *controller, bool *nsdelegate) { - char buf[10 * PAGE_SIZE]; + char buf[10 * BUF_SIZE]; char *fs, *mount, *type, *options; const char delim[] = "\n\t "; @@ -317,7 +317,7 @@ int cg_create(const char *cgroup) int cg_wait_for_proc_count(const char *cgroup, int count) { - char buf[10 * PAGE_SIZE] = {0}; + char buf[10 * BUF_SIZE] = {0}; int attempts; char *ptr; @@ -342,7 +342,7 @@ int cg_wait_for_proc_count(const char *cgroup, int count) int cg_killall(const char *cgroup) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; char *ptr = buf; /* If cgroup.kill exists use it. */ @@ -552,7 +552,7 @@ int cg_run_nowait(const char *cgroup, int proc_mount_contains(const char *option) { - char buf[4 * PAGE_SIZE]; + char buf[4 * BUF_SIZE]; ssize_t read; read = read_text("/proc/mounts", buf, sizeof(buf)); @@ -564,7 +564,7 @@ int proc_mount_contains(const char *option) int cgroup_feature(const char *feature) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; ssize_t read; read = read_text("/sys/kernel/cgroup/features", buf, sizeof(buf)); @@ -591,7 +591,7 @@ ssize_t proc_read_text(int pid, bool thread, const char *item, char *buf, size_t int proc_read_strstr(int pid, bool thread, const char *item, const char *needle) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; if (proc_read_text(pid, thread, item, buf, sizeof(buf)) < 0) return -1; diff --git a/tools/testing/selftests/cgroup/lib/include/cgroup_util.h b/tools/testing/selftests/cgroup/lib/include/cgroup_util.h index 567b1082974c5..febc1723d0903 100644 --- a/tools/testing/selftests/cgroup/lib/include/cgroup_util.h +++ b/tools/testing/selftests/cgroup/lib/include/cgroup_util.h @@ -2,8 +2,8 @@ #include <stdbool.h> #include <stdlib.h> -#ifndef 
PAGE_SIZE -#define PAGE_SIZE 4096 +#ifndef BUF_SIZE +#define BUF_SIZE 4096 #endif #define MB(x) (x << 20) diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c index 7b83c7e7c9d4f..88ca832d4fc13 100644 --- a/tools/testing/selftests/cgroup/test_core.c +++ b/tools/testing/selftests/cgroup/test_core.c @@ -87,7 +87,7 @@ static int test_cgcore_destroy(const char *root) int ret = KSFT_FAIL; char *cg_test = NULL; int child_pid; - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; cg_test = cg_name(root, "cg_test"); diff --git a/tools/testing/selftests/cgroup/test_freezer.c b/tools/testing/selftests/cgroup/test_freezer.c index ead68542d45e9..0569e93fa6b00 100644 --- a/tools/testing/selftests/cgroup/test_freezer.c +++ b/tools/testing/selftests/cgroup/test_freezer.c @@ -642,7 +642,7 @@ cleanup: */ static int proc_check_stopped(int pid) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; int len; len = proc_read_text(pid, 0, "stat", buf, sizeof(buf)); diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c index 12f59925500bd..1db0ba1226b9b 100644 --- a/tools/testing/selftests/cgroup/test_kmem.c +++ b/tools/testing/selftests/cgroup/test_kmem.c @@ -24,7 +24,7 @@ * the maximum discrepancy between charge and vmstat entries is number * of cpus multiplied by 64 pages. 
*/ -#define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs()) +#define MAX_VMSTAT_ERROR (sysconf(_SC_PAGESIZE) * 64 * get_nprocs()) #define KMEM_DEAD_WAIT_RETRIES 80 @@ -353,7 +353,7 @@ static int test_percpu_basic(const char *root) { int ret = KSFT_FAIL; char *parent, *child; - long current, percpu; + long current, percpu, slab; int i; parent = cg_name(root, "percpu_basic_test"); @@ -383,13 +383,14 @@ static int test_percpu_basic(const char *root) current = cg_read_long(parent, "memory.current"); percpu = cg_read_key_long(parent, "memory.stat", "percpu "); + slab = cg_read_key_long(parent, "memory.stat", "slab "); - if (current > 0 && percpu > 0 && labs(current - percpu) < - MAX_VMSTAT_ERROR) + if (current > 0 && percpu > 0 && slab >= 0 && + labs(current - (percpu + slab)) < MAX_VMSTAT_ERROR) ret = KSFT_PASS; else - printf("memory.current %ld\npercpu %ld\n", - current, percpu); + printf("memory.current %ld\npercpu %ld\nslab %ld\ndelta %ld\n", + current, percpu, slab, current - (percpu + slab)); cleanup_children: for (i = 0; i < 1000; i++) { diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index 21aedb35cc122..0ebf796f3cffe 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -26,6 +26,7 @@ static bool has_localevents; static bool has_recursiveprot; +static int page_size; int get_temp_fd(void) { @@ -34,7 +35,7 @@ int get_temp_fd(void) int alloc_pagecache(int fd, size_t size) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; struct stat st; int i; @@ -65,7 +66,7 @@ static char *alloc_and_populate_anon(size_t size) return NULL; } - for (ptr = buf; ptr < buf + size; ptr += PAGE_SIZE) + for (ptr = buf; ptr < buf + size; ptr += page_size) *ptr = 0; return buf; @@ -86,7 +87,7 @@ int alloc_anon(const char *cgroup, void *arg) int is_swap_enabled(void) { - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; const char delim[] = "\n"; int cnt = 0; char *line; @@ 
-129,7 +130,7 @@ static int test_memcg_subtree_control(const char *root) { char *parent, *child, *parent2 = NULL, *child2 = NULL; int ret = KSFT_FAIL; - char buf[PAGE_SIZE]; + char buf[BUF_SIZE]; /* Create two nested cgroups with the memory controller enabled */ parent = cg_name(root, "memcg_test_0"); @@ -1792,6 +1793,10 @@ int main(int argc, char **argv) char root[PATH_MAX]; int i, proc_status; + page_size = sysconf(_SC_PAGE_SIZE); + if (page_size <= 0) + page_size = BUF_SIZE; + ksft_print_header(); ksft_set_plan(ARRAY_SIZE(tests)); if (cg_find_unified_root(root, sizeof(root), NULL)) diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c index a7bdcdd09d627..49b36ee791606 100644 --- a/tools/testing/selftests/cgroup/test_zswap.c +++ b/tools/testing/selftests/cgroup/test_zswap.c @@ -11,10 +11,16 @@ #include <string.h> #include <sys/wait.h> #include <sys/mman.h> +#include <sys/random.h> #include "kselftest.h" #include "cgroup_util.h" +static int page_size; + +#define PATH_ZSWAP "/sys/module/zswap" +#define PATH_ZSWAP_ENABLED "/sys/module/zswap/parameters/enabled" + static int read_int(const char *path, size_t *value) { FILE *file; @@ -70,11 +76,11 @@ static int allocate_and_read_bytes(const char *cgroup, void *arg) if (!mem) return -1; - for (int i = 0; i < size; i += 4095) + for (int i = 0; i < size; i += page_size) mem[i] = 'a'; /* Go through the allocated memory to (z)swap in and out pages */ - for (int i = 0; i < size; i += 4095) { + for (int i = 0; i < size; i += page_size) { if (mem[i] != 'a') ret = -1; } @@ -90,7 +96,7 @@ static int allocate_bytes(const char *cgroup, void *arg) if (!mem) return -1; - for (int i = 0; i < size; i += 4095) + for (int i = 0; i < size; i += page_size) mem[i] = 'a'; free(mem); return 0; @@ -115,6 +121,27 @@ fail: } /* + * Writeback is asynchronous; poll until at least one writeback has + * been recorded for @cg, or until @timeout_ms has elapsed. 
+ */ +static long wait_for_writeback(const char *cg, int timeout_ms) +{ + long elapsed, count; + for (elapsed = 0; elapsed < timeout_ms; elapsed += 100) { + count = get_cg_wb_count(cg); + + if (count < 0) + return -1; + if (count > 0) + return count; + + usleep(100000); + } + + return 0; +} + +/* * Sanity test to check that pages are written into zswap. */ static int test_zswap_usage(const char *root) @@ -162,21 +189,25 @@ out: static int test_swapin_nozswap(const char *root) { int ret = KSFT_FAIL; - char *test_group; - long swap_peak, zswpout; + char *test_group, mem_max_buf[32]; + long swap_peak, zswpout, min_swap; + size_t allocation_size = page_size * 512; + + min_swap = allocation_size / 4; + snprintf(mem_max_buf, sizeof(mem_max_buf), "%zu", allocation_size * 3/4); test_group = cg_name(root, "no_zswap_test"); if (!test_group) goto out; if (cg_create(test_group)) goto out; - if (cg_write(test_group, "memory.max", "8M")) + if (cg_write(test_group, "memory.max", mem_max_buf)) goto out; if (cg_write(test_group, "memory.zswap.max", "0")) goto out; /* Allocate and read more than memory.max to trigger swapin */ - if (cg_run(test_group, allocate_and_read_bytes, (void *)MB(32))) + if (cg_run(test_group, allocate_and_read_bytes, (void *)allocation_size)) goto out; /* Verify that pages are swapped out, but no zswap happened */ @@ -186,8 +217,9 @@ static int test_swapin_nozswap(const char *root) goto out; } - if (swap_peak < MB(24)) { - ksft_print_msg("at least 24MB of memory should be swapped out\n"); + if (swap_peak < min_swap) { + ksft_print_msg("at least %ldKB of memory should be swapped out\n", + min_swap / 1024); goto out; } @@ -237,7 +269,7 @@ static int test_zswapin(const char *root) goto out; } - if (zswpin < MB(24) / PAGE_SIZE) { + if (zswpin < MB(24) / page_size) { ksft_print_msg("at least 24MB should be brought back from zswap\n"); goto out; } @@ -257,16 +289,15 @@ out: This will move it into zswap. * 3. Save current zswap usage. * 4. 
Move the memory allocated in step 1 back in from zswap. - * 5. Set zswap.max to half the amount that was recorded in step 3. + * 5. Set zswap.max to 1/4 of the amount that was recorded in step 3. * 6. Attempt to reclaim memory equal to the amount that was allocated, this will either trigger writeback if it's enabled, or reclamation will fail if writeback is disabled as there isn't enough zswap space. */ static int attempt_writeback(const char *cgroup, void *arg) { - long pagesize = sysconf(_SC_PAGESIZE); - size_t memsize = MB(4); - char buf[pagesize]; + size_t memsize = page_size * 1024; + char buf[page_size]; long zswap_usage; bool wb_enabled = *(bool *) arg; int ret = -1; @@ -281,11 +312,11 @@ static int attempt_writeback(const char *cgroup, void *arg) * half empty, this will result in data that is still compressible * and ends up in zswap, with material zswap usage. */ - for (int i = 0; i < pagesize; i++) - buf[i] = i < pagesize/2 ? (char) i : 0; + for (int i = 0; i < page_size; i++) + buf[i] = i < page_size/2 ? (char) i : 0; - for (int i = 0; i < memsize; i += pagesize) - memcpy(&mem[i], buf, pagesize); + for (int i = 0; i < memsize; i += page_size) + memcpy(&mem[i], buf, page_size); /* Try and reclaim allocated memory */ if (cg_write_numeric(cgroup, "memory.reclaim", memsize)) { @@ -296,19 +327,19 @@ static int attempt_writeback(const char *cgroup, void *arg) zswap_usage = cg_read_long(cgroup, "memory.zswap.current"); /* zswpin */ - for (int i = 0; i < memsize; i += pagesize) { - if (memcmp(&mem[i], buf, pagesize)) { + for (int i = 0; i < memsize; i += page_size) { + if (memcmp(&mem[i], buf, page_size)) { ksft_print_msg("invalid memory\n"); goto out; } } - if (cg_write_numeric(cgroup, "memory.zswap.max", zswap_usage/2)) + if (cg_write_numeric(cgroup, "memory.zswap.max", zswap_usage/4)) goto out; /* * If writeback is enabled, trying to reclaim memory now will trigger a - * writeback as zswap.max is half of what was needed when reclaim ran the first time. 
+ * writeback as zswap.max is 1/4 of what was needed when reclaim ran the first time. * If writeback is disabled, memory reclaim will fail as zswap is limited and * it can't writeback to swap. */ @@ -335,7 +366,10 @@ static int test_zswap_writeback_one(const char *cgroup, bool wb) return -1; /* Verify that zswap writeback occurred only if writeback was enabled */ - zswpwb_after = get_cg_wb_count(cgroup); + if (wb) + zswpwb_after = wait_for_writeback(cgroup, 5000); + else + zswpwb_after = get_cg_wb_count(cgroup); if (zswpwb_after < 0) return -1; @@ -417,44 +451,71 @@ static int test_zswap_writeback_disabled(const char *root) static int test_no_invasive_cgroup_shrink(const char *root) { int ret = KSFT_FAIL; - size_t control_allocation_size = MB(10); - char *control_allocation = NULL, *wb_group = NULL, *control_group = NULL; + unsigned int off; + size_t allocation_size = page_size * 1024; + unsigned int nr_pages = allocation_size / page_size; + char zswap_max_buf[32], mem_max_buf[32]; + char *zw_allocation = NULL, *wb_allocation = NULL; + char *zw_group = NULL, *wb_group = NULL; + + snprintf(zswap_max_buf, sizeof(zswap_max_buf), "%d", page_size); + snprintf(mem_max_buf, sizeof(mem_max_buf), "%zu", allocation_size / 2); wb_group = setup_test_group_1M(root, "per_memcg_wb_test1"); if (!wb_group) return KSFT_FAIL; - if (cg_write(wb_group, "memory.zswap.max", "10K")) + if (cg_write(wb_group, "memory.zswap.max", zswap_max_buf)) + goto out; + if (cg_write(wb_group, "memory.max", mem_max_buf)) + goto out; + + zw_group = setup_test_group_1M(root, "per_memcg_wb_test2"); + if (!zw_group) goto out; - control_group = setup_test_group_1M(root, "per_memcg_wb_test2"); - if (!control_group) + if (cg_write(zw_group, "memory.max", mem_max_buf)) goto out; - /* Push some test_group2 memory into zswap */ - if (cg_enter_current(control_group)) + /* Push some zw_group memory into zswap (simple data, easy to compress) */ + if (cg_enter_current(zw_group)) goto out; - control_allocation = 
malloc(control_allocation_size); - for (int i = 0; i < control_allocation_size; i += 4095) - control_allocation[i] = 'a'; - if (cg_read_key_long(control_group, "memory.stat", "zswapped") < 1) + zw_allocation = malloc(allocation_size); + for (int i = 0; i < nr_pages; i++) { + off = (unsigned long)i * page_size; + memset(&zw_allocation[off], 0, page_size); + memset(&zw_allocation[off], 'a', page_size/4); + } + if (cg_read_key_long(zw_group, "memory.stat", "zswapped") < 1) goto out; - /* Allocate 10x memory.max to push wb_group memory into zswap and trigger wb */ - if (cg_run(wb_group, allocate_bytes, (void *)MB(10))) + /* Push wb_group memory into zswap with hard-to-compress data to trigger wb */ + if (cg_enter_current(wb_group)) goto out; + wb_allocation = malloc(allocation_size); + if (!wb_allocation) + goto out; + for (int i = 0; i < nr_pages; i++) { + off = (unsigned long)i * page_size; + memset(&wb_allocation[off], 0, page_size); + getrandom(&wb_allocation[off], page_size/4, 0); + } /* Verify that only zswapped memory from gwb_group has been written back */ - if (get_cg_wb_count(wb_group) > 0 && get_cg_wb_count(control_group) == 0) + if (wait_for_writeback(wb_group, 5000) > 0 && get_cg_wb_count(zw_group) == 0) ret = KSFT_PASS; out: cg_enter_current(root); - if (control_group) { - cg_destroy(control_group); - free(control_group); + if (zw_group) { + cg_destroy(zw_group); + free(zw_group); } - cg_destroy(wb_group); - free(wb_group); - if (control_allocation) - free(control_allocation); + if (wb_group) { + cg_destroy(wb_group); + free(wb_group); + } + if (zw_allocation) + free(zw_allocation); + if (wb_allocation) + free(wb_allocation); return ret; } @@ -473,7 +534,7 @@ static int no_kmem_bypass_child(const char *cgroup, void *arg) values->child_allocated = true; return -1; } - for (long i = 0; i < values->target_alloc_bytes; i += 4095) + for (long i = 0; i < values->target_alloc_bytes; i += page_size) ((char *)allocation)[i] = 'a'; values->child_allocated = true; 
pause(); @@ -521,7 +582,7 @@ static int test_no_kmem_bypass(const char *root) min_free_kb_low = sys_info.totalram / 500000; values->target_alloc_bytes = (sys_info.totalram - min_free_kb_high * 1000) + sys_info.totalram * 5 / 100; - stored_pages_threshold = sys_info.totalram / 5 / 4096; + stored_pages_threshold = sys_info.totalram / 5 / page_size; trigger_allocation_size = sys_info.totalram / 20; /* Set up test memcg */ @@ -548,7 +609,7 @@ static int test_no_kmem_bypass(const char *root) if (!trigger_allocation) break; - for (int i = 0; i < trigger_allocation_size; i += 4095) + for (int i = 0; i < trigger_allocation_size; i += page_size) trigger_allocation[i] = 'b'; usleep(100000); free(trigger_allocation); @@ -559,8 +620,8 @@ static int test_no_kmem_bypass(const char *root) /* If memory was pushed to zswap, verify it belongs to memcg */ if (stored_pages > stored_pages_threshold) { int zswapped = cg_read_key_long(test_group, "memory.stat", "zswapped "); - int delta = stored_pages * 4096 - zswapped; - int result_ok = delta < stored_pages * 4096 / 4; + int delta = stored_pages * page_size - zswapped; + int result_ok = delta < stored_pages * page_size / 4; ret = result_ok ? 
KSFT_PASS : KSFT_FAIL; break; @@ -614,7 +675,7 @@ static int allocate_random_and_wait(const char *cgroup, void *arg) close(fd); /* Touch all pages to ensure they're faulted in */ - for (size_t i = 0; i < size; i += PAGE_SIZE) + for (size_t i = 0; i < size; i += page_size) mem[i] = mem[i]; /* Use MADV_PAGEOUT to push pages into zswap */ @@ -725,9 +786,18 @@ struct zswap_test { }; #undef T -static bool zswap_configured(void) +static void check_zswap_enabled(void) { - return access("/sys/module/zswap", F_OK) == 0; + char value[2]; + + if (access(PATH_ZSWAP, F_OK)) + ksft_exit_skip("zswap isn't configured\n"); + + if (read_text(PATH_ZSWAP_ENABLED, value, sizeof(value)) <= 0) + ksft_exit_fail_msg("Failed to read " PATH_ZSWAP_ENABLED "\n"); + + if (value[0] == 'N') + ksft_exit_skip("zswap is disabled (hint: echo 1 > " PATH_ZSWAP_ENABLED ")\n"); } int main(int argc, char **argv) @@ -735,13 +805,16 @@ int main(int argc, char **argv) char root[PATH_MAX]; int i; + page_size = sysconf(_SC_PAGE_SIZE); + if (page_size <= 0) + page_size = BUF_SIZE; + ksft_print_header(); ksft_set_plan(ARRAY_SIZE(tests)); if (cg_find_unified_root(root, sizeof(root), NULL)) ksft_exit_skip("cgroup v2 isn't mounted\n"); - if (!zswap_configured()) - ksft_exit_skip("zswap isn't configured\n"); + check_zswap_enabled(); /* * Check that memory controller is available: diff --git a/tools/testing/selftests/damon/_damon_sysfs.py b/tools/testing/selftests/damon/_damon_sysfs.py index 2b4df655d9fd0..8b12cc0484405 100644 --- a/tools/testing/selftests/damon/_damon_sysfs.py +++ b/tools/testing/selftests/damon/_damon_sysfs.py @@ -132,14 +132,17 @@ class DamosQuota: goals = None # quota goals goal_tuner = None # quota goal tuner reset_interval_ms = None # quota reset interval + fail_charge_num = None + fail_charge_denom = None weight_sz_permil = None weight_nr_accesses_permil = None weight_age_permil = None scheme = None # owner scheme def __init__(self, sz=0, ms=0, goals=None, goal_tuner='consist', - 
reset_interval_ms=0, weight_sz_permil=0, - weight_nr_accesses_permil=0, weight_age_permil=0): + reset_interval_ms=0, fail_charge_num=0, fail_charge_denom=0, + weight_sz_permil=0, weight_nr_accesses_permil=0, + weight_age_permil=0): self.sz = sz self.ms = ms self.reset_interval_ms = reset_interval_ms @@ -151,6 +154,8 @@ class DamosQuota: for idx, goal in enumerate(self.goals): goal.idx = idx goal.quota = self + self.fail_charge_num = fail_charge_num + self.fail_charge_denom = fail_charge_denom def sysfs_dir(self): return os.path.join(self.scheme.sysfs_dir(), 'quotas') @@ -197,6 +202,18 @@ class DamosQuota: os.path.join(self.sysfs_dir(), 'goal_tuner'), self.goal_tuner) if err is not None: return err + + err = write_file( + os.path.join(self.sysfs_dir(), 'fail_charge_num'), + self.fail_charge_num) + if err is not None: + return err + err = write_file( + os.path.join(self.sysfs_dir(), 'fail_charge_denom'), + self.fail_charge_denom) + if err is not None: + return err + return None class DamosWatermarks: @@ -604,10 +621,11 @@ class DamonCtx: targets = None schemes = None kdamond = None + pause = None idx = None def __init__(self, ops='paddr', monitoring_attrs=DamonAttrs(), targets=[], - schemes=[]): + schemes=[], pause=False): self.ops = ops self.monitoring_attrs = monitoring_attrs self.monitoring_attrs.context = self @@ -622,6 +640,8 @@ class DamonCtx: scheme.idx = idx scheme.context = self + self.pause=pause + def sysfs_dir(self): return os.path.join(self.kdamond.sysfs_dir(), 'contexts', '%d' % self.idx) @@ -662,6 +682,11 @@ class DamonCtx: err = scheme.stage() if err is not None: return err + + err = write_file(os.path.join(self.sysfs_dir(), 'pause'), self.pause) + if err is not None: + return err + return None class Kdamond: diff --git a/tools/testing/selftests/damon/drgn_dump_damon_status.py b/tools/testing/selftests/damon/drgn_dump_damon_status.py index af99b07a4f565..972948e6215f1 100755 --- a/tools/testing/selftests/damon/drgn_dump_damon_status.py +++ 
b/tools/testing/selftests/damon/drgn_dump_damon_status.py
@@ -112,6 +112,8 @@ def damos_quota_to_dict(quota):
         ['goals', damos_quota_goals_to_list],
         ['goal_tuner', int],
         ['esz', int],
+        ['fail_charge_num', int],
+        ['fail_charge_denom', int],
         ['weight_sz', int],
         ['weight_nr_accesses', int],
         ['weight_age', int],
@@ -200,6 +202,7 @@ def damon_ctx_to_dict(ctx):
         ['attrs', attrs_to_dict],
         ['adaptive_targets', targets_to_list],
         ['schemes', schemes_to_list],
+        ['pause', bool],
         ])

 def main():
diff --git a/tools/testing/selftests/damon/sysfs.py b/tools/testing/selftests/damon/sysfs.py
index 3aa5c91548a53..aa03a1187489f 100755
--- a/tools/testing/selftests/damon/sysfs.py
+++ b/tools/testing/selftests/damon/sysfs.py
@@ -24,9 +24,12 @@ def dump_damon_status_dict(pid):
     except Exception as e:
         return None, 'json.load fail (%s)' % e

+kdamonds = None
 def fail(expectation, status):
     print('unexpected %s' % expectation)
     print(json.dumps(status, indent=4))
+    if kdamonds is not None:
+        kdamonds.stop()
     exit(1)

 def assert_true(condition, expectation, status):
@@ -73,6 +76,10 @@ def assert_quota_committed(quota, dump):
             }
     assert_true(dump['goal_tuner'] == tuner_val[quota.goal_tuner],
                 'goal_tuner', dump)
+    assert_true(dump['fail_charge_num'] == quota.fail_charge_num,
+                'fail_charge_num', dump)
+    assert_true(dump['fail_charge_denom'] == quota.fail_charge_denom,
+                'fail_charge_denom', dump)
     assert_true(dump['weight_sz'] == quota.weight_sz_permil, 'weight_sz', dump)
     assert_true(dump['weight_nr_accesses'] == quota.weight_nr_accesses_permil,
                 'weight_nr_accesses', dump)
@@ -123,11 +130,12 @@ def assert_scheme_committed(scheme, dump):
             'pageout': 2,
             'hugepage': 3,
             'nohugeapge': 4,
-            'lru_prio': 5,
-            'lru_deprio': 6,
-            'migrate_hot': 7,
-            'migrate_cold': 8,
-            'stat': 9,
+            'collapse': 5,
+            'lru_prio': 6,
+            'lru_deprio': 7,
+            'migrate_hot': 8,
+            'migrate_cold': 9,
+            'stat': 10,
             }
     assert_true(dump['action'] == action_val[scheme.action], 'action', dump)
     assert_true(dump['apply_interval_us'] == scheme.
apply_interval_us,
@@ -190,21 +198,60 @@ def assert_ctx_committed(ctx, dump):
     assert_monitoring_attrs_committed(ctx.monitoring_attrs, dump['attrs'])
     assert_monitoring_targets_committed(ctx.targets, dump['adaptive_targets'])
     assert_schemes_committed(ctx.schemes, dump['schemes'])
+    assert_true(dump['pause'] == ctx.pause, 'pause', dump)

 def assert_ctxs_committed(kdamonds):
+    ctxs_paused_for_dump = []
+    kdamonds_paused_for_dump = []
+    # pause for safe state dumping
+    for kd in kdamonds.kdamonds:
+        for ctx in kd.contexts:
+            if ctx.pause is False:
+                ctx.pause = True
+                ctxs_paused_for_dump.append(ctx)
+                if not kd in kdamonds_paused_for_dump:
+                    kdamonds_paused_for_dump.append(kd)
+        if kd in kdamonds_paused_for_dump:
+            err = kd.commit()
+            if err is not None:
+                print('pause fail (%s)' % err)
+                kdamonds.stop()
+                exit(1)
+
     status, err = dump_damon_status_dict(kdamonds.kdamonds[0].pid)
     if err is not None:
         print(err)
         kdamonds.stop()
         exit(1)

+    # resume contexts paused for safe state dumping
+    for ctx in ctxs_paused_for_dump:
+        ctx.pause = False
+    for kd in kdamonds_paused_for_dump:
+        err = kd.commit()
+        if err is not None:
+            print('resume fail (%s)' % err)
+            kdamonds.stop()
+            exit(1)
+
+    # restore for comparison
+    for ctx in ctxs_paused_for_dump:
+        ctx.pause = True
+
     ctxs = kdamonds.kdamonds[0].contexts
     dump = status['contexts']
     assert_true(len(ctxs) == len(dump), 'ctxs length', dump)
     for idx, ctx in enumerate(ctxs):
         assert_ctx_committed(ctx, dump[idx])

+    # restore for the caller
+    for kd in kdamonds.kdamonds:
+        for ctx in kd.contexts:
+            if ctx in ctxs_paused_for_dump:
+                ctx.pause = False
+
 def main():
+    global kdamonds
     kdamonds = _damon_sysfs.Kdamonds(
             [_damon_sysfs.Kdamond(
                 contexts=[_damon_sysfs.DamonCtx(
@@ -239,6 +286,8 @@ def main():
                                 nid=1)],
                             goal_tuner='temporal',
                             reset_interval_ms=1500,
+                            fail_charge_num=1,
+                            fail_charge_denom=4096,
                             weight_sz_permil=20,
                             weight_nr_accesses_permil=200,
                             weight_age_permil=1000),
@@ -301,6 +350,7 @@ def main():
         print('kdamond start failed: %s'
% err)
         exit(1)
     kdamonds.kdamonds[0].contexts[0].targets[1].obsolete = True
+    kdamonds.kdamonds[0].contexts[0].pause = True
     kdamonds.kdamonds[0].commit()
     del kdamonds.kdamonds[0].contexts[0].targets[1]
     assert_ctxs_committed(kdamonds)
diff --git a/tools/testing/selftests/damon/sysfs.sh b/tools/testing/selftests/damon/sysfs.sh
index 83e3b7f63d81c..78f4badb5bebb 100755
--- a/tools/testing/selftests/damon/sysfs.sh
+++ b/tools/testing/selftests/damon/sysfs.sh
@@ -282,6 +282,17 @@ test_targets()
 	ensure_dir "$targets_dir/1" "not_exist"
 }

+
+test_intervals_goal()
+{
+	goal_dir=$1
+	ensure_dir "$goal_dir" "exist"
+	ensure_file "$goal_dir/access_bp" "exist" "600"
+	ensure_file "$goal_dir/aggrs" "exist" "600"
+	ensure_file "$goal_dir/min_sample_us" "exist" "600"
+	ensure_file "$goal_dir/max_sample_us" "exist" "600"
+}
+
 test_intervals()
 {
 	intervals_dir=$1
@@ -289,6 +300,54 @@ test_intervals()
 	ensure_file "$intervals_dir/aggr_us" "exist" "600"
 	ensure_file "$intervals_dir/sample_us" "exist" "600"
 	ensure_file "$intervals_dir/update_us" "exist" "600"
+	test_intervals_goal "$intervals_dir/intervals_goal"
+}
+
+test_damon_filter()
+{
+	damon_filter_dir=$1
+	ensure_file "$damon_filter_dir/type" "exist" "600"
+	ensure_write_succ "$damon_filter_dir/type" "anon" "valid input"
+	ensure_write_fail "$damon_filter_dir/type" "foo" "invalid input"
+	ensure_file "$damon_filter_dir/matching" "exist" "600"
+	ensure_file "$damon_filter_dir/allow" "exist" "600"
+}
+
+test_damon_filters()
+{
+	filters_dir=$1
+	ensure_dir "$filters_dir" "exist"
+	ensure_file "$filters_dir/nr_filters" "exist" "600"
+	ensure_write_succ "$filters_dir/nr_filters" "1" "valid input"
+	test_damon_filter "$filters_dir/0"
+
+	ensure_write_succ "$filters_dir/nr_filters" "2" "valid input"
+	test_damon_filter "$filters_dir/0"
+	test_damon_filter "$filters_dir/1"
+
+	ensure_write_succ "$filters_dir/nr_filters" "0" "valid input"
+	ensure_dir "$filters_dir/0" "not_exist"
+	ensure_dir "$filters_dir/1" "not_exist"
+}
+
+test_probe()
+{
+	probe_dir=$1
+	ensure_dir "$probe_dir" "exist"
+	test_damon_filters "$probe_dir/filters"
+}
+
+test_probes()
+{
+	probes_dir=$1
+	ensure_dir "$probes_dir" "exist"
+	ensure_file "$probes_dir/nr_probes" "exist" "600"
+
+	ensure_write_succ "$probes_dir/nr_probes" "1" "valid input"
+	test_probe "$probes_dir/0"
+
+	ensure_write_succ "$probes_dir/nr_probes" "0" "valid input"
+	ensure_dir "$probes_dir/0" "not_exist"
 }

 test_monitoring_attrs()
@@ -296,6 +355,7 @@ test_monitoring_attrs()
 	monitoring_attrs_dir=$1
 	ensure_dir "$monitoring_attrs_dir" "exist"
 	test_intervals "$monitoring_attrs_dir/intervals"
+	test_probes "$monitoring_attrs_dir/probes"
 	test_range "$monitoring_attrs_dir/nr_regions"
 }

@@ -305,6 +365,8 @@ test_context()
 	ensure_dir "$context_dir" "exist"
 	ensure_file "$context_dir/avail_operations" "exit" 400
 	ensure_file "$context_dir/operations" "exist" 600
+	ensure_file "$context_dir/addr_unit" "exist" 600
+	ensure_file "$context_dir/pause" "exist" 600
 	test_monitoring_attrs "$context_dir/monitoring_attrs"
 	test_targets "$context_dir/targets"
 	test_schemes "$context_dir/schemes"
diff --git a/tools/testing/selftests/memfd/fuse_test.c b/tools/testing/selftests/memfd/fuse_test.c
index dbc171a3806db..510056c1b0d07 100644
--- a/tools/testing/selftests/memfd/fuse_test.c
+++ b/tools/testing/selftests/memfd/fuse_test.c
@@ -162,7 +162,7 @@ static void *global_p = NULL;

 static int sealing_thread_fn(void *arg)
 {
-	int sig, r;
+	int r;

 	/*
 	 * This thread first waits 200ms so any pending operation in the parent
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 2ca07ea7202a5..cdab3a8376244 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -688,9 +688,9 @@ static void mfd_assert_grow_write(int fd)
 	if (hugetlbfs_test)
 		return;

-	buf = malloc(mfd_def_size * 8);
+	buf = calloc(1, mfd_def_size * 8);
 	if (!buf) {
-		printf("malloc(%zu) failed: %m\n", mfd_def_size * 8);
+
printf("calloc(1, %zu) failed: %m\n", mfd_def_size * 8);
 		abort();
 	}

diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index cd24596cdd27e..41053fdaad88d 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -151,6 +151,7 @@ TEST_PROGS += ksft_gup_test.sh
 TEST_PROGS += ksft_hmm.sh
 TEST_PROGS += ksft_hugetlb.sh
 TEST_PROGS += ksft_hugevm.sh
+TEST_PROGS += ksft_kmemleak_dedup.sh
 TEST_PROGS += ksft_ksm.sh
 TEST_PROGS += ksft_ksm_numa.sh
 TEST_PROGS += ksft_madv_guard.sh
@@ -216,7 +217,8 @@ ifeq ($(CAN_BUILD_I386),1)
 $(BINARIES_32): CFLAGS += -m32 -mxsave
 $(BINARIES_32): LDLIBS += -lrt -ldl -lm
 $(BINARIES_32): $(OUTPUT)/%_32: %.c
-	$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+	$(call msg,CC,,$@)
+	$(Q)$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
 $(foreach t,$(VMTARGETS),$(eval $(call gen-target-rule-32,$(t))))
 endif

@@ -224,7 +226,8 @@ ifeq ($(CAN_BUILD_X86_64),1)
 $(BINARIES_64): CFLAGS += -m64 -mxsave
 $(BINARIES_64): LDLIBS += -lrt -ldl
 $(BINARIES_64): $(OUTPUT)/%_64: %.c
-	$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+	$(call msg,CC,,$@)
+	$(Q)$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
 $(foreach t,$(VMTARGETS),$(eval $(call gen-target-rule-64,$(t))))
 endif

@@ -261,7 +264,8 @@ $(OUTPUT)/migration: LDLIBS += -lnuma
 $(OUTPUT)/rmap: LDLIBS += -lnuma

 local_config.mk local_config.h: check_config.sh
-	CC="$(CC)" CFLAGS="$(CFLAGS)" ./check_config.sh
+	$(call msg,CHK,config,$@)
+	$(Q)CC="$(CC)" CFLAGS="$(CFLAGS)" ./check_config.sh

 EXTRA_CLEAN += local_config.mk local_config.h

diff --git a/tools/testing/selftests/mm/check_config.sh b/tools/testing/selftests/mm/check_config.sh
index b84c82bbf8752..32beaefe279e5 100755
--- a/tools/testing/selftests/mm/check_config.sh
+++ b/tools/testing/selftests/mm/check_config.sh
@@ -16,7 +16,7 @@ echo "#include <sys/types.h>" > $tmpfile_c
 echo "#include <liburing.h>" >> $tmpfile_c
 echo "int func(void) {
return 0; }" >> $tmpfile_c

-$CC $CFLAGS -c $tmpfile_c -o $tmpfile_o
+$CC $CFLAGS -c $tmpfile_c -o $tmpfile_o >/dev/null 2>&1

 if [ -f $tmpfile_o ]; then
     echo "#define LOCAL_CONFIG_HAVE_LIBURING 1" > $OUTPUT_H_FILE
diff --git a/tools/testing/selftests/mm/droppable.c b/tools/testing/selftests/mm/droppable.c
index 44940f75c461d..30c8be37fcb9d 100644
--- a/tools/testing/selftests/mm/droppable.c
+++ b/tools/testing/selftests/mm/droppable.c
@@ -26,7 +26,14 @@ int main(int argc, char *argv[])
 	ksft_set_plan(1);

 	alloc = mmap(0, alloc_size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_DROPPABLE, -1, 0);
-	assert(alloc != MAP_FAILED);
+	if (alloc == MAP_FAILED) {
+		if ((errno == EOPNOTSUPP) || (errno == EINVAL)) {
+			ksft_test_result_skip("MAP_DROPPABLE not supported\n");
+			exit(KSFT_SKIP);
+		}
+		ksft_test_result_fail("mmap error: %s\n", strerror(errno));
+		exit(KSFT_FAIL);
+	}
 	memset(alloc, 'A', alloc_size);
 	for (size_t i = 0; i < alloc_size; i += page_size)
 		assert(*(uint8_t *)(alloc + i));
diff --git a/tools/testing/selftests/mm/hmm-tests.c b/tools/testing/selftests/mm/hmm-tests.c
index 77fb4c5d871bb..6a23c09ac2da5 100644
--- a/tools/testing/selftests/mm/hmm-tests.c
+++ b/tools/testing/selftests/mm/hmm-tests.c
@@ -2274,8 +2274,11 @@ TEST_F(hmm, migrate_anon_huge_fault)
 	unsigned long npages;
 	unsigned long size;
 	unsigned long i;
+	unsigned char *m;
+	uint64_t entry;
 	void *old_ptr;
 	void *map;
+	int pagemap_fd;
 	int *ptr;
 	int ret;

@@ -2298,8 +2301,6 @@ TEST_F(hmm, migrate_anon_huge_fault)

 	npages = size >> self->page_shift;
 	map = (void *)ALIGN((uintptr_t)buffer->ptr, size);
-	ret = madvise(map, size, MADV_HUGEPAGE);
-	ASSERT_EQ(ret, 0);
 	old_ptr = buffer->ptr;
 	buffer->ptr = map;

@@ -2307,6 +2308,9 @@ TEST_F(hmm, migrate_anon_huge_fault)
 	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
 		ptr[i] = i;

+	ret = madvise(map, size, MADV_COLLAPSE);
+	ASSERT_EQ(ret, 0);
+
 	/* Migrate memory to device.
*/
 	ret = hmm_migrate_sys_to_dev(self->fd, buffer, npages);
 	ASSERT_EQ(ret, 0);
@@ -2316,6 +2320,32 @@ TEST_F(hmm, migrate_anon_huge_fault)
 	for (i = 0, ptr = buffer->mirror; i < size / sizeof(*ptr); ++i)
 		ASSERT_EQ(ptr[i], i);

+	if (!hmm_is_coherent_type(variant->device_number)) {
+		ret = hmm_dmirror_cmd(self->fd, HMM_DMIRROR_SNAPSHOT,
+				      buffer, npages);
+		ASSERT_EQ(ret, 0);
+		ASSERT_EQ(buffer->cpages, npages);
+
+		m = buffer->mirror;
+		for (i = 0; i < npages; ++i)
+			ASSERT_EQ(m[i], HMM_DMIRROR_PROT_DEV_PRIVATE_LOCAL |
+					HMM_DMIRROR_PROT_WRITE |
+					HMM_DMIRROR_PROT_PMD);
+
+		pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+		ASSERT_GE(pagemap_fd, 0);
+
+		for (i = 0; i < npages; ++i) {
+			entry = pagemap_get_entry(pagemap_fd,
+				(char *)buffer->ptr + i * self->page_size);
+
+			ASSERT_NE(entry & PM_SWAP, 0);
+			ASSERT_FALSE(PAGEMAP_PRESENT(entry));
+		}
+
+		close(pagemap_fd);
+	}
+
 	/* Fault pages back to system memory and check them. */
 	for (i = 0, ptr = buffer->ptr; i < size / sizeof(*ptr); ++i)
 		ASSERT_EQ(ptr[i], i);
@@ -2738,7 +2768,7 @@ static inline int run_migration_benchmark(int fd, int use_thp, size_t buffer_siz
 	buffer->ptr = mmap(NULL, buffer_size, PROT_READ | PROT_WRITE,
 			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

-	if (!buffer->ptr)
+	if (buffer->ptr == MAP_FAILED)
 		return -1;

 	/* Apply THP hint if requested */
diff --git a/tools/testing/selftests/mm/khugepaged.c b/tools/testing/selftests/mm/khugepaged.c
index 3fe7ef04ac62e..c8393ca52cab7 100644
--- a/tools/testing/selftests/mm/khugepaged.c
+++ b/tools/testing/selftests/mm/khugepaged.c
@@ -373,7 +373,7 @@ static void *file_setup_area(int nr_hpages)
 	unlink(finfo.path); /* Cleanup from previous failed tests */
 	printf("Creating %s for collapse%s...", finfo.path,
 	       finfo.type == VMA_SHMEM ? " (tmpfs)" : "");
-	fd = open(finfo.path, O_DSYNC | O_CREAT | O_RDWR | O_TRUNC | O_EXCL,
+	fd = open(finfo.path, O_CREAT | O_RDWR | O_TRUNC | O_EXCL,
 		  777);
 	if (fd < 0) {
 		perror("open()");
@@ -381,9 +381,21 @@ static void *file_setup_area(int nr_hpages)
 	}

 	size = nr_hpages * hpage_pmd_size;
-	p = alloc_mapping(nr_hpages);
+	if (ftruncate(fd, size)) {
+		perror("ftruncate()");
+		exit(EXIT_FAILURE);
+	}
+	p = mmap(BASE_ADDR, size, PROT_READ | PROT_WRITE,
+			MAP_SHARED, fd, 0);
+	if (p != BASE_ADDR) {
+		perror("mmap()");
+		exit(EXIT_FAILURE);
+	}
 	fill_memory(p, 0, size);
-	write(fd, p, size);
+	if (msync(p, size, MS_SYNC)) {
+		perror("msync()");
+		exit(EXIT_FAILURE);
+	}
 	close(fd);
 	munmap(p, size);
 	success("OK");
diff --git a/tools/testing/selftests/mm/ksft_kmemleak_dedup.sh b/tools/testing/selftests/mm/ksft_kmemleak_dedup.sh
new file mode 100755
index 0000000000000..d019502444901
--- /dev/null
+++ b/tools/testing/selftests/mm/ksft_kmemleak_dedup.sh
@@ -0,0 +1,222 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Regression test for kmemleak's per-scan verbose dedup.
+#
+# Loads samples/kmemleak's helper module to generate orphan allocations
+# (some of which share an allocation backtrace), runs a few kmemleak
+# scans with verbose printing enabled, and verifies that no two
+# "unreferenced object" reports within a single scan share the same
+# backtrace - which would mean dedup failed to collapse them.
+#
+# This test is intentionally permissive: the kmemleak-test module's
+# leaks frequently get reported across many separate scans (per-CPU
+# chunk reuse, slab freelist pointers, kernel stack residue), so dedup
+# may never have anything to fold within one scan. That is not a
+# regression. The test only fails when it actually catches dedup not
+# happening on input that should have triggered it - i.e. two reports
+# with identical backtraces in the same scan.
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+ksft_skip=4
+KMEMLEAK=/sys/kernel/debug/kmemleak
+VERBOSE_PARAM=/sys/module/kmemleak/parameters/verbose
+MODULE=kmemleak-test
+
+skip() {
+	echo "SKIP: $*"
+	exit $ksft_skip
+}
+
+fail() {
+	echo "FAIL: $*"
+	exit 1
+}
+
+pass() {
+	echo "PASS: $*"
+	exit 0
+}
+
+[ "$(id -u)" -eq 0 ] || skip "must run as root"
+[ -r "$KMEMLEAK" ] || skip "no kmemleak debugfs (CONFIG_DEBUG_KMEMLEAK)"
+[ -w "$VERBOSE_PARAM" ] || skip "kmemleak verbose param missing"
+modinfo "$MODULE" >/dev/null 2>&1 ||
+	skip "$MODULE not built (CONFIG_SAMPLE_KMEMLEAK)"
+
+# The verdict depends entirely on dmesg contents, so a silently-empty
+# dmesg (dmesg_restrict=1 with CAP_SYSLOG dropped, restricted container,
+# etc.) would let the script report PASS without parsing anything. Probe
+# both read and clear up front and skip cleanly if either is denied.
+dmesg >/dev/null 2>&1 ||
+	skip "cannot read dmesg (need CAP_SYSLOG or dmesg_restrict=0)"
+dmesg -C >/dev/null 2>&1 ||
+	skip "cannot clear dmesg (need CAP_SYSLOG or dmesg_restrict=0)"
+
+# kmemleak can be present but disabled at runtime (boot arg kmemleak=off,
+# or it self-disabled after an internal error). In that state writes other
+# than "clear" return EPERM, so probe once and skip if so.
+if ! echo scan > "$KMEMLEAK" 2>/dev/null; then
+	skip "kmemleak is disabled (check dmesg or kmemleak= boot arg)"
+fi
+
+prev_verbose=$(cat "$VERBOSE_PARAM")
+# shellcheck disable=SC2317 # invoked indirectly via trap
+cleanup() {
+	echo "$prev_verbose" > "$VERBOSE_PARAM" 2>/dev/null
+	rmmod "$MODULE" 2>/dev/null
+	# Drain the leak set we generated. Subsequent selftests (e.g.
+	# tools/testing/selftests/net/netfilter/nft_interface_stress.sh)
+	# fail on any non-empty kmemleak report, so leaving the helper
+	# module's intentional leaks behind would poison the rest of a
+	# kselftest run.
+	#
+	# Caveat: kmemleak_clear() only greys objects that have already
+	# been reported (OBJECT_REPORTED && unreferenced_object()). Helper
+	# allocations that stayed "still referenced" throughout the test
+	# (stale pointers in per-CPU chunks, slab freelists, kernel stacks)
+	# were never reported and are therefore not greyed by this clear -
+	# they remain tracked and a later scan can still surface them. Such
+	# leftovers are inherent to the kmemleak-test sample module and are
+	# not specific to this test; consumers that fail on any kmemleak
+	# output (rather than on the test-specific backtraces) need to be
+	# robust to that, or this test should be excluded from the run.
+	echo clear > "$KMEMLEAK" 2>/dev/null
+}
+trap cleanup EXIT
+
+echo 1 > "$VERBOSE_PARAM"
+
+# Drain the existing leak set so the next scan only reports our objects.
+echo clear > "$KMEMLEAK"
+
+# Re-clear dmesg now (the up-front probe also cleared it, but anything
+# logged between then and here - module unload chatter, the probe scan,
+# the verbose-param write - would otherwise pollute the parse window).
+dmesg -C >/dev/null
+
+# If the module was left loaded by a previous aborted run, modprobe would
+# be a no-op and the init function would not run, so no new leaks would be
+# generated. Force a clean state first.
+rmmod "$MODULE" 2>/dev/null
+modprobe "$MODULE" || skip "failed to load $MODULE"
+# Removing the module orphans the list elements without freeing them.
+rmmod "$MODULE" || skip "failed to unload $MODULE"
+
+# Run a handful of scans so kmemleak has the chance to age and report
+# the orphans. We do not require any particular number to be reported:
+# the regression check below operates on whatever lands in dmesg.
+#
+# Note: with CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y the kernel's own scan
+# thread can report and mark these orphans (OBJECT_REPORTED) before our
+# manual scans run, after which our scans will see nothing.
The
+# lower-bound check below catches the case where that happens and the
+# manual scans also produce nothing.
+SCAN_COUNT=4
+SCAN_SLEEP=6
+for _ in $(seq 1 "$SCAN_COUNT"); do
+	echo scan > "$KMEMLEAK"
+	sleep "$SCAN_SLEEP"
+done
+
+# Strip the leading "[ nnn.nnnnnn] " dmesg timestamp prefix. Without
+# this, two identical stack frames printed from two reports in the same
+# scan would produce different per-frame strings (different timestamps)
+# and the duplicate-backtrace check below would not match them, silently
+# passing a real dedup regression. Doing the strip here makes the rest
+# of the parser timestamp-agnostic regardless of what dmesg defaults to.
+log=$(dmesg | sed 's/^\[[^]]*\] //')
+
+# After running the workload (modprobe + scans), dmesg should contain at
+# least the helper module's pr_info lines and our manual-scan output. An
+# empty capture here means dmesg succeeded earlier but is now denying us
+# the buffer (race with dmesg_restrict toggling, etc.); refuse to give a
+# verdict on no evidence.
+[ -n "$log" ] || skip "dmesg returned empty after running workload"
+
+# Lower bound: if kmemleak's own per-scan tally counted leaks but the
+# verbose path emitted no "unreferenced object" line, the verbose printer
+# itself is regressed - fail rather than silently passing on no input.
+new_leaks=$(echo "$log" |
+	sed -n 's/.*kmemleak: \([0-9]\+\) new suspected.*/\1/p' |
+	awk '{s+=$1} END{print s+0}')
+printed=$(echo "$log" | grep -c 'kmemleak: unreferenced object')
+if [ "$new_leaks" -gt 0 ] && [ "$printed" -eq 0 ]; then
+	fail "verbose path broken: $new_leaks leaks counted, 0 printed in $SCAN_COUNT scans"
+fi
+
+# Walk the log: split into per-scan chunks at "N new suspected memory
+# leaks" boundaries; within each chunk, capture each "unreferenced
+# object" report's backtrace and check that no backtrace is reported
+# more than once. A duplicate within a single scan means dedup failed
+# to collapse two leaks that share an allocation site.
+violations=$(echo "$log" | awk '
+	function flush_block() {
+		if (in_block) {
+			# Skip empty backtraces: leaks with trace_handle == 0
+			# (early-boot allocations or stack_depot_save() failures
+			# under memory pressure) are intentionally not deduped,
+			# so multiple such reports in one scan are expected and
+			# must not be flagged as a regression.
+			if (bt != "")
+				seen[bt]++
+			in_block = 0
+			collecting = 0
+			bt = ""
+		}
+	}
+	function check_and_reset( b) {
+		for (b in seen)
+			if (seen[b] > 1)
+				printf("backtrace seen %d times in one scan:\n%s\n",
+				       seen[b], b)
+		delete seen
+	}
+	# Scan boundary: the per-scan summary line.
+	/kmemleak: [0-9]+ new suspected memory leaks/ {
+		flush_block()
+		check_and_reset()
+		next
+	}
+	# Start of a new "unreferenced object" report.
+	/kmemleak: unreferenced object/ {
+		flush_block()
+		in_block = 1
+		next
+	}
+	# Inside a report, the "backtrace (crc ...):" line switches us to
+	# backtrace-collecting mode.
+	in_block && /kmemleak:[[:space:]]+backtrace \(crc/ {
+		collecting = 1
+		next
+	}
+	# Once collecting, capture only deeply-indented "kmemleak: " lines
+	# (stack frames have 4+ spaces of indentation under "kmemleak: ";
+	# headers and the "... and N more" tail line have less). This stops
+	# unrelated kmemleak warns landing between reports from being lumped
+	# into the backtrace key, which would mask a genuine duplicate.
+	in_block && collecting && /kmemleak:[[:space:]]{4,}/ {
+		bt = bt $0 "\n"
+		next
+	}
+	END {
+		flush_block()
+		check_and_reset()
+	}
+')
+
+if [ -n "$violations" ]; then
+	echo "$violations"
+	fail "kmemleak dedup regression: same backtrace reported more than once in a single scan"
+fi
+
+# Count the dedup summary lines so the report distinguishes "dedup
+# actually fired" from "no same-backtrace leaks turned up to dedup".
+dedup_lines=$(echo "$log" | grep -c 'more object(s) with the same backtrace')
+
+if [ "$dedup_lines" -gt 0 ]; then
+	pass "no dedup violations across $SCAN_COUNT scans; dedup fired ($dedup_lines summary line(s) observed)"
+else
+	pass "no dedup violations across $SCAN_COUNT scans; dedup had nothing to collapse"
+fi
diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index 8d874c4754f38..31c06c72203fd 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -498,6 +498,7 @@ static void test_prctl_fork(void)
 static int start_ksmd_and_set_frequency(char *pages_to_scan, char *sleep_ms)
 {
 	int ksm_fd;
+	size_t len;

 	ksm_fd = open("/sys/kernel/mm/ksm/run", O_RDWR);
 	if (ksm_fd < 0)
@@ -506,11 +507,13 @@ static int start_ksmd_and_set_frequency(char *pages_to_scan, char *sleep_ms)
 	if (write(ksm_fd, "1", 1) != 1)
 		return -errno;

-	if (write(pages_to_scan_fd, pages_to_scan, strlen(pages_to_scan)) <= 0)
-		return -errno;
+	len = strlen(pages_to_scan);
+	if (write(pages_to_scan_fd, pages_to_scan, len) != len)
+		return -1;

-	if (write(sleep_millisecs_fd, sleep_ms, strlen(sleep_ms)) <= 0)
-		return -errno;
+	len = strlen(sleep_ms);
+	if (write(sleep_millisecs_fd, sleep_ms, len) != len)
+		return -1;

 	return 0;
 }
@@ -526,11 +529,11 @@ static int stop_ksmd_and_restore_frequency(void)
 	if (write(ksm_fd, "2", 1) != 1)
 		return -errno;

-	if (write(pages_to_scan_fd, "100", 3) <= 0)
-		return -errno;
+	if (write(pages_to_scan_fd, "100", 3) != 3)
+		return -1;

-	if (write(sleep_millisecs_fd, "20", 2) <= 0)
-		return -errno;
+	if (write(sleep_millisecs_fd, "20", 2) != 2)
+		return -1;

 	return 0;
 }
diff --git a/tools/testing/selftests/mm/mlock2-tests.c b/tools/testing/selftests/mm/mlock2-tests.c
index b474f2b20def2..e16e288cc7c1f 100644
--- a/tools/testing/selftests/mm/mlock2-tests.c
+++ b/tools/testing/selftests/mm/mlock2-tests.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier:
GPL-2.0
 #define _GNU_SOURCE
 #include <sys/mman.h>
+#include <linux/mman.h>
 #include <stdint.h>
 #include <unistd.h>
 #include <string.h>
@@ -163,14 +164,17 @@ static int lock_check(unsigned long addr)
 	return (vma_rss == vma_size);
 }

-static int unlock_lock_check(char *map)
+static int unlock_lock_check(char *map, bool mlock_supported)
 {
-	if (is_vmflag_set((unsigned long)map, LOCKED)) {
+	if (!is_vmflag_set((unsigned long)map, LOCKED))
+		return 0;
+
+	if (mlock_supported)
 		ksft_print_msg("VMA flag %s is present on page 1 after unlock\n", LOCKED);
-		return 1;
-	}
+	else
+		ksft_print_msg("VMA flag %s is present on an unsupported VMA\n", LOCKED);

-	return 0;
+	return 1;
 }

 static void test_mlock_lock(void)
@@ -196,7 +200,7 @@ static void test_mlock_lock(void)
 		ksft_exit_fail_msg("munlock(): %s\n", strerror(errno));
 	}

-	ksft_test_result(!unlock_lock_check(map), "%s: Unlocked\n", __func__);
+	ksft_test_result(!unlock_lock_check(map, true), "%s: Unlocked\n", __func__);
 	munmap(map, 2 * page_size);
 }

@@ -296,7 +300,7 @@ static void test_munlockall0(void)
 		ksft_exit_fail_msg("munlockall(): %s\n", strerror(errno));
 	}

-	ksft_test_result(!unlock_lock_check(map), "%s: No locked memory\n", __func__);
+	ksft_test_result(!unlock_lock_check(map, true), "%s: No locked memory\n", __func__);
 	munmap(map, 2 * page_size);
 }

@@ -336,7 +340,67 @@ static void test_munlockall1(void)
 		ksft_exit_fail_msg("munlockall() %s\n", strerror(errno));
 	}

-	ksft_test_result(!unlock_lock_check(map), "%s: No locked memory\n", __func__);
+	ksft_test_result(!unlock_lock_check(map, true), "%s: No locked memory\n", __func__);
+	munmap(map, 2 * page_size);
+}
+
+/* Droppable memory should not be lockable. */
+static void test_mlock_droppable(void)
+{
+	char *map;
+	unsigned long page_size = getpagesize();
+
+	/* Ensure MCL_FUTURE is not set.
*/
+	if (munlockall()) {
+		ksft_test_result_fail("munlockall() %s\n", strerror(errno));
+		return;
+	}
+
+	map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANONYMOUS | MAP_DROPPABLE, -1, 0);
+	if (map == MAP_FAILED) {
+		if ((errno == EOPNOTSUPP) || (errno == EINVAL))
+			ksft_test_result_skip("%s: MAP_DROPPABLE not supported\n", __func__);
+		else
+			ksft_test_result_fail("mmap error: %s\n", strerror(errno));
+		return;
+	}
+
+	if (mlock2_(map, 2 * page_size, 0))
+		ksft_test_result_fail("mlock2(0): %s\n", strerror(errno));
+	else
+		ksft_test_result(!unlock_lock_check(map, false),
+				"%s: droppable memory not locked\n", __func__);
+
+	munmap(map, 2 * page_size);
+}
+
+static void test_mlockall_future_droppable(void)
+{
+	char *map;
+	unsigned long page_size = getpagesize();
+
+	if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
+		ksft_test_result_fail("mlockall(MCL_CURRENT | MCL_FUTURE): %s\n", strerror(errno));
+		return;
+	}
+
+	map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
+		   MAP_ANONYMOUS | MAP_DROPPABLE, -1, 0);
+
+	if (map == MAP_FAILED) {
+		if ((errno == EOPNOTSUPP) || (errno == EINVAL))
+			ksft_test_result_skip("%s: MAP_DROPPABLE not supported\n", __func__);
+		else
+			ksft_test_result_fail("mmap error: %s\n", strerror(errno));
+		munlockall();
+		return;
+	}
+
+	ksft_test_result(!unlock_lock_check(map, false), "%s: droppable memory not locked\n",
+			 __func__);
+
+	munlockall();
 	munmap(map, 2 * page_size);
 }

@@ -442,7 +506,7 @@ int main(int argc, char **argv)

 	munmap(map, size);

-	ksft_set_plan(13);
+	ksft_set_plan(15);

 	test_mlock_lock();
 	test_mlock_onfault();
@@ -451,6 +515,8 @@ int main(int argc, char **argv)
 	test_lock_onfault_of_present();
 	test_vma_management(true);
 	test_mlockall();
+	test_mlock_droppable();
+	test_mlockall_future_droppable();

 	ksft_finished();
 }
diff --git a/tools/testing/selftests/mm/mremap_test.c b/tools/testing/selftests/mm/mremap_test.c
index 308576437228c..131d9d6db8679 100644
--- a/tools/testing/selftests/mm/mremap_test.c
+++
b/tools/testing/selftests/mm/mremap_test.c
@@ -76,27 +76,6 @@ enum {
 	.expect_failure = should_fail \
 }

-/* compute square root using binary search */
-static unsigned long get_sqrt(unsigned long val)
-{
-	unsigned long low = 1;
-
-	/* assuming rand_size is less than 1TB */
-	unsigned long high = (1UL << 20);
-
-	while (low <= high) {
-		unsigned long mid = low + (high - low) / 2;
-		unsigned long temp = mid * mid;
-
-		if (temp == val)
-			return mid;
-		if (temp < val)
-			low = mid + 1;
-		high = mid - 1;
-	}
-	return low;
-}
-
 /*
  * Returns false if the requested remap region overlaps with an
  * existing mapping (e.g text, stack) else returns true.
@@ -995,11 +974,9 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 			      char *rand_addr)
 {
 	void *addr, *tmp_addr, *src_addr, *dest_addr, *dest_preamble_addr = NULL;
-	unsigned long long t, d;
 	struct timespec t_start = {0, 0}, t_end = {0, 0};
 	long long start_ns, end_ns, align_mask, ret, offset;
 	unsigned long long threshold;
-	unsigned long num_chunks;

 	if (threshold_mb == VALIDATION_NO_THRESHOLD)
 		threshold = c.region_size;
@@ -1068,87 +1045,21 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 		goto clean_up_dest_preamble;
 	}

-	/*
-	 * Verify byte pattern after remapping. Employ an algorithm with a
-	 * square root time complexity in threshold: divide the range into
-	 * chunks, if memcmp() returns non-zero, only then perform an
-	 * iteration in that chunk to find the mismatch index.
-	 */
-	num_chunks = get_sqrt(threshold);
-	for (unsigned long i = 0; i < num_chunks; ++i) {
-		size_t chunk_size = threshold / num_chunks;
-		unsigned long shift = i * chunk_size;
-
-		if (!memcmp(dest_addr + shift, rand_addr + shift, chunk_size))
-			continue;
-
-		/* brute force iteration only over mismatch segment */
-		for (t = shift; t < shift + chunk_size; ++t) {
-			if (((char *) dest_addr)[t] != rand_addr[t]) {
-				ksft_print_msg("Data after remap doesn't match at offset %llu\n",
-						t);
-				ksft_print_msg("Expected: %#x\t Got: %#x\n", rand_addr[t] & 0xff,
-						((char *) dest_addr)[t] & 0xff);
-				ret = -1;
-				goto clean_up_dest;
-			}
-		}
-	}
-
-	/*
-	 * if threshold is not divisible by num_chunks, then check the
-	 * last chunk
-	 */
-	for (t = num_chunks * (threshold / num_chunks); t < threshold; ++t) {
-		if (((char *) dest_addr)[t] != rand_addr[t]) {
-			ksft_print_msg("Data after remap doesn't match at offset %llu\n",
-					t);
-			ksft_print_msg("Expected: %#x\t Got: %#x\n", rand_addr[t] & 0xff,
-					((char *) dest_addr)[t] & 0xff);
-			ret = -1;
-			goto clean_up_dest;
-		}
+	/* Verify byte pattern after remapping */
+	if (memcmp(dest_addr, rand_addr, threshold)) {
+		ksft_print_msg("Data after remap doesn't match\n");
+		ret = -1;
+		goto clean_up_dest;
 	}

 	/* Verify the dest preamble byte pattern after remapping */
-	if (!c.dest_preamble_size)
-		goto no_preamble;
-
-	num_chunks = get_sqrt(c.dest_preamble_size);
-
-	for (unsigned long i = 0; i < num_chunks; ++i) {
-		size_t chunk_size = c.dest_preamble_size / num_chunks;
-		unsigned long shift = i * chunk_size;
-
-		if (!memcmp(dest_preamble_addr + shift, rand_addr + shift,
-			    chunk_size))
-			continue;
-
-		/* brute force iteration only over mismatched segment */
-		for (d = shift; d < shift + chunk_size; ++d) {
-			if (((char *) dest_preamble_addr)[d] != rand_addr[d]) {
-				ksft_print_msg("Preamble data after remap doesn't match at offset %llu\n",
-						d);
-				ksft_print_msg("Expected: %#x\t Got: %#x\n", rand_addr[d] & 0xff,
-						((char *) dest_preamble_addr)[d]
& 0xff); - ret = -1; - goto clean_up_dest; - } - } - } - - for (d = num_chunks * (c.dest_preamble_size / num_chunks); d < c.dest_preamble_size; ++d) { - if (((char *) dest_preamble_addr)[d] != rand_addr[d]) { - ksft_print_msg("Preamble data after remap doesn't match at offset %llu\n", - d); - ksft_print_msg("Expected: %#x\t Got: %#x\n", rand_addr[d] & 0xff, - ((char *) dest_preamble_addr)[d] & 0xff); - ret = -1; - goto clean_up_dest; - } + if (c.dest_preamble_size && + memcmp(dest_preamble_addr, rand_addr, c.dest_preamble_size)) { + ksft_print_msg("Preamble data after remap doesn't match\n"); + ret = -1; + goto clean_up_dest; } -no_preamble: start_ns = t_start.tv_sec * NS_PER_SEC + t_start.tv_nsec; end_ns = t_end.tv_sec * NS_PER_SEC + t_end.tv_nsec; ret = end_ns - start_ns; diff --git a/tools/testing/selftests/mm/process_madv.c b/tools/testing/selftests/mm/process_madv.c index cd4610baf5d7d..3fffd5f7e6fb4 100644 --- a/tools/testing/selftests/mm/process_madv.c +++ b/tools/testing/selftests/mm/process_madv.c @@ -310,6 +310,34 @@ TEST_F(process_madvise, invalid_vlen) } /* + * Test that invalid advice is rejected even when the iovec has zero total + * length. A request with valid advice and zero length is a noop, but + * invalid advice should still fail with EINVAL. + */ +TEST_F(process_madvise, invalid_advice_zero_length) +{ + struct iovec vec = { + .iov_base = NULL, + .iov_len = 0, + }; + int pidfd = self->pidfd; + ssize_t ret; + + errno = 0; + ret = sys_process_madvise(pidfd, &vec, 1, -1, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EINVAL); + + errno = 0; + ret = sys_process_madvise(pidfd, &vec, 1, MADV_DONTNEED, 0); + ASSERT_EQ(ret, 0); + + ret = sys_process_madvise(pidfd, NULL, 0, -1, 0); + ASSERT_EQ(ret, -1); + ASSERT_EQ(errno, EINVAL); +} + +/* * Test process_madvise() with an invalid flag value. Currently, only a flag * value of 0 is supported. This test is reserved for the future, e.g., if * synchronous flags are added. 
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index c17b133a81d24..3b61677fe9840 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -382,6 +382,7 @@ else fi CATEGORY="mmap" run_test ./map_populate +CATEGORY="mmap" run_test ./droppable CATEGORY="mlock" run_test ./mlock-random-test diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c index 500d07c4938b1..40a5093917e74 100644 --- a/tools/testing/selftests/mm/split_huge_page_test.c +++ b/tools/testing/selftests/mm/split_huge_page_test.c @@ -609,9 +609,13 @@ static int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, assert(fd_size % sizeof(buf) == 0); for (i = 0; i < sizeof(buf); i++) buf[i] = (unsigned char)i; - for (i = 0; i < fd_size; i += sizeof(buf)) - write(*fd, buf, sizeof(buf)); - + for (i = 0; i < fd_size; i += sizeof(buf)) { + if (write(*fd, buf, sizeof(buf)) != sizeof(buf)) { + ksft_perror("write testfile"); + close(*fd); + goto err_out_unlink; + } + } close(*fd); sync(); *fd = open("/proc/sys/vm/drop_caches", O_WRONLY); @@ -621,7 +625,7 @@ static int create_pagecache_thp_and_fd(const char *testfile, size_t fd_size, } if (write(*fd, "3", 1) != 1) { ksft_perror("write to drop_caches"); - goto err_out_unlink; + goto err_out_close; } close(*fd); diff --git a/tools/testing/selftests/proc/proc-maps-race.c b/tools/testing/selftests/proc/proc-maps-race.c index a734553718dac..1026d8c400e1b 100644 --- a/tools/testing/selftests/proc/proc-maps-race.c +++ b/tools/testing/selftests/proc/proc-maps-race.c @@ -17,8 +17,8 @@ */ /* * Fork a child that concurrently modifies address space while the main - * process is reading /proc/$PID/maps and verifying the results. Address - * space modifications include: + * process is reading /proc/$PID/maps and /proc/$PID/smaps, verifying the + * results. 
Address space modifications include: * VMA splitting and merging * */ @@ -39,6 +39,13 @@ #include <sys/types.h> #include <sys/wait.h> +#define min(a, b) \ + ({ \ + typeof(a) _a = (a); \ + typeof(b) _b = (b); \ + _a < _b ? _a : _b; \ + }) + /* /proc/pid/maps parsing routines */ struct page_content { char *data; @@ -66,6 +73,11 @@ enum test_state { TEST_DONE, }; +enum maps_file { + MAPS, + SMAPS, +}; + struct vma_modifier_info; FIXTURE(proc_maps_race) @@ -76,7 +88,9 @@ FIXTURE(proc_maps_race) struct line_content last_line; struct line_content first_line; unsigned long duration_sec; + enum maps_file maps_file; int shared_mem_size; + int skip_pages; int page_size; int vma_count; bool verbose; @@ -84,6 +98,19 @@ FIXTURE(proc_maps_race) pid_t pid; }; +FIXTURE_VARIANT(proc_maps_race) +{ + const enum maps_file maps_file; +}; + +FIXTURE_VARIANT_ADD(proc_maps_race, maps) { + .maps_file = MAPS, +}; + +FIXTURE_VARIANT_ADD(proc_maps_race, smaps) { + .maps_file = SMAPS, +}; + typedef bool (*vma_modifier_op)(FIXTURE_DATA(proc_maps_race) *self); typedef bool (*vma_mod_result_check_op)(struct line_content *mod_last_line, struct line_content *mod_first_line, @@ -105,38 +132,102 @@ struct vma_modifier_info { void *child_mapped_addr[]; }; - -static bool read_two_pages(FIXTURE_DATA(proc_maps_race) *self) +static bool read_page(FIXTURE_DATA(proc_maps_race) *self, + struct page_content *page) { ssize_t bytes_read; - if (lseek(self->maps_fd, 0, SEEK_SET) < 0) + bytes_read = read(self->maps_fd, page->data, self->page_size); + if (bytes_read <= 0) return false; - bytes_read = read(self->maps_fd, self->page1.data, self->page_size); - if (bytes_read <= 0) + /* Make sure data always ends with a newline character. 
*/ + if (page->data[bytes_read - 1] != '\n') return false; - self->page1.size = bytes_read; + page->size = bytes_read; - bytes_read = read(self->maps_fd, self->page2.data, self->page_size); - if (bytes_read <= 0) + return true; +} + +static bool parse_vma_line(char *line_start, char *line_end, + unsigned long *start, unsigned long *end) +{ + bool found; + + *line_end = '\0'; /* stop sscanf at the EOL */ + found = (sscanf(line_start, "%lx-%lx", start, end) == 2); + *line_end = '\n'; + + return found; +} + +static int locate_containing_page(FIXTURE_DATA(proc_maps_race) *self, + unsigned long addr, unsigned long size) +{ + unsigned long start, end; + int page = 0; + + if (lseek(self->maps_fd, 0, SEEK_SET) < 0) + return -1; + + while (true) { + char *curr_pos; + char *end_pos; + + if (!read_page(self, &self->page1)) + return -1; + + curr_pos = self->page1.data; + end_pos = self->page1.data + self->page1.size; + while (curr_pos < end_pos) { + char *line_end; + + line_end = strchr(curr_pos, '\n'); + if (!line_end) + break; + + if (parse_vma_line(curr_pos, line_end, &start, &end) && + start == addr && end == addr + size) + return page; + + curr_pos = line_end + 1; + } + page++; + } + + return 0; +} + +static bool read_two_pages(FIXTURE_DATA(proc_maps_race) *self) +{ + if (lseek(self->maps_fd, 0, SEEK_SET) < 0) return false; - self->page2.size = bytes_read; + for (int i = 0; i < self->skip_pages; i++) + if (!read_page(self, &self->page1)) + return false; - return true; + return read_page(self, &self->page1) && read_page(self, &self->page2); } -static void copy_first_line(struct page_content *page, char *first_line) +static void copy_line(const char *line_start, const char *line_end, + char *buf, size_t buf_size) { - char *pos = strchr(page->data, '\n'); + size_t len = min(line_end - line_start, buf_size - 1); - strncpy(first_line, page->data, pos - page->data); - first_line[pos - page->data] = '\0'; + strncpy(buf, line_start, len); + buf[len] = '\0'; } -static void 
copy_last_line(struct page_content *page, char *last_line) +static void copy_first_line(struct page_content *page, char *first_line, + size_t line_size) +{ + copy_line(page->data, strchr(page->data, '\n'), first_line, line_size); +} + +static void copy_last_line(struct page_content *page, char *last_line, + size_t line_size) { /* Get the last line in the first page */ const char *end = page->data + page->size - 1; @@ -146,8 +237,59 @@ static void copy_last_line(struct page_content *page, char *last_line) /* search previous newline */ while (pos[-1] != '\n') pos--; - strncpy(last_line, pos, end - pos); - last_line[end - pos] = '\0'; + + copy_line(pos, end, last_line, line_size); +} + +static bool copy_first_entry(struct page_content *page, char *first_line, + size_t line_size) +{ + char *start_pos = page->data; + + while (start_pos < page->data + page->size) { + unsigned long start_addr; + unsigned long end_addr; + char *end_pos; + + end_pos = strchr(start_pos, '\n'); + if (!end_pos) + break; + + if (parse_vma_line(start_pos, end_pos, &start_addr, &end_addr)) { + copy_line(start_pos, end_pos, first_line, line_size); + return true; + } + + start_pos = end_pos + 1; + } + + return false; +} + +static bool copy_last_entry(struct page_content *page, char *last_line, + size_t line_size) +{ + char *end_pos = page->data + page->size - 1; + char *start_pos; + + while (end_pos > page->data) { + unsigned long start_addr; + unsigned long end_addr; + + /* skip last newline */ + start_pos = end_pos - 1; + /* search previous newline */ + while (start_pos > page->data && start_pos[-1] != '\n') + start_pos--; + if (parse_vma_line(start_pos, end_pos, &start_addr, &end_addr)) { + copy_line(start_pos, end_pos, last_line, line_size); + return true; + } + + end_pos = start_pos - 1; + } + + return false; } /* Read the last line of the first page and the first line of the second page */ @@ -158,8 +300,16 @@ static bool read_boundary_lines(FIXTURE_DATA(proc_maps_race) *self, if 
(!read_two_pages(self)) return false; - copy_last_line(&self->page1, last_line->text); - copy_first_line(&self->page2, first_line->text); + if (self->maps_file == MAPS) { + copy_last_line(&self->page1, last_line->text, LINE_MAX_SIZE); + copy_first_line(&self->page2, first_line->text, LINE_MAX_SIZE); + } else if (self->maps_file == SMAPS) { + if (!copy_last_entry(&self->page1, last_line->text, LINE_MAX_SIZE) || + !copy_first_entry(&self->page2, first_line->text, LINE_MAX_SIZE)) + return false; + } else { + return false; + } return sscanf(last_line->text, "%lx-%lx", &last_line->start_addr, &last_line->end_addr) == 2 && @@ -418,11 +568,14 @@ FIXTURE_SETUP(proc_maps_race) struct vma_modifier_info *mod_info; pthread_mutexattr_t mutex_attr; pthread_condattr_t cond_attr; + unsigned long first_map_addr; + unsigned long last_map_addr; unsigned long duration_sec; char fname[32]; self->page_size = (unsigned long)sysconf(_SC_PAGESIZE); self->verbose = verbose && !strncmp(verbose, "1", 1); + self->maps_file = variant->maps_file; duration_sec = duration ? atol(duration) : 0; self->duration_sec = duration_sec ? 
duration_sec : 5UL; @@ -489,7 +642,16 @@ FIXTURE_SETUP(proc_maps_race) exit(0); } - sprintf(fname, "/proc/%d/maps", self->pid); + switch (self->maps_file) { + case MAPS: + sprintf(fname, "/proc/%d/maps", self->pid); + break; + case SMAPS: + sprintf(fname, "/proc/%d/smaps", self->pid); + break; + default: + ksft_exit_fail(); + } self->maps_fd = open(fname, O_RDONLY); ASSERT_NE(self->maps_fd, -1); @@ -502,6 +664,13 @@ FIXTURE_SETUP(proc_maps_race) self->page2.data = malloc(self->page_size); ASSERT_NE(self->page2.data, NULL); + first_map_addr = (unsigned long)mod_info->child_mapped_addr[0]; + last_map_addr = (unsigned long)mod_info->child_mapped_addr[mod_info->vma_count - 1]; + + self->skip_pages = locate_containing_page(self, + min(first_map_addr, last_map_addr), + self->page_size * 3); + ASSERT_NE(self->skip_pages, -1); ASSERT_TRUE(read_boundary_lines(self, &self->last_line, &self->first_line)); /* @@ -527,7 +696,6 @@ FIXTURE_SETUP(proc_maps_race) ASSERT_TRUE(mod_info->addr && mod_info->next_addr); signal_state(mod_info, PARENT_READY); - } FIXTURE_TEARDOWN(proc_maps_race) @@ -617,20 +785,20 @@ TEST_F(proc_maps_race, test_maps_tearing_from_split) last_line_changed = strcmp(new_last_line.text, self->last_line.text) != 0; first_line_changed = strcmp(new_first_line.text, self->first_line.text) != 0; ASSERT_EQ(last_line_changed, first_line_changed); - - /* Check if PROCMAP_QUERY ioclt() finds the right VMA */ - ASSERT_TRUE(query_addr_at(self->maps_fd, mod_info->addr + self->page_size, - &vma_start, &vma_end)); - /* - * The vma at the split address can be either the same as - * original one (if read before the split) or the same as the - * first line in the second page (if read after the split). 
- */ - ASSERT_TRUE((vma_start == self->last_line.start_addr && - vma_end == self->last_line.end_addr) || - (vma_start == split_first_line.start_addr && - vma_end == split_first_line.end_addr)); - + if (self->maps_file == MAPS) { + /* Check if PROCMAP_QUERY ioclt() finds the right VMA */ + ASSERT_TRUE(query_addr_at(self->maps_fd, mod_info->addr + self->page_size, + &vma_start, &vma_end)); + /* + * The vma at the split address can be either the same as + * original one (if read before the split) or the same as the + * first line in the second page (if read after the split). + */ + ASSERT_TRUE((vma_start == self->last_line.start_addr && + vma_end == self->last_line.end_addr) || + (vma_start == split_first_line.start_addr && + vma_end == split_first_line.end_addr)); + } clock_gettime(CLOCK_MONOTONIC_COARSE, &end_ts); end_test_iteration(&end_ts, self->verbose); } while (end_ts.tv_sec - start_ts.tv_sec < self->duration_sec); @@ -700,17 +868,18 @@ TEST_F(proc_maps_race, test_maps_tearing_from_resize) strcmp(new_first_line.text, restored_first_line.text), "Expand result invalid", self)); } - - /* Check if PROCMAP_QUERY ioclt() finds the right VMA */ - ASSERT_TRUE(query_addr_at(self->maps_fd, mod_info->addr, &vma_start, &vma_end)); - /* - * The vma should stay at the same address and have either the - * original size of 3 pages or 1 page if read after shrinking. - */ - ASSERT_TRUE(vma_start == self->last_line.start_addr && - (vma_end - vma_start == self->page_size * 3 || - vma_end - vma_start == self->page_size)); - + if (self->maps_file == MAPS) { + /* Check if PROCMAP_QUERY ioclt() finds the right VMA */ + ASSERT_TRUE(query_addr_at(self->maps_fd, mod_info->addr, + &vma_start, &vma_end)); + /* + * The vma should stay at the same address and have either the + * original size of 3 pages or 1 page if read after shrinking. 
+ */ + ASSERT_TRUE(vma_start == self->last_line.start_addr && + (vma_end - vma_start == self->page_size * 3 || + vma_end - vma_start == self->page_size)); + } clock_gettime(CLOCK_MONOTONIC_COARSE, &end_ts); end_test_iteration(&end_ts, self->verbose); } while (end_ts.tv_sec - start_ts.tv_sec < self->duration_sec); @@ -780,20 +949,20 @@ TEST_F(proc_maps_race, test_maps_tearing_from_remap) strcmp(new_first_line.text, restored_first_line.text), "Remap restore result invalid", self)); } - - /* Check if PROCMAP_QUERY ioclt() finds the right VMA */ - ASSERT_TRUE(query_addr_at(self->maps_fd, mod_info->addr + self->page_size, - &vma_start, &vma_end)); - /* - * The vma should either stay at the same address and have the - * original size of 3 pages or we should find the remapped vma - * at the remap destination address with size of 1 page. - */ - ASSERT_TRUE((vma_start == self->last_line.start_addr && - vma_end - vma_start == self->page_size * 3) || - (vma_start == self->last_line.start_addr + self->page_size && - vma_end - vma_start == self->page_size)); - + if (self->maps_file == MAPS) { + /* Check if PROCMAP_QUERY ioclt() finds the right VMA */ + ASSERT_TRUE(query_addr_at(self->maps_fd, mod_info->addr + self->page_size, + &vma_start, &vma_end)); + /* + * The vma should either stay at the same address and have the + * original size of 3 pages or we should find the remapped vma + * at the remap destination address with size of 1 page. 
+ */ + ASSERT_TRUE((vma_start == self->last_line.start_addr && + vma_end - vma_start == self->page_size * 3) || + (vma_start == self->last_line.start_addr + self->page_size && + vma_end - vma_start == self->page_size)); + } clock_gettime(CLOCK_MONOTONIC_COARSE, &end_ts); end_test_iteration(&end_ts, self->verbose); } while (end_ts.tv_sec - start_ts.tv_sec < self->duration_sec); diff --git a/tools/testing/vma/include/dup.h b/tools/testing/vma/include/dup.h index 9e0dfd3a85b0e..bf26b3f48d3ab 100644 --- a/tools/testing/vma/include/dup.h +++ b/tools/testing/vma/include/dup.h @@ -483,23 +483,10 @@ struct mmap_action { enum mmap_action_type type; /* - * If specified, this hook is invoked after the selected action has been - * successfully completed. Note that the VMA write lock still held. - * - * The absolute minimum ought to be done here. - * - * Returns 0 on success, or an error code. - */ - int (*success_hook)(const struct vm_area_struct *vma); - - /* - * If specified, this hook is invoked when an error occurred when - * attempting the selection action. - * - * The hook can return an error code in order to filter the error, but - * it is not valid to clear the error here. + * If non-zero, replace errors that arise from mmap actions with this + * value instead. Only valid error codes may be specified. */ - int (*error_hook)(int err); + int error_override; /* * This should be set in rare instances where the operation required @@ -1303,6 +1290,7 @@ static inline void compat_set_desc_from_vma(struct vm_area_desc *desc, desc->vm_file = vma->vm_file; desc->vma_flags = vma->flags; desc->page_prot = vma->vm_page_prot; + desc->vm_ops = vma->vm_ops; /* Default. */ desc->action.type = MMAP_NOTHING; |
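
The split_huge_page_test.c hunk above stops ignoring the return value of write() and treats anything other than a full-length write as a failure. A minimal standalone sketch of that check follows. The helper name `fill_fd` and its parameters are hypothetical and not part of the patch; like the patched loop, it does not retry short writes.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the patch): write `total` bytes to `fd` by
 * repeating the `buf_size`-byte pattern in `buf`. `total` is assumed to be
 * a multiple of `buf_size`. A failed or short write() is reported as an
 * error and is not retried, mirroring the check added to the test.
 */
static int fill_fd(int fd, const void *buf, size_t buf_size, size_t total)
{
	size_t done;

	for (done = 0; done < total; done += buf_size) {
		if (write(fd, buf, buf_size) != (ssize_t)buf_size)
			return -1;
	}

	return 0;
}
```

A caller would close the descriptor and take its error path when `fill_fd()` returns -1, which is what the patched loop does with `close(*fd)` and `goto err_out_unlink`.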

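In /proc/$PID/smaps each VMA header line is followed by detail lines such as `Size:` and `Rss:`, so the smaps variant of proc-maps-race.c cannot assume that the first or last line of a page is a VMA boundary. The sketch below shows the approach that `parse_vma_line()` and `copy_first_entry()` in the diff appear to take: swap the newline for a NUL so sscanf() stops at the end of the line, restore it, and skip every line that does not parse as `start-end`. `find_first_vma` is a hypothetical name used only for this illustration, and it returns the addresses rather than copying the line.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "start-end" from a single line. The newline at line_end is replaced
 * with a NUL for the duration of the sscanf() call and then restored, so the
 * buffer is unchanged on return.
 */
static bool parse_vma_line(char *line_start, char *line_end,
			   unsigned long *start, unsigned long *end)
{
	bool found;

	*line_end = '\0';
	found = (sscanf(line_start, "%lx-%lx", start, end) == 2);
	*line_end = '\n';

	return found;
}

/*
 * Hypothetical helper (not in the patch): scan `size` bytes of `buf` line by
 * line and report the address range of the first line that parses as a VMA
 * header. Lines without a terminating newline are ignored.
 */
static bool find_first_vma(char *buf, size_t size,
			   unsigned long *start, unsigned long *end)
{
	char *pos = buf;

	while (pos < buf + size) {
		char *eol = memchr(pos, '\n', buf + size - pos);

		if (!eol)
			break;
		if (parse_vma_line(pos, eol, start, end))
			return true;
		pos = eol + 1;
	}

	return false;
}
```

Requiring both conversions to succeed matters for detail lines whose names begin with a hex digit, such as `AnonHugePages:`. There the first `%lx` can consume the leading `A`, but the literal `-` then fails to match, so the line is still rejected.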