diff options
| author | Asier Gutierrez <gutierrez.asier@huawei-partners.com> | 2026-04-26 16:16:17 -0700 |
|---|---|---|
| committer | Andrew Morton <akpm@linux-foundation.org> | 2026-05-28 21:04:56 -0700 |
| commit | 58996503b631adc6a268a42f4624a34513c16199 (patch) | |
| tree | e7e288357ba0c3177f7caecea322a8cb48c80a96 /Documentation | |
| parent | de3c60e1c8314f3408a72836483772e17f279aca (diff) | |
| download | linux-next-history-58996503b631adc6a268a42f4624a34513c16199.tar.gz | |
mm/damon: support MADV_COLLAPSE via DAMOS_COLLAPSE scheme action
This patch set introces a new action: DAMOS_COLLAPSE.
For DAMOS_HUGEPAGE and DAMOS_NOHUGEPAGE to work, khugepaged should be
working, since it relies on hugepage_madvise to add a new slot. This slot
should be picked up by khugepaged and eventually collapse (or not, if we
are using DAMOS_NOHUGEPAGE) the pages. If THP is not enabled, khugepaged
will not be working, and therefore no collapse will happen.
DAMOS_COLLAPSE eventually calls madvise_collapse, which will collapse the
address range synchronously. In cases where there is a large VMA
(databases, for example), DAMOS_COLLAPSE allows us to collapse only the
hot region, and not the entire VMA.
This new action may be required to support autotuning with hugepage
as a goal[1].
=========
Benchmarks:
=========
MySQL
=====
Tests were performed in an ARM physical server with MariaDB 10.5 and
sysbench. Read only benchmark was perform with gaussian row hitting,
which follows a normal distribution.
T n, D h: THP set to never, DAMON action set to hugepage
T m, D h: THP set to madvise, DAMON action set to hugepage
T n, D c: THP set to never, DAMON action set to collapse
Memory consumption. Lower is better.
+------------------+----------+----------+----------+
| | T n, D h | T m, D h | T n, D c |
+------------------+----------+----------+----------+
| Total memory use | 2.13 | 2.20 | 2.20 |
| Huge pages | 0 | 1.3 | 1.27 |
+------------------+----------+----------+----------+
Performance in TPS (Transactions Per Second). Higher is better.
T n, D h: 18225.58
T m, D h 18252.93
T n, D c: 18270.21
Performance counter
I got the number of L1 D/I TLB accesses and the number a D/I TLB
accesses that triggered a page walk. I divided the second by the
first to get the percentage of page walkes per TLB access. The
lower the better.
+---------------+--------------+--------------+--------------+
| | T n, D h | T m, D h | T n, D c |
+---------------+--------------+--------------+--------------+
| L1 DTLB | 127248242753 | 125431020479 | 125327001821 |
| L1 ITLB | 80332558619 | 79346759071 | 79298139590 |
| DTLB walk | 75011087 | 52800418 | 55895794 |
| ITLB walk | 71577076 | 71505137 | 67262140 |
| DTLB % misses | 0.058948623 | 0.042095183 | 0.044599961 |
| ITLB % misses | 0.089100954 | 0.090117275 | 0.084821839 |
+---------------+--------------+--------------+--------------+
Masim
=====
I used masim with the "demo" configuration, but changing the times
to 100 seconds for the initial phase and 50 seconds for the rest of
the phases.
Memory consumption:
+------------------+----------+----------+----------+
| | T n, D h | T m, D h | T n, D c |
+------------------+----------+----------+----------+
| Total memory use | 2.38 GB | 2.36 GB | 2.37 GB |
| Huge pages | 0 | 190 MB | 188 MB |
+------------------+----------+----------+----------+
Performance:
THP never, DAMOS_HUGEPAGE
initial phase: 40,491 accesses/msec, 100001 msecs run
low phase 0: 39,658 accesses/msec, 50002 msecs run
high phase 0: 41,678 accesses/msec, 50000 msecs run
low phase 1: 39,625 accesses/msec, 50003 msecs run
high phase 1: 41,658 accesses/msec, 50002 msecs run
low phase 2: 39,642 accesses/msec, 50002 msecs run
high phase 2: 41,640 accesses/msec, 50001 msecs run
THP madvise, DAMOS_HUGEPAGE
initial phase: 51,977 accesses/msec, 100000 msecs run
low phase 0: 86,953 accesses/msec, 50000 msecs run
high phase 0: 94,812 accesses/msec, 50000 msecs run
low phase 1: 101,017 accesses/msec, 50000 msecs run
high phase 1: 94,841 accesses/msec, 50000 msecs run
low phase 2: 100,993 accesses/msec, 50000 msecs run
high phase 2: 94,791 accesses/msec, 50001 msecs run
THP never, DAMOS_COLLAPSE
initial phase: 93,678 accesses/msec, 100001 msecs run
low phase 0: 101,475 accesses/msec, 50000 msecs run
high phase 0: 98,589 accesses/msec, 50000 msecs run
low phase 1: 101,531 accesses/msec, 50001 msecs run
high phase 1: 98,506 accesses/msec, 50001 msecs run
low phase 2: 101,458 accesses/msec, 50001 msecs run
high phase 2: 98,555 accesses/msec, 50000 msecs run
Memory consumption dynamic (how quickly collapses occur):
It shows in seconds how many huge pages are allocated.
+----+----------+----------+
| | T m, D h | T n, D c |
+----+----------+----------+
| 5 | 32 | 188 |
| 10 | 48 | 188 |
| 15 | 64 | 188 |
| 20 | 96 | 188 |
| 30 | 112 | 188 |
| 35 | 144 | 188 |
| 40 | 160 | 188 |
| 45 | 190 | 188 |
| 50 | 190 | 188 |
| 55 | 190 | 188 |
| 60 | 190 | 188 |
+----+----------+----------+
=========
- We can see that DAMOS "hugepage" action works only when THP is set
to madvise. "collapse" action works even when THP is set to never.
- Performance for "collapse" action is slightly lower than "hugepage"
action and THP madvise. This is due to the fact that collapases
occur synchronously. With "hugepage" they may occur during page
faults.
- Memory consumption is slighly lower for "collapse" than "hugepage"
with THP madvise. This is due to the khugepage collapses all VMAs,
while "collapse" action only collapses the VMAs in the hot region.
- There is an improvement in TLB utilization when collapse through
"hugepage" or "collapse" actions are triggered. The amount of
TLB misses is lower.
- "collapse" action is performance synchronously, which means that
page collapses happen earlier and more rapidly. This can be
useful or not, depending on the scenario.
- "hugepage" action may trigger a VMA split in some scenarios, since
it needs to change the flag of the VMA to THP enabled. This may
lead to additional overhead.
Collapse action just adds a new option to chose the correct system
balance.
Link: https://lore.kernel.org/20260426231619.107231-5-sj@kernel.org
Link: https://lore.kernel.org/damon/20260313000816.79933-1-sj@kernel.org/ [1]
Signed-off-by: Asier Gutierrez <gutierrez.asier@huawei-partners.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Cc: Cheng-Han Wu <hank20010209@gmail.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Liew Rui Yan <aethernet65535@gmail.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'Documentation')
| -rw-r--r-- | Documentation/mm/damon/design.rst | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst index bacb457f553a1..da74ab20e2891 100644 --- a/Documentation/mm/damon/design.rst +++ b/Documentation/mm/damon/design.rst @@ -474,6 +474,10 @@ that supports each action are as below. Supported by ``vaddr`` and ``fvaddr`` operations set. When TRANSPARENT_HUGEPAGE is disabled, the application of the action will just fail. + - ``collapse``: Call ``madvise()`` for the region with ``MADV_COLLAPSE``. + Supported by ``vaddr`` and ``fvaddr`` operations set. When + TRANSPARENT_HUGEPAGE is disabled, the application of the action will just + fail. - ``lru_prio``: Prioritize the region on its LRU lists. Supported by ``paddr`` operations set. - ``lru_deprio``: Deprioritize the region on its LRU lists. |
