| author | SeongJae Park <sj@kernel.org> | 2026-04-27 08:12:20 -0700 |
|---|---|---|
| committer | Andrew Morton <akpm@linux-foundation.org> | 2026-05-28 21:04:57 -0700 |
| commit | 3b9e3cc0405b422db884054ea2417b7b85220c56 (patch) | |
| tree | a1a826b1a8bd6183c771977c5026cc2d0b45247c /include | |
| parent | 7b32f64bc512b40b268776c5ac4d354b325b3197 (diff) | |
| download | linux-next-history-3b9e3cc0405b422db884054ea2417b7b85220c56.tar.gz | |
mm/damon/core: introduce damon_ctx->paused
Patch series "mm/damon: let DAMON be paused and resumed", v2.
DAMON uses several mechanisms that improve its output over time.
Self-training mechanisms such as adaptive regions adjustment, goal-based
DAMOS quota auto-tuning, and monitoring intervals auto-tuning are such
examples. It also adds access frequency stability information (age) to
the monitoring results, which likewise become more useful over time.
Sometimes users have to stop DAMON. In this case, the DAMON internal
state that was refined during the last execution simply goes away. A
restarted DAMON has to train itself and improve its output from scratch.
This makes DAMON less useful in such cases. Three such use cases are
introduced below.
Investigation of DAMON. It is best to do the investigation online,
especially in a production environment. DAMON therefore provides
features for such online investigations, including DAMOS stats, monitoring
result snapshot exposure, and multiple tracepoints. When those are
insufficient, and there are additional clues that DAMON could interfere
with, users have to temporarily stop DAMON to collect the additional
clues. This is not very useful, since many of DAMON's internal clues are
gone when DAMON is stopped. The loss of the monitoring results that
improved over time is also problematic, especially in production
environments.
Monitoring of workloads that have different user-known phases. For
example, in Android, applications are known to have very different access
patterns and behaviors when they are running in the foreground and in the
background. It can therefore be useful to monitor apps separately
depending on whether they are running in the foreground or in the
background. Having two DAMON threads per application that are paused and
resumed on the app's foreground/background switches can be useful for the
purpose. But such pause/resume of the execution is not supported.
Tests of DAMON. A few DAMON selftests use drgn to dump the internal
DAMON status. The tests check whether the dumped status matches what the
test code expects. Because DAMON keeps running and modifying its
internal status, there are chances of data races that can cause false test
results. Stopping DAMON can avoid the race. But, since the internal
state of DAMON is dropped, the test coverage will be limited.
Let DAMON execution be paused and resumed without loss of the internal
state, to overcome these limitations. For this, introduce a new DAMON
context parameter, namely 'pause'. API callers can update it while the
context is running, using the online parameters update functions
(damon_commit_ctx() and damon_call()). Once it is set, the kdamond_fn()
main loop does only limited work, excluding the monitoring and DAMOS
work, while still sleeping for the sampling interval between rounds of
that work. The limited work includes handling of online parameter
updates, so users can unset the 'pause' parameter again. Once it is
unset, the kdamond_fn() main loop does all the work again (resumed).
In the paused state, it also keeps checking and handling the stop
conditions, so that a paused DAMON can also be stopped if needed. Expose
the feature to user space via the DAMON sysfs interface. Also, update
the existing drgn-based tests to test and use the feature.
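The paused loop behavior described above can be illustrated with a small
user-space sketch. This is not the kernel implementation: struct toy_ctx,
struct toy_update, toy_handle_update() and toy_kdamond_step() are
hypothetical names made up for illustration, the ordering of the steps is
an assumption, and the sampling-interval sleep is replaced by a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct damon_ctx. */
struct toy_ctx {
	bool pause;               /* mirrors the new 'pause' parameter */
	bool stop_requested;      /* stand-in for the stop conditions */
	unsigned long nr_samples; /* stand-in for state built up by monitoring */
	unsigned long nr_sleeps;  /* counts simulated sampling-interval sleeps */
};

/* Hypothetical stand-in for a pending online parameters update. */
struct toy_update {
	bool valid;
	bool pause;
};

/* Apply a pending update, if any. This runs even while paused. */
static void toy_handle_update(struct toy_ctx *ctx, struct toy_update *req)
{
	if (req == NULL || !req->valid)
		return;
	ctx->pause = req->pause;
	req->valid = false;
}

/*
 * One iteration of the toy main loop.  Returns false when the loop
 * should exit, true otherwise.
 */
static bool toy_kdamond_step(struct toy_ctx *ctx, struct toy_update *req)
{
	assert(ctx != NULL);

	/* Stop conditions are checked whether paused or not. */
	if (ctx->stop_requested)
		return false;

	ctx->nr_sleeps++;            /* stands in for one sampling interval */
	toy_handle_update(ctx, req); /* limited work: parameter updates */

	if (ctx->pause)
		return true;         /* skip monitoring and DAMOS work */

	ctx->nr_samples++;           /* stands in for monitoring work */
	return true;
}
```

In this sketch, nr_samples stops growing while 'pause' is set but is never
reset, so unsetting 'pause' continues from the previously accumulated
state, and a stop request is still honored while paused.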
Tests
=====
I confirmed the feature functionality using real-time tracing ('perf
trace' or 'trace-cmd stream') of the damon:damon_aggregated DAMON
tracepoint. By pausing and resuming the DAMON execution, I was able to
see the trace stop and continue as expected. Note that the pause feature
support is added to the DAMON user-space tool (damo) after v3.1.9. Users
can use the '--pause_ctx' command line option of damo for that, and I
actually used it for my test. The extended drgn-based selftests also
test a part of the functionality.
Patches Sequence
================
Patch 1 introduces the new core API for the pause feature. Patch 2 extends
the DAMON sysfs interface for the new parameter. Patches 3-5 update the
design, usage and ABI documents for the new sysfs file, respectively. The
following five patches are for tests. Patch 6 implements a new kunit test
for the pause parameter online commitment. Patches 7 and 8 extend DAMON
selftest helpers to support the new feature. Patch 9 extends the selftest
to test the commitment of the feature. Finally, patch 10 updates the
existing selftest to be safe from the race condition using the
pause/resume feature.
This patch (of 10):
DAMON supports only start and stop of the execution. When it is stopped,
the internal data that it built up by self-training goes away. It would
be useful if the execution could be paused and resumed with the previous
self-trained data.
Introduce a per-context API parameter, 'pause', for the purpose. The
parameter can be set and unset while DAMON is running or paused, using
the online parameters commit helper functions (damon_commit_ctx() and
damon_call()). Once 'pause' is set, the kdamond_fn() main loop does only
limited work, sleeping for the sampling interval in between. The limited
work includes the handling of online parameter updates, so that users
can unset 'pause' and resume the execution when they want. It also
keeps checking and handling the DAMON stop conditions, so that DAMON can
be stopped while paused if needed.
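The difference between committing the parameter online and stopping then
restarting can be sketched in user-space C as below. This is only an
illustration under stated assumptions: struct sketch_ctx,
sketch_commit_params() and sketch_restart() are hypothetical names, and
sketch_commit_params() only loosely models the role of damon_commit_ctx(),
not its actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a running DAMON context. */
struct sketch_ctx {
	bool pause;              /* user-set parameter */
	unsigned int region_age; /* stand-in for self-trained state */
};

/*
 * Copy only user-set parameters from src into the running dst.
 * Self-trained state in dst is deliberately left untouched.
 */
static void sketch_commit_params(struct sketch_ctx *dst,
				 const struct sketch_ctx *src)
{
	assert(dst != NULL && src != NULL);
	dst->pause = src->pause;
}

/* Model of stop followed by start: everything is reinitialized. */
static void sketch_restart(struct sketch_ctx *ctx)
{
	assert(ctx != NULL);
	memset(ctx, 0, sizeof(*ctx));
}
```

In this model, toggling 'pause' through the commit helper leaves
region_age as it was, while the restart path resets it to zero, which is
the loss of state that the pause parameter is meant to avoid.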
Link: https://lore.kernel.org/20260427151231.113429-1-sj@kernel.org
Link: https://lore.kernel.org/20260427151231.113429-2-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendan.higgins@linux.dev>
Cc: David Hildenbrand <david@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <liam@infradead.org>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'include')
| -rw-r--r-- | include/linux/damon.h | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/include/linux/damon.h b/include/linux/damon.h
index d3a231275c23e..f2370a3a4a9a3 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -801,6 +801,7 @@ struct damon_attrs {
  * @ops: Set of monitoring operations for given use cases.
  * @addr_unit: Scale factor for core to ops address conversion.
  * @min_region_sz: Minimum region size.
+ * @pause: Pause kdamond main loop.
  * @adaptive_targets: Head of monitoring targets (&damon_target) list.
  * @schemes: Head of schemes (&damos) list.
  */
@@ -854,6 +855,7 @@ struct damon_ctx {
 	struct damon_operations ops;
 	unsigned long addr_unit;
 	unsigned long min_region_sz;
+	bool pause;
 
 	struct list_head adaptive_targets;
 	struct list_head schemes;
