Conversation

@AlanCoding (Member) commented Jan 19, 2026

SUMMARY

Still WIP but this has the basics.

Next up: validate that /api/v2/metrics/ can return the new keys; for what those are, see:

https://github.com/ansible/dispatcherd/blob/main/dispatcherd/service/metrics.py

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

Note

Medium Risk
Adds a runtime HTTP call to the metrics endpoint and changes dispatcher service configuration, which could affect metrics availability/latency if misconfigured or if the dispatcherd endpoint is unreachable.

Overview
/api/v2/metrics/ now appends Prometheus output scraped directly from the local dispatcherd metrics HTTP endpoint (via new _get_dispatcherd_metrics), while keeping the existing Redis-backed subsystem metrics generation.

Dispatcher-related metric keys are removed from DispatcherMetrics.METRICSLIST and filtering is optimized so the HTTP scrape is skipped when node excludes the local host or when the requested metric set doesn’t overlap dispatcher metrics; failures/timeouts are swallowed with debug logging.

dispatcherd service config is updated to pass metrics_kwargs (host/port) from METRICS_SUBSYSTEM_CONFIG, test-mode broker naming is updated, functional tests cover the new node/metric filter behavior, and the dispatcherd dependency is bumped to 2026.01.27.

Written by Cursor Bugbot for commit 23af6f0. This will update automatically on new commits.
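
To make the overview above concrete, here is a minimal sketch of what the scrape helper might look like. Only the _get_dispatcherd_metrics name and the exception handling (quoted later in this conversation) come from the PR; the host/port values, timeout, and URL shape are illustrative assumptions.

import http.client
import logging
import socket
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def _get_dispatcherd_metrics():
    # Assumption: the host/port would ultimately come from METRICS_SUBSYSTEM_CONFIG;
    # they are hardcoded here only for the sketch.
    host = 'localhost'
    port = 8014  # illustrative port, not confirmed by the PR
    url = f'http://{host}:{port}/metrics'
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            payload = response.read()
        return payload.decode('utf-8')
    except (urllib.error.URLError, UnicodeError, socket.timeout, TimeoutError, http.client.HTTPException) as exc:
        # Failures/timeouts are swallowed with debug logging so the Redis-backed metrics still render.
        logger.debug(f"Failed to collect dispatcherd metrics from {url}: {exc}")
        return ''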

github-actions bot added the component:api and dependencies labels Jan 19, 2026
AlanCoding changed the title Enable new fancy asyncio metrics for dispatcherd Jan 19, 2026
AlanCoding requested a review from fosterseth January 19, 2026 19:47
@AlanCoding (Member Author)
From /api/v2/metrics/:

dispatcher_pool_scale_up_events{node="awx-1"} 0
# HELP dispatcher_pool_active_task_count Number of active tasks in the worker pool when last task was submitted
# TYPE dispatcher_pool_active_task_count gauge
dispatcher_pool_active_task_count{node="awx-1"} 0
# HELP dispatcher_pool_max_worker_count Highest number of workers in worker pool in last collection interval, about 20s
# TYPE dispatcher_pool_max_worker_count gauge
dispatcher_pool_max_worker_count{node="awx-1"} 0
# HELP dispatcher_availability Fraction of time (in last collection interval) dispatcher was able to receive messages
# TYPE dispatcher_availability gauge
dispatcher_availability{node="awx-1"} 0.0
# HELP subsystem_metrics_pipe_execute_seconds Time spent saving metrics to redis
# TYPE subsystem_metrics_pipe_execute_seconds gauge

So this looks wrong; I will look into what the deal is.

@AlanCoding (Member Author)

With the latest change, those old numbers are going away and I am seeing:

# HELP dispatcher_messages_received_total Number of messages received by dispatchermain
# TYPE dispatcher_messages_received_total counter
dispatcher_messages_received_total 2.0
# HELP dispatcher_control_messages_count_total Number of control messages received.
# TYPE dispatcher_control_messages_count_total counter
dispatcher_control_messages_count_total 0.0
# HELP dispatcher_worker_count_total Number of workers running.
# TYPE dispatcher_worker_count_total counter
dispatcher_worker_count_total 4.0
AlanCoding requested a review from kdelee January 19, 2026 20:34
@AlanCoding (Member Author)

With the last push, the tests are passing now. The security notice would apply generally to the whole idea of serving metrics locally, which I was fairly sure was already happening.

@AlanCoding (Member Author)

Currently, I think that this posts dispatcher metrics through redis:

https://github.com/ansible/awx/blob/devel/awx/main/dispatch/worker/base.py#L192-L202

For metrics that aren't actually dispatcher related, like task manager stuff, this will still be the case. But that specific method will be removed in #16209, and this new stuff is its replacement.
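
For context, the general "post metrics through redis" pattern that the linked method performs looks roughly like the conceptual sketch below. This is not the AWX code at that link; the key name, connection details, and helper function are all illustrative.

import json
import time

import redis  # assumes the redis-py client is installed

conn = redis.Redis(host='localhost', port=6379, db=0)  # illustrative connection


def post_dispatcher_metrics(node, values):
    # One Redis hash per node preserves the per-node labels seen in the scrape output above.
    key = f'awx_metrics_sketch:{node}'
    conn.hset(key, mapping={name: json.dumps(value) for name, value in values.items()})


post_dispatcher_metrics('awx-1', {'dispatcher_availability': 1.0, 'recorded_timestamp': time.time()})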

AlanCoding force-pushed the dispatcherd_metrics branch 2 times, most recently from e2b45d0 to 958249a, January 26, 2026 15:05
super().__init__(settings.METRICS_SERVICE_CALLBACK_RECEIVER, *args, **kwargs)


def _get_dispatcherd_metrics():
Member (review comment):
Is there any reason we can't get metrics via dispatcherctl? It seems a unix socket might be more reliable/robust than an HTTP server.

@AlanCoding (Member Author):
dispatcherctl is just the client. The service is dispatcherd. All dispatcherctl does (now) is send pg_notify messages to get data. So, to the question of whether metrics can go over pg_notify: I think that's going to be a "no". Prometheus-style collectors expect data served over a local port, and without adding this stuff, there is no port dispatcherd is serving from. So based on OS first principles, yeah, we have to add a port dispatcherd listens to, because that's just how metrics collection works.
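
To make the port discussion concrete, below is a hedged sketch of how the metrics_kwargs wiring from METRICS_SUBSYSTEM_CONFIG (mentioned in the summary above) could look. The dispatcherd config schema is not shown in this PR, so everything except the metrics_kwargs and METRICS_SUBSYSTEM_CONFIG names is illustrative.

def build_dispatcherd_service_config(metrics_subsystem_config):
    # metrics_subsystem_config stands in for settings.METRICS_SUBSYSTEM_CONFIG
    cfg = metrics_subsystem_config or {}
    return {
        'service': {
            # dispatcherd would start a small local HTTP server for Prometheus scrapes
            'metrics_kwargs': {
                'host': cfg.get('host', 'localhost'),
                'port': cfg.get('port', 8014),  # illustrative default
            },
        },
    }


print(build_dispatcherd_service_config({'host': 'localhost', 'port': 8014}))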

@AlanCoding (Member Author)

I see task manager metrics are still being gathered.

# HELP task_manager_get_tasks_seconds Time spent in loading tasks from db
# TYPE task_manager_get_tasks_seconds gauge
task_manager_get_tasks_seconds{node="awx-1"} 0.013124978635460138
# HELP task_manager_start_task_seconds Time spent starting task
# TYPE task_manager_start_task_seconds gauge
task_manager_start_task_seconds{node="awx-1"} 0.010372716002166271
# HELP task_manager_process_running_tasks_seconds Time spent processing running tasks
# TYPE task_manager_process_running_tasks_seconds gauge
task_manager_process_running_tasks_seconds{node="awx-1"} 6.309710443019867e-07
# HELP task_manager_process_pending_tasks_seconds Time spent processing pending tasks
# TYPE task_manager_process_pending_tasks_seconds gauge
task_manager_process_pending_tasks_seconds{node="awx-1"} 0.01111389696598053
# HELP task_manager__schedule_seconds Time spent in running the entire _schedule
# TYPE task_manager__schedule_seconds gauge
task_manager__schedule_seconds{node="awx-1"} 0.03407588880509138
# HELP task_manager__schedule_calls Number of calls to _schedule, after lock is acquired
# TYPE task_manager__schedule_calls gauge
task_manager__schedule_calls{node="awx-1"} 34
# HELP task_manager_recorded_timestamp Unix timestamp when metrics were last recorded
# TYPE task_manager_recorded_timestamp gauge
task_manager_recorded_timestamp{node="awx-1"} 1769529661.0170753

However, I should disclose that I don't understand how.

@AlanCoding (Member Author)

OK, multiple times I have unambiguously confirmed that the task manager metrics are still updating. I admit that I do not know how this is working, but somehow it appears to be working.

This is still a bit of a separate subject from this PR.

I'm still iffy on whether people are okay with me adding this data without formalizing the schema, but I'm marking this as ready for review. I don't have another approach; even if this feels hacky, it does what's needed.

AlanCoding marked this pull request as ready for review January 27, 2026 21:57
cursor bot left a comment
Cursor Bugbot has reviewed your changes and found 1 potential issue.

return payload.decode('utf-8')
except (urllib.error.URLError, UnicodeError, socket.timeout, TimeoutError, http.client.HTTPException) as exc:
logger.debug(f"Failed to collect dispatcherd metrics from {url}: {exc}")
return ''

Dispatcherd metrics ignores metric query parameter filter

Low Severity

The _get_dispatcherd_metrics function respects the node query parameter filter but doesn't filter by the metric query parameter. The existing generate_metrics method filters metrics by both node and metric parameters. When users request specific metrics via ?metric=<name>, they receive filtered Redis-based metrics but ALL dispatcherd metrics, creating inconsistent API behavior. The API documentation explicitly shows metric= as a supported filter parameter.
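
One possible way to make the dispatcherd scrape honor ?metric= would be to filter the Prometheus exposition text by metric name before appending it. The sketch below is illustrative only and is not the fix adopted in the PR.

def filter_prometheus_text(text, allowed_names):
    # Keep only HELP/TYPE/sample lines whose metric name is in allowed_names.
    kept = []
    for line in text.splitlines():
        if line.startswith('#'):
            # "# HELP <name> ..." / "# TYPE <name> ..." -- the name is the third token
            parts = line.split()
            name = parts[2] if len(parts) > 2 else ''
        else:
            # Sample lines look like: name{labels} value  or  name value
            name = line.split('{')[0].split(' ')[0]
        if not allowed_names or name in allowed_names:
            kept.append(line)
    return '\n'.join(kept) + ('\n' if kept else '')


sample = (
    '# HELP dispatcher_worker_count_total Number of workers running.\n'
    '# TYPE dispatcher_worker_count_total counter\n'
    'dispatcher_worker_count_total 4.0\n'
)
print(filter_prometheus_text(sample, {'dispatcher_worker_count_total'}))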


@sonarqubecloud

Quality Gate failed

Failed conditions
2 Security Hotspots
67.2% Coverage on New Code (required ≥ 80%)

See analysis details on SonarQube Cloud


Labels

component:api, dependencies (Pull requests that update a dependency file)

2 participants