AAP-41776 Enable new fancy asyncio metrics for dispatcherd #16233
base: devel
Conversation
So this looks wrong and I will look into what the deal is.

With the latest change, those old numbers are going away and I am seeing:

With the last push, the tests are passing now. The security notice would apply generally to the whole idea of serving metrics locally, which I was fairly sure was already happening.

Currently, I think this posts dispatcher metrics through redis: https://github.com/ansible/awx/blob/devel/awx/main/dispatch/worker/base.py#L192-L202 For metrics that aren't actually dispatcher related, like task manager stuff, this will still be the case. But that specific method will be removed in #16209, and this new stuff is its replacement.
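For readers unfamiliar with the flow being replaced, here is a rough, generic sketch of the Redis-backed pattern that comment describes; this is illustrative only, not the actual AWX code, and the key names are made up.

```python
# Illustrative only: a node serializes its counters into Redis and the metrics view
# reads them back when rendering /api/v2/metrics/. Key names here are hypothetical.
import json

import redis

conn = redis.Redis()  # assumes a reachable local Redis


def save_local_metrics(node: str, counters: dict) -> None:
    # One key per node, so the metrics endpoint can aggregate across the cluster.
    conn.set(f"awx_metrics_{node}", json.dumps(counters))


def load_local_metrics(node: str) -> dict:
    raw = conn.get(f"awx_metrics_{node}")
    return json.loads(raw) if raw else {}


save_local_metrics("awx-1", {"dispatcher_availability": 1.0})
print(load_local_metrics("awx-1"))
```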
e2b45d0 to 958249a
    super().__init__(settings.METRICS_SERVICE_CALLBACK_RECEIVER, *args, **kwargs)

def _get_dispatcherd_metrics():
Is there any reason we can't get metrics via dispatcherctl? It seems a unix socket might be more reliable / robust than an HTTP server.
dispatcherctl is just the client; the service is dispatcherd. All dispatcherctl does (now) is send pg_notify messages to get data. So to the question of whether metrics can go over pg_notify: I think that's going to be a "no". Metrics scrapers expect data served over a local port, and without adding this stuff, there is no port dispatcherd serves from. So from first principles, yeah, we have to add a port dispatcherd listens on, because that's just how metrics collection works.
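To make that pull-based model concrete, here is a minimal asyncio sketch of a process serving plaintext Prometheus-style metrics on a local port. This is purely illustrative, not dispatcherd's actual implementation; the port number and metric name are placeholders.

```python
# Illustrative sketch: a pull-based metrics consumer needs the service itself to
# listen on a local port and serve plaintext metrics over HTTP.
import asyncio

METRICS_BODY = b"# HELP demo_tasks_total Example counter\ndemo_tasks_total 42\n"


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    await reader.readline()  # read and discard the request line; enough for a sketch
    writer.write(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/plain; version=0.0.4\r\n"
        b"Content-Length: " + str(len(METRICS_BODY)).encode() + b"\r\n\r\n" + METRICS_BODY
    )
    await writer.drain()
    writer.close()


async def main() -> None:
    # Port 8070 is a placeholder; the real port would come from settings.
    server = await asyncio.start_server(handle, host="127.0.0.1", port=8070)
    async with server:
        await server.serve_forever()


if __name__ == "__main__":
    asyncio.run(main())
```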
I see task manager metrics are still being gathered. However, I should disclose that I don't understand how.
958249a to a9198d0
Ok, I have now unambiguously confirmed, multiple times, that the task manager metrics are still updating. I admit I don't know how this works, but it appears to. That's a bit of a separate subject from this PR, though. I'm still iffy on whether people are okay with me adding this data without formalizing the schema, but I'm marking this ready for review. I don't have another approach; even if this feels hacky, it does what's needed.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
        return payload.decode('utf-8')
    except (urllib.error.URLError, UnicodeError, socket.timeout, TimeoutError, http.client.HTTPException) as exc:
        logger.debug(f"Failed to collect dispatcherd metrics from {url}: {exc}")
        return ''
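Pieced together from the fragment above, here is a self-contained sketch of what the helper might look like end to end. The default host/port, the exact signature, and the request construction are assumptions, not the merged AWX code.

```python
# Sketch only: scrape the local dispatcherd metrics endpoint over HTTP and fail soft.
import http.client
import logging
import socket
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def _get_dispatcherd_metrics(host: str = '127.0.0.1', port: int = 8070, timeout: float = 2.0) -> str:
    # Host, port, and timeout defaults are placeholders; the real values would come
    # from METRICS_SUBSYSTEM_CONFIG or similar settings.
    url = f'http://{host}:{port}/'
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = response.read()
        return payload.decode('utf-8')
    except (urllib.error.URLError, UnicodeError, socket.timeout, TimeoutError, http.client.HTTPException) as exc:
        # Metrics collection is best-effort: a scrape failure should never break /api/v2/metrics/.
        logger.debug(f"Failed to collect dispatcherd metrics from {url}: {exc}")
        return ''
```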
Dispatcherd metrics ignores metric query parameter filter
Low Severity
The _get_dispatcherd_metrics function respects the node query parameter filter but doesn't filter by the metric query parameter. The existing generate_metrics method filters metrics by both node and metric parameters. When users request specific metrics via ?metric=<name>, they receive filtered Redis-based metrics but ALL dispatcherd metrics, creating inconsistent API behavior. The API documentation explicitly shows metric= as a supported filter parameter.
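One possible shape of a fix, shown only as a sketch of the idea rather than the change the author intends: filter the scraped Prometheus exposition text by metric name before appending it to the response.

```python
# Sketch: keep only HELP/TYPE/sample lines whose metric name is in the requested set.
def filter_prometheus_text(text: str, wanted: set[str]) -> str:
    kept = []
    for line in text.splitlines():
        if line.startswith('#'):
            # Comment lines look like "# HELP <name> ..." or "# TYPE <name> ...".
            parts = line.split()
            name = parts[2] if len(parts) > 2 else ''
        else:
            # Sample lines look like "<name>{labels} value" or "<name> value".
            name = line.split('{')[0].split()[0] if line.strip() else ''
        if name in wanted:
            kept.append(line)
    return '\n'.join(kept) + ('\n' if kept else '')


# Example: only the requested family survives.
sample = "# HELP a_total demo\n# TYPE a_total counter\na_total 1\nb_total 2\n"
print(filter_prometheus_text(sample, {'a_total'}))
```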
Remove old dispatcher metrics and patch in new data from local whatever
Update test fixture to new dispatcherd version
48c7f74 to 23af6f0
SUMMARY
Still WIP but this has the basics.
Next up: validate that /api/v2/metrics/ can return the new keys; for what those are, see: https://github.com/ansible/dispatcherd/blob/main/dispatcherd/service/metrics.py
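A hypothetical smoke check for that validation step; the base URL, token, format query parameter, and the substring asserted here are placeholders rather than confirmed names from dispatcherd's metrics module.

```python
# Placeholder smoke check against a dev AWX instance.
import urllib.request

AWX_URL = "http://localhost:8013"  # placeholder dev URL
TOKEN = "<oauth-token>"            # placeholder credential

req = urllib.request.Request(
    f"{AWX_URL}/api/v2/metrics/?format=txt",  # format param assumed to yield Prometheus text
    headers={"Authorization": f"Bearer {TOKEN}"},
)
with urllib.request.urlopen(req) as response:
    body = response.read().decode("utf-8")

# Expect the output to now include dispatcherd-sourced metric families.
assert "dispatcher" in body, "no dispatcherd metrics found in /api/v2/metrics/ output"
```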
ISSUE TYPE
COMPONENT NAME
Note
Medium Risk
Adds a runtime HTTP call to the metrics endpoint and changes dispatcher service configuration, which could affect metrics availability/latency if misconfigured or if the dispatcherd endpoint is unreachable.

Overview

/api/v2/metrics/ now appends Prometheus output scraped directly from the local dispatcherd metrics HTTP endpoint (via the new _get_dispatcherd_metrics), while keeping the existing Redis-backed subsystem metrics generation.

Dispatcher-related metric keys are removed from DispatcherMetrics.METRICSLIST, and filtering is optimized so the HTTP scrape is skipped when node excludes the local host or when the requested metric set doesn't overlap dispatcher metrics; failures/timeouts are swallowed with debug logging.

The dispatcherd service config is updated to pass metrics_kwargs (host/port) from METRICS_SUBSYSTEM_CONFIG, test-mode broker naming is updated, functional tests cover the new node/metric filter behavior, and the dispatcherd dependency is bumped to 2026.01.27.

Written by Cursor Bugbot for commit 23af6f0. This will update automatically on new commits.
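To visualize the configuration change described in the overview, here is a sketch with assumed key names showing host/port flowing from METRICS_SUBSYSTEM_CONFIG into a metrics_kwargs entry; the real schema used by dispatcherd and AWX settings may differ.

```python
# Sketch with assumed key names, not the actual settings schema.
METRICS_SUBSYSTEM_CONFIG = {
    'server': {
        'dispatcherd': {'host': '127.0.0.1', 'port': 8070},  # placeholder values
    }
}


def get_dispatcherd_config(base_config: dict) -> dict:
    # Copy the base service config and graft on the metrics listener settings.
    metrics = METRICS_SUBSYSTEM_CONFIG['server']['dispatcherd']
    config = dict(base_config)
    config['service'] = dict(config.get('service', {}))
    config['service']['metrics_kwargs'] = {'host': metrics['host'], 'port': metrics['port']}
    return config


print(get_dispatcherd_config({'service': {'pool_kwargs': {'max_workers': 4}}}))
```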