Skip to content

xds: Fix XdsDepManager aggregate cluster child ordering and loop detection #12129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 5, 2025

Conversation

ejona86
Copy link
Member

@ejona86 ejona86 commented Jun 4, 2025

The children of aggregate clusters have a priority order, so we can't ever throw them in an ordinary set for later iteration.

This now detects recusion limits only after subscribing, but that matches our existing behavior in CdsLoadBalancer2. We don't get much value detecting the limit before subscribing and doing so makes watcher types more different.

Loops are still a bit broken as they won't be unwatched when orphaned, as they will form a reference loop. In CdsLoadBalancer2, duplicate clusters had duplicate watchers so there was single-ownership and reference cycles couldn't form. Fixing that is a bigger change.

Intermediate aggregate clusters are now included in XdsConfig, just for simplicity. It doesn't hurt anything whether they are present or missing. but it required updates to some tests.

…ction

The children of aggregate clusters have a priority order, so we can't
ever throw them in an ordinary set for later iteration.

This now detects recusion limits only after subscribing, but that
matches our existing behavior in CdsLoadBalancer2. We don't get much
value detecting the limit before subscribing and doing so makes watcher
types more different.

Loops are still a bit broken as they won't be unwatched when orphaned,
as they will form a reference loop. In CdsLoadBalancer2, duplicate
clusters had duplicate watchers so there was single-ownership and
reference cycles couldn't form. Fixing that is a bigger change.

Intermediate aggregate clusters are now included in XdsConfig, just for
simplicity. It doesn't hurt anything whether they are present or
missing. but it required updates to some tests.
@ejona86 ejona86 requested a review from kannanjgithub June 4, 2025 22:05
if (!config.hasValue()) {
clusters.put(clusterName, StatusOr.fromStatus(Status.INTERNAL.withDescription(
"Unable to get leaves for " + clusterName + ": "
+ config.getStatus().getDescription())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we include the childCluster name as well in the error status?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

config.getStatus().getDescription() includes the child's name. XdsWatcherBase's onError/onDoesNotExist formats an error message including toContextString(), which propagates that important error information.

I do suspect I will want to change this at some point to leave a reference to the broken cluster, as it appears we previously just discarded such failures. But I'll figure that out in later work as I rewrite CdsLB2, and check the gRFCs to see if they dictated the bahavior.

@ejona86 ejona86 merged commit 4cd7881 into grpc:master Jun 5, 2025
16 checks passed
@ejona86 ejona86 deleted the xdsdepman-aggordering-loops branch June 5, 2025 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants