fix(rpc): Handle organization slug collision gracefully during slug update #107053
base: master
Conversation
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
# Slug collision - raise a specific exception that can be caught
raise RpcSlugCollisionException(
    self.service_name, self.method_name, "Organization slug already in use"
)
409 handling unreachable in test environment
High Severity
The new 409 status code handling that raises RpcSlugCollisionException is unreachable in test environments. The if in_test_environment(): block (lines 613-621) always raises a generic RpcRemoteException for any non-500 status code before reaching the 409 check. This means _control_based_slug_change in provisioning.py, which expects to catch RpcSlugCollisionException, will instead receive an uncaught generic exception in tests.
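A minimal sketch of one way to make that branch reachable, built only from the names in this report (the `in_test_environment()` helper and the exception constructor shown in the diff); the real body and signature of the method in service.py differ, and the stubs below are assumptions, not Sentry code.

```python
# Hypothetical re-ordering: check the 409 collision case before the
# in_test_environment() short-circuit so the specific exception is raised in
# tests as well. Stub classes stand in for the real hybridcloud definitions.
class RpcRemoteException(Exception):
    def __init__(self, service_name: str, method_name: str, message: str) -> None:
        super().__init__(f"{service_name}.{method_name}: {message}")


class RpcSlugCollisionException(RpcRemoteException):
    """Raised client-side when the remote RPC reports a slug collision."""


def in_test_environment() -> bool:
    return False  # stub; the real helper inspects the runtime environment


def raise_from_response_status_error(service_name: str, method_name: str, status_code: int) -> None:
    if status_code == 409:
        # Collision handled first, so the branch is reachable in every
        # environment, including tests.
        raise RpcSlugCollisionException(service_name, method_name, "Organization slug already in use")
    if in_test_environment():
        # Existing behaviour: any other non-500 becomes a generic remote error.
        raise RpcRemoteException(service_name, method_name, f"Request failed with status {status_code}")
    # ... remaining 500/other handling elided ...
```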
)
except RpcSlugCollisionException:
    raise OrganizationSlugCollisionException(
        f"Organization slug '{slug}' is already in use"
Local call path misses slug collision exception
High Severity
In monolith or control silo mode, the local implementation is called directly without going through the RPC endpoint. When a slug collision occurs, SlugCollisionRpcException is raised by impl.py, but _control_based_slug_change catches RpcSlugCollisionException instead. These are unrelated exception classes with no inheritance relationship, so the exception propagates uncaught rather than being translated to OrganizationSlugCollisionException.
Additional Locations (1)
raise RpcSlugCollisionException(
    self.service_name, self.method_name, "Organization slug already in use"
)
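A hedged sketch of the region-side handler catching both collision signals: the transport-level `RpcSlugCollisionException` (siloed deployments) and the implementation's `SlugCollisionRpcException` (monolith or control silo, where impl.py is called directly). The stub classes and the placeholder call are assumptions; only the except-clause idea is the point.

```python
class RpcSlugCollisionException(Exception):
    """Client-side exception raised when the RPC transport returns 409."""


class SlugCollisionRpcException(Exception):
    """Server-side exception raised directly by the local implementation."""


class OrganizationSlugCollisionException(Exception):
    """Existing domain exception expected by callers such as organization_details."""


def change_slug_via_control_silo(organization_id: int, slug: str) -> None:
    ...  # placeholder for the RPC (or direct local) update_organization_slug call


def control_based_slug_change(organization_id: int, slug: str) -> None:
    try:
        change_slug_via_control_silo(organization_id, slug)
    except (RpcSlugCollisionException, SlugCollisionRpcException):
        # Translate either collision signal into the one exception the
        # endpoint layer already knows how to turn into a 409 response.
        raise OrganizationSlugCollisionException(
            f"Organization slug '{slug}' is already in use"
        )
```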
I don't think we can generically assign all 409 response codes to slug collisions. I think we'd need a more general solution for getting exception messages out to handle this scenario though.
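One possible shape for that more general mechanism, sketched from the payload this PR already emits (`{"detail": "slug_collision", "message": ...}`): the endpoint sends a machine-readable error code and the client maps codes back to exception classes instead of keying off the bare HTTP status. All names below are hypothetical, not part of the PR.

```python
class RpcRemoteException(Exception):
    def __init__(self, service_name: str, method_name: str, message: str) -> None:
        super().__init__(f"{service_name}.{method_name}: {message}")


class RpcSlugCollisionException(RpcRemoteException):
    pass


# Registry of structured error codes the client knows how to translate.
ERROR_CODE_TO_EXCEPTION: dict[str, type[RpcRemoteException]] = {
    "slug_collision": RpcSlugCollisionException,
}


def raise_from_error_payload(service_name: str, method_name: str, payload: dict) -> None:
    # Fall back to the generic remote exception for unknown codes, but keep
    # the server-provided message instead of "Invalid service request".
    exc_cls = ERROR_CODE_TO_EXCEPTION.get(payload.get("detail", ""), RpcRemoteException)
    raise exc_cls(service_name, method_name, payload.get("message", "Invalid service request"))
```

With this shape, `raise_from_error_payload("organization_provisioning", "update_organization_slug", {"detail": "slug_collision", "message": "Organization slug already in use"})` raises the collision exception without assuming every 409 is a slug collision.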
This PR addresses an issue where updating an organization's slug could result in a generic `RpcRemoteException: Invalid service request` if the desired slug was already reserved by another organization in the control silo.

Root Cause:

The original flow had the region silo check slug availability against its local `sentry_organization` table. If no local conflict was found, it would call the control silo's `update_organization_slug` RPC. The control silo would then attempt to insert a `TEMPORARY_RENAME_ALIAS` into its global `OrganizationSlugReservation` table. If another organization had already reserved that slug, the `INSERT` failed with an `IntegrityError` (unique constraint violation). That `IntegrityError` was caught by a generic exception handler in the RPC endpoint, converted into a `ValidationError`, and returned as a 400 status, which surfaced as the unhelpful `RpcRemoteException` on the region side.

Solution:

- Control silo implementation (`src/sentry/hybridcloud/services/control_organization_provisioning/impl.py`): The `update_organization_slug` method now explicitly catches `IntegrityError` when attempting to save the `OrganizationSlugReservation`. If the error is due to a unique constraint violation on the slug, it raises a new, specific `SlugCollisionRpcException`.
- RPC endpoint (`src/sentry/api/endpoints/internal/rpc.py`): The internal RPC endpoint now specifically catches `SlugCollisionRpcException`. Instead of converting it to a generic `ValidationError` (400), it returns an HTTP 409 Conflict with a descriptive payload (`{"detail": "slug_collision", "message": "Organization slug already in use"}`).
- RPC client (`src/sentry/hybridcloud/rpc/service.py`): The `_raise_from_response_status_error` method now checks for a 409 status code. If detected, it raises a new `RpcSlugCollisionException` (a subclass of `RpcRemoteException`) on the client side.
- Region-side provisioning (`src/sentry/services/organization/provisioning.py`): The `_control_based_slug_change` method now catches `RpcSlugCollisionException` and re-raises it as the existing `OrganizationSlugCollisionException`.
- Organization details endpoint (`src/sentry/core/endpoints/organization_details.py`): This endpoint already has a `try/except OrganizationSlugCollisionException` block that returns a 409 Conflict with the user-friendly message "An organization with this slug already exists."

This change ensures that slug collision errors are properly identified and propagated through the hybrid cloud RPC system, providing a clear and actionable error message to the user.
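A minimal sketch of the server-side half described above, assuming Django's `IntegrityError` and a DRF-style `Response`; the real impl.py and rpc.py code is structured differently, and `reserve_temporary_rename_alias` is a placeholder for saving the `TEMPORARY_RENAME_ALIAS` `OrganizationSlugReservation`.

```python
from django.db import IntegrityError

from rest_framework import status
from rest_framework.response import Response


class SlugCollisionRpcException(Exception):
    """Raised by the control-silo implementation when the slug is already reserved."""


def reserve_temporary_rename_alias(organization_id: int, slug: str) -> None:
    ...  # placeholder: saving the reservation raises IntegrityError on a duplicate slug


def update_organization_slug(organization_id: int, slug: str) -> None:
    try:
        reserve_temporary_rename_alias(organization_id, slug)
    except IntegrityError:
        # Surface the unique-constraint violation as a collision-specific error
        # instead of letting a generic handler turn it into a 400 ValidationError.
        raise SlugCollisionRpcException("Organization slug already in use")


def collision_to_response(exc: Exception) -> Response:
    # Endpoint-side mapping: a collision becomes a 409 with a payload the
    # region-side client can recognize and translate.
    if isinstance(exc, SlugCollisionRpcException):
        return Response(
            {"detail": "slug_collision", "message": "Organization slug already in use"},
            status=status.HTTP_409_CONFLICT,
        )
    raise exc
```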
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.