impr(workflow engine): Add caching for detector lookup by data source #106085
base: master
Conversation
saponifi3d
left a comment
let me know if it'd help to hop on a call / discuss any of these comments too! this whole area is pretty confusing, even for those with context. haha
Two review comments on src/sentry/workflow_engine/endpoints/validators/base/detector.py are now outdated and resolved.
saponifi3d
left a comment
Up to you on flattening the cache stuff and pulling it off of the detector model; I would recommend it, as it will allow us to DRY up the code around purging / updating the cache.
saponifi3d
left a comment
love the change with the transaction!
mind adding some tests to make sure all the invalidation stuff is working as expected?
One specific test case I'd like to include: change a detector's enabled attribute to false, save it, and check that this correctly invalidates the cache. (This is a code path that exists with Uptime; they have some code that will disable a detector based on billing information.)
Another test case would be on the detector endpoint: with a hot cache, trigger a successful PUT / POST and then ensure the cache is correctly invalidated.
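A minimal sketch of those two cases, assuming the existing fixtures in the detector details test class (self.detector, self.organization, an attached data source) and the bulk_fetch_enabled_detectors helper from this PR; the endpoint parameters and request payload are illustrative, not the PR's actual test code:

from sentry.workflow_engine.processors.data_source import bulk_fetch_enabled_detectors

def test_disable_invalidates_cache(self):
    data_source = self.detector.data_sources.first()
    assert data_source is not None

    # Warm the cache, then disable the detector the way the uptime billing
    # path does (via Detector.update) and expect a fresh, empty result.
    assert bulk_fetch_enabled_detectors(data_source.source_id, data_source.type)
    self.detector.update(enabled=False)
    assert bulk_fetch_enabled_detectors(data_source.source_id, data_source.type) == []

def test_put_invalidates_cache(self):
    data_source = self.detector.data_sources.first()
    assert data_source is not None

    # Warm the cache, rename the detector via a successful PUT, then confirm a
    # fresh fetch reflects the new name rather than a stale cached instance.
    bulk_fetch_enabled_detectors(data_source.source_id, data_source.type)
    self.get_success_response(
        self.organization.slug, self.detector.id, method="put", name="Renamed Detector"
    )
    fetched = bulk_fetch_enabled_detectors(data_source.source_id, data_source.type)
    assert [d.name for d in fetched] == ["Renamed Detector"]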
Cursor Bugbot has reviewed your changes and found 1 potential issue.
src/sentry/workflow_engine/processors/data_source.py

      This will also prefetch all the subsequent data models for evaluating the detector.
      """
-     return list(
-         Detector.objects.filter(
-             enabled=True, data_sources__source_id=source_id, data_sources__type=query_type
+     cache_key = get_detectors_by_data_source_cache_key(source_id, query_type)
+     detectors = cache.get(cache_key)
+     if detectors is None:
+         detectors = list(
+             Detector.objects.filter(
+                 data_sources__source_id=source_id,
+                 data_sources__type=query_type,
+                 enabled=True,
+             )
+             .select_related("workflow_condition_group")
+             .prefetch_related("workflow_condition_group__conditions")
+             .distinct()
+             .order_by("id")
The cache storage mechanism doesn't have a version or cache invalidation strategy for the Detector objects themselves. While individual cache entries are invalidated correctly by invalidate_detectors_by_data_source_cache(), there's a risk that stale Detector model instances (with outdated attributes like name, enabled, etc.) could be cached and returned. Consider implementing cache versioning or model change hooks (e.g., via Django signals on Detector.save()) to ensure consistency, especially since detector attributes can be modified directly via .update() calls without always triggering the cache invalidation callbacks.
Severity: MEDIUM
Location: src/sentry/workflow_engine/processors/data_source.py#L14-L30
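One way to add the model-change hook Bugbot suggests is a post_save receiver; this is a sketch only, assuming the invalidate_detectors_by_data_source_cache helper and module paths from this PR, and note that plain queryset .update() calls bypass post_save, so it still would not cover that path on its own:

from django.db.models.signals import post_save
from django.dispatch import receiver

from sentry.workflow_engine.models import Detector
from sentry.workflow_engine.processors.data_source import (
    invalidate_detectors_by_data_source_cache,
)

@receiver(post_save, sender=Detector)
def invalidate_detector_cache_on_save(sender, instance, **kwargs):
    # Purge every cache entry keyed on this detector's data sources so the next
    # bulk_fetch_enabled_detectors call re-reads from the database.
    for source_id, source_type in instance.data_sources.values_list("source_id", "type"):
        invalidate_detectors_by_data_source_cache(source_id, source_type)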
src/sentry/uptime/subscriptions/subscriptions.py

  def delete_uptime_detector(detector: Detector):
      uptime_subscription = get_uptime_subscription(detector)

      # Capture data source info before any state changes
      data_sources = list(detector.data_sources.values_list("source_id", "type"))

      remove_uptime_seat(detector)
      detector.update(status=ObjectStatus.PENDING_DELETION)
      RegionScheduledDeletion.schedule(detector, days=0)
      delete_uptime_subscription(uptime_subscription)

      for source_id, source_type in data_sources:
          invalidate_detectors_by_data_source_cache(source_id, source_type)
In delete_uptime_detector() at line 585-598, the cache is invalidated synchronously (not via transaction.on_commit()). This differs from all other functions in this file that use transaction.on_commit(). If the subsequent delete_uptime_subscription() or RegionScheduledDeletion.schedule() calls modify database state, the synchronous invalidation could occur before the transaction commits, creating a brief window where the cache is cleared but the detector still exists in the database. For consistency and safety, this should use transaction.on_commit() like the other functions.
Severity: HIGH
Location: src/sentry/uptime/subscriptions/subscriptions.py#L585-L598
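If that suggestion is taken, the tail of delete_uptime_detector() could defer the loop until commit with transaction.on_commit; a sketch using the names from the fragment above (the using= routing argument is an assumption):

from django.db import router, transaction

def delete_uptime_detector(detector: Detector):
    uptime_subscription = get_uptime_subscription(detector)

    # Capture data source info before any state changes.
    data_sources = list(detector.data_sources.values_list("source_id", "type"))

    remove_uptime_seat(detector)
    detector.update(status=ObjectStatus.PENDING_DELETION)
    RegionScheduledDeletion.schedule(detector, days=0)
    delete_uptime_subscription(uptime_subscription)

    def invalidate_cache():
        for source_id, source_type in data_sources:
            invalidate_detectors_by_data_source_cache(source_id, source_type)

    # Runs only after the surrounding transaction (if any) commits; with no
    # transaction open, Django executes the callback immediately.
    transaction.on_commit(invalidate_cache, using=router.db_for_write(Detector))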
src/sentry/workflow_engine/endpoints/validators/base/detector.py

          instance.save()

          self._invalidate_cache_by_detector(instance)
The _invalidate_cache_by_detector() method (lines 145-149) is called in both update() (line 214) and delete() (line 236). However, in update() it's called INSIDE the atomic transaction block, while in other locations like uptime subscriptions it's called via transaction.on_commit(). This inconsistency could cause cache invalidation to occur before the transaction commits. For consistency with other parts of the codebase and to avoid race conditions, consider wrapping the _invalidate_cache_by_detector() call in update() with a transaction.on_commit() callback instead of calling it synchronously.
Severity: HIGH
Location: src/sentry/workflow_engine/endpoints/validators/base/detector.py#L145-L214
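For the update() path the same fix would mean registering the invalidation from inside the atomic block; Django only runs on_commit callbacks after the outermost transaction commits, so registering inside atomic is safe. A sketch assuming the atomic block and the _invalidate_cache_by_detector helper shown above:

from django.db import router, transaction

def update(self, instance, validated_data):
    with transaction.atomic(router.db_for_write(Detector)):
        ...  # apply validated_data to the detector, condition group, etc.
        instance.save()

        # Deferred until the outermost transaction commits, instead of the
        # current synchronous call inside the atomic block.
        transaction.on_commit(
            lambda: self._invalidate_cache_by_detector(instance),
            using=router.db_for_write(Detector),
        )
    return instance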
src/sentry/receivers/rule_snooze.py

      detector = alert_rule_detector.detector
      detector.update(enabled=is_enabled)

      # Invalidate cache after transaction commits (signal may be called within a transaction)
      data_sources = list(detector.data_sources.values_list("source_id", "type"))

      def invalidate_cache():
          for source_id, source_type in data_sources:
In _update_workflow_engine_models() at line 23-30, the code defines a nested function invalidate_cache() inside a closure and then schedules it via transaction.on_commit(). This is correct, but ensure the signal handler context doesn't already have an active transaction that might not commit as expected. Django signals can be tricky with transactions. Document or verify that this signal is always called within a request-response cycle where transaction behavior is predictable.
Severity: MEDIUM
Location: src/sentry/receivers/rule_snooze.py#L23-L30
tests/sentry/workflow_engine/endpoints/test_organization_detector_details.py

  from sentry.workflow_engine.processors.data_source import bulk_fetch_enabled_detectors

          data_source = self.detector.data_sources.first()
          assert data_source is not None

          # Warm the cache
          cached_detectors = bulk_fetch_enabled_detectors(data_source.source_id, data_source.type)
          assert len(cached_detectors) == 1
          assert cached_detectors[0].id == self.detector.id

          valid_data = {
              "id": self.detector.id,
              "projectId": self.project.id,
              "name": "Updated Detector Name",
              "type": MetricIssue.slug,
              "dateCreated": self.detector.date_added,
              "dateUpdated": timezone.now(),
              "conditionGroup": {
                  "id": self.data_condition_group.id,
                  "organizationId": self.organization.id,
                  "logicType": self.data_condition_group.logic_type,
                  "conditions": [
                      {
                          "id": self.condition.id,
                          "comparison": 100,
                          "type": Condition.GREATER,
                          "conditionResult": DetectorPriorityLevel.HIGH,
                          "conditionGroupId": self.data_condition_group.id,
                      },
                      {
                          "id": self.resolve_condition.id,
                          "comparison": 100,
                          "type": Condition.LESS_OR_EQUAL,
                          "conditionResult": DetectorPriorityLevel.OK,
                          "conditionGroupId": self.data_condition_group.id,
                      },
                  ],
              },
              "config": self.detector.config,
          }
In test_put_invalidates_cache() at line 1071-1110, the test verifies that the cache is invalidated after a PUT request. However, it only tests one data source. If a detector has multiple data sources (allowed by the system depending on configuration), ensure the cache is invalidated for ALL of them. Consider adding a test case with multiple data sources to verify comprehensive coverage.
Severity: MEDIUM
Location: tests/sentry/workflow_engine/endpoints/test_organization_detector_details.py#L1071-L1110
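A possible follow-up test along those lines; a sketch that assumes the workflow_engine test fixtures create_data_source / create_data_source_detector exist with these signatures and that the PUT payload can be reduced to a rename:

def test_put_invalidates_cache_for_all_data_sources(self):
    # Attach a second data source to the detector (fixture helpers assumed).
    second = self.create_data_source(organization=self.organization)
    self.create_data_source_detector(data_source=second, detector=self.detector)

    # Warm a cache entry per data source.
    for ds in self.detector.data_sources.all():
        assert bulk_fetch_enabled_detectors(ds.source_id, ds.type)

    self.get_success_response(
        self.organization.slug, self.detector.id, method="put", name="Renamed Detector"
    )

    # Every entry should have been purged, so fresh fetches see the new name.
    for ds in self.detector.data_sources.all():
        fetched = bulk_fetch_enabled_detectors(ds.source_id, ds.type)
        assert [d.name for d in fetched] == ["Renamed Detector"]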
Okay, basically we're caching the bulk query and adapting the single-detector query in the subscription processor to work with that.
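For readers without the full diff, the caching layer described here has roughly the following shape; this is a sketch reconstructed from the fragments above, with the key format, TTL, and cache.set call being assumptions rather than the PR's exact code:

from django.core.cache import cache

from sentry.workflow_engine.models import Detector

DETECTORS_BY_DATA_SOURCE_TTL = 300  # assumed TTL, in seconds

def get_detectors_by_data_source_cache_key(source_id, query_type):
    # Key format is illustrative; the PR defines the real one.
    return f"workflow-engine:detectors-by-data-source:{query_type}:{source_id}"

def invalidate_detectors_by_data_source_cache(source_id, query_type):
    cache.delete(get_detectors_by_data_source_cache_key(source_id, query_type))

def bulk_fetch_enabled_detectors(source_id, query_type):
    # Cache the bulk query; the subscription processor's single-detector lookup
    # reads through this same entry instead of issuing its own query.
    cache_key = get_detectors_by_data_source_cache_key(source_id, query_type)
    detectors = cache.get(cache_key)
    if detectors is None:
        detectors = list(
            Detector.objects.filter(
                data_sources__source_id=source_id,
                data_sources__type=query_type,
                enabled=True,
            )
            .select_related("workflow_condition_group")
            .prefetch_related("workflow_condition_group__conditions")
            .distinct()
            .order_by("id")
        )
        cache.set(cache_key, detectors, timeout=DETECTORS_BY_DATA_SOURCE_TTL)
    return detectors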