planner: bound the max out of range estimate | tidb-test=pr/2672 #65888
Pull request overview
This pull request aims to fix over-estimation issues in out-of-range row count calculations for index time ranges, addressing issue #65294 where estimates increased from ~52K to ~10M rows (193x). The changes refactor the OutOfRangeRowCount function in histogram.go to better bound maximum estimates and provide more accurate Min/Max estimate ranges.
Changes:
- Refactored `oneValue` calculation logic to handle low NDV cases more conservatively
- Restructured `estRows` and `maxAddedRows` calculations with separate logic for skew ratio scenarios
- Updated test expectations to include MinEst and MaxEst fields in cardinality estimation outputs
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/statistics/histogram.go | Refactors OutOfRangeRowCount function with new logic for calculating oneValue, estRows, minEst, and maxAddedRows to bound out-of-range estimates |
| tests/integrationtest/r/imdbload.result | Updates expected row count estimate from 1027.81 to 5283.37 for out-of-range query test case |
| pkg/planner/cardinality/testdata/cardinality_suite_out.json | Adds MinEst and MaxEst fields to test expectations for TestOutOfRangeEstimation cases |
Codecov Report
@@ Coverage Diff @@
## master #65888 +/- ##
================================================
+ Coverage 77.7807% 77.9959% +0.2151%
================================================
Files 2001 1922 -79
Lines 545541 533946 -11595
================================================
- Hits 424326 416456 -7870
+ Misses 119553 117005 -2548
+ Partials 1662 485 -1177
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| if float64(histNDV) < outOfRangeBetweenRate { |
| // If NDV is low, it may no longer be representative of the data since ANALYZE |
| // was last run. Use a default value against realtimeRowCount. |
| // If NDV is not representitative, then hg.NotNullCount may not be either. |
Copilot AI, Jan 30, 2026
Spelling error: "representitative" should be "representative".
Suggested change:
| - // If NDV is not representitative, then hg.NotNullCount may not be either. |
| + // If NDV is not representative, then hg.NotNullCount may not be either. |
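The low-NDV fallback described in the quoted comment can be sketched roughly as below. This is a minimal illustration, not the code from `histogram.go`: the helper name `oneValueEstimate`, its parameters, the constant value of 100, and the exact fallback formula are all assumptions made for the example.

```go
package main

import "fmt"

// outOfRangeBetweenRate is assumed to be 100 for this sketch; the real
// constant lives in TiDB and its value is not shown in this thread.
const outOfRangeBetweenRate = 100

// oneValueEstimate is a hypothetical helper. It returns a per-value row
// count, falling back to a default fraction of realtimeRowCount when the
// histogram NDV is too low to be trusted.
func oneValueEstimate(histNDV int64, notNullCount, realtimeRowCount float64) float64 {
	if float64(histNDV) < outOfRangeBetweenRate {
		// Low NDV: neither NDV nor notNullCount may still be representative,
		// so derive the value from the realtime row count instead.
		return realtimeRowCount / outOfRangeBetweenRate
	}
	// Otherwise assume rows are spread evenly over the distinct values.
	return notNullCount / float64(histNDV)
}

func main() {
	fmt.Println(oneValueEstimate(4, 1000, 50000))   // low NDV path
	fmt.Println(oneValueEstimate(200, 1000, 50000)) // regular path
}
```

The point of the sketch is only that the low-NDV branch ignores both `histNDV` and `notNullCount`, which is what the quoted comment argues for.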
| // Multiplying addedRows by 0.5 provides the assumption that 50% "addedRows" are inside |
| // the histogram range, and 50% (0.5) are out-of-range. Users can adjust this |
| // magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`. |
Copilot AI, Jan 30, 2026
The comment is misleading about how tidb_opt_risk_range_skew_ratio works. When skewRatio > 0, it doesn't simply replace the 0.5 multiplier - instead, it triggers a completely different calculation method using CalculateSkewRatioCounts at line 1227. The comment should clarify that skewRatio > 0 enables a skew-aware estimation strategy that provides min/max bounds, rather than just adjusting the multiplier.
Suggested change:
| - // Multiplying addedRows by 0.5 provides the assumption that 50% "addedRows" are inside |
| - // the histogram range, and 50% (0.5) are out-of-range. Users can adjust this |
| - // magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`. |
| + // Multiplying addedRows by 0.5 encodes the default assumption that 50% of "addedRows" |
| + // fall inside the histogram range and 50% are out-of-range. The session variable |
| + // `tidb_opt_risk_range_skew_ratio` controls skew-aware out-of-range estimation; when set |
| + // to a positive value it provides a custom risk factor and can enable a skew-sensitive |
| + // estimation strategy that derives more conservative (min/max) bounds. |
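To make the two code paths in this comment concrete, here is a rough sketch of a default estimate versus a skew-aware one. Everything in it is an assumption for illustration: `blendSkew` is a hypothetical stand-in and does not reproduce the signature or behaviour of `CalculateSkewRatioCounts`, and deriving the upper bound as `minEst + maxAddedRows` is a guess, not something the diff confirms.

```go
package main

import "fmt"

// blendSkew is a hypothetical stand-in for a skew-aware combiner. It
// interpolates between a lower and an upper bound, with skewRatio
// clamped to [0, 1].
func blendSkew(minEst, maxEst, skewRatio float64) float64 {
	if skewRatio < 0 {
		skewRatio = 0
	}
	if skewRatio > 1 {
		skewRatio = 1
	}
	return minEst + (maxEst-minEst)*skewRatio
}

// estimateOutOfRange shows the branching: with skewRatio <= 0 the 0.5
// multiplier alone decides the result; with skewRatio > 0 the baseline
// becomes a lower bound and a different strategy produces the estimate.
func estimateOutOfRange(addedRows, totalPercent, maxAddedRows, skewRatio float64) float64 {
	baseline := (addedRows * 0.5) * totalPercent
	if skewRatio <= 0 {
		return baseline
	}
	minEst := baseline
	maxEst := minEst + maxAddedRows // assumed upper bound, for illustration only
	return blendSkew(minEst, maxEst, skewRatio)
}

func main() {
	fmt.Println(estimateOutOfRange(1000, 0.25, 400, 0))   // default path
	fmt.Println(estimateOutOfRange(1000, 0.25, 400, 0.5)) // skew-aware path
}
```

Under these assumptions a positive ratio does not scale the 0.5 multiplier at all; it selects a point between the two bounds, which is the distinction the review comment asks the code comment to spell out.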
| // the histogram range, and 50% (0.5) are out-of-range. Users can adjust this |
| // magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`. |
| addedRowMultiplier := 0.5 |
| if skewRatio > 0 { |
| addedRowMultiplier = skewRatio |
| } |
| estRows = (addedRows * addedRowMultiplier) * totalPercent |
Copilot AI, Jan 30, 2026
When skewRatio > 0, the addedRowMultiplier assignment on line 1202 and the estRows calculation on line 1204 are effectively unused because line 1227 recalculates the result using CalculateSkewRatioCounts. The estRows value only affects minEst (line 1225), which is then passed to CalculateSkewRatioCounts. Consider simplifying this logic by either: (1) removing the conditional assignment on lines 1201-1202 since it doesn't meaningfully affect the outcome when skewRatio > 0, or (2) restructuring the code to make it clearer how skewRatio affects the calculation.
Suggested change:
| - // the histogram range, and 50% (0.5) are out-of-range. Users can adjust this |
| - // magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`. |
| - addedRowMultiplier := 0.5 |
| - if skewRatio > 0 { |
| - addedRowMultiplier = skewRatio |
| - } |
| - estRows = (addedRows * addedRowMultiplier) * totalPercent |
| + // the histogram range, and 50% (0.5) are out-of-range. This provides a baseline estimate; |
| + // the session variable `tidb_opt_risk_range_skew_ratio` is taken into account later when |
| + // calculating the final skew-aware row counts. |
| + estRows = (addedRows * 0.5) * totalPercent |
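One way to check whether this suggestion preserves behaviour is to compare the two baselines side by side. The sketch below copies only the arithmetic from the quoted lines; the function names `baselineOriginal` and `baselineSuggested` and the sample inputs are made up for the example, and how `estRows` later feeds `minEst` is not modelled.

```go
package main

import "fmt"

// baselineOriginal mirrors the quoted pre-suggestion lines: the multiplier
// defaults to 0.5 and is replaced by skewRatio when skewRatio is positive.
func baselineOriginal(addedRows, totalPercent, skewRatio float64) float64 {
	addedRowMultiplier := 0.5
	if skewRatio > 0 {
		addedRowMultiplier = skewRatio
	}
	return (addedRows * addedRowMultiplier) * totalPercent
}

// baselineSuggested mirrors the suggested replacement: always 0.5.
func baselineSuggested(addedRows, totalPercent float64) float64 {
	return (addedRows * 0.5) * totalPercent
}

func main() {
	const addedRows, totalPercent = 1000.0, 0.25
	for _, skewRatio := range []float64{0, 0.25, 0.5, 1} {
		fmt.Printf("skewRatio=%.2f original=%.1f suggested=%.1f\n",
			skewRatio,
			baselineOriginal(addedRows, totalPercent, skewRatio),
			baselineSuggested(addedRows, totalPercent))
	}
}
```

The two baselines agree only when `skewRatio` is 0 or 0.5. Since the comment notes that `estRows` still feeds `minEst`, the simplification would change the lower bound for other positive ratios, so it is worth confirming that this is intended before applying it.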
What problem does this PR solve?
Issue Number: close #65294
Problem Summary:
What changed and how does it work?
Check List
Tests
Side effects
Documentation
Release note