@terry1purcell
Contributor

What problem does this PR solve?

Issue Number: close #65294

Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None
@ti-chi-bot added the release-note-none, size/L, component/statistics, and sig/planner labels on Jan 28, 2026
@terry1purcell requested a review from Copilot on January 28, 2026 19:04
Copilot AI left a comment
Pull request overview

This pull request aims to fix over-estimation issues in out-of-range row count calculations for index time ranges, addressing issue #65294 where estimates increased from ~52K to ~10M rows (193x). The changes refactor the OutOfRangeRowCount function in histogram.go to better bound maximum estimates and provide more accurate Min/Max estimate ranges.

Changes:

  • Refactored oneValue calculation logic to handle low NDV cases more conservatively
  • Restructured estRows and maxAddedRows calculations with separate logic for skew ratio scenarios
  • Updated test expectations to include MinEst and MaxEst fields in cardinality estimation outputs
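
The bounding idea in the overview can be sketched as follows. This is a minimal illustration, not the code in pkg/statistics/histogram.go: the `RowEstimate` struct and the `boundEstimate` helper are hypothetical names, the field names only mirror the Est, MinEst, and MaxEst values mentioned above, and the numbers in `main` are made up.

```go
package main

import (
	"fmt"
	"math"
)

// RowEstimate is a hypothetical container for this sketch. The field names
// mirror the Est/MinEst/MaxEst values mentioned in the overview.
type RowEstimate struct {
	Est    float64
	MinEst float64
	MaxEst float64
}

// boundEstimate is a hypothetical helper that clamps a raw out-of-range
// estimate into the interval [minEst, maxEst].
func boundEstimate(raw, minEst, maxEst float64) RowEstimate {
	// Guard against an inverted interval so the clamp stays well defined.
	if maxEst < minEst {
		maxEst = minEst
	}
	est := math.Max(minEst, math.Min(raw, maxEst))
	return RowEstimate{Est: est, MinEst: minEst, MaxEst: maxEst}
}

func main() {
	// Illustrative numbers only: a runaway raw estimate is capped at the
	// assumed maximum instead of being passed through unchanged.
	r := boundEstimate(10_000_000, 1_000, 60_000)
	fmt.Printf("est=%.0f min=%.0f max=%.0f\n", r.Est, r.MinEst, r.MaxEst)
}
```

The point of the sketch is only that a separate upper bound keeps a single bad input from inflating the final estimate by orders of magnitude.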

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/statistics/histogram.go | Refactors OutOfRangeRowCount function with new logic for calculating oneValue, estRows, minEst, and maxAddedRows to bound out-of-range estimates |
| tests/integrationtest/r/imdbload.result | Updates expected row count estimate from 1027.81 to 5283.37 for out-of-range query test case |
| pkg/planner/cardinality/testdata/cardinality_suite_out.json | Adds MinEst and MaxEst fields to test expectations for TestOutOfRangeEstimation cases |
@codecov

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 86.48649% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.9959%. Comparing base (32d5e26) to head (acd542e).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #65888        +/-   ##
================================================
+ Coverage   77.7807%   77.9959%   +0.2151%     
================================================
  Files          2001       1922        -79     
  Lines        545541     533946     -11595     
================================================
- Hits         424326     416456      -7870     
+ Misses       119553     117005      -2548     
+ Partials       1662        485      -1177     
| Flag | Coverage Δ |
| --- | --- |
| integration | 44.3478% <86.4864%> (-3.8170%) ⬇️ |


| Components | Coverage Δ |
| --- | --- |
| dumpling | 56.7974% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 48.3575% <ø> (-12.6259%) ⬇️ |
@terry1purcell terry1purcell changed the title planner: bound the max out of range estimate Jan 28, 2026
@pantheon-ai

pantheon-ai bot commented Jan 30, 2026

Hi @terry1purcell,

I noticed you mentioned me, but I couldn't find your Pantheon account linked to your GitHub account. Please connect your GitHub account to Pantheon first, and then try again.

Thank you!


@ti-chi-bot

ti-chi-bot bot commented Jan 30, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign terry1purcell for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

if float64(histNDV) < outOfRangeBetweenRate {
// If NDV is low, it may no longer be representative of the data since ANALYZE
// was last run. Use a default value against realtimeRowCount.
// If NDV is not representitative, then hg.NotNullCount may not be either.
Copilot AI Jan 30, 2026

Spelling error: "representitative" should be "representative".

Suggested change
- // If NDV is not representitative, then hg.NotNullCount may not be either.
+ // If NDV is not representative, then hg.NotNullCount may not be either.
Copilot uses AI. Check for mistakes.
Comment on lines +1197 to +1199
// Multiplying addedRows by 0.5 provides the assumption that 50% "addedRows" are inside
// the histogram range, and 50% (0.5) are out-of-range. Users can adjust this
// magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`.
Copilot AI Jan 30, 2026

The comment is misleading about how tidb_opt_risk_range_skew_ratio works. When skewRatio > 0, it doesn't simply replace the 0.5 multiplier - instead, it triggers a completely different calculation method using CalculateSkewRatioCounts at line 1227. The comment should clarify that skewRatio > 0 enables a skew-aware estimation strategy that provides min/max bounds, rather than just adjusting the multiplier.

Suggested change
- // Multiplying addedRows by 0.5 provides the assumption that 50% "addedRows" are inside
- // the histogram range, and 50% (0.5) are out-of-range. Users can adjust this
- // magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`.
+ // Multiplying addedRows by 0.5 encodes the default assumption that 50% of "addedRows"
+ // fall inside the histogram range and 50% are out-of-range. The session variable
+ // `tidb_opt_risk_range_skew_ratio` controls skew-aware out-of-range estimation; when set
+ // to a positive value it provides a custom risk factor and can enable a skew-sensitive
+ // estimation strategy that derives more conservative (min/max) bounds.
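
How a positive skew ratio might turn a min/max pair into a single estimate can be sketched as follows. This is only an illustration under stated assumptions: it assumes a linear interpolation between the two bounds and uses the hypothetical name `skewAdjusted`. It is not the body of CalculateSkewRatioCounts, which does not appear in this thread.

```go
package main

import "fmt"

// skewAdjusted is a hypothetical helper for this sketch. It assumes the skew
// ratio is a weight in [0, 1] that moves the estimate from the lower bound
// (minEst) toward the upper bound (maxEst).
func skewAdjusted(minEst, maxEst, skewRatio float64) float64 {
	// A non-positive ratio means the skew-aware path is not taken, so the
	// lower bound is returned unchanged.
	if skewRatio <= 0 {
		return minEst
	}
	// Clamp the ratio so the result can never exceed maxEst.
	if skewRatio > 1 {
		skewRatio = 1
	}
	return minEst + (maxEst-minEst)*skewRatio
}

func main() {
	// Illustrative numbers: with bounds [100, 300], a ratio of 0.5 lands
	// halfway between them.
	fmt.Println(skewAdjusted(100, 300, 0.5))
}
```

Under this reading, the ratio selects a point inside the bounds rather than scaling `addedRows` directly, which is the distinction the review comment draws.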
Comment on lines +1198 to +1204
// the histogram range, and 50% (0.5) are out-of-range. Users can adjust this
// magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`.
addedRowMultiplier := 0.5
if skewRatio > 0 {
addedRowMultiplier = skewRatio
}
estRows = (addedRows * addedRowMultiplier) * totalPercent
Copilot AI Jan 30, 2026

When skewRatio > 0, the addedRowMultiplier assignment on line 1202 and the estRows calculation on line 1204 are effectively unused because line 1227 recalculates the result using CalculateSkewRatioCounts. The estRows value only affects minEst (line 1225), which is then passed to CalculateSkewRatioCounts. Consider simplifying this logic by either: (1) removing the conditional assignment on lines 1201-1202 since it doesn't meaningfully affect the outcome when skewRatio > 0, or (2) restructuring the code to make it clearer how skewRatio affects the calculation.

Suggested change
- // the histogram range, and 50% (0.5) are out-of-range. Users can adjust this
- // magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`.
- addedRowMultiplier := 0.5
- if skewRatio > 0 {
- addedRowMultiplier = skewRatio
- }
- estRows = (addedRows * addedRowMultiplier) * totalPercent
+ // the histogram range, and 50% (0.5) are out-of-range. This provides a baseline estimate;
+ // the session variable `tidb_opt_risk_range_skew_ratio` is taken into account later when
+ // calculating the final skew-aware row counts.
+ estRows = (addedRows * 0.5) * totalPercent
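
The baseline formula in this suggestion can be worked through with concrete numbers. The sketch below is a standalone illustration: `addedRows` and `totalPercent` are the names used in the snippet, while the function name `baselineOutOfRangeRows` and the input values are hypothetical.

```go
package main

import "fmt"

// baselineOutOfRangeRows is a hypothetical wrapper around the expression
// from the suggestion: (addedRows * 0.5) * totalPercent.
// The 0.5 encodes the assumption that half of the rows added since the last
// ANALYZE fall outside the histogram range.
func baselineOutOfRangeRows(addedRows, totalPercent float64) float64 {
	const outOfRangeShare = 0.5
	return (addedRows * outOfRangeShare) * totalPercent
}

func main() {
	// Illustrative inputs: 1000 rows added, and the queried range covers
	// 25% of the out-of-range region, so 1000 * 0.5 * 0.25 rows.
	fmt.Println(baselineOutOfRangeRows(1000, 0.25))
}
```

With the multiplier fixed at 0.5, the baseline no longer depends on the skew ratio, so any skew adjustment happens in one place only.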
@ti-chi-bot

ti-chi-bot bot commented Jan 30, 2026

@terry1purcell: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-unit-test-next-gen | acd542e | link | true | /test pull-unit-test-next-gen |
| idc-jenkins-ci-tidb/unit-test | acd542e | link | true | /test unit-test |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

  • component/statistics
  • release-note-none: Denotes a PR that doesn't merit a release note.
  • sig/planner: SIG: Planner
  • size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

1 participant