planner: bound the max out of range estimate | tidb-test=pr/2672 #65888

terry1purcell · 2026-01-28T18:53:37Z

What problem does this PR solve?

Issue Number: close #65294

Problem Summary:

What changed and how does it work?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Copilot

Pull request overview

This pull request aims to fix over-estimation issues in out-of-range row count calculations for index time ranges, addressing issue #65294 where estimates increased from ~52K to ~10M rows (193x). The changes refactor the OutOfRangeRowCount function in histogram.go to better bound maximum estimates and provide more accurate Min/Max estimate ranges.

Changes:

Refactored oneValue calculation logic to handle low NDV cases more conservatively
Restructured estRows and maxAddedRows calculations with separate logic for skew ratio scenarios
Updated test expectations to include MinEst and MaxEst fields in cardinality estimation outputs
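
As a rough illustration of what bounding an estimate means here, the sketch below clamps a raw estimate into a [MinEst, MaxEst] range. The helper name `boundEstimate` and the numbers are assumptions for illustration only; they are not taken from histogram.go, and the PR's actual logic may differ.

```go
package main

import (
	"fmt"
	"math"
)

// boundEstimate clamps est into [minEst, maxEst].
// Hypothetical helper for illustration; not the code in histogram.go.
func boundEstimate(est, minEst, maxEst float64) float64 {
	if maxEst < minEst {
		// Guard against inverted bounds so the result stays well defined.
		maxEst = minEst
	}
	return math.Max(minEst, math.Min(est, maxEst))
}

func main() {
	// Illustrative numbers only, loosely echoing the ~52K vs ~10M gap in #65294.
	fmt.Println(boundEstimate(10_000_000, 1_000, 52_000))
	fmt.Println(boundEstimate(500, 1_000, 52_000))
}
```

An over-estimate is pulled down to the upper bound, and an under-estimate is lifted to the lower bound; values already inside the range pass through unchanged.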

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
pkg/statistics/histogram.go	Refactors OutOfRangeRowCount function with new logic for calculating oneValue, estRows, minEst, and maxAddedRows to bound out-of-range estimates
tests/integrationtest/r/imdbload.result	Updates expected row count estimate from 1027.81 to 5283.37 for out-of-range query test case
pkg/planner/cardinality/testdata/cardinality_suite_out.json	Adds MinEst and MaxEst fields to test expectations for TestOutOfRangeEstimation cases

codecov · 2026-01-28T19:21:21Z

Codecov Report

❌ Patch coverage is 86.48649% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.9959%. Comparing base (32d5e26) to head (acd542e).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@               Coverage Diff                @@
##             master     #65888        +/-   ##
================================================
+ Coverage   77.7807%   77.9959%   +0.2151%     
================================================
  Files          2001       1922        -79     
  Lines        545541     533946     -11595     
================================================
- Hits         424326     416456      -7870     
+ Misses       119553     117005      -2548     
+ Partials       1662        485      -1177

Flag	Coverage Δ
integration	`44.3478% <86.4864%> (-3.8170%)`	⬇️

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
dumpling	`56.7974% <ø> (ø)`
parser	`∅ <ø> (∅)`
br	`48.3575% <ø> (-12.6259%)`	⬇️

pantheon-ai · 2026-01-30T04:33:34Z

Hi @terry1purcell,

I noticed you mentioned me, but I couldn't find your Pantheon account linked to your GitHub account. Please connect your GitHub account to Pantheon first, and then try again.

Thank you!

Learn more about Pantheon AI

ti-chi-bot · 2026-01-30T04:33:37Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign terry1purcell for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Copilot · 2026-01-30T04:49:24Z

pkg/statistics/histogram.go

+	if float64(histNDV) < outOfRangeBetweenRate {
+		// If NDV is low, it may no longer be representative of the data since ANALYZE
+		// was last run. Use a default value against realtimeRowCount.
+		// If NDV is not representitative, then hg.NotNullCount may not be either.


Spelling error: "representitative" should be "representative".

Suggested change

-		// If NDV is not representitative, then hg.NotNullCount may not be either.
+		// If NDV is not representative, then hg.NotNullCount may not be either.

Copilot · 2026-01-30T04:49:24Z

pkg/statistics/histogram.go

+		// Multiplying addedRows by 0.5 provides the assumption that 50% "addedRows" are inside
+		// the histogram range, and 50% (0.5) are out-of-range. Users can adjust this
+		// magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`.


The comment is misleading about how tidb_opt_risk_range_skew_ratio works. When skewRatio > 0, it doesn't simply replace the 0.5 multiplier - instead, it triggers a completely different calculation method using CalculateSkewRatioCounts at line 1227. The comment should clarify that skewRatio > 0 enables a skew-aware estimation strategy that provides min/max bounds, rather than just adjusting the multiplier.

Suggested change

-		// Multiplying addedRows by 0.5 provides the assumption that 50% "addedRows" are inside
-		// the histogram range, and 50% (0.5) are out-of-range. Users can adjust this
-		// magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`.
+		// Multiplying addedRows by 0.5 encodes the default assumption that 50% of "addedRows"
+		// fall inside the histogram range and 50% are out-of-range. The session variable
+		// `tidb_opt_risk_range_skew_ratio` controls skew-aware out-of-range estimation; when set
+		// to a positive value it provides a custom risk factor and can enable a skew-sensitive
+		// estimation strategy that derives more conservative (min/max) bounds.
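
The default 50% assumption discussed in this comment can be worked through with small numbers. The sketch below uses the expression quoted in the diff, `(addedRows * 0.5) * totalPercent`; the function name `defaultOutOfRangeEst` is hypothetical, and `totalPercent` is assumed here to be the fraction of the out-of-range region covered by the predicate, which the excerpt does not spell out.

```go
package main

import "fmt"

// defaultOutOfRangeEst applies the fixed assumption that half of the rows
// added since the last ANALYZE fall outside the histogram range.
// Hypothetical helper; only the arithmetic mirrors the quoted diff.
func defaultOutOfRangeEst(addedRows, totalPercent float64) float64 {
	return (addedRows * 0.5) * totalPercent
}

func main() {
	// 10,000 added rows, predicate assumed to cover 25% of the
	// out-of-range region: 10000 * 0.5 * 0.25.
	fmt.Println(defaultOutOfRangeEst(10000, 0.25))
}
```

With these inputs, 5,000 rows are assumed to be out of range and a quarter of them are attributed to the predicate.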

Copilot · 2026-01-30T04:49:24Z

pkg/statistics/histogram.go

+		// the histogram range, and 50% (0.5) are out-of-range. Users can adjust this
+		// magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`.
+		addedRowMultiplier := 0.5
+		if skewRatio > 0 {
+			addedRowMultiplier = skewRatio
+		}
+		estRows = (addedRows * addedRowMultiplier) * totalPercent


When skewRatio > 0, the addedRowMultiplier assignment on line 1202 and the estRows calculation on line 1204 are effectively unused because line 1227 recalculates the result using CalculateSkewRatioCounts. The estRows value only affects minEst (line 1225), which is then passed to CalculateSkewRatioCounts. Consider simplifying this logic by either: (1) removing the conditional assignment on lines 1201-1202 since it doesn't meaningfully affect the outcome when skewRatio > 0, or (2) restructuring the code to make it clearer how skewRatio affects the calculation.

Suggested change

-		// the histogram range, and 50% (0.5) are out-of-range. Users can adjust this
-		// magic number by setting the session variable `tidb_opt_risk_range_skew_ratio`.
-		addedRowMultiplier := 0.5
-		if skewRatio > 0 {
-			addedRowMultiplier = skewRatio
-		}
-		estRows = (addedRows * addedRowMultiplier) * totalPercent
+		// the histogram range, and 50% (0.5) are out-of-range. This provides a baseline estimate;
+		// the session variable `tidb_opt_risk_range_skew_ratio` is taken into account later when
+		// calculating the final skew-aware row counts.
+		estRows = (addedRows * 0.5) * totalPercent
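
One way to read the restructuring suggested here is to keep the skew ratio out of the baseline and apply it in exactly one later step. The sketch below is an illustration under that assumption. `skewAdjusted` is a hypothetical stand-in that interpolates from the minimum toward the maximum; it is not the signature or body of `CalculateSkewRatioCounts`, which this excerpt does not show.

```go
package main

import "fmt"

// baselineEst applies the fixed 50% in-range/out-of-range assumption.
func baselineEst(addedRows, totalPercent float64) float64 {
	return (addedRows * 0.5) * totalPercent
}

// skewAdjusted is a HYPOTHETICAL stand-in for the skew-aware step. It moves
// from minEst toward maxEst in proportion to skewRatio (capped at 1).
func skewAdjusted(minEst, maxEst, skewRatio float64) float64 {
	if skewRatio <= 0 || maxEst <= minEst {
		return minEst
	}
	if skewRatio > 1 {
		skewRatio = 1
	}
	return minEst + (maxEst-minEst)*skewRatio
}

// estimate computes the baseline without skewRatio, then applies the
// skew adjustment once, so the ratio has a single, visible effect.
func estimate(addedRows, totalPercent, maxEst, skewRatio float64) float64 {
	minEst := baselineEst(addedRows, totalPercent)
	return skewAdjusted(minEst, maxEst, skewRatio)
}

func main() {
	fmt.Println(estimate(10000, 0.25, 5000, 0))   // baseline only
	fmt.Println(estimate(10000, 0.25, 5000, 0.5)) // halfway toward maxEst
}
```

Structuring the calculation this way makes it explicit that a zero skew ratio returns the baseline, while a positive ratio only ever influences the later adjustment.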

ti-chi-bot · 2026-01-30T05:00:07Z

@terry1purcell: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
pull-unit-test-next-gen	`acd542e`	link	true	`/test pull-unit-test-next-gen`
idc-jenkins-ci-tidb/unit-test	`acd542e`	link	true	`/test unit-test`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

planner: bound the max out of range estimate

814e405

ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. component/statistics sig/planner SIG: Planner labels Jan 28, 2026

terry1purcell requested a review from Copilot January 28, 2026 19:04

Copilot started reviewing on behalf of terry1purcell January 28, 2026 19:04

Copilot AI reviewed Jan 28, 2026

terry1purcell changed the title ~~planner: bound the max out of range estimate~~ Jan 28, 2026

terry1purcell and others added 4 commits January 28, 2026 12:21

testcase1

e54196b

Merge branch 'pingcap:master' into outofrangeupper

fd2bba7

Merge branch 'master' into outofrangeupper

e3473fd

testcase updates after refactor

acd542e

terry1purcell requested a review from Copilot January 30, 2026 04:38

Copilot started reviewing on behalf of terry1purcell January 30, 2026 04:39

Copilot AI reviewed Jan 30, 2026


