Add rebalance advisoryPartitionSizeInBytes config#7531

Open
ulysses-you wants to merge 3 commits into apache:master from ulysses-you:rebalance

@ulysses-you

Contributor

Why are the changes needed?

Starting from Spark 3.5, the RebalancePartitions operator supports an optAdvisoryPartitionSize parameter to specify the desired partition size after shuffle. This allows Kyuubi to directly control the target output size of the rebalance shuffle, helping AQE's CoalesceShufflePartitions to produce appropriately sized output files and avoid small file issues.

Currently, RebalanceBeforeWriting inserts RebalancePartitions without any advisory size, leaving AQE to use the global spark.sql.adaptive.advisoryPartitionSizeInBytes which may not be suitable for the final write stage.
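To make the effect of the advisory size concrete, here is a simplified model in Scala. It is not Spark's actual ShufflePartitionsUtil logic: it assumes contiguous shuffle partitions are greedily merged until the next one would overshoot the target, and that the target is the per-operator optAdvisoryPartitionSize when present, otherwise the global default (assumed here to be 64 MB). The object and method names are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified model of AQE partition coalescing, NOT Spark's ShufflePartitionsUtil.
object CoalesceSketch {
  // Assumed default of spark.sql.adaptive.advisoryPartitionSizeInBytes (64 MB).
  val GlobalAdvisorySize: Long = 64L * 1024 * 1024

  // A per-operator advisory size (optAdvisoryPartitionSize) wins over the global value.
  def targetSize(optAdvisoryPartitionSize: Option[Long]): Long =
    optAdvisoryPartitionSize.getOrElse(GlobalAdvisorySize)

  // Greedily merges contiguous partitions; each result roughly maps to one output file.
  def coalesce(partitionSizes: Seq[Long], target: Long): Seq[Long] = {
    val merged = ArrayBuffer.empty[Long]
    var current = 0L
    for (size <- partitionSizes) {
      // Close the current group if adding this partition would overshoot the target.
      if (current > 0 && current + size > target) {
        merged += current
        current = 0L
      }
      current += size
    }
    if (current > 0) merged += current
    merged.toSeq
  }
}
```

In this model, 200 shuffle partitions of 8 MB each coalesce into 25 partitions under the 64 MB default, but into 7 larger partitions when an advisory size of 256 MB is supplied, i.e. fewer and larger output files.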

This patch introduces a new Kyuubi configuration key, spark.sql.adaptive.rebalancePartitionsAdvisoryPartitionSizeInBytes, which takes precedence; when it is not set, the final-stage config spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes is used as a fallback.
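The precedence can be sketched as follows. This assumes a plain Map[String, String] as a hypothetical stand-in for SQLConf, with values given as raw byte counts; the real configs may also accept size strings such as 256m, which this sketch does not parse.

```scala
// Hypothetical stand-in for SQLConf: config key -> value as a raw byte count.
object AdvisorySizeResolution {
  val KyuubiKey = "spark.sql.adaptive.rebalancePartitionsAdvisoryPartitionSizeInBytes"
  val FinalStageKey = "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes"

  // The Kyuubi key takes precedence; the final-stage key is only a fallback.
  // Returning None leaves RebalancePartitions without an advisory size.
  def resolve(conf: Map[String, String]): Option[Long] =
    conf.get(KyuubiKey)
      .orElse(conf.get(FinalStageKey))
      .map(_.trim.toLong)
}
```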

How was this patch tested?

Added 5 unit tests in RebalanceBeforeWritingSuite covering the following scenarios:

  • Final-stage config takes effect
  • Kyuubi config fallback
  • Kyuubi config takes precedence over final-stage when both are set
  • Neither config is set (optAdvisoryPartitionSize is None)
  • Raw getAdvisoryPartitionSize(conf) config resolution

Was this patch authored or co-authored using generative AI tooling?

Claude Code (Anthropic Claude Opus 4.7)

@wForget left a comment

Member

Thanks @ulysses-you.

Slightly off topic, but do you think it would be possible to introduce a REBALANCE_WITH_SIZE hint in upstream Spark? That would provide a simple way to control small-file generation when the Kyuubi Spark extension is not installed.


def getAdvisoryPartitionSize(conf: SQLConf): Option[Long] = {
  conf.getConf(REBALANCE_PARTITIONS_ADVISORY_PARTITION_SIZE).orElse {
    if (conf.contains(FINAL_STAGE_ADVISORY_PARTITION_SIZE_KEY)) {

@wForget Jul 1, 2026

Member

This should also check spark.sql.optimizer.finalStageConfigIsolation.enabled.

Contributor Author

In general, finalStage-related configs should only be set when finalStage config isolation is enabled.

Contributor Author

Addressed.
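A sketch of the resolution with the reviewer's suggested check folded in, again using a hypothetical Map[String, String] stand-in for SQLConf with raw byte-count values. The final-stage fallback is consulted only when spark.sql.optimizer.finalStageConfigIsolation.enabled is true; treating a missing flag as false is an assumption, not something the thread confirms.

```scala
// Hypothetical stand-in for SQLConf: config key -> string value.
object IsolationAwareResolution {
  val KyuubiKey = "spark.sql.adaptive.rebalancePartitionsAdvisoryPartitionSizeInBytes"
  val FinalStageKey = "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes"
  val IsolationKey = "spark.sql.optimizer.finalStageConfigIsolation.enabled"

  // Assumption: a missing isolation flag is treated as disabled.
  private def isolationEnabled(conf: Map[String, String]): Boolean =
    conf.get(IsolationKey).exists(_.trim.equalsIgnoreCase("true"))

  // Kyuubi key first; the final-stage value is used only when isolation is on.
  def resolve(conf: Map[String, String]): Option[Long] =
    conf.get(KyuubiKey)
      .orElse(conf.get(FinalStageKey).filter(_ => isolationEnabled(conf)))
      .map(_.trim.toLong)
}
```

In this sketch an explicitly set Kyuubi key still wins regardless of the isolation flag; only the fallback path is gated.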

@wForget

wForget commented Jul 1, 2026

Member
@ulysses-you

Contributor Author

do you think it would be possible to introduce a REBALANCE_WITH_SIZE hint in upstream spark?

@wForget Good idea; I think it would be a useful hint.

@github-actions bot added the kind:documentation label (Documentation is a feature!) on Jul 1, 2026

3 participants