Add rebalance advisoryPartitionSizeInBytes config #7531
wForget
left a comment
Thanks @ulysses-you.
Slightly off topic, but do you think it would be possible to introduce a REBALANCE_WITH_SIZE hint in upstream Spark? That would provide a simple way to control small file generation when the Kyuubi Spark extension is not installed.
def getAdvisoryPartitionSize(conf: SQLConf): Option[Long] = {
  conf.getConf(REBALANCE_PARTITIONS_ADVISORY_PARTITION_SIZE).orElse {
    if (conf.contains(FINAL_STAGE_ADVISORY_PARTITION_SIZE_KEY)) {
should also check spark.sql.optimizer.finalStageConfigIsolation.enabled
In general, the finalStage related configs should only appear with finalStage enabled.
We may also need to update the documentation here: https://github.com/apache/kyuubi/blob/master/docs/extensions/engines/spark/rules.md
@wForget good idea, I think it's a useful hint
Why are the changes needed?
Starting from Spark 3.5, the RebalancePartitions operator supports an optAdvisoryPartitionSize parameter to specify the desired partition size after shuffle. This allows Kyuubi to directly control the target output size of the rebalance shuffle, helping AQE's CoalesceShufflePartitions to produce appropriately sized output files and avoid small file issues.
Currently, RebalanceBeforeWriting inserts RebalancePartitions without any advisory size, leaving AQE to use the global spark.sql.adaptive.advisoryPartitionSizeInBytes which may not be suitable for the final write stage.
This patch introduces a new Kyuubi configuration key, spark.sql.adaptive.rebalancePartitionsAdvisoryPartitionSizeInBytes. When set, it takes precedence; otherwise the final-stage config spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes is used as a fallback.
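The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: a plain `Map[String, String]` stands in for `SQLConf`, sizes are given as raw byte counts rather than Spark byte strings such as `64m`, and the object name `AdvisorySizeSketch` is hypothetical. The check on spark.sql.optimizer.finalStageConfigIsolation.enabled follows the reviewer's suggestion in the thread; whether the patch applies it this way, and treating the flag as disabled when unset, are both assumptions.

```scala
object AdvisorySizeSketch {
  // Key names as they appear in the PR description and review thread.
  val RebalanceKey = "spark.sql.adaptive.rebalancePartitionsAdvisoryPartitionSizeInBytes"
  val FinalStageKey = "spark.sql.finalStage.adaptive.advisoryPartitionSizeInBytes"
  val IsolationKey = "spark.sql.optimizer.finalStageConfigIsolation.enabled"

  // Hypothetical stand-in for SQLConf: a plain map of string settings.
  // Sizes are assumed to be plain byte counts for simplicity.
  def getAdvisoryPartitionSize(conf: Map[String, String]): Option[Long] = {
    // 1. The rebalance-specific key wins whenever it is set.
    conf.get(RebalanceKey).map(_.toLong).orElse {
      // 2. Otherwise fall back to the final-stage key, but only when
      //    final-stage config isolation is enabled (reviewer's suggestion;
      //    assumed to be off when the flag is unset).
      val isolationEnabled = conf.get(IsolationKey).exists(_.toBoolean)
      if (isolationEnabled) conf.get(FinalStageKey).map(_.toLong) else None
    }
  }
}
```

In this sketch, a result of `None` corresponds to inserting RebalancePartitions without an advisory size, in which case AQE uses the global spark.sql.adaptive.advisoryPartitionSizeInBytes.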
How was this patch tested?
Added 5 unit tests in RebalanceBeforeWritingSuite covering all scenarios:
Was this patch authored or co-authored using generative AI tooling?
Claude Code (Anthropic Claude Opus 4.7)