[Data] Get rid of generators to avoid intermediate state pinning #60598

alexeykudinkin · 2026-01-30T04:17:46Z

Description

I’ve realized that for fused Map transforms we’re holding a whole stack of intermediate results (batches) simply due to how yield works in Python:

When a generator function yields, its frame state (local variables) is preserved, so all of its intermediate state stays pinned until the next iteration instead of being released.
This is in contrast with a plain Iterator.__next__ method: when it returns, its stack frame is destroyed along with all of its intermediate state.
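
The difference can be observed directly with weak references. The sketch below is illustrative only: `Batch`, `gen_transform`, and `IterTransform` are hypothetical names rather than Ray Data APIs, and the immediate release it shows relies on CPython's reference counting.

```python
import gc
import weakref


class Batch:
    """Hypothetical stand-in for a large batch (supports weak references)."""

    def __init__(self, value):
        self.value = value


def gen_transform(inputs, seen):
    """Generator version: locals survive across each yield."""
    for batch in inputs:
        scratch = Batch(batch.value * 2)  # intermediate result
        seen.append(weakref.ref(scratch))
        yield Batch(scratch.value + 1)


class IterTransform:
    """Iterator version: locals are dropped when __next__ returns."""

    def __init__(self, inputs, seen):
        self._inputs = iter(inputs)
        self._seen = seen

    def __iter__(self):
        return self

    def __next__(self):
        batch = next(self._inputs)
        scratch = Batch(batch.value * 2)  # intermediate result
        self._seen.append(weakref.ref(scratch))
        return Batch(scratch.value + 1)


gen_seen, iter_seen = [], []

gen = gen_transform([Batch(1)], gen_seen)
gen_out = next(gen)
gc.collect()
# The suspended generator frame still references `scratch`.
pinned_by_generator = gen_seen[0]() is not None

it = IterTransform([Batch(1)], iter_seen)
iter_out = next(it)
gc.collect()
# The __next__ frame is gone, so nothing references `scratch` anymore.
pinned_by_iterator = iter_seen[0]() is not None

gen.close()  # finalizing the generator clears its frame locals
gc.collect()
released_after_close = gen_seen[0]() is None
```

Both versions produce the same output; the only difference is how long the intermediate `scratch` object stays reachable.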

While this is not an issue most of the time, it's a big problem in cases when multiple Maps are fused:

With multiple operators and their corresponding transformations fused into one task:
The intermediate state, inputs, and outputs of each transformation are pinned until the next iteration.
The required heap memory scales proportionally with the number of fused operators (i.e., more operators, more heap).
This is exacerbated by batch_size now defaulting to None, which means the whole block is both the input and the output, substantially increasing memory requirements.

Consider the following example:

Generator Chain (Problem)                                                                                                                                                     
                                                                                                                                                                                
  ┌─────────────────────────────────────────────────────────────────────────────┐                                                                                               
  │  Generator A                                                                │                                                                                               
  │  ┌────────────────────────────────────────────────────────────────────────┐ │                                                                                               
  │  │  def transform_a(inputs):                                              │ │                                                                                               
  │  │      for batch in inputs:           ◄─── suspended at yield            │ │                                                                                               
  │  │          result = process(batch)         `batch` PINNED in frame       │ │                                                                                               
  │  │          yield result               ◄─── `result` PINNED in frame      │ │                                                                                               
  │  └────────────────────────────────────────────────────────────────────────┘ │                                                                                               
  │                           │                                                 │                                                                                               
  │                           ▼                                                 │                                                                                               
  │  ┌────────────────────────────────────────────────────────────────────────┐ │                                                                                               
  │  │  def transform_b(inputs):                                              │ │                                                                                               
  │  │      for batch in inputs:           ◄─── suspended at yield            │ │                                                                                               
  │  │          result = process(batch)         `batch` PINNED (output of A)  │ │                                                                                               
  │  │          yield result               ◄─── `result` PINNED in frame      │ │                                                                                               
  │  └────────────────────────────────────────────────────────────────────────┘ │                                                                                               
  │                           │                                                 │                                                                                               
  │                           ▼                                                 │                                                                                               
  │  ┌────────────────────────────────────────────────────────────────────────┐ │                                                                                               
  │  │  def transform_c(inputs):                                              │ │                                                                                               
  │  │      for batch in inputs:           ◄─── suspended at yield            │ │                                                                                               
  │  │          result = process(batch)         `batch` PINNED (output of B)  │ │                                                                                               
  │  │          yield result               ◄─── `result` PINNED in frame      │ │                                                                                               
  │  └────────────────────────────────────────────────────────────────────────┘ │                                                                                               
  │                           │                                                 │                                                                                               
  │                           ▼                                                 │                                                                                               
  │                      to consumer                                            │                                                                                               
  └─────────────────────────────────────────────────────────────────────────────┘                                                                                               
                                                                                                                                                                                
  Memory at yield point:                                                                                                                                                        
  ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐                                                                                                                 
  │ input   │ A.batch │ A.result│ B.batch │ B.result│ C.batch │ ... ALL PINNED                                                                                                  
  └─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘                                                                                                                 
             ═══════════════════════════════════════════════                                                                                                                    
                      Cannot be GC'd until next iteration                                                                                                                       
                                                                                                                                                                                
  Iterator Chain (Solution)                                                                                                                                                     
                                                                                                                                                                                
  ┌─────────────────────────────────────────────────────────────────────────────┐                                                                                               
  │  Iterator A                                                                 │                                                                                               
  │  ┌────────────────────────────────────────────────────────────────────────┐ │                                                                                               
  │  │  def __next__(self):                                                   │ │                                                                                               
  │  │      batch = next(self._input)      # local var                        │ │                                                                                               
  │  │      result = process(batch)        # local var                        │ │                                                                                               
  │  │      return result                  ◄─── method RETURNS                │ │                                                                                               
  │  │                                          locals GO OUT OF SCOPE        │ │                                                                                               
  │  └────────────────────────────────────────────────────────────────────────┘ │                                                                                               
  │                           │                                                 │                                                                                               
  │                           ▼                                                 │                                                                                               
  │  ┌────────────────────────────────────────────────────────────────────────┐ │                                                                                               
  │  │  def __next__(self):                                                   │ │                                                                                               
  │  │      batch = next(self._input)      # local var                        │ │                                                                                               
  │  │      result = process(batch)        # local var                        │ │                                                                                               
  │  │      return result                  ◄─── method RETURNS                │ │                                                                                               
  │  │                                          locals GO OUT OF SCOPE        │ │                                                                                               
  │  └────────────────────────────────────────────────────────────────────────┘ │                                                                                               
  │                           │                                                 │                                                                                               
  │                           ▼                                                 │                                                                                               
  │  ┌────────────────────────────────────────────────────────────────────────┐ │                                                                                               
  │  │  def __next__(self):                                                   │ │                                                                                               
  │  │      batch = next(self._input)      # local var                        │ │                                                                                               
  │  │      result = process(batch)        # local var                        │ │                                                                                               
  │  │      return result                  ◄─── method RETURNS                │ │                                                                                               
  │  │                                          locals GO OUT OF SCOPE        │ │                                                                                               
  │  └────────────────────────────────────────────────────────────────────────┘ │                                                                                               
  │                           │                                                 │                                                                                               
  │                           ▼                                                 │                                                                                               
  │                      to consumer                                            │                                                                                               
  └─────────────────────────────────────────────────────────────────────────────┘                                                                                               
                                                                                                                                                                                
  Memory after return:                                                                                                                                                          
  ┌─────────┬─────────┐                                                                                                                                                         
  │ input   │ output  │  ... ONLY 2 objects pinned                                                                                                                              
  └─────────┴─────────┘                                                                                                                                                         
                                                                                                                                                                                
  All intermediates eligible for GC immediately after each __next__ returns                                                                                                     
                                                                                                                                                                                
  Key Difference                                                                                                                                                                
                                                                                                                                                                                
  GENERATOR                              ITERATOR                                                                                                                               
  ─────────────────────────────────────────────────────────────────                                                                                                             
  yield suspends execution         vs    return completes execution                                                                                                             
  frame stays alive                vs    frame is destroyed                                                                                                                     
  locals pinned until resume       vs    locals released immediately                                                                                                            
                                                                                                                                                                                
             ┌──────────┐                        ┌──────────┐                                                                                                                   
    yield ──►│ SUSPENDED│               return ──►│ COMPLETE │                                                                                                                  
             │  frame   │                        │  frame   │                                                                                                                   
             │  alive   │                        │destroyed │                                                                                                                   
             └──────────┘                        └──────────┘                                                                                                                   
                 │                                    │                                                                                                                         
                 ▼                                    ▼                                                                                                                         
           refs HELD                            refs RELEASED
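
To see how the pinned set grows with the number of fused stages, here is a minimal sketch that counts the objects still alive right after the first output is produced. `Tracked`, `source`, `gen_stage`, `IterStage`, and `live_after_first_output` are hypothetical names used for illustration, and the counts assume CPython's reference counting.

```python
import gc
import weakref

LIVE = weakref.WeakSet()  # every Tracked instance that is still alive


class Tracked:
    """Hypothetical batch that registers itself so live instances can be counted."""

    def __init__(self, value):
        self.value = value
        LIVE.add(self)


def source():
    # Generator expression: yields fresh objects without binding them to a local.
    return (Tracked(i) for i in range(2))


def gen_stage(inputs):
    """One fused stage written as a generator."""
    for batch in inputs:
        result = Tracked(batch.value + 1)
        yield result  # `batch` and `result` stay in the suspended frame


class IterStage:
    """The same stage written as an iterator class."""

    def __init__(self, inputs):
        self._inputs = inputs

    def __iter__(self):
        return self

    def __next__(self):
        batch = next(self._inputs)
        return Tracked(batch.value + 1)  # `batch` is dropped on return


def live_after_first_output(chain):
    """Pull one output and report how many Tracked objects are still alive."""
    out = next(chain)
    gc.collect()
    return len(LIVE), out.value


gen_live, gen_value = live_after_first_output(
    gen_stage(gen_stage(gen_stage(source())))
)
iter_live, iter_value = live_after_first_output(
    IterStage(IterStage(IterStage(source())))
)
```

With three stages, the generator chain keeps the source item plus one result per stage alive (`gen_live` is 4), while the iterator chain keeps only the returned output alive (`iter_live` is 1). Adding a stage to the generator chain adds one more pinned object.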


gemini-code-assist

Code Review

This pull request is a solid piece of engineering that refactors the data processing pipeline to replace generator functions with iterator classes. This is a crucial change to prevent potential memory leaks caused by chained generators holding references to intermediate data. The changes are applied consistently across block_batching and map_transformer components. New iterator classes like _BatchingIterator, ShapeBlocksIterator, and _TransformingBatchIterator are introduced to encapsulate the iteration logic previously found in generators. A new test, test_chained_transforms_release_intermediates_between_batches, is added to verify that intermediate object references are correctly released, which is an excellent addition. The overall change is well-executed and improves memory management in Ray Data's critical path.

gemini-code-assist · 2026-01-30T04:20:03Z

python/ray/data/_internal/planner/plan_udf_map_op.py

-                        res = [batch]
+                    out_batch = next(self._cur_output_iter)
+                except StopIteration:
+                    pass


For improved clarity and robustness, it's better to explicitly reset self._cur_output_iter to None and continue the loop when the iterator is exhausted. This makes the state transition explicit and avoids relying on the iterator being overwritten later in the loop.

Suggested change

-                    pass
+                    self._cur_output_iter = None
+                    continue
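
For context, the pattern this suggestion targets could look roughly like the sketch below. It is an assumption about the general shape only, not the code from the PR: `FlatMapIterator` and `fn` are hypothetical names, and only the `_cur_output_iter` attribute is taken from the diff above.

```python
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class FlatMapIterator:
    """Hypothetical iterator that applies `fn` to each input and flattens the results."""

    def __init__(self, inputs: Iterator[T], fn: Callable[[T], Iterable[U]]):
        self._inputs = inputs
        self._fn = fn
        # Iterator over the outputs of the input currently being processed.
        self._cur_output_iter: Optional[Iterator[U]] = None

    def __iter__(self) -> "FlatMapIterator":
        return self

    def __next__(self) -> U:
        while True:
            if self._cur_output_iter is None:
                # Raises StopIteration once the inputs are exhausted,
                # which ends the whole iteration.
                item = next(self._inputs)
                self._cur_output_iter = iter(self._fn(item))
            try:
                return next(self._cur_output_iter)
            except StopIteration:
                # Explicit state transition: drop the exhausted iterator
                # (and anything it references) before pulling the next input.
                self._cur_output_iter = None
                continue


flattened = list(FlatMapIterator(iter([1, 2, 0, 3]), lambda n: [n] * n))
```

Resetting the attribute in the `except` branch also releases the exhausted output iterator right away instead of keeping it until it is overwritten.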

raulchen · 2026-01-30T22:56:34Z

The idea and motivation look reasonable. Have you done any benchmarks on real workloads? E.g., how much memory can we save?

alexeykudinkin · 2026-01-30T23:53:13Z

> The idea and motivation look reasonable. Have you done any benchmarks on real workloads? E.g., how much memory can we save?

Not yet, but we can work out the math:

Currently, a single Map task uses

block size (128 MiB default) × N (number of fused transformations)

With this change, it will be just the block size.
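
As a back-of-the-envelope illustration of that estimate (not a benchmark), the arithmetic can be written out as below. `estimated_pinned_bytes` is a hypothetical helper, the default block size is taken as 128 MiB, and N = 4 is an arbitrary example value.

```python
MiB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MiB  # default block size quoted above, taken as 128 MiB


def estimated_pinned_bytes(num_fused, block_size=DEFAULT_BLOCK_SIZE, generators=True):
    """Rough estimate of heap pinned per Map task, following the comment above.

    With generators, roughly one block is pinned per fused transformation;
    with iterators, roughly one block in total.
    """
    if num_fused < 1:
        raise ValueError("num_fused must be at least 1")
    return block_size * (num_fused if generators else 1)


before = estimated_pinned_bytes(4, generators=True)   # 4 fused transformations
after = estimated_pinned_bytes(4, generators=False)
savings_mib = (before - after) // MiB
```

For four fused transformations this gives 512 MiB before and 128 MiB after, an estimated saving of 384 MiB per task under these assumptions.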