@hzy46
Contributor

@hzy46 hzy46 commented Oct 13, 2025

We introduce a suffix to distinguish between metrics computed before and after AgentLightning’s post-processing.

"Before" refers to raw reward and advantage values.

"After" refers to values computed following post-processing, which involves:

  • Dropping prompts that exceed the maximum allowed length.
  • Rounding the batch size down to a multiple of the PPO mini-batch size.

Different suffixes are used to label these two stages accordingly.
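
The two post-processing steps listed above can be sketched roughly as follows. This is a minimal illustration, not AgentLightning's actual code: `postprocess_traces`, `max_prompt_length`, `ppo_mini_batch_size`, and the dict-based trace layout are hypothetical names, and dropping the tail of the batch is only an assumed way of rounding down.

```python
from typing import Dict, List


def postprocess_traces(
    traces: List[Dict[str, float]],
    max_prompt_length: int,
    ppo_mini_batch_size: int,
) -> List[Dict[str, float]]:
    """Hypothetical stand-in for the post-processing described in this PR."""
    # (1) Drop traces whose prompt exceeds the maximum allowed length.
    kept = [t for t in traces if t["prompt_length"] <= max_prompt_length]
    # (2) Round the batch size down to a multiple of the PPO mini-batch size.
    #     Assumption: the surplus traces are cut from the end of the batch.
    usable = (len(kept) // ppo_mini_batch_size) * ppo_mini_batch_size
    return kept[:usable]


traces = [
    {"prompt_length": 10, "reward": 1.0},
    {"prompt_length": 50, "reward": 0.0},  # too long, dropped in step (1)
    {"prompt_length": 12, "reward": 0.5},
    {"prompt_length": 8, "reward": 1.0},
    {"prompt_length": 9, "reward": 0.0},   # cut in step (2): 4 -> 3 traces
]
processed = postprocess_traces(traces, max_prompt_length=20, ppo_mini_batch_size=3)
```

With these made-up inputs, five raw traces shrink to three, so any mean computed over the batch can differ between the two stages. That difference is what the two suffixes are meant to expose.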

The suffix _before_processing marks metrics computed from the raw rewards, returns, and prompt lengths gathered directly from agent traces.
In contrast, the suffix _after_processing marks the same metrics computed on the traces that remain after filtering and adjustment for training.

Copilot AI review requested due to automatic review settings October 13, 2025 04:31
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces suffix-based labeling to distinguish between training metrics computed before and after AgentLightning's post-processing pipeline. The change enables better tracking of how data filtering and batch adjustments affect the training metrics.

Key changes:

  • Adds a custom compute_data_metrics function with suffix parameter support
  • Implements "_before_processing" and "_after_processing" metric labeling
  • Updates metric computation calls to use the new suffixed approach

# (1) Dropping prompts that exceed the maximum allowed length.
# (2) Adjusting the batch size to be a multiple of the mini PPO size.
# Different suffixes are used to label these two stages accordingly.
def compute_data_metrics(batch: DataProto, use_critic: bool = True, suffix="") -> Dict[str, Any]:
Copilot AI Oct 13, 2025

The suffix parameter is missing from the docstring. Please add documentation for this parameter explaining its purpose and expected values.
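
One possible shape for the requested docstring is sketched below. It is only an illustration: the wording is not taken from the merged code, `batch` is typed as `Any` so the snippet runs without the PR's `DataProto` class, the description of `use_critic` is an assumption, and the body is deliberately left unimplemented.

```python
from typing import Any, Dict


def compute_data_metrics(batch: Any, use_critic: bool = True, suffix: str = "") -> Dict[str, Any]:
    """Compute data metrics for a batch (illustrative signature only).

    Args:
        batch: The batch to summarize (a DataProto in the PR; Any in this sketch).
        use_critic: Assumed to control whether critic-related metrics are included.
        suffix: String appended to every metric name, for example
            "_before_processing" or "_after_processing". The default ""
            is assumed to leave metric names unchanged.

    Returns:
        A mapping from metric name (including the suffix) to its value.
    """
    # The real computation lives in the PR; this sketch only documents the interface.
    raise NotImplementedError("illustrative signature only")
```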

# Calculate the metrics before processing. Refer to the comments of function `compute_data_metrics` for details.
metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic, suffix="_before_processing"))

# after advantages are assinged, we begin to drop (1) long prompt (2) floor to ppo minisize
Copilot AI Oct 13, 2025

Corrected spelling of 'assinged' to 'assigned'.

Suggested change
- # after advantages are assinged, we begin to drop (1) long prompt (2) floor to ppo minisize
+ # after advantages are assigned, we begin to drop (1) long prompt (2) floor to ppo minisize
@hzy46 hzy46 requested a review from ultmaster October 13, 2025 06:08

# compute training metrics
- metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic))
+ metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic, suffix="_after_processing"))
Contributor

So this adds new metrics in addition to the original metrics?

Contributor Author

Previously there was only one metric, critic/score/mean.

Now it becomes critic_score_mean_before_processing and critic_score_mean_after_processing. critic_score_mean_before_processing is the raw reward mean, and critic_score_mean_after_processing is the reward mean used for training.

The previous metric corresponds to critic_score_mean_after_processing.
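
The naming scheme described in this reply can be illustrated with a small sketch. It is a simplified analogue, not the PR's `compute_data_metrics`: `compute_score_metrics` is a hypothetical helper, the score lists are made-up values, and the key format simply follows the names quoted in the reply, so the real keys may be laid out differently.

```python
from statistics import mean
from typing import Dict, List


def compute_score_metrics(scores: List[float], suffix: str = "") -> Dict[str, float]:
    """Hypothetical helper: summarize scores and append `suffix` to each key."""
    return {
        f"critic_score_mean{suffix}": mean(scores),
        f"critic_score_max{suffix}": max(scores),
        f"critic_score_min{suffix}": min(scores),
    }


# Made-up data: raw scores from all traces, and the subset kept for training.
raw_scores = [1.0, 0.0, 0.5, 1.0, 0.0]
train_scores = [1.0, 0.5, 1.0]

metrics: Dict[str, float] = {}
metrics.update(compute_score_metrics(raw_scores, suffix="_before_processing"))
metrics.update(compute_score_metrics(train_scores, suffix="_after_processing"))
```

Because the suffix is part of each key, the two `update` calls do not overwrite each other, and both stages end up side by side in the same `metrics` dictionary.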

# (1) Dropping prompts that exceed the maximum allowed length.
# (2) Adjusting the batch size to be a multiple of the mini PPO size.
# Different suffixes are used to label these two stages accordingly.
def compute_data_metrics(batch: DataProto, use_critic: bool = True, suffix="") -> Dict[str, Any]:
Contributor

Suggest suffix: str = ""

Contributor Author

fixed

@ultmaster
Contributor

Please merge from main as there are CI updates.

@ultmaster
Contributor

/ci

@github-actions

github-actions bot commented Oct 31, 2025

🚀 CI Watcher for correlation id-3472526939-mheqrxlo triggered by comment 3472526939
🏃‍♀️ Tracking 2 workflow run(s):

✅ All runs completed.

@ultmaster ultmaster merged commit 3ed5e1e into microsoft:main Oct 31, 2025
8 checks passed
totoluo pushed a commit to totoluo/agent-lightning that referenced this pull request Nov 14, 2025

2 participants