Fix training metrics before and after processing #145

hzy46 · 2025-10-13T04:31:42Z

We introduce a suffix to distinguish between metrics computed before and after AgentLightning’s post-processing.

"Before" refers to raw reward and advantage values.

"After" refers to values computed following post-processing, which involves:

Dropping prompts that exceed the maximum allowed length.
Adjusting the batch size to be a multiple of the mini PPO size.

Different suffixes are used to label these two stages accordingly.

The suffix _before_processing indicates the raw rewards, returns, and prompt lengths gathered directly from agent traces.
In contrast, the suffix _after_processing refers to the traces that have been filtered and adjusted for training.
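
To illustrate the two stages described above, here is a minimal sketch in plain Python. It is not the trainer's actual code: `post_process`, `reward_metrics`, the `reward/...` keys, and the sample data are all hypothetical, and which samples the real pipeline discards when flooring the batch is an assumption here.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (prompt_length, reward); hypothetical stand-in for a trace


def post_process(samples: List[Sample], max_prompt_length: int, mini_batch_size: int) -> List[Sample]:
    """Drop over-long prompts, then floor the batch to a multiple of mini_batch_size."""
    kept = [s for s in samples if s[0] <= max_prompt_length]
    usable = len(kept) - len(kept) % mini_batch_size
    # Assumption: the trailing remainder is discarded.
    return kept[:usable]


def reward_metrics(samples: List[Sample], suffix: str = "") -> Dict[str, float]:
    """Compute simple reward statistics and append `suffix` to every key."""
    rewards = [reward for _, reward in samples]
    mean = sum(rewards) / len(rewards) if rewards else 0.0
    return {
        f"reward/mean{suffix}": mean,
        f"reward/count{suffix}": len(rewards),
    }


samples = [(10, 1.0), (50, 0.0), (12, 1.0), (11, 0.0), (9, 1.0), (8, 0.0)]

metrics: Dict[str, float] = {}
# Raw values gathered from agent traces.
metrics.update(reward_metrics(samples, suffix="_before_processing"))
# Values for the batch that would actually be used for training.
trained = post_process(samples, max_prompt_length=20, mini_batch_size=2)
metrics.update(reward_metrics(trained, suffix="_after_processing"))
```

In this toy batch one prompt exceeds the length limit and one more sample is dropped by the flooring step, so the two suffixed means differ even though no reward changed.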

Copilot

Pull Request Overview

This PR introduces suffix-based labeling to distinguish between training metrics computed before and after AgentLightning's post-processing pipeline. The change enables better tracking of how data filtering and batch adjustments affect the training metrics.

Key changes:

Adds a custom compute_data_metrics function with suffix parameter support
Implements "_before_processing" and "_after_processing" metric labeling
Updates metric computation calls to use the new suffixed approach

Copilot · 2025-10-13T04:32:11Z

agentlightning/verl/trainer.py

+#     (1) Dropping prompts that exceed the maximum allowed length.
+#     (2) Adjusting the batch size to be a multiple of the mini PPO size.
+# Different suffixes are used to label these two stages accordingly.
+def compute_data_metrics(batch: DataProto, use_critic: bool = True, suffix="") -> Dict[str, Any]:


The suffix parameter is missing from the docstring. Please add documentation for this parameter explaining its purpose and expected values.
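
One possible shape for the requested documentation, as a sketch only. `compute_data_metrics_stub` is a hypothetical stand-in that takes a plain sequence of scores instead of a `DataProto`, and the `score/mean` key is made up; only the `suffix` semantics come from the PR description.

```python
from typing import Any, Dict, Sequence


def compute_data_metrics_stub(
    batch: Sequence[float], use_critic: bool = True, suffix: str = ""
) -> Dict[str, Any]:
    """Summarize a batch and label every metric key with ``suffix``.

    Args:
        batch: Stand-in for the real batch object; here a non-empty sequence of scores.
        use_critic: Kept only to mirror the reviewed signature; unused in this stub.
        suffix: String appended to each metric key. The PR passes
            ``"_before_processing"`` for raw agent traces and
            ``"_after_processing"`` for the filtered training batch.
            The default ``""`` leaves keys unsuffixed.

    Returns:
        A mapping from suffixed metric names to values.
    """
    return {f"score/mean{suffix}": sum(batch) / len(batch)}
```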

Copilot · 2025-10-13T04:32:11Z

agentlightning/verl/trainer.py

+            # Calculate the metrics before processing. Refer to the comments of function `compute_data_metrics` for details.
+            metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic, suffix="_before_processing"))
+
            # after advantages are assinged, we begin to drop (1) long prompt (2) floor to ppo minisize


Corrected spelling of 'assinged' to 'assigned'.

Suggested change

# after advantages are assinged, we begin to drop (1) long prompt (2) floor to ppo minisize

# after advantages are assigned, we begin to drop (1) long prompt (2) floor to ppo minisize

ultmaster · 2025-10-13T08:10:52Z

agentlightning/verl/trainer.py


        # compute training metrics
-        metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic))
+        metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic, suffix="_after_processing"))


So this adds new metrics in addition to the original metrics?

Previously there was only one metric, critic/score/mean.

Now it becomes critic_score_mean_before_processing and critic_score_mean_after_processing. critic_score_mean_before_processing is the raw reward mean, and critic_score_mean_after_processing is the reward mean used for training.

The previous metric corresponds to critic_score_mean_after_processing.
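
For dashboards that still read the old unsuffixed key, the correspondence could be restored with a small alias step. This is a hypothetical helper, not part of the PR; the key spelling follows the comment above and the values are made up, so the trainer's real keys may differ.

```python
from typing import Any, Dict

AFTER_SUFFIX = "_after_processing"


def add_legacy_aliases(metrics: Dict[str, Any], suffix: str = AFTER_SUFFIX) -> Dict[str, Any]:
    """Return a copy that also exposes each `<key><suffix>` entry under `<key>`."""
    out = dict(metrics)
    for key, value in metrics.items():
        if key.endswith(suffix):
            # setdefault avoids overwriting an unsuffixed key that already exists.
            out.setdefault(key[: -len(suffix)], value)
    return out


metrics = {
    "critic_score_mean_before_processing": 0.5,   # raw reward mean (made-up value)
    "critic_score_mean_after_processing": 0.75,   # reward mean used for training (made-up value)
}
aliased = add_legacy_aliases(metrics)
```

Only the `_after_processing` entries are aliased, since those are the ones the earlier unsuffixed metric corresponded to.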

ultmaster · 2025-10-13T08:11:02Z

agentlightning/verl/trainer.py

+#     (1) Dropping prompts that exceed the maximum allowed length.
+#     (2) Adjusting the batch size to be a multiple of the mini PPO size.
+# Different suffixes are used to label these two stages accordingly.
+def compute_data_metrics(batch: DataProto, use_critic: bool = True, suffix="") -> Dict[str, Any]:


Suggest suffix: str = ""

ultmaster · 2025-10-20T03:52:51Z

Please merge from main as there are CI updates.

ultmaster · 2025-10-31T10:59:54Z

/ci

github-actions · 2025-10-31T11:00:10Z

🚀 CI Watcher for correlation id-3472526939-mheqrxlo triggered by comment 3472526939
🏃‍♀️ Tracking 2 workflow run(s):

🔴 PR #145 - Label ci-spider - id-3472526939-mheqrxlo — completed/failure
🟢 PR #145 - Label ci-calc-x - id-3472526939-mheqrxlo — completed/success

✅ All runs completed.

fix

d820b15

Copilot AI review requested due to automatic review settings October 13, 2025 04:31

Copilot AI reviewed Oct 13, 2025

hzy46 requested a review from ultmaster October 13, 2025 06:08

ultmaster reviewed Oct 13, 2025

fix type hint

85d99e4

ultmaster approved these changes Oct 20, 2025

ultmaster added ci-spider ci-calc-x labels Oct 22, 2025

Merge branch 'main' of https://github.com/microsoft/agent-lightning i…

0a5a2e5

…nto zhiyuhe/fix_training_metrics

ultmaster merged commit 3ed5e1e into microsoft:main Oct 31, 2025
8 checks passed

totoluo pushed a commit to totoluo/agent-lightning that referenced this pull request Nov 14, 2025

Fix training metrics before and after processing (microsoft#145)

3de9e25
