Fix training metrics before and after processing #145
Pull Request Overview
This PR introduces suffix-based labeling to distinguish between training metrics computed before and after AgentLightning's post-processing pipeline. The change enables better tracking of how data filtering and batch adjustments affect the training metrics.
Key changes:
- Adds a custom `compute_data_metrics` function with `suffix` parameter support
- Implements `_before_processing` and `_after_processing` metric labeling
- Updates metric computation calls to use the new suffixed approach
agentlightning/verl/trainer.py
```python
# (1) Dropping prompts that exceed the maximum allowed length.
# (2) Adjusting the batch size to be a multiple of the mini PPO size.
# Different suffixes are used to label these two stages accordingly.
def compute_data_metrics(batch: DataProto, use_critic: bool = True, suffix="") -> Dict[str, Any]:
```
Copilot AI · Oct 13, 2025
The suffix parameter is missing from the docstring. Please add documentation for this parameter explaining its purpose and expected values.
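To illustrate what a documented `suffix` parameter could look like, here is a simplified, hypothetical `compute_data_metrics_sketch`. It takes plain lists instead of verl's `DataProto`, omits `use_critic`, and assumes the suffix is appended directly to each metric key; the real function's statistics and key names may differ.

```python
from statistics import mean
from typing import Any, Dict, List


def compute_data_metrics_sketch(
    scores: List[float], prompt_lengths: List[int], suffix: str = ""
) -> Dict[str, Any]:
    """Summarize a batch of per-sample scores and prompt lengths.

    Args:
        scores: Per-sample reward scores for the batch.
        prompt_lengths: Per-sample prompt lengths in tokens.
        suffix: String appended to every metric name to label the stage at
            which the metrics were computed, e.g. "_before_processing" for
            raw agent traces or "_after_processing" for the filtered and
            resized training batch. Defaults to the empty string (no label).

    Returns:
        A dict mapping suffixed metric names to their values.
    """
    raw = {
        "critic/score/mean": mean(scores),
        "critic/score/max": max(scores),
        "critic/score/min": min(scores),
        "prompt_length/mean": mean(prompt_lengths),
    }
    # Append the stage label so both stages can live in one metrics dict.
    return {f"{name}{suffix}": value for name, value in raw.items()}
```

Calling it once with `suffix="_before_processing"` and once with `suffix="_after_processing"` yields disjoint keys, so `metrics.update(...)` from the second call does not overwrite the first.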
```python
# Calculate the metrics before processing. Refer to the comments of function `compute_data_metrics` for details.
metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic, suffix="_before_processing"))

# after advantages are assinged, we begin to drop (1) long prompt (2) floor to ppo minisize
```
Copilot AI · Oct 13, 2025
Corrected spelling of 'assinged' to 'assigned'.
```diff
- # after advantages are assinged, we begin to drop (1) long prompt (2) floor to ppo minisize
+ # after advantages are assigned, we begin to drop (1) long prompt (2) floor to ppo minisize
```
```diff
  # compute training metrics
- metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic))
+ metrics.update(compute_data_metrics(batch=batch, use_critic=self.use_critic, suffix="_after_processing"))
```
So this adds new metrics in addition to the original metrics?
Previously there was only one metric, `critic/score/mean`.
Now it becomes `critic_score_mean_before_processing` and `critic_score_mean_after_processing`: `critic_score_mean_before_processing` is the raw reward mean, and `critic_score_mean_after_processing` is the reward mean used for training.
The previous metric corresponds to `critic_score_mean_after_processing`.
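A small worked example with made-up numbers shows why the raw reward mean and the training reward mean can diverge. It assumes six traces, one dropped for an over-long prompt and one more removed when flooring the batch to a multiple of 4 (a hypothetical mini PPO size); which trace the flooring step removes is also an assumption.

```python
from statistics import mean

# Hypothetical rewards gathered from six agent traces (raw, before processing).
raw_rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]

# Assume trace 1 is dropped for an over-long prompt (5 remain), then the batch
# is floored to a multiple of 4 by discarding the last remaining trace.
after_length_filter = [r for i, r in enumerate(raw_rewards) if i != 1]
training_rewards = after_length_filter[: (len(after_length_filter) // 4) * 4]

score_mean_before = mean(raw_rewards)      # 3 / 6 = 0.5
score_mean_after = mean(training_rewards)  # 3 / 4 = 0.75
```

Here the value that would be logged with the `_before_processing` suffix is 0.5, while the one logged with `_after_processing` is 0.75, even though the agent produced the same traces.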
agentlightning/verl/trainer.py
```python
# (1) Dropping prompts that exceed the maximum allowed length.
# (2) Adjusting the batch size to be a multiple of the mini PPO size.
# Different suffixes are used to label these two stages accordingly.
def compute_data_metrics(batch: DataProto, use_critic: bool = True, suffix="") -> Dict[str, Any]:
```
Suggest `suffix: str = ""`.
fixed
Please merge from main as there are CI updates.
…nto zhiyuhe/fix_training_metrics
/ci
🚀 CI Watcher for correlation id-3472526939-mheqrxlo triggered by comment 3472526939
✅ All runs completed.
We introduce a suffix to distinguish between metrics computed before and after AgentLightning’s post-processing.
"Before" refers to raw reward and advantage values.
"After" refers to values computed following post-processing, which involves:
(1) Dropping prompts that exceed the maximum allowed length.
(2) Adjusting the batch size to be a multiple of the mini PPO size.
Different suffixes are used to label these two stages accordingly.
The suffix `_before_processing` indicates the raw rewards, returns, and prompt lengths gathered directly from agent traces. In contrast, the suffix `_after_processing` indicates metrics computed on the traces after they have been filtered and adjusted for training.
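The two post-processing stages (dropping over-long prompts, then flooring the batch to a multiple of the mini PPO size) can be sketched on a toy batch as below. `postprocess`, the `(prompt_length, reward)` tuple, and the choice to drop samples from the tail when flooring are hypothetical simplifications, not AgentLightning's implementation.

```python
from typing import List, Tuple

# Hypothetical minimal sample: (prompt_length_in_tokens, reward).
Sample = Tuple[int, float]


def postprocess(
    samples: List[Sample], max_prompt_length: int, mini_batch_size: int
) -> List[Sample]:
    """Apply the two post-processing stages to a toy batch.

    (1) Drop samples whose prompt exceeds max_prompt_length.
    (2) Floor the batch to the largest multiple of mini_batch_size,
        here by discarding samples from the tail (an assumption).
    """
    kept = [s for s in samples if s[0] <= max_prompt_length]
    usable = (len(kept) // mini_batch_size) * mini_batch_size
    return kept[:usable]


# Raw batch from agent traces: "_before_processing" metrics describe this.
batch = [(100, 1.0), (900, 0.0), (150, 0.0), (200, 1.0), (120, 1.0), (80, 0.0)]
# Filtered and resized batch: "_after_processing" metrics describe this.
training_batch = postprocess(batch, max_prompt_length=512, mini_batch_size=2)
```

The 900-token prompt is removed in stage (1), leaving five samples; stage (2) then keeps four so the batch divides evenly into mini-batches of two.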