Commits

Commits on Jan 29, 2026

Commits on Jan 28, 2026

Add validation for `sync_ref_model` in `GRPOTrainer` and `RLOOTrainer` when using PEFT models (#4912)

qgallouedec and albertvillanova authored
Update learning rate comments and add assertions for reference model parameters in GRPO and RLOO tests (#4914)

qgallouedec and albertvillanova authored
Remove chat template setup in dpo_vlm.py (#4906)
qgallouedec authored
Fix extra EOS appended in DPO preprocessing for conversational data (#4908)

qgallouedec and albertvillanova authored
Fix CI ValueError for 0 temperature (#4916)
albertvillanova authored
Fix CI AssertionError: assert not True (#4921)
albertvillanova authored
docs: add DoRA (2402.09353) to Paper Index (#4892)

billycrapediem and qgallouedec authored
Remove gradient checkpointing option from various training scripts (#4905)
qgallouedec authored
Comment about overriding prediction_step in GRPOTrainer and RLOOTrainer (#4913)
qgallouedec authored
`device_map` init consistency in GRPO/RLOO/KTO (#4909)
qgallouedec authored
Fix help text formatting for `max_length` in `RewardConfig` and `SFTConfig` (#4910)
qgallouedec authored
Rearrange variable assignments in `DataCollatorForVisionLanguageModeling` (#4911)
qgallouedec authored
Fix CI TypeError in llm-blender tests (#4919)
albertvillanova authored
Created new PTT integration docs as requested (#4907)

authored

Commits on Jan 27, 2026

Commits on Jan 26, 2026

Commits on Jan 23, 2026

Fix RewardTrainer's results not reproducible (#4887)

liyc-ai and qgallouedec authored

Commits on Jan 22, 2026

Commits on Jan 21, 2026