huggingface/trl (public repository, generated from fastai/nbdev_template)
Forks: 2.5k · Stars: 17.2k · Issues: 555 · Pull requests: 96
Commit history (branch: main)
Commits on Jan 29, 2026
035c3ff [GRPO] Add parquet logging for completions with individual rewards (#4818), by 5 people
414e60f Set default top_k to 0 in VLLMClient (#4927), by albertvillanova
df332dc Fix import statement for import_utils in vllm_client.py (#4932), by qgallouedec
27998e9 Fix profiling of VLLMGeneration.sync_weights (#4931), by albertvillanova
43fb8d3 Set model dtype to float32 in experimental tests of trainers (#4925), by albertvillanova
5a7481e Move VLLMClient to generation module (#4928), by albertvillanova
21a0d70 Require transformers<5 with PairRMJudge (#4926), by albertvillanova
4348375 Set model dtype to float32 in tests of trainers (#4924), by albertvillanova
a6cbf27 Support tool call data in `is_conversational` (#4923), by qgallouedec
Commits on Jan 28, 2026
ad91c6f Add validation for `sync_ref_model` in `GRPOTrainer` and `RLOOTrainer` when using PEFT models (#4912), by qgallouedec and albertvillanova
04717ff Update learning rate comments and add assertions for reference model parameters in GRPO and RLOO tests (#4914), by qgallouedec and albertvillanova
b322d9b Remove chat template setup in dpo_vlm.py (#4906), by qgallouedec
a70b4e0 Fix extra EOS appended in DPO preprocessing for conversational data (#4908), by qgallouedec and albertvillanova
8464b0e Fix CI ValueError for 0 temperature (#4916), by albertvillanova
5461a74 Fix CI AssertionError: assert not True (#4921), by albertvillanova
d54381a docs: add DoRA (2402.09353) to Paper Index (#4892), by billycrapediem and qgallouedec
f2f6b32 Remove gradient checkpointing option from various training scripts (#4905), by qgallouedec
6cbc102 Comment about overriding prediction_step in GRPOTrainer and RLOOTrainer (#4913), by qgallouedec
f40edf9 `device_map` init consistency in GRPO/RLOO/KTO (#4909), by qgallouedec
a7070f9 Fix help text formatting for `max_length` in `RewardConfig` and `SFTConfig` (#4910), by qgallouedec
66efc0e Rearrange variable assignments in `DataCollatorForVisionLanguageModeling` (#4911), by qgallouedec
e9a2f16 Fix CI TypeError in llm-blender tests (#4919), by albertvillanova
4f82320 Created new PTT integration docs as requested (#4907), by 3 people
Commits on Jan 27, 2026
0eb66d8 Refactor vLLM generation [1/N]: Extract vLLM generation (#4700), by albertvillanova
226ef57 Fix CI AssertionError: Parameter has not changed (#4904), by albertvillanova
956986e Fix CI NotImplementedError for bfloat16 (#4902), by albertvillanova
Commits on Jan 26, 2026
4322778 Transformers v5 release: extend xfail condition for `TestGRPOTrainer.test_training_vlm_and_liger` and update version checks (#4898), by qgallouedec
e106972 GOLD training speed up (#4888), by 141forever and kashif
Commits on Jan 23, 2026
c477e88 Fix RewardTrainer's results not reproducible (#4887), by liyc-ai and qgallouedec
Commits on Jan 22, 2026
ba05323 Fix import path for `get_open_port` based on vLLM version (#4883), by qgallouedec
e66a138 Mark ZeRO 2 as xfail in distributed tests due to current failure (#4885), by qgallouedec
Commits on Jan 21, 2026
a60d75a Test distributed training for `RewardTrainer`, `RLOOTrainer` and `GRPOTrainer` (#4823), by qgallouedec and albertvillanova
60e4674 Enable vLLM sleep mode for generation in Online DPO (#4882), by 3 people
0a881bc Bugfix: Logprob drift in vLLM serving mode (compared to colocate mode) (#4873), by 3 people
16b0903 Fix SFT training for prompt-completion type and transformers v5 (#4880), by qgallouedec