huggingface / trl Public

generated from fastai/nbdev_template

Fork 2.5k
Star 17.2k

Code
Issues 555
Pull requests 96
Discussions
Actions
Projects
Security
Insights

Pull requests: huggingface/trl

Labels 36 Milestones 0

96 Open 2,518 Closed

[Experimental] Add SDFT trainer, config, docs, and tests

#4941 opened Jan 31, 2026 by Shekswess

4 of 5 tasks

Update RewardFunc type to use RewardCallable protocol

#4938 opened Jan 31, 2026 by amit9oct

2 of 5 tasks

documentation for modifying chat templates for assistant-only loss

#4937 opened Jan 30, 2026 by jiosephlee

Add Wordle example with Qwen3 thinking activated

#4936 opened Jan 30, 2026 by sergiopaniego • Draft

5 tasks

Add SDPO (Self-Distillation Policy Optimization) trainer

#4935 opened Jan 30, 2026 by MengAiDev

fix: prevent O(2^n) regex backtracking in qwen3_schema

#4934 opened Jan 29, 2026 by wingding12 • Draft

1 task done

fix: sanitize malformed tool calls to prevent TypeError in chat templates

#4933 opened Jan 29, 2026 by wingding12 • Draft

3 tasks done

Automatically add generation tags to chat template for assistant_only_loss=True training (TRL Issue #4879)

#4900 opened Jan 26, 2026 by Neelectric • Draft

3 of 5 tasks

Update wordle.py example with masking of env tokens

#4895 opened Jan 26, 2026 by sergiopaniego

5 tasks

Expose generation index to tool callables in GRPOTrainer

#4894 opened Jan 25, 2026 by lukehinds

4 tasks done

Upgrade GitHub Actions to latest versions

#4893 opened Jan 24, 2026 by salmanmkc

[GRPO] feat: Geometric Sequence Masking

#4891 opened Jan 24, 2026 by LeonEricsson

5 tasks

Fix grpo tool calling

#4890 opened Jan 23, 2026 by akshayballal95

2 tasks done

fix(vLLM): Add tool calling support to VLLMClient.chat()

#4889 opened Jan 23, 2026 by kansalaman

1 of 2 tasks

Add History-Aware Adaptive Difficulty Weighting (HA-DW) to GRPO

#4872 opened Jan 20, 2026 by anonx3247

NeMo-Gym Integration

#4848 opened Jan 17, 2026 by cmunley1

make dpo compatible with fsdp2

#4838 opened Jan 16, 2026 by flutist

4 of 5 tasks

feat: Support log_completion for swanlab backend

#4826 opened Jan 14, 2026 by ZiyiTsang

2 of 5 tasks

Add support for training with multiple OpenEnv environments

#4824 opened Jan 13, 2026 by lewtun • Draft

5 tasks

Add Entropy Adaptive Fine Tuning to SFT Trainer

#4802 opened Jan 10, 2026 by electroglyph

forward_masked_logits in SFTTrainer

#4794 opened Jan 8, 2026 by qgallouedec • Draft

5 tasks

Refactor KTO [3/N]: Extract dataset processing to _prepare_dataset method

#4788 opened Jan 8, 2026 by albertvillanova

Refactor KTO [2/N]: Improve config validation in KTOConfig

#4787 opened Jan 8, 2026 by albertvillanova

make dpo compatible with qwen3vl

#4773 opened Jan 4, 2026 by flutist

feat(sft): add generation-based evaluation support to SFTTrainer

#4768 opened Jan 2, 2026 by CodersAcademy006
