guox18

Pinned

  1. IFDecorator (Public)

    Introduces difficulty (rather than complexity) into instruction data and mitigates reward hacking during RLVR training.

    Python · 9 stars · 1 fork

  2. rejection-sampling-recipes (Public)

    Python 5