Pulse · pytorch/ao · GitHub

June 2, 2025 – June 9, 2025

Overview

46 Active pull requests

7 Active issues

24 Pull requests merged by 15 people

Fix Per Tensor 3d reshape
#2293 merged Jun 9, 2025
Update Quantization docs to show newer AOConfigs
#2317 merged Jun 9, 2025
Enhance test_autoquant_compile to support ROCm
#2100 merged Jun 9, 2025
Migrate xnnpack/vulkan/boltnn pt2e from torch.ao to torchao (#11363)
#2302 merged Jun 9, 2025
Fix Windows Build
#2333 merged Jun 7, 2025
Add slicing support for fbgemm fp8 and int4
#2308 merged Jun 6, 2025
Rename kleidi_ai in PackedWeightsType and update references
#2318 merged Jun 6, 2025
[BE/docs] Add fp8 rowwise perf table to float8 training readme
#2312 merged Jun 6, 2025
Fix broken circular dep error
#2320 merged Jun 6, 2025
[BE] [docs] Add float8 pretraining tutorial to docsite
#2304 merged Jun 5, 2025
Add Float8ActInt4WeightQATQuantizer
#2289 merged Jun 5, 2025
Enable doc build to run on PRs
#2315 merged Jun 5, 2025
Fix slicing and get_plain() in GemLite
#2288 merged Jun 5, 2025
[BE/docs] Add float8 training api ref to docsite
#2313 merged Jun 5, 2025
primitive scale fix
#2210 merged Jun 5, 2025
Add support for fbgemm fp8 kernels
#2276 merged Jun 5, 2025
Skip native modules if USE_CPP = 0
#2301 merged Jun 5, 2025
[sparse] marlin fixes
#2305 merged Jun 4, 2025
update float8 training readme to include time measurement
#2291 merged Jun 4, 2025
[optim] Fix bug when default dtype is BF16
#2286 merged Jun 4, 2025
Define torchao op library by srcs instead of object libraries
#2290 merged Jun 3, 2025
Remove valpacking code and associated tests
#2295 merged Jun 3, 2025
Fix QAT range learning, ensure scales get gradients
#2280 merged Jun 3, 2025
Removing DocBlock to unblock MXFP4 w/ Unwrap Tensor
#2292 merged Jun 3, 2025

22 Pull requests opened by 15 people

make bfs_graph_trace as internal function
#2294 opened Jun 3, 2025
skip quant/dequant decomposed
#2299 opened Jun 4, 2025
In the folder fbcode/pytorch/ao/torchao/ document the following symbol paths:
#2300 opened Jun 4, 2025
[Benchmarks] Remove additional baseline calculation
#2303 opened Jun 4, 2025
Adding 64x8 Triton kernel
#2307 opened Jun 4, 2025
Back out "Add back AOPerModuleConfig for BC (#2282)"
#2309 opened Jun 5, 2025
Add Claude MD file
#2311 opened Jun 5, 2025
[BE] Make ScalingGranularity module level so it can be rendered in API ref on docsite
#2314 opened Jun 5, 2025
[BE] Make internal torchao.float8 functions private
#2321 opened Jun 5, 2025
add recipe config in aps for fp8
#2322 opened Jun 5, 2025
Add round_scales_to_power_of_2 option for float quantization
#2323 opened Jun 6, 2025
turn off building tests with cpuinfo
#2324 opened Jun 6, 2025
moe quant with dedicated kernels [wip]
#2325 opened Jun 6, 2025
DUMMY PR: add support for hpu in float8 base and compile test for torch ao
#2326 opened Jun 6, 2025
Add dynamic quantization support to gemlite layout
#2327 opened Jun 6, 2025
add cast config for fp8 enablement
#2328 opened Jun 6, 2025
Add support for bmm and `to` for fbgemm Tensor
#2337 opened Jun 8, 2025
Replace debug handle with `from_node` to trace operator transformation
#2339 opened Jun 9, 2025
Implemented a new test case for LUT quantization
#2342 opened Jun 9, 2025
Inference tutorial - Part 3 of e2e series [WIP]
#2343 opened Jun 9, 2025
[BE] Rename qparams for tinygemm
#2344 opened Jun 9, 2025
Add inplace quantizer examples
#2345 opened Jun 10, 2025

1 Issue closed by 1 person

convert_to_float8_training and torch.compile make model slow
#2262 closed Jun 4, 2025

6 Issues opened by 6 people

Distributing ao tensor subclasses in .safetensors checkpoints
#2338 opened Jun 9, 2025
[Question] Combining QAT and Sparsity Training
#2310 opened Jun 5, 2025
[FP8 optimizer feature request] Better FP8 Optimizer with Dynamic Range Expansion
#2306 opened Jun 4, 2025
Use Int8WeightOnlyConfig to quant wan2.1 model, and export to onnx file, Why the onnx weights in my disk are fp32 precision?
#2298 opened Jun 4, 2025
[Windows][build] Two build failures on Windows on latest main branch
#2297 opened Jun 4, 2025
BF16 stochastic rounding does not work distributed (FSDP)
#2296 opened Jun 4, 2025

10 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[CPU] Enable DA8W4 on CPU
#2128 commented on Jun 8, 2025 • 6 new comments
float8 moe training conversion API prototype
#2275 commented on Jun 10, 2025 • 4 new comments
ROCm mx-fp8 Gemm
#2066 commented on Jun 10, 2025 • 2 new comments
Add benchmark numbers to dashboard
#2260 commented on Jun 5, 2025 • 1 new comment
Tensor Subclass + VLLM Compile
#2239 commented on Jun 3, 2025 • 0 new comments
QAT range learning tracker
#2271 commented on Jun 6, 2025 • 0 new comments
Eval hf models using lm_eval
#2179 commented on Jun 9, 2025 • 0 new comments
[WIP] Enable Int4WeightOnlyGPTQQuantizer on Intel GPU.
#2200 commented on Jun 9, 2025 • 0 new comments
Fix failing tests on h100
#2231 commented on Jun 9, 2025 • 0 new comments
Build mxfp4 kernel for sm120a
#2285 commented on Jun 8, 2025 • 0 new comments