Pulse · pytorch/ao · GitHub

May 11, 2025 – June 11, 2025

Overview

134 Active pull requests

27 Active issues

92 Pull requests merged by 29 people

Update README.md to include seamless v2
#2355 merged Jun 11, 2025
[BE] Make ScalingGranularity module level so it can be rendered in API ref on docsite
#2314 merged Jun 11, 2025
Add float8 MoE training readme and runnable example
#2353 merged Jun 11, 2025
float8 moe training conversion API prototype
#2275 merged Jun 10, 2025
Add static quant tutorial
#2047 merged Jun 10, 2025
Update QAT docs, highlight axolotl integration
#2266 merged Jun 10, 2025
[BE] Rename qparams for tinygemm
#2344 merged Jun 10, 2025
Add support for bmm and to for fbgemm Tensor
#2337 merged Jun 10, 2025
add cast config for fp8 enablement
#2328 merged Jun 10, 2025
Fix Per Tensor 3d reshape
#2293 merged Jun 9, 2025
Update Quantization docs to show newer AOConfigs
#2317 merged Jun 9, 2025
Enhance test_autoquant_compile to support ROCm
#2100 merged Jun 9, 2025
Migrate xnnpack/vulkan/boltnn pt2e from torch.ao to torchao (#11363)
#2302 merged Jun 9, 2025
Fix Windows Build
#2333 merged Jun 7, 2025
Add slicing support for fbgemm fp8 and int4
#2308 merged Jun 6, 2025
Rename kleidi_ai in PackedWeightsType and update references
#2318 merged Jun 6, 2025
[BE/docs] Add fp8 rowwise perf table to float8 training readme
#2312 merged Jun 6, 2025
Fix broken circular dep error
#2320 merged Jun 6, 2025
[BE] [docs] Add float8 pretraining tutorial to docsite
#2304 merged Jun 5, 2025
Add Float8ActInt4WeightQATQuantizer
#2289 merged Jun 5, 2025
Enable doc build to run on PRs
#2315 merged Jun 5, 2025
Fix slicing and get_plain() in GemLite
#2288 merged Jun 5, 2025
[BE/docs] Add float8 training api ref to docsite
#2313 merged Jun 5, 2025
primitive scale fix
#2210 merged Jun 5, 2025
Add support for fbgemm fp8 kernels
#2276 merged Jun 5, 2025
Skip native modules if USE_CPP = 0
#2301 merged Jun 5, 2025
[sparse] marlin fixes
#2305 merged Jun 4, 2025
update float8 training readme to include time measurement
#2291 merged Jun 4, 2025
[optim] Fix bug when default dtype is BF16
#2286 merged Jun 4, 2025
Define torchao op library by srcs instead of object libraries
#2290 merged Jun 3, 2025
Remove valpacking code and associated tests
#2295 merged Jun 3, 2025
Fix QAT range learning, ensure scales get gradients
#2280 merged Jun 3, 2025
Removing DocBlock to unblock MXFP4 w/ Unwrap Tensor
#2292 merged Jun 3, 2025
GPTQ updates
#2235 merged Jun 2, 2025
[float8 training] remove duplicate override for view
#2269 merged Jun 2, 2025
Remove Constraint for sm89 hardware
#2281 merged Jun 2, 2025
Fix benchmark_low_bit_adam.py reference
#2287 merged Jun 1, 2025
Fix Bug in MX Builds
#2284 merged May 31, 2025
Add back AOPerModuleConfig for BC
#2282 merged May 31, 2025
Patch the _is_conv_node function
#2257 merged May 31, 2025
Fixes MX formats build for blackwell
#2278 merged May 30, 2025
Update CMake to enable building ops on iOS
#2274 merged May 30, 2025
Resolve logger warnings
#2250 merged May 30, 2025
Add Integration Tests to H100 CI
#2268 merged May 30, 2025
Make optim lazily initialize global state
#2277 merged May 30, 2025
Fix generate.py for fbgemm int4 integration
#2273 merged May 29, 2025
Mark QAT range learning as prototype for now
#2272 merged May 29, 2025
Enable range learning for QAT
#2033 merged May 29, 2025
Fix torchao generate script for cpu device
#2267 merged May 29, 2025
Enable fp16+int4 mixed precision path for int4 xpu path with int zero point
#2240 merged May 29, 2025
integration-vllm-test
#2258 merged May 28, 2025
Add support for fbgemm int4 mm kernel
#2255 merged May 28, 2025
[reland2][ROCm] preshuffled weight mm
#2207 merged May 28, 2025
Support INT8 SDPA template for CPU
#2148 merged May 28, 2025
Fix Per Row scaling for inference
#2253 merged May 27, 2025
Revert "Try fixing CI by pinning pytest (#2238)"
#2263 merged May 27, 2025
Rename AOPerModuleConfig to ModuleFqnToConfig
#2243 merged May 24, 2025
Add backward compatible types to pt2e prepare
#2244 merged May 23, 2025
Relax int4wo device mismatch error
#2254 merged May 23, 2025
Revert "Patch the _is_conv_node function"
#2247 merged May 23, 2025
Patch the _is_conv_node function
#2223 merged May 22, 2025
Update Readme
#1526 merged May 22, 2025
[sparse] Add fp8 sparse gemm with rowwise scaling for activation sparsity
#2242 merged May 22, 2025
Try fixing CI by pinning pytest
#2238 merged May 22, 2025
Relax MOE constraints and add test for torch.mm computation
#2227 merged May 22, 2025
clean up prototype folder
#2232 merged May 21, 2025
remove benchmarks from top level repo
#2233 merged May 21, 2025
Update GemLite to support vLLM V1
#2199 merged May 21, 2025
Remove preserve_zero and zero_point_domain from choose_qparams_affine
#2149 merged May 21, 2025
use correct fp8 quantization dtype for AMD GPU
#2225 merged May 21, 2025
Re-land the PR of "Add INT8 SDPA path for CPU"
#2215 merged May 21, 2025
Update config.py
#2224 merged May 20, 2025
Make torchao pt2e prepare/convert functions compatible with quantizers in torch.ao
#2221 merged May 19, 2025
Enable {conv3d, conv_transpose3d} + bn fusion in pt2e
#2212 merged May 15, 2025
Add CI for Arm Linux
#2211 merged May 15, 2025
ROCm mxfp4 Skips
#2209 merged May 14, 2025
Add support for KleidiAI int4 kernels on aarch64 Linux
#2169 merged May 14, 2025
unbreak CI by fixing MX tests
#2208 merged May 14, 2025
Update __init__.py
#2206 merged May 14, 2025
Add mx_fp4 path
#2201 merged May 13, 2025
Arm_inductor_quantizer for Pt2e quantization
#2139 merged May 13, 2025
[float] document e2e training -> inference flow
#2190 merged May 13, 2025
Remove sparsity/prototype/blocksparse
#2205 merged May 13, 2025
Skips for ROCm (X86 Inductor Tests)
#2202 merged May 13, 2025
Add blockwise fp8 gemm benchmarks to README
#2203 merged May 12, 2025
Feat: Implementation of the DeepSeek blockwise quantization for fp8 tensors
#1763 merged May 12, 2025
Add noindex to 0.10 and 0.9 docs
#2194 merged May 12, 2025
Add subclass based method for inference w/ MXFP8
#2132 merged May 12, 2025
unpin torch to unbreak mac tests
#2198 merged May 12, 2025
2:4 activation sparsity packing kernels
#2012 merged May 12, 2025
Forward fix lint
#2197 merged May 12, 2025
Skip ROCm MoE Quantization
#2191 merged May 12, 2025

42 Pull requests opened by 24 people

Enable Int4WeightOnlyGPTQQuantizer on Intel GPU.
#2200 opened May 12, 2025
Add activation sparsity (24 + fp8 dynamic quant) subclass
#2213 opened May 15, 2025
Convert Pytest to Unittest for tests under test/dtypes/
#2216 opened May 16, 2025
Update temp_build.py
#2218 opened May 17, 2025
Fix failing tests on h100
#2231 opened May 21, 2025
Test older almalinux image
#2236 opened May 21, 2025
[draft] Update regression_test.yml
#2237 opened May 22, 2025
fix _replace_with_custom_fn_if_matches_filter in quant_api.py
#2252 opened May 23, 2025
Add a way to do power of 2 scaling
#2256 opened May 23, 2025
Add benchmark numbers to dashboard
#2260 opened May 24, 2025
test_affine_quantized_float.py pytest to unittest
#2261 opened May 25, 2025
Test d script
#2264 opened May 27, 2025
[do not land] testing if moving this breaks my PRs
#2283 opened May 30, 2025
Build mxfp4 kernel for sm120a
#2285 opened May 31, 2025
make bfs_graph_trace as internal function
#2294 opened Jun 3, 2025
skip quant/dequant decomposed
#2299 opened Jun 4, 2025
In the folder fbcode/pytorch/ao/torchao/ document the following symbol paths:
#2300 opened Jun 4, 2025
[Benchmarks] Remove additional baseline calculation
#2303 opened Jun 4, 2025
Adding 64x8 Triton kernel
#2307 opened Jun 4, 2025
Back out "Add back AOPerModuleConfig for BC (#2282)"
#2309 opened Jun 5, 2025
Add Claude MD file
#2311 opened Jun 5, 2025
[BE] Make internal torchao.float8 functions private
#2321 opened Jun 5, 2025
add recipe config in aps for fp8
#2322 opened Jun 5, 2025
Add round_scales_to_power_of_2 option for float quantization
#2323 opened Jun 6, 2025
turn off building tests with cpuinfo
#2324 opened Jun 6, 2025
moe quant with dedicated kernels [wip]
#2325 opened Jun 6, 2025
DUMMY PR: add support for hpu in float8 base and compile test for torch ao
#2326 opened Jun 6, 2025
Add dynamic quantization support to gemlite layout
#2327 opened Jun 6, 2025
Replace debug handle with `from_node` to trace operator transformation
#2339 opened Jun 9, 2025
Inference tutorial - Part 3 of e2e series [WIP]
#2343 opened Jun 9, 2025
Add inplace quantizer examples
#2345 opened Jun 10, 2025
Add Tutorial on E2E integration into VLLM and minimal Subclass
#2346 opened Jun 10, 2025
[BE] Convert quant_primitives methods private
#2350 opened Jun 10, 2025
[float8] Add fnuz fp8 dtypes to Float8Layout
#2351 opened Jun 10, 2025
make float8 training's force_recompute_fp8_weight_in_bwd flag do nothing
#2356 opened Jun 11, 2025
[WIP] FSDP support for MoE training
#2357 opened Jun 11, 2025
Add test case generator for groupwise low bit LUT based quantization
#2359 opened Jun 11, 2025
Back out "Add fbgemm as a dep for torchao in fbcode"
#2360 opened Jun 11, 2025
Update to new PT Theme
#2361 opened Jun 11, 2025
fix ROCM test failures
#2362 opened Jun 12, 2025
fixing autoquant bug
#2363 opened Jun 12, 2025
[not for land] checking ROCM test length issue
#2364 opened Jun 12, 2025

4 Issues closed by 4 people

convert_to_float8_training and torch.compile make model slow
#2262 closed Jun 4, 2025
cannot save fp8-wo model
#2230 closed May 21, 2025
KleidiAI int4 kernels not loading properly on aarch64 Linux
#2143 closed May 16, 2025
New test files will likely fail on ROCM
#2204 closed May 13, 2025

23 Issues opened by 17 people

Support `torch.int4` `target_dtype` for ops `choose_qparams_affine`, `quantize_affine`, `dequantize_affine`
#2354 opened Jun 10, 2025
DISABLED test_int4_weight_only_quant_subclass_grouped_5_cuda (__main__.TestSubclass)
#2352 opened Jun 10, 2025
Add _apply_fn_to_data in AOBaseClass
#2349 opened Jun 10, 2025
Distributing ao tensor subclasses in .safetensors checkpoints
#2338 opened Jun 9, 2025
[Question] Combining QAT and Sparsity Training
#2310 opened Jun 5, 2025
[FP8 optimizer feature request] Better FP8 Optimizer with Dynamic Range Expansion
#2306 opened Jun 4, 2025
Use Int8WeightOnlyConfig to quant wan2.1 model, and export to onnx file, Why the onnx weights in my disk are fp32 precision?
#2298 opened Jun 4, 2025
[Windows][build]two Build failure on Windows on latest main branch
#2297 opened Jun 4, 2025
BF16 stochastic rounding does not work distributed (FSDP)
#2296 opened Jun 4, 2025
QAT range learning tracker
#2271 opened May 29, 2025
[pt2e] QAT training and FSDP support
#2265 opened May 27, 2025
torch.ao.quantization deprecation tracker
#2259 opened May 24, 2025
We should deprecate Float8LinearConfig.force_recompute_fp8_weight_in_bwd
#2251 opened May 23, 2025
int4_weight_only get plain weight are padded
#2249 opened May 23, 2025
`quantize_(nn.Linear)` doesn't work with module swaps
#2241 opened May 22, 2025
BatchNorm + Convolution fusion in `prepare_pt2e` removal
#2245 opened May 22, 2025
Tensor Subclass + VLLM Compile
#2239 opened May 22, 2025
MXFP Inference Tracking Doc
#2229 opened May 21, 2025
[Quant] Can quant not be decomposed on inductor?
#2228 opened May 20, 2025
newer torchao breaks sglang?
#2226 opened May 19, 2025
TorchAO needs to update its build system
#2222 opened May 19, 2025
Ship all CUDA kernels in a single .so
#2220 opened May 19, 2025
Add MXFP casting kernels from triton Repro
#2217 opened May 16, 2025

18 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[PT2E] Fix per-tensor observer issue with varying shape & rank
#2177 commented on May 22, 2025 • 15 new comments
[CPU] Enable DA8W4 on CPU
#2128 commented on Jun 11, 2025 • 11 new comments
Eval hf models using lm_eval
#2179 commented on Jun 9, 2025 • 4 new comments
ROCm mx-fp8 Gemm
#2066 commented on Jun 10, 2025 • 2 new comments
[WIP] all-gather fp8 for rowwise
#2145 commented on May 23, 2025 • 0 new comments
[sparsity] Add PartialLinear module for structured sparsity
#1982 commented on May 15, 2025 • 0 new comments
Fix wrong scale eps applied
#1770 commented on May 19, 2025 • 0 new comments
[draft] add all_gather_into_tensor
#1737 commented on May 16, 2025 • 0 new comments
MX single node performance tracker
#1768 commented on Jun 11, 2025 • 0 new comments
EfficientTAM
#1384 commented on Jun 2, 2025 • 0 new comments
Sam2 video
#1564 commented on Jun 1, 2025 • 0 new comments
[roadmap/tracker] Low precision training for MoEs
#2147 commented on May 27, 2025 • 0 new comments
[feature request] np.packbits / np.unpackbits, general BitTensors (maybe can be just tensors with dtype torch.bits8 or have a new dtype torch.bits introduced) and bit packed tensors utilities for saving memory / accesses, support for BitTensors wherever BoolTensors are used
#292 commented on May 15, 2025 • 0 new comments
How does this work with ONNX export and quantization?
#777 commented on May 14, 2025 • 0 new comments
[QAT] Linear layer's weight quantization granularity can only be per_group
#2189 commented on May 14, 2025 • 0 new comments
[float8] Add support for blockwise fp8 quantization scheme used in DeepSeek v3
#1594 commented on May 13, 2025 • 0 new comments
Dynamo error with large mesh + AdamWFp8 + bf16 stochastic rounding
#2074 commented on May 12, 2025 • 0 new comments
Can FP8 GEMM be enabled via module hooks instead of module swapping?
#1887 commented on May 12, 2025 • 0 new comments