Insights: pytorch/ao
Overview
92 Pull requests merged by 29 people
-
Update README.md to include seamless v2
#2355 merged
Jun 11, 2025 -
[BE] Make ScalingGranularity module level so it can be rendered in API ref on docsite
#2314 merged
Jun 11, 2025 -
Add float8 MoE training readme and runnable example
#2353 merged
Jun 11, 2025 -
float8 moe training conversion API prototype
#2275 merged
Jun 10, 2025 -
Add static quant tutorial
#2047 merged
Jun 10, 2025 -
Update QAT docs, highlight axolotl integration
#2266 merged
Jun 10, 2025 -
[BE] Rename qparams for tinygemm
#2344 merged
Jun 10, 2025 -
Add support for bmm and `to` for fbgemm Tensor
#2337 merged
Jun 10, 2025 -
add cast config for fp8 enablement
#2328 merged
Jun 10, 2025 -
Fix Per Tensor 3d reshape
#2293 merged
Jun 9, 2025 -
Update Quantization docs to show newer AOConfigs
#2317 merged
Jun 9, 2025 -
Enhance test_autoquant_compile to support ROCm
#2100 merged
Jun 9, 2025 -
Migrate xnnpack/vulkan/boltnn pt2e from torch.ao to torchao (#11363)
#2302 merged
Jun 9, 2025 -
Fix Windows Build
#2333 merged
Jun 7, 2025 -
Add slicing support for fbgemm fp8 and int4
#2308 merged
Jun 6, 2025 -
Rename kleidi_ai in PackedWeightsType and update references
#2318 merged
Jun 6, 2025 -
[BE/docs] Add fp8 rowwise perf table to float8 training readme
#2312 merged
Jun 6, 2025 -
Fix broken circular dep error
#2320 merged
Jun 6, 2025 -
[BE] [docs] Add float8 pretraining tutorial to docsite
#2304 merged
Jun 5, 2025 -
Add Float8ActInt4WeightQATQuantizer
#2289 merged
Jun 5, 2025 -
Enable doc build to run on PRs
#2315 merged
Jun 5, 2025 -
Fix slicing and get_plain() in GemLite
#2288 merged
Jun 5, 2025 -
[BE/docs] Add float8 training api ref to docsite
#2313 merged
Jun 5, 2025 -
primitive scale fix
#2210 merged
Jun 5, 2025 -
Add support for fbgemm fp8 kernels
#2276 merged
Jun 5, 2025 -
Skip native modules if USE_CPP = 0
#2301 merged
Jun 5, 2025 -
[sparse] marlin fixes
#2305 merged
Jun 4, 2025 -
update float8 training readme to include time measurement
#2291 merged
Jun 4, 2025 -
[optim] Fix bug when default dtype is BF16
#2286 merged
Jun 4, 2025 -
Define torchao op library by srcs instead of object libraries
#2290 merged
Jun 3, 2025 -
Remove valpacking code and associated tests
#2295 merged
Jun 3, 2025 -
Fix QAT range learning, ensure scales get gradients
#2280 merged
Jun 3, 2025 -
Removing DocBlock to unblock MXFP4 w/ Unwrap Tensor
#2292 merged
Jun 3, 2025 -
GPTQ updates
#2235 merged
Jun 2, 2025 -
[float8 training] remove duplicate override for view
#2269 merged
Jun 2, 2025 -
Remove Constraint for sm89 hardware
#2281 merged
Jun 2, 2025 -
Fix benchmark_low_bit_adam.py reference
#2287 merged
Jun 1, 2025 -
Fix Bug in MX Builds
#2284 merged
May 31, 2025 -
Add back AOPerModuleConfig for BC
#2282 merged
May 31, 2025 -
Patch the _is_conv_node function
#2257 merged
May 31, 2025 -
Fixes MX formats build for blackwell
#2278 merged
May 30, 2025 -
Update CMake to enable building ops on iOS
#2274 merged
May 30, 2025 -
Resolve logger warnings
#2250 merged
May 30, 2025 -
Add Integration Tests to H100 CI
#2268 merged
May 30, 2025 -
Make optim lazily initialize global state
#2277 merged
May 30, 2025 -
Fix generate.py for fbgemm int4 integration
#2273 merged
May 29, 2025 -
Mark QAT range learning as prototype for now
#2272 merged
May 29, 2025 -
Enable range learning for QAT
#2033 merged
May 29, 2025 -
Fix torchao generate script for cpu device
#2267 merged
May 29, 2025 -
Enable fp16+int4 mixed precision path for int4 xpu path with int zero point
#2240 merged
May 29, 2025 -
integration-vllm-test
#2258 merged
May 28, 2025 -
Add support for fbgemm int4 mm kernel
#2255 merged
May 28, 2025 -
[reland2][ROCm] preshuffled weight mm
#2207 merged
May 28, 2025 -
Support INT8 SDPA template for CPU
#2148 merged
May 28, 2025 -
Fix Per Row scaling for inference
#2253 merged
May 27, 2025 -
Revert "Try fixing CI by pinning pytest (#2238)"
#2263 merged
May 27, 2025 -
Rename AOPerModuleConfig to ModuleFqnToConfig
#2243 merged
May 24, 2025 -
Add backward compatible types to pt2e prepare
#2244 merged
May 23, 2025 -
Relax int4wo device mismatch error
#2254 merged
May 23, 2025 -
Revert "Patch the _is_conv_node function"
#2247 merged
May 23, 2025 -
Patch the _is_conv_node function
#2223 merged
May 22, 2025 -
Update Readme
#1526 merged
May 22, 2025 -
[sparse] Add fp8 sparse gemm with rowwise scaling for activation sparsity
#2242 merged
May 22, 2025 -
Try fixing CI by pinning pytest
#2238 merged
May 22, 2025 -
Relax MOE constraints and add test for torch.mm computation
#2227 merged
May 22, 2025 -
clean up prototype folder
#2232 merged
May 21, 2025 -
remove benchmarks from top level repo
#2233 merged
May 21, 2025 -
Update GemLite to support vLLM V1
#2199 merged
May 21, 2025 -
Remove preserve_zero and zero_point_domain from choose_qparams_affine
#2149 merged
May 21, 2025 -
use correct fp8 quantization dtype for AMD GPU
#2225 merged
May 21, 2025 -
Re-land the PR of "Add INT8 SDPA path for CPU"
#2215 merged
May 21, 2025 -
Update config.py
#2224 merged
May 20, 2025 -
Make torchao pt2e prepare/convert functions compatible with quantizers in torch.ao
#2221 merged
May 19, 2025 -
Enable {conv3d, conv_transpose3d} + bn fusion in pt2e
#2212 merged
May 15, 2025 -
Add CI for Arm Linux
#2211 merged
May 15, 2025 -
ROCm mxfp4 Skips
#2209 merged
May 14, 2025 -
Add support for KleidiAI int4 kernels on aarch64 Linux
#2169 merged
May 14, 2025 -
unbreak CI by fixing MX tests
#2208 merged
May 14, 2025 -
Update __init__.py
#2206 merged
May 14, 2025 -
Add mx_fp4 path
#2201 merged
May 13, 2025 -
Arm_inductor_quantizer for Pt2e quantization
#2139 merged
May 13, 2025 -
[float] document e2e training -> inference flow
#2190 merged
May 13, 2025 -
Remove `sparsity/prototype/blocksparse`
#2205 merged
May 13, 2025 -
Skips for ROCm (X86 Inductor Tests)
#2202 merged
May 13, 2025 -
Add blockwise fp8 gemm benchmarks to README
#2203 merged
May 12, 2025 -
Feat: Implementation of the DeepSeek blockwise quantization for fp8 tensors
#1763 merged
May 12, 2025 -
Add noindex to 0.10 and 0.9 docs
#2194 merged
May 12, 2025 -
Add subclass based method for inference w/ MXFP8
#2132 merged
May 12, 2025 -
unpin torch to unbreak mac tests
#2198 merged
May 12, 2025 -
2:4 activation sparsity packing kernels
#2012 merged
May 12, 2025 -
Forward fix lint
#2197 merged
May 12, 2025 -
Skip ROCm MoE Quantization
#2191 merged
May 12, 2025
42 Pull requests opened by 24 people
-
Enable Int4WeightOnlyGPTQQuantizer on Intel GPU.
#2200 opened
May 12, 2025 -
Add activation sparsity (24 + fp8 dynamic quant) subclass
#2213 opened
May 15, 2025 -
Convert Pytest to Unittest for tests under test/dtypes/
#2216 opened
May 16, 2025 -
Update temp_build.py
#2218 opened
May 17, 2025 -
Fix failing tests on h100
#2231 opened
May 21, 2025 -
Test older almalinux image
#2236 opened
May 21, 2025 -
[draft] Update regression_test.yml
#2237 opened
May 22, 2025 -
fix _replace_with_custom_fn_if_matches_filter in quant_api.py
#2252 opened
May 23, 2025 -
Add a way to do power of 2 scaling
#2256 opened
May 23, 2025 -
Add benchmark numbers to dashboard
#2260 opened
May 24, 2025 -
test_affine_quantized_float.py pytest to unittest
#2261 opened
May 25, 2025 -
Test d script
#2264 opened
May 27, 2025 -
[do not land] testing if moving this breaks my PRs
#2283 opened
May 30, 2025 -
Build mxfp4 kernel for sm120a
#2285 opened
May 31, 2025 -
make bfs_graph_trace as internal function
#2294 opened
Jun 3, 2025 -
skip quant/dequant decomposed
#2299 opened
Jun 4, 2025 -
In the folder fbcode/pytorch/ao/torchao/ document the following symbol paths:
#2300 opened
Jun 4, 2025 -
[Benchmarks] Remove additional baseline calculation
#2303 opened
Jun 4, 2025 -
Adding 64x8 Triton kernel
#2307 opened
Jun 4, 2025 -
Back out "Add back AOPerModuleConfig for BC (#2282)"
#2309 opened
Jun 5, 2025 -
Add Claude MD file
#2311 opened
Jun 5, 2025 -
[BE] Make internal torchao.float8 functions private
#2321 opened
Jun 5, 2025 -
add recipe config in aps for fp8
#2322 opened
Jun 5, 2025 -
Add round_scales_to_power_of_2 option for float quantization
#2323 opened
Jun 6, 2025 -
turn off building tests with cpuinfo
#2324 opened
Jun 6, 2025 -
moe quant with dedicated kernels [wip]
#2325 opened
Jun 6, 2025 -
DUMMY PR: add support for hpu in float8 base and compile test for torch ao
#2326 opened
Jun 6, 2025 -
Add dynamic quantization support to gemlite layout
#2327 opened
Jun 6, 2025 -
Replace debug handle with `from_node` to trace operator transformation
#2339 opened
Jun 9, 2025 -
Inference tutorial - Part 3 of e2e series [WIP]
#2343 opened
Jun 9, 2025 -
Add inplace quantizer examples
#2345 opened
Jun 10, 2025 -
Add Tutorial on E2E integration into VLLM and minimal Subclass
#2346 opened
Jun 10, 2025 -
[BE] Convert quant_primitives methods private
#2350 opened
Jun 10, 2025 -
[float8] Add fnuz fp8 dtypes to Float8Layout
#2351 opened
Jun 10, 2025 -
make float8 training's force_recompute_fp8_weight_in_bwd flag do nothing
#2356 opened
Jun 11, 2025 -
[WIP] FSDP support for MoE training
#2357 opened
Jun 11, 2025 -
Add test case generator for groupwise low bit LUT based quantization
#2359 opened
Jun 11, 2025 -
Back out "Add fbgemm as a dep for torchao in fbcode"
#2360 opened
Jun 11, 2025 -
Update to new PT Theme
#2361 opened
Jun 11, 2025 -
fix ROCM test failures
#2362 opened
Jun 12, 2025 -
fixing autoquant bug
#2363 opened
Jun 12, 2025 -
[not for land] checking ROCM test length issue
#2364 opened
Jun 12, 2025
4 Issues closed by 4 people
-
convert_to_float8_training and torch.compile make model slow
#2262 closed
Jun 4, 2025 -
cannot save fp8-wo model
#2230 closed
May 21, 2025 -
KleidiAI int4 kernels not loading properly on aarch64 Linux
#2143 closed
May 16, 2025 -
New test files will likely fail on ROCM
#2204 closed
May 13, 2025
23 Issues opened by 17 people
-
DISABLED test_int4_weight_only_quant_subclass_grouped_5_cuda (__main__.TestSubclass)
#2352 opened
Jun 10, 2025 -
Add _apply_fn_to_data in AOBaseClass
#2349 opened
Jun 10, 2025 -
Distributing ao tensor subclasses in .safetensors checkpoints
#2338 opened
Jun 9, 2025 -
[Question] Combining QAT and Sparsity Training
#2310 opened
Jun 5, 2025 -
[FP8 optimizer feature request] Better FP8 Optimizer with Dynamic Range Expansion
#2306 opened
Jun 4, 2025 -
[Windows][build] Two build failures on Windows on latest main branch
#2297 opened
Jun 4, 2025 -
BF16 stochastic rounding does not work distributed (FSDP)
#2296 opened
Jun 4, 2025 -
QAT range learning tracker
#2271 opened
May 29, 2025 -
[pt2e] QAT training and FSDP support
#2265 opened
May 27, 2025 -
torch.ao.quantization deprecation tracker
#2259 opened
May 24, 2025 -
We should deprecate Float8LinearConfig.force_recompute_fp8_weight_in_bwd
#2251 opened
May 23, 2025 -
int4_weight_only get plain weight are padded
#2249 opened
May 23, 2025 -
`quantize_(nn.Linear)` doesn't work with module swaps
#2241 opened
May 22, 2025 -
BatchNorm + Convolution fusion in `prepare_pt2e` removal
#2245 opened
May 22, 2025 -
Tensor Subclass + VLLM Compile
#2239 opened
May 22, 2025 -
MXFP Inference Tracking Doc
#2229 opened
May 21, 2025 -
[Quant] Can quant not be decomposed on inductor?
#2228 opened
May 20, 2025 -
newer torchao breaks sglang?
#2226 opened
May 19, 2025 -
TorchAO needs to update its build system
#2222 opened
May 19, 2025 -
Ship all CUDA kernels in a single .so
#2220 opened
May 19, 2025 -
Add MXFP casting kernels from triton Repro
#2217 opened
May 16, 2025
18 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[PT2E] Fix per-tensor observer issue with varying shape & rank
#2177 commented on
May 22, 2025 • 15 new comments -
[CPU] Enable DA8W4 on CPU
#2128 commented on
Jun 11, 2025 • 11 new comments -
Eval hf models using lm_eval
#2179 commented on
Jun 9, 2025 • 4 new comments -
ROCm mx-fp8 Gemm
#2066 commented on
Jun 10, 2025 • 2 new comments -
[WIP] all-gather fp8 for rowwise
#2145 commented on
May 23, 2025 • 0 new comments -
[sparsity] Add PartialLinear module for structured sparsity
#1982 commented on
May 15, 2025 • 0 new comments -
Fix wrong scale eps applied
#1770 commented on
May 19, 2025 • 0 new comments -
[draft] add all_gather_into_tensor
#1737 commented on
May 16, 2025 • 0 new comments -
MX single node performance tracker
#1768 commented on
Jun 11, 2025 • 0 new comments -
EfficientTAM
#1384 commented on
Jun 2, 2025 • 0 new comments -
Sam2 video
#1564 commented on
Jun 1, 2025 • 0 new comments -
[roadmap/tracker] Low precision training for MoEs
#2147 commented on
May 27, 2025 • 0 new comments -
[feature request] np.packbits / np.unpackbits, general BitTensors (maybe can be just tensors with dtype torch.bits8 or have a new dtype torch.bits introduced) and bit packed tensors utilities for saving memory / accesses, support for BitTensors wherever BoolTensors are used
#292 commented on
May 15, 2025 • 0 new comments -
How does this work with ONNX export and quantization?
#777 commented on
May 14, 2025 • 0 new comments -
[QAT] Linear layer's weight quantization granularity can only be per_group
#2189 commented on
May 14, 2025 • 0 new comments -
[float8] Add support for blockwise fp8 quantization scheme used in DeepSeek v3
#1594 commented on
May 13, 2025 • 0 new comments -
Dynamo error with large mesh + AdamWFp8 + bf16 stochastic rounding
#2074 commented on
May 12, 2025 • 0 new comments -
Can FP8 GEMM be enabled via module hooks instead of module swapping?
#1887 commented on
May 12, 2025 • 0 new comments