
Add Float8ActInt4WeightQATQuantizer #2289


Merged: 1 commit merged into main on Jun 5, 2025

Conversation

andrewor14 (Contributor) commented Jun 2, 2025

Summary: This commit adds a QAT quantizer that performs float8 dynamic activation + int4 symmetric per-channel weight fake quantization. Note that there is no corresponding config for float8 QAT yet; it will be added in a future PR.
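For readers unfamiliar with the weight scheme: int4 symmetric per-channel fake quantization computes one scale per output channel and rounds through a quantize-dequantize step. A minimal sketch (illustrative only, not the PR's implementation; the quant range, eps, and gradient handling here are assumptions):

```python
import torch

def int4_symmetric_per_channel_fake_quant(w: torch.Tensor) -> torch.Tensor:
    # One scale per output channel (dim 0 of a Linear weight), symmetric range.
    # Sketch only: real QAT code routes gradients through a straight-through
    # estimator rather than a plain round().
    qmin, qmax = -8, 7                      # assumed 4-bit signed range
    amax = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = amax / qmax
    w_q = torch.clamp(torch.round(w / scale), qmin, qmax)
    return w_q * scale                      # dequantize back to float
```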

Test Plan:
python test/quantization/test_qat.py -k test_float8_fake_quantize
python test/quantization/test_qat.py -k test_qat_fp8a4w_quantizer
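A minimal usage sketch, assuming the new quantizer follows torchao's existing two-step QAT interface (prepare before fine-tuning, convert afterwards); the import path and default constructor arguments are assumptions:

```python
import torch
from torchao.quantization.qat import Float8ActInt4WeightQATQuantizer  # path assumed

model = torch.nn.Sequential(torch.nn.Linear(512, 512))

# prepare(): swaps Linear modules for fake-quantized equivalents that apply
# float8 dynamic activation fake-quant + int4 per-channel weight fake-quant
qat_quantizer = Float8ActInt4WeightQATQuantizer()
model = qat_quantizer.prepare(model)

# ... fine-tune `model` as usual; fake quantization runs in every forward ...
```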


pytorch-bot bot commented Jun 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2289

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 656d17d with merge base 4610850:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jun 2, 2025
@andrewor14 andrewor14 marked this pull request as draft June 2, 2025 22:39
@andrewor14 andrewor14 added the topic: new feature label Jun 2, 2025
@andrewor14 andrewor14 force-pushed the fp8-int4-qat-quantizer branch 2 times, most recently from 452c147 to 620f676 on June 3, 2025 16:37
@andrewor14 andrewor14 marked this pull request as ready for review June 3, 2025 16:37
@andrewor14 andrewor14 requested review from jerryzh168 and vkuzo June 3, 2025 16:37
@andrewor14 andrewor14 force-pushed the fp8-int4-qat-quantizer branch 2 times, most recently from c0b808c to 5e08eca on June 3, 2025 18:09
@@ -17,6 +17,9 @@
from torch.ao.quantization.fx._decomposed import quantized_decomposed_lib # noqa: F401

from torchao import quantize_
from torchao.float8.config import ScalingGranularity
Contributor:

I kinda hate that we have ScalingGranularity and the Granularity of the other FP8 inference APIs

Contributor:

I think this is worth fixing before landing. @andrewor14, how about just using rowwise scaling (since I assume that's the one you want) and removing the option to configure it? That will at least keep this problem away from the BC surface of QAT in a way that we can more easily fix later.
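For context, rowwise scaling means one float8 scale per row of the activation tensor, computed dynamically from that row's amax. A rough sketch (illustrative only, not this PR's code; the float8 dtype, eps, and scale convention are assumptions, and straight-through gradient handling is omitted):

```python
import torch

def float8_rowwise_fake_quantize(x: torch.Tensor) -> torch.Tensor:
    # One scale per row (amax over the last dim), recomputed each call,
    # i.e. "dynamic" activation quantization at rowwise granularity.
    f8_dtype = torch.float8_e4m3fn               # assumed target dtype
    f8_max = torch.finfo(f8_dtype).max
    amax = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
    scale = amax.float() / f8_max
    x_f8 = (x.float() / scale).clamp(-f8_max, f8_max).to(f8_dtype)
    return (x_f8.float() * scale).to(x.dtype)    # quantize-dequantize round trip
```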

Contributor Author:

Yeah sure

@vkuzo vkuzo (Contributor) left a comment:

request changes for removing ScalingGranularity from the user API

@andrewor14 andrewor14 force-pushed the fp8-int4-qat-quantizer branch from 5e08eca to 59ca3ca on June 3, 2025 19:37
andrewor14 (Contributor Author) commented Jun 3, 2025

Removed ScalingGranularity from the public API of Float8ActInt4WeightQATQuantizer; please have another look @vkuzo

@andrewor14 andrewor14 requested a review from vkuzo June 3, 2025 19:39
@andrewor14 andrewor14 changed the title Add Float8ActInt4WeightQATQuantizer Jun 3, 2025
@andrewor14 andrewor14 force-pushed the fp8-int4-qat-quantizer branch 2 times, most recently from cf45f47 to cfead5c on June 3, 2025 22:50
@andrewor14 andrewor14 changed the title Add Float8RowwiseActInt4WeightQATQuantizer Jun 3, 2025
@andrewor14 andrewor14 force-pushed the fp8-int4-qat-quantizer branch from cfead5c to 8269247 on June 4, 2025 17:58
@andrewor14 andrewor14 requested a review from vkuzo June 4, 2025 17:59
@andrewor14 andrewor14 force-pushed the fp8-int4-qat-quantizer branch from 8269247 to 2a371fb on June 4, 2025 20:53
@andrewor14 andrewor14 force-pushed the fp8-int4-qat-quantizer branch from 2a371fb to 656d17d on June 4, 2025 20:54
andrewor14 (Contributor Author) commented:
Ok, I'm merging this. The latest commit doesn't use any float8 training classes or functions. The implementation is also hidden so we can always change this if needed. Please let me know if there are any follow-up issues or concerns @vkuzo @drisspg

@andrewor14 andrewor14 merged commit 0d9631b into main Jun 5, 2025
19 checks passed