Implemented a new test case for LUT quantization #2342
base: main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2342
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 276d953 with merge base d72a6d1. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
std::pair<std::vector<int8_t>, std::vector<int8_t>>
generate_simple_u_to_s_lut_and_indices(
What does "simple_u_to_s" mean?
The term "simple_u_to_s" refers to converting an unsigned index to a signed index. The example provided is a simplified scenario where we only need to map values to each slot. I will work on making the function name more descriptive and readable.
Yeah, let's come up with a better name, e.g., maybe "generate_random_int8_lut_and_indices" or something like that
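For reference, a minimal sketch of what the renamed helper might look like, assuming the suggested name generate_random_int8_lut_and_indices and illustrative lut_size/num_indices parameters (neither is from the PR):

#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical sketch, not the PR's implementation: fill a LUT with random
// int8 values and draw random indices into it, keeping the original
// std::pair<std::vector<int8_t>, std::vector<int8_t>> return shape.
std::pair<std::vector<int8_t>, std::vector<int8_t>>
generate_random_int8_lut_and_indices(size_t lut_size, size_t num_indices) {
  std::mt19937 gen(42); // fixed seed keeps the test reproducible
  std::uniform_int_distribution<int> value_dist(-128, 127);
  std::uniform_int_distribution<int> index_dist(0, static_cast<int>(lut_size) - 1);

  std::vector<int8_t> lut(lut_size);
  for (auto& v : lut) {
    v = static_cast<int8_t>(value_dist(gen));
  }
  std::vector<int8_t> indices(num_indices);
  for (auto& i : indices) {
    i = static_cast<int8_t>(index_dist(gen));
  }
  return {lut, indices};
}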
 * input data type (T_in) to its corresponding floating-point representation for
 * each quantization group.
 *
 * @tparam T_in The data type of the quantized values (e.g., int8_t). This
Where is T_in defined in the code?
I updated the implementation but forgot to update the function description. I will provide a new description.
 * @param has_zeros A flag indicating if the zero-points should be used.
 * @return A flattened std::vector<float> containing all the group LUTs concatenated.
 */
std::vector<int8_t> generate_requant_lut_from_params(
Why are we returning the int8_t type, and not float32_t, for the LUT?
In the context of test_channelwise_8bit_activation_groupwise_lowbit_weight_lut, the code requires an int8_t type to function correctly. I initially designed the implementation to serve as a second quantization step, hence the int8_t output. However, I agree that it should be more flexible to accommodate different data types for various scenarios. I attempted to address this in the code but overlooked this function. I will make the necessary adjustments.
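For what it's worth, a float variant consistent with the doc comment above (a flattened std::vector<float> of per-group dequantization LUTs) could look like the following sketch; the name generate_dequant_lut_from_params and the parameter list are assumptions, not the PR's actual signature:

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: build one dequantization LUT per group, mapping each
// quantized value q in [q_min, q_max] to scale * (q - zero_point).
std::vector<float> generate_dequant_lut_from_params(
    int q_min,
    int q_max,
    const std::vector<float>& scales,
    const std::vector<int8_t>& zeros,
    bool has_zeros) {
  const size_t lut_size_per_group = static_cast<size_t>(q_max - q_min + 1);
  const size_t num_groups = scales.size(); // sized by scales, not zeros
  std::vector<float> luts(num_groups * lut_size_per_group);
  for (size_t g = 0; g < num_groups; ++g) {
    const int zero_point = has_zeros ? static_cast<int>(zeros[g]) : 0;
    for (int q = q_min; q <= q_max; ++q) {
      // Each entry holds the dequantized float value for quantized value q.
      luts[g * lut_size_per_group + static_cast<size_t>(q - q_min)] =
          scales[g] * static_cast<float>(q - zero_point);
    }
  }
  return luts;
}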
const size_t lut_size_per_group = static_cast<size_t>(q_max) - q_min + 1;
const int lut_index_offset = q_min;

const int num_groups = zeros.size();
What is this when has_zeros=false?
It could lead to errors. I should not rely on the size of zeros when has_zeros is false.
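One possible shape for that fix, as a sketch under the assumption that scales is always populated with one entry per group (the helper name resolve_num_groups is hypothetical, not the PR's code):

#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: derive num_groups from scales so that an empty zeros
// vector combined with has_zeros == false can no longer yield num_groups == 0.
int resolve_num_groups(
    const std::vector<float>& scales,
    const std::vector<int8_t>& zeros,
    bool has_zeros) {
  if (has_zeros && zeros.size() != scales.size()) {
    // A silent mismatch here would corrupt LUT indexing later.
    throw std::invalid_argument(
        "zeros must have one entry per group when has_zeros is true");
  }
  return static_cast<int>(scales.size());
}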
for (int q_val = q_min; q_val <= q_max; ++q_val) {
  size_t lut_idx = group_idx * lut_size_per_group + (q_val - lut_index_offset);
  // The LUT stores the result of (quantized_value - zero_point)
  luts[lut_idx] = static_cast<int8_t>(q_val - zero_point);
Why is the lut output dtype int8_t?
}

// Helper to perform quantization
static std::vector<T_in> quantize_input(
What is this for?
Can't we just generate the random LUT for testing?
Yes, we can generate a random LUT for testing purposes. My initial design aimed to extend this function to accommodate more scenarios, such as secondary quantization. This would allow it to work in conjunction with other components. My goal was to ensure that it could be integrated and tested as part of a larger system.
I'm not sure I follow how you're planning on using this. For LUT quantization, we won't be quantizing in this way.
We also shouldn't re-invent the quantization logic because we have this elsewhere in other tests, e.g., here: https://github.com/pytorch/ao/blob/main/torchao/experimental/kernels/cpu/aarch64/tests/test_utils.h#L110.
If you want to refactor that code to use non-aarch64 code, that'd be great, but I'm not sure it's related to the LUT project.
Got it. Thanks!
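To make the suggestion concrete, the test could skip quantize_input entirely and drive the kernel with random data. A sketch, assuming the generate_random_int8_lut_and_indices helper outlined earlier in this thread:

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical test scaffold: build a random LUT plus indices, then compute
// the scalar reference by looking each index up directly. The kernel under
// test would be checked against `expected`; no quantization step is needed.
void example_random_lut_test() {
  const size_t lut_size = 16; // e.g., 4-bit indices
  const size_t num_indices = 256;
  auto [lut, indices] = generate_random_int8_lut_and_indices(lut_size, num_indices);

  std::vector<int8_t> expected(num_indices);
  for (size_t i = 0; i < num_indices; ++i) {
    expected[i] = lut[static_cast<size_t>(indices[i])];
  }
  // ... run the kernel under test on (lut, indices) and compare to expected.
}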
}

// Helper to compute the reference output
static std::vector<float> compute_expected_output(
Is this doing the matrix product of activations * dequantized_weights?
It looks like it's just dequantizing the weights?
I may have overdesigned this part. My initial intention was to accommodate a more complex setup, such as integrating with other quantization methods or using different LUT designs. That's why I used the GroundTruthStrategy to set the dequantization strategies. It seems unnecessary in this context.
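For clarity, the reference the reviewer describes would combine the LUT dequantization with the matrix product. A sketch with assumed shapes and names (row-major m x k float activations, k x n weight indices, quantization groups running along k; none of this is the PR's actual API):

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: dequantize each weight through its group's LUT, then
// take the matrix product with the float activations.
std::vector<float> compute_expected_output(
    const std::vector<float>& activations,      // m x k, row-major
    const std::vector<uint8_t>& weight_indices, // k x n, row-major
    const std::vector<float>& luts,             // flattened per-group LUTs
    size_t m, size_t k, size_t n,
    size_t group_size, size_t lut_size_per_group) {
  std::vector<float> out(m * n, 0.0f);
  for (size_t i = 0; i < m; ++i) {
    for (size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (size_t p = 0; p < k; ++p) {
        const size_t group = p / group_size; // groups assumed along k
        const float w = luts[group * lut_size_per_group + weight_indices[p * n + j]];
        acc += activations[i * k + p] * w;
      }
      out[i * n + j] = acc;
    }
  }
  return out;
}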
Force-pushed from abdb958 to 276d953.
No description provided.