feat: enable QASYMM8_SIGNED -> F32 in CpuGemmAssemblyDispatch #1297

Open
alvoron wants to merge 1 commit into ARM-software:main from alvoron:alvoron_qasymm8_signed_f32_dispatch

alvoron (Contributor) commented Jun 17, 2026

Two gaps in CpuGemmAssemblyDispatch blocked QASYMM8_SIGNED input from validating with an F32 output tensor, even though the underlying arm_gemm kernel GemmInterleaved<int8_t, int8_t, float, DequantizeFloat> already supports this combination on AArch64.

Gap 1 - has_opt_impl: The QASYMM8_SIGNED branch only tested for S32 and S8 outputs. Passing F32 fell through to the S8 -> S8 Requantize32 check, which fails because has_opt_gemm<int8_t, int8_t, int8_t, Requantize32> and has_opt_gemm<int8_t, int8_t, float, DequantizeFloat> are different instantiations.
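
The fall-through can be pictured with a toy model. This is only a sketch, not the Compute Library source: `HasKernel`, `OutType`, `has_opt_impl_before`, and `has_opt_impl_after` are hypothetical names, the local `has_opt_gemm` is a simplified stand-in for the real function, and the kernel table is an assumption that mirrors the three combinations named above. It illustrates why a query against the `Requantize32` instantiation cannot be satisfied by a kernel that exists only under `DequantizeFloat`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace sketch {

// Hypothetical stand-ins for the output-stage tag types.
struct Nothing {};
struct Requantize32 {};
struct DequantizeFloat {};

// Toy kernel table: every <lhs, rhs, out, stage> tuple is a separate instantiation.
template <typename Tlhs, typename Trhs, typename Tout, typename Stage>
struct HasKernel : std::false_type {};
template <> struct HasKernel<std::int8_t, std::int8_t, std::int32_t, Nothing> : std::true_type {};
template <> struct HasKernel<std::int8_t, std::int8_t, std::int8_t, Requantize32> : std::true_type {};
template <> struct HasKernel<std::int8_t, std::int8_t, float, DequantizeFloat> : std::true_type {};

enum class OutType { S32, S8, F32 };

// Simplified lookup: a kernel matches only if it produces the requested output type.
template <typename Tout, typename Stage>
bool has_opt_gemm(OutType requested, OutType produced) {
    return HasKernel<std::int8_t, std::int8_t, Tout, Stage>::value && requested == produced;
}

// Before: only S32 and S8 are tested, so F32 lands on the S8 -> S8 check.
inline bool has_opt_impl_before(OutType out) {
    if (out == OutType::S32) {
        return has_opt_gemm<std::int32_t, Nothing>(out, OutType::S32);
    }
    return has_opt_gemm<std::int8_t, Requantize32>(out, OutType::S8);
}

// After: an explicit F32 branch queries the DequantizeFloat instantiation.
inline bool has_opt_impl_after(OutType out) {
    if (out == OutType::S32) {
        return has_opt_gemm<std::int32_t, Nothing>(out, OutType::S32);
    }
    if (out == OutType::F32) {
        return has_opt_gemm<float, DequantizeFloat>(out, OutType::F32);
    }
    return has_opt_gemm<std::int8_t, Requantize32>(out, OutType::S8);
}

// Usage: the F32 request only succeeds once the extra branch exists.
inline void demo() {
    assert(!has_opt_impl_before(OutType::F32));
    assert(has_opt_impl_after(OutType::F32));
    assert(has_opt_impl_after(OutType::S8));
}

} // namespace sketch
```

In this model the S32 and S8 paths behave identically before and after; only the F32 request changes outcome.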

Gap 2 - validate: There was no output-type guard for QASYMM8_SIGNED input at all. The equivalent guard for QASYMM8 explicitly allowed QASYMM8/S32/F32; QASYMM8_SIGNED had no such allowance, so F32 output reached downstream checks with no clear error.
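
A minimal sketch of what such an output-type guard could look like follows. It is illustrative only: `Status`, `DataType`, and `validate_output_type` are hypothetical stand-ins rather than the library's types, and the allowed sets are taken from the description in this PR (QASYMM8/S32/F32 for QASYMM8 input, QASYMM8_SIGNED/S32/F32 for QASYMM8_SIGNED input).

```cpp
#include <cassert>
#include <string>

namespace sketch {

enum class DataType { QASYMM8, QASYMM8_SIGNED, S32, F32, F16 };

// Hypothetical status type: ok flag plus a human-readable reason.
struct Status {
    bool ok;
    std::string message;
};

// Reject unsupported input/output pairs early, with a clear error message.
inline Status validate_output_type(DataType input, DataType output) {
    if (input == DataType::QASYMM8) {
        const bool allowed = output == DataType::QASYMM8 || output == DataType::S32 ||
                             output == DataType::F32;
        if (!allowed) {
            return {false, "QASYMM8 input supports only QASYMM8/S32/F32 output"};
        }
    }
    if (input == DataType::QASYMM8_SIGNED) {
        // The guard this PR describes adding, mirroring the QASYMM8 one.
        const bool allowed = output == DataType::QASYMM8_SIGNED || output == DataType::S32 ||
                             output == DataType::F32;
        if (!allowed) {
            return {false, "QASYMM8_SIGNED input supports only QASYMM8_SIGNED/S32/F32 output"};
        }
    }
    return {true, ""};
}

// Usage: F32 output is accepted, an unlisted type is rejected with a reason.
inline void demo() {
    assert(validate_output_type(DataType::QASYMM8_SIGNED, DataType::F32).ok);
    assert(!validate_output_type(DataType::QASYMM8_SIGNED, DataType::F16).ok);
}

} // namespace sketch
```

An explicit guard of this shape makes the supported combinations visible at the validation site instead of leaving rejection to later checks.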

Two gaps in the assembly dispatch layer prevented QASYMM8_SIGNED input
from producing F32 output (items 1 and 2); item 3 adds the supporting
AsmGemmInfo fields:

1. has_opt_impl() had no branch for F32 output when input is S8/
   QASYMM8_SIGNED, causing spurious kernel-not-found errors.  Add a
   DequantizeFloat branch mirroring the existing S32 branch.

2. validate() rejected F32 output for QASYMM8_SIGNED input because it
   had no explicit allowance for that combination.  Add a guard that
   permits QASYMM8_SIGNED/S32/F32 as output types (matching the already-
   existing QASYMM8 guard).

3. AsmGemmInfo gains dequant_a_offset / dequant_b_offset fields so that
   callers can supply quantization zero-points to create_arm_gemm_dequant
   without touching existing callers.
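
Item 3 can be sketched as follows. This is a hedged illustration, not the library code: `AsmGemmInfoSketch` and `dequant_dot` are hypothetical, the field types and zero defaults are assumptions, and the sign convention (subtracting the zero-point from each quantized value) is the textbook affine form, which may differ from how the library stores offsets. The point is that defaulted fields leave callers that never set them unaffected.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical, cut-down stand-in for AsmGemmInfo.
struct AsmGemmInfoSketch {
    int existing_setting = 0;            // placeholder for the pre-existing fields
    std::int32_t dequant_a_offset = 0;   // assumed type and default
    std::int32_t dequant_b_offset = 0;   // assumed type and default
};

// Affine dequantization of an int8 dot product:
//   real = scale * sum((a[i] - a_offset) * (b[i] - b_offset))
inline float dequant_dot(const std::int8_t* a, const std::int8_t* b, int n, float scale,
                         const AsmGemmInfoSketch& info) {
    std::int32_t acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += (static_cast<std::int32_t>(a[i]) - info.dequant_a_offset) *
               (static_cast<std::int32_t>(b[i]) - info.dequant_b_offset);
    }
    return scale * static_cast<float>(acc);
}

// Usage: default-constructed info behaves like a caller that predates the new fields.
inline void demo() {
    const std::int8_t a[] = {1, 2};
    const std::int8_t b[] = {3, 4};

    AsmGemmInfoSketch legacy;  // offsets stay 0
    assert(dequant_dot(a, b, 2, 0.5f, legacy) == 5.5f);  // 0.5 * (1*3 + 2*4)

    AsmGemmInfoSketch with_offsets;
    with_offsets.dequant_a_offset = 1;
    with_offsets.dequant_b_offset = 1;
    assert(dequant_dot(a, b, 2, 0.5f, with_offsets) == 1.5f);  // 0.5 * (0*2 + 1*3)
}

} // namespace sketch
```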

Also fix the __aarch64_ typo (the predefined macro is __aarch64__) in
the DequantFP32_SupportedTypes test guard so that the test actually
executes on AArch64 targets.
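
Why a one-character slip silences a test can be shown with a small sketch. It assumes the misspelling was exactly `__aarch64_` as written above; the function names are hypothetical. Because no compiler predefines the misspelled name, the guarded code is dropped on every target without any diagnostic.

```cpp
#include <cassert>

namespace sketch {

// Misspelled guard: a single trailing underscore, so the macro is never defined.
constexpr bool compiled_with_typo_guard() {
#ifdef __aarch64_
    return true;
#else
    return false;
#endif
}

// Correct guard: __aarch64__ is predefined by compilers targeting AArch64.
constexpr bool compiled_with_correct_guard() {
#ifdef __aarch64__
    return true;
#else
    return false;
#endif
}

// Usage: the typo guard is false everywhere, including on AArch64 builds.
inline void demo() {
    assert(!compiled_with_typo_guard());
}

} // namespace sketch
```

On an AArch64 build `compiled_with_correct_guard()` evaluates to true while `compiled_with_typo_guard()` stays false, which is how a guarded test can pass CI without ever running.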

Signed-off-by: Aleksandr Voron <aleksandr.voron@intel.com>

alvoron force-pushed the alvoron_qasymm8_signed_f32_dispatch branch from e55d95c to eec8389 on June 19, 2026 12:43