aboutsummaryrefslogtreecommitdiffstats
path: root/lib/crypto
AgeCommit message (Collapse)AuthorFilesLines
2026-06-11lib/crypto: gf128hash: mark clmul32() as noinline_for_stackArnd Bergmann1-1/+1
During randconfig testing, I came across a lot of warnings for the newly added carryless multiplication function triggering excessive stack usage from spilling temporary variables to the stack: lib/crypto/gf128hash.c:166:1: error: stack frame size (1192) exceeds limit (1024) in 'polyval_mul_generic' [-Werror,-Wframe-larger-than] In addition to the possible risk of overflowing the kernel stack, the generated object code surely performs very poorly. This only happens on architectures that don't provide uint128_t (which should be all 32-bit architectures on modern compilers), but though I tested random x86 and arm configs, I only saw this with arm's CONFIG_THUMB2_KERNEL, which adds more pressure to the register allocator. The testing was done using clang-22, I don't know if gcc has the same problem. Marking clmul32() as noinline_for_stack experimentally shows all of the affected builds to completely solve the problem, reducing the stack usage to a few bytes as expected. Since u64 arithmetic frequently leads to compilers badly optimizing 32-bit targets, keeping clmul32 out of line is likely to help on other 32-bit configurations as well when they run into this problem, though it may also result in a small performance degradation in configurations that would benefit from inlining. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260611125952.3387258-1-arnd@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-05-09lib/crypto: powerpc/md5: Drop powerpc optimized MD5 codeEric Biggers5-269/+7
MD5 is obsolete, is vulnerable to collision attacks, and is being replaced by SHA-256 in new systems. It doesn't make sense to continue to maintain architecture-optimized implementations of MD5. Effort should be spent on modern algorithms. Indeed, architecture-optimized MD5 code remains only for powerpc. It was already removed from mips and sparc, and it never existed for any other architecture (e.g. x86, arm, or arm64) in the first place. Earlier the decision was made to keep the powerpc MD5 code for a while anyway because of someone using it via AF_ALG via libkcapi-hasher (https://lore.kernel.org/r/f0d771d5-ed70-444c-957a-ad4c16f6c115@csgroup.eu/) However, with AF_ALG itself now being on its way out due to its continuous stream of security vulnerabilities (https://lore.kernel.org/r/20260430011544.31823-1-ebiggers@kernel.org/), it's also time to be a bit more forceful with nudging people towards userspace crypto code. It's always been the better solution anyway, and it's much more efficient if properly optimized code is used. Note that the md5-asm.S file contains no privileged instructions and could be run in userspace just fine. Thus, we now have two factors going against keeping the powerpc MD5 code. Different people might weigh these two factors differently, but I think the two of them together make the removal the clear choice. Let's remove it. Acked-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://patch.msgid.link/20260506030005.9698-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-21Merge tag 'libcrypto-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull more crypto library updates from Eric Biggers: "Crypto library fix and documentation update: - Fix an integer underflow in the mpi library - Improve the crypto library documentation" * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: docs: Add rst documentation to Documentation/crypto/ docs: kdoc: Expand 'at_least' when creating parameter list lib/crypto: mpi: Fix integer underflow in mpi_read_raw_from_sgl()
2026-04-14lib/crypto: mpi: Fix integer underflow in mpi_read_raw_from_sgl()Lukas Wunner1-1/+1
Yiming reports an integer underflow in mpi_read_raw_from_sgl() when subtracting "lzeros" from the unsigned "nbytes". For this to happen, the scatterlist "sgl" needs to occupy more bytes than the "nbytes" parameter and the first "nbytes + 1" bytes of the scatterlist must be zero. Under these conditions, the while loop iterating over the scatterlist will count more zeroes than "nbytes", subtract the number of zeroes from "nbytes" and cause the underflow. When commit 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") originally introduced the bug, it couldn't be triggered because all callers of mpi_read_raw_from_sgl() passed a scatterlist whose length was equal to "nbytes". However since commit 63ba4d67594a ("KEYS: asymmetric: Use new crypto interface without scatterlists"), the underflow can now actually be triggered. When invoking a KEYCTL_PKEY_ENCRYPT system call with a larger "out_len" than "in_len" and filling the "in" buffer with zeroes, crypto_akcipher_sync_prep() will create an all-zero scatterlist used for both the "src" and "dst" member of struct akcipher_request and thereby fulfil the conditions to trigger the bug: sys_keyctl() keyctl_pkey_e_d_s() asymmetric_key_eds_op() software_key_eds_op() crypto_akcipher_sync_encrypt() crypto_akcipher_sync_prep() crypto_akcipher_encrypt() rsa_enc() mpi_read_raw_from_sgl() To the user this will be visible as a DoS as the kernel spins forever, causing soft lockup splats as a side effect. Fix it. Reported-by: Yiming Qian <yimingqian591@gmail.com> # off-list Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v4.4+ Reviewed-by: Ignat Korchagin <ignat@linux.win> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/r/59eca92ff4f87e2081777f1423a0efaaadcfdb39.1776003111.git.lukas@wunner.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-14Merge tag 'bitmap-for-v7.1' of https://github.com/norov/linuxLinus Torvalds1-4/+4
Pull bitmap updates from Yury Norov: - new API: bitmap_weight_from() and bitmap_weighted_xor() (Yury) - drop unused __find_nth_andnot_bit() (Yury) - new tests and test improvements (Andy, Akinobu, Yury) - fixes for count_zeroes API (Yury) - cleanup bitmap_print_to_pagebuf() mess (Yury) - documentation updates (Andy, Kai, Kit). * tag 'bitmap-for-v7.1' of https://github.com/norov/linux: (24 commits) bitops: Update kernel-doc for sign_extendXX() powerpc/xive: simplify xive_spapr_debug_show() thermal: intel: switch cpumask_get() to using cpumask_print_to_pagebuf() coresight: don't use bitmap_print_to_pagebuf() lib/prime_numbers: drop temporary buffer in dump_primes() drm/xe: switch xe_pagefault_queue_init() to using bitmap_weighted_or() ice: use bitmap_empty() in ice_vf_has_no_qs_ena ice: use bitmap_weighted_xor() in ice_find_free_recp_res_idx() bitmap: introduce bitmap_weighted_xor() bitmap: add test_zero_nbits() bitmap: exclude nbits == 0 cases from bitmap test bitmap: test bitmap_weight() for more asm-generic/bitops: Fix a comment typo in instrumented-atomic.h bitops: fix kernel-doc parameter name for parity8() lib: count_zeros: unify count_{leading,trailing}_zeros() lib: count_zeros: fix 32/64-bit inconsistency in count_trailing_zeros() lib: crypto: fix comments for count_leading_zeros() x86/topology: use bitmap_weight_from() bitmap: add bitmap_weight_from() lib/find_bit_benchmark: avoid clearing randomly filled bitmap in test_find_first_bit() ...
2026-04-13Merge tag 'libcrypto-for-linus' of ↵Linus Torvalds63-1502/+6776
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: - Migrate more hash algorithms from the traditional crypto subsystem to lib/crypto/ Like the algorithms migrated earlier (e.g. SHA-*), this simplifies the implementations, improves performance, enables further simplifications in calling code, and solves various other issues: - AES CBC-based MACs (AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC) - Support these algorithms in lib/crypto/ using the AES library and the existing arm64 assembly code - Reimplement the traditional crypto API's "cmac(aes)", "xcbc(aes)", and "cbcmac(aes)" on top of the library - Convert mac80211 to use the AES-CMAC library. Note: several other subsystems can use it too and will be converted later - Drop the broken, nonstandard, and likely unused support for "xcbc(aes)" with key lengths other than 128 bits - Enable optimizations by default - GHASH - Migrate the standalone GHASH code into lib/crypto/ - Integrate the GHASH code more closely with the very similar POLYVAL code, and improve the generic GHASH implementation to resist cache-timing attacks and use much less memory - Reimplement the AES-GCM library and the "gcm" crypto_aead template on top of the GHASH library. Remove "ghash" from the crypto_shash API, as it's no longer needed - Enable optimizations by default - SM3 - Migrate the kernel's existing SM3 code into lib/crypto/, and reimplement the traditional crypto API's "sm3" on top of it - I don't recommend using SM3, but this cleanup is worthwhile to organize the code the same way as other algorithms - Testing improvements: - Add a KUnit test suite for each of the new library APIs - Migrate the existing ChaCha20Poly1305 test to KUnit - Make the KUnit all_tests.config enable all crypto library tests - Move the test kconfig options to the Runtime Testing menu - Other updates to arch-optimized crypto code: - Optimize SHA-256 for Zhaoxin CPUs using the Padlock Hash Engine - Remove some MD5 implementations that are no longer worth keeping - Drop big endian and voluntary preemption support from the arm64 code, as those configurations are no longer supported on arm64 - Make jitterentropy and samples/tsm-mr use the crypto library APIs * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (66 commits) lib/crypto: arm64: Assume a little-endian kernel arm64: fpsimd: Remove obsolete cond_yield macro lib/crypto: arm64/sha3: Remove obsolete chunking logic lib/crypto: arm64/sha512: Remove obsolete chunking logic lib/crypto: arm64/sha256: Remove obsolete chunking logic lib/crypto: arm64/sha1: Remove obsolete chunking logic lib/crypto: arm64/poly1305: Remove obsolete chunking logic lib/crypto: arm64/gf128hash: Remove obsolete chunking logic lib/crypto: arm64/chacha: Remove obsolete chunking logic lib/crypto: arm64/aes: Remove obsolete chunking logic lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h> lib/crypto: aesgcm: Don't disable IRQs during AES block encryption lib/crypto: aescfb: Don't disable IRQs during AES block encryption lib/crypto: tests: Migrate ChaCha20Poly1305 self-test to KUnit lib/crypto: sparc: Drop optimized MD5 code lib/crypto: mips: Drop optimized MD5 code lib: Move crypto library tests to Runtime Testing menu crypto: sm3 - Remove 'struct sm3_state' crypto: sm3 - Remove the original "sm3_block_generic()" crypto: sm3 - Remove sm3_base.h ...
2026-04-01lib/crypto: arm64: Assume a little-endian kernelEric Biggers7-65/+36
Since support for big-endian arm64 kernels was removed, the CPU_LE() macro now unconditionally emits the code it is passed, and the CPU_BE() macro now unconditionally discards the code it is passed. Simplify the assembly code in lib/crypto/arm64/ accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401003331.144065-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01lib/crypto: arm64/sha3: Remove obsolete chunking logicEric Biggers2-16/+7
Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch"), kernel-mode NEON sections have been preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further restrict the preemption modes"), voluntary preemption is no longer supported on arm64 either. Therefore, there's no longer any need to limit the length of kernel-mode NEON sections on arm64. Simplify the SHA-3 code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401000548.133151-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01lib/crypto: arm64/sha512: Remove obsolete chunking logicEric Biggers2-18/+9
Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch"), kernel-mode NEON sections have been preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further restrict the preemption modes"), voluntary preemption is no longer supported on arm64 either. Therefore, there's no longer any need to limit the length of kernel-mode NEON sections on arm64. Simplify the SHA-512 code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401000548.133151-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01lib/crypto: arm64/sha256: Remove obsolete chunking logicEric Biggers2-30/+13
Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch"), kernel-mode NEON sections have been preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further restrict the preemption modes"), voluntary preemption is no longer supported on arm64 either. Therefore, there's no longer any need to limit the length of kernel-mode NEON sections on arm64. Simplify the SHA-256 code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401000548.133151-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01lib/crypto: arm64/sha1: Remove obsolete chunking logicEric Biggers2-20/+9
Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch"), kernel-mode NEON sections have been preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further restrict the preemption modes"), voluntary preemption is no longer supported on arm64 either. Therefore, there's no longer any need to limit the length of kernel-mode NEON sections on arm64. Simplify the SHA-1 code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401000548.133151-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01lib/crypto: arm64/poly1305: Remove obsolete chunking logicEric Biggers1-10/+4
Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch"), kernel-mode NEON sections have been preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further restrict the preemption modes"), voluntary preemption is no longer supported on arm64 either. Therefore, there's no longer any need to limit the length of kernel-mode NEON sections on arm64. Simplify the Poly1305 code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401000548.133151-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01lib/crypto: arm64/gf128hash: Remove obsolete chunking logicEric Biggers1-20/+4
Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch"), kernel-mode NEON sections have been preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further restrict the preemption modes"), voluntary preemption is no longer supported on arm64 either. Therefore, there's no longer any need to limit the length of kernel-mode NEON sections on arm64. Simplify the GHASH and POLYVAL code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401000548.133151-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01lib/crypto: arm64/chacha: Remove obsolete chunking logicEric Biggers1-12/+4
Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch"), kernel-mode NEON sections have been preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further restrict the preemption modes"), voluntary preemption is no longer supported on arm64 either. Therefore, there's no longer any need to limit the length of kernel-mode NEON sections on arm64. Simplify the ChaCha code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401000548.133151-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-04-01lib/crypto: arm64/aes: Remove obsolete chunking logicEric Biggers2-27/+16
Since commit aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at context switch"), kernel-mode NEON sections have been preemptible on arm64. And since commit 7dadeaa6e851 ("sched: Further restrict the preemption modes"), voluntary preemption is no longer supported on arm64 either. Therefore, there's no longer any need to limit the length of kernel-mode NEON sections on arm64. Simplify the AES-CBC-MAC code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260401000548.133151-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-31lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>Eric Biggers3-4/+4
Since the lib/crypto/ files that include <crypto/algapi.h> need it only for the transitive inclusion of <crypto/utils.h> (and not all the traditional crypto API stuff that the rest of <crypto/algapi.h> is filled with), replace these inclusions with direct inclusions of <crypto/utils.h>. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260331024438.51783-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-31lib/crypto: aesgcm: Don't disable IRQs during AES block encryptionEric Biggers1-22/+3
aes_encrypt() now uses AES instructions when available instead of always using table-based code. AES instructions are constant-time and don't benefit from disabling IRQs as a constant-time hardening measure. In fact, on two architectures (arm and riscv) disabling IRQs is counterproductive because it prevents the AES instructions from being used. (See the may_use_simd() implementation on those architectures.) Therefore, let's remove the IRQ disabling/enabling and leave the choice of constant-time hardening measures to the AES library code. Note that currently the arm table-based AES code (which runs on arm kernels that don't have ARMv8 CE) disables IRQs, while the generic table-based AES code does not. So this does technically regress in constant-time hardening when that generic code is used. But as discussed in commit a22fd0e3c495 ("lib/crypto: aes: Introduce improved AES library") I think just leaving IRQs enabled is the right choice. Disabling them is slow and can cause problems, and AES instructions (which modern CPUs have) solve the problem in a much better way anyway. Link: https://lore.kernel.org/r/20260331024430.51755-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-31lib/crypto: aescfb: Don't disable IRQs during AES block encryptionEric Biggers1-22/+3
aes_encrypt() now uses AES instructions when available instead of always using table-based code. AES instructions are constant-time and don't benefit from disabling IRQs as a constant-time hardening measure. In fact, on two architectures (arm and riscv) disabling IRQs is counterproductive because it prevents the AES instructions from being used. (See the may_use_simd() implementation on those architectures.) Therefore, let's remove the IRQ disabling/enabling and leave the choice of constant-time hardening measures to the AES library code. Note that currently the arm table-based AES code (which runs on arm kernels that don't have ARMv8 CE) disables IRQs, while the generic table-based AES code does not. So this does technically regress in constant-time hardening when that generic code is used. But as discussed in commit a22fd0e3c495 ("lib/crypto: aes: Introduce improved AES library") I think just leaving IRQs enabled is the right choice. Disabling them is slow and can cause problems, and AES instructions (which modern CPUs have) solve the problem in a much better way anyway. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260331024414.51545-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-30lib/crypto: tests: Migrate ChaCha20Poly1305 self-test to KUnitEric Biggers6-760/+760
Move the ChaCha20Poly1305 test from an ad-hoc self-test to a KUnit test. Keep the same test logic for now, just translated to KUnit. Moving to KUnit has multiple benefits, such as: - Consistency with the rest of the lib/crypto/ tests. - Kernel developers familiar with KUnit, which is used kernel-wide, can quickly understand the test and how to enable and run it. - The test will be automatically run by anyone using lib/crypto/.kunitconfig or KUnit's all_tests.config. - Results are reported using the standard KUnit mechanism. - It eliminates one of the few remaining back-references to crypto/ from lib/crypto/, specifically a reference to CONFIG_CRYPTO_SELFTESTS. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260327224229.137532-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-30lib/crypto: sparc: Drop optimized MD5 codeEric Biggers4-120/+0
MD5 is obsolete. Continuing to maintain architecture-optimized implementations of MD5 is unnecessary and risky. It diverts resources from the modern algorithms that are actually important. While there was demand for continuing to maintain the PowerPC optimized MD5 code to accommodate userspace programs that are misusing AF_ALG (https://lore.kernel.org/linux-crypto/c4191597-341d-4fd7-bc3d-13daf7666c41@csgroup.eu/), no such demand has been seen for the SPARC optimized MD5 code. Thus, let's drop it and focus effort on the more modern SHA algorithms, which already have optimized code for SPARC. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260326203341.60393-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-30lib/crypto: mips: Drop optimized MD5 codeEric Biggers2-66/+0
MD5 is obsolete. Continuing to maintain architecture-optimized implementations of MD5 is unnecessary and risky. It diverts resources from the modern algorithms that are actually important. While there was demand for continuing to maintain the PowerPC optimized MD5 code to accommodate userspace programs that are misusing AF_ALG (https://lore.kernel.org/linux-crypto/c4191597-341d-4fd7-bc3d-13daf7666c41@csgroup.eu/), no such demand has been seen for the MIPS Cavium Octeon optimized MD5 code. Note that this code runs on only one particular line of SoCs. Thus, let's drop it and focus effort on the more modern SHA algorithms, which already have optimized code for the same SoCs. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260326204824.62010-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-27lib/crypto: chacha: Zeroize permuted_state before it leaves scopeEric Biggers1-0/+4
Since the ChaCha permutation is invertible, the local variable 'permuted_state' is sufficient to compute the original 'state', and thus the key, even after the permutation has been done. While the kernel is quite inconsistent about zeroizing secrets on the stack (and some prominent userspace crypto libraries don't bother at all since it's not guaranteed to work anyway), the kernel does try to do it as a best practice, especially in cases involving the RNG. Thus, explicitly zeroize 'permuted_state' before it goes out of scope. Fixes: c08d0e647305 ("crypto: chacha20 - Add a generic ChaCha20 stream cipher implementation") Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260326032920.39408-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib: Move crypto library tests to Runtime Testing menuEric Biggers1-6/+0
Currently the kconfig options for the crypto library KUnit tests appear in the menu: -> Library routines -> Crypto library routines However, this is the only content of "Crypto library routines". I.e., it is empty when CONFIG_KUNIT=n. This is because the crypto library routines themselves don't have (or need to have) prompts. Since this usually ends up as an unnecessary empty menu, let's remove this menu and instead source the lib/crypto/tests/Kconfig file from lib/Kconfig.debug inside the "Runtime Testing" menu: -> Kernel hacking -> Kernel Testing and Coverage -> Runtime Testing This puts the prompts alongside the ones for most of the other lib/ KUnit tests. This seems to be a much better match to how the kconfig menus are organized. Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20260322032438.286296-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23crypto: sm3 - Remove the original "sm3_block_generic()"Eric Biggers1-16/+3
Since the architecture-optimized SM3 code was migrated into lib/crypto/, sm3_block_generic() is no longer called. Remove it. Then, since this frees up the name, rename sm3_transform() to sm3_block_generic() (matching the naming convention used in other hash algorithms). Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260321040935.410034-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: x86/sm3: Migrate optimized code into libraryEric Biggers4-0/+557
Instead of exposing the x86-optimized SM3 code via an x86-specific crypto_shash algorithm, instead just implement the sm3_blocks() library function. This is much simpler, it makes the SM3 library functions be x86-optimized, and it fixes the longstanding issue where the x86-optimized SM3 code was disabled by default. SM3 still remains available through crypto_shash, but individual architectures no longer need to handle it. Tweak the prototype of sm3_transform_avx() to match what the library expects, including changing the block count to size_t. Note that the assembly code actually already treated this argument as size_t. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260321040935.410034-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: riscv/sm3: Migrate optimized code into libraryEric Biggers4-0/+166
Instead of exposing the riscv-optimized SM3 code via a riscv-specific crypto_shash algorithm, instead just implement the sm3_blocks() library function. This is much simpler, it makes the SM3 library functions be riscv-optimized, and it fixes the longstanding issue where the riscv-optimized SM3 code was disabled by default. SM3 still remains available through crypto_shash, but individual architectures no longer need to handle it. Tweak the prototype of sm3_transform_zvksh_zvkb() to match what the library expects, including changing the block count to size_t. Note that the assembly code already treated it as size_t. Note: to see the diff from arch/riscv/crypto/sm3-riscv64-glue.c to lib/crypto/riscv/sm3.h, view this commit with 'git show -M10'. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260321040935.410034-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: arm64/sm3: Migrate optimized code into libraryEric Biggers5-3/+790
Instead of exposing the arm64-optimized SM3 code via arm64-specific crypto_shash algorithms, instead just implement the sm3_blocks() library function. This is much simpler, it makes the SM3 library functions be arm64-optimized, and it fixes the longstanding issue where the arm64-optimized SM3 code was disabled by default. SM3 still remains available through crypto_shash, but individual architectures no longer need to handle it. Tweak the SM3 assembly function prototypes to match what the library expects, including changing the block count from 'int' to 'size_t'. sm3_ce_transform() had to be updated to access 'x2' instead of 'w2', while sm3_neon_transform() already used 'x2'. Remove the CFI stubs which are no longer needed because the SM3 assembly functions are no longer ever indirectly called. Remove the dependency on KERNEL_MODE_NEON. It was unnecessary, because KERNEL_MODE_NEON is always enabled on arm64. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260321040935.410034-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: tests: Add KUnit tests for SM3Eric Biggers5-0/+273
Add a KUnit test suite for the SM3 library. It closely mirrors the test suites for the other cryptographic hash functions. The actual test and benchmark logic is already in hash-test-template.h; this just wires it up for SM3 in the usual way. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260321040935.410034-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: sm3: Add SM3 library APIEric Biggers2-19/+143
Add a straightforward library API for SM3, mirroring the ones for the other hash algorithms. It uses the existing generic implementation of SM3's compression function in lib/crypto/sm3.c. Hooks are added for architecture-optimized implementations, which later commits will wire up to the existing optimized SM3 code for arm64, riscv, and x86. Note that the rationale for this is *not* that SM3 should be used, or that any kernel subsystem currently seems like a candidate for switching from the sm3 crypto_shash to SM3 library. (SM3, in fact, shouldn't be used. Likewise you shouldn't use MD5, SHA-1, RC4, etc...) Rather, it's just that this will simplify how the kernel's existing SM3 code is integrated and make it much easier to maintain and test. SM3 is one of the only hash algorithms with arch-optimized code that is still integrated in the old way. By converting it to the new lib/crypto/ code organization, we'll only have to keep track of one way of doing things. The library will also get a KUnit test suite (as usual for lib/crypto/), so it will become more easily and comprehensively tested as well. Skip adding functions for HMAC-SM3 for now, though. There's not as much point in adding those right now. Note: similar to the other hash algorithms, the library API uses 'struct sm3_ctx', not 'struct sm3_state'. The existing 'struct sm3_state' and the sm3_block_generic() function which uses it are temporarily kept around until their users are updated by later commits. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260321040935.410034-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: aesgcm: Use GHASH library APIEric Biggers2-28/+29
Make the AES-GCM library use the GHASH library instead of directly calling gf128mul_lle(). This allows the architecture-optimized GHASH implementations to be used, or the improved generic implementation if no architecture-optimized implementation is usable. Note: this means that <crypto/gcm.h> no longer needs to include <crypto/gf128mul.h>. Remove that inclusion, and include <crypto/gf128mul.h> explicitly from arch/x86/crypto/aesni-intel_glue.c which previously was relying on the transitive inclusion. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-20-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: gf128mul: Remove unused 4k_lle functionsEric Biggers1-72/+1
Remove the 4k_lle multiplication functions and the associated gf128mul_table_le data table. Their only user was the generic implementation of GHASH, which has now been changed to use a different implementation based on standard integer multiplication. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-18-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: x86/ghash: Migrate optimized code into libraryEric Biggers3-10/+185
Remove the "ghash-pclmulqdqni" crypto_shash algorithm. Move the corresponding assembly code into lib/crypto/, and wire it up to the GHASH library. This makes the GHASH library be optimized with x86's carryless multiplication instructions. It also greatly reduces the amount of x86-specific glue code that is needed, and it fixes the issue where this GHASH optimization was disabled by default. Rename and adjust the prototypes of the assembly functions to make them fit better with the library. Remove the byte-swaps (pshufb instructions) that are no longer necessary because the library keeps the accumulator in POLYVAL format rather than GHASH format. Rename clmul_ghash_mul() to polyval_mul_pclmul() to reflect that it really does a POLYVAL style multiplication. Wire it up to both ghash_mul_arch() and polyval_mul_arch(). Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-15-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: s390/ghash: Migrate optimized code into libraryEric Biggers2-0/+55
Remove the "ghash-s390" crypto_shash algorithm, and replace it with an implementation of ghash_blocks_arch() for the GHASH library. This makes the GHASH library be optimized with CPACF. It also greatly reduces the amount of s390-specific glue code that is needed, and it fixes the issue where this GHASH optimization was disabled by default. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-14-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: riscv/ghash: Migrate optimized code into libraryEric Biggers4-0/+133
Remove the "ghash-riscv64-zvkg" crypto_shash algorithm. Move the corresponding assembly code into lib/crypto/, modify it to take the length in blocks instead of bytes, and wire it up to the GHASH library. This makes the GHASH library be optimized with the RISC-V Vector Cryptography Extension. It also greatly reduces the amount of riscv-specific glue code that is needed, and it fixes the issue where this optimized GHASH code was disabled by default. Note that this RISC-V code has multiple opportunities for improvement, such as adding more parallelism, providing an optimized multiplication function, and directly supporting POLYVAL. But for now, this commit simply tweaks ghash_zvkg() slightly to make it compatible with the library, then wires it up to ghash_blocks_arch(). ghash_preparekey_arch() is also implemented to store the copy of the raw key needed by the vghsh.vv instruction. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: powerpc/ghash: Migrate optimized code into libraryEric Biggers5-5/+375
Remove the "p8_ghash" crypto_shash algorithm. Move the corresponding assembly code into lib/crypto/, and wire it up to the GHASH library. This makes the GHASH library be optimized for POWER8. It also greatly reduces the amount of powerpc-specific glue code that is needed, and it fixes the issue where this optimized GHASH code was disabled by default. Note that previously the C code defined the POWER8 GHASH key format as "u128 htable[16]", despite the assembly code only using four entries. Fix the C code to use the correct key format. To fulfill the library API contract, also make the key preparation work in all contexts. Note that the POWER8 assembly code takes the accumulator in GHASH format, but it actually byte-reflects it to get it into POLYVAL format. The library already works with POLYVAL natively. For now, just wire up this existing code by converting it to/from GHASH format in C code. This should be cleaned up to eliminate the unnecessary conversion later. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: arm64/ghash: Migrate optimized code into libraryEric Biggers3-8/+283
Remove the "ghash-neon" crypto_shash algorithm. Move the corresponding assembly code into lib/crypto/, and wire it up to the GHASH library. This makes the GHASH library be optimized on arm64 (though only with NEON, not PMULL; for now the goal is just parity with crypto_shash). It greatly reduces the amount of arm64-specific glue code that is needed, and it fixes the issue where this optimization was disabled by default. To integrate the assembly code correctly with the library, make the following tweaks: - Change the type of 'blocks' from int to size_t - Change the types of 'dg' and 'h' to polyval_elem. Note that this simply reflects the format that the code was already using. - Remove the 'head' argument, which is no longer needed. - Remove the CFI stubs, as indirect calls are no longer used. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: arm/ghash: Migrate optimized code into libraryEric Biggers4-0/+254
Remove the "ghash-neon" crypto_shash algorithm. Move the corresponding assembly code into lib/crypto/, and wire it up to the GHASH library. This makes the GHASH library be optimized on arm (though only with NEON, not PMULL; for now the goal is just parity with crypto_shash). It greatly reduces the amount of arm-specific glue code that is needed, and it fixes the issue where this optimization was disabled by default. To integrate the assembly code correctly with the library, make the following tweaks: - Change the type of 'blocks' from int to size_t. - Change the types of 'dg' and 'h' to polyval_elem. Note that this simply reflects the format that the code was already using, at least on little endian CPUs. For big endian CPUs, add byte-swaps. - Remove the 'head' argument, which is no longer needed. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: tests: Add KUnit tests for GHASHEric Biggers5-0/+390
Add a KUnit test suite for the GHASH library functions. It closely mirrors the POLYVAL test suite. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: gf128hash: Add GHASH supportEric Biggers1-13/+132
Add GHASH support to the gf128hash module. This will replace the GHASH support in the crypto_shash API. It will be used by the "gcm" template and by the AES-GCM library (when an arch-optimized implementation of the full AES-GCM is unavailable). This consists of a simple API that mirrors the existing POLYVAL API, a generic implementation of that API based on the existing efficient and side-channel-resistant polyval_mul_generic(), and the framework for architecture-optimized implementations of the GHASH functions. The GHASH accumulator is stored in POLYVAL format rather than GHASH format, since this is what most modern GHASH implementations actually need. The few implementations that expect the accumulator in GHASH format will just convert the accumulator to/from GHASH format temporarily. (Supporting architecture-specific accumulator formats would be possible, but doesn't seem worth the complexity.) However, architecture-specific formats of struct ghash_key will be supported, since a variety of formats will be needed there anyway. The default format is just the key in POLYVAL format. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: gf128hash: Support GF128HASH_ARCH without all POLYVAL functionsEric Biggers3-4/+18
Currently, some architectures (arm64 and x86) have optimized code for both GHASH and POLYVAL. Others (arm, powerpc, riscv, and s390) have optimized code only for GHASH. While POLYVAL support could be implemented on these other architectures, until then we need to support the case where arch-optimized functions are present only for GHASH. Therefore, update the support for arch-optimized POLYVAL functions to allow architectures to opt into supporting these functions individually. The new meaning of CONFIG_CRYPTO_LIB_GF128HASH_ARCH is that some level of GHASH and/or POLYVAL acceleration is provided. Also provide an implementation of polyval_mul() based on polyval_blocks_arch(), for when polyval_mul_arch() isn't implemented. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib/crypto: gf128hash: Rename polyval module to gf128hashEric Biggers7-42/+42
Currently, the standalone GHASH code is coupled with crypto_shash. This has resulted in unnecessary complexity and overhead, as well as the code being unavailable to library code such as the AES-GCM library. Like was done with POLYVAL, it needs to find a new home in lib/crypto/. GHASH and POLYVAL are closely related and can each be implemented in terms of each other. Optimized code for one can be reused with the other. But also since GHASH tends to be difficult to implement directly due to its unnatural bit order, most modern GHASH implementations (including the existing arm, arm64, powerpc, and x86 optimized GHASH code, and the new generic GHASH code I'll be adding) actually reinterpret the GHASH computation as an equivalent POLYVAL computation, pre and post-processing the inputs and outputs to map to/from POLYVAL. Given this close relationship, it makes sense to group the GHASH and POLYVAL code together in the same module. This gives us a wide range of options for implementing them, reusing code between the two and properly utilizing whatever instructions each architecture provides. Thus, GHASH support will be added to the library module that is currently called "polyval". Rename it to an appropriate name: "gf128hash". Rename files, options, functions, etc. where appropriate to reflect the upcoming sharing with GHASH. (Note: polyval_kunit is not renamed, as ghash_kunit will be added alongside it instead.) Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260319061723.1140720-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-23lib: crypto: fix comments for count_leading_zeros()Yury Norov1-4/+4
count_leading_zeros() is based on fls(), which is defined for x == 0, contrary to __ffs() family. The comment in crypto/mpi erroneously states that the function may return undef in such case. Fix the comment together with the outdated function signature, and now that COUNT_LEADING_ZEROS_0 is not referenced in the codebase, get rid of it too. Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Yury Norov <ynorov@nvidia.com>
2026-03-19lib/crypto: arm64: Drop checks for CONFIG_KERNEL_MODE_NEONEric Biggers5-39/+19
CONFIG_KERNEL_MODE_NEON is always enabled on arm64, and it always has been since its introduction in 2013. Given that and the fact that the usefulness of kernel-mode NEON has only been increasing over time, checking for this option in arm64-specific code is unnecessary. Remove these checks from lib/crypto/ to simplify the code and prevent any future bugs where e.g. code gets disabled due to a typo in this logic. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260314175049.26931-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-19lib/crypto: tests: Drop the default to CRYPTO_SELFTESTSEric Biggers1-13/+13
Defaulting the crypto KUnit tests to KUNIT_ALL_TESTS || CRYPTO_SELFTESTS instead of simply KUNIT_ALL_TESTS was originally intended to make it easy to enable all the crypto KUnit tests. This additional default is nonstandard for KUnit tests, though, and it can cause all the KUnit tests to be built-in unexpectedly if CRYPTO_SELFTESTS is set. It also constitutes a back-reference to crypto/ from lib/crypto/, which is something that we should be avoiding in order to get clean layering. Now that we provide a lib/crypto/.kunitconfig file that enables all crypto KUnit tests, let's consider that to be the supported way to enable all these tests, and drop the default of CRYPTO_SELFTESTS. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260317040626.5697-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-19lib/crypto: tests: Introduce CRYPTO_LIB_ENABLE_ALL_FOR_KUNITEric Biggers2-21/+25
For kunit.py to run all the crypto library tests when passed the --alltests option, tools/testing/kunit/configs/all_tests.config needs to enable options that satisfy the test dependencies. This is the same as what lib/crypto/.kunitconfig already does. However, the strategy that lib/crypto/.kunitconfig currently uses to select all the hidden library options isn't going to scale up well when it needs to be repeated in two places. Instead let's go ahead and introduce an option CRYPTO_LIB_ENABLE_ALL_FOR_KUNIT that depends on KUNIT and selects all the crypto library options that have corresponding KUnit tests. Update lib/crypto/.kunitconfig to use this option. Link: https://lore.kernel.org/r/20260314035927.51351-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-17lib/crypto: powerpc: Add powerpc/aesp8-ppc.S to clean-filesEric Biggers1-0/+3
Make the generated file powerpc/aesp8-ppc.S be removed by 'make clean'. Fixes: 7cf2082e74ce ("lib/crypto: powerpc/aes: Migrate POWER8 optimized code into library") Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260317044925.104184-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-14lib/crypto: x86/sha256: PHE Extensions optimized SHA256 transform functionAlanSong-oc1-0/+25
Zhaoxin CPUs have implemented the SHA(Secure Hash Algorithm) as its CPU instructions by PHE(Padlock Hash Engine) Extensions, including XSHA1, XSHA256, XSHA384 and XSHA512 instructions. The instruction specification is available at the following link. (https://gitee.com/openzhaoxin/zhaoxin_specifications/blob/20260227/ZX_Padlock_Reference.pdf) With the help of implementation of SHA in hardware instead of software, can develop applications with higher performance, more security and more flexibility. This patch includes the XSHA256 instruction optimized implementation of SHA-256 transform function. The table below shows the benchmark results before and after applying this patch by using CRYPTO_LIB_BENCHMARK on Zhaoxin KX-7000 platform, highlighting the achieved speedups. +---------+--------------------------+ | | SHA256 | +---------+--------+-----------------+ | Len | Before | After | +---------+--------+-----------------+ | 1* | 2 | 7 (3.50x) | | 16 | 35 | 119 (3.40x) | | 64 | 74 | 280 (3.78x) | | 127 | 99 | 387 (3.91x) | | 128 | 103 | 427 (4.15x) | | 200 | 123 | 537 (4.37x) | | 256 | 128 | 582 (4.55x) | | 511 | 144 | 679 (4.72x) | | 512 | 146 | 714 (4.89x) | | 1024 | 157 | 796 (5.07x) | | 3173 | 167 | 883 (5.28x) | | 4096 | 166 | 876 (5.28x) | | 16384 | 169 | 899 (5.32x) | +---------+--------+-----------------+ *: The length of each data block to be processed by one complete SHA sequence. **: The throughput of processing data blocks, unit is Mb/s. After applying this patch, the SHA256 KUnit test suite passes on Zhaoxin platforms. Detailed test logs are shown below. [ 7.767257] # Subtest: sha256 [ 7.770542] # module: sha256_kunit [ 7.770544] 1..15 [ 7.777383] ok 1 test_hash_test_vectors [ 7.788563] ok 2 test_hash_all_lens_up_to_4096 [ 7.806090] ok 3 test_hash_incremental_updates [ 7.813553] ok 4 test_hash_buffer_overruns [ 7.822384] ok 5 test_hash_overlaps [ 7.829388] ok 6 test_hash_alignment_consistency [ 7.833843] ok 7 test_hash_ctx_zeroization [ 7.915191] ok 8 test_hash_interrupt_context_1 [ 8.362312] ok 9 test_hash_interrupt_context_2 [ 8.401607] ok 10 test_hmac [ 8.415458] ok 11 test_sha256_finup_2x [ 8.419397] ok 12 test_sha256_finup_2x_defaultctx [ 8.424107] ok 13 test_sha256_finup_2x_hugelen [ 8.451289] # benchmark_hash: len=1: 7 MB/s [ 8.465372] # benchmark_hash: len=16: 119 MB/s [ 8.481760] # benchmark_hash: len=64: 280 MB/s [ 8.499344] # benchmark_hash: len=127: 387 MB/s [ 8.515800] # benchmark_hash: len=128: 427 MB/s [ 8.531970] # benchmark_hash: len=200: 537 MB/s [ 8.548241] # benchmark_hash: len=256: 582 MB/s [ 8.564838] # benchmark_hash: len=511: 679 MB/s [ 8.580872] # benchmark_hash: len=512: 714 MB/s [ 8.596858] # benchmark_hash: len=1024: 796 MB/s [ 8.612567] # benchmark_hash: len=3173: 883 MB/s [ 8.628546] # benchmark_hash: len=4096: 876 MB/s [ 8.644482] # benchmark_hash: len=16384: 899 MB/s [ 8.649773] ok 14 benchmark_hash [ 8.655505] ok 15 benchmark_sha256_finup_2x # SKIP not relevant [ 8.659065] # sha256: pass:14 fail:0 skip:1 total:15 [ 8.665276] # Totals: pass:14 fail:0 skip:1 total:15 [ 8.670195] ok 7 sha256 Signed-off-by: AlanSong-oc <AlanSong-oc@zhaoxin.com> Link: https://lore.kernel.org/r/20260313080150.9393-3-AlanSong-oc@zhaoxin.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09lib/crypto: aes: Add FIPS self-test for CMACEric Biggers2-3/+37
Add a FIPS cryptographic algorithm self-test for AES-CMAC to fulfill the self-test requirement when this code is built into a FIPS 140 cryptographic module. This provides parity with the traditional crypto API, which uses crypto/testmgr.c to meet the FIPS self-test requirement. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260218213501.136844-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09lib/crypto: tests: Add KUnit tests for CBC-based MACsEric Biggers5-0/+422
Add a KUnit test suite for the AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC library functions. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260218213501.136844-7-ebiggers@kernel.org Link: https://lore.kernel.org/r/20260306001917.24105-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09lib/crypto: arm64/aes: Migrate optimized CBC-based MACs into libraryEric Biggers2-11/+56
Instead of exposing the arm64-optimized CMAC, XCBC-MAC, and CBC-MAC code via arm64-specific crypto_shash algorithms, instead just implement the aes_cbcmac_blocks_arch() library function. This is much simpler, it makes the corresponding library functions be arm64-optimized, and it fixes the longstanding issue where this optimized code was disabled by default. The corresponding algorithms still remain available through crypto_shash, but individual architectures no longer need to handle it. Note that to be compatible with the library using 'size_t' lengths, the type of the return value and 'blocks' parameter to the assembly functions had to be changed to 'size_t', and the assembly code had to be updated accordingly to use the corresponding 64-bit registers. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260218213501.136844-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09lib/crypto: arm64/aes: Move assembly code for AES modes into libaesEric Biggers5-1/+1294
To migrate the support for CBC-based MACs into libaes, the corresponding arm64 assembly code needs to be moved there. However, the arm64 AES assembly code groups many AES modes together; individual modes aren't easily separable. (This isn't unique to arm64; other architectures organize their AES modes similarly.) Since the other AES modes will be migrated into the library eventually too, just move the full assembly files for the AES modes into the library. (This is similar to what I already did for PowerPC and SPARC.) Specifically: move the assembly files aes-ce.S, aes-modes.S, and aes-neon.S and their build rules; declare the assembly functions in <crypto/aes.h>; and export the assembly functions from libaes. Note that the exports and public declarations of the assembly functions are temporary. They exist only to keep arch/arm64/crypto/ working until the AES modes are fully moved into the library. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260218213501.136844-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-09lib/crypto: aes: Add support for CBC-based MACsEric Biggers2-0/+208
Add support for CBC-based MACs to the AES library, specifically AES-CMAC, AES-XCBC-MAC, and AES-CBC-MAC. Of these three algorithms, AES-CMAC is the most modern and the most commonly used. Use cases for the AES-CMAC library include the kernel's SMB client and server, and the bluetooth and mac80211 drivers. Support for AES-XCBC-MAC and AES-CBC-MAC is included so that there will be no performance regression in the "xcbc(aes)" and "ccm(aes)" support in the traditional crypto API once the arm64-optimized code is migrated into the library. AES-XCBC-MAC is given its own key preparation function but is otherwise identical to AES-CMAC and just reuses the AES-CMAC structs and functions. The implementation automatically uses the optimized AES key expansion and single block en/decryption functions. It also allows architectures to provide an optimized implementation of aes_cbcmac_blocks(), which allows the existing arm64-optimized code for these modes to be used. Just put the code for these modes directly in the libaes module rather than in a separate module. This is simpler, it makes it easier to share code between AES modes, and it increases the amount of inlining that is possible. (Indeed, for these reasons, most of the architecture-optimized AES code already provides multiple modes per module. x86 for example has only a single aesni-intel module. So to a large extent, this design choice just reflects the status quo.) However, since there are a lot of AES modes, there's still some value in omitting modes that are not needed at all in a given kernel. Therefore, make these modes an optional feature of libaes, controlled by CONFIG_CRYPTO_LIB_AES_CBC_MACS. This seems like a good middle ground. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260218213501.136844-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-03-02lib/crypto: tests: Add a .kunitconfig fileEric Biggers1-0/+34
Add a .kunitconfig file to the lib/crypto/ directory so that the crypto library tests can be run more easily using kunit.py. Example with UML: tools/testing/kunit/kunit.py run --kunitconfig=lib/crypto Example with QEMU: tools/testing/kunit/kunit.py run --kunitconfig=lib/crypto --arch=arm64 --make_options LLVM=1 Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260301040140.490310-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-28lib/crypto: tests: Depend on library options rather than selecting themEric Biggers1-23/+12
The convention for KUnit tests is to have the test kconfig options visible only when the code they depend on is already enabled. This way only the tests that are relevant to the particular kernel build can be enabled, either manually or via KUNIT_ALL_TESTS. Update lib/crypto/tests/Kconfig to follow that convention, i.e. depend on the corresponding library options rather than selecting them. This fixes an issue where enabling KUNIT_ALL_TESTS enabled non-test code. This does mean that it becomes a bit more difficult to enable *all* the crypto library tests (which is what I do as a maintainer of the code), since doing so will now require enabling other options that select the libraries. Regardless, we should follow the standard KUnit convention. I'll also add a .kunitconfig file that does enable all these options. Note: currently most of the crypto library options are selected by visible options in crypto/Kconfig, which can be used to enable them without too much trouble. If in the future we end up with more cases like CRYPTO_LIB_CURVE25519 which is selected only by WIREGUARD (thus making CRYPTO_LIB_CURVE25519_KUNIT_TEST effectively depend on WIREGUARD after this commit), we could consider adding a new kconfig option that enables all the library code specifically for testing. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/r/CAMuHMdULzMdxuTVfg8_4jdgzbzjfx-PHkcgbGSthcUx_sHRNMg@mail.gmail.com Fixes: 4dcf6caddaa0 ("lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256") Fixes: 571eaeddb67d ("lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512") Fixes: 6dd4d9f7919e ("lib/crypto: tests: Add KUnit tests for Poly1305") Fixes: 66b130607908 ("lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1") Fixes: d6b6aac0cdb4 ("lib/crypto: tests: Add KUnit tests for MD5 and HMAC-MD5") Fixes: afc4e4a5f122 ("lib/crypto: tests: Migrate Curve25519 self-test to KUnit") Fixes: 6401fd334ddf ("lib/crypto: tests: Add KUnit tests for BLAKE2b") Fixes: 15c64c47e484 ("lib/crypto: tests: Add SHA3 kunit tests") Fixes: b3aed551b3fc ("lib/crypto: tests: Add KUnit tests for POLYVAL") Fixes: ed894faccb8d ("lib/crypto: tests: Add KUnit tests for ML-DSA verification") Fixes: 7246fe6cd644 ("lib/crypto: tests: Add KUnit tests for NH") Cc: stable@vger.kernel.org Reviewed-by: David Gow <david@davidgow.net> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260226191749.39397-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-22Merge tag 'libcrypto-for-linus' of ↵Linus Torvalds1-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library fix from Eric Biggers: "Fix a big endian specific issue in the PPC64-optimized AES code" * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: powerpc/aes: Fix rndkey_from_vsx() on big endian CPUs
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds3-7/+7
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook3-7/+7
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-18lib/crypto: powerpc/aes: Fix rndkey_from_vsx() on big endian CPUsEric Biggers1-5/+7
I finally got a big endian PPC64 kernel to boot in QEMU. The PPC64 VSX optimized AES library code does work in that case, with the exception of rndkey_from_vsx() which doesn't take into account that the order in which the VSX code stores the round key words depends on the endianness. So fix rndkey_from_vsx() to do the right thing on big endian CPUs. Fixes: 7cf2082e74ce ("lib/crypto: powerpc/aes: Migrate POWER8 optimized code into library") Link: https://lore.kernel.org/r/20260216022104.332991-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-11Merge tag 'net-next-7.0' of ↵Linus Torvalds1-46/+17
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core & protocols: - A significant effort all around the stack to guide the compiler to make the right choice when inlining code, to avoid unneeded calls for small helper and stack canary overhead in the fast-path. This generates better and faster code with very small or no text size increases, as in many cases the call generated more code than the actual inlined helper. - Extend AccECN implementation so that is now functionally complete, also allow the user-space enabling it on a per network namespace basis. - Add support for memory providers with large (above 4K) rx buffer. Paired with hw-gro, larger rx buffer sizes reduce the number of buffers traversing the stack, dincreasing single stream CPU usage by up to ~30%. - Do not add HBH header to Big TCP GSO packets. This simplifies the RX path, the TX path and the NIC drivers, and is possible because user-space taps can now interpret correctly such packets without the HBH hint. - Allow IPv6 routes to be configured with a gateway address that is resolved out of a different interface than the one specified, aligning IPv6 to IPv4 behavior. - Multi-queue aware sch_cake. This makes it possible to scale the rate shaper of sch_cake across multiple CPUs, while still enforcing a single global rate on the interface. - Add support for the nbcon (new buffer console) infrastructure to netconsole, enabling lock-free, priority-based console operations that are safer in crash scenarios. - Improve the TCP ipv6 output path to cache the flow information, saving cpu cycles, reducing cache line misses and stack use. - Improve netfilter packet tracker to resolve clashes for most protocols, avoiding unneeded drops on rare occasions. - Add IP6IP6 tunneling acceleration to the flowtable infrastructure. - Reduce tcp socket size by one cache line. - Notify neighbour changes atomically, avoiding inconsistencies between the notification sequence and the actual states sequence. - Add vsock namespace support, allowing complete isolation of vsocks across different network namespaces. - Improve xsk generic performances with cache-alignment-oriented optimizations. - Support netconsole automatic target recovery, allowing netconsole to reestablish targets when underlying low-level interface comes back online. Driver API: - Support for switching the working mode (automatic vs manual) of a DPLL device via netlink. - Introduce PHY ports representation to expose multiple front-facing media ports over a single MAC. - Introduce "rx-polarity" and "tx-polarity" device tree properties, to generalize polarity inversion requirements for differential signaling. - Add helper to create, prepare and enable managed clocks. Device drivers: - Add Huawei hinic3 PF etherner driver. - Add DWMAC glue driver for Motorcomm YT6801 PCIe ethernet controller. - Add ethernet driver for MaxLinear MxL862xx switches - Remove parallel-port Ethernet driver. - Convert existing driver timestamp configuration reporting to hwtstamp_get and remove legacy ioctl(). - Convert existing drivers to .get_rx_ring_count(), simplifing the RX ring count retrieval. Also remove the legacy fallback path. - Ethernet high-speed NICs: - Broadcom (bnxt, bng): - bnxt: add FW interface update to support FEC stats histogram and NVRAM defragmentation - bng: add TSO and H/W GRO support - nVidia/Mellanox (mlx5): - improve latency of channel restart operations, reducing the used H/W resources - add TSO support for UDP over GRE over VLAN - add flow counters support for hardware steering (HWS) rules - use a static memory area to store headers for H/W GRO, leading to 12% RX tput improvement - Intel (100G, ice, idpf): - ice: reorganizes layout of Tx and Rx rings for cacheline locality and utilizes __cacheline_group* macros on the new layouts - ice: introduces Synchronous Ethernet (SyncE) support - Meta (fbnic): - adds debugfs for firmware mailbox and tx/rx rings vectors - Ethernet virtual: - geneve: introduce GRO/GSO support for double UDP encapsulation - Ethernet NICs consumer, and embedded: - Synopsys (stmmac): - some code refactoring and cleanups - RealTek (r8169): - add support for RTL8127ATF (10G Fiber SFP) - add dash and LTR support - Airoha: - AN8811HB 2.5 Gbps phy support - Freescale (fec): - add XDP zero-copy support - Thunderbolt: - add get link setting support to allow bonding - Renesas: - add support for RZ/G3L GBETH SoC - Ethernet switches: - Maxlinear: - support R(G)MII slow rate configuration - add support for Intel GSW150 - Motorcomm (yt921x): - add DCB/QoS support - TI: - icssm-prueth: support bridging (STP/RSTP) via the switchdev framework - Ethernet PHYs: - Realtek: - enable SGMII and 2500Base-X in-band auto-negotiation - simplify and reunify C22/C45 drivers - Micrel: convert bindings to DT schema - CAN: - move skb headroom content into skb extensions, making CAN metadata access more robust - CAN drivers: - rcar_canfd: - add support for FD-only mode - add support for the RZ/T2H SoC - sja1000: cleanup the CAN state handling - WiFi: - implement EPPKE/802.1X over auth frames support - split up drop reasons better, removing generic RX_DROP - additional FTM capabilities: 6 GHz support, supported number of spatial streams and supported number of LTF repetitions - better mac80211 iterators to enumerate resources - initial UHR (Wi-Fi 8) support for cfg80211/mac80211 - WiFi drivers: - Qualcomm/Atheros: - ath11k: support for Channel Frequency Response measurement - ath12k: a significant driver refactor to support multi-wiphy devices and and pave the way for future device support in the same driver (rather than splitting to ath13k) - ath12k: support for the QCC2072 chipset - Intel: - iwlwifi: partial Neighbor Awareness Networking (NAN) support - iwlwifi: initial support for U-NII-9 and IEEE 802.11bn - RealTek (rtw89): - preparations for RTL8922DE support - Bluetooth: - implement setsockopt(BT_PHY) to set the connection packet type/PHY - set link_policy on incoming ACL connections - Bluetooth drivers: - btusb: add support for MediaTek7920, Realtek RTL8761BU and 8851BE - btqca: add WCN6855 firmware priority selection feature" * tag 'net-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1254 commits) bnge/bng_re: Add a new HSI net: macb: Fix tx/rx malfunction after phy link down and up af_unix: Fix memleak of newsk in unix_stream_connect(). net: ti: icssg-prueth: Add optional dependency on HSR net: dsa: add basic initial driver for MxL862xx switches net: mdio: add unlocked mdiodev C45 bus accessors net: dsa: add tag format for MxL862xx switches dt-bindings: net: dsa: add MaxLinear MxL862xx selftests: drivers: net: hw: Modify toeplitz.c to poll for packets octeontx2-pf: Unregister devlink on probe failure net: renesas: rswitch: fix forwarding offload statemachine ionic: Rate limit unknown xcvr type messages tcp: inet6_csk_xmit() optimization tcp: populate inet->cork.fl.u.ip6 in tcp_v6_syn_recv_sock() tcp: populate inet->cork.fl.u.ip6 in tcp_v6_connect() ipv6: inet6_csk_xmit() and inet6_csk_update_pmtu() use inet->cork.fl.u.ip6 ipv6: use inet->cork.fl.u.ip6 and np->final in ip6_datagram_dst_update() ipv6: use np->final in inet6_sk_rebuild_header() ipv6: add daddr/final storage in struct ipv6_pinfo net: stmmac: qcom-ethqos: fix qcom_ethqos_serdes_powerup() ...
2026-02-03lib/crypto: mldsa: Clarify the documentation for mldsa_verify() slightlyEric Biggers1-1/+0
mldsa_verify() implements ML-DSA.Verify with ctx='', so document this more explicitly. Remove the one-liner comment above mldsa_verify() which was somewhat misleading. Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20260202221552.174341-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-27lib/crypto: sha1: Remove low-level functions from APIEric Biggers1-46/+17
Now that there are no users of the low-level SHA-1 interface, remove it. Specifically: - Remove SHA1_DIGEST_WORDS (no longer used) - Remove sha1_init_raw() (no longer used) - Rename sha1_transform() to sha1_block_generic() and make it static - Move SHA1_WORKSPACE_WORDS into lib/crypto/sha1.c Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20260123051656.396371-3-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-15lib/crypto: aes: Drop 'volatile' from aes_sbox and aes_inv_sboxEric Biggers1-7/+3
The volatile keyword is no longer necessary or useful on aes_sbox and aes_inv_sbox, since the table prefetching is now done using a helper function that casts to volatile itself and also includes an optimization barrier. Since it prevents some compiler optimizations, remove it. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-36-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15lib/crypto: aes: Remove old AES en/decryption functionsEric Biggers1-112/+6
Now that all callers of the aes_encrypt() and aes_decrypt() type-generic macros are using the new types, remove the old functions. Then, replace the macro with direct calls to the new functions, dropping the "_new" suffix from them. This completes the change in the type of the key struct that is passed to aes_encrypt() and aes_decrypt(). Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-35-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15lib/crypto: aesgcm: Use new AES library APIEric Biggers1-6/+6
Switch from the old AES library functions (which use struct crypto_aes_ctx) to the new ones (which use struct aes_enckey). This eliminates the unnecessary computation and caching of the decryption round keys. The new AES en/decryption functions are also much faster and use AES instructions when supported by the CPU. Note that in addition to the change in the key preparation function and the key struct type itself, the change in the type of the key struct results in aes_encrypt() (which is temporarily a type-generic macro) calling the new encryption function rather than the old one. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-34-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15lib/crypto: aescfb: Use new AES library APIEric Biggers1-15/+15
Switch from the old AES library functions (which use struct crypto_aes_ctx) to the new ones (which use struct aes_enckey). This eliminates the unnecessary computation and caching of the decryption round keys. The new AES en/decryption functions are also much faster and use AES instructions when supported by the CPU. Note that in addition to the change in the key preparation function and the key struct type itself, the change in the type of the key struct results in aes_encrypt() (which is temporarily a type-generic macro) calling the new encryption function rather than the old one. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-33-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15lib/crypto: x86/aes: Add AES-NI optimizationEric Biggers4-0/+348
Optimize the AES library with x86 AES-NI instructions. The relevant existing assembly functions, aesni_set_key(), aesni_enc(), and aesni_dec(), are a bit difficult to extract into the library: - They're coupled to the code for the AES modes. - They operate on struct crypto_aes_ctx. The AES library now uses different structs. - They assume the key is 16-byte aligned. The AES library only *prefers* 16-byte alignment; it doesn't require it. Moreover, they're not all that great in the first place: - They use unrolled loops, which isn't a great choice on x86. - They use the 'aeskeygenassist' instruction, which is unnecessary, is slow on Intel CPUs, and forces the loop to be unrolled. - They have special code for AES-192 key expansion, despite that being kind of useless. AES-128 and AES-256 are the ones used in practice. These are small functions anyway. Therefore, I opted to just write replacements of these functions for the library. They address all the above issues. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-18-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15lib/crypto: sparc/aes: Migrate optimized code into libraryEric Biggers4-0/+1694
Move the SPARC64 AES assembly code into lib/crypto/, wire the key expansion and single-block en/decryption functions up to the AES library API, and remove the "aes-sparc64" crypto_cipher algorithm. The result is that both the AES library and crypto_cipher APIs use the SPARC64 AES opcodes, whereas previously only crypto_cipher did (and it wasn't enabled by default, which this commit fixes as well). Note that some of the functions in the SPARC64 AES assembly code are still used by the AES mode implementations in arch/sparc/crypto/aes_glue.c. For now, just export these functions. These exports will go away once the AES mode implementations are migrated to the library as well. (Trying to split up the assembly file seemed like much more trouble than it would be worth.) Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-17-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-15lib/crypto: s390/aes: Migrate optimized code into libraryEric Biggers2-0/+107
Implement aes_preparekey_arch(), aes_encrypt_arch(), and aes_decrypt_arch() using the CPACF AES instructions. Then, remove the superseded "aes-s390" crypto_cipher. The result is that both the AES library and crypto_cipher APIs use the CPACF AES instructions, whereas previously only crypto_cipher did (and it wasn't enabled by default, which this commit fixes as well). Note that this preserves the optimization where the AES key is stored in raw form rather than expanded form. CPACF just takes the raw key. Acked-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Holger Dengler <dengler@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Link: https://lore.kernel.org/r/20260112192035.10427-16-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: riscv/aes: Migrate optimized code into libraryEric Biggers4-0/+150
Move the aes_encrypt_zvkned() and aes_decrypt_zvkned() assembly functions into lib/crypto/, wire them up to the AES library API, and remove the "aes-riscv64-zvkned" crypto_cipher algorithm. To make this possible, change the prototypes of these functions to take (rndkeys, key_len) instead of a pointer to crypto_aes_ctx, and change the RISC-V AES-XTS code to implement tweak encryption using the AES library instead of directly calling aes_encrypt_zvkned(). The result is that both the AES library and crypto_cipher APIs use RISC-V's AES instructions, whereas previously only crypto_cipher did (and it wasn't enabled by default, which this commit fixes as well). Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-15-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: powerpc/aes: Migrate POWER8 optimized code into libraryEric Biggers5-2/+4070
Move the POWER8 AES assembly code into lib/crypto/, wire the key expansion and single-block en/decryption functions up to the AES library API, and remove the superseded "p8_aes" crypto_cipher algorithm. The result is that both the AES library and crypto_cipher APIs are now optimized for POWER8, whereas previously only crypto_cipher was (and optimizations weren't enabled by default, which this commit fixes too). Note that many of the functions in the POWER8 assembly code are still used by the AES mode implementations in arch/powerpc/crypto/. For now, just export these functions. These exports will go away once the AES modes are migrated to the library as well. (Trying to split up the assembly file seemed like much more trouble than it would be worth.) Another challenge with this code is that the POWER8 assembly code uses a custom format for the expanded AES key. Since that code is imported from OpenSSL and is also targeted to POWER8 (rather than POWER9 which has better data movement and byteswap instructions), that is not easily changed. For now I've just kept the custom format. To maintain full correctness, this requires executing some slow fallback code in the case where the usability of VSX changes between key expansion and use. This should be tolerable, as this case shouldn't happen in practice. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-14-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: powerpc/aes: Migrate SPE optimized code into libraryEric Biggers8-0/+1696
Move the PowerPC SPE AES assembly code into lib/crypto/, wire the key expansion and single-block en/decryption functions up to the AES library API, and remove the superseded "aes-ppc-spe" crypto_cipher algorithm. The result is that both the AES library and crypto_cipher APIs are now optimized with SPE, whereas previously only crypto_cipher was (and optimizations weren't enabled by default, which this commit fixes too). Note that many of the functions in the PowerPC SPE assembly code are still used by the AES mode implementations in arch/powerpc/crypto/. For now, just export these functions. These exports will go away once the AES modes are migrated to the library as well. (Trying to split up the assembly files seemed like much more trouble than it would be worth.) Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: arm64/aes: Migrate optimized code into libraryEric Biggers5-0/+386
Move the ARM64 optimized AES key expansion and single-block AES en/decryption code into lib/crypto/, wire it up to the AES library API, and remove the superseded crypto_cipher algorithms. The result is that both the AES library and crypto_cipher APIs are now optimized for ARM64, whereas previously only crypto_cipher was (and the optimizations weren't enabled by default, which this fixes as well). Note: to see the diff from arch/arm64/crypto/aes-ce-glue.c to lib/crypto/arm64/aes.h, view this commit with 'git show -M10'. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: arm/aes: Migrate optimized code into libraryEric Biggers4-0/+261
Move the ARM optimized single-block AES en/decryption code into lib/crypto/, wire it up to the AES library API, and remove the superseded "aes-arm" crypto_cipher algorithm. The result is that both the AES library and crypto_cipher APIs are now optimized for ARM, whereas previously only crypto_cipher was (and the optimizations weren't enabled by default, which this fixes as well). Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: aes: Introduce improved AES libraryEric Biggers3-58/+354
The kernel's AES library currently has the following issues: - It doesn't take advantage of the architecture-optimized AES code, including the implementations using AES instructions. - It's much slower than even the other software AES implementations: 2-4 times slower than "aes-generic", "aes-arm", and "aes-arm64". - It requires that both the encryption and decryption round keys be computed and cached. This is wasteful for users that need only the forward (encryption) direction of the cipher: the key struct is 484 bytes when only 244 are actually needed. This missed optimization is very common, as many AES modes (e.g. GCM, CFB, CTR, CMAC, and even the tweak key in XTS) use the cipher only in the forward (encryption) direction even when doing decryption. - It doesn't provide the flexibility to customize the prepared key format. The API is defined to do key expansion, and several callers in drivers/crypto/ use it specifically to expand the key. This is an issue when integrating the existing powerpc, s390, and sparc code, which is necessary to provide full parity with the traditional API. To resolve these issues, I'm proposing the following changes: 1. New structs 'aes_key' and 'aes_enckey' are introduced, with corresponding functions aes_preparekey() and aes_prepareenckey(). Generally these structs will include the encryption+decryption round keys and the encryption round keys, respectively. However, the exact format will be under control of the architecture-specific AES code. (The verb "prepare" is chosen over "expand" since key expansion isn't necessarily done. It's also consistent with hmac*_preparekey().) 2. aes_encrypt() and aes_decrypt() will be changed to operate on the new structs instead of struct crypto_aes_ctx. 3. aes_encrypt() and aes_decrypt() will use architecture-optimized code when available, or else fall back to a new generic AES implementation that unifies the existing two fragmented generic AES implementations. The new generic AES implementation uses tables for both SubBytes and MixColumns, making it almost as fast as "aes-generic". However, instead of aes-generic's huge 8192-byte tables per direction, it uses only 1024 bytes for encryption and 1280 bytes for decryption (similar to "aes-arm"). The cost is just some extra rotations. The new generic AES implementation also includes table prefetching, making it have some "constant-time hardening". That's an improvement from aes-generic which has no constant-time hardening. It does slightly regress in constant-time hardening vs. the old lib/crypto/aes.c which had smaller tables, and from aes-fixed-time which disabled IRQs on top of that. But I think this is tolerable. The real solutions for constant-time AES are AES instructions or bit-slicing. The table-based code remains a best-effort fallback for the increasingly-rare case where a real solution is unavailable. 4. crypto_aes_ctx and aes_expandkey() will remain for now, but only for callers that are using them specifically for the AES key expansion (as opposed to en/decrypting data with the AES library). This commit begins the migration process by introducing the new structs and functions, backed by the new generic AES implementation. To allow callers to be incrementally converted, aes_encrypt() and aes_decrypt() are temporarily changed into macros that use a _Generic expression to call either the old functions (which take crypto_aes_ctx) or the new functions (which take the new types). Once all callers have been updated, these macros will go away, the old functions will be removed, and the "_new" suffix will be dropped from the new functions. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260112192035.10427-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: mldsa: Add FIPS cryptographic algorithm self-testEric Biggers2-0/+489
Since ML-DSA is FIPS-approved, add the boot-time self-test which is apparently required. Just add a test vector manually for now, borrowed from lib/crypto/tests/mldsa-testvecs.h (where in turn it's borrowed from leancrypto). The SHA-* FIPS test vectors are generated by scripts/crypto/gen-fips-testvecs.py instead, but the common Python libraries don't support ML-DSA yet. Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20260107044215.109930-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: nh: Restore dependency of arch code on !KMSANEric Biggers1-1/+1
Since the architecture-specific implementations of NH initialize memory in assembly code, they aren't compatible with KMSAN as-is. Fixes: 382de740759a ("lib/crypto: nh: Add NH library") Link: https://lore.kernel.org/r/20260105053652.1708299-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: md5: Use rol32() instead of open-coding itRusydi H. Makarim1-1/+1
For the bitwise left rotation in MD5STEP, use rol32() from <linux/bitops.h> instead of open-coding it. Signed-off-by: Rusydi H. Makarim <rusydi.makarim@kriptograf.id> Link: https://lore.kernel.org/r/20251214-rol32_in_md5-v1-1-20f5f11a92b2@kriptograf.id Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: x86/nh: Migrate optimized code into libraryEric Biggers5-0/+328
Migrate the x86_64 implementations of NH into lib/crypto/. This makes the nh() function be optimized on x86_64 kernels. Note: this temporarily makes the adiantum template not utilize the x86_64 optimized NH code. This is resolved in a later commit that converts the adiantum template to use nh() instead of "nhpoly1305". Link: https://lore.kernel.org/r/20251211011846.8179-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: arm64/nh: Migrate optimized code into libraryEric Biggers4-0/+139
Migrate the arm64 NEON implementation of NH into lib/crypto/. This makes the nh() function be optimized on arm64 kernels. Note: this temporarily makes the adiantum template not utilize the arm64 optimized NH code. This is resolved in a later commit that converts the adiantum template to use nh() instead of "nhpoly1305". Link: https://lore.kernel.org/r/20251211011846.8179-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: arm/nh: Migrate optimized code into libraryEric Biggers4-0/+151
Migrate the arm32 NEON implementation of NH into lib/crypto/. This makes the nh() function be optimized on arm32 kernels. Note: this temporarily makes the adiantum template not utilize the arm32 optimized NH code. This is resolved in a later commit that converts the adiantum template to use nh() instead of "nhpoly1305". Link: https://lore.kernel.org/r/20251211011846.8179-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: tests: Add KUnit tests for NHEric Biggers4-0/+350
Add some simple KUnit tests for the nh() function. These replace the test coverage which will be lost by removing the nhpoly1305 crypto_shash. Note that the NH code also continues to be tested indirectly as well, via the tests for the "adiantum(xchacha12,aes)" crypto_skcipher. Link: https://lore.kernel.org/r/20251211011846.8179-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: nh: Add NH libraryEric Biggers3-0/+100
Add support for the NH "almost-universal hash function" to lib/crypto/, specifically the variant of NH used in Adiantum. This will replace the need for the "nhpoly1305" crypto_shash algorithm. All the implementations of "nhpoly1305" use architecture-optimized code only for the NH stage; they just use the generic C Poly1305 code for the Poly1305 stage. We can achieve the same result in a simpler way using an (architecture-optimized) nh() function combined with code in crypto/adiantum.c that passes the results to the Poly1305 library. This commit begins this cleanup by adding the nh() function. The code is derived from crypto/nhpoly1305.c and include/crypto/nhpoly1305.h. Link: https://lore.kernel.org/r/20251211011846.8179-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: tests: Add KUnit tests for ML-DSA verificationEric Biggers4-0/+2335
Add a KUnit test suite for ML-DSA verification, including the following for each ML-DSA parameter set (ML-DSA-44, ML-DSA-65, and ML-DSA-87): - Positive test (valid signature), using vector imported from leancrypto - Various negative tests: - Wrong length for signature, message, or public key - Out-of-range coefficients in z vector - Invalid encoded hint vector - Any bit flipped in signature, message, or public key - Unit test for the internal function use_hint() - A benchmark ML-DSA inputs and outputs are very large. To keep the size of the tests down, use just one valid test vector per parameter set, and generate the negative tests at runtime by mutating the valid test vector. I also considered importing the test vectors from Wycheproof. I've tested that mldsa_verify() indeed passes all of Wycheproof's ML-DSA test vectors that use an empty context string. However, importing these permanently would add over 6 MB of source. That's too much to be a reasonable addition to the Linux kernel tree for one algorithm. It also wouldn't actually provide much better test coverage than this commit. Another potential issue is that Wycheproof uses the Apache license. Similarly, this also differs from the earlier proposal to import a long list of test vectors from leancrypto. I retained only one valid signature for each algorithm, and I also added (runtime-generated) negative tests which were missing. I think this is a better tradeoff. Reviewed-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20251214181712.29132-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-12lib/crypto: Add ML-DSA verification supportEric Biggers3-0/+664
Add support for verifying ML-DSA signatures. ML-DSA (Module-Lattice-Based Digital Signature Algorithm) is specified in FIPS 204 and is the standard version of Dilithium. Unlike RSA and elliptic-curve cryptography, ML-DSA is believed to be secure even against adversaries in possession of a large-scale quantum computer. Compared to the earlier patch (https://lore.kernel.org/r/20251117145606.2155773-3-dhowells@redhat.com/) that was based on "leancrypto", this implementation: - Is about 700 lines of source code instead of 4800. - Generates about 4 KB of object code instead of 28 KB. - Uses 9-13 KB of memory to verify a signature instead of 31-84 KB. - Is at least about the same speed, with a microbenchmark showing 3-5% improvements on one x86_64 CPU and -1% to 1% changes on another. When memory is a bottleneck, it's likely much faster. - Correctly implements the RejNTTPoly step of the algorithm. The API just consists of a single function mldsa_verify(), supporting pure ML-DSA with any standard parameter set (ML-DSA-44, ML-DSA-65, or ML-DSA-87) as selected by an enum. That's all that's actually needed. The following four potential features are unneeded and aren't included. However, any that ever become needed could fairly easily be added later, as they only affect how the message representative mu is calculated: - Nonempty context strings - Incremental message hashing - HashML-DSA - External mu Signing support would, of course, be a larger and more complex addition. However, the kernel doesn't, and shouldn't, need ML-DSA signing support. Note that mldsa_verify() allocates memory, so it can sleep and can fail with ENOMEM. Unfortunately we don't have much choice about that, since ML-DSA needs a lot of memory. At least callers have to check for errors anyway, since the signature could be invalid. Note that verification doesn't require constant-time code, and in fact some steps are inherently variable-time. I've used constant-time patterns in some places anyway, but technically they're not needed. Reviewed-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20251214181712.29132-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-08lib/crypto: aes: Fix missing MMU protection for AES S-boxEric Biggers1-2/+2
__cacheline_aligned puts the data in the ".data..cacheline_aligned" section, which isn't marked read-only i.e. it doesn't receive MMU protection. Replace it with ____cacheline_aligned which does the right thing and just aligns the data while keeping it in ".rodata". Fixes: b5e0b032b6c3 ("crypto: aes - add generic time invariant AES cipher") Cc: stable@vger.kernel.org Reported-by: Qingfang Deng <dqfext@gmail.com> Closes: https://lore.kernel.org/r/20260105074712.498-1-dqfext@gmail.com/ Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20260107052023.174620-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-08lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQsThomas Weißschuh1-1/+1
On my development machine the generic, memcpy()-only implementation of polyval_preparekey() is too fast for the IRQ workers to actually fire. The test fails. Increase the iterations to make the test more robust. The test will run for a maximum of one second in any case. [EB: This failure was already fixed by commit c31f4aa8fed0 ("kunit: Enforce task execution in {soft,hard}irq contexts"). I'm still applying this patch too, since the iteration count in this test made its running time much shorter than the other similar ones.] Fixes: b3aed551b3fc ("lib/crypto: tests: Add KUnit tests for POLYVAL") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://lore.kernel.org/r/20260102-kunit-polyval-fix-v1-1-5313b5a65f35@linutronix.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-14lib/crypto: riscv: Add poly1305-core.S to .gitignoreCharles Mirabile1-0/+2
poly1305-core.S is an auto-generated file, so it should be ignored. Fixes: bef9c7559869 ("lib/crypto: riscv/poly1305: Import OpenSSL/CRYPTOGAMS implementation") Cc: stable@vger.kernel.org Signed-off-by: Charles Mirabile <cmirabil@redhat.com> Link: https://lore.kernel.org/r/20251212184717.133701-1-cmirabil@redhat.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09lib/crypto: blake2s: Replace manual unrolling with unrolled_fullEric Biggers1-22/+16
As we're doing in the BLAKE2b code, use unrolled_full to make the compiler handle the loop unrolling. This simplifies the code slightly. The generated object code is nearly the same with both gcc and clang. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251205051155.25274-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09lib/crypto: blake2b: Roll up BLAKE2b round loop on 32-bitEric Biggers2-25/+20
BLAKE2b has a state of 16 64-bit words. Add the message data in and there are 32 64-bit words. With the current code where all the rounds are unrolled to enable constant-folding of the blake2b_sigma values, this results in a very large code size on 32-bit kernels, including a recurring issue where gcc uses a large amount of stack. There's just not much benefit to this unrolling when the code is already so large. Let's roll up the rounds when !CONFIG_64BIT. To avoid having to duplicate the code, just write the code once using a loop, and conditionally use 'unrolled_full' from <linux/unroll.h>. Then, fold the now-unneeded ROUND() macro into the loop. Finally, also remove the now-unneeded override of the stack frame size warning. Code size improvements for blake2b_compress_generic(): Size before (bytes) Size after (bytes) ------------------- ------------------ i386, gcc 27584 3632 i386, clang 18208 3248 arm32, gcc 19912 2860 arm32, clang 21336 3344 Running the BLAKE2b benchmark on a !CONFIG_64BIT kernel on an x86_64 processor shows a 16384B throughput change of 351 => 340 MB/s (gcc) or 442 MB/s => 375 MB/s (clang). So clearly not much of a slowdown either. But also that microbenchmark also effectively disregards cache usage, which is important in practice and is far better in the smaller code. Note: If we rolled up the loop on x86_64 too, the change would be 7024 bytes => 1584 bytes and 1960 MB/s => 1396 MB/s (gcc), or 6848 bytes => 1696 bytes and 1920 MB/s => 1263 MB/s (clang). Maybe still worth it, though not quite as clearly beneficial. Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation") Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251205050330.89704-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09lib/crypto: riscv: Depend on RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESSEric Biggers1-3/+6
Replace the RISCV_ISA_V dependency of the RISC-V crypto code with RISCV_EFFICIENT_VECTOR_UNALIGNED_ACCESS, which implies RISCV_ISA_V as well as vector unaligned accesses being efficient. This is necessary because this code assumes that vector unaligned accesses are supported and are efficient. (It does so to avoid having to use lots of extra vsetvli instructions to switch the element width back and forth between 8 and either 32 or 64.) This was omitted from the code originally just because the RISC-V kernel support for detecting this feature didn't exist yet. Support has now been added, but it's fragmented into per-CPU runtime detection, a command-line parameter, and a kconfig option. The kconfig option is the only reasonable way to do it, though, so let's just rely on that. Fixes: eb24af5d7a05 ("crypto: riscv - add vector crypto accelerated AES-{ECB,CBC,CTR,XTS}") Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20") Fixes: 600a3853dfa0 ("crypto: riscv - add vector crypto accelerated GHASH") Fixes: 8c8e40470ffe ("crypto: riscv - add vector crypto accelerated SHA-{256,224}") Fixes: b3415925a08b ("crypto: riscv - add vector crypto accelerated SHA-{512,384}") Fixes: 563a5255afa2 ("crypto: riscv - add vector crypto accelerated SM3") Fixes: b8d06352bbf3 ("crypto: riscv - add vector crypto accelerated SM4") Cc: stable@vger.kernel.org Reported-by: Vivian Wang <wangruikang@iscas.ac.cn> Closes: https://lore.kernel.org/r/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/ Reviewed-by: Jerry Shih <jerry.shih@sifive.com> Link: https://lore.kernel.org/r/20251206213750.81474-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-09lib/crypto: riscv/chacha: Avoid s0/fp registerVivian Wang1-3/+2
In chacha_zvkb, avoid using the s0 register, which is the frame pointer, by reallocating KEY0 to t5. This makes stack traces available if e.g. a crash happens in chacha_zvkb. No frame pointer maintenance is otherwise required since this is a leaf function. Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Fixes: bb54668837a0 ("crypto: riscv - add vector crypto accelerated ChaCha20") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20251202-riscv-chacha_zvkb-fp-v2-1-7bd00098c9dc@iscas.ac.cn Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-03Merge tag 'v6.19-p1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Rewrite memcpy_sglist from scratch - Add on-stack AEAD request allocation - Fix partial block processing in ahash Algorithms: - Remove ansi_cprng - Remove tcrypt tests for poly1305 - Fix EINPROGRESS processing in authenc - Fix double-free in zstd Drivers: - Use drbg ctr helper when reseeding xilinx-trng - Add support for PCI device 0x115A to ccp - Add support of paes in caam - Add support for aes-xts in dthev2 Others: - Use likely in rhashtable lookup - Fix lockdep false-positive in padata by removing a helper" * tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits) crypto: zstd - fix double-free in per-CPU stream cleanup crypto: ahash - Zero positive err value in ahash_update_finish crypto: ahash - Fix crypto_ahash_import with partial block data crypto: lib/mpi - use min() instead of min_t() crypto: ccp - use min() instead of min_t() hwrng: core - use min3() instead of nested min_t() crypto: aesni - ctr_crypt() use min() instead of min_t() crypto: drbg - Delete unused ctx from struct sdesc crypto: testmgr - Add missing DES weak and semi-weak key tests Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist" crypto: scatterwalk - Fix memcpy_sglist() to always succeed crypto: iaa - Request to add Kanchana P Sridhar to Maintainers. crypto: tcrypt - Remove unused poly1305 support crypto: ansi_cprng - Remove unused ansi_cprng algorithm crypto: asymmetric_keys - fix uninitialized pointers with free attribute KEYS: Avoid -Wflex-array-member-not-at-end warning crypto: ccree - Correctly handle return of sg_nents_for_len crypto: starfive - Correctly handle return of sg_nents_for_len crypto: iaa - Fix incorrect return value in save_iaa_wq() crypto: zstd - Remove unnecessary size_t cast ...
2025-12-02Merge tag 'fpsimd-on-stack-for-linus' of ↵Linus Torvalds14-84/+61
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers: "This is a core arm64 change. However, I was asked to take this because most uses of kernel-mode FPSIMD are in crypto or CRC code. In v6.8, the size of task_struct on arm64 increased by 528 bytes due to the new 'kernel_fpsimd_state' field. This field was added to allow kernel-mode FPSIMD code to be preempted. Unfortunately, 528 bytes is kind of a lot for task_struct. This regression in the task_struct size was noticed and reported. Recover that space by making this state be allocated on the stack at the beginning of each kernel-mode FPSIMD section. To make it easier for all the users of kernel-mode FPSIMD to do that correctly, introduce and use a 'scoped_ksimd' abstraction" * tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits) lib/crypto: arm64: Move remaining algorithms to scoped ksimd API lib/crypto: arm/blake2b: Move to scoped ksimd API arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack arm64/fpu: Enforce task-context only for generic kernel mode FPU net/mlx5: Switch to more abstract scoped ksimd guard API on arm64 arm64/xorblocks: Switch to 'ksimd' scoped guard API crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API crypto/arm64: polyval - Switch to 'ksimd' scoped guard API crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API raid6: Move to more abstract 'ksimd' guard API crypto: aegis128-neon - Move to more abstract 'ksimd' guard API crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API ...
2025-12-02Merge tag 'libcrypto-at-least-for-linus' of ↵Linus Torvalds1-9/+9
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull 'at_least' array size update from Eric Biggers: "C supports lower bounds on the sizes of array parameters, using the static keyword as follows: 'void f(int a[static 32]);'. This allows the compiler to warn about a too-small array being passed. As discussed, this reuse of the 'static' keyword, while standard, is a bit obscure. Therefore, add an alias 'at_least' to compiler_types.h. Then, add this 'at_least' annotation to the array parameters of various crypto library functions" * tag 'libcrypto-at-least-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: sha2: Add at_least decoration to fixed-size array params lib/crypto: sha1: Add at_least decoration to fixed-size array params lib/crypto: poly1305: Add at_least decoration to fixed-size array params lib/crypto: md5: Add at_least decoration to fixed-size array params lib/crypto: curve25519: Add at_least decoration to fixed-size array params lib/crypto: chacha: Add at_least decoration to fixed-size array params lib/crypto: chacha20poly1305: Statically check fixed array lengths compiler_types: introduce at_least parameter decoration pseudo keyword wifi: iwlwifi: trans: rename at_least variable to min_mode
2025-12-02Merge tag 'libcrypto-tests-for-linus' of ↵Linus Torvalds8-0/+1587
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library test updates from Eric Biggers: - Add KUnit test suites for SHA-3, BLAKE2b, and POLYVAL. These are the algorithms that have new crypto library interfaces this cycle. - Remove the crypto_shash POLYVAL tests. They're no longer needed because POLYVAL support was removed from crypto_shash. Better POLYVAL test coverage is now provided via the KUnit test suite. * tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: crypto: testmgr - Remove polyval tests lib/crypto: tests: Add KUnit tests for POLYVAL lib/crypto: tests: Add additional SHAKE tests lib/crypto: tests: Add SHA3 kunit tests lib/crypto: tests: Add KUnit tests for BLAKE2b
2025-12-02Merge tag 'libcrypto-updates-for-linus' of ↵Linus Torvalds30-208/+2959
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: "This is the main crypto library pull request for 6.19. It includes: - Add SHA-3 support to lib/crypto/, including support for both the hash functions and the extendable-output functions. Reimplement the existing SHA-3 crypto_shash support on top of the library. This is motivated mainly by the upcoming support for the ML-DSA signature algorithm, which needs the SHAKE128 and SHAKE256 functions. But even on its own it's a useful cleanup. This also fixes the longstanding issue where the architecture-optimized SHA-3 code was disabled by default. - Add BLAKE2b support to lib/crypto/, and reimplement the existing BLAKE2b crypto_shash support on top of the library. This is motivated mainly by btrfs, which supports BLAKE2b checksums. With this change, all btrfs checksum algorithms now have library APIs. btrfs is planned to start just using the library directly. This refactor also improves consistency between the BLAKE2b code and BLAKE2s code. And as usual, it also fixes the issue where the architecture-optimized BLAKE2b code was disabled by default. - Add POLYVAL support to lib/crypto/, replacing the existing POLYVAL support in crypto_shash. Reimplement HCTR2 on top of the library. This simplifies the code and improves HCTR2 performance. As usual, it also makes the architecture-optimized code be enabled by default. The generic implementation of POLYVAL is greatly improved as well. - Clean up the BLAKE2s code - Add FIPS self-tests for SHA-1, SHA-2, and SHA-3" * tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (37 commits) fscrypt: Drop obsolete recommendation to enable optimized POLYVAL crypto: polyval - Remove the polyval crypto_shash crypto: hctr2 - Convert to use POLYVAL library lib/crypto: x86/polyval: Migrate optimized code into library lib/crypto: arm64/polyval: Migrate optimized code into library lib/crypto: polyval: Add POLYVAL library crypto: polyval - Rename conflicting functions lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORs lib/crypto: x86/blake2s: Avoid writing back unchanged 'f' value lib/crypto: x86/blake2s: Improve readability lib/crypto: x86/blake2s: Use local labels for data lib/crypto: x86/blake2s: Drop check for nblocks == 0 lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bit lib/crypto: arm, arm64: Drop filenames from file comments lib/crypto: arm/blake2s: Fix some comments crypto: s390/sha3 - Remove superseded SHA-3 code crypto: sha3 - Reimplement using library API crypto: jitterentropy - Use default sha3 implementation lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functions lib/crypto: sha3: Support arch overrides of one-shot digest functions ...
2025-11-24crypto: lib/mpi - use min() instead of min_t()David Laight1-1/+1
min_t(unsigned int, a, b) casts an 'unsigned long' to 'unsigned int'. Use min(a, b) instead as it promotes any 'unsigned int' to 'unsigned long' and so cannot discard significant bits. In this case the 'unsigned long' value is small enough that the result is ok. Detected by an extra check added to min_t(). Signed-off-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-11-23lib/crypto: chacha20poly1305: Statically check fixed array lengthsJason A. Donenfeld1-9/+9
Several parameters of the chacha20poly1305 functions require arrays of an exact length. Use the new at_least keyword to instruct gcc and clang to statically check that the caller is passing an object of at least that length. Here it is in action, with this faulty patch to wireguard's cookie.h: struct cookie_checker { u8 secret[NOISE_HASH_LEN]; - u8 cookie_encryption_key[NOISE_SYMMETRIC_KEY_LEN]; + u8 cookie_encryption_key[NOISE_SYMMETRIC_KEY_LEN - 1]; u8 message_mac1_key[NOISE_SYMMETRIC_KEY_LEN]; If I try compiling this code, I get this helpful warning: CC drivers/net/wireguard/cookie.o drivers/net/wireguard/cookie.c: In function ‘wg_cookie_message_create’: drivers/net/wireguard/cookie.c:193:9: warning: ‘xchacha20poly1305_encrypt’ reading 32 bytes from a region of size 31 [-Wstringop-overread] 193 | xchacha20poly1305_encrypt(dst->encrypted_cookie, cookie, COOKIE_LEN, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 194 | macs->mac1, COOKIE_LEN, dst->nonce, | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 195 | checker->cookie_encryption_key); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/wireguard/cookie.c:193:9: note: referencing argument 7 of type ‘const u8 *’ {aka ‘const unsigned char *’} In file included from drivers/net/wireguard/messages.h:10, from drivers/net/wireguard/cookie.h:9, from drivers/net/wireguard/cookie.c:6: include/crypto/chacha20poly1305.h:28:6: note: in a call to function ‘xchacha20poly1305_encrypt’ 28 | void xchacha20poly1305_encrypt(u8 *dst, const u8 *src, const size_t src_len, Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: "Jason A. Donenfeld" <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20251123054819.2371989-4-Jason@zx2c4.com Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-21lib/crypto: tests: Fix KMSAN warning in test_sha256_finup_2x()Eric Biggers1-0/+1
Fully initialize *ctx, including the buf field which sha256_init() doesn't initialize, to avoid a KMSAN warning when comparing *ctx to orig_ctx. This KMSAN warning slipped in while KMSAN was not working reliably due to a stackdepot bug, which has now been fixed. Fixes: 6733968be7cb ("lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()") Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251121033431.34406-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12lib/crypto: arm64: Move remaining algorithms to scoped ksimd APIArd Biesheuvel2-21/+16
Move the arm64 implementations of SHA-3 and POLYVAL to the newly introduced scoped ksimd API, which replaces kernel_neon_begin() and kernel_neon_end(). On arm64, this is needed because the latter API will change in an incompatible manner. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12lib/crypto: arm/blake2b: Move to scoped ksimd APIArd Biesheuvel1-3/+2
Even though ARM's versions of kernel_neon_begin()/_end() are not being changed, update the newly migrated ARM blake2b to the scoped ksimd API so that all ARM and arm64 in lib/crypto remains consistent in this manner. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12Merge tag 'scoped-ksimd-for-arm-arm64' into libcrypto-fpsimd-on-stackEric Biggers11-60/+43
Pull scoped ksimd API for ARM and arm64 from Ard Biesheuvel: "Introduce a more strict replacement API for kernel_neon_begin()/kernel_neon_end() on both ARM and arm64, and replace occurrences of the latter pair appearing in lib/crypto" Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-12lib/crypto: Switch ARM and arm64 to 'ksimd' scoped guard APIArd Biesheuvel11-60/+43
Before modifying the prototypes of kernel_neon_begin() and kernel_neon_end() to accommodate kernel mode FP/SIMD state buffers allocated on the stack, move arm64 to the new 'ksimd' scoped guard API, which encapsulates the calls to those functions. For symmetry, do the same for 32-bit ARM too. Reviewed-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-11-11lib/crypto: tests: Add KUnit tests for POLYVALEric Biggers4-0/+419
Add a test suite for the POLYVAL library, including: - All the standard tests and the benchmark from hash-test-template.h - Comparison with a test vector from the RFC - Test with key and message containing all one bits - Additional tests related to the key struct Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251109234726.638437-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11lib/crypto: tests: Add additional SHAKE testsEric Biggers2-51/+147
Add the following test cases to cover gaps in the SHAKE testing: - test_shake_all_lens_up_to_4096() - test_shake_multiple_squeezes() - test_shake_with_guarded_bufs() Remove test_shake256_tiling() and test_shake256_tiling2() since they are superseded by test_shake_multiple_squeezes(). It provides better test coverage by using randomized testing. E.g., it's able to generate a zero-length squeeze followed by a nonzero-length squeeze, which the first 7 versions of the SHA-3 patchset handled incorrectly. Tested-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251026055032.1413733-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11lib/crypto: tests: Add SHA3 kunit testsDavid Howells4-0/+587
Add a SHA3 kunit test suite, providing the following: (*) A simple test of each of SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128 and SHAKE256. (*) NIST 0- and 1600-bit test vectors for SHAKE128 and SHAKE256. (*) Output tiling (multiple squeezing) tests for SHAKE256. (*) Standard hash template test for SHA3-256. To make this possible, gen-hash-testvecs.py is modified to support sha3-256. (*) Standard benchmark test for SHA3-256. [EB: dropped some unnecessary changes to gen-hash-testvecs.py, moved addition of Testing section in doc file into this commit, and other small cleanups] Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20251026055032.1413733-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11lib/crypto: tests: Add KUnit tests for BLAKE2bEric Biggers4-0/+485
Add a KUnit test suite for the BLAKE2b library API, mirroring the BLAKE2s test suite very closely. As with the BLAKE2s test suite, a benchmark is included. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11lib/crypto: x86/polyval: Migrate optimized code into libraryEric Biggers4-0/+404
Migrate the x86_64 implementation of POLYVAL into lib/crypto/, wiring it up to the POLYVAL library interface. This makes the POLYVAL library be properly optimized on x86_64. This drops the x86_64 optimizations of polyval in the crypto_shash API. That's fine, since polyval will be removed from crypto_shash entirely since it is unneeded there. But even if it comes back, the crypto_shash API could just be implemented on top of the library API, as usual. Adjust the names and prototypes of the assembly functions to align more closely with the rest of the library code. Also replace a movaps instruction with movups to remove the assumption that the key struct is 16-byte aligned. Users can still align the key if they want (and at least in this case, movups is just as fast as movaps), but it's inconvenient to require it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251109234726.638437-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11lib/crypto: arm64/polyval: Migrate optimized code into libraryEric Biggers4-0/+443
Migrate the arm64 implementation of POLYVAL into lib/crypto/, wiring it up to the POLYVAL library interface. This makes the POLYVAL library be properly optimized on arm64. This drops the arm64 optimizations of polyval in the crypto_shash API. That's fine, since polyval will be removed from crypto_shash entirely since it is unneeded there. But even if it comes back, the crypto_shash API could just be implemented on top of the library API, as usual. Adjust the names and prototypes of the assembly functions to align more closely with the rest of the library code. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251109234726.638437-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-11lib/crypto: polyval: Add POLYVAL libraryEric Biggers3-0/+325
Add support for POLYVAL to lib/crypto/. This will replace the polyval crypto_shash algorithm and its use in the hctr2 template, simplifying the code and reducing overhead. Specifically, this commit introduces the POLYVAL library API and a generic implementation of it. Later commits will migrate the existing architecture-optimized implementations of POLYVAL into lib/crypto/ and add a KUnit test suite. I've also rewritten the generic implementation completely, using a more modern approach instead of the traditional table-based approach. It's now constant-time, requires no precomputation or dynamic memory allocations, decreases the per-key memory usage from 4096 bytes to 16 bytes, and is faster than the old polyval-generic even on bulk data reusing the same key (at least on x86_64, where I measured 15% faster). We should do this for GHASH too, but for now just do it for POLYVAL. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251109234726.638437-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: x86/blake2s: Use vpternlogd for 3-input XORsEric Biggers1-4/+2
AVX-512 supports 3-input XORs via the vpternlogd (or vpternlogq) instruction with immediate 0x96. This approach, vs. the alternative of two vpxor instructions, is already used in the CRC, AES-GCM, and AES-XTS code, since it reduces the instruction count and is faster on some CPUs. Make blake2s_compress_avx512() take advantage of it too. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102234209.62133-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: x86/blake2s: Avoid writing back unchanged 'f' valueEric Biggers1-2/+2
Just before returning, blake2s_compress_ssse3() and blake2s_compress_avx512() store updated values to the 'h', 't', and 'f' fields of struct blake2s_ctx. But 'f' is always unchanged (which is correct; only the C code changes it). So, there's no need to write to 'f'. Use 64-bit stores (movq and vmovq) instead of 128-bit stores (movdqu and vmovdqu) so that only 't' is written. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102234209.62133-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: x86/blake2s: Improve readabilityEric Biggers1-97/+134
Various cleanups for readability. No change to the generated code: - Add some comments - Add #defines for arguments - Rename some labels - Use decimal constants instead of hex where it makes sense. (The pshufd immediates intentionally remain as hex.) - Add blank lines when there's a logical break The round loop still could use some work, but this is at least a start. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102234209.62133-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: x86/blake2s: Use local labels for dataEric Biggers1-19/+26
Following the usual practice, prefix the names of the data labels with ".L" so that the assembler treats them as truly local. This more clearly expresses the intent and is less error-prone. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102234209.62133-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: x86/blake2s: Drop check for nblocks == 0Eric Biggers1-3/+0
Since blake2s_compress() is always passed nblocks != 0, remove the unnecessary check for nblocks == 0 from blake2s_compress_ssse3(). Note that this makes it consistent with blake2s_compress_avx512() in the same file as well as the arm32 blake2s_compress(). Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102234209.62133-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: x86/blake2s: Fix 32-bit arg treated as 64-bitEric Biggers1-2/+2
In the C code, the 'inc' argument to the assembly functions blake2s_compress_ssse3() and blake2s_compress_avx512() is declared with type u32, matching blake2s_compress(). The assembly code then reads it from the 64-bit %rcx. However, the ABI doesn't guarantee zero-extension to 64 bits, nor do gcc or clang guarantee it. Therefore, fix these functions to read this argument from the 32-bit %ecx. In theory, this bug could have caused the wrong 'inc' value to be used, causing incorrect BLAKE2s hashes. In practice, probably not: I've fixed essentially this same bug in many other assembly files too, but there's never been a real report of it having caused a problem. In x86_64, all writes to 32-bit registers are zero-extended to 64 bits. That results in zero-extension in nearly all situations. I've only been able to demonstrate a lack of zero-extension with a somewhat contrived example involving truncation, e.g. when the C code has a u64 variable holding 0x1234567800000040 and passes it as a u32 expecting it to be truncated to 0x40 (64). But that's not what the real code does, of course. Fixes: ed0356eda153 ("crypto: blake2s - x86_64 SIMD implementation") Cc: stable@vger.kernel.org Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102234209.62133-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: arm, arm64: Drop filenames from file commentsEric Biggers7-7/+7
Remove self-references to filenames from assembly files in lib/crypto/arm/ and lib/crypto/arm64/. This follows the recommended practice and eliminates an outdated reference to sha2-ce-core.S. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102014809.170713-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: arm/blake2s: Fix some commentsEric Biggers1-4/+4
Fix the indices in some comments in blake2s-core.S. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251102021553.176587-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: s390/sha3: Add optimized one-shot SHA-3 digest functionsEric Biggers1-2/+65
Some z/Architecture processors can compute a SHA-3 digest in a single instruction. arch/s390/crypto/ already uses this capability to optimize the SHA-3 crypto_shash algorithms. Use this capability to implement the sha3_224(), sha3_256(), sha3_384(), and sha3_512() library functions too. SHA3-256 benchmark results provided by Harald Freudenberger (https://lore.kernel.org/r/4188d18bfcc8a64941c5ebd8de10ede2@linux.ibm.com/) on a z/Architecture machine with "facility 86" (MSA level 12): Length (bytes) Before (MB/s) After (MB/s) ============== ============= ============ 16 212 225 64 820 915 256 1850 3350 1024 5400 8300 4096 11200 11300 Note: the original data from Harald was given in the form of a graph for each length, showing the distribution of throughputs from 500 runs. I guesstimated the peak of each one. Harald also reported that the generic SHA-3 code was at most 259 MB/s (https://lore.kernel.org/r/c39f6b6c110def0095e5da5becc12085@linux.ibm.com/). So as expected, the earlier commit that optimized sha3_absorb_blocks() and sha3_keccakf() is the more important one; it optimized the Keccak permutation which is the most performance-critical part of SHA-3. Still, this additional commit does notably improve performance further on some lengths. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20251026055032.1413733-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: sha3: Support arch overrides of one-shot digest functionsEric Biggers1-0/+37
Add support for architecture-specific overrides of sha3_224(), sha3_256(), sha3_384(), and sha3_512(). This will be used to implement these functions more efficiently on s390 than is possible via the usual init + update + final flow. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20251026055032.1413733-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: s390/sha3: Add optimized Keccak functionsEric Biggers2-0/+89
Implement sha3_absorb_blocks() and sha3_keccakf() using the hardware- accelerated SHA-3 support in Message-Security-Assist Extension 6. This accelerates the SHA3-224, SHA3-256, SHA3-384, SHA3-512, and SHAKE256 library functions. Note that arch/s390/crypto/ already has SHA-3 code that uses this extension, but it is exposed only via crypto_shash. This commit brings the same acceleration to the SHA-3 library. The arch/s390/crypto/ version will become redundant and be removed in later changes. Tested-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251026055032.1413733-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: arm64/sha3: Migrate optimized code into libraryEric Biggers4-0/+285
Instead of exposing the arm64-optimized SHA-3 code via arm64-specific crypto_shash algorithms, instead just implement the sha3_absorb_blocks() and sha3_keccakf() library functions. This is much simpler, it makes the SHA-3 library functions be arm64-optimized, and it fixes the longstanding issue where the arm64-optimized SHA-3 code was disabled by default. SHA-3 still remains available through crypto_shash, but individual architectures no longer need to handle it. Note: to see the diff from arch/arm64/crypto/sha3-ce-glue.c to lib/crypto/arm64/sha3.h, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251026055032.1413733-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: sha3: Add FIPS cryptographic algorithm self-testEric Biggers2-1/+23
Since the SHA-3 algorithms are FIPS-approved, add the boot-time self-test which is apparently required. This closely follows the corresponding SHA-1, SHA-256, and SHA-512 tests. Tested-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251026055032.1413733-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: sha3: Move SHA3 Iota step mapping into round functionDavid Howells1-6/+6
In crypto/sha3_generic.c, the keccakf() function calls keccakf_round() to do four of Keccak-f's five step mappings. However, it does not do the Iota step mapping - presumably because that is dependent on round number, whereas Theta, Rho, Pi and Chi are not. Note that the keccakf_round() function needs to be explicitly non-inlined on certain architectures as gcc's produced output will (or used to) use over 1KiB of stack space if inlined. Now, this code was copied more or less verbatim into lib/crypto/sha3.c, so that has the same aesthetic issue. Fix this there by passing the round number into sha3_keccakf_one_round_generic() and doing the Iota step mapping there. crypto/sha3_generic.c is left untouched as that will be converted to use lib/crypto/sha3.c at some point. Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251026055032.1413733-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-05lib/crypto: sha3: Add SHA-3 supportDavid Howells3-0/+371
Add SHA-3 support to lib/crypto/. All six algorithms in the SHA-3 family are supported: four digests (SHA3-224, SHA3-256, SHA3-384, and SHA3-512) and two extendable-output functions (SHAKE128 and SHAKE256). The SHAKE algorithms will be required for ML-DSA. [EB: simplified the API to use fewer types and functions, fixed bug that sometimes caused incorrect SHAKE output, cleaned up the documentation, dropped an ad-hoc test that was inconsistent with the rest of lib/crypto/, and many other cleanups] Signed-off-by: David Howells <dhowells@redhat.com> Co-developed-by: Eric Biggers <ebiggers@kernel.org> Tested-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251026055032.1413733-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-04lib/crypto: arm/curve25519: Disable on CPU_BIG_ENDIANEric Biggers1-1/+1
On big endian arm kernels, the arm optimized Curve25519 code produces incorrect outputs and fails the Curve25519 test. This has been true ever since this code was added. It seems that hardly anyone (or even no one?) actually uses big endian arm kernels. But as long as they're ostensibly supported, we should disable this code on them so that it's not accidentally used. Note: for future-proofing, use !CPU_BIG_ENDIAN instead of CPU_LITTLE_ENDIAN. Both of these are arch-specific options that could get removed in the future if big endian support gets dropped. Fixes: d8f1308a025f ("crypto: arm/curve25519 - wire up NEON implementation") Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251104054906.716914-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-11-04lib/crypto: curve25519-hacl64: Fix older clang KASAN workaround for GCCNathan Chancellor1-1/+1
Commit 2f13daee2a72 ("lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older") inadvertently disabled KASAN in curve25519-hacl64.o for GCC unconditionally because clang-min-version will always evaluate to nothing for GCC. Add a check for CONFIG_CC_IS_CLANG to avoid applying the workaround for GCC, which is only needed for clang-17 and older. Cc: stable@vger.kernel.org Fixes: 2f13daee2a72 ("lib/crypto/curve25519-hacl64: Disable KASAN with clang-17 and older") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251103-curve25519-hacl64-fix-kasan-workaround-v2-1-ab581cbd8035@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: arm/blake2b: Migrate optimized code into libraryEric Biggers4-0/+393
Migrate the arm-optimized BLAKE2b code from arch/arm/crypto/ to lib/crypto/arm/. This makes the BLAKE2b library able to use it, and it also simplifies the code because it's easier to integrate with the library than crypto_shash. This temporarily makes the arm-optimized BLAKE2b code unavailable via crypto_shash. A later commit reimplements the blake2b-* crypto_shash algorithms on top of the BLAKE2b library API, making it available again. Note that as per the lib/crypto/ convention, the optimized code is now enabled by default. So, this also fixes the longstanding issue where the optimized BLAKE2b code was not enabled by default. To see the diff from arch/arm/crypto/blake2b-neon-glue.c to lib/crypto/arm/blake2b.h, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2b: Add BLAKE2b library functionsEric Biggers3-0/+193
Add a library API for BLAKE2b, closely modeled after the BLAKE2s API. This will allow in-kernel users such as btrfs to use BLAKE2b without going through the generic crypto layer. In addition, as usual the BLAKE2b crypto_shash algorithms will be reimplemented on top of this. Note: to create lib/crypto/blake2b.c I made a copy of lib/crypto/blake2s.c and made the updates from BLAKE2s => BLAKE2b. This way, the BLAKE2s and BLAKE2b code is kept consistent. Therefore, it borrows the SPDX-License-Identifier and Copyright from lib/crypto/blake2s.c rather than crypto/blake2b_generic.c. The library API uses 'struct blake2b_ctx', consistent with other lib/crypto/ APIs. The existing 'struct blake2b_state' will be removed once the blake2b crypto_shash algorithms are updated to stop using it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2s: Drop excessive const & rename block => dataEric Biggers4-20/+18
A couple more small cleanups to the BLAKE2s code before these things get propagated into the BLAKE2b code: - Drop 'const' from some non-pointer function parameters. It was a bit excessive and not conventional. - Rename 'block' argument of blake2s_compress*() to 'data'. This is for consistency with the SHA-* code, and also to avoid the implication that it points to a singular "block". No functional changes. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2s: Rename blake2s_state to blake2s_ctxEric Biggers5-54/+53
For consistency with the SHA-1, SHA-2, SHA-3 (in development), and MD5 library APIs, rename blake2s_state to blake2s_ctx. As a refresher, the ctx name: - Is a bit shorter. - Avoids confusion with the compression function state, which is also often called the state (but is just part of the full context). - Is consistent with OpenSSL. Not a big deal, of course. But consistency is nice. With a BLAKE2b library API about to be added, this is a convenient time to update this. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: blake2s: Adjust parameter order of blake2s()Eric Biggers1-8/+8
Reorder the parameters of blake2s() from (out, in, key, outlen, inlen, keylen) to (key, keylen, in, inlen, out, outlen). This aligns BLAKE2s with the common conventions of pairing buffers and their lengths, and having outputs follow inputs. This is widely used elsewhere in lib/crypto/ and crypto/, and even elsewhere in the BLAKE2s code itself such as blake2s_init_key() and blake2s_final(). So blake2s() was a bit of an exception. Notably, this results in the same order as hmac_*_usingrawkey(). Note that since the type signature changed, it's not possible for a blake2s() call site to be silently missed. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29lib/crypto: Add FIPS self-tests for SHA-1 and SHA-2Eric Biggers4-6/+96
Add FIPS cryptographic algorithm self-tests for all SHA-1 and SHA-2 algorithms. Following the "Implementation Guidance for FIPS 140-3" document, to achieve this it's sufficient to just test a single test vector for each of HMAC-SHA1, HMAC-SHA256, and HMAC-SHA512. Just run these tests in the initcalls, following the example of e.g. crypto/kdf_sp800108.c. Note that this should meet the FIPS self-test requirement even in the built-in case, given that the initcalls run before userspace, storage, network, etc. are accessible. This does not fix a regression, seeing as lib/ has had SHA-1 support since 2005 and SHA-256 support since 2018. Neither ever had FIPS self-tests. Moreover, fips=1 support has always been an unfinished feature upstream. However, with lib/ now being used more widely, it's now seeing more scrutiny and people seem to want these now [1][2]. [1] https://lore.kernel.org/r/3226361.1758126043@warthog.procyon.org.uk/ [2] https://lore.kernel.org/r/f31dbb22-0add-481c-aee0-e337a7731f8e@oracle.com/ Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251011001047.51886-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-22lib/crypto: poly1305: Restore dependency of arch code on !KMSANEric Biggers1-1/+1
Restore the dependency of the architecture-optimized Poly1305 code on !KMSAN. It was dropped by commit b646b782e522 ("lib/crypto: poly1305: Consolidate into single module"). Unlike the other hash algorithms in lib/crypto/ (e.g., SHA-512), the way the architecture-optimized Poly1305 code is integrated results in assembly code initializing memory, for several different architectures. Thus, it generates false positive KMSAN warnings. These could be suppressed with kmsan_unpoison_memory(), but it would be needed in quite a few places. For now let's just restore the dependency on !KMSAN. Note: this should have been caught by running poly1305_kunit with CONFIG_KMSAN=y, which I did. However, due to an unrelated KMSAN bug (https://lore.kernel.org/r/20251022030213.GA35717@sol/), KMSAN currently isn't working reliably. Thus, the warning wasn't noticed until later. Fixes: b646b782e522 ("lib/crypto: poly1305: Consolidate into single module") Reported-by: syzbot+01fcd39a0d90cdb0e3df@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/68f6a48f.050a0220.91a22.0452.GAE@google.com/ Reported-by: Pei Xiao <xiaopei01@kylinos.cn> Closes: https://lore.kernel.org/r/751b3d80293a6f599bb07770afcef24f623c7da0.1761026343.git.xiaopei01@kylinos.cn/ Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251022033405.64761-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-29Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds6-11/+972
Pull interleaved SHA-256 hashing support from Eric Biggers: "Optimize fsverity with 2-way interleaved hashing Add support for 2-way interleaved SHA-256 hashing to lib/crypto/, and make fsverity use it for faster file data verification. This improves fsverity performance on many x86_64 and arm64 processors. Later, I plan to make dm-verity use this too" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: Use 2-way interleaved SHA-256 hashing when supported fsverity: Remove inode parameter from fsverity_hash_block() lib/crypto: tests: Add tests and benchmark for sha256_finup_2x() lib/crypto: x86/sha256: Add support for 2-way interleaved hashing lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing lib/crypto: sha256: Add support for 2-way interleaved hashing
2025-09-29Merge tag 'libcrypto-for-linus' of ↵Linus Torvalds88-1896/+7747
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: - Add a RISC-V optimized implementation of Poly1305. This code was written by Andy Polyakov and contributed by Zhihang Shao. - Migrate the MD5 code into lib/crypto/, and add KUnit tests for MD5. Yes, it's still the 90s, and several kernel subsystems are still using MD5 for legacy use cases. As long as that remains the case, it's helpful to clean it up in the same way as I've been doing for other algorithms. Later, I plan to convert most of these users of MD5 to use the new MD5 library API instead of the generic crypto API. - Simplify the organization of the ChaCha, Poly1305, BLAKE2s, and Curve25519 code. Consolidate these into one module per algorithm, and centralize the configuration and build process. This is the same reorganization that has already been successful for SHA-1 and SHA-2. - Remove the unused crypto_kpp API for Curve25519. - Migrate the BLAKE2s and Curve25519 self-tests to KUnit. - Always enable the architecture-optimized BLAKE2s code. * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (38 commits) crypto: md5 - Implement export_core() and import_core() wireguard: kconfig: simplify crypto kconfig selections lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTS lib/crypto: curve25519: Consolidate into single module lib/crypto: curve25519: Move a couple functions out-of-line lib/crypto: tests: Add Curve25519 benchmark lib/crypto: tests: Migrate Curve25519 self-test to KUnit crypto: curve25519 - Remove unused kpp support crypto: testmgr - Remove curve25519 kpp tests crypto: x86/curve25519 - Remove unused kpp support crypto: powerpc/curve25519 - Remove unused kpp support crypto: arm/curve25519 - Remove unused kpp support crypto: hisilicon/hpre - Remove unused curve25519 kpp support lib/crypto: tests: Add KUnit tests for BLAKE2s lib/crypto: blake2s: Consolidate into single C translation unit lib/crypto: blake2s: Move generic code into blake2s.c lib/crypto: blake2s: Always enable arch-optimized BLAKE2s code lib/crypto: blake2s: Remove obsolete self-test lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2 lib/crypto: chacha: Consolidate into single module ...
2025-09-29Merge tag 'crc-for-linus' of ↵Linus Torvalds1-119/+4
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC updates from Eric Biggers: "Update crc_kunit to test the CRC functions in softirq and hardirq contexts, similar to what the lib/crypto/ KUnit tests do. Move the helper function needed to do this into a common header. This is useful mainly to test fallback code paths for when FPU/SIMD/vector registers are unusable" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: Documentation/staging: Fix typo and incorrect citation in crc32.rst lib/crc: Drop inline from all *_mod_init_arch() functions lib/crc: Use underlying functions instead of crypto_simd_usable() lib/crc: crc_kunit: Test CRC computation in interrupt contexts kunit, lib/crypto: Move run_irq_test() to common header
2025-09-17lib/crypto: tests: Add tests and benchmark for sha256_finup_2x()Eric Biggers1-0/+184
Update sha256_kunit to include test cases and a benchmark for the new sha256_finup_2x() function. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250915160819.140019-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17lib/crypto: x86/sha256: Add support for 2-way interleaved hashingEric Biggers2-0/+407
Add an implementation of sha256_finup_2x_arch() for x86_64. It interleaves the computation of two SHA-256 hashes using the x86 SHA-NI instructions. dm-verity and fs-verity will take advantage of this for greatly improved performance on capable CPUs. This increases the throughput of SHA-256 hashing 4096-byte messages by the following amounts on the following CPUs: Intel Ice Lake (server): 4% Intel Sapphire Rapids: 38% Intel Emerald Rapids: 38% AMD Zen 1 (Threadripper 1950X): 84% AMD Zen 4 (EPYC 9B14): 98% AMD Zen 5 (Ryzen 9 9950X): 64% For now, this seems to benefit AMD more than Intel. This seems to be because current AMD CPUs support concurrent execution of the SHA-NI instructions, but unfortunately current Intel CPUs don't, except for the sha256msg2 instruction. Hopefully future Intel CPUs will support SHA-NI on more execution ports. Zen 1 supports 2 concurrent sha256rnds2, and Zen 4 supports 4 concurrent sha256rnds2, which suggests that even better performance may be achievable on Zen 4 by interleaving more than two hashes. However, doing so poses a number of trade-offs, and furthermore Zen 5 goes back to supporting "only" 2 concurrent sha256rnds2. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250915160819.140019-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17lib/crypto: arm64/sha256: Add support for 2-way interleaved hashingEric Biggers2-6/+315
Add an implementation of sha256_finup_2x_arch() for arm64. It interleaves the computation of two SHA-256 hashes using the ARMv8 SHA-256 instructions. dm-verity and fs-verity will take advantage of this for greatly improved performance on capable CPUs. This increases the throughput of SHA-256 hashing 4096-byte messages by the following amounts on the following CPUs: ARM Cortex-X1: 70% ARM Cortex-X3: 68% ARM Cortex-A76: 65% ARM Cortex-A715: 43% ARM Cortex-A510: 25% ARM Cortex-A55: 8% Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250915160819.140019-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-17lib/crypto: sha256: Add support for 2-way interleaved hashingEric Biggers1-5/+66
Many arm64 and x86_64 CPUs can compute two SHA-256 hashes in nearly the same speed as one, if the instructions are interleaved. This is because SHA-256 is serialized block-by-block, and two interleaved hashes take much better advantage of the CPU's instruction-level parallelism. Meanwhile, a very common use case for SHA-256 hashing in the Linux kernel is dm-verity and fs-verity. Both use a Merkle tree that has a fixed block size, usually 4096 bytes with an empty or 32-byte salt prepended. Usually, many blocks need to be hashed at a time. This is an ideal scenario for 2-way interleaved hashing. To enable this optimization, add a new function sha256_finup_2x() to the SHA-256 library API. It computes the hash of two equal-length messages, starting from a common initial context. For now it always falls back to sequential processing. Later patches will wire up arm64 and x86_64 optimized implementations. Note that the interleaving factor could in principle be higher than 2x. However, that runs into many practical difficulties and CPU throughput limitations. Thus, both the implementations I'm adding are 2x. In the interest of using the simplest solution, the API matches that. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250915160819.140019-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06lib/crypto: tests: Enable Curve25519 test when CRYPTO_SELFTESTSEric Biggers1-1/+1
Now that the Curve25519 library has been disentangled from CRYPTO, adding CRYPTO_SELFTESTS as a default value of CRYPTO_LIB_CURVE25519_KUNIT_TEST no longer causes a recursive kconfig dependency. Do this, which makes this option consistent with the other crypto KUnit test options in the same file. Link: https://lore.kernel.org/r/20250906213523.84915-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06lib/crypto: curve25519: Consolidate into single moduleEric Biggers9-72/+4645
Reorganize the Curve25519 library code: - Build a single libcurve25519 module, instead of up to three modules: libcurve25519, libcurve25519-generic, and an arch-specific module. - Move the arch-specific Curve25519 code from arch/$(SRCARCH)/crypto/ to lib/crypto/$(SRCARCH)/. Centralize the build rules into lib/crypto/Makefile and lib/crypto/Kconfig. - Include the arch-specific code directly in lib/crypto/curve25519.c via a header, rather than using a separate .c file. - Eliminate the entanglement with CRYPTO. CRYPTO_LIB_CURVE25519 no longer selects CRYPTO, and the arch-specific Curve25519 code no longer depends on CRYPTO. This brings Curve25519 in line with the latest conventions for lib/crypto/, used by other algorithms. The exception is that I kept the generic code in separate translation units for now. (Some of the function names collide between the x86 and generic Curve25519 code. And the Curve25519 functions are very long anyway, so inlining doesn't matter as much for Curve25519 as it does for some other algorithms.) Link: https://lore.kernel.org/r/20250906213523.84915-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06lib/crypto: curve25519: Move a couple functions out-of-lineEric Biggers1-1/+33
Move curve25519() and curve25519_generate_public() from curve25519.h to curve25519.c. There's no good reason for them to be inline. Link: https://lore.kernel.org/r/20250906213523.84915-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06lib/crypto: tests: Add Curve25519 benchmarkEric Biggers1-1/+32
Add a benchmark to curve25519_kunit. This brings it in line with the other crypto KUnit tests and provides an easy way to measure performance. Link: https://lore.kernel.org/r/20250906213523.84915-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-09-06lib/crypto: tests: Migrate Curve25519 self-test to KUnitEric Biggers5-35/+52
Move the Curve25519 test from an ad-hoc self-test to a KUnit test. Generally keep the same test logic for now, just translated to KUnit. There's one exception, which is that I dropped the incomplete test of curve25519_generic(). The approach I'm taking to cover the different implementations with the KUnit tests is to just rely on booting kernels in QEMU with different '-cpu' options, rather than try to make the tests (incompletely) test multiple implementations on one CPU. This way, both the test and the library API are simpler. This commit makes the file lib/crypto/curve25519.c no longer needed, as its only purpose was to call the self-test. However, keep it for now, since a later commit will add code to it again. Temporarily omit the default value of CRYPTO_SELFTESTS that the other lib/crypto/ KUnit tests have. It would cause a recursive kconfig dependency, since the Curve25519 code is still entangled with CRYPTO. A later commit will fix that. Link: https://lore.kernel.org/r/20250906213523.84915-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: tests: Add KUnit tests for BLAKE2sEric Biggers4-0/+383
Add a KUnit test suite for BLAKE2s. Most of the core test logic is in the previously-added hash-test-template.h. This commit just adds the actual KUnit suite, commits the generated test vectors to the tree so that gen-hash-testvecs.py won't have to be run at build time, and adds a few BLAKE2s-specific test cases. This is the replacement for blake2s-selftest, which an earlier commit removed. Improvements over blake2s-selftest include integration with KUnit, more comprehensive test cases, and support for benchmarking. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: blake2s: Consolidate into single C translation unitEric Biggers11-92/+47
As was done with the other algorithms, reorganize the BLAKE2s code so that the generic implementation and the arch-specific "glue" code is consolidated into a single translation unit, so that the compiler will inline the functions and automatically decide whether to include the generic code in the resulting binary or not. Similarly, also consolidate the build rules into lib/crypto/{Makefile,Kconfig}. This removes the last uses of lib/crypto/{arm,x86}/{Makefile,Kconfig}, so remove those too. Don't keep the !KMSAN dependency. It was needed only for other algorithms such as ChaCha that initialize memory from assembly code. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: blake2s: Move generic code into blake2s.cEric Biggers3-112/+94
Move blake2s_compress_generic() from blake2s-generic.c to blake2s.c. For now it's still guarded by CONFIG_CRYPTO_LIB_BLAKE2S_GENERIC, but this prepares for changing it to a 'static __maybe_unused' function and just using the compiler to automatically decide its inclusion. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: blake2s: Always enable arch-optimized BLAKE2s codeEric Biggers2-2/+2
When support for a crypto algorithm is enabled, the arch-optimized implementation of that algorithm should be enabled too. We've learned this the hard way many times over the years: people regularly forget to enable the arch-optimized implementations of the crypto algorithms, resulting in significant performance being left on the table. Currently, BLAKE2s support is always enabled ('obj-y'), since random.c uses it. Therefore, the arch-optimized BLAKE2s code, which exists for ARM and x86_64, should be always enabled too. Let's do that. Note that the effect on kernel image size is very small and should not be a concern. On ARM, enabling CRYPTO_BLAKE2S_ARM actually *shrinks* the kernel size by about 1200 bytes, since the ARM-optimized blake2s_compress() completely replaces the generic blake2s_compress(). On x86_64, enabling CRYPTO_BLAKE2S_X86 increases the kernel size by about 1400 bytes, as the generic blake2s_compress() is still included as a fallback; however, for context, that is only about a quarter the size of the generic blake2s_compress(). The x86_64 optimized BLAKE2s code uses much less icache at runtime than the generic code. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: blake2s: Remove obsolete self-testEric Biggers3-662/+0
Remove the original BLAKE2s self-test, since it will be superseded by blake2s_kunit. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: x86/blake2s: Reduce size of BLAKE2S_SIGMA2Eric Biggers1-14/+14
Save 480 bytes of .rodata by replacing the .long constants with .bytes, and using the vpmovzxbd instruction to expand them. Also update the code to do the loads before incrementing %rax rather than after. This avoids the need for the first load to use an offset. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: chacha: Consolidate into single moduleEric Biggers24-265/+122
Consolidate the ChaCha code into a single module (excluding chacha-block-generic.c which remains always built-in for random.c), similar to various other algorithms: - Each arch now provides a header file lib/crypto/$(SRCARCH)/chacha.h, replacing lib/crypto/$(SRCARCH)/chacha*.c. The header defines chacha_crypt_arch() and hchacha_block_arch(). It is included by lib/crypto/chacha.c, and thus the code gets built into the single libchacha module, with improved inlining in some cases. - Whether arch-optimized ChaCha is buildable is now controlled centrally by lib/crypto/Kconfig instead of by lib/crypto/$(SRCARCH)/Kconfig. The conditions for enabling it remain the same as before, and it remains enabled by default. - Any additional arch-specific translation units for the optimized ChaCha code, such as assembly files, are now compiled by lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile. This removes the last use for the Makefile and Kconfig files in the arm64, mips, powerpc, riscv, and s390 subdirectories of lib/crypto/. So also remove those files and the references to them. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-7-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: chacha: Rename libchacha.c to chacha.cEric Biggers2-0/+1
Rename libchacha.c to chacha.c to make the naming consistent with other algorithms and allow additional source files to be added to the libchacha module. This file currently contains chacha_crypt_generic(), but it will soon be updated to contain chacha_crypt(). Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: chacha: Rename chacha.c to chacha-block-generic.cEric Biggers2-2/+2
Rename chacha.c to chacha-block-generic.c to free up the name chacha.c for the high-level API entry points (chacha_crypt() and hchacha_block()), similar to the other algorithms. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: chacha: Remove unused function chacha_is_arch_optimized()Eric Biggers7-43/+0
chacha_is_arch_optimized() is no longer used, so remove it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250827151131.27733-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: riscv/poly1305: Import OpenSSL/CRYPTOGAMS implementationZhihang Shao4-1/+877
This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation for riscv authored by Andy Polyakov. The file 'poly1305-riscv.pl' is taken straight from https://github.com/dot-asm/cryptogams commit 5e3fba73576244708a752fa61a8e93e587f271bb. This patch was tested on SpacemiT X60, with 2~2.5x improvement over generic implementation. Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn> Signed-off-by: Zhihang Shao <zhihang.shao.iscas@gmail.com> [EB: ported to lib/crypto/riscv/] Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250829152513.92459-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: poly1305: Consolidate into single moduleEric Biggers26-415/+290
Consolidate the Poly1305 code into a single module, similar to various other algorithms (SHA-1, SHA-256, SHA-512, etc.): - Each arch now provides a header file lib/crypto/$(SRCARCH)/poly1305.h, replacing lib/crypto/$(SRCARCH)/poly1305*.c. The header defines poly1305_block_init(), poly1305_blocks(), poly1305_emit(), and optionally poly1305_mod_init_arch(). It is included by lib/crypto/poly1305.c, and thus the code gets built into the single libpoly1305 module, with improved inlining in some cases. - Whether arch-optimized Poly1305 is buildable is now controlled centrally by lib/crypto/Kconfig instead of by lib/crypto/$(SRCARCH)/Kconfig. The conditions for enabling it remain the same as before, and it remains enabled by default. (The PPC64 one remains unconditionally disabled due to 'depends on BROKEN'.) - Any additional arch-specific translation units for the optimized Poly1305 code, such as assembly files, are now compiled by lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile. A special consideration is needed because the Adiantum code uses the poly1305_core_*() functions directly. For now, just carry forward that approach. This means retaining the CRYPTO_LIB_POLY1305_GENERIC kconfig symbol, and keeping the poly1305_core_*() functions in separate translation units. So it's not quite as streamlined I've done with the other hash functions, but we still get a single libpoly1305 module. Note: to see the diff from the arm, arm64, and x86 .c files to the new .h files, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250829152513.92459-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-29lib/crypto: poly1305: Remove unused function poly1305_is_arch_optimized()Eric Biggers5-32/+0
poly1305_is_arch_optimized() is unused, so remove it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250829152513.92459-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-27lib/crypto: Drop inline from all *_mod_init_arch() functionsEric Biggers18-18/+18
Drop 'inline' from all the *_mod_init_arch() functions so that the compiler will warn about any bugs where they are unused due to not being wired up properly. (There are no such bugs currently, so this just establishes a more robust convention for the future. Of course, these functions also tend to get inlined anyway, regardless of the keyword.) Link: https://lore.kernel.org/r/20250816020457.432040-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-27lib/crypto: tests: Add KUnit tests for MD5 and HMAC-MD5Eric Biggers4-0/+236
Add a KUnit test suite for the MD5 library functions, including the corresponding HMAC support. The core test logic is in the previously-added hash-test-template.h. This commit just adds the actual KUnit suite, and it adds the generated test vectors to the tree so that gen-hash-testvecs.py won't have to be run at build time. Link: https://lore.kernel.org/r/20250805222855.10362-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26lib/crypto: sparc/md5: Migrate optimized code into libraryEric Biggers4-0/+120
Instead of exposing the sparc-optimized MD5 code via sparc-specific crypto_shash algorithms, instead just implement the md5_blocks() library function. This is much simpler, it makes the MD5 library functions be sparc-optimized, and it fixes the longstanding issue where the sparc-optimized MD5 code was disabled by default. MD5 still remains available through crypto_shash, but individual architectures no longer need to handle it. Note: to see the diff from arch/sparc/crypto/md5_glue.c to lib/crypto/sparc/md5.h, view this commit with 'git show -M10'. Link: https://lore.kernel.org/r/20250805222855.10362-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26lib/crypto: powerpc/md5: Migrate optimized code into libraryEric Biggers4-0/+249
Instead of exposing the powerpc-optimized MD5 code via powerpc-specific crypto_shash algorithms, instead just implement the md5_blocks() library function. This is much simpler, it makes the MD5 library functions be powerpc-optimized, and it fixes the longstanding issue where the powerpc-optimized MD5 code was disabled by default. MD5 still remains available through crypto_shash, but individual architectures no longer need to handle it. Link: https://lore.kernel.org/r/20250805222855.10362-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26lib/crypto: mips/md5: Migrate optimized code into libraryEric Biggers2-0/+66
Instead of exposing the mips-optimized MD5 code via mips-specific crypto_shash algorithms, instead just implement the md5_blocks() library function. This is much simpler, it makes the MD5 library functions be mips-optimized, and it fixes the longstanding issue where the mips-optimized MD5 code was disabled by default. MD5 still remains available through crypto_shash, but individual architectures no longer need to handle it. Note: to see the diff from arch/mips/cavium-octeon/crypto/octeon-md5.c to lib/crypto/mips/md5.h, view this commit with 'git show -M10'. Link: https://lore.kernel.org/r/20250805222855.10362-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26lib/crypto: md5: Add MD5 and HMAC-MD5 library functionsEric Biggers3-0/+342
Add library functions for MD5, including HMAC support. The MD5 implementation is derived from crypto/md5.c. This closely mirrors the corresponding SHA-1 and SHA-2 changes. Like SHA-1 and SHA-2, support for architecture-optimized MD5 implementations is included. I originally proposed dropping those, but unfortunately there is an AF_ALG user of the PowerPC MD5 code (https://lore.kernel.org/r/c4191597-341d-4fd7-bc3d-13daf7666c41@csgroup.eu/), and dropping that code would be viewed as a performance regression. We don't add new software algorithm implementations purely for AF_ALG, as escalating to kernel mode merely to do calculations that could be done in userspace is inefficient and is completely the wrong design. But since this one already existed, it gets grandfathered in for now. An objection was also raised to dropping the SPARC64 MD5 code because it utilizes the CPU's direct support for MD5, although it remains unclear that anyone is using that. Regardless, we'll keep these around for now. Note that while MD5 is a legacy algorithm that is vulnerable to practical collision attacks, it still has various in-kernel users that implement legacy protocols. Switching to a simple library API, which is the way the code should have been organized originally, will greatly simplify their code. For example: MD5: drivers/md/dm-crypt.c (for lmk IV generation) fs/nfsd/nfs4recover.c fs/ecryptfs/ fs/smb/client/ net/{ipv4,ipv6}/ (for TCP-MD5 signatures) HMAC-MD5: fs/smb/client/ fs/smb/server/ (Also net/sctp/ if it continues using HMAC-MD5 for cookie generation. However, that use case has the flexibility to upgrade to a more modern algorithm, which I'll be proposing instead.) As usual, the "md5" and "hmac(md5)" crypto_shash algorithms will also be reimplemented on top of these library functions. For "hmac(md5)" this will provide a faster, more streamlined implementation. Link: https://lore.kernel.org/r/20250805222855.10362-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26lib/crypto: sha512: Use underlying functions instead of crypto_simd_usable()Eric Biggers4-12/+6
Since sha512_kunit tests the fallback code paths without using crypto_simd_disabled_for_test, make the SHA-512 code just use the underlying may_use_simd() and irq_fpu_usable() functions directly instead of crypto_simd_usable(). This eliminates an unnecessary layer. Link: https://lore.kernel.org/r/20250731223651.136939-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-26lib/crypto: sha256: Use underlying functions instead of crypto_simd_usable()Eric Biggers4-16/+15
Since sha256_kunit tests the fallback code paths without using crypto_simd_disabled_for_test, make the SHA-256 code just use the underlying may_use_simd() and irq_fpu_usable() functions directly instead of crypto_simd_usable(). This eliminates an unnecessary layer. While doing this, also add likely() annotations, and fix a minor inconsistency where the static keys in the sha256.h files were in a different place than in the corresponding sha1.h and sha512.h files. Link: https://lore.kernel.org/r/20250731223510.136650-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-14lib/crypto: ensure generated *.S files are removed on make cleanTal Zussman1-4/+4
make clean does not check the kernel config when removing files. As such, additions to clean-files under CONFIG_ARM or CONFIG_ARM64 are not evaluated. For example, when building on arm64, this means that lib/crypto/arm64/sha{256,512}-core.S are left over after make clean. Set clean-files unconditionally to ensure that make clean removes these files. Fixes: e96cb9507f2d ("lib/crypto: sha256: Consolidate into single module") Fixes: 24c91b62ac50 ("lib/crypto: arm/sha512: Migrate optimized SHA-512 code to library") Fixes: 60e3f1e9b7a5 ("lib/crypto: arm64/sha512: Migrate optimized SHA-512 code to library") Signed-off-by: Tal Zussman <tz2294@columbia.edu> Link: https://lore.kernel.org/r/20250814-crypto_clean-v2-1-659a2dc86302@columbia.edu Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-14lib/crypto: sha: Update Kconfig help for SHA1 and SHA256Eric Biggers1-5/+5
Update the help text for CRYPTO_LIB_SHA1 and CRYPTO_LIB_SHA256 to reflect the addition of HMAC support, and to be consistent with CRYPTO_LIB_SHA512. Link: https://lore.kernel.org/r/20250731224218.137947-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-08-11kunit, lib/crypto: Move run_irq_test() to common headerEric Biggers1-119/+4
Rename run_irq_test() to kunit_run_irq_test() and move it to a public header so that it can be reused by crc_kunit. Link: https://lore.kernel.org/r/20250811182631.376302-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-29Merge tag 's390-6.17-1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Alexander Gordeev: - Standardize on the __ASSEMBLER__ macro that is provided by GCC and Clang compilers and replace __ASSEMBLY__ with __ASSEMBLER__ in both uapi and non-uapi headers - Explicitly include <linux/export.h> in architecture and driver files which contain an EXPORT_SYMBOL() and remove the include from the files which do not contain the EXPORT_SYMBOL() - Use the full title of "z/Architecture Principles of Operation" manual and the name of a section where facility bits are listed - Use -D__DISABLE_EXPORTS for files in arch/s390/boot to avoid unnecessary slowing down of the build and confusing external kABI tools that process symtypes data - Print additional unrecoverable machine check information to make the root cause analysis easier - Move cmpxchg_user_key() handling to uaccess library code, since the generated code is large anyway and there is no benefit if it is inlined - Fix a problem when cmpxchg_user_key() is executing a code with a non-default key: if a system is IPL-ed with "LOAD NORMAL", and the previous system used storage keys where the fetch-protection bit was set for some pages, and the cmpxchg_user_key() is located within such page, a protection exception happens - Either the external call or emergency signal order is used to send an IPI to a remote CPU. Use the external order only, since it is at least as good and sometimes even better, than the emergency signal - In case of an early crash the early program check handler prints more or less random value of the last breaking event address, since it is not initialized properly. Copy the last breaking event address from the lowcore to pt_regs to address this - During STP synchronization check udelay() can not be used, since the first CPU modifies tod_clock_base and get_tod_clock_monotonic() might return a non-monotonic time. Instead, busy-loop on other CPUs, while the the first CPU actually handles the synchronization operation - When debugging the early kernel boot using QEMU with the -S flag and GDB attached, skip the decompressor and start directly in kernel - Rename PAI Crypto event 4210 according to z16 and z17 "z/Architecture Principles of Operation" manual - Remove the in-kernel time steering support in favour of the new s390 PTP driver, which allows the kernel clock steered more precisely - Remove a possible false-positive warning in pte_free_defer(), which could be triggered in a valid case KVM guest process is initializing * tag 's390-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits) s390/mm: Remove possible false-positive warning in pte_free_defer() s390/stp: Default to enabled s390/stp: Remove leap second support s390/time: Remove in-kernel time steering s390/sclp: Use monotonic clock in sclp_sync_wait() s390/smp: Use monotonic clock in smp_emergency_stop() s390/time: Use monotonic clock in get_cycles() s390/pai_crypto: Rename PAI Crypto event 4210 scripts/gdb/symbols: make lx-symbols skip the s390 decompressor s390/boot: Introduce jump_to_kernel() function s390/stp: Remove udelay from stp_sync_clock() s390/early: Copy last breaking event address to pt_regs s390/smp: Remove conditional emergency signal order code usage s390/uaccess: Merge cmpxchg_user_key() inline assemblies s390/uaccess: Prevent kprobes on cmpxchg_user_key() functions s390/uaccess: Initialize code pages executed with non-default access key s390/skey: Provide infrastructure for executing with non-default access key s390/uaccess: Make cmpxchg_user_key() library code s390/page: Add memory clobber to page_set_storage_key() s390/page: Cleanup page_set_storage_key() inline assemblies ...
2025-07-28Merge tag 'libcrypto-tests-for-linus' of ↵Linus Torvalds17-0/+2619
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library test updates from Eric Biggers: "Add KUnit test suites for the Poly1305, SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512 library functions. These are the first KUnit tests for lib/crypto/. So in addition to being useful tests for these specific algorithms, they also establish some conventions for lib/crypto/ testing going forwards. The new tests are fairly comprehensive: more comprehensive than the generic crypto infrastructure's tests. They use a variety of techniques to check for the types of implementation bugs that tend to occur in the real world, rather than just naively checking some test vectors. (Interestingly, poly1305_kunit found a bug in QEMU) The core test logic is shared by all six algorithms, rather than being duplicated for each algorithm. Each algorithm's test suite also optionally includes a benchmark" * tag 'libcrypto-tests-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crypto: tests: Annotate worker to be on stack lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1 lib/crypto: tests: Add KUnit tests for Poly1305 lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512 lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256 lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py
2025-07-21lib/crypto: tests: Annotate worker to be on stackGuenter Roeck1-1/+1
The following warning traceback is seen if object debugging is enabled with the new crypto test code. ODEBUG: object 9000000106237c50 is on stack 9000000106234000, but NOT annotated. ------------[ cut here ]------------ WARNING: lib/debugobjects.c:655 at lookup_object_or_alloc.part.0+0x19c/0x1f4, CPU#0: kunit_try_catch/468 ... This also results in a boot stall when running the code in qemu:loongarch. Initializing the worker with INIT_WORK_ONSTACK() fixes the problem. Fixes: 950a81224e8b ("lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.py") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250721231917.3182029-1-linux@roeck-us.net Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-20lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutilsEric Biggers1-20/+11
Now that the oldest supported binutils version is 2.30, the macros that emit the SHA-512 instructions as '.inst' words are no longer needed. So drop them. No change in the generated machine code. Changed from the original patch by Ard Biesheuvel: (https://lore.kernel.org/r/20250515142702.2592942-2-ardb+git@google.com): - Reduced scope to just SHA-512 - Added comment that explains why "sha3" is used instead of "sha2" Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250718220706.475240-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-20lib/crypto: x86/sha1-ni: Convert to use rounds macrosEric Biggers1-158/+29
The assembly code that does all 80 rounds of SHA-1 is highly repetitive. Replace it with 20 expansions of a macro that does 4 rounds, using the macro arguments and .if directives to handle the slight variations between rounds. This reduces the length of sha1-ni-asm.S by 129 lines while still producing the exact same object file. This mirrors sha256-ni-asm.S which uses this same strategy. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250718191900.42877-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-20lib/crypto: x86/sha1-ni: Minor optimizations and cleanupEric Biggers1-42/+24
- Store the previous state in %xmm8-%xmm9 instead of spilling it to the stack. There are plenty of unused XMM registers here, so there is no reason to spill to the stack. (While 32-bit code is limited to %xmm0-%xmm7, this is 64-bit code, so it's free to use %xmm8-%xmm15.) - Remove the unnecessary check for nblocks == 0. sha1_ni_transform() is always passed a positive nblocks. - To get an XMM register with 'e' in the high dword and the rest zeroes, just zeroize the register using pxor, then load 'e'. Previously the code loaded 'e', then zeroized the lower dwords by AND-ing with a constant, which was slightly less efficient. - Instead of computing &DATA_PTR[NBLOCKS << 6] and stopping when DATA_PTR reaches that value, instead just decrement NBLOCKS on each iteration and stop when it reaches 0. This is fewer instructions. - Rename DIGEST_PTR to STATE_PTR. It points to the SHA-1 internal state, not a SHA-1 digest value. This commit shrinks the code size of sha1_ni_transform() from 624 bytes to 589 bytes and also shrinks rodata by 16 bytes. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250718191900.42877-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: tests: Add KUnit tests for SHA-1 and HMAC-SHA1Eric Biggers4-0/+262
Add a KUnit test suite for the SHA-1 library functions, including the corresponding HMAC support. The core test logic is in the previously-added hash-test-template.h. This commit just adds the actual KUnit suite, and it adds the generated test vectors to the tree so that gen-hash-testvecs.py won't have to be run at build time. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-16-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: tests: Add KUnit tests for Poly1305Eric Biggers4-0/+361
Add a KUnit test suite for the Poly1305 functions. Most of its test cases are instantiated from hash-test-template.h, which is also used by the SHA-2 tests. A couple additional test cases are also included to test edge cases specific to Poly1305. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250709200112.258500-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: tests: Add KUnit tests for SHA-384 and SHA-512Eric Biggers6-0/+723
Add KUnit test suites for the SHA-384 and SHA-512 library functions, including the corresponding HMAC support. The core test logic is in the previously-added hash-test-template.h. This commit just adds the actual KUnit suites, and it adds the generated test vectors to the tree so that gen-hash-testvecs.py won't have to be run at build time. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250709200112.258500-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: tests: Add KUnit tests for SHA-224 and SHA-256Eric Biggers8-0/+590
Add KUnit test suites for the SHA-224 and SHA-256 library functions, including the corresponding HMAC support. The core test logic is in the previously-added hash-test-template.h. This commit just adds the actual KUnit suites, and it adds the generated test vectors to the tree so that gen-hash-testvecs.py won't have to be run at build time. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250709200112.258500-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: tests: Add hash-test-template.h and gen-hash-testvecs.pyEric Biggers1-0/+683
Add hash-test-template.h which generates the following KUnit test cases for hash functions: test_hash_test_vectors test_hash_all_lens_up_to_4096 test_hash_incremental_updates test_hash_buffer_overruns test_hash_overlaps test_hash_alignment_consistency test_hash_ctx_zeroization test_hash_interrupt_context_1 test_hash_interrupt_context_2 test_hmac (when HMAC is supported) benchmark_hash (when CONFIG_CRYPTO_LIB_BENCHMARK=y) The initial use cases for this will be sha224_kunit, sha256_kunit, sha384_kunit, sha512_kunit, and poly1305_kunit. Add a Python script gen-hash-testvecs.py which generates the test vectors required by test_hash_test_vectors, test_hash_all_lens_up_to_4096, and test_hmac. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250709200112.258500-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: x86/sha1: Migrate optimized code into libraryEric Biggers6-0/+1625
Instead of exposing the x86-optimized SHA-1 code via x86-specific crypto_shash algorithms, instead just implement the sha1_blocks() library function. This is much simpler, it makes the SHA-1 library functions be x86-optimized, and it fixes the longstanding issue where the x86-optimized SHA-1 code was disabled by default. SHA-1 still remains available through crypto_shash, but individual architectures no longer need to handle it. To match sha1_blocks(), change the type of the nblocks parameter of the assembly functions from int to size_t. The assembly functions actually already treated it as size_t. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-14-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: sparc/sha1: Migrate optimized code into libraryEric Biggers4-0/+117
Instead of exposing the sparc-optimized SHA-1 code via sparc-specific crypto_shash algorithms, instead just implement the sha1_blocks() library function. This is much simpler, it makes the SHA-1 library functions be sparc-optimized, and it fixes the longstanding issue where the sparc-optimized SHA-1 code was disabled by default. SHA-1 still remains available through crypto_shash, but individual architectures no longer need to handle it. Note: to see the diff from arch/sparc/crypto/sha1_glue.c to lib/crypto/sparc/sha1.h, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: s390/sha1: Migrate optimized code into libraryEric Biggers2-0/+29
Instead of exposing the s390-optimized SHA-1 code via s390-specific crypto_shash algorithms, instead just implement the sha1_blocks() library function. This is much simpler, it makes the SHA-1 library functions be s390-optimized, and it fixes the longstanding issue where the s390-optimized SHA-1 code was disabled by default. SHA-1 still remains available through crypto_shash, but individual architectures no longer need to handle it. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-12-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: powerpc/sha1: Migrate optimized code into libraryEric Biggers5-0/+554
Instead of exposing the powerpc-optimized SHA-1 code via powerpc-specific crypto_shash algorithms, instead just implement the sha1_blocks() library function. This is much simpler, it makes the SHA-1 library functions be powerpc-optimized, and it fixes the longstanding issue where the powerpc-optimized SHA-1 code was disabled by default. SHA-1 still remains available through crypto_shash, but individual architectures no longer need to handle it. Note: to see the diff from arch/powerpc/crypto/sha1-spe-glue.c to lib/crypto/powerpc/sha1.h, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-11-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: mips/sha1: Migrate optimized code into libraryEric Biggers2-0/+82
Instead of exposing the mips-optimized SHA-1 code via mips-specific crypto_shash algorithms, instead just implement the sha1_blocks() library function. This is much simpler, it makes the SHA-1 library functions be mips-optimized, and it fixes the longstanding issue where the mips-optimized SHA-1 code was disabled by default. SHA-1 still remains available through crypto_shash, but individual architectures no longer need to handle it. Note: to see the diff from arch/mips/cavium-octeon/crypto/octeon-sha1.c to lib/crypto/mips/sha1.h, view this commit with 'git show -M10'. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-10-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: arm64/sha1: Migrate optimized code into libraryEric Biggers4-0/+171
Instead of exposing the arm64-optimized SHA-1 code via arm64-specific crypto_shash algorithms, instead just implement the sha1_blocks() library function. This is much simpler, it makes the SHA-1 library functions be arm64-optimized, and it fixes the longstanding issue where the arm64-optimized SHA-1 code was disabled by default. SHA-1 still remains available through crypto_shash, but individual architectures no longer need to handle it. Remove support for SHA-1 finalization from assembly code, since the library does not yet support architecture-specific overrides of the finalization. (Support for that has been omitted for now, for simplicity and because usually it isn't performance-critical.) To match sha1_blocks(), change the type of the nblocks parameter and the return value of __sha1_ce_transform() from int to size_t. Update the assembly code accordingly. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-9-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: arm/sha1: Migrate optimized code into libraryEric Biggers6-0/+1315
Instead of exposing the arm-optimized SHA-1 code via arm-specific crypto_shash algorithms, instead just implement the sha1_blocks() library function. This is much simpler, it makes the SHA-1 library functions be arm-optimized, and it fixes the longstanding issue where the arm-optimized SHA-1 code was disabled by default. SHA-1 still remains available through crypto_shash, but individual architectures no longer need to handle it. To match sha1_blocks(), change the type of the nblocks parameter of the assembly functions from int to size_t. The assembly functions actually already treated it as size_t. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: sha1: Add HMAC supportEric Biggers1-3/+105
Add HMAC support to the SHA-1 library, again following what was done for SHA-2. Besides providing the basis for a more streamlined "hmac(sha1)" shash, this will also be useful for multiple in-kernel users such as net/sctp/auth.c, net/ipv6/seg6_hmac.c, and security/keys/trusted-keys/trusted_tpm1.c. Those are currently using crypto_shash, but using the library functions would be much simpler. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: sha1: Add SHA-1 library functionsEric Biggers3-7/+124
Add a library interface for SHA-1, following the SHA-2 one. As was the case with SHA-2, this will be useful for various in-kernel users. The crypto_shash interface will be reimplemented on top of it as well. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: sha1: Rename sha1_init() to sha1_init_raw()Eric Biggers1-3/+3
Rename the existing sha1_init() to sha1_init_raw(), since it conflicts with the upcoming library function. This will later be removed, but this keeps the kernel building for the introduction of the library. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250712232329.818226-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: sha2: Add hmac_sha*_init_usingrawkey()Eric Biggers2-34/+74
While the HMAC library functions support both incremental and one-shot computation and both prepared and raw keys, the combination of raw key + incremental was missing. It turns out that several potential users of the HMAC library functions (tpm2-sessions.c, smb2transport.c, trusted_tpm1.c) want exactly that. Therefore, add the missing functions hmac_sha*_init_usingrawkey(). Implement them in an optimized way that directly initializes the HMAC context without a separate key preparation step. Reimplement the one-shot raw key functions hmac_sha*_usingrawkey() on top of the new functions, which makes them a bit more efficient. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250711215844.41715-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-14lib/crypto: arm/poly1305: Remove unneeded empty weak functionEric Biggers2-6/+1
Fix poly1305-armv4.pl to not do '.globl poly1305_blocks_neon' when poly1305_blocks_neon() is not defined. Then, remove the empty __weak definition of poly1305_blocks_neon(), which was still needed only because of that unnecessary globl statement. (It also used to be needed because the compiler could generate calls to it when CONFIG_KERNEL_MODE_NEON=n, but that has been fixed.) Thanks to Arnd Bergmann for reporting that the globl statement in the asm file was still depending on the weak symbol. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250711212822.6372-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-11lib/crypto: x86/poly1305: Fix performance regression on short messagesEric Biggers1-0/+8
Restore the len >= 288 condition on using the AVX implementation, which was incidentally removed by commit 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only interface"). This check took into account the overhead in key power computation, kernel-mode "FPU", and tail handling associated with the AVX code. Indeed, restoring this check slightly improves performance for len < 256 as measured using poly1305_kunit on an "AMD Ryzen AI 9 365" (Zen 5) CPU: Length Before After ====== ========== ========== 1 30 MB/s 36 MB/s 16 516 MB/s 598 MB/s 64 1700 MB/s 1882 MB/s 127 2265 MB/s 2651 MB/s 128 2457 MB/s 2827 MB/s 200 2702 MB/s 3238 MB/s 256 3841 MB/s 3768 MB/s 511 4580 MB/s 4585 MB/s 512 5430 MB/s 5398 MB/s 1024 7268 MB/s 7305 MB/s 3173 8999 MB/s 8948 MB/s 4096 9942 MB/s 9921 MB/s 16384 10557 MB/s 10545 MB/s While the optimal threshold for this CPU might be slightly lower than 288 (see the len == 256 case), other CPUs would need to be tested too, and these sorts of benchmarks can underestimate the true cost of kernel-mode "FPU". Therefore, for now just restore the 288 threshold. Fixes: 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only interface") Cc: stable@vger.kernel.org Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250706231100.176113-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-11lib/crypto: x86/poly1305: Fix register corruption in no-SIMD contextsEric Biggers1-1/+39
Restore the SIMD usability check and base conversion that were removed by commit 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only interface"). This safety check is cheap and is well worth eliminating a footgun. While the Poly1305 functions should not be called when SIMD registers are unusable, if they are anyway, they should just do the right thing instead of corrupting random tasks' registers and/or computing incorrect MACs. Fixing this is also needed for poly1305_kunit to pass. Just use irq_fpu_usable() instead of the original crypto_simd_usable(), since poly1305_kunit won't rely on crypto_simd_disabled_for_test. Fixes: 318c53ae02f2 ("crypto: x86/poly1305 - Add block-only interface") Cc: stable@vger.kernel.org Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250706231100.176113-5-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-11lib/crypto: arm64/poly1305: Fix register corruption in no-SIMD contextsEric Biggers1-1/+2
Restore the SIMD usability check that was removed by commit a59e5468a921 ("crypto: arm64/poly1305 - Add block-only interface"). This safety check is cheap and is well worth eliminating a footgun. While the Poly1305 functions should not be called when SIMD registers are unusable, if they are anyway, they should just do the right thing instead of corrupting random tasks' registers and/or computing incorrect MACs. Fixing this is also needed for poly1305_kunit to pass. Just use may_use_simd() instead of the original crypto_simd_usable(), since poly1305_kunit won't rely on crypto_simd_disabled_for_test. Fixes: a59e5468a921 ("crypto: arm64/poly1305 - Add block-only interface") Cc: stable@vger.kernel.org Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250706231100.176113-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-11lib/crypto: arm/poly1305: Fix register corruption in no-SIMD contextsEric Biggers1-1/+2
Restore the SIMD usability check that was removed by commit 773426f4771b ("crypto: arm/poly1305 - Add block-only interface"). This safety check is cheap and is well worth eliminating a footgun. While the Poly1305 functions should not be called when SIMD registers are unusable, if they are anyway, they should just do the right thing instead of corrupting random tasks' registers and/or computing incorrect MACs. Fixing this is also needed for poly1305_kunit to pass. Just use may_use_simd() instead of the original crypto_simd_usable(), since poly1305_kunit won't rely on crypto_simd_disabled_for_test. Fixes: 773426f4771b ("crypto: arm/poly1305 - Add block-only interface") Cc: stable@vger.kernel.org Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250706231100.176113-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-08lib/crypto: hash_info: Move hash_info.c into lib/crypto/Eric Biggers3-0/+68
crypto/hash_info.c just contains a couple of arrays that map HASH_ALGO_* algorithm IDs to properties of those algorithms. It is compiled only when CRYPTO_HASH_INFO=y, but currently CRYPTO_HASH_INFO depends on CRYPTO. Since this can be useful without the old-school crypto API, move it into lib/crypto/ so that it no longer depends on CRYPTO. This eliminates the need for FS_VERITY to select CRYPTO after it's been converted to use lib/crypto/. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250630172224.46909-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04lib/crypto: x86/sha256: Remove unnecessary checks for nblocks==0Eric Biggers4-10/+0
Since sha256_blocks() is called only with nblocks >= 1, remove unnecessary checks for nblocks == 0 from the x86 SHA-256 assembly code. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250704023958.73274-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-07-04lib/crypto: x86/sha256: Move static_call above kernel-mode FPU sectionEric Biggers5-34/+25
As I did for sha512_blocks(), reorganize x86's sha256_blocks() to be just a static_call. To achieve that, for each assembly function add a C function that handles the kernel-mode FPU section and fallback. While this increases total code size slightly, the amount of code actually executed on a given system does not increase, and it is slightly more efficient since it eliminates the extra static_key. It also makes the assembly functions be called with standard direct calls instead of static calls, eliminating the need for ANNOTATE_NOENDBR. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250704023958.73274-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>