drivers: spi: stm32: add RTIO-DMA support #101331

Open
juickar wants to merge 2 commits into zephyrproject-rtos:main from juickar:spi_rtio_dma

@juickar
Contributor

juickar commented Dec 19, 2025

This PR adds support for RTIO SPI transactions using DMA. It also adds a test case for the loopback test.

Tested on our test bench using boards with both H7-compatible and non-H7-compatible SPI.

@juickar juickar marked this pull request as ready for review December 19, 2025 10:46
@zephyrbot zephyrbot added area: Tests Issues related to a particular existing or missing test area: SPI SPI bus platform: STM32 ST Micro STM32 labels Dec 19, 2025
Contributor

@teburd teburd left a comment

Suppose we toggle power here. Is the SPI FIFO pair flushed on completion, or is only the DMA done?

@teburd teburd mentioned this pull request Dec 23, 2025
@juickar
Contributor Author

juickar commented Jan 5, 2026

Only DMA is done

@teburd
Contributor

teburd commented Jan 5, 2026

This should really only mark completion when the bus peripheral says it's done, if possible. Otherwise there could be subtle races where power/clock gating happens before the data has actually been shifted out.

@wouterh
Contributor

wouterh commented Jan 6, 2026

I have tested this for my use case:

  • custom board (L4 based) with ST iis3dwb accelerometer
  • device runtime power management enabled

It works sometimes, and when it does, I see clean DMA transfers in my logic analyzer and the code is able to keep up with the accelerometer (3-axis data @ 26 kHz).

However, it crashes most of the time. I think the crashes have something to do with power management.

A couple of the crashes I've seen:

[00:00:07.856,000] <err> os: ***** BUS FAULT *****
[00:00:07.856,000] <err> os:   Precise data bus error
[00:00:07.856,000] <err> os:   BFAR Address: 0xf7ff9805
[00:00:07.856,000] <err> os: r0/a1:  0xf7ff9805  r1/a2:  0x08010f11  r2/a3:  0x2001f778
[00:00:07.856,000] <err> os: r3/a4:  0x2001f778 r12/ip:  0x00000000 r14/lr:  0x080086db
[00:00:07.856,000] <err> os:  xpsr:  0x01000200
[00:00:07.856,000] <err> os: s[ 0]:  0x080032a0  s[ 1]:  0x080032a0  s[ 2]:  0x000f423f  s[ 3]:  0x00000000
[00:00:07.856,000] <err> os: s[ 4]:  0x00000000  s[ 5]:  0x00000000  s[ 6]:  0x080132a0  s[ 7]:  0x00000003
[00:00:07.856,000] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x200009c0  s[10]:  0x200009e8  s[11]:  0x00000000
[00:00:07.856,000] <err> os: s[12]:  0x20000080  s[13]:  0x0800fb9d  s[14]:  0x080132a0  s[15]:  0x00000002
[00:00:07.856,000] <err> os: fpscr:  0x00000000
[00:00:07.856,000] <err> os: Faulting instruction address (r15/pc): 0x08011faa
[00:00:07.856,000] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0

Faulting instruction points to include/zephyr/device.h:733

[00:00:58.732,000] <err> os: ***** USAGE FAULT *****
[00:00:58.732,000] <err> os:   Illegal use of the EPSR
[00:00:58.732,000] <err> os: r0/a1:  0x0003bbb6  r1/a2:  0x00000000  r2/a3:  0x40007c00
[00:00:58.732,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x00000000 r14/lr:  0x08009bf1
[00:00:58.732,000] <err> os:  xpsr:  0x00000000
[00:00:58.732,000] <err> os: s[ 0]:  0x00000000  s[ 1]:  0x00000000  s[ 2]:  0x2001f770  s[ 3]:  0x080124f1
[00:00:58.732,000] <err> os: s[ 4]:  0x01000003  s[ 5]:  0x0801f2bb  s[ 6]:  0x0801f320  s[ 7]:  0x08000200
[00:00:58.732,000] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x080132a0  s[11]:  0x00000000
[00:00:58.732,000] <err> os: s[12]:  0x00000000  s[13]:  0x200009c0  s[14]:  0x200009e8  s[15]:  0x00000000
[00:00:58.732,000] <err> os: fpscr:  0x20000080
[00:00:58.732,000] <err> os: Faulting instruction address (r15/pc): 0x080132a0
[00:00:58.732,000] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0

Faulting instruction points to drivers/power_domain/power_domain_gpio.c:139

[00:03:10.752,000] <err> os: ***** BUS FAULT *****
[00:03:10.752,000] <err> os:   Instruction bus error
[00:03:10.752,000] <err> os: r0/a1:  0x2001f760  r1/a2:  0x0000000e  r2/a3:  0x00000000
[00:03:10.752,000] <err> os: r3/a4:  0x2001f71c r12/ip:  0x08007cdd r14/lr:  0x0a404fed
[00:03:10.752,000] <err> os:  xpsr:  0x61000000
[00:03:10.752,000] <err> os: s[ 0]:  0x0000000a  s[ 1]:  0x2001f714  s[ 2]:  0x00000004  s[ 3]:  0x00000000
[00:03:10.752,000] <err> os: s[ 4]:  0x00007ae2  s[ 5]:  0x00000000  s[ 6]:  0x08013830  s[ 7]:  0x0000000e
[00:03:10.752,000] <err> os: s[ 8]:  0x080131ec  s[ 9]:  0x2001f7b8  s[10]:  0x0000000a  s[11]:  0x00000000
[00:03:10.752,000] <err> os: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0xe000ed00
[00:03:10.752,000] <err> os: fpscr:  0x20002838
[00:03:10.752,000] <err> os: Faulting instruction address (r15/pc): 0x0a404420

arm-none-eabi-addr2line can't figure out that one.

Could this be related to the subtle races that @teburd mentioned?

I also have a quick and dirty RTIO+DMA implementation (see https://github.com/versasense/zephyr/tree/rtio-dma) that I only tested on my board. I haven't seen any crashes with that. In that implementation, I do wait for TXE before marking completion.

@petejohanson-adi
Contributor

Suppose we toggle power here. Is the SPI FIFO pair flushed on completion, or is only the DMA done?

Only DMA is done

This should really only mark completion when the bus peripheral says it's done, if possible. Otherwise there could be subtle races where power/clock gating happens before the data has actually been shifted out.

Yeah, that was my expectation as well. I happened to be working on this in parallel before this PR was up, and my implementation uses a timer, if needed, to check the TX FIFO level before reporting any TX operation as completed, to ensure we are actually "done" before continuing with the next operation in the sequence. The timer is needed, AFAIK, because some STM32 SoCs don't have an interrupt for TX FIFO empty, and TXE, where present, only indicates that there is some space in the TX FIFO for more data.

You can see my version (which still needs cleaning up and is more PoC-level at this point) at c4bc4ad.
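The timer-based approach can be sketched roughly as follows. This is a host-side simulation under stated assumptions, not code from c4bc4ad: the "peripheral" is a counter that drains a fixed number of frames per timer expiry, each loop iteration stands in for one expiry of a re-armed one-shot timer, and all struct and function names are hypothetical. The re-arm budget plays the role of a timeout, so a stuck FIFO cannot keep the transaction pending forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated peripheral: frames left in the TX FIFO, drained per expiry. */
struct sim_spi {
	uint32_t fifo_level;
	uint32_t drain_per_tick;
};

/* Hypothetical poll context; max_rearms acts as a timeout budget. */
struct poll_ctx {
	struct sim_spi *spi;
	uint32_t rearms;
	uint32_t max_rearms;
	bool completed;
	bool timed_out;
};

/* One timer expiry: the hardware shifts some frames out, then we check. */
static void poll_timer_expired(struct poll_ctx *ctx)
{
	struct sim_spi *spi = ctx->spi;

	if (spi->fifo_level > spi->drain_per_tick) {
		spi->fifo_level -= spi->drain_per_tick;
	} else {
		spi->fifo_level = 0U;
	}

	if (spi->fifo_level == 0U) {
		ctx->completed = true; /* now safe to report RTIO completion */
	} else if (ctx->rearms >= ctx->max_rearms) {
		ctx->timed_out = true; /* would complete with an error instead */
	} else {
		ctx->rearms++; /* stand-in for re-arming the one-shot timer */
	}
}

/* Called from the (simulated) DMA completion callback. */
static void on_dma_done(struct poll_ctx *ctx)
{
	assert(ctx != NULL && ctx->spi != NULL);

	/* Fast path: FIFO already drained, complete without any timer. */
	if (ctx->spi->fifo_level == 0U) {
		ctx->completed = true;
		return;
	}

	/* Slow path: in a real driver each iteration would be a separate
	 * timer callback running outside the DMA ISR, not a loop. */
	while (!ctx->completed && !ctx->timed_out) {
		poll_timer_expired(ctx);
	}
}
```

The fast path avoids any timer latency when the FIFO has already drained by the time the DMA callback runs; the timer period is the main tuning knob for how quickly completion is noticed.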

@petejohanson-adi
Contributor

With DMA enabled, my sensor streaming test just fails to initialize:

[00:00:00.005,000] <err> ADXL362: wrong part_id: 0
uart:~$ 

If I disable DMA and use streaming with interrupts only, it faults instead:

uart:~$ sensor stream adxl362@0 on fifo_wm incl 
Enabling stream...
Trigger (1 / data_ready) detected
Trigger (10 / fifo_wm) detected
[00:15:24.986,000] <err> os: ***** USAGE FAULT *****
[00:15:24.986,000] <err> os:   Illegal use of the EPSR
[00:15:24.986,000] <err> os: r0/a1:  0x00000000  r1/a2:  0x24004060  r2/a3:  0x24004064
[00:15:24.986,000] <err> os: r3/a4:  0x00000000 r12/ip:  0x0000d6d8 r14/lr:  0x080011a9
[00:15:24.986,000] <err> os:  xpsr:  0x60000000
[00:15:24.986,000] <err> os: Faulting instruction address (r15/pc): 0x00000000
[00:15:24.986,000] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0
[00:15:24.986,000] <err> os: Current thread: 0x24001868 (sensor_shell_processing_tid)
[00:15:25.007,000] <err> os: Halting system

Both setups work fine with the implementation I linked a few minutes ago, using the same configs.

@petejohanson-adi
Contributor

I also have a quick and dirty RTIO+DMA implementation (see https://github.com/versasense/zephyr/tree/rtio-dma) that I only tested on my board. I haven't seen any crashes with that. In that implementation, I do wait for TXE before marking completion.

Note that TXE doesn't really mean empty, despite its name. From https://www.st.com/resource/en/reference_manual/rm0434-multiprotocol-wireless-32bit-mcu-armbased-cortexm4-with-fpu-bluetooth-lowenergy-and-802154-radio-solution-stmicroelectronics.pdf section 35.4.10, for instance:

Tx buffer empty flag (TXE)
The TXE flag is set when transmission TXFIFO has enough space to store data to send.
TXE flag is linked to the TXFIFO level. The flag goes high and stays high until the TXFIFO
level is lower or equal to 1/2 of the FIFO depth. An interrupt can be generated if the TXEIE
bit in the SPIx_CR2 register is set. The bit is cleared automatically when the TXFIFO level
becomes greater than 1/2.
Rx buffer not empty (RXNE)
The RXNE flag is set depending on the FRXTH bit value

This is why my version actually checks whether the TX FIFO is empty, using an additional timer if needed: TXE alone isn't enough to be sure a given RTIO TX operation has been fully sent.

AFAICT, that's the only way to be 100% sure, after a DMA TX completes, that the SPI peripheral has actually sent all the data.
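To make the distinction concrete, here is a small model that takes the TXE rule from the RM0434 excerpt above (TXE stays set while the TX FIFO level is at most half the FIFO depth). The function names, and the idea of the level as a plain frame count, are simplifying assumptions: the real FTLVL field (read via `LL_SPI_GetTxFIFOLevel()` in the driver snippet reviewed below) encodes coarse levels rather than a count, and on these parts the frame still in the shift register is typically covered by the busy flag, not the FIFO level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TXE as described in the RM0434 excerpt: set while the TX FIFO level
 * is lower than or equal to half of the FIFO depth. */
static bool txe_flag(uint32_t fifo_level, uint32_t fifo_depth)
{
	assert(fifo_depth > 0U);
	return fifo_level <= fifo_depth / 2U;
}

/* What reporting completion actually needs: nothing left in the FIFO. */
static bool tx_fifo_empty(uint32_t fifo_level)
{
	return fifo_level == 0U;
}
```

With a 4-frame FIFO, two queued frames already raise TXE even though the FIFO is clearly not empty.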

@juickar
Contributor Author

juickar commented Jan 7, 2026

I'm currently working on adding a check to make sure the TX FIFO is empty. Did these errors happen with my changes?

@petejohanson-adi
Contributor

Yes, those errors occurred on your PR branch.

@juickar juickar force-pushed the spi_rtio_dma branch 3 times, most recently from 5121bf6 to 6f47d7c Compare January 9, 2026 15:37
@juickar
Contributor Author

juickar commented Jan 9, 2026

Changes made:

Added the check for the TX FIFO to be empty
Added the missing part needed from transceive_dma to spi_stm32_iodev_msg_start

@@ -187,16 +187,21 @@ static uint8_t bits2bytes(spi_operation_t operation)
*/
static __aligned(32) uint32_t dummy_rx_tx_buffer __nocache;

#if defined(CONFIG_SPI_STM32_DMA) && defined(CONFIG_SPI_RTIO)
Member

Update the description of the SPI_STM32_DMA symbol to explain that it is compatible with SPI_RTIO.

Contributor Author

done

Comment on lines 265 to 274
int err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel);

if (err != 0) {
LOG_DBG("Rx dma_stop failed with error %d", err);
}
err = dma_stop(spi_dma_data->dma_tx.dma_dev, spi_dma_data->dma_tx.channel);
if (err != 0) {
LOG_DBG("Tx dma_stop failed with error %d", err);
}
spi_stm32_iodev_complete(spi_dev, status);
Member

Suggested change
int err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel);
if (err != 0) {
LOG_DBG("Rx dma_stop failed with error %d", err);
}
err = dma_stop(spi_dma_data->dma_tx.dma_dev, spi_dma_data->dma_tx.channel);
if (err != 0) {
LOG_DBG("Tx dma_stop failed with error %d", err);
}
spi_stm32_iodev_complete(spi_dev, status);
int err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel);
if (err != 0) {
LOG_DBG("Rx dma_stop failed with error %d", err);
}
err = dma_stop(spi_dma_data->dma_tx.dma_dev, spi_dma_data->dma_tx.channel);
if (err != 0) {
LOG_DBG("Tx dma_stop failed with error %d", err);
}
spi_stm32_iodev_complete(spi_dev, status);
Contributor Author

done

spi_stm32_iodev_complete(spi_dev, -EIO);
return;
}
const struct spi_stm32_config *cfg = spi_dev->config;
Member

Add a blank line before this.

Contributor Author

done

Comment on lines 235 to 255
while (LL_SPI_GetTxFIFOLevel(cfg->spi) > 0) {
}
#endif /* SPI_SR_FTLVL */

#ifdef CONFIG_SPI_STM32_ERRATA_BUSY
WAIT_FOR(!ll_func_spi_dma_busy(cfg->spi),
CONFIG_SPI_STM32_BUSY_FLAG_TIMEOUT,
k_yield());
#else
/* wait until spi is no more busy (spi TX fifo is really empty) */
while (ll_func_spi_dma_busy(cfg->spi) && LL_SPI_IsEnabled(cfg->spi)) {
#if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi)
uint32_t width = SPI_WORD_SIZE_GET(spi_dma_data->ctx.config->operation);
/* The TXC flag is not raised at the end of 9, 17 or 25
* bit transfer, so disable the SPI in these cases to avoid being stuck.
*/
if ((width == 9U) || (width == 17U) || (width == 25U)) {
k_busy_wait(1000);
ll_func_disable_spi(cfg->spi);
}
#endif /* DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) */
Contributor

So... I really have to question the design here. The DMA callback occurs in interrupt context, and duplicating these busy loops in here seems like a really bad idea. You can see in the implementation I have in #101971 that I instead use a secondary timer that fires and checks the FIFO levels, DMA completion, etc., so that we don't block, but also don't complete the transaction until we're really ready to.

Contributor

@teburd teburd Jan 9, 2026

It's unfortunate if the hardware does not signal FIFO empty through an interrupt, but in that case, yes... to make it asynchronous and avoid blocking in ISR context, your options are basically going to be:

  • Schedule a timer to check this in the next tick or whatever you think is appropriate to poll for the completion
  • Offload to a work queue/thread to poll for completion
Contributor

There is the TXE interrupt that I use in my implementation. I still have the busy loop in there because I didn't have time to figure out what the BSY errata is about. It might not be needed anymore.

@petejohanson-adi regarding your timer-based implementation. I see a timer timeout of 100µs in there. Isn't that a long time for a bus running at several MHz?
I am a bit tight on time this week, but I'll see if I can squeeze in a test of your code on my use case...

Contributor Author

@juickar juickar Jan 12, 2026

And maybe for the H7-compatible boards we could use the EOT interrupt; see section 50.11.6:

Bit 3 EOT: end of transfer
EOT is set by hardware as soon as a full transfer is complete, that is when TSIZE number of
data have been transmitted and/or received on the SPI. EOT is cleared by software write 1 to
EOTC bit at SPI_IFCR.
EOT flag triggers an interrupt if EOTIE bit is set.
If DXP flag is used until TXTF flag is set and DXPIE is cleared, EOT can be used to
download the last packets contained into RxFIFO in one-shot.
0: transfer is on-going or not started
1: transfer complete
In master, EOT event terminates the data transaction and handles SS output optionally. When
CRC is applied, the EOT event is extended over the CRC frame transaction.

since the TXE interrupt does not exist for these boards.
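A host-side sketch of an EOT-driven completion check for H7-style SPI, assuming the bit layout from the excerpt above (EOT at bit 3 of the status register, cleared by writing 1 to EOTC in SPI_IFCR, which this sketch assumes is also bit 3). The register struct and macro names are local stand-ins rather than the CMSIS or LL definitions; a real driver would use the vendor headers and the corresponding LL helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the H7 SPI status and flag-clear registers. */
struct fake_spi_regs {
	uint32_t sr;   /* models SPI_SR */
	uint32_t ifcr; /* last value written to SPI_IFCR */
};

#define FAKE_SPI_SR_EOT    ((uint32_t)1U << 3) /* bit 3 EOT, per the RM excerpt */
#define FAKE_SPI_IFCR_EOTC ((uint32_t)1U << 3) /* assumed: EOTC is bit 3 as well */

/* Model the write-1-to-clear behaviour of SPI_IFCR for the EOT flag. */
static void fake_write_ifcr(struct fake_spi_regs *r, uint32_t val)
{
	r->ifcr = val;
	if ((val & FAKE_SPI_IFCR_EOTC) != 0U) {
		r->sr &= ~FAKE_SPI_SR_EOT;
	}
}

/* ISR body: true once TSIZE frames have been transmitted and/or received. */
static bool spi_isr_check_eot(struct fake_spi_regs *r)
{
	assert(r != NULL);

	if ((r->sr & FAKE_SPI_SR_EOT) == 0U) {
		return false; /* transfer on-going or not started */
	}
	fake_write_ifcr(r, FAKE_SPI_IFCR_EOTC);
	return true; /* point at which the RTIO submission could be completed */
}
```

Because EOT is raised by the peripheral itself once the whole transfer is done, no timer or polling is needed on parts that have it.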

Contributor

There is the TXE interrupt that I use in my implementation. I still have the busy loop in there because I didn't have time to figure out what the BSY errata is about. It might not be needed anymore.

See my comment here: #101331 (comment)

In the couple of reference manuals I checked for different STM32 SoCs, TXE is a misnomer: it doesn't actually mean completely empty, just "TX FIFO has room for more", so IMHO it isn't enough to mark the transaction as completed.

On platforms where EOT is available, that does seem to match expectations, and could be used directly without any timer/polling approach.

@petejohanson-adi regarding your timer-based implementation. I see a timer timeout of 100µs in there. Isn't that a long time for a bus running at several MHz? I am a bit tight on time this week, but I'll see if I can squeeze in a test of your code on my use case...

Yes, this definitely needs better tuning, which is why I opened this just as a draft for discussion/comparison. On the sensor WG call today, we discussed implications for this, how it might be tuned, possible nicer integration via the ideas discussed in #86503 etc.

}
#endif /* DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) */

#ifdef CONFIG_SPI_STM32_DMA
Contributor

This whole block is roughly duplicated from elsewhere in the code. Ideally it should be extracted and reused.

This is also missing checks for the buffers being in cached memory, so this silently fails if they are, instead of checking/warning/exiting earlier.

I was able to track that down and get this working for my sensor streaming test case as well once I disabled DCACHE, but it would be good to have this check here (it properly catches this case in transceive_dma).
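A minimal sketch of such an up-front check, assuming a single non-cacheable region described by a start address and a size. The region struct and helper names are hypothetical; the driver has its own `stm32_buf_in_nocache()` helper for the real check. A NULL buffer (the dummy TX/RX case) passes trivially, since no user memory is touched by DMA for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical non-cacheable region, e.g. derived from linker symbols. */
struct mem_region {
	uintptr_t start;
	size_t size;
};

/* True if [addr, addr + len) lies entirely inside the region. */
static bool buf_in_region(const struct mem_region *reg, uintptr_t addr, size_t len)
{
	assert(reg != NULL);

	if (addr < reg->start) {
		return false;
	}

	uintptr_t offset = addr - reg->start;

	/* Compare against the remaining size to avoid overflow in addr + len. */
	return offset <= reg->size && len <= reg->size - offset;
}

/* Reject cached buffers before starting DMA instead of failing silently. */
static bool dma_buf_ok(const struct mem_region *nocache, const void *buf, size_t len)
{
	return (buf == NULL) || buf_in_region(nocache, (uintptr_t)buf, len);
}
```

Running this before asserting CS lets the submission fail early with an error instead of transferring stale cached data.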

Contributor Author

Added helper functions and the DCACHE test

@JarmouniA
Contributor

JarmouniA commented Jan 12, 2026

@juickar You may want to take a look at #102014, I don't know if it will have a direct impact on this PR though.

@juickar
Contributor Author

juickar commented Jan 13, 2026

The comments have been addressed, but the TX completion check still needs to be reworked.


#if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi)
if (transfer_dir == LL_SPI_HALF_DUPLEX_RX &&
LL_SPI_GetMode(spi) == LL_SPI_MODE_MASTER) {
Contributor

Suggested change
LL_SPI_GetMode(spi) == LL_SPI_MODE_MASTER) {
LL_SPI_GetMode(spi) == LL_SPI_MODE_MASTER) {
static int spi_dma_move_rx_buffers(const struct device *dev, size_t dma_len);

static void spi_stm32_enable_dma_transfer( SPI_TypeDef *spi, uint32_t transfer_dir){
#if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi)
Contributor

Suggested change
#if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi)
#if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi)
/* wait until spi is no more busy (spi TX fifo is really empty) */
while (ll_func_spi_dma_busy(cfg->spi) && LL_SPI_IsEnabled(cfg->spi)) {
#if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi)
uint32_t width = SPI_WORD_SIZE_GET(spi_dma_data->ctx.config->operation);
Contributor

Could you also add an empty line below this local variable definition?

LL_SPI_DisableDMAReq_RX(cfg->spi);
#endif /* ! st_stm32h7_spi */

int err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel);
Contributor

Here also, add an empty line below. Or alternatively:

		int err;

		err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel);
		(...)
Comment on lines 770 to 774
if (!(is_dummy_buffer(buf)) &&
!stm32_buf_in_nocache((uintptr_t)buf->buf, buf->len)) {
return false;
}
return true;
Contributor

Suggested change
if (!(is_dummy_buffer(buf)) &&
!stm32_buf_in_nocache((uintptr_t)buf->buf, buf->len)) {
return false;
}
return true;
return (buf->buf == NULL) || stm32_buf_in_nocache((uintptr_t)buf->buf, buf->len);

That said, these helper functions may not be needed, see below.

Comment on lines 827 to 828
const struct spi_buf tx = {.buf = (void *)tx_buf, .len = buf_len};
const struct spi_buf rx = {.buf = (void *)rx_buf, .len = buf_len};
Contributor

Suggested change
const struct spi_buf tx = {.buf = (void *)tx_buf, .len = buf_len};
const struct spi_buf rx = {.buf = (void *)rx_buf, .len = buf_len};
const struct spi_buf tx = {.buf = (void *)tx_buf, .len = buf_len};
const struct spi_buf rx = {.buf = (void *)rx_buf, .len = buf_len};
Comment on lines 830 to 831
if ((tx_buf != NULL && !spi_buf_set_in_nocache(&tx)) ||
(rx_buf != NULL && !spi_buf_set_in_nocache(&rx))) {
Contributor

spi_buf_set_in_nocache() already addresses NULL pointers.

Suggested change
if ((tx_buf != NULL && !spi_buf_set_in_nocache(&tx)) ||
(rx_buf != NULL && !spi_buf_set_in_nocache(&rx))) {
if (!spi_buf_set_in_nocache(&tx) || !spi_buf_set_in_nocache(&rx)) {

Alternatively, maybe call stm32_buf_in_nocache() directly?

	if ((tx_buf != NULL && !stm32_buf_in_nocache((uintptr_t)tx_buf, buf_len)) ||
	    (rx_buf != NULL && !stm32_buf_in_nocache((uintptr_t)rx_buf, buf_len))) {
/* Assert CS before enabling transfer */
spi_stm32_cs_control(dev, true);

spi_stm32_move_dma_buffers(dev, transfer_dir);
Contributor

	ret = spi_stm32_move_dma_buffers(dev, transfer_dir);
	if (ret != 0) {
		...
	}
struct spi_stm32_data *data = dev->data;
size_t dma_len;
int ret;
data->status_flags = 0;
Contributor

Add an empty line above to separate from local variable definitions.

LL_SPI_EnableDMAReq_TX(spi);
} else {
LL_SPI_EnableDMAReq_RX(spi);
}
Contributor

This sequence is repeated three times. It may be worth factoring it out into a helper function (or not):

static void enable_ll_spi_dma_request(SPI_TypeDef *spi, uint32_t transfer_dir)
{
	if (transfer_dir == LL_SPI_FULL_DUPLEX) {
		LL_SPI_EnableDMAReq_RX(spi);
		LL_SPI_EnableDMAReq_TX(spi);
	} else if (transfer_dir == LL_SPI_HALF_DUPLEX_TX) {
		LL_SPI_EnableDMAReq_TX(spi);
	} else {
		LL_SPI_EnableDMAReq_RX(spi);
	}
}
@petejohanson-adi
Contributor

The comments have been addressed, but the TX completion check still needs to be reworked.

So... I've really been digging into this. This is a capture of one of the ADXL362 initialization interactions (reading a register value), using the branch in this PR with the fix from #101912 cherry-picked and local changes to properly set up the DMA channels in devicetree, running the sensor_shell sample with an ADXL362 sensor:

Screenshot 2026-01-14 at 12 14 31 PM

The pins for the LA are as follows:

  • 0 - SCK
  • 1 - MISO
  • 2 - MOSI
  • 3 - CS
  • 4 - a custom pin I'm toggling right before enabling the DMA transfer, and toggling back in the dma_callback as soon as it notifies that TX and RX are both done.

I've reordered them in the display, however, to make it a bit easier to compare where the transfer completes (clocking out/in of the data stops) and where the DMA callback is invoked leading to the D4 capture pin toggling back low.

I've captured similar data for both H723ZG and WB55RG nucleo kits with the same EVAL-ADXL362-ARDZ shield.

Despite my best efforts to reduce latency (disabling logging, etc.), I never see our DMA callback invoked before the peripheral has completed the actual data transfer. This certainly could use more testing on additional targets to really verify (which perhaps @juickar has done, and if so, excellent!), but my initial hesitation about the current "RTIO completion" logic is significantly reduced.

@petejohanson-adi
Contributor

@juickar This does need a rebase to account for recent changes in main.

@wouterh
Contributor

wouterh commented Jan 14, 2026

Despite my best efforts to reduce latency (disabling logging, etc.), I never see our DMA callback invoked before the peripheral has completed the actual data transfer. This certainly could use more testing on additional targets to really verify (which perhaps @juickar has done, and if so, excellent!), but my initial hesitation about the current "RTIO completion" logic is significantly reduced.

These are full duplex transfers, I assume? I've been thinking about it and the RX DMA transfer can't be completed before the SPI transfer is completed and all data is actually received. The code checks completion of both TX and RX DMA transfers, which explains what you see, I guess?

Half duplex transfers might need other logic?

@petejohanson-adi
Contributor

These are full duplex transfers, I assume? I've been thinking about it and the RX DMA transfer can't be completed before the SPI transfer is completed and all data is actually received. The code checks completion of both TX and RX DMA transfers, which explains what you see, I guess?

There's actually a mix here. And I agree on the RX side: that's simpler, since it's impossible, AFAICT, for the DMA transfer to complete there without it first being completed on the SPI peripheral FIFO. But yes, the code waits for both DMA events to complete, which may be superfluous for some RTIO msg types.

Half duplex transfers might need other logic?

It seems to work fine, but in theory may be able to be optimized to trigger slightly earlier for half duplex transfers.

@wouterh
Contributor

wouterh commented Jan 15, 2026

Half duplex transfers might need other logic?

It seems to work fine, but in theory may be able to be optimized to trigger slightly earlier for half duplex transfers.

Sorry if I wasn't entirely clear. With half duplex transfers, I meant transfers that trigger this code path where only a DMA TX transfer is initiated. In that case there is no DMA RX transfer to wait for and extra code might be needed to make sure the SPI transfer is completed after the DMA TX transfer completes.
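One way to express the distinction raised here, as a sketch with hypothetical names rather than the PR's code: when an RX DMA leg exists, its completion already implies the last frame has been clocked in, but a TX-only transfer has no RX DMA event, so the TX FIFO level and busy flag must also be checked (or, on H7-style SPI, EOT used) before the operation is reported complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer directions for one RTIO SPI operation. */
enum xfer_dir {
	XFER_FULL_DUPLEX,
	XFER_TX_ONLY,
	XFER_RX_ONLY,
};

/* Hypothetical snapshot of what the driver knows at callback time. */
struct xfer_status {
	enum xfer_dir dir;
	bool tx_dma_done;
	bool rx_dma_done;
	uint32_t tx_fifo_level; /* frames still queued for transmission */
	bool spi_busy;          /* a frame is still being shifted out */
};

/* Decide whether the operation can be reported as complete. */
static bool xfer_done(const struct xfer_status *s)
{
	assert(s != NULL);

	switch (s->dir) {
	case XFER_FULL_DUPLEX:
		/* RX DMA can only finish once the last frame has been clocked in. */
		return s->tx_dma_done && s->rx_dma_done;
	case XFER_RX_ONLY:
		return s->rx_dma_done;
	case XFER_TX_ONLY:
		/* TX DMA done only means the data reached the FIFO, not the bus. */
		return s->tx_dma_done && s->tx_fifo_level == 0U && !s->spi_busy;
	default:
		return false; /* unknown direction: never claim completion */
	}
}
```

In the TX-only case a `false` result is where a deferred re-check (timer or work item) would be scheduled instead of spinning in the DMA callback.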

@juickar
Contributor Author

juickar commented Jan 15, 2026

@juickar This does need a rebase to account for recent changes in main.

@petejohanson-adi, I'm currently waiting for another PR that is being prepared to simplify the driver by adding helper functions, etc., before rebasing.

@higginsa1

Hi everyone,

I tested the changes from this PR (#101331), but I still can't get asynchronous SPI transfers to work.

I’m working on an STM32H533 and I need to implement async SPI transmission. It’s possible I implemented something incorrectly.

With the implementation below, my system runs into a fatal kernel error with reason 0x19.

Setup

My prj.conf looks like this:
image

What I see
image

In my implementation, when I start the SPI transmit I set debug pin 1 high. After the transfer finishes, this pin goes low again.

At this point I expect the callback to be called. I indicate the callback execution by toggling debug pin 2, but the callback never fires.

Code
image

image

If anyone has an idea what I might be missing (especially for async SPI on STM32H5), I’d really appreciate any hints or pointers. Thanks!

@juickar
Contributor Author

juickar commented Jan 29, 2026

Hello @higginsa1, if you enable CONFIG_SPI_RTIO, you don't have to set CONFIG_SPI_ASYNC. You only have to call spi_stm32_iodev_submit to use RTIO with DMA.

Also, I'm currently waiting for these 2 PRs to be merged before rebasing:

#102882
#103079

@MaureenHelm MaureenHelm added this to the v4.4.0 milestone Feb 10, 2026
@MaureenHelm
Member

Dependencies have been merged. Will you be able to return to this before the 4.4 merge window closes?

@juickar juickar force-pushed the spi_rtio_dma branch 2 times, most recently from e146602 to 356ea93 Compare February 26, 2026 09:36
@juickar
Contributor Author

juickar commented Feb 26, 2026

  • Rebased
  • Comments addressed
  • Changed the completion check for the H7-compatible boards to use the EOT interrupt
juickar and others added 2 commits February 26, 2026 10:41
Add SPI DMA support for RTIO

Signed-off-by: Julien Racki <julien.racki-ext@st.com>
Co-authored-by: Wouter Horré <wouter@versasense.com>
Add a test case for STM32 SPI RTIO with DMA.

Signed-off-by: Julien Racki <julien.racki-ext@st.com>

Labels

area: SPI SPI bus area: Tests Issues related to a particular existing or missing test platform: STM32 ST Micro STM32