drivers: spi: stm32: add RTIO-DMA support#101331
drivers: spi: stm32: add RTIO-DMA support#101331juickar wants to merge 2 commits intozephyrproject-rtos:mainfrom
Conversation
e0fc363 to
336d253
Compare
Only DMA is done |
This should really only be marking the completion when the bus peripheral says its done if possible. Otherwise there could be subtle races on power/clock gating happening before the data actually was shifted out. |
|
I have tested this for my use case:
It works sometimes and if it works, I see nice DMA transfers in my logic analyzer and the code is able to keep up with the accelerometer (3-axis data @ 26kHz). However, it crashes most of the time. I think the crashes have something to with power management. A couple of crashes, I've seen: Faulting instruction points to Faulting instruction points to
Could this be related to the subtle races that @teburd mentioned? I also have a quick and dirty RTIO+DMA implementation (see https://github.com/versasense/zephyr/tree/rtio-dma) that I only tested on my board. I haven't seen any crashes with that. In that implementation, I do wait for TXE before marking completion. |
Yeah, that was my expectation of this as well. I happened to be working on this in parallel before this PR was up, and my implementation uses a timer, if needed, to check for TX fifo levels before reporting any TX operation as completed, to ensure we are actually "done" before continuing the next operation in the sequence. The timer is needed, AFAIK, since some STM32 SoCs don't have an interrupt for TX FIFO empty and the TXE, if there, is just to indicate there's some amount of space in the TX FIFO for more to be added. You can see my version (which still needs cleaning up, and is more PoC level at this point) at c4bc4ad |
|
With DMA enabled, my sensor streaming testing just fails to init: If I disable DMA, and just use streaming with just interrupts, it instead faults: Both setups are working fine in the implementation I had linked a few minutes ago, with the same configs. |
Do note, that TXE doesn't really mean empty, despite that naming. From https://www.st.com/resource/en/reference_manual/rm0434-multiprotocol-wireless-32bit-mcu-armbased-cortexm4-with-fpu-bluetooth-lowenergy-and-802154-radio-solution-stmicroelectronics.pdf section 35.4.10, for instance:
Which is why my version is actually checking if the TX FIFO is actually empty, using an additional timer if needed, since TXE isn't actually enough to be sure a given RTIO TX operation has been fully sent. AFAICT, that's the only way to be 100% sure that after a DMA TX is complete, that the SPI peripheral has actually sent all the data |
I'm currently working on adding a check to make sure TX Fifo is empty. These errors happened with my changes ? |
Yes, those errors occurred on your PR branch. |
5121bf6 to
6f47d7c
Compare
|
Change made: Added the check for TX fifo to be empty |
drivers/spi/spi_stm32.c
Outdated
| @@ -187,16 +187,21 @@ static uint8_t bits2bytes(spi_operation_t operation) | |||
| */ | |||
| static __aligned(32) uint32_t dummy_rx_tx_buffer __nocache; | |||
|
|
|||
| #if defined(CONFIG_SPI_STM32_DMA) && defined(CONFIG_SPI_RTIO) | |||
There was a problem hiding this comment.
Update description of SPI_STM32_DMA symbol explaining that it is compatible with SPI_RTIO.
drivers/spi/spi_stm32.c
Outdated
| int err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel); | ||
|
|
||
| if (err != 0) { | ||
| LOG_DBG("Rx dma_stop failed with error %d", err); | ||
| } | ||
| err = dma_stop(spi_dma_data->dma_tx.dma_dev, spi_dma_data->dma_tx.channel); | ||
| if (err != 0) { | ||
| LOG_DBG("Tx dma_stop failed with error %d", err); | ||
| } | ||
| spi_stm32_iodev_complete(spi_dev, status); |
There was a problem hiding this comment.
| int err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel); | |
| if (err != 0) { | |
| LOG_DBG("Rx dma_stop failed with error %d", err); | |
| } | |
| err = dma_stop(spi_dma_data->dma_tx.dma_dev, spi_dma_data->dma_tx.channel); | |
| if (err != 0) { | |
| LOG_DBG("Tx dma_stop failed with error %d", err); | |
| } | |
| spi_stm32_iodev_complete(spi_dev, status); | |
| int err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel); | |
| if (err != 0) { | |
| LOG_DBG("Rx dma_stop failed with error %d", err); | |
| } | |
| err = dma_stop(spi_dma_data->dma_tx.dma_dev, spi_dma_data->dma_tx.channel); | |
| if (err != 0) { | |
| LOG_DBG("Tx dma_stop failed with error %d", err); | |
| } | |
| spi_stm32_iodev_complete(spi_dev, status); |
drivers/spi/spi_stm32.c
Outdated
| spi_stm32_iodev_complete(spi_dev, -EIO); | ||
| return; | ||
| } | ||
| const struct spi_stm32_config *cfg = spi_dev->config; |
drivers/spi/spi_stm32.c
Outdated
| while (LL_SPI_GetTxFIFOLevel(cfg->spi) > 0) { | ||
| } | ||
| #endif /* SPI_SR_FTLVL */ | ||
|
|
||
| #ifdef CONFIG_SPI_STM32_ERRATA_BUSY | ||
| WAIT_FOR(!ll_func_spi_dma_busy(cfg->spi), | ||
| CONFIG_SPI_STM32_BUSY_FLAG_TIMEOUT, | ||
| k_yield()); | ||
| #else | ||
| /* wait until spi is no more busy (spi TX fifo is really empty) */ | ||
| while (ll_func_spi_dma_busy(cfg->spi) && LL_SPI_IsEnabled(cfg->spi)) { | ||
| #if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) | ||
| uint32_t width = SPI_WORD_SIZE_GET(spi_dma_data->ctx.config->operation); | ||
| /* The TXC flag is not raised at the end of 9, 17 or 25 | ||
| * bit transfer, so disable the SPI in these cases to avoid being stuck. | ||
| */ | ||
| if ((width == 9U) || (width == 17U) || (width == 25U)) { | ||
| k_busy_wait(1000); | ||
| ll_func_disable_spi(cfg->spi); | ||
| } | ||
| #endif /* DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) */ |
There was a problem hiding this comment.
So... I really have to question the design here.. The DMA callback occurs in an interrupt context, and these busy loops seem like a really bad idea to just duplicate in here. You can see in the implementation I have in #101971 that instead I use a secondary timer that will fire and check for the FIFO levels, DMA being done, etc. to ensure we can not block, but also not complete the transaction until we're really ready to.
There was a problem hiding this comment.
Unfortunate if the hardware does not signal fifo empty through an interrupt, but in that case yes... to make it asynchronous and not block in the ISR context your options are basically going to be...
- Schedule a timer to check this in the next tick or whatever you think is appropriate to poll for the completion
- Offload to a work queue/thread to poll for completion
There was a problem hiding this comment.
There is the TXE interrupt that I use in my implementation. I still have the busy loop in there because I didn't have time to figure out what the BSY errata is about. It might not be needed anymore.
@petejohanson-adi regarding your timer-based implementation. I see a timer timeout of 100µs in there. Isn't that a long time for a bus running at several MHz?
I am a bit tight on time this week, but I'll see if I can squeeze in a test of your code on my use case...
There was a problem hiding this comment.
and maybe for the h7 compatible boards we could use the EOT interrupts:
see at 50.11.6
3 EOT: end of transfer
EOT is set by hardware as soon as a full transfer is complete, that is when TSIZE number of
data have been transmitted and/or received on the SPI. EOT is cleared by software write 1 to
EOTC bit at SPI_IFCR.
EOT flag triggers an interrupt if EOTIE bit is set.
If DXP flag is used until TXTF flag is set and DXPIE is cleared, EOT can be used to
download the last packets contained into RxFIFO in one-shot.
0: transfer is on-going or not started
1: transfer complete
In master, EOT event terminates the data transaction and handles SS output optionally. When
CRC is applied, the EOT event is extended over the CRC frame transaction.
since the TXE interrupt does not exist for these boards.
There was a problem hiding this comment.
There is the TXE interrupt that I use in my implementation. I still have the busy loop in there because I didn't have time to figure out what the BSY errata is about. It might not be needed anymore.
See my comment here: #101331 (comment)
TXE on the couple reference manuals I checked for different STM32 SoCs is a misnomer, and doesn't actually mean completely empty, just "TX FiFO has room for more", so IMHO isn't enough to mark the transaction as completed.
On platforms where OET is available, that does seem to match expectations, and could be used directly without any timer/polling approaches.
@petejohanson-adi regarding your timer-based implementation. I see a timer timeout of 100µs in there. Isn't that a long time for a bus running at several MHz? I am a bit tight on time this week, but I'll see if I can squeeze in a test of your code on my use case...
Yes, this definitely needs better tuning, which is why I opened this just as a draft for discussion/comparison. On the sensor WG call today, we discussed implications for this, how it might be tuned, possible nicer integration via the ideas discussed in #86503 etc.
| } | ||
| #endif /* DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) */ | ||
|
|
||
| #ifdef CONFIG_SPI_STM32_DMA |
There was a problem hiding this comment.
This whole block is duplicated, roughly, from elsewhere in the code. Ideally should be extracted and re-used.
This is also missing checks for the buffers being in cached memory, so this silently fails if they are, instead of checking/warning/exiting earlier.
I was able to track that down and get this working as well for my sensor streaming test case, once I disabled DCACHE, but it would be good to have this check (which properly catches this case in tranceive_dma.
There was a problem hiding this comment.
Added helper functions and the DCACHE test
|
The comments has been addressed, but the TX completion check still needs to be reworked. |
|
drivers/spi/spi_stm32.c
Outdated
|
|
||
| #if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) | ||
| if (transfer_dir == LL_SPI_HALF_DUPLEX_RX && | ||
| LL_SPI_GetMode(spi) == LL_SPI_MODE_MASTER) { |
There was a problem hiding this comment.
| LL_SPI_GetMode(spi) == LL_SPI_MODE_MASTER) { | |
| LL_SPI_GetMode(spi) == LL_SPI_MODE_MASTER) { |
drivers/spi/spi_stm32.c
Outdated
| static int spi_dma_move_rx_buffers(const struct device *dev, size_t dma_len); | ||
|
|
||
| static void spi_stm32_enable_dma_transfer( SPI_TypeDef *spi, uint32_t transfer_dir){ | ||
| #if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) |
There was a problem hiding this comment.
| #if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) | |
| #if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) |
drivers/spi/spi_stm32.c
Outdated
| /* wait until spi is no more busy (spi TX fifo is really empty) */ | ||
| while (ll_func_spi_dma_busy(cfg->spi) && LL_SPI_IsEnabled(cfg->spi)) { | ||
| #if DT_HAS_COMPAT_STATUS_OKAY(st_stm32h7_spi) | ||
| uint32_t width = SPI_WORD_SIZE_GET(spi_dma_data->ctx.config->operation); |
There was a problem hiding this comment.
Could you also add an empty line below this local variable definition?
drivers/spi/spi_stm32.c
Outdated
| LL_SPI_DisableDMAReq_RX(cfg->spi); | ||
| #endif /* ! st_stm32h7_spi */ | ||
|
|
||
| int err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel); |
There was a problem hiding this comment.
Here also, add an empty line below. Or alternatively:
int err;
err = dma_stop(spi_dma_data->dma_rx.dma_dev, spi_dma_data->dma_rx.channel);
(...)
drivers/spi/spi_stm32.c
Outdated
| if (!(is_dummy_buffer(buf)) && | ||
| !stm32_buf_in_nocache((uintptr_t)buf->buf, buf->len)) { | ||
| return false; | ||
| } | ||
| return true; |
There was a problem hiding this comment.
| if (!(is_dummy_buffer(buf)) && | |
| !stm32_buf_in_nocache((uintptr_t)buf->buf, buf->len)) { | |
| return false; | |
| } | |
| return true; | |
| return (buf->buf == NULL) || stm32_buf_in_nocache((uintptr_t)buf->buf, buf->len); |
That said, these helper functions may not be needed, see below.
drivers/spi/spi_stm32.c
Outdated
| const struct spi_buf tx = {.buf = (void *)tx_buf, .len = buf_len}; | ||
| const struct spi_buf rx = {.buf = (void *)rx_buf, .len = buf_len}; |
There was a problem hiding this comment.
| const struct spi_buf tx = {.buf = (void *)tx_buf, .len = buf_len}; | |
| const struct spi_buf rx = {.buf = (void *)rx_buf, .len = buf_len}; | |
| const struct spi_buf tx = {.buf = (void *)tx_buf, .len = buf_len}; | |
| const struct spi_buf rx = {.buf = (void *)rx_buf, .len = buf_len}; |
drivers/spi/spi_stm32.c
Outdated
| if ((tx_buf != NULL && !spi_buf_set_in_nocache(&tx)) || | ||
| (rx_buf != NULL && !spi_buf_set_in_nocache(&rx))) { |
There was a problem hiding this comment.
spi_buf_set_in_nocache() already addresses NULL pointers.
| if ((tx_buf != NULL && !spi_buf_set_in_nocache(&tx)) || | |
| (rx_buf != NULL && !spi_buf_set_in_nocache(&rx))) { | |
| if (!spi_buf_set_in_nocache(&tx)) || !spi_buf_set_in_nocache(&rx))) { |
Alternatively, maybe call stm32_buf_in_nocache() striaght?
if ((tx_buf != NULL && !stm32_buf_in_nocache((uintptr_t)tx_buf, buf_len)) ||
(rx_buf != NULL && !stm32_buf_in_nocache((uintptr_t)rx_buf, buf_len))) {
drivers/spi/spi_stm32.c
Outdated
| /* Assert CS before enabling transfer */ | ||
| spi_stm32_cs_control(dev, true); | ||
|
|
||
| spi_stm32_move_dma_buffers(dev, transfer_dir); |
There was a problem hiding this comment.
ret = spi_stm32_move_dma_buffers(dev, transfer_dir);
if (ret != 0) {
...
}
drivers/spi/spi_stm32.c
Outdated
| struct spi_stm32_data *data = dev->data; | ||
| size_t dma_len; | ||
| int ret; | ||
| data->status_flags = 0; |
There was a problem hiding this comment.
Add an empty line above to separate from local variable definitions.
drivers/spi/spi_stm32.c
Outdated
| LL_SPI_EnableDMAReq_TX(spi); | ||
| } else { | ||
| LL_SPI_EnableDMAReq_RX(spi); | ||
| } |
There was a problem hiding this comment.
This sequence is repeated 3 times. Maybe worth to factorize in a helper function (or not):
static void enable_ll_spi_dma_request(SPI_TypeDef *spi, uint32_t transfer_dir)
{
if (transfer_dir == LL_SPI_FULL_DUPLEX) {
LL_SPI_EnableDMAReq_RX(spi);
LL_SPI_EnableDMAReq_TX(spi);
} else if (transfer_dir == LL_SPI_HALF_DUPLEX_TX) {
LL_SPI_EnableDMAReq_TX(spi);
} else {
LL_SPI_EnableDMAReq_RX(spi);
}
}
So... I've been really digging into this. This is a capture of one of the ADXL362 initialization interactions, to read a register value, using the branch in this PR, cherry picking the fix in #101912 and with local changes to properly set up the DMA channels in devicetree using the sensor_shell sample and an ADXL362 sensor:
The pins for the LA are as follows:
I've reordered them in the display, however, to make it a bit easier to compare where the transfer completes (clocking out/in of the data stops) and where the DMA callback is invoked leading to the D4 capture pin toggling back low. I've captured similar data for both H723ZG and WB55RG nucleo kits with the same EVAL-ADXL362-ARDZ shield. Despite my best efforts to disabling logging, etc to reduce any latency, I am not ever seeing our DMA callback getting invoked before the peripheral has managed to complete the actual data transfer. This certainly could use more testing on additional targets to really verify (which perhaps @juickar has done, and if so, excellent!), but my initial hesitation about the current "RTIO completion" logic is significantly reduced. |
|
@juickar This does need a rebase to account for recent change in |
These are full duplex transfers, I assume? I've been thinking about it and the RX DMA transfer can't be completed before the SPI transfer is completed and all data is actually received. The code checks completion of both TX and RX DMA transfers, which explains what you see, I guess? Half duplex transfers might need other logic? |
There's actually a mix here. And I agree on the RX side, that's simpler since it's impossible, AFAICT, for the DMA transfer to complete there without if first being completed on the SPI peripheral FIFO. But yes, the code is waiting for both DMA events to be completed, which may be superfluous for some RTIO msg types.
It seems to work fine, but in theory may be able to be optimized to trigger slightly earlier for half duplex transfers. |
Sorry if I wasn't entirely clear. With half duplex transfers, I meant transfers that trigger this code path where only a DMA TX transfer is initiated. In that case there is no DMA RX transfer to wait for and extra code might be needed to make sure the SPI transfer is completed after the DMA TX transfer completes. |
@petejohanson-adi, I'm currently waiting for another PR that is being prepared to simplify the driver by adding helper functions, etc., before rebasing. |
|
Hi everyone, I tested the changes from this PR: #101331 I’m working on an STM32H533 and I need to implement async SPI transmission. It’s possible I implemented something incorrectly. With the implementation below, my system runs into a fatal kernel error with reason 0x19. Setup In my implementation, when I start the SPI transmit I set debug pin 1 high. After the transfer finishes, this pin goes low again. At this point I expect the callback to be called. I indicate the callback execution by toggling debug pin 2, but the callback never fires.
If anyone has an idea what I might be missing (especially for async SPI on STM32H5), I’d really appreciate any hints or pointers. Thanks! |
|
Hello @higginsa1, If you enable CONFIG_SPI_RTIO, you don't have to set CONFIG_SPI_ASYNC. You only have to call the spi_stm32_iodev_submit to use RTIO with DMA. Also, I'm currently waiting for these 2 PRs to be merged before rebasing: |
Dependencies have been merged. Will you be able to return to this before the 4.4 merge window closes? |
e146602 to
356ea93
Compare
|
Add SPI DMA support for RTIO Signed-off-by: Julien Racki <julien.racki-ext@st.com> Co-authored-by: Wouter Horré <wouter@versasense.com>
Add a test case for STM32 SPI RTIO with DMA. Signed-off-by: Julien Racki <julien.racki-ext@st.com>
356ea93 to
e4bf591
Compare
|








This PR adds the support of RTIO SPI transactions using DMA, it also adds a test case for the loopback test.
Tested on our test bench using
H7andnon-H7SPI compatible boards