arch/arm: New arch_switch() based context layer for Cortex M#85248
andyross wants to merge 34 commits into zephyrproject-rtos:main from
Conversation
This is finally looking good enough to submit, let's see how it runs in CI. First, it's important to note that @ithinuel has an entirely different arch_switch() implementation in #85080 that everyone should review too. That one is a relatively straight-line evolution of the current PendSV implementation. This one is (as I'm sure surprises no one) more of a rewrite, using a "normal" context switch. Really I don't see any reason why both shouldn't be able to merge: this will likely take some time to stabilize and we'd want to be maintaining the old stuff in parallel anyway. The big advantages to this one over that one:
This is a mistake. IRQ_OFFLOAD_NESTED is not intended to be an app tunable. It reflects whether or not the platform layer's irq_offload() implementation supports use from within an ISR (i.e. to invoke a nested interrupt for testing purposes). Not all architectures can do that (and I know of at least one Xtensa SoC that didn't give a software interrupt the right priority and can't either). Basically this should have been an "ARCH_HAS_" kind of kconfig; it's just named funny. And in any case the thread_metric tests don't do nested interrupts, which as I understand it aren't even possible in the FreeRTOS environment they were written on. Signed-off-by: Andy Ross <andyross@google.com>
z_get_next_switch_handle() is a clean API, but implementing it as a (comparatively large) callable function requires significant entry/exit boilerplate and hides the very common "no switch needed" early-exit condition from the enclosing C code that calls it. (Most architectures call this from assembly, though, and don't notice.) Provide an unwrapped version for the specific needs of non-SMP builds. It's compatible in all other ways. Slightly ugly, but the gains are significant (a dozen cycles or so). Signed-off-by: Andy Ross <andyross@google.com>
Micro-optimization: We don't need a full arch_irq_lock(), which is a ~6-instruction sequence on Cortex M. The lock will be dropped unconditionally on interrupt exit, so take it unconditionally. Signed-off-by: Andy Ross <andyross@google.com>
The new switch code no longer needs PendSV, but still calls the SVC vector. Split them into separate files for hygiene and a few microseconds of build time. Signed-off-by: Andy Ross <andyross@google.com>
When USE_SWITCH=y, the thread struct is now mostly degenerate. Only the two words for ICI/IT state tracking are required. Eliminate all the extra fields when not needed and save a bunch of SRAM. Note that a handful of spots in coredump/debug that need a location for the new stack pointer (now stored as the switch handle) are also updated. Signed-off-by: Andy Ross <andyross@google.com>
z_reschedule() is the basic kernel entry point for context switch, wrapping z_swap(), and thence arch_switch(). It's currently defined as a first-class function for entry from other files in the kernel and elsewhere (e.g. IPC library code). But in practice it's actually a very thin wrapper without a lot of logic of its own, and the context switch layers of some of the more obnoxiously clever architectures are designed to interoperate with the compiler's own spill/fill logic to avoid double saving. And with a small z_reschedule() there's not a lot to work with. Make z_reschedule() an inlinable static, so the compiler has more options. Signed-off-by: Andy Ross <andyross@google.com>
This function was a little clumsy, taking the scheduler lock, releasing it, and then calling z_reschedule_unlocked() instead of the normal locked variant of reschedule. Don't take the lock twice. Mostly this is a code size and hygiene win. Obviously the sched lock is not normally a performance path, but I happened to have picked this API for my own microbenchmark in tests/benchmarks/swap and so noticed the double-lock while staring at disassembly. Signed-off-by: Andy Ross <andyross@google.com>
Some nitpicky hand-optimizations, no logic changes:
+ Shrink the assembly entry to put more of the logic into compiler-optimizable C.
+ Split arm_m_must_switch() into two functions so that the first doesn't look so big to the compiler. That allows it to spill (many) fewer registers on entry and speeds the (very) common early-exit case where an interrupt returns without a context switch.
Signed-off-by: Andy Ross <andyross@google.com>
Late-arriving clang-format-demanded changes that are too hard to split and squash into the original patches. No behavior changes. Signed-off-by: Andy Ross <andyross@google.com>
The z_arm_pendsv vector doesn't exist on USE_SWITCH=y builds. Signed-off-by: Andy Ross <andyross@google.com>
Some toolchains don't support an __asm__(...) block at the top level of a file and require that they live within function scope. That's not a hardship as these two blocks were defining callable functions anyway. Exploit the "naked" attribute to avoid wasted bytes in unused entry/exit code. Signed-off-by: Andy Ross <andyross@google.com>
I'm at a loss here. The feature this test case wants to see (on ARMv7M, not ARMv6M) is the ability to take an irq_lock() inside a system call and then see that future system calls from the same thread continue to hold the lock. That's not documented AFAICT. It's also just a terrible idea because of either:
1. The obvious denial-of-service implications if user code is allowed to run in an unpreemptible mode, or:
2. The broken locking promise if this is implemented to release the lock and reacquire it in an attempt to avoid #1.
(FWIW: my read of the code is that #1 is the current implementation. But hilariously the test isn't able to tell the difference!) And in any case it's not how any of our other platforms work (or can work, in some cases), making this a non-portable system call API/feature at best. Leave it in place for now out of conservatism, but disable it with the new arch_switch() code, whose behavior matches that of other Zephyr userspaces. Signed-off-by: Andy Ross <andyross@google.com>
The exit from the SVC exception used for syscalls back into the calling thread is done without locking. This means that the intermediate states can be interrupted while the kernel-mode code is still managing thread state like the mode bit, leading to mismatches. This seems mostly robust when used with PendSV (though I'm a little dubious), but the new arch_switch() code needs to be able to suspend such an interrupted thread and restore it without going through a full interrupt entry/exit again, so it needs locking for sure. Take the lock unconditionally before exiting the call, and release it in the thread once the magic is finished, just before calling the handler. Then take it again before swapping stacks and dropping privilege. Even then there is a one-cycle race where the interrupted thread has dropped the lock but still has privilege (the nPRIV bit is clear in CONTROL). This thread will be resumed later WITHOUT privilege, which means that trying to set CONTROL will fail. So there's detection of this 1-instruction race that will skip over it. Signed-off-by: Andy Ross <andyross@google.com>
The ARM Ltd. FVP emulator (at least the variants run in Zephyr CI) appears to have a bug with the stack alignment bit in xPSR. It's common (it fails in the first 4-6 timer interrupts in tests.syscalls.timeslicing) that we'll take an interrupt from a seemingly aligned (!) stack with the bit set. If we then switch and resume the thread from a different context later, popping the stack goes wrong (more so than just a misalignment of four bytes: I usually see it too low by 20 bytes) in a way that it doesn't if we return synchronously. Presumably legacy PendSV didn't see this because it used the unmodified exception frame. Work around this by simply assuming all interrupted stacks were aligned and clearing the bit. That is NOT correct in the general case, but in practice it's enough to get tests to pass. Signed-off-by: Andy Ross <andyross@google.com>
A nested exception can occur in arm_m_exc_exit after interrupts are
re-enabled but before branching to the EXC_RETURN value. In that case,
the nested exception stacks an exception stack frame (ESF) on MSP.
arm_m_exc_tail() unconditionally rewrites the LR slot on the active
stack to redirect execution to arm_m_exc_exit. If a nested exception
has stacked an ESF on MSP, this rewrite corrupts the stacked xPSR
field, leading to a UsageFault ("Illegal use of EPSR") on exception
return.
Guard the LR rewrite so that it is only performed when the exception
is returning to Thread mode using PSP. This ensures that the rewrite
does not interfere with ESFs stacked on MSP during nested exceptions.
Signed-off-by: Sudan Landge <sudan.landge@arm.com>
USE_SWITCH code unconditionally applied interrupt locking, which altered BASEPRI handling and broke expected interrupt behavior on both Baseline and Mainline CPUs when USE_SWITCH was disabled. This commit restores the original behavior with USE_SWITCH disabled and fixes tests/arch/arm/arm_interrupt failures. Signed-off-by: Sudan Landge <sudan.landge@arm.com>
The z_arm_pendsv vector doesn't exist on USE_SWITCH=y builds. Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Change the MPU alignment to fix the CI failure below for nrf52840dk/nrf52840:
```
`padding_section' will not fit in region `RAM'
arm-zephyr-eabi/bin/ld.bfd: region `RAM' overflowed by 54928 bytes
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
```
Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Documented Cortex-M switch helper interfaces in arm-m-switch.h with Doxygen blocks. Signed-off-by: Sudan Landge <sudan.landge@arm.com>
The orr fix is as reported in review:
```
The add causes a crash with IAR tools as the address loaded to r8 already has the lowest bit set, and the add causes it to be set to ARM mode. The orr instruction works fine with both scenarios.
```
Also, `UDF 0` seems to break on IAR but `UDF #0` works for all. Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Fix the following issues when trying to build hello world with armclang:
```
Error: L6218E: Undefined symbol z_arm_exc_exit (referred from reset.o).
Error: L6218E: Undefined symbol z_arm_int_exit (referred from reset.o).
Error: L6218E: Undefined symbol z_arm_pendsv (referred from reset.o).
```
Signed-off-by: Sudan Landge <sudan.landge@arm.com>
CI is green and I think that, except for using the CMSIS APIs, I have covered all the feedback posted on this PR. The CMSIS APIs will be added in a follow-up PR as discussed. Now would be the time to start with review and testing on more devices :) cc: @ajf58 @LoveKarlsson @RobinKastberg @cfriedt @peter-mitsis @nashif @MaureenHelm @ljd42 @andyross



Mostly complete. Supports MPU, userspace, PSPLIM-based stack guards, and FPU/DSP features. ARMv8-M secure mode "should" work but I don't know how to test it.
Designed with an eye to uncompromising/best-in-industry cooperative context switch performance. No PendSV exception nor hardware stacking/unstacking, just a traditional "musical chairs" switch. Context gets saved on process stacks only instead of split between there and the thread struct. No branches in the core integer switch code (and just one in the FPU bits that can't be avoided).
Minimal assembly use; arch_switch() itself is ALWAYS_INLINE, there is an assembly stub for exception exit, and that's it beyond one/two instruction inlines elsewhere.
Selectable at build time, interoperable with existing code. Just use the pre-existing CONFIG_USE_SWITCH=y flag to enable it. Or turn it off to evade regressions as this stabilizes.
Exception/interrupt returns in the common case need only a single C function to be called at the tail, and then return naturally. Effectively "all interrupts are direct now". This isn't a benefit currently because the existing stubs haven't been removed (see #4), but in the long term we can look at exploiting this. The boilerplate previously required is now (mostly) empty.
No support for ARMv6 (Cortex M0 et al.) Thumb code. The expanded instruction encodings in ARMv7 are a big (big) win, so the older cores really need a separate port to avoid impacting newer hardware. Thankfully there isn't that much code to port (see #3), so this should be doable.
Fixes #79069