
arch/arm: New arch_switch() based context layer for Cortex M #85248

Open
andyross wants to merge 34 commits into zephyrproject-rtos:main from andyross:arm-m-switch

Conversation

@andyross
Contributor

@andyross andyross commented Feb 5, 2025

  1. Mostly complete. Supports MPU, userspace, PSPLIM-based stack guards, and FPU/DSP features. ARMv8-M secure mode "should" work but I don't know how to test it.

  2. Designed with an eye to uncompromising/best-in-industry cooperative context switch performance. No PendSV exception nor hardware stacking/unstacking, just a traditional "musical chairs" switch. Context gets saved on process stacks only instead of split between there and the thread struct. No branches in the core integer switch code (and just one in the FPU bits that can't be avoided).

  3. Minimal assembly use; arch_switch() itself is ALWAYS_INLINE, there is an assembly stub for exception exit, and that's it beyond one/two instruction inlines elsewhere.

  4. Selectable at build time, interoperable with existing code. Just use the pre-existing CONFIG_USE_SWITCH=y flag to enable it. Or turn it off to evade regressions as this stabilizes.

  5. Exception/interrupt returns in the common case need only a single C function to be called at the tail, and then return naturally. Effectively "all interrupts are direct now". This isn't a benefit currently because the existing stubs haven't been removed (see #4), but in the long term we can look at exploiting this. The boilerplate previously required is now (mostly) empty.

  6. No support for ARMv6 (Cortex M0 et al.) Thumb code. The expanded instruction encodings in ARMv7 are a big (big) win, so the older cores really need a separate port to avoid impacting newer hardware. Thankfully there isn't that much code to port (see #3), so this should be doable.

Fixes #79069
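Per item 4 above, the new layer is opt-in at build time via the pre-existing switch flag. A minimal prj.conf fragment (assuming the target architecture advertises switch support) might look like:

```
# Select the arch_switch()-based context layer.
# Drop this line to fall back to the legacy PendSV implementation.
CONFIG_USE_SWITCH=y
```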

@andyross
Copy link
Contributor Author

andyross commented Feb 5, 2025

This is finally looking good enough to submit, let's see how it runs in CI. First, it's important to note that @ithinuel has an entirely different arch_switch() implementation in #85080 that everyone should review too. That one is a relatively straight-line evolution of the current PendSV implementation. This one is (as I'm sure surprises no one) more of a rewrite, using a "normal" context switch. Really I don't see any reason why both shouldn't be able to merge: this will likely take some time to stabilize and we'd want to be maintaining the old stuff in parallel anyway.

The big advantages to this one over that one:

  1. Smaller. I worked really hard to limit code size for performance reasons. And there's more fruit to pick: the thread struct can lose all the still-present slots for the callee-saved registers that now live on the stack, and lots of the legacy fault handlers have boilerplate that now duplicates the exit code that runs out of a regular C handler.

  2. Bigger, heh. Well, more complete. This works with the PSPLIM stack guard feature (which, btw, we have very poor test coverage of!) and FPU hardware (which we have almost no coverage of; there's only one in-tree qemu FPU platform and it doesn't run in CI). And as I understand the architecture, secure mode should ("should") work too, but I don't have a system to test with.

  3. It's actually kinda scary fast, which is what I was hoping to see. The microbenchmark at the end of the series is showing about 60% improvement in z_swap() on my FRDM-K64F vs. the current tree (just z_swap though, not all the other stuff!). It's tuned heavily for the common case of cooperative switching, using a custom entirely-on-process-stack frame format for suspended threads and not the one the hardware emits (there's a conversion step when threads switch on interrupt/exception exit).

  4. Legacy-free. No more ARCH_HAS_CUSTOM_SWAP_TO_MAIN or ARCH_HAS_THREAD_ABORT (and especially no more SWAP_NONATOMIC!), nor a custom arch_thread_return_value_set(). ARM Cortex M as of this patch looks like a "standard" Zephyr platform without any magic.

  5. Minimal impact on existing code. The new context layer is in two new files with only ~130 lines of changes to existing code.

andyross and others added 21 commits February 25, 2026 15:17
This is a mistake.  IRQ_OFFLOAD_NESTED is not intended to be an app
tunable. It reflects whether or not the platform-layer irq_offload()
implementation supports its use from within an ISR (i.e. to invoke a
nested interrupt for testing purposes).  Not all architectures can do
that (and I know of at least one Xtensa SoC that didn't give a
software interrupt the right priority and can't either).  Basically
this should have been an "ARCH_HAS_" kind of kconfig, it's just named
funny.

And in any case the thread_metric tests don't do nested interrupts,
which as I understand it aren't even possible in the FreeRTOS
environment it was written on.

Signed-off-by: Andy Ross <andyross@google.com>
z_get_next_switch_handle() is a clean API, but implementing it as a
(comparatively large) callable function requires significant
entry/exit boilerplate and hides the very common "no switch needed"
early exit condition from the enclosing C code that calls it.  (Most
architectures call this from assembly though and don't notice).

Provide an unwrapped version for the specific needs of non-SMP builds.
It's compatible in all other ways.

Slightly ugly, but the gains are significant (like a dozen cycles or
so).

Signed-off-by: Andy Ross <andyross@google.com>
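The shape of that change can be sketched in portable C. The names and the scheduler state here are hypothetical stand-ins, not the actual Zephyr internals; the point is that the early-exit branch becomes visible to the enclosing caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for scheduler state: non-NULL when a
 * different thread should run next. */
static void *ready_handle;

/* Unwrapped, inlinable variant: the very common "no switch needed"
 * early exit is visible to the enclosing C code, so the compiler can
 * fold it into the interrupt-exit path with no call overhead. */
static inline void *next_switch_handle_unwrapped(void *interrupted)
{
	if (ready_handle == NULL) {
		return interrupted;  /* early exit: keep the current thread */
	}

	/* Rare path: take the pending thread as the switch target */
	void *next = ready_handle;

	ready_handle = NULL;
	return next;
}
```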
Micro-optimization: We don't need a full arch_irq_lock(), which is a
~6-instruction sequence on Cortex M.  The lock will be dropped
unconditionally on interrupt exit, so take it unconditionally.

Signed-off-by: Andy Ross <andyross@google.com>
The new switch code no longer needs PendSV, but still calls the SVC
vector.  Split them into separate files for hygiene and a few
microseconds of build time.

Signed-off-by: Andy Ross <andyross@google.com>
When USE_SWITCH=y, the thread struct is now mostly degenerate.  Only
the two words for ICI/IT state tracking are required.  Eliminate all
the extra fields when not needed and save a bunch of SRAM.

Note a handful of spots in coredump/debug that need a location for the
new stack pointer (stored as the switch handle now) are also updated.

Signed-off-by: Andy Ross <andyross@google.com>
z_reschedule() is the basic kernel entry point for context switch,
wrapping z_swap(), and thence arch_switch().  It's currently defined
as a first class function for entry from other files in the kernel and
elsewhere (e.g. IPC library code).

But in practice it's actually a very thin wrapper without a lot of
logic of its own, and the context switch layers of some of the more
obnoxiously clever architectures are designed to interoperate with the
compiler's own spill/fill logic to avoid double saving.  And with a
small z_reschedule() there's not a lot to work with.

Make reschedule() an inlinable static, so the compiler has more
options.

Signed-off-by: Andy Ross <andyross@google.com>
This function was a little clumsy, taking the scheduler lock,
releasing it, and then calling z_reschedule_unlocked() instead of the
normal locked variant of reschedule.  Don't take the lock twice.

Mostly this is a code size and hygiene win.  Obviously the sched lock
is not normally a performance path, but I happened to have picked this
API for my own microbenchmark in tests/benchmarks/swap and so noticed
the double-lock while staring at disassembly.

Signed-off-by: Andy Ross <andyross@google.com>
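The double-lock described above can be modeled in portable C with a counter standing in for the spinlock (all names here are illustrative, not the actual kernel symbols): the old shape pays three lock transitions where one suffices.

```c
#include <assert.h>

/* Hypothetical lock that just counts acquire/release transitions */
static int lock_transitions;

static int lock_acquire(void)
{
	lock_transitions++;
	return 0; /* pretend key */
}

static void lock_release(int key)
{
	(void)key;
	lock_transitions++;
}

/* Locked variant: caller already holds the lock; swap releases it */
static void reschedule_locked(int key)
{
	lock_release(key);
}

/* Unlocked variant: takes the lock itself first */
static void reschedule_unlocked(void)
{
	reschedule_locked(lock_acquire());
}

/* Before: release the held lock, then let the unlocked variant
 * immediately re-take it (3 transitions total). */
static void release_then_reschedule_old(int key)
{
	lock_release(key);
	reschedule_unlocked();
}

/* After: hand the already-held key straight to the locked variant
 * (1 transition total). */
static void release_then_reschedule_new(int key)
{
	reschedule_locked(key);
}
```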
Some nitpicky hand-optimizations, no logic changes:

+ Shrink the assembly entry to put more of the logic into
  compiler-optimizable C.

+ Split arm_m_must_switch() into two functions so that the first
  doesn't look so big to the compiler.  That allows it to spill (many)
  fewer registers on entry and speeds the (very) common early-exit case
  where an interrupt returns without a context switch.

Signed-off-by: Andy Ross <andyross@google.com>
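The hot/cold split above is a general pattern that can be sketched in portable C (the names and the pending-switch flag are hypothetical, not the actual arm_m_must_switch() internals):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "a context switch is pending" */
static bool switch_pending;

/* Cold path kept out of line: the compiler spills the many live
 * registers only here, not on every interrupt exit. */
__attribute__((noinline)) static int do_full_switch(int handle)
{
	switch_pending = false;
	return handle + 1; /* pretend a new handle was chosen */
}

/* Hot path stays tiny: the common "no switch" case returns without
 * paying the register-spill cost of the big function. */
static inline int must_switch(int handle)
{
	if (!switch_pending) {
		return handle;
	}
	return do_full_switch(handle);
}
```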
Late-arriving clang-format-demanded changes that are too hard to split
and squash into the original patches.  No behavior changes.

Signed-off-by: Andy Ross <andyross@google.com>
The z_arm_pendsv vector doesn't exist on USE_SWITCH=y builds

Signed-off-by: Andy Ross <andyross@google.com>
Some toolchains don't support an __asm__(...) block at the top level
of a file and require that they live within function scope.  That's
not a hardship as these two blocks were defining callable functions
anyway.  Exploit the "naked" attribute to avoid wasted bytes in unused
entry/exit code.

Signed-off-by: Andy Ross <andyross@google.com>
I'm at a loss here.  The feature this test case wants to see (on
ARMv7M, not ARMv6M) is the ability to take an irq_lock() inside a
system call and then see that future system calls from the same thread
continue to hold the lock.

That's not documented AFAICT.  It's also just a terrible idea because
of either:

1. The obvious denial of service implications if user code is allowed
   to run in an unpreemptible mode, or:

2. The broken locking promise if this is implemented to release the
   lock and reacquire it in an attempt to avoid #1.

(FWIW: my read of the code is that #1 is the current implementation.
But hilariously the test isn't able to tell the difference!)

And in any case it's not how any of our other platforms work (or can
work, in some cases), making this a non-portable system call
API/feature at best.

Leave it in place for now out of conservatism, but disable with the
new arch_switch() code, whose behavior matches that of other Zephyr
userspaces.

Signed-off-by: Andy Ross <andyross@google.com>
The exit from the SVC exception used for syscalls back into the
calling thread is done without locking.  This means that the
intermediate states can be interrupted while the kernel-mode code is
still managing thread state like the mode bit, leading to mismatches.
This seems mostly robust when used with PendSV (though I'm a little
dubious), but the new arch_switch() code needs to be able to suspend
such an interrupted thread and restore it without going through a full
interrupt entry/exit again, so it needs locking for sure.

Take the lock unconditionally before exiting the call, and release it
in the thread once the magic is finished, just before calling the
handler.  Then take it again before swapping stacks and dropping
privilege.

Even then there is a one-cycle race where the interrupted thread has
dropped the lock but still has privilege (the nPRIV bit is clear in
CONTROL).  This thread will be resumed later WITHOUT privilege, which
means that trying to set CONTROL will fail.  So there's detection of
this 1-instruction race that will skip over it.

Signed-off-by: Andy Ross <andyross@google.com>
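The race window described above hinges on the CONTROL register's nPRIV bit (bit 0 on Cortex-M: 0 = privileged, 1 = unprivileged). A hedged, portable sketch of the detection predicate (the function name and the lock flag are illustrative; the actual patch does this in the switch code itself):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CONTROL.nPRIV is bit 0 on Cortex-M (0 = privileged) */
#define CONTROL_NPRIV 0x1U

/* The 1-instruction race: the thread has already dropped the lock but
 * nPRIV is still clear, so it would be resumed with the wrong
 * privilege state unless detected and skipped over. */
static bool in_race_window(uint32_t control, bool lock_held)
{
	return !lock_held && (control & CONTROL_NPRIV) == 0U;
}
```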
The ARM Ltd. FVP emulator (at least the variants run in Zephyr CI)
appears to have a bug with the stack alignment bit in xPSR.  It's
common (it fails in the first 4-6 timer interrupts in
tests.syscalls.timeslicing) that we'll take an interrupt from a
seemingly aligned (!) stack with the bit set.  If we then switch and
resume the thread from a different context later, popping the stack
goes wrong (more so than just a misalignment of four bytes: I usually
see it too low by 20 bytes) in a way that it doesn't if we return
synchronously.  Presumably legacy PendSV didn't see this because it
used the unmodified exception frame.

Work around this by simply assuming all interrupted stacks were
aligned and clearing the bit.  That is NOT correct in the general
case, but in practice it's enough to get tests to pass.

Signed-off-by: Andy Ross <andyross@google.com>
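The workaround above amounts to masking one bit: in the stacked xPSR, bit 9 records whether the hardware inserted 4 bytes of padding to 8-byte-align the exception frame. A minimal sketch (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Bit 9 of the stacked xPSR: hardware-inserted alignment padding flag */
#define XPSR_STKALIGN_PAD (1U << 9)

/* Assume the interrupted stack was aligned and clear the padding bit.
 * As the commit message says, this is NOT correct in the general case;
 * it is a workaround for the FVP behavior observed in CI. */
static uint32_t clear_align_pad(uint32_t stacked_xpsr)
{
	return stacked_xpsr & ~XPSR_STKALIGN_PAD;
}
```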
…tion

A nested exception can occur in arm_m_exc_exit after interrupts are
re-enabled but before branching to the EXC_RETURN value. In that case,
the nested exception stacks an exception stack frame (ESF) on MSP.

arm_m_exc_tail() unconditionally rewrites the LR slot on the active
stack to redirect execution to arm_m_exc_exit. If a nested exception
has stacked an ESF on MSP, this rewrite corrupts the stacked xPSR
field, leading to a UsageFault ("Illegal use of EPSR") on exception
return.

Guard the LR rewrite so that it is only performed when the exception
is returning to Thread mode using PSP. This ensures that the rewrite
does not interfere with ESFs stacked on MSP during nested exceptions.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
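The guard described above can be expressed as a check on the EXC_RETURN value: per the ARMv7-M rules, bit 3 set means return to Thread mode and bit 2 set means the frame is on PSP. A hedged sketch (the predicate name is illustrative, not the actual patch code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN decoding per ARMv7-M */
#define EXC_RETURN_THREAD 0x8U /* bit 3: return to Thread mode */
#define EXC_RETURN_PSP    0x4U /* bit 2: return stack is PSP */

/* Only rewrite the stacked LR when returning to Thread mode on PSP,
 * so a nested exception's frame stacked on MSP is never touched. */
static bool may_rewrite_lr(uint32_t exc_return)
{
	uint32_t mask = EXC_RETURN_THREAD | EXC_RETURN_PSP;

	return (exc_return & mask) == mask;
}
```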
USE_SWITCH code unconditionally applied interrupt locking, which altered
BASEPRI handling and broke expected interrupt behavior on both
Baseline and Mainline CPUs when USE_SWITCH was disabled.

This commit restores the original behavior with USE_SWITCH disabled and
fixes tests/arch/arm/arm_interrupt failures.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
The z_arm_pendsv vector doesn't exist on USE_SWITCH=y builds

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Change the MPU alignment to fix the CI failure below for
nrf52840dk/nrf52840:

```
padding_section' will not fit in region `RAM'
arm-zephyr-eabi/bin/ld.bfd: region `RAM' overflowed by 54928 bytes
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
```

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Documented Cortex-M switch helper interfaces in arm-m-switch.h with
Doxygen blocks.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
The orr fix is as reported in review:
```
The add causes a crash with IAR tools as the address loaded to r8
already has the lowest bit set, and the add causes it to be set to ARM
mode. The orr instruction works fine with both scenarios
```

`UDF 0` seems to break on IAR but `UDF #0` works for all.

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
Fix the issues below when trying to build hello world with armclang:
```
Error: L6218E: Undefined symbol z_arm_exc_exit (referred from reset.o).
Error: L6218E: Undefined symbol z_arm_int_exit (referred from reset.o).
Error: L6218E: Undefined symbol z_arm_pendsv (referred from reset.o).
```

Signed-off-by: Sudan Landge <sudan.landge@arm.com>
@wearyzen
Contributor

CI is green and, except for using CMSIS APIs, I think I have covered all the feedback posted on this PR. The CMSIS APIs will be added in a follow-up PR as discussed.

Now would be the time to start with review and testing on more devices :)

cc: @ajf58 @LoveKarlsson @RobinKastberg @cfriedt @peter-mitsis @nashif @MaureenHelm @ljd42 @andyross

@wearyzen wearyzen dismissed their stale review February 25, 2026 17:50

dismissing my previous review
