Bluetooth: Controller: In-system profiling and CPU overhead assertions#95391
Conversation
dcdb73d to
fada168
Compare
|
There was a problem hiding this comment.
Pull Request Overview
This PR adds in-system profiling for CPU usage measurement and ticker testing functionality to the Bluetooth controller. It includes CPU overhead assertions to verify performance within defined limits and implements ticker start/stop tests for validation.
Key changes:
- Added CPU profiling assertions with platform-specific maximum thresholds
- Implemented ticker node testing with semaphore-based synchronization
- Enhanced profiling infrastructure with getter functions and conditional compilation guards
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| subsys/bluetooth/controller/ll_sw/ull.c | Added ticker testing infrastructure and profiling output during initialization |
| subsys/bluetooth/controller/ll_sw/nordic/lll/lll_prof_internal.h | Added getter functions for profiling metrics and removed conditional compilation |
| subsys/bluetooth/controller/ll_sw/nordic/lll/lll_prof.c | Added CPU usage assertions, getter functions, and expanded data types for profiling |
| subsys/bluetooth/controller/ll_sw/nordic/lll/lll_adv.c | Added profiling calls and new ISR handler for advertising |
| subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c | Added conditional profiling calls to ISR handlers |
| subsys/bluetooth/controller/hci/hci.c | Changed logging method for profiling output |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| #define LLL_PROF_ASSERT(val, max) \ | ||
| { \ | ||
| LL_ASSERT_MSG(((val) <= (max)), \ | ||
| "%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \ | ||
| HAL_TICKER_TICKS_TO_US(cputime_ticks_radio), \ | ||
| LLL_PROF_RADIO_MAX_US, \ | ||
| HAL_TICKER_TICKS_TO_US(cputime_ticks_lll), \ | ||
| LLL_PROF_LLL_MAX_US, \ | ||
| HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_high), \ | ||
| LLL_PROF_ULL_HIGH_MAX_US, \ | ||
| HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_low), \ |
There was a problem hiding this comment.
The LLL_PROF_ASSERT macro accesses global variables (cputime_ticks_*) that may not be initialized when the macro is called. This creates an implicit dependency that makes the code harder to maintain and debug. Consider passing these values as parameters to make dependencies explicit.
| #define LLL_PROF_ASSERT(val, max) \ | |
| { \ | |
| LL_ASSERT_MSG(((val) <= (max)), \ | |
| "%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \ | |
| HAL_TICKER_TICKS_TO_US(cputime_ticks_radio), \ | |
| LLL_PROF_RADIO_MAX_US, \ | |
| HAL_TICKER_TICKS_TO_US(cputime_ticks_lll), \ | |
| LLL_PROF_LLL_MAX_US, \ | |
| HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_high), \ | |
| LLL_PROF_ULL_HIGH_MAX_US, \ | |
| HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_low), \ | |
| #define LLL_PROF_ASSERT(val, max, radio_ticks, lll_ticks, ull_high_ticks, ull_low_ticks) \ | |
| { \ | |
| LL_ASSERT_MSG(((val) <= (max)), \ | |
| "%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \ | |
| HAL_TICKER_TICKS_TO_US(radio_ticks), \ | |
| LLL_PROF_RADIO_MAX_US, \ | |
| HAL_TICKER_TICKS_TO_US(lll_ticks), \ | |
| LLL_PROF_LLL_MAX_US, \ | |
| HAL_TICKER_TICKS_TO_US(ull_high_ticks), \ | |
| LLL_PROF_ULL_HIGH_MAX_US, \ | |
| HAL_TICKER_TICKS_TO_US(ull_low_ticks), \ |
There was a problem hiding this comment.
They are static globals, hence it is ok as-is.
| { | ||
| delta(timestamp_radio, &cputime_radio); | ||
| delta_ticks(timestamp_ticks_radio, &cputime_ticks_radio); | ||
| LLL_PROF_ASSERT(cputime_ticks_radio, HAL_TICKER_US_TO_TICKS(LLL_PROF_RADIO_MAX_US)); |
There was a problem hiding this comment.
The LLL_PROF_ASSERT macro is called in every exit function, but the macro performs multiple HAL_TICKER_TICKS_TO_US conversions for the assertion message formatting. These conversions are expensive and only needed if the assertion fails. Consider moving the conversions inside a conditional block that only executes when the assertion would fail.
There was a problem hiding this comment.
HAL_TICKER_US_TO_TICKS(LLL_PROF_RADIO_MAX_US) is constant, expect the compiler to not do conversions at runtime.
| #define LLL_PROF_RADIO_MAX_US 85 /* Max. Radio Rx/Tx ISR, O(1)*/ | ||
| #define LLL_PROF_LLL_MAX_US 75 /* Max. LLL prepare, O(1) */ | ||
| #define LLL_PROF_ULL_HIGH_MAX_US 260 /* Max. Radio + LLL + ULL High, O(1) */ | ||
| #define LLL_PROF_ULL_LOW_MAX_US 306 /* Max. ULL Low, O(n) n is ticker nodes */ |
There was a problem hiding this comment.
If this is O(n), shouldn't the value here be something like TICKER_NODE_CNT * X?
There was a problem hiding this comment.
The measured value are restricted to the use case mentioned in the comments above. Other scenarios, or increased nodes are not profiled.
There was a problem hiding this comment.
But if you have 306 for the above use case (number of nodes = 6?), then you can define this as 51 * NODES which gives the same result, while also fitting better to the O(n) description.
Why not do it that way?
There was a problem hiding this comment.
It does not work like this. The ISRs are of different priority and pre-empt, .i.e. in the default values, Radio = LLL ISR has higher priority, that means ULL High == ULL_LOW measured account for CPU execution of Radio and/or LLL executing over their execution time :-( . Actually, ULL_LOW CPU time is more like x + 85 + 75 + (2 x ISR Latency) = 306.
Also, TICKER_NODES are 16 for this case based on other instances needed to schedule auxiliary transmission, receptions and timeouts. And, the CPU execution time is very depended on difference in optimization (and h/w runtime caching) with any code changes.
I will take these numbers with a pinch of salt!
There was a problem hiding this comment.
x + 85 + 75 + (2 x ISR Latency) = 306.
That is far from clear from the text and descriptions here :) It may be useful to include the actual calculations behind these numbers, and how they were measured so that others can reproduce the same numbers in case something changes
There was a problem hiding this comment.
As mentioned here:
https://github.com/zephyrproject-rtos/zephyr/pull/95391/files#diff-08119f87c615666fb406cca4f91566518ab93c150083e6d3094c00400b590894R29-R34
A sample with mentioned extended advertising, peripheral role and scanning role was executed on target for couple hours; and the measurements as printed in this PR implementation was updated here for nRF54 and nRF53 target tested.
There was a problem hiding this comment.
Stating the scenario where the numbers come from, is not the same as providing the actual calculations/formula/method of measurement that the numbers come from.
When you have something like #define LLL_PROF_ULL_LOW_MAX_US 306 /* Max. ULL Low, O(n) n is ticker nodes */ you should state exactly how you arrived at that number, so that it can be peer reviewed and verified by others - Otherwise it's just a magic number to others :)
fada168 to
b0f12ad
Compare
e648609 to
30f8489
Compare
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
Prompt:
Answer: PRs returned from lookup What this PR does (scope)
Changed files (6)
Highlights from the diffs
Pros
Cons and potential drawbacks/regressions
Risk assessment
Suggestions to mitigate
Performance notes from PR description
Net summary
|
ff980ea to
999bdd9
Compare
f482cd7 to
ff980ea
Compare
2dc90e6 to
14456e5
Compare
14456e5 to
ad27c51
Compare
Thalley
left a comment
There was a problem hiding this comment.
I still don't like the way that the values in lll_prof.c are defined, but not enough to block this PR
Thanks. I am trying to upstream the sample I use for profiling, but it's source is shared with the host adv_chain bsim CI and hence an issue. The profiling numbers have to be run on target, and hence can be stale very soon. But the implementation provides in-system information any day, provided the numbers are tuned (implementation asserts with the increased number being provided). That said, the current numbers in the implementation does not cover nrf5340, which I have not profiled. The updates have to be in a new PR... hopefully by then be able to add/upstream the sample/test application that gives these numbers. |
Add in-system ticker_start and ticker_stop tests. Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Updated in-system profiling for CPU usage with on target measurements and added assertion checks. Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
ad27c51 to
a8a9406
Compare
|



Updated in-system profiling for CPU usage with on target
measurements and added assertion checks.
Add in-system ticker_start and ticker_stop tests.
There is significant improved lower CPU use in Radio ISR (2 ticks less)and LLL ISR (3 ticks less) execution context when experimental LLL pipeline using ordered linked list is used (
nrf52833dk/nrf52833), using:Comparing without that improvement:
2025-10-13 (with ordered linked list use for LLL pipeline):
Initial Latencies
[2025-10-13 06:08:06.536] test_ticker_cb: 0/16 at 1604878 + 0 + 37 us (ULL_LOW 26 us)Scheduling 16 ticker nodes in the future, max. ULL_LOW CPU use: 26 us
Max. latency resolving 16 overlaps at first expiry: 37 us
Of which, ISR Latency with CPU wake up: 10 us
Rescheduling overlap
[2025-10-13 06:08:06.541] test_ticker_cb: 1/16 at 1705028 + 100150 + 24 us (ULL_LOW 141 us)Reschedule one amongst 16 ticker nodes, max. ULL_LOW CPU use: 141 us
Max. latency resolving (16 - n) on subsequent overlaps on expiry: 24 us
Radio ISR Latency
Radio ISR Latency = LLL CPU usage? : 56 us