Skip to content

Bluetooth: Controller: In-system profiling and CPU overhead assertions#95391

Merged
fabiobaltieri merged 2 commits intozephyrproject-rtos:mainfrom
cvinayak:github_in_system_profiling
Nov 28, 2025
Merged

Bluetooth: Controller: In-system profiling and CPU overhead assertions#95391
fabiobaltieri merged 2 commits intozephyrproject-rtos:mainfrom
cvinayak:github_in_system_profiling

Conversation

@cvinayak
Copy link
Contributor

@cvinayak cvinayak commented Sep 3, 2025

Updated in-system profiling for CPU usage with on target
measurements and added assertion checks.

Add in-system ticker_start and ticker_stop tests.

There is significant improved lower CPU use in Radio ISR (2 ticks less)and LLL ISR (3 ticks less) execution context when experimental LLL pipeline using ordered linked list is used (nrf52833dk/nrf52833), using:

[2025-10-11 07:41:49.567] l: 10, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.573] l: 4, 3, 57; t: 24, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.579] 29787. [DEVICE]: 1C:2B:44:E1:A3:61 (random), AD evt type 5, Tx Pwr: 127, RSSI -34 Data status: 0, AD data len: 382 Name: Broadcaster Multipl$ C:0 S:0 D:0 SR:0 E:1 Pri PHY: LE 1M, Sec PHY: LE 2M, Interval: 0x0000 (0 ms), SID: 1
[2025-10-11 07:41:49.602] l: 8, 3, 57; t: 10, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.609] --- 1 messages dropped ---
[2025-10-11 07:41:49.612] l: 14, 3, 57; t: 10, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.618] l: 10, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.625] l: 10, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.631] l: 10, 3, 57; t: 13, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.637] l: 10, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.643] l: 10, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.650] l: 14, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.657] l: 14, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.663] l: 10, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.669] l: 10, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.677] l: 8, 3, 57; t: 10, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.684] l: 10, 3, 57; t: 10, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.690] l: 10, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.698] l: 10, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.705] l: 10, 3, 57; t: 13, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.711] l: 10, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.717] l: 10, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.723] l: 14, 3, 57; t: 14, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.730] l: 14, 3, 57; t: 16, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.736] l: 14, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.742] l: 10, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.749] l: 10, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.756] l: 4, 3, 57; t: 24, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.764] 29788. [DEVICE]: 10:7C:45:1F:59:C4 (random), AD evt type 5, Tx Pwr: 127, RSSI -79 Data status: 0, AD data len: 382 Name: Broadcaster Multipl$ C:0 S:0 D:0 SR:0 E:1 Pri PHY: LE Coded, Sec PHY: LE Coded, Interval: 0x0000 (0 ms), SID: 3
[2025-10-11 07:41:49.787] l: 10, 3, 57; t: 10, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.793] l: 10, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.801] l: 10, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.808] l: 10, 3, 57; t: 13, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.814] l: 14, 3, 57; t: 16, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.820] l: 14, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.826] l: 14, 3, 57; t: 14, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.832] l: 14, 3, 57; t: 16, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.838] l: 14, 3, 57; t: 7, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.845] l: 10, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).
[2025-10-11 07:41:49.851] l: 15, 3, 57; t: 15, 6, 27; cpu: 36 (3), 101 (4), 149 (8), 318 (12).

Comparing without that improvement:

[2025-10-11 07:25:09.850] --- 1 messages dropped ---
[2025-10-11 07:25:09.853] 903128. [DEVICE]: 3F:9D:0C:F1:F0:FB (random), AD evt type 5, Tx Pwr: 127, RSSI -32 Data status: 0, AD data len: 191 Name:  C:0 S:0 D:0 SR:0 E:1 Pri PHY: LE 1M, Sec PHY: LE 1M, Interval: 0x0000 (0 ms), SID: 2
[2025-10-11 07:25:09.873] --- 1 messages dropped ---
[2025-10-11 07:25:09.877] l: 14, 3, 103; t: 14, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.883] l: 14, 3, 103; t: 7, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.890] l: 10, 3, 103; t: 14, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.897] l: 10, 3, 103; t: 7, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.904] l: 10, 3, 103; t: 13, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.912] l: 57880, 21790, 10636; t: 6657, 65293, 76; cpu: 2070 (138), 53760 (40), 33769 (12), 47734 (14).
[2025-10-11 07:25:09.920] l: 58120, 25896, 103; t: 24, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.929] l: 9, 3, 103; t: 15, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.937] l: 10, 3, 103; t: 7, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.943] l: 4, 3, 103; t: 24, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.949] 903129. [DEVICE]: 20:FD:03:3A:F8:30 (random), AD evt type 5, Tx Pwr: 127, RSSI -33 Data status: 0, AD data len: 192 Name: Broadcaster Multiple C:0 S:0 D:0 SR:0 E:1 Pri PHY: LE Coded, Sec PHY: LE Coded, Interval: 0x0000 (0 ms), SID: 3
[2025-10-11 07:25:09.974] --- 8 messages dropped ---
[2025-10-11 07:25:09.977] l: 3, 3, 103; t: 22, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.984] l: 10, 3, 103; t: 13, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.990] l: 41992, 25916, 103; t: 7, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:09.997] l: 42008, 22076, 19353; t: 1537, 1809, 50459; cpu: 42453 (164), 2 (198), 47017 (0), 4578 (254).
[2025-10-11 07:25:10.006] 903131. [DEVICE]: 75:46:E3:01:22:0D (random), AD evt type 0, Tx Pwr: 127, RSSI -86 Data status: 0, AD data len: 18 Name:  C:1 S:1 D:0 SR:0 E:0 Pri PHY: LE 1M, Sec PHY: No packets, Interval: 0x0000 (0 ms), SID: 255
[2025-10-11 07:25:10.027] l: 3, 3, 103; t: 22, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.034] l: 10, 3, 103; t: 14, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.042] l: 10, 3, 103; t: 7, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.049] l: 4, 3, 103; t: 24, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.055] l: 14, 3, 103; t: 14, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.061] l: 14, 3, 103; t: 7, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.069] l: 10, 3, 103; t: 14, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.076] l: 10, 3, 103; t: 7, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.082] l: 8, 3, 103; t: 10, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.088] l: 10, 3, 103; t: 10, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.095] l: 14, 3, 103; t: 10, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.101] l: 10, 3, 103; t: 13, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.107] l: 41992, 25916, 103; t: 24, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.116] l: 12313, 15096, 64771; t: 42016, 18236, 16853; cpu: 65449 (0), 65535 (0), 0 (0), 0 (0).
[2025-10-11 07:25:10.125] l: 9, 3, 103; t: 8, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.133] l: 4, 3, 103; t: 24, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.140] 903132. [DEVICE]: 3F:9D:0C:F1:F0:FB (random), AD evt type 5, Tx Pwr: 127, RSSI -32 Data status: 0, AD data len: 192 Name: Broadcaster Multiple C:0 S:0 D:0 SR:0 E:1 Pri PHY: LE 1M, Sec PHY: LE 1M, Interval: 0x0000 (0 ms), SID: 2
[2025-10-11 07:25:10.163] --- 2 messages dropped ---
[2025-10-11 07:25:10.166] l: 18, 3, 103; t: 15, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.173] --- 1 messages dropped ---
[2025-10-11 07:25:10.178] l: 8, 3, 103; t: 9, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).
[2025-10-11 07:25:10.185] --- 4 messages dropped ---
[2025-10-11 07:25:10.188] l: 10, 3, 103; t: 10, 6, 26; cpu: 39 (5), 208 (7), 242 (12), 322 (14).

2025-10-13 (with ordered linked list use for LLL pipeline):

[2025-10-13 06:08:04.038] Starting Observer Broadcaster Demo
[2025-10-13 06:08:04.042] Thread analyze:
[2025-10-13 06:08:04.043]  BT RX WQ            : STACK: unused 1016 usage 184 / 1200 (15 %); CPU: 0 %
[2025-10-13 06:08:04.050]                      : Total CPU cycles used: 8
[2025-10-13 06:08:04.054]  thread_analyzer     : STACK: unused 720 usage 304 / 1024 (29 %); CPU: 2 %
[2025-10-13 06:08:04.061]                      : Total CPU cycles used: 191
[2025-10-13 06:08:04.065]  sysworkq            : STACK: unused 840 usage 184 / 1024 (17 %); CPU: 0 %
[2025-10-13 06:08:04.072]                      : Total CPU cycles used: 10
[2025-10-13 06:08:04.076]  logging             : STACK: unused 368 usage 400 / 768 (52 %); CPU: 85 %
[2025-10-13 06:08:04.083]                      : Total CPU cycles used: 8201
[2025-10-13 06:08:04.087]  idle                : STACK: unused 272 usage 48 / 320 (15 %); CPU: 0 %
[2025-10-13 06:08:04.094]                      : Total CPU cycles used: 0
[2025-10-13 06:08:04.098]  main                : STACK: unused 336 usage 688 / 1024 (67 %); CPU: 10 %
[2025-10-13 06:08:04.105]                      : Total CPU cycles used: 1046
[2025-10-13 06:08:04.109]  ISR0                : STACK: unused 1752 usage 296 / 2048 (14 %)
[2025-10-13 06:08:06.536] test_ticker_cb: 0/16 at 1604878 + 0 + 37 us (ULL_LOW 26 us)
[2025-10-13 06:08:06.541] test_ticker_cb: 1/16 at 1705028 + 100150 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:06.547] test_ticker_cb: 2/16 at 1805178 + 200300 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:06.553] test_ticker_cb: 3/16 at 1905328 + 300450 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:06.559] test_ticker_cb: 4/16 at 2005478 + 400600 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:06.565] test_ticker_cb: 5/16 at 2105628 + 500750 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:06.570] test_ticker_cb: 6/16 at 2205778 + 600900 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:06.576] test_ticker_cb: 7/16 at 2305928 + 701050 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:06.582] test_ticker_cb: 8/16 at 2406078 + 801200 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:06.588] test_ticker_cb: 9/16 at 2506228 + 901350 + 24 us (ULL_LOW 141 us)
[2025-10-13 06:08:07.143] test_ticker_cb: 10/16 at 2606378 + 1001500 + 26 us (ULL_LOW 141 us)
[2025-10-13 06:08:07.150] --- 6 messages dropped ---
[2025-10-13 06:08:07.153] [00:00:03.108,914] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[2025-10-13 06:08:07.161] [00:00:03.108,925] <inf> bt_hci_core: HW Variant: nRF54Lx (0x0005)
[2025-10-13 06:08:07.168] [00:00:03.108,935] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 4.2 Build 99
[2025-10-13 06:08:07.178] [00:00:03.109,314] <inf> bt_hci_core: HCI transport: Controller
[2025-10-13 06:08:07.184] [00:00:03.109,359] <inf> bt_hci_core: Identity: FB:32:82:2B:02:5B (random)
[2025-10-13 06:08:07.191] [00:00:03.109,371] <inf> bt_hci_core: HCI: version 5.4 (0x0d) revision 0x0000, manufacturer 0x05f1
[2025-10-13 06:08:07.201] [00:00:03.109,382] <inf> bt_hci_core: LMP: version 5.4 (0x0d) subver 0xffff
[2025-10-13 06:08:07.208] Started Extended Advertising Set 0.
[2025-10-13 06:08:07.211] Started Extended Advertising Set 1.
[2025-10-13 06:08:07.214] Started Extended Advertising Set 2.
[2025-10-13 06:08:07.218] Started Extended Advertising Set 3.
[2025-10-13 06:08:07.221] Registered scan callbacks
[2025-10-13 06:08:07.223] Starting scanning...
[2025-10-13 06:08:07.225] success.
[2025-10-13 06:08:07.226] Stopping scanning...
[2025-10-13 06:08:07.228] Stopped scanning...
[2025-10-13 06:08:07.230] Starting scanning...
[2025-10-13 06:08:07.232] success.
[2025-10-13 06:08:07.233] l: 10, 10, 10; t: 9, 9, 9; cpu: 0 (0), 0 (36), 0 (76), 0 (141).
[2025-10-13 06:08:07.238] l: 10, 10, 10; t: 8, 8, 9; cpu: 22 (23), 0 (36), 17 (76), 28 (141).
[2025-10-13 06:08:07.245] l: 10, 10, 10; t: 4, 4, 9; cpu: 22 (23), 0 (36), 17 (76), 28 (141).
[2025-10-13 06:08:07.251] l: 7, 7, 10; t: 7, 4, 9; cpu: 22 (51), 0 (36), 17 (76), 28 (141).
[2025-10-13 06:08:07.258] l: 10, 7, 10; t: 8, 4, 9; cpu: 22 (51), 10 (36), 31 (76), 28 (141).
[2025-10-13 06:08:07.264] l: 9, 7, 10; t: 4, 4, 9; cpu: 22 (51), 10 (36), 31 (76), 28 (141).
[2025-10-13 06:08:07.271] l: 9, 7, 10; t: 8, 4, 9; cpu: 22 (52), 17 (36), 51 (80), 88 (141).

Initial Latencies
[2025-10-13 06:08:06.536] test_ticker_cb: 0/16 at 1604878 + 0 + 37 us (ULL_LOW 26 us)
Scheduling 16 ticker nodes in the future, max. ULL_LOW CPU use: 26 us
Max. latency resolving 16 overlaps at first expiry: 37 us
Of which, ISR Latency with CPU wake up: 10 us

Rescheduling overlap
[2025-10-13 06:08:06.541] test_ticker_cb: 1/16 at 1705028 + 100150 + 24 us (ULL_LOW 141 us)
Reschedule one amongst 16 ticker nodes, max. ULL_LOW CPU use: 141 us
Max. latency resolving (16 - n) on subsequent overlaps on expiry: 24 us

Radio ISR Latency

[2025-10-13 06:40:57.801] l: 10, 1, 56; t: 7, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.808] l: 13, 1, 56; t: 7, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.814] l: 12, 1, 56; t: 4, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.852] l: 36, 1, 56; t: 7, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.859] l: 34, 1, 56; t: 7, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.865] l: 37, 1, 56; t: 7, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.871] l: 28, 1, 56; t: 7, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.878] l: 24, 1, 56; t: 8, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.884] l: 24, 1, 56; t: 5, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).
[2025-10-13 06:40:57.891] l: 41, 1, 56; t: 8, 3, 27; cpu: 64 (67), 55 (56), 84 (144), 168 (178).

Radio ISR Latency = LLL CPU usage? : 56 us

@cvinayak cvinayak force-pushed the github_in_system_profiling branch 2 times, most recently from dcdb73d to fada168 Compare September 3, 2025 11:18
@cvinayak cvinayak requested a review from Copilot September 3, 2025 14:01
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds in-system profiling for CPU usage measurement and ticker testing functionality to the Bluetooth controller. It includes CPU overhead assertions to verify performance within defined limits and implements ticker start/stop tests for validation.

Key changes:

  • Added CPU profiling assertions with platform-specific maximum thresholds
  • Implemented ticker node testing with semaphore-based synchronization
  • Enhanced profiling infrastructure with getter functions and conditional compilation guards

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
subsys/bluetooth/controller/ll_sw/ull.c Added ticker testing infrastructure and profiling output during initialization
subsys/bluetooth/controller/ll_sw/nordic/lll/lll_prof_internal.h Added getter functions for profiling metrics and removed conditional compilation
subsys/bluetooth/controller/ll_sw/nordic/lll/lll_prof.c Added CPU usage assertions, getter functions, and expanded data types for profiling
subsys/bluetooth/controller/ll_sw/nordic/lll/lll_adv.c Added profiling calls and new ISR handler for advertising
subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c Added conditional profiling calls to ISR handlers
subsys/bluetooth/controller/hci/hci.c Changed logging method for profiling output

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 47 to 57
#define LLL_PROF_ASSERT(val, max) \
{ \
LL_ASSERT_MSG(((val) <= (max)), \
"%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \
HAL_TICKER_TICKS_TO_US(cputime_ticks_radio), \
LLL_PROF_RADIO_MAX_US, \
HAL_TICKER_TICKS_TO_US(cputime_ticks_lll), \
LLL_PROF_LLL_MAX_US, \
HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_high), \
LLL_PROF_ULL_HIGH_MAX_US, \
HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_low), \
Copy link

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLL_PROF_ASSERT macro accesses global variables (cputime_ticks_*) that may not be initialized when the macro is called. This creates an implicit dependency that makes the code harder to maintain and debug. Consider passing these values as parameters to make dependencies explicit.

Suggested change
#define LLL_PROF_ASSERT(val, max) \
{ \
LL_ASSERT_MSG(((val) <= (max)), \
"%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \
HAL_TICKER_TICKS_TO_US(cputime_ticks_radio), \
LLL_PROF_RADIO_MAX_US, \
HAL_TICKER_TICKS_TO_US(cputime_ticks_lll), \
LLL_PROF_LLL_MAX_US, \
HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_high), \
LLL_PROF_ULL_HIGH_MAX_US, \
HAL_TICKER_TICKS_TO_US(cputime_ticks_ull_low), \
#define LLL_PROF_ASSERT(val, max, radio_ticks, lll_ticks, ull_high_ticks, ull_low_ticks) \
{ \
LL_ASSERT_MSG(((val) <= (max)), \
"%s: %u (%u), %u (%u), %u (%u), %u (%u)\n", __func__, \
HAL_TICKER_TICKS_TO_US(radio_ticks), \
LLL_PROF_RADIO_MAX_US, \
HAL_TICKER_TICKS_TO_US(lll_ticks), \
LLL_PROF_LLL_MAX_US, \
HAL_TICKER_TICKS_TO_US(ull_high_ticks), \
LLL_PROF_ULL_HIGH_MAX_US, \
HAL_TICKER_TICKS_TO_US(ull_low_ticks), \
Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are static globals, hence it is ok as-is.

{
delta(timestamp_radio, &cputime_radio);
delta_ticks(timestamp_ticks_radio, &cputime_ticks_radio);
LLL_PROF_ASSERT(cputime_ticks_radio, HAL_TICKER_US_TO_TICKS(LLL_PROF_RADIO_MAX_US));
Copy link

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LLL_PROF_ASSERT macro is called in every exit function, but the macro performs multiple HAL_TICKER_TICKS_TO_US conversions for the assertion message formatting. These conversions are expensive and only needed if the assertion fails. Consider moving the conversions inside a conditional block that only executes when the assertion would fail.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HAL_TICKER_US_TO_TICKS(LLL_PROF_RADIO_MAX_US) is constant, expect the compiler to not do conversions at runtime.

@cvinayak cvinayak added this to the v4.3.0 milestone Oct 3, 2025
@cvinayak cvinayak marked this pull request as ready for review October 3, 2025 04:07
@cvinayak cvinayak added Feature A planned feature with a milestone Enhancement Changes/Updates/Additions to existing features and removed Feature A planned feature with a milestone labels Oct 3, 2025
@cvinayak cvinayak requested a review from Thalley October 8, 2025 14:12
#define LLL_PROF_RADIO_MAX_US 85 /* Max. Radio Rx/Tx ISR, O(1)*/
#define LLL_PROF_LLL_MAX_US 75 /* Max. LLL prepare, O(1) */
#define LLL_PROF_ULL_HIGH_MAX_US 260 /* Max. Radio + LLL + ULL High, O(1) */
#define LLL_PROF_ULL_LOW_MAX_US 306 /* Max. ULL Low, O(n) n is ticker nodes */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is O(n), shouldn't the value here be something like TICKER_NODE_CNT * X?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The measured value are restricted to the use case mentioned in the comments above. Other scenarios, or increased nodes are not profiled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you have 306 for the above use case (number of nodes = 6?), then you can define this as 51 * NODES which gives the same result, while also fitting better to the O(n) description.

Why not do it that way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not work like this. The ISRs are of different priority and pre-empt, .i.e. in the default values, Radio = LLL ISR has higher priority, that means ULL High == ULL_LOW measured account for CPU execution of Radio and/or LLL executing over their execution time :-( . Actually, ULL_LOW CPU time is more like x + 85 + 75 + (2 x ISR Latency) = 306.

Also, TICKER_NODES are 16 for this case based on other instances needed to schedule auxiliary transmission, receptions and timeouts. And, the CPU execution time is very depended on difference in optimization (and h/w runtime caching) with any code changes.

I will take these numbers with a pinch of salt!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x + 85 + 75 + (2 x ISR Latency) = 306.

That is far from clear from the text and descriptions here :) It may be useful to include the actual calculations behind these numbers, and how they were measured so that others can reproduce the same numbers in case something changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned here:
https://github.com/zephyrproject-rtos/zephyr/pull/95391/files#diff-08119f87c615666fb406cca4f91566518ab93c150083e6d3094c00400b590894R29-R34

A sample with mentioned extended advertising, peripheral role and scanning role was executed on target for couple hours; and the measurements as printed in this PR implementation was updated here for nRF54 and nRF53 target tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stating the scenario where the numbers come from, is not the same as providing the actual calculations/formula/method of measurement that the numbers come from.

When you have something like #define LLL_PROF_ULL_LOW_MAX_US 306 /* Max. ULL Low, O(n) n is ticker nodes */ you should state exactly how you arrived at that number, so that it can be peer reviewed and verified by others - Otherwise it's just a magic number to others :)

@cvinayak cvinayak force-pushed the github_in_system_profiling branch from fada168 to b0f12ad Compare October 10, 2025 15:57
@cvinayak cvinayak removed this from the v4.3.0 milestone Oct 17, 2025
@cvinayak cvinayak force-pushed the github_in_system_profiling branch 4 times, most recently from e648609 to 30f8489 Compare October 23, 2025 20:10
@cvinayak cvinayak requested review from aescolar and Copilot October 23, 2025 20:21
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@cvinayak cvinayak added this to the v4.3.0 milestone Oct 23, 2025
@cvinayak
Copy link
Contributor Author

Prompt:

Analyse this pull request, and give me details of any drawbacks and regressions introduced, if any. Do let me know pros and cons of this pull request.
@zephyrproject-rtos/zephyr/pull/95391

Answer:
Here’s an analysis of the pull request “Bluetooth: Controller: In-system profiling and CPU overhead assertions” with pros, cons, and potential regressions.

PRs returned from lookup

data:
- url: "https://github.com/zephyrproject-rtos/zephyr/pull/95391"
  state: "open"
  draft: false
  title: "Bluetooth: Controller: In-system profiling and CPU overhead assertions"
  number: 95391
  created_at: "50 days"
  closed_at: ""
  merged_at: ""
  author: "cvinayak"
  labels:
  - "Enhancement"
  - "area: Bluetooth"
  - "area: Bluetooth Controller"
  comments: 2
  review_comments: 20
  mergeable: true
  mergeable_state: "blocked"
  rebaseable: true
  additions: 243
  deletions: 38
  changed_files: 6
  commits: 2
  base_ref: "main"
  head_ref: "github_in_system_profiling"
  requested_reviewers:
  - "Tronil"
  - "carlescufi"
  - "aescolar"
  - "thoh-ot"
  - "erbr-ot"
  - "wopu-ot"
  - "ppryga"
  assignees:
  - "cvinayak"

What this PR does (scope)

  • Adds on-target in-system profiling for ISR CPU usage and latency, with threshold assertions to catch excessive overhead when profiling is enabled.
  • Introduces a stress test that starts/stops many ticker nodes and measures max expiry latency.
  • Switches some logging output from LOG_INF to printk for profiling events to reduce logging-thread overhead interference.
  • Wraps profiling enter/exit instrumentation around several ISR paths with build-time guards.

Changed files (6)

  • subsys/bluetooth/controller/hci/hci.c
  • subsys/bluetooth/controller/ll_sw/nordic/lll/lll.c
  • subsys/bluetooth/controller/ll_sw/nordic/lll/lll_adv.c
  • subsys/bluetooth/controller/ll_sw/nordic/lll/lll_prof.c
  • subsys/bluetooth/controller/ll_sw/nordic/lll/lll_prof_internal.h
  • subsys/bluetooth/controller/ll_sw/ull.c

Highlights from the diffs

  • lll_prof.c:
    • Adds SoC-specific max timing thresholds for RADIO, LLL, ULL_HIGH, ULL_LOW and asserts if exceeded.
    • Increases tick counters from uint8_t to uint16_t to prevent overflow.
    • Adds getters for measured times in microseconds.
  • lll.c, lll_adv.c:
    • Guards all profiling enter/exit/capture/send calls with IS_ENABLED(CONFIG_BT_CTLR_PROFILE_ISR) to avoid overhead when disabled.
    • Adds a new isr_tx_done wrapper in advertising to capture latency and send profiling data.
  • hci.c:
    • Replaces LOG_INF with printk for NODE_RX_TYPE_PROFILE printing.
  • ull.c:
    • Adds a ticker_start/ticker_stop stress test, measuring max latency, and prints a summary (profiling getters + measured latency).

Pros

  • Low overhead instrumentation path when profiling is disabled:
    • Most profiling hooks are compiled out of hot paths via IS_ENABLED guards, avoiding runtime cost in normal builds.
  • Better measurement fidelity:
    • On-target capture of RADIO/LLL/ULL timing with microsecond conversion and wider counters.
    • Additional capture points in advertising ISR path to measure realistic latency.
  • Automated guardrails:
    • Assertions flag regressions in ISR/ULL timing immediately when profiling is enabled, making performance regressions easier to detect early.
  • Useful stress test:
    • Starts/stops many ticker nodes to measure initial and reschedule overlap latencies; helps validate the scheduler under load.
  • Clearer performance signal:
    • Switching profiling output to printk reduces logging-thread scheduling overhead, which otherwise skews ISR timing when using the logging subsystem.

Cons and potential drawbacks/regressions

  • Possible build-time/link-time regressions if any unguarded lll_prof_* calls remain elsewhere:
    • lll_prof_internal.h removed the previous no-op inline stubs when profiling is disabled. Any remaining unguarded calls in other files would now produce unresolved symbols when CONFIG_BT_CTLR_PROFILE_ISR=n. The PR guards the modified call sites, but out-of-tree or unmodified in-tree call sites could break.
  • ull.c ticker test appears to be unconditionally executed in ll_init:
    • sem_test_ticker, test_ticker_cb, and test_ticker_ext are declared under CONFIG_BT_CTLR_TEST, but the ll_init block that initializes the semaphore, starts the ticker nodes, blocks on k_sem_take, and stops the tickers is shown without an explicit CONFIG_BT_CTLR_TEST guard in the diff chunk.
    • If truly unconditional, this will fail to compile when CONFIG_BT_CTLR_TEST is disabled (undefined symbols), or if it compiles, it will run a heavy ticker test during controller init in all builds, impacting boot and timing. This looks like the most significant potential regression. Verify that the ll_init test block is wrapped by CONFIG_BT_CTLR_TEST in the full file.
  • printk dependency in hci.c:
    • The PR uses printk for profiling output without an explicit CONFIG_PRINTK/CONFIG_LOG_PRINTK guard at that call site. In configurations without printk enabled, this could fail to build. Other parts of the PR do guard printk usage with IS_ENABLED(CONFIG_LOG_PRINTK).
  • Assertion thresholds might be too tight or board/config dependent:
    • The max thresholds are hardcoded with a bifurcation for nRF54Lx vs others. Different workloads (e.g., differing numbers of adv/scan sets, PHY combinations, or logging configuration) could exceed these and assert when profiling is enabled. This is desirable in CI that validates those exact workloads, but could be surprising in other configs.
  • Slight behavioral changes in adv ISR when profiling is enabled:
    • The handler is changed to isr_tx_done which captures latency and then calls isr_done and lll_prof_send. This adds extra work inside the ISR when profiling is enabled and could extend interrupt time. Disabled by default, but worth noting for measurement runs.
  • Loss of log filtering/formatting for profiling lines:
    • Replacing LOG_INF with printk means those lines won’t integrate with the logging subsystem’s levels, backends, and timestamps. This is an intentional tradeoff for lower overhead but reduces flexibility.

Risk assessment

  • Low for default builds if:
    • CONFIG_BT_CTLR_PROFILE_ISR=n and CONFIG_BT_CTLR_TEST=n, and all profiling calls are properly guarded everywhere (not just in the modified files).
  • Medium to high if:
    • The ull.c ll_init ticker test block is not actually wrapped in CONFIG_BT_CTLR_TEST (could cause build breaks or runtime impact).
    • Any remaining unguarded lll_prof_* usage exists (link errors when profiling disabled).
    • printk is not available in some targets/configs (build errors in hci.c).

Suggestions to mitigate

  • Ensure the ull.c ll_init ticker test block is guarded by CONFIG_BT_CTLR_TEST (both declarations and usage).
  • Consider adding a Kconfig select to guarantee printk availability when CONFIG_BT_CTLR_PROFILE_ISR is enabled, or wrap the printk calls with the same IS_ENABLED(CONFIG_LOG_PRINTK) pattern used elsewhere.
  • Re-introduce safe no-op inline stubs in the header or audit the tree to guarantee no unguarded lll_prof_* calls remain outside the files updated in this PR (to preserve buildability with profiling disabled).
  • Document the assertion thresholds and when they’re expected to hold; consider making them configurable per-target if needed.

Performance notes from PR description

  • On nRF52833 with the ordered linked list LLL pipeline (from another PR), measured reduction of ~2 ticks in Radio ISR and ~3 ticks in LLL ISR, suggesting meaningful CPU savings under that configuration.
  • The ticker stress logs indicate ULL_LOW O(n) behavior is measurable, with initial overlap resolution latency and reschedule costs captured, providing actionable data for scheduling tuning.

Net summary

  • This is a valuable diagnostic/profiling enhancement that should help catch and quantify controller ISR/ULL performance regressions early.
  • Primary risk centers on guarding of new test code and removal of stubbed profiling functions; addressing those will keep the change safe for non-profiling, non-test builds.
carlescufi
carlescufi previously approved these changes Oct 24, 2025
@cvinayak cvinayak force-pushed the github_in_system_profiling branch 4 times, most recently from ff980ea to 999bdd9 Compare October 27, 2025 05:01
@cvinayak cvinayak removed this from the v4.3.0 milestone Oct 27, 2025
@cvinayak cvinayak force-pushed the github_in_system_profiling branch from f482cd7 to ff980ea Compare October 27, 2025 07:27
@cvinayak cvinayak added this to the v4.4.0 milestone Oct 27, 2025
@cvinayak cvinayak force-pushed the github_in_system_profiling branch 4 times, most recently from 2dc90e6 to 14456e5 Compare October 30, 2025 21:10
@cvinayak cvinayak force-pushed the github_in_system_profiling branch from 14456e5 to ad27c51 Compare November 22, 2025 19:20
Thalley
Thalley previously approved these changes Nov 26, 2025
Copy link
Contributor

@Thalley Thalley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't like the way that the values in lll_prof.c are defined, but not enough to block this PR

@cvinayak
Copy link
Contributor Author

I still don't like the way that the values in lll_prof.c are defined, but not enough to block this PR

Thanks. I am trying to upstream the sample I use for profiling, but it's source is shared with the host adv_chain bsim CI and hence an issue. The profiling numbers have to be run on target, and hence can be stale very soon. But the implementation provides in-system information any day, provided the numbers are tuned (implementation asserts with the increased number being provided).

That said, the current numbers in the implementation does not cover nrf5340, which I have not profiled. The updates have to be in a new PR... hopefully by then be able to add/upstream the sample/test application that gives these numbers.

Add in-system ticker_start and ticker_stop tests.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
Updated in-system profiling for CPU usage with on target
measurements and added assertion checks.

Signed-off-by: Vinayak Kariappa Chettimada <vich@nordicsemi.no>
@fabiobaltieri fabiobaltieri merged commit 6692264 into zephyrproject-rtos:main Nov 28, 2025
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Bluetooth Controller area: Bluetooth Enhancement Changes/Updates/Additions to existing features

6 participants