-
Please provide a simple sample, build and flash instructions, and steps to reproduce on a Zephyr-supported nRF5x development board.
-
OK. I vaguely remember another user mentioning a similar issue, and they probably created a GitHub issue for it. As this discussion doesn't give enough information to reproduce the issue, feel free to send a pull request if you do happen to fix it.
-
The only similar issue I found is #50438. My understanding is that it was related to switching between the central and peripheral roles in a dual-role application. My application only uses the central role. I tried compiling it with both CONFIG_BT_PERIPHERAL=y and CONFIG_BT_PERIPHERAL=n, and the issue is reproducible in both cases.

I'm posting some more information here for posterity in case it's useful; I think this is beyond my capability to fix. I'm using

I realized that the artificial way I'm triggering the broken state does an unusual thing: I call bt_conn_disconnect() right after bt_gatt_subscribe(), without waiting for the subscription to finish (i.e. without waiting for the params->subscribe callback). This often leads to a disconnect while the subscribe procedure is still running, which produces random -104 (ECONNRESET) errors in the associated functions. For example:

These errors don't always lead to the broken state. Sometimes it takes many tens of connects/disconnects, and sometimes it breaks after two or three. At one connection per second, the application typically ends up in the broken state within a minute or so.

However, if I change the code so that it disconnects only after the subscribe callback, I can no longer reproduce the issue this way, so it must be related somehow. I suspect this only avoids this specific, artificial trigger and that there is still an underlying issue somewhere else.

I compared the output given by

Also, I don't trust this log much, since I often seem to be missing messages due to the limited CONFIG_LOG_BUFFER_SIZE that fits in the available RAM. Unfortunately, I don't have a pair of nRF52832 devkits at hand to test the bt_gatt_subscribe()/bt_conn_disconnect() sequence with a modified central_hr sample.
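The workaround described above (disconnecting only after the subscribe callback has fired) can be sketched as a small deferred-disconnect guard. This is a host-side model, not Zephyr code: `struct link_state`, `start_subscribe()`, `request_disconnect()`, `on_subscribe_done()` and `issue_disconnect()` are hypothetical names. On target, `issue_disconnect()` would call `bt_conn_disconnect()` and `on_subscribe_done()` would be invoked from the `params->subscribe` callback. As noted above, this likely only avoids the trigger rather than fixing the underlying problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection bookkeeping. On target these flags are touched
 * from both the application thread and the Bluetooth RX thread, so they would
 * need an atomic_t or a lock; this host-side model ignores concurrency. */
struct link_state {
    bool subscribe_pending;    /* subscribe issued, callback not yet seen */
    bool disconnect_requested; /* disconnect asked for while subscribe was pending */
    int disconnects_issued;    /* counts calls that would go to bt_conn_disconnect() */
};

/* Stand-in for bt_conn_disconnect(conn, BT_HCI_ERR_REMOTE_USER_TERM_CONN). */
void issue_disconnect(struct link_state *s)
{
    s->disconnects_issued++;
}

/* Mark the subscribe as in flight; on target, call bt_gatt_subscribe() here. */
void start_subscribe(struct link_state *s)
{
    assert(s != NULL);
    s->subscribe_pending = true;
}

/* Disconnect now, or defer until the subscribe callback if one is pending. */
void request_disconnect(struct link_state *s)
{
    assert(s != NULL);
    if (s->subscribe_pending) {
        s->disconnect_requested = true;
        return;
    }
    issue_disconnect(s);
}

/* Call from the params->subscribe callback (which receives a uint8_t err). */
void on_subscribe_done(struct link_state *s, uint8_t err)
{
    (void)err; /* a real app would log or handle a failed subscribe here */
    s->subscribe_pending = false;
    if (s->disconnect_requested) {
        s->disconnect_requested = false;
        issue_disconnect(s);
    }
}
```

Usage: call `start_subscribe()` just before `bt_gatt_subscribe()`, `request_disconnect()` wherever the application currently calls `bt_conn_disconnect()`, and `on_subscribe_done()` from the subscribe callback. If the link can drop before the callback runs, the `disconnected()` callback should also clear both flags.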
-
Thank you for your analysis. I will convert this discussion to an issue so that it is tracked better.
-
I'm trying to diagnose an issue with a Zephyr application running on an nRF52832 target with Zephyr 3.7.1. The same issue is also reproducible with 3.7.2rc1.
The application uses the Bluetooth central role to connect to a peripheral. The Bluetooth application code is modeled after the samples/bluetooth/central_hr and central_gatt_write samples.
During the lifetime of the application, the Bluetooth connection is established and torn down many times. The issue is that Zephyr sometimes gets stuck in a state where connecting to the device consistently fails, as if the peripheral were not present at all. Resetting the CPU via NVIC_SystemReset() restores functionality.
A reliable way to trigger this is to connect to and disconnect from the same device in a loop, with no GATT notifications or writes; after a while, Zephyr ends up in the broken state.
I've verified that this is not an issue with the peripheral: it stays responsive when this happens, I can successfully connect to it from a different device, and power-cycling it does not help.
The exact symptom is that bt_conn_le_create() returns success, but the connected() callback is then called with conn_err=2. This is exactly the same sequence of events as when connecting to a peripheral that is not present. So far I have focused on determining the difference between a) the broken state with the peripheral present and b) the working state with the peripheral absent.
In both cases: old_state is BT_CONN_INITIATING in bt_conn_set_state(). The backtraces [1] and the bt_conn object [2] from the connected() callback are identical.
I don't see any faults or stack overruns in any of the running Zephyr threads ("BT RX", "BT RX pri", "BT RX WQ", "BT LW WQ", "sysworkq", "idle" and "main").
It's also interesting that in the broken state the peripheral is still visible when scanning with bt_le_scan_start(); only the connection fails.
In the broken state I don't see any CONNECT_IND packets sent to the peripheral in the Bluetooth sniffer.
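For reference, conn_err=2 is HCI error 0x02, Unknown Connection Identifier. As far as I understand the Zephyr documentation for the `connected()` callback, this is the status reported when a connection attempt started with `bt_conn_le_create()` is cancelled, either explicitly or by the host's create-connection timeout (CONFIG_BT_CREATE_CONN_TIMEOUT by default). That would explain why "peripheral absent" and "broken state" look identical, and together with the missing CONNECT_IND it suggests the attempt times out while still initiating. The sketch below is a plain C logging helper; `hci_err_str()` and `conn_err_is_create_cancel()` are hypothetical names, and only a handful of codes from the Bluetooth Core Specification's controller error code list are included (Zephyr has matching `BT_HCI_ERR_*` constants).

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of HCI error codes (Bluetooth Core Specification, Vol 1, Part F).
 * Illustrative and deliberately incomplete. */
const char *hci_err_str(uint8_t err)
{
    switch (err) {
    case 0x00: return "Success";
    case 0x02: return "Unknown Connection Identifier";
    case 0x08: return "Connection Timeout";
    case 0x13: return "Remote User Terminated Connection";
    case 0x16: return "Connection Terminated By Local Host";
    case 0x3E: return "Connection Failed to be Established";
    default:   return "Unlisted HCI error";
    }
}

/* True when the connected() error means the pending LE create connection
 * was cancelled (explicitly or by the host timeout), as opposed to another
 * controller-reported failure. */
bool conn_err_is_create_cancel(uint8_t err)
{
    return err == 0x02;
}
```

In the `connected()` callback this could be used as e.g. `printk("connect failed: 0x%02x (%s)\n", conn_err, hci_err_str(conn_err));`.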
As far as I can tell, this is not a bt_conn_ref()/bt_conn_unref() counting issue in the application, because: 1) the symptom of a dangling connection object in the BT_CONN_DISCONNECTED state is different (with a dangling connection, bt_conn_le_create() immediately returns -EINVAL, whereas in my case it returns success); 2) in the broken state, listing connections with bt_conn_foreach() just before attempting to create a connection shows no existing connection objects.
Many thanks for any pointers on how to find what is causing this issue.
[1] Backtrace from the connected() callback when connection fails:
[2] bt_conn passed to the connected() callback when connection fails: