Hello
I wanted to follow David's post with some concrete information.
First, let me explain what the actual issue was in this case. We're porting RDM's UI to Avalonia, as David said, so 2026.2 got many existing screens and controls ported to that new framework. Avalonia uses Skia for graphics and defaults to rendering with hardware acceleration. If the machine doesn't have a GPU (e.g. a terminal server), hardware-accelerated rendering has to take an emulated software path (WARP). In most cases, that's still fine for an application like RDM (something like a video game would obviously not work as well).
We're using a third-party control for the circular "spinner" (progress) animation that gets shown when you're connecting a session. Unfortunately, what we found is that this control has exceptionally bad performance characteristics when rendered in software on certain machines. Taking the WARP path (emulated 3D acceleration) could cause the control to take 50% of your CPU cores. Forcing Avalonia to use software rendering via Skia (rather than hardware rendering that's then emulated on the CPU) drops that to about 15% of your CPU cores.
Then we hit a second issue: normally the progress spinner is shown while needed, and then hidden. The expectation is that a hidden control no longer renders or animates, and that's largely true, except again in the case of this specific control. This control drives its animation differently from most, which means that on the current Avalonia version it continues to animate even when hidden. It's hard to say where the bug really lies there, but I did speak with the Avalonia team and they've corrected that behaviour in Avalonia 12 (which we will migrate to later this year).
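To illustrate that second issue, here is a toy model in Python. It is not the control's real code and not Avalonia's API; the `Spinner` class and its methods are made up for this example. It assumes a control whose animation runs independently of its visibility, so hiding it does not stop the per-frame work and only an explicit stop does.

```python
class Spinner:
    """Hypothetical spinner whose animation is independent of its visibility."""

    def __init__(self) -> None:
        self.visible = False
        self.animating = False
        self.frames_drawn = 0  # stands in for CPU time spent rendering

    def show(self) -> None:
        self.visible = True
        self.animating = True

    def hide(self) -> None:
        # Models the problem: only visibility changes, the animation keeps running.
        self.visible = False

    def stop_animation(self) -> None:
        self.animating = False

    def on_frame(self) -> None:
        # Called once per frame by the (imaginary) UI loop.
        if self.animating:
            self.frames_drawn += 1


def run_frames(spinner: Spinner, count: int) -> int:
    """Run `count` frames and return how many of them did animation work."""
    before = spinner.frames_drawn
    for _ in range(count):
        spinner.on_frame()
    return spinner.frames_drawn - before


spinner = Spinner()
spinner.show()
spinner.hide()
work_while_hidden = run_frames(spinner, 60)  # still animating while hidden

spinner.stop_animation()
work_after_stop = run_frames(spinner, 60)    # no work once explicitly stopped
```

In this model, hiding alone leaves every frame doing work, which is the "quiet" CPU cost; stopping the animation is what actually ends it.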
So, in essence, there were two specific issues: a poorly performing control in certain environments, that quietly runs away with performance even when not displayed; and again, all of this only in certain environments (which is almost certainly related to the specific Windows version and driver combinations in some way).
The lowest-hanging fruit to fix this was simply to identify the problematic uses of that control and ensure its animation is stopped, rather than just being hidden, when needed. That immediately prevented the runaway CPU usage, but not the cost of actually using the control. So the remaining fixes look like this:
We now detect a no-GPU environment at application launch
If we're in such an environment, we force software rendering rather than emulated hardware rendering. One way is not strictly better than the other, except if cases such as this happen. So we just force the software path for safety now. This may need to be looked at again in future if we ever adopt any control that requires hardware rendering, but that's not the case right now and I don't envisage such a scenario in the future.
If we're in such an environment, we proactively degrade or disable animations. This is generally better behaviour for an application running on a terminal server and now it's done automatically. So, if we have the same issue reoccur in future with a different control, it shouldn't affect these environments - the animation is already degraded or disabled for the environment.
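The three points above amount to one decision made at launch. As a rough sketch only (written in Python for brevity, not RDM's actual code; `RenderMode`, `UiPolicy` and `choose_ui_policy` are hypothetical names, and how the GPU is detected is left out and passed in as a plain flag):

```python
from dataclasses import dataclass
from enum import Enum


class RenderMode(Enum):
    HARDWARE = "hardware"  # GPU path (emulated on the CPU when no GPU exists)
    SOFTWARE = "software"  # render on the CPU directly, no emulated GPU layer


@dataclass(frozen=True)
class UiPolicy:
    render_mode: RenderMode
    animations_enabled: bool


def choose_ui_policy(has_gpu: bool) -> UiPolicy:
    """Pick rendering and animation settings once, at application launch."""
    if has_gpu:
        return UiPolicy(RenderMode.HARDWARE, animations_enabled=True)
    # No GPU (e.g. a terminal server): force the software path and
    # switch animations off up front, regardless of which control is used.
    return UiPolicy(RenderMode.SOFTWARE, animations_enabled=False)


terminal_server_policy = choose_ui_policy(has_gpu=False)
workstation_policy = choose_ui_policy(has_gpu=True)
```

The point of deciding this once and centrally is that a future misbehaving control inherits the degraded settings automatically, instead of needing its own fix.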
The next part of my comments intertwines two ideas - why did it take so long to reach a resolution here, and what are we doing to prevent this happening again?
A little extra background: this was particularly unfortunate because, as this thread shows, we had a very similar problem with the 2026.1 release; in fact, it appeared almost identical. Both of these issues required a very low level of debugging to understand, and both were inside third-party components. In the 2026.1 case, it was caused by a bug in the Yubico SDK, which is source-available but needed in-depth analysis of the runtime behaviour of the application (if you're interested, you can read the analysis I gave to Yubico here). In this case, the same level of analysis was needed, but it also required decompiling the third-party control library and understanding their code.
So, first we have the problem that this level of analysis requires knowledge of specific tools and processes that are not widely known outside of certain teams, and that was aggravated by the fact that it's summer and we have people on vacation and so on. Unfortunately, since the issue looked so similar to the 2026.1 problem, that led first-line support and developers down the wrong track to start with.
Last week our CTO worked to provide extra tooling to all developers on the RDM team that simplifies the analysis process: all developers now have access to tooling that can ingest the runtime debug information from Windows, align it with the debugging symbols from the specific build and load that into their IDE as if they were debugging any "normal" issue on their workstation.
Next, I'm going to talk with the QA team this week and make sure that acceptance tests for RDM Windows cover running on a terminal server, specifically looking for performance anomalies at idle (such as high CPU usage) that they might not usually notice. This isn't a panacea - in both cases on this thread it was not easy for us to reproduce the issues - but it will add an extra line of defence.
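One way such an idle check could be expressed, as a sketch only: take the process's cumulative CPU time at the start and end of an idle window, convert the difference to a percentage of total core capacity, and flag it if it exceeds a threshold. The function names and the 5% threshold are assumptions for illustration, not what QA actually uses, and how the CPU time is sampled is deliberately left out.

```python
def cpu_usage_percent(cpu_seconds_start: float, cpu_seconds_end: float,
                      wall_seconds: float, core_count: int) -> float:
    """CPU time used during the window, as a percentage of all cores' capacity."""
    if wall_seconds <= 0 or core_count <= 0:
        raise ValueError("wall_seconds and core_count must be positive")
    used = cpu_seconds_end - cpu_seconds_start
    return 100.0 * used / (wall_seconds * core_count)


def is_idle_anomaly(usage_percent: float, threshold_percent: float = 5.0) -> bool:
    """True when an application that should be idle uses more CPU than allowed."""
    return usage_percent > threshold_percent


# Example: 30 CPU-seconds consumed over a 60-second idle window on 4 cores.
usage = cpu_usage_percent(10.0, 40.0, 60.0, 4)  # 12.5
flagged = is_idle_anomaly(usage)
```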
The third part is that I will talk with our support team and ask them that if we get a similar report in future, the very first thing to do is to ask for runtime debug information from the customer to accelerate the analysis/fix/release cycle, even in the case where the issue fundamentally appears the same as a known or recurring problem.
And finally, as I said, I've added extra safeguards to the application when running in such an environment to proactively disable or degrade features that might lead to this exact scenario again.
I hope this helps you understand the situation and shows that we do take this seriously. As David wrote, we cannot promise it won't happen again, but we (and I, personally) deeply care about such problems and are working hard to do better.
Please let me know if something isn't clear or you have any other questions.
Kind regards,