BMG "VM worker error: -12" / "exec queue reset detected" (kernel bugs triggered by out-of-VRAM conditions / xe_dma_buf_unmap?)
Hello,
I am seeing many xe errors in the kernel log when using BMG e20b via SYCL on AMD Ryzen 9950X desktop hardware with 6.14-series kernels (rc3, rc4, and the 6.14 release). This happens on multiple systems; some have one e20b, others have two.
The issues seem to be triggered by cleanup after an out-of-VRAM condition (a buffer allocation failure): when the software stack tears down, it hits these bugs.
I think we can rule out hardware related issues for two reasons:
- All systems show these issues when an out-of-VRAM condition is encountered.
- If we do not trigger an out-of-VRAM condition, the systems run correctly at 100% load for days with no errors.
Here are some examples:
```
[ 77.847309] xe 0000:0c:00.0: Using 46-bit DMA addresses
[ 77.927010] xe 0000:03:00.0: Using 46-bit DMA addresses
[ 196.737610] xe 0000:03:00.0: [drm] VM worker error: -12
[ 198.871903] ------------[ cut here ]------------
[ 198.871910] WARNING: CPU: 5 PID: 2601 at drivers/iommu/dma-iommu.c:841 __iommu_dma_unmap+0x15f/0x170
[ 198.871918] Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel tcp_bbr qrtr bnep amd_atl intel_rapl_msr intel_rapl_common btusb btrtl btintel btbcm snd_hda_codec_realtek btmtk snd_hda_codec_generic snd_hda_scodec_component mei_gsc_proxy snd_hda_codec_hdmi mei_gsc snd_hda_intel mei_me snd_intel_dspcfg pmt_crashlog pmt_telemetry mei edac_mce_amd pmt_class snd_intel_sdw_acpi bluetooth input_leds snd_hda_codec amdgpu binfmt_misc kvm_amd snd_hda_core nls_iso8859_1 xe snd_hwdep kvm snd_pcm intel_vsec snd_seq_midi drm_gpuvm amdxcp snd_seq_midi_event gpu_sched drm_panel_backlight_quirks polyval_clmulni drm_buddy snd_rawmidi polyval_generic drm_ttm_helper ghash_clmulni_intel sha256_ssse3 ttm sha1_ssse3 eeepc_wmi drm_exec aesni_intel snd_seq spd5118 asus_wmi drm_suballoc_helper crypto_simd snd_seq_device platform_profile drm_display_helper cryptd
[ 198.871963] sparse_keymap snd_timer wmi_bmof cec rapl rc_core snd i2c_piix4 i2c_algo_bit ccp k10temp soundcore i2c_smbus gpio_amdpt mac_hid sch_fq_codel nct6775 nct6775_core hwmon_vid msr parport_pc ppdev lp parport nvme_fabrics nvme_keyring efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid xfs nvme nvme_core r8169 ahci ucsi_acpi video thunderbolt libahci realtek typec_ucsi nvme_auth wmi typec
[ 198.871992] CPU: 5 UID: 0 PID: 2601 Comm: kworker/u132:12 Not tainted 6.14.0-061400-generic #202503241442
[ 198.871994] Hardware name: ASUS System Product Name/PRIME X870-P WIFI, BIOS 0825 11/29/2024
[ 198.871996] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 198.872007] RIP: 0010:__iommu_dma_unmap+0x15f/0x170
[ 198.872009] Code: a8 00 00 00 00 48 c7 45 b0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 a0 ff ff ff ff 4c 89 45 b8 4c 89 45 c0 e9 77 ff ff ff <0f> 0b e9 60 ff ff ff e8 75 43 65 00 0f 1f 44 00 00 90 90 90 90 90
[ 198.872011] RSP: 0018:ffff9c9284a7fca8 EFLAGS: 00010207
[ 198.872013] RAX: 0000000080000000 RBX: 00003ffe80000000 RCX: 0000000000000000
[ 198.872014] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 198.872015] RBP: ffff9c9284a7fd10 R08: ffff9c9284a7fcc8 R09: 0000000000000000
[ 198.872016] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffa26fe0000
[ 198.872017] R13: ffff918443237a10 R14: ffff9c9284a7fcb0 R15: ffff918443938600
[ 198.872018] FS: 0000000000000000(0000) GS:ffff919adf080000(0000) knlGS:0000000000000000
[ 198.872019] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 198.872020] CR2: 00007fff95261588 CR3: 000000099a840000 CR4: 0000000000f50ef0
[ 198.872022] PKRU: 55555554
[ 198.872023] Call Trace:
[ 198.872024] <TASK>
[ 198.872027] ? show_trace_log_lvl+0x1be/0x310
[ 198.872030] ? show_trace_log_lvl+0x1be/0x310
[ 198.872032] ? iommu_dma_unmap_sg+0xb4/0x150
[ 198.872034] ? show_regs.part.0+0x22/0x30
[ 198.872036] ? show_regs.cold+0x8/0x10
[ 198.872038] ? __iommu_dma_unmap+0x15f/0x170
[ 198.872039] ? __warn.cold+0xac/0x10c
[ 198.872041] ? __iommu_dma_unmap+0x15f/0x170
[ 198.872042] ? report_bug+0x114/0x160
[ 198.872045] ? handle_bug+0x6e/0xb0
[ 198.872047] ? exc_invalid_op+0x18/0x80
[ 198.872049] ? asm_exc_invalid_op+0x1b/0x20
[ 198.872053] ? __iommu_dma_unmap+0x15f/0x170
[ 198.872055] ? __iommu_dma_unmap+0xb9/0x170
[ 198.872056] iommu_dma_unmap_sg+0xb4/0x150
[ 198.872058] dma_unmap_sg_attrs+0x13e/0x170
[ 198.872062] xe_dma_buf_unmap+0x3d/0x90 [xe]
[ 198.872114] dma_buf_unmap_attachment+0x47/0x80
[ 198.872118] xe_ttm_bo_delete_mem_notify+0x60/0x80 [xe]
[ 198.872165] ttm_bo_cleanup_memtype_use+0x23/0x80 [ttm]
[ 198.872169] ttm_bo_delayed_delete+0x44/0xc0 [ttm]
[ 198.872172] process_one_work+0x174/0x350
[ 198.872175] worker_thread+0x34a/0x480
[ 198.872177] ? __pfx_worker_thread+0x10/0x10
[ 198.872178] kthread+0xf9/0x230
[ 198.872180] ? __pfx_kthread+0x10/0x10
[ 198.872182] ret_from_fork+0x44/0x70
[ 198.872185] ? __pfx_kthread+0x10/0x10
[ 198.872186] ret_from_fork_asm+0x1a/0x30
[ 198.872190] </TASK>
[ 198.872191] ---[ end trace 0000000000000000 ]---
[ 198.872417] ------------[ cut here ]------------
```
```
[ 198.872417] ------------[ cut here ]------------
[ 198.872419] WARNING: CPU: 1 PID: 2601 at drivers/iommu/dma-iommu.c:841 __iommu_dma_unmap+0x15f/0x170
[ 198.872423] Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel tcp_bbr qrtr bnep amd_atl intel_rapl_msr intel_rapl_common btusb btrtl btintel btbcm snd_hda_codec_realtek btmtk snd_hda_codec_generic snd_hda_scodec_component mei_gsc_proxy snd_hda_codec_hdmi mei_gsc snd_hda_intel mei_me snd_intel_dspcfg pmt_crashlog pmt_telemetry mei edac_mce_amd pmt_class snd_intel_sdw_acpi bluetooth input_leds snd_hda_codec amdgpu binfmt_misc kvm_amd snd_hda_core nls_iso8859_1 xe snd_hwdep kvm snd_pcm intel_vsec snd_seq_midi drm_gpuvm amdxcp snd_seq_midi_event gpu_sched drm_panel_backlight_quirks polyval_clmulni drm_buddy snd_rawmidi polyval_generic drm_ttm_helper ghash_clmulni_intel sha256_ssse3 ttm sha1_ssse3 eeepc_wmi drm_exec aesni_intel snd_seq spd5118 asus_wmi drm_suballoc_helper crypto_simd snd_seq_device platform_profile drm_display_helper cryptd
[ 198.872438] sparse_keymap snd_timer wmi_bmof cec rapl rc_core snd i2c_piix4 i2c_algo_bit ccp k10temp soundcore i2c_smbus gpio_amdpt mac_hid sch_fq_codel nct6775 nct6775_core hwmon_vid msr parport_pc ppdev lp parport nvme_fabrics nvme_keyring efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid xfs nvme nvme_core r8169 ahci ucsi_acpi video thunderbolt libahci realtek typec_ucsi nvme_auth wmi typec
[ 198.872447] CPU: 1 UID: 0 PID: 2601 Comm: kworker/u132:12 Tainted: G W 6.14.0-061400-generic #202503241442
[ 198.872449] Tainted: [W]=WARN
[ 198.872449] Hardware name: ASUS System Product Name/PRIME X870-P WIFI, BIOS 0825 11/29/2024
[ 198.872450] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 198.872455] RIP: 0010:__iommu_dma_unmap+0x15f/0x170
[ 198.872456] Code: a8 00 00 00 00 48 c7 45 b0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 a0 ff ff ff ff 4c 89 45 b8 4c 89 45 c0 e9 77 ff ff ff <0f> 0b e9 60 ff ff ff e8 75 43 65 00 0f 1f 44 00 00 90 90 90 90 90
[ 198.872457] RSP: 0018:ffff9c9284a7fca8 EFLAGS: 00010207
[ 198.872458] RAX: 0000000080000000 RBX: 00003ffe80000000 RCX: 0000000000000000
[ 198.872458] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 198.872459] RBP: ffff9c9284a7fd10 R08: ffff9c9284a7fcc8 R09: 0000000000000000
[ 198.872459] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffda6fe0000
[ 198.872460] R13: ffff918443234610 R14: ffff9c9284a7fcb0 R15: ffff91844393a800
[ 198.872460] FS: 0000000000000000(0000) GS:ffff919adee80000(0000) knlGS:0000000000000000
[ 198.872461] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 198.872461] CR2: 00007fffeb54d0d0 CR3: 000000099a840000 CR4: 0000000000f50ef0
[ 198.872462] PKRU: 55555554
[ 198.872462] Call Trace:
[ 198.872463] <TASK>
[ 198.872464] ? show_trace_log_lvl+0x1be/0x310
[ 198.872466] ? show_trace_log_lvl+0x1be/0x310
[ 198.872467] ? iommu_dma_unmap_sg+0xb4/0x150
[ 198.872468] ? show_regs.part.0+0x22/0x30
[ 198.872469] ? show_regs.cold+0x8/0x10
[ 198.872469] ? __iommu_dma_unmap+0x15f/0x170
[ 198.872470] ? __warn.cold+0xac/0x10c
[ 198.872471] ? __iommu_dma_unmap+0x15f/0x170
[ 198.872472] ? report_bug+0x114/0x160
[ 198.872473] ? handle_bug+0x6e/0xb0
[ 198.872474] ? exc_invalid_op+0x18/0x80
[ 198.872475] ? asm_exc_invalid_op+0x1b/0x20
[ 198.872477] ? __iommu_dma_unmap+0x15f/0x170
[ 198.872477] ? __iommu_dma_unmap+0xb9/0x170
[ 198.872478] iommu_dma_unmap_sg+0xb4/0x150
[ 198.872479] dma_unmap_sg_attrs+0x13e/0x170
[ 198.872480] xe_dma_buf_unmap+0x3d/0x90 [xe]
[ 198.872504] dma_buf_unmap_attachment+0x47/0x80
[ 198.872506] xe_ttm_bo_delete_mem_notify+0x60/0x80 [xe]
[ 198.872525] ttm_bo_cleanup_memtype_use+0x23/0x80 [ttm]
[ 198.872527] ttm_bo_delayed_delete+0x44/0xc0 [ttm]
[ 198.872528] process_one_work+0x174/0x350
[ 198.872530] worker_thread+0x34a/0x480
[ 198.872531] ? __pfx_worker_thread+0x10/0x10
[ 198.872531] kthread+0xf9/0x230
[ 198.872532] ? __pfx_kthread+0x10/0x10
[ 198.872533] ret_from_fork+0x44/0x70
[ 198.872534] ? __pfx_kthread+0x10/0x10
[ 198.872535] ret_from_fork_asm+0x1a/0x30
[ 198.872536] </TASK>
[ 198.872537] ---[ end trace 0000000000000000 ]---
[ 251.953247] workqueue: pm_runtime_work hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
[ 263.327251] r8169 0000:09:00.0: invalid VPD tag 0x00 (size 0) at offset 0; assume missing optional EEPROM
[ 269.883380] workqueue: pm_runtime_work hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
[ 448.366883] xe 0000:03:00.0: [drm] GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=6
[ 448.388756] xe 0000:03:00.0: [drm] Xe device coredump has been created
[ 448.388763] xe 0000:03:00.0: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
[ 466.631168] xe 0000:03:00.0: [drm] VM worker error: -12
[ 467.612929] ------------[ cut here ]------------
```
```
[ 7333.065068] xe 0000:03:00.0: [drm] VM worker error: -12
[ 7334.079023] xe 0000:03:00.0: [drm] exec queue reset detected
[ 7334.579147] xe 0000:03:00.0: [drm] exec queue reset detected
[ 7335.579412] xe 0000:03:00.0: [drm] exec queue reset detected
[ 7337.308734] ------------[ cut here ]------------
[ 7337.308741] WARNING: CPU: 30 PID: 16841 at drivers/iommu/dma-iommu.c:841 __iommu_dma_unmap+0x15f/0x170
[ 7337.308749] Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel tcp_bbr qrtr bnep amd_atl intel_rapl_msr intel_rapl_common btusb btrtl btintel btbcm snd_hda_codec_realtek btmtk snd_hda_codec_generic snd_hda_scodec_component mei_gsc_proxy snd_hda_codec_hdmi mei_gsc snd_hda_intel mei_me snd_intel_dspcfg pmt_crashlog pmt_telemetry mei edac_mce_amd pmt_class snd_intel_sdw_acpi bluetooth input_leds snd_hda_codec amdgpu binfmt_misc kvm_amd snd_hda_core nls_iso8859_1 xe snd_hwdep kvm snd_pcm intel_vsec snd_seq_midi drm_gpuvm amdxcp snd_seq_midi_event gpu_sched drm_panel_backlight_quirks polyval_clmulni drm_buddy snd_rawmidi polyval_generic drm_ttm_helper ghash_clmulni_intel sha256_ssse3 ttm sha1_ssse3 eeepc_wmi drm_exec aesni_intel snd_seq spd5118 asus_wmi drm_suballoc_helper crypto_simd snd_seq_device platform_profile drm_display_helper cryptd
[ 7337.308800] sparse_keymap snd_timer wmi_bmof cec rapl rc_core snd i2c_piix4 i2c_algo_bit ccp k10temp soundcore i2c_smbus gpio_amdpt mac_hid sch_fq_codel nct6775 nct6775_core hwmon_vid msr parport_pc ppdev lp parport nvme_fabrics nvme_keyring efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid xfs nvme nvme_core r8169 ahci ucsi_acpi video thunderbolt libahci realtek typec_ucsi nvme_auth wmi typec
[ 7337.308831] CPU: 30 UID: 0 PID: 16841 Comm: kworker/u133:6 Tainted: G W 6.14.0-061400-generic #202503241442
[ 7337.308834] Tainted: [W]=WARN
[ 7337.308835] Hardware name: ASUS System Product Name/PRIME X870-P WIFI, BIOS 0825 11/29/2024
[ 7337.308837] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 7337.308847] RIP: 0010:__iommu_dma_unmap+0x15f/0x170
[ 7337.308849] Code: a8 00 00 00 00 48 c7 45 b0 00 00 00 00 48 c7 45 c8 00 00 00 00 48 c7 45 a0 ff ff ff ff 4c 89 45 b8 4c 89 45 c0 e9 77 ff ff ff <0f> 0b e9 60 ff ff ff e8 75 43 65 00 0f 1f 44 00 00 90 90 90 90 90
[ 7337.308851] RSP: 0018:ffff9c9285833ca8 EFLAGS: 00010207
[ 7337.308853] RAX: 000000008f000000 RBX: 00003ff000000000 RCX: 0000000000000000
[ 7337.308854] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 7337.308855] RBP: ffff9c9285833d10 R08: ffff9c9285833cc8 R09: 0000000000000000
[ 7337.308856] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffeca1b80000
[ 7337.308857] R13: ffff918443234610 R14: ffff9c9285833cb0 R15: ffff91844393a800
[ 7337.308858] FS: 0000000000000000(0000) GS:ffff919adfd00000(0000) knlGS:0000000000000000
[ 7337.308859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7337.308860] CR2: 00007ffff6f4b020 CR3: 0000001247b90000 CR4: 0000000000f50ef0
[ 7337.308861] PKRU: 55555554
[ 7337.308862] Call Trace:
[ 7337.308864] <TASK>
[ 7337.308867] ? show_trace_log_lvl+0x1be/0x310
[ 7337.308870] ? show_trace_log_lvl+0x1be/0x310
[ 7337.308872] ? iommu_dma_unmap_sg+0xb4/0x150
[ 7337.308874] ? show_regs.part.0+0x22/0x30
[ 7337.308876] ? show_regs.cold+0x8/0x10
[ 7337.308877] ? __iommu_dma_unmap+0x15f/0x170
[ 7337.308878] ? __warn.cold+0xac/0x10c
[ 7337.308880] ? __iommu_dma_unmap+0x15f/0x170
[ 7337.308882] ? report_bug+0x114/0x160
[ 7337.308885] ? handle_bug+0x6e/0xb0
[ 7337.308887] ? exc_invalid_op+0x18/0x80
[ 7337.308889] ? asm_exc_invalid_op+0x1b/0x20
[ 7337.308892] ? __iommu_dma_unmap+0x15f/0x170
[ 7337.308894] ? __iommu_dma_unmap+0xb9/0x170
[ 7337.308895] iommu_dma_unmap_sg+0xb4/0x150
[ 7337.308897] dma_unmap_sg_attrs+0x13e/0x170
[ 7337.308901] xe_dma_buf_unmap+0x3d/0x90 [xe]
[ 7337.308966] dma_buf_unmap_attachment+0x47/0x80
[ 7337.308970] xe_ttm_bo_delete_mem_notify+0x60/0x80 [xe]
[ 7337.309018] ttm_bo_cleanup_memtype_use+0x23/0x80 [ttm]
[ 7337.309023] ttm_bo_delayed_delete+0x44/0xc0 [ttm]
[ 7337.309026] process_one_work+0x174/0x350
[ 7337.309030] worker_thread+0x34a/0x480
[ 7337.309031] ? _raw_spin_lock_irqsave+0xe/0x20
[ 7337.309034] ? __pfx_worker_thread+0x10/0x10
[ 7337.309035] kthread+0xf9/0x230
[ 7337.309038] ? __pfx_kthread+0x10/0x10
[ 7337.309039] ret_from_fork+0x44/0x70
[ 7337.309042] ? __pfx_kthread+0x10/0x10
[ 7337.309044] ret_from_fork_asm+0x1a/0x30
[ 7337.309047] </TASK>
[ 7337.309048] ---[ end trace 0000000000000000 ]---
```
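In case it helps with triage, here is a minimal sketch of a script for pulling the relevant events out of a saved dmesg dump, so the ordering (VM worker error first, then resets, then the `__iommu_dma_unmap` warning) is easier to see across systems. The function name `summarize_xe_events` and the regular expressions are my own assumptions, written only against the messages shown above; it is not part of any xe tooling. Error code -12 corresponds to `-ENOMEM`, which the script decodes via Python's `errno` table.

```python
import errno
import re

# Patterns are based only on the log excerpts in this report (assumption:
# other kernels or log formats may word these messages differently).
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")
VM_ERR_RE = re.compile(r"VM worker error: (-?\d+)")
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")
RESET_RE = re.compile(r"exec queue reset detected|Engine reset")


def summarize_xe_events(log_text):
    """Return (timestamp, kind, detail) tuples for the events of interest."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a timestamped kernel log line
        ts, msg = float(m.group("ts")), m.group("msg")

        vm = VM_ERR_RE.search(msg)
        if vm:
            # The kernel prints a negative errno; map it to its symbolic name.
            code = abs(int(vm.group(1)))
            events.append((ts, "vm_worker_error", errno.errorcode.get(code, "unknown")))
            continue

        warn = WARN_RE.search(msg)
        if warn:
            # Keep the symbol+offset where the WARNING fired.
            events.append((ts, "warning", warn.group(2)))
            continue

        if RESET_RE.search(msg):
            events.append((ts, "reset", msg))
    return events


if __name__ == "__main__":
    import sys

    # Usage: dmesg > dmesg.txt; python3 this_script.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ts, kind, detail in summarize_xe_events(fh.read()):
            print(f"{ts:12.6f}  {kind:16s}  {detail}")
```

Run against a dump from each affected machine, this prints one line per event in timestamp order, which makes it easier to compare how long after the allocation failure the warnings appear.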
We are very grateful for any advice or bugfix you can provide.
Regards,
Kumi