git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Remove MPS/MRRS Kconfig settings (CONFIG_PCIE_BUS_*) that worked
around a WiFi device defect; use a quirk or boot-time
"pci=pcie_bus_tune_*" kernel parameter instead (Bjorn Helgaas)
- Always lift 2.5GT/s restriction in PCIe failed link retraining to
avoid clamping a link to 2.5GT/s after hot-plug changes the device
(Maciej W. Rozycki)
- Request bus reassignment when not probe-only to fix an enumeration
regression on Marvell CN106XX and possibly other DT-based systems
(Ratheesh Kannoth)
- Fix procfs race between pci_proc_init() and pci_bus_add_device()
that resulted in 'proc_dir_entry ... already registered' warnings
and pointer corruption (Krzysztof Wilczyński)
- Fix sysfs race that causes 'duplicate filename' warnings and boot
panics by converting PCI resource files to static attributes
(Krzysztof Wilczyński)
- Expose sysfs 'resourceN_resize' attributes only on platforms with
PCI mmap (Krzysztof Wilczyński)
- Require CAP_SYS_ADMIN to write to sysfs 'resourceN_resize'
attributes (Krzysztof Wilczyński)
- Add security_locked_down(LOCKDOWN_PCI_ACCESS) to alpha PCI resource
mmap path to match the generic path (Krzysztof Wilczyński)
- Use kstrtobool() to parse the 'rom' attribute input to avoid the
unexpected behavior of enabling the ROM when writing '0' with no
trailing newline (Krzysztof Wilczyński)
Resource management:
- Improve resource claim logging for debuggability (Ilpo Järvinen)
- Clean up several uses of const parameters (Ilpo Järvinen)
- Check option ROM header signatures and lengths before accessing ROM
data, to avoid page faults and alignment faults (Guixin Liu)
ASPM:
- Don't reconfigure ASPM when entering low-power D-state; only do it
when returning to D0 (Carlos Bilbao)
Power management:
- During suspend, set power state to 'unknown' for all devices, not
just those with drivers (Lukas Wunner)
- Skip restoring Resizable BARs and VF Resizable BARs if device
doesn't respond to config reads, to avoid invalid array accesses
(Marco Nenciarini)
- Add pci_suspend_retains_context() so drivers can tell whether
devices retain internal state across suspend/resume, since some
platforms reset devices on suspend; use this in nvme to avoid
issues on Qcom RCs (Manivannan Sadhasivam)
Power control:
- Only power on/off devices that actually support power control to
avoid poking at incompatible devices mentioned in DT (Manivannan
Sadhasivam)
Virtualization and resets:
- Log device readiness timeouts as errors, not warnings, because the
device is likely unusable in this case (Bjorn Helgaas)
- Wait for device readiness after soft reset (D3hot ->
D0uninitialized transition), when the device may respond with
Request Retry Status (RRS) if it needs more time to initialize
(Bjorn Helgaas)
- Drop unnecessary retries when restoring BARs because resets should
now already include all required delays (Lukas Wunner)
- Avoid FLR for MediaTek MT7925 WiFi, where FLR fails after a VM
terminates uncleanly (Jose Ignacio Tornos Martinez)
- Avoid SBR for Qualcomm WCN6855/WCN7850 WiFi and SDX62/SDX65 modems,
which seem not to support it correctly (Jose Ignacio Tornos
Martinez)
Peer-to-peer DMA:
- Prevent P2PDMA as well as CPU access to non-mappable BARs, e.g.,
s390 ISM BARs (Matt Evans)
- Add Intel QAT, DSA, IAA devices to whitelist (Lukas Wunner)
Endpoint framework:
- Add endpoint controller APIs for use by function drivers to
discover auxiliary blocks like DMA engines (Koichiro Den)
- Remember DesignWare eDMA engine base/size and expose them via the
EPC aux-resource API (Koichiro Den)
- Add endpoint embedded doorbell fallback, used if MSI allocation
fails (Koichiro Den)
- Validate BAR index and remove dead BAR read in endpoint doorbell
test (Carlos Bilbao)
- Unwind MSI/MSI-X vectors if NTB initialization fails part-way
through (Koichiro Den)
- Cache sleepable pci_irq_vector() value at ISR setup to avoid
calling it from hardirq context (Koichiro Den)
- Call sleepable pci_epc_raise_irq() from a work item instead of
atomic context, e.g., when setting bits in NTB peer doorbells in
the ntb_peer_db_set() path (Koichiro Den)
- Report 0-based vNTB doorbell vector to account for link event 0 and
historically skipped slot 1 (Koichiro Den)
- Prevent configfs writes to vNTB db_count and other values that are
already in use after EPC attach (Koichiro Den)
- Account for vNTB db_valid reserved slots (link event 0 and
historically skipped slot 1) so they don't appear as valid
doorbells (Koichiro Den)
- Implement vNTB .db_vector_count()/mask() for doorbells so clients
can use multiple vectors and avoid thundering herds (Koichiro Den)
- Report 0-based NTB doorbell vector to account for link event 0 and
historically skipped slot 1 (Koichiro Den)
- Fix doorbell bitmask and IRQ vector handling to clear only
specified bits, use the correct vector for non-contiguous Linux IRQ
numbers, and validate incoming vectors (Koichiro Den)
- Implement NTB .db_vector_count()/mask() for doorbells so clients
can use multiple vectors (Koichiro Den)
Native PCIe controller infrastructure:
- Add pci_host_common_link_train_delay() for the mandatory delay
after > 5GT/s Link training completes and use it for cadence HPA,
j721e, LGA; dwc; aardvark, mediatek-gen3, rzg3s (Hans Zhang)
- Protect root bus removal with rescan lock in altera, brcmstb,
cadence, dwc, iproc, mediatek, plda, rockchip to prevent
use-after-free or crashes when racing with sysfs rescan or hotplug
(Hans Zhang)
- Add pci_host_common_parse_ports() for use by any native driver to
parse Root Port properties (per-Link features like width, speed,
PHY, power and reset control, etc., should be described in Root Port
stanzas, not the host bridge; currently only reset GPIOs are
implemented) (Sherry Sun)
New native PCIe controller drivers:
- Add DT binding and driver for UltraRISC DP1000 PCIe controller
(Xincheng Zhang, Jia Wang)
Altera PCIe controller driver:
- Do not dispose of the parent IRQ mapping, which belongs to the
parent interrupt controller (Mahesh Vaidya)
- Fix chained IRQ handler ordering issue and resource leaks on probe
failure (Mahesh Vaidya)
AMD MDB PCIe controller driver:
- Assert PERST# on shutdown so any connected Endpoints are held in
reset during shutdown (Sai Krishna Musham)
Amlogic Meson PCIe controller driver:
- Propagate devm_add_action_or_reset() failure to fix probe error
path (Shuvam Pandey)
- Add .remove() callback to deinitialize the host bridge and power
off the PHY (Shuvam Pandey)
Broadcom iProc PCIe controller driver:
- Restore .map_irq() assignment; its removal broke INTx on the iproc
platform bus driver (Mark Tomlinson)
Broadcom STB PCIe controller driver:
- No change, but products using certain WiFi devices may be affected
by removal of CONFIG_PCIE_BUS_* (see above)
Freescale i.MX6 PCIe controller driver:
- Move IMX6SX_GPR12_PCIE_TEST_POWERDOWN handling into the core reset
functions (Richard Zhu)
- Assert PERST# before enabling regulators to ensure that even if
power is enabled, endpoint stays inactive until REFCLK is stable
(Sherry Sun)
- Parse reset properties in Root Port nodes (falling back to host
bridge) to help support Key E connectors and the pwrctrl framework
(Sherry Sun)
- Configure i.MX95 REF_USE_PAD before PHY reset (Richard Zhu)
- Assert i.MX95 ref_clk_en after reference clock stabilizes (Richard
Zhu)
- Integrate new pwrctrl API for DTs with Root Port-level power
supplies (Sherry Sun)
Intel Gateway PCIe controller driver:
- Enable clock before PHY init for correct ordering (Florian Eckert)
- Add .start_link() callback so the driver works again (Florian
Eckert)
- Stop overwriting the ATU base address discovered by
dw_pcie_get_resources() (Florian Eckert)
- Add DT 'atu' region since this is hardware-specific, and fall back
to driver default if lacking (Florian Eckert)
Loongson PCIe controller driver:
- Ignore downstream devices only on internal bridges to avoid
Loongson hardware issue (Rong Zhang)
- Quirk old Loongson-3C6000 bridges that advertise incorrect
supported link speeds (Ziyao Li)
Marvell MVEBU PCIe controller driver:
- Use fixed-width interrupt masks to avoid truncation in 64-bit
builds (Rosen Penev)
MediaTek PCIe controller driver:
- Use FIELD_PREP() to fix incorrect operator precedence in
PCIE_FTS_NUM_L0 (Li RongQing)
- Fix IRQ domain leak when port fails to enable (Manivannan
Sadhasivam)
- Use actual physical address for MSI message address instead of
virt_to_phys() (Manivannan Sadhasivam)
- Add EcoNet EN7528 to DT binding (Caleb James DeLisle)
MediaTek PCIe Gen3 controller driver:
- Deassert PCIE_PHY_RSTB so REFCLK is stable for at least 100ms
(PCIE_T_PVPERL_MS) before deasserting PERST# (Jian Yang)
- Add .shutdown() to assert PERST# before powering down device (Jian
Yang)
- Do full device power down, including asserting PERST#, when
removing driver (Chen-Yu Tsai)
- Fix a 'failed to create pwrctrl devices' error message that was
inadvertently skipped (Chen-Yu Tsai)
NVIDIA Tegra194 PCIe controller driver:
- Program the DesignWare PORT_AFR L1 entrance latency based on the
'aspm-l1-entry-delay-ns' DT property (Manikanta Maddireddy)
Qualcomm PCIe controller driver:
- Add Eliza SoC compatible in DT binding (Krishna Chaitanya Chundru)
- Set max OPP during resume so DBI register accesses don't fail with
NoC errors (Qiang Yu)
- Add pci_host_common_d3cold_possible() to determine whether
downstream devices are already in D3hot and wakeup-enabled devices
are capable of generating PME from D3cold (Krishna Chaitanya
Chundru)
- Add .get_ltssm() callback to get the LTSSM status without DBI,
since DBI may be inaccessible after PME_Turn_Off (Krishna Chaitanya
Chundru)
- Power down PHY via PARF_PHY_CTRL before disabling rails/clocks to
avoid power leakage (Krishna Chaitanya Chundru)
- Decide whether suspend should put the link in L2 and power down
using pci_host_common_d3cold_possible() instead of checking whether
ASPM L1 is enabled (Krishna Chaitanya Chundru)
- Add qcom D3cold support to tear down interconnect bandwidth and OPP
votes (Krishna Chaitanya Chundru)
- Handle unsupported mixed PERST#/PHY DT configurations, e.g., PHY in
RP node while PERST# is in the RC node, but warn about the DT issue
(Qiang Yu)
- Program T_POWER_ON based on DT 't-power-on-us' property in case
hardware advertises incorrect values (Krishna Chaitanya Chundru)
- Disable ASPM L0s for SA8775P (Shawn Guo)
- Initialize DWC MSI lock for firmware-managed ECAM hosts, which
don't use the dw_pcie_host_init() path that initializes the lock
(Yadu M G)
Renesas RZ/G3S PCIe controller driver:
- Add RZ/V2N DT support (Lad Prabhakar)
SOPHGO PCIe controller driver:
- Add 'dma-coherent' DT property for sg2042-pcie driver (Han Gao)
Synopsys DesignWare PCIe controller driver:
- Apply ECRC TLP Digest workaround for all DesignWare cores prior to
5.10a, not just 4.90a and 5.00a (Manikanta Maddireddy)
- Use common struct dw_pcie 'mode' rather than duplicating it in
artpec6, dra7xx, dwc-pcie, and keembay driver structs (Hans Zhang)
- Use DEFINE_SHOW_ATTRIBUTE for ltssm_status debugfs to reduce
boilerplate and fix a seq_file memory leak by including a
.release() callback (Hans Zhang)
- Fix a signedness bug in fault injection test code (Dan Carpenter)
- Avoid NULL pointer dereference when tearing down debugfs for
controller that lacks RAS DES capability (Shuvam Pandey)
MicroSemi Switchtec management driver:
- Add Gen6 Device IDs (Ben Reed)
Miscellaneous:
- Remove unused gpio.h include from amd-mdb, designware-plat, fu740,
visconti drivers (Andy Shevchenko)
- Fix typos in documentation (josh ziegler)
- Use FIELD_MODIFY() instead of open-coding it (Hans Zhang)"
* tag 'pci-v7.2-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (168 commits)
PCI/sysfs: Use kstrtobool() to parse the ROM attribute input
PCI/sysfs: Limit BAR resize attribute scope to platforms with PCI mmap
PCI/sysfs: Remove pci_create_legacy_files() and pci_sysfs_init()
PCI/sysfs: Convert legacy I/O and memory attributes to static definitions
PCI/sysfs: Add __weak pci_legacy_has_sparse() helper
alpha/PCI: Compute legacy size in pci_mmap_legacy_page_range()
PCI: Add macros for legacy I/O and memory address space sizes
PCI/sysfs: Remove pci_{create,remove}_sysfs_dev_files()
alpha/PCI: Convert resource files to static attributes
alpha/PCI: Add static PCI resource attribute macros
alpha/PCI: Remove WARN from __pci_mmap_fits() and __legacy_mmap_fits()
alpha/PCI: Fix __pci_mmap_fits() overflow for zero-length BARs
alpha/PCI: Use PCI resource accessor macros
alpha/PCI: Use BAR index in sysfs attr->private instead of resource pointer
alpha/PCI: Add security_locked_down() check to pci_mmap_resource()
PCI/sysfs: Limit pci_sysfs_init() late_initcall compile scope
PCI/sysfs: Add stubs for pci_{create,remove}_sysfs_dev_files()
PCI/sysfs: Warn about BAR resize failure in __resource_resize_store()
PCI/sysfs: Convert PCI resource files to static attributes
PCI/proc: Fix race between pci_proc_init() and pci_bus_add_device()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull MSI irq fix from Ingo Molnar:
- Revert a change that added a bad iounmap(NULL) call
to the MSI IRQ support code (Yuanhe Shu)
* tag 'irq-msi-2026-06-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "PCI/MSI: Unmap MSI-X region on error"
|
|
- Fix typos in documentation (josh ziegler)
- Use FIELD_MODIFY() instead of open-coding it (Hans Zhang)
* pci/misc:
PCI: Use FIELD_MODIFY() instead of open-coding it
Documentation: PCI: Fix typos
|
|
- Remove unused gpio.h include from amd-mdb, designware-plat, fu740,
visconti drivers (Andy Shevchenko)
* pci/controller/misc:
PCI: visconti: Drop unused include
PCI: fu740: Drop unused include
PCI: designware-plat: Drop unused include
PCI: amd-mdb: Use the right GPIO header
|
|
- Add common TLP Type macros (MRd/Wr, IORd/Wr, CfgRd/Wr 0, CfgRd/Wr 1, Msg)
and use them in aspeed, cadence, dwc, mediatek, tegra drivers (Hans
Zhang)
* pci/controller/tlp_macros:
PCI: cadence: Use common TLP type macros
PCI: dwc: Replace ATU type macros with common TLP type macros
PCI: Add common TLP type macros and convert aspeed/mediatek
|
|
- Protect root bus removal with rescan lock in altera, brcmstb, cadence,
dwc, iproc, mediatek, plda, rockchip to prevent use-after-free or crashes
when racing with sysfs rescan or hotplug (Hans Zhang)
* pci/controller/rescan_lock:
PCI: rockchip: Protect root bus removal with rescan lock
PCI: plda: Protect root bus removal with rescan lock
PCI: mediatek: Protect root bus removal with rescan lock
PCI: iproc: Protect root bus removal with rescan lock
PCI: dwc: Protect root bus removal with rescan lock
PCI: cadence: Protect root bus removal with rescan lock
PCI: brcmstb: Protect root bus removal with rescan lock
PCI: altera: Protect root bus removal with rescan lock
|
|
- Add pci_host_common_link_train_delay() for the mandatory delay after
> 5GT/s Link training completes and use it for cadence HPA, j721e, LGA;
dwc; aardvark, mediatek-gen3, rzg3s (Hans Zhang)
* pci/controller/link_train_delay:
PCI: rzg3s-host: Use common pci_host_common_link_train_delay() helper
PCI: mediatek-gen3: Add 100 ms delay after link up
PCI: aardvark: Add 100 ms delay after link training
PCI: dwc: Use common pci_host_common_link_train_delay() helper
PCI: cadence-hpa: Add post-link delay
PCI: cadence: Add post-link delay for LGA and j721e glue driver
PCI: Add pci_host_common_link_train_delay() helper
# Conflicts:
# drivers/pci/controller/pci-host-common.h
|
|
- Remove unused LIST_HEAD(res) (Lad Prabhakar)
* pci/controller/rcar-host:
PCI: rcar-host: Remove unused LIST_HEAD(res)
|
|
- Use fixed-width interrupt masks to avoid truncation in 64-bit builds
(Rosen Penev)
* pci/controller/mvebu:
PCI: mvebu: Use fixed-width interrupt masks to avoid truncation in 64-bit builds
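
The exact mvebu change is not shown in this log. As a generic sketch of the problem class, a mask built from `unsigned long` is 64 bits wide on LP64 targets, so inverting it and storing it in a 32-bit register value silently drops the upper half. The macro and function names below are hypothetical.

```c
#include <stdint.h>

/* Width follows the ABI: 64 bits on LP64, 32 bits on ILP32. */
#define INT_BIT_UL(n)   (1UL << (n))

/* Always 32 bits, matching a 32-bit interrupt mask register. */
#define INT_BIT_U32(n)  ((uint32_t)1 << (n))

/* Build a 32-bit mask with every bit set except bit n (n < 32). */
static inline uint32_t irq_mask_all_except(unsigned int n)
{
	return ~INT_BIT_U32(n);
}
```

Because the fixed-width expression never grows beyond 32 bits, there is nothing for the compiler to truncate when the value reaches the register accessor.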
|
|
- Deassert PCIE_PHY_RSTB so REFCLK is stable for at least 100ms
(PCIE_T_PVPERL_MS) before deasserting PERST# (Jian Yang)
- Add .shutdown() to assert PERST# before powering down device (Jian Yang)
- Do full device power down, including asserting PERST#, when removing
driver (Chen-Yu Tsai)
- Fix a 'failed to create pwrctrl devices' error message that was
inadvertently skipped (Chen-Yu Tsai)
* pci/controller/mediatek-gen3:
PCI: mediatek-gen3: Fix incorrectly skipped pwrctrl error message
PCI: mediatek-gen3: Do full device power down on removal
PCI: mediatek-gen3: Add a .shutdown() callback to control PERST# signal
PCI: mediatek-gen3: Fix PERST# control timing during system startup
|
|
- Use FIELD_PREP() to fix incorrect operator precedence in PCIE_FTS_NUM_L0
(Li RongQing)
- Fix IRQ domain leak when port fails to enable (Manivannan Sadhasivam)
- Use actual physical address for MSI message address instead of
virt_to_phys() (Manivannan Sadhasivam)
- Add EcoNet EN7528 to DT binding (Caleb James DeLisle)
* pci/controller/mediatek:
dt-bindings: PCI: mediatek: Add support for EcoNet EN7528
PCI: mediatek: Use actual physical address instead of virt_to_phys()
PCI: mediatek: Fix IRQ domain leak when port fails to enable
PCI: mediatek: Fix operator precedence in PCIE_FTS_NUM_L0 macro
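
The precedence problem can be reproduced in user space. The sketch below assumes the buggy macro had the shape `(x) & 0xff << 8`, where `<<` binds tighter than `&`, and uses a hypothetical `field_prep_u32()` helper as a stand-in for the kernel's FIELD_PREP(); neither the macro names nor the helper are taken from the driver.

```c
#include <stdint.h>

#define FTS_NUM_MASK  0x0000ff00u  /* assumed field position: bits 15:8 */

/* Buggy shape: parsed as (x) & (0xff << 8), so the value is never shifted. */
#define FTS_NUM_L0_BUGGY(x)  ((x) & 0xff << 8)

/* Stand-in for FIELD_PREP(): shift val to the mask's lowest set bit. */
static inline uint32_t field_prep_u32(uint32_t mask, uint32_t val)
{
	uint32_t lsb = mask & (~mask + 1u);  /* isolate lowest set bit */

	return (val * lsb) & mask;
}

#define FTS_NUM_L0_FIXED(x)  field_prep_u32(FTS_NUM_MASK, (x))
```

For an input of 0x50, the buggy form yields 0 because bits 15:8 of the unshifted input are clear, while the fixed form yields 0x5000.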
|
|
- Ignore downstream devices only on internal bridges to avoid Loongson
hardware issue (Rong Zhang)
- Quirk old Loongson-3C6000 bridges that advertise incorrect supported link
speeds (Ziyao Li)
* pci/controller/loongson:
PCI: loongson: Override PCIe bridge supported speeds for Loongson-3C6000 series
PCI: loongson: Do not ignore downstream devices on external bridges
|
|
- Restore .map_irq() assignment; its removal broke INTx on the iproc
platform bus driver (Mark Tomlinson)
* pci/controller/iproc-bcma:
PCI: iproc: Restore .map_irq() for the platform bus driver
|
|
- Add UltraRISC DP1000 PCIe controller DT binding and driver (Jia Wang)
* pci/controller/dwc-ultrarisc:
PCI: ultrarisc: Add UltraRISC DP1000 PCIe Root Complex driver
dt-bindings: PCI: Add UltraRISC DP1000 PCIe controller
|
|
- Program the DesignWare PORT_AFR L1 entrance latency based on the
'aspm-l1-entry-delay-ns' DT property (Manikanta Maddireddy)
* pci/controller/dwc-tegra194:
PCI: tegra194: Use aspm-l1-entry-delay-ns DT property for L1 entrance latency
|
|
- Set max OPP during resume so DBI register accesses don't fail with NoC
errors (Qiang Yu)
- Add pci_host_common_d3cold_possible() to determine whether downstream
devices are already in D3hot and wakeup-enabled devices are capable of
generating PME from D3cold (Krishna Chaitanya Chundru)
- Add a .get_ltssm() callback to get the LTSSM status without DBI, since
DBI may be inaccessible after PME_Turn_Off (Krishna Chaitanya Chundru)
- Power down PHY via PARF_PHY_CTRL before disabling rails/clocks to avoid
power leakage (Krishna Chaitanya Chundru)
- Decide whether suspend should put the link in L2 and power down using
pci_host_common_d3cold_possible() instead of checking whether ASPM L1 is
enabled (Krishna Chaitanya Chundru)
- Add qcom D3cold support to tear down interconnect bandwidth and OPP votes
(Krishna Chaitanya Chundru)
- Handle unsupported mixed PERST#/PHY DT configurations, e.g., PHY in RP
node while PERST# is in the RC node, but warn about the DT issue (Qiang
Yu)
- Add pcie_encode_t_power_on() to encode L1SS T_POWER_ON fields (Krishna
Chaitanya Chundru)
- Add dw_pcie_program_t_power_on() to program T_POWER_ON (Krishna Chaitanya
Chundru)
- Program qcom T_POWER_ON based on DT 't-power-on-us' property in case
hardware advertises incorrect values (Krishna Chaitanya Chundru)
- Disable ASPM L0s for SA8775P (Shawn Guo)
- Initialize DWC MSI lock for firmware-managed ECAM hosts, which don't use
the dw_pcie_host_init() path that initializes the lock (Yadu M G)
* pci/controller/dwc-qcom:
PCI: qcom: Initialize DWC MSI lock for firmware-managed ECAM hosts
PCI: qcom: Disable ASPM L0s for SA8775P
PCI: qcom: Program T_POWER_ON
PCI: dwc: Add dw_pcie_program_t_power_on() to program T_POWER_ON
PCI/ASPM: Add pcie_encode_t_power_on() helper to encode L1SS T_POWER_ON fields
PCI: qcom: Handle mixed PERST#/PHY DT configuration
PCI: qcom: Add D3cold support
PCI: dwc: Use common D3cold eligibility helper in suspend path
PCI: qcom: Power down PHY via PARF_PHY_CTRL before disabling rails/clocks
PCI: qcom: Add .get_ltssm() callback to query LTSSM status
PCI: host-common: Add pci_host_common_d3cold_possible() helper
PCI: qcom: Set max OPP before DBI access during resume
# Conflicts:
# drivers/pci/controller/pci-host-common.c
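
For context on the T_POWER_ON encoding mentioned above: the L1 PM Substates registers express the time as a 5-bit value multiplied by a scale of 2, 10, or 100 microseconds. The sketch below is a user-space illustration with hypothetical function names and signatures. It is not the pcie_encode_t_power_on() helper from this series, and rounding up to the next representable time is an assumption.

```c
#include <stdint.h>

/* Scale encodings 0, 1, 2 select 2 us, 10 us, 100 us units. */
static const uint32_t t_power_on_unit_us[] = { 2, 10, 100 };

/*
 * Pick the finest scale whose 5-bit value can hold 'us', rounding up.
 * Returns 0 on success, -1 if 'us' exceeds 31 * 100 us.
 */
static int encode_t_power_on(uint32_t us, uint8_t *scale, uint8_t *value)
{
	for (uint8_t s = 0; s < 3; s++) {
		uint32_t unit = t_power_on_unit_us[s];
		uint32_t v = us / unit + (us % unit != 0);

		if (v <= 31) {
			*scale = s;
			*value = (uint8_t)v;
			return 0;
		}
	}
	return -1;
}

/* Inverse mapping; 'scale' must be 0, 1, or 2. */
static uint32_t decode_t_power_on(uint8_t scale, uint8_t value)
{
	return t_power_on_unit_us[scale] * value;
}
```

A DT value of 70 us, for example, does not fit the 2 us scale (35 > 31) and is encoded as scale 1, value 7.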
|
|
- Propagate devm_add_action_or_reset() failure to fix probe error path
(Shuvam Pandey)
- Add a .remove() callback to deinitialize the host bridge and power off
the PHY (Shuvam Pandey)
* pci/controller/dwc-meson:
PCI: meson: Add missing remove callback
PCI: meson: Propagate devm_add_action_or_reset() failure
|
|
- Enable clock before PHY init for correct ordering (Florian Eckert)
- Add .start_link() callback so the driver works again (Florian Eckert)
- Stop overwriting the ATU base address discovered by
dw_pcie_get_resources() (Florian Eckert)
- Add DT 'atu' region since this is hardware-specific, and fall back to
driver default if lacking (Florian Eckert)
* pci/controller/dwc-intel-gw:
dt-bindings: PCI: intel,lgm-pcie: Add 'atu' resource
PCI: intel-gw: Fix ATU base address setup and add optional DT 'atu' region
PCI: intel-gw: Add .start_link() callback
PCI: intel-gw: Enable clock before PHY init
PCI: intel-gw: Move interrupt enable to own function
PCI: intel-gw: Remove unused PCIE_APP_INTX_OFST definition
|
|
- Move IMX6SX_GPR12_PCIE_TEST_POWERDOWN handling into the core reset
functions (Richard Zhu)
- Add pci_host_common_parse_ports() for use by any native driver to parse
Root Port properties (currently only reset GPIOs) (Sherry Sun)
- Assert PERST# before enabling regulators to ensure that even if power is
enabled, endpoint stays inactive until REFCLK is stable (Sherry Sun)
- Parse reset properties in Root Port nodes (falling back to host bridge)
to help support Key E connectors and the pwrctrl framework (Sherry Sun)
- Configure i.MX95 REF_USE_PAD before PHY reset (Richard Zhu)
- Assert i.MX95 ref_clk_en after reference clock stabilizes (Richard Zhu)
- Integrate new pwrctrl API for DTs with Root Port-level power supplies
(Sherry Sun)
* pci/controller/dwc-imx6:
PCI: imx6: Integrate new pwrctrl API
PCI: imx6: Assert ref_clk_en after reference clock stabilizes on i.MX95
PCI: imx6: Configure REF_USE_PAD before PHY reset for i.MX95
PCI: imx6: Parse 'reset-gpios' in Root Port nodes
PCI: imx6: Assert PERST# before enabling regulators
PCI: host-generic: Add common helpers for parsing Root Port properties
dt-bindings: PCI: fsl,imx6q-pcie: Add reset GPIO in Root Port node
PCI: imx6: Fix IMX6SX_GPR12_PCIE_TEST_POWERDOWN handling
|
|
- Assert PERST# on shutdown so any connected Endpoints are held in reset
during shutdown (Sai Krishna Musham)
* pci/controller/dwc-amd-mdb:
PCI: amd-mdb: Assert PERST# on shutdown
|
|
- Apply ECRC TLP Digest workaround for all DesignWare cores prior to 5.10a,
not just 4.90a and 5.00a (Manikanta Maddireddy)
- Use common struct dw_pcie 'mode' rather than duplicating it in artpec6,
dra7xx, dwc-pcie, and keembay driver structs (Hans Zhang)
- Use DEFINE_SHOW_ATTRIBUTE for ltssm_status debugfs to reduce boilerplate
and fix a seq_file memory leak by including a .release() callback (Hans
Zhang)
- Fix a signedness bug in fault injection test code (Dan Carpenter)
- Avoid NULL pointer dereference when tearing down debugfs for controller
that lacks RAS DES capability (Shuvam Pandey)
* pci/controller/dwc:
PCI: dwc: Avoid dwc_pcie_rasdes_debugfs_deinit() NULL dereference when no RAS DES capability
PCI: dwc: Fix signedness bug in fault injection test code
PCI: dwc: Use DEFINE_SHOW_ATTRIBUTE for ltssm_status debugfs
PCI: keembay: Use common mode field in struct dw_pcie
PCI: dwc: Use common mode field in struct dw_pcie
PCI: artpec6: Use common mode field in struct dw_pcie
PCI: dra7xx: Use common mode field in struct dw_pcie
PCI: dwc: Apply ECRC workaround for DesignWare cores prior to 5.10a
|
|
- Do not dispose of the parent IRQ mapping, which belongs to the parent
interrupt controller (Mahesh Vaidya)
- Fix chained IRQ handler ordering issue and resource leaks on probe
failure (Mahesh Vaidya)
* pci/controller/altera:
PCI: altera: Fix resource leaks on probe failure
PCI: altera: Do not dispose parent IRQ mapping
|
|
- Request bus reassignment when not probe-only to fix an enumeration
regression on Marvell CN106XX and possibly other DT-based systems
(Ratheesh Kannoth)
* pci/controller/host-common:
PCI: host-common: Request bus reassignment when not probe-only
|
|
- Add endpoint controller APIs for use by function drivers to discover
auxiliary blocks like DMA engines (Koichiro Den)
- Remember DesignWare eDMA engine base/size and expose them via the EPC
aux-resource API (Koichiro Den)
- Refactor endpoint doorbell allocation to allow non-MSI doorbells
(Koichiro Den)
- Add endpoint embedded doorbell fallback, used if MSI allocation fails
(Koichiro Den)
- Validate BAR index and remove dead BAR read in endpoint doorbell test
(Carlos Bilbao)
- Unwind MSI/MSI-X vectors if NTB initialization fails part-way through
(Koichiro Den)
- Cache sleepable pci_irq_vector() value at ISR setup to avoid calling it
from hardirq context (Koichiro Den)
- Validate doorbell count when configuring NTB and vNTB doorbells
(Manivannan Sadhasivam)
- Call sleepable pci_epc_raise_irq() from a work item instead of atomic
context, e.g., when setting bits in NTB peer doorbells in the
ntb_peer_db_set() path (Koichiro Den)
- Report 0-based vNTB doorbell vector to account for link event 0 and
historically skipped slot 1 (Koichiro Den)
- Reject unusable vNTB doorbell counts, e.g., if they don't allow space for
link event 0 and historically skipped slot 1 (Koichiro Den)
- Prevent configfs writes to vNTB db_count and other values that are
already in use after EPC attach (Koichiro Den)
- Account for vNTB db_valid reserved slots (link event 0 and historically
skipped slot 1) so they don't appear as valid doorbells (Koichiro Den)
- Implement vNTB .db_vector_count()/mask() for doorbells so clients can use
multiple vectors and avoid thundering herds (Koichiro Den)
- Report 0-based NTB doorbell vector to account for link event 0 and
historically skipped slot 1 (Koichiro Den)
- Fix doorbell bitmask and IRQ vector handling to clear only specified
bits, use the correct vector for non-contiguous Linux IRQ numbers, and
validate incoming vectors (Koichiro Den)
- Implement NTB .db_vector_count()/mask() for doorbells so clients can use
multiple vectors (Koichiro Den)
* pci/endpoint:
NTB: epf: Implement .db_vector_count()/mask() for doorbells
NTB: epf: Fix doorbell bitmask and IRQ vector handling
NTB: epf: Report 0-based doorbell vector via ntb_db_event()
NTB: epf: Make db_valid_mask cover only real doorbell bits
NTB: epf: Document legacy doorbell slot offset in ntb_epf_peer_db_set()
PCI: endpoint: pci-epf-vntb: Implement .db_vector_count()/mask() for doorbells
PCI: endpoint: pci-epf-vntb: Exclude reserved slots from db_valid_mask
PCI: endpoint: pci-epf-vntb: Guard configfs writes after EPC attach
PCI: endpoint: pci-epf-vntb: Reject unusable doorbell counts
PCI: endpoint: pci-epf-vntb: Report 0-based doorbell vector via ntb_db_event()
PCI: endpoint: pci-epf-vntb: Defer pci_epc_raise_irq() out of atomic context
PCI: endpoint: pci-epf-vntb: Document legacy MSI doorbell offset
PCI: endpoint: pci-epf-ntb: Add check to detect 'db_count' value of 0
PCI: endpoint: pci-epf-vntb: Add check to detect 'db_count' value of 0
NTB: epf: Avoid calling pci_irq_vector() from hardirq context
NTB: epf: Fix request_irq() unwind in ntb_epf_init_isr()
misc: pci_endpoint_test: Remove dead BAR read before doorbell trigger
misc: pci_endpoint_test: Validate BAR index in doorbell test
PCI: endpoint: pci-ep-msi: Add embedded doorbell fallback
PCI: endpoint: pci-epf-test: Reuse pre-exposed doorbell targets
PCI: endpoint: pci-epf-vntb: Reuse pre-exposed doorbells and IRQ flags
PCI: endpoint: pci-ep-msi: Refactor doorbell allocation for new backends
PCI: dwc: ep: Expose integrated eDMA resources via EPC aux-resource API
PCI: dwc: Record integrated eDMA register window
PCI: endpoint: Add auxiliary resource query API
|
|
- Add Gen6 Device IDs to the switchtec driver (Ben Reed)
* pci/switchtec:
PCI: switchtec: Add Gen6 Device IDs
|
|
- Avoid FLR for MediaTek MT7925 WiFi, where FLR fails after a VM terminates
uncleanly (Jose Ignacio Tornos Martinez)
- Avoid SBR for Qualcomm WCN6855/WCN7850 WiFi and SDX62/SDX65 modems,
which seem not to support it correctly (Jose Ignacio Tornos Martinez)
* pci/virtualization:
PCI: Avoid SBR for Qualcomm WCN6855/WCN7850 WiFi, SDX62/SDX65 modems
PCI: Avoid FLR for MediaTek MT7925 WiFi
|
|
- Require CAP_SYS_ADMIN to write to sysfs 'resourceN_resize' attributes
(Krzysztof Wilczyński)
- Convert PCI resource files to static attributes to avoid races that cause
'duplicate filename' warnings and boot panics (Krzysztof Wilczyński)
- Remove pci_create_sysfs_dev_files() and pci_remove_sysfs_dev_files(),
which are obsolete after converting to static attributes (Krzysztof
Wilczyński)
- Add security_locked_down(LOCKDOWN_PCI_ACCESS) to alpha PCI resource mmap
path to match the generic path (Krzysztof Wilczyński)
- Convert sysfs 'legacy_io' and 'legacy_mem' to static attributes
(Krzysztof Wilczyński)
- Remove pci_create_legacy_files() and pci_sysfs_init(), which are obsolete
after converting to static attributes (Krzysztof Wilczyński)
- Expose sysfs 'resourceN_resize' attributes only on platforms with PCI
mmap (Krzysztof Wilczyński)
- Use kstrtobool() to parse the 'rom' attribute input to avoid the
unexpected behavior of enabling the ROM when writing '0' with no trailing
newline (Krzysztof Wilczyński)
* pci/sysfs:
PCI/sysfs: Use kstrtobool() to parse the ROM attribute input
PCI/sysfs: Limit BAR resize attribute scope to platforms with PCI mmap
PCI/sysfs: Remove pci_create_legacy_files() and pci_sysfs_init()
PCI/sysfs: Convert legacy I/O and memory attributes to static definitions
PCI/sysfs: Add __weak pci_legacy_has_sparse() helper
alpha/PCI: Compute legacy size in pci_mmap_legacy_page_range()
PCI: Add macros for legacy I/O and memory address space sizes
PCI/sysfs: Remove pci_{create,remove}_sysfs_dev_files()
alpha/PCI: Convert resource files to static attributes
alpha/PCI: Add static PCI resource attribute macros
alpha/PCI: Remove WARN from __pci_mmap_fits() and __legacy_mmap_fits()
alpha/PCI: Fix __pci_mmap_fits() overflow for zero-length BARs
alpha/PCI: Use PCI resource accessor macros
alpha/PCI: Use BAR index in sysfs attr->private instead of resource pointer
alpha/PCI: Add security_locked_down() check to pci_mmap_resource()
PCI/sysfs: Limit pci_sysfs_init() late_initcall compile scope
PCI/sysfs: Add stubs for pci_{create,remove}_sysfs_dev_files()
PCI/sysfs: Warn about BAR resize failure in __resource_resize_store()
PCI/sysfs: Convert PCI resource files to static attributes
PCI/sysfs: Add static PCI resource attribute macros
PCI/sysfs: Add CAP_SYS_ADMIN check to __resource_resize_store()
PCI/sysfs: Split pci_llseek_resource() for device and legacy attributes
PCI/sysfs: Only allow supported resource types in I/O and MMIO helpers
PCI: Add pci_resource_is_io() and pci_resource_is_mem() helpers
PCI/sysfs: Use PCI resource accessor macros
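
The 'rom' attribute behavior described above can be illustrated in user space. The legacy check below is reconstructed from the description (only a two-byte write starting with '0' disables the ROM) and is an assumption about the old code's shape. `parse_bool_strict()` is a simplified, hypothetical stand-in for kstrtobool(), which accepts more spellings than shown here.

```c
#include <stddef.h>

/* Assumed legacy shape: only the 2-byte write "0\n" disables the ROM. */
static int rom_enabled_legacy(const char *buf, size_t count)
{
	return !(count == 2 && buf[0] == '0');
}

/*
 * Simplified boolean parser: the first character decides, and
 * unrecognized input is rejected instead of enabling the ROM.
 * Returns 0 and stores the result in *res, or -1 on invalid input.
 */
static int parse_bool_strict(const char *buf, size_t count, int *res)
{
	if (count == 0)
		return -1;

	switch (buf[0]) {
	case '1': case 'y': case 'Y':
		*res = 1;
		return 0;
	case '0': case 'n': case 'N':
		*res = 0;
		return 0;
	default:
		return -1;
	}
}
```

Under the legacy check, writing "0" without a newline has a count of 1 and therefore enables the ROM; the parser-based check treats "0" and "0\n" the same way.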
|
|
- Check option ROM header signatures and lengths before accessing ROM data,
to avoid page faults and alignment faults (Guixin Liu)
* pci/rom:
PCI: Check ROM header and data structure addr before accessing
PCI: Introduce named defines for PCI ROM
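
A minimal user-space sketch of this kind of validation follows. It checks the 0xAA55 image signature, reads the 16-bit pointer at offset 0x18, and confirms that the "PCIR" data structure is 4-byte aligned and lies inside the buffer before touching it. The function and macro names are hypothetical and the kernel's actual checks may differ. The 0x18 minimum structure length assumes the PCI 2.x layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROM_SIG           0xAA55u  /* bytes 0x55 0xAA at offset 0 */
#define ROM_PCIR_PTR_OFF  0x18u    /* 16-bit LE pointer to "PCIR" */
#define PCIR_MIN_LEN      0x18u    /* assumed PCI 2.x structure length */

/* Read a little-endian 16-bit value without unaligned accesses. */
static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return 1 if buf[0..len) starts with a plausible ROM image header. */
static int rom_image_header_ok(const uint8_t *buf, size_t len)
{
	uint16_t pcir;

	if (len < ROM_PCIR_PTR_OFF + 2)
		return 0;
	if (rd16(buf) != ROM_SIG)
		return 0;

	pcir = rd16(buf + ROM_PCIR_PTR_OFF);
	if (pcir & 0x3)                      /* must be DWORD aligned */
		return 0;
	if ((size_t)pcir + PCIR_MIN_LEN > len)  /* must fit in the buffer */
		return 0;

	return memcmp(buf + pcir, "PCIR", 4) == 0;
}
```

Each bounds and alignment test runs before the corresponding read, which is the ordering the fix above is about.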
|
|
- Improve resource claim logging for debuggability (Ilpo Järvinen)
- Rename 'added' to 'add_list' for naming consistency (Ilpo Järvinen)
- Consolidate 'add_list' sanity checks (Ilpo Järvinen)
- Clean up several uses of const parameters (Ilpo Järvinen)
- Move pci_resource_alignment() from header to setup-res.c file (Ilpo
Järvinen)
* pci/resource:
PCI: Move pci_resource_alignment() to setup-res.c file
PCI: Convert pci_resource_alignment() input parameters to const
PCI: Make pci_sriov_resource_alignment() pci_dev const
powerpc/pseries: Make pseries_get_iov_fw_value() & pnv_iov_get() pci_dev const
resource: Make resource_alignment() input const resource
PCI: Remove const removal cast
PCI: Consolidate add_list (aka realloc_head) empty sanity checks
PCI: Rename 'added' to 'add_list'
PCI: Log all resource claims
|
|
- Log device readiness timeouts as errors, not warnings (Bjorn Helgaas)
- Wait for device readiness after soft reset (D3hot -> D0uninitialized
transition), when the device may respond with Request Retry Status if it
needs more time to initialize (Bjorn Helgaas)
- Drop unnecessary retries when restoring BARs (Lukas Wunner)
* pci/reset:
PCI: Drop unnecessary retries when restoring BARs
PCI: Wait for device readiness after D3hot -> D0uninitialized transition
PCI: Log device readiness timeouts as errors
|
|
- Don't try to power on/off devices unless we know they actually support
power control (Manivannan Sadhasivam)
* pci/pwrctrl:
PCI/pwrctrl: Lock device when calling device_is_bound()
PCI/pwrctrl: Do not try to power on/off devices that don't need pwrctrl
PCI/pwrctrl: Move pci_pwrctrl_is_required() earlier in file
|
|
- Fix race between pci_proc_init() and pci_bus_add_device() (Krzysztof
Wilczyński)
* pci/procfs:
PCI/proc: Fix race between pci_proc_init() and pci_bus_add_device()
|
|
- Set power state to 'unknown' for all devices, not just those with
drivers, during suspend (Lukas Wunner)
- Skip restoring Resizable BARs and VF Resizable BARs if device doesn't
respond to config reads, to avoid invalid array accesses (Marco
Nenciarini)
- Add pci_suspend_retains_context() so drivers can tell whether devices may
be reset while resuming from suspend due to platform issues; use this in
nvme to avoid issues on Qcom RCs (Manivannan Sadhasivam)
* pci/pm:
nvme-pci: Use pci_suspend_retains_context() during suspend
PCI: qcom: Indicate broken L1SS exit during resume from system suspend
PCI: Indicate context lost if L1SS exit is broken during resume from system suspend
PCI: Add pci_suspend_retains_context() to check if device state is preserved during suspend
PCI/IOV: Skip VF Resizable BAR restore on read error
PCI: Skip Resizable BAR restore on read error
PCI: Stop setting cached power state to 'unknown' on unbind
|
|
- Prevent P2PDMA as well as CPU access to non-mappable BARs, e.g., s390 ISM
BARs (Matt Evans)
- Add Intel QAT, DSA, IAA devices to whitelist (Lukas Wunner)
* pci/p2pdma:
PCI/P2PDMA: Add Intel QAT, DSA, IAA devices to whitelist
PCI/P2PDMA: Avoid returning a provider for non_mappable_bars
|
|
- Remove MPS/MRRS Kconfig settings (CONFIG_PCIE_BUS_*) that worked around a
WiFi device defect (Bjorn Helgaas)
- Always lift 2.5GT/s restriction in PCIe failed link retraining to avoid
clamping a link to 2.5GT/s after hot-plug changes the device (Maciej W.
Rozycki)
- Don't bother trying to retrain a 2.5GT/s link at 2.5GT/s since nothing
would be gained by the retrain (Maciej W. Rozycki)
* pci/enumeration:
PCI: Bail out early for 2.5GT/s devices in PCIe failed link retraining
PCI: Use pcie_get_speed_cap() in PCIe failed link retraining
PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
PCI: Remove MPS/MRRS Kconfig settings (CONFIG_PCIE_BUS_*)
|
|
pci_write_rom() controls access to the ROM content through the
corresponding sysfs attribute, and treats the input as a request to
disable only when it matches the string "0\n" exactly:
if ((off == 0) && (*buf == '0') && (count == 2))
The count == 2 condition encodes the trailing newline that echo(1) appends.
This was found when userspace wrote "0" without a trailing newline aiming
to disable access, which failed to match the condition above and enabled
access instead. For example:
$ echo 0 > rom # "0\n", count 2, access disabled
$ echo -n 0 > rom # "0", count 1, access enabled
$ echo > rom # "\n", count 1, access enabled (likely not desirable)
Parse the input with kstrtobool(), which handles common boolean inputs such
as "0", "1", "n", "y" or "off", "on", with or without a trailing newline,
so both "0\n" and "0" disable access, and update the now stale comment.
As a side effect, input that does not parse as a boolean is rejected with
-EINVAL rather than enabling access. The documented "0" and "1" continue
to work as before, and rejecting malformed input brings the attribute in
line with how sysfs attributes typically handle it.
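As a rough userspace illustration of the difference (not the kernel code
itself), the sketch below contrasts the old exact-match check with a
simplified stand-in for kstrtobool(). The names old_is_disable_request()
and parse_bool_sketch() are hypothetical, and the parser only approximates
the kernel helper by looking at the leading characters:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Old check: a disable request only when the write is exactly "0\n" at offset 0. */
static bool old_is_disable_request(const char *buf, size_t off, size_t count)
{
	return off == 0 && *buf == '0' && count == 2;
}

/*
 * Simplified stand-in for kstrtobool() (hypothetical name). It inspects
 * only the leading characters, so a trailing newline does not matter.
 */
static int parse_bool_sketch(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	}
	return -EINVAL;	/* anything else, including a lone "\n" */
}
```

With this shape, "0" and "0\n" both parse as false, while a bare newline is
rejected instead of silently enabling access.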
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260612182448.552406-1-kwilczynski@kernel.org
|
|
Currently, __resource_resize_store() uses sysfs_remove_groups() and
sysfs_create_groups() on pci_dev_resource_attr_groups to tear down and
recreate the resourceN files after a BAR resize, so the updated BAR sizes
are visible in sysfs.
The resourceN files only exist on platforms that define HAVE_PCI_MMAP or
ARCH_GENERIC_PCI_MMAP_RESOURCE. On platforms that define neither,
pci_dev_resource_attr_groups is NULL and the sysfs_remove_groups() and
sysfs_create_groups() calls in __resource_resize_store() become no-ops.
Resizable BAR (ReBAR) is a PCI Express Extended Capability
(PCI_EXT_CAP_ID_REBAR) that requires PCIe extended config space. Every
PCIe-capable architecture defines HAVE_PCI_MMAP or
ARCH_GENERIC_PCI_MMAP_RESOURCE (via arch headers or the asm-generic/pci.h
fallback). Architectures without either only support conventional PCI and
cannot have any ReBAR-capable devices.
Move the resize show and store helpers, the per-BAR attribute definitions,
and the attribute group behind the existing #ifdef HAVE_PCI_MMAP ||
ARCH_GENERIC_PCI_MMAP_RESOURCE guard, and fold the group reference in
pci_dev_groups[] into the existing #if block.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-25-kwilczynski@kernel.org
|
|
Currently, pci_create_legacy_files() and pci_remove_legacy_files() are
no-op stubs. With legacy attributes now handled by static groups
registered via pcibus_groups[], no call site needs them.
Remove both functions, their declarations, and the call sites in
pci_register_host_bridge(), pci_alloc_child_bus(), and pci_remove_bus().
Remove the pci_sysfs_init() late_initcall and sysfs_initialized. The
late_initcall originally existed to create all the dynamic PCI sysfs files,
but with both resource and legacy attributes now handled by static groups,
it is no longer needed.
Remove the legacy_io and legacy_mem fields from struct pci_bus which were
used to track the dynamically allocated legacy attributes.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-24-kwilczynski@kernel.org
|
|
Currently, legacy_io and legacy_mem are dynamically allocated and created
by pci_create_legacy_files(), with pci_adjust_legacy_attr() updating the
attributes at runtime on Alpha to rename them and shift the size for sparse
addressing.
Convert to four static const attributes (legacy_io, legacy_io_sparse,
legacy_mem, legacy_mem_sparse) with .is_bin_visible() callbacks that use
pci_legacy_has_sparse() to select the appropriate variant per bus. The
sizes are compile-time constants and .size is set directly on each
attribute.
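A minimal userspace sketch of how a per-bus visibility callback might pick
between the dense and sparse variants is shown below. All names are
hypothetical stand-ins, the weak default mirrors the idea of
pci_legacy_has_sparse() rather than its real signature, and
__attribute__((weak)) assumes GCC or Clang:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_bus. */
struct legacy_bus_sketch {
	int number;
};

/* Default: no sparse support. An architecture could override this weak symbol. */
__attribute__((weak)) bool legacy_has_sparse_sketch(const struct legacy_bus_sketch *bus)
{
	(void)bus;
	return false;
}

/* Visibility callbacks return the file mode, or 0 to hide the attribute. */
static unsigned int legacy_dense_visible(const struct legacy_bus_sketch *bus,
					 unsigned int mode)
{
	return legacy_has_sparse_sketch(bus) ? 0 : mode;
}

static unsigned int legacy_sparse_visible(const struct legacy_bus_sketch *bus,
					  unsigned int mode)
{
	return legacy_has_sparse_sketch(bus) ? mode : 0;
}
```

The sketch assumes exactly one variant of each pair is visible on a given
bus; with the default helper only the dense files would appear.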
Register the groups in pcibus_groups[] under a HAVE_PCI_LEGACY guard so the
driver model handles creation and removal automatically.
Stub out pci_create_legacy_files() and pci_remove_legacy_files() as the
dynamic creation is no longer needed. Remove the __weak
pci_adjust_legacy_attr(), Alpha's override, and its declaration from both
Alpha and PowerPC asm/pci.h headers.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-23-kwilczynski@kernel.org
|
|
Currently, Alpha's sparse/dense legacy attribute handling is done via
pci_adjust_legacy_attr(), which updates dynamically allocated attributes at
runtime. The upcoming conversion to static attributes needs a way to
determine sparse support at visibility check time.
Add a __weak pci_legacy_has_sparse() that returns false by default. Alpha
overrides it to check has_sparse() on the bus host controller.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-22-kwilczynski@kernel.org
|
|
Add defines for the standard PCI legacy address space sizes, replacing the
raw literals used by the legacy sysfs attributes.
Then, replace open-coded values with the newly added macros.
No functional changes intended.
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-20-kwilczynski@kernel.org
|
|
Currently, pci_create_sysfs_dev_files() and pci_remove_sysfs_dev_files()
are no-op stubs. With both the generic and Alpha resource files now
handled by static attribute groups, no platform needs dynamic per-device
sysfs file creation.
Remove both functions, their declarations, and the call sites in
pci_bus_add_device() and pci_stop_dev().
Remove __weak pci_create_resource_files() and pci_remove_resource_files()
stubs and their declarations in pci.h, as no architecture overrides them
anymore.
Remove the res_attr[] and res_attr_wc[] fields from struct pci_dev which
were used to track dynamically allocated resource attributes.
Finally, simplify pci_sysfs_init() to only handle legacy file creation
under HAVE_PCI_LEGACY, removing the per-device loop and the
HAVE_PCI_SYSFS_INIT helper added earlier.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-19-kwilczynski@kernel.org
|
|
Currently, pci_sysfs_init() and sysfs_initialized compile unconditionally,
even on platforms where static attribute groups handle all resource file
creation.
Place them behind a new HAVE_PCI_SYSFS_INIT macro, especially as the
late_initcall is only needed when:
- HAVE_PCI_LEGACY is set, to iterate buses and create legacy I/O and
memory files.
- Neither HAVE_PCI_MMAP nor ARCH_GENERIC_PCI_MMAP_RESOURCE is set, to
iterate devices and create resource files via the __weak
pci_create_resource_files() stub override (this is how the Alpha
architecture handles this currently).
On most systems both conditions are false and the entire late_initcall
compiles away.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-11-kwilczynski@kernel.org
|
|
On platforms with HAVE_PCI_MMAP or ARCH_GENERIC_PCI_MMAP_RESOURCE, resource
files are now handled by static attribute groups registered via
pci_dev_groups[].
Stub out pci_create_sysfs_dev_files() and pci_remove_sysfs_dev_files(),
as the dynamic resource file creation is no longer needed.
Also, simplify pci_sysfs_init() on these platforms to only iterate buses
for legacy attribute creation, skipping the per-device loop.
Move the __weak stubs for pci_create_resource_files() and
pci_remove_resource_files() into the #else branch since only platforms
without HAVE_PCI_MMAP (such as Alpha architecture) still need them. Guard
the res_attr[] and res_attr_wc[] fields in struct pci_dev the same way.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-10-kwilczynski@kernel.org
|
|
Add a pci_warn() to __resource_resize_store(), so that BAR resize failures
are visible to the user, which can help troubleshoot any potential resource
resize issues.
While at it, rename resource_resize_is_visible() to
resource_resize_attr_is_visible() along with the corresponding group
variable to align with the naming convention used by the resource attribute
groups.
Also, change the order of pci_dev_groups[] such that the resize group is
now located alongside the other resource groups.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-9-kwilczynski@kernel.org
|
|
Currently, the PCI resource files (resourceN, resourceN_wc) are dynamically
created by pci_create_sysfs_dev_files(), called from both
pci_bus_add_device() and the pci_sysfs_init() late_initcall, with only a
sysfs_initialized flag for synchronisation. This has caused warnings and
boot panics when both paths race on the same device, e.g.:
sysfs: cannot create duplicate filename '/devices/pci0000:3c/0000:3c:01.0/0000:3e:00.2/resource2'
This is especially likely on Devicetree-based platforms, where the PCI host
controllers are platform drivers that probe via the driver model, which can
happen during or after the late_initcall. As such, pci_bus_add_device()
and pci_sysfs_init() are more likely to overlap.
Convert to static const attributes with three attribute groups (I/O, UC,
WC), each with an .is_bin_visible() callback that checks resource flags,
BAR length, and non_mappable_bars. A .bin_size() callback provides
pci_resource_len() to the kernfs node for correct stat and lseek behaviour.
As part of this conversion:
- Rename pci_read_resource_io() and pci_write_resource_io() to
pci_read_resource() and pci_write_resource() since the callbacks are no
longer I/O-specific in the static attribute context.
- Update __resource_resize_store() to use sysfs_create_groups() and
sysfs_remove_groups(), which re-evaluates visibility and runs the
.bin_size() callback for the static resource attribute groups.
- Remove pci_create_resource_files(), pci_remove_resource_files(), and
pci_create_attr() which are no longer needed.
- Move the __weak stubs outside the #if guard so they remain available
for callers converted in subsequent commits.
Platforms that do not define the HAVE_PCI_MMAP macro or the
ARCH_GENERIC_PCI_MMAP_RESOURCE macro, such as Alpha architecture,
continue using their platform-specific resource file creation.
For reference, the dynamic creation dates back to the pre-Git era:
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/drivers/pci/pci-sysfs.c?id=42298be0eeb5ae98453b3374c36161b05a46c5dc
The write-combine support was added in commit 45aec1ae72fc ("x86: PAT
export resource_wc in pci sysfs").
Many other reports are mentioned in the cover letter (first Link: below).
Link: https://lore.kernel.org/r/20260508043543.217179-1-kwilczynski@kernel.org/
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215515
Closes: https://github.com/openwrt/openwrt/issues/17143
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-8-kwilczynski@kernel.org
|
|
pci_proc_attach_device() creates procfs entries for PCI devices and is
called from pci_bus_add_device(). It lazily creates the per-bus procfs
directory (bus->procdir) via proc_mkdir() on first use, and returns early
if proc_initialized is not yet set.
On x86 with ACPI, PCI enumeration occurs at subsys_initcall, before
pci_proc_init() sets proc_initialized at device_initcall. The
for_each_pci_dev() loop in pci_proc_init() then creates procfs entries for
these already-enumerated devices, but runs without holding
pci_rescan_remove_lock.
On ARM64 with devicetree, PCI host bridges probe at device_initcall. With
async probing enabled, pci_bus_add_device() can run concurrently with
pci_proc_init(), and both may call pci_proc_attach_device() for the same
device or for different devices on the same bus. As pci_host_probe() holds
pci_rescan_remove_lock while pci_proc_init() does not, there is no
serialisation between the two paths.
When two threads concurrently call pci_proc_attach_device() for devices on
the same bus, both observe bus->procdir as NULL and both call proc_mkdir().
The proc filesystem serialises directory creation internally, so only one
caller succeeds. The other results in a warning like:
proc_dir_entry '000c:00/00.0' already registered
The caller receives NULL (duplicate entry) and unconditionally stores it to
bus->procdir, corrupting the valid pointer set by the first caller.
Serialise access to proc_initialized, proc_bus_pci_dir, bus->procdir and
dev->procent with a new mutex local to drivers/pci/proc.c, and store the
created entries to bus->procdir and dev->procent only on success, so a
failed creation can never overwrite a valid pointer.
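The "store only on success, under a lock" pattern can be sketched in
userspace as follows. This is an illustration with a pthread mutex and
hypothetical types, not the code in drivers/pci/proc.c; the 'create'
callback stands in for proc_mkdir():

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a proc directory entry and struct pci_bus. */
struct dir_sketch { int id; };
struct proc_bus_sketch { struct dir_sketch *procdir; };

static pthread_mutex_t proc_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static struct dir_sketch the_dir_sketch = { 1 };

/* Two sample creators: one succeeds, one fails like a duplicate entry would. */
static struct dir_sketch *create_ok_sketch(void) { return &the_dir_sketch; }
static struct dir_sketch *create_fail_sketch(void) { return NULL; }

/* Lazily set bus->procdir; returns 0 on success and -1 if creation failed. */
static int attach_bus_dir(struct proc_bus_sketch *bus,
			  struct dir_sketch *(*create)(void))
{
	struct dir_sketch *dir;
	int ret = 0;

	pthread_mutex_lock(&proc_lock_sketch);
	if (!bus->procdir) {
		dir = create();
		if (dir)
			bus->procdir = dir;	/* store only on success */
		else
			ret = -1;
	}
	pthread_mutex_unlock(&proc_lock_sketch);
	return ret;
}
```

Because the NULL check and the store happen under one lock, a second caller
either sees the valid pointer or creates the entry itself, and a failed
creation leaves an existing pointer untouched.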
Additionally, wrap the for_each_pci_dev() loop in pci_proc_init() with
pci_lock_rescan_remove() to serialise against concurrent PCI bus
operations, add an early return in pci_proc_attach_device() when
dev->procent is already set to make the function idempotent, and clear
bus->procdir in pci_proc_detach_bus() to prevent use of a dangling pointer
after proc_remove().
Reported-by: Shuan He <heshuan@bytedance.com>
Closes: https://lore.kernel.org/linux-pci/20250702155112.40124-2-heshuan@bytedance.com/
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20260611150543.511422-1-kwilczynski@kernel.org
|
|
Replace the unconditional msleep(100) with the common helper
pci_host_common_link_train_delay(). The helper only waits when
max_link_speed > 2, as required by PCIe r6.0 sec 6.6.1.
This avoids unnecessary delay for Gen1/Gen2 links while retaining
the mandatory 100 ms for higher speeds.
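The speed check described above amounts to something like the following
sketch. The function name and the choice to return a delay instead of
sleeping are assumptions for illustration; the real
pci_host_common_link_train_delay() signature is not shown in this log:

```c
/*
 * Hypothetical helper: returns the post-link-training delay in ms.
 * max_link_speed uses the generation numbering, so 1 = 2.5 GT/s,
 * 2 = 5.0 GT/s, and 3 or more is faster than 5.0 GT/s.
 */
static unsigned int link_train_delay_ms_sketch(int max_link_speed)
{
	return max_link_speed > 2 ? 100 : 0;
}
```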
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260518004246.1384532-8-18255117159@163.com
|
|
The MediaTek Gen3 PCIe host driver lacks the required 100 ms delay after
link training completes for speeds > 5.0 GT/s, as specified in PCIe r6.0
sec 6.6.1.
The driver already stores max_link_speed (from the device tree). After
mtk_pcie_startup_port() successfully brings up the link, call
pci_host_common_link_train_delay() to comply with the specification.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260518004246.1384532-7-18255117159@163.com
|
|
The Aardvark PCIe controller driver waits for the link to come up but
does not implement the mandatory 100 ms delay after link training
completes for speeds greater than 5.0 GT/s (PCIe r6.0 sec 6.6.1).
The driver already maintains a 'link_gen' field that holds the negotiated
link speed. Use it together with pci_host_common_link_train_delay() to
insert the required delay immediately after confirming that the link
is up.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260518004246.1384532-6-18255117159@163.com
|
|
The DWC driver already implements the 100 ms delay required by PCIe
r6.0 sec 6.6.1 by checking pci->max_link_speed and calling msleep(100).
Replace the open-coded msleep() with the new common helper
pci_host_common_link_train_delay() to reduce code duplication and
improve maintainability. No functional change intended.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260518004246.1384532-5-18255117159@163.com
|
|
The Cadence HPA (High Performance Architecture IP) specific link setup
function cdns_pcie_hpa_host_link_setup() waits for the link to come up
but does not implement the required 100 ms delay after link training
completes for speeds > 5.0 GT/s (PCIe r6.0 sec 6.6.1).
Add a call to pci_host_common_link_train_delay() immediately after the
link is confirmed to be up, using the max_link_speed field. Also, in the
HPA host setup function, read the device tree property "max-link-speed"
to initialize max_link_speed if not already set by a glue driver.
This ensures compliance for HPA-based platforms.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: driver tag "cadence: HPA:" -> "cadence-hpa:"]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260518004246.1384532-4-18255117159@163.com
|
|
pci_resource_alignment() is a bit on the complex side to have in a header
so put it into setup-res.c.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429122617.7324-10-ilpo.jarvinen@linux.intel.com
|
|
pci_resource_alignment() calculates resource alignment and should not alter
its input structs. Make its input parameters const.
This also requires making the pci_cardbus_resource_alignment() input const.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429122617.7324-9-ilpo.jarvinen@linux.intel.com
|
|
pci_sriov_resource_alignment() takes a struct pci_dev, which it should not
need to alter to calculate alignment.
Make the pci_dev input of pci_sriov_resource_alignment() const. This
requires making the pci_iov_resource_size() input const as well.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429122617.7324-8-ilpo.jarvinen@linux.intel.com
|
|
__pci_bridge_assign_resources() inputs const pci_dev *bridge, but then
immediately casts const away to pass the bridge to
pdev_assign_resources_sorted().
As pdev_assign_resources_sorted() performs assignment of resources, it
is not possible to make its input parameter const. Neither of the
__pci_bridge_assign_resources() callers requires the bridge parameter
to be const.
Thus, simply remove the out of place cast and convert the input parameter
to non-const.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429122617.7324-5-ilpo.jarvinen@linux.intel.com
|
|
Callers of __pci_bridge_assign_resources() and __pci_bus_assign_resources()
perform WARN_ON_ONCE(list_empty(add_list)) checks to sanity check that all
optional sizes were processed (and removed) from the list. The empty list
sanity check is duplicated code so the more appropriate place for it would
be inside the called function.
Placing the empty list check into __pci_bus_assign_resources() also ensures
all callsites do perform the sanity check which currently is not the case
when being called from enable_slot(). This inconsistency was noted by
Sashiko, though only inside its in-depth log and not flagged as a real
problem, possibly because this is only a sanity check that should never
fire. Nonetheless, this sanity check has been very useful to catch problems
early in the past so it's good to do it consistently everywhere.
As __pci_bus_assign_resources() is a recursive function, it needs to be
renamed to __pci_bus_assign_resources_one() to only perform the empty list
check at the end of processing the entire hierarchy in
__pci_bus_assign_resources().
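The split into a recursive worker and a checking wrapper can be illustrated
with the sketch below. The types, names, and the integer used in place of
the add_list are all hypothetical simplifications:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bus; a single child chain keeps this short. */
struct bus_node_sketch {
	int pending;			/* optional sizes this bus consumes */
	struct bus_node_sketch *child;
};

/* Recursive worker: consume entries for this bus, then descend. */
static void assign_one_sketch(struct bus_node_sketch *bus, int *add_list_len)
{
	*add_list_len -= bus->pending;
	if (bus->child)
		assign_one_sketch(bus->child, add_list_len);
}

/*
 * Entry point: run the whole recursion, then check the list exactly once.
 * Returns true when the list is empty; the kernel would warn otherwise.
 */
static bool assign_sketch(struct bus_node_sketch *root, int *add_list_len)
{
	assign_one_sketch(root, add_list_len);
	return *add_list_len == 0;
}
```

Checking inside the recursive worker would fire while entries for sibling
or parent buses are still legitimately pending, which is why the check sits
in the wrapper.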
Suggested-by: sashiko.dev # Sanity check missing from enable_slot()
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429122617.7324-4-ilpo.jarvinen@linux.intel.com
|
|
The resource fitting algorithm uses different names for the list holding
the optional sizes: added, add_head, add_list, and realloc_head. 'add_list'
sounds the most natural and some of the related variables also use 'add'
such as 'add_size'.
To reduce variation, rename 'added' and 'add_head' to 'add_list'. Also
rename some 'realloc_head' cases selectively to 'add_list'.
While it would be nice to rename every 'realloc_head' to 'add_list' for
consistency, it might create a backport headache with all the work going
into this algorithm that may need to be eventually backported. Thus, it's
better to leave 'realloc_head' as is for now.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429122617.7324-3-ilpo.jarvinen@linux.intel.com
|
|
Implement .db_vector_count() and .db_vector_mask() so NTB core/clients can
map doorbell events to per-vector work and avoid the thundering-herd
behavior.
pci-epf-vntb reserves two slots in db_count: slot 0 for link events and
slot 1 which is historically unused. Therefore the number of doorbell
vectors is (db_count - 2).
Report vectors as 0..N-1 and return BIT_ULL(db_vector) for the
corresponding doorbell bit. Build db_valid_mask from a validated vector
count so out-of-range db_count values cannot create invalid shifts.
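A small sketch of the vector arithmetic described here, under the
assumption that the mask covers at most 64 doorbell bits; the function
names are illustrative and not the driver's callbacks:

```c
#include <stdint.h>

#define BIT_ULL_SKETCH(n)	(1ULL << (n))

/* Slots 0 (link) and 1 (unused) are reserved, so vectors = db_count - 2. */
static int db_vector_count_sketch(unsigned int db_count)
{
	if (db_count < 2 || db_count - 2 > 64)
		return 0;
	return (int)(db_count - 2);
}

/* Doorbell bits covered by one vector; 0 for an out-of-range vector. */
static uint64_t db_vector_mask_sketch(unsigned int db_count, int db_vector)
{
	if (db_vector < 0 || db_vector >= db_vector_count_sketch(db_count))
		return 0;
	return BIT_ULL_SKETCH(db_vector);
}
```

For db_count = 6 this yields four vectors, numbered 0..3, each mapping to
a single doorbell bit.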
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260513024923.451765-8-den@valinux.co.jp
|
|
In pci-epf-vntb, db_count represents the total number of doorbell slots
exposed to the peer, including:
- slot #0 reserved for link events, and
- slot #1 historically unused (kept for compatibility).
Only the remaining slots correspond to actual doorbell bits. The current
db_valid_mask() exposes all slots as valid doorbells.
Limit db_valid_mask() to the real doorbell bits by returning
BIT_ULL(db_count - 2) - 1, and guard against db_count < 2.
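The mask computation can be sketched as below. The function name is
hypothetical, and the clamp for 64 or more doorbells is an added assumption
to keep the shift well defined in this standalone example:

```c
#include <stdint.h>

/* Valid doorbell bits for a given slot count: BIT_ULL(db_count - 2) - 1. */
static uint64_t db_valid_mask_sketch(unsigned int db_count)
{
	unsigned int nr;

	if (db_count < 2)
		return 0;		/* nothing left after the reserved slots */
	nr = db_count - 2;
	if (nr >= 64)
		return UINT64_MAX;	/* avoid an undefined 64-bit shift */
	return (1ULL << nr) - 1;
}
```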
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260513024923.451765-7-den@valinux.co.jp
|
|
db_count controls how many doorbell slots are allocated and exposed. It is
also used by the doorbell mask helpers. After an EPC has been attached,
changing it from configfs can leave runtime paths using a different count
than the one used to set up the doorbell resources.
Reject db_count writes after EPC attach, and reject values outside
MIN_DB_COUNT..MAX_DB_COUNT before attach. Now that MIN_DB_COUNT documents
the usable doorbell floor, use it in the store path too.
While at it, apply the same after-attach guard to the other vNTB configfs
knobs. BAR choices, spad_count, memory-window counts and sizes, and the
virtual PCI IDs are also consumed during bind, so changing them later at
runtime is meaningless and unsafe.
Return -EOPNOTSUPP for after-attach writes. The value itself may be valid,
but changing it in that state is not supported.
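The ordering of the two checks might look like the sketch below. The bound
values, the struct, and the use of -EINVAL for out-of-range input are
assumptions for illustration; only -EOPNOTSUPP for after-attach writes is
stated above:

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed bounds for illustration; the driver's actual values may differ. */
#define MIN_DB_COUNT_SKETCH	3
#define MAX_DB_COUNT_SKETCH	32

struct vntb_cfg_sketch {
	bool epc_attached;
	unsigned int db_count;
};

static int db_count_store_sketch(struct vntb_cfg_sketch *cfg, unsigned int val)
{
	if (cfg->epc_attached)
		return -EOPNOTSUPP;	/* the value may be valid, the state is not */
	if (val < MIN_DB_COUNT_SKETCH || val > MAX_DB_COUNT_SKETCH)
		return -EINVAL;		/* assumed error code for a bad range */
	cfg->db_count = val;
	return 0;
}
```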
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260513024923.451765-6-den@valinux.co.jp
|
|
pci-epf-vntb reserves slot 0 for link events and keeps slot 1 unused for
legacy layout compatibility. A db_count smaller than MIN_DB_COUNT leaves
no usable doorbell slot after those reservations.
Reject such configurations when configuring interrupts.
While at it, move MAX_DB_COUNT next to MIN_DB_COUNT. They are used as a
pair in the range check, and keeping them together makes the valid doorbell
range easier to read.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260513024923.451765-5-den@valinux.co.jp
|
|
ntb_db_event() expects the vector number to be relative to the first
doorbell vector starting at 0.
pci-epf-vntb reserves vector 0 for link events and uses higher vector
indices for doorbells. By passing the raw slot index to ntb_db_event(),
it effectively assumes that doorbell 0 maps to vector 1.
However, because the host uses a legacy slot layout and writes doorbell
0 into the third slot, doorbell 0 ultimately appears as vector 2 from
the NTB core perspective.
Adjust pci-epf-vntb to:
- skip the unused second slot, and
- report doorbells as 0-based vectors (DB#0 -> vector 0).
This change does not introduce a behavioral difference until
.db_vector_count()/.db_vector_mask() are implemented, because without
those callbacks NTB clients effectively ignore the vector number.
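The adjusted slot-to-vector mapping can be summarised by this sketch, where
the function name and the -1 return for reserved slots are illustrative
choices:

```c
/* Returns the 0-based doorbell vector for a slot, or -1 for reserved slots. */
static int slot_to_db_vector_sketch(int slot)
{
	if (slot < 2)		/* slot 0: link events, slot 1: unused */
		return -1;
	return slot - 2;	/* slot 2 carries DB#0, reported as vector 0 */
}
```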
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260513024923.451765-4-den@valinux.co.jp
|
|
The NTB .peer_db_set() callback may be invoked from atomic context.
pci-epf-vntb currently calls pci_epc_raise_irq() directly, but
pci_epc_raise_irq() may sleep (it takes epc->lock).
Avoid sleeping in atomic context by coalescing doorbell bits into an
atomic64 pending mask and raising MSIs from a work item. Limit the
amount of work per run to avoid monopolizing the workqueue under a
doorbell storm.
Clear stale pending bits before enabling the work item and after disabling
it during cleanup. Also mask requested doorbells against the currently
valid doorbell mask before queueing work, and iterate the pending u64 with
__ffs64() so high doorbell bits are handled correctly.
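A userspace approximation of the coalescing scheme, using C11 atomics and
the GCC/Clang builtin __builtin_ctzll() in place of the kernel's atomic64
and __ffs64() helpers. All names are hypothetical, and the 'raise' callback
stands in for raising an MSI:

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t db_pending_sketch;
static uint64_t raised_sketch;	/* sample callback state, for demonstration */

static void record_raise_sketch(unsigned int bit)
{
	raised_sketch |= 1ULL << bit;
}

/* Atomic-context side: record valid doorbells; nonzero means "queue work". */
static int db_request_sketch(uint64_t db_bits, uint64_t valid_mask)
{
	db_bits &= valid_mask;
	if (!db_bits)
		return 0;
	atomic_fetch_or(&db_pending_sketch, db_bits);
	return 1;
}

/* Work-item side: raise at most 'budget' doorbells, lowest bit first. */
static int db_work_sketch(int budget, void (*raise)(unsigned int bit))
{
	uint64_t pending = atomic_exchange(&db_pending_sketch, 0);
	int done = 0;

	while (pending && done < budget) {
		unsigned int bit = (unsigned int)__builtin_ctzll(pending);

		pending &= pending - 1;	/* clear the lowest set bit */
		raise(bit);
		done++;
	}
	if (pending)	/* budget exhausted: leave the rest for the next run */
		atomic_fetch_or(&db_pending_sketch, pending);
	return done;
}
```

Operating on the full 64-bit value is what lets bits above 31 be handled;
a 32-bit find-first-set would miss them.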
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260513024923.451765-3-den@valinux.co.jp
|
|
vntb_epf_peer_db_set() raises an MSI interrupt to notify the RC side of
a doorbell event. pci_epc_raise_irq(..., PCI_IRQ_MSI, interrupt_num)
takes a 1-based MSI interrupt number.
The ntb_hw_epf driver reserves MSI #1 for link events, so doorbells
would naturally start at MSI #2 (doorbell bit 0 -> MSI #2). However,
pci-epf-vntb has historically applied an extra offset and mapped doorbell
bit 0 to MSI #3. This matches the legacy behavior of ntb_hw_epf and has
been preserved since commit e35f56bb0330 ("PCI: endpoint: Support NTB
transfer between RC and EP").
This offset has not surfaced as a functional issue because:
- ntb_hw_epf typically allocates enough MSI vectors, so the off-by-one
still hits a valid MSI vector, and
- ntb_hw_epf does not implement .db_vector_count()/.db_vector_mask(), so
client drivers such as ntb_transport effectively ignore the vector
number and schedule all QPs.
Correcting the MSI number would break interoperability with peers
running older kernels.
Document the legacy offset to avoid confusion when enabling
per-db-vector handling.
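Expressed as arithmetic, the legacy numbering looks like the sketch below;
the function name is illustrative only:

```c
/* 1-based MSI number for a doorbell bit, including the legacy extra offset. */
static unsigned int db_bit_to_msi_sketch(unsigned int db_bit)
{
	/* MSI #1 carries link events and MSI #2 is skipped for legacy reasons. */
	return db_bit + 3;
}
```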
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260513024923.451765-2-den@valinux.co.jp
|
|
epf_ntb->db_count value should be within 1 to MAX_DB_COUNT. Current code
only checks for the upper bound, while the lower bound is unchecked. This
can cause a lot of issues in the driver if the user passes 'db_count' as 0.
Add a check for 0 also. While at it, remove the redundant 'db_count'
variable from epf_ntb_configure_interrupt().
Fixes: 8b821cf76150 ("PCI: endpoint: Add EP function driver to provide NTB functionality")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260407124421.282766-3-mani@kernel.org
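A minimal user-space sketch of the range check described above.
check_db_count() is a hypothetical helper, and the value 32 used for
SKETCH_MAX_DB_COUNT is an assumption for the example, not a value taken
from this commit message:

```c
#include <errno.h>

#define SKETCH_MAX_DB_COUNT 32	/* assumed limit, for illustration only */

/* Accept only 1..SKETCH_MAX_DB_COUNT; reject 0 as well as oversized values. */
static int check_db_count(unsigned int db_count)
{
	if (db_count == 0 || db_count > SKETCH_MAX_DB_COUNT)
		return -EINVAL;

	return 0;
}
```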
|
|
epf_ntb->db_count value should be within 1 to MAX_DB_COUNT. Current code
only checks for the upper bound, while the lower bound is unchecked. This
can cause a lot of issues in the driver if the user passes 'db_count' as 0.
Add a check for 0 also. While at it, remove the redundant 'db_count'
assignment.
Fixes: e35f56bb0330 ("PCI: endpoint: Support NTB transfer between RC and EP")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Koichiro Den <den@valinux.co.jp>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260407124421.282766-2-mani@kernel.org
|
|
pci_host_common_init() is used by several generic ECAM host drivers.
After PCI core changes around pci_flags and preserve_config, these hosts
no longer opted into full bus number reassignment the way they did
before, which broke enumeration of devices on a Marvell CN106XX board.
When PCI_PROBE_ONLY is not set, add PCI_REASSIGN_ALL_BUS so
pci_scan_bridge_extend() takes the reassignment path: bus numbers can be
assigned from firmware EA data (e.g. pci_ea_fixed_busnrs()). Skip the
flag in probe-only mode so existing assignments are not overridden.
Fixes: 7246a4520b4b ("PCI: Use preserve_config in place of pci_flags")
Closes: https://lore.kernel.org/all/abkqm_LCd9zAM8cW@rkannoth-OptiPlex-7090/
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
[mani: added stable tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: add problem report link]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Cc: Vidya Sagar <vidyas@nvidia.com>
Link: https://patch.msgid.link/20260414081730.3864372-1-rkannoth@marvell.com
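The flag decision described above can be modelled in plain C as follows.
This is only a sketch: the SKETCH_* bit values and the function name are
invented for the example, and the kernel keeps these flags in its own
global state with its own values.

```c
/* Illustrative flag bits; not the kernel's actual values. */
#define SKETCH_PCI_REASSIGN_ALL_BUS	0x1u
#define SKETCH_PCI_PROBE_ONLY		0x2u

/*
 * Request full bus number reassignment unless the platform asked for
 * probe-only mode, in which case existing assignments are left alone.
 */
static unsigned int host_common_adjust_flags(unsigned int flags)
{
	if (!(flags & SKETCH_PCI_PROBE_ONLY))
		flags |= SKETCH_PCI_REASSIGN_ALL_BUS;

	return flags;
}
```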
|
|
Hold the pci_rescan_remove_lock lock while stopping and removing a root bus
to avoid racing with concurrent rescan or hotplug operations triggered via
sysfs. Such races may lead to use-after-free issues or system crashes.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260521161822.132996-8-18255117159@163.com
|
|
Hold the pci_rescan_remove_lock lock while stopping and removing a root bus
to avoid racing with concurrent rescan or hotplug operations triggered via
sysfs. Such races may lead to use-after-free issues or system crashes.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260521161822.132996-10-18255117159@163.com
|
|
Hold the pci_rescan_remove_lock lock while stopping and removing a root bus
to avoid racing with concurrent rescan or hotplug operations triggered via
sysfs. Such races may lead to use-after-free issues or system crashes.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260521161822.132996-7-18255117159@163.com
|
|
Hold the pci_rescan_remove_lock lock while stopping and removing a root bus
to avoid racing with concurrent rescan or hotplug operations triggered via
sysfs. Such races may lead to use-after-free issues or system crashes.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260521161822.132996-6-18255117159@163.com
|
|
Hold the pci_rescan_remove_lock lock while stopping and removing a root bus
to avoid racing with concurrent rescan or hotplug operations triggered via
sysfs. Such races may lead to use-after-free issues or system crashes.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260521161822.132996-3-18255117159@163.com
|
|
Hold the pci_rescan_remove_lock lock while stopping and removing a root bus
to avoid racing with concurrent rescan or hotplug operations triggered via
sysfs. Such races may lead to use-after-free issues or system crashes.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260521161822.132996-2-18255117159@163.com
|
|
Hold the pci_rescan_remove_lock lock while stopping and removing a root bus
to avoid racing with concurrent rescan or hotplug operations triggered via
sysfs. Such races may lead to use-after-free issues or system crashes.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260521161822.132996-5-18255117159@163.com
|
|
Hold the pci_rescan_remove_lock lock while stopping and removing a root bus
to avoid racing with concurrent rescan or hotplug operations triggered via
sysfs. Such races may lead to use-after-free issues or system crashes.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260521161822.132996-4-18255117159@163.com
|
|
Commit b64aa11eb2dd ("PCI: Set bridge map_irq and swizzle_irq to default
functions") moved the assignment of default .map_irq() callback to
devm_of_pci_bridge_init() and removed the initialization of
'iproc_pcie::map_irq' in platform bus driver. This led to the callback
getting assigned the NULL pointer for platform bus driver, thereby breaking
the INTx functionality, since 'iproc_pcie::map_irq' overrides the
'pci_host_bridge::map_irq' callback in iproc_pcie_setup().
This issue only affected the iproc platform bus driver as this driver
relies on the default callback for non-PAXC controllers. iproc-brcm driver
was already providing the custom mapping function, so it was unaffected.
Restore the original (and intended) behaviour to use the default map_irq
function by removing the local 'iproc_pcie::map_irq' pointer and directly
assigning the 'pci_host_bridge::map_irq' callback in iproc-bcma driver.
This ensures that the default 'map_irq' callback is used for platform bus
driver and only iproc-brcm driver overrides it with a custom one.
Fixes: b64aa11eb2dd ("PCI: Set bridge map_irq and swizzle_irq to default functions")
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Ray Jui <ray.jui@broadcom.com>
Link: https://patch.msgid.link/20260430021628.1343154-1-mark.tomlinson@alliedtelesis.co.nz
|
|
Use u32-typed BIT and GENMASK helpers for PCIe interrupt register
masks. This keeps inverted masks in the same width as the registers
and avoids truncation warnings on 64-bit compile-test builds.
Fixes below and similar warnings:
drivers/pci/controller/pci-mvebu.c:316:21: error: implicit conversion from 'unsigned long' to 'u32' (aka 'unsigned int') changes value from 18446744069414584320 to 0 [-Werror,-Wconstant-conversion]
mvebu_writel(port, ~PCIE_INT_ALL_MASK, PCIE_INT_UNMASK_OFF);
Assisted-by: Codex:GPT-5.5
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260526044016.1025613-1-rosenp@gmail.com
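The width problem behind this warning can be reproduced outside the
kernel. The sketch below assumes the mask covers bits 31:0, which is
consistent with the value printed in the warning; the SKETCH_* macros
are local stand-ins, not the kernel's helpers.

```c
#include <stdint.h>

/* 64-bit mask builder, mimicking 'unsigned long' on a 64-bit build. */
#define SKETCH_GENMASK_U64(h, l) \
	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/* 32-bit mask builder, matching the width of the register. */
#define SKETCH_GENMASK_U32(h, l) \
	((uint32_t)((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h)))))

/* Inverting the wide mask sets the upper 32 bits. */
static uint64_t inverted_wide(void)
{
	return ~SKETCH_GENMASK_U64(31, 0);
}

/* Inverting the u32 mask stays within 32 bits. */
static uint32_t inverted_u32(void)
{
	return ~SKETCH_GENMASK_U32(31, 0);
}
```

With the wide mask the inverted value is 0xffffffff00000000, so passing
it to a u32 parameter discards set bits and the compiler reports the
conversion. With the u32 mask the inverted value is already 0 and no
conversion takes place.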
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Joerg Roedel:
"Core Code:
- Fix dma-iommu scatterlist length handling in the P2PDMA path
- Extend the generic IOMMU page-table code with detailed gather
support for more precise invalidations
- Add pending-gather tracking to generic page-table invalidation
handling
- Add support for smaller virtual address sizes in the generic AMDv1
page-table format, including KUnit coverage
- Fix page-size bitmap calculation for smaller VA configurations
- Rework Arm io-pgtable allocation/freeing to consistently use the
iommu-pages API and address-conversion helpers
- Add PCI ATS infrastructure for devices that require ATS, including
always-on ATS handling for pre-CXL devices
AMD IOMMU:
- Fix several IOTLB invalidation details, including PDE handling,
flush-all behavior, and command address encoding
- Honor IVINFO[VASIZE] when deriving address limits
- Fix premature loop termination in init_iommu_one()
- Add Hygon family 18h model 4h IOAPIC support
- Clean up legacy-mode handling, stale comments, dead IVMD
exclusion-range code, and unused address-size macros
Arm SMMU / Arm SMMU v3:
- SMMUv2:
- Device-tree binding updates for Qualcomm Hawi, Nord and Shikra
SoCs
- Constrain the clocks which can be specified for recent Qualcomm
SoCs
- Fix broken compatible string for Qualcomm prefetcher
configuration and add new entry for the Glymur MDSS
- Ensure SMMU is powered-up when writing context bank for Adreno
client
- SMMUv3:
- Fix off-by-one in queue allocation retry loop
- Enable hardware update of access/dirty bits from the SMMU
- Re-jig command construction to use separate inline helpers for
each command type
Intel VT-d:
- Add the PCI segment number to DMA fault messages
- Improve support for non-PRI mode SVA
- Ensure atomicity during context entry teardown
- Fix RB-tree corruption in the probe error path
RISC-V IOMMU:
- Add NAPOT range invalidation support
- Use detailed gather information for invalidation decisions
- Compute the best stride for single invalidations
- Advertise Svpbmt support to the generic page-table code
- Add capability definitions and clean up command macro encoding
VeriSilicon IOMMU:
- Add a new VeriSilicon IOMMU driver
- Add devicetree binding documentation and MAINTAINERS coverage
- Add the RK3588 VeriSilicon IOMMU node
- Apply small cleanups and warning fixes in the new driver
Rockchip IOMMU:
- Disable the fetch DTE time limit
Apple DART:
- Correct a stale CONFIG_PCIE_APPLE macro name in a comment"
* tag 'iommu-updates-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (66 commits)
iommu/dma-iommu: Fix wrong scatterlist length assignment in P2PDMA path
iommu/amd: Control INVALIDATE_IOMMU_PAGES PDE from the gather
iommu/amd: Make CMD_INV_IOMMU_ALL_PAGES_ADDRESS match the spec
iommu/amd: Have amd_iommu_domain_flush_pages() use last
iommu/amd: Pass last in through to build_inv_address()
iommu/amd: Simplify build_inv_address()
iommu/apple-dart: correct CONFIG_PCIE_APPLE macro name in comment
iommu/vt-d: Fix RB-tree corruption in probe error path
iommu/vt-d: Improve IOMMU fault information
iommu/vt-d: Remove typo from pasid_pte_config_nested()
iommu/vt-d: Clear Present bit before tearing down scalable-mode context entry
iommu/vt-d: Avoid WARNING in sva unbind path
dt-bindings: arm-smmu: Correct and add constraints for Hawi, Shikra and Kaanapali
dt-bindings: arm-smmu: Add compatible for Qualcomm Nord SoC
iommu/amd: Don't split flush for amd_iommu_domain_flush_all()
iommu/rockchip: disable fetch dte time limit
iommu/arm-smmu-v3: Allow ATS to be always on
PCI: Allow ATS to be always on for pre-CXL devices
PCI: Add pci_ats_required() for CXL.cache capable devices
iommu/vsi: Use list_for_each_entry()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DT core:
- Add support for handling multiple cells in "iommu-map" entries
- Support only 1 entry in /reserved-memory "reg" entries. Support for
more than 1 entry has been broken
- Fix a UAF on alloc_reserved_mem_array() failure
- Make "ibm,phandle" handling logic specific to PPC
- Use memcpy() instead of strcpy() for known length strings
- Ensure __of_find_n_match_cpu_property() handles malformed "reg"
entries
- Add various checks that expected strings are strings before
accessing them
- Drop redundant memset() when unflattening DT
DT bindings:
- Add a DTS style checker. Currently hooked up to dt_binding_check to
check examples
- Convert st,nomadik platform, ti,omap-dmm, and ti,irq-crossbar
bindings to DT schema
- Add Apple System Management Controller hwmon, Qualcomm Hamoa
Embedded Controller, Qualcomm IPQ6018 PWM controller, fsl,mc1323,
Samsung SOFEF01-M DDIC panel, Freescale i.MX53 Television Encoder,
Samsung S2M series PMIC extcon, and MT6365 PMIC AuxADC schemas
- Extend bindings for QCom Maili and Nord PDC, QCom Hali fastrpc,
qcom,eliza-imem, qcom,oryon-1-5 CPU, and MT6365 Keys
- Consolidate "sram" property definitions
- Fix constraints on "nvmem" properties which only contain phandles
and no arg cells
- Another pass of fixing "phandle-array" constraints
- Add Gira vendor prefix"
* tag 'devicetree-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (50 commits)
dt-bindings: interrupt-controller: qcom,pdc: Add Maili compatible string
dt-bindings: interrupt-controller: ti,irq-crossbar: Convert to DT schema
dt-bindings: vendor-prefixes: add Gira
dt-bindings: embedded-controller: Add Qualcomm reference device EC description
dt-bindings: pwm: add IPQ6018 binding
dt-bindings: hwmon: Add Apple System Management Controller hwmon schema
docs: dt: writing-schema: Clarify what is required in a schema
of: Respect #{iommu,msi}-cells in maps
of: Factor arguments passed to of_map_id() into a struct
of: Add convenience wrappers for of_map_id()
of: reserved_mem: zero total_reserved_mem_cnt if no valid /reserved-memory entry
of: reserved_mem: handle NULL name in of_reserved_mem_lookup()
dt-bindings: cache: l2c2x0: Add missing power-domains
dt-bindings: interrupt-controller: renesas,r9a09g077-icu: Fix reg size in example
dt-bindings: nvmem: consumer: Make 'nvmem' an array of one-item entries
drivers/of/overlay: Use memcpy() to copy known length strings
dt-bindings: add self-test fixtures for style checker
dt-bindings: wire style checker into dt_binding_check
scripts/jobserver-exec: propagate child exit status
dt-bindings: add DTS style checker
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
"There are a few added drivers, but mostly the normal maintenance to
drivers for firmware, memory controller and other soc specific
hardware:
- The NXP QuickEngine gets modern MSI support, which allows some
cleanups to the GICv3 irqchip driver
- A new SoC specific driver for the Renesas R-Car MFIS unit is added,
encapsulating support for the on-chip mailbox and hwspinlock
implementations that are not easily separated into individual
drivers
- The Qualcomm SoC drivers add support for additional SoC
implementations, and flexibility around power management for the
serial-engine driver as well as probing the LLCC driver using
custom hardware descriptions inside of the device itself.
- Added support for the Samsung thermal management unit
- A cleanup to the Tegra 'PMC' driver interfaces to remove legacy
APIs and allow multiple PMC instances everywhere.
- Updates to the TI SCI and KNAV drivers to improve suspend/resume
support.
- Minor driver changes for mediatek, xilinx, allwinner, aspeed,
tegra, broadcom, amd, microchip and starfive specific drivers
- Memory controller updates for Tegra and Renesas for additional SoC
types and other improvements.
- Firmware driver updates for Arm FF-A, SMCCC and SCMI interfaces, to
update driver probing, object lifetimes and address minor bugs"
* tag 'soc-drivers-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (189 commits)
Revert "firmware: zynqmp: Add dynamic CSU register discovery and sysfs interface"
Revert "Documentation: ABI: add sysfs interface for ZynqMP CSU registers"
memory: tegra234: drop dead NULL check in tegra234_mc_icc_aggregate()
memory: tegra264: drop redundant tegra264_mc_icc_aggregate()
memory: tegra186-emc: stop borrowing MC aggregate hook for EMC
soc: aspeed: cleanup dead default for ASPEED_SOCINFO
firmware: tegra: bpmp: Add support for multi-socket platforms
firmware: tegra: bpmp: Propagate debugfs errors
soc/tegra: pmc: Add Tegra238 support
soc/tegra: pmc: Restrict power-off handler to Nexus 7
soc/tegra: pmc: Populate powergate debugfs only when needed
soc/tegra: pmc: Move legacy code behind CONFIG_ARM guard
soc/tegra: pmc: Remove unused legacy functions
soc/tegra: pmc: Create PMC context dynamically
firmware: samsung: acpm: remove compile-testing stubs
firmware: samsung: acpm: Add devm_acpm_get_by_phandle helper
firmware: samsung: acpm: Add TMU protocol support
firmware: samsung: acpm: Make acpm_ops const and access via pointer
firmware: samsung: acpm: Drop redundant _ops suffix in acpm_ops members
firmware: samsung: acpm: Annotate rx_data->cmd with __counted_by_ptr
...
|
|
This reverts commit 1a8d4c6ecb4c81261bcdf13556abd4a958eca202.
Commit 1a8d4c6ecb4c ("PCI/MSI: Unmap MSI-X region on error") added an
iounmap(dev->msix_base) on the error path of msix_capability_init() to
release the MSI-X region when msix_setup_interrupts() fails.
When msix_setup_interrupts() fails, the call chain is:
  msix_setup_interrupts()
    -> __msix_setup_interrupts()
         struct pci_dev *dev __free(free_msi_irqs) = __dev;
         ...
         return ret;  // __free cleanup fires on error
The __free(free_msi_irqs) cleanup calls pci_free_msi_irqs(), which
already handles the unmap:
  void pci_free_msi_irqs(struct pci_dev *dev)
  {
          pci_msi_teardown_msi_irqs(dev);
          if (dev->msix_base) {
                  iounmap(dev->msix_base); // already unmapped here
                  dev->msix_base = NULL;   // and set to NULL
          }
  }
So dev->msix_base is unmapped and set to NULL before
msix_setup_interrupts() returns to msix_capability_init(). The
"goto out_unmap" introduced by commit 1a8d4c6ecb4c ("PCI/MSI: Unmap
MSI-X region on error") then calls iounmap() a second time on a NULL
pointer.
This was reproduced on Intel Emerald Rapids (192 CPUs) while
running tools/testing/selftests/kexec/test_kexec_jump.sh:
  WARNING: CPU#44 at iounmap+0x2a/0xe0
  RIP: 0010:iounmap+0x2a/0xe0
  RDI: 0000000000000000
  Call Trace:
    msix_capability_init+0x317/0x3f0
    __pci_enable_msix_range+0x21d/0x2c0
    pci_alloc_irq_vectors_affinity+0xa9/0x130
    nvme_setup_io_queues+0x2a8/0x420 [nvme]
    nvme_reset_work+0x151/0x340 [nvme]
    ...
RDI=0 confirms iounmap() is called with NULL.
Restore the original "goto out_disable" and leave the unmap to the
existing __free(free_msi_irqs) cleanup.
Fixes: 1a8d4c6ecb4c ("PCI/MSI: Unmap MSI-X region on error")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Yuanhe Shu <xiangzao@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://lore.kernel.org/all/20260610194406.GA380991@bhelgaas/
Link: https://patch.msgid.link/20260611025901.1105209-1-xiangzao@linux.alibaba.com
Closes: https://lore.kernel.org/all/4fc6208d-513b-4f41-a13a-4a0829ab50ad@roeck-us.net/
|
|
Change of_map_id() to take a pointer to struct of_phandle_args
instead of passing target device node and translated IDs separately.
Update all callers accordingly.
Add an explicit filter_np parameter to of_map_id() and of_map_msi_id()
to separate the filter input from the output. Previously, the target
parameter served dual purpose: as an input filter (if non-NULL, only
match entries targeting that node) and as an output (receiving the
matched node with a reference held). Now filter_np is the explicit
input filter and arg->np is the pure output.
Previously, of_map_id() would call of_node_put() on the matched node
when a filter was provided, making reference ownership inconsistent.
Remove this internal of_node_put() call so that of_map_id() now always
transfers ownership of the matched node reference to the caller via
arg->np. Callers are now consistently responsible for releasing this
reference with of_node_put(arg->np) when done.
Acked-by: Frank Li <Frank.Li@nxp.com>
Suggested-by: Rob Herring (Arm) <robh@kernel.org>
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Charan Teja Kalla <charan.kalla@oss.qualcomm.com>
Signed-off-by: Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>
Link: https://patch.msgid.link/20260603-parse_iommu_cells-v16-2-dc509dacb19a@oss.qualcomm.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
Since we now have quite a few users parsing "iommu-map" and "msi-map"
properties, give them some wrappers to conveniently encapsulate the
appropriate sets of property names. This will also make it easier to
then change of_map_id() to correctly account for specifier cells.
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vijayanand Jitta <vijayanand.jitta@oss.qualcomm.com>
Link: https://patch.msgid.link/20260603-parse_iommu_cells-v16-1-dc509dacb19a@oss.qualcomm.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
'rockchip', 'verisilicon', 'riscv', 'intel/vt-d', 'amd/amd-vi' and 'core' into next
|
|
RAS DES capability
dwc_pcie_rasdes_debugfs_init() returns success when the controller has no
RAS DES capability, leaving pci->debugfs->rasdes_info unset. The common
debugfs teardown path still calls dwc_pcie_rasdes_debugfs_deinit(), which
dereferences rasdes_info unconditionally.
Return early when no RAS DES state was allocated. In that case no RAS DES
mutex was initialized, so there is nothing to destroy.
Fixes: 4fbfa17f9a07 ("PCI: dwc: Add debugfs based Silicon Debug support for DWC")
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
[mani: reworded subject]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/0f97352506d8d813f70f441de4d63fcd5b7d1c3e.1779123847.git.shuvampandey1@gmail.com
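The early-return guard described above can be sketched with stand-in
types. The struct and function names below are hypothetical and only
model the shape of the fix; a boolean takes the place of the mutex.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures. */
struct sketch_rasdes_info {
	bool mutex_live;	/* models an initialized mutex */
};

struct sketch_debugfs {
	struct sketch_rasdes_info *rasdes_info;	/* NULL without RAS DES */
};

/* Returns true if RAS DES state existed and was torn down. */
static bool sketch_rasdes_deinit(struct sketch_debugfs *dbg)
{
	struct sketch_rasdes_info *info = dbg->rasdes_info;

	if (!info)
		return false;	/* nothing was allocated, nothing to destroy */

	info->mutex_live = false;	/* models destroying the mutex */
	return true;
}
```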
|
|
The driver previously used virt_to_phys() on the ioremapped register base
(port->base) to compute the MSI message address. Using virt_to_phys() on an
IO mapped address is incorrect because it expects a kernel virtual address.
To fix it, store the physical start of the I/O register region in
mtk_pcie_port->phys_base and use it to build the MSI address. This replaces
the incorrect virt_to_phys() usage and ensures MSI addresses are generated
correctly.
Fixes: 43e6409db64d ("PCI: mediatek: Add MSI support for MT2712 and MT7622")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Caleb James DeLisle <cjd@cjdns.fr>
Link: https://patch.msgid.link/20260521171951.1495781-2-cjd@cjdns.fr
|
|
Some Qualcomm PCIe devices (WCN6855/WCN7850 WiFi cards, SDX62/SDX65 modems)
do not properly support Secondary Bus Reset (SBR).
Testing confirms this is device-specific, not deployment-specific:
MediaTek MT7925e successfully uses bus reset through the same passive
M.2-to-PCIe adapters where Qualcomm devices fail, proving PERST# is
properly wired through the adapters.
Prevent use of Secondary Bus Reset for these devices.
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/all/20260609163649.319755-4-jtornosm@redhat.com
|
|
Remove the unused LIST_HEAD(res) declaration from rcar_pcie_hw_enable().
The macro instantiation defines an unused 'struct list_head res' variable,
which conflicts with a valid resource loop-local 'struct resource *res'
declaration further down in the function, triggering a compiler variable
shadowing warning:
drivers/pci/controller/pcie-rcar-host.c:357:34: warning: declaration of 'res' shadows a previous local [-Wshadow]
357 | struct resource *res = win->res;
Fixes: ce351636c67f75a9 ("PCI: rcar: Add suspend/resume")
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Link: https://patch.msgid.link/20260521091256.15737-1-prabhakar.mahadev-lad.rj@bp.renesas.com
|
|
Integrate the PCI pwrctrl framework into the pci-imx6 driver to provide
standardized power management for PCI devices.
Legacy regulator handling (vpcie-supply at controller level) is maintained
for backward compatibility with existing device trees. New device trees
should specify power supplies at the Root Port level to utilize the pwrctrl
framework.
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260520084904.2424253-2-sherry.sun@oss.nxp.com
|
|
The first device on a PCI root bus determines whether the host bridge is
whitelisted for P2PDMA. All Intel Xeon chips since Ice Lake (ICX, 2021)
expose a device with ID 0x09a2 as first device. It is loosely associated
with the IOMMU. All these Xeon chips support P2PDMA, so since the addition
of the device with commit feaea1fe8b36 ("PCI/P2PDMA: Add Intel 3rd Gen
Intel Xeon Scalable Processors to whitelist"), P2PDMA has been allowed on
all new Xeons without the need to amend the whitelist:
Xeons with Performance Cores:
Sapphire Rapids (SPR, 2023)
Emerald Rapids (EMR, 2023)
Granite Rapids (GNR, 2024)
Diamond Rapids (DMR, 2026)
Xeons with Efficiency Cores:
Sierra Forest (SRF, 2024)
Clearwater Forest (CWF, 2026)
However, these Xeons also expose accelerators, each as the first device on
a root bus of its own:
QuickAssist Technology (QAT, crypto & compression accelerator)
Data Streaming Accelerator (DSA, dma engine)
In-Memory Analytics Accelerator (IAA, compression accelerator)
Whitelist them for P2PDMA as well. Move their Device ID macros from the
accelerator drivers to <linux/pci_ids.h> for reuse by P2PDMA code.
Unfortunately the Device IDs vary across Xeon generations as additional
features were added to the accelerators. This currently necessitates an
amendment for each new Xeon chip.
For future chips, this need shall be avoided by an ongoing effort to extend
ACPI HMAT with PCIe P2PDMA characteristics (latency, bandwidth, ordering
constraints). The PCI core will be able to look up in this BIOS-provided ACPI
table whether P2PDMA is supported, instead of relying on a whitelist that
needs to be amended continuously.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> # QAT
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/6aac4922b5fe7070b11874427a9285e42ddd05a4.1780585518.git.lukas@wunner.de
|
|
When mtk_pcie_enable_port() fails, mtk_pcie_port_free() removes the port
from pcie->ports and frees the port structure. However, the IRQ domains set
up earlier by mtk_pcie_init_irq_domain() are never freed.
Fix this by refactoring mtk_pcie_irq_teardown() into a per-port helper,
mtk_pcie_irq_teardown_port(), and calling it from mtk_pcie_setup() when
mtk_pcie_enable_port() fails. Since the IRQ teardown must only happen in
the probe error path (during resume, child devices may have active MSI
mappings and the NOIRQ context prohibits sleeping locks),
mtk_pcie_enable_port() is changed to return an error code so callers can
distinguish the two paths and act accordingly.
This issue was reported by Sashiko while reviewing the EcoNet EN7528 SoC
support series.
Fixes: b099631df160 ("PCI: mediatek: Add controller support for MT2712 and MT7622")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org # 5.10
Cc: Caleb James DeLisle <cjd@cjdns.fr>
Link: https://patch.msgid.link/20260521174617.17692-1-mani@kernel.org
|
|
meson_pcie_probe() powers on the PHY and registers the DesignWare host
bridge with dw_pcie_host_init(), but the driver has no remove callback.
On driver unbind or module unload, the driver core therefore proceeds to
devres cleanup without first unregistering the host bridge or powering off
the PHY.
Add a remove callback that deinitializes the DesignWare host bridge and
powers off the PHY while device-managed resources are still valid.
Fixes: 9c0ef6d34fdb ("PCI: amlogic: Add the Amlogic Meson PCIe controller driver")
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/1a0c86ab264cdc1c79c917e984b90991af51d827.1779123847.git.shuvampandey1@gmail.com
|
|
meson_pcie_probe_clock() enables a clock and then registers a devres
action to disable it during teardown. If devm_add_action_or_reset()
fails, it runs the action immediately, disabling the clock.
The return value is currently ignored, so on that failure path,
meson_pcie_probe_clock() returns the disabled clock and probe continues.
Return the error so the existing probe error path unwinds normally.
Fixes: 9c0ef6d34fdbf ("PCI: amlogic: Add the Amlogic Meson PCIe controller driver")
Signed-off-by: Shuvam Pandey <shuvampandey1@gmail.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/177909148011.9588.6639767953842842291@gmail.com
|
|
According to the PHY Databook Common Block Signals section, the
ref_clk_en signal must remain de-asserted until the reference clock is
running at the appropriate frequency. Once the clock is stable,
ref_clk_en can be asserted. For lower power states where the reference
clock to the PHY is disabled, ref_clk_en should also be de-asserted.
Move the ref_clk_en bit manipulation into imx95_pcie_enable_ref_clk()
to ensure the reference clock stabilizes before ref_clk_en is asserted
and before the PHY reset is de-asserted. This aligns with the timing
requirements specified in the PHY documentation.
Fixes: d8574ce57d76 ("PCI: imx6: Add external reference clock input mode support")
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260518072715.3166514-3-hongxing.zhu@nxp.com
|
|
According to the i.MX95 PCIe PHY Databook, the ref_use_pad signal in the
Common Block Signals section selects the reference clock source connected
to the PHY pads. Per the specification, any change to this input must be
followed by a PHY reset assertion to take effect.
Move the REF_USE_PAD configuration before the PHY reset toggle to comply
with the required initialization sequence.
Fixes: 47f54a902dcd ("PCI: imx6: Toggle the core reset for i.MX95 PCIe")
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
[mani: renamed the callback and helper to match the usecase]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260518072715.3166514-2-hongxing.zhu@nxp.com
|
|
The Cadence LGA (Legacy Architecture IP) PCIe host controller currently
lacks the mandatory 100 ms delay after link training completes for speeds
> 5.0 GT/s, as required by PCIe r6.0 sec 6.6.1.
Add a 'max_link_speed' field to struct cdns_pcie. In the common host
layer function cdns_pcie_host_start_link(), after the link has been
successfully established, call pci_host_common_link_train_delay() to
insert the required delay.
For the j721e glue driver, set cdns_pcie.max_link_speed from the existing
link speed logic. For other LGA-based glue drivers (sky1, sg2042), the
common LGA host setup (pcie-cadence-host.c) provides a fallback reading
of the device tree property "max-link-speed" when available. This ensures
that the delay is not missed on those platforms once they enable the
property.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260518004246.1384532-3-18255117159@163.com
|
|
PCIe r6.0, sec 6.6.1 (Conventional Reset) requires that for a Downstream
Port supporting Link speeds greater than 5.0 GT/s, software must wait a
minimum of 100 ms after Link training completes before sending any
Configuration Request.
Introduce a static inline helper pci_host_common_link_train_delay() that
checks the given max_link_speed (2 = 5.0 GT/s, 3 = 8.0 GT/s, etc.) and
calls msleep(100) only when the speed is greater than 5.0 GT/s.
This allows multiple host controller drivers to share the same mandatory
delay without duplicating the logic.
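
The check described above can be sketched in userspace C as follows. This is only an illustration of the speed comparison: the `sketch_*` names are hypothetical, and the function returns the delay in milliseconds instead of calling msleep() so that it can be exercised outside the kernel.

```c
#include <stdbool.h>

/* Link speed encoding as given in the commit message: 2 = 5.0 GT/s, 3 = 8.0 GT/s. */
#define SKETCH_LINK_SPEED_5_0GT	2

/* Hypothetical predicate: true when the 100 ms post-training wait applies. */
static bool sketch_link_train_delay_needed(int max_link_speed)
{
	return max_link_speed > SKETCH_LINK_SPEED_5_0GT;
}

/*
 * Hypothetical stand-in for the helper: returns how long the caller
 * would sleep (in ms) rather than sleeping itself.
 */
static unsigned int sketch_link_train_delay_ms(int max_link_speed)
{
	return sketch_link_train_delay_needed(max_link_speed) ? 100 : 0;
}
```

With this shape, a 5.0 GT/s port (value 2) yields no delay and an 8.0 GT/s port (value 3) yields 100 ms.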
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260518004246.1384532-2-18255117159@163.com
|
|
The MediaTek MT7925 WiFi device advertises FLR capability, but it does not
work correctly. This manifests in VFIO passthrough scenarios. Normal VM
operation works fine, including clean shutdown/reboot. However, when the VM
terminates uncleanly (crash, force-off), VFIO attempts to reset the device
before it can be assigned to another VM. Because FLR is broken, the reset
fails, preventing reuse.
This is similar to its predecessor MT7922 (see 81f64e925c29 ("PCI: Avoid
FLR for Mediatek MT7922 WiFi")), but with different symptoms. The MT7922
issue manifests as config read failures (returning ~0) after FLR. The
MT7925 shows different behavior: config reads work correctly after FLR, but
firmware communication fails.
First VM start with MT7925 works fine:
mt7925e 0000:08:00.0: ASIC revision: 79250000
mt7925e 0000:08:00.0: WM Firmware Version: ____000000, Build Time: 20260106153120
After force reset or VM crash, when VFIO attempts FLR to reset the device
for reassignment, firmware initialization fails:
mt7925e 0000:08:00.0: ASIC revision: 79250000
mt7925e 0000:08:00.0: Message 00000010 (seq 1) timeout
mt7925e 0000:08:00.0: Failed to get patch semaphore
[Repeats with increasing sequence numbers 2-10]
mt7925e 0000:08:00.0: hardware init failed
The driver cannot acquire the patch semaphore needed for firmware
initialization, indicating that FLR does not properly reset the firmware
state. The device remains in this broken state until physical power cycle.
Disable FLR for MT7925 so the PCI core falls back to other reset methods,
e.g., Secondary Bus Reset, which successfully resets the device and allows
reinitialization for VFIO passthrough reuse.
Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260522070646.203115-1-jtornosm@redhat.com
|
|
The Cadence HPA driver uses hardcoded constants (0x0, 0x2, 0x4, 0x5,
0x10) to program the outbound region type. Replace them with the newly
introduced common TLP type macros from pci.h for better readability
and maintainability.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260516153657.65214-4-18255117159@163.com
|
|
The dwc driver defines its own ATU type macros (PCIE_ATU_TYPE_MEM,
PCIE_ATU_TYPE_IO, PCIE_ATU_TYPE_CFG0, PCIE_ATU_TYPE_CFG1,
PCIE_ATU_TYPE_MSG) with the same numerical values as the newly
introduced common TLP type macros.
Remove the local definitions and switch all DWC users to the common
PCIE_TLP_TYPE_* macros. This eliminates redundancy and improves
consistency across PCI controller drivers.
No functional change intended.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260516153657.65214-3-18255117159@163.com
|
|
Introduce a set of unified TLP type macros in pci.h according to PCIe
spec r7.0, sec 2.2.1:
- PCIE_TLP_TYPE_MEM_RDWR (0x00) for Memory Read/Write
- PCIE_TLP_TYPE_IO_RDWR (0x02) for I/O Read/Write
- PCIE_TLP_TYPE_CFG0_RDWR (0x04) for Type 0 Config Read/Write
- PCIE_TLP_TYPE_CFG1_RDWR (0x05) for Type 1 Config Read/Write
- PCIE_TLP_TYPE_MSG (0x10) for Message Request (routing to RC)
These replace the old per-driver hardcoded values or local macros, and
also replace the previous PCIE_TLP_TYPE_CFG0_RD/WR and
PCIE_TLP_TYPE_CFG1_RD/WR definitions which had identical numeric values.
The read/write distinction is already handled by the TLP Format field
(Fmt), so a single type macro suffices.
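
As an illustration of why one type macro per request class is enough, the sketch below combines the Type values listed above with a Fmt field to build the first TLP header byte. The macro values are the ones from this commit message; the `SKETCH_*` Fmt names and the `sketch_tlp_byte0()` helper are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Type values as listed in the commit message. */
#define PCIE_TLP_TYPE_MEM_RDWR	0x00
#define PCIE_TLP_TYPE_IO_RDWR	0x02
#define PCIE_TLP_TYPE_CFG0_RDWR	0x04
#define PCIE_TLP_TYPE_CFG1_RDWR	0x05
#define PCIE_TLP_TYPE_MSG	0x10

/* Fmt is a 3-bit field above the 5-bit Type field in header byte 0. */
#define SKETCH_TLP_FMT_3DW_NO_DATA	0x0	/* e.g. a read request */
#define SKETCH_TLP_FMT_3DW_WITH_DATA	0x2	/* e.g. a write request */

/* Hypothetical helper: pack Fmt[7:5] and Type[4:0] into one byte. */
static uint8_t sketch_tlp_byte0(uint8_t fmt, uint8_t type)
{
	return (uint8_t)(((fmt & 0x7) << 5) | (type & 0x1f));
}
```

A Type 0 config read and a Type 0 config write share PCIE_TLP_TYPE_CFG0_RDWR and differ only in Fmt (0x04 versus 0x44 in byte 0).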
Convert the aspeed and mediatek drivers to use the new macros, and remove
the obsolete definitions from pci.h.
No functional change intended.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260516153657.65214-2-18255117159@163.com
|
|
The original PCIE_FTS_NUM_L0(x) macro was buggy due to improper operator
precedence, where ((x) & 0xff << 8) was evaluated as ((x) & 0xff00).
Instead of just fixing the parentheses, use the standard FIELD_PREP()
macro. This makes the code more robust by automatically handling masks
and shifts, while also adding compile-time type and range checking to
ensure the value fits within PCIE_FTS_NUM_MASK.
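
The precedence problem can be reproduced in a few lines of userspace C. This is a sketch, not the driver code: the `SKETCH_*` macros only mirror the shape quoted above, the mask value 0xff00 is taken from the evaluation described in this message, and `sketch_field_prep()` is a simplified runtime stand-in for the kernel's FIELD_PREP() without its compile-time checks.

```c
#include <stdint.h>

#define SKETCH_FTS_NUM_MASK	0xff00u

/* Buggy shape: '<<' binds tighter than '&', so this is (x) & (0xff << 8). */
#define SKETCH_FTS_NUM_BUGGY(x)	((x) & 0xff << 8)

/* Intended shape: mask the value first, then shift it into the field. */
#define SKETCH_FTS_NUM_FIXED(x)	(((x) & 0xffu) << 8)

/*
 * Simplified stand-in for FIELD_PREP(): shift the value to the position
 * of the mask's lowest set bit, then apply the mask.
 */
static uint32_t sketch_field_prep(uint32_t mask, uint32_t val)
{
	uint32_t shift = 0;

	if (mask == 0)
		return 0;
	while (!((mask >> shift) & 1u))
		shift++;
	return (val << shift) & mask;
}
```

With an input of 0x2c, the buggy macro yields 0 because no bits of 0x2c overlap 0xff00, while the fixed macro and the FIELD_PREP()-style helper both yield 0x2c00.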
Fixes: 637cfacae96f ("PCI: mediatek: Add MediaTek PCIe host controller support")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
[mani: added the bitfield header include spotted by Sashiko]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/20260515005552.2343-1-lirongqing@baidu.com
|
|
A lockdep warning is observed during boot on a Qcom firmware-managed
platform:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
...
Call trace:
register_lock_class+0x128/0x4d8
__lock_acquire+0x110/0x1db0
lock_acquire+0x278/0x3d8
_raw_spin_lock_irq+0x6c/0xc0
dw_pcie_irq_domain_alloc+0x48/0x190
irq_domain_alloc_irqs_parent+0x2c/0x48
msi_domain_alloc+0x90/0x160
...
dw_pcie_irq_domain_alloc() takes pp->lock while allocating MSI
interrupts. pp->lock is normally initialized by dw_pcie_host_init(), but
Qcom firmware-managed hosts use the ECAM init path instead:
pci_host_common_ecam_create()
pci_ecam_create()
qcom_pcie_ecam_host_init()
dw_pcie_msi_host_init()
dw_pcie_allocate_domains()
That path constructs a fresh struct dw_pcie_rp and calls
dw_pcie_msi_host_init() directly, without going through
dw_pcie_host_init(). As a result, pp->lock was not initialized, which
triggers the warning.
Initialize pp->lock in qcom_pcie_ecam_host_init() before registering the
MSI domains so the firmware-managed ECAM path matches the normal DWC host
initialization sequence.
Fixes: 7d944c0f1469 ("PCI: qcom: Add support for Qualcomm SA8255p based PCIe Root Complex")
Signed-off-by: Yadu M G <yadu.mg@oss.qualcomm.com>
[mani: added fixes tag and CCed stable]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Cc: stable@kernel.org
Link: https://patch.msgid.link/20260604122418.727274-1-yadu.mg@oss.qualcomm.com
|
|
Currently the kernel relies on a global variable to reference the PMC
context. Use an explicit lookup for the PMC and pass that to the public
PMC APIs.
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Due to a hardware issue, L0s is not properly supported by the PCIe
controller on the SA8775p SoC. If enabled, the L0s to L0 transition
triggers the correctable AER errors below and may also affect link stability:
pcieport 0000:00:00.0: PME: Signaling with IRQ 332
pcieport 0000:00:00.0: AER: enabled with IRQ 332
pcieport 0000:00:00.0: AER: Correctable error message received from 0000:01:00.0
pci 0000:01:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
pci 0000:01:00.0: device [17cb:1103] error status/mask=00001000/0000e000
pci 0000:01:00.0: [12] Timeout
pcieport 0000:00:00.0: AER: Multiple Correctable error message received from 0000:01:00.0
pcieport 0000:00:00.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Transmitter ID)
pcieport 0000:00:00.0: device [17cb:0115] error status/mask=00001000/0000e000
pcieport 0000:00:00.0: [12] Timeout
Hence, disable L0s for the SA8775p SoC to allow it to properly function
by sacrificing a little bit of power saving.
Fixes: 58d0d3e032b3 ("PCI: qcom-ep: Add support for SA8775P SOC")
Assisted-by: Claude:claude-4-6-sonnet
Signed-off-by: Shawn Guo <shengchao.guo@oss.qualcomm.com>
[mani: commit log, corrected fixes tag]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260419093934.1223027-1-shengchao.guo@oss.qualcomm.com
|
|
Some NVIDIA GPU/NIC devices, though they don't implement CXL config space,
have many CXL-like properties. Call this kind "pre-CXL".
Similar to CXL.cache capability, these pre-CXL devices also require the ATS
function even when their RIDs are IOMMU bypassed, i.e. keep ATS "always on"
vs. "on demand" when a non-zero PASID line gets enabled in SVA use cases.
Introduce pci_dev_specific_ats_required() quirk function to scan a list of
IDs for these devices. Then, include it in pci_ats_required().
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Nirmoy Das <nirmoyd@nvidia.com>
Tested-by: Nirmoy Das <nirmoyd@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Controlled by IOMMU drivers, ATS can be enabled "on demand", when a given
PASID on a device is attached to an I/O page table. This is working, even
when a device has no translation on its RID (i.e., RID is IOMMU bypassed).
However, certain PCIe devices require non-PASID ATS on their RID even when
the RID is IOMMU bypassed. Call this "ATS always on" in IOMMU terms.
For example, CXL spec r4.0 notes in sec 3.2.5.13 Memory Type on CXL.cache:
"To source requests on CXL.cache, devices need to get the Host Physical
Address (HPA) from the Host by means of an ATS request on CXL.io."
In other words, the CXL.cache capability requires ATS; otherwise, it can't
access host physical memory.
Introduce a new pci_ats_required() helper for the IOMMU driver to scan a
PCI device and shift ATS policies between "on demand" and "always on".
Add the support for CXL.cache devices first. Pre-CXL devices will be added
in quirks.c file.
Note that pci_ats_required() validates against pci_ats_supported(), so we
ensure that untrusted devices (e.g. external ports) will not be always on.
This maintains the existing ATS security policy regarding potential side-
channel attacks via ATS.
Cc: linux-cxl@vger.kernel.org
Suggested-by: Vikram Sethi <vsethi@nvidia.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Some platforms have incorrect T_POWER_ON value programmed in hardware.
Generally these will be corrected by bootloaders, but not all targets
support bootloaders to program correct values. That means the
LTR_L1.2_THRESHOLD value calculated by aspm.c can be wrong, which can
result in improper L1.2 exit behavior. If AER happens to be supported and
enabled, the error may be *reported* via AER.
Parse the 't-power-on-us' property from each Root Port node and program it
as part of host initialization using dw_pcie_program_t_power_on() before
link training.
This property is added to the dtschema here [1].
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: reworded comment]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link[1]: https://lore.kernel.org/all/20260205093346.667898-1-krishna.chundru@oss.qualcomm.com/
Link: https://patch.msgid.link/20260428-t_power_on_fux-v5-3-f1ef926a91ff@oss.qualcomm.com
|
|
The T_POWER_ON indicates the time (in μs) that a Port requires the port on
the opposite side of Link to wait in L1.2.Exit after sampling CLKREQ#
asserted before actively driving the interface. This value is used by the
ASPM driver to compute the LTR_L1.2_THRESHOLD.
Currently, some controllers expose T_POWER_ON value of zero in the L1SS
capability registers, leading to incorrect LTR_L1.2_THRESHOLD calculations,
which can result in improper L1.2 exit behavior and if AER happens to be
supported and enabled, the error may be *reported* via AER.
Add a helper to override T_POWER_ON value by the DWC controller drivers.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: changed t_power_on to u32]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Link: https://patch.msgid.link/20260428-t_power_on_fux-v5-2-f1ef926a91ff@oss.qualcomm.com
|
|
Add pcie_encode_t_power_on() to encode the PCIe L1 PM Substates T_POWER_ON
parameter into the T_POWER_ON Scale and T_POWER_ON Value fields.
This helper can be used by the controller drivers to change the
default/wrong value of T_POWER_ON in L1SS capability register to avoid
incorrect calculation of LTR_L1.2_THRESHOLD value.
The helper converts a T_POWER_ON time specified in microseconds into the
appropriate scale/value encoding defined by PCIe r7.0, sec 7.8.3.2. Values
that exceed the maximum encodable range are clamped to the largest
representable encoding.
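
A minimal userspace sketch of such an encoding is shown below. It assumes the L1 PM Substates layout of a 2-bit T_POWER_ON Scale (0 = 2 us, 1 = 10 us, 2 = 100 us) and a 5-bit T_POWER_ON Value; the `sketch_*` names are hypothetical, and rounding up to the next representable time is an assumption of this example, not necessarily what pcie_encode_t_power_on() does.

```c
#include <stdint.h>

/* Hypothetical result type: the two fields written to the L1SS register. */
struct sketch_t_power_on {
	uint8_t scale;	/* 0 = 2 us, 1 = 10 us, 2 = 100 us */
	uint8_t value;	/* 5-bit multiplier, 0..31 */
};

static struct sketch_t_power_on sketch_encode_t_power_on(uint32_t t_us)
{
	static const uint32_t unit_us[] = { 2, 10, 100 };
	/* Start from the largest encoding (31 * 100 us) so overflow clamps. */
	struct sketch_t_power_on enc = { .scale = 2, .value = 31 };
	unsigned int i;

	for (i = 0; i < 3; i++) {
		/* Ceiling division without risking overflow of t_us. */
		uint32_t value = t_us / unit_us[i] + (t_us % unit_us[i] != 0);

		if (value <= 31) {
			enc.scale = (uint8_t)i;
			enc.value = (uint8_t)value;
			return enc;
		}
	}
	return enc;
}
```

Under these assumptions, 50 us encodes as scale 0 with value 25, 63 us needs scale 1 with value 7 (70 us), and anything above 3100 us clamps to scale 2 with value 31.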
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: changed t_power_on_us to u32, added helper name to subject]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260428-t_power_on_fux-v5-1-f1ef926a91ff@oss.qualcomm.com
|
|
The driver currently supports two PERST# and PHY DT configurations. In one
case, PHY and PERST# are described in the RC node. In the other case, they
are described in the RP node.
A mixed setup is not supported. One common example is PHY on the RP node
while PERST# remains on the RC node. In that case the driver goes through
the RP parse path, does not find PERST# on RP, and does not report an error
because PERST# is optional. Probe can then succeed silently while PERST# is
left uncontrolled, and PCIe endpoints fail to work later. This silent
probe success makes debugging difficult.
Handle this mixed case in the RP parse path by checking whether PERST# is
present on RC and, if so, using the RC PERST# GPIO for RP ports while
keeping RP parsing for PHY. Emit a warning to indicate mixed DT content so
it can be fixed.
This keeps mixed systems functional and makes the configuration issue
visible instead of failing later at endpoint bring-up.
Suggested-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Qiang Yu <qiang.yu@oss.qualcomm.com>
[mani: folded the fix: https://lore.kernel.org/linux-pci/20260526-fix_perst_gpio_handling-v1-1-9170507bb4e9@oss.qualcomm.com]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Link: https://patch.msgid.link/20260508-mix_perst_phy_dts-v1-1-9eff6ee9b51a@oss.qualcomm.com
|
|
We need the driver-core fixes in here as well to build on top of.
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Qcom PCIe RCs can successfully exit from L1SS during OS runtime. However,
during system suspend, the Qcom PCIe RC driver may remove all resource
votes and turn off the PHY to maximize power savings.
Consequently, when the host is in system suspend with the link in L1SS and
the endpoint asserts CLKREQ#, the RC driver must restore the PHY and enable
the REFCLK. This recovery process causes the L1SS exit latency time to be
exceeded (roughly L10_REFCLK_ON + T_COMMONMODE). If the RC driver were to
retain all votes during suspend, L1SS exit would succeed without issue but
at the expense of higher power consumption.
When the host fails to move the link from L1SS to L0 within the
L10_REFCLK_ON + T_COMMONMODE time, the endpoint may treat it as a fatal
condition and trigger Link Down (LDn) during resume. This LDn results in a
reset that destroys the internal device state.
To ensure that the client drivers can properly handle this scenario, let
them know about this platform limitation by setting the
'pci_host_bridge::broken_l1ss_resume' flag.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260519-l1ss-fix-v2-3-b2c3a4bdeb15@oss.qualcomm.com
|
|
Per PCIe v7.0, sec 5.5.3.3.1, when exiting L1.2 due to an endpoint
asserting CLKREQ# signal, the REFCLK must be turned on within the latency
advertised in the LTR message. This requirement applies to L1.1 as well.
On some platforms like Qcom, these requirements are satisfied during OS
runtime, but not while resuming from the system suspend. This happens
because the PCIe RC driver may remove all resource votes and turn off the
PHY analog circuitry during suspend to maximize power savings while keeping
the link in L1SS.
Consequently, when the endpoint asserts CLKREQ# to wake up, the RC driver
must restore the PHY and enable the REFCLK. When this recovery process
exceeds the L1SS exit latency time (roughly L10_REFCLK_ON + T_COMMONMODE),
the endpoint may treat it as a fatal condition and trigger Link Down (LDn).
This results in a reset that destroys the internal device state.
So to indicate this platform limitation to the client drivers, introduce a
new flag 'pci_host_bridge::broken_l1ss_resume' and check it in
pci_suspend_retains_context(). If the flag is set by the RC driver, the API
will return 'false', indicating to the client drivers that the device context
may not be retained and the drivers must be prepared for context loss.
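
The resulting decision can be modeled in userspace C roughly as follows. This is a sketch under assumptions: `struct sketch_host_bridge` stands in for the relevant part of struct pci_host_bridge, the `suspend_via_firmware` argument stands in for the result of pm_suspend_via_firmware(), and the order of the checks is illustrative only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one host bridge field used here. */
struct sketch_host_bridge {
	bool broken_l1ss_resume;	/* set by the RC driver */
};

/*
 * Model of the check: context is only expected to survive suspend when
 * firmware is not involved and the platform can resume from L1SS in time.
 */
static bool sketch_suspend_retains_context(const struct sketch_host_bridge *bridge,
					   bool suspend_via_firmware)
{
	if (suspend_via_firmware)
		return false;
	if (bridge->broken_l1ss_resume)
		return false;
	return true;
}
```

A client driver would treat a 'false' result as a cue to save state or shut the device down before suspend, as it already does for the firmware case.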
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260519-l1ss-fix-v2-2-b2c3a4bdeb15@oss.qualcomm.com
|
|
Currently, PCI endpoint drivers (e.g. nvme) use pm_suspend_via_firmware()
to check whether device state is preserved during system suspend. If
firmware will be invoked at the end of suspend, we don't know whether
devices will retain their internal state.
But device context might be lost due to platform issues as well. Having
those checks in endpoint drivers will not scale and will cause a lot of
code duplication.
Add pci_suspend_retains_context() as a sole point of truth that the
endpoint drivers can rely on to check whether they can expect the device
context to be retained or not.
If pci_suspend_retains_context() returns 'false', drivers need to prepare
for context loss by performing actions such as resetting the device, saving
the context, shutting it down etc. If it returns 'true', drivers do not
need to perform any special action and can leave the device in active
state.
Right now, this API only incorporates pm_suspend_via_firmware(), but will
be extended in future commits.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260519-l1ss-fix-v2-1-b2c3a4bdeb15@oss.qualcomm.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Remove obsolete PCIe maintainer addresses (Florian Eckert, Hans
Zhang)
- Restore a brcmstb link speed assignment that was inadvertently
removed, reducing bcm2712 performance (Florian Fainelli)
* tag 'pci-v7.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI: brcmstb: Assign pcie->gen from of_pci_get_max_link_speed()
MAINTAINERS: Remove Jianjun Wang as PCIe mediatek maintainer
MAINTAINERS: Remove Chuanhua Lei as PCIe intel-gw maintainer
|
|
Add support for transitioning PCIe endpoints under host bridge into D3cold
by integrating with the DWC core suspend/resume helpers.
Implement PME_Turn_Off message generation via ELBI_SYS_CTRL and hook it
into the DWC host operations so the controller follows the standard
PME_Turn_Off based power-down sequence before entering D3cold.
When the device is suspended into D3cold, fully tear down interconnect
bandwidth and OPP votes. If D3cold is not entered, retain existing
behavior by keeping the required interconnect and OPP votes.
Use dw_pcie::skip_pwrctrl_off to avoid powering off devices during suspend
to preserve wakeup capability of the devices and also not to power on the
devices in the init path.
Finally, drop the qcom_pcie::suspended flag and rely on the existing
dw_pcie::suspended state, which now drives both the power-management flow
and the interconnect/OPP handling.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429-d3cold-v5-5-89e9735b9df6@oss.qualcomm.com
|
|
Previously, the driver skipped putting the link into L2 and device state in
D3cold whenever L1 ASPM was enabled, since some devices (e.g. NVMe) expect
low resume latency and may not tolerate deeper power states. However, such
devices typically remain in D0 and are already covered by the new helper's
requirement that all endpoints be in D3hot before the devices under host
bridge may enter D3cold.
Replace the local L1/L1SS-based check in dw_pcie_suspend_noirq() with the
shared pci_host_common_d3cold_possible() helper to decide whether the
devices under host bridge can safely transition to D3cold.
In addition, propagate PME-from-D3cold capability information from the
helper and record it in skip_pwrctrl_off. Some devices (e.g. M.2 cards
without auxiliary power) cannot send PME when the main power is removed,
even if they advertise PME-from-D3cold support. This allows controller
power-off to be skipped when required to preserve wakeup functionality.
While at it, update the 'dw_pcie::suspended' flag in dw_pcie_resume_noirq()
only after the PCIe link resumes successfully, to avoid marking the
controller as active when link resume fails.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: commit log and added TODO to query Vaux]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429-d3cold-v5-4-89e9735b9df6@oss.qualcomm.com
|
|
Some Qcom PCIe controller variants bring the PHY out of test power-down
(PHY_TEST_PWR_DOWN) during init. When the link is later transitioned to
D3cold and the driver disables PCIe clocks and/or regulators without
explicitly re-asserting PHY_TEST_PWR_DOWN, the PHY can remain partially
powered, leading to avoidable power leakage.
Update the init-path comments to reflect that PARF_PHY_CTRL is used to
power the PHY on. Also, for controller revisions that enable PHY power in
init (2.3.2, 2.3.3, 2.4.0, 2.7.0 and 2.9.0), explicitly power the PHY down
via PARF_PHY_CTRL in the deinit path before disabling clocks or regulators.
This ensures the PHY is put into a defined low-power state prior to
removing its supplies, preventing leakage when entering D3cold.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429-d3cold-v5-3-89e9735b9df6@oss.qualcomm.com
|
|
For older SoCs like SC7280, reading DBI LTSSM register after sending
PME_Turn_Off message causes NOC error.
To avoid unsafe DBI accesses, introduce qcom_pcie_get_ltssm() to retrieve
the LTSSM state without DBI. For newer platforms, read the LTSSM state from
the PARF_LTSSM register; for older platforms continue to retrieve it from
ELBI_SYS_STTS.
This helper is used in place of direct DBI-based link state checks in the
D3cold path after sending PME_Turn_Off message, ensuring the LTSSM state
can be queried safely even after DBI access is no longer valid.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: commit log and fixed get_ltssm() check]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429-d3cold-v5-2-89e9735b9df6@oss.qualcomm.com
|
|
Add a common helper, pci_host_common_d3cold_possible(), to determine
whether PCIe devices under host bridge can safely transition to D3cold.
This helper is intended to be used by PCI host controller drivers to decide
whether they can safely put the endpoint devices into D3cold based on their
power state and wakeup capabilities. Once the devices are transitioned into
D3cold, the host controller can be safely powered off.
The helper walks all devices on all downstream buses and only allows
the devices to enter D3cold if all PCIe endpoints are already in
PCI_D3hot. This ensures that the host controller driver does not broadcast
PME_Turn_Off or power down the controller while any active endpoint still
requires the link to remain powered.
For devices that may wake the system, the helper additionally requires that
the device supports PME wake from D3cold (via WAKE#). Devices that do not
have wakeup enabled are not restricted by this check and do not block the
devices under host bridge from entering D3cold.
Devices without a bound driver and with PCI not enabled via sysfs are
treated as inactive and therefore do not prevent the devices under host
bridge from entering D3cold. This allows controllers to power down more
aggressively when there are no actively managed endpoints.
Some devices (e.g. M.2 without auxiliary power) lose PME detection when
main power is removed. Even if such devices advertise PME-from-D3cold
capability, entering D3cold may break wakeup. Return PME-from-D3cold
capability via 'pme_capable' parameter so PCIe controller drivers can apply
platform-specific handling to preserve wakeup functionality.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[mani: commit log and removed the device checks for d3cold]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429-d3cold-v5-1-89e9735b9df6@oss.qualcomm.com
|
|
Older steppings of the Loongson-3C6000 series incorrectly report the
supported link speeds on their PCIe bridges (device IDs 0x3c19, 0x3c29)
as only 2.5 GT/s, despite the upstream bus supporting speeds from
2.5 GT/s up to 16 GT/s.
As a result, since commit 774c71c52aa4 ("PCI/bwctrl: Enable only if more
than one speed is supported"), bwctrl will be disabled if there's only
one 2.5 GT/s value in vector 'supported_speeds'.
Manually override the 'supported_speeds' field for affected PCIe bridges
with those found on the upstream bus to correctly reflect the supported
link speeds. Updating the speeds to reflect what the hardware actually
supports avoids quirks in drivers consuming the speed information.
This commit was originally found from AOSC OS[1].
Fixes: cd89edda4002 ("PCI: loongson: Add ACPI init support")
Signed-off-by: Ayden Meng <aydenmeng@yeah.net>
Signed-off-by: Mingcong Bai <jeffbai@aosc.io>
[Ziyao Li: move from drivers/pci/quirks.c to drivers/pci/controller/pci-loongson.c]
Signed-off-by: Ziyao Li <liziyao@uniontech.com>
[Xi Ruoyao: Fixed falling through logic, added debug log, Fixes tag and rebased to 7.0-rc7]
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: commit log, https://lore.kernel.org/all/9d815df3b33a63223112b97440c01247935363c1.camel@xry111.site]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Lain Fearyncess Yang <fsf@live.com>
Tested-by: Ayden Meng <aydenmeng@yeah.net>
Tested-by: Mingcong Bai <jeffbai@aosc.io>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Cc: stable@vger.kernel.org
Link: https://github.com/AOSC-Tracking/linux/commit/4392f441363abdf6fa0a0433d73175a17f493454
Link: https://github.com/AOSC-Tracking/linux/pull/2 #1
Link: https://patch.msgid.link/20260412101731.107059-1-xry111@xry111.site
|
|
When pwrctrl integration was added, the error message for
pci_pwrctrl_create_devices() failure was incorrectly added after the goto
statement, causing it to be skipped.
Move the goto statement after the dev_err_probe() call so that the
error message actually gets printed (or saved if probe is deferred).
Fixes: 1a152e21940a ("PCI: mediatek-gen3: Integrate new pwrctrl API")
Reported-by: Dan Carpenter <error27@gmail.com>
Closes: https://lore.kernel.org/all/adjNaKB5KGpl6qIp@stanley.mountain/
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Hans Zhang <18255117159@163.com>
Link: https://patch.msgid.link/20260512103347.1751080-1-wenst@chromium.org
|
|
The kstrtou32() function returns negative error code or zero on success.
However, in this case "val" is a u32 and the function returns signed long,
so negative error codes from kstrtou32() are returned as high positive
values.
Store the error code in an int instead.
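
The effect is easy to reproduce in userspace. In the sketch below, `sketch_parse()` is a hypothetical stand-in for kstrtou32() (0 on success, negative errno on failure), and int64_t stands in for the signed long return type on a 64-bit build; none of these names come from the driver.

```c
#include <stdint.h>

#define SKETCH_EINVAL	22

/* Hypothetical parser stand-in: 0 on success, negative errno on failure. */
static int sketch_parse(int fail)
{
	return fail ? -SKETCH_EINVAL : 0;
}

/* Buggy shape: the error lands in an unsigned 32-bit variable first. */
static int64_t sketch_write_buggy(int fail)
{
	uint32_t val;

	val = (uint32_t)sketch_parse(fail);
	if (val)
		return val;	/* zero-extended: a large positive number */
	return 0;
}

/* Fixed shape: keep the error code in a signed int. */
static int64_t sketch_write_fixed(int fail)
{
	int ret;

	ret = sketch_parse(fail);
	if (ret)
		return ret;	/* sign-extended: still -22 */
	return 0;
}
```

On failure the buggy variant returns 4294967274 (2^32 - 22), which a caller would interpret as a byte count rather than an error, while the fixed variant returns -22.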
Fixes: d20ee8e2dbd6 ("PCI: dwc: Add debugfs based Error Injection support for DWC")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Hans Zhang <18255117159@163.com>
Link: https://patch.msgid.link/agL-Uwfn26SI4Gb0@stanley.mountain
|
|
Program the Synopsys DesignWare PORT_AFR L1 entrance latency field from the
optional aspm-l1-entry-delay-ns device tree property (nanoseconds).
Convert delay to whole microseconds with ceiling division (DIV_ROUND_UP),
then derive the 3-bit hw encoding as the minimum of order_base_2(us) and 7.
If the property is not present or cannot be read, default to 7.
Hardware encoding (PORT_AFR L1 entrance latency, bits 27:29):
+--------------------------+----------+
| Advertised maximum | Code |
+--------------------------+----------+
| Maximum of 1 us | 000b |
+--------------------------+----------+
| Maximum of 2 us | 001b |
+--------------------------+----------+
| Maximum of 4 us | 010b |
+--------------------------+----------+
| Maximum of 8 us | 011b |
+--------------------------+----------+
| Maximum of 16 us | 100b |
+--------------------------+----------+
| Maximum of 32 us | 101b |
+--------------------------+----------+
| Maximum of 64 us | 110b |
+--------------------------+----------+
| Rest | 111b |
+--------------------------+----------+
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260515070753.3852840-2-mmaddireddy@nvidia.com
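The conversion described above (nanoseconds, rounded up to microseconds, then the minimum of order_base_2(us) and 7) can be sketched in userspace as follows. The helpers div_round_up() and order_base_2_u32() are stand-ins written for this example that mirror the behaviour of the kernel's DIV_ROUND_UP() and order_base_2(); l1_entrance_latency_code() is a hypothetical name, not the driver's function.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for DIV_ROUND_UP(): integer division that rounds up. */
static uint32_t div_round_up(uint32_t n, uint32_t d)
{
	assert(d != 0);
	return n / d + (n % d != 0);
}

/* Stand-in for order_base_2(): ceil(log2(n)); 0 and 1 both map to 0. */
static unsigned int order_base_2_u32(uint32_t n)
{
	unsigned int order = 0;

	while (order < 32 && ((uint64_t)1 << order) < n)
		order++;
	return order;
}

/* Map a delay in nanoseconds to the 3-bit code from the table above. */
static unsigned int l1_entrance_latency_code(uint32_t delay_ns)
{
	uint32_t us = div_round_up(delay_ns, 1000);
	unsigned int code = order_base_2_u32(us);

	return code < 7 ? code : 7;
}
```

For example, 3000 ns rounds to 3 us, which needs the "maximum of 4 us" code 010b, and anything above 64 us saturates at 111b.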
|
|
Replace the custom open function and file_operations with the standard
DEFINE_SHOW_ATTRIBUTE macro to reduce boilerplate code.
This also adds the previously missing .release() callback and fixes the
seq_file leak during close.
Signed-off-by: Hans Zhang <18255117159@163.com>
[mani: added a note about implicit release callback change spotted by Sashiko]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260508133432.1964491-1-18255117159@163.com
|
|
The current DT binding for pci-imx6 specifies the 'reset-gpios' property
in the host bridge node. However, the PERST# signal logically belongs to
individual Root Ports rather than the host bridge itself. This becomes
important when supporting PCIe Key E connector and the PCI power control
framework for pci-imx6 driver, which requires properties to be specified
in Root Port nodes.
Parse 'reset-gpios' from Root Port nodes and the PCIe bridge nodes under
the Root Port using the common helper pci_host_common_parse_ports(), and
update the reset GPIO handling to use the parsed port list from
bridge->ports. To maintain DT backwards compatibility, fall back to the
legacy method of parsing the host bridge node if the reset property is not
present in the Root Port nodes.
Since the reset GPIO is now obtained with the GPIOD_ASIS flag, it may be in
input mode, so use gpiod_direction_output() instead of
gpiod_set_value_cansleep() to ensure the reset GPIO is properly
configured as output before setting its value.
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
Link: https://patch.msgid.link/20260422093549.407022-5-sherry.sun@nxp.com
|
|
The PCIe endpoint may start responding or driving signals as soon as its
supply is enabled, even before the reference clock is stable. Asserting
PERST# before enabling the regulator ensures that the endpoint remains in
reset throughout the entire power-up sequence, until both power and refclk
are known to be stable and link initialization can safely begin.
Currently, the driver enables the vpcie3v3aux regulator in imx_pcie_probe()
before PERST# is asserted in imx_pcie_host_init(), which may cause PCIe
endpoint undefined behavior during early power-up. However, there is no
issue so far because PERST# is requested as GPIOD_OUT_HIGH in
imx_pcie_probe(), which guarantees that PERST# is asserted before enabling
the vpcie3v3aux regulator.
This prepares for upcoming changes that will parse the reset property
using the new Root Port binding, which will use GPIOD_ASIS when requesting
the reset GPIO. With GPIOD_ASIS, the GPIO state is not guaranteed, so
explicit sequencing is required.
Fix the power sequencing by:
1. Moving vpcie3v3aux regulator enable from probe to
imx_pcie_host_init(), where it can be properly sequenced with PERST#.
2. Moving imx_pcie_assert_perst() before regulator and clock enable to
ensure correct ordering.
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Richard Zhu <hongxing.zhu@nxp.com>
Link: https://patch.msgid.link/20260422093549.407022-4-sherry.sun@nxp.com
|
|
Introduce generic helper functions to parse Root Port device tree nodes and
extract common properties like reset GPIOs. This allows multiple PCI host
controller drivers to share the same parsing logic.
Define struct pci_host_port to hold common Root Port properties (currently
only list of PERST# GPIO descriptors) and add pci_host_common_parse_ports()
to parse Root Port nodes from device tree.
Also add the 'ports' list to struct pci_host_bridge to better maintain
parsed Root Port information.
Signed-off-by: Sherry Sun <sherry.sun@nxp.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260422093549.407022-3-sherry.sun@nxp.com
|
|
The IMX6SX_GPR12_PCIE_TEST_POWERDOWN bit does not control the PCIe
reference clock on i.MX6SX. Instead, it is part of i.MX6SX PCIe core
reset sequence.
Move the IMX6SX_GPR12_PCIE_TEST_POWERDOWN assertion/deassertion into
the core reset functions to properly reflect its purpose. Remove the
.enable_ref_clk() callback for i.MX6SX since it was incorrectly
manipulating this bit.
Fixes: e3c06cd063d6 ("PCI: imx6: Add initial imx6sx support")
Signed-off-by: Richard Zhu <hongxing.zhu@nxp.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260319090844.444987-1-hongxing.zhu@nxp.com
|
|
The kerneldoc for device_is_bound() states that it must be called with
the device lock taken. Synchronize the two calls in pwrctrl core.
Fixes: b35cf3b6aa1e ("PCI/pwrctrl: Add APIs to power on/off pwrctrl devices")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260518100700.47581-1-bartosz.golaszewski@oss.qualcomm.com
|
|
In 2012, commit 26f41062f28d ("PCI: check for pci bar restore completion
and retry") amended pci_restore_state() to attempt BAR restoration up to 10
times. This was necessary because back in the day, only a 100 msec delay
was observed after pcie_flr() carried out a Function Level Reset. The
retries ensured that BARs were restored even if devices needed more time to
come out of reset.
In 2016, commit 5adecf817dd6 ("PCI: Wait for up to 1000ms after FLR reset")
extended the delay to 1 sec. Commit a2758b6b8fdb ("PCI: Rename
pci_flr_wait() to pci_dev_wait() and make it generic") subsequently
extended it further to 60 sec.
The lengthened delay makes it unnecessary to retry BAR restoration, so
drop it.
Reported-by: Bjorn Helgaas <bhelgaas@google.com>
Closes: https://lore.kernel.org/r/20260416225745.GA41850@bhelgaas/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/785c98b50a7a00d0698848c75d51b8f5669ad18f.1777814679.git.lukas@wunner.de
|
|
For a device that advertises No_Soft_Reset == 0, a transition from D3hot to
D0uninitialized is a soft reset, and the resulting internal device state is
undefined.
Per PCIe r7.0, sec 2.3.1, a transition from D3hot to D0uninitialized
mandates a minimum 10 ms delay before accessing the device. Following this
delay, the device is permitted to respond to initial configuration requests
with a Request Retry Status (RRS) completion status if it needs more time
to initialize.
Call pci_dev_wait() after pci_power_up() performs a D3hot->D0uninitialized
transition to ensure the device is ready to accept config accesses, as is
done after the similar transition in pci_pm_reset().
If the device is already ready, this is essentially a no-op except for one
additional config read.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260518191220.636213-3-bhelgaas@google.com
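The "wait until the device answers" idea above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's pci_dev_wait(): struct fake_dev, fake_config_read_ok(), model_dev_wait(), the doubling backoff, and the -ETIMEDOUT return value are all chosen for illustration, and the delay is only accounted for rather than slept.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device that answers with RRS for the first N config reads. */
struct fake_dev {
	unsigned int reads_until_ready;
	unsigned int reads;
	uint64_t waited_ms;	/* total simulated delay */
};

/* True once the simulated config read completes successfully. */
static bool fake_config_read_ok(struct fake_dev *dev)
{
	dev->reads++;
	return dev->reads > dev->reads_until_ready;
}

/* Poll with a doubling delay until the device answers or the budget is spent. */
static int model_dev_wait(struct fake_dev *dev, unsigned int timeout_ms)
{
	uint64_t delay_ms = 1;

	assert(dev);
	while (!fake_config_read_ok(dev)) {
		if (delay_ms > timeout_ms)
			return -ETIMEDOUT;
		dev->waited_ms += delay_ms;	/* real code would sleep here */
		delay_ms *= 2;
	}
	return 0;
}
```

A device that is already ready costs exactly one read and no delay in this model, which matches the "essentially a no-op" remark in the commit message.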
|
|
pci_dev_wait() waits for a device to be Configuration-Ready after a reset,
such as a Function-Level Reset (FLR), a soft reset during a
D3hot->D0uninitialized transition (when No_Soft_Reset == 0), or a power-up
sequence from D3cold->D0uninitialized.
If pci_dev_wait() returns success, the device is guaranteed to respond to
configuration requests with Successful Completion status. If it times out,
the device is completely non-responsive.
Upgrade the log level from pci_warn() to pci_err() to reflect this failure
state.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/20260518191220.636213-2-bhelgaas@google.com
|
|
When power control for downstream devices was introduced in the
mediatek-gen3 PCIe controller driver, only the power to the downstream
devices was cut when the controller driver is removed. This matched
existing behavior, but in hindsight a proper power down sequence should
have been followed.
Call mtk_pcie_devices_power_down() on driver removal so that in addition
to removing power from the downstream devices, PERST# is asserted.
Fixes: 1a152e21940a ("PCI: mediatek-gen3: Integrate new pwrctrl API")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260505105918.1823170-1-wenst@chromium.org
|
|
Add a .shutdown() callback to control the timing of PERST# and power during
system shutdown to ensure that PERST# is asserted before power to the
connector is removed, as required by PCIe CEM r6.0, sec 2.2.
Signed-off-by: Jian Yang <jian.yang@mediatek.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260413071401.1151-3-jian.yang@mediatek.com
|
|
Some MediaTek chips stop generating REFCLK if the PCIE_PHY_RSTB signal of
PCIe controller is asserted at the start of mtk_pcie_devices_power_up().
But the driver deasserts PCIE_PHY_RSTB together with PCIE_PE_RSTB signal
that is used to deassert PERST#. This violates PCIe CEM r6.0, sec 2.11.2,
which mandates waiting for 100ms (PCIE_T_PVPERL_MS) after power becomes
stable.
Move the MAC, PHY and BRG reset deassert code above the PCIE_T_PVPERL_MS
delay and leave the PCIE_PE_RSTB deassertion after the delay.
Add the 10ms delay mentioned in the MediaTek datasheet after asserting
PCIE_BRG_RSTB and before accessing the PCIE_RST_CTRL_REG register.
Signed-off-by: Jian Yang <jian.yang@mediatek.com>
[mani: commit log and comments rewording]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260413071401.1151-2-jian.yang@mediatek.com
|
|
After commit 03f920936977 ("PCI: controller: Validate max-link-speed"),
pcie->gen stopped being assigned and as a result the established PCIe link
would stop supporting Gen3 speeds on 2712 since pcie->gen is used to
populate LnkCntl2 and LnkCap in brcm_pcie_set_gen().
If the 'max-link-speed' property is not specified, or it exceeds Gen3,
resort to the HW defaults.
Link: https://github.com/raspberrypi/linux/issues/7343
Reported-by: Dom Cobley <popcornmix@gmail.com>
Reported-by: Phil Elwell <phil@raspberrypi.com>
Fixes: 03f920936977 ("PCI: controller: Validate max-link-speed")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hans Zhang <18255117159@163.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260506164537.103196-1-florian.fainelli@broadcom.com
|
|
The chained IRQ handler is set during probe, but is only removed during the
driver remove(). If pci_host_probe() fails, the handler and INTx IRQ
domain remain set even though the devm-managed host bridge storage
containing struct altera_pcie will be released, leaving the handler with
a stale data pointer.
Interrupts are also enabled before pci_host_probe() is called. If probe
fails after that point, the controller interrupt source should be disabled
before the chained handler and INTx domain are removed.
So set the chained handler only after the INTx domain has been created.
Disable controller interrupts during IRQ teardown, and tear the IRQ setup
down if pci_host_probe() fails.
Fixes: c63aed7334c2 ("PCI: altera: Use pci_host_probe() to register host")
Signed-off-by: Mahesh Vaidya <mahesh.vaidya@altera.com>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Subhransu S. Prusty <subhransu.sekhar.prusty@altera.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260430204330.3121003-3-mahesh.vaidya@altera.com
|
|
altera_pcie_irq_teardown() calls irq_dispose_mapping() on pcie->irq.
However, pcie->irq is the parent IRQ returned by platform_get_irq(), not
the mapping created by Altera INTx irq_domain.
The Altera driver only sets the chained handler on the parent IRQ. It
should detach that handler during teardown, but it should not dispose the
parent IRQ mapping, which belongs to the parent interrupt controller's
irq_domain.
Drop irq_dispose_mapping(pcie->irq) from the teardown path.
Note that during irqchip remove(), the child IRQs should have been disposed.
But since the chained handler itself is removed, there is no way the stale
child IRQs (if any exist) could fire. So it is safe here.
Fixes: ec15c4d0d5d2 ("PCI: altera: Allow building as module")
Signed-off-by: Mahesh Vaidya <mahesh.vaidya@altera.com>
[mani: added a note about IRQ disposal]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Subhransu S. Prusty <subhransu.sekhar.prusty@altera.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260430204330.3121003-2-mahesh.vaidya@altera.com
|
|
Remove the redundant mode field from struct keembay_pcie and use the
existing mode field in struct dw_pcie instead.
This avoids duplication and prevents potential inconsistencies between
the two mode fields.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260501161010.71688-5-18255117159@163.com
|
|
Remove the redundant mode field from struct dw_plat_pcie and use the
existing mode field in struct dw_pcie instead.
This avoids duplication and prevents potential inconsistencies between
the two mode fields.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260501161010.71688-4-18255117159@163.com
|
|
Remove the redundant mode field from struct artpec6_pcie and use the
existing mode field in struct dw_pcie instead.
This avoids duplication and prevents potential inconsistencies between
the two mode fields.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260501161010.71688-3-18255117159@163.com
|
|
Remove the redundant mode field from struct dra7xx_pcie and use the
existing mode field in struct dw_pcie instead.
This avoids duplication and prevents potential inconsistencies between
the two mode fields.
Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260501161010.71688-2-18255117159@163.com
|
|
During resume, qcom_pcie_icc_opp_update() may access DBI registers before
the OPP votes are restored, triggering NoC errors.
Set the PCIe controller to the maximum OPP first in resume_noirq(), then
proceed with link/DBI accesses. The OPP is later updated again based on
the actual link bandwidth requirements.
Introduce a helper to reuse the max-OPP setup code and share it with
probe().
Fixes: 5b6272e0efd5 ("PCI: qcom: Add OPP support to scale performance")
Signed-off-by: Qiang Yu <qiang.yu@oss.qualcomm.com>
[mani: commit log and error log rewording]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260416-setmaxopp-v1-1-6a74e2d945a0@oss.qualcomm.com
|
|
The ECRC (TLP digest) workaround was originally applied only for DesignWare
core version 4.90a. Per discussion in Synopsys case, the dependency of the
iATU TD bit on ECRC generation was removed in 5.10a, so apply the
workaround for all DWC versions below that release.
Replace the misleading comment that referred to raw version constants
with readable DesignWare release names.
Fixes: b210b1595606 ("PCI: dwc: Apply ECRC workaround to DesignWare 5.00a as well")
Signed-off-by: Manikanta Maddireddy <mmaddireddy@nvidia.com>
[mani: corrected fixes tag format]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260410062507.657453-1-mmaddireddy@nvidia.com
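The "all versions below 5.10a" rule above amounts to a single ordered comparison. The sketch below assumes that a DesignWare version is encoded as three ASCII digits followed by '*' (so 4.90a becomes 0x3439302a); that encoding, the helper names, and the omission of the release letter are assumptions made for this example, not taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: three ASCII digits plus '*', e.g. "490*" for 4.90a. */
static uint32_t model_dwc_ver(char major, char minor, char patch)
{
	assert(major >= '0' && major <= '9');
	assert(minor >= '0' && minor <= '9');
	assert(patch >= '0' && patch <= '9');
	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)patch << 8) | (uint32_t)'*';
}

/* The workaround applies to every release older than 5.10a. */
static bool model_needs_ecrc_workaround(uint32_t ver)
{
	return ver < model_dwc_ver('5', '1', '0');
}
```

Because ASCII digits sort in numeric order, comparing the encoded values orders releases correctly, so 4.90a and 5.00a qualify while 5.10a and later do not.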
|
|
Add DP1000 SoC PCIe Root Complex driver.
The controller only supports 32-bit aligned configuration space accesses.
Signed-off-by: Xincheng Zhang <zhangxincheng@ultrarisc.com>
Signed-off-by: Jia Wang <wangjia@ultrarisc.com>
[mani: changed to builtin_platform_driver() to prevent irqchip removal]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: squash MAINTAINERS update here]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260427-ultrarisc-pcie-v4-3-98935f6cdfb5@ultrarisc.com
|
|
Add a shutdown handler for the AMD MDB PCIe host controller that
asserts the PERST# signal via GPIO before the system powers off or
reboots. This ensures the connected PCIe endpoint is held in reset
during shutdown.
Signed-off-by: Sai Krishna Musham <sai.krishna.musham@amd.com>
[mani: removed conditional check since GPIO APIs return safely if desc is NULL]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260427121227.290604-1-sai.krishna.musham@amd.com
|
|
Loongson PCI host controllers have a hardware quirk that requires
software to ignore downstream devices with device number > 0 on the
internal bridges. The current implementation applies the workaround to
all non-root buses, which breaks external bridges (e.g., PCIe switches)
with multiple downstream devices.
Fix it by only applying the workaround to internal bridges.
Tested on Loongson-LS3A4000-7A1000-NUC-SE, using AMD Promontory 21
chipset add-in card [1].
$ lspci -tnnnvvv
-[0000:00]-+-00.0 Loongson Technology LLC 7A1000 Chipset Hyper Transport Bridge Controller [0014:7a00]
+-00.1 Loongson Technology LLC 7A2000 Chipset Hyper Transport Bridge Controller [0014:7a10]
+-03.0 Loongson Technology LLC 2K1000/2000 / 7A1000 Chipset Gigabit Ethernet Controller [0014:7a03]
+-04.0 Loongson Technology LLC 2K1000 / 7A1000/2000 Chipset USB OHCI Controller [0014:7a24]
+-04.1 Loongson Technology LLC 2K1000 / 7A1000/2000 Chipset USB EHCI Controller [0014:7a14]
+-05.0 Loongson Technology LLC 2K1000 / 7A1000/2000 Chipset USB OHCI Controller [0014:7a24]
+-05.1 Loongson Technology LLC 2K1000 / 7A1000/2000 Chipset USB EHCI Controller [0014:7a14]
+-06.0 Loongson Technology LLC 7A1000 Chipset Vivante GC1000 GPU [0014:7a15]
+-06.1 Loongson Technology LLC 2K1000 / 7A1000 Chipset Display Controller [0014:7a06]
+-07.0 Loongson Technology LLC 2K1000/2000/3000 / 3B6000M / 7A1000/2000 Chipset HD Audio Controller [0014:7a07]
+-08.0 Loongson Technology LLC 2K1000 / 7A1000 Chipset 3Gb/s SATA AHCI Controller [0014:7a08]
+-08.1 Loongson Technology LLC 2K1000 / 7A1000 Chipset 3Gb/s SATA AHCI Controller [0014:7a08]
+-08.2 Loongson Technology LLC 2K1000 / 7A1000 Chipset 3Gb/s SATA AHCI Controller [0014:7a08]
+-09.0-[01]----00.0 Qualcomm Technologies, Inc QCNFA765 Wireless Network Adapter [17cb:1103]
+-0a.0-[02]----00.0 Etron Technology, Inc. EJ188/EJ198 USB 3.0 Host Controller [1b6f:7052]
+-0f.0-[03-08]----00.0-[04-08]--+-00.0-[05]----00.0 Shenzhen Longsys Electronics Co., Ltd. FORESEE XP1000 / Lexar Professional CFexpress Type B Gold series, NM620 PCIe NVME SSD (DRAM-less) [1d97:5216]
| +-08.0-[06]----00.0 MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1202 (DRAM-less) [1e4b:1202]
| +-0c.0-[07]----00.0 Advanced Micro Devices, Inc. [AMD] 600 Series Chipset USB 3.2 Controller [1022:43f7]
| \-0d.0-[08]----00.0 Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller [1022:43f6]
\-16.0 Loongson Technology LLC 7A1000 Chipset SPI Controller [0014:7a0b]
Fixes: 2410e3301fcc ("PCI: loongson: Don't access non-existent devices")
Co-developed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Co-developed-by: Lain "Fearyncess" Yang <i@lain.vg>
Signed-off-by: Lain "Fearyncess" Yang <i@lain.vg>
Signed-off-by: Rong Zhang <i@rong.moe>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://oshwhub.com/wesd/b650 [1]
Link: https://patch.msgid.link/20260501-ls7a-bridge-fixes-v2-1-69fa93683805@rong.moe
|
|
The ATU base address was set in intel_pcie_host_setup(), which is called
via pp->ops->init(). However, dw_pcie_get_resources() runs before this
callback and sets a default atu_base of 0x300000, which then gets
overwritten by the driver's value of 0xC0000.
But this ordering is broken because atu_base must be set before
dw_pcie_get_resources() runs, not after. So move the atu_base assignment
from intel_pcie_host_setup() to intel_pcie_probe() to fix the
initialization order.
The call stack is:
intel_pcie_probe
dw_pcie_host_init
dw_pcie_host_get_resources
dw_pcie_get_resources <- sets atu_base = 0x300000
pp->ops->init
intel_pcie_rc_init
intel_pcie_host_setup <- was overwriting atu_base here
Additionally, add support for parsing the ATU region from the device tree.
If an 'atu' region is present in DT, the DWC core parses it via
dw_pcie_get_resources() and the driver does not set atu_base explicitly.
If 'atu' is absent, the driver falls back to the hardcoded offset (0xC0000
from DBI base) for backwards compatibility, with a warning to the user.
Signed-off-by: Florian Eckert <fe@dev.tdt.de>
[mani: commit log]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260417-pcie-intel-gw-v5-6-0a2b933fe04f@dev.tdt.de
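The resulting precedence (DT 'atu' region first, otherwise the driver's legacy offset, chosen before the core runs) can be modelled in a few lines. This is a simplified sketch: struct model_pcie and both functions are hypothetical, offsets stand in for mapped addresses, and the "core only fills in a value when none is set" behaviour is an assumption about dw_pcie_get_resources().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CORE_DEFAULT_ATU_OFF	0x300000u	/* core default from the text */
#define MODEL_DRIVER_LEGACY_ATU_OFF	0xC0000u	/* legacy offset from DBI base */

struct model_pcie {
	uint32_t atu_off;	/* 0 means "not set yet" */
	bool dt_has_atu;	/* DT provides an 'atu' region */
	uint32_t dt_atu_off;	/* offset described by that region */
};

/* Core step: fill in atu_off only if nothing set it earlier (assumed). */
static void model_core_get_resources(struct model_pcie *p)
{
	if (p->atu_off)
		return;
	p->atu_off = p->dt_has_atu ? p->dt_atu_off : MODEL_CORE_DEFAULT_ATU_OFF;
}

/* Fixed ordering: the driver decides in probe, before the core step runs. */
static uint32_t model_probe(struct model_pcie *p)
{
	assert(p);
	if (!p->dt_has_atu)
		p->atu_off = MODEL_DRIVER_LEGACY_ATU_OFF;	/* real code also warns */
	model_core_get_resources(p);
	return p->atu_off;
}
```

In this model the core default of 0x300000 is never used for this driver: either the DT region wins or the legacy 0xC0000 offset is already in place when the core runs.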
|
|
The pcie-intel-gw driver had no .start_link() callback. Add one so the
driver works again and does not abort with the following error messages
during probing:
intel-gw-pcie d1000000.pcie: host bridge /soc/pcie@d1000000 ranges:
intel-gw-pcie d1000000.pcie: MEM 0x00dc000000..0x00ddffffff -> 0x00dc000000
intel-combo-phy d0c00000.combo-phy: Set combo mode: combophy[1]: mode: PCIe single lane mode
intel-gw-pcie d1000000.pcie: No outbound iATU found
intel-gw-pcie d1000000.pcie: Cannot initialize host
intel-gw-pcie d1000000.pcie: probe with driver intel-gw-pcie failed with error -22
intel-gw-pcie c1100000.pcie: host bridge /soc/pcie@c1100000 ranges:
intel-gw-pcie c1100000.pcie: MEM 0x00ce000000..0x00cfffffff -> 0x00ce000000
intel-combo-phy c0c00000.combo-phy: Set combo mode: combophy[3]: mode: PCIe single lane mode
intel-gw-pcie c1100000.pcie: No outbound iATU found
intel-gw-pcie c1100000.pcie: Cannot initialize host
intel-gw-pcie c1100000.pcie: probe with driver intel-gw-pcie failed with error -22
Fixes: c5097b9869a1 ("Revert "PCI: dwc: Wait for link up only if link is started"")
Fixes: da56a1bfbab5 ("PCI: dwc: Wait for link up only if link is started")
Signed-off-by: Florian Eckert <fe@dev.tdt.de>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: remove timestamps]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260417-pcie-intel-gw-v5-5-0a2b933fe04f@dev.tdt.de
|
|
To ensure that the boot sequence is correct, the DWC PCIe core clock must
be switched on before the PHY init call [1]. These changes are based on the
patched kernel sources of the MaxLinear SDK.
The MaxLinear SDK is used as a reference here because this PCIe DWC IP is
used in the URX851 and URX850 SoCs. These SoCs were originally developed by
Intel after it acquired Lantiq’s home networking division in 2015 [2]. In
2020, the home networking division was sold to MaxLinear [3]. Since then,
these SoCs have belonged to MaxLinear, which uses its own SDK based on
kernel version '5.15.x'.
[1] https://github.com/maxlinear/linux/blob/updk_9.1.90/drivers/pci/controller/dwc/pcie-intel-gw.c#L544
[2] https://www.intc.com/news-events/press-releases/detail/364/intel-to-acquire-lantiq-advancing-the-connected-home
[3] https://investors.maxlinear.com/press-releases/detail/395/maxlinear-to-acquire-intels-home-gateway-platform
Signed-off-by: Florian Eckert <fe@dev.tdt.de>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260417-pcie-intel-gw-v5-4-0a2b933fe04f@dev.tdt.de
|
|
To improve the readability of the code, move the interrupt enable
instructions to a separate function. That is already done for the disable
interrupt instruction.
In addition, clear and disable all pending interrupts, as is done in
intel_pcie_core_irq_disable(). After that, enable all relevant interrupts
again. The 'PCIE_APP_IRNEN' definition contains all the relevant interrupts
that are of interest.
This change is also done in the MaxLinear SDK [1]. As I unfortunately don’t
have any documentation for this IP core, I suspect that the intention is to
set the IP core for interrupt handling to a specific state. Perhaps the
problem is that the IP core did not reinitialize the interrupt register
properly after a power cycle.
In my view, it can’t do any harm to switch the interrupts off and then on
again to set them to a specific state.
The MaxLinear SDK is used as a reference here because this PCIe DWC IP is
used in the URX851 and URX850 SoCs. These SoCs were originally developed by
Intel after it acquired Lantiq’s home networking division in 2015 [2]. In
2020, the home networking division was sold to MaxLinear [3]. Since then,
these SoCs have belonged to MaxLinear, which uses its own SDK based on
kernel version '5.15.x'.
[1] https://github.com/maxlinear/linux/blob/updk_9.1.90/drivers/pci/controller/dwc/pcie-intel-gw.c#L431
[2] https://www.intc.com/news-events/press-releases/detail/364/intel-to-acquire-lantiq-advancing-the-connected-home
[3] https://investors.maxlinear.com/press-releases/detail/395/maxlinear-to-acquire-intels-home-gateway-platform
Signed-off-by: Florian Eckert <fe@dev.tdt.de>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260417-pcie-intel-gw-v5-3-0a2b933fe04f@dev.tdt.de
|
|
The C preprocessor define 'PCIE_APP_INTX_OFST' is not used in the sources.
Delete it.
Signed-off-by: Florian Eckert <fe@dev.tdt.de>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260417-pcie-intel-gw-v5-2-0a2b933fe04f@dev.tdt.de
|
|
Add three macros for declaring static binary attributes for PCI resource
files:
- pci_dev_resource_io_attr(), for I/O BAR resources (read/write)
- pci_dev_resource_uc_attr(), for memory BAR resources (mmap uncached)
- pci_dev_resource_wc_attr(), for write-combine resources (mmap WC)
Each macro only sets the callbacks its resource type needs. The I/O macro
conditionally includes mmap support via __PCI_RESOURCE_IO_MMAP_ATTRS on
architectures where arch_can_pci_mmap_io() is true at compile time (such as
PowerPC, SPARC, and Xtensa).
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-7-kwilczynski@kernel.org
|
|
Currently, __resource_resize_store() allows writing to the
resourceN_resize sysfs attribute to change a BAR's size without checking
for capabilities, relying only on the file access check.
Resizing a BAR modifies PCI device configuration and can disrupt active
drivers. After the upcoming conversion to static attributes, it will also
trigger resource file updates via sysfs_update_groups().
Add a CAP_SYS_ADMIN check to prevent unprivileged users from performing BAR
resize operations.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20260508043543.217179-6-kwilczynski@kernel.org
|
|
Both legacy and resource attributes set .f_mapping = iomem_get_mapping, so
the default generic_file_llseek() would consult iomem_inode for the file
size, which knows nothing about the attribute. That is why custom llseek
callbacks exist.
Currently, the legacy and resource attributes have .size set at creation
time, so using attr->size is sufficient. However, the upcoming
static resource attributes will have .size == 0 set, since they are const,
and the .bin_size() callback will be used to provide the real size to
kernfs instead.
The legacy attributes operate on a struct pci_bus, not struct pci_dev, so
calling to_pci_dev() on them would be invalid.
Thus, split pci_llseek_resource() into two functions:
- pci_llseek_resource(), which derives the file size from the BAR using
pci_resource_len().
- pci_llseek_resource_legacy(), which uses attr->size directly.
Update the dynamic legacy attribute creation to use the new
pci_llseek_resource_legacy() callback.
The original pci_llseek_resource() was added in commit 24de09c16f97 ("PCI:
Implement custom llseek for sysfs resource entries").
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Link: https://patch.msgid.link/20260508043543.217179-5-kwilczynski@kernel.org
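The split described above comes down to where the file size is taken from. The following userspace sketch models both variants on top of a bounded seek helper; every name here is hypothetical, the bounded helper only approximates a fixed-size llseek, and offsets are assumed small enough that the additions cannot overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Seek within a file of a known, fixed size; reject targets outside it. */
static int64_t model_fixed_size_llseek(int64_t pos, int64_t off, int whence,
				       int64_t size)
{
	int64_t target;

	switch (whence) {
	case SEEK_SET:
		target = off;
		break;
	case SEEK_CUR:
		target = pos + off;
		break;
	case SEEK_END:
		target = size + off;
		break;
	default:
		return -EINVAL;
	}
	if (target < 0 || target > size)
		return -EINVAL;
	return target;
}

/* Inclusive address range, in the style of a BAR resource. */
struct model_bar {
	uint64_t start;
	uint64_t end;
};

/* Resource variant: the size is derived from the BAR length. */
static int64_t model_llseek_resource(const struct model_bar *bar, int64_t pos,
				     int64_t off, int whence)
{
	assert(bar && bar->end >= bar->start);
	return model_fixed_size_llseek(pos, off, whence,
				       (int64_t)(bar->end - bar->start + 1));
}

/* Legacy variant: the size comes straight from the attribute. */
static int64_t model_llseek_resource_legacy(int64_t attr_size, int64_t pos,
					    int64_t off, int whence)
{
	return model_fixed_size_llseek(pos, off, whence, attr_size);
}
```

Keeping the size lookup in two thin wrappers means the legacy path never has to interpret its device as a PCI device, which is the point of the split.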
|
|
Currently, when the sysfs attributes for PCI resources are added
dynamically, the resource access callbacks are only set when the underlying
BAR type matches, using .read and .write for IORESOURCE_IO, and .mmap for
IORESOURCE_MEM or IORESOURCE_IO with arch_can_pci_mmap_io() support. As
such, when the callback is not set, the operation inherently fails.
After the conversion to static attributes, visibility callbacks will
control which resource files appear for each BAR, but the callbacks
themselves will always be set.
Add a type check to pci_resource_io() and pci_mmap_resource() to return
-EIO for an unsupported resource type.
Use the new pci_resource_is_io() and pci_resource_is_mem() helpers for the
type checks, replacing the open-coded bitwise flag tests and also drop the
local struct resource pointer in pci_mmap_resource().
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20260508043543.217179-4-kwilczynski@kernel.org
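The added type checks can be sketched as a userspace model. The flag values mirror the kernel's IORESOURCE_IO, IORESOURCE_MEM and type mask, but struct model_resource and the functions are hypothetical, and the exact mmap condition (memory BARs always, I/O BARs only with architecture support) is an assumption derived from the description of when .mmap used to be set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_IORESOURCE_TYPE_BITS	0x00001f00UL
#define MODEL_IORESOURCE_IO		0x00000100UL
#define MODEL_IORESOURCE_MEM		0x00000200UL

struct model_resource {
	unsigned long flags;
};

static bool model_resource_is_io(const struct model_resource *res)
{
	return (res->flags & MODEL_IORESOURCE_TYPE_BITS) == MODEL_IORESOURCE_IO;
}

static bool model_resource_is_mem(const struct model_resource *res)
{
	return (res->flags & MODEL_IORESOURCE_TYPE_BITS) == MODEL_IORESOURCE_MEM;
}

/* read/write path: only I/O resources are supported. */
static int model_resource_io(const struct model_resource *res)
{
	assert(res);
	return model_resource_is_io(res) ? 0 : -EIO;
}

/* mmap path: memory resources, or I/O resources where the arch allows it. */
static int model_mmap_resource(const struct model_resource *res,
			       bool arch_can_mmap_io)
{
	assert(res);
	if (model_resource_is_mem(res))
		return 0;
	if (model_resource_is_io(res) && arch_can_mmap_io)
		return 0;
	return -EIO;
}
```

With the callbacks always installed, these early returns reproduce the old behaviour in which an operation simply failed when no callback was set for that BAR type.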
|
|
Replace direct pdev->resource[] accesses with pci_resource_n(), and
pdev->resource[].flags accesses with pci_resource_flags().
No functional changes intended.
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://patch.msgid.link/20260508043543.217179-2-kwilczynski@kernel.org
|
|
We hit a crash when running stress-ng on an x86_64 machine:
BUG: unable to handle page fault for address: ffa0000007f40000
RIP: 0010:pci_get_rom_size+0x52/0x220
Call Trace:
<TASK>
pci_map_rom+0x80/0x130
pci_read_rom+0x4b/0xe0
kernfs_file_read_iter+0x96/0x180
vfs_read+0x1b1/0x300
Our analysis shows that the ROM space starts at 0xffa0000007f30000 and is
0x10000 bytes long. Because the ROM contents are broken, the value of pds
is 0xffa0000007f3ffff just before readl(pds) is called, which already
points at the last byte of the ROM space. readl() reads 4 bytes, so it
accesses memory out of bounds and triggers the crash. Fix this by checking
the image header and data structure.
We also found another crash on an arm64 machine:
Unable to handle kernel paging request at virtual address ffff8000dd1393ff
Mem abort info:
ESR = 0x0000000096000021
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x21: alignment fault
The call trace is the same as on x86_64, but here the crash happens because
the data structure address is not 4-byte aligned, so the arm64 machine
reports an "alignment fault". Fix this by adding an alignment check.
Fixes: 47b975d234ea ("PCI: Avoid iterating through memory outside the resource window")
Suggested-by: Guanghui Feng <guanghuifeng@linux.alibaba.com>
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
[bhelgaas: shorten function names, wrap comments]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://patch.msgid.link/20260508082128.3344255-3-kanie@linux.alibaba.com
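The bounds and alignment checks described in the ROM fix above can be
sketched in userspace C. This is only an illustration under stated
assumptions: rom_offset_ok() is a hypothetical helper, not the function
added by the patch, and it models the ROM window as an offset/size pair
rather than a mapped address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's code): decide whether a 4-byte
 * read at 'offset' inside a ROM window of 'size' bytes is safe.
 */
static bool rom_offset_ok(size_t offset, size_t size)
{
	/* The whole 32-bit word must lie inside the window. */
	if (size < sizeof(uint32_t) || offset > size - sizeof(uint32_t))
		return false;

	/* A 32-bit read must be naturally aligned (the arm64 case). */
	return (offset % sizeof(uint32_t)) == 0;
}
```

With the values from the x86_64 report, the offset is 0xffff in a window of
0x10000 bytes, so the first check rejects the read.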
|
|
Convert the magic numbers associated with PCI ROM into named
definitions. Some of these definitions will be used in the second
fix patch.
Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Link: https://patch.msgid.link/20260508082128.3344255-2-kanie@linux.alibaba.com
|
|
When setting new_id of a PCI device driver using sysfs a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override(). That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").
The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.
Initialize the temporary pci_dev to fix this.
Repro:
Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:
# echo "8086 10f5" > /sys/bus/pci/drivers/e1000e/new_id
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
register_lock_class+0x77e/0x790
lock_acquire+0xbf/0x2e0
pci_match_device+0x24/0x180
new_id_store+0x189/0x1d0
kernfs_fop_write_iter+0x14f/0x210
vfs_write+0x263/0x5e0
ksys_write+0x79/0xf0
do_syscall_64+0x117/0xf80
Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure")
Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com
|
|
Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):
ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
ddbridge 0000:05:00.0: cannot read registers
ddbridge 0000:05:00.0: fail
BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window. The kernel corrects the BAR assignment:
pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned
Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:
https://git.kernel.org/tglx/history/c/a06a30144bbc
No other architecture performs such a late BAR correction.
Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit
4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.
Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction. Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:
vfio_pci_core_register_device()
vfio_pci_set_power_state()
pci_restore_state()
But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7. Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.
Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well. The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.
Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.
To robustly fix the issue, always update saved_config_space upon resource
assignment.
Reported-by: Bernd Schumacher <bernd@bschu.de>
Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/
Reported-by: Alexandre N. <an.tech@mailo.com>
Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/
Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Bernd Schumacher <bernd@bschu.de>
Tested-by: Alexandre N. <an.tech@mailo.com>
Cc: stable@vger.kernel.org # v6.12+
Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de
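The stale-restore sequence described in the BAR fix above can be modelled
with a small userspace sketch. All names here (struct demo_dev,
demo_save_state() and so on) are hypothetical stand-ins, and the
update_saved parameter is an assumption used to contrast the behavior
before and after the fix; it is not how the kernel code is structured.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_dev {
	uint32_t bar0;		/* value currently programmed in the device */
	uint32_t saved_bar0;	/* snapshot used by a later restore */
	bool state_saved;
};

/* Models saving config space at enumeration time. */
static void demo_save_state(struct demo_dev *d)
{
	d->saved_bar0 = d->bar0;
	d->state_saved = true;
}

/* Models resource assignment; 'update_saved' stands for the fix. */
static void demo_assign_bar(struct demo_dev *d, uint32_t addr,
			    bool update_saved)
{
	d->bar0 = addr;
	if (update_saved && d->state_saved)
		d->saved_bar0 = addr;
}

/* Models a restore that is not preceded by a fresh save. */
static void demo_restore_state(struct demo_dev *d)
{
	if (d->state_saved)
		d->bar0 = d->saved_bar0;
}
```

Without the update, a restore after assignment brings back the value that
was saved at enumeration; with it, the corrected address survives.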
|
|
Reconfiguring ASPM when a device transitions to low-power state can enable
L1.1/L1.2 substates on the PCIe link at a time when the device is sleeping
and may be unable to exit them. ASPM should be reconfigured on D0 entry
(resume), not on the way down.
pci_set_low_power_state() calls pcie_aspm_pm_state_change() after writing
D3hot to PCI_PM_CTRL. pcie_aspm_pm_state_change() resets link->aspm_capable
to link->aspm_support and then calls pcie_config_aspm_path(), which can
enable ASPM L1.1/L1.2 substates on the PCIe link. If the device cannot
recover the link from L1.2 while in D3hot, subsequent config space reads
return 0xFFFF ("device inaccessible") and pci_power_up() fails with
messages like:
vfio-pci 0000:5d:00.0: Unable to change power state from D3hot to D0, device inaccessible
This was observed on NVIDIA H100 SXM5 GPUs bound to vfio-pci when Linux
runtime PM suspends them to D3hot: the GPU becomes permanently inaccessible
and disappears from the PCIe bus.
The call to pcie_aspm_pm_state_change() in pci_set_low_power_state() was
restored by f93e71aea6c6 ("Revert "PCI/ASPM: Remove
pcie_aspm_pm_state_change()""), which reverted 08d0cc5f3426 ("PCI/ASPM:
Remove pcie_aspm_pm_state_change()"). The revert was necessary because the
removal broke suspend/resume on certain platforms that required ASPM to be
reconfigured on D0 entry. However, the revert restored the call in both
pci_set_full_power_state() (D0 entry) and pci_set_low_power_state()
(low-power entry).
Only the D0-entry call is needed to fix the suspend/resume regression. The
low-power-entry call is harmful: reconfiguring ASPM immediately after
putting a device into D3hot can enable link substates that the device or
platform cannot exit while the device is sleeping.
Remove the pcie_aspm_pm_state_change() call from pci_set_low_power_state().
ASPM will still be reconfigured correctly when the device returns to D0 via
pci_set_full_power_state().
Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Signed-off-by: Carlos Bilbao (Lambda) <carlos.bilbao@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260428040104.78524-1-carlos.bilbao@kernel.org
|
|
There's no point in retraining a failed 2.5GT/s device at 2.5GT/s, so just
don't and return early. While such devices might be unlikely to implement
Link Active reporting, we need to retrieve the maximum link speed and use
it in a conditional later on anyway, so the early check comes for free.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/alpine.DEB.2.21.2512080356070.49654@angie.orcam.me.uk
|
|
Rewrite a check for the maximum link speed in the Link Capabilities
register in terms of pcie_get_speed_cap(). No functional change.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://patch.msgid.link/alpine.DEB.2.21.2512080348310.49654@angie.orcam.me.uk
|
|
Discard Vendor:Device ID matching in the PCIe failed link retraining quirk
and ignore the link status for the removal of the 2.5GT/s speed clamp,
whether applied by the quirk itself or the firmware earlier on. Revert to
the original target link speed if this final link retraining has failed.
This is so that link training noise in hot-plug scenarios does not make a
link remain clamped to the 2.5GT/s speed where an event race has led the
quirk to apply the speed clamp for one device, only to leave it in place
for a subsequent device to be plugged in.
Refer to the Link Capabilities register directly for the maximum link speed
determination so as to streamline backporting.
Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Cc: stable@vger.kernel.org # v6.5+
Link: https://patch.msgid.link/alpine.DEB.2.21.2512080331530.49654@angie.orcam.me.uk
|
|
This file does not use the symbols from the legacy <linux/gpio.h> header,
so drop it.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260506084858.867884-5-andriy.shevchenko@linux.intel.com
|
|
This file does not use the symbols from the legacy <linux/gpio.h> header,
so drop it.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260506084858.867884-4-andriy.shevchenko@linux.intel.com
|
|
This file does not use the symbols from the legacy <linux/gpio.h> header,
so drop it.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260506084858.867884-3-andriy.shevchenko@linux.intel.com
|
|
The driver includes the legacy GPIO header <linux/gpio.h> but does not use
any symbols from it and actually wants <linux/gpio/consumer.h>, so fix this
up.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260506084858.867884-2-andriy.shevchenko@linux.intel.com
|
|
Add device IDs for the next generation of switchtec products.
No changes to the driver were required with the new version of the
hardware.
[logang: rewrote commit message]
Signed-off-by: Ben Reed <Ben.Reed@microchip.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260505161633.67454-1-logang@deltatee.com
|
|
Use FIELD_MODIFY() to remove open-coded bit manipulation. No functional
change intended.
Signed-off-by: Hans Zhang <18255117159@163.com>
[bhelgaas: squash together]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com> # pcie-nxp-s32g.c
Link: https://patch.msgid.link/20260430162420.42839-2-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-3-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-4-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-5-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-6-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-7-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-8-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-9-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-10-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-11-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-12-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-13-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-14-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-15-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-16-18255117159@163.com
Link: https://patch.msgid.link/20260430162420.42839-17-18255117159@163.com
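For readers unfamiliar with the pattern that the FIELD_MODIFY() conversion
above removes, here is a userspace sketch of the open-coded equivalent.
demo_field_modify() is a hypothetical stand-in: it returns the new value,
whereas the kernel macro updates a variable in place, and it assumes a
non-zero, contiguous mask.

```c
#include <stdint.h>

/*
 * Clear the bits selected by 'mask', then store 'val' at the position of
 * the mask's lowest set bit. Multiplying by the lowest set bit is the
 * same as shifting left by its bit position.
 */
static uint32_t demo_field_modify(uint32_t reg, uint32_t mask, uint32_t val)
{
	uint32_t lsb = mask & (~mask + 1u);	/* lowest set bit of mask */

	return (reg & ~mask) | ((val * lsb) & mask);
}
```

Writing this by hand at every call site is where mask and shift mismatches
tend to creep in, which is the motivation for a single helper.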
|
|
Some endpoint platforms cannot use platform MSI / GIC ITS to implement
EP-side doorbells. In those cases, EPF drivers cannot provide an
interrupt-driven doorbell and often fall back to polling.
Add an "embedded" doorbell backend that uses a controller-integrated
doorbell target (e.g. DesignWare integrated eDMA interrupt-emulation
doorbell).
The backend locates the doorbell register and a corresponding Linux IRQ
via the EPC aux-resource API. If the doorbell register is already
exposed via a fixed BAR mapping, provide BAR+offset. Otherwise provide
the DMA address returned by dma_map_resource() (which may be an IOVA
when an IOMMU is enabled) so EPF drivers can map it into BAR space.
When MSI doorbell allocation fails with -ENODEV,
pci_epf_alloc_doorbell() falls back to this embedded backend.
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260414141514.1341429-8-den@valinux.co.jp
|
|
pci-epf-test advertises the doorbell target to the RC as a BAR number
and an offset, and the RC rings the doorbell with a single DWORD MMIO
write.
Some doorbell backends may report that the doorbell target is already
exposed via a platform-owned fixed BAR (db_msg[0].bar/offset). In that
case, reuse the pre-exposed window and do not reprogram the BAR with
pci_epc_set_bar().
Also honor db_msg[0].irq_flags when requesting the doorbell IRQ, and
only restore the original BAR mapping on disable if pci-epf-test
programmed it.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
[bhelgaas: wrap comment]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260414141514.1341429-7-den@valinux.co.jp
|
|
Support doorbell backends where the doorbell target is already exposed
via a platform-owned fixed BAR mapping and/or where the doorbell IRQ
must be requested with specific flags.
When pci_epf_alloc_doorbell() provides db_msg[].bar/offset, reuse the
pre-exposed BAR window and skip programming a new inbound mapping. Also
honor db_msg[].irq_flags when requesting the doorbell IRQ.
Multiple doorbells may share the same Linux IRQ. Avoid duplicate
request_irq() calls by requesting each unique virq once.
Make pci-epf-vntb work with platform-defined or embedded doorbell
backends without exposing backend-specific details to the consumer
layer.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260414141514.1341429-6-den@valinux.co.jp
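The "request each unique virq once" rule from the pci-epf-vntb change above
can be sketched as follows. This is a userspace illustration, not the
driver's code: the helper names are hypothetical, and the counter stands in
for the point where the driver would actually request the IRQ.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if virq[idx] already appeared earlier in the array, i.e.
 * an earlier doorbell has already caused this IRQ to be requested.
 */
static bool virq_seen_before(const int *virq, size_t idx)
{
	for (size_t i = 0; i < idx; i++)
		if (virq[i] == virq[idx])
			return true;
	return false;
}

/* Counts how many IRQ requests would be made for 'n' doorbells. */
static size_t count_unique_virqs(const int *virq, size_t n)
{
	size_t requests = 0;

	for (size_t i = 0; i < n; i++)
		if (!virq_seen_before(virq, i))
			requests++;	/* the IRQ would be requested here */
	return requests;
}
```

The quadratic scan is a deliberate simplification; the number of doorbells
is assumed to be small.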
|
|
There are two ways to graft a resource into the resource tree in PCI:
pci_assign_resource() and pci_claim_resource(). Only the former logs
the action, which complicates troubleshooting when resources are
assigned by pci_claim_resource(), which mostly assigns addresses
inherited from the firmware.
Add logging to pci_claim_resource() to make troubleshooting easier.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260429122617.7324-2-ilpo.jarvinen@linux.intel.com
|
|
Prepare pci-ep-msi for non-MSI doorbell backends.
Factor MSI doorbell allocation into a helper and extend struct
pci_epf_doorbell_msg with:
- irq_flags: required IRQ request flags (e.g. IRQF_SHARED for some
backends)
- type: doorbell backend type
- bar/offset: pre-exposed doorbell target location, if any
Initialize these fields for the existing MSI-backed doorbell
implementation.
Also add PCI_EPF_DOORBELL_EMBEDDED type, which is to be implemented in a
follow-up patch.
No functional changes.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260414141514.1341429-5-den@valinux.co.jp
|
|
Implement the EPC aux-resource API for DesignWare endpoint controllers
with integrated eDMA.
Currently, only report an interrupt-emulation doorbell register
(PCI_EPC_AUX_DOORBELL_MMIO), including its Linux IRQ and the write data
needed to trigger the interrupt.
If the DMA controller MMIO window is already exposed via a
platform-owned fixed BAR subregion, also provide the BAR number and
offset so EPF drivers can reuse it without reprogramming the BAR.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260414141514.1341429-4-den@valinux.co.jp
|
|
Some DesignWare PCIe controllers integrate an eDMA block whose registers
are located in a dedicated register window.
The EP-side aux-resource code exposes an interrupt-emulation doorbell
register (DOORBELL_MMIO) from that window. Its location is derived from
the start of the eDMA register window plus the doorbell offset already
provided by dw-edma, and the window size is used to validate the
computed register location.
Record the physical base and size of the integrated eDMA register window
in struct dw_pcie so the EP-side DesignWare aux-resource provider can
construct that doorbell resource.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Tested-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260414141514.1341429-3-den@valinux.co.jp
|
|
Endpoint controller drivers may integrate auxiliary blocks (e.g. DMA
engines) whose register windows and descriptor memory metadata need to
be exposed to a remote peer. Endpoint function drivers need a generic
way to discover such resources without hard-coding controller-specific
helpers.
Add pci_epc_get_aux_resources_count() / pci_epc_get_aux_resources() and
the corresponding pci_epc_ops callbacks. The count helper returns the
number of available resources, while the get helper fills a
caller-provided array of resources described by type, physical address
and size, plus type-specific metadata.
Suggested-by: Manivannan Sadhasivam <mani@kernel.org>
Suggested-by: Frank Li <Frank.li@nxp.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260414141514.1341429-2-den@valinux.co.jp
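The count/get calling pattern described in the aux-resource commit above
can be illustrated with a userspace sketch. Everything below is
hypothetical: the demo_* names, the struct layout, the return conventions
and the fixed table are assumptions made for the example and do not
reproduce the signatures of pci_epc_get_aux_resources_count() or
pci_epc_get_aux_resources().

```c
#include <stddef.h>
#include <stdint.h>

enum demo_aux_type { DEMO_AUX_DOORBELL_MMIO, DEMO_AUX_OTHER };

struct demo_aux_resource {
	enum demo_aux_type type;
	uint64_t phys_addr;
	uint64_t size;
};

/* Stand-in provider with two fixed resources. */
static const struct demo_aux_resource demo_table[] = {
	{ DEMO_AUX_DOORBELL_MMIO, 0x1000, 0x4 },
	{ DEMO_AUX_OTHER,         0x2000, 0x100 },
};

/* Step 1: the caller asks how many resources exist. */
static int demo_get_aux_resources_count(void)
{
	return (int)(sizeof(demo_table) / sizeof(demo_table[0]));
}

/*
 * Step 2: the caller passes an array with room for 'n' entries.
 * Returns the number of entries written, or -1 if the array is too small.
 */
static int demo_get_aux_resources(struct demo_aux_resource *out, int n)
{
	int count = demo_get_aux_resources_count();

	if (n < count)
		return -1;
	for (int i = 0; i < count; i++)
		out[i] = demo_table[i];
	return count;
}
```

Splitting the query in two lets the caller own the allocation, so the
provider never has to hand out pointers into its own state.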
|
|
If a bus has hotplug slots that implement the slot's reset_slot callback,
it is not safe to do the non-slot-specific bus reset, so don't fall back to
it. If a slot reset does fail, the subsequent bus reset would attempt a
second link reset on top of the previous one and fail to handle the hotplug
events.
Fixes: 8238cb69c01fe ("PCI: Make reset_subordinate hotplug safe")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260421150644.3543733-1-kbusch@meta.com
|
|
Revert b0e85c3c8554 ("PCI: Add Kconfig options for MPS/MRRS strategy"),
which allowed build-time selection of the "off", "default", "safe",
"performance", or "peer2peer" strategies for MPS and MRRS configuration.
These strategies can be selected at boot-time using the
"pci=pcie_bus_tune_*" kernel parameters.
Per the discussion mentioned below, these Kconfig options were added to
work around a hardware defect in a WiFi device used in a cable modem. The
defect occurred only when the device was configured with MPS=128, and
Kconfig was a way to avoid that setting. It was easier for the modem
vendor to use Kconfig and update the kernel image than to change the kernel
parameters.
Neither Kconfig nor kernel parameters are a complete solution because the
broken WiFi device may be used in other systems where it may be configured
with MPS=128 and be susceptible to the defect.
Remove the Kconfig settings to simplify the MPS code. If we can identify
the WiFi device in question, we may be able to make a generic quirk to
avoid the problem on all systems.
This is not a fix and should not be backported to previous kernels.
Link: https://lore.kernel.org/all/CA+-6iNzd0RJO0L021qz8CKrSviSst6QehY-QtJxz_-EVY0Hj0Q@mail.gmail.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260326221311.1356180-1-bhelgaas@google.com
|
|
Extend the checks in pcim_p2pdma_init() and pcim_p2pdma_provider() to
exclude functions that have pdev->non_mappable_bars set.
Consumers such as VFIO were previously able to map these for access by the
CPU or P2P. Update the comment on non_mappable_bars to show it refers to
any access, not just userspace CPU access.
Fixes: 372d6d1b8ae3c ("PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation")
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Matt Evans <mattev@meta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Alex Williamson <alex@shazbot.org>
Link: https://patch.msgid.link/20260423173051.1999679-1-mattev@meta.com
|
|
pci_pwrctrl_is_required() detects whether a device needs PCI pwrctrl
support. It is currently used in pci_pwrctrl_create_device(), but not in
pci_pwrctrl_power_{on/off}_device() APIs. This leads to the pwrctrl core
trying to power on/off incompatible devices like USB hub downstream ports
defined in DT.
Add this check to prevent the pwrctrl core from poking at the wrong devices.
Fixes: b35cf3b6aa1e ("PCI/pwrctrl: Add APIs to power on/off pwrctrl devices")
Reported-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[bhelgaas: split pci_pwrctrl_is_required() move to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260421104102.12322-1-manivannan.sadhasivam@oss.qualcomm.com
|
|
Move pci_pwrctrl_is_required() earlier in the file so it can be used by
pci_pwrctrl_power_off_device() and pci_pwrctrl_power_on_device().
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[bhelgaas: split to its own patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20260421104102.12322-1-manivannan.sadhasivam@oss.qualcomm.com
|
|
sriov_restore_vf_rebar_state() uses the VF Resizable BAR Control register
to decide how many VF BARs to restore (nbars) and which VF BAR each
iteration addresses (bar_idx). bar_idx indexes into dev->sriov->barsz[],
which has only PCI_SRIOV_NUM_BARS (6) entries.
When a device does not respond, config reads typically return
PCI_ERROR_RESPONSE (~0). Both fields are 3 bits wide, so nbars and bar_idx
both evaluate to 7. The barsz[] access then goes out of bounds. UBSAN
reports this as:
UBSAN: array-index-out-of-bounds in drivers/pci/iov.c:948:51 index 7 is out of range for type 'resource_size_t [6]'
Observed on an NVIDIA RTX PRO 1000 GPU (GB207GLM) that stopped responding
during a failed GC6 power state exit. The subsequent pci_restore_state()
invoked sriov_restore_vf_rebar_state() while config reads returned
0xffffffff, triggering the splat.
Bail out if any VF Resizable BAR Control read returns PCI_ERROR_RESPONSE.
No further VF BARs are touched, which is safe because a config read that
returns PCI_ERROR_RESPONSE indicates the device is unreachable and
restoration is pointless. This mirrors the guard in
pci_restore_rebar_state().
Fixes: 5a8f77e24a30 ("PCI/IOV: Restore VF resizable BAR state after reset")
Signed-off-by: Marco Nenciarini <mnencia@kcore.it>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/44a4ae53ec2825816b816c85cd378430d9a95cc6.1776429882.git.mnencia@kcore.it
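The arithmetic behind the out-of-bounds index in the VF Resizable BAR fix
above can be checked with a short userspace sketch. The macro names and
helpers are illustrative stand-ins, and the bit positions (index in bits
2:0, BAR count in bits 7:5) are stated here as an assumption about the
register layout rather than copied from the patch.

```c
#include <stdint.h>

#define DEMO_ERROR_RESPONSE  0xffffffffu /* all-ones read, dead device */
#define DEMO_CTRL_BAR_IDX    0x00000007u /* 3-bit BAR index field */
#define DEMO_CTRL_NBAR_MASK  0x000000e0u /* 3-bit "number of BARs" field */
#define DEMO_CTRL_NBAR_SHIFT 5
#define DEMO_NUM_BARS        6           /* entries in the size array */

/* Decodes the BAR count without any guard. */
static unsigned int demo_decode_nbars(uint32_t ctrl)
{
	return (ctrl & DEMO_CTRL_NBAR_MASK) >> DEMO_CTRL_NBAR_SHIFT;
}

/*
 * Returns the BAR index decoded from 'ctrl', or -1 when the read looks
 * like an error response and restoration should stop.
 */
static int demo_decode_bar_idx(uint32_t ctrl)
{
	if (ctrl == DEMO_ERROR_RESPONSE)
		return -1;
	return (int)(ctrl & DEMO_CTRL_BAR_IDX);
}
```

An all-ones read decodes to 7 in both fields, which is past the last valid
index of a 6-entry array; the guard turns that into an early exit.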
|
|
pci_restore_rebar_state() uses the Resizable BAR Control register to decide
how many BARs to restore (nbars) and which BAR each iteration addresses
(bar_idx).
When a device does not respond, config reads typically return
PCI_ERROR_RESPONSE (~0). Both fields are 3 bits wide, so nbars and bar_idx
both evaluate to 7, past the spec's valid ranges for both fields.
pci_resource_n() then returns an unrelated resource slot, whose size is
used to derive a nonsensical value written back to the Resizable BAR
Control register.
Bail out if any Resizable BAR Control read returns PCI_ERROR_RESPONSE. No
further BARs are touched, which is safe because a config read that returns
PCI_ERROR_RESPONSE indicates the device is unreachable and restoration is
pointless.
Fixes: d3252ace0bc6 ("PCI: Restore resized BAR state on resume")
Signed-off-by: Marco Nenciarini <mnencia@kcore.it>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/666cac19b5daa0ab0e0ab64454e76b4d24465dbd.1776429882.git.mnencia@kcore.it
|
|
When a PCI device is unbound from its driver, pci_device_remove() sets the
cached power state in pci_dev->current_state to PCI_UNKNOWN. This was
introduced by commit 2449e06a5696 ("PCI: reset pci device state to unknown
state for resume") to invalidate the cached power state in case the system
is subsequently put to sleep.
For bound devices, the cached power state is set to PCI_UNKNOWN in
pci_pm_suspend_noirq(), immediately before entering system sleep.
Extend to unbound devices for consistency.
This obviates the need to change the cached power state on unbind, so stop
doing so.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/af7d11d3ceb231acc90829f7a5c8400c2446744f.1776415510.git.lukas@wunner.de
|
|
In C, bitfields are not necessarily safe to modify from multiple
threads without locking. Switch "of_node_reused" over to the "flags"
field so modifications are safe.
Cc: Johan Hovold <johan@kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Acked-by: Manivannan Sadhasivam <mani@kernel.org> # PCI_PWRCTRL
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20260406162231.v5.8.I806b8636cd3724f6cd1f5e199318ab8694472d90@changeid
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
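The difference between a bitfield member and a flags word, as raised in
the "of_node_reused" change above, can be sketched in userspace C. C11
atomics stand in for the kernel's own bit operations here, and the
struct, flag name and bit position are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real name and position are not given above. */
#define DEMO_FLAG_OF_NODE_REUSED (1UL << 0)

struct demo_dev {
	/*
	 * A bitfield such as "unsigned int reused:1" shares a storage unit
	 * with its neighbours, so a plain store can race with a concurrent
	 * update of an adjacent bit. Each bit below is changed with one
	 * atomic read-modify-write instead.
	 */
	atomic_ulong flags;
};

static void demo_set_flag(struct demo_dev *d, unsigned long bit)
{
	atomic_fetch_or(&d->flags, bit);
}

static bool demo_test_flag(struct demo_dev *d, unsigned long bit)
{
	return (atomic_load(&d->flags) & bit) != 0;
}
```

Setting one flag this way cannot lose a concurrent update to another flag
in the same word, which is the property a plain bitfield does not give.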
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Adjust build infrastructure for 32BIT/64BIT
- Add HIGHMEM (PKMAP and FIX_KMAP) support
- Show and handle CPU vulnerabilities correctly
- Batch the icache maintenance for jump_label
- Add more atomic instructions support for BPF JIT
- Add more features (e.g. fsession) support for BPF trampoline
- Some bug fixes and other small changes
* tag 'loongarch-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (21 commits)
selftests/bpf: Enable CAN_USE_LOAD_ACQ_STORE_REL for LoongArch
LoongArch: BPF: Add fsession support for trampolines
LoongArch: BPF: Introduce emit_store_stack_imm64() helper
LoongArch: BPF: Support up to 12 function arguments for trampoline
LoongArch: BPF: Support small struct arguments for trampoline
LoongArch: BPF: Open code and remove invoke_bpf_mod_ret()
LoongArch: BPF: Support load-acquire and store-release instructions
LoongArch: BPF: Support 8 and 16 bit read-modify-write instructions
LoongArch: BPF: Add the default case in emit_atomic() and rename it
LoongArch: Define instruction formats for AM{SWAP/ADD}.{B/H} and DBAR
LoongArch: Batch the icache maintenance for jump_label
LoongArch: Add flush_icache_all()/local_flush_icache_all()
LoongArch: Add spectre boundry for syscall dispatch table
LoongArch: Show CPU vulnerabilites correctly
LoongArch: Make arch_irq_work_has_interrupt() true only if IPI HW exist
LoongArch: Use get_random_canary() for stack canary init
LoongArch: Improve the logging of disabling KASLR
LoongArch: Align FPU register state to 32 bytes
LoongArch: Handle CONFIG_32BIT in syscall_get_arch()
LoongArch: Add HIGHMEM (PKMAP and FIX_KMAP) support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull PCMCIA updates from Dominik Brodowski:
"A number of minor PCMCIA bugfixes and cleanups, and a patch removing
obsolete host controller drivers"
* tag 'pcmcia-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
pcmcia: remove obsolete host controller drivers
pcmcia: Convert to use less arguments in pci_bus_for_each_resource()
PCMCIA: Fix garbled log messages for KERN_CONT
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for CONFIG_PAGE_TABLE_CHECK and enable it in
debug_defconfig. s390 can only tell user from kernel PTEs via the mm,
so mm_struct is now passed into pxx_user_accessible_page() callbacks
- Expose the PCI function UID as an arch-specific slot attribute in
sysfs so a function can be identified by its user-defined id while
still in standby. Introduces a generic ARCH_PCI_SLOT_GROUPS hook in
drivers/pci/slot.c
- Refresh s390 PCI documentation to reflect current behavior and cover
previously undocumented sysfs attributes
- zcrypt device driver cleanup series: consistent field types, clearer
variable naming, a kernel-doc warning fix, and a comment explaining
the intentional synchronize_rcu() in pkey_handler_register()
- Provide an s390 arch_raw_cpu_ptr() that avoids the detour via
get_lowcore() using alternatives, shrinking defconfig by ~27 kB
- Guard identity-base randomization with kaslr_enabled() so nokaslr
keeps the identity mapping at 0 even with RANDOMIZE_IDENTITY_BASE=y
- Build S390_MODULES_SANITY_TEST as a module only by requiring KUNIT &&
m, since built-in would not exercise module loading
- Remove the permanently commented-out HMCDRV_DEV_CLASS create_class()
code in the hmcdrv driver
- Drop stale ident_map_size extern conflicting with asm/page.h
* tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Fix warning about wrong kernel doc comment
PCI: s390: Expose the UID as an arch specific PCI slot attribute
docs: s390/pci: Improve and update PCI documentation
s390/pkey: Add comment about synchronize_rcu() to pkey base
s390/hmcdrv: Remove commented out code
s390/zcrypt: Slight rework on the agent_id field
s390/zcrypt: Explicitly use a card variable in _zcrypt_send_cprb
s390/zcrypt: Rework MKVP fields and handling
s390/zcrypt: Make apfs a real unsigned int field
s390/zcrypt: Rework domain processing within zcrypt device driver
s390/zcrypt: Move inline function rng_type6cprb_msgx from header to code
s390/percpu: Provide arch_raw_cpu_ptr()
s390: Enable page table check for debug_defconfig
s390/pgtable: Add s390 support for page table check
s390/pgtable: Use set_pmd_bit() to invalidate PMD entry
mm/page_table_check: Pass mm_struct to pxx_user_accessible_page()
s390/boot: Respect kaslr_enabled() for identity randomization
s390/Kconfig: Make modules sanity test a module-only option
s390/setup: Drop stale ident_map_size declaration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Wei Liu:
- Fix cross-compilation for hv tools (Aditya Garg)
- Fix vmemmap_shift exceeding MAX_FOLIO_ORDER in mshv_vtl (Naman Jain)
- Limit channel interrupt scan to relid high water mark (Michael
Kelley)
- Export hv_vmbus_exists() and use it in pci-hyperv (Dexuan Cui)
- Fix cleanup and shutdown issues for MSHV (Jork Loeser)
- Introduce more tracing support for MSHV (Stanislav Kinsburskii)
* tag 'hyperv-next-signed-20260421' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Skip LP/VP creation on kexec
x86/hyperv: move stimer cleanup to hv_machine_shutdown()
Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing
mshv: Add tracepoint for GPA intercept handling
mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER
tools: hv: Fix cross-compilation
Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hyperv
mshv: Introduce tracing support
Drivers: hv: vmbus: Limit channel interrupt scan to relid high water mark
|
|
Adjust build infrastructure (Kconfig, Makefile and ld scripts) to let
us enable both 32BIT/64BIT kernel build.
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Allow TLP Processing Hints to be enabled for RCiEPs (George Abraham
P)
- Enable AtomicOps only if we know the Root Port supports them (Gerd
Bayer)
- Don't enable AtomicOps for RCiEPs since none of them need
AtomicOps and we can't tell whether the Root Complex would support
them (Gerd Bayer)
- Leave Precision Time Measurement disabled until a driver enables it
to avoid PCIe errors (Mika Westerberg)
- Make pci_set_vga_state() fail if bridge doesn't support VGA
routing, i.e., PCI_BRIDGE_CTL_VGA is not writable, and return
errors to vga_get() callers including userspace via
/dev/vga_arbiter (Simon Richter)
- Validate max-link-speed from DT in j721e, brcmstb, mediatek-gen3,
rzg3s drivers (where the actual controller constraints are known),
and remove validation from the generic OF DT accessor (Hans Zhang)
- Remove pc110pad driver (no longer useful after 486 CPU support
removed) and no_pci_devices() (pc110pad was the last user) (Dmitry
Torokhov, Heiner Kallweit)
Resource management:
- Prevent assigning space to unimplemented bridge windows; previously
we mistakenly assumed prefetchable window existed and assigned
space and put a BAR there (Ahmed Naseef)
- Avoid shrinking bridge windows to fit in the initial Root Port
window; fixes one problem with devices with large BARs connected
via switches, e.g., Thunderbolt (Ilpo Järvinen)
- Pass full extent of empty space, not just the aligned space, to
resource_alignf callback so free space before the requested
alignment can be used (Ilpo Järvinen)
- Place small resources before larger ones for better utilization of
address space (Ilpo Järvinen)
- Fix alignment calculation for resource size larger than align,
e.g., bridge windows larger than the 1MB required alignment (Ilpo
Järvinen)
Reset:
- Update slot handling so all ARI functions are treated as being in
the same slot. They're all reset by Secondary Bus Reset, but
previously drivers of ARI functions that appeared to be on a
non-zero device weren't notified and fatal hardware errors could
result (Keith Busch)
- Make sysfs reset_subordinate hotplug safe to avoid spurious hotplug
events (Keith Busch)
- Hide Secondary Bus Reset ('bus') from sysfs reset_methods if masked
by CXL because it has no effect (Vidya Sagar)
- Avoid FLR for AMD NPU device, where it causes the device to hang
(Lizhi Hou)
Error handling:
- Clear only error bits in PCIe Device Status to avoid accidentally
clearing Emergency Power Reduction Detected (Shuai Xue)
- Check for AER errors even in devices without drivers (Lukas Wunner)
- Initialize ratelimit info so DPC and EDR paths log AER error
information (Kuppuswamy Sathyanarayanan)
Power control:
- Add UPD720201/UPD720202 USB 3.0 xHCI Host Controller .compatible so
generic pwrctrl driver can control it (Neil Armstrong)
Hotplug:
- Set LED_HW_PLUGGABLE for NPEM hotplug-capable ports so LED core
doesn't complain when setting brightness fails because the endpoint
is gone (Richard Cheng)
Peer-to-peer DMA:
- Allow wildcards in list of host bridges that support peer-to-peer
DMA between hierarchy domains and add all Google SoCs (Jacob
Moroni)
Endpoint framework:
- Advertise dynamic inbound mapping support in pci-epf-test and
update host pci_endpoint_test to skip doorbell testing if not
advertised by endpoint (Koichiro Den)
- Return 0, not remaining timeout, when MHI eDMA ops complete so
mhi_ep_ring_add_element() doesn't interpret non-zero as failure
(Daniel Hodges)
- Remove vntb and ntb duplicate resource teardown that leads to oops
when .allow_link() fails or .drop_link() is called (Koichiro Den)
- Disable vntb delayed work before clearing BAR mappings and
doorbells to avoid oops caused by doing the work after resources
have been torn down (Koichiro Den)
- Add a way to describe reserved subregions within BARs, e.g.,
platform-owned fixed register windows, and use it for the RK3588
BAR4 DMA ctrl window (Koichiro Den)
- Add BAR_DISABLED for BARs that will never be available to an EPF
driver, and change some BAR_RESERVED annotations to BAR_DISABLED
(Niklas Cassel)
- Add NTB .get_dma_dev() callback for cases where DMA API requires a
different device, e.g., vNTB devices (Koichiro Den)
- Add reserved region types for MSI-X Table and PBA so Endpoint
controllers can describe them as hardware-owned regions in a
BAR_RESERVED BAR (Manikanta Maddireddy)
- Make Tegra194/234 BAR0 programmable and remove 1MB size limit
(Manikanta Maddireddy)
- Expose Tegra BAR2 (MSI-X) and BAR4 (DMA) as 64-bit BAR_RESERVED
(Manikanta Maddireddy)
- Add Tegra194 and Tegra234 device table entries to pci_endpoint_test
(Manikanta Maddireddy)
- Skip the BAR subrange selftest if there are not enough inbound
window resources to run the test (Christian Bruel)
New native PCIe controller drivers:
- Add DT binding and driver for Andes QiLai SoC PCIe host controller
(Randolph Lin)
- Add DT binding and driver for ESWIN PCIe Root Complex (Senchuan
Zhang)
Baikal T-1 PCIe controller driver:
- Remove driver since it never quite became usable (Andy Shevchenko)
Cadence PCIe controller driver:
- Implement byte/word config reads with dword (32-bit) reads because
some Cadence controllers don't support sub-dword accesses (Aksh
Garg)
CIX Sky1 PCIe controller driver:
- Add 'power-domains' to DT binding for SCMI power domain (Gary Yang)
Freescale i.MX6 PCIe controller driver:
- Add i.MX94 and i.MX943 to fsl,imx6q-pcie-ep DT binding (Richard
Zhu)
- Delay instead of polling for L2/L3 Ready after PME_Turn_off when
suspending i.MX6SX because LTSSM registers are inaccessible
(Richard Zhu)
- Separate PERST# assertion (for resetting endpoints) from core reset
(for resetting the RC itself) to prepare for new DTs with PERST#
GPIO in per-Root Port nodes (Sherry Sun)
- Retain Root Port MSI capability on i.MX7D, i.MX8MM, and i.MX8MQ so
MSI from downstream devices will work (Richard Zhu)
- Fix i.MX95 reference clock source selection when internal refclk is
used (Franz Schnyder)
Freescale Layerscape PCIe controller driver:
- Allow building as a removable module (Sascha Hauer)
MediaTek PCIe Gen3 controller driver:
- Use dev_err_probe() to simplify error paths and make deferred probe
messages visible in /sys/kernel/debug/devices_deferred (Chen-Yu
Tsai)
- Power off device if setup fails (Chen-Yu Tsai)
- Integrate new pwrctrl API to enable power control for WiFi/BT
adapters on mainboard or in PCIe or M.2 slots (Chen-Yu Tsai)
NVIDIA Tegra194 PCIe controller driver:
- Poll less aggressively and non-atomically for PME_TO_Ack during
transition to L2 (Vidya Sagar)
- Disable LTSSM after transition to Detect on surprise link down to
stop toggling between Polling and Detect (Manikanta Maddireddy)
- Don't force the device into the D0 state before L2 when suspending
or shutting down the controller (Vidya Sagar)
- Disable PERST# IRQ only in Endpoint mode because it's not
registered in Root Port mode (Manikanta Maddireddy)
- Handle 'nvidia,refclk-select' as optional (Vidya Sagar)
- Disable direct speed change in Endpoint mode so link speed change
is controlled by the host (Vidya Sagar)
- Set LTR values before link up to avoid bogus LTR messages with 0
latency (Vidya Sagar)
- Allow system suspend when the Endpoint link is down (Vidya Sagar)
- Use DWC IP core version, not Tegra custom values, to avoid DWC core
version check warnings (Manikanta Maddireddy)
- Apply ECRC workaround to devices based on DesignWare 5.00a as well
as 4.90a (Manikanta Maddireddy)
- Disable PM Substate L1.2 in Endpoint mode to work around Tegra234
erratum (Vidya Sagar)
- Delay post-PERST# cleanup until core is powered on to avoid CBB
timeout (Manikanta Maddireddy)
- Assert CLKREQ# so switches that forward it to their downstream side
can bring up those links successfully (Vidya Sagar)
- Calibrate pipe to UPHY for Endpoint mode to reset stale PLL state
from any previous bad link state (Vidya Sagar)
- Remove IRQF_ONESHOT flag from Endpoint interrupt registration so
DMA driver and Endpoint controller driver can share the interrupt
line (Vidya Sagar)
- Enable DMA interrupt to support DMA in both Root Port and Endpoint
modes (Vidya Sagar)
- Enable hardware link retraining after link goes down in Endpoint
mode (Vidya Sagar)
- Add DT binding and driver support for core clock monitoring (Vidya
Sagar)
Qualcomm PCIe controller driver:
- Advertise 'Hot-Plug Capable' and set 'No Command Completed Support'
since Qcom Root Ports support hotplug events like DL_Up/Down and
can accept writes to Slot Control without delays between writes
(Krishna Chaitanya Chundru)
Renesas R-Car PCIe controller driver:
- Mark Endpoint BAR0 and BAR2 as Resizable (Koichiro Den)
- Reduce EPC BAR alignment requirement to 4K (Koichiro Den)
Renesas RZ/G3S PCIe controller driver:
- Add RZ/G3E to DT binding and to driver (John Madieu)
- Assert (not deassert) resets in probe error path (John Madieu)
- Assert resets in suspend path in reverse order they were deasserted
during probe (John Madieu)
- Rework inbound window algorithm to prevent mapping more than
intended region and enforce alignment on size, to prepare for
RZ/G3E support (John Madieu)
Rockchip DesignWare PCIe controller driver:
- Add tracepoints for PCIe controller LTSSM transitions and link rate
changes (Shawn Lin)
- Trace LTSSM events collected by the dw-rockchip debug FIFO (Shawn
Lin)
SOPHGO PCIe controller driver:
- Disable ASPM L0s and L1 on Sophgo 2042 PCIe Root Ports that
advertise support for them (Yao Zi)
Synopsys DesignWare PCIe controller driver:
- Continue with system suspend even if an Endpoint doesn't respond
with PME_TO_Ack message (Manivannan Sadhasivam)
- Set Endpoint MSI-X Table Size in the correct function of a
multi-function device when configuring MSI-X, not in Function 0
(Aksh Garg)
- Set Max Link Width and Max Link Speed for all functions of a
multi-function device, not just Function 0 (Aksh Garg)
- Expose PCIe event counters in groups 5-7 in debugfs (Hans Zhang)
Miscellaneous:
- Warn only once about invalid ACS kernel parameter format (Richard
Cheng)
- Suppress FW_BUG warning when writing sysfs 'numa_node' with the
current value (Li RongQing)
- Drop redundant 'depends on PCI' from Kconfig (Julian Braha)"
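The "Error handling" item about clearing only error bits in PCIe Device Status can be illustrated with a small user-space model. This is a sketch, not the kernel change: the DEVSTA_* names, the simulated register, and both helper functions are hypothetical, and the bit positions are assumed to follow the PCIe Device Status layout (four write-1-to-clear error bits in bits 0-3, Emergency Power Reduction Detected in bit 6).

```c
#include <stdint.h>

/* Local, illustrative bit definitions (assumed to mirror the PCIe layout). */
#define DEVSTA_CED   0x0001u /* Correctable Error Detected */
#define DEVSTA_NFED  0x0002u /* Non-Fatal Error Detected */
#define DEVSTA_FED   0x0004u /* Fatal Error Detected */
#define DEVSTA_URD   0x0008u /* Unsupported Request Detected */
#define DEVSTA_EPRD  0x0040u /* Emergency Power Reduction Detected */

#define DEVSTA_ERR_MASK (DEVSTA_CED | DEVSTA_NFED | DEVSTA_FED | DEVSTA_URD)
#define DEVSTA_RW1C     (DEVSTA_ERR_MASK | DEVSTA_EPRD)

/* Simulated register: writing 1 to a write-1-to-clear bit clears it. */
static uint16_t devsta;

static uint16_t devsta_read(void)
{
	return devsta;
}

static void devsta_write(uint16_t val)
{
	devsta &= (uint16_t)~(val & DEVSTA_RW1C);
}

/* Naive clear: writing back everything that was read also clears EPRD. */
static void clear_status_naive(void)
{
	devsta_write(devsta_read());
}

/* Masked clear: only the four error bits are written back as 1. */
static void clear_status_errors_only(void)
{
	devsta_write((uint16_t)(devsta_read() & DEVSTA_ERR_MASK));
}
```

With Fatal Error Detected and Emergency Power Reduction Detected both set, the naive helper leaves the register at zero, while the masked helper leaves only the Emergency Power Reduction Detected bit set.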
* tag 'pci-v7.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (165 commits)
PCI/P2PDMA: Add Google SoCs to the P2P DMA host bridge list
PCI/P2PDMA: Allow wildcard Device IDs in host bridge list
PCI: sg2042: Avoid L0s and L1 on Sophgo 2042 PCIe Root Ports
PCI: cadence: Add flags for disabling ASPM capability for broken Root Ports
PCI: tegra194: Add core monitor clock support
dt-bindings: PCI: tegra194: Add monitor clock support
PCI: tegra194: Enable hardware hot reset mode in Endpoint mode
PCI: tegra194: Enable DMA interrupt
PCI: tegra194: Remove IRQF_ONESHOT flag during Endpoint interrupt registration
PCI: tegra194: Calibrate pipe to UPHY for Endpoint mode
PCI: tegra194: Assert CLKREQ# explicitly by default
PCI: tegra194: Fix CBB timeout caused by DBI access before core power-on
PCI: tegra194: Disable L1.2 capability of Tegra234 EP
PCI: dwc: Apply ECRC workaround to DesignWare 5.00a as well
PCI: tegra194: Use DWC IP core version
PCI: tegra194: Free up Endpoint resources during remove()
PCI: tegra194: Allow system suspend when the Endpoint link is not up
PCI: tegra194: Set LTR message request before PCIe link up in Endpoint mode
PCI: tegra194: Disable direct speed change for Endpoint mode
PCI: tegra194: Use devm_gpiod_get_optional() to parse "nvidia,refclk-select"
...
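The Cadence item above (byte/word config reads implemented with 32-bit reads) relies on a common shift-and-mask technique. The following is a minimal user-space sketch of that technique, not the driver code: cfg_space, read_dword() and read_config() are hypothetical names, and naturally aligned accesses in a little-endian config space are assumed.

```c
#include <stdint.h>

/* Hypothetical backing store standing in for a device's config space. */
static const uint32_t cfg_space[4] = {
	0x12345678u, 0x9abcdef0u, 0x0badcafeu, 0xdeadbeefu,
};

/* The only access this model supports: aligned 32-bit reads. */
static uint32_t read_dword(unsigned int where)
{
	return cfg_space[(where & ~3u) / 4];
}

/*
 * Read 1, 2 or 4 bytes at offset 'where' using only dword reads.
 * Assumes natural alignment, so a 2-byte read never straddles a dword.
 */
static uint32_t read_config(unsigned int where, unsigned int size)
{
	uint32_t val = read_dword(where);

	if (size >= 4)
		return val;

	/* Shift the wanted bytes down to bit 0, then mask off the rest. */
	val >>= 8 * (where & 3u);
	return val & ((1u << (8 * size)) - 1u);
}
```

For example, read_config(2, 2) reads the dword at offset 0 and returns its upper 16 bits. Reads are safe to widen this way; writes would need a read-modify-write sequence and extra care with write-1-to-clear bits, which this sketch does not cover.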
|
|
With commit f84b21da3624 ("PCI: hv: Don't load the driver for baremetal root partition"),
the bare metal Linux root partition won't use the pci-hyperv driver, but
when a Linux VM runs on the Linux root partition, pci-hyperv's module_init
function init_hv_pci_drv() can still run, e.g. in the case of
CONFIG_PCI_HYPERV=y, even if the VMBus driver is not used in such a VM
(i.e. the hv_vmbus driver's init function returns -ENODEV due to
vmbus_root_device being NULL).
In such a Linux VM, init_hv_pci_drv() runs with a side effect: the 3
hvpci_block_ops callbacks are set to functions that depend on hv_vmbus.
Later, when the MLX driver in such a VM invokes the callbacks, e.g. in
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c:
mlx5_hv_register_invalidate(), hvpci_block_ops.reg_blk_invalidate() is
hv_register_block_invalidate() rather than a NULL function pointer, and
hv_register_block_invalidate() assumes that it can find a struct
hv_pcibus_device from pdev->bus->sysdata, which is false in such a VM.
Consequently, hv_register_block_invalidate() -> get_pcichild_wslot() ->
spin_lock_irqsave() may hang since it can be accessing an invalid
spinlock pointer.
Fix the issue by exporting hv_vmbus_exists() and using it in pci-hyperv:
hv_root_partition() is true and hv_nested is false ==>
hv_vmbus_exists() is false.
hv_root_partition() is true and hv_nested is true ==>
hv_vmbus_exists() is true.
hv_root_partition() is false ==> hv_vmbus_exists() is true.
While at it, rename vmbus_exists() to hv_vmbus_exists() to follow the
convention that all public functions have the hv_ prefix; also change
the return value's type from int to bool to make the code more readable;
also move the two pr_info() calls.
Reported-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
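The guard described in the commit message above can be sketched in user-space C. This is only a model under stated assumptions: struct block_ops, vmbus_exists_model(), driver_init_model() and consumer_register() are hypothetical names, the predicate encodes nothing more than the three cases listed in the message (it is not how the kernel's hv_vmbus_exists() is implemented), and the consumer's NULL check is an assumed convention.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a callback table such as hvpci_block_ops. */
struct block_ops {
	int (*reg_blk_invalidate)(void *ctx);
};

static struct block_ops ops; /* zero-initialized: the callback starts NULL */

static int real_invalidate(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Encodes only the three cases listed in the commit message. */
static bool vmbus_exists_model(bool root_partition, bool nested)
{
	return !root_partition || nested;
}

/* Init bails out before publishing callbacks when there is no VMBus. */
static int driver_init_model(bool root_partition, bool nested)
{
	if (!vmbus_exists_model(root_partition, nested))
		return -ENODEV;

	ops.reg_blk_invalidate = real_invalidate;
	return 0;
}

/* Assumed convention: a consumer treats a NULL callback as unsupported. */
static int consumer_register(void *ctx)
{
	if (!ops.reg_blk_invalidate)
		return -EOPNOTSUPP;

	return ops.reg_blk_invalidate(ctx);
}
```

The point of the early return is that the callback table is never populated with functions that depend on a bus that is not there, so a later caller sees a NULL pointer instead of reaching code that dereferences invalid per-bus data.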
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI support updates from Rafael Wysocki:
"These include an update of the CMOS RTC driver and the related ACPI
and x86 code that, among other things, switches it over to using the
platform device interface for device binding on x86 instead of the PNP
device driver interface (which allows the code in question to be
simplified quite a bit), a major update of the ACPI Time and Alarm
Device (TAD) driver adding an RTC class device interface to it, and
updates of core ACPI drivers that remove some unnecessary and not
really useful code from them.
Apart from that, two drivers are converted to using the platform
driver interface for device binding instead of the ACPI driver one,
which is slated for removal, support for the Performance Limited
register is added to the ACPI CPPC library and there are some
janitorial updates of it and the related cpufreq CPPC driver, the ACPI
processor driver is fixed and cleaned up, and NVIDIA vendor CPER
record handler is added to the APEI GHES code.
Also, the interface for obtaining a CPU UID from ACPI is consolidated
across architectures and used for fixing a problem with the PCI TPH
Steering Tag on ARM64, there are two updates related to ACPICA, a
minor ACPI OS Services Layer (OSL) update, and a few assorted updates
related to ACPI tables parsing.
Specifics:
- Update maintainers information regarding ACPICA (Rafael Wysocki)
- Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy()
(Kees Cook)
- Trigger an ordered system power off after encountering a fatal
error operator in AML (Armin Wolf)
- Enable ACPI FPDT parsing on LoongArch (Xi Ruoyao)
- Remove the temporary stop-gap acpi_pptt_cache_v1_full structure
from the ACPI PPTT parser (Ben Horgan)
- Add support for exposing ACPI FPDT subtables FBPT and S3PT (Nate
DeSimone)
- Address multiple assorted issues and clean up the code in the ACPI
processor idle driver (Huisong Li)
- Replace strlcat() in the ACPI processor idle driver with a better
alternative (Andy Shevchenko)
- Rearrange and clean up acpi_processor_errata_piix4() (Rafael
Wysocki)
- Move reference performance to capabilities and fix an uninitialized
variable in the ACPI CPPC library (Pengjie Zhang)
- Add support for the Performance Limited Register to the ACPI CPPC
library (Sumit Gupta)
- Add cppc_get_perf() API to read performance controls, extend
cppc_set_epp_perf() for FFH/SystemMemory, and make the ACPI CPPC
library warn on missing mandatory DESIRED_PERF register (Sumit
Gupta)
- Modify the cpufreq CPPC driver to update MIN_PERF/MAX_PERF in
target callbacks to allow it to control performance bounds via
standard scaling_min_freq and scaling_max_freq sysfs attributes and
add sysfs documentation for the Performance Limited Register to it
(Sumit Gupta)
- Add ACPI support to the platform device interface in the CMOS RTC
driver, make the ACPI core device enumeration code create a
platform device for the CMOS RTC, and drop CMOS RTC PNP device
support (Rafael Wysocki)
- Consolidate the x86-specific CMOS RTC handling with the ACPI TAD
driver and clean up the CMOS RTC ACPI address space handler (Rafael
Wysocki)
- Enable ACPI alarm in the CMOS RTC driver if advertised in ACPI FADT
and allow that driver to work without a dedicated IRQ if the ACPI
alarm is used (Rafael Wysocki)
- Clean up the ACPI TAD driver in various ways and add an RTC class
device interface, including both the RTC setting/reading and alarm
timer support, to it (Rafael Wysocki)
- Clean up the ACPI AC and ACPI PAD (processor aggregator device)
drivers (Rafael Wysocki)
- Rework checking for duplicate video bus devices and consolidate
pnp.bus_id workarounds handling in the ACPI video bus driver
(Rafael Wysocki)
- Update the ACPI core device drivers to stop setting
acpi_device_name() unnecessarily (Rafael Wysocki)
- Rearrange code using acpi_device_class() in the ACPI core device
drivers and update them to stop setting acpi_device_class()
unnecessarily (Rafael Wysocki)
- Define ACPI_AC_CLASS in one place (Rafael Wysocki)
- Convert the ni903x_wdt watchdog driver and the xen ACPI PAD driver
to bind to platform devices instead of ACPI devices (Rafael
Wysocki)
- Add devm_ghes_register_vendor_record_notifier(), use it in the PCI
hisi driver, and add NVIDIA vendor CPER record handler (Kai-Heng
Feng)
- Consolidate the interface for obtaining a CPU UID from ACPI across
architectures and use it to address incorrect PCI TPH Steering Tag
on ARM64 resulting from the invalid assumption that the ACPI
Processor UID would always be the same as the corresponding logical
CPU ID in Linux (Chengwen Feng)"
* tag 'acpi-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (73 commits)
ACPICA: Update maintainers information
watchdog: ni903x_wdt: Convert to a platform driver
ACPI: PAD: xen: Convert to a platform driver
ACPI: processor: idle: Reset cpuidle on C-state list changes
cpuidle: Extract and export no-lock variants of cpuidle_unregister_device()
PCI/TPH: Pass ACPI Processor UID to Cache Locality _DSM
ACPI: PPTT: Use acpi_get_cpu_uid() and remove get_acpi_id_for_cpu()
perf: arm_cspmu: Switch to acpi_get_cpu_uid() from get_acpi_id_for_cpu()
ACPI: Centralize acpi_get_cpu_uid() declaration in include/linux/acpi.h
x86/acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
RISC-V: ACPI: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
LoongArch: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
arm64: acpi: Add acpi_get_cpu_uid() for unified ACPI CPU UID retrieval
ACPI: APEI: GHES: Add NVIDIA vendor CPER record handler
PCI: hisi: Use devm_ghes_register_vendor_record_notifier()
ACPI: APEI: GHES: Add devm_ghes_register_vendor_record_notifier()
ACPI: tables: Enable FPDT on LoongArch
ACPI: processor: idle: Fix NULL pointer dereference in hotplug path
ACPI: processor: idle: Reset power_setup_done flag on initialization failure
ACPI: TAD: Add alarm support to the RTC class device interface
...
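The last item of the pull request above concerns the invalid assumption that an ACPI Processor UID always equals the logical CPU ID. A small user-space model shows why the two must be kept apart. Everything here is hypothetical: the table contents, model_get_cpu_uid(), and the two firmware_arg_*() helpers are illustrative names and do not reflect the signature of the kernel's acpi_get_cpu_uid().

```c
#include <stdint.h>

#define MODEL_NR_CPUS 4u

/* Hypothetical firmware data: ACPI Processor UID for each logical CPU. */
static const uint32_t model_cpu_uid[MODEL_NR_CPUS] = {
	0x10, 0x11, 0x20, 0x21,
};

/* Look up the UID for a logical CPU; returns 0 on success, -1 on error. */
static int model_get_cpu_uid(unsigned int cpu, uint32_t *uid)
{
	if (cpu >= MODEL_NR_CPUS || !uid)
		return -1;

	*uid = model_cpu_uid[cpu];
	return 0;
}

/* Mistaken variant: passes the logical CPU number as if it were the UID. */
static uint32_t firmware_arg_buggy(unsigned int cpu)
{
	return cpu;
}

/* Corrected variant: translates the logical CPU number to its UID first. */
static int firmware_arg_fixed(unsigned int cpu, uint32_t *arg)
{
	return model_get_cpu_uid(cpu, arg);
}
```

In this model, logical CPU 2 has UID 0x20, so the mistaken variant would hand firmware the value 2, which names no processor in the table at all.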
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"debugfs:
- Fix NULL pointer dereference in debugfs_create_str()
- Fix misplaced EXPORT_SYMBOL_GPL for debugfs_create_str()
- Fix soundwire debugfs NULL pointer dereference from uninitialized
firmware_file
device property:
- Make fwnode flags modifications thread safe; widen the field to
unsigned long and use set_bit() / clear_bit() based accessors
- Document how to check for the property presence
devres:
- Separate struct devres_node from its "subclasses" (struct devres,
struct devres_group); give struct devres_node its own release and
free callbacks for per-type dispatch
- Introduce struct devres_action for devres actions, avoiding the
ARCH_DMA_MINALIGN alignment overhead of struct devres
- Export struct devres_node and its init/add/remove/dbginfo
primitives for use by Rust Devres<T>
- Fix missing node debug info in devm_krealloc()
- Use guard(spinlock_irqsave) where applicable; consolidate unlock
paths in devres_release_group()
driver_override:
- Convert PCI, WMI, vdpa, s390/cio, s390/ap, and fsl-mc to the
generic driver_override infrastructure, replacing per-bus
driver_override strings, sysfs attributes, and match logic; fixes a
potential UAF from unsynchronized access to driver_override in bus
match() callbacks
- Simplify __device_set_driver_override() logic
kernfs:
- Send IN_DELETE_SELF and IN_IGNORED inotify events on kernfs file
and directory removal
- Add corresponding selftests for memcg
platform:
- Allow attaching software nodes when creating platform devices via a
new 'swnode' field in struct platform_device_info
- Add kerneldoc for struct platform_device_info
software node:
- Move software node initialization from postcore_initcall() to
driver_init(), making it available early in the boot process
- Move kernel_kobj initialization (ksysfs_init) earlier to support
the above
- Remove software_node_exit(); dead code in a built-in unit
SoC:
- Introduce of_machine_read_compatible() and of_machine_read_model()
OF helpers and export soc_attr_read_machine() to replace direct
accesses to of_root from SoC drivers; also enables
CONFIG_COMPILE_TEST coverage for these drivers
sysfs:
- Constify attribute group array pointers to
'const struct attribute_group *const *' in sysfs functions,
device_add_groups() / device_remove_groups(), and struct class
Rust:
- Devres:
- Embed struct devres_node directly in Devres<T> instead of going
through devm_add_action(), avoiding the extra allocation and the
unnecessary ARCH_DMA_MINALIGN alignment
- I/O:
- Turn IoCapable from a marker trait into a functional trait
carrying the raw I/O accessor implementation (io_read /
io_write), providing working defaults for the per-type Io
methods
- Add RelaxedMmio wrapper type, making relaxed accessors usable in
code generic over the Io trait
- Remove overloaded per-type Io methods and per-backend macros
from Mmio and PCI ConfigSpace
- I/O (Register):
- Add IoLoc trait and generic read/write/update methods to the Io
trait, making I/O operations parameterizable by typed locations
- Add register! macro for defining hardware register types with
typed bitfield accessors backed by Bounded values; supports
direct, relative, and array register addressing
- Add write_reg() / try_write_reg() and LocatedRegister trait
- Update PCI sample driver to demonstrate the register! macro
Example:
```
register! {
/// UART control register.
CTRL(u32) @ 0x18 {
/// Receiver enable.
19:19 rx_enable => bool;
/// Parity configuration.
14:13 parity ?=> Parity;
}
/// FIFO watermark and counter register.
WATER(u32) @ 0x2c {
/// Number of datawords in the receive FIFO.
26:24 rx_count;
/// RX interrupt threshold.
17:16 rx_water;
}
}
impl WATER {
fn rx_above_watermark(&self) -> bool {
self.rx_count() > self.rx_water()
}
}
fn init(bar: &pci::Bar<BAR0_SIZE>) {
let water = WATER::zeroed()
.with_const_rx_water::<1>(); // > 3 would not compile
bar.write_reg(water);
let ctrl = CTRL::zeroed()
.with_parity(Parity::Even)
.with_rx_enable(true);
bar.write_reg(ctrl);
}
fn handle_rx(bar: &pci::Bar<BAR0_SIZE>) {
if bar.read(WATER).rx_above_watermark() {
// drain the FIFO
}
}
fn set_parity(bar: &pci::Bar<BAR0_SIZE>, parity: Parity) {
bar.update(CTRL, |r| r.with_parity(parity));
}
```
- IRQ:
- Move 'static bounds from where clauses to trait declarations for
IRQ handler traits
- Misc:
- Enable the generic_arg_infer Rust feature
- Extend Bounded with shift operations, single-bit bool
conversion, and const get()
Misc:
- Make deferred_probe_timeout default a Kconfig option
- Drop auxiliary_dev_pm_ops; the PM core falls back to driver PM
callbacks when no bus type PM ops are set
- Add conditional guard support for device_lock()
- Add ksysfs.c to the DRIVER CORE MAINTAINERS entry
- Fix kernel-doc warnings in base.h
- Fix stale reference to memory_block_add_nid() in documentation"
* tag 'driver-core-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (67 commits)
bus: fsl-mc: use generic driver_override infrastructure
s390/ap: use generic driver_override infrastructure
s390/cio: use generic driver_override infrastructure
vdpa: use generic driver_override infrastructure
platform/wmi: use generic driver_override infrastructure
PCI: use generic driver_override infrastructure
driver core: make software nodes available earlier
software node: remove software_node_exit()
kernel: ksysfs: initialize kernel_kobj earlier
MAINTAINERS: add ksysfs.c to the DRIVER CORE entry
drivers/base/memory: fix stale reference to memory_block_add_nid()
device property: Document how to check for the property presence
soundwire: debugfs: initialize firmware_file to empty string
debugfs: fix placement of EXPORT_SYMBOL_GPL for debugfs_create_str()
debugfs: check for NULL pointer in debugfs_create_str()
driver core: Make deferred_probe_timeout default a Kconfig option
driver core: simplify __device_set_driver_override() clearing logic
driver core: auxiliary bus: Drop auxiliary_dev_pm_ops
device property: Make modifications of fwnode "flags" thread safe
rust: devres: embed struct devres_node directly
...
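The "device property" item above makes flag modifications thread safe by widening the field to unsigned long and using set_bit() / clear_bit() based accessors. A rough user-space analogue with C11 atomics is sketched below. It is not the kernel code: struct node_model and the flag_*() helpers are hypothetical, and atomic_fetch_or() / atomic_fetch_and() stand in for the kernel bit operations.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical node with a flags word that several threads may update. */
struct node_model {
	atomic_ulong flags;
};

/* Atomically set one flag bit without disturbing the others. */
static void flag_set(struct node_model *n, unsigned int bit)
{
	atomic_fetch_or(&n->flags, 1UL << bit);
}

/* Atomically clear one flag bit without disturbing the others. */
static void flag_clear(struct node_model *n, unsigned int bit)
{
	atomic_fetch_and(&n->flags, ~(1UL << bit));
}

/* Test a single flag bit. */
static bool flag_test(struct node_model *n, unsigned int bit)
{
	return (atomic_load(&n->flags) & (1UL << bit)) != 0;
}
```

A plain "flags |= bit" is a separate load, OR and store, so two threads updating different bits can overwrite each other's change. The atomic read-modify-write operations remove that window.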
|