| Age | Commit message (Collapse) | Author | Files | Lines |
|
The default memory profile configures rxdma_monitor_dst_ring_size as 8092,
which is a typo. The intended value is 8192, consistent with all other ring
sizes in the table being powers of two.
Correct the monitor destination ring size to 8192.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Fixes: defae535dd63 ("wifi: ath12k: Add a table of parameters entries impacting memory consumption")
Signed-off-by: Aaradhana Sahu <aaradhana.sahu@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260616062342.4079796-1-aaradhana.sahu@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
For WCN7850, MAC buffer ring size is updated to 2048 in
955df16f2a4c3 ("wifi: ath12k: change MAC buffer ring size to 2048")
to increase peak throughput.
But during the RX process, a phenomenon can still be observed where
the throughput drops by about 30% from its peak value and then recovers,
and this behavior repeats during RX.
After increasing MAC buffer ring size to 4096, the data rate drop has
gone.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
Signed-off-by: Yingying Tang <yingying.tang@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260610031358.2043716-1-yingying.tang@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Do not populate peer and link_id in ieee80211_rx_status for monitor
MSDUs.
The monitor RX path is handled differently in mac80211 when
RX_FLAG_ONLY_MONITOR is set, and does not consume peer/link metadata.
As such, looking up the peer and updating link_id here is unnecessary.
Additionally, this metadata is not required for monitor mode delivery,
and performing the lookup/update introduces redundant work and the
potential for inconsistent rx_status state if multiple paths modify it.
Hence, remove the peer lookup and link_id update from the monitor MSDU
delivery path.
This also removes the per-MSDU debug logging in the monitor path,
slightly reducing debuggability, but avoids unnecessary overhead in the
monitor RX path.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Signed-off-by: Sushant Butta <sushant.butta@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260609064856.547032-3-sushant.butta@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
monitor mode
Monitor mode delivers raw 802.11 frames, not 802.3/Ethernet frames. Setting
RX_FLAG_8023 for monitor RX is incorrect and can break userspace capture and
analysis. Do not update this flag in the monitor path to ensure correct
handling of captured frames.
In the monitor path, RX_FLAG_ONLY_MONITOR is always set before decap
is evaluated, which forces decap to remain DP_RX_DECAP_TYPE_RAW.
As a result, the condition to set RX_FLAG_8023 can never be satisfied.
Hence, drop this unreachable code.
Also remove the unused hal_rx_mon_ppdu_info parameter from
ath12k_dp_mon_rx_deliver_msdu(), as it was passed but never used.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Signed-off-by: Sushant Butta <sushant.butta@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260609064856.547032-2-sushant.butta@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Currently, the frequency on which each radio is operating
is not available in device_dp_stats. This information is
helpful in debugging the channel-specific throughput and
is available with iw/nl80211 dump.
Extend the device_dp_stats dump to display the center
frequency in the existing per-radio loop.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Signed-off-by: Sreeramya Soratkal <sreeramya.soratkal@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Aishwarya R <aishwarya.r@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260626085253.3927269-4-sreeramya.soratkal@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
In MLO configurations the device_dp_stats debugfs file is read
separately for each ath12k device. Without a timestamp it is impossible
to know whether two snapshots were taken at the same moment, making
counter comparisons across devices unreliable.
Prepend a ktime-based millisecond timestamp to the output header so the
reader can confirm when the snapshot was taken.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Signed-off-by: Sreeramya Soratkal <sreeramya.soratkal@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Aishwarya R <aishwarya.r@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260626085253.3927269-3-sreeramya.soratkal@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
The REO Rx Received and Rx WBM REL SRC Errors display loops in
ath12k_debugfs_dump_device_dp_stats() iterate up to the compile-time
constant ATH12K_MAX_DEVICES. This unconditionally prints zeros
in columns with no hardware behind it, making the output misleading.
Replace the compile-time bound with the runtime ab->ag->num_devices so
only live device slots appear in the output.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Signed-off-by: Sreeramya Soratkal <sreeramya.soratkal@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Aishwarya R <aishwarya.r@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260626085253.3927269-2-sreeramya.soratkal@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Advertise IEEE80211_OFFLOAD_ENCAP_MCAST to inform mac80211 that
multicast frame encapsulation is handled in hardware. This allows
mac80211 to pass Ethernet-formatted multicast frames directly to
the driver.
In ath12k_wifi7_mac_op_tx(), refine the logic that selects the MLO
multicast replication path. Add a sta pointer check so that only unicast
Hardware-encap frames use the direct transmit path, while multicast
Hardware-encap frames fall through to the MLO replication loop and are
transmitted on each active link.
In the MLO replication loop, use skb_clone() for Hardware-encap frames.
These frames are already in Ethernet format and do not require
802.11 link address rewriting by ath12k_mlo_mcast_update_tx_link_address().
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Signed-off-by: Tamizh Chelvam Raja <tamizh.raja@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260623100501.2100119-1-tamizh.raja@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Firmware builds the AP MLD partner profile from the hw_link_id passed in
the vdev start parameters. However, hw_link_id is not always the same as
the logical per-MLD ieee_link_id, since ieee_link_id is assigned per MLD
and not per pdev.
This matters in mixed MLO and SLO setups. For example:
MLD 1 - 5 GHz + 6 GHz (2-link MLO): ieee_link_id 0 and 1
MLD 2 - 6 GHz only (1-link SLO): ieee_link_id 0
MLD 3 - 5 GHz only (1-link SLO): ieee_link_id 0
The same physical 6 GHz radio can use ieee_link_id 1 for one
MLD and ieee_link_id 0 for another. Pass the correct ieee_link_id to
firmware so it can build accurate per-STA profile elements.
Add ieee_link_id to wmi_vdev_start_mlo_params for the self link and to
wmi_partner_link_info for each partner link. Populate these fields in
ath12k_mac_mlo_get_vdev_args() from the corresponding vdev link_id
before encoding the WMI command.
Introduce two new flags in ML params to indicate to firmware when
the new fields are valid:
ATH12K_WMI_FLAG_MLO_IEEE_LINK_IDX_VALID BIT(18) for the self link
ATH12K_WMI_FLAG_MLO_IEEE_LINK_IDX_VALID_PARTNER BIT(19) for partner links
Firmware parses ieee_link_id only when the matching flag is set.
Also fix the debug message by using correct format specifiers and host-endian
values instead of __le32 values.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Co-developed-by: Hari Naraayana Desikan Kannan <hari.kannan@oss.qualcomm.com>
Signed-off-by: Hari Naraayana Desikan Kannan <hari.kannan@oss.qualcomm.com>
Co-developed-by: Karthik M <karthik.m@oss.qualcomm.com>
Signed-off-by: Karthik M <karthik.m@oss.qualcomm.com>
Signed-off-by: Manish Dharanenthiran <manish.dharanenthiran@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260623-ieee_link_id-v2-1-8a89d71baf58@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
During module removal, REOQ LUT cleanup writes 0 to the REOQ/ML-REOQ
LUT address registers. That cleanup runs from ath12k_core_stop(),
after ath12k_qmi_firmware_stop() has already stopped the
firmware (mode OFF), so the register writes can hit an invalid target
access.
Move the REOQ LUT register reset before ath12k_qmi_firmware_stop(),
so the registers are cleared before stopping the firmware,
while register access is still valid.
Additionally, handle the error path where firmware-ready setup fails
after LUT programming but before core_stop() is reached, ensuring the
registers are properly reset in that case as well.
On the crash-recovery path, ath12k_core_reconfigure_on_crash() calls
ath12k_core_qmi_firmware_ready(), which re-enters ath12k_dp_setup()
and ath12k_dp_reoq_lut_setup(), so the LUT registers are reprogrammed
before use and stale values do not persist across recovery.
There is a brief window between the crash and when the LUT registers
are reprogrammed during recovery, during which the registers still hold
the freed DMA memory addresses. This is safe because the device is
non-functional in that window and will not initiate any DMA access
until firmware is restarted and the registers are reprogrammed.
No functional issue has been observed so far due to this sequence.
However, this change proactively avoids potential issues such as
invalid register accesses after firmware stop during module
removal and error handling.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Co-developed-by: P Praneesh <praneesh.p@oss.qualcomm.com>
Signed-off-by: P Praneesh <praneesh.p@oss.qualcomm.com>
Signed-off-by: Aishwarya R <aishwarya.r@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Tamizh Chelvam Raja <tamizh.raja@oss.qualcomm.com>
Link: https://patch.msgid.link/20260619120751.363340-1-aishwarya.r@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Currently ATH12K_USERPD_ID_MASK uses GENMASK(9, 8), which defines a
2-bit field and limits supported UserPD IDs to values 0-3.
Future IPQ5332 multi-PD platform variants support more than three
UserPDs. Expand ATH12K_USERPD_ID_MASK to GENMASK(10, 8), increasing
the field width to 3 bits and allowing UserPD IDs from 0-7.
ATH12K_USERPD_ID_MASK is currently used only while constructing the
ath12k AHB PAS ID, so this change does not affect existing platforms.
Also remove the unused ATH12K_MAX_UPDS definition.
Tested-on: IPQ5332 hw1.0 AHB WLAN.WBE.1.6-01275-QCAHKSWPL_SILICONZ-1
Signed-off-by: Aaradhana Sahu <aaradhana.sahu@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260604031551.4178754-1-aaradhana.sahu@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Validate the pointer to the next RX monitor TLV more strictly by
ensuring that at least a full TLV header is available within the
status buffer before continuing TLV parsing.
Prevent potential out-of-bounds access when handling malformed
or truncated RX monitor status data.
Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260509025819.1641630-6-miaoqing.pan@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Wi-Fi 7 monitor status parsing in dp_mon currently assumes a 64-bit TLV
header and directly decodes tag/len/userid from struct hal_tlv_64_hdr.
On chips using a 32-bit TLV header (e.g. QCC2072), this causes monitor RX
status packets to be dropped during TLV parsing.
Introduce HAL helpers to decode TLV header fields (tag/len/userid/value)
for both 32-bit and 64-bit header layouts. Without changing the actual TLV
parsing logic.
Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260509025819.1641630-5-miaoqing.pan@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Wi-Fi 7 monitor RX status TLV parsing needs to decode TLV headers and
advance the pointer with the correct header alignment. Different targets
use different TLV header layouts (32-bit vs 64-bit), but the HAL ops for
dp_mon RX status header decode and header alignment were not populated
for all wifi7 targets.
Add dp_mon RX status TLV header decode callbacks and TLV header alignment
helpers to the wifi7 HAL ops for QCC2072, QCN9274 and WCN7850. Export
helpers to query the required TLV header alignment for 32-bit and 64-bit
TLV headers so the caller can align the TLV walk correctly across targets.
Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260509025819.1641630-4-miaoqing.pan@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Change TLV decode helpers to return the TLV value pointer and optionally
decode tag/len/usrid via out parameters. This allows reusing the helpers
for DP monitor RX status header TLV parsing and avoids duplicated header
decoding in callers.
No functional change intended.
Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260509025819.1641630-3-miaoqing.pan@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
HAL_TLV_HDR_LEN was using the wrong bitmask; fix it to cover
bits [21:10]. Also drop HAL_SRNG_TLV_HDR_{TAG,LEN} and use the
generic TLV header bit definitions for TLV32/TLV64 encode/decode
to avoid redundant macros.
Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00068-QCACOLSWPL_V1_TO_SILICONZ-1
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
Fixes: d889913205cf ("wifi: ath12k: driver for Qualcomm Wi-Fi 7 devices")
Signed-off-by: Miaoqing Pan <miaoqing.pan@oss.qualcomm.com>
Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260509025819.1641630-2-miaoqing.pan@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
On a split phy qcn9274 (2.4GHz + 5GHz low), "iw phy" reports 320MHz
related features on the 5GHz band while it should not:
Wiphy phy1
[...]
Band 2:
[...]
EHT Iftypes: managed
[...]
EHT PHY Capabilities: (0xe2ffdbe018778000):
320MHz in 6GHz Supported
[...]
Beamformee SS (320MHz): 7
[...]
Number Of Sounding Dimensions (320MHz): 3
[...]
EHT MCS/NSS: (0x22222222222222222200000000):
This is also reflected in the beacons sent by a mesh interface started on
that band. They erroneously advertise 320MHz support too.
This should not happen as IEEE Std 802.11-2024, subclause 9.4.2.323.3 says
we should not set the 320MHz related fields when not operating on a 6GHz
band. For example it says about Bit 0 "Support For 320 MHz In 6 GHz"
"Reserved if the EHT Capabilities element is indicating capabilities for
the 2.4 GHz or 5 GHz bands."
Fix this by clearing the related bits when converting from WMI eht phy
capabilities to mac80211 phy capabilities, for bands other than 6GHz.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00218-QCAHKSWPL_SILICONZ-1
Signed-off-by: Nicolas Escande <nico.escande@gmail.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260623151613.72113-1-nico.escande@gmail.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
The driver contains several unused QMI definitions such as response
length macros, message IDs, firmware segment length definitions, and
CALDB address size definitions.
Remove these unused definitions as they are not referenced anywhere in
the driver.
No functional change intended.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.6-01243-QCAHKSWPL_SILICONZ-1
Signed-off-by: Aaradhana Sahu <aaradhana.sahu@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260623035104.3765404-1-aaradhana.sahu@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Replace incorrect %d format specifiers with %u for unsigned variables
in qmi.c debug messages. Also add missing trailing '\n' in log messages
to ensure proper termination. No functional change intended.
Tested-on: Compile tested only.
Signed-off-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260623-qmi-debug-log-v1-1-79471aa8b898@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Currently, the struct qmi_elem_info initializers in qmi.c are inconsistent
in how they align the assignments, with tabs being used in the majority of
places but spaces being used in some places. In those places replace the
spaces with tabs for consistency.
Also fix incorrect and missing terminating records in the following
qmi_elem_info initializers:
- qmi_wlanfw_shadow_reg_cfg_s_v01_ei[]
- qmi_wlanfw_mem_ready_ind_msg_v01_ei[]
- qmi_wlanfw_fw_ready_ind_msg_v01_ei[]
Tested-on: Compile tested only.
Signed-off-by: Raj Kumar Bhagat <raj.bhagat@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
Link: https://patch.msgid.link/20260623-qmi-inconsistencies-v1-1-0fc17f2b8338@oss.qualcomm.com
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
Determine threaded NAPI policy from runtime IRQ capability of the DP MSI
IRQ.
If irq_can_set_affinity() reports that affinity cannot be set, enable
threaded NAPI for DP interrupt groups so datapath processing is not
constrained by a single-CPU softirq context.
On RB3Gen2, where IRQ affinity is unavailable in the effective IRQ path,
EHT160 UDP downlink throughput improved from 802 Mbps to 2.58 Gbps after
enabling threaded NAPI.
Tested-on: QCC2072 hw1.0 PCI WLAN.COL.1.0.c2-00074-QCACOLSWPL_V1_TO_SILICONZ-1
Signed-off-by: Hangtian Zhu <hangtian.zhu@oss.qualcomm.com>
Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
Link: https://patch.msgid.link/20260519011627.713068-3-hangtian.zhu@oss.qualcomm.com
[Fixed checkpatch "Missing a blank line after declarations"]
Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and IPsec.
Current release - regressions:
- do not acquire dev->tx_global_lock in netdev_watchdog_up()
- ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
- fix deadlock in nested UP notifier events
Current release - new code bugs:
- eth:
- cn20k: fix subbank free list indexing for search order
- airoha: fix BQL underflow in shared QDMA TX ring
Previous releases - regressions:
- netfilter:
- flowtable: fix offloaded ct timeout never being extended
- nf_conncount: prevent connlimit drops for early confirmed ct
Previous releases - always broken:
- require CAP_NET_ADMIN in the originating netns when modifying
cross-netns devices
- report NAPI thread PID in the caller's pid namespace
- mac802154: fix dirty frag in in-place crypto for IOT radios
- sctp: hold socket lock when dumping endpoints in sctp_diag, avoid
an overflow
- eth: gve: fix header buffer corruption with header-split and HW-GRO
- af_key: initialize alg_key_len for IPComp states, prevent OOB read"
* tag 'net-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (213 commits)
selftests: bonding: add a test for VLAN propagation over a bonded real device
vlan: defer real device state propagation to netdev_work
net: add the driver-facing netdev_work scheduling API
net: turn the rx_mode work into a generic netdev_work facility
net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
rxrpc: Fix rxrpc_rotate_tx_rotate() to check there's something to rotate
rxrpc: Fix leak of released call in recvmsg(MSG_PEEK)
rxrpc: Fix socket notification race
rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
rxrpc: Fix oob challenge leak in cleanup after notification failure
rxrpc: Fix the reception of a reply packet before data transmission
afs: Fix uncancelled rxrpc OOB message handler
afs: Fix further netns teardown to cancel the preallocation charger
rxrpc: Fix double unlock in rxrpc_recvmsg()
rxrpc: Fix leak of connection from OOB challenge
rxrpc: Fix ACKALL packet handling
net: hns3: differentiate autoneg default values between copper and fiber
net: hns3: fix permanent link down deadlock after reset
net: hns3: refactor MAC autoneg and speed configuration
net: hns3: unify copper port ksettings configuration path
...
|
|
Breno reports following splats on mlx5:
RTNL: assertion failed at net/core/dev.c (2241)
WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
RIP: 0010:netif_state_change+0xf9/0x130
Call Trace:
<TASK>
__linkwatch_sync_dev+0xea/0x120
ethtool_op_get_link+0xe/0x20
__ethtool_get_link+0x26/0x40
linkstate_prepare_data+0x51/0x200
ethnl_default_doit+0x213/0x470
genl_family_rcv_msg_doit+0xdd/0x110
Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
it just returns the link state, so add an opt-in bit.
Reported-by: Breno Leitao <leitao@debian.org>
Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Breno Leitao <leitao@debian.org>
Acked-by: Harshitha Ramamurthy <hramamurthy@google.com>
Link: https://patch.msgid.link/20260624190439.2521219-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a link loss issue during driver initialization on optical ports
connected to forced-mode (non-autoneg) remote switches.
Previously, during driver probe or initialization, hclge_configure()
blindly hardcoded hdev->hw.mac.req_autoneg to AUTONEG_ENABLE for all
media types. While this is necessary for copper (BASE-T) ports to
establish a link, many high-speed optical (fiber) ports in data
centers are connected to switches running in forced mode (fixed speed,
autoneg disabled). Forcing autoneg on these optical ports during
initialization causes a permanent link failure since the remote end
refuses to respond to autoneg pulses.
Fix this by implementing media-type differentiated initialization in
hclge_init_ae_dev(). Copper ports continue to default to
AUTONEG_ENABLE, while optical ports strictly inherit the preset
autoneg status pre-configured by the firmware (hdev->hw.mac.autoneg),
preserving native compatibility with forced-mode network environments.
Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260624141319.271439-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a critical race condition deadlock where the network interface
remains permanently Link Down after a hardware reset under specific
ethtool sequences.
This issue exclusively manifests in firmware-controlled PHY topologies
where the driver relies on the IMP firmware to arbitrate link parameters.
Standard devices driven by the kernel's native PHY_LIB are unaffected.
The deadlock occurs via the following path:
1. User disables autoneg and forces an unmatched speed, forcing link
down: `ethtool -s ethx autoneg off speed 10 duplex full`
2. User re-enables autoneg: `ethtool -s ethx autoneg on`. The netdev
stack passes cmd->base.speed as SPEED_UNKNOWN (0xffffffff).
3. Driver saves req_autoneg=1, but before the interface can link up,
a hardware reset is triggered.
4. During reset recovery, MAC init reads the un-synchronized runtime
state mac.autoneg (which is still 0/OFF), misinterprets it as
forced mode, and pushes the cached SPEED_UNKNOWN into the hardware
registers, causing the MAC firmware state machine to freeze.
Meanwhile, PHY init reads req_autoneg=1 and enables PHY autoneg.
Since the MAC is frozen with 0xffffffff and PHY is running autoneg,
they mismatch permanently.
Fix this by:
1. Intercepting SPEED_UNKNOWN/DUPLEX_UNKNOWN in
hclge_set_phy_link_ksettings() and hclge_cfg_mac_speed_dup_h() to
prevent it from corrupting the driver's cached valid configuration.
2. Save req_autoneg in hclge_set_autoneg().
3. Aligning the state judgment in hclge_set_autoneg_speed_dup() to use
req_autoneg instead of the un-synchronized runtime mac.autoneg,
ensuring both MAC and PHY consistently enter the autoneg branch to
eliminate configuration discrepancies during reset recovery.
Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260624141319.271439-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extract the MAC autoneg and speed/duplex/lane configuration logic out
of hclge_mac_init() and encapsulate it into a new dedicated helper
function hclge_set_autoneg_speed_dup().
In the init path (hclge_init_ae_dev), this helper is now called after
hclge_update_port_info() so that firmware-reported autoneg values are
already populated before applying the link configuration.
Introduce a separate req_lane_num field in struct hclge_mac to isolate
the user-requested lane count from mac.lane_num, which firmware may
overwrite via hclge_get_sfp_info() with stale values from a prior link
lifecycle (e.g., lane_num=4 from 100G). During probe, req_lane_num is
initialized to 0, which instructs firmware to auto-select the correct
lane count for the current speed, rather than reusing the firmware-
reported mac.lane_num that may be inconsistent with the target speed.
This prevents probe failures from mismatched (speed, lane_num) pairs.
In the reset path (hclge_reset_ae_dev), it runs immediately after
hclge_mac_init(), using the previously cached req_* values to restore
the link without re-querying firmware.
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260624141319.271439-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor hns3_set_link_ksettings() and hclge_set_phy_link_ksettings()
to unify the configuration path for copper ports.
Previously, netdevs with a native kernel phy attached bypassed the main
MAC parameter caching logic and returned early via
phy_ethtool_ksettings_set(). This prevented the driver from updating
hdev->hw.mac.req_xxx variables for kernel PHY setups, leaving them
out-of-sync during reset recovery.
Clean this up by routing all copper port configurations through
ops->set_phy_link_ksettings(), and perform driver-level or kernel-level
PHY arbitration inside hclge_set_phy_link_ksettings() via
hnae3_dev_phy_imp_supported(). This ensures that the user's intended link
profiles (req_speed, req_duplex, req_autoneg) are uniformly recorded
across all copper and fiber deployment topologies, laying the groundwork
for stable reset recovery.
For copper ports where neither IMP firmware nor a kernel PHY is available
(e.g. PHY_INEXISTENT), hclge_set_phy_link_ksettings() returns -ENODEV.
In hns3_set_link_ksettings(), this is caught so the configuration falls
through to the existing MAC-level path (check_ksettings_param ->
cfg_mac_speed_dup_h), preserving compatibility with PHY-less copper
deployments.
Signed-off-by: Shuaisong Yang <yangshuaisong@h-partners.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260624141319.271439-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before the commit 755391121038 ("net: mana: Allocate MSI-X vectors
dynamically"), all the MANA IRQs were assigned statically and together
during early driver load.
After this commit, the IRQ allocation for MANA was done in two phases.
HWC IRQ allocated earlier and then, queue IRQs dynamically added at a
later point. By this time, the IRQ weights on vCPUs can become imbalanced
and if IRQ count is greater than the vCPU count the topology aware IRQ
distribution logic in MANA can cause multiple MANA IRQs to land on the
same vCPUs, while other sibling vCPUs have none (case 1).
On SMP enabled, low-vCPU systems, this becomes a bigger problem as the
softIRQ handling overhead of two IRQs on the same vCPUs becomes much more
than their overheads if they were spread across sibling vCPUs.
In such cases when many parallel TCP connections are tested, the
throughput drops significantly.
Fix the affinity assignment logic, in cases where the IRQ count is greater
than the vCPU count and when IRQs are added dynamically, by utilizing all
the vCPUs irrespective of their NUMA/core bindings (case 2).
The results of setting the affinity and hint to NULL were also studied,
and we observed that, with this logic if there are pre-existing IRQs
allocated on the VM (apart from MANA), during MANA IRQs allocation, it
leads to clustering of the MANA queue IRQs again (case 3).
=======================================================
Case 1: without this patch
=======================================================
4 vcpu(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)
TYPE effective vCPU aff
=======================================================
IRQ0: HWC 0
IRQ1: mana_q1 0
IRQ2: mana_q2 2
IRQ3: mana_q3 0
IRQ4: mana_q4 3
%soft on each vCPU(mpstat -P ALL 1) on receiver
vCPU 0 1 2 3
=======================================================
pass 1: 38.85 0.03 24.89 24.65
pass 2: 39.15 0.03 24.57 25.28
pass 3: 40.36 0.03 23.20 23.17
=======================================================
Case 2: with this patch
=======================================================
4 vcpu(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)
TYPE effective vCPU aff
=======================================================
IRQ0: HWC 0
IRQ1: mana_q1 0
IRQ2: mana_q2 1
IRQ3: mana_q3 2
IRQ4: mana_q4 3
%soft on each vCPU(mpstat -P ALL 1) on receiver
vCPU 0 1 2 3
=======================================================
pass 1: 15.42 15.85 14.99 14.51
pass 2: 15.53 15.94 15.81 15.93
pass 3: 16.41 16.35 16.40 16.36
=======================================================
Case 3: with affinity set to NULL
=======================================================
4 vCPU(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)
TYPE effective vCPU aff
=======================================================
IRQ0: HWC 0
IRQ1: mana_q1 2
IRQ2: mana_q2 3
IRQ3: mana_q3 2
IRQ4: mana_q4 3
=======================================================
Throughput Impact(in Gbps, same env)
=======================================================
TCP conn with patch w/o patch aff NULL
20480 15.65 7.73 5.25
10240 15.63 8.93 5.77
8192 15.64 9.69 7.16
6144 15.64 13.16 9.33
4096 15.69 15.75 13.50
2048 15.69 15.83 13.61
1024 15.71 15.28 13.60
Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
Cc: stable@vger.kernel.org
Co-developed-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Yury Norov <ynorov@nvidia.com>
Link: https://patch.msgid.link/20260624072138.1632849-1-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
sparx5_register_notifier_blocks() registers the switchdev blocking
notifier before allocating the ordered workqueue. If the workqueue
allocation fails, the error path unregisters the switchdev and netdevice
notifiers, but leaves the blocking notifier registered.
Add a separate error label for the workqueue allocation failure path and
unregister the switchdev blocking notifier there.
Fixes: d6fce5141929 ("net: sparx5: add switching support")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260623115714.2192074-1-haoxiang_li2024@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nix_setup_bpids() allocates bp->bpids with rvu_alloc_bitmap(), which uses
a plain kcalloc(). If any of the following devm_kcalloc() allocations for
the BPID mapping arrays fails, the function returns without freeing the
bitmap. Free the BPID bitmap before returning from those error paths.
Fixes: d6212d2e41a0 ("octeontx2-af: Create BPIDs free pool")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260623114316.2182271-1-haoxiang_li2024@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For i.MX94 series, all the standalone ENETCs do not support SR-IOV, so
pf->caps.num_vsi is zero. This leads to a divide-by-zero in
enetc4_default_rings_allocation() when distributing rings among PF and
VFs.
Division by zero is undefined behavior in C. On ARM64, the UDIV/SDIV
instructions silently return zero rather than raising an exception, so
the issue does not cause a visible crash. However, relying on this
behavior is incorrect and poses a cross-platform compatibility risk.
Add an explicit check for num_vsi == 0 and return early after the PF's
rings have been configured.
Fixes: 2d673b0e2f8d ("net: enetc: add standalone ENETC support for i.MX94")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260624072726.1238903-1-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-06-22 (ice, i40e, e1000e)
For ice:
Dawid changes call to release control VSI during reset to prevent
leaking it.
Lukasz fixes flow control error check to check value rather than treat
is as bitmap values.
Paul makes link related errors non-fatal to probe to allow for recovery
in certain NVM update situations.
Marcin moves netif_keep_dst() to only be called once when entering
switchdev mode.
ZhaoJinming adds a cleanup path for ice_dpll_init_info() to prevent
memory leaks on error path.
For i40e:
Mohamed Khalfella corrects argument passed in macro to match the
one provided to the macro.
For e1000e:
Dima resolves power state issues by adjusting value of PLL clock gate
and re-enabling K1; a quirk table is added to keep it off for known bad
systems.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
e1000e: Reconfigure PLL clock gate timeout and re-enable K1 on Meteor Lake
i40e: Fix i40e_debug() to use struct i40e_hw argument
ice: dpll: fix memory leak in ice_dpll_init_info error paths
ice: dpll: set pointers to NULL after kfree in ice_dpll_deinit_info
ice: call netif_keep_dst() once when entering switchdev mode
ice: fix ice_init_link() error return preventing probe
ice: fix AQ error code comparison in ice_set_pauseparam()
ice: fix FDIR CTRL VSI resource leak in ice_reset_all_vfs()
====================
Link: https://patch.msgid.link/20260622220059.2471844-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current irq definition of the wake irq and the lpi irq
is wrong, replace them with the right number and name.
Fixes: 30f0ba420ed3 ("net: stmmac: Add glue layer for Spacemit K3 SoC")
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260623074637.503864-3-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current MII interface register definition from the vendor is wrong,
use the right number for the macro. Also, correct the interface mask
in spacemit_set_phy_intf_sel() so it can update the register with the
right number
Fixes: 30f0ba420ed3 ("net: stmmac: Add glue layer for Spacemit K3 SoC")
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260623074637.503864-2-inochiama@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mac->phy_node is acquired via of_parse_phandle() in spl2sw_probe() and
stored in the mac private data, transferring ownership of the
device_node reference to mac. On driver removal, spl2sw_phy_remove()
disconnects the PHY but never drops that reference, so each
probe-then-remove cycle leaks one of_node refcount per port permanently.
Drop the reference after phy_disconnect(). While at it, remove the
redundant inner "if (ndev)" check; comm->ndev[i] was just verified
non-NULL on the line above.
Compile-tested only; no SP7021 hardware available.
Fixes: fd3040b9394c ("net: ethernet: Add driver for Sunplus SP7021")
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/f3bdd4c91f3e2269b4e256075f9dc70808b1b8e9.1782195965.git.shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
gem_init_one() calls gem_remove_one() when register_netdev() fails.
gem_remove_one() unregisters and frees resources owned by the net_device,
including the DMA block, MMIO mapping, PCI regions, and the net_device
itself. gem_init_one() then falls through to its own cleanup labels and
frees the same resources again.
Keep the register_netdev() error path in gem_init_one(): clear drvdata so
PM/remove paths do not see a half-registered device, remove the NAPI
instance added during probe, and let the existing cleanup labels release
the resources once.
The issue was found by a local static-analysis checker for probe error
paths. The reported path was manually inspected before sending this fix.
Compile-tested with CONFIG_SUNGEM=y. Runtime testing was not performed
because no sungem hardware is available.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260623025759.3468566-1-ruoyuw560@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Configurations with mlx5 built-in but macsec=m fail to link:
x86_64-linux-ld: drivers/infiniband/hw/mlx5/macsec.o: in function `mlx5r_add_gid_macsec_operations':
macsec.c:(.text+0x77d): undefined reference to `macsec_netdev_is_offloaded'
x86_64-linux-ld: drivers/infiniband/hw/mlx5/macsec.o: in function `mlx5r_del_gid_macsec_operations':
macsec.c:(.text+0xe81): undefined reference to `macsec_netdev_is_offloaded'
Fix the dependency so this configuration cannot happen.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20260622124229.2444502-1-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
kalmia_rx_fixup() computes usb_packet_length = skb->len - (2 *
KALMIA_HEADER_LENGTH) as a u16, guarded only by a pre-loop check that
skb->len is at least KALMIA_HEADER_LENGTH, which is 6. A device can
deliver a short bulk-IN frame with skb->len in the 6 to 11 range, or
leave a short trailing remainder on a later loop iteration. Either case
underflows usb_packet_length to about 65530.
That bypasses the usb_packet_length < ether_packet_length truncation path.
The device-supplied ether_packet_length, a le16 up to 65535 read from
header_start[2], then drives a memcmp() and the following skb_trim() and
skb_pull() past the end of the rx buffer. The rx buffer is hard_mtu * 10,
which is 14000 bytes. That is an out of bounds read.
Require both the start and end framing headers to be present before
subtracting them, on every loop iteration.
Fixes: d40261236e8e ("net/usb: Add Samsung Kalmia driver for Samsung GT-B3730")
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/178211531778.2216480.12637613349790980750@maoyixie.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Even with both paths gated on gs->gro_hint, geneve_gro_complete()
re-derives the inner dispatch type and length from the packet and the
current gs->gro_hint, independently of geneve_gro_receive(). The two can
disagree if gs->gro_hint flips under a concurrent geneve_quiesce()/
geneve_unquiesce() (sk_user_data is NULL across a synchronize_net()), or if
the re-read option bytes differ from the ones receive parsed.
geneve_gro_receive() already records the inner network header position in
NAPI_GRO_CB()->inner_network_offset. Have geneve_gro_complete() compute the
offset it is about to dispatch at, adding ETH_HLEN in the ETH_P_TEB case
where eth_gro_complete() steps over the inner MAC header, and bail out if
it lands past inner_network_offset.
Use a lower bound rather than exact equality: between gh_len and the inner
L3 header, geneve_gro_receive() may also have pulled an inner VLAN tag
(vlan_gro_receive() advances the recorded offset past it), which only moves
inner_network_offset further out. A valid frame therefore always satisfies
inner_nh <= inner_network_offset, while a gh_len inflated by a hint
gro_receive() did not honour dispatches past the validated inner header,
i.e. the out-of-bounds completion. Only the latter is rejected.
Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Co-developed-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/20260618032622.484720-2-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
geneve_gro_receive() reads the GRO hint through geneve_sk_gro_hint_off(),
which honours it only when the socket enabled IFLA_GENEVE_GRO_HINT
(gs->gro_hint). geneve_gro_complete() instead calls the low-level
geneve_opt_gro_hint_off() and acts on the hint unconditionally.
On a tunnel without the hint, receive aggregates the frames as plain
ETH_P_TEB while complete still honours an attacker-supplied hint option: it
inflates gh_len by gro_hint->nested_hdr_len (u8) and redirects the dispatch
type, so the inner gro_complete handler runs at nhoff + gh_len, an offset
receive never pulled nor validated, reading out of bounds of the skb head:
BUG: KASAN: slab-out-of-bounds in ipv6_gro_complete (net/ipv6/ip6_offload.c:196)
Read of size 1 at addr ffff88800fe91980 by task exploit/153
ipv6_gro_complete (net/ipv6/ip6_offload.c:196)
geneve_gro_complete (drivers/net/geneve.c:965)
udp_gro_complete (net/ipv4/udp_offload.c:940)
inet_gro_complete (net/ipv4/af_inet.c:1621)
__gro_flush (net/core/gro.c:306)
Gate the complete path on gs->gro_hint too via geneve_sk_gro_hint_off(), so
both paths agree. Tunnels that enable the hint are unaffected.
Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Reported-by: Kyle Zeng <kylebot@openai.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Link: https://patch.msgid.link/20260618032622.484720-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On Marvell MPIC platforms (Armada 370/XP/38x), mvneta uses a percpu
IRQ disable/enable scheme for NAPI: the ISR (mvneta_percpu_isr) calls
disable_percpu_irq() to mask the MPIC per-CPU interrupt and schedules
NAPI poll, which calls enable_percpu_irq() on completion to unmask.
If suspend occurs while NAPI poll is pending (between
disable_percpu_irq in the ISR and enable_percpu_irq in poll
completion), the interrupt is never re-enabled:
1. mvneta_percpu_isr: disable_percpu_irq() + napi_schedule()
=> MPIC masked, percpu_enabled cpumask bit cleared
2. NAPI poll does not complete before suspend proceeds
(on PREEMPT_RT this is highly likely since softirqs run in
ksoftirqd which gets frozen; on non-RT it can happen when
softirq processing is deferred to ksoftirqd)
3. mvneta_stop_dev => napi_disable(): cancels the pending poll
without executing the completion path
4. suspend_device_irqs => IRQCHIP_MASK_ON_SUSPEND: masks MPIC
(already masked, but records IRQS_SUSPENDED)
5. Resume: mpic_resume checks irq_percpu_is_enabled() => false
(bit was cleared in step 1) => skips unmask
6. mvneta_start_dev only restores device-level INTR_NEW_MASK,
does not touch the MPIC per-CPU mask
Result: MPIC per-CPU interrupt stays masked permanently. The NIC
generates interrupts (INTR_NEW_CAUSE != 0) but the CPU never
receives them, causing complete loss of network connectivity.
Fix by calling on_each_cpu(mvneta_percpu_enable) in the resume path
to unconditionally unmask the MPIC per-CPU interrupt regardless of
pre-suspend state.
Fixes: 12bb03b436da ("net: mvneta: Handle per-cpu interrupts")
Signed-off-by: Yun Zhou <yun.zhou@windriver.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://patch.msgid.link/20260622074350.1666290-1-yun.zhou@windriver.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
CGX per-lmac debugfs seq readers obtained struct rvu via
pci_get_drvdata(pci_get_device(..., PCI_DEVID_OCTEONTX2_RVU_AF, ...)),
which leaks a PCI device reference on every read. Store rvu and the CGX
handle in debugfs inode private data when creating stats, mac_filter,
and fwdata files (one context per CGX), and use debugfs aux numbers for
fwdata so lmac_id matches the other CGX debugfs entries.
Fixes: f967488d095e ("octeontx2-af: Add per CGX port level NIX Rx/Tx counters")
Fixes: dbc52debf95f ("octeontx2-af: Debugfs support for DMAC filters")
Fixes: 49f02e6877d1 ("Octeontx2-af: Debugfs support for firmware data")
Cc: Linu Cherian <lcherian@marvell.com>
Reported-by: Yuho Choi <dbgh9129@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260622034229.2254145-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NIX maximum number of LFs can be set via devlink command
but that can be done before assigning any LFs to a PF/VF.
The condition used to check whether any LFs are assigned is
incorrect. This patch fixes that condition.
Fixes: dd7842878633 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1782082853-6941-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
t7xx_cldma_late_init() creates md_ctrl->gpd_dmapool before
initializing the TX and RX rings. If any ring initialization
fails, the error path frees the already initialized rings but
leaves the DMA pool allocated.
Destroy md_ctrl->gpd_dmapool on the late-init failure path
to avoid leaking the DMA pool.
Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20260621031714.3605022-1-haoxiang_li2024@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When multiple netdevs share a QDMA TX ring and one device is stopped,
netdev_tx_reset_subqueue() zeroes that device's BQL counters while its
pending skbs remain in the shared HW TX ring. When NAPI later completes
those skbs via netdev_tx_completed_queue(), the already-zeroed
dql->num_queued counter underflows.
Fix the issue:
- Remove netdev_tx_reset_subqueue() from airoha_dev_stop() so pending
skbs are completed naturally by NAPI with proper BQL accounting.
- Rework airoha_qdma_tx_cleanup() to disable TX DMA, flush BQL
counters, DMA-unmap and free all pending skbs while skb->dev
references are still valid. Use a per-queue flushing flag checked
under q->lock in airoha_dev_xmit() to prevent races between teardown
and transmit. Call airoha_qdma_stop_napi() before
airoha_qdma_tx_cleanup() at the call sites.
- Move DMA engine start into probe. Split DMA teardown so TX DMA is
disabled in airoha_qdma_tx_cleanup() and RX DMA in
airoha_qdma_cleanup().
- Remove qdma->users counter since DMA lifetime is now tied to
probe/cleanup rather than per-netdev open/stop.
Fixes: a9c2ca61fec7 ("net: airoha: Support multiple net_devices for a single FE GDM port")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260620-airoha-bql-fixes-v3-1-76b95374e63e@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On RTL8127A connected to a link partner that advertises 10000baseT
speed cannot be changed to anything other than 10000baseT as 10GbE
is always advertised regardless of any setting. Fix this by
clearing MDIO_AN_10GBT_CTRL_ADV10G bit in rtl822x_config_aneg()'s
call to phy_modify_mmd_changed().
Fixes: 83d962316128 ("net: phy: realtek: add RTL8127-internal PHY")
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Jan Klos <honza.klos@gmail.com>
Link: https://patch.msgid.link/20260620011956.37181-1-honza.klos@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
subbank_srch_order[i] is the physical subbank at search-order slot i,
so each subbank's arr_idx must be i (its slot), not
subbank_srch_order[sb->idx]. The old logic mis-keyed xa_sb_free
and broke allocation traversal order.
Populate arr_idx and xa_sb_free in a single pass over the search
order after subbank structs are initialized.
Fixes: 7ac9d4c4075c ("octeontx2-af: npc: cn20k: add subbank search order control")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260619095100.1864440-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit d7709812e13d ("net: mana: hardening: Validate adapter_mtu from
MANA_QUERY_DEV_CONFIG") rejected any adapter_mtu value smaller than
ETH_MIN_MTU + ETH_HLEN, including 0, returning -EPROTO and failing
mana_probe().
Some older PF firmware versions still in the field report
adapter_mtu as 0 in the MANA_QUERY_DEV_CONFIG response. With the
hardening check in place, the MANA VF driver now fails to load on
those hosts, breaking networking entirely for guests.
MANA hardware always supports the standard Ethernet MTU. Treat a
reported adapter_mtu of 0 as "the PF did not advertise a value" and
fall back to ETH_FRAME_LEN, the same value used for the pre-V2
message version path. Only jumbo frames remain unavailable until
the PF reports a valid MTU.
Other small-but-nonzero bogus values are still rejected, preserving
the original protection against the unsigned-subtraction wrap that
would otherwise let ndev->max_mtu underflow to a huge value.
Fixes: d7709812e13d ("net: mana: hardening: Validate adapter_mtu from MANA_QUERY_DEV_CONFIG")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260619055348.467224-1-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Upon an MDIO CRC error mxl862xx_crc_err_work_fn() walks the DSA ports
and closes the CPU port conduits:
dsa_switch_for_each_cpu_port(dp, priv->ds)
dev_close(dp->conduit);
mxl862xx_remove() unregisters the switch before cancelling this work:
set_bit(MXL862XX_FLAG_WORK_STOPPED, &priv->flags);
cancel_delayed_work_sync(&priv->stats_work);
dsa_unregister_switch(ds);
mxl862xx_host_shutdown(priv);
dsa_unregister_switch() frees the dsa_port objects. If a CRC error
schedules the work during teardown it can run after the ports have been
freed and dereference freed memory.
Guard the port walk with MXL862XX_FLAG_WORK_STOPPED, which is already set
before dsa_unregister_switch(). DSA tears the ports down under
rtnl_lock(), so checking the flag under rtnl_lock() means the work either
runs before teardown and sees valid ports, or runs afterwards, observes
the flag and skips the walk. This mirrors the host_flood_work handler,
which skips torn-down ports under rtnl_lock().
Link: https://sashiko.dev/#/patchset/cover.1780968180.git.daniel%40makrotopia.org?part=2
Fixes: a319d0c8c8ce ("net: dsa: mxl862xx: add CRC for MDIO communication")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/5e55169926c02f2b914e5ada529d7453b943cda4.1781702256.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MXL862XX_API_* macros pass the address of a stack-allocated, __packed
firmware-ABI struct to mxl862xx_api_wrap() as a void *. The struct has an
alignment of 1, so the compiler is free to place it at an odd address.
mxl862xx_api_wrap() reinterprets that buffer as a __le16 * and accesses it
with data[i], for which the compiler assumes the natural 2-byte alignment
of __le16 and emits aligned 16-bit loads/stores (e.g. lhu/sh on MIPS).
When the buffer lands on an odd address these fault on architectures that
do not support unaligned access, such as MIPS32.
-Waddress-of-packed-member does not catch this: the packed origin is
laundered through the void * parameter, so the cast inside api_wrap looks
alignment-safe to the compiler and no warning is emitted.
Use get_unaligned_le16()/put_unaligned_le16() for the three 16-bit word
accesses. The byte accesses (*(u8 *)&data[i], crc16()) are already safe
and are left unchanged.
Link: https://sashiko.dev/#/patchset/cover.1781319534.git.daniel%40makrotopia.org?part=4
Fixes: 23794bec1cb6 ("net: dsa: add basic initial driver for MxL862xx switches")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/599327521db465a534d277de53ab9b6cac01928b.1781702256.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
led_classdev_register_ext() only reads init_data.devicename - it never
stores the pointer. However, the caller allocated devicename with
kasprintf() but never freed it, leaking the string memory.
Fix it with a stack buffer to avoid dynamic buffers completely.
Fixes: 32d617005475 ("net: dsa: realtek: add LED drivers for rtl8366rb")
Signed-off-by: David Yang <mmyangfl@gmail.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260618140200.1888707-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ixp4xx_hss_probe() allocates two HDLC netdevs. The first one is stored
in ndev, initialized, and registered with register_hdlc_device(). The
second one is stored in port->netdev and later used by the remove path
for unregister_hdlc_device() and free_netdev().
This means that the registered netdev is not the same object that is
unregistered and freed on remove. It also leaks the first allocation if
the second alloc_hdlcdev() call fails, and the first allocation is not
checked before ndev is used.
Older code allocated the HDLC netdev only once and stored the same object
in both the local variable and port->netdev. The buggy conversion split
this into two alloc_hdlcdev() calls. A later rename changed the local
variable name to ndev, but the underlying mismatch remained.
Fix this by allocating the HDLC netdev only once and assigning the same
object to port->netdev.
Fixes: 99ebe65eb9c0 ("net: ixp4xx_hss: move out assignment in if condition")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260622043015.643637-1-haoxiang_li2024@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
airoha_tc_htb_alloc_leaf_queue() assigns queue IDs based on the channel
index (opt->qid = AIROHA_NUM_TX_RING + channel), but updates
real_num_tx_queues with a simple increment (num_tx_queues + 1). When QoS
channels are allocated sparsely (e.g., channels 0 and 3 without 1 and
2), the returned qid can exceed real_num_tx_queues, causing out-of-bounds
accesses in the networking stack.
For example, allocating channel 0 then channel 3 results in
real_num_tx_queues = 34 but qid = 35, which is out of range [0, 34).
Fix this by computing real_num_tx_queues based on the highest active
channel index rather than using a simple counter, in both the allocation
and deletion paths.
Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260619-airoha-qos-fixes-v2-2-5c43485038f9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
airoha_tc_htb_alloc_leaf_queue() computes the HTB QoS channel index
as opt->classid % AIROHA_NUM_QOS_CHANNELS and stores it in qos_sq_bmap.
However, airoha_tc_remove_htb_queue() clears the HTB configuration
using queue + 1 as the channel index, causing an off-by-one error.
Use queue directly as the QoS channel index to match the allocation
logic.
Fixes: ef1ca9271313b ("net: airoha: Add sched HTB offload support")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260619-airoha-qos-fixes-v2-1-5c43485038f9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When requesting ownership of the NIC (MAC/PHY control), we set up
the heartbeat to look stale:
/* Initialize heartbeat, set last response to 1 second in the past
* so that we will trigger a timeout if the firmware doesn't respond
*/
fbd->last_heartbeat_response = req_time - HZ;
fbd->last_heartbeat_request = req_time;
The response handler then sets:
fbd->last_heartbeat_response = jiffies;
for which we wait via:
fbnic_fw_init_heartbeat() -> fbnic_fw_heartbeat_current()
The scheme is a bit odd, but it should work in principle.
Fix the ordering of operations. We have to set up the stale heartbeat
before we send the message. Otherwise if the response is very fast
we will override it. This triggers on QEMU if we run on the core
that handles the IRQ, and results in ndo_open failing with ETIMEDOUT.
The change in ordering doesn't impact releasing the ownership.
Both ndo_stop and heartbeat check are under rtnl_lock.
Fixes: 20d2e88cc746 ("eth: fbnic: Add initial messaging to notify FW of our presence")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260622154753.827506-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
au1000_close() calls free_irq() while aup->lock is still held with
spin_lock_irqsave(). free_irq() can sleep because it takes the IRQ
descriptor request mutex, so it does not belong inside the close-time
spinlocked section.
This was found by our static analysis tool and then confirmed by manual
review of the in-tree au1000_close() .ndo_stop path. The reviewed path
keeps aup->lock held across the MAC reset, queue stop and
free_irq(dev->irq, dev).
A directed runtime validation kept that ndo_stop carrier and the same
free_irq(dev->irq, dev) operation under the driver lock. Lockdep reported
"BUG: sleeping function called from invalid context" and "Invalid wait
context" while free_irq() was taking desc->request_mutex, with
au1000_close() and free_irq() on the stack.
Drop aup->lock before freeing the IRQ. The protected close-time work still
stops the device and queue before IRQ teardown, but the sleepable IRQ core
path now runs outside the spinlocked section.
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260619151816.1144289-1-runyu.xiao@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Configured VLANs intermittently stop receiving traffic after a link
down/up cycle, e.g. when the network cable is unplugged and plugged back
in. VLAN filtering stays enabled but all VLAN-tagged frames are dropped
until a VLAN is added or removed again.
The LAN7801 datasheet (DS00002123E) states:
"A portion of the MAC operates on clocks generated by the Ethernet
PHY. During a PHY reset event, this portion of the MAC is designed to
not be taken out of reset until the PHY clocks are operational"
(section 8.10, MAC Reset Watchdog Timer)
"After a reset event, the RFE will automatically initialize the
contents of the VHF to 0h."
(section 7.1.4, VHF Organization)
Thus a link down/up cycle stops and restarts the PHY clock, resets the
PHY-clocked portion of the MAC, and the RFE clears its VLAN/DA hash
filter (VHF) memory. The VHF holds both the VLAN filter table and the
multicast hash table, but the driver never reprograms either from its
shadow copy once the link is back, so both stay empty.
Reprogram the VLAN filter and multicast hash tables on link up.
Reported-by: Sven Schuchmann <schuchmann@schleissheimer.de>
Closes: https://lore.kernel.org/netdev/BEZP281MB224501E38B30BFDC4BD3D364D9E32@BEZP281MB2245.DEUP281.PROD.OUTLOOK.COM/T/#u
Tested-by: Sven Schuchmann <schuchmann@schleissheimer.de>
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
Link: https://patch.msgid.link/20260622102911.484045-1-nb@tipi-net.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
During XDP enablement in veth, if xdp_rxq_info_reg() or
xdp_rxq_info_reg_mem_model() fails, the driver rolls back the changes.
However, the rollback loop:
for (i--; i >= start; i--) {
decrements the loop index 'i' before the first iteration. This
correctly skips unregistering the rxq for the failed index 'i' (as
registration failed or was already cleaned up), but it also
erroneously skips calling netif_napi_deli() for rq[i].xdp_napi.
Since netif_napi_add() was already called for index 'i', this leaves
a dangling napi_struct in the device's napi_list. When the veth
device is later destroyed, the freed queue memory (which contains the
leaked NAPI structure) can be reused.
The subsequent device teardown iterates the NAPI list and
corrupts the reallocated memory, leading to UAF.
Fix this by explicitly deleting the NAPI association for the failed
index 'i' before rolling back the successfully configured queues.
Fixes: b02e5a0ebb17 ("xsk: Propagate napi_id to XDP socket Rx path")
Reported-by: Guenter Roeck <groeck@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260622111825.88337-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
emac_xsk_xmit_zc() handles tx xmit for zero copy and gets called
inside napi context. User application wakes up the kernel while
initiating the transmit which triggers napi to start processing
the tx packets. The num_tx check inside emac_tx_complete_packets()
returns early if no packet transfer happen hindering the call
to emac_xsk_xmit_zc(). Remove this check to let application
wakeup initiate zero copy xmit traffic.
Add __netif_tx_lock() to ensure that the TX queue is protected
from concurrent access during the transmission of XDP frames.
This fixes netdev watchdog timeout for long runs.
Fixes: e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20260618100348.2209907-1-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
pin_duration is converted from the user-provided period to SJA1105
clock ticks and is later passed as the cycle_time argument to
future_base_time().
Very small period values may become zero after the conversion,
which can lead to a division by zero in future_base_time().
Round zero pin_duration up to 1 tick so that the smallest unsupported
periods use the minimum non-zero hardware duration instead of passing
zero to future_base_time().
Fixes: 747e5eb31d59 ("net: dsa: sja1105: configure the PTP_CLK pin as EXT_TS or PER_OUT")
Signed-off-by: Aleksandrova Alyona <aga@itb.spb.ru>
Link: https://patch.msgid.link/20260618110508.53094-1-aga@itb.spb.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In airoha_qdma_set_chan_tx_sched(), the loop clearing queue mask was
using AIROHA_NUM_TX_RING (32) instead of AIROHA_NUM_QOS_QUEUES (8).
Each channel has 8 queues, and TXQ_DISABLE_CHAN_QUEUE_MASK(channel, i)
computes BIT(i + (channel * 8)). With i ranging 0..31, this causes:
- channel 0: clears bit 0..31 (all 4 channels) instead of 0..7
- channel 1: clears bit 8..31 (channels 1-3) instead of 8..15
- channel 2: clears bit 16..31 (channels 2-3) instead of 16..23
- channel 3: clears bit 24..31 (channel 3 only) - correct by accident
While BIT(32+) on arm64 produces 64-bit values truncated to 0 in u32
mask parameter, the loop still incorrectly clears queues within the
same channel beyond queue 7.
Even though this is functionally harmless (the register resets to 0
and is only ever cleared, never set — so clearing extra bits is a
no-op), the loop bound is semantically wrong and should be fixed for
correctness and clarity.
Fix by using AIROHA_NUM_QOS_QUEUES (8) as the loop upper bound.
Fixes: ef1ca9271313 ("net: airoha: Add sched HTB offload support")
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Wayen Yan <win847@gmail.com>
Link: https://patch.msgid.link/178187479434.2400840.1312143943526335838@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the allocation of fp[i].tpa_info fails, the error path will not free
the struct bnx2x_fastpath allocated earlier, as it is not linked to the
bp structure yet. Fix that by linking it immediately after allocation.
Cc: stable@vger.kernel.org
Fixes: 15192a8cf8a8 ("bnx2x: Split the FP structure")
Signed-off-by: Abdun Nihaal <nihaal@cse.iitm.ac.in>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260620062402.89549-1-nihaal@cse.iitm.ac.in
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kernel selftests wait 1.25x of the promised stats refresh time
(as read from ethtool -c). bnxt reports 1sec by default, but
the stats update process has two steps. First device DMAs the
new values, then the service task performs update in full-width
SW counters. So the worst case delay is actually 2x.
Note that the behavior is different for ring stats and port stats.
Port stats are fetched synchronously by the service worker, so
there's no risk of doubling up the delay there.
The problem of stale stats impacts not only tests but real workloads
which monitor egress bandwidth of a NIC. The inaccuracy causes double
counting in the next cycle and spurious overload alarms.
Try to read from the DMA buffer more aggressively, to mitigate
timing issues between DMA and service task. The SW update should
be cheap.
Fixes: 51f307856b60 ("bnxt_en: Allow statistics DMA to be configurable using ethtool -C.")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20260619191538.104165-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each PPP control protocol (LCP/IPCP/IPV6CP) embedded in struct ppp
registers a timer via timer_setup(). That struct ppp is the
hdlc->state allocation, which detach_hdlc_protocol() frees with kfree()
in both teardown paths: unregister_hdlc_device() and the re-attach inside
attach_hdlc_protocol().
The ppp proto never registered a .detach callback, so
detach_hdlc_protocol() performs no timer synchronization before the
kfree(). The only cancel, timer_delete(&proto->timer) in ppp_cp_event(),
is partial (it does not wait for a running callback) and only runs on the
->CLOSED transition; ppp_stop()/ppp_close() do not sync either. A
ppp_timer callback already executing (blocked on ppp->lock) survives the
kfree and then dereferences proto->state / ppp->lock in freed memory,
leading to a use-after-free.
Fix this by adding a .detach helper that calls timer_shutdown_sync() on
every per-proto timer. detach_hdlc_protocol() invokes proto->detach(dev)
before kfree(hdlc->state), so timer_shutdown_sync()
now runs on both free paths.
timer_shutdown_sync() is used instead of timer_delete_sync() because the
keepalive path re-arms the timer through add_timer()/mod_timer() and
shutdown blocks any re-activation during teardown.
Initialize the per-protocol timers in ppp_ioctl() when the protocol is
attached, and remove the now-redundant timer_setup() from ppp_start(), so
that the timers are initialized exactly once at attach time and
ppp_timer_release() never operates on uninitialized timer_list
structures. attach_hdlc_protocol() uses kmalloc() (not kzalloc), so
struct ppp's protos[i].timer is uninitialized garbage until the first
timer_setup(); without this init-at-attach, attaching the PPP protocol
without ever bringing the device up would leave timer_shutdown_sync()
operating on uninitialized memory in .detach. Moving the init out of
ppp_start() (which only runs on NETDEV_UP) into the attach path makes the
initialization unconditional and avoids initializing the same timer_list
twice.
This bug was found by static analysis.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Signed-off-by: Fan Wu <fanwu01@zju.edu.cn>
Link: https://patch.msgid.link/20260617020518.116319-1-fanwu01@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
icssg_ndo_get_stats64() unconditionally calls emac_get_stat_by_name()
with FW PA stat names regardless of whether the PA stats block is
present on the hardware. emac_get_stat_by_name() already guards the
PA stats lookup with `if (emac->prueth->pa_stats)`; when that pointer
is NULL the lookup falls through to netdev_err() and returns -EINVAL.
Because ndo_get_stats64 is polled regularly by the networking stack
this produces thousands of log entries of the form:
icssg-prueth icssg1-eth end0: Invalid stats FW_RX_ERROR
A secondary consequence is that the int(-EINVAL) return value is
implicitly widened to a near-ULLONG_MAX unsigned value when accumulated
into the __u64 fields of rtnl_link_stats64, silently corrupting the
rx_errors, rx_dropped and tx_dropped counters reported by `ip -s link`.
Every other PA-aware code path in the driver is already guarded with
the same `if (emac->prueth->pa_stats)` check. Apply the same guard
here.
Fixes: 0d15a26b247d ("net: ti: icssg-prueth: Add ICSSG FW Stats")
Signed-off-by: Philippe Schenker <philippe.schenker@impulsing.ch>
Reviewed-by: Simon Horman <horms@kernel.org>
Cc: danishanwar@ti.com
Cc: rogerq@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260618093037.3448858-1-dev@pschenker.ch
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In ofdpa_port_fdb(), the hash_del() only unlinks the node from
hash table, but does not free it.
Fix this by adding kfree(found) after the !found == removing check,
where the pointer value is no longer needed.
Found by Coccinelle kfree script.
Cc: <stable+noautosel@kernel.org> # rocker is a test harness, it's never loaded on production systems
Signed-off-by: Ziran Zhang <zhangcoder@yeah.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260616013245.7098-1-zhangcoder@yeah.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 3c7bf5af21960 ("e1000e: Introduce private flag to disable K1")
disabled K1 by default on Meteor Lake and newer systems due to packet
loss observed on various platforms. However, disabling K1 caused an
increase in power consumption.
To mitigate this, reconfigure the PLL clock gate value so that K1 can
remain enabled without incurring the additional power consumption.
Re-enable K1 by default, but keep the private flag to support disabling
it via ethtool. Additionally, introduce a DMI quirk table, so that K1 may
be disabled by default on known problematic systems. Currently, this
includes the Dell Pro 16 Plus, where the issue has been reported to persist
despite the changes to the PLL lock timeout.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220954
Link: https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20250623/048860.html
Link: https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20260330/054059.html
Signed-off-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Co-developed-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Fixes: 3c7bf5af21960 ("e1000e: Introduce private flag to disable K1")
Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
i40e_debug() macro takes struct i40e_hw *h as first argument. But the
macro body uses hw instead of h. This has been working so far because hw
happens to be the name of the variable in the context where the macro is
expanded. Fix the macro to use the passed argument.
Fixes: 5dfd37c37a44 ("i40e: Split i40e_osdep.h")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Several error return paths in ice_dpll_init_info() directly return
without freeing previously allocated resources, causing memory leaks:
- When de->input_prio allocation fails, d->inputs is leaked
- When dp->input_prio allocation fails, d->inputs and de->input_prio
are leaked
- When ice_get_cgu_rclk_pin_info() fails, all previously allocated
inputs/outputs/input_prio are leaked
- When ice_dpll_init_pins_info(RCLK_INPUT) fails, same resources
are leaked
Fix this by jumping to the deinit_info label which properly calls
ice_dpll_deinit_info() to free all allocated resources.
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ice_dpll_deinit_info() calls kfree() on several pf->dplls fields
(inputs, outputs, eec.input_prio, pps.input_prio) but does not set
the pointers to NULL afterward. This leaves dangling pointers in the
pf->dplls structure.
While not currently exploitable through existing code paths, this is
unsafe because:
1. If ice_dpll_init_info() is called again after a deinit (e.g. during
driver recovery), and a subsequent allocation within init fails, the
error path will jump to deinit_info and call ice_dpll_deinit_info()
again. Since some pointers still hold the old freed addresses, this
would result in a double-free.
2. Any future code that checks these pointers before use or after free
would be unprotected against use-after-free.
Follow the common kernel convention of setting pointers to NULL after
kfree() so that:
- kfree(NULL) is a safe no-op, preventing double-free
- NULL checks on these pointers become meaningful
This is a preparatory fix for a subsequent patch that routes additional
error paths in ice_dpll_init_info() to the deinit_info label.
Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu")
Signed-off-by: ZhaoJinming <zhaojinming@uniontech.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
netif_keep_dst() only needs to be called once for the uplink VSI, not
once for each port representor. Move it from ice_eswitch_setup_repr()
to ice_eswitch_enable_switchdev().
Fixes: defd52455aee ("ice: do Tx through PF netdev in slow-path")
Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Patryk Holda <patryk.holda@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ice_init_link() can return an error status from ice_update_link_info()
or ice_init_phy_user_cfg(), causing probe to fail.
An incorrect NVM update procedure can result in link/PHY errors, and
the recommended resolution is to update the NVM using the correct
procedure. If the driver fails probe due to link errors, the user
cannot update the NVM to recover. The link/PHY errors logged are
non-fatal: they are already annotated as 'not a fatal error if this
fails'.
Since none of the errors inside ice_init_link() should prevent probe
from completing, convert it to void and remove the error check in the
caller. All failures are already logged; callers have no meaningful
recovery path for link init errors.
Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Cc: stable@vger.kernel.org
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix unreachable code: the conditionals in ice_set_pauseparam() used
the bitwise-AND operator suggesting aq_failures is a bitmap, but it
is actually an enum, making the third condition logically unreachable.
Replace the if-else ladder with a switch statement. Also move the
aq_failures initialization to the variable declaration and remove the
redundant zeroing from ice_set_fc().
Fixes: fcea6f3da546 ("ice: Add stats and ethtool support")
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Resetting all VFs causes resource leak on VFs with FDIR filters
enabled as CTRL VSIs are only invalidated and not freed. Fix by using
ice_vf_ctrl_vsi_release() instead of ice_vf_ctrl_invalidate_vsi() which
aligns behavior with the ice_reset_vf() function.
Reproduction:
echo 1 > /sys/class/net/$pf/device/sriov_numvfs
ethtool -N $vf flow-type ether proto 0x9000 action 0
echo 1 > /sys/class/net/$pf/device/reset
Fixes: da62c5ff9dcd ("ice: Add support for per VF ctrl VSI enabling")
Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and Thunderbolt driver updates from Greg KH:
"Here is the big set of USB and Thunderbolt driver changes for 7.2-rc1.
Lots of little stuff in here, major highlights include:
- USB4STREAM support for Thunderbolt devices. A new way to send "raw"
data very quickly over a USB4 connection to another system directly
- Other thunderbolt updates and changes to make the stream code work
- xhci driver updates and additions
- typec driver updates and additions
- usb gadget driver updates and fixes for reported issues
- zh_CN documentation translation of the USB documentation
- usb-serial driver updates
- dts cleanups for some USB platforms
- other minor USB driver updates and tweaks
All of these have been in linux-next for over a week with no reported
issues, most of them for many many weeks"
* tag 'usb-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (131 commits)
usb: ucsi: huawei_gaokun: support mode switching
thunderbolt: debugfs: Fix sideband write size check
thunderbolt: debugfs: Fix margining error counter buffer leak
usb: host: xhci-rcar: Split R-Car Gen2 and Gen3 .plat_start() handling
usb: host: xhci-rcar: Remove SET_XHCI_PLAT_PRIV_FOR_RCAR() macro
usb: xhci: allocate internal DCBAA mirror dynamically
usb: xhci: allocate DCBAA based on host controller max slots
usb: xhci: refactor DCBAA struct
xhci: Prevent queuing new commands if xhci is inaccessible
xhci: dbc: detect and recover hung DbC during enumeraton
xhci: dbc: add timestamps to DbC state changes in a new helper.
xhci: dbc: add helper to set and clear DbC DCE enable bit
xhci: dbc: serialize enabling and disabling dbc
xhci: dbc: Fix sysfs ABI Documentation for xhci dbc states
usb: xhci: Improve Soft Retries after short transfers
usb: xhci: Remove isochronous URB_SHORT_NOT_OK handling
usb: xhci: Remove skip_isoc_td()
usb: xhci: Simplify xhci_quiesce()
usb: xhci: remove legacy 'num_trbs_free' tracking
usb: xhci: fix typo in xhci_set_port_power() comment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
"Here is the big set of TTY and Serial driver updates for 7.2-rc1.
Overall we end up removing more code than added, due to an obsolete
synclink_gt driver being removed from the tree, always a nice thing to
see happen.
Other than that driver removal, major things included in here are:
- max310x serial driver updates and fixes
- 8250 driver updates and rework in places to make it more "modern"
- dts file updates
- serial driver core tweaks and updates
- vt code cleanups
- vc_screen crash fixes
- other minor driver updates and cleanups
All of these have been in linux-next for well over a week with no
reported issues"
* tag 'tty-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (49 commits)
serial: 8250_pci: Don't specify conflicting values to pci_device_id members
vc_screen: fix null-ptr-deref in vcs_notifier() during concurrent vcs_write
serial: qcom_geni: Fix RX DMA stall when SE_DMA_RX_LEN_IN is zero
vt: merge ucs_is_zero_width()/ucs_is_double_width() into ucs_get_width()
serial: 8250: fix possible ISR soft lockup
dt-bindings: serial: rs485: remove deprecated .txt binding stub
serial: qcom-geni: trace: Add tracepoint support for Qualcomm GENI serial
tty: serial: Use named initializers for arrays of i2c_device_data
serial: 8250_dw: remove clock-notifier infrastructure
serial: 8250_dw: unregister 8250 port if clk_notifier_register() fails
amba/serial: amba-pl011: Bring back zx29 UART support
serial: 8250: Add support for console flow control
serial: 8250: Check LSR timeout on console flow control
serial: 8250: Set cons_flow on port registration
tty: serial: 8250: protect against NULL uart->port.dev in register
arm64: dts: add support for A9 based Amlogic BY401
dt-bindings: arm: amlogic: add A311Y3 support
serial: max310x: fix compile errors if CONFIG_SPI_MASTER is disabled
serial: qcom-geni: Avoid probing debug console UART without console support
serial: max310x: add comments for PLL limits
...
|
|
The dpaa2-switch driver does not support VLAN uppers while its ports are
bridged. This scenario tried to be prevented by rejecting a bridge join
while VLAN uppers exist but the reverse order was still possible.
This patches adds a check so that the dpaa2-switch also does not accept
VLAN uppers while bridged.
Fixes: f48298d3fbfa ("staging: dpaa2-switch: move the driver out of staging")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260618092813.432535-2-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In airoha_dev_select_queue(), the expression:
queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES;
implicitly converts to unsigned arithmetic: when skb->priority is 0
(the default for unclassified traffic), (0u - 1u) wraps to UINT_MAX,
and UINT_MAX % 8 = 7, routing default best-effort packets to the
highest-priority QoS queue. This causes QoS inversion where the
majority of traffic on a PON gateway starves actual high-priority
flows (VoIP, gaming, etc.).
The "- 1" offset was a leftover from the ETS offload implementation
that has since been removed. The correct mapping is a direct modulo:
queue = skb->priority % AIROHA_NUM_QOS_QUEUES;
This maps priority 0 → queue 0 (lowest), priority 7 → queue 7
(highest), with higher priorities wrapping around. This is the
standard Linux sk_prio → HW queue mapping used by other drivers.
Fixes: 2b288b81560b ("net: airoha: Introduce ndo_select_queue callback")
Link: https://lore.kernel.org/netdev/178185573207.2378135.3729126358670287878@gmail.com/
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Wayen Yan <win847@gmail.com>
Link: https://patch.msgid.link/178194366700.2485734.5368768965976693502@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move devm_request_irq() after devm_platform_ioremap_resource() so that
dev->emacp is mapped before the interrupt handler can fire. An early
interrupt hitting emac_irq() would dereference the NULL dev->emacp and
crash.
Also remove redundant error message. devm_platform_ioremap_resource()
already returns an error message with dev_err_probe().
Fixes: dcc34ef7c834 ("net: ibm: emac: manage emac_irq with devm")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260618023405.415644-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2026-06-20
An overdue pull request for ieee802154, catching up on all the AI found issues
at last.
Shitalkumar Gandhi fixed problems in the ca8210 driver for cases where we could
have a leak or a pointer truncation.
Robertus Diawan Chris made sure we do not overwrite the return code when
associating.
Michael Bommarito worked on properly gating our netlink API use in the llsec
security context.
Ivan Abramov cleaned up the netns cases as he did in other subsystems.
Doruk Tan Ozturk ensures we have the correct skn ready in cryptoo operation (to
avoid a silent overwrite).
Aleksandr Nogikh fixed a kernel-infoleak detected by syzbot.
* tag 'ieee802154-for-net-next-2026-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan-next:
ieee802154: allow legacy LLSEC ADD/DEL ops to pass strict validation
ieee802154: admin-gate legacy LLSEC dump operations
mac802154: Prevent overwrite return code in mac802154_perform_association()
ieee802154: fix kernel-infoleak in dgram_recvmsg()
mac802154: llsec: add skb_cow_data() before in-place crypto
ieee802154: ca8210: fix pointer truncation in kfifo on 64-bit
ieee802154: ca8210: fix cas_ctl leak on spi_async failure
ieee802154: Remove WARN_ON() in cfg802154_pernet_exit()
ieee802154: Avoid calling WARN_ON() on -ENOMEM in cfg802154_switch_netns()
ieee802154: Restore initial state on failed device_rename() in cfg802154_switch_netns()
====================
Link: https://patch.msgid.link/20260620174903.1010671-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On PF shutdown, the current driver free mcs hardware
resources though mcs resources are not allocated to it.
This patch checks the mcs resources status and if resources
are allocated then only sends mailbox message to free them.
Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1781636420-19816-3-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When all MCS resources mapped to a PF are being freed then clear
stats of all those resources too.
Fixes: 815debbbf7b5 ("octeontx2-pf: mcs: Clear stats before freeing resource")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1781636420-19816-2-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Secy control stats counter doesn't exist for CNF10KB platform.
Skip reading this respective register for CNF10KB silicon while
fetching secy stats.
Fixes: 9312150af8da ("octeontx2-af: cn10k: mcs: Support for stats collection")
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://patch.msgid.link/1781636420-19816-1-git-send-email-sbhatta@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
npc_defrag_alloc_free_slots() always passed NPC_MCAM_KEY_X2 into
__npc_subbank_alloc(), which must match sb->key_type, so defrag never
allocated replacement slots on X4 banks. Pass the subbank key type for
bank 0, and only extend the search into bank 1 for X2 (X4 MCAM indices
are confined to b0b..b0t).
Fixes: 645c6e3c1999 ("octeontx2-af: npc: cn20k: virtual index support")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260617102149.1309913-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In mtk_ppe_init(), when accounting is enabled, the error paths for
dmam_alloc_coherent(mib) and devm_kzalloc(acct) failures return NULL
directly, bypassing the err_free_l2_flows label that destroys the
rhashtable initialized earlier.
While this leak only occurs during probe (not runtime) and the leaked
memory is minimal (an empty rhash table), fixing it ensures proper
error path cleanup consistency.
Fix by changing the two return NULL statements to goto err_free_l2_flows.
Fixes: 603ea5e7ffa7 ("net: ethernet: mtk_eth_soc: fix memory leak in error path")
Signed-off-by: Wayen Yan <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/178167550101.2217645.14579307712717502425@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
prestera_port_sfp_bind() returns err after walking the ports node. If no
child node matches the port's front-panel id, err is never assigned.
Initialize err to 0 because absence of a matching optional port device
tree node is not an error. In that case no phylink is created and port
creation should continue with port->phy_link left NULL. Errors from
malformed matched nodes and phylink_create() still propagate.
Fixes: 52323ef75414 ("net: marvell: prestera: add phylink support")
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Acked-by: Elad Nachman <enachman@marvell.com>
Link: https://patch.msgid.link/20260617193228.1653582-1-ruoyuw560@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ca8210_test_int_driver_write() and ca8210_test_int_user_read() exchange
a kmalloc'd buffer pointer through a struct kfifo, but pass a literal
'4' as the byte count to kfifo_in()/kfifo_out().
This is correct on 32-bit (pointer = 4 bytes), but on 64-bit only the
low 4 bytes of the 8-byte pointer are written into the FIFO. The reader
then reads back 4 bytes into an 8-byte local pointer variable, leaving
the upper 4 bytes uninitialized stack data. The first dereference of
the reconstructed pointer (fifo_buffer[1]) accesses an arbitrary kernel
address and generally results in an oops.
Use sizeof(fifo_buffer) so the byte count matches pointer width on every
architecture.
The driver has no architecture restriction in Kconfig, so any 64-bit
build with CONFIG_IEEE802154_CA8210_DEBUGFS=y is exposed. Issue has
been latent since the driver was added in 2017 because it is most
commonly deployed on 32-bit MCUs.
Found via a custom Coccinelle semantic patch hunting for short-byte
kfifo I/O on byte-mode kfifos used to shuttle pointers.
Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/20260520105750.30144-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
ca8210_spi_transfer() allocates cas_ctl with kzalloc_obj(GFP_ATOMIC)
and relies entirely on the SPI completion callback
ca8210_spi_transfer_complete() to free it.
The spi_async() API only invokes the completion callback on successful
submission. On failure it returns a negative error code without ever
queuing the callback, which leaves cas_ctl and its embedded spi_message
and spi_transfer orphaned. Every kfree(cas_ctl) in the driver is
inside the completion callback, so there is no other reclamation path.
ca8210_spi_transfer() is called from ca8210_spi_exchange(), the
interrupt handler ca8210_interrupt_handler(), and from the retry path
inside the completion callback itself. The exchange and interrupt
handler paths loop on -EBUSY, so under sustained SPI bus contention
every retry iteration leaks a fresh cas_ctl (~600 bytes per
occurrence).
Fix it by freeing cas_ctl on the spi_async() error path. While here,
correct the misleading error string: the function calls spi_async(),
not spi_sync().
Fixes: ded845a781a5 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
Cc: stable@vger.kernel.org
Signed-off-by: Shitalkumar Gandhi <shitalkumar.gandhi@cambiumnetworks.com>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/20260421073259.2259783-1-shitalkumar.gandhi@cambiumnetworks.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
|
|
When __fbnic_set_rx_mode() is called from contexts other than
.ndo_set_rx_mode_async(), the uc and mc addr lists are accessed
without the addr lock that __hw_addr_sync_dev() and
__hw_addr_unsync_dev() require. Wrap these unprotected accesses with
netif_addr_lock_bh(). fbnic_clear_rx_mode() has similar issues.
Fixes: eb690ef8d1c2 ("eth: fbnic: Add L2 address programming")
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260617-linux-fbnic-hwaddr-v1-1-3f9f5dee7f99@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Like commit a24162f18825("i40e: don't advertise IFF_SUPP_NOFCS"),
ngbe and txgbe also advertises IFF_SUPP_NOFCS and allowing users
to use the SO_NOFCS socket option. But the driver does not check
skb->no_fcs, so this option is silently ignored.
With this change, send() fails with -EPROTONOSUPPORT when AF_PACKET
socket is set SO_NOFCS option.
Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
Link: https://patch.msgid.link/20260617092854.133992-1-clementwei90@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
create_queues_with_size_backoff() creates XDP TX queues before setting
up the regular TX path. If the subsequent allocation or creation of
regular TX queues fails, the error handling paths omit the teardown of the
XDP TX queues, leading to a resource leak.
Fix this by explicitly destroying the XDP TX queue subset at the two
missing failure points.
The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available. Manual inspection confirms that the bug is still
present in v7.1-rc7.
An x86_64 allyesconfig build showed no new warnings. As we do not have
an ENA device to test with, no runtime testing was able to be performed.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action")
Cc: stable@vger.kernel.org
Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
Reviewed-by: Arthur Kiyanovski <akiyano@amazon.com>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Link: https://patch.msgid.link/20260616142424.4005130-1-dawei.feng@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The DQO RX datapath programs a per-buffer-queue-descriptor
header_buf_addr at post time and reads the split header back at
completion time. Both the post and the read currently index the
header buffer by queue position rather than by the buffer's identity:
- post (gve_rx_post_buffers_dqo): header_buf_addr is computed from
bufq->tail
- read (gve_rx_dqo): the header is read from desc_idx (the completion
queue head index)
This relies on the buffer-queue index and the completion-queue index
being equal for the start of every packet, i.e. on the device consuming
posted buffers and returning completions in the exact same order. That
assumption does not hold once HW-GRO is enabled with multiple
flows: coalesced segments are accepted and completed in an order that
may differ from the order buffers were posted, and segments from
different flows may interleave.
That results in two problems:
1. Wrong header slot on read. Because the read offset is derived from
the completion index (desc_idx) while the device wrote the header to
the address programmed for the buffer's buf_id, the driver can copy
a header belonging to a different packet. This shows up as
throughput drop (about 30% drop and large numbers of TCP
retransmissions) with header-split and HW-GRO both enabled and many
streams.
2. Header buffer reused while still owned by the device. The driver
advances bufq->head by one per completion and re-posts buffers based
on that. Arrival of N RX completions only guarantees that at least N
RX buffer descriptors have been read by the device. It does not
guarantee that the device has relinquished the ownership of all the
buffers corresponding to those N descriptors. With out-of-order
completions (e.g. the completion for a packet copied into buffer N
arrives before the completion for a packet copied into buffer N-1),
the driver can re-post and overwrite a header buffer that the device
is still going to write into, corrupting the header of a packet
whose completion has not yet been processed.
Fix both issues by indexing the header buffer by buf_id on both the post
and read paths. Reading from buf_id's slot is therefore always correct
regardless of completion ordering (fixes problem 1).
Indexing by buf_id also ties each header slot to the lifetime of its
buffer state. A buffer state is only returned to the free/recycle lists
when its own completion (buf_id) is processed, so its header slot can
only be re-posted after the device is done with it. This makes header
slot reuse safe under out-of-order completions (fixes problem 2).
Allocate (gve_rx_alloc_hdr_bufs) and free (gve_rx_free_hdr_bufs) the
header buffers based on num_buf_states to match the buf_id indexing.
Cc: stable@vger.kernel.org
Fixes: 5e37d8254e7f ("gve: Add header split data path")
Signed-off-by: Ankit Garg <nktgrg@google.com>
Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Joshua Washington <joshwash@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260617013208.3781453-1-joshwash@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tbnet_poll() assembles a multi-frame ThunderboltIP packet into one skb. The
first frame goes into the skb linear area and every further frame is added as
a page fragment.
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
page, hdr_size, frame_size,
TBNET_RX_PAGE_SIZE - hdr_size);
A packet of frame_count frames therefore ends up with frame_count - 1
fragments. tbnet_check_frame() only bounds the peer supplied frame_count to
TBNET_RING_SIZE / 4 (64), which is far above MAX_SKB_FRAGS (17 by default). A
peer that sends a packet of 19 or more small frames pushes nr_frags past
MAX_SKB_FRAGS, so skb_add_rx_frag() writes past skb_shinfo()->frags[] and
corrupts memory after the shared info.
Tighten the start of packet bound to MAX_SKB_FRAGS + 1 so a packet can never
produce more fragments than frags[] can hold. This matches the recent skb
frags overflow fixes in other receive paths, for example f0813bcd2d9d ("net:
wwan: t7xx: fix potential skb->frags overflow in RX path") and 600dc40554dc
("net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete()").
Fixes: e69b6c02b4c3 ("net: Add support for networking over Thunderbolt cable")
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Link: https://patch.msgid.link/178163152194.2486768.14724194232649760778@maoyixie.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
nt->buf is exactly MAX_PRINT_CHUNK bytes, but scnprintf() reserves one
byte for its NUL terminator, so a non-fragmented payload of exactly
MAX_PRINT_CHUNK loses its last byte (emitted as a stray NUL in the
release path). Grow nt->buf to MAX_PRINT_CHUNK + 1 and bound the
scnprintf() calls with sizeof(nt->buf); the transmitted length stays
capped at MAX_PRINT_CHUNK.
Alternatively, nt->buf could be left at MAX_PRINT_CHUNK and the NUL byte
reserved by routing exactly-MAX_PRINT_CHUNK payloads to fragmentation
('len < MAX_PRINT_CHUNK'), at the cost of fragmenting those messages.
But it would look less sane, thus the current approach.
Fixes: c62c0a17f9b7 ("netconsole: Append kernel version to message")
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260616-max_print_chunk-v1-1-8dc125d67083@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
the TX queue.
While the exact root cause is not yet fully understood, it is likely
related to a hardware issue where a TSTART write to the NCR register
is missed, preventing the transmission from being kicked off.
Implement a timeout callback to handle TX queue stalls, triggering the
existing restart mechanism to recover.
Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
Fixes: dc110d1b23564 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/468f480454a314303bac6a54780b153f689f2267.1781598350.git.andrea.porta@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
foe_check_time is declared as u16 pointer but was allocated with
only ppe_num_entries bytes instead of ppe_num_entries * sizeof(u16).
When airoha_ppe_foe_verify_entry() is called with hash >= ppe_num_entries/2,
it writes beyond the allocated buffer, causing heap buffer overflow and
potential kernel crash.
Fixes: 6d5b601d52a2 ("net: airoha: ppe: Dynamically allocate foe_check_time array in airoha_ppe struct")
Signed-off-by: Wayen Yan <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/178161119471.2163752.14373384830691569758@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
rvu_mbox_handler_lmtst_tbl_setup() uses req->base_pcifunc as a direct
index into the LMT map table to read another function's LMTLINE
physical base address and copy it into the caller's own LMT map table
entry. The mailbox dispatcher authenticates req->hdr.pcifunc from the
IRQ source, but req->base_pcifunc is a separate payload field and is
not sanitized.
Reject the request with -EPERM when a VF caller's base_pcifunc is not a
valid function under its own PF. is_pf_func_valid() bounds the FUNC field
to the PF's configured VF count, keeping the computed index inside the
caller's own slot block.
Fixes: 893ae97214c3 ("octeontx2-af: cn10k: Support configurable LMTST regions")
Reported-by: Yuhao Jiang <danisjiang@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
Link: https://patch.msgid.link/SYBPR01MB78811656934E713B77DA6CEDAFE62@SYBPR01MB7881.ausprd01.prod.outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
pch_gbe_alloc_tx_buffers() allocates an skb for each TX descriptor and
then passes the returned pointer to skb_reserve(). If netdev_alloc_skb()
fails, skb_reserve() dereferences NULL.
Make pch_gbe_alloc_tx_buffers() return an error when an skb allocation
fails. On failure, let pch_gbe_alloc_tx_buffers() clean the partially
allocated TX ring before returning the error. While bringing the device
up, release the RX buffer pool through a shared cleanup helper before
unwinding the IRQ setup.
Cc: stable+noautosel@kernel.org # untested fix to unlikely error path
Fixes: 77555ee72282 ("net: Add Gigabit Ethernet driver of Topcliff PCH")
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260615125043.3537046-1-ruoyuw560@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The blamed commit refactored the prechangeupper event handling but
failed to actually return an error in case
dpaa2_switch_prevent_bridging_with_8021q_upper() detected a 802.1q upper
on a port which tries to join a bridge. Fix this by returning err
instead of 0.
Fixes: 45035febc495 ("net: dpaa2-switch: refactor prechangeupper sanity checks")
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260616105430.3725910-1-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
receive_big() bounds the device-announced length by
(big_packets_num_skbfrags + 1) * PAGE_SIZE. That is still too loose:
add_recvbuf_big() sets sg[1] to start at offset
sizeof(struct padded_vnet_hdr) into the first page, so the chain
actually carries hdr_len + (PAGE_SIZE - sizeof(padded_vnet_hdr)) +
big_packets_num_skbfrags * PAGE_SIZE bytes -- 20 bytes less than the
check allows for the common hdr_len == 12 case.
A malicious virtio backend can announce a len in that gap. page_to_skb()
then walks one frag past the page chain, storing a NULL page->private
into skb_shinfo()->frags[MAX_SKB_FRAGS], which is both an out-of-bounds
write past the static frag array and a NULL frag handed up the rx path.
Bound len by the size add_recvbuf_big() actually advertised.
Fixes: 0c716703965f ("virtio-net: fix received length check in big packets")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Signed-off-by: Xiang Mei <xmei5@asu.edu>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Bui Quang Minh <minhquangbui99@gmail.com>
Link: https://patch.msgid.link/20260616042837.2249468-1-xmei5@asu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
mlx5_query_mtppse() reads the Event Trigger Pin (MTPPSE) register but
reads the returned arm and mode values from the input buffer 'in'
instead of the output buffer 'out', so it always returns the values
that were written rather than the actual hardware state, making the
query useless.
The function has no in-tree callers. Remove it rather than fix it.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260615140406.1828-1-lirongqing@baidu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ehea_create_device_sysfs() creates probe_port and then remove_port. If
the second device_create_file() fails, the helper returns the error but
leaves probe_port installed even though probe treats the sysfs setup as
failed.
Remove probe_port on the remove_port creation failure path so the helper
leaves no partial sysfs state behind.
Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260615070033.43461-1-pengpeng@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
npc_install_mcam_drop_rule() used dev_err() after a successful
rvu_mbox_handler_npc_mcam_write_entry() call, so normal installs appeared
as errors in dmesg. Use dev_dbg() for the success path and keep dev_err()
for real failures.
Fixes: 3571fe07a090 ("octeontx2-af: Drop rules for NPC MCAM")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260615033157.535237-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The send-queue timestamp ring is allocated with qmem_alloc() when
timestamping is used, but otx2_free_sq_res() never freed sq->timestamps,
leaking that memory across ifdown and device removal. Add the missing
qmem_free() alongside the other SQ companion buffers.
Fixes: c9c12d339d93 ("octeontx2-pf: Add support for PTP clock")
Cc: Aleksey Makarov <amakarov@marvell.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260615030704.504536-1-rkannoth@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Everything configured in phylink_config it's assumed to be set before
calling phylink_create() to permit correct parsing of all the different
modes and capabilities.
Commit 51cf06ddafc9 ("net: ethernet: mtk_eth_soc: add support for MT7988
internal 2.5G PHY") while introducing support for 2.5G phy for MT7988,
probably due to an auto-rebase, placed the configuration of the INTERNAL
interface mode for the supported_interfaces for phylink_config right after
phylink_create() introducing a possible problem with supported interfaces
parsing.
While this doesn't currently create any problem/bug, move setting this bit
before phylink_create() to prevent any possible regression in future code
change in phylink core.
Fixes: 51cf06ddafc9 ("net: ethernet: mtk_eth_soc: add support for MT7988 internal 2.5G PHY")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/20260615151106.15438-1-ansuelsmth@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Set User Byte to Save command has three subject bytes.
The PD692x0 protocol guides defines SUB2 with value 0x4e, while SUB1
carries the NVM user byte.
Template only initialized SUB and SUB1.
Fill SUB2 explicitly so the command matches the documented layout.
Signed-off-by: Robert Marko <robert.marko@sartura.hr>
Acked-by: Kory Maincent <kory.maincent@bootlin.com>
Link: https://patch.msgid.link/20260611102517.445549-1-robert.marko@sartura.hr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull virtio updates from Michael Tsirkin:
- new virtio CAN driver
- support for LoongArch architecture in fw_cfg
- support for firmware notifications in vdpa/octeon_ep
- support for VFs in virtio core
- fixes, cleanups all over the place, notably:
- vhost: fix vhost_get_avail_idx for a non empty ring
fixing an significant old perf regression
- READ_ONCE() annotations mean virtio ring is now
free of KCSAN warnings
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (37 commits)
can: virtio: Fix comment in UAPI header
can: virtio: Add virtio CAN driver
virtio: add num_vf callback to virtio_bus
fw_cfg: Add support for LoongArch architecture
vdpa/octeon_ep: fix IRQ-to-ring mapping in interrupt handler
vdpa/octeon_ep: Add vDPA device event handling for firmware notifications
vdpa/octeon_ep: Use 4 bytes for mailbox signature
vdpa/octeon_ep: Fix PF->VF mailbox data address calculation
vhost_task_create: kill unnecessary .exit_signal initialization
vhost: remove unnecessary module_init/exit functions
vdpa/mlx5: Use kvzalloc_flex() for MTT command memory
vdpa_sim_net: switch to dynamic root device
vdpa_sim_blk: switch to dynamic root device
virtio-mem: Destroy mutex before freeing virtio_mem
virtio-balloon: Destroy mutex before freeing virtio_balloon
tools/virtio: fix build for kmalloc_obj API and missing stubs
virtio_ring: Add READ_ONCE annotations for device-writable fields
vduse: fix compat handling for VDUSE_IOTLB_GET_FD/VDUSE_VQ_GET_INFO
tools/virtio: check mmap return value in vringh_test
vhost/net: complete zerocopy ubufs only once
...
|
|
Pull workqueue updates from Tejun Heo:
- Continued progress toward making alloc_workqueue() unbound by
default: more callers converted to WQ_PERCPU / system_percpu_wq /
system_dfl_wq, and new warnings for queues that use neither WQ_PERCPU
nor WQ_UNBOUND or the legacy system_wq / system_unbound_wq.
- Misc: drop the now-trivial apply_wqattrs_lock()/unlock() wrappers,
forbid the TEST_WORKQUEUE benchmark from being built-in, and fix a
spurious pointer level in the worker debug-dump path.
* tag 'wq-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
drm/bridge: anx7625: Add WQ_PERCPU add to alloc_workqueue
wifi: ath6kl: fix invalid workqueue flags in ath6kl_usb_create()
btrfs: Drop WQ_PERCPU from ordered_flags in btrfs_init_workqueues()
workqueue: Add warnings and ensure one among WQ_PERCPU or WQ_UNBOUND is present
workqueue: Add warnings and fallback if system_{unbound}_wq is used
workqueue: drop spurious '*' from print_worker_info() fn declaration
workqueue: forbid TEST_WORKQUEUE from being built-in
workqueue: drop apply_wqattrs_lock()/unlock() wrappers
umh: replace use of system_unbound_wq with system_dfl_wq
rapidio: rio: add WQ_PERCPU to alloc_workqueue users
media: ddbridge: add WQ_PERCPU to alloc_workqueue users
platform: cznic: turris-omnia-mcu: replace use of system_wq with system_percpu_wq
media: synopsys: hdmirx: replace use of system_unbound_wq with system_dfl_wq
virt: acrn: Add WQ_PERCPU to alloc_workqueue users
|
|
Pull bitmap updates from Yury Norov:
"This includes the new FIELD_GET_SIGNED() helper,
bitmap_print_to_pagebuf() removal, RISCV/bitrev support, and a couple
cleanups.
- new handy helper FIELD_GET_SIGNED() (Yury)
- arch test_and_set_bit_lock() and clear_bit_unlock() cleanup (Randy)
- __bf_shf() simplification (Yury)
- bitmap_print_to_pagebuf() removal (Yury)
- RISCV/bitrev conditional support (Jindie, Yury)"
* tag 'bitmap-for-7.2' of https://github.com/norov/linux:
MAINTAINERS: BITOPS: include bitrev.[ch]
arch/riscv: Add bitrev.h file to support rev8 and brev8
bitops: Define generic___bitrev8/16/32 for reuse
lib/bitrev: Introduce GENERIC_BITREVERSE
arch: select HAVE_ARCH_BITREVERSE conditionally on BITREVERSE
bitmap: fix find helper documentation
bitmap: drop bitmap_print_to_pagebuf()
cpumask: switch cpumap_print_to_pagebuf() to using scnprintf()
bitfield: wire __bf_shf to __builtin_ctzll
bitops: use common function parameter names
ptp: switch to using FIELD_GET_SIGNED()
rtc: rv3032: switch to using FIELD_GET_SIGNED()
wifi: rtw89: switch to using FIELD_GET_SIGNED()
iio: mcp9600: switch to using FIELD_GET_SIGNED()
iio: pressure: bmp280: switch to using FIELD_GET_SIGNED()
iio: magnetometer: yas530: switch to using FIELD_GET_SIGNED()
iio: intel_dc_ti_adc: switch to using FIELD_GET_SIGNED()
x86/extable: switch to using FIELD_GET_SIGNED()
bitfield: add FIELD_GET_SIGNED()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"Major changes:
- Recover from BPF arena page faults using a scratch page and add
ptep_try_set() for lockless empty-slot installs on x86 and arm64.
This allows BPF kfuncs to access arena pointers directly.
The 'arena_direct_access' stable branch was created for this work
and was pulled into sched-ext and bpf-next trees (Tejun Heo, Kumar
Kartikeya Dwivedi)
- Lift old restriction and support 6+ arguments in BPF programs and
kfuncs on x86 and arm64 (Yonghong Song, Puranjay Mohan)
Other features and fixes:
- Add 24-bit BTF vlen and reclaim unused bits in the BTF UAPI to ease
addition of new BTF kinds (Alan Maguire)
- Raise the maximum BPF call chain depth from 8 to 16 frames (Alexei
Starovoitov)
- Refactor object relationship tracking in the verifier and fix a
dynptr use-after-free bug (Amery Hung)
- Harden the signed program loader and reject exclusive maps as inner
maps (Daniel Borkmann)
- Replace the verifier min/max bounds fields with a circular number
(cnum) representation and improve 32->64 bit range refinements
(Eduard Zingerman)
- Introduce the arena library and runtime (libarena) with a buddy
allocator, rbtree and SPMC queue data structures, ASAN support and
a parallel test harness. Allow subprograms to return arena pointers
and switch to a BTF type-tag based __arena annotation (Emil
Tsalapatis)
- Cache build IDs in the sleepable stackmap path and avoid faultable
build ID reads under mm locks (Ihor Solodrai)
- Introduce the tracing_multi link to attach a single BPF program to
many kernel functions at once. Allow specifying the uprobe_multi
target via FD (Jiri Olsa)
- Extend the bpf_list family of kfuncs with bpf_list_add/del(), and
bpf_list_is_first/is_last/empty() (Kaitao Cheng)
- Extend the BPF syscall with common attributes support for
prog_load, btf_load and map_create (Leon Hwang)
- Wrap rhashtable as BPF map (Mykyta Yatsenko, Herbert Xu)
- Add sleepable support for tracepoint programs and fix deadlocks in
LRU map due to NMI reentry (Mykyta Yatsenko)
- Fix OOB access in bpf_flow_keys, fix nullness analysis of inner
arrays, enforce write checks for global subprograms (Nuoqi Gui)
- Report the maximum combined stack depth and print a breakdown of
instructions processed per subprogram (Paul Chaignon)
- Add an XDP load-balancer benchmark and arm64 JIT support for stack
arguments (Puranjay Mohan)
- Add kfuncs to traverse over wakeup_sources (Samuel Wu)
- Allow sleepable BPF programs to use LPM trie maps directly (Vlad
Poenaru)
- Many more fixes and cleanups across the verifier, BTF, sockmap,
devmap, bpffs, security hooks, s390/riscv/loongarch JITs,
rqspinlock, libbpf, bpftool, selftests"
* tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (336 commits)
selftests/bpf: Work around llvm stack overflow in crypto progs
selftests/bpf: add test for bpf_msg_pop_data() overflow
bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
sockmap: Fix use-after-free in udp_bpf_recvmsg()
bpf, sockmap: keep sk_msg copy state in sync
bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
selftsets/bpf: Retry map update on helper_fill_hashmap()
selftests/bpf: Add test for sleepable lsm_cgroup rejection
selftests/bpf: Add test to verify the fix for bpf_setsockopt() helper
bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket
selftests/bpf: Avoid static LLVM linking for cross builds
selftests/bpf: Use common CFLAGS for urandom_read
selftests/bpf: Initialize operation name before use
tools/bpf: build: Append extra cflags
libbpf: Initialize CFLAGS before including Makefile.include
bpftool: Append extra host flags
bpftool: Avoid adding EXTRA_CFLAGS to HOST_CFLAGS
bpftool: Pass host flags to bootstrap libbpf
selftests/bpf: correct CONFIG_PPC64 macro name in comment
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Work on removing rtnl_lock protection throughout the stack
continues. In this chapter:
- don't use rtnl_lock for IPv6 multicast routing configuration
- don't take rtnl_lock in ethtool for modern drivers
- prepare Qdisc dump callbacks for rtnl_lock removal
- Support dumping just ifindex + name of all interfaces, under RCU.
It's a common operation for Netlink CLI tools (when translating
names to ifindexes) and previously required full rtnl_lock.
- Support dumping qdiscs and page pools for a specific netdev. Even
tho user space wants a dump of all netdevs, most of the time, the
OOO programming model results in repeating the dump for each
netdev. Which, in absence of a cache, leads to a O(n^2) behavior.
- Flush nexthops once on multi-nexthop removal (e.g. when device goes
down), another O(n^2) -> O(n) improvement.
- Rehash locally generated traffic to a different nexthop on
retransmit timeout.
- Honor oif when choosing nexthop for locally generated IPv6 traffic.
- Convert TCP Auth Option to crypto library, and drop non-RFC algos.
- Increase subflow limits in MPTCP to 64 and endpoint limit to 256.
- Support MPTCP signaling of IPv6 address + port (ADD_ADDR). We need
to selectively skip reporting of the standard TCP Timestamp option,
because they won't fit into the header space together (12 + 30 >
40).
- Support using bridge neighbor suppression, Duplicate Address
Detection, Gratuitous ARP and unsolicited NA forwarding - in EVPN
deployments, e.g. VXLAN fabrics (IPv4 and IPv6).
- Improve link state reporting for upper netdevs (e.g. macvlan) over
tunnel devices (again, mostly for EVPN deployments).
- Support binding GENEVE tunnels to a local address.
- Speed up UDP tunnel destruction (remove one synchronize_rcu()).
- Support exponential field encoding in multicast (IGMPv3 and MLDv2).
- Support attaching PSP crypto offload to containers (veth, netkit).
- Add a new IPSec Netlink message XFRM_MSG_MIGRATE_STATE that allows
migrating individual IPsec SAs independently of their policies.
The existing XFRM_MSG_MIGRATE is tightly coupled to policy+SA
migration, lacks SPI for unique SA identification, and cannot
express reqid changes or migrate Transport mode selectors.
The new interface identifies the SA via SPI and mark, supports
reqid changes, address family changes, encap removal, and uses an
atomic create+install flow under x->lock to prevent SN/IV reuse
during AEAD SA migration.
- Implement GRO/GSO support for PPPoE.
- Convert sockopt callbacks in a number of protocols to iov_iter.
Cross-tree stuff:
- Remove support for Crypto TFM cloning (unblocked after the TCP Auth
Option rework). This feature regressed performance for all crypto
API users, since it changed crypto transformation objects into
reference-counted objects.
- Add FCrypt-PCBC implementation to rxrpc and remove it from the
global crypto API as obsolete and insecure.
Wireless:
- Major rework of station bandwidth handling, fixing issues with
lower capability than AP.
- Cleanups for EMLSR spec issues (drafts differed).
- More Neighbor Awareness Networking (Wi-Fi Aware) work (multicast,
schedule improvements, multi-station etc.)
- Some Ultra High Reliability (UHR) / IEEE 802.11bn (D1.4) work
(e.g. non-primary channel access, UHR DBE support).
- Fine Timing Measurement ranging (i.e. distance measurement) APIs.
Netfilter:
- Use per-rule hash initval in nf_conncount. This avoids unnecessary
lock contention with short keys (e.g. conntrack zones) in different
namespaces.
- Various safety improvements, both in packet parsing and object
lifetimes. Notably add refcounts to conntrack timeout policy.
Deletions:
- Remove TLS + sockmap integration. TLS wants to pin user pages to
avoid a copy, and sockmap wants to write to the input stream. More
work on this integration is clearly needed, and we can't find any
users (original author admitted that they never deployed it).
- Remove support for TLS offload with TCP Offload Engine (the far
more common opportunistic offload is retained). The locking looks
unfixable (driver sleeps under TCP spin locks) and people from the
vendor that added this are AWOL.
- Remove more ATM code, trying to leave behind only what PPPoATM
needs, AAL5 and br2684 with permanent circuits.
- Remove AppleTalk. Let it join hamradio in our out of tree protocol
graveyard, I mean, repository.
- Disable 32-bit x_tables compatibility (32bit binaries on 64bit
kernel) interface in user namespaces. To be deleted completely,
soon.
- Remove 5/10 MHz support from cfg80211/mac80211.
Drivers:
- Software:
- Support DEVMEM/DMABUF Tx over NETMEM_TX_NO_DMA devices (netkit)
- bonding: add knob to strictly follow 802.3ad for link state
- New drivers:
- Alibaba Elastic Ethernet Adaptor (cloud vNIC).
- NXP NETC switch within i.MX94.
- DPLL:
- Add operational state to pins (implement in zl3073x).
- Add generic DPLL type, for daisy-chaining DPLLs (implement in ice).
- Ethernet high-speed NICs:
- Huawei (hinic3):
- enhance tc flow offload support with queue selection,
tunnels
- nVidia/Mellanox:
- avoid over-copying payload to the skb's linear part (up to
60% win for LRO on slow CPUs like ARM64 V2)
- expose more per-queue stats over the standard API
- support additional, unprivileged PFs in the DPU
configuration
- support Socket Direct (multi-PF) with switchdev offloads
- add a pool / frag allocator for DMA mapped buffers for
control objects, save memory on systems with 64kB page size
- take advantage of the ability to dynamically change RSS
table size, even when table is configured by the user
- increase the max RSS table size for even traffic
distribution
- Ethernet NICs:
- Marvell/Aquantia:
- AQC113 PTP support
- Realtek USB (r8152):
- support 10Gbit Link Speeds and Energy-Efficient Ethernet
(EEE)
- support firmware loaded (for RTL8157/RTL8159)
- support for the RTL8159
- Intel (ixgbe):
- support Energy-Efficient Ethernet (EEE) on E610 devices
- Ethernet switches:
- Airoha:
- support multiple netdevs on a single GDM block / port
- Marvell (mv88e6xxx):
- support SERDES of mv88e6321
- Microchip (ksz8/9):
- rework the driver callbacks to remove one indirection layer
- Motorcomm (yt921x):
- support port rate policing
- support TBF qdisc offload
- support ACL/flower offload
- nVidia/Mellanox:
- expose per-PG rx_discards
- Realtek:
- rtl8365mb: bridge offloading and VLAN support
- Ethernet PHYs:
- Airoha:
- support Airoha AN8801R Gigabit PHYs.
- Micrel:
- implement 3 low-loss cable tunables
- Realtek:
- support MDI swapping for RTL8226-CG
- support MDIO for RTL931x
- Qualcomm:
- at803x: Rx and Tx clock management for IPQ5018 PHY
- Motorcomm:
- support YT8522 100M RMII PHY
- set drive strength in YT8531s RGMII
- TI:
- dp83822: add optional external PHY clock
- Bluetooth:
- hci_sync: add support for HCI_LE_Set_Host_Feature [v2]
- SMP: use AES-CMAC library API
- Intel:
- support Product level reset
- support smart trigger dump
- Mediatek:
- add event filter to filter specific event
- Realtek:
- fix RTL8761B/BU broken LE extended scan
- WiFi:
- Broadcom (b43):
- new support for a 11n device
- MediaTek (mt76):
- support mt7927
- mt792x: broken usb transport detection
- mt7921: regulatory improvements
- Qualcomm (ath9k):
- GPIO interface improvements
- Qualcomm (ath12k):
- WDS support
- replace dynamic memory allocation in WMI Rx path
- thermal throttling/cooling device support
- 6 GHz incumbent interference detection
- channel 177 in 5 GHz
- Realtek (rt89):
- RTL8922AU support
- USB 3 mode switch for performance
- better monitor radiotap support
- RTL8922DE preparations"
* tag 'net-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1778 commits)
ipv4: fib_rule: Move fib4_rules_exit() to ->exit().
net: serialize netif_running() check in enqueue_to_backlog()
net: skmsg: preserve sg.copy across SG transforms
appletalk: move the protocol out of tree
appletalk: stop storing per-interface state in struct net_device
selftests/bpf: test that TLS crypto is rejected on a sockmap socket
selftests/bpf: drop the unused kTLS program from test_sockmap
selftests/bpf: remove sockmap + ktls tests
tls: remove dead sockmap (psock) handling from the SW path
tls: reject the combination of TLS and sockmap
atm: remove orphaned uAPI for deleted drivers, protocols and SVCs
atm: remove unused ATM PHY operations
atm: remove the unused pre_send and send_bh device operations
atm: remove the unused change_qos device operation
atm: remove SVC socket support and the signaling daemon interface
atm: remove the local ATM (NSAP) address registry
atm: remove dead SONET PHY ioctls
atm: remove the unused send_oam / push_oam callbacks
atm: remove AAL3/4 transport support
net: dsa: sja1105: fix lastused timestamp in flower stats
...
|
|
Merge in late fixes in preparation for the net-next PR.
Conflicts:
net/tls/tls_sw.c
406e8a651a7b ("net: skmsg: preserve sg.copy across SG transforms")
79511603a65b ("tls: remove dead sockmap (psock) handling from the SW path")
drivers/net/ethernet/microsoft/mana/mana_en.c
f8fd56977eeea ("net: mana: guard TX wq object destroy with INVALID_MANA_HANDLE check")
d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size")
https://lore.kernel.org/ajAPXu-C_PuTgV-a@sirena.org.uk
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
flow_stats_update() takes an absolute timestamp for lastused, not delta.
Fix that.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260614141320.1133321-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The number of times that link has gone down at the port level is tracked
by the firmware and sent to the driver via regular DMA writes to an
instance of struct ionic_port_status in the driver's memory.
This statistic was never reported in favor of a driver-derived stat, but
doing it in the driver was never necessary since firmware had been
reporting it the whole time. Since it would be more accurate and true to
the description of the statistic to get this count at the PHY level,
replace the driver-calculated statistic with one derived from the
firmware one and remove the driver-calculated one entirely.
The stat reported by the ethtool .get_link_ext_stats() handler is
normalized to 0 on driver load and any device resets that require the
driver to rebuild state while also handling overflows.
Signed-off-by: Eric Joyner <eric.joyner@amd.com>
Link: https://patch.msgid.link/20260614205303.48088-5-eric.joyner@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This stat contains the number of total bits that the PHY has received;
it's useful for BER calculations. Add it to the ethtool stats output.
However, since this is one of the new "extra port stats", it's reported
in a different manner than the existing port stats and only
conditionally added to the ethtool stats output list: both the
DEV_CAP_EXTRA_STATS capability must be supported by the firmware, and
the firmware must set the value of the statistic to something other than
IONIC_STAT_INVALID.
To help support this scheme, the extra port stats region is initialized to
0xff's/IONIC_STAT_INVALID by the driver, to ensure the statistics that
the driver knows about but the firmware does not are still invalid
to the driver.
Signed-off-by: Eric Joyner <eric.joyner@amd.com>
Link: https://patch.msgid.link/20260614205303.48088-4-eric.joyner@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a new structure to report additional statistics from the firmware to
struct ionic_port_info. This new struct currently only contains FEC
related statistics, but any new port-level statistics collected by the
firmware would go into it.
The new structure is located in the same area as the unused
ionic_port_pb_stats structure, so this patch also removes that and its
supporting enumerations since they was never used in this driver.
Finally, to indicate firmware support for the new structure, introduce a
new device capability that the driver can use to see if the attached
device supports reporting these extra stats.
Signed-off-by: Eric Joyner <eric.joyner@amd.com>
Link: https://patch.msgid.link/20260614205303.48088-3-eric.joyner@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current check will fail if SR-IOV is not initialized for the
physical function; this is because is_physfn is 0 if sriov_init() isn't
run or fails. Change the check that prevents getting the link down count
to use is_virtfn instead so that VFs don't get this functionality, which
was the original intent.
Fixes: 132b4ebfa090 ("ionic: add support for ethtool extended stat link_down_count")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Eric Joyner <eric.joyner@amd.com>
Link: https://patch.msgid.link/20260614205303.48088-2-eric.joyner@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MxL862xx has two XPCS/SerDes interfaces (XPCS0 for ports 9-12,
XPCS1 for ports 13-16). Each can operate in various single-lane modes
(SGMII, 1000Base-X, 2500Base-X, 10GBase-R, 10GBase-KR, USXGMII) or as
QSGMII or 10G_QXGMII providing four sub-ports per interface.
Implement phylink PCS operations using the firmware's XPCS API:
- pcs_enable/pcs_disable: refcount the sub-ports sharing an XPCS
and power it down once the last sub-port is released.
- pcs_config: configure negotiation mode and CL37/SGMII advertising.
- pcs_get_state: read link state and the link-partner ability word
from firmware and decode using phylink's standard CL37, SGMII, and
USXGMII decoders.
- pcs_an_restart: restart CL37 or CL73 auto-negotiation.
- pcs_link_up: force speed/duplex for SGMII.
- pcs_inband_caps: report per-mode in-band status capabilities.
Register a PCS instance for each SerDes interface and
QSGMII/10G_QXGMII sub-ports during setup. Advertise the supported
interface modes in phylink_get_caps based on port number.
Firmware older than 1.0.84 lacks the XPCS API and instead configures
the SerDes itself, using defaults stored in flash. mac_select_pcs()
returns NULL in that case while the single-lane interface modes stay
advertised, so a CPU port keeps working in the firmware-configured
mode.
Lacking support for expressing PHY-side role modes in Linux only the
MAC-side of SGMII, QSGMII, USXGMII and 10G_QXGMII are implemented for
now.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Link: https://patch.msgid.link/736e4df02e4cb8c530c1670cbe7efac20b5d696d.1781319534.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the MXL862XX_API_WRITE, MXL862XX_API_READ and
MXL862XX_API_READ_QUIET convenience macros from mxl862xx.c to
mxl862xx-host.h next to the mxl862xx_api_wrap() prototype they wrap.
This makes them available to other compilation units that include
mxl862xx-host.h, which is needed once the SerDes PCS code in
mxl862xx-phylink.c also calls firmware commands.
No functional change.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/914f57931e79cc3932a9f32813465c08d29cf4bf.1781319534.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the phylink MAC operations and get_caps callback from mxl862xx.c
into a dedicated mxl862xx-phylink.c file. This prepares for the SerDes
PCS implementation which adds substantial phylink/PCS code -- keeping
it in a separate file avoids function-position churn in the main
driver file.
No functional change.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/fb9336de94bef47a0834287cbca87954e5e4c795.1781319534.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Query the firmware version at init (already done in wait_ready),
cache it in priv->fw_version, and provide MXL862XX_FW_VER_MIN()
for version-gated code paths throughout the driver.
MXL862XX_FW_VER() packs major/minor/revision into a u32 with
bitwise shifts so that versions compare with natural ordering,
independent of host endianness.
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/91a26a8ffeaa2ce1729f98347e93e779973976bb.1781319534.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
QDMA users counter is always accessed holding RTNL lock so we do not
require atomic_t for it.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
OA TC6 MAC-PHY appends FCS to the incoming frame. It must be
removed from the frame before being passed to the stack.
With FCS in the frame, many applications, like ping or any
application that uses IP layer may work as they may
carry the packet size information in the protocol.
Application like ptp4l, particularly if it uses layer 2
for its communication, it will fail with "bad message" due to
the extra 4 bytes added by the presence of FCS.
Fixes: d70a0d8f2f2d ("net: ethernet: oa_tc6: implement receive path to receive rx ethernet frames")
Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
Link: https://patch.msgid.link/20260611-level-trigger-v5-3-4533a9e85ce2@onsemi.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As "dev" pointer in oa_tc6 structure is never initialized,
mbiobus->parent was initialized with NULL. This change
fixes it by initializing it with device pointer of spi.
Fixes: 8f9bf857e43b ("net: ethernet: oa_tc6: implement internal PHY initialization")
Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
Link: https://patch.msgid.link/20260611-level-trigger-v5-2-4533a9e85ce2@onsemi.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
According OPEN Alliance 10BASET1x MAC-PHY Serial Interface
specification, interrupt is active low, level triggered.
Code used edge triggered interrupt which has the risk of losing an
interrupt on instances like when interrupt is disabled. Level
triggered interrupt won't be deasserted unless handler runs and
clear the interrupting conditions.
Interrupt handler mechanism is changed to threaded irq from
interrupt handler and kernel thread waiting on work queue.
Threaded irq mechanism is best suited for level triggered interrupt
as it disables the interrupt until handler is run in thread level,
while giving us an ability to have interrupt context handler to
signal the threaded irq handler.
Introduced a logic to disable the device interrupt on error. Error
could be due in data chunk's header and footer or SPI interface itself.
This will avoid having repeated interrupts, in case the driver couldn't
recover from the error condition with the available recovery mechanism.
Fixes: 2c6ce5354453 ("net: ethernet: oa_tc6: implement mac-phy interrupt")
Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
Link: https://patch.msgid.link/20260611-level-trigger-v5-1-4533a9e85ce2@onsemi.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
emac_xsk_xmit_zc() has the same issue as the fixed emac_xmit_xdp_frame():
it always sets the CPPI5 descriptor destination tag to emac->port_id,
which directs the PRU firmware to transmit on only one slave port in HSR
mode, breaking redundancy.
Apply the same fix: in HSR offload mode when NETIF_F_HW_HSR_DUP is set,
use PRUETH_UNDIRECTED_PKT_DST_TAG (port 0) so the PRU duplicates frames
to both ports. Also set PRUETH_UNDIRECTED_PKT_TAG_INS when
NETIF_F_HW_HSR_TAG_INS is set so the PRU re-inserts the HSR sequence tag
that was stripped by the PRU on RX before the XDP program saw the frame.
This ensures XSK XDP_TX frames in HSR mode are treated identically to
skb TX via hsr0.
Fixes: 8756ef2eb078 ("net: ti: icssg-prueth: Add AF_XDP zero copy for TX")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20260611185744.2498070-4-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
emac_xmit_xdp_frame() always sets the CPPI5 descriptor destination
tag to emac->port_id, which directs the PRU firmware to transmit
the frame on that specific slave port only. In HSR offload mode
this bypasses the firmware's HSR duplication logic: the frame goes
out on one ring leg and never appears on the other, breaking HSR
redundancy for XDP_TX paths.
icssg_ndo_start_xmit() already handles this correctly: when HSR
offload mode is active and NETIF_F_HW_HSR_DUP is set it substitutes
PRUETH_UNDIRECTED_PKT_DST_TAG (port 0) so the PRU duplicates the
frame to both slave ports. It also sets PRUETH_UNDIRECTED_PKT_TAG_INS
in epib[1] when NETIF_F_HW_HSR_TAG_INS is set so the PRU inserts the
HSR sequence tag, which XDP_TX frames lack (the tag is stripped by
the PRU on RX before the frame reaches the XDP program).
Apply the same logic in emac_xmit_xdp_frame() so XDP_TX frames in
HSR mode are treated identically to skb TX via hsr0.
Fixes: 62aa3246f462 ("net: ti: icssg-prueth: Add XDP support")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20260611185744.2498070-3-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
emac_rx_packet_zc() calls prueth_rx_alloc_zc() with count (frames
received in the current NAPI poll) as the allocation budget. Two
problems arise from this:
1. When the CPPI5 descriptor pool is exhausted (avail_desc == 0,
FDQ already holds the maximum number of descriptors), count > 0
still triggers allocation attempts that all fail, spamming the
kernel log with "rx push: failed to allocate descriptor" at
high packet rates.
2. The XSK wakeup condition "ret < count" is wrong when avail_desc
is zero: ret == 0 and count can be up to 64, so the condition is
always true. This causes ~200 spurious ndo_xsk_wakeup() calls
per second even when the FDQ is already full, wasting CPU cycles
in repeated NAPI invocations that process zero frames.
Fix both by introducing alloc_budget = min(budget, avail_desc):
- When avail_desc == 0 no allocation is attempted, avoiding pool
exhaustion errors. The wakeup condition "ret < alloc_budget"
evaluates to 0 < 0 == false, correctly clearing the wakeup flag
so the hardware IRQ re-arms NAPI without spurious kicks.
- In steady state avail_desc == count <= budget, so alloc_budget
== count and behaviour is unchanged.
- After a dry-ring stall (count == 0, avail_desc > 0), alloc_budget
> 0 causes new descriptors to be posted to the FDQ so the hardware
can resume receiving immediately.
Fixes: 7a64bb388df3 ("net: ti: icssg-prueth: Add AF_XDP zero copy for RX")
Signed-off-by: Meghana Malladi <m-malladi@ti.com>
Link: https://patch.msgid.link/20260611185744.2498070-2-m-malladi@ti.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both airoha_eth.c and airoha_npu.c declare SPDX-License-Identifier:
GPL-2.0-only but use MODULE_LICENSE("GPL"), which the kernel module
loader interprets as GPL-2.0+ (any GPL version). This mismatch causes
license compliance tools (FOSSology, ScanCode, etc.) to misidentify
the effective license as more permissive than intended.
Replace MODULE_LICENSE("GPL") with MODULE_LICENSE("GPL v2") to
align with the GPL-2.0-only SPDX identifier. Per include/linux/module.h,
"GPL v2" maps to GPL-2.0-only, matching the source files' declared
license.
Signed-off-by: Wayen <win847@gmail.com>
Link: https://patch.msgid.link/6a2ded59.63d39acb.391892.7632@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A comment in
drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.h
incorrectly refers to CONFIG_MLX5_HWS_STEERING instead of
CONFIG_MLX5_HW_STEERING. Correct it.
Discovered while searching for CONFIG_* symbols referenced in code but
not defined in any Kconfig file.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20260613225904.140791-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix several typos found during code review:
- Kconfig: "Aiorha" -> "Airoha" in NET_AIROHA_FLOW_STATS help text
- Comment: "CMD1" -> "CDM1" (Central DMA, not Command)
- Comments: "GMD1/2/3/4" -> "GDM1/2/3/4" (Gigabit DMA, not GMD)
These are pure comment and documentation fixes with no functional impact.
Signed-off-by: Wayen.Yan <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/6a2ca74a.c5b1db4e.21a698.01e7@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In airoha_fe_pse_ports_init(), the inner condition for PPE1 queue
reservation is identical to the for-loop bound, making it always true
and the else branch dead code:
for (q = 0; q < pse_port_num_queues[FE_PSE_PORT_PPE1]; q++) {
if (q < pse_port_num_queues[FE_PSE_PORT_PPE1]) /* always true */
set RSV_PAGES;
else
set 0; /* unreachable */
}
The intended behavior is to reserve pages only for the first half of
the queues, matching the PPE2 implementation on line 334 which
correctly uses the /2 divisor. Fix the PPE1 condition accordingly.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Wayen.Yan <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/6a2ca3de.ad59c0a6.147df9.2ac1@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
airoha_ppe_get_wdma_info() returns -1 when the last path in the
forwarding path stack is not of type DEV_PATH_MTK_WDMA. This is not
a standard kernel error code. Replace it with -EINVAL since the
input path type is invalid from the caller's perspective.
Signed-off-by: Wayen.Yan <win847@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/6a2ca3d9.ad59c0a6.147df9.2a62@mx.google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
aq_ptp.c / aq_ptp.h:
- Add aq_ptp_state enum (AQ_PTP_FIRST_INIT, AQ_PTP_LINK_UP,
AQ_PTP_NO_LINK) to distinguish first init from link-change events;
on AQC113 only reset the TSG clock on first init to avoid disrupting
ongoing synchronization.
- Add aq_ptp_dpath_enable() for comprehensive L3/L4 PTP filter
setup/teardown, replacing the previous single-filter approach with
an array of 4 slots for IPv4 and IPv6 PTP multicast addresses
(224.0.1.129, 224.0.0.107, ff0e::181, ff02::6b).
- Add aq_ptp_parse_rx_filters() to map hwtstamp_rx_filters to L2/L4
enable flags and call aq_ptp_dpath_enable().
- Re-apply RX filters on link change (hardware state lost after reset).
- Extend PTP ring alloc/init/start/stop to handle AQC113 PTP ring ops.
- Add per-instance PTP offset table for AQC113 with empirically measured
values at 100M/1G/2.5G/5G/10G link speeds.
- Export aq_ptp_dpath_enable() and updated ring helpers in aq_ptp.h.
aq_hw.h:
- Include hw_atl2/hw_atl2.h for AQC113 PTP type definitions.
aq_nic.c:
- Account for PTP IRQ vector (AQ_HW_PTP_IRQS) in vector count math.
- Call hw_atl2 PTP re-enable hook after hardware reset in
aq_nic_update_link_status().
aq_pci_func.c:
- Pass PTP IRQ index to aq_ptp_irq_alloc() in probe path.
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-13-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
aq_ring.h / aq_ring.c:
- Add ptp_ts_deadline field to aq_ring_s to track TX timestamp timeout.
- In aq_ring_tx_clean(): when hw_ring_tx_ptp_get_ts() returns 0 (HW not
yet written back the timestamp), clear buff->is_mapped and buff->pa
before breaking to prevent double dma_unmap on retry. When
ptp_ts_deadline expires, dequeue and drop the head of skb_ring to keep
it in lockstep with buff_ring, then clear request_ts and free the skb
via dev_kfree_skb_any() to unblock the ring.
aq_main.c:
- Add IPv6 PTP packet detection in aq_ndev_start_xmit() using
ipv6_hdr()->nexthdr for ETH_P_IPV6 frames, steering them through
aq_ptp_xmit() alongside the existing IPv4 path.
- Use PTP_EV_PORT/PTP_GEN_PORT constants instead of magic numbers 319/320.
- Remove duplicate aq_reapply_rxnfc_all_rules() and
aq_filters_vlans_update()
calls from aq_ndev_open() - now covered by aq_nic_start(), which also
ensures filters are restored correctly after PM resume.
aq_nic.c:
- Move aq_reapply_rxnfc_all_rules() and aq_filters_vlans_update() into
aq_nic_start() after hardware init, replacing the duplicate calls that
were removed from aq_ndev_open().
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-12-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the hardware-layer PTP implementation for AQC113 (Antigua):
- hw_atl2.h/hw_atl2_utils.h/hw_atl2_internal.h: add PTP offset
constants, RX timestamp size (HW_ATL2_RX_TS_SIZE=8), and reduced
HW_ATL2_RXBUF_MAX=172 (AQC113 on-chip RX packet buffer hardware
limit for data TCs).
- hw_atl2.c: implement hw_atl2_enable_ptp() to reset and enable TSG
clocks and set PTP TC scheduling priority after hardware reset.
- hw_atl2.c: implement hw_atl2_adj_sys_clock(), hw_atl2_adj_clock_freq(),
and aq_get_ptp_ts() for TSG clock read/adjust/increment operations.
- hw_atl2.c: implement hw_atl2_gpio_pulse() for PPS output generation
via TSG pulse generator.
- hw_atl2.c: implement hw_atl2_hw_tx_ptp_ring_init() and
hw_atl2_hw_rx_ptp_ring_init() for PTP ring setup.
- hw_atl2.c: implement hw_atl2_hw_ring_tx_ptp_get_ts() to read TX
timestamp from descriptor writeback, and hw_atl2_hw_rx_extract_ts()
to extract RX timestamp from the 8-byte packet trailer.
- hw_atl2.c: add hw_atl2_hw_get_clk_sel() helper.
- Wire all new ops into hw_atl2_ops.
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-11-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend the aq_hw_ops interface with new function pointers required for
PTP support on AQC113:
- enable_ptp: enable/disable PTP counter with clock selection
- hw_ring_tx_ptp_get_ts: read TX timestamp from descriptor writeback
- hw_tx_ptp_ring_init/hw_rx_ptp_ring_init: per-ring PTP initialization
- hw_get_clk_sel: query active TSG clock selection
Update existing hw_ops signatures to support AQC113 dual-clock
architecture:
- hw_gpio_pulse: add clk_sel and hightime parameters
- hw_extts_gpio_enable: add channel parameter
Add PTP-related hardware defines:
- AQ_HW_TXD_CTL_TS_EN/TS_TSG0 for TX descriptor timestamp control
- AQ2_HW_PTP_COUNTER_HZ for AQC113 TSG clock frequency
- AQ_HW_PTP_IRQS for PTP interrupt vector accounting
- PTP enable flags (L2/L4) and TSG clock selection constants
Add request_ts and clk_sel bitfields to aq_ring_buff_s for per-packet
TX timestamp request tracking.
Update hw_atl_b0.c (AQC107) implementations:
- Adapt gpio_pulse and extts_gpio_enable to new signatures
- Add TX descriptor timestamp bits for AQC113 when ANTIGUA chip
feature is detected
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-10-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add PTP traffic class (TC) buffer reservation and TX path
improvements for AQC113:
- Reserve dedicated TX and RX buffer space for PTP TC when PTP is
enabled, reducing user TC buffers accordingly (TX: 8KB, RX: 16KB).
- Configure PTP TC with no flow control and highest priority
scheduling to ensure timely PTP packet transmission.
TX path improvements:
- Increase TX data and descriptor read-request limits when firmware
has already enabled extended PCIe tag mode.
Also simplify RSS queue calculation in hw_atl2_hw_rss_set() by
extracting to a local variable and use unsigned types for loop
variables to match their usage.
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-9-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement complete RX filter management for AQC113 hardware:
- Add tag-based ethertype filter policy (hw_atl2_filter_tag_get/put)
that allocates and releases ART tags for L2 ethertype filters.
- Add L3/L4 filter sharing via serialized usage counters in
hw_atl2_l3_filter/hw_atl2_l4_filter, managed through
hw_atl2_rxf_l3_get/put and hw_atl2_rxf_l4_get/put.
- Implement L3 (IPv4/IPv6 source/destination address and protocol)
filter find, get (program HW and increment refcount), and put
(decrement refcount and clear HW when last user releases).
- Implement L4 (TCP/UDP/SCTP source/destination port) filter management
with the same find/get/put pattern.
- Add combined L3L4 filter configuration (hw_atl2_new_fl3l4_configure)
that translates legacy aq_rx_filter_l3l4 commands into AQC113 separate
L3+L4 filter programming with Action Resolver Table (ART) entries.
- Add L2 ethertype filter set/clear (hw_atl2_hw_fl2_set/clear) with
tag-based ART integration.
- Wire .hw_filter_l2_set, .hw_filter_l2_clear, .hw_filter_l3l4_set
into hw_atl2_ops.
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-8-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix initialization issues in hw_atl2 to correctly support AQC113:
- hw_atl2_hw_reset: replace unconditional priv memset with selective
field clears so that l3l4_filters[].l3_index and l4_index can be
initialized to -1 (not allocated) rather than 0; 0 is a valid filter
index and would incorrectly appear as an occupied slot after a reset.
- hw_atl2_hw_init_new_rx_filters: use firmware-reported ART section
base and count (clamped to 16) instead of hardcoded 0xFFFF mask;
enable simultaneous IPv4/IPv6 L3 filter mode (rpf_l3_v6_v4_select);
tag the UC MAC slot using firmware-supplied l2_filters_base_index
instead of hardcoded HW_ATL2_MAC_UC.
- hw_atl2_hw_init_rx_path: enable only the firmware-assigned MAC slot
(priv->l2_filters_base_index) instead of always slot 0.
- Add hw_atl2_hw_mac_addr_set() that programs the MAC address into
the firmware-assigned L2 filter slot. Wire into hw_atl2_ops
replacing the A1 hw_atl_b0_hw_mac_addr_set; call it from hw_init.
- Wire .hw_get_regs into hw_atl2_ops.
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-7-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
register dump
Add filter infrastructure for AQC113 hardware:
- Define L3 (IPv4/IPv6), L4 (TCP/UDP/SCTP), and combined L3L4 filter
structures with serialized usage counter for filter sharing.
- Define tag policy structure for ethertype filter management.
- Add RPF L3/L4 command bit definitions for filter programming.
- Add filter count constants for L3L4, L3V4, L4, VLAN, and ethertype.
- Extend hw_atl2_priv with filter arrays, base indices, and counts
discovered from firmware.
Query filter capabilities from firmware shared memory at init time
to discover available L2/L3/L4/VLAN/ethertype filter resources and
ART (Action Resolver Table) configuration.
Add hardware register dump utility for AQC113 debug support.
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-6-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add low-level hardware register definitions and accessor functions
for AQC113 (Antigua) chip features:
- L3/L4 filter command, tag, and address registers for IPv4/IPv6
- Ethertype filter tag registers
- TSG (Time Stamp Generator) clock control, modification, and
GPIO event generation/input timestamp registers
- TX descriptor timestamp writeback, timestamp enable, and AVB
enable registers
- TX data/descriptor read request limit registers
- TPB highest priority TC registers
- PCIe extended tag enable register
- RX descriptor timestamp request register
- Action resolver section enable getter
- GPIO special mode and TSG external GPIO TS input select
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-5-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor aq_set_data_fl3l4() to take an ethtool_rx_flow_spec pointer and
an explicit HW register location instead of driver-internal structures
(aq_nic_s, aq_rx_filter). This makes the function reusable for PTP
filter setup which constructs flow specs independently.
Key changes:
- Add aq_is_ipv6_flow_type() helper to derive IPv6 status from the
flow_type field, replacing the dependency on rx_fltrs->fl3l4.is_ipv6
shared state.
- Change aq_set_data_fl3l4() signature to accept (fsp, data, location,
add) and export it via aq_filters.h.
- Update aq_add_del_fl3l4() to compute the HW register location and
pass it explicitly.
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-4-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move active_ipv4/active_ipv6 bitmap updates from aq_set_data_fl3l4()
into aq_add_del_fl3l4() after the hardware write succeeds. The bitmaps
track which filter slots are actively programmed in hardware and must
only be updated once the HW write is confirmed.
The bitmap updates in aq_nic_reserve_filter() and aq_nic_release_filter()
are intentionally retained: they guard the aq_check_approve_fl3l4()
IPv4/IPv6 mixing validation for callers such as the AQC113 PTP path that
program filters directly via hw_atl2_new_fl3l4_configure() without going
through aq_add_del_fl3l4().
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-3-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Correct three issues in aq_set_data_fl3l4() required for the AQC113
PTP filter path introduced later in this series:
1. Mask FLOW_EXT from flow_type before the protocol switch statement.
Flow types with FLOW_EXT set (e.g. TCP_V4_FLOW | FLOW_EXT) fall
through to the default case and skip protocol comparison flags.
2. Extend the L3 address comparison check to cover all four IPv6
words. The original code only checked ip_src[0]/ip_dst[0] and
required !is_ipv6, so CMP_SRC_ADDR_L3/CMP_DEST_ADDR_L3 were never
set for IPv6 filters.
3. Use explicit flow type checks for port extraction instead of
negating IP_USER_FLOW/IPV6_USER_FLOW. The old check did not mask
FLOW_EXT, so IP_USER_FLOW | FLOW_EXT would incorrectly attempt
port extraction. Use the actual flow type to pick the correct
union member directly.
Signed-off-by: Sukhdeep Singh <sukhdeeps@marvell.com>
Link: https://patch.msgid.link/20260610115448.272-2-sukhdeeps@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The NETC switch does not age out dynamic FDB entries automatically.
Without software management, stale entries persist after topology
changes and cause incorrect forwarding.
Add a delayed work that periodically removes entries that have not been
refreshed within the specified cycles. The effective ageing time is:
ageing_time = fdbt_ageing_delay * 100
Default values are 3s interval and 100 cycles (300s total), matching
the IEEE 802.1Q default ageing time. The work starts when the first
port joins a bridge (tracked via br_cnt) and is cancelled when the
last port leaves. All FDB operations are serialized under fdbt_lock.
Implement .set_ageing_time() to allow the bridge layer to reconfigure
ageing parameters on demand.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-10-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Wire up the port_bridge_join, port_bridge_leave and port_vlan_filtering
DSA callbacks to support both VLAN-unaware and VLAN-aware bridge modes.
For VLAN-unaware bridges, each bridge instance is assigned a dedicated
internal PVID via NETC_VLAN_UNAWARE_PVID(bridge.num), counting down
from VID 4095. A VFT entry is created for this PVID with hardware MAC
learning and flood-on-miss forwarding enabled. The CPU port is included
as a VFT member so frames can reach the host. The reserved VID range is
blocked in port_vlan_add to prevent user-space conflicts.
Only one VLAN-aware bridge is supported at a time; this constraint is
enforced in port_bridge_join and port_vlan_filtering. The per-port PVID
is tracked in software and written to the BPDVR register whenever VLAN
filtering is active.
When a port leaves the bridge, its dynamic FDB entries are flushed right
away in port_bridge_leave(), without waiting for the ageing cycle. When
a link down event occurs on a port, netc_mac_link_down() will also clear
the port's dynamic FDB entries via netc_port_remove_dynamic_entries().
Non-bridge ports have no dynamic FDB entries, so this call is always
safe. Additionally, .port_fast_age() callback is added to flush the
dynamic FDB entries associated to a port.
Host flood rules are removed from the ingress port filter table when a
port joins a bridge to avoid bypassing FDB lookup and MAC learning.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-9-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Implement the DSA .port_vlan_add and .port_vlan_del operations to enable
VLAN-aware bridge offloading on the NETC switch.
VLAN membership is maintained in the VLAN Filter Table (VFT). Adding the
first port to a VLAN creates a new VFT entry with hardware MAC learning
and flood-on-miss forwarding; subsequent ports update the existing
entry's membership bitmap. Removing the last port deletes the entry.
Egress tagging is handled through the Egress Treatment Table (ETT). Each
VLAN is allocated a group of ETT entries, one per available port. Ports
are assigned a sequential ett_offset during initialisation, used to
address each port's entry within the group. Untagged ports configure the
ETT to strip the outer VLAN tag; tagged ports pass frames through
unmodified. Each ETT group is optionally paired with an Egress Counter
Table (ECT) group for per-port frame counting, allocated on a best-effort
basis. When the egress rule of an ETT entry changes, the counter of the
corresponding ECT entry will be recounted to track the number of frames
that match the new egress rule.
A software shadow list serialised by vft_lock tracks active VLAN state
across both port membership and egress tagging. VID 0 is used for single
port mode and is ignored by both callbacks.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-8-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
NTMP index tables require software to allocate and manage entry IDs.
Add two bitmap helper functions to facilitate this management:
ntmp_lookup_free_eid(): finds the first zero bit in the given bitmap,
sets it to mark the entry as in-use, and returns the corresponding entry
ID. Returns NTMP_NULL_ENTRY_ID if no free entry is available.
ntmp_clear_eid_bitmap(): clears the bit associated with the given entry
ID in the bitmap to mark the entry as free. It is a no-op if the entry
ID is NTMP_NULL_ENTRY_ID.
Both functions are exported for use by other modules, such as the NETC
switch driver which needs to manage group index bitmaps for the Egress
Treatment Table (ETT) and Egress Count Table (ECT).
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-7-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The Egress Treatment Table (ETT) and Egress Count Table (ECT) are both
index tables whose entry IDs are allocated by software. Every num_ports
entries form a group, where each entry in the group corresponds to one
port. To facilitate group allocation and management, initialize the group
index bitmaps for both tables based on hardware capabilities reported by
ETTCAPR and ECTCAPR registers.
The bitmap size per table is calculated as the total number of hardware
entries divided by the number of available ports, which gives the number
of groups available for software allocation. A set bit in the bitmap
represents a group index that has been allocated.
These bitmaps will be used by subsequent patches that add VLAN support.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-6-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The egress count table is a static bounded index table, egress related
statistics are maintained in this table. The table is implemented as a
linear array of entries accessed using an index (0, 1, 2, ..., n) that
uniquely identifies an entry within the array. Egress Counter Entry ID
(EC_EID) is used as an index to an entry in this table. The EC_EID is
specified in the egress treatment table.
Egress count table entries are always present and enabled. The table
only supports access via entry ID, which is assigned by the software.
And it supports Update, Query and Query followed by Update operations.
Currently, only Update operation is supported.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-5-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each entry in the egress treatment table contains the egress packet
processing actions to be applied to a grouping or scope of packets
exiting on a particular egress port of the switch. A scope of packets,
for example, could be the packets exiting a particular VLAN, matching
a particular 802.1Q bridge forwarding entry or belonging to a stream
identified at ingress. The egress treatment table is implemented as a
linear array of entries accessed using an index (0,1, 2, ..., n) that
uniquely identifies an entry within the array.
The egress treatment table only supports access vid entry ID, which is
assigned by the software. It supports Add, Update, Delete and Query
operations. Note that only Query operation is not supported yet.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-4-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add two interfaces to manage entries in the VLAN filter table:
ntmp_vft_update_entry(): Update the configuration element data of the
specified VLAN filter entry based on the given VLAN ID. It uses the
exact key access method to locate the entry.
ntmp_vft_delete_entry(): Delete the VLAN filter entry corresponding to
the specified VLAN ID. It also uses the exact key access method to
identify the target entry.
In addition, introduce struct vft_req_qd to describe the request data
buffer format for Query and Delete actions of the VLAN filter table,
which contains a common request data header and a VLAN access key.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-3-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add three interfaces to manage dynamic entries in the FDB table:
ntmp_fdbt_update_activity_element(): Update the activity element of all
dynamic FDB entries. For each entry, if its activity flag is not set,
which means no packet has matched this entry since the last update, the
activity counter is incremented. Otherwise, both the activity flag and
activity counter are reset. The activity counter is used to track how
long an FDB entry has been inactive, which is useful for implementing
an ageing mechanism.
ntmp_fdbt_delete_ageing_entries(): Delete all dynamic FDB entries whose
activity flag is not set and whose activity counter is greater than or
equal to the specified threshold. This is used to remove stale entries
that have been inactive for too long.
ntmp_fdbt_delete_port_dynamic_entries(): Delete all dynamic FDB entries
associated with the specified switch port. This is typically called when
a port goes down or is removed from a bridge.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260611021458.2629145-2-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 7662abf4db94 ("net: phy: sfp: Add support for SMBus module access")
added SMBus access for SFP modules, but limited it to single-byte
transfers. As a side effect, hwmon is disabled (16-bit reads cannot be
guaranteed atomic) and a warning is printed.
Many SMBus-only I2C controllers in the wild support more than just
byte access, and SFP cages are often wired to such controllers
rather than to a full-featured I2C controller -- e.g. the SMBus
controllers in the Realtek longan and mango SoCs, which advertise
word access and I2C block reads. Today, they cannot drive an SFP at
all without falling back to the byte-only path.
Extend sfp_smbus_read()/sfp_smbus_write() so that, in addition to
the existing byte access, they also use SMBus word access and SMBus
I2C block access whenever the adapter advertises them. Both
directions are handled in a single read and a single write helper
that pick the largest supported transfer per chunk and fall back as
needed.
I2C-block is preferred unconditionally when available: the protocol
carries any length 1..32, so it can serve every chunk -- including
the 1- and 2-byte tails -- without help from word or byte access.
Note that this requires I2C_FUNC_SMBUS_I2C_BLOCK, which reads a
caller-specified number of bytes. This deviates from the official
SMBus Block Read (length is supplied by the slave) but is widely
supported by Linux I2C controllers/drivers.
Capability matrix this implementation supports:
- BYTE only: works (unchanged behaviour); 1-byte
xfers, hwmon disabled.
- BYTE + WORD: word for >=2-byte chunks, byte for
trailing odd byte.
- I2C_BLOCK present (with or
without BYTE/WORD): block as the universal transport for
every chunk.
- WORD only (no BYTE/BLOCK): accepted with WARN_ONCE. Even-length
transfers work; odd-length transfers
(e.g. the 3-byte cotsworks fixup
write) hit the BYTE branch which the
adapter does not implement, so the
xfer returns an error and the
operation is aborted. No mainline
I2C driver was found to advertise
WORD without BYTE; the warning lets
us learn about it if it ever shows
up.
Adapters with asymmetric R/W capabilities (e.g. only READ_I2C_BLOCK
but not WRITE_I2C_BLOCK) remain functionally correct -- the
per-iteration fallback uses the direction-specific bits -- but the
shared i2c_max_block_size is sized by the all-bits-set check, so a
transfer in the better-supported direction is not upgraded. None of
the mainline I2C bus drivers surveyed during review advertise such
asymmetry; promoting i2c_max_block_size to per-direction sizes can
be revisited if needed.
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260614133418.2068201-3-jelonek.jonas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The SFP driver assumes all I2C adapters support reading and writing the
pre-defined block size SFP_EEPROM_BLOCK_SIZE of 16 bytes. This constant
was probably chosen based on good guesses and known limitations of a
range of I2C adapters and SFP modules.
However, I2C adapters may even support less and usually need to specify
this via I2C quirks. Theoretically, such an adapter may provide full
functionality but only support a read and write length of e.g. 8 bytes.
Currently, the SFP driver doesn't account for that.
Add handling for I2C quirks in SFP I2C configuration taking the fields
max_read_len and max_write_len in struct i2c_adapter_quirks into account
to further limit the maximum block size if needed.
Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260614133418.2068201-2-jelonek.jonas@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
otx2vf_register_mbox_intr() currently installs the VF mailbox IRQ
handler before clearing stale mailbox interrupt state. The code then says
that local interrupt bits should be cleared first to avoid spurious
interrupts, but that clear still happens only after request_irq() has
already made the handler reachable.
A running system can reach this during VF mailbox interrupt registration
while stale or latched RVU_VF_INT state is still present. If delivery
happens in the request_irq()-to-clear window,
otx2vf_vfaf_mbox_intr_handler() can run before local quiesce and touch
the same vf->mbox and vf->mbox_wq carrier that probe and teardown later
reuse or destroy.
Move the stale mailbox interrupt clear ahead of request_irq(), but keep
interrupt enabling after the handler is installed. This closes the
pre-clear early-IRQ window without creating a new enable-before-handler
window.
Fixes: 3184fb5ba96e ("octeontx2-vf: Virtual function driver support")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260611160014.3202224-3-runyu.xiao@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
otx2_register_mbox_intr() currently installs the PF mailbox IRQ handler
before clearing stale mailbox interrupt state. The function itself then
comments that the local interrupt bits must be cleared first to avoid
spurious interrupts, but that clear happens only after request_irq() has
already exposed the handler to irq delivery.
A running system can reach this during PF mailbox interrupt registration
while stale or latched RVU_PF_INT state is still present. If delivery
happens in the request_irq()-to-clear window,
otx2_pfaf_mbox_intr_handler() can run before local quiesce and touch
the same pf->mbox and pf->mbox_wq carrier that probe and teardown later
reuse or destroy.
Move the stale mailbox interrupt clear ahead of request_irq(), but keep
interrupt enabling after the handler is installed. This closes the
pre-clear early-IRQ window without creating a new enable-before-handler
window.
Fixes: 5a6d7c9daef3 ("octeontx2-pf: Mailbox communication with AF")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ratheesh Kannoth <rkannoth@marvell.com>
Link: https://patch.msgid.link/20260611160014.3202224-2-runyu.xiao@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
An SFP cage (compatible "sff,sfp") whose MOD_DEF0 signal is not wired to a
GPIO currently falls back to sff_gpio_get_state(), which unconditionally
reports the module as present. An empty cage therefore fails its probe and
is parked in SFP_MOD_ERROR forever; because SFP_F_PRESENT never deasserts
there is no REMOVE event to recover the state machine, so a module inserted
after boot is never detected, and empty cages spam -EIO at boot.
This affects boards that route none of the cage presence signal to a
software-readable input. On the NicGiga S100-0800S-M (RTL9303, 8x SFP+) the
cage I2C bus is the switch's SMBus master; TX_DISABLE is driven via a
PCA9534 I/O expander, but no MOD_ABS/MOD_DEF0 line reaches a readable GPIO
(the RTL9303 gpio0 lines read stuck-low, the single PCA9534 is fully
consumed by TX_DISABLE, and there is no RTL8231). The Horaco ZX-SW82TS-L2P
(RTL9302D, 2x SFP+) is independently affected in the same way.
For such an SFP cage, derive presence from a throttled single-byte I2C read
of the module EEPROM instead: a successful read asserts SFP_F_PRESENT,
R_PROBE_ABSENT consecutive failures clear it (to ride out a transient error
on a live module). The existing poll then emits SFP_E_INSERT / SFP_E_REMOVE
normally, giving working hot-plug and silencing the boot-time -EIO spam on
empty cages. Presence is re-probed every T_PROBE_PRESENT, so insertion is
detected within that interval and removal within
T_PROBE_PRESENT * R_PROBE_ABSENT.
A soldered-down module (compatible "sff,sff") has no presence signal and is
genuinely always present, so it continues to use sff_gpio_get_state(); the
new path is gated on the cage type advertising SFP_F_PRESENT.
Signed-off-by: Greg Patrick <gregspatrick@hotmail.com>
Tested-by: Manuel Stocker <mensi@mensi.ch>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Link: https://patch.msgid.link/20260611175341.2223184-1-gregspatrick@hotmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The devlink resource ID for ATU collides with the sentinel
DEVLINK_RESOURCE_ID_PARENT_TOP (0). As a result, ATU_bin_* are
registered as in fact registered as top-level siblings, not as children
of ATU.
Whether intentional or unintentional, clarify it by keeping the real
resource IDs starting at 1. Unfortunately ATU_bin_* are already
registered at top-level, so keep their parent to PARENT_TOP.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260611070856.889700-5-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This might not cause real problems, but the hellcreek devlink resource
ID collides with the sentinel DEVLINK_RESOURCE_ID_PARENT_TOP (0). Avoid
it by keeping the real resource IDs starting at 1.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://patch.msgid.link/20260611070856.889700-4-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This might not cause real problems, but the b53 devlink resource ID
collides with the sentinel DEVLINK_RESOURCE_ID_PARENT_TOP (0). Avoid it
by keeping the real resource IDs starting at 1.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260611070856.889700-3-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This might not cause real problems, but the dsa_loop devlink resource ID
collides with the sentinel DEVLINK_RESOURCE_ID_PARENT_TOP (0). Avoid it
by keeping the real resource IDs starting at 1.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260611070856.889700-2-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
One fewer allocation for the priv struct.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://patch.msgid.link/20260608045640.5172-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the restriction blocking SD on embedded CPU PFs (ECPF), enabling
SD functionality on BlueField DPUs. Remove the blocker preventing SD
devices from transitioning to switchdev mode.
The infrastructure added in earlier patches properly handles this case.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-16-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Allow SD devices to transition to switchdev before the SD group is
fully up. Metadata allocation requires the SD group to be ready, so
defer it from esw_offloads_enable() until SD shared-FDB activation.
Add mlx5_esw_offloads_init_deferred_metadata() which allocates per-vport
metadata and refreshes the ingress ACLs that were previously programmed
with metadata=0. The helper is idempotent and can be called multiple
times.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-15-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On an SD device, vport representors are not functional until the SD
group is combined and shared FDB is active. Skip the initial load and
the reload paths in that window; reps are loaded as part of the SD LAG
activation flow once it becomes active.
In addition, explicitly unload representors when SD LAG is destroyed.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-14-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Enable MPESW LAG creation over SD LAG members, forming a composite LAG
hierarchy. This allows bonding multiple SD groups together under a
single MPESW configuration with shared FDB.
When enabling composite MPESW, the individual SD LAG shared FDB
configurations are temporarily torn down and recreated when the
composite LAG is disabled.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-13-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SD LAG is a virtual LAG without hardware LAG support, so it cannot use
the firmware vport LAG commands. Implement a software-based vport LAG
using egress ACL bounce rules.
Add esw_set_slave_egress_rule() to create an egress ACL rule on the
slave's manager vport that bounces traffic to the master's manager
vport. This achieves the same traffic steering as hardware vport LAG.
Redirect mlx5_cmd_create_vport_lag() and mlx5_cmd_destroy_vport_lag()
to the software implementation when operating in SD LAG mode.
In addition, adjust lag_demux creation to check SD LAG mode as well.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-12-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend mlx5_lag_disable_change() to properly disable both regular LAG
and SD LAG when requested. Each LAG type uses its own devcom component
for locking.
Use mlx5_sd_get_devcom() helper to retrieve the SD devcom component,
needed for proper locking when disabling SD LAG.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The lag demux resources (flow table, flow group, and rules xarray)
are stored on the shared ldev. With Socket Direct, multiple SD groups
each create their own demux FT/FG during their master's IB device
initialization. Since they all write to the same ldev fields, the
second group's init overwrites the first group's pointers, leaking
the first group's FT/FG.
During teardown, the cleanup uses the overwritten pointers, destroying
the wrong group's resources and leaving leaked flow tables in the LAG
namespace. These leaked tables can interfere with subsequently created
demux tables.
Move the demux resources from the shared ldev to per-master lag_func
instances. Each master device now owns its own independent demux
state. The rule_add and rule_del helpers look up the appropriate
master's lag_func via the existing filter/group infrastructure.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-10-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When eswitch is disabled, notify the SD layer so it can clean up
SD-specific resources such as the TX flow table root configuration
on secondary devices.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-9-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the eswitch transitions, propagate the change to SD: secondaries
get their TX flow table root reconfigured for the new mode, and when
all group devices move to switchdev, the per-group shared FDB is
activated.
Shared FDB activation is best-effort - failure does not block the
eswitch transition; the next transition retries.
Note: the existing mlx5_get_sd() guard that blocks switchdev for SD
devices is intentionally retained. It will be removed once all
supporting patches are in place.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In Socket Direct configurations the primary and secondary PFs share the
same native_port_num. The eswitch vport metadata encodes pf_num in its
upper bits to distinguish vports across PFs. Without SD-awareness, both
PFs generate identical metadata, causing FDB rules to steer traffic to
the wrong representor.
Add mlx5_sd_pf_num_get() which remaps the pf_num for SD devices.
Use it so each PF in an SD group produces unique vport metadata.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-7-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add mlx5_fs_cmd_query_l2table_silent() to query the current silent mode
state from firmware. This allows detecting if firmware has already put
secondary devices into silent mode.
During SD group registration, query the silent mode of each device. If
a device is already in silent mode (set by firmware), record this in
the fw_silents_secondaries flag and use it to help determine the
primary/secondary roles.
When fw_silents_secondaries is set, skip the driver-initiated silent
mode set/unset operations since firmware manages this state. This
handles configurations where firmware persistently silences secondary
devices.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor SD group registration to use devcom event-driven role
determination to ensure SD is marked as ready only after roles are fully
assigned and the group state is consistent, making outside accessors,
which will be added in downstream patches, safe to use without races.
The devcom events:
- SD_PRIMARY_SET event: each device compares bus numbers with peers
to determine which should be primary
- SD_SECONDARIES_SET event: secondaries register themselves with the
elected primary device
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-5-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some devcom events are not expected to fail. Rather than attempting
a rollback that may not be meaningful, allow callers to pass
DEVCOM_CANT_FAIL as the rollback_event to indicate that the event
handler should not fail. If it does, emit a warning and stop
propagating to further peers, but skip the rollback path.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Factor mlx5_devcom_send_event() into two functions:
- mlx5_devcom_locked_send_event(): performs the dispatch (and
rollback) with comp->sem already held by the caller.
- mlx5_devcom_send_event(): unchanged wrapper that takes comp->sem,
calls the locked variant, and releases it.
This lets callers bracket multiple event broadcasts under a single
held write lock, eliminating the gap between consecutive dispatches
where peer state could change.
Will be used by a downstream patch.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-3-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
SD secondary devices share the primary's uplink and do not have
their own uplink representor. When reloading IB reps on secondary
devices, skip the uplink and only load VF/SF vport IB reps.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260612113904.537595-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Ingo Molnar:
- CPUID API updates (Ahmed S. Darwish):
- Introduce a centralized CPUID parser
- Introduce a centralized CPUID data model
- Introduce <asm/cpuid/leaf_types.h>
- Rename cpuid_leaf()/cpuid_subleaf() APIs
- treewide: Explicitly include the x86 CPUID headers
- Update to x86-cpuid-db v3.1 (Maciej Wieczor-Retman)
- Continued removal of pre-i586 support and related simplifications
(Ingo Molnar)
- Add Intel CPU model number for rugged Panther Lake (Tony Luck)
- Misc fixes, updates and cleanups by Arnd Bergmann, Chao Gao, Lukas
Bulwahn, Sohil Mehta, Maciej Wieczor-Retman.
* tag 'x86-cpu-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (25 commits)
x86/cpu: Make CONFIG_X86_CX8 unconditional
x86/cpu: Remove unused !CONFIG_X86_TSC code
x86/cpuid: Update bitfields to x86-cpuid-db v3.1
tools/x86/kcpuid: Update bitfields to x86-cpuid-db v3.1
x86/cpu: Make CONFIG_X86_TSC unconditional
MAINTAINERS: Drop obsolete FPU EMULATOR section
x86/cpu: Fix a F00F bug warning and clean up surrounding code
x86/cpu: Add Intel CPU model number for rugged Panther Lake
x86/cpuid: Introduce a centralized CPUID parser
x86/cpu: Introduce a centralized CPUID data model
x86/cpuid: Introduce <asm/cpuid/leaf_types.h>
x86/cpuid: Rename cpuid_leaf()/cpuid_subleaf() APIs
x86/cpu: Do not include the CPUID API header in asm/processor.h
Documentation: core-api/cpu_hotplug: Remove stale cpu0_hotplug docs
x86/cpu, cpufreq: Remove AMD ELAN support
x86/fpu: Remove the math-emu/ FPU emulation library
x86/fpu: Remove the 'no387' boot option
x86/fpu: Remove MATH_EMULATION and related glue code
treewide: Explicitly include the x86 CPUID headers
x86/cpu: Remove the CONFIG_X86_INVD_BUG quirk
...
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull timekeeping updates from Thomas Gleixner:
"Updates for NTP/timekeeping and PTP:
- Expand timekeeping snapshot mechanisms
The various snapshot functions are mostly used for PTP to collect
"atomic" snapshots of various involved clocks.
They lack support for the recently introduced AUX clocks and do not
provide the underlying counter value (e.g. TSC) to user space.
Exposing the counter value snapshot allows for better control and
steering.
Convert the hard wired ktime_get_snapshot() to take a clock ID,
which allows the caller to select the clock ID to be captured along
with CLOCK_MONONOTONIC_RAW. Additionally capture the underlying
hardware counter value and the clock source ID of the counter.
Expand the hardware based snapshot capture where devices provide a
mechanism to snapshot the hardware PTP clock and the system counter
(usually via PCI/PTM) to support AUX clocks and also provide the
captured counter value back to the caller and not only the clock
timestamps derived from it.
- Add a new optional read_snapshot() callback to clocksources
That is required to capture atomic snapshots from clocksources
which are derived from TSC with a scaling mechanism (e.g. Hyper-V,
KVMclock).
The value pair is handed back in the snapshot structure to the
callers, so they can do the necessary correlations in a more
precise way.
This touches usage sites of the affected functions and data structure
all over the tree, but stays fully backwards compatible for the
existing user space exposed interfaces. New PTP IOCTLs will provide
access to the extended functionality in later kernel versions"
* tag 'timers-ptp-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (28 commits)
ptp: vmclock: Use hw_cycles from snapshot for precise TSC pairing
x86/kvmclock: Implement read_snapshot() for kvmclock clocksource
clocksource/hyperv: Implement read_snapshot() for TSC page clocksource
timekeeping: Add clocksource read_snapshot() method and hw_cycles to snapshot
ptp: Switch to ktime_get_snapshot_id() for pre/post timestamps
timekeeping: Add support for AUX clock cross timestamping
timekeeping: Remove system_device_crosststamp::sys_realtime
ALSA: hda/common: Use system_device_crosststamp::sys_systime
wifi: iwlwifi: Use system_device_crosststamp::sys_systime
ptp: Use system_device_crosststamp::sys_systime
timekeeping: Prepare for cross timestamps on arbitrary clock IDs
timekeeping: Remove ktime_get_snapshot()
virtio_rtc: Use provided clock ID for history snapshot
net/mlx5: Use provided clock ID for history snapshot
igc: Use provided clock ID for history snapshot
ice/ptp: Use provided clock ID for history snapshot
wifi: iwlwifi: Adopt PTP cross timestamps to core changes
timekeeping: Add CLOCK ID to system_device_crosststamp
timekeeping: Add system_counterval_t to struct system_device_crosststamp
timekeeping: Add CLOCK_AUX support for ktime_get_snapshot_id()
...
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core updates from Danilo Krummrich:
"Deferred probe:
- Fix race where deferred probe timeout work could be permanently
canceled by using mod_delayed_work()
- Fix missing jiffies conversion in deferred_probe_extend_timeout()
- Guard timeout extension with delayed_work_pending() to prevent
premature firing
- Use system_percpu_wq instead of the deprecated system_wq
- Update deferred_probe_timeout documentation
device:
- Replace direct struct device bitfield access (can_match, dma_iommu,
dma_skip_sync, dma_ops_bypass, state_synced, dma_coherent,
of_node_reused, offline, offline_disabled) with flag-based
accessors using bit operations
- Reject devices with unregistered buses
- Delete unused DEVICE_ATTR_PREALLOC()
- Add low-level device attribute macros with const show/store
callbacks, allowing device attributes to reside in read-only memory
- Move core device attributes to read-only memory
- Constify group array pointers in driver_add_groups() /
driver_remove_groups(), struct bus_type, and struct device_driver
device property:
- Fix fwnode reference leak in fwnode_graph_get_endpoint_by_id()
- Initialize all fields of fwnode_handle in fwnode_init()
- Provide swnode_get()/swnode_put() wrappers around kobject_get/put()
- Allow passing struct software_node_ref_args pointers directly to
PROPERTY_ENTRY_REF()
driver_override:
- Migrate amba, cdx, vmbus, and rpmsg to the generic driver_override
infrastructure, fixing a UAF from unsynchronized access to
driver_override in bus match() callbacks
- Remove the now-unused driver_set_override()
firmware loader:
- Fix recursive lock deadlock in device_cache_fw_images() when async
work falls back to synchronous execution
- Fix device reference leak in firmware_upload_register()
platform:
- Pass KBUILD_MODNAME through the platform driver registration macro
to create module symlinks in sysfs for built-in drivers; move
module_kset initialization to a pure_initcall and tegra cbb
registration to core_initcall to ensure correct ordering
- Pass THIS_MODULE implicitly through a coresight_init_driver() macro
sysfs:
- Upgrade OOB write detection in sysfs_kf_seq_show() from printk to
WARN
- Add return value clamping to sysfs_kf_read()
Rust:
- ACPI:
Fix missing match data for PRP0001 by exporting
acpi_of_match_device()
- Auxiliary:
Replace drvdata() with dedicated registration data on
auxiliary_device. drvdata() exposed the driver's bus device private
data beyond the driver's own scope, creating ordering constraints
and forcing the data to outlive all registrations that access it.
Registration data is instead scoped structurally to the
Registration object, making lifecycle ordering enforced by
construction rather than convention.
- Rust-native device driver lifetimes (HRT):
Allow Rust device drivers to carry a lifetime parameter on their
bus device private data, tied to the device binding scope -- the
interval during which a bus device is bound to a driver. Device
resources like pci::Bar<'a> and IoMem<'a> can be stored directly in
the driver's bus device private data with a lifetime bounded by the
binding scope, so the compiler enforces at build time that they do
not outlive the binding. This removes Devres indirection from every
access site and eliminates try_access() failure paths in
destructors.
Bus driver traits use a Generic Associated Type (GAT) Data<'bound>
to introduce the lifetime on the private data, rather than
parameterizing the Driver trait itself. Auxiliary registration
data, where the lifetime is not introduced by a trait callback but
must be threaded through Registration, uses the ForLt trait (a
type-level abstraction for types generic over a lifetime).
Misc:
- Fix DT overlayed devices not probing by reverting the broken
treewide overlay fix and re-running fw_devlink consumer pickup when
an overlay is applied to a bound device
- Use root_device_register() for faux bus root device; add sanity
check for failed bus init
- Fix dev_has_sync_state() data race with READ_ONCE() and move it to
base.h
- Avoid spurious device_links warning when removing a device while
its supplier is unbinding
- Switch ISA bus to dynamic root device
- Fix suspicious RCU usage in kernfs_put()
- Remove devcoredump exit callback
- Constify devfreq_event_class"
* tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core: (81 commits)
software node: allow passing reference args to PROPERTY_ENTRY_REF()
driver core: platform: set mod_name in driver registration
coresight: pass THIS_MODULE implicitly through a macro
kernel: param: initialize module_kset in a pure_initcall
soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
firmware_loader: Fix recursive lock in device_cache_fw_images()
driver core: Use system_percpu_wq instead of system_wq
driver core: remove driver_set_override()
rpmsg: use generic driver_override infrastructure
Drivers: hv: vmbus: use generic driver_override infrastructure
cdx: use generic driver_override infrastructure
amba: use generic driver_override infrastructure
rust: devres: add 'static bound to Devres<T>
samples: rust: rust_driver_auxiliary: showcase lifetime-bound registration data
rust: auxiliary: generalize Registration over ForLt
rust: types: add `ForLt` trait for higher-ranked lifetime support
gpu: nova-core: separate driver type from driver data
samples: rust: rust_driver_pci: use HRT lifetime for Bar
rust: io: make IoMem and ExclusiveIoMem lifetime-parameterized
rust: pci: make Bar lifetime-parameterized
...
|
|
ath6kl_usb_create() currently creates ath6kl_wq with flags set to 0:
alloc_workqueue("ath6kl_wq", 0, 0)
This triggers a runtime warning in __alloc_workqueue() because the queue is
created with neither WQ_PERCPU nor WQ_UNBOUND set:
workqueue: ath6kl_wq is using neither WQ_PERCPU or WQ_UNBOUND.
Setting WQ_PERCPU.
Set WQ_PERCPU explicitly to match the actual execution model and remove the
warning during device probe. No functional change intended.
Fixes: 21c05ca88a54 ("workqueue: Add warnings and ensure one among WQ_PERCPU or WQ_UNBOUND is present")
Reported-by: syzbot+f80c62f371ba6a1e7d79@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/6a289c01.39669fcc.33b062.00aa.GAE@google.com/T/
Signed-off-by: wuyankun <wuyankun@uniontech.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
GRO_LEGACY_MAX_SIZE = 65536; total_len being 65536 is too big to fit
into a u16. As can be seen in skb_gro_receive, packets bigger or equal
to gro_max_size (or GRO_LEGACY_MAX_SIZE) are dropped with -E2BIG. Apply
the same boundary to geneve_post_decap_hint to avoid writing 65536 to a
16-bit iph->tot_len field with an overflow.
Fixes: fd0dd796576e ("geneve: use GRO hint option in the RX path")
Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260611192955.604661-3-alice.kernel@fastmail.im
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The hclge_main.c file has become very large,
so the fd code has been moved to a separate hclge_fd.c file.
This patch only moves the code and does not modify any functionality.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-7-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the tc tool only supports adding and deleting rules from
the driver but does not support querying rules from the driver.
This patch adds a rule dump file in debugfs to check whether the driver's
configuration matches the configuration issued by tc flow.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-6-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the driver does not support FLOW_DISSECTOR_KEY_IP and
FLOW_DISSECTOR_KEY_ENC_KEYID. But the hardware supports
ip_tos (FLOW_DISSECTOR_KEY_IP) and
outer_tun_vni (FLOW_DISSECTOR_KEY_ENC_KEYID).
This patch adds support for FLOW_DISSECTOR_KEY_IP and
FLOW_DISSECTOR_KEY_ENC_KEYID.
Additionally, since tc flow cannot effectively support
l2_user_def, l3_user_def, and l4_user_def,
this patch explicitly sets them to not be used.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the driver supports only one action:HCLGE_FD_ACTION_SELECT_TC.
This patch adds support for HCLGE_FD_ACTION_SELECT_QUEUE and
HCLGE_FD_ACTION_DROP_PACKET.
A rule can have only one action. Therefore, the driver intercepts rules
that have multiple actions or no action.
Note: The driver considers cls_flower->classid as an action:
HCLGE_FD_ACTION_SELECT_TC.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, when the tc tool is used to set flow table rules, the IP address
and MAC address can be configured separately, for example, src_xx or dst_xx
can be configured separately.
Therefore, the driver needs to check whether the mask is all zero in
keys, such as FLOW_DISSECTOR_KEY_IPV4_ADDRS, FLOW_DISSECTOR_KEY_IPV6_ADDRS,
and FLOW_DISSECTOR_KEY_ETH_ADDRS.
If the mask is all zero, the tuple is not configured.
In this case, the driver adds the tuple to unused_tuple.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Remove the tc parameter from the add_cls_flower() ops callback and
refactor action parsing to support future extensions for SELECT_QUEUE
and DROP_PACKET actions.
Changes:
* Remove the tc parameter from the add_cls_flower() callback signature.
* Extract TC-based action parsing into hclge_get_tc_flower_action().
* Move the dissector->used_keys check from hclge_parse_cls_flower() to
hclge_check_cls_flower(), and restrict ETH_ADDRS to
HCLGE_FD_MODE_DEPTH_2K_WIDTH_400B_STAGE_1 mode since hardware only
supports MAC matching there.
* Migrate error reporting from dev_err() to netlink extended ACK (extack).
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Link: https://patch.msgid.link/20260610060618.834987-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For both the join and leave paths, the logic goes through the following
steps: determines which FDB should be used on a port after the current
changeupper change, populate the private port structures with the new
FDB and, if necessary, make as not used the old FDB.
Instead of having two distinct paths inside the
dpaa2_switch_port_set_fdb() for linking=true and linking=false, unify
them. This will hopefully help in making this function easier to read.
No behavior changes are expected.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-6-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move the FDB selection for when a port leaves bridge into a new helper -
dpaa2_switch_fdb_for_leave(). This will hopefully make the
dpaa2_switch_port_set_fdb() function easier to read and follow. The new
helper only determines the FDB to be used, any updates into the private
port structure still gets done in the set_fdb() function.
No changes in the actual behavior are intended.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-5-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The dpaa2_switch_port_set_fdb() function handles the setup of the FDB
for both changeupper cases: join and leave. Move the code block which
handles the join path into a new helper - dpaa2_switch_fdb_for_join() -
with the hope that the entire function will become easier to read and
extend with other use cases in the future.
This new helper just determines and returns what FDB should be used for
a specific port, the cleanup of the old FDB and the actual setup in the
per port structure remains in the dpaa2_switch_port_set_fdb() function.
No changes in the actual behavior are intended.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-4-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The dpaa2_switch_port_set_fdb() function is hard to follow and
open-coding the in-use check into it makes it even harder to read.
Factor out that code block into a new helper -
dpaa2_switch_fdb_in_use_by_others().
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-3-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since there dpaa2_switch_port_set_fdb() never fails and its return value
was never checked, change its prototype to return void.
Also, instead of determining if the DPAA2 port is joining or leaving an
upper based on the value of the 'bridge_dev' parameter, add the
'linking' parameter to explicitly specify the action.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/20260610150912.1788482-2-ioana.ciornei@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix spelling errors in code comments:
- igc_diag.c: 'autonegotioation' -> 'autonegotiation'
- igc_main.c: 'revisons' -> 'revisions' (two occurrences)
Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-16-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix spelling errors in code comments:
- e1000_nvm.c: 'likley' -> 'likely'
- e1000_mac.c: 'auto-negotitation' -> 'auto-negotiation'
- e1000_mbx.h: 'exra' -> 'extra'
- e1000_defines.h: 'Aserted' -> 'Asserted'
Signed-off-by: Maximilian Pezzullo <maximilianpezzullo@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Joe Damato <joe@dama.to>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-15-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
[Why]
In e1000_set_eeprom(), the eeprom_buff is allocated to hold a range of
words. However, only the boundary words (the first and the last) are
populated from the EEPROM if the write request is not word-aligned.
The words in the middle of the buffer remain uninitialized because they
are intended to be completely overwritten by the new data via memcpy().
The previous implementation had a loop that performed le16_to_cpus()
on the entire buffer. This resulted in endianness conversion being
performed on uninitialized memory for all interior words.
Fix this by converting the endianness only for the boundary words
immediately after they are successfully read from the EEPROM.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Co-developed-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Agalakov Daniil <ade@amicon.ru>
Tested-by: Avigail Dahan <avigailx.dahan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-14-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
[Why]
In e1000_set_eeprom(), the eeprom_buff is allocated to hold a range of
words. However, only the boundary words (the first and the last) are
populated from the EEPROM if the write request is not word-aligned.
The words in the middle of the buffer remain uninitialized because they
are intended to be completely overwritten by the new data via memcpy().
The previous implementation had a loop that performed le16_to_cpus()
on the entire buffer. This resulted in endianness conversion being
performed on uninitialized memory for all interior words.
Fix this by converting the endianness only for the boundary words
immediately after they are successfully read from the EEPROM.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Co-developed-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Iskhakov Daniil <dish@amicon.ru>
Signed-off-by: Agalakov Daniil <ade@amicon.ru>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-13-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The __napi_schedule_irqoff() macro is intended to bypass saving and
restoring IRQ state when scheduling is requested from an IRQ handler,
where hard interrupts are already disabled. Use this macro in all three
interrupt handlers.
This was tested on a system with an I218-V and MSI interrupts. Because
this is an optimization, I was interested in measuring the impact, so I
added ktime_get() time measurement to e1000_intr_msi and a print of the
last sample in the watchdog task. For each test case I ran a
bi-directional iperf3 to saturate the line. With some help from awk,
here are the statistics.
49 samples each, all units ns
previous: min 678 max 1265 mean 879.429 median 806 stddev 137.188
noirq: min 707 max 1165 mean 811.857 median 790 stddev 89.486
According to this informal comparison, the mean time to handle an
interrupt from start to finish is improved by about 8% under load.
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Michal Cohen <michalx.cohen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20260609213559.178657-12-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|