Mellanox nmlx5_core driver 4.23 issues on ESXi 8.0 Update 1
Problem Inventory - Mellanox Driver Update on ESXi 8.0u1 causing network virtualization issues
After installing ESXi 8.0 Update 1, the following issues start to appear on adapters that use the nmlx5_core driver:
- Delayed or failed IP discovery on VLAN-backed segments, even within the same host. Once an address is in the ARP cache, the issue no longer occurs
- Delayed or failed IP discovery and IP allocation failures on VLAN trunked port-groups, even within the same host. These issues persist even after IP discovery is established
- Overlay encapsulation offload failures:
  - ICMP with any payload size functions bidirectionally via Edge Transport Nodes / FRR Linux machines, but TCP and UDP do not
  - All overlay traffic encapsulated by a vSphere host flows correctly between workloads on the same NSX overlay segment
  - All overlay traffic encapsulated by a vSphere host flows correctly between segments on the same NSX distributed router
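The "ICMP works, TCP does not" symptom can be checked from a guest workload with a small probe. The following is a minimal sketch, not a tool used in the original investigation: the function names `tcp_reachable` and `matches_offload_symptom` are hypothetical, and the ICMP result is assumed to come from a separate `ping` run, since raw ICMP sockets usually need elevated privileges.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        # create_connection performs the full three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def matches_offload_symptom(icmp_ok: bool, tcp_ok: bool) -> bool:
    """True when ICMP passes but TCP fails, the pattern described above.

    icmp_ok is assumed to be supplied by the operator from a ping test.
    """
    return icmp_ok and not tcp_ok
```

For example, if a ping to a workload behind the Edge succeeds, `matches_offload_symptom(True, tcp_reachable(target, 22))` returning `True` is consistent with the failure described here, where `target` is whatever address and listening port exist in your environment. It is only an indicator, since a firewall rule or a stopped service produces the same result.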
These issues are seen on the following hardware models:
MCX4121A-ACAT, firmware revisions 14.25 and 14.32
These issues are experienced with the upgrade to vSphere 8.0 Update 1, which includes the following updated driver:
nmlx5-core 4.23.0.36-8vmw.800.1.0.20513097
This driver from NVIDIA ships with support for both BlueField SmartNIC and ConnectX Generation 5 network adapters as one package. Rolling back to a previous release of ESXi 8 with the previous driver (nmlx5-core 4.22) immediately resolves all overlay issues.
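When auditing several hosts, it can help to classify the installed driver version string (as reported by `esxcli software vib list`) against the build named above. This is a rough sketch under stated assumptions: the function names and the three result labels are hypothetical, and only the exact build from this post is treated as known-affected. Other 4.23 builds are flagged for manual review, because this post does not say which driver build ships with the fixed release.

```python
import re
from typing import Tuple

# The only build this post identifies as affected.
KNOWN_AFFECTED_BUILD = "4.23.0.36-8vmw.800.1.0.20513097"
AFFECTED_SERIES = (4, 23)


def driver_series(vib_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a VIB version such as '4.23.0.36-8vmw...'."""
    match = re.match(r"(\d+)\.(\d+)\.", vib_version.strip())
    if match is None:
        raise ValueError(f"unrecognized driver version: {vib_version!r}")
    return int(match.group(1)), int(match.group(2))


def classify_driver(vib_version: str) -> str:
    """Return 'affected', 'review', or 'not-listed' (hypothetical labels)."""
    if vib_version.strip() == KNOWN_AFFECTED_BUILD:
        return "affected"
    if driver_series(vib_version) == AFFECTED_SERIES:
        # Same series but a different build: not confirmed either way here.
        return "review"
    return "not-listed"
```

A host reporting the exact string above would classify as `affected`, while a host still on a 4.22 build would classify as `not-listed`.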
Resolution
UPDATE: This problem has been resolved with ESXi 8.0 Update 1c.