.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB

=================================================
Mellanox ConnectX(R) mlx5 core VPI Network Driver
=================================================

Copyright (c) 2019, Mellanox Technologies LTD.

Contents
========

- `Enabling the driver and kconfig options`_
- `Devlink info`_
- `Devlink parameters`_
- `Devlink health reporters`_
- `mlx5 tracepoints`_

Enabling the driver and kconfig options
=======================================

| mlx5 core is modular and most of the major mlx5 core driver features can be selected (compiled in/out)
| at build time via kernel Kconfig flags.
| Basic features, ethernet net device rx/tx offloads and XDP, are available with the most basic flags
| CONFIG_MLX5_CORE=y/m and CONFIG_MLX5_CORE_EN=y.
| For the list of advanced features please see below.

**CONFIG_MLX5_CORE=(y/m/n)** (module mlx5_core.ko)

| The driver can be enabled by choosing CONFIG_MLX5_CORE=y/m in kernel config.
| This will provide the mlx5 core driver for mlx5 ulps to interface with (mlx5e, mlx5_ib).


**CONFIG_MLX5_CORE_EN=(y/n)**

| Choosing this option will allow basic ethernet netdevice support with all of the standard rx/tx offloads.
| mlx5e is the mlx5 ulp driver which provides the netdevice kernel interface. When chosen, mlx5e will be
| built into mlx5_core.ko.


**CONFIG_MLX5_EN_ARFS=(y/n)**

| Enables hardware-accelerated receive flow steering (arfs) support, and ntuple filtering.
| https://community.mellanox.com/s/article/howto-configure-arfs-on-connectx-4


**CONFIG_MLX5_EN_RXNFC=(y/n)**

| Enables ethtool receive network flow classification, which allows user defined
| flow rules to direct traffic into an arbitrary rx queue via the ethtool set/get_rxnfc API.
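
As a rough illustration of the set/get_rxnfc path described above, the generic
ethtool utility can add, list and delete such rules. The interface name
``eth0``, the TCP port, the rx queue index and the rule location are
placeholders chosen for this sketch and are not taken from this document::

    $ ethtool -N eth0 flow-type tcp4 dst-port 80 action 2 loc 5
    $ ethtool -n eth0
    $ ethtool -N eth0 delete 5

The first command steers TCP/IPv4 traffic with destination port 80 to rx queue 2
and stores the rule at location 5, the second lists the installed rules, and the
third removes the rule again.
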


**CONFIG_MLX5_CORE_EN_DCB=(y/n)**

| Enables `Data Center Bridging (DCB) Support <https://community.mellanox.com/s/article/howto-auto-config-pfc-and-ets-on-connectx-4-via-lldp-dcbx>`_.


**CONFIG_MLX5_MPFS=(y/n)**

| Ethernet Multi-Physical Function Switch (MPFS) support in ConnectX NIC.
| MPFS is required when `Multi-Host <http://www.mellanox.com/page/multihost>`_ configuration is enabled to allow passing
| user configured unicast MAC addresses to the requesting PF.


**CONFIG_MLX5_ESWITCH=(y/n)**

| Ethernet SRIOV E-Switch support in ConnectX NIC. E-Switch provides internal SRIOV packet steering
| and switching for the enabled VFs and PF in two available modes:
|   1) `Legacy SRIOV mode (L2 mac vlan steering based) <https://community.mellanox.com/s/article/howto-configure-sr-iov-for-connectx-4-connectx-5-with-kvm--ethernet-x>`_.
|   2) `Switchdev mode (eswitch offloads) <https://www.mellanox.com/related-docs/prod_software/ASAP2_Hardware_Offloading_for_vSwitches_User_Manual_v4.4.pdf>`_.


**CONFIG_MLX5_CORE_IPOIB=(y/n)**

| IPoIB offloads & acceleration support.
| Requires CONFIG_MLX5_CORE_EN to provide an accelerated interface for the rdma
| IPoIB ulp netdevice.


**CONFIG_MLX5_FPGA=(y/n)**

| Build support for the Innova family of network cards by Mellanox Technologies.
| Innova network cards consist of a ConnectX chip and an FPGA chip on one board.
| If you select this option, the mlx5_core driver will include the Innova FPGA core and allow
| building sandbox-specific client drivers.


**CONFIG_MLX5_EN_IPSEC=(y/n)**

| Enables `IPSec XFRM cryptography-offload acceleration <http://www.mellanox.com/related-docs/prod_software/Mellanox_Innova_IPsec_Ethernet_Adapter_Card_User_Manual.pdf>`_.

**CONFIG_MLX5_EN_TLS=(y/n)**

| TLS cryptography-offload acceleration.


**CONFIG_MLX5_INFINIBAND=(y/n/m)** (module mlx5_ib.ko)

| Provides low-level InfiniBand/RDMA and `RoCE <https://community.mellanox.com/s/article/recommended-network-configuration-examples-for-roce-deployment>`_ support.


**External options** ( Choose if the corresponding mlx5 feature is required )

- CONFIG_PTP_1588_CLOCK: When chosen, mlx5 ptp support will be enabled.
- CONFIG_VXLAN: When chosen, mlx5 vxlan support will be enabled.
- CONFIG_MLXFW: When chosen, mlx5 firmware flashing support will be enabled (via devlink and ethtool).

Devlink info
============

The devlink info command reports the running and stored firmware versions on the device.
It also prints the device PSID, which represents the HCA board type ID.

User command example::

    $ devlink dev info pci/0000:00:06.0
      pci/0000:00:06.0:
      driver mlx5_core
      versions:
          fixed:
            fw.psid MT_0000000009
          running:
            fw.version 16.26.0100
          stored:
            fw.version 16.26.0100

Devlink parameters
==================

flow_steering_mode: Device flow steering mode
---------------------------------------------
The flow steering mode parameter controls the flow steering mode of the driver.
Two modes are supported:

1. 'dmfs' - Device managed flow steering.
2. 'smfs' - Software/Driver managed flow steering.

In DMFS mode, the HW steering entities are created and managed through the
Firmware.
In SMFS mode, the HW steering entities are created and managed by
the driver directly in Hardware without firmware intervention.

SMFS mode is faster and provides a better rule insertion rate than the default DMFS mode.

User command examples:

- Set SMFS flow steering mode::

    $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime

- Read device flow steering mode::

    $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
      pci/0000:06:00.0:
      name flow_steering_mode type driver-specific
      values:
         cmode runtime value smfs

enable_roce: RoCE enablement state
----------------------------------
RoCE enablement state controls driver support for RoCE traffic.
When RoCE is disabled, there is no gid table, only raw ethernet QPs are supported, and traffic on the well known UDP RoCE port is handled as raw ethernet traffic.

To change RoCE enablement state a user must change the driverinit cmode value and run devlink reload.

User command examples:

- Disable RoCE::

    $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
    $ devlink dev reload pci/0000:06:00.0

- Read RoCE enablement state::

    $ devlink dev param show pci/0000:06:00.0 name enable_roce
      pci/0000:06:00.0:
      name enable_roce type generic
      values:
         cmode driverinit value true

Devlink health reporters
========================

tx reporter
-----------
The tx reporter is responsible for reporting and recovering from the following two error scenarios:

- TX timeout
    Report on kernel tx timeout detection.
    Recover by searching for lost interrupts.
- TX error completion
    Report on error tx completion.
    Recover by flushing the TX queue and resetting it.

The TX reporter also supports an on-demand diagnose callback, which provides
real time information on the status of its send queues.

User commands examples:

- Diagnose send queues status::

    $ devlink health diagnose pci/0000:82:00.0 reporter tx

NOTE: This command has valid output only when the interface is up; otherwise the output is empty.

- Show the number of tx errors indicated, the number of recover flows that ended successfully,
  whether autorecover is enabled, and the grace period since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter tx

rx reporter
-----------
The rx reporter is responsible for reporting and recovering from the following two error scenarios:

- RX queues initialization (population) timeout
    RX queue descriptor population on ring initialization is done in
    napi context by triggering an irq. If the minimum number of
    descriptors is not obtained, a timeout occurs, which may be
    recoverable by polling the EQ (Event Queue).
- RX completions with errors (reported by HW on interrupt context)
    Report on rx completion error.
    Recover (if needed) by flushing the related queue and resetting it.

The RX reporter also supports an on-demand diagnose callback, which
provides real time information on the status of its receive queues.

- Diagnose rx queues status, and corresponding completion queue::

    $ devlink health diagnose pci/0000:82:00.0 reporter rx

NOTE: This command has valid output only when the interface is up; otherwise the output is empty.

- Show the number of rx errors indicated, the number of recover flows that ended successfully,
  whether autorecover is enabled, and the grace period since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter rx

fw reporter
-----------
The fw reporter implements diagnose and dump callbacks.
It follows symptoms of fw error, such as fw syndrome, by triggering
a fw core dump and storing it into the dump buffer.
The fw reporter diagnose command can be triggered at any time by the user to check
the current fw status.

User commands examples:

- Check fw health status::

    $ devlink health diagnose pci/0000:82:00.0 reporter fw

- Read FW core dump if already stored or trigger new one::

    $ devlink health dump show pci/0000:82:00.0 reporter fw

NOTE: This command can run only on the PF which has fw tracer ownership,
running it on other PF or any VF will return "Operation not permitted".

fw fatal reporter
-----------------
The fw fatal reporter implements dump and recover callbacks.
It follows fatal error indications with a CR-space dump and a recover flow.
The CR-space dump uses the vsc interface, which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors.
The recover function runs a recover flow which reloads the driver and triggers a fw
reset if needed.

User commands examples:

- Run fw recover flow manually::

    $ devlink health recover pci/0000:82:00.0 reporter fw_fatal

- Read FW CR-space dump if already stored or trigger new one::

    $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal

NOTE: This command can run only on PF.

mlx5 tracepoints
================

The mlx5 driver provides internal tracepoints for tracking and debugging using
the kernel tracepoints interfaces (refer to Documentation/trace/ftrace.rst).

For the list of supported mlx5 events check /sys/kernel/debug/tracing/events/mlx5/

tc and eswitch offloads tracepoints:

- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::

    $ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6535 [019] ...1 2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT

- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::

    $ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6569 [010] .N.1 2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL

- mlx5e_stats_flower: trace flower stats request::

    $ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6546 [010] ...1 2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217

- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::

    $ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    kworker/u48:4-8806 [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1

- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::

    $ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    kworker/u48:7-2221 [009] ...1 1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
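
The events above can also be managed as a group. The following is a minimal
sketch that relies only on the generic ftrace event files, not on anything
mlx5-specific. It assumes debugfs is mounted at /sys/kernel/debug, that the
commands are run as root, and that the mlx5 events directory mentioned above
exists on the running kernel.

- Enable every mlx5 event at once and list what is currently enabled::

    $ echo 1 > /sys/kernel/debug/tracing/events/mlx5/enable
    $ cat /sys/kernel/debug/tracing/set_event

- Disable a single event while leaving the others enabled::

    $ echo '!mlx5:mlx5e_stats_flower' >> /sys/kernel/debug/tracing/set_event

- Disable all mlx5 events and clear the trace buffer::

    $ echo 0 > /sys/kernel/debug/tracing/events/mlx5/enable
    $ echo > /sys/kernel/debug/tracing/trace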