1 .. SPDX-License-Identifier: GPL-2.0
8 :Authors: - Linas Vepstas <linasvepstas@gmail.com>
9 - Richard Lary <rlary@us.ibm.com>
10 - Mike Mason <mmlnx@us.ibm.com>
14 PCI errors on the bus, such as parity errors on the data and address
16 chipsets are able to deal with these errors; these include PCI-E chipsets,
17 and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
18 pSeries boxes. A typical action taken is to disconnect the affected device,
22 offered, so that the affected PCI device(s) are reset and put back
24 between the affected device drivers and the PCI controller chip.
25 This document describes a generic API for notifying device drivers
31 is reported as soon as possible to all affected device drivers,
32 including multiple instances of a device driver on multi-function
33 cards. This allows device drivers to avoid deadlocking in spinloops,
34 waiting for some I/O-space register to change, when it never will.
39 is forced by the need to handle multi-function devices, that is,
40 devices that have multiple device drivers associated with them.
42 of reset it desires, the choices being a simple re-enabling of I/O
47 After a reset and/or a re-enabling of I/O, all drivers are
48 again notified, so that they may then perform any device setup/config
52 The biggest reason for choosing a kernel-based implementation rather
53 than a user-space implementation was the need to deal with bus
56 file system is disconnected, a user-space mechanism would have to go
59 from/reconnection to their underlying block device. By contrast,
60 bus errors are easy to manage in the device driver. Indeed, most
61 device drivers already handle very similar recovery procedures;
62 for example, the SCSI-generic layer already provides significant
69 Design and implementation details below, based on a chain of
74 pci_driver. A driver that fails to provide the structure is "non-aware",
100 PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
101 PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
102 PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
103 PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
104 PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
116 event will be platform-dependent, but will follow the general
120 -------------------
121 A PCI bus error is detected by the PCI hardware. On powerpc, the slot
127 --------------------
128 Platform calls the error_detected() callback on every instance of
131 At this point, the device might not be accessible anymore, depending on
132 the platform (the slot will be isolated on powerpc). The driver may
137 touch the device. Within this function and after it returns, the driver
144 - PCI_ERS_RESULT_CAN_RECOVER
149 - PCI_ERS_RESULT_NEED_RESET
152 - PCI_ERS_RESULT_DISCONNECT
155 The next step taken will depend on the result codes returned by the
158 If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
159 then the platform should re-enable IOs on the slot (or do nothing in
171 The current powerpc implementation assumes that a device driver will
174 thus, if one device sleeps/schedules, all devices are affected.
175 Doing better requires complex multi-threaded logic in the error
180 The current powerpc implementation doesn't much care if the device
182 a value of 0xff on read, and writes will be dropped. If more than
184 assumes that the device driver has gone into an infinite loop
186 get the device working again.
189 --------------------
190 The platform re-enables MMIO to the device (but typically not the
191 DMA), and then calls the mmio_enabled() callback on all affected
192 device drivers.
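
The notification flow described so far can be sketched as a small user-space model. This is an illustration only, not kernel code: the `model_*` names, the `struct model_driver` callback table, the counters, and the numeric values of the local enum are all assumptions made for this sketch. It shows only the two behaviours stated in the text: error_detected() is called on every driver instance, and mmio_enabled() is called on all of them only when every instance answered "can recover".

```c
#include <stddef.h>

/* Local stand-in for the result codes named in the text.
 * The numeric values are assumptions for this sketch only. */
enum model_result {
	MODEL_RESULT_NONE,
	MODEL_RESULT_CAN_RECOVER,
	MODEL_RESULT_NEED_RESET,
	MODEL_RESULT_DISCONNECT,
	MODEL_RESULT_RECOVERED,
};

/* Hypothetical per-driver bookkeeping, used only to observe the calls. */
struct model_counters {
	int detected;	/* times error_detected() ran */
	int mmio;	/* times mmio_enabled() ran */
};

/* Hypothetical callback table modelled on the callbacks named in the text. */
struct model_driver {
	enum model_result (*error_detected)(struct model_counters *c);
	enum model_result (*mmio_enabled)(struct model_counters *c);
	struct model_counters *ctx;
};

/* Sample callback: the driver believes it can recover without a reset. */
enum model_result model_can_recover(struct model_counters *c)
{
	c->detected++;
	return MODEL_RESULT_CAN_RECOVER;
}

/* Sample callback: the driver asks for a slot reset. */
enum model_result model_need_reset(struct model_counters *c)
{
	c->detected++;
	return MODEL_RESULT_NEED_RESET;
}

/* Sample callback: the driver reports recovery after MMIO is re-enabled. */
enum model_result model_mmio_ok(struct model_counters *c)
{
	c->mmio++;
	return MODEL_RESULT_RECOVERED;
}

/*
 * Notify every driver of the error, then call mmio_enabled() on all of
 * them only if every one returned CAN_RECOVER.
 * Returns 1 if the MMIO-enabled step ran, 0 otherwise.
 */
int model_notify_and_enable(struct model_driver *drv, size_t n)
{
	size_t i;
	int all_can_recover = 1;

	/* Every instance is notified, even after one has declined. */
	for (i = 0; i < n; i++)
		if (drv[i].error_detected(drv[i].ctx) != MODEL_RESULT_CAN_RECOVER)
			all_can_recover = 0;

	if (!all_can_recover)
		return 0;

	/* A real platform would re-enable MMIO to the slot at this point. */
	for (i = 0; i < n; i++)
		drv[i].mmio_enabled(drv[i].ctx);

	return 1;
}
```

With two drivers that both use `model_can_recover`, the function returns 1 and each `mmio` counter becomes 1. If either driver uses `model_need_reset` instead, both drivers are still notified of the error but the MMIO-enabled step is skipped.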
196 start operations again, only to peek/poke at the device, extract diagnostic
197 information, if any, and eventually do things like trigger a device local
199 all drivers on a segment agree that they can try to recover and if no automatic
200 link reset was performed by the HW. If the platform can't just re-enable IOs
211 such an error might cause IOs to be re-blocked for the whole
213 on the same segment might have done, forcing the whole segment
217 - PCI_ERS_RESULT_RECOVERED
218 Driver returns this if it thinks the device is fully
222 allowed to proceed, as another driver on the
224 slot reset on platforms that support it.
226 - PCI_ERS_RESULT_NEED_RESET
227 Driver returns this if it thinks the device is not
231 - PCI_ERS_RESULT_DISCONNECT
235 The next step taken depends on the results returned by the drivers.
243 ------------------
244 The platform resets the link. This is a PCI-Express specific step
249 ------------------
252 platform will perform a slot reset on the requesting PCI device(s).
254 will be platform-dependent. Upon completion of slot reset, the
255 platform will call the device slot_reset() callback.
263 power-on followed by power-on BIOS/system firmware initialization.
264 Soft reset is also known as hot-reset.
267 and results in device's state machines, hardware logic, port states and
276 performed by toggling the slot electrical power off/on.
280 a slot reset, the device driver will almost always use its standard
281 device initialization routines, and an unusual config space setup
284 This call gives drivers the chance to re-initialize the hardware
285 (re-download firmware, etc.). At this point, the driver may assume
288 memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
292 at this point. If all device drivers report success on this
297 it can't get the device operational after reset. If the platform
299 cycle) and then call slot_reset() again. If the device still can't
302 device will be considered "dead" in this case.
304 Drivers for multi-function cards will need to coordinate among
305 themselves as to which driver instance will perform any "one-shot"
306 or global device initialization. For example, the Symbios sym53c8xx_2
307 driver performs device init only from PCI function 0::
309 + if (PCI_FUNC(pdev->devfn) == 0)
313 - PCI_ERS_RESULT_DISCONNECT
323 + pdev->needs_freset = 1;
331 The current powerpc implementation does not try a power-cycle
337 -------------------------
338 The platform will call the resume() callback on all affected device
339 drivers if all drivers on the segment have returned
349 -------------------------
351 the device. The platform will call error_detected() with a
354 The device driver should, at this point, assume the worst. It should
355 cancel all pending I/O, refuse all new I/O, returning -EIO to
356 higher layers. The device driver should then clean up all of its
361 permanent failure in some way. If the device is hotplug-capable,
362 the operator will probably want to remove and replace the device.
364 caused by over-heating, some by a poorly seated card. Many
367 errors. See the discussion in Documentation/arch/powerpc/eeh-pci-error-recovery.rst
368 for additional detail on real-life experience of the causes of
373 ---------------------------
376 recover (disconnect them) and try to let other cards on the same segment
381 device is dead or has been isolated, there is a problem :)
385 - There is no guarantee that interrupt delivery can proceed from any
386 device on the segment starting from the error detection and until the
390 - There is no guarantee that interrupt delivery is stopped, that is,
397 interrupts are routed to error-management capable slots and can deal
407 the file Documentation/arch/powerpc/eeh-pci-error-recovery.rst
409 As of this writing, there is a growing list of device drivers with
413 - drivers/scsi/ipr
414 - drivers/scsi/sym53c8xx_2
415 - drivers/scsi/qla2xxx
416 - drivers/scsi/lpfc
417 - drivers/net/bnx2.c
418 - drivers/net/e100.c
419 - drivers/net/e1000
420 - drivers/net/e1000e
421 - drivers/net/ixgbe
422 - drivers/net/cxgb3
423 - drivers/net/s2io.c
429 - drivers/cxl/pci.c
432 -------