Lines Matching +full:system +full:- +full:control

14   is the probability that a system will produce correct outputs.
20 is the probability that a system is operational at a given time
27 is the simplicity and speed with which a system can be repaired or
33 -------------
35 To reduce system downtime, a system should be capable of detecting
38 the system administrator to take the action of replacing a component before
39 it causes data loss or system downtime.
47 Self-Monitoring, Analysis and Reporting Technology (SMART).
55 ---------------
68 * **Correctable Error (CE)** - the error detection mechanism detected and
70 Kernel mechanisms allow the system administrator to consider them as fatal.
72 * **Uncorrected Error (UE)** - the number of errors exceeded the error
73 correction threshold, and the system was unable to auto-correct.
75 * **Fatal Error** - when a UE happens on a critical component of the
76 system (for example, a piece of the Kernel got corrupted by a UE), the
79 * **Non-fatal Error** - when a UE happens on an unused component,
80 like a CPU in power down state or an unused memory bank, the system may
87 The mechanism for handling non-fatal errors is usually complex and may
89 policy desired by the system administrator.
92 ------------------------------------
94 Just detecting a hardware flaw is usually not enough, as the system needs
113 Locator: ChannelA-DIMM0
121 In the above example, a DDR4 SO-DIMM memory module is located at the
122 system's memory labeled as "BANK 0", as given by the *bank locator* field.
123 Please note that, on such a system, the *total width* is equal to the
146 There, the DDR3 RDIMM memory module is located at the system's memory labeled
150 This kind of memory is called Error-correcting code memory (ECC memory).
153 labels on their system's board to use exactly the same BIOS, meaning that
157 ----------
187 mode called "Lock-Step", where it groups two memory modules together,
188 doing 128-bit reads/writes. That gives 16 bits for error correction, with
195 the system checks both memory modules, in order to check if both provide
198 memory modules (or 4 memory modules, if the system is also on Lock-step
204 EDAC - Error Detection And Correction
210 was "out-of-tree" and maintained at http://bluesmoke.sourceforge.net.
218 -------
221 that occur within the computer system running under Linux.
224 ------
232 CE events only, the system can and will continue to operate as no data
237 and system panics.
240 -----------------------
245 This new device type allows for non-memory types of ECC hardware detectors
257 ----------------
263 There are several add-in adapters that do **not** follow the PCI specification
280 ----------
283 Controller (MC) driver modules. On a given system, the CORE is loaded
288 Thus, to "report" on what version a system is running, one must report
293 -------
298 hardware-specific modules and have the dependencies load the necessary
310 ---------------
312 EDAC presents a ``sysfs`` interface for control and reporting purposes. It
313 lives in the /sys/devices/system/edac directory.
318 mc memory controller(s) system
319 pci PCI control and status system
325 ----------------------------
328 are laid out in a Chip-Select Row (``csrowX``) and Channel table (``chX``).
331 .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely
333 packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI
346 for more than 2 channels, like Fully Buffered DIMMs (FB-DIMMs) memory
349 +------------+-----------------------+
351 +------------+-----------+-----------+
355 +------------+-----------+-----------+
357 +------------+-----------+-----------+
359 +------------+-----------+-----------+
361 +------------+-----------+-----------+
363 +------------+-----------+-----------+
365 +------------+-----------+-----------+
370 +---------+---------+
372 +---------+---------+
374 +---------+---------+
376 Labels for these slots are usually silk-screened on the motherboard.
394 ``/sys/devices/system/edac/mc``, each memory controller will be
400 |->mc0
401 |->mc1
402 |->mc2
410 |->csrow0
411 |->csrow2
412 |->csrow3
417 order to have dual-channel mode be operational. Since both csrow2 and
422 control and attribute files.
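
The ``mcX``/``csrowX`` layout shown above can be walked from user space. The following is only a minimal sketch: the function name ``list_memory_controllers`` and the ``edac_root`` parameter are illustrative assumptions rather than part of any EDAC tool, and the directory names are taken from the listing above.

```python
import re
from pathlib import Path


def _index(name, prefix):
    """Return the numeric suffix of names such as 'mc0' or 'csrow3'."""
    return int(name[len(prefix):])


def list_memory_controllers(edac_root="/sys/devices/system/edac"):
    """Map each mcX directory name to the csrowX names found inside it.

    `edac_root` is a parameter (an assumption of this sketch) so that the
    function can also be pointed at a copy of the tree.
    """
    mc_dir = Path(edac_root) / "mc"
    layout = {}
    if not mc_dir.is_dir():
        # No EDAC memory controller driver is loaded (or the path is wrong).
        return layout
    controllers = [p for p in mc_dir.iterdir()
                   if p.is_dir() and re.fullmatch(r"mc\d+", p.name)]
    for mc in sorted(controllers, key=lambda p: _index(p.name, "mc")):
        rows = [c.name for c in mc.iterdir()
                if c.is_dir() and re.fullmatch(r"csrow\d+", c.name)]
        layout[mc.name] = sorted(rows, key=lambda n: _index(n, "csrow"))
    return layout
```

On a machine without a loaded memory controller driver the function simply returns an empty dictionary.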
425 -------------------
427 In ``mcX`` directories are EDAC control and attribute files for
432 Documentation/ABI/testing/sysfs-devices-edac
436 ----------------------------------
441 A typical EDAC system has the following structure under
442 ``/sys/devices/system/edac/``\ [#f6]_::
444 /sys/devices/system/edac/
491 In the ``dimmX`` directories are EDAC control and attribute files for
494 - ``size`` - Total memory managed by this csrow attribute file
499 - ``dimm_ue_count`` - Uncorrectable Errors count attribute file
504 will panic the system.
506 - ``dimm_ce_count`` - Correctable Errors count attribute file
512 monitored for non-zero values, and such information reported
513 to the system administrator.
515 - ``dimm_dev_type`` - Device type attribute file
521 - x1
522 - x2
523 - x4
524 - x8
526 - ``dimm_edac_mode`` - EDAC Mode of operation attribute file
531 - ``dimm_label`` - memory module label control file
533 This control file allows this DIMM to have a label assigned
535 the output can provide the DIMM label in the system log.
545 - ``dimm_location`` - location of the memory module
552 - *csrow* and *channel* - used when the memory controller
553 doesn't identify a single DIMM - e.g. in ``rankX`` dir;
554 - *branch*, *channel*, *slot* - typically used on FB-DIMM memory
556 - *channel*, *slot* - used on Nehalem and newer Intel drivers.
558 - ``dimm_mem_type`` - Memory Type attribute file
564 - Registered-DDR
565 - Unbuffered-DDR
577 ----------------------
580 directories. As this API doesn't work properly for Rambus, FB-DIMMs and
584 In the ``csrowX`` directories are EDAC control and attribute files for
588 - ``ue_count`` - Total Uncorrectable Errors count attribute file
593 will panic the system.
596 - ``ce_count`` - Total Correctable Errors count attribute file
602 monitored for non-zero values, and such information reported
603 to the system administrator.
606 - ``size_mb`` - Total memory managed by this csrow attribute file
612 - ``mem_type`` - Memory Type attribute file
618 - Registered-DDR
619 - Unbuffered-DDR
622 - ``edac_mode`` - EDAC Mode of operation attribute file
628 - ``dev_type`` - Device type attribute file
634 - x1
635 - x2
636 - x4
637 - x8
640 - ``ch0_ce_count`` - Channel 0 CE Count attribute file
646 - ``ch0_ue_count`` - Channel 0 UE Count attribute file
652 - ``ch0_dimm_label`` - Channel 0 DIMM Label control file
655 This control file allows this DIMM to have a label assigned
657 the output can provide the DIMM label in the system log.
668 - ``ch1_ce_count`` - Channel 1 CE Count attribute file
675 - ``ch1_ue_count`` - Channel 1 UE Count attribute file
682 - ``ch1_dimm_label`` - Channel 1 DIMM Label control file
684 This control file allows this DIMM to have a label assigned
686 the output can provide the DIMM label in the system log.
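
The advice to monitor ``ce_count`` and ``ue_count`` for non-zero values could be automated along these lines. This is a hedged sketch, not an official tool: the helper names (``read_count``, ``csrow_error_counts``, ``rows_needing_attention``) are hypothetical, and only the ``ce_count`` and ``ue_count`` file names come from the attribute list above.

```python
from pathlib import Path


def read_count(path):
    """Read one counter attribute file; return None if missing or unparsable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def csrow_error_counts(mc_path):
    """Collect ce_count and ue_count for every csrowX directory under mc_path."""
    counts = {}
    for csrow in sorted(Path(mc_path).glob("csrow*")):
        if csrow.is_dir():
            counts[csrow.name] = {
                "ce_count": read_count(csrow / "ce_count"),
                "ue_count": read_count(csrow / "ue_count"),
            }
    return counts


def rows_needing_attention(counts):
    """Return the csrow names whose CE or UE counter is non-zero."""
    return sorted(
        name for name, c in counts.items()
        if (c["ce_count"] or 0) > 0 or (c["ue_count"] or 0) > 0
    )
```

A periodic job could call ``rows_needing_attention(csrow_error_counts("/sys/devices/system/edac/mc/mc0"))`` and notify the administrator when the result is not empty.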
697 System Logging
698 --------------
700 If logging for UEs and CEs is enabled, then system logs will contain
709 +---------------------------------------+-------------+
713 +---------------------------------------+-------------+
715 +---------------------------------------+-------------+
717 +---------------------------------------+-------------+
719 +---------------------------------------+-------------+
722 +---------------------------------------+-------------+
724 +---------------------------------------+-------------+
726 +---------------------------------------+-------------+
728 +---------------------------------------+-------------+
730 +---------------------------------------+-------------+
731 | And then an optional, driver-specific | |
734 +---------------------------------------+-------------+
737 type, a notice of "no info" and then an optional, driver-specific error
742 ------------------------
752 -------------------
754 Under ``/sys/devices/system/edac/pci`` are control and attribute files as
758 - ``check_pci_parity`` - Enable/Disable PCI Parity checking control file
760 This control file enables or disables the PCI Bus Parity scanning
766 echo "1" >/sys/devices/system/edac/pci/check_pci_parity
770 echo "0" >/sys/devices/system/edac/pci/check_pci_parity
773 - ``pci_parity_count`` - Parity Count
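
The two ``echo`` commands for ``check_pci_parity`` can be wrapped in a small helper. The sketch below is illustrative only: the function names and the ``pci_dir`` parameter are assumptions, writing to the real sysfs file normally requires root, and the only file names used are the two listed above.

```python
from pathlib import Path

DEFAULT_PCI_DIR = "/sys/devices/system/edac/pci"


def set_pci_parity_check(enabled, pci_dir=DEFAULT_PCI_DIR):
    """Write "1" or "0" to check_pci_parity, like the echo commands above."""
    value = "1" if enabled else "0"
    (Path(pci_dir) / "check_pci_parity").write_text(value + "\n")
    return value


def read_pci_parity_count(pci_dir=DEFAULT_PCI_DIR):
    """Return pci_parity_count as an integer."""
    return int((Path(pci_dir) / "pci_parity_count").read_text().strip())
```

Keeping the directory as a parameter makes it possible to try the helper against a scratch directory before touching the live interface.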
780 -----------------
782 - ``edac_mc_panic_on_ue`` - Panic on UE control file
786 occurs - it is indeterminate what was uncorrected and the operating
787 system context might be so mangled that continuing will lead to further
800 - ``edac_mc_log_ue`` - Log UE control file
804 are reported through the system message log. UE statistics
816 - ``edac_mc_log_ce`` - Log CE control file
820 errors are reported through the system message log.
832 - ``edac_mc_poll_msec`` - Polling period control file
851 - ``panic_on_pci_parity`` - Panic on PCI PARITY Error
854 This control file enables or disables panicking when a parity
873 ----------------
880 At the location ``/sys/devices/system/edac`` (sysfs) new edac_device devices
887 /sys/devices/system/edac/test-instance
897 panic_on_ue boolean to ``panic`` the system if a UE is encountered
902 The test_device_edac device adds at least one of its own custom controls:
909 One out-of-tree driver uses controls here to allow
917 ---------
922 +----------------+
923 | test-instance0 |
924 +----------------+
936 ------
941 +-------------+
942 | test-block0 |
943 +-------------+
955 The ``test_device_edac`` device adds 4 attributes and 1 control:
958 test-block-bits-0 for every POLL cycle this counter
960 test-block-bits-1 every 10 cycles, this counter is bumped once,
961 and test-block-bits-0 is set to 0
962 test-block-bits-2 every 100 cycles, this counter is bumped once,
963 and test-block-bits-1 is set to 0
964 test-block-bits-3 every 1000 cycles, this counter is bumped once,
965 and test-block-bits-2 is set to 0
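
Read together, the four ``test-block-bits-N`` counters behave like a decimal odometer driven by the poll cycle. The simulation below is only one reading of the description above, not code from the driver; ``simulate_test_block_bits`` is a hypothetical name, and the sketch assumes ``test-block-bits-3`` never rolls over, since no rule for it is given.

```python
def simulate_test_block_bits(poll_cycles):
    """Return [bits-0, bits-1, bits-2, bits-3] after `poll_cycles` polls.

    Every poll bumps bits-0. When a counter reaches 10 it is set to 0 and
    the next counter is bumped once, so bits-1 advances every 10 cycles,
    bits-2 every 100 cycles and bits-3 every 1000 cycles.
    """
    counters = [0, 0, 0, 0]
    for _ in range(poll_cycles):
        counters[0] += 1
        for i in range(3):  # bits-3 has no rollover rule in the text
            if counters[i] == 10:
                counters[i] = 0
                counters[i + 1] += 1
    return counters
```

For example, after 1234 poll cycles the sketch yields ``[4, 3, 2, 1]``.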
970 reset-counters writing any value to this control will
983 --------------------------------------------------
1037 ``/sys/devices/system/edac/mc/mc?/``:
1039 - ``inject_addrmatch/*``:
1057 echo 2 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/dimm
1058 echo 1 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/rank
1062 echo any >/sys/devices/system/edac/mc/mc0/inject_addrmatch/dimm
1063 echo any >/sys/devices/system/edac/mc/mc0/inject_addrmatch/rank
1065 - ``inject_eccmask``:
1068 - ``inject_section``:
1075 - ``inject_type``:
1078 bit 0 - repeat
1079 bit 1 - ecc
1080 bit 2 - parity
1082 - ``inject_enable``:
1094 echo 2 >/sys/devices/system/edac/mc/mc0/inject_addrmatch/channel
1095 echo 2 >/sys/devices/system/edac/mc/mc0/inject_type
1096 echo 64 >/sys/devices/system/edac/mc/mc0/inject_eccmask
1097 echo 3 >/sys/devices/system/edac/mc/mc0/inject_section
1098 echo 1 >/sys/devices/system/edac/mc/mc0/inject_enable
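
The ``echo`` sequence above can be expressed as an ordered plan in which ``inject_enable`` is written last, after the match and type files are set. This is a hedged sketch: the constant and function names are assumptions made for illustration, the bit values follow the ``inject_type`` list above, and the files only exist on drivers that expose this injection interface.

```python
from pathlib import Path

# Bit positions of inject_type, as listed above.
INJECT_REPEAT = 1 << 0
INJECT_ECC = 1 << 1
INJECT_PARITY = 1 << 2


def plan_injection(channel, inject_type, eccmask, section):
    """Return (relative file, value) pairs in the order they should be written."""
    return [
        ("inject_addrmatch/channel", str(channel)),
        ("inject_type", str(inject_type)),
        ("inject_eccmask", str(eccmask)),
        ("inject_section", str(section)),
        ("inject_enable", "1"),  # arm the injection only after everything is set
    ]


def apply_injection(mc_path, plan):
    """Write each planned value under mc_path (needs root on a real system)."""
    for rel, value in plan:
        (Path(mc_path) / rel).write_text(value + "\n")
```

``plan_injection(2, INJECT_ECC, 64, 3)`` reproduces the five writes shown above; ``apply_injection("/sys/devices/system/edac/mc/mc0", plan)`` would then perform them.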
1106 …EDAC MC0: UE row 0, channel-a= 0 channel-b= 0 labels "-": NON_FATAL (addr = 0x0075b980, socket=0, …
1121 $ for i in /sys/devices/system/edac/mc/mc0/all_channel_counts/*; do echo $i; cat $i; done
1122 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
1124 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
1126 /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2
1155 ------------------------------------------
1158 (available from http://support.amd.com/en-us/search/tech-docs):
1181 Models 30h-3Fh Processors
1185 :Link: http://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf
1188 Models 60h-6Fh Processors
1192 :Link: http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf
1195 Models 00h-0Fh Processors
1206 - 7 Dec 2005
1207 - 17 Jul 2007 Updated
1211 - 05 Aug 2009 Nehalem interface
1212 - 26 Oct 2016 Converted to ReST and cleanups at the Nehalem section
1216 - Doug Thompson, Dave Jiang, Dave Peterson et al.,
1217 - Mauro Carvalho Chehab
1218 - Borislav Petkov
1219 - original author: Thayne Harbaugh