MCE test suite HOWTO ==================== 11 November 2008 Huang Ying Section 4.2 (Test with kdump test driver) is based on the README of LTP kdump test case. Abstract -------- This document explains the structure and design of MCE test suite, the kernel patch and user space tools needed for automatic tests, usage guide and how to add new test cases into test suite. 0. Quick shortcut ------------------ - Install the Linux kernel with full MCE injection support, including latest Linux kernel (2.6.31) and MCE injection enhancement patchset in: http://ftp.kernel.org/pub/linux/kernel/people/yhuang/mce/. Make sure following configuration options are enabled: CONFIG_X86_MCE=y CONFIG_X86_MCE_INTEL=y CONFIG_X86_MCE_INJECT=y or CONFIG_X86_MCE_INJECT=m - Get mcelog git version from git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git. and install in /usr/sbin (or rather first in your $PATH) git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git cd mcelog make sudo make install - Get mce-inject git version from git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git. git clone git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git cd mce-inject make sudo make install - Install page-types tool (sec 3.4), which is accompanied with Linux kernel source (2.6.32 or newer). cd $KERNEL_SRC/Documentation/vm/ gcc -o page-types page-types.c cp page-types /usr/bin/ - Run make test This will do the basic tests, but not the more complicated kdump ones. For more information on those read below. 1. Introduction --------------- The MCE test suite is a collection of tools and test scripts for testing the Linux kernel MCE processing features. The goal is to cover most Linux kernel MCE processing code paths and features with automation tests. If you just want to start testing as quickly as possible, you can skip section 2 and section 3, and go section 4.1 directly. 2. Structure ------------ The main intention behind the design is to re-use test cases amongst various test methods (represented as test drivers), such as kdump based, kernel MCE panic log (tolerant=3) based, etc. 2.1 Test cases Test cases are grouped into test case classes. The test cases in one class share the similar triggering, result collecting and result verifying methods. They can be used in same set of test drivers. The interface of a test case class is a shell script, usually named as cases.sh under a sub-directory of cases/. The following command line option should be supported by the test case class shell script: cases.sh enumerate enumerate test cases in class, print test case names to stdout cases.sh trigger trigger the test case cases.sh get_result get the result of test case cases.sh verify verify the result of test case, and print the verify result to stdout When execute cases.sh [trigger|get_result|verify], the test case is specified via environment variable this_case, which must be one of the test case names returned by "cases.sh enumerate". Other environment variables are also used to pass some information from driver to test cases, such as: this_case name current test case driver name of test driver klog file name which holds kernel log during test KSRC_DIR (for gcov) kernel source code directory GCOV (for gcov) gcov collection method vmcore (for kdump) vmcore file name reboot (for kdump) indicate there is a reboot between test case trigger and test case verify, some context has been gone. Several test case classes are provided with the test suite. cases/soft-inj/* is based on mce-inject MCE software injection tool. cases/apei-inj/* is based on apei-inj APEI haredware injection tool. cases///cases.sh Interface of the test case class cases///data/ Directory contains data file cases///refer/ Directory contains data file for reference MCE records if necessary. For document of various test cases, please refer to doc/cases/*. 2.2 Test drivers Test drivers drive the test procedure, its main structure is a loop over test case classes specified in configuration file. For each test case class, test driver loops over test cases returned by "cases.sh enumerate". And, for each test case, it calls "cases.sh" to trigger, get_result and verify the test case. Test driver also do some common work for test cases, such as kdump driver collects vmcore file, and invoking gcovdump command to get gcov data file. The interface of test driver is driver.sh, which is usually put in drivers// directory. The test configuration file should be used as the only command line parameter for driver.sh. Test case classes should be specified in test configuration file as CASES variable, details below. 2.3 Test configuration file Test configuration file is a shell script to specify parameters for test drivers and test cases. It must be put in config/ directory. The parameters are represented as shell variables as follow: CASES Name of test case classes, separate by white space. START_BACKGROUND Shell command to start a background process during testing, used for random testing.* STOP_BACKGROUND Shell command to stop the background process during testing. COREDIR (for kdump) directory contains Linux kernel crash core dump after kdump. VMLINUX (for kdump, gcov) vmlinux of Linux kernel GCOV (for gcov) Enable GCOV if set none zero. KSRC_DIR (for gcov) Kernel source code directory * To test MCE processing under random environment, a background process can be automatically run simultaneously during MCE testing. The start/stop command is specified via START_BACKGROUND and STOP_BACKGROUND. 2.4 Test result After test, the general test result will go results//result. The format of general test result is as follow: : Passed: item 1 description Failed: item 2 description ... Passed: item n description One blank line is used to separate test cases. Additional test result for various test cases will go "results///. For in-package test case class, additional test results include: results////klog Kernel log during testing results////mcelog mcelog output during testing results////mcelog_refer mcelog output reference results////mce_64.c.gcov (for gcov) gcov output file 3. Tools -------- 3.1 mce-inject mce-inject is a software MCE injection tool, which is based on Linux kernel software MCE injection mechanism. To inject a MCE into Linux kernel via mce-inject, a data file should be provided. The syntax is similar to the logging output by mcelog with some extensions. Please refer to the documentation of mce-inject for more information. The mce-inject program must be executable in $PATH. 3.2 mcelog mcelog read /dev/mcelog and prints the stored machine check records to stdout. It is used by MCE test suite to verify MCE records generated by kernel is same as reference records, at most time, same as input records. The current git mcelog version is needed for MCE test suite to work properly. Please refer to document of mcelog for more information. The latest mcelog can be gotten via git snapshot from git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git. Note you need the git version of mcelog available in $PATH. 3.3 gcovdump gcov is a test coverage tool, the original implementation is used for user space program only. LTP (Linux Test Project) provides the kernel gcov support. But MCE test involves panic or kdump, so gcovdump is developed to dump gcov data from kdump crash dump core. gcovdump has been merged by LTP cvs. For more information please refer to gcovdump document. The latest gcovdump can be gotten from cvs: http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/. 3.4 page-types A tool to query page types, which is accompanied with Linux kernel source (2.6.32 or newer, $KERNEL_SRC/Documentation/vm/page-types.c). It is required for MCE apei-inj testing. 4. Usage Guide -------------- 4.1 Test with simple test driver 4.1.1 Simple test driver The simple test driver just call cases.sh of test cases one by one in a loop. So it is not permitted for test cases to trigger real panic or reboot during test. For MCE testing, a special processing mode to just log everything in case of MCE is used for the simple test driver, it is enabled via set MCE parameter "tolerant=3" during testing. "tolerant" can be set via writing: /sys/devices/system/machinecheck/machinecheck0/tolerant 4.1.2 test instruction The following is the basic test instruction, for some additional features such as gcov support, please refer to corresponding instructions. a. Linux kernel and user space tools as follow should be installed - A Linux kernel with full MCE injection support (see 0) - mce-inject tool (see 3.1) - mcelog with proper version (see 3.2) - page-types (see 3.4) b. Modify config/simple.conf or create a new test configuration file. Refer to section 2.3 for more instruction about test configuration file. c. Run "make". Carefully check for any errors. d. It is recommended to stop cron before testing. Because there might be another mcelog reading events running on background by cron, which will upset the test. /etc/init.d/crond stop e. To be root and invoke simple test driver on test configuration file as follow Run "make test" to do all the standard tests that do not require special set up. f. General test result will go results/simple/result. Test log will go work/simple/log. Additional test results for various test cases will go results/simple//. For more details about in-package test case class, please refer to section 2.1. 4.2 Test with kdump test driver 4.2.1 kdump test driver The kdump test driver is based on the kdump test case in Linux Test Project, thank LTP for their excellent work! The kdump driver helps run tests which trigger crash/panic and generate result and report via kdump. The test scripts cycle through a series of crash/panic scenarios. Each test cycle does the following: a. Triggers a test case which triggers crash/panic (MCE with tolerant=1). b. Kdump kernel boots and saves a vmcore. c. System reboots to 1st kernel. d. Verifies test case, generate result and report. e. After a 1 to 2 minute delay, the next test case is run. 4.2.2 test instruction Follow the steps to setup kdump test driver. The test driver is written for SuSE Linux Enterprise Server 10 (and onward releases), OpenSUSE, Fedora, Debian, as well as RedHat Enterprise Linux 5. Since KDUMP is supported by the above mentioned distro's the test driver was written and tested on them. Contribution towards supporting more distributions are welcome. a. Install Linux kernel with full MCE injection and KDUMP support. In addition to MCE injection support in section 0, the following configuration options should be enabled too: CONFIG_KEXEC=y CONFIG_CRASH_DUMP=y b. Install these additional packages: For SLES10 or OpenSUSE Distro: * kernel-kdump * kernel-source * kexec-tools For RHEL5 or Fedora distro: * kexec-tools * kernel-devel c. Configure where to put the kdump /proc/vmcore files. The path should be specified via COREDIR in test configuration file. By default, the kdump /proc/vmcore files will be put into /var/crash. For SLES10 or OpenSUSE Distro: * edit KDUMP_SAVEDIR in /etc/sysconfig/kdump For RHEL5 or Fedora distro: * edit path in /etc/kdump.conf d. In addition to bzImage and modules of Linux kernel should be installed on test machine, the vmlinux of Linux kernel should be put on test machine and specified via VMLINUX in test configuration file. e. Make sure the partition where the test driver is running has space for the tests results and one vmcore file (size of physical memory). f. Now, reboot system. Test if kdump works by starting kdump and triggering kernel panic. For SLES10 or OpenSUSE Distro: service boot.kdump restart chkconfig boot.kdump on echo "c" > /proc/sysrq-trigger For RHEL5 or Fedora distro: service kdump restart /sbin/chkconfig kdump on echo "c" > /proc/sysrq-trigger After system reboot, check if there are vmcore files. By default, they are in /var/crash/*/. If yes, "kdump" works in the system. g. Create a new test configuration file or use a existing one in config/, such as kdump.conf. Note: not all test case classes can be used with kdump test driver, see "important points" below. h. Run "make". Carefully check for any errors. i. To be root and run "drivers/kdump/driver.sh " or "make test-kdump" (for a full test) j. After test is done, the test log of the last run of kdump driver will be displayed on main console. Few Important points to remember: - kdump test driver request that a real panic should be triggered when test case is triggered. So not all test case classes can be used with kdump test driver, for example, all test case classes for corrected MCE can not be used with kdump test driver. - If you need to stop the tests before all test cases have run, run "crontab -r" and "killall driver.sh" within 1 minute after the 1st kernel reboots. Then, if you'd like to carry on tests from that point on, run: rm work/kdump/stamps/setupped drivers/kdump/driver.sh If you'd like to start tests from the beginning, run: make reset drivers/kdump/driver.sh - If a failure occurs when booting the kdump kernel, you'll need to manually reset the system so it reboots back to the 1st kernel and continues on to the next test. For this reason, it's best to monitor the tests from a console. If possible, setup a serial console (not a must, any type of console setup will do). If using minicom, enable saving of kernel messages displayed on minicom into a file, by pressing ctrl+a+l on the console. Else, when it is observed that the kdump kernel has failed to boot, manually copy the boot message into a file to enable the debugging the cause of the hang. - The results are saved in results/kdump/result, which also shows where you are in the test run. When the "Test run complete" entry appears in that file, you're done. Verbose log can be found at work/log. - The test machine would be unavailable for any other work during the period of the test run. 4.3 Gcov support Gcov is a test coverage tool. It can be used to discover untested parts of program, collect branch taken statistics to optimize program, etc. In MCE test suite, it is used to get test coverage, that is, which C statements are covered by each test case. Gcov support is optional, if you don't care about test coverage information, just skip this section. a. Make sure your kernel has gcov support. You can find lasted kernel gcov patches from: http://ltp.sourceforge.net/coverage/gcov.php A README for kernel gcov can be found from: http://ltp.sourceforge.net/coverage/gcov/readme.php Notes: CONFIG_GCOV_ALL does not work for me. Add the line EXTRA_CFLAGS += $(KBUILD_GCOV_FLAGS) to the respective Makefiles are more stable. For example, this line can be added into "linux/arch/x86/kernel/cpu/mcheck/Makefile" b. If you want to use gcov with kdump test driver, please install gcovdump tool(see section 3.4). The latest gcovdump can be gotten from cvs: http://ltp.cvs.sourceforge.net/viewvc/ltp/utils/analysis/gcov-kdump/. c. Linux kernel source source code should be put on the test machine. Its root directory should be specified in test configuration file via KSRC_DIR. d. In addition to bzImage and modules of Linux kernel should be installed on test machine, the vmlinux of Linux kernel should be put on test machine and specified via VMLINUX of test configuration file. e. Make sure gcov is available in your test system. It comes with gcc package normally. If kdump test driver is used, a tool named gcovdump is also needed to dump *.gcda from crash dump image. f. In test configuration file, make sure the following setting is available: # enable GCOV support GCOV=1 # kernel source is needed to get gcov graph KSRC_DIR= VMLINUX= g. After testing, *.c.gcov will be generated in test case result directory, such as results/kdump/soft-inj/non-panic/corrected/mce_64.c.gcov. h. To merge gcov graph data from several test cases, a tool named gcov_merge.py in tools sub-directory can be used. For example, tools/gcov_merge results/kdump/soft-inj/*/*/mce_64.c.gcov Will output merged gcov graph from all test cases under soft-inj. This can be used to check coverage of several test cases. 4.4 tool Some tools are provided to help analyze test result. - tools/grep_result.sh Grep from general test result (results//result) in terms of test case instead of line, because the result of one test case may span several line. Usage: cat results//result | tools/grep_result.sh Where are same as options available to /bin/grep. - tools/loop-mce-test Run mce test cases in a loop. It exits on failure of any one of the test cases. This script is using simple test driver. Usage: ./loop-mce-test Note that only simple test configure file can be used here. 5. Add test cases ----------------- 5.1 Add test case to in-package test class All in-package test classes use mce-inject software injection tool and follows same structure. The steps to add a new test case is as follow: a. Find an appropriate test case class to add your test case. b. Add a new mce-inject data file into to cases/soft-inj//data/. c. If the reference mcelog is different from mce-inject input data file, put that reference file into cases/soft-inj//refer/. d. In cases/soft-inj//cases.sh, there are shell commands "case" in shell functions get_result() and verify(). Add a branch in each shell command "case" for your test case. 5.2 Add test class To add a new test class, add a cases.sh under a sub-directory of cases/, and follow the test case class interface definition in section 2.1. The general result output format should follow that in section 2.4.