
Date  : 2004-Nov-26
Author: Gerald Schaefer (geraldsc@de.ibm.com)


	     Linux API for read access to z/VM Monitor Records
	     =================================================


Description
===========
This item delivers a new Linux API in the form of a misc char device that is
usable from user space and allows read access to the z/VM Monitor Records
collected by the *MONITOR System Service of z/VM.


User Requirements
=================
The z/VM guest on which you want to access this API needs to be configured in
order to allow IUCV connections to the *MONITOR service, i.e. it needs the
IUCV *MONITOR statement in its user entry. If the monitor DCSS to be used is
restricted (likely), you also need the NAMESAVE <DCSS NAME> statement.
This item will use the IUCV device driver to access the z/VM services, so you
need a kernel with IUCV support. You also need z/VM version 4.4 or 5.1.

There are two options for being able to load the monitor DCSS (examples assume
that the monitor DCSS begins at 144 MB and ends at 152 MB). You can query the
location of the monitor DCSS with the Class E privileged CP command Q NSS MAP
(the values BEGPAG and ENDPAG are given in units of 4K pages).

See also "CP Command and Utility Reference" (SC24-6081-00) for more information
on the DEF STOR and Q NSS MAP commands, as well as "Saved Segments Planning
and Administration" (SC24-6116-00) for more information on DCSSes.

1st option:
-----------
You can use the CP command DEF STOR CONFIG to define a "memory hole" in your
guest virtual storage around the address range of the DCSS.

Example: DEF STOR CONFIG 0.140M 200M.200M

This defines two blocks of storage: the first is 140MB in size and begins at
address 0MB, the second is 200MB in size and begins at address 200MB,
resulting in a total storage of 340MB. Note that the first block should
always start at 0 and be at least 64MB in size.

2nd option:
-----------
Your guest virtual storage has to end below the starting address of the DCSS
and you have to specify the "mem=" kernel parameter in your parmfile with a
value greater than the ending address of the DCSS.

Example: DEF STOR 140M

This defines 140MB storage size for your guest. The parameter "mem=160M" is
then added to the parmfile.


User Interface
==============
The char device is implemented as a kernel module named "monreader",
which can be loaded via the modprobe command, or it can be compiled into the
kernel instead. There is one optional module (or kernel) parameter, "mondcss",
to specify the name of the monitor DCSS. If the module is compiled into the
kernel, the kernel parameter "monreader.mondcss=<DCSS NAME>" can be specified
in the parmfile.

The default name for the DCSS is "MONDCSS" if none is specified. If other
users are already connected to the *MONITOR service (e.g. Performance
Toolkit), the monitor DCSS is already defined and you have to use the same
DCSS. The CP command Q MONITOR (Class E privileged) shows the name of the
monitor DCSS, if already defined, and the users connected to the *MONITOR
service.
Refer to the "z/VM Performance" book (SC24-6109-00) on how to create a monitor
DCSS if your z/VM doesn't have one already. You need Class E privileges to
define and save a DCSS.

Example:
--------
modprobe monreader mondcss=MYDCSS

This loads the module and sets the DCSS name to "MYDCSS".

NOTE:
-----
This API provides no interface to control the *MONITOR service, e.g. to specify
which data should be collected. This can be done by the CP command MONITOR
(Class E privileged), see "CP Command and Utility Reference".

Device nodes with udev:
-----------------------
After loading the module, a char device will be created along with the device
node /<udev directory>/monreader.

Device nodes without udev:
--------------------------
If your distribution does not support udev, a device node will not be created
automatically and you have to create it manually after loading the module.
To do so, you need to know the major and minor numbers of the device. These
numbers can be found in /sys/class/misc/monreader/dev.
Typing cat /sys/class/misc/monreader/dev will give an output of the form
<major>:<minor>. The device node can be created via the mknod command: enter
mknod <name> c <major> <minor>, where <name> is the name of the device node
to be created.

Example:
--------
# modprobe monreader
# cat /sys/class/misc/monreader/dev
10:63
# mknod /dev/monreader c 10 63

This loads the module with the default monitor DCSS (MONDCSS) and creates a
device node.

File operations:
----------------
The following file operations are supported: open, release, read, poll.
There are two alternative methods for reading: either non-blocking read in
conjunction with polling, or blocking read without polling. IOCTLs are not
supported.

Read:
-----
Reading from the device provides a 12 Byte monitor control element (MCE),
followed by a set of one or more contiguous monitor records (similar to the
output of the CMS utility MONWRITE without the 4K control blocks). The MCE
contains information on the type of the following record set (sample/event
data), the monitor domains contained within it and the start and end address
of the record set in the monitor DCSS. The start and end address can be used
to determine the size of the record set; the end address is the address of the
last byte of data. The start address is needed to handle "end-of-frame" records
correctly (domain 1, record 13), i.e. it can be used to determine the record
start offset relative to a 4K page (frame) boundary.
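
The size and frame-offset calculations described above can be illustrated with
a small C sketch. The field positions used here are an assumption made for
illustration only (start address in bytes 4..7 and end address in bytes 8..11
of the MCE, both 32-bit big-endian); verify them against "Appendix A: *MONITOR"
in the "z/VM Performance" document. The function names are hypothetical.

```c
#include <stdint.h>

/* Assemble a 32-bit big-endian value from four bytes. */
static uint32_t be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * ASSUMED layout (not confirmed by this document):
 *   mce[4..7]  = start address of the record set in the monitor DCSS
 *   mce[8..11] = end address, i.e. address of the last byte of data
 */
uint32_t mce_record_set_size(const unsigned char mce[12])
{
	/* The end address is inclusive, hence the "+ 1". */
	return be32(mce + 8) - be32(mce + 4) + 1;
}

/* Offset of the record set start relative to its 4K page (frame). */
uint32_t mce_frame_offset(const unsigned char mce[12])
{
	return be32(mce + 4) & 0xFFFu;
}
```

With a start address of 0x09000100 and an end address of 0x090001FF, these
helpers yield a record set size of 256 bytes and a frame offset of 0x100.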

See "Appendix A: *MONITOR" in the "z/VM Performance" document for a description
of the monitor control element layout. The layout of the monitor records can
be found here (z/VM 5.1): http://www.vm.ibm.com/pubs/mon510/index.html

The layout of the data stream provided by the monreader device is as follows:
...
<0 byte read>
<first MCE>              \
<first set of records>    |
...                       |- data set
<last MCE>                |
<last set of records>    /
<0 byte read>
...

There may be more than one combination of MCE and corresponding record set
within one data set and the end of each data set is indicated by a successful
read with a return value of 0 (0 byte read).
Any received data must be considered invalid until a complete set was
read successfully, including the closing 0 byte read. Therefore you should
always read the complete set into a buffer before processing the data.

The maximum size of a data set can be as large as the size of the
monitor DCSS, so design the buffer adequately or use dynamic memory allocation.
The size of the monitor DCSS will be printed into syslog after loading the
module. You can also use the (Class E privileged) CP command Q NSS MAP to
list all available segments and information about them.

As with most char devices, error conditions are indicated by returning a
negative value for the number of bytes read. In this case, the errno variable
indicates the error condition:

EIO:       reply failed, read data is invalid and the application
           should discard the data read since the last successful read with 0 size.
EFAULT:    copy_to_user failed, read data is invalid and the application should
           discard the data read since the last successful read with 0 size.
EAGAIN:    occurs on a non-blocking read if there is no data available at the
           moment. There is no data missing or corrupted, just try again or rather
           use polling for non-blocking reads.
EOVERFLOW: message limit reached, the data read since the last successful
           read with 0 size is valid but subsequent records may be missing.

In the last case (EOVERFLOW) there may be missing data, in the first two cases
(EIO, EFAULT) there will be missing data. It's up to the application if it will
continue reading subsequent data or rather exit.

Open:
-----
Only one user is allowed to open the char device. If it is already in use, the
open function will fail (return a negative value) and set errno to EBUSY.
The open function may also fail if an IUCV connection to the *MONITOR service
cannot be established. In this case errno will be set to EIO and an error
message with an IPUSER SEVER code will be printed into syslog. The IPUSER SEVER
codes are described in the "z/VM Performance" book, Appendix A.

NOTE:
-----
As soon as the device is opened, incoming messages will be accepted and they
will account for the message limit, i.e. opening the device without reading
from it will provoke the "message limit reached" error (EOVERFLOW error code)
eventually.
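
The blocking read method described under "Read:" can be sketched in C as shown
below. This is a minimal illustration, not a reference implementation: the
function name read_data_set is hypothetical, the use of ENOBUFS for a full
buffer is a choice made for this sketch, and the caller is assumed to supply a
buffer strictly larger than the largest expected data set (which can be as
large as the monitor DCSS).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Collect one complete data set from fd into buf (capacity cap).
 * Returns the number of bytes collected once the closing 0 byte read
 * arrives. Returns -1 with errno set if read() fails (e.g. EIO, EFAULT,
 * EOVERFLOW, or EAGAIN on a non-blocking descriptor) or if the buffer
 * fills up before the closing 0 byte read (ENOBUFS, sketch-specific).
 * On error, the caller should treat the data collected so far as
 * described in the errno list for the read operation.
 */
ssize_t read_data_set(int fd, unsigned char *buf, size_t cap)
{
	size_t used = 0;

	for (;;) {
		ssize_t n;

		if (used == cap) {
			/* No room left; a read of 0 bytes would be ambiguous. */
			errno = ENOBUFS;
			return -1;
		}
		n = read(fd, buf + used, cap - used);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal, retry */
			return -1;		/* caller inspects errno */
		}
		if (n == 0)
			return (ssize_t)used;	/* data set is complete */
		used += (size_t)n;
	}
}
```

A caller would open the device node (for example /dev/monreader) with open(),
call read_data_set() in a loop, and only start parsing the MCEs and records in
the buffer after the function has returned successfully.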