Introduction:

The Flexible Filesystem Benchmark (FFSB) is a filesystem performance
measurement tool. It is a multi-threaded application (using
pthreads), written entirely in C with cross-platform portability in
mind. It differs from other filesystem benchmarks in that the user
may supply a profile to create custom workloads, while most other
filesystem benchmarks use a fixed set of workloads.

As of version 5.1, it supports seven different basic operations,
multiple groups of threads with different operation mixtures,
operation across multiple filesystems, and filesystem aging prior to
benchmarking.


Differences from version 4.0 and older:

Version 5.0 and above represent an almost total rewrite, and many
things have changed. In version 5.0 and above, FFSB uses a
time-regulated run instead of performing a set number of operations
and timing the whole thing. This is primarily to better deal with the
use of multiple threadgroups, which would otherwise not be synchronized
at termination time.

Additionally, the FFSB configuration file format changed in
version 5.0, although old-style configuration files are still
supported when a run time is passed on the command line. In this mode,
version 5.0 and above ignores the iterations parameter and simply
uses the time specified on the command line.

Behaviorally, most of the old operations are the same -- sequential
reads and sequential writes work as they did before. One change in
version 5.0 is that the skip-read behavior (reading, seeking forward
a fixed amount, then reading again) has been removed; FFSB now supports
fully randomized reads and writes at random offsets within the file.

Version 4.0 didn't support overwrites (only appends), so writes in old
config files are interpreted as append operations.

On Linux, CPU utilization information will only be accurate for
systems using NPTL. Older LinuxThreads systems will probably only see
zeros for CPU utilization because LinuxThreads is not POSIX compliant.
Version 4.0 and older could be recompiled to work on
LinuxThreads, but 5.0 and later no longer support this.

The "outputfile" argument on the command line is no longer supported.
Simply use tee or similar to capture the output. FFSB
unbuffers standard out for this purpose, and errors are sent to
standard error.

Global options:

There are eight valid global options placed at the beginning of the
profile. Three of them are required: num_filesystems (number of
filesystems), num_threadgroups (number of threadgroups), and time
(running time of the benchmark). The other five options are:

directio   - each call to open will be made using O_DIRECT
alignio    - aligns all block operations for random reads and writes
             on 4k boundaries.
bufferedio - currently ignored: it is intended to use libc
             fread, fwrite instead of just unix read and write calls
verbose    - currently ignored
callout    - calls an external command and waits for its termination
             before FFSB begins the benchmark phase.
             This is useful for synchronizing distributed clients,
             starting profilers, etc.

They must be specified in the above order (num_filesystems,
num_threadgroups, time, directio, alignio, bufferedio, verbose,
callout).



Filesystems:

Filesystems are specified to FFSB in the form of a directory. FFSB
assumes that the filesystem is mounted at this directory and will not
do any verification of this fact beyond ensuring it can read/write to
the location. So be careful to ensure something with enough space to
handle the dataset is in fact mounted at the specified location.
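
Since FFSB does not verify the mount itself, a pre-flight check along
these lines may help. This is only a minimal sketch and is not part of
FFSB: location_has_room() is a hypothetical helper that uses the POSIX
access() and statvfs() calls to confirm that the directory is readable
and writable and that the filesystem behind it reports at least the
requested number of free bytes.

```c
#include <stdint.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Hypothetical pre-flight check, not part of FFSB.
 * Returns 1 if `dir` is readable and writable and the filesystem that
 * holds it has at least `needed_bytes` available to unprivileged
 * users; returns 0 otherwise (including on any error). */
static int location_has_room(const char *dir, uint64_t needed_bytes)
{
    struct statvfs sv;
    uint64_t avail;

    if (access(dir, R_OK | W_OK) != 0)
        return 0;
    if (statvfs(dir, &sv) != 0)
        return 0;

    /* f_bavail is counted in units of f_frsize bytes. */
    avail = (uint64_t)sv.f_bavail * (uint64_t)sv.f_frsize;
    return avail >= needed_bytes;
}
```

A rough starting estimate for needed_bytes is the number of files times
the maximum filesize; operations that grow the dataset during the run
need additional headroom.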

In the filesystem clause of the profile, one may set the starting
number of files and directories as well as a minimum and maximum
filesize for the filesystem. One may also specify the blocksize
used for creating the files separately in the filesystem clause.

Also, if a filesystem is to be aged, a special threadgroup clause may
be embedded in a filesystem clause to specify the operation mixture
and number of threads used to age the filesystem. This threadgroup is
run until filesystem utilization reaches the specified amount.

Inheritance -- if you are using multiple filesystems, all attributes
except the location should be inherited from the previous filesystem.
This is done to make it easier to add groups of similar filesystems.
In this case, only the location is required in the filesystem clause.

As of version 5.1, filesystem re-use is supported if a given
filesystem hasn't been modified beyond its original specifications
(number of files and directories is correct, and file sizes are within
specifications). This can be a huge time saver if one wishes to do
multiple runs on the same data-set without altering it during a run,
because the fileset doesn't need to be recreated before each run.

To do this, specify "reuse=1" in the filesystem clause, and FFSB will
verify the fileset first, and if it checks out it will use it.
Otherwise, it will remove everything and re-create the filesets for
that filesystem.

Threadgroups:

An arbitrary number of threadgroups with differing numbers of threads
and operation mixes can be specified. The operations are specified
using a weighting for each operation; if an operation isn't specified,
its weighting is assumed to be zero (not used).

"Think-time" for a threadgroup may also be specified in milliseconds
using the "op_delay" parameter, where every thread will wait
for the specified amount between each operation.
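
One way to picture the weighting is as a weighted random choice among
the operations. The sketch below is an illustration only and is not
taken from the FFSB sources: pick_weighted() is a hypothetical helper,
and the caller is assumed to supply a random value (for example from
rand()) as "roll".

```c
#include <stddef.h>

/* Hypothetical helper, not FFSB code: pick an index into weights[]
 * with probability proportional to its weight. Entries with weight 0
 * are never chosen. Returns -1 if every weight is zero. */
static int pick_weighted(const unsigned *weights, size_t n, unsigned roll)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += weights[i];
    if (total == 0)
        return -1;          /* no operation has a non-zero weight */

    roll %= total;          /* map the random value into [0, total) */
    for (i = 0; i < n; i++) {
        if (roll < weights[i])
            return (int)i;
        roll -= weights[i]; /* skip past this operation's share */
    }
    return -1;              /* not reached when total > 0 */
}
```

With weights of 4 for reads and 1 for appends, about four out of every
five picks would be reads under this scheme.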

Operations:

All operations begin by randomly selecting a filesystem from the list
of filesystems specified in the profile. The distribution aims to be
uniform across all filesystems.


The seven operations are:

reads   - read() calls with an overall amount and a blocksize;
          operates on existing files. Care must be taken to ensure
          that the read amount is smaller than the size of any possible
          file.

          If read_random is specified, then each individual block
          will be read starting from a random point within the file,
          and this will continue until the entire amount specified has
          been read. The offset of each random block is random down
          to the byte level unless the "alignio" global parameter
          is on, in which case the reads are 4096-byte aligned.
          Turning alignio on is generally recommended.


readall - Very similar to reads above, except it doesn't take an
          amount; it simply reads the entire file sequentially using the
          read_blocksize. This is useful for situations where
          different filesystems have differently sized files, and
          sequential read patterns across all filesystems are desired.

writes  - write() calls with an overall amount and a blocksize.
          This is an overwrite operation and will not enlarge an
          existing file. Again, one must be careful not to specify a
          write amount that is larger than any possible file in the
          data set.

          If write_random is specified, then each individual block
          will be written starting from a random point within the file,
          and this will continue until the entire amount specified has
          been written out. The offset of each random block is random
          down to the byte level unless the "alignio" global parameter
          is on, in which case the writes are 4096-byte aligned.
          Turning alignio on is generally recommended.

          If the fsync_file parameter for the threadgroup is non-zero,
          then after all of the write calls are finished, fsync() will
          be called on the file descriptor before the file is closed.


creates - creates a file using an open() call and picks its size
          randomly within the constraints (min_filesize and
          max_filesize) for the selected filesystem. Write operations
          will be done using the same blocksize as is specified for the
          write operation.
deletes - calls unlink() on a filename and removes it from the
          internal data structures. One must be careful to ensure
          there are enough files to delete at all times, or else the
          benchmark will terminate.
appends - calls write() using the append flag with an overall amount
          and a blocksize to be appended onto a randomly chosen file.
metas   - this is actually a mix of several different directory
          operations. Each "meta" operation consists of two directory
          creates, one directory remove, and a directory rename.
          These operations are all carried out separately from the
          other six operations.

Operation accounting:

Each operation which uses a blocksize counts each read/write of a
blocksize as an operation (reads, writes, creates, and appends), whereas
deletes and metas are considered single operations.

Running the benchmark:

There are three phases to running the benchmark: aging, fileset
creation, and the benchmark phase.

The create phase is carried out across all filesystems simultaneously
with one dedicated thread per filesystem.

After the create phase, sync() is called to ensure all dirty data gets
written out before the benchmark phase begins, and sync() is again
called at the end of the benchmark phase. The time in sync() at the
end of the benchmark phase is counted as part of the benchmark phase.
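
To make the random-offset rule from the Operations section and the
per-block counting from the Operation accounting section concrete, here
is a minimal C sketch. It is not FFSB's implementation: both helpers
are hypothetical, the raw random value "r" is assumed to come from the
caller, and the sketch assumes each block must fit entirely inside the
file and that the amount is a multiple of the blocksize.

```c
#include <stdint.h>

#define ALIGN_BYTES 4096u   /* alignment the README gives for alignio */

/* Hypothetical helper: choose the start offset for one block of
 * `blocksize` bytes inside a file of `filesize` bytes, given a raw
 * random value `r`. Returns 0 when the block only fits at offset 0
 * (or does not fit at all). */
static uint64_t random_block_offset(uint64_t filesize, uint64_t blocksize,
                                    uint64_t r, int alignio)
{
    uint64_t max_start, off;

    if (blocksize >= filesize)
        return 0;
    max_start = filesize - blocksize;   /* last offset where a block fits */
    off = r % (max_start + 1);          /* byte-granular random offset */
    if (alignio)
        off -= off % ALIGN_BYTES;       /* round down to a 4096 boundary */
    return off;
}

/* Hypothetical helper: one operation is counted per block transferred. */
static uint64_t ops_for_amount(uint64_t amount, uint64_t blocksize)
{
    return blocksize ? amount / blocksize : 0;
}
```

For example, a 40960-byte write issued in 4096-byte blocks would be
counted as ten operations under this reading.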

Caveats/Holes/Bugs:

Aging, and aging across multiple filesystems simultaneously, haven't
been tested very much.

If *any* i/o operation or system call/libc call fails, the benchmark
will terminate immediately.

The parser doesn't handle malformed or incorrect profiles very well
(or at all).

The parser doesn't check to make sure all of the appropriate options
have been specified. For example, if writes are specified in a
threadgroup but write_blocksize isn't specified, the parser won't catch
it, but the benchmark run will fail later on.


Configuration Files (new style):

New-style configuration allows for arbitrary newlines between lines,
and comments using '#' at the start of a line. It also allows tabs
and whitespace before and after configuration parameters.

The new-style configuration file is broken up into three main parts:

global parameters, filesystems, and threadgroups

The sections must be in the above order.

Global parameters:

Global parameters are described above; the first three are always
required. Example:

----------

num_filesystems=1
num_threadgroups=1
time=30        # time is in seconds

directio=0     # don't use direct io
alignio=1      # align random IOs to 4k
bufferedio=0   # this does nothing right now
verbose=0      # this does nothing right now

               # calls an external command and waits
               # everything until the newline is taken
               # so you can have arbitrary parameters
callout=synchronize.sh myhostname

---------

All of these must appear in this order, though you can leave out the
optional ones.

Filesystems:

Filesystems describe different logical sets of files residing in
different directories. There is no strict requirement that they
actually be on different filesystems, only that the directory
specified already exists.
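
The line format used in the global parameters example above (optional
leading whitespace, "name=value", and '#' comments) can be read with a
few lines of C. The sketch below is an assumption about how such a line
could be handled, not FFSB's actual parser; parse_param() and rtrim()
are hypothetical names. It treats a '#' after the value as the start of
a trailing comment, as the example does, which means a callout command
containing '#' would need special handling.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Remove trailing whitespace from s in place. */
static void rtrim(char *s)
{
    size_t n = strlen(s);

    while (n > 0 && isspace((unsigned char)s[n - 1]))
        s[--n] = '\0';
}

/* Hypothetical parser for one profile line, not FFSB code.
 * On success returns 1 and points *key and *val into `line`, which is
 * modified in place. Returns 0 for blank lines, comment lines, and
 * lines without '=' (such as a [filesystem0] clause marker). */
static int parse_param(char *line, char **key, char **val)
{
    char *eq, *hash;

    while (isspace((unsigned char)*line))   /* skip leading tabs/spaces */
        line++;
    if (*line == '\0' || *line == '#')
        return 0;

    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    *eq = '\0';                             /* split name from value */

    *key = line;
    rtrim(*key);

    *val = eq + 1;
    while (isspace((unsigned char)**val))
        (*val)++;
    hash = strchr(*val, '#');               /* assumed trailing comment */
    if (hash != NULL)
        *hash = '\0';
    rtrim(*val);
    return 1;
}
```

Given the line "time=30   # time is in seconds", this sketch yields the
name "time" and the value "30".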

Filesystems are specified by a clause with a filesystem number like
this:

[filesystem0]
        location=/mnt/testing/
        num_files=10
        num_dirs=1
        max_filesize=4096
        min_filesize=4096
[end0]


The clause must always begin with [filesystemX] and end with [endX]
where X is the number of that filesystem.

You should start with X = 0, and increment by one for each following
filesystem. If they are out of order, things will likely break.

The required information for each filesystem is: location, num_files,
num_dirs, max_filesize, and min_filesize. Beyond those, the following
four options are supported:



reuse=1               # check the filesystem to see if it is reusable

                      # filesystem aging, three components required
                      # takes agefs=1 to turn it on
                      # then a valid threadgroup specification
                      # then a desired utilization percentage

agefs=1               # age the filesystem according to the following threadgroup
        [threadgroup0]
                num_threads=10
                write_size=40960
                write_blocksize=4096
                create_weight=10
                append_weight=10
                delete_weight=1
        [end0]
desired_util=0.20     # In this case, age until the fs is 20% full

create_blocksize=4096 # specify the blocksize to write()
                      # for creating the fileset, defaults to 4096

age_blocksize=4096    # specify the blocksize to write() for aging


Also, to allow lazy people to use lots of filesystems, we support
filesystem inheritance, which simply copies all options but the
location from the previous filesystem clause if nothing is specified.
Obviously, this doesn't work for filesystem0. (May not work for aging
either?)

Full-blown filesystem clause example:

----

[filesystem0]

        # required parts

        location=/home/sonny/tmp
        num_files=100
        num_dirs=100
        max_filesize=65536
        min_filesize=4096

        # aging part
        agefs=0
        [threadgroup0]
                num_threads=10
                write_size=40960
                write_blocksize=4096
                create_weight=10
                append_weight=10
                delete_weight=1
        [end0]
        desired_util=0.02       # age until 2% full

        # other optional commands

        create_blocksize=1024   # use a small create blocksize
        age_blocksize=1024      # and smaller age create blocksize
        reuse=0                 # don't reuse it
[end0]



--

Threadgroups:

Threadgroups are very similar to filesystems in that any number of
them can be specified in clauses, and they must be in order starting
with threadgroup0.

Example:

---

[threadgroup0]
        num_threads=32
        read_weight=4
        append_weight=1

        write_size=4096
        write_blocksize=4096

        read_size=4096
        read_blocksize=4096
[end0]

---

In a threadgroup clause, num_threads is required and must be at least
1. In addition, at least one operation must be given a weight greater
than 0 for the threadgroup to be valid. Operations can be given a
weighting of 0, and in this case they are ignored.

Certain operations will also require other commands; for example, if
read_weight is greater than zero, then one must also include a
read_size and a read_blocksize.
Here's the table of requirements and
options:


Operation        Requirements                         Options
--               --                                   --
read_weight      read_size, read_blocksize            read_random
readall_weight   read_blocksize                       none
write_weight     write_size, write_blocksize          write_random, fsync_file
create_weight    write_blocksize or create_blocksize  none
append_weight    write_blocksize, write_size          none
delete_weight    none                                 none
meta_weight      none                                 none



Other threadgroup options:

op_delay=10     # specify a wait between operations in milliseconds

bindfs=3        # This allows you to restrict a threadgroup's operation
                # to a specific filesystem number. Currently only
                # binding to one specific filesystem is supported
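
The effect of bindfs on the per-operation filesystem choice can be
sketched as follows. This is an illustration under stated assumptions
rather than FFSB's code: select_filesystem() is a hypothetical helper,
a negative bindfs value is used here to mean "not bound", and the
caller is assumed to supply a random value as "roll".

```c
/* Hypothetical helper, not FFSB code: return the filesystem number an
 * operation should run against. A bound threadgroup always gets its
 * bindfs filesystem; an unbound one (bindfs < 0 in this sketch) gets a
 * uniformly distributed choice. Returns -1 on invalid input. */
static int select_filesystem(int num_filesystems, int bindfs, unsigned roll)
{
    if (num_filesystems <= 0)
        return -1;
    if (bindfs >= 0)
        return bindfs < num_filesystems ? bindfs : -1;
    return (int)(roll % (unsigned)num_filesystems);
}
```

With four filesystems and bindfs=3, every operation in the threadgroup
would land on filesystem 3 regardless of the random value.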