<html>

<head>
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Boost Filesystem Library Design</title>
<link href="styles.css" rel="stylesheet">
</head>

<body bgcolor="#FFFFFF">

<h1>
<img border="0" src="../../../boost.png" align="center" width="277" height="86">Filesystem
Library Design</h1>

<p><a href="#Introduction">Introduction</a><br>
<a href="#Requirements">Requirements</a><br>
<a href="#Realities">Realities</a><br>
<a href="#Rationale">Rationale</a><br>
<a href="#Abandoned_Designs">Abandoned Designs</a><br>
<a href="#References">References</a></p>

<h2><a name="Introduction">Introduction</a></h2>

<p>The primary motivation for beginning work on the Filesystem Library was
frustration with Boost administrative tools. Scripts were written in
Python, Perl, Bash, and Windows command languages. There was no single
scripting language familiar and acceptable to all Boost administrators. Yet they
were all skilled C++ programmers - why couldn't C++ be used as the scripting
language?</p>

<p>The key feature C++ lacked for script-like applications was the ability to
perform portable filesystem operations on directories and their contents. The
Filesystem Library was developed to fill that void.</p>

<p>The intent is not to compete with traditional scripting languages, but to
provide a solution for situations where C++ is already the language
of choice.</p>

<h2><a name="Requirements">Requirements</a></h2>
<ul>
  <li>Be able to write portable script-style filesystem operations in modern
  C++.<br>
  <br>
  Rationale: This is a common programming need. It is both an
  embarrassment and a hardship that this is not possible with either the current
  C++ or Boost libraries.
  The need is particularly acute
  when C++ is the only toolset allowed in the tool chain. File system
  operations are provided by many languages used on multiple platforms,
  such as Perl and Python, as well as by many platform-specific scripting
  languages. All operating systems provide some form of API for filesystem
  operations, and the POSIX bindings are increasingly available even on
  operating systems not normally associated with POSIX, such as the Mac, z/OS,
  or OS/390.<br>
  </li>
  <li>Work within the <a href="#Realities">realities</a> described below.<br>
  <br>
  Rationale: This isn't a research project. The need is for something that works on
  today's platforms, including some of the embedded operating systems
  with limited file systems. Because of the emphasis on portability, such a
  library would be much more useful if standardized. That means being able to
  work with a much wider range of platforms than just Unix or Windows and their
  clones.<br>
  </li>
  <li>Avoid dangerous programming practices, particularly all-too-easy-to-ignore error notifications
  and use of global variables. If a dangerous feature is provided, identify it as such.<br>
  <br>
  Rationale: Normally this would be covered by "the usual Boost requirements...",
  but it is mentioned explicitly because the equivalent native platform and
  scripting language interfaces often depend on all-too-easy-to-ignore error
  notifications and global variables like "current
  working directory".<br>
  </li>
  <li>Structure the library so that it is still useful even if some functionality
  does not map well onto a given platform or directory tree. Particularly, much
  useful functionality should be portable even to flat
(non-hierarchical) filesystems.<br>
  <br>
  Rationale: Much functionality which does not
  require a hierarchical directory structure is still useful on flat-structure
  filesystems.
  There are many systems, particularly embedded systems,
  where even very limited functionality is still useful.</li>
</ul>
<ul>
  <li>Interface smoothly with current C++ Standard Library input/output
  facilities. For example, paths should be
  easy to use in std::basic_fstream constructors.<br>
  <br>
  Rationale: One of the most common uses of file system functionality is to
  manipulate paths for eventual use in input/output operations.
  Thus the need to interface smoothly with standard library I/O.<br>
  </li>
  <li>Suitable for eventual standardization. The implication of this requirement
  is that the interface be close to minimal, and that great care be taken
  regarding portability.<br>
  <br>
  Rationale: The lack of file system operations is a serious hole
  in the current standard, with no other known candidates to fill that hole.
  Libraries with elaborate interfaces and difficult-to-port specifications are much less likely to be accepted for
  standardization.<br>
  </li>
  <li>The usual Boost <a href="http://www.boost.org/more/lib_guide.htm">requirements and
  guidelines</a> apply.<br>
  </li>
  <li>Encourage, but do not require, portability in path names.<br>
  <br>
  Rationale: For paths which originate from user input it is unreasonable to
  require portable path syntax.<br>
  </li>
  <li>Avoid giving the illusion of portability where portability in fact does not
  exist.<br>
  <br>
  Rationale: Leaving important behavior unspecified or "implementation defined" does a
  great disservice to programmers using a library because it makes it appear
  that code relying on the behavior is portable, when in fact there is nothing
  portable about it.
  The only case where such under-specification is acceptable is when both users and implementors know from
  other sources exactly what behavior is required, yet for some reason it isn't
  possible to specify it exactly.</li>
</ul>
<h2><a name="Realities">Realities</a></h2>
<ul>
  <li>Some operating systems have a single directory tree root, others have
  multiple roots.<br>
  </li>
  <li>Some file systems provide both a long and short form of filenames.<br>
  </li>
  <li>Some file systems have different syntax for file paths and directory
  paths.<br>
  </li>
  <li>Some file systems have different rules for valid file names and valid
  directory names.<br>
  </li>
  <li>Some file systems (ISO-9660, level 1, for example) use very restricted
  (so-called 8.3) file names.<br>
  </li>
  <li>Some operating systems allow file systems with different
  characteristics to be "mounted" within a directory tree. Thus an
  ISO-9660 or Windows
  file system may end up as a sub-tree of a POSIX directory tree.<br>
  </li>
  <li>Wide-character versions of directory and file operations are available on some operating
  systems, and not available on others.<br>
  </li>
  <li>There is no law that says directory hierarchies have to be specified in
  terms of left-to-right descent from the root.<br>
  </li>
  <li>Some file systems have a concept of file "version number" or "generation
  number". Some don't.<br>
  </li>
  <li>Not all operating systems use single character separators in path names. Some use
  paired notations.
  A typical fully-specified OpenVMS filename
  might look something like this:<br>
  <br>
  <code>  DISK$SCRATCH:[GEORGE.PROJECT1.DAT]BIG_DATA_FILE.NTP;5<br>
  </code><br>
  The general OpenVMS format is:<br>
  <br>

  <i>Device:[directories.dot.separated]filename.extension;version_number</i><br>
  </li>
  <li>For common file systems, determining if two descriptors are for the same
  entity is extremely difficult or impossible. For example, the concept of
  equality can be different for each portion of a path - some portions may be
  case or locale sensitive, others not. Case sensitivity is a property of the
  pathname itself, and not the platform. Determining collating sequence is even
  worse.<br>
  </li>
  <li>Race conditions may occur. Directory trees, directories, files, and file attributes are in effect shared between all threads, processes, and computers which have access to the
  filesystem. That may well include computers on the other side of the
  world or in orbit around the world. This implies that file system operations
  may fail in unexpected ways. For example:<br>
  <br>
  <code>  assert( exists("foo") == exists("foo") );
  // may fail!<br>
  assert( is_directory("foo") == is_directory("foo") );
  // may fail!<br>
  </code><br>
  In the first example, the file may have been deleted between calls to
  exists(). In the second example, the file may have been deleted and then
  replaced by a directory of the same name between the calls to is_directory().<br>
  </li>
  <li>Even though an application may be portable, it still will have to traffic
  in system specific paths occasionally; user provided input is a common
  example.<br>
  </li>
  <li><a name="symbolic-link-use-case">Symbolic</a> links cause the canonical and
  normal forms of some paths to represent different files or directories.
  For example, given the directory hierarchy <code>/a/b/c</code>, with a symbolic
  link in <code>/a</code> named <code>x</code> pointing to <code>b/c</code>,
  then under POSIX Pathname Resolution rules a path of <code>"/a/x/.."</code>
  should resolve to <code>"/a/b"</code>. If <code>"/a/x/.."</code> were first
  normalized to <code>"/a"</code>, it would resolve incorrectly. (Case supplied
  by Walter Landry.)</li>
</ul>

<h2><a name="Rationale">Rationale</a></h2>

<p>The <a href="#Requirements">Requirements</a> and <a href="#Realities">
Realities</a> above drove much of the C++ interface design. In particular,
the desire to make script-like code straightforward caused a great deal of
effort to go into ensuring that apparently simple expressions like <i>exists( "foo"
)</i> work as expected.</p>

<p>See the <a href="faq.htm">FAQ</a> for the rationale behind many detailed
design decisions.</p>

<p>Several key insights went into the <i>path</i> class design:</p>
<ul>
  <li>Decoupling of the input formats, internal conceptual (<i>vector&lt;string&gt;</i>
  or other sequence)
  model, and output formats.</li>
  <li>Providing two input formats (generic and O/S specific) broke a major
  design deadlock.</li>
  <li>Providing several output formats solved another set of previously
  intractable problems.</li>
  <li>Several non-obvious functions (particularly decomposition and composition)
  are required to support portable code. (Peter Dimov, Thomas Witt, Glen
  Knowles, others.)</li>
</ul>

<p>Error checking was a particularly difficult area. One key insight was that
with file and directory names, portability isn't a universal truth.
Rather, the programmer must think out the question "What operating systems do I
want this path to be portable to?"
By providing support for several
answers to that question, the Filesystem Library alerts programmers to the need
to ask it in the first place.</p>
<h2><a name="Abandoned_Designs">Abandoned Designs</a></h2>
<h3>operations.hpp</h3>
<p>Dietmar Kühl's original dir_it design and implementation supported
wide-character file and directory names. It was abandoned after extensive
discussions among Library Working Group members failed to identify portable
semantics for wide-character names on systems not providing native support. See
<a href="faq.htm#wide-character_names">FAQ</a>.</p>
<p>Previous iterations of the interface design used explicitly named functions providing a
large number of convenience operations, with no compile-time or run-time
options. There were so many function names that they were very confusing to use,
and the interface was much larger. Any benefits seemed theoretical rather than
real. </p>
<p>Designs based on compile time (rather than runtime) flag and option selection
(via policy, enum, or int template parameters) became so complicated that they
were abandoned, often after investing quite a bit of time and effort.
The need
to qualify attribute or option names with namespaces, even aliases, made use in
template parameters ugly; that wasn't fully appreciated until actually writing
real code.</p>
<p>Yet another set of convenience functions (for example, <i>remove</i> with
permissive, prune, recurse, and other options, plus predicate, and possibly
other, filtering features) was abandoned because the details became both
complex and contentious.</p>

<p>What is left is a toolkit of low-level operations from which the user can
create more complex convenience operations, plus a very small number of
convenience functions which were found to be useful enough to justify inclusion.</p>

<h3>path.hpp</h3>

<p>There were so many abandoned path designs, I've lost track. Policy-based
class templates in several flavors, constructor supplied runtime policies,
operation specific runtime policies, they were all considered, often
implemented, and ultimately abandoned as far too complicated for any small
benefits observed.</p>

<p>Additional design considerations apply to <a href="v3_design.html">Internationalization</a>. </p>

<h3>error checking</h3>

<p>A number of designs for the error checking machinery were abandoned, some
after experiments with implementations. Totally automatic error checking was
attempted in particular. But automatic error checking tended to make the overall
library design much more complicated.</p>

<p>Some designs associated error checking mechanisms with paths. Some with
operations functions. A policy-based error checking template design was
partially implemented, then abandoned as too complicated for everyday
script-like programs.</p>

<p>The final design, which depends partially on explicit error checking function
calls, is much simpler and more straightforward, although it does depend to
some extent on programmer discipline.
But it should allow programmers who
are concerned about portability to be reasonably sure that their programs will
work correctly on their choice of target systems.</p>

<h2><a name="References">References</a></h2>

<table border="0" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
  <tr>
    <td width="13%" valign="top">[<a name="IBM-01">IBM-01</a>]</td>
    <td width="87%">IBM Corporation, <i>z/OS V1R3.0 C/C++ Run-Time
Library Reference</i>, SA22-7821-02, 2001,
<a href="http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/">
    www-1.ibm.com/servers/eserver/zseries/zos/bkserv/</a></td>
  </tr>
  <tr>
    <td width="13%" valign="top">[<a name="ISO-9660">ISO-9660</a>]</td>
    <td width="87%">International Organization for Standardization, 1988</td>
  </tr>
  <tr>
    <td width="13%" valign="top">[<a name="Kuhn">Kuhn</a>]</td>
    <td width="87%">UTF-8 and Unicode FAQ for Unix/Linux,
<a href="http://www.cl.cam.ac.uk/~mgk25/unicode.html">
    www.cl.cam.ac.uk/~mgk25/unicode.html</a></td>
  </tr>
  <tr>
    <td width="13%" valign="top">[<a name="MSDN">MSDN</a>] </td>
    <td width="87%">Microsoft Platform SDK for Windows, Storage Start
Page,
<a href="http://msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp">
    msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp</a></td>
  </tr>
  <tr>
    <td width="13%" valign="top">[<a name="POSIX-01">POSIX-01</a>]</td>
    <td width="87%">IEEE Std 1003.1-2001, ISO/IEC 9945:2002, and The Open Group Base Specifications, Issue 6. Also known as The
    Single Unix<font face="Times New Roman">® Specification, Version 3.
    Available from each of the organizations involved in its creation.
    For
    example, read online or download from
    <a href="http://www.unix.org/single_unix_specification/">
    www.unix.org/single_unix_specification/</a>.</font> The ISO JTC1/SC22/WG15 - POSIX
homepage is <a href="http://www.open-std.org/jtc1/sc22/WG15/">
    www.open-std.org/jtc1/sc22/WG15/</a></td>
  </tr>
  <tr>
    <td width="13%" valign="top">[<a name="URI">URI</a>]</td>
    <td width="87%">RFC-2396, Uniform Resource Identifiers (URI): Generic
Syntax, <a href="http://www.ietf.org/rfc/rfc2396.txt">
    www.ietf.org/rfc/rfc2396.txt</a></td>
  </tr>
  <tr>
    <td width="13%" valign="top">[<a name="UTF-16">UTF-16</a>]</td>
    <td width="87%">Wikipedia, UTF-16,
<a href="http://en.wikipedia.org/wiki/UTF-16">
    en.wikipedia.org/wiki/UTF-16</a></td>
  </tr>
  <tr>
    <td width="13%" valign="top">[<a name="Wulf-Shaw-73">Wulf-Shaw-73</a>]</td>
    <td width="87%">William Wulf, Mary Shaw, <i>Global
Variable Considered Harmful</i>, ACM SIGPLAN Notices, 8, 2, 1973, pp. 23-34</td>
  </tr>
</table>

<hr>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->26 December, 2014<!--webbot bot="Timestamp" endspan i-checksum="38646" --></p>

<p>© Copyright Beman Dawes, 2002</p>
<p> Use, modification, and distribution are subject to the Boost Software
License, Version 1.0. (See accompanying file <a href="../../../LICENSE_1_0.txt">
LICENSE_1_0.txt</a> or copy at <a href="http://www.boost.org/LICENSE_1_0.txt">
www.boost.org/LICENSE_1_0.txt</a>)</p>

</body>

</html>