## fdsan

[TOC]

### Background
*What problem is fdsan trying to solve? Why should I care?*

fdsan (file descriptor sanitizer) detects mishandling of file descriptor ownership, which tends to manifest as *use-after-close* and *double-close*. These errors are direct analogues of the memory allocation *use-after-free* and *double-free* bugs, but tend to be much more difficult to diagnose and fix. With `malloc` and `free`, implementations have free rein to detect errors and abort on double free. File descriptors, on the other hand, are mandated by the POSIX standard to be allocated with the lowest available number being returned for new allocations, so the number of a just-closed descriptor is typically handed out again by the next `open`. As a result, many file descriptor bugs can *never* be noticed on the thread on which the error occurred, and will manifest as "impossible" behavior on another thread.

For example, given two threads running the following code:
```cpp
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

void thread_one() {
  int fd = open("/dev/null", O_RDONLY);
  close(fd);
  // Bug: double close. By now, fd may refer to a file opened by another thread.
  close(fd);
}

void thread_two() {
  while (true) {
    int fd = open("log", O_WRONLY | O_APPEND);
    if (write(fd, "foo", 3) != 3) {
      err(1, "write failed!");
    }
  }
}
```
the following interleaving is possible:
```cpp
thread one                                thread two
open("/dev/null", O_RDONLY) = 123
close(123) = 0
                                          open("log", O_WRONLY | O_APPEND) = 123
close(123) = 0
                                          write(123, "foo", 3) = -1 (EBADF)
                                          err(1, "write failed!")
```

Assertion failures are probably the most innocuous result that can arise from these bugs: silent data corruption [[1](#footnotes), [2](#footnotes)] or security vulnerabilities are also possible (e.g. suppose thread two was saving user data to disk when a third thread came in and opened a socket to the Internet).

### Design
*What does fdsan do?*

fdsan attempts to detect and/or prevent file descriptor mismanagement by enforcing file descriptor ownership.
Just as most memory allocations can have their ownership handled by types such as `std::unique_ptr`, almost all file descriptors can be associated with a unique owner that is responsible for closing them. fdsan provides functions to associate a file descriptor with an owner; if someone tries to close a file descriptor that they don't own, then, depending on configuration, either a warning is emitted or the process aborts.

This is implemented by providing functions to set a 64-bit closure tag on a file descriptor. The tag consists of an 8-bit type byte that identifies the type of the owner (`enum android_fdsan_owner_type` in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/master/libc/include/android/fdsan.h)), and a 56-bit value. The value should ideally be something that uniquely identifies the object (object address for native objects and `System.identityHashCode` for Java objects), but in cases where it's hard to derive an identifier for the "owner" that should close a file descriptor, even using the same value for all file descriptors in the module can be useful, since it'll catch other code that closes your file descriptors.

If a file descriptor that's been marked with a tag is closed with an incorrect tag, or without a tag, we know something has gone wrong, and can generate diagnostics or abort.

### Enabling fdsan (as a user)
*How do I use fdsan?*

fdsan has four severity levels:
 - disabled (`ANDROID_FDSAN_ERROR_LEVEL_DISABLED`)
 - warn-once (`ANDROID_FDSAN_ERROR_LEVEL_WARN_ONCE`)
   - Upon detecting an error, emit a warning to logcat, generate a tombstone, and then continue execution with fdsan disabled.
 - warn-always (`ANDROID_FDSAN_ERROR_LEVEL_WARN_ALWAYS`)
   - Same as warn-once, except without disabling after the first warning.
 - fatal (`ANDROID_FDSAN_ERROR_LEVEL_FATAL`)
   - Abort upon detecting an error.

In Android Q, fdsan has a global default of warn-once.
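
As a rough mental model of the four levels listed above, each detected error can be thought of as passing through a small state machine. The sketch below is an illustration only and is not bionic's implementation: `Level`, `Action`, and `on_error` are hypothetical names invented for this example, and logging and tombstone generation are reduced to a single `kWarn` result. Note how a warn-once level reports the first error and then downgrades itself to disabled, so a second error in the same process goes unreported.

```cpp
// Hypothetical model of the fdsan severity levels described above.
// These names are illustrative and do not appear in <android/fdsan.h>.
enum class Level { kDisabled, kWarnOnce, kWarnAlways, kFatal };

// What the model does in response to one detected ownership error.
enum class Action { kIgnore, kWarn, kAbort };

// Decides the reaction to a single detected error and updates `level`
// in place, since warn-once disables checking after its first report.
Action on_error(Level& level) {
  switch (level) {
    case Level::kDisabled:
      return Action::kIgnore;
    case Level::kWarnOnce:
      level = Level::kDisabled;  // Continue execution with checking disabled.
      return Action::kWarn;
    case Level::kWarnAlways:
      return Action::kWarn;      // Same report, but the level is left unchanged.
    case Level::kFatal:
      return Action::kAbort;
  }
  return Action::kIgnore;  // Not reached for valid Level values.
}
```
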
fdsan can be made more or less strict at runtime via the `android_fdsan_set_error_level` function in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/master/libc/include/android/fdsan.h).

The likelihood of fdsan catching a file descriptor error is proportional to the percentage of file descriptors in your process that are tagged with an owner.

### Using fdsan to fix a bug
*No, really, how do I use fdsan?*

Let's look at a simple contrived example that uses sleeps to force a particular interleaving of thread execution.

```cpp
#include <err.h>
#include <unistd.h>

#include <chrono>
#include <thread>
#include <vector>

#include <android-base/unique_fd.h>

using namespace std::chrono_literals;
using std::this_thread::sleep_for;

void victim() {
  sleep_for(300ms);
  int fd = dup(STDOUT_FILENO);
  sleep_for(200ms);
  ssize_t rc = write(fd, "good\n", 5);
  if (rc == -1) {
    err(1, "good failed to write?!");
  }
  close(fd);
}

void bystander() {
  sleep_for(100ms);
  int fd = dup(STDOUT_FILENO);
  sleep_for(300ms);
  close(fd);
}

void offender() {
  int fd = dup(STDOUT_FILENO);
  close(fd);
  sleep_for(200ms);
  // Bug: double close.
  close(fd);
}

int main() {
  std::vector<std::thread> threads;
  for (auto function : { victim, bystander, offender }) {
    threads.emplace_back(function);
  }
  for (auto& thread : threads) {
    thread.join();
  }
}
```

When running the program, the threads' executions will be interleaved as follows:

```cpp
// victim                         bystander                       offender
                                                                  int fd = dup(1); // 3
                                                                  close(3);
                                  int fd = dup(1); // 3
                                                                  close(3);
int fd = dup(1); // 3
                                  close(3);
write(3, "good\n", 5) = -1 (EBADF)
```

which results in the following output:

    fdsan_test: good failed to write?!: Bad file descriptor

This implies that either we're accidentally closing our file descriptor too early, or someone else is helpfully
closing it for us. Let's use `android::base::unique_fd` in `victim` to guard the file descriptor with fdsan:

```diff
--- a/fdsan_test.cpp
+++ b/fdsan_test.cpp
@@ -12,13 +12,12 @@ using std::this_thread::sleep_for;

 void victim() {
   sleep_for(300ms);
-  int fd = dup(STDOUT_FILENO);
+  android::base::unique_fd fd(dup(STDOUT_FILENO));
   sleep_for(200ms);
   ssize_t rc = write(fd, "good\n", 5);
   if (rc == -1) {
     err(1, "good failed to write?!");
   }
-  close(fd);
 }

 void bystander() {
```

Now that we've guarded the file descriptor with fdsan, we should be able to find where the double close is:

```
pid: 25587, tid: 25589, name: fdsan_test  >>> fdsan_test <<<
signal 35 (<debuggerd signal>), code -1 (SI_QUEUE), fault addr --------
Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by unique_fd 0x7bf15dc448'
    x0  0000000000000000  x1  00000000000063f5  x2  0000000000000023  x3  0000007bf14de338
    x4  0000007bf14de3b8  x5  3463643531666237  x6  3463643531666237  x7  3834346364353166
    x8  00000000000000f0  x9  0000000000000000  x10 0000000000000059  x11 0000000000000035
    x12 0000007bf1bebcfa  x13 0000007bf14ddf0a  x14 0000007bf14ddf0a  x15 0000000000000000
    x16 0000007bf1c33048  x17 0000007bf1ba9990  x18 0000000000000000  x19 00000000000063f3
    x20 00000000000063f5  x21 0000007bf14de588  x22 0000007bf1f1b864  x23 0000000000000001
    x24 0000007bf14de130  x25 0000007bf13e1000  x26 0000007bf1f1f580  x27 0000005ab43ab8f0
    x28 0000000000000000  x29 0000007bf14de400
    sp  0000007bf14ddff0  lr  0000007bf1b5fd6c  pc  0000007bf1b5fd90

backtrace:
    #00 pc 0000000000008d90  /system/lib64/libc.so (fdsan_error(char const*, ...)+384)
    #01 pc 0000000000008ba8  /system/lib64/libc.so (android_fdsan_close_with_tag+632)
    #02 pc 00000000000092a0  /system/lib64/libc.so (close+16)
    #03 pc 00000000000003e4  /system/bin/fdsan_test (bystander()+84)
    #04 pc 0000000000000918  /system/bin/fdsan_test
    #05 pc 000000000006689c  /system/lib64/libc.so (__pthread_start(void*)+36)
    #06 pc 000000000000712c  /system/lib64/libc.so (__start_thread+68)
```

...in the obviously correct bystander? What's going on here?

This is (hopefully!) not a bug in fdsan; it is commonly seen when tracking down double-closes in processes that have sparse fdsan coverage. What actually happened is that the culprit closed `bystander`'s file descriptor between its open and close, so the number was reused by `victim`, which resulted in `bystander` being blamed for closing `victim`'s fd. If we store `bystander`'s fd in a `unique_fd` as well, we should get something more useful:
```diff
--- a/tmp/fdsan_test.cpp
+++ b/tmp/fdsan_test.cpp
@@ -23,9 +23,8 @@ void victim() {

 void bystander() {
   sleep_for(100ms);
-  int fd = dup(STDOUT_FILENO);
+  android::base::unique_fd fd(dup(STDOUT_FILENO));
   sleep_for(300ms);
-  close(fd);
 }
```
giving us:
```
pid: 25779, tid: 25782, name: fdsan_test  >>> fdsan_test <<<
signal 35 (<debuggerd signal>), code -1 (SI_QUEUE), fault addr --------
Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by unique_fd 0x6fef9ff448'
    x0  0000000000000000  x1  00000000000064b6  x2  0000000000000023  x3  0000006fef901338
    x4  0000006fef9013b8  x5  3466663966656636  x6  3466663966656636  x7  3834346666396665
    x8  00000000000000f0  x9  0000000000000000  x10 0000000000000059  x11 0000000000000039
    x12 0000006ff0055cfa  x13 0000006fef900f0a  x14 0000006fef900f0a  x15 0000000000000000
    x16 0000006ff009d048  x17 0000006ff0013990  x18 0000000000000000  x19 00000000000064b3
    x20 00000000000064b6  x21 0000006fef901588  x22 0000006ff04ff864  x23 0000000000000001
    x24 0000006fef901130  x25 0000006fef804000  x26 0000006ff0503580  x27 0000006368aa18f8
    x28 0000000000000000  x29 0000006fef901400
    sp  0000006fef900ff0  lr  0000006feffc9d6c  pc  0000006feffc9d90

backtrace:
    #00 pc 0000000000008d90  /system/lib64/libc.so (fdsan_error(char const*, ...)+384)
    #01 pc 0000000000008ba8  /system/lib64/libc.so (android_fdsan_close_with_tag+632)
    #02 pc 00000000000092a0  /system/lib64/libc.so (close+16)
    #03 pc 000000000000045c  /system/bin/fdsan_test (offender()+68)
    #04 pc 0000000000000920  /system/bin/fdsan_test
    #05 pc 000000000006689c  /system/lib64/libc.so (__pthread_start(void*)+36)
    #06 pc 000000000000712c  /system/lib64/libc.so (__start_thread+68)
```

Hooray!

In a real application, things are probably not going to be as detectable or reproducible as our toy example, which is a good reason to try to maximize the usage of fdsan-enabled types like `unique_fd` and `ParcelFileDescriptor`, to improve the odds that double closes in other code get detected.

### Enabling fdsan (as a C++ library implementer)

fdsan operates via two main primitives. `android_fdsan_exchange_owner_tag` modifies a file descriptor's close tag, and `android_fdsan_close_with_tag` closes a file descriptor with its tag. In the `<android/fdsan.h>` header, these are marked with `__attribute__((weak))`, so instead of passing down the platform version from JNI, availability of the functions can be queried directly. An example implementation of `unique_fd` follows:

```cpp
/*
 * Copyright (C) 2018 The Android Open Source Project
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
 * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
 * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#pragma once

#include <android/fdsan.h>
#include <unistd.h>

#include <utility>

struct unique_fd {
  unique_fd() = default;

  explicit unique_fd(int fd) {
    reset(fd);
  }

  unique_fd(const unique_fd& copy) = delete;
  unique_fd(unique_fd&& move) {
    *this = std::move(move);
  }

  ~unique_fd() {
    reset();
  }

  unique_fd& operator=(const unique_fd& copy) = delete;
  unique_fd& operator=(unique_fd&& move) {
    if (this == &move) {
      return *this;
    }

    reset();

    if (move.fd_ != -1) {
      fd_ = move.fd_;
      move.fd_ = -1;

      // Acquire ownership from the moved-from object.
      exchange_tag(fd_, move.tag(), tag());
    }

    return *this;
  }

  int get() { return fd_; }

  int release() {
    if (fd_ == -1) {
      return -1;
    }

    int fd = fd_;
    fd_ = -1;

    // Release ownership.
    exchange_tag(fd, tag(), 0);
    return fd;
  }

  void reset(int new_fd = -1) {
    if (fd_ != -1) {
      close(fd_, tag());
      fd_ = -1;
    }

    if (new_fd != -1) {
      fd_ = new_fd;

      // Acquire ownership of the presumably unowned fd.
      exchange_tag(fd_, 0, tag());
    }
  }

 private:
  int fd_ = -1;

  // The obvious choice of tag to use is the address of the object.
  uint64_t tag() {
    return reinterpret_cast<uint64_t>(this);
  }

  // These functions are marked with __attribute__((weak)), so that their
  // availability can be determined at runtime. These wrappers will use them
  // if available, and fall back to no-ops or regular close on pre-Q devices.
  static void exchange_tag(int fd, uint64_t old_tag, uint64_t new_tag) {
    if (android_fdsan_exchange_owner_tag) {
      android_fdsan_exchange_owner_tag(fd, old_tag, new_tag);
    }
  }

  static int close(int fd, uint64_t tag) {
    if (android_fdsan_close_with_tag) {
      return android_fdsan_close_with_tag(fd, tag);
    } else {
      return ::close(fd);
    }
  }
};
```

### Frequently seen bugs
 * Native APIs not making it clear when they take ownership of a file descriptor. <br/>
   * Solution: accept `unique_fd` instead of `int` in functions that take ownership.
   * [Example one](https://android-review.googlesource.com/c/platform/system/core/+/721985), [two](https://android-review.googlesource.com/c/platform/frameworks/native/+/709451)
 * Receiving a `ParcelFileDescriptor` via Intent, and then passing it into JNI code that ends up calling close on it. <br/>
   * Solution: ¯\\\_(ツ)\_/¯. Use fdsan?
   * [Example one](https://android-review.googlesource.com/c/platform/system/bt/+/710104), [two](https://android-review.googlesource.com/c/platform/frameworks/base/+/732305)

### Footnotes
1. [How To Corrupt An SQLite Database File](https://www.sqlite.org/howtocorrupt.html#_continuing_to_use_a_file_descriptor_after_it_has_been_closed)

2. [<b><i>50%</i></b> of Facebook's iOS crashes caused by a file descriptor double close leading to SQLite database corruption](https://code.fb.com/ios/debugging-file-corruption-on-ios/)