## fdsan

[TOC]

fdsan is a file descriptor sanitizer added to Android in API level 29.

### Background
*What problem is fdsan trying to solve? Why should I care?*

fdsan (file descriptor sanitizer) detects mishandling of file descriptor ownership, which tends to manifest as *use-after-close* and *double-close*. These errors are direct analogues of the memory allocation *use-after-free* and *double-free* bugs, but tend to be much more difficult to diagnose and fix. With `malloc` and `free`, implementations have free rein to detect errors and abort on double free. File descriptors, on the other hand, are mandated by the POSIX standard to be allocated lowest-number-first: each new allocation returns the lowest available number, so a number that was just closed is immediately handed out again. As a result, many file descriptor bugs can *never* be noticed on the thread where the error occurred, and instead manifest as "impossible" behavior on another thread.

For example, given two threads running the following code:
```cpp
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

void thread_one() {
    int fd = open("/dev/null", O_RDONLY);
    close(fd);
    close(fd);  // Bug: double close.
}

void thread_two() {
    while (true) {
        int fd = open("log", O_WRONLY | O_APPEND);
        if (write(fd, "foo", 3) != 3) {
            err(1, "write failed!");
        }
        close(fd);
    }
}
```
the following interleaving is possible:
```cpp
thread one                                thread two
open("/dev/null", O_RDONLY) = 123
close(123) = 0
                                          open("log", O_WRONLY | O_APPEND) = 123
close(123) = 0
                                          write(123, "foo", 3) = -1 (EBADF)
                                          err(1, "write failed!")
```

Assertion failures like this one are probably the most innocuous result that can arise from these bugs. Silent data corruption [[1](#footnotes), [2](#footnotes)] and security vulnerabilities are also possible. For example, suppose thread two were saving user data to disk when a third thread came in and opened a socket to the Internet, reusing the same descriptor number: the user data would go to the socket instead.
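
The descriptor reuse at the heart of the two-thread example can be reproduced deterministically on a single thread. The sketch below plays both roles itself, and its function name is hypothetical. It assumes nothing else in the process opens a descriptor in the meantime. It shows that a stale second `close` silently destroys a descriptor whose owner never closed it. The failure only becomes visible later, when that owner uses the descriptor.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Single-threaded replay of thread one's double close. Returns true if the
// stale close() landed on a descriptor opened after the first close.
bool stale_close_hits_new_fd() {
  int first = open("/dev/null", O_RDONLY);
  if (first == -1) return false;
  close(first);

  // Lowest-available allocation: `second` reuses the number `first` freed.
  int second = open("/dev/null", O_RDONLY);
  if (second != first) {  // Only if something else grabbed a descriptor.
    if (second != -1) close(second);
    return false;
  }

  close(first);  // Bug: double close. This actually closes `second`.

  // `second` is now invalid even though its owner never closed it.
  return fcntl(second, F_GETFD) == -1 && errno == EBADF;
}
```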

### Design
*What does fdsan do?*

fdsan attempts to detect and/or prevent file descriptor mismanagement by enforcing file descriptor ownership. Most memory allocations can have their ownership managed by types such as `std::unique_ptr`. In the same way, almost all file descriptors can be associated with a unique owner that is responsible for closing them. fdsan provides functions to associate a file descriptor with an owner. If someone tries to close a file descriptor that they don't own, then depending on configuration either a warning is emitted or the process aborts.

This is implemented by providing functions to set a 64-bit closure tag on a file descriptor. The tag consists of an 8-bit type byte that identifies the type of the owner (`enum android_fdsan_owner_type` in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/master/libc/include/android/fdsan.h)), and a 56-bit value. Ideally, the value uniquely identifies the object: the object address for native objects, and `System.identityHashCode` for Java objects. Sometimes it's hard to derive an identifier for the "owner" that should close a file descriptor. Even then, using the same value for all file descriptors in the module can be useful, since it will still catch other code that closes your file descriptors.
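
A sketch of that layout follows. It assumes only what is stated above: the owner type goes in the high byte and the value in the low 56 bits. The helper names are hypothetical. On a device, `<android/fdsan.h>` provides `android_fdsan_create_owner_tag` for building tags, and its handling of edge cases may differ from this sketch.

```cpp
#include <cstdint>

// Hypothetical helpers for the tag layout described above: owner type in
// the top 8 bits, an identifying value in the low 56 bits.
constexpr uint64_t kValueMask = (uint64_t{1} << 56) - 1;

constexpr uint64_t make_owner_tag(uint8_t type, uint64_t value) {
  // Bits of `value` above bit 55 are dropped to make room for the type.
  return (static_cast<uint64_t>(type) << 56) | (value & kValueMask);
}

constexpr uint8_t tag_type(uint64_t tag) {
  return static_cast<uint8_t>(tag >> 56);
}

constexpr uint64_t tag_value(uint64_t tag) { return tag & kValueMask; }
```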

If a file descriptor that's been marked with a tag is closed with an incorrect tag, or without a tag, we know something has gone wrong, and can generate diagnostics or abort.
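
A toy, single-threaded model of that check follows. It is a hypothetical illustration, not bionic's implementation. It tracks one tag per descriptor, with 0 meaning unowned (the convention the `unique_fd` example later in this document uses). It reports whether an exchange or a close would be flagged. In this model a flagged close still releases the descriptor, which is an arbitrary choice made for illustration.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical toy model of per-descriptor owner tags; 0 means "unowned".
class ToyFdTable {
 public:
  // Shaped like android_fdsan_exchange_owner_tag: replace expected_tag with
  // new_tag. Returns false (would be flagged) if the current tag differs.
  bool exchange_owner_tag(int fd, uint64_t expected_tag, uint64_t new_tag) {
    if (tag_of(fd) != expected_tag) return false;
    tags_[fd] = new_tag;
    return true;
  }

  // Shaped like android_fdsan_close_with_tag; a plain close() passes tag 0.
  // Returns false (would be flagged) if the tag doesn't match the owner's.
  bool close_with_tag(int fd, uint64_t tag) {
    bool ok = tag_of(fd) == tag;
    tags_.erase(fd);  // The model releases the fd either way.
    return ok;
  }

  uint64_t tag_of(int fd) const {
    auto it = tags_.find(fd);
    return it == tags_.end() ? 0 : it->second;
  }

 private:
  std::unordered_map<int, uint64_t> tags_;
};
```

A call like `close_with_tag(3, 0)` on a descriptor owned by someone else corresponds to the abort message in the tombstones later in this document. That message reads "expected to be unowned, actually owned by unique_fd".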

### Enabling fdsan (as a user)
*How do I use fdsan?*

fdsan has four severity levels:
 - disabled (`ANDROID_FDSAN_ERROR_LEVEL_DISABLED`)
 - warn-once (`ANDROID_FDSAN_ERROR_LEVEL_WARN_ONCE`)
   - Upon detecting an error, emit a warning to logcat, generate a tombstone, and then continue execution with fdsan disabled.
 - warn-always (`ANDROID_FDSAN_ERROR_LEVEL_WARN_ALWAYS`)
   - Same as warn-once, except without disabling after the first warning.
 - fatal (`ANDROID_FDSAN_ERROR_LEVEL_FATAL`)
   - Abort upon detecting an error.

In Android Q, fdsan has a global default of warn-once. fdsan can be made more or less strict at runtime via the `android_fdsan_set_error_level` function in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/master/libc/include/android/fdsan.h).
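
The four levels differ only in what happens when an error is detected. The toy model below encodes the behavior described in the list above, including warn-once turning itself off after its first warning. Its names are hypothetical and it does not call into libc. On a device, the level itself is changed with a call such as `android_fdsan_set_error_level(ANDROID_FDSAN_ERROR_LEVEL_FATAL)`, which returns the previous level.

```cpp
// Hypothetical toy model of fdsan's error levels; the enumerators mirror
// ANDROID_FDSAN_ERROR_LEVEL_{DISABLED,WARN_ONCE,WARN_ALWAYS,FATAL}.
enum class ErrorLevel { kDisabled, kWarnOnce, kWarnAlways, kFatal };

// What the model does about one detected ownership error.
enum class Response { kIgnore, kWarn, kAbort };

// Returns the response to a single error and updates `level` the way the
// warn-once description implies: warn, then continue with fdsan disabled.
Response on_error(ErrorLevel& level) {
  switch (level) {
    case ErrorLevel::kDisabled:
      return Response::kIgnore;
    case ErrorLevel::kWarnOnce:
      level = ErrorLevel::kDisabled;
      return Response::kWarn;
    case ErrorLevel::kWarnAlways:
      return Response::kWarn;
    case ErrorLevel::kFatal:
      return Response::kAbort;
  }
  return Response::kIgnore;  // Unreachable for valid enum values.
}
```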

The likelihood of fdsan catching a file descriptor error grows with the percentage of file descriptors in your process that are tagged with an owner, because a close can only be checked against an owner if the descriptor has one.

### Using fdsan to fix a bug
*No, really, how do I use fdsan?*

Let's look at a simple contrived example that uses sleeps to force a particular interleaving of thread execution.

```cpp
#include <err.h>
#include <unistd.h>

#include <chrono>
#include <thread>
#include <vector>

#include <android-base/unique_fd.h>

using namespace std::chrono_literals;
using std::this_thread::sleep_for;

void victim() {
  sleep_for(300ms);
  int fd = dup(STDOUT_FILENO);
  sleep_for(200ms);
  ssize_t rc = write(fd, "good\n", 5);
  if (rc == -1) {
    err(1, "good failed to write?!");
  }
  close(fd);
}

void bystander() {
  sleep_for(100ms);
  int fd = dup(STDOUT_FILENO);
  sleep_for(300ms);
  close(fd);
}

void offender() {
  int fd = dup(STDOUT_FILENO);
  close(fd);
  sleep_for(200ms);
  close(fd);
}

int main() {
  std::vector<std::thread> threads;
  for (auto function : { victim, bystander, offender }) {
    threads.emplace_back(function);
  }
  for (auto& thread : threads) {
    thread.join();
  }
}
```

When running the program, the threads' executions will be interleaved as follows:

```cpp
// victim                         bystander                       offender
                                                                  int fd = dup(1); // 3
                                                                  close(3);
                                  int fd = dup(1); // 3
                                                                  close(3);
int fd = dup(1); // 3
                                  close(3);
write(3, "good\n", 5); // -1 (EBADF)
```

which results in the following output:

    fdsan_test: good failed to write?!: Bad file descriptor

This implies that either we're accidentally closing our file descriptor too early, or someone else is helpfully closing it for us. Let's use `android::base::unique_fd` in `victim` to guard the file descriptor with fdsan:

```diff
--- a/fdsan_test.cpp
+++ b/fdsan_test.cpp
@@ -12,13 +12,12 @@ using std::this_thread::sleep_for;

 void victim() {
   sleep_for(300ms);
-  int fd = dup(STDOUT_FILENO);
+  android::base::unique_fd fd(dup(STDOUT_FILENO));
   sleep_for(200ms);
   ssize_t rc = write(fd, "good\n", 5);
   if (rc == -1) {
     err(1, "good failed to write?!");
   }
-  close(fd);
 }

 void bystander() {
```

Now that we've guarded the file descriptor with fdsan, we should be able to find where the double close is:

```
pid: 25587, tid: 25589, name: fdsan_test  >>> fdsan_test <<<
signal 35 (<debuggerd signal>), code -1 (SI_QUEUE), fault addr --------
Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by unique_fd 0x7bf15dc448'
    x0  0000000000000000  x1  00000000000063f5  x2  0000000000000023  x3  0000007bf14de338
    x4  0000007bf14de3b8  x5  3463643531666237  x6  3463643531666237  x7  3834346364353166
    x8  00000000000000f0  x9  0000000000000000  x10 0000000000000059  x11 0000000000000035
    x12 0000007bf1bebcfa  x13 0000007bf14ddf0a  x14 0000007bf14ddf0a  x15 0000000000000000
    x16 0000007bf1c33048  x17 0000007bf1ba9990  x18 0000000000000000  x19 00000000000063f3
    x20 00000000000063f5  x21 0000007bf14de588  x22 0000007bf1f1b864  x23 0000000000000001
    x24 0000007bf14de130  x25 0000007bf13e1000  x26 0000007bf1f1f580  x27 0000005ab43ab8f0
    x28 0000000000000000  x29 0000007bf14de400
    sp  0000007bf14ddff0  lr  0000007bf1b5fd6c  pc  0000007bf1b5fd90

backtrace:
    #00 pc 0000000000008d90  /system/lib64/libc.so (fdsan_error(char const*, ...)+384)
    #01 pc 0000000000008ba8  /system/lib64/libc.so (android_fdsan_close_with_tag+632)
    #02 pc 00000000000092a0  /system/lib64/libc.so (close+16)
    #03 pc 00000000000003e4  /system/bin/fdsan_test (bystander()+84)
    #04 pc 0000000000000918  /system/bin/fdsan_test
    #05 pc 000000000006689c  /system/lib64/libc.so (__pthread_start(void*)+36)
    #06 pc 000000000000712c  /system/lib64/libc.so (__start_thread+68)
```

...in the obviously correct `bystander`? What's going on here?

The reason for this is (hopefully!) not a bug in fdsan, and will commonly be seen when tracking down double-closes in processes that have sparse fdsan coverage. What actually happened is that the culprit (`offender`) closed `bystander`'s file descriptor between its open and close. When `bystander` later closed "its" descriptor number, it was really closing `victim`'s fd, so `bystander` got the blame. If we store `bystander`'s fd in a `unique_fd` as well, we should get something more useful:
```diff
--- a/tmp/fdsan_test.cpp
+++ b/tmp/fdsan_test.cpp
@@ -23,9 +23,8 @@ void victim() {

 void bystander() {
   sleep_for(100ms);
-  int fd = dup(STDOUT_FILENO);
+  android::base::unique_fd fd(dup(STDOUT_FILENO));
   sleep_for(300ms);
-  close(fd);
 }
```
giving us:
```
pid: 25779, tid: 25782, name: fdsan_test  >>> fdsan_test <<<
signal 35 (<debuggerd signal>), code -1 (SI_QUEUE), fault addr --------
Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by unique_fd 0x6fef9ff448'
    x0  0000000000000000  x1  00000000000064b6  x2  0000000000000023  x3  0000006fef901338
    x4  0000006fef9013b8  x5  3466663966656636  x6  3466663966656636  x7  3834346666396665
    x8  00000000000000f0  x9  0000000000000000  x10 0000000000000059  x11 0000000000000039
    x12 0000006ff0055cfa  x13 0000006fef900f0a  x14 0000006fef900f0a  x15 0000000000000000
    x16 0000006ff009d048  x17 0000006ff0013990  x18 0000000000000000  x19 00000000000064b3
    x20 00000000000064b6  x21 0000006fef901588  x22 0000006ff04ff864  x23 0000000000000001
    x24 0000006fef901130  x25 0000006fef804000  x26 0000006ff0503580  x27 0000006368aa18f8
    x28 0000000000000000  x29 0000006fef901400
    sp  0000006fef900ff0  lr  0000006feffc9d6c  pc  0000006feffc9d90

backtrace:
    #00 pc 0000000000008d90  /system/lib64/libc.so (fdsan_error(char const*, ...)+384)
    #01 pc 0000000000008ba8  /system/lib64/libc.so (android_fdsan_close_with_tag+632)
    #02 pc 00000000000092a0  /system/lib64/libc.so (close+16)
    #03 pc 000000000000045c  /system/bin/fdsan_test (offender()+68)
    #04 pc 0000000000000920  /system/bin/fdsan_test
    #05 pc 000000000006689c  /system/lib64/libc.so (__pthread_start(void*)+36)
    #06 pc 000000000000712c  /system/lib64/libc.so (__start_thread+68)
```

Hooray!

In a real application, things are probably not going to be as detectable or reproducible as our toy example. That is a good reason to maximize the usage of fdsan-enabled types like `unique_fd` and `ParcelFileDescriptor`, which improves the odds that double closes in other code get detected.
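
The interleaving from this section can be replayed against a toy ownership model. This shows why guarding only `victim` blamed `bystander`, while guarding both pointed at `offender`. The model is single-threaded and models the schedule, not fdsan itself. Everything in it is hypothetical: the class, the tag values, and the assumption that descriptors 0 to 2 are already taken. Like the fatal level, it blames only the first mismatched close.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical replay model: descriptors are handed out lowest-free-first
// (as POSIX requires), 0 means "unowned", and the first close whose tag
// doesn't match the owner's tag is blamed.
class Replay {
 public:
  int dup_fd(uint64_t owner_tag) {
    int fd = 3;  // Assume 0-2 are stdin/stdout/stderr.
    while (open_.count(fd) != 0) ++fd;
    open_.insert(fd);
    if (owner_tag != 0) tags_[fd] = owner_tag;
    return fd;
  }

  void close_fd(const std::string& who, int fd, uint64_t tag) {
    auto it = tags_.find(fd);
    uint64_t owner = (it == tags_.end()) ? 0 : it->second;
    if (owner != tag && blamed_.empty()) blamed_ = who;
    open_.erase(fd);
    tags_.erase(fd);
  }

  const std::string& blamed() const { return blamed_; }

 private:
  std::set<int> open_;
  std::unordered_map<int, uint64_t> tags_;
  std::string blamed_;
};

// Each flag says whether that thread keeps its fd in a tagging owner such
// as unique_fd. Returns who gets blamed ("" if nobody).
std::string replay(bool guard_victim, bool guard_bystander) {
  const uint64_t victim_tag = guard_victim ? 0x1 : 0;  // Arbitrary tags.
  const uint64_t bystander_tag = guard_bystander ? 0x2 : 0;
  Replay r;
  int off = r.dup_fd(0);                       // offender:  dup -> 3
  r.close_fd("offender", off, 0);              // offender:  close(3)
  int by = r.dup_fd(bystander_tag);            // bystander: dup -> 3
  r.close_fd("offender", off, 0);              // offender:  close(3) again
  r.dup_fd(victim_tag);                        // victim:    dup -> 3
  r.close_fd("bystander", by, bystander_tag);  // bystander: close(3)
  return r.blamed();
}
```

`replay(true, false)` returns `"bystander"` and `replay(true, true)` returns `"offender"`, matching the two tombstones above. `replay(false, false)` blames nobody, which corresponds to the original "Bad file descriptor" failure with no hint about its cause.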

### Enabling fdsan (as a C++ library implementer)

fdsan operates via two main primitives. `android_fdsan_exchange_owner_tag` modifies a file descriptor's close tag, and `android_fdsan_close_with_tag` closes a file descriptor with its tag. In the `<android/fdsan.h>` header, these are marked with `__attribute__((weak))`. Instead of passing the platform version down through JNI, code can therefore check directly at runtime whether the functions are available. An example implementation of `unique_fd` follows:

```cpp
/*
 * Copyright (C) 2018 The Android Open Source Project
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
 * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
 * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#pragma once

#include <android/fdsan.h>
#include <stdint.h>
#include <unistd.h>

#include <utility>

struct unique_fd {
    unique_fd() = default;

    explicit unique_fd(int fd) {
        reset(fd);
    }

    unique_fd(const unique_fd& copy) = delete;
    unique_fd(unique_fd&& move) noexcept {
        *this = std::move(move);
    }

    ~unique_fd() {
        reset();
    }

    unique_fd& operator=(const unique_fd& copy) = delete;
    unique_fd& operator=(unique_fd&& move) noexcept {
        if (this == &move) {
            return *this;
        }

        reset();

        if (move.fd_ != -1) {
            fd_ = move.fd_;
            move.fd_ = -1;

            // Acquire ownership from the moved-from object.
            exchange_tag(fd_, move.tag(), tag());
        }

        return *this;
    }

    int get() const { return fd_; }

    int release() {
        if (fd_ == -1) {
            return -1;
        }

        int fd = fd_;
        fd_ = -1;

        // Release ownership.
        exchange_tag(fd, tag(), 0);
        return fd;
    }

    void reset(int new_fd = -1) {
        if (fd_ != -1) {
            close(fd_, tag());
            fd_ = -1;
        }

        if (new_fd != -1) {
            fd_ = new_fd;

            // Acquire ownership of the presumably unowned fd.
            exchange_tag(fd_, 0, tag());
        }
    }

  private:
    int fd_ = -1;

    // The obvious choice of tag to use is the address of the object.
    uint64_t tag() const {
        return reinterpret_cast<uint64_t>(this);
    }

    // These functions are marked with __attribute__((weak)), so that their
    // availability can be determined at runtime. These wrappers will use them
    // if available, and fall back to no-ops or regular close on pre-Q devices.
    static void exchange_tag(int fd, uint64_t old_tag, uint64_t new_tag) {
        if (android_fdsan_exchange_owner_tag) {
            android_fdsan_exchange_owner_tag(fd, old_tag, new_tag);
        }
    }

    static int close(int fd, uint64_t tag) {
        if (android_fdsan_close_with_tag) {
            return android_fdsan_close_with_tag(fd, tag);
        } else {
            return ::close(fd);
        }
    }
};
```
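
The weak-symbol availability check in `exchange_tag` and `close` can be tried outside Android. The sketch below assumes an ELF toolchain such as GCC or Clang on Linux. It declares a deliberately undefined function as weak, and the name `hypothetical_close_with_tag` is made up. The program still links, the symbol's address is null at runtime, and the wrapper falls back to `::close`, just as `unique_fd` does on pre-Q devices.

```cpp
#include <stdint.h>
#include <unistd.h>

// Hypothetical function that nothing defines. Because it is declared weak,
// the program still links, and its address is null at runtime.
extern "C" int hypothetical_close_with_tag(int fd, uint64_t tag)
    __attribute__((weak));

bool tagged_close_available() {
  return hypothetical_close_with_tag != nullptr;
}

// Same shape as unique_fd::close above: prefer the tagged close when the
// symbol resolved, otherwise fall back to plain close().
int close_maybe_tagged(int fd, uint64_t tag) {
  if (hypothetical_close_with_tag != nullptr) {
    return hypothetical_close_with_tag(fd, tag);
  }
  return ::close(fd);
}
```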

### Frequently seen bugs
 * Native APIs not making it clear when they take ownership of a file descriptor. <br/>
   * Solution: accept `unique_fd` instead of `int` in functions that take ownership.
   * [Example one](https://android-review.googlesource.com/c/platform/system/core/+/721985), [two](https://android-review.googlesource.com/c/platform/frameworks/native/+/709451)
 * Receiving a `ParcelFileDescriptor` via Intent, and then passing it into JNI code that ends up calling close on it. <br/>
   * Solution: ¯\\\_(ツ)\_/¯. Use fdsan?
   * [Example one](https://android-review.googlesource.com/c/platform/system/bt/+/710104), [two](https://android-review.googlesource.com/c/platform/frameworks/base/+/732305)
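
The first bullet's fix can be sketched without Android headers. `OwnedFd`, `write_greeting` and `Sink` are hypothetical stand-ins. `OwnedFd` plays the role of `unique_fd` but does no fdsan tagging. The point is that a by-value owning parameter forces callers to write `std::move`, while a plain `int` parameter signals a borrow.

```cpp
#include <unistd.h>

#include <utility>

// Hypothetical minimal owner standing in for android::base::unique_fd.
// It closes on destruction but does no fdsan tagging.
class OwnedFd {
 public:
  explicit OwnedFd(int fd = -1) : fd_(fd) {}
  OwnedFd(const OwnedFd&) = delete;
  OwnedFd& operator=(const OwnedFd&) = delete;
  OwnedFd(OwnedFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
  OwnedFd& operator=(OwnedFd&& other) noexcept {
    if (this != &other) {
      reset();
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }
  ~OwnedFd() { reset(); }

  int get() const { return fd_; }

  void reset() {
    if (fd_ != -1) {
      ::close(fd_);
      fd_ = -1;
    }
  }

 private:
  int fd_;
};

// Borrows: a plain int says "the caller still owns this; don't close it".
ssize_t write_greeting(int fd) { return ::write(fd, "hi", 2); }

// Takes ownership: a by-value OwnedFd parameter makes callers std::move.
class Sink {
 public:
  explicit Sink(OwnedFd fd) : fd_(std::move(fd)) {}
  int fd() const { return fd_.get(); }

 private:
  OwnedFd fd_;  // Closed when the Sink is destroyed.
};
```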

### Footnotes
1. [How To Corrupt An SQLite Database File](https://www.sqlite.org/howtocorrupt.html#_continuing_to_use_a_file_descriptor_after_it_has_been_closed)

2. [<b><i>50%</i></b> of Facebook's iOS crashes caused by a file descriptor double close leading to SQLite database corruption](https://code.fb.com/ios/debugging-file-corruption-on-ios/)
