=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own. Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior. To use DataFlowSanitizer, the program
uses API functions to apply tags to data to cause it to be tracked, and to
check the tag of a specific data item. DataFlowSanitizer manages
the propagation of tags through the program according to its data flow.

The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.

ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or whether it should use a variant of this ABI that also propagates labels
through function parameters and return values. The ABI list file also controls
how labels are propagated in the former case. DataFlowSanitizer comes with a
default ABI list which is intended to eventually cover the glibc library on
Linux, but it may become necessary for users to extend the ABI list in cases
where a particular library or function cannot be instrumented (e.g. because
it is implemented in assembly or another language which DataFlowSanitizer does
not support) or a function is called from a library or function which cannot
be instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI. Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown. The other supported categories are ``discard``, ``functional``
and ``custom``.

* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory). Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the labels of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function. This function may wrap
  the original function or provide its own implementation. This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior. The signature
  of ``__dfsw_F`` is based on that of ``F`` with each argument having a
  label of type ``dfsan_label`` appended to the argument list. If ``F``
  has a non-void return type, a final argument of type ``dfsan_label *``
  is appended, to which the custom function can store the label for the
  return value. For example:

.. code-block:: c++

  void f(int x);
  void __dfsw_f(int x, dfsan_label x_label);

  void *memcpy(void *dest, const void *src, size_t n);
  void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                      dfsan_label dest_label, dfsan_label src_label,
                      dfsan_label n_label, dfsan_label *ret_label);

If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI. Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

Example
=======

The following program demonstrates label propagation by checking that
the correct labels are propagated.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    // Create a label for each variable and attach it to the variable's bytes.
    int i = 1;
    dfsan_label i_label = dfsan_create_label("i", 0);
    dfsan_set_label(i_label, &i, sizeof(i));

    int j = 2;
    dfsan_label j_label = dfsan_create_label("j", 0);
    dfsan_set_label(j_label, &j, sizeof(j));

    int k = 3;
    dfsan_label k_label = dfsan_create_label("k", 0);
    dfsan_set_label(k_label, &k, sizeof(k));

    // The label of i + j contains the labels of i and j, but not that of k.
    dfsan_label ij_label = dfsan_get_label(i + j);
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    // The label of i + j + k contains all three labels.
    dfsan_label ijk_label = dfsan_get_label(i + j + k);
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }

Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.
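
As a rough illustration of the ``discard``, ``functional`` and ``custom``
categories described in the ABI list section, the following toy model may
help. It is a sketch only and does not use DataFlowSanitizer itself: the
names ``toy_label``, ``toy_union``, ``toy_discard_ret``,
``toy_functional_ret`` and ``toy_custom_memcpy`` are hypothetical, labels
are assumed to be simple bit sets, and shadow memory is modelled as an
explicit parallel array rather than being looked up by address as the real
runtime does.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  // Hypothetical stand-in for dfsan_label: one bit per base label (assumption).
  using toy_label = std::uint16_t;

  // In this toy model, the union of two labels is a bitwise OR.
  inline toy_label toy_union(toy_label a, toy_label b) {
    return static_cast<toy_label>(a | b);
  }

  // "discard": the return value is unlabelled, whatever the argument label is.
  inline toy_label toy_discard_ret(toy_label /*arg_label*/) { return 0; }

  // "functional": the return label is the union of the argument labels.
  inline toy_label toy_functional_ret(toy_label a_label, toy_label b_label) {
    return toy_union(a_label, b_label);
  }

  // "custom"-style memcpy: copies the bytes and also copies the per-byte
  // labels from the source shadow to the destination shadow. The explicit
  // shadow parameters are a simplification and are not part of __dfsw_memcpy.
  inline void *toy_custom_memcpy(void *dest, toy_label *dest_shadow,
                                 const void *src, const toy_label *src_shadow,
                                 std::size_t n) {
    std::memcpy(dest, src, n);
    for (std::size_t i = 0; i < n; ++i)
      dest_shadow[i] = src_shadow[i];
    return dest;
  }

In this model, a ``discard`` function called with a labelled argument yields
an unlabelled result, a ``functional`` function called with labels ``1`` and
``2`` yields label ``3``, and the ``custom``-style copy leaves the destination
shadow equal to the source shadow.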