<h1>Courgette Internals</h1>

<h2>Patch Generation</h2>

<p><img src="generation.png" alt="Patch Generation" title="" /></p>

<ul>
<li><p>courgette_tool.cc:GenerateEnsemblePatch kicks off the patch
generation by calling ensemble_create.cc:GenerateEnsemblePatch.</p></li>
<li><p>The files are read in by courgette:SourceStream objects.</p></li>
<li><p>ensemble_create.cc:GenerateEnsemblePatch uses FindGenerators, which
uses MakeGenerator to create
patch_generator_x86_32.h:PatchGeneratorX86_32 objects.</p></li>
<li><p>PatchGeneratorX86_32's Transform method transforms the input file
using Courgette's core techniques that make the bsdiff delta
smaller. The steps it takes are the following:</p>

<ul>
<li><p><em>disassemble</em> the old and new binaries into AssemblyProgram
objects,</p></li>
<li><p><em>adjust</em> the new AssemblyProgram object, and</p></li>
<li><p><em>encode</em> the AssemblyProgram object back into raw bytes.</p></li>
</ul></li>
</ul>

<h3>Disassemble</h3>

<ul>
<li><p>The input is a pointer to a buffer containing the raw bytes of the
input file.</p></li>
<li><p>Disassembly converts certain machine instructions that reference
addresses to Courgette instructions. This is not disassembly in the
usual sense, but it is the term the code base uses.
Specifically,
it detects instructions that use absolute addresses given by the
binary file's relocation table, and relative addresses used in
relative branches.</p></li>
<li><p>This is done by disassemble:ParseDetectedExecutable, which selects the
appropriate Disassembler subclass by looking at the binary file's
headers.</p>

<ul>
<li><p>disassembler_win32_x86.h defines the PE/COFF x86 disassembler</p></li>
<li><p>disassembler_elf_32_x86.h defines the ELF 32-bit x86 disassembler</p></li>
<li><p>disassembler_elf_32_arm.h defines the ELF 32-bit ARM disassembler</p></li>
</ul></li>
<li><p>The Disassembler replaces the relocation table with a Courgette
instruction that can regenerate the relocation table.</p></li>
<li><p>The Disassembler builds a list of addresses referenced by the
machine code, numbering each one.</p></li>
<li><p>The Disassembler replaces each address used in machine instructions
with its index number.</p></li>
<li><p>The output is an assembly_program.h:AssemblyProgram object, which
contains a list of instructions, machine or Courgette, and a mapping
of indices to actual addresses.</p></li>
</ul>

<h3>Adjust</h3>

<ul>
<li><p>This step takes the AssemblyProgram for the new file and reassigns
the indices that map to actual addresses. It is performed by
adjustment_method.cc:Adjust().</p></li>
<li><p>The goal is to match the indices in the new program to those in the
old program as closely as possible.</p></li>
<li><p>When matched correctly, machine instructions that jump to the
same function in both the old and new binaries will look the same to
bsdiff, even if the function is located in a different part of the
binary.</p></li>
</ul>

<h3>Encode</h3>

<ul>
<li><p>This step takes an AssemblyProgram object and encodes both the
instructions and the mapping of indices to addresses as byte
vectors. This format can be written to a file directly, and is also
more appropriate for bsdiffing.
It is done by
AssemblyProgram.Encode().</p></li>
<li><p>encoded_program.h:EncodedProgram defines the binary format and a
WriteTo method that writes to a file.</p></li>
</ul>

<h3>bsdiff</h3>

<ul>
<li>simple_delta.cc:GenerateSimpleDelta</li>
</ul>

<h2>Patch Application</h2>

<p><img src="application.png" alt="Patch Application" title="" /></p>

<ul>
<li><p>courgette_tool.cc:ApplyEnsemblePatch kicks off the patch application
by calling ensemble_apply.cc:ApplyEnsemblePatch.</p></li>
<li><p>ensemble_apply.cc:ApplyEnsemblePatch reads and verifies the
patch's header, then calls the overloaded version of
ensemble_apply.cc:ApplyEnsemblePatch.</p></li>
<li><p>The patch is read into an ensemble_apply.cc:EnsemblePatchApplication
object, which generates a set of patcher_x86_32.h:PatcherX86_32
objects for the sections in the patch.</p></li>
<li><p>The original file is disassembled and encoded via a call to
EnsemblePatchApplication.TransformUp, which in turn calls
patcher_x86_32.h:PatcherX86_32.Transform.</p></li>
<li><p>The transformed file is then bspatched via
EnsemblePatchApplication.SubpatchTransformedElements, which calls
EnsemblePatchApplication.SubpatchStreamSets, which calls
simple_delta.cc:ApplySimpleDelta, Courgette's built-in
implementation of bspatch.</p></li>
<li><p>Finally, EnsemblePatchApplication.TransformDown assembles the
patched binary data, i.e., it reverses the encoding and disassembly.
This is done by calling PatcherX86_32.Reform, which in turn calls
the global function encoded_program.cc:Assemble, which calls
EncodedProgram.AssembleTo.</p></li>
</ul>

<h2>Glossary</h2>

<p><strong>Adjust</strong>: Reassign address indices in the new program to match more
 closely those from the old.</p>

<p><strong>Assembly program</strong>: The output of <em>disassembly</em>.
Contains a list of
 <em>Courgette instructions</em> and an index of branch target addresses.</p>

<p><strong>Assemble</strong>: Convert an <em>assembly program</em> back into an object file
 by evaluating the <em>Courgette instructions</em> and leaving the machine
 instructions in place.</p>

<p><strong>Courgette instruction</strong>: A replacement for machine instructions in the
 program. Courgette instructions replace branch targets with an index
 into the list of target addresses and replace part of the relocation
 table.</p>

<p><strong>Disassembler</strong>: Takes a binary file and produces an <em>assembly
 program</em>.</p>

<p><strong>Encode</strong>: Convert an <em>assembly program</em> into an <em>encoded program</em> by
 serializing its data structures into byte vectors more appropriate
 for storage in a file.</p>

<p><strong>Encoded Program</strong>: The output of encoding.</p>

<p><strong>Ensemble</strong>: A Courgette-style patch containing sections for the list
 of branch addresses and the encoded program. It supports patching
 multiple object files at once.</p>

<p><strong>Opcode</strong>: The number corresponding to either a machine or <em>Courgette
 instruction</em>.</p>
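<p>The <em>disassemble</em> and <em>adjust</em> steps described under Patch Generation
 can be illustrated with a toy model. The sketch below is written in Python
 and is not Courgette's C++ code: the function names, the representation of
 instructions as (opcode, address) pairs, the choice to number addresses in
 sorted order, and the position-based matching heuristic are all assumptions
 made for illustration. It does not attempt to reproduce the matching
 performed by adjustment_method.cc.</p>

```python
import itertools

# Toy model only. An "instruction" is an (opcode, target_address) pair.
# None of these names come from Courgette's source.


def disassemble(instructions):
    """Replace each target address with an index into an address table.

    Assumption for this sketch: addresses are numbered in sorted order.
    """
    table = sorted({address for _, address in instructions})
    index_of = {address: i for i, address in enumerate(table)}
    program = [(opcode, index_of[address]) for opcode, address in instructions]
    return program, table


def adjust(old_program, new_program, new_table):
    """Renumber the new program's indices to resemble the old program's.

    Hypothetical heuristic: walk both programs in parallel and, where the
    opcodes agree, give the new label the index the old program used.
    """
    remap = {}     # new index -> adjusted index
    taken = set()  # adjusted indices already handed out
    for (old_op, old_idx), (new_op, new_idx) in zip(old_program, new_program):
        if old_op == new_op and new_idx not in remap and old_idx not in taken:
            remap[new_idx] = old_idx
            taken.add(old_idx)
    # Every unmatched label receives the lowest index that is still free.
    free = (i for i in itertools.count() if i not in taken)
    for new_idx in range(len(new_table)):
        if new_idx not in remap:
            remap[new_idx] = next(free)
    program = [(opcode, remap[idx]) for opcode, idx in new_program]
    table = {remap[idx]: address for idx, address in enumerate(new_table)}
    return program, table


# The new binary gains a function at 0x1800, which pushes the old jump
# target from 0x2000 to 0x2400.
old = [("call", 0x1000), ("jmp", 0x2000), ("call", 0x1000)]
new = [("call", 0x1000), ("jmp", 0x2400), ("call", 0x1000), ("call", 0x1800)]

old_prog, old_table = disassemble(old)
new_prog, new_table = disassemble(new)
adj_prog, adj_table = adjust(old_prog, new_prog, new_table)

# Before adjustment the jmp carries index 2 in the new program but 1 in the
# old one; after adjustment the shared prefix is identical.
print(old_prog)
print(new_prog)
print(adj_prog)
```

<p>In this model the first three instructions of the adjusted program equal
 the old program exactly, so a byte-level differ would see only the appended
 instruction and the changed address table, which is the effect the Adjust
 step aims for.</p>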