• Home
Name Date Size #Lines LOC

..--

aosp/include/03-May-2024-7,7884,457

fuzzers/03-May-2024-1,046773

testdata/03-May-2024-776457

Android.bpD03-May-20244.2 KiB179170

BUILD.gnD03-May-20245.7 KiB244224

DIR_METADATAD03-May-202453 43

LICENSED03-May-20241.5 KiB2827

METADATAD03-May-2024378 1513

MODULE_LICENSE_CHROMED03-May-20240

OWNERSD03-May-202499 125

README.mdD03-May-202412.1 KiB297226

abs32_utils.ccD03-May-20246.9 KiB212158

abs32_utils.hD03-May-20245.2 KiB14380

abs32_utils_unittest.ccD03-May-202422 KiB544447

address_translator.ccD03-May-20249.6 KiB259159

address_translator.hD03-May-20247.8 KiB20098

address_translator_unittest.ccD03-May-202422.6 KiB587463

algorithm.hD03-May-20245.3 KiB14881

algorithm_unittest.ccD03-May-202415.7 KiB348287

arm_utils.ccD03-May-202419.7 KiB598451

arm_utils.hD03-May-202419.4 KiB424244

arm_utils_unittest.ccD03-May-202436.3 KiB863629

binary_data_histogram.ccD03-May-20242.8 KiB9259

binary_data_histogram.hD03-May-20243.2 KiB9142

binary_data_histogram_unittest.ccD03-May-20245.2 KiB13394

buffer_sink.ccD03-May-2024340 124

buffer_sink.hD03-May-20242.3 KiB6942

buffer_sink_unittest.ccD03-May-20242.1 KiB7246

buffer_source.ccD03-May-20243.3 KiB10673

buffer_source.hD03-May-20245.2 KiB14272

buffer_source_unittest.ccD03-May-202413.4 KiB348281

buffer_view.hD03-May-20246.7 KiB218133

buffer_view_unittest.ccD03-May-20249.8 KiB299227

buildflags.hD03-May-2024428 158

crc32.ccD03-May-20241.1 KiB4426

crc32.hD03-May-2024478 187

crc32_unittest.ccD03-May-20241.4 KiB4822

disassembler.ccD03-May-20241.4 KiB5330

disassembler.hD03-May-20245.9 KiB15572

disassembler_dex.ccD03-May-202479 KiB1,8911,431

disassembler_dex.hD03-May-202412.1 KiB296236

disassembler_dex_unittest.ccD03-May-20241.6 KiB5229

disassembler_elf.ccD03-May-202428.7 KiB759553

disassembler_elf.hD03-May-202411.9 KiB357229

disassembler_elf_unittest.ccD03-May-20246.5 KiB178133

disassembler_no_op.ccD03-May-2024814 3218

disassembler_no_op.hD03-May-20241.2 KiB4225

disassembler_win32.ccD03-May-202414.5 KiB411285

disassembler_win32.hD03-May-20244.5 KiB13585

disassembler_ztf.ccD03-May-202423.1 KiB654480

disassembler_ztf.hD03-May-20246.8 KiB20491

disassembler_ztf_unittest.ccD03-May-202416.5 KiB403309

element_detection.ccD03-May-20246.2 KiB201162

element_detection.hD03-May-20242.1 KiB6332

element_detection_unittest.ccD03-May-20243.4 KiB10369

encoded_view.ccD03-May-20242.7 KiB7949

encoded_view.hD03-May-20245.4 KiB186112

encoded_view_unittest.ccD03-May-20246.5 KiB203168

ensemble_matcher.ccD03-May-20241.2 KiB3819

ensemble_matcher.hD03-May-20242.2 KiB6125

equivalence_map.ccD03-May-202420.1 KiB550421

equivalence_map.hD03-May-20249.1 KiB209102

equivalence_map_unittest.ccD03-May-202427.6 KiB636478

heuristic_ensemble_matcher.ccD03-May-202413.2 KiB370273

heuristic_ensemble_matcher.hD03-May-20241.4 KiB4019

image_index.ccD03-May-20242.6 KiB7952

image_index.hD03-May-20243.7 KiB11765

image_index_unittest.ccD03-May-20243.9 KiB132102

image_utils.hD03-May-20247.8 KiB226140

image_utils_unittest.ccD03-May-20241.1 KiB3423

imposed_ensemble_matcher.ccD03-May-20244.9 KiB144107

imposed_ensemble_matcher.hD03-May-20242.9 KiB8448

imposed_ensemble_matcher_unittest.ccD03-May-20247.8 KiB215156

integration_test.ccD03-May-20243.7 KiB10471

io_utils.ccD03-May-20241.3 KiB5333

io_utils.hD03-May-20244.1 KiB14590

io_utils_unittest.ccD03-May-20245.3 KiB161129

main_utils.ccD03-May-20248.6 KiB260189

main_utils.hD03-May-20241.2 KiB3511

mapped_file.ccD03-May-20242 KiB7054

mapped_file.hD03-May-20242.5 KiB8352

mapped_file_unittest.ccD03-May-20241.9 KiB6248

patch_read_write_unittest.ccD03-May-202426.3 KiB794604

patch_reader.ccD03-May-202413.6 KiB402306

patch_reader.hD03-May-20249.3 KiB286160

patch_utils.hD03-May-20244.3 KiB14090

patch_utils_unittest.ccD03-May-20245.2 KiB170122

patch_writer.ccD03-May-202410.1 KiB299217

patch_writer.hD03-May-20249.4 KiB273147

reference_bytes_mixer.ccD03-May-20245.9 KiB151109

reference_bytes_mixer.hD03-May-20244.9 KiB11945

reference_set.ccD03-May-20241.9 KiB6143

reference_set.hD03-May-20242.3 KiB6534

reference_set_unittest.ccD03-May-20241.6 KiB5033

rel32_finder.ccD03-May-202410.7 KiB295219

rel32_finder.hD03-May-20249.7 KiB285139

rel32_finder_unittest.ccD03-May-202430.9 KiB744575

rel32_utils.ccD03-May-20242.2 KiB6846

rel32_utils.hD03-May-20246.8 KiB185133

rel32_utils_unittest.ccD03-May-202421.8 KiB542427

reloc_elf.ccD03-May-20246 KiB164116

reloc_elf.hD03-May-20243.4 KiB10367

reloc_elf_unittest.ccD03-May-20249 KiB243182

reloc_win32.ccD03-May-20246.8 KiB197151

reloc_win32.hD03-May-20245.1 KiB14172

reloc_win32_unittest.ccD03-May-202410.4 KiB252199

suffix_array.hD03-May-202420.1 KiB476257

suffix_array_unittest.ccD03-May-202410.5 KiB343264

target_pool.ccD03-May-20242.8 KiB8559

target_pool.hD03-May-20242.9 KiB8239

target_pool_unittest.ccD03-May-20242.2 KiB6549

targets_affinity.ccD03-May-20244.1 KiB10980

targets_affinity.hD03-May-20242.9 KiB7532

targets_affinity_unittest.ccD03-May-20245.9 KiB13297

test_disassembler.ccD03-May-20242 KiB6244

test_disassembler.hD03-May-20242.7 KiB7853

test_reference_reader.ccD03-May-2024586 2111

test_reference_reader.hD03-May-2024868 3318

test_utils.ccD03-May-2024651 2717

test_utils.hD03-May-20241 KiB3619

type_dex.hD03-May-20248.5 KiB333251

type_elf.hD03-May-20246.6 KiB284234

type_win_pe.hD03-May-20245.7 KiB192146

type_ztf.hD03-May-20241.4 KiB5531

typed_value.hD03-May-20241.7 KiB5832

typed_value_unittest.ccD03-May-20241.1 KiB4127

version_info.hD03-May-20241.2 KiB318

zucchini.hD03-May-20242.7 KiB7335

zucchini_apply.ccD03-May-20248.4 KiB218182

zucchini_apply.hD03-May-20241.8 KiB4221

zucchini_apply_unittest.ccD03-May-2024384 155

zucchini_commands.ccD03-May-20245.6 KiB147123

zucchini_commands.hD03-May-20241.6 KiB5524

zucchini_exe_version.rc.versionD03-May-20241.3 KiB4743

zucchini_gen.ccD03-May-202419.3 KiB462351

zucchini_gen.hD03-May-20243.8 KiB8647

zucchini_gen_unittest.ccD03-May-20247.4 KiB182138

zucchini_integration.ccD03-May-20248.8 KiB238196

zucchini_integration.hD03-May-20243.6 KiB7931

zucchini_main.ccD03-May-20241.7 KiB5643

zucchini_main_aosp.ccD03-May-20242.2 KiB7041

zucchini_tools.ccD03-May-20244.8 KiB141115

zucchini_tools.hD03-May-20241.7 KiB4620

README.md

1
2## Basic Definitions for Patching
3
4**Binary**: Executable image and data. Binaries may persist in an archive
5(e.g., chrome.7z), and need to be periodically updated. Formats for binaries
6include {PE files EXE / DLL, ELF, DEX}. Architectures binaries include
7{x86, x64, ARM, AArch64, Dalvik}. A binary is also referred to as an executable
8or an image file.
9
10**Patching**: Sending a "new" file to clients who have an "old" file by
11computing and transmitting a "patch" that can be used to transform "old" into
12"new". Patches are compressed for transmission. A key performance metric is
13patch size, which refers to the size of compressed patch file. For our
14experiments we use 7z.
15
16**Patch generation**: Computation of a "patch" from "old" and "new". This can be
17expensive (e.g., ~15-20 min for Chrome, using 1 GB of RAM), but since patch
18generation is a run-once step on the server-side when releasing "new" binaries,
19the expense is not too critical.
20
21**Patch application**: Transformation from "old" binaries to "new", using a
22(downloaded) "patch". This is executed on client side on updates, so resource
23constraints (e.g., time, RAM, disk space) is more stringent. Also, fault-
24tolerance is important. This is usually achieved by an update system by having
25a fallback method of directly downloading "new" in case of patching failure.
26
27**Offset**: Position relative to the start of a file.
28
29**Local offset**: An offset relative to the start of a region of a file.
30
31**Element**: A region in a file with associated executable type, represented by
32the tuple (exe_type, offset, length). Every Element in new file is associated
33with an Element in old file and patched independently.
34
35**Reference**: A directed connection between two offsets in a binary. For
36example, consider jump instructions in x86:
37
38    00401000: E9 3D 00 00 00     jmp         00401042
39
40Here, the 4 bytes `[3D 00 00 00]` starting at address `00401001` point to
41address `00401042` in memory. This forms a reference from `offset(00401001)`
42(length 4) to `offset(00401042)`, where `offset(addr)` indicates the disk
43offset corresponding to `addr`. A reference has a location, length (implicitly
44determined by reference type), body, and target.
45
46**Location**: The starting offset of bytes that store a reference. In the
47preceding example, `offset(00401001)` is a location. Each location is the
48beginning of a reference body.
49
50**Body**: The span of bytes that encodes reference data, i.e.,
51[location, location + length) =
52[location, location + 1, ..., location + length - 1].
53In the preceding example, `length = 4`, so the reference body is
54`[00401001, 00401001 + 4) = [00401001, 00401002, 00401003, 00401004]`.
55All reference bodies in an image must not overlap, and often regions boundaries
56are required to not straddle a reference body.
57
58**Target**: The offset that's the destination of a reference. In the preceding
59example, `offset(00401042)` is the target. Different references can share common
60targets. For example, in
61
62    00401000: E9 3D 00 00 00     jmp         00401042
63    00401005: EB 3B              jmp         00401042
64
65we have two references with different locations and bodies, but same target
66of `00401042`.
67
68Because the bytes that encode a reference depend on its target, and potentially
69on its location, they are more likely to get modified from an old version of a
70binary to a newer version. This is why "naive" patching does not do well on
71binaries.
72
73**Target Key**: An alternative representation of a Target for a fixed pool, as its
74index in the sorted list of Target offsets. Keys are useful since:
75  * Their numerical index are smaller than offsets, allowing more efficient
76  storage of target correction data in patch.
77  * They simplify association from Targets to Labels.
78
79**Disassembler**: Architecture specific data and operations, used to extract and
80correct references in a binary.
81
82**Type of reference**: The type of a reference determines the binary
83representation used to encode its target. This affects how references are parsed
84and written by a disassembler. There can be many types of references in the same
85binary.
86
87A reference is represented by the tuple (disassembler, location, target, type).
88This tuple contains sufficient information to write the reference in a binary.
89
90**Pool of targets**: Collection of targets that is assumed to have some semantic
91relationship. Each reference type belong to exactly one reference pool. Targets
92for references in the same pool are shared.
93
94For example, the following describes two pools defined for Dalvik Executable
95format (DEX). Both pools spawn multiple types of references.
96
971. Index in string table.
98  - From bytecode to string index using 16 bits.
99  - From bytecode to string index using 32 bits.
100  - From field item to string index using 32 bits.
1012. Address in code.
102  - Relative 16 bits pointer.
103  - Relative 32 bits pointer.
104
105Boundaries between different pools can be ambiguous. Having all targets belong
106to the same pool can reduce redundancy, but will use more memory and might
107cause larger corrections to happen, so this is a trade-off that can be resolved
108with benchmarks.
109
110**Abs32 references**: References whose targets are adjusted by the OS during
111program load. In an image, a **relocation table** typically provides locations
112of abs32 references. At each abs32 location, the stored bytes then encode
113semantic information about the target (e.g., as RVA).
114
115**Rel32 references**: References embedded within machine code, in which targets
116are encoded as some delta relative to the reference's location. Typical examples
117of rel32 references are branching instructions and instruction pointer-relative
118memory access.
119
120**Equivalence**: A (src_offset, dst_offset, length) tuple describing a region of
121"old" binary, at an offset of |src_offset|, that is similar to a region of "new"
122binary, at an offset of |dst_offset|.
123
124**Raw delta unit**: Describes a raw modification to apply on the new image, as a
125pair (copy_offset, diff), where copy_offset describes the position in new file
126as an offset in the data that was copied from the old file, and diff is the
127bytewise difference to apply.
128
129**Associated Targets**: A target in "old" binary is associated with a target in
130"new" binary if both targets:
1311. are part of similar regions from the same equivalence, and
1322. have the same local offset (relative to respective start regions), and
1333. are not part of any larger region from a different equivalence.
134Not all targets are necessarily associated with another target.
135
136**Target Affinity**: Level of confidence in the association between two targets.
137The affinity between targets that are potentially associated is measured based
138on surrounding content, as well as reference type.
139
140**Label**: An integer assigned for each Target in "old" and "new" binary as part
141of generating a patch, and used to alias targets when searching for similar
142regions that will form equivalences. Labels are assigned such that
143associated targets in old and new binaries share the same Label. Unmatched
144Targets have a Label of 0. For example, given
145  * "Old" targets = [0x1111, 0x3333, 0x5555, 0x7777],
146  * "New" targets = [0x2222, 0x4444, 0x6666, 0x8888],
147to represent matchings 0x1111 <=> 0x6666,  0x3333 <=> 0x2222, we'd assign
148  * Label 1 to 0x1111 (in "old") and 0x6666 (in "new"),
149  * Label 2 to 0x3333 (in "old") and 0x2222 (in "new").
150 Represented as arrays indexed over Target Keys, we'd have:
151  * "Old" labels = [1, 2, 0 ,0],
152  * "New" labels = [2, 0, 1, 0].
153
154**Encoded Image**: The result of projecting the content of an image to scalar
155values that describe content on a higher level of abstraction, masking away
156undesirable noise in raw content. Notably, the projection encodes references
157based on their associated label.
158
159## Interfaces
160
161zucchini_lib: Core Zucchini library that operate on buffers to generate and
162apply patches.
163
164zucchini_io: Wrapper on zucchini_lib that handles file I/O, using memory-mapped
165I/O to interface with zucchini_lib.
166
167zucchini: Stand-alone executable that parses command-line arguments, and passes
168the results to zucchini_io. Also implements various helper flows.
169
170## Zucchini Ensemble Patch Format
171
172### Types
173
174**int8**: 8-bit unsigned int.
175
176**uint32**: 32-bit unsigned int, little-endian.
177
178**int32**:  32-bit signed int, little-endian.
179
180**Varints**: This is a generic variable-length encoding for integer quantities
181that strips away leading (most-significant) null bytes.
182The Varints format is borrowed from protocol-buffers, see
183[documentation](https://developers.google.com/protocol-buffers/docs/encoding#varints)
184for more info.
185
186**varuint32**: A uint32 encoded using Varints format.
187
188**varint32**: A int32 encoded using Varints format.
189
190### File Layout
191
192Name | Format | Description
193--- | --- | ---
194header | PatchHeader | The header.
195elements_count | uint32 | Number of patch units.
196elements | PatchElement[elements_count] | List of all patch elements.
197
198Position of elements in new file is ascending.
199
200### Structures
201
202**PatchHeader**
203
204Name | Format | Description
205--- | --- | ---
206magic | uint32 = kMagic | Magic value.
207major_version | uint16 | Major version number indicating breaking changes.
208minor_version | uint16 | Minor version number indicating possibly breaking changes.
209old_size | uint32 | Size of old file in bytes.
210old_crc | uint32 | CRC32 of old file.
211new_size | uint32 | Size of new file in bytes.
212new_crc | uint32 | CRC32 of new file.
213
214**kMagic** == `'Z' | ('u' << 8) | ('c' << 16) | ('c' << 24)`
215
216**PatchElement**
217Contains all the information required to produce a single element in new file.
218
219Name | Format | Description
220--- | --- | ---
221header | PatchElementHeader | The header.
222equivalences | EquivalenceList | List of equivalences.
223raw_deltas | RawDeltaList | List of raw deltas.
224reference_deltas | ReferenceDeltaList | List of reference deltas.
225pool_count | uint32 | Number of pools.
226extra_targets | ExtraTargetList[pool_count] | Lists of extra targets.
227
228**PatchElementHeader**
229Describes a correspondence between an element in old and in new files. Some
230redundancy arise from storing |new_offset|, but it is necessary to make
231PatchElement self contained.
232
233Name | Format | Description
234--- | --- | ---
235old_offset | uint32 | Starting offset of the element in old file.
236old_length | uint32 | Length of the element in old file.
237new_offset | uint32 | Starting offset of the element in new file.
238new_length | uint32 | Length of the element in new file.
239exe_type | uint32 | Executable type for this unit, see `enum ExecutableType`.
240version | uint16 | Version specific to the executable type for this unit.
241
242**EquivalenceList**
243Encodes a list of equivalences, where dst offsets (in new image) are ascending.
244
245Name | Format | Description
246--- | --- | ---
247src_skip | Buffer<varint32> | Src offset for each equivalence, delta encoded.
248dst_skip | Buffer<varuint32> | Dst offset for each equivalence, delta encoded.
249copy_count | Buffer<varuint32> | Length for each equivalence.
250
251**RawDeltaList**
252Encodes a list of raw delta units, with ascending copy offsets.
253
254Name | Format | Description
255--- | --- | ---
256raw_delta_skip | Buffer<varuint32> | Copy offset for each delta unit, delta encoded and biased by -1.
257raw_delta_diff | Buffer<int8> | Bytewise difference for each delta unit.
258
259**ReferenceDeltaList**
260Encodes a list of reference deltas, in the order they appear in the new
261image file. A reference delta is a signed integer representing a jump through a
262list of targets.
263
264Name | Format | Description
265--- | --- | ---
266reference_delta | Buffer<varuint32> | Vector of reference deltas.
267
268**ExtraTargetList**
269Encodes a list of additional targets in the new image file, in ascending
270order.
271
272Name | Format | Description
273--- | --- | ---
274pool_tag | uint8_t | Unique identifier for this pool of targets.
275extra_targets | Buffer<varuint32> | Additional targets, delta encoded and biased by -1.
276
277**Buffer<T>**
278A generic vector of data.
279
280Name | Format | Description
281--- | --- | ---
282size |uint32 | Size of content in bytes.
283content |T[] | List of integers.
284
285# Format Changelog
286All breaking changes to zucchini patch format will be documented in this
287section.
288
289The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
290
291## [Unreleased]
292
293## [1.0] - 2021-10-27
294### Added
295Major/Minor version is encoded in PatchHeader
296Disassembler version associated with an element version is encoded in PatchElementHeader.
297