-----------------------------------------------------------------------------
Info about the relationship between Segments and SegInfos
-----------------------------------------------------------------------------

SegInfo is from the very original Valgrind code, and so it predates
Segments. It's poorly named now; it's really just a container for all
the object file metadata (symbols, debug info, etc).

Segments describe memory mapped into the address space, and so any
address-space changing operation needs to update the Segment structure.
After the process is initialized, this means one of:

  * mmap
  * munmap
  * mprotect
  * brk
  * stack growth

A piece of address space may or may not be mmaped from a file.

A SegInfo specifically describes memory mmaped from an ELF object file.
Because a single ELF file may be mmaped with multiple Segments, multiple
Segments can point to one SegInfo. A SegInfo can relate to a memory
range which is not yet mmaped. For example, if the process mmaps the
first page of an ELF file (the one containing the header), a SegInfo
will be created for that ELF file's mappings, which will include memory
which will be later mmaped by the client's ELF loader. If a new mmap
appears in the address range of an existing SegInfo, it will have that
SegInfo attached to it, presumably because it's part of a .so file.
Similarly, if a Segment gets split (by mprotect, for example), the two
pieces will still be associated with the same SegInfo. For this reason,
the address/length info in a SegInfo is not a duplicate of the Segment
address/length.

This is complex for several reasons:

 1. We assume that if a process is mmaping a file which contains an
    ELF header, it intends to use it as an ELF object. If a program
    just mmaps an ELF file and uses it as raw data (to copy it, for
    example), we still treat it as a shared-library opening.
 2.
    Even if it is being loaded as a shared library/other ELF object,
    Valgrind doesn't control the mmaps. It just observes the mmaps
    being generated by the client and has to cope. One of the reasons
    that Valgrind has to make its own mmap of each .so for reading
    symtab information is that the client won't necessarily mmap the
    right pieces, or may do so in the wrong order for us.

SegInfos are reference counted, and freed when no Segments point to them any
more.

> Aha. So the range of a SegInfo will always be equal to or greater
> than the range of its parent Segment? Or can you eg. mmap a whole
> file plus some extra pages, and then the SegInfo won't cover the extra
> part of the range?

That would be unusual, but possible. You could imagine ld generating an
ELF file via a mapping this way (which would probably upset Valgrind no
end).

-----------------------------------------------------------------------------
More from John Reiser
-----------------------------------------------------------------------------
> Can a Segment get split (eg. by mprotect)?

This happens when a debugger inserts a breakpoint, or when ld-linux
relocates a module that has DT_TEXTREL, or when a co-resident monitor
rewrites some instructions. On x86, a shared lib with relocations to
.text "works" just fine. The modified pages are no longer sharable,
but the instruction stream is functional. It's even rather common,
when a builder forgets to use -fpic for one or more files. It
can be done on purpose when the modularity is more important than
the page sharing. Non-pic code is faster, too: register %ebx is
not dedicated to _GLOBAL_OFFSET_TABLE_ addressing, and global variables
can be accessed by [relocated] inline 32-bit offset rather than by
address fetched from the GOT.

> Can a new mmap appear in the address range of an existing SegInfo?

On x86_64 the static linker ld inserts a 1MB "hole" between .text
and .data.
This is on advice from the hardware performance mavens,
because various caching+prefetching hardware can look ahead that far.
Currently ld-linux leaves this as PROT_NONE, but anybody else is
free to override that assignment.

> From peering at various /proc/*/maps files, the following scheme
> sounds plausible:
>
> Load symbols following an mmap if:
>
>    map is to a file
>    map has r-x permissions
>    file has a valid ELF header
>    possibly: mapping is > 1 page (catches the case of mapping first
>    page just to examine the header)
>
> If the client wants to subsequently chop up the mapping, or change its
> permissions, we ignore that. I have never seen any evidence in
> /proc/*/maps that ld.so does such things.

glibc-2.3.5 ld-linux does. It finds the minimum interval of pages which
covers the p_memsz of all PT_LOAD, mmap()s that much from the file [even if
this maps beyond EOF of the file], then munmap()s [or mprotect(,,PROT_NONE)]
everything that is not covered by the first PT_LOAD, then
mmap(,,,MAP_FIXED,,) each remaining PT_LOAD. This is done to overcome the
possibility that a kernel which randomizes the placement of mmap(0, ...)
might place the first PT_LOAD so that subsequent PT_LOAD [must maintain
relative addressing to other PT_LOAD from the same file] would evict
something else. Needless to say, ld-linux assumes that it is the only actor
(well, dlopen() does try for mutual exclusion) and that any "holes" between
PT_LOAD from the same module are ignorable as far as allocation is
concerned. Also, there is nothing to stop a file from having PT_LOAD that
overlap, or appear in non-ascending order, etc. The results might depend on
order of processing, but always it has been by order of appearance in the
file. [Probably this is a good way to trigger "bugs" in ld-linux and/or the
kernel.]
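
The mapping sequence described above can be sketched as a small planning
function. This is only an illustration under stated assumptions, not glibc's
code: the names reservation_span and load_plan, the 4096-byte page size, and
the (p_vaddr, p_memsz) tuple layout are all made up for the example, and the
addresses are p_vaddr values relative to the load base.

```python
PAGE_SIZE = 4096  # assumption: real code should query the system page size


def page_floor(addr, page=PAGE_SIZE):
    """Round addr down to a page boundary."""
    return addr & ~(page - 1)


def page_ceil(addr, page=PAGE_SIZE):
    """Round addr up to a page boundary."""
    return (addr + page - 1) & ~(page - 1)


def reservation_span(pt_loads, page=PAGE_SIZE):
    """Minimum page-aligned interval covering p_memsz of every PT_LOAD.

    pt_loads is a list of (p_vaddr, p_memsz) in order of appearance
    in the file (hypothetical layout chosen for this sketch).
    """
    lo = min(page_floor(vaddr, page) for vaddr, _ in pt_loads)
    hi = max(page_ceil(vaddr + memsz, page) for vaddr, memsz in pt_loads)
    return lo, hi


def load_plan(pt_loads, page=PAGE_SIZE):
    """List the mapping operations as (operation, start, end) tuples."""
    lo, hi = reservation_span(pt_loads, page)
    first_vaddr, first_memsz = pt_loads[0]
    first_start = page_floor(first_vaddr, page)
    first_end = page_ceil(first_vaddr + first_memsz, page)

    # Step 1: one mmap of the whole span, which fixes the load base.
    plan = [("mmap", lo, hi)]

    # Step 2: give back everything not covered by the first PT_LOAD.
    if lo < first_start:
        plan.append(("unmap_or_prot_none", lo, first_start))
    if first_end < hi:
        plan.append(("unmap_or_prot_none", first_end, hi))

    # Step 3: map each remaining PT_LOAD at its fixed relative address.
    for vaddr, memsz in pt_loads[1:]:
        plan.append(("mmap_fixed", page_floor(vaddr, page),
                     page_ceil(vaddr + memsz, page)))
    return plan
```

Calling load_plan([(0x0, 0x1a34), (0x201d50, 0x2b8)]) yields one mmap of the
full span followed by an unmap of the tail and a fixed mapping for the second
PT_LOAD. An observer such as Valgrind sees exactly this sequence of events and
has to infer from it that they all belong to one object.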

Some algorithms and data structures internal to glibc-2.3.5 assume that
modules do not overlap. In particular, ld-linux sometimes searches
for __builtin_return_address(0) in a set of intervals in order to determine
which shared lib called ld-linux. This matters for dlsym(), dlmopen(),
etc., and assumes that the intervals are a disjoint cover of any
"legal" callers. ld-linux tries to hide all of this from the prying
eyes of anyone else [the internal version of struct link_map contains
much more than specified in <link.h>]. Some of this is good because
it changes very frequently, but some parts are bad because in the past
ld-linux has been slow to provide needed services [such as
dl_iterate_phdr()] and even antagonistic towards anybody else
trying for peaceful co-existence without the blessing of ld-linux.
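
To make the "disjoint cover" assumption concrete, here is a minimal sketch of
looking up a return address in a set of module intervals. It is not how
ld-linux is actually implemented; the ModuleMap class, its find_caller method,
and the half-open (start, end, name) tuples are hypothetical names chosen for
the illustration.

```python
import bisect


class ModuleMap:
    """Maps an address to the module whose interval contains it.

    Intervals are half-open [start, end) and must not overlap, which
    mirrors the assumption described in the text.
    """

    def __init__(self, modules):
        # modules: iterable of (start, end, name) tuples.
        self._mods = sorted(modules)
        for (_, end0, name0), (start1, _, name1) in zip(self._mods,
                                                        self._mods[1:]):
            if start1 < end0:
                raise ValueError(f"{name0} overlaps {name1}")
        self._starts = [start for start, _, _ in self._mods]

    def find_caller(self, addr):
        """Return the name of the module containing addr, or None."""
        # Find the last interval whose start is <= addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            _, end, name = self._mods[i]
            if addr < end:
                return name
        return None
```

With disjoint intervals the binary search gives a single unambiguous answer.
If two modules overlapped, the result would depend on which interval sorted
first, which is why the sketch rejects overlapping input outright.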