//! Implementation of C++11-consistent weak memory emulation using store buffers
//! based on Dynamic Race Detection for C++ ("the paper"):
//! <https://www.doc.ic.ac.uk/~afd/homepages/papers/pdfs/2017/POPL.pdf>
//!
//! This implementation will never generate weak memory behaviours forbidden by the C++11 model,
//! but it is incapable of producing all possible weak behaviours allowed by the model. There are
//! certain weak behaviours observable on real hardware but not while using this emulation.
//!
//! Note that this implementation does not fully take into account C++20's memory model revision to SC accesses
//! and fences introduced by P0668 (<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0668r5.html>).
//! This implementation is not fully correct under the revised C++20 model and may generate behaviours C++20
//! disallows (<https://github.com/rust-lang/miri/issues/2301>).
//!
//! A modification is made to the paper's model to partially address C++20 changes.
//! Specifically, if an SC load reads from an atomic store of any ordering, then a later SC load cannot read from
//! an earlier store in the location's modification order. This is to prevent creating a backwards S edge from the second
//! load to the first, as a result of C++20's coherence-ordered before rules.
//!
//! Rust follows the C++20 memory model (except for the Consume ordering and some operations not performable through C++'s
//! `std::atomic<T>` API). It is therefore possible for this implementation to generate behaviours never observable when the
//! same program is compiled and run natively. Unfortunately, no literature exists at the time of writing which proposes
//! an implementable and C++20-compatible relaxed memory model that supports all atomic operations existing in Rust. The closest one is
//! A Promising Semantics for Relaxed-Memory Concurrency by Jeehoon Kang et al. (<https://www.cs.tau.ac.il/~orilahav/papers/popl17.pdf>).
//! However, this model lacks SC accesses and is therefore unusable by Miri (SC accesses are everywhere in library code).
//!
//! If you find anything that proposes a relaxed memory model that is C++20-consistent, supports all orderings Rust's atomic accesses
//! and fences accept, and is implementable (with operational semantics), please open a GitHub issue!
//!
//! One characteristic of this implementation, in contrast to some other notable operational models such as the ones proposed in
//! Taming Release-Acquire Consistency by Ori Lahav et al. (<https://plv.mpi-sws.org/sra/paper.pdf>) or the Promising Semantics noted above,
//! is that this implementation does not require each thread to hold an isolated view of the entire memory. Here, store buffers are per-location
//! and shared across all threads. This is more memory efficient but does require store elements (representing writes to a location) to record
//! information about reads, whereas in the other two models it is the other way round: reads point to the write they got their value from.
//! Additionally, writes in our implementation do not have globally unique timestamps attached. In the other two models this timestamp is
//! used to make sure a value in a thread's view is not overwritten by a write that occurred earlier than the one in the existing view.
//! In our implementation, this is detected using read information attached to store elements, as there is no data structure representing reads.
//!
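//! For illustration (this example is not part of the original documentation), the classic
//! store-buffering litmus test below is one whose weak outcome this emulation can produce:
//! both loads may return 0, because each load is allowed to pick an older store from the
//! location's store buffer even though the other thread's store is already present.
//!
//! ```rust
//! use std::sync::atomic::{AtomicU32, Ordering::Relaxed};
//! use std::thread;
//!
//! static X: AtomicU32 = AtomicU32::new(0);
//! static Y: AtomicU32 = AtomicU32::new(0);
//!
//! let t1 = thread::spawn(|| { X.store(1, Relaxed); Y.load(Relaxed) });
//! let t2 = thread::spawn(|| { Y.store(1, Relaxed); X.load(Relaxed) });
//! // Both loads returning 0 is a weak outcome allowed by the model (and observable
//! // on real hardware); each load may read the initial store from the buffer.
//! let _outcome = (t1.join().unwrap(), t2.join().unwrap());
//! ```
//!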
//! The C++ memory model is built around the notion of an 'atomic object', so it would be natural
//! to attach store buffers to atomic objects. However, Rust follows LLVM in that it only has
//! 'atomic accesses'. Therefore Miri cannot know when and where atomic 'objects' are being
//! created or destroyed, to manage its store buffers. Instead, we lazily create an
//! atomic object on the first atomic access to a given region, and we destroy that object
//! on the next non-atomic or imperfectly overlapping atomic access to that region.
//! These lazy (de)allocations happen in memory_accessed() on non-atomic accesses, and in
//! get_or_create_store_buffer() on atomic accesses. This mostly works well, but it does
//! lead to some issues (<https://github.com/rust-lang/miri/issues/2164>).
//!
//! One consequence of this difference is that safe/sound Rust allows for more operations on atomic locations
//! than the C++20 atomic API was intended to allow, such as non-atomically accessing
//! a previously atomically accessed location, or accessing previously atomically accessed locations with a differently sized operation
//! (such as accessing the top 16 bits of an AtomicU32). These scenarios are generally undiscussed in formalisations of the C++ memory model.
//! In Rust, these operations can only be done through a `&mut AtomicFoo` reference or one derived from it, so they
//! can only happen after all previous accesses to the same location. This implementation is adapted to allow these operations.
//! A mixed-atomicity read that races with writes, or a write that races with reads or writes, will still cause UB to be thrown.
//! Mixed-size atomic accesses must not race with any other atomic access, whether read or write, or UB will be thrown.
//! You can refer to the test cases in weak_memory/extra_cpp.rs and weak_memory/extra_cpp_unsafe.rs for examples of these operations.
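//!
//! As a sketch of the kind of code this enables (simplified; the real test cases live in the
//! files mentioned above), a store buffer is created lazily on the first atomic access and
//! destroyed again by a later non-atomic access through `&mut`:
//!
//! ```rust
//! use std::sync::atomic::{AtomicU32, Ordering::Relaxed};
//!
//! let mut a = AtomicU32::new(0);
//! a.store(1, Relaxed); // first atomic access: a store buffer is created for this location
//! *a.get_mut() = 2; // non-atomic access through `&mut`: the store buffer is removed again
//! a.store(3, Relaxed); // atomic access again: a fresh store buffer is created
//! ```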

// Our implementation and the author's own implementation (tsan11) of the paper have some deviations from the provided operational semantics in §5.3:
// 1. In the operational semantics, store elements keep a copy of the atomic object's vector clock (AtomicCellClocks::sync_vector in miri),
// but this is not used anywhere so it's omitted here.
//
// 2. In the operational semantics, each store element keeps the timestamp of a thread when it loads from the store.
// If the same thread loads from the same store element multiple times, then the timestamps at all loads are saved in a list of load elements.
// This is not necessary as later loads by the same thread will always have greater timestamp values, so we only need to record the timestamp of the first
// load by each thread. This optimisation is done in tsan11
// (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.h#L35-L37)
// and here.
//
// 3. §4.5 of the paper wants an SC store to mark all existing stores in the buffer that happen before it
// as SC. This is not done in the operational semantics but implemented correctly in tsan11
// (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.cc#L160-L167)
// and here.
//
// 4. The W_SC ; R_SC case requires the SC load to ignore all but the last store marked SC (stores not marked SC are not
// affected). However, in the paper this rule is applied to all loads in ReadsFromSet (the last two lines of code), not just SC loads.
// This is implemented correctly in tsan11
// (https://github.com/ChrisLidbury/tsan11/blob/ecbd6b81e9b9454e01cba78eb9d88684168132c7/lib/tsan/rtl/tsan_relaxed.cc#L295)
// and here.

use std::{
    cell::{Ref, RefCell},
    collections::VecDeque,
};

use rustc_const_eval::interpret::{alloc_range, AllocRange, InterpResult, MPlaceTy, Scalar};
use rustc_data_structures::fx::FxHashMap;

use crate::*;

use super::{
    data_race::{GlobalState as DataRaceState, ThreadClockSet},
    range_object_map::{AccessType, RangeObjectMap},
    vector_clock::{VClock, VTimestamp, VectorIdx},
};

pub type AllocState = StoreBufferAlloc;

// Each store buffer must be bounded, otherwise it will grow indefinitely.
// However, bounding the store buffer means restricting the amount of weak
// behaviours observable. The author picked 128 as a good tradeoff
// so we follow them here.
const STORE_BUFFER_LIMIT: usize = 128;

#[derive(Debug, Clone)]
pub struct StoreBufferAlloc {
    /// Store buffer of each atomic object in this allocation
    // Behind a RefCell because we need to allocate/remove on read access
    store_buffers: RefCell<RangeObjectMap<StoreBuffer>>,
}

impl VisitTags for StoreBufferAlloc {
    fn visit_tags(&self, visit: &mut dyn FnMut(BorTag)) {
        let Self { store_buffers } = self;
        for val in store_buffers
            .borrow()
            .iter()
            .flat_map(|buf| buf.buffer.iter().map(|element| &element.val))
        {
            val.visit_tags(visit);
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub(super) struct StoreBuffer {
    // Stores to this location in modification order
    buffer: VecDeque<StoreElement>,
}

/// Whether a load returned the latest value or not.
#[derive(PartialEq, Eq)]
enum LoadRecency {
    Latest,
    Outdated,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct StoreElement {
    /// The identifier of the vector index, corresponding to a thread
    /// that performed the store.
    store_index: VectorIdx,

    /// Whether this store is SC.
    is_seqcst: bool,

    /// The timestamp of the storing thread when it performed the store
    timestamp: VTimestamp,
    /// The value of this store
    // FIXME: this means the store must be fully initialized;
    // we will have to change this if we want to support atomics on
    // (partially) uninitialized data.
    val: Scalar<Provenance>,

    /// Metadata about loads from this store element,
    /// behind a RefCell so that the load operation can take `&self`
    load_info: RefCell<LoadInfo>,
}

#[derive(Debug, Clone, PartialEq, Eq, Default)]
struct LoadInfo {
    /// Timestamp of first loads from this store element by each thread
    timestamps: FxHashMap<VectorIdx, VTimestamp>,
    /// Whether this store element has been read by an SC load
    sc_loaded: bool,
}

impl StoreBufferAlloc {
    pub fn new_allocation() -> Self {
        Self { store_buffers: RefCell::new(RangeObjectMap::new()) }
    }

    /// Checks if the range imperfectly overlaps with existing buffers.
    /// Used to determine if a mixed-size atomic access is happening.
    fn is_overlapping(&self, range: AllocRange) -> bool {
        let buffers = self.store_buffers.borrow();
        let access_type = buffers.access_type(range);
        matches!(access_type, AccessType::ImperfectlyOverlapping(_))
    }

    /// When a non-atomic access happens on a location that has been atomically accessed
    /// before without a data race, we can determine that the non-atomic access fully happens
    /// after all the prior atomic accesses, so the location no longer needs to exhibit
    /// any weak memory behaviours until further atomic accesses.
    pub fn memory_accessed(&self, range: AllocRange, global: &DataRaceState) {
        if !global.ongoing_action_data_race_free() {
            let mut buffers = self.store_buffers.borrow_mut();
            let access_type = buffers.access_type(range);
            match access_type {
                AccessType::PerfectlyOverlapping(pos) => {
                    buffers.remove_from_pos(pos);
                }
                AccessType::ImperfectlyOverlapping(pos_range) => {
                    buffers.remove_pos_range(pos_range);
                }
                AccessType::Empty(_) => {
                    // The range had no weak behaviours attached, do nothing
                }
            }
        }
    }

    /// Gets a store buffer associated with an atomic object in this allocation,
    /// or creates one with the specified initial value if no atomic object exists yet.
    fn get_or_create_store_buffer<'tcx>(
        &self,
        range: AllocRange,
        init: Scalar<Provenance>,
    ) -> InterpResult<'tcx, Ref<'_, StoreBuffer>> {
        let access_type = self.store_buffers.borrow().access_type(range);
        let pos = match access_type {
            AccessType::PerfectlyOverlapping(pos) => pos,
            AccessType::Empty(pos) => {
                let mut buffers = self.store_buffers.borrow_mut();
                buffers.insert_at_pos(pos, range, StoreBuffer::new(init));
                pos
            }
            AccessType::ImperfectlyOverlapping(pos_range) => {
                // Once we reach here we would've already checked that this access is not racy
                let mut buffers = self.store_buffers.borrow_mut();
                buffers.remove_pos_range(pos_range.clone());
                buffers.insert_at_pos(pos_range.start, range, StoreBuffer::new(init));
                pos_range.start
            }
        };
        Ok(Ref::map(self.store_buffers.borrow(), |buffer| &buffer[pos]))
    }

    /// Gets a mutable store buffer associated with an atomic object in this allocation
    fn get_or_create_store_buffer_mut<'tcx>(
        &mut self,
        range: AllocRange,
        init: Scalar<Provenance>,
    ) -> InterpResult<'tcx, &mut StoreBuffer> {
        let buffers = self.store_buffers.get_mut();
        let access_type = buffers.access_type(range);
        let pos = match access_type {
            AccessType::PerfectlyOverlapping(pos) => pos,
            AccessType::Empty(pos) => {
                buffers.insert_at_pos(pos, range, StoreBuffer::new(init));
                pos
            }
            AccessType::ImperfectlyOverlapping(pos_range) => {
                buffers.remove_pos_range(pos_range.clone());
                buffers.insert_at_pos(pos_range.start, range, StoreBuffer::new(init));
                pos_range.start
            }
        };
        Ok(&mut buffers[pos])
    }
}

impl<'mir, 'tcx: 'mir> StoreBuffer {
    fn new(init: Scalar<Provenance>) -> Self {
        let mut buffer = VecDeque::new();
        buffer.reserve(STORE_BUFFER_LIMIT);
        let mut ret = Self { buffer };
        let store_elem = StoreElement {
            // The thread index and timestamp of the initialisation write
            // are never meaningfully used, so it's fine to leave them as 0
            store_index: VectorIdx::from(0),
            timestamp: VTimestamp::ZERO,
            val: init,
            is_seqcst: false,
            load_info: RefCell::new(LoadInfo::default()),
        };
        ret.buffer.push_back(store_elem);
        ret
    }

    /// Reads from the last store in modification order
    fn read_from_last_store(
        &self,
        global: &DataRaceState,
        thread_mgr: &ThreadManager<'_, '_>,
        is_seqcst: bool,
    ) {
        let store_elem = self.buffer.back();
        if let Some(store_elem) = store_elem {
            let (index, clocks) = global.current_thread_state(thread_mgr);
            store_elem.load_impl(index, &clocks, is_seqcst);
        }
    }

    fn buffered_read(
        &self,
        global: &DataRaceState,
        thread_mgr: &ThreadManager<'_, '_>,
        is_seqcst: bool,
        rng: &mut (impl rand::Rng + ?Sized),
        validate: impl FnOnce() -> InterpResult<'tcx>,
    ) -> InterpResult<'tcx, (Scalar<Provenance>, LoadRecency)> {
        // Having a live borrow to store_buffer while calling validate_atomic_load is fine
        // because the race detector doesn't touch store_buffer.

        let (store_elem, recency) = {
            // The `clocks` we got here must be dropped before calling validate_atomic_load
            // as the race detector will update it.
            let (.., clocks) = global.current_thread_state(thread_mgr);
            // Load from a valid entry in the store buffer.
            self.fetch_store(is_seqcst, &clocks, &mut *rng)
        };

        // Unlike in buffered_atomic_write, thread clock updates have to be done
        // after we've picked a store element from the store buffer, as presented
        // in the ATOMIC LOAD rule of the paper. This is because fetch_store
        // requires access to ThreadClockSet.clock, which is updated by the race detector.
        validate()?;

        let (index, clocks) = global.current_thread_state(thread_mgr);
        let loaded = store_elem.load_impl(index, &clocks, is_seqcst);
        Ok((loaded, recency))
    }

    fn buffered_write(
        &mut self,
        val: Scalar<Provenance>,
        global: &DataRaceState,
        thread_mgr: &ThreadManager<'_, '_>,
        is_seqcst: bool,
    ) -> InterpResult<'tcx> {
        let (index, clocks) = global.current_thread_state(thread_mgr);

        self.store_impl(val, index, &clocks.clock, is_seqcst);
        Ok(())
    }

    #[allow(clippy::if_same_then_else, clippy::needless_bool)]
    /// Selects a valid store element in the buffer.
    fn fetch_store<R: rand::Rng + ?Sized>(
        &self,
        is_seqcst: bool,
        clocks: &ThreadClockSet,
        rng: &mut R,
    ) -> (&StoreElement, LoadRecency) {
        use rand::seq::IteratorRandom;
        let mut found_sc = false;
        // FIXME: we want an inclusive take_while (stops after a false predicate, but
        // includes the element that gave the false), but such a function doesn't yet
        // exist in the standard library (https://github.com/rust-lang/rust/issues/62208),
        // so we have to hack around it with keep_searching.
        let mut keep_searching = true;
        let candidates = self
            .buffer
            .iter()
            .rev()
            .take_while(move |&store_elem| {
                if !keep_searching {
                    return false;
                }

                keep_searching = if store_elem.timestamp <= clocks.clock[store_elem.store_index] {
                    // CoWR: if a store happens-before the current load,
                    // then we can't read-from anything earlier in modification order.
                    // C++20 §6.9.2.2 [intro.races] paragraph 18
                    false
                } else if store_elem.load_info.borrow().timestamps.iter().any(
                    |(&load_index, &load_timestamp)| load_timestamp <= clocks.clock[load_index],
                ) {
                    // CoRR: if there was a load from this store which happened-before the current load,
                    // then we cannot read-from anything earlier in modification order.
                    // C++20 §6.9.2.2 [intro.races] paragraph 16
                    false
                } else if store_elem.timestamp <= clocks.fence_seqcst[store_elem.store_index] {
                    // The current load, which may be sequenced-after an SC fence, cannot read-before
                    // the last store sequenced-before an SC fence in another thread.
                    // C++17 §32.4 [atomics.order] paragraph 6
                    false
                } else if store_elem.timestamp <= clocks.write_seqcst[store_elem.store_index]
                    && store_elem.is_seqcst
                {
                    // The current non-SC load, which may be sequenced-after an SC fence,
                    // cannot read-before the last SC store executed before the fence.
                    // C++17 §32.4 [atomics.order] paragraph 4
                    false
                } else if is_seqcst
                    && store_elem.timestamp <= clocks.read_seqcst[store_elem.store_index]
                {
                    // The current SC load cannot read-before the last store sequenced-before
                    // the last SC fence.
                    // C++17 §32.4 [atomics.order] paragraph 5
                    false
                } else if is_seqcst && store_elem.load_info.borrow().sc_loaded {
                    // The current SC load cannot read-before a store that an earlier SC load has observed.
                    // See https://github.com/rust-lang/miri/issues/2301#issuecomment-1222720427
                    // Consequences of C++20 §31.4 [atomics.order] paragraphs 3.1, 3.3 (coherence-ordered before)
                    // and 4.1 (coherence-ordered before between SC makes the global total order S)
                    false
                } else {
                    true
                };

                true
            })
            .filter(|&store_elem| {
                if is_seqcst && store_elem.is_seqcst {
                    // An SC load needs to ignore all but the last store marked SC
                    // (stores not marked SC are not affected).
                    let include = !found_sc;
                    found_sc = true;
                    include
                } else {
                    true
                }
            });

        let chosen = candidates.choose(rng).expect("store buffer cannot be empty");
        if std::ptr::eq(chosen, self.buffer.back().expect("store buffer cannot be empty")) {
            (chosen, LoadRecency::Latest)
        } else {
            (chosen, LoadRecency::Outdated)
        }
    }

    /// ATOMIC STORE IMPL in the paper (except we don't need the location's vector clock)
    fn store_impl(
        &mut self,
        val: Scalar<Provenance>,
        index: VectorIdx,
        thread_clock: &VClock,
        is_seqcst: bool,
    ) {
        let store_elem = StoreElement {
            store_index: index,
            timestamp: thread_clock[index],
            // In the language provided in the paper, an atomic store takes the value from a
            // non-atomic memory location.
            // But we already have the immediate value here so we don't need to do the memory
            // access.
            val,
            is_seqcst,
            load_info: RefCell::new(LoadInfo::default()),
        };
        self.buffer.push_back(store_elem);
        if self.buffer.len() > STORE_BUFFER_LIMIT {
            self.buffer.pop_front();
        }
        if is_seqcst {
            // Every store that happens before this one needs to be marked as SC,
            // so that in a later SC load, only the last SC store (i.e. this one) or stores that
            // aren't ordered by happens-before with the last SC store are picked.
            self.buffer.iter_mut().rev().for_each(|elem| {
                if elem.timestamp <= thread_clock[elem.store_index] {
                    elem.is_seqcst = true;
                }
            })
        }
    }
}
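
// Informal illustration (not from the paper) of the interplay between the SC marking in
// `store_impl` above and the SC filter in `fetch_store`: suppose the buffer for a location
// holds [w1 (relaxed), w2 (relaxed), w3 (SC)], where w1 happens-before w3 but w2 (from an
// unsynchronised thread) does not. The SC store w3 marks w1 as SC too. A later SC load then
// keeps only the last SC-marked store (w3) among the SC-marked ones, while the non-SC w2
// remains a candidate. So that load may still observe the unrelated w2, but never the stale
// w1, which is the W_SC ; R_SC rule from item 4 at the top of this file.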

impl StoreElement {
    /// ATOMIC LOAD IMPL in the paper.
    /// Unlike the operational semantics in the paper, we don't need to keep track
    /// of the thread timestamp for every single load. Keeping track of the first (smallest)
    /// timestamp of each thread that has loaded from a store is sufficient: if the earliest
    /// load of another thread happens before the current one, then we must stop searching the store
    /// buffer regardless of subsequent loads by the same thread; if the earliest load of another
    /// thread doesn't happen before the current one, then no subsequent load by the other thread
    /// can happen before the current one.
    fn load_impl(
        &self,
        index: VectorIdx,
        clocks: &ThreadClockSet,
        is_seqcst: bool,
    ) -> Scalar<Provenance> {
        let mut load_info = self.load_info.borrow_mut();
        load_info.sc_loaded |= is_seqcst;
        let _ = load_info.timestamps.try_insert(index, clocks.clock[index]);
        self.val
    }
}

impl<'mir, 'tcx: 'mir> EvalContextExt<'mir, 'tcx> for crate::MiriInterpCx<'mir, 'tcx> {}
pub(super) trait EvalContextExt<'mir, 'tcx: 'mir>:
    crate::MiriInterpCxExt<'mir, 'tcx>
{
    // If weak memory emulation is enabled, check if this atomic op imperfectly overlaps with a previous
    // atomic read or write. If it does, then we require it to be ordered (non-racy) with all previous atomic
    // accesses on all the bytes in range.
    fn validate_overlapping_atomic(
        &self,
        place: &MPlaceTy<'tcx, Provenance>,
    ) -> InterpResult<'tcx> {
        let this = self.eval_context_ref();
        let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr)?;
        if let crate::AllocExtra {
            weak_memory: Some(alloc_buffers),
            data_race: Some(alloc_clocks),
            ..
        } = this.get_alloc_extra(alloc_id)?
        {
            let range = alloc_range(base_offset, place.layout.size);
            if alloc_buffers.is_overlapping(range)
                && !alloc_clocks.race_free_with_atomic(
                    range,
                    this.machine.data_race.as_ref().unwrap(),
                    &this.machine.threads,
                )
            {
                throw_unsup_format!(
                    "racy imperfectly overlapping atomic access is not possible in the C++20 memory model, and not supported by Miri's weak memory emulation"
                );
            }
        }
        Ok(())
    }

    fn buffered_atomic_rmw(
        &mut self,
        new_val: Scalar<Provenance>,
        place: &MPlaceTy<'tcx, Provenance>,
        atomic: AtomicRwOrd,
        init: Scalar<Provenance>,
    ) -> InterpResult<'tcx> {
        let this = self.eval_context_mut();
        let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr)?;
        if let (
            crate::AllocExtra { weak_memory: Some(alloc_buffers), .. },
            crate::MiriMachine { data_race: Some(global), threads, .. },
        ) = this.get_alloc_extra_mut(alloc_id)?
        {
            if atomic == AtomicRwOrd::SeqCst {
                global.sc_read(threads);
                global.sc_write(threads);
            }
            let range = alloc_range(base_offset, place.layout.size);
            let buffer = alloc_buffers.get_or_create_store_buffer_mut(range, init)?;
            buffer.read_from_last_store(global, threads, atomic == AtomicRwOrd::SeqCst);
            buffer.buffered_write(new_val, global, threads, atomic == AtomicRwOrd::SeqCst)?;
        }
        Ok(())
    }

    fn buffered_atomic_read(
        &self,
        place: &MPlaceTy<'tcx, Provenance>,
        atomic: AtomicReadOrd,
        latest_in_mo: Scalar<Provenance>,
        validate: impl FnOnce() -> InterpResult<'tcx>,
    ) -> InterpResult<'tcx, Scalar<Provenance>> {
        let this = self.eval_context_ref();
        if let Some(global) = &this.machine.data_race {
            let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr)?;
            if let Some(alloc_buffers) = this.get_alloc_extra(alloc_id)?.weak_memory.as_ref() {
                if atomic == AtomicReadOrd::SeqCst {
                    global.sc_read(&this.machine.threads);
                }
                let mut rng = this.machine.rng.borrow_mut();
                let buffer = alloc_buffers.get_or_create_store_buffer(
                    alloc_range(base_offset, place.layout.size),
                    latest_in_mo,
                )?;
                let (loaded, recency) = buffer.buffered_read(
                    global,
                    &this.machine.threads,
                    atomic == AtomicReadOrd::SeqCst,
                    &mut *rng,
                    validate,
                )?;
                if global.track_outdated_loads && recency == LoadRecency::Outdated {
                    this.emit_diagnostic(NonHaltingDiagnostic::WeakMemoryOutdatedLoad);
                }

                return Ok(loaded);
            }
        }

        // Race detector or weak memory disabled, simply read the latest value
        validate()?;
        Ok(latest_in_mo)
    }

    fn buffered_atomic_write(
        &mut self,
        val: Scalar<Provenance>,
        dest: &MPlaceTy<'tcx, Provenance>,
        atomic: AtomicWriteOrd,
        init: Scalar<Provenance>,
    ) -> InterpResult<'tcx> {
        let this = self.eval_context_mut();
        let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(dest.ptr)?;
        if let (
            crate::AllocExtra { weak_memory: Some(alloc_buffers), .. },
            crate::MiriMachine { data_race: Some(global), threads, .. },
        ) = this.get_alloc_extra_mut(alloc_id)?
        {
            if atomic == AtomicWriteOrd::SeqCst {
                global.sc_write(threads);
            }

            // UGLY HACK: in write_scalar_atomic() we don't know the value before our write,
            // so init == val always. If the buffer is fresh then we would've duplicated an entry,
            // so we need to remove it.
            // See https://github.com/rust-lang/miri/issues/2164
            let was_empty = matches!(
                alloc_buffers
                    .store_buffers
                    .borrow()
                    .access_type(alloc_range(base_offset, dest.layout.size)),
                AccessType::Empty(_)
            );
            let buffer = alloc_buffers
                .get_or_create_store_buffer_mut(alloc_range(base_offset, dest.layout.size), init)?;
            if was_empty {
                buffer.buffer.pop_front();
            }

            buffer.buffered_write(val, global, threads, atomic == AtomicWriteOrd::SeqCst)?;
        }

        // Caller should've written to dest with the vanilla scalar write, we do nothing here
        Ok(())
    }

    /// Caller should never need to consult the store buffer for the latest value.
    /// This function is used exclusively for failed atomic_compare_exchange_scalar
    /// to perform load_impl on the latest store element.
    fn perform_read_on_buffered_latest(
        &self,
        place: &MPlaceTy<'tcx, Provenance>,
        atomic: AtomicReadOrd,
        init: Scalar<Provenance>,
    ) -> InterpResult<'tcx> {
        let this = self.eval_context_ref();

        if let Some(global) = &this.machine.data_race {
            if atomic == AtomicReadOrd::SeqCst {
                global.sc_read(&this.machine.threads);
            }
            let size = place.layout.size;
            let (alloc_id, base_offset, ..) = this.ptr_get_alloc_id(place.ptr)?;
            if let Some(alloc_buffers) = this.get_alloc_extra(alloc_id)?.weak_memory.as_ref() {
                let buffer = alloc_buffers
                    .get_or_create_store_buffer(alloc_range(base_offset, size), init)?;
                buffer.read_from_last_store(
                    global,
                    &this.machine.threads,
                    atomic == AtomicReadOrd::SeqCst,
                );
            }
        }
        Ok(())
    }
}