1 // Copyright 2020, The Android Open Source Project 2 // 3 // Licensed under the Apache License, Version 2.0 (the "License"); 4 // you may not use this file except in compliance with the License. 5 // You may obtain a copy of the License at 6 // 7 // http://www.apache.org/licenses/LICENSE-2.0 8 // 9 // Unless required by applicable law or agreed to in writing, software 10 // distributed under the License is distributed on an "AS IS" BASIS, 11 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 // See the License for the specific language governing permissions and 13 // limitations under the License. 14 15 //! This crate implements the `IKeystoreOperation` AIDL interface, which represents 16 //! an ongoing key operation, as well as the operation database, which is mainly 17 //! required for tracking operations for the purpose of pruning. 18 //! This crate also implements an operation pruning strategy. 19 //! 20 //! Operations implement the API calls update, finish, and abort. 21 //! Additionally, an operation can be dropped and pruned. The former 22 //! happens if the client deletes a binder to the operation object. 23 //! An existing operation may get pruned when running out of operation 24 //! slots and a new operation takes precedence. 25 //! 26 //! ## Operation Lifecycle 27 //! An operation gets created when the client calls `IKeystoreSecurityLevel::create`. 28 //! It may receive zero or more update request. The lifecycle ends when: 29 //! * `update` yields an error. 30 //! * `finish` is called. 31 //! * `abort` is called. 32 //! * The operation gets dropped. 33 //! * The operation gets pruned. 34 //! 35 //! `Operation` has an `Outcome` member. While the outcome is `Outcome::Unknown`, 36 //! the operation is active and in a good state. Any of the above conditions may 37 //! change the outcome to one of the defined outcomes Success, Abort, Dropped, 38 //! Pruned, or ErrorCode. The latter is chosen in the case of an unexpected error, during 39 //! `update` or `finish`. `Success` is chosen iff `finish` completes without error. 40 //! Note that all operations get dropped eventually in the sense that they lose 41 //! their last reference and get destroyed. At that point, the fate of the operation 42 //! gets logged. However, an operation will transition to `Outcome::Dropped` iff 43 //! the operation was still active (`Outcome::Unknown`) at that time. 44 //! 45 //! ## Operation Dropping 46 //! To observe the dropping of an operation, we have to make sure that there 47 //! are no strong references to the IBinder representing this operation. 48 //! This would be simple enough if the operation object would need to be accessed 49 //! only by transactions. But to perform pruning, we have to retain a reference to the 50 //! original operation object. 51 //! 52 //! ## Operation Pruning 53 //! Pruning an operation happens during the creation of a new operation. 54 //! We have to iterate through the operation database to find a suitable 55 //! candidate. Then we abort and finalize this operation setting its outcome to 56 //! `Outcome::Pruned`. The corresponding KeyMint operation slot will have been freed 57 //! up at this point, but the `Operation` object lingers. When the client 58 //! attempts to use the operation again they will receive 59 //! ErrorCode::INVALID_OPERATION_HANDLE indicating that the operation no longer 60 //! exits. This should be the cue for the client to destroy its binder. 61 //! At that point the operation gets dropped. 62 //! 63 //! ## Architecture 64 //! The `IKeystoreOperation` trait is implemented by `KeystoreOperation`. 65 //! This acts as a proxy object holding a strong reference to actual operation 66 //! implementation `Operation`. 67 //! 68 //! ``` 69 //! struct KeystoreOperation { 70 //! operation: Mutex<Option<Arc<Operation>>>, 71 //! } 72 //! ``` 73 //! 74 //! The `Mutex` serves two purposes. It provides interior mutability allowing 75 //! us to set the Option to None. We do this when the life cycle ends during 76 //! a call to `update`, `finish`, or `abort`. As a result most of the Operation 77 //! related resources are freed. The `KeystoreOperation` proxy object still 78 //! lingers until dropped by the client. 79 //! The second purpose is to protect operations against concurrent usage. 80 //! Failing to lock this mutex yields `ResponseCode::OPERATION_BUSY` and indicates 81 //! a programming error in the client. 82 //! 83 //! Note that the Mutex only protects the operation against concurrent client calls. 84 //! We still retain weak references to the operation in the operation database: 85 //! 86 //! ``` 87 //! struct OperationDb { 88 //! operations: Mutex<Vec<Weak<Operation>>> 89 //! } 90 //! ``` 91 //! 92 //! This allows us to access the operations for the purpose of pruning. 93 //! We do this in three phases. 94 //! 1. We gather the pruning information. Besides non mutable information, 95 //! we access `last_usage` which is protected by a mutex. 96 //! We only lock this mutex for single statements at a time. During 97 //! this phase we hold the operation db lock. 98 //! 2. We choose a pruning candidate by computing the pruning resistance 99 //! of each operation. We do this entirely with information we now 100 //! have on the stack without holding any locks. 101 //! (See `OperationDb::prune` for more details on the pruning strategy.) 102 //! 3. During pruning we briefly lock the operation database again to get the 103 //! the pruning candidate by index. We then attempt to abort the candidate. 104 //! If the candidate was touched in the meantime or is currently fulfilling 105 //! a request (i.e., the client calls update, finish, or abort), 106 //! we go back to 1 and try again. 107 //! 108 //! So the outer Mutex in `KeystoreOperation::operation` only protects 109 //! operations against concurrent client calls but not against concurrent 110 //! pruning attempts. This is what the `Operation::outcome` mutex is used for. 111 //! 112 //! ``` 113 //! struct Operation { 114 //! ... 115 //! outcome: Mutex<Outcome>, 116 //! ... 117 //! } 118 //! ``` 119 //! 120 //! Any request that can change the outcome, i.e., `update`, `finish`, `abort`, 121 //! `drop`, and `prune` has to take the outcome lock and check if the outcome 122 //! is still `Outcome::Unknown` before entering. `prune` is special in that 123 //! it will `try_lock`, because we don't want to be blocked on a potentially 124 //! long running request at another operation. If it fails to get the lock 125 //! the operation is either being touched, which changes its pruning resistance, 126 //! or it transitions to its end-of-life, which means we may get a free slot. 127 //! Either way, we have to revaluate the pruning scores. 128 129 use crate::enforcements::AuthInfo; 130 use crate::error::{ 131 error_to_serialized_error, into_binder, into_logged_binder, map_km_error, Error, ErrorCode, 132 ResponseCode, SerializedError, 133 }; 134 use crate::ks_err; 135 use crate::metrics_store::log_key_operation_event_stats; 136 use crate::utils::watchdog as wd; 137 use android_hardware_security_keymint::aidl::android::hardware::security::keymint::{ 138 IKeyMintOperation::IKeyMintOperation, KeyParameter::KeyParameter, KeyPurpose::KeyPurpose, 139 SecurityLevel::SecurityLevel, 140 }; 141 use android_hardware_security_keymint::binder::{BinderFeatures, Strong}; 142 use android_system_keystore2::aidl::android::system::keystore2::{ 143 IKeystoreOperation::BnKeystoreOperation, IKeystoreOperation::IKeystoreOperation, 144 }; 145 use anyhow::{anyhow, Context, Result}; 146 use std::{ 147 collections::HashMap, 148 sync::{Arc, Mutex, MutexGuard, Weak}, 149 time::Duration, 150 time::Instant, 151 }; 152 153 /// Operations have `Outcome::Unknown` as long as they are active. They transition 154 /// to one of the other variants exactly once. The distinction in outcome is mainly 155 /// for the statistic. 156 #[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd)] 157 pub enum Outcome { 158 /// Operations have `Outcome::Unknown` as long as they are active. 159 Unknown, 160 /// Operation is successful. 161 Success, 162 /// Operation is aborted. 163 Abort, 164 /// Operation is dropped. 165 Dropped, 166 /// Operation is pruned. 167 Pruned, 168 /// Operation is failed with the error code. 169 ErrorCode(SerializedError), 170 } 171 172 /// Operation bundles all of the operation related resources and tracks the operation's 173 /// outcome. 174 #[derive(Debug)] 175 pub struct Operation { 176 // The index of this operation in the OperationDb. 177 index: usize, 178 km_op: Strong<dyn IKeyMintOperation>, 179 last_usage: Mutex<Instant>, 180 outcome: Mutex<Outcome>, 181 owner: u32, // Uid of the operation's owner. 182 auth_info: Mutex<AuthInfo>, 183 forced: bool, 184 logging_info: LoggingInfo, 185 } 186 187 /// Keeps track of the information required for logging operations. 188 #[derive(Debug)] 189 pub struct LoggingInfo { 190 sec_level: SecurityLevel, 191 purpose: KeyPurpose, 192 op_params: Vec<KeyParameter>, 193 key_upgraded: bool, 194 } 195 196 impl LoggingInfo { 197 /// Constructor new( sec_level: SecurityLevel, purpose: KeyPurpose, op_params: Vec<KeyParameter>, key_upgraded: bool, ) -> LoggingInfo198 pub fn new( 199 sec_level: SecurityLevel, 200 purpose: KeyPurpose, 201 op_params: Vec<KeyParameter>, 202 key_upgraded: bool, 203 ) -> LoggingInfo { 204 Self { sec_level, purpose, op_params, key_upgraded } 205 } 206 } 207 208 struct PruningInfo { 209 last_usage: Instant, 210 owner: u32, 211 index: usize, 212 forced: bool, 213 } 214 215 // We don't except more than 32KiB of data in `update`, `updateAad`, and `finish`. 216 const MAX_RECEIVE_DATA: usize = 0x8000; 217 218 impl Operation { 219 /// Constructor new( index: usize, km_op: binder::Strong<dyn IKeyMintOperation>, owner: u32, auth_info: AuthInfo, forced: bool, logging_info: LoggingInfo, ) -> Self220 pub fn new( 221 index: usize, 222 km_op: binder::Strong<dyn IKeyMintOperation>, 223 owner: u32, 224 auth_info: AuthInfo, 225 forced: bool, 226 logging_info: LoggingInfo, 227 ) -> Self { 228 Self { 229 index, 230 km_op, 231 last_usage: Mutex::new(Instant::now()), 232 outcome: Mutex::new(Outcome::Unknown), 233 owner, 234 auth_info: Mutex::new(auth_info), 235 forced, 236 logging_info, 237 } 238 } 239 watch(&self, id: &'static str) -> Option<wd::WatchPoint>240 fn watch(&self, id: &'static str) -> Option<wd::WatchPoint> { 241 let sec_level = self.logging_info.sec_level; 242 wd::watch_millis_with(id, wd::DEFAULT_TIMEOUT_MS, sec_level) 243 } 244 get_pruning_info(&self) -> Option<PruningInfo>245 fn get_pruning_info(&self) -> Option<PruningInfo> { 246 // An operation may be finalized. 247 if let Ok(guard) = self.outcome.try_lock() { 248 match *guard { 249 Outcome::Unknown => {} 250 // If the outcome is any other than unknown, it has been finalized, 251 // and we can no longer consider it for pruning. 252 _ => return None, 253 } 254 } 255 // Else: If we could not grab the lock, this means that the operation is currently 256 // being used and it may be transitioning to finalized or it was simply updated. 257 // In any case it is fair game to consider it for pruning. If the operation 258 // transitioned to a final state, we will notice when we attempt to prune, and 259 // a subsequent attempt to create a new operation will succeed. 260 Some(PruningInfo { 261 // Expect safety: 262 // `last_usage` is locked only for primitive single line statements. 263 // There is no chance to panic and poison the mutex. 264 last_usage: *self.last_usage.lock().expect("In get_pruning_info."), 265 owner: self.owner, 266 index: self.index, 267 forced: self.forced, 268 }) 269 } 270 prune(&self, last_usage: Instant) -> Result<(), Error>271 fn prune(&self, last_usage: Instant) -> Result<(), Error> { 272 let mut locked_outcome = match self.outcome.try_lock() { 273 Ok(guard) => match *guard { 274 Outcome::Unknown => guard, 275 _ => return Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)), 276 }, 277 Err(_) => return Err(Error::Rc(ResponseCode::OPERATION_BUSY)), 278 }; 279 280 // In `OperationDb::prune`, which is our caller, we first gather the pruning 281 // information including the last usage. When we select a candidate 282 // we call `prune` on that candidate passing the last_usage 283 // that we gathered earlier. If the actual last usage 284 // has changed since than, it means the operation was busy in the 285 // meantime, which means that we have to reevaluate the pruning score. 286 // 287 // Expect safety: 288 // `last_usage` is locked only for primitive single line statements. 289 // There is no chance to panic and poison the mutex. 290 if *self.last_usage.lock().expect("In Operation::prune()") != last_usage { 291 return Err(Error::Rc(ResponseCode::OPERATION_BUSY)); 292 } 293 *locked_outcome = Outcome::Pruned; 294 295 let _wp = self.watch("Operation::prune: calling IKeyMintOperation::abort()"); 296 297 // We abort the operation. If there was an error we log it but ignore it. 298 if let Err(e) = map_km_error(self.km_op.abort()) { 299 log::warn!("In prune: KeyMint::abort failed with {:?}.", e); 300 } 301 302 Ok(()) 303 } 304 305 // This function takes a Result from a KeyMint call and inspects it for errors. 306 // If an error was found it updates the given `locked_outcome` accordingly. 307 // It forwards the Result unmodified. 308 // The precondition to this call must be *locked_outcome == Outcome::Unknown. 309 // Ideally the `locked_outcome` came from a successful call to `check_active` 310 // see below. update_outcome<T>( &self, locked_outcome: &mut Outcome, err: Result<T, Error>, ) -> Result<T, Error>311 fn update_outcome<T>( 312 &self, 313 locked_outcome: &mut Outcome, 314 err: Result<T, Error>, 315 ) -> Result<T, Error> { 316 if let Err(e) = &err { 317 *locked_outcome = Outcome::ErrorCode(error_to_serialized_error(e)) 318 } 319 err 320 } 321 322 // This function grabs the outcome lock and checks the current outcome state. 323 // If the outcome is still `Outcome::Unknown`, this function returns 324 // the locked outcome for further updates. In any other case it returns 325 // ErrorCode::INVALID_OPERATION_HANDLE indicating that this operation has 326 // been finalized and is no longer active. check_active(&self) -> Result<MutexGuard<Outcome>>327 fn check_active(&self) -> Result<MutexGuard<Outcome>> { 328 let guard = self.outcome.lock().expect("In check_active."); 329 match *guard { 330 Outcome::Unknown => Ok(guard), 331 _ => Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) 332 .context(ks_err!("Call on finalized operation with outcome: {:?}.", *guard)), 333 } 334 } 335 336 // This function checks the amount of input data sent to us. We reject any buffer 337 // exceeding MAX_RECEIVE_DATA bytes as input to `update`, `update_aad`, and `finish` 338 // in order to force clients into using reasonable limits. check_input_length(data: &[u8]) -> Result<()>339 fn check_input_length(data: &[u8]) -> Result<()> { 340 if data.len() > MAX_RECEIVE_DATA { 341 // This error code is unique, no context required here. 342 return Err(anyhow!(Error::Rc(ResponseCode::TOO_MUCH_DATA))); 343 } 344 Ok(()) 345 } 346 347 // Update the last usage to now. touch(&self)348 fn touch(&self) { 349 // Expect safety: 350 // `last_usage` is locked only for primitive single line statements. 351 // There is no chance to panic and poison the mutex. 352 *self.last_usage.lock().expect("In touch.") = Instant::now(); 353 } 354 355 /// Implementation of `IKeystoreOperation::updateAad`. 356 /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details. update_aad(&self, aad_input: &[u8]) -> Result<()>357 fn update_aad(&self, aad_input: &[u8]) -> Result<()> { 358 let mut outcome = self.check_active().context("In update_aad")?; 359 Self::check_input_length(aad_input).context("In update_aad")?; 360 self.touch(); 361 362 let (hat, tst) = self 363 .auth_info 364 .lock() 365 .unwrap() 366 .before_update() 367 .context(ks_err!("Trying to get auth tokens for uid {}", self.owner))?; 368 369 self.update_outcome(&mut outcome, { 370 let _wp = self.watch("Operation::update_aad: calling IKeyMintOperation::updateAad"); 371 map_km_error(self.km_op.updateAad(aad_input, hat.as_ref(), tst.as_ref())) 372 }) 373 .context(ks_err!("Update failed for uid {}", self.owner))?; 374 375 Ok(()) 376 } 377 378 /// Implementation of `IKeystoreOperation::update`. 379 /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details. update(&self, input: &[u8]) -> Result<Option<Vec<u8>>>380 fn update(&self, input: &[u8]) -> Result<Option<Vec<u8>>> { 381 let mut outcome = self.check_active().context("In update")?; 382 Self::check_input_length(input).context("In update")?; 383 self.touch(); 384 385 let (hat, tst) = self 386 .auth_info 387 .lock() 388 .unwrap() 389 .before_update() 390 .context(ks_err!("Trying to get auth tokens for uid {}", self.owner))?; 391 392 let output = self 393 .update_outcome(&mut outcome, { 394 let _wp = self.watch("Operation::update: calling IKeyMintOperation::update"); 395 map_km_error(self.km_op.update(input, hat.as_ref(), tst.as_ref())) 396 }) 397 .context(ks_err!("Update failed for uid {}", self.owner))?; 398 399 if output.is_empty() { 400 Ok(None) 401 } else { 402 Ok(Some(output)) 403 } 404 } 405 406 /// Implementation of `IKeystoreOperation::finish`. 407 /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details. finish(&self, input: Option<&[u8]>, signature: Option<&[u8]>) -> Result<Option<Vec<u8>>>408 fn finish(&self, input: Option<&[u8]>, signature: Option<&[u8]>) -> Result<Option<Vec<u8>>> { 409 let mut outcome = self.check_active().context("In finish")?; 410 if let Some(input) = input { 411 Self::check_input_length(input).context("In finish")?; 412 } 413 self.touch(); 414 415 let (hat, tst, confirmation_token) = self 416 .auth_info 417 .lock() 418 .unwrap() 419 .before_finish() 420 .context(ks_err!("Trying to get auth tokens for uid {}", self.owner))?; 421 422 let output = self 423 .update_outcome(&mut outcome, { 424 let _wp = self.watch("Operation::finish: calling IKeyMintOperation::finish"); 425 map_km_error(self.km_op.finish( 426 input, 427 signature, 428 hat.as_ref(), 429 tst.as_ref(), 430 confirmation_token.as_deref(), 431 )) 432 }) 433 .context(ks_err!("Finish failed for uid {}", self.owner))?; 434 435 self.auth_info.lock().unwrap().after_finish().context("In finish.")?; 436 437 // At this point the operation concluded successfully. 438 *outcome = Outcome::Success; 439 440 if output.is_empty() { 441 Ok(None) 442 } else { 443 Ok(Some(output)) 444 } 445 } 446 447 /// Aborts the operation if it is active. IFF the operation is aborted the outcome is 448 /// set to `outcome`. `outcome` must reflect the reason for the abort. Since the operation 449 /// gets aborted `outcome` must not be `Operation::Success` or `Operation::Unknown`. abort(&self, outcome: Outcome) -> Result<()>450 fn abort(&self, outcome: Outcome) -> Result<()> { 451 let mut locked_outcome = self.check_active().context("In abort")?; 452 *locked_outcome = outcome; 453 454 { 455 let _wp = self.watch("Operation::abort: calling IKeyMintOperation::abort"); 456 map_km_error(self.km_op.abort()).context(ks_err!("KeyMint::abort failed.")) 457 } 458 } 459 } 460 461 impl Drop for Operation { drop(&mut self)462 fn drop(&mut self) { 463 let guard = self.outcome.lock().expect("In drop."); 464 log_key_operation_event_stats( 465 self.logging_info.sec_level, 466 self.logging_info.purpose, 467 &(self.logging_info.op_params), 468 &guard, 469 self.logging_info.key_upgraded, 470 ); 471 if let Outcome::Unknown = *guard { 472 drop(guard); 473 // If the operation was still active we call abort, setting 474 // the outcome to `Outcome::Dropped` 475 if let Err(e) = self.abort(Outcome::Dropped) { 476 log::error!("While dropping Operation: abort failed:\n {:?}", e); 477 } 478 } 479 } 480 } 481 482 /// The OperationDb holds weak references to all ongoing operations. 483 /// Its main purpose is to facilitate operation pruning. 484 #[derive(Debug, Default)] 485 pub struct OperationDb { 486 // TODO replace Vec with WeakTable when the weak_table crate becomes 487 // available. 488 operations: Mutex<Vec<Weak<Operation>>>, 489 } 490 491 impl OperationDb { 492 /// Creates a new OperationDb. new() -> Self493 pub fn new() -> Self { 494 Self { operations: Mutex::new(Vec::new()) } 495 } 496 497 /// Creates a new operation. 498 /// This function takes a KeyMint operation and an associated 499 /// owner uid and returns a new Operation wrapped in a `std::sync::Arc`. create_operation( &self, km_op: binder::Strong<dyn IKeyMintOperation>, owner: u32, auth_info: AuthInfo, forced: bool, logging_info: LoggingInfo, ) -> Arc<Operation>500 pub fn create_operation( 501 &self, 502 km_op: binder::Strong<dyn IKeyMintOperation>, 503 owner: u32, 504 auth_info: AuthInfo, 505 forced: bool, 506 logging_info: LoggingInfo, 507 ) -> Arc<Operation> { 508 // We use unwrap because we don't allow code that can panic while locked. 509 let mut operations = self.operations.lock().expect("In create_operation."); 510 511 let mut index: usize = 0; 512 // First we iterate through the operation slots to try and find an unused 513 // slot. If we don't find one, we append the new entry instead. 514 match (*operations).iter_mut().find(|s| { 515 index += 1; 516 s.upgrade().is_none() 517 }) { 518 Some(free_slot) => { 519 let new_op = Arc::new(Operation::new( 520 index - 1, 521 km_op, 522 owner, 523 auth_info, 524 forced, 525 logging_info, 526 )); 527 *free_slot = Arc::downgrade(&new_op); 528 new_op 529 } 530 None => { 531 let new_op = Arc::new(Operation::new( 532 operations.len(), 533 km_op, 534 owner, 535 auth_info, 536 forced, 537 logging_info, 538 )); 539 operations.push(Arc::downgrade(&new_op)); 540 new_op 541 } 542 } 543 } 544 get(&self, index: usize) -> Option<Arc<Operation>>545 fn get(&self, index: usize) -> Option<Arc<Operation>> { 546 self.operations.lock().expect("In OperationDb::get.").get(index).and_then(|op| op.upgrade()) 547 } 548 549 /// Attempts to prune an operation. 550 /// 551 /// This function is used during operation creation, i.e., by 552 /// `KeystoreSecurityLevel::create_operation`, to try and free up an operation slot 553 /// if it got `ErrorCode::TOO_MANY_OPERATIONS` from the KeyMint backend. It is not 554 /// guaranteed that an operation slot is available after this call successfully 555 /// returned for various reasons. E.g., another thread may have snatched up the newly 556 /// available slot. Callers may have to call prune multiple times before they get a 557 /// free operation slot. Prune may also return `Err(Error::Rc(ResponseCode::BACKEND_BUSY))` 558 /// which indicates that no prunable operation was found. 559 /// 560 /// To find a suitable candidate we compute the malus for the caller and each existing 561 /// operation. The malus is the inverse of the pruning power (caller) or pruning 562 /// resistance (existing operation). 563 /// 564 /// The malus is based on the number of sibling operations and age. Sibling 565 /// operations are operations that have the same owner (UID). 566 /// 567 /// Every operation, existing or new, starts with a malus of 1. Every sibling 568 /// increases the malus by one. The age is the time since an operation was last touched. 569 /// It increases the malus by log6(<age in seconds> + 1) rounded down to the next 570 /// integer. So the malus increases stepwise after 5s, 35s, 215s, ... 571 /// Of two operations with the same malus the least recently used one is considered 572 /// weaker. 573 /// 574 /// For the caller to be able to prune an operation it must find an operation 575 /// with a malus higher than its own. 576 /// 577 /// The malus can be expressed as 578 /// ``` 579 /// malus = 1 + no_of_siblings + floor(log6(age_in_seconds + 1)) 580 /// ``` 581 /// where the constant `1` accounts for the operation under consideration. 582 /// In reality we compute it as 583 /// ``` 584 /// caller_malus = 1 + running_siblings 585 /// ``` 586 /// because the new operation has no age and is not included in the `running_siblings`, 587 /// and 588 /// ``` 589 /// running_malus = running_siblings + floor(log6(age_in_seconds + 1)) 590 /// ``` 591 /// because a running operation is included in the `running_siblings` and it has 592 /// an age. 593 /// 594 /// ## Example 595 /// A caller with no running operations has a malus of 1. Young (age < 5s) operations 596 /// also with no siblings have a malus of one and cannot be pruned by the caller. 597 /// We have to find an operation that has at least one sibling or is older than 5s. 598 /// 599 /// A caller with one running operation has a malus of 2. Now even young siblings 600 /// or single child aging (5s <= age < 35s) operations are off limit. An aging 601 /// sibling of two, however, would have a malus of 3 and would be fair game. 602 /// 603 /// ## Rationale 604 /// Due to the limitation of KeyMint operation slots, we cannot get around pruning or 605 /// a single app could easily DoS KeyMint. 606 /// Keystore 1.0 used to always prune the least recently used operation. This at least 607 /// guaranteed that new operations can always be started. With the increased usage 608 /// of Keystore we saw increased pruning activity which can lead to a livelock 609 /// situation in the worst case. 610 /// 611 /// With the new pruning strategy we want to provide well behaved clients with 612 /// progress assurances while punishing DoS attempts. As a result of this 613 /// strategy we can be in the situation where no operation can be pruned and the 614 /// creation of a new operation fails. This allows single child operations which 615 /// are frequently updated to complete, thereby breaking up livelock situations 616 /// and facilitating system wide progress. 617 /// 618 /// ## Update 619 /// We also allow callers to cannibalize their own sibling operations if no other 620 /// slot can be found. In this case the least recently used sibling is pruned. prune(&self, caller: u32, forced: bool) -> Result<(), Error>621 pub fn prune(&self, caller: u32, forced: bool) -> Result<(), Error> { 622 loop { 623 // Maps the uid of the owner to the number of operations that owner has 624 // (running_siblings). More operations per owner lowers the pruning 625 // resistance of the operations of that owner. Whereas the number of 626 // ongoing operations of the caller lowers the pruning power of the caller. 627 let mut owners: HashMap<u32, u64> = HashMap::new(); 628 let mut pruning_info: Vec<PruningInfo> = Vec::new(); 629 630 let now = Instant::now(); 631 self.operations 632 .lock() 633 .expect("In OperationDb::prune: Trying to lock self.operations.") 634 .iter() 635 .for_each(|op| { 636 if let Some(op) = op.upgrade() { 637 if let Some(p_info) = op.get_pruning_info() { 638 let owner = p_info.owner; 639 pruning_info.push(p_info); 640 // Count operations per owner. 641 *owners.entry(owner).or_insert(0) += 1; 642 } 643 } 644 }); 645 646 // If the operation is forced, the caller has a malus of 0. 647 let caller_malus = if forced { 0 } else { 1u64 + *owners.entry(caller).or_default() }; 648 649 // We iterate through all operations computing the malus and finding 650 // the candidate with the highest malus which must also be higher 651 // than the caller_malus. 652 struct CandidateInfo { 653 index: usize, 654 malus: u64, 655 last_usage: Instant, 656 age: Duration, 657 } 658 let mut oldest_caller_op: Option<CandidateInfo> = None; 659 let candidate = pruning_info.iter().fold( 660 None, 661 |acc: Option<CandidateInfo>, &PruningInfo { last_usage, owner, index, forced }| { 662 // Compute the age of the current operation. 663 let age = now 664 .checked_duration_since(last_usage) 665 .unwrap_or_else(|| Duration::new(0, 0)); 666 667 // Find the least recently used sibling as an alternative pruning candidate. 668 if owner == caller { 669 if let Some(CandidateInfo { age: a, .. }) = oldest_caller_op { 670 if age > a { 671 oldest_caller_op = 672 Some(CandidateInfo { index, malus: 0, last_usage, age }); 673 } 674 } else { 675 oldest_caller_op = 676 Some(CandidateInfo { index, malus: 0, last_usage, age }); 677 } 678 } 679 680 // Compute the malus of the current operation. 681 let malus = if forced { 682 // Forced operations have a malus of 0. And cannot even be pruned 683 // by other forced operations. 684 0 685 } else { 686 // Expect safety: Every owner in pruning_info was counted in 687 // the owners map. So this unwrap cannot panic. 688 *owners.get(&owner).expect( 689 "This is odd. We should have counted every owner in pruning_info.", 690 ) + ((age.as_secs() + 1) as f64).log(6.0).floor() as u64 691 }; 692 693 // Now check if the current operation is a viable/better candidate 694 // the one currently stored in the accumulator. 695 match acc { 696 // First we have to find any operation that is prunable by the caller. 697 None => { 698 if caller_malus < malus { 699 Some(CandidateInfo { index, malus, last_usage, age }) 700 } else { 701 None 702 } 703 } 704 // If we have found one we look for the operation with the worst score. 705 // If there is a tie, the older operation is considered weaker. 706 Some(CandidateInfo { index: i, malus: m, last_usage: l, age: a }) => { 707 if malus > m || (malus == m && age > a) { 708 Some(CandidateInfo { index, malus, last_usage, age }) 709 } else { 710 Some(CandidateInfo { index: i, malus: m, last_usage: l, age: a }) 711 } 712 } 713 } 714 }, 715 ); 716 717 // If we did not find a suitable candidate we may cannibalize our oldest sibling. 718 let candidate = candidate.or(oldest_caller_op); 719 720 match candidate { 721 Some(CandidateInfo { index, malus: _, last_usage, age: _ }) => { 722 match self.get(index) { 723 Some(op) => { 724 match op.prune(last_usage) { 725 // We successfully freed up a slot. 726 Ok(()) => break Ok(()), 727 // This means the operation we tried to prune was on its way 728 // out. It also means that the slot it had occupied was freed up. 729 Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) => break Ok(()), 730 // This means the operation we tried to prune was currently 731 // servicing a request. There are two options. 732 // * Assume that it was touched, which means that its 733 // pruning resistance increased. In that case we have 734 // to start over and find another candidate. 735 // * Assume that the operation is transitioning to end-of-life. 736 // which means that we got a free slot for free. 737 // If we assume the first but the second is true, we prune 738 // a good operation without need (aggressive approach). 739 // If we assume the second but the first is true, our 740 // caller will attempt to create a new KeyMint operation, 741 // fail with `ErrorCode::TOO_MANY_OPERATIONS`, and call 742 // us again (conservative approach). 743 Err(Error::Rc(ResponseCode::OPERATION_BUSY)) => { 744 // We choose the conservative approach, because 745 // every needlessly pruned operation can impact 746 // the user experience. 747 // To switch to the aggressive approach replace 748 // the following line with `continue`. 749 break Ok(()); 750 } 751 752 // The candidate may have been touched so the score 753 // has changed since our evaluation. 754 _ => continue, 755 } 756 } 757 // This index does not exist any more. The operation 758 // in this slot was dropped. Good news, a slot 759 // has freed up. 760 None => break Ok(()), 761 } 762 } 763 // We did not get a pruning candidate. 764 None => break Err(Error::Rc(ResponseCode::BACKEND_BUSY)), 765 } 766 } 767 } 768 } 769 770 /// Implementation of IKeystoreOperation. 771 pub struct KeystoreOperation { 772 operation: Mutex<Option<Arc<Operation>>>, 773 } 774 775 impl KeystoreOperation { 776 /// Creates a new operation instance wrapped in a 777 /// BnKeystoreOperation proxy object. It also enables 778 /// `BinderFeatures::set_requesting_sid` on the new interface, because 779 /// we need it for checking Keystore permissions. new_native_binder(operation: Arc<Operation>) -> binder::Strong<dyn IKeystoreOperation>780 pub fn new_native_binder(operation: Arc<Operation>) -> binder::Strong<dyn IKeystoreOperation> { 781 BnKeystoreOperation::new_binder( 782 Self { operation: Mutex::new(Some(operation)) }, 783 BinderFeatures { set_requesting_sid: true, ..BinderFeatures::default() }, 784 ) 785 } 786 787 /// Grabs the outer operation mutex and calls `f` on the locked operation. 788 /// The function also deletes the operation if it returns with an error or if 789 /// `delete_op` is true. with_locked_operation<T, F>(&self, f: F, delete_op: bool) -> Result<T> where for<'a> F: FnOnce(&'a Operation) -> Result<T>,790 fn with_locked_operation<T, F>(&self, f: F, delete_op: bool) -> Result<T> 791 where 792 for<'a> F: FnOnce(&'a Operation) -> Result<T>, 793 { 794 let mut delete_op: bool = delete_op; 795 match self.operation.try_lock() { 796 Ok(mut mutex_guard) => { 797 let result = match &*mutex_guard { 798 Some(op) => { 799 let result = f(op); 800 // Any error here means we can discard the operation. 801 if result.is_err() { 802 delete_op = true; 803 } 804 result 805 } 806 None => Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) 807 .context(ks_err!("KeystoreOperation::with_locked_operation")), 808 }; 809 810 if delete_op { 811 // We give up our reference to the Operation, thereby freeing up our 812 // internal resources and ending the wrapped KeyMint operation. 813 // This KeystoreOperation object will still be owned by an SpIBinder 814 // until the client drops its remote reference. 815 *mutex_guard = None; 816 } 817 result 818 } 819 Err(_) => Err(Error::Rc(ResponseCode::OPERATION_BUSY)) 820 .context(ks_err!("KeystoreOperation::with_locked_operation")), 821 } 822 } 823 } 824 825 impl binder::Interface for KeystoreOperation {} 826 827 impl IKeystoreOperation for KeystoreOperation { updateAad(&self, aad_input: &[u8]) -> binder::Result<()>828 fn updateAad(&self, aad_input: &[u8]) -> binder::Result<()> { 829 let _wp = wd::watch("IKeystoreOperation::updateAad"); 830 self.with_locked_operation( 831 |op| op.update_aad(aad_input).context(ks_err!("KeystoreOperation::updateAad")), 832 false, 833 ) 834 .map_err(into_logged_binder) 835 } 836 update(&self, input: &[u8]) -> binder::Result<Option<Vec<u8>>>837 fn update(&self, input: &[u8]) -> binder::Result<Option<Vec<u8>>> { 838 let _wp = wd::watch("IKeystoreOperation::update"); 839 self.with_locked_operation( 840 |op| op.update(input).context(ks_err!("KeystoreOperation::update")), 841 false, 842 ) 843 .map_err(into_logged_binder) 844 } finish( &self, input: Option<&[u8]>, signature: Option<&[u8]>, ) -> binder::Result<Option<Vec<u8>>>845 fn finish( 846 &self, 847 input: Option<&[u8]>, 848 signature: Option<&[u8]>, 849 ) -> binder::Result<Option<Vec<u8>>> { 850 let _wp = wd::watch("IKeystoreOperation::finish"); 851 self.with_locked_operation( 852 |op| op.finish(input, signature).context(ks_err!("KeystoreOperation::finish")), 853 true, 854 ) 855 .map_err(into_logged_binder) 856 } 857 abort(&self) -> binder::Result<()>858 fn abort(&self) -> binder::Result<()> { 859 let _wp = wd::watch("IKeystoreOperation::abort"); 860 let result = self.with_locked_operation( 861 |op| op.abort(Outcome::Abort).context(ks_err!("KeystoreOperation::abort")), 862 true, 863 ); 864 result.map_err(|e| { 865 match e.root_cause().downcast_ref::<Error>() { 866 // Calling abort on expired operations is something very common. 867 // There is no reason to clutter the log with it. It is never the cause 868 // for a true problem. 869 Some(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) => {} 870 _ => log::error!("{:?}", e), 871 }; 872 into_binder(e) 873 }) 874 } 875 } 876