1 // Copyright 2020, The Android Open Source Project 2 // 3 // Licensed under the Apache License, Version 2.0 (the "License"); 4 // you may not use this file except in compliance with the License. 5 // You may obtain a copy of the License at 6 // 7 // http://www.apache.org/licenses/LICENSE-2.0 8 // 9 // Unless required by applicable law or agreed to in writing, software 10 // distributed under the License is distributed on an "AS IS" BASIS, 11 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 // See the License for the specific language governing permissions and 13 // limitations under the License. 14 15 //! This crate implements the `IKeystoreOperation` AIDL interface, which represents 16 //! an ongoing key operation, as well as the operation database, which is mainly 17 //! required for tracking operations for the purpose of pruning. 18 //! This crate also implements an operation pruning strategy. 19 //! 20 //! Operations implement the API calls update, finish, and abort. 21 //! Additionally, an operation can be dropped and pruned. The former 22 //! happens if the client deletes a binder to the operation object. 23 //! An existing operation may get pruned when running out of operation 24 //! slots and a new operation takes precedence. 25 //! 26 //! ## Operation Lifecycle 27 //! An operation gets created when the client calls `IKeystoreSecurityLevel::create`. 28 //! It may receive zero or more update request. The lifecycle ends when: 29 //! * `update` yields an error. 30 //! * `finish` is called. 31 //! * `abort` is called. 32 //! * The operation gets dropped. 33 //! * The operation gets pruned. 34 //! `Operation` has an `Outcome` member. While the outcome is `Outcome::Unknown`, 35 //! the operation is active and in a good state. Any of the above conditions may 36 //! change the outcome to one of the defined outcomes Success, Abort, Dropped, 37 //! Pruned, or ErrorCode. The latter is chosen in the case of an unexpected error, during 38 //! `update` or `finish`. `Success` is chosen iff `finish` completes without error. 39 //! Note that all operations get dropped eventually in the sense that they lose 40 //! their last reference and get destroyed. At that point, the fate of the operation 41 //! gets logged. However, an operation will transition to `Outcome::Dropped` iff 42 //! the operation was still active (`Outcome::Unknown`) at that time. 43 //! 44 //! ## Operation Dropping 45 //! To observe the dropping of an operation, we have to make sure that there 46 //! are no strong references to the IBinder representing this operation. 47 //! This would be simple enough if the operation object would need to be accessed 48 //! only by transactions. But to perform pruning, we have to retain a reference to the 49 //! original operation object. 50 //! 51 //! ## Operation Pruning 52 //! Pruning an operation happens during the creation of a new operation. 53 //! We have to iterate through the operation database to find a suitable 54 //! candidate. Then we abort and finalize this operation setting its outcome to 55 //! `Outcome::Pruned`. The corresponding KeyMint operation slot will have been freed 56 //! up at this point, but the `Operation` object lingers. When the client 57 //! attempts to use the operation again they will receive 58 //! ErrorCode::INVALID_OPERATION_HANDLE indicating that the operation no longer 59 //! exits. This should be the cue for the client to destroy its binder. 60 //! At that point the operation gets dropped. 61 //! 62 //! ## Architecture 63 //! The `IKeystoreOperation` trait is implemented by `KeystoreOperation`. 64 //! This acts as a proxy object holding a strong reference to actual operation 65 //! implementation `Operation`. 66 //! 67 //! ``` 68 //! struct KeystoreOperation { 69 //! operation: Mutex<Option<Arc<Operation>>>, 70 //! } 71 //! ``` 72 //! 73 //! The `Mutex` serves two purposes. It provides interior mutability allowing 74 //! us to set the Option to None. We do this when the life cycle ends during 75 //! a call to `update`, `finish`, or `abort`. As a result most of the Operation 76 //! related resources are freed. The `KeystoreOperation` proxy object still 77 //! lingers until dropped by the client. 78 //! The second purpose is to protect operations against concurrent usage. 79 //! Failing to lock this mutex yields `ResponseCode::OPERATION_BUSY` and indicates 80 //! a programming error in the client. 81 //! 82 //! Note that the Mutex only protects the operation against concurrent client calls. 83 //! We still retain weak references to the operation in the operation database: 84 //! 85 //! ``` 86 //! struct OperationDb { 87 //! operations: Mutex<Vec<Weak<Operation>>> 88 //! } 89 //! ``` 90 //! 91 //! This allows us to access the operations for the purpose of pruning. 92 //! We do this in three phases. 93 //! 1. We gather the pruning information. Besides non mutable information, 94 //! we access `last_usage` which is protected by a mutex. 95 //! We only lock this mutex for single statements at a time. During 96 //! this phase we hold the operation db lock. 97 //! 2. We choose a pruning candidate by computing the pruning resistance 98 //! of each operation. We do this entirely with information we now 99 //! have on the stack without holding any locks. 100 //! (See `OperationDb::prune` for more details on the pruning strategy.) 101 //! 3. During pruning we briefly lock the operation database again to get the 102 //! the pruning candidate by index. We then attempt to abort the candidate. 103 //! If the candidate was touched in the meantime or is currently fulfilling 104 //! a request (i.e., the client calls update, finish, or abort), 105 //! we go back to 1 and try again. 106 //! 107 //! So the outer Mutex in `KeystoreOperation::operation` only protects 108 //! operations against concurrent client calls but not against concurrent 109 //! pruning attempts. This is what the `Operation::outcome` mutex is used for. 110 //! 111 //! ``` 112 //! struct Operation { 113 //! ... 114 //! outcome: Mutex<Outcome>, 115 //! ... 116 //! } 117 //! ``` 118 //! 119 //! Any request that can change the outcome, i.e., `update`, `finish`, `abort`, 120 //! `drop`, and `prune` has to take the outcome lock and check if the outcome 121 //! is still `Outcome::Unknown` before entering. `prune` is special in that 122 //! it will `try_lock`, because we don't want to be blocked on a potentially 123 //! long running request at another operation. If it fails to get the lock 124 //! the operation is either being touched, which changes its pruning resistance, 125 //! or it transitions to its end-of-life, which means we may get a free slot. 126 //! Either way, we have to revaluate the pruning scores. 127 128 use crate::enforcements::AuthInfo; 129 use crate::error::{map_err_with, map_km_error, map_or_log_err, Error, ErrorCode, ResponseCode}; 130 use crate::metrics_store::log_key_operation_event_stats; 131 use crate::utils::{watchdog as wd, Asp}; 132 use android_hardware_security_keymint::aidl::android::hardware::security::keymint::{ 133 IKeyMintOperation::IKeyMintOperation, KeyParameter::KeyParameter, KeyPurpose::KeyPurpose, 134 SecurityLevel::SecurityLevel, 135 }; 136 use android_hardware_security_keymint::binder::BinderFeatures; 137 use android_system_keystore2::aidl::android::system::keystore2::{ 138 IKeystoreOperation::BnKeystoreOperation, IKeystoreOperation::IKeystoreOperation, 139 }; 140 use anyhow::{anyhow, Context, Result}; 141 use std::{ 142 collections::HashMap, 143 sync::{Arc, Mutex, MutexGuard, Weak}, 144 time::Duration, 145 time::Instant, 146 }; 147 148 /// Operations have `Outcome::Unknown` as long as they are active. They transition 149 /// to one of the other variants exactly once. The distinction in outcome is mainly 150 /// for the statistic. 151 #[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd)] 152 pub enum Outcome { 153 /// Operations have `Outcome::Unknown` as long as they are active. 154 Unknown, 155 /// Operation is successful. 156 Success, 157 /// Operation is aborted. 158 Abort, 159 /// Operation is dropped. 160 Dropped, 161 /// Operation is pruned. 162 Pruned, 163 /// Operation is failed with the error code. 164 ErrorCode(ErrorCode), 165 } 166 167 /// Operation bundles all of the operation related resources and tracks the operation's 168 /// outcome. 169 #[derive(Debug)] 170 pub struct Operation { 171 // The index of this operation in the OperationDb. 172 index: usize, 173 km_op: Asp, 174 last_usage: Mutex<Instant>, 175 outcome: Mutex<Outcome>, 176 owner: u32, // Uid of the operation's owner. 177 auth_info: Mutex<AuthInfo>, 178 forced: bool, 179 logging_info: LoggingInfo, 180 } 181 182 /// Keeps track of the information required for logging operations. 183 #[derive(Debug)] 184 pub struct LoggingInfo { 185 sec_level: SecurityLevel, 186 purpose: KeyPurpose, 187 op_params: Vec<KeyParameter>, 188 key_upgraded: bool, 189 } 190 191 impl LoggingInfo { 192 /// Constructor new( sec_level: SecurityLevel, purpose: KeyPurpose, op_params: Vec<KeyParameter>, key_upgraded: bool, ) -> LoggingInfo193 pub fn new( 194 sec_level: SecurityLevel, 195 purpose: KeyPurpose, 196 op_params: Vec<KeyParameter>, 197 key_upgraded: bool, 198 ) -> LoggingInfo { 199 Self { sec_level, purpose, op_params, key_upgraded } 200 } 201 } 202 203 struct PruningInfo { 204 last_usage: Instant, 205 owner: u32, 206 index: usize, 207 forced: bool, 208 } 209 210 // We don't except more than 32KiB of data in `update`, `updateAad`, and `finish`. 211 const MAX_RECEIVE_DATA: usize = 0x8000; 212 213 impl Operation { 214 /// Constructor new( index: usize, km_op: binder::Strong<dyn IKeyMintOperation>, owner: u32, auth_info: AuthInfo, forced: bool, logging_info: LoggingInfo, ) -> Self215 pub fn new( 216 index: usize, 217 km_op: binder::Strong<dyn IKeyMintOperation>, 218 owner: u32, 219 auth_info: AuthInfo, 220 forced: bool, 221 logging_info: LoggingInfo, 222 ) -> Self { 223 Self { 224 index, 225 km_op: Asp::new(km_op.as_binder()), 226 last_usage: Mutex::new(Instant::now()), 227 outcome: Mutex::new(Outcome::Unknown), 228 owner, 229 auth_info: Mutex::new(auth_info), 230 forced, 231 logging_info, 232 } 233 } 234 get_pruning_info(&self) -> Option<PruningInfo>235 fn get_pruning_info(&self) -> Option<PruningInfo> { 236 // An operation may be finalized. 237 if let Ok(guard) = self.outcome.try_lock() { 238 match *guard { 239 Outcome::Unknown => {} 240 // If the outcome is any other than unknown, it has been finalized, 241 // and we can no longer consider it for pruning. 242 _ => return None, 243 } 244 } 245 // Else: If we could not grab the lock, this means that the operation is currently 246 // being used and it may be transitioning to finalized or it was simply updated. 247 // In any case it is fair game to consider it for pruning. If the operation 248 // transitioned to a final state, we will notice when we attempt to prune, and 249 // a subsequent attempt to create a new operation will succeed. 250 Some(PruningInfo { 251 // Expect safety: 252 // `last_usage` is locked only for primitive single line statements. 253 // There is no chance to panic and poison the mutex. 254 last_usage: *self.last_usage.lock().expect("In get_pruning_info."), 255 owner: self.owner, 256 index: self.index, 257 forced: self.forced, 258 }) 259 } 260 prune(&self, last_usage: Instant) -> Result<(), Error>261 fn prune(&self, last_usage: Instant) -> Result<(), Error> { 262 let mut locked_outcome = match self.outcome.try_lock() { 263 Ok(guard) => match *guard { 264 Outcome::Unknown => guard, 265 _ => return Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)), 266 }, 267 Err(_) => return Err(Error::Rc(ResponseCode::OPERATION_BUSY)), 268 }; 269 270 // In `OperationDb::prune`, which is our caller, we first gather the pruning 271 // information including the last usage. When we select a candidate 272 // we call `prune` on that candidate passing the last_usage 273 // that we gathered earlier. If the actual last usage 274 // has changed since than, it means the operation was busy in the 275 // meantime, which means that we have to reevaluate the pruning score. 276 // 277 // Expect safety: 278 // `last_usage` is locked only for primitive single line statements. 279 // There is no chance to panic and poison the mutex. 280 if *self.last_usage.lock().expect("In Operation::prune()") != last_usage { 281 return Err(Error::Rc(ResponseCode::OPERATION_BUSY)); 282 } 283 *locked_outcome = Outcome::Pruned; 284 285 let km_op: binder::public_api::Strong<dyn IKeyMintOperation> = 286 match self.km_op.get_interface() { 287 Ok(km_op) => km_op, 288 Err(e) => { 289 log::error!("In prune: Failed to get KeyMintOperation interface.\n {:?}", e); 290 return Err(Error::sys()); 291 } 292 }; 293 294 let _wp = wd::watch_millis("In Operation::prune: calling abort()", 500); 295 296 // We abort the operation. If there was an error we log it but ignore it. 297 if let Err(e) = map_km_error(km_op.abort()) { 298 log::error!("In prune: KeyMint::abort failed with {:?}.", e); 299 } 300 301 Ok(()) 302 } 303 304 // This function takes a Result from a KeyMint call and inspects it for errors. 305 // If an error was found it updates the given `locked_outcome` accordingly. 306 // It forwards the Result unmodified. 307 // The precondition to this call must be *locked_outcome == Outcome::Unknown. 308 // Ideally the `locked_outcome` came from a successful call to `check_active` 309 // see below. update_outcome<T>( &self, locked_outcome: &mut Outcome, err: Result<T, Error>, ) -> Result<T, Error>310 fn update_outcome<T>( 311 &self, 312 locked_outcome: &mut Outcome, 313 err: Result<T, Error>, 314 ) -> Result<T, Error> { 315 match &err { 316 Err(Error::Km(e)) => *locked_outcome = Outcome::ErrorCode(*e), 317 Err(_) => *locked_outcome = Outcome::ErrorCode(ErrorCode::UNKNOWN_ERROR), 318 Ok(_) => (), 319 } 320 err 321 } 322 323 // This function grabs the outcome lock and checks the current outcome state. 324 // If the outcome is still `Outcome::Unknown`, this function returns 325 // the locked outcome for further updates. In any other case it returns 326 // ErrorCode::INVALID_OPERATION_HANDLE indicating that this operation has 327 // been finalized and is no longer active. check_active(&self) -> Result<MutexGuard<Outcome>>328 fn check_active(&self) -> Result<MutexGuard<Outcome>> { 329 let guard = self.outcome.lock().expect("In check_active."); 330 match *guard { 331 Outcome::Unknown => Ok(guard), 332 _ => Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)).context(format!( 333 "In check_active: Call on finalized operation with outcome: {:?}.", 334 *guard 335 )), 336 } 337 } 338 339 // This function checks the amount of input data sent to us. We reject any buffer 340 // exceeding MAX_RECEIVE_DATA bytes as input to `update`, `update_aad`, and `finish` 341 // in order to force clients into using reasonable limits. check_input_length(data: &[u8]) -> Result<()>342 fn check_input_length(data: &[u8]) -> Result<()> { 343 if data.len() > MAX_RECEIVE_DATA { 344 // This error code is unique, no context required here. 345 return Err(anyhow!(Error::Rc(ResponseCode::TOO_MUCH_DATA))); 346 } 347 Ok(()) 348 } 349 350 // Update the last usage to now. touch(&self)351 fn touch(&self) { 352 // Expect safety: 353 // `last_usage` is locked only for primitive single line statements. 354 // There is no chance to panic and poison the mutex. 355 *self.last_usage.lock().expect("In touch.") = Instant::now(); 356 } 357 358 /// Implementation of `IKeystoreOperation::updateAad`. 359 /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details. update_aad(&self, aad_input: &[u8]) -> Result<()>360 fn update_aad(&self, aad_input: &[u8]) -> Result<()> { 361 let mut outcome = self.check_active().context("In update_aad")?; 362 Self::check_input_length(aad_input).context("In update_aad")?; 363 self.touch(); 364 365 let km_op: binder::public_api::Strong<dyn IKeyMintOperation> = 366 self.km_op.get_interface().context("In update: Failed to get KeyMintOperation.")?; 367 368 let (hat, tst) = self 369 .auth_info 370 .lock() 371 .unwrap() 372 .before_update() 373 .context("In update_aad: Trying to get auth tokens.")?; 374 375 self.update_outcome(&mut *outcome, { 376 let _wp = wd::watch_millis("Operation::update_aad: calling updateAad", 500); 377 map_km_error(km_op.updateAad(aad_input, hat.as_ref(), tst.as_ref())) 378 }) 379 .context("In update_aad: KeyMint::update failed.")?; 380 381 Ok(()) 382 } 383 384 /// Implementation of `IKeystoreOperation::update`. 385 /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details. update(&self, input: &[u8]) -> Result<Option<Vec<u8>>>386 fn update(&self, input: &[u8]) -> Result<Option<Vec<u8>>> { 387 let mut outcome = self.check_active().context("In update")?; 388 Self::check_input_length(input).context("In update")?; 389 self.touch(); 390 391 let km_op: binder::public_api::Strong<dyn IKeyMintOperation> = 392 self.km_op.get_interface().context("In update: Failed to get KeyMintOperation.")?; 393 394 let (hat, tst) = self 395 .auth_info 396 .lock() 397 .unwrap() 398 .before_update() 399 .context("In update: Trying to get auth tokens.")?; 400 401 let output = self 402 .update_outcome(&mut *outcome, { 403 let _wp = wd::watch_millis("Operation::update: calling update", 500); 404 map_km_error(km_op.update(input, hat.as_ref(), tst.as_ref())) 405 }) 406 .context("In update: KeyMint::update failed.")?; 407 408 if output.is_empty() { 409 Ok(None) 410 } else { 411 Ok(Some(output)) 412 } 413 } 414 415 /// Implementation of `IKeystoreOperation::finish`. 416 /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details. finish(&self, input: Option<&[u8]>, signature: Option<&[u8]>) -> Result<Option<Vec<u8>>>417 fn finish(&self, input: Option<&[u8]>, signature: Option<&[u8]>) -> Result<Option<Vec<u8>>> { 418 let mut outcome = self.check_active().context("In finish")?; 419 if let Some(input) = input { 420 Self::check_input_length(input).context("In finish")?; 421 } 422 self.touch(); 423 424 let km_op: binder::public_api::Strong<dyn IKeyMintOperation> = 425 self.km_op.get_interface().context("In finish: Failed to get KeyMintOperation.")?; 426 427 let (hat, tst, confirmation_token) = self 428 .auth_info 429 .lock() 430 .unwrap() 431 .before_finish() 432 .context("In finish: Trying to get auth tokens.")?; 433 434 let output = self 435 .update_outcome(&mut *outcome, { 436 let _wp = wd::watch_millis("Operation::finish: calling finish", 500); 437 map_km_error(km_op.finish( 438 input, 439 signature, 440 hat.as_ref(), 441 tst.as_ref(), 442 confirmation_token.as_deref(), 443 )) 444 }) 445 .context("In finish: KeyMint::finish failed.")?; 446 447 self.auth_info.lock().unwrap().after_finish().context("In finish.")?; 448 449 // At this point the operation concluded successfully. 450 *outcome = Outcome::Success; 451 452 if output.is_empty() { 453 Ok(None) 454 } else { 455 Ok(Some(output)) 456 } 457 } 458 459 /// Aborts the operation if it is active. IFF the operation is aborted the outcome is 460 /// set to `outcome`. `outcome` must reflect the reason for the abort. Since the operation 461 /// gets aborted `outcome` must not be `Operation::Success` or `Operation::Unknown`. abort(&self, outcome: Outcome) -> Result<()>462 fn abort(&self, outcome: Outcome) -> Result<()> { 463 let mut locked_outcome = self.check_active().context("In abort")?; 464 *locked_outcome = outcome; 465 let km_op: binder::public_api::Strong<dyn IKeyMintOperation> = 466 self.km_op.get_interface().context("In abort: Failed to get KeyMintOperation.")?; 467 468 { 469 let _wp = wd::watch_millis("Operation::abort: calling abort", 500); 470 map_km_error(km_op.abort()).context("In abort: KeyMint::abort failed.") 471 } 472 } 473 } 474 475 impl Drop for Operation { drop(&mut self)476 fn drop(&mut self) { 477 let guard = self.outcome.lock().expect("In drop."); 478 log_key_operation_event_stats( 479 self.logging_info.sec_level, 480 self.logging_info.purpose, 481 &(self.logging_info.op_params), 482 &guard, 483 self.logging_info.key_upgraded, 484 ); 485 if let Outcome::Unknown = *guard { 486 drop(guard); 487 // If the operation was still active we call abort, setting 488 // the outcome to `Outcome::Dropped` 489 if let Err(e) = self.abort(Outcome::Dropped) { 490 log::error!("While dropping Operation: abort failed:\n {:?}", e); 491 } 492 } 493 } 494 } 495 496 /// The OperationDb holds weak references to all ongoing operations. 497 /// Its main purpose is to facilitate operation pruning. 498 #[derive(Debug, Default)] 499 pub struct OperationDb { 500 // TODO replace Vec with WeakTable when the weak_table crate becomes 501 // available. 502 operations: Mutex<Vec<Weak<Operation>>>, 503 } 504 505 impl OperationDb { 506 /// Creates a new OperationDb. new() -> Self507 pub fn new() -> Self { 508 Self { operations: Mutex::new(Vec::new()) } 509 } 510 511 /// Creates a new operation. 512 /// This function takes a KeyMint operation and an associated 513 /// owner uid and returns a new Operation wrapped in a `std::sync::Arc`. create_operation( &self, km_op: binder::public_api::Strong<dyn IKeyMintOperation>, owner: u32, auth_info: AuthInfo, forced: bool, logging_info: LoggingInfo, ) -> Arc<Operation>514 pub fn create_operation( 515 &self, 516 km_op: binder::public_api::Strong<dyn IKeyMintOperation>, 517 owner: u32, 518 auth_info: AuthInfo, 519 forced: bool, 520 logging_info: LoggingInfo, 521 ) -> Arc<Operation> { 522 // We use unwrap because we don't allow code that can panic while locked. 523 let mut operations = self.operations.lock().expect("In create_operation."); 524 525 let mut index: usize = 0; 526 // First we iterate through the operation slots to try and find an unused 527 // slot. If we don't find one, we append the new entry instead. 528 match (*operations).iter_mut().find(|s| { 529 index += 1; 530 s.upgrade().is_none() 531 }) { 532 Some(free_slot) => { 533 let new_op = Arc::new(Operation::new( 534 index - 1, 535 km_op, 536 owner, 537 auth_info, 538 forced, 539 logging_info, 540 )); 541 *free_slot = Arc::downgrade(&new_op); 542 new_op 543 } 544 None => { 545 let new_op = Arc::new(Operation::new( 546 operations.len(), 547 km_op, 548 owner, 549 auth_info, 550 forced, 551 logging_info, 552 )); 553 operations.push(Arc::downgrade(&new_op)); 554 new_op 555 } 556 } 557 } 558 get(&self, index: usize) -> Option<Arc<Operation>>559 fn get(&self, index: usize) -> Option<Arc<Operation>> { 560 self.operations.lock().expect("In OperationDb::get.").get(index).and_then(|op| op.upgrade()) 561 } 562 563 /// Attempts to prune an operation. 564 /// 565 /// This function is used during operation creation, i.e., by 566 /// `KeystoreSecurityLevel::create_operation`, to try and free up an operation slot 567 /// if it got `ErrorCode::TOO_MANY_OPERATIONS` from the KeyMint backend. It is not 568 /// guaranteed that an operation slot is available after this call successfully 569 /// returned for various reasons. E.g., another thread may have snatched up the newly 570 /// available slot. Callers may have to call prune multiple times before they get a 571 /// free operation slot. Prune may also return `Err(Error::Rc(ResponseCode::BACKEND_BUSY))` 572 /// which indicates that no prunable operation was found. 573 /// 574 /// To find a suitable candidate we compute the malus for the caller and each existing 575 /// operation. The malus is the inverse of the pruning power (caller) or pruning 576 /// resistance (existing operation). 577 /// 578 /// The malus is based on the number of sibling operations and age. Sibling 579 /// operations are operations that have the same owner (UID). 580 /// 581 /// Every operation, existing or new, starts with a malus of 1. Every sibling 582 /// increases the malus by one. The age is the time since an operation was last touched. 583 /// It increases the malus by log6(<age in seconds> + 1) rounded down to the next 584 /// integer. So the malus increases stepwise after 5s, 35s, 215s, ... 585 /// Of two operations with the same malus the least recently used one is considered 586 /// weaker. 587 /// 588 /// For the caller to be able to prune an operation it must find an operation 589 /// with a malus higher than its own. 590 /// 591 /// The malus can be expressed as 592 /// ``` 593 /// malus = 1 + no_of_siblings + floor(log6(age_in_seconds + 1)) 594 /// ``` 595 /// where the constant `1` accounts for the operation under consideration. 596 /// In reality we compute it as 597 /// ``` 598 /// caller_malus = 1 + running_siblings 599 /// ``` 600 /// because the new operation has no age and is not included in the `running_siblings`, 601 /// and 602 /// ``` 603 /// running_malus = running_siblings + floor(log6(age_in_seconds + 1)) 604 /// ``` 605 /// because a running operation is included in the `running_siblings` and it has 606 /// an age. 607 /// 608 /// ## Example 609 /// A caller with no running operations has a malus of 1. Young (age < 5s) operations 610 /// also with no siblings have a malus of one and cannot be pruned by the caller. 611 /// We have to find an operation that has at least one sibling or is older than 5s. 612 /// 613 /// A caller with one running operation has a malus of 2. Now even young siblings 614 /// or single child aging (5s <= age < 35s) operations are off limit. An aging 615 /// sibling of two, however, would have a malus of 3 and would be fair game. 616 /// 617 /// ## Rationale 618 /// Due to the limitation of KeyMint operation slots, we cannot get around pruning or 619 /// a single app could easily DoS KeyMint. 620 /// Keystore 1.0 used to always prune the least recently used operation. This at least 621 /// guaranteed that new operations can always be started. With the increased usage 622 /// of Keystore we saw increased pruning activity which can lead to a livelock 623 /// situation in the worst case. 624 /// 625 /// With the new pruning strategy we want to provide well behaved clients with 626 /// progress assurances while punishing DoS attempts. As a result of this 627 /// strategy we can be in the situation where no operation can be pruned and the 628 /// creation of a new operation fails. This allows single child operations which 629 /// are frequently updated to complete, thereby breaking up livelock situations 630 /// and facilitating system wide progress. 631 /// 632 /// ## Update 633 /// We also allow callers to cannibalize their own sibling operations if no other 634 /// slot can be found. In this case the least recently used sibling is pruned. prune(&self, caller: u32, forced: bool) -> Result<(), Error>635 pub fn prune(&self, caller: u32, forced: bool) -> Result<(), Error> { 636 loop { 637 // Maps the uid of the owner to the number of operations that owner has 638 // (running_siblings). More operations per owner lowers the pruning 639 // resistance of the operations of that owner. Whereas the number of 640 // ongoing operations of the caller lowers the pruning power of the caller. 641 let mut owners: HashMap<u32, u64> = HashMap::new(); 642 let mut pruning_info: Vec<PruningInfo> = Vec::new(); 643 644 let now = Instant::now(); 645 self.operations 646 .lock() 647 .expect("In OperationDb::prune: Trying to lock self.operations.") 648 .iter() 649 .for_each(|op| { 650 if let Some(op) = op.upgrade() { 651 if let Some(p_info) = op.get_pruning_info() { 652 let owner = p_info.owner; 653 pruning_info.push(p_info); 654 // Count operations per owner. 655 *owners.entry(owner).or_insert(0) += 1; 656 } 657 } 658 }); 659 660 // If the operation is forced, the caller has a malus of 0. 661 let caller_malus = if forced { 0 } else { 1u64 + *owners.entry(caller).or_default() }; 662 663 // We iterate through all operations computing the malus and finding 664 // the candidate with the highest malus which must also be higher 665 // than the caller_malus. 666 struct CandidateInfo { 667 index: usize, 668 malus: u64, 669 last_usage: Instant, 670 age: Duration, 671 } 672 let mut oldest_caller_op: Option<CandidateInfo> = None; 673 let candidate = pruning_info.iter().fold( 674 None, 675 |acc: Option<CandidateInfo>, &PruningInfo { last_usage, owner, index, forced }| { 676 // Compute the age of the current operation. 677 let age = now 678 .checked_duration_since(last_usage) 679 .unwrap_or_else(|| Duration::new(0, 0)); 680 681 // Find the least recently used sibling as an alternative pruning candidate. 682 if owner == caller { 683 if let Some(CandidateInfo { age: a, .. }) = oldest_caller_op { 684 if age > a { 685 oldest_caller_op = 686 Some(CandidateInfo { index, malus: 0, last_usage, age }); 687 } 688 } else { 689 oldest_caller_op = 690 Some(CandidateInfo { index, malus: 0, last_usage, age }); 691 } 692 } 693 694 // Compute the malus of the current operation. 695 let malus = if forced { 696 // Forced operations have a malus of 0. And cannot even be pruned 697 // by other forced operations. 698 0 699 } else { 700 // Expect safety: Every owner in pruning_info was counted in 701 // the owners map. So this unwrap cannot panic. 702 *owners.get(&owner).expect( 703 "This is odd. We should have counted every owner in pruning_info.", 704 ) + ((age.as_secs() + 1) as f64).log(6.0).floor() as u64 705 }; 706 707 // Now check if the current operation is a viable/better candidate 708 // the one currently stored in the accumulator. 709 match acc { 710 // First we have to find any operation that is prunable by the caller. 711 None => { 712 if caller_malus < malus { 713 Some(CandidateInfo { index, malus, last_usage, age }) 714 } else { 715 None 716 } 717 } 718 // If we have found one we look for the operation with the worst score. 719 // If there is a tie, the older operation is considered weaker. 720 Some(CandidateInfo { index: i, malus: m, last_usage: l, age: a }) => { 721 if malus > m || (malus == m && age > a) { 722 Some(CandidateInfo { index, malus, last_usage, age }) 723 } else { 724 Some(CandidateInfo { index: i, malus: m, last_usage: l, age: a }) 725 } 726 } 727 } 728 }, 729 ); 730 731 // If we did not find a suitable candidate we may cannibalize our oldest sibling. 732 let candidate = candidate.or(oldest_caller_op); 733 734 match candidate { 735 Some(CandidateInfo { index, malus: _, last_usage, age: _ }) => { 736 match self.get(index) { 737 Some(op) => { 738 match op.prune(last_usage) { 739 // We successfully freed up a slot. 740 Ok(()) => break Ok(()), 741 // This means the operation we tried to prune was on its way 742 // out. It also means that the slot it had occupied was freed up. 743 Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) => break Ok(()), 744 // This means the operation we tried to prune was currently 745 // servicing a request. There are two options. 746 // * Assume that it was touched, which means that its 747 // pruning resistance increased. In that case we have 748 // to start over and find another candidate. 749 // * Assume that the operation is transitioning to end-of-life. 750 // which means that we got a free slot for free. 751 // If we assume the first but the second is true, we prune 752 // a good operation without need (aggressive approach). 753 // If we assume the second but the first is true, our 754 // caller will attempt to create a new KeyMint operation, 755 // fail with `ErrorCode::TOO_MANY_OPERATIONS`, and call 756 // us again (conservative approach). 757 Err(Error::Rc(ResponseCode::OPERATION_BUSY)) => { 758 // We choose the conservative approach, because 759 // every needlessly pruned operation can impact 760 // the user experience. 761 // To switch to the aggressive approach replace 762 // the following line with `continue`. 763 break Ok(()); 764 } 765 766 // The candidate may have been touched so the score 767 // has changed since our evaluation. 768 _ => continue, 769 } 770 } 771 // This index does not exist any more. The operation 772 // in this slot was dropped. Good news, a slot 773 // has freed up. 774 None => break Ok(()), 775 } 776 } 777 // We did not get a pruning candidate. 778 None => break Err(Error::Rc(ResponseCode::BACKEND_BUSY)), 779 } 780 } 781 } 782 } 783 784 /// Implementation of IKeystoreOperation. 785 pub struct KeystoreOperation { 786 operation: Mutex<Option<Arc<Operation>>>, 787 } 788 789 impl KeystoreOperation { 790 /// Creates a new operation instance wrapped in a 791 /// BnKeystoreOperation proxy object. It also enables 792 /// `BinderFeatures::set_requesting_sid` on the new interface, because 793 /// we need it for checking Keystore permissions. new_native_binder( operation: Arc<Operation>, ) -> binder::public_api::Strong<dyn IKeystoreOperation>794 pub fn new_native_binder( 795 operation: Arc<Operation>, 796 ) -> binder::public_api::Strong<dyn IKeystoreOperation> { 797 BnKeystoreOperation::new_binder( 798 Self { operation: Mutex::new(Some(operation)) }, 799 BinderFeatures { set_requesting_sid: true, ..BinderFeatures::default() }, 800 ) 801 } 802 803 /// Grabs the outer operation mutex and calls `f` on the locked operation. 804 /// The function also deletes the operation if it returns with an error or if 805 /// `delete_op` is true. with_locked_operation<T, F>(&self, f: F, delete_op: bool) -> Result<T> where for<'a> F: FnOnce(&'a Operation) -> Result<T>,806 fn with_locked_operation<T, F>(&self, f: F, delete_op: bool) -> Result<T> 807 where 808 for<'a> F: FnOnce(&'a Operation) -> Result<T>, 809 { 810 let mut delete_op: bool = delete_op; 811 match self.operation.try_lock() { 812 Ok(mut mutex_guard) => { 813 let result = match &*mutex_guard { 814 Some(op) => { 815 let result = f(&*op); 816 // Any error here means we can discard the operation. 817 if result.is_err() { 818 delete_op = true; 819 } 820 result 821 } 822 None => Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) 823 .context("In KeystoreOperation::with_locked_operation"), 824 }; 825 826 if delete_op { 827 // We give up our reference to the Operation, thereby freeing up our 828 // internal resources and ending the wrapped KeyMint operation. 829 // This KeystoreOperation object will still be owned by an SpIBinder 830 // until the client drops its remote reference. 831 *mutex_guard = None; 832 } 833 result 834 } 835 Err(_) => Err(Error::Rc(ResponseCode::OPERATION_BUSY)) 836 .context("In KeystoreOperation::with_locked_operation"), 837 } 838 } 839 } 840 841 impl binder::Interface for KeystoreOperation {} 842 843 impl IKeystoreOperation for KeystoreOperation { updateAad(&self, aad_input: &[u8]) -> binder::public_api::Result<()>844 fn updateAad(&self, aad_input: &[u8]) -> binder::public_api::Result<()> { 845 let _wp = wd::watch_millis("IKeystoreOperation::updateAad", 500); 846 map_or_log_err( 847 self.with_locked_operation( 848 |op| op.update_aad(aad_input).context("In KeystoreOperation::updateAad"), 849 false, 850 ), 851 Ok, 852 ) 853 } 854 update(&self, input: &[u8]) -> binder::public_api::Result<Option<Vec<u8>>>855 fn update(&self, input: &[u8]) -> binder::public_api::Result<Option<Vec<u8>>> { 856 let _wp = wd::watch_millis("IKeystoreOperation::update", 500); 857 map_or_log_err( 858 self.with_locked_operation( 859 |op| op.update(input).context("In KeystoreOperation::update"), 860 false, 861 ), 862 Ok, 863 ) 864 } finish( &self, input: Option<&[u8]>, signature: Option<&[u8]>, ) -> binder::public_api::Result<Option<Vec<u8>>>865 fn finish( 866 &self, 867 input: Option<&[u8]>, 868 signature: Option<&[u8]>, 869 ) -> binder::public_api::Result<Option<Vec<u8>>> { 870 let _wp = wd::watch_millis("IKeystoreOperation::finish", 500); 871 map_or_log_err( 872 self.with_locked_operation( 873 |op| op.finish(input, signature).context("In KeystoreOperation::finish"), 874 true, 875 ), 876 Ok, 877 ) 878 } 879 abort(&self) -> binder::public_api::Result<()>880 fn abort(&self) -> binder::public_api::Result<()> { 881 let _wp = wd::watch_millis("IKeystoreOperation::abort", 500); 882 map_err_with( 883 self.with_locked_operation( 884 |op| op.abort(Outcome::Abort).context("In KeystoreOperation::abort"), 885 true, 886 ), 887 |e| { 888 match e.root_cause().downcast_ref::<Error>() { 889 // Calling abort on expired operations is something very common. 890 // There is no reason to clutter the log with it. It is never the cause 891 // for a true problem. 892 Some(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) => {} 893 _ => log::error!("{:?}", e), 894 }; 895 e 896 }, 897 Ok, 898 ) 899 } 900 } 901