• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1 // Copyright 2020, The Android Open Source Project
2 //
3 // Licensed under the Apache License, Version 2.0 (the "License");
4 // you may not use this file except in compliance with the License.
5 // You may obtain a copy of the License at
6 //
7 //     http://www.apache.org/licenses/LICENSE-2.0
8 //
9 // Unless required by applicable law or agreed to in writing, software
10 // distributed under the License is distributed on an "AS IS" BASIS,
11 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 // See the License for the specific language governing permissions and
13 // limitations under the License.
14 
15 //! This crate implements the `IKeystoreOperation` AIDL interface, which represents
16 //! an ongoing key operation, as well as the operation database, which is mainly
17 //! required for tracking operations for the purpose of pruning.
18 //! This crate also implements an operation pruning strategy.
19 //!
20 //! Operations implement the API calls update, finish, and abort.
21 //! Additionally, an operation can be dropped and pruned. The former
22 //! happens if the client deletes a binder to the operation object.
23 //! An existing operation may get pruned when running out of operation
24 //! slots and a new operation takes precedence.
25 //!
26 //! ## Operation Lifecycle
27 //! An operation gets created when the client calls `IKeystoreSecurityLevel::create`.
28 //! It may receive zero or more update request. The lifecycle ends when:
29 //!  * `update` yields an error.
30 //!  * `finish` is called.
31 //!  * `abort` is called.
32 //!  * The operation gets dropped.
33 //!  * The operation gets pruned.
34 //!
35 //! `Operation` has an `Outcome` member. While the outcome is `Outcome::Unknown`,
36 //! the operation is active and in a good state. Any of the above conditions may
37 //! change the outcome to one of the defined outcomes Success, Abort, Dropped,
38 //! Pruned, or ErrorCode. The latter is chosen in the case of an unexpected error, during
39 //! `update` or `finish`. `Success` is chosen iff `finish` completes without error.
40 //! Note that all operations get dropped eventually in the sense that they lose
41 //! their last reference and get destroyed. At that point, the fate of the operation
42 //! gets logged. However, an operation will transition to `Outcome::Dropped` iff
43 //! the operation was still active (`Outcome::Unknown`) at that time.
44 //!
45 //! ## Operation Dropping
46 //! To observe the dropping of an operation, we have to make sure that there
47 //! are no strong references to the IBinder representing this operation.
48 //! This would be simple enough if the operation object would need to be accessed
49 //! only by transactions. But to perform pruning, we have to retain a reference to the
50 //! original operation object.
51 //!
52 //! ## Operation Pruning
53 //! Pruning an operation happens during the creation of a new operation.
54 //! We have to iterate through the operation database to find a suitable
55 //! candidate. Then we abort and finalize this operation setting its outcome to
56 //! `Outcome::Pruned`. The corresponding KeyMint operation slot will have been freed
57 //! up at this point, but the `Operation` object lingers. When the client
58 //! attempts to use the operation again they will receive
59 //! ErrorCode::INVALID_OPERATION_HANDLE indicating that the operation no longer
60 //! exits. This should be the cue for the client to destroy its binder.
61 //! At that point the operation gets dropped.
62 //!
63 //! ## Architecture
64 //! The `IKeystoreOperation` trait is implemented by `KeystoreOperation`.
65 //! This acts as a proxy object holding a strong reference to actual operation
66 //! implementation `Operation`.
67 //!
68 //! ```
69 //! struct KeystoreOperation {
70 //!     operation: Mutex<Option<Arc<Operation>>>,
71 //! }
72 //! ```
73 //!
74 //! The `Mutex` serves two purposes. It provides interior mutability allowing
75 //! us to set the Option to None. We do this when the life cycle ends during
76 //! a call to `update`, `finish`, or `abort`. As a result most of the Operation
77 //! related resources are freed. The `KeystoreOperation` proxy object still
78 //! lingers until dropped by the client.
79 //! The second purpose is to protect operations against concurrent usage.
80 //! Failing to lock this mutex yields `ResponseCode::OPERATION_BUSY` and indicates
81 //! a programming error in the client.
82 //!
83 //! Note that the Mutex only protects the operation against concurrent client calls.
84 //! We still retain weak references to the operation in the operation database:
85 //!
86 //! ```
87 //! struct OperationDb {
88 //!     operations: Mutex<Vec<Weak<Operation>>>
89 //! }
90 //! ```
91 //!
92 //! This allows us to access the operations for the purpose of pruning.
93 //! We do this in three phases.
94 //!  1. We gather the pruning information. Besides non mutable information,
95 //!     we access `last_usage` which is protected by a mutex.
96 //!     We only lock this mutex for single statements at a time. During
97 //!     this phase we hold the operation db lock.
98 //!  2. We choose a pruning candidate by computing the pruning resistance
99 //!     of each operation. We do this entirely with information we now
100 //!     have on the stack without holding any locks.
101 //!     (See `OperationDb::prune` for more details on the pruning strategy.)
102 //!  3. During pruning we briefly lock the operation database again to get the
103 //!     the pruning candidate by index. We then attempt to abort the candidate.
104 //!     If the candidate was touched in the meantime or is currently fulfilling
105 //!     a request (i.e., the client calls update, finish, or abort),
106 //!     we go back to 1 and try again.
107 //!
108 //! So the outer Mutex in `KeystoreOperation::operation` only protects
109 //! operations against concurrent client calls but not against concurrent
110 //! pruning attempts. This is what the `Operation::outcome` mutex is used for.
111 //!
112 //! ```
113 //! struct Operation {
114 //!     ...
115 //!     outcome: Mutex<Outcome>,
116 //!     ...
117 //! }
118 //! ```
119 //!
120 //! Any request that can change the outcome, i.e., `update`, `finish`, `abort`,
121 //! `drop`, and `prune` has to take the outcome lock and check if the outcome
122 //! is still `Outcome::Unknown` before entering. `prune` is special in that
123 //! it will `try_lock`, because we don't want to be blocked on a potentially
124 //! long running request at another operation. If it fails to get the lock
125 //! the operation is either being touched, which changes its pruning resistance,
126 //! or it transitions to its end-of-life, which means we may get a free slot.
127 //! Either way, we have to revaluate the pruning scores.
128 
129 use crate::enforcements::AuthInfo;
130 use crate::error::{
131     error_to_serialized_error, into_binder, into_logged_binder, map_km_error, Error, ErrorCode,
132     ResponseCode, SerializedError,
133 };
134 use crate::ks_err;
135 use crate::metrics_store::log_key_operation_event_stats;
136 use crate::utils::watchdog as wd;
137 use android_hardware_security_keymint::aidl::android::hardware::security::keymint::{
138     IKeyMintOperation::IKeyMintOperation, KeyParameter::KeyParameter, KeyPurpose::KeyPurpose,
139     SecurityLevel::SecurityLevel,
140 };
141 use android_hardware_security_keymint::binder::{BinderFeatures, Strong};
142 use android_system_keystore2::aidl::android::system::keystore2::{
143     IKeystoreOperation::BnKeystoreOperation, IKeystoreOperation::IKeystoreOperation,
144 };
145 use anyhow::{anyhow, Context, Result};
146 use std::{
147     collections::HashMap,
148     sync::{Arc, Mutex, MutexGuard, Weak},
149     time::Duration,
150     time::Instant,
151 };
152 
153 /// Operations have `Outcome::Unknown` as long as they are active. They transition
154 /// to one of the other variants exactly once. The distinction in outcome is mainly
155 /// for the statistic.
156 #[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd)]
157 pub enum Outcome {
158     /// Operations have `Outcome::Unknown` as long as they are active.
159     Unknown,
160     /// Operation is successful.
161     Success,
162     /// Operation is aborted.
163     Abort,
164     /// Operation is dropped.
165     Dropped,
166     /// Operation is pruned.
167     Pruned,
168     /// Operation is failed with the error code.
169     ErrorCode(SerializedError),
170 }
171 
172 /// Operation bundles all of the operation related resources and tracks the operation's
173 /// outcome.
174 #[derive(Debug)]
175 pub struct Operation {
176     // The index of this operation in the OperationDb.
177     index: usize,
178     km_op: Strong<dyn IKeyMintOperation>,
179     last_usage: Mutex<Instant>,
180     outcome: Mutex<Outcome>,
181     owner: u32, // Uid of the operation's owner.
182     auth_info: Mutex<AuthInfo>,
183     forced: bool,
184     logging_info: LoggingInfo,
185 }
186 
187 /// Keeps track of the information required for logging operations.
188 #[derive(Debug)]
189 pub struct LoggingInfo {
190     sec_level: SecurityLevel,
191     purpose: KeyPurpose,
192     op_params: Vec<KeyParameter>,
193     key_upgraded: bool,
194 }
195 
196 impl LoggingInfo {
197     /// Constructor
new( sec_level: SecurityLevel, purpose: KeyPurpose, op_params: Vec<KeyParameter>, key_upgraded: bool, ) -> LoggingInfo198     pub fn new(
199         sec_level: SecurityLevel,
200         purpose: KeyPurpose,
201         op_params: Vec<KeyParameter>,
202         key_upgraded: bool,
203     ) -> LoggingInfo {
204         Self { sec_level, purpose, op_params, key_upgraded }
205     }
206 }
207 
208 struct PruningInfo {
209     last_usage: Instant,
210     owner: u32,
211     index: usize,
212     forced: bool,
213 }
214 
215 // We don't except more than 32KiB of data in `update`, `updateAad`, and `finish`.
216 const MAX_RECEIVE_DATA: usize = 0x8000;
217 
218 impl Operation {
219     /// Constructor
new( index: usize, km_op: binder::Strong<dyn IKeyMintOperation>, owner: u32, auth_info: AuthInfo, forced: bool, logging_info: LoggingInfo, ) -> Self220     pub fn new(
221         index: usize,
222         km_op: binder::Strong<dyn IKeyMintOperation>,
223         owner: u32,
224         auth_info: AuthInfo,
225         forced: bool,
226         logging_info: LoggingInfo,
227     ) -> Self {
228         Self {
229             index,
230             km_op,
231             last_usage: Mutex::new(Instant::now()),
232             outcome: Mutex::new(Outcome::Unknown),
233             owner,
234             auth_info: Mutex::new(auth_info),
235             forced,
236             logging_info,
237         }
238     }
239 
watch(&self, id: &'static str) -> Option<wd::WatchPoint>240     fn watch(&self, id: &'static str) -> Option<wd::WatchPoint> {
241         let sec_level = self.logging_info.sec_level;
242         wd::watch_millis_with(id, wd::DEFAULT_TIMEOUT_MS, sec_level)
243     }
244 
get_pruning_info(&self) -> Option<PruningInfo>245     fn get_pruning_info(&self) -> Option<PruningInfo> {
246         // An operation may be finalized.
247         if let Ok(guard) = self.outcome.try_lock() {
248             match *guard {
249                 Outcome::Unknown => {}
250                 // If the outcome is any other than unknown, it has been finalized,
251                 // and we can no longer consider it for pruning.
252                 _ => return None,
253             }
254         }
255         // Else: If we could not grab the lock, this means that the operation is currently
256         //       being used and it may be transitioning to finalized or it was simply updated.
257         //       In any case it is fair game to consider it for pruning. If the operation
258         //       transitioned to a final state, we will notice when we attempt to prune, and
259         //       a subsequent attempt to create a new operation will succeed.
260         Some(PruningInfo {
261             // Expect safety:
262             // `last_usage` is locked only for primitive single line statements.
263             // There is no chance to panic and poison the mutex.
264             last_usage: *self.last_usage.lock().expect("In get_pruning_info."),
265             owner: self.owner,
266             index: self.index,
267             forced: self.forced,
268         })
269     }
270 
prune(&self, last_usage: Instant) -> Result<(), Error>271     fn prune(&self, last_usage: Instant) -> Result<(), Error> {
272         let mut locked_outcome = match self.outcome.try_lock() {
273             Ok(guard) => match *guard {
274                 Outcome::Unknown => guard,
275                 _ => return Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)),
276             },
277             Err(_) => return Err(Error::Rc(ResponseCode::OPERATION_BUSY)),
278         };
279 
280         // In `OperationDb::prune`, which is our caller, we first gather the pruning
281         // information including the last usage. When we select a candidate
282         // we call `prune` on that candidate passing the last_usage
283         // that we gathered earlier. If the actual last usage
284         // has changed since than, it means the operation was busy in the
285         // meantime, which means that we have to reevaluate the pruning score.
286         //
287         // Expect safety:
288         // `last_usage` is locked only for primitive single line statements.
289         // There is no chance to panic and poison the mutex.
290         if *self.last_usage.lock().expect("In Operation::prune()") != last_usage {
291             return Err(Error::Rc(ResponseCode::OPERATION_BUSY));
292         }
293         *locked_outcome = Outcome::Pruned;
294 
295         let _wp = self.watch("Operation::prune: calling IKeyMintOperation::abort()");
296 
297         // We abort the operation. If there was an error we log it but ignore it.
298         if let Err(e) = map_km_error(self.km_op.abort()) {
299             log::warn!("In prune: KeyMint::abort failed with {:?}.", e);
300         }
301 
302         Ok(())
303     }
304 
305     // This function takes a Result from a KeyMint call and inspects it for errors.
306     // If an error was found it updates the given `locked_outcome` accordingly.
307     // It forwards the Result unmodified.
308     // The precondition to this call must be *locked_outcome == Outcome::Unknown.
309     // Ideally the `locked_outcome` came from a successful call to `check_active`
310     // see below.
update_outcome<T>( &self, locked_outcome: &mut Outcome, err: Result<T, Error>, ) -> Result<T, Error>311     fn update_outcome<T>(
312         &self,
313         locked_outcome: &mut Outcome,
314         err: Result<T, Error>,
315     ) -> Result<T, Error> {
316         if let Err(e) = &err {
317             *locked_outcome = Outcome::ErrorCode(error_to_serialized_error(e))
318         }
319         err
320     }
321 
322     // This function grabs the outcome lock and checks the current outcome state.
323     // If the outcome is still `Outcome::Unknown`, this function returns
324     // the locked outcome for further updates. In any other case it returns
325     // ErrorCode::INVALID_OPERATION_HANDLE indicating that this operation has
326     // been finalized and is no longer active.
check_active(&self) -> Result<MutexGuard<Outcome>>327     fn check_active(&self) -> Result<MutexGuard<Outcome>> {
328         let guard = self.outcome.lock().expect("In check_active.");
329         match *guard {
330             Outcome::Unknown => Ok(guard),
331             _ => Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE))
332                 .context(ks_err!("Call on finalized operation with outcome: {:?}.", *guard)),
333         }
334     }
335 
336     // This function checks the amount of input data sent to us. We reject any buffer
337     // exceeding MAX_RECEIVE_DATA bytes as input to `update`, `update_aad`, and `finish`
338     // in order to force clients into using reasonable limits.
check_input_length(data: &[u8]) -> Result<()>339     fn check_input_length(data: &[u8]) -> Result<()> {
340         if data.len() > MAX_RECEIVE_DATA {
341             // This error code is unique, no context required here.
342             return Err(anyhow!(Error::Rc(ResponseCode::TOO_MUCH_DATA)));
343         }
344         Ok(())
345     }
346 
347     // Update the last usage to now.
touch(&self)348     fn touch(&self) {
349         // Expect safety:
350         // `last_usage` is locked only for primitive single line statements.
351         // There is no chance to panic and poison the mutex.
352         *self.last_usage.lock().expect("In touch.") = Instant::now();
353     }
354 
355     /// Implementation of `IKeystoreOperation::updateAad`.
356     /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details.
update_aad(&self, aad_input: &[u8]) -> Result<()>357     fn update_aad(&self, aad_input: &[u8]) -> Result<()> {
358         let mut outcome = self.check_active().context("In update_aad")?;
359         Self::check_input_length(aad_input).context("In update_aad")?;
360         self.touch();
361 
362         let (hat, tst) = self
363             .auth_info
364             .lock()
365             .unwrap()
366             .before_update()
367             .context(ks_err!("Trying to get auth tokens for uid {}", self.owner))?;
368 
369         self.update_outcome(&mut outcome, {
370             let _wp = self.watch("Operation::update_aad: calling IKeyMintOperation::updateAad");
371             map_km_error(self.km_op.updateAad(aad_input, hat.as_ref(), tst.as_ref()))
372         })
373         .context(ks_err!("Update failed for uid {}", self.owner))?;
374 
375         Ok(())
376     }
377 
378     /// Implementation of `IKeystoreOperation::update`.
379     /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details.
update(&self, input: &[u8]) -> Result<Option<Vec<u8>>>380     fn update(&self, input: &[u8]) -> Result<Option<Vec<u8>>> {
381         let mut outcome = self.check_active().context("In update")?;
382         Self::check_input_length(input).context("In update")?;
383         self.touch();
384 
385         let (hat, tst) = self
386             .auth_info
387             .lock()
388             .unwrap()
389             .before_update()
390             .context(ks_err!("Trying to get auth tokens for uid {}", self.owner))?;
391 
392         let output = self
393             .update_outcome(&mut outcome, {
394                 let _wp = self.watch("Operation::update: calling IKeyMintOperation::update");
395                 map_km_error(self.km_op.update(input, hat.as_ref(), tst.as_ref()))
396             })
397             .context(ks_err!("Update failed for uid {}", self.owner))?;
398 
399         if output.is_empty() {
400             Ok(None)
401         } else {
402             Ok(Some(output))
403         }
404     }
405 
406     /// Implementation of `IKeystoreOperation::finish`.
407     /// Refer to the AIDL spec at system/hardware/interfaces/keystore2 for details.
finish(&self, input: Option<&[u8]>, signature: Option<&[u8]>) -> Result<Option<Vec<u8>>>408     fn finish(&self, input: Option<&[u8]>, signature: Option<&[u8]>) -> Result<Option<Vec<u8>>> {
409         let mut outcome = self.check_active().context("In finish")?;
410         if let Some(input) = input {
411             Self::check_input_length(input).context("In finish")?;
412         }
413         self.touch();
414 
415         let (hat, tst, confirmation_token) = self
416             .auth_info
417             .lock()
418             .unwrap()
419             .before_finish()
420             .context(ks_err!("Trying to get auth tokens for uid {}", self.owner))?;
421 
422         let output = self
423             .update_outcome(&mut outcome, {
424                 let _wp = self.watch("Operation::finish: calling IKeyMintOperation::finish");
425                 map_km_error(self.km_op.finish(
426                     input,
427                     signature,
428                     hat.as_ref(),
429                     tst.as_ref(),
430                     confirmation_token.as_deref(),
431                 ))
432             })
433             .context(ks_err!("Finish failed for uid {}", self.owner))?;
434 
435         self.auth_info.lock().unwrap().after_finish().context("In finish.")?;
436 
437         // At this point the operation concluded successfully.
438         *outcome = Outcome::Success;
439 
440         if output.is_empty() {
441             Ok(None)
442         } else {
443             Ok(Some(output))
444         }
445     }
446 
447     /// Aborts the operation if it is active. IFF the operation is aborted the outcome is
448     /// set to `outcome`. `outcome` must reflect the reason for the abort. Since the operation
449     /// gets aborted `outcome` must not be `Operation::Success` or `Operation::Unknown`.
abort(&self, outcome: Outcome) -> Result<()>450     fn abort(&self, outcome: Outcome) -> Result<()> {
451         let mut locked_outcome = self.check_active().context("In abort")?;
452         *locked_outcome = outcome;
453 
454         {
455             let _wp = self.watch("Operation::abort: calling IKeyMintOperation::abort");
456             map_km_error(self.km_op.abort()).context(ks_err!("KeyMint::abort failed."))
457         }
458     }
459 }
460 
461 impl Drop for Operation {
drop(&mut self)462     fn drop(&mut self) {
463         let guard = self.outcome.lock().expect("In drop.");
464         log_key_operation_event_stats(
465             self.logging_info.sec_level,
466             self.logging_info.purpose,
467             &(self.logging_info.op_params),
468             &guard,
469             self.logging_info.key_upgraded,
470         );
471         if let Outcome::Unknown = *guard {
472             drop(guard);
473             // If the operation was still active we call abort, setting
474             // the outcome to `Outcome::Dropped`
475             if let Err(e) = self.abort(Outcome::Dropped) {
476                 log::error!("While dropping Operation: abort failed:\n    {:?}", e);
477             }
478         }
479     }
480 }
481 
482 /// The OperationDb holds weak references to all ongoing operations.
483 /// Its main purpose is to facilitate operation pruning.
484 #[derive(Debug, Default)]
485 pub struct OperationDb {
486     // TODO replace Vec with WeakTable when the weak_table crate becomes
487     // available.
488     operations: Mutex<Vec<Weak<Operation>>>,
489 }
490 
491 impl OperationDb {
492     /// Creates a new OperationDb.
new() -> Self493     pub fn new() -> Self {
494         Self { operations: Mutex::new(Vec::new()) }
495     }
496 
497     /// Creates a new operation.
498     /// This function takes a KeyMint operation and an associated
499     /// owner uid and returns a new Operation wrapped in a `std::sync::Arc`.
create_operation( &self, km_op: binder::Strong<dyn IKeyMintOperation>, owner: u32, auth_info: AuthInfo, forced: bool, logging_info: LoggingInfo, ) -> Arc<Operation>500     pub fn create_operation(
501         &self,
502         km_op: binder::Strong<dyn IKeyMintOperation>,
503         owner: u32,
504         auth_info: AuthInfo,
505         forced: bool,
506         logging_info: LoggingInfo,
507     ) -> Arc<Operation> {
508         // We use unwrap because we don't allow code that can panic while locked.
509         let mut operations = self.operations.lock().expect("In create_operation.");
510 
511         let mut index: usize = 0;
512         // First we iterate through the operation slots to try and find an unused
513         // slot. If we don't find one, we append the new entry instead.
514         match (*operations).iter_mut().find(|s| {
515             index += 1;
516             s.upgrade().is_none()
517         }) {
518             Some(free_slot) => {
519                 let new_op = Arc::new(Operation::new(
520                     index - 1,
521                     km_op,
522                     owner,
523                     auth_info,
524                     forced,
525                     logging_info,
526                 ));
527                 *free_slot = Arc::downgrade(&new_op);
528                 new_op
529             }
530             None => {
531                 let new_op = Arc::new(Operation::new(
532                     operations.len(),
533                     km_op,
534                     owner,
535                     auth_info,
536                     forced,
537                     logging_info,
538                 ));
539                 operations.push(Arc::downgrade(&new_op));
540                 new_op
541             }
542         }
543     }
544 
get(&self, index: usize) -> Option<Arc<Operation>>545     fn get(&self, index: usize) -> Option<Arc<Operation>> {
546         self.operations.lock().expect("In OperationDb::get.").get(index).and_then(|op| op.upgrade())
547     }
548 
549     /// Attempts to prune an operation.
550     ///
551     /// This function is used during operation creation, i.e., by
552     /// `KeystoreSecurityLevel::create_operation`, to try and free up an operation slot
553     /// if it got `ErrorCode::TOO_MANY_OPERATIONS` from the KeyMint backend. It is not
554     /// guaranteed that an operation slot is available after this call successfully
555     /// returned for various reasons. E.g., another thread may have snatched up the newly
556     /// available slot. Callers may have to call prune multiple times before they get a
557     /// free operation slot. Prune may also return `Err(Error::Rc(ResponseCode::BACKEND_BUSY))`
558     /// which indicates that no prunable operation was found.
559     ///
560     /// To find a suitable candidate we compute the malus for the caller and each existing
561     /// operation. The malus is the inverse of the pruning power (caller) or pruning
562     /// resistance (existing operation).
563     ///
564     /// The malus is based on the number of sibling operations and age. Sibling
565     /// operations are operations that have the same owner (UID).
566     ///
567     /// Every operation, existing or new, starts with a malus of 1. Every sibling
568     /// increases the malus by one. The age is the time since an operation was last touched.
569     /// It increases the malus by log6(<age in seconds> + 1) rounded down to the next
570     /// integer. So the malus increases stepwise after 5s, 35s, 215s, ...
571     /// Of two operations with the same malus the least recently used one is considered
572     /// weaker.
573     ///
574     /// For the caller to be able to prune an operation it must find an operation
575     /// with a malus higher than its own.
576     ///
577     /// The malus can be expressed as
578     /// ```
579     /// malus = 1 + no_of_siblings + floor(log6(age_in_seconds + 1))
580     /// ```
581     /// where the constant `1` accounts for the operation under consideration.
582     /// In reality we compute it as
583     /// ```
584     /// caller_malus = 1 + running_siblings
585     /// ```
586     /// because the new operation has no age and is not included in the `running_siblings`,
587     /// and
588     /// ```
589     /// running_malus = running_siblings + floor(log6(age_in_seconds + 1))
590     /// ```
591     /// because a running operation is included in the `running_siblings` and it has
592     /// an age.
593     ///
594     /// ## Example
595     /// A caller with no running operations has a malus of 1. Young (age < 5s) operations
596     /// also with no siblings have a malus of one and cannot be pruned by the caller.
597     /// We have to find an operation that has at least one sibling or is older than 5s.
598     ///
599     /// A caller with one running operation has a malus of 2. Now even young siblings
600     /// or single child aging (5s <= age < 35s) operations are off limit. An aging
601     /// sibling of two, however, would have a malus of 3 and would be fair game.
602     ///
603     /// ## Rationale
604     /// Due to the limitation of KeyMint operation slots, we cannot get around pruning or
605     /// a single app could easily DoS KeyMint.
606     /// Keystore 1.0 used to always prune the least recently used operation. This at least
607     /// guaranteed that new operations can always be started. With the increased usage
608     /// of Keystore we saw increased pruning activity which can lead to a livelock
609     /// situation in the worst case.
610     ///
611     /// With the new pruning strategy we want to provide well behaved clients with
612     /// progress assurances while punishing DoS attempts. As a result of this
613     /// strategy we can be in the situation where no operation can be pruned and the
614     /// creation of a new operation fails. This allows single child operations which
615     /// are frequently updated to complete, thereby breaking up livelock situations
616     /// and facilitating system wide progress.
617     ///
618     /// ## Update
619     /// We also allow callers to cannibalize their own sibling operations if no other
620     /// slot can be found. In this case the least recently used sibling is pruned.
prune(&self, caller: u32, forced: bool) -> Result<(), Error>621     pub fn prune(&self, caller: u32, forced: bool) -> Result<(), Error> {
622         loop {
623             // Maps the uid of the owner to the number of operations that owner has
624             // (running_siblings). More operations per owner lowers the pruning
625             // resistance of the operations of that owner. Whereas the number of
626             // ongoing operations of the caller lowers the pruning power of the caller.
627             let mut owners: HashMap<u32, u64> = HashMap::new();
628             let mut pruning_info: Vec<PruningInfo> = Vec::new();
629 
630             let now = Instant::now();
631             self.operations
632                 .lock()
633                 .expect("In OperationDb::prune: Trying to lock self.operations.")
634                 .iter()
635                 .for_each(|op| {
636                     if let Some(op) = op.upgrade() {
637                         if let Some(p_info) = op.get_pruning_info() {
638                             let owner = p_info.owner;
639                             pruning_info.push(p_info);
640                             // Count operations per owner.
641                             *owners.entry(owner).or_insert(0) += 1;
642                         }
643                     }
644                 });
645 
646             // If the operation is forced, the caller has a malus of 0.
647             let caller_malus = if forced { 0 } else { 1u64 + *owners.entry(caller).or_default() };
648 
649             // We iterate through all operations computing the malus and finding
650             // the candidate with the highest malus which must also be higher
651             // than the caller_malus.
652             struct CandidateInfo {
653                 index: usize,
654                 malus: u64,
655                 last_usage: Instant,
656                 age: Duration,
657             }
658             let mut oldest_caller_op: Option<CandidateInfo> = None;
659             let candidate = pruning_info.iter().fold(
660                 None,
661                 |acc: Option<CandidateInfo>, &PruningInfo { last_usage, owner, index, forced }| {
662                     // Compute the age of the current operation.
663                     let age = now
664                         .checked_duration_since(last_usage)
665                         .unwrap_or_else(|| Duration::new(0, 0));
666 
667                     // Find the least recently used sibling as an alternative pruning candidate.
668                     if owner == caller {
669                         if let Some(CandidateInfo { age: a, .. }) = oldest_caller_op {
670                             if age > a {
671                                 oldest_caller_op =
672                                     Some(CandidateInfo { index, malus: 0, last_usage, age });
673                             }
674                         } else {
675                             oldest_caller_op =
676                                 Some(CandidateInfo { index, malus: 0, last_usage, age });
677                         }
678                     }
679 
680                     // Compute the malus of the current operation.
681                     let malus = if forced {
682                         // Forced operations have a malus of 0. And cannot even be pruned
683                         // by other forced operations.
684                         0
685                     } else {
686                         // Expect safety: Every owner in pruning_info was counted in
687                         // the owners map. So this unwrap cannot panic.
688                         *owners.get(&owner).expect(
689                             "This is odd. We should have counted every owner in pruning_info.",
690                         ) + ((age.as_secs() + 1) as f64).log(6.0).floor() as u64
691                     };
692 
693                     // Now check if the current operation is a viable/better candidate
694                     // the one currently stored in the accumulator.
695                     match acc {
696                         // First we have to find any operation that is prunable by the caller.
697                         None => {
698                             if caller_malus < malus {
699                                 Some(CandidateInfo { index, malus, last_usage, age })
700                             } else {
701                                 None
702                             }
703                         }
704                         // If we have found one we look for the operation with the worst score.
705                         // If there is a tie, the older operation is considered weaker.
706                         Some(CandidateInfo { index: i, malus: m, last_usage: l, age: a }) => {
707                             if malus > m || (malus == m && age > a) {
708                                 Some(CandidateInfo { index, malus, last_usage, age })
709                             } else {
710                                 Some(CandidateInfo { index: i, malus: m, last_usage: l, age: a })
711                             }
712                         }
713                     }
714                 },
715             );
716 
717             // If we did not find a suitable candidate we may cannibalize our oldest sibling.
718             let candidate = candidate.or(oldest_caller_op);
719 
720             match candidate {
721                 Some(CandidateInfo { index, malus: _, last_usage, age: _ }) => {
722                     match self.get(index) {
723                         Some(op) => {
724                             match op.prune(last_usage) {
725                                 // We successfully freed up a slot.
726                                 Ok(()) => break Ok(()),
727                                 // This means the operation we tried to prune was on its way
728                                 // out. It also means that the slot it had occupied was freed up.
729                                 Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) => break Ok(()),
730                                 // This means the operation we tried to prune was currently
731                                 // servicing a request. There are two options.
732                                 // * Assume that it was touched, which means that its
733                                 //   pruning resistance increased. In that case we have
734                                 //   to start over and find another candidate.
735                                 // * Assume that the operation is transitioning to end-of-life.
736                                 //   which means that we got a free slot for free.
737                                 // If we assume the first but the second is true, we prune
738                                 // a good operation without need (aggressive approach).
739                                 // If we assume the second but the first is true, our
740                                 // caller will attempt to create a new KeyMint operation,
741                                 // fail with `ErrorCode::TOO_MANY_OPERATIONS`, and call
742                                 // us again (conservative approach).
743                                 Err(Error::Rc(ResponseCode::OPERATION_BUSY)) => {
744                                     // We choose the conservative approach, because
745                                     // every needlessly pruned operation can impact
746                                     // the user experience.
747                                     // To switch to the aggressive approach replace
748                                     // the following line with `continue`.
749                                     break Ok(());
750                                 }
751 
752                                 // The candidate may have been touched so the score
753                                 // has changed since our evaluation.
754                                 _ => continue,
755                             }
756                         }
757                         // This index does not exist any more. The operation
758                         // in this slot was dropped. Good news, a slot
759                         // has freed up.
760                         None => break Ok(()),
761                     }
762                 }
763                 // We did not get a pruning candidate.
764                 None => break Err(Error::Rc(ResponseCode::BACKEND_BUSY)),
765             }
766         }
767     }
768 }
769 
770 /// Implementation of IKeystoreOperation.
771 pub struct KeystoreOperation {
772     operation: Mutex<Option<Arc<Operation>>>,
773 }
774 
775 impl KeystoreOperation {
776     /// Creates a new operation instance wrapped in a
777     /// BnKeystoreOperation proxy object. It also enables
778     /// `BinderFeatures::set_requesting_sid` on the new interface, because
779     /// we need it for checking Keystore permissions.
new_native_binder(operation: Arc<Operation>) -> binder::Strong<dyn IKeystoreOperation>780     pub fn new_native_binder(operation: Arc<Operation>) -> binder::Strong<dyn IKeystoreOperation> {
781         BnKeystoreOperation::new_binder(
782             Self { operation: Mutex::new(Some(operation)) },
783             BinderFeatures { set_requesting_sid: true, ..BinderFeatures::default() },
784         )
785     }
786 
787     /// Grabs the outer operation mutex and calls `f` on the locked operation.
788     /// The function also deletes the operation if it returns with an error or if
789     /// `delete_op` is true.
with_locked_operation<T, F>(&self, f: F, delete_op: bool) -> Result<T> where for<'a> F: FnOnce(&'a Operation) -> Result<T>,790     fn with_locked_operation<T, F>(&self, f: F, delete_op: bool) -> Result<T>
791     where
792         for<'a> F: FnOnce(&'a Operation) -> Result<T>,
793     {
794         let mut delete_op: bool = delete_op;
795         match self.operation.try_lock() {
796             Ok(mut mutex_guard) => {
797                 let result = match &*mutex_guard {
798                     Some(op) => {
799                         let result = f(op);
800                         // Any error here means we can discard the operation.
801                         if result.is_err() {
802                             delete_op = true;
803                         }
804                         result
805                     }
806                     None => Err(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE))
807                         .context(ks_err!("KeystoreOperation::with_locked_operation")),
808                 };
809 
810                 if delete_op {
811                     // We give up our reference to the Operation, thereby freeing up our
812                     // internal resources and ending the wrapped KeyMint operation.
813                     // This KeystoreOperation object will still be owned by an SpIBinder
814                     // until the client drops its remote reference.
815                     *mutex_guard = None;
816                 }
817                 result
818             }
819             Err(_) => Err(Error::Rc(ResponseCode::OPERATION_BUSY))
820                 .context(ks_err!("KeystoreOperation::with_locked_operation")),
821         }
822     }
823 }
824 
825 impl binder::Interface for KeystoreOperation {}
826 
827 impl IKeystoreOperation for KeystoreOperation {
updateAad(&self, aad_input: &[u8]) -> binder::Result<()>828     fn updateAad(&self, aad_input: &[u8]) -> binder::Result<()> {
829         let _wp = wd::watch("IKeystoreOperation::updateAad");
830         self.with_locked_operation(
831             |op| op.update_aad(aad_input).context(ks_err!("KeystoreOperation::updateAad")),
832             false,
833         )
834         .map_err(into_logged_binder)
835     }
836 
update(&self, input: &[u8]) -> binder::Result<Option<Vec<u8>>>837     fn update(&self, input: &[u8]) -> binder::Result<Option<Vec<u8>>> {
838         let _wp = wd::watch("IKeystoreOperation::update");
839         self.with_locked_operation(
840             |op| op.update(input).context(ks_err!("KeystoreOperation::update")),
841             false,
842         )
843         .map_err(into_logged_binder)
844     }
finish( &self, input: Option<&[u8]>, signature: Option<&[u8]>, ) -> binder::Result<Option<Vec<u8>>>845     fn finish(
846         &self,
847         input: Option<&[u8]>,
848         signature: Option<&[u8]>,
849     ) -> binder::Result<Option<Vec<u8>>> {
850         let _wp = wd::watch("IKeystoreOperation::finish");
851         self.with_locked_operation(
852             |op| op.finish(input, signature).context(ks_err!("KeystoreOperation::finish")),
853             true,
854         )
855         .map_err(into_logged_binder)
856     }
857 
abort(&self) -> binder::Result<()>858     fn abort(&self) -> binder::Result<()> {
859         let _wp = wd::watch("IKeystoreOperation::abort");
860         let result = self.with_locked_operation(
861             |op| op.abort(Outcome::Abort).context(ks_err!("KeystoreOperation::abort")),
862             true,
863         );
864         result.map_err(|e| {
865             match e.root_cause().downcast_ref::<Error>() {
866                 // Calling abort on expired operations is something very common.
867                 // There is no reason to clutter the log with it. It is never the cause
868                 // for a true problem.
869                 Some(Error::Km(ErrorCode::INVALID_OPERATION_HANDLE)) => {}
870                 _ => log::error!("{:?}", e),
871             };
872             into_binder(e)
873         })
874     }
875 }
876