• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1#!/bin/bash
2
3# Force a repair special task for any host that hasn't seen activity in
4# the past day.
5#
6# Various scripts/cron jobs look for DUTs that aren't working.  To be
7# conservative, those scripts assume that a DUT that hasn't run any jobs
8# within a reasonable time interval isn't working, since some of the
9# ways a DUT may be unavailable manifest as inactivity.
10#
11# In some cases, we'd like to be more certain as to a DUT's status.
12# This script goes through the entire AFE hosts table, and identifies
13# unlocked hosts that would otherwise be flagged as "not working due to
14# lack of activity", and forces a repair task.
15#
16# We use a repair task (as opposed to verify) for various reasons:
17#   + If a DUT is working, repair and verify perform the same checks,
18#     and generally run in the same time.
19#   + If a DUT is broken, a verify task will fail and invoke repair,
20#     which will take longer than just repair alone.
21#   + Repair tasks that pass update labels; without this, labels could
22#     become out-of-date simply because a DUT is idle.
23#
24# Locked hosts are skipped because they can't run jobs and because we
25# want them to show up as suspicious anyway.
26
27
28cd $(dirname $0)/..
29
30# Gather all the hosts under supervision of the lab techs.
31# Basically, that's any host in any managed pool.
32
33GET_HOSTS='
34  /pool:(suites|bvt|cq|continuous|cts|arc-presubmit|crosperf|performance)/ {
35    print $1
36  }
37'
38HOSTS=( $(cli/atest host list --unlocked | awk "$GET_HOSTS") )
39
40
41# Go through the gathered hosts, and use dut_status to find the
42# ones with unknown state (anything without a positive "OK" or
43# "NO" diagnosis).
44
45NEED_CHECK='
46  /OK/ || /NO/ { next }
47  /^chromeos/ { print $1 }
48'
49CHECK=( $(site_utils/dut_status.py -d 19 "${HOSTS[@]}" | awk "$NEED_CHECK") )
50
51contrib/repair_hosts "${CHECK[@]}"
52