#!/bin/bash

# Force a repair special task for any host that hasn't seen activity in
# the past day.
#
# Various scripts/cron jobs look for DUTs that aren't working.  To be
# conservative, those scripts assume that a DUT that hasn't run any jobs
# within a reasonable time interval isn't working, since some of the
# ways a DUT may be unavailable manifest as inactivity.
#
# In some cases, we'd like to be more certain as to a DUT's status.
# This script goes through the entire AFE hosts table, identifies
# unlocked hosts that would otherwise be flagged as "not working due to
# lack of activity", and forces a repair task on each of them.
#
# We use a repair task (as opposed to verify) for various reasons:
#   + If a DUT is working, repair and verify perform the same checks,
#     and generally run in the same time.
#   + If a DUT is broken, a verify task will fail and invoke repair,
#     which will take longer than just repair alone.
#   + Repair tasks that pass update labels; without this, labels could
#     become out-of-date simply because a DUT is idle.
#
# Locked hosts are skipped because they can't run jobs and because we
# want them to show up as suspicious anyway.


# Run from the top of the source tree so the relative paths below resolve.
cd "$(dirname "$0")/.." || exit 1

# Gather all the hosts under supervision of the lab techs.
# Basically, that's any host in any managed pool.

GET_HOSTS='
  /pool:(suites|bvt|cq|continuous|cts|arc-presubmit|crosperf|performance)/ {
    print $1
  }
'
HOSTS=( $(cli/atest host list --unlocked | awk "$GET_HOSTS") )


# Go through the gathered hosts, and use dut_status to find the
# ones with unknown state (anything without a positive "OK" or
# "NO" diagnosis).

NEED_CHECK='
  /OK/ || /NO/ { next }
  /^chromeos/ { print $1 }
'
CHECK=( $(site_utils/dut_status.py -d 19 "${HOSTS[@]}" | awk "$NEED_CHECK") )

contrib/repair_hosts "${CHECK[@]}"