SkiaLab
=======

Overview
--------

Skia's buildbots are hosted in three places:

* Google Compute Engine. This is the preferred location for bots which don't
  need to run on physical hardware, i.e., anything that doesn't require a GPU,
  stable performance numbers, or a specific hardware configuration. Most of our
  compile bots live here, along with some non-GPU test bots on Linux and
  Windows.
* Chrome Golo. This is the preferred location for bots which require specific
  hardware or OS configurations that are not supported by GCE. We have several
  Mac, Linux, and Windows bots in the Golo.
* The local SkiaLab in Chapel Hill. Anything we can't get in GCE or the Golo
  lives here. This includes newer or uncommon GPUs and all Android, ChromeOS,
  and iOS devices.

This page covers the local SkiaLab in Chapel Hill.


Layout
------

The SkiaLab consists of three wireframe racks which hold machines connected to
two KVM switches. Each KVM switch has a monitor, mouse, and keyboard and is the
primary means of accessing the lab machines. In general, each machine is on the
same rack as the KVM switch used to access it. The switch nearest the door
(labeled "DOOR") is connected to machines on its own rack as well as a smaller
rack closer to the door.

Each machine is labeled with its hostname and the number or letter used to
access it on the KVM switch. Android devices are located on the rack nearest
the interior of the office (its KVM switch is labeled "OFFICE"). Each device is
labeled with its serial number and the name of the buildslave it is associated
with, and connects to a host machine either directly or by way of a powered
USB hub.

**Disclaimer: Please ONLY make changes on a lab machine as a last resort, as
doing so disrupts the running bots and can leave the machine in a dirty state.
If you must make changes, such as cloning a copy of Skia to run tests and debug
failures, be sure to clean up after yourself. If a permanent change needs to be
made on the machine (such as a driver update), please contact an infra team
member.**


Common Tasks
------------

### Locating the host machine for a failing bot

Sometimes failures can only be reproduced on a particular hardware
configuration. In these cases, you may need to log in to the host machine where
the failing bot is running in order to debug the failure.

From the [Status](https://status.skia.org/) page:

1. Click on the box associated with a failed build.
2. A popup will appear with some information about the build, including the
   builder and buildslave. Click the "Lookup" link next to "Host machine". This
   will bring you to the [SkiaLab Hosts](https://status.skia.org/hosts) page,
   which contains information about the machines in the lab, pre-filtered to
   select the machine which runs the buildslave in question.
3. The information box will display the hostname of the machine as well as the
   KVM switch and number used to access the machine, if the machine is in the
   SkiaLab.
4. Walk over to the lab. While standing at the KVM switch indicated by the host
   information page, double-tap \<ctrl\> and then press the number or letter
   from the information page. It may be necessary to move or click the mouse to
   wake the machine up.
5. Log in to the machine if necessary. The password is stored in
   [Valentine](https://valentine/) as "Chapel Hill buildbot slave password".

### Rebooting a problematic Android device

Follow the same process as above, with some slight changes:

1. On the [Status](https://status.skia.org/) page, click the box for the failed
   build.
2. Click the "Lookup" link for the host machine. Remember the name of the
   buildslave which ran the build.
3.
   The hosts page will display the information used to access the host
   machine for the device, as well as the device's serial number next to the
   name of its buildslave.
4. Walk over to the lab and find the Android device with the serial number from
   the hosts page. Hold the power and volume-up buttons until the device
   reboots.
5. Access the host machine for the device, per the above instructions. Use the
   `which_devices.py` script to verify that the device has re-attached. From
   the home directory:

       $ python buildbot/scripts/which_devices.py


Maintenance Tasks
-----------------

### Bringing up a new buildbot host machine

These steps assume that you are only adding a host machine for a new buildbot
slave. They don't cover changing the buildbot code to alter the behavior of the
builder itself.

1. Obtain the machine and place it on the racks in the lab. Connect power,
   Ethernet, and KVM cables.
2. If we already have a disk image appropriate for this machine, follow the
   instructions for flashing a disk image to a machine below. Otherwise, follow
   the instructions for bringing up a new machine from scratch.
3. Power on the machine. Be sure to kill any buildbot processes that start up:
   run `killall python` on Linux and Mac, and close any `cmd` windows that pop
   up on Windows.
4. Set the hostname for the machine.
5. Ensure that the machine is labeled with its hostname and KVM number.
6. Add the new slave to the `slaves.cfg` file on the appropriate master, e.g.
   https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.client.skia/slaves.cfg,
   and upload the change for code review.
7. Add an entry for the new host machine to the `slave_hosts_cfg.py` file in
   the Skia infra repo:
   https://skia.googlesource.com/buildbot/+/master/site_config/slave_hosts_cfg.py,
   and upload it for review.
8.
   Commit the change to add the slave to the master. Once it lands, commit the
   `slave_hosts_cfg.py` change immediately afterward.
9. Restart the build master. Either ask borenet@ to do this or file a
   [ticket](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure&labels=Infra-Labs,Restrict-View-Google,Infra-Troopers&summary=Restart%20request%20for%20[%20name%20]&comment=Please%20provide%20the%20reason%20for%20restart.%0A%0ASet%20to%20Pri-0%20if%20immediate%20restarted%20is%20required,%20otherwise%20please%20set%20to%20Pri-1%20and%20the%20restart%20will%20happen%20when%20the%20trooper%20gets%20a%20free%20moment.) for a trooper to do it.
10. Reboot the machine and monitor the build master to ensure that the new
    slave connects. This can take some time, since the bot needs to sync
    Chrome.


### Bringing up a new Android bot

1. Locate or add a host machine. We generally keep fewer than about five
   devices attached to each host. If a new host machine is required, follow
   the above instructions for bringing up a new buildbot host machine. The
   exception is that the slave corresponds to the Android device, not the host
   machine itself.
2. Ensure that the buildslave is not yet running:

       $ killall python

3. Disable MTP and PTP on the device. Some devices require one or the other to
   be enabled. In that case, select PTP and choose "do nothing" when the device
   is attached to the host machine.
4. Connect the device to the host machine, either through a powered USB hub or
   directly.
5. Make sure that the device is in developer mode and that USB debugging is
   enabled.
6. Authorize the device for USB debugging on the host machine by checking the
   "always allow" box in the dialog that appears on the Android device after
   you plug it into the host.
7.
   Ensure that the device appears as "connected" when you run the
   `which_devices.py` script:

       $ python buildbot/scripts/which_devices.py

8. Reboot the machine to start the buildslave.


### Bringing up a new machine from scratch

TODO(borenet): Migrate from Google Docs.

OS-specific instructions are available in a
[Google Doc](https://docs.google.com/document/d/1X7Hvsj33AlBmj-KEWfFbmdCArUJJAICLkB7ipDcxRV8/edit).


### Flashing a disk image to a machine

1. Find the USB key labeled "Clonezilla" in the SkiaLab and insert it into the
   machine.
2. Turn on the machine and load the boot menu. For Shuttle machines, press
   \<del\> or \<esc\>. For Mac machines, plug in the Mac keyboard and hold the
   \<option\> key at boot. Boot from the USB key. Its boot entry is typically
   UEFI and named something like "FlashBlu" or "Kanguru".
3. At the Clonezilla menu, choose the "to RAM" option.
4. Choose your preferred language.
5. Select "Don't touch keymap".
6. Select "Start Clonezilla".
7. Select "device-image".
8. Select "local_dev".
9. Unplug the flash drive and plug in the external hard drive labeled "Disk
   images". Wait for the "Attached Enclosure device" message to appear, then
   hit \<enter\>.
10. Select the external drive to use for `/home/partimag`, something like
    "1000GB_ntfs_My_Passport".
11. Select the `bot_img` directory.
12. Hit \<enter\> to continue.
13. Select "Beginner".
14. Select "restoredisk".
15. Select the image to use. Make sure that it's compatible with this machine.
16. Choose the hard drive in the machine. It should be the only option.
17. Answer "y" to both confirmation prompts.
18. Choose "reboot" after flashing the image to the machine.
19. Set the hostname of the machine so that it doesn't conflict with any
    existing machines.

### Capturing a disk image

1. Make sure that the machine is in a clean state: no pre-existing buildslave
   checkouts, extra software, etc.
2.
   Find the USB key labeled "Clonezilla" in the SkiaLab and insert it into the
   machine.
3. Turn on the machine and load the boot menu. For Shuttle machines, press
   \<del\> or \<esc\>. For Mac machines, plug in the Mac keyboard and hold the
   \<option\> key at boot. Boot from the USB key. Its boot entry is typically
   UEFI and named something like "FlashBlu" or "Kanguru".
4. At the Clonezilla menu, choose the "to RAM" option.
5. Choose your preferred language.
6. Select "Don't touch keymap".
7. Select "Start Clonezilla".
8. Select "device-image".
9. Select "local_dev".
10. Unplug the flash drive and plug in the external hard drive labeled "Disk
    images". Wait for the "Attached Enclosure device" message to appear, then
    hit \<enter\>.
11. Select the external drive to use for `/home/partimag`, something like
    "1000GB_ntfs_My_Passport".
12. Select the `bot_img` directory.
13. Select "Beginner".
14. Select "savedisk".
15. Choose a name for the disk image. The convention is
    `skiabot-<hardware type>-<OS>-<disk image revision #>`.
16. Choose the hard drive in the machine. It should be the only option.
17. Answer "y" to the confirmation prompt.
18. Choose "reboot" or "shut down" when finished.
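When disk images are named by hand, it is easy to reuse a revision number by mistake. The following is a minimal Python sketch, not part of the Skia infra repo, that builds a name of the form `skiabot-<hardware type>-<OS>-<disk image revision #>`. It also picks the next revision number from the image names already in the `bot_img` directory. The helper names `disk_image_name` and `next_revision` are hypothetical. The sketch assumes that revisions are plain integers starting at 1 and that the hardware type and OS strings contain no whitespace. The example names such as "ShuttleA" and "Win7" are illustrative only.

```python
# Hypothetical helper for the skiabot-<hardware type>-<OS>-<revision #>
# naming convention; not an existing Skia infra tool.
import re
from typing import Iterable

PREFIX = "skiabot"


def disk_image_name(hardware: str, os_name: str, revision: int) -> str:
    """Build a name of the form skiabot-<hardware type>-<OS>-<revision #>."""
    for part in (hardware, os_name):
        # Assumption: components are non-empty and contain no whitespace.
        if not part or re.search(r"\s", part):
            raise ValueError(f"invalid name component: {part!r}")
    if revision < 1:
        # Assumption: revision numbers start at 1.
        raise ValueError("revision must be a positive integer")
    return f"{PREFIX}-{hardware}-{os_name}-{revision}"


def next_revision(existing: Iterable[str], hardware: str, os_name: str) -> int:
    """Return 1 + the highest revision already saved for this hardware/OS pair.

    Names that are not exactly skiabot-<hardware>-<OS>-<digits> for this pair
    are ignored. If no matching image exists, the result is 1.
    """
    prefix = f"{PREFIX}-{hardware}-{os_name}-"
    revisions = [
        int(name[len(prefix):])
        for name in existing
        if name.startswith(prefix) and re.fullmatch(r"[0-9]+", name[len(prefix):])
    ]
    return max(revisions, default=0) + 1


if __name__ == "__main__":
    # Hypothetical contents of the bot_img directory.
    saved = [
        "skiabot-ShuttleA-Win7-1",
        "skiabot-ShuttleA-Win7-2",
        "skiabot-MacMini-Mac10.9-1",
    ]
    rev = next_revision(saved, "ShuttleA", "Win7")
    print(disk_image_name("ShuttleA", "Win7", rev))  # skiabot-ShuttleA-Win7-3
```

To use the sketch, pass the names of the entries in the `bot_img` directory as `existing`. This assumes that each saved image appears there as a single entry named after the image.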