Target-specific lowering in ICE
===============================

This document discusses several issues around generating target-specific ICE
instructions from high-level ICE instructions.

Meeting register address mode constraints
-----------------------------------------

Target-specific instructions often require specific operands to be in physical
registers. Sometimes one specific register is required, but usually any
register in a particular register class will suffice, and that register class
is defined by the instruction/operand type.

The challenge is that ``Variable`` represents an operand that is either a stack
location in the current frame, or a physical register. Register allocation
happens after target-specific lowering, so during lowering we generally don't
know whether a ``Variable`` operand will meet a target instruction's physical
register requirement.

To this end, ICE allows certain directives:

* ``Variable::setWeightInfinite()`` forces a ``Variable`` to get some physical
  register (without specifying which particular one) from a register class.

* ``Variable::setRegNum()`` forces a ``Variable`` to be assigned a specific
  physical register.

These directives are described below in more detail. In most cases, though,
they don't need to be explicitly used, as the routines that create lowered
instructions have reasonable defaults and simple options that control these
directives.

The recommended ICE lowering strategy is to generate extra assignment
instructions involving extra ``Variable`` temporaries, using the directives to
force suitable register assignments for the temporaries, and then let the
register allocator clean things up.

Note: There is a spectrum of *implementation complexity* versus *translation
speed* versus *code quality*.
This recommended strategy picks a point on the spectrum representing very low
complexity ("splat-isel"), pretty good code quality in terms of frame size and
register shuffling/spilling, but perhaps not the fastest translation speed
since extra instructions and operands are created up front and cleaned up at
the end.

Ensuring a non-specific physical register
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The x86 instruction::

    mov dst, src

needs at least one of its operands in a physical register (ignoring the case
where ``src`` is a constant). This can be done as follows::

    mov reg, src
    mov dst, reg

so long as ``reg`` is guaranteed to have a physical register assignment. The
low-level lowering code that accomplishes this looks something like::

    Variable *Reg;
    Reg = Func->makeVariable(Dst->getType());
    Reg->setWeightInfinite();
    NewInst = InstX8632Mov::create(Func, Reg, Src);
    NewInst = InstX8632Mov::create(Func, Dst, Reg);

``Cfg::makeVariable()`` generates a new temporary, and
``Variable::setWeightInfinite()`` gives it infinite weight for the purpose of
register allocation, thus guaranteeing it a physical register (though leaving
the particular physical register to be determined by the register allocator).

The ``_mov(Dest, Src)`` method in the ``TargetX8632`` class is sufficiently
powerful to handle these details in most situations. Its ``Dest`` argument is
an in/out parameter. If its input value is ``nullptr``, then a new temporary
variable is created, its type is set to the same type as the ``Src`` operand,
it is given infinite register weight, and the new ``Variable`` is returned
through the in/out parameter. (This is in addition to the new temporary being
the dest operand of the ``mov`` instruction.)
The simpler version of the above example is::

    Variable *Reg = nullptr;
    _mov(Reg, Src);
    _mov(Dst, Reg);

Preferring another ``Variable``'s physical register
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(An older version of ICE allowed the lowering code to provide a register
allocation hint: if a physical register is to be assigned to one ``Variable``,
then prefer a particular ``Variable``'s physical register if available. This
hint would be used to try to reduce the amount of register shuffling.
Currently, the register allocator does this automatically through the
``FindPreference`` logic.)

Ensuring a specific physical register
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some instructions require operands in specific physical registers, or produce
results in specific physical registers. For example, the 32-bit ``ret``
instruction needs its operand in ``eax``. This can be done with
``Variable::setRegNum()``::

    Variable *Reg;
    Reg = Func->makeVariable(Src->getType());
    Reg->setWeightInfinite();
    Reg->setRegNum(Reg_eax);
    NewInst = InstX8632Mov::create(Func, Reg, Src);
    NewInst = InstX8632Ret::create(Func, Reg);

Precoloring with ``Variable::setRegNum()`` effectively gives it infinite weight
for register allocation, so the call to ``Variable::setWeightInfinite()`` is
technically unnecessary, but perhaps documents the intention a bit more
strongly.

The ``_mov(Dest, Src, RegNum)`` method in the ``TargetX8632`` class has an
optional ``RegNum`` argument to force a specific register assignment when the
input ``Dest`` is ``nullptr``. As described above, passing in ``Dest=nullptr``
causes a new temporary variable to be created with infinite register weight,
and in addition the specific register is chosen.
The simpler version of the above example is::

    Variable *Reg = nullptr;
    _mov(Reg, Src, Reg_eax);
    _ret(Reg);

Disabling live-range interference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(An older version of ICE allowed an overly strong preference for another
``Variable``'s physical register even if their live ranges interfered. This
was risky, and currently the register allocator derives this automatically
through the ``AllowOverlap`` logic.)

Call instructions kill scratch registers
----------------------------------------

A ``call`` instruction kills the values in all scratch registers, so it's
important that the register allocator doesn't allocate a scratch register to a
``Variable`` whose live range spans the ``call`` instruction. ICE provides the
``InstFakeKill`` pseudo-instruction to compactly mark such register kills. For
each scratch register, a fake trivial live range is created that begins and
ends in that instruction. The ``InstFakeKill`` instruction is inserted after
the ``call`` instruction. For example::

    CallInst = InstX8632Call::create(Func, ... );
    NewInst = InstFakeKill::create(Func, CallInst);

The last argument to the ``InstFakeKill`` constructor links it to the previous
call instruction, such that if its linked instruction is dead-code eliminated,
the ``InstFakeKill`` instruction is eliminated as well. The linked ``call``
instruction could be to a target known to be free of side effects, and
therefore safe to remove if its result is unused.

Instructions producing multiple values
--------------------------------------

ICE instructions allow at most one destination ``Variable``. Some machine
instructions produce more than one usable result. For example, the x86-32
``call`` ABI returns a 64-bit integer result in the ``edx:eax`` register pair.
Also, x86-32 has a version of the ``imul`` instruction that produces a 64-bit
result in the ``edx:eax`` register pair.
The x86-32 ``idiv`` instruction produces the quotient in ``eax`` and the
remainder in ``edx``, though generally only one or the other is needed in the
lowering.

To support multi-dest instructions, ICE provides the ``InstFakeDef``
pseudo-instruction, whose destination can be precolored to the appropriate
physical register. For example, a ``call`` returning a 64-bit result in
``edx:eax``::

    CallInst = InstX8632Call::create(Func, RegLow, ... );
    NewInst = InstFakeKill::create(Func, CallInst);
    Variable *RegHigh = Func->makeVariable(IceType_i32);
    RegHigh->setRegNum(Reg_edx);
    NewInst = InstFakeDef::create(Func, RegHigh);

``RegHigh`` is then assigned into the desired ``Variable``. If that assignment
ends up being dead-code eliminated, the ``InstFakeDef`` instruction may be
eliminated as well.

Managing dead-code elimination
------------------------------

ICE instructions with a non-nullptr ``Dest`` are subject to dead-code
elimination. However, some instructions must not be eliminated in order to
preserve side effects. This applies to most function calls, volatile loads,
and loads and integer divisions where the underlying language and runtime are
relying on hardware exception handling.

ICE facilitates this with the ``InstFakeUse`` pseudo-instruction. This forces
a use of its source ``Variable`` to keep that variable's definition alive.
Since the ``InstFakeUse`` instruction has no ``Dest``, it will not be
eliminated.

Here is the full example of the x86-32 ``call`` returning a 32-bit integer
result::

    Variable *Reg = Func->makeVariable(IceType_i32);
    Reg->setRegNum(Reg_eax);
    CallInst = InstX8632Call::create(Func, Reg, ... );
    NewInst = InstFakeKill::create(Func, CallInst);
    NewInst = InstFakeUse::create(Func, Reg);
    NewInst = InstX8632Mov::create(Func, Result, Reg);

Without the ``InstFakeUse``, the entire call sequence could be dead-code
eliminated if its result were unused.

One more note on this topic.
These tools can be used to allow a multi-dest instruction to be dead-code
eliminated only when none of its results is live. The key is to use the
optional source parameter of the ``InstFakeDef`` instruction. Using
pseudocode::

    t1:eax = call foo(arg1, ...)
    InstFakeKill  // eax, ecx, edx
    t2:edx = InstFakeDef(t1)
    v_result_low = t1
    v_result_high = t2

If ``v_result_high`` is live but ``v_result_low`` is dead, adding ``t1`` as an
argument to ``InstFakeDef`` suffices to keep the ``call`` instruction live.

Instructions modifying source operands
--------------------------------------

Some native instructions may modify one or more source operands. For example,
the x86 ``xadd`` and ``xchg`` instructions modify both source operands. Some
analysis needs to identify every place a ``Variable`` is modified, and it uses
the presence of a ``Dest`` variable for this analysis. Since ICE instructions
have at most one ``Dest``, the ``xadd`` and ``xchg`` instructions need special
treatment.

A ``Variable`` that is not the ``Dest`` can be marked as modified by adding an
``InstFakeDef``. However, this is not sufficient, as the ``Variable`` may have
no more live uses, which could result in the ``InstFakeDef`` being dead-code
eliminated. The solution is to add an ``InstFakeUse`` as well.

To summarize, for every source ``Variable`` that is not equal to the
instruction's ``Dest``, append an ``InstFakeDef`` and ``InstFakeUse``
instruction to provide the necessary analysis information.
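
The summary rule for instructions that modify their source operands can be
sketched with a small standalone model. This is only an illustration, not ICE
code: the ``Variable``, ``Inst``, and ``Kind`` types below are toy stand-ins,
``emitWithModifiedSources`` is a hypothetical helper name, and the sketch
assumes every listed source operand is modified by the instruction (as with
``xchg``).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for ICE concepts; these are NOT the real ICE classes.
struct Variable {
  std::string Name;
};

enum class Kind { Target, FakeDef, FakeUse };

struct Inst {
  Kind K;
  Variable *Dest;               // nullptr when the instruction has no Dest
  std::vector<Variable *> Srcs; // source operands
};

// Hypothetical helper: appends Target to Out, then a FakeDef/FakeUse pair for
// every source Variable that is not Target's Dest. Assumes Target modifies
// all of its source operands.
void emitWithModifiedSources(std::vector<Inst> &Out, const Inst &Target) {
  assert(Target.K == Kind::Target && "expected a real target instruction");
  Out.push_back(Target);
  for (Variable *Src : Target.Srcs) {
    if (Src == Target.Dest)
      continue; // already visible to the analysis as the Dest
    Out.push_back({Kind::FakeDef, Src, {}});        // records the modification
    Out.push_back({Kind::FakeUse, nullptr, {Src}}); // keeps the FakeDef alive
  }
}
```

For a modeled ``xchg a, b`` with ``a`` as the ``Dest``, the helper emits the
instruction itself followed by a fake def and a fake use of ``b``. Because the
fake use has no ``Dest``, a dead-code elimination pass has no reason to drop
it, and it in turn keeps the fake def of ``b`` alive.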