# IR Builder ## Overview The IR builder pass constructs the Intermediate Representation (IR) from the Panda bytecode. Panda compiler's IR employs SSA form, which should be constructed before first pass is run. There are various algorithms to construct SSA form, most known is a Cytron et al. It produce most clear and pruned SSA form, without dead phi instructions, but it has various drawbacks, such as significant overhead, since it requires to build dominance tree and dominance frontier. We choose a simple algorithm, that is used in many Virtual Machines. It may produce dead phis, but it is much simplier and faster. Due to specifics of the Panda bytecode, IR builder has responsibility to handle various situation that is not common for the IR. Such as: - some instructions don't specify its type, f.e. `mov` instruction may produce int32 as well as float, final type can be determined only by consumers. - constant hasn't type as well and if one constant is used in integer and float operations, it must be split into two IR ConstInst instructions. - if constant is `0` and it is used in instruction that expects object(f.e. `mov.obj`), we need to create a special constant instruction `NullPtr` to handle this situation. Resolving these things requires addition actions in the builder, that, in turn, can require additional analyses, for example, Dominators Tree Analysis. ## Dependencies * DominatorsTree * LoopAnalysis * RPO ## Algorithm There are three main phases: 1. Building the Control Flow Graph. 2. Building the Data Flow Graph (in SSA form). 3. Fixing the type uncertainties of the instructions. **Building the Control Flow Graph** 1. Iterate over all bytecode instructions and make basic block for all target instructions, i.e. instructions that can be entries to the basic blocks. 2. Create try catch blocks. 3. Connect the basic blocks, hereby making a CFG. **Building the Data Flow Graph** 1. Get next basic block from the graph. 2. If basic block is a loop header, create SafePoint and OsrSaveState instructions. 3. Create phi instructions for the live registers. 4. Get next bytecode instruction from the current basic block. 5. Build the Panda IR instruction from the bytecode instruction: - create auxiliary instructions (SaveState, NullCheck, etc) if needed - set inputs from the virtual register map - if has destination, update virtual register definition in the vreg map 6. If instruction is a terminator, goto 1, else goto 4. `Virtual register map` is a map, where key is virtual register, value is an IR instruction that currently defines value of this register. **Fixing the type uncertainties of the instructions** This final phase contains following parts: 1. Split constants: for all constants that are used in instructions with different types, split constant instruction and set proper types for them. 2. Fix instructions that can't be fully completed in building process: - Remove dead Phi and set types to phi which have no type. Phi may not have type if all it users are pseudo instructions, like SaveState. - Check all instructions that have no type and fix it. Type is got from instructions with known input types. - Resolve dead and inconsistent phi instructions. ## Pseudocode ```python def Run(graph, bytecode): BuildBasicBlocks(graph, bytecode) for bb in graph.basic_blocks_in_rpo(): for bc_inst in bytecode.instructions[bb.first_pc..bb.last_pc]: inst = BuildInstruction(bc_inst) bb.add_instruction(bb, inst) SplitConstants() FixInstructions() def BuildBasicBlocks(graph, bytecode): blocks = [] # each pc holds the reference to the basic block blocks.resize(bytecode.instructions.size()) for inst in bytecode.instructions: if inst.is_branch(): blocks[inst.pc] = graph.new_bb() if inst.is_conditional(): # Next instruction starts new basic block blocks[inst.next.target_pc] = graph.new_bb() # Add control flow edges between created basic blocks ConnectBasicBlocks() graph.remove_unreacheable_blocks() # This is autogenerated function from erb templates def BuildInstruction(bb, bc_inst): # Big switch for each opcode switch bc_inst.opcode: case SOME_OPCODE: bb.create_inst(bc_inst.opcode) bb.set_input(0, self.get_definition(bc_inst.get_vreg(0)) bb.set_input(1, self.get_definition(bc_inst.get_vreg(1)) def SplitConstants(): for const in graph.constants: if not all(x == const.inputs[0] for x in const.inputs): const.split() def FixInstructions(): # Fix instructions types ``` Example of the case for the `ADD` bytecode instruction in autogenerated `BuildInstruction` function. ```c++ . . . case BytecodeInstruction::Opcode::ADDI_IMM8: { // Create IR instruction. auto inst = graph_->CreateInstAdd(DataType::INT32, GetPc(instruction->GetAddress())); // Get current definition of the accumulator and set it as the first input to the instruction/ inst->SetInput(0, GetDefinitionAcc()); // Get immediate from the bytecode and try to find if it is already exists in the IR, if not - create new one. // Then set it as the second input to the instruction. inst->SetInput(1, FindOrCreateConstant(instruction->GetImm())); // ADD bytecode write resutl to the accumulator, so update definition of the accumulator by created instruction. UpdateDefinitionAcc(inst); // Append instruction to the IR AddInstruction(inst); break; . . . ``` ## Links [Panda IR builder source code](../optimizer/ir_builder/) [Static Single Assignment Book](https://pfalcon.github.io/ssabook/latest/book-v1.pdf) [Simple and Efficient Construction of Static Single Assignment Form](https://pp.info.uni-karlsruhe.de/uploads/publikationen/braun13cc.pdf) [Efficiently Computing Static Single Assignment Form and the Control Dependence Graph, Ron Cytron](https://www.cs.utexas.edu/~pingali/CS380C/2010/papers/ssaCytron.pdf)