Lines Matching refs:aco
529 - aco: Implement call scope.
535 - aco: Add support for ray launch size.
973 - aco/optimizer: ensure to not erase high bits when propagating packed constants
974 - aco/ra: don't allocate vector space for MIMG NSA operands
975 - aco: include <cstddef> in aco_util.h
983 - aco/print_ir: fix printing of VOPC_SDWA definitions
984 - aco: use VOPC_SDWA on GFX9+
985 - aco: add instr_is_16bit() helper function
986 - aco/ra: refactor subdword definition info
987 - aco/ra: refactor subdword operand stride
988 - aco/validate: simplify get_subdword_bytes_written()
989 - aco/opcodes: remove definition_size[]
990 - aco: add more validation rules for SDWA operands
994 - aco: remove redundant s_and exec after nir_op_inot
995 - aco: only apply extract if not used more than 4 times
996 - aco: refactor nir_op_imul selection
997 - aco/optimizer: combine v_mul_lo_u16 + v_add_u16 -> v_mad_u16
998 - aco/optimizer: fuse v_mul_f64 + v_add_f64 -> v_fma_f64
999 - aco/optimizer: combine v_pk_mul_u16 + v_pk_add_u16 -> v_pk_mad_u16
1000 - aco: fix init_any_pred_defined() for loop header phis
1001 - aco: refactor lower_phis()
1002 - aco/lower_bool_phis: avoid creating trivial phis
1003 - aco/lower_phis: propagate constants before emitting merge code
1004 - aco/lower_phis: optimize loop exit phis
1005 - aco: fix p_insert lowering with 16bit sources
1006 - aco: rewrite SDWA selector
1007 - aco: remove explicit dst_preserve flag
1008 - aco/print_ir: always print SDWA dst & src selections
1009 - aco: preserve subdword RC when lowering p_insert/p_extract
1010 - aco/ra: Fix potential out-of-bounds array accesses.
1011 - aco/ra: don't copy linear VGPRs within CF in get_reg_create_vector()
1012 - aco: stop scheduling if clause-forming fails
1013 - aco: make clause-forming depend on the number of moved instructions
1014 - aco: try forming clauses even if reg_pressure exceeds
1015 - aco: clang-format
1016 - aco/ra: fix intersects()
1017 - aco/ra: refactor affinities into assignment struct
1018 - aco/ra: remove some redundant code
1019 - aco/ra: split register assignment for phis into separate function
1020 - aco/ra: try more aggressively to assign phi defs the same register
1021 - aco/ra: for phis try to find an operand-matching register earlier
1022 - aco/ra: don't set affinities for ssa-repair phis
1023 - aco/ra: create affinities between nested phis
1024 - aco/ra: create nested affinities for loop header phis
1025 - aco/ra: don't rewrite affinities for phi operands after register assignment
1663 - aco: cleanup assignment of unique_ptrs
1688 - aco: Use cpp_msvc_compat_args.
1903 - aco: Work around MSVC restrict in c99_compat.h
3522 - aco: don't create v_madmk_f32/v_madak_f32 from v_fma_legacy_f16
3525 - aco: use image_dim and image_array intrinsic indices
3526 - aco: calculate correct register demand for branch instructions
3549 - aco: fix validation of DPP v_cndmask_b32/v_addc_co_u32
3550 - aco: add can_use_DPP() and convert_to_DPP()
3551 - aco: move a bunch of helpers into aco_ir.h/aco_ir.cpp
3552 - aco: make optimize_postRA() work across blocks
3553 - aco: handle DPP in the optimizer
3554 - aco: combine DPP into VALU before RA
3555 - aco: combine DPP into VALU after RA
3556 - aco/tests: add tests for pre-RA DPP combining
3557 - aco/tests: add tests for post-RA DPP combining
3558 - aco: fix vectorized 16-bit load_input/load_interpolated_input
3559 - aco: remove label_extract if the extract is used by a non-VALU
3560 - aco/scheduler: allow moving down VMEM stores to below VMEM loads
3565 - aco: include utility in isel
3566 - aco: don't constant propagate to DPP instructions
3567 - aco/tests: test copy propagation with DPP instructions
3568 - aco: remove DPP when applying constants/literals/sgprs
3569 - aco: don't coalesce constant copies into non-power-of-two sizes
3570 - aco/spill: add temporary operands of exec phis to next_use_distances_end
3579 - radv,aco: implement iadd_sat
3580 - aco: implement nir_op_pack_32_4x8
3581 - aco: implement udot_4x8/sdot_4x8/udot_2x16/sdot_2x16 opcodes
3582 - aco/ra: allow v1b operands with 16-bit instructions
3584 - aco/ra: don't use ds_write_b8_d16_hi/ds_write_b16_d16_hi on GFX8
3587 - aco: add RegClass::is_linear_vgpr helper
3588 - aco: add and use RegClass::resize helper
3589 - aco: rewrite print_reg_class()
3590 - aco: find a scratch register for sub-dword copies on GFX7 if scc is empty
3591 - aco: find scratch reg for sub-dword pseudo instructions which read sgprs
3592 - aco/tests: fix finish_ra_test()
3593 - aco/tests: add regalloc.scratch_sgpr.create_vector
3594 - aco: implement linear vgpr copies
3595 - aco: allow live-range splits of linear vgprs in top-level blocks
3596 - aco/nops: use up-to-date mask_size
3597 - aco/nops: create handle_raw_hazard_instr helper
3598 - aco/nops: add State
3599 - aco/nops: fix handle_raw_hazard_internal when visiting the current block
3601 - aco/tests: add idep_amdgfxregs_h
3610 - aco: return 0x76543210 for NULL FMASK fetch
3612 - aco: use correct dim for FMASK fetches
3613 - radv,aco: use lower_to_fragment_fetch
3614 - radv,aco: don't include FMASK in the storage descriptor
3617 - aco: fix vadd32() when b is neither a constant nor temporary
3625 - aco: implement aco_compile_vs_prolog
3626 - aco: implement VS input loads with prologs
3629 - aco: consider pseudo-instructions reading exec in needs_exec_mask()
3778 - aco: implement VK_EXT_shader_atomic_float2
3889 - radv,aco: stop using vs_common_out.export_clip_dists
3941 - aco: fix load_barycentric_at_{offset,sample}
3949 - radv,aco: compute and store the SPI PS input in radv_shader_info
3950 - aco: prevent using undeclared shader arguments for PS
3951 - radv,aco: remap PS inputs when declaring shader arguments
3952 - aco: constify radv_shader_{info,args}
3965 - radv,aco: remove nir_intrinsic_load_layer_id
3969 - aco: cleanup setup_vs_output_info()
3974 - aco: do not return an empty string when disassembly is not supported
3978 - aco: fix invalid IR generated for b2f64 when the dest is a VGPR
3979 - aco: fix emitting stream outputs when the first component isn't zero
3980 - aco: fix loading 64-bit inputs with fragment shaders
3983 - aco: only load streamout buffers if streamout is enabled
4115 - aco: Swap s_and operand order for ballot.
4116 - aco: Allow elect to take advantage of knowing when all lanes are active.
4117 - aco: Remove s_and with exec when all lanes are active.
4119 - aco: Fix how p_elect interacts with optimizations.
4120 - aco, nir, ac: Simplify sequence of getting initial NGG VS edge flags.
4124 - nir, aco: Remove vertex and primitive count overwrite intrinsic.
4126 - aco: Use Navi 10 empty NGG output workaround on NGG culling shaders.
4131 - aco: Fix to_uniform_bool_instr when operands are not suitable.
4132 - radv, ac, aco: Use indices 0-2 of gs_vtx_offset argument array on GFX9+.
4137 - aco: Use workgroup size from input shader info.
4138 - aco: Consider LDS usage by PS inputs in MaxWaves calculation.
4139 - aco: Consider maximum number of workgroups per CU/WGP on Navi.
4140 - aco: Emit zero for the derivatives of uniforms.
4141 - aco: Unset 16 and 24-bit flags from operands in apply_extract.
4145 - aco: Fix invalid usage of std::fill with std::array.
4150 - aco: Use Builder reference in emit_copies_block.
4151 - aco: Skip code paths to emit copies when there are no copies.
4152 - aco/optimize_postRA: Use iterators instead of operator[] of std::array.
4153 - aco: Add some useful info to the README for debugging.
4155 - aco: Add ability to optimize v_lshl + v_sub into v_mad_i32_i24.
4156 - aco/isel: Fix emit_vop2_instruction to apply 16/24-bit flags properly.
4168 - aco: Allow p_extract to have different definition and operand sizes.
4169 - aco: Implement integer conversions using p_extract.
4170 - aco: Omit p_extract after ds_read with matching bit size.
4171 - aco: Don't write m0 register for LDS instructions on GFX9+.
4172 - aco: Fix small primitive precision.
4173 - aco: Fix determining whether any culling is enabled.
4178 - aco/optimizer: Skip SDWA on v_lshlrev when unnecessary in apply_extract.
4182 - aco: Fix how p_is_helper interacts with optimizations.
4230 - aco: Separate LLVM/CLRX asm printers more cleanly
4231 - aco: Extend set of supported GPUs that can be disassembled with CLRX
4234 - aco/tests: Assert that the requested IR is actually provided
4235 - aco/spill: Avoid unneeded copies when iterating over maps
4236 - aco: Use std::vector for the underlying container of std::stack
4237 - aco/spill: Remove unused container
4238 - aco/spill: Replace map[] with map::insert
4239 - aco/spill: Avoid copying next_use maps more often than needed
4240 - aco/spill: Persist memory allocations of local next use maps
4241 - aco/spill: Avoid destroying local next use maps over-eagerly
4242 - aco/spill: Replace vector<map> with vector<vector> for local_next_use
4243 - aco/spill: Prefer unordered_map over map for next use distances
4244 - aco/spill: Avoid copying current_spills when not needed
4245 - aco/spill: Reduce redundant std::map lookups
4246 - aco/spill: Replace an std::map to booleans with std::set
4247 - aco/spill: Store remat list in an std::unordered_map instead of std::map
4248 - aco/spill: Change worklist to a single integer
4249 - aco/spill: Reduce allocations in next_uses_per_block
4250 - aco/spill: Clarify use of long-lived references by adding const
4251 - aco/spill: Use unordered_map for spills_exit
4252 - aco/spill: Use std::unordered_map for spills_entry