Lines Matching refs:aco
1233 - aco: either copy-propagate or inline create_vector operands
1234 - aco: coalesce parallelcopies during register allocation
1240 - aco: fix WQM coalescing
1241 - aco: restrict copying of create_vector operands to GFX9+
1242 - aco: don't move create_vector subdword operands to unsupported register offsets
1243 - aco: fix corner case in register allocation
1244 - aco: don't allow unaligned subdword accesses on GFX6/7
1245 - aco: fix register assignment for p_create_vector on GFX6/7
1246 - aco: simplify statistics collection for copies
1247 - aco: use full-register instructions to implement subdword packing on GFX6/7
1248 - aco: workarounds for subdword lowering on GFX6/7
1249 - aco: adjust GFX6 subdword lowering workarounds for 8bit
1250 - aco: add and use scratch SGPR to lower subdword p_create_vector on GFX6/7
1251 - aco: coalesce copies more aggressively when lowering to hw
1252 - aco: skip partial copies on first iteration when lowering to hw
1253 - aco: optimize packing of 16bit subdword registers on GFX6/7
1254 - aco: remove unnecessary split- and create_vector instructions for subdword loads
1255 - aco: fix shared subdword loads
1256 - aco: reorder calls to aco_validate() and cleanup aco_compile_shader()
1257 - aco: don't allow SGPRs on logical phis
1258 - aco: fix WQM handling in nested loops
1259 - radv/aco: implement logic64 instead of lowering
1260 - aco: align swap operations to 4 bytes on GFX6/7
1261 - aco: don't allow partial copies on GFX6/7
1265 - aco: fix partial copies on GFX6/7
1266 - aco: remove superfluous (bool & exec) if the result comes from VOPC
1279 - aco: fix scratch loads which cross element_size boundaries
1280 - aco: ensure to not extract more components than have been fetched
1281 - aco: don't split store data if it was already split into more elements
1282 - aco: prevent infinite recursion in RA for subdword variables
1283 - aco: ensure readfirstlane subdword operands are always dword aligned
1285 - aco: add GFX6/7 subdword lowering tests
1286 - aco: execute branch instructions in WQM if necessary
2647 - aco: Use nir_foreach_variable_with_modes to walk SSBOs
3450 - android: aco: add aco_ir.cpp to Makefile.sources
3661 - aco: Don't declare 'Block' as class, but define as struct.
3662 - aco: Don't std::move temporary object.
3663 - aco: Use correct reference type in for-range-loop.
3791 - aco: remove use of f-strings
3792 - aco: add message to static_assert
3799 - aco: simplify consecutive ordered vmem/lds writes optimization
3800 - aco: fix consecutively written vgprs from vmem instructions
3801 - aco: mark phi definitions as last-seen phi operands
3802 - aco: consider affinities when creating v_mac_f32
3803 - aco: improve phi affinities with p_split_vector
3804 - aco: split operations that use a swap's definition
3805 - aco: fix disassembly with LLVM 11
3808 - aco: fix typo in insert_waitcnt's kill()
3810 - aco: fix interaction with 3f branch workaround and p_constaddr
3811 - aco: consider SDWA during value numbering
3812 - aco: check instruction format before waiting for a previous SMEM store
3813 - aco: preserve more fields when combining additions into SMEM
3814 - aco: don't reorder barriers in the scheduler
3815 - aco: fix 64-bit shared_atomic_exchange
3819 - aco: use v_xor3_b32
3820 - aco: validate instructions reading/writing upper halves/bytes
3821 - aco: p_extract_vector in 64-bit u2f16/i2f16
3822 - aco: allow reading/writing upper halves/bytes when possible
3823 - aco: prefer 4-byte aligned definitions
3824 - aco: add Info::{operand_size,definition_size}
3825 - aco: use Info::definition_size instead of definition's regclass
3826 - aco: fix moving sub-dword values out of a register for a fixed definition
3827 - aco: use num_opcodes instead of last_opcode
3828 - aco: improve code for f2{i,u}{8,16}
3829 - aco: use p_as_uniform in emit_vop1_instruction
3830 - aco: add and set precise flag
3831 - aco: create mads when signed zeros should be preserved
3832 - aco: try to use fma instead of mad when denormals are enabled
3833 - aco: create 16-bit mad/fma
3834 - aco: update comment about preserving fp16/fp64 denormals
3835 - aco: create 16-bit input and output modifiers
3836 - aco: improve sub-dword check for sgpr/constant propagation
3837 - aco: fix half_pi constant for 16-bit fsin/fcos
3838 - aco: use 32-bit inline constants for 16-bit integer instructions
3839 - aco: improve 8/16-bit constants
3840 - aco: copy-propagate constants through p_extract_vector/p_split_vector
3841 - aco: optimize 16-bit and 64-bit float comparisons
3842 - aco: validate sub-dword pseudo instructions
3843 - aco: add more opcodes to can_swap_operands
3844 - aco: allow GFX9 partial writes with instructions which use opsel
3845 - aco: improve check for moving temporaries out of fixed definitions
3846 - aco: fix encoding of certain s_setreg_imm32_b32 instructions
3847 - aco: fix validation error from vgpr spill/restore code
3848 - aco: fix sub-dword opsel/sdwa checks
3849 - aco: fix validation of opsel when set for the definition
3850 - aco: shrink ssa_info
3851 - aco: make ssa_info::label 64-bit
3852 - aco: shrink mad_info
3853 - aco: fix edge check with sub-dword temporaries
3854 - aco: use the same regclass as the definition for undef phi operands
3861 - aco: only use SMEM if we can prove it's safe
3862 - aco: allow SMEM for some sub-dword accesses
3863 - radv/aco,aco: allow SMEM SSBO loads on GFX6/7
3864 - aco: fix copy+paste error in split_buffer_store
3865 - aco: don't store byte-aligned short stores
3866 - aco: add missing bld.scc() in byte_align_scalar()
3867 - aco: don't create byte-aligned short loads
3868 - aco: fix when sub-dword create_vector operand cannot be placed perfectly
3869 - aco: improve vectorization of 8/16-bit loads/stores
3870 - aco: ignore blocked registers when checking edges in get_reg_impl()
3871 - aco: remove outdated assert in handle_operands()
3873 - aco: use VOP2 version of v_mbcnt_hi_u32_b32 on GFX6/7
3874 - aco: rework boolean phi pass
3875 - aco: create better code for boolean phis with constant operands
3876 - aco: optimize boolean phis with uniform selections
3877 - aco: don't create phis with undef operands in the boolean phi pass
3878 - aco: read 0 from inactive lanes when using dpp
3879 - aco: optimize some masked swizzles to DPP
3880 - aco: implement <32-bit masked_swizzle_amd
3884 - aco: add 32-bit integer addition to can_swap_operands
3885 - aco: fix underestimated pressure in spiller when a phi has a killed def
3886 - aco: rewrite graph coloring in spiller
3887 - aco: use unordered_set for spill id interferences
3888 - aco: add add_interference() helper
3889 - aco: use s_round_mode/s_denorm_mode
3890 - aco: flush denormals before fp16 fabs/fneg if needed
3891 - aco: fix nir_op_f2f16_rtne with non-default rounding modes
3892 - aco: set tcs_in_out_eq=false if float controls of VS and TCS stages differ
3894 - aco: properly recognize that s_waitcnt mitigates VMEMtoScalarWriteHazard
3895 - aco: use s_waitcnt_depctr to mitigate VMEMtoScalarWriteHazard
3898 - aco: always set FI on GFX10
3900 - aco: implement b2i8/b2i16
3901 - aco: be more careful combining additions that could wrap into loads/stores
3902 - aco: allow overflow for some SMEM instructions
3903 - aco: add NUW flag
3905 - aco: use nir_addition_might_overflow to combine additions into SMEM
3906 - aco: move some setup code into helpers
3907 - aco: make validate() usable in tests
3908 - aco: print ACO IR before scheduling instead of after
3910 - aco: fix copy of uninitialized boolean
3911 - aco: fix includes in aco_ir.cpp
3912 - aco: add missing add_to_hazard_query
3913 - aco: rework barriers and replace can_reorder
3914 - radv/aco,aco: use scoped barriers
3915 - aco: consider intrinsic access in visit_{load,store}_image
3916 - nir,radv/aco: add and use pass to lower make available/visible barriers
3917 - aco: enable value numbering of s_buffer_load_*
3918 - aco: use storage_scratch
3919 - aco: improve sync_info for TCS output stores
3920 - aco: improve workgroup-scope and lower vmem/smem barriers
3921 - aco: create acq+rel barriers instead of acq/rel
3925 - aco: remove isel for GLSL-style barriers
3926 - aco: add framework for unit testing
3927 - aco: add a few tests for the assembler and optimizer
3928 - aco: add framework for testing isel and integration tests
3930 - aco/tests: add tests for sub-dword swaps
3931 - aco: optimize swizzled SALU 8/16-bit conversions
3932 - aco: fix waitcnt insertion on GFX10.3
3933 - aco: don't create v_mad_f32 on GFX10.3
3934 - aco: update bug workarounds for GFX10_3
3935 - aco: fix max_waves_per_simd on Polaris, VegaM and GFX10.3
3936 - aco: update vgpr_alloc_granule for GFX10.3
3937 - aco: implement subgroup shader_clock on GFX10.3
3938 - aco: update aco_opcodes.py for GFX10.3
3939 - aco: disable SMEM stores on GFX10.3
3940 - aco: replace MADs in isel with FMA on GFX10.3
3942 - radv/aco: enable VK_KHR_memory_model
3946 - aco: fix C++11/C++14 compilation
3947 - aco: set constant_data_offset correctly in the case of merged shaders
3948 - aco: don't move memory accesses to before control barriers
3949 - aco: fix non-rtz pack_half_2x16
3950 - aco: consider branch definitions in spiller
3951 - aco: don't consider the first partial spill if it's the wrong type
3952 - aco: don't fix break condition for break+discard to exec
3953 - aco: fix regclass checks when fixing to vcc/exec with Builder
3954 - aco: fix spills_entry heuristic for branch blocks in init_live_in_vars()
3955 - aco: keep loop live-through variables spilled
3956 - aco: reserve 2 sgprs for each branch
3957 - aco: create long jumps
3958 - aco: fix byte_align_scalar for 3 dword vectors
3959 - aco: fix one-off error in Operand(uint16_t)
3961 - aco: fix v_writelane_b32 with two sgprs
3962 - aco: don't apply constant to SDWA on GFX8
3964 - radv,aco: fix reading primitive ID in FS after TES
4267 - aco: remove unnecessary p_split_vector with v2b reg class
4283 - aco: fix 64-bit trunc with negative exponents on GFX6
4285 - aco: prevent invalid loads/stores vectorization if robustness is enabled
4289 - aco: optimize add/sub(a, cndmask(b, 0, 1, cond)) -> addc/subbrev_co(0, a, b)
4298 - aco: remove useless check for nir_tex_src_bias
4299 - aco: add support for texturing with clamped LOD
4304 - aco: store 16-bit temporary outputs as v2b
4305 - aco: convert 16-bit values before exporting MRTs
4306 - aco: allow to load/store 16-bit values in VMEM for tess and geom
4307 - aco: implement 8-bit/16-bit mov's with p_create_vector
4308 - aco: implement 16-bit vertex fetches with tbuffer_load_format_d16_*
4309 - aco: validate v_interp_*_f16 as VOP3 instructions instead of VINTRP
4310 - aco: emit v_interp_*_f16 instructions as VOP3 instead of VINTRP
4311 - aco: implement 16-bit interp
4312 - aco: fix off-by-one error with 16-bit MTBUF opcodes on GFX10
4313 - radv/aco: enable storageInputOutput16 on GFX9+
4314 - aco: fix missing break in label_instruction()
4318 - aco: declare 8-bit/16-bit reduce operations
4319 - aco: implement 8-bit/16-bit reductions
4320 - aco: validate 8-bit/16-bit VGPR operands for readfirstlane/readlane/writelane
4321 - aco: implement 8-bit/16-bit nir_intrinsic_read_first_invocation
4322 - aco: implement 8-bit/16-bit nir_intrinsic_{shuffle,read_invocation}
4323 - aco: implement 8-bit/16-bit nir_intrinsic_quad_*
4324 - aco: use a temporary SGPR for 8-bit/16-bit literal reduction identities
4325 - aco: sign-extend the input and identity for 8-bit subgroup operations
4330 - aco: implement nir_intrinsic_shader_clock with device scope
4337 - aco: add support for bias/lod with texture gather
4341 - radv/aco: enable VK_EXT_subgroup_size_control
4342 - aco: fix register allocation for subdword instructions on GFX10
4343 - aco: implement 8-bit/16-bit reductions on GFX10
4344 - aco: allocate a temp VGPR for some 8-bit/16-bit reduction ops on GFX10
4345 - aco: allow gfx10_wave64_bpermute with 8-bit/16-bit input
4346 - aco: sign-extend input/identity for 32-bit reduce ops on GFX10
4347 - radv/aco: enable VK_KHR_subgroup_extended_types on GFX8+
4350 - aco: implement 16-bit reduce operations on GFX6-GFX7
4351 - aco: implement 16-bit nir_intrinsic_quad_* on GFX6-GFX7
4352 - aco: fix subdword copies on GFX6-GFX7
4353 - aco: sign-extend input/identity for 16-bit subgroup ops on GFX6-GFX7
4354 - radv/aco: enable 64-bit atomic features if RADV is linked with LLVM 8
4355 - aco: use v_bfe_u32 for unsigned reductions sign-extension on GFX6-GFX7
4356 - aco: fix sign-extend 8-bit subgroup operations on GFX6-GFX7
4357 - aco: fix nir_intrinsic_quad_* with 8-bit in GFX6-GFX7
4358 - radv/aco: enable VK_KHR_shader_subgroup_extended_types on GFX6-GFX7
4362 - aco: implement 8-bit/16-bit conversions on GFX6-GFX7
4363 - aco: fix alignment of vectors with 4 elements
4364 - radv/aco: enable 8-bit/16-bit storage on GFX6-GFX7
4365 - radv/aco: enable shaderInt16 on GFX6-GFX7
4366 - radv/aco: enable shaderInt8 and VK_KHR_shader_float16_int8 on GFX6-GFX7
4370 - aco: implement radv_enable_mrt_output_nan_fixup workaround
4375 - aco: allow to swap operands for some 16-bit float instructions
4377 - radv/aco: enable FP16 features/extensions on GFX9+
4383 - aco: replace == GFX10 with >= GFX10 where it's needed
4386 - aco: fix printing ASM on GFX6-7 if clrxdisasm is not found
4387 - aco: improve validation checks for readlane/writelane
4388 - aco: fix printing ASM on GFX6-7 again
4418 - aco: fix more validation errors from vgpr spill/restore code
4455 - aco: add support for nir_intrinsic_shared_atomic_fadd
4497 - aco: handle unaligned loads on GFX10.3
4631 - aco/gfx10: Refactor of GFX10 wave64 bpermute.
4632 - aco: Implement subgroup shuffle on GFX6-7.
4633 - radv/aco: Always enable subgroup shuffle.
4634 - aco: Fix emit_boolean_exclusive_scan in wave32 mode.
4698 - aco: Fix integer overflows when emitting parallel copies during RA