Lines Matching refs:aco
230 - aco: sun flickering with Assassin's Creed Origins
232 - aco: wrong geometry with Assassin's Creed Origins on GFX6
265 - aco: Minor optimization in spill_ctx constructor
266 - aco: pass vars by const &
1185 - aco: fix image_atomic_cmp_swap
1195 - aco: add comparison operators for PhysReg
1196 - aco: add sub-dword regclasses
1197 - aco: refactor regClass setup for subdword VGPRs
1198 - aco: validate p_create_vector with subdword elements properly
1199 - aco: validate register alignment of subdword operands and definitions
1200 - aco: validate uninitialized operands
1201 - aco: validate RA of subdword assignments
1202 - aco: print subdword registers
1203 - aco: fix Temp and assignment of renamed operands during RA
1204 - aco: remove unnecessary reg_file.fill() operation in
1206 - aco: add notion of subdword registers to register allocator
1207 - aco: create helper function to collect variables from register area
1208 - aco: adapt register allocation for subdword registers
1209 - aco: align subdword registers during RA when necessary
1210 - aco: small refactoring of shuffle code lowering
1211 - aco: add builder function for subdword copy()
1212 - aco: lower subdword shuffles correctly.
1213 - aco: don't propagate SGPRs into subdword PSEUDO instructions
1214 - aco: don't assume split_vector(create_vector) has the same number of
1216 - aco: don't vectorize 8/16bit load/store_ssbo
1217 - aco: add missing conversion operations for small bitsizes
1218 - aco: add byte_align_scalar() & trim_subdword_vector() helper
1220 - aco: prepare helper functions for subdword handling
1221 - aco: implement vec2/3/4 with subdword operands
1222 - aco: implement storagePushConstant8 & storagePushConstant16
1223 - aco: implement 8bit/16bit load_buffer
1224 - aco: implement 8bit/16bit store_ssbo
1225 - aco: use MUBUF to load subdword SSBO
1226 - aco: guarantee that Temp fits in 4 bytes
1227 - aco: add explicit padding for all Instruction sub-structs
1228 - aco: improve hashing for value numbering
1229 - aco: improve register assignment when live-range splits are necessary
1230 - aco: replace assignment hashmap by std::vector in register allocation
1231 - aco: during RA only insert into renames table if a variable got
1233 - aco: improve speed of live_var_analysis
1234 - aco: refactor try_remove_trivial_phi() in RA
1235 - aco: change some std::map to std::unordered_map in
1237 - aco: change live_out variables to std::unordered_set
1238 - aco: move all needed helper containers to ra_ctx
1239 - aco: RA - move all std::function objects into proper functions
1240 - aco: setup subdword regclasses for ssa_undef & load_const
1241 - aco: ensure correct bit representation of subdword constants
1242 - aco: don't constant-propagate into subdword PSEUDO instructions
1243 - aco: lower subdword phis with SGPR operands
1244 - aco: rename aco_lower_bool_phis() -> aco_lower_phis()
1245 - aco: make some reg_file helpers private and fix their uses
1246 - aco: fix p_extract_vector optimization in presence of unequally sized
1248 - aco: use v_subrev_f32 for fsub with an sgpr operand in src1
1249 - aco: fix 64bit fsub
1250 - aco: move src1 to vgpr instead of using VOP3 for VOP2 instructions
1252 - aco: simplify operand handling in RA
1253 - aco: refactor get_reg() to take Temp instead of RegClass
1254 - aco: refactor get_reg() to also handle affinities
1255 - aco: create pseudo dummy instruction in RA to be used for live-range
1257 - aco: create and use DefInfo struct in RA
1258 - aco: use DefInfo in more places to simplify RA
1259 - aco: move attempt to find strided register into get_reg_simple()
1260 - aco: allocate full register for subdword definitions if HW doesn't
1262 - aco: don't create vector affinities for operands which are not killed
1264 - aco: refactor get_reg_simple() to return early on exact matches
1265 - aco: stop get_reg_simple after reaching max_used_gpr
1266 - aco: try to always find a register with stride for even sizes
1267 - aco: use upper part of gap in register file if it is beneficial for
1269 - aco: coalesce v_mad's accumulator with definition's affinities
1270 - aco: either copy-propagate or inline create_vector operands
1609 - aco: Fix signed-vs-unsigned warning.
2407 - aco: Implement b2b32 and b2b1
3256 - android: aco: fix PIPE_FORMAT related building errors
3259 - android: aco: add various compiler statistics
3492 - aco: fix gfx10_wave64_bpermute
3493 - aco: gfx10_wave64_bpermute reduce op to print_ir
3494 - aco: disable some instruction combining if it could change an exec
3496 - aco: improve SCC handling in some SALU combines
3501 - aco: add RegisterFile
3502 - aco: add some helpers for filling/testing register ranges
3503 - aco: improve GFX9 1D ddx/ddy assertion
3506 - aco: keep track of which events are used in a barrier
3507 - aco: fix carry-out size for wave32 v_add_co_u32_e64
3508 - aco: handle v_add_co_u32_e64 in parse_base_offset()
3509 - aco: add new NOP insertion pass for GFX6-9
3510 - aco: improve get_wait_states()
3511 - aco: consider non-hazard writes in handle_raw_hazard_internal
3512 - aco: improve control flow handling in GFX6-9 NOP pass
3513 - aco: only reserve sgprs for vcc if it's used
3514 - aco: fix uninitialized data error in waitcnt pass
3516 - aco: add helpers for moving instructions for scheduling
3517 - aco: add helpers for ensuring correct ordering while scheduling
3518 - aco: allow barriers to be skipped during scheduling
3519 - aco: don't stop scheduling at exports
3520 - aco: move some register demand helpers into aco_live_var_analysis.cpp
3521 - aco: add a late kill flag
3522 - aco: set late kill for v_interp_p1_f32 for some APUs
3523 - aco: fix instruction encoding for LS VGPR init bug workaround
3524 - aco: fix operand order for LS VGPR init bug workaround
3528 - aco: set has_divergent_branch for discards in loops
3529 - aco: handle missing second predecessors at merge block phis
3530 - aco: handle when ACO adds new continue edges
3531 - aco: skip NIR in unreachable merge blocks
3532 - aco: improve check for unreachable loop continue blocks
3533 - aco: emit IR in IF's merge block instead if the other side ends in a
3535 - aco: fix boolean undef regclass
3537 - aco: remove dead code in handle_operands()
3538 - aco: implement 64-bit VGPR constant copies in handle_operands()
3539 - aco: look at p_{extract,split}_vector's definitions in
3544 - aco: add various compiler statistics
3545 - aco: add vmem/smem score statistic
3546 - radv, aco: collect statistics if requested but executables are not
3548 - aco: make PhysReg in units of bytes
3549 - aco: add SDWA_instruction
3550 - aco: print and validate opsel
3551 - aco: add emission support for register-allocated sdwa sels
3552 - aco: remove divergence check in sanitize_if()
3553 - aco: zero-initialize Temp
3554 - aco: improve vector optimization with sub-dword vectors
3555 - aco: fix p_extract_vector validation
3556 - aco: improve p_create_vector RA for sub-dword operands
3557 - aco: clear moved operands in get_reg_create_vector()
3558 - aco: fix 1D textureGrad() on GFX9
3559 - aco: implement various 8/16-bit conversions
3560 - aco: add missing scc clobber to nir_op_unpack_32_2x16_split_y
3561 - aco: fix copy statistic for 64-bit vgpr constant copy
3562 - aco: add VOP3P_instruction
3563 - aco: implement sub-dword swaps
3564 - aco: implement 64-bit sgpr swaps
3568 - aco: decrease the uses of other copy operations after
3570 - aco: copy-propagate p_create_vector copies of vectors
3571 - aco: remove copy in load_input_from_temps()
3572 - aco: move call to store_output_to_temps in store_ls_or_es_output
3574 - aco: combine VALU and SALU into various VOP3 instructions
3575 - aco: improve code for 32-bit isign
3576 - aco: fix v_or(s_lshl) and v_add(s_lshl) optimizations
3577 - aco: fix outdated label_vec from p_create_vector labelling
3580 - aco: be more careful about using SMEM for load_global
3581 - aco: add and use RegClass::get() helper
3582 - aco: add emit_load helper
3583 - aco: refactor load_lds to use new helpers
3584 - aco: use emit_load helper for VMEM/SMEM loads
3585 - aco: add helpers for splitting stores
3586 - aco: refactor store_lds() to use new helpers
3587 - aco: refactor store_vmem_mubuf() to use new helpers
3588 - aco: refactor visit_store_ssbo() to use new helpers
3589 - aco: refactor visit_store_global() to use new helpers
3590 - aco: refactor visit_store_scratch() to use new helpers
3591 - aco: add and use get_buffer_store_op() helper
3592 - aco: allow 8/16-bit shared loads
3593 - aco: vectorize global loads/stores
3594 - aco: handle undef p_create_vector operands in the optimizer
3595 - aco: clobber scc in s_bfe_u32 in get_alu_src()
3596 - aco: improve sub-dword emit_split_vector() with sgprs
3597 - aco: lower 8/16-bit integer arithmetic
3598 - radv/aco: enable 8/16-bit storage and int8/int16 on GFX8+
3599 - aco: make RegisterFile::block() take a regclass
3600 - aco: check alignment of non-subdword registers in get_reg_specified()
3601 - aco: fix neighboring register check in get_reg_simple()
3602 - aco: split self-intersecting copies instead of swapping
3603 - aco: don't recurse in sub-dword get_reg_simple()
3604 - aco: improve RA for uneven p_split_vector
3605 - aco: add missing adjust_max_used_regs()
3606 - aco: fix sub-dword out-of-bounds check in RA validator
3607 - aco: fix sub-dword overwrite check in RA validator
3608 - aco: add various GFX10 int16 opcodes
3609 - aco: improve clamped integer addition disassembly workaround
3610 - aco: fix vgpr nir_op_vecn with sgpr operands
3611 - aco: consider blocks unreachable if they are in the logical cfg
3612 - aco: remove use of f-strings
3613 - aco: add message to static_assert
3816 - aco: fix MUBUF VS input loads when expanding vec3 to vec4 on GFX6
3817 - aco: do not use ds_{read,write}2 on GFX6
3819 - aco: fix waiting for scalar stores before "writing back" data on
3823 - aco: fix creating v_madak if v_mad_f32 has two sgpr literals
3872 - aco: fix image load/store with lod and 1D images
3943 - aco: only break SMEM clauses if XNACK is enabled (mostly APUs)
3944 - aco: always optimize v_mad to v_madak in presence of literals
3964 - aco: implement 16-bit nir_op_frexp_sig/nir_op_frexp_exp
3965 - aco: implement 16-bit nir_op_ffract
3966 - aco: implement 16-bit nir_op_fexp2/nir_op_flog2
3967 - aco: implement 16-bit nir_op_ftrunc/nir_op_fround_even
3968 - aco: implement 16-bit nir_op_fsqrt/nir_op_frcp/nir_op_frsq
3969 - aco: implement 16-bit nir_op_ffloor/nir_op_fceil
3970 - aco: implement 16-bit nir_op_fmax/nir_op_fmin
3971 - aco: implement 16-bit nir_op_fabs/nir_op_fneg
3972 - aco: implement 16-bit nir_op_fsub/nir_op_fadd
3973 - aco: implement 16-bit nir_op_fcos/nir_op_fsin
3974 - aco: implement 16-bit nir_op_fmul
3975 - aco: implement 16-bit nir_op_fsat
3976 - aco: implement 16-bit nir_op_fsign
3977 - aco: implement 16-bit nir_op_bcsel
3978 - aco: implement 16-bit nir_op_f2i32/nir_op_f2u32
3979 - aco: implement 16-bit nir_op_ldexp
3980 - aco: implement 16-bit nir_op_fmax3/nir_op_fmin3/nir_op_fmed3
3981 - aco: implement 16-bit comparisons
3982 - aco: implement nir_op_b2f16/nir_op_i2f16/nir_op_u2f16
3983 - aco: fix f2i64/f2u64 with sgprs if the exponent computation overflows
3984 - aco: implement 16-bit nir_op_f2i64/nir_op_f2u64
3985 - aco: fix nir_op_pack_32_2x16_split if one operand is a constant
3988 - aco: fix nir_op_frexp_exp with 16-bit floats and negative exponents
3989 - radv/aco: do not advertise VK_KHR_shader_subgroup_extended_types
3990 - aco: implement nir_op_f2i8/nir_op_f2u8
3991 - aco: fix emitting stream output with tess eval shaders
3997 - aco: fix exporting the viewport index if the fragment shader needs it
4015 - aco: fix nir_texop_texture_samples with NULL descriptors
4016 - aco: fix adjusting the sample index with FMASK if value is negative
4024 - aco: fix 64-bit trunc with negative exponents on GFX6
4154 - aco/optimizer: Don't combine uniform bool s_and to s_andn2.
4156 - aco: Extract setup_gs_variables into a separate function.
4157 - aco: Setup tessellation control shader variables.
4158 - aco: Implement load_tess_coord.
4159 - aco: Implement load_primitive_id for tessellation shaders.
4160 - aco: Implement load_patch_vertices_in.
4161 - aco: Implement load_invocation_id for tessellation control shaders.
4162 - aco: Implement control_barrier for tessellation control shaders.
4163 - aco: Implement memory_barrier_tcs_patch.
4164 - aco: Implement load_view_index for TCS and TES.
4165 - aco: Setup correct HW stages when tessellation is used.
4166 - aco: Use mesa shader stage when loading inputs.
4167 - aco: Remove vertex_geometry_gs assertion from merged shaders.
4168 - aco: Extract LDS alignment calculation to a separate function.
4169 - aco: Remove esgs_itemsize from LDS alignment calculation.
4170 - aco: Introduce new VMEM load/store helpers.
4171 - aco: Introduce new helpers for calculating address offsets.
4172 - aco: Refactor load_per_vertex_input in preparation for tessellation.
4173 - aco: Refactor VS output stores in preparation for tessellation.
4174 - aco: Slight fix to lds_store and lds_load.
4175 - aco: Fix combining DS additions in the optimizer.
4176 - aco: Implement tessellation control shader input/output.
4177 - aco: Store VS outputs correctly when tessellation is used.
4178 - aco: Fix LS VGPR init bug on affected hardware.
4180 - aco: Setup tessellation evaluation shader variables.
4181 - aco: Use TES output info when TES runs on the VS stage.
4182 - aco: Store TES outputs when TES runs on the HW VS stage.
4183 - aco: Enable streamout when TES runs on the HW VS stage.
4184 - aco: Implement loading TES inputs.
4186 - aco: Enable running TES as ES, including merged TES+GS.
4188 - aco: Don't generate an if when the first part of a merged HS or GS is
4190 - aco: Store tess factors in VMEM only at the end of the shader.
4191 - aco: Only write TCS outputs to LDS when they are read by the TCS.
4192 - aco: Don't store TCS outputs to LDS when we're sure that none are
4197 - aco: Create null exports in instruction selection instead of
4199 - aco: Extract tcs_driver_location_matches_api_mask to separate
4201 - aco: Fix handling of tess factors.
4202 - aco: Allow combining TCS output VMEM stores.
4203 - aco: Allow combining LDS loads when loading tess factors.
4204 - aco: Skip 2nd read of merged wave info when TCS in/out vertices are
4206 - aco: Use more optimal sequence at the beginning of merged shaders.
4208 - aco: Treat outputs of the previous stage as inputs of the next stage.
4209 - aco: Change isel inputs/outputs to a flat array.
4210 - aco: Zero-fill undefined elements in create_vec_from_array.
4211 - aco: Extract setup_tcs_info to a separate function.
4212 - aco: Fix workgroup size calculation.
4213 - aco: Extract store_output_to_temps into a separate function.
4214 - aco: When LS and HS invocations are the same, pass LS outputs in
4216 - aco: Don't store LS VS outputs to LDS when TCS doesn't need them.
4217 - aco: Fix crash in insert_wait_states.
4218 - aco: Extract uniform if handling to separate functions.
4219 - aco: Print block_kind_export_end.
4220 - aco: Extract merged_wave_info_to_mask to its own function.
4221 - aco: Treat s_setprio as a scheduling barrier.
4222 - aco/ngg: Add new stage for hw_ngg_gs.
4223 - aco/ngg: Initialize exec mask for NGG VS and TES.
4224 - aco/ngg: Fix exports for NGG VS and TES.
4225 - aco/ngg: Setup NGG VS and TES stages.
4226 - aco/ngg: Implement NGG VS and TES.
4227 - aco/ngg: Schedule position exports of NGG VS/TES.
4228 - aco/ngg: Run GS_ALLOC_REQ on priority 3 for NGG VS and TES.
4230 - aco: Print shader stage in aco_print_program.
4233 - aco: Only store TCS outputs to VMEM when they are read by TES.
4234 - aco: Increase barrier_count to 7 to include barrier_barrier.
4235 - aco: Abort when RA can't find a register.
4236 - aco: Const correctness for get_barrier_interaction.
4237 - aco: Const correctness for aco_print_ir.
4238 - aco: Use 24-bit multiplication in TCS I/O
4239 - aco: Use 24-bit multiplication for NGG wave id and thread id.
4240 - aco: Move s_setprio to correct place after the gs_alloc_req.
4242 - aco: Use context variables instead of calculating TCS inputs/outputs.
4243 - aco: Remember VS/TCS output driver locations.
4244 - aco: Calculate workgroup size of legacy GS.
4245 - aco: Set config->lds_size when TES or VS is running on HW ESGS.
4248 - aco: Use new default driver locations.