Lines Matching refs:aco

62 - aco: sun flickering with Assassins Creed Origins
64 - aco: wrong geometry with Assassins Creed Origins on GFX6
78 - aco: implement GFX6 support
85 - aco: Dead Rising 4 crashes in lower_to_hw_instr() on GFX6-GFX7
92 - [Navi/aco] Guild Wars 2 - ring gfx timeout with commit 3bca0af2
93 - [radv/aco] Regression is causing a soft crash in The Witcher 3
158 - radv/aco Jedi Fallen Order hair rendering buggy
693 - aco: Constify radv_nir_compiler_options in isel
694 - aco: Use radv_shader_args in aco_compile_shader()
695 - aco: Split vector arguments at the beginning
696 - aco: Make num_workgroups and local_invocation_ids one argument each
698 - aco: Use common argument handling
699 - aco: Make unused workgroup id's 0
715 - aco: fix immediate offset for spills if scratch is used
716 - aco: only use single-dword loads/stores for spilling
717 - aco: fix accidental reordering of instructions when scheduling
718 - aco: workaround Tonga/Iceland hardware bug
719 - aco: fix invalid access on Pseudo_instructions
720 - aco: preserve kill flag on moved operands during RA
721 - aco: rematerialize s_movk instructions
722 - aco: check if SALU instructions are preceded by exec when
724 - aco: value number instructions using the execution mask
725 - aco: use s_and_b64 exec to reduce uniform booleans to one bit
728 - aco: don't value-number instructions from within a loop with ones
730 - aco: don't split live-ranges of linear VGPRs
731 - aco: fix a couple of value numbering issues
732 - aco: refactor visit_store_fs_output() to use the Builder
733 - aco: Initial GFX7 Support
734 - aco: SI/CI - fix sampler aniso
735 - aco: fix SMEM offsets for SI/CI
736 - aco: implement nir_op_fquantize2f16 for SI/CI
737 - aco: only use scalar loads for readonly buffers on SI/CI
738 - aco: implement nir_op_isign on SI/CI
739 - aco: move buffer_store data to VGPR if needed
740 - aco: implement quad swizzles for SI/CI
741 - aco: recognize SI/CI SMRD hazards
742 - aco: fix disassembly of writelane instructions.
743 - aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
744 - aco: implement 64bit VGPR shifts for SI/CI
745 - aco: make 1/2*PI a literal constant on SI/CI
746 - aco: implement 64bit i2b for SI /CI
747 - aco: implement 64bit ine/ieq for SI/CI
748 - aco: disable disassembly for SI/CI due to lack of support by LLVM
750 - aco: flush denorms after fmin/fmax on pre-GFX9
751 - aco: don't use a scalar temporary for reductions on GFX10
752 - aco: implement (clustered) reductions for SI/CI
753 - aco: implement inclusive_scan for SI/CI
754 - aco: implement exclusive scan for SI/CI
756 - aco: return to loop_active mask at continue_or_break blocks
758 - aco: use soffset for MUBUF instructions on SI/CI
759 - aco: improve readfirstlane after uniform ssbo loads on GFX7
760 - aco: propagate temporaries into expanded vectors
762 - aco: compact various Instruction classes
763 - aco: compact aco::span<T> to use uint16_t offset and size instead of
765 - aco: fix unconditional demote_to_helper
766 - aco: rework lower_to_cssa()
767 - aco: handle phi affinities transitively through parallelcopies
768 - aco: ignore parallelcopies to the same register on jump threading
769 - aco: fix combine_salu_not_bitwise() when SCC is used
770 - aco: reorder VMEM operands in ACO IR
771 - aco: fix register allocation with multiple live-range splits
772 - aco: simplify adjust_sample_index_using_fmask() & get_image_coords()
773 - aco: simplify gathering of MIMG address components
775 - aco: fix image_atomic_cmp_swap
831 - aco: handle gfx7 int8/10 clamping on exports
1922 - aco: use NIR_MAX_VEC_COMPONENTS instead of 4
2524 - android: aco: fix Lower to CSSA
2760 - aco: add Instruction::usesModifiers() and add more checks in the
2763 - aco: use DPP instead of exec modification when lowering GFX10
2765 - aco: fix shuffle with uniform operands
2767 - aco: fix read_invocation with VGPR lane index
2768 - aco: don't propagate vgprs into v_readlane/v_writelane
2769 - aco: combine read_invocation and shuffle implementations
2771 - aco: don't combine literals into v_cndmask_b32/v_subb/v_addc
2772 - aco: fix 64-bit fsign with 0
2773 - aco: implement VK_KHR_shader_float_controls
2774 - aco: refactor reduction lowering helpers
2775 - aco: implement 64-bit integer reductions
2776 - radv/aco: enable VK_KHR_shader_subgroup_extended_types
2781 - aco: improve waitcnt insertion around loops
2782 - aco: fix copy+paste error
2783 - aco: fix waitcnts for barriers at block ends
2788 - aco: enable load/store vectorizer
2789 - aco: allow constant offsets for global/scratch instructions on GFX10
2790 - aco: set dlc/glc correctly for image loads
2791 - aco: propagate p_wqm on an image_sample's coordinate p_create_vector
2792 - aco: fix i2i64
2793 - aco: fix incorrect cast in parse_wait_instr()
2794 - aco: add v_nop in between exec write and VMEM/DS/FLAT
2795 - aco: improve WAR hazard workaround with >64bit stores
2796 - aco: fix GFX10 opcodes for some global/flat atomics
2797 - aco: fix assembly of FLAT/GLOBAL atomics
2798 - aco: fix SADDR with FLAT on GFX10
2799 - aco: don't enable store_global for helper invocations
2800 - aco: improve FLAT/GLOBAL scheduling
2801 - aco: implement global atomics
2805 - aco: validate the CFG
2806 - aco: handle loop exit and IF merge phis with break/discard
2807 - aco: fix block_kind_discard s_andn2 definition to exec
2811 - aco/wave32: fix comparison optimizations
2812 - aco: improve jump threading with wave32
2813 - aco: fix vgpr alloc granule with wave32
2814 - aco: limit register usage for large work groups
2815 - aco: set vm for pos0 exports on GFX10
2816 - aco: fix imageSize()/textureSize() with large buffers on GFX8
2817 - aco: fix uninitialized data in the binary
2818 - aco: handle VOP3 modifiers when combining a constant comparison's NaN
2820 - aco: handle omod successors with the constant in the first operand
2821 - aco: check usesModifiers() when identifying a neg/abs
2822 - aco: better handle neg/abs of sgprs
2823 - aco: set exec_potentially_empty for demotes
2824 - aco: don't DCE atomics with return values
2825 - aco: disable add combining for ds_swizzle_b32
2826 - aco: check if multiplication/clamp is live when applying output
2830 - aco: update IR validator
2831 - aco: apply literals to split mads
2832 - aco: combine two sgprs into a VALU if they're the same
2833 - aco: improve can_use_VOP3()
2834 - aco: rewrite literal combining
2835 - aco: rewrite apply_sgprs()
2836 - aco: add check_vop3_operands()
2837 - aco: be more careful with literals in combine_salu_{n2,lshl_add}
2838 - aco: follow through temporary when merging tests into constant
2840 - aco: allow applying two sgprs to an instruction
2841 - aco: allow an extra SGPR with multiple uses to be applied to VOP3
2842 - aco: take advantage of GFX10's constant bus limit and VOP3 literals
2843 - aco: improve creation of v_madmk_f32/v_madak_f32
2844 - aco: fix clamp optimization
2845 - aco: improve clamp optimization
2846 - aco: add min(-max(), ) and max(-min(), ) optimization
2847 - aco: don't move literal to reg when making an instruction VOP3 on
2849 - aco: allow input modifiers on v_cndmask_b32
2850 - aco: replace extract_vector with copies
2851 - aco: improve readfirstlane after uniform LDS loads
2852 - aco: add integer min/max to can_swap_operands
2856 - aco: fix stack buffer overflow in apply_sgprs()
2857 - aco: fix fall-through test in try_remove_simple_block() with
2859 - aco: fix operand kill flags when a temporary is used more than once
2860 - aco: fix off-by-one error when initializing sgpr_live_in
2862 - aco: improve support for s_sendmsg
2863 - radv/aco,aco: implement GS on GFX9+
2864 - aco: implement GS on GFX7-8
2865 - radv/aco: allow ACO for GS
2866 - aco: explicitly mark end blocks for exports
2867 - aco: remove needs_instance_id
2868 - aco: implement GS copy shaders
2869 - radv/aco: use ACO for GS copy shaders
2870 - aco: use nir_move_copies
2871 - aco: fix WaR check for >64-bit FLAT/GLOBAL instructions
2872 - aco: fix operand to scc when selecting SGPR ufind_msb/ifind_msb
2873 - aco: always add sgprs to sgpr_ids when choosing literals
2874 - aco: fix literal application with v_cndmask_b32/v_addc_co_u32/etc
2876 - aco: rework vertex fetching a bit
2877 - aco: skip unused channels at the start when fetching vertices
2878 - aco: handle unaligned vertex fetch on GFX10
2879 - aco: value-number MUBUF instructions
2880 - aco: use MUBUF in some situations instead of splitting vertex fetches
2881 - aco: fix rebase error from GS copy shader support
2882 - aco: ensure predecessors' p_logical_end is in WQM when a p_phi is in
2884 - aco: run p_wqm instructions in WQM
2887 - aco: fix target calculation when vgpr spilling introduces sgpr
2889 - aco: don't consider loop header blocks branch blocks in
2891 - aco: don't update demand in add_coupling_code() for loop headers
2892 - aco: only create parallelcopy to restore exec at loop exit if needed
2893 - aco: don't always add logical edges from continue_break blocks to
2895 - aco: error when block has no logical preds but VGPRs are live at the
2897 - aco: set exec_potentially_empty after continues/breaks in nested IFs
2898 - aco: improve assertion at the end of spiller
2899 - aco: fill reg_demand with sensible information in add_coupling_code()
2900 - aco: parallelcopy exec mask before s_wqm
2901 - aco: fix exec mask consistency issues
2902 - aco: fix gfx10_wave64_bpermute
3098 - aco: drop useless lowering of deref operations for shared memory
3144 - aco: handle nir_intrinsic_image_deref_{load,store} with lod
3177 - aco: fix emitting SMEM instructions with no operands on GFX6-GFX7
3178 - aco: do not select 96-bit/128-bit variants for ds_read/ds_write on
3180 - aco: do not combine additions of DS instructions on GFX6
3181 - aco: implement stream output with vec3 on GFX6
3182 - aco: fix emitting slc for MUBUF instructions on GFX6-GFX7
3183 - aco: print assembly with CLRXdisasm for GFX6-GFX7 if found on the
3185 - aco: fix constant folding of SMRD instructions on GFX6
3186 - aco: do not use the vec3 variant for stores on GFX6
3187 - aco: do not use the vec3 variant for loads on GFX6
3188 - aco: add new addr64 bit to MUBUF instructions on GFX6-GFX7
3189 - aco: implement nir_intrinsic_load_barycentric_at_sample on GFX6
3199 - aco: add support for nir_texop_fragment_{mask}_fetch
3201 - aco: fix printing assembly with CLRXdisasm on GFX6
3202 - aco: fix wrong IR in nir_intrinsic_load_barycentric_at_sample
3203 - aco: implement nir_intrinsic_store_global on GFX6
3204 - aco: implement nir_intrinsic_load_global on GFX6
3205 - aco: implement nir_intrinsic_global_atomic_* on GFX6
3206 - aco: implement 64-bit nir_op_ftrunc on GFX6
3207 - aco: implement 64-bit nir_op_fceil on GFX6
3208 - aco: implement 64-bit nir_op_fround_even on GFX6
3209 - aco: implement 64-bit nir_op_ffloor on GFX6
3210 - aco: implement nir_op_f2i64/nir_op_f2u64 on GFX6
3212 - aco: combine MRTZ (depth, stencil, sample mask) exports
3213 - aco: fix a hardware bug for MRTZ exports on GFX6
3214 - aco: fix a hazard with v_interp_* and v_{read,readfirst}lane_* on
3216 - aco: copy the literal offset of SMEM instructions to a temporary
3232 - aco: implement VK_AMD_shader_explicit_vertex_parameter
3238 - aco: fix VS input loads with MUBUF on GFX6
3243 - aco: fix MUBUF VS input loads when expanding vec3 to vec4 on GFX6
3244 - aco: do not use ds_{read,write}2 on GFX6
3245 - aco: fix waiting for scalar stores before "writing back" data on
3247 - aco: fix creating v_madak if v_mad_f32 has two sgpr literals
3399 - aco: Make sure not to mistakenly propagate 64-bit constants.
3400 - aco: Treat all booleans as per-lane.
3401 - aco: Optimize out trivial code from uniform bools.
3402 - aco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.
3403 - aco: Remove superfluous argument from emit_boolean_logic.
3404 - aco: Remove lower_linear_bool_phi, it is not needed anymore.
3405 - aco: Optimize load_subgroup_id to one bit field extract instruction.
3406 - aco/wave32: Change uniform bool optimization to work with wave32.
3407 - aco/wave32: Replace hardcoded numbers in spiller with wave size.
3408 - aco/wave32: Introduce emit_mbcnt which takes wave size into account.
3409 - aco/wave32: Add wave size specific opcodes to aco_builder.
3410 - aco/wave32: Use lane mask regclass for exec/vcc.
3411 - aco/wave32: Fix load_local_invocation_index to support wave32.
3412 - aco/wave32: Use wave_size for barrier intrinsic.
3413 - aco/wave32: Allow setting the subgroup ballot size to 64-bit.
3414 - aco/wave32: Fix reductions.
3415 - aco: Fix uniform i2i64.
3417 - aco/wave32: Set the definitions of v_cmp instructions to the lane
3419 - aco: Implement 64-bit constant propagation.
3420 - aco: Allow optimizing vote_all and nir_op_iand.
3421 - aco: Don't skip combine_instruction when definitions[1] is used.
3422 - aco: Optimize out s_and with exec, when used on uniform bitwise
3424 - aco: Flip s_cbranch / s_cselect to optimize out an s_not if possible.
3431 - aco: Fix -Wstringop-overflow warnings in aco_span.
3432 - aco: Fix maybe-uninitialized warnings.
3433 - aco: Fix signedness compare warning.
3434 - aco: Make a better guess at which instructions need the VCC hint.
3435 - aco: Transform uniform bitwise instructions to 32-bit if possible.
3436 - aco/gfx10: Fix VcmpxExecWARHazard mitigation.
3437 - aco: Fix the meaning of is_atomic.
3438 - aco/optimizer: Don't combine uniform bool s_and to s_andn2.