20.0.0.rst - OpenGrok cross reference for /external/mesa3d/docs/relnotes/20.0.0.rst

Lines Matching refs:aco
62 -  aco: sun flickering with Assassin's Creed Origins
64 -  aco: wrong geometry with Assassins Creed Origins on GFX6
78 -  aco: implement GFX6 support
85 -  aco: Dead Rising 4 crashes in lower_to_hw_instr() on GFX6-GFX7
92 -  [Navi/aco] Guild Wars 2 - ring gfx timeout with commit 3bca0af2
93 -  [radv/aco] Regression is causing a soft crash in The Witcher 3
158 -  radv/aco Jedi Fallen Order hair rendering buggy
693 -  aco: Constify radv_nir_compiler_options in isel
694 -  aco: Use radv_shader_args in aco_compile_shader()
695 -  aco: Split vector arguments at the beginning
696 -  aco: Make num_workgroups and local_invocation_ids one argument each
698 -  aco: Use common argument handling
699 -  aco: Make unused workgroup id's 0
715 -  aco: fix immediate offset for spills if scratch is used
716 -  aco: only use single-dword loads/stores for spilling
715 -  aco: fix accidental reordering of instructions when scheduling
718 -  aco: workaround Tonga/Iceland hardware bug
719 -  aco: fix invalid access on Pseudo_instructions
720 -  aco: preserve kill flag on moved operands during RA
721 -  aco: rematerialize s_movk instructions
722 -  aco: check if SALU instructions are preceded by exec when
724 -  aco: value number instructions using the execution mask
725 -  aco: use s_and_b64 exec to reduce uniform booleans to one bit
728 -  aco: don't value-number instructions from within a loop with ones
730 -  aco: don't split live-ranges of linear VGPRs
731 -  aco: fix a couple of value numbering issues
732 -  aco: refactor visit_store_fs_output() to use the Builder
733 -  aco: Initial GFX7 Support
734 -  aco: SI/CI - fix sampler aniso
735 -  aco: fix SMEM offsets for SI/CI
736 -  aco: implement nir_op_fquantize2f16 for SI/CI
737 -  aco: only use scalar loads for readonly buffers on SI/CI
738 -  aco: implement nir_op_isign on SI/CI
739 -  aco: move buffer_store data to VGPR if needed
740 -  aco: implement quad swizzles for SI/CI
741 -  aco: recognize SI/CI SMRD hazards
742 -  aco: fix disassembly of writelane instructions.
743 -  aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
744 -  aco: implement 64bit VGPR shifts for SI/CI
745 -  aco: make 1/2*PI a literal constant on SI/CI
746 -  aco: implement 64bit i2b for SI/CI
747 -  aco: implement 64bit ine/ieq for SI/CI
748 -  aco: disable disassembly for SI/CI due to lack of support by LLVM
750 -  aco: flush denorms after fmin/fmax on pre-GFX9
751 -  aco: don't use a scalar temporary for reductions on GFX10
752 -  aco: implement (clustered) reductions for SI/CI
753 -  aco: implement inclusive_scan for SI/CI
754 -  aco: implement exclusive scan for SI/CI
756 -  aco: return to loop_active mask at continue_or_break blocks
758 -  aco: use soffset for MUBUF instructions on SI/CI
759 -  aco: improve readfirstlane after uniform ssbo loads on GFX7
760 -  aco: propagate temporaries into expanded vectors
762 -  aco: compact various Instruction classes
763 -  aco: compact aco::span<T> to use uint16_t offset and size instead of
765 -  aco: fix unconditional demote_to_helper
766 -  aco: rework lower_to_cssa()
767 -  aco: handle phi affinities transitively through parallelcopies
768 -  aco: ignore parallelcopies to the same register on jump threading
769 -  aco: fix combine_salu_not_bitwise() when SCC is used
770 -  aco: reorder VMEM operands in ACO IR
771 -  aco: fix register allocation with multiple live-range splits
772 -  aco: simplify adjust_sample_index_using_fmask() & get_image_coords()
773 -  aco: simplify gathering of MIMG address components
775 -  aco: fix image_atomic_cmp_swap
831 -  aco: handle gfx7 int8/10 clamping on exports
1922 -  aco: use NIR_MAX_VEC_COMPONENTS instead of 4
2524 -  android: aco: fix Lower to CSSA
2760 -  aco: add Instruction::usesModifiers() and add more checks in the
2763 -  aco: use DPP instead of exec modification when lowering GFX10
2765 -  aco: fix shuffle with uniform operands
2767 -  aco: fix read_invocation with VGPR lane index
2768 -  aco: don't propagate vgprs into v_readlane/v_writelane
2769 -  aco: combine read_invocation and shuffle implementations
2771 -  aco: don't combine literals into v_cndmask_b32/v_subb/v_addc
2772 -  aco: fix 64-bit fsign with 0
2773 -  aco: implement VK_KHR_shader_float_controls
2774 -  aco: refactor reduction lowering helpers
2775 -  aco: implement 64-bit integer reductions
2776 -  radv/aco: enable VK_KHR_shader_subgroup_extended_types
2781 -  aco: improve waitcnt insertion around loops
2782 -  aco: fix copy+paste error
2783 -  aco: fix waitcnts for barriers at block ends
2788 -  aco: enable load/store vectorizer
2789 -  aco: allow constant offsets for global/scratch instructions on GFX10
2790 -  aco: set dlc/glc correctly for image loads
2791 -  aco: propagate p_wqm on an image_sample's coordinate p_create_vector
2792 -  aco: fix i2i64
2793 -  aco: fix incorrect cast in parse_wait_instr()
2794 -  aco: add v_nop inbetween exec write and VMEM/DS/FLAT
2795 -  aco: improve WAR hazard workaround with >64bit stores
2796 -  aco: fix GFX10 opcodes for some global/flat atomics
2797 -  aco: fix assembly of FLAT/GLOBAL atomics
2798 -  aco: fix SADDR with FLAT on GFX10
2799 -  aco: don't enable store_global for helper invocations
2800 -  aco: improve FLAT/GLOBAL scheduling
2801 -  aco: implement global atomics
2805 -  aco: validate the CFG
2806 -  aco: handle loop exit and IF merge phis with break/discard
2807 -  aco: fix block_kind_discard s_andn2 definition to exec
2811 -  aco/wave32: fix comparison optimizations
2812 -  aco: improve jump threading with wave32
2813 -  aco: fix vgpr alloc granule with wave32
2814 -  aco: limit register usage for large work groups
2815 -  aco: set vm for pos0 exports on GFX10
2816 -  aco: fix imageSize()/textureSize() with large buffers on GFX8
2817 -  aco: fix uninitialized data in the binary
2818 -  aco: handle VOP3 modifiers when combining a constant comparison's NaN
2820 -  aco: handle omod successors with the constant in the first operand
2821 -  aco: check usesModifiers() when identifying a neg/abs
2822 -  aco: better handle neg/abs of sgprs
2823 -  aco: set exec_potentially_empty for demotes
2824 -  aco: don't DCE atomics with return values
2825 -  aco: disable add combining for ds_swizzle_b32
2826 -  aco: check if multiplication/clamp is live when applying output
2830 -  aco: update IR validator
2831 -  aco: apply literals to split mads
2832 -  aco: combine two sgprs into a VALU if they're the same
2833 -  aco: improve can_use_VOP3()
2834 -  aco: rewrite literal combining
2835 -  aco: rewrite apply_sgprs()
2836 -  aco: add check_vop3_operands()
2837 -  aco: be more careful with literals in combine_salu_{n2,lshl_add}
2838 -  aco: follow through temporary when merging tests into constant
2840 -  aco: allow applying two sgprs to an instruction
2841 -  aco: allow an extra SGPR with multiple uses to be applied to VOP3
2842 -  aco: take advantage of GFX10's constant bus limit and VOP3 literals
2843 -  aco: improve creation of v_madmk_f32/v_madak_f32
2844 -  aco: fix clamp optimization
2845 -  aco: improve clamp optimization
2846 -  aco: add min(-max(), ) and max(-min(), ) optimization
2847 -  aco: don't move literal to reg when making an instruction VOP3 on
2849 -  aco: allow input modifiers on v_cndmask_b32
2850 -  aco: replace extract_vector with copies
2851 -  aco: improve readfirstlane after uniform LDS loads
2852 -  aco: add integer min/max to can_swap_operands
2856 -  aco: fix stack buffer overflow in apply_sgprs()
2857 -  aco: fix fall-through test in try_remove_simple_block() with
2859 -  aco: fix operand kill flags when a temporary is used more than once
2860 -  aco: fix off-by-one error when initializing sgpr_live_in
2862 -  aco: improve support for s_sendmsg
2863 -  radv/aco,aco: implement GS on GFX9+
2864 -  aco: implement GS on GFX7-8
2865 -  radv/aco: allow ACO for GS
2866 -  aco: explicitly mark end blocks for exports
2867 -  aco: remove needs_instance_id
2868 -  aco: implement GS copy shaders
2869 -  radv/aco: use ACO for GS copy shaders
2870 -  aco: use nir_move_copies
2871 -  aco: fix WaR check for >64-bit FLAT/GLOBAL instructions
2872 -  aco: fix operand to scc when selecting SGPR ufind_msb/ifind_msb
2873 -  aco: always add sgprs to sgpr_ids when choosing literals
2874 -  aco: fix literal application with v_cndmask_b32/v_addc_co_u32/etc
2876 -  aco: rework vertex fetching a bit
2877 -  aco: skip unused channels at the start when fetching vertices
2878 -  aco: handle unaligned vertex fetch on GFX10
2879 -  aco: value-number MUBUF instructions
2880 -  aco: use MUBUF in some situations instead of splitting vertex fetches
2881 -  aco: fix rebase error from GS copy shader support
2882 -  aco: ensure predecessors' p_logical_end is in WQM when a p_phi is in
2884 -  aco: run p_wqm instructions in WQM
2887 -  aco: fix target calculation when vgpr spilling introduces sgpr
2889 -  aco: don't consider loop header blocks branch blocks in
2891 -  aco: don't update demand in add_coupling_code() for loop headers
2892 -  aco: only create parallelcopy to restore exec at loop exit if needed
2893 -  aco: don't always add logical edges from continue_break blocks to
2895 -  aco: error when block has no logical preds but VGPRs are live at the
2897 -  aco: set exec_potentially_empty after continues/breaks in nested IFs
2898 -  aco: improve assertion at the end of spiller
2899 -  aco: fill reg_demand with sensible information in add_coupling_code()
2900 -  aco: parallelcopy exec mask before s_wqm
2901 -  aco: fix exec mask consistency issues
2902 -  aco: fix gfx10_wave64_bpermute
3098 -  aco: drop useless lowering of deref operations for shared memory
3144 -  aco: handle nir_intrinsic_image_deref_{load,store} with lod
3177 -  aco: fix emitting SMEM instructions with no operands on GFX6-GFX7
3178 -  aco: do not select 96-bit/128-bit variants for ds_read/ds_write on
3180 -  aco: do not combine additions of DS instructions on GFX6
3181 -  aco: implement stream output with vec3 on GFX6
3182 -  aco: fix emitting slc for MUBUF instructions on GFX6-GFX7
3183 -  aco: print assembly with CLRXdisasm for GFX6-GFX7 if found on the
3185 -  aco: fix constant folding of SMRD instructions on GFX6
3186 -  aco: do not use the vec3 variant for stores on GFX6
3187 -  aco: do not use the vec3 variant for loads on GFX6
3188 -  aco: add new addr64 bit to MUBUF instructions on GFX6-GFX7
3189 -  aco: implement nir_intrinsic_load_barycentric_at_sample on GFX6
3199 -  aco: add support for nir_texop_fragment_{mask}_fetch
3201 -  aco: fix printing assembly with CLRXdisasm on GFX6
3202 -  aco: fix wrong IR in nir_intrinsic_load_barycentric_at_sample
3203 -  aco: implement nir_intrinsic_store_global on GFX6
3204 -  aco: implement nir_intrinsic_load_global on GFX6
3205 -  aco: implement nir_intrinsic_global_atomic\_\* on GFX6
3206 -  aco: implement 64-bit nir_op_ftrunc on GFX6
3207 -  aco: implement 64-bit nir_op_fceil on GFX6
3208 -  aco: implement 64-bit nir_op_fround_even on GFX6
3209 -  aco: implement 64-bit nir_op_ffloor on GFX6
3210 -  aco: implement nir_op_f2i64/nir_op_f2u64 on GFX6
3212 -  aco: combine MRTZ (depth, stencil, sample mask) exports
3213 -  aco: fix a hardware bug for MRTZ exports on GFX6
3214 -  aco: fix a hazard with v_interp\_\* and v_{read,readfirst}lane\_\* on
3216 -  aco: copy the literal offset of SMEM instructions to a temporary
3232 -  aco: implement VK_AMD_shader_explicit_vertex_parameter
3238 -  aco: fix VS input loads with MUBUF on GFX6
3243 -  aco: fix MUBUF VS input loads when expanding vec3 to vec4 on GFX6
3244 -  aco: do not use ds_{read,write}2 on GFX6
3245 -  aco: fix waiting for scalar stores before "writing back" data on
3247 -  aco: fix creating v_madak if v_mad_f32 has two sgpr literals
3399 -  aco: Make sure not to mistakenly propagate 64-bit constants.
3400 -  aco: Treat all booleans as per-lane.
3401 -  aco: Optimize out trivial code from uniform bools.
3402 -  aco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.
3403 -  aco: Remove superfluous argument from emit_boolean_logic.
3404 -  aco: Remove lower_linear_bool_phi, it is not needed anymore.
3405 -  aco: Optimize load_subgroup_id to one bit field extract instruction.
3406 -  aco/wave32: Change uniform bool optimization to work with wave32.
3407 -  aco/wave32: Replace hardcoded numbers in spiller with wave size.
3408 -  aco/wave32: Introduce emit_mbcnt which takes wave size into account.
3409 -  aco/wave32: Add wave size specific opcodes to aco_builder.
3410 -  aco/wave32: Use lane mask regclass for exec/vcc.
3411 -  aco/wave32: Fix load_local_invocation_index to support wave32.
3412 -  aco/wave32: Use wave_size for barrier intrinsic.
3413 -  aco/wave32: Allow setting the subgroup ballot size to 64-bit.
3414 -  aco/wave32: Fix reductions.
3415 -  aco: Fix uniform i2i64.
3417 -  aco/wave32: Set the definitions of v_cmp instructions to the lane
3419 -  aco: Implement 64-bit constant propagation.
3420 -  aco: Allow optimizing vote_all and nir_op_iand.
3421 -  aco: Don't skip combine_instruction when definitions[1] is used.
3422 -  aco: Optimize out s_and with exec, when used on uniform bitwise
3424 -  aco: Flip s_cbranch / s_cselect to optimize out an s_not if possible.
3431 -  aco: Fix -Wstringop-overflow warnings in aco_span.
3432 -  aco: Fix maybe-uninitialized warnings.
3433 -  aco: Fix signedness compare warning.
3434 -  aco: Make a better guess at which instructions need the VCC hint.
3435 -  aco: Transform uniform bitwise instructions to 32-bit if possible.
3436 -  aco/gfx10: Fix VcmpxExecWARHazard mitigation.
3437 -  aco: Fix the meaning of is_atomic.
3438 -  aco/optimizer: Don't combine uniform bool s_and to s_andn2.