Trivial build fix for SIMD_HAS_SUBSCRIPT_GATHER #6

Closed
prlw1 wants to merge 315 commits into GSI-HPC:main from prlw1:patch-1

Conversation

prlw1 commented Jul 17, 2025

No description provided.

... to work without conversion to private base

ChangeLog:

	* simd.h (basic_simd::basic_simd): Use _M_as_base() to convert
	to base class.
	(basic_simd::_M_as_base): New.
	* simd_mask.h (basic_simd_mask::basic_simd_mask): Call
	_Impl::_S_convert directly.
ChangeLog:

	* simd.h: Disable construction from a contiguous range.
ChangeLog:

	* simd.h: Use to_address(it) instead of addressof(*it)
ChangeLog:

	* simd.h (simd::operator<<): New.
	(simd::operator>>): New.
ChangeLog:

	* simd_reductions.h (reduce): New masked overloads.
	Constrain _BinaryOperation to avoid ambiguity with masked
	overload.
ChangeLog:

	* simd.h (simd::copy_to): Implement masked store without use of
	the std::experimental base types.
ChangeLog:

	* detail.h (__detail::__arithmetic): Moved to simd_abi.h.
	(__detail::__vectorizable): Likewise.
	* simd.h (simd::operator[]): Implement directly (without calling
	into _Base). Add special case for _AbiArray.
	* simd_abi.h (__detail::_AbiArray): New.
	(__detail::_SimdImplArray): New.
	(__detail::_MaskImplArray): New (stub).
	(__detail::_DeduceAbi): Prefer _AbiArray over _AbiCombine.
ChangeLog:

	* permute.h (simd_permute): Mark helper lambda as always_inline.
ChangeLog:

	* constexpr_tests.c++: Test __static_range_size.
	* detail.h (__detail::__static_range_size): Remove constraint,
	fix SFINAE, and handle C-arrays.
New shorthand __pv2 for std::experimental::parallelism_v2 and
std::experimental::parallelism_v2::__proposed.

ChangeLog:

	* Makefile: Fix check-skylake-avx512 target name.
	* constexpr_tests.c++: Add new tests. Replace __detail with
	__pv2 scope.
	* constexpr_wrapper.h: New file. Copied from vir-simd. Add
	literals.
	* detail.h: Move __arithmetic and __vectorizable from
	simd_abi.h. Add __pv2 qualification.
	* fwddecl.h: Define __pv2 namespace. Declare basic_simd and
	basic_simd_mask. Declare mask and simd reductions, simd_split,
	and simd_cat.
	* mask_reductions.h: Use __pv2 instead of __detail.
	* simd_split.h: Likewise.
	* simd_reductions.h: Likewise. Remove defaults that are now in
	fwddecl.h.
	* simd.h: Use __pv2 instead of __detail. Don't inherit
	stdx::simd anymore.
	(operator[]): Complete range check/assume. Add new case for
	array _M_data.
	(_M_is_constprop): Add case for _Impl::_S_is_constprop.
	* simd_abi.h: Use __pv2 instead of __detail. More complete
	_AbiArray and implementation. Constrain vectorizable template
	parameters. Pass arrays by const-ref.
	(_SimdImplArray::_S_masked_assign): New.
	(_SimdImplArray::_S_is_constprop): New.
	(_MaskImplArray): New.
	(_SimdTupleMeta): New.
	(_SimdTupleData): New.
	(_SimdTuple): New.
	(_SimdImplAbiCombine): New.
	(_MaskImplAbiCombine::_S_generator): New.
	* simd_mask.h: Use __pv2 instead of __detail.
	(operator[]): Complete range check/assume. Add new case for
	array _M_data.
ChangeLog:

	* Makefile: Add -fconcepts-diagnostics-depth=3.
	* constexpr_tests.c++: Test for random_access_range and not
	output_range.
	* simd.h (simd::begin, simd::end): New.
	* simd_iterator.h: New file.
	* simd_mask.h (simd_mask::begin, simd_mask::end): New.
ChangeLog:

	* constexpr_wrapper.h:
ChangeLog:

	* loads_and_stores/ce.cpp:
Returns an unspecified value if none_of(mask) is true.

ChangeLog:

	* detail.h:
	* mask_reductions.h:
	* mask_reductions/ce.cpp: New file.
ChangeLog:

	* fwddecl.h:
	* mask_reductions.h:
	* mask_reductions/ce.cpp:
ChangeLog:

	* mask_reductions.h:
	* simd_abi.h:
ChangeLog:

	* simd.h:
Update constexpr_wrapper from vir-simd repo.

ChangeLog:

	* constexpr_wrapper.h:
ChangeLog:

	* fwddecl.h (simd_alignment, simd_alignment_v): New.
	* simd.h (simd_alignment): Partial specializations for
	basic_simd and basic_simd_mask.
Copied and modified large parts of the vec-builtin and x86
implementation in <experimental/simd>.

Branching on CPU features now uses conditions derived from template
parameters, not globals. This should make it easier to adopt multi-arch
/ multi-veclen compilation at some point (once the compiler supports
it).
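
The idea of deriving feature branches from template parameters rather than globals might look roughly like the following sketch. `FlagsSse2`, `FlagsAvx2`, and `preferred_vector_bytes` are hypothetical names chosen for illustration; they are not the library's `_MachineFlags` or any of its actual interfaces.

```cpp
// Hypothetical flag bundles (illustrative only, not the library's types).
struct FlagsSse2 { static constexpr bool has_avx2 = false; };
struct FlagsAvx2 { static constexpr bool has_avx2 = true; };

// The branch depends only on the template argument, not on a global macro
// or variable. Instantiations for different flag sets can therefore
// coexist in a single translation unit.
template <typename Flags>
constexpr int preferred_vector_bytes()
{
  if constexpr (Flags::has_avx2)
    return 32;
  else
    return 16;
}

static_assert(preferred_vector_bytes<FlagsSse2>() == 16);
static_assert(preferred_vector_bytes<FlagsAvx2>() == 32);
```

Because the decision is part of the type, code for several targets could in principle be instantiated side by side, which is what multi-arch compilation would need.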

Reduced the number of template and lambda instantiations, to reduce
compile time and space requirements. Notably, iterations on index spaces
now use a single lambda instead of a function template invoking one
lambda specialization per index. Also removing the use of std::invoke
has a *huge* impact.
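
The difference between the two iteration styles can be sketched as follows. All names (`for_each_index`, `add_one_per_index`, `add_one`) are hypothetical and only illustrate the general technique; the library's actual helpers may look different.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>
#include <utility>

// Per-index style: `f` is called with a distinct integral_constant type
// for every index, so a generic lambda passed as `f` gets one call-operator
// specialization per index.
template <std::size_t N, typename F>
constexpr void for_each_index(F&& f)
{
  [&]<std::size_t... Is>(std::index_sequence<Is...>) {
    (f(std::integral_constant<std::size_t, Is>{}), ...);
  }(std::make_index_sequence<N>{});
}

template <std::size_t N>
constexpr std::array<int, N> add_one_per_index(const std::array<int, N>& in)
{
  std::array<int, N> out{};
  for_each_index<N>([&](auto i) { out[i] = in[i] + 1; });
  return out;
}

// Single-lambda style: one lambda receives the whole index pack and expands
// it with a fold expression, so only one call operator is instantiated.
template <std::size_t N>
constexpr std::array<int, N> add_one(const std::array<int, N>& in)
{
  std::array<int, N> out{};
  [&]<std::size_t... Is>(std::index_sequence<Is...>) {
    ((out[Is] = in[Is] + 1), ...);
  }(std::make_index_sequence<N>{});
  return out;
}

static_assert(add_one(std::array<int, 3>{1, 2, 3})[2] == 4);
static_assert(add_one_per_index(std::array<int, 3>{1, 2, 3})[2] == 4);
```

Both variants compute the same result; they differ only in how many template and lambda specializations the compiler has to create.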

ChangeLog:

	* Makefile: Add more tests and compile them in many different
	configurations.
	* arm_detail.h: New file.
	* constexpr_tests.c++: Add another test for size 7.
	* constexpr_wrapper.h: Update copyright.
	* detail.h: Copy and adjust several functions from
	experimental/simd.
	Add _FloatingPointFlags, _MachineFlags, and _BuildFlags /
	__build_flags.
	Explicitly list the vectorizable types. Preliminary support for
	std::float(16|32|64)_t.
	* detail_bitmask.h: New file.
	* flags.h: Add TS-like load/store flags.
	* fwddecl.h: Remove TS usage. Add _VecAbi, _Avx512Abi, and
	_ScalarAbi.
	* interleave.h: Update copyright.
	* iota.h: Update copyright. Reduce lambda instantiations.
	* mask_reductions.h: Move x86 specific code into simd_x86.h.
	Call into Abi implementation types when available.
	* permute.h: Update copyright. Replace always_inline macro.
	* power_detail.h: New file.
	* simd.h: Add missing unary operators +, -, ~, ++, --.
	Constrain all operators on using the same operator on the
	value-type.
	* simd_abi.h: Implement ABI tag deduction.
	* simd_builtin.h: New file.
	* simd_config.h: New file.
	* simd_converter.h: New file.
	* simd_iterator.h: Update copyright.
	* simd_mask.h: Remove base class and implement ABI-specific /
	implementation-defined conversions.
	* simd_reductions.h: Move x86 optimization into simd_x86.h.
	* simd_scalar.h: New file.
	* simd_split.h: Remove <experimental/simd> dependency.
	* simd_x86.h: New file.
	* tests/misc.cpp: New file.
	* tests/shift_left.cpp: New file.
	* tests/shift_right.cpp: New file.
	* unittest.h: New file.
	* x86_detail.h: Add _MachineFlags, __x86_builtin_int,
	__to_x86_intrin, and __movmsk.
ChangeLog:

	* Makefile: Generate all check targets and help targets without
	using a shell using the foreach and eval make functions. Call a
	submake in the check recipes with a randomized set of concrete
	checks to run.
Change _VecAbi and _Avx512Abi to use number of elements as template
parameter instead of number of bytes. This simplifies _S_size,
_S_full_size, and _S_is_partial, which don't need to be templates
anymore. More importantly, it removes the need for passing the
value-type as a template parameter to some of the Impl functions.
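
A simplified sketch of why an element-count parameter removes the need for templated size members. `AbiByBytes` and `AbiByWidth` are hypothetical stand-ins, not the real `_VecAbi` or `_Avx512Abi`, which have many more members.

```cpp
// Byte-based tag (hypothetical): the number of elements depends on the
// value type, so `size` has to be a variable template.
template <int Bytes>
struct AbiByBytes
{
  template <typename T>
  static constexpr int size = Bytes / static_cast<int>(sizeof(T));
};

// Width-based tag (hypothetical): the number of elements is the template
// parameter itself, so `size` is a plain constant and the value type does
// not need to be passed along.
template <int Width>
struct AbiByWidth
{
  static constexpr int size = Width;
};

static_assert(AbiByBytes<16>::size<float> == 4);
static_assert(AbiByWidth<4>::size == 4);
```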

Remove unused _SimdBase and _MaskBase.

Have _SimdTraits depend on build flags, adding a new _SimdMaskTraits to
work out the right ABI for AVX w/o AVX2.

ChangeLog:

	* arm_detail.h: Include detail.h.
	* constexpr_tests.c++: Add sanity checks relating to AVX w/o
	AVX2.
	* detail.h (__make_dependent): New.
	(_SimdTraits): Add __build_flags template argument.
	(_SimdMaskTraits): New.
	* fwddecl.h (_VecAbi, _Avx512Abi): Change template parameter
	name.
	(__native_abi_impl_recursive): Adjust for change from bytes to
	width.
	(_DeduceAbi): Move default definition to simd_abi.h.
	* power_detail.h: Include detail.h.
	* simd.h (simd::mask_type): Test for void not vectorizable to
	document intent.
	(basic_simd(basic_simd_mask) deduction guide): Defer ABI tag
	deduction to __simd_abi_for_mask trait.
	* simd_abi.h: Adjust for ABI tag template parameter change from
	bytes to width.
	(_AbiCombine): Move from std::__detail to std namespace.
	(_SimdImplArray::_S_masked_assign): Handle some AVX w/o AVX2
	cases.
	* simd_builtin.h: Adjust for ABI tag template parameter change
	from bytes to width.
	* simd_mask.h (__simd_abi_for_mask): New.
	* simd_scalar.h: Remove template heads from _S_size,
	_S_full_size, and _S_is_partial. Remove _SimdBase, _MaskBase.
	* simd_x86.h: Adjust for ABI tag template parameter change.
	(_SimdMaskTraits): Specialize for the AVX w/o AVX2 case.
	(_ImplBuiltin::_S_load): Overload for AVX512 bitmasks.
	(_ImplBuiltin::_S_select_bitmask): Swap argument names. Enable
	constexpr eval.
	(_ImplBuiltin::_S_select): Enable constexpr eval. Disambiguate
	overloads when _MaskMember<_TV> is a bitmask type.
	(_ImplBuiltin::_S_masked_assign): Broadcast scalar argument to a
	vector when calling _S_select_bitmask.
	(_ImplBuiltin::_S_bit_and, _S_bit_or, _S_bit_xor, _S_to_bits):
	Overload for bitmasks.
	(_ImplBuiltin::_S_bit_shift_right, _S_bit_shift_left): Replace
	several reinterpret_cast with __vec_bitcast_trunc on return.
	Adjust ABI/Impl type needed after template parameter change.
	Fix conditions for sizeof<16 inputs.
	(_ImplBuiltin::_S_ldexp): Use __make_dependent to instantiate
	_Rebind only on use of _S_ldexp. Start of supporting sizeof<16
	inputs.
	* unittest.h (instantiate_tests_for_value_type): Sanity check
	that if simd<T, N> is usable, then the corresponding mask is
	also usable.
	* x86_detail.h (__x86_builtin_fp): New.
	(__to_x86_intrin): Normalize floating-point types using
	__x86_builtin_fp.
60k check targets:
- 0.6s for 'make help', listing all 60k targets
- 0.5s for 'make debug', parsing the whole Makefile and a bit of output

All check targets are shuffled differently on every make invocation
without significant overhead. The 'check' target works without sub-make,
whereas all the other check-% targets recurse once (which might become a
problem with too long command lines).

Per -march, a header is generated, a PCH is built from it, and the
header is automatically included into the builds.

ChangeLog:

	* Makefile: Rewrite.
ChangeLog:

	* Makefile: Remove stale help/% target. Accommodate new
	unittest.h location. Remove -I. flag. Fix required check target.
	* tests/misc.cpp: Adjust unittest.h include.
	* tests/shift_left.cpp: Likewise.
	* tests/shift_right.cpp: Likewise.
	* unittest.h: Renamed to tests/unittest.h.
ChangeLog:

	* Makefile: Require 0 failed tests or fail with non-zero exit
	status.
Use icerun for running tests to enable -j<large number>.

ChangeLog:

	* Makefile: Run tests in icerun. Compile and link in one step if
	DIRECT is non-empty. Document DIRECT in help target.
	Unconditionally set DIRECT=1 without icecream. Build without
	icecream wrapper but with icerun when DIRECT is non-empty.
mattkretz and others added 27 commits March 8, 2025 22:32
ChangeLog:

	* Makefile.common:
ChangeLog:

	* .github/workflows/CI.yml: Renamed to Clang.yml.
	* .github/workflows/GCC.yml: New file.
	* README.md:
ChangeLog:

	* Makefile.common: Remove std::byte from testtypes.
	* README.md: Document SIMD_STD_BYTE.
	* bits/fwddecl.h (__is_vectorizable): Enable specialization for
	std::byte.
	* bits/simd.h (basic_simd): The conversion constructor needs to
	check for a function-style cast rather than constructible_from
	for enums.
	(operator[]): Use static_cast rather than relying on implicit
	conversion (needed for std::byte).
	* bits/simd_config.h: Add SIMD_STD_BYTE.
	* bits/simd_meta.h (__loadstore_convertible_to): For std::byte
	this needs to check for explicit rather than implicit
	conversion.
	* bits/vec_detail.h (__canonical_vec_type): Add partial
	specialization for all enums, mapping to their underlying_type.
	* tests/arithmetic.cpp: Constrain tests.
	* tests/misc.cpp: Likewise.
	* tests/simd_alg.cpp: These tests need to work for std::byte.
	But to setup the test values, no arithmetic operators can be
	used anymore.
	* tests/unittest.h: Add std::byte.
	* tests/unittest_pch.h (test_iota_max): Reduce
	numeric_limits::max if Max NTTP is negative. Add partial
	specialization for enums.
	(test_iota): Pass negative Max NTTP on to test_iota_max.
ChangeLog:

	* bits/simd_x86.h (_SimdConverter): Constrain to different _From
	and _To types.
... which was used for debugging the test.

ChangeLog:

	* tests/shift_right.cpp (constant_p): Remove.
ChangeLog:

	* README.md:
ChangeLog:

	* README.md: Document iota, permute, and simd_mask <->
	bitset/int paper implementation status.
	* bits/fwddecl.h (zero_element, uninit_element): New.
	* bits/permute.h (permute_zero): Remove.
	(permute): Refactor to call _S_permute.
	* bits/simd_builtin.h (_S_permute): New.
	* bits/simd_meta.h (__index_permutation_function_size): Change
	size argument type to int.
	* bits/simd_scalar.h (_S_permute): New.
ChangeLog:

	* bits/fwddecl.h: Move math exposition-only concepts and traits
	here from simd_math.h for declarations of math functions.
	(isfinite, isunordered): Declare.
	* bits/simd_math.h (isfinite, isunordered): New.
	* bits/mask_reductions.h (all_of, any_of, none_of): Call ABI
	implementation functions with _M_data member instead of
	basic_simd_mask object.
	* bits/simd_abi.h (_S_any_of, _S_all_of, _S_none_of): Change
	interface from basic_simd_mask to _MaskMember.
	* bits/simd_builtin.h (_S_any_of, _S_all_of, _S_none_of):
	Likewise.
	* bits/simd_scalar.h (_S_any_of, _S_all_of, _S_none_of):
	Likewise.
	* bits/simd_x86.h (_S_any_of, _S_all_of, _S_none_of): Likewise.
	(_S_divides): Use masked divp[hsd] with AVX10/AVX512 on partial
	vectors.
	(_S_isnan): Implement consteval/constprop variant for
	_Avx512Abi.
ChangeLog:

	* bits/fwddecl.h (__deduce_t): Use __canonical_vec_type to
	reduce the number of possible _DeduceAbi specializations. Add
	_PrefAbi argument.
	* bits/vec_detail.h (__canonical_vec_type): Moved to fwddecl.h.
	* bits/simd.h (rebind, resize): Determine new ABI tag using
	_Rebind on ingoing ABI tag.
	* bits/simd_abi.h (_AbiList::_S_A0_is_valid): Take
	_S_defer_to_scalar_abi into consideration.
	(_Rebind): New.
	(_AbiMaxSize): New.
	(__is_valid_preferred_abi): New.
	(_DeduceAbi): Add _PrefAbi argument. Change existing partial
	specializations to trigger on _NoAbiPreference. Add partial
	specialization for given _PrefAbi.
	* bits/simd_builtin.h (_S_defer_to_scalar_abi): New.
	(_Rebind): Simplify to __deduce_t with _VecAbi.
	* bits/simd_scalar.h (_S_defer_to_scalar_abi, _Rebind): New.
	* bits/simd_x86.h (_S_defer_to_scalar_abi): New.
	(_Rebind): Simplify to __deduce_t with _Avx512Abi.
	* constexpr_tests.c++: Add _AbiMaxSize and ABI deduction tests.
ChangeLog:

	* bits/simd.h: Add a reason to deleted functions.
	* bits/simd_mask.h: Likewise.
ChangeLog:

	* bits/simd_abi.h (_SimdTuple): Change integral_constants into
	plain constexpr ints.
Recently __has_single_bit, __bit_ceil, __bit_floor, etc. have started to
require unsigned integers, rejecting int. Therefore, cast to unsigned
where it was called with a signed int.

Fixes GSI-HPCgh-1

ChangeLog:

	* bits/detail.h:
	* bits/mask_reductions.h:
	* bits/simd_abi.h:
	* bits/simd_builtin.h:
	* bits/simd_converter.h:
	* bits/simd_reductions.h:
	* bits/simd_x86.h:
	* bits/vec_detail.h (__glibcxx_simd_erroneous_unless): New.
	(__is_power2_minus_1): Moved from detail.h and rewritten to not
	use bit functions.
	(__signed_has_single_bit, __signed_bit_ceil): New.
	(__signed_bit_floor): New.
	* constexpr_tests.c++:
ChangeLog:

	* Makefile.common: Pass -p to mkdir.
ChangeLog:

	* tests/unittest_pch.h: Avoid wrap-around to negative from
	numeric_limits<long long>::max().
ChangeLog:

	* bits/simd.h: Use _GLIBCXX_DELETE_MSG instead of delete.
	* bits/simd_mask.h: Likewise.
This should make the warning go away for users that have warnings in
system headers enabled.

ChangeLog:

	* Makefile.common: Remove -Wno-psabi from CXXFLAGS.
	* bits/simd_x86.h: Ignore -Wpsabi diagnostic.
	* bits/vec_detail.h: Likewise.
ChangeLog:

	* .github/workflows/Clang.yml: Run on cplusplus-ci:latest.
	* .github/workflows/GCC.yml: Likewise.
	* .github/workflows/build-ci-docker.yml: New file.
	* Dockerfile: New file.
ChangeLog:

	* .github/workflows/Clang.yml: Adjust container image URL.
	* .github/workflows/GCC.yml: Likewise.
	* .github/workflows/build-ci-docker.yml: Removed.
	* Dockerfile: Removed.
ChangeLog:

	* .github/workflows/GCC.yml:
ChangeLog:

	* Makefile:
	* README.md:
ChangeLog:

	* Makefile.common: Determine obj directory from SIMD_OBJ_SUBST
	if defined.
ChangeLog:

	* bits/simd_x86.h (_S_divides): Add initial branch for consteval
	and constprop => never call __builtin_ia32_... in those cases.
ChangeLog:

	* bits/simd_builtin.h (_S_permute): Rewrite using
	integral_constants instead of plain int.
ChangeLog:

	* Makefile.common: Replace obj with $(objdir).
	* Makefile.more: Ditto.
	* Makefile: Ditto.
ChangeLog:

	* Makefile.common:
	* Makefile:
@mattkretz
Collaborator

Thank you for that fix. At this point the main branch and its derived branches are unlikely to survive. I'm focusing on a different architecture (see the 'rewrite' branch).
