Math: FFT: Optimizations with Xtensa HiFi code by singalsu · Pull Request #10637 · thesofproject/sof

singalsu · 2026-03-20T15:42:53Z

No description provided.

Copilot

Pull request overview

This PR refactors the multi-radix FFT implementation by moving the generic multi-FFT logic out of fft_multi.c and introducing a HiFi3-optimized implementation using Xtensa intrinsics, plus a small HiFi3 FFT kernel optimization.

Changes:

Split dft3_32() and fft_multi_execute_32() out of fft_multi.c into new generic and HiFi3-specific source files.
Add a new HiFi3-optimized multi-FFT implementation (fft_multi_hifi3.c) using packed complex arithmetic and fused MAC operations.
Optimize the HiFi3 32-bit FFT kernel by special-casing the first stage and skipping the twiddle multiply for j=0.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
test/cmocka/src/math/fft/CMakeLists.txt	Updates unit test build inputs to compile the new split multi-FFT sources.
src/math/fft/fft_multi_hifi3.c	Adds HiFi3-optimized `dft3_32()` and `fft_multi_execute_32()` implementations.
src/math/fft/fft_multi_generic.c	Adds the generic `dft3_32()` and `fft_multi_execute_32()` implementations previously in `fft_multi.c`.
src/math/fft/fft_multi.c	Removes `dft3_32()`/`fft_multi_execute_32()` from this file, leaving plan allocation/free + twiddle table inclusion.
src/math/fft/fft_32_hifi3.c	Optimizes HiFi3 FFT stage execution (skip twiddle multiply in first stage and for `j=0`).
src/math/fft/CMakeLists.txt	Adds the new multi-FFT source files to the build when `CONFIG_MATH_FFT_MULTI` is enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/math/fft/fft_multi_generic.c

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/math/fft/fft_multi_generic.c

src/math/fft/fft_multi_hifi3.c

lyakh · 2026-03-23T09:14:26Z

src/math/fft/fft_32_hifi3.c

 			}
+			/* advance pointers past current group's bottom half */
+			top_ptr += n;
+			bot_ptr += n;


just to check - is this really correct? top_ptr and bot_ptr are incremented n times inside the loop, and now n is added to them again?

Yes it is correct, for the loop of k, the top is incremented by m. In the pointer arithmetic version the loop increments them one by one to n. But since n is m/2, need to increment other n.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/math/fft/fft_multi_generic.c

src/math/fft/fft_multi_hifi3.c

Copilot · 2026-03-26T17:55:43Z

src/math/fft/fft_32_hifi3.c

+				sample1 = tw_r[0];
+				sample2 = tw_i[0];


Twiddle loading here assigns int32_t values directly into ae_int32x2 temporaries before packing with AE_SEL32_LH(). In other HiFi3 code paths (e.g. fft_multi_hifi3.c) scalar-to-vector uses AE_MOVDA32() to avoid relying on implicit conversions / lane contents. Please convert tw_r[0]/tw_i[0] using AE_MOVDA32() (or an equivalent intrinsic) before packing, so the real/imag lanes are well-defined and the complex multiply uses the intended coefficients.

Suggested change

sample1 = tw_r[0];

sample2 = tw_i[0];

sample1 = AE_MOVDA32(tw_r[0]);

sample2 = AE_MOVDA32(tw_i[0]);

I like the cleaner look of trivial operations like "=", adding those AE_MOVDA32() is same MCPS and probably same code, but true that it's defined in the manual to know what exactly happens without guessing.

Using now AE_MOVDA32X2().

lgirdwood

Nice MCPS saving, good for me, but pls respond to copilot.

src/math/fft/fft_32_hifi3.c

This patch optimizes the cycle count of the radix-2 Cooley-Tukey implementation with three changes: - Dedicated depth-1 stage: all N/2 butterflies use a real twiddle factor W^0 = 1+0j, so the complex multiply is replaced by plain add or subtract. - Skip multiply for j=0 in stages >= 2: The first butterfly in every group also uses W^0, saving an additional ~N/2 complex multiplications across all remaining stages. - Pointer arithmetic: replace per-butterfly index arithmetic (outx[k+j], outx[k+j+n], twiddle[i*j]) with auto-incrementing pointers and strided twiddle access (tw_r += stride), eliminating integer multiplies for address computation. This change saves 11 MCPS (from 74 MCPS to 63 MCPS) in STFT Process module in MTL platform with 1024/256 size/hop FFT processing. It was tested with scripts: scripts/rebuild-testbench.sh -p mtl scripts/sof-testbench-helper.sh -x -m stft_process_1024_256_ \ -p profile-stft_process.txt Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

This patch adds HiFi3 versions for functions dft3_32() and fft_multi_execute_32(). The functions are implemented to fft_multi_hifi3.c and the generic versions are moved to fft_multi_generic.c. in MTL platform the optimization saves 119 MCPS, from 237 MCPS to 118 MCPS. The test was done with script run: scripts/rebuild-testbench.sh -p mtl scripts/sof-testbench-helper.sh -x -m stft_process_1536_240_ \ -p profile-stft_process.txt The above STFT used FFT length of 1536 with hop 240. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-27T17:55:40Z

src/math/fft/fft_multi_generic.c

+/* Twiddle factor tables defined in fft_multi.c via twiddle_3072_32.h */
+#define FFT_MULTI_TWIDDLE_SIZE 2048
+extern const int32_t multi_twiddle_real_32[];
+extern const int32_t multi_twiddle_imag_32[];
+
+/* Constants for size 3 DFT */
+#define DFT3_COEFR -1073741824 /* int32(-0.5 * 2^31) */
+#define DFT3_COEFI 1859775393  /* int32(sqrt(3) / 2 * 2^31) */
+#define DFT3_SCALE 715827883   /* int32(1/3*2^31) */


FFT_MULTI_TWIDDLE_SIZE is hard-coded here (2048) even though the authoritative value is already defined in twiddle_3072_32.h. Duplicating this constant in multiple translation units risks the size drifting out of sync with the actual twiddle tables, which would break indexing in the twiddle-multiply loop. Consider introducing a small shared header for the multi-twiddle tables (size macro + extern declarations) and include that instead of repeating the literal size in each implementation.

Suggested change

/* Twiddle factor tables defined in fft_multi.c via twiddle_3072_32.h */

#define FFT_MULTI_TWIDDLE_SIZE 2048

extern const int32_t multi_twiddle_real_32[];

extern const int32_t multi_twiddle_imag_32[];

/* Constants for size 3 DFT */

#define DFT3_COEFR -1073741824 /* int32(-0.5 * 2^31) */

#define DFT3_COEFI 1859775393 /* int32(sqrt(3) / 2 * 2^31) */

#define DFT3_SCALE 715827883 /* int32(1/3*2^31) */

#include <sof/math/twiddle_3072_32.h>

/* Constants for size 3 DFT */

#define DFT3_COEFR -1073741824 /* int32(-0.5 * 2^31) */

#define DFT3_COEFI 1859775393 /* int32(sqrt(3) / 2 * 2^31) */

#define DFT3_SCALE 715827883 /* int32(1/3*2^31) */

#define DFT3_COEFR -1073741824 /* int32(-0.5 * 2^31) */

#define DFT3_COEFI 1859775393 /* int32(sqrt(3) / 2 * 2^31) */

#define DFT3_SCALE 715827883 /* int32(1/3*2^31) */

I have another PR under work to make twiddle factors cold data in #10457. It will address this. But as a larger change I don't want to do it now.

Copilot · 2026-03-27T17:55:40Z

src/math/fft/fft_multi_hifi3.c

+#include <string.h>
+
+/* Twiddle factor tables defined in fft_multi.c via twiddle_3072_32.h */
+#define FFT_MULTI_TWIDDLE_SIZE 2048


FFT_MULTI_TWIDDLE_SIZE is duplicated here as a literal (2048) even though twiddle_3072_32.h defines the table size. Having multiple independent definitions increases the chance of size/table mismatches if the twiddle header is regenerated or changed. Consider moving the size macro + extern declarations into a shared header used by both generic and HiFi3 implementations, so only one place defines the twiddle size contract.

Suggested change

#define FFT_MULTI_TWIDDLE_SIZE 2048

Copilot · 2026-03-27T17:55:40Z

src/math/fft/fft_multi_hifi3.c

+void fft_multi_execute_32(struct fft_multi_plan *plan, bool ifft)
+{
+	struct icomplex32 x[FFT_MULTI_COUNT_MAX];
+	struct icomplex32 y[FFT_MULTI_COUNT_MAX];


The new HiFi3-optimized dft3_32() / fft_multi_execute_32() implementations are compiled only when FFT_HIFI3 is defined, but the cmocka FFT tests run in a generic host build where this code is compiled out. This means the new optimized path likely isn't exercised in CI, increasing regression risk (especially around fixed-point scaling/saturation). Consider adding a CI job that builds/runs these tests for an Xtensa/HiFi3 target (or an equivalent emulator/simulator setup) so both generic and HiFi3 paths are validated against the reference vectors.

Suggested change

struct icomplex32 y[FFT_MULTI_COUNT_MAX];

We need to get it done with ztest. Cmocka isn't performing xt-clang builds and xt-run runs. It's not an issue of this PR but a general gap.

singalsu force-pushed the math_fft_xtensa_hifi_optimizations branch from 6acb7ad to 0fef7d3 Compare March 20, 2026 16:00

singalsu marked this pull request as ready for review March 20, 2026 16:29

singalsu requested review from dbaluta, lbetlej, lgirdwood, mmaka1 and plbossart as code owners March 20, 2026 16:29

Copilot AI review requested due to automatic review settings March 20, 2026 16:29

singalsu requested a review from kv2019i as a code owner March 20, 2026 16:29

Copilot started reviewing on behalf of singalsu March 20, 2026 16:30 View session

Copilot AI reviewed Mar 20, 2026

View reviewed changes

src/math/fft/fft_multi_generic.c Outdated Show resolved Hide resolved

singalsu force-pushed the math_fft_xtensa_hifi_optimizations branch from 0fef7d3 to 9e70d76 Compare March 20, 2026 16:51

singalsu requested a review from Copilot March 20, 2026 17:11

Copilot started reviewing on behalf of singalsu March 20, 2026 17:12 View session

Copilot AI reviewed Mar 20, 2026

View reviewed changes

src/math/fft/fft_multi_generic.c Show resolved Hide resolved

src/math/fft/fft_multi_hifi3.c Show resolved Hide resolved

lyakh reviewed Mar 23, 2026

View reviewed changes

singalsu force-pushed the math_fft_xtensa_hifi_optimizations branch from 9e70d76 to 21d771f Compare March 24, 2026 15:22

singalsu requested review from Copilot and lyakh March 26, 2026 17:48

Copilot started reviewing on behalf of singalsu March 26, 2026 17:50 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

lgirdwood approved these changes Mar 27, 2026

View reviewed changes

singalsu force-pushed the math_fft_xtensa_hifi_optimizations branch from 21d771f to 608b774 Compare March 27, 2026 17:14

singalsu commented Mar 27, 2026

View reviewed changes

src/math/fft/fft_32_hifi3.c Outdated Show resolved Hide resolved

singalsu added 2 commits March 27, 2026 19:44

singalsu force-pushed the math_fft_xtensa_hifi_optimizations branch from 608b774 to 2c43e7c Compare March 27, 2026 17:45

singalsu requested a review from Copilot March 27, 2026 17:47

Copilot started reviewing on behalf of singalsu March 27, 2026 17:49 View session

Copilot AI reviewed Mar 27, 2026

View reviewed changes

Conversation

singalsu commented Mar 20, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

lyakh Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

singalsu Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

singalsu Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

singalsu Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

lgirdwood left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

singalsu Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

singalsu Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

singalsu Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants