As a hobby I'm putting together an integer Fast Fourier Transform,
something that can be used to multiply huge numbers together. Progress
is slow and painful, but it's actually kind of fun optimizing more
and more.
How fast can it get? Well, anybody who says that code they've written is
the fastest in history usually gets embarassed about ten minutes later
when somebody one-ups them. Not much has been written yet on my part,
mostly just some Pentium-optimized assembly and a simple driver routine.
However, it seems that the forward radix-2 transform absolutely flies
right now!
Here's the latest version. I'm a big fan of gcc, and
regrettably that means the assembly language is in AT&T syntax. Go ahead
and steal it, I don't care.
You also get, at no extra charge, floating point code that can perform
modular multiplication four times faster than using integer instructions,
and vectorized for use with the FFT.
UPDATES:
4/14/98:
12/14/98:
PS: Look in the pi page I have for links to real experts on
high-speed FFTs, Joerg Arndt and Mikko Tommila. Finally, these people have gone
hopelessly insane writing high-performance FFTs, and they're the best at
it.
Added decimation-in-time FFT. Cleaned up assembly routines,
got rid of useless ones, renamed the rest. Also added
high-performance transpose routines (I'm quite
proud of them, they seem to be extremely fast). Together with the FFTs
this is almost enough to make a fast integer multiply.
The old stuff has led to production code! Carey Bloodworth's monster
pi calculator needed a faster NTT for use with the Pentium, so
I wrote one. This one includes normalization code that the previous
ones did not, and hence is good for integers all the way up to 31 bits
in size. Unfortunately, this slows things down by a third; however,
as I say rather often, if the answer doesn't have to be right, arbitrary
speed is possible. The older NTTs could only be used for primes 26 bits
or less in size, and even then were limited to run lengths of 1k or so.
This code has no such restriction.