Title:    SIMD Library Documentation
Author:   Gregory W Heckler
Date:     April 24, 2006
Shoutout:

-----------------------------------------------------------------------
The following is a detailed explanation of the exact behavior of all the
functions provided with the library. Assume this to be freeware and modify
it as you wish, but please include this .txt file with any source code you
distribute that includes this library or its source code. Good luck.

-----------------------------------------------------------------------
Files in Library:
-----------------------------------------------------------------------
MMX.cpp    //contains functions that use MMX registers
SSE.cpp    //contains functions that use SSE registers
SIMD.h     //function prototypes
FFT.cpp    //Radix 2 FFT with MMX butterflies
FFT.h      //Radix 2 FFT class declaration

-----------------------------------------------------------------------
Data Formats:
-----------------------------------------------------------------------
Complex Interleaved Format:
    Bits    0-15    16-31
    Data    Real    Imaginary

Extended Complex Interleaved Format:
    Bits    0-15    16-31        32-47         48-63
    Data    Real    Imaginary    -Imaginary    Real

-----------------------------------------------------------------------
Notation:
-----------------------------------------------------------------------
A[k]:        kth element of vector A
A(0:15):     bits 0 through 15 of value A
A[k](0:15):  bits 0 through 15 of kth element of vector A

-----------------------------------------------------------------------
int mmx_dot(void *A, void *B, int cnt)
-----------------------------------------------------------------------
Arguments:
    void *A:  pointer to 16 bit signed integer vector
    void *B:  pointer to 16 bit signed integer vector
    int cnt:  # of 16 bit integers in vectors A & B to process

Operation:
    1) Multiply A[k] x B[k], result is a 32 bit signed integer C
    2) Add C into accumulator
    3) After processing cnt elements, return accumulation

Return:
    int accum: 32 bit signed integer accumulation value
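The mmx_dot operation above can be cross-checked against a plain scalar
loop. The following is a minimal sketch, not part of the library: the name
ref_dot is hypothetical, and it takes typed int16_t pointers instead of the
void pointers used by the real prototype. It assumes the running sum stays
within 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference for the documented mmx_dot steps.
// Not the library's implementation; intended only for checking results.
int32_t ref_dot(const int16_t *A, const int16_t *B, int cnt)
{
    assert(cnt >= 0);
    int32_t accum = 0;
    for (int k = 0; k < cnt; k++) {
        // Step 1: 16 bit x 16 bit product held in a 32 bit signed integer
        int32_t C = static_cast<int32_t>(A[k]) * static_cast<int32_t>(B[k]);
        // Step 2: add into the accumulator (assumes no 32 bit overflow)
        accum += C;
    }
    // Step 3: return the accumulation
    return accum;
}
```

For A = {1, -2, 3, 4} and B = {5, 6, -7, 8} this returns
5 - 12 - 21 + 32 = 4.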
-----------------------------------------------------------------------
void mmx_mul(void *A, void *B, int cnt)
-----------------------------------------------------------------------
Arguments:
    void *A:  pointer to 16 bit signed integer vector
    void *B:  pointer to 16 bit signed integer vector
    int cnt:  # of 16 bit integers in vectors A & B to process

Operation:
    1) Multiply A[k] x B[k], result is a 32 bit signed integer C
    2) Truncate C to a 16 bit signed integer, preserving sign bit
           C(0:14) = C(0:14), C(15) = C(31)
    3) A[k] = C

-----------------------------------------------------------------------
void mmx_add(void *A, void *B, int cnt)
-----------------------------------------------------------------------
Arguments:
    void *A:  pointer to 16 bit signed integer vector
    void *B:  pointer to 16 bit signed integer vector
    int cnt:  # of 16 bit integers in vectors A & B to process

Operation:
    1) A[k] = A[k] + B[k]

-----------------------------------------------------------------------
void mmx_sub(void *A, void *B, int cnt)
-----------------------------------------------------------------------
Arguments:
    void *A:  pointer to 16 bit signed integer vector
    void *B:  pointer to 16 bit signed integer vector
    int cnt:  # of 16 bit integers in vectors A & B to process

Operation:
    1) A[k] = A[k] - B[k]

-----------------------------------------------------------------------
void mmx_qnt(void *A, int cnt)
-----------------------------------------------------------------------
Arguments:
    void *A:  pointer to 16 bit signed integer vector
    int cnt:  # of 16 bit integers in vector A

Operation:
    1) if A[k] < 0
           A[k] = -1;
       else
           A[k] = 1;

-----------------------------------------------------------------------
__int64 mmx_cacc(void *A, void *B, int cnt)
-----------------------------------------------------------------------
Arguments:
    void *A:  pointer to complex interleaved vector
    void *B:  pointer to extended complex interleaved vector
    int cnt:  # of samples to process

Operation:
    1) Complex multiply A[k] x B[k], result is a 64 bit signed integer C
           A[k].r = A[k](0:15), A[k].i = A[k](16:31)
           B[k].r = B[k](0:15), B[k].i = B[k](16:31)
           C.r = C(0:31)  = A[k].r * B[k].r - A[k].i * B[k].i
           C.i = C(32:63) = A[k].r * B[k].i + A[k].i * B[k].r
    2) C.r = C(0:31)  -> 32 bit signed integer real value
       C.i = C(32:63) -> 32 bit signed integer imaginary value
    3) Add C into accumulator
    4) After processing cnt elements, return accumulation

Return:
    __int64: 64 bit accumulation value. Real accumulation value in
             bits (0:31), imaginary accumulation value in bits (32:63)

-----------------------------------------------------------------------
void mmx_cmul(void *A, void *B, int cnt, int shift)
-----------------------------------------------------------------------
Arguments:
    void *A:    pointer to complex interleaved vector
    void *B:    pointer to complex interleaved vector
    int cnt:    # of 32 bit complex interleaved samples to process
    int shift:  intermediate right shift value

Operation:
    1) Complex multiply A[k] x B[k], result is a 64 bit signed integer C
           A[k].r = A[k](0:15), A[k].i = A[k](16:31)
           B[k].r = B[k](0:15), B[k].i = B[k](16:31)
           C.r = C(0:31)  = A[k].r * B[k].r - A[k].i * B[k].i
           C.i = C(32:63) = A[k].r * B[k].i + A[k].i * B[k].r
    2) Arithmetic right shift C(0:31) by shift, preserve sign bit
       Arithmetic right shift C(32:63) by shift, preserve sign bit
           C(0:31)  = C(0:31)  >> shift;
           C(32:63) = C(32:63) >> shift;
    3) Truncate C(0:31) to a 16 bit signed integer, preserving sign bit
       Truncate C(32:63) to a 16 bit signed integer, preserving sign bit
           C(0:14)  = C(0:14),  C(15) = C(31)
           C(16:30) = C(32:46), C(31) = C(63)
    4) A[k](0:31) = C(0:31)

Notes:
    This function performs a complex multiplication. The shift value is
    required to prevent overflow. For example, if A & B are both known to
    have 10 bits of precision, C = A*B has 20 bits of precision, which
    exceeds the 15 bits of precision available in a 16 bit signed integer.
    Setting "shift" to 5 brings the result back to 15 bits and preserves
    numerical accuracy.
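The four mmx_cmul steps above can be written out as a scalar loop for
testing. This is a sketch under stated assumptions, not the library code:
Cpx16, asr, truncate_keep_sign and ref_cmul are hypothetical names, and the
truncation follows the literal bit description in step 3. Whether the MMX
routine behaves identically for values that do not fit in 16 bits is not
confirmed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one complex interleaved sample:
// bits 0-15 real, bits 16-31 imaginary.
struct Cpx16 { int16_t r; int16_t i; };

// Portable arithmetic right shift (sign bit is preserved).
static int32_t asr(int32_t v, int shift)
{
    return v < 0 ? ~(~v >> shift) : (v >> shift);
}

// Step 3 as written: keep bits 0-14, copy bit 31 into bit 15.
static int16_t truncate_keep_sign(int32_t v)
{
    uint32_t u = static_cast<uint32_t>(v);
    int32_t out = static_cast<int32_t>((u & 0x7FFFu) | ((u >> 16) & 0x8000u));
    return static_cast<int16_t>(out >= 0x8000 ? out - 0x10000 : out);
}

// Hypothetical scalar reference for the documented mmx_cmul steps.
void ref_cmul(Cpx16 *A, const Cpx16 *B, int cnt, int shift)
{
    assert(cnt >= 0 && shift >= 0 && shift < 32);
    for (int k = 0; k < cnt; k++) {
        int32_t ar = A[k].r, ai = A[k].i;
        int32_t br = B[k].r, bi = B[k].i;
        // Step 1: complex multiply into two 32 bit halves.
        // The corner case where all four inputs are -32768 would
        // overflow ci and is not handled in this sketch.
        int32_t cr = ar * br - ai * bi;
        int32_t ci = ar * bi + ai * br;
        // Step 2: arithmetic right shift of each half
        cr = asr(cr, shift);
        ci = asr(ci, shift);
        // Steps 3 and 4: truncate to 16 bits and store back into A
        A[k].r = truncate_keep_sign(cr);
        A[k].i = truncate_keep_sign(ci);
    }
}
```

With 10 bit inputs A[0] = (512, -512) and B[0] = (512, 512), the product is
(524288, 0); with shift = 5 the stored result is (16384, 0), which fits in a
16 bit signed integer.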
-----------------------------------------------------------------------
void mmx_crot(void *A, void *B, int cnt, int shift)
-----------------------------------------------------------------------
Arguments:
    void *A:    pointer to complex interleaved vector
    void *B:    pointer to complex interleaved value (NOT A VECTOR!!!)
    int cnt:    # of 32 bit complex samples to process
    int shift:  intermediate right shift value

Operation:
    1) Complex multiply A[k] x B, result is a 64 bit signed integer C
           A[k].r = A[k](0:15), A[k].i = A[k](16:31)
           B.r = B(0:15), B.i = B(16:31)
           C.r = C(0:31)  = A[k].r * B.r - A[k].i * B.i
           C.i = C(32:63) = A[k].r * B.i + A[k].i * B.r
    2) Arithmetic right shift C(0:31) by shift, preserve sign bit
       Arithmetic right shift C(32:63) by shift, preserve sign bit
           C(0:31)  = C(0:31)  >> shift;
           C(32:63) = C(32:63) >> shift;
    3) Truncate C(0:31) to a 16 bit signed integer, preserving sign bit
       Truncate C(32:63) to a 16 bit signed integer, preserving sign bit
           C(0:14)  = C(0:14),  C(15) = C(31)
           C(16:30) = C(32:46), C(31) = C(63)
    4) A[k](0:31) = C(0:31)

Notes:
    This function is similar to mmx_cmul. The only difference is that B
    is a single constant value applied to every element of A. This
    function effectively rotates the values found in vector A by the
    angle represented by the complex value B.

-----------------------------------------------------------------------
void mmx_conj(void *A, int cnt)
-----------------------------------------------------------------------
Arguments:
    void *A:  pointer to complex interleaved vector
    int cnt:  # of 32 bit complex samples to process

Operation:
    1) A[k](16:31) = -A[k](16:31)

-----------------------------------------------------------------------
MMX Enabled FFT
-----------------------------------------------------------------------
The FFT is implemented as a C++ object. The public methods available to
the user are as follows:

    FFT(int _N);                          //Initialize FFT for 2^N
    ~FFT();                               //Destructor
    void doFFT(CPX *_x, bool _shuf);      //Forward FFT, decimation in time
    void doiFFT(CPX *_x, bool _shuf);     //Inverse FFT, decimation in time
    void doFFTdf(CPX *_x, bool _shuf);    //Forward FFT, decimation in frequency
    void doiFFTdf(CPX *_x, bool _shuf);   //Inverse FFT, decimation in frequency

Example Code:
/*-------------------------------------------------------------------------------------------------------------*/
FFT *pFFT;                              //Pointer to FFT
pFFT = new FFT(1024);                   //Allocate FFT object for a 1024 pt FFT

CPX *x = new CPX[1024];                 //allocate array

for(int lcv = 0; lcv < 1024; lcv++)     //initialize array
    x[lcv] = rand();

pFFT->doFFT(x, true);                   //do the FFT, bit shuffle to reorder output to natural order
pFFT->doFFT(x, false);                  //do the FFT, leaving output in bit-reversed order

delete pFFT;
delete [] x;
/*-------------------------------------------------------------------------------------------------------------*/
-----------------------------------------------------------------------
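The _shuf argument in the FFT example above selects between natural order
and bit-reversed order output. As an illustration of what such a reordering
does, here is a minimal sketch of an in-place bit-reversal shuffle. It is a
generic textbook version, not the library's internal routine. Cpx16,
reverse_bits and bit_reverse_shuffle are hypothetical names, and Cpx16
stands in for CPX, whose definition is not given in this document.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the library's CPX type.
struct Cpx16 { int16_t r; int16_t i; };

// Reverse the lowest `bits` bits of idx (e.g. 001 -> 100 for bits = 3).
static unsigned reverse_bits(unsigned idx, int bits)
{
    unsigned out = 0;
    for (int b = 0; b < bits; b++) {
        out = (out << 1) | (idx & 1u);
        idx >>= 1;
    }
    return out;
}

// Swap every element with the element at its bit-reversed index.
// n must be a power of two. Applying the shuffle twice restores the input.
void bit_reverse_shuffle(Cpx16 *x, unsigned n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    int bits = 0;
    while ((1u << bits) < n)
        bits++;
    for (unsigned k = 0; k < n; k++) {
        unsigned j = reverse_bits(k, bits);
        if (j > k)                      // swap each pair only once
            std::swap(x[k], x[j]);
    }
}
```

For n = 8, the element order 0,1,2,3,4,5,6,7 becomes 0,4,2,6,1,5,3,7.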