Out of curiosity I tried Visual Studio 2022 on Windows. I had to change some of the SSE/AVX (void*) casts to the correct pointer types, e.g. _mm_stream_load_si128((const __m128i*)(inbuf + i)).
Didn't GCC give any warnings there?
Results on a mobile AMD Ryzen 7 PRO 2700U at 2200 MHz:
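To show what I mean by the cast fix, here is a minimal sketch. It uses _mm_loadu_si128 instead of _mm_stream_load_si128, only so that it builds with plain SSE2 and needs no 16-byte alignment; the pointer cast itself is the same. The name copy8_i16 is made up for this example, and an x86/x64 target is assumed.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper: copies 8 int16_t values through one 128-bit register. */
static void copy8_i16(const int16_t *inbuf, int16_t *outbuf)
{
    /* Cast explicitly to the vector pointer types instead of (void*);
       this form is accepted whether the file is built as C or as C++. */
    __m128i v = _mm_loadu_si128((const __m128i *)inbuf);
    _mm_storeu_si128((__m128i *)outbuf, v);
}
```

The same (const __m128i*) cast should carry over to the streaming load, as long as the buffer is 16-byte aligned and SSE4.1 is enabled.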
naive: 188
unroll: 102
lut: 65
unroll2: 55
sse: 42
avx: 37
Unroll2 is my simplification of unroll():
int oi = 0;
/* Each iteration reads 4 interleaved inputs and writes 2 floats to each output. */
for (int i = 0; i < 4 * SAMPLES; i += 4) {
    int16_t raw0 = inbuf[i];
    raw0 = (raw0 & 0xEFFF) | ((raw0 & 0xE000) >> 1);
    int16_t raw1 = inbuf[i + 1];
    raw1 = (raw1 & 0xEFFF) | ((raw1 & 0xE000) >> 1);
    int16_t raw2 = inbuf[i + 2];
    raw2 = (raw2 & 0xEFFF) | ((raw2 & 0xE000) >> 1);
    int16_t raw3 = inbuf[i + 3];
    raw3 = (raw3 & 0xEFFF) | ((raw3 & 0xE000) >> 1);
    outbuf1[oi] = (float)raw0;
    outbuf2[oi] = (float)raw2;
    outbuf1[oi + 1] = (float)raw1;
    outbuf2[oi + 1] = (float)raw3;
    oi += 2;  /* two outputs per buffer were written */
}
The compiler used some SSE there; the generated code contains the instruction cvtdq2ps xmm0,xmm0 four times.
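If anyone wants to check the loop in isolation, here is a rough self-contained sketch of it as a function. The names fix_sample and unroll2 and the samples parameter are my own for this example. It assumes inbuf holds 4 * samples values and that each group of four inputs contributes two floats to each output buffer.

```c
#include <stdint.h>

/* Hypothetical helper: the per-sample bit fix-up used in the loop. */
static int16_t fix_sample(int16_t raw)
{
    return (int16_t)((raw & 0xEFFF) | ((raw & 0xE000) >> 1));
}

/* Hypothetical wrapper: inbuf has 4 * samples entries (assumption);
   outbuf1 and outbuf2 each receive 2 * samples floats. */
static void unroll2(const int16_t *inbuf, float *outbuf1, float *outbuf2,
                    int samples)
{
    int oi = 0;
    for (int i = 0; i < 4 * samples; i += 4) {
        /* Inputs 0 and 1 of each group go to outbuf1, inputs 2 and 3 to outbuf2. */
        outbuf1[oi]     = (float)fix_sample(inbuf[i]);
        outbuf1[oi + 1] = (float)fix_sample(inbuf[i + 1]);
        outbuf2[oi]     = (float)fix_sample(inbuf[i + 2]);
        outbuf2[oi + 1] = (float)fix_sample(inbuf[i + 3]);
        oi += 2;
    }
}
```

Running it on a small array and comparing against values worked out by hand (for example 0x2000 becomes 0x3000) is a quick sanity check before timing anything.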