This also adds explicit casts to silence MSVC 6 compiler warnings
about implicit truncation of the arguments passed to bitexact_cos().

Without access to the CLZ/BSR instructions the code will be a fair
bit slower, but that is better than failing to compile.