/*
* This is a hybrid bit-counting algorithm that combines parallel counting
* with multiplication. The idea is to sum up the bits within each byte, so
* that the final accumulation across all bytes can be done with a single
* multiplication. If the platform has a slow multiplication instruction,
* that multiplication can be replaced by the commented-out version below.
*/
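
/*
* Illustrative sketch only, not the original routine: the hybrid technique
* applied to a 64-bit word. The names popcount64_sketch and
* popcount64_sketch_nomul are hypothetical, and the code assumes uint64_t
* from <stdint.h>. The first three steps are the parallel counting that
* leaves a bit count (0..8) in each byte; the last step sums those byte
* counts. The _nomul variant replaces the multiplication with shift-and-add
* folding, one common way to avoid a slow multiplier (an assumption, not
* necessarily what the commented-out version below does).
* Example: popcount64_sketch(0xF0) returns 4.
*/
#include <stdint.h>

static inline unsigned popcount64_sketch(uint64_t x)
{
    /* Each 2-bit field now holds the count of its own two bits (0..2). */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Each 4-bit field holds the sum of two adjacent 2-bit counts (0..4). */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Each byte holds the bit count of that byte (0..8). */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* Multiplying by 0x0101...01 adds every byte into the top byte;
     * the total is at most 64, so no byte overflows. */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

static inline unsigned popcount64_sketch_nomul(uint64_t x)
{
    /* Same parallel counting as above: per-byte counts (0..8). */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    /* Fold byte counts together with shifts and adds instead of a
     * multiplication; the low byte ends up holding the total (0..64). */
    x += x >> 8;
    x += x >> 16;
    x += x >> 32;
    return (unsigned)(x & 0x7F);
}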