Binary Multiplication Tricks

Written 2017-10-15

Tags:Multiplication Optimization Math Logic 

This is a rambly math post. Sean Anderson's Bit Twiddling Hacks is full of interesting uses of multiplication to manipulate bits, but some of the examples use multiplication that isn't immediately clear. At some point in my software development career, there was a moment where I understood how multiplication worked, and the day before then I didn't. This post is an attempt to explain how I use multiplication for bit and byte twiddling, as opposed to scalar arithmetic multiplication. In particular, I'm going to focus on multiplication with one constant, and tricks for determining the right constant to use.

Multiplication Algorithms

There are a number of different multiplation algorithms, each more or less useful in different circumstances. When learning multiplication as a child, I remember the Grid Method, Long Multiplication, and Peasant Method. These all work great, but a derivative of Peasant Method, Shift and Add is a good model for digital multiplication.

Shift-and-add multiplication model

Shift-and-add treats multiplication as a loop over all the bits in one multiplicand and multiplies one bit at a time by the multiplier. Results are accumulated at each step. Digitally, a 1xN-bit multiplier can be implemented with an N-bit adder and a mask. A*B is equivalent to If(A) This function implements an 8x8=>16 bit multiplier using a 1xN bit multiplier and 16-bit accumulator.


uint16_t mult8( uint8_t a, uint8_t b )
{
uint16_t accum = 0;
for( unsigned i = 0; i < 8; ++i ) //For every bit in B
    if( b & ( 1 << i ) ) //If corresponding bit
        accum += ( a << i ); //shift-accumulate to multiply
return accum;
}

While not particularly well suited to hardware-implementation(long, wide adders are complicated), when unrolled, N-bit multiplication can be implemented with N shifters(holding 'A'), a set of N masks, each combining a bit of 'B' with the corresponding shifter output, and the output of each mask fed into an N-input adder. The concept scales to arbitrary length integers with more shifts and a wider adder. In short, we can treat multiplication as a bit-vectorized shift & add accumulator.

By the Grace of Boole, Let us Compute

Multiplying by 16

To multiply by a power-of-two, we need only set the corresponding bit in the multiplier. So for B=16, shifter number 4 needs activated, and no others.

        00000000 | B & ( 0 << 0 )
       000000000 | B & ( 0 << 1 )
      0000000000 | B & ( 0 << 2 )
     00000000000 | B & ( 0 << 3 )
    AAAAAAAA0000 | B & ( 1 << 4 )
   0000000000000 | B & ( 0 << 5 )
  00000000000000 | B & ( 0 << 6 )
 000000000000000 | B & ( 0 << 7 )
               +
---------------------------------
     AAAAAAAA*10

Multiplying by 10

To multiply by B=10, we represent 10 as 2+8, and correspondingly activate B shifters 1 and 3. This is equivalent to adding 2*B + 8*B:

        00000000 | B & ( 0 << 0 )
       AAAAAAAA0 | B & ( 1 << 1 )
      0000000000 | B & ( 0 << 2 )
     AAAAAAAA000 | B & ( 1 << 3 )
    000000000000 | B & ( 0 << 4 )
   0000000000000 | B & ( 0 << 5 )
  00000000000000 | B & ( 0 << 6 )
 000000000000000 | B & ( 0 << 7 )
               +
---------------------------------
     AAAAAAAA*10

Multiplying by 255

To multiply by B=255, every bit is set in B, so we just add all 8 inputs:

        AAAAAAAA | B & ( 1 << 0 )
       AAAAAAAA0 | B & ( 1 << 1 )
      AAAAAAAA00 | B & ( 1 << 2 )
     AAAAAAAA000 | B & ( 1 << 3 )
    AAAAAAAA0000 | B & ( 1 << 4 )
   AAAAAAAA00000 | B & ( 1 << 5 )
  AAAAAAAA000000 | B & ( 1 << 6 )
 AAAAAAAA0000000 | B & ( 1 << 7 )
               +
---------------------------------
    AAAAAAAA*255

Some simple bit twiddling

Now that we understand our swiss-army chainsaw of shift-accumulation:

Repeat a 2-bit pattern 4-times

Bit-repeating a pattern is as simple as multipling by 85(0b01010101):
        000000Aa | B & ( 1 << 0 )
       000000000 | B & ( 0 << 1 )
      000000Aa00 | B & ( 1 << 2 )
     00000000000 | B & ( 0 << 3 )
    000000Aa0000 | B & ( 1 << 4 )
   0000000000000 | B & ( 0 << 5 )
  000000Aa000000 | B & ( 1 << 6 )
 000000000000000 | B & ( 0 << 7 )
               +
---------------------------------
        AaAaAaAa

Repeat an n-bit pattern m(power-of-two)-times

As seen above, repetition of a pattern by a power-of-two can be done by scaling with a bit-pattern of (n-1) zeros followed by 1 one. We can use ( 255 )/( 2^n - 1 ) to synthesize it . In C we might use the following to generate a C-integer-sized repetition pattern:
#define REPEATER(n) ( ~0 / ( ( 1 << n ) - 1 ) )
You'll actually see the equivalent of REPEATER(8), ~0UL/255, on 5 of Sean Anderson's collection of bit-twiddling hacks.

Compiler optimizations for constant multipliers

On many architectures, a shift and an add is faster or encodes smaller than a multiply. With this knowledge, compilers often replace constant-multiplies with shifts and adds when that input can be represented with just a few bits set, or just a few bits cleared.

Multiplying by three:

To multiply by three, we can simply add A to A<<1.

        000000AA | B & ( 1 << 0 )
       000000AA0 | B & ( 1 << 1 )
      0000000000 | B & ( 0 << 2 )
     00000000000 | B & ( 0 << 3 )
    000000000000 | B & ( 0 << 4 )
   0000000000000 | B & ( 0 << 5 )
  00000000000000 | B & ( 0 << 6 )
 000000000000000 | B & ( 0 << 7 )
               +
---------------------------------
             A*3 = A*2+A

It is trivial to extend this to fit multiplication by any number of set bits

Multiplying by seven:

Instead of multiplying by seven, we could add A + A<<1 + A<<2. Or...we could also take A<<3 - A. Both are equivalent, but the second is usually complies shorter:

        000000AA | B & ( 1 << 0 )
       000000AA0 | B & ( 1 << 1 )
      000000AA00 | B & ( 1 << 2 )
     00000000000 | B & ( 0 << 3 )
    000000000000 | B & ( 0 << 4 )
   0000000000000 | B & ( 0 << 5 )
  00000000000000 | B & ( 0 << 6 )
 000000000000000 | B & ( 0 << 7 )
               +
---------------------------------
             A*7 = A*8-A

     000000AA000 | B & ( 1 << 3 )
        000000AA | B & ( 1 << 0 )
               -
---------------------------------
           A*8-A