Navigate to the
$LAB_WORK_DIR/cpu_src
directory for the original source code.With a file editor, open the
MurmurHash2.c
file.Review the following
MurmurHash2
hash function code.unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed ) { const unsigned int m = 0x5bd1e995; // Initialize the hash to a 'random' value unsigned int h = seed ^ len; // Mix 4 bytes at a time into the hash const unsigned char * data = (const unsigned char *)key; switch(len) { case 3: h ^= data[2] << 16; case 2: h ^= data[1] << 8; case 1: h ^= data[0]; h *= m; }; // Do a few final mixes of the hash to ensure the last few // bytes are well-incorporated. h ^= h >> 13; h *= m; h ^= h >> 15; return h; }
The computational complexity is the number of basic computing operations required to execute the function.
The compute of the hash for a single word ID consists of four XORs, three arithmetic shifts, and two multiplication operations.
A shift of 1-bit in an arithmetic shift operation takes one clock cycle on the CPU.
The three arithmetic operations shift a total of 44-bits (when
len=3
in the above code) to compute the hash that requires 44 clock cycles just to shift the bits on the host CPU.On the FPGA, you can create custom architectures, and therefore, create an accelerator that will shift the data by an arbitrary number of bits in a single clock cycle.
The FPGA also has dedicated digital signal processing (DSP) units that perform multiplications faster than the CPU. Even though the CPU runs at a frequency eight times higher than the FPGA, the arithmetic shift and multiplication operations can perform faster on the FPGA because of the customizable hardware architecture.
Therefore, this function is a good candidate for FPGA acceleration.