For the Bloom4x configuration, you read 512-bit input values from the DDR but computed only 4 words in parallel, which uses just 128 bits of each input value. This step enables you to compute 8 words in parallel.
To compute 8 words in parallel, use PF=8 on the command line, following the steps below.
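
As a quick check on the arithmetic before you start, the sketch below shows how much of each 512-bit read is consumed for a given parallel factor. It is a minimal host-side illustration, not the tutorial's kernel code: the 32-bit word size is inferred from "4 words use 128 bits", and the names `kBusBits`, `kWordBits`, `bits_used`, and `process_read` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumptions (not taken from the tutorial sources):
//   - one DDR read delivers 512 bits
//   - each word is 32 bits wide (128 bits / 4 words)
constexpr std::size_t kBusBits      = 512;
constexpr std::size_t kWordBits     = 32;
constexpr std::size_t kWordsPerRead = kBusBits / kWordBits;  // 16 words per read

// Number of bits of one 512-bit read that are consumed when
// `pf` words are processed in parallel.
constexpr std::size_t bits_used(std::size_t pf) { return pf * kWordBits; }

// Hypothetical stand-in for the per-word work: sums the first `pf`
// words of one read. In hardware these `pf` lanes would run in parallel;
// here the loop only shows which words are touched.
std::uint64_t process_read(const std::array<std::uint32_t, kWordsPerRead>& block,
                           std::size_t pf) {
    assert(pf <= kWordsPerRead);  // cannot use more words than one read holds
    std::uint64_t acc = 0;
    for (std::size_t lane = 0; lane < pf; ++lane) {
        acc += block[lane];
    }
    return acc;
}
```

Under these assumptions, `bits_used(4)` is 128 and `bits_used(8)` is 256, so PF=8 still consumes only half of each 512-bit read.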