There are now four arrays for the distances:
uint16_t distances_0[N][N];
uint16_t distances_1[N][N];
uint16_t distances_2[N][N];
uint16_t distances_3[N][N];
The incoming distance values are still read one at a time, but each value is copied into all four memories, so that every copy of the compute function has its own array to read from:
loop_distances: for (int i = 0; i < N*N; ++i)
{
    uint16_t val;
    streamDistances >> val;
    distances_0[i/N][i%N] = val;
    distances_1[i/N][i%N] = val;
    distances_2[i/N][i%N] = val;
    distances_3[i/N][i%N] = val;
}
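The replication step can be tried out in plain software. The following is a minimal sketch, not the tutorial's HLS code: it assumes N = 4, replaces the stream with a std::vector, and uses the hypothetical names DistanceTable and replicateDistances.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int N = 4;  // assumed number of cities, for illustration only

using DistanceTable = std::array<std::array<uint16_t, N>, N>;

// Hypothetical software stand-in for loop_distances: values arrive in
// row-major order and each one is written to all four copies.
void replicateDistances(const std::vector<uint16_t>& incoming,
                        std::array<DistanceTable, 4>& copies)
{
    // Exactly N*N values are expected, one per matrix entry.
    assert(incoming.size() == static_cast<std::size_t>(N) * N);

    for (int i = 0; i < N * N; ++i) {
        const uint16_t val = incoming[i];  // one read per iteration
        for (auto& table : copies) {
            table[i / N][i % N] = val;     // same value, same position, every copy
        }
    }
}
```

After the call, all four tables hold identical contents, which is what lets each compute copy read its own array independently.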
The main loop, loop_compute, now advances by four on every iteration and distributes the four consecutive indices to four copies of the compute function. Each copy evaluates one route and reads from its own distance array:
loop_compute: for( unsigned long int i_ = 0; i_ < factorialN; i_ += 4 )
{
    #pragma HLS pipeline II=1
    candidate0 = std::min(candidate0, compute(i_+0, distances_0));
    candidate1 = std::min(candidate1, compute(i_+1, distances_1));
    candidate2 = std::min(candidate2, compute(i_+2, distances_2));
    candidate3 = std::min(candidate3, compute(i_+3, distances_3));
}
After the loop, the shortest distance is the minimum of the four candidates:
// Determine shortest between the four candidates
shortestDistance = std::min({ candidate0, candidate1,
                              candidate2, candidate3 });
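As a sanity check on the four-lane structure above, the sketch below compares it against a single sequential minimum in plain C++. It is only an illustration under stated assumptions: routeCost is a hypothetical stand-in for compute (any deterministic cost per index would do), the function names are invented here, and the iteration count is assumed to be a multiple of four, since the loop steps by four.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for compute(): a deterministic cost per route index.
inline uint32_t routeCost(unsigned long i)
{
    return static_cast<uint32_t>((i * 37UL + 11UL) % 101UL);
}

// Reference: one running minimum over all indices.
uint32_t shortestSequential(unsigned long total)
{
    uint32_t best = std::numeric_limits<uint32_t>::max();
    for (unsigned long i = 0; i < total; ++i) {
        best = std::min(best, routeCost(i));
    }
    return best;
}

// Four independent running minima, merged once at the end.
uint32_t shortestFourLanes(unsigned long total)
{
    // Assumption: total is a multiple of 4, otherwise the last
    // iteration would evaluate indices past the end.
    assert(total % 4 == 0);

    const uint32_t inf = std::numeric_limits<uint32_t>::max();
    uint32_t candidate0 = inf, candidate1 = inf, candidate2 = inf, candidate3 = inf;

    for (unsigned long i = 0; i < total; i += 4) {
        candidate0 = std::min(candidate0, routeCost(i + 0));
        candidate1 = std::min(candidate1, routeCost(i + 1));
        candidate2 = std::min(candidate2, routeCost(i + 2));
        candidate3 = std::min(candidate3, routeCost(i + 3));
    }
    return std::min({candidate0, candidate1, candidate2, candidate3});
}
```

Because taking a minimum is associative and commutative, splitting the indices across four lanes and merging at the end yields the same result as the sequential loop; each lane only depends on its own candidate, so the four updates in one iteration do not depend on each other.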