You can use the following approach to trade off throughput for storage:
Apply the single_buffer constraint on the input. For more information, refer to the AI Engine Kernel and Graph Programming Guide (UG1076).
Add placement constraints so that each tile's storage requirements are met locally.
The code snippet below, taken from <path-to-design>/aie/tdm_fir/firbank_app.cpp, shows an example of how this can be done.
single_buffer(dut.tdmfir.m_firKernels[ii+0].in[0]);
std::string file_i0 = "data/filterbank_i_" + std::to_string(ii) + ".txt";
std::string file_o0 = "data/filterbank_o_" + std::to_string(ii) + ".txt";
sig_i[ii] = input_plio::create("PLIO_i_"+std::to_string(ii), plio_64_bits, file_i0 );
sig_o[ii] = output_plio::create("PLIO_o_"+std::to_string(ii), plio_64_bits, file_o0 );
connect<>( sig_i[ii].out[0], dut.sig_i[ii] );
connect<>( dut.sig_o[ii], sig_o[ii].in[0] );
location<kernel> (dut.tdmfir.m_firKernels[ii]) = tile(start_index+xoff,0);
location<stack> (dut.tdmfir.m_firKernels[ii]) = bank(start_index+xoff,0,3);
location<parameter>(dut.tdmfir.m_firKernels[ii].param[0]) = bank(start_index+xoff,0,3);
location<parameter>(dut.tdmfir.m_firKernels[ii].param[1]) = address(start_index+xoff,0,0x4C00);
location<buffer> (dut.tdmfir.m_firKernels[ii].in[0]) = bank(start_index+xoff,0,0);
location<buffer> (dut.tdmfir.m_firKernels[ii].out[0]) = { bank(start_index+xoff,0,1), bank(start_index+xoff,0,3) };
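To illustrate why single_buffer saves storage, here is a minimal sketch of the buffer arithmetic. It assumes that a buffer port uses ping-pong (two copies) by default and that single_buffer reduces this to one copy, which is consistent with the snippet above placing in[0] in one bank and out[0] in two. The helper name buffer_bytes and the sample counts used with it are hypothetical and are not taken from the design.

```cpp
#include <cstddef>

// Hypothetical helper: bytes of tile data memory occupied by one buffer port.
// Assumption: default buffering is ping-pong (2 copies); single_buffer uses 1 copy.
inline std::size_t buffer_bytes(std::size_t num_samples,
                                std::size_t bytes_per_sample,
                                bool use_single_buffer) {
    const std::size_t copies = use_single_buffer ? 1 : 2;
    return num_samples * bytes_per_sample * copies;
}
```

For example, an illustrative 1024-sample buffer of 4-byte samples would occupy 8192 bytes with ping-pong buffering and 4096 bytes with a single buffer. The cost is that the kernel and its producer can no longer work on separate copies concurrently, which is where the throughput is given up.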
Compile and simulate the design to confirm it works as expected.
[shell]% cd <path-to-design>/aie/tdm_fir
[shell]% make clean all
[shell]% vitis_analyzer aiesimulator_output/default.aierun_summary
In vitis_analyzer, we observe that the resource count drops to 32 tiles, with a throughput of 4096 samples / 1.837 us = 2230 MSPS.
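The throughput figure can be reproduced with a short calculation. The sketch below assumes that 4096 is a sample count and 1.837 us is the time measured in vitis_analyzer. The function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Throughput in MSPS: one sample per microsecond equals 1 MSPS,
// so dividing samples by elapsed microseconds gives MSPS directly.
inline double throughput_msps(double num_samples, double elapsed_us) {
    assert(elapsed_us > 0.0);  // guard against division by zero
    return num_samples / elapsed_us;
}

// Same value rounded to the nearest whole MSPS, as quoted in the text.
inline long throughput_msps_rounded(double num_samples, double elapsed_us) {
    return std::lround(throughput_msps(num_samples, elapsed_us));
}
```

Calling throughput_msps(4096.0, 1.837) gives roughly 2229.7, which rounds to the 2230 MSPS quoted above.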