The XRNN compiler generates two different XMODELs for them. Runners created with the batch-3 XMODEL are assigned to the batch-3 CU only, and runners created with the batch-4 XMODEL are assigned to the batch-4 CU only. So, to utilize both CUs, you need to create a runner for each XMODEL.
While passing the input, the batch size should match the batch size supported by the corresponding runner. The batch size of a runner can be accessed from the shape of the tensor returned by runner->get_input_tensors().
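
For example, a minimal check in Python (a sketch, assuming a runner has already been created as shown later in this section):

```python
# The first element of the input tensor's shape is the runner's batch size.
batch_size = runner.get_input_tensors()[0].dims[0]
```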
This section describes the important parts of the customer satisfaction application in Python running on DPURAHR16L. The complete code can be accessed from Vitis-AI/demo/rnn_u25_u50lv/apps/customer_satisfaction/run_dpu_e2e.py.
- Import the required modules using the following commands:

  ```python
  import numpy as np
  import vart
  import xir
  ```
- Load the model on the CUs.

  There are two available CUs. The first CU processes batch-3 input while the second CU processes batch-4 input. To utilize both CUs, create two runners, each one with the corresponding XMODEL.

  ```python
  runners = []
  models = ["compiled_batch_3.xmodel", "compiled_batch_4.xmodel"]
  for i in range(len(models)):
      graph = xir.Graph.deserialize(models[i])
      runners.append(vart.Runner.create_runner(
          graph.get_root_subgraph(), "run"))
  ```
- Quantize the input data using the following command:

  ```python
  in_pos = graph.get_root_subgraph().get_attr('input_float2fix')
  quantized_lstm_input = quanti_convert_float_to_int16(
      lstm_input.reshape(num_records * 25 * 32),
      in_pos).reshape((num_records, 25 * 32))
  ```
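
  The quanti_convert_float_to_int16 and quanti_convert_int16_to_float helpers come from the demo's utility code. A plausible sketch of their behavior, assuming plain power-of-two fixed-point scaling with int16 saturation (an assumption, not the demo's verbatim implementation):

  ```python
  def quanti_convert_float_to_int16(data, fix_pos):
      # Assumed behavior: scale by 2^fix_pos, round, saturate to int16.
      return np.clip(np.round(data * (2.0 ** fix_pos)),
                     -32768, 32767).astype(np.int16)

  def quanti_convert_int16_to_float(data, fix_pos):
      # Assumed inverse: undo the power-of-two scaling.
      return data.astype(np.float32) / (2.0 ** fix_pos)
  ```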
- Start the execution. The input data is fed to the two runners in an alternating manner. The dimensions of the input and output, such as the batch size and the aligned sequence lengths, can be queried from the runner. Allocate the output array for execute_async() beforehand.

  ```python
  # num_records, num_sequences, and output_seq_dim are defined earlier
  # in the demo script.
  lstm_output = np.zeros((num_records, 25*100), dtype=np.int16)
  count = 0
  i = 0
  num_cores = 2
  while count < len(quantized_lstm_input):
      inputTensors = runners[i].get_input_tensors()
      outputTensors = runners[i].get_output_tensors()
      batch_size, num_frames, runner_in_seq_len = tuple(inputTensors[0].dims)
      _, _, runner_out_seq_len = tuple(outputTensors[0].dims)
      input_data = quantized_lstm_input[count:count+batch_size]
      batch_size = input_data.shape[0]  # the last chunk may be smaller
      input_data = input_data.reshape(batch_size, num_sequences,
                                      runner_in_seq_len)
      output_data = np.empty((batch_size, num_sequences, runner_out_seq_len),
                             dtype=np.int16)
      job_id = runners[i].execute_async([input_data], [output_data], True)
      runners[i].wait(job_id)
      lstm_output[count:count+batch_size, ...] = \
          output_data[..., :output_seq_dim].reshape(
              batch_size, num_sequences*output_seq_dim)
      count += batch_size
      i = (i + 1) % num_cores  # alternate between the two runners
  ```
  To run both CUs in parallel, invoke the execute_async() call from two different threads, as sketched below. Refer to Vitis-AI/demo/rnn_u25_u50lv/apps/customer_satisfaction/run_dpu_e2e_mt.py for a complete example.
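
  A minimal two-thread sketch under these assumptions: worker and chunks_per_runner are hypothetical names for illustration, where chunks_per_runner[i] holds (offset, input_array) pairs pre-split to match runner i's batch size; the demo's own multi-threaded code is in run_dpu_e2e_mt.py.

  ```python
  import threading

  def worker(runner, chunks, results):
      # Run every (offset, input_array) chunk assigned to this runner.
      _, n_seq, out_seq_len = tuple(runner.get_output_tensors()[0].dims)
      for offset, input_data in chunks:
          output_data = np.empty(
              (input_data.shape[0], n_seq, out_seq_len), dtype=np.int16)
          job_id = runner.execute_async([input_data], [output_data], True)
          runner.wait(job_id)
          # Trim to output_seq_dim and dequantize afterwards, as in the
          # single-threaded loop above.
          results[offset] = output_data

  results = {}
  threads = [threading.Thread(target=worker,
                              args=(runners[i], chunks_per_runner[i], results))
             for i in range(num_cores)]
  for t in threads:
      t.start()
  for t in threads:
      t.join()
  ```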
- Dequantize the output using the following command:

  ```python
  out_pos = graph.get_root_subgraph().get_attr('output_fix2float')
  lstm_output = quanti_convert_int16_to_float(lstm_output, out_pos)
  ```