IRQ Balancing - 2.0 English - PG414

H.264/H.265 Video Decode Unit Solutions LogiCORE IP Product Guide (PG414)

Document ID: PG414
Release Date: 2024-10-25
Version: 2.0 English

Multimedia use-cases that involve video codecs, such as audio/video conferencing, video-on-demand, playback, and recording, also involve several other peripherals: Ethernet, video capture pipeline IP (image sensors and image signal processing engines), DMA engines, and display pipeline IP such as video mixers and HDMI transmitters. Each of these uses its own interrupt line to communicate with the CPU.

In these scenarios, it is important to distribute the interrupt processing load across multiple CPU cores instead of using the same core for all peripherals and IP. Distributing IRQs across CPU cores improves the latency and performance of the running use-case because the IRQ context switching and ISR handling load is shared among the cores.

Each peripheral or IP is assigned a unique interrupt number by the Linux kernel. Whenever a peripheral or IP needs to signal the CPU (for example, because it has completed a task or detected an event), it raises a hardware interrupt; the kernel retrieves the associated IRQ number and calls the associated interrupt service routine. The IRQ numbers can be retrieved using the following command, which also lists the number of interrupts processed by each core, the interrupt type, and a comma-delimited list of drivers registered to receive that interrupt.
$ cat /proc/interrupts
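
The lookup described above can be scripted. The following is a minimal sketch, not part of the product guide: the function name irqs_for_driver is hypothetical, and it assumes the usual /proc/interrupts layout (a header row of CPU columns, then one row per IRQ). It prints the IRQ number followed by the per-core counts for every row that contains the given driver name.

```shell
#!/bin/sh
# irqs_for_driver NAME [FILE]
# Hypothetical helper: print "IRQ count_cpu0 count_cpu1 ..." for each row of
# FILE (default: /proc/interrupts) whose text contains the literal string NAME.
irqs_for_driver() {
    awk -v name="$1" '
        NR == 1 { ncpu = NF; next }              # header row: CPU0 CPU1 ...
        $1 ~ /^[0-9]+:$/ && index($0, name) {    # numbered IRQ rows only
            irq = $1
            sub(/:$/, "", irq)                   # strip the trailing colon
            line = irq
            for (i = 2; i <= ncpu + 1; i++)      # one count column per core
                line = line " " $i
            print line
        }
    ' "${2:-/proc/interrupts}"
}

# Example (on the target): irqs_for_driver versal-dma
```

Because the match is a plain substring test on the whole row, a very short name can match unrelated rows; a driver name such as xilinx-hdmi or versal-dma is specific enough in practice.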

The Versal device has two CPU cores available. On a plain PetaLinux image without an irqbalance daemon, all interrupt requests are processed by CPU 0 by default. To have a different CPU core process a particular IRQ number, change the IRQ affinity for that interrupt. The IRQ affinity value is a bitmask that defines which CPU cores are allowed to process that IRQ. For more information, see https://www.kernel.org/doc/Documentation/IRQ-affinity.txt.

By default, the IRQ affinity value for each peripheral is set to 0xf. Each bit of the mask stands for one CPU core, so 0xf (bits 0 through 3) allows up to four cores, and therefore every available core, to process the interrupt, as shown in the following example for IRQ number 42.
$ cat /proc/irq/42/smp_affinity
output: f
To restrict this IRQ to CPU core n, write a mask in which only bit n is set (bits are counted from 0). For example, to route the IRQ only to CPU core 1, set the second bit by writing the value 0x2.
echo 2 > /proc/irq/42/smp_affinity
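
The relationship between core numbers and mask values can be worked out with shell arithmetic. The sketch below is illustrative only: cpu_mask and mask_to_cpus are hypothetical helper names, and the decoder assumes a single hexadecimal word without comma-separated groups, which is sufficient for a system with a handful of cores.

```shell
#!/bin/sh
# cpu_mask CPU...
# Hypothetical helper: print the hexadecimal smp_affinity mask that allows
# exactly the listed CPU cores (core n corresponds to bit n).
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

# mask_to_cpus HEXMASK
# Hypothetical helper: print the CPU core numbers whose bits are set in HEXMASK.
mask_to_cpus() {
    value=$((0x$1))
    cpu=0
    list=""
    while [ "$value" -ne 0 ]; do
        if [ $((value & 1)) -eq 1 ]; then
            list="$list${list:+ }$cpu"
        fi
        value=$((value >> 1))
        cpu=$((cpu + 1))
    done
    printf '%s\n' "$list"
}

cpu_mask 1        # prints 2
cpu_mask 0 1      # prints 3
mask_to_cpus f    # prints 0 1 2 3
```

The result of cpu_mask can be written directly to the smp_affinity file of an IRQ, and mask_to_cpus can be used to interpret the value read back from it.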

The following section shows how IRQ balancing can be performed before running a multistream video conferencing use-case that involves multiple peripherals and video IP.

Suppose you have several DMA channels that capture different video streams, each of which uses its own interrupt line, as depicted by the versal-dma blocks in the following figure.

Figure 1. Default IRQ Assignment to CPU Core 0

As seen in the previous figure, all interrupt requests from the different peripherals go to CPU 0 by default.

To distribute the interrupt requests across different CPU cores as shown in the following figure, follow these steps:

Figure 2. Distributed Interrupt Layout
  1. Find the IRQ numbers for each of the above peripherals.
    
    root@vek280:~/ # cat /proc/interrupts | grep al5
    49:         1250127         47679                  GICv2 127 Level         a0120000.al5d
    root@vek280:~/# cat /proc/interrupts | grep xilinx_frame
    
    52:         18662               0                  GICv2 122 Level       xilinx_framebuffer
    53:         19170               0              interrupt-controller@a0055000       3 Level -level        xilinx_framebuffer 
    54:         18825               0              interrupt-controller@a0055000       0 Level -level        xilinx_framebuffer 
    55:         18463               0              interrupt-controller@a0055000       1 Level -level        xilinx_framebuffer
    57:           0                 0              GICv2 121 Level           xilinx_framebuffer
    
    root@vek280:~/ # cat /proc/interrupts | grep xilinx-hdmi
    56:         544834              0              GICv2 123 Level           xilinx-hdmi-rx
    58:         86730               0              GICv2 125 Level           xilinx-hdmitxss
    
    root@vek280:~/ # cat /proc/interrupts | grep mixer
    59:         86752               0              GICv2 128 Level           xlnx-mixer
    
    root@vek280:~/ # cat /proc/interrupts | grep versal-dma
    
    12:      42151036               0              GICv2 156 Level           versal-dma
    13:      31494805           10644207           GICv2 157 Level           versal-dma
    14:      31483922               0              GICv2 158 Level           versal-dma
    15:      31518024               0              GICv2 159 Level           versal-dma
    NOTE: There are multiple versal-dma interrupt lines. To find out which ones are in use, first run the use-case and then check which interrupt lines show increasing counts.

    The numbers on the left are the IRQ numbers for the respective peripherals.

  2. Assign CPU 0 to VDU IRQ with number 49.
    echo 1 > /proc/irq/49/smp_affinity #VDU 
  3. Assign CPU 0 to the HDMI RX and the framebuffer write IP.
    echo 1 > /proc/irq/52/smp_affinity #Frame buffer
    echo 1 > /proc/irq/56/smp_affinity #Primary HDMI Rx
    
  4. Assign CPU 1 to HDMI TX and Video mixer IP
    echo 2 > /proc/irq/58/smp_affinity #Tx
    echo 2 > /proc/irq/59/smp_affinity #Mixer
    
    By default, the interrupts for the video1 xilinx_framebuffer DMA engine and various other peripherals are already processed by CPU 0, so there is no need to modify their smp_affinity. The previous commands distribute the IRQs according to the scheme in the previous figure. To confirm this, run the following command while the use-case is running and check that the interrupt counts for each peripheral increase on the intended CPU core. A similar distribution scheme can be followed for other use-cases, depending on the peripherals in use, the system load, and the intended performance.
    $ cat /proc/interrupts
  5. Assign a unique CPU to each versal-dma channel if possible.
    echo 1 > /proc/irq/12/smp_affinity # versal-dma1
    echo 1 > /proc/irq/13/smp_affinity # versal-dma2
    echo 2 > /proc/irq/14/smp_affinity # versal-dma3
    echo 2 > /proc/irq/15/smp_affinity # versal-dma4
    
    By default, the interrupts for other peripherals are processed by CPU 0, so there is no need to modify their smp_affinity. The preceding commands distribute the IRQs according to the scheme in the Distributed Interrupt Layout figure, which can be confirmed by running the following command while the use-case is running:
    
    cat /proc/interrupts 
    12:      42151036          0            0           0         GICv2 156 Level zynqmp-dma
    13:      31494805      10644207         0           0         GICv2 157 Level zynqmp-dma
    14:      31483922          0         10643127       0         GICv2 158 Level zynqmp-dma
    15:      31518024          0            0        10595920     GICv2 159 Level versal-dma
    49:      1250127         47679          0           0         GICv2 127 Level a0120000.al5d, a0100000.al5e
    52:      18662             0           822          0         GICv2 122 Level xilinx_framebuffer
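
The affinity assignments in steps 2 through 5 can be collected into one small script that is run before the use-case starts. This is a sketch under stated assumptions rather than a supported tool: the function names set_irq_affinity and apply_affinity_map are hypothetical, the optional root-directory argument exists only so the script can be tried against a scratch directory, and the IRQ numbers in the usage comment are the ones from this example system and will differ on other designs.

```shell
#!/bin/sh
# set_irq_affinity IRQ MASK [ROOT]
# Hypothetical helper: write MASK to ROOT/IRQ/smp_affinity (ROOT defaults to
# /proc/irq) and print the value read back. Returns 1 if the file is missing
# or not writable, for example when not running as root.
set_irq_affinity() {
    irq=$1
    mask=$2
    root=${3:-/proc/irq}
    file="$root/$irq/smp_affinity"
    if [ ! -w "$file" ]; then
        echo "skip IRQ $irq: $file is not writable" >&2
        return 1
    fi
    echo "$mask" > "$file"
    echo "IRQ $irq -> mask $(cat "$file")"
}

# apply_affinity_map [ROOT]
# Hypothetical helper: read "IRQ MASK [comment]" lines from standard input and
# apply each one. Blank lines and lines starting with '#' are ignored, and a
# failed entry does not stop the remaining entries from being applied.
apply_affinity_map() {
    root=${1:-/proc/irq}
    while read -r irq mask comment; do
        case $irq in
            '' | '#'*) continue ;;
        esac
        set_irq_affinity "$irq" "$mask" "$root" || true
    done
}

# Example for the IRQ numbers used in this section (run as root on the target):
#
#   apply_affinity_map <<'EOF'
#   49 1  VDU
#   52 1  Frame buffer
#   56 1  Primary HDMI Rx
#   58 2  HDMI Tx
#   59 2  Mixer
#   12 1  versal-dma1
#   13 1  versal-dma2
#   14 2  versal-dma3
#   15 2  versal-dma4
#   EOF
```

Keeping the map as plain text makes it easy to adjust when the IRQ numbers change between builds. After the script has run, check /proc/interrupts while the use-case is running to confirm that the counts increase on the intended cores.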