Video Pipeline
The blender takes two native video stream inputs and outputs blended pixels. Blending is performed by converting both input streams to RGB; the blended result is then converted to the output format programmed by the user. The final output video is also forwarded to the PL. Refer to the pipeline block diagram below.
Color formats are processed as programmed by the user. The input color space converter converts both input streams to RGB for the blending operation. The blended output goes to the cursor stage for cursor blending. The video stream is then converted to the format selected by the user and sent out.
The input color space converter supports input formats such as YCbCr 4:4:4 or 4:2:2 and converts them to RGB.
The output color space converter supports output formats such as YCbCr 4:4:4 or 4:2:2, and xvYCC.
Alpha Blend
Alpha blending is done per color component of each pixel, according to the following equation. Pixels are 12 bpc (bits per component).
Output_Pixel = (Pixel_stream2 * Alpha /255) + (Pixel_stream1 * (255-Alpha) /255);
If global alpha is used, then the equation will be as follows:
Output_Pixel = (Pixel_stream2 * Global_Alpha /255) + (Pixel_stream1 * (255 - Global_Alpha) /255);
Note: to simplify the division in hardware, the equations are modified as follows:
If (Alpha == 255) Output_Pixel = Pixel_stream2;
Else Output_Pixel = (Pixel_stream2 * Alpha /256) + (Pixel_stream1 * (256-Alpha) /256);
The arithmetic is done on 12-bit samples of stream 1 and stream 2. The output from the buffer manager is 12 bits.
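The divide-by-256 form above can be sketched for a single color component as follows. This is only an illustration: the function name `alpha_blend` is hypothetical, and truncating each term separately (one right shift per product) is an assumption about the rounding, which the text does not specify.

```python
SAMPLE_BITS = 12
SAMPLE_MAX = (1 << SAMPLE_BITS) - 1  # 4095 for 12 bpc


def alpha_blend(stream1: int, stream2: int, alpha: int) -> int:
    """Blend one 12-bit color component using the divide-by-256 form.

    Assumption: each product is truncated on its own, as a hardware
    right shift by 8 would do. The source does not state the rounding.
    """
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be in the range 0..255")
    if not (0 <= stream1 <= SAMPLE_MAX and 0 <= stream2 <= SAMPLE_MAX):
        raise ValueError("samples must be 12-bit values")
    if alpha == 255:
        # Special case from the note: fully opaque selects stream 2.
        return stream2
    # Division by 256 is a right shift by 8.
    return ((stream2 * alpha) >> 8) + ((stream1 * (256 - alpha)) >> 8)
```

With alpha equal to 0 the second term becomes stream1 * 256 / 256, so stream 1 passes through unchanged. The special case for 255 gives the same exact pass-through for stream 2.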
Chroma Keying
Chroma keying is a technique by which a particular color in one video stream is replaced by the color from the second stream. The supported programmable options are:
- A programmable key color with a range from min to max
- A select to enable chroma keying
- Master stream select
It is assumed that both streams have the same resolution.
The chroma keying logic assumes two input streams, called the master stream and the second stream. These inputs to the chroma key block come from the AV Buffer Manager logic.
The chroma key logic checks the color range of each pixel. If the R, G, B values of a pixel:
- match the key (which is configured by software), then output = second stream
- otherwise, output = master stream
The chroma key is always in R, G, B format, so the logic always uses input color space conversion. The output can be in any of the formats supported by the blender.
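A minimal per-pixel sketch of the keying decision is shown below. The name `chroma_key` and its parameters are hypothetical. It assumes that the range check is applied to the master stream pixel, that the min and max bounds are inclusive and given per component, and that the master stream passes through when keying is disabled. None of these details are confirmed by the text.

```python
from typing import Tuple

Rgb = Tuple[int, int, int]


def chroma_key(master: Rgb, second: Rgb,
               key_min: Rgb, key_max: Rgb,
               enable: bool = True) -> Rgb:
    """Select the output pixel for one position of the two streams.

    Assumptions: the key range is tested on the master pixel, and the
    bounds are inclusive for each of R, G and B.
    """
    if not enable:
        return master
    matches_key = all(
        low <= component <= high
        for component, low, high in zip(master, key_min, key_max)
    )
    return second if matches_key else master
```

For example, with a pure green key range of (0, 4000, 0) to (100, 4095, 100), a master pixel of (50, 4090, 20) is replaced by the second stream pixel, while (50, 3000, 20) is kept.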
Cursor
The Display Controller provides dedicated resources to support a hardware cursor function. The cursor is stored in a 128x128 pixel memory in RGBA4444 pixel format, which can be updated through the DMA channel for metadata. The cursor takes up 32 KB of RAM. The system writes the X and Y coordinates of the cursor, in units of pixels and relative to the start of the active video, into a register in the TxDC. The coordinates can be updated at most once per frame. The size of the cursor is fixed at 128-by-128 pixels. A smaller cursor can be achieved by setting the alpha values of the pixels on the sides to 0.
The location (x, y), with respect to the start of the active video, is defined in control registers.
- Any pixels of an image that are outside of the cursor area pass through the cursor blending block without changes.
- Image pixels that overlap with the cursor go through alpha blending with the cursor’s alpha value.
- Portions of the cursor that fall outside the image are cropped.
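The three rules above, together with the cursor memory size, can be sketched as follows. All function names are hypothetical. The sketch assumes two bytes per RGBA4444 pixel (4 bits each for R, G, B and A), which matches the stated 32 KB.

```python
CURSOR_SIZE = 128          # fixed cursor width and height in pixels
BYTES_PER_PIXEL = 2        # RGBA4444: four 4-bit fields per pixel


def cursor_ram_bytes() -> int:
    """Size of the cursor memory in bytes."""
    return CURSOR_SIZE * CURSOR_SIZE * BYTES_PER_PIXEL


def cursor_texel(px: int, py: int, cursor_x: int, cursor_y: int):
    """Return the cursor-local (x, y) covering image pixel (px, py).

    Returns None when the image pixel lies outside the cursor area,
    in which case the pixel passes through unchanged.
    """
    cx, cy = px - cursor_x, py - cursor_y
    if 0 <= cx < CURSOR_SIZE and 0 <= cy < CURSOR_SIZE:
        return (cx, cy)
    return None


def visible_cursor_size(cursor_x: int, cursor_y: int,
                        width: int, height: int):
    """Width and height of the cursor that remain after cropping."""
    w = max(0, min(cursor_x + CURSOR_SIZE, width) - max(cursor_x, 0))
    h = max(0, min(cursor_y + CURSOR_SIZE, height) - max(cursor_y, 0))
    return (w, h)
```

Pixels for which `cursor_texel` returns a coordinate would then be alpha blended with the cursor pixel at that coordinate.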
Partial Blend
Partial plane blending refers to the concept of blending a full-size source plane with a partial plane from a blend plane. Both the source plane and the blend plane need to be of the same resolution.
The source plane can come either from the live video or from stream 1 of the non-live video. The blend plane can only come from stream 2 of the non-live video.
There are three sets of parameters located in APB registers:
- The x-y co-ordinate, (xp, yp), of the partial plane with respect to the origin (start) of the blend plane.
- The size, (xp_size, yp_size), of the partial plane.
- The x-y co-ordinate, (xs, ys), of the source plane where the origin of the blend plane will be aligned to. The position of the partial plane within the source plane will be, (xs+xp, ys+yp).
All coordinate and size parameters are specified in APB registers and are in units of pixels. Any blend region outside of the source plane is cropped in all blending scenarios.
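The placement and cropping of the partial plane can be sketched as below. The function name `partial_plane_rect` and the source plane dimensions passed as arguments are hypothetical; the parameter names xs, ys, xp, yp, xp_size and yp_size follow the text.

```python
def partial_plane_rect(xs: int, ys: int, xp: int, yp: int,
                       xp_size: int, yp_size: int,
                       src_width: int, src_height: int):
    """Locate the partial plane inside the source plane, in pixels.

    Returns (x, y, width, height) of the visible blend region, or
    None when the region falls entirely outside the source plane.
    """
    # Position of the partial plane within the source plane.
    left, top = xs + xp, ys + yp
    right, bottom = left + xp_size, top + yp_size
    # Crop whatever extends beyond the source plane.
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, src_width), min(bottom, src_height)
    if left >= right or top >= bottom:
        return None
    return (left, top, right - left, bottom - top)
```

For a 1920x1080 source plane, an origin of (1800, 0) and a 200x100 partial plane at (0, 0) gives a visible region 120 pixels wide, because the remaining 80 columns are cropped.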
Up Sampling
YCbCr 4:2:0 to YCbCr 4:4:4, YCbCr 4:2:0 to YCbCr 4:2:2, and YCbCr 4:2:2 to YCbCr 4:4:4 up-sampling are supported. For up-sampling, a horizontal/vertical linear interpolation technique shall be used. When up-sampling from 4:2:0, vertical interpolation is required.
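One possible form of 2x linear interpolation is sketched below. The function names are hypothetical, and several details are assumptions that the text does not specify: existing samples are kept at even output positions, the inserted sample is the rounded midpoint of its two neighbours, and the last sample or line is replicated at the edge.

```python
def upsample_chroma_2x(samples):
    """Horizontally double a row of chroma samples (e.g. 4:2:2 to 4:4:4)."""
    out = []
    for i, cur in enumerate(samples):
        # Replicate the last sample at the right edge (assumption).
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        out.append(cur)
        out.append((cur + nxt + 1) // 2)  # rounded midpoint
    return out


def upsample_chroma_lines_2x(lines):
    """Vertically double a chroma plane (needed when starting from 4:2:0)."""
    out = []
    for i, cur in enumerate(lines):
        # Replicate the last line at the bottom edge (assumption).
        nxt = lines[i + 1] if i + 1 < len(lines) else cur
        out.append(list(cur))
        out.append([(a + b + 1) // 2 for a, b in zip(cur, nxt)])
    return out
```

Going from 4:2:0 to 4:4:4 would apply the vertical step and then the horizontal step to each chroma plane.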
Down Sampling
Chroma sub-sampling operations from YCbCr 4:4:4 to YCbCr 4:2:2, from YCbCr 4:2:2 to YCbCr 4:2:0, and from YCbCr 4:4:4 to YCbCr 4:2:0 are supported. Chroma sub-sampling is performed by sampling a subset of the chroma components of the pixels, as defined by the format.
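Sampling a subset of the chroma components can be sketched as plain decimation. The function names are hypothetical, and keeping the even-indexed samples and lines without any filtering is an assumption; the text only says that a subset is sampled.

```python
def subsample_horizontal(chroma_plane):
    """4:4:4 to 4:2:2: keep every second chroma sample in each line."""
    return [list(row[::2]) for row in chroma_plane]


def subsample_vertical(chroma_plane):
    """4:2:2 to 4:2:0: keep every second chroma line."""
    return [list(row) for row in chroma_plane[::2]]


def subsample_444_to_420(chroma_plane):
    """4:4:4 to 4:2:0: halve the chroma resolution in both directions."""
    return subsample_vertical(subsample_horizontal(chroma_plane))
```

The luma plane is left untouched in all three cases.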
Color Space Conversion
Standard color space conversion from YCbCr 4:4:4 to RGB shall be implemented. The alpha channel shall be carried forward to the alpha blending block. Lower resolution formats (YCbCr 4:2:2, YCbCr 4:2:0) shall be up-sampled before converting to RGB. Color space conversion shall be done by reading the coefficients from programmable registers.
Color space conversion is supported on both the input and output sides.
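A coefficient-driven conversion can be sketched as a 3x3 matrix applied to one pixel. The names below are hypothetical. The coefficient values are the well-known full-range BT.601 set, used purely as an illustration and not taken from the register defaults of this design. Centering Cb and Cr on mid-scale and clamping the result to 12 bits are also assumptions.

```python
# Illustrative full-range BT.601 coefficients; in the design these
# would come from programmable registers (actual values not given).
BT601_FULL_RANGE = (
    (1.0, 0.0, 1.402),          # R row
    (1.0, -0.344136, -0.714136),  # G row
    (1.0, 1.772, 0.0),          # B row
)


def ycbcr_to_rgb(y: int, cb: int, cr: int,
                 coeffs=BT601_FULL_RANGE, bits: int = 12):
    """Convert one YCbCr 4:4:4 pixel to RGB with a 3x3 matrix."""
    max_val = (1 << bits) - 1
    mid = 1 << (bits - 1)
    # Assumption: chroma is offset-binary, centered on mid-scale.
    inputs = (y, cb - mid, cr - mid)
    rgb = []
    for row in coeffs:
        value = sum(k * v for k, v in zip(row, inputs))
        rgb.append(min(max(int(round(value)), 0), max_val))
    return tuple(rgb)
```

With neutral chroma (Cb = Cr = 2048) the output is a gray level equal to Y, which is a quick sanity check for any coefficient set with a unity luma column.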
Audio
The Display controller supports a non-live audio channel from memory and a live audio channel from the PL.
Audio Input Stage
The audio can be sourced from memory or from the PL using a dedicated audio channel to the Display controller. The following sections describe each of the interfaces for sourcing audio.
Live Audio
Live audio can be sourced from the PL.
Non-Live Audio
Non-live audio input can be sourced from memory using the DPDMA. This interface supports one non-live audio channel capable of fetching audio samples from memory. Refer to the next section "Display Controller DMA" for further details.
Audio Interface
I2S (Inter-IC Sound) is a serial bus interface standard. The bus separates clock and serial data signals. The interface consists of 3 or more wires:
- Bit clock, i2sclk, at 512 times the sampling clock frequency
- Word clock, i2slrclk, at the sampling clock frequency
- One or more multiplexed data lines, i2sdata, up to 2 audio channels per data line. Four data lines are used to carry 8 channels.
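The clock and data line relationships listed above reduce to simple arithmetic, sketched below with hypothetical function names. The 48 kHz sampling rate in the usage note is only an example.

```python
BIT_CLOCK_MULTIPLIER = 512     # i2sclk = 512 x sampling frequency
CHANNELS_PER_DATA_LINE = 2     # up to 2 audio channels per i2sdata line


def i2s_bit_clock_hz(sample_rate_hz: int) -> int:
    """Bit clock frequency for a given sampling frequency."""
    return BIT_CLOCK_MULTIPLIER * sample_rate_hz


def i2s_word_clock_hz(sample_rate_hz: int) -> int:
    """Word clock (i2slrclk) runs at the sampling frequency."""
    return sample_rate_hz


def i2s_data_lines(channels: int) -> int:
    """Number of data lines needed for a given channel count."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    # Ceiling division: an odd channel still occupies a full line.
    return -(-channels // CHANNELS_PER_DATA_LINE)
```

At a 48 kHz sampling rate the bit clock is 512 x 48000 = 24.576 MHz, and 8 channels occupy 4 data lines.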
The data line contains serial data arranged as shown in the following figure of the I2S data format. The sample data size is 16 to 24 bits. The total data width, including the metadata (preamble and control bits), is 22 to 30 bits.
Where
- PR – Start of block Preamble
  - 00: Subframe 1 and start of an audio block (192 consecutive frames, each made up of two subframes)
  - 01: Subframe 1
  - 10: Subframe 2
  - 11: reserved
- P – Parity bit, even parity over all fields, except PR
- C – Channel status, per frame
- U – User data, per subframe (channel)
- V – Validity, this bit is 0 if the information in the data field is reliable, 1 if it is not
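The parity rule and the total subframe width can be sketched as below. The function names are hypothetical, and the actual bit ordering on the wire is defined by the figure and is not modeled here. The width follows from the fields listed: a 2-bit PR plus one bit each for P, C, U and V adds 6 bits to the sample.

```python
METADATA_BITS = 2 + 4   # PR (2 bits) + P, C, U, V (1 bit each)


def subframe_width(sample_bits: int) -> int:
    """Total data width for a given sample size (16 to 24 bits)."""
    if not 16 <= sample_bits <= 24:
        raise ValueError("sample size must be 16 to 24 bits")
    return sample_bits + METADATA_BITS


def subframe_parity(sample: int, sample_bits: int,
                    validity: int, user: int, channel_status: int) -> int:
    """Even parity bit over all fields except PR.

    The returned bit makes the number of ones across the sample,
    V, U, C and P itself even.
    """
    data = sample & ((1 << sample_bits) - 1)
    ones = bin(data).count("1") + validity + user + channel_status
    return ones & 1
```

A receiver can run the same count with the received P included and expect an even result.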
I2S Interface timing is shown below
DMA Controller
The DC has an 8-channel DMA controller that reads two streams (3 planes per stream) of non-live video/graphics data, audio data, and SDP and cursor data from system memory. The DMA operations are specified by descriptors located in memory. DMA transfers are carried out through a 128-bit AXI interface.
Video data in a frame buffer must be in raster format.
The DC operates on live and non-live video/graphics data. For live traffic, the PL is responsible for providing a continuous flow of data. For non-live traffic, data flows from frame buffers in memory. The DMA controller is responsible for fetching frame buffer data from memory and sending it to the channel buffers in the audio/video buffer manager. The DMA controller together with the buffer manager ensures a continuous flow of non-live traffic.
System memory (DDR) latency is critical in video/display data processing and may cause an interruption in video data if not handled adequately. The DMA design addresses this by issuing multiple regular requests to memory based on channel buffer requests. The DMA sends the data received from DDR to the channel buffers through the AXI streaming (AXIS) interface.
Refer to the following figure for the DMA controller block diagram. The DMA controller acts as a 128-bit AXI master for reads from and writes to system memory. It has a 32-bit APB slave register interface for configuration. Data and control transfer between the DMA and the A/V buffer manager goes through a 128-bit AXI streaming interface, where the DMA controller writes data into a buffer in response to a request.
The streaming interface has a tag. The valid tags are 0 to 7. The DMA sends AXI requests to the memory controller based on these channel tags. On receiving the data, it routes the data to the corresponding channel buffer in the A/V buffer manager. The DMA controller does not have any data storage buffer. The DMA controller contains the following blocks:
- AXIS interface – this interface will handshake with the A/V buffer manager
- DMA core – this handles all types of control function in DMA (can be further divided into sub-blocks)
- 128-bit AXI master interface – the interface contains arbitration logic and it connects to the top switch
- Interrupt – this generates interrupts for various events
- APB register block – this holds the configurable registers of the DMA
Outstanding AXI requests are held in FIFOs at the AXI master interface. The DMA spends most of its time handling read requests and retrieving read data from memory. The AXI read address FIFO needs sufficient depth to cover a response latency of up to two video lines. For 4Kp60, the size of two lines of data is about 4k x 2 x 8 bytes = 64 KB. With a burst length of 16 and a 128-bit data width, each burst carries 256 bytes, so the maximum number of AXI read commands is 256 and a read address FIFO depth of 256 is required. Other AXI channels require 8-deep FIFOs for typical pipeline delays.
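The FIFO depth calculation can be reproduced with a short sketch. The function name and parameter names are hypothetical, and reading "4k x 2 x 8-byte" as 4096 pixels per line, 2 lines and 8 bytes per pixel is an interpretation of the text.

```python
def read_address_fifo_depth(pixels_per_line: int = 4096,
                            lines: int = 2,
                            bytes_per_pixel: int = 8,
                            burst_len: int = 16,
                            data_width_bits: int = 128) -> int:
    """Number of AXI read commands needed to cover the buffered lines."""
    buffered_bytes = pixels_per_line * lines * bytes_per_pixel
    bytes_per_burst = burst_len * (data_width_bits // 8)
    # Ceiling division: a partial burst still needs one command.
    return -(-buffered_bytes // bytes_per_burst)
```

With the default values, 65536 bytes divided by 256 bytes per burst gives the depth of 256 quoted in the text.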
DMA Descriptor Management
The 8-channel DMA contains AXI command generation logic and DMA descriptor management. Descriptors and data are always 256-byte aligned, and the AXI burst length is 16 or less. A planar video frame buffer requires 3 descriptors, a semi-planar frame buffer requires 2 descriptors, and a graphics frame buffer requires 1 descriptor. The location of the first descriptor is defined by software in the configuration register. The location of the next descriptor is specified in the current descriptor. The table below summarizes the descriptor fields.
| Descriptor Field | Width (bits) | Description |
|---|---|---|
| Identifier | 16 | A constant value identifying the block as a valid TxDC descriptor, and its version. |
| Control Attributes | 32 | Preamble[7:0], update_en[8], ignore_done[9], last_descriptor[10], last_descriptor_frame[11], crc_en[12], axi_burst[13], axi_cache[17:14], axi_prot[19:18], axi_awcache[23:20], axi_awqos[27:24], RESERVED[31:28] |
| Data Size | 32 | Number of bytes of data to be fetched. |
| Data Start Address | 48 | Start address of the data. Each descriptor shall contain the pointer to where the data needs to be fetched from. |
| Next Descriptor | 48 | Next Descriptor Address. |
| TLB Prefetch En | 1 | 0 = Disabled, 1 = Enabled. Prefetch use type is configured via an APB register: 0 (default) = cache stashing, 1 = read prefetch. |
| TLB Prefetch BLK Size | 14 | TLB prefetch address block size in 16-Byte resolution. |
| TLB Prefetch BLK Offset | 14 | TLB prefetch address block offset in 16-Byte resolution. The value should be less than TLB prefetch blk size. Determines prefetch launch time when data fetch address distance to next blk address boundary equals blk size minus blk offset. |
| Line | 1 | 0 for Line. |
| Line Size | 18 | Number of bytes per line. |
| Line Stride | 14 | Stride size in 16-byte resolution. Must be greater than or equal to the Line Size and an integer multiple of 256 bytes for an AXI burst length of 16, 128 bytes for a burst length of 8, 64 bytes for a burst length of 4, and so on. |
| Target Address | 1 | 0 for DP (SDP), 1 for Cursor RAM when channel ID = 7. |
| Enable Interrupt | 1 | Enable interrupt upon completion. When not enabled, SW shall poll the “Done” flag. |
| Reserved | 1 | Aligned to 16-byte boundary. |
| Presentation Time | 64 | Presentation timestamp (from timing source). The DC shall update the presentation time from STC and write the descriptor back into the memory. The STC counter will wrap around when it reaches (1 << 42) - 1. Bit fields: [41:0] STC, [62:42] reserved, [63] status/done. |
| Reserved | 32 | Reserved for future use. |
| Checksum | 32 | 32-bit additions over the descriptor, ignore carry. |
| TOTAL | 384 | 3 x 128-bit. |
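Two of the descriptor rules in the table lend themselves to a short sketch: the Line Stride constraints and the additive checksum. The function names are hypothetical. The stride alignment of 16 bytes times the burst length follows from the 256/128/64-byte figures in the table. Which 32-bit words the checksum covers is not stated, so the sketch simply sums whatever words it is given.

```python
def encode_line_stride(stride_bytes: int, line_size_bytes: int,
                       burst_len: int = 16) -> int:
    """Validate a stride and return the 14-bit Line Stride field value."""
    alignment = burst_len * 16   # 256 B for burst 16, 128 B for 8, ...
    if stride_bytes < line_size_bytes:
        raise ValueError("stride must be at least the line size")
    if stride_bytes % alignment != 0:
        raise ValueError(f"stride must be a multiple of {alignment} bytes")
    field = stride_bytes // 16   # the field is in 16-byte resolution
    if field >= 1 << 14:
        raise ValueError("stride does not fit in the 14-bit field")
    return field


def descriptor_checksum(words) -> int:
    """32-bit additive checksum; the carry out of bit 31 is ignored."""
    total = 0
    for word in words:
        total = (total + (word & 0xFFFFFFFF)) & 0xFFFFFFFF
    return total
```

For example, a stride of 7680 bytes with a burst length of 16 is valid (7680 is 30 x 256) and encodes as 480.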
Video/Graphics and Audio/Video Buffer Manager
Video/Graphics Input Stage
The TX DC can operate with: 1) live video planes from the PL, 2) both video planes non-live from memory, or 3) a mixed mode with one live plane and one non-live plane. Software selects the two streams used for the video blending operation. The data from memory is converted to the frame buffer formats programmed by the user. Standard compliance patterns are also supported for video and audio.
Video/Graphics Output Stage
The output video/graphics stage contains output color space conversion, chroma sub-sampling, and pixel size reduction. The video output data is forwarded to DP over a Type-C connector.
Audio/Video Buffer Manager
The Audio/Video buffer manager module manages audio/video data from memory and from the PL. Data from memory, obtained through the DMA, is considered non-live, and data from the PL is considered live. Data from memory is written into the channel buffers using a 128-bit AXIS interface. The channel buffer ID is passed on the streaming TID port. Valid IDs are 0 to 7.
Channel buffers 0, 1, 2 and 3, 4, 5 correspond to Y, Cr, Cb of the two planar video streams; channel buffers 0 and 3 are also used for graphics. Buffers 6 and 7 are for audio and cursor/SDP, respectively.
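The mapping from streaming TID to channel buffer role can be sketched as a small lookup. The function name and the "group 1" and "group 2" labels are hypothetical; the text does not say which buffer group belongs to which video stream.

```python
PLANES = ("Y", "Cr", "Cb")   # plane order as listed in the text


def describe_channel(tid: int) -> str:
    """Describe the channel buffer addressed by a streaming TID."""
    if not 0 <= tid <= 7:
        raise ValueError("valid channel buffer IDs are 0 to 7")
    if tid == 6:
        return "audio"
    if tid == 7:
        return "cursor/SDP"
    group = tid // 3 + 1          # buffers 0-2 and 3-5 form two groups
    plane = PLANES[tid % 3]
    role = f"video group {group}, plane {plane}"
    if tid % 3 == 0:
        # Buffers 0 and 3 double as the graphics buffers.
        role += " or graphics"
    return role
```

For instance, TID 4 maps to the Cr plane of the second group, and TID 3 maps to either the Y plane or graphics data of that group.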