The Histogram of Oriented Gradients (HOG) is a feature descriptor used in computer vision for object detection. The feature descriptors produced by this approach are widely used in pedestrian detection.
The technique counts the occurrences of gradient orientations in localized portions of an image. HOG is computed over a dense grid of uniformly spaced cells and normalized over overlapping blocks for improved accuracy. The concept behind HOG is that object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions.
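As a rough illustration of counting gradient orientations within a cell, the sketch below accumulates the gradient magnitudes of one 8x8 cell into 9 unsigned-orientation bins covering 0 to 180 degrees. The central-difference gradient, the hard (non-interpolated) binning, and the helper name `cellHistogram` are assumptions made for illustration; this is not the library's internal code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kCellSize = 8;  // CELL_HEIGHT and CELL_WIDTH
constexpr int kNumBins = 9;   // NOB

// Hypothetical helper (not part of the library): histogram of the cell whose
// top-left pixel is (r0, c0) in a row-major 8-bit gray image.
std::array<float, kNumBins> cellHistogram(const std::uint8_t* img, int rows, int cols,
                                          int r0, int c0) {
    assert(r0 >= 0 && c0 >= 0 && r0 + kCellSize <= rows && c0 + kCellSize <= cols);

    // Pixels outside the image read as zero (zero padding).
    auto px = [&](int r, int c) -> float {
        if (r < 0 || r >= rows || c < 0 || c >= cols) return 0.0f;
        return static_cast<float>(img[r * cols + c]);
    };

    std::array<float, kNumBins> hist{};
    const float kRadToDeg = 180.0f / 3.14159265358979f;
    const float kBinWidth = 180.0f / kNumBins;  // 20 degrees per bin

    for (int r = r0; r < r0 + kCellSize; ++r) {
        for (int c = c0; c < c0 + kCellSize; ++c) {
            const float gx = px(r, c + 1) - px(r, c - 1);
            const float gy = px(r + 1, c) - px(r - 1, c);
            const float mag = std::sqrt(gx * gx + gy * gy);

            // Fold the signed angle into the unsigned range [0, 180).
            float angle = std::atan2(gy, gx) * kRadToDeg;
            if (angle < 0.0f) angle += 180.0f;
            if (angle >= 180.0f) angle -= 180.0f;

            // Each pixel votes with its magnitude into a single bin.
            hist[static_cast<int>(angle / kBinWidth)] += mag;
        }
    }
    return hist;
}
```

For a cell containing a purely horizontal intensity ramp, every vote lands in the first bin, since all gradients point along the x axis.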
The function accepts both RGB and gray inputs. In RGB mode, gradients are computed for each plane separately, and the gradient with the highest magnitude is selected. With the configurations provided, the window dimensions are 64x128 and the block dimensions are 16x16.
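The per-plane selection in RGB mode can be pictured with the minimal sketch below, which keeps the gradient of whichever plane has the largest magnitude at a pixel. The `Gradient` type, the function names, and the tie-breaking rule (the earlier plane wins) are hypothetical and chosen only for illustration.

```cpp
// Hypothetical gradient of one colour plane at a single pixel.
struct Gradient {
    int gx;
    int gy;
};

// Squared magnitude is enough for comparison and avoids a square root.
constexpr int magnitudeSquared(Gradient g) { return g.gx * g.gx + g.gy * g.gy; }

// Keeps the first argument on ties (an arbitrary choice for this sketch).
constexpr Gradient stronger(Gradient a, Gradient b) {
    return magnitudeSquared(b) > magnitudeSquared(a) ? b : a;
}

// Selects the plane gradient with the highest magnitude.
constexpr Gradient selectPlane(Gradient r, Gradient g, Gradient b) {
    return stronger(stronger(r, g), b);
}
```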
API Syntax
template<int WIN_HEIGHT, int WIN_WIDTH, int WIN_STRIDE, int BLOCK_HEIGHT, int BLOCK_WIDTH, int CELL_HEIGHT, int CELL_WIDTH, int NOB, int DESC_SIZE, int IMG_COLOR, int OUTPUT_VARIANT, int SRC_T, int DST_T, int ROWS, int COLS, int NPC = XF_NPPC1, bool USE_URAM=false, int XFCVDEPTH_IN = _XFCVDEPTH_DEFAULT, int XFCVDEPTH_DESC = _XFCVDEPTH_DEFAULT>
void HOGDescriptor(xf::cv::Mat<SRC_T, ROWS, COLS, NPC, XFCVDEPTH_IN> &_in_mat, xf::cv::Mat<DST_T, 1, DESC_SIZE, NPC, XFCVDEPTH_DESC> &_desc_mat);
Parameter Descriptions
The following table describes the template parameters.
Parameters | Description |
---|---|
WIN_HEIGHT | The number of pixel rows in the window. This must be a multiple of 8 and should not exceed the number of image rows. |
WIN_WIDTH | The number of pixel cols in the window. This must be a multiple of 8 and should not exceed the number of image columns. |
WIN_STRIDE | The pixel stride between two adjacent windows. It is fixed at 8. |
BLOCK_HEIGHT | Height of the block. It is fixed at 16. |
BLOCK_WIDTH | Width of the block. It is fixed at 16. |
CELL_HEIGHT | Number of rows in a cell. It is fixed at 8. |
CELL_WIDTH | Number of cols in a cell. It is fixed at 8. |
NOB | Number of histogram bins for a cell. It is fixed at 9 |
DESC_SIZE | The size of the output descriptor. |
IMG_COLOR | The type of the image, set as either XF_GRAY or XF_RGB |
OUTPUT_VARIANT | Must be either XF_HOG_RB or XF_HOG_NRB |
SRC_T | Input pixel type. Must be either XF_8UC1 or XF_8UC4, for gray and color respectively. |
DST_T | Output descriptor type. Must be XF_32UC1. |
ROWS | Number of rows in the image being processed. |
COLS | Number of columns in the image being processed. |
NPC | Number of pixels to be processed per cycle; this function supports only XF_NPPC1 (1 pixel per cycle). |
USE_URAM | Enable to map UltraRAM instead of BRAM for some storage structures. |
XFCVDEPTH_IN | Depth of the input image. |
XFCVDEPTH_DESC | Depth of the output image. |
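The fixed window, block, cell, and bin values in the table determine how many descriptor values a single window produces. The sketch below works through that arithmetic, assuming blocks advance by one cell (8 pixels) in each direction, as in Dalal's configuration. The helper names are hypothetical, and the window and block counts for a 1920x1080 image are shown only as one plausible basis for choosing DESC_SIZE; the exact DESC_SIZE expected for each output variant is not stated here.

```cpp
// Number of positions a box of `size` can take along `extent` with a given stride.
constexpr int slidingCount(int extent, int size, int stride) {
    return (extent - size) / stride + 1;
}

// Values contributed by one block: (cells per block) x (bins per cell).
constexpr int valuesPerBlock(int blockH, int blockW, int cellH, int cellW, int bins) {
    return (blockH / cellH) * (blockW / cellW) * bins;
}

// Blocks in one window, assuming a block stride of one cell (an assumption).
constexpr int blocksPerWindow(int winH, int winW, int blockH, int blockW,
                              int cellH, int cellW) {
    return slidingCount(winH, blockH, cellH) * slidingCount(winW, blockW, cellW);
}

// 128x64 window (height x width), 16x16 blocks, 8x8 cells, 9 bins.
constexpr int kValuesPerBlock = valuesPerBlock(16, 16, 8, 8, 9);          // 36
constexpr int kBlocksPerWindow = blocksPerWindow(128, 64, 16, 16, 8, 8);  // 15 * 7 = 105
constexpr int kWindowDescriptorLength = kBlocksPerWindow * kValuesPerBlock;  // 3780

// Counts for a 1080x1920 image with WIN_STRIDE = 8 (illustrative only).
constexpr int kWindowsPerImage =
    slidingCount(1080, 128, 8) * slidingCount(1920, 64, 8);  // 120 * 233
constexpr int kBlocksPerImage =
    slidingCount(1080, 16, 8) * slidingCount(1920, 16, 8);   // 134 * 239
```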
The following table describes the function parameters.
Parameters | Description |
---|---|
_in_mat | Input image, of xf::cv::Mat type |
_desc_mat | Output descriptors, of xf::cv::Mat type |
Where, for OUTPUT_VARIANT:
- RB is repetitive blocks (descriptor data is written window-wise).
- NRB is non-repetitive blocks (descriptor data is written block-wise, to reduce the number of writes).
Note: In RB mode, the block data is written to memory taking the overlapping windows into consideration. In NRB mode, the block data is written directly to the output stream without considering the window overlap, so the overlap must be handled on the host side.
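One way the host might handle the window overlap in NRB mode is to gather the blocks that fall inside a given window from the block-wise output. The sketch below assumes the NRB output is a row-major grid of blocks with a fixed number of values per block, and that a window descriptor is the row-major concatenation of its blocks. Both the layout and the names `BlockGrid` and `gatherWindow` are assumptions for illustration and should be checked against the actual output of the function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of the block-wise (NRB) output layout.
struct BlockGrid {
    int blockRows;       // blocks down the image
    int blockCols;       // blocks across the image
    int valuesPerBlock;  // 36 for 2x2 cells of 9 bins
};

// Copies the blocks covered by one window into a contiguous descriptor.
// The window starts at block (firstBlockRow, firstBlockCol) and spans
// winBlockRows x winBlockCols blocks (15 x 7 for a 128x64 window).
std::vector<std::uint32_t> gatherWindow(const std::vector<std::uint32_t>& blocks,
                                        const BlockGrid& grid,
                                        int firstBlockRow, int firstBlockCol,
                                        int winBlockRows, int winBlockCols) {
    assert(blocks.size() == static_cast<std::size_t>(grid.blockRows) *
                                grid.blockCols * grid.valuesPerBlock);
    assert(firstBlockRow >= 0 && firstBlockRow + winBlockRows <= grid.blockRows);
    assert(firstBlockCol >= 0 && firstBlockCol + winBlockCols <= grid.blockCols);

    std::vector<std::uint32_t> window;
    window.reserve(static_cast<std::size_t>(winBlockRows) * winBlockCols *
                   grid.valuesPerBlock);
    for (int r = 0; r < winBlockRows; ++r) {
        for (int c = 0; c < winBlockCols; ++c) {
            // Offset of this block's first value in the block-wise output.
            const std::ptrdiff_t start =
                (static_cast<std::ptrdiff_t>(firstBlockRow + r) * grid.blockCols +
                 (firstBlockCol + c)) * grid.valuesPerBlock;
            window.insert(window.end(), blocks.begin() + start,
                          blocks.begin() + start + grid.valuesPerBlock);
        }
    }
    return window;
}
```

Because adjacent windows share most of their blocks, this kind of gathering lets each block be written once and reused for every window that contains it.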
Resource Utilization
The following table shows the resource utilization of the HOGDescriptor function in normal (1 pixel per cycle) mode, as generated by the Vivado HLS 2019.1 tool for the part Xczu9eg-ffvb1156-1-i-es1 at 300 MHz, to process an image of 1920x1080 resolution.
Resource | NRB, Gray | NRB, RGB | RB, Gray | RB, RGB |
---|---|---|---|---|
BRAM_18K | 43 | 49 | 171 | 177 |
DSP48E | 34 | 46 | 36 | 48 |
FF | 15365 | 15823 | 15205 | 15663 |
LUT | 12868 | 13267 | 13443 | 13848 |
The following table shows the resource utilization of the HOGDescriptor function in normal (1 pixel per cycle) mode, as generated by the Vivado HLS 2019.1 tool for the part xczu7ev-ffvc1156-2-e at 300 MHz, to process an image of 1920x1080 resolution with UltraRAM enabled.
Resource | NRB, Gray | NRB, RGB | RB, Gray | RB, RGB |
---|---|---|---|---|
BRAM_18K | 10 | 12 | 18 | 20 |
URAM | 15 | 15 | 15 | 17 |
DSP48E | 34 | 46 | 36 | 48 |
FF | 17285 | 17917 | 18270 | 18871 |
LUT | 12409 | 12861 | 12793 | 13961 |
Performance Estimate
The following table shows the performance estimates of the HOGDescriptor() function for different configurations, as generated by the Vivado HLS 2019.1 tool for the part Xczu9eg-ffvb1156-1-i-es1, to process an image of 1920x1080 resolution.
Operating Mode | Operating Frequency (MHz) | Min Latency Estimate (ms) | Max Latency Estimate (ms) |
---|---|---|---|
NRB-Gray | 300 | 6.98 | 8.83 |
NRB-RGBA | 300 | 6.98 | 8.83 |
RB-Gray | 300 | 176.81 | 177 |
RB-RGBA | 300 | 176.81 | 177 |
Deviations from OpenCV
The deviations from OpenCV are listed below.
Border care
For border handling in the gradient computation, OpenCV uses BORDER_REFLECT_101, in which the border padding is a reflection of the neighboring pixels. The Xilinx implementation instead uses BORDER_CONSTANT (zero padding).
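The effect of the two border modes on a central-difference gradient can be seen in the minimal sketch below. It handles only the one-pixel overhang needed by a 3-tap gradient, and the function names are hypothetical.

```cpp
// BORDER_REFLECT_101 index mapping for an overhang of at most one pixel:
// index -1 maps to 1, and index n maps to n - 2 (the edge pixel is not repeated).
constexpr int reflect101(int i, int n) {
    return i < 0 ? -i : (i >= n ? 2 * (n - 1) - i : i);
}

// Horizontal central difference with reflected borders (OpenCV behaviour).
constexpr int gradReflect101(const int* row, int n, int x) {
    return row[reflect101(x + 1, n)] - row[reflect101(x - 1, n)];
}

// Horizontal central difference with zero padding (BORDER_CONSTANT).
constexpr int gradZeroPad(const int* row, int n, int x) {
    return (x + 1 < n ? row[x + 1] : 0) - (x - 1 >= 0 ? row[x - 1] : 0);
}
```

With reflection, the two neighbours of an edge pixel are the same sample, so the border gradient is zero. With zero padding, the border gradient equals the signed value of the single in-image neighbour, so the two implementations differ only along the image border.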
Gaussian weighting
In OpenCV, the pixels of each block are multiplied by Gaussian weights: a block has 256 pixels, and the value at each position of the block is multiplied by its corresponding Gaussian weight. The HLS implementation does not perform Gaussian weighting.
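To give a sense of what the omitted step looks like, the sketch below builds a 16x16 table of Gaussian weights centred on the block midpoint, one weight per block position. The centring convention and the sigma value used in the test (4) are assumptions for illustration; consult the OpenCV source for the exact parameters it uses.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr int kBlockSize = 16;  // BLOCK_HEIGHT and BLOCK_WIDTH
using BlockWeights = std::array<std::array<float, kBlockSize>, kBlockSize>;

// Hypothetical helper: Gaussian weight for each of the 256 block positions.
BlockWeights gaussianBlockWeights(float sigma) {
    assert(sigma > 0.0f);
    BlockWeights w{};
    const float scale = 1.0f / (2.0f * sigma * sigma);
    for (int i = 0; i < kBlockSize; ++i) {
        for (int j = 0; j < kBlockSize; ++j) {
            // Distance from the block midpoint (assumed centring convention).
            const float di = static_cast<float>(i) - kBlockSize * 0.5f;
            const float dj = static_cast<float>(j) - kBlockSize * 0.5f;
            w[i][j] = std::exp(-(di * di + dj * dj) * scale);
        }
    }
    return w;
}
```

Each gradient magnitude in the block would be multiplied by the weight at its position before voting, which down-weights pixels near the block edges.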
Cell-wise interpolation
In OpenCV, the magnitude values of the pixels are distributed across different cells in the block, but always into the corresponding bins. Pixels in region 1 belong only to their own cell, while pixels in regions 2 and 3 are interpolated into the adjacent 2 cells and 4 cells respectively. The HLS implementation does not perform this operation.
Output handling
The OpenCV output is in column-major form, whereas the HLS implementation output is in row-major form. Also, the feature vector is in the Q0.16 fixed-point format in the HLS implementation, while in OpenCV it is in floating point.
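When comparing the two outputs on the host, both differences need to be bridged. The sketch below converts a Q0.16 value to floating point and maps a row-major (row, column) position to its column-major index. It assumes each 32-bit descriptor word carries a single Q0.16 value in its low 16 bits, which should be verified against the actual output; the function names are hypothetical.

```cpp
#include <cstdint>

// Q0.16 has 16 fractional bits, so the real value is raw / 2^16.
// Assumption: one value per 32-bit word, stored in the low 16 bits.
constexpr float q016ToFloat(std::uint32_t word) {
    return static_cast<float>(word & 0xFFFFu) / 65536.0f;
}

// Index of element (r, c) in a column-major layout with `rows` rows.
constexpr int columnMajorIndex(int r, int c, int rows) {
    return c * rows + r;
}

// Index of element (r, c) in a row-major layout with `cols` columns.
constexpr int rowMajorIndex(int r, int c, int cols) {
    return r * cols + c;
}
```

For example, element (r, c) of a row-major HLS result at `rowMajorIndex(r, c, cols)` would be compared with the OpenCV value at `columnMajorIndex(r, c, rows)` after conversion with `q016ToFloat`.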
Limitations
- The configurations are limited to Dalal’s implementation.
- Image height and image width must be a multiple of cell height and cell width respectively.
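The limitations above, together with the fixed values in the template parameter table, can be checked at compile time before instantiating the function. The sketch below is one way to express those checks; `HogConfig` and `isSupported` are hypothetical names and are not part of the library.

```cpp
// Hypothetical bundle of the size-related template parameters.
struct HogConfig {
    int rows;
    int cols;
    int winHeight;
    int winWidth;
    int winStride;
    int blockHeight;
    int blockWidth;
    int cellHeight;
    int cellWidth;
    int bins;
};

// True when the configuration satisfies the documented constraints.
constexpr bool isSupported(const HogConfig& c) {
    return c.winStride == 8                              // WIN_STRIDE fixed at 8
        && c.blockHeight == 16 && c.blockWidth == 16     // block fixed at 16x16
        && c.cellHeight == 8 && c.cellWidth == 8         // cell fixed at 8x8
        && c.bins == 9                                   // NOB fixed at 9
        && c.winHeight % 8 == 0 && c.winWidth % 8 == 0   // window multiple of 8
        && c.winHeight <= c.rows && c.winWidth <= c.cols
        && c.rows % c.cellHeight == 0                    // image multiple of cell
        && c.cols % c.cellWidth == 0;
}
```

A `static_assert(isSupported(cfg), "...")` next to the configuration macros would then reject, for instance, an image height of 1084, which is not a multiple of the cell height.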