Stochastic Gradient Descent Framework - 2024.2 English - XD160

Vitis Libraries

Document ID: XD160
Release Date: 2024-11-29
Version: 2024.2 English

Stochastic gradient descent (SGD) is a method to optimize an objective function that has certain properties. It is similar to gradient descent but differs in how data is loaded: gradient descent uses the whole data set, while SGD randomly chooses a fraction of the data from the whole set in each iteration.
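For reference, the two update rules can be written as below, where w_t is the current weight, η the step size, ℓ the per-sample loss, and B_t the randomly sampled subset used in iteration t. The notation is generic and not taken from the library interface.

```latex
\[
\text{Gradient descent: } \; w_{t+1} = w_t - \eta \,\frac{1}{n}\sum_{i=1}^{n} \nabla_w \ell(x_i, y_i; w_t)
\]
\[
\text{SGD: } \; w_{t+1} = w_t - \eta \,\frac{1}{|B_t|}\sum_{i \in B_t} \nabla_w \ell(x_i, y_i; w_t)
\]
```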

Because random access to data in DDR makes poor use of DDR bandwidth, a “Drop or Jump” table sampler is implemented. If “drop” is chosen, the sampler continuously reads data from the table and drops part of it. This leads to continuous burst reads of data and works better when the sampling fraction is not too small. If “jump” is chosen, the sampler reads one contiguous bucket of data, jumps over a few buckets, and then reads the next bucket. This leads to burst reads of a certain length interrupted by jumps and works better when the fraction is relatively small. In this way, you can achieve better DDR access efficiency.
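The sketch below illustrates the two sampling strategies in plain C++ as a host-side model only. Names such as dropSample, jumpSample, bucketSize, and fraction are illustrative assumptions, not the Vitis Libraries sampler API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// "Drop" mode: read the table sequentially (burst-friendly) and keep only
// roughly `fraction` of the rows, dropping the rest as they stream by.
std::vector<double> dropSample(const std::vector<double>& table, double fraction) {
    std::vector<double> out;
    double acc = 0.0;
    for (double v : table) {      // continuous read over the whole table
        acc += fraction;
        if (acc >= 1.0) {         // keep this row
            out.push_back(v);
            acc -= 1.0;
        }                         // otherwise the row is read but dropped
    }
    return out;
}

// "Jump" mode: read one contiguous bucket, skip `jumpBuckets` buckets,
// then read the next bucket, so each read is still a burst of bucketSize rows.
std::vector<double> jumpSample(const std::vector<double>& table,
                               std::size_t bucketSize, std::size_t jumpBuckets) {
    std::vector<double> out;
    for (std::size_t start = 0; start < table.size();
         start += bucketSize * (jumpBuckets + 1)) {
        std::size_t end = std::min(start + bucketSize, table.size());
        out.insert(out.end(), table.begin() + start, table.begin() + end);
    }
    return out;
}
```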

In each iteration, the SGD framework computes the gradient at the current weight (and intercept if needed). Then SGD updates the weight according to the gradient. Linear Least Square Regression, LASSO Regression, and Ridge Regression training share the same gradient calculation process. There are three different ways to update the weight: simple update, L1 update, and L2 update. They produce different training results and have different desired characteristics.
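A minimal sketch of the three update rules applied to a weight vector is shown below, assuming a shared gradient grad computed from the sampled data, a learning rate stepSize, and a regularization factor regFactor. The names and the scalar formulation are illustrative assumptions, not the kernel interface.

```cpp
#include <cstddef>
#include <vector>

// Which regularization to apply on top of the shared gradient.
enum class UpdateMode { Simple, L1, L2 };

// Apply one SGD step to `weight` given the gradient of the least-squares loss.
void updateWeights(std::vector<double>& weight, const std::vector<double>& grad,
                   double stepSize, double regFactor, UpdateMode mode) {
    for (std::size_t i = 0; i < weight.size(); ++i) {
        switch (mode) {
        case UpdateMode::Simple:  // linear least square regression: no penalty
            weight[i] -= stepSize * grad[i];
            break;
        case UpdateMode::L1:      // LASSO: add sub-gradient of the L1 penalty
            weight[i] -= stepSize * (grad[i] + regFactor * (weight[i] > 0 ? 1.0 : -1.0));
            break;
        case UpdateMode::L2:      // Ridge: add gradient of the L2 penalty
            weight[i] -= stepSize * (grad[i] + regFactor * weight[i]);
            break;
        }
    }
}
```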