A model is defined inside a handle, in which all the components of the model are configured. In particular, every model is defined via its residual function, \(r(x) = \theta(t, x) - y\), where the pairs \((t, y)\) are the data points used to evaluate the model's residual vector.
Residual functions
To train the model, the optimizer needs to call the residual function, which is
defined using da_nlls_define_residuals.
Some solvers require further information, such as
first-order derivatives (the residual Jacobian matrix) or even second-order ones.
These are also defined with this function.
Refer to nonlinear least-squares callbacks for further details on the
residual function signatures.
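As a concrete illustration, a residual function for a hypothetical exponential-decay model \(\theta(t, x) = x_0 e^{-x_1 t}\) might look like the sketch below. The model and data here are invented for illustration; the exact callback signature expected by da_nlls_define_residuals is described in the callbacks documentation referenced above.

```python
import numpy as np

# Hypothetical data points (t_i, y_i) against which residuals are evaluated.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([2.0, 1.2, 0.8, 0.5])

def residual(x):
    """r(x) = theta(t, x) - y for the model theta(t, x) = x0 * exp(-x1 * t)."""
    return x[0] * np.exp(-x[1] * t) - y

# One residual per data point; r[0] is zero when the model fits y[0] exactly.
r = residual(np.array([2.0, 0.5]))
```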
Derivatives
A key requirement of this iterative optimizer is access to first-order derivatives (the residual Jacobian matrix) in order to calculate an improved solution. There is a strong relationship between the quality of the derivatives and the performance of the solver. If the user does not provide a derivative call-back function, either because it is not available or by choice, then the solver approximates the derivative matrix using the single-sided finite-difference method.
Finite differencing is a well-established and numerically effective method for estimating missing derivatives. The method is expensive, however: it requires a number of residual function calls proportional to the number of variables (coefficients) in the model.
The implementation provides a single optional parameter ('Finite difference step') that defines the perturbation step used
to estimate a derivative. The value of this step plays a crucial role in the quality of the approximation. The default
is a judicious value that works for most applications.
It is strongly recommended to relax the convergence tolerances (see options) when approximating derivatives. If the solver "stagnates" or fails during the optimization process, tweaking the step value is encouraged.
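The single-sided (forward) finite-difference scheme can be sketched as follows. This is a conceptual illustration of what the solver does internally, not the library's actual implementation; note the one extra residual evaluation per variable, which is what makes the method expensive.

```python
import numpy as np

def fd_jacobian(residual, x, step=1e-7):
    """Forward finite-difference estimate of the residual Jacobian.

    Each column j perturbs only variable j, costing one extra residual
    call per variable (coefficient) in the model.
    """
    r0 = residual(x)
    J = np.empty((r0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += step
        J[:, j] = (residual(xp) - r0) / step
    return J

# Example: r(x) = [x0**2 - 1, x0 * x1] has exact Jacobian [[2*x0, 0], [x1, x0]].
J = fd_jacobian(lambda x: np.array([x[0]**2 - 1.0, x[0] * x[1]]),
                np.array([2.0, 3.0]))
```

The accuracy of the last column depends directly on `step`, which is the role played by the 'Finite difference step' option.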
Verifying derivatives
One of the most common problems when training a model is incorrect derivatives.
Writing the derivative call-back function is error-prone; to address this, a derivative
checker can be activated (set the option 'Check derivatives' to 'yes') to verify the
derivatives provided by the call-back. The checker produces a table similar to
Begin Derivative Checker
Jacobian storage scheme (Fortran_Jacobian) = C (row-major)
Jac[ 0, 0] = -1.681939915944E-01 ~ -1.681939717973E-01 [ 1.177E-07], ( 0.165E-06)
Jac[ 2, 5] = 1.000000000000E+01 ~ -1.318047255339E+02 [ 1.076E+00], ( 0.100E-06) XT
...
Jac[ 3, 40] = -1.528131154992E+02 ~ -1.597095836470E+02 [ 4.318E-02], ( 0.100E-06) XT Skip
Derivative checker detected 106 likely error(s)
Note: derivative checker detected that 66 entries may correspond to the transpose.
Verify the Jacobian storage ordering is correct.
End Derivative Checker
The reported table has a few sections. The first column after the equal sign (=) is the derivative
returned by the user-supplied call-back. The column after the ~ sign is the approximated finite-difference
derivative. The value inside the brackets is the relative threshold
\(\frac{|\mathrm{approx} - \mathrm{exact}|}{\max(|\mathrm{approx}|,\; \mathrm{fd\_tol})}\),
where \(\mathrm{fd\_tol}\) is defined by the option 'Derivative test tol'. The value inside the parentheses is the relative tolerance
against which the relative threshold is compared.
The last column provides some flags. X indicates that the threshold is larger than the tolerance and the entry is deemed likely
to be wrong. T indicates that the value stored in \(J(i,j)\) corresponds to the value belonging to the transposed Jacobian matrix,
a hint that the storage ordering may be incorrect: check whether the matrix is being stored in row-major format while
the solver option 'Storage scheme' is set to column-major, or vice versa. Finally, Skip indicates that either the
associated variable is fixed (constrained to a single value) or its bounds are too tight to perform a finite-difference
approximation, so the check for this entry cannot be performed and is skipped.
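The acceptance test applied to each entry can be paraphrased as follows. The names here are illustrative: `fd_tol` plays the role of the 'Derivative test tol' option, and the sample values are taken from the table above.

```python
def check_entry(exact, approx, fd_tol=1e-4, rel_tol=1e-7):
    """Flag a Jacobian entry as suspect (X) when the relative threshold
    |approx - exact| / max(|approx|, fd_tol) exceeds the tolerance."""
    threshold = abs(approx - exact) / max(abs(approx), fd_tol)
    return "X" if threshold > rel_tol else ""

# Entry Jac[0, 0] from the table: threshold 1.177E-07 < tolerance 0.165E-06.
ok = check_entry(-1.681939915944e-01, -1.681939717973e-01, rel_tol=0.165e-06)
# Entry Jac[2, 5]: threshold 1.076E+00 > tolerance 0.100E-06, so flagged.
bad = check_entry(1.0e+01, -1.318047255339e+02, rel_tol=0.100e-06)
```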
The derivative checker uses finite differences to compare against the user-provided derivatives, so the
quality of the approximation depends on the finite-difference step used (see the option 'Finite difference step').
The option 'Derivative test tol' defines the relative tolerance used to decide whether a user-supplied
derivative is correct; a smaller value implies a more stringent test.
Under certain circumstances the checker may signal false positives. Tweaking the options 'Finite difference step'
and 'Derivative test tol' can help prevent this.
It is highly recommended to set the option 'Check derivatives' to 'yes' while writing or developing the derivative call-back.
After the residual Jacobian matrix has been validated, the option can be reset to 'no' to avoid the performance impact of the checks.
Residual weights
Under certain circumstances it is known that some residuals are more reliable than others, and it is
desirable to give these more importance. This is done by defining the weighting matrix, \(W\), using
da_nlls_define_weights. Note that \(W\) is a diagonal matrix with
positive elements, each of which
should correspond to the inverse of the variance of the associated residual.
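Conceptually, the weighted problem minimizes \(\tfrac{1}{2}\|W^{1/2} r(x)\|^2\). The sketch below derives the diagonal of \(W\) from known residual variances; the standard deviations here are invented for illustration.

```python
import numpy as np

# Hypothetical per-residual standard deviations: the first two
# observations are ten times more reliable than the last two.
sigma = np.array([0.1, 0.1, 1.0, 1.0])

# Diagonal of W: the inverse variance of each residual.
w = 1.0 / sigma**2

def weighted_residual(r):
    """Scale residuals so 1/2 * ||W^(1/2) r||^2 = 1/2 * sum(w * r**2)."""
    return np.sqrt(w) * r
```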
Constraining the model
Some models aim to explain real-life phenomena in which some coefficients make no physical sense if
they take certain invalid
values; e.g., a coefficient \(x_j\) representing a distance may not take negative values. In these cases, the parameter
optimization needs to be constrained to valid values. In the distance example, the coefficient would be
bound-constrained to the non-negative half-line: \(0 \le x_j\).
These constraints are added to the model using da_nlls_define_bounds.
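As an illustration, box bounds can be expressed as elementwise lower and upper vectors. The sketch below only illustrates the feasible set by projecting a trial point onto it; the solver enforces the constraints internally once they are defined via da_nlls_define_bounds.

```python
import numpy as np

# Hypothetical bounds: x0 is a distance, so 0 <= x0; x1 is unconstrained.
lower = np.array([0.0, -np.inf])
upper = np.array([np.inf, np.inf])

def project(x):
    """Clip a point onto the feasible box [lower, upper]."""
    return np.clip(x, lower, upper)

print(project(np.array([-0.3, 5.0])))  # -> [0. 5.]
```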
Adding regularization
Nonlinear models can have multiple local minima that are undesirable, provide a biased solution, or
even show signs of overfitting.
A practical way to tackle these scenarios is to introduce regularization.
Typically, quadratic or cubic regularization (i.e., \(p=2, 3\)) yields the best results. Note that \(\sigma\) and
\(p\) are hyperparameters and are not optimized by this model, so they have to be provided by the caller.
\(\sigma\) provides a transition between an unregularized local solution (\(\sigma=0\)) and the
zero-coefficient vector (\(\sigma \gg 0\)). Striking the correct balance may require trial and error
or a good understanding of the underlying model. Regularization is added by using the
optional parameters 'Regularization term' (\(\sigma\)) and 'Regularization power' (\(p\)),
see Nonlinear least-squares options.
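Assuming the regularized objective takes the common form \(\tfrac{1}{2}\|r(x)\|^2 + \tfrac{\sigma}{p}\|x\|_2^p\) (check the options reference for the exact definition used by the solver), the roles of \(\sigma\) and \(p\) can be sketched as:

```python
import numpy as np

def regularized_objective(r, x, sigma=1.0, p=2):
    """1/2 * ||r(x)||^2 plus the sigma/p * ||x||_2^p penalty term.

    sigma = 0 recovers the unregularized objective; a large sigma pulls
    the minimizer toward the zero-coefficient vector.
    """
    return 0.5 * np.dot(r, r) + (sigma / p) * np.linalg.norm(x)**p

x = np.array([3.0, 4.0])          # ||x||_2 = 5
r = np.array([1.0, 1.0])          # 1/2 * ||r||^2 = 1
print(regularized_objective(r, x, sigma=0.0))       # unregularized: 1.0
print(regularized_objective(r, x, sigma=2.0, p=2))  # 1.0 + 25.0 = 26.0
```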
Training the model
Once the model has been set up, the iterative training process is performed by calling the optimizer da_nlls_fit.