Defining a nonlinear model

AOCL API Guide (68552)

Document ID: 68552
Release Date: 2025-12-29
Version: 5.2 English

A model is defined inside a handle, in which all the components of the model are configured. In particular, a model is defined via its residual function, \(r(x) = \theta(t, x) - y\), where the pairs \((t, y)\) are the data points used to evaluate the model's residual vector.

Residual functions

To train the model, the optimizer needs to make calls to the residual function, which is defined using da_nlls_define_residuals. Some solvers require further information, such as first-order derivatives (the residual Jacobian matrix) or even second-order ones; these are also defined with this function. Refer to nonlinear least-squares callbacks for further details on the residual function signatures.

Derivatives

A key requirement of this iterative optimizer is access to first-order derivatives (the residual Jacobian matrix) in order to calculate an improved solution. There is a strong relationship between the quality of the derivatives and the performance of the solver. If the user does not provide a derivative call-back function, either because it is not available or by choice, then the solver approximates the derivative matrix using the single-sided finite-difference method.

Finite differences are a well-established and numerically effective way to estimate missing derivatives. The method is expensive, however: it requires a number of residual function calls proportional to the number of variables (coefficients) in the model.

The implementation provides a single optional parameter ('Finite difference step') that defines the perturbation step used to estimate a derivative. The value of this step plays a crucial role in the quality of the approximation; the default is a judicious value that works for most applications.
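The single-sided scheme described above can be sketched as follows. This is an illustrative implementation, not the library's internal one; the step `h` plays the role of the 'Finite difference step' option, and one extra residual evaluation is needed per coefficient, which is where the cost proportional to the number of variables comes from.

```c
#include <math.h>
#include <stddef.h>

typedef void (*resfun_t)(const double *x, size_t n_res, double *r);

/* Single-sided (forward) finite-difference estimate of the Jacobian
 * J[i][j] = d r_i / d x_j, stored row-major in J (n_res x n_coef). */
void fd_jacobian(resfun_t resfun, double *x, size_t n_coef,
                 size_t n_res, double h, double *J) {
    double r0[n_res], r1[n_res];      /* C99 VLAs for scratch storage */
    resfun(x, n_res, r0);             /* baseline residuals */
    for (size_t j = 0; j < n_coef; ++j) {
        double xj = x[j];
        x[j] = xj + h;                /* perturb one coefficient */
        resfun(x, n_res, r1);         /* one extra call per variable */
        x[j] = xj;                    /* restore the coefficient */
        for (size_t i = 0; i < n_res; ++i)
            J[i * n_coef + j] = (r1[i] - r0[i]) / h;
    }
}

/* Example residual used below: r = (x0^2, 3*x1), so the exact
 * Jacobian is [[2*x0, 0], [0, 3]]. */
void example_res(const double *x, size_t n_res, double *r) {
    (void)n_res;
    r[0] = x[0] * x[0];
    r[1] = 3.0 * x[1];
}
```

Note how the accuracy of the result depends on `h`: too large and the truncation error dominates, too small and floating-point cancellation in `r1[i] - r0[i]` dominates.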

It is strongly recommended to relax the convergence tolerances (see options) when approximating derivatives. If the solver "stagnates" or fails during the optimization process, try adjusting the step value.

Verifying derivatives

One of the most common problems when training a model is incorrect derivatives. Writing the derivative call-back function is error-prone; to address this, a derivative checker can be activated (set the option 'Check derivatives' to 'yes') to check the derivatives provided by the call-back. The checker produces a table similar to

Begin Derivative Checker

   Jacobian storage scheme (Fortran_Jacobian) = C (row-major)

   Jac[     0,     0] =  -1.681939915944E-01 ~  -1.681939717973E-01  [ 1.177E-07], ( 0.165E-06)
   Jac[     2,     5] =   1.000000000000E+01 ~  -1.318047255339E+02  [ 1.076E+00], ( 0.100E-06)  XT
   ...
   Jac[     3,    40] =  -1.528131154992E+02 ~  -1.597095836470E+02  [ 4.318E-02], ( 0.100E-06)  XT   Skip

   Derivative checker detected    106 likely error(s)

   Note: derivative checker detected that     66 entries may correspond to the transpose.
   Verify the Jacobian storage ordering is correct.

End Derivative Checker

The reported table has a few sections. The first column after the equal sign (=) is the derivative returned by the user-supplied call-back. The column after the ~ sign is the approximated finite-difference derivative. The value inside the brackets is the relative threshold \(\frac{|\mathrm{approx} - \mathrm{exact}|}{\max(|\mathrm{approx}|,\; \mathrm{fd\_ttol})}\) (fd_ttol is defined by the option 'Derivative test tol'). The value inside the parentheses is the relative tolerance against which the relative threshold is compared. The last column provides some flags.

X indicates that the threshold is larger than the tolerance and the derivative is deemed likely to be wrong.

T indicates that the value stored in \(J(i,j)\) corresponds to the value belonging to the transposed Jacobian matrix, a hint that the storage sequence may be incorrect. In this case, check whether the matrix is being stored in row-major format while the solver option 'Storage scheme' is set to column-major, or vice-versa.

Finally, Skip indicates that the associated variable is either fixed (constrained to a fixed value) or has bounds too tight to perform a finite-difference approximation; the check for this entry cannot be performed and is skipped.

The derivative checker uses finite differences to compare against the user-provided derivatives, so the quality of the approximation depends on the finite-difference step used (see the option 'Finite difference step').

The option 'Derivative test tol' defines the relative tolerance used to decide whether a user-supplied derivative is correct. A smaller value implies a more stringent test.
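The flagging rule described above can be reproduced with a few lines of code. This is an illustrative sketch of the formula, not the library's implementation; the fd_ttol value used in the test below is an arbitrary assumption (the denominator here is dominated by |approx|, so any small fd_ttol gives the same result).

```c
#include <math.h>

/* Relative threshold reported in brackets by the derivative checker:
 *   thr = |approx - exact| / max(|approx|, fd_ttol)
 * where fd_ttol corresponds to the 'Derivative test tol' option. */
double deriv_threshold(double exact, double approx, double fd_ttol) {
    double denom = fabs(approx) > fd_ttol ? fabs(approx) : fd_ttol;
    return fabs(approx - exact) / denom;
}

/* An entry is flagged 'X' (likely wrong) when the threshold exceeds
 * the relative tolerance reported in parentheses. */
int deriv_suspect(double exact, double approx, double fd_ttol, double tol) {
    return deriv_threshold(exact, approx, fd_ttol) > tol;
}
```

Applying this to the first two rows of the sample table reproduces the reported thresholds: the first entry yields 1.177E-07, below its tolerance of 0.165E-06, while the second yields 1.076E+00 and is flagged.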

Under certain circumstances the checker may signal false positives. Adjusting the options 'Finite difference step' and 'Derivative test tol' can help prevent this.

It is highly recommended that, while writing or developing the derivative call-back, you set the option 'Check derivatives' to 'yes'. After validating the residual Jacobian matrix, the option can be reset to 'no' to avoid the performance impact.

Residual weights

Under certain circumstances it is known that some residuals are more reliable than others, and it is desirable to give these more importance. This is done by defining the weighting matrix, \(W\), using da_nlls_define_weights. Note that \(W\) is a diagonal matrix with positive elements; these elements should correspond to the inverse of the variance of each residual.
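The effect of the diagonal weights can be illustrated with a small helper that evaluates a weighted sum-of-squares objective. This is a sketch under the assumption that each weight multiplies the corresponding squared residual (the common weighted least-squares convention); it is not the library's internal objective evaluation.

```c
#include <stddef.h>

/* Weighted least-squares objective  f = 1/2 * sum_i w_i * r_i^2,
 * where w[i] is the i-th diagonal entry of W.  Setting w[i] to the
 * inverse of the variance of residual i down-weights noisy data. */
double weighted_objective(const double *r, const double *w, size_t n) {
    double f = 0.0;
    for (size_t i = 0; i < n; ++i)
        f += 0.5 * w[i] * r[i] * r[i];
    return f;
}
```

With unit weights this reduces to the ordinary least-squares objective; a residual with four times the variance contributes a quarter as much per unit of squared error.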

Constraining the model

Some models aim to explain real-life phenomena in which certain coefficients make no physical sense if they take invalid values; e.g. a coefficient \(x_j\) representing a distance may not take negative values. In such cases, parameter optimization needs to be constrained to valid values. In the distance example, the coefficient would be bound-constrained to the non-negative half-line: \(0 \le x_j\). These constraints are added to the model using da_nlls_define_bounds.
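A bound constraint of this kind defines a box \(\ell_j \le x_j \le u_j\) for each coefficient. The helper below, an illustrative sketch rather than anything the library exposes, shows the projection onto such a box; the solver itself keeps iterates feasible internally once the bounds are registered.

```c
#include <stddef.h>

/* Project a coefficient vector onto the box l <= x <= u, entry by
 * entry.  For the distance example, l[j] = 0 and u[j] = +infinity
 * (in practice a very large value) keeps x[j] non-negative. */
void project_to_bounds(double *x, const double *l, const double *u, size_t n) {
    for (size_t j = 0; j < n; ++j) {
        if (x[j] < l[j]) x[j] = l[j];
        if (x[j] > u[j]) x[j] = u[j];
    }
}
```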

Adding regularization

Nonlinear models can have multiple local minima that are undesirable, provide a biased solution, or even show signs of overfitting. A practical way to tackle these scenarios is to introduce regularization. Typically, quadratic or cubic regularization (i.e., \(p = 2, 3\)) yields the best results. Note that \(\sigma\) and \(p\) are hyperparameters and are not optimized by the model, so they must be provided by the caller. \(\sigma\) provides a transition between an unregularized local solution (\(\sigma = 0\)) and the zero-coefficient vector (\(\sigma \gg 0\)); striking the correct balance may require trial and error or a good understanding of the underlying model. Regularization is added using the optional parameters 'Regularization term' (\(\sigma\)) and 'Regularization power' (\(p\)); see nonlinear least-squares options.

Training the model

Once the model has been set up, the iterative training process is performed by calling the optimizer da_nlls_fit.