Alligator - Modeling Options
Most options are identical for the different calibration methods. These are explained here.
Sample sets
Each calibration model is based on a calibration sample set. Additional data sets may be used
- for test set validation
- as a repeatability sample set (Westerhaus 1991)
For the repeatability sample set, a factor may be applied to enhance or reduce its influence. A factor of 1.0 gives results identical to Westerhaus (1991).
In MLR calibration development with manually added terms, the calibration sample set is not used to derive any information; therefore the calibration sample set is used for test set validation.
For LOCAL calibration development, typically two sample sets are used:
- database to select the calibration samples from
- test set with "unknown" samples to be predicted
If the test set is NOT selected, the user may run the database in "self-prediction": every ith sample is taken as "unknown", and the calibration samples are selected from the same database set. The "unknown" sample is removed from the selected calibration samples prior to modeling.
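The self-prediction scheme can be sketched as follows. This is a minimal illustration, not the program's implementation: the function names, the use of the Pearson correlation between spectra as the selection criterion, and the choice to start with the first sample are assumptions.

```python
import numpy as np

def select_by_correlation(unknown, database, k, exclude=None):
    """Return indices of the k database spectra most correlated with `unknown`."""
    # Pearson correlation between the unknown and every database spectrum (rows).
    xc = database - database.mean(axis=1, keepdims=True)
    uc = unknown - unknown.mean()
    r = (xc @ uc) / (np.linalg.norm(xc, axis=1) * np.linalg.norm(uc))
    if exclude is not None:
        r[exclude] = -np.inf  # the "unknown" itself must never be selected
    return np.argsort(r)[::-1][:k]

def self_prediction_sets(database, step, k):
    """Treat every `step`-th sample as "unknown" and select its calibration samples."""
    return {
        i: select_by_correlation(database[i], database, k, exclude=i)
        for i in range(0, database.shape[0], step)
    }
```

Each entry of the returned dictionary maps the index of an "unknown" sample to the indices of its calibration samples, which would then be used to fit the local model.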
LOCAL models may also be calculated on PCA or PLS compressed scores; in these cases the samples are selected by "neighborhood H" instead of "correlation". The advantage is a huge speed improvement compared to LOCAL based on native spectra.
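A selection in score space might look like the sketch below. It assumes that H is a Mahalanobis-type distance computed on PCA scores; the exact definition used by the program is not stated here, and all function names are hypothetical. The speed gain comes from comparing a handful of scores per sample instead of several hundred data points.

```python
import numpy as np

def pca_scores(spectra, n_factors):
    """Project mean-centred spectra onto the first n_factors principal components."""
    mean = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    loadings = vt[:n_factors].T
    return (spectra - mean) @ loadings, mean, loadings

def nearest_in_score_space(unknown_scores, db_scores, k):
    """Indices of the k database samples closest to the unknown in score space."""
    # PCA scores are uncorrelated, so scaling each score to unit variance
    # turns the Euclidean distance into a Mahalanobis-type distance.
    scale = db_scores.std(axis=0, ddof=1)
    d2 = (((db_scores - unknown_scores) / scale) ** 2).sum(axis=1)
    return np.argsort(d2)[:k]
```

An "unknown" spectrum is compressed with the same mean and loadings, `(spectrum - mean) @ loadings`, before its neighbours are looked up.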
MLR, PLS, LOCAL, SVM, ANN, PCA options
This part of the options differs between the calibration methods.
Methods / algorithms:
If sub-versions exist, the user may select them here.
The maximal number of terms (MLR) or factors (PLS, PCA, LOCAL) is set here. The final number of terms or factors is determined either by F value (MLR), cross validation (PCA, PLS) or grid search (LOCAL).
If cross validation is available the settings are configured here. Cross validation is mainly used for two purposes:
- to determine the number of factors in a PLS or PCA model
- to determine an estimate of the precision of the model (SECV) for future application (for PLS, SVM, ANN)
The selection of the groups has the main influence on cross validation when the sequence of the samples in the sample set has a structure (sorted by any means). In this case blockwise group selection is strongly advised. There is no scientific reason to choose the number of factors by cross validation.
The number of CV groups may be set by the user. Two special values are available:
- -1: leave one out cross validation (LOO)
- 0: the program selects an appropriate number of cross validation groups: 2 groups for calibration sample sets with 1000 samples or more, 4 for calibration sample sets with 200-500 samples, 10 for sample sets with 50-100 samples, LOO for fewer than 25 samples
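The difference between blockwise and interleaved group selection can be illustrated with a short sketch. The function name and the interleaved alternative are assumptions made for illustration; the automatic choice (value 0) is not reproduced here.

```python
def cv_group_labels(n_samples, n_groups, blockwise=True):
    """Assign each sample (in file order) to a cross validation group."""
    if n_groups == -1:  # -1 requests leave-one-out: one group per sample
        n_groups = n_samples
    if not 1 < n_groups <= n_samples:
        raise ValueError("n_groups must be -1 or between 2 and n_samples")
    if blockwise:
        # Contiguous blocks: neighbouring samples stay in the same group.
        return [i * n_groups // n_samples for i in range(n_samples)]
    # Interleaved: sample i goes to group i modulo n_groups.
    return [i % n_groups for i in range(n_samples)]
```

With a sorted sample set, interleaved groups place closely related neighbours in the calibration and validation parts at the same time, which tends to make the SECV look better than it is. Blockwise groups keep such neighbours together.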
Spectral options
Most NIRS calibration development work is done on mathematically pre-treated spectra. The main reason is to enhance the property of interest (POI) and to reduce noise as well as interfering, unwanted information in the spectra.
The main tools used are
- scatter correction algorithms
- derivatives
Scatter correction may be applied in the form of
- SNV and detrend
- SNV
- detrend
- MSC
- EMSC
- normalize spectra
- auto-scale spectra
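As an illustration of two of these options, SNV and detrend can be sketched as below. The function names, the second-degree polynomial for detrend, and the use of the sample standard deviation are assumptions; the program's own implementation may differ in such details.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def detrend(spectra, wavelengths, degree=2):
    """Subtract a polynomial baseline fitted to each spectrum (row)."""
    # Centre and scale the wavelength axis for a better conditioned fit.
    x = (wavelengths - wavelengths.mean()) / wavelengths.std()
    coefs = np.polynomial.polynomial.polyfit(x, spectra.T, degree)
    baseline = np.polynomial.polynomial.polyval(x, coefs)  # shape: (n_spectra, n_points)
    return spectra - baseline
```

"SNV and detrend" would then correspond to `detrend(snv(spectra), wavelengths)`.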
Derivatives can be calculated following
- ISI notation
- Norris derivatives
- Savitzky-Golay
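A gap derivative with segment smoothing, in the spirit of the Norris approach, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, only the first derivative is shown, the edge points are dropped, and scaling conventions differ between programs.

```python
import numpy as np

def moving_average(spectra, segment):
    """Smooth each spectrum (row) with a moving average over `segment` points."""
    if segment <= 1:
        return spectra
    kernel = np.ones(segment) / segment
    # mode="valid" drops the edge points instead of padding them.
    return np.apply_along_axis(np.convolve, 1, spectra, kernel, mode="valid")

def gap_derivative(spectra, gap, segment=1):
    """First derivative as the difference of smoothed points `gap` positions apart."""
    if gap < 1:
        raise ValueError("gap must be at least 1")
    smoothed = moving_average(spectra, segment)
    return smoothed[:, gap:] - smoothed[:, :-gap]
```

The output is shorter than the input by `gap + segment - 1` points. For Savitzky-Golay derivatives in Python, `scipy.signal.savgol_filter` with its `deriv` argument is a commonly used alternative.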
Signal processing may be applied after scatter correction and derivatives. It may be either EPO (external parameter orthogonalization) or NAS (net analyte signal).
The pre-treatment is applied in the sequence shown on screen. For MLR and PLS a second pre-treatment can be selected, which makes it possible to apply a scatter correction after derivative calculation.
It is possible to transform the absorption values. The power transformation can adjust a non-linear relationship between absorption and constituent values.
For MLR and PLS calibration development a search across several scatter correction algorithms and derivatives may be started.
Wavelength options
The wavelength range used in modeling can be selected, and two segments can be excluded. For application of the final model, the spectra have to match the range of the calibration sample set. The spectral pre-treatment (see above) is done on the whole wavelength range before it is trimmed by these wavelength options.
A look at the preprocessed spectra prior to modeling is strongly advised.
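The order of operations (pre-treatment on the whole range first, trimming afterwards) can be sketched as below. The function name and the inclusive range limits are assumptions made for illustration.

```python
import numpy as np

def wavelength_mask(wavelengths, start, stop, excluded=()):
    """Boolean mask that is True for the wavelengths kept in the model."""
    keep = (wavelengths >= start) & (wavelengths <= stop)
    for lo, hi in excluded:
        # Drop every point inside an excluded segment (limits inclusive).
        keep &= ~((wavelengths >= lo) & (wavelengths <= hi))
    return keep

# Typical order: pre-treat the full spectra first, then trim, e.g.
#   mask = wavelength_mask(wavelengths, 1200, 2400, [(1400, 1500), (1900, 2000)])
#   model_input = pretreated_spectra[:, mask]
```

Because the mask is applied last, derivatives and scatter corrections still see the data points next to the excluded segments.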
Sample options
In this section mainly outlier tests, their limits and the number of outlier passes can be configured. The defaults are set under the project settings.
Constituent options
Here the constituents to be modeled can be selected or deselected. This selection is saved for all modeling tasks, i.e. across the MLR, PLS, SVM, LOCAL and ANN options.
The constituent values may be log transformed. This transformation is mainly used either to adjust non-normally distributed reference values or to improve the linearity between spectral absorption and constituent values.
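A minimal sketch of such a transformation is shown below. The base 10 logarithm and the function names are assumptions; the base used by the program is not stated here. The reference values must be strictly positive, and predictions from a model fitted on transformed values are on the log scale and have to be transformed back.

```python
import numpy as np

def log_transform(reference_values):
    """Log-transform reference values before modeling (values must be > 0)."""
    y = np.asarray(reference_values, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log transformation requires strictly positive values")
    return np.log10(y)

def back_transform(predictions_log):
    """Convert predictions from the log scale back to the original unit."""
    return 10.0 ** np.asarray(predictions_log, dtype=float)
```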