Fitting & Outlier Screening
Fitting is achieved by calling .fit()
on an instantiated monitoring class.
In general the default arguments for each monitoring class correspond to the fitting which was used in the corresponding publication. However since the classes are not bound to the fit that was used in the publication it is entirely possible to use any combination of fitting arguments with any monitoring class.
Fitting works by passing an xarray.DataArray
and specifying a fitting method.
Optionally a method to screen outliers in the time-series can also be passed
to the fit call.
Screen outliers
Outlier screening happens before the fitting and is designed to remove unwanted outliers in the time-series. When using optical satellite data those outliers are mostly unwanted clouds, cloud shadows and snow.
Shewhart
nrt_class.fit(dataarray, screen_outliers='Shewhart', L=5)
This outlier screening is using Shewhart control charts to remove outliers.
The optional parameter L
defines how sensitive the outlier screening is.
With this method, first an OLS fit is carried out using the arguments passed during instantiation. Then the standard deviation \(\sigma\) of residuals is computed and all observations with residuals larger than \(L\cdot\sigma\) are screened out.
CCDC-RIRLS
While Shewhart outlier screening could work for any type of time series, the default outlier screening used by CCDC is tailored for optical satellite time series to mask out clouds and cloud shadows.
nrt_class.fit(dataarray, screen_outliers='CCDC-RIRLS',
green=xr_green, swir=xr_swir, scaling_factor=10000)
This screening uses hard-coded thresholds of the short-wave infrared (SWIR) and green bands
to detect clouds and cloud shadows. For this, reflectance values of the green and
SWIR bands need to be passed as xarray.DataArrays
. Originally the bands 2 (0.52-0.60 µm) and 5 (1.55-1.75 µm)
of the Landsat 5 Thematic Mapper were used.
If other sensors like Sentinel 2 are used, which supply data with a scaling factor, the optional parameter
scaling_factor
needs to be set appropriately to bring the values to a 0-1 range.
To screen out clouds, CCDC-RIRLS uses a Robust Iteratively Reweighted Least Squares fit to reduce the influence of outliers on the fit. See the chapter about RIRLS for more details.
Do note that the RIRLS fit is quite computationally intensive.
Fitting
In general when trying to fit a temporal signature it is advisable to fit it on a stable part of the time-series which doesn’t include structural changes. For this there are two fitting methods (ROC and CCDC-stable) available that aim to achieve a fit on a stable part of the time-series. The other two fitting methods (OLS, RIRLS) always fit a model on the entire history period, so if a lot of disturbances happened during the history period, the fitting results with these two methods might deliver worse results. Especially OLS however is much less computationally expensive than ROC and CCDC-stable.
OLS
nrt_class.fit(dataarray, method='OLS')
This carries out an ordinary least squares fit. All other available fitting methods in this package are at some point based on this fit.
RIRLS
nrt_class.fit(dataarray, method='RIRLS', maxiter=50)
The Robust Iteratively Reweighted Least Squares fit isn’t the default for any nrt monitoring class, it’s main purpose is in the outlier screening method CCDC-RIRLS.
By iteratively reweighting each observation in the time-series, a fit is reached which is less influenced by outliers in the time-series.
This process can take a lot of iterations and thus can become very computationally expensive. The maximum number
of iterations can be controlled by setting maxiter
. There are also many more possible parameters to modify.
For a complete list see the api documentation for RIRLS
.
ROC
nrt_class.fit(dataarray, method='ROC', alpha=0.05)
Reverse Ordered Cumulative Sums (ROC) works by applying the same type of monitoring logic as in CuSum to the fitting.
In particular this means, that the fitting period is gradually increased backwards in time starting from the
end of the entire history period (so in reverse order). The period is increased as long as the
cumulative sum of residuals is within a certain threshold which depends on alpha
.
As soon as the threshold is crossed, it is likely that there was a structural break in the history period and thus the rest of the time series before the threshold was crossed will not be used for fitting the model.
alpha
is the significance of the detected structural break. So the lower alpha
the lower the sensitivity
for breaks in the time-series.
CCDC-stable
nrt_class.fit(dataarray, method='CCDC-stable', threshold=3)
With CCDC-stable, models are first fit using an OLS regression. Those models are then checked for stability.
Stability is given if:
slope / RMSE < threshold
first observation / RMSE < threshold
last observation / RMSE < threshold
Since the slope of the model is one of the test conditions, it is required for trend
to be True
during instantiation of the monitoring class.
If a model is not stable, the two oldest acquisitions are removed, a model is fit using this shorter time-series and again checked for stability. This process continues until the model is stable or until not enough observations are left, at which point the time-series will get marked as unstable and not be fit.
Note
This process is slightly different to the one described in Zhu & Woodcock 2014, since with the nrt package no new observations can be added during fitting.