nrt package

Subpackages

Submodules

nrt.fit_methods module

Model fitting

Functions defined in this module always use a 2D array containing the dependent variables (y) and return both the coefficient (beta) and residual matrices. These functions are meant to be called in nrt.BaseNrt._fit().

The RIRLS fit is derived from Chris Holden’s yatsm package. See the copyright statement below.

nrt.fit_methods.ols(X, y)

Fit simple OLS model

Parameters:
  • X ((M, N) np.ndarray) – Matrix of independent variables

  • y ({(M,), (M, K)} np.ndarray) – Matrix of dependent variables

Returns:

  • beta (numpy.ndarray) – The array of regression estimators

  • residuals (numpy.ndarray) – The array of residuals

Return type:

tuple
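
The input and output shapes described for ols can be illustrated with plain NumPy. This is only a sketch: ols_sketch is a hypothetical stand-in rather than the nrt implementation, it assumes y contains no NaN, and the residual sign convention (observed minus predicted) is an assumption.

```python
import numpy as np


def ols_sketch(X, y):
    """Hypothetical stand-in for an OLS fit; not the nrt implementation."""
    # Least-squares solution; beta has shape (N,) or (N, K)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    # Residuals share the shape of y (assumed convention: observed - predicted)
    residuals = y - X @ beta
    return beta, residuals


rng = np.random.default_rng(0)
M, N, K = 50, 3, 4
X = rng.normal(size=(M, N))
true_beta = rng.normal(size=(N, K))
y = X @ true_beta  # noise-free, so the fit should recover true_beta
beta, residuals = ols_sketch(X, y)
```

Each of the K columns of y is fitted independently against the same design matrix, which is why beta has one column per series.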

nrt.fit_methods.weighted_ols(X, y, w)

Apply a weighted OLS fit to 1D data

Parameters:
  • X (np.ndarray) – independent variables

  • y (np.ndarray) – dependent variable

  • w (np.ndarray) – observation weights

Returns:

coefficients and residual vector

Return type:

tuple
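
A common way to express a weighted OLS fit is to scale each row of X and y by the square root of its weight and then solve an ordinary least-squares problem. The sketch below follows that approach; weighted_ols_sketch is a hypothetical name, and whether nrt uses the same formulation internally is an assumption.

```python
import numpy as np


def weighted_ols_sketch(X, y, w):
    """Hypothetical weighted OLS for a single series (1D y)."""
    # Scaling rows by sqrt(w) turns the weighted problem into plain OLS
    sw = np.sqrt(w)
    beta, _, _, _ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    # Residuals are reported on the original, unweighted scale
    resid = y - X @ beta
    return beta, resid


X = np.column_stack([np.ones(30), np.arange(30.0)])
y = 2.0 + 0.5 * X[:, 1]
y[5] += 100.0  # one gross outlier
w = np.ones(30)
w[5] = 0.0  # a zero weight removes the outlier's influence on the fit
beta, resid = weighted_ols_sketch(X, y, w)
```

With the outlier weighted to zero, the remaining observations lie exactly on the line, so the fit recovers the intercept and slope used to build y.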

nrt.log module

nrt.outliers module

Removing outliers

Functions defined in this module always use a 2D array containing the dependent variables (y) and return y with outliers set to np.nan. These functions are meant to be called in nrt.BaseNrt._fit().

Citations:

  • Brooks, E.B., Wynne, R.H., Thomas, V.A., Blinn, C.E. and Coulston, J.W., 2013. On-the-fly massively multitemporal change detection using statistical quality control charts and Landsat data. IEEE Transactions on Geoscience and Remote Sensing, 52(6), pp.3316-3332.

  • Zhu, Zhe, and Curtis E. Woodcock. 2014. “Continuous Change Detection and Classification of Land Cover Using All Available Landsat Data.” Remote Sensing of Environment 144 (March): 152–71. https://doi.org/10.1016/j.rse.2014.01.011.

nrt.outliers.ccdc_rirls(X, y, green, swir, scaling_factor=1, **kwargs)

Screen for missed clouds and other outliers using green and SWIR band

Parameters:
  • X ((M, N) np.ndarray) – Matrix of independent variables

  • y ((M, K) np.ndarray) – Matrix of dependent variables

  • green (np.ndarray) – 2D array containing spectral values

  • swir (np.ndarray) – 2D array containing spectral values (~1.55-1.75 µm)

  • scaling_factor (int) – Scaling factor to bring green and swir values to reflectance values between 0 and 1

Returns:

y with outliers set to np.nan

Return type:

np.ndarray

nrt.outliers.shewhart(X, y, L=5, **kwargs)

Remove outliers using a Shewhart control chart

As described in Brooks et al. 2014, an initial OLS fit is performed, after which outliers are identified using a Shewhart control chart and removed.

Parameters:
  • X ((M, N) np.ndarray) – Matrix of independent variables

  • y ({(M,), (M, K)} np.ndarray) – Matrix of dependent variables

  • L (float) – control limit used for outlier filtering. Must be a positive float. Lower values indicate stricter filtering. Residuals larger than L*sigma will get screened out

  • **kwargs – not used

Returns:

Dependent variables with outliers set to np.nan

Return type:

y (np.ndarray)
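
The screening rule described above (an OLS fit followed by removal of residuals larger than L*sigma) can be sketched as follows. shewhart_sketch is a hypothetical name; estimating sigma as the per-series standard deviation of the residuals and assuming a NaN-free 2D y are assumptions made for illustration.

```python
import numpy as np


def shewhart_sketch(X, y, L=5.0):
    """Hypothetical Shewhart-style screening; not the nrt implementation."""
    # Initial OLS fit on all observations
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # One sigma per series (assumed: standard deviation of the residuals)
    sigma = np.std(resid, axis=0)
    # Work on a copy so the caller's array is left untouched
    y_clean = y.astype(float).copy()
    y_clean[np.abs(resid) > L * sigma] = np.nan
    return y_clean


rng = np.random.default_rng(42)
M, K = 200, 3
t = np.linspace(0, 1, M)
X = np.column_stack([np.ones(M), t])
y = 1.0 + 2.0 * t[:, None] + rng.normal(scale=0.1, size=(M, K))
y[10, 0] += 50.0  # simulate a missed cloud in the first series
y_clean = shewhart_sketch(X, y, L=5.0)
```

Lowering L tightens the control limits and screens out more observations, which matches the parameter description above.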

nrt.stats module

nrt.stats.bisquare(resid, c=4.685)

Weight residuals using bisquare weight function

Parameters:
  • resid (np.ndarray) – residuals to be weighted

  • c (float) – tuning constant for Tukey’s Biweight (default: 4.685)

Returns:

weights for residuals

Return type:

weight (ndarray)

Reference:

http://statsmodels.sourceforge.net/stable/generated/statsmodels.robust.norms.TukeyBiweight.html
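
For reference, Tukey's biweight assigns the weight (1 - (r/c)^2)^2 to residuals with |r| < c and zero otherwise. The following sketch applies that formula with NumPy; bisquare_sketch is a hypothetical name, and the residuals are assumed to be already scaled (for example by a robust scale estimate).

```python
import numpy as np


def bisquare_sketch(resid, c=4.685):
    """Hypothetical Tukey biweight weighting of already-scaled residuals."""
    resid = np.asarray(resid, dtype=float)
    # Residuals at or beyond the tuning constant receive zero weight
    inside = np.abs(resid) < c
    return inside * (1.0 - (resid / c) ** 2) ** 2


resid = np.array([0.0, 1.0, 4.685, 10.0])
weights = bisquare_sketch(resid)
```

A residual of zero keeps full weight, weights decay smoothly as the residual grows, and anything at or beyond c is ignored entirely.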

nrt.stats.erfcc(x)

Complementary error function.

nrt.stats.mad(resid, c=0.6745)

Returns Median-Absolute-Deviation (MAD) for residuals

Parameters:
  • resid (np.ndarray) – residuals

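  • c (float) – scale factor that makes the MAD comparable to the standard deviation of a normal distribution (default: 0.6745, the 0.75 quantile of the standard normal; 1 / 0.6745 ≈ 1.4826)

Returns:

MAD ‘robust’ scale estimate (comparable to a standard deviation)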

Return type:

float

Reference:

http://en.wikipedia.org/wiki/Median_absolute_deviation
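
The textbook definition from the reference above can be sketched in a few lines. mad_sketch is a hypothetical name, and the library's implementation may differ in details such as how the residuals are centered or trimmed.

```python
import numpy as np


def mad_sketch(resid, c=0.6745):
    """Hypothetical textbook MAD, scaled to be comparable to a std. dev."""
    resid = np.asarray(resid, dtype=float)
    # Median of the absolute deviations from the median, divided by c
    return np.median(np.abs(resid - np.median(resid))) / c


resid = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
scale = mad_sketch(resid)
```

Here the median is 3 and the median absolute deviation is 1, so the result is 1 / 0.6745; the extreme value 100 has no effect, which is what makes the estimate robust.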

nrt.stats.nan_percentile_axis0(arr, percentiles)

Faster implementation of np.nanpercentile

This implementation always takes the percentile along axis 0. Uses numba to speed up the calculation by more than 7x.

Function is equivalent to np.nanpercentile(arr, <percentiles>, axis=0)

Parameters:
  • arr (np.ndarray) – 2D array to calculate percentiles for

  • percentiles (np.ndarray) – 1D array of percentiles to calculate

Returns:

Array with first dimension corresponding to values passed in percentiles

Return type:

np.ndarray
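
Since the function is documented as equivalent to np.nanpercentile along axis 0, the reference call below shows the expected output layout. It uses NumPy only and does not call nrt; the array sizes are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(100, 5))
# Scatter a few NaN values through the array
arr[rng.integers(0, 100, size=20), rng.integers(0, 5, size=20)] = np.nan

percentiles = np.array([25.0, 50.0, 75.0])
# Reference result: one row per requested percentile, one column per series
expected = np.nanpercentile(arr, percentiles, axis=0)
```

The first dimension of the result follows the order of the values passed in percentiles.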

nrt.stats.nanlstsq(X, y)

Return the least-squares solution to a linear matrix equation

Analogous to numpy.linalg.lstsq, for dependent variables containing NaN

Note

For best performance of the multithreaded implementation, it is recommended to limit the number of threads used by MKL or OpenBLAS to 1. This avoids over-subscription and improves performance. By default the function uses all available cores; the number of cores used can be controlled with the numba.set_num_threads function or by modifying the NUMBA_NUM_THREADS environment variable.

Parameters:
  • X ((M, N) np.ndarray) – Matrix of independent variables

  • y ({(M,), (M, K)} np.ndarray) – Matrix of dependent variables

Examples

>>> import os
>>> # Adjust linear algebra configuration (only one should be required
>>> # depending on how numpy was installed/compiled)
>>> os.environ['OPENBLAS_NUM_THREADS'] = '1'
>>> os.environ['MKL_NUM_THREADS'] = '1'
>>> import numpy as np
>>> from sklearn.datasets import make_regression
>>> from nrt.stats import nanlstsq
>>> # Generate random data
>>> n_targets = 1000
>>> n_features = 2
>>> X, y = make_regression(n_samples=200, n_features=n_features,
...                        n_targets=n_targets)
>>> # Add random nan to y array
>>> y.ravel()[np.random.choice(y.size, 5*n_targets, replace=False)] = np.nan
>>> # Run the regression
>>> beta = nanlstsq(X, y)
>>> assert beta.shape == (n_features, n_targets)

Returns:

Least-squares solution, ignoring NaN

Return type:

np.ndarray

nrt.stats.ncdf(x)

Normal cumulative distribution function

Source: Stack Overflow (author unknown), https://stackoverflow.com/a/809402/12819237
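
The relationship between the normal CDF and the complementary error function can be sketched with the standard library. ncdf_sketch is a hypothetical name, and it uses math.erfc rather than the module's own erfcc approximation.

```python
import math


def ncdf_sketch(x):
    """Hypothetical normal CDF built on the complementary error function."""
    # Phi(x) = 1 - 0.5 * erfc(x / sqrt(2))
    return 1.0 - 0.5 * math.erfc(x / math.sqrt(2.0))


p_zero = ncdf_sketch(0.0)
p_upper = ncdf_sketch(1.96)
```

At zero the CDF equals 0.5, and at 1.96 it is close to 0.975, the familiar two-sided 95% bound.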

nrt.utils module

nrt.utils.build_regressors(dates, trend=True, harmonic_order=3)

Build the design matrix (X) from a list or an array of datetimes

Trend assumes temporal resolution no finer than daily. Harmonics assume annual cycles.

Parameters:
  • dates (pandas.DatetimeIndex) – The dates to use for building regressors

  • trend (bool) – Whether to add a trend component

  • harmonic_order (int) – The order of the harmonic component

Returns:

A design matrix

Return type:

numpy.ndarray
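
To make the structure of such a design matrix concrete, the sketch below builds an intercept, an optional trend and pairs of annual harmonics from decimal-year dates. build_regressors_sketch is a hypothetical name; taking decimal years as input (instead of a pandas.DatetimeIndex), the trend unit and the column order are all assumptions and may not match nrt.

```python
import numpy as np


def build_regressors_sketch(decimal_dates, trend=True, harmonic_order=3):
    """Hypothetical design matrix builder; layout may differ from nrt."""
    t = np.asarray(decimal_dates, dtype=float)
    columns = [np.ones_like(t)]  # intercept
    if trend:
        columns.append(t - t[0])  # assumed: years since first observation
    for k in range(1, harmonic_order + 1):
        # One cosine/sine pair per harmonic of the annual cycle
        columns.append(np.cos(2 * np.pi * k * t))
        columns.append(np.sin(2 * np.pi * k * t))
    return np.column_stack(columns)


# One observation every 16 days over roughly two years
decimal_dates = 2018 + np.arange(0, 730, 16) / 365.0
X = build_regressors_sketch(decimal_dates, trend=True, harmonic_order=3)
```

With a trend and three harmonics the matrix has 1 + 1 + 2 * 3 = 8 columns, one row per date.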

nrt.utils.datetimeIndex_to_decimal_dates(dates)

Convert a pandas datetime index to decimal dates

nrt.utils.dt_to_decimal(dt)

Helper to build a decimal date from a datetime object
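
One plausible way to compute a decimal date is to add the elapsed fraction of the year to the year number. dt_to_decimal_sketch below is a hypothetical illustration of that idea using only the standard library; the exact convention used by nrt is an assumption.

```python
import datetime


def dt_to_decimal_sketch(dt):
    """Hypothetical conversion of a datetime to a decimal year."""
    year_start = datetime.datetime(dt.year, 1, 1)
    next_year_start = datetime.datetime(dt.year + 1, 1, 1)
    # Fraction of the year elapsed, accounting for leap years
    elapsed = (dt - year_start).total_seconds()
    year_length = (next_year_start - year_start).total_seconds()
    return dt.year + elapsed / year_length


mid_2019 = dt_to_decimal_sketch(datetime.datetime(2019, 7, 2, 12))
start_2020 = dt_to_decimal_sketch(datetime.datetime(2020, 1, 1))
```

Noon on 2 July 2019 falls exactly halfway through a 365-day year, so it maps to 2019.5.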

nrt.utils.numba_kwargs(func)

Decorator that enables passing kwargs to jitted functions by selecting only those kwargs that are present in the decorated function's signature
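
The idea of filtering keyword arguments against a function's signature can be sketched in plain Python. filter_kwargs and scale are hypothetical names; this version does not involve numba and is not the nrt implementation.

```python
import functools
import inspect


def filter_kwargs(func):
    """Hypothetical decorator that drops kwargs unknown to func."""
    accepted = set(inspect.signature(func).parameters)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Keep only the keyword arguments named in func's signature
        kept = {k: v for k, v in kwargs.items() if k in accepted}
        return func(*args, **kept)

    return wrapper


@filter_kwargs
def scale(x, factor=2):
    return x * factor


result = scale(3, factor=5, unused_option=True)
```

This lets a caller forward one shared set of options to several functions, each of which silently ignores the options it does not declare.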

nrt.utils_efp module

CUSUM utility functions

Functions defined in this module implement functionality necessary for CUSUM and MOSUM monitoring as implemented in the R packages strucchange and bfast.

Portions of this module are derived from Chris Holden’s pybreakpoints package. See the copyright statement below.

nrt.utils_efp.history_roc(X, y, alpha=0.05, crit=0.9478982340418134)

Reverse Ordered Rec-CUSUM check for stable periods

Checks for stable periods by calculating recursive OLS residuals (see _recresid()) on the reversed X and y matrices. If the cumulative sum of the residuals crosses a boundary, the index of y where this structural change occurred is returned.

Parameters:
  • X ((M, ) np.ndarray) – Matrix of independent variables

  • y ((M, K) np.ndarray) – Matrix of dependent variables

  • alpha (float) – Significance level for the boundary (probability of type I error)

  • crit (float) – Critical value corresponding to the chosen alpha. Can be calculated with _cusum_rec_test_crit. Default is the value for alpha=0.05

Returns:

(int) Index of structural change in y.

0: y completely stable

>0: y stable after this index

Module contents