.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples/plot_simulation_data.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_plot_simulation_data.py: Synthetic disturbance data ========================== This example illustrates the simulation of a near real time monitoring scenario on synthetic data. The EWMA approach instantiated from ``nrt.monitor.ewma import EWMA`` is used for monitoring and detection of the artificially generated breakpoints and the experiment is concluded by a simple accuracy assessment. .. GENERATED FROM PYTHON SOURCE LINES 10-18 Synthetic data generation ------------------------- The simulate module of the nrt-data package contains functionalities to create synthetic data with controlled parameters such as position of structural change, phenological amplitude, noise level, etc One such example can be visualized using the ``make_ts`` function, which creates a single time-series. .. GENERATED FROM PYTHON SOURCE LINES 18-47 .. code-block:: Python import random import numpy as np from nrt.data import simulate import matplotlib.pyplot as plt import matplotlib.dates as mdates dates = np.arange('2018-01-01', '2022-06-15', dtype='datetime64[W]') fig, axes = plt.subplots(3,3, constrained_layout=True) for row, amplitude in zip(axes, [0.1, 0.2, 0.3]): for ax, noise in zip(row, [0.02, 0.05, 0.1]): break_idx = random.randint(30,100) ts = simulate.make_ts(dates=dates, break_idx=break_idx, sigma_noise=noise, amplitude=amplitude) ax.plot(dates, ts) ax.axvline(x=dates[break_idx], color='magenta') ax.set_ylim(-0.1,1.1) ax.set_title('Amplitude: %.1f\nsigma noise: %.2f' % (amplitude, noise), fontsize=11) ax.xaxis.set_major_locator(mdates.YearLocator()) ax.xaxis.set_major_formatter(mdates.DateFormatter("\n%Y")) ax.tick_params( axis='x', which='both', bottom=False, top=False, labelbottom=False) fig.supxlabel('Time') fig.supylabel('NDVI') plt.show() .. image-sg:: /auto_examples/images/sphx_glr_plot_simulation_data_001.png :alt: Amplitude: 0.1 sigma noise: 0.02, Amplitude: 0.1 sigma noise: 0.05, Amplitude: 0.1 sigma noise: 0.10, Amplitude: 0.2 sigma noise: 0.02, Amplitude: 0.2 sigma noise: 0.05, Amplitude: 0.2 sigma noise: 0.10, Amplitude: 0.3 sigma noise: 0.02, Amplitude: 0.3 sigma noise: 0.05, Amplitude: 0.3 sigma noise: 0.10 :srcset: /auto_examples/images/sphx_glr_plot_simulation_data_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 48-55 The spatial counterpart of ``make_ts`` is ``make_cube`` and its main argument is an ``xarray.Dataset`` of simulation parameters that can be generated with the ``make_cube_parameters`` function. The data cube generated is a standard univariate ``xarray.DataArray`` with ``x``, ``y`` and ``time`` dimensions. Each pixel in the spatial dimensions contains a time-series of simulated values with varying levels of noise, seasonality, outliers and in some cases a structural break point .. GENERATED FROM PYTHON SOURCE LINES 55-65 .. code-block:: Python params_ds = simulate.make_cube_parameters(shape=(50,50), n_outliers_interval=(0,5), n_nan_interval=(0,7), break_idx_interval=(105,dates.size - 20)) # Convert break_idx to dates print('Early breakpoint: %s' % dates[105]) print('Late breakpoint: %s' % dates[dates.size - 20]) cube = simulate.make_cube(dates=dates, params_ds=params_ds) .. rst-class:: sphx-glr-script-out .. code-block:: none Early breakpoint: 2020-01-02 Late breakpoint: 2022-01-20 .. GENERATED FROM PYTHON SOURCE LINES 66-70 In the ndvi datacube created, 50 percents of the pixels contain a break point occuring between 2020-01-02 and 2022-01-20. The ``break_idx`` variable of the ``params_ds`` ``Dataset`` informs on the presence or absence of a break point, and its position. .. GENERATED FROM PYTHON SOURCE LINES 72-91 Simulation of an NRT monitoring scenario -------------------------------------------------- For the simulating a near real time monitoring scenario, we consider all the pixels of the datacube (no mask) and define the 2018-01-01 to 2019-12-31 period as the stable history period and all subsequent dates as monitoring. We know from the time-series simulation parameters that the stable history period is indeed free of breakpoints. In a real life near real time monitoring use case, fitting and monitoring are occuring separately; we therefore need to split the datacube created in two. After that instantiation of the ``EWMA`` class and stable history takes place. The harmonic fit parameters for each pixel is stored in the instance of the ``EWMA`` class Note that in a real life scenario, several days may pass between fitting and the next observation, or between consecutive observations. The fit parameters or ongoing monitoring variables are then usually stored to disk in a NetCDF file. See the ``to_netcdf()`` method for more details. During monitoring each new observation needs to be passed to the monitor method as a numpy array. Since we currently have these observations in an xarray DataArray structure, we need to unpack each temporal slice as an (array, date) tuple .. GENERATED FROM PYTHON SOURCE LINES 91-110 .. code-block:: Python import datetime from nrt.monitor.ewma import EWMA cube_history = cube.sel(time=slice('2018-01-01','2019-12-31')) cube_monitor = cube.sel(time=slice('2020-01-01', '2022-12-31')) # Monitoring class instantiation and fitting monitor = EWMA(trend=False, harmonic_order=1, lambda_=0.3, sensitivity=4, threshold_outlier=10) monitor.fit(dataarray=cube_history) # Monitor every date of the ``cube_monitor`` DataArray for array, date in zip(cube_monitor.values, cube_monitor.time.values.astype('M8[s]').astype(datetime.datetime)): monitor.monitor(array=array, date=date) .. GENERATED FROM PYTHON SOURCE LINES 111-128 Monitoring performances evaluation ---------------------------------- Assessing the performance of a time-series monitoring algorithm can be a complex task that depends on the specific use case and what the user wants to emphasize. A user valuing rapid detection will chose an assessment approach that puts extra weight on the temporal aspect or penalize late detections, while if timeliness is not a requirement, accuracy assessment will resemble standard spatial validation. In the present example we work with a temporal threshold for which 6 months is the default value. This approach to accuracy assessment implies that any breakpoint occuring outside of the 6 months periods after the simulated breakpoint (considered ground thruth) is considered comission error. Absence of detection during that same period would then be an omission, detections during the period are true positives, and absence of detection on stable time-series are true negatives. Note that alternative accuracy assessment approaches exist; see for instance [1]_ who chose to use ``PixelYears`` as their sampling units, or [2]_ who introduced the concept of a time weighted F1 score, hence considerating simultaneously detection speed and spatial accuracy in a single index. .. GENERATED FROM PYTHON SOURCE LINES 128-157 .. code-block:: Python def accuracy(nrtInstance, params_ds, dates, delta=180): """Compute accuracy metrics (precision, recall) of a nrt simulation on synthetic data Args: nrtInstance: Instance of a nrt monitoring class used for monitoring params_ds: Time-series generation paramaters dates: Array of numpy.datetime64 dates used for synthetic time-series generation delta (int): Time delta in day after a reference break for a detected break to be considered True Positive. """ detection_date = nrtInstance._report(layers=['detection_date'], dtype=np.uint16) dates_true = np.where(params_ds.break_idx != -1, dates[params_ds.break_idx.values], np.datetime64('NaT')) dates_true_bound = dates_true + np.timedelta64(delta) dates_pred = np.datetime64('1970-01-01') + np.timedelta64(1) * detection_date dates_pred[dates_pred == np.datetime64('1970-01-01')] = np.datetime64('NaT') # Computes arrays of TP, FP, FN (they should be mutually exclusive) TP = np.where(np.logical_and(dates_pred >= dates_true, dates_pred <= dates_true_bound), 1, 0) FP = np.where(np.logical_and(TP == 0, ~np.isnat(dates_pred)), 1, 0) FN = np.where(np.logical_and(np.isnat(dates_pred), ~np.isnat(dates_true)), 1, 0) precision = TP.sum() / (TP.sum() + FP.sum()) recall = TP.sum() / (TP.sum() + FN.sum()) return precision, recall print(accuracy(monitor, params_ds, dates)) .. rst-class:: sphx-glr-script-out .. code-block:: none (np.float64(0.8966542750929368), np.float64(0.9934102141680395)) .. GENERATED FROM PYTHON SOURCE LINES 158-170 White noise sensitivity analysis -------------------------------- To go one step further we can assess and visualize how these accuracy measures vary with the amount of noise present in the synthetic data. For that we define a new function encompassing all the steps of data generation, instantiation, fitting and monitoring The increase in recall at low noise levels is probably due to the extreme outliers filtering feature of the EWMA monitoring process, OUtliers that exceed ``threshold_outlier`` times the standard deviation of the fit residuals are considered extreme outliers (often clouds or artifacts) in real images, and do not contribute to the monitoring process. With such low noise levels, that threshold is easily reached and breaks missed. .. GENERATED FROM PYTHON SOURCE LINES 170-204 .. code-block:: Python def make_cube_fit_and_monitor(dates, noise_level): params_ds = simulate.make_cube_parameters(shape=(20,20), n_outliers_interval=(4,5), n_nan_interval=(3,4), sigma_noise_interval=(noise_level, noise_level), break_idx_interval=(105,dates.size - 20)) cube = simulate.make_cube(dates=dates, params_ds=params_ds) cube_history = cube.sel(time=slice('2018-01-01','2019-12-31')) cube_monitor = cube.sel(time=slice('2020-01-01', '2022-12-31')) # Monitoring class instantiation and fitting monitor = EWMA(trend=False, harmonic_order=1, lambda_=0.3, sensitivity=4, threshold_outlier=10) monitor.fit(dataarray=cube_history) # Monitor every date of the ``cube_monitor`` DataArray for array, date in zip(cube_monitor.values, cube_monitor.time.values.astype('M8[s]').astype(datetime.datetime)): monitor.monitor(array=array, date=date) return params_ds, monitor noises = [0.02, 0.03, 0.05, 0.07, 0.09, 0.12, 0.15, 0.2] prs = [] for noise in noises: params_ds, monitor = make_cube_fit_and_monitor(dates, noise) prs.append(accuracy(monitor, params_ds, dates)) precisions, recalls = zip(*prs) plt.plot(noises, precisions, label='Precision') plt.plot(noises, recalls, label='Recall') plt.xlabel('Noise level') plt.legend(loc='lower right') plt.grid(True) plt.show() .. image-sg:: /auto_examples/images/sphx_glr_plot_simulation_data_002.png :alt: plot simulation data :srcset: /auto_examples/images/sphx_glr_plot_simulation_data_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 205-215 References ---------- .. [1] Bullock, E.L., Woodcock, C.E. and Holden, C.E., 2020. Improved change monitoring using an ensemble of time series algorithms. Remote Sensing of Environment, 238, p.111165. .. [2] Viehweger, J., 2021. Comparative Assessment of Near Real-Time Forest Disturbance Detection Algorithms. Master thesis, Philipps Universitat Marburg .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 3.194 seconds) .. _sphx_glr_download_auto_examples_plot_simulation_data.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_simulation_data.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_simulation_data.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: plot_simulation_data.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_