.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_parallel_computing.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_plot_parallel_computing.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_parallel_computing.py:


Parallel model fitting
======================
The most computationally expensive part of a typical nrt workflow is the fitting
of a harmonic model over the stable history period. Starting with version ``0.2.0``,
``nrt`` uses multithreading to further speed-up the already fast model fitting.
This example illustrates how multithreading can be enabled and adjusted to your use case.

.. GENERATED FROM PYTHON SOURCE LINES 11-30

Confirgure multithreading options of linear algebra library
-----------------------------------------------------------

Most of the low level computation/numerical optimization occuring during model
fitting with nrt relies on a linear algebra library. These libraries often implement
low level methods with built-in multi-threading. ``nrt`` implements multi-threading
thanks to ``numba`` on a different, higher level.
To prevent nested parallelism that would result in over-subscription and potentially
reduce performances, it is recommanded to disable the built in multi-threading
of the linear algebra library being used.
Depending on how ``numpy`` was installed, it will rely on one of the three linear
algebra libraries which are OpenBLAS, MKL or BLIS. At the time of writing this
tutorial, pipy wheels (obtain when installing ``numpy`` using pip) are shipped
with OpenBLAS, while a conda installation from the default channel will come with
MKL. All three libraries use an environmental variable to control threading 
(``MKL_NUM_THREADS``, ``OPENBLAS_NUM_THREADS`` and ``BLIS_NUM_THREADS``); in the
present example, we set them all to ``'1'`` directly from within python.
Although knowing which library is used on your system would allow you to remove
the unnecessary configuration lines, it is not entirely necessary.

.. GENERATED FROM PYTHON SOURCE LINES 30-36

.. code-block:: Python

    import os
    # Note that 1 is a string, not an integer
    os.environ['MKL_NUM_THREADS'] = '1'
    os.environ['OPENBLAS_NUM_THREADS'] = '1'
    os.environ['BLIS_NUM_THREADS'] = '1'


.. GENERATED FROM PYTHON SOURCE LINES 37-45

Create benchmark data
---------------------

Using the synthetic data generation functionalities of the package, we can create
an xarray DataArray for benchmark. Note that in order to keep the compilation time
of this tutorial manageable we limit the size of that object to 200 by 200 pixels.
While this is significantly smaller than e.g. a Sentinel2 MGRS tile, it is sufficient
to illustrate differences in fitting time among various fitting strategies

.. GENERATED FROM PYTHON SOURCE LINES 45-58

.. code-block:: Python

    import xarray as xr
    import numpy as np
    from nrt import data

    # Create synthetic ndvi data cube
    dates = np.arange('2018-01-01', '2020-12-31', dtype='datetime64[W]')
    params_ds = data.make_cube_parameters(shape=(200,200), unstable_proportion=0)
    cube = data.make_cube(dates=dates, params_ds=params_ds)
    # We also create a very small cube for running each fitting method once before
    # the benchmark, ensuring compilation of the jitted functions and fair comparison
    cube_sub = cube.isel(indexers={'x': slice(1,5), 'y': slice(1,5)})


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /home/docs/checkouts/readthedocs.org/user_builds/nrt/checkouts/latest/docs/gallery/plot_parallel_computing.py:51: DeprecationWarning: The function 'make_cube_parameters' has been moved to 'nrt.data.simulate'. Please update your imports to 'from nrt.data.simulate import make_cube_parameters'. This import path will be deprecated in future versions.
      params_ds = data.make_cube_parameters(shape=(200,200), unstable_proportion=0)
    /home/docs/checkouts/readthedocs.org/user_builds/nrt/checkouts/latest/docs/gallery/plot_parallel_computing.py:52: DeprecationWarning: The function 'make_cube' has been moved to 'nrt.data.simulate'. Please update your imports to 'from nrt.data.simulate import make_cube'. This import path will be deprecated in future versions.
      cube = data.make_cube(dates=dates, params_ds=params_ds)


.. GENERATED FROM PYTHON SOURCE LINES 59-67

Benchmark fitting time of all methods 
-------------------------------------

Note that we are only interested in fitting time and therefore use a single
class instance for the benchmark. The time required for any subsequent .monitor()
call is usually negligible and as a consequence not included in this benchmark.
We use here ``CuSum`` but any of the monitoring classes could be used and
would produce the same results.

.. GENERATED FROM PYTHON SOURCE LINES 67-104

.. code-block:: Python

    import time
    import itertools
    from collections import defaultdict
    from nrt.monitor.cusum import CuSum
    import matplotlib.pyplot as plt

    # Benchmark parameters
    benchmark_dict = defaultdict(dict)
    monitor = CuSum()
    methods = ['OLS', 'RIRLS', 'CCDC-stable', 'ROC']
    threads = range(1,3)

    # Make sure all numba jitted function are compiled
    monitor_ = CuSum()
    [monitor_.fit(cube_sub, method=method) for method in methods]

    # Benchmark loop
    for method, n_threads in itertools.product(methods, threads):
        t0 = time.time()
        monitor.fit(cube, n_threads=n_threads, method=method)
        t1 = time.time()
        benchmark_dict[method][n_threads] = t1 - t0

    # Visualize the results
    index = np.arange(len(methods))
    for idx, n in enumerate(threads):
        values = [benchmark_dict[method][n] for method in methods]
        plt.bar(index + idx * 0.2, values, 0.2, label='%d thread(s)' % n)

    plt.xlabel('Fitting method')
    plt.ylabel('Time (seconds)')
    plt.title('Fitting time')
    plt.xticks(index + 0.2, methods)
    plt.legend()
    plt.tight_layout()
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_plot_parallel_computing_001.png
   :alt: Fitting time
   :srcset: /auto_examples/images/sphx_glr_plot_parallel_computing_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 105-114

From the results above we notice large differences in fitting time among fitting
methods. Unsurprisingly, OLS is the fastest, which is expected given that all
other methods use OLS complemented with some additional, sometimes iterative
refitting, etc... All methods but ``ROC`` for which parallel fitting hasn't been
implemented, benefit from using multiple threads.
Note that a multithreading benefit can only be observed as long as the number
threads is lower than the computing resources available. The machine used for
compiling this tutorial is not meant for heavy computation and obviously has limited 
resources as shown by the cpu_count below

.. GENERATED FROM PYTHON SOURCE LINES 114-118

.. code-block:: Python

    import multiprocessing
    print(multiprocessing.cpu_count())


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    2


.. GENERATED FROM PYTHON SOURCE LINES 119-130

Further considerations
----------------------

A deployment at scale may involve several levels of parallelization. The multi-threaded
example illustrated above is made possible thanks to the numba parallel accelerator.
However, it is also very common to handle the earlier steps of data loading and
data pre-processing with ``dask.distributed``, which facilitates lazy and distributed
computation. There is no direct integration between the two parallelism mechanisms
and while calling ``.fit()`` on a lazy distributed dask array is possible, the lazy
evaluation cannot be preserved and all the input data need to be evaluated and
loaded in memory

.. GENERATED FROM PYTHON SOURCE LINES 130-139

.. code-block:: Python

    from nrt import data

    # Lazy load test data using dask
    cube = data.romania_10m(chunks={'x': 20, 'y': 20})
    vi_cube = (cube.B08 - cube.B04) / (cube.B08 + cube.B04)
    print(vi_cube)
    monitor = CuSum()
    monitor.fit(vi_cube, method='OLS', n_threads=2)
    print(type(monitor.beta))


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Downloading file 'sentinel2_cube_subset_romania_10m.nc' from 'https://jeodpp.jrc.ec.europa.eu/ftp/jrc-opendata/FOREST/NRT/NRT-DATA/VER1-0/sentinel2_cube_subset_romania_10m.nc' to '/home/docs/.cache/nrt-data'.
    /home/docs/checkouts/readthedocs.org/user_builds/nrt/envs/latest/lib/python3.11/site-packages/nrt/data/__init__.py:66: UserWarning: The specified chunks separate the stored chunks along dimension "y" starting at index 20. This could degrade performance. Instead, consider rechunking after loading.
      return xr.open_dataset(file_path, **kwargs)
    /home/docs/checkouts/readthedocs.org/user_builds/nrt/envs/latest/lib/python3.11/site-packages/nrt/data/__init__.py:66: UserWarning: The specified chunks separate the stored chunks along dimension "x" starting at index 20. This could degrade performance. Instead, consider rechunking after loading.
      return xr.open_dataset(file_path, **kwargs)
    <xarray.DataArray (time: 131, y: 100, x: 100)> Size: 10MB
    dask.array<truediv, shape=(131, 100, 100), dtype=float64, chunksize=(131, 20, 20), chunktype=numpy.ndarray>
    Coordinates:
      * y            (y) float64 800B 2.533e+06 2.533e+06 ... 2.532e+06 2.532e+06
      * x            (x) float64 800B 5.272e+06 5.272e+06 ... 5.273e+06 5.273e+06
        spatial_ref  int32 4B 0
      * time         (time) datetime64[ns] 1kB 2015-08-01T09:30:06.027000 ... 202...
    <class 'numpy.ndarray'>


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (1 minutes 25.097 seconds)


.. _sphx_glr_download_auto_examples_plot_parallel_computing.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_parallel_computing.ipynb <plot_parallel_computing.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_parallel_computing.py <plot_parallel_computing.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_parallel_computing.zip <plot_parallel_computing.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_