Tutorial
========

This section introduces the code in the package and how it might be used. It is best to
read the preceding sections of the documentation first to get a more detailed
introduction to the tool components. See the Jupyter notebooks in the examples folder
of the repository for further code examples.

Imports
-------

The first step in using SACO is importing it as a package into a script/notebook. The
most important functionality becomes available with::

    from saco import Dataset, Calculator, Optimiser

From this we can see that the three most important components that a user might
interact with are:

    - ``Dataset``: provides a "container" to group and manipulate multiple data tables.
    - ``Calculator``: takes a ``Dataset`` as input, calculates "scenario flows" and
      assesses their compliance against environmental flow targets.
    - ``Optimiser``: takes a ``Dataset`` as input then formulates and solves an
      optimisation problem to find abstraction impact reductions needed to meet flow
      targets.

We provide an initial guide to these objects/components below. See the "Reference"
sections of the documentation for further details, including information about additional
helper functions.

.. note::

    The backgrounds to the Calculator and Optimiser components are explained in more
    detail in the :doc:`calculator` and :doc:`optimiser` sections of the documentation.

Dataset
-------

A ``Dataset`` stores and groups together the relevant data tables (primarily WRGIS
tables). One way to get going with a ``Dataset`` (and the tool in general) is to load a
folder of data table files::

    ds = Dataset(data_folder='/path/to/my/data-folder')
    ds.load_data()

Here we have created a ``Dataset`` object (as ``ds``) and loaded data into memory. As a
``Dataset`` object is the main input to the Calculator and Optimiser components of the
tool, we could actually now go ahead and run those components. But first let us look a
bit more into what a ``Dataset`` consists of.

Tables
~~~~~~

A Dataset object has individual data tables as its most important attributes. For example,
``ds.swabs`` provides access to the SWABS_NBB table of surface water abstractions from
WRGIS (with "swabs" being the "short name" for SWABS_NBB). Similar attributes exist for
the other key WRGIS tables listed below:

    - ``asbs``: AbsSensBands_NBB (waterbody abstraction sensitivity bands)
    - ``asb_percs``: ASBPercentages (fractional deviations defining the reference
      flows, typically the environmental flow indicator, EFI)
    - ``dis``: Discharges_NBB (point surface water discharges)
    - ``gwabs``: GWABs_NBB (point groundwater abstractions)
    - ``qnat``: QNaturalFlows_NBB (waterbody natural flows)
    - ``refs``: REFS_NBB (reference flows - typically EFI)
    - ``sup``: SupResGW_NBB (point "complex impacts")
    - ``swabs``: SWABS_NBB (point surface water abstractions)
    - ``wbs``: IntegratedWBs_NBB (waterbody metadata)

An additional table that is not in WRGIS is derived and included in a ``Dataset``:

    - ``mt``: Master (waterbody summary table - water balance terms, compliance, etc)

The Master table is intended to be the key waterbody-level table that brings together
the water balance components with information on surplus/deficit and compliance
classifications.

The tables have various properties, but most importantly each table class possesses a
``data`` attribute, which is a pandas.DataFrame. Therefore, to access the dataframe of
surface water abstractions, we can use ``ds.swabs.data``. We can then query or
manipulate the data table as we would any pandas.DataFrame. A description of the column
indexes and fields is given in :doc:`fields`.
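Because ``data`` is an ordinary pandas.DataFrame, standard pandas selection applies. The
sketch below uses a small stand-in dataframe rather than a real ``ds.swabs.data`` table.
The row identifiers, index name and values are invented for illustration; only the
``SWQ95FLWR`` column name is taken from this tutorial.

```python
import pandas as pd

# Stand-in for ds.swabs.data: hypothetical identifiers and values (Ml/d).
swabs = pd.DataFrame(
    {"SWQ95FLWR": [0.5, 2.0, 0.0]},
    index=pd.Index(["swab-a", "swab-b", "swab-c"], name="id"),
)

# Look up a single impact by its identifier.
impact_b = swabs.loc["swab-b", "SWQ95FLWR"]

# Select the rows with a non-zero impact and total them.
nonzero = swabs[swabs["SWQ95FLWR"] > 0]
total_impact = nonzero["SWQ95FLWR"].sum()
```

With a loaded ``Dataset``, the same operations would presumably be applied to
``ds.swabs.data`` in place of the stand-in ``swabs`` dataframe.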

Changing Numbers
~~~~~~~~~~~~~~~~

A user may wish to change numbers in a ``Dataset`` to improve the data or to test the
implications of known, planned, hypothetical or other types of prescribed changes. Two
ways to change the numbers in a ``Dataset`` are:

    - Modify the numbers in data table files on disk and then load the ``Dataset``
      using syntax like the ``load_data`` example above.
    - Modify a dataframe directly using a table's ``data`` attribute, as described in
      the preceding paragraph.

An example of the latter approach to change a surface water abstraction impact would be::

    ds.swabs.data.loc[ds.swabs.data.index == 'swab-unique-id', 'SWQ95FLWR'] = 1.23

This would change the abstraction impact under the fully licensed (FL) artificial
influences scenario at the 95th (natural) flow percentile to 1.23 (Ml/d). But the
dataframe could be queried or manipulated in different ways, including through pandas
merge/join operations etc.
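Where several impacts need changing at once, one option is to hold the prescribed values
in a second dataframe and apply them by index alignment. The sketch below does this with
``DataFrame.update`` on stand-in data. The identifiers, values and the ``changes`` table
are hypothetical; only the ``SWQ95FLWR`` column name comes from the example above.

```python
import pandas as pd

# Stand-in for ds.swabs.data (hypothetical identifiers and values, Ml/d).
swabs = pd.DataFrame(
    {"SWQ95FLWR": [0.5, 2.0, 0.0]},
    index=["swab-a", "swab-b", "swab-c"],
)

# Hypothetical table of prescribed changes, indexed by the same identifiers.
changes = pd.DataFrame({"SWQ95FLWR": [1.23, 0.75]}, index=["swab-b", "swab-c"])

# DataFrame.update overwrites cells with matching index and column labels in
# place and leaves rows that are absent from ``changes`` untouched.
swabs.update(changes)
```

Note that ``DataFrame.update`` ignores missing values in ``changes``, so it cannot be
used to blank out an existing number.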

.. note::

    In general, a user should not set or edit the Master (``mt``) table or its data
    directly. See below about (1) using the Calculator to obtain an updated Master
    table and (2) using specific methods to ensure that a Dataset and its Master table
    are ready to go into the Optimiser.

.. note::

    If a user changes a surface or groundwater abstraction impact in a ``Dataset``
    under a given artificial influences scenario and at a given (natural) flow
    percentile, any long-term average abstraction columns in the relevant table are
    *not* automatically updated (currently). See :doc:`reference-dataset` for
    explanation of the available options to make this calculation if needed.

Other Functionality
~~~~~~~~~~~~~~~~~~~

The ``Dataset`` possesses additional functionality to help set table values, write
tables to output files, work with the "network" of waterbodies, and prepare for input
to the Optimiser component. This functionality (the "methods" of ``Dataset``) is
described in :doc:`reference-dataset`.

Calculator
----------

Once a ``Dataset`` has been loaded or constructed (potentially with modifications
relative to the "base" WRGIS), it can be supplied as input to the ``Calculator``. As
demonstrated below, the ``run`` method of the ``Calculator`` can then be executed to
calculate scenario flows, surpluses/deficits and compliance bands based on the input
``Dataset``::

    calculator = Calculator(ds)
    output_dataset = calculator.run()

In the example above we rely on default arguments, which will run the ``Calculator``
for all scenario/percentile combinations and for the whole domain in the input
``Dataset`` (i.e. all waterbodies present). It will also run with some default
methodological choices (see below for more on this).

By default, the ``run`` method returns a complete ``Dataset`` with an updated Master
table (i.e. one that is consistent with all the other tables in the ``Dataset``). If we
want to save this ``Dataset`` (i.e. all of its component tables) at this point, we could
do so as follows (see :doc:`reference-dataset` for guidance on output options)::

    output_dataset.write_tables(output_folder='/my/output/folder')

However, if we want to customise the execution of the ``Calculator``, we can provide
optional arguments, as described in :doc:`reference-calculate`. One such argument
defined on initialisation of the ``Calculator`` is named ``capping_method``. This
argument controls the approach to "unfeasible" impacts - prescribed abstraction impacts
that cannot be satisfied. See :doc:`calculator` for a more precise explanation of this
point. By default, the Calculator takes a WRGIS-like approach to this issue. Using the
:doc:`reference-calculate` documentation to understand the available options, we could
override the default and take a WBAT-like approach as follows::

    calculator = Calculator(ds, capping_method='simple')

We might then for example ask the ``Calculator.run`` method to return only an updated
Master table as a pandas.DataFrame::

    updated_master_table = calculator.run(master_only=True)

These are just a couple of examples of customisation via optional arguments - see
:doc:`reference-calculate` for more options and details.
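The calls shown in this section can be combined to compare the two capping approaches
on the same data. The helper below is only a sketch: ``compare_capping`` is a
hypothetical name, it uses just the arguments shown above, and it assumes that
``Calculator.run`` does not modify the input ``Dataset`` in place (check
:doc:`reference-calculate` before relying on that).

```python
def compare_capping(data_folder):
    """Return Master tables for the default and 'simple' capping methods."""
    # Imported inside the function so it can be defined without SACO installed.
    from saco import Dataset, Calculator

    ds = Dataset(data_folder=data_folder)
    ds.load_data()

    # master_only=True asks run() for just the updated Master table.
    default_master = Calculator(ds).run(master_only=True)
    simple_master = Calculator(ds, capping_method='simple').run(master_only=True)
    return default_master, simple_master
```

The two returned dataframes could then be compared column by column with ordinary
pandas operations.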

.. note::

    If capping has been applied to avoid propagation of unfeasible impacts, scenario
    flows output from the Calculator may be larger than initially expected from
    performing a simple "ups" water balance calculation for some waterbodies. This is
    because capping reduces net impacts upstream to retain physical plausibility. The
    Master table gives the "target" artificial influences components, which are not
    capped individually.
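The effect described in the note above can be seen in a toy example. The code below is
*not* the algorithm used by either capping method. It is a minimal sketch for a chain of
two waterbodies, assuming that an impact is simply limited to the flow available where it
occurs. All function names and numbers are invented.

```python
def naive_flows(natural_inflows, impacts):
    """Cumulative natural flow minus cumulative prescribed impacts."""
    flows, nat, imp = [], 0.0, 0.0
    for q, a in zip(natural_inflows, impacts):
        nat += q
        imp += a
        flows.append(nat - imp)
    return flows


def capped_flows(natural_inflows, impacts):
    """As above, but each impact is limited to the flow available locally."""
    flows, upstream = [], 0.0
    for q, a in zip(natural_inflows, impacts):
        available = upstream + q
        taken = min(a, available)  # cap: cannot take more water than is there
        upstream = available - taken
        flows.append(upstream)
    return flows


natural_inflows = [2.0, 3.0]  # incremental natural flow per waterbody (Ml/d)
impacts = [5.0, 1.0]          # prescribed net abstraction impacts (Ml/d)

naive = naive_flows(natural_inflows, impacts)    # [-3.0, -1.0]
capped = capped_flows(natural_inflows, impacts)  # [0.0, 2.0]
```

The downstream flow is 2.0 Ml/d with capping, compared with -1.0 Ml/d from the simple
cumulative balance, because the unfeasible part of the upstream impact is not propagated.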

.. note::

    Any long-term average fields in a ``Dataset`` passed to the ``Calculator`` are *not*
    changed by the execution of the ``Calculator.run`` method currently (i.e. they are
    unchanged in the ``Dataset`` or Master table returned).

Optimiser
---------

The role of the Optimiser is to suggest how impacts could best be adjusted to meet flow
targets, given some objective(s) and constraints. The solution to this problem is
obtained via mixed integer (binary) linear programming.

Dataset Preparation
~~~~~~~~~~~~~~~~~~~

The starting point for the ``Optimiser`` is again a ``Dataset``. However, in this case
we need to ensure that certain columns are present in some tables - columns that are not
necessarily relevant to the ``Calculator``. The relevant tables and columns are
(currently):

    - Master table: requires a flow target column(s) (optional for the Calculator).
    - GWABs_NBB table: requires a flag column to indicate whether a given impact (row)
      should be available for change in the optimisation (1) or not (0).
    - SWABS_NBB table: as per GWABs_NBB table.
    - SupResGW_NBB table: requires a flag column - we return to this below.

See :doc:`fields` for a guide to the naming conventions for these columns.

The relevant columns can be added or set using the methods in the example snippet below
(assuming still that we have a ``Dataset`` instance as ``ds``)::

    ds.set_flow_targets()
    ds.set_optimise_flag()

Called in this way, both of these methods will use their default settings, which are
described in :doc:`reference-dataset`. Both methods have optional arguments that can be
used to customise flow targets and flag which abstraction impacts will be
included/excluded in optimisation.

.. note::

    If any further manipulation of the inclusion/exclusion flag is needed it could be
    achieved by working with the relevant dataframes (i.e. ``ds.swabs.data``,
    ``ds.gwabs.data`` and ``ds.sup.data``).
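As a sketch of the dataframe route, the lines below exclude a list of abstractions from
optimisation by setting their flag to 0. The dataframe is a stand-in for
``ds.swabs.data``, and the identifiers and ``protected_ids`` list are hypothetical. The
column name ``Optimise_Flag`` is an assumption here; see :doc:`fields` for the actual
naming convention.

```python
import pandas as pd

# Stand-in for ds.swabs.data after flags have been set (hypothetical rows).
swabs = pd.DataFrame(
    {"Optimise_Flag": [1, 1, 1]},
    index=["swab-a", "swab-b", "swab-c"],
)

# Hypothetical list of abstractions that must not be changed.
protected_ids = ["swab-b"]

# Set the flag to 0 for protected rows; all other rows keep their flag.
swabs.loc[swabs.index.isin(protected_ids), "Optimise_Flag"] = 0
```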

.. note::

    For users interested in Environmental Destination (ED) modelling, it should be noted
    that the default behaviour of the ``Dataset.set_optimise_flag`` method does not
    mimic exactly the configuration used in ED modelling for the second National
    Framework for Water Resources. It is important for a user to check that they are
    happy with the flag setup.

.. note::

    Waterbody targets can be (optionally) specified via a separate *Fix_Flags* table
    that sits within a ``Dataset`` (accessible via the short name ``wbfx``). If present
    in a ``Dataset``, the flags in this table will be used to customise flow targets
    (i.e. permitting further relaxation beyond the flow in REFS_NBB). See :doc:`fields`
    for more details.

Optional Arguments
~~~~~~~~~~~~~~~~~~

Once we are happy that a ``Dataset`` is ready for the ``Optimiser``, we could invoke the
``run`` method of the ``Optimiser`` as below::

    optimiser = Optimiser(ds)
    output_dataset = optimiser.run()
    output_dataset.write_tables(output_folder='/my/output/folder')

However, the :doc:`reference-optimise` section provides information on options that we
may want to customise when setting up the ``Optimiser`` (i.e. before execution). One
such option concerns the geographical domain considered. By default, the code above
will run the ``Optimiser`` for the whole domain contained in the input ``Dataset``. The
following lines provide an example of how to run for only part of the domain (referring
to the most downstream waterbody of interest as an "outlet")::

    outlet_waterbody = 'outlet-waterbody-id'  # could be a list of outlet waterbodies

    selected_waterbodies = ds.identify_upstream_waterbodies(outlet_waterbody)

    optimiser = Optimiser(ds, domain=selected_waterbodies)
    output_dataset = optimiser.run()

Other options can be specified too, such as which artificial influences scenario(s) and
flow percentile(s) should be considered. Options also exist concerning the objectives
of the optimisation and whether any "relaxation" should be applied when attempting to
solve for a secondary objective - see :doc:`optimiser`.
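To make the idea of selecting a domain from an outlet concrete, the sketch below walks a
small, invented waterbody network upstream from one or more outlets. It is not the
implementation of ``identify_upstream_waterbodies``. The ``upstream_of`` function, the
``downstream_of`` mapping and the choice to include the outlet itself in the result are
all assumptions made for illustration.

```python
from collections import deque


def upstream_of(outlets, downstream_of):
    """Return every waterbody draining to any outlet, outlets included.

    ``downstream_of`` maps each waterbody ID to the ID of the waterbody it
    flows into (None for a terminal waterbody).
    """
    # Invert the mapping: waterbody -> waterbodies directly upstream of it.
    direct_upstream = {}
    for wb, next_wb in downstream_of.items():
        if next_wb is not None:
            direct_upstream.setdefault(next_wb, []).append(wb)

    # Breadth-first walk upstream from the outlets.
    selected = set(outlets)
    queue = deque(outlets)
    while queue:
        current = queue.popleft()
        for wb in direct_upstream.get(current, []):
            if wb not in selected:
                selected.add(wb)
                queue.append(wb)
    return selected


# Hypothetical network: A and B flow into C; C and D flow into E.
network = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}
selected = upstream_of(["C"], network)  # {"A", "B", "C"}
```

Passing several outlets returns the union of their upstream areas, which mirrors the
comment in the example above that a list of outlet waterbodies may be supplied.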

Outputs
~~~~~~~

The contents of the output from ``Optimiser.run`` are similar to a normal ``Dataset``,
apart from (keeping complex impacts to one side for the moment):

    - The SWABS_NBB and GWABs_NBB tables now contain abstraction impacts as they are
      after the optimisation has been completed (i.e. the impacts that remain after
      the "fix").
    - Similarly, the Master table summarises the water balance and compliance etc for
      the solution formulated by the ``Optimiser``.
    - Additional tables are present: SWABS_Changes and GWABS_Changes (accessible via
      the output ``Dataset``'s attributes ``swabs_chg`` and ``gwab_chg``, respectively).
      These tables contain the impact reductions (Ml/d) required relative to a
      "reference" ``Dataset`` - see :doc:`reference-optimise`.

.. note::

    Long-term average abstraction is recalculated after optimisation under the
    assumption that the relative impact profile across the FDC remains constant.
    However, SWABS with hands-off flow (HOF) conditions are omitted from the
    recalculation at present to avoid introducing a conservative bias into the
    estimates. See :doc:`reference-dataset` and :doc:`reference-optimise` for more
    details.
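One simple reading of the constant-profile assumption in the note above is that a
long-term average scales by the same ratio as the impact at a given percentile. The
function below sketches that reading only. ``rescale_lta`` is a hypothetical helper, not
part of the package, and the actual calculation (including the treatment of HOF
conditions) is described in the reference sections.

```python
def rescale_lta(old_lta, old_impact, new_impact):
    """Scale a long-term average by the change in impact at one percentile."""
    if old_impact == 0:
        # No basis for a ratio, so leave the long-term average unchanged.
        return old_lta
    return old_lta * (new_impact / old_impact)


# Hypothetical values (Ml/d): the impact is halved, so the average is halved.
new_lta = rescale_lta(old_lta=4.0, old_impact=2.0, new_impact=1.0)  # 2.0
```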

Complex Impacts
~~~~~~~~~~~~~~~

Exploratory and optional functionality has been included to allow specific types of
complex impacts to be included in optimisation. As noted above, the SupResGW_NBB table
thus requires a flag column to indicate whether a given complex impact (row) should be:

    - 0: Excluded from optimisation
    - 1: Included as a reservoir compensation flow increase
    - 2: Included as a complex abstraction

The default is for all complex impacts to be excluded from optimisation. Thus, the
``Dataset.set_optimise_flag`` method will insert a flag column of zeros, if a column
has not already been added to the table.

If we wish to include specific complex impacts in optimisation, we need to manually
adjust the flag column for the appropriate rows. This can be achieved in memory using
dataframe operations, for example::

    ds.sup.data.loc[ds.sup.data.index == 'complex-impact-id', 'Optimise_Flag'] = 1

If we specify that a reservoir compensation flow increase is allowed in optimisation,
we also need to indicate the maximum increase permitted. This should be done for the
relevant scenarios and percentiles under consideration. For example, to permit a maximum
reservoir compensation flow increase of 5.0 Ml/d, we could do the following::

    ds.sup.data.loc[ds.sup.data.index == 'complex-impact-id', 'SUPFLQ95_MAX_INCREASE'] = 5.0
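The two steps above can be followed by a quick consistency check before running the
``Optimiser``. The sketch below uses a stand-in for ``ds.sup.data`` with invented
identifiers, and the named constants are only a readability aid. It checks that every row
flagged as a compensation flow increase has a maximum increase set for the one
scenario/percentile column used in the example above.

```python
import pandas as pd

# Stand-in for ds.sup.data (hypothetical identifiers).
sup = pd.DataFrame(
    {"Optimise_Flag": [0, 0]},
    index=["reservoir-x", "complex-abs-y"],
)

# Flag meanings as listed in this section.
EXCLUDED, COMPENSATION_INCREASE, COMPLEX_ABSTRACTION = 0, 1, 2

sup.loc["reservoir-x", "Optimise_Flag"] = COMPENSATION_INCREASE
sup.loc["complex-abs-y", "Optimise_Flag"] = COMPLEX_ABSTRACTION

# Create the maximum increase column empty, then fill it where needed (Ml/d).
sup["SUPFLQ95_MAX_INCREASE"] = float("nan")
sup.loc["reservoir-x", "SUPFLQ95_MAX_INCREASE"] = 5.0

# List rows flagged 1 that still lack a maximum increase.
flagged = sup["Optimise_Flag"] == COMPENSATION_INCREASE
missing = sup.index[flagged & sup["SUPFLQ95_MAX_INCREASE"].isna()].tolist()
```

An empty ``missing`` list indicates that each flagged compensation flow has a limit for
this column; other scenario/percentile columns would need the same check.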

Once the flag field and any maximum increase fields have been added to the relevant
``Dataset`` table, the ``Optimiser`` can be run in the way described above. Outputs are
as detailed above, with the addition of a SupResGW_Changes table that gives any changes
to complex impacts. Note that positive numbers in this table indicate increases to
compensation flows or *reductions in complex abstraction impacts*. The latter sign
convention is opposite to that for SWABS_Changes and GWABS_Changes. It arises because
complex impacts can be positive or negative.

Some important points to note about the behaviour of complex impact functionality in the
``Optimiser`` are:

    - Use of this functionality will typically require local knowledge of specific
      impacts *and* how they have been represented in CAMS ledgers and WRGIS.
    - In its current implementation, the Optimiser seeks to minimise increases to
      reservoir compensation flows in its solution for several reasons. It will only
      increase a compensation flow if it helps to meet flow targets that would have
      otherwise been missed. Other approaches may be explored in future.
    - Complex abstractions (flag = 2) are treated in the same way as normal surface and
      groundwater abstractions. The Optimiser identifies required impact reductions for
      the scenario/percentile combination in question, rather than changes to complex
      licence conditions.
    - A user has the option to explore the implications of different prescribed changes
      to complex impacts by editing the relevant rows in the data table and excluding
      them from optimisation.