# Rainfall quality control and gap-filling procedure using Generalised Additive Models for Location, Scale and Shape (GAMLSS)

software

posted on 2022-01-26, 11:47, authored by Stefaan Conradie, Piotr Wolski, Bruce Hewitson

This item includes a set of scripts containing functions to perform quality control (QC) on daily rainfall data, meaningfully disaggregate cumulative totals over arbitrary periods and subsequently infill missing values. The procedure focuses on known shortcomings of South African rainfall data.

The novel component of the procedure is its use of Generalised Additive Models for Location, Scale and Shape (GAMLSS) both for infilling daily values and for assessing the probability of observing values as extreme as suspect recorded values or sequences of values. Models are fitted for the probability of rainfall occurrence and for the mean and variance of daily totals, as parameters of a Zero-Inflated Gamma distribution. Other GAMLSS-compatible distributions could be substituted relatively easily. The infilling is quasi-stochastic: rainfall occurrence is modelled as a binomial random variable, whereas the mean estimate of rainfall amount is used directly if a day is selected as a rain day.
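The quasi-stochastic infilling step can be illustrated with a minimal sketch. It is written in Python with the standard library only and is not the authors' R implementation. The function names and input values are hypothetical, and the per-day occurrence probability and mean amount are assumed to have come from an already fitted model.

```python
import random


def infill_day(p_rain, mean_amount, rng):
    """Return an infilled daily total (mm) for one missing day.

    p_rain: modelled probability of rainfall occurrence (assumed input).
    mean_amount: modelled mean rainfall amount on a rain day (assumed input).
    """
    # Occurrence is stochastic: a single Bernoulli (binomial, n = 1) draw.
    is_rain_day = rng.random() < p_rain
    # The amount is not sampled: the mean estimate is used directly.
    return mean_amount if is_rain_day else 0.0


def infill_series(observed, p_rain, mean_amount, seed=0):
    """Fill None entries in `observed`; recorded values are left untouched."""
    rng = random.Random(seed)
    filled = []
    for obs, p, mu in zip(observed, p_rain, mean_amount):
        filled.append(infill_day(p, mu, rng) if obs is None else obs)
    return filled


# Hypothetical five-day record; None marks a missing day.
observed = [0.0, None, 12.4, None, None]
p_rain = [0.1, 0.0, 0.8, 1.0, 0.5]
mean_amount = [2.0, 3.0, 9.5, 6.2, 4.1]
filled = infill_series(observed, p_rain, mean_amount, seed=42)
```

Because only occurrence is drawn at random, every infilled rain day receives exactly the modelled mean, so infilled amounts vary less than they would if the amount were also sampled from the fitted distribution.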

Details of the functions used and the inputs of each major component of the procedure are set out in the files provided. Comments preceded by three hashes (`###`) at the top of a function contain these descriptions.
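As a convenience, the `###` descriptions can be pulled out of a script without opening it in an editor. The sketch below is a hypothetical helper, not part of the provided scripts. The sample R function is invented for illustration, and placing the comments inside the function body is an assumption about the layout.

```python
def extract_descriptions(r_source):
    """Collect the text of lines whose first non-blank characters are '###'."""
    descriptions = []
    for line in r_source.splitlines():
        stripped = line.strip()
        if stripped.startswith("###"):
            # Drop the three hashes and surrounding whitespace.
            descriptions.append(stripped[3:].strip())
    return descriptions


# Hypothetical R function, invented for illustration only.
sample = """\
my_function <- function(x, threshold) {
  ### x: numeric vector of daily totals (mm)
  ### threshold: value above which a total is flagged
  # an ordinary comment, not a description
  x > threshold
}
"""
descriptions = extract_descriptions(sample)
```

To apply it to a real file, read the file contents into a string and pass that string to `extract_descriptions`.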

A forthcoming paper will describe the procedure in full detail and demonstrate its effectiveness on a test dataset. The procedure is briefly described in a paper that has been accepted for publication and will be available shortly.

The freely available components of the WRZ2019 dataset (https://doi.org/10.25375/uct.16453452.v1) can be used to test (parts of) the procedure. A script `run_DWS2019.R` is included to demonstrate how this may be done.

To run this test, the following directory structure is required:

    |-- data
    |   |-- DWS2019.csv
    |   |-- mm2019.csv
    |   `-- DWSmon.csv
    |-- gamlss_cleaner
    |   |-- cleaners.R
    |   |-- cleanfill_fn.R
    |   |-- estimators.R
    |   |-- gamlss_fit.R
    |   |-- helpers.R
    |   `-- time_shifters.R
    `-- run_DWS2019.R
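
Before running the test it may help to confirm that the layout is in place. The following is a minimal sketch of such a check, not part of the provided scripts. The helper name `missing_files` is hypothetical, and the sketch assumes that `run_DWS2019.R` sits at the top level next to `data` and `gamlss_cleaner`.

```python
from pathlib import Path

# File names are taken from the listing above.
# Assumption: run_DWS2019.R is at the top level of the working directory.
REQUIRED = [
    "data/DWS2019.csv",
    "data/mm2019.csv",
    "data/DWSmon.csv",
    "gamlss_cleaner/cleaners.R",
    "gamlss_cleaner/cleanfill_fn.R",
    "gamlss_cleaner/estimators.R",
    "gamlss_cleaner/gamlss_fit.R",
    "gamlss_cleaner/helpers.R",
    "gamlss_cleaner/time_shifters.R",
    "run_DWS2019.R",
]


def missing_files(root):
    """Return the required relative paths that are not files under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]


if __name__ == "__main__":
    # Check the current working directory and report anything absent.
    for rel in missing_files("."):
        print("missing:", rel)
```

An empty result from `missing_files` means every listed file was found. It does not check the contents of the files.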