Author(s): Lupicinio Garcia; Miguel Angel Diaz; Francisco Nunez; Damian Sanchez; Manuel Argamasilla; Ana Genaro; Juan Antonaya; Eduardo Bustos
Linked Author(s):
Keywords: Machine Learning; Water-stressed areas; Reservoir volume; Water table prediction; Snowmelt flow
Abstract: Water management is a complex issue, especially in light of climate change forecasts. Semi-arid regions are characterized by long periods of low rainfall; temperatures are rising and water resources are becoming less available, while water demand is increasing. Efficient management requires a proper understanding of water availability, which is key to drought mitigation. Physics-based models are frequently used to simulate and predict exploitable water, but they have an important limitation: they cannot be applied when data for some of the required parameters are missing. Machine Learning techniques, by contrast, can give a good approximation with fewer variables than physics-based models. To tackle these issues, we developed a data science pipeline that quantifies with high accuracy the availability of water resources with diverse hydrodynamic behaviors, such as groundwater (piezometric levels) and freshwater (reservoir volume and snowmelt flow rate), at several locations, which allows us to validate the scalability of the solution. We applied the pipeline to snowmelt flow in the Maipo river basin in Chile; to water table levels in the Marbella-Estepona, Vega de Granada and Campo de Dalías - Sierra de Gádor groundwater bodies in Spain; and to reservoir volume at the Concepción, Canales, Quéntar and Beninar dams in the provinces of Málaga, Granada and Almería. The pipeline handles missing values from the outset, since they are frequent in this type of data. Once the series are complete, we build parallel series: the targets to predict and the explanatory series that account for their trends (precipitation, temperature, pumping, etc.). Month-by-month predictions are then made from these series under cross-validation, so that the best-performing algorithm is selected automatically.
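The gap-filling and automatic selection steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the imputation method, the candidate algorithms or the fold scheme, so linear interpolation, the three baseline forecasters in `CANDIDATES`, the time-ordered folds and every function name here are assumptions chosen only to keep the example self-contained.

```python
from statistics import mean

def fill_gaps(series):
    """Linearly interpolate None values; gaps at the edges copy the nearest observation."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observations")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled

def smape(actual, forecast):
    """Symmetric MAPE in percent, using the mean of |actual| and |forecast| as denominator."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100 * mean(terms)

# Hypothetical candidates: each maps the history so far to a one-month-ahead forecast.
CANDIDATES = {
    "last_value": lambda h: h[-1],
    "seasonal_naive": lambda h: h[-12] if len(h) >= 12 else h[-1],
    "mean_of_last_3": lambda h: mean(h[-3:]),
}

def select_model(series, n_folds=4, test_size=6):
    """Score every candidate on time-ordered folds and return (best_name, mean_scores)."""
    if len(series) < n_folds * test_size + 1:
        raise ValueError("series too short for the requested folds")
    scores = {}
    for name, model in CANDIDATES.items():
        fold_scores = []
        for fold in range(n_folds):
            end = len(series) - fold * test_size
            start = end - test_size
            actual = series[start:end]
            # Month-by-month forecasts, each using only data before month t.
            forecast = [model(series[:t]) for t in range(start, end)]
            fold_scores.append(smape(actual, forecast))
        scores[name] = mean(fold_scores)
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic monthly series (four identical years) with two missing readings.
PATTERN = [50.0, 55.0, 65.0, 80.0, 95.0, 100.0, 90.0, 75.0, 60.0, 50.0, 45.0, 45.0]
raw = PATTERN * 4
raw[3] = None
raw[7] = None

completed = fill_gaps(raw)
best, scores = select_model(completed)
print(best, scores)
```

Folds are kept in time order here so that no forecast sees future months; a shuffled k-fold split would be a different design choice, and the abstract does not say which one was used.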
The final part of the pipeline applies the winning algorithm to three-, six-, nine- and twelve-month predictions under three precipitation scenarios: dry, normal and wet. The results are highly satisfactory. This predictive power is reflected in low mean values of SMAPE (Symmetric Mean Absolute Percentage Error), a measure that is easy to interpret (Kreinovich et al., 2014): in our k-fold performance pipeline, mean scores were below 10% in all folds, and in the worst-performing pilots the means did not exceed 15% for periods shorter than six months. References: Kreinovich, V., Nguyen, H. T., & Ouncharoen, R. (2014). How to estimate forecasting quality: A system-motivated derivation of symmetric mean absolute percentage error (SMAPE) and other similar characteristics.
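For reference, one common formulation of SMAPE is written out below. The abstract does not state which variant was used, so this form is an assumption; here $A_t$ is the observed value, $F_t$ the forecast and $n$ the number of forecast months, and the score ranges from 0% to 200%. A variant that omits the division by 2 in the denominator ranges from 0% to 100% instead.

```latex
\[
\mathrm{SMAPE} \;=\; \frac{100\%}{n} \sum_{t=1}^{n}
\frac{\lvert F_t - A_t \rvert}{\bigl(\lvert A_t \rvert + \lvert F_t \rvert\bigr)/2}
\]
```

As a single-month illustration with made-up numbers, an observed value of 100 and a forecast of 90 give $\lvert 90 - 100 \rvert / \bigl((100 + 90)/2\bigr) = 10/95 \approx 10.5\%$.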
DOI: https://doi.org/10.3850/IAHR-39WC2521711920221784
Year: 2022