Author(s): Durga Lal Shrestha; Dimitri Solomatine
Linked Author(s): Dimitri Solomatine
Keywords: Model uncertainty; Prediction intervals; Fuzzy clustering; Instance based learning
Abstract: This paper presents a methodology for assessing total model uncertainty using machine learning techniques. Historical model errors are assumed to be an indicator of total model uncertainty. The uncertainty is expressed as quantiles of the model errors, or prediction intervals (PIs), and so comprises all sources of uncertainty (e.g. model structure, model parameters, input data and output data) without attempting to separate the contributions of the individual sources. The method partitions the model input data into clusters such that the data in the same cluster have similar model errors (at least in mean and variance). This is done by building a data matrix that combines (some of) the historical model inputs with the corresponding model errors, and partitioning this calibration data using a clustering technique such as crisp or fuzzy clustering. PIs are constructed for each cluster from the empirical distribution of its model errors. PIs for unseen test (or validation) data can then be estimated by i) “eager” supervised classification, ii) instance-based (prototype) learning, or iii) supervised regression. In the classification method, a classifier is built from the cluster labels and the input data matrix and is used to classify the unseen input data; the PIs for a given input are then read from a lookup table that maps cluster labels to PIs. In instance-based learning, instead of building a classifier, a distance function is used to identify the cluster for the given validation input, and the cluster is represented by its prototype (typically its center). In the regression method, PIs are computed for each input in the calibration data set, and two regression models that estimate the upper and lower PIs independently are trained on the input data matrix. The trained regression models are then applied to estimate PIs for the unseen validation data set.
The third approach was applied by Shrestha and Solomatine (2006) to estimate the uncertainty of river flows. This paper presents the instance-based approach and uses it to estimate the total model uncertainty of river flows simulated by the HBV model in a case study of the Brue catchment in the United Kingdom.
Year: 2007
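
The instance-based procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes plain k-means as the crisp clustering step, a deterministic farthest-first initialization, model errors defined as observed minus simulated values, and a 90% interval (alpha = 0.10). It applies no feature scaling, which a real application would likely need. All function and parameter names (`kmeans`, `fit_cluster_intervals`, `predict_intervals`, `k`, `alpha`) are hypothetical.

```python
import numpy as np


def _assign(data, centers):
    """Label each row of data with the index of its nearest center."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(data, k, n_iter=50):
    """Crisp k-means with farthest-first initialization (an assumption)."""
    centers = [data[0]]
    for _ in range(1, k):
        # Distance of every point to its closest already-chosen center.
        d = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = _assign(data, centers)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers, _assign(data, centers)


def fit_cluster_intervals(inputs, errors, k=2, alpha=0.10):
    """Cluster [inputs, errors] and take empirical error quantiles per cluster.

    inputs: array of shape (n, d); errors: array of shape (n,),
    assumed to be observed minus simulated values.
    """
    data = np.column_stack([inputs, errors])
    centers, labels = kmeans(data, k)
    lower = np.full(k, np.nan)
    upper = np.full(k, np.nan)
    for j in range(k):
        cluster_errors = errors[labels == j]
        if cluster_errors.size:
            lower[j], upper[j] = np.quantile(
                cluster_errors, [alpha / 2, 1 - alpha / 2]
            )
    # Keep only the input coordinates of each prototype: errors are
    # unknown for unseen data, so they cannot enter the distance.
    return centers[:, :-1], lower, upper


def predict_intervals(new_inputs, simulated, input_centers, lower, upper):
    """Look up the error quantiles of the nearest prototype for each input."""
    dists = np.linalg.norm(
        new_inputs[:, None, :] - input_centers[None, :, :], axis=2
    )
    dists[:, np.isnan(lower)] = np.inf  # skip clusters that ended up empty
    nearest = dists.argmin(axis=1)
    return simulated + lower[nearest], simulated + upper[nearest]
```

In this sketch the distance for unseen data is computed on the input coordinates only, since the model error is not available at prediction time. The interval bounds are obtained by adding the cluster's error quantiles to the simulated value, which follows from the assumed sign convention for the errors.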