Machine Learning–Based 100-Year Flood Flow Prediction Model Using Basin Characteristics and Meteorological Data in the Northeast United States

Abstract

Estimating the magnitude of extreme hydrological events, such as the 100-yr flood, at ungauged locations remains a significant hydrological challenge. Conventional flood flow estimation techniques have notable shortcomings, including random and systematic errors that limit their usefulness for decision-making in water resource planning. This paper presents the use of random forests (RFs), a machine learning (ML) technique, to integrate multiple dynamic and static datasets, estimate the 100-yr flood flow, and quantify the error in the model predictions. Inputs to the proposed model include precipitation, temperature, slope, watershed area, land cover, and elevation datasets. Ninety-eight gauge locations across the Northeast United States, each with at least 40 years of historical streamflow records, are selected to evaluate the ML-based approach, which calculates 100-yr peak flow estimates and compares them with estimates from the U.S. Geological Survey’s streamflow statistics (StreamStats) program. A k-fold cross-validation technique is used to test the flood flow prediction model. The ML technique improves 100-yr flood estimates, showing a mean absolute relative error (MARE) similar to that of StreamStats but significantly better normalized centered root-mean-square error (NCRMSE) and Kling–Gupta efficiency (KGE). This estimation approach can support accurate characterization of the error in predicting the 100-yr flood, which is essential in developing flood flow prediction algorithms.
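
The three error metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the definitions below are assumptions based on common usage: MARE as the mean of |predicted - observed| / observed, NCRMSE as the root-mean-square error after removing each series' mean, normalized by the standard deviation of the observations, and KGE in its widely used 2009 form. The function names and sample flows are hypothetical and are not taken from the paper.

```python
import numpy as np


def mare(obs, pred):
    """Mean absolute relative error (assumed definition)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs) / obs))


def ncrmse(obs, pred):
    """Centered RMSE normalized by the std of the observations (assumed definition).

    Subtracting each series' mean removes the overall bias, so this
    metric reflects only the random component of the error.
    """
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    centered_diff = (pred - pred.mean()) - (obs - obs.mean())
    return float(np.sqrt(np.mean(centered_diff ** 2)) / obs.std())


def kge(obs, pred):
    """Kling-Gupta efficiency in its commonly used 2009 form (assumed)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    r = np.corrcoef(obs, pred)[0, 1]   # linear correlation
    alpha = pred.std() / obs.std()     # variability ratio
    beta = pred.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


if __name__ == "__main__":
    # Hypothetical 100-yr peak flows (arbitrary units) at four gauges.
    observed = [100.0, 200.0, 300.0, 400.0]
    predicted = [120.0, 190.0, 330.0, 380.0]
    print("MARE  :", mare(observed, predicted))
    print("NCRMSE:", ncrmse(observed, predicted))
    print("KGE   :", kge(observed, predicted))
```

Under these assumed definitions, a constant offset in the predictions raises MARE and lowers KGE but leaves NCRMSE at zero, which is one way two models can show similar MARE yet differ in NCRMSE and KGE.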

Publication
Artificial Intelligence for the Earth Systems
Kostas Andreadis
Associate Professor