Explainable machine learning improves interpretability in the predictive modeling of biological stream conditions in the Chesapeake Bay Watershed, USA
Anthropogenic alterations have resulted in widespread degradation of stream conditions. To aid in stream restoration and management, baseline estimates of conditions and improved explanation of the factors driving their degradation are needed. We used random forests to model biological conditions using a benthic macroinvertebrate index of biotic integrity for small, non-tidal streams (upstream area ≤200 km²) in the Chesapeake Bay watershed (CBW) of the mid-Atlantic coast of North America. We used several global and local model interpretation tools to improve average and site-specific model inferences, respectively. The model was used to predict conditions for 95,867 individual catchments for eight periods (2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019). Predicted conditions were classified as Poor, FairGood, or Uncertain to align with management needs, and individual reach lengths and catchment areas were summed by condition class for the CBW for each period. Global permutation and local Shapley importance values indicated that the percentages of forest, development, and agriculture in upstream catchments had strong impacts on predictions. Development and agriculture negatively influenced stream condition at both the model-average level (partial dependence [PD] and accumulated local effect [ALE] plots) and the local level (individual conditional expectation and Shapley value plots). Friedman’s H-statistic indicated large overall interactions for these three land covers, and bivariate global plots (PD and ALE) supported an interaction between agriculture and development. Total stream length and catchment area predicted in FairGood condition decreased and then increased over the 19-year period (length/area: 66.6/65.4% in 2001, 66.3/65.2% in 2011, and 66.6/65.4% in 2019).
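Several of the interpretation tools named above are model-agnostic and can be computed from any prediction function. The following is a minimal, standard-library Python sketch of partial dependence, individual conditional expectation (ICE), and Friedman's H-statistic (pairwise and overall). It is not the authors' code: `toy_predict` is a hypothetical stand-in for a fitted random forest, `DATA` holds made-up land-cover percentages, and all helper names are illustrative assumptions. The H functions return the square root of the usual H² ratio.

```python
from statistics import fmean

# Hypothetical stand-in for a fitted random forest's predict(): an index score
# from percent forest, development, and agriculture, with a dev x ag interaction.
FOREST, DEV, AG = 0, 1, 2

def toy_predict(row):
    forest, dev, ag = row
    return 0.5 * forest - 0.4 * dev - 0.3 * ag - 0.005 * dev * ag

# Hypothetical land-cover percentages (forest, development, agriculture) for six catchments.
DATA = [
    [80, 5, 15], [60, 10, 30], [40, 30, 30],
    [20, 50, 30], [30, 20, 50], [10, 70, 20],
]

def partial_dependence(predict, data, features, values):
    """Mean prediction over all rows with columns `features` overwritten by `values`."""
    total = 0.0
    for row in data:
        modified = list(row)
        for col, val in zip(features, values):
            modified[col] = val
        total += predict(modified)
    return total / len(data)

def ice_curve(predict, row, feature, grid):
    """Predictions for a single catchment as one feature sweeps over `grid`."""
    curve = []
    for val in grid:
        modified = list(row)
        modified[feature] = val
        curve.append(predict(modified))
    return curve

def _center(values):
    mean = fmean(values)
    return [v - mean for v in values]

def _h_from(total, part_a, part_b):
    """Square root of H^2: share of centered variance not explained by the two parts."""
    num = sum((t - a - b) ** 2 for t, a, b in zip(total, part_a, part_b))
    den = sum(t ** 2 for t in total)
    return (num / den) ** 0.5 if den > 0 else 0.0

def h_pairwise(predict, data, j, k):
    """Two-way H-statistic between features j and k, evaluated at the data points."""
    pd_jk = _center([partial_dependence(predict, data, [j, k], [r[j], r[k]]) for r in data])
    pd_j = _center([partial_dependence(predict, data, [j], [r[j]]) for r in data])
    pd_k = _center([partial_dependence(predict, data, [k], [r[k]]) for r in data])
    return _h_from(pd_jk, pd_j, pd_k)

def h_overall(predict, data, j):
    """Overall H-statistic: interaction of feature j with all other features."""
    others = [c for c in range(len(data[0])) if c != j]
    f = _center([predict(list(r)) for r in data])
    pd_j = _center([partial_dependence(predict, data, [j], [r[j]]) for r in data])
    pd_rest = _center([partial_dependence(predict, data, others, [r[c] for c in others])
                       for r in data])
    return _h_from(f, pd_j, pd_rest)
```

For example, `h_pairwise(toy_predict, DATA, DEV, AG)` is positive because of the interaction term, while the pairs involving forest are zero up to rounding. Averaging the ICE curves at each grid value reproduces the PD curve. Because land-cover percentages are strongly correlated (here they sum to 100), PD evaluates unrealistic combinations, which is one reason ALE plots are commonly used alongside PD.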
Examination of individual catchment predictions between 2001 and 2019 showed that catchments predicted to have the largest decreases in condition had large increases in development, whereas catchments predicted to exhibit the largest increases in condition showed moderate increases in forest cover. Using global and local interpretive methods together with watershed-wide and individual-catchment predictions supports conservation practitioners who need to identify both widespread and localized patterns, especially given that management actions typically take place at the scale of individual reaches.
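The condition classification, the watershed-wide summation by class, and the catchment-level comparison between periods can be sketched as follows. This is a minimal Python sketch under explicit assumptions: the abstract does not state how Poor, FairGood, and Uncertain were assigned, so the rule here (Uncertain when a hypothetical prediction interval straddles a hypothetical `THRESHOLD`) is illustrative only, as are the `Catchment` record, its field names, and the use of the interval midpoint to measure change.

```python
from collections import defaultdict
from dataclasses import dataclass

# HYPOTHETICAL cut-off on the predicted index separating Poor from FairGood;
# the actual classification rule is not given in the abstract.
THRESHOLD = 50.0

@dataclass
class Catchment:
    catchment_id: str
    reach_km: float   # stream reach length in the catchment
    area_km2: float   # catchment area
    lower: float      # lower bound of a hypothetical prediction interval
    upper: float      # upper bound of that interval

def classify(c, threshold=THRESHOLD):
    """Poor or FairGood if the whole interval lies on one side of the threshold, else Uncertain."""
    if c.upper < threshold:
        return "Poor"
    if c.lower >= threshold:
        return "FairGood"
    return "Uncertain"

def summarize(catchments):
    """Percent of total reach length and catchment area in each condition class."""
    length = defaultdict(float)
    area = defaultdict(float)
    for c in catchments:
        cls = classify(c)
        length[cls] += c.reach_km
        area[cls] += c.area_km2
    total_len = sum(length.values())
    total_area = sum(area.values())
    return {cls: (100 * length[cls] / total_len, 100 * area[cls] / total_area)
            for cls in ("Poor", "FairGood", "Uncertain")}

def _midpoint(c):
    return (c.lower + c.upper) / 2

def largest_changes(before, after, n=1):
    """Return the n largest decreases and n largest increases in predicted condition."""
    prev = {c.catchment_id: _midpoint(c) for c in before}
    deltas = sorted((_midpoint(c) - prev[c.catchment_id], c.catchment_id)
                    for c in after if c.catchment_id in prev)
    return deltas[:n], deltas[-n:][::-1]
```

The catchments returned by `largest_changes` are the ones whose land-cover changes (for example, gains in development or forest) would then be examined, mirroring the 2001 to 2019 comparison described above.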