These data comprise observations related to water and thermal budgets in the Delaware River Basin. Data from reservoirs in the basin include reservoir characteristics (e.g., bathymetry), daily water levels, daily depth-resolved water temperature observations, and daily inflows, diversions, and releases. Data from streams in the basin include daily flow and temperature observations. Data were compiled from a variety of sources to cover the modeling period (1980-2021), including the National Water Information System, Water Quality Portal, EcoSHEDS stream water temperature database, ReaLSAT, and the New York Department of Environmental Conservation. The data are formatted as a single CSV (comma-separated values) file or a zipped CSV file.
For modeling purposes, we sought to create a test set of flow and temperature observations that was representative of dynamics throughout the Delaware River Basin from water year 1980 to present. Test holdouts are documented in the flow and temperature files. To reduce the chance that correlation between sites and temporal autocorrelation at individual sites would artificially inflate test performance, we created temporal holdouts (time periods in which data from all sites were reserved for testing) and spatial holdouts (sites at which all data were reserved for testing). In all, this resulted in a train/test split of 66.2%/33.8% for observed temperature reach-days and 71.4%/28.6% for observed flow reach-days.
For temporal holdouts: All data in water years 1980-84, 2011-15, and 2021 were reserved for the test set. These windows were chosen to balance the ability to test on the most recent data (critical for assessing performance in an operational setting) and on historical periods, while still training on a sufficient amount of modern continuous data. For spatial holdouts: We chose eight reaches of the PRMS network for which all data were reserved for testing. These reaches were selected to represent key parts of the Delaware River Basin (mainstem, headwaters, reservoir-adjacent reaches), to represent the distribution of catchment attributes (e.g., fraction of impervious surfaces), and to minimize the number of observations within a 20 km distance along the network ('fish radius').
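The holdout rules described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code used to produce the holdout flags in the flow and temperature files. It assumes the standard USGS water year (October 1 through September 30, named for the calendar year in which it ends). The function names and the reach identifiers in `HOLDOUT_REACHES` are hypothetical placeholders; the eight reaches actually held out are not listed here.

```python
from datetime import date

# Test water years as stated in the text: 1980-84, 2011-15, and 2021.
TEST_WATER_YEARS = set(range(1980, 1985)) | set(range(2011, 2016)) | {2021}

# Hypothetical placeholder IDs; the real spatial holdouts are eight PRMS reaches.
HOLDOUT_REACHES = frozenset({"reach_A", "reach_B"})


def water_year(d: date) -> int:
    """Return the water year of a date (Oct 1 to Sep 30, named for its ending year)."""
    return d.year + 1 if d.month >= 10 else d.year


def is_test(d: date, reach_id: str, holdout_reaches=HOLDOUT_REACHES) -> bool:
    """An observation is a test observation if its reach is a spatial holdout
    or its date falls in a temporal holdout water year."""
    return reach_id in holdout_reaches or water_year(d) in TEST_WATER_YEARS


if __name__ == "__main__":
    # 1979-10-01 is the first day of water year 1980, so it is held out.
    print(water_year(date(1979, 10, 1)), is_test(date(1979, 10, 1), "reach_X"))
    # A mid-1990 observation is held out only at a spatial holdout reach.
    print(is_test(date(1990, 6, 1), "reach_X"), is_test(date(1990, 6, 1), "reach_A"))
```

Applying a flag like this to every reach-day and counting the results is one way to check a split against the percentages reported above, although the holdout columns shipped with the data remain the authoritative record.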