Data Management Plan for Beyond temperature-only coldwater climate refugia: integration of process-guided deep learning models for flow and temperature into assessments for coldwater streams
National and Regional Climate Adaptation Science Centers
Northeast CASC
Provenance
Additional Information
Data Management Plan Extension
customSoftware
description
A web-based data and model platform that will include 1) a database for storing and collecting (via user uploads) raw images and environmental data (including stream temperature), 2) an application programming interface (API) for servicing the images and environmental data from the database, 3) a user-friendly web application, and 4) a model environment for extracting data from the database to use for model training and execution. The web applications will include a raw data viewer and uploader, a tool for enabling human flow ranking of image pairs for data training (funded by other sources), and a series of interactive data visualizations for explaining how the model works and for exploring the model predictions.
source
No restrictions are anticipated at this point. The software will be publicly available at https://www.usgs.gov/apps/ecosheds
webToolMaintenanceAndSupport
The web tool will be integrated into the USGS EcoSHEDS project (https://usgs.gov/apps/ecosheds), which has been actively maintained since its inception in 2014. The Eastern Ecological Science Center has committed to providing ongoing and indefinite funding to support hosting and maintenance of all EcoSHEDS projects, which will include the platform developed for this project. Like most EcoSHEDS projects, the web tool will be developed using a serverless, cloud-based architecture that will run in Amazon Web Services through the USGS Cloud Hosting Solutions. Use of this architecture will minimize hosting costs and maintenance needs by eliminating the cost and effort associated with maintaining a physical, on-premises server.
The database for this project will be hosted on AWS Relational Database Service (RDS) through USGS Cloud Hosting Solutions. RDS will be configured to perform automated daily backups of the database. Any files generated for the project (e.g., model artifacts) will be stored on AWS S3, which is designed to provide 99.999999999% durability and 99.99% availability (https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html).
languages
The integrated data and model platform will run primarily on JavaScript using Node.js (https://nodejs.org). The website will also be built in JavaScript, using the Vue.js application framework (https://vuejs.org). Code for training and running the models will be developed in Python (https://www.python.org). Additional analyses may also be performed in R (https://cran.r-project.org).
restrictions
Not yet available, to be included as part of https://www.usgs.gov/apps/ecosheds
name
Integrated Data and Model Platform
existingInput
fees
None.
description
Streamflow, water temperature, and image time series data from USGS streamgages nested in the West Brook, the Neversink River, and a third catchment to be determined in collaboration with Tribal partners.
source
USGS National Water Information System (NWIS), https://waterdata.usgs.gov/nwis, and USGS Flow Photo Explorer (FPE), https://www.usgs.gov/apps/ecosheds/fpe/. Data collection will continue during the proposed project but is not funded by CASC.
qualityChecks
The streamflow and temperature data accessed via NWIS are already subject to all USGS data review requirements; see the QA/QC documentation: https://waterdata.usgs.gov/nwis/qwdata?help. The images available on FPE are screened at user upload for any personally identifiable information (PII) relating to people or vehicles. See additional information on PII processing under “What’s New” on the FPE landing page (https://www.usgs.gov/apps/ecosheds/fpe/) and at the end of the Uploading Photos section of the User Guide (https://www.usgs.gov/apps/ecosheds/fpe/#/user-guide). The image database is also labeled with a “provisional Database” disclaimer, which appears on the FPE landing page linked above.
citation
No citation available; data made publicly available on NWIS and FPE websites linked above.
format
USGS National Water Information System (NWIS) standard formatting for streamflow and temperature data. Images are made available for viewing via the FPE application webpage cited above and can be exported as links to the FPE database in a *.csv file.
restrictions
No restriction. Publicly available data.
backupAndStorage
The FPE image database runs in the Amazon Web Services (AWS) Relational Database Service (RDS) through the USGS Cloud Hosting Solutions (CHS). The database is backed up daily on a 7-day rotation cycle using the RDS backup functionality. The image files are stored on AWS Simple Storage Service (S3), which is designed to provide 99.999999999% durability and 99.99% availability, so the chance of losing image files is very low (https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html).
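To put the durability figure in concrete terms, the following minimal sketch converts the 99.999999999% design target into an expected number of objects lost per year. It assumes the figure is an annual, per-object value and that losses are independent. The object count used here is hypothetical and is not an estimate from this plan.

```python
# Illustrative arithmetic only: the durability figure is the S3 design target
# quoted in this plan; the object count below is a hypothetical example.

ANNUAL_LOSS_RATE = 1e-11  # 1 - 0.99999999999, written directly to avoid rounding error


def expected_annual_losses(n_objects: int, loss_rate: float = ANNUAL_LOSS_RATE) -> float:
    """Expected number of objects lost per year, assuming independent losses."""
    return n_objects * loss_rate


def years_per_expected_loss(n_objects: int, loss_rate: float = ANNUAL_LOSS_RATE) -> float:
    """Average number of years before one object is expected to be lost."""
    return 1.0 / expected_annual_losses(n_objects, loss_rate)


if __name__ == "__main__":
    n_images = 1_000_000  # hypothetical number of stored image files
    print(expected_annual_losses(n_images))
    print(years_per_expected_loss(n_images))
```

Under these assumptions, the expected loss scales linearly with the number of stored files, so even a large image archive would expect far less than one lost file per year.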
volumeEstimate
Approx. 10 GB including all image files, temperature and streamflow data files.
dataProcessing
No plans to manipulate streamflow or image data.
name
Streamflow, water temperature and image data from streamgages
fees
None
description
Water temperature timeseries data.
source
SHEDS Northeast Stream Temperature Database: http://db.ecosheds.org/
qualityChecks
A series of quality control checks is already implemented for this database to support our previous temperature modeling efforts. These checks include standard QA/QC tests for continuous temperature data such as spike detection, range limits (i.e., 0 to 35 °C), temporal autocorrelation, air temperature correlation, and spatial autocorrelation.
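As an illustration of two of the checks named above, the sketch below flags values outside the 0 to 35 °C range and flags isolated spikes. It is not the database's actual implementation: the function names, the 5 °C spike threshold, and the neighbor-comparison rule are assumptions chosen for the example.

```python
from typing import List

RANGE_MIN_C = 0.0         # lower range limit stated in this plan
RANGE_MAX_C = 35.0        # upper range limit stated in this plan
SPIKE_THRESHOLD_C = 5.0   # hypothetical threshold, not taken from the plan


def flag_out_of_range(temps: List[float]) -> List[bool]:
    """Return True for each reading outside the allowed temperature range."""
    return [not (RANGE_MIN_C <= t <= RANGE_MAX_C) for t in temps]


def flag_spikes(temps: List[float], threshold: float = SPIKE_THRESHOLD_C) -> List[bool]:
    """Return True for readings that jump away from both neighbors.

    A reading is flagged when it differs from the previous and the next
    reading by more than `threshold`, in the same direction. The first and
    last readings are never flagged because they have only one neighbor.
    """
    flags = [False] * len(temps)
    for i in range(1, len(temps) - 1):
        diff_prev = temps[i] - temps[i - 1]
        diff_next = temps[i] - temps[i + 1]
        same_direction = diff_prev * diff_next > 0
        if same_direction and abs(diff_prev) > threshold and abs(diff_next) > threshold:
            flags[i] = True
    return flags


if __name__ == "__main__":
    series = [12.0, 12.5, 25.0, 12.8, 13.0, 40.0, -1.0]  # made-up readings
    print(flag_out_of_range(series))
    print(flag_spikes(series))
```

The autocorrelation and air temperature correlation checks need additional inputs (neighboring sites, paired air temperature records) and are not sketched here.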
citation
All relevant information available at http://db.ecosheds.org/
format
Database downloadable as *.csv
restrictions
No restriction. Publicly available data.
backupAndStorage
The SHEDS stream temperature database is stored on an on-premises server at UMass Amherst. The database is backed up daily using a 7-day rotation to two other servers, one of which is also located at UMass Amherst, and the other located off-site.
volumeEstimate
Approx. 1 GB of continuous stream temperature data files.
dataProcessing
No plans to manipulate stream temperature data.
name
Water temperature data
history
2023-08-29 11:28:43 MDT: phase Approved DMP
model
modelVersion
This model will be developed as part of this project. Versioning is not yet available.
description
A process-guided deep learning (PGDL) model that predicts streamflow and water temperature from camera image and environmental data.
source
Image database and planned location of model: https://www.usgs.gov/apps/ecosheds
modelInputs
Streamflow and water temperature data from nearby gages as available; environmental data such as precipitation, air temperature, and solar radiation; camera image data; and images ranked by relative flow using a novel manual image ranking technique (see above).
calibrationDetails
The calibration and validation methodology is still under development because the model is being built as part of this ongoing project.
modelOutputs
Streamflow and water temperature data
name
Process-Guided Deep Learning model that predicts streamflow and water temperature
modelVersion
We are constructing the model as part of a project funded by other sources. Versioning is not yet available.
description
A process-guided deep learning (PGDL) model that predicts relative streamflow from camera image and environmental data.
source
Image database and planned location of predicted relative flow data created with machine learning model: USGS Flow Photo Explorer: https://www.usgs.gov/apps/ecosheds/fpe
modelInputs
Streamflow data at the target location; streamflow data from nearby “nested” gages; environmental data such as precipitation, air temperature, and solar radiation; camera image data; and images ranked by relative flow using a novel manual image ranking technique.
calibrationDetails
The model will be calibrated and validated using standard machine learning methodologies including cross-validation based on independent training, testing, and validation splits of the dataset. Hyperparameters of the model will be tuned using standard methods to prevent over-fitting during the training process.
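The splitting scheme described above can be sketched as follows: a test set is held out once, and the remaining samples are divided into k folds for cross-validation. This is a generic illustration and not the project's actual procedure. The function names, the 20% test fraction, the choice of five folds, and the random shuffling are all assumptions.

```python
import random
from typing import List, Tuple


def split_holdout(n_samples: int, test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and hold out a test set.

    Returns (development_indices, test_indices). The test set is kept aside
    and used only for the final evaluation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(indices: List[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Build k (train, validation) index pairs from the development set."""
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


if __name__ == "__main__":
    development, test = split_holdout(100)
    for train, validation in k_fold_splits(development, k=5):
        print(len(train), len(validation), len(test))
```

For image and streamflow time series, a random shuffle can place nearly identical consecutive samples in both training and validation sets, so splitting by time block or by site may be more appropriate than the random split assumed here.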
modelOutputs
Volumetric streamflow at sub-daily time increment.
name
Deep Learning Computer Vision model that predicts streamflow from camera image data