Data Management Plan for Beyond temperature-only coldwater climate refugia: integration of process-guided deep learning models for flow and temperature into assessments for coldwater streams

Attached Files

dmp.pdf 9.36 KB application/pdf
dmp-fair-beyond_temperature-only_coldwater_climat-approved_dmp-20230829-1128.pdf 10.08 KB application/pdf

Communities

  • National and Regional Climate Adaptation Science Centers
  • Northeast CASC

Additional Information

Data Management Plan Extension

Custom Software

Name: Integrated Data and Model Platform
Description: A web-based data and model platform that will include (1) a database for storing and collecting (via user uploads) raw images and environmental data, including stream temperature; (2) an application programming interface (API) for serving the images and environmental data from the database; (3) a user-friendly web application; and (4) a model environment for extracting data from the database for model training and execution. The web applications will include a raw data viewer and uploader, a tool that lets people rank image pairs by relative flow to produce training data (funded by other sources), and a series of interactive data visualizations that explain how the model works and let users explore the model predictions.
Source: No restrictions are anticipated at this point. The platform will be publicly available at https://www.usgs.gov/apps/ecosheds
Web tool maintenance and support: The web tool will be integrated into the USGS EcoSHEDS project (https://usgs.gov/apps/ecosheds), which has been actively maintained since its inception in 2014. The Eastern Ecological Science Center has committed to providing ongoing, indefinite funding for hosting and maintenance of all EcoSHEDS projects, including the platform developed for this project. Like most EcoSHEDS projects, the web tool will use a serverless, cloud-based architecture running in Amazon Web Services (AWS) through USGS Cloud Hosting Solutions. This architecture minimizes hosting costs and maintenance needs by eliminating the cost and effort of maintaining a physical, on-premises server. The project database will be hosted on AWS Relational Database Service (RDS) through USGS Cloud Hosting Solutions, and RDS will be configured to perform automated daily backups. Any files generated for the project (e.g., model artifacts) will be stored on AWS S3, which is designed to provide 99.999999999% durability and 99.99% availability (https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html).
Languages: The integrated data and model platform will run primarily on JavaScript using Node.js (https://nodejs.org). The website will also be built in JavaScript using the Vue.js application framework (https://vuejs.org). Code for training and running the models will be developed in Python (https://www.python.org). Additional analyses may be performed in R (https://cran.r-project.org).
Restrictions: Not yet available; to be included as part of https://www.usgs.gov/apps/ecosheds
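The S3 durability figure quoted for this record (99.999999999%, often called "eleven nines") is easier to interpret as an expected number of lost objects per year. The sketch below assumes durability is the annual probability that any given object survives. That is how AWS frames the figure, and under that reading the expected loss is simply the object count times (1 - durability). The object count in the example is illustrative, not a project estimate.

```python
from fractions import Fraction

# Designed annual durability quoted for S3: 99.999999999% ("eleven nines").
# Fraction parses the decimal string exactly, avoiding float rounding error.
S3_ANNUAL_DURABILITY = Fraction("0.99999999999")


def expected_annual_object_loss(num_objects: int,
                                durability: Fraction = S3_ANNUAL_DURABILITY) -> Fraction:
    """Expected objects lost per year, treating durability as the annual
    probability that a given object survives (linearity of expectation)."""
    return num_objects * (1 - durability)


def years_per_expected_loss(num_objects: int,
                            durability: Fraction = S3_ANNUAL_DURABILITY) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_object_loss(num_objects, durability)


if __name__ == "__main__":
    n = 10_000_000  # illustrative object count (hypothetical, not a project estimate)
    print(float(expected_annual_object_loss(n)))  # 0.0001 objects per year
    print(int(years_per_expected_loss(n)))        # 10000 years
```

This matches the AWS documentation's own example: storing 10,000,000 objects implies, on average, the loss of one object every 10,000 years. Availability (99.99%) is a separate figure. It describes whether the data can be reached at a given moment, not whether the data are lost.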

Existing Input

Name: Streamflow, water temperature and image data from streamgages
Fees: None.
Description: Streamflow, water temperature, and image time-series data from USGS streamgages nested in the West Brook, the Neversink River, and a third catchment to be determined in collaboration with Tribal partners.
Source: USGS National Water Information System (NWIS; https://waterdata.usgs.gov/nwis) and USGS Flow Photo Explorer (FPE; https://www.usgs.gov/apps/ecosheds/fpe/). Data collection will continue during the proposed project but is not funded by CASC.
Quality checks: The streamflow and temperature data accessed via NWIS are already subject to all USGS data review requirements; see the QA/QC documentation: https://waterdata.usgs.gov/nwis/qwdata?help. Images on FPE are screened at upload for personally identifiable information (PII) involving people or vehicles. See additional information on PII processing under “What’s New” on the FPE landing page (https://www.usgs.gov/apps/ecosheds/fpe/) and at the end of the Uploading Photos section of the User Guide (https://www.usgs.gov/apps/ecosheds/fpe/#/user-guide). The image database also carries the “provisional Database” disclaimer shown on the FPE landing page linked above.
Citation: No citation available; the data are publicly available on the NWIS and FPE websites linked above.
Format: Standard NWIS formatting for streamflow and temperature data. Images can be viewed in the FPE web application cited above and exported as links to the FPE database in a *.csv file.
Restrictions: None. Publicly available data.
Backup and storage: The FPE image database runs in the AWS Relational Database Service (RDS) through USGS Cloud Hosting Solutions (CHS). The database is backed up daily on a 7-day rotation cycle using RDS backup functionality. Image files are stored on AWS Simple Storage Service (S3), which is designed to provide 99.999999999% durability and 99.99% availability, so the chance of losing image files is very low (https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html).
Volume estimate: Approx. 10 GB, including all image files and the temperature and streamflow data files.
Data processing: No plans to manipulate streamflow or image data.
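When camera images and gage records are used together for model training, each image must be paired with a concurrent streamflow or temperature observation. The sketch below matches each image timestamp to the nearest gage observation within a tolerance. It is a minimal illustration under assumed inputs. The function name, the 15-minute tolerance, and the in-memory lists are hypothetical and do not describe the actual FPE or NWIS data structures.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def pair_images_with_gage(image_times, gage_times, gage_values,
                          tolerance=timedelta(minutes=15)):
    """Match each image timestamp to the nearest gage observation.

    gage_times must be sorted ascending and aligned with gage_values.
    Images with no observation within `tolerance` are paired with None.
    (Hypothetical helper; the tolerance is an assumed value.)
    """
    pairs = []
    for t in image_times:
        i = bisect_left(gage_times, t)
        # Candidate neighbours: the observation at/after t and the one before it.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gage_times)]
        best = min(candidates, key=lambda j: abs(gage_times[j] - t), default=None)
        if best is not None and abs(gage_times[best] - t) <= tolerance:
            pairs.append((t, gage_values[best]))
        else:
            pairs.append((t, None))
    return pairs


if __name__ == "__main__":
    base = datetime(2023, 6, 1, 12, 0)
    gage_times = [base + timedelta(minutes=15 * k) for k in range(4)]  # 12:00 to 12:45
    gage_values = [4.1, 4.3, 4.2, 4.0]  # hypothetical discharge values
    images = [base + timedelta(minutes=7), base + timedelta(hours=3)]
    for img_time, value in pair_images_with_gage(images, gage_times, gage_values):
        print(img_time.isoformat(), value)
```

Binary search keeps the matching at O(log n) per image. Unmatched images are kept with `None` rather than dropped, so gaps in the gage record remain visible downstream.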

Name: Water temperature data
Fees: None.
Description: Water temperature time-series data.
Source: SHEDS Northeast Stream Temperature Database: http://db.ecosheds.org/
Quality checks: A series of quality-control checks is already implemented for this database to support our previous temperature modeling efforts. These include standard QA/QC tests for continuous temperature data, such as spike detection, range limits (0 to 35 °C), temporal autocorrelation, correlation with air temperature, and spatial autocorrelation.
Citation: All relevant information is available at http://db.ecosheds.org/
Format: Database downloadable as *.csv
Restrictions: None. Publicly available data.
Backup and storage: The SHEDS stream temperature database is stored on an on-premises server at UMass Amherst. The database is backed up daily, on a 7-day rotation, to two other servers: one also located at UMass Amherst and the other off-site.
Volume estimate: Approx. 1 GB of continuous stream temperature data files.
Data processing: No plans to manipulate stream temperature data.
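The range-limit and spike-detection checks listed for this database can be expressed as simple flagging rules. The sketch below is a minimal illustration, not the SHEDS implementation. The 0 to 35 °C limits come from the text above. The 3 °C spike threshold and the definition of a spike (a reading that departs from both neighbours in the same direction) are assumptions chosen for illustration.

```python
from typing import List, Optional

# Range limits stated in the DMP (degrees C).
TEMP_MIN_C = 0.0
TEMP_MAX_C = 35.0
# Hypothetical spike threshold (degrees C); not taken from the SHEDS database.
SPIKE_THRESHOLD_C = 3.0


def flag_range(temps: List[Optional[float]],
               lo: float = TEMP_MIN_C, hi: float = TEMP_MAX_C) -> List[bool]:
    """True where a reading is missing or falls outside [lo, hi]."""
    return [t is None or not (lo <= t <= hi) for t in temps]


def flag_spikes(temps: List[Optional[float]],
                threshold: float = SPIKE_THRESHOLD_C) -> List[bool]:
    """True where a reading departs from BOTH neighbours by more than
    `threshold` in the same direction (an isolated up- or down-spike)."""
    flags = [False] * len(temps)
    for i in range(1, len(temps) - 1):
        prev, cur, nxt = temps[i - 1], temps[i], temps[i + 1]
        if any(v is None for v in (prev, cur, nxt)):
            continue  # a spike cannot be evaluated across a data gap
        d_prev, d_next = cur - prev, cur - nxt
        if abs(d_prev) > threshold and abs(d_next) > threshold and d_prev * d_next > 0:
            flags[i] = True
    return flags


if __name__ == "__main__":
    readings = [12.1, 12.3, 19.8, 12.4, 12.6, -1.0, 12.7]  # hypothetical series
    print(flag_range(readings))   # [False, False, False, False, False, True, False]
    print(flag_spikes(readings))  # [False, False, True, False, False, True, False]
```

The two checks are complementary. The reading of -1.0 °C fails both. The reading of 19.8 °C is physically plausible on its own but is caught only as a spike relative to its neighbours.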

History: 2023-08-29 11:28:43 MDT, phase: Approved DMP

Model

Name: Process-Guided Deep Learning model that predicts streamflow and water temperature
Model version: This model will be developed as part of this project; versioning is not yet available.
Description: A process-guided deep learning (PGDL) model that predicts streamflow and water temperature from camera image and environmental data.
Source: Image database and planned location of the model: https://www.usgs.gov/apps/ecosheds
Model inputs: Streamflow and water temperature data from nearby gages, as available; environmental data such as precipitation, air temperature, and solar radiation; camera image data; and images ranked by relative flow using a novel manual image ranking technique (see above).
Calibration details: The calibration and validation methodology is still under development, because this model will be developed as part of this ongoing project.
Model outputs: Streamflow and water temperature data.
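The DMP does not yet specify how process guidance will enter the model. One common pattern in process-guided deep learning is to add a penalty to the training loss for predictions that violate physical constraints. The sketch below illustrates that idea with two deliberately simple, hypothetical constraints: streamflow cannot be negative, and water temperature should not fall below 0 °C. The weight `lam` and all function names are assumptions, and the project's actual formulation may differ.

```python
from typing import Sequence


def mse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def physics_penalty(flow_pred: Sequence[float], temp_pred: Sequence[float]) -> float:
    """Mean squared violation of two hypothetical physical constraints:
    streamflow >= 0 and water temperature >= 0 degC."""
    flow_violation = sum(min(q, 0.0) ** 2 for q in flow_pred)
    temp_violation = sum(min(t, 0.0) ** 2 for t in temp_pred)
    return (flow_violation + temp_violation) / (len(flow_pred) + len(temp_pred))


def pgdl_loss(flow_pred: Sequence[float], flow_obs: Sequence[float],
              temp_pred: Sequence[float], temp_obs: Sequence[float],
              lam: float = 0.5) -> float:
    """Data-fit loss plus a weighted physics penalty (lam is a hypothetical weight)."""
    data_loss = mse(flow_pred, flow_obs) + mse(temp_pred, temp_obs)
    return data_loss + lam * physics_penalty(flow_pred, temp_pred)


if __name__ == "__main__":
    # Hypothetical predictions: the second flow value is physically impossible.
    loss = pgdl_loss(flow_pred=[1.0, -0.5], flow_obs=[1.0, 0.0],
                     temp_pred=[10.0, 12.0], temp_obs=[10.0, 11.0])
    print(loss)  # 0.65625
```

The penalty term uses only predictions, not observations. In principle it can therefore also be evaluated on time steps without gage data, which is one motivation for process guidance when labelled data are sparse.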

Name: Deep Learning Computer Vision model that predicts streamflow from camera image data
Model version: We are constructing this model as part of a project funded by other sources. No versioning is available yet.
Description: A process-guided deep learning (PGDL) model that predicts relative streamflow from camera image and environmental data.
Source: Image database and planned location of the relative-flow predictions generated by the machine learning model: USGS Flow Photo Explorer: https://www.usgs.gov/apps/ecosheds/fpe
Model inputs: Streamflow data at the target location; streamflow data from nearby “nested” gages; environmental data such as precipitation, air temperature, and solar radiation; camera image data; and images ranked by relative flow using a novel manual image ranking technique.
Calibration details: The model will be calibrated and validated using standard machine learning methods, including cross-validation based on independent training, testing, and validation splits of the dataset. Model hyperparameters will be tuned using standard methods to prevent overfitting during training.
Model outputs: Volumetric streamflow at a sub-daily time step.
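The calibration plan for this model calls for independent training, testing, and validation splits. With time-series inputs such as gage records and camera images, one way to keep neighbouring, highly correlated observations out of multiple splits is to divide the data into contiguous time blocks rather than sampling at random. The sketch below shows a chronological 70/15/15 split. The fractions and the function name are assumptions, not the project's stated design.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Record = Tuple[datetime, T]


def chronological_split(records: Sequence[Record],
                        train_frac: float = 0.70,
                        val_frac: float = 0.15) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split (timestamp, item) records into contiguous train/validation/test
    blocks in time order; whatever remains after train and validation is test.
    (Hypothetical helper; fractions are illustrative defaults.)"""
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    start = datetime(2023, 1, 1)
    # Hypothetical hourly records, deliberately reversed to show the sort.
    records = [(start + timedelta(hours=h), f"img_{h:02d}") for h in reversed(range(20))]
    train, val, test = chronological_split(records)
    print(len(train), len(val), len(test))  # 14 3 3
```

A single chronological split is the simplest case. Rolling-origin (forward-chaining) cross-validation applies the same idea repeatedly over several successive cut points.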

Phase: Approved DMP
Template name: CASC DMP v4
