U.S. Geological Survey (USGS) scientists are at the forefront of research that is critical for decision-making, particularly through the development of models (Bayesian networks, or BNs) that forecast coastal change. The utility of these tools outside the scientific community has been limited because they rely on expensive, technical software and a moderate understanding of statistical analysis. We proposed to convert one of our models from proprietary to freely available open-source software, resulting in a portable, interactive web interface. The resulting product will serve as a prototype to demonstrate how interdisciplinary USGS science and models can be transformed into an approachable format for decision-makers. Because the project was significantly delayed in FY18, we present lessons learned to date from ongoing project planning and from the process of filling staffing requirements. We will apply these lessons in the remainder of FY18 and in FY19. This report summarizes our lessons learned and next steps in three categories: 1) processing, storage, and hosting; 2) staffing requirements and needs; and 3) the development process.
Principal Investigator: Erika Lentz
Co-Investigator: Ben Gutierrez, Michelle D Staudinger, Nathaniel G Plant, Sara L Zeigler, Emily J Sturdivant
Cooperator/Partner: Emily Himmelstoss, Michael N Fienen
1) Processing, storage, and hosting requirements
In summer of 2017, we learned our Science Center could no longer support a public-facing web server. This created two challenges: providing access to a development site where users could test the product, and finding a place to host the product over the long term. Because we had already precompiled the GIS layers for this project, our processing and storage needs were minimal (anticipated use of up to 25 simultaneous users and 50 GB of storage), and we were able to find a potential temporary development solution through another Center. It is worth noting that because this was not a “tiered data center”, uptime was limited by the absence of backup generators for the server and of dual high-bandwidth failover connections. In addition, to streamline the process we agreed to begin with an internal-only site, which requires us to bring user groups physically on site to test the prototype, though we hoped to broaden user testing in the future through a non-indexed site.
Ultimately, after researching longer-term hosting solutions such as NatWeb and cloud services, we decided that if we elect to support long-term hosting for this product, we will pursue high-performance development via Amazon Cloud hosting services. This option would support APIs and server-side processing and would therefore be more likely to minimize functionality issues down the line.
2) Staffing requirements and needs
This project has benefitted from a diverse team that brings a variety of skills, knowledge, and expertise. These skills and abilities include: multidisciplinary research backgrounds, including hydrology, oceanography, geology, and ecology; expertise in the development, testing, and design of Bayesian networks using proprietary software; GIS expertise; background in Python, R, and other open source software; facilitation experience with agile development (see Section 3); and direct access to an end-user group for testing and iterative feedback during the development cycle.
Our most significant delays in this project stem from the identified need for an in-house full-stack developer, whom we are currently in the process of hiring, who is familiar with both front-end (client side; the user interface) and back-end (server side; the application and database that support user interaction) development. The back end is particularly critical because it is dynamic and will need to be updated regularly to stay current. We require an approach that meets our changing data needs, keeps pace with changing security standards, and stays current with database and programming language versions. A contractor or vendor would not necessarily be familiar with the laws, policies, and regulations for USGS digital products and services (accessibility/Section 508, mandatory content, mobile/device-agnostic, technology standards, etc.) or with the hosting issues (e.g. internal sites for development and Amazon Cloud) that must be addressed to ensure a seamless transition from initial launch through ongoing maintenance. In particular, the WAF (Web Application Firewall) has been the leading cause of functionality loss both during and after development in an existing web-based portal at our Center and was a concern for any prototype emerging from this project.
3) Development process
In this project we seek to develop a web platform that is sufficiently generic to allow other BNs to be migrated to similar web interfaces in the future. In addition, we will use the BN prototype stored in proprietary software to identify options for converting it to an open-source environment, which would replace the need for precompiled scenarios in the web application. In our project planning to date, we have identified three phases necessary to the development process. These include:
a) Agile development framework: This framework is intended to be iterative and incremental, requiring engagement from the team throughout the process. As the developer progresses with work, project team members will remain apprised of developments, and the developer will have direct access to multiple people who can answer questions promptly. This keeps the development process flexible: as issues arise, the entire team is engaged and able to collectively decide how to proceed, whereas other development methods may only identify an issue after many weeks of independent developer effort. This tighter level of engagement provides the developer with a platform for rapid feedback. The first step in the process is to capture all of the details that would traditionally fill a requirements document. We will do this through a ‘user story’ approach, in which stakeholders are identified and provide sufficient information for the developer to estimate implementation times. We plan to write user stories together as a group in a project planning workshop once the developer has been hired. The intent is to have frequent verbal communication between team members and the developer and to emerge with a project plan that includes a solid vision of clearly defined items that will make up the final product, thereby helping to prioritize tasks for the developer.
b) Coding language, network packages, and software package decisions: Here, the developer will evaluate the capabilities of an array of open-source graphical, mapping, and Bayesian network packages and applications (e.g. bnlearn in R vs. pgmpy in Python) that would support back-end conversion of models created with proprietary software (Netica) and support front-end interface and display options (e.g. GeoServer, R Shiny, ArcOnline), informed by outcomes from the agile project planning workshop.
c) Beta version of the web interface and software: The beta version of the interface will allow for testing by an assembled user group, which will show how real users interact with the interface. With this beta release, the developer will provide a comprehensive migration plan from proprietary software.
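
To make the back-end conversion in phase (b) and the precompiled scenarios mentioned above more concrete, the following is a minimal sketch of discrete Bayesian network inference written in plain Python rather than in any of the candidate packages. It assumes a hypothetical three-node network (two parent nodes and one response node); the node names, state names, and probabilities are invented for illustration and are not taken from the USGS model. The helper names `query_response` and `precompile_scenarios` are likewise assumptions.

```python
from itertools import product

# Hypothetical prior probabilities for two parent nodes.
# All names and numbers are illustrative, not values from the USGS model.
PRIORS = {
    "sea_level_rise": {"low": 0.6, "high": 0.4},
    "elevation": {"low": 0.3, "high": 0.7},
}

# Hypothetical conditional probability table:
# P(response | sea_level_rise, elevation)
CPT = {
    ("low", "low"): {"dynamic": 0.5, "inundate": 0.5},
    ("low", "high"): {"dynamic": 0.9, "inundate": 0.1},
    ("high", "low"): {"dynamic": 0.2, "inundate": 0.8},
    ("high", "high"): {"dynamic": 0.6, "inundate": 0.4},
}


def query_response(evidence=None):
    """Return P(response | evidence), summing over unobserved parent nodes."""
    evidence = evidence or {}
    totals = {"dynamic": 0.0, "inundate": 0.0}
    for slr, elev in product(PRIORS["sea_level_rise"], PRIORS["elevation"]):
        assignment = {"sea_level_rise": slr, "elevation": elev}
        # Skip parent combinations that contradict the supplied evidence.
        if any(assignment[node] != state for node, state in evidence.items()):
            continue
        weight = PRIORS["sea_level_rise"][slr] * PRIORS["elevation"][elev]
        for state, prob in CPT[(slr, elev)].items():
            totals[state] += weight * prob
    norm = sum(totals.values())
    return {state: value / norm for state, value in totals.items()}


def precompile_scenarios():
    """Tabulate the response for every fully specified parent combination."""
    return {
        (slr, elev): query_response({"sea_level_rise": slr, "elevation": elev})
        for slr, elev in product(PRIORS["sea_level_rise"], PRIORS["elevation"])
    }
```

Calling `query_response({"sea_level_rise": "high"})` computes a result on demand, whereas `precompile_scenarios()` builds the kind of lookup table a web application would need if no inference engine were available on the server. An open-source package evaluated in phase (b) would replace the hand-written enumeration here.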
Figure caption for attached image:
Bayesian network to predict coastal change as currently configured in proprietary software. Ongoing work in FY18/19 will develop an interface and back-end code with which a user can query different ranges in parameters and explore how coastal response scenarios change. Modified from Lentz et al. (2015; 2016).
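
Querying "ranges in parameters" in a discrete Bayesian network generally means mapping a continuous input onto one of a node's binned states. The sketch below shows one way this could be done with the Python standard library. The bin edges, state labels, and the function name `to_state` are hypothetical and are not taken from the network in the figure.

```python
from bisect import bisect_right

# Hypothetical bin edges for an elevation node, in meters.
# There is always one more label than there are edges.
ELEVATION_EDGES = [0.0, 1.0, 5.0]
ELEVATION_LABELS = ["below 0 m", "0 to 1 m", "1 to 5 m", "5 m and above"]


def to_state(value, edges, labels):
    """Map a continuous value to the label of the bin that contains it.

    Each bin includes its lower edge and excludes its upper edge.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("expected exactly one more label than bin edges")
    return labels[bisect_right(edges, value)]
```

For example, `to_state(0.5, ELEVATION_EDGES, ELEVATION_LABELS)` selects the `"0 to 1 m"` state, which could then be passed to the network as evidence.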