Across disparate watersheds of the Pacific Northwest, I have been working to address challenges in implementing water markets. In doing so, I seek answers to the following questions:
When water transactions change flow, what co-benefits arise for indicators of water quality?
What characteristics of watersheds explain baseline conditions for flow, temperature, nutrients, and sediment?
How well can we predict water quantity and quality in unmonitored streams?
Considering future water resource management efforts, what changes in water quantity and quality might we expect to achieve?
To help answer these questions, I have been on a data treasure hunt. I find treasures in all sorts of places, shapes, and sizes. Some are polished gems that can go seamlessly into my treasure trove. Others are rough rocks that need splitting, chipping, smelting, and recasting to add value. Different data purveyors have varying capacity and resources to make this kind of job easier or harder for people like me. For any data scientist, this is a well-known problem. Each of us has tools and techniques to do data magic (i.e., processing).
For a pilot project I have been working on in the state of Washington, I recently found one valuable source with a bevy of rough data. The Washington Department of Ecology’s Hydrology Branch has two decades of daily records (2003 to present) for flow, stream temperature, air temperature, precipitation, and stage at 385 monitoring sites across the state.
To make some data magic, I wrote R code to download, integrate, format, and compile the data. Using the Wenatchee watershed as a test case, I chose six of the 385 gages. Each of the six gages had a discrete text file for each water year (over 400 text files in all), each with its own quirks (e.g., different file names, variables added in some years but not others, missing geolocation, inconsistent naming conventions).

To address these idiosyncrasies, my code ballooned to over 1,000 lines. It could be shorter: I know I missed opportunities to condense it with additional loops and if/then statements, and less code is typically better. I will improve this tool as I use it in other projects. Still, as a data scientist it was satisfying to execute the code, crawl across the internet, and shape the idiosyncratic rocks of data into more than 140,000 gems of usable records, following the general pattern sketched below.

Now I can look further at what is happening with the water of the Wenatchee, why it is doing what it is doing, and how it might change in the future. More than simply gems of data, these records will help to answer my questions, which will be the real treasure!
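For readers curious about the pattern, here is a stripped-down sketch of the download-and-harmonize step, assuming tab-delimited daily files. The base URL, gage IDs, file-name pattern, and column-name variants are placeholders for illustration, not Ecology’s actual scheme, and the read_gage_year() helper is simplified far beyond the real script.

```r
library(dplyr)  # bind_rows()
library(readr)  # read_tsv()

# Hypothetical base URL and gage IDs -- placeholders only; Ecology's actual
# hosts, paths, and file names differ by site and water year.
base_url <- "https://example.wa.gov/hydrology"
gages    <- c("45A070", "45F070")
years    <- 2003:2024

# Lookup mapping invented examples of the column-name variants seen across
# water years onto one canonical set of names.
name_map <- c("Date" = "date", "DATE" = "date",
              "Discharge(cfs)" = "flow_cfs", "FLOW_CFS" = "flow_cfs",
              "WaterTemp(C)" = "stream_temp_c", "WTEMP" = "stream_temp_c")

read_gage_year <- function(gage, year) {
  url  <- file.path(base_url, gage, paste0("WY", year, "_daily.txt"))
  dest <- file.path("data", gage, basename(url))
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # Skip gage/year combinations with no file rather than halting the crawl.
  ok <- tryCatch({ download.file(url, dest, quiet = TRUE); TRUE },
                 error = function(e) FALSE, warning = function(w) FALSE)
  if (!ok) return(NULL)

  df <- read_tsv(dest, show_col_types = FALSE)
  # Standardize whichever naming variant this water year used.
  hits <- names(df) %in% names(name_map)
  names(df)[hits] <- name_map[names(df)[hits]]
  # Keep canonical columns only; variables added in some years are dropped,
  # and years missing a variable contribute NA after binding.
  out <- df[intersect(c("date", "flow_cfs", "stream_temp_c"), names(df))]
  out$gage <- gage
  out$water_year <- year
  out
}

# Crawl every gage/water-year combination and bind into one tidy table.
combos <- expand.grid(gage = gages, year = years, stringsAsFactors = FALSE)
daily  <- bind_rows(Map(read_gage_year, combos$gage, combos$year))
```

The actual script runs to over 1,000 lines to cover all the idiosyncrasies, but the core pattern is the same: standardize each file’s names first, then bind everything into one table.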