flowchart LR
    A(Environmental Issue) --> B(Specific Question)
    B --> C(Data Analysis Workflow)
    B1[Environmental Data] --> C
    C --> D(Product)
Here is a more detailed breakdown of the data analysis workflow:
It is important to understand what these intermediate steps are. Otherwise we end up like the underpants gnomes in South Park, whose business plan jumps from "collect underpants" straight to "profit" with a missing step in between (Figure 2).
Whenever we use data to understand a scientific process or environmental issue, we are essentially building a model. A model is a simplified representation of reality that helps us represent, understand, and predict phenomena.
Models can be conceptual, where we use diagrams and verbal descriptions to represent the relationships between different components of a system. For example, we might have a conceptual model of the carbon cycle that shows how carbon moves between the atmosphere, oceans, and land.
Models also vary in complexity, from simple to very complex. Regardless of their type or complexity, all models rely on data to be built and validated.
For now we will focus on statistical models and will come back to modeling later in the course.
We will explore one specific application of statistical modeling in more detail: flood probability modeling.
Understanding flood risk and return periods is important for a variety of reasons, including urban planning, infrastructure design, and insurance pricing (Figure 5).
For example, we might want to know how large a 100-year flood is at a specific location. A 100-year flood is a flood event that has a 1% chance of being equaled or exceeded in any given year. To estimate it, we can use historical flood data to fit a statistical model that describes the distribution of flood events, and then find the flow that corresponds to a 1% annual exceedance probability.
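As a minimal sketch of this idea, the Python snippet below fits a Gumbel (extreme value type I) distribution to annual maximum flows using the method of moments and then computes the 100-year return level. The choice of the Gumbel distribution, the method-of-moments fit, the function names, and the flow values are all assumptions for illustration; the flows are hypothetical, not real gauge data, and other distributions (for example GEV or log-Pearson Type III) are common in practice.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Fit a Gumbel distribution by the method of moments.

    Returns (mu, beta): the location and scale parameters.
    """
    mean = statistics.mean(annual_maxima)
    sd = statistics.stdev(annual_maxima)  # sample standard deviation
    beta = sd * math.sqrt(6) / math.pi   # scale from the Gumbel variance
    mu = mean - EULER_GAMMA * beta       # location from the Gumbel mean
    return mu, beta


def return_level(mu, beta, return_period):
    """Flow with an annual exceedance probability of 1 / return_period."""
    p_non_exceed = 1.0 - 1.0 / return_period
    # Invert the Gumbel CDF F(x) = exp(-exp(-(x - mu) / beta))
    return mu - beta * math.log(-math.log(p_non_exceed))


# Hypothetical annual maximum flows (m^3/s), for illustration only
annual_maxima = [120, 95, 210, 150, 130, 175, 88, 240, 160, 110,
                 140, 195, 102, 170, 125]

mu, beta = fit_gumbel_moments(annual_maxima)
q100 = return_level(mu, beta, 100)
print(f"mu = {mu:.1f}, beta = {beta:.1f}, 100-year flood ~ {q100:.0f} m^3/s")
```

With only 15 years of (hypothetical) record, the 100-year estimate lies beyond the largest observed flow, so it is an extrapolation rather than something read directly from the data.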
Infrastructure projects are often designed to withstand a flood of a given return period, such as the 100-year flood.
Similarly, insurance companies use flood return periods to set premiums. Statistically, a house in a 100-year floodplain has a greater than 1-in-4 chance of being flooded at least once during a 30-year mortgage.
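The 1-in-4 figure follows if we assume each year is independent, with an annual exceedance probability of 1/T. The chance of at least one T-year flood in n years is then 1 - (1 - 1/T)^n. A small Python check of this arithmetic (the function name is illustrative):

```python
def prob_at_least_one(return_period, years):
    """Probability of at least one exceedance of the T-year flood in `years` years.

    Assumes independent years, each with annual exceedance probability 1/T.
    """
    annual_p = 1.0 / return_period
    return 1.0 - (1.0 - annual_p) ** years


risk = prob_at_least_one(100, 30)
print(f"Chance of at least one 100-year flood in 30 years: {risk:.1%}")
```

For T = 100 and n = 30 this gives about 0.26, just over 1 in 4.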
Estimating the likelihood of extreme events like floods is difficult because they are rare, so the observed record contains few of them. Statistical models let us extrapolate beyond the range of observed data, but these extrapolations come with substantial uncertainty.
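One common way to make this uncertainty visible is the bootstrap. This is a sketch, not necessarily the approach used in the case study. We resample the record with replacement many times, refit the model each time, and look at the spread of the resulting 100-year estimates. The Gumbel method-of-moments fit, the hypothetical flows, the 2000 resamples, the fixed seed, and the function names are all illustrative assumptions.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_return_level(sample, return_period):
    """Fit a Gumbel distribution by the method of moments; return the T-year flow."""
    beta = statistics.stdev(sample) * math.sqrt(6) / math.pi  # scale
    mu = statistics.mean(sample) - EULER_GAMMA * beta         # location
    return mu - beta * math.log(-math.log(1.0 - 1.0 / return_period))


def bootstrap_interval(sample, return_period, n_boot=2000, seed=42):
    """Approximate 90% bootstrap interval for the T-year return level."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    estimates = sorted(
        gumbel_return_level(rng.choices(sample, k=len(sample)), return_period)
        for _ in range(n_boot)
    )
    # 5th and 95th percentiles of the bootstrap estimates
    return estimates[int(0.05 * n_boot)], estimates[int(0.95 * n_boot) - 1]


# Hypothetical annual maximum flows (m^3/s), 15 years, for illustration only
annual_maxima = [120, 95, 210, 150, 130, 175, 88, 240, 160, 110,
                 140, 195, 102, 170, 125]

q100 = gumbel_return_level(annual_maxima, 100)
low, high = bootstrap_interval(annual_maxima, 100)
print(f"100-year flood: {q100:.0f} m^3/s (90% interval {low:.0f} to {high:.0f})")
```

With such a short record the interval is wide. Rerunning with `return_period=10` should typically give a narrower interval, because less extrapolation is involved.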
You will see this uncertainty in the following case study, which uses data from the USGS stream gauge at the South River. The gauge has recorded streamflow since 1952, i.e. less than 100 years of data.