Lesson Plan for Week 10
Objectives
We continue working with Xarray and introduce some more data processing concepts. We have already encountered aggregation methods such as groupby and rolling averages in pandas.
These operations are just as useful when working with global, gridded environmental data.
For example, because environmental data has annual cycles, it often makes sense to calculate anomalies, a task where groupby helps.
This is one step toward working more in-depth with this kind of data, including fitting models or calculating correlations between variables.
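That groupby-based anomaly workflow can be sketched in a few lines. This is a minimal example on synthetic monthly data; the variable name and values are made up for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly time series with an annual cycle plus a small trend
time = pd.date_range("2000-01-01", periods=120, freq="MS")
values = 10 * np.sin(2 * np.pi * time.month / 12) + 0.01 * np.arange(120)
da = xr.DataArray(values, coords={"time": time}, dims="time", name="temperature")

# Climatology: mean over all years for each calendar month
climatology = da.groupby("time.month").mean()

# Anomaly: deviation of each time step from its month's climatological mean
anomaly = da.groupby("time.month") - climatology
```

Because each month's own mean is subtracted, the annual cycle drops out of `anomaly` and what remains is the trend plus noise.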
Specific learning goals
Technical
- Selecting and subsetting variables from an xarray dataset.
- Plotting xarray data on maps.
- Performing aggregation operations like `.rolling()` or `.groupby()` to process gridded data.
- Using `pooch` to access NOAA Climate Data Products on Amazon Web Services.
Weather and Climate System
- Comprehend the fundamentals of climatologies.
- Calculate an anomaly to a climatology.
- Calculate the rolling mean of the anomaly data to smooth the time series and extract long-term signals/patterns.
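The smoothing goal above can be sketched with `.rolling()`. This uses a synthetic anomaly-like series; the window length is an illustrative choice, not a prescription:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Noisy monthly anomaly-like series
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", periods=240, freq="MS")
anom = xr.DataArray(rng.normal(0, 1, 240), coords={"time": time}, dims="time")

# A 12-month centered rolling mean smooths out high-frequency variations
smooth = anom.rolling(time=12, center=True).mean()
```

Note that time steps where the 12-month window is incomplete come back as NaN, so the smoothed series is shorter in practice than the original.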
Class Preparation
Readings and Materials
Background
- Abernathy: Maps in Scientific Python
- We won’t discuss this explicitly, but it provides some background on why working with maps in Python matters and how it is done in practice.
- Climate Match: We will be working through two tutorials from Climate Match, an open source bootcamp for using python for climate science.
Specifically, we will be using materials from Tutorial 4: Understanding Climatology Through Precipitation Data and Tutorial 5: Calculating Anomalies Using Precipitation Data
Please watch the two introductory videos:
Data:
This week also makes use of satellite-observed climate data that is published by NOAA and freely accessible on Amazon AWS.
Climate Match has a good overview of providers of climate and environmental datasets.
Planned Agenda
Monday:
- More xarray: Grouping, Anomaly Calculation, Correlation, …
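The correlation item on Monday's agenda can be previewed with `xr.corr`, which computes the Pearson correlation along a shared dimension. The data here is synthetic, constructed so the two variables correlate strongly:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)
time = np.arange(100)

# Two synthetic variables: y is x plus noise, so they should correlate strongly
x = xr.DataArray(rng.normal(size=100), dims="time", coords={"time": time})
y = x + 0.5 * xr.DataArray(rng.normal(size=100), dims="time", coords={"time": time})

# Pearson correlation along the shared "time" dimension
r = xr.corr(x, y, dim="time")
```

The same call works on gridded data: correlating two (time, lat, lon) arrays along `time` returns a map of correlation coefficients.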
Wednesday:
- Check-In: Python Skills and Resources
- Skills Check
- Check-In: Learning Reflection
- Semester Project Check-In.
Activity:
Motivation
With the upcoming learning reflection in mind, it is a good idea to review all the Python skills we have covered so far.
This follows up on a similar exercise about the tools in Week 9.
Task
With a partner or your team:
- Go through the list and discuss where you have encountered each skill in ISAT 420.
- Where in the course was it covered?
- Does it apply to your semester project?
- Do you know how you would apply it in an exercise?
| Python Area | Skill | I can do this | Where in the course? | Additional Resource |
|---|---|---|---|---|
| General | Import a package | | | |
| | Specify the path of a file | | | |
| | Select multiple files for reading using `glob` | | | |
| Pandas | Select items from a pandas series or dataframe object | | AP | |
| | Calculate basic statistics on a dataframe or series using `.min()`, `.max()`, `.mean()`, ... | | AP | |
| | Selecting data on index using `.loc[]` | | AP | |
| | Select data in a dataframe on a condition | | AP | |
| | Merge data from two different dataframes into a new dataframe | | AP, PDA 5.2 | |
| | Use `df.describe()` and `df.info()` to understand data | | AP, PDA 5.3 | |
| | Read tabular data (e.g. csv) into a dataframe using `pd.read_csv()` | | AP, PDA 6.1 | |
| | Use the `df.plot()` functionality to make simple plots such as scatter plots, histograms, or bar plots | | AP, PDA 8, PDA 9.2 | |
| | Plotting a column in a dataframe using `.plot(y=...)` | | | |
| | Parsing time-series data and using the date as index | | AP | |
| | Selecting data by columns using a list of columns `df[['col1', 'col2']]` | | AP, PDA 5.1, E1 | |
| | Identify and fill missing values in a dataframe | | AP | |
| | Using `df.groupby()` as an aggregation function | | AP | |
| | Using `pd.read_csv()` to read more complex tabular data (i.e. tab-delimited, skipping rows, naming columns) | | AP, PDA 6.1 | |
| | Temporally resampling and aggregating data using `df.resample().mean()` | | AP, E1 | |
| | Performing calculations and assigning results to a new column | | AP | |
| | Adding titles, labels, text, and other features to plots | | | |
| Xarray | Reading a gridded netCDF dataset using `.open_dataset()` and `.open_mfdataset()` | | AX, CM4 | |
| | Exploring dimensions, coordinates, data variables, and attributes of a dataset | | AX, CM4 | |
| | Selecting a variable from a dataset using `ds.<var_name>` | | AX, CM4 | |
| | Selecting data using `.sel()` | | AX, CM4 | |
| | Creating a mapped plot and adding map features | | AX, CM4 | |
| | Calculating statistics like means across named dimensions `.mean(dim=...)` | | AX | |
| | Selecting a slice from a dataset using `slice(<start>, <end>)` to only plot a region | | AX | |
| | Accessing remote data using `s3fs` and `pooch` | | CM4 | |
| | Using `.groupby()` to find the typical behavior (climatology) of environmental data | | AX, CM4 | |
| | Using `.groupby()` and climatologies to find deviations from typical conditions (e.g. anomalies) | | AX, CM4 | |
| | Using `.rolling()` to remove high-frequency variations in environmental data | | AX, CM5 | |
| | Using `.weighted()` to calculate area averages on a sphere | | AX, CM5 |
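The last row of the table, area-weighted averages, deserves a quick sketch: on a latitude/longitude grid, cells shrink toward the poles, so an unweighted mean over-counts high latitudes. Weighting by cos(latitude) corrects this. The grid and field below are synthetic:

```python
import numpy as np
import xarray as xr

# Synthetic global field that increases toward the poles: value = |latitude|
lat = np.linspace(-89.5, 89.5, 180)
lon = np.linspace(0.5, 359.5, 360)
data = xr.DataArray(
    np.tile(np.abs(lat)[:, None], (1, 360)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Grid-cell area on a sphere scales with cos(latitude)
weights = np.cos(np.deg2rad(data.lat))

unweighted = float(data.mean())
weighted = float(data.weighted(weights).mean(("lat", "lon")))
```

Because the high-valued polar cells are down-weighted, the weighted mean comes out lower than the unweighted one, which is exactly the correction you want for a quantity averaged over the sphere.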
Resources
- [PDA] McKinney W., Python for Data Analysis - Open Edition, 3e, O’Reilly, 2022
- Abernathy, R: Earth and Environmental Data Science, 2021
- [AP]: Chapter: Pandas
- [AX]: Chapter: Xarray Fundamentals
- Earth Lab, Intermediate Earth Data Science Textbook, University of Colorado Boulder, Earth Lab, Updated: 2022, Citation DOI: https://doi.org/10.5281/zenodo.4683910
Semester Project
- Where are you in the project?
- What data do you have?
- What data do you need?
- How can you apply the concepts of ISAT 420 to your data and your questions?
- How should you structure your repository?
Keep notes