Topic Block 1: Open Science and Reproducible Workflows
Overview:
This block introduces the concepts of Open Science and Reproducible Workflows, both of which are essential to environmental data analysis.
- Monday: Why Open Science? We will introduce some important concepts through discussion.
- Wednesday: Tools & Workflows: We will do activities that introduce the concept of reproducible workflows and start introducing some tools. Please bring laptops!
Learning Objectives
Why Open Science?
- Identify issues with closed science.
- Understand the values behind open science and open data.
- Be able to define key terms including:
- Open Science
- Open Data
- Findable
- Accessible
- Interoperable
- Reusable
- Understand the benefit of these concepts for addressing environmental issues.
Data & Workflows
- Understand the necessity for reproducible workflows.
- Experience working with data.
- Recognize that tools are needed.
- Realize that all of this is a process and that getting started is better than doing nothing.
Tools
- Recognize the importance of using tools to achieve reproducible workflows and open science.
- Become familiar with two tools supporting reproducible workflows and sharing of code, namely:
- Jupyter Notebooks
- Git & GitHub using GitHub Desktop
- Experience basic workflows for:
- version control
- data analysis
Materials
Readings
Week 2: Monday
- Replication Crisis in Science
Science has problems, including the fact that many scientific results cannot be reproduced or replicated, for a variety of reasons.
Open Science
Open Science is a current movement to make science more transparent, more accountable to the public, and more efficient.
FAIR Data Principles
One of the main questions we face with environmental data is how to make data useful and usable. What good is data if it sits on someone’s hard drive (not findable), or if you have no idea how to read it (not accessible)? The FAIR movement is trying to address these problems:
- FAIR Principles
- Optional Reading: The FAIR Guiding Principles for scientific data management and stewardship
This is the scientific article that is referenced in the FAIR Principles reading.
Optional: CARE
Indigenous groups have proposed a set of principles complementary to FAIR, which emphasize the ethics and collective benefits of Indigenous data governance.
Week 2: Wednesday
- Bowers & Voors, 2016: How to improve your relationship with your future self
- A paper that provides a set of clear recommendations on how to do data analysis
- Wilson et al., 2017: Good enough practices in scientific computing
- A paper that provides a set of clear recommendations on how to use computers for data analysis
Week 2: Friday
- Background Readings for Class Activities. We will do a bit of live coding. What we do is described in more detail in the chapters below.
- Earth Lab, Introduction to Earth Data Science Textbook, Chapter 3: Jupyter for Python
- Earth Lab, Introduction to Earth Data Science Textbook, Chapter 7: GitHub for Version Control
Installing the Software
We will get started using a few tools that will help us create open and reproducible workflows. In this course we will use:
- Git, a tool for version control using repositories.
- GitHub, an online platform to manage and share Git repositories.
- Python (version 3) with Jupyter Notebooks to conduct and describe our analysis.
- Google Colab to run and edit Jupyter Notebooks in the Cloud
If you are familiar with all of these and have working installations of Git and Python 3.x with Jupyter, there is no need for you to install any of the software below.
However, it is then your responsibility to manage your own computing environment.
- Git with GitHub: We will be using GitHub Desktop, which is freely available for Windows, Mac, & Linux.
- Download it and follow the installation instructions.
- You will be prompted to connect your GitHub Account. If you do not have one, you can create one in the process.
- Test your installation:
- Follow the Create a Tutorial Repository Instructions
- Python with Jupyter: The easiest way to install all the software needed is through a distribution like Anaconda, which will do the heavy lifting for you.
- However, Anaconda will use about 1 GB of disk space. If that is a problem, there are alternatives (Please let me know!).
- Download and install Anaconda on your machine.
- Download Anaconda
- Windows Instructions
- MacOS Instructions
- Test your installation:
- After the installation is complete, open the Anaconda Navigator and select Jupyter Lab from the provided options. This should open a window in your web browser.
- Open the Python 3 Console (not a Notebook).
- Type 1+1 into the box and hit <Shift>+<Enter>.
- If the Python installation worked, you should see the result 2 printed on the screen.
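If you would like a slightly more thorough check than 1+1, the sketch below reports the Python version and lists any packages that cannot be imported. It is only an illustration: the helper names (python_is_version_3, find_missing) and the package list are assumptions made for this example, not an official course requirement.

```python
import importlib.util
import sys


def python_is_version_3():
    """Return True if the running interpreter is Python 3 or newer."""
    return sys.version_info.major >= 3


def find_missing(packages):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Python 3 detected:", python_is_version_3())
    # Assumed package list for illustration only; adjust it to your own needs.
    missing = find_missing(["jupyterlab", "numpy", "pandas"])
    print("Missing packages:", missing if missing else "none")
```

Paste the code into the same Python 3 Console and hit <Shift>+<Enter>. If a package is reported as missing, that does not necessarily mean the installation failed; it may simply not be part of your distribution.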
Acknowledgement:
The portion on reproducible workflows and associated readings was inspired by Alexander: Telling Stories with Data (2024)