Topic Block 1: Open Science and Reproducible Workflows

Overview:

This block introduces the concepts of Open Science and Reproducible Workflows, both of which are central to environmental data analysis.

  • Monday: Why Open Science? We will have a discussion-based introduction to some important concepts.
  • Wednesday: Tools & Workflows. We will do activities that introduce the concept of reproducible workflows and a first set of tools. Please bring laptops!

Learning Objectives

Why Open Science?

  1. Identify issues with closed science.
  2. Understand the values behind open science and open data.
  3. Be able to define key terms including:
    1. Open Science
    2. Open Data
    3. Findable
    4. Accessible
    5. Interoperable
    6. Reusable
  4. Understand the benefit of these concepts for addressing environmental issues.

Data & Workflows

  1. Understand the necessity for reproducible workflows.
  2. Experience working with data.
  3. Recognize that tools are needed.
  4. Realize that all of this is a process and that getting started is better than doing nothing.

Tools

  1. Recognize the importance of using tools to achieve reproducible workflows and open science.
  2. Become familiar with two tools that support reproducible workflows and code sharing:
    • Jupyter Notebooks
    • Git & GitHub using GitHub Desktop
  3. Experience basic workflows for:
    • version control
    • data analysis

Materials

Readings

Week 2: Monday

  1. Replication Crisis in Science

    Science faces a replication problem: many published results cannot be reproduced or replicated, for a variety of reasons.

  2. Open Science

    Open Science is a current movement to make science more transparent, more accountable to the public, and more efficient.

  3. FAIR Data Principles

    One of the main questions we face with environmental data is how to make data useful and usable. What good is data if it sits on someone’s hard drive (not findable), or if you have no idea how to read it (inaccessible)? The FAIR movement is trying to address exactly that.

  4. Optional: CARE

    Indigenous groups have proposed a set of principles complementary to FAIR, which emphasize the ethics and collective benefits of Indigenous data governance.

Week 2: Wednesday

Week 2: Friday

Installing the Software

We will get started using a few tools that will help us create open and reproducible workflows. In this course we will use:

  1. Git, a tool for version control using repositories.
  2. GitHub, an online platform to manage and share Git repositories.
  3. Python (version 3) with Jupyter Notebooks, to conduct and describe our analysis.
  4. Google Colab, to run and edit Jupyter Notebooks in the cloud.

Note

If you are familiar with all of these and have working installations of Git and Python 3.x with Jupyter, there is no need for you to install any of the software below.

However, it is then your responsibility to manage your own computing environment.

  • Git with GitHub: We will be using GitHub Desktop, which is freely available and officially supported on Windows and macOS; on Linux it is available only through community-maintained builds.
    • Download it and follow the installation instructions.
    • You will be prompted to connect your GitHub Account. If you do not have one, you can create one in the process.
    • Test your installation:
      • Follow the instructions for creating a tutorial repository: GitHub Desktop Tutorial Repository
  • Python with Jupyter: The easiest way to install all the software needed is through a distribution like Anaconda, which does the heavy lifting for you.
    • However, Anaconda will use about 1 GB of disk space. If that is a problem, there are alternatives (please let me know!).
    • Download and install Anaconda on your machine:
      • Download Anaconda
      • Windows Instructions
      • macOS Instructions
    • Test your installation:
      • After the installation is complete, open the Anaconda Navigator and select JupyterLab from the provided options. This should open a window in your web browser.
      • Open the Python 3 Console (not a Notebook).
      • Type 1+1 into the box and press <Shift>+<Enter>.
      • If the Python installation worked, you should see the result 2 printed on the screen.
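
If you would like a slightly more thorough check than 1+1, the following is a minimal sketch you could paste into the same Python 3 Console. It reports the interpreter version and whether JupyterLab can be found. The helper names check_python and has_module are made up for this illustration, and "jupyterlab" is assumed to be the import name of the JupyterLab package in your installation.

```python
import importlib.util
import sys


def check_python(min_major=3):
    """Return True if the running interpreter is at least Python `min_major`."""
    return sys.version_info.major >= min_major


def has_module(name):
    """Return True if a top-level module called `name` can be found (without importing it)."""
    return importlib.util.find_spec(name) is not None


print("Python version:", sys.version.split()[0])
print("Python 3 or newer:", check_python())
print("1 + 1 =", 1 + 1)
# "jupyterlab" is the import name assumed here for the JupyterLab package.
print("JupyterLab found:", has_module("jupyterlab"))
```

If the second line reports False, the console is most likely running an older Python than the one installed by Anaconda.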

Acknowledgement:

The portion on reproducible workflows and the associated readings was inspired by Alexander, Telling Stories with Data (2024).