2.6 Python Packages and Conda Environments

Learning Goals

After completing this lesson you will be able to

  • explain what a Python package is.
  • import a package into Python.
  • understand how dependency management can play a large role in Python programming.
  • explain how to use conda environments to manage your third-party libraries.
  • create a conda environment.
  • install a Python package in the terminal using conda.

Background

You have probably noticed that most of our notebooks start by importing Python packages like pandas, which we then use for our data analysis (see the code below).

import glob
import pandas as pd
import matplotlib.pyplot as plt  

You may ask yourself what exactly a package is and why we should care.

In the coming weeks, we will also use additional, more specialized packages to work with environmental data. These packages do not come from a single source; they are third-party libraries.

Third party libraries are critical to making Python the great tool it is. Developers and scientists all over the world are constantly improving and adding to the functionality Python provides by writing new packages. When you require one of these third party libraries in your workflow, they are called dependencies because your workflow depends on them to function.
(CU Boulder, 2020)

Dependency conflicts can cause significant issues when working with Python. Conda allows you to create multiple environments on your computer to address dependency issues. Image from XKCD

This means we need a way to manage these dependencies. The answer is to use computational environments that are documented and reproducible, like all of our workflows.
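One small, concrete step toward a documented environment is recording which versions of your dependencies are installed. The sketch below uses the standard-library importlib.metadata module (Python 3.8+). The function name dependency_report is hypothetical, and the package list simply mirrors the imports shown at the start of this lesson.

```python
from importlib.metadata import version, PackageNotFoundError


def dependency_report(names):
    """Map each package (distribution) name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Third-party dependencies of the example imports (glob is part of the standard library)
for name, ver in dependency_report(["pandas", "matplotlib"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Saving this output alongside a notebook is a lightweight way to document which versions produced a result.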

Packages

What are packages

In Python, a package is a bundle of pre-built functionality that adds to the functionality available in base Python. Base Python can do many things such as perform math and other operations. However, Python packages can significantly extend this functionality.

You can think of a Python package as a toolbox filled with tools. The tools in the toolbox can be used to do things that you would have to otherwise hand code in base Python. These tasks are things that many people might want to do in Python, thus warranting the creation of a package. After all, it doesn’t make sense for everyone to hand-code everything!

For example, the matplotlib package allows you to create plots of data. Since most of us create plots routinely, having a Python package to create plots makes programming more efficient for everyone who needs to create plots.
(CU Boulder, 2020)
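To make the toolbox idea concrete, here is a minimal sketch comparing a hand-coded average with a ready-made tool. It uses the standard-library statistics module as a stand-in for a package so that it runs without installing anything; hand_coded_mean is a hypothetical helper written only for illustration.

```python
import statistics  # standard-library module, used here as a stand-in "toolbox"

data = [2.5, 3.0, 4.5, 6.0]


def hand_coded_mean(values):
    """Hand-coded in base Python: we write the averaging logic ourselves."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


# Using the toolbox: the same task in a single call
packaged_mean = statistics.mean(data)

print(hand_coded_mean(data), packaged_mean)  # → 4.0 4.0
```

For an average the savings are small, but for tasks like plotting or reading data files the hand-coded version would be far longer, which is exactly why packages like matplotlib exist.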

Python’s scientific ecosystem

Fabien Maussion provides a great description of the most important Python packages for scientific research.

Overview of Python scientific packages, via Fabien Maussion (source)

We have already used some of them, like Jupyter, matplotlib, and pandas. Others like xarray will be introduced soon.

Working with packages

You have to explicitly load (i.e. import) all packages that you want to use in your code.

This is done using the import command (see below).

import glob
import pandas as pd
import matplotlib.pyplot as plt  

Python packages can have modules. For example, the matplotlib library has a module called pyplot, which makes it easier to set up plots.

We can import a specific module from a package by writing the package name, followed by a dot and the module name (e.g. matplotlib.pyplot, see above).

We can also import the module using an alias or short name, such as plt for matplotlib.pyplot.

import matplotlib.pyplot as plt  

Using an alias saves us from typing the full package name every time we use its functionality.

For example, you could read a .csv file with pandas like this:

pandas.read_csv('filename')

or, if you imported pandas with the alias pd, more briefly like this:

pd.read_csv('filename')
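Both ideas (a module inside a package, and a short alias for it) can be tried without installing anything, using the standard library. In this sketch xml.etree.ElementTree plays the role that matplotlib.pyplot plays above; the XML snippet is made-up example data.

```python
# xml.etree.ElementTree is a module inside the standard-library package xml
# (sub-package etree), much like pyplot is a module inside matplotlib.
import xml.etree.ElementTree
import xml.etree.ElementTree as ET  # the same module, bound to a short alias

# Both names refer to the very same module object
assert ET is xml.etree.ElementTree

snippet = "<station><temp>21.5</temp></station>"

# Full name: works, but verbose
root_long = xml.etree.ElementTree.fromstring(snippet)

# Alias: identical result with less typing
root_short = ET.fromstring(snippet)

print(root_long.find("temp").text, root_short.find("temp").text)  # → 21.5 21.5
```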

Python Environments

A Python environment is a dedicated directory where specific dependencies can be stored and maintained. Environments have unique names and can be activated when you need them, allowing you to have ultimate control over the libraries that are installed at any given time.

You can create as many environments as you want. Because each one is independent, they will not interact with or “mess up” one another. Thus, it is common for programmers to create a new environment for each project that they work on.
(CU Boulder, 2020)
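Because an environment is a directory, you can ask Python which one it is running from. This sketch relies on sys.prefix and sysconfig.get_paths(); the exact paths printed depend on your operating system and installation, so treat them as illustrative. Running it from two different conda environments should print two different directories.

```python
import sys
import sysconfig

# Root directory of the environment this interpreter belongs to
env_root = sys.prefix

# Directory where pure-Python third-party packages for this environment are installed
site_packages = sysconfig.get_paths()["purelib"]

print("Environment directory:", env_root)
print("Packages are installed in:", site_packages)
```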

We are using Anaconda as our Python distribution, which is built around the conda package manager. Several features make conda a good choice for managing your Python installation and packages:

  • Conda is cross-platform and available on Linux, Mac, and Windows
  • When installing new packages, conda will perform a dependency check and will try to find a combination of packages that play nice with each other.
  • It has built-in functionality for managing different Python environments.
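The dependency check mentioned above can be illustrated with a toy example. This is a brute-force sketch, not how conda actually works: the package names, versions, and constraints are all hypothetical, and conda's real solver is far more sophisticated. It only shows why some combinations of versions work together while others cannot be satisfied.

```python
from itertools import product

# Hypothetical packages with candidate versions, newest first (NOT real release data)
CANDIDATES = {
    "analysis_lib": ["2.0", "1.0"],
    "math_lib": ["3.0", "2.0", "1.0"],
}

# Hypothetical constraints: (package, version) -> {dependency: allowed versions}
REQUIRES = {
    ("analysis_lib", "2.0"): {"math_lib": {"2.0", "3.0"}},
    ("analysis_lib", "1.0"): {"math_lib": {"1.0", "2.0"}},
}


def is_compatible(choice):
    """Return True if every chosen version satisfies all of its dependency constraints."""
    for pkg, ver in choice.items():
        for dep, allowed in REQUIRES.get((pkg, ver), {}).items():
            if choice.get(dep) not in allowed:
                return False
    return True


def solve(pins=None):
    """Try combinations (newest first) and return the first compatible one, or None.

    `pins` fixes specific versions, e.g. {"math_lib": "1.0"}.
    """
    pins = pins or {}
    names = list(CANDIDATES)
    options = [[pins[n]] if n in pins else CANDIDATES[n] for n in names]
    for combo in product(*options):
        choice = dict(zip(names, combo))
        if is_compatible(choice):
            return choice
    return None


print(solve())                                            # newest versions that fit together
print(solve({"math_lib": "1.0"}))                         # forces an older analysis_lib
print(solve({"analysis_lib": "2.0", "math_lib": "1.0"}))  # conflicting pins -> None
```

The last call is the situation from the XKCD cartoon: two requirements that no combination of versions can satisfy at once. Separate environments avoid this by letting each project pin its own versions.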

Managing Python Environments

Because of dependency issues, it is a good idea to create a dedicated environment for each project that you undertake (e.g. your semester project).

This means you need to be able to:

  • create a new environment
  • add packages to the environment
  • activate the environment for use

Using the Anaconda Navigator

You can use the Anaconda Navigator to do this. The online documentation walks you through the steps.

In this course, you will be provided with a configuration file that describes the environment. It can be used to re-create (i.e. import) the environment for you, as shown below.

To use the environment, select it from the list, as in the image below.

Using the command line

I personally avoid using the Anaconda Navigator, because it is sooo slooow!


If you open the Anaconda Prompt (or Anaconda PowerShell Prompt) on Windows, or the Terminal on a Mac, you can do all of this with a few commands.

  • Create an environment:

    $ conda create -n <environment name>
  • Import an environment using a yml-file:

    $ conda env create -f environment.yml
  • Use an environment:

    • you can list all available environments like this:

      $ conda env list
    • you can then select an environment from the list like this:

      $ conda activate <environment name>

      Once you have activated a conda environment, all installations that you run will be installed specifically to this environment. This allows you to have ultimate control when installing and managing dependencies for each project.
      (CU Boulder, 2020)

  • Update an environment with a new yml-file:

    • Once you have created an environment, you can always update it with a yml-file. For example, the code below will update the ISAT420 environment with the packages found in the environment.yml configuration file.

      $ conda activate ISAT420
      $ conda env update -f environment.yml
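After activating an environment, you may want to confirm from inside Python which environment you are in. This sketch assumes the CONDA_DEFAULT_ENV variable that conda activate sets in the shell; the helper active_conda_env is hypothetical. In a Jupyter notebook, sys.executable is often the more reliable indicator, because the notebook kernel can belong to a different environment than the shell that launched it.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the activated conda environment, or None if none is active.

    Reads CONDA_DEFAULT_ENV, which `conda activate` sets in the shell.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


print("Active conda environment:", active_conda_env())
print("Python interpreter in use:", sys.executable)
```

If you started Python after running conda activate ISAT420, the first line should report ISAT420.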
Homework:

Install the ISAT 420 environment from the environment.yml file found on our course GitHub.

The steps are:

  • Download the file to your hard-drive
  • Follow the guide above to import (re-create) the environment from the file
    • give it the name ISAT420
  • Activate the environment
  • Open a Jupyter notebook
  • Test whether it works by importing the xarray package: import xarray as xr
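For the last step, a quick way to check several packages at once is importlib.util.find_spec, which returns None when a top-level module cannot be found. The helper missing_packages and the required list are assumptions for illustration; take the actual names from the course environment.yml. Note that find_spec only checks that a package can be located, so import xarray as xr remains the definitive test.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages used in this course (illustrative list; adjust to match your environment.yml)
required = ["xarray", "pandas", "matplotlib"]

missing = missing_packages(required)
if missing:
    print("Missing from this environment:", ", ".join(missing))
else:
    print("All required packages can be found.")
```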

Acknowledgements

This lecture is partially based on: