Working with Gridded netCDF data and xarray

This lesson is based on the Lesson: working with netCDF data in Fabien Maussion’s Physics of the Climate System Course.

These lecture notes and exercises are licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

You already learned how to use the basic features of the python language with the numpy and matplotlib libraries. The purpose of this lesson is to introduce you to the main tool that you will use for working with gridded data: xarray.

This is a dense lesson. Please do it entirely and try to remember its structure and content. This code will provide a template for your own code, and you can always come back to these examples when you’ll need them. I don’t expect you to understand all details, but I hope that you are going to get acquainted with the “xarray way” of manipulating multi-dimensional data. You will have to copy and adapt parts of the code below to complete the exercises.

Learning Objectives

  • Describe netCDF files as self-describing data
  • Understand how netCDF can be applied to big data
  • Load netCDF datasets
  • Select data by time and coordinates
  • Perform aggregation operations
  • Plot variables

NetCDF Files

In order to open and plot NetCDF files, you’ll need to install the xarray, cartopy, and netcdf4 packages: if you haven’t done so already, follow the installation instructions for our ISAT420 python environment that contains these packages.

As a quick fix, you can also install them directly using the code below (this will take some time).

#To install these packages remove the hash (#) characters in the lines below and run the cell. The ! tells jupyter to run a system command. 
#! conda install xarray
#! conda install netcdf4
#! conda install cartopy 

Imports and options

First, let’s import the tools we need. Remember why we need to import our tools? If not, ask Dr. Gerken

# Import the tools we are going to need today:
import matplotlib.pyplot as plt  # plotting library
import numpy as np  # numerical library
import xarray as xr  # netCDF library
import cartopy  # Map projections libary
import cartopy.crs as ccrs  # Projections list
from glob import glob
# Some defaults:
plt.rcParams['figure.figsize'] = (12, 5)  # Default plot size

The Data

We will also be using an example of ERA5 Reanalysis data.

ERA5 (or European ReAnalysis v5) provides global, hourly estimates of atmospheric, ocean wave, and land-surface variables at a horizontal resolution of 31,km. Data is available from 1940 onwards both hourly and averaged to monthly.

Reanalysis in general are the fusion of observations with a global weather model to derive a homogenous, regular-best estimate output on a grid from station based observations.

ERA5 is produced by the European Center for Medium Range Weather Forecasting (ECMWF) and can be downloaded freely (account registration required).

I have placed the data files into the W8_Xarray_Gridded/Data directory.

Read the data

Most of today’s meteorological data is stored in the NetCDF format (*.nc). NetCDF files are binary files, which means that you can’t just open them in a text editor. You need a special reader for it. Nearly all the programming languages offer an interface to NetCDF. For this course we are going to use the xarray library to read the data:

Xarray commands are similar to pandas but not quite the same. To open a dataset ds we can use the .open_dataset() method.

Let’s start with having a look at the ERA5 file, I am providing.

# Here I downloaded the file in the "Data" folder which I placed in a folder close to this notebook
# The variable name "ds" stands for "dataset"
ds = xr.open_dataset(r'../data/reanalysis-era5-single-level-monthly-means_2000_T_Td_u_v_SST.nc', engine='netcdf4')
# Lets see what we have:
ds
<xarray.Dataset> Size: 249MB
Dimensions:     (valid_time: 12, latitude: 721, longitude: 1440)
Coordinates:
  * valid_time  (valid_time) datetime64[ns] 96B 2000-01-01 ... 2000-12-01
    expver      (valid_time) <U4 192B ...
  * latitude    (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude   (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    number      int64 8B ...
Data variables:
    u10         (valid_time, latitude, longitude) float32 50MB ...
    v10         (valid_time, latitude, longitude) float32 50MB ...
    d2m         (valid_time, latitude, longitude) float32 50MB ...
    t2m         (valid_time, latitude, longitude) float32 50MB ...
    sst         (valid_time, latitude, longitude) float32 50MB ...
Attributes:
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2026-03-11T00:20 GRIB to CDM+CF via cfgrib-0.9.1...

Each netcdf file has a data model, that is represented by xarray:

xarray data model

The NetCDF dataset is made up of various elements: Dimensions, Coordinates, Variables, Attributes:

  • the dimensions specify the number of elements of each data coordinate, their names should be understandable and specific
  • the attributes provide some information about the file (metadata)
  • the variables contain the actual data. In our file there are five variables. All have the dimensions [time, latitude, longitude], so we can expect an array of size [12, 721, 1440]
  • the coordinates locate the data in space and time

Working with big data

The entire ERA5 dataset is larger than 5 Petabytes. This is 5,000,000 GB. Your laptop has 8-16 GB of working memory (RAM) and even supercomputers cannot access more than a few TB of RAM.

Lazy execution

When loading a dataset in Pandas, you are always reading the entire dataset into memory. Xarray, in contrast uses lazy indexing by design. This means, when opening a dataset, the dataset is not actually read into memory, but Xarray learns its internal structure.

When we look into a variable, we can see the size, and the dtype of the underlying array, but not the actual values. This is because the values have not yet been loaded.

Xarray only loads data, when it is asked to produce an output such as printing a value to the screen or making a plot.

Loading multiple files

Since big data is distributed across many files, Xarray can also treat data that is spread across multiple files as a single dataset. This is done by passing a list of files to the .open_mfdataset method.

files = glob(r'../Data/*.nc')
files
['../Data\\reanalysis-era5-single-level-monthly-means_2000_T_Td_u_v_SST.nc',
 '../Data\\reanalysis-era5-single-level-monthly-means_2001_T_Td_u_v_SST.nc']
ds = xr.open_mfdataset(files, engine='netcdf4')
ds
  • latitude
    (latitude)
    float64
    90.0 89.75 89.5 ... -89.75 -90.0
    units :
    degrees_north
    standard_name :
    latitude
    long_name :
    latitude
    stored_direction :
    decreasing
    array([ 90.  ,  89.75,  89.5 , ..., -89.5 , -89.75, -90.  ], shape=(721,))
  • longitude
    (longitude)
    float64
    0.0 0.25 0.5 ... 359.2 359.5 359.8
    units :
    degrees_east
    standard_name :
    longitude
    long_name :
    longitude
    array([0.0000e+00, 2.5000e-01, 5.0000e-01, ..., 3.5925e+02, 3.5950e+02,
           3.5975e+02], shape=(1440,))
  • number
    ()
    int64
    0
    long_name :
    ensemble member numerical id
    units :
    1
    standard_name :
    realization
    array(0)
    • u10
      (valid_time, latitude, longitude)
      float32
      dask.array<chunksize=(6, 361, 720), meta=np.ndarray>
      GRIB_paramId :
      165
      GRIB_dataType :
      an
      GRIB_numberOfPoints :
      1038240
      GRIB_typeOfLevel :
      surface
      GRIB_stepUnits :
      1
      GRIB_stepType :
      avgua
      GRIB_gridType :
      regular_ll
      GRIB_uvRelativeToGrid :
      0
      GRIB_NV :
      0
      GRIB_Nx :
      1440
      GRIB_Ny :
      721
      GRIB_cfName :
      unknown
      GRIB_cfVarName :
      u10
      GRIB_gridDefinitionDescription :
      Latitude/Longitude Grid
      GRIB_iDirectionIncrementInDegrees :
      0.25
      GRIB_iScansNegatively :
      0
      GRIB_jDirectionIncrementInDegrees :
      0.25
      GRIB_jPointsAreConsecutive :
      0
      GRIB_jScansPositively :
      0
      GRIB_latitudeOfFirstGridPointInDegrees :
      90.0
      GRIB_latitudeOfLastGridPointInDegrees :
      -90.0
      GRIB_longitudeOfFirstGridPointInDegrees :
      0.0
      GRIB_longitudeOfLastGridPointInDegrees :
      359.75
      GRIB_missingValue :
      3.4028234663852886e+38
      GRIB_name :
      10 metre U wind component
      GRIB_shortName :
      10u
      GRIB_totalNumber :
      0
      GRIB_units :
      m s**-1
      long_name :
      10 metre U wind component
      units :
      m s**-1
      standard_name :
      unknown
      GRIB_surface :
      0.0
      Array Chunk
      Bytes 95.05 MiB 5.95 MiB
      Shape (24, 721, 1440) (6, 361, 720)
      Dask graph 16 chunks in 5 graph layers
      Data type float32 numpy.ndarray
      1440 721 24
    • v10
      (valid_time, latitude, longitude)
      float32
      dask.array<chunksize=(6, 361, 720), meta=np.ndarray>
      GRIB_paramId :
      166
      GRIB_dataType :
      an
      GRIB_numberOfPoints :
      1038240
      GRIB_typeOfLevel :
      surface
      GRIB_stepUnits :
      1
      GRIB_stepType :
      avgua
      GRIB_gridType :
      regular_ll
      GRIB_uvRelativeToGrid :
      0
      GRIB_NV :
      0
      GRIB_Nx :
      1440
      GRIB_Ny :
      721
      GRIB_cfName :
      unknown
      GRIB_cfVarName :
      v10
      GRIB_gridDefinitionDescription :
      Latitude/Longitude Grid
      GRIB_iDirectionIncrementInDegrees :
      0.25
      GRIB_iScansNegatively :
      0
      GRIB_jDirectionIncrementInDegrees :
      0.25
      GRIB_jPointsAreConsecutive :
      0
      GRIB_jScansPositively :
      0
      GRIB_latitudeOfFirstGridPointInDegrees :
      90.0
      GRIB_latitudeOfLastGridPointInDegrees :
      -90.0
      GRIB_longitudeOfFirstGridPointInDegrees :
      0.0
      GRIB_longitudeOfLastGridPointInDegrees :
      359.75
      GRIB_missingValue :
      3.4028234663852886e+38
      GRIB_name :
      10 metre V wind component
      GRIB_shortName :
      10v
      GRIB_totalNumber :
      0
      GRIB_units :
      m s**-1
      long_name :
      10 metre V wind component
      units :
      m s**-1
      standard_name :
      unknown
      GRIB_surface :
      0.0
      Array Chunk
      Bytes 95.05 MiB 5.95 MiB
      Shape (24, 721, 1440) (6, 361, 720)
      Dask graph 16 chunks in 5 graph layers
      Data type float32 numpy.ndarray
      1440 721 24
    • d2m
      (valid_time, latitude, longitude)
      float32
      dask.array<chunksize=(6, 361, 720), meta=np.ndarray>
      GRIB_paramId :
      168
      GRIB_dataType :
      an
      GRIB_numberOfPoints :
      1038240
      GRIB_typeOfLevel :
      surface
      GRIB_stepUnits :
      1
      GRIB_stepType :
      avgua
      GRIB_gridType :
      regular_ll
      GRIB_uvRelativeToGrid :
      0
      GRIB_NV :
      0
      GRIB_Nx :
      1440
      GRIB_Ny :
      721
      GRIB_cfName :
      unknown
      GRIB_cfVarName :
      d2m
      GRIB_gridDefinitionDescription :
      Latitude/Longitude Grid
      GRIB_iDirectionIncrementInDegrees :
      0.25
      GRIB_iScansNegatively :
      0
      GRIB_jDirectionIncrementInDegrees :
      0.25
      GRIB_jPointsAreConsecutive :
      0
      GRIB_jScansPositively :
      0
      GRIB_latitudeOfFirstGridPointInDegrees :
      90.0
      GRIB_latitudeOfLastGridPointInDegrees :
      -90.0
      GRIB_longitudeOfFirstGridPointInDegrees :
      0.0
      GRIB_longitudeOfLastGridPointInDegrees :
      359.75
      GRIB_missingValue :
      3.4028234663852886e+38
      GRIB_name :
      2 metre dewpoint temperature
      GRIB_shortName :
      2d
      GRIB_totalNumber :
      0
      GRIB_units :
      K
      long_name :
      2 metre dewpoint temperature
      units :
      K
      standard_name :
      unknown
      GRIB_surface :
      0.0
      Array Chunk
      Bytes 95.05 MiB 5.95 MiB
      Shape (24, 721, 1440) (6, 361, 720)
      Dask graph 16 chunks in 5 graph layers
      Data type float32 numpy.ndarray
      1440 721 24
    • t2m
      (valid_time, latitude, longitude)
      float32
      dask.array<chunksize=(6, 361, 720), meta=np.ndarray>
      GRIB_paramId :
      167
      GRIB_dataType :
      an
      GRIB_numberOfPoints :
      1038240
      GRIB_typeOfLevel :
      surface
      GRIB_stepUnits :
      1
      GRIB_stepType :
      avgua
      GRIB_gridType :
      regular_ll
      GRIB_uvRelativeToGrid :
      0
      GRIB_NV :
      0
      GRIB_Nx :
      1440
      GRIB_Ny :
      721
      GRIB_cfName :
      unknown
      GRIB_cfVarName :
      t2m
      GRIB_gridDefinitionDescription :
      Latitude/Longitude Grid
      GRIB_iDirectionIncrementInDegrees :
      0.25
      GRIB_iScansNegatively :
      0
      GRIB_jDirectionIncrementInDegrees :
      0.25
      GRIB_jPointsAreConsecutive :
      0
      GRIB_jScansPositively :
      0
      GRIB_latitudeOfFirstGridPointInDegrees :
      90.0
      GRIB_latitudeOfLastGridPointInDegrees :
      -90.0
      GRIB_longitudeOfFirstGridPointInDegrees :
      0.0
      GRIB_longitudeOfLastGridPointInDegrees :
      359.75
      GRIB_missingValue :
      3.4028234663852886e+38
      GRIB_name :
      2 metre temperature
      GRIB_shortName :
      2t
      GRIB_totalNumber :
      0
      GRIB_units :
      K
      long_name :
      2 metre temperature
      units :
      K
      standard_name :
      unknown
      GRIB_surface :
      0.0
      Array Chunk
      Bytes 95.05 MiB 5.95 MiB
      Shape (24, 721, 1440) (6, 361, 720)
      Dask graph 16 chunks in 5 graph layers
      Data type float32 numpy.ndarray
      1440 721 24
    • sst
      (valid_time, latitude, longitude)
      float32
      dask.array<chunksize=(6, 361, 720), meta=np.ndarray>
      GRIB_paramId :
      34
      GRIB_dataType :
      an
      GRIB_numberOfPoints :
      1038240
      GRIB_typeOfLevel :
      surface
      GRIB_stepUnits :
      1
      GRIB_stepType :
      avgua
      GRIB_gridType :
      regular_ll
      GRIB_uvRelativeToGrid :
      0
      GRIB_NV :
      0
      GRIB_Nx :
      1440
      GRIB_Ny :
      721
      GRIB_cfName :
      unknown
      GRIB_cfVarName :
      sst
      GRIB_gridDefinitionDescription :
      Latitude/Longitude Grid
      GRIB_iDirectionIncrementInDegrees :
      0.25
      GRIB_iScansNegatively :
      0
      GRIB_jDirectionIncrementInDegrees :
      0.25
      GRIB_jPointsAreConsecutive :
      0
      GRIB_jScansPositively :
      0
      GRIB_latitudeOfFirstGridPointInDegrees :
      90.0
      GRIB_latitudeOfLastGridPointInDegrees :
      -90.0
      GRIB_longitudeOfFirstGridPointInDegrees :
      0.0
      GRIB_longitudeOfLastGridPointInDegrees :
      359.75
      GRIB_missingValue :
      3.4028234663852886e+38
      GRIB_name :
      Sea surface temperature
      GRIB_shortName :
      sst
      GRIB_units :
      K
      long_name :
      Sea surface temperature
      units :
      K
      standard_name :
      unknown
      GRIB_surface :
      0.0
      Array Chunk
      Bytes 95.05 MiB 5.95 MiB
      Shape (24, 721, 1440) (6, 361, 720)
      Dask graph 16 chunks in 5 graph layers
      Data type float32 numpy.ndarray
      1440 721 24
  • GRIB_centre :
    ecmf
    GRIB_centreDescription :
    European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre :
    0
    Conventions :
    CF-1.7
    institution :
    European Centre for Medium-Range Weather Forecasts
    history :
    2026-03-11T00:20 GRIB to CDM+CF via cfgrib-0.9.15.1/ecCodes-2.42.0 with {"source": "tmp4g3rqpp6/data.grib", "filter_by_keys": {"stream": ["mnth"], "stepType": ["avgua"]}, "encode_cf": ["parameter", "time", "geography", "vertical"]}
  • You can see that this looks just like loading a single file.

    Coordinates

    Let’s have a look at the time coordinate first:

    (You can see that this is a similar notation to pandas)

    ds['valid_time']
    <xarray.DataArray 'valid_time' (valid_time: 24)> Size: 192B
    array(['2000-01-01T00:00:00.000000000', '2000-02-01T00:00:00.000000000',
           '2000-03-01T00:00:00.000000000', '2000-04-01T00:00:00.000000000',
           '2000-05-01T00:00:00.000000000', '2000-06-01T00:00:00.000000000',
           '2000-07-01T00:00:00.000000000', '2000-08-01T00:00:00.000000000',
           '2000-09-01T00:00:00.000000000', '2000-10-01T00:00:00.000000000',
           '2000-11-01T00:00:00.000000000', '2000-12-01T00:00:00.000000000',
           '2001-01-01T00:00:00.000000000', '2001-02-01T00:00:00.000000000',
           '2001-03-01T00:00:00.000000000', '2001-04-01T00:00:00.000000000',
           '2001-05-01T00:00:00.000000000', '2001-06-01T00:00:00.000000000',
           '2001-07-01T00:00:00.000000000', '2001-08-01T00:00:00.000000000',
           '2001-09-01T00:00:00.000000000', '2001-10-01T00:00:00.000000000',
           '2001-11-01T00:00:00.000000000', '2001-12-01T00:00:00.000000000'],
          dtype='datetime64[ns]')
    Coordinates:
      * valid_time  (valid_time) datetime64[ns] 192B 2000-01-01 ... 2001-12-01
        expver      (valid_time) <U4 384B dask.array<chunksize=(12,), meta=np.ndarray>
        number      int64 8B 0
    Attributes:
        long_name:      time
        standard_name:  time

    The array contains numbers 12 datetimes, they represent the months of the year. If we were to look into the ERA5 documentation, we know that these represent the average for each month during 2002.

    The location coordinates are also self-explaining:

    ds.longitude  # This is an alternate notation to ds['longitude] that works in pandas and xarray as long as there are not spaces
    <xarray.DataArray 'longitude' (longitude: 1440)> Size: 12kB
    array([0.0000e+00, 2.5000e-01, 5.0000e-01, ..., 3.5925e+02, 3.5950e+02,
           3.5975e+02], shape=(1440,))
    Coordinates:
      * longitude  (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
        number     int64 8B 0
    Attributes:
        units:          degrees_east
        standard_name:  longitude
        long_name:      longitude
    ds.latitude
    <xarray.DataArray 'latitude' (latitude: 721)> Size: 6kB
    array([ 90.  ,  89.75,  89.5 , ..., -89.5 , -89.75, -90.  ], shape=(721,))
    Coordinates:
      * latitude  (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
        number    int64 8B 0
    Attributes:
        units:             degrees_north
        standard_name:     latitude
        long_name:         latitude
        stored_direction:  decreasing

    Q: what is the spatial resolution of the ERA5 data?

    Variables

    Variables can also be accessed directly from the dataset:

    ds.sst
    <xarray.DataArray 'sst' (valid_time: 24, latitude: 721, longitude: 1440)> Size: 100MB
    dask.array<concatenate, shape=(24, 721, 1440), dtype=float32, chunksize=(6, 361, 720), chunktype=numpy.ndarray>
    Coordinates:
      * valid_time  (valid_time) datetime64[ns] 192B 2000-01-01 ... 2001-12-01
        expver      (valid_time) <U4 384B dask.array<chunksize=(12,), meta=np.ndarray>
      * latitude    (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
      * longitude   (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
        number      int64 8B 0
    Attributes: (12/31)
        GRIB_paramId:                             34
        GRIB_dataType:                            an
        GRIB_numberOfPoints:                      1038240
        GRIB_typeOfLevel:                         surface
        GRIB_stepUnits:                           1
        GRIB_stepType:                            avgua
        ...                                       ...
        GRIB_shortName:                           sst
        GRIB_units:                               K
        long_name:                                Sea surface temperature
        units:                                    K
        standard_name:                            unknown
        GRIB_surface:                             0.0
  • GRIB_paramId :
    34
    GRIB_dataType :
    an
    GRIB_numberOfPoints :
    1038240
    GRIB_typeOfLevel :
    surface
    GRIB_stepUnits :
    1
    GRIB_stepType :
    avgua
    GRIB_gridType :
    regular_ll
    GRIB_uvRelativeToGrid :
    0
    GRIB_NV :
    0
    GRIB_Nx :
    1440
    GRIB_Ny :
    721
    GRIB_cfName :
    unknown
    GRIB_cfVarName :
    sst
    GRIB_gridDefinitionDescription :
    Latitude/Longitude Grid
    GRIB_iDirectionIncrementInDegrees :
    0.25
    GRIB_iScansNegatively :
    0
    GRIB_jDirectionIncrementInDegrees :
    0.25
    GRIB_jPointsAreConsecutive :
    0
    GRIB_jScansPositively :
    0
    GRIB_latitudeOfFirstGridPointInDegrees :
    90.0
    GRIB_latitudeOfLastGridPointInDegrees :
    -90.0
    GRIB_longitudeOfFirstGridPointInDegrees :
    0.0
    GRIB_longitudeOfLastGridPointInDegrees :
    359.75
    GRIB_missingValue :
    3.4028234663852886e+38
    GRIB_name :
    Sea surface temperature
    GRIB_shortName :
    sst
    GRIB_units :
    K
    long_name :
    Sea surface temperature
    units :
    K
    standard_name :
    unknown
    GRIB_surface :
    0.0
  • The attributes of a variable are extremely important, they carry the metadata and must be specified by the data provider. Here we can read in which units the variable is defined, as well as a description of the variable (the “long_name” attribute).

    Q: what other information can we read from this printout? Explore the other data variables and see if you understand all of them. Note: you can expand each variable’s attributes in the html display, or use the method ds.info() to list all vars and attributes.

    # your answer here

    First Plots

    Let’t create a first set of plots. For example, we can quickly produce a map of SST in June.

    To do so, we have to select data based on the time coordinate using the.sel() method. The resulting slice is a 2-D grid with latitude and longitude.

    sst_jun= ds.sst.sel(valid_time="2000-06-01")
    sst_jun
    <xarray.DataArray 'sst' (latitude: 721, longitude: 1440)> Size: 4MB
    dask.array<getitem, shape=(721, 1440), dtype=float32, chunksize=(361, 720), chunktype=numpy.ndarray>
    Coordinates:
      * latitude    (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
      * longitude   (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
        number      int64 8B 0
        valid_time  datetime64[ns] 8B 2000-06-01
        expver      <U4 16B dask.array<chunksize=(), meta=np.ndarray>
    Attributes: (12/31)
        GRIB_paramId:                             34
        GRIB_dataType:                            an
        GRIB_numberOfPoints:                      1038240
        GRIB_typeOfLevel:                         surface
        GRIB_stepUnits:                           1
        GRIB_stepType:                            avgua
        ...                                       ...
        GRIB_shortName:                           sst
        GRIB_units:                               K
        long_name:                                Sea surface temperature
        units:                                    K
        standard_name:                            unknown
        GRIB_surface:                             0.0
  • GRIB_paramId :
    34
    GRIB_dataType :
    an
    GRIB_numberOfPoints :
    1038240
    GRIB_typeOfLevel :
    surface
    GRIB_stepUnits :
    1
    GRIB_stepType :
    avgua
    GRIB_gridType :
    regular_ll
    GRIB_uvRelativeToGrid :
    0
    GRIB_NV :
    0
    GRIB_Nx :
    1440
    GRIB_Ny :
    721
    GRIB_cfName :
    unknown
    GRIB_cfVarName :
    sst
    GRIB_gridDefinitionDescription :
    Latitude/Longitude Grid
    GRIB_iDirectionIncrementInDegrees :
    0.25
    GRIB_iScansNegatively :
    0
    GRIB_jDirectionIncrementInDegrees :
    0.25
    GRIB_jPointsAreConsecutive :
    0
    GRIB_jScansPositively :
    0
    GRIB_latitudeOfFirstGridPointInDegrees :
    90.0
    GRIB_latitudeOfLastGridPointInDegrees :
    -90.0
    GRIB_longitudeOfFirstGridPointInDegrees :
    0.0
    GRIB_longitudeOfLastGridPointInDegrees :
    359.75
    GRIB_missingValue :
    3.4028234663852886e+38
    GRIB_name :
    Sea surface temperature
    GRIB_shortName :
    sst
    GRIB_units :
    K
    long_name :
    Sea surface temperature
    units :
    K
    standard_name :
    unknown
    GRIB_surface :
    0.0
  • We can now make a plot:

    # Define the map projection (how does the map look like)
    ax = plt.axes(projection=ccrs.EqualEarth())     
    # ax is an empty plot. We now plot the variable sst_jan onto ax
    sst_jun.plot(ax=ax, transform=ccrs.PlateCarree()) 
    # the keyword "transform" tells the function in which projection the data is stored 
    ax.coastlines()
    ax.gridlines(); # Add gridlines and coastlines to the plot

    Simple Analysis

    Analysing climate data is extremely easy in Python thanks to the xarray and cartopy libraries. First we are going to compute the time average of the SST over the year:

    Similar to pandas we can perform computations over our data, like taking the average [.mean()]. We just have need to specify the dimension ('valid_time', 'latitude', 'longitude').

    sst_avg = ds.sst.mean(dim='valid_time')

    What is sst_avg by the way?

    sst_avg
    <xarray.DataArray 'sst' (latitude: 721, longitude: 1440)> Size: 4MB
    array([[271.46005, 271.46005, 271.46005, ..., 271.46005, 271.46005,
            271.46005],
           [271.46005, 271.46   , 271.46005, ..., 271.46005, 271.46005,
            271.46005],
           [271.46005, 271.46005, 271.46005, ..., 271.46005, 271.46005,
            271.46005],
           ...,
           [      nan,       nan,       nan, ...,       nan,       nan,
                  nan],
           [      nan,       nan,       nan, ...,       nan,       nan,
                  nan],
           [      nan,       nan,       nan, ...,       nan,       nan,
                  nan]], shape=(721, 1440), dtype=float32)
    Coordinates:
      * latitude   (latitude) float64 6kB 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
      * longitude  (longitude) float64 12kB 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
        number     int64 8B ...
    Attributes: (12/31)
        GRIB_paramId:                             34
        GRIB_dataType:                            an
        GRIB_numberOfPoints:                      1038240
        GRIB_typeOfLevel:                         surface
        GRIB_stepUnits:                           1
        GRIB_stepType:                            avgua
        ...                                       ...
        GRIB_shortName:                           sst
        GRIB_units:                               K
        long_name:                                Sea surface temperature
        units:                                    K
        standard_name:                            unknown
        GRIB_surface:                             0.0

    So sst_avg is a 2-dimensional array of dimensions [latitude, longitude] (note that the time dimension has disapeared).

    When we applied the .mean() function, we added an argument (called a keyword argument): dim='valid_time'. With this argument, we told the function to compute the average over the time dimension.

    Let’s make another plot of this. Give it a try.

    # Plot the mean sst 
    # Try what happens if you don't specify the dimension for your averaging. 
    # Try averaging SST over two dimensions (e.g. latitude and longitude). What kind of plot will this generate? 

    Q: What do you think happened?

    Note: scalar output is quite verbose in xarray… Your can print just the data onto the screen with the .values attribute:

    ds.sst.mean().values

    Q: what should we expect from the folowing commands:

    ds.sst.mean(dim='longitude')
    ds.sst.mean(dim='valid_time').mean(dim='longitude')
    ds.sst.mean(dim=['valid_time', 'longitude'])

    Try them out!

    # Try the commands above. Do they work as expected? 

    E: what is the maximum SST value? And the minimum? (hint)

    # your answer here

    Slicing Data

    Sometimes, we want to focus on a certain region or period within our dataset, which means that we want to look at a slice of the total array. xarray allows us to select data on such a slice. See what is happening below.

    sst_Australia = ds.sst.sel(valid_time='2000-06-01',longitude = slice(105,160),latitude =slice(-8,-45))
    sst_Australia
    <xarray.DataArray 'sst' (latitude: 149, longitude: 221)> Size: 132kB
    array([[301.85425, 301.91772, 301.92065, ..., 302.18433, 302.65405, 302.73608],
           [301.6902 , 301.73413, 301.7605 , ...,       nan,       nan, 302.48218],
           [301.52905, 301.5642 , 301.57983, ..., 302.36206, 302.43237, 302.52417],
           ...,
           [283.31616, 283.29956, 283.2517 , ..., 285.33667, 285.2273 , 285.07886],
           [283.08862, 283.13843, 283.21265, ..., 285.07983, 285.01636, 284.90698],
           [282.7771 , 282.989  , 283.06323, ..., 284.79272, 284.82886, 284.84448]],
          shape=(149, 221), dtype=float32)
    Coordinates:
      * latitude    (latitude) float64 1kB -8.0 -8.25 -8.5 ... -44.5 -44.75 -45.0
      * longitude   (longitude) float64 2kB 105.0 105.2 105.5 ... 159.5 159.8 160.0
        number      int64 8B ...
        valid_time  datetime64[ns] 8B 2000-06-01
        expver      <U4 16B ...
    Attributes: (12/31)
        GRIB_paramId:                             34
        GRIB_dataType:                            an
        GRIB_numberOfPoints:                      1038240
        GRIB_typeOfLevel:                         surface
        GRIB_stepUnits:                           1
        GRIB_stepType:                            avgua
        ...                                       ...
        GRIB_shortName:                           sst
        GRIB_units:                               K
        long_name:                                Sea surface temperature
        units:                                    K
        standard_name:                            unknown
        GRIB_surface:                             0.0

    We can then make another plot. I am selecting a different map projection LambertCylindrical(), which looks better in this case. If you want to learn more about mapping and map projections you can read about them here.

    # Define the map projection (how does the map look like)
    ax = plt.axes(projection=ccrs.LambertCylindrical())  #
    # ax is an empty plot. We now plot the variable sst_jan onto ax
    sst_Australia.plot(ax=ax, transform=ccrs.PlateCarree()) 
    # the keyword "transform" tells the function in which projection the data is stored 
    ax.coastlines(); ax.gridlines(); # Add gridlines and coastlines to the plot