The SeaFlux dataset

The SeaFlux dataset allows air-sea CO\(_2\) flux calculations to be homogenized. To minimize the differences that can arise during the calculation, we’ve made a product available at 1°⨉1° monthly resolution for 1988-2018. The product can be accessed and downloaded from

Here, we show how to download the data with Python and, when you only have \(p\)CO\(_2\), quickly calculate air-sea CO\(_2\) fluxes using the bulk flux formulation:

\[F\text{CO}_2 = K_0 \cdot k_w \cdot (p\text{CO}_2^{sea} - p\text{CO}_2^{atm}) \cdot ice^{free}\]

where \(K_0\) is the solubility of CO\(_2\) in seawater; \(k_w\) is the gas transfer velocity, using the Wanninkhof (1992) formulation with a quadratic dependence on wind speed (the second moment of the wind); \(p\text{CO}_2^{sea}\) is the partial pressure of CO\(_2\) in the surface ocean; \(p\text{CO}_2^{atm}\) is the atmospheric partial pressure of CO\(_2\) in the marine boundary layer; and \(ice^{free}\) is the fraction of open (ice-free) ocean in a grid cell.
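As a minimal sketch, the bulk formula can be applied element-wise with NumPy. The helper name and the input values below are illustrative assumptions, not SeaFlux output. The inputs are assumed to already be in the units used later in this notebook (\(K_0\) in mol m\(^{-3}\) µatm\(^{-1}\), \(k_w\) in m day\(^{-1}\), \(p\text{CO}_2\) in µatm), which gives a flux in mol m\(^{-2}\) day\(^{-1}\).

```python
import numpy as np

def bulk_flux(K0, kw, pco2_sea, pco2_atm, ice_free):
    """Hypothetical helper: FCO2 = K0 * kw * (pCO2_sea - pCO2_atm) * ice_free.

    Assumed units: K0 [mol m-3 uatm-1], kw [m day-1], pCO2 [uatm],
    ice_free [0-1]. Result: mol m-2 day-1 (positive = ocean outgassing).
    """
    return K0 * kw * (np.asarray(pco2_sea) - np.asarray(pco2_atm)) * ice_free

# Illustrative values for three grid cells (not real data)
K0 = np.array([3.3e-5, 3.3e-5, 5.0e-5])    # mol m-3 uatm-1
kw = np.array([4.0, 4.0, 4.0])              # m day-1
pco2_sea = np.array([420.0, 380.0, 380.0])  # uatm
pco2_atm = np.array([400.0, 400.0, 400.0])  # uatm
ice_free = np.array([1.0, 1.0, 0.0])        # third cell is fully ice covered

flux = bulk_flux(K0, kw, pco2_sea, pco2_atm, ice_free)  # mol m-2 day-1
```

With this sign convention, a positive flux means \(p\text{CO}_2^{sea} > p\text{CO}_2^{atm}\) (outgassing), a negative flux means uptake, and a fully ice-covered cell (\(ice^{free} = 0\)) contributes no flux.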

SeaFlux addresses the differences that arise in each of these components. In this notebook, we show how to apply these corrections using Python.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# for the rest of the document seaflux is referred to as sf
import pyseaflux as sf

pCO2 area coverage

This correction applies to data-based surface \(p\text{CO}_2^{sea}\) products that provide pseudo-global coverage. Because each of these products uses a different method and different predictor variables, their spatial coverage differs, primarily in the coastal ocean and in seasonally ice-covered regions.

Recent work by Landschützer et al. (2020) provides a monthly \(p\text{CO}_2^{sea}\) climatology that covers the open ocean, the coastal ocean, and seasonally ice-covered seas. SeaFlux scales this climatology and uses it to fill the regions where the pseudo-global data-based products have no \(p\text{CO}_2^{sea}\) estimates.

For details on this process, please see Fay et al. (2021).

Open example data set

I use the CSIR-ML6 product to demonstrate the data filling procedure. I have pre-downloaded the data into the folder ~/Downloads/.

I use xarray.open_mfdataset to open the file because it accepts a preprocess function, which is applied to each file before the files are combined. The SeaFlux package includes a preprocessing function that tries to make the data match the layout of the SeaFlux data.

csir_fname = '~/Downloads/'

preprocessor =
csir = xr.open_mfdataset(csir_fname, preprocess=preprocessor)

csir_spco2 = csir.spco2
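To show what such a preprocess function can look like, here is a hypothetical stand-in. It is not the SeaFlux preprocessing function. It assumes a product whose coordinates may be named latitude/longitude, whose longitudes may run from 0 to 360, and whose monthly time stamps may fall on any day of the month, and it maps these onto the coordinate values used by the SeaFlux dataset (lat -89.5 to 89.5, lon -179.5 to 179.5, time stamps on the 15th of each month).

```python
import numpy as np
import pandas as pd
import xarray as xr

def conform_to_seaflux_grid(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical preprocessor (not the SeaFlux one): align coordinates to a
    1x1 degree monthly grid with lat -89.5..89.5, lon -179.5..179.5 and
    time stamps on the 15th of each month."""
    # Rename common alternative coordinate names, if present (assumed names)
    aliases = {"latitude": "lat", "longitude": "lon"}
    ds = ds.rename({old: new for old, new in aliases.items() if old in ds.coords})
    # Convert 0..360 longitudes to -180..180 and sort so cells line up
    ds = ds.assign_coords(lon=((ds.lon + 180) % 360) - 180).sortby("lon")
    # Make latitude run from south to north
    ds = ds.sortby("lat")
    # Move each monthly time stamp to the 15th of its month
    months = pd.to_datetime(ds.time.values).to_period("M").to_timestamp()
    return ds.assign_coords(time=months + pd.Timedelta(days=14))

# Usage with a file pattern (the path is a placeholder):
# csir = xr.open_mfdataset("path/to/files/*.nc", preprocess=conform_to_seaflux_grid)
```

Returning the modified Dataset, rather than modifying it in place, is what open_mfdataset expects from its preprocess argument.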

Download scaled climatology

The scaled climatology is downloadable directly at Zenodo (
However, downloading the data with the SeaFlux tool ensures that you always have the latest version.
The function downloads the data and returns it as an xr.Dataset, which provides a convenient interface for viewing the data.

Note that the default download location is ~/Downloads/SeaFlux_v2021.01, so a new folder is created in your Downloads folder; this location can be changed to any other folder. In addition to the data, a README.txt file and a download log are written to the same directory.

sf_data = sf.get_seaflux_data(verbose=True)
2021-04-11 16:09:08 [DOWNLOAD]  ================================================================================

2021-04-11 16:09:08 [DOWNLOAD]  Start of logging session
2021-04-11 16:09:08 [DOWNLOAD]  --------------------------------------------------------------------------------
2021-04-11 16:09:08 [DOWNLOAD]    5 files at
2021-04-11 16:09:08 [DOWNLOAD]  Files will be saved to /Users/luke/Downloads/SeaFlux_v2021.01
2021-04-11 16:09:08 [DOWNLOAD]  retrieving
2021-04-11 16:09:08 [DOWNLOAD]  retrieving
2021-04-11 16:09:08 [DOWNLOAD]  retrieving
2021-04-11 16:09:08 [DOWNLOAD]  retrieving
2021-04-11 16:09:08 [DOWNLOAD]  retrieving
2021-04-11 16:09:08 [DOWNLOAD]  SUMMARY: Retrieved=5, Failed=0 listing failed below:

<xarray.Dataset>
Dimensions:            (lat: 180, lon: 360, product: 6, time: 372, wind: 5)
Coordinates:
  * lat                (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
  * lon                (lon) float64 -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
  * time               (time) datetime64[ns] 1988-01-15 ... 2018-12-15
  * wind               (wind) object 'CCMP2' 'ERA5' 'JRA55' 'NCEP1' 'NCEP2'
  * product            (product) object 'MPI_SOMFFN' 'JMA_MLR' ... 'CSIR_ML6'
Data variables:
    ice                (time, lat, lon) float32 dask.array<chunksize=(372, 180, 360), meta=np.ndarray>
    kw_scaled          (wind, time, lat, lon) float64 dask.array<chunksize=(5, 372, 180, 360), meta=np.ndarray>
    kw_a_scaling       (wind) float64 dask.array<chunksize=(5,), meta=np.ndarray>
    ice_frac           (time, lat, lon) float32 dask.array<chunksize=(372, 180, 360), meta=np.ndarray>
    pCO2atm            (time, lat, lon) float64 dask.array<chunksize=(372, 180, 360), meta=np.ndarray>
    spco2_filled       (product, time, lat, lon) float64 dask.array<chunksize=(6, 372, 180, 360), meta=np.ndarray>
    spco2_clim_scaled  (time, lat, lon) float64 dask.array<chunksize=(372, 180, 360), meta=np.ndarray>
    scaling_factor     (time) float64 dask.array<chunksize=(372,), meta=np.ndarray>
    product_mask       (product, time, lat, lon) bool dask.array<chunksize=(6, 372, 180, 360), meta=np.ndarray>
    sol_Weiss74        (time, lat, lon) float64 dask.array<chunksize=(372, 180, 360), meta=np.ndarray>

Filling the example data

Rather than introducing a new interface, SeaFlux leverages existing xarray functionality. The fillna method fills missing values with the given input, which is exactly what we need. For this to work, both data arrays must have the same dimensions and the same coordinate values (time, lat and lon): fillna aligns the filler to the array being filled, so cells whose coordinates do not match are left empty.

csir_filled = csir_spco2.fillna(sf_data.spco2_clim_scaled)
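The toy example below illustrates why the coordinate values must match. It uses two small, made-up 2×2 arrays rather than SeaFlux data. DataArray.fillna first aligns the filler to the array being filled, so gaps are filled only where the coordinate labels match exactly. A filler on a grid offset by half a degree leaves the gaps untouched.

```python
import numpy as np
import xarray as xr

coords = {"lat": [-0.5, 0.5], "lon": [10.5, 11.5]}

# Made-up product with two gaps (NaN), e.g. coastal cells it does not cover
product = xr.DataArray([[350.0, np.nan], [np.nan, 380.0]], coords=coords, dims=("lat", "lon"))
# Made-up filler with full coverage on the same grid
filler = xr.DataArray([[300.0, 310.0], [320.0, 330.0]], coords=coords, dims=("lat", "lon"))

# Same coordinates: only the NaN cells take the filler values
filled = product.fillna(filler)

# Filler shifted by half a degree in longitude: no labels match, nothing is filled
filler_offset = filler.assign_coords(lon=[11.0, 12.0])
not_filled = product.fillna(filler_offset)
```

To catch such mismatches early, xr.align(csir_spco2, sf_data.spco2_clim_scaled, join="exact") raises an error if the coordinate labels of the two arrays differ.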

The figure below shows average pCO2 for the unfilled (CSIR), filler (MPI-ULB-SOMFFN scaled), and filled products (unfilled + filler).



Gas transfer velocity: For simplicity, we use \(k_w\) calculated with the ERA5 winds only. However, the product provides \(k_w\) for each of the following wind products: CCMP, ERA5, JRA55, NCEP R1 and NCEP R2. The \(k_w\) in this dataset is scaled to a global mean of 16.5 cm hr\(^{-1}\) over 1988-2017 (30 years), the period during which all wind products overlap.
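The scaling can be sketched as solving for the coefficient \(a\) in the quadratic parameterisation \(k_w = a \langle U^2 \rangle (Sc/660)^{-1/2}\) so that the weighted global mean of \(k_w\) equals 16.5 cm hr\(^{-1}\). This is an assumed outline, not the exact SeaFlux procedure (see Fay et al. (2021) for that). The weights (for example, grid-cell area times ice-free fraction), the random inputs and the helper name are illustrative assumptions. The kw_a_scaling variable (one value per wind product) may hold such a coefficient, but that is also an assumption.

```python
import numpy as np

def scale_kw_coefficient(u2_mean, schmidt, weights, target_cm_hr=16.5):
    """Hypothetical helper: return the coefficient a such that the weighted
    mean of a * <U^2> * (Sc/660)**-0.5 equals target_cm_hr [cm hr-1].

    u2_mean : second moment of wind speed, <U^2> [m2 s-2]
    schmidt : Schmidt number of CO2 (dimensionless)
    weights : averaging weights, e.g. cell area * ice-free fraction (assumed)
    """
    kw_unit = u2_mean * (schmidt / 660.0) ** -0.5   # kw for a = 1
    weighted_mean = np.sum(kw_unit * weights) / np.sum(weights)
    return target_cm_hr / weighted_mean

# Illustrative inputs only (random numbers, not real winds or Schmidt numbers)
rng = np.random.default_rng(0)
u2 = rng.uniform(20.0, 150.0, size=(12, 10, 10))     # <U^2>
sc = rng.uniform(500.0, 1500.0, size=(12, 10, 10))   # Schmidt number
w = rng.uniform(0.5, 1.0, size=(12, 10, 10))         # averaging weights

a = scale_kw_coefficient(u2, sc, w)
kw_scaled = a * u2 * (sc / 660.0) ** -0.5            # scaled kw [cm hr-1]
```

Because \(a\) is a single number per wind product, the scaling changes the global magnitude of \(k_w\) while keeping the spatial and temporal patterns of the wind field.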

Solubility: Solubility is calculated using the Weiss (1974) formulation (the sol_Weiss74 variable), with the coefficients from the table in Wanninkhof (2014). The inputs to the formulation are sea surface temperature and salinity.
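For reference, the Weiss (1974) CO\(_2\) solubility can be written as \(\ln K_0 = A_1 + A_2 (100/T) + A_3 \ln(T/100) + S\,[B_1 + B_2 (T/100) + B_3 (T/100)^2]\), with \(T\) in kelvin and \(S\) the salinity. The sketch below uses the commonly tabulated coefficients for \(K_0\) in mol L\(^{-1}\) atm\(^{-1}\) and then converts to the mol m\(^{-3}\) µatm\(^{-1}\) used in this notebook (\(\times 1000\) L m\(^{-3}\) \(\times 10^{-6}\) atm µatm\(^{-1}\) \(= \times 10^{-3}\)). It is an illustrative implementation with hypothetical function names, not the SeaFlux code; check the coefficients against Wanninkhof (2014) before relying on it.

```python
import numpy as np

# Weiss (1974) coefficients for K0 in mol L-1 atm-1
A1, A2, A3 = -58.0931, 90.5069, 22.2940
B1, B2, B3 = 0.027766, -0.025888, 0.0050578

def solubility_k0(temp_c, salinity):
    """Hypothetical helper: CO2 solubility K0 [mol L-1 atm-1] after Weiss (1974)."""
    t100 = (np.asarray(temp_c) + 273.15) / 100.0   # T/100 with T in kelvin
    ln_k0 = (A1 + A2 / t100 + A3 * np.log(t100)
             + salinity * (B1 + B2 * t100 + B3 * t100 ** 2))
    return np.exp(ln_k0)

def k0_to_mol_m3_uatm(k0_mol_l_atm):
    """mol L-1 atm-1 -> mol m-3 uatm-1 (x1000 L m-3, x1e-6 atm uatm-1)."""
    return k0_mol_l_atm * 1e-3

k0_warm = solubility_k0(20.0, 35.0)   # mol L-1 atm-1
k0_cold = solubility_k0(0.0, 35.0)    # colder water dissolves more CO2
```

At 20 °C and a salinity of 35, this gives roughly 0.033 mol L\(^{-1}\) atm\(^{-1}\), or about \(3.3 \times 10^{-5}\) mol m\(^{-3}\) µatm\(^{-1}\).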

Atmospheric pCO2: Atmospheric \(p\text{CO}_2\) for the marine boundary layer is calculated from NOAA's marine boundary layer \(x\text{CO}_2\) product as \(p\text{CO}_2^{atm} = x\text{CO}_2 \cdot (P_{atm} - p\text{H}_2\text{O})\), where \(P_{atm}\) is the ERA5 mean sea level pressure and \(p\text{H}_2\text{O}\) is the water vapour pressure calculated following Dickson et al. (2007).
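The conversion from \(x\text{CO}_2\) to \(p\text{CO}_2^{atm}\) can be sketched as follows. Note the substitution: this sketch does not reproduce the Dickson et al. (2007) calculation and instead uses the Weiss and Price (1980) fit for water vapour pressure over seawater, \(\ln p\text{H}_2\text{O} = 24.4543 - 67.4509 (100/T) - 4.8489 \ln(T/100) - 0.000544\,S\) (\(p\text{H}_2\text{O}\) in atm, \(T\) in kelvin). The sea level pressure is assumed to be in Pa (as provided by ERA5) and is converted to atm, so that \(x\text{CO}_2\) in ppm yields \(p\text{CO}_2\) in µatm. The function names and inputs are hypothetical.

```python
import numpy as np

PA_PER_ATM = 101325.0

def vapour_pressure_weiss_price(temp_c, salinity):
    """Water vapour pressure over seawater [atm], Weiss and Price (1980) fit.
    (Swapped in for illustration; SeaFlux uses Dickson et al. 2007.)"""
    t_k = np.asarray(temp_c) + 273.15
    return np.exp(24.4543 - 67.4509 * (100.0 / t_k)
                  - 4.8489 * np.log(t_k / 100.0) - 0.000544 * salinity)

def pco2_atm(xco2_ppm, msl_pa, temp_c, salinity):
    """Hypothetical helper: pCO2_atm [uatm] = xCO2 [ppm] * (P_atm - pH2O) [atm]."""
    p_atm = np.asarray(msl_pa) / PA_PER_ATM   # Pa -> atm
    return xco2_ppm * (p_atm - vapour_pressure_weiss_price(temp_c, salinity))

ph2o = vapour_pressure_weiss_price(20.0, 35.0)   # atm
pco2 = pco2_atm(410.0, 101325.0, 20.0, 35.0)     # uatm
```

At 20 °C the water vapour correction lowers \(p\text{CO}_2^{atm}\) by roughly 2% relative to \(x\text{CO}_2\) at one atmosphere, which is why it cannot be neglected.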

Ice cover and sea surface temperature: We use the NOAA OI.v2 SST monthly fields. These are derived by linearly interpolating the weekly optimum interpolation (OI) version 2 fields to daily fields and then averaging the daily values over each month. The monthly fields have the same format and spatial resolution as the weekly fields. The ice field shows the approximate monthly average of the ice concentration values input to the SST analysis. Ice concentration is stored as the percentage of area covered. In the ice fields, land and coastal grid cells are set to the netCDF missing value.
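Because the NOAA ice field is stored in percent and uses the missing value on some cells, it has to be converted before it can serve as \(ice^{free}\) in the bulk formula. The minimal NumPy sketch below assumes that missing cells should be treated as having no ice and that \(ice^{free} = 1 - ice^{conc}\) with \(ice^{conc}\) expressed as a fraction. The function name is hypothetical.

```python
import numpy as np

def ice_free_fraction(ice_percent):
    """Hypothetical helper: ice concentration [%] -> ice-free fraction [0-1].
    Missing values (NaN) are assumed to mean no ice."""
    ice_conc = np.nan_to_num(np.asarray(ice_percent, dtype=float), nan=0.0) / 100.0
    return 1.0 - ice_conc

ice_percent = np.array([0.0, 25.0, 100.0, np.nan])
ice_free = ice_free_fraction(ice_percent)
```

Treating missing cells as ice free is harmless wherever \(p\text{CO}_2\) itself is missing, because multiplying by NaN still yields NaN in the flux.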

Salinity: We use the EN4.2.1 salinity at the shallowest level, specifically the objective analyses with the Gouretski and Reseghetti (2010) corrections.

# download flux data - verbosity is False by default
sf_data =

ds = sf_data.sel(wind='ERA5').drop_vars('wind')  # keep only the ERA5-based kw
# use the SeaFlux function to calculate area
ds['area'] = sf.get_area_from_dataset(ds)

# assigning variables and performing unit transformations
kw = ds.kw_scaled * (24/100)      # cm/hr --> m/d
K0 = ds.sol_Weiss74               # mol/m3/uatm
dpco2 = csir_filled - ds.pCO2atm  # uatm
ice_free = 1 -   # fill the open ocean with zeros
area = ds.area                    # m2
# bulk flux calculation
flux_mol_m2_day = kw * K0 * dpco2 * ice_free
flux_avg_yr = flux_mol_m2_day.mean('time') * 365  # molC/m2/year
flux_integrated = (flux_mol_m2_day * area * 365 * 12.011).sum(dim=['lat', 'lon'])  # gC/year
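sf.get_area_from_dataset supplies the grid-cell area used for the integration. As a rough cross-check, the area of a regular latitude-longitude cell on a sphere is \(R^2 \Delta\lambda (\sin\varphi_2 - \sin\varphi_1)\). The sketch below assumes a spherical Earth with \(R = 6371\) km, so it will differ slightly from an ellipsoid-based calculation; the function name is hypothetical, and pyseaflux does not necessarily compute area this way.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean radius, spherical-Earth assumption

def grid_cell_area(lat_centres, lon_centres, res_deg=1.0):
    """Hypothetical helper: area [m2] of regular lat/lon cells on a sphere,
    returned with shape (lat, lon)."""
    lat = np.deg2rad(np.asarray(lat_centres))
    half = np.deg2rad(res_deg) / 2.0
    # R^2 * dlon * (sin(lat_north) - sin(lat_south)) for each latitude band
    band = EARTH_RADIUS_M ** 2 * np.deg2rad(res_deg) * (np.sin(lat + half) - np.sin(lat - half))
    return np.repeat(band[:, None], len(lon_centres), axis=1)

lat = np.arange(-89.5, 90, 1.0)    # SeaFlux latitudes
lon = np.arange(-179.5, 180, 1.0)  # SeaFlux longitudes
area_sphere = grid_cell_area(lat, lon)
```

The cells sum to the full surface of the sphere, \(4\pi R^2\). Multiplying a flux in molC m\(^{-2}\) d\(^{-1}\) by this area gives molC d\(^{-1}\) per cell, as in the flux_integrated line above.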

Unit analysis

The SeaFlux product provides all the remaining parameters needed to calculate fluxes, so the calculation reduces to a simple multiplication. However, the units must be handled carefully. The table below lists the units of each SeaFlux variable used in the bulk flux calculation and the conversion applied to each.


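| Variable | SeaFlux units | Conversion | Output units |
|---|---|---|---|
| \(\Delta p\text{CO}_2\) | µatm | | µatm |
| \(K_0\) | mol m\(^{-3}\) µatm\(^{-1}\) | | mol m\(^{-3}\) µatm\(^{-1}\) |
| \(k_w\) | cm hr\(^{-1}\) | \(\times \frac{24}{100}\) | m day\(^{-1}\) |
| \(ice^{free}\) | ice fraction | \(1 - ice^{conc}\) | ice-free fraction |
| \(F\text{CO}_2\) | | \(K_0 \cdot k_w \cdot \Delta p\text{CO}_2 \cdot ice^{free}\) | molC m\(^{-2}\) d\(^{-1}\) |
| Integrated \(F\text{CO}_2\) | molC m\(^{-2}\) d\(^{-1}\) | \(\times\) (m\(^2 \cdot\) 12.01 g mol\(^{-1}\) \(\cdot\) 365 d yr\(^{-1}\)) | gC yr\(^{-1}\) |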

The figure below shows the fluxes calculated using the SeaFlux data. Notice that the ice-covered regions, particularly the Arctic, have low air-sea CO\(_2\) fluxes. The output has been multiplied by 365 days per year to convert the flux to molC m\(^{-2}\) yr\(^{-1}\).

The bottom figure shows the globally integrated air-sea CO\(_2\) fluxes. Here, the fluxes have been multiplied by the area (\(m^2\)) and converted to \(gC\ yr^{-1}\) (using the conversion shown in the table above).
