Accessing ICESat-2 Data#

Learning Objectives

  • Use icepyx to search, download, and read ICESat-2 granules

  • Use sliderule to get GeoDataFrames of ICESat-2 data

  • Use h5coro to directly read ICESat-2 granules in an S3 bucket

Part 1: icepyx#

icepyx is a community and software library for searching, downloading, and reading ICESat-2 data. While opening data should be straightforward, there are some oddities in navigating the highly nested organization and hundreds of variables of the ICESat-2 data. icepyx provides tools to help with those oddities.

icepyx was started and initially developed by Jessica Scheick to provide easy programmatic access to ICESat-2 data (before earthaccess existed!) and facilitate collaborative development around ICESat-2 data products, including training, skill building, and support around practicing open science and contributing to open-source software. Thanks to contributions from countless community members, icepyx can (for ICESat-2 data):

  • search for available data granules (data files)

  • order and download data or access it directly in the cloud

  • order a subset of data: clipped in space, time, containing fewer variables, or a few other options provided by NSIDC

  • search through the available ICESat-2 data variables

  • read ICESat-2 data into xarray Datasets, including merging data from multiple files

Under the hood, icepyx relies on earthaccess to help handle authentication, especially for obtaining S3 tokens to access ICESat-2 data in the cloud. All this happens without the user needing to take any action other than supplying their Earthdata Login credentials using one of the methods described in the earthaccess tutorial.

Credit#

This part of the notebook is based on an icepyx tutorial originally created by Rachel Wegener, Univ. of Maryland, and updated by Amy Steiker, NSIDC, and Jessica Scheick, Univ. of New Hampshire. It was updated in May 2024 to use (at a minimum) v1.0.0 of icepyx.

For the original notebook, which includes additional examples and information, see: https://book.cryointhecloud.com/tutorials/NASA-Earthdata-Cloud-Access/4.icepyx.html

For more information#

GitHub: icesat2py/icepyx
Documentation: https://icepyx.readthedocs.io/en/latest/

Prerequisites#

  • An Earthdata Login account.

  • A .netrc file in your home directory that contains your Earthdata Login credentials.
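
Before running the cells below, it can help to confirm that the .netrc prerequisite is actually met. The following is a minimal sketch using only Python's standard-library netrc module. The helper name has_earthdata_entry is hypothetical, and urs.earthdata.nasa.gov is assumed to be the Earthdata Login host your credentials are stored under.

```python
import netrc
from pathlib import Path

# Assumed Earthdata Login host name used as the "machine" entry in .netrc
EDL_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_entry(netrc_path=None):
    """Return True if the .netrc file has an entry for the Earthdata Login host.

    netrc_path is optional; when omitted, ~/.netrc is checked.
    """
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.is_file():
        return False
    try:
        # authenticators() returns None when there is no entry for the host
        # (and no "default" entry) in the file.
        entry = netrc.netrc(str(path)).authenticators(EDL_HOST)
    except netrc.NetrcParseError:
        # A malformed file is treated the same as a missing entry.
        return False
    return entry is not None
```

Calling has_earthdata_entry() with no argument checks ~/.netrc. It only confirms that an entry exists, not that the stored credentials are valid.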

import icepyx as ipx
import json
import math
import warnings

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

from shapely.geometry import shape, GeometryCollection

Example 1: Search and Download ATL08 Granule#

# Open a geojson of our area of interest
with open("./grandmesa.geojson") as f:
    features = json.load(f)["features"]

grandmesa = GeometryCollection([shape(feature["geometry"]).buffer(0) for feature in features])
grandmesa
# Use our search parameters to set up a search Query
short_name = 'ATL08'
spatial_extent = list(grandmesa.bounds)
date_range = ['2019-12-01','2019-12-12']
region = ipx.Query(short_name, spatial_extent, date_range)
# Display if any data files, or granules, matched our search
region.avail_granules(ids=True)
[['ATL08_20191211143520_11560506_006_01.h5']]
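
The spatial_extent used above is a bounding box. shapely's bounds returns (min x, min y, max x, max y), which for longitude/latitude data is the lower-left corner followed by the upper-right corner. As a rough illustration of what that call computes, the sketch below derives the same kind of box directly from GeoJSON features in plain Python. geojson_bounds is a hypothetical helper, the sample square is made up, and the sketch assumes every feature has a geometry with a coordinates member (so no GeometryCollection geometries).

```python
def geojson_bounds(features):
    """Return [min_lon, min_lat, max_lon, max_lat] for a list of GeoJSON features."""
    lons, lats = [], []

    def walk(coords):
        # Skip empty coordinate lists.
        if not coords:
            return
        # A position is a list of numbers [lon, lat, ...]; anything else is nested.
        if isinstance(coords[0], (int, float)):
            lons.append(coords[0])
            lats.append(coords[1])
        else:
            for part in coords:
                walk(part)

    for feature in features:
        walk(feature["geometry"]["coordinates"])
    return [min(lons), min(lats), max(lons), max(lats)]


# Made-up square polygon used only to exercise the helper
square = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-108.3, 38.8], [-107.7, 38.8], [-107.7, 39.2],
            [-108.3, 39.2], [-108.3, 38.8],
        ]],
    },
}
print(geojson_bounds([square]))
```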
# We can also get the S3 urls
print(region.avail_granules(ids=True, cloud=True))
s3urls = region.avail_granules(ids=True, cloud=True)[1]
[['ATL08_20191211143520_11560506_006_01.h5'], ['s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/006/2019/12/11/ATL08_20191211143520_11560506_006_01.h5']]
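
The granule name and S3 URL printed above both carry useful metadata. The sketch below pulls them apart with the standard library, assuming the usual ATLxx_[yyyymmdd][hhmmss]_[ttttccss]_[vvv_rr].h5 naming pattern (reference ground track, cycle, and region packed into the third field). The helper names are hypothetical, and the optional processed_ prefix is an assumption based on the downloaded file name that appears later in this notebook.

```python
import re
from urllib.parse import urlparse

# Assumed ICESat-2 granule naming pattern; "processed_" may prefix ordered files.
GRANULE_RE = re.compile(
    r"^(?:processed_)?(?P<product>ATL\d{2})_"
    r"(?P<date>\d{8})(?P<time>\d{6})_"
    r"(?P<rgt>\d{4})(?P<cycle>\d{2})(?P<region>\d{2})_"
    r"(?P<version>\d{3})_(?P<revision>\d{2})\.h5$"
)


def parse_granule_name(name):
    """Split a granule file name into its labeled fields."""
    match = GRANULE_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognized granule name: {name!r}")
    fields = match.groupdict()
    # Track, cycle, and region are easier to use as integers.
    for key in ("rgt", "cycle", "region"):
        fields[key] = int(fields[key])
    return fields


def split_s3_url(url):
    """Return (bucket, key) for an s3:// URL."""
    parsed = urlparse(url)
    return parsed.netloc, parsed.path.lstrip("/")


info = parse_granule_name("ATL08_20191211143520_11560506_006_01.h5")
print(info["rgt"], info["cycle"], info["region"])
```

For the granule found above, this reports track 1156, cycle 5, and region 6.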
# Download the granules into a folder called '/tmp/grandmesa_ATL08'
region.download_granules('/tmp/grandmesa_ATL08')
Total number of data order requests is  1  for  1  granules.
Data request  1  of  1  is submitting to NSIDC
order ID:  5000005727224
Initial status of your order request at NSIDC is:  processing
Your order status is still  processing  at NSIDC. Please continue waiting... this may take a few moments.
Your order is: complete
Beginning download of zipped output...
Data request 5000005727224 of  1  order(s) is downloaded.
Download complete
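
Once the download finishes, a quick listing confirms what landed in the folder before it is handed to a reader. This is a minimal standard-library sketch. find_granules is a hypothetical helper name, and the .h5 suffix is assumed from the granule names above.

```python
from pathlib import Path


def find_granules(folder, pattern="*.h5"):
    """Return a sorted list of files under folder (searched recursively) matching pattern."""
    return sorted(Path(folder).rglob(pattern))


# Example: list whatever HDF5 files exist in the download folder used above.
for path in find_granules("/tmp/grandmesa_ATL08"):
    print(path.name, path.stat().st_size, "bytes")
```

If the folder does not exist or holds no matching files, the list is simply empty and nothing is printed.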

Example 2: Reading a Granule with icepyx#

To read a file with icepyx there are several steps:

  1. Create a Read object. This sets up an initial connection to your file(s) and validates the metadata.

  2. Tell the Read object which variables you would like to read.

  3. Load your data!

Create a Read object#

# access the file you've downloaded
reader = ipx.Read('/tmp/grandmesa_ATL08')
reader
<icepyx.core.read.Read at 0x7f1eb3fb7050>

Explore your variables#

reader.vars.avail()
['ancillary_data/atlas_sdp_gps_epoch',
 'ancillary_data/control',
 'ancillary_data/data_end_utc',
 'ancillary_data/data_start_utc',
 'ancillary_data/end_cycle',
 'ancillary_data/end_delta_time',
 'ancillary_data/end_geoseg',
 'ancillary_data/end_gpssow',
 'ancillary_data/end_gpsweek',
 'ancillary_data/end_orbit',
 'ancillary_data/end_region',
 'ancillary_data/end_rgt',
 'ancillary_data/granule_end_utc',
 'ancillary_data/granule_start_utc',
 'ancillary_data/land/atl08_region',
 'ancillary_data/land/bin_size_h',
 'ancillary_data/land/bin_size_n',
 'ancillary_data/land/bright_thresh',
 'ancillary_data/land/ca_class',
 'ancillary_data/land/can_noise_thresh',
 'ancillary_data/land/can_stat_thresh',
 'ancillary_data/land/canopy20m_thresh',
 'ancillary_data/land/canopy_flag_switch',
 'ancillary_data/land/canopy_seg',
 'ancillary_data/land/class_thresh',
 'ancillary_data/land/cloud_filter_switch',
 'ancillary_data/land/del_amp',
 'ancillary_data/land/del_mu',
 'ancillary_data/land/del_sigma',
 'ancillary_data/land/dem_filter_switch',
 'ancillary_data/land/dem_removal_percent_limit',
 'ancillary_data/land/dragann_switch',
 'ancillary_data/land/dseg',
 'ancillary_data/land/dseg_buf',
 'ancillary_data/land/fnlgnd_filter_switch',
 'ancillary_data/land/gnd_stat_thresh',
 'ancillary_data/land/gthresh_factor',
 'ancillary_data/land/h_canopy_perc',
 'ancillary_data/land/iter_gnd',
 'ancillary_data/land/iter_max',
 'ancillary_data/land/lseg',
 'ancillary_data/land/lseg_buf',
 'ancillary_data/land/lw_filt_bnd',
 'ancillary_data/land/lw_gnd_bnd',
 'ancillary_data/land/lw_toc_bnd',
 'ancillary_data/land/lw_toc_cut',
 'ancillary_data/land/max_atl03files',
 'ancillary_data/land/max_atl09files',
 'ancillary_data/land/max_peaks',
 'ancillary_data/land/max_try',
 'ancillary_data/land/min_nphs',
 'ancillary_data/land/n_dec_mode',
 'ancillary_data/land/night_thresh',
 'ancillary_data/land/noise_class',
 'ancillary_data/land/outlier_filter_switch',
 'ancillary_data/land/p_static',
 'ancillary_data/land/ph_removal_percent_limit',
 'ancillary_data/land/proc_geoseg',
 'ancillary_data/land/psf',
 'ancillary_data/land/ref_dem_limit',
 'ancillary_data/land/ref_finalground_limit',
 'ancillary_data/land/relief_hbot',
 'ancillary_data/land/relief_htop',
 'ancillary_data/land/shp_param',
 'ancillary_data/land/sig_rsq_search',
 'ancillary_data/land/sseg',
 'ancillary_data/land/stat20m_thresh',
 'ancillary_data/land/stat_thresh',
 'ancillary_data/land/tc_thresh',
 'ancillary_data/land/te_class',
 'ancillary_data/land/terrain20m_thresh',
 'ancillary_data/land/toc_class',
 'ancillary_data/land/up_filt_bnd',
 'ancillary_data/land/up_gnd_bnd',
 'ancillary_data/land/up_toc_bnd',
 'ancillary_data/land/up_toc_cut',
 'ancillary_data/land/yapc_switch',
 'ancillary_data/qa_at_interval',
 'ancillary_data/release',
 'ancillary_data/start_cycle',
 'ancillary_data/start_delta_time',
 'ancillary_data/start_geoseg',
 'ancillary_data/start_gpssow',
 'ancillary_data/start_gpsweek',
 'ancillary_data/start_orbit',
 'ancillary_data/start_region',
 'ancillary_data/start_rgt',
 'ancillary_data/version',
 'ds_geosegments',
 'ds_metrics',
 'ds_surf_type',
 'gt1l/land_segments/asr',
 'gt1l/land_segments/atlas_pa',
 'gt1l/land_segments/beam_azimuth',
 'gt1l/land_segments/beam_coelev',
 'gt1l/land_segments/brightness_flag',
 'gt1l/land_segments/canopy/can_noise',
 'gt1l/land_segments/canopy/canopy_h_metrics',
 'gt1l/land_segments/canopy/canopy_h_metrics_abs',
 'gt1l/land_segments/canopy/canopy_openness',
 'gt1l/land_segments/canopy/canopy_rh_conf',
 'gt1l/land_segments/canopy/centroid_height',
 'gt1l/land_segments/canopy/h_canopy',
 'gt1l/land_segments/canopy/h_canopy_20m',
 'gt1l/land_segments/canopy/h_canopy_abs',
 'gt1l/land_segments/canopy/h_canopy_quad',
 'gt1l/land_segments/canopy/h_canopy_uncertainty',
 'gt1l/land_segments/canopy/h_dif_canopy',
 'gt1l/land_segments/canopy/h_max_canopy',
 'gt1l/land_segments/canopy/h_max_canopy_abs',
 'gt1l/land_segments/canopy/h_mean_canopy',
 'gt1l/land_segments/canopy/h_mean_canopy_abs',
 'gt1l/land_segments/canopy/h_median_canopy',
 'gt1l/land_segments/canopy/h_median_canopy_abs',
 'gt1l/land_segments/canopy/h_min_canopy',
 'gt1l/land_segments/canopy/h_min_canopy_abs',
 'gt1l/land_segments/canopy/n_ca_photons',
 'gt1l/land_segments/canopy/n_toc_photons',
 'gt1l/land_segments/canopy/photon_rate_can',
 'gt1l/land_segments/canopy/photon_rate_can_nr',
 'gt1l/land_segments/canopy/segment_cover',
 'gt1l/land_segments/canopy/subset_can_flag',
 'gt1l/land_segments/canopy/toc_roughness',
 'gt1l/land_segments/cloud_flag_atm',
 'gt1l/land_segments/cloud_fold_flag',
 'gt1l/land_segments/delta_time',
 'gt1l/land_segments/delta_time_beg',
 'gt1l/land_segments/delta_time_end',
 'gt1l/land_segments/dem_flag',
 'gt1l/land_segments/dem_h',
 'gt1l/land_segments/dem_removal_flag',
 'gt1l/land_segments/h_dif_ref',
 'gt1l/land_segments/last_seg_extend',
 'gt1l/land_segments/latitude',
 'gt1l/land_segments/latitude_20m',
 'gt1l/land_segments/layer_flag',
 'gt1l/land_segments/longitude',
 'gt1l/land_segments/longitude_20m',
 'gt1l/land_segments/msw_flag',
 'gt1l/land_segments/n_seg_ph',
 'gt1l/land_segments/night_flag',
 'gt1l/land_segments/ph_ndx_beg',
 'gt1l/land_segments/ph_removal_flag',
 'gt1l/land_segments/psf_flag',
 'gt1l/land_segments/rgt',
 'gt1l/land_segments/sat_flag',
 'gt1l/land_segments/segment_id_beg',
 'gt1l/land_segments/segment_id_end',
 'gt1l/land_segments/segment_landcover',
 'gt1l/land_segments/segment_snowcover',
 'gt1l/land_segments/segment_watermask',
 'gt1l/land_segments/sigma_across',
 'gt1l/land_segments/sigma_along',
 'gt1l/land_segments/sigma_atlas_land',
 'gt1l/land_segments/sigma_h',
 'gt1l/land_segments/sigma_topo',
 'gt1l/land_segments/snr',
 'gt1l/land_segments/solar_azimuth',
 'gt1l/land_segments/solar_elevation',
 'gt1l/land_segments/surf_type',
 'gt1l/land_segments/terrain/h_te_best_fit',
 'gt1l/land_segments/terrain/h_te_best_fit_20m',
 'gt1l/land_segments/terrain/h_te_interp',
 'gt1l/land_segments/terrain/h_te_max',
 'gt1l/land_segments/terrain/h_te_mean',
 'gt1l/land_segments/terrain/h_te_median',
 'gt1l/land_segments/terrain/h_te_min',
 'gt1l/land_segments/terrain/h_te_mode',
 'gt1l/land_segments/terrain/h_te_rh25',
 'gt1l/land_segments/terrain/h_te_skew',
 'gt1l/land_segments/terrain/h_te_std',
 'gt1l/land_segments/terrain/h_te_uncertainty',
 'gt1l/land_segments/terrain/n_te_photons',
 'gt1l/land_segments/terrain/photon_rate_te',
 'gt1l/land_segments/terrain/subset_te_flag',
 'gt1l/land_segments/terrain/terrain_slope',
 'gt1l/land_segments/terrain_flg',
 'gt1l/land_segments/urban_flag',
 'gt1l/signal_photons/classed_pc_flag',
 'gt1l/signal_photons/classed_pc_indx',
 'gt1l/signal_photons/d_flag',
 'gt1l/signal_photons/delta_time',
 'gt1l/signal_photons/ph_h',
 'gt1l/signal_photons/ph_segment_id',
 'gt1r/land_segments/asr',
 'gt1r/land_segments/atlas_pa',
 'gt1r/land_segments/beam_azimuth',
 'gt1r/land_segments/beam_coelev',
 'gt1r/land_segments/brightness_flag',
 'gt1r/land_segments/canopy/can_noise',
 'gt1r/land_segments/canopy/canopy_h_metrics',
 'gt1r/land_segments/canopy/canopy_h_metrics_abs',
 'gt1r/land_segments/canopy/canopy_openness',
 'gt1r/land_segments/canopy/canopy_rh_conf',
 'gt1r/land_segments/canopy/centroid_height',
 'gt1r/land_segments/canopy/h_canopy',
 'gt1r/land_segments/canopy/h_canopy_20m',
 'gt1r/land_segments/canopy/h_canopy_abs',
 'gt1r/land_segments/canopy/h_canopy_quad',
 'gt1r/land_segments/canopy/h_canopy_uncertainty',
 'gt1r/land_segments/canopy/h_dif_canopy',
 'gt1r/land_segments/canopy/h_max_canopy',
 'gt1r/land_segments/canopy/h_max_canopy_abs',
 'gt1r/land_segments/canopy/h_mean_canopy',
 'gt1r/land_segments/canopy/h_mean_canopy_abs',
 'gt1r/land_segments/canopy/h_median_canopy',
 'gt1r/land_segments/canopy/h_median_canopy_abs',
 'gt1r/land_segments/canopy/h_min_canopy',
 'gt1r/land_segments/canopy/h_min_canopy_abs',
 'gt1r/land_segments/canopy/n_ca_photons',
 'gt1r/land_segments/canopy/n_toc_photons',
 'gt1r/land_segments/canopy/photon_rate_can',
 'gt1r/land_segments/canopy/photon_rate_can_nr',
 'gt1r/land_segments/canopy/segment_cover',
 'gt1r/land_segments/canopy/subset_can_flag',
 'gt1r/land_segments/canopy/toc_roughness',
 'gt1r/land_segments/cloud_flag_atm',
 'gt1r/land_segments/cloud_fold_flag',
 'gt1r/land_segments/delta_time',
 'gt1r/land_segments/delta_time_beg',
 'gt1r/land_segments/delta_time_end',
 'gt1r/land_segments/dem_flag',
 'gt1r/land_segments/dem_h',
 'gt1r/land_segments/dem_removal_flag',
 'gt1r/land_segments/h_dif_ref',
 'gt1r/land_segments/last_seg_extend',
 'gt1r/land_segments/latitude',
 'gt1r/land_segments/latitude_20m',
 'gt1r/land_segments/layer_flag',
 'gt1r/land_segments/longitude',
 'gt1r/land_segments/longitude_20m',
 'gt1r/land_segments/msw_flag',
 'gt1r/land_segments/n_seg_ph',
 'gt1r/land_segments/night_flag',
 'gt1r/land_segments/ph_ndx_beg',
 'gt1r/land_segments/ph_removal_flag',
 'gt1r/land_segments/psf_flag',
 'gt1r/land_segments/rgt',
 'gt1r/land_segments/sat_flag',
 'gt1r/land_segments/segment_id_beg',
 'gt1r/land_segments/segment_id_end',
 'gt1r/land_segments/segment_landcover',
 'gt1r/land_segments/segment_snowcover',
 'gt1r/land_segments/segment_watermask',
 'gt1r/land_segments/sigma_across',
 'gt1r/land_segments/sigma_along',
 'gt1r/land_segments/sigma_atlas_land',
 'gt1r/land_segments/sigma_h',
 'gt1r/land_segments/sigma_topo',
 'gt1r/land_segments/snr',
 'gt1r/land_segments/solar_azimuth',
 'gt1r/land_segments/solar_elevation',
 'gt1r/land_segments/surf_type',
 'gt1r/land_segments/terrain/h_te_best_fit',
 'gt1r/land_segments/terrain/h_te_best_fit_20m',
 'gt1r/land_segments/terrain/h_te_interp',
 'gt1r/land_segments/terrain/h_te_max',
 'gt1r/land_segments/terrain/h_te_mean',
 'gt1r/land_segments/terrain/h_te_median',
 'gt1r/land_segments/terrain/h_te_min',
 'gt1r/land_segments/terrain/h_te_mode',
 'gt1r/land_segments/terrain/h_te_rh25',
 'gt1r/land_segments/terrain/h_te_skew',
 'gt1r/land_segments/terrain/h_te_std',
 'gt1r/land_segments/terrain/h_te_uncertainty',
 'gt1r/land_segments/terrain/n_te_photons',
 'gt1r/land_segments/terrain/photon_rate_te',
 'gt1r/land_segments/terrain/subset_te_flag',
 'gt1r/land_segments/terrain/terrain_slope',
 'gt1r/land_segments/terrain_flg',
 'gt1r/land_segments/urban_flag',
 'gt1r/signal_photons/classed_pc_flag',
 'gt1r/signal_photons/classed_pc_indx',
 'gt1r/signal_photons/d_flag',
 'gt1r/signal_photons/delta_time',
 'gt1r/signal_photons/ph_h',
 'gt1r/signal_photons/ph_segment_id',
 'gt2l/land_segments/asr',
 'gt2l/land_segments/atlas_pa',
 'gt2l/land_segments/beam_azimuth',
 'gt2l/land_segments/beam_coelev',
 'gt2l/land_segments/brightness_flag',
 'gt2l/land_segments/canopy/can_noise',
 'gt2l/land_segments/canopy/canopy_h_metrics',
 'gt2l/land_segments/canopy/canopy_h_metrics_abs',
 'gt2l/land_segments/canopy/canopy_openness',
 'gt2l/land_segments/canopy/canopy_rh_conf',
 'gt2l/land_segments/canopy/centroid_height',
 'gt2l/land_segments/canopy/h_canopy',
 'gt2l/land_segments/canopy/h_canopy_20m',
 'gt2l/land_segments/canopy/h_canopy_abs',
 'gt2l/land_segments/canopy/h_canopy_quad',
 'gt2l/land_segments/canopy/h_canopy_uncertainty',
 'gt2l/land_segments/canopy/h_dif_canopy',
 'gt2l/land_segments/canopy/h_max_canopy',
 'gt2l/land_segments/canopy/h_max_canopy_abs',
 'gt2l/land_segments/canopy/h_mean_canopy',
 'gt2l/land_segments/canopy/h_mean_canopy_abs',
 'gt2l/land_segments/canopy/h_median_canopy',
 'gt2l/land_segments/canopy/h_median_canopy_abs',
 'gt2l/land_segments/canopy/h_min_canopy',
 'gt2l/land_segments/canopy/h_min_canopy_abs',
 'gt2l/land_segments/canopy/n_ca_photons',
 'gt2l/land_segments/canopy/n_toc_photons',
 'gt2l/land_segments/canopy/photon_rate_can',
 'gt2l/land_segments/canopy/photon_rate_can_nr',
 'gt2l/land_segments/canopy/segment_cover',
 'gt2l/land_segments/canopy/subset_can_flag',
 'gt2l/land_segments/canopy/toc_roughness',
 'gt2l/land_segments/cloud_flag_atm',
 'gt2l/land_segments/cloud_fold_flag',
 'gt2l/land_segments/delta_time',
 'gt2l/land_segments/delta_time_beg',
 'gt2l/land_segments/delta_time_end',
 'gt2l/land_segments/dem_flag',
 'gt2l/land_segments/dem_h',
 'gt2l/land_segments/dem_removal_flag',
 'gt2l/land_segments/h_dif_ref',
 'gt2l/land_segments/last_seg_extend',
 'gt2l/land_segments/latitude',
 'gt2l/land_segments/latitude_20m',
 'gt2l/land_segments/layer_flag',
 'gt2l/land_segments/longitude',
 'gt2l/land_segments/longitude_20m',
 'gt2l/land_segments/msw_flag',
 'gt2l/land_segments/n_seg_ph',
 'gt2l/land_segments/night_flag',
 'gt2l/land_segments/ph_ndx_beg',
 'gt2l/land_segments/ph_removal_flag',
 'gt2l/land_segments/psf_flag',
 'gt2l/land_segments/rgt',
 'gt2l/land_segments/sat_flag',
 'gt2l/land_segments/segment_id_beg',
 'gt2l/land_segments/segment_id_end',
 'gt2l/land_segments/segment_landcover',
 'gt2l/land_segments/segment_snowcover',
 'gt2l/land_segments/segment_watermask',
 'gt2l/land_segments/sigma_across',
 'gt2l/land_segments/sigma_along',
 'gt2l/land_segments/sigma_atlas_land',
 'gt2l/land_segments/sigma_h',
 'gt2l/land_segments/sigma_topo',
 'gt2l/land_segments/snr',
 'gt2l/land_segments/solar_azimuth',
 'gt2l/land_segments/solar_elevation',
 'gt2l/land_segments/surf_type',
 'gt2l/land_segments/terrain/h_te_best_fit',
 'gt2l/land_segments/terrain/h_te_best_fit_20m',
 'gt2l/land_segments/terrain/h_te_interp',
 'gt2l/land_segments/terrain/h_te_max',
 'gt2l/land_segments/terrain/h_te_mean',
 'gt2l/land_segments/terrain/h_te_median',
 'gt2l/land_segments/terrain/h_te_min',
 'gt2l/land_segments/terrain/h_te_mode',
 'gt2l/land_segments/terrain/h_te_rh25',
 'gt2l/land_segments/terrain/h_te_skew',
 'gt2l/land_segments/terrain/h_te_std',
 'gt2l/land_segments/terrain/h_te_uncertainty',
 'gt2l/land_segments/terrain/n_te_photons',
 'gt2l/land_segments/terrain/photon_rate_te',
 'gt2l/land_segments/terrain/subset_te_flag',
 'gt2l/land_segments/terrain/terrain_slope',
 'gt2l/land_segments/terrain_flg',
 'gt2l/land_segments/urban_flag',
 'gt2l/signal_photons/classed_pc_flag',
 'gt2l/signal_photons/classed_pc_indx',
 'gt2l/signal_photons/d_flag',
 'gt2l/signal_photons/delta_time',
 'gt2l/signal_photons/ph_h',
 'gt2l/signal_photons/ph_segment_id',
 'gt2r/land_segments/asr',
 'gt2r/land_segments/atlas_pa',
 'gt2r/land_segments/beam_azimuth',
 'gt2r/land_segments/beam_coelev',
 'gt2r/land_segments/brightness_flag',
 'gt2r/land_segments/canopy/can_noise',
 'gt2r/land_segments/canopy/canopy_h_metrics',
 'gt2r/land_segments/canopy/canopy_h_metrics_abs',
 'gt2r/land_segments/canopy/canopy_openness',
 'gt2r/land_segments/canopy/canopy_rh_conf',
 'gt2r/land_segments/canopy/centroid_height',
 'gt2r/land_segments/canopy/h_canopy',
 'gt2r/land_segments/canopy/h_canopy_20m',
 'gt2r/land_segments/canopy/h_canopy_abs',
 'gt2r/land_segments/canopy/h_canopy_quad',
 'gt2r/land_segments/canopy/h_canopy_uncertainty',
 'gt2r/land_segments/canopy/h_dif_canopy',
 'gt2r/land_segments/canopy/h_max_canopy',
 'gt2r/land_segments/canopy/h_max_canopy_abs',
 'gt2r/land_segments/canopy/h_mean_canopy',
 'gt2r/land_segments/canopy/h_mean_canopy_abs',
 'gt2r/land_segments/canopy/h_median_canopy',
 'gt2r/land_segments/canopy/h_median_canopy_abs',
 'gt2r/land_segments/canopy/h_min_canopy',
 'gt2r/land_segments/canopy/h_min_canopy_abs',
 'gt2r/land_segments/canopy/n_ca_photons',
 'gt2r/land_segments/canopy/n_toc_photons',
 'gt2r/land_segments/canopy/photon_rate_can',
 'gt2r/land_segments/canopy/photon_rate_can_nr',
 'gt2r/land_segments/canopy/segment_cover',
 'gt2r/land_segments/canopy/subset_can_flag',
 'gt2r/land_segments/canopy/toc_roughness',
 'gt2r/land_segments/cloud_flag_atm',
 'gt2r/land_segments/cloud_fold_flag',
 'gt2r/land_segments/delta_time',
 'gt2r/land_segments/delta_time_beg',
 'gt2r/land_segments/delta_time_end',
 'gt2r/land_segments/dem_flag',
 'gt2r/land_segments/dem_h',
 'gt2r/land_segments/dem_removal_flag',
 'gt2r/land_segments/h_dif_ref',
 'gt2r/land_segments/last_seg_extend',
 'gt2r/land_segments/latitude',
 'gt2r/land_segments/latitude_20m',
 'gt2r/land_segments/layer_flag',
 'gt2r/land_segments/longitude',
 'gt2r/land_segments/longitude_20m',
 'gt2r/land_segments/msw_flag',
 'gt2r/land_segments/n_seg_ph',
 'gt2r/land_segments/night_flag',
 'gt2r/land_segments/ph_ndx_beg',
 'gt2r/land_segments/ph_removal_flag',
 'gt2r/land_segments/psf_flag',
 'gt2r/land_segments/rgt',
 'gt2r/land_segments/sat_flag',
 'gt2r/land_segments/segment_id_beg',
 'gt2r/land_segments/segment_id_end',
 'gt2r/land_segments/segment_landcover',
 'gt2r/land_segments/segment_snowcover',
 'gt2r/land_segments/segment_watermask',
 'gt2r/land_segments/sigma_across',
 'gt2r/land_segments/sigma_along',
 'gt2r/land_segments/sigma_atlas_land',
 'gt2r/land_segments/sigma_h',
 'gt2r/land_segments/sigma_topo',
 'gt2r/land_segments/snr',
 'gt2r/land_segments/solar_azimuth',
 'gt2r/land_segments/solar_elevation',
 'gt2r/land_segments/surf_type',
 'gt2r/land_segments/terrain/h_te_best_fit',
 'gt2r/land_segments/terrain/h_te_best_fit_20m',
 'gt2r/land_segments/terrain/h_te_interp',
 'gt2r/land_segments/terrain/h_te_max',
 'gt2r/land_segments/terrain/h_te_mean',
 'gt2r/land_segments/terrain/h_te_median',
 'gt2r/land_segments/terrain/h_te_min',
 'gt2r/land_segments/terrain/h_te_mode',
 'gt2r/land_segments/terrain/h_te_rh25',
 'gt2r/land_segments/terrain/h_te_skew',
 'gt2r/land_segments/terrain/h_te_std',
 'gt2r/land_segments/terrain/h_te_uncertainty',
 'gt2r/land_segments/terrain/n_te_photons',
 'gt2r/land_segments/terrain/photon_rate_te',
 'gt2r/land_segments/terrain/subset_te_flag',
 'gt2r/land_segments/terrain/terrain_slope',
 'gt2r/land_segments/terrain_flg',
 'gt2r/land_segments/urban_flag',
 'gt2r/signal_photons/classed_pc_flag',
 'gt2r/signal_photons/classed_pc_indx',
 'gt2r/signal_photons/d_flag',
 'gt2r/signal_photons/delta_time',
 'gt2r/signal_photons/ph_h',
 'gt2r/signal_photons/ph_segment_id',
 'gt3l/land_segments/asr',
 'gt3l/land_segments/atlas_pa',
 'gt3l/land_segments/beam_azimuth',
 'gt3l/land_segments/beam_coelev',
 'gt3l/land_segments/brightness_flag',
 'gt3l/land_segments/canopy/can_noise',
 'gt3l/land_segments/canopy/canopy_h_metrics',
 'gt3l/land_segments/canopy/canopy_h_metrics_abs',
 'gt3l/land_segments/canopy/canopy_openness',
 'gt3l/land_segments/canopy/canopy_rh_conf',
 'gt3l/land_segments/canopy/centroid_height',
 'gt3l/land_segments/canopy/h_canopy',
 'gt3l/land_segments/canopy/h_canopy_20m',
 'gt3l/land_segments/canopy/h_canopy_abs',
 'gt3l/land_segments/canopy/h_canopy_quad',
 'gt3l/land_segments/canopy/h_canopy_uncertainty',
 'gt3l/land_segments/canopy/h_dif_canopy',
 'gt3l/land_segments/canopy/h_max_canopy',
 'gt3l/land_segments/canopy/h_max_canopy_abs',
 'gt3l/land_segments/canopy/h_mean_canopy',
 'gt3l/land_segments/canopy/h_mean_canopy_abs',
 'gt3l/land_segments/canopy/h_median_canopy',
 'gt3l/land_segments/canopy/h_median_canopy_abs',
 'gt3l/land_segments/canopy/h_min_canopy',
 'gt3l/land_segments/canopy/h_min_canopy_abs',
 'gt3l/land_segments/canopy/n_ca_photons',
 'gt3l/land_segments/canopy/n_toc_photons',
 'gt3l/land_segments/canopy/photon_rate_can',
 'gt3l/land_segments/canopy/photon_rate_can_nr',
 'gt3l/land_segments/canopy/segment_cover',
 'gt3l/land_segments/canopy/subset_can_flag',
 'gt3l/land_segments/canopy/toc_roughness',
 'gt3l/land_segments/cloud_flag_atm',
 'gt3l/land_segments/cloud_fold_flag',
 'gt3l/land_segments/delta_time',
 'gt3l/land_segments/delta_time_beg',
 'gt3l/land_segments/delta_time_end',
 'gt3l/land_segments/dem_flag',
 'gt3l/land_segments/dem_h',
 'gt3l/land_segments/dem_removal_flag',
 'gt3l/land_segments/h_dif_ref',
 'gt3l/land_segments/last_seg_extend',
 'gt3l/land_segments/latitude',
 'gt3l/land_segments/latitude_20m',
 'gt3l/land_segments/layer_flag',
 'gt3l/land_segments/longitude',
 'gt3l/land_segments/longitude_20m',
 'gt3l/land_segments/msw_flag',
 'gt3l/land_segments/n_seg_ph',
 'gt3l/land_segments/night_flag',
 'gt3l/land_segments/ph_ndx_beg',
 'gt3l/land_segments/ph_removal_flag',
 'gt3l/land_segments/psf_flag',
 'gt3l/land_segments/rgt',
 'gt3l/land_segments/sat_flag',
 'gt3l/land_segments/segment_id_beg',
 'gt3l/land_segments/segment_id_end',
 'gt3l/land_segments/segment_landcover',
 'gt3l/land_segments/segment_snowcover',
 'gt3l/land_segments/segment_watermask',
 'gt3l/land_segments/sigma_across',
 'gt3l/land_segments/sigma_along',
 'gt3l/land_segments/sigma_atlas_land',
 'gt3l/land_segments/sigma_h',
 'gt3l/land_segments/sigma_topo',
 'gt3l/land_segments/snr',
 'gt3l/land_segments/solar_azimuth',
 'gt3l/land_segments/solar_elevation',
 'gt3l/land_segments/surf_type',
 'gt3l/land_segments/terrain/h_te_best_fit',
 'gt3l/land_segments/terrain/h_te_best_fit_20m',
 'gt3l/land_segments/terrain/h_te_interp',
 'gt3l/land_segments/terrain/h_te_max',
 'gt3l/land_segments/terrain/h_te_mean',
 'gt3l/land_segments/terrain/h_te_median',
 'gt3l/land_segments/terrain/h_te_min',
 'gt3l/land_segments/terrain/h_te_mode',
 'gt3l/land_segments/terrain/h_te_rh25',
 'gt3l/land_segments/terrain/h_te_skew',
 'gt3l/land_segments/terrain/h_te_std',
 'gt3l/land_segments/terrain/h_te_uncertainty',
 'gt3l/land_segments/terrain/n_te_photons',
 'gt3l/land_segments/terrain/photon_rate_te',
 'gt3l/land_segments/terrain/subset_te_flag',
 'gt3l/land_segments/terrain/terrain_slope',
 'gt3l/land_segments/terrain_flg',
 'gt3l/land_segments/urban_flag',
 'gt3l/signal_photons/classed_pc_flag',
 'gt3l/signal_photons/classed_pc_indx',
 'gt3l/signal_photons/d_flag',
 'gt3l/signal_photons/delta_time',
 'gt3l/signal_photons/ph_h',
 'gt3l/signal_photons/ph_segment_id',
 'gt3r/land_segments/asr',
 'gt3r/land_segments/atlas_pa',
 'gt3r/land_segments/beam_azimuth',
 'gt3r/land_segments/beam_coelev',
 'gt3r/land_segments/brightness_flag',
 'gt3r/land_segments/canopy/can_noise',
 'gt3r/land_segments/canopy/canopy_h_metrics',
 'gt3r/land_segments/canopy/canopy_h_metrics_abs',
 'gt3r/land_segments/canopy/canopy_openness',
 'gt3r/land_segments/canopy/canopy_rh_conf',
 'gt3r/land_segments/canopy/centroid_height',
 'gt3r/land_segments/canopy/h_canopy',
 'gt3r/land_segments/canopy/h_canopy_20m',
 'gt3r/land_segments/canopy/h_canopy_abs',
 'gt3r/land_segments/canopy/h_canopy_quad',
 'gt3r/land_segments/canopy/h_canopy_uncertainty',
 'gt3r/land_segments/canopy/h_dif_canopy',
 'gt3r/land_segments/canopy/h_max_canopy',
 'gt3r/land_segments/canopy/h_max_canopy_abs',
 'gt3r/land_segments/canopy/h_mean_canopy',
 'gt3r/land_segments/canopy/h_mean_canopy_abs',
 'gt3r/land_segments/canopy/h_median_canopy',
 'gt3r/land_segments/canopy/h_median_canopy_abs',
 'gt3r/land_segments/canopy/h_min_canopy',
 'gt3r/land_segments/canopy/h_min_canopy_abs',
 'gt3r/land_segments/canopy/n_ca_photons',
 'gt3r/land_segments/canopy/n_toc_photons',
 'gt3r/land_segments/canopy/photon_rate_can',
 'gt3r/land_segments/canopy/photon_rate_can_nr',
 'gt3r/land_segments/canopy/segment_cover',
 'gt3r/land_segments/canopy/subset_can_flag',
 'gt3r/land_segments/canopy/toc_roughness',
 'gt3r/land_segments/cloud_flag_atm',
 'gt3r/land_segments/cloud_fold_flag',
 'gt3r/land_segments/delta_time',
 'gt3r/land_segments/delta_time_beg',
 'gt3r/land_segments/delta_time_end',
 'gt3r/land_segments/dem_flag',
 'gt3r/land_segments/dem_h',
 'gt3r/land_segments/dem_removal_flag',
 'gt3r/land_segments/h_dif_ref',
 'gt3r/land_segments/last_seg_extend',
 'gt3r/land_segments/latitude',
 'gt3r/land_segments/latitude_20m',
 'gt3r/land_segments/layer_flag',
 'gt3r/land_segments/longitude',
 'gt3r/land_segments/longitude_20m',
 'gt3r/land_segments/msw_flag',
 'gt3r/land_segments/n_seg_ph',
 'gt3r/land_segments/night_flag',
 'gt3r/land_segments/ph_ndx_beg',
 'gt3r/land_segments/ph_removal_flag',
 'gt3r/land_segments/psf_flag',
 'gt3r/land_segments/rgt',
 'gt3r/land_segments/sat_flag',
 'gt3r/land_segments/segment_id_beg',
 'gt3r/land_segments/segment_id_end',
 'gt3r/land_segments/segment_landcover',
 'gt3r/land_segments/segment_snowcover',
 'gt3r/land_segments/segment_watermask',
 'gt3r/land_segments/sigma_across',
 'gt3r/land_segments/sigma_along',
 'gt3r/land_segments/sigma_atlas_land',
 'gt3r/land_segments/sigma_h',
 'gt3r/land_segments/sigma_topo',
 'gt3r/land_segments/snr',
 'gt3r/land_segments/solar_azimuth',
 'gt3r/land_segments/solar_elevation',
 'gt3r/land_segments/surf_type',
 'gt3r/land_segments/terrain/h_te_best_fit',
 'gt3r/land_segments/terrain/h_te_best_fit_20m',
 'gt3r/land_segments/terrain/h_te_interp',
 'gt3r/land_segments/terrain/h_te_max',
 'gt3r/land_segments/terrain/h_te_mean',
 'gt3r/land_segments/terrain/h_te_median',
 'gt3r/land_segments/terrain/h_te_min',
 'gt3r/land_segments/terrain/h_te_mode',
 'gt3r/land_segments/terrain/h_te_rh25',
 'gt3r/land_segments/terrain/h_te_skew',
 'gt3r/land_segments/terrain/h_te_std',
 'gt3r/land_segments/terrain/h_te_uncertainty',
 'gt3r/land_segments/terrain/n_te_photons',
 'gt3r/land_segments/terrain/photon_rate_te',
 'gt3r/land_segments/terrain/subset_te_flag',
 'gt3r/land_segments/terrain/terrain_slope',
 'gt3r/land_segments/terrain_flg',
 'gt3r/land_segments/urban_flag',
 'gt3r/signal_photons/classed_pc_flag',
 'gt3r/signal_photons/classed_pc_indx',
 'gt3r/signal_photons/d_flag',
 'gt3r/signal_photons/delta_time',
 'gt3r/signal_photons/ph_h',
 'gt3r/signal_photons/ph_segment_id',
 'orbit_info/bounding_polygon_lat1',
 'orbit_info/bounding_polygon_lon1',
 'orbit_info/crossing_time',
 'orbit_info/cycle_number',
 'orbit_info/lan',
 'orbit_info/orbit_number',
 'orbit_info/rgt',
 'orbit_info/sc_orient',
 'orbit_info/sc_orient_time',
 'quality_assessment/qa_granule_fail_reason',
 'quality_assessment/qa_granule_pass_fail']

That's a lot of variables!
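
A list this long is easier to work with once it is filtered. The sketch below is plain Python operating on path strings like the ones printed above. filter_paths and count_by_group are hypothetical helper names, not part of icepyx, and the short sample list stands in for the full output.

```python
from collections import Counter


def filter_paths(paths, keyword=None, beam=None):
    """Keep paths whose first component equals beam and whose last component contains keyword."""
    selected = []
    for path in paths:
        parts = path.split("/")
        if beam is not None and parts[0] != beam:
            continue
        if keyword is not None and keyword not in parts[-1]:
            continue
        selected.append(path)
    return selected


def count_by_group(paths):
    """Count how many variables sit under each top-level group."""
    return Counter(path.split("/")[0] for path in paths)


# A few entries copied from the output above
sample = [
    "ancillary_data/release",
    "gt1l/land_segments/canopy/h_canopy",
    "gt1l/land_segments/canopy/h_canopy_20m",
    "gt1l/land_segments/terrain/h_te_mean",
    "gt1r/land_segments/canopy/h_canopy",
    "orbit_info/rgt",
]
print(filter_paths(sample, keyword="h_canopy", beam="gt1l"))
print(count_by_group(sample))
```

Passing the full list from the previous cell in place of sample works the same way, since each entry is just a slash-separated string.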

One key feature of icepyx is the ability to browse the variables available in the dataset. There are typically hundreds of variables in a single dataset, so that is a lot to sort through! Let’s take a moment to get oriented to the organization of ATL08 variables by first reviewing a few important pieces of the algorithm.

To create higher-level variables like canopy or terrain height, the ATL08 algorithm goes through a series of steps:

  1. Separate signal photons from noise photons

  2. Classify each of the signal photons as either terrain, canopy, or canopy top

  3. Remove the terrain elevation, so the heights are with respect to the ground

  4. Group the signal photons into 100 m segments. If a group contains a sufficient number of photons, calculate statistics for terrain and canopy (e.g., mean height, max height, standard deviation)
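
The last step, grouping photons into 100 m segments and summarizing each one, can be sketched with NumPy. This is only an illustration of fixed-length binning with a minimum-count rule, not the ATL08 implementation. The function name, the two input arrays (along-track distance and height above ground, both in meters), and the min_photons default of 50 are all assumptions made for the example.

```python
import numpy as np


def segment_stats(along_track_m, height_m, segment_length=100.0, min_photons=50):
    """Summarize photon heights per fixed-length along-track segment.

    Segments with fewer than min_photons photons are skipped.
    Returns a dict keyed by segment index.
    """
    along = np.asarray(along_track_m, dtype=float)
    height = np.asarray(height_m, dtype=float)
    # Segment 0 covers [0, 100) m, segment 1 covers [100, 200) m, and so on.
    seg_index = np.floor(along / segment_length).astype(int)

    stats = {}
    for seg in np.unique(seg_index):
        h = height[seg_index == seg]
        if h.size < min_photons:
            continue  # too few photons for a meaningful statistic
        stats[int(seg)] = {
            "n": int(h.size),
            "mean": float(h.mean()),
            "max": float(h.max()),
            "std": float(h.std()),
        }
    return stats


# Tiny made-up example: three photons in the first segment, two in the second
result = segment_stats([0, 10, 20, 150, 160], [1.0, 2.0, 3.0, 10.0, 20.0], min_photons=3)
print(result)
```

In this toy input only the first segment meets the three-photon threshold, so the second segment is dropped rather than reported with unreliable statistics.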

ATL08 Photon Classification Example

Fig. 4. An example of the classified photons produced from the ATL08 algorithm. Ground photons (red dots) are labeled as all photons falling within a point spread function distance of the estimated ground surface. The top of canopy photons (green dots) are photons that fall within a buffer distance from the upper canopy surface, and the photons that lie between the top of canopy surface and ground surface are labeled as canopy photons (blue dots). (Neuenschwander & Pitts, 2019)

ATL08 Structure

Load your variables#

reader.vars.append(var_list=['h_canopy', 'latitude', 'longitude'])
ds = reader.load()
ds
<xarray.Dataset> Size: 171kB
Dimensions:              (gran_idx: 1, photon_idx: 1852, spot: 6)
Coordinates:
  * gran_idx             (gran_idx) float64 8B 1.156e+05
  * photon_idx           (photon_idx) int64 15kB 0 1 2 3 ... 1848 1849 1850 1851
  * spot                 (spot) uint8 6B 1 2 3 4 5 6
    source_file          (gran_idx) <U70 280B '/tmp/grandmesa_ATL08/processed...
    delta_time           (photon_idx) datetime64[ns] 15kB 2019-12-11T14:40:39...
Data variables:
    sc_orient            (gran_idx) int8 1B 1
    cycle_number         (gran_idx) int8 1B 5
    rgt                  (gran_idx, spot, photon_idx) float32 44kB nan ... nan
    atlas_sdp_gps_epoch  (gran_idx) datetime64[ns] 8B 2018-01-01T00:00:18
    data_start_utc       (gran_idx) datetime64[ns] 8B 2019-12-11T14:35:19.988979
    data_end_utc         (gran_idx) datetime64[ns] 8B 2019-12-11T14:43:50.730291
    latitude             (spot, gran_idx, photon_idx) float32 44kB nan ... nan
    longitude            (spot, gran_idx, photon_idx) float32 44kB nan ... nan
    gt                   (gran_idx, spot) object 48B 'gt3r' 'gt3l' ... 'gt1l'
    h_canopy             (photon_idx) float32 7kB 6.826 8.899 ... 15.78 31.38
Attributes:
    data_product:  ATL08
    Description:   Contains data categorized as land at 100 meter intervals.
    data_rate:     Data are stored as aggregates of 100 meters.
ds.plot.scatter(x="longitude", y="latitude", hue="h_canopy")
<matplotlib.collections.PathCollection at 0x7f1eb1061e50>
../../_images/2591dec95300c511190bf4683c77c6575aa65c2ba34a1172d62bc99ba7672523.png

Example 3: Reading a granule with h5py?#

import h5py
import numpy as np
f = h5py.File("/tmp/grandmesa_ATL08/processed_ATL08_20191211143520_11560506_006_01.h5", mode='r')
f["/"].keys()
<KeysViewHDF5 ['METADATA', 'ancillary_data', 'ds_geosegments', 'ds_metrics', 'ds_surf_type', 'gt1l', 'gt1r', 'gt2l', 'gt2r', 'gt3l', 'gt3r', 'orbit_info', 'quality_assessment']>
h_canopy = np.array(f["/gt1l/land_segments/canopy/h_canopy"])
h_canopy
array([6.82592773e+00, 8.89868164e+00, 5.43261719e+00, 4.63061523e+00,
       4.54296875e+00, 5.24804688e+00, 4.41186523e+00, 3.89916992e+00,
       9.48120117e+00, 1.07646484e+01, 2.37392578e+01, 1.00693359e+01,
       3.86791992e+00, 1.35986328e+00, 6.63916016e+00, 6.70996094e+00,
       6.27221680e+00, 3.40282347e+38, 5.68359375e+00, 3.40282347e+38,
       1.24887695e+01, 8.06347656e+00, 7.88720703e+00, 3.40747070e+00,
       3.18945312e+00, 2.81152344e+00, 9.56201172e+00, 1.50603027e+01,
       7.54663086e+00, 1.22834473e+01, 9.54809570e+00, 4.61621094e+00,
       3.53637695e+00, 3.76904297e+00, 7.18701172e+00, 1.60043945e+01,
       1.96469727e+01, 3.40282347e+38, 2.13913574e+01, 1.94482422e+01,
       1.48708496e+01, 1.69958496e+01, 1.52304688e+01, 1.79970703e+01,
       1.14118652e+01, 9.28051758e+00, 7.92846680e+00, 1.42802734e+01,
       1.34736328e+01, 3.66406250e+00, 3.72558594e+00, 1.34113770e+01,
       1.81997070e+01, 6.85571289e+00, 7.87841797e+00, 2.78906250e+00,
       3.04589844e+00, 5.70214844e+00, 2.24389648e+00, 4.83789062e+00,
       8.34985352e+00, 1.06958008e+01, 2.31843262e+01, 2.94582520e+01,
       2.58112793e+01, 2.17145996e+01, 1.85002441e+01, 1.64414062e+01,
       1.54970703e+01, 1.10722656e+01, 1.29038086e+01, 3.40282347e+38,
       1.31000977e+01, 1.50708008e+01, 1.09299316e+01, 2.11633301e+01,
       1.93339844e+01, 1.94951172e+01, 2.04709473e+01, 2.24191895e+01,
       2.33315430e+01, 1.86020508e+01, 1.41843262e+01, 1.59016113e+01,
       6.48730469e+00, 3.40282347e+38, 3.37695312e+00, 2.93798828e+00,
       3.09497070e+00, 2.54467773e+00, 3.11889648e+00, 3.60913086e+00,
       5.75415039e+00, 3.56518555e+00, 1.16108398e+01, 9.29272461e+00,
       2.53686523e+00, 6.21337891e+00, 4.37084961e+00, 1.81105957e+01,
       1.57587891e+01, 8.64038086e+00, 2.61328125e+00, 4.00317383e+00,
       6.99169922e+00, 2.76660156e+00, 4.61401367e+00, 4.38916016e+00,
       2.79565430e+00, 3.03881836e+00, 3.55322266e+00, 5.37475586e+00,
       3.75170898e+00, 1.90629883e+01, 1.36696777e+01, 1.68891602e+01,
       1.16149902e+01, 3.40282347e+38, 2.11640625e+01, 2.49699707e+01,
       2.40285645e+01, 3.40282347e+38, 3.40282347e+38, 2.71843262e+01,
       2.46640625e+01, 2.46933594e+01, 2.64416504e+01, 2.84414062e+01,
       1.87243652e+01, 1.20036621e+01, 1.25017090e+01, 9.19995117e+00,
       4.33105469e+00, 5.97875977e+00, 1.38188477e+01, 5.21801758e+00,
       6.54418945e+00, 6.48632812e+00, 4.31347656e+00, 3.22729492e+00,
       5.57275391e+00, 1.82556152e+01, 1.83542480e+01, 7.46899414e+00,
       2.15588379e+01, 2.01042480e+01, 1.99946289e+01, 1.31286621e+01,
       7.18554688e+00, 5.58422852e+00, 2.84724121e+01, 1.86489258e+01,
       2.08405762e+01, 8.15869141e+00, 6.35327148e+00, 3.90673828e+00,
       5.31713867e+00, 8.25903320e+00, 8.90551758e+00, 2.17082520e+01,
       2.61423340e+01, 2.84587402e+01, 2.55974121e+01, 2.72182617e+01,
       2.36420898e+01, 4.55541992e+00, 3.40282347e+38, 9.37011719e+00,
       1.40803223e+01, 1.57014160e+01, 1.59594727e+01, 3.40282347e+38,
       2.20192871e+01, 6.75805664e+00, 4.81054688e+00, 3.48828125e+00,
       3.68432617e+00, 3.29418945e+00, 4.44946289e+00, 2.75512695e+00,
       9.43627930e+00, 1.07866211e+01, 4.53369141e+00, 3.40282347e+38,
       1.00212402e+01, 1.74226074e+01, 1.19001465e+01, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 1.67258301e+01, 3.40282347e+38,
       1.42563477e+01, 1.44150391e+01, 1.20014648e+01, 3.40282347e+38,
       9.43774414e+00, 9.73803711e+00, 3.40282347e+38, 1.24138184e+01,
       1.20412598e+01, 3.40282347e+38, 1.15126953e+01, 6.58203125e+00,
       8.82128906e+00, 3.22558594e+00, 1.13186035e+01, 6.20532227e+00,
       2.64965820e+00, 9.68017578e+00, 7.39550781e+00, 1.27314453e+01,
       1.74250488e+01, 1.58566895e+01, 2.07294922e+01, 8.82788086e+00,
       8.45214844e+00, 1.06098633e+01, 5.94067383e+00, 7.09985352e+00,
       2.08530273e+01, 1.72912598e+01, 1.64267578e+01, 1.67080078e+01,
       1.84438477e+01, 2.04689941e+01, 2.22739258e+01, 5.44067383e+00,
       3.40282347e+38, 3.40282347e+38, 3.79223633e+00, 7.29321289e+00,
       4.28979492e+00, 9.01245117e+00, 6.86083984e+00, 9.46655273e+00,
       5.01586914e+00, 9.97583008e+00, 5.12207031e+00, 5.49365234e+00,
       8.35400391e+00, 3.40282347e+38, 3.40282347e+38, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 3.40282347e+38, 3.40282347e+38,
       2.58269043e+01, 3.40282347e+38, 3.40282347e+38, 2.18022461e+01,
       5.05737305e+00, 3.40282347e+38, 2.26381836e+01, 2.05205078e+01,
       2.48500977e+01, 1.47624512e+01, 2.56430664e+01, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 3.40282347e+38, 3.40282347e+38,
       3.40282347e+38, 2.75573730e+01, 3.40282347e+38, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 3.40282347e+38, 2.09458008e+01,
       1.48725586e+01, 1.52705078e+01, 1.58483887e+01, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 3.40282347e+38, 3.40282347e+38,
       5.34301758e+00, 1.09184570e+01, 3.40282347e+38, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 1.49362793e+01, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 3.40282347e+38, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 3.40282347e+38, 3.40282347e+38,
       1.65438232e+01, 1.66979980e+01, 3.40282347e+38, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 3.40282347e+38, 3.40282347e+38,
       3.40282347e+38, 3.40282347e+38, 3.40282347e+38, 3.40282347e+38],
      dtype=float32)
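Notice the repeated value 3.40282347e+38 in the output above. It looks like a fill value marking segments with no valid canopy height, which is one of the oddities you have to handle yourself when reading with h5py. A minimal sketch of masking it, assuming the fill value is the float32 maximum (in a real file, it is safer to check the dataset's `_FillValue` attribute than to hard-code it):

```python
import numpy as np

# A few values copied from the output above, including the apparent fill value
h_canopy = np.array(
    [6.82592773e+00, 3.40282347e+38, 5.68359375e+00, 3.40282347e+38],
    dtype=np.float32,
)

# Assumption: the fill value equals the largest representable float32
fill_value = np.finfo(np.float32).max

# Replace fill values with NaN so they are ignored by NaN-aware statistics
h_canopy_masked = np.where(h_canopy == fill_value, np.nan, h_canopy)
print(h_canopy_masked)
print(np.nanmean(h_canopy_masked))
```

Without this step, a plain mean of the array would be dominated by the fill values.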

Part 2: SlideRule#

SlideRule is a collaborative effort between NASA Goddard Space Flight Center (GSFC) and the University of Washington, funded by the ICESat-2 program. It provides an on-demand science data processing service for ICESat-2 and GEDI data that runs on Amazon Web Services (AWS) and responds to REST-like API calls to process and return science results. This science-data-as-a-service model is a new way for researchers to work with and analyze data, giving them low-latency access to custom-generated, high-level data products.

SlideRule users provide specific parameters at the time of the request to compute products that fit their science needs. SlideRule then uses cloud-optimized versions of computational algorithms and a scalable cluster of EC2 instances to process data efficiently. All data is then returned to the user as a geopandas GeoDataFrame.

SlideRule Overview

For more information#

Website: https://slideruleearth.io
Documentation: https://slideruleearth.io/web/rtd/
GitHub: SlideRuleEarth/sliderule
Examples: SlideRuleEarth/sliderule-python
Contact: support@mail.slideruleearth.io

# To use the latest version of the sliderule client, run this cell.
# It will install the sliderule Python client into your current conda environment.
# You will then need to restart your kernel to have the changes take effect.
%pip install --quiet "sliderule>=4.6"
Note: you may need to restart the kernel to use updated packages.

Example 1: Just Get Me Some Data#

# (1) Import the client
from sliderule import sliderule, icesat2
# (2) Initialize the client
sliderule.init("slideruleearth.io");
# (3) Define an area of interest
region = sliderule.toregion("grandmesa.geojson");
# (4) Specify the processing parameters
parms = {
    "poly": region["poly"],
    "srt": icesat2.SRT_LAND,
    "len": 20.0,
    "res": 100.0
}
# (5) Make the processing request
gdf = icesat2.atl06p(parms)

Display the results#

gdf
region h_sigma rms_misfit spot pflags rgt y_atc w_surface_window_final gt x_atc h_mean segment_id dh_fit_dx cycle n_fit_photons geometry
time
2018-10-16 10:49:21.763047168 6 0.059213 0.414242 3 0 272 41194.648438 3.146032e+00 40 15710061.0 1797.827692 784344 0.121020 1 49 POINT (-108.09813 39.15732)
2018-10-16 10:49:21.773934080 6 0.171685 0.679926 6 0 272 44567.667969 3.917953e+00 10 15712286.0 2205.110892 784455 0.151762 1 17 POINT (-108.06191 39.13431)
2018-10-16 10:49:21.893560064 6 0.530147 1.702927 6 0 272 44545.421875 1.265432e+01 10 15713088.0 2260.114661 784495 0.182269 1 11 POINT (-108.0631 39.12714)
2018-10-16 10:49:22.009937920 6 0.035914 0.287156 1 0 272 37942.792969 1.717312e+01 60 15711985.0 1813.438733 784440 0.059123 1 64 POINT (-108.13779 39.14302)
2018-10-16 10:49:22.116894976 6 0.000000 0.000000 6 1 272 44486.722656 3.000000e+01 10 15714591.0 2549.139896 784570 -0.791383 1 23 POINT (-108.06554 39.11371)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-05-05 21:42:08.529739776 2 6.008234 308.324646 1 0 737 -1859.211670 3.588987e+20 10 4352867.5 1948.284281 217054 2.667867 23 2689 POINT (-108.09843 39.12156)
2024-05-05 21:42:08.594035200 2 5.008744 204.956665 4 0 737 1433.140137 3.588987e+20 40 4355633.5 1780.562661 217192 -0.860279 23 1675 POINT (-108.13939 39.14353)
2024-05-05 21:42:08.608098560 2 0.000000 0.000000 4 1 737 1433.046631 1.914062e+01 40 4355734.0 1777.159349 217197 -0.750229 23 1698 POINT (-108.1395 39.14443)
2024-05-05 21:42:08.664648704 2 4.936912 201.764343 4 0 737 1432.968018 3.588987e+20 40 4356135.0 2167.969007 217217 -1.484943 23 1671 POINT (-108.13994 39.14802)
2024-05-05 21:42:08.881896704 2 4.585921 200.818405 1 0 737 -1858.465698 3.588987e+20 10 4355373.0 1866.818607 217179 1.150830 23 1918 POINT (-108.1012 39.14401)

15904 rows × 16 columns

Plot the results#

import matplotlib.pyplot as plt
region_lon = [e["lon"] for e in region["poly"]]
region_lat = [e["lat"] for e in region["poly"]]
f, ax = plt.subplots()
ax.set_title("ATL06-SR Points")
ax.set_aspect('equal')
gdf.plot(ax=ax, column='h_mean', cmap='inferno', s=0.1)
ax.plot(region_lon, region_lat, linewidth=1, color='g');
plt.show()
../../_images/a538d610cda7b82eb91d762ad7639a587aae106478e5d60716c3cba1564503a3.png

Explanation of what happened#

(1) Import the client#

from sliderule import sliderule, icesat2

The SlideRule Python client is broken up into different modules:

  • sliderule: core general functionality

  • icesat2: ICESat-2 on-demand, subsetting, raster sampling products

  • gedi: GEDI subsetting, and raster sampling products

  • h5: direct HDF5 data access

  • earthdata: CMR, CMR-STAC, TNM helper functions (use earthaccess instead)

  • io: reading and writing results to/from local files

  • ipysliderule: toolbox for building SlideRule interfaces in a Jupyter notebook

(2) Initialize the client#

sliderule.init("slideruleearth.io");

Configure the client settings:

  • url: address of sliderule service (default = “slideruleearth.io”)

  • verbose: display messages from server (default = False)

  • loglevel: criticality of log messages to display (default = logging.INFO)

  • organization: selection of cluster, used for private clusters (default = “sliderule”)

  • desired_nodes: number of nodes to run in a private cluster (default = None)

  • time_to_live: how long to deploy a private cluster (default = 60 minutes)

  • bypass_dns: query the provisioning system for IP address and don’t use DNS lookup hostname (default = False)

  • plugins: check if plugin is present (default = [])

  • trust_env: use netrc file for authentication (default = False)

  • log_handler: attach handler to client logging (default = None)

  • rethrow: immediately rethrow any caught exception inside of the client (default = None)

(3) Define an area of interest#

region = sliderule.toregion("grandmesa.geojson");

SlideRule uses an area of interest to determine which dataset resources to process and then to subset those resources so that only data inside the area of interest is returned. The sliderule.toregion function converts multiple input types into a format understood by SlideRule. The input types supported are: geojson, shapefile, GeoDataFrame, list of coordinates, and a dictionary of coordinates.

The resources (e.g. granules) to process can always be supplied in any of the processing APIs. But if they are not supplied (which is typical), then to determine which resources to process, the SlideRule server-side code uses the area of interest to make requests to NASA’s Common Metadata Repository (CMR) legacy and STAC interfaces, along with USGS’s The National Map interface. The server code automatically determines which interfaces should be queried and the parameters of the query needed for properly filtering results.

In rare cases when the area of interest is very complex (e.g. a group of islands, or a polygon with an extremely high vertex count), the user can request that the server rasterize the area of interest and use it as a mask for determining which data to process. See https://slideruleearth.io/web/rtd/user_guide/SlideRule.html#geojson for more details.
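The plotting code in Example 1 reads region["poly"] as a list of dictionaries with "lon" and "lat" keys. As a rough sketch of that structure, the hypothetical helper below builds such a list from a bounding box. The helper name and coordinates are made up, and the closed, counter-clockwise ordering is an assumption to verify against the SlideRule user guide.

```python
def bbox_to_poly(min_lon, min_lat, max_lon, max_lat):
    """Return a closed, counter-clockwise ring of {"lon", "lat"} dictionaries."""
    return [
        {"lon": min_lon, "lat": min_lat},
        {"lon": max_lon, "lat": min_lat},
        {"lon": max_lon, "lat": max_lat},
        {"lon": min_lon, "lat": max_lat},
        {"lon": min_lon, "lat": min_lat},  # repeat the first point to close the ring
    ]

# Hypothetical bounding box roughly around Grand Mesa, Colorado
poly = bbox_to_poly(-108.3, 38.8, -107.8, 39.2)
for point in poly:
    print(point)
```

In practice, sliderule.toregion does this conversion for you from a geojson file, shapefile, or GeoDataFrame.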

(4) Specify the processing parameters#

parms = {
    "poly": region["poly"],
    "srt": icesat2.SRT_LAND,
    "len": 20.0,
    "res": 100.0
}

There is a multitude of processing parameters that are available to each API. The ones used here are:

  • poly: area of interest

  • srt: surface reference type; if set to -1 (or icesat2.SRT_DYNAMIC), then all surface types are used

  • len: length of the extent (or variable-length segment) of along-track photon clouds to use in processing each posting

  • res: the step size between postings

See user’s guide for additional parameters: https://slideruleearth.io/web/rtd/index.html
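To get a feel for how len and res interact, here is a small arithmetic sketch. It is a simplified model, not SlideRule's server-side logic: the function name is hypothetical, and it assumes extents start at the beginning of the track and advance by exactly res each step.

```python
def posting_extents(track_start, track_end, length, step):
    """Return (start, end) along-track extents in meters, one per posting."""
    extents = []
    start = track_start
    while start + length <= track_end:
        extents.append((start, start + length))
        start += step
    return extents

# With len=20 and res=100 (as in Example 1), there are gaps between extents
print(posting_extents(0.0, 300.0, length=20.0, step=100.0))

# With len=40 and res=20, consecutive extents overlap by half
print(posting_extents(0.0, 100.0, length=40.0, step=20.0))
```

When res is larger than len, some photons fall between extents and are not used; when res is smaller than len, neighboring results share photons.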

(5) Make the processing request#

gdf = icesat2.atl06p(parms)

Under the hood, this makes an HTTP request to the SlideRule service running in AWS to perform the ATL06 surface-finding algorithm on ATL03 photons to produce an elevation, and then collects the results into a geopandas GeoDataFrame.

The different ICESat-2 APIs available are:

  • atl03sp: subset and filter ATL03 photons; provide custom YAPC and ATL08 classifications

  • atl03v: fast segment level subsetting of ATL03 photons

  • atl06s: subset the ATL06 land elevation product

  • atl06p: dynamically generate ATL06 surface elevation product

  • atl08p: dynamically generate the ATL08 vegetation density product (PhoREAL)

  • atl13p: subset the ATL13 coastal water product

Example 2: Sample GEDI Elevation Product at ICESat-2 Dynamically Generated Postings#

from sliderule import sliderule, icesat2, gedi
sliderule.init("slideruleearth.io", verbose=True);
Setting URL to slideruleearth.io
Login status to slideruleearth.io/sliderule: failure
parms = {
    "poly": sliderule.toregion('grandmesa.geojson')['poly'],
    "t0": '2019-11-14T00:00:00Z',
    "t1": '2019-11-15T00:00:00Z',
    "srt": icesat2.SRT_LAND,
    "len": 100,
    "res": 100,
    "pass_invalid": False, 
    "atl08_class": ["atl08_ground", "atl08_canopy", "atl08_top_of_canopy"],
    "atl08_fields": ["h_dif_ref"],
    "phoreal": {"binsize": 1.0, "geoloc": "center", "use_abs_h": False, "send_waveform": False},
    "samples": {"gedi": {"asset": "gedil3-elevation"}}
};
atl08 = icesat2.atl08p(parms)
request <AppServer.10153> retrieved 1 resources from CMR
proxy request <AppServer.10153> querying resources for gedi
proxy request <AppServer.10153> returned 0 resources for gedi
Starting proxy for atl08 to process 1 resource(s) with 1 thread(s)
request <AppServer.10171> processing initialized on ATL03_20191114034331_07370502_006_01.h5 ...
Successfully completed processing resource [1 out of 1]: ATL03_20191114034331_07370502_006_01.h5
atl08
gt h_min_canopy rgt veg_ph_count landcover x_atc h_mean_canopy segment_id h_te_median canopy_h_metrics ... gnd_ph_count h_max_canopy cycle canopy_openness geometry h_dif_ref gedi.value gedi.file_id gedi.flags gedi.time
time
2019-11-14 03:46:36.935118336 10 0.501465 737 27 30 -6.882999e+10 1.156670 215507 1958.305542 (1.479736328125, 1.479736328125, 1.47973632812... ... 75 2.261719 5 0.443189 POINT (-108.12262 38.83912) 0.963623 1777.066040 0 0 1.326586e+12
2019-11-14 03:46:36.949218304 10 0.515503 737 59 30 -6.882593e+10 1.304013 215512 1964.416748 (1.4913330078125, 1.4913330078125, 1.491333007... ... 54 3.137817 5 0.636816 POINT (-108.12272 38.84002) -3.216064 1925.270142 0 0 1.326586e+12
2019-11-14 03:46:36.963318272 10 0.515259 737 54 30 -6.882186e+10 1.834195 215517 1976.178833 (1.5015869140625, 1.5015869140625, 1.501586914... ... 49 4.442627 5 0.892432 POINT (-108.12283 38.84092) -5.043213 1925.270142 0 0 1.326586e+12
2019-11-14 03:46:36.977417984 10 0.804443 737 68 30 -6.881773e+10 2.465483 215522 1991.423218 (1.7940673828125, 1.7940673828125, 1.794067382... ... 35 6.167480 5 1.023235 POINT (-108.12295 38.84182) 2.426270 1925.270142 0 0 1.326586e+12
2019-11-14 03:46:36.980918016 30 0.508423 737 95 20 -6.927786e+10 2.816030 215529 1822.414673 (1.5048828125, 1.5048828125, 1.5048828125, 1.5... ... 19 7.769043 5 1.389679 POINT (-108.08651 38.84583) 5.912720 1779.992554 0 0 1.326586e+12
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2019-11-14 03:46:42.283918336 60 0.520142 737 38 40 -6.851285e+10 2.928200 217296 1718.457764 (1.51220703125, 1.51220703125, 1.51220703125, ... ... 310 7.713501 5 2.058075 POINT (-108.08783 39.16624) -0.977173 1781.542358 0 0 1.326586e+12
2019-11-14 03:46:42.293068288 20 0.504028 737 130 30 -6.805046e+10 3.273977 217288 1786.887939 (1.5029296875, 1.5029296875, 1.5029296875, 1.5... ... 192 8.225342 5 2.291607 POINT (-108.16125 39.1593) -3.731567 1797.069336 0 0 1.326586e+12
2019-11-14 03:46:42.293818368 40 0.000000 737 0 40 -6.828026e+10 NaN 217294 1709.912354 (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... ... 272 0.000000 5 NaN POINT (-108.12461 39.16313) -0.086914 1720.139282 0 0 1.326586e+12
2019-11-14 03:46:42.295218176 60 0.548218 737 11 40 -6.851079e+10 4.089300 217301 1716.896362 (0.8890380859375, 0.8890380859375, 0.889038085... ... 191 10.701416 5 3.587460 POINT (-108.08792 39.16696) -2.204712 1781.542358 0 0 1.326586e+12
2019-11-14 03:46:42.304318208 20 0.511963 737 13 20 -6.804797e+10 0.724375 217293 1784.938721 (1.41064453125, 1.41064453125, 1.41064453125, ... ... 161 1.410645 5 0.226650 POINT (-108.16134 39.16002) -1.957886 1797.069336 0 0 1.326586e+12

2110 rows × 25 columns

Plot the results#

import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=[8,6])

d0=np.min(atl08['x_atc'])

plt.plot(atl08['x_atc']-d0, atl08['h_te_median'], 'o',  markersize=1, color='green', label='h_te_median')
plt.plot(atl08['x_atc']-d0, atl08['gedi.value'], 'o',  markersize=1, color='gray', label='gedi elevation')
hl=plt.legend(loc=3, frameon=False, markerscale=5)

plt.gca().set_ylim([1500, 3500])
(1500.0, 3500.0)
plt.show()
../../_images/63278f39e03470f3ce78a87255f76a28272cfaaf85d3e6abc996a0df157d7e85.png

Explanation of what’s new#

  • The on-demand ATL08 product (different from the ICESat-2 Standard Data Product) was generated and streamed back to the user. The ATL08 on-demand product uses the University of Texas at Austin’s PhoREAL algorithm, which was integrated into SlideRule to generate customizable vegetation metrics using ATL03 photon data.

  • A time range was specified in the request limiting the results to data collected only between the start and stop times supplied.

  • The "atl08_class" parameter specified that only photons in ATL03 that were classified as "atl08_ground", "atl08_canopy", or "atl08_top_of_canopy" in the ATL08 standard data product are to be supplied to the PhoREAL algorithm and used in the results.

  • The "atl08_fields" parameter specifies that the "h_dif_ref" variable from the ATL08 standard data product is to be associated with each result returned by SlideRule. SlideRule attempts to find the value of the variable closest in time to the dynamically generated result.

  • The "phoreal" parameter provides the processing parameters for the PhoREAL algorithm.

  • The "samples" parameter provides a list of raster datasets that SlideRule should sample at each generated result. So for each 100m segment that PhoREAL processes, the server-side code will also sample the gedil3-elevation product at the latitude and longitude of that segment and return the value with the results.

For a list of raster datasets that are available to sample in SlideRule, see: https://slideruleearth.io/web/rtd/user_guide/GeoRaster.html#asset-directory
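The "closest in time" association described for "atl08_fields" can be illustrated with a short NumPy sketch. This is not SlideRule's implementation, just one way to do a nearest-timestamp lookup; the function name, timestamps, and values are made up, and the field times are assumed to be sorted with at least two entries.

```python
import numpy as np

def nearest_in_time(result_times, field_times, field_values):
    """For each result time, pick the field value with the closest timestamp."""
    field_times = np.asarray(field_times, dtype=float)
    result_times = np.asarray(result_times, dtype=float)
    # Index of the first field time >= each result time, kept inside the array
    right = np.clip(np.searchsorted(field_times, result_times), 1, len(field_times) - 1)
    left = right - 1
    # Choose whichever neighbor is closer in time
    use_left = (result_times - field_times[left]) <= (field_times[right] - result_times)
    return np.asarray(field_values)[np.where(use_left, left, right)]

# Made-up timestamps (seconds) and h_dif_ref values
field_times = [0.0, 1.0, 2.0, 3.0]
h_dif_ref = [0.5, -1.0, 2.5, 4.0]
matched = nearest_in_time([0.1, 1.6, 2.9], field_times, h_dif_ref)
print(matched)
```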

Example 3: Produce GeoParquet of Coastal Photons#

from sliderule import sliderule, icesat2
import geopandas as gpd
sliderule.init(verbose=True)
Setting URL to slideruleearth.io
Login status to slideruleearth.io/sliderule: failure
True
region = sliderule.toregion("bathy.geojson");
# ATL03 subsetting request parameters
parms = {
    "poly": region['poly'],
    "srt": icesat2.SRT_DYNAMIC,
    "len": 100,
    "res": 100,
    "pass_invalid": True,
    "output": {
        "asset":"sliderule-stage",
        "format": "parquet",
        "as_geo": True,
        "open_on_complete": False
    }    
}
atl03_url = icesat2.atl03sp(parms, resources=['ATL03_20230213042035_08341807_006_02.h5'])
Starting proxy for atl03s to process 1 resource(s) with 1 thread(s)
request <AppServer.10175> processing initialized on ATL03_20230213042035_08341807_006_02.h5 ...
request <AppServer.10175> processing of ATL03_20230213042035_08341807_006_02.h5 complete (366784/0/0)
Initiated upload of results to S3, bucket = sliderule-public, key = sliderule.00000015394A5FCC.geoparquet
Upload to S3 completed, bucket = sliderule-public, key = sliderule.00000015394A5FCC.geoparquet, size = 10133778
atl03_url
's3://sliderule-public/sliderule.00000015394A5FCC.geoparquet'
# Recent issues with pandas and geopandas have made direct reads temperamental
# atl03 = gpd.pd.read_parquet(atl03_url)

import boto3
atl03_url_tokens = atl03_url.split('/')
s3_client = boto3.client('s3')
s3_client.download_file(atl03_url_tokens[2], atl03_url_tokens[3], "/tmp/" + atl03_url_tokens[3])

atl03 = gpd.read_parquet("/tmp/" + atl03_url_tokens[3])
atl03.keys()
Index(['extent_id', 'x_atc', 'landcover', 'y_atc', 'atl03_cnf', 'atl08_class',
       'snowcover', 'quality_ph', 'yapc_score', 'relief', 'height', 'cycle',
       'pair', 'sc_orient', 'rgt', 'track', 'background_rate', 'segment_id',
       'segment_dist', 'solar_elevation', 'region', 'geometry'],
      dtype='object')
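The download step above splits the S3 URL on "/" and picks tokens by position, which works here because the key has no slashes in it. A slightly more defensive sketch, using only the standard library (the helper name is hypothetical), might look like this:

```python
from urllib.parse import urlparse

def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key); the key may contain slashes."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {url}")
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_url("s3://sliderule-public/sliderule.00000015394A5FCC.geoparquet")
print(bucket, key)
```

The resulting bucket and key could then be passed to s3_client.download_file in place of the positional tokens.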

Plot the results#

import matplotlib.pyplot as plt
import numpy as np
df = atl03
df = df[df["pair"] == icesat2.LEFT_PAIR]
df = df[df["track"] == 3]
plt.figure(figsize=[8,6])
plt.plot(df['x_atc']+df['segment_dist'], df['height'], 'o',  markersize=1, color='blue')
[<matplotlib.lines.Line2D at 0x7f1e9f79b4d0>]
plt.show()
../../_images/5e1c20b1273cad6f9e4abfe69b29a4efc1068ddbad0c432904ab8057a4617b4b.png

Part 3: H5Coro - The HDF5 Cloud-Optimized Read-Only Python Package#

h5coro is a pure Python implementation of a subset of the HDF5 specification that has been optimized for reading data out of S3.

The project has its roots in SlideRule, where a new C++ implementation of the HDF5 specification was developed for performant read access to Earth science datasets stored in AWS S3. Over time, users of SlideRule began requesting the ability to performantly read HDF5 and NetCDF files out of S3 from their own Python scripts. The result is h5coro: the re-implementation in Python of the core HDF5 reading logic that exists in SlideRule. Since then, h5coro has become its own project, which will continue to grow and diverge in functionality from its parent implementation.

h5coro is optimized for reading HDF5 data in high-latency high-throughput environments. It accomplishes this through a few key design decisions:

  • All reads are concurrent. Each dataset and/or attribute read by h5coro is performed in its own thread.

  • Intelligent range gets are used to read as many dataset chunks as possible in each read operation. This drastically reduces the number of HTTP requests to S3 and means there is no longer a need to re-chunk the data (it actually works better on smaller chunk sizes due to the granularity of the request).

  • Block caching is used to minimize the number of GET requests made to S3. S3 has a large first-byte latency (we’ve measured it at ~60ms on our systems), which means there is a large penalty for each read operation performed. h5coro performs all reads to S3 as large block reads and then maintains data in a local cache for access to smaller amounts of data within those blocks.

  • The system is serverless and does not depend on any external services to read the data. This means it scales naturally as the user application scales, and it reduces overall system complexity.

  • No metadata repository is needed. The structure of the file is cached as it is read, so that successive reads of other datasets in the same file do not have to re-read and re-build the directory structure of the file.
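The block caching idea in the list above can be sketched in a few lines. This is a toy illustration, not h5coro's actual code: the class name is hypothetical, an in-memory io.BytesIO stands in for an S3 object, and the block size is tiny so the behavior is easy to follow.

```python
import io

class BlockCachedReader:
    """Serve small reads from large cached blocks of an underlying byte source."""

    def __init__(self, source, block_size=8):
        self.source = source          # any seekable binary file-like object
        self.block_size = block_size
        self.cache = {}               # block index -> bytes
        self.fetches = 0              # number of reads issued to the source

    def _block(self, index):
        # Only go to the (slow) source when the block is not cached yet
        if index not in self.cache:
            self.source.seek(index * self.block_size)
            self.cache[index] = self.source.read(self.block_size)
            self.fetches += 1
        return self.cache[index]

    def read(self, offset, size):
        first = offset // self.block_size
        last = (offset + size - 1) // self.block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        start = offset - first * self.block_size
        return data[start:start + size]

reader = BlockCachedReader(io.BytesIO(b"abcdefghijklmnop"), block_size=8)
print(reader.read(1, 3), reader.read(4, 2), reader.fetches)
```

Both small reads fall inside the same block, so the source is only touched once. With a first-byte latency of tens of milliseconds per request, that saving adds up quickly.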

For more information:#

GitHub: SlideRuleEarth/h5coro

# To use the latest version of h5coro, run this cell.
# It will install the h5coro Python package into your current conda environment.
# You will then need to restart your kernel to have the changes take effect.
%pip install --quiet "h5coro>=0.0.7"
Note: you may need to restart the kernel to use updated packages.

Example 1: Read ATL03 variables for bathymetry#

# (1) Import modules
from h5coro import h5coro, s3driver
import earthaccess
# (2) Authenticate to Earthdata Login
auth = earthaccess.login()
s3_creds = auth.get_s3_credentials(daac="NSIDC")
# (3) Initialize h5coro object
granule = "nsidc-cumulus-prod-protected/ATLAS/ATL03/006/2023/02/13/ATL03_20230213042035_08341807_006_02.h5"
h5obj = h5coro.H5Coro(granule, s3driver.S3Driver, errorChecking=True, verbose=False, credentials=s3_creds, multiProcess=False)
# (4) Read the data
variables = ["/gt3l/heights/h_ph", "/gt3l/heights/dist_ph_along", "/gt3l/geolocation/segment_dist_x", "/gt3l/geolocation/segment_ph_cnt"]
promise = h5obj.readDatasets(variables, block=True, enableAttributes=False)
for variable in promise:
    print(f'{variable}: {promise[variable][0:10]}')
gt3l/heights/h_ph: [-47.941536 -51.9231   -48.09843  -47.873924 -48.12945  -48.118694
 -48.308052 -48.208042 -47.802708 -48.004234]
gt3l/heights/dist_ph_along: [0.7542868  0.76623714 1.4717534  2.187351   2.1880984  2.9048157
 2.905563   3.621905   4.337497   5.0545855 ]
gt3l/geolocation/segment_dist_x: [17068770.48934802 17068790.54479094 17068810.60023396 17068830.65567708
 17068850.7111203  17068870.76656362 17068890.82200704 17068910.87745056
 17068930.93289417 17068950.98833789]
gt3l/geolocation/segment_ph_cnt: [37 25 44 39 22 40 37 42 35 38]

Explanation of what happened#

(1) Import the necessary packages to use h5coro.#

h5coro relies on earthaccess for authenticating to Earthdata Login. The modules a user might want to import are:

  • s3driver: for reading data out of an s3 bucket

  • filedriver: for reading data out of a local file

  • webdriver: for reading data directly over https (including objects in s3 buckets)

  • logger: for configuring the logging in h5coro

(2) Authenticate to Earthdata Login#

On my system, I have a .netrc file set up with the following line:

machine urs.earthdata.nasa.gov login <my_user_name> password <my_password>
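If you are curious how a line like that is interpreted, Python's standard-library netrc module can parse it. The sketch below writes placeholder credentials to a temporary file (not your real ~/.netrc) and reads them back. It only shows the file format and is not a claim about how earthaccess reads the file internally.

```python
import netrc
import os
import tempfile

# Placeholder credentials for illustration only
line = "machine urs.earthdata.nasa.gov login my_user_name password my_password\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "netrc_example")
    with open(path, "w") as f:
        f.write(line)
    # authenticators() returns a (login, account, password) tuple for the host
    login, _account, password = netrc.netrc(path).authenticators("urs.earthdata.nasa.gov")

print(login, password)
```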

(3) Create an h5coro object for the granule that you want to read#

h5coro is object oriented, so all context information associated with the provided granule is stored in the object. Note that the full path to the granule is needed, including the s3 bucket.

(4) Read the data#

h5coro implements an asynchronous I/O interface, meaning that when the readDatasets function is called, it makes a read “request” in the background and returns immediately back to the caller. The caller receives something called a “promise” (or “future”) which is a promise that data will be there in the future at some point. You then can do other things while you wait, and when you finally need the data, you have to “block” or wait for it to be available.

In this example, I set the “block” parameter to True so that it would wait right away. But in more sophisticated examples, other work could have been done by the notebook while waiting for the results of the read.
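The promise pattern itself is not specific to h5coro. As a generic sketch using Python's standard concurrent.futures module (not h5coro's API; slow_read is a made-up stand-in for a high-latency read), the flow looks like this:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_read(name):
    """Stand-in for a high-latency read; returns fake data after a short delay."""
    time.sleep(0.05)
    return [len(name)] * 3

names = ["h_ph", "dist_ph_along"]  # shortened, illustrative dataset names
with ThreadPoolExecutor() as pool:
    # submit() returns immediately with a future for each read
    futures = {name: pool.submit(slow_read, name) for name in names}
    # The caller is free to do other work while the reads are in flight
    other_work = sum(range(1000))
    # result() blocks until the corresponding read has finished
    results = {name: future.result() for name, future in futures.items()}

print(results)
```

Calling result() right after submit() would be the rough equivalent of setting block=True: you wait for the data straight away instead of overlapping the wait with other work.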