SEPPO

SEPPO (the Software for Earth Big Data Processing, Prediction, and Organization) is a modern set of tools for big data processing tasks. It is largely rooted in well-developed open source libraries, with add-on modules that wrap SEPPO around commercial off-the-shelf (COTS) software for processing at scale. The philosophy behind SEPPO is to provide a processing solution close to a user’s Linux environment, with a modular design that lets open source components be combined into customized big data processing solutions. As such, SEPPO provides a rich API library with many Python and Bash programs and scripts in a native, scalable Linux processing environment.

SEPPO development is rooted in scaling geospatial and remote sensing data processing. It offers modules for these application domains, with an emphasis on modern geospatial data mining and time series analysis techniques. At this point SEPPO offers modules for processing SAR data with the GAMMA Remote Sensing processing software and will integrate the ISCE software in anticipation of the launch of the NISAR satellite mission.

While SEPPO harnesses the cloud-processing solutions offered by Amazon Web Services (AWS), it does not use any proprietary scripting, and the full source code of its open source, non-COTS components is delivered to Earth Big Data customers.

Modules

SEPPO is available as a set of modules to provide flexibility to meet customer needs. The CORE module is the backbone for all operations. Add-on modules currently available focus on geospatial data processing (GEOSPATIAL) and visualization (VISUALIZATION). Specialty modules handle SAR data processing at scale for a variety of SAR sensors via GAMMA Remote Sensing (SARGAMMA) and NASA JPL ISCE (SARISCE) software components.

CORE

This module contains the core functionality to set up and manage a company/institutional cloud infrastructure and handle big data processing jobs at scale.

  • Administrator tools to manage users and cloud resources, including user policies for cloud resource access and quotas

  • User management tools to set up and maintain user accounts in a cloud environment

  • Administrator/user tools for secret key management and key rotation

  • Linux OS user management on local and cloud instances with Linux-style access policies for cloud bucket access

  • Creation and management of Amazon Machine Images (AMIs) for customized compute environments

  • Creation and management of core cloud storage buckets (administration, users, shared) and user/company buckets

  • Multi-region support for efficient “process next to the data” deployments

  • Creation and management of a cloud database server (POSTGRES) for job management with full control of user queues, queue priority, and interdependency in complex multi-stage processing tasks. Queue parameters include machine image, instance type, and job execution time settings for fine-grained management of processing jobs

  • Creation and management of autoscaling groups that can tap into the cost-saving Spot market. Tested with deployments of thousands of compute instances in several petabyte-scale processing jobs

  • Job queuing system

  • seppo_recipe_processor.py script with flexible data sourcing from multiple cloud access protocols (s3, http, gcp, scp, sftp, etc.)

  • Deployment of a cloud-daemon to handle the job queueing system

  • Conda/mambaforge scientific computing environment for Python deployment

  • Jupyter Notebook / JupyterLab conda environment setup and scripts for connecting to local/cloud Jupyter servers

  • Setup scripts and tools to operate local and cloud-based Jupyter notebook servers

  • Bash scripting

  • Sandbox setup for users to experiment and develop code before production processing

  • Examples and scripts for full automation of routine processing via cron jobs to set up end-to-end processing pipelines

  • GitOps tools and integration with GitHub for code management

  • SEPPO software delivered in source code (Python 3 and Bash), based on open source software components for full transparency and API availability to users

  • HTML and PDF documentation of programs and API library
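
The queue priority and interdependency handling listed above can be illustrated with a minimal sketch. This is not SEPPO's implementation, which is backed by a POSTGRES database; the job names, priority values, and the `order_jobs` function are hypothetical, and only the Python standard library is used.

```python
import heapq


def order_jobs(jobs):
    """Return job names in execution order.

    `jobs` maps a job name to (priority, set of prerequisite job names).
    Among jobs whose prerequisites have all completed, the lowest
    priority number runs first (an assumption of this sketch).
    """
    remaining = {name: set(deps) for name, (_, deps) in jobs.items()}
    # Jobs with no prerequisites are ready immediately.
    ready = [(jobs[n][0], n) for n, deps in remaining.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        del remaining[name]
        # Release jobs that were waiting only on the job just finished.
        for other, deps in remaining.items():
            if name in deps:
                deps.discard(name)
                if not deps:
                    heapq.heappush(ready, (jobs[other][0], other))
    if remaining:
        raise ValueError(f"circular or missing dependencies: {sorted(remaining)}")
    return order


# Hypothetical multi-stage task: two scenes are geocoded, then tiled, then stacked.
jobs = {
    "geocode_scene_a": (2, set()),
    "geocode_scene_b": (1, set()),
    "tile_scenes": (1, {"geocode_scene_a", "geocode_scene_b"}),
    "stack_time_series": (1, {"tile_scenes"}),
}
print(order_jobs(jobs))
```

In this sketch a later stage is released only once every job it depends on has finished, and priority decides the order among jobs that are ready at the same time.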

GEOSPATIAL

  • Extension of the POSTGRES database with POSTGIS handling

  • Access routines to the ASF DAAC for SAR data inventory management

  • DEM tools to handle a variety of DEM data sets in cloud environments (SRTM, NASADEM, COPERNICUS DEM, NED, USGS DEM resource, custom DEMs)

  • Tiling tool to produce output from satellite imagery processing in Lat/Lon or Military Grid Reference System (MGRS) tiles

  • Time series tools for data stacking and optimized cloud data store format handling (Zarr, GeoTIFF/BigTIFF, NetCDF, HDF5, etc.)

  • Time series metrics computation for large time series in spatial and temporal dimensions (e.g. mean, median, maximum, minimum, percentiles, skewness, kurtosis, coefficient of variation, standard deviation, cumulative sums); metrics can be computed in groupings, e.g. monthly or seasonal

  • Several raster tools to manage raster data in a cloud setting

  • GDAL integration (GDAL VSI enhanced tools)

  • RioXarray integration

  • QGIS conda environment setup and connection to geospatial cloud database

  • Sample recipe scripts for routine geospatial processing operations

  • Jupyter notebook examples of cloud access to data sets and cloud-based processing and visualization of cloud-native data
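
The grouped time series metrics listed above can be illustrated for a single pixel. This is a minimal sketch using only the Python standard library, not SEPPO code; the function names and sample values are hypothetical, and a production workflow would apply the same idea to whole raster stacks.

```python
import statistics
from collections import defaultdict
from datetime import date


def summarize(values):
    """Compute a few summary metrics for one list of observations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return {
        "mean": mean,
        "median": statistics.median(values),
        "minimum": min(values),
        "maximum": max(values),
        "std": std,
        # Coefficient of variation: standard deviation relative to the mean.
        "cv": std / mean if mean else float("nan"),
    }


def monthly_metrics(observations):
    """Group (date, value) pairs by calendar month and summarize each group."""
    groups = defaultdict(list)
    for day, value in observations:
        groups[day.month].append(value)
    return {month: summarize(vals) for month, vals in sorted(groups.items())}


# Hypothetical single-pixel time series (values in linear power units).
series = [
    (date(2023, 1, 5), 0.10),
    (date(2023, 1, 17), 0.14),
    (date(2023, 2, 10), 0.20),
    (date(2023, 2, 22), 0.30),
    (date(2024, 1, 12), 0.12),
]
for month, metrics in monthly_metrics(series).items():
    print(month, metrics)
```

Grouping by `day.month` pools the same calendar month across years; a seasonal grouping would only change the key used to build the groups.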

SARGAMMA

  • GammaTools.py library for full integration of GAMMA Remote Sensing AG processing software for SAR data processing (geocoding, radiometric terrain correction, speckle filtering) for all major spaceborne and airborne SAR sensors

  • Tied into GEOSPATIAL module for DEM handling, tiling, and time series handling

  • Tied into ASF DAAC SAR data management for SAR processing job monitoring

  • seppo_gamma_proc.py processor for wrapping GAMMA software routines into an easily deployed cloud scaling approach for SAR data processing

  • Precision orbit handling with routine updates and an optional pre-processing check

  • Option to develop interferometric processing chains with purchase of the GAMMA interferometry module (ISP)

  • Multi-temporal speckle filter

  • SAR data format conversions (dB scale, power scale, amplitude scale)

  • SAR-tailored time series processing tools (correct conversions to/from power format for mathematical operations)

  • Requires purchase of a GAMMA license for cloud processing, with full options to include interferometric processing. Support is provided by GAMMA Remote Sensing in addition to the Earth Big Data Team

  • SARGAMMA updates and maintenance/support are included in the first year with a GAMMA software license agreement
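
The reason conversions to and from power format matter for mathematical operations can be shown with a short example. This is a minimal sketch, not SEPPO code: the function names are hypothetical, and the formulas are the standard decibel relations for backscatter power.

```python
import math


def db_to_power(db):
    """Convert a backscatter value from decibels to linear power."""
    return 10.0 ** (db / 10.0)


def power_to_db(power):
    """Convert a linear power value to decibels."""
    return 10.0 * math.log10(power)


def power_to_amplitude(power):
    """Amplitude is the square root of power."""
    return math.sqrt(power)


def mean_db(values_db):
    """Average dB values by converting to power, averaging, and converting back."""
    powers = [db_to_power(v) for v in values_db]
    return power_to_db(sum(powers) / len(powers))


samples_db = [-10.0, -20.0]
naive = sum(samples_db) / len(samples_db)  # arithmetic mean taken directly in dB
correct = mean_db(samples_db)              # mean taken in the power domain
print(f"naive dB mean: {naive:.2f}, power-domain mean: {correct:.2f}")
```

Averaging directly in dB amounts to a geometric mean of the power values, which is biased low relative to the arithmetic mean in power. This is why statistics are computed in power format and converted to dB only for display.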

SARISCE

  • Integration with the ISCE software to support NISAR processing

  • Tied into GEOSPATIAL module for DEM handling, tiling, and time series handling

  • Tied into ASF DAAC SAR data management for SAR processing job monitoring

  • seppo_isce_proc.py processor for wrapping ISCE software routines into an easily deployed cloud scaling approach for SAR data processing

  • Multi-temporal speckle filter

  • SAR data format conversions (dB scale, power scale, amplitude scale)

  • SAR-tailored time series processing tools (correct conversions to/from power format for mathematical operations)

VISUALIZATION

  • Cloud-optimized tools for data enhancement and scientific data visualization

  • Time series animation tool to produce black-and-white and color animations of change

  • Jupyter notebook-based visualizations with modern Python environment tools

  • seppo_vis_make_enhanced_8bit.py routine to prepare single- or multi-band enhanced data sets in cloud-optimized format for direct cloud-server based visualization

  • Tools to prepare data sets as SpatioTemporal Asset Catalog (STAC) based cloud stores for ready visualization

  • Tools for preparation of cloud visualization with Cloud Optimized GeoTIFF (COG) based tiling access

  • kerchunk and STAC support for multi-dimensional data visualization

  • Tools to tie into GeoJSON-based notebook visualizations

  • Tools to provide data for visualization at REST endpoints
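
As an illustration of what preparing an enhanced 8-bit data set can involve, the sketch below applies a clipped linear stretch to a list of values. It does not describe how seppo_vis_make_enhanced_8bit.py works internally. The function name, the choice of a linear stretch, and the use of 0 as a no-data value are all assumptions made for this example.

```python
def stretch_to_8bit(values, low, high):
    """Linearly rescale values from [low, high] to integers in [1, 255].

    0 is reserved here as a no-data value (an assumption of this sketch);
    inputs outside [low, high] are clipped to the range first.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = []
    for v in values:
        if v is None:
            scaled.append(0)  # no-data
            continue
        clipped = min(max(v, low), high)
        scaled.append(1 + round((clipped - low) / (high - low) * 254))
    return scaled


# Hypothetical backscatter values in dB, stretched between -25 dB and 0 dB.
print(stretch_to_8bit([-30.0, -25.0, -12.5, 0.0, None], low=-25.0, high=0.0))
```

Reserving one output value for no-data keeps masked pixels distinguishable from dark but valid pixels after the data are reduced to 8 bits.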