SEPPO
SEPPO (Software for Earth Big Data Processing, Prediction, and Organization) is a set of tools for big data processing tasks. It is largely built on well-developed open source libraries, with add-on modules that wrap SEPPO around commercial off-the-shelf (COTS) software for processing at scale. The philosophy behind SEPPO is to provide a processing solution close to a user’s Linux environment, with a modular design that lets open source components be combined into customized big data processing solutions. As such, SEPPO provides a rich API library with many Python and Bash programs and scripts in a native, scalable Linux processing environment.
SEPPO grew out of the need to scale geospatial and remote sensing data processing and offers modules for these application domains, with an emphasis on modern geospatial data mining and time series analysis techniques. At this point SEPPO offers modules for processing SAR data with the GAMMA Remote Sensing processing software and will integrate the ISCE software in anticipation of the launch of the NISAR satellite mission.
SEPPO harnesses the cloud-processing solutions offered by Amazon Web Services (AWS) but does not use any proprietary scripting. The entire source code of SEPPO's open source, non-COTS components is delivered to Earth Big Data customers.
Modules
SEPPO is available as a set of modules to provide flexibility to meet customer needs. The CORE module is the backbone for all operations. Currently available add-on modules cover geospatial data processing (GEOSPATIAL) and visualization (VISUALIZATION), along with specialty modules for SAR data processing at scale that handle a variety of SAR sensors via GAMMA Remote Sensing (SARGAMMA) and NASA JPL ISCE (SARISCE) software components.
CORE
This module contains the core functionality to set up and manage a company or institutional cloud infrastructure and handle big data processing jobs at scale:
Administrator tools to manage users and cloud resources and to set user policies for cloud resource access and quotas
User management tools to set up and maintain user accounts in a cloud environment
Administrator/user tools for secret key management and key rotation
Linux OS user management on local and cloud instances with Linux-style access policies for cloud bucket access
Creation and management of Amazon Machine Images (AMIs) for customized compute environments
Creation and management of core cloud storage buckets (administration, users, shared) and user/company buckets
Multi-region support for efficient “process next to the data” deployments
Creation and management of a cloud database server (POSTGRES) for job management with full control of user queues, queue priority, and interdependency in complex multi-stage processing tasks. Queue parameters include machine image, instance type, and job execution time settings for fine-grained management of processing jobs
Creation and management of autoscaling groups that can tap into the cost-saving Spot market. Tested with deployments of thousands of compute instances in several petabyte-scale processing jobs
Job queuing system
seppo_recipe_processor.py script with flexible data sourcing from multiple cloud access protocols (s3, http, gcp, scp, sftp, etc.)
Deployment of a cloud-daemon to handle the job queueing system
Conda/mambaforge scientific computing environment for Python deployment
Jupyter Notebook / JupyterLab conda environment setup and connection to local/cloud Jupyter server scripts
Setup scripts and tools to operate local and cloud-based Jupyter notebook servers
Bash scripting
Sandbox setup for users to experiment and develop code before production processing
Examples and scripts for full automation of routine processing via cron jobs to set up end-to-end processing pipelines
GitOps tools and integration with GitHub for code management
SEPPO software delivered in source code (Python 3 and Bash), based on open source software components for full transparency and API availability to users
HTML and PDF documentation of programs and API library
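The queue priority and interdependency handling described in the CORE list can be illustrated with a small sketch. This is not SEPPO's job management code or database schema: the `Job` record, the `schedule` function, and the convention that a lower priority value runs first are all assumptions made for illustration, and the example keeps everything in memory instead of using a POSTGRES server.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    """Hypothetical job record; not SEPPO's actual database schema."""
    priority: int                 # lower value runs first (assumption)
    name: str
    depends_on: tuple = field(default=(), compare=False)


def schedule(jobs):
    """Order job names so dependencies run first, then by priority."""
    by_name = {job.name: job for job in jobs}
    waiting = {job.name: set(job.depends_on) for job in jobs}
    ready = [by_name[name] for name, deps in waiting.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        job = heapq.heappop(ready)
        order.append(job.name)
        del waiting[job.name]
        # Release any job whose last outstanding dependency just finished.
        for name, deps in waiting.items():
            if job.name in deps:
                deps.remove(job.name)
                if not deps:
                    heapq.heappush(ready, by_name[name])
    if waiting:
        # Cycles or references to unknown jobs end up here.
        raise ValueError(f"unresolvable dependencies: {sorted(waiting)}")
    return order


# Illustrative multi-stage task: download, geocode, tile, plus an independent report.
jobs = [
    Job(2, "geocode", ("download",)),
    Job(1, "download"),
    Job(1, "tile", ("geocode",)),
    Job(3, "report"),
]
print(schedule(jobs))
```

In this sketch a stage only becomes eligible once all of its dependencies have completed, and priority decides among the stages that are eligible at the same time.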
GEOSPATIAL
Extension of the POSTGRES database with POSTGIS handling
Access routines to the ASF DAAC for SAR data inventory management
DEM tools to handle a variety of DEM data sets in cloud environments (SRTM, NASADEM, COPERNICUS DEM, NED, USGS DEM resource, custom DEMs)
Tiling tool to produce output tiled on lat/lon or Military Grid Reference System (MGRS) grids from satellite imagery processing
Time series tools for data stacking and optimized cloud data store format handling (Zarr, GeoTIFF/BigTIFF, NetCDF, HDF5, etc.)
Time series metrics computation for large time series in spatial and temporal dimensions (e.g. mean, median, maximum, minimum, percentiles, skewness, kurtosis, coefficient of variation, standard deviation, cumulative sums); metrics can be computed in groupings, e.g. monthly or seasonal
Several raster tools to manage raster data in a cloud setting
GDAL integration (GDAL VSI enhanced tools)
rioxarray integration
QGIS conda environment setup and connection to geospatial cloud database
Sample recipe scripts for routine geospatial processing operations
Jupyter notebook examples of cloud access to data sets and cloud-based processing and visualization of cloud-native data
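As a rough illustration of the time series metrics and monthly groupings listed above, the following sketch computes a handful of the metrics for a single pixel's time series using only the Python standard library. The function names `series_metrics` and `monthly_metrics` are hypothetical and are not part of the SEPPO API; a production tool would operate on full raster stacks, not on one list of values.

```python
import statistics
from collections import defaultdict
from datetime import date


def series_metrics(values):
    """Basic metrics for one time series (e.g. one pixel through time)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)          # population standard deviation
    return {
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "std": std,
        # Coefficient of variation is undefined for a zero mean.
        "cv": std / mean if mean else float("nan"),
    }


def monthly_metrics(dates, values):
    """Group observations by calendar month, then compute metrics per group."""
    groups = defaultdict(list)
    for day, value in zip(dates, values):
        groups[day.month].append(value)
    return {month: series_metrics(vals) for month, vals in sorted(groups.items())}


# Made-up observation dates and values, for illustration only.
dates = [date(2023, 1, 5), date(2023, 1, 17), date(2023, 2, 3), date(2023, 2, 20)]
values = [0.10, 0.30, 0.20, 0.20]
for month, metrics in monthly_metrics(dates, values).items():
    print(month, metrics)
```

Seasonal groupings would follow the same pattern with a different grouping key, for example a mapping from month to season.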
SARGAMMA
GammaTools.py library for full integration of GAMMA Remote Sensing AG processing software for SAR data processing (geocoding, radiometric terrain correction, speckle filtering) for all major spaceborne and airborne SAR sensors
Tied into GEOSPATIAL module for DEM handling, tiling, and time series handling
Tied into ASF DAAC SAR data management for SAR processing job monitoring
seppo_gamma_proc.py processor for wrapping GAMMA software routines into an easily deployed cloud scaling approach for SAR data processing
Precision orbit handling and routine updates handling with pre-processing check option
Option to develop interferometric processing chains with GAMMA Interferometry module purchase (ISP)
Multi-temporal speckle filter
SAR data format conversions (dB scale, power scale, amplitude scale)
SAR-tailored time series processing tools (correct conversions to/from power format for mathematical operations)
Requires purchase of a GAMMA license for cloud processing, with full options to include interferometric processing. Support from GAMMA Remote Sensing is available in addition to support from the Earth Big Data team.
SARGAMMA updates and maintenance/support are included in the first year with a GAMMA software license agreement
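The scale conversions and the note about converting to and from power format for mathematical operations can be made concrete with a short sketch. It assumes the usual conventions that dB = 10·log10(power) and power = amplitude², and the function names are hypothetical, not taken from the SEPPO or GAMMA software. The key point is that averaging must happen in the power domain: averaging dB values directly gives a different, biased result.

```python
import math


def db_to_power(db):
    """Convert backscatter from dB scale to linear power scale."""
    return 10.0 ** (db / 10.0)


def power_to_db(power):
    """Convert linear power to dB; power must be positive."""
    return 10.0 * math.log10(power)


def power_to_amplitude(power):
    """Amplitude is the square root of power (assumed convention)."""
    return math.sqrt(power)


def amplitude_to_power(amplitude):
    return amplitude * amplitude


def mean_db(values_db):
    """Average dB values correctly: convert to power, average, convert back."""
    powers = [db_to_power(v) for v in values_db]
    return power_to_db(sum(powers) / len(powers))


samples_db = [-10.0, -20.0]
print("power-domain mean (dB):", mean_db(samples_db))
print("naive dB mean:", sum(samples_db) / len(samples_db))
```

For these two samples the power-domain mean is about -12.6 dB, while the naive mean of the dB values is -15 dB.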
SARISCE
Integration with the ISCE software to support NISAR processing
Tied into GEOSPATIAL module for DEM handling, tiling, and time series handling
Tied into ASF DAAC SAR data management for SAR processing job monitoring
seppo_isce_proc.py processor for wrapping ISCE software routines into an easily deployed cloud scaling approach for SAR data processing
Multi-temporal speckle filter
SAR data format conversions (dB scale, power scale, amplitude scale)
SAR-tailored time series processing tools (correct conversions to/from power format for mathematical operations)
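To give a sense of what a multi-temporal speckle filter does, here is a minimal sketch of one widely used formulation, in which each filtered image is the local spatial mean of that date multiplied by the average, over all dates, of the ratio between each image and its own local mean. This is an assumption about the general technique, not a description of the filter SEPPO, GAMMA, or ISCE implement. The names `box_mean` and `multitemporal_filter` are hypothetical, and the inputs are assumed to be co-registered images in power scale with strictly positive values.

```python
def box_mean(image, radius=1):
    """Local spatial mean over a (2*radius+1)^2 window, truncated at the edges."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def multitemporal_filter(stack, radius=1):
    """Filter a list of co-registered power images (lists of rows)."""
    n = len(stack)
    rows, cols = len(stack[0]), len(stack[0][0])
    local_means = [box_mean(image, radius) for image in stack]
    filtered = []
    for k in range(n):
        out = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Temporal average of each date's pixel-to-local-mean ratio.
                ratio = sum(stack[i][r][c] / local_means[i][r][c] for i in range(n)) / n
                # Rescale by this date's local mean to preserve its radiometry.
                out[r][c] = local_means[k][r][c] * ratio
        filtered.append(out)
    return filtered


# Two tiny made-up 3x3 power images, for illustration only.
stack = [
    [[1.0, 1.2, 0.9], [1.1, 1.0, 0.8], [0.9, 1.3, 1.0]],
    [[2.1, 1.9, 2.0], [2.2, 1.8, 2.0], [1.9, 2.0, 2.1]],
]
result = multitemporal_filter(stack)
print(result[0][1][1], result[1][1][1])
```

Because the filter works on ratios and means, it is applied to power-scale data; dB or amplitude inputs would need to be converted first.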
VISUALIZATION
Cloud-optimized tools for data enhancement and scientific data visualization
Time-series animation tool to produce b/w and color animations of change
Jupyter notebook-based visualizations with modern Python environment tools
seppo_vis_make_enhanced_8bit.py routine to prepare single- or multi-band enhanced data sets in cloud-optimized format for direct cloud-server based visualization
Tools to prepare data sets as SpatioTemporal Asset Catalog (STAC) based cloud stores for ready visualization
Tools for preparation of cloud visualization with Cloud Optimized GeoTIFF (COG) based tiling access
kerchunk and STAC support for multi-dimensional data visualization
Tools to tie into GeoJSON-based notebook visualizations
Tools to provide data for visualization at REST endpoints
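Preparing enhanced 8-bit data for visualization typically involves some form of contrast stretch. The sketch below shows one common approach, a percentile clip followed by linear scaling to the 0 to 255 range. It does not reproduce the behavior of seppo_vis_make_enhanced_8bit.py: the function names, the 2nd/98th percentile defaults, and the flat list input are all assumptions for illustration.

```python
def percentile(sorted_values, pct):
    """Linearly interpolated percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = rank - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def stretch_to_8bit(values, low_pct=2.0, high_pct=98.0):
    """Clip to the given percentiles and scale linearly to integers in 0..255."""
    ordered = sorted(values)
    low = percentile(ordered, low_pct)
    high = percentile(ordered, high_pct)
    if high <= low:
        # Constant input: no contrast to stretch.
        return [0 for _ in values]
    scaled = []
    for value in values:
        clipped = min(max(value, low), high)
        scaled.append(round(255 * (clipped - low) / (high - low)))
    return scaled


# Made-up band values, for illustration only.
band = [float(v) for v in range(101)]
stretched = stretch_to_8bit(band)
print(stretched[0], stretched[-1])
```

Clipping at percentiles instead of the absolute minimum and maximum keeps a few extreme pixels from compressing the contrast of the rest of the image.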