Live · Open Source · Apache 2.0

Planetary Sensing Engine

Open-source spatiotemporal data fusion for Earth observation, climate science, and planetary intelligence.

PSE ingests data from six live sources, normalizes everything into a common spatiotemporal model, and serves the result through a single REST API. Quality-weighted fusion blends overlapping sources by reliability and recency. Full provenance metadata traces every data point back to its origin.

Built for the Pangeo scientific Python ecosystem — xarray, Dask, Zarr, and the tools that climate scientists and Earth observation researchers already use.

6 connectors · Quality-weighted fusion · REST API · xarray · Apache 2.0

THE ENGINE

What PSE does

Earth observation data is fragmented. Temperature comes from ECMWF. Solar irradiance comes from the Global Solar Atlas. Satellite imagery comes from Copernicus. Infrastructure maps come from OpenStreetMap. Socioeconomic indicators come from the World Bank. Each source has its own format, coordinate system, temporal resolution, and access method.

PSE unifies them. It ingests data from multiple sources, normalizes everything into a common spatiotemporal model, and serves the result through a single REST API. When multiple sources provide the same variable, PSE uses quality-weighted fusion — blending them based on reliability and recency rather than picking one. Every response carries full provenance metadata, tracing each data point back to its original source and transformation history.

The output is a standard xarray Dataset, fully compatible with the Pangeo scientific Python ecosystem: Dask, Zarr, and the other tools that thousands of climate scientists and Earth observation researchers already use.

ARCHITECTURE

How PSE fits in the Northflow platform

PSE and HGE are two distinct engines that serve different functions in the Northflow architecture.

PSE senses.

It ingests, fuses, and serves observational data about the physical world.

HGE thinks.

It generates, ranks, and evaluates scientific hypotheses from data.

Domain adapters connect to whichever engines they need. FLUX (renewable energy intelligence) is built on PSE. CERES (famine early warning) is built on HGE. As the platform matures, adapters will draw from both engines simultaneously — PSE providing the observations, HGE generating the hypotheses.

[Architecture diagram. Domain Adapters: FLUX · CERES · MARVIS · AION · Laplace · ... Engines: PSE (Senses) and HGE (Thinks). Flow labels: Earth Data; Data & Hypotheses.]

DATA SOURCES

Six live data connectors

Each connector implements a common interface, making it straightforward to add new sources.

ERA5

ECMWF Climate Reanalysis

Global atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts. Covers temperature, wind components, solar radiation, precipitation, humidity, surface pressure, and more. PSE handles async job submission, polling, and download through the Copernicus Climate Data Store API.

Resolution: 0.25° (~28 km)

Update: Daily, 5-day lag

Open-Meteo

Weather Archive and Forecast

Open-access weather data providing historical archive and short-range forecasts. 24 variables including temperature, wind, humidity, solar irradiance, cloud cover, soil moisture, and evapotranspiration. Hourly resolution. PSE automatically routes requests between the archive and forecast APIs based on a 5-day cutoff and fetches grid points in parallel.

Resolution: Hourly

Update: Continuous
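
The archive/forecast routing described for this connector can be sketched roughly as follows. This is an illustration under stated assumptions, not PSE's actual code: the function name `split_by_cutoff` is hypothetical, and it assumes the 5-day cutoff is measured back from today, with dates older than the cutoff going to the archive API and everything from the cutoff onward going to the forecast API.

```python
from datetime import date, timedelta

CUTOFF_DAYS = 5  # from the text: requests are routed on a 5-day cutoff


def split_by_cutoff(start: date, end: date, today: date):
    """Split the inclusive range [start, end] into archive and forecast parts.

    Assumption: dates strictly before (today - CUTOFF_DAYS) belong to the
    archive API; the cutoff date and later belong to the forecast API.
    Returns (archive_range, forecast_range); either element may be None.
    """
    if start > end:
        raise ValueError("start must not be after end")
    cutoff = today - timedelta(days=CUTOFF_DAYS)
    last_archive_day = cutoff - timedelta(days=1)

    archive = None
    if start <= last_archive_day:
        archive = (start, min(end, last_archive_day))

    forecast = None
    if end >= cutoff:
        forecast = (max(start, cutoff), end)

    return archive, forecast
```

A request that straddles the cutoff yields two sub-ranges, which a connector could then fetch in parallel and concatenate along the time axis.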

Global Solar Atlas

Solar Resource Data

Long-term solar resource data from the World Bank. Provides Global Horizontal Irradiance (GHI), Direct Normal Irradiance (DNI), Diffuse Irradiance (DIF), Global Tilted Irradiance (GTI), and photovoltaic power output potential. Delivered as monthly climatology with annual totals.

Resolution: ~1 km

Update: Monthly climatology

Sentinel-2

Optical Satellite Imagery

Multispectral satellite imagery from ESA’s Sentinel-2 constellation via the Copernicus Data Space Ecosystem. PSE derives NDVI (vegetation index), NDWI (water index), and a 4-class land use classification from the Scene Classification Layer. The connector authenticates with OAuth2 and applies automatic cloud filtering.

Resolution: 10–60 m

Update: 5-day revisit
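
Both indices named above are normalized differences of two spectral bands. The sketch below uses the standard NDVI definition, (NIR − Red) / (NIR + Red), and assumes the McFeeters form of NDWI, (Green − NIR) / (Green + NIR); which NDWI variant PSE actually uses is not stated. For Sentinel-2, NIR, red, and green correspond to bands B08, B04, and B03. The function names are illustrative.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else float("nan")


def ndvi(nir: float, red: float) -> float:
    """Vegetation index from NIR (B08) and red (B04) reflectance."""
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Water index (McFeeters form, an assumption here) from green (B03) and NIR (B08)."""
    return normalized_difference(green, nir)
```

Both indices fall in the range −1 to 1 for non-negative reflectances. In a real connector the same arithmetic would be applied to whole band arrays rather than single values.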

OpenStreetMap

Infrastructure and Land Use

Global infrastructure data from the OpenStreetMap community via the Overpass API. PSE extracts power substations, transmission lines, road networks, settlement boundaries, and waterway density. Returns aggregate metrics per grid cell with full GeoJSON geometries stored in dataset attributes.

Resolution: Vector

Update: Community-maintained

World Bank

Socioeconomic Indicators

Country-level development indicators from the World Bank Open Data API. Covers electricity access rates, GDP per capita, renewable energy share, CO₂ emissions, population, and rural/urban electrification disparities. Annual values broadcast to spatial grids for integration with other sources.

Resolution: Country-level

Update: Annual

FUSION

The fusion engine

When a query spans multiple data sources, PSE does not simply pick one. It retrieves data from all relevant connectors in parallel, then fuses the results through a multi-step pipeline.

1. Spatial alignment

Sources arrive at different resolutions — ERA5 at 0.25°, Sentinel-2 at 10 m, World Bank at country level. PSE regrids everything to a common spatial grid using linear interpolation, with fallback broadcasting for sources with coarser resolution.

2. Temporal alignment

Sources have different time axes — hourly weather, daily reanalysis, monthly climatology, annual statistics, static snapshots. PSE classifies each source’s temporal structure and aligns them to a common time axis appropriate for the query.

3. Quality-weighted merge

When multiple sources provide the same variable (for example, both ERA5 and Open-Meteo provide temperature), PSE blends them using quality scores computed as 0.7 × source reliability + 0.3 × data recency. The blended value therefore leans toward the more reliable and more recent source instead of depending entirely on one.

4. Conflict detection

PSE monitors the coefficient of variation across sources for each variable. When sources disagree significantly, the conflict is flagged in the response metadata so downstream applications can handle it appropriately.

5. Provenance tracking

Every fused dataset includes complete provenance metadata: which sources contributed, what transformations were applied, quality scores for each source, and the timestamp of the fusion operation. Any result can be traced back to its origins.
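
The quality-weighted merge and conflict detection steps above can be sketched for a single variable at a single grid point. The 0.7 and 0.3 weights come from the text; everything else is an assumption made for illustration: the function names, the requirement that reliability and recency are already normalized to [0, 1], and the conflict threshold of 0.2 (the text gives no threshold).

```python
from statistics import mean, pstdev

RELIABILITY_WEIGHT = 0.7  # from the text
RECENCY_WEIGHT = 0.3      # from the text
CV_THRESHOLD = 0.2        # hypothetical; the text does not specify a value


def quality_score(reliability: float, recency: float) -> float:
    """Combine reliability and recency, both assumed to lie in [0, 1]."""
    return RELIABILITY_WEIGHT * reliability + RECENCY_WEIGHT * recency


def fuse(values: dict[str, float], scores: dict[str, float]) -> float:
    """Quality-weighted average of one variable across sources."""
    total = sum(scores[source] for source in values)
    if total <= 0:
        raise ValueError("at least one source needs a positive quality score")
    return sum(values[source] * scores[source] for source in values) / total


def coefficient_of_variation(values: list[float]) -> float:
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    spread = pstdev(values)
    if m == 0:
        return 0.0 if spread == 0 else float("inf")
    return spread / abs(m)


def has_conflict(values: list[float], threshold: float = CV_THRESHOLD) -> bool:
    """Flag the variable when sources disagree by more than the threshold."""
    return coefficient_of_variation(values) > threshold
```

One design caveat: the coefficient of variation is unstable for variables whose mean sits near zero (temperature in °C, for instance), so a sketch like this works best on strictly positive quantities such as temperature in kelvin or irradiance.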

API

The API

PSE exposes all capabilities through a REST API built on FastAPI.

Health and Status

GET /api/v1/health

Returns the operational status of every connector, including data freshness, lag time, available variables, and update frequency. Also reports cache statistics and database connectivity.

GET /api/v1/sources

Lists all registered data connectors with their current status.

GET /api/v1/sources/{id}/status

Detailed freshness and last-ingest information for a specific connector.

Data Queries

GET /api/v1/point

Point query for a single latitude/longitude coordinate. Returns time-series values for requested variables from the best available sources.

GET /api/v1/query

Bounding box query returning a gridded dataset as JSON. Specify the spatial extent, time range, variables, and target resolution.

POST /api/v1/fuse

Explicit multi-source fusion request. Submit a structured request body specifying exactly which sources to fuse and how to weight them.
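
To make the data endpoints more concrete, here is a sketch of how a client might build a bounding box query URL and a fusion request body. Only the endpoint paths come from the text. The base URL, every parameter name (`bbox`, `start`, `end`, `variables`, `resolution`), and the body fields are hypothetical placeholders, so check the actual API schema before relying on them. The code only builds strings and sends nothing over the network.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # hypothetical deployment address


def build_query_url(bbox, start, end, variables, resolution):
    """Build a GET /api/v1/query URL. All parameter names are assumptions."""
    params = {
        "bbox": ",".join(str(v) for v in bbox),  # west, south, east, north
        "start": start,
        "end": end,
        "variables": ",".join(variables),
        "resolution": resolution,
    }
    return f"{BASE_URL}/api/v1/query?{urlencode(params)}"


def build_fuse_body(variable, weights, start, end):
    """Build a JSON body for POST /api/v1/fuse. Field names are assumptions."""
    return json.dumps(
        {
            "variable": variable,
            "sources": sorted(weights),
            "weights": weights,
            "time_range": {"start": start, "end": end},
        }
    )


url = build_query_url(
    bbox=(95.0, -11.0, 141.0, 6.0),
    start="2024-01-01",
    end="2024-01-31",
    variables=["temperature", "wind_speed"],
    resolution=0.25,
)
body = build_fuse_body(
    "temperature", {"era5": 0.6, "open_meteo": 0.4}, "2024-01-01", "2024-01-31"
)
```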

Response format

All data responses return xarray-compatible JSON structures with coordinates (latitude, longitude, time), data variables, and attributes including full provenance metadata. Export to Zarr, NetCDF, GeoJSON, and CSV is supported.

STORAGE

Storage and caching

Zarr Store

All gridded array data is stored in Zarr format with daily partitioning, organized as {root}/{source_id}/{YYYY-MM-DD}/. Storage works on local disk or Amazon S3 via fsspec, making PSE cloud-ready without code changes.
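
The daily partition layout can be illustrated with a small path builder. The `{root}/{source_id}/{YYYY-MM-DD}/` pattern comes from the text; the function names and the example roots are made up for illustration. Because fsspec-style URLs use forward slashes, the sketch joins with `posixpath` so the same code handles a local root and an `s3://` root.

```python
import posixpath
from datetime import date, timedelta


def partition_path(root: str, source_id: str, day: date) -> str:
    """Return the partition directory {root}/{source_id}/{YYYY-MM-DD}/."""
    return posixpath.join(root, source_id, day.isoformat()) + "/"


def partitions_for_range(root: str, source_id: str, start: date, end: date):
    """Yield one partition path per day in the inclusive range [start, end]."""
    day = start
    while day <= end:
        yield partition_path(root, source_id, day)
        day += timedelta(days=1)
```

A time-range query would touch one partition per day, which keeps reads and connector refreshes scoped to the dates that actually changed.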

PostgreSQL/PostGIS

Point observations, ingest records, and data source status metadata are stored in PostgreSQL with the PostGIS spatial extension, enabling fast spatial queries and metadata lookups.

Intelligent Cache

PSE maintains an in-process TTL + LRU cache keyed by SHA-256 hashes of query parameters. Cache entries are source-aware, supporting prefix-sweep invalidation when a specific connector’s data is refreshed.
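
A cache with these properties can be sketched with the standard library alone. This is not PSE's implementation: the class and function names, the default sizes, and the key layout are assumptions. In particular, the sketch assumes that "source-aware" means each key is the source ID followed by the SHA-256 digest, which is what lets invalidation sweep by prefix.

```python
import hashlib
import json
import time
from collections import OrderedDict


def cache_key(source_id: str, params: dict) -> str:
    """Source ID prefix plus the SHA-256 digest of the canonicalized params."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{source_id}:{digest}"


class TTLLRUCache:
    def __init__(self, max_entries=128, ttl_seconds=300.0, clock=time.monotonic):
        self._entries = OrderedDict()  # key -> (expires_at, value)
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used

    def invalidate_source(self, source_id: str) -> int:
        """Prefix sweep: remove every entry that belongs to one source."""
        prefix = f"{source_id}:"
        stale = [k for k in self._entries if k.startswith(prefix)]
        for k in stale:
            del self._entries[k]
        return len(stale)
```

Sorting the parameter keys before hashing means two requests with the same parameters in a different order map to the same entry. Fused results that draw on several sources would need a richer key scheme than this single-prefix sketch.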

COMMERCIAL ADAPTER

FLUX — First commercial adapter

FLUX is the first commercial application built on PSE. It provides AI-powered intelligence for renewable energy siting and development in developing countries.

FLUX queries PSE for solar irradiance, wind speed, temperature, terrain, land use, and grid infrastructure data, then runs the results through a pipeline of domain-specific models:

Solar PV yield modeling

Using pvlib — from solar position calculation through irradiance decomposition, plane-of-array transposition, cell temperature modeling, and system losses to hourly energy output. Five panel types with P50/P90/P10 uncertainty bounds.

Wind energy yield modeling

Log-law hub-height extrapolation, Weibull distribution fitting, and cubic power curve modeling with cut-in, rated, and cut-out wind speeds. Four turbine models.

Grid infrastructure analysis

Identifies nearest substations from OpenStreetMap data, calculates connection distances, estimates required voltage levels based on project capacity, and models connection costs.

Financial modeling

Levelized Cost of Energy (LCOE), Internal Rate of Return (IRR) via Newton-Raphson method, Net Present Value (NPV), simple payback period, and debt service calculations with country-specific benchmarks.

Climate risk assessment

Evaluates six hazards (flood, cyclone, extreme heat, wind storm, drought, sea level rise) using elevation, latitude, and climate zone heuristics, producing a weighted composite risk score with mitigation recommendations.

Natural language explanation

Generates a structured narrative interpreting all scores, identifying the strongest and weakest dimensions, and producing a prioritized list of actionable recommendations.
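
As one example from the list above, the wind yield step combines a log-law extrapolation to hub height with a cubic power curve. FLUX is proprietary, so the sketch below is not its code. It uses the textbook neutral-stability log law, v_hub = v_ref · ln(h_hub / z0) / ln(h_ref / z0), and one common cubic interpolation between cut-in and rated speed. The roughness length and all turbine parameters are illustrative values, and Weibull fitting is omitted.

```python
import math


def log_law_speed(v_ref: float, h_ref: float, h_hub: float, z0: float = 0.03) -> float:
    """Extrapolate wind speed from height h_ref to h_hub (all heights in metres).

    z0 is the surface roughness length; 0.03 m is an illustrative value
    for open terrain, not a FLUX setting.
    """
    if h_ref <= z0 or h_hub <= z0:
        raise ValueError("heights must exceed the roughness length")
    return v_ref * math.log(h_hub / z0) / math.log(h_ref / z0)


def cubic_power_kw(
    v: float,
    cut_in: float = 3.0,        # m/s, hypothetical turbine
    rated_speed: float = 12.0,  # m/s, hypothetical turbine
    cut_out: float = 25.0,      # m/s, hypothetical turbine
    rated_kw: float = 2000.0,   # hypothetical turbine
) -> float:
    """Power output for wind speed v using a cubic ramp up to rated power."""
    if v < cut_in or v >= cut_out:
        return 0.0
    if v >= rated_speed:
        return rated_kw
    return rated_kw * (v**3 - cut_in**3) / (rated_speed**3 - cut_in**3)
```

Applying `cubic_power_kw` to each hourly hub-height speed and summing gives an energy estimate in kWh for that period.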

Country configurations are live for Indonesia, Kenya, and Vietnam, with country-specific grid data, electricity tariffs, regulatory exclusion zones, and financial parameters.
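
The financial metrics in the list above are standard and easy to sketch once cash flows and a discount rate are known. The following is a generic illustration and not FLUX's code: function names, the initial guess, and tolerances are assumptions. NPV discounts a list of yearly cash flows (index 0 is the up-front investment), IRR is the rate where NPV is zero, found here by Newton-Raphson, and LCOE is discounted lifetime cost divided by discounted lifetime energy.

```python
def npv(rate: float, cashflows: list[float]) -> float:
    """Net present value; cashflows[t] occurs at the end of year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))


def irr(cashflows: list[float], guess: float = 0.1,
        tol: float = 1e-9, max_iter: int = 100) -> float:
    """Internal rate of return via Newton-Raphson on NPV(rate) = 0."""
    rate = guess
    for _ in range(max_iter):
        f = npv(rate, cashflows)
        # Derivative of NPV with respect to the rate.
        df = sum(-t * cf / (1 + rate) ** (t + 1) for t, cf in enumerate(cashflows))
        if df == 0:
            raise ValueError("derivative vanished; try a different guess")
        new_rate = rate - f / df
        if abs(new_rate - rate) < tol:
            return new_rate
        rate = new_rate
    raise ValueError("IRR did not converge")


def lcoe(capex: float, annual_opex: float, annual_energy_kwh: float,
         rate: float, years: int) -> float:
    """Levelized cost of energy in currency units per kWh."""
    factors = [(1 + rate) ** t for t in range(1, years + 1)]
    discounted_cost = capex + sum(annual_opex / f for f in factors)
    discounted_energy = sum(annual_energy_kwh / f for f in factors)
    return discounted_cost / discounted_energy
```

Newton-Raphson converges quickly for a conventional project profile (one outflow followed by inflows) but can fail or find a spurious root when cash flows change sign more than once, which is why the sketch raises instead of returning a guess.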

FLUX is proprietary. PSE is open source. This is the model: the data infrastructure is free, the intelligence built on top of it is the product.

EXTENSIBILITY

Extending PSE

PSE is designed to be extended. Adding a new data source means implementing the BaseConnector abstract class — approximately 100 lines of Python defining how to fetch data, assess quality, and report freshness for your source.

The connector interface ensures that every new source automatically works with the fusion engine, the cache, the API, and every adapter built on PSE. A connector for ocean temperature data, once written, immediately becomes available to FLUX for renewable energy intelligence and to any future adapter built on PSE.
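
To give a feel for what implementing a connector might involve, here is a hypothetical sketch of the interface. Only the class name `BaseConnector` and the three responsibilities (fetch data, assess quality, report freshness) come from the text. The method names, signatures, the `Freshness` dataclass, and the toy ocean temperature connector are assumptions, and the toy returns a plain dict where a real connector would return an xarray Dataset. Consult the repository for the actual abstract class.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Freshness:
    last_ingest: datetime
    lag: timedelta


class BaseConnector(ABC):
    """Hypothetical shape of the interface; the real signatures may differ."""

    source_id: str

    @abstractmethod
    def fetch(self, bbox, start, end, variables):
        """Return data for the given extent, time range, and variables."""

    @abstractmethod
    def assess_quality(self, data) -> float:
        """Return a reliability score in [0, 1] for the fetched data."""

    @abstractmethod
    def freshness(self) -> Freshness:
        """Report when data was last ingested and how far it lags real time."""


class DemoOceanTemperature(BaseConnector):
    """Toy connector that returns a constant value for every variable."""

    source_id = "ocean_temp_demo"

    def fetch(self, bbox, start, end, variables):
        # A real connector would call an upstream API and build an xarray Dataset.
        return {name: 288.15 for name in variables}

    def assess_quality(self, data) -> float:
        return 0.5  # placeholder score

    def freshness(self) -> Freshness:
        return Freshness(last_ingest=datetime.now(timezone.utc), lag=timedelta(0))
```

Because the base class is abstract, a connector that forgets one of the three methods fails at instantiation time, before it can be registered with the engine.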

Priority connectors for future development

  • AIS vessel tracking data (BarentsWatch, AISstream)
  • CMIP6 climate model projections
  • Air quality observations
  • Ocean sensor networks
  • Agricultural yield statistics

Contributions are welcome. The source code is available at github.com/northflowlabs/pse under the Apache 2.0 license. See the contributing guide for instructions on submitting new connectors.

SPECIFICATIONS

Technical summary

Attribute     Detail
Language      Python 3.11+
License       Apache 2.0
Codebase      ~12,500 lines across 69 files
Tests         200+ passing (unit + integration)
Data format   xarray Dataset (CF-conventions)
Storage       Zarr (local/S3) + PostgreSQL/PostGIS
API           FastAPI (async)
Connectors    ERA5, Open-Meteo, Global Solar Atlas, Sentinel-2, OpenStreetMap, World Bank
Fusion        Quality-weighted (0.7 × reliability + 0.3 × recency)
Ecosystem     Pangeo (Xarray, Dask, Zarr, fsspec)
Deployment    Docker Compose, Railway, or any cloud provider