Guidance for FAIR Publication and Citation of Earth Imagery in Soil Research

1. Purpose

Earth observation data—such as satellite, airborne, and drone imagery—are increasingly used to model, understand, and predict soil properties, soil health indicators, and landscape interactions. While these datasets are often too large or externally hosted to deposit in traditional research repositories, they must still be FAIR: Findable, Accessible, Interoperable, and Reusable—especially when used as covariates in soil mapping or related earth system products.

This guidance explains how to properly reference, document, and ensure traceability of earth imagery in FAIR-compliant publications.

2. Key Challenges

Unlike traditional datasets:

  • Earth imagery is high-volume and externally hosted.
  • Data is typically accessed from platforms or cloud services (e.g. Google Earth Engine, Copernicus, Amazon Web Services, NASA DAACs) rather than institutional repositories.
  • Frequently updated versions (e.g. Sentinel, Landsat, PlanetScope) complicate referencing.
  • Derived products (e.g. NDVI, texture layers, composites) often lack explicit citation metadata.

Despite these challenges, FAIR publication requires persistent references, transparency of processing, and reproducibility.

3. FAIR Principles Applied to Earth Imagery

Findable

Even if large imagery files aren’t deposited in repositories:

  • Identify data source/platform (e.g. Google Earth Engine, Sentinel Hub, Copernicus DIAS).

  • Include dataset identifiers:

    • Mission/sensor (e.g. Sentinel-2 MSI, Landsat 8 OLI, PlanetScope)
    • Product name and processing level (e.g. L2A surface reflectance)
    • Tile or scene ID(s), date ranges, and spatial extent
    • Cloud collection ID or asset ID (if available)
  • If a derived layer (e.g. NDVI or PCA stack) was used, document the algorithm and input scenes.

Accessible

Instead of a repository deposit:

  • Document how and where imagery can be accessed.

    • Platform or API (e.g. COPERNICUS/S2_SR, LANDSAT/LC08/C02/T1_L2)
    • Access conditions (open data, license requirements, API keys)
    • Links to provider documentation (e.g. Copernicus Hub, USGS Earth Explorer)

If imagery access requires credentials, include:

  • Instructions for obtaining access.
  • Reference to the licensing terms.

Interoperable

To promote reuse and linkage:

  • Record CRS (e.g. EPSG:4326 or UTM Zone 33N)
  • Specify file formats (e.g. GeoTIFF, COG, NetCDF, JPEG2000)
  • Use machine-readable metadata formats where possible (e.g. STAC, ISO 19115)

Reusable

Ensure traceability and transparency by documenting:

  • Acquisition dates or time windows
  • Preprocessing steps (e.g. atmospheric correction, mosaicking, scaling)
  • Derived layers created (e.g. vegetation index, bare soil index, elevation derivatives)
  • Version or revision number of the source dataset
  • Citation or attribution guidelines of the data provider

4. Citation Practices for Earth Imagery

Even when imagery is not physically deposited, it must still be citable using a combination of:

  • Source dataset name and identifier
  • Access platform/service
  • Acquisition time window
  • Processing steps or composite method
  • Persistent identifiers where available (e.g. DOI, STAC item ID, AWS S3 path, USGS/ESA scene ID)

6. Reproducibility for Derived Covariates

When imagery is used to generate predictors for soil modeling:

  • Document computation steps (e.g. bands, indices, thresholds).
  • Record software, libraries, and versions (e.g. GEE script ID, Python/R code).
  • Provide links or copies of scripts where possible.
  • Store exact datasets and date filters.

If storing the derived product is infeasible:

  • Share the workflow and references to input layers instead.

7. Integration in FAIR Soil Products

When satellite or drone imagery is used as covariates in soil prediction or mapping:

  • Include full citation(s) in the metadata of the soil product.

  • For each covariate, mention:

    • Imagery source & version
    • Date or period used
    • Transformation or resampling method
    • Resolution and CRS
  • Provide a data dictionary or table mapping covariate names to imagery sources.

8. Repository Strategy

Even though imagery files aren’t uploaded:

  • Publish metadata and citation details in the FAIR repository alongside the soil product.

  • Link imagery inputs via:

    • DOI placeholders (if available)
    • Cloud-native identifiers (STAC, S3 path, Earth Engine assets)
    • Dataset citation strings

Where derived imagery is small enough (e.g. aggregated rasters), include those layers directly.

9. Summary of Best Practices

Goal What to Do
Ensure Findability Cite source, collection ID, tile/date info
Make Accessible Provide access path or platform instructions
Support Interoperability Use standard CRS, formats, machine-readable metadata
Enable Reuse Document preprocessing, versions, licenses, and citations

10. Conclusion

FAIR publication of earth imagery are challenging for full file deposition, but it does require traceability, transparency, and citable identifiers. By documenting access pathways, identifiers, transformations, and licensing, earth imagery remains a reusable component of derived data products.