Guidance for FAIR Publication of Soil Property and Soil Health Indicator Maps

1. Purpose

This document provides best practices for publishing gridded soil products, such as predicted distributions of soil properties or health indicators across space and/or time, according to the FAIR principles: Findable, Accessible, Interoperable, and Reusable. These products are often generated using machine learning (commonly Random Forest) trained on point observations and environmental co-variates. Proper documentation of methods, inputs, uncertainties, and usage limitations is essential for others to evaluate and reuse them.

2. Core Components of Soil Map Products

2.1 Point Observation Data

Each dataset used to train or validate the model must be referenced and, where possible, shared or openly cited.

Minimum requirements:

  • Persistent identifier (e.g. DOI, accession number)
  • Sampling design overview (source campaigns or databases)
  • Attributes measured (e.g. SOC, pH, bulk density, biological indicators)
  • Spatial coordinate reference system
  • Temporal coverage (when collected)
  • Licensing and access conditions
  • Link to associated metadata

If privacy or license restrictions prevent data sharing, reference the repository or publication where the data can be requested.
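
The minimum requirements above can be kept in a small machine-readable record so that gaps are easy to spot before publication. The following is a minimal sketch, not a standard schema: the class name, field names, and example values (including the placeholder DOI) are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class PointDatasetRecord:
    """Hypothetical record mirroring the minimum requirements listed above."""
    identifier: str = ""          # persistent identifier, e.g. a DOI
    sampling_design: str = ""     # source campaigns or databases
    attributes: tuple = ()        # measured attributes, e.g. ("SOC", "pH")
    crs: str = ""                 # coordinate reference system, e.g. "EPSG:4326"
    temporal_coverage: str = ""   # when the samples were collected
    license: str = ""             # licensing and access conditions
    metadata_link: str = ""       # link to associated metadata


def missing_fields(record):
    """Return the names of all fields that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = PointDatasetRecord(
    identifier="doi:10.xxxx/placeholder",  # placeholder, not a real DOI
    attributes=("SOC", "pH"),
    crs="EPSG:4326",
)
print(missing_fields(record))
# ['sampling_design', 'temporal_coverage', 'license', 'metadata_link']
```

A similar record could be kept for each co-variate dataset, with a version field added.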

2.2 Co-variate Datasets

All co-variates used to fit the model must be properly documented to ensure reproducibility.

Key metadata:

  • Dataset name and version
  • Description (e.g. climate, terrain, remote sensing, parent material)
  • Spatial resolution and coordinate reference system
  • Temporal coverage (for time-specific variables)
  • Source and access link (URL, DOI, repository)
  • License and usage constraints

3. Modeling Framework Documentation

3.1 Algorithm Description

Clearly state:

  • Algorithm used (e.g. Random Forest)
  • Software or library (e.g. scikit-learn, ranger, caret, R randomForest)
  • Version number
  • Computing environment details (OS, language version, dependencies)
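
The environment details listed above can be collected programmatically rather than typed by hand. The sketch below uses only the Python standard library; the function name and the list of packages queried are illustrative assumptions, and a package that is not installed is reported as None.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect OS, Python version, and installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "packages": versions,
    }


env = describe_environment(["scikit-learn", "numpy"])
print(json.dumps(env, indent=2))
```

The resulting JSON can be stored next to the model documentation so that the exact software versions travel with the map.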

3.2 Model Hyperparameters and Fit

Document:

  • Number of trees
  • Minimum node size, mtry (number of features tried at each split), and any feature selection approach
  • Cross-validation or validation method
  • Train/test split or resampling strategy
  • Feature importance metrics, if calculated

Include or link to:

  • Scripts or notebooks used for training and prediction
  • Logs of training runs or configuration files
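
One way to make hyperparameters citable is to store them in a configuration file and record a fingerprint of that file in the map metadata. The following is a sketch under stated assumptions: the hyperparameter names follow scikit-learn's RandomForestRegressor conventions, while the values, the "validation" block, and the helper function names are illustrative only.

```python
import hashlib
import json

# Illustrative values only; replace with the settings actually used.
config = {
    "algorithm": "random_forest",
    "hyperparameters": {
        "n_estimators": 500,      # number of trees
        "min_samples_leaf": 5,    # minimum node size
        "max_features": "sqrt",   # features tried per split (mtry)
        "random_state": 42,       # seed, needed to rerun the fit
    },
    "validation": {"method": "k-fold cross-validation", "folds": 10},
}


def serialize_config(cfg):
    """Serialize with sorted keys so identical settings give identical text."""
    return json.dumps(cfg, indent=2, sort_keys=True)


def config_fingerprint(cfg):
    """SHA-256 of the serialized configuration, usable as a short reference."""
    return hashlib.sha256(serialize_config(cfg).encode("utf-8")).hexdigest()


text = serialize_config(config)
print(text)
print("sha256:", config_fingerprint(config))
```

Because the keys are sorted before hashing, two configurations with the same content produce the same fingerprint regardless of the order in which the entries were written.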

3.3 Model Performance Metrics

Provide relevant fit metrics, such as:

  • RMSE, MAE, R² (for continuous properties)
  • Confusion matrix, kappa, AUROC (if classification)
  • Results from spatial or temporal cross-validation, where used
  • Results on any external validation datasets
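
For continuous properties, the metrics above can be computed directly from paired observed and predicted values. The sketch below is a plain-Python illustration with made-up numbers; it computes R² as the coefficient of determination (1 minus residual sum of squares over total sum of squares), which can differ from the squared correlation coefficient, so the definition used should be stated alongside the reported value.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE and R² (coefficient of determination)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # R² is undefined when all observed values are identical
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Made-up validation pairs, for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
print(metrics)  # rmse ≈ 0.354, mae = 0.25, r2 = 0.9
```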

4. Product Metadata and Publication Format

4.1 Core Metadata for the Published Map

For each soil map (raster or vector), make sure metadata includes:

  • Product title and abstract
  • Target property or indicator (named using a common controlled vocabulary)
  • Spatial resolution
  • Temporal reference (year, season, baseline or scenario)
  • Spatial extent and coordinate reference system
  • Version or edition number
  • Contact information or responsible organization

4.2 Attribution to Inputs and Model

Include references to:

  • Point datasets (with identifiers)
  • Co-variates (with versions and licenses)
  • Model description, parameters, and performance

Capture these references in structured metadata fields that follow a recognized standard (e.g. ISO 19115, Dublin Core, DCAT, or an INSPIRE-compliant profile), rather than only in free-text descriptions.

4.3 File Formats

Preferred FAIR-friendly formats:

  • Raster: GeoTIFF, NetCDF, Cloud-Optimized GeoTIFF
  • Vector: GeoPackage, shapefile (as fallback), GeoJSON
  • Metadata: XML, JSON, or YAML aligned with standards
  • Model Docs: PDF, Markdown, or linked code repository

5. Uncertainty and Usage Limitations

5.1 Uncertainty Representation

Publish one or more of the following:

  • Pixel-level uncertainty or prediction interval maps
  • Standard error or variance layers
  • Validation residual surfaces
  • Confidence class maps
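
For Random Forest models, one common approximation of pixel-level uncertainty is to summarize the spread of the individual tree predictions at each pixel. The sketch below assumes the per-tree predictions for one pixel are already available as a list (the values shown are made up) and uses a simple linear-interpolation percentile. Note that the spread across trees mainly reflects model variability and will generally understate full prediction uncertainty; approaches such as quantile regression forests are an alternative.

```python
import statistics


def percentile(values, q):
    """Percentile with linear interpolation between ranks; q is in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def pixel_uncertainty(tree_predictions, lower_q=5, upper_q=95):
    """Summarize per-tree predictions for a single pixel."""
    return {
        "mean": statistics.mean(tree_predictions),
        "std": statistics.stdev(tree_predictions),  # needs at least two trees
        "lower": percentile(tree_predictions, lower_q),
        "upper": percentile(tree_predictions, upper_q),
    }


# Made-up predictions from eight trees for one pixel
trees = [1.8, 2.1, 2.0, 2.4, 1.9, 2.2, 2.0, 2.3]
summary = pixel_uncertainty(trees)
print(summary)
```

Applied to every pixel, the "std", "lower", and "upper" values would become the layers published next to the mean prediction.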

5.2 Usage Constraints and Limitations

Document:

  • Spatial or temporal domains for which predictions are valid
  • Known gaps or biases (e.g. underrepresented soil types or regions)
  • Limitations due to input data density or co-variate quality
  • Scale constraints (e.g. not suitable for farm-level decisions)

Include a clear statement on:

  • Appropriate applications (e.g. regional modeling, national planning)
  • Inappropriate uses (e.g. site-specific legal or regulatory decisions)
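
A simple way to support statements about the valid prediction domain is to flag locations whose co-variate values fall outside the range seen in the training data. The following is a deliberately crude sketch: the co-variate names and values are hypothetical, and a range check per variable ignores combinations of variables, so more formal dissimilarity-based methods may be preferable in practice.

```python
def training_ranges(training_rows):
    """Minimum and maximum of each co-variate over the training samples."""
    names = training_rows[0].keys()
    return {
        name: (
            min(row[name] for row in training_rows),
            max(row[name] for row in training_rows),
        )
        for name in names
    }


def outside_training_range(pixel, ranges):
    """Names of co-variates whose pixel value lies outside the training range."""
    return [
        name for name, (low, high) in ranges.items()
        if not low <= pixel[name] <= high
    ]


# Hypothetical co-variate values at three training locations
training = [
    {"elevation_m": 120.0, "mean_annual_temp_c": 9.5},
    {"elevation_m": 860.0, "mean_annual_temp_c": 6.1},
    {"elevation_m": 430.0, "mean_annual_temp_c": 8.2},
]
ranges = training_ranges(training)

pixel = {"elevation_m": 1500.0, "mean_annual_temp_c": 7.0}
print(outside_training_range(pixel, ranges))  # ['elevation_m']
```

Pixels flagged this way could be masked or marked in a companion layer so users can see where predictions are extrapolations.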

5.3 Licensing

Specify:

  • License type (e.g. CC-BY, CC0, ODbL)
  • Any attribution requirements
  • Citation instructions

6. Accessibility and Reuse

6.1 Repository and Access

Deposit map layers and accompanying metadata in a FAIR-compliant repository:

  • Examples: Zenodo, Figshare, institutional data portals, INSPIRE-compliant nodes
  • Provide persistent identifiers (e.g. DOI)

6.2 Interoperability

Publish with:

  • Standardized coordinate reference systems
  • Open geospatial formats
  • Metadata standards (ISO 19115, DCAT, INSPIRE)
  • Optional API or OGC services (WMS/WCS/WFS), or Cloud-Optimized GeoTIFF served over HTTP

6.3 Reproducibility

Where feasible, include or link to:

  • Model code and environment specifications
  • Data preparation workflows
  • Documentation for rerunning or updating predictions
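
To let others confirm that they are rerunning the workflow on the same files, a checksum manifest can be published alongside the code and data. The sketch below uses only the Python standard library; the function names and the file names in the demonstration are illustrative assumptions, and the demonstration writes placeholder bytes to a temporary directory rather than reading real map layers.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large rasters do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's relative path (POSIX style) to its SHA-256 checksum."""
    root = Path(directory)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


# Demonstration with placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "soc_map.tif").write_bytes(b"placeholder raster bytes")
    Path(tmp, "metadata.json").write_bytes(b"{}")
    for name, checksum in build_manifest(tmp).items():
        print(checksum, name)
```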

7. Versioning and Updates

Track and record:

  • Version numbers and release dates
  • Changes in point data, co-variates, or model parameters
  • Deprecated or superseded versions
  • Archive of previous editions for reference
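
The items above can be recorded as a simple release history in which each superseded version points to its successor. This is a minimal sketch; the class, field names, version numbers, and dates are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical release record for one edition of a map product."""
    version: str
    released: str                        # release date, ISO 8601
    changes: str                         # what changed in data, co-variates or model
    superseded_by: Optional[str] = None  # version that replaces this one, if any


def current_release(releases):
    """Return the single release that has not been superseded."""
    current = [r for r in releases if r.superseded_by is None]
    if len(current) != 1:
        raise ValueError("expected exactly one release that is not superseded")
    return current[0]


# Illustrative history
history = [
    Release("1.0", "2022-03-01", "Initial release", superseded_by="1.1"),
    Release("1.1", "2023-06-15", "Added new point data; model refitted"),
]
print(current_release(history).version)  # 1.1
```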

8. Citation and Acknowledgment

Provide a formatted citation that includes:

  • Title of the dataset
  • Version
  • Authors or organizations
  • Year
  • DOI or persistent link

If the map is derived from external datasets, include recommended citations for each.
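
A citation string containing the elements above can be generated from the product metadata so that it stays consistent across versions. The layout below is an illustrative choice, not a mandated citation style, and all values, including the DOI, are placeholders.

```python
def format_citation(authors, year, title, version, identifier):
    """Assemble a dataset citation from its required elements."""
    author_text = "; ".join(authors)
    return f"{author_text} ({year}). {title} (Version {version}) [Data set]. {identifier}"


citation = format_citation(
    authors=["Example Soil Institute"],
    year=2024,
    title="Example topsoil organic carbon map",
    version="1.0",
    identifier="https://doi.org/10.xxxx/placeholder",  # placeholder, not a real DOI
)
print(citation)
# Example Soil Institute (2024). Example topsoil organic carbon map (Version 1.0) [Data set]. https://doi.org/10.xxxx/placeholder
```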

9. Summary Checklist

Component       FAIR Requirement
--------------  ----------------------------------------------
Point data      Cited, licensed, persistent ID
Co-variates     Versioned, referenced, licensed
Model details   Algorithm, parameters, validation, code ref
Map product     Geospatial metadata, DOI, standardized format
Uncertainty     Published or referenced, explained
Usage limits    Clearly documented
Licensing       Explicit and machine-readable
Versioning      Traceable and archived

10. Conclusion

By adhering to these guidelines, soil map products become not only publishable but also traceable, interoperable, and reusable across projects, regions, and time. This supports scientific transparency, policy relevance, and the long-term value of soil information systems.