Guidance for FAIR Publication of Soil Property and Soil Health Indicator Maps

1. Purpose

This document provides best practices for ensuring that gridded soil products—such as predicted distributions of soil properties or health indicators across space and/or time—are published according to the FAIR principles: Findable, Accessible, Interoperable, and Reusable. These products are often generated using machine learning (commonly Random Forest), based on point observations and environmental co-variates. Proper documentation of methods, inputs, uncertainties, and usage limitations is essential.

2. Core Components of Soil Map Products

2.1 Point Observation Data

Each dataset used to train or validate the model must be referenced and, where possible, shared or openly cited.

Minimum requirements:

Persistent identifier (e.g. DOI, accession number)
Sampling design overview (source campaigns or databases)
Attributes measured (e.g. SOC, pH, bulk density, biological indicators)
Spatial coordinate reference system
Temporal coverage (when collected)
Licensing and access conditions
Link to associated metadata

If privacy or license restrictions prevent data sharing, reference the repository or publication where the data can be requested.

2.2 Co-variate Datasets

All co-variates used to fit the model must be properly documented to ensure reproducibility.

Key metadata:

Dataset name and version
Description (e.g. climate, terrain, remote sensing, parent material)
Spatial resolution and coordinate reference system
Temporal coverage (for time-specific variables)
Source and access link (URL, DOI, repository)
License and usage constraints

3. Modeling Framework Documentation

3.1 Algorithm Description

Clearly state:

Algorithm used (e.g. Random Forest)
Software or library (e.g. scikit-learn, ranger, caret, R randomForest)
Version number
Computing environment details (OS, language version, dependencies)

3.2 Model Hyperparameters and Fit

Document:

Number of trees
Node size, mtry/feature selection approach
Cross-validation or validation method
Train/test split or resampling strategy
Feature importance metrics, if calculated

Include or link to:

Scripts or notebooks used for training and prediction
Logs of training runs or configuration files

3.3 Model Performance Metrics

Provide relevant fit metrics, such as:

RMSE, MAE, R² (for continuous properties)
Confusion matrix, kappa, AUROC (if classification)
Spatial or temporal cross-validation
Any external validation datasets

4. Product Metadata and Publication Format

4.1 Core Metadata for the Published Map

For each soil map (raster or vector), make sure metadata includes:

Product title and abstract
Target property or indicator (select from common vocabularies)
Spatial resolution
Temporal reference (year, season, baseline or scenario)
Spatial extent and coordinate reference system
Version or edition number
Contact information or responsible organization

4.2 Attribution to Inputs and Model

Include references to:

Point datasets (with identifiers)
Co-variates (with versions and licenses)
Model description, parameters, and performance

These should be captured in metadata fields (e.g. ISO 19115, Dublin Core, DCAT, or INSPIRE-compliant formats).

4.3 File Formats

Preferred FAIR-friendly formats:

Raster: GeoTIFF, NetCDF, Cloud-Optimized GeoTIFF
Vector: GeoPackage, shapefile (as fallback), GeoJSON
Metadata: XML, JSON, or YAML aligned with standards
Model Docs: PDF, Markdown, or linked code repository

5. Uncertainty and Usage Limitations

5.1 Uncertainty Representation

Publish one or more of the following:

Pixel-level uncertainty or prediction interval maps
Standard error or variance layers
Validation residual surfaces
Confidence class maps

5.2 Usage Constraints and Limitations

Document:

Spatial or temporal domains for which predictions are valid
Known gaps or biases (e.g. underrepresented soil types or regions)
Limitations due to input data density or co-variate quality
Scale constraints (e.g. not suitable for farm-level decisions)

Include a clear statement on:

Appropriate applications (e.g. regional modeling, national planning)
Inappropriate uses (e.g. site-specific legal or regulatory decisions)

5.3 Licensing

Specify:

License type (e.g. CC-BY, CC0, ODbL)
Any attribution requirements
Citation instructions

6. Accessibility and Reuse

6.1 Repository and Access

Deposit map layers and accompanying metadata in a FAIR-compliant repository:

Examples: Zenodo, Figshare, institutional data portals, INSPIRE-compliant nodes
Provide persistent identifiers (e.g. DOI)

6.2 Interoperability

Publish with:

Standardized coordinate reference systems
Open geospatial formats
Metadata standards (ISO 19115, DCAT, INSPIRE)
Optional API or OGC services (WMS/WCS/WFS/GeoTIFF over HTTP)

6.3 Reproducibility

Where feasible, include or link to:

Model code and environment specifications
Data preparation workflows
Documentation for rerunning or updating predictions

7. Versioning and Updates

Track and record:

Version numbers and release dates
Changes in point data, co-variates, or model parameters
Deprecated or superseded versions
Archive of previous editions for reference

8. Citation and Acknowledgment

Provide a formatted citation that includes:

Title of the dataset
Version
Authors or organizations
Year
DOI or persistent link

If the map is derived from external datasets, include recommended citations for each.

9. Summary Checklist

Component	FAIR Requirement
Point data	Cited, licensed, persistent ID
Co-variates	Versioned, referenced, licensed
Model details	Algorithm, parameters, validation, code ref
Map product	Geospatial metadata, DOI, standardized format
Uncertainty	Published or referenced, explained
Usage limits	Clearly documented
Licensing	Explicit and machine-readable
Versioning	Traceable and archived

10. Conclusion

By adhering to these guidelines, soil map products become not only publishable but also traceable, interoperable, and reusable across projects, regions, and time. This ensures scientific transparency, policy relevance, and long-term value of soil information systems.