Imbalanced datasets are one of the main challenges in digital soil mapping. For these datasets, machine learning techniques commonly overestimate the majority classes and underestimate the minority ones. In general, this generates maps with poor precision and unrealistic results. Using these maps for land use decision-making can have dire consequences. This is the case for acid sulfate (AS) soils, a type of harmful soil that can cause serious environmental damage when drained for agricultural or forestry activities. Therefore, it is necessary to create high-precision maps to avoid environmental damage. Although most soil class datasets in nature are imbalanced, this problem has hardly been studied. One of the main objectives of this work is the evaluation of different techniques to address the problem of imbalanced datasets. The methods considered to balance the dataset are an undersampling technique, the addition of more samples, and the combination of both. To increase the number of samples from the minority class, we develop a new technique that creates artificial samples from the quaternary geological map. The method used for the modeling is Random Forest, one of the best methods for the classification of AS soils. Balancing the dataset improves the performance of the model in all the studied cases, where the values of the metrics for both classes are above 80%. The consideration of artificial non-AS soil samples improves the prediction of the model for the AS soils. Furthermore, we create AS soil probability maps for the four balanced datasets and the imbalanced dataset. The modeled AS soil probability maps created from the balanced datasets have high precision. A detailed comparison between the maps is made. The predictions of some of these maps agree over 75-80% of the study area. In addition, the extent of the AS soils obtained in all the cases is compared with the extent of the AS soils in the conventionally produced occurrence map.
The good results of this study confirm the importance of balancing the dataset to improve the prediction and classification of AS soils.
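The undersampling step mentioned above can be illustrated with a minimal sketch. It assumes plain random undersampling, which may differ from the specific technique used in the study; the function name `undersample` and the toy data (90 non-AS and 10 AS points) are hypothetical.

```python
import random
from collections import Counter


def undersample(samples, labels, seed=0):
    """Randomly drop samples until every class is as large as the smallest class.

    Assumption: plain random undersampling; the study may use another variant.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # Draw n_min samples without replacement from each class.
        for x in rng.sample(xs, n_min):
            balanced.append((x, y))
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]


# Hypothetical toy dataset: sample IDs 0..99, heavily imbalanced labels.
samples = list(range(100))
labels = ["non-AS"] * 90 + ["AS"] * 10
bal_x, bal_y = undersample(samples, labels)
print(Counter(bal_y))
```

The balanced sample would then be passed to the classifier in place of the full dataset. Undersampling discards majority-class information, which is one reason to combine it with additional minority-class samples.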
Seasonally frozen ground (SFG) significantly contributes to global carbon sinks. Global warming and anthropogenic disturbances threaten the carbon storage capacity of SFG. Challenges in evaluating the SFG carbon storage potential include a limited understanding of the mechanisms controlling soil organic carbon (SOC) variations and a lack of timely spatial estimates of SOC. In this study, we investigated SOC stocks in SFG in the Tibet Autonomous Region, China, in 2020 and 2021. We employed partial least squares structural equation modeling (PLS-SEM) to explore the effect of complex processes (interacting roles of climate, plant physiology and phenology, freeze-thaw cycle, soil environment, and C inputs) on SOC and mapped SOC stocks in the topmost 30 cm. We identified four causal pathways: (1) an indirect pathway representing the effect of climate on plant physiology and phenology through changes in freeze-thaw cycles and soil environment, (2) an indirect pathway representing the effect of climate on C inputs through changes in freeze-thaw cycles, soil environment and plant physiology and phenology, (3) an indirect pathway representing the effect of climate on freeze-thaw cycles, and (4) an indirect pathway representing the effect of climate on the soil environment through changes in freeze-thaw cycles. C inputs exerted the greatest control on SOC. The effect of these factors decreased with increasing soil depth. We used PLS-SEM to generate maps of SOC stocks in SFG at a 500 m resolution with moderate accuracy. The estimated mean SOC stock in the 0-30 cm layer was 6.87 kg m⁻², with a 95% confidence interval ranging from 6.2 to 7.5 kg m⁻². Our results indicated that it is critical to consider the depth dependence of the direct and indirect effects of environmental factors when assessing the control mechanisms of SOC variations.
In this work, we also demonstrated that spatially explicit SOC estimates based on timely investigations are important for assessing C stocks against the background of considerable environmental changes across the Tibetan Plateau.
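For readers unfamiliar with the unit kg m⁻², the following sketch shows one widely used way to convert a measured SOC concentration into a stock for a soil layer. The abstract does not state how the stocks were computed, so the formula, the function name `soc_stock_kg_m2`, and the input values are assumptions for illustration only.

```python
def soc_stock_kg_m2(soc_g_per_kg, bulk_density_g_cm3, thickness_cm,
                    coarse_fraction=0.0):
    """SOC stock of one soil layer in kg per square metre.

    Assumed formula (common in soil science, not taken from the abstract):
        stock = SOC * bulk density * thickness * (1 - coarse fraction) / 100
    with SOC in g/kg, bulk density in g/cm^3, and thickness in cm.
    """
    return (soc_g_per_kg * bulk_density_g_cm3 * thickness_cm
            * (1.0 - coarse_fraction) / 100.0)


# Hypothetical values for a single 0-30 cm layer.
stock = soc_stock_kg_m2(soc_g_per_kg=20.0, bulk_density_g_cm3=1.2,
                        thickness_cm=30.0, coarse_fraction=0.1)
print(round(stock, 2))
```

The division by 100 follows from the unit conversion: g/kg to kg/kg contributes a factor of 0.001, g/cm³ to kg/m³ a factor of 1000, and cm to m a factor of 0.01.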
Soil corrosivity describes the susceptibility of metal infrastructure to corrosion (the corrosion risk) in different soil environments. Soil corrosivity mapping is a crucial step in identifying potentially problematic, high-maintenance fence lines and can help improve fence longevity by identifying soil environments where the use of more expensive, corrosion-resistant materials would be more cost-effective in the long term. Soil corrosion damage sustained on exclusion fences can be a serious management issue for conservation programs and initiatives, as it weakens the fence netting and provides opportunities for invasive animals (e.g. feral cats and foxes) to migrate into and occupy areas of high conservation value. The increasing accessibility of geospatial analysis software and the availability of open-source soil data provide land managers with the opportunity to implement digital soil databases and pedotransfer functions to produce fence corrosion risk maps using commonly measured soil attributes. This paper uses open-source government agency soil data (shapefiles) to map fence corrosion risk in the southern part of the Yorke Peninsula in South Australia, with the intention of assisting with the installation of a new barrier (exclusion) fence as part of the Marna Banggara rewilding project. The risk classifications (low, moderate and high risk) made by this map were compared with rates of zinc corrosion (µm/year zinc loss) observed at field sites and correctly predicted the amount of fence damage sustained at five of the eight sites. The mapping approach outlined in this study can be implemented by environmental managers in other areas to inform strategies for enhancing fence longevity.
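The comparison between mapped risk classes and observed zinc loss can be sketched as follows. The class thresholds, the site values, and the function name `classify_zinc_loss` are all hypothetical; the site data were chosen only so that the result mirrors the reported agreement at five of eight sites.

```python
# Hypothetical thresholds (µm/year zinc loss) separating the risk classes.
LOW_MAX = 2.0
MODERATE_MAX = 5.0


def classify_zinc_loss(rate_um_per_year):
    """Map an observed zinc loss rate to a risk class (assumed thresholds)."""
    if rate_um_per_year < LOW_MAX:
        return "low"
    if rate_um_per_year < MODERATE_MAX:
        return "moderate"
    return "high"


# Hypothetical sites: (class predicted by the map, observed zinc loss).
sites = [
    ("low", 1.0), ("low", 1.5), ("moderate", 3.0), ("moderate", 6.0),
    ("high", 7.5), ("high", 4.0), ("low", 2.5), ("high", 9.0),
]

hits = sum(pred == classify_zinc_loss(rate) for pred, rate in sites)
agreement = hits / len(sites)
print(f"{hits} of {len(sites)} sites agree ({agreement:.1%})")
```

With only eight sites, the agreement rate is sensitive to where the class thresholds are placed, so it is best read as an indication rather than a precise accuracy figure.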
Soil texture data are basic input parameters for many Earth System Models. The Qinghai-Tibet Plateau (QTP) is the largest permafrost region at middle-low latitudes on the planet, and its land surface processes can affect regional and even global water and energy cycles. However, the spatial distribution of soil texture on the plateau is largely unknown because field data are difficult to obtain. Based on data collected in field surveys and on environmental factors, we predicted the spatial distribution of clay, silt, and sand contents at a 1 km resolution for the 0-5, 5-15, 15-30, 30-60, 60-100, and 100-200 cm soil depth layers. Random forest models were constructed to predict soil texture from the relationships between environmental factors and soil texture data. The results showed that the soil particles of the QTP are dominated by sand, which accounts for more than 70% of the total particles. As for the spatial distribution, silt and clay contents are high in the southeastern plateau and low mainly in the northwestern plateau. Climate and NDVI values are the most important factors affecting the spatial distribution of soil texture on the QTP. The results of this study provide soil texture data at different depths for the whole plateau at a spatial resolution of 1 km, and the dataset can be used as an input for many Earth System Models.
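Models that need a single value for a depth range other than the six mapped layers can aggregate the layers by thickness. The sketch below is one way to do this; the function name `depth_weighted_mean` and the sand contents are hypothetical, and only the layer boundaries come from the text above.

```python
def depth_weighted_mean(layers, top, bottom):
    """Thickness-weighted mean of a soil property between two depths (cm).

    layers: list of (layer_top_cm, layer_bottom_cm, value) tuples.
    Only the part of each layer inside [top, bottom] contributes.
    """
    total = 0.0
    covered = 0.0
    for layer_top, layer_bottom, value in layers:
        overlap = min(bottom, layer_bottom) - max(top, layer_top)
        if overlap > 0:
            total += value * overlap
            covered += overlap
    if covered == 0:
        raise ValueError("no layer overlaps the requested depth range")
    return total / covered


# Layer boundaries follow the text; sand contents (%) are hypothetical.
sand_layers = [
    (0, 5, 70.0), (5, 15, 72.0), (15, 30, 74.0),
    (30, 60, 75.0), (60, 100, 76.0), (100, 200, 78.0),
]
sand_0_30 = depth_weighted_mean(sand_layers, 0, 30)
print(round(sand_0_30, 2))
```

Thickness weighting matters because the layers are unequal: the 15-30 cm layer contributes three times as much to the 0-30 cm mean as the 0-5 cm layer.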