State Key Laboratory of Cryospheric Science and Frozen Soil Engineering

Uncertainty quantification in data-driven modelling with application to soil properties prediction

He, Geng-Fu , Yin, Zhen-Yu , Zhang, Pin

Accurate estimation of soil properties is crucial for reliability-based design in engineering practices. Conventional empirical equations and prevalent data-driven models rarely consider uncertainty quantification in both measurement and modelling processes. This study tailors three uncertainty quantification methods including Bayesian learning, Markov chain Monte Carlo and ensemble learning into data-driven modelling, in which support vector regression is selected as the baseline algorithm. The compression index of clay is adopted as an example for model training and testing. In this context, Bayesian learning and Markov chain quantify uncertainty by considering the distribution of function and hyper-parameters, respectively, while different sampled data are employed to explore model uncertainty. These models are evaluated in terms of accuracy, reliability and cost-effectiveness and also compared with Gaussian process regression, etc. The results reveal that based on built-in structural risk minimization, sparse solution and uncertainty quantification, developed models can capture more accurate and reliable correlations from actual measured data over other methods. Their practicability and generalization ability are also verified on a new creep index database. The proposed probabilistic methods are also compiled into a user-friendly platform, showing a significant potential to enrich the data-driven modelling framework and be applied in other geotechnical properties.

Journal article 2025-02-01 DOI: 10.1007/s11440-024-02484-9 ISSN: 1861-1125
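
The ensemble-learning route described in the abstract above trains models on different sampled data and reads model uncertainty from the spread of their predictions. The following is only a minimal sketch of that idea, assuming bootstrap resampling and a plain least-squares line as a stand-in for support vector regression. The data, function names, and parameters are hypothetical and are not taken from the paper.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x (a stand-in for the SVR baseline)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_predict(xs, ys, x_new, n_models=200, seed=0):
    """Fit one model per bootstrap resample; return mean and std of predictions."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: a line cannot be fitted
            continue
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    return statistics.fmean(preds), statistics.stdev(preds)


# Hypothetical toy data (not the compression index database from the paper).
xs = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6]
ys = [0.11, 0.19, 0.33, 0.38, 0.52, 0.58, 0.73, 0.79]
mean, std = bootstrap_predict(xs, ys, x_new=1.0)
print(f"prediction = {mean:.3f} +/- {std:.3f}")
```

The standard deviation across the resampled models serves as a rough indicator of model uncertainty at the query point. It does not cover measurement uncertainty, which the paper treats separately.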

Investigating agricultural drought in Northern Italy through explainable Machine Learning: Insights from the 2022 drought

Xue, Chenli , Ghirardelli, Aurora , Chen, Jianping , Tarolli, Paolo

Agricultural drought is a complex natural hazard involving multiple variables and has garnered increasing attention for its severe threat to food security worldwide. In the context of climate change and the increased occurrence of drought events, it is crucial to monitor drought drivers and progression to plan the subsequent efforts in drought prevention, adaptation, and mitigation. However, previous studies on agricultural drought often focused on precipitation or evapotranspiration, overlooking other potential drivers related to crop drought stress. Additionally, macro-level analyses of drought-driving mechanisms struggle to reveal the underlying contexts of varying drought intensities. Northern Italy is one of the most important agricultural regions in Europe and is also a hotspot affected by extreme climate events in the world. In the summer of 2022, an extreme drought struck Europe once again, causing significant damage to the agricultural regions of Northern Italy. However, no studies to date have revealed the potential impacts and extent of extreme drought on this crucial agricultural area at a regional scale. Therefore, a comprehensive understanding of agricultural drought still requires further clarification and differentiated driver analysis. This study proposed a novel framework to comprehensively monitor agricultural drought with ensemble machine learning by constructing an integrated agriculture drought index (IADI) with remote sensing-related data including meteorology, soil, geomorphology, and vegetation conditions. Additionally, the Shapley Additive Explanation (SHAP) explainable model was applied to reveal the driving mechanism behind the drought event that occurred in northern Italy in the summer of 2022. Results indicated that the proposed explainable ensemble machine learning model with multi-source remote sensing products could effectively depict the evolution of agricultural drought with spatially continuous maps at an 8-day scale.
The SHAP analysis demonstrated that the extreme and severe agricultural drought in the summer of 2022 was closely related to meteorological indicators especially precipitation and land surface temperature, which contributed 68.88% to the drought. Moreover, the new findings also highlighted that soil properties affected the agricultural drought with a contribution of 28.3%. Specifically, in the case of moderate and slight drought conditions, higher clay and soil organic carbon (SOC) content contribute to mitigating drought effects, while sandy and silty soils have the opposite effect, and the contributions from soil texture and SOC are more significant than precipitation and land surface temperature. The proposed research framework could effectively contribute to improving the methodology in agricultural drought research, potentially bringing more instructive insights for drought prevention and mitigation.

Journal article 2024-12-01 DOI: 10.1016/j.compag.2024.109572 ISSN: 0168-1699
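
The percentage contributions quoted in the abstract above come from SHAP, which is built on Shapley values. As a hedged illustration of how such shares can be derived, the sketch below computes exact Shapley values for a tiny hypothetical model by averaging marginal contributions over all feature orderings, then normalises their absolute values to percentages. The model, feature names, and numbers are invented for illustration and are unrelated to the IADI model in the paper.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings. Features not yet added keep their baseline value."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]       # switch feature i from baseline to its value
            new = model(current)
            phi[i] += new - prev    # marginal contribution in this ordering
            prev = new
    return [p / len(orders) for p in phi]


def toy_model(features):
    """Hypothetical drought score with one interaction term (illustration only)."""
    precip, lst, clay = features
    return -2.0 * precip + 1.5 * lst - 0.5 * clay + 0.2 * precip * lst


names = ["precipitation", "land surface temperature", "clay content"]
x = [0.2, 0.9, 0.3]          # hypothetical sample
baseline = [0.5, 0.5, 0.5]   # hypothetical reference input

phi = shapley_values(toy_model, x, baseline)
total = sum(abs(p) for p in phi)
shares = [100.0 * abs(p) / total for p in phi]
for name, p, s in zip(names, phi, shares):
    print(f"{name}: phi = {p:+.4f}, share = {s:.1f}%")
```

Exact enumeration costs n! model evaluations per sample, so it is only practical for a handful of features. Tree-based and sampling-based approximations are used on real models.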

Predicting gully formation: An approach for assessing susceptibility and future risk

Mokhtari, Leila Goli , Nejad, Nadiya Baghaei , Beheshti, Ali

Gully erosion is a significant natural hazard and a form of soil erosion. This research aims to predict gully formation in the Kalshour basin, Sabzevar, Iran. Employing the Information Gain Ratio (IGR) index, we identified 13 key factors out of 22 for modeling, with elevation emerging as the most influential factor in gully formation. The study evaluated the performance of individual machine learning algorithms and ensemble algorithms, including the Functional Tree (FT) as the main classifier, Bagging (Bagg), AdaBoost (Ada), Rotation Forest (RoF), and Random Subspace (RSS). Using a data set of 400 gully and non-gully points obtained through field investigations (70% for training and 30% for testing), the RoF model achieved an area under the curve (AUC) value of 0.99, indicating its high predictive ability for gully-susceptible areas. Other algorithms also performed well (Ada: 0.90, FT: 0.92, RSS: 0.94, Bagg: 0.95). However, the RoF algorithm with the functional tree as the main classifier (RoF_FT) demonstrated the highest ability in gully classification and susceptibility mapping, enhancing the functional tree's performance. In addition to AUC, the RoF_FT model achieved an F1 score of 0.89 and an MCC of 0.78 on the validation set, indicating a high balance between precision and recall, and a strong correlation between predicted and actual classes, respectively. Similarly, other models showed robust performance with high F1 scores and MCC values, but the RoF_FT model consistently outperformed them, underscoring its robustness and reliability. The resulting gully erosion-susceptibility map can be valuable for decision-makers and local managers in soil conservation and minimizing damages. Implementing proactive measures based on these findings can contribute to sustainable land management practices in the Kalshour basin.

Recommendations
Gully erosion threat: Gully erosion poses a significant threat to soil, with far-reaching environmental consequences.
Predictive modeling: This research focuses on predicting gully formation in the Kalshour basin, Sabzevar, Iran, using advanced machine learning algorithms.
Key findings for decision-makers: The study evaluates the performance of various machine learning algorithms and ensemble algorithms, with the Functional Tree serving as the main classifier. This not only enhances our ability to predict gully formation but also provides a valuable tool for decision-makers and local managers in soil conservation.
Impact on sustainable land management: By offering a gully erosion-susceptibility map, the research empowers decision-makers to implement proactive measures, minimizing damage and contributing to sustainable land management practices.
Interdisciplinary approach: The study's combination of geospatial analysis, machine learning, and soil conservation aligns with the journal's mission to advance understanding in environmental modeling.

Journal article 2024-11-01 DOI: 10.1111/nrm.12409 ISSN: 0890-8575
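
The F1 score and Matthews correlation coefficient (MCC) cited in the abstract above are both functions of the binary confusion matrix. The sketch below shows the standard formulas. The counts are hypothetical and chosen for easy arithmetic; they are not the paper's validation results.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) from binary confusion-matrix counts.
    Both metrics fall back to 0.0 when their denominator is zero."""
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


# Hypothetical counts: 120 test points, half gully and half non-gully.
f1, mcc = f1_and_mcc(tp=50, fp=10, fn=10, tn=50)
print(f"F1 = {f1:.3f}, MCC = {mcc:.3f}")  # F1 = 0.833, MCC = 0.667
```

F1 ignores true negatives, while MCC uses all four cells, which is why the two are often reported together for susceptibility classifiers.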

Ensemble Learning Improves the Efficiency of Microseismic Signal Classification in Landslide Seismic Monitoring

Xin, Bingyu , Huang, Zhiyong , Huang, Shijie , Feng, Liang

A deep-seated landslide could release numerous microseismic signals from creep-slip movement, which includes a rock-soil slip from the slope surface and a rock-soil shear rupture in the subsurface. Machine learning can effectively enhance the classification of microseismic signals in landslide seismic monitoring and interpret the mechanical processes of landslide motion. In this paper, eight sets of triaxial seismic sensors were deployed inside the deep-seated landslide, Jiuxianping, China, and a large number of microseismic signals related to the slope movement were obtained through 1-year-long continuous monitoring. All the data were passed through the seismic event identification mode, the ratio of the long-time average and short-time average. We selected 11 days of data, manually classified 4131 data into eight categories, and created a microseismic event database. Classical machine learning algorithms and ensemble learning algorithms were tested in this paper. In order to evaluate the seismic event classification performance of each algorithmic model, we evaluated the proposed algorithms through the dimensions of the accuracy, precision, and recall of each model. The validation results demonstrated that the best performing decision tree algorithm among the classical machine learning algorithms had an accuracy of 88.75%, while the ensemble algorithms, including random forest, Gradient Boosting Trees, Extreme Gradient Boosting, and Light Gradient Boosting Machine, had an accuracy range from 93.5% to 94.2% and also achieved better results in the combined evaluation of the precision, recall, and F1 score. The specific classification tests for each microseismic event category showed the same results. The results suggested that the ensemble learning algorithms show better results compared to the classical machine learning algorithms.

Journal article 2024-08-01 DOI: 10.3390/s24154892
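
The event identification step mentioned in the abstract above relies on the ratio between a short-time average and a long-time average of the signal, conventionally computed as STA/LTA. The sketch below illustrates that detector on a synthetic trace. The window lengths, threshold, and signal are assumptions made for illustration, not the settings used at Jiuxianping.

```python
def sta_lta(signal, n_sta, n_lta):
    """Per-sample ratio of short-term to long-term average signal energy.
    Samples before a full long window is available are assigned 0.0."""
    energy = [s * s for s in signal]
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = sum(energy[i - n_sta + 1 : i + 1]) / n_sta
        lta = sum(energy[i - n_lta + 1 : i + 1]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios


def detect_onsets(ratios, threshold):
    """Indices where the ratio rises from below the threshold to at or above it."""
    return [
        i for i in range(1, len(ratios))
        if ratios[i] >= threshold > ratios[i - 1]
    ]


# Synthetic trace: low-amplitude background with one burst at samples 120-129.
trace = [0.1 if i % 2 == 0 else -0.1 for i in range(200)]
for i in range(120, 130):
    trace[i] = 1.0 if i % 2 == 0 else -1.0

ratios = sta_lta(trace, n_sta=5, n_lta=50)
onsets = detect_onsets(ratios, threshold=3.0)
print(onsets)  # [120]
```

Segments flagged this way are only candidate events. In the study they were then labelled manually and passed to the classifiers.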

Susceptibility Modeling and Potential Risk Analysis of Thermokarst Hazard in Qinghai-Tibet Plateau Permafrost Landscapes Using a New Interpretable Ensemble Learning Method

Yang, Yuting , Wang, Jizhou , Mao, Xi , Lu, Wenjuan , Wang, Rui , Zheng, Hao

Climate change is causing permafrost in the Qinghai-Tibet Plateau to degrade, triggering thermokarst hazards and impacting the environment. Despite their ecological importance, the distribution and risks of thermokarst lakes are not well understood due to complex influencing factors. In this study, we introduced a new interpretable ensemble learning method designed to improve the global and local interpretation of susceptibility assessments for thermokarst lakes. Our primary aim was to offer scientific support for precisely evaluating areas prone to thermokarst lake formation. In the thermokarst lake susceptibility assessment, we identified ten conditioning factors related to the formation and distribution of thermokarst lakes. In this highly accurate stacking model, the primary learning units were the random forest (RF), extremely randomized trees (EXTs), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost) algorithms. Meanwhile, gradient boosted decision trees (GBDTs) were employed as the secondary learning unit. Based on the stacking model, we assessed thermokarst lake susceptibility and validated accuracy through six evaluation indices. We examined the interpretability of the stacking model using three interpretation methods: accumulated local effects (ALE), local interpretable model-agnostic explanations (LIME), and Shapley additive explanations (SHAP). The results showed that the ensemble learning stacking model demonstrated superior performance and the highest prediction accuracy. Approximately 91.20% of the total thermokarst hazard points fell within the high and very high susceptible areas, encompassing 20.08% of the permafrost expanse in the QTP. The conclusive findings revealed that slope, elevation, the topographic wetness index (TWI), and precipitation were the primary factors influencing the assessment of thermokarst lake susceptibility. 
This comprehensive analysis extends to the broader impacts of thermokarst hazards, with the identified high and very high susceptibility zones affecting significant stretches of railway and highway infrastructure, substantial soil organic carbon reserves, and vast alpine grasslands. This interpretable ensemble learning model, which exhibits high accuracy, offers substantial practical significance for project route selection, construction, and operation in the QTP.

Journal article 2024-07-01 DOI: 10.3390/atmos15070788

Remote sensing image classification using an ensemble framework without multiple classifiers

Peng Dou , Chunlin Huang , Weixiao Han , Jinliang Hou , Ying Zhang , Juan Gu

Recently, ensembling multiple deep learning (DL) classifiers has been reported to be an effective method for improving remote sensing classification accuracy. Although these approaches still follow the conventional pattern of inputting instance features and outputting corresponding classes, they often overlook the intrinsic relationships between pixels beyond their spatial features. As a result, the diversity in the ensemble classification results primarily relies on different DL models. However, training the DL models consumes a significant amount of time, and training multiple networks not only incurs additional time costs but also affects the overall efficiency. To address this, a new approach has been proposed in this paper, which takes advantage of the relationships between pixels and their combinations to generate diverse classification results. It is a novel ensemble classification framework, termed the Doublet-Based Ensemble Classification Framework (DBECF), which eliminates the need for multiple classifiers. The DBECF starts by utilizing the training set to combine different samples to generate doublets. Then, features are assigned to these doublets through an exponentiation operation, resulting in a doublet training set. Using both the original training set and the derived doublet datasets, the DBECF is trained. For each input pixel, the DBECF produces multiple classification results, which are then integrated to obtain a more accurate output. To validate the proposed approach, experiments were conducted on three datasets, including multispectral images, hyperspectral images, and time series images. The maximum accuracies achieved by DBECF on the three datasets are 87.80 %, 97.71 %, and 83.51 %, respectively. In comparison to the contrastive methods, the incremental improvements in accuracy are 3.73 %, 7.66 %, and 9.16 %, respectively.
The experimental results indicate that, whether DL or non-deep-learning classifiers are used for training, the proposed framework improves accuracy over the comparative approaches that classify single instances. This research provides a new perspective on the combination of DL and ensemble learning, highlighting its important implications and practical value in enhancing classification accuracy and efficiency.

Journal article 2024-02-01 ISSN: 0924-2716

Recognition of thaw slumps based on machine learning and UAVs: A case study in the Qilian Mountains, northeastern Qinghai-Tibet Plateau

Lou, Peiqing , Wu, Tonghua , Chen, Jie , Fu, Bolin , Zhu, Xiaofan , Chen, Jianjun , Wu, Xiaodong , Yang, Sizhong , Li, Ren , Lin, Xingchen , Shang, Chengpeng , Wen, Amin , Wang, Dong , La, Yune , Ma, Xin

The thawing of permafrost on the Qinghai-Tibet Plateau (QTP) leads to more frequent occurrences of thaw slump (TS), which have significant impacts on local ecosystems, carbon cycles, and infrastructure development. Accurate recognition of TS would help in understanding its occurrence and evolution. Machine learning capabilities for TS recognition are still not fully exploited. We systematically evaluate the performance of machine learning models for TS recognition from unmanned aerial vehicle (UAV) and propose an ensemble learning object-based model for TS recognition (EOTSR). The EOTSR has the following advantages: 1) pioneering the introduction of spatial information to assist in recognition; 2) the misclassification of recognition models is improved by object-based technology; and 3) attempting to integrate the strengths of different machine learning models to obtain a recognition accuracy no less than that of commonly used deep learning models. The results show that object-based technology is more suitable for TS recognition than pixel-based technology. Recursive feature elimination (RFE)-based feature selection proves that texture and geometry are effective complements to TS recognition. Among the improved object-based machine learning models, support vector machine (SVM) has the highest recognition accuracy, with an overall accuracy of 93.06 %. McNemar's test proves that EOTSR significantly improves TS recognition compared to a single model and achieves an overall accuracy of 97.32 %. The EOTSR model provides an effective recognition method for the increasingly frequent TS events in the permafrost regions of the QTP, and can produce label data for deep learning models based on satellite imagery.

Journal article 2023-02-01 DOI: http://dx.doi.org/10.1016/j.jag.2022.103163 ISSN: 1569-8432
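
McNemar's test, which the abstract above uses to compare the ensemble with single models, looks only at the samples on which two classifiers disagree. The sketch below uses the common chi-square form with continuity correction. The disagreement counts are hypothetical, and the paper may have used a different variant of the test.

```python
import math


def mcnemar(b, c):
    """McNemar's chi-square test with continuity correction.
    b: samples model A classifies correctly and model B misclassifies.
    c: samples model B classifies correctly and model A misclassifies.
    Returns (statistic, p_value) for one degree of freedom."""
    if b + c == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical disagreement counts between a single model and an ensemble.
stat, p_value = mcnemar(b=5, c=25)
print(f"chi2 = {stat:.3f}, p = {p_value:.5f}")
```

A small p-value indicates that the two models' error patterns differ by more than chance. For small disagreement counts, an exact binomial version of the test is usually preferred.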

Soil moisture content retrieval from Landsat 8 data using ensemble learning

Zhang, Yufang , Liang, Shunlin , Zhu, Zhiliang , Ma, Han , He, Tao

Although detailed spatial and temporal distribution of soil moisture is crucial for numerous applications, current global soil moisture products generally have low spatial resolutions (25-50 km), which largely limit their application at local scales. In this study, we developed a high-resolution soil moisture retrieval framework based on ensemble learning by integrating Landsat 8 optical and thermal observations with multi-source datasets, including in-situ measurements from 1,154 stations in the International Soil Moisture Network, the Soil Moisture Active Passive (SMAP) soil moisture product, the ERA5-Land reanalysis dataset, and auxiliary datasets (terrain, soil texture, and precipitation). Two widely used ensemble learning models were explored and compared using ten-fold cross-validation. The extreme gradient boosting (XGBoost) model performed slightly better than the random forest (RF) model, with a root mean square error (RMSE) of 0.047 m³/m³ and a correlation coefficient (R) of 0.952. Further validation using data from four independent soil moisture networks demonstrated that the prediction accuracy of the XGBoost model was comparable to the SMAP soil moisture product, but with a much higher spatial resolution. The model was finally used to map soil moisture over the high-altitude Tibetan Plateau, which is especially sensitive to climate change, from May to September of 2015. The comparison between our fine-scale soil moisture map at 30 m resolution and the coarse-scale SMAP soil moisture product (36 km) revealed high spatial consistency. These results suggest that there is potential to generate accurate soil moisture products globally at 30 m spatial resolution from the long-term Landsat archive. This finding has practical implications in scenarios requiring fine-scale soil moisture maps, such as climate change and permafrost modeling, hydrological and land surface modeling, and agriculture monitoring.

Journal article 2022-03-01 DOI: 10.1016/j.isprsjprs.2022.01.005 ISSN: 0924-2716
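
The abstract above reports RMSE and the correlation coefficient R under ten-fold cross-validation. The sketch below shows how those two metrics and a simple fold split can be computed. The soil moisture values are hypothetical, and the contiguous, unshuffled split is an assumption; the paper does not describe its fold assignment here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the inputs (e.g. m³/m³)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def kfold_indices(n, k=10):
    """Split range(n) into k contiguous, near-equal test folds (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical observed and predicted soil moisture values (m³/m³).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
print(f"RMSE = {rmse(observed, predicted):.4f}, R = {pearson_r(observed, predicted):.3f}")

folds = kfold_indices(23, k=10)  # each fold is held out once for testing
```

In a full cross-validation loop, a model is trained on the samples outside each fold, predictions for the held-out fold are collected, and the metrics are computed over all held-out predictions.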
