Document Type: Complete scientific research article
Authors
1
PhD student, Department of Soil Science, Faculty of Agriculture, University of Zanjan, Iran
2
Associate Professor, Department of Soil Science, Faculty of Agriculture, University of Zanjan, Iran
3
Assistant Professor, Soil and Water Research Institute, Agricultural Research, Education and Extension Organization, Karaj, Iran.
Abstract
Background and objectives: Optimal soil management and sustainable agricultural development require accurate and reliable information on soil condition and classification, so accurately predicting soil classes and their spatial distribution is of great importance. Machine learning methods, and the cost-sensitive learning approach in particular, can improve the accuracy and efficiency of soil class prediction by accounting for the imbalanced distribution of soil classes, thereby providing valuable information for soil management and agriculture. To this end, this study was conducted in part of the southwestern lands of Zanjan province.
Materials and methods: A total of 148 soil profiles were excavated on a regular grid with an average spacing of 500 meters (up to 700 meters in some locations, based on expert recommendations), described, and, following laboratory analysis, classified to the family level. Candidate covariates were derived from geomorphological and geological maps, a digital elevation model (DEM), and Landsat 8 satellite images. Using principal component analysis (PCA) and expert knowledge, the following were selected as the most effective covariates for predicting soil classes and used as model inputs: the geomorphological map, geological information, analytical hill shading, sunrise, valley depth, LS factor, channel network distance, topographic wetness index, and multi-resolution ridge top flatness. The soil-landscape relationship was modeled with the random forest (RF) algorithm and an ensemble model (after data balancing) in “RStudio” software.
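The abstract does not state how misclassification costs were assigned in the cost-sensitive step. As a minimal sketch only, one common scheme weights each class inversely to its frequency, so that errors on rare classes cost more during training. The function name `balanced_class_weights` and the toy labels are assumptions for illustration and are not the study's data or code (the study was carried out in R).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return inverse-frequency weights: n_samples / (n_classes * class_count).

    Hypothetical helper for illustration; rarer classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels only (NOT the study's soil profiles): 8 majority, 2 minority.
toy_labels = ["majority"] * 8 + ["minority"] * 2
weights = balanced_class_weights(toy_labels)
print(weights)
```

Weights of this kind could then be supplied to a random forest implementation that accepts per-class weights, so that splits misclassifying minority profiles are penalized more heavily.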
Results: At the subgroup level, the soils of the region fell into five classes with an imbalanced distribution: Typic Calcixerepts, Typic Haploxerepts, Gypsic Haploxerepts, Typic Xerorthents, and Lithic Xerorthents. The overall accuracy and Kappa coefficient of the soil map produced by the random forest model were 65% and 0.32 before data balancing, and 86% and 0.77 after balancing the data with the cost-sensitive learning approach. The accuracy of soil class prediction at the subgroup level showed that, after balancing with the cost-sensitive learning approach, all soil classes were predicted with very high accuracy, especially the two minority classes, Gypsic Haploxerepts and Lithic Xerorthents, with user's accuracy of 100% and 100% and producer's accuracy of 91% and 85%, respectively. Sensitivity values of zero for both minority classes, Gypsic Haploxerepts and Lithic Xerorthents, show that no correct prediction was made for these two classes. Specificity values for Gypsic Haploxerepts and Lithic Xerorthents were 1 and 0.97, respectively, indicating that the model almost never assigned soils of other classes to these two classes. Balanced accuracy values of 0.50 and 0.49 for Gypsic Haploxerepts and Lithic Xerorthents showed that these minority classes were more difficult for the model to differentiate than the other classes, although the model predicted the classes relatively well overall.
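The sensitivity, specificity, and balanced accuracy figures reported above are related by simple arithmetic, since balanced accuracy is the mean of sensitivity and specificity. The sketch below recomputes the two reported balanced accuracy values from the reported sensitivity and specificity, and shows on a toy confusion matrix how a class that is never predicted ends up with sensitivity 0 and specificity 1. The function names and the toy matrix are assumptions for illustration, not the study's confusion matrix.

```python
def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy for one class: mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def sensitivity_specificity(confusion, k):
    """One-vs-rest sensitivity and specificity for class index k.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


# Values reported in the abstract for the two minority classes.
gypsic = balanced_accuracy(0.0, 1.0)    # 0.5, reported as 0.50
lithic = balanced_accuracy(0.0, 0.97)   # 0.485, reported as 0.49

# Toy 2-class matrix (NOT the study's data): the minority class (index 1)
# is never predicted, so all of its samples are false negatives.
toy_confusion = [[8, 0],
                 [2, 0]]
toy_sens, toy_spec = sensitivity_specificity(toy_confusion, 1)
print(gypsic, lithic, toy_sens, toy_spec)
```

A balanced accuracy near 0.5 with zero sensitivity therefore reflects a class the model does not detect at all, even when its specificity is close to 1.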
Conclusion: The results confirm that correcting imbalanced data with a cost-sensitive learning approach increases the accuracy of soil class prediction and of the resulting maps. In cost-sensitive learning, the model focuses on the classes with few observations (the minority classes), which reduces prediction error and increases model accuracy. The results showed that the random forest algorithm combined with the cost-sensitive learning approach can substantially improve the discrimination of soil classes, especially minority classes.
Keywords
Main Subjects