Evaluating the Extrapolation of a Random Forest Model to Predict Soil Classes at the Subgroup Level

Document Type : Complete scientific research article

Authors

1 Ph.D. Student, Department of Soil Science, University of Zanjan, and Instructor, Soil and Water Research Institute (SWRI), Agricultural Research, Education and Extension Organization, Ministry of Agriculture.

2 Associate Professor, University of Zanjan

3 Assistant Professor, Ardakan University

4 Department of Plant and Environmental Sciences, New Mexico State University, NM, USA

Abstract

Background and objectives: Many of the soil maps produced in Iran are medium-scale maps from soil survey projects carried out over the past six decades. In many cases these maps have not been updated because conventional soil survey is expensive. Digital soil mapping (DSM), which has recently been used extensively to produce soil maps in many countries, is a proposed way to overcome the limitations of conventional soil survey. Some soil surveyors have suggested extrapolation, in which the soil pattern rules of a reference area are used to predict soil classes in other areas, as a cost-effective method. To assess the advantages of extrapolation in DSM, this research evaluated a random forest model built in a reference area (donor area) for predicting soil taxonomic classes at the subgroup level at a site outside the reference area (recipient area).
Materials and methods: Two neighboring areas in Fars Province, southern Iran, were selected: 1) Saadat Shahr plain as the donor area and 2) Seidan plain as the recipient area. The two agricultural plains have moderately similar environmental conditions, including elevation, geology, physiography, climate, and agricultural practices. In the donor area, 82 soil profiles were excavated, described, and analyzed. Latin hypercube sampling (LHS) was used as the statistical sampling method in the donor area. In the recipient area, 27 locations were determined along parallel transects across the plain. All soils were classified according to the USDA Soil Taxonomy system (2014). Random forest (RF), run in the R statistical software, was used to predict soil classes in the donor area. The model constructed in the donor area was then saved and applied to the recipient area. Twenty-five variables related to soil-forming factors were used, consisting of 1) primary and secondary terrain attributes and 2) remote sensing indices derived from Landsat 8 OLI imagery. All auxiliary environmental covariate layers were resampled to a 30 m resolution. Producer's accuracy, user's accuracy, overall accuracy, and the kappa index were calculated from the agreement between field-surveyed and predicted soil classes.
Results: Of the 25 variables related to soil-forming factors, the RF algorithm selected five primary and secondary terrain attributes as influential covariates: slope, multiresolution index of valley bottom flatness (MRVBF), terrain ruggedness index, topographic wetness index, and modified catchment area. In the donor area, an overall accuracy of 72% and a kappa index of 0.59 indicated relatively good agreement between observed and predicted soil classes. To evaluate extrapolation, the result of the RF model trained on 70% of the soil samples in the donor area was compared with the output of the transferred RF model, which was checked against the 27 observations of the validation dataset. The overall accuracy of this external validation was 45% and the kappa index was 0.28. Transferring an RF model built with all soil samples of the donor area (100%) gave better predictions in the recipient area, with an overall accuracy of 52% and a kappa index of 0.38 in the external validation. Of the six soil subgroup classes, Typic Calcixerepts and Typic Xerorthents were predicted best. Some classes were too sparsely represented for the model to predict them correctly.
Conclusion: The results showed that model extrapolation within the DSM framework could be a powerful tool for producing soil maps in parts of Iran where soil maps are not available or where updating existing maps is costly and time-consuming. The low-cost, time-saving method reported here encourages soil surveyors to consider model extrapolation in their survey activities.
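
The agreement measures named in the abstract (overall accuracy, producer's accuracy, user's accuracy, and the kappa index) can all be derived from paired observed and predicted class labels. The following is a minimal Python sketch of that calculation, not the authors' R workflow. The function name `accuracy_metrics` and the class labels "A", "B", "C" with their counts are hypothetical illustrations and are not data from this study.

```python
from collections import Counter


def accuracy_metrics(observed, predicted):
    """Return overall accuracy, kappa, and per-class producer's/user's accuracy.

    Hypothetical helper for illustration; `observed` and `predicted` are
    equal-length sequences of class labels.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")

    n = len(observed)
    classes = sorted(set(observed) | set(predicted))
    pairs = Counter(zip(observed, predicted))  # confusion-matrix cells
    obs_total = Counter(observed)              # totals per observed class
    pred_total = Counter(predicted)            # totals per predicted class

    # Overall accuracy: share of sites where prediction matches observation.
    correct = sum(pairs[(c, c)] for c in classes)
    overall = correct / n

    # Kappa: agreement corrected for the agreement expected by chance.
    expected = sum(obs_total[c] * pred_total[c] for c in classes) / (n * n)
    kappa = (overall - expected) / (1 - expected) if expected < 1 else float("nan")

    # Producer's accuracy: correct / observed total (None if class never observed).
    producers = {c: pairs[(c, c)] / obs_total[c] if obs_total[c] else None
                 for c in classes}
    # User's accuracy: correct / predicted total (None if class never predicted).
    users = {c: pairs[(c, c)] / pred_total[c] if pred_total[c] else None
             for c in classes}
    return overall, kappa, producers, users


# Made-up labels for three placeholder classes (not study data).
observed = ["A"] * 4 + ["B"] * 4 + ["C"] * 2
predicted = ["A", "A", "A", "B", "B", "B", "B", "A", "C", "B"]

overall, kappa, producers, users = accuracy_metrics(observed, predicted)
print(f"overall accuracy = {overall:.2f}, kappa = {kappa:.2f}")
print("producer's accuracy:", producers)
print("user's accuracy:", users)
```

In an extrapolation setting such as the one described here, the same function would be applied twice: once to held-out sites in the donor area and once to the independently surveyed sites in the recipient area, so that the two sets of scores can be compared.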

Keywords


1.Abbaszadeh Afshar, F., Ayoubi, S., and Jafari, A. 2018. The extrapolation of soil great groups using multinomial logistic regression at regional scale in arid regions of Iran. Geoderma. 315: 1. 36-48.
2.Breiman, L. 2001. Random forests. Machine Learning. 45: 1. 5-32.
3.Brungard, C.W., Boettinger, J.L., Duniway, M.C., Wills, S.A., and Edwards, T.C. 2015. Machine learning for predicting soil classes in three semi-arid landscapes. Geoderma. 239-240: 1. 68-83.
4.Bui, E.N. 2004. Soil survey as a knowledge system. Geoderma. 120: 1-2. 17-26.
5.Carré, F., and Girard, M.C. 2002. Quantitative mapping of soil types based on regression kriging of taxonomic distances with landform and land cover attributes. Geoderma. 110: 3-4. 241-263.
6.Coll, C., Galve, J.M., Sanchez, J.M., and Caselles, V. 2010. Validation of Landsat-7/ETM+ thermal-band calibration and atmospheric correction with ground-based measurements. IEEE Transactions on Geoscience and Remote Sensing. 48: 1. 547-555.
7.Congalton, R. 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment. 37: 1. 35-46.
8.Debella-Gilo, M., and Etzelmüller, B. 2009. Spatial prediction of soil classes using digital terrain analysis and multinomial logistic regression modeling integrated in GIS. Examples from Vestfold County, Norway. Catena. 77: 1. 8-18.
9.Gallant, J.C., and Austin, J.M. 2015. Derivation of terrain covariates for digital soil mapping in Australia. Soil Research. 53: 1. 895-90.
10.Grimm, R., Behrens, T., Marker, M., and Elsenbeer, H. 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island-Digital soil mapping using random forests analysis. Geoderma. 146: 1-2. 102-113.
11.Grinand, C., Arrouays, D., Laroche, B., and Martin, M.P. 2008. Extrapolating regional soil-landscapes from an existing soil map: sampling intensity, validation procedures, and integration of spatial context. Geoderma. 143: 1-2. 180-190.
12.Guo, P.T., Li, M.F., Luo, W., Tang, Q.F., Liu, Z.W., and Lin, Z.M. 2015. Digital mapping of soil organic matter for rubber plantation at regional scale: an application of Random Forest plus residual kriging approach. Geoderma. 237-238: 1. 49-59.
13.Heung, B., Bulmer, C.E., and Schmidt, M.G. 2014. Predictive soil parent material mapping at a regional-scale: A random forest approach. Geoderma. 214-215: 1. 141-154.
14.Ho, H.C., Knudby, A., Sirovyak, P., Xu, Y., Hodul, M., and Henderson, S.B. 2014. Mapping maximum urban air temperature on hot summer days. Remote Sensing of Environment. 154: 1. 38-45.
15.Jenny, H. 1941. Factors of Soil Formation, a System of Quantitative Pedology. McGraw-Hill, New York, 281p.
16.Lagacherie, P. 2002. Cartographie de la diversité des sols viticoles de versant par imagerie à haute résolution: contribution à la connaissance des terroirs, Montpellier, France.
17.Lagacherie, P., Legros, J.P., and Burrough, P.A. 1995. A soil survey procedure using the knowledge on soil pattern of a previously mapped reference area. Geoderma. 65: 3-4. 283-301.
18.Mahler, P.J. 1970. Manual of Multipurpose Land Classification. Report no. 212. Soil and Water Research Institute, Tehran. Iran.
19.Mallavan, B.P., Minasny, B., and McBratney, A.B. 2010. Homosoil: a methodology for quantitative extrapolation of soil information across the globe. P 137-149. In: J.L. Boettinger (ed.) Digital Soil Mapping: Bridging Research, Environmental Application, and Operation. Springer, London.
20.Malone, B.P., Sanjeev, K.J., Minasny, B., and McBratney, A.B. 2016. Comparing regression-based digital soil mapping and multiple-point geostatistics for the spatial extrapolation of soil data. Geoderma. 262: 1. 243-253.
21.McBratney, A.B., Mendonça Santos, M.L., and Minasny, B. 2003. On digital soil mapping. Geoderma. 117: 1-2. 3-52.
22.Mehnatkesh, A., Ayoubi, S., Jalalian, A., and Sahrawat, K.L. 2013. Relationships between soil depth and terrain attributes in a semi-arid hilly region in western Iran. J. Moun. Sci. 10: 1. 163-172.
23.Minasny, B., and McBratney, A.B. 2006. A conditioned Latin hypercube method for sampling in the presence of ancillary information. Computers and Geosciences. 32: 9. 1378-1388.
24.Minasny, B., and McBratney, A.B. 2007. Spatial prediction of soil properties using EBLUP with the Matern covariance function. Geoderma. 140: 1. 324-336.
25.Moore, I.D., Gessler, P., Nielsen, G., and Peterson, G. 1993. Soil attribute prediction using terrain analysis. Soil Sci. Soc. Amer. J. 57: 2. 443-452.
26.Pahlavan Rad, M.R., Toomanian, N., Khormali, F., Brungard, C.W., Komaki, C.B., and Bogaert, P. 2014. Updating soil survey maps using random forest and conditioned Latin hypercube sampling in the loess derived soils of northern Iran. Geoderma. 232-234: 1. 97-106.
27.RStudio. 2015. RStudio: Integrated Development Environment for R, Boston, MA. Available at http://www.r-studio.com. (Visited 20 November 2018).
28.SAGA Development Team. 2011. System for Automated Geoscientific Analyses (SAGA). Available at http://saga-gis.org/en/index.html (visited 12 August 2012).
29.Schoeneberger, P.J., Wysocki, D.A., Benham, E.C., and Broderson, W.D. 2012. Field book for describing and sampling soils, version 3.0. USDA Natural Resources Conservation Service, National Soil Survey Center, Lincoln, NE.
30.Sim, J., and Wright, C.C. 2005. The kappa statistic in reliability studies: use, interpretation and sample size requirements. Physical Therapy. 85: 3. 257-268.
31.Soil and Water Research Institute. 1999. Semi-detailed soil survey of Saadat Shahr, Sivand, Seydan and Arsenjan. Soil and Water Research Institute of Iran, Ministry of Agriculture, Tehran, Iran. (In Persian)
32.Soil Survey Staff. 2014. Keys to soil taxonomy, 12th edition. USDA Natural Resources Conservation Service.
33.Taghizadeh-Mehrjardi, R., Nabiollahi, K., Minasny, B., and Triantafilis, J. 2015. Comparing data mining classifiers to predict spatial distribution of USDA-family soil groups in Baneh region, Iran. Geoderma. 253-254: 1. 67-77.
34.Thompson, J.A., Pena-Yewtukhiw, E.M., and Grove, J.H. 2006. Soil-landscape modeling across a physiographic region: topographic patterns and model transportability. Geoderma. 133: 1-2. 57-70.
35.United States Department of Agriculture. Soil Conservation Service. 1993. Soil survey manual. Soil Survey Div. Staff. US Department of Agriculture. Handbook 18. Washington, DC.
36.Zhu, A.X., Hudson, B., Burt, J., Lubich, K., and Simonson, D. 2001. Soil mapping using GIS, expert knowledge, and fuzzy logic. Soil Sci. Soc. Amer. J. 65: 5. 1463-1472.
37.Zhu, A.X., Liu, J., Du, F., Zhang, S.J., Qin, C.Z., Burt, J., Behrens, T., and Scholten, T. 2015. Predictive soil mapping with limited sample data. Europ. J. Soil Sci. 66: 1. 535-547.