Prediction of Key Environmental Factors from Soil Properties Based on Artificial Neural Network and Random Forest Models
Abstract: Soil is closely related to the environment in which it forms. How to infer environmental information accurately from soil properties is an important research question in soil forensics. The study area covers four provinces and two municipalities in eastern China (Beijing, Tianjin, Hebei, Shandong, Anhui and Jiangsu). Features were built from the physicochemical properties and spectral data of 746 topsoil samples, and two machine learning models, an artificial neural network and a random forest, were used to predict four key environmental factors: elevation, mean annual temperature, mean annual precipitation and land surface temperature. Prediction accuracy was evaluated with the root mean square error (RMSE), the coefficient of determination (R²) and the concordance correlation coefficient (CCC). The R² of the two models for the four target variables ranged from 0.39 to 0.61. Compared with the neural network, the random forest explained more of the spatial variation in every variable: 9.9 percentage points more for elevation, 16.5 for mean annual temperature, 10.3 for mean annual precipitation and 10.9 for land surface temperature. Elevation and precipitation were predicted better than the other two factors. These results suggest that machine learning can effectively infer soil-forming environmental conditions from soil properties, which provides a technical reference for narrowing down the source area of unknown soil samples in forensic soil evidence work.
Key words:
- Soil forensics
- Neural network
- Random forest
- Environmental factor
- Soil attribute
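The abstract names three accuracy measures (RMSE, R² and CCC) without giving their formulas. The sketch below shows one common way to compute them. It assumes R² is defined as 1 − SS_res/SS_tot and that CCC is Lin's concordance correlation coefficient with population (divide-by-n) moments; the paper may have used other variants. The function names and the sample values are made up for illustration.

```python
import math
from statistics import fmean


def rmse(observed, predicted):
    """Root mean square error, in the units of the target variable."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    mean_obs, mean_pred = fmean(observed), fmean(predicted)
    var_obs = fmean((o - mean_obs) ** 2 for o in observed)
    var_pred = fmean((p - mean_pred) ** 2 for p in predicted)
    cov = fmean(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    # The squared mean difference penalises systematic bias, which R² alone
    # (when taken as a squared correlation) would not detect.
    return 2.0 * cov / (var_obs + var_pred + (mean_obs - mean_pred) ** 2)


if __name__ == "__main__":
    # Hypothetical elevations in metres, not data from the study.
    observed = [120.0, 340.0, 80.0, 510.0]
    predicted = [150.0, 300.0, 100.0, 470.0]
    print("RMSE:", rmse(observed, predicted))
    print("R2:  ", r_squared(observed, predicted))
    print("CCC: ", ccc(observed, predicted))
```

RMSE keeps the units of the target, so it is only comparable within one environmental variable, whereas R² and CCC are unitless and can be compared across variables.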
Table 1. BPNN model parameters for each environmental variable

| Target variable | Cycle times | Number of hidden-layer neurons | Training method | Training times |
|---|---|---|---|---|
| Elevation | 50 | 24 | ADAPTgd | 10000 |
| Mean annual temperature | 100 | 28 | ADAPTgd | 1000 |
| Mean annual precipitation | 50 | 32 | ADAPTgd | 100 |
| Land surface temperature | 100 | 45 | ADAPTgd | 1000 |

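Table 2. RF model parameters for each environmental variable

| Target variable | Mtry (variables tried per node) | Ntree (number of decision trees) |
|---|---|---|
| Elevation | 3 | 1000 |
| Mean annual temperature | 3 | 500 |
| Mean annual precipitation | 3 | 1000 |
| Land surface temperature | 3 | 800 |
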
Table 3. Descriptive statistics of environmental variables

| Environmental variable | Minimum | Maximum | Mean | Standard deviation | Coefficient of variation (%) | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| Elevation (m) | −1 | 2044 | 215.19 | 407.44 | 189.34 | 2.46 | 5.29 |
| Mean annual temperature (℃) | 0 | 17.8 | 13.0 | 3.68 | 28.36 | −1.56 | 2.29 |
| Mean annual precipitation (mm) | 289.79 | 2286.48 | 800.70 | 333.02 | 41.59 | 0.84 | 0.30 |
| Land surface temperature (℃) | 29.5 | 31.1 | 30.49 | 0.31 | 1.00 | −0.56 | 0.14 |

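Table 4. Comparison of the predictive accuracy of the BPNN and RF models for each environmental variable

| Environmental variable | Model | R² | RMSE | CCC |
|---|---|---|---|---|
| Elevation (m) | BPNN | 0.510 | 255.6 | 0.698 |
| Elevation (m) | RF | 0.609 | 228.3 | 0.750 |
| Mean annual temperature (℃) | BPNN | 0.394 | 2.582 | 0.600 |
| Mean annual temperature (℃) | RF | 0.559 | 2.205 | 0.707 |
| Mean annual precipitation (mm) | BPNN | 0.510 | 240.7 | 0.668 |
| Mean annual precipitation (mm) | RF | 0.613 | 214.0 | 0.729 |
| Land surface temperature (℃) | BPNN | 0.404 | 0.246 | 0.558 |
| Land surface temperature (℃) | RF | 0.513 | 0.222 | 0.633 |
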
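The improvement figures quoted in the abstract (9.9, 16.5, 10.3 and 10.9) can be recovered from Table 4 as the difference in R² between RF and BPNN, expressed in percentage points. The short sketch below redoes that arithmetic and also derives the relative RMSE reduction, which the source does not report and is included only as an illustration. The dictionary layout and function names are assumptions made for this example.

```python
# R² and RMSE copied from Table 4: variable -> {model: (R², RMSE)}
TABLE_4 = {
    "Elevation": {"BPNN": (0.510, 255.6), "RF": (0.609, 228.3)},
    "Mean annual temperature": {"BPNN": (0.394, 2.582), "RF": (0.559, 2.205)},
    "Mean annual precipitation": {"BPNN": (0.510, 240.7), "RF": (0.613, 214.0)},
    "Land surface temperature": {"BPNN": (0.404, 0.246), "RF": (0.513, 0.222)},
}


def r2_gain_points(bpnn_r2, rf_r2):
    """Extra variation explained by RF, in percentage points of R²."""
    return round((rf_r2 - bpnn_r2) * 100, 1)


def rmse_reduction_percent(bpnn_rmse, rf_rmse):
    """Relative drop in RMSE when moving from BPNN to RF, in percent."""
    return round((bpnn_rmse - rf_rmse) / bpnn_rmse * 100, 1)


# Gains per variable, in the row order of Table 4.
r2_gains = {
    name: r2_gain_points(models["BPNN"][0], models["RF"][0])
    for name, models in TABLE_4.items()
}
rmse_drops = {
    name: rmse_reduction_percent(models["BPNN"][1], models["RF"][1])
    for name, models in TABLE_4.items()
}

if __name__ == "__main__":
    for name in TABLE_4:
        print(f"{name}: R2 +{r2_gains[name]} points, RMSE -{rmse_drops[name]}%")
```

Reading the gains as percentage points rather than relative percentages matters: for mean annual temperature, for example, the relative increase in R² from 0.394 to 0.559 would be far larger than 16.5%.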