Australian Soil Resources Information System (ASRIS)
2001 ASRIS
- Collated CSIRO and State soils data into a single nation-wide Oracle database
- Collated digital soil, land systems, and lithology maps
- Produced models to predict the spatial distribution of soil properties (pH, %SOC, total N and P, texture class, % clay) from point data for 2 layers
- Binary decision tree (Cubist and C5.0) modelling for most properties, regression for % N
- Mapped the model rules
What does this illustrate?
- Inductive reasoning approach
- Legacy data: positional accuracy, methods of laboratory analysis
- How many points do you need to build a model?
- How can you evaluate model results?
Total 135 000 points
Topsoil pH: 24 319 points
Henderson, B.L., Bui, E.N., Moran, C.J., Simon, D.A.P., 2005. Australia-wide predictions of soil properties using decision trees. Geoderma 124: 383-398.
pH data

No. of horizons with laboratory measurement | Method code | Method description
72152 | 4A1 | pH of 1:5 soil/water suspension
231 | 4A_C_1 | pH of soil - pH of 1:1 soil/water suspension
3664 | 4A_C_2.5 | pH of soil - pH of 1:2.5 soil/water suspension
38268 | 4B1 | pH of 1:5 soil/0.01M calcium chloride extract - direct
3747 | 4B2 | pH of 1:5 soil/0.01M calcium chloride extract - following Method 4A1
505 | 4B_C_2.5 | pH of soil - pH of 1:2.5 soil/0.1M CaCl2 suspension
892 | 4C1 | pH of 1:5 soil/1M potassium chloride extract - direct
231 | 4C_C_1 | pH of 1:1 soil/1M potassium chloride suspension
284 | 4E1 | pH of hydrogen peroxide extract
237 | 4G1 | Total potential acidity
31599 | 4_NR | pH of soil - Not recorded
Conversion
[Figure: pH CaCl2 plotted against pH water. Legend: additive model; cubic; Ahern et al. (1995): cubic; Little (1992): cubic; Ahern et al. (1995): linear]
Topsoil SOC: 11 483 points
SOC: pooled 11 483 analyses by Walkley-Black (6A1), Heanes wet oxidation (6B1), and combustion (6B2, 6B3)
Environmental Modelling
Cubist used to build binary decision trees from environmental predictors:
- 19 climate surfaces (MAT, mean diurnal range, isothermality, temperature seasonality, max T of warmest month, min T of coolest month, annual T range, MAP, precipitation of wettest and driest months, precipitation seasonality, annual mean radiation, highest and lowest monthly radiation, radiation seasonality, AMMI, highest and lowest month moisture index, and moisture index seasonality)
- Landsat MSS (4 bands)
- Lithology, land use, ASC (all categorical)
- 9 DEM and terrain attributes (e.g., slope, distances to ridges/rivers, relief)
Climatic variables
Example
Rule 1:
  If   ammi <= 5542
       mmi seasonality > 38200
       DEM <= 226
       Lith in {6, 14, 19}
  Then %SOC = 5.7 + 0.006 clim1 + 0.005 clim12 - 0.001 clim28 - 0.005 clim5 - 0.007 clim22
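A Cubist rule pairs a set of conditions with a linear model that applies only where every condition holds. The Python sketch below evaluates Rule 1 for a single site. The dictionary keys (`ammi`, `mmi_seasonality`, `dem`, `lith`, `clim1`, ...) are hypothetical names chosen for illustration, the negative signs on the last three coefficients are an assumption, and the input values are made up. It is not the ASRIS implementation.

```python
# Linear model of Rule 1. The signs of the last three coefficients are an
# assumption; the intercept and magnitudes are taken from the slide.
RULE1_INTERCEPT = 5.7
RULE1_COEFFS = {
    "clim1": 0.006,
    "clim12": 0.005,
    "clim28": -0.001,
    "clim5": -0.005,
    "clim22": -0.007,
}
RULE1_LITH_CLASSES = {6, 14, 19}


def rule1_applies(site):
    """Return True when the site satisfies every condition of Rule 1."""
    return (
        site["ammi"] <= 5542
        and site["mmi_seasonality"] > 38200
        and site["dem"] <= 226
        and site["lith"] in RULE1_LITH_CLASSES
    )


def rule1_predict_soc(site):
    """Return the predicted %SOC, or None if Rule 1 does not cover the site."""
    if not rule1_applies(site):
        return None
    return RULE1_INTERCEPT + sum(
        coeff * site[name] for name, coeff in RULE1_COEFFS.items()
    )


# Made-up predictor values, used only to exercise the rule.
example_site = {
    "ammi": 5000, "mmi_seasonality": 40000, "dem": 150, "lith": 14,
    "clim1": 100, "clim12": 200, "clim28": 300, "clim5": 100, "clim22": 100,
}
soc = rule1_predict_soc(example_site)
```

A site that fails any condition gets no prediction from this rule and would have to be covered by another rule in the model.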
Topsoil organic C
% total N in A-horizon
ln(%TN) = -2.6589 + 0.8761 ln(%SOC)
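Back-transforming the regression gives %TN as a power function of %SOC, and dividing %SOC by the predicted %TN gives the C:N ratio that the regression implies. A minimal Python sketch follows. The function names are illustrative, and no correction for back-transformation bias is applied, since the slides do not say whether one was used.

```python
import math

INTERCEPT = -2.6589  # from the slide
SLOPE = 0.8761       # from the slide


def predict_total_n(soc_percent):
    """Back-transform ln(%TN) = INTERCEPT + SLOPE * ln(%SOC) to %TN."""
    if soc_percent <= 0:
        raise ValueError("%SOC must be positive to take its logarithm")
    return math.exp(INTERCEPT + SLOPE * math.log(soc_percent))


def implied_c_to_n(soc_percent):
    """C:N ratio implied by the regression at a given %SOC."""
    return soc_percent / predict_total_n(soc_percent)


for soc in (0.5, 1.0, 2.0, 5.0):
    print(
        f"%SOC={soc:4.1f}  %TN={predict_total_n(soc):.3f}  "
        f"C:N={implied_c_to_n(soc):.1f}"
    )
```

At 1% SOC the implied C:N is about 14, and because the slope is below 1 it rises only slowly with %SOC. A sample whose measured C:N is well above the implied value will therefore have its %N overestimated by this regression.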
Topsoil pH
[Map legend classes: < 3.5, 3.5-4.0, 4.0-4.5, 4.5-5.0, 5.0-5.5, 5.5-6.0, 6.0-6.5, 6.5-7.0, 7.0-7.5, 7.5-8.0, > 8.0]
Model evaluation statistics
Performance on test data set (30% withheld)

Property | Model | Unit | N (total) | R2 | RMSE | Ave. err | Rel. err | Corr. | Rules
pH (topsoil) | Cubist | - | 24319 | 0.67 | 0.77 | 0.56 | 0.51 | 0.82 | 27
pH (subsoil) | Cubist | - | 12193 | 0.54 | 0.96 | 0.72 | 0.59 | 0.74 | 27
SOC (topsoil) | Cubist | log | 11483 | 0.41 | 0.57 | 0.40 | 0.68 | 0.64 | 29
SOC (subsoil) | Cubist | log | 5100 | 0.24 | 0.77 | 0.59 | 0.84 | 0.50 | 19
total N (topsoil) | regression | log | 4746 | 0.75 | 0.42 | - | - | - | -
total P (topsoil) | Cubist | log | 7377 | 0.62 | 0.92 | 0.68 | 0.54 | 0.79 | 18
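The statistics in this table can be computed from the observed and predicted values of the withheld test set. The Python sketch below makes two assumptions: R2 is the squared correlation (which agrees with the Corr. and R2 columns), and relative error is the average absolute error divided by the average absolute error of always predicting the mean, which is how Cubist is understood to report it. The function name is illustrative.

```python
import math


def evaluation_stats(observed, predicted):
    """Return correlation, R2, RMSE, average error and relative error."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    pairs = list(zip(observed, predicted))

    # Pearson correlation between observed and predicted values.
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in pairs)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_obs * var_pred)

    rmse = math.sqrt(sum((p - o) ** 2 for o, p in pairs) / n)
    ave_err = sum(abs(p - o) for o, p in pairs) / n

    # Baseline (assumed definition): mean absolute error obtained by
    # always predicting the mean of the observed values.
    baseline = sum(abs(o - mean_obs) for o in observed) / n

    return {
        "corr": corr,
        "r2": corr ** 2,  # assumed: R2 reported as squared correlation
        "rmse": rmse,
        "ave_err": ave_err,
        "rel_err": ave_err / baseline,
    }


stats = evaluation_stats([4.0, 5.0, 6.0, 7.0, 8.0], [4.5, 5.0, 6.5, 6.5, 8.0])
```

A relative error below 1 means the model does better than predicting the mean everywhere; values near 1, as for subsoil SOC, indicate little improvement over that baseline.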
pH data by State
[Figure: histograms of pH in layer 1 (count) by data source: NSW, QLD, SA, TAS, VIC, WA, CSIRO]
Topsoil pH model fit
[Figure: observed against predicted pH in layer 1, by data source: NSW, QLD, SA, TAS, VIC, WA, CSIRO]
Topsoil pH model fit (overall)
Spatial structure of residuals
[Figure: semivariograms of model residuals (gamma against distance in degrees, 0 to 0.15) by data source: NSW, QLD, SA, TAS, VIC, WA, CSIRO]
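Plots of gamma against distance like these are empirical semivariograms of the model residuals. The Python sketch below shows one common way to compute them, the Matheron estimator with fixed-width distance bins. It assumes coordinates in decimal degrees and plain Euclidean distance, to match the axis label; the function name and binning choices are illustrative, and the source does not state how the original variograms were computed.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, residuals, bin_width, max_distance):
    """Return {bin_centre: semivariance} for point pairs closer than max_distance.

    For each distance bin, gamma = sum((r_i - r_j)^2) / (2 * number_of_pairs).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(coords[i], coords[j])
            if dist >= max_distance:
                continue
            bin_index = int(dist // bin_width)
            sums[bin_index] += (residuals[i] - residuals[j]) ** 2
            counts[bin_index] += 1
    return {
        (b + 0.5) * bin_width: sums[b] / (2 * counts[b])
        for b in sorted(counts)
    }


# Made-up coordinates (degrees) and residuals, only to exercise the function.
gamma = empirical_semivariogram(
    coords=[(0.0, 0.0), (0.01, 0.0), (0.08, 0.0)],
    residuals=[0.0, 1.0, 3.0],
    bin_width=0.05,
    max_distance=0.15,
)
```

A variogram that is flat from the shortest lag suggests the model has captured most of the spatial structure; one that rises with distance indicates spatial autocorrelation left in the residuals.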
Wynn et al. (2006) sites
Auxiliary material from Wynn et al. 2006. Global Biogeochemical Cycles, vol. 20, GB1007
[Figure: ASRIS predicted topsoil pH in CaCl2 plotted against pH est. in CaCl2 (0-30 cm); series: near trees, in grass]
[Figure: %SOC (predicted) plotted against %SOC (meas. by Wynn et al. 2006); series: SOC_30_T, SOC_30_G]
R2 between ASRIS predicted and %SOC_30_T = 0.85
R2 between ASRIS predicted and %SOC_30_G = 0.78
[Figure: % N (predicted) plotted against % N meas. by Wynn et al. (2006); series: %N_30_T, %N_30_G]
R2 between ASRIS predicted and %N_30_T = 0.64
R2 between ASRIS predicted and %N_30_G = 0.67
ASRIS overestimates %N
[Figure: abs. error in N (= predicted - measured) plotted against C:N (meas. by Wynn et al. 2006); series: %N_30_T, %N_30_G, each with an exponential trend line. Fitted trend lines: y = 0.0191 e^(0.0543x) and y = 0.0261 e^(0.0388x)]
On a linear scale
[Figure: % SOC plotted against % N, with reference lines C:N ~ 25 and C:N ~ 12]
Summary
- Testing with a withheld data subset is good for developing a parsimonious model but is inadequate for model evaluation
- Independent ground-truth is necessary for accuracy assessment
- The Cubist SOC model is better than it appears from statistical model testing because the input data contain much noise
- The Cubist algorithm is able to identify valid structure in the data (and generate knowledge)
Vis-NIR analysis to augment the database
Raphael Viscarra Rossel, CSIRO Land and Water
The Australian vis-NIR library
- National soil archive: 16,000 samples, some analytical data
- WA Agriculture: 232; Qld PI & Fisheries: 1,578; Other: 737; largely with analytical data
- NGSA: 2,244 samples (0-20 and 50-80 cm), no analytical data
Incomplete analytical data
Using the spectra for samples with analytical data, we can populate our databases with new spectroscopic estimates.
N = 22000

Attribute | TOC | pHw | Clay | CEC | BD | Minerals
n | 8479 | 16,570 | 13,499 | 3530 | 1232 | 8
Spectroscopic modelling by DWT-ANN
[Diagram: vis-NIR spectra X (n x k) -> DWT (Daubechies 4) -> wavelet coefficients -> select coefficients -> model y from the selected coefficients with ANN]

Modelling statistics for cross validation and independent test set validation

Soil attribute | n train / n test | R2 xval | RMSE xval | R2 test | RMSE test | nRMSE test
TOC | 5619 / 2809 | 0.78 | 0.26 | 0.76 | 0.28 | 0.08
pHw | 11045 / 5523 | 0.79 | 0.53 | 0.77 | 0.56 | 0.09
Clay | 9000 / 4499 | 0.82 | 6.3 | 0.82 | 7.1 | 0.08
CEC | 2335 / 1166 | 0.85 | 0.21 | 0.83 | 0.22 | 0.08
BD | 823 / 409 | 0.69 | 0.18 | 0.67 | 0.21 | 0.11

$$\mathrm{nRMSE} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}}{y_{\max} - y_{\min}}$$

Viscarra Rossel & Lark 2009 EJSS
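The normalised RMSE divides the RMSE by the range of the attribute, which lets attributes on different scales (for example clay in % and BD) be compared. A minimal Python sketch, assuming y max and y min are taken from the observed values of the set being evaluated; the function name is illustrative.

```python
import math


def nrmse(observed, predicted):
    """RMSE divided by the range (max - min) of the observed values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    rmse = math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n
    )
    return rmse / value_range


# Made-up values: RMSE is 1.0 and the observed range is 10.0.
example = nrmse([0.0, 10.0], [1.0, 9.0])
```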
For example, the 0-20 cm data
Total: 4606 surface (0-20 cm) samples
- CSIRO's National Soil Archive (NSA)
- National Geochemical Survey of Australia (NGSA)
- WA Agriculture
- Queensland Primary Industries & Fisheries (QPIF)
- Other sources
Soil colour maps, 0-20 cm
Geostatistical simulations of R, G, B
(i) RGB composite
Distribution of iron oxides, 0-20 cm
Indicator simulations of the normalised iron oxide difference index (NIODI)

$$I(x; z_k) = \begin{cases} 1, & \text{if } z(x) < z_k \\ 0, & \text{otherwise} \end{cases}$$

$$\mathrm{NIODI} = \frac{D_{920} - D_{880}}{D_{880} + D_{920}}$$

Viscarra Rossel et al. (201x) submitted
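The index and the indicator transform used before indicator simulation can be sketched in a few lines of Python. The sketch assumes the numerator of the index is D920 minus D880, and that D880 and D920 are the spectral quantities at 880 and 920 used on the slide (their exact definition is not given). Function names and the example thresholds are illustrative.

```python
def niodi(d880, d920):
    """Normalised iron oxide difference index: (D920 - D880) / (D880 + D920)."""
    total = d880 + d920
    if total == 0:
        raise ValueError("D880 + D920 must be non-zero")
    return (d920 - d880) / total


def indicator(value, threshold):
    """Indicator transform: 1 if the value is below the threshold z_k, else 0."""
    return 1 if value < threshold else 0


def indicator_vector(value, thresholds):
    """Indicator codes of one value for a list of thresholds z_1 ... z_K."""
    return [indicator(value, z_k) for z_k in thresholds]


# Made-up inputs, only to exercise the functions.
index_value = niodi(d880=0.2, d920=0.3)
codes = indicator_vector(0.15, [0.1, 0.2, 0.3])
```

Each threshold turns the continuous index into a 0/1 variable, and it is these binary variables that are simulated.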
Clay and pH, 0-20 cm
Total organic carbon, 0-20 cm
Conclusions
- It is possible to build models that assign soil properties from point data and environmental variables using hybrid regression trees or ANN, or with geostatistics
- Analysis of vis-NIR spectra generates useful data to augment a database of laboratory measurements
- Preparing the database is what takes the most time!