vignettes/examples_1_mainFunctions_AB.Rmd
examples_1_mainFunctions_AB.RmdHere are presented, in a full and complete example, all main
functions (starting with BIOMOD_[...]) of
biomod2.
The data set used is the DataSTOC containing abundance
data.
Similar examples are presented for binary data on the Main functions (bin)
webpage.
library(biomod2)
library(terra)
# Load species abundances (20 species available)
data('DataSTOC')
head(DataSTOC)
# Divide into calibration/validation and evaluation
period1 <- which(DataSTOC[, 'period'] == '2006-2011')
period2 <- which(DataSTOC[, 'period'] == '2012-2017')
# Select the name of the studied species
myRespName <- 'Periparus.ater'
# Get corresponding count data
myResp <- as.numeric(DataSTOC[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSTOC[, c('X_WGS84', 'Y_WGS84')]
# Get corresponding environmental variables
myExpl <- DataSTOC[, c('temp', 'precip', 'sdiv_hab',
'cover_agri', 'cover_water', 'cover_wet')]Main difference when providing abundance data instead of binary is to
define the data.type parameter.
The distribution used
in the selected algorithms afterwards will be set accordingly, for
example :
| Type | Data | Distribution |
|---|---|---|
binary |
Only 1, and 0 or NA | binomial |
count |
Positive integer values | poisson |
multiclass |
Categorical classes | classification |
ordinal |
Ordered categorical classes | classification |
relative |
Continuous values between 0 and 1 | beta |
abundance |
Positive continuous values | gaussian |
# Format data with count data type
myBiomodData <- BIOMOD_FormatingData(resp.name = myRespName,
resp.var = myResp[period1],
resp.xy = myRespXY[period1, ],
expl.var = myExpl[period1, ],
eval.resp.var = myResp[period2],
eval.resp.xy = myRespXY[period2, ],
eval.expl.var = myExpl[period2, ],
data.type = 'count')
myBiomodData
summary(myBiomodData)
plot(myBiomodData, plot.eval = TRUE)Note that pseudo-absence selection is only possible when
data.type = 'binary'.
Several cross-validation methods are available and can be selected
with the BIOMOD_Modeling
function, which calls the bm_CrossValidation
function to do so. More examples are presented on the Auxiliary functions
webpage.
When using the balance parameter with
multiclass or ordinal data, proportions are
balanced within each class as much as possible.
# # k-fold selection
# cv.k <- bm_CrossValidation(bm.format = myBiomodData,
# strategy = 'kfold',
# nb.rep = 2,
# k = 3)
#
# # stratified selection (geographic)
# cv.s <- bm_CrossValidation(bm.format = myBiomodData,
# strategy = 'strat',
# k = 2,
# balance = 'presences',
# strat = 'x')
# head(cv.k)
# head(cv.s)Modeling options are automatically retrieved from selected models
within the BIOMOD_Modeling
function, which calls the bm_ModelingOptions
function to do so. Model parameters can also be automatically tuned to a
specific dataset, by calling the bm_Tuning function,
however it can be quite long. More examples are presented on the Auxiliary functions
webpage.
# # bigboss parameters
# opt.b <- bm_ModelingOptions(data.type = 'count',
# models = c('DNN', 'XGBOOST'),
# strategy = 'bigboss')
#
# # tuned parameters with formated data
# opt.t <- bm_ModelingOptions(data.type = 'count',
# models = c('DNN', 'XGBOOST'),
# strategy = 'tuned',
# bm.format = myBiomodData)
#
# opt.b
# opt.tNot all algorithms are available when it comes to data type other than binary.
| ANN | CTA | DNN | FDA | GAM | GBM | GLM | MARS | MAXENT | MAXNET | RF | RFd | SRE | XGBOOST | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| binary | x | x | x | x | x | x | x | x | x | x | x | x | x | x |
| multiclass | x | x | x | x | x | x | ||||||||
| ordinal | x | x | x | x | x | x | x | x | ||||||
| count / relative / abundance | x | x | x | x | x | x | x | x |
New evaluation metrics have been added :
Accuracy, Recall, Precision
and F1 for multiclass and ordinalRMSE, MSE, MAE,
Max_error, Rsquared and
Rsquared_aj for count, relative and abundanceas well as new ensemble models :
EMmode and EMfreq for multiclass and
ordinal The selection of single
models for the ensemble modeling is different for RMSE,
MSE, MAE and Max_error metrics :
are kept all models whose evaluation value is < (best value + the
given threshold).
# Model single models
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
modeling.id = 'AllModels',
models = c('DNN', 'GBM', 'GLM', 'MARS', 'RF', 'XGBOOST'),
CV.strategy = 'block',
OPT.strategy = 'bigboss',
metric.eval = c('Rsquared', 'Rsquared_aj', 'RMSE'),
var.import = 3)
# seed.val = 123)
# nb.cpu = 8)
myBiomodModelOut
# Get evaluation scores & variables importance
get_evaluations(myBiomodModelOut)
get_variables_importance(myBiomodModelOut)
# Represent evaluation scores & variables importance
bm_PlotEvalMean(bm.out = myBiomodModelOut, dataset = 'calibration')
bm_PlotEvalMean(bm.out = myBiomodModelOut, dataset = 'validation')
bm_PlotEvalBoxplot(bm.out = myBiomodModelOut, dataset = 'calibration', group.by = c('algo', 'algo'))
bm_PlotEvalBoxplot(bm.out = myBiomodModelOut, dataset = 'calibration', group.by = c('algo', 'run'))
bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('expl.var', 'algo', 'algo'))
bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('expl.var', 'algo', 'run'))
bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('algo', 'expl.var', 'run'))
# Represent response curves
bm_PlotResponseCurves(bm.out = myBiomodModelOut,
models.chosen = get_built_models(myBiomodModelOut)[c(1:3, 7:9)],
fixed.var = 'median')
bm_PlotResponseCurves(bm.out = myBiomodModelOut,
models.chosen = get_built_models(myBiomodModelOut)[c(1:3, 7:9)],
fixed.var = 'min')
bm_PlotResponseCurves(bm.out = myBiomodModelOut,
models.chosen = get_built_models(myBiomodModelOut)[3],
fixed.var = 'median',
do.bivariate = TRUE)
# Explore models' outliers & residuals
bm_ModelAnalysis(bm.mod = myBiomodModelOut,
models.chosen = get_built_models(myBiomodModelOut)[c(1:3, 7:9)])
# Model ensemble models
myBiomodEM <- BIOMOD_EnsembleModeling(bm.mod = myBiomodModelOut,
models.chosen = 'all',
em.by = 'all',
em.algo = c('EMmedian', 'EMmean', 'EMwmean'),
metric.select = c('Rsquared_aj', 'RMSE'),
metric.select.thresh = c(0.4, 2),
metric.eval = c('Rsquared', 'Rsquared_aj', 'RMSE'),
var.import = 3)
myBiomodEM
# Get evaluation scores & variables importance
get_evaluations(myBiomodEM)
get_variables_importance(myBiomodEM)
# Represent evaluation scores & variables importance
bm_PlotEvalMean(bm.out = myBiomodEM, dataset = 'calibration', group.by = 'full.name')
bm_PlotEvalBoxplot(bm.out = myBiomodEM, dataset = 'calibration', group.by = c('full.name', 'full.name'))
bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('expl.var', 'full.name', 'full.name'))
bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('expl.var', 'algo', 'merged.by.run'))
bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('algo', 'expl.var', 'merged.by.run'))
# Represent response curves
bm_PlotResponseCurves(bm.out = myBiomodEM,
models.chosen = get_built_models(myBiomodEM)[c(2, 5)],
fixed.var = 'median')
bm_PlotResponseCurves(bm.out = myBiomodEM,
models.chosen = get_built_models(myBiomodEM)[c(2, 5)],
fixed.var = 'min')
bm_PlotResponseCurves(bm.out = myBiomodEM,
models.chosen = get_built_models(myBiomodEM)[5],
fixed.var = 'median',
do.bivariate = TRUE)The digits parameter is used to define the number of
digits after the decimal point for the predicted values.
For
relative data, the on_0_1000 parameter can be used in the
same way than with binary data.
Note that integer values are lighter when saved in
memory than float (decimal values).
# Project single models
myBiomodProj <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
proj.name = 'Period1',
new.env = myExpl[period1, ],
new.env.xy = myRespXY[period1, ],
models.chosen = 'all',
build.clamping.mask = TRUE,
digits = 1)
myBiomodProj
plot(myBiomodProj, coord = myRespXY[period1, ])
# Project ensemble models (from single projections)
myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = myBiomodEM,
bm.proj = myBiomodProj,
models.chosen = 'all')
# Project ensemble models (building single projections)
myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = myBiomodEM,
proj.name = 'Period1EM',
new.env = myExpl[period1, ],
models.chosen = 'all')
myBiomodEMProj
plot(myBiomodEMProj, coord = myRespXY[period1, ])
# Project onto future conditions
myBiomodProjectionFuture <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
proj.name = 'Period2',
new.env = myExpl[period2, ],
new.env.xy = myRespXY[period2, ],
models.chosen = 'all',
build.clamping.mask = TRUE)
# Compute differences
myBiomodRangeSize <- BIOMOD_RangeSize(proj.current = myBiomodProj,
proj.future = myBiomodProjectionFuture,
metric.binary = 'TSS')
myBiomodRangeSize@Compt.By.Models