biomod2 is a wrapper that calls single-model functions from external packages. Modeling options are automatically retrieved from these packages, so every argument accepted by these functions can be used.
Default parameter values are left unmodified and are often poorly suited to species distribution modeling in general, and to a specific dataset in particular. The bigboss options provided by the biomod2 team aim to correct at least the species distribution modeling aspect, while tuned options try to find a more appropriate parameterization for the user's data, mainly through the caret package. Users can also define their own modeling options (user.defined).
Note that only the binary data type and its associated models are currently allowed, but the package structure has been changed so that new data types, such as absolute or relative abundances, can be added in the near future.
The dataset ModelsTable lists all the available algorithms with their packages and functions:
# | model | type | package | func | train |
---|---|---|---|---|---|
1 | ANN | binary | nnet | nnet | avNNet |
2 | CTA | binary | rpart | rpart | rpart |
3 | FDA | binary | mda | fda | fda |
4 | GAM | binary | gam | gam | gamLoess |
5 | GAM | binary | mgcv | bam | bam |
6 | GAM | binary | mgcv | gam | gam |
7 | GBM | binary | gbm | gbm | gbm |
8 | GLM | binary | stats | glm | glm |
9 | MARS | binary | earth | earth | earth |
10 | MAXENT | binary | MAXENT | MAXENT | ENMevaluate |
11 | MAXNET | binary | maxnet | maxnet | maxnet |
12 | RF | binary | randomForest | randomForest | rf |
13 | SRE | binary | biomod2 | bm_SRE | bm_SRE |
14 | XGBOOST | binary | xgboost | xgboost | xgbTree |
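The option names used elsewhere in this vignette (for example RF.binary.randomForest.randomForest and GLM.binary.stats.glm) appear to combine the model, type, package and func columns of this table. The sketch below reproduces that pattern in base R on a hand-copied subset of the table. The naming rule is inferred from those two names only, not taken from the package documentation, and `models_tab` and `option_names` are hypothetical names used for illustration.

```r
# Hand-copied subset of ModelsTable (three rows of the table above).
# With biomod2 installed, the full table would come from the package dataset.
models_tab <- data.frame(
  model   = c("RF", "GLM", "MARS"),
  type    = c("binary", "binary", "binary"),
  package = c("randomForest", "stats", "earth"),
  func    = c("randomForest", "glm", "earth"),
  train   = c("rf", "glm", "earth"),
  stringsAsFactors = FALSE
)

# Assumed naming pattern: <model>.<type>.<package>.<func>
option_names <- with(models_tab, paste(model, type, package, func, sep = "."))
print(option_names)
```

If the pattern holds, these strings are the names to use when indexing an options object or when building a user.val list.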
All the examples use the data provided with the package.
For the beginning of the code, see the main functions vignette.
biomod2 has a set of default options that mostly match the algorithms' default values, with some minor modifications to allow the BIOMOD_Modeling function to run smoothly.
Please be aware that this strategy can often lead to poor models or even to errors.
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
modeling.id = 'Example',
models = c('RF', 'GLM'),
CV.strategy = 'random',
CV.nb.rep = 2,
CV.perc = 0.8,
OPT.strategy = 'default',
metric.eval = c('TSS','ROC'),
var.import = 2,
seed.val = 42)
You can retrieve the model options with get_options:
get_options(myBiomodModelOut)
The bigboss set of parameters is available in the dataset OptionsBigboss.
This set should give better results than the default set and will continue to be optimized by the biomod2 team.
Keep in mind that these parameters are generic: depending on your case, the results may not be better than with the default set.
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
modeling.id = 'Example',
models = c('RF', 'GLM'),
CV.strategy = 'random',
CV.nb.rep = 2,
CV.perc = 0.8,
OPT.strategy = 'bigboss',
metric.eval = c('TSS','ROC'),
var.import = 2,
seed.val = 42)
With tuned options, some algorithms can be trained on your dataset, and the optimized parameters are returned to be used within the BIOMOD_Modeling function. This tuning is mostly based on the caret package, which calls a specific function to tune each algorithm (see the train column in ModelsTable). As exceptions, the ENMevaluate function of the ENMeval package is called for MAXENT, and the biomod2 team wrote a dedicated function for SRE.
The following parameters can be tuned:
algorithm | parameters |
---|---|
ANN | size, decay, bag |
FDA | degree, nprune |
GAM | select, method |
GBM | n.trees, interaction.depth, shrinkage, n.minobsinnode |
MARS | degree, nprune |
RF | mtry |
SRE | quant |
XGBOOST | nrounds, max_depth, eta, gamma, colsample_bytree, min_child_weight, subsample |
For almost every algorithm (except MAXENT, MAXNET and SRE), you can choose to optimize the formula by setting do.formula = TRUE. The optimized formula is chosen among the different types (simple, quadratic, polynomial, s_smoother) and different interaction levels.
In the same way, a variable selection can be run for GLM and GAM by setting do.stepAIC = TRUE (using MASS::stepAIC and gam::step.Gam, respectively).
More information about the training can be found in the documentation of the bm_Tuning function.
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
modeling.id = 'Example',
models = c('RF','SRE'),
CV.strategy = 'random',
CV.nb.rep = 2,
CV.perc = 0.8,
OPT.strategy = 'tuned',
metric.eval = c('TSS','ROC'),
var.import = 2,
seed.val = 42)
print(get_options(myBiomodModelOut), dataset = '_allData_RUN1')
The user.defined option lets you set the parameters of all the algorithms yourself.
Note that information about the MAXENT parameters can be found in the documentation of the bm_ModelingOptions function.
Example: you want to run three models, RF, GLM and MARS.
You have your BiomodData object and you set your cross-validation table.
You start from the bigboss parameters as a base.
myCVtable <- bm_CrossValidation(bm.format = myBiomodData,
strategy = "random",
nb.rep = 2,
perc = 0.8)
myOpt <- bm_ModelingOptions(data.type = 'binary',
models = c('RF','GLM','MARS'),
strategy = 'bigboss',
bm.format = myBiomodData,
calib.lines = myCVtable)
print(myOpt)
You then tune the parameters for RF and you want to change the formula for GLM.
tuned.rf <- bm_Tuning(model = 'RF',
tuning.fun = 'rf', ## see in ModelsTable
do.formula = TRUE,
bm.options = myOpt@options$RF.binary.randomForest.randomForest,
bm.format = myBiomodData,
calib.lines = myCVtable)
form.GLM <- bm_MakeFormula(resp.name = myBiomodData@sp.name,
expl.var = head(myBiomodData@data.env.var),
type = 'simple',
interaction.level = 0)
user.GLM <- list('_allData_RUN1' = list(formula = form.GLM),
'_allData_RUN2' = list(formula = form.GLM))
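When there are many runs, the per-dataset list above can also be built programmatically. This is a minimal sketch in base R, assuming the dataset names follow the _allData_RUNx pattern used above. `run_names`, `form_glm` and `user_glm` are hypothetical names, and the formula is a made-up placeholder standing in for form.GLM.

```r
# Hypothetical dataset names. In practice they would be expected to match the
# columns of the cross-validation table (e.g. colnames(myCVtable)); that
# correspondence is an assumption here.
run_names <- paste0("_allData_RUN", 1:2)

# Placeholder formula standing in for form.GLM (variable names are made up)
form_glm <- as.formula("sp ~ var1 + var2")

# One list(formula = ...) entry per dataset, named after the dataset
user_glm <- setNames(lapply(run_names, function(nm) list(formula = form_glm)),
                     run_names)
str(user_glm, max.level = 1)
```

Building the list this way avoids repeating one entry per run by hand and keeps the names consistent with the runs actually present.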
Finally, you gather these values into a new BIOMOD.models.options object and run the BIOMOD_Modeling function.
## Gather in one list
## Models names can be found in OptionsBigboss@models
user.val <- list(RF.binary.randomForest.randomForest = tuned.rf,
GLM.binary.stats.glm = user.GLM)
myOpt <- bm_ModelingOptions(data.type = 'binary',
models = c('RF','GLM','MARS'),
strategy = "user.defined",
user.val = user.val,
user.base = "bigboss",
bm.format = myBiomodData,
calib.lines = myCVtable)
print(myOpt)
print(myOpt, dataset = '_allData_RUN1')
print(myOpt, dataset = '_allData_RUN2')
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
modeling.id = 'Example',
models = c('RF','GLM','MARS'),
CV.strategy = 'user.defined',
CV.user.table = myCVtable,
OPT.user = myOpt,
metric.eval = c('TSS','ROC'),
var.import = 2)
You can find more examples in the Secondary functions vignette.