R/bm_ModelingOptions.R
bm_ModelingOptions.Rd
Parameterize and/or tune biomod2's single models options.
bm_ModelingOptions(
data.type = "binary",
models,
strategy,
user.val = NULL,
user.base = "bigboss",
bm.format = NULL,
calib.lines = NULL
)
a character
corresponding to the data type to be used, must be either
binary
, count
, relative
, abundance
, ordinal
a vector
containing model names to be computed, must be among
ANN
, CTA
, FDA
, GAM
, GBM
, GLM
, MARS
,
MAXENT
, MAXNET
, RF
, RFd
, SRE
, XGBOOST
a character
corresponding to the method to select models' parameters
values, must be either default
, bigboss
, user.defined
, tuned
(see Details)
(optional, default NULL
)
A list
containing parameters values for some (all) models
(optional, default bigboss
)
If strategy = 'user.defined'
, a character
corresponding to the basic set of
options to be modified by user defined values, must be either default
or bigboss
(see Details)
(optional, default NULL
)
A BIOMOD.formated.data
or BIOMOD.formated.data.PA
object returned
by the BIOMOD_FormatingData
function
(optional, default NULL
)
A data.frame
object returned by get_calib_lines
or
bm_CrossValidation
functions
A BIOMOD.models.options
of object that can be used to build species
distribution model(s) with the BIOMOD_Modeling
function.
This function creates a BIOMOD.models.options
object containing parameter values
for each single model that can be run within biomod2 through
BIOMOD_Modeling
function.
11 different algorithms are currently available, with different versions for some (GAM
,
MAXENT
, RF
). These models are associated with binary or nonbinary
datatypes, but all combinations are not available. They are all listed within the
ModelsTable
dataset.
Different strategies are available to set those parameters, through the strategy
argument :
all parameters names and values are directly retrieve from functions to be
called through formalArgs
and formals
functions
default parameter values are updated with values predefined by biomod2
team (see OptionsBigboss
)
default parameter values are updated with values provided by the user
default parameter values are updated by calling bm_Tuning
function
To define the same options for all datasets of a model, options can be provided through the
user.val
parameter as a list
whose name is for_all_datasets
.
MAXENT
being the only external model (not called through a R
package),
default parameters, and their values, are the following :
FLAG | default | Description |
path_to_maxent.jar | getwd() | a character corresponding to path to
maxent.jar file |
memory_allocated | 512 | an integer corresponding to the amount of memory
(in Mo) reserved for java to run MAXENT , must be either 64 ,
128 , 256 , 512 , 1024 ... or NULL to use default java
memory limitation parameter |
initial_heap_size | NULL | a character corresponding to initial heap
space (shared memory space) allocated to java (argument -Xms when calling
java ), must be either 1024K , 4096M , 10G ... or NULL to
use default java parameter. Used in BIOMOD_Projection but not in
BIOMOD_Modeling . |
max_heap_size | NULL | a character corresponding to maximum heap
space (shared memory space) allocated to java (argument -Xmx when calling
java ), must be either 1024K , 4096M , 10G ... or NULL to
use default java parameter, and must be larger than initial_heap_size . Used
in BIOMOD_Projection but not in BIOMOD_Modeling . |
background_data_dir | 'default' | a character corresponding to path
to folder where explanatory variables are stored as ASCII files (raster format).
If specified, MAXENT will generate its own background data from rasters of
explanatory variables ('default' value). Otherwise biomod2 pseudo-absences
will be used (see BIOMOD_FormatingData ). |
visible | FALSE | a logical value defining whether MAXENT
user interface is to be used or not |
linear | TRUE | a logical value defining whether linear features are
to be used or not |
quadratic | TRUE | a logical value defining whether quadratic features are
to be used or not |
product | TRUE | a logical value defining whether product features are
to be used or not |
threshold | TRUE | a logical value defining whether threshold features are
to be used or not |
hinge | TRUE | a logical value defining whether hinge features are
to be used or not |
l2lqthreshold | 10 | an integer corresponding to the number of
samples at which quadratic features start being used |
lq2lqptthreshold | 80 | an integer corresponding to the number of
samples at which product and threshold features start being used |
hingethreshold | 15 | an integer corresponding to the number of
samples at which hinge features start being used |
beta_lqp | -1.0 | a numeric corresponding to the regularization
parameter to be applied to all linear, quadratic and product features (negative value
enables automatic setting) |
beta_threshold | -1.0 | a numeric corresponding to the regularization
parameter to be applied to all threshold features (negative value enables automatic
setting) |
beta_hinge | -1.0 | a numeric corresponding to the regularization
parameter to be applied to all hinge features (negative value enables automatic
setting) |
beta_categorical | -1.0 | a numeric corresponding to the regularization
parameter to be applied to all categorical features (negative value enables automatic
setting) |
betamultiplier | 1 | a numeric corresponding to the number by which
multiply all automatic regularization parameters (higher number gives a more
spread-out distribution) |
defaultprevalence | 0.5 | a numeric corresponding to the default
prevalence of the modelled species (probability of presence at ordinary occurrence
points) |
togglelayerselected | NULL | a character defining the prefix to be used
to REMOVE environmental layers whose names begin with this prefix (all selected by
default) |
maximumbackground | 10000 | an integer corresponding to the maximum
number of pixels to be selected randomly among background points if larger than this number |
maximumiterations | 500 | an integer corresponding to the maximum number
of iterations for training |
convergencethreshold | 0.00005 | a numeric corresponding to the drop in
log loss per iteration below which the training will be stopped |
autofeature | TRUE | a logical value defining whether feature classes
to be used will automatically be selected based on number of training samples or not |
jackknife | FALSE | a logical value defining whether variables'
importance is to be measured or not. If selected, training will be done with each
environmental variable first omitted then used in isolation. |
writeclampgrid | FALSE | a logical value defining whether clamping mask
is to be saved or not (absolute difference between prediction values with and without
clamping) |
writemess | FALSE | a logical value defining whether multidimensional
environmental similarity surface (MESS) is to be saved or not (degree of novelness of
environmental conditions; and which variable is the most out of range at each point) |
logfile | 'maxent.log' | a character corresponding to file name in which
debugging information will be written |
verbose | FALSE | a logical value defining whether detailed diagnostics
for debugging should be given or not |
ModelsTable
, OptionsBigboss
,
BIOMOD.models.options
,
bm_Tuning
, BIOMOD_Modeling
Other Secondary functions:
bm_BinaryTransformation()
,
bm_CrossValidation()
,
bm_FindOptimStat()
,
bm_MakeFormula()
,
bm_PlotEvalBoxplot()
,
bm_PlotEvalMean()
,
bm_PlotRangeSize()
,
bm_PlotResponseCurves()
,
bm_PlotVarImpBoxplot()
,
bm_PseudoAbsences()
,
bm_RangeSize()
,
bm_RunModelsLoop()
,
bm_SRE()
,
bm_SampleBinaryVector()
,
bm_SampleFactorLevels()
,
bm_Tuning()
,
bm_VariablesImportance()
library(terra)
# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'GuloGulo'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)
DONTSHOW({
myExtent <- terra::ext(0,30,45,70)
myExpl <- terra::crop(myExpl, myExtent)
})
# ---------------------------------------------------------------#
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.name = myRespName,
resp.var = myResp,
resp.xy = myRespXY,
expl.var = myExpl)
# k-fold selection
cv.k <- bm_CrossValidation(bm.format = myBiomodData,
strategy = 'kfold',
nb.rep = 2,
k = 3)
# ---------------------------------------------------------------#
allModels <- c('ANN', 'CTA', 'FDA', 'GAM', 'GBM', 'GLM', 'MARS'
, 'MAXENT', 'MAXNET', 'RF', 'RFd', 'SRE', 'XGBOOST')
# default parameters
opt.d <- bm_ModelingOptions(data.type = 'binary',
models = allModels,
strategy = 'default')
# providing formated data
opt.df <- bm_ModelingOptions(data.type = 'binary',
models = allModels,
strategy = 'default',
bm.format = myBiomodData,
calib.lines = cv.k)
opt.d
opt.d@models
opt.d@options$ANN.binary.nnet.nnet
names(opt.d@options$ANN.binary.nnet.nnet@args.values)
opt.df@options$ANN.binary.nnet.nnet
names(opt.df@options$ANN.binary.nnet.nnet@args.values)
# ---------------------------------------------------------------#
# bigboss parameters
opt.b <- bm_ModelingOptions(data.type = 'binary',
models = allModels,
strategy = 'bigboss')
# user defined parameters
user.SRE <- list('_allData_allRun' = list(quant = 0.01))
user.XGBOOST <- list('_allData_allRun' = list(nrounds = 10))
user.val <- list(SRE.binary.biomod2.bm_SRE = user.SRE
, XGBOOST.binary.xgboost.xgboost = user.XGBOOST)
opt.u <- bm_ModelingOptions(data.type = 'binary',
models = c('SRE', 'XGBOOST'),
strategy = 'user.defined',
user.val = user.val)
opt.b
opt.u
if (FALSE) { # \dontrun{
# tuned parameters with formated data
opt.t <- bm_ModelingOptions(data.type = 'binary',
models = c('SRE', 'XGBOOST'),
strategy = 'tuned',
bm.format = myBiomodData)
opt.t
} # }