This internal biomod2 function builds a cross-validation table
according to 6 different methods: random, kfold, block, strat,
env or user.defined (see Details).
bm_CrossValidation(
bm.format,
strategy = "random",
nb.rep = 0,
perc = 0.8,
k = 0,
balance = "presences",
env.var = NULL,
strat = "both",
user.table = NULL,
do.full.models = FALSE
)
bm_CrossValidation_user.defined(bm.format, ...)
# S4 method for class 'BIOMOD.formated.data'
bm_CrossValidation_user.defined(bm.format, user.table)
# S4 method for class 'BIOMOD.formated.data.PA'
bm_CrossValidation_user.defined(bm.format, user.table)
bm_CrossValidation_random(bm.format, ...)
# S4 method for class 'BIOMOD.formated.data'
bm_CrossValidation_random(bm.format, nb.rep, perc)
# S4 method for class 'BIOMOD.formated.data.PA'
bm_CrossValidation_random(bm.format, nb.rep, perc)
bm_CrossValidation_kfold(bm.format, ...)
# S4 method for class 'BIOMOD.formated.data'
bm_CrossValidation_kfold(bm.format, nb.rep, k)
# S4 method for class 'BIOMOD.formated.data.PA'
bm_CrossValidation_kfold(bm.format, nb.rep, k)
bm_CrossValidation_block(bm.format, ...)
# S4 method for class 'BIOMOD.formated.data'
bm_CrossValidation_block(bm.format)
# S4 method for class 'BIOMOD.formated.data.PA'
bm_CrossValidation_block(bm.format)
bm_CrossValidation_strat(bm.format, ...)
# S4 method for class 'BIOMOD.formated.data'
bm_CrossValidation_strat(bm.format, balance, strat, k)
# S4 method for class 'BIOMOD.formated.data.PA'
bm_CrossValidation_strat(bm.format, balance, strat, k)
bm_CrossValidation_env(bm.format, ...)
# S4 method for class 'BIOMOD.formated.data'
bm_CrossValidation_env(bm.format, balance, k, env.var)
# S4 method for class 'BIOMOD.formated.data.PA'
bm_CrossValidation_env(bm.format, balance, k, env.var)
bm.format
a BIOMOD.formated.data or BIOMOD.formated.data.PA
object returned by the BIOMOD_FormatingData function
strategy
a character corresponding to the cross-validation selection strategy,
must be among random, kfold, block, strat, env or
user.defined
nb.rep
(optional, default 0)
If strategy = 'random' or strategy = 'kfold', an integer corresponding
to the number of sets (repetitions) of cross-validation points that will be drawn
perc
(optional, default 0.8)
If strategy = 'random', a numeric between 0 and 1 defining the
proportion of data that will be kept for calibration
k
(optional, default 0)
If strategy = 'kfold' or strategy = 'strat' or strategy = 'env', an
integer corresponding to the number of partitions
balance
(optional, default 'presences')
If strategy = 'strat' or strategy = 'env', a character corresponding
to how data will be balanced between partitions, must be either presences or
absences
env.var
(optional)
If strategy = 'env', a character corresponding to the environmental variables
used to build the partitions. k partitions will be built for each environmental
variable. By default the function uses all available environmental variables.
strat
(optional, default 'both')
If strategy = 'strat', a character corresponding to how data will be partitioned
along the geographic gradient, must be among x, y, both
user.table
(optional, default NULL)
If strategy = 'user.defined', a matrix or data.frame defining for each
repetition (in columns) which observation lines should be used for model calibration
(TRUE) and validation (FALSE)
do.full.models
(optional, default FALSE)
A logical value defining whether models should also be calibrated and validated over
the whole dataset (and pseudo-absence datasets) or not
...
(optional, one or several of the arguments listed above, depending on the selected method)
A matrix or data.frame defining for each repetition (in columns) which
observation lines should be used for models calibration (TRUE) and validation
(FALSE).
Several parameters are available within the function and some of them can be used with different cross-validation strategies :
| parameter | random | kfold | block | strat | env |
|-----------|--------|-------|-------|-------|-----|
| nb.rep    | x      | x     |       |       |     |
| perc      | x      |       |       |       |     |
| k         |        | x     |       | x     | x   |
| balance   |        |       |       | x     | x   |
| strat     |        |       |       | x     |     |
Concerning column names of matrix output :
The number of columns depends on the strategy selected.
The column names are assigned after the selection, numbered from 1 to the
number of columns.
If do.full.models = TRUE, columns merging runs (and/or pseudo-absence datasets)
are added at the end.
Concerning cross-validation strategies :
random
The simplest way to calibrate and validate a model is to split the original
dataset in two: one part to calibrate the model and the other to validate it. The
splitting can be repeated nb.rep times.
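The random split described above can be sketched in base R. This is an illustration only, not the biomod2 implementation: `make_random_table` is a hypothetical helper, and the `_allData_RUNx` column names simply follow the naming described for user-defined tables.

```r
# Hypothetical helper (not part of biomod2): draw nb.rep random
# calibration/validation splits, keeping a proportion perc for calibration.
make_random_table <- function(n.obs, nb.rep, perc) {
  n.calib <- round(n.obs * perc)
  tab <- vapply(seq_len(nb.rep), function(i) {
    calib <- rep(FALSE, n.obs)
    calib[sample.int(n.obs, n.calib)] <- TRUE  # TRUE = calibration line
    calib
  }, logical(n.obs))
  colnames(tab) <- paste0("_allData_RUN", seq_len(nb.rep))
  tab
}

set.seed(42)
cv.random <- make_random_table(n.obs = 100, nb.rep = 3, perc = 0.8)
colSums(cv.random)  # number of calibration lines per repetition
```

Each column is one repetition; lines set to FALSE form the validation set of that repetition.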
kfold
The k-fold method splits the original dataset into k parts of equal
size: each part is used in turn as the validation dataset while the other k-1
parts are used for calibration, leading to k calibration/validation sets.
This multiple splitting can be repeated nb.rep times.
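A minimal base R sketch of the k-fold scheme, under the assumption that folds are assigned at random with sizes as equal as possible. `make_kfold_table` is a hypothetical name, not a biomod2 function.

```r
# Hypothetical helper (not part of biomod2): k-fold table repeated nb.rep times.
make_kfold_table <- function(n.obs, k, nb.rep = 1) {
  runs <- lapply(seq_len(nb.rep), function(r) {
    # Shuffle fold labels 1..k so that fold sizes differ by at most one
    fold <- sample(rep(seq_len(k), length.out = n.obs))
    # Column i: FALSE (validation) for fold i, TRUE (calibration) otherwise
    vapply(seq_len(k), function(i) fold != i, logical(n.obs))
  })
  tab <- do.call(cbind, runs)
  colnames(tab) <- paste0("_allData_RUN", seq_len(ncol(tab)))
  tab
}

set.seed(1)
cv.kfold <- make_kfold_table(n.obs = 90, k = 3, nb.rep = 2)
dim(cv.kfold)  # 90 lines, k * nb.rep = 6 columns
```

Within one repetition, every observation is used exactly once for validation.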
block
This strategy may be used to test for model overfitting and to assess transferability in
geographic space. Block stratification was described in Muscarella et al. 2014
(see References). The data are partitioned into four bins of equal size (bottom-left,
bottom-right, top-left and top-right).
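One way to obtain four geographic bins of near-equal size is to split the points at the median of y, then split each half at its own median of x. The sketch below follows that idea in base R; it is an assumption about how such bins can be built, not a copy of the biomod2 or ENMeval code, and `make_block_table` is a hypothetical helper.

```r
# Hypothetical helper (not part of biomod2): four spatial bins from medians.
make_block_table <- function(xy) {
  south <- xy$y <= median(xy$y)
  west <- logical(nrow(xy))
  # Split each latitudinal half at its own median longitude
  west[south]  <- xy$x[south]  <= median(xy$x[south])
  west[!south] <- xy$x[!south] <= median(xy$x[!south])
  # 1 = bottom-left, 2 = bottom-right, 3 = top-left, 4 = top-right
  bin <- ifelse(south, ifelse(west, 1L, 2L), ifelse(west, 3L, 4L))
  tab <- vapply(1:4, function(i) bin != i, logical(nrow(xy)))
  colnames(tab) <- paste0("_allData_RUN", 1:4)
  tab
}

set.seed(7)
xy <- data.frame(x = runif(200, 0, 30), y = runif(200, 45, 70))
cv.block <- make_block_table(xy)
colSums(!cv.block)  # validation lines per bin
```

Each column holds out one of the four bins for validation and calibrates on the other three.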
strat
This strategy may be used to test for model overfitting and to assess transferability
in geographic space. x and y stratification was described in Wenger and
Olden 2012 (see References). y stratification uses k partitions along the
y-gradient, x stratification does the same for the x-gradient. both returns
2k partitions: k partitions stratified along the x-gradient and k
partitions stratified along the y-gradient.
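The x, y and both options can be illustrated with quantile-based bins along each coordinate. This base R sketch is hedged: it balances all points rather than presences or absences (it ignores the balance argument), and `make_strat_table` is a hypothetical helper, not a biomod2 function.

```r
# Hypothetical helper (not part of biomod2): k partitions along x, y or both.
make_strat_table <- function(xy, k, strat = c("both", "x", "y")) {
  strat <- match.arg(strat)
  axes <- if (strat == "both") c("x", "y") else strat
  parts <- lapply(axes, function(ax) {
    # Quantile breaks give k bins with (nearly) equal numbers of points
    breaks <- quantile(xy[[ax]], probs = seq(0, 1, length.out = k + 1))
    bin <- cut(xy[[ax]], breaks = breaks, include.lowest = TRUE, labels = FALSE)
    vapply(seq_len(k), function(i) bin != i, logical(nrow(xy)))
  })
  tab <- do.call(cbind, parts)
  colnames(tab) <- paste0("_allData_RUN", seq_len(ncol(tab)))
  tab
}

set.seed(5)
xy <- data.frame(x = runif(120, 0, 30), y = runif(120, 45, 70))
cv.both <- make_strat_table(xy, k = 2, strat = "both")
cv.x <- make_strat_table(xy, k = 3, strat = "x")
ncol(cv.both)  # 2k columns: k along x, then k along y
```

The same binning applied to an environmental variable instead of a coordinate gives the idea behind the env strategy.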
env
This strategy may be used to test for model overfitting and to assess
transferability in environmental space. It returns k partitions for each variable
given in env.var.
user.defined
Allows the user to supply their own cross-validation table. For a
presence-absence dataset, column names must be formatted as _allData_RUNx, with
x an integer. For a presence-only dataset for which several pseudo-absence datasets
were generated, column names must be formatted as _PAx_RUNy, with x and y
integers and PAx an existing pseudo-absence dataset.
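The naming rules above can be checked before passing a table to the function. The following base R sketch builds a small logical table with valid column names; the sizes and the 70% calibration share are arbitrary illustration values.

```r
# Illustration only: a user-defined table for a presence-absence dataset.
n.obs <- 50
nb.runs <- 2
set.seed(10)
user.table <- vapply(seq_len(nb.runs), function(i) {
  runif(n.obs) < 0.7  # TRUE = calibration, FALSE = validation
}, logical(n.obs))
colnames(user.table) <- paste0("_allData_RUN", seq_len(nb.runs))

# Column names expected with pseudo-absence datasets PA1 and PA2, two runs each
pa.names <- as.vector(outer(paste0("_PA", 1:2), paste0("_RUN", 1:2), paste0))

# Simple format checks on the column names
all(grepl("^_allData_RUN[0-9]+$", colnames(user.table)))
all(grepl("^_PA[0-9]+_RUN[0-9]+$", pa.names))
```

Such a table would then be supplied through the user.table argument together with strategy = "user.defined".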
Concerning balance parameter :
If balance = 'presences', presences are divided (balanced) equally over the partitions
(e.g. Fig. 1b in Muscarella et al. 2014).
Absences or pseudo-absences will however be unbalanced over the partitions, especially if the
presences are clumped on an edge of the study area.
If balance = 'absences', absences (or pseudo-absences or background points) are divided
(balanced) as equally as possible between the partitions. When absences are spread evenly
over the study area, this yields geographically balanced bins (an approach similar to Fig. 1 in
Wenger and Olden 2012). Presences will however be unbalanced over the partitions, especially
if the presences are clumped on an edge of the study area.
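The effect of balance can be seen on a toy gradient. In this base R sketch the bin limits are taken from the quantiles of either the presence or the absence coordinates; this is an assumed mechanism used for illustration, and `balanced_bins` is a hypothetical helper, not a biomod2 function.

```r
# Hypothetical helper (not part of biomod2): bins along one gradient,
# balanced on presences (resp == 1) or absences (resp == 0).
balanced_bins <- function(coord, resp, k, balance = c("presences", "absences")) {
  balance <- match.arg(balance)
  ref <- if (balance == "presences") coord[resp == 1] else coord[resp == 0]
  breaks <- quantile(ref, probs = seq(0, 1, length.out = k + 1))
  # Open the outer limits so that every point falls in a bin
  breaks[1] <- -Inf
  breaks[k + 1] <- Inf
  cut(coord, breaks = breaks, labels = FALSE)
}

set.seed(3)
x <- runif(300, 0, 30)
resp <- rbinom(300, 1, prob = x / 30)  # presences clumped towards high x
bins <- balanced_bins(x, resp, k = 3, balance = "presences")
table(bin = bins, resp = resp)
```

With balance = "presences" the presence counts per bin are nearly identical while the absence counts differ, which mirrors the trade-off described above.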
Muscarella, R., Galante, P.J., Soley-Guardia, M., Boria, R.A., Kass, J.M., Uriarte, M. & Anderson, R.P. (2014). ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models. Methods in Ecology and Evolution, 5, 1198-1205.
Wenger, S.J. & Olden, J.D. (2012). Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods in Ecology and Evolution, 3, 260-267.
get.block, kfold,
BIOMOD_FormatingData, BIOMOD_Modeling
Other Secondary functions:
bm_BinaryTransformation(),
bm_FindOptimStat(),
bm_MakeFormula(),
bm_ModelingOptions(),
bm_PlotEvalBoxplot(),
bm_PlotEvalMean(),
bm_PlotRangeSize(),
bm_PlotResponseCurves(),
bm_PlotVarImpBoxplot(),
bm_PseudoAbsences(),
bm_RangeSize(),
bm_RunModelsLoop(),
bm_SRE(),
bm_SampleBinaryVector(),
bm_SampleFactorLevels(),
bm_Tuning(),
bm_VariablesImportance()
library(biomod2)
library(terra)
# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'GuloGulo'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)
# Restrict the environmental layers to a smaller extent
myExtent <- terra::ext(0, 30, 45, 70)
myExpl <- terra::crop(myExpl, myExtent)
# --------------------------------------------------------------- #
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.name = myRespName,
                                     resp.var = myResp,
                                     resp.xy = myRespXY,
                                     expl.var = myExpl)
# --------------------------------------------------------------- #
# Create the different validation datasets
# random selection
cv.r <- bm_CrossValidation(bm.format = myBiomodData,
                           strategy = "random",
                           nb.rep = 3,
                           perc = 0.8)
# k-fold selection
cv.k <- bm_CrossValidation(bm.format = myBiomodData,
                           strategy = "kfold",
                           nb.rep = 2,
                           k = 3)
# block selection
cv.b <- bm_CrossValidation(bm.format = myBiomodData,
                           strategy = "block")
# stratified selection (geographic)
cv.s <- bm_CrossValidation(bm.format = myBiomodData,
                           strategy = "strat",
                           k = 2,
                           balance = "presences",
                           strat = "x")
# stratified selection (environmental)
cv.e <- bm_CrossValidation(bm.format = myBiomodData,
                           strategy = "env",
                           k = 2,
                           balance = "presences")
head(cv.r)
apply(cv.r, 2, table)
head(cv.k)
apply(cv.k, 2, table)
head(cv.b)
apply(cv.b, 2, table)
head(cv.s)
apply(cv.s, 2, table)
head(cv.e)
apply(cv.e, 2, table)