R/BIOMOD_FormatingData.R
This function gathers all the input data needed to run biomod2 models (XY coordinates, presences/absences, explanatory variables, and the same for evaluation data if available). If no absence data is available, it can select pseudo-absences using different strategies (see Details).
BIOMOD_FormatingData(
  resp.name,
  resp.var,
  resp.xy = NULL,
  expl.var,
  dir.name = ".",
  data.type = "binary",
  eval.resp.var = NULL,
  eval.resp.xy = NULL,
  eval.expl.var = NULL,
  PA.nb.rep = 0,
  PA.nb.absences = 1000,
  PA.strategy = NULL,
  PA.dist.min = 0,
  PA.dist.max = NULL,
  PA.sre.quant = 0.025,
  PA.fact.aggr = NULL,
  PA.user.table = NULL,
  na.rm = TRUE,
  filter.raster = FALSE,
  seed.val = NULL
)
resp.name
a character corresponding to the species name
resp.var
a vector, a SpatVector without associated
data (if presence-only), or a SpatVector object containing
binary data (1 : presence, 0 : absence, NA : indeterminate) or other
data (see data.type and Details) for a single species that will be used to build the
species distribution model(s)
Note that old formats from sp are still supported, such as
 SpatialPoints (if presence-only) or a SpatialPointsDataFrame
 object containing binary data or other data.
resp.xy
(optional, default NULL) 
If resp.var is a vector, a 2-column matrix or data.frame
containing the corresponding X and Y coordinates that will be
used to build the species distribution model(s)
expl.var
a matrix, data.frame, SpatVector
or SpatRaster object containing the explanatory variables
(in columns or layers) that will be used to build the species distribution model(s)
Note that old formats from raster and sp are still supported, such as
RasterStack and SpatialPointsDataFrame objects. 
dir.name
(optional, default .) 
A character corresponding to the modeling folder
data.type
a character corresponding to the response data type to be used;
must be either binary, count, multiclass, ordinal, relative, or
abundance, and match the data contained in resp.var.
If not provided, biomod2 will try to guess.
eval.resp.var
(optional, default NULL) 
A vector or a SpatVector object containing binary data
(1 : presence, 0 : absence) for a single species that will be used to evaluate
the species distribution model(s) with independent data
Note that old formats from sp are still supported, such as
 SpatialPoints (if presence-only) or a SpatialPointsDataFrame
 object containing binary data.
eval.resp.xy
(optional, default NULL) 
If eval.resp.var is a vector, a 2-column matrix or data.frame
containing the corresponding X and Y coordinates that will be
used to evaluate the species distribution model(s) with independent data
eval.expl.var
(optional, default NULL) 
A matrix, data.frame, SpatVector
or SpatRaster object containing the explanatory variables
(in columns or layers) that will be used to evaluate the species distribution model(s) with
independent data.
Note that old formats from raster and sp are still supported, such as
RasterStack and SpatialPointsDataFrame objects. 
PA.nb.rep
(optional, default 0) 
If pseudo-absences are selected, an integer corresponding to the number of sets
(repetitions) of pseudo-absence points that will be drawn
PA.nb.absences
(optional, default 1000) 
If pseudo-absences are selected and PA.strategy = 'random', PA.strategy = 'sre'
or PA.strategy = 'disk', an integer corresponding to the number of pseudo-absence
points that will be selected for each pseudo-absence repetition (true absences included). 
It can also be a vector of the same length as PA.nb.rep containing integer
values corresponding to the different numbers of pseudo-absences to be selected (see Details)
PA.strategy
(optional, default NULL) 
If pseudo-absences are selected, a character defining the strategy that will be used to
select the pseudo-absence points. Must be random, sre, disk or
user.defined (see Details)
PA.dist.min
(optional, default 0) 
If pseudo-absences are selected and PA.strategy = 'disk', a numeric defining the
minimal distance to presence points used to make the disk pseudo-absence selection
(in the same projection system units as resp.xy and expl.var, see Details)
PA.dist.max
(optional, default NULL) 
If pseudo-absences are selected and PA.strategy = 'disk', a numeric defining the
maximal distance to presence points used to make the disk pseudo-absence selection
(in the same projection system units as resp.xy and expl.var, see Details)
PA.sre.quant
(optional, default 0.025) 
If pseudo-absences are selected and PA.strategy = 'sre', a numeric between 0
and 0.5 defining the half-quantile used to make the sre pseudo-absence selection
(see Details)
PA.fact.aggr
(optional, default NULL) 
If pseudo-absences are selected and PA.strategy = 'random' or PA.strategy = 'disk',
an integer defining the aggregation factor used to reduce the spatial resolution of the
environmental variables
PA.user.table
(optional, default NULL) 
If pseudo-absences are selected and PA.strategy = 'user.defined', a matrix or
data.frame with as many rows as resp.var values and as many columns as
PA.nb.rep, containing TRUE or FALSE values defining which points
will be used to build the species distribution model(s) for each repetition (see Details)
na.rm
(optional, default TRUE) 
A logical value defining whether points with one or several missing values for
explanatory variables should be removed from the analysis or not
filter.raster
(optional, default FALSE) 
If expl.var is of raster type, a logical value defining whether resp.var
is to be filtered when several points occur in the same raster cell
seed.val
(optional, default NULL) 
An integer value corresponding to the new seed value to be set
A BIOMOD.formated.data or BIOMOD.formated.data.PA object that can
be used to build species distribution model(s) with the BIOMOD_Modeling
function. print/show, plot and summary functions are available to summarize
the created object.
This function gathers and formats all input data needed to run biomod2 models. It
supports different kinds of input (e.g. matrix, SpatVector, SpatRaster)
and provides different methods to select pseudo-absences if needed. 
Concerning explanatory variables and XY coordinates:
if a SpatRaster, RasterLayer or RasterStack is provided for expl.var or
  eval.expl.var, biomod2 will extract the corresponding values at the XY coordinates provided:
either through resp.xy or eval.resp.xy respectively,
or through resp.var or eval.resp.var, if provided as
    SpatVector or SpatialPointsDataFrame.
Be sure to give the objects containing XY coordinates in the same projection system as the raster objects!
if a data.frame or matrix is provided for expl.var or eval.expl.var,
   biomod2 will simply merge it (cbind) with resp.var without considering XY coordinates. 
Be sure to give explanatory and response values in the same row order!
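Because the matrix or data.frame case is a plain column bind, nothing checks that row i of the predictors describes the same site as row i of the response. The following is a minimal base R sketch of how one might align two tables before passing them on. The site column and the toy values are hypothetical and only serve the illustration; they are not part of biomod2.

```r
# Hypothetical toy data: five sites, stored in a different order in each table
resp <- data.frame(site = c("a", "b", "c", "d", "e"),
                   occ  = c(1, 0, 1, 1, 0))
expl <- data.frame(site = c("c", "a", "e", "b", "d"),
                   temp = c(12.1, 8.4, 15.0, 9.9, 11.3))

# Reorder the predictors so that row i matches row i of the response
expl_aligned <- expl[match(resp$site, expl$site), ]
stopifnot(identical(expl_aligned$site, resp$site))

# A cbind is now safe: each response value sits next to its own predictors
merged <- cbind(resp, temp = expl_aligned$temp)
merged
```

Without the match() step, cbind(resp, temp = expl$temp) would run without error but pair site "a" with the temperature of site "c".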
Concerning pseudo-absence selection (see bm_PseudoAbsences):
Only in the case of binary data!
if both presence and absence data are available: PA.nb.rep = 0 and no
  pseudo-absences will be selected.
if no absence data is available, several pseudo-absence repetitions
  are recommended (to estimate the effect of pseudo-absence selection), as well as a high
  number of pseudo-absence points. 
Be sure not to select more pseudo-absence points than the maximum number of pixels in
  the studied area!
it is possible to create several pseudo-absence repetitions with different
  numbers of points, BUT with the same sampling strategy. PA.nb.absences must then contain
  as many values as the number of sets of pseudo-absences (PA.nb.rep). 
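The two constraints above (one size per repetition, and no more points than available pixels) can be checked before calling the function. This is a base R sketch under stated assumptions: n_background is a hypothetical count of candidate pixels, and the random draw only mimics the idea of several sets of different sizes, not biomod2's own sampling code.

```r
set.seed(123)  # reproducible toy example

PA.nb.rep <- 4
PA.nb.absences <- c(1000, 500, 500, 200)
n_background <- 1200  # hypothetical number of candidate pixels in the study area

# Either a single value, or one value per repetition
stopifnot(length(PA.nb.absences) %in% c(1, PA.nb.rep))
nb_per_rep <- rep_len(PA.nb.absences, PA.nb.rep)

# Requesting more points than there are pixels cannot be satisfied
stopifnot(all(nb_per_rep <= n_background))

# Same strategy (here: random) for every set, but a different size for each
pa_sets <- lapply(nb_per_rep, function(n) sample.int(n_background, n))
lengths(pa_sets)
```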
Response variable
biomod2 models a single species at a time (no multi-species). 
  Hence, resp.var must be a one-dimensional object, either:
a vector, a one-column matrix or data.frame, or a
    SpatVector (without associated data, if presence-only)
a SpatialPoints (if presence-only)
a SpatialPointsDataFrame or SpatVector object
If resp.var is a non-spatial object (vector, matrix or
  data.frame), XY coordinates must be provided through resp.xy. 
  Different data types are available, and require different values:
binary: 1 (presence), 0 (true absence) or NA (no
    information point, which can be used to select pseudo-absences). 
If no true absences are available, pseudo-absence selection must be done.
count: positive integer values
multiclass: factor values
ordinal: ordered factor values
relative: numeric values between 0 and 1
abundance: positive numeric values
Explanatory variables
Factor (categorical) variables are allowed, but some pseudo-absence strategies or models
  (e.g. sre) might then be omitted.
Evaluation data
Although biomod2 provides tools to automatically divide the dataset into calibration and
  validation parts during the modeling process (see the CV.[..] parameters of the
  BIOMOD_Modeling function, or the bm_CrossValidation
  function), it is also possible (and strongly advised) to directly provide two independent
  datasets, one for calibration/validation and one for evaluation.
Pseudo-absence selection (see bm_PseudoAbsences)
Only in the case of binary data! 
  If no true absences are available, pseudo-absences must be selected from the
  background data, meaning data for which there is no information on whether the species of
  interest occurs or not. It corresponds either to the remaining pixels of expl.var
  (if provided as a SpatRaster or RasterStack)
  or to the points identified as NA in resp.var (if expl.var is
  provided as a matrix or data.frame). 
  Several methods are available to do this selection:
random: all points of the initial background are pseudo-absence candidates.
    PA.nb.absences points are drawn randomly, for each of the PA.nb.rep repetitions requested.
sre: pseudo-absences have to be selected in conditions (combination of explanatory
    variables) that differ in a defined proportion (PA.sre.quant) from those of
    presence points. A Surface Range Envelope model is first run over the species of
    interest (see bm_SRE), and pseudo-absences are selected outside this envelope. 
This case is appropriate when the species' whole climatic niche has been sampled;
    otherwise it may lead to over-optimistic model evaluations and predictions!
disk: pseudo-absences are selected within circles around presence points defined by
    the PA.dist.min and PA.dist.max distance values (in the same projection system
    units as resp.xy and expl.var). This makes it possible to select pseudo-absence points that
    are neither too close to presences (avoiding same niche and pseudo-replication) nor too far
    from them (localized sampling strategy).
user.defined: pseudo-absences are defined in advance and given as a data.frame
    through the PA.user.table parameter.
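To make the difference between the random, disk and sre strategies concrete, here is a simplified base R sketch on toy points. It is an illustration of the ideas described above, not biomod2's implementation: the data, the draw() helper, the use of planar Euclidean distances and the single-variable envelope are all assumptions made for the example.

```r
set.seed(42)  # reproducible toy example

# Hypothetical background points: coordinates plus one explanatory variable
bg <- data.frame(x = runif(500, 0, 100), y = runif(500, 0, 100))
bg$temp <- 5 + 0.2 * bg$y + rnorm(500)

# Hypothetical presence points
pres <- data.frame(x = c(20, 50, 70), y = c(30, 60, 40), temp = c(11, 17, 13))

n_pa <- 50  # plays the role of PA.nb.absences

# Hypothetical helper: draw at most n indices from a set of candidates
draw <- function(candidates, n) {
  candidates[sample.int(length(candidates), min(n, length(candidates)))]
}

# random: every background point is a candidate
pa_random <- draw(seq_len(nrow(bg)), n_pa)

# disk: candidates are between dist_min and dist_max from the nearest presence
dist_min <- 5
dist_max <- 35
d <- sqrt(outer(bg$x, pres$x, "-")^2 + outer(bg$y, pres$y, "-")^2)
d_near <- apply(d, 1, min)
pa_disk <- draw(which(d_near >= dist_min & d_near <= dist_max), n_pa)

# sre: candidates fall outside the envelope spanned by the q and 1 - q
# quantiles of the presence values (a single variable here, for brevity)
q <- 0.025
env <- quantile(pres$temp, c(q, 1 - q))
pa_sre <- draw(which(bg$temp < env[1] | bg$temp > env[2]), n_pa)
```

The three strategies differ only in how the candidate set is defined; the final draw is the same in each case.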
bm_PseudoAbsences, BIOMOD_Modeling
Other Main functions:
BIOMOD_EnsembleForecasting(),
BIOMOD_EnsembleModeling(),
BIOMOD_LoadModels(),
BIOMOD_Modeling(),
BIOMOD_Projection(),
BIOMOD_RangeSize()
library(terra)
# Load species occurrences (6 species available)
data(DataSpecies)
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'GuloGulo'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data(bioclim_current)
myExpl <- terra::rast(bioclim_current)
# Crop the environmental variables to a smaller extent
myExtent <- terra::ext(0, 30, 45, 70)
myExpl <- terra::crop(myExpl, myExtent)
# ---------------------------------------------------------------#
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.name = myRespName,
                                     resp.var = myResp,
                                     resp.xy = myRespXY,
                                     expl.var = myExpl)
myBiomodData
summary(myBiomodData)
plot(myBiomodData)
# ---------------------------------------------------------------#
# # Transform true absences into potential pseudo-absences
# myResp.PA <- ifelse(myResp == 1, 1, NA)
# 
# # Format Data with pseudo-absences : random method
# myBiomodData.r <- BIOMOD_FormatingData(resp.name = myRespName,
#                                        resp.var = myResp.PA,
#                                        resp.xy = myRespXY,
#                                        expl.var = myExpl,
#                                        PA.nb.rep = 4,
#                                        PA.nb.absences = 1000,
#                                        PA.strategy = 'random')
# 
# # Format Data with pseudo-absences : disk method
# myBiomodData.d <- BIOMOD_FormatingData(resp.name = myRespName,
#                                        resp.var = myResp.PA,
#                                        resp.xy = myRespXY,
#                                        expl.var = myExpl,
#                                        PA.nb.rep = 4,
#                                        PA.nb.absences = 500,
#                                        PA.strategy = 'disk',
#                                        PA.dist.min = 5,
#                                        PA.dist.max = 35)
# 
# # Format Data with pseudo-absences : SRE method
# myBiomodData.s <- BIOMOD_FormatingData(resp.name = myRespName,
#                                        resp.var = myResp.PA,
#                                        resp.xy = myRespXY,
#                                        expl.var = myExpl,
#                                        PA.nb.rep = 4,
#                                        PA.nb.absences = 1000,
#                                        PA.strategy = 'sre',
#                                        PA.sre.quant = 0.025)
# 
# # Format Data with pseudo-absences : user.defined method
# myPAtable <- data.frame(PA1 = ifelse(myResp == 1, TRUE, FALSE),
#                         PA2 = ifelse(myResp == 1, TRUE, FALSE))
# for (i in 1:ncol(myPAtable)) myPAtable[sample(which(myPAtable[, i] == FALSE), 500), i] = TRUE
# myBiomodData.u <- BIOMOD_FormatingData(resp.name = myRespName,
#                                        resp.var = myResp.PA,
#                                        resp.xy = myRespXY,
#                                        expl.var = myExpl,
#                                        PA.strategy = 'user.defined',
#                                        PA.user.table = myPAtable)
# 
# myBiomodData.r
# myBiomodData.d
# myBiomodData.s
# myBiomodData.u
# plot(myBiomodData.r)
# plot(myBiomodData.d)
# plot(myBiomodData.s)
# plot(myBiomodData.u)
# ---------------------------------------------------------------#
# # Select multiple sets of pseudo-absences
#
# # Transform true absences into potential pseudo-absences
# myResp.PA <- ifelse(myResp == 1, 1, NA)
# 
# # Format Data with pseudo-absences : random method
# myBiomodData.multi <- BIOMOD_FormatingData(resp.name = myRespName,
#                                            resp.var = myResp.PA,
#                                            resp.xy = myRespXY,
#                                            expl.var = myExpl,
#                                            PA.nb.rep = 4,
#                                            PA.nb.absences = c(1000, 500, 500, 200),
#                                            PA.strategy = 'random')
# myBiomodData.multi
# summary(myBiomodData.multi)
# plot(myBiomodData.multi)
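# ---------------------------------------------------------------#
The user.defined example above builds its table by hand. As a complement, here is a self-contained base R sketch that builds such a table on toy data and checks the shape described for PA.user.table (one row per resp.var value, one logical column per repetition). The response vector, the set sizes and the column names are hypothetical.

```r
set.seed(1)  # reproducible toy example

# Hypothetical response: 20 presences followed by 80 points without information
resp <- c(rep(1, 20), rep(NA, 80))
n_rep <- 2   # number of pseudo-absence sets
n_pa <- 30   # pseudo-absences per set

pa_table <- sapply(seq_len(n_rep), function(i) {
  keep <- !is.na(resp) & resp == 1   # presences are kept in every set
  bg <- which(is.na(resp))           # candidate pseudo-absences
  keep[bg[sample.int(length(bg), n_pa)]] <- TRUE
  keep
})
pa_table <- as.data.frame(pa_table)
colnames(pa_table) <- paste0("PA", seq_len(n_rep))

# Shape checks: one row per response value, logical columns only
stopifnot(nrow(pa_table) == length(resp))
stopifnot(all(vapply(pa_table, is.logical, logical(1))))
colSums(pa_table)
```

Each column selects the 20 presences plus 30 background points, so every column sums to 50.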