Hello ! If you are here, you want to model abundance data.

You could try and install our new version of `biomod2`

with
the Abundance branch:

`devtools::install_github("biomodhub/biomod2", ref = "Abundance")`

Keep in mind that it is a development branch: it can change quickly and sometimes fail ! Consequently, this branch is not reproducibility-friendly.

We invite you to report any problems, to ask for enhances or to
discuss about the modeling in the issues or the forum of the
`biomod2`

github.

This vignette will be updated regularly : think to look at it to see
if there are a few modifications.

We will also update the documentation on the branch : you can call the
help `?BIOMOD_FormatingData`

for example.

The documentation on this website will still be the documentation of
`biomod2 V4.2-6-1`

.

Here is presented an example of abundance modeling with
`biomod2`

.

(As we haven’t add example data to `biomod2`

yet, the example
will be made with fake data. Sorry `>{o.o}<`

)

```
library(biomod2)
library(terra)
# Load species occurrences (6 species available)
data("DataSpecies")
head(DataSpecies)
# Select the name of the studied species
myRespName <- 'VulpesVulpes'
# Get corresponding presence/absence data
myResp <- as.numeric(DataSpecies[, myRespName])
# Get corresponding XY coordinates
myRespXY <- DataSpecies[, c('X_WGS84', 'Y_WGS84')]
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data("bioclim_current")
myExpl <- rast(bioclim_current)
```

Creating your `BIOMOD.formated.data`

object is similar
than `biomod2`

with binary data. `biomod2`

will
guess your data type but you can specify it with the argument
`data.type`

.

There are 5 different data types :

Type | Data | Distribution |
---|---|---|

binary | Numeric (or factor) response with only 0 and 1 | binomial |

abundance | Positive numeric response | gaussian |

count | Positive integer response | poisson |

ordinal | Ordered factor response | classification |

relative | Numeric response between 0 and 1 | beta |

Here we will build count data, by transforming our available binary data :

```
# Transform binary data as count data
poissonDistri <- rpois(sum(myResp), 5)
myResp[myResp == 1] <- poissonDistri
```

```
# Format Data with true absences
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
expl.var = myExpl,
resp.xy = myRespXY,
resp.name = myRespName)
myBiomodData
plot(myBiomodData)
#Or
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
expl.var = myExpl,
resp.xy = myRespXY,
resp.name = myRespName,
data.type = "count")
```

As usual, it also possible to add evaluation data. However, no pseudo-absences extraction is possible with abundance data.

The same cross-validation (CV) methods are available and can be
selected with the `BIOMOD_Modeling`

function, which calls the `bm_CrossValidation`

function to do so.

The same proportion of absences of the whole data will be kept for the
different CV datasets (if possible).

A balance will be kept for the different classes in the case of ordinal
data.

```
# # k-fold selection
# cv.k <- bm_CrossValidation(bm.format = myBiomodData,
# strategy = "kfold",
# nb.rep = 2,
# k = 3)
#
# # random selection
# cv.r <- bm_CrossValidation(bm.format = myBiomodData,
# strategy = "random",
# nb.rep = 4,
# perc = 0.8)
# head(cv.k)
# head(cv.r)
# plot(myBiomodData, calib.lines = cv.r)
```

Different sets of modeling options are built corresponding to the
`data.type`

.

You still have the `default`

options and `bigboss`

options. However, lot of work must be done in order to optimize
`bigboss`

options. So for the moment, it’s totally possible
`bigboss`

doesn’t lead to better results than
`default`

options.

The tuning option are not available yet with non binary data.

```
# # bigboss parameters with ordinal datatype
# opt.o <- bm_ModelingOptions(data.type = 'ordinal',
# models = c('RF', 'GLM'),
# strategy = 'bigboss')
#
# # tuned parameters with formated data
# opt.c <- bm_ModelingOptions(data.type = 'count',
# models = c('GAM', 'MARS'),
# strategy = 'default',
# bm.format = myBiomodData)
#
# opt.o
# opt.c
```

The modeling is similar than with binary data. However, not all
models are available. We have :

- `CTA`

, `GAM`

, `GBM`

,
`GLM`

, `MARS`

, `RF`

, and
`XGBOOST`

for abundance, count and relative data

- `CTA`

, `FDA`

, `GAM`

,
`GLM`

, `MARS`

, `RF`

, and
`XGBOOST`

for ordinal data

The metrics are also different obviously. For the moment, we have
implemented : `RMSE`

, `MSE`

, `MAE`

,
`Max_error`

, `Rsquared`

and
`Rsquared_aj`

(see `?BIOMOD_Modeling`

)

For ordinal data, we have `Accuracy`

, `Recall`

,
`Precision`

and `F1`

.

(`Accuracy`

is in lower case to contrast with
`ACCURACY`

for binary data)

```
# Model single models
myBiomodModelOut <- BIOMOD_Modeling(bm.format = myBiomodData,
modeling.id = 'CountExample',
models = c("GAM","MARS","RF"),
CV.strategy = 'random',
CV.nb.rep = 3,
CV.perc = 0.8,
OPT.strategy = 'bigboss',
var.import = 3,
metric.eval = c('RMSE','Rsquared'))
myBiomodModelOut
# Get evaluation scores & variables importance
get_evaluations(myBiomodModelOut)
get_variables_importance(myBiomodModelOut)
# Represent evaluation scores & variables importance
bm_PlotEvalMean(bm.out = myBiomodModelOut)
bm_PlotEvalBoxplot(bm.out = myBiomodModelOut, group.by = c('algo', 'algo'))
bm_PlotEvalBoxplot(bm.out = myBiomodModelOut, group.by = c('algo', 'run'))
bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('expl.var', 'algo', 'algo'))
bm_PlotVarImpBoxplot(bm.out = myBiomodModelOut, group.by = c('expl.var', 'algo', 'run'))
# Represent response curves
bm_PlotResponseCurves(bm.out = myBiomodModelOut,
models.chosen = get_built_models(myBiomodModelOut)[c(1:3)],
fixed.var = 'median')
```

**! Warning !** The selection of single models for the
ensemble modeling is different for the metrics `RMSE`

,
`MSE`

, `MAE`

and `Max_error`

.

For example, with `RMSE`

, `biomod2`

will select
the best model and all the models with a `RMSE`

under the
best value + the threshold you give (here 2).

E.g. if the best model have a `RMSE`

of
`1.850`

, `BIOMOD_EnsembleModeling`

will select all
the models with a `RMSE`

under `1.850 + 2`

.

```
# Model ensemble models
myBiomodEM <- BIOMOD_EnsembleModeling(bm.mod = myBiomodModelOut,
models.chosen = 'all',
em.by = 'all',
em.algo = c('EMmean', 'EMcv', 'EMci', 'EMmedian', 'EMwmean'),
metric.select = c('RMSE','Rsquared'),
metric.select.thresh = c(2, 0.4),
metric.eval = c('RMSE','Rsquared'),
var.import = 3,
EMci.alpha = 0.05,
EMwmean.decay = 'proportional')
myBiomodEM
# Get evaluation scores & variables importance
get_evaluations(myBiomodEM)
get_variables_importance(myBiomodEM)
# Represent evaluation scores & variables importance
bm_PlotEvalMean(bm.out = myBiomodEM, group.by = 'full.name')
bm_PlotEvalBoxplot(bm.out = myBiomodEM, group.by = c('full.name', 'full.name'))
bm_PlotVarImpBoxplot(bm.out = myBiomodEM, group.by = c('expl.var', 'algo', 'merged.by.run'))
# Represent response curves
bm_PlotResponseCurves(bm.out = myBiomodEM,
models.chosen = get_built_models(myBiomodEM)[c(1, 5, 6)],
fixed.var = 'median')
```

The argument `digits`

indicates the number of digits for
the predicted values.

Keep in mind that `integer`

are “lighter” than
`float`

.

For `relative`

data, you can use the same argument
`on_0_1000`

than binary data.

```
# Project single models
myBiomodProj <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
proj.name = 'Current',
new.env = myExpl,
models.chosen = 'all',
build.clamping.mask = TRUE,
digits = 1)
myBiomodProj
plot(myBiomodProj)
```

```
# Project ensemble models (from single projections)
myBiomodEMProj <- BIOMOD_EnsembleForecasting(bm.em = myBiomodEM,
bm.proj = myBiomodProj,
models.chosen = get_built_models(myBiomodEM)[c(1,3:7,9:12)])
myBiomodEMProj
plot(myBiomodEMProj)
```

```
# Load environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
data("bioclim_future")
myExplFuture = rast(bioclim_future)
# Project onto future conditions
myBiomodProjFuture <- BIOMOD_Projection(bm.mod = myBiomodModelOut,
proj.name = 'FutureProj',
new.env = myExplFuture,
models.chosen = 'all')
# Load current and future binary projections
CurrentProj <- get_predictions(myBiomodProj)
FutureProj <- get_predictions(myBiomodProjFuture)
myBiomodRangeSize <- BIOMOD_RangeSize(proj.current = CurrentProj,
proj.future = FutureProj,
thresholds = c(10,30,50))
# Represent main results
gg = bm_PlotRangeSize(bm.range = myBiomodRangeSize,
do.count = TRUE,
do.perc = TRUE,
do.maps = TRUE,
do.mean = FALSE,
do.plot = TRUE,
row.names = c("Species", "Dataset", "Run", "Algo"))
```

**! Remember !** This is fake data !