This internal biomod2 function allows the user to find the threshold to convert continuous values into binary ones leading to the best score for a given evaluation metric.

bm_FindOptimStat(
  metric.eval = "TSS",
  obs,
  fit,
  nb.thresh = 100,
  threshold = NULL,
  boyce.bg.env = NULL,
  mpa.perc = 0.9,
  k = NULL
)

get_optim_value(metric.eval)

bm_CalculateStatBin(misc, metric.eval = "TSS")

bm_CalculateStatAbun(metric.eval, obs, fit, k)

Arguments

metric.eval

a character corresponding to the evaluation metric to be used, must be either ROC, TSS, KAPPA, ACCURACY, BIAS, POD, FAR, POFD, SR, CSI, ETS, OR, ORSS, BOYCE, MPA (binary data), RMSE, MAE, MSE, Rsquared, Rsquared_aj, Max_error (abundance / count / relative data), Accuracy, Recall, Precision, F1 (ordinal data)

obs

a vector of observed values (binary, 0 or 1)

fit

a vector of fitted values (continuous)

nb.thresh

an integer corresponding to the number of thresholds to be tested over the range of fitted values

threshold

(optional, default NULL)
A numeric corresponding to the threshold used to convert the given data

boyce.bg.env

(optional, default NULL)
A matrix, data.frame, SpatVector or SpatRaster object containing values of environmental variables (in columns or layers) extracted from the background (if presences are to be compared to background instead of absences or pseudo-absences selected for modeling)
Note that old format from raster and sp are still supported such as RasterStack and SpatialPointsDataFrame objects.

mpa.perc

a numeric between 0 and 1 corresponding to the percentage of correctly classified presences for Minimal Predicted Area (see ecospat.mpa() in ecospat)

k

an integer corresponding to the number of environmental variables used in the model

misc

a matrix corresponding to a contingency table

Value

A 1 row x 5 columns data.frame containing :

  • metric.eval : the chosen evaluation metric

  • cutoff : the associated cut-off used to transform the continuous values into binary

  • sensitivity : the sensibility obtained on fitted values with this threshold

  • specificity : the specificity obtained on fitted values with this threshold

  • best.stat : the best score obtained for the chosen evaluation metric

Details

Please refer to CAWRC website ("Methods for dichotomous forecasts") to get detailed description (simple/complex metrics).
Optimal value of each method can be obtained with the get_optim_value function.

simple

  • POD : Probability of detection (hit rate)

  • FAR : False alarm ratio

  • POFD : Probability of false detection (fall-out)

  • SR : Success ratio

  • ACCURACY : Accuracy (fraction correct)

  • BIAS : Bias score (frequency bias)

complex

  • ROC : Relative operating characteristic

  • TSS : True skill statistic (Hanssen and Kuipers discriminant, Peirce's skill score)

  • KAPPA : Cohen's Kappa (Heidke skill score)

  • OR : Odds Ratio

  • ORSS : Odds ratio skill score (Yule's Q)

  • CSI : Critical success index (threat score)

  • ETS : Equitable threat score (Gilbert skill score)

presence-only

  • BOYCE : Boyce index

  • MPA : Minimal predicted area (cutoff optimizing MPA to predict 90% of presences)

abundance / count / relative data

  • RMSE : Root Mean Square Error

  • MSE : Mean Square Error

  • MAE : Mean Absolute Error

  • Rsquared : R squared

  • Rsquared_aj : R squared adjusted

  • Max_error : Maximum error

ordinal data

  • Accuracy : Accuracy

  • Recall : Macro average Recall

  • Precision : Macro average Precision

  • F1 : Macro F1 score

Note that if a value is given to threshold, no optimization will be done, and only the score for this threshold will be returned.

The Boyce index returns NA values for SRE models because it can not be calculated with binary predictions.
This is also the reason why some NA values might appear for GLM models if they did not converge.

Note

In order to break dependency loop between packages biomod2 and ecospat, code of ecospat.boyce() and ecospat.mpa() in ecospat) functions have been copied within this file from version 3.2.2 (august 2022).

References

  • Engler, R., Guisan, A., and Rechsteiner L. 2004. An improved approach for predicting the distribution of rare and endangered species from occurrence and pseudo-absence data. Journal of Applied Ecology, 41(2), 263-274.

  • Hirzel, A. H., Le Lay, G., Helfer, V., Randin, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling, 199(2), 142-152.

Author

Damien Georges

Examples

## Generate a binary vector
vec.a <- sample(c(0, 1), 100, replace = TRUE)

## Generate a 0-1000 vector (random drawing)
vec.b <- runif(100, min = 0, max = 1000)

## Generate a 0-1000 vector (biased drawing)
BiasedDrawing <- function(x, m1 = 300, sd1 = 200, m2 = 700, sd2 = 200) {
  return(ifelse(x < 0.5, rnorm(1, m1, sd1), rnorm(1, m2, sd2)))
}
vec.c <- sapply(vec.a, BiasedDrawing)
vec.c[which(vec.c < 0)] <- 0
vec.c[which(vec.c > 1000)] <- 1000

## Find optimal threshold for a specific evaluation metric
bm_FindOptimStat(metric.eval = 'TSS', fit = vec.b, obs = vec.a)
bm_FindOptimStat(metric.eval = 'TSS', fit = vec.c, obs = vec.a, nb.thresh = 100)
bm_FindOptimStat(metric.eval = 'TSS', fit = vec.c, obs = vec.a, threshold = 280)