Scoring via area under the Receiver Operating Characteristic curve (ROC AUC)
Source: R/score-roc_auc.R
The area under the ROC curve can be used to measure predictor importance.
Format
An object of class filtro::class_score_roc_auc (inherits from filtro::class_score, S7_object) of length 1.
Value
An S7 object. The primary property of interest is in results. This is a data frame of results that is populated by the fit() method and has columns:
name: The name of the score (e.g., roc_auc).
score: The estimates for each predictor.
outcome: The name of the outcome column.
predictor: The names of the predictor inputs.
These data are accessed using object@results (see examples below).
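Because results is an ordinary data frame, standard R tools can rank the predictors. The base-R sketch below rebuilds a data frame with the same four columns, using the scores printed in the cells example on this page rather than calling fit(), and ranks predictors by max(score, 1 - score) as described in the Details section. The object names results, importance, and ranked are illustrative only.

```r
# A data frame with the same columns as `object@results`.
# Scores are copied from the cells example output on this page.
results <- data.frame(
  name      = "roc_auc",
  score     = c(0.502, 0.591, 0.760, 0.777, 0.513),
  outcome   = "class",
  predictor = c("angle_ch_1", "area_ch_1", "avg_inten_ch_1",
                "avg_inten_ch_2", "avg_inten_ch_3"),
  stringsAsFactors = FALSE
)

# Direction-free importance; missing scores (failed computations) stay NA
importance <- pmax(results$score, 1 - results$score)

# Most important first; NA scores are placed last
ranked <- results[order(importance, decreasing = TRUE, na.last = TRUE), ]

head(ranked$predictor, 2)   # "avg_inten_ch_2" "avg_inten_ch_1"
```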
Details
These objects are used when either:
The predictors are numeric and the outcome is a factor/category, or
The predictors are factors and the outcome is numeric.
In either case, a ROC curve (via pROC::roc() or pROC::multiclass.roc()) is created with the proper variable roles, and the area under the ROC curve is computed (via pROC::auc()).
Values higher than 0.5 (i.e., max(roc_auc, 1 - roc_auc) > 0.5) are associated with more important predictors.
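For a two-class outcome, the area under the ROC curve equals the Mann-Whitney probability that a randomly chosen observation from one class has a larger predictor value than one from the other class, with ties counting one half. The base-R sketch below uses that identity, assuming a binary outcome, and then folds the result with max(auc, 1 - auc) as described above. The helpers roc_auc_rank() and roc_auc_folded() are hypothetical and are not the code used by filtro or pROC.

```r
# Hypothetical helper (not part of filtro or pROC): binary ROC AUC via the
# rank-sum (Mann-Whitney) identity. `x` is numeric; `y` must have two levels.
roc_auc_rank <- function(x, y) {
  y <- factor(y)
  stopifnot(nlevels(y) == 2, length(x) == length(y))
  r <- rank(x)                     # average ranks, so ties count as 1/2
  is_event <- y == levels(y)[2]    # treat the second level as the "event"
  n1 <- sum(is_event)
  n0 <- sum(!is_event)
  # Mann-Whitney U for the event group, rescaled to [0, 1]
  (sum(r[is_event]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Fold the AUC so the score does not depend on which direction is "positive"
roc_auc_folded <- function(x, y) {
  auc <- roc_auc_rank(x, y)
  max(auc, 1 - auc)
}

y <- factor(c("a", "a", "b", "b"))
roc_auc_rank(c(1, 2, 3, 4), y)     # 1   (perfect separation)
roc_auc_rank(c(4, 3, 2, 1), y)     # 0   (perfect, but reversed)
roc_auc_folded(c(4, 3, 2, 1), y)   # 1
roc_auc_rank(c(1, 1, 1, 1), y)     # 0.5 (no information)
```

Folding treats a predictor that perfectly separates the classes in the "wrong" direction as just as informative as one that separates them in the expected direction.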
Estimating the scores
In filtro, the score_* objects define a scoring method (e.g., data input requirements, package dependencies, etc.). To compute the scores for a specific data set, the fit() method is used. The main arguments for these functions are:
object: A score class object (e.g., score_cor_pearson).
formula: A standard R formula with a single outcome on the left-hand side and one or more predictors (or .) on the right-hand side. The data are processed via stats::model.frame().
data: A data frame containing the relevant columns defined by the formula.
...: Further arguments passed to or from other methods.
case_weights: A quantitative vector of case weights that is the same length as the number of rows in data. The default of NULL indicates that there are no case weights. NOTE: case weights cannot be used when a multiclass ROC is computed.
Missing values are removed for each predictor/outcome combination being scored.
In cases where the underlying computations fail, the scoring proceeds silently, and a missing value is given for the score.
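The two rules above (pairwise removal of missing values and a missing score on failure) can be sketched in base R as follows. The wrapper score_pair() and the scoring function auc_fn() are hypothetical illustrations, not filtro functions; auc_fn() is a simple rank-based binary AUC that deliberately errors when the outcome does not have exactly two classes.

```r
# Hypothetical sketch (not filtro's code): score one predictor/outcome pair,
# dropping missing values for that pair only and returning NA on failure.
score_pair <- function(x, y, score_fn) {
  keep <- stats::complete.cases(x, y)   # pairwise removal of missing values
  tryCatch(
    score_fn(x[keep], y[keep]),
    error = function(e) NA_real_        # fail silently with a missing score
  )
}

# Rank-based binary AUC; stops with an error unless `y` has exactly 2 classes
auc_fn <- function(x, y) {
  y <- droplevels(factor(y))
  if (nlevels(y) != 2) stop("outcome must have exactly two classes")
  r  <- rank(x)
  ev <- y == levels(y)[2]
  n1 <- sum(ev)
  n0 <- sum(!ev)
  (sum(r[ev]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

y  <- factor(c("a", "a", "b", "b", NA))
x1 <- c(1, 2, 3, NA, 5)   # rows 4 and 5 are dropped for this pair only
score_pair(x1, y, auc_fn)                    # 1

y_one_class <- factor(rep("a", 5))
score_pair(rep(1, 5), y_one_class, auc_fn)   # NA (computation failed)
```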
See also
Other class score metrics: score_aov_pval, score_cor_pearson, score_imp_rf, score_info_gain, score_xtab_pval_chisq
Examples
library(filtro)
library(dplyr)
# ROC AUC where the numeric columns are the predictors and `class` is the
# factor outcome/response
cells_subset <- modeldata::cells |>
  dplyr::select(
    class,
    angle_ch_1,
    area_ch_1,
    avg_inten_ch_1,
    avg_inten_ch_2,
    avg_inten_ch_3
  )
cells_roc_auc_res <- score_roc_auc |>
  fit(class ~ ., data = cells_subset)
cells_roc_auc_res@results
#> # A tibble: 5 × 4
#> name score outcome predictor
#> <chr> <dbl> <chr> <chr>
#> 1 roc_auc 0.502 class angle_ch_1
#> 2 roc_auc 0.591 class area_ch_1
#> 3 roc_auc 0.760 class avg_inten_ch_1
#> 4 roc_auc 0.777 class avg_inten_ch_2
#> 5 roc_auc 0.513 class avg_inten_ch_3
# ----------------------------------------------------------------------------
# ROC AUC where `Sale_Price` is the numeric outcome and the factor columns are
# the predictors (for the ROC curve itself, `Sale_Price` acts as the predictor
# and each factor as the response)
ames_subset <- modeldata::ames |>
  dplyr::select(
    Sale_Price,
    MS_SubClass,
    MS_Zoning,
    Lot_Frontage,
    Lot_Area,
    Street
  )
ames_subset <- ames_subset |>
  dplyr::mutate(Sale_Price = log10(Sale_Price))
ames_roc_auc_res <- score_roc_auc |>
  fit(Sale_Price ~ ., data = ames_subset)
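# `Lot_Frontage` and `Lot_Area` are numeric, like the outcome, so they fall
# outside the supported cases and receive a missing (NA) score
ames_roc_auc_res@results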
#> # A tibble: 5 × 4
#> name score outcome predictor
#> <chr> <dbl> <chr> <chr>
#> 1 roc_auc 0.742 Sale_Price MS_SubClass
#> 2 roc_auc 0.853 Sale_Price MS_Zoning
#> 3 roc_auc NA Sale_Price Lot_Frontage
#> 4 roc_auc NA Sale_Price Lot_Area
#> 5 roc_auc 0.807 Sale_Price Street
# TODO Add multiclass example
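Multiclass outcomes are handled with pROC::multiclass.roc(), which is documented as computing the Hand and Till multiclass AUC: a binary AUC is computed for each pair of classes and the results are averaged. The base-R sketch below illustrates only that pairwise-averaging idea, assuming each pairwise AUC is made direction-free with max(auc, 1 - auc). The helpers auc_binary() and auc_multiclass() and the toy data are hypothetical, and the values are not guaranteed to match pROC::multiclass.roc() in every case.

```r
# Hypothetical helper: binary AUC via the rank-sum (Mann-Whitney) identity
auc_binary <- function(x, y) {
  y  <- factor(y)
  r  <- rank(x)
  ev <- y == levels(y)[2]
  n1 <- sum(ev)
  n0 <- sum(!ev)
  (sum(r[ev]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical helper: average the direction-free AUC over all class pairs
auc_multiclass <- function(x, y) {
  y <- factor(y)
  pairs <- utils::combn(levels(y), 2, simplify = FALSE)
  pair_auc <- vapply(pairs, function(lv) {
    keep <- y %in% lv                                  # rows from this pair only
    auc  <- auc_binary(x[keep], factor(y[keep], levels = lv))
    max(auc, 1 - auc)
  }, numeric(1))
  mean(pair_auc)
}

y <- factor(rep(c("low", "mid", "high"), each = 4),
            levels = c("low", "mid", "high"))
auc_multiclass(c(1:4, 5:8, 9:12), y)   # 1   (classes perfectly ordered)
auc_multiclass(rep(1, 12), y)          # 0.5 (constant predictor)
```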