The MultiMlearn
package is designed for estimating the individualized treatment rules (ITRs) under the setting of multicategory treatments. It is designed for data from observational studies (such as electronic health records), but it can also be used for randomized controlled trials.
The MultiMlearn
package implements a matched learning (M-learning) model that uses the one-versus-one approach to convert the multicategory treatment comparison to a set of binary classification problems.
Also, the proposed model can incorporate the identified latent subgroup information to alleviate confounding effects among subjects and improve the performance of the M-learning. The coding example for subgroup identification can be found on this GitHub repository, while it is currently not a function of the MultiMlearn
package.
A simulation study illustrating the necessity of latent subgroup information can be found in this supplementary material. The following explainatory example uses a similar simulation scenario to that stated in Section S.1.
Install the development version using the devtools
package:
devtools::install_github("jitonglou/MultiMlearn")
Load the packages:
Following the settings (equation (1), (2), (5), and (6)) in Section S.1.1 and S.1.2 of the supplementary material, we simulate a dataset for N=200 subjects. The covariate vector \(\mathbf{x}_i\) was set to be p=20 dimensional. Moreover, we set the number of treatments K=4, the number of groups J=4.
rseed = 0
set.seed(rseed)
simdata = simulate_data(
N = 200, p = 20, K = 4, J = 4,
propensity_func = pi.true,
main_func = mu.true,
interaction_func = delta.true
)
# ?simulate_data # explanations of function arguments and the columns in the returned data frame
# ?pi.true # true propensity model used in the supplementary material
# ?mu.true # true main effects used in the supplementary material
# ?delta.true # true interaction effects used in the supplementary material
table(treatment=simdata$treatment, group=simdata$cluster) # contingency table of observed treatment vs group
#> group
#> treatment 1 2 3 4
#> treatment_1 11 12 15 12
#> treatment_2 9 10 14 8
#> treatment_3 17 18 11 8
#> treatment_4 13 15 11 16
Next, we will show how to estimate propensity and prognostic scores using the random forest model. We take subjects in group 1 as an example:
## extract features for subjects in group 1
group = 1
data_covariate = simdata %>%
filter(cluster == group) %>%
select(ID, cluster, reward, treatment, starts_with("feature"))
n_feature = ncol(data_covariate) - 4
Train a random forest model to estimate prognostic scores. We selected the tuning parameters (mtry
and ntree
) by 10-fold cross-validation with 3 repeats.
# ?rfcv2 # a customized function for using cross-validation to train random forests
## tuning parameters
metric = "RMSE"
tunegrid = expand.grid(.mtry=seq(1,n_feature/3, 1), .ntree=seq(500,2000,500))
control = trainControl(method="repeatedcv", number=10, repeats=3)
## train model (around 1 minute)
set.seed(rseed)
prognostic_fit = train(
reward~., data=data_covariate %>% select(-ID, -cluster, -treatment),
method=rfcv2("Regression"), metric=metric,
tuneGrid=tunegrid, trControl=control
)
print(prognostic_fit)
#> 50 samples
#> 20 predictors
#>
#> No pre-processing
#> Resampling: Cross-Validated (10 fold, repeated 3 times)
#> Summary of sample sizes: 44, 44, 46, 44, 46, 45, ...
#> Resampling results across tuning parameters:
#>
#> mtry ntree RMSE Rsquared MAE
#> 1 500 1.263909 0.2374684 1.0026379
#> 1 1000 1.265846 0.2393138 1.0031806
#> 1 1500 1.265537 0.2690098 0.9996474
#> 1 2000 1.263070 0.2842502 1.0005525
#> 2 500 1.261495 0.2689129 1.0001293
#> 2 1000 1.259356 0.2693475 0.9959385
#> 2 1500 1.269358 0.2825926 1.0054877
#> 2 2000 1.266920 0.2703527 1.0049415
#> 3 500 1.265764 0.2692819 1.0019152
#> 3 1000 1.263893 0.2760030 1.0050117
#> 3 1500 1.263505 0.2996385 0.9989403
#> 3 2000 1.262860 0.2807268 1.0022529
#> 4 500 1.263106 0.3021249 1.0038525
#> 4 1000 1.261288 0.3009510 1.0047467
#> 4 1500 1.257289 0.2793284 1.0022257
#> 4 2000 1.261132 0.2931092 1.0017244
#> 5 500 1.268665 0.3075777 1.0083188
#> 5 1000 1.264453 0.2865868 1.0082328
#> 5 1500 1.260397 0.3052117 1.0038140
#> 5 2000 1.263131 0.3047422 1.0095299
#> 6 500 1.261791 0.3289496 1.0074004
#> 6 1000 1.264546 0.2682642 1.0127456
#> 6 1500 1.263643 0.2895696 1.0096727
#> 6 2000 1.261639 0.2971942 1.0095640
#>
#> RMSE was used to select the optimal model using the smallest value.
#> The final values used for the model were mtry = 4 and ntree = 1500.
## estimators
sub_prog = predict(
prognostic_fit, newdata=data_covariate %>% select(-ID, -cluster, -treatment)
)
Train a random forest model to estimate propensity scores. We selected the tuning parameters (mtry
and ntree
) by 10-fold cross-validation with 3 repeats.
## tuning parameters
metric = "Kappa"
tunegrid = expand.grid(.mtry=seq(1,sqrt(n_feature), 1), .ntree=seq(500,2000,500))
control = trainControl(method="repeatedcv", number=10, repeats=3)
## train model (around 1 minute)
set.seed(rseed)
propensity_fit = train(
treatment~., data=data_covariate %>% select(-ID, -cluster, -reward),
method=rfcv2("Classification"), metric=metric,
tuneGrid=tunegrid, trControl=control
)
print(propensity_fit)
#> 50 samples
#> 20 predictors
#> 4 classes: 'treatment_1', 'treatment_2', 'treatment_3', 'treatment_4'
#>
#> No pre-processing
#> Resampling: Cross-Validated (10 fold, repeated 3 times)
#> Summary of sample sizes: 46, 46, 45, 46, 45, 44, ...
#> Resampling results across tuning parameters:
#>
#> mtry ntree Accuracy Kappa
#> 1 500 0.2327778 -0.0593179571
#> 1 1000 0.2572222 -0.0343456824
#> 1 1500 0.2683333 -0.0192242028
#> 1 2000 0.2616667 -0.0329631332
#> 2 500 0.2394444 -0.0452919214
#> 2 1000 0.2433333 -0.0381937624
#> 2 1500 0.2627778 -0.0181223106
#> 2 2000 0.2422222 -0.0399217290
#> 3 500 0.2694444 0.0046937196
#> 3 1000 0.2816667 0.0200164949
#> 3 1500 0.2750000 0.0118664062
#> 3 2000 0.2638889 -0.0074735034
#> 4 500 0.2622222 -0.0008315157
#> 4 1000 0.2611111 -0.0084169858
#> 4 1500 0.2761111 0.0126289334
#> 4 2000 0.2694444 0.0028106787
#>
#> Kappa was used to select the optimal model using the largest value.
#> The final values used for the model were mtry = 3 and ntree = 1000.
## estimators
sub_prop = predict(
propensity_fit, newdata=data_covariate %>% select(-ID, -cluster, -reward), type = "prob"
)
We performs a nested cross-validation (on tuning parameters) of weighted support vector machines (SVMs) to fit the M-learning model and estimate ITRs.
Incorporate the estimated propensity and prognostics scores to the feature set, and compute the Euclidean distance between subjects based on the features.
## final data frame
data_feature = data.frame(
data_covariate %>% mutate(reward_res = reward),
sub_prop %>% select(-levels(simdata$treatment)[1]), # remove propensity scores of the first treatment to avoid colinearity of propensity scores
prog = sub_prog
)
## compute the Euclidean distance between features of subjects
dist_feature = data_feature %>%
select(-ID, -cluster, -treatment, -reward, -reward_res) %>%
dist() %>%
as.matrix()
summary(c(dist_feature))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.000 3.498 3.869 3.810 4.235 6.276
## Create an index data frame to select the corresponding row of the distance matrix
idx_data_feature = data.frame(ID=data_feature$ID, index=1:nrow(data_feature))
We split the data to 3 folds by letting nfolds_outer=3
. In each test fold, the tuning parameter in SVMs (cost of constraints violation) is selected from \(\{2^k:k=0,\pm1,\ldots,\pm8\}\) using another layer of 3-fold cross-validation (nfolds_inner=3
).
## User inputs
ksvm.grid = expand.grid(C = 2^(-8:8)) # a data frame of the tuning parameter
nfolds_outer = 3 # number of folds in the outer layer of the nested cross-validation
nfolds_inner = 3 # number of folds in the inner layer of the nested cross-validation
g_func = function(x){abs(x)} # weights on the outcome in support vector machines
max_size = 1 # maximum size of the matched set
## Fit M-learning model (around 1 minute)
set.seed(rseed)
# ?mlearn.wsvm.cv # explanation of arguments
ITR = mlearn.wsvm.cv(
data=data_feature, idx=idx_data_feature,
trts=levels(data_feature$treatment), max_size=max_size,
delta=max(dist_feature), dist_mat=dist_feature, g_func=g_func,
kernel="rbfdot", kpar="automatic",
nfolds_outer=nfolds_outer, nfolds_inner=nfolds_inner, tuneGrid=ksvm.grid, propensity=sub_prop,
foldid_outer=NULL
)
#> [1] "Outer fold 1 has 33 training samples."
#> [1] "Outer fold 1 is done."
#> [1] "Outer fold 2 has 33 training samples."
#> [1] "Outer fold 2 is done."
#> [1] "Outer fold 3 has 34 training samples."
#> [1] "Outer fold 3 is done."
summary(ITR)
#> Length Class Mode
#> fit 3 -none- list
#> foldid_outer 50 -none- numeric
#> prediction 5 data.frame list
head(ITR$prediction) # "vote" represents the recommended treatments for the subjects. "treatment" represents the observed treatments.
#> ID reward treatment vote fold
#> 1 2 2.7538025 treatment_1 treatment_1 2
#> 2 5 0.3607558 treatment_3 treatment_1 1
#> 3 7 3.5833142 treatment_1 treatment_1 3
#> 4 14 0.8700615 treatment_2 treatment_1 1
#> 5 15 0.4640756 treatment_2 treatment_1 1
#> 6 16 2.5734038 treatment_3 treatment_1 2
We extract the recommended treatments of the ITR, and then calculate the empirical value function (EVF) and misclassification rate as equation (3) and equation (4) in the supplementary material.
## create a function to calculate EVF
evf.func = function(reward, pi, x, y){
flag = (x==y)
return(sum((reward/pi)[flag])/sum((1/pi)[flag]))
}
## EVF and misclassification rate of observed and recommended treatments
summary_evf = ITR$prediction %>%
left_join(simdata %>% select(ID, pi_true, contains("opt")), # join the simdata to use true propensity score and optimal treatment information
by = "ID") %>%
group_by(fold) %>%
summarize(
evf_obs = evf.func(reward, pi_true, treatment, treatment),
mis_obs = 1-sum(treatment_opt == treatment)/n(),
evf_rec = evf.func(reward, pi_true, treatment, vote),
mis_rec = 1-sum(treatment_opt == vote)/n(),
n=n() # sample size in each fold
) %>%
as.data.frame
summary(summary_evf)
#> fold evf_obs mis_obs evf_rec mis_rec
#> Min. :1.0 Min. :0.8164 Min. :0.7059 Min. :0.8785 Min. :0
#> 1st Qu.:1.5 1st Qu.:0.9893 1st Qu.:0.7592 1st Qu.:1.4472 1st Qu.:0
#> Median :2.0 Median :1.1622 Median :0.8125 Median :2.0159 Median :0
#> Mean :2.0 Mean :1.1692 Mean :0.7806 Mean :2.1850 Mean :0
#> 3rd Qu.:2.5 3rd Qu.:1.3456 3rd Qu.:0.8180 3rd Qu.:2.8382 3rd Qu.:0
#> Max. :3.0 Max. :1.5289 Max. :0.8235 Max. :3.6604 Max. :0
#> n
#> Min. :16.00
#> 1st Qu.:16.50
#> Median :17.00
#> Mean :16.67
#> 3rd Qu.:17.00
#> Max. :17.00
Compared with the observed treatments, the recommended treatments by M-learning have a greater EVF (2.0159 vs 1.1622) and a lower misclassification rate (0 vs. 0.7806) on average. To reproduce the results in Table S1 and Figure S1 in Section S.1.3 of the supplementary material, you need to run the above scripts for group 2 to 4, and repeat the procedure for 100 times using rseed=1,...,100
, ntree=seq(500,5000,500)
, ksvm.grid = expand.grid(C = 2^(-15:15))
, nfolds_outer = 5
, and nfolds_outer = 5
. Each replication can be done by parallel computing.