Fit a nested cross-validation of weighted kernel support vector machines (SVMs)

This function performs nested cross-validation of weighted SVMs, with the inner layer used to select tuning parameters, for comparing multicategory treatments and estimating individualized treatment rules.

mlearn.wsvm.cv(
  data,
  idx,
  trts,
  max_size,
  delta,
  dist_mat,
  g_func,
  kernel = "rbfdot",
  kpar = "automatic",
  nfolds_outer = 3,
  nfolds_inner = 3,
  tuneGrid,
  propensity,
  foldid_outer = NULL
)

Arguments

data	a data frame containing the ID (`ID`), outcome (`reward_res`), observed treatment (`treatment`), and health feature information of subjects.
idx	a data frame of two columns `ID` and `index`. `ID` records the IDs of subjects in `data`. `index` records the column indices of these subjects in `dist_mat`.
trts	a vector of treatment names.
max_size	an integer giving the upper limit on the size of every matched set. Setting `max_size = 1` corresponds to finding the nearest neighbors of subjects. No default is set in the usage above, so a value must be supplied.
delta	a scalar, as defined in `equation (6)` in `Section 2.2` of the manuscript, giving the upper limit on the distance between a subject and the subjects in its matched set. In future versions, we will extend this argument to a vector so that the upper limit can vary across subjects.
dist_mat	a precalculated matrix of distances between subjects. This matrix must include all subjects in `data`.
g_func	a function that transforms the differences between outcomes of a set of subjects and the subjects in their matched sets to the weights in SVMs. In `equation (7)` in `Section 2.2` of the manuscript, `g(.) = |.|` and the weights are `|Rj - Ri|`.
kernel	the kernel function used in SVMs. Supported argument values can be found in `ksvm` and `dots`. Default: "rbfdot".
kpar	the list of hyper-parameters (kernel parameters). Valid parameters for supported kernels can be found in `ksvm` and `dots`. Default: "automatic".
nfolds_outer	the number of folds in the outer layer of the nested cross-validation. Default: 3.
nfolds_inner	the number of folds in the inner layer (for tuning parameters) of the nested cross-validation. Values greater than or equal to 3 usually yield better results. Default: 3.
tuneGrid	a data frame of tuning parameter(s), with one column per parameter. Usually, the first column is the cost of constraints violation ("C"-constant) in SVMs.
propensity	a data frame with `K` columns, where `K` is the total number of treatments. The `k`th column contains the propensity scores of assigning the `k`th treatment to subjects.
foldid_outer	(optional) a user-specified vector recording the split of folds in the outer layer of the nested cross-validation. This vector should match the number of rows in `data` and the number of treatments in `trts`. Default: NULL.
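
As an illustration of how the arguments above might be assembled, here is a minimal sketch in base R. The toy data, the feature names `x1` and `x2`, the Euclidean distance, the equal propensity scores, the column name `C`, the grid of cost values, and `delta = 0.5` are all assumptions made for this example, not settings taken from the package or the manuscript.

```r
set.seed(1)
n <- 12
trts <- c("A", "B", "C")

# Toy data: ID, outcome, observed treatment, and two hypothetical features
data <- data.frame(
  ID = seq_len(n),
  reward_res = rnorm(n),
  treatment = sample(trts, n, replace = TRUE),
  x1 = rnorm(n),
  x2 = rnorm(n)
)

# Distances between subjects (Euclidean distance is an assumption)
dist_mat <- as.matrix(dist(data[, c("x1", "x2")]))

# Map each subject ID to its column index in dist_mat
idx <- data.frame(ID = data$ID, index = seq_len(n))

# g(.) = |.|, the choice described for g_func
g_func <- function(x) abs(x)

# One column per tuning parameter; here only the cost C (column name assumed)
tuneGrid <- data.frame(C = 2^(-2:2))

# K columns of propensity scores; equal randomization is assumed
propensity <- as.data.frame(
  matrix(1 / length(trts), nrow = n, ncol = length(trts),
         dimnames = list(NULL, trts))
)

# Not run: requires the package that provides mlearn.wsvm.cv
# res <- mlearn.wsvm.cv(
#   data = data, idx = idx, trts = trts,
#   max_size = 1, delta = 0.5,            # illustrative values only
#   dist_mat = dist_mat, g_func = g_func,
#   tuneGrid = tuneGrid, propensity = propensity
# )
```

In practice the distance matrix and propensity scores would come from the study design or from fitted models rather than from constants like these.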

Value

A list with 3 elements:

fit: a list with `nfolds_outer` sublists. The `j`th sublist contains the inner cross-validation result of the weighted SVM that used the `j`th fold of subjects as the test fold.
foldid_outer: a vector recording the split of folds in the outer layer of the nested cross-validation.
prediction: a matrix with 5 columns recording, for each subject, the ID (`ID`), outcome (`reward`), observed treatment (`treatment`), recommended treatment (`vote`), and fold in the cross-validation (`fold`).
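
One way to inspect the `prediction` element is to compare observed and recommended treatments. The sketch below uses a made-up stand-in object with the five documented columns, since producing the real one requires running the function; the values are invented for illustration, and storing it as a data frame is an assumption.

```r
# Hypothetical stand-in for res$prediction (values are made up)
prediction <- data.frame(
  ID = 1:6,
  reward = c(0.5, -1.2, 0.3, 2.1, 0.0, 1.4),
  treatment = c("A", "B", "A", "C", "B", "C"),
  vote = c("A", "A", "B", "C", "B", "A"),
  fold = c(1, 1, 2, 2, 3, 3)
)

# Cross-tabulate observed against recommended treatment
agreement_table <- table(
  observed = prediction$treatment,
  recommended = prediction$vote
)

# Share of subjects whose observed treatment matches the recommendation, by fold
agree_by_fold <- tapply(
  prediction$treatment == prediction$vote,
  prediction$fold,
  mean
)
```

Agreement between observed and recommended treatment is only a descriptive summary; it does not by itself measure the quality of the estimated treatment rule.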