This function tunes weighted SVMs for multicategory treatment comparisons by cross-validating over the tuning parameters.

mlearn.wsvm.tune(
  data,
  idx,
  trts,
  max_size,
  delta,
  dist_mat,
  g_func,
  kernel = "rbfdot",
  kpar = "automatic",
  nfolds_inner = 3,
  tuneGrid,
  propensity
)

Arguments

data

a data frame containing each subject's ID (ID), outcome (reward), outcome residual (reward_res), observed treatment (treatment), and health features.

idx

a data frame with two columns, ID and index. ID records the IDs of the subjects in data; index records the column indices of these subjects in dist_mat.
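
As a minimal sketch of how such a lookup table might be assembled, the snippet below uses match() to find each subject's position among the columns of the distance matrix. The ID values and the vector names all_ids and data_ids are hypothetical and only serve the illustration.

```r
# Hypothetical subject IDs, in the order they appear as columns of dist_mat.
all_ids <- c(101, 102, 103, 104, 105, 106)

# Hypothetical IDs of the subjects being analysed (a subset of all_ids).
data_ids <- c(102, 104, 105)

# index = position of each analysed subject among the columns of dist_mat.
idx <- data.frame(ID = data_ids, index = match(data_ids, all_ids))
print(idx)
```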

trts

a vector of treatment names.

max_size

an integer giving the upper limit on the size of every matched set. For example, max_size = 1 means finding the nearest neighbors of subjects.

delta

a scalar, as defined in equation (6) in Section 2.2 of the manuscript, indicating the upper limit of distances between a subject and the subjects in its matched set.
In future versions, this argument will be extended to a vector so that the upper limit can vary across subjects.

dist_mat

a precalculated matrix of distances between subjects. This matrix must include all subjects in data.
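
One possible way to precompute such a matrix with base R is sketched below. The feature names (age, bmi) and the choice of Euclidean distance on standardized features are assumptions made for the example; the source does not prescribe a particular metric.

```r
# Hypothetical toy data: the columns age and bmi are made up for illustration.
set.seed(1)
toy <- data.frame(ID  = 101:106,
                  age = rnorm(6, mean = 50, sd = 10),
                  bmi = rnorm(6, mean = 27, sd = 4))

# Euclidean distance on standardized features (an assumed choice of metric).
dist_mat <- as.matrix(dist(scale(toy[, c("age", "bmi")])))

# Label rows and columns by subject ID so column indices can be traced back.
dimnames(dist_mat) <- list(toy$ID, toy$ID)
```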

g_func

a function that transforms the outcome differences between subjects and the members of their matched sets into the weights used in the SVMs. In equation (7) in Section 2.2 of the manuscript, g(.) = |.| and the weights are |Rj - Ri|.
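
For illustration, the absolute-value choice g(.) = |.| mentioned above can be written as a one-line R function. The outcome values R_i and R_j below are hypothetical.

```r
# g(.) = |.|: the weight is the absolute outcome difference.
g_func <- function(x) abs(x)

# Hypothetical outcomes for a subject i and two subjects j in its matched set.
R_i <- 2.0
R_j <- c(3.5, 1.0)

# Weights |Rj - Ri| for the two matched pairs.
weights <- g_func(R_j - R_i)
```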

kernel

the kernel function used in SVMs. Supported argument values can be found in ksvm and dots. Default: "rbfdot".

kpar

the list of hyper-parameters (kernel parameters). Valid parameters for supported kernels can be found in ksvm and dots. Default: "automatic".

nfolds_inner

the number of folds in the cross-validation. Values greater than or equal to 3 usually yield better results. Default: 3.

tuneGrid

a data frame of tuning parameter(s), with one column per parameter. Usually, the first column is the cost of constraints violation ("C"-constant) in SVMs.
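
A minimal sketch of a one-parameter grid is shown below. The column name C and the particular grid values are assumptions for illustration; the source only states that the first column is usually the cost constant.

```r
# Hypothetical grid: seven candidate cost values on a log2 scale.
tuneGrid <- data.frame(C = 2^(-3:3))
print(tuneGrid)
```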

propensity

a data frame with K columns, where K is the total number of treatments. The kth column contains the propensity scores of assigning the kth treatment to subjects.
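
The sketch below builds a propensity data frame of this shape under a simplifying assumption: treatments are assigned at random, so each subject's propensity equals the marginal assignment frequency. The treatment labels, the toy assignments, and the use of treatment names as column names are all hypothetical; observational data would normally call for a fitted propensity model instead.

```r
# Hypothetical treatment names and observed assignments for six subjects.
trts <- c("A", "B", "C")
treatment <- factor(c("A", "B", "A", "C", "B", "A"), levels = trts)

# Randomized-assignment assumption: propensity = marginal frequency.
p_hat <- as.numeric(prop.table(table(treatment)))

# One row per subject, one column per treatment (K = 3 here).
propensity <- as.data.frame(
  matrix(p_hat, nrow = length(treatment), ncol = length(trts), byrow = TRUE)
)
names(propensity) <- trts
```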

Value

A list with 7 elements:

  • best_fit: the final weighted SVM using the best tuning parameter(s).

  • params: the list of tuning parameter(s) used to train the model. Same as tuneGrid.

  • best_param: the best tuning parameter(s).

  • best_idx: the index of the best tuning parameter(s) in tuneGrid/params.

  • cv_mat: the matrix of the metric values for the cross-validation.

  • cv_est: the cross-validation estimators (row means of cv_mat).

  • foldid_inner: a vector recording the split of folds.
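
To illustrate how cv_est relates to cv_mat, the sketch below takes row means of a made-up matrix whose rows are candidate parameter settings and whose columns are folds. The numbers are invented, and selecting the best setting with which.max() assumes that a larger metric is better, which the source does not state.

```r
# Hypothetical cv_mat: 3 candidate settings (rows) by 3 folds (columns).
cv_mat <- matrix(c(0.60, 0.62, 0.58,
                   0.70, 0.68, 0.72,
                   0.65, 0.66, 0.64),
                 nrow = 3, byrow = TRUE)

# Cross-validation estimators: one row mean per candidate setting.
cv_est <- rowMeans(cv_mat)

# Assumption (not from the source): a larger metric value is better.
best_idx <- which.max(cv_est)
```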