rfcv2 creates a random forest model specification that exposes both mtry and ntree as tuning parameters for cross-validation. It extends the random forest models currently supported by the train function of the caret package, all of which tune mtry only.

rfcv2(type)

Arguments

type

the type of prediction problem. One of "Regression" or "Classification".

Value

A custom model specification to be used as the method argument of the train function in the caret package.
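
For orientation, a method that tunes both parameters can be written against caret's custom-model interface (a list with parameters, grid, fit, and predict elements). The sketch below shows the general shape such a specification might take; it is illustrative only, not the package's actual source, and rf_method is a hypothetical name.

rf_method = list(
  library = "randomForest",
  type = c("Classification", "Regression"),
  # two tuning parameters instead of caret's usual mtry-only setup
  parameters = data.frame(
    parameter = c("mtry", "ntree"),
    class = c("numeric", "numeric"),
    label = c("#Randomly Selected Predictors", "#Trees")),
  # default grid; typically overridden via tuneGrid as in the examples below
  grid = function(x, y, len = NULL, search = "grid")
    expand.grid(mtry = seq(1, ncol(x)), ntree = seq(100, 500, 100)),
  # forward both tuning parameters to randomForest()
  fit = function(x, y, wts, param, lev, last, weights, classProbs, ...)
    randomForest::randomForest(x, y, mtry = param$mtry, ntree = param$ntree, ...),
  predict = function(modelFit, newdata, submodels = NULL)
    predict(modelFit, newdata),
  prob = function(modelFit, newdata, submodels = NULL)
    predict(modelFit, newdata, type = "prob"),
  sort = function(x) x[order(x$mtry, x$ntree), ],
  levels = function(x) x$classes)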

Examples

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(randomForest)
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
#> 
#> Attaching package: 'randomForest'
#> The following object is masked from 'package:ggplot2':
#> 
#>     margin
library(mlbench)

#######################################
## Classification Example

data(iris)
set.seed(0)
rf_class_fit = train(Species ~ .,
                     data = iris,
                     method = rfcv2("Classification"),
                     tuneGrid = expand.grid(
                       .mtry = seq(1, ncol(iris) - 1, 1),
                       .ntree = seq(100, 500, 100)),
                     trControl = trainControl(method = "cv"))
print(rf_class_fit)
#> 150 samples
#>   4 predictor
#>   3 classes: 'setosa', 'versicolor', 'virginica'
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 135, 135, 135, 135, 135, 135, ...
#> Resampling results across tuning parameters:
#> 
#>   mtry  ntree  Accuracy   Kappa
#>   1     100    0.9466667  0.92
#>   1     200    0.9466667  0.92
#>   1     300    0.9466667  0.92
#>   1     400    0.9400000  0.91
#>   1     500    0.9466667  0.92
#>   2     100    0.9533333  0.93
#>   2     200    0.9466667  0.92
#>   2     300    0.9533333  0.93
#>   2     400    0.9533333  0.93
#>   2     500    0.9533333  0.93
#>   3     100    0.9533333  0.93
#>   3     200    0.9533333  0.93
#>   3     300    0.9533333  0.93
#>   3     400    0.9533333  0.93
#>   3     500    0.9466667  0.92
#>   4     100    0.9533333  0.93
#>   4     200    0.9533333  0.93
#>   4     300    0.9533333  0.93
#>   4     400    0.9466667  0.92
#>   4     500    0.9466667  0.92
#> 
#> Accuracy was used to select the optimal model using the largest value.
#> The final values used for the model were mtry = 2 and ntree = 100.
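
The returned object is an ordinary caret train object, so the usual caret generics apply. For instance, predictions and class probabilities come from the standard predict method; head(iris) is used here purely for illustration:

## predicted classes and class probabilities from the final model
predict(rf_class_fit, newdata = head(iris))
predict(rf_class_fit, newdata = head(iris), type = "prob")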
#######################################
## Regression Example

data(BostonHousing)
set.seed(0)
rf_reg_fit = train(medv ~ .,
                   data = BostonHousing,
                   method = rfcv2("Regression"),
                   tuneGrid = expand.grid(
                     .mtry = seq(1, sqrt(ncol(BostonHousing) - 1), 1),
                     .ntree = seq(100, 500, 100)),
                   trControl = trainControl(method = "cv"))
print(rf_reg_fit)
#> 506 samples
#>  13 predictor
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 454, 455, 457, 454, 456, 455, ...
#> Resampling results across tuning parameters:
#> 
#>   mtry  ntree  RMSE      Rsquared   MAE
#>   1     100    4.307455  0.8231317  2.927802
#>   1     200    4.297767  0.8260105  2.897532
#>   1     300    4.321638  0.8229013  2.913916
#>   1     400    4.328720  0.8235357  2.914079
#>   1     500    4.306999  0.8263237  2.908991
#>   2     100    3.489209  0.8741194  2.326531
#>   2     200    3.416644  0.8819662  2.324176
#>   2     300    3.420260  0.8800818  2.305374
#>   2     400    3.428410  0.8805542  2.314938
#>   2     500    3.452766  0.8776414  2.324047
#>   3     100    3.189066  0.8935496  2.173963
#>   3     200    3.223522  0.8902936  2.183458
#>   3     300    3.172203  0.8941432  2.152793
#>   3     400    3.203657  0.8908199  2.180217
#>   3     500    3.208938  0.8911333  2.177033
#> 
#> RMSE was used to select the optimal model using the smallest value.
#> The final values used for the model were mtry = 3 and ntree = 300.
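
Because two parameters were tuned, the resampling profile can be inspected with caret's standard plot method for train objects, which would typically draw RMSE against mtry with one curve per ntree value:

## RMSE across the mtry/ntree grid
plot(rf_reg_fit)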