rfcv2 creates a random forest model specification that exposes both mtry and ntree as tuning parameters for cross-validation. It extends the random forest models currently supported by the train function of the caret package, all of which tune mtry only.

rfcv2(type)

Arguments

type

the type of prediction problem. One of "Regression" or "Classification".

Value

A custom model specification to be used as the method argument of the train function in the caret package.
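
For orientation, a method that tunes both parameters can be written against caret's custom-model interface (a list with parameters, grid, fit, and predict elements). The sketch below shows the general shape such a specification might take; it is illustrative only, not the package's actual source, and rf_method is a hypothetical name.

rf_method = list(
  library = "randomForest",
  type = c("Classification", "Regression"),
  # two tuning parameters instead of caret's usual mtry-only setup
  parameters = data.frame(
    parameter = c("mtry", "ntree"),
    class = c("numeric", "numeric"),
    label = c("#Randomly Selected Predictors", "#Trees")),
  # default grid; typically overridden via tuneGrid as in the examples below
  grid = function(x, y, len = NULL, search = "grid")
    expand.grid(mtry = seq(1, ncol(x)), ntree = seq(100, 500, 100)),
  # forward both tuning parameters to randomForest()
  fit = function(x, y, wts, param, lev, last, weights, classProbs, ...)
    randomForest::randomForest(x, y, mtry = param$mtry, ntree = param$ntree, ...),
  predict = function(modelFit, newdata, submodels = NULL)
    predict(modelFit, newdata),
  prob = function(modelFit, newdata, submodels = NULL)
    predict(modelFit, newdata, type = "prob"),
  sort = function(x) x[order(x$mtry, x$ntree), ],
  levels = function(x) x$classes)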

Examples

library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(randomForest)
#> randomForest 4.6-14
#> Type rfNews() to see new features/changes/bug fixes.
#> 
#> Attaching package: 'randomForest'
#> The following object is masked from 'package:ggplot2':
#> 
#>     margin
library(mlbench)

#######################################
## Classification Example

data(iris)
set.seed(0)
rf_class_fit = train(Species ~ .,
                     data = iris,
                     method = rfcv2("Classification"),
                     tuneGrid = expand.grid(
                       .mtry = seq(1, ncol(iris) - 1, 1),
                       .ntree = seq(100, 500, 100)),
                     trControl = trainControl(method = "cv"))
print(rf_class_fit)
#> 150 samples
#>   4 predictor
#>   3 classes: 'setosa', 'versicolor', 'virginica'
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 135, 135, 135, 135, 135, 135, ...
#> Resampling results across tuning parameters:
#> 
#>   mtry  ntree  Accuracy   Kappa
#>   1     100    0.9466667  0.92
#>   1     200    0.9466667  0.92
#>   1     300    0.9466667  0.92
#>   1     400    0.9400000  0.91
#>   1     500    0.9466667  0.92
#>   2     100    0.9533333  0.93
#>   2     200    0.9466667  0.92
#>   2     300    0.9533333  0.93
#>   2     400    0.9533333  0.93
#>   2     500    0.9533333  0.93
#>   3     100    0.9533333  0.93
#>   3     200    0.9533333  0.93
#>   3     300    0.9533333  0.93
#>   3     400    0.9533333  0.93
#>   3     500    0.9466667  0.92
#>   4     100    0.9533333  0.93
#>   4     200    0.9533333  0.93
#>   4     300    0.9533333  0.93
#>   4     400    0.9466667  0.92
#>   4     500    0.9466667  0.92
#> 
#> Accuracy was used to select the optimal model using the largest value.
#> The final values used for the model were mtry = 2 and ntree = 100.
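
The returned object is an ordinary caret train object, so the usual caret generics apply. For instance, predictions and class probabilities come from the standard predict method; head(iris) is used here purely for illustration:

## predicted classes and class probabilities from the final model
predict(rf_class_fit, newdata = head(iris))
predict(rf_class_fit, newdata = head(iris), type = "prob")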
#######################################
## Regression Example

data(BostonHousing)
set.seed(0)
rf_reg_fit = train(medv ~ .,
                   data = BostonHousing,
                   method = rfcv2("Regression"),
                   tuneGrid = expand.grid(
                     .mtry = seq(1, sqrt(ncol(BostonHousing) - 1), 1),
                     .ntree = seq(100, 500, 100)),
                   trControl = trainControl(method = "cv"))
print(rf_reg_fit)
#> 506 samples
#>  13 predictor
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold)
#> Summary of sample sizes: 454, 455, 457, 454, 456, 455, ...
#> Resampling results across tuning parameters:
#> 
#>   mtry  ntree  RMSE      Rsquared   MAE
#>   1     100    4.307455  0.8231317  2.927802
#>   1     200    4.297767  0.8260105  2.897532
#>   1     300    4.321638  0.8229013  2.913916
#>   1     400    4.328720  0.8235357  2.914079
#>   1     500    4.306999  0.8263237  2.908991
#>   2     100    3.489209  0.8741194  2.326531
#>   2     200    3.416644  0.8819662  2.324176
#>   2     300    3.420260  0.8800818  2.305374
#>   2     400    3.428410  0.8805542  2.314938
#>   2     500    3.452766  0.8776414  2.324047
#>   3     100    3.189066  0.8935496  2.173963
#>   3     200    3.223522  0.8902936  2.183458
#>   3     300    3.172203  0.8941432  2.152793
#>   3     400    3.203657  0.8908199  2.180217
#>   3     500    3.208938  0.8911333  2.177033
#> 
#> RMSE was used to select the optimal model using the smallest value.
#> The final values used for the model were mtry = 3 and ntree = 300.
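
Because two parameters were tuned, the resampling profile can be inspected with caret's standard plot method for train objects, which would typically draw RMSE against mtry with one curve per ntree value:

## RMSE across the mtry/ntree grid
plot(rf_reg_fit)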