breast {randomForestSRC} | R Documentation |
Recurrence of breast cancer from 198 breast cancer patients, all of which exhibited no evidence of distant metastases at the time of diagnosis. The first 30 features of the data describe characteristics of the cell nuclei present in the digitized image of a fine needle aspirate (FNA) of the breast mass.
The data were obtained from the UCI machine learning repository, see http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Prognostic).
## ------------------------------------------------------------ ## Standard analysis ## ------------------------------------------------------------ data(breast, package = "randomForestSRC") breast <- na.omit(breast) o <- rfsrc(status ~ ., data = breast, nsplit = 10) print(o) ## ------------------------------------------------------------ ## The data is imbalanced so we use balanced random forests ## with undersampling of the majority class ## ## Specifically let n0, n1 be sample sizes for majority, minority ## cases. We sample 2 x n1 cases with majority, minority cases chosen ## with probabilities n1/n, n0/n where n=n0+n1 ## ------------------------------------------------------------ y <- breast$status o <- rfsrc(status ~ ., data = breast, nsplit = 10, case.wt = randomForestSRC:::make.wt(y), sampsize = randomForestSRC:::make.size(y)) print(o)