Model Evaluation Methods
Hold-out method
- Randomly split the samples: one part is used as the training set, the other as the validation set (a minimal sketch follows)
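A minimal hold-out sketch in R, assuming a data frame input (the example data used later in this post) and an illustrative 70/30 split ratio:

# Hold-out: randomly split rows into a training set and a validation set.
# The data frame `input` and the 70/30 ratio are assumptions for illustration.
set.seed(2022)
n_row     <- nrow(input)
train_idx <- sample(n_row, size = round(0.7 * n_row))
train_set <- input[train_idx, ]   # ~70% of the rows for training
valid_set <- input[-train_idx, ]  # remaining ~30% for validation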
Cross-validation (k-fold)
- Split the dataset into k folds of equal size
- In each round, use one fold as the test set and the remaining k-1 folds as the training set
- Take the average of the k test results as the final test error
The example data are a gene-expression matrix with GEO samples in rows and genes in columns (the group column holding the class labels used by kknn below is omitted here); its first rows:

             SAMD11    NOC2L   KLHL17   PLEKHN1
GSM5576716 3.013174 6.577959 5.494301 1.7501401
GSM5576717 1.133067 6.177461 4.656629 0.1330668
GSM5576718 3.588339 6.600312 5.488803 3.5883389
GSM5576719 1.267004 5.821593 4.910860 0.2670040
# Split the row indices of `data` into k random folds of (nearly) equal size
cv_kfold <- function(data, k = 10, seed = 2022) {
  n_row <- nrow(data)
  # assign the labels 1..k to the rows, recycling until every row has a label
  n_foldmarkers <- rep(1:k, ceiling(n_row / k))[1:n_row]
  set.seed(seed)
  # shuffle the labels so that fold membership is random
  n_foldmarkers <- sample(n_foldmarkers)
  # element i of the returned list holds the row indices of fold i
  k_fold <- lapply(1:k, function(i) {
    (1:n_row)[n_foldmarkers == i]
  })
  return(k_fold)
}
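Calling cv_kfold(input) on the example data returns a list of k index vectors, one per fold: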
[[1]]
[1] 2 18 31 36 42 48 57 81 82 86 88 102
[[2]]
[1] 10 20 23 29 33 34 49 56 60 62 96 97
....
[[10]]
[1] 1 25 27 32 47 72 76 79 80 110 112
library(kknn)  # weighted k-nearest-neighbour classifier

sp <- Sys.time()
cat(as.character(sp), "\n")
kfolds <- cv_kfold(input)
for (i in 1:length(kfolds)) {
  curr_fold <- kfolds[[i]]
  # rows in the current fold form the test set, the remaining rows the training set
  train_set <- input[-curr_fold, ]
  test_set  <- input[curr_fold, ]
  # best_k and best_kernel are assumed to come from an earlier tuning step
  predicted_train <- kknn(group ~ .,
                          train = train_set,
                          test = train_set,
                          k = best_k,
                          kernel = best_kernel)$fit
  # imetrics() (defined below) records accuracy and error rate
  imetrics("kknn", "Train", predicted_train, train_set$group)
  predicted_test <- kknn(group ~ .,
                         train = train_set,
                         test = test_set,
                         k = best_k,
                         kernel = best_kernel)$fit
  imetrics("kknn", "Test", predicted_test, test_set$group)
}
ep <- Sys.time()
cat(as.character(ep), "\n")
difftime(ep, sp, units = "secs")
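After the loop, the k per-fold results collected in global_performance (filled by the imetrics() helper defined below) can be averaged to give the final estimate, as described in the cross-validation bullets above; a minimal sketch:

# Mean accuracy / error rate per method and data-set type across the k folds
aggregate(cbind(accuracy, error_rate) ~ method + type,
          data = global_performance, FUN = mean)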
Bootstrap method (out-of-bag)
- Sample with replacement to build the training set
- Roughly 36.8% (about 1/e) of the samples are never drawn; they form the test set (the out-of-bag samples)
- Used mostly in ensemble methods such as random forests (a minimal sketch follows)
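A minimal sketch of a single bootstrap draw, again assuming the data frame input from above:

# Bootstrap: draw n rows with replacement for the training set; rows that are
# never drawn (the out-of-bag samples) serve as the test set.
set.seed(2022)
n_row     <- nrow(input)
boot_idx  <- sample(n_row, size = n_row, replace = TRUE)
oob_idx   <- setdiff(1:n_row, boot_idx)
train_set <- input[boot_idx, ]
test_set  <- input[oob_idx, ]
length(oob_idx) / n_row   # close to 1/e (about 0.368) for large n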
Evaluating multiple models
Imbalanced classes: recall and precision; relatively balanced classes: error rate and accuracy.
# Global table collecting one row of metrics per call
global_performance <- NULL

# Compute accuracy and error rate from the confusion table and append them,
# together with the method name and data-set type, to global_performance
imetrics <- function(method, type, predicted, actual) {
  con_table <- table(predicted, actual)               # confusion matrix
  accuracy  <- sum(diag(con_table)) / sum(con_table)  # proportion correctly classified
  cur_one <- data.frame(method = method,
                        type = type,
                        accuracy = accuracy,
                        error_rate = 1 - accuracy)
  # append to the global table in the global environment
  assign("global_performance",
         rbind(get("global_performance", envir = .GlobalEnv), cur_one),
         envir = .GlobalEnv)
}
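The guideline above recommends precision and recall when classes are imbalanced, while imetrics() only records accuracy and error rate. A minimal sketch of the two extra metrics for a binary problem; the positive label "1" is an assumption for illustration:

# Precision and recall for a binary classification problem
precision_recall <- function(predicted, actual, positive = "1") {
  tp <- sum(predicted == positive & actual == positive)  # true positives
  fp <- sum(predicted == positive & actual != positive)  # false positives
  fn <- sum(predicted != positive & actual == positive)  # false negatives
  c(precision = tp / (tp + fp),  # of the predicted positives, how many are correct
    recall    = tp / (tp + fn))  # of the actual positives, how many are found
}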