K-Fold Cross-Validation

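When the classes are imbalanced, cross-validation should use stratified sampling, so that every fold keeps roughly the same class proportions as the full dataset.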
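
To see why stratification matters, here is a minimal sketch that counts the positive samples landing in each test fold with and without stratification. The toy labels (90 negatives followed by 10 positives), the placeholder features, and the helper `positives_per_fold` are assumptions made up for this illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels: 90 negatives followed by 10 positives
y = np.array([0] * 90 + [1] * 10)
x = np.zeros((len(y), 1))  # placeholder features; only the labels matter here


def positives_per_fold(splitter):
    """Count the positive samples in each test fold."""
    return [int(y[test_index].sum()) for _, test_index in splitter.split(x, y)]


plain = positives_per_fold(KFold(n_splits=5))
stratified = positives_per_fold(StratifiedKFold(n_splits=5))
print('KFold:          ', plain)       # [0, 0, 0, 0, 10]
print('StratifiedKFold:', stratified)  # [2, 2, 2, 2, 2]
```

Without shuffling, plain KFold puts all positives into the last fold, so four of the five test folds contain no positive sample at all. StratifiedKFold spreads them evenly.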

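import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold

# x and y are assumed to be NumPy arrays holding the features and the labels
accuracy_list = []
# 10-fold stratified cross-validation; random_state only takes effect with shuffle=True
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in skf.split(x, y):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = GradientBoostingClassifier(n_estimators=300, learning_rate=0.1, max_depth=2, min_samples_split=200, min_samples_leaf=6)  # 0.847
    clf.fit(x_train, y_train)  # train the model
    score = clf.score(x_test, y_test)  # accuracy on the held-out fold
    accuracy_list.append(score)
# The mean over the 10 folds is the final evaluation result
print('accuracy: %.3f +/- %.3f' % (np.mean(accuracy_list), np.std(accuracy_list)))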

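The example above uses accuracy as the evaluation metric. Other metrics such as precision, recall, or F1-score work the same way: compute the metric on each fold and take the mean over the 10 folds.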
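
One way to collect several metrics in a single pass is scikit-learn's `cross_validate`, sketched below. The synthetic dataset from `make_classification` and the smaller `n_estimators=50` are assumptions chosen only to keep the example self-contained and quick to run; they are not the data or settings used above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic imbalanced data (roughly 80% / 20%), for illustration only
x, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.8, 0.2], random_state=0)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Each metric is computed on every fold; results['test_<name>'] holds 10 scores
metrics = ['accuracy', 'precision', 'recall', 'f1']
results = cross_validate(clf, x, y, cv=skf, scoring=metrics)

for name in metrics:
    fold_scores = results['test_' + name]
    print('%s: %.3f +/- %.3f' % (name, fold_scores.mean(), fold_scores.std()))
```

On imbalanced data, precision, recall, and F1 are often more informative than accuracy, since a model that always predicts the majority class can still score a high accuracy.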

Splitting into Training and Test Sets

If a single split into a training set and a test set is all you need, use train_test_split, which also supports stratified splitting.

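from sklearn.model_selection import train_test_split

# test_size=0.2: the test set takes 20% of the whole dataset
# random_state=0: seed of the random number generator, so the split is the same on every run
# shuffle=True: shuffle the data before splitting; if shuffle=False, stratify must be None
# stratify=y: split in a stratified way, according to the classes in y
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0, shuffle=True, stratify=y)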
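
A quick way to check the effect of `stratify=y` is to compare the class counts on both sides of the split. The sketch below uses hypothetical toy labels (80 negatives, 20 positives) and placeholder features invented for this check.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 80 negatives and 20 positives
y = np.array([0] * 80 + [1] * 20)
x = np.arange(len(y)).reshape(-1, 1)  # placeholder features

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, shuffle=True, stratify=y)

# Both parts keep the original 20% share of positives
print('train positives: %d of %d' % (y_train.sum(), len(y_train)))  # 16 of 80
print('test positives:  %d of %d' % (y_test.sum(), len(y_test)))    # 4 of 20
```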
