GitHub address: https://github.com/HansRen1024/Image-Pre-Classification/tree/master
Previously I used PSO to tune an SVM, which was far too slow.
This time I was lucky to come across a blog post on hyperparameter tuning that gave me a lot of inspiration.
Address: http://blog.csdn.net/han_xiaoyang/article/details/52663170
It does not go into detail for any particular classifier; that really takes a lot of hands-on experience, which is exactly what I still lack.
Following that post I ran some experiments. Accuracy did improve step by step, but because my feature engineering is weak, the improvement was not obvious.
One more thing: I did not use the GridSearchCV tuning interface the author used; I simply swept the parameter with a for loop.
My code only contains the AdaBoost function; for the other methods, refer to my previous blog post
and adapt it yourself.
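For reference, the GridSearchCV interface mentioned in that blog post can replace the manual for loop. Here is a minimal sketch; the make_classification dataset is a stand-in for my real RGB features, and the grid values are illustrative, not the ones I actually searched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# synthetic data standing in for the pre-extracted image features
X, y = make_classification(n_samples=200, n_features=20, random_state=10)

# grid over the number of boosting rounds (illustrative values)
param_grid = {"n_estimators": [50, 100, 150]}

base = DecisionTreeClassifier(max_depth=2, random_state=10)
clf = AdaBoostClassifier(base, learning_rate=0.5, random_state=10)

# 3-fold cross-validated search over the grid
search = GridSearchCV(clf, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

GridSearchCV also handles cross-validation for you, which a plain train/test split like mine does not.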
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 23 11:24:32 2018
author: hans
"""
from sklearn.externals import joblib
from sklearn.ensemble import AdaBoostClassifier
from sklearn import tree
filename = '_feature_rgb.pkl'
train_list = "train_all.txt"
test_list = "test_all.txt"
def adaboost(n):
    # AdaBoost over a depth-limited decision tree; n is the number of
    # boosting rounds being searched
    clf = AdaBoostClassifier(tree.DecisionTreeClassifier(criterion='gini', max_depth=11,
                                                         min_samples_split=400,
                                                         min_samples_leaf=30,
                                                         max_features=30,
                                                         random_state=10),
                             algorithm="SAMME", n_estimators=n,
                             learning_rate=0.001, random_state=10)
    return clf

def findBestParam():
    # load the pre-extracted features and labels saved with joblib
    X_train = joblib.load(train_list.split('.')[0] + filename)
    y_train = joblib.load(train_list.split('.')[0] + '_label.pkl')
    X_test = joblib.load(test_list.split('.')[0] + filename)
    y_test = joblib.load(test_list.split('.')[0] + '_label.pkl')
    best_test_score = 0
    best_train_score = 0
    best_param = 0
    # sweep n_estimators; adjust start/stop/step to widen the search
    for n in range(1500, 1501, 10):
        clf = adaboost(n)
        clf = clf.fit(X_train, y_train)
        train_score = clf.score(X_train, y_train)
        test_score = clf.score(X_test, y_test)
        print ("--------------------------------\nCurrent train score: %.4f" % train_score)
        print ("Current test score: %.4f" % test_score)
        print ("Current param: %d" % n)
        # keep the parameter with the best test accuracy
        if test_score > best_test_score:
            best_test_score = test_score
            best_train_score = train_score
            best_param = n
    print ("--------------------------------\nBest train score: %.4f" % best_train_score)
    print ("Best test score: %.4f" % best_test_score)
    print ("Best param: %d" % best_param)

if __name__ == '__main__':
    findBestParam()
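One weakness of the loop above is that every candidate n refits the whole ensemble from scratch. scikit-learn's staged_score can evaluate every intermediate ensemble size from a single fit. A sketch on synthetic data (again standing in for my real features):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# synthetic data standing in for the pre-extracted image features
X, y = make_classification(n_samples=300, n_features=20, random_state=10)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10)

# fit once with the largest n_estimators we would consider
clf = AdaBoostClassifier(n_estimators=200, random_state=10)
clf.fit(X_train, y_train)

# staged_score yields test accuracy after each boosting round,
# so one fit covers every candidate value of n_estimators
scores = list(clf.staged_score(X_test, y_test))
best_n = int(np.argmax(scores)) + 1  # rounds are 1-indexed
print(best_n, max(scores))
```

This turns an O(k) sweep over k candidate sizes into a single training run, which matters a lot at n_estimators around 1500.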