Parameter C: Linear SVM Hyperparameters

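Like logistic regression, the SVM has a parameter C that controls how much misclassification is tolerated. C is the inverse of the regularization strength: a small C tolerates more misclassified training points, while a large C penalizes them more heavily.
It is used in the same way as in logistic regression.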
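
To make the role of C concrete, here is a minimal sketch of the textbook soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses), evaluated for a fixed boundary at several values of C. The toy points, the weights, and the function names `hinge_loss` and `soft_margin_objective` are assumptions made up for illustration. Note also that scikit-learn's LinearSVC uses the squared hinge loss by default, so this shows the idea rather than the library's exact objective.

```python
def hinge_loss(w, b, x, y):
    """Hinge loss of one sample: zero when the sample is beyond the margin."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def soft_margin_objective(w, b, X, y, C):
    """Textbook soft-margin objective: regularizer + C * total hinge loss."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    penalty = sum(hinge_loss(w, b, xi, yi) for xi, yi in zip(X, y))
    return regularizer + C * penalty


# Hypothetical toy data: the third point lies on the wrong side of the boundary.
toy_X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
toy_y = [1, -1, -1]
w, b = (1.0, 0.0), 0.0  # a fixed, hand-picked boundary (not a fitted one)

for C in (0.01, 1.0, 100.0):
    # The same single error costs very little at small C and a lot at large C.
    print(C, soft_margin_objective(w, b, toy_X, toy_y, C))
```

With a small C the misclassified point barely affects the objective, so the optimizer can afford to ignore it. With a large C the same error dominates, which pushes the boundary toward fitting every training point.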

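Compared with logistic regression, the SVM's predicted labels change more sharply as C changes.
Because the SVM algorithm aims for a more generalized decision boundary than logistic regression does, raising or lowering the error tolerance changes which points act as support vectors, so the accuracy fluctuates more than it does for logistic regression.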

In the linear SVM model, the default value of C is 1.0.

The class used is LinearSVC, imported from sklearn.svm.
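
As a small sketch of how C is passed to LinearSVC: the dataset size, the value C=0.01, and the `max_iter` and `random_state` settings below are arbitrary choices for illustration, not values taken from the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Small synthetic dataset, for illustration only.
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=2, random_state=42)

# Without arguments, C keeps its default value of 1.0.
default_model = LinearSVC()

# C is given as a keyword argument; max_iter is raised to help the solver converge.
custom_model = LinearSVC(C=0.01, max_iter=10000, random_state=42)
custom_model.fit(X, y)

print(default_model.C, custom_model.C)
print(custom_model.score(X, y))  # mean accuracy on the data passed in
```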

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
%matplotlib inline

# Generate the data
X, y = make_classification(
    n_samples=1250, n_features=4, n_informative=2, n_redundant=2, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)

# Set the range of C values (here 1e-5, 1e-4, 1e-3, 0.01, 0.1, 1, 10, 100, 1000, 10000)
C_list = [10 ** i for i in range(-5, 5)]

# Prepare empty lists for plotting the graphs
svm_train_accuracy = []
svm_test_accuracy = []
log_train_accuracy = []
log_test_accuracy = []

# Write your code below.

# End of the editable section.

# Prepare the graphs
# semilogx() plots the data with a base-10 logarithmic x-axis

fig = plt.figure(figsize=(16, 6))
plt.subplots_adjust(wspace=0.4, hspace=0.4)
ax = fig.add_subplot(1, 2, 1)
ax.grid(True)
ax.set_title("SVM")
ax.set_xlabel("C")
ax.set_ylabel("accuracy")
ax.semilogx(C_list, svm_train_accuracy, label="accuracy of train_data")
ax.semilogx(C_list, svm_test_accuracy, label="accuracy of test_data")
ax.legend()
ax.plot()

ax = fig.add_subplot(1, 2, 2)
ax.grid(True)
ax.set_title("LogisticRegression")
ax.set_xlabel("C")
ax.set_ylabel("accuracy")
ax.semilogx(C_list, log_train_accuracy, label="accuracy of train_data")
ax.semilogx(C_list, log_test_accuracy, label="accuracy of test_data")
ax.legend()
ax.plot()
plt.show()
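
One possible way to fill in the blank section of the exercise is sketched below. It is not necessarily the intended answer: the `max_iter` and `random_state` arguments are assumptions added to make the runs reproducible and to help the solvers converge, and the plotting part is left out so the block runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Same data and C values as in the exercise.
X, y = make_classification(
    n_samples=1250, n_features=4, n_informative=2, n_redundant=2, random_state=42)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=42)
C_list = [10 ** i for i in range(-5, 5)]

svm_train_accuracy = []
svm_test_accuracy = []
log_train_accuracy = []
log_test_accuracy = []

for C in C_list:
    # Linear SVM: fit on the training split, then record both accuracies.
    # For large C the solver may still emit a ConvergenceWarning.
    svm_model = LinearSVC(C=C, random_state=42, max_iter=10000)
    svm_model.fit(train_X, train_y)
    svm_train_accuracy.append(svm_model.score(train_X, train_y))
    svm_test_accuracy.append(svm_model.score(test_X, test_y))

    # Logistic regression with the same C, for comparison.
    log_model = LogisticRegression(C=C, random_state=42, max_iter=1000)
    log_model.fit(train_X, train_y)
    log_train_accuracy.append(log_model.score(train_X, train_y))
    log_test_accuracy.append(log_model.score(test_X, test_y))

# Print the test accuracy of both models for each C.
for C, svm_acc, log_acc in zip(C_list, svm_test_accuracy, log_test_accuracy):
    print(C, svm_acc, log_acc)
```

After the loop each list holds one accuracy per entry of C_list, which is the shape the semilogx() calls in the plotting code expect.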
