LinearSVC和SVC（kernel =「linear」）有什麼區別？

我發現sklearn.svm.LinearSVC和sklearn.svm.SVC(kernel='linear')，他們看起來和我很相似，但我在路透社得到了非常不同的結果。LinearSVC和SVC（kernel =「linear」）有什麼區別？

sklearn.svm.LinearSVC: 81.05% in 28.87s train/ 9.71s test 
sklearn.svm.SVC  : 33.55% in 6536.53s train/2418.62s test

兩者都有一個線性內核。該LinearSVC的耐受性比SVC的一個更高：

LinearSVC(C=1.0, tol=0.0001, max_iter=1000, penalty='l2', loss='squared_hinge', dual=True, multi_class='ovr', fit_intercept=True, intercept_scaling=1) 
SVC  (C=1.0, tol=0.001, max_iter=-1, shrinking=True, probability=False, cache_size=200, decision_function_shape=None)

如何兩種功能方面不同？即使我設置了kernel='linear,tol=0.0001,max_iter=1000 and decision_function_shape ='ovr'the SVC takes much longer than LinearSVC`。爲什麼？

我用sklearn 0.18，兩者都包裹在OneVsRestClassifier。我不確定這是否與multi_class='ovr'/decision_function_shape='ovr'相同。

來源

2017-07-29 Martin Thoma

您可以升級到0.18.2並查看結果是否仍然不同？ – sera

我相信版本不是這種情況。 'sklearn'文檔包含了擬合這些分類器的例子。結果會因模型使用的方法而有所不同。 –

已經有一些討論，可能檢查這些： https://stackoverflow.com/questions/33843981/under-what-parameters-are-svc-and-linearsvc-in-scikit-learn-equivalent 和 https://stackoverflow.com/questions/35076586/linearsvc-vs-svckernel-linear-conflicting-arguments – phev8

真正地，LinearSVC和SVC(kernel='linear')產生不同的結果，即度量分數和決策邊界，因爲他們使用不同的方法。下面的玩具例子證明了這一點：

from sklearn.datasets import load_iris 
from sklearn.svm import LinearSVC, SVC 

X, y = load_iris(return_X_y=True) 

clf_1 = LinearSVC().fit(X, y) # possible to state loss='hinge' 
clf_2 = SVC(kernel='linear').fit(X, y) 

score_1 = clf_1.score(X, y) 
score_2 = clf_2.score(X, y) 

print('LinearSVC score %s' % score_1) 
print('SVC score %s' % score_2) 

-------------------------- 
>>> 0.96666666666666667 
>>> 0.98666666666666669

這種差異的主要原則如下：

默認縮放，LinearSVC最小平方鉸鏈損失，同時SVC最大限度地減少了常規鉸鏈損失。可以手動爲LinearSVC中的loss參數定義「鉸鏈」字符串。
LinearSVC使用一對多（也稱爲One-vs-Rest）多類減少，而SVC使用One-vs-One多類減少。它也被注意到here。此外，對於多類分類問題SVC適合N * (N - 1)/2模型，其中N是類的數量。相反，LinearSVC只適用於N型號。如果分類問題是二元的，那麼只有一個模型適用於這兩種情況。 multi_class和decision_function_shape參數沒有任何共同之處。第二個是聚合器，它將決策函數的結果轉換成(n_features, n_samples)的便利形狀。 multi_class是一種建立解決方案的算法方法。
LinearSVC的基礎估計量爲liblinear，這實際上會懲罰截距。 SVC使用libsvm估計值沒有。 liblinear估計器針對線性（特殊）情況進行了優化，因此在大量數據上收斂速度快於libsvm。這就是爲什麼LinearSVC花費更少的時間解決問題的原因。

實際上，LinearSVC實際上在截距縮放之後並不是線性的，因爲它在註釋部分中陳述過。

來源

2017-07-29 14:41:41

LinearSVC和SVC（kernel =「linear」）有什麼區別？

回答

相關問題