2015-11-07 124 views
0

Python scikit-learn GridSearchCV issues with TFIDF - JoblibValueError?

I have a corpus of words that I'm running TF-IDF on, and I'm then trying to classify it using logistic regression with GridSearch.

But I get a huge error when I run GridSearch. The error looks like this (it's much longer, but I only copied and pasted part of it):

An unexpected error occurred while tokenizing input file /Users/yongcho822/anaconda/lib/python2.7/site-packages/sklearn/base.pyc 
The following traceback may be corrupted or invalid 
The error message is: ('EOF in multi-line statement', (2, 0)) 
An unexpected error occurred while tokenizing input file /Users/yongcho822/anaconda/lib/python2.7/site-packages/sklearn/base.pyc 
The following traceback may be corrupted or invalid 
The error message is: ('EOF in multi-line statement', (2, 0)) 


--------------------------------------------------------------------------- 
JoblibValueError       Traceback (most recent call last) 
<ipython-input-43-7c8b397eb30b> in <module>() 
----> 1 gs_lr_tfidf.fit(X_train, y_train) 

/Users/yongcho822/anaconda/lib/python2.7/site-packages/sklearn/grid_search.pyc in fit(self, X, y) 
    802 
    803   """ 
--> 804   return self._fit(X, y, ParameterGrid(self.param_grid)) 
    805 
    806 

/Users/yongcho822/anaconda/lib/python2.7/site-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable) 
    551          self.fit_params, return_parameters=True, 
    552          error_score=self.error_score) 
--> 553     for parameters in parameter_iterable 
    554     for train, test in cv) 
    555 

/Users/yongcho822/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable) 
    810     # consumption. 
    811     self._iterating = False 
--> 812    self.retrieve() 
    813    # Make sure that we get a last message telling us we are done 
    814    elapsed_time = time.time() - self._start_time 

/Users/yongcho822/anaconda/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in retrieve(self) 
    760       # a working pool as they expect. 
    761       self._initialize_pool() 
--> 762     raise exception 
    763 
    764  def __call__(self, iterable): 

JoblibValueError: JoblibValueError 
___________________________________________________________________________ 
Multiprocessing exception: 
........................................................................... 
/Users/yongcho822/anaconda/lib/python2.7/runpy.py in _run_module_as_main(mod_name='IPython.kernel.__main__', alter_argv=1) 
    157  pkg_name = mod_name.rpartition('.')[0] 
    158  main_globals = sys.modules["__main__"].__dict__ 
    159  if alter_argv: 
    160   sys.argv[0] = fname 
    161  return _run_code(code, main_globals, None, 
--> 162      "__main__", fname, loader, pkg_name) 
     fname = '/Users/yongcho822/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py' 
     loader = <pkgutil.ImpLoader instance> 
     pkg_name = 'IPython.kernel' 
    163 
    164 def run_module(mod_name, init_globals=None, 
    165    run_name=None, alter_sys=False): 
    166  """Execute a module's code without importing it 

........................................................................... 
/Users/yongcho822/anaconda/lib/python2.7/runpy.py in _run_code(code=<code object <module> at 0x1033028b0, file "/Use...ite-packages/IPython/kernel/__main__.py", line 1>, run_globals={'__builtins__': <module '__builtin__' (built-in)>, '__doc__': None, '__file__': '/Users/yongcho822/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': 'IPython.kernel', 'app': <module 'IPython.kernel.zmq.kernelapp' from '/Us.../site-packages/IPython/kernel/zmq/kernelapp.pyc'>}, init_globals=None, mod_name='__main__', mod_fname='/Users/yongcho822/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py', mod_loader=<pkgutil.ImpLoader instance>, pkg_name='IPython.kernel') 
    67   run_globals.update(init_globals) 
    68  run_globals.update(__name__ = mod_name, 
    69      __file__ = mod_fname, 
    70      __loader__ = mod_loader, 
    71      __package__ = pkg_name) 
---> 72  exec code in run_globals 
     code = <code object <module> at 0x1033028b0, file "/Use...ite-packages/IPython/kernel/__main__.py", line 1> 
     run_globals = {'__builtins__': <module '__builtin__' (built-in)>, '__doc__': None, '__file__': '/Users/yongcho822/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py', '__loader__': <pkgutil.ImpLoader instance>, '__name__': '__main__', '__package__': 'IPython.kernel', 'app': <module 'IPython.kernel.zmq.kernelapp' from '/Us.../site-packages/IPython/kernel/zmq/kernelapp.pyc'>} 
    73  return run_globals 
    74 
    75 def _run_module_code(code, init_globals=None, 
    76      mod_name=None, mod_fname=None, 

........................................................................... 
/Users/yongcho822/anaconda/lib/python2.7/site-packages/IPython/kernel/__main__.py in <module>() 
     1 
     2 
----> 3 
     4 if __name__ == '__main__': 
     5  from IPython.kernel.zmq import kernelapp as app 
     6  app.launch_new_instance() 
     7 
     8 
     9 
    10 

What am I doing wrong? Here is what I'm doing:

X_train, X_test, y_train, y_test = train_test_split(train_X_tfidf_DF.values, train_Y, test_size=0.25, random_state=1) 

X_train.shape, type(X_train), y_train.shape, type(y_train) 
>>>((29830, 6648), numpy.ndarray, (29830,), numpy.ndarray) 

X_train[:2] 
>>>array([[ 0., 0., 0., ..., 0., 0., 0.], 
     [ 0., 0., 0., ..., 0., 0., 0.]]) 

y_train[:2] 
>>>array([11, 16]) 

param_grid = [{'clf__penalty': ['l1', 'l2'], 
       'clf__C': [1.0, 10.0, 100.0]}] 

gs_lr_tfidf = GridSearchCV(estimator = LogisticRegression(), 
          param_grid = param_grid, 
          scoring = 'accuracy', 
          cv = 5, verbose = 1, n_jobs = -1) 

gs_lr_tfidf.fit(X_train, y_train) 
(this is where the error pops up) 
+1

It looks like there is a problem with the multiprocessing module. Have you tried setting 'n_jobs = 1'? – Sebastian

+0

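@SebastianRaschka That worked for me, in the sense that it got rid of the confusing 'An unexpected error occurred while tokenizing input file .../sklearn/base.pyc' message and showed the actual error. In my case, the actual problem was an incorrect parameter key. – Poulsbo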

+0

Same here. Somehow a negative value had ended up in there: 'n_jobs = -1' –

Answer

0

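I stumbled on a similar problem. First set n_jobs to 1 and run the code again; with the search running in a single process, you get the real error message instead of the joblib wrapper. Fix that error, then switch back to n_jobs = -1.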
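
To illustrate, here is a minimal sketch of that workflow. It makes several assumptions that are not in the question: a tiny random matrix stands in for the TF-IDF data, the import comes from `sklearn.model_selection` (the current home of `GridSearchCV`; the question uses the older `sklearn.grid_search`), and the grid is trimmed to `C` only to keep it short. It also assumes, as the comment from Poulsbo suggests, that the hidden error is the `clf__` prefix on the parameter keys, which only makes sense when the estimator is a `Pipeline` with a step named `clf`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny synthetic stand-in for the TF-IDF matrix (assumption: not the asker's data).
rng = np.random.RandomState(1)
X = rng.rand(60, 5)
y = np.array([0, 1] * 30)

# Keys prefixed with 'clf__' address a Pipeline step named 'clf'.
# A bare LogisticRegression has no such step, so these keys are invalid.
bad_grid = {'clf__C': [1.0, 10.0]}

# Step 1: run with n_jobs=1 so the underlying exception is raised directly.
real_error = None
try:
    GridSearchCV(LogisticRegression(), bad_grid, cv=3, n_jobs=1).fit(X, y)
except ValueError as exc:
    real_error = exc
print(type(real_error).__name__, real_error)

# Step 2, option A: use plain parameter names with the bare estimator.
gs_plain = GridSearchCV(LogisticRegression(), {'C': [1.0, 10.0]},
                        scoring='accuracy', cv=3, n_jobs=1)
gs_plain.fit(X, y)

# Step 2, option B: keep the prefixed keys and wrap the estimator
# in a Pipeline whose step is actually named 'clf'.
pipe = Pipeline([('clf', LogisticRegression())])
gs_pipe = GridSearchCV(pipe, {'clf__C': [1.0, 10.0]},
                       scoring='accuracy', cv=3, n_jobs=1)
gs_pipe.fit(X, y)

print(gs_plain.best_params_, gs_pipe.best_params_)
```

Once either variant fits without errors, n_jobs can be set back to -1 to use all cores.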