2016-12-16 86 views

How do I convert numerically encoded categorical data to a sparse tensor in TensorFlow?

My dataset has the following format:

8,2,1,1,1,0,3,2,6,2,2,2,2 
8,2,1,2,0,0,15,2,1,2,2,2,1 
5,5,4,4,0,0,6,1,6,2,2,1,2 
8,2,1,3,0,0,2,2,6,2,2,2,2 
8,2,1,2,0,0,3,2,1,2,2,2,1 
8,2,1,4,0,1,3,2,1,2,2,2,1 
8,2,1,2,0,0,3,2,1,2,2,2,1 
8,2,1,3,0,0,2,2,6,2,2,2,2 
8,2,1,12,0,0,5,2,2,2,2,2,1 
3,1,1,2,0,0,3,2,1,2,2,2,1 

It consists entirely of categorical data, with each feature encoded as a number. I tried the following code:

monthly_income = tf.contrib.layers.sparse_column_with_keys("monthly_income", keys=['1','2','3','4','5','6'])
# Other columns are also declared in the same way

m = tf.contrib.learn.LinearClassifier(feature_columns=[
    caste, religion, differently_abled, nature_of_activity, school, dropout, qualification,
    computer_literate, monthly_income, smoke, drink, tobacco, sex],
    model_dir=model_dir)

But I get the following error:

TypeError: Signature mismatch. Keys must be dtype <dtype: 'string'>, got <dtype: 'int64'>. 

Answer


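I think the problem lies outside the code you have shown. My guess is that the features in the CSV file are being read as integers, whereas keys=['1', '2', ...] declares that you expect them to be strings.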
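To make the mismatch concrete without involving TensorFlow, here is a minimal standard-library sketch. It assumes the CSV columns appear in the same order as the feature_columns list in the question (the question does not confirm this), and read_rows is a hypothetical helper name used only for illustration.

```python
import csv
import io

# First three rows of the sample data from the question.
SAMPLE = """8,2,1,1,1,0,3,2,6,2,2,2,2
8,2,1,2,0,0,15,2,1,2,2,2,1
5,5,4,4,0,0,6,1,6,2,2,1,2
"""

# Assumption: column order matches the feature_columns list in the question.
COLUMNS = [
    "caste", "religion", "differently_abled", "nature_of_activity", "school",
    "dropout", "qualification", "computer_literate", "monthly_income",
    "smoke", "drink", "tobacco", "sex",
]

def read_rows(text, as_strings):
    """Parse CSV text into dicts, keeping values as str or casting to int."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        values = [v.strip() for v in record]
        if not as_strings:
            values = [int(v) for v in values]
        rows.append(dict(zip(COLUMNS, values)))
    return rows

keys = ['1', '2', '3', '4', '5', '6']
string_rows = read_rows(SAMPLE, as_strings=True)
int_rows = read_rows(SAMPLE, as_strings=False)

# A string value can match the string keys; an integer value never does.
print(string_rows[0]["monthly_income"] in keys)  # True
print(int_rows[0]["monthly_income"] in keys)     # False
```

If you want to keep sparse_column_with_keys, the input pipeline would therefore need to deliver string values for that column rather than int64.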

However, since your features are already integers, in this case I suggest you use sparse_column_with_integerized_feature instead:

monthly_income = tf.contrib.layers.sparse_column_with_integerized_feature("monthly_income", bucket_size=7) 
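
A rough way to pick bucket_size for each column is to take the largest value in that column plus one. The plain-Python sketch below does this for a few rows from the question. It rests on my understanding that the integerized column uses the value itself as the ID, reduced modulo bucket_size, and that bucket_size must be greater than 1; treat both as assumptions to check against your TensorFlow version. The names min_bucket_sizes and bucket_id are made up for illustration.

```python
# Four rows taken from the sample data in the question.
ROWS = [
    [8, 2, 1, 1, 1, 0, 3, 2, 6, 2, 2, 2, 2],
    [8, 2, 1, 2, 0, 0, 15, 2, 1, 2, 2, 2, 1],
    [5, 5, 4, 4, 0, 0, 6, 1, 6, 2, 2, 1, 2],
    [8, 2, 1, 12, 0, 0, 5, 2, 2, 2, 2, 2, 1],
]

def min_bucket_sizes(rows):
    """Smallest bucket_size per column that keeps every value distinct."""
    # max + 1 covers the IDs 0..max; the floor of 2 reflects the assumed
    # requirement that bucket_size be greater than 1.
    return [max(2, max(column) + 1) for column in zip(*rows)]

def bucket_id(value, bucket_size):
    """Assumed ID mapping: the value itself, reduced modulo bucket_size."""
    return value % bucket_size

sizes = min_bucket_sizes(ROWS)
print(sizes)  # [9, 6, 5, 13, 2, 2, 16, 3, 7, 3, 3, 3, 3]

# With too few buckets, distinct categories collide: 15 and 1 share ID 1.
print(bucket_id(15, 7), bucket_id(1, 7))    # 1 1
print(bucket_id(15, 16), bucket_id(1, 16))  # 15 1
```

For the ninth column in these rows the result is 7, which agrees with bucket_size=7 above, while a column containing the value 15 would need at least 16 buckets. In practice, compute the sizes over the full dataset rather than a sample.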