2016-09-27 230 views
2

pandas: column not found in DataFrame

I am reading a CSV file, from which I take these columns:

import pandas as pd

encoding = "UTF-8-SIG"
csv_file = "my/path/to/file.csv"
# Maps internal field names to the column headers expected in the CSV.
fields_cols_mapping = {
    'brand_id': 'Brand',
    'custom_dashboard': 'Custom Dashboard LO',
    'custom_dashboard_isfeatured': 'Custom Dashboard LO - Is Featured',
    'description': 'LODescription',
    'is_active': 'TrainingIsActive',
    'lo_id': 'LOID',
    'lo_type_id': 'LOType',
    'timestamp': 'Timestamp',
    'title': 'LOTitle',
    'training_version_id': 'TrainingVersion'
}

dataframe = pd.read_csv(
    csv_file,
    encoding=encoding,
    sep='|',
    usecols=[unicode(v) for v in fields_cols_mapping.values()],
    dtype={k: object for k in fields_cols_mapping.keys()},
)

However, inspecting with ipdb I found that the parser called by read_csv does not convert the column name Custom Dashboard LO – Is Featured:

# debug 
> /../../venvs/myvenv/lib/python2.7/site-packages/pandas/io/parsers.py(1140)__init__() 
1138    col_indices = [] 
1139    for u in self.usecols: 
-> 1140     if isinstance(u, string_types): 
1141      col_indices.append(self.names.index(u)) 
1142     else: 

ipdb> self 
<pandas.io.parsers.CParserWrapper object at 0x10b134710> 
ipdb> self.names 
[u'LOType', u'LOID', u'LOTitle', u'TrainingVersion', u'LODescription', u'TrainingIsActive', u'Custom Dashboard LO', u'Brand',  u'Custom Dashboard LO \u2013 Is Featured', u'Timestamp'] 

Does anyone have a suggestion about what I should do?

Answers

0

Thanks. I changed the dictionary value, but:

In [130]: dataframe = pd.read_csv(
    ...:    lo_csv_path, 
    ...:    encoding=encoding_l, 
    ...:    sep='|', 
    ...:    usecols=[unicode(v) for v in fields_cols_mapping.values()], 
    ...:    dtype={ k: object for k in fields_cols_mapping.keys() }, 
    ...:   ) 
--------------------------------------------------------------------------- 
UnicodeDecodeError      Traceback (most recent call  last) 
<ipython-input-130-670241506984> in <module>() 
     3    encoding=encoding_l, 
     4    sep='|', 
----> 5    usecols=[unicode(v) for v in fields_cols_mapping.values()], 
     6    dtype={ k: object for k in fields_cols_mapping.keys() }, 
     7  ) 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 20: ordinal not in range(128) 
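
This traceback looks like a separate Python 2 issue rather than a pandas one. Once the dictionary value contains an en dash, a plain '...' literal holds UTF-8 bytes, and unicode(v) decodes them with the default ASCII codec. Below is a minimal sketch of the same failure, written so it also runs on Python 3, where bytes.decode("ascii") stands in for Python 2's unicode(v). The variable names are illustrative only, and the UTF-8 source encoding is an assumption about the setup.

```python
# The column name as UTF-8 bytes: what a non-unicode literal holds in Python 2
# when the source is UTF-8 encoded (an assumption about the asker's setup).
raw_name = u"Custom Dashboard LO \u2013 Is Featured".encode("utf-8")

# Decoding with the ASCII codec fails on the first byte of the en dash (0xe2).
try:
    raw_name.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded_name = raw_name.decode("utf-8")
print(decoded_name == u"Custom Dashboard LO \u2013 Is Featured")
```

Under that assumption, writing the dictionary values as unicode literals (for example u'Custom Dashboard LO \u2013 Is Featured'), or calling v.decode('utf-8') instead of unicode(v), should avoid the error.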
1

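Your problem is that the dash in the DataFrame is not the same as the dash in your dictionary. The one in the DataFrame is an en dash (\u2013), while the one in the dictionary is a hyphen (\u2010). They look similar, but they are not the same character, so the strings do not match.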
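
To confirm which characters actually differ, and to make the lookup tolerant of look-alike dashes, something like the following could help. It is only a sketch: DASH_MAP, normalize_dashes, describe_differences and resolve_columns are hypothetical helpers, not pandas functions. The header list is copied from the self.names output in the question, and the wanted name uses the plain `-` exactly as the dictionary is shown there.

```python
import unicodedata

# Dash-like characters mapped to the ASCII hyphen-minus (not an exhaustive list).
DASH_MAP = {ord(c): u"-" for c in u"\u2010\u2011\u2012\u2013\u2014\u2212"}


def normalize_dashes(text):
    """Replace look-alike dashes with a plain '-'."""
    return text.translate(DASH_MAP)


def describe_differences(a, b):
    """Return (position, char name in a, char name in b) for each mismatch."""
    return [
        (i, unicodedata.name(x), unicodedata.name(y))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


def resolve_columns(wanted, header):
    """Map each wanted name to the header name equal to it after normalization."""
    by_normalized = {normalize_dashes(h): h for h in header}
    return [by_normalized[normalize_dashes(w)] for w in wanted]


# Header as reported by self.names in the question.
header = [
    u'LOType', u'LOID', u'LOTitle', u'TrainingVersion', u'LODescription',
    u'TrainingIsActive', u'Custom Dashboard LO', u'Brand',
    u'Custom Dashboard LO \u2013 Is Featured', u'Timestamp',
]
wanted_name = u"Custom Dashboard LO - Is Featured"
actual_name = header[8]

print(describe_differences(wanted_name, actual_name))
# [(20, 'HYPHEN-MINUS', 'EN DASH')]

resolved = resolve_columns([wanted_name, u"Brand"], header)
print(resolved == [actual_name, u"Brand"])
```

The list returned by resolve_columns could then be passed as usecols. The header itself can usually be read up front with pd.read_csv(csv_file, sep='|', encoding=encoding, nrows=0).columns, though that depends on the pandas version in use.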