2016-12-27 119 views

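I'm having trouble reading a CSV file. How can I remove the extra quotation marks from the CSV fields?

I tried the replace method, but numpy doesn't support it.

The CSV file looks like this.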

"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time" 
"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59" 
"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59" 
"3","null","A0:E6:F8:7B:15:A5","0","18","1.19","0","0","2016-12-14 13:34:59" 
"4","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:35:00" 
"5","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:35:00" 
"6","null","A0:E6:F8:7B:15:A5","0","19","1.38","0","0","2016-12-14 13:35:00" 
"7","null","A0:E6:F8:7B:16:D6","0","18","1.12","0","0","2016-12-14 13:35:01" 
"8","null","A0:E6:F8:7B:16:EA","0","17","1.31","0","0","2016-12-14 13:35:01" 
"9","null","A0:E6:F8:7B:15:A5","0","19","1.38","0","0","2016-12-14 13:35:01" 

But when I read this file with numpy.loadtxt, the quotation marks end up in the result as well.

Source code

import numpy as np 
a = np.loadtxt('db_file.csv', delimiter=',', dtype='str', unpack=True) 
print(a) 

Result

[['"num"' '"1"' '"2"' ..., '"6979"' '"6980"' '"6981"'] 
['"phone"' '"null"' '"null"' ..., '" 821099631345"' '" 821099631345"' 
    '" 821099631345"'] 
['"sensorID"' '"A0:E6:F8:7B:16:EA"' '"A0:E6:F8:7B:16:A9"' ..., 
    '"A0:E6:F8:7B:16:EA"' '"A0:E6:F8:7B:16:A9"' '"A0:E6:F8:7B:16:D6"'] 
..., 
['"gps_lat"' '"0"' '"0"' ..., '37.596332"' '"37.596332"' '"37.596332"'] 
['"gps_lng"' '"0"' '"0"' ..., '"127.031773"' '"127.031773"' '"127.031773"'] 
['"time"' '"2016-12-14 13:34:59"' '"2016-12-14 13:34:59"' ..., 
    '"2016-12-15 00:03:11"' '"2016-12-15 00:03:11"' '"2016-12-15 00:03:12"']] 

I want to remove the " characters.

So this is the list I would like to get.

[['num', '1', '2' ..., '6979', '6980', '6981'] 
['phone', 'null', 'null' ..., '821099631345', ' 821099631345' 
    ' 821099631345'] 
['sensorID', 'A0:E6:F8:7B:16:EA', 'A0:E6:F8:7B:16:A9' ..., 
    'A0:E6:F8:7B:16:EA', 'A0:E6:F8:7B:16:A9', 'A0:E6:F8:7B:16:D6'] 
..., 
['gps_lat', '0', '0' ..., '37.596332' '37.596332' '37.596332'] 
['gps_lng' '0' '0' ..., '127.031773' '127.031773' '127.031773'] 
['time' '2016-12-14 13:34:59' '2016-12-14 13:34:59' ..., 
    '2016-12-15 00:03:11' '2016-12-15 00:03:11' '2016-12-15 00:03:12']] 

What code should I use?

The subject line needs to be corrected. – hpaulj

'pd.read_csv' seems to handle this file without problems. We could also make 'genfromtxt' work, but if you have 'pandas', that will be simpler. – hpaulj

Does this help? http://stackoverflow.com/questions/2664790/reading-csv-files-in-numpy-where-delimiter-is –

Answer

With pandas, I get:

In [1278]: pd.read_csv('stack41338622.txt') 
Out[1278]: 
    num phone   sensorID press temp accel gps_lat gps_lng \ 
0 1 null A0:E6:F8:7B:16:EA  0 17 1.25  0  0 
1 2 null A0:E6:F8:7B:16:A9  0 18 1.19  0  0 
2 3 null A0:E6:F8:7B:15:A5  0 18 1.19  0  0 
3 4 null A0:E6:F8:7B:16:EA  0 17 1.25  0  0 
4 5 null A0:E6:F8:7B:16:A9  0 18 1.19  0  0 
5 6 null A0:E6:F8:7B:15:A5  0 19 1.38  0  0 
6 7 null A0:E6:F8:7B:16:D6  0 18 1.12  0  0 
7 8 null A0:E6:F8:7B:16:EA  0 17 1.31  0  0 
8 9 null A0:E6:F8:7B:15:A5  0 19 1.38  0  0 

        time 
0 2016-12-14 13:34:59 
1 2016-12-14 13:34:59 
2 2016-12-14 13:34:59 
3 2016-12-14 13:35:00 
4 2016-12-14 13:35:00 
5 2016-12-14 13:35:00 
6 2016-12-14 13:35:01 
7 2016-12-14 13:35:01 
8 2016-12-14 13:35:01 
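
If the goal is the transposed array of plain strings shown in the question, something along these lines might work. This is only a sketch: the inline `sample` text is a stand-in for the real file, and the choices of `header=None`, `dtype=str` and `keep_default_na=False` are my assumptions about what the asker wants (header row kept as data, every field left as text).

```python
import io

import pandas as pd

# Hypothetical stand-in for the quoted CSV file from the question.
sample = io.StringIO(
    '"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time"\n'
    '"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59"\n'
    '"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59"\n'
)

# header=None keeps the header row as ordinary data.
# dtype=str and keep_default_na=False leave every field as text;
# with the defaults, pandas may read 'null' as a missing value.
df = pd.read_csv(sample, header=None, dtype=str, keep_default_na=False)

# Transpose so that each row of the array is one column of the file,
# mirroring unpack=True in the question's loadtxt call.
columns = df.to_numpy().T

print(columns[0])  # the 'num' column, header first
print(columns[2])  # the 'sensorID' column
```

pandas removes the surrounding quotes while parsing, so no separate clean-up step is needed here.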

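With converters, as described in Reading CSV files in numpy where delimiter is ",", we can strip the extra quotes. Unfortunately, automatic dtype detection no longer works once converters are used, so we have to specify the dtypes explicitly. Here's a start: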

In [1327]: def foo(astr): 
     ...:  return astr[1:-1] 
In [1328]: convs = dict((col, foo) for col in range(9)) 
In [1329]: dt = ['i','S10','S20','i', 'i','f','i','i','S20'] 
In [1330]: data = np.genfromtxt('stack41338622.txt', dtype=dt, delimiter=',', names=True, converters=convs) 
In [1331]: data 
Out[1331]: 
array([ (1, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:34:59'), 
     (2, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:34:59'), 
     (3, b'null', b'A0:E6:F8:7B:15:A5', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:34:59'), 
     (4, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:35:00'), 
     (5, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.190000057220459, 0, 0, b'2016-12-14 13:35:00'), 
     (6, b'null', b'A0:E6:F8:7B:15:A5', 0, 19, 1.3799999952316284, 0, 0, b'2016-12-14 13:35:00'), 
     (7, b'null', b'A0:E6:F8:7B:16:D6', 0, 18, 1.1200000047683716, 0, 0, b'2016-12-14 13:35:01'), 
     (8, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.309999942779541, 0, 0, b'2016-12-14 13:35:01'), 
     (9, b'null', b'A0:E6:F8:7B:15:A5', 0, 19, 1.3799999952316284, 0, 0, b'2016-12-14 13:35:01')], 
     dtype=[('num', '<i4'), ('phone', 'S10'), ('sensorID', 'S20'), ('press', '<i4'), ('temp', '<i4'), ('accel', '<f4'), ('gps_lat', '<i4'), ('gps_lng', '<i4'), ('time', 'S20')]) 
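
If everything can stay as strings, as in the question's `loadtxt` call, the same quote stripping that `foo` does can also be applied after loading. The sketch below uses `np.char.strip` rather than converters, so it is a different technique from the one above; the inline `sample` is a hypothetical stand-in for the file.

```python
import io

import numpy as np

# Hypothetical stand-in for the quoted CSV file from the question.
sample = io.StringIO(
    '"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time"\n'
    '"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59"\n'
    '"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59"\n'
)

# Same call as in the question: every field is loaded as a string,
# quotes included, and unpack=True gives one row per file column.
raw = np.loadtxt(sample, delimiter=',', dtype=str, unpack=True)

# Element-wise strip of leading and trailing double quotes.
clean = np.char.strip(raw, '"')

print(raw[0])    # quotes still present
print(clean[0])  # quotes removed
```

This keeps every value as text; converting, say, the `accel` row to floats would be a separate step.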

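Considering the amount of time I've spent on this, I'm inclined to go with the other suggestion: strip the extra quotes in a text editor. These quotes aren't needed in a comma-delimited file, and they are more of a nuisance than a help.

In an editor, I simply removed the " characters: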

num,phone,sensorID,press,temp,accel,gps_lat,gps_lng,time 
1,null,A0:E6:F8:7B:16:EA,0,17,1.25,0,0,2016-12-14 13:34:59 
2,null,A0:E6:F8:7B:16:A9,0,18,1.19,0,0,2016-12-14 13:34:59 
3,null,A0:E6:F8:7B:15:A5,0,18,1.19,0,0,2016-12-14 13:34:59 
4,null,A0:E6:F8:7B:16:EA,0,17,1.25,0,0,2016-12-14 13:35:00 
5,null,A0:E6:F8:7B:16:A9,0,18,1.19,0,0,2016-12-14 13:35:00 
... 

In [1336]: data = np.genfromtxt('stack41338622_1.txt', dtype=None, delimiter=',', names=True) 
In [1337]: data 
Out[1337]: 
array([ (1, b'null', b'A0:E6:F8:7B:16:EA', 0, 17, 1.25, 0, 0, b'2016-12-14 13:34:59'), 
     (2, b'null', b'A0:E6:F8:7B:16:A9', 0, 18, 1.19, 0, 0, b'2016-12-14 13:34:59'), 
     (3, b'null', b'A0:E6:F8:7B:15:A5', 0, 18, 1.19, 0, 0, b'2016-12-14 13:34:59'), 
     ..., 
     dtype=[('num', '<i4'), ('phone', 'S4'), ('sensorID', 'S17'), ('press', '<i4'), ('temp', '<i4'), ('accel', '<f8'), ('gps_lat', '<i4'), ('gps_lng', '<i4'), ('time', 'S19')]) 

b'' is how Python 3 displays a bytestring. You won't see these in Py2.
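
The editor step can also be done in Python before handing the text to `genfromtxt`. A rough sketch, assuming no field contains a comma or an embedded quote (true for the sample shown, not checked for the whole file); the inline `quoted` string is a hypothetical stand-in for the file contents, and `encoding='utf-8'` is my addition so that the string fields come back as `str` rather than bytes.

```python
import io

import numpy as np

# Hypothetical stand-in for the file contents; with a real file this would be
# something like: with open('db_file.csv') as f: quoted = f.read()
quoted = (
    '"num","phone","sensorID","press","temp","accel","gps_lat","gps_lng","time"\n'
    '"1","null","A0:E6:F8:7B:16:EA","0","17","1.25","0","0","2016-12-14 13:34:59"\n'
    '"2","null","A0:E6:F8:7B:16:A9","0","18","1.19","0","0","2016-12-14 13:34:59"\n'
)

# Drop every double quote, just as was done by hand in the editor.
cleaned = quoted.replace('"', '')

# With the quotes gone, dtype=None can detect the column types again.
data = np.genfromtxt(
    io.StringIO(cleaned),
    delimiter=',',
    dtype=None,
    names=True,
    encoding='utf-8',
)

print(data['sensorID'])
print(data['accel'])
```

The result is a structured array, so columns are accessed by name, for example `data['time']`.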