Python + CSV: Summarizing similar values from CSV columns

Input file:

$ cat dummy.csv 
OS,A,B,C,D,E 
Ubuntu,0,1,0,1,1 
Windows,0,0,1,1,1 
Mac,1,0,1,0,0 
Ubuntu,1,1,1,1,0 
Windows,0,0,1,1,0 
Mac,1,0,1,1,1 
Ubuntu,0,1,0,1,1 
Ubuntu,0,0,1,1,1 
Ubuntu,1,0,1,0,0 
Ubuntu,1,1,1,1,0 
Mac,0,0,1,1,0 
Mac,1,0,1,1,1 
Windows,1,1,1,1,0 
Ubuntu,0,0,1,1,0 
Windows,1,0,1,1,1 
Mac,0,1,0,1,1 
Windows,0,0,1,1,1 
Mac,1,0,1,0,0 
Windows,1,1,1,1,0 
Mac,0,0,1,1,0

Expected output:

OS,A,B,C,D,E 
Mac,4,1,6,5,3 
Ubuntu,3,4,5,6,3 
Windows,3,2,6,6,3

I produced the output above using an Excel pivot table.

My code:

import csv 
import pprint 
from collections import defaultdict 

d = defaultdict(dict) 

with open('dummy.csv') as csvfile: 
    reader = csv.DictReader(csvfile) 
    for row in reader: 
     d[row['OS']]['A'] += row['A'] 
     d[row['OS']]['B'] += row['B'] 
     d[row['OS']]['C'] += row['C'] 
     d[row['OS']]['D'] += row['D'] 
     d[row['OS']]['E'] += row['E'] 

pprint.pprint(d)

Error:

$ python3 dummy.py 
Traceback (most recent call last): 
    File "dummy.py", line 10, in <module> 
    d[row['OS']]['A'] += row['A'] 
KeyError: 'A'

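My idea is to accumulate the CSV values in a dictionary and print them later. However, I get the error above as soon as I try to add the values.

This seems achievable with the built-in csv module. I thought it would be easier :( Any pointers would be a great help.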

Source

2016-12-28 slayedbylucifer

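There are two problems: the nested dictionaries start out with no keys, so d[row['OS']]['A'] raises a KeyError; and you need to convert the column values to int before adding them.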
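As a rough illustration of those two problems in isolation (not part of the original answer), the sketch below reproduces each one without reading any file. The names first_error, total and total_int are made up for this example.

```python
from collections import defaultdict

d = defaultdict(dict)

# Problem 1: defaultdict(dict) creates the inner dict on demand,
# but that inner dict is empty, so `+=` on a missing inner key fails.
try:
    d['Ubuntu']['A'] += 1
    first_error = None
except KeyError as exc:
    first_error = exc

print(type(first_error).__name__)

# Problem 2: csv hands back strings, so `+=` concatenates instead of adding.
total = '0'
total += '1'
print(total)  # 01

# Converting with int() first gives the arithmetic sum.
total_int = int('0') + int('1')
print(total_int)  # 1
```

Note that the outer lookup d['Ubuntu'] succeeds and leaves an empty dict behind; only the inner lookup fails.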

You can use Counter as the default value for the defaultdict, since a Counter returns 0 for missing keys:

import csv 
from collections import Counter, defaultdict 

d = defaultdict(Counter) 

with open('dummy.csv') as csvfile: 
    reader = csv.DictReader(csvfile) 

    for row in reader: 
     nested = d[row.pop('OS')] 
     for k, v in row.items(): 
      nested[k] += int(v) 

print(*d.items(), sep='\n')

Output:

('Ubuntu', Counter({'D': 6, 'C': 5, 'B': 4, 'E': 3, 'A': 3})) 
('Windows', Counter({'C': 6, 'D': 6, 'E': 3, 'A': 3, 'B': 2})) 
('Mac', Counter({'C': 6, 'D': 5, 'A': 4, 'E': 3, 'B': 1}))

Source

2016-12-28 13:53:08 niemmi

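d is a dictionary, so d[row['OS']] is a valid expression, but d[row['OS']]['A'] expects the dictionary item to be some kind of collection. Since you didn't provide a default value, it will be None instead, which is not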

Source

2016-12-28 13:51:59

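This doesn't answer your question exactly, since the problem really can be solved with csv, but it's worth mentioning that pandas is very well suited to this kind of thing: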

In [1]: import pandas as pd 

In [2]: df = pd.read_csv('dummy.csv') 

In [3]: df.groupby('OS').sum() 
Out[3]: 
         A  B  C  D  E
OS
Mac      4  1  6  5  3
Ubuntu   3  4  5  6  3
Windows  3  2  6  6  3

Source

2016-12-28 13:59:01 fuglede

+1. However, I'd prefer 'csv' for this job, since it avoids installing a new package, which isn't practical on the server I'm working with. – slayedbylucifer

Something like this? You can write the DataFrame to a CSV file to get the desired format.

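import pandas as pd 
# Alternatively, copy the CSV text and use: df0 = pd.read_clipboard(sep=',') 
df0 = pd.read_csv('dummy.csv') 
df = df0.copy() 
# Group the rows by OS and sum every remaining column 
df = df.groupby(by='OS').sum() 
print(df)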

Output:

         A  B  C  D  E
OS
Mac      4  1  6  5  3
Ubuntu   3  4  5  6  3
Windows  3  2  6  6  3

df.to_csv('file01')

file01

OS,A,B,C,D,E 
Mac,4,1,6,5,3 
Ubuntu,3,4,5,6,3 
Windows,3,2,6,6,3

Source

2016-12-28 13:59:32 MYGz

+1. However, I'd prefer 'csv' for this job, since it avoids installing a new package, which isn't practical on the server I'm working with. – slayedbylucifer

@slayedbylucifer Makes sense. But if you have to do a lot of these csv tasks, 'pandas' is your best bet. – MYGz

You get the exception because the first time a given row['OS'] is seen it is not yet in d, so 'A' does not exist in d[row['OS']]. Try the following to fix it:

import csv 
from collections import defaultdict 

d = defaultdict(dict) 

with open('dummy.csv') as csvfile: 
    reader = csv.DictReader(csvfile) 
    for row in reader: 
     d[row['OS']]['A'] = d[row['OS']]['A'] + int(row['A']) if (row['OS'] in d and 'A' in d[row['OS']]) else int(row['A']) 
     d[row['OS']]['B'] = d[row['OS']]['B'] + int(row['B']) if (row['OS'] in d and 'B' in d[row['OS']]) else int(row['B']) 
     d[row['OS']]['C'] = d[row['OS']]['C'] + int(row['C']) if (row['OS'] in d and 'C' in d[row['OS']]) else int(row['C']) 
     d[row['OS']]['D'] = d[row['OS']]['D'] + int(row['D']) if (row['OS'] in d and 'D' in d[row['OS']]) else int(row['D']) 
     d[row['OS']]['E'] = d[row['OS']]['E'] + int(row['E']) if (row['OS'] in d and 'E' in d[row['OS']]) else int(row['E'])

Output:

>>> import pprint 
>>> 
>>> pprint.pprint(dict(d)) 
{'Mac': {'A': 4, 'B': 1, 'C': 6, 'D': 5, 'E': 3}, 
'Ubuntu': {'A': 3, 'B': 4, 'C': 5, 'D': 6, 'E': 3}, 
'Windows': {'A': 3, 'B': 2, 'C': 6, 'D': 6, 'E': 3}}

Source

2016-12-28 14:17:30 ettanany

+1. I never realized the keys were empty in the first place, probably because I'm used to 'autovivification' in Perl. Now I understand what I was missing. – slayedbylucifer
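For readers coming from Perl, something close to autovivification can be approximated in Python by nesting defaultdicts. The following is only a sketch under assumptions: the inline sample string stands in for dummy.csv, and the names sample, totals and result are invented for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical inline sample standing in for dummy.csv
sample = io.StringIO("OS,A,B\nUbuntu,0,1\nMac,1,0\nUbuntu,1,1\n")

# A missing outer key creates a defaultdict(int);
# a missing inner key then starts at 0, so `+=` works right away.
totals = defaultdict(lambda: defaultdict(int))

for row in csv.DictReader(sample):
    os_name = row.pop('OS')
    for column, value in row.items():
        totals[os_name][column] += int(value)

# Convert to plain dicts for printing
result = {os_name: dict(columns) for os_name, columns in totals.items()}
print(result)
```

This only vivifies two levels; deeper nesting would need a recursive factory, which this problem does not require.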

This extends niemmi's solution so that the output is formatted the same way as the OP's example:

import csv 
from collections import Counter, defaultdict 

d = defaultdict(Counter) 
with open('dummy.csv') as csv_file: 
    reader = csv.DictReader(csv_file) 
    field_names = reader.fieldnames 
    for row in reader: 
        # One Counter per OS; missing columns start at 0 
        counter = d[row.pop('OS')] 
        for key, value in row.items(): 
            counter[key] += int(value) 

# Print the header, then one line per OS with columns in sorted order 
print(','.join(field_names)) 
for os_name, counter in sorted(d.items()): 
    print("%s,%s" % (os_name, ','.join(str(v) for k, v in sorted(counter.items()))))

Output:

OS,A,B,C,D,E 
Mac,4,1,6,5,3 
Ubuntu,3,4,5,6,3 
Windows,3,2,6,6,3

Update: fixed the output.
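If the aggregated rows need to go back into a CSV file rather than just being printed, csv.DictWriter can do the joining. This is a sketch rather than part of the answer above: the inline sample replaces dummy.csv, an in-memory io.StringIO replaces the output file, and the names sample, totals and out are assumptions.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical inline sample standing in for dummy.csv
sample = io.StringIO("OS,A,B\nUbuntu,0,1\nMac,1,0\nUbuntu,1,1\n")

totals = defaultdict(Counter)
reader = csv.DictReader(sample)
field_names = reader.fieldnames  # ['OS', 'A', 'B'], kept for the header
for row in reader:
    counter = totals[row.pop('OS')]
    for key, value in row.items():
        counter[key] += int(value)

# Write to an in-memory buffer; a real file opened with newline='' works the same way.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=field_names, lineterminator='\n')
writer.writeheader()
for os_name in sorted(totals):
    # Merge the OS name with its per-column totals into one row dict
    writer.writerow({'OS': os_name, **totals[os_name]})

print(out.getvalue(), end='')
```

Because DictWriter orders the columns by fieldnames, the counters do not need to be sorted before writing.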

Source

2016-12-28 15:00:10

The sorting/joining in the code above is faulty, as the output is wrong. – slayedbylucifer

Thanks. I forgot to sort the counter. –

I assume your input file is called input_file.csv.

You can also process the data using groupby from the itertools module, as in the example below, to get the desired output:

from itertools import groupby 

data = list(k.strip("\n").split(",") for k in open("input_file.csv", 'r')) 

a, b = {}, {} 
for k, v in groupby(data[1:], lambda x : x[0]): 
    try: 
     a[k] += [i[1:] for i in list(v)] 
    except KeyError: 
     a[k] = [i[1:] for i in list(v)] 

for key in a.keys(): 
    for j in range(5): 
     c = 0 
     for i in a[key]: 
      c += int(i[j]) 
     try: 
      b[key] += ',' + str(c) 
     except KeyError: 
      b[key] = str(c)

Output:

print(','.join(data[0])) 
for k in b.keys(): 
    print("{0},{1}".format(k, b[k])) 

>>> OS,A,B,C,D,E 
>>> Ubuntu,3,4,5,6,3 
>>> Windows,3,2,6,6,3 
>>> Mac,4,1,6,5,3
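One thing worth knowing here is that itertools.groupby only groups consecutive items, which is why the code above has to merge repeated keys by hand. A possible variation, sketched below under assumptions (inline sample instead of input_file.csv, invented names sample, rows and summed), sorts the rows by OS first so each key appears exactly once.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical inline sample standing in for input_file.csv
sample = io.StringIO("OS,A,B\nUbuntu,0,1\nMac,1,0\nUbuntu,1,1\n")

reader = csv.reader(sample)
header = next(reader)

# groupby only merges adjacent rows, so sort by the OS column first
rows = sorted(reader, key=itemgetter(0))

summed = {}
for os_name, group in groupby(rows, key=itemgetter(0)):
    # zip(*...) transposes the group's value columns
    columns = zip(*(row[1:] for row in group))
    summed[os_name] = [sum(int(v) for v in column) for column in columns]

print(','.join(header))
for os_name, totals in summed.items():
    print(','.join([os_name] + [str(t) for t in totals]))
```

Sorting also makes the output order deterministic (Mac, Ubuntu, Windows for the original data), which matches the expected output in the question.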

Source

2016-12-28 21:28:41
