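Your question is really two different questions:
- Creating a CSV file from a dictionary whose values are containers rather than primitives.
For the first problem, the solution is generally to convert the container types into primitive types. The most common approach is to create a JSON string. For example: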
>>> import json
>>> x = [2, 4, 6, 8, 10]
>>> json_string = json.dumps(x)
>>> json_string
'[2, 4, 6, 8, 10]'
So your data conversion might look like:
import csv
import json

def convert(datadict):
    '''Generator which converts a dictionary of containers into (key, json-string) pairs.
    args:
        datadict(dict): dictionary which needs conversion
    yield:
        tuple: key and string
    '''
    for key, value in datadict.items():
        yield key, json.dumps(value)

def dump_to_csv_using_dict(datadict, fields=None, filepath=None, delimiter=None):
    '''Dumps a list of dictionaries into csv
    args:
        datadict(list): list of dictionaries to dump
        fields(list): field sequence to use from the dictionaries [default: sorted keys of the first dictionary]
        filepath(str): filepath to save to [default: 'tmp.csv']
        delimiter(str): delimiter to use in csv [default: '|']
    '''
    # datadict is a list, so the default field names come from its first entry
    fieldnames = sorted(datadict[0].keys()) if fields is None else fields
    filepath = 'tmp.csv' if filepath is None else filepath
    delimiter = '|' if not delimiter else delimiter
    with open(filepath, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='ignore', delimiter=delimiter)
        writer.writeheader()
        for each_dict in datadict:
            writer.writerow(each_dict)
Then a naive conversion looks like this:
# Conversion code
test_data = {
    "review_id": [1, 2, 3, 4],
    "text": [5, 6, 7, 8],
}
converted_data = dict(convert(test_data))
data_list = [converted_data]  # DictWriter takes one dict per row
dump_to_csv_using_dict(data_list)
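To see what the JSON-string approach buys you, here is a minimal round-trip sketch. It assumes Python 3 and writes to an in-memory buffer rather than 'tmp.csv'; the names `row`, `buffer`, and `restored` are only for illustration and are not part of the functions above.

```python
import csv
import io
import json

# Sample data mirroring test_data above.
data = {"review_id": [1, 2, 3, 4], "text": [5, 6, 7, 8]}

# Encode each container as a JSON string so every CSV cell holds a primitive.
row = {key: json.dumps(value) for key, value in data.items()}

# Write one header line and one data line to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=sorted(row), delimiter="|")
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()

# Read the row back and decode each cell with json.loads.
reader = csv.DictReader(io.StringIO(csv_text), delimiter="|")
restored = [{key: json.loads(cell) for key, cell in r.items()} for r in reader]
```

Each list ends up in a single cell, so the file has only one data row. That is why this conversion is naive: the lists survive the trip, but the CSV is not row-per-review.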
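- Creating a final value that is actually some kind of merge of two different data sets.
To do that, you need to find a way to combine data from different keys. In general this is not an easy problem to solve.
That said, it is easy to combine two lists with zip.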
>>> x = [2, 4, 6]
>>> y = [1, 3, 5]
>>> list(zip(y, x))
[(1, 2), (3, 4), (5, 6)]
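Additionally, in the event your lists are not the same size, Python's itertools package provides a function, izip_longest, which yields the full zip even if one list is shorter than the other (in Python 3 it is named zip_longest). Note that izip_longest returns an iterator rather than a list.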
>>> from itertools import izip_longest
>>> x = [2, 4]
>>> y = [1, 3, 5]
>>> z = izip_longest(y, x, fillvalue=None) # default fillvalue is None
>>> list(z) # z is a generator
[(1, 2), (3, 4), (5, None)]
So we can add another function here:
from itertools import izip_longest  # named zip_longest on Python 3

def combine(data, fields=None, default=None):
    '''Combines fields within data
    args:
        data(dict): a dictionary with lists as values
        fields(list): a list of keys to combine [default: all fields in arbitrary order]
        default: default fill value [default: None]
    yields:
        tuple: columns combined into rows
    '''
    fields = data.keys() if fields is None else fields
    columns = [data.get(field) for field in fields]
    # Zip the columns into rows, padding shorter columns with the default value
    for values in izip_longest(*columns, fillvalue=default):
        yield values
Now we can use it to update our initial conversion.
import csv

def dump_to_csv(data, filepath=None, delimiter=None):
    '''Dumps list into csv
    args:
        data(list): list of rows to dump
        filepath(str): filepath to save to [default: 'tmp.csv']
        delimiter(str): delimiter to use in csv [default: '|']
    '''
    filepath = 'tmp.csv' if filepath is None else filepath
    delimiter = '|' if not delimiter else delimiter
    with open(filepath, 'w') as csvfile:
        # Plain writer: each row is a sequence of values, so no field names are needed
        writer = csv.writer(csvfile, delimiter=delimiter)
        for each_row in data:
            writer.writerow(each_row)
# Conversion code
test_data = {
    "review_id": [1, 2, 3, 4],
    "text": [5, 6, 7, 8],
}
combined_data = combine(test_data)
data_list = list(combined_data)  # materialize the generator into a list of rows
dump_to_csv(data_list)
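If you also want a header row, one option is to turn each zipped row back into a dict and hand the resulting list of dicts to csv.DictWriter. The following is only a sketch under a few assumptions: Python 3 (so zip_longest instead of izip_longest), an in-memory buffer instead of a file, and made-up sample data with columns of unequal length.

```python
import csv
import io
from itertools import zip_longest  # Python 3 name for izip_longest

# Hypothetical sample data; the "text" column is one item short.
data = {"review_id": [1, 2, 3], "text": ["good", "bad"]}

fields = sorted(data)
columns = [data[field] for field in fields]

# Zip the columns into rows, pad the short column with "",
# and label every cell with its field name.
rows = [dict(zip(fields, values))
        for values in zip_longest(*columns, fillvalue="")]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields, delimiter="|")
writer.writeheader()
writer.writerows(rows)
lines = buffer.getvalue().splitlines()
```

Here `lines` holds the header followed by one line per review, which is usually the shape you want from a dict of lists.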
I believe 'DictWriter' wants a 'list' of 'dict's rather than a 'dict' of 'list's. See the example here: https://docs.python.org/2/library/csv.html#csv.DictWriter – FamousJameous