2017-04-12 166 views

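Appending non-DataFrame data to a pandas CSV

I would like to know whether there is a simpler way to append a date column and another information column to my existing CSV file. I am adding these columns because the information is not in the JSON string returned by the REST API call.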

import requests 
import json 
import http.client 
import datetime 
import pandas as pd 
from pandas.io.json import json_normalize 

url = api.getinfo() 
r = requests.get(url, headers=headers, verify=False) 
if r.status_code != http.client.OK: 
    raise requests.HTTPError(r) 

jsonstring = json.dumps(r.json()["data"]) 
load = json.loads(jsonstring) 
df = json_normalize(load) 
col = ["poolId", "totalPoolCapacity", "totalLocatedCapacity", 
     "availableVolumeCapacity", "usedCapacityRate"] 
with open('hss.csv', 'a') as f: 
    df.to_csv(f, header=False, columns=col) 

a = pd.read_csv('hss.csv') 
a['date'] = [datetime.date.today()] * len(a) 
a.to_csv('hss.csv') 
b = pd.read_csv('hss.csv') 
b['storage system'] = "ssystem22" 
b.to_csv('hss.csv') 

Every time the script runs, I end up with the extra columns Unnamed: 0 and Unnamed: 0.1 in my CSV file. Each append also overwrites the old dates.

,Unnamed: 0,Unnamed: 0.1,poolId,totalPoolCapacity, totalLocatedCapacity,availableVolumeCapacity,usedCapacityRate,date,storage system 
0,155472,223618,565064,51,,2017-04-12,ssystem22 
1,943174,819098,262042,58,,2017-04-12,ssystem22 
0,764600,966017,046668,71,,2017-04-12,ssystem22 
1,764600,335680,487650,76,,2017-04-12,ssystem22 
2,373700,459800,304446,67,,2017-04-12,ssystem22 

It is probably the index; pass index=False when writing the CSV. http://pandas.pydata.org/pandas-docs/version/0.18.0/generated/pandas.DataFrame.to_csv.html – Shijo


Thanks @Shijo. After adding 'index=False', I now have only one instance of 'Unnamed: 0' in the CSV file. – Clarkus978
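
To illustrate the index point from the comments: here is a minimal sketch, using a made-up two-row frame and an in-memory buffer instead of hss.csv, of how a default `to_csv` followed by `read_csv` picks up an extra unnamed column, and how `index=False` avoids it. The column values are placeholders, not the real API data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the normalized API response.
df = pd.DataFrame({"poolId": [0, 1], "usedCapacityRate": [51, 58]})

# Default behaviour: the index is written as an unlabeled first column,
# so reading the file back yields an extra "Unnamed: 0" column.
buf_default = io.StringIO()
df.to_csv(buf_default)
buf_default.seek(0)
with_index = pd.read_csv(buf_default)

# With index=False the index is not written, so nothing extra comes back.
buf_no_index = io.StringIO()
df.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
without_index = pd.read_csv(buf_no_index)

print(list(with_index.columns))     # ['Unnamed: 0', 'poolId', 'usedCapacityRate']
print(list(without_index.columns))  # ['poolId', 'usedCapacityRate']
```

Each further read/write cycle without `index=False` adds one more such column, which would explain the growing `Unnamed: 0`, `Unnamed: 0.1` headers.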


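I don't understand why you keep reading the file and writing it back... why not add the columns to df before writing to the CSV the first time... just curious... – Shahram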
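
A rough sketch of that suggestion, assuming a small placeholder frame in place of the real API data: assigning a scalar to a new column broadcasts it to every row, so the extra columns can be added to df before anything is written to disk. The ISO date format here is just one choice.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the frame built from the REST API response.
df = pd.DataFrame({"poolId": [0, 1, 2], "usedCapacityRate": [71, 76, 67]})

# A scalar assigned to a new column is repeated for every row.
df["storage system"] = "ssystem22"
df["date"] = datetime.date.today().isoformat()

print(df)
```

Because only the new rows carry today's date, rows already in the CSV keep the date they were written with.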

Answer


I kept researching and found out how to solve this. I should have been using the pd.Series function. Here is the corrected code:

import requests 
import json 
import http.client 
import datetime 
import pandas as pd 
from pandas.io.json import json_normalize 

url = api.getinfo() 
r = requests.get(url, headers=headers, verify=False) 
if r.status_code != http.client.OK: 
    raise requests.HTTPError(r) 

jsonstring = json.dumps(r.json()["data"])
load = json.loads(jsonstring)
df = json_normalize(load)

# Add the extra columns to the new rows before writing, so rows already
# in the file (and their dates) are never read back or rewritten.
df['storage system'] = pd.Series('ssystem22', index=df.index)
df['date'] = pd.Series(datetime.date.today().strftime('%m-%d-%Y'),
                       index=df.index)
col = ["poolId", "totalPoolCapacity", "totalLocatedCapacity",
       "availableVolumeCapacity", "usedCapacityRate", "storage system",
       "date"]

csvfile = 'hss.csv'  # same file as in the question
with open(csvfile, 'a') as f:
    df.to_csv(f, header=False, columns=col)
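
As a possible variation on the above (not part of the original fix), the append step could also drop the index, as suggested in the comments, and write the header only when the file is new. The helper name `append_rows` and the sample rows below are hypothetical; the demo writes to a temporary directory rather than the real hss.csv.

```python
import os
import tempfile

import pandas as pd


def append_rows(df, path, columns):
    """Append df to path, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode="a", header=write_header, index=False, columns=columns)


cols = ["poolId", "usedCapacityRate", "storage system", "date"]

# Hypothetical batch standing in for one script run.
batch = pd.DataFrame({
    "poolId": [0, 1],
    "usedCapacityRate": [51, 58],
    "storage system": "ssystem22",
    "date": "2017-04-12",
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hss.csv")
    append_rows(batch, path, cols)  # first run: header + 2 rows
    append_rows(batch, path, cols)  # second run: 2 more rows, no header
    result = pd.read_csv(path)

print(result)
```

With this shape the file has a single header line and no `Unnamed` columns, however many times the script runs.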