2015-12-02

Converting a line-by-line CSV file to JSON in Python

I have this input file, and I want to convert it to JSON.

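1.] As you can see, the key:value pairs are laid out row-wise rather than column-wise.

2.] Each record has a "Comments" key whose value is spread across several lines, since some users may write lengthy comments.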

key,values 

heading,A 
Title,1 
ID,12 
Owner,John 
Status,Active 
Comments,"Im just pissed " 
     ,"off from your service" 
, 
heading,B 
Title,2 
ID,21 
Owner,Von 
Status,Active 
Comments,"Service is " 
     ,"really great" 
     ,"I just enjoyed my weekend" 
, 
heading,C 
Title,3 
ID,31 
Owner,Jesse 
Status,Active 
Comments,"Service" 
     ,"needs to be" 
     ,"improved" 

Desired output:

[{'heading':'A','Title':1,'ID':12,'Owner':'John','Status':'Active', "Comments":"Im just pissed off from your service"},
{....},
{.....}]

Because my CSV file stores the "key": "value" pairs row-wise, I really have no idea how to convert it into JSON.

=====What I tried=====

import csv

with open('csv_sample.csv') as f:
    reader = csv.DictReader(f, fieldnames=("key", "value"))
    for i in reader:
        print(i)


{'value': 'values', 'key': 'key'} 
{'value': 'A', 'key': 'heading'} 
{'value': '1', 'key': 'Title'} 
{'value': '12', 'key': 'ID'} 
{'value': 'John', 'key': 'Owner'} 
{'value': 'Active', 'key': 'Status'} 

As you can see, this is not what I want. Please help.

You want the result to be {'key':'values','heading':'A'...? – joojaa

Answers

Edit: maybe try something along these lines:

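import json


def headingGen(lines):
    # Yield one dict per record; records are separated by a line holding only ','.
    newHeading = {}
    prevk = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            # Split on the first comma only, so commas inside comments survive.
            k, v = line.split(',', 1)
            k, v = k.strip(), v.strip().strip('"')
            if not k and not v:
                # A bare ',' ends the current record.
                yield newHeading
                newHeading = {}
            elif not k:
                # Empty key: continuation of the previous key's value.
                newHeading[prevk] += v
            else:
                prevk = k
                newHeading[k] = v
        except Exception as e:
            print("I had a problem at line " + line + " : " + str(e))
    if newHeading:
        yield newHeading


def file_to_json(filename):
    with open(filename, 'r') as fh:
        next(fh)  # skip the "key,values" header; blank lines are skipped in headingGen
        return json.dumps(list(headingGen(fh)))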

I've updated the post: each record has a "Comments" key whose value is spread across several lines, since some users may write lengthy comments. – shalini

Thanks Dain, but I get a "too many values to unpack" error on k, v = line.strip('\n').split(','). – shalini

That's probably because some values contain commas; split on the first comma only, with line.strip('\n').split(',', 1). – DainDwarf
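To see why the maxsplit argument matters, here is a small sketch on a made-up row (the comment text is illustrative, not from the real file): without a limit every comma splits, so unpacking into two names fails.

```python
# Hypothetical row whose comment contains a comma.
row = 'Comments,"Service is great, thanks"'

# Without a limit, split() cuts at every comma: three pieces, so `k, v = ...` would fail.
parts = row.split(',')
print(parts)  # ['Comments', '"Service is great', ' thanks"']

# With maxsplit=1 only the first comma splits, leaving the comment intact.
k, v = row.split(',', 1)
print(k, v)  # Comments "Service is great, thanks"
```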

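This answer uses Python list comprehensions to offer a functional-style alternative to the other (perfectly good) answers, which use an imperative style. I like this style because it cleanly separates the different aspects of the problem.

The nested list comprehension first splits the input into sections, then splits each section into items using a regular expression, and finally applies the function split_item() to each item to obtain the key/value pairs.

For memory efficiency, the source data is read one section at a time.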

import re
import json

# Define a regular expression splitting a section into items.
# Each newline which is not followed by whitespace splits.
splitter = re.compile(r'\n(?!\s)')

def section_generator(f):
    # Generator reading a single section from the input file in each iteration.
    # The sections are separated by a comma on a separate line.
    section = ''
    for line in f:
        if line.strip() == ',':
            yield section
            section = ''
        else:
            section += line
    if section:
        yield section

def split_item(item):
    # Convert the item containing "key,value" into a key/value pair.
    key, value = item.split(',', 1)
    key, value = key.strip(), value.strip()
    if value.startswith('"'):
        # Convert multiline quoted string to unquoted single line.
        value = ''.join(line.strip().lstrip(',').strip('"')
                        for line in value.splitlines())
    elif value.isdigit():
        # Convert numeric value to int.
        value = int(value)
    return key, value

with open('csv_sample.csv') as f:
    # Ignore the "header" (skip everything until the empty line is found).
    for line in f:
        if not line.strip():
            break

    # Construct the resulting list of dictionaries using list comprehensions.
    result = [dict(split_item(item) for item in splitter.split(section) if item.strip())
              for section in section_generator(f)]

print(json.dumps(result))
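The key trick in this answer is the \n(?!\s) pattern: it splits a section at every newline that is not followed by whitespace, so an indented continuation line stays attached to the item above it. A small standalone check on a hand-written section (the sample text is illustrative):

```python
import re

# Same pattern as in the answer: split on a newline not followed by whitespace.
splitter = re.compile(r'\n(?!\s)')

section = ('Owner,John\n'
           'Comments,"Im just pissed "\n'
           '     ,"off from your service"\n')

# The final newline also splits, leaving an empty string that is filtered out.
items = [item for item in splitter.split(section) if item]
for item in items:
    print(repr(item))
# 'Owner,John'
# 'Comments,"Im just pissed "\n     ,"off from your service"'
```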

Error: list index out of range – shalini

Did you try it before editing? It was correct; it uses a list comprehension. If you don't understand it, please ask before editing. –

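I reverted to the original version. A list comprehension is more efficient than a loop, and I believe it is more readable (once you get used to it). –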
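For readers less used to nested comprehensions, the result = [...] line in the answer above is equivalent to the explicit loops below. This is a sketch for comparison only: sections is a hypothetical list standing in for section_generator(f), and this split_item is a simplified stand-in that skips the quote and integer handling.

```python
import re

splitter = re.compile(r'\n(?!\s)')

def split_item(item):
    # Simplified stand-in: key/value on the first comma, surrounding spaces removed.
    key, value = item.split(',', 1)
    return key.strip(), value.strip()

# Hypothetical sections, standing in for section_generator(f).
sections = ['heading,A\nTitle,1\n', 'heading,B\nTitle,2\n']

# Comprehension form, as in the answer.
result = [dict(split_item(item) for item in splitter.split(section) if item)
          for section in sections]

# The same thing written as explicit loops.
result_loops = []
for section in sections:
    record = {}
    for item in splitter.split(section):
        if item:
            key, value = split_item(item)
            record[key] = value
    result_loops.append(record)

print(result == result_loops)  # True
```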

Try this:

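def convert_to_json(fname):
    result = []
    rec = {}
    lastkey = None
    with open(fname) as f:
        for l in f:
            # Skip blank lines and the "key,values" header.
            if not l.strip() or l.startswith('key'):
                continue

            if l.startswith(','):
                # A bare ',' line ends the current record.
                result.append(rec)
                rec = {}
            else:
                # Split on the first comma only, so commas inside comments survive.
                k, v = l.strip().split(',', 1)
                if k.strip():
                    lastkey = k.strip()
                    try:
                        rec[lastkey] = int(v)
                    except ValueError:
                        rec[lastkey] = v.strip('"')
                else:
                    # Empty key: continuation of the previous field's value.
                    rec[lastkey] += v.strip('"')
    result.append(rec)
    return result

print(convert_to_json('./csv_sample.csv'))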

Output:

[{'Status': 'Active', 'Title': 1, 'Comments': 'Im just pissed off from your service', 'heading': 'A', 'Owner': 'John', 'ID': 12}, {'Status': 'Active', 'Title': 2, 'Comments': 'Service is really greatI just enjoyed my weekend', 'heading': 'B', 'Owner': 'Von', 'ID': 21}, {'Status': 'Active', 'Title': 3, 'Comments': 'Serviceneeds to beimproved', 'heading': 'C', 'Owner': 'Jesse', 'ID': 31}] 
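Note that convert_to_json() returns a Python list of dicts, and printing it shows the Python repr (single quotes), not JSON text. To get actual JSON, pass the list through json.dumps(); a minimal sketch using one record copied from the output above:

```python
import json

# One record as returned by convert_to_json() (copied from the output above).
records = [{'heading': 'A', 'Title': 1, 'ID': 12, 'Owner': 'John',
            'Status': 'Active', 'Comments': 'Im just pissed off from your service'}]

# json.dumps() produces real JSON text: double quotes, and ints stay numbers.
text = json.dumps(records)
print(text)

# Round-trip check: parsing the JSON gives back an equal structure.
assert json.loads(text) == records
```

Passing indent=2 to json.dumps() gives a pretty-printed version if that is easier to read.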

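This is not a trivial conversion, so we need to be precise about the format:

  • The input file is a CSV file with two columns, named key and values
  • Records are defined over several lines, each giving a key and its value
  • A line with the key heading marks the start of a record
  • A blank key marks a continuation line: its value should be appended to the previous value
  • If the continuation value does not start with a separator and the previous value does not end with one, a space is inserted (separators are space, tab, dot, comma and -)
  • The heading field cannot have continuation lines; this allows simpler decoding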
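The continuation rule in the last two points can be isolated in a small helper. This is a sketch; the name join_continuation is hypothetical, and the separator set matches the sep string used in the answer's code.

```python
SEP = ' \t,.-'  # separators: space, tab, comma, dot and hyphen

def join_continuation(prev, cont, sep=SEP):
    """Append a continuation value to the previous value (hypothetical helper).

    A space is inserted only when prev does not end with a separator and
    cont does not start with one. Slicing ([-1:] and [:1]) keeps this safe
    for empty strings, since '' counts as contained in any string.
    """
    if prev[-1:] in sep or cont[:1] in sep:
        return prev + cont
    return prev + ' ' + cont

print(join_continuation('Im just pissed ', 'off from your service'))
# Im just pissed off from your service
print(join_continuation('Service', 'needs to be'))
# Service needs to be
```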

The code could be:

import csv
import json

with open('csv_sample.csv') as fd:
    rd = csv.DictReader(fd)
    rec = None
    lastkey = None
    sep = ' \t,.-'
    for row in rd:
        # print(row)
        key = row['key'].strip()
        if key == 'heading':
            if rec is not None:
                # process previous record
                print(json.dumps(rec))
            rec = {key: row['values']}
        elif key == '':  # continuation line
            # Check the last char of the previous value and the first char of the new one;
            # slicing ([-1:], [:1]) also copes with empty values such as the ',' separator row.
            if rec[lastkey][-1:] in sep or row['values'][:1] in sep:
                rec[lastkey] += row['values']
            else:
                rec[lastkey] += ' ' + row['values']
        else:
            # normal field: add it to rec and store key
            rec[key] = row['values']
            lastkey = key
    # process last record
    if rec is not None:
        print(json.dumps(rec))

You can easily turn this into a generator by wrapping it in a function and replacing each print of json.dumps(rec) with yield json.dumps(rec).

With your example, it gives:

{"Status": "Active", "Title": "1", "Comments": "Im just pissed off from your service", "heading": "A", "Owner": "John", "ID": "12"} 
{"Status": "Active", "Title": "2", "Comments": "Service is really great I just enjoyed my weekend", "heading": "B", "Owner": "Von", "ID": "21"}
{"Status": "Active", "Title": "3", "Comments": "Service needs to be improved", "heading": "C", "Owner": "Jesse", "ID": "31"}
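Following the suggestion to turn this into a generator, here is a sketch of that variant under the same input layout. The function name iter_json_records is hypothetical, and it accepts any iterable of lines (an open file, or an io.StringIO as here) so it can be tried without a file on disk.

```python
import csv
import io
import json

def iter_json_records(lines, sep=' \t,.-'):
    """Yield one JSON string per record (hypothetical generator form)."""
    rec = None
    lastkey = None
    for row in csv.DictReader(lines):
        key = row['key'].strip()
        if key == 'heading':
            if rec is not None:
                yield json.dumps(rec)  # previous record is complete
            rec = {key: row['values']}
        elif key == '':  # continuation line (also the bare ',' separator row)
            if rec[lastkey][-1:] in sep or row['values'][:1] in sep:
                rec[lastkey] += row['values']
            else:
                rec[lastkey] += ' ' + row['values']
        else:
            rec[key] = row['values']
            lastkey = key
    if rec is not None:
        yield json.dumps(rec)  # last record

sample = io.StringIO(
    'key,values\n'
    '\n'
    'heading,A\n'
    'Comments,"Im just pissed "\n'
    '     ,"off from your service"\n'
    ',\n'
    'heading,B\n'
    'Comments,"Service"\n'
    '     ,"needs to be"\n'
)
for line in iter_json_records(sample):
    print(line)
```

Because it yields strings lazily, the caller decides whether to print them, write them to a file, or collect them with list().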

Because this code uses the csv module, it is safe against commas inside the (quoted) comments.
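A quick illustration of that point with csv.reader on made-up lines: a quoted comma stays inside the field, while the same text split by hand breaks apart. This only works if the comment is quoted in the file.

```python
import csv
import io

# Hypothetical input lines; the first comment contains a comma inside quotes.
data = io.StringIO('Comments,"Service is great, thanks"\n'
                   '     ,"see you soon"\n')

rows = list(csv.reader(data))
print(rows)
# [['Comments', 'Service is great, thanks'], ['     ', 'see you soon']]

# A naive split on every comma breaks the first row into three pieces.
naive = 'Comments,"Service is great, thanks"'.split(',')
print(len(naive))  # 3
```

The second row also shows why a whitespace-only key identifies a continuation line once it is stripped.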

Line 7: KeyError: 'key' – shalini

@shalini: I cannot reproduce this under Python 2.7. What are your Python version and OS? Or could you show your real input? I saw a UnicodeDecodeError in a comment on another answer. –

Could you uncomment the "print row" line and show what is displayed before the error? –