2017-09-27 495 views

I have extracted data from the site below using Python Selenium. How can I convert the text to JSON format with Python?

https://portfoliomanager.energystar.gov/pm/targetFinder;jsessionid=F6FC40FBDE075BDA3834643F9BD65E37?execution=e1s2

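Please have a look at the table "Metric Comparison for Your Design and/or Target".

I have extracted the table as plain text.

Here is a sample of the text output: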

Metric Design Project Design Target Median Property* 
ENERGY STAR score (1-100) Not Available 75 50 
Source EUI (kBtu/ft²) 3.1 Not Available 127.9 
Site EUI (kBtu/ft²) 1.0 Not Available 40.7 
Source Energy Use (kBtu) 314.0 Not Available 12,793.0 
Site Energy Use (kBtu) 100.0 Not Available 4,074.2 
Energy Cost ($) 2,000.00 Not Available 81,484.00 
Total GHG Emissions (Metric Tons CO2e) 0.0 Not Available 0.5 

I tried to convert the text to JSON:

import csv
import json

with open('file.txt', 'rb') as csvfile:
    filereader = csv.reader(csvfile, delimiter=' ')
    i = 0
    header = []
    out_data = []
    for row in filereader:
        row = [elem for elem in row if elem]
        if i == 0:
            i += 1
            header = row
        else:
            row[0:4] = [row[0]+" "+row[1]+" "+row[2]+" "+row[3]]
            _dict = {}
            for elem, header_elem in zip(row, header):
                _dict[header_elem] = elem
            out_data.append(_dict)

print json.dumps(out_data)

The JSON output I get looks like this:

[{"Project": "75", "Metric": "ENERGY STAR score (1-100)", "Design": "50"}] 

The JSON output should look like this:

[{"Design Project": "Not Available", "Design Target": "75", "Metric": "ENERGY STAR score (1-100)", "Median Property*": "50"}] 

I guess some values are missing from your blueprint, which you can compare and – bhansa

@bhansa I am extracting the data directly from the table on that website – venkat

It would be easier if you produced CSV rather than space-separated values. Please show how the file is generated –

Answers

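You forgot to build the multi-word JSON keys from the header (Design Project, Design Target, etc.) and to merge the corresponding multi-word values.

Here is the corrected version: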

import csv
import json

with open('test.txt', 'r') as csvfile:  # Opens file
    filereader = csv.reader(csvfile, delimiter=' ')
    i = 0
    header = []
    out_data = []
    for row in filereader:
        row = [elem for elem in row if elem]  # drop empty fields from repeated spaces
        if i == 0:
            i += 2
            row[1:3] = [row[1]+" "+row[2]]  # Design Project key
            row[2:4] = [row[2]+" "+row[3]]  # Design Target key
            row[3:5] = [row[3]+" "+row[4]]  # Median Property*
            header = row
        else:
            # Metric value (assumes the metric name is exactly four words)
            row[0:4] = [row[0]+" "+row[1]+" "+row[2]+" "+row[3]]
            if len(row) == 5:  # check conditions for better parse
                row[1:3] = [row[1]+" "+row[2]]  # Design Project value
            _dict = {}
            for elem, header_elem in zip(row, header):
                _dict[header_elem] = elem
            out_data.append(_dict)

    print(json.dumps(out_data))  # parenthesized form works on Python 2 and 3

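This only works if your data structure is constant and the keys/values always consist of the same number of words. In the sample above that holds for the "ENERGY STAR score (1-100)" row, whose metric name is four words long.

You can add additional conditions (as I did with the len(row) == 5 check):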

if len(row) == 5: # check conditions for better parse 
    row[1:3] = [row[1]+" "+row[2]] # Design Project value 
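
If the metric names vary in length, one possible way around the fixed word counts is to read each row from the right instead of the left. The following is only a sketch under assumptions that are not in the original post: the column names are known in advance (the hypothetical COLUMNS list), every value is either a single token or the two-word text "Not Available", and the helper names parse_row and table_to_json are made up for illustration.

```python
import json

# Assumption: the column names are known up front, because multi-word
# headers cannot be split unambiguously on spaces.
COLUMNS = ["Metric", "Design Project", "Design Target", "Median Property*"]
MISSING = ["Not", "Available"]  # assumed to be the only multi-word value


def parse_row(line, columns=COLUMNS):
    """Turn one data line into a dict, peeling values off the right end."""
    tokens = line.split()
    values = []
    # One value per non-metric column, starting with the last column.
    for _ in columns[1:]:
        if tokens[-2:] == MISSING:
            values.append("Not Available")
            del tokens[-2:]
        else:
            values.append(tokens.pop())
    values.reverse()
    # Whatever tokens remain form the metric name.
    return dict(zip(columns, [" ".join(tokens)] + values))


def table_to_json(text, columns=COLUMNS):
    """Convert the whole extracted table text to a JSON string."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The first non-empty line is the header; its names come from `columns`.
    rows = [parse_row(ln, columns) for ln in lines[1:]]
    return json.dumps(rows, ensure_ascii=False)
```

Calling table_to_json on the full text from the question should give one object per metric row. If the page ever uses another multi-word value, this sketch would misplace it, so pulling each table cell separately with Selenium would be the more robust route.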

Can the code be made generic, so that it produces JSON for the whole text output above? – venkat