2017-08-17 90 views
-1

I am new to Python and am working from the answer to the following post (parsing nested JSON and writing it to CSV, revisited):

Parsing nested JSON and writing it to CSV

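How do I define the input file so that this code works? I know I have to set "outputfile" to the path/filename I am writing to, but I don't know where the input file should go.

Edit: To clarify, I have a JSON file as input and want to convert it to a CSV file as output. I just want to know how to write the code (from the example above) so that it takes a specific JSON file as input. Also for clarity: the name of the JSON file stays the same, but its contents change every day, so I only need to know where to put open() and how to call it in the script.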

EDIT_2:

import csv
import json
from operator import itemgetter

inputfile = "/some/file.json"
outputfile = "/some/file.csv"

# Read and parse the JSON input once.
with open(inputfile, 'r') as inf:
    data = json.load(inf)

with open(outputfile, 'w', newline='') as outf:
    writer = None  # will be set to a csv.DictWriter later

    for key, item in sorted(data.items(), key=itemgetter(0)):
        row = {}
        nested_name, nested_items = '', {}
        for k, v in item.items():  # the error quoted below is raised here
            if not isinstance(v, dict):
                row[k] = v
            else:
                assert not nested_items, 'Only one nested structure is supported'
                nested_name, nested_items = k, v

        if writer is None:
            # build fields for each first key of each nested item first
            fields = sorted(row)

            # sorted keys of first item in key sorted order
            nested_keys = sorted(sorted(nested_items.items(), key=itemgetter(0))[0][1])
            fields.extend('__'.join((nested_name, k)) for k in nested_keys)

            writer = csv.DictWriter(outf, fields)
            writer.writeheader()

        for nkey, nitem in sorted(nested_items.items(), key=itemgetter(0)):
            row.update(('__'.join((nested_name, k)), v) for k, v in nitem.items())
            writer.writerow(row)

The error I get is...

for k, v in item.items():

AttributeError: 'list' object has no attribute 'items'

I think I may not be reading the JSON file correctly... the stress of being new to Python.
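One way to see where the error comes from is to print the type of every top-level value after parsing. This is only a sketch: the `sample` string is a made-up, heavily trimmed stand-in for the real file, and it assumes the top level is a JSON object whose "CVE_Items" value is a list.

```python
import json

# Hypothetical, heavily trimmed stand-in for the real NVD file.
sample = '{"CVE_data_type": "CVE", "CVE_Items": [{"publishedDate": "2003-12-31T05:00Z"}]}'
data = json.loads(sample)

# Map each top-level key to the type name of its value.
top_level_types = {key: type(value).__name__ for key, value in data.items()}
for key, type_name in sorted(top_level_types.items()):
    print(key, type_name)
```

Only dict values have an .items() method. The loop calls it on each top-level value, so a list (or a plain string) at that level would produce exactly this kind of AttributeError.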

EDIT_3 (updated JSON structure): Here is one 'entry' from the JSON file I am using (a NIST/NVD JSON file)

{ 
     "CVE_data_type" : "CVE", 
     "CVE_data_format" : "MITRE", 
     "CVE_data_version" : "4.0", 
     "CVE_data_numberOfCVEs" : "6208", 
     "CVE_data_timestamp" : "2017-08-14T18:06Z", 
     "CVE_Items" : [ { 
     "cve" : { 
      "CVE_data_meta" : { 
      "ID" : "CVE-2003-1547" 
      }, 
      "affects" : { 
      "vendor" : { 
       "vendor_data" : [ { 
       "vendor_name" : "francisco_burzi", 
       "product" : { 
        "product_data" : [ { 
        "product_name" : "php-nuke", 
        "version" : { 
         "version_data" : [ { 
         "version_value" : "6.5" 
         }, { 
         "version_value" : "6.5_beta1" 
         }, { 
         "version_value" : "6.5_rc3" 
         }, { 
         "version_value" : "6.5_rc2" 
         }, { 
         "version_value" : "6.5_rc1" 
         } ] 
        } 
        } ] 
       } 
       } ] 
      } 
      }, 
      "problemtype" : { 
      "problemtype_data" : [ { 
       "description" : [ { 
       "lang" : "en", 
       "value" : "CWE-79" 
       } ] 
      } ] 
      }, 
      "references" : { 
      "reference_data" : [ { 
       "url" : "http://secunia.com/advisories/8478" 
      }, { 
       "url" : "http://securityreason.com/securityalert/3718" 
      }, { 
       "url" : "http://www.securityfocus.com/archive/1/archive/1/316925/30/25250/threaded" 
      }, { 
       "url" : "http://www.securityfocus.com/archive/1/archive/1/317230/30/25220/threaded" 
      }, { 
       "url" : "http://www.securityfocus.com/bid/7248" 
      }, { 
       "url" : "https://exchange.xforce.ibmcloud.com/vulnerabilities/11675" 
      } ] 
      }, 
      "description" : { 
      "description_data" : [ { 
       "lang" : "en", 
       "value" : "Cross-site scripting (XSS) vulnerability in block-Forums.php in the Splatt Forum module for PHP-Nuke 6.x allows remote attackers to inject arbitrary web script or HTML via the subject parameter." 
      } ] 
      } 
     }, 
     "configurations" : { 
      "CVE_data_version" : "4.0", 
      "nodes" : [ { 
      "operator" : "OR", 
      "cpe" : [ { 
       "vulnerable" : true, 
       "cpeMatchString" : "cpe:/a:francisco_burzi:php-nuke:6.5", 
       "cpe23Uri" : "cpe:2.3:a:francisco_burzi:php-nuke:6.5:*:*:*:*:*:*:*" 
      }, { 
       "vulnerable" : true, 
       "cpeMatchString" : "cpe:/a:francisco_burzi:php-nuke:6.5_beta1", 
       "cpe23Uri" : "cpe:2.3:a:francisco_burzi:php-nuke:6.5_beta1:*:*:*:*:*:*:*" 
      }, { 
       "vulnerable" : true, 
       "cpeMatchString" : "cpe:/a:francisco_burzi:php-nuke:6.5_rc1", 
       "cpe23Uri" : "cpe:2.3:a:francisco_burzi:php-nuke:6.5_rc1:*:*:*:*:*:*:*" 
      }, { 
       "vulnerable" : true, 
       "cpeMatchString" : "cpe:/a:francisco_burzi:php-nuke:6.5_rc2", 
       "cpe23Uri" : "cpe:2.3:a:francisco_burzi:php-nuke:6.5_rc2:*:*:*:*:*:*:*" 
      }, { 
       "vulnerable" : true, 
       "cpeMatchString" : "cpe:/a:francisco_burzi:php-nuke:6.5_rc3", 
       "cpe23Uri" : "cpe:2.3:a:francisco_burzi:php-nuke:6.5_rc3:*:*:*:*:*:*:*" 
      } ] 
      } ] 
     }, 
     "impact" : { 
      "baseMetricV2" : { 
      "cvssV2" : { 
       "vectorString" : "(AV:N/AC:M/Au:N/C:N/I:P/A:N)", 
       "accessVector" : "NETWORK", 
       "accessComplexity" : "MEDIUM", 
       "authentication" : "NONE", 
       "confidentialityImpact" : "NONE", 
       "integrityImpact" : "PARTIAL", 
       "availabilityImpact" : "NONE", 
       "baseScore" : 4.3 
      }, 
      "severity" : "MEDIUM", 
      "exploitabilityScore" : 8.6, 
      "impactScore" : 2.9, 
      "obtainAllPrivilege" : false, 
      "obtainUserPrivilege" : false, 
      "obtainOtherPrivilege" : false, 
      "userInteractionRequired" : true 
      } 
     }, 
     "publishedDate" : "2003-12-31T05:00Z", 
     "lastModifiedDate" : "2017-08-08T01:29Z" 
     }] 
} 

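I want the keys to be the headers (e.g. lastModifiedDate, cpe23Uri, and so on). I can filter out the blanks and then select the columns I want, as long as I have the headers and the data in the CSV file.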

+0

@MartjinPieters ... you answered the linked question ... could you answer this one? :p –

+0

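That question assumes you already have the data in a list, and you can obtain that data any way you like. It doesn't have to come from an input file; it could come from a calculation you perform. – Barmar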

+0

If you want to get it from another file, just write code to open that file and parse it appropriately. – Barmar

Answer

0

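Fortunately, your JSON data is valid enough for json.load() to read and parse it.... but merely saying that you want the keys as headers isn't specific enough: there are many of them, at different levels of each entry (as you will see below). Note that the OP of the linked question defined not only the input but also exactly how the data in it maps to columns of values in the CSV file, with the desired format shown as well, rather than just some hand-waving about mapping keys to file headers.

Anyway, here is something that may help you get there. It reads each "entry" in the list associated with the top-level "CVE_Items" key of the JSON object being read and prints it out, nicely formatted. From the output you should be able to pick out the columns you want to extract, write them as rows to the CSV file, and fill in the code accordingly.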

import json 

inputfile = "some_file.json" 
outputfile = "some_file.csv" 

with open(outputfile, 'w', newline='') as outf:
    with open(inputfile, 'r') as fp:
        data = json.load(fp)

    # Here is where you should convert each entry into a row of CSV data.
    # All this does now is show the contents of each entry in "CVE_Items" list.
    for entry in data["CVE_Items"]:
        print(json.dumps(entry, indent=4))

Output for the single entry in the sample JSON data you added to your question:

{ 
    "cve": { 
     "CVE_data_meta": { 
      "ID": "CVE-2003-1547" 
     }, 
     "affects": { 
      "vendor": { 
       "vendor_data": [ 
        { 
         "vendor_name": "francisco_burzi", 
         "product": { 
          "product_data": [ 
           { 
            "product_name": "php-nuke", 
            "version": { 
             "version_data": [ 
              { 
               "version_value": "6.5" 
              }, 
              { 
               "version_value": "6.5_beta1" 
              }, 
              { 
               "version_value": "6.5_rc3" 
              }, 
              { 
               "version_value": "6.5_rc2" 
              }, 
              { 
               "version_value": "6.5_rc1" 
              } 
             ] 
            } 
           } 
          ] 
         } 
        } 
       ] 
      } 
     }, 
     "problemtype": { 
      "problemtype_data": [ 
       { 
        "description": [ 
         { 
          "lang": "en", 
          "value": "CWE-79" 
         } 
        ] 
       } 
      ] 
     }, 
     "references": { 
      "reference_data": [ 
       { 
        "url": "http://secunia.com/advisories/8478" 
       }, 
       { 
        "url": "http://securityreason.com/securityalert/3718" 
       }, 
       { 
        "url": "http://www.securityfocus.com/archive/1/archive/1/316925/30/25250/threaded" 
       }, 
       { 
        "url": "http://www.securityfocus.com/archive/1/archive/1/317230/30/25220/threaded" 
       }, 
       { 
        "url": "http://www.securityfocus.com/bid/7248" 
       }, 
       { 
        "url": "https://exchange.xforce.ibmcloud.com/vulnerabilities/11675" 
       } 
      ] 
     }, 
     "description": { 
      "description_data": [ 
       { 
        "lang": "en", 
        "value": "Cross-site scripting (XSS) vulnerability in block-Forums.php in the Splatt Forum module for PHP-Nuke 6.x allows remote attackers to inject arbitrary web script or HTML via the subject parameter." 
       } 
      ] 
     } 
    }, 
    "configurations": { 
     "CVE_data_version": "4.0", 
     "nodes": [ 
      { 
       "operator": "OR", 
       "cpe": [ 
        { 
         "vulnerable": true, 
         "cpeMatchString": "cpe:/a:francisco_burzi:php-nuke:6.5", 
         "cpe23Uri": "cpe:2.3:a:francisco_burzi:php-nuke:6.5:*:*:*:*:*:*:*" 
        }, 
        { 
         "vulnerable": true, 
         "cpeMatchString": "cpe:/a:francisco_burzi:php-nuke:6.5_beta1", 
         "cpe23Uri": "cpe:2.3:a:francisco_burzi:php-nuke:6.5_beta1:*:*:*:*:*:*:*" 
        }, 
        { 
         "vulnerable": true, 
         "cpeMatchString": "cpe:/a:francisco_burzi:php-nuke:6.5_rc1", 
         "cpe23Uri": "cpe:2.3:a:francisco_burzi:php-nuke:6.5_rc1:*:*:*:*:*:*:*" 
        }, 
        { 
         "vulnerable": true, 
         "cpeMatchString": "cpe:/a:francisco_burzi:php-nuke:6.5_rc2", 
         "cpe23Uri": "cpe:2.3:a:francisco_burzi:php-nuke:6.5_rc2:*:*:*:*:*:*:*" 
        }, 
        { 
         "vulnerable": true, 
         "cpeMatchString": "cpe:/a:francisco_burzi:php-nuke:6.5_rc3", 
         "cpe23Uri": "cpe:2.3:a:francisco_burzi:php-nuke:6.5_rc3:*:*:*:*:*:*:*" 
        } 
       ] 
      } 
     ] 
    }, 
    "impact": { 
     "baseMetricV2": { 
      "cvssV2": { 
       "vectorString": "(AV:N/AC:M/Au:N/C:N/I:P/A:N)", 
       "accessVector": "NETWORK", 
       "accessComplexity": "MEDIUM", 
       "authentication": "NONE", 
       "confidentialityImpact": "NONE", 
       "integrityImpact": "PARTIAL", 
       "availabilityImpact": "NONE", 
       "baseScore": 4.3 
      }, 
      "severity": "MEDIUM", 
      "exploitabilityScore": 8.6, 
      "impactScore": 2.9, 
      "obtainAllPrivilege": false, 
      "obtainUserPrivilege": false, 
      "obtainOtherPrivilege": false, 
      "userInteractionRequired": true 
     } 
    }, 
    "publishedDate": "2003-12-31T05:00Z", 
    "lastModifiedDate": "2017-08-08T01:29Z" 
} 
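As a rough sketch of the conversion step left open in the code above, the following picks a few fields out of each entry and writes one CSV row per entry. The choice of columns, the helper name `entry_to_row`, and the trimmed `data` dictionary are assumptions made for illustration; only the key names come from the sample entry.

```python
import csv
import io

# Trimmed stand-in for the parsed JSON file (key names from the sample entry).
data = {"CVE_Items": [{
    "cve": {"CVE_data_meta": {"ID": "CVE-2003-1547"}},
    "impact": {"baseMetricV2": {"severity": "MEDIUM"}},
    "lastModifiedDate": "2017-08-08T01:29Z",
}]}

def entry_to_row(entry):
    """Pick a few illustrative columns out of one "CVE_Items" entry."""
    return {
        "ID": entry["cve"]["CVE_data_meta"]["ID"],
        # .get() tolerates entries that lack an "impact" section
        "severity": entry.get("impact", {}).get("baseMetricV2", {}).get("severity", ""),
        "lastModifiedDate": entry["lastModifiedDate"],
    }

outf = io.StringIO()  # stands in for open(outputfile, 'w', newline='')
writer = csv.DictWriter(outf, fieldnames=["ID", "severity", "lastModifiedDate"])
writer.writeheader()
for entry in data["CVE_Items"]:
    writer.writerow(entry_to_row(entry))

csv_text = outf.getvalue()
print(csv_text)
```

In a real script the io.StringIO object would be replaced by the file opened for writing, and the loop would run over the list loaded with json.load().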
+0

Thanks for the code. I can print the entries as shown above. How do I flatten them into rows that can be written to a CSV file? –

+0

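I don't know how you want it flattened. As I said, in the linked question the OP specified how the various parts of the JSON data were to be converted into CSV rows. JSON is essentially a tree data structure, whereas CSV is a table (or a 2D array/matrix), and the mapping from one to the other is arbitrary. I can't decide how to do it for you, but if you can at least define what you want, I can show you how to implement it. One very important detail is how to turn things with multiple values (like "version_data") into a single row. – martineau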
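To illustrate why the mapping is a matter of choice, here is one possible convention among many: join the keys along each path with "__" and number list elements by position. The function name `flatten` and the separator are assumptions made for this sketch.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts/lists into a single dict keyed by the path."""
    flat = {}
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)  # list positions become path components
    else:
        flat[prefix] = node  # scalar leaf: store it under the accumulated path
        return flat
    for key, value in children:
        path = f"{prefix}__{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

version = {"version": {"version_data": [{"version_value": "6.5"},
                                         {"version_value": "6.5_beta1"}]}}
flat = flatten(version)
print(flat)
```

Under this convention every list element gets its own column, so entries with different numbers of versions produce different sets of columns. That is the kind of detail that has to be decided before the CSV layout is fixed.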

+0

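The data in the JSON file appears to be in key-value pairs, sometimes with multiple values for a given key, as with "version_data:" above. I want the "key" part of each pair to become the column header and the "value"(s) to fill the column as data. Where a key has multiple values, I need to concatenate all of the values for that key. –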
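A minimal sketch of that rule, assuming the column name is the innermost key and that repeated values are joined with ";" (the helper name `collect_leaves` and the delimiter are choices made here, not taken from the thread):

```python
from collections import defaultdict

def collect_leaves(node, found=None, key=None):
    """Gather every scalar value in a nested structure under its nearest key."""
    if found is None:
        found = defaultdict(list)
    if isinstance(node, dict):
        for child_key, child in node.items():
            collect_leaves(child, found, child_key)
    elif isinstance(node, list):
        for child in node:
            collect_leaves(child, found, key)  # list items keep the enclosing key
    else:
        found[key].append(str(node))
    return found

entry = {
    "version_data": [{"version_value": "6.5"}, {"version_value": "6.5_rc1"}],
    "lastModifiedDate": "2017-08-08T01:29Z",
}
row = {key: ";".join(values) for key, values in collect_leaves(entry).items()}
print(row)
```

One caveat with naming columns by the innermost key: in the sample entry, "lang" and "value" occur under both "problemtype" and "description", so their values would end up concatenated in the same column unless the path is included in the column name.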