2016-12-16

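Regex to reformat incorrect JSON data

I have some data that was not saved properly in an old database. I am moving the system to a new database and reformatting the old data. The old data looks like this: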

a:10:{ 
    s:7:"step_no";s:1:"1"; 
    s:9:"YOUR_NAME";s:14:"Firtname Lastname"; 
    s:11:"CITIZENSHIP"; s:7:"Indian"; 
    s:22:"PROPOSE_NAME_BUSINESS1"; s:12:"ABC Limited"; 
    s:22:"PROPOSE_NAME_BUSINESS2"; s:15:"XYZ Investment"; 
    s:22:"PROPOSE_NAME_BUSINESS3";s:0:""; 
    s:22:"PROPOSE_NAME_BUSINESS4";s:0:""; 
    s:23:"PURPOSE_NATURE_BUSINESS";s:15:"Some dummy content"; 
    s:15:"CAPITAL_COMPANY";s:24:"20 Million Capital"; 
    s:14:"ANOTHER_AMOUNT";s:0:""; 
} 

I want the new data to be in proper JSON format so that I can read it in Python, like this:

data = { 
    "step_no": "1", 
    "YOUR_NAME":"Firtname Lastname", 
    "CITIZENSHIP":"Indian", 
    "PROPOSE_NAME_BUSINESS1":"ABC Limited", 
    "PROPOSE_NAME_BUSINESS2":"XYZ Investment", 
    "PROPOSE_NAME_BUSINESS3":"", 
    "PROPOSE_NAME_BUSINESS4":"", 
    "PURPOSE_NATURE_BUSINESS":"Some dummy content", 
    "CAPITAL_COMPANY":"20 Million Capital", 
    "ANOTHER_AMOUNT":"" 
} 

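I am thinking that using a regex to strip out the unwanted parts and reformat the content using the names in caps will work, but I don't know how to go about it.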

Answer

A regular expression would be the wrong approach here. It isn't needed, and the format is a little more complicated than you think.
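To illustrate why the format is more complicated than it looks: each string carries a byte-count prefix, and the value itself may contain quotes and semicolons. The sketch below is a minimal illustration only; `read_php_string` is a hypothetical helper written for this example (not part of any library), and it handles a single `s:N:"...";` token, nothing more.

```python
import re

# A serialized PHP string whose value contains both a quote and a semicolon.
# The s:N: prefix states the value's length in bytes (12 here).
value = b'say "hi"; ok'
serialized = b's:%d:"%s";' % (len(value), value)

# Naive regex: take everything up to the first '";' it sees.
naive = re.match(rb's:\d+:"(.*?)";', serialized).group(1)

def read_php_string(buf):
    """Hypothetical helper: read one s:N:"..."; token by trusting N."""
    header, _, rest = buf.partition(b':"')
    length = int(header[2:])  # drop the leading b's:' to get the byte count
    return rest[:length]

print(naive)                        # b'say "hi'  (truncated)
print(read_php_string(serialized))  # b'say "hi"; ok'
```

The regex stops at the first `";` inside the value, while reading exactly N bytes recovers the whole string. A real parser also has to handle nested arrays, integers, booleans and so on, which is why a library is the better choice.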

You have data in the PHP serialize format. You can trivially deserialise it in Python with the phpserialize library:

import json

import phpserialize  # third-party package

def fixup_php_arrays(o):
    if isinstance(o, dict):
        if isinstance(next(iter(o), None), int):
            # PHP has no lists, only mappings; produce a list for
            # a dictionary with integer keys to 'repair'
            return [fixup_php_arrays(o[i]) for i in range(len(o))]
        return {k: fixup_php_arrays(v) for k, v in o.items()}
    return o

# yourdata is the serialized bytes value read from the old database
json.dumps(fixup_php_arrays(phpserialize.loads(yourdata, decode_strings=True)))

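Note that PHP strings are byte strings, not Unicode text, so, especially in Python 3, you would otherwise have to decode your key-value pairs after the fact if you want to be able to re-encode them as JSON. The decode_strings=True flag takes care of this for you. The default codec is UTF-8; pass in an encoding argument to select a different one.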
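As a rough illustration of the bytes problem, using only the standard library: the `raw` dictionary below is a hand-written stand-in for what an undecoded result might look like (an assumption for this example, not actual library output), and `decode_pairs` is a hypothetical helper that only handles a flat mapping of bytes to bytes.

```python
import json

# Stand-in for a deserialized PHP array whose strings were left as bytes.
raw = {b"CITIZENSHIP": b"Indian", b"YOUR_NAME": b"Firstname Lastname"}

error = None
try:
    json.dumps(raw)  # bytes keys and values are not valid JSON input
except TypeError as exc:
    error = exc

def decode_pairs(mapping, encoding="utf-8"):
    """Hypothetical helper: decode a flat bytes-to-bytes mapping to text."""
    return {k.decode(encoding): v.decode(encoding) for k, v in mapping.items()}

as_json = json.dumps(decode_pairs(raw), sort_keys=True)
print(type(error).__name__)  # TypeError
print(as_json)
```

Doing this by hand gets tedious once values are nested, which is the work decode_strings=True saves you.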

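PHP also uses arrays for sequences, so you may first have to convert any decoded dict object with integer keys to a list, which is what the fixup_php_arrays() function does.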
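To see the list repair in isolation, here is the same function applied to a plain dictionary; the `decoded` value is made up for this example and stands in for a deserialized PHP array that contains a nested sequence.

```python
def fixup_php_arrays(o):
    if isinstance(o, dict):
        if isinstance(next(iter(o), None), int):
            # Integer keys: treat the mapping as a PHP sequence
            return [fixup_php_arrays(o[i]) for i in range(len(o))]
        return {k: fixup_php_arrays(v) for k, v in o.items()}
    return o

# Made-up input: a mapping with one nested integer-keyed 'array'
decoded = {"names": {0: "ABC Limited", 1: "XYZ Investment"}, "step_no": "1"}
repaired = fixup_php_arrays(decoded)
print(repaired)
# {'names': ['ABC Limited', 'XYZ Investment'], 'step_no': '1'}
```

Note that this only works when the integer keys run contiguously from 0 to n-1; a sparse array such as `{0: 'a', 5: 'b'}` would raise a KeyError and would need to stay a dictionary.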

Demo (with repaired data; many of the string lengths were off, and whitespace had been added):

>>> import phpserialize, json 
>>> from pprint import pprint 
>>> data = b'a:10:{s:7:"step_no";s:1:"1";s:9:"YOUR_NAME";s:18:"Firstname Lastname";s:11:"CITIZENSHIP";s:6:"Indian";s:22:"PROPOSE_NAME_BUSINESS1";s:11:"ABC Limited";s:22:"PROPOSE_NAME_BUSINESS2";s:14:"XYZ Investment";s:22:"PROPOSE_NAME_BUSINESS3";s:0:"";s:22:"PROPOSE_NAME_BUSINESS4";s:0:"";s:23:"PURPOSE_NATURE_BUSINESS";s:18:"Some dummy content";s:15:"CAPITAL_COMPANY";s:18:"20 Million Capital";s:14:"ANOTHER_AMOUNT";s:0:"";}' 
>>> pprint(phpserialize.loads(data, decode_strings=True)) 
{'ANOTHER_AMOUNT': '',
 'CAPITAL_COMPANY': '20 Million Capital',
 'CITIZENSHIP': 'Indian',
 'PROPOSE_NAME_BUSINESS1': 'ABC Limited',
 'PROPOSE_NAME_BUSINESS2': 'XYZ Investment',
 'PROPOSE_NAME_BUSINESS3': '',
 'PROPOSE_NAME_BUSINESS4': '',
 'PURPOSE_NATURE_BUSINESS': 'Some dummy content',
 'YOUR_NAME': 'Firstname Lastname',
 'step_no': '1'}
>>> print(json.dumps(phpserialize.loads(data, decode_strings=True), sort_keys=True, indent=4)) 
{ 
    "ANOTHER_AMOUNT": "", 
    "CAPITAL_COMPANY": "20 Million Capital", 
    "CITIZENSHIP": "Indian", 
    "PROPOSE_NAME_BUSINESS1": "ABC Limited", 
    "PROPOSE_NAME_BUSINESS2": "XYZ Investment", 
    "PROPOSE_NAME_BUSINESS3": "", 
    "PROPOSE_NAME_BUSINESS4": "", 
    "PURPOSE_NATURE_BUSINESS": "Some dummy content", 
    "YOUR_NAME": "Firstname Lastname", 
    "step_no": "1" 
}
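
The data in the demo was repaired by hand. If many rows have wrong length prefixes and stray whitespace, a one-off repair pass along the following lines might help before handing the data to a real deserializer. This is only a heuristic sketch, not part of the approach above: `fix_string_lengths` is a hypothetical helper, and it assumes that no value contains the sequence `";` and that the array element counts are already correct.

```python
import re

def fix_string_lengths(text, encoding="utf-8"):
    """Hypothetical helper: recompute each s:N:"..."; prefix from the
    value's real byte length and drop whitespace around the tokens."""
    def repl(match):
        value = match.group(1)
        return 's:%d:"%s";' % (len(value.encode(encoding)), value)
    # Heuristic: assumes no value contains the two characters '";'
    return re.sub(r'\s*s:\d+:"(.*?)";\s*', repl, text, flags=re.DOTALL)

broken = ('a:2:{ s:9:"YOUR_NAME";s:14:"Firstname Lastname"; '
          's:11:"CITIZENSHIP"; s:7:"Indian"; }')
fixed = fix_string_lengths(broken)
print(fixed)
# a:2:{s:9:"YOUR_NAME";s:18:"Firstname Lastname";s:11:"CITIZENSHIP";s:6:"Indian";}
```

Encode the result to bytes before deserialising it, and spot-check the output, since a value that happens to contain `";` would be cut short by this pass.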