2017-08-30 73 views
5

How to parse a hierarchy based on indents with Python

I have an accounting tree that is stored with indents/spaces in the source:

Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses

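There is a fixed number of levels, so I'd like to flatten the hierarchy using 3 fields (the actual data has 6 levels; this is simplified for the example):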

for rownum in range(6, ws.max_row + 1):
    accountName = str(ws.cell(row=rownum, column=1).value)
    # The number of leading spaces determines the level.
    indent = len(accountName) - len(accountName.lstrip(' '))
    if indent == 0:
        l1 = accountName
        l2 = ''
        l3 = ''
    elif indent == 3:
        l2 = accountName
        l3 = ''
    else:
        l3 = accountName

    w.writerow([l1, l2, l3])

L1        L2            L3
Income
Income    Revenue
Income    Revenue       IAP
Income    Revenue       Ads
Income    Other-Income
Expenses  Developers    In-house
... etc

I can do this by checking the number of spaces preceding the account name.

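Is there a more flexible way to achieve this, based on the indentation of the current row compared to the previous row, rather than assuming it is always 3 spaces per level? L1 will always have no indent, and we can trust that lower levels are indented further than their parent, but perhaps not always by 3 spaces per level.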

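Update: I ended up with this as the meat of the logic. Since I ultimately wanted the list of accounts along with the content, the easiest way seemed to be to use the indent to decide whether to reset, append to, or pop the list: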

# Runs inside the per-row loop; itemgetter comes from the operator module.
if indent == 0:
    accountList = []
    accountList.append((indent, accountName))
elif indent > prev_indent:
    accountList.append((indent, accountName))
elif indent <= prev_indent:
    max_indent = int(max(accountList, key=itemgetter(0))[0])
    # Pop entries until only ancestors of the current row remain.
    while max_indent >= indent:
        accountList.pop()
        max_indent = int(max(accountList, key=itemgetter(0))[0])
    accountList.append((indent, accountName))

So on each row of output, accountList is complete.
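
For readers who want to try this idea outside the spreadsheet loop, here is a minimal, self-contained sketch of the same reset/append/pop logic. The names `account_paths` and `to_row` are hypothetical and not part of the original code, and the sketch looks only at the last entry of the list instead of calling `max`, which is equivalent as long as each child is indented further than its parent.

```python
def account_paths(lines):
    """Yield the list of account names (root first) for each non-blank line."""
    stack = []  # (indent, name) pairs for the current line and its ancestors
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        # Drop every entry that is not an ancestor of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        yield [n for _, n in stack]


def to_row(path, width=3):
    """Pad a path with empty strings so that every row has `width` fields."""
    return path + [""] * (width - len(path))


# Deliberately irregular indentation: 2 spaces, then 7, then 2 again.
sample = [
    "Income",
    "  Revenue",
    "       IAP",
    "  Other-Income",
    "Expenses",
]
for path in account_paths(sample):
    print(to_row(path))
```

Each padded row could then be passed to something like `csv.writer.writerow` to get the L1/L2/L3 layout shown earlier.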

Answers

4

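You can mimic the way Python itself parses indentation. First, create a stack that holds the indentation levels. Then, on each line:

  • If the indentation is greater than the top of the stack, push it and increase the depth level.
  • If it is the same, continue at the same level.
  • If it is lower, pop the top of the stack while it is higher than the new indentation. If you reach a lower indentation level before finding exactly the same one, there is an indentation error.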
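
indentation = [0]  # stack of indentation widths, one per open level
depth = 0

with open("test.txt", "r") as f:
    for line in f:
        line = line.rstrip()      # drop the newline and trailing spaces
        content = line.lstrip()
        if not content:
            continue              # ignore blank lines
        indent = len(line) - len(content)

        if indent > indentation[-1]:
            # deeper than the current level: open a new level
            depth += 1
            indentation.append(indent)
        elif indent < indentation[-1]:
            # shallower: close levels until the indentation matches
            while indent < indentation[-1]:
                depth -= 1
                indentation.pop()
            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")

        print(f"{content} (depth: {depth})")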

With a file "test.txt" containing the content you provided:

Income 
    Revenue 
     IAP 
     Ads 
    Other-Income 
Expenses 
    Developers 
     In-house 
     Contractors 
    Advertising 
    Other Expenses 

Here is the output:

Income (depth: 0) 
Revenue (depth: 1) 
IAP (depth: 2) 
Ads (depth: 2) 
Other-Income (depth: 1) 
Expenses (depth: 0) 
Developers (depth: 1) 
In-house (depth: 2) 
Contractors (depth: 2) 
Advertising (depth: 1) 
Other Expenses (depth: 1)
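
To see the error rule in action, the loop can be wrapped in a generator and fed a line whose dedent matches no open level. This is only a sketch: the function name `depths` and the sample lists are made up for illustration, and the depth is derived from the size of the stack rather than tracked in a separate counter.

```python
def depths(lines):
    """Yield (content, depth) pairs; raise on an inconsistent dedent."""
    indentation = [0]  # indentation widths of the currently open levels
    for line in lines:
        line = line.rstrip()
        content = line.lstrip()
        if not content:
            continue  # ignore blank lines
        indent = len(line) - len(content)
        if indent > indentation[-1]:
            indentation.append(indent)
        else:
            while indent < indentation[-1]:
                indentation.pop()
            if indent != indentation[-1]:
                raise RuntimeError(f"Bad formatting: {content!r}")
        yield content, len(indentation) - 1


good = ["Income", "    Revenue", "        IAP", "Expenses"]
print(list(depths(good)))

# An indent of 2 matches neither open level (0 or 4), so this fails.
bad = ["Income", "    Revenue", "  Other-Income"]
try:
    list(depths(bad))
except RuntimeError as exc:
    print("error:", exc)
```

Note that the jump from "IAP" straight back to "Expenses" closes two levels at once, which the `while` loop handles.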

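So, what can you do with this? Suppose you want to build nested lists. First, create a data stack.

  • When you find an indentation, append a new list to the end of the data stack.
  • When you find an unindentation, pop the top list and append it to the new top.

And in every case, for each line, append the content to the list on top of the data stack.

Here is the corresponding implementation: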

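indentation = [0]  # stack of indentation widths
depth = 0
data = [[]]        # stack of lists under construction; data[0] is the root

with open("test.txt", "r") as f:
    for line in f:
        line = line.rstrip()      # drop the newline and trailing spaces
        content = line.lstrip()
        if not content:
            continue              # ignore blank lines
        indent = len(line) - len(content)

        if indent > indentation[-1]:
            depth += 1
            indentation.append(indent)
            data.append([])       # start a new child list
        elif indent < indentation[-1]:
            while indent < indentation[-1]:
                depth -= 1
                indentation.pop()
                top = data.pop()  # close the child list and attach it
                data[-1].append(top)
            if indent != indentation[-1]:
                raise RuntimeError("Bad formatting")

        data[-1].append(content)

# Close any lists still open at the end of the file.
while len(data) > 1:
    top = data.pop()
    data[-1].append(top)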

Your nested list is on top of your data stack. The output for the same file is:

['Income',
    ['Revenue',
        ['IAP',
         'Ads'
        ],
     'Other-Income'
    ],
 'Expenses',
    ['Developers',
        ['In-house',
         'Contractors'
        ],
     'Advertising',
     'Other Expenses'
    ]
]

This is fairly easy to manipulate, although quite deeply nested. You can access the data by chaining item accesses:

>>> l = data[0] 
>>> l 
['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'], 'Expenses', ['Developers', ['In-house', 'Contractors'], 'Advertising', 'Other Expenses']]
>>> l[1] 
['Revenue', ['IAP', 'Ads'], 'Other-Income'] 
>>> l[1][1] 
['IAP', 'Ads'] 
>>> l[1][1][0] 
'IAP' 
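
If full paths are needed rather than nested lists, the structure can be walked recursively. The following is a rough sketch, assuming the layout shown above where a sub-list always follows the string it belongs to; `iter_paths` is a hypothetical helper, not part of the answer's code.

```python
def iter_paths(nested, prefix=()):
    """Yield one tuple per item: its ancestors followed by the item itself."""
    last = None
    for item in nested:
        if isinstance(item, list):
            # A sub-list holds the children of the string just before it.
            yield from iter_paths(item, prefix + (last,))
        else:
            last = item
            yield prefix + (item,)


tree = ['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'],
        'Expenses', ['Developers', ['In-house', 'Contractors'], 'Advertising']]
for path in iter_paths(tree):
    print(" > ".join(path))
```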

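Thanks for this. I ultimately want to output the hierarchy on each row along with the row's content, so I modified it slightly, but this got me headed in the right direction.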

2

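If the indentation is a fixed number of spaces (here, 3 spaces), you can simplify the calculation of the indentation level.

Note: I use StringIO to simulate a file.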

import io 
import itertools 

content = u"""\
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
"""

stack = []
for line in io.StringIO(content):
    stripped = line.rstrip()  # drop \n
    row = stripped.split("   ")  # one empty field per 3-space indent level
    stack[:] = stack[:len(row) - 1] + [row[-1]]
    print("\t".join(stack))

You get:

Income 
Income Revenue 
Income Revenue IAP 
Income Revenue Ads 
Income Other-Income 
Expenses 
Expenses Developers 
Expenses Developers In-house 
Expenses Developers Contractors 
Expenses Advertising 
Expenses Other Expenses 
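
With a fixed width, another way to get the level is to count the leading spaces and divide by the width. This is a small sketch of that variant, not the code above; `INDENT_WIDTH` and `level_of` are assumed names, and the check for a non-zero remainder is an addition to catch lines that do not fit the fixed width.

```python
INDENT_WIDTH = 3  # assumed fixed number of spaces per level


def level_of(line, width=INDENT_WIDTH):
    """Return the nesting level of a line indented `width` spaces per level."""
    spaces = len(line) - len(line.lstrip(" "))
    level, remainder = divmod(spaces, width)
    if remainder:
        raise ValueError(f"indent of {spaces} is not a multiple of {width}")
    return level


stack = []
for line in ["Income", "   Revenue", "      IAP", "   Other Expenses"]:
    # Cut the stack back to the line's level, then add the line itself.
    stack[level_of(line):] = [line.strip()]
    print("\t".join(stack))
```

Unlike splitting on the indent string, this does not care whether the account name itself contains runs of spaces.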

Edit: indentation not fixed

If the indentation is not fixed (you don't always have 3 spaces), as in the example below:

content = u"""\
Income
   Revenue
    IAP
    Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
"""

You need to estimate the shift at each new line:

stack = []
last_indent = u""
for line in io.StringIO(content):
    # leading spaces of the current line
    indent = "".join(itertools.takewhile(lambda c: c == " ", line))
    # +1 when indenting, -1 when dedenting, 0 otherwise; note that a dedent
    # is always treated as going up exactly one level
    shift = 0 if indent == last_indent else (-1 if len(indent) < len(last_indent) else 1)
    index = len(stack) + shift
    stack[:] = stack[:index - 1] + [line.strip()]
    last_indent = indent
    print("\t".join(stack))