2016-11-12 71 views
1

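Python: splitting columns separated by variable spacing

I want to parse a table whose columns are separated by either a single space or several spaces. I can use re.split to separate the columns that are more than one space apart, but I then have to split again the columns that are separated by a single space. The code below does this by running the split several times to get the 4th and 5th columns. Is there a better or more efficient way to do this?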

My code, which does not seem very efficient:

import re

# Columns are separated by two or more spaces, except Time and Values,
# which are separated by a single space.
string = '''No  Mon  Date  Time Values  colors
1  Nov  11-03-2016  23:17:52 Red  colors
2  Nov  11-03-2016  19:18:00 Yellow  colors
3  Nov  11-03-2016  19:18:18 Blue  colors
4  Oct  10-03-2016  19:22:58 Orange Green  colors
5  Oct  10-07-2016  10:37:36 Red Blue Yellow  colors
6  Oct  10-07-2016  10:37:36 White  colors
7  Sep  09-07-2016  10:37:37 Ping White Yellow Green  colors'''

for i in string.splitlines():
    col1 = re.split(r'\s{2,}', i)[0]
    col2 = re.split(r'\s{2,}', i)[1]
    col3 = re.split(r'\s{2,}', i)[2]
    # The 4th piece holds both Time and Values; split it again on single spaces.
    col4 = re.split(r'\s{2,}', i)[3].split()[0]
    col5 = ' '.join(re.split(r'\s{2,}', i)[3].split()[1:])

    print('{:3} | {:3} | {:10} | {:10} | {:23}|'.format(col1, col2, col3, col4, col5))

Output:

No  | Mon | Date       | Time       | Values                 |
1   | Nov | 11-03-2016 | 23:17:52   | Red                    |
2   | Nov | 11-03-2016 | 19:18:00   | Yellow                 |
3   | Nov | 11-03-2016 | 19:18:18   | Blue                   |
4   | Oct | 10-03-2016 | 19:22:58   | Orange Green           |
5   | Oct | 10-07-2016 | 10:37:36   | Red Blue Yellow        |
6   | Oct | 10-07-2016 | 10:37:36   | White                  |
7   | Sep | 09-07-2016 | 10:37:37   | Ping White Yellow Green|

Answer

1

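You can get the first four values with a single split on \s+ limited to 4 splits, and then split the remainder of the line (arr[4]) on \s{2,} to pick out the Values column: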

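for i in string.splitlines():
    # One split yields No, Mon, Date, Time, and the rest of the line.
    arr = re.split(r'\s+', i, maxsplit=4)
    # The rest holds Values and the trailing column, separated by 2+ spaces.
    values = re.split(r'\s{2,}', arr[4])[0]
    print('{:3} | {:3} | {:10} | {:10} | {:23}|'.format(arr[0], arr[1], arr[2], arr[3], values))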

No  | Mon | Date       | Time       | Values                 |
1   | Nov | 11-03-2016 | 23:17:52   | Red                    |
2   | Nov | 11-03-2016 | 19:18:00   | Yellow                 |
3   | Nov | 11-03-2016 | 19:18:18   | Blue                   |
4   | Oct | 10-03-2016 | 19:22:58   | Orange Green           |
5   | Oct | 10-07-2016 | 10:37:36   | Red Blue Yellow        |
6   | Oct | 10-07-2016 | 10:37:36   | White                  |
7   | Sep | 09-07-2016 | 10:37:37   | Ping White Yellow Green|
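
To make the two-step split easier to follow, here is one row traced by hand. This is only an illustration; the sample row assumes, as the question's code implies, that the trailing "colors" column is set off by at least two spaces.

```python
import re

# Sample row; the double space before "colors" is an assumption about the layout.
row = '5  Oct  10-07-2016  10:37:36 Red Blue Yellow  colors'

# Step 1: split on any run of whitespace, but at most 4 times,
# so everything after the Time column stays in one piece.
arr = re.split(r'\s+', row, maxsplit=4)
print(arr)
# ['5', 'Oct', '10-07-2016', '10:37:36', 'Red Blue Yellow  colors']

# Step 2: split that last piece on runs of 2+ spaces and keep the first part,
# which drops the trailing column while preserving single spaces inside Values.
values = re.split(r'\s{2,}', arr[4])[0]
print(values)
# Red Blue Yellow
```

Passing maxsplit by keyword avoids the deprecation warning that newer Python versions emit for the positional form.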

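
If you would rather do it in one pass per line, a single compiled pattern with named groups might also work. This is a rough sketch, not part of the answer above: the pattern ROW and the helper parse_row are hypothetical names, and it assumes the first four columns never contain whitespace and that Values ends at the first run of two or more spaces (or at the end of the line).

```python
import re

# Hypothetical pattern: four whitespace-free fields, then a lazily matched
# Values field that stops at the first run of 2+ spaces or at end of line.
ROW = re.compile(
    r'(?P<no>\S+)\s+(?P<mon>\S+)\s+(?P<date>\S+)\s+(?P<time>\S+)\s+'
    r'(?P<values>.*?)(?:\s{2,}|\s*$)'
)

def parse_row(line):
    """Return the five columns of one row as a tuple, or None if it does not match."""
    m = ROW.match(line)
    return m.group('no', 'mon', 'date', 'time', 'values') if m else None

print(parse_row('4  Oct  10-03-2016  19:22:58 Orange Green  colors'))
# ('4', 'Oct', '10-03-2016', '19:22:58', 'Orange Green')
```

Compiling the pattern once means each line is scanned a single time, at the cost of a regex that is harder to read than the two splits.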

+1

Thanks for the reply. Yes, this does simplify things and cuts down the code. Thanks. – MBasith