Sorting and organizing DNS records

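So I have a very simple dataset, such as ['test.sh', 'api.test.sh', 'blah.api.test.sh', 'test.com', 'api.test.com'], and I need to turn it into a hierarchical data structure. I was thinking of doing it with dictionaries, like this: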

{ 'name':'test.sh', 
    'children': { 'name':'api.test.sh', 
       'children': { 'name':'blah.api.test.sh' } 
       } 
}, 
{ 
    'name':'test.com', 
    'children': { 'name':'api.test.com' } 
}

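Basically, for each top-level name I can work my way down and perform whatever operations I need.

My question is about finding a simple way to sort, match, and transform the data. I can think of several ways to do it, but nothing particularly elegant. Also, I'm doing this in Python.

Thanks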


2014-10-31 sbrichards

I think this may be what you are looking for:

def sort_dns(l):
    """Recursively nest each domain under its closest parent found in `l`."""
    to_return = []
    if not l:
        return to_return
    # Get the top-level domains: the ones that contain the fewest dots.
    min_dots = min(i.count('.') for i in l)
    top_domains = [i for i in l if i.count('.') == min_dots]
    # Now, for each domain, find its subdomains.
    for domain in top_domains:
        # Match on a label boundary so that 'latest.sh' is not taken for a subdomain of 'test.sh'.
        sub_domains = [i for i in l if i.endswith('.' + domain)]
        # Until we reach the deepest level, keep looking for subdomains and repeat the structure.
        sub_sub_domains = sort_dns(sub_domains) if sub_domains else None
        to_return.append({'name': domain, 'childrens': sub_sub_domains})

    return to_return

As you can see, this function calls itself recursively, so it can go as deep as needed.

For your example, the result is the following:

[ 
    { 
     'name': 'test.sh', 
     'childrens': [ 
      { 
       'name': 'api.test.sh', 
       'childrens': [ 
        {'name': 'blah.api.test.sh', 'childrens': None} 
       ] 
      } 
     ] 
    }, 
    { 
     'name': 'test.com', 
     'childrens': [ 
      {'name': 'api.test.com', 'childrens': None} 
     ] 
    } 
]

As you can see, it handles both the case of multiple children and the case of no children.
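One detail worth checking in any version of this approach is how "is a subdomain of" gets decided. A plain substring test also matches unrelated names. The following is a minimal sketch, not part of the original answer, using a hypothetical helper `is_subdomain` that only accepts names ending in `'.' + parent`:

```python
def is_subdomain(name, parent):
    """Hypothetical helper: True if `name` is a strict subdomain of `parent`."""
    # Compare on a label boundary: the name must end with "." + parent.
    return name != parent and name.endswith('.' + parent)

names = ['test.sh', 'api.test.sh', 'latest.sh', 'test.shop.com']

# Substring containment picks up names that merely contain the text 'test.sh'.
naive = [n for n in names if 'test.sh' in n and n != 'test.sh']
# The label-boundary check keeps only real subdomains.
strict = [n for n in names if is_subdomain(n, 'test.sh')]

print(naive)   # ['api.test.sh', 'latest.sh', 'test.shop.com']
print(strict)  # ['api.test.sh']
```

If your input never contains look-alike names such as these, the difference will not show up, but the stricter check costs nothing.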

Note that if you do not want the 'childrens': None entries, you can change the function to:

def sort_dns(l):
    """Same as above, but leaf nodes carry no 'childrens' key at all."""
    to_return = []
    if not l:
        return to_return
    # Get the top-level domains: the ones that contain the fewest dots.
    min_dots = min(i.count('.') for i in l)
    top_domains = [i for i in l if i.count('.') == min_dots]
    # Now, for each domain, find its subdomains.
    for domain in top_domains:
        # Match on a label boundary so that 'latest.sh' is not taken for a subdomain of 'test.sh'.
        sub_domains = [i for i in l if i.endswith('.' + domain)]
        # Until we reach the deepest level, keep looking for subdomains and repeat the structure.
        sub_sub_domains = sort_dns(sub_domains) if sub_domains else None
        if sub_sub_domains:
            to_return.append({'name': domain, 'childrens': sub_sub_domains})
        else:
            to_return.append({'name': domain})

    return to_return

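Note that this is Python 3 code.

Edit: I have read roippi's answer and it works great; his solution is certainly the most Pythonic. The advantage of this one is that it does not need any imports. But you should seriously consider roippi's answer.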


2014-10-31 22:31:56

Thanks! This looks good. I actually came up with a similar approach, so it is nice to see I am not alone. – sbrichards 2014-10-31 23:56:50

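So, I see a proper solution to this problem happening in three steps: sort, group, format.

First, sorting arranges your input into logical groups. You can define a quick helper function to serve as the sort key: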

def sorter(netloc): 
    split = netloc.split('.') 
    return (split[::-1], -len(split))

Use it like so:

data = ['test.sh','api.test.sh','blah.api.test.sh','test.com','api.test.com', 'another.com', 'sub.another.com', 'sub.sub.another.com'] 

#shuffling data, to show that sorting works 
import random 
random.shuffle(data) 

sorted(data, key=sorter) 
Out[14]: 
['another.com', 
'sub.another.com', 
'sub.sub.another.com', 
'test.com', 
'api.test.com', 
'test.sh', 
'api.test.sh', 
'blah.api.test.sh']
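To see why the key reverses the labels, it can help to compare it with a plain string sort. This is a small illustrative sketch with a hypothetical key function `reversed_labels` (a stripped-down version of the idea in `sorter`), not code from the answer:

```python
def reversed_labels(netloc):
    # 'api.test.com' -> ['com', 'test', 'api'], so the most significant label compares first.
    return netloc.split('.')[::-1]

names = ['api.test.com', 'test.sh', 'test.com', 'api.test.sh']

# A plain string sort compares from the left, so 'api.*' names cluster together.
plain = sorted(names)
# Sorting on reversed labels keeps each parent directly ahead of its subdomains.
by_labels = sorted(names, key=reversed_labels)

print(plain)      # ['api.test.com', 'api.test.sh', 'test.com', 'test.sh']
print(by_labels)  # ['test.com', 'api.test.com', 'test.sh', 'api.test.sh']
```

Because Python compares lists element by element and treats a shorter list as smaller when it is a prefix of a longer one, a parent always sorts before its own subdomains.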

Now that everything is in the right order, do a grouping operation with itertools.groupby, grouping by the blah.com part of x.y.z.blah.com:

def grouper(netloc): 
    # Join with '.' so that distinct domains such as 'a.bc' and 'ab.c' cannot share a key.
    return '.'.join(netloc.split('.')[-2:]) 

#in-place sort, replicating sorted() call above 
data.sort(key=sorter) 

from itertools import groupby 

[list(g) for k,g in groupby(data, grouper)] 
Out[27]: 
[['another.com', 'sub.another.com', 'sub.sub.another.com'], 
['test.com', 'api.test.com'], 
['test.sh', 'api.test.sh', 'blah.api.test.sh']]
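The sort step matters here because itertools.groupby only merges consecutive items that share a key. The sketch below illustrates that on unsorted input and shows a dictionary-based grouping that does not depend on order. The helper name `base_domain` is hypothetical, and taking the last two labels is an assumption that does not hold for suffixes such as co.uk:

```python
from itertools import groupby

def base_domain(netloc):
    # Assumption: the base domain is the last two labels.
    # This is not true for multi-label suffixes such as 'co.uk'.
    return '.'.join(netloc.split('.')[-2:])

unsorted_names = ['test.sh', 'test.com', 'api.test.sh', 'api.test.com']

# groupby starts a new group every time the key changes, so unsorted input is fragmented.
runs = [(k, list(g)) for k, g in groupby(unsorted_names, base_domain)]
print(len(runs))  # 4

# A plain dict collects every name under its key regardless of input order.
groups = {}
for name in unsorted_names:
    groups.setdefault(base_domain(name), []).append(name)
print(groups)
# {'test.sh': ['test.sh', 'api.test.sh'], 'test.com': ['test.com', 'api.test.com']}
```

Either approach works once the data is sorted; the dictionary version is just less sensitive to forgetting that step.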

Finally, you need to format these groups into the hierarchy you want. Here is a quick and dirty implementation:

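import json
from copy import deepcopy

def make_hierarchy(groups):
    # Work on a copy so that pop() does not empty the caller's lists.
    _groups = deepcopy(groups)
    ret = []
    for li in _groups:
        current = {}
        ret.append(current)
        while li:
            # pop() takes from the end, so the deepest name becomes the outermost node.
            current['name'] = li.pop()
            if li:
                nxt = {}
                current['children'] = nxt
                current = nxt
    return ret

# grouped is the list of groups produced by the groupby step
grouped = [list(g) for k, g in groupby(data, grouper)]
# sort_keys=True gives a stable key order in the printed JSON
print(json.dumps(make_hierarchy(grouped), indent=2, sort_keys=True))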
[
  {
    "children": {
      "children": {
        "name": "another.com"
      },
      "name": "sub.another.com"
    },
    "name": "sub.sub.another.com"
  },
  {
    "children": {
      "name": "test.com"
    },
    "name": "api.test.com"
  },
  {
    "children": {
      "children": {
        "name": "test.sh"
      },
      "name": "api.test.sh"
    },
    "name": "blah.api.test.sh"
  }
]

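This last implementation depends on a couple of assumptions, namely that there will not be any netlocs of equal length within the same group, i.e. sub1.example.com and sub2.example.com will never both occur. Obviously you can adjust the implementation as needed.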
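If sibling subdomains can occur, one possible adjustment is to store children in a list and attach each name to its closest listed ancestor. The following is only a sketch under that assumption; `build_tree` is a hypothetical name, and unlike the implementation above it puts the parent domain at the top of each tree:

```python
def build_tree(names):
    """Nest each name under its closest listed ancestor; children are lists."""
    nodes = {name: {'name': name, 'children': []} for name in names}
    roots = []
    # Visit names ordered by reversed labels so siblings come out in a stable order.
    for name in sorted(nodes, key=lambda n: n.split('.')[::-1]):
        labels = name.split('.')
        # Drop leading labels one at a time until a listed ancestor is found.
        for i in range(1, len(labels)):
            parent = '.'.join(labels[i:])
            if parent in nodes:
                nodes[parent]['children'].append(nodes[name])
                break
        else:
            # No ancestor is present in the input, so this name is a root.
            roots.append(nodes[name])
    return roots

tree = build_tree(['sub2.example.com', 'example.com',
                   'sub1.example.com', 'a.sub1.example.com'])
print(tree[0]['name'])                             # example.com
print([c['name'] for c in tree[0]['children']])    # ['sub1.example.com', 'sub2.example.com']
```

A name whose direct parent is missing from the input is attached to the nearest ancestor that is present, which may or may not be what you want for your records.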


2014-10-31 22:50:37 roippi

Thanks! It is great to see other people's implementations. Good answer! – sbrichards 2014-10-31 23:56:09
