2014-12-31

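Reversing (or simplifying) a Cartesian product?

To make things easier, but also more complicated, I tried to implement a concept of "combined/succinct tags" that expand into multiple basic tag forms.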

In this case, a tag consists of one or more "subtags", delimited by colons:

food:fruit:apple:sour/sweet 

drink:coffee/tea:hot/cold 

wall/bike:painted:red/blue 

A slash indicates that the "subtags" are interchangeable. An interpreter therefore translates them into this:

food:fruit:apple:sour 
food:fruit:apple:sweet 

drink:coffee:hot 
drink:coffee:cold 
drink:tea:hot 
drink:tea:cold 

wall:painted:red 
wall:painted:blue 
bike:painted:red 
bike:painted:blue 

The code used (not perfect, but it works):

import itertools

def slash_split_tag(tag):
    """Expand a compact tag such as 'a/b:c' into ['a:c', 'b:c']."""
    if '/' not in tag:
        # Return a list so that callers can always iterate over whole tags
        return [tag]
    subtags = tag.split(':')
    pattern, v_pattern = (), ()
    for subtag in subtags:
        if '/' in subtag:
            pattern += (None,)  # placeholder, filled in from the product
            v_pattern += (tuple(subtag.split('/')),)
        else:
            pattern += (subtag,)
    def merge_pattern_and_product(pattern, product):
        ret = list(pattern)
        for e in product:
            ret[ret.index(None)] = e  # fill the leftmost open placeholder
        return ret
    CartesianProduct = tuple(itertools.product(*v_pattern))  # http://stackoverflow.com/a/170248
    return [':'.join(merge_pattern_and_product(pattern, product)) for product in CartesianProduct]

#=============================================================================== 
# T E S T 
#=============================================================================== 

for tag in slash_split_tag('drink:coffee/tea:hot/cold'):
    print(tag)
print()
for tag in slash_split_tag('A1/A2:B1/B2/B3:C1/C2:D1/D2/D3/D4/EE'):
    print(tag)

Q: How can I reverse this process? I need this for readability reasons.

Answer

Here is a simple, first-pass attempt at such a function:

def compress_list(alist): 
    """Compress a list of colon-separated strings into a more compact 
    representation. 
    """ 
    components = [ss.split(':') for ss in alist] 

    # Check that every string in the supplied list has the same number of tags 
    tag_counts = [len(cc) for cc in components] 
    if len(set(tag_counts)) != 1: 
        raise ValueError("Not all of the strings have the same number of tags")

    # For each component, gather a list of all the applicable tags. The set 
    # at index k of tag_possibilities is all the possibilities for the 
    # kth tag 
    tag_possibilities = list() 
    for tag_idx in range(tag_counts[0]): 
        tag_possibilities.append(set(cc[tag_idx] for cc in components))

    # Now take the list of tags, and turn them into slash-separated strings 
    tag_possibilities_strs = ['/'.join(tt) for tt in tag_possibilities] 

    # Finally, stitch this together with colons 
    return ':'.join(tag_possibilities_strs) 

Hopefully the comments are sufficient to explain how it works. A few caveats, however:

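  • It doesn't do anything sensible, such as escaping special characters like slashes or backslashes, if it finds them in the list of tags.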

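  • It doesn't recognize whether a more subtle division is going on, or whether it has been given an incomplete list of tags. Consider this example:

    fish:cheese:red
    chips:cheese:red
    fish:chalk:red

    It won't realize that only cheese goes with both fish and chips, and will instead collapse this to fish/chips:cheese/chalk:red, which also implies chips:chalk:red even though that tag was never in the list.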

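  • The order of the tags in the finished string is arbitrary (or at least, I don't think it has any relation to the order of the strings in the given list). If that matters, you could sort tt before joining it with slashes.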
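The last two caveats can be handled together. The following is a minimal sketch, not part of the original answer: the name compress_list_checked is hypothetical, and it assumes that the input should be rejected whenever it is not a complete Cartesian product. It sorts each column so the output is deterministic, then re-expands the result with itertools.product and compares it with the input.

```python
from itertools import product


def compress_list_checked(tags):
    """Compress colon-separated tags, or raise if that would lose information.

    Hypothetical variant of compress_list: sorted output plus a round-trip check.
    """
    rows = [tag.split(':') for tag in tags]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("Not all of the strings have the same number of tags")

    # zip(*rows) walks the rows column by column; sorting makes the output stable
    columns = [sorted(set(column)) for column in zip(*rows)]

    # Re-expand the candidate result and compare it with what was supplied
    expanded = {':'.join(combo) for combo in product(*columns)}
    missing = sorted(expanded - set(tags))
    if missing:
        raise ValueError("Input is not a full Cartesian product; missing: "
                         + ', '.join(missing))

    return ':'.join('/'.join(column) for column in columns)


print(compress_list_checked(['drink:coffee:hot', 'drink:coffee:cold',
                             'drink:tea:hot', 'drink:tea:cold']))
# drink:coffee/tea:cold/hot
```

With the fish/chips list above, this version raises a ValueError that names chips:chalk:red as the missing combination instead of silently inventing it.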

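Testing compress_list on the three lists given in the question seems to work, although, as I said, the order may differ from the initial strings: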

food:fruit:apple:sweet/sour 
drink:tea/coffee:hot/cold 
wall/bike:painted:blue/red 
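When the list is not a full Cartesian product, as in the fish/chips example, a single compact tag cannot represent it without adding combinations. One possible approach, sketched here as an assumption rather than as part of the answer's code, is to merge greedily: two entries are combined only when they differ in exactly one position, so nothing is ever invented. The name compress_partial is hypothetical, and the result is lossless but not guaranteed to be the shortest possible.

```python
def compress_partial(tags):
    """Greedily merge tags that differ in exactly one position.

    Hypothetical helper: returns a list of compact tags that together
    expand to exactly the input tags.
    """
    # One entry per distinct tag; each entry holds a set of alternatives
    # per position. dict.fromkeys drops duplicates but keeps input order.
    entries = [[{part} for part in tag.split(':')]
               for tag in dict.fromkeys(tags)]

    merged = True
    while merged:
        merged = False
        for i in range(len(entries)):
            for j in range(i + 1, len(entries)):
                a, b = entries[i], entries[j]
                if len(a) != len(b):
                    continue
                differing = [k for k in range(len(a)) if a[k] != b[k]]
                if len(differing) == 1:
                    k = differing[0]
                    a[k] = a[k] | b[k]   # union the alternatives in place
                    del entries[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every successful merge

    return [':'.join('/'.join(sorted(alts)) for alts in entry)
            for entry in entries]


print(compress_partial(['fish:cheese:red', 'chips:cheese:red', 'fish:chalk:red']))
# ['chips/fish:cheese:red', 'fish:chalk:red']
```

Because a merge only happens when every other position is identical, each merged entry expands to exactly the tags it replaced. The scan restarts after each merge, which is simple but quadratic or worse, so this is only reasonable for short lists.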

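Thanks, that's exactly what I wanted. Sorting and input parsing aren't the issue here; I just couldn't figure out how to handle all the combinations. So the number of base elements must be equal, and they are then merged column-wise. Thanks for your time and happy new year ;] – Firebowl2000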