2012-08-08

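Node frequency using networkx

I'm just learning Python, so I'd appreciate any help. I have a two-column dataset: the first column is a unique ID and the second is a string of items. I'm using networkX to build a tree from the data (see below). I need to know the frequency of the items at each level. For example, for the path in A (1, 2, 3, 4), the counts for each node should be 1:4, 2:2, 3:2 and 4:2. That is, node 1 appears 4 times at level 0, node 2 twice at level 1, node 3 twice at level 2 and node 4 twice at level 3. How can I get these node counts?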

My data looks like this:

A  1, 2, 3, 4 
B  1, 2, 1, 4 
C  1, 3, 4, 3 
D  1, 4, 3, 2 
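To make the expected numbers concrete, here is a small sketch that hard-codes the four sample rows (IDs dropped) and tallies the item at each position with collections.Counter. It does not use networkx at all, and zip(*paths) assumes every path has the same length, since it stops at the shortest one.

```python
from collections import Counter

# The four sample paths from the question (customer IDs dropped).
paths = [
    (1, 2, 3, 4),  # A
    (1, 2, 1, 4),  # B
    (1, 3, 4, 3),  # C
    (1, 4, 3, 2),  # D
]

# zip(*paths) yields one tuple per position (level); Counter tallies each one.
# Note: zip stops at the shortest path, so this assumes equal-length paths.
level_counts = [Counter(column) for column in zip(*paths)]

# For path A, look up how often each of its nodes occurs at its own level.
path_a = paths[0]
print({node: level_counts[level][node] for level, node in enumerate(path_a)})
# {1: 4, 2: 2, 3: 2, 4: 2}
```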

Here is my code so far:

import networkx as nx
import matplotlib.pyplot as plt

# create graph
G = nx.MultiGraph()

# read in strings from a tab-separated file
testfile = 'C:…file.txt'

with open(testfile, "r") as f:
    # keep only lines that contain a tab (ID<TAB>path)
    rows = (i for i in f if '\t' in i.rstrip())
    for line in rows:
        customerID, path = line.rstrip().split("\t")
        path2 = path.rstrip("\\").rstrip("}").split(",")
        pathInt = list()
        for x in path2:
            if x.strip():  # skip empty fields
                pathInt.append(int(x))
        print(pathInt)
        varlength = len(pathInt)
        pathTuple = tuple(pathInt)
        # add the prefixes (1,), (1, 2), ... as a chain of nodes;
        # nx.add_path replaces G.add_path, which newer NetworkX removed
        nx.add_path(G, [pathTuple[:i+1] for i in range(0, varlength)])

nx.draw(G)
plt.show()  # display
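The parsing steps inside that loop can be pulled out into a small function and checked without a file or a graph. parse_line is a hypothetical helper name; it mirrors the question's own split("\t"), rstrip("\\").rstrip("}") and split(",") calls, and additionally skips empty fields (an assumption about what the `if x is not None` check was meant to do). The second test line, with trailing `}\`, is made up for illustration.

```python
def parse_line(line):
    """Hypothetical helper: split 'ID<TAB>1, 2, 3, 4' into (ID, tuple of ints)."""
    customer_id, path = line.rstrip().split("\t")
    # Same clean-up as the question: drop a trailing backslash, then a trailing '}'.
    fields = path.rstrip("\\").rstrip("}").split(",")
    # int() ignores surrounding whitespace, so ' 2' parses as 2;
    # empty fields (e.g. from a trailing comma) are skipped.
    return customer_id, tuple(int(x) for x in fields if x.strip())


print(parse_line("A\t1, 2, 3, 4\n"))      # ('A', (1, 2, 3, 4))
print(parse_line("B\t1, 2, 1, 4}\\\n"))   # ('B', (1, 2, 1, 4))
```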

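Does your actual data look different from your example, or is there some other reason you're doing all these rstrip() calls on the path? Do the node counts need to be encoded in the graph, or would an additional data structure do? – 2012-08-08 14:56:54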


@MichaelMauderer Yes, my actual data looks like that, hence the rstrip() calls. No, the counts don't need to be encoded in the graph. Thanks – blue 2012-08-08 19:45:14


@MichaelMauderer Thanks! – blue 2012-08-09 00:00:12

Answer


First, you can convert your list of strings to a tuple of ints a little more concisely:

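pathTuple = tuple(int(x) for x in path2)
# slice the parsed tuple, not the raw string `path`
# (nx.add_path replaces G.add_path, which newer NetworkX removed)
nx.add_path(G, [pathTuple[:i+1] for i in range(len(pathTuple))])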
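It can help to see what that list comprehension actually passes to add_path: the successive prefixes of the tuple, which become the nodes of the tree. The sketch below also shows why the slicing must use pathTuple rather than the raw string path: slicing a string gives character prefixes, not node prefixes.

```python
path = "1, 2, 3, 4"                          # raw string from the file
pathTuple = tuple(int(x) for x in path.split(","))

# Prefixes of the parsed tuple: one node per level of the tree.
prefixes = [pathTuple[:i + 1] for i in range(len(pathTuple))]
print(prefixes)                              # [(1,), (1, 2), (1, 2, 3), (1, 2, 3, 4)]

# Slicing the raw string instead yields character prefixes.
print([path[:i + 1] for i in range(3)])      # ['1', '1,', '1, ']
```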

To store the counts, I would use a defaultdict nested inside a defaultdict: a simple data structure that allows double indexing and defaults to 0.

import collections

# counts[level][node] -> count, with missing entries starting at 0 (int() == 0)
counts = collections.defaultdict(lambda: collections.defaultdict(int))
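A quick check of how the nested defaultdict behaves: the outer dictionary creates an inner one on first access, and the inner one starts every missing count at 0, so `+= 1` works without any setup. int is used as the inner factory here; int() returns 0, so it behaves the same as `lambda: 0`.

```python
import collections

counts = collections.defaultdict(lambda: collections.defaultdict(int))

counts[0][1] += 1   # creates counts[0], then counts[0][1] goes 0 -> 1
counts[0][1] += 1   # 1 -> 2
counts[2][3] += 1   # a different level gets its own inner dict

print(counts[0][1], counts[2][3], counts[1][7])   # 2 1 0
```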

It can be accessed as counts[level][node]. We can then use it to count how often each node appears on each level by looking at the node's position in each path.
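Here is that counting step on its own, applied to the four sample paths already parsed into tuples (hard-coded here instead of read from the file). enumerate supplies each node's position in the path, which serves as its level.

```python
import collections

paths = [(1, 2, 3, 4), (1, 2, 1, 4), (1, 3, 4, 3), (1, 4, 3, 2)]  # A, B, C, D

counts = collections.defaultdict(lambda: collections.defaultdict(int))
for pathTuple in paths:
    for level, node in enumerate(pathTuple):
        counts[level][node] += 1

# Convert to plain dicts for readable output: level -> {node: count}
print({level: dict(nodes) for level, nodes in counts.items()})
# {0: {1: 4}, 1: {2: 2, 3: 1, 4: 1}, 2: {3: 2, 1: 1, 4: 1}, 3: {4: 2, 3: 1, 2: 1}}
```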

With those changes, your code would look like this:

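import collections
import networkx as nx

# create graph
G = nx.MultiGraph()
# counts[level][node] -> number of paths that have `node` at position `level`
counts = collections.defaultdict(lambda: collections.defaultdict(int))

# read in strings from a tab-separated file
testfile = 'C:…file.txt'

with open(testfile, "r") as f:
    # keep only lines that contain a tab (ID<TAB>path)
    rows = (i for i in f if '\t' in i.rstrip())
    for line in rows:
        customerID, path = line.rstrip().split("\t")
        path2 = path.rstrip("\\").rstrip("}").split(",")
        pathTuple = tuple(int(x) for x in path2)
        nx.add_path(G, [pathTuple[:i+1] for i in range(len(pathTuple))])

        # count each node at its level (position) in the path;
        # enumerate the parsed tuple, not the raw string `path`
        for level, node in enumerate(pathTuple):
            counts[level][node] += 1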

You can then do:

level = 0
node = 1
print('Node', node, 'appears', counts[level][node], 'times on level', level)
# output: Node 1 appears 4 times on level 0
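One thing to watch when reading results: with a defaultdict, even looking up a level or node that never occurred inserts a new entry. If you only want to query, .get() avoids that side effect. The sketch rebuilds counts from the sample paths so it runs on its own; count_at is a hypothetical helper name.

```python
import collections

paths = [(1, 2, 3, 4), (1, 2, 1, 4), (1, 3, 4, 3), (1, 4, 3, 2)]
counts = collections.defaultdict(lambda: collections.defaultdict(int))
for pathTuple in paths:
    for level, node in enumerate(pathTuple):
        counts[level][node] += 1


def count_at(counts, level, node):
    """Hypothetical helper: read a count without creating new entries."""
    return counts.get(level, {}).get(node, 0)


print(count_at(counts, 0, 1))   # 4
print(count_at(counts, 9, 1))   # 0, and level 9 is not added
print(9 in counts)              # False

counts[9][1]                    # plain indexing, by contrast...
print(9 in counts)              # True: ...creates level 9 as a side effect
```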