2017-08-13 177 views
0

Python: iterating over lists

My file, f1, contains values that look like this:

ID  X   Y   Z 
1 439748.5728 7948406.945 799.391875 
1 439767.6229 7948552.995 796.977271 
1 439805.7229 7948711.745 819.359365 
1 439799.3729 7948851.446 776.425797 
2 440764.5749 7948991.146 235.551602 
2 440504.2243 7948984.796 326.929119 
2 440104.1735 7948984.796 536.893601 
2 439742.2228 7949003.846 737.887029 
2 438580.1705 7949537.247 196.300929 
3 438142.0196 7947340.142 388.997748 
3 438599.2205 7947333.792 480.580256 
3 439126.2716 7947340.142 669.802869 
4 438453.1702 7947594.143 600.856103 
4 438294.4199 7947657.643 581.018396 
4 438167.4197 7947702.093 515.149846 

I want to run a command (say print, to keep it simple here) using the X, Y and Z values for each ID value in file f1:

import numpy as np 
f1 = ('file1.txt') 


id = np.loadtxt(f1, skiprows=1, usecols=[0]) 
for i in id: 
    x = np.loadtxt(f1, skiprows=1, usecols=[1]) 
    y = np.loadtxt(f1, skiprows=1, usecols=[2]) 
    z = np.loadtxt(f1, skiprows=1, usecols=[3]) 
    print ('The x, y, z lists of id= %g are:' %(i)) 
    print (x,y,z) 

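This code returns the full x, y and z lists once for every row of f1, but I want it to return the x, y and z lists for each distinct value of the ID column.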

For example, for ID = 3, it should return:

[438142.0196, 438599.2205, 439126.2716] [7947340.142, 7947333.792, 7947340.142] [388.997748, 480.580256, 669.802869] 

Any help would be much appreciated!

+0

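The way you are doing it now is also inefficient: it loads the file multiple times. I'd say you could do something like `x, y, z = np.loadtxt(f1, skiprows=1, usecols=[1, 2, 3], unpack=True)`. Then x, y and z are assigned automatically and can be used accordingly. Make sure usecols is correct, though, since it is 0-indexed: 1, 2, 3 actually selects the 2nd, 3rd and 4th columns. Depending on what you want, there may be an even better answer; I'm just trying to pin down what you actually need. – Fallenreaper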

+0

Do you really need a numpy ndarray? – wwii

Answers

0

Make a container for your results:

d = {} 

Iterate over the file and split each line to extract the parts you are interested in:

id_, *xyz = line.strip().split() 

Then add them to the dictionary:

try: 
    d[id_].append(xyz) 
except KeyError: 
    d[id_] = [] 
    d[id_].append(xyz) 

Using collections.defaultdict as the container simplifies the code: you no longer need to handle the KeyError raised the first time an id_ is seen.

import collections

d = collections.defaultdict(list)
... 
    d[id_].append(xyz) 
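
Putting the steps of this answer together, a minimal sketch might look like the following. The helper name `group_rows`, the conversion to `float`, and the inline sample lines are assumptions added for illustration; they are not part of the original answer.

```python
import collections

def group_rows(lines):
    """Group the X, Y, Z values of each data line by its ID (hypothetical helper)."""
    d = collections.defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        id_, *xyz = line.strip().split()
        # Assumption: convert the string fields to floats before storing them.
        d[id_].append([float(v) for v in xyz])
    return d

# Sample lines standing in for the file contents (header row already skipped).
sample = [
    "3 438142.0196 7947340.142 388.997748",
    "3 438599.2205 7947333.792 480.580256",
    "4 438453.1702 7947594.143 600.856103",
]
grouped = group_rows(sample)
print(grouped["3"])
```

With a real file, the same helper could be fed the open file object after skipping the header with `next(f)`. Note that the keys are strings here, because the ID is never converted to a number.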
0

If you are able to use pandas, here is a simple solution:

import pandas as pd 
fname = "file1.txt" 
df = pd.read_csv(fname, sep=r"\s+") # whitespace-separated; substitute the appropriate separator if needed 

for i in df.ID.unique(): 
    print(df.loc[df.ID==i]) 

    ID   X   Y   Z 
0 1 439748.5728 7948406.945 799.391875 
1 1 439767.6229 7948552.995 796.977271 
2 1 439805.7229 7948711.745 819.359365 
3 1 439799.3729 7948851.446 776.425797 
    ID   X   Y   Z 
4 2 440764.5749 7948991.146 235.551602 
5 2 440504.2243 7948984.796 326.929119 
6 2 440104.1735 7948984.796 536.893601 
7 2 439742.2228 7949003.846 737.887029 
8 2 438580.1705 7949537.247 196.300929 
    ID   X   Y   Z 
9 3 438142.0196 7947340.142 388.997748 
10 3 438599.2205 7947333.792 480.580256 
11 3 439126.2716 7947340.142 669.802869 
    ID   X   Y   Z 
12 4 438453.1702 7947594.143 600.856103 
13 4 438294.4199 7947657.643 581.018396 
14 4 438167.4197 7947702.093 515.149846 

To get exactly the output specified in the question, use:

for i in df.ID.unique(): 
    print ('The x, y, z lists of id= %g are:' %(i)) 
    print(df.loc[df.ID==i, ['X','Y','Z']].values) 

The x, y, z lists of id= 1 are: 
[[ 4.39748573e+05 7.94840695e+06 7.99391875e+02] 
[ 4.39767623e+05 7.94855300e+06 7.96977271e+02] 
[ 4.39805723e+05 7.94871175e+06 8.19359365e+02] 
[ 4.39799373e+05 7.94885145e+06 7.76425797e+02]] 
The x, y, z lists of id= 2 are: 
[[ 4.40764575e+05 7.94899115e+06 2.35551602e+02] 
[ 4.40504224e+05 7.94898480e+06 3.26929119e+02] 
[ 4.40104173e+05 7.94898480e+06 5.36893601e+02] 
[ 4.39742223e+05 7.94900385e+06 7.37887029e+02] 
[ 4.38580171e+05 7.94953725e+06 1.96300929e+02]] 
The x, y, z lists of id= 3 are: 
[[ 4.38142020e+05 7.94734014e+06 3.88997748e+02] 
[ 4.38599220e+05 7.94733379e+06 4.80580256e+02] 
[ 4.39126272e+05 7.94734014e+06 6.69802869e+02]] 
The x, y, z lists of id= 4 are: 
[[ 4.38453170e+05 7.94759414e+06 6.00856103e+02] 
[ 4.38294420e+05 7.94765764e+06 5.81018396e+02] 
[ 4.38167420e+05 7.94770209e+06 5.15149846e+02]] 
+0

Instead of iterating over `ID`, you could use `groupby`: `for val, frame in df.groupby('ID'): print(val, frame)`. – blacksite

+0

Agreed! I avoided `groupby` to keep my answer as close as possible to the OP's code. But `groupby` is a more elegant and scalable solution. –
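
The `groupby` suggestion from the comments could be sketched as below. The inlined sample rows, the `io.StringIO` wrapper, and the `xyz_by_id` dictionary are assumptions made so the example is self-contained; with the real file you would pass its name to `read_csv` instead.

```python
import io
import pandas as pd

# A few rows of the question's data, inlined so the sketch runs on its own.
text = """ID X Y Z
3 438142.0196 7947340.142 388.997748
3 438599.2205 7947333.792 480.580256
3 439126.2716 7947340.142 669.802869
4 438453.1702 7947594.143 600.856103
"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# One entry per ID, each holding the x, y and z lists for that ID.
xyz_by_id = {}
for val, frame in df.groupby("ID"):
    xyz_by_id[val] = (frame["X"].tolist(), frame["Y"].tolist(), frame["Z"].tolist())

x, y, z = xyz_by_id[3]
print(x, y, z)
```

Grouping once avoids rescanning the whole frame for every ID, which is what the `df.loc[df.ID==i]` loop does.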

0

How about this:

import numpy as np 
mydata = np.genfromtxt(r'path\to\my\text.txt', skip_header=1) # skip the header row, which is text 

finalArr = [] # collects the X, Y, Z rows of the chosen ID 
for i in range(len(mydata)): 
    if mydata[i][0] == 3: # 3 is the ID (column 1 of the text file); change it to select another ID 
        temp = [] 
        for j in range(1, len(mydata[i])): 
            temp.append(mydata[i][j]) 
        finalArr.append(temp) 

print(finalArr) 
0

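No try-except, no defaultdict, no pandas. Just build the data dictionary using a well-kept secret: you can look up a dict value not only with d[k] but also with the method d.get, which lets you specify a default value to return if the key is not yet present in d, as in d.get(k, default).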

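Our default value has to be an empty list, to which we can append the list of values obtained from the rest of the line, which in turn we can get using the new Python 3 unpacking syntax a, *r = alist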

21:25 $ python 
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux 
Type "help", "copyright", "credits" or "license" for more information. 
>>> # lines = open('yourdata').readlines() 
>>> lines = '''ID  X   Y   Z 
... 1 439748.5728 7948406.945 799.391875 
... 1 439767.6229 7948552.995 796.977271 
... 1 439805.7229 7948711.745 819.359365 
... 1 439799.3729 7948851.446 776.425797 
... 2 440764.5749 7948991.146 235.551602 
... 2 440504.2243 7948984.796 326.929119 
... 2 440104.1735 7948984.796 536.893601 
... 2 439742.2228 7949003.846 737.887029 
... 2 438580.1705 7949537.247 196.300929 
... 3 438142.0196 7947340.142 388.997748 
... 3 438599.2205 7947333.792 480.580256 
... 3 439126.2716 7947340.142 669.802869 
... 4 438453.1702 7947594.143 600.856103 
... 4 438294.4199 7947657.643 581.018396 
... 4 438167.4197 7947702.093 515.149846'''.split('\n') 
>>> d = {} 
>>> ################## TL ; DR ############################### 
>>> for k, *rest in (line.split() for line in lines[1:] if line): 
...  d[k] = d.get(k, []) + [[float(f) for f in rest]] 
... ################## TL ; DR ############################### 
>>> for k in d: 
...  print(k) 
...  for l in d[k]: print('\t', l) 
... 
1 
     [439748.5728, 7948406.945, 799.391875] 
     [439767.6229, 7948552.995, 796.977271] 
     [439805.7229, 7948711.745, 819.359365] 
     [439799.3729, 7948851.446, 776.425797] 
2 
     [440764.5749, 7948991.146, 235.551602] 
     [440504.2243, 7948984.796, 326.929119] 
     [440104.1735, 7948984.796, 536.893601] 
     [439742.2228, 7949003.846, 737.887029] 
     [438580.1705, 7949537.247, 196.300929] 
3 
     [438142.0196, 7947340.142, 388.997748] 
     [438599.2205, 7947333.792, 480.580256] 
     [439126.2716, 7947340.142, 669.802869] 
4 
     [438453.1702, 7947594.143, 600.856103] 
     [438294.4199, 7947657.643, 581.018396] 
     [438167.4197, 7947702.093, 515.149846] 
>>> 

If you need a dictionary of numpy arrays,

>>> import numpy as np 
>>> for k in d: d[k] = np.array(d[k]) 

That's it.
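
To go from the per-ID rows built with `d.get` to the separate x, y and z lists the question asks for, one option is to transpose the rows with `zip`. This is a sketch under the assumption that every data line has exactly three values after the ID; the shortened sample data is only for illustration.

```python
lines = """ID X Y Z
3 438142.0196 7947340.142 388.997748
3 438599.2205 7947333.792 480.580256
3 439126.2716 7947340.142 669.802869
4 438453.1702 7947594.143 600.856103""".split('\n')

d = {}
for k, *rest in (line.split() for line in lines[1:] if line):
    # d.get returns [] the first time k is seen, so no KeyError handling is needed.
    d[k] = d.get(k, []) + [[float(f) for f in rest]]

# zip(*rows) transposes the rows of one ID into its x, y and z columns.
x, y, z = (list(col) for col in zip(*d['3']))
print(x, y, z)
```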

0

The answers here seem overly complicated. Here is a two-liner using only numpy:

Just load the whole file and find the unique IDs:

import numpy as np 

a = np.loadtxt('file1.txt', skiprows=1) 
ids = np.unique(a[:, 0]) # the IDs are in the first column 
# ids = array([ 1., 2., 3., 4.]) 

Then create a list by indexing a at each ID:

b = [a[a[:, 0] == i, 1:] for i in ids] 

This gives:

[array([[ 4.39748573e+05, 7.94840695e+06, 7.99391875e+02], 
     [ 4.39767623e+05, 7.94855300e+06, 7.96977271e+02], 
     [ 4.39805723e+05, 7.94871175e+06, 8.19359365e+02], 
     [ 4.39799373e+05, 7.94885145e+06, 7.76425797e+02]]), 
array([[ 4.40764575e+05, 7.94899115e+06, 2.35551602e+02], 
     [ 4.40504224e+05, 7.94898480e+06, 3.26929119e+02], 
     [ 4.40104173e+05, 7.94898480e+06, 5.36893601e+02], 
     [ 4.39742223e+05, 7.94900385e+06, 7.37887029e+02], 
     [ 4.38580171e+05, 7.94953725e+06, 1.96300929e+02]]), 
array([[ 4.38142020e+05, 7.94734014e+06, 3.88997748e+02], 
     [ 4.38599220e+05, 7.94733379e+06, 4.80580256e+02], 
     [ 4.39126272e+05, 7.94734014e+06, 6.69802869e+02]]), 
array([[ 4.38453170e+05, 7.94759414e+06, 6.00856103e+02], 
     [ 4.38294420e+05, 7.94765764e+06, 5.81018396e+02], 
     [ 4.38167420e+05, 7.94770209e+06, 5.15149846e+02]])] 

For example, if you now want the y values for the first id, just use b[0][:, 1]
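
If looking blocks up by ID is more convenient than by list position, the same indexing can fill a dictionary instead. This is a sketch only: the `by_id` name, the `int` keys, and the inlined sample rows are assumptions for illustration.

```python
import io
import numpy as np

# A few rows of the question's data, inlined so the sketch runs on its own.
text = """ID X Y Z
1 439748.5728 7948406.945 799.391875
3 438142.0196 7947340.142 388.997748
3 438599.2205 7947333.792 480.580256
3 439126.2716 7947340.142 669.802869
"""
a = np.loadtxt(io.StringIO(text), skiprows=1)
ids = np.unique(a[:, 0])

# Map each ID to its (n, 3) block of X, Y, Z values.
by_id = {int(i): a[a[:, 0] == i, 1:] for i in ids}

# Transposing a block yields the x, y and z arrays for that ID.
x, y, z = by_id[3].T
print(x, y, z)
```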