2016-09-15 71 views
2

How to add coordinate arrays as pandas DataFrame rows

I have a text file that looks like this:

,A,B 
0,"[[-81.03443909 29.22855949] 
[-81.09729767 29.27094078] 
[-80.9937973 29.19698906] 
[-81.03072357 29.27445984] 
[-81.00499725 29.22187805]]","[[-81.42427063 28.30874634] 
[-81.42427063 28.30874634] 
[-81.42427063 28.30874634] 
[-81.36068726 28.29172897] 
[-81.42297363 28.30497551] 
[-81.48571777 28.24975777] 
[-81.35914612 28.29036331]]" 

This is what the data I am working with looks like once it is loaded into a pandas DataFrame:

data
[[-78.70117188 33.80754852] 
[-78.9934082 33.61843491] 
[-80.81887817 28.60919952] 
..., 
[-76.62332916 35.54064941] 
[-79.04235077 33.81600952] 
[-79.03309631 33.55596161]] 

And I would like it to look like this:

                    lat      long
cluster point
0       a      0.445900 -1.286198
        b     -0.574496 -0.407154
        c      0.872979  0.068084
        d      0.297255 -2.157051

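Before I created the .txt file, the data was in an ndarray, and I used pandas to write the text file. So maybe there is a way to skip the .txt file altogether and use pandas to split or format the arrays into a tidy DataFrame. I have been at this for a while and I can't figure out how.

This is how I generate my data. To keep things clear I only copy 2 columns, but in the future I would like to pass along a unique point identifier as well.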

# Generate sample data
import pandas as pd

col_1 = "RL15_LONGITUDE"
col_2 = "RL15_LATITUDE"

data = pd.read_csv("input_data.csv")
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
coords = data[[col_1, col_2]].to_numpy()
data = data[[col_1, col_2]].dropna()
data = data.to_numpy().astype('float16', copy=False)
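On carrying a unique point identifier through: one possible sketch is to keep the identifier next to the coordinates and attach the cluster labels afterwards. The `point_id` column, the sample values, and the `labels` array below are all hypothetical; the question does not show how the clusters are computed, so the labels are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical input: a unique point id plus the two coordinate columns.
data = pd.DataFrame({
    "point_id": ["p1", "p2", "p3", "p4"],
    "RL15_LONGITUDE": [-81.03, -81.09, np.nan, -82.18],
    "RL15_LATITUDE": [29.22, 29.27, 28.30, 26.52],
})

# Drop incomplete rows first so ids and coordinates stay aligned.
clean = data.dropna(subset=["RL15_LONGITUDE", "RL15_LATITUDE"])
coords = clean[["RL15_LONGITUDE", "RL15_LATITUDE"]].to_numpy()

# Placeholder labels (assumption); in practice these would come from
# whatever clustering step is run on `coords`, one label per row.
labels = np.array([0, 0, 1])

# Attach the labels and index by (cluster, point_id).
tidy = (clean.assign(cluster=labels)
             .set_index(["cluster", "point_id"])
             .sort_index())
print(tidy)
```

Because the labels are assigned to the same rows that produced `coords`, each identifier stays attached to its point without a separate lookup.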

This is the output of print(clusters):

[array([[-81.03443909, 29.22855949], 
     [-81.09729767, 29.27094078], 
     [-81.42297363, 28.30497551], 
     [-81.48571777, 28.24975777], 
     [-81.35914612, 28.29036331]], dtype=float32), array([[-81.49134064, 27.58896065], 
     [-81.5194931 , 27.63422012], 
     [-81.5096283 , 27.55581093], 
     [-82.05444336, 26.93555069]], dtype=float32), array([[-82.18956757, 26.52433586], 
     [-82.18956757, 26.52433586], 
     [-82.18956757, 26.52433586], 
     [-82.19439697, 26.53297997]], dtype=float32)] 

This is how I created my DataFrame and wrote the .txt file:

clusters = pd.DataFrame({'A':[clusters]}) 
clusters.to_csv('output.txt') 
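For comparison, a frame with one row per point survives a trip through CSV, whereas whole arrays stored in a single cell are written out as their printed text. The sketch below uses a small made-up frame and an in-memory buffer instead of a real file; the column names `lon` and `lat` and the index names `cluster` and `point` are illustrative choices.

```python
import io

import pandas as pd

# A small tidy frame: one row per point, indexed by (cluster, point).
tidy = pd.DataFrame(
    {"lon": [-81.03443909, -81.09729767, -81.49134064],
     "lat": [29.22855949, 29.27094078, 27.58896065]},
    index=pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0)],
                                    names=["cluster", "point"]),
)

# Write to an in-memory buffer (stand-in for a file on disk) and read back.
buffer = io.StringIO()
tidy.to_csv(buffer)
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=["cluster", "point"])
print(restored)
```

Passing the two index column names to `index_col` rebuilds the MultiIndex on the way back in.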
+0

Can you post a sample of the `clusters` variable (i.e. the output of `print(clusters)`)? Parsing this file would be very tricky... – MaxU

+0

@MaxU, please see the edit. That is the (abbreviated) output of `clusters`. – rubito

+0

Do all of your clusters have the same number of points? – MaxU

Answers

1

Here is a starting point:

In [72]: (pd.concat([pd.DataFrame(c, columns=['lon','lat']).assign(cluster=i)
    ....:            for i,c in enumerate(clusters)])
    ....:    .reset_index()
    ....:    .rename(columns={'index':'point'})
    ....: )
Out[72]:
    point        lon        lat  cluster
0       0 -81.034439  29.228559        0
1       1 -81.097298  29.270941        0
2       2 -81.422974  28.304976        0
3       3 -81.485718  28.249758        0
4       4 -81.359146  28.290363        0
5       0 -81.491341  27.588961        1
6       1 -81.519493  27.634220        1
7       2 -81.509628  27.555811        1
8       3 -82.054443  26.935551        1
9       0 -82.189568  26.524336        2
10      1 -82.189568  26.524336        2
11      2 -82.189568  26.524336        2
12      3 -82.194397  26.532980        2

Or with a MultiIndex:

In [73]: (pd.concat([pd.DataFrame(c, columns=['lon','lat']).assign(cluster=i)
    ....:            for i,c in enumerate(clusters)])
    ....:    .reset_index()
    ....:    .rename(columns={'index':'point'})
    ....:    .set_index(['cluster','point'])
    ....: )
Out[73]:
                     lon        lat
cluster point
0       0     -81.034439  29.228559
        1     -81.097298  29.270941
        2     -81.422974  28.304976
        3     -81.485718  28.249758
        4     -81.359146  28.290363
1       0     -81.491341  27.588961
        1     -81.519493  27.634220
        2     -81.509628  27.555811
        3     -82.054443  26.935551
2       0     -82.189568  26.524336
        1     -82.189568  26.524336
        2     -82.189568  26.524336
        3     -82.194397  26.532980
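A possible variation on the same idea, offered as a sketch rather than as part of the answer above: `pd.concat` accepts `keys` and `names`, which builds the (cluster, point) MultiIndex in one step without `assign`, `reset_index`, and `rename`. The sample arrays are a shortened copy of the question's `clusters`. The columns are named `lon`, `lat` in that order because the first column of those arrays holds longitude.

```python
import numpy as np
import pandas as pd

# Shortened sample of the `clusters` list from the question;
# clusters may have different numbers of points.
clusters = [
    np.array([[-81.03443909, 29.22855949],
              [-81.09729767, 29.27094078]], dtype=np.float32),
    np.array([[-81.49134064, 27.58896065],
              [-81.5194931, 27.63422012],
              [-81.5096283, 27.55581093]], dtype=np.float32),
]

# One frame per cluster; each keeps its default 0..n-1 row index.
frames = [pd.DataFrame(c, columns=["lon", "lat"]) for c in clusters]

# `keys` becomes the outer index level, the per-frame row index the inner one.
tidy = pd.concat(frames, keys=list(range(len(frames))),
                 names=["cluster", "point"])
print(tidy)

# Select every point of a single cluster by its outer label.
first_cluster = tidy.loc[0]
```

With this index, `tidy.loc[0]` returns only the points of cluster 0, and the whole frame can be written with `to_csv` directly from memory, with no intermediate text file to parse.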
+1

Wow! Thank you so much! I spent the last day and a half trying to figure this out! – rubito

+0

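@rubito, you're welcome! :) Please consider [accepting](http://meta.stackexchange.com/a/5235) an answer if you think it has solved your problem; doing so also indicates that your question has been answered. – MaxU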

+1

Absolutely! Thanks again. – rubito