2017-07-02 101 views
2

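Pandas iterrows: row number and percentage

I am iterating over a DataFrame with 1,000 rows. Ideally I would like to know how far my loop has progressed, i.e. how many rows it has completed, what percentage of the total rows it has completed, and so on.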

Is there a way to print the row number or, even better, the percentage of rows iterated so far?

My current code is below. At the moment the print statement shows some kind of tuple/list, but what I need is the row number. This is probably simple.

for row in testDF.iterrows(): 

     print("Currently on row: "+str(row)) 

Ideal printed output:

Currently on row 1; Currently iterrated 1% of rows 
Currently on row 2; Currently iterrated 2% of rows 
Currently on row 3; Currently iterrated 3% of rows 
Currently on row 4; Currently iterrated 4% of rows 
Currently on row 5; Currently iterrated 5% of rows 
+0

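Why are you using a loop to begin with? There is most likely a better approach. If you must, progress is easy to compute with 'enumerate', which returns the position of the current row (along with the row itself); that position can be divided by the total number of rows: 'for index, row in enumerate(testDF.iterrows()): ... progress = index/len(testDF)' – DeepSpace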

+0

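I am using an iterrows loop because I am creating a new column from geocoded data. Most services that let you geocode have rate limits, so I have also added a 0.1-second delay inside the loop. – christaylor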

Answers

2

One possible solution uses format, provided the index is the default unique, monotonic one (0, 1, 2, ...):

for i, row in testDF.iterrows(): 
     print("Currently on row: {}; Currently iterrated {}% of rows".format(i, (i + 1)/len(testDF.index) * 100)) 

Sample:

import numpy as np
import pandas as pd

np.random.seed(1332)
testDF = pd.DataFrame(np.random.randint(10, size=(10, 3)))
print (testDF)
   0  1  2
0  8  1  9
1  4  3  5
2  0  1  3
3  1  8  6
4  7  4  7
5  7  5  3
6  7  9  9
7  0  1  2
8  1  3  4
9  0  0  3

for i, row in testDF.iterrows(): 
     print("Currently on row: {}; Currently iterrated {}% of rows".format(i, (i + 1)/len(testDF.index) * 100)) 
Currently on row: 0; Currently iterrated 10.0% of rows 
Currently on row: 1; Currently iterrated 20.0% of rows 
Currently on row: 2; Currently iterrated 30.0% of rows 
Currently on row: 3; Currently iterrated 40.0% of rows 
Currently on row: 4; Currently iterrated 50.0% of rows 
Currently on row: 5; Currently iterrated 60.0% of rows 
Currently on row: 6; Currently iterrated 70.0% of rows 
Currently on row: 7; Currently iterrated 80.0% of rows 
Currently on row: 8; Currently iterrated 90.0% of rows 
Currently on row: 9; Currently iterrated 100.0% of rows 

EDIT:

If the index has custom values, a solution is to zip the rows with numpy.arange over the length of the index, which is the same as the length of the df:

np.random.seed(1332) 
testDF = pd.DataFrame(np.random.randint(10, size=(10, 3)), index=[2,4,5,6,7,8,2,1,3,5]) 
print (testDF) 
   0  1  2
2  8  1  9
4  4  3  5
5  0  1  3
6  1  8  6
7  7  4  7
8  7  5  3
2  7  9  9
1  0  1  2
3  1  3  4
5  0  0  3

for i, (idx, row) in zip(np.arange(len(testDF.index)), testDF.iterrows()): 
    print("Currently on row: {}; Currently iterrated {}% of rows".format(idx, (i + 1)/len(testDF.index) * 100)) 

Currently on row: 2; Currently iterrated 10.0% of rows 
Currently on row: 4; Currently iterrated 20.0% of rows 
Currently on row: 5; Currently iterrated 30.0% of rows 
Currently on row: 6; Currently iterrated 40.0% of rows 
Currently on row: 7; Currently iterrated 50.0% of rows 
Currently on row: 8; Currently iterrated 60.0% of rows 
Currently on row: 2; Currently iterrated 70.0% of rows 
Currently on row: 1; Currently iterrated 80.0% of rows 
Currently on row: 3; Currently iterrated 90.0% of rows 
Currently on row: 5; Currently iterrated 100.0% of rows 
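
To see why the first loop depends on the default index, here is a minimal sketch that computes the percentage both ways on a frame with the same custom index as above. The names df, by_label and by_position are illustrative only, and integer percentages are used to keep the comparison free of floating-point noise.

```python
import numpy as np
import pandas as pd

# Same custom, non-unique index as in the sample above (values are irrelevant here)
df = pd.DataFrame(np.zeros((10, 3)), index=[2, 4, 5, 6, 7, 8, 2, 1, 3, 5])
total = len(df.index)

# Percentage derived from the index label: only correct for a 0..n-1 index
by_label = [int(100 * (label + 1) // total) for label, _ in df.iterrows()]

# Percentage derived from a positional counter: correct for any index
by_position = [100 * (pos + 1) // total for pos, _ in enumerate(df.iterrows())]

print(by_label)     # jumps around, following the labels
print(by_position)  # rises steadily from 10 to 100
```

With a custom index the label-based numbers follow the labels rather than the loop's progress, which is why a separate counter is needed.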
+0

Is printing the way you did it better, or the way I prefer? 'print('Currently on row', i, 'iterated', 100 * i/testDF.shape[0], '%')' Why? Thanks for your answer –

+1

@RayhaneMama - I think there are many possible approaches, and yours works too. I prefer 'len(df.index)' because it is the fastest way. – jezrael

+1

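Note that here 'i' is the index label of each row. This works when the index contains the integers from 0 to len(df)-1, but not if 'testDF' uses custom index values. –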

1

First of all, iterrows yields (index, row) tuples. So the correct code is

for index, row in testDF.iterrows(): 

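In general the index is not the row number; it is some identifier. (This is part of the power of pandas, but it causes some confusion, because it does not behave like an ordinary Python list, where the index is the position of the element.) That is why we need to count the rows independently. We could introduce line_number = 0 and increment it with line_number += 1 on every iteration. But Python gives us a ready-made tool: enumerate, which returns (line_number, value) tuples instead of just value. So we arrive at the code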

for (line_number, (index, row)) in enumerate(testDF.iterrows()): 
    print("Currently on row: {}; Currently iterrated {}% of rows".format(
      line_number, 100*(line_number + 1)/len(testDF))) 
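
As a variation on the loop above, enumerate accepts a start argument, so the counter can begin at 1 and match the output format asked for in the question. This is only a sketch: progress_lines and demo are hypothetical names, and the messages are collected in a list rather than printed inside the loop so they are easy to inspect.

```python
import numpy as np
import pandas as pd

def progress_lines(df):
    """Return one progress message per row, counting rows from 1."""
    total = len(df)
    lines = []
    for line_number, (index, row) in enumerate(df.iterrows(), start=1):
        percent = 100 * line_number // total  # whole-number percentage
        lines.append("Currently on row {}; Currently iterrated {}% of rows".format(
            line_number, percent))
    return lines

# Small frame with a non-numeric index to show the counter ignores the labels
demo = pd.DataFrame(np.zeros((4, 2)), index=list("wxyz"))
for line in progress_lines(demo):
    print(line)
```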

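P.S. Python 2 returns an integer when you divide integers, which is why 999/1000 = 0, which is not what you would expect. So you can either convert to float or put 100* first to get an integer percentage.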
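
The difference can be checked in Python 3, where // reproduces the integer division that Python 2 applied to int / int. The variables done and total are illustrative:

```python
done, total = 999, 1000

print(done // total)        # 0: floor division, what Python 2 did for int / int
print(100 * done // total)  # 99: multiply first, then floor-divide
print(100 * done / total)   # 99.9: true division in Python 3
```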

0

For large DataFrames it may be better to limit printing, which is a time-consuming task. Here is one way:

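import numpy as np
import pandas as pd

dftest = pd.DataFrame(np.random.rand(10**5, 5))

percent = 0
n = len(dftest) // 100  # rows per 1% step; assumes at least 100 rows, otherwise n is 0

# i is used as a position here, so this assumes the default 0..n-1 index
for i, row in dftest.iterrows():
    if (i + 1) // n > percent:
        percent += 1
        print(percent, "% realized")
    dftest.iloc[i] = 2 * row  # a job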
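
The same idea can be sketched without relying on the default index or on a minimum number of rows, by counting positions with enumerate and reporting only when the whole-number percentage changes. The helper iterrows_with_progress and its report parameter are hypothetical names introduced for this example, not part of pandas.

```python
import numpy as np
import pandas as pd

def iterrows_with_progress(df, report=print):
    """Yield (index, row) like df.iterrows(), reporting each new whole percent once."""
    total = len(df)
    last_percent = 0
    for position, (index, row) in enumerate(df.iterrows(), start=1):
        percent = 100 * position // total
        if percent > last_percent:  # at most 100 reports, however many rows
            last_percent = percent
            report("{}% realized".format(percent))
        yield index, row

# Demo: 250 rows with a reversed index; messages are collected instead of printed
messages = []
demo = pd.DataFrame(np.random.rand(250, 2), index=np.arange(250)[::-1])
row_count = sum(1 for _ in iterrows_with_progress(demo, report=messages.append))
print(row_count, len(messages), messages[-1])
```

Passing a different report callable (for example a logger method) changes where the messages go without touching the loop body.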