2016-09-29
0

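Iterating over a CSV and deleting analyzed data

Hello, I am trying to read a CSV file and iterate over each customer's data. To explain, every customer has 12 months of data. I want to analyze each customer's yearly data, save the correlations of that data to a new list, and loop until all customers have been analyzed.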

For example, here is what a single customer's data might look like (simplified case): [image]

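I have been able to get this working to generate the correlations for one customer's data from the CSV. However, my data sheet contains thousands of customers. I would like to use a nested for loop to collect the correlation values for every customer into a list/array. The list would hold one row of correlations for a given customer, and the next row would belong to the next customer.

Here is my current code: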

import numpy 
from numpy import genfromtxt 
overalldata = genfromtxt('C:\Users\User V\Desktop\CUSTDATA.csv', delimiter=',') 
emptylist = [] 
overalldatasubtract = overalldata[13::] 
#This is where I try to use the for loop to go through all the customers. I don't know if len will give me all the rows or the number of columns.
for x in range(0,len(overalldata),11): 
    for x in range(0,13,1): 
      cust_months = overalldata[0:x,1] 
      cust_balancenormal = overalldata[0:x,16] 
      cust_demo_one = overalldata[0:x,2] 
      cust_demo_two = overalldata[0:x,3] 
      num_acct_A = overalldata[0:x,4] 
      num_acct_B = overalldata[0:x,5] 
    #Correlation Calculations 
      demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0] 
      demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0] 
      demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0] 
      demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0] 
      demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0] 
      demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0] 

      result_correlation = [demo_one_corr_balance, demo_two_corr_balance, demo_one_corr_acct_a, demo_one_corr_acct_b, demo_two_corr_acct_a, demo_two_corr_acct_b] 

result_correlation_combined = emptylist.append(result_correlation) 
#This is where I try to delete the rows I have already analyzed. 
overalldata = overalldata[11**x::] 

print result_correlation_combined 
print overalldatasubtract 

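It looked like my adding and subtracting of rows worked, but when I tried it on my larger data set I realized that my approach was completely wrong.

Would you do this differently? I think it can work, but I cannot find my mistake.

Answers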

0

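You use the same variable x for both loops. In the second loop, x goes from 0 to 12 regardless of which customer you are on, and since you use only x as the row number, you are stuck on the first customer.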
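
To illustrate the point, here is a small sketch (toy ranges, not the original data) of what happens when both loops share one name. The inner loop rebinds x on every pass, so the body never sees the outer value:

```python
# Reusing one name for both loops: the inner loop overwrites the outer value.
seen = []
for x in range(0, 24, 12):      # intended: first row of each customer (0, 12)
    for x in range(0, 3):       # inner loop rebinds x to 0, 1, 2
        seen.append(x)

print(seen)  # [0, 1, 2, 0, 1, 2]
```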

Your double loop should instead look like this:

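# loop over the customers (x_customer is the first row of each customer: 0, 12, 24, ...)
for x_customer in range(0,len(overalldata),12):
    # loop over the months
    for x_month in range(0,12,1):
        # row number: x (x_customer already advances in steps of 12)
        x = x_customer + x_month
        ...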

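I changed the bounds and the steps of the loops because:

  • loop 1: each customer has 12 months, so 12 rows per customer -> step = 12
  • loop 2: there are 12 months, so the month index ranges from 0 to 11 -> range(0,12,1)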
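
The stepping described above can be sketched as follows. This is only an illustration under assumptions: the array `data` is synthetic (3 customers, 4 columns) and the names `MONTHS` and `blocks` are made up for the example. It also shows that `len()` of a 2-D NumPy array gives the number of rows, and that a slice can take all 12 rows of a customer at once instead of visiting them month by month:

```python
import numpy as np

# Hypothetical stand-in for the CSV: 3 customers x 12 months, 4 columns.
rng = np.random.default_rng(0)
data = rng.normal(size=(3 * 12, 4))

MONTHS = 12
blocks = []
for start in range(0, len(data), MONTHS):   # len() of a 2-D array is its row count
    block = data[start:start + MONTHS]      # the 12 rows of one customer
    blocks.append(block)

print(len(blocks), blocks[0].shape)  # 3 (12, 4)
```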
+0

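Thanks, this seems to be what I was trying to do, but I still don't get any output. I want to save these correlations with: result_correlation_combined = emptylist.append(result_correlation) But this doesn't seem to save anything, because I keep getting an empty list.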
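
A likely cause of the missing output, shown with a toy list (the names `results` and `returned` are illustrative): `list.append` modifies the list in place and returns `None`, so assigning its return value stores `None` rather than the accumulated list.

```python
results = []
returned = results.append([0.1, 0.2])   # append mutates the list in place

print(returned)  # None
print(results)   # [[0.1, 0.2]]
```

In the question's code that would mean printing emptylist itself after the loop, rather than the value assigned from append.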

0

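Here is how I solved the problem: it was a matter of where my for loop was positioned. A simple indentation issue. Thanks to the poster above for the help.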

for x_customer in range(0,len(overalldata),12):
    for x in range(0,13,1):
        cust_months = overalldata[0:x,1]
        cust_balancenormal = overalldata[0:x,16]
        cust_demo_one = overalldata[0:x,2]
        cust_demo_two = overalldata[0:x,3]
        num_acct_A = overalldata[0:x,4]
        num_acct_B = overalldata[0:x,5]
        #Correlation Calculations
        demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0]
        demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0]
        demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0]
        demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0]
        demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0]
        demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0]

        result_correlation = [(demo_one_corr_balance),(demo_two_corr_balance),(demo_one_corr_acct_a),(demo_one_corr_acct_b),(demo_two_corr_acct_a),(demo_two_corr_acct_b)]
        numpy.savetxt('correlationoutput.csv', (result_correlation))
    result_correlation_combined = emptylist.append([result_correlation])
    cust_delete_list = [0,(x_customer),1]
    overalldata = numpy.delete(overalldata, (cust_delete_list), axis=0)
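
For comparison, here is a minimal sketch of another way to get one row of correlations per customer without deleting rows from the array. It is not the poster's code: the function name `customer_correlations`, the `pairs` argument, and the synthetic three-column data are all assumptions made for the example, and it is written for Python 3. It slices each customer's 12 rows in one step, computes `numpy.corrcoef` for each column pair, and collects the rows in a list.

```python
import numpy as np

MONTHS = 12  # rows per customer, as described in the question

def customer_correlations(data, pairs):
    """Return one row of correlations per 12-row customer block.

    `pairs` is a list of (column_a, column_b) index tuples. The function
    and its arguments are illustrative names, not from the original post.
    """
    rows = []
    for start in range(0, len(data), MONTHS):
        block = data[start:start + MONTHS]          # all 12 months of one customer
        row = [np.corrcoef(block[:, a], block[:, b])[1, 0] for a, b in pairs]
        rows.append(row)                            # append in place, no assignment
    return np.array(rows)

# Synthetic data for 2 customers: column 1 rises with column 0, column 2 falls.
col0 = np.arange(24, dtype=float)
data = np.column_stack([col0, 2 * col0 + 1, -col0])

result = customer_correlations(data, [(0, 1), (0, 2)])
print(result.shape)  # (2, 2)
```

Each row of `result` belongs to one customer, so the whole array can be written once after the loop with numpy.savetxt instead of overwriting the file on every iteration.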