Python for loop stops halfway for no apparent reason while iterating over CSV rows

I am processing a CSV file to analyze feedback data from lectures. The format looks like this:

"5631","18650","10",,,"2015-09-18 09:35:11" 
"18650","null","10",,,"2015-09-18 09:37:12" 
"18650","5631","10",,,"2015-09-18 09:37:19" 
"58649","null","6",,,"2015-09-18 09:38:13" 
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"

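I am trying to filter out bad data. An entry with "id1","id2" is considered valid only if there is another, corresponding entry with "id2","id1".

I am using nested loops to try to find the matching entry for each row. However, the outer loop seems to stop halfway for no reason. Here is my code: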

import csv

class Filter:
    file1 = open('EncodedPeerInteractions.FA2015.csv')
    peerinter = csv.reader(file1, delimiter=',')

    def __init__(self):
        super()

    def filter(self):
        file2 = open('FilteredInteractions.csv', 'a')
        for row in self.peerinter:
            print(row)
            if row[0] == 'null' or row[1] == 'null':
                continue
            id1 = int(row[0])
            id2 = int(row[1])
            for test in self.peerinter:
                if test[0] == 'null' or test[1] == 'null':
                    continue
                if int(test[0]) == id2 and int(test[1]) == id1:
                    file2.write("\n")
                    file2.write(str(row))
                    break
        file2.close()

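I have tried stepping through the code with pdb. Everything was fine for the first couple of iterations, and then it suddenly jumped to file2.close() and returned. The program does print out a few valid entries, but not nearly enough.

I checked the CSV file and it is loaded properly, with over 18000 entries. I also tested with print instead of writing to the file, and it gave the same result, so appending to the file is not the problem.

Edit

Now I understand what the problem is. As this question says, the code happens to work when there is a match, because I break out of the inner loop. When there is no match, the inner loop consumes the rest of the file without resetting the reader, so when control returns to the outer loop, it simply ends. I should either read the rows into a list or reset the reader.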
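
The behaviour described in this edit can be reproduced without the original file. The following is a minimal sketch, assuming a small in-memory sample (the `SAMPLE` string and the helper name `count_outer_iterations` are made up for illustration). It counts how many times the outer loop body runs when both loops share one `csv.reader`, and again when the rows are first copied into a list.

```python
import csv
import io

# Hypothetical sample: three rows, none of which is the reverse of another.
SAMPLE = '"1","2","10"\n"3","4","10"\n"5","6","10"\n'


def count_outer_iterations(rows):
    """Run the nested-loop pattern from the question and count outer passes."""
    outer = 0
    for row in rows:
        outer += 1
        for test in rows:
            # Look for the reciprocal pair. With no match there is no break,
            # so the inner loop runs until `rows` has nothing left to give.
            if test[0] == row[1] and test[1] == row[0]:
                break
    return outer


# One shared reader: the inner loop drains it, so the outer loop ends early.
shared_reader = csv.reader(io.StringIO(SAMPLE))
print(count_outer_iterations(shared_reader))  # prints 1

# A list can be iterated any number of times, so every row is visited.
all_rows = list(csv.reader(io.StringIO(SAMPLE)))
print(count_outer_iterations(all_rows))  # prints 3
```

A `csv.reader` is its own iterator, so the inner `for` does not start over; it continues from wherever the outer loop left off. A list hands out a fresh iterator to each `for` statement, which is why the second call visits all three rows.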

2016-11-05 Bobby

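Quick question: why are you using a class here and calling `super()`? Are you using Python 3? – flybonzai

I don't really know. I was told that it is bad to have logic sitting plainly in the file, and that it should be encapsulated in a class, though passing things around there feels weird. What do you suggest I do? – Bobby

Related to the question above: what is the parent of the `Filter` class? Also, to be safe, don't do `file1 = open('EncodedPeerInteractions.FA2015.csv')`; use `with open('EncodedPeerInteractions.FA2015.csv', "r") as file1` instead. –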

You are making this way more complicated than it needs to be.

Consider:

$ cat /tmp/so.csv 
"5631","18650","10",,,"2015-09-18 09:35:11" 
"18650","null","10",,,"2015-09-18 09:37:12" 
"18650","5631","10",,,"2015-09-18 09:37:19" 
"58649","null","6",,,"2015-09-18 09:38:13" 
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"

You can use csv and filter to get what you want:

>>> import csv
>>> with open('/tmp/so.csv') as f: 
... list(filter(lambda row: 'null' not in row[0:2], csv.reader(f))) 
... 
[['5631', '18650', '10', '', '', '2015-09-18 09:35:11'], 
['18650', '5631', '10', '', '', '2015-09-18 09:37:19'], 
['45379', '31541', '10', 'its friday', 'nothing yet keep it up', '2015-09-18 09:39:46']]

2016-11-05 16:35:51 dawg

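I think this method only gets rid of the null values. If an entry, say `'123','234',x,x,x`, has no corresponding `'234','123',y,y,y`, it is also considered invalid. – Bobby

Then rewrite the `lambda` as a test function that checks that condition for each row... – dawg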
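
One way to read the suggestion above, offered as a sketch rather than as the commenter's actual code: collect every `(id1, id2)` pair first, then keep a row only if the reversed pair was also seen. The names `filter_reciprocal` and `has_reciprocal`, and the inline `SAMPLE` string (copied from the question's data), are assumptions for illustration.

```python
import csv
import io

# Sample rows taken from the question; an in-memory stand-in for the file.
SAMPLE = '''"5631","18650","10",,,"2015-09-18 09:35:11"
"18650","null","10",,,"2015-09-18 09:37:12"
"18650","5631","10",,,"2015-09-18 09:37:19"
"58649","null","6",,,"2015-09-18 09:38:13"
"45379","31541","10","its friday","nothing yet keep it up","2015-09-18 09:39:46"
'''


def filter_reciprocal(rows):
    """Keep rows whose (id1, id2) also occurs somewhere as (id2, id1)."""
    rows = list(rows)  # materialize so the rows can be scanned twice
    # First pass: remember every non-null pair of ids.
    pairs = {(r[0], r[1]) for r in rows if 'null' not in r[0:2]}

    def has_reciprocal(row):
        # Test function replacing the lambda: no nulls, and the reverse exists.
        return 'null' not in row[0:2] and (row[1], row[0]) in pairs

    # Second pass: apply the test to each row.
    return list(filter(has_reciprocal, rows))


valid = filter_reciprocal(csv.reader(io.StringIO(SAMPLE)))
for row in valid:
    print(row)
```

With the sample data this keeps the `5631`/`18650` row and the `18650`/`5631` row, and drops the `45379`/`31541` row because nothing reverses it. The set lookup avoids the nested loop entirely. Note that under this sketch a row whose two ids are equal would count as its own match.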

Try doing something like the following:

import csv

def filter(file1, file2):
    with open(file1, 'r') as f1:
        # Pass the open file object, not the path string, to csv.reader
        peerinter = csv.reader(f1, delimiter=',')
        with open(file2, 'a') as f2:
            for row in peerinter:
                ...

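Using the `with open()` syntax wraps the file in a context manager, which ensures that it is properly closed at the end. I am guessing your problem stems from opening one file as a class variable and the other inside the method.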
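
To fill in the outline above, here is one possible complete version, offered as a sketch under assumptions rather than as this answer's intended code. It reads all usable rows into a list before the nested loop, so the inner loop can restart on every pass, and it writes with `csv.writer` instead of `str(row)`. The function name `filter_interactions` is hypothetical, and both paths are supplied by the caller.

```python
import csv


def filter_interactions(in_path, out_path):
    """Append rows whose (id1, id2) also appears reversed in another row."""
    # Read everything up front; a list can be looped over repeatedly,
    # unlike the csv.reader object itself.
    with open(in_path, newline='') as f_in:
        rows = [r for r in csv.reader(f_in)
                if len(r) >= 2 and 'null' not in r[0:2]]

    kept = 0
    with open(out_path, 'a', newline='') as f_out:
        writer = csv.writer(f_out)
        for row in rows:
            for test in rows:
                # Same nested search as in the question, now over the list.
                if test[0] == row[1] and test[1] == row[0]:
                    writer.writerow(row)
                    kept += 1
                    break
    return kept
```

Calling `filter_interactions('EncodedPeerInteractions.FA2015.csv', 'FilteredInteractions.csv')` would mirror the file names used in the question. The nested loop is quadratic in the number of rows, which may be slow but workable at roughly 18000 entries; holding the rows in a list is what lets the inner loop start from the top each time.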

2016-11-05 16:36:04 flybonzai
