Count lines in multiple CSV files, skipping empty lines


I need to get the length of each CSV file in '/dir/', excluding empty lines. I tried this:

import os, csv, itertools, glob 

# To filter the empty lines
def filterfalse(predicate, iterable): 
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8 
    if predicate is None: 
     predicate = bool 
    for x in iterable: 
     if not predicate(x): 
      yield x 

#To read each file in '/dir/', compute the length and write the output 'count.csv' 
with open('count.csv', 'w') as out: 
    file_list = glob.glob('/dir/*') 
    for file_name in file_list: 
     with open(file_name, 'r') as f: 
       filt_f1 = filterfalse(lambda line: line.startswith('\n'), f) 
       count = sum(1 for line in f if (filt_f1)) 
       out.write('{c} {f}\n'.format(c = count, f = file_name))

I get output in the format I want, but unfortunately the length reported for each file in '/dir/' still includes the empty lines.

To see the empty lines, I opened file.csv as file.txt, and it looks like this:

*text,favorited,favoriteCount,... 
"Retweeted user (@user):... 
'empty row' 
Do Operators...*

Source

2016-04-25 user2278505

I would suggest using pandas.

import pandas

# Reads the CSV file into a pandas DataFrame (blank lines are skipped by default).
df = pandas.read_csv('myfile.csv')

# Removes rows in which every field is missing.
df.dropna(how='all', inplace=True)

# Gets the number of rows and displays it.
# The + 1 presumably accounts for the header line, which is not a data row.
df_length = len(df) + 1
print('The length of the CSV file is', df_length)

Documentation: http://pandas.pydata.org/pandas-docs/version/0.18.0/

Source

2016-04-25 18:29:29

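Your filterfalse() function works correctly. It is exactly the same as the function named ifilterfalse in the standard library's itertools module (called filterfalse in Python 3), so it's unclear why you don't just use that instead of writing your own. One of its main advantages is that it has already been tested and debugged. (Built-ins are also often faster, because many are written in C.)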

The problem is that you aren't using the generator function correctly.

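Since it returns a generator object, you need to iterate over the multiple values it will potentially yield, with code like for line in filt_f1.
The predicate function argument you gave it doesn't properly handle lines that contain other leading whitespace characters, such as spaces and tabs, so the lambda you pass it needs to be modified to handle those cases too.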
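
To illustrate the first point, here is a minimal sketch (the sample text and variable names are made up for illustration, and Python 3 is assumed) of why the `if (filt_f1)` test in the question never filters anything: a generator object is always truthy, so the condition passes for every line. The predicate is only applied once the generator itself is iterated.

```python
import io
from itertools import filterfalse  # Python 3 name; Python 2 calls it ifilterfalse

# Hypothetical file contents: two data lines and two blank lines.
sample = "a,b\n\n1,2\n\n"

f = io.StringIO(sample)
gen = filterfalse(lambda line: line.startswith('\n'), f)

# A generator object is truthy regardless of what it would yield.
generator_is_truthy = bool(gen)

# The question's pattern: iterates f directly, so blank lines are counted too.
f.seek(0)
unfiltered_count = sum(1 for line in f if gen)

# Iterating the generator is what actually applies the predicate.
f.seek(0)
filtered_count = sum(1 for line in filterfalse(lambda line: line.startswith('\n'), f))

print(generator_is_truthy, unfiltered_count, filtered_count)  # True 4 2
```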

The code below has both of these changes.

import os, csv, itertools, glob

# To filter the empty lines
def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x

# To read each file in '/dir/', compute the length and write the output 'count.csv'
with open('count.csv', 'w') as out:
    file_list = glob.glob('/dir/*')
    for file_name in file_list:
        with open(file_name, 'r') as f:
            filt_f1 = filterfalse(lambda line: not line.strip(), f)  # CHANGED
            count = sum(1 for line in filt_f1)  # CHANGED
            out.write('{c} {f}\n'.format(c=count, f=file_name))
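
As a variation, the same count could probably be done with the standard library's function rather than a hand-written copy. This is only a sketch, assuming Python 3 (where the function is itertools.filterfalse). The helper names count_nonblank_lines and write_counts are invented here for illustration, and the '/dir/*' pattern in the closing comment is taken from the question.

```python
import glob
from itertools import filterfalse


def count_nonblank_lines(lines):
    """Count lines that contain something other than whitespace."""
    # The predicate is true for blank lines, so filterfalse drops them.
    return sum(1 for _ in filterfalse(lambda line: not line.strip(), lines))


def write_counts(pattern, out_path):
    """Write '<count> <file name>' for every file matching pattern."""
    with open(out_path, 'w') as out:
        for file_name in glob.glob(pattern):
            with open(file_name, 'r') as f:
                out.write('{c} {f}\n'.format(c=count_nonblank_lines(f), f=file_name))


# Example call, using the paths from the question:
# write_counts('/dir/*', 'count.csv')
```

Keeping the counting logic in its own function makes it easy to check against a small list of strings before running it over real files.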

Source

2016-04-25 19:11:05 martineau

Thanks, it partially works (i.e. I can still find some empty lines) – user2278505
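
One possible explanation for the remaining empty lines is that some rows are not truly empty text lines. This is an assumption, since the actual files are not shown. A row may consist only of separators (for example `,`), or a quoted field may contain embedded newlines, as the opening quote in the sample from the question hints. A rough sketch that counts parsed records rather than physical lines, using the standard csv module, might look like this (the helper name count_nonempty_records and the sample data are made up for illustration):

```python
import csv
import io


def count_nonempty_records(lines):
    """Count CSV records that have at least one non-blank field."""
    # csv.reader returns [] for a blank line, and joins a quoted field
    # that spans several physical lines into a single record.
    return sum(1 for row in csv.reader(lines)
               if any(field.strip() for field in row))


# Hypothetical data: a header, a record whose quoted text contains a
# blank line, a separators-only row, a blank line, and a final record.
sample = 'text,favorited\n"first\n\nsecond",FALSE\n,\n\nlast,TRUE\n'
record_count = count_nonempty_records(io.StringIO(sample, newline=''))
print(record_count)  # 3
```

With real files, the csv documentation recommends opening them with newline='' (for example open(file_name, newline='')) so that newlines inside quoted fields are handled correctly.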
