CSV: reading the values of a column

I need to parse a CSV file.

Input: a file name, plus a table like this:

Index | writer | year | words 
    0  | Philip | 1994 | this is first row 
    1  | Heinz | 2000 | python is wonderful (new line) second line 
    2  | Thomas | 1993 | i don't like this 
    3  | Heinz | 1898 | this is another row 
    .  |  .  | . |  . 
    .  |  .  | . |  . 
    N  | Fritz | 2014 | i hate man united

Output: a list of all the words that belong to a given name

l = ['python is wonderful second line', 'this is another row']

What have I tried?

import csv 
import sys 

class artist: 
    def __init__(self, name, file): 
     self.file = file 
     self.name = name 
     self.list = [] 

    def extractText(self): 
     with open(self.file, 'rb') as f: 
      reader = csv.reader(f) 
      temp = list(reader) 
     k = len(temp) 
     for i in range(1, k): 
      s = temp[i] 
      if s[1] == self.name: 
       self.list.append(str(s[3])) 


if __name__ == '__main__': 
    # arguments
    inputFile = str(sys.argv[1]) 
    Heinz = artist('Heinz', inputFile) 
    Heinz.extractText() 
    print(Heinz.list)

The output is:

["python is wonderful\r\nsecond line", 'this is another row']

How can I get rid of the \r\n in cells that contain multi-line text, and can the loop be improved, since it is extremely slow?

Source

2017-05-07 Tony Tannous

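This should at least be faster, because you process the file while you are reading it, and then strip out the unwanted carriage return and newline characters if they are present.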

with open(self.file) as csv_fh:
    for n in csv.reader(csv_fh):
        if n[1] == self.name:
            # replace the embedded CRLF with a single space
            self.list.append(n[3].replace('\r\n', ' '))
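
For readers on Python 3, the same idea can be sketched as a standalone function. This is only an illustration: the function name `words_for_writer`, the sample data, and the use of `io.StringIO` in place of a real file are assumptions and not part of the answer. With a real file, the csv module documentation recommends opening it with `newline=''` so that line endings inside quoted cells are preserved.

```python
import csv
import io

def words_for_writer(csv_file, writer):
    """Return the 'words' cells of every row whose writer column matches."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    # Rows are filtered while they are read, so no full copy of the file is built.
    return [row[3].replace('\r\n', ' ') for row in reader if row[1] == writer]

# Hypothetical sample data mirroring the table in the question.
SAMPLE = (
    'Index,writer,year,words\r\n'
    '0,Philip,1994,this is first row\r\n'
    '1,Heinz,2000,"python is wonderful\r\nsecond line"\r\n'
    '2,Thomas,1993,i don\'t like this\r\n'
    '3,Heinz,1898,this is another row\r\n'
)

# newline='' keeps the embedded \r\n intact, as open(path, newline='') would.
result = words_for_writer(io.StringIO(SAMPLE, newline=''), 'Heinz')
print(result)
```

Note that if the file is opened in default text mode on Python 3, `\r\n` is translated to `\n` while reading, so replacing `'\r\n'` alone would no longer match.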

Source

2017-05-07 23:37:33 salparadise

You can simply use pandas to get the list:

import pandas 
df = pandas.read_csv('test1.csv') 
index = df[df['writer'] == "Heinz"].index.tolist() # get the specific name's index 
l = list() 
for i in index: 
    l.append(df.iloc[i, 3].replace('\n','')) # get the cell and strip new line '\n', append to list. 
l

Output:

['python is wonderful second line', 'this is another row']

Source

2017-05-07 23:27:13

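This is not what I want. I need the words of one specific writer/artist, not all of the words. –

@TonyTannous I have updated the answer to handle a specific writer. –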

To get rid of the newlines in s[3], I suggest ' '.join(s[3].splitlines()). See the documentation for "".splitlines, and also "".translate.
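
A minimal sketch of that suggestion, runnable on its own. The helper name `flatten_cell` is made up for illustration; only `str.splitlines` and `str.join` come from the answer.

```python
def flatten_cell(cell):
    """Join the lines of a multi-line cell with single spaces."""
    # str.splitlines() splits on \n, \r\n and \r alike, so no line-ending
    # characters survive in the pieces that are joined back together.
    return ' '.join(cell.splitlines())

print(flatten_cell('python is wonderful\r\nsecond line'))
print(flatten_cell('this is another row'))
```

Because `splitlines` treats all common line endings the same way, this works whether the cell contains `\r\n` or a bare `\n`.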

To improve the loop:

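def extractText(self):
    # 'rb' is the Python 2 convention for csv; on Python 3 use open(self.file, newline='')
    with open(self.file, 'rb') as f:
        # iterate over the reader directly instead of building a list first
        for s in csv.reader(f):
            if s[1] == self.name:
                self.list.append(str(s[3]))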

This saves one pass over the data.

But please consider @Tiny.D's suggestion and give pandas a try.

Source

2017-05-07 23:33:47 tiwo

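But then I would be holding the entire text in each object before deleting some of the rows, wouldn't I? I need specific words, not all of them. –

The original code copies the entire file contents into memory with 'temp = list(reader)'; here each row is checked against s[1] == self.name, and most rows are discarded. – tiwo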

To collapse runs of whitespace you can use a regular expression, and to speed things up a little, try a list comprehension:

import re 

def extractText(self): 
    RE_WHITESPACE = re.compile(r'[ \t\r\n]+') 
    with open(self.file, 'rU') as f: 
     reader = csv.reader(f) 

     # skip the first line 
     next(reader) 

     # put all of the words into a list if the artist matches 
     self.list = [RE_WHITESPACE.sub(' ', s[3]) 
        for s in reader if s[1] == self.name]
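
To see what the pattern does in isolation, here is a small standalone sketch. The helper name `collapse_whitespace` and the sample strings are assumptions made for illustration; the regular expression is the one from the answer.

```python
import re

# Same pattern as in the answer: one or more spaces, tabs, CRs or LFs.
RE_WHITESPACE = re.compile(r'[ \t\r\n]+')

def collapse_whitespace(text):
    """Replace every run of whitespace characters with a single space."""
    return RE_WHITESPACE.sub(' ', text)

print(collapse_whitespace('python is wonderful\r\nsecond line'))
print(collapse_whitespace('too   many \t spaces\n\nhere'))
```

Leading or trailing whitespace is reduced to a single space rather than removed, so a final `.strip()` may be wanted if cells can start or end with blanks.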

Source

2017-05-07 23:39:28
