Reformat a column to keep only the first 5 characters

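I'm new to Python and I'm struggling with this part. I have a text file with about 25 columns and more than 50,000 rows. One of the columns, #11 (ZIP), holds every customer's ZIP code in the format "07598-XXXX". I only want the first 5 characters, and I need to do this for the whole column, but I'm confused about how to write it given my current logic. So far my code removes the rows that contain certain strings, and I am also using the '|' delimiter to format the result nicely as a CSV.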

State | ZIP (#11) | Column 12 | ...

NY | 60169-8547 | 98

NY | 60169-8973 | 58

NY | 11219-4598 | 25

NY | 11219-8475 | 12

NY | 20036-4879 | 56

How do I loop over the ZIP column and keep only the first 5 characters? Thanks for your help!

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM', '-INAC', 'TO-INAC', 'TO_INAC', 'SHIP_TO-inac', 'SHIP_TOINAC']

with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        # skip any row in which some field contains one of the unwanted strings
        if not any(remove_word in element for element in line for remove_word in remove_words):
            writer.writerow(line)


2016-10-04 Cesar

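Handle the title line separately, then read line by line as you already do, and just modify the second column of line by truncating it to 5 characters.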

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM', '-INAC', 'TO-INAC', 'TO_INAC', 'SHIP_TO-inac', 'SHIP_TOINAC']

with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    # iterate over title line and write it as-is
    writer.writerow(next(cr))
    for line in cr:
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:5]  # truncate
            writer.writerow(line)

Alternatively, you can use line[1] = line[1].split("-")[0], which keeps everything to the left of the hyphen.
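The two approaches agree on values such as "60169-8547" but differ when a value has no hyphen. A minimal sketch of the difference, assuming hypothetical helper names zip5_slice and zip5_split and made-up sample values that are not from the question's file:

```python
def zip5_slice(value):
    # Keep at most the first five characters, whatever they are.
    return value[:5]


def zip5_split(value):
    # Keep everything to the left of the first hyphen.
    return value.split("-")[0]


# Hypothetical sample values: one ZIP+4 with a hyphen, one without.
for sample in ["60169-8547", "601698547"]:
    print(sample, zip5_slice(sample), zip5_split(sample))
```

Slicing always caps the result at five characters, while splitting returns the whole string unchanged when there is no hyphen, so which one fits depends on how clean the ZIP column is.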

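Note the special handling of the title line: cr is an iterator. I just consume its first item manually, before the for loop, so that the title line is passed through unchanged.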
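To see what next(cr) does in isolation, here is a small sketch that reads from an in-memory string instead of a file. The sample text and the use of io.StringIO are assumptions made for illustration only:

```python
import csv
import io

# Hypothetical three-line sample standing in for NVG.txt.
sample = "State|ZIP|Col12\nNY|60169-8547|98\nNY|11219-4598|25\n"

cr = csv.reader(io.StringIO(sample), delimiter="|")

header = next(cr)  # pulls exactly one row (the title line) from the iterator
rows = list(cr)    # the iterator continues from the second line

print(header)
print(rows)
```

Because the reader remembers its position, the for loop that follows next(cr) starts at the first data row and never sees the title line.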


2016-10-04 20:19:49

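This works great! Thank you!! One question: how does "writer.writerow(next(cr))" work? I'm a bit confused by that part, especially the cr inside it. – Cesar

See the explanation in my new edit. –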

'{:.5}'.format(zip_)

where zip_ is the string containing the ZIP code. More about format here: https://docs.python.org/2/library/string.html#format-string-syntax
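As a quick check of the format approach, the sketch below applies the precision spec to a sample value and compares it with a plain slice. The f-string line assumes Python 3.6 or later, and the sample value is only illustrative:

```python
zip_ = "11219-4598"

# For strings, a precision of .5 truncates the value to at most 5 characters.
by_format = "{:.5}".format(zip_)
by_fstring = f"{zip_:.5}"  # same format spec, f-string syntax
by_slice = zip_[:5]

print(by_format, by_fstring, by_slice)
```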


2016-10-04 20:16:22

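Please don't use 'zip' as the name of a string. It is a built-in function. – dawg

Very good point –

It is probably better (more efficient and more idiomatic) to slice to get a substring: '11219-4598'[:5] – dawg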

To get the first 5 characters of a string, use str[:6]

In your case:

with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    for line in csv.reader(infile, delimiter='|'):
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:6]
            writer.writerow(line)

line[1] = line[1][:6] will set column 2 in your file to its own first 5 characters.


2016-10-04 20:34:17 Dule

Uh, no, that gives 6 characters... –

@jean You're right – Dule
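The point raised in these comments is easy to verify: the stop index of a slice is exclusive, so [:5] keeps indices 0 through 4 and [:6] keeps one more. A small sketch with an illustrative value:

```python
zip_code = "11219-4598"

first_five = zip_code[:5]  # indices 0..4, the five digits
first_six = zip_code[:6]   # indices 0..5, which also picks up the hyphen

print(repr(first_five), len(first_five))
print(repr(first_six), len(first_six))
```

So for the ZIP column, line[1][:5] is the slice that yields the five-digit code.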
