Extracting rows from a CSV file based on specific keywords

I wrote some code to help me retrieve the rows of a CSV file that contain specific keywords:

import re

keywords = {"metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords

# Only "energy" is matched for now; the keywords set above is not used yet
keyre = re.compile("energy", re.IGNORECASE)
with open("2006-data-8-8-2016.csv") as infile:
    with open("new_data.csv", "w") as outfile:
        outfile.write(infile.readline())  # Save the header
        for line in infile:
            if keyre.search(line):
                outfile.write(line)

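I need it to search for every keyword in the two main columns of the data, "position" and "Job description", and then write each entire row that contains any of these words to a new file. Any ideas on the simplest way to do this?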

Source

2017-08-27 Eng.Reem

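I need it to go through all the keywords. For example, it should find the rows that include the word "metal" under "position" and "Job description", extract those entire rows and write them to the file, then move on to the second word and do the same, and so on until the last word.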
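For reference, here is a minimal sketch of that requirement using only the standard library. It assumes the two columns are named exactly position and Job description, as in the question. The filter_rows helper and the sample rows are made up for illustration and stand in for the real file.

```python
import csv
import io
import re

KEYWORDS = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists", "electronic", "workers"]

# One case-insensitive pattern that matches any of the keywords.
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def filter_rows(infile, outfile, columns=("position", "Job description")):
    """Copy the header and every row whose `columns` mention a keyword."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        # `or ""` covers cells that are missing in short rows.
        if any(KEYWORD_RE.search(row[col] or "") for col in columns):
            writer.writerow(row)  # the whole row, including the other columns
            kept += 1
    return kept


# Made-up sample data standing in for 2006-data-8-8-2016.csv.
sample = io.StringIO(
    "position,Job description,salary\n"
    "Solar Engineer,Designs panels,100\n"
    "Chef,Cooks meals,50\n"
    "Analyst,Builds FINANCIAL models,80\n"
)
result = io.StringIO()
kept_count = filter_rows(sample, result)
print(result.getvalue())
```

With real files, the same function can be called on handles opened with open(path, newline=""), which is what the csv module documentation recommends.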

Try this: loop over the DataFrame and write the matching rows to a new CSV file.

import pandas as pd

keywords = {"metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords

df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")

listMatchPosition = []
listMatchDescription = []

for i in range(len(df.index)):
    # str() guards against empty (NaN) cells, which are floats and would raise a TypeError
    position = str(df['position'][i])
    description = str(df['Job description'][i])
    if any(x in position or x in description for x in keywords):
        listMatchPosition.append(df['position'][i])
        listMatchDescription.append(df['Job description'][i])

# Build a new DataFrame from the matching rows and write it out
output = pd.DataFrame({'position': listMatchPosition, 'Job description': listMatchDescription})
output.to_csv("new_data.csv", index=False)

Edit: if you have many columns to include, the modified code below will do the job.

df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")

# Empty DataFrame with the same columns as the input
output = pd.DataFrame(columns=df.columns)

for i in range(len(df.index)):
    if any(x in df['position'][i] or x in df['Job description'][i] for x in keywords):
        # Append the whole row, with every column
        output.loc[len(output)] = [df[j][i] for j in df.columns]

output.to_csv("new_data.csv", index=False)
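As a possible alternative to appending rows one at a time, the same selection can be sketched with a boolean mask, which keeps every column automatically. The mentions_keyword helper, the shortened keyword set and the sample rows below are made up for illustration. Lower-casing the cell text is an added assumption to make the match case-insensitive.

```python
import pandas as pd

keywords = {"metal", "energy", "solar", "financial"}  # shortened list for the demo

# Made-up rows standing in for the real CSV; None simulates an empty cell.
df = pd.DataFrame({
    "position": ["Solar Engineer", "Chef", None],
    "Job description": ["Designs panels", "Cooks meals", "Builds financial models"],
    "salary": [100, 50, 80],
})


def mentions_keyword(value):
    """True if the cell is a string containing any keyword (case-insensitive)."""
    return isinstance(value, str) and any(k in value.lower() for k in keywords)


# Keep a row if either of the two columns mentions a keyword.
mask = df["position"].map(mentions_keyword) | df["Job description"].map(mentions_keyword)
output = df[mask]
print(output)
```

Calling output.to_csv("new_data.csv", index=False) afterwards would write the filtered rows in the same way as the loop version.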

Source

2017-08-27 11:47:48

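Note that this works even if "Job description" is not a single word (which I assume it is not), unlike the DataFrame.isin approach.

The CSV file also contains other columns that I need to extract and put in the new file. Any idea how? @Vincent K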

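You mean columns like "salary" and "location" need to be extracted along with the others? If so, and it is only a few more columns, just add more listMatchxxx lists.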

You can do this with pandas as follows, if you are looking for rows where the cell is exactly one of the keywords in the list:

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"]

# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows whose position or Job description
# cell is exactly equal to one of the keywords
df = df[df["position"].isin(keywords) | df["Job description"].isin(keywords)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
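To make the behaviour of isin concrete, here is a small sketch on a made-up Series. isin compares whole cell values, so a multi-word cell never matches a single keyword, whereas str.contains looks inside the text.

```python
import pandas as pd

keywords = ["energy", "financial"]
# Made-up cell values for illustration.
positions = pd.Series(["energy", "Energy Analyst", "financial engineering"])

# Exact, whole-cell comparison.
exact = positions.isin(keywords)
# Substring search, case-insensitive, treating missing cells as non-matches.
partial = positions.str.contains("|".join(keywords), case=False, na=False)

print(exact.tolist())    # [True, False, False]
print(partial.tolist())  # [True, True, True]
```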

If you are looking for rows that contain a keyword as a substring (for example, matching financial inside financial engineering), you can do the following:

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"]
# regular expression that matches any of the keywords
searched_keywords = '|'.join(keywords)

# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows that contain one of the keywords
# in the position or the Job description columns
# na=False treats empty cells as non-matches instead of producing NaN in the mask
df = df[df["position"].str.contains(searched_keywords, na=False)
        | df["Job description"].str.contains(searched_keywords, na=False)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
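Two refinements may be worth considering, sketched below on made-up data. re.escape protects against keywords that contain regex metacharacters, and case=False ignores capitalisation. Wrapping the alternation in \b word boundaries is an optional, stricter variant that stops "team" from matching inside "Steam". Whether that stricter matching is wanted depends on the data.

```python
import re

import pandas as pd

keywords = ["metal", "team"]  # shortened list for the demo

# Escape each keyword so it is matched literally, then join into one pattern.
alternation = "|".join(re.escape(k) for k in keywords)
loose_pattern = alternation
# Non-capturing group plus word boundaries: whole words only.
strict_pattern = r"\b(?:" + alternation + r")\b"

# Made-up descriptions; None simulates an empty cell.
descriptions = pd.Series(
    ["Steam turbine operator", "Leads a team of five", "METAL worker", None]
)

loose = descriptions.str.contains(loose_pattern, case=False, na=False)
strict = descriptions.str.contains(strict_pattern, case=False, na=False)

print(loose.tolist())
print(strict.tolist())
```

Here the loose pattern accepts the first row because "Steam" contains "team", while the strict pattern rejects it.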

Source

2017-08-27 11:56:39 MedAli

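This is simple and looks good, and I understand the code. But it does not save any data, only the header :( even though I am sure many of the keywords appear in the file, specifically in the position and Job description columns. @MedAli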
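When only the header is written, it can help to check what pandas actually sees. The sketch below is a guess at a useful diagnostic, not a confirmed explanation of this case. It normalises the column names and counts how many rows mention each keyword. The sample data, including the stray spaces and capitalisation in the header, is made up.

```python
import io

import pandas as pd

# Made-up sample with untidy header names.
raw = io.StringIO(
    " Position ,Job Description\n"
    "Solar Engineer,Designs panels\n"
    "Chef,Cooks meals\n"
)
df = pd.read_csv(raw)

# Strip whitespace and lower-case the column names before using them.
df.columns = df.columns.str.strip().str.lower()
print(list(df.columns))

keywords = ["solar", "energy"]
# Combine both columns into one text field, treating empty cells as "".
text = df["position"].fillna("") + " " + df["job description"].fillna("")
# Count matching rows per keyword (plain substring, case-insensitive).
hits = {k: int(text.str.contains(k, case=False, regex=False).sum()) for k in keywords}
print(hits)
```

If every count is zero on the real file, the mismatch is probably in the data (separator, column names, capitalisation) and not in the filter itself.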

@Eng.Reem Can you share a sample of your data? – MedAli

This will not work, because the "Job description" column contains more than a single word.
