2017-08-27 234 views
0

I wrote some code to help me extract rows from a CSV file based on specific keywords:

import re

keywords = {"metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords

# Note: only "energy" is searched here; the keywords set is not used yet
keyre = re.compile("energy", re.IGNORECASE)
with open("2006-data-8-8-2016.csv") as infile:
    with open("new_data.csv", "w") as outfile:
        outfile.write(infile.readline())  # Save the header
        for line in infile:
            if keyre.search(line):
                outfile.write(line)

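I need it to search for every keyword in two main columns of the data, "position" and "Job description", and then write each entire row that contains one of those words to a new file. Any ideas on the simplest way to do this?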

+0

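I need it to go through all the keywords. For example, it should find the rows that include the word "metal" under "position" and "Job description", extract those whole rows and write them to the file, then look for the second word and do the same, and so on until the last word –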
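For reference, here is a minimal sketch of that requirement using only the standard library, assuming the two column headers are spelled exactly "position" and "Job description" and that a substring match is acceptable. The function name `filter_rows` and the sample rows are hypothetical, made up for illustration.

```python
import csv
import io
import re

KEYWORDS = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"]
COLUMNS = ["position", "Job description"]  # assumed header names


def filter_rows(infile, outfile, keywords=KEYWORDS, columns=COLUMNS):
    """Copy the header plus every row with a keyword in one of the columns."""
    # One case-insensitive pattern that matches any keyword as a substring.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames,
                            extrasaction="ignore")
    writer.writeheader()
    kept = 0
    for row in reader:
        # "or ''" guards against missing cells in short rows.
        if any(pattern.search(row.get(col) or "") for col in columns):
            writer.writerow(row)
            kept += 1
    return kept


# In-memory demonstration with made-up data.
sample = io.StringIO(
    "position,Job description,salary\n"
    "Solar Engineer,Designs panels,100\n"
    "Cook,Prepares meals,50\n"
    "Clerk,Supports the finance TEAM,60\n"
)
result = io.StringIO()
kept_count = filter_rows(sample, result)
```

With real files, the same function could be called on handles opened with `newline=""`, which is what the `csv` module documentation recommends.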

Answers

0

Try this: loop over the dataframe and write the new dataframe back to a csv file.

import pandas as pd

keywords = {"metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords

df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")

listMatchPosition = []
listMatchDescription = []

for i in range(len(df.index)):
    # str() guards against empty (NaN) cells, which are floats and cannot be searched
    position = str(df['position'][i])
    description = str(df['Job description'][i])
    if any(x in position or x in description for x in keywords):
        listMatchPosition.append(df['position'][i])
        listMatchDescription.append(df['Job description'][i])

output = pd.DataFrame({'position': listMatchPosition, 'Job description': listMatchDescription})
output.to_csv("new_data.csv", index=False)

Edit: if you have many columns to include, the modified code below will do the job.

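df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")

# Empty dataframe with the same columns as the input
output = pd.DataFrame(columns=df.columns)

for i in range(len(df.index)):
    # keywords is the set defined above; str() guards against empty (NaN) cells
    if any(x in str(df['position'][i]) or x in str(df['Job description'][i]) for x in keywords):
        # Append the whole matching row, all columns included
        output.loc[len(output)] = [df[j][i] for j in df.columns]

output.to_csv("new_data.csv", index=False)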
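
As an alternative to the row-by-row loop, the same "keep every column" result can be sketched with a boolean mask. This is only an illustration on made-up rows: the helper `has_keyword` is a hypothetical name, and the case-insensitive matching is an assumption taken from the `re.IGNORECASE` in the question.

```python
import re

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"]

# Made-up rows standing in for the real CSV file.
df = pd.DataFrame({
    "position": ["Metal Worker", "Chef", None],
    "Job description": ["Cuts steel", "Prepares meals", "Sells energy plans"],
    "salary": [10, 20, 30],
})

# Regular expression matching any of the keywords as a substring.
pattern = "|".join(re.escape(k) for k in keywords)


def has_keyword(column):
    # na=False treats empty cells as "no match" instead of propagating NaN.
    return column.str.contains(pattern, case=False, na=False)


mask = has_keyword(df["position"]) | has_keyword(df["Job description"])
output = df[mask]  # all columns are kept for the matching rows
```

Calling `output.to_csv("new_data.csv", index=False)` afterwards would write the filtered rows, as in the code above.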
+0

Note that this works even when "Job description" is more than a single word (which I assume is the case), unlike the Dataframe.isin method –

+0

However, the csv file also contains other columns whose content I need to extract and put into the new file. Any idea how? @Vincent K –

+0

Do you mean that columns like "salary" and "location" need to be extracted as well? If so, and it is only a few more columns, just add more listMatchxxx lists –

0

You can do this with pandas as follows, if you are looking for rows whose cell contains only a single word from the keyword list:

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"]

# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows whose cell is exactly one of the keywords
# in the position or the Job description columns
df = df[df["position"].isin(keywords) | df["Job description"].isin(keywords)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)

If you are looking for rows that contain a keyword as a substring (for example, finding financial in financial engineering), then you can do the following:

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists",
            "electronic", "workers"]
# regular expression that matches any of the keywords
searched_keywords = '|'.join(keywords)

# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows that contain one of the keywords
# in the position or the Job description columns
# case=False ignores letter case; na=False treats empty cells as "no match"
position_match = df["position"].str.contains(searched_keywords, case=False, na=False)
description_match = df["Job description"].str.contains(searched_keywords, case=False, na=False)
df = df[position_match | description_match]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
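
One caveat with substring matching is that a keyword such as team also matches inside steam. If whole-word matches are wanted instead, a word-boundary pattern is one option. The sketch below uses a made-up series and a shortened keyword list purely for illustration.

```python
import re

import pandas as pd

keywords = ["team", "metal"]  # shortened list, for illustration only
descriptions = pd.Series(["steam cleaning", "team lead", "Metal sheets", None])

# Plain alternation: matches a keyword anywhere, even inside a longer word.
substring_pattern = "|".join(re.escape(k) for k in keywords)
# \b...\b only matches when the keyword stands alone as a word.
whole_word_pattern = r"\b(?:" + substring_pattern + r")\b"

substring_hits = descriptions.str.contains(substring_pattern, case=False, na=False)
whole_word_hits = descriptions.str.contains(whole_word_pattern, case=False, na=False)
```

Here `substring_hits` flags "steam cleaning" while `whole_word_hits` does not. Which behaviour is right depends on the data.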
+0

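This is simple and looks good, and I got the code running. But it doesn't save any data, only the header :( even though I'm sure many of the keywords are in the file, specifically in position and Job description @MedAli –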

+0

@Eng.Reem Can you share a sample of your data? – MedAli

+0

This won't work because the "Job description" column is not just a single word –
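
The difference can be seen on a small made-up series: `isin` compares the whole cell with each keyword, while `str.contains` looks for the keyword anywhere inside the cell. This may be why only the header was saved, although that cannot be confirmed without a sample of the real data.

```python
import pandas as pd

keywords = ["financial", "energy"]  # shortened list, for illustration only
jobs = pd.Series(["financial", "financial engineering", "Energy analyst", "cook"])

# isin: True only when the entire cell equals one of the keywords.
exact_hits = jobs.isin(keywords)

# str.contains: True when any keyword occurs somewhere in the cell.
substring_hits = jobs.str.contains("|".join(keywords), case=False, na=False)
```

Only the first cell passes the `isin` test, while the first three pass the `str.contains` test.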
