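How can I scrape only certain URLs from a CSV file using Python?
I have a CSV file with many URLs, all with different domain extensions (.com, .eu, .org, etc.). However, I only want to scrape the domains with the .nl extension, which I'm trying to do with if '.nl' in row: in Python 2.7: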
from selenium import webdriver
import csv

fieldnames = ['Website', '@media', 'googleadservices.com/pagead/conversion']

def csv_writerheader(path):
    with open(path, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator='\n')
        writer.writeheader()

def csv_writer(dictdata, path):
    with open(path, 'a') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator='\n')
        writer.writerow(dictdata)

csv_output_file = 'output!.csv'
driver = webdriver.Chrome(executable_path=r'C:\Users\Jacob\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')
keywords = ['@media', 'googleadservices.com/pagead/conversion']
csv_writerheader(csv_output_file)

with open('top1m-edited.csv') as example_file:
    example_reader = csv.reader(example_file)
    for row in example_reader:
        # INITIALIZE DICT
        data = {'Website': row}
        if '.nl' in row:  # MAKING THE DOMAIN DISTINCTION HERE
            try:
                driver.get(row[0])
                html = driver.page_source
                for searchstring in keywords:
                    if searchstring.lower() in html.lower():
                        print (row, searchstring, 'FOUND!')
                        data[searchstring] = 'FOUND!'
                    else:
                        print (row, searchstring, 'not found')
                        data[searchstring] = 'not found'
                csv_writer(data, csv_output_file)
            except:
                pass
Output:
C:\Python27\python.exe "C:/Users/Jacob/PycharmProjects/Testing/fooling around 2.py"
Process finished with exit code 0
So in this state my script basically does nothing, apart from exporting a CSV file that contains almost no results.
However, when I simply leave out if '.nl' in row:, the script works perfectly.
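A likely reason the condition never matches: `csv.reader` yields each row as a list of strings, so `'.nl' in row` checks whether some cell is exactly equal to `'.nl'`, not whether the URL contains it. A short demonstration with two made-up rows (the URLs are illustrative only):

```python
import csv

# csv.reader accepts any iterable of lines; two made-up rows stand in for the file
rows = list(csv.reader(['http://www.example.nl', 'http://www.example.com']))
first = rows[0]

print(first)              # ['http://www.example.nl']
print('.nl' in first)     # False: list membership compares whole cells
print('.nl' in first[0])  # True: substring test on the URL string itself
```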
What do I need to change so that the script only imports/searches URLs with the .nl domain?
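A minimal sketch of a corrected filtering step: test the URL string in `row[0]` instead of the row list, and store that string (not the whole list) under 'Website'. Selenium is left out so the logic runs on its own; `nl_urls` and `scan_keywords` are hypothetical helper names, and in the real script `html` would come from `driver.page_source` after `driver.get(url)`:

```python
import csv

def nl_urls(reader):
    """Yield the first-column URL of every row whose URL contains '.nl'."""
    for row in reader:
        # skip blank lines (csv.reader yields []) and test the string, not the list
        if row and '.nl' in row[0]:
            yield row[0]

def scan_keywords(url, html, keywords):
    """Build one output row: 'FOUND!' or 'not found' for each keyword."""
    data = {'Website': url}
    lowered = html.lower()
    for searchstring in keywords:
        data[searchstring] = 'FOUND!' if searchstring.lower() in lowered else 'not found'
    return data

keywords = ['@media', 'googleadservices.com/pagead/conversion']
sample_lines = ['http://www.example.nl', '', 'http://www.example.com']  # made-up input
for url in nl_urls(csv.reader(sample_lines)):
    html = '<style>@media screen {}</style>'  # stand-in for driver.page_source
    print(scan_keywords(url, html, keywords))
```

Only http://www.example.nl passes the filter here; the .com row and the blank line are skipped before any page would be loaded.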
Thank you very much, it works now! – jakeT888
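A substring test such as '.nl' in row[0] also matches URLs like http://www.nlnet.com, because '.nl' occurs inside the hostname. A stricter alternative compares the end of the parsed hostname. This is a sketch assuming every URL in the file includes a scheme such as http:// (without one, `urlparse` finds no hostname and the check returns False); `has_tld` is a hypothetical helper name:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

def has_tld(url, tld='.nl'):
    """Return True if the URL's hostname ends with the given top-level domain."""
    # .hostname is lowercased, and None when the URL has no network location
    host = urlparse(url).hostname or ''
    return host.endswith(tld)

print(has_tld('http://www.example.nl/page'))     # True
print(has_tld('http://www.nlnet.com/.nl-info'))  # False
print(has_tld('https://shop.example.NL'))        # True
```

In the loop, if has_tld(row[0]): would take the place of the substring test.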