2017-09-02 82 views

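Import scraped data from a website directly into PostgreSQL

I want to import web-scraped data directly into PostgreSQL instead of exporting it to a .csv file first.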

Below is the code I am currently using. It exports the data to a .csv file, which I then import manually. Any help would be appreciated.

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 
my_url = 'http://tis.nhai.gov.in/TollInformation?TollPlazaID=236' 
uClient = uReq(my_url) 
page1_html = uClient.read() 
uClient.close() 
# HTML parsing
page1_soup = soup(page1_html,"html.parser") 

filename = "TollDetail12.csv" 
f = open(filename,"w") 
headers = "ID, tollname, location, highwayNumber\n" 
f.write(headers) 

# grabbing data
containers = page1_soup.findAll("div",{"class":"PA15"}) 
for container in containers: 
    toll_name = container.p.b.text 

    search1 = container.findAll('b') 
    highway_number = search1[1].text 

    location = list(container.p.descendants)[10] 
    ID = my_url[my_url.find("?"):] 
    mystr = ID.strip("?") 
    print("ID: " + mystr) 
    print("toll_name: " + toll_name) 
    print("location: " + location) 
    print("highway_number: " + highway_number) 


    f.write(mystr + "," + toll_name + "," + location + "," + highway_number.replace(",","|") + "\n") 
f.close() 

Read up on inserting data into PostgreSQL from Python (http://www.postgresqltutorial.com/postgresql-python/insert/). It will help you solve your problem.

Answer


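You need to install the psycopg2 package with pip. Beyond that, edit the code below with your project-specific information (database name, user, password, table name). I have not tested it, but it should work.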

from urllib.request import urlopen as uReq 

from bs4 import BeautifulSoup as soup 

import psycopg2 

my_url = 'http://tis.nhai.gov.in/TollInformation?TollPlazaID=236' 
uClient = uReq(my_url) 
page1_html = uClient.read() 
uClient.close() 
# html parsing 
page1_soup = soup(page1_html, 'html.parser') 

# grabbing data
containers = page1_soup.findAll('div', {'class': 'PA15'}) 

# Make the connection to PostgreSQL 
conn = psycopg2.connect(database='database_name', 
         user='user_name', password='user_password', port=5432) 
cursor = conn.cursor() 
for container in containers: 
    toll_name = container.p.b.text 

    search1 = container.findAll('b') 
    highway_number = search1[1].text 

    location = list(container.p.descendants)[10] 
    ID = my_url[my_url.find('?'):] 
    mystr = ID.strip('?') 

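    # %s placeholders let psycopg2 quote and escape the values safely
    query = "INSERT INTO table_name (ID, toll_name, location, highway_number) VALUES (%s, %s, %s, %s);"
    # Use mystr (the ID without the leading '?'), as the CSV version did;
    # str() turns the BeautifulSoup string into a plain Python str
    data = (mystr, toll_name, str(location), highway_number)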

    cursor.execute(query, data) 

# Commit the transaction, then release the cursor and the connection
conn.commit()
cursor.close()
conn.close()
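
The loop above runs one INSERT per container and commits at the end. As a rough sketch of an alternative, the rows could be collected first and written in a single transaction that rolls back if any insert fails. The names `insert_rows`, `plaza_id` and `INSERT_SQL` are hypothetical helpers, not part of the original answer, and `table_name` and its columns are the same placeholders used above. Note that `plaza_id` returns only the number (`236`), whereas the original code stores the whole `TollPlazaID=236` string; that is a design assumption, so use whichever your ID column expects.

```python
from urllib.parse import urlparse, parse_qs

# Same placeholder table and columns as in the answer above.
INSERT_SQL = (
    "INSERT INTO table_name (ID, toll_name, location, highway_number) "
    "VALUES (%s, %s, %s, %s);"
)


def plaza_id(url):
    """Return the value of the TollPlazaID query parameter as a string."""
    return parse_qs(urlparse(url).query)["TollPlazaID"][0]


def insert_rows(conn, rows):
    """Insert every row in one transaction.

    With a psycopg2 connection, `with conn:` commits when the block
    succeeds and rolls back when it raises. It does not close the
    connection, so the caller still has to call conn.close().
    """
    with conn:
        with conn.cursor() as cursor:
            cursor.executemany(INSERT_SQL, rows)


# Usage sketch (assumes psycopg2 is installed and the table exists):
#
#   import psycopg2
#   conn = psycopg2.connect(database='database_name', user='user_name',
#                           password='user_password', port=5432)
#   rows = [(plaza_id(my_url), toll_name, str(location), highway_number)]
#   insert_rows(conn, rows)
#   conn.close()
```

Keeping the database call in a function that only needs a connection object also makes it easy to try the scraping part without a running PostgreSQL server.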

I get this error when running the code: 'File "C:\Users\prash\AppData\Local\Programs\Python\Python36-32\lib\site-packages\psycopg2\__init__.py", line 130, in connect conn = _connect(dsn, connection_factory=connection_factory, **kwasync) psycopg2.OperationalError: FATAL: role "prashant" is not permitted to log in' – Prashant


You need to alter the role so that it has the LOGIN privilege. This can be done with the following command: 'ALTER ROLE "prashant" WITH LOGIN;' – Pythonist


Thank you very much... – Prashant
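
For anyone hitting the same "not permitted to log in" error, here is a small sketch of how the role's LOGIN attribute could be checked from Python before and after running the ALTER ROLE command from the comments. The function name `role_can_login` is hypothetical. It relies on the `rolcanlogin` column of PostgreSQL's `pg_roles` view, and the cursor has to come from a connection made with a different role that is already allowed to log in (for example an administrator account).

```python
def role_can_login(cursor, role_name):
    """Return True or False for the role's LOGIN attribute.

    Returns None when no role with that name exists.
    """
    cursor.execute(
        "SELECT rolcanlogin FROM pg_roles WHERE rolname = %s;",
        (role_name,),
    )
    row = cursor.fetchone()
    return None if row is None else bool(row[0])


# Usage sketch (assumes psycopg2 and an account that can already log in;
# 'admin_user' and 'admin_password' are placeholders):
#
#   import psycopg2
#   conn = psycopg2.connect(database='database_name', user='admin_user',
#                           password='admin_password', port=5432)
#   with conn.cursor() as cursor:
#       print(role_can_login(cursor, 'prashant'))
#   conn.close()
```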