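Script prints its args before executing, and waits for me to press [Enter] before terminating. I have the following draft of a scraper: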
from lxml import html
import requests
import sys

requestedURL = sys.argv[1]
page = requests.get(requestedURL)
tree = html.fromstring(page.content)

passage = ''

for tr in tree.cssselect("div [class='passage-content passage-class-0']"):
    for each in tr:
        for e in each:
            for x in e:
                if x.text_content() == 'Footnotes:' or x.text_content() == 'Cross references:':
                    passage += '\n'
                    passage = passage.lstrip('\n')
                    sys.stdout.write(passage)
                    sys.exit(0)
                if not x.text_content()[0].isdigit():
                    passage += '\n\n'+x.text_content()+'\n\n'
                else:
                    passage += x.text_content()
                passage = passage.replace('\n\n\n', '\n\n')
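When I run it, I get the output I want, but I also get two annoying side effects:
- the arguments are printed
- the script doesn't actually finish until I press Enter
Example: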
python bg_scrape.py https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV
[1] 48648
John 3:1
New International Version (NIV)
Jesus Teaches Nicodemus
3 Now there was a Pharisee, a man named Nicodemus who was a member of the Jewish ruling council.
// this line doesn't show up until I hit enter
[1]+ Done python bg_scrape.py https://www.biblegateway.com/passage/?search=John+3%3A1
It's worth noting that this only started happening once I passed requestedURL in as a sys.argv argument rather than as a static string in the code.
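What is the output of running 'which python' on your command line? –
'/usr/bin/python' – MrDuk
Oh, it's probably the '&' in the command-line argument. Try putting the argument in double quotes: 'python bg_scrape.py "https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV"' –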
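To expand on the '&' suggestion: assuming a POSIX-style shell such as bash, an unquoted '&' ends the command and runs it in the background. That would explain the '[1] 48648' job line, the '[1]+ Done' line that only appears at the next prompt, and why the output shows NIV even though ESV was requested: the script never sees '&version=ESV'. The sketch below only illustrates this and is not part of the scraper; the names full_url, received_unquoted and query_params are made up for the example.

```python
import shlex
from urllib.parse import parse_qs, urlsplit

full_url = "https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV"

# Assumption: a POSIX-style shell. Unquoted, the shell ends the command at
# the first '&', so sys.argv[1] holds only the text before it.
received_unquoted = full_url.split("&", 1)[0]


def query_params(url):
    """Return the query-string parameters of a URL as a dict of lists."""
    return parse_qs(urlsplit(url).query)


# Without quotes the 'version' parameter never reaches the script.
print(query_params(received_unquoted))

# With quotes the whole URL arrives, including 'version'.
print(query_params(full_url))

# shlex.quote shows one safe way to write the argument on the command line.
print("python bg_scrape.py " + shlex.quote(full_url))
```

Single quotes, as produced by shlex.quote, work as well as the double quotes suggested above; either way the shell passes the URL through as one argument.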