2016-01-21 33 views
0

Script prints args before executing and waits for me to press [Enter] before terminating

I have the following scraper draft:

from lxml import html
import requests
import sys

requestedURL = sys.argv[1]
page = requests.get(requestedURL)
tree = html.fromstring(page.content)

passage = ''
for tr in tree.cssselect("div [class='passage-content passage-class-0']"):
    for each in tr:
        for e in each:
            for x in e:
                if x.text_content() == 'Footnotes:' or x.text_content() == 'Cross references:':
                    passage += '\n'
                    passage = passage.lstrip('\n')
                    sys.stdout.write(passage)
                    sys.exit(0)
                if not x.text_content()[0].isdigit():
                    passage += '\n\n' + x.text_content() + '\n\n'
                else:
                    passage += x.text_content()
            passage = passage.replace('\n\n\n', '\n\n')

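When I run it, I get the output I want, but I also get two annoying side effects:

  • The args are printed
  • The script doesn't actually end until I press Enter

Example: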

python bg_scrape.py https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV 
[1] 48648 

John 3:1 

New International Version (NIV) 

Jesus Teaches Nicodemus 

3 Now there was a Pharisee, a man named Nicodemus who was a member of the Jewish ruling council. 

// this line doesn't show up until I hit enter 
[1]+ Done python bg_scrape.py https://www.biblegateway.com/passage/?search=John+3%3A1 

It's worth noting that this only started happening once I passed requestedURL in via sys.argv instead of using a static string in the code.

+0

What is the output of running 'which python' on your command line? –

+0

'/usr/bin/python' – MrDuk

+1

Oh, it may be the '&' in the command-line argument. Try putting the argument in double quotes: 'python bg_scrape.py "https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV"' –

Answers

1

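It is probably the '&' in the command-line argument. Try putting the parameter in double quotes: python bg_scrape.py "https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV"

Essentially, your shell is actually running two things:

  • python bg_scrape.py https://www.biblegateway.com/passage/?search=John+3%3A1 as a background process (the '[1] 48648' line is the shell reporting its job number and process ID)
  • then version=ESV, which assigns a shell variable

When you press Enter, the shell simply gives you a status update on a background process that has already finished (in this case, the one you just started).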
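The splitting described above can be reproduced from Python with the standard-library shlex module. This is only a rough sketch: shlex approximates POSIX shell tokenization rather than being the shell's real parser, the variable names are made up for illustration, and the punctuation-aware tokenizer is assumed to be running on Python 3.8 or later.

```python
import shlex

url = "https://www.biblegateway.com/passage/?search=John+3%3A1&version=ESV"

# Approximate how a POSIX shell tokenizes the UNQUOTED command line.
# punctuation_chars=True makes shell operators such as '&' separate tokens.
lexer = shlex.shlex(f"python bg_scrape.py {url}", posix=True, punctuation_chars=True)
lexer.whitespace_split = True
unquoted_tokens = list(lexer)
print(unquoted_tokens)

# With double quotes around the URL, the whole URL stays a single argument.
quoted_tokens = shlex.split(f'python bg_scrape.py "{url}"')
print(quoted_tokens)

# shlex.quote() produces a form of the URL that is safe to paste into a shell.
safe_command = "python bg_scrape.py " + shlex.quote(url)
print(safe_command)
```

In the first list the '&' comes out as its own token, with the URL cut off before it and version=ESV left over afterwards. That matches the page in the example output being the NIV rather than the requested ESV: the script never received the version parameter.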
