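Scraping file paths from a GitHub repo yields a 400 response, but viewing in a browser works fine

I am trying to scrape all of the file paths from this link: https://github.com/themichaelusa/Trinitum/find/master, without using the GitHub API at all.
The HTML of that page contains a data-url attribute (on the table with id='tree-finder-results' and class='tree-browser css-truncate'), which is used to build a URL like this: https://github.com/themichaelusa/Trinitum/tree-list/45a2ca7145369bee6c31a54c30fca8d3f0aae6cd
That URL displays this dictionary: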
{"paths":["Examples/advanced_example.py","Examples/basic_example.py","LICENSE","README.md","Trinitum/AsyncManager.py","Trinitum/Constants.py","Trinitum/DatabaseManager.py","Trinitum/Diagnostics.py","Trinitum/Order.py","Trinitum/Pipeline.py","Trinitum/Position.py","Trinitum/RSU.py","Trinitum/Strategy.py","Trinitum/TradingInstance.py","Trinitum/Trinitum.py","Trinitum/Utilities.py","Trinitum/__init__.py","setup.cfg","setup.py"]}
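That is what appears when the URL is viewed in a browser such as Chrome. However, a GET request to the same URL from code yields <Response [400]>.
Here is the code I am using: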
import requests
from bs4 import BeautifulSoup

username, repo = 'themichaelusa', 'Trinitum'
ghURL = 'https://github.com'
# Page whose HTML carries the data-url attribute
url = ghURL + '/{}/{}/find/master'.format(username, repo)
html = requests.get(url)
soup = BeautifulSoup(html.text, "lxml")
repoContent = soup.find('div', class_='tree-finder clearfix')
# Build the tree-list URL from the table's data-url attribute
fileLinksURL = ghURL + str(repoContent.find('table').attrs['data-url'])
filePaths = requests.get(fileLinksURL)
print(filePaths)  # this is where the 400 response shows up
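I am not sure what is wrong. My theory is that the first link creates a cookie that allows the second link to show the file paths of the repo we are targeting. I am just not sure how to do that in code. I would really appreciate some pointers!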
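One way to test the cookie theory is to send both requests through a single requests.Session, which carries any cookies set by the first response over to the second. The sketch below does that and also sends an X-Requested-With header, as a browser's AJAX call typically would. Whether the endpoint needs the cookie, the header, or both is an assumption and is not confirmed anywhere in this thread. The helper names (make_session, extract_tree_list_url, fetch_paths) are hypothetical, and the sketch assumes the page still exposes the table described above.

```python
import requests
from bs4 import BeautifulSoup

GITHUB = "https://github.com"


def make_session():
    """Return a session that keeps cookies between requests."""
    session = requests.Session()
    # Assumption: the tree-list endpoint may expect an AJAX-style request.
    session.headers.update({"X-Requested-With": "XMLHttpRequest"})
    return session


def extract_tree_list_url(html):
    """Read the data-url attribute from the tree-finder results table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="tree-finder-results")
    data_url = table.get("data-url") if table is not None else None
    if data_url is None:
        raise ValueError("no table with a data-url attribute was found")
    return GITHUB + data_url


def fetch_paths(username, repo, branch="master"):
    """Fetch the finder page, then the tree list, reusing one session."""
    session = make_session()
    page = session.get("{}/{}/{}/find/{}".format(GITHUB, username, repo, branch))
    page.raise_for_status()
    listing = session.get(extract_tree_list_url(page.text))
    listing.raise_for_status()
    return listing.json()["paths"]
```

Calling fetch_paths('themichaelusa', 'Trinitum') would then either return the list under "paths" or raise an HTTPError that shows which of the two requests was rejected.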
Did you notice that 'Examples/advanced_example.py' is not relative to 'https://github.com/themichaelusa/Trinitum/find/master', but to 'https://github.com/themichaelusa/Trinitum/blob/master'?
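To illustrate the point about the base URL, here is a small sketch that turns the "paths" dictionary from the question into full file URLs under blob/master. The function name blob_urls is hypothetical, and it assumes the JSON text has already been obtained somehow.

```python
import json
from urllib.parse import quote


def blob_urls(tree_list_json, username, repo, branch="master"):
    """Map each relative path in the tree-list JSON to its blob URL."""
    paths = json.loads(tree_list_json)["paths"]
    base = "https://github.com/{}/{}/blob/{}/".format(username, repo, branch)
    # quote() leaves "/" alone by default, so directory separators survive.
    return [base + quote(path) for path in paths]
```

For the dictionary shown in the question, the first entry would map to https://github.com/themichaelusa/Trinitum/blob/master/Examples/advanced_example.py.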
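My suggestion is to use the browser's developer tools to check carefully which requests are actually sent, and to print 'url' and 'fileLinksURL' and compare them.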
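For the Python side of that comparison, a sketch along these lines prints what requests would send without touching the network, so it can be laid next to the request shown in the browser's developer tools. The helper name describe_request is hypothetical.

```python
import requests


def describe_request(session, url):
    """Return the method, URL and headers requests would send for a GET."""
    prepared = session.prepare_request(requests.Request("GET", url))
    lines = ["{} {}".format(prepared.method, prepared.url)]
    lines += ["{}: {}".format(name, value) for name, value in prepared.headers.items()]
    return "\n".join(lines)
```

Printing describe_request(requests.Session(), fileLinksURL) and comparing it header by header with the browser's request should show which headers or cookies are missing from the scripted call.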