Extracting the substring between two strings in a web page

I have this site. I want to extract the symbols that appear below the article title (EXAS, ESNT, ENZ, CENT, AEE). I am a beginner, so I tried a rather un-Pythonic approach:

import re
import requests

link = "https://www.zacks.com/commentary/99386/new-strong-buy-stocks-for-december-29th"
fetch_data = requests.get(link)
content = str(fetch_data.content)
# I know that in the source code the symbols appear between "tickers" and "publish_date", therefore:
tickers = "tickers :"
pd = "publish_date :"
Z = "%s(.*)%s" % (tickers, pd)
result = re.search(Z, content)
# Just printing out the substring between tickers and pd
print(result)
Output: <_sre.SRE_Match object; span=(95142, 95213), match="tickers : [\\'EXAS\\',\\'ESNT\\',\\'ENZ\\',\\'CEN>

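How can I print just the symbols? Also, the last symbol 'CEN' should be printed as 'CENT', and the 'AEE' symbol is missing as well. Ideally the output would be: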

Symbols: EXAS, ESNT, ENZ, CENT, AEE

or at least:

"tickers : [\\'EXAS\\',\\'ESNT\\',\\'ENZ\\',\\'CENT\\',\\'AEE\\]

Source

2016-12-29 Rafael Martínez

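The match itself is complete; only the printed representation of the match object is truncated, which is why 'CENT' shows up as 'CEN' and 'AEE' seems to be missing. You can access the first group and clean it up: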

>>> tickers = result.groups()[0] 
>>> re.findall(r'\[.*?\]', tickers)[0].split("\\'")[1::2] 
['EXAS', 'ESNT', 'ENZ', 'CENT', 'AEE']
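The same idea can be sketched as a small self-contained function. This is only an illustration under stated assumptions: `sample` is a hypothetical stand-in for `str(fetch_data.content)` (the real page source may differ), `extract_symbols` is a made-up helper name, and the pattern is made non-greedy so it stops at the first "publish_date :" it meets.

```python
import re

# Hypothetical sample imitating str(fetch_data.content); the real page may differ.
# In the stringified bytes each quote shows up escaped as \' (backslash + quote).
sample = (
    "var x = { tickers : [\\'EXAS\\',\\'ESNT\\',\\'ENZ\\',\\'CENT\\',\\'AEE\\'], "
    "publish_date : \\'12/29/2016\\' }"
)


def extract_symbols(content):
    """Return the quoted items found between 'tickers :' and 'publish_date :'."""
    # Non-greedy group: stop at the first 'publish_date :' after 'tickers :'.
    match = re.search(r"tickers :(.*?)publish_date :", content, re.DOTALL)
    if match is None:
        return []
    # Capture whatever sits between quotes, tolerating an optional
    # backslash before each quote (\'EXAS\' as well as 'EXAS').
    return re.findall(r"\\?'([^'\\]+)\\?'", match.group(1))


symbols = extract_symbols(sample)
print("Symbols: " + ", ".join(symbols))
# prints: Symbols: EXAS, ESNT, ENZ, CENT, AEE
```

Returning an empty list when nothing matches avoids the `AttributeError` that calling `.groups()` on `None` would raise if the page layout changes.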

Source

2016-12-29 21:09:20
