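I'm writing a Python script to filter some log files, and I'd like to use regular expressions or a library for it (regex preferred, because I want to avoid depending on a virtual environment). This is the text I want to find: Failed to find the annotation and the status of the test public void com.somename.qa.mobile.tests.somename.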
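One possible approach using only the standard-library `re` module is sketched below. It assumes the fixed prefix from the question is followed by a method signature on the same line; the helper name `find_failures`, the group name `signature`, and the sample log lines (including `com.example.tests.LoginTest`) are hypothetical and only there for illustration.

```python
import re

# Literal prefix taken from the question; re.escape keeps it from being
# interpreted as regex syntax.
PREFIX = "Failed to find the annotation and the status of the test"

# Assumption: whatever follows the prefix on the same line is the test signature.
PATTERN = re.compile(re.escape(PREFIX) + r"\s+(?P<signature>.+)")


def find_failures(lines):
    """Yield the text that follows the prefix on each matching line."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield match.group("signature").strip()


# Hypothetical log lines, not taken from a real log file.
sample = [
    "INFO starting run",
    "WARN Failed to find the annotation and the status of the test "
    "public void com.example.tests.LoginTest.testLogin()",
    "INFO done",
]
failures = list(find_failures(sample))
```

Because `find_failures` accepts any iterable of strings, an open file object could be passed in directly so the log is read line by line rather than loaded whole.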
I tried gsub("[\r\n]+", "\r\n", textDoc). Does that treat \r and \n as separate characters rather than as a single string? Edit - "This is a line! It ends with a CRLF!\r\n
\r\n
\r\n
There is more stuff down here! I want it directly below the other stuff! Ge
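For comparison, here is the same distinction sketched in Python's `re` module rather than R. A character class such as `[\r\n]+` matches any run of CR and LF characters individually, in any order, while a group such as `(?:\r\n)+` matches only complete CRLF pairs. The sample string is adapted from the text above; the variable names are illustrative.

```python
import re

text = "This is a line! It ends with a CRLF!\r\n\r\n\r\nThere is more stuff down here!"

# Character class: any mix of \r and \n characters counts, one character at a time.
collapsed_class = re.sub(r"[\r\n]+", "\r\n", text)

# Non-capturing group: only whole \r\n pairs are matched and collapsed.
collapsed_pairs = re.sub(r"(?:\r\n)+", "\r\n", text)

# The two patterns differ on input that is not made of clean CRLF pairs.
mixed = "a\n\rb"
mixed_class = re.sub(r"[\r\n]+", "\r\n", mixed)    # run of \n\r is replaced
mixed_pairs = re.sub(r"(?:\r\n)+", "\r\n", mixed)  # no \r\n pair, left unchanged
```

On text that contains only CRLF line endings both patterns give the same result; they diverge only when stray `\r` or `\n` characters are present.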
10/03/2014 16:55 Local Title: TRANSFER OUT NOTE
Standard Title: TRANSFER SUMMARIZATION NOTE
AUTHOR: D,WARD
XYZ MEDICAL INSTITUTE
ABC NAGAR, PQW CITY-101011
*********
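I currently have two functions that extract the text of the HTML <body> in Python and return it as a bag of words. They give equivalent output. I also strip various tags that would otherwise give me junk text (such as <script> code).

def html_to_bow_bs(text):
    if text is None or len(text) == 0:
        return []
    soup = BeautifulSoup(text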
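
As a point of comparison, the same idea can be sketched with only the standard library's `html.parser`. This is not the BeautifulSoup function from the post; the names `_BodyText` and `html_to_bow_stdlib` are hypothetical, and lowercasing plus splitting on `\w+` is an assumption about what "bag of words" means here.

```python
import re
from html.parser import HTMLParser


class _BodyText(HTMLParser):
    """Collect text inside <body>, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside <body> and outside skipped tags.
        if self.in_body and not self.skip_depth:
            self.chunks.append(data)


def html_to_bow_stdlib(text):
    """Return the words found in the <body> text as a list."""
    if not text:
        return []
    parser = _BodyText()
    parser.feed(text)
    parser.close()
    # Assumption: words are lowercased runs of \w characters.
    return re.findall(r"\w+", " ".join(parser.chunks).lower())


html = (
    "<html><head><title>T</title></head><body><p>Hello, World!</p>"
    "<script>var x = 1;</script><p>hello again</p></body></html>"
)
words = html_to_bow_stdlib(html)
```

The list keeps duplicates, so it can be passed to `collections.Counter` if word counts are wanted.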