Cleaning CSS style blocks from a pandas DataFrame

I have a DataFrame with some records that look like this:

Untitledp { margin-top: 0px;margin-bottom: 0px;line-height: 1.15; } body { font-family: 'Times New Roman';font-style: Normal;font-weight: normal;font-size: 13.3333333333333px; } .Normal { telerik-style-type: paragraph;telerik-style-name: Normal;border-collapse: collapse; } .TableNormal { telerik-style-type: table;telerik-style-name: TableNormal;border-collapse: collapse; } .s_F0039783 { telerik-style-type: local;font-size: 13.34px; } .s_45EBF2E0 { telerik-style-type: local;font-family: 'Times New Roman';font-size: 13.3333333333333px;color: #000000; } A sentence that I actually want.

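I want to remove the CSS style blocks and return only the sentence at the end. The number of CSS blocks may differ from record to record. Every record starts with "Untitledp" and ends with the text I want (no style block follows the text).

How should I clean out these blocks? I use BeautifulSoup to strip HTML tags, but it does not work on these blocks.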


2017-08-04 Cameron Taylor

A regular expression can be used for this, with sub():

import re

# Greedy match: everything up to and including the last closing brace
regex = re.compile(r'.+\s*\{.*\}')
regex.sub('', s)  # s is a copy-paste of your sample
# returns ' A sentence that I actually want.'

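At least it works on this example. Be careful, though: this will fail if the sentence you want to keep contains { or }. Sentences usually don't contain these characters, however.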
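The same idea can be wrapped in a small helper so it is easy to apply to every record. This is only a sketch: the name strip_css and the shortened sample string are made up for illustration, and it assumes, as in the question, that the wanted text comes after the last closing brace.

```python
import re

# Greedy match from the start of the string through the LAST closing brace.
# re.DOTALL lets the match span newlines, in case a record has line breaks.
CSS_PREFIX = re.compile(r"^.*\}", re.DOTALL)

def strip_css(text):
    """Return whatever follows the last '}' in text, trimmed of whitespace.

    Hypothetical helper: if the text has no '}', it is returned unchanged
    apart from the trimming.
    """
    return CSS_PREFIX.sub("", text, count=1).strip()

# Shortened version of the sample record from the question
sample = (
    "Untitledp { margin-top: 0px;line-height: 1.15; } "
    ".Normal { telerik-style-type: paragraph; } "
    "A sentence that I actually want."
)
print(strip_css(sample))  # A sentence that I actually want.
```

On a DataFrame, something like df["text"] = df["text"].map(strip_css) should apply it to a whole column, where "text" is a placeholder for the real column name. The caveat above still holds: a sentence that itself contains } would be cut off at that brace.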


2017-08-04 20:43:04 Unatiel
