Extract multiple patterns and save them to a pandas DataFrame [python]
My text file looks like this:
Description: Text 1 follows <br/> blah blah blah Cause: Cause Text 1
follows here <br/>Description: Text 2 follows <br/> blah blah
blah Cause: Cause Text 2 follows here<br/>Description: Text 3 follows <br/>
blah blah blah Description: Text 4 follows <br/> blah blah
blah Cause: Cause Text 4 follows<br/>
I want to get all the descriptions and causes into a pandas DataFrame, in a structured format for NLP:
Description       Cause
Text 1 follows    Cause Text 1 follows here
Text 2 follows    Cause Text 2 follows here
Text 3 follows
Text 4 follows    Cause Text 4 follows here
What I have done so far:
re.findall(r'Description:(.*?)<br/>',textfile)
re.findall(r'Cause:(.*?)<br/>',textfile)
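However, this gives me two separate lists that I cannot line up when I try to build the larger DataFrame, since not every description has a cause!
Thanks for any input or guidance on how to do this. I am very new to Python!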
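One possible way to keep each description paired with its cause is to cut the text into one chunk per `Description:` first, and only then look for a cause inside that chunk. The sketch below is a minimal illustration, not a definitive solution. It assumes that every record starts with `Description:`, that both fields end at the next `<br/>`, and that line breaks inside a field are just wrapping. The helper name `extract_records` is hypothetical, and the sample text is inlined instead of being read from a file.

```python
import re
import pandas as pd

# Sample text copied from the question; in practice this would be the file contents.
text = """Description: Text 1 follows <br/> blah blah blah Cause: Cause Text 1
follows here <br/>Description: Text 2 follows <br/> blah blah
blah Cause: Cause Text 2 follows here<br/>Description: Text 3 follows <br/>
blah blah blah Description: Text 4 follows <br/> blah blah
blah Cause: Cause Text 4 follows<br/>
"""

# DOTALL lets a field run across a line break.
DESC_RE = re.compile(r'Description:(.*?)<br/>', re.DOTALL)
CAUSE_RE = re.compile(r'Cause:(.*?)<br/>', re.DOTALL)


def extract_records(text):
    """Return one dict per 'Description:' record (hypothetical helper)."""
    rows = []
    # Split in front of every "Description:" so each chunk holds one record.
    for chunk in re.split(r'(?=Description:)', text):
        desc = DESC_RE.search(chunk)
        if desc is None:
            continue  # text before the first record
        # Look for a cause only after the description, within the same chunk.
        cause = CAUSE_RE.search(chunk, desc.end())
        rows.append({
            # " ".join(s.split()) collapses wrapped lines and extra spaces.
            'Description': ' '.join(desc.group(1).split()),
            'Cause': ' '.join(cause.group(1).split()) if cause else None,
        })
    return rows


rows = extract_records(text)
df = pd.DataFrame(rows, columns=['Description', 'Cause'])
print(df)
```

Because every row is built from a single chunk, a record without a cause (Text 3 here) simply gets a missing value in the `Cause` column instead of shifting the later causes up by one.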
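Try [this regex](https://regex101.com/r/bRIOev/1): it captures the description in a named group and the cause in a second, optional named group, and uses a `(?!Description:)` lookahead so the cause search does not run into the next record.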