2016-11-24

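How to split a single txt file into multiple txt files by *TEXT ID in Python

I have a single txt file that I want to split into many files based on the *TEXT ID. For example, the txt file looks like this: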

*TEXT 017 01/04/63 PAGE 020 
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST 
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE..... 
*TEXT 018 01/04/63 PAGE 021 
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA 
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE..... 
*TEXT 019 01/04/63 PAGE 021 
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO 
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE 
AGAINST HIM, FOR WEIDNER, 40, WAS A.... 

How can I split it into multiple txt files with these filenames?

TEXT017.txt
TEXT018.txt
TEXT019.txt
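Each target filename is just "TEXT" followed by the numeric ID from the header line. A minimal sketch of that mapping, assuming every header has the form "*TEXT <id> <date> PAGE <page>" as in the sample above; the names header_to_filename and HEADER_RE are hypothetical:

```python
import re

# Hypothetical pattern: assumes headers look like "*TEXT 017 01/04/63 PAGE 020"
HEADER_RE = re.compile(r"\*TEXT (\d+) \S+ PAGE \d+")

def header_to_filename(header):
    """Return e.g. 'TEXT017.txt' for the header '*TEXT 017 01/04/63 PAGE 020'."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("not a *TEXT header: %r" % header)
    # Keep the ID as a string so leading zeros ("017") survive
    text_id = match.group(1)
    return "TEXT%s.txt" % text_id

print(header_to_filename("*TEXT 017 01/04/63 PAGE 020"))  # TEXT017.txt
```

Keeping the ID as a string rather than converting it with int() is what preserves the zero-padded names the question asks for.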

Take a look at the 're.split()' method. – n1c9
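The point of the 're.split()' hint is that when the pattern contains a capturing group, the matched delimiters are kept in the result list, so each header stays right next to its body. A small illustration on made-up two-section input (the variable names are illustrative only):

```python
import re

sample = "*TEXT 017 A\nbody one\n*TEXT 018 B\nbody two"

# The parentheses make re.split return each matched header line as well as
# the text between headers; re.MULTILINE lets ^ and $ match at line boundaries.
parts = re.split(r"^(\*TEXT .*)$", sample, flags=re.MULTILINE)
print(parts)
# ['', '*TEXT 017 A', '\nbody one\n', '*TEXT 018 B', '\nbody two']
```

The leading empty string is the (empty) text before the first header; after it, headers and bodies alternate.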


What have you tried? Where exactly are you stuck: splitting the text, writing the files, or reading the file? –


@SonofaBeach I don't know how to save the text into multiple txt files accordingly. – dd90p
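Writing several files is just one open(..., "w") per piece. A minimal sketch, assuming the text has already been split into (ID, body) pairs; save_chunks and out_dir are hypothetical names, and the demo writes into a temporary directory so it leaves no files behind:

```python
import os
import tempfile

def save_chunks(chunks, out_dir):
    """Write each (text_id, body) pair to <out_dir>/TEXT<text_id>.txt; return the paths."""
    paths = []
    for text_id, body in chunks:
        path = os.path.join(out_dir, "TEXT%s.txt" % text_id)
        # "w" creates the file, or truncates it if it already exists
        with open(path, "w", encoding="utf-8") as fp:
            fp.write(body)
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()
written = save_chunks([("017", "first body\n"), ("018", "second body\n")], out_dir)
print([os.path.basename(p) for p in written])  # ['TEXT017.txt', 'TEXT018.txt']
```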

Answers


Inspired by @n1c9, I modified the approach and added the missing pieces to make it complete.

import re 

raw_string = """*TEXT 017 01/04/63 PAGE 020 
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST 
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE..... 
*TEXT 018 01/04/63 PAGE 021 
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA 
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE..... 
*TEXT 019 01/04/63 PAGE 021 
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO 
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE 
AGAINST HIM, FOR WEIDNER, 40, WAS A....""" 

# Split on each header line; the capturing group keeps the headers in the result
split_strings = re.split(r'\n?(\*TEXT .*)\n', raw_string)
# split_strings[0] is whatever precedes the first header (an empty string here);
# after it, the list alternates header, body, header, body, ...
blocks = split_strings[1:]

for i in range(0, len(blocks), 2):
    # extract `019` from `*TEXT 019 01/04/63 PAGE 021`
    num = re.search(r'TEXT (\d+)', blocks[i]).group(1)

    # save the body to `TEXT019.txt`
    filename = 'TEXT%s.txt' % num
    content = blocks[i + 1]
    with open(filename, 'w') as fp:
        fp.write(content)
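The answer above works on a string held in memory. Since the question starts from a file, here is a sketch of a line-by-line variant that never loads the whole file at once. It assumes every header line begins with "*TEXT " followed by the ID, and it deliberately swaps the regex for a plain str.startswith check; split_file, src_path, out_dir and the demo's input.txt are hypothetical names, and the demo uses only the first line of each body from the question's sample:

```python
import os
import tempfile

def split_file(src_path, out_dir):
    """Copy the body of each *TEXT section in src_path to <out_dir>/TEXT<id>.txt.

    Returns the list of IDs in the order they were found.
    """
    ids = []
    out = None  # output file currently being written, if any
    try:
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                if line.startswith("*TEXT "):
                    # A header line starts a new section: close the previous output file
                    if out is not None:
                        out.close()
                    text_id = line.split()[1]  # "017" from "*TEXT 017 01/04/63 PAGE 020"
                    ids.append(text_id)
                    path = os.path.join(out_dir, "TEXT%s.txt" % text_id)
                    out = open(path, "w", encoding="utf-8")
                elif out is not None:
                    # Body line: copy it unchanged (lines before the first header are skipped)
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return ids


# Demo: write an abbreviated copy of the question's sample to a temporary input file
sample = (
    "*TEXT 017 01/04/63 PAGE 020\n"
    "THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST\n"
    "*TEXT 018 01/04/63 PAGE 021\n"
    "RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA\n"
    "*TEXT 019 01/04/63 PAGE 021\n"
    "BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO\n"
)
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "input.txt")
with open(src_path, "w", encoding="utf-8") as fp:
    fp.write(sample)

ids = split_file(src_path, work_dir)
print(ids)  # ['017', '018', '019']
```

Streaming keeps memory use flat for large inputs, at the cost of managing the open output file by hand (hence the try/finally).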

Thanks a lot, I've accepted your answer. – dd90p


Split the text on whatever marks the beginning of a new text ID:

import re 

raw_string = """*TEXT 017 01/04/63 PAGE 020 
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST 
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE..... 
*TEXT 018 01/04/63 PAGE 021 
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA 
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE..... 
*TEXT 019 01/04/63 PAGE 021 
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO 
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE 
AGAINST HIM, FOR WEIDNER, 40, WAS A....""" 

# The capturing group keeps each header line in the result list
split_string = re.split(r'(.*TEXT .*PAGE \d+)', raw_string)
for item in split_string:
    print('------')
    print(item)

Output:

------ 
*TEXT 017 01/04/63 PAGE 020 
------ 

THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST 
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE..... 

------ 
*TEXT 018 01/04/63 PAGE 021 
------ 

RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA 
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE..... 

------ 
*TEXT 019 01/04/63 PAGE 021 
------ 

BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO 
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE 
AGAINST HIM, FOR WEIDNER, 40, WAS A.... 

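I mean saving "THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE....." into a file named "TEXT017.txt". – dd90p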
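To get from the second answer's split to the layout asked for in this comment, the headers and bodies in the result list only need to be paired up. A minimal sketch, assuming the same header pattern and a shortened two-section sample; out_dir is a hypothetical temporary directory used so the demo leaves no files behind:

```python
import os
import re
import tempfile

raw_string = (
    "*TEXT 017 01/04/63 PAGE 020\n"
    "THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST\n"
    "*TEXT 018 01/04/63 PAGE 021\n"
    "RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA\n"
)

parts = re.split(r"(.*TEXT .*PAGE \d+)", raw_string)
# parts[0] is whatever precedes the first header; after it, headers and bodies alternate
headers = parts[1::2]
bodies = parts[2::2]

out_dir = tempfile.mkdtemp()
for header, body in zip(headers, bodies):
    text_id = re.search(r"TEXT (\d+)", header).group(1)
    path = os.path.join(out_dir, "TEXT%s.txt" % text_id)
    with open(path, "w", encoding="utf-8") as fp:
        # strip() trims the newlines left around the body by the split
        fp.write(body.strip() + "\n")

print(sorted(os.listdir(out_dir)))  # ['TEXT017.txt', 'TEXT018.txt']
```

Slicing with [1::2] and [2::2] relies on re.split always returning the text before the first match as element 0, so the pairing holds even if a section body happens to be empty.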