2016-08-12 63 views
-2

Split a text file into HTML documents

I have a text file that contains more than 500 HTML pages. Is there a quick way to split it into separate HTML files?

I figured that identifying the start and end of each document would work, but I'm not sure how to write a script for this.

+0

Which programming language? What code does each page start with? Is it '<!DOCTYPE html>'?

Answers

0

If each HTML page in your file starts with a <!DOCTYPE html> declaration, you can use this script, written in Python:

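# text to html
# Parses a text file and separates the HTML code into
# files like html1.html, html2.html, etc.
# Each HTML page needs to start with <!DOCTYPE html>!

# Usage: $ python text-to-html.py filename
# Example: $ python text-to-html.py testfile.txt

from sys import argv

filename = argv[1]

counter = 0
new_file = None  # no output file exists until the first doctype is seen

with open(filename) as open_file:
    for line in open_file:
        if "<!DOCTYPE html>" in line:
            # A new page starts here: close the previous file, open the next one.
            if new_file is not None:
                new_file.close()
            counter += 1
            new_filename = "html%d.html" % counter
            new_file = open(new_filename, "w")
        # Lines before the first doctype have no file to go to, so skip them.
        if new_file is not None:
            new_file.write(line)

# Close the last output file.
if new_file is not None:
    new_file.close()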
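
If the doctype is not always written exactly as <!DOCTYPE html> (different case, or several pages on one line), the line-by-line check above will miss some page boundaries. Here is a rough sketch of a variant that reads the whole file and cuts it at every doctype, matched case-insensitively. It assumes the file fits in memory, and the names split_html_documents and write_documents are just made up for this example:

```python
import re

# Matches the start of a doctype declaration in any letter case.
DOCTYPE = re.compile(r"<!DOCTYPE html", re.IGNORECASE)


def split_html_documents(text):
    """Return the pages in text, each one starting at a doctype marker."""
    # Positions where each page begins.
    starts = [match.start() for match in DOCTYPE.finditer(text)]
    # Each page ends where the next one begins; the last ends with the text.
    ends = starts[1:] + [len(text)]
    # Anything before the first doctype is dropped.
    return [text[start:end] for start, end in zip(starts, ends)]


def write_documents(documents, prefix="html"):
    """Write the pages to html1.html, html2.html, etc."""
    for number, document in enumerate(documents, start=1):
        with open("%s%d.html" % (prefix, number), "w") as out:
            out.write(document)


# Example (testfile.txt is the same sample name as in the usage line above):
# with open("testfile.txt") as source:
#     write_documents(split_html_documents(source.read()))
```

Splitting is kept separate from writing so you can check how many pages were found before any files are created.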

Hope it helps!