2011-04-24

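ColdFusion XML character encoding, part 2

I have a UTF-8 XML file littered with garbled sequences such as ÃÂ&#xA7;. I have written the code below to remove these diacritics and similar debris, or to replace them with acceptable values.

1. Is there a better way to do this?
2. When I run it on some large XML files (> 50 MB), I can get out-of-memory errors. If there is no better way, how can I optimize it to avoid OOM errors?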

<cffile 
    action="read" 
    file="#ExpandPath('./xs.xml')#" 
    variable="myfile"/> 

<cfset myfile =ReReplace(myfile,'&##xC2;&##x2013;','.','all')/> 
<cfset myfile =ReReplace(myfile,'&##xC2;&##x2019;','''','all')/> 
<cfset myfile =ReReplace(myfile,'&##xC2;&##x201D;','"','all')/> 

<cfset myfile =ReReplace(myfile,'&##xC3;&##x192;&##xC2;&##xA7;','c','all')/> 
<cfset myfile =ReReplace(myfile,'&##xC3;&##xA7;','c','all')/> 
<cfset myfile =ReReplace(myfile,'&##xC3;&##xA9;','e','all')/> 
<cfset myfile =ReReplace(myfile,'&##xC3;&##x201A;&##xC2;&##x2022;','(*)','all')/> 
<cfset myfile =ReReplace(myfile,'&##xC3;&##x192;&##xC2;&##x201A;\?','(*)','all')/> 
<cfset myfile =ReReplace(myfile,'&##xC3;&##x201A;&##xC2;&##xB7;','-','all')/> 
<cfset myfile =ReReplace(myfile,'&##xC3;&##x201A;&##xC2;&##x2018;','''','all')/> 
<cfset myfile =ReReplace(myfile,' &##xC3;&##x201A;&##xC2;&##x201C;',' "','all')/> 

<cfset myfile =ReReplace(myfile,'&##xE2;&##x20AC;&##x201C;','-','all')/> 
<cfset myfile =ReReplace(myfile,'&##xE2;&##x20AC;&##x2122;','''','all')/> 
<cfset myfile =ReReplace(myfile,' &##xE2;&##x20AC;&##x153;',' "','all')/> 
<cfset myfile =ReReplace(myfile,'&##xE2;&##x20AC;&##x153;','-','all')/> 
<cfset myfile =ReReplace(myfile,'&##xE2;&##x20AC;&##xFFFD; ','" ','all')/> 
<cfset myfile =ReReplace(myfile,'&##xE2;&##x20AC;&##xFFFD;','-','all')/> 
<cfset myfile =ReReplace(myfile,'&##xE2;&##x201E;&##xA2;','(TM)','all')/> 
<cfset myfile =ReReplace(myfile,'&##xE2;&##x20AC;&##xA2;','(*)','all')/> 

<cfset myfile =ReReplace(myfile,'&##xEF;&##x201A;&##xA7;','(*)','all')/> 

<cfset myfile =ReReplace(myfile,'(&##[^;]*;)','','all')/> 

<cffile action="write" 
    file="#ExpandPath('./xs_new.xml')#" 
    output="#myfile#"/> 

Thanks
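On question 1, here is a possible alternative, offered only as a hedged sketch. Most of the sequences in the code above look like UTF-8 bytes that were decoded as Windows-1252 somewhere upstream. For example, &#xC3;&#xA7; (Ã§) is the UTF-8 encoding C3 A7 of ç read byte by byte as Windows-1252. Likewise, &#xE2;&#x20AC;&#x2122; (â€™) is the UTF-8 encoding E2 80 99 of the right single quotation mark U+2019. If that diagnosis holds for your data, reversing the mis-decoding repairs every such sequence without enumerating them.

The Java below relies on ColdFusion running on the JVM, and the class and method names are hypothetical. It assumes the numeric references have already been turned into characters, for example by an XML parser, and it leaves a string unchanged when the round trip fails.

Some limits apply:
- It recovers the intended character (ç), not an ASCII stand-in like c.
- Sequences that already contain U+FFFD, such as &#xE2;&#x20AC;&#xFFFD;, have lost a byte and cannot be recovered this way.
- Doubly garbled ones, such as &#xC3;&#x192;&#xC2;&#xA7;, need the repair applied twice.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Assumption: Windows-1252 is the charset the UTF-8 bytes were wrongly decoded with.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /**
     * Reverses one round of "UTF-8 bytes decoded as Windows-1252":
     * encode the text back to Windows-1252 bytes, then decode those bytes as UTF-8.
     * Returns the input unchanged if either step fails, which is the usual outcome
     * for text that was never garbled but contains non-ASCII characters.
     */
    static String repair(String text) {
        CharsetEncoder encoder = CP1252.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return decoder.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            return text; // not reversible: leave it alone
        }
    }

    public static void main(String[] args) {
        // "Fran" + U+00C3 U+00A7 + "ais": the UTF-8 bytes of c-cedilla read as Windows-1252
        System.out.println(repair("Fran\u00C3\u00A7ais"));
    }
}
```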

Answer


Use ColdFusion's file functions to work on one line at a time instead of reading the whole thing into memory:

<cfscript> 
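// open both files explicitly as UTF-8 so non-ASCII text passes through unchanged
myfile = FileOpen(ExpandPath('./xs.xml'), "read", "utf-8"); 
myNewFile = FileOpen(ExpandPath('./xs_new.xml'), "write", "utf-8"); 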

while(NOT FileisEOF(myfile)) { 
    line = FileReadLine(myfile); // read line 

    line = ReReplace(line,'&##xC2;&##x2013;','.','all'); 
    line = ReReplace(line,'&##xC2;&##x2019;','''','all'); 
    line = ReReplace(line,'&##xC2;&##x201D;','"','all'); 

    line = ReReplace(line,'&##xC3;&##x192;&##xC2;&##xA7;','c','all'); 
    line = ReReplace(line,'&##xC3;&##xA7;','c','all'); 
    line = ReReplace(line,'&##xC3;&##xA9;','e','all'); 
    line = ReReplace(line,'&##xC3;&##x201A;&##xC2;&##x2022;','(*)','all'); 
    line = ReReplace(line,'&##xC3;&##x192;&##xC2;&##x201A;\?','(*)','all'); 
    line = ReReplace(line,'&##xC3;&##x201A;&##xC2;&##xB7;','-','all'); 
    line = ReReplace(line,'&##xC3;&##x201A;&##xC2;&##x2018;','''','all'); 
    line = ReReplace(line,' &##xC3;&##x201A;&##xC2;&##x201C;',' "','all'); 

    line = ReReplace(line,'&##xE2;&##x20AC;&##x201C;','-','all'); 
    line = ReReplace(line,'&##xE2;&##x20AC;&##x2122;','''','all'); 
    line = ReReplace(line,' &##xE2;&##x20AC;&##x153;',' "','all'); 
    line = ReReplace(line,'&##xE2;&##x20AC;&##x153;','-','all'); 
    line = ReReplace(line,'&##xE2;&##x20AC;&##xFFFD; ','" ','all'); 
    line = ReReplace(line,'&##xE2;&##x20AC;&##xFFFD;','-','all'); 
    line = ReReplace(line,'&##xE2;&##x201E;&##xA2;','(TM)','all'); 
    line = ReReplace(line,'&##xE2;&##x20AC;&##xA2;','(*)','all'); 

    line = ReReplace(line,'&##xEF;&##x201A;&##xA7;','(*)','all'); 

    line = ReReplace(line,'(&##[^;]*;)','','all'); 

    FileWriteLine(myNewFile, line); // FileReadLine drops the line break, so write it back
} 

FileClose(myfile); 
FileClose(myNewFile); 
</cfscript> 
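The same one-line-at-a-time approach can be sketched in Java, since ColdFusion runs on the JVM. Memory use is bounded by the longest line rather than by the file size.

The class name XmlCleaner and the two table entries are illustrative. A real table would list the same pattern/replacement pairs as the CFScript above, in the same order. Order matters because a pattern with a leading or trailing space, such as " &#xE2;&#x20AC;&#x153;", must run before its shorter form; a LinkedHashMap keeps insertion order. The table uses plain String.replace because those patterns are fixed strings, so only the final catch-all needs a regex.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class XmlCleaner {
    // Ordered table of literal replacements (illustrative subset of the CFScript above).
    static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("&#xC3;&#xA7;", "c");
        REPLACEMENTS.put("&#xE2;&#x20AC;&#x2122;", "'");
    }

    // Same catch-all as the original: drop any remaining &#...; reference.
    static final Pattern LEFTOVER_REF = Pattern.compile("&#[^;]*;");

    static String cleanLine(String line) {
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            line = line.replace(e.getKey(), e.getValue());
        }
        return LEFTOVER_REF.matcher(line).replaceAll("");
    }

    // Streams the input so only one line is held in memory at a time.
    static void cleanFile(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(cleanLine(line));
                writer.newLine(); // readLine() strips the terminator; newLine() writes the platform separator
            }
        }
    }

    public static void main(String[] args) throws IOException {
        cleanFile(Paths.get("xs.xml"), Paths.get("xs_new.xml"));
    }
}
```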

Or use <cfloop> if you prefer CFML to CFScript: http://www.bennadel.com/blog/2011-Reading-In-File-Data-One-Line-At-A-Time-Using-ColdFusion-s-CFLoop-Tag-or-Java-s-LineNumberReader.htm – Henry 2011-04-25 19:52:23
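Henry's link also mentions Java's java.io.LineNumberReader. Here is a minimal Java sketch of that class in this setting, assuming you want to audit the input before cleaning it. It reads one line at a time and lists the line number of every numeric character reference, so you can see which sequences the catch-all ReReplace would otherwise delete silently. The class name RefAudit and the audit idea are illustrative additions, not taken from the linked post.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RefAudit {
    // Same shape as the catch-all pattern in the question: &#...;
    static final Pattern REF = Pattern.compile("&#[^;]*;");

    /** Returns "line N: ref" for every numeric reference found, reading one line at a time. */
    static List<String> audit(Reader source) {
        List<String> hits = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = REF.matcher(line);
                while (m.find()) {
                    // getLineNumber() is already 1 after the first readLine(), so it is 1-based here
                    hits.add("line " + reader.getLineNumber() + ": " + m.group());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        for (String hit : audit(Files.newBufferedReader(Paths.get("xs.xml"), StandardCharsets.UTF_8))) {
            System.out.println(hit);
        }
    }
}
```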


Thanks, orangepips and Henry. – KobbyPemson 2011-04-26 10:56:42