2016-08-24
0

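How to remove carriage returns/line feeds from a CSV, except at the end of each line?

Is it possible to use a batch file or PowerShell to remove carriage returns/line feeds from a CSV file without removing the natural end of each record?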

Basically I have a file like this:

a1, a2, a3, a4,aaa 
aaa a5, a6, a7,aaa aa 
a8 
b1,b2,b3,b4,b5,b6,b7,b8 
c1,c2,c3,c4,c5,c6,c7,c8 
d1,d2,d3,d4,d5,d6,d7,d8 
e1,e2,e3,e4,eee 
e5,e6,e7,e8 

As an example, columns 5 and 8 "can" contain carriage returns/line feeds. I want to remove these so that the file has 1 line = 1 record.

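Is this possible? I already use a batch file to format the file, so if possible I would like to use it for all the formatting. I am considering moving to PowerShell; if that is easier, please let me know (absolute PowerShell noob).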

NP Edit - every row has the same number of columns. In this example, 8.

+1

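Since each (resulting) line seems to have a different length/number of tokens, how do you know what is an "in-line CRLF" and what is an "end-of-line CRLF"? (I assume we cannot rely on "a", "b", ...?) – Stephan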

+0

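Ah, apologies. I can change the file to a fixed number of tokens to make things easier; I will reflect this in the question. My main file is 50 columns long –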

+0

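Hm... not very trivial in a pure batch script... Anyway, you could check the software that produces the odd output and change it there (if you have that possibility); as far as I know, a correctly formatted CSV file has fields containing line breaks enclosed in quotation marks, or more generally, all fields containing text (which I think is the only data type that may contain line breaks)... – aschipfl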
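To illustrate the point about quoted fields: if the producing software can be made to quote fields that contain line breaks, a standard CSV parser can read each record whole, and flattening becomes simple. The following is only a sketch in Python (not batch, and not part of the original thread); the function name `flatten_quoted_csv` and the sample data are made up for illustration, and it assumes embedded line breaks should be replaced by a single space.

```python
import csv
import io


def flatten_quoted_csv(text):
    """Return CSV text in which line breaks inside quoted fields become spaces.

    Assumption: the input is well-formed CSV, i.e. every field that
    contains a line break is enclosed in double quotes.
    """
    # newline="" keeps embedded line breaks untouched for the csv module.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # splitlines() handles both CRLF and LF inside a field.
        writer.writerow([" ".join(field.splitlines()) for field in row])
    return out.getvalue()


# Hypothetical sample: the third field of the first record spans two lines.
sample = 'a1,a2,"line one\nline two",a4\nb1,b2,b3,b4\n'
print(flatten_quoted_csv(sample), end="")
```

This only works when the quoting is present; the unquoted sample in the question cannot be parsed this way, which is why the answers below count elements instead.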

Answers

0

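I added another column (now 9), because this approach does not work when there is an "in-line CRLF" in the last token (and you state that token 8 may have one). (I understand that you have influence on how the CSV file is created.) Explanations are in the code as REMarks.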

@echo off 
setlocal enabledelayedexpansion 
REM empty variable: 
set "line=" 
for /f "delims=" %%a in (t.csv) do (
    REM append line from file to variable 
    set "line=!line! %%a" 
    REM rescue spaces (by replacing with another character) 
    REM for proper token counting 
    set "line=!line: =²!" 
    set n=0 
    REM count tokens: 
    for %%b in (!line!) do set /a n+=1 
    if !n! geq 9 (
        REM if 9 (or more) tokens, the assembly is finished.
        REM re-replace the spaces
        set "line=!line:²= !"
        REM cut the first char (a space):
        set "line=!line:~1!"
        REM output the line:
        echo !line!
        REM and clear the variable for the next logical line:
        set "line="
    )
) 

There is some tolerance if a line has more than <n> elements, but it will fail if there are fewer.
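The reason for the extra column can be seen in a small model of the same idea: keep appending physical lines until the buffer holds enough elements, then emit it. This is a Python sketch rather than batch, the name `join_until_enough` is made up for illustration, and it counts separators directly, which is a simplification of how the batch `for` loop counts tokens.

```python
def join_until_enough(lines, num_fields, separator=","):
    """Join physical lines with a space until each record has enough fields.

    Returns (records, leftover), where leftover is an incomplete
    trailing buffer (empty string if nothing is left over).
    """
    records = []
    buffer = ""
    for line in lines:
        buffer = line if not buffer else buffer + " " + line
        # A record with n fields contains n - 1 separators.
        if buffer.count(separator) + 1 >= num_fields:
            records.append(buffer)
            buffer = ""
    return records, buffer


# 9 columns, last column never broken: records are rebuilt correctly.
ok, _ = join_until_enough(
    ["a1,a2,a3,a4,aaa", "aaa a5,a6,a7,aaa aa", "a8,a9",
     "b1,b2,b3,b4,b5,b6,b7,b8,b9"], 9)
print(ok)

# 8 columns with a break inside the LAST column: the record looks
# complete one line too early, so "a8" is glued onto the next record.
bad, _ = join_until_enough(
    ["a1,a2,a3,a4,a5,a6,a7,aaa aa", "a8", "b1,b2,b3,b4,b5,b6,b7,b8"], 8)
print(bad)
```

In the second call the first record is emitted without its trailing "a8", because nothing distinguishes a finished record from one whose last field continues on the next line.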

2

Tricky, but a nice challenge that I had to take on... although you did not show any effort of your own to solve it...

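Here is a script that joins CSV data lines when the number of elements does not match the predefined number. It does not process the elements individually; it just appends lines until the proposed number is reached. The data must not contain any global wildcard characters such as * and ?. Quotation marks must not occur either, unless they are doubled like "". Here it is: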

@echo off 
setlocal EnableExtensions DisableDelayedExpansion 

rem // Define constants here: 
set "FILE_I=%~1" & rem // (specifies the input CSV file) 
set "FILE_O=%~2" & rem // (specifies the output CSV file) 
set "SEPARATOR=," & rem // (is the separator used in the CSV data) 
set "REPLACE=" & rem // (is the replacement string for each line-break) 
set "NUMITEMS=8" & rem // (is the proposed number of elements per line) 

rem // Validate given input and output CSV files: 
if not exist "%FILE_I%" (< "%FILE_I%" set /P ="" & exit /B 1) 
if not defined FILE_O set "FILE_O=con" 

rem // Initialise data collector and counter for elements: 
set "PREV=" & set /A "COUNT=0" 
rem // Iterate through lines of input file: 
for /F delims^=^ eol^= %%L in (' 
    rem/ /* Read input file, output dummy line and deplete output file: */ ^&^
     type "%FILE_I%" ^& ^> "%FILE_O%" break ^& echo/^&^
     for /L %%J in ^(2^,1^,%NUMITEMS%^) do @^< nul set /P ^="%SEPARATOR%" 
') do (
    rem // Store currently read line: 
    set "LINE=%%L" 
    rem // Toggle delayed expansion in order not to lose `!`: 
    setlocal EnableDelayedExpansion 
    rem // Add number of elements of current line to the counter: 
    for %%I in ("!LINE:%SEPARATOR%=","!") do (
     endlocal 
     set /A "COUNT+=1" 
     setlocal EnableDelayedExpansion 
    ) 
    rem // Check whether counter reached given number of elements per line: 
    if !COUNT! LEQ %NUMITEMS% (
     rem /* Either proposed number of elements not reached, hence store data 
     rem and wait for next line to have enough elements; 
     rem or number is reached but still wait for the next line, because it 
     rem could be a single element to be appended to the previous line; 
     rem hence the data output is actually delayed by one loop iteration; 
     rem so to not lose the last line, the said dummy line is needed: */ 
     set "PREV=!PREV!%REPLACE%!LINE!" 
     rem // Transport data collector over `endlocal` barrier: 
     for /F delims^=^ eol^= %%K in ("!PREV!") do (
      endlocal 
      set "PREV=%%K" 
      setlocal EnableDelayedExpansion 
     ) 
     rem /* Decrement counter because a single element is considered 
     rem to be part of the last element of the previous line: */ 
     endlocal 
     set /A "COUNT-=1" 
     setlocal EnableDelayedExpansion 
    ) else (
     rem /* Proposed number of elements exceeded, hence output currently 
     rem collected data, reset collector and counter for elements: */ 
     if defined REPLACE set "PREV=!PREV:*%REPLACE%=!" 
     >> "%FILE_O%" echo !PREV! 
     endlocal 
     rem /* Store current line in data collector and subtract 
     rem the number of output elements from counter: */ 
     set "PREV=%REPLACE%%%L" 
     set /A "COUNT-=%NUMITEMS%" 
     setlocal EnableDelayedExpansion 
    ) 
    endlocal 
) 

endlocal 
exit /B 

Assuming the script is saved as concat-csv-lines.bat, the input CSV file is named broken-lines.csv and the output file is concatenated.csv, run it with the following command line:

concat-csv-lines.bat broken-lines.csv concatenated.csv 

With broken-lines.csv containing the sample data from the question, concatenated.csv will contain:

a1, a2, a3, a4,aaaaaa a5, a6, a7,aaa aaa8 
b1,b2,b3,b4,b5,b6,b7,b8 
c1,c2,c3,c4,c5,c6,c7,c8 
d1,d2,d3,d4,d5,d6,d7,d8 
e1,e2,e3,e4,eeee5,e6,e7,e8
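For readers who find the batch code hard to follow, the counting logic can be modelled in a few lines. This is only a sketch in Python, not a replacement for the script: the function name `concat_csv_lines` is made up, trailing spaces in the sample are assumed to be stripped, and an explicit final flush stands in for the dummy line that the batch script appends to the input.

```python
def concat_csv_lines(lines, num_items=8, separator=",", replace=""):
    """Join broken CSV lines; output is delayed by one line.

    A line is appended to the collected record as long as the running
    element count does not exceed num_items. Because a lone element may
    still belong to the last field, a record is only emitted once the
    NEXT line pushes the count past num_items.
    """
    output = []
    prev = None   # record collected so far
    count = 0     # running element counter
    for line in lines:
        if not line:
            continue  # the batch `for /F` loop skips empty lines too
        count += line.count(separator) + 1
        if count <= num_items:
            # Continuation: its first element merges with the previous
            # record's last element, hence the decrement.
            prev = line if prev is None else prev + replace + line
            count -= 1
        else:
            if prev is not None:
                output.append(prev)
            prev = line
            count -= num_items
    if prev is not None:
        output.append(prev)  # final flush instead of a dummy line
    return output


sample = [
    "a1, a2, a3, a4,aaa",
    "aaa a5, a6, a7,aaa aa",
    "a8",
    "b1,b2,b3,b4,b5,b6,b7,b8",
    "c1,c2,c3,c4,c5,c6,c7,c8",
    "d1,d2,d3,d4,d5,d6,d7,d8",
    "e1,e2,e3,e4,eee",
    "e5,e6,e7,e8",
]
for record in concat_csv_lines(sample):
    print(record)
```

With an empty `replace` string this reproduces the five lines shown above; passing `replace=" "` would put a space where each line break was.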