2013-04-26 77 views
1

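Find the encoding type using a batch script

I have a requirement: my batch script should determine whether the encoding of its input file is UTF-8. Can anyone tell me whether it is possible to detect the encoding type on Windows?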

+0

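What exactly are you trying to do? I ask because you can easily convert the files before processing them, but actually determining which encoding they use is somewhat harder. – 2013-04-26 13:17:04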

+0

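Thanks for your reply. If the file is in UTF-8 format, I can use it as is. If the file is in a different format, I need to convert it to UTF-8 before processing it. For that I need to know the file's format. – satish 2013-04-26 13:28:08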

Answers

4

With certutil you can dump a file in hexadecimal format. UTF-8 files that carry a byte order mark (BOM) start with 0xEF, 0xBB, 0xBF. So:

certutil -dump my.file.txt | find "ef bb bf" && echo this is utf-8

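You can put this in a FOR /F loop to make sure that only the first line is processed.

Update:

It turns out that the -dump option of certutil is buggy, so I had to use -encodehex, which requires a temporary file: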

@echo off 
:detect_encoding 
setLocal 
if "%~1" EQU "-?" (
    endlocal
    call :help
    exit /b 0
)
if "%~1" EQU "-h" (
    endlocal
    call :help
    exit /b 0
)
if "%~1" EQU "" (
    endlocal
    call :help
    exit /b 0
)


rem -- %~1 strips surrounding quotes, so quoted file names work too --
if not exist "%~1" (
    echo file does not exist
    endlocal
    exit /b 54
)

if exist "%~1\" (
    echo this cannot be used against directories
    endlocal
    exit /b 53
)

if "%~z1" EQU "0" (
    echo empty files are not accepted 
    endlocal 
    exit /b 52 
) 



set "file=%~snx1" 
del /Q /F "%file%.hex" >nul 2>&1 

certutil -f -encodehex "%file%" "%file%.hex" >nul

rem -- find the first line of hex file -- 

for /f "usebackq delims=" %%E in ("%file%.hex") do (
    set "f_line=%%E"
    goto :endfor
)
:endfor
del /Q /F "%file%.hex" >nul 2>&1 

rem -- check the BOMs; UTF-32 is tested before UTF-16 because its BOMs contain the UTF-16 ones --
echo %f_line% | find "ef bb bf" >nul && echo utf-8 && endlocal && exit /b 1
echo %f_line% | find "ff fe 00 00" >nul && echo utf-32 LE && endlocal && exit /b 5
echo %f_line% | find "00 00 fe ff" >nul && echo utf-32 BE && endlocal && exit /b 4
echo %f_line% | find "ff fe" >nul && echo utf-16 LE && endlocal && exit /b 2
echo %f_line% | find "fe ff" >nul && echo utf-16 BE && endlocal && exit /b 3

rem -- no BOM found: reported as ASCII (a UTF-8 file without a BOM also ends up here) --
echo ASCII & endlocal & exit /b 6

:help 
echo. 
echo %~n0 file - Detects encoding of a text file 
echo. 
echo for each encoding you will receive a text response with a name and an errorlevel code as follows:

echo  1 - UTF-8
echo  2 - UTF-16 LE
echo  3 - UTF-16 BE
echo  4 - UTF-32 BE
echo  5 - UTF-32 LE
echo  6 - ASCII

echo for empty files you will receive error code 52
echo for directories you will receive error code 53
echo for a nonexistent file you will receive error code 54
goto :eof 
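
For the use case from the question (use the file as is when it is UTF-8, convert it otherwise), a caller could branch on the errorlevel that the script returns. The following is only a minimal sketch: it assumes the script above has been saved as detect_encoding.bat (a hypothetical name) in the same folder as the caller, and the echo lines are placeholders for the real processing and conversion steps, which are not shown here.

```bat
@echo off
setlocal
rem Assumption: the detection script above is saved as detect_encoding.bat
rem (hypothetical name) in the same folder as this script.
call "%~dp0detect_encoding.bat" %1 >nul
set "enc=%errorlevel%"

rem Exit codes set by the detection script:
rem 1 = UTF-8 BOM, 2-5 = UTF-16 or UTF-32 BOM, 6 = no BOM, 52-54 = errors.
if "%enc%"=="1" goto :is_utf8
if "%enc%"=="6" goto :no_bom
if %enc% GEQ 2 if %enc% LEQ 5 goto :convert
echo could not detect the encoding, code %enc%
exit /b 1

:is_utf8
echo file is UTF-8, it can be used as is
exit /b 0

:no_bom
echo no BOM found, the file is ASCII or some other BOM-less text
exit /b 0

:convert
echo file is UTF-16 or UTF-32, convert it to UTF-8 before processing
exit /b 0
```

Note that the detection script builds its temporary file name from the file name without the path, so this sketch is meant to be run from the folder that contains the input file.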
+1

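Thanks for your reply. Your answer was really helpful. I have one more question: for other file encodings (such as Unicode), can you tell me what needs to change in order to detect them? – satish 2013-04-26 14:00:38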

+0

Here you can find information about the other UTF encodings: http://en.wikipedia.org/wiki/Byte_order_mark. I'm not sure it will work in all cases. – npocmaka 2013-04-26 14:09:50

+0

Please check my update. – npocmaka 2013-04-28 08:46:42