2012-01-17 61 views
1

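Find and extract text in an existing text file

I need to be able to extract data from an existing text file. The structure of the text file looks something like this: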

this line contains a type of header and always starts at column 1 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 

this line contains a type of header and always starts at column 1 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 

this line contains a type of header and always starts at column 1 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 

this line contains a type of header and always starts at column 1 
    this line contains other data and is always tabbed in 
    this line contains other data and is always tabbed in 

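As you can see, the text file is arranged in sections. There is always a single header line, followed by a random number of other data lines, and there is always a blank line between sections. Unfortunately, there is no rhyme or reason to the naming scheme of the headers or to the data contained in the other data lines; only the structure above is somewhat consistent.

The data I need to search for is in one of the other data lines, in only one of the sections, and that section can be anywhere in the text file. I can use the FIND command to locate the text, but once I have done that, I need to extract the entire section to a new text file. I cannot figure out how to go up however many lines it takes to reach the first preceding blank line, then go down to the next blank line, and extract everything in between. Does that make sense?

Unfortunately, VBScript is simply not an option for this application, or this would have been finished long ago. Any ideas? Thanks.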

Answers

1
@echo off 
setlocal enableDelayedExpansion 
set input="test.txt" 
set output="extract.txt" 
set search="MY TEXT" 

::find the line with the text 
for /f "delims=:" %%N in ('findstr /n /c:!search! %input%') do set lineNum=%%N 
set "begin=0" 

::find blank lines and set begin to the last blank before text and end to the first blank after text 
for /f "delims=:" %%N in ('findstr /n "^$" %input%') do (
    if %%N lss !lineNum! (set "begin=%%N") else set "end=%%N" & goto :break 
) 
::end of section not found so we must count the number of lines in the file 
for /f %%N in ('find /c /v "" ^<%input%') do set /a end=%%N+1 
:break 

::extract the section bracketed by begin and end 
set /a count=end-begin-1 
<%input% (
    rem ::throw away the beginning lines until we reach the desired section 
    for /l %%N in (1 1 %begin%) do set /p "ln=" 
    rem ::read and write the section 
    for /l %%N in (1 1 %count%) do (
     set "ln=" 
     set /p "ln=" 
     echo(!ln! 
    ) 
)>%output% 

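Limitations of this solution:

  • Lines must be terminated by <CR><LF> (Windows style)
  • Lines must be <= 1021 bytes long (not including the <CR><LF>)
  • Trailing control characters will be stripped from each line

If these limitations are a problem, a less efficient variant can be written that reads the section with FOR /F instead of SET /P.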
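The FOR /F variant mentioned above was not posted, so what follows is only a minimal sketch of one way it might look, not the answerer's own code. It reuses the file names and search string from the script above and assumes the same layout (sections separated by truly empty lines). The section boundaries are found the same way; only the extraction changes: `findstr /n "^"` prefixes every line with its number, and the loop echoes the lines whose number falls strictly between `begin` and `end`. The `ln` and `n` variables are names introduced for this illustration.

```batch
@echo off
setlocal disableDelayedExpansion
set "input=test.txt"
set "output=extract.txt"
set "search=MY TEXT"

:: Line number of the last line that contains the search text
set "lineNum="
for /f "delims=:" %%N in ('findstr /n /c:"%search%" "%input%"') do set "lineNum=%%N"
if not defined lineNum (
    echo Search text not found
    exit /b 1
)

:: begin = last blank line before the match, end = first blank line after it
set "begin=0"
set "end="
for /f "delims=:" %%N in ('findstr /n "^$" "%input%"') do (
    if %%N lss %lineNum% (set "begin=%%N") else if not defined end set "end=%%N"
)
:: No blank line after the match: treat end as one past the last line
if not defined end for /f %%N in ('find /c /v "" ^<"%input%"') do set /a end=%%N+1

:: Number every line with FINDSTR /N and echo only those between begin and end
>"%output%" (
    for /f "delims=" %%L in ('findstr /n "^" "%input%"') do (
        set "ln=%%L"
        for /f "delims=:" %%N in ("%%L") do set "n=%%N"
        setlocal enableDelayedExpansion
        rem Strip the "number:" prefix added by FINDSTR /N
        if !n! gtr %begin% if !n! lss %end% echo(!ln:*:=!
        endlocal
    )
)
```

Delayed expansion is toggled per line so that any `!` characters in the data survive. This version reads the whole file through FOR /F, so it should be noticeably slower on large files, and it is still subject to FOR /F's own line-length limit.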

This works perfectly! Thank you!!! – 2012-01-17 21:38:44

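@StevenSinclair - If this works for you, don't forget to click the check mark to accept it as the solution. If you think the answer is useful, you can also upvote it (if you are so inclined). – dbenham 2012-01-17 21:49:53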

1

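The program below reads the file line by line and stores the lines of one section in an array, while also checking whether the search text appears in the current section. When the section ends, if the search text was found, the current section is output as the result; otherwise, processing moves on to the next section.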

@echo off 
setlocal EnableDelayedExpansion 
set infile=input.txt 
set outfile=output.txt 
set "search=Any text" 
set textFound= 
call :SearchSection < %infile% > %outfile% 
goto :EOF 

:SearchSection 
    set i=0 
    :readNextLine 
     set line= 
     set /P line= 
     if not defined line goto endSection 
     set /A i+=1 
     set "ln%i%=!line!" 
     if not "!ln%i%!" == "!line:%search%=!" set textFound=True 
    goto readNextLine 
    :endSection 
    if %i% == 0 echo Error: Search text not found & exit /B 
if not defined textFound goto SearchSection 
for /L %%i in (1,1,%i%) do echo !ln%%i! 
exit /B 

This program has the same limitations that dbenham listed for his program.

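Elegant solution, though I suspect the GOTO loop will make it slower than my solution, especially if the text is found near the end of a large file. I haven't tested it. – dbenham 2012-01-18 13:13:26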

@dbenham: My method runs faster with [WHILE macros](http://www.dostips.com/forum/viewtopic.php?f=3&t=2707&p=12415#p12415) instead of GOTOs, something like this: ':SearchSection' '%WHILE% not defined textFound DO (' 'set i=0' 'set line=' 'set /P line=' '%WHILE% defined line DO (' etc ... – Aacini 2012-01-21 04:02:40