2011-05-24

Shell: script to group strings by substring

I have a program (unfortunately, changing it is not an option) that outputs log files of more than 500k lines.

I want to group the lines of the log file together (and then sort those groups) based on a substring of each line.

For example, I have lines like the following:

SELECT something WHERE TIM BETWEEN '*' AND '*' AND something; 

What I want to group on is TIM BETWEEN '*' AND '*', where the * values match between lines, for example:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 

In the output, these would be grouped as:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 

Each group would also be sorted on the entire string, so that the lines that are most similar end up next to each other.

I've been trying to put together a shell script that outputs what I want from the log file, but without any success!

Edit: I should also mention that "something" can be several words, for example:

SELECT blah1, blah2 or SELECT blah1, blah2, blah3 

Answers

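You should be able to do this with sort:

sort -k6,6 -k8,8 -k2,2 -o outputfile inputfile

where -k6,6 and -k8,8 are the first and second date columns and -k2,2 is the "something" column (fields are separated by whitespace and counted from 1). Any remaining ties are broken by comparing the whole line.

(P.S. Untested.)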

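Thanks for the answer, Kristofer, but I can't rely on the number of columns, or on the TIM BETWEEN '*' AND '*' block being at the same position from line to line. I've edited the original question to reflect this. – Tristan 2011-05-24 09:26:44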

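You can set the delimiter to something other than whitespace to define where a column ends. That way you might be able to do a multi-step sort, changing the delimiter between each sort (if words can be used as delimiters). -t changes the delimiter. – Kristofer 2011-05-24 10:39:34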
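A minimal sketch of the -t idea follows. Since sort's -t takes a single character rather than a word, this version uses the single quote as the field separator. It assumes (this is not stated in the question) that the two TIM dates are the first two single-quoted strings on every line, in which case they land in fields 2 and 4 no matter how many words come before them. The function name sort_by_quoted_dates is hypothetical.

```shell
# Hypothetical helper: group lines by the two quoted TIM dates.
sort_by_quoted_dates() {
  # -t "'" makes the single quote the field separator, so for
  # SELECT ... BETWEEN '2010-03-04' AND '2010-03-10' ... the dates are fields 2 and 4.
  # Assumption: no other quoted value appears before the TIM clause.
  # LC_ALL=C gives a plain byte-order comparison for the final whole-line tie-break.
  LC_ALL=C sort -t "'" -k2,2 -k4,4 "$@"
}
```

Usage: sort_by_quoted_dates logFile > sorted.log. A line with an extra quoted value before the TIM clause (for example in the selected columns) would break the field numbering.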

You have to pre-filter the data and turn it into something that sort can work with.

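# Replace the first BETWEEN and the first AND with | so the dates become fields 2 and 3
awk '{ sub(/BETWEEN/, "|"); sub(/AND/, "|"); print }' logFile |
  sort -t'|' -k2,2 -k3,3 |       # first date, then second date (plus the rest of the line)
  sed 's/|/BETWEEN/; s/|/AND/'   # put the original keywords back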

Output:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something; 

I hope this helps.
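The pre-filtering idea can also be sketched as decorate, sort, undecorate, so that it no longer depends on BETWEEN and AND being the first such words on the line or on the clause's column position. This is a minimal sketch, assuming the clause is always written exactly as TIM BETWEEN '...' AND '...' with single spaces; the function name group_by_tim is hypothetical. Each line is prefixed with its clause and a tab, a plain sort then orders by clause and, within a clause, by the whole line, and cut removes the prefix again.

```shell
# Hypothetical helper: group lines by their TIM BETWEEN '...' AND '...' clause,
# wherever it appears, then sort each group on the whole original line.
group_by_tim() {
  awk -v q="'" '
    BEGIN {
      # Build the regex at run time so the quote characters need no shell escaping
      re = "TIM BETWEEN " q "[^" q "]*" q " AND " q "[^" q "]*" q
    }
    {
      # Decorate: prefix each line with its clause (empty if absent) and a tab
      key = ""
      if (match($0, re)) key = substr($0, RSTART, RLENGTH)
      print key "\t" $0
    }' "$@" |
    LC_ALL=C sort |  # byte order: by key first, then by the whole original line
    cut -f2-         # undecorate: drop the key column again
}
```

Usage: group_by_tim logFile > grouped.log. Lines without the clause get an empty key and therefore come out first.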