Correcting the count records after removing duplicates

After a job runs on the server, it creates a file like the one below:

1000727888004 
522101 John Smith 
522101 John Smith 
522188 Shelly King 
522188 Shelly King 
1000727888002 
522990 John Doe 
522990 John Doe 
9000006000000

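We are currently in the process of fixing the code, but that will take a month. In the meantime, I am using the command below to remove the duplicate records.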

perl -ne 'print unless $dup{$_}++;' old_file.txt > new_file.txt

After I run the command above, it removes the duplicate entries, but the counts stay the same, as shown below:

1000727888004 
522101 John Smith 
522188 Shelly King 
1000727888002 
522990 John Doe 
9000006000000

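The last digit of each row starting with 1 is the total count of records that follow it (so the 4 in the first row should be 2, the 2 in the fourth row should be 1, and the 6 in the last row, which starts with 9, should be 3). It should look like this: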

1000727888002 
522101 John Smith 
522188 Shelly King 
1000727888001 
522990 John Doe 
9000003000000

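I can't come up with any logic that would fix it, and I need help. Can I run another command, or add something to my perl command, to correct the counts? Yes, I could open the file in Notepad++ and fix the numbers manually, but I am trying to automate it.

Thanks!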

2017-04-22 Amir

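What is that last record, the one starting with 9? –

That is the trailer of the file, holding the total count. The leading 9 is always present, and the next 6 digits are the count; if the count is a single digit, it is left-padded with 5 zeros. The last 6 digits are always 0 – Amir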
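The trailer layout described in this comment (a leading 9, a six-digit zero-padded count, then six zeros) can be checked mechanically. What follows is only a minimal sketch: it assumes data records are the lines with more than one whitespace-separated field and that the trailer is the single-field line starting with 9. The name `check_trailer` is hypothetical and is not part of the original thread.

```shell
# Hypothetical helper: compare the trailer count with the number of data records.
# Assumption: data records have more than one field; the trailer is a
# single-field line starting with 9 whose characters 2-7 hold the count.
check_trailer() {
  awk '
    { sub(/[ \t\r]+$/, "") }                            # drop trailing blanks / CR
    NF > 1          { records++ }                       # count data records
    NF == 1 && /^9/ { claimed = substr($0, 2, 6) + 0 }  # six-digit count field
    END {
      printf "records=%d trailer=%d\n", records, claimed
      exit (records == claimed ? 0 : 1)                 # non-zero status on mismatch
    }
  ' "$@"
}
```

Running `check_trailer new_file.txt` on the deduplicated file from the question would report a mismatch, because three data records remain while the trailer still claims six.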

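In awk. It handles dupes within the "blocks" between count records, i.e. it does not consider duplicates across the whole file. If that is an incorrect assumption, let me know.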

$ awk '
NF==1 {             # for the count record
    if(c!="")       # this avoids a leading empty row
        print c     # print the adjusted count of the previous block
    for(i in a)     # all deduped data records of that block
        print i     # print them (for-in order is unspecified in awk)
    delete a        # empty the hash
    c=$0            # store count (well, you could use just the first count record)
    next            # for this record don't process further
}
{
    if($0 in a)     # if current record is already in a
        c--         # decrease count
    else a[$0]      # else hash it
}
END {               # last record handling
    print c         # print the last count record
    for(i in a)     # just in case the last count record is missing
        print i     # this and the line above could be removed
}' file

Output:

1000727888002 
522101 John Smith 
522188 Shelly King 
1000727888001 
522990 John Doe 
9000006000000
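The output above still ends with 9000006000000, while the question asks for 9000003000000. One possible variant, shown here only as a sketch, recounts the records instead of decrementing: it buffers each block, drops duplicates within the block, rewrites the header count, and rebuilds the trailer from the running total. The assumptions are that the header count occupies only the last `w` digits (`w=1`, following the question's "last digit" wording), that the trailer is `9` + six-digit count + `000000` as described in the comments, and that trailing whitespace can be stripped. The names `fix_counts` and `emit_block` are hypothetical.

```shell
# Hypothetical helper: dedupe per block and recompute header/trailer counts.
# Assumption: w is the width of the count field at the end of a header line.
fix_counts() {
  awk -v w=1 '
    function emit_block(   i) {
      if (hdr != "")                       # header with the recomputed count
        print substr(hdr, 1, length(hdr) - w) sprintf("%0" w "d", n)
      for (i = 1; i <= n; i++)             # unique records, original order
        print rec[i]
      total += n                           # running total for the trailer
      hdr = ""; n = 0
      split("", seen)                      # portable way to clear an array
    }
    { sub(/[ \t\r]+$/, "") }               # drop trailing blanks / CR
    $0 == ""        { next }               # ignore empty lines
    NF == 1 && /^1/ { emit_block(); hdr = $0; next }          # block header
    NF == 1 && /^9/ { emit_block()                            # file trailer
                      printf "9%06d000000\n", total; next }
    !($0 in seen)   { seen[$0]; rec[++n] = $0 }               # first occurrence
    END             { emit_block() }       # flush if the trailer is missing
  ' "$@"
}
```

Usage would be `fix_counts old_file.txt > new_file.txt`, run on the original file, so the perl step is not needed. Records are kept in input order, which a `for (i in a)` loop does not guarantee. Counts wider than `w` digits would need a larger `w`.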

If the dupes are to be removed across the whole file, and the last record is also a count:

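awk '
NF==1 {             # count record
    if(NR==1)       # take the starting count from the first record only
        c=$0
    print c         # print the running count
}
NF>1 {              # data record
    if($0 in a)     # duplicate seen anywhere earlier in the file
        c--         # decrease the running count
    else {
        a[$0]       # remember the record
        print       # and print it once
    }
}' file

Output:

1000727888004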
522101 John Smith 
522188 Shelly King 
1000727888002 
522990 John Doe 
1000727888001

2017-04-23 06:26:22
