My awk command sorts, but unexpectedly omits duplicates

I'm trying to sort this file by a specific field, and I want to do all of it in awk:

"firstName": "gdrgo", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "lastName": "222",dfg 
"xxxxx": "John", "firstName": "beto", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "lastName": "111","xxxxx": "John", 
"xxxxx": "John", "firstName": "beto", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "lastName": "111","xxxxx": "John", 
"xxxxx": "John", "xxxxx": "John", "firstName": "beto2", "xxxxx": "John","lastName": "555", "xxxxx": "John","xxxxx": "John", 
"xxxxx": "John", "xxxxx": "John", "firstName": "beto2", "xxxxx": "John","lastName": "444", "xxxxx": "John","xxxxx": "John", 
"firstName": "gdrgo", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "xxxxx": "John", "lastName": "222",dfg 
"xxxxx": "John", "xxxxx": "John", "firstName": "beto2", "xxxxx": "John","lastName": "444", "xxxxx": "John","xxxxx": "John",

I'm using this command:

awk -F'.*"firstName": "|",.*"lastName": "|",' '{b[$3]=$0} END{for(i in b){print i}}' sumacomando

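which outputs:

but I expected:

That is, although the actual output happened to look sorted as intended, it unexpectedly dropped the duplicate values.

Source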

2017-02-28 victorhernandezzero

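The order of the indices in awk arrays, which are always associative arrays (dictionaries), is an implementation detail: no particular order is guaranteed. In your case, the output just happened to be sorted.
Keys are unique, so if more than one input line has the same value for $3, the b[$3]=... assignments overwrite each other, and the last one wins.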
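The overwrite behaviour is easy to reproduce in isolation. The following is a minimal sketch with made-up sample data (the keys `k1`/`k2` and the array name `seen` are illustrative only, not from the question); it counts the surviving keys by hand so that it should run in any POSIX awk:

```shell
# Three input lines, but only two distinct keys: the second "k1" line
# overwrites the value stored by the first one.
result=$(printf '%s\n' 'k1 first' 'k2 only' 'k1 second' |
  awk '{ seen[$1] = $2 }
       END { n = 0; for (k in seen) n++; print n, seen["k1"] }')
printf '%s\n' "$result"   # prints: 2 second
```

Two keys remain even though three lines were read, which is the same effect that drops the repeated lastName values in the question.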

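You must therefore:

use a sequentially indexed array to store your 3rd-field values ($3)
sort the resulting array by its values afterward.

Per the POSIX awk specification, awk has no built-in sorting functions, but GNU awk does, which enables the following solution using its asort() function: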

awk -F'.*"firstName": "|",.*"lastName": "|",' ' 
    { b[++n]=$3 } END{ asort(b); for(i=1;i<=n;++i) print b[i] } 
' sumacomando

Note that this does not include storing the associated full line ($0).
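For awk implementations without asort(), one option is to sort the sequentially indexed array by hand. The following is only a sketch, not part of the original answer: it uses a plain insertion sort, and the sample input puts the key in $1 as a stand-in for the $3 that the -F expression above extracts. Comparisons follow awk's usual rules, so values that look numeric are compared as numbers.

```shell
# Sketch: sort values in any POSIX awk, keeping duplicates.
sorted=$(printf '%s\n' 222 111 111 555 444 222 444 |
  awk '
    { b[++n] = $1 }    # sequential index, so equal values do not collide
    END {
      # Insertion sort over b[1] .. b[n]
      for (i = 2; i <= n; i++) {
        v = b[i]
        for (j = i - 1; j >= 1 && b[j] > v; j--) b[j + 1] = b[j]
        b[j + 1] = v
      }
      for (i = 1; i <= n; i++) print b[i]
    }')
printf '%s\n' "$sorted"
```

Insertion sort is quadratic in the number of values, so this is only reasonable for small inputs; for larger ones, piping to sort is likely the better choice.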

If you also want to store the associated full line while still performing the sorting in (GNU) awk, it gets more complicated:

awk -F'.*"firstName": "|",.*"lastName": "|",' ' 
    # Use a compound key to store the value of $3 plus a sequential index 
    # to disambiguate, and store the input row ($0) as the value. 
    { vals[$3,++n]=$0 } 
    END {
        # Sort by compound key using the helper function defined below.
        asorti(vals, names, "cmp_func");
        # Output the first half of the compound key, i.e., the value of $3,
        # followed by the associated input row.
        for(i=1;i<=n;++i) print gensub(SUBSEP ".*$", "", 1, names[i]), vals[names[i]]
    }
    # Helper sort function that splits the compound key into its components
    # - $3 value and sequential index - and compares the $3 values alphabetically
    # and the indices numerically.
    function cmp_func(i1, v1, i2, v2) {
        split(i1, tokens1, SUBSEP)
        split(i2, tokens2, SUBSEP)
        if (tokens1[1] < tokens2[1]) return -1
        if (tokens1[1] > tokens2[1]) return 1
        i1 = int(tokens1[2])
        i2 = int(tokens2[2])
        if (i1 < i2) return -1
        if (i1 > i2) return 1
        return 0
    }
' sumacomando
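The compound key used above relies on standard awk behaviour: a subscript such as vals[x, y] is stored under the single string x SUBSEP y, which split() can take apart again. A small sketch with made-up values (the key "444" and the row texts are illustrative only) that should run in any POSIX awk:

```shell
# Two entries share the same first key component; the sequential
# second component keeps them apart.
result=$(awk 'BEGIN {
  vals["444", 1] = "first row"
  vals["444", 2] = "second row"
  n = 0
  for (k in vals) {
    split(k, parts, SUBSEP)      # parts[1] is the value, parts[2] the index
    if (parts[1] == "444") n++
  }
  print n, vals["444", 2]
}')
printf '%s\n' "$result"   # prints: 2 second row
```

Both rows survive because their full keys differ, even though the part being sorted on is identical.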

As an alternative, piping to sort simplifies matters greatly:

awk -F'.*"firstName": "|",.*"lastName": "|",' '{ print $3, $0 }' sumacomando | sort -k1,1

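Note, however, that the pure-awk solution above preserves the input order among duplicate $3 values, whereas the sort-assisted solution does not.

Conversely, the pure-awk solutions need to hold all input in memory at once, whereas the sort utility is optimized for handling large input sets and uses temporary files as needed.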
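If input order among equal keys matters and you still want to use sort, one possible approach is to add the line number (NR) as a secondary numeric sort key and strip it again afterward. This is a sketch under some assumptions: the sample data keeps the key in $2 as a stand-in for the extracted $3, and the key itself is assumed to contain no spaces, since cut splits on single spaces.

```shell
# Decorate each line with key and line number, sort by key and then by
# line number, and finally drop the line-number column again.
result=$(printf '%s\n' 'x 222' 'y 111' 'z 222' 'w 111' |
  awk '{ print $2, NR, $0 }' |
  sort -k1,1 -k2,2n |
  cut -d' ' -f1,3-)
printf '%s\n' "$result"
```

Within each group of equal keys, the lines come out in their original input order (y before w, x before z).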

Source

2017-02-28 03:01:07 mklement0

Hi, sorry – victorhernandezzero

Your choice of field separator is unconventional; it may be better to use this instead

awk -F'[:,]' '{for(i=1;i<=NF;i++) 
        if($i~"\"lastName\"") 
         {gsub(/"/,"",$(i+1)); 
         print $(i+1)}}' file | sort

If your awk has the asort function, you can do this instead

awk -F'[:,]' '{for(i=1;i<=NF;i++) 
       if($i~"\"lastName\"") 
        {gsub(/"/,"",$(i+1)); 
        a[++c]=$(i+1)}} 
      END {asort(a); 
       for(k=1;k in a;k++) print a[k]}' file
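One detail to be aware of with -F'[:,]': the field that follows "lastName" still contains the space after the colon, so removing only the quotes leaves a leading space in the output. A possible variant (a sketch, not the answerer's code) copies the field into a hypothetical variable v and strips both quotes and spaces from the copy, which also avoids modifying the record:

```shell
# Sample line in the question's format; prints the bare lastName value.
result=$(printf '%s\n' '"firstName": "beto", "lastName": "111", "xxxxx": "John",' |
  awk -F'[:,]' '{
    for (i = 1; i < NF; i++)
      if ($i ~ /"lastName"/) {
        v = $(i + 1)
        gsub(/[" ]/, "", v)    # remove quotes and spaces from the copy
        print v
      }
  }')
printf '%s\n' "$result"   # prints: 111
```

This assumes the values themselves contain no spaces, colons, or commas, which holds for the sample data in the question.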

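Source

2017-02-28 02:49:30 karakfa

Nice answer, thank you very much, but I am looking to do this entirely in awk for other reasons; if I don't find another answer, I will pick yours as the best answer – victorhernandezzero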

awk has asort？ – victorhernandezzero

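'gawk' has it; it is not part of "classic" 'awk', because it goes against the original Unix tools philosophy – karakfa

@victorhernandezzero: I tried a different approach; I hope it helps you and everyone else. It uses a single awk (no other commands).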

awk '/lastName/{getline;while(!$0){getline};A[$0]} END{num=asorti(A, B);for(i=1;i<=num;i++){print B[i]}}' RS='[: ",]' Input_file

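EDIT1: The solution above will not give you the duplicates you need (special thanks to mklement0 for letting me know); the following may help with that as well.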

awk '/lastName/{getline;while(!$0){getline};A[++j]=$0} END{num=asort(A, B);for(i=1;i<=num;i++){print B[i]}}' RS='[: ",\n]' Input_file

Source

2017-02-28 04:39:12 RavinderSingh13

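I appreciate your willingness to improve your answer. The sorting part of your answer is now the same as the first solution in my answer (except that you copy the array, which is not necessary). The value-extraction part did not need rewriting, since the OP's command works fine in that respect, and your rewrite is complicated and involves getline, which is rarely the right tool. It also makes the solution hard to generalize. P.S.: Please don't address the OP in your answer. The OP is notified of answers anyway, and it is a distraction for future readers. – mklement0

Hi, sorry, there was a mix-up: this command works because gawk is installed, but it does not work natively with nawk or awk. Sorry, everyone – victorhernandezzero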
