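Bash join: replacing empty fields (-e option)

I have the code below to join multiple files together. It works fine, but I want to replace empty values with 0, so I used -e "0". But it doesn't work. Any ideas?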

for k in `ls file?` 
do 
    if [ -a final.results ] 
    then 
      join -a1 -a2 -e "0" final.results $k > tmp.res 
      mv tmp.res final.results 
    else 
      cp $k final.results 
    fi 

done

For example:

file1: 
a 1 
b 2 
file2: 
a 1 
c 2 
file3: 
b 1 
d 2 

Results: 
a 1 0 1 0 
b 2 1 0 
c 2 
d 2 

expected: 
a 1 1 0 
b 2 0 1 
c 0 2 0 
d 0 0 2


2012-12-19 Amir

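Don't parse the output of `ls`; use `for k in file?; do` instead. Also, quote the expansion of `$k` to protect against special characters in file names. – chepner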
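A minimal sketch of the difference this comment describes, assuming a scratch directory and an illustrative file whose name ends in a space (`'file '`, not part of the original example). The glob loop sees every file name intact, while the `ls` loop splits the awkward name on whitespace:

```shell
#!/bin/sh
# Scratch directory with three files matching "file?".
# The name 'file ' (trailing space) is a made-up example of an awkward name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1 file2 'file '

# Loop over the glob directly: each name arrives as one intact word.
glob_count=0
for k in file?; do
    [ -e "$k" ] && glob_count=$((glob_count + 1))
done

# Loop over parsed ls output: 'file ' is word-split into "file",
# which does not exist, so one file is silently missed.
ls_count=0
for k in $(ls file?); do
    [ -e "$k" ] && ls_count=$((ls_count + 1))
done

echo "glob loop found $glob_count files, ls loop found $ls_count"
```

The same reasoning applies to the unquoted `$k` inside the loop body: quoting it as `"$k"` keeps such a name as a single argument to `join` and `cp`.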

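It's poorly documented, but the -e option of join only takes effect when it is used together with the -o option. The order string has to be extended on each pass through the loop. The following code should produce your desired output.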

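# i is the index of the next field of final.results to add to the format
i=3 
# output format: join key, then the fields of final.results, then field 2 of the new file
orderl='0,1.2' 
orderr=',2.2' 
for k in file? 
do 
    if [ -e final.results ] 
    then 
      join -a1 -a2 -e "0" -o "$orderl$orderr" final.results "$k" > tmp.res 
      # final.results has gained one column, so list it on the next pass
      orderl="$orderl,1.$i" 
      i=$((i+1)) 
      mv tmp.res final.results 
    else 
      cp "$k" final.results 
    fi 
done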

As you can see, it starts to get messy. If you need to extend this much further, it may be worth deferring to a more powerful tool such as awk or Python.
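To see the interaction between -e and -o in isolation, here is a small sketch using the first two files from the question. The scratch directory and variable names are illustrative only:

```shell
#!/bin/sh
# Recreate file1 and file2 from the question in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a 1\nb 2\n' > file1
printf 'a 1\nc 2\n' > file2

# Without -o: unpairable lines are printed as they are,
# so there is no empty output field for -e to fill.
without_o=$(join -a1 -a2 -e 0 file1 file2)

# With -o: every listed field is always emitted,
# and fields that are missing are replaced by the -e string.
with_o=$(join -a1 -a2 -e 0 -o 0,1.2,2.2 file1 file2)

printf 'without -o:\n%s\n\nwith -o:\n%s\n' "$without_o" "$with_o"
```

Without -o the rows for `b` and `c` each have only two columns; with -o they become `b 2 0` and `c 0 2`.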


2012-12-20 00:46:04 cmh

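It's still not quite right... This is your script's output: b 2 2 0 1 c 2 0 2 0 d 2 0 0 2 – Amir

That's probably because you have an existing final.results file. Try deleting it first. My output matches what you asked for. – cmh

Yes. Starting from your example final.results, running this script gives '1 1 1 0 b 2 2 0 1 c 2 0 2 0 d 2 0 0 2', as you described. Evidently you need to delete that file before re-running. – cmh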

I gave up on using join and wrote my script another way:

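# every distinct key (first field) across all input files
keywords=$(awk '{print $1}' file? | sort -u) 
for p in $keywords 
do 
    x=$p 
    for k in file? 
    do 
     # exact match on the key field, so key "a" does not also match "ab"
     y=$(awk -v p="$p" '$1 == p {print $2}' "$k") 
     if [ -n "$y" ] 
     then 
      x="$x $y" 
     else 
      # key absent from this file (or present without a value): pad with 0
      x="$x 0" 
     fi 
    done 
    # appends, so remove any old final.results before re-running
    echo "$x" >> final.results 
done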


2012-12-20 19:26:26 Amir

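Assuming there are no duplicate keys within a single file and the keys contain no whitespace, you could use gawk and a sorted glob of files. This approach should be quite fast for large files, and it would use only a relatively small amount of memory compared with a glob of all the data. Run like: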

gawk -f script.awk $(ls -v file*)

Contents of script.awk:

BEGINFILE { 
    c++ 
} 

z[$1] 

$1 in a { 

    a[$1]=a[$1] FS ($2 ? $2 : "0") 
    next 
} 

{ 
    for(i=1;i<=c;i++) { 
     r = (r ? r FS : "") \ 
     (i == c ? ($2 ? $2 : "0") : "0") 
    } 

    a[$1]=r; r="" 
    b[++n]=$1 
} 

ENDFILE { 

    for (j in a) { 
     if (!(j in z)) { 
      a[j]=a[j] FS "0" 
     } 
    } 

    delete z 
} 

END { 

    for (k=1;k<=n;k++) { 
     print b[k], a[b[k]] 
    } 
}

Test input (output of grep . file*):

file1:a 1 
file1:x 
file1:b 2 
file2:a 1 
file2:c 2 
file2:g 
file3:b 1 
file3:d 2 
file5:m 6 
file5:a 4 
file6:x 
file6:m 7 
file7:x 9 
file7:c 8

Results:

a 1 1 0 4 0 0 
x 0 0 0 0 0 9 
b 2 0 1 0 0 0 
c 0 2 0 0 0 8 
g 0 0 0 0 0 0 
d 0 0 2 0 0 0 
m 0 0 0 6 7 0


2013-01-02 04:36:15 Steve

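By the way, the GNU version of join supports -o auto. The -e and -o options cause enough frustration to drive people to learn awk. (See also "How to get all fields in outer join with Unix join?") As cmh said: it is [not] documented, but the -e option of join only works when it is used together with the -o option.

General solution: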
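As a rough sketch of how -o auto could replace the hand-built format string, the question's loop might look like this. It assumes GNU join (coreutils), since -o auto is not part of POSIX join, and it recreates the three example files in a scratch directory:

```shell
#!/bin/sh
# Assumes GNU join; -o auto is a GNU extension.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a 1\nb 2\n' > file1
printf 'a 1\nc 2\n' > file2
printf 'b 1\nd 2\n' > file3

# Start clean so a stale final.results cannot skew the columns.
rm -f final.results
for k in file?; do
    if [ -e final.results ]; then
        # -o auto takes the field count from the first line of each file,
        # so -e 0 can pad every missing field without a format string.
        join -a1 -a2 -e 0 -o auto final.results "$k" > tmp.res
        mv tmp.res final.results
    else
        cp "$k" final.results
    fi
done
cat final.results
```

Because the field count comes from the first line of each input, this only behaves well when every line of a file has the same number of fields.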

cut -d ' ' -f1 file? | sort -u > tmp.index 
for k in file?; do join -a1 -e '0' -o '2.2' tmp.index $k > tmp.file.$k; done 
paste -d " " tmp.index tmp.file.* > final.results 
rm tmp*

Bonus: how do I compare multiple branches in git?

for k in pmt atc rush; do git ls-tree -r $k | cut -c13- > ~/tmp-branch-$k; done 
cut -f2 ~/tmp-branch-* | sort -u > ~/tmp-allfiles 
for k in pmt atc rush; do join -a1 -e '0' -t$'\t' -11 -22 -o '2.2' ~/tmp-allfiles ~/tmp-branch-$k > ~/tmp-sha-$k; done 
paste -d " " ~/tmp-allfiles ~/tmp-sha-* > final.results 
egrep -v '(.{40}).\1.\1' final.results # these files are not the same everywhere


2013-03-15 18:42:36

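I think your first point is less a side remark than the actual correct answer. It does give the `join` option that has the intended effect. – WAF

In retrospect, this was before my first Git octopus merge. We compared three branches until all the differences were zero :-)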
