Percentage value with GNU diff

What is a good way to use diff to show the percentage difference between two files?

For example, if a file has 100 lines and a copy has 15 lines that were changed, the diff percentage would be 15%.

2010-04-27 cdated

You could use sdiff and count the separators, then divide by the number of lines. – MJB 2010-04-27 16:20:26

diff fileA fileB | wc -l divided by wc -l fileA // seems like an interesting manual way to do it. – cdated 2010-04-27 16:28:36

The problem, though, is that for each difference you get 3 lines: the original, the new one, and the description. So you would probably overestimate. – MJB 2010-04-27 17:27:01
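To illustrate the overcounting concern in the comment above, here is a minimal sketch, assuming GNU diff in its default output format. The function name diff_changed_percent and the demo files are hypothetical. It counts only the lines that diff prefixes with "<" (lines from the first file that were changed or removed), so hunk headers and separators do not inflate the result. Lines that exist only in the second file are not counted.

```shell
# Hypothetical helper: percentage of lines in $1 that diff reports as
# changed or removed relative to $2 (integer, rounded down).
diff_changed_percent() {
    total=$(wc -l < "$1" | tr -d ' ')
    [ "$total" -gt 0 ] || { echo 0; return; }
    # Lines taken from the first file are prefixed with "<" in normal diff
    # output; hunk headers such as "3c3" and the "---" separators are not.
    changed=$(diff "$1" "$2" | grep -c '^<')
    echo $((100 * changed / total))
}

# Demo with two throwaway 5-line files that differ only on line 3.
tmp=$(mktemp -d)
printf '%s\n' one two three four five > "$tmp/fileA"
printf '%s\n' one two THREE four five > "$tmp/fileB"
echo "raw diff lines: $(diff "$tmp/fileA" "$tmp/fileB" | wc -l | tr -d ' ')"
echo "changed percent: $(diff_changed_percent "$tmp/fileA" "$tmp/fileB")"
rm -r "$tmp"
```

In the demo, the raw line count of the diff output is larger than the single changed line, which is why dividing diff | wc -l by the file length tends to overestimate.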

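Maybe something like this?

Two files, A1 and A2.

$ sdiff -B -b -s A1 A2 | wc will tell you how many lines differ. wc on the file itself gives the total; just divide.

-b and -B ignore whitespace changes and blank lines, and -s suppresses the lines common to both files.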
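The recipe above can be sketched as a small function, assuming GNU sdiff is available. The name sdiff_percent is hypothetical, and dividing by the line count of the first file is one possible choice of denominator, not something the answer specifies.

```shell
# Hypothetical helper: percentage of differing lines between $1 and $2,
# relative to the line count of $1 (integer, rounded down).
sdiff_percent() {
    total=$(wc -l < "$1" | tr -d ' ')
    [ "$total" -gt 0 ] || { echo 0; return; }
    # -s suppresses common lines, so every output line is a difference;
    # -b and -B ignore whitespace changes and blank lines.
    differing=$(sdiff -B -b -s "$1" "$2" | wc -l | tr -d ' ')
    echo $((100 * differing / total))
}

# Demo matching the question: a 100-line file and a copy with 15 changed lines.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 100; i++) print "line " i }' > "$tmp/A1"
awk 'NR <= 15 { print "changed " $0; next } { print }' "$tmp/A1" > "$tmp/A2"
echo "difference: $(sdiff_percent "$tmp/A1" "$tmp/A2")%"
rm -r "$tmp"
```

Using wc -l rather than plain wc keeps the output to a single number, so no extra parsing is needed before the division.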

2010-04-27 16:30:35 MJB

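From the man page for wc: it prints newline, word, and byte counts. You divide the first number in the output by the number of lines in the file being compared. wc -l, which gives you only the line count, can be added to the command above. - In reply to AlligatorJack – cdated 2010-06-14 14:42:07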

@cdated: Thanks for clarifying. I didn't see the question/response until you commented. – MJB 2010-06-14 15:01:05

Here is a script that compares all .txt files and shows those with more than 15% duplication:

#!/bin/bash

# Walk through all files in the current directory (and subdirectories),
# compare each one with the others, and show the percentage of duplication.

# Which type of files to compare?
# (It wouldn't make sense to compare binary formats.)
ext="txt"

working_dir="$PWD"
working_dir_name="${working_dir##*/}"
all_files="$working_dir/../$working_dir_name-filelist.txt"
remaining_files="$working_dir/../$working_dir_name-remaining.txt"

# Collect "size name" pairs for the matching files, largest first,
# skipping hidden files and anything inside hidden directories.
find . -type f -print0 | xargs -0 -r stat -c "%s %n" | grep -v "/\." |
    grep "\.$ext\$" | sort -nr > "$all_files"

cp "$all_files" "$remaining_files"

# IFS= and -r keep file names with spaces or backslashes intact.
while IFS= read -r string; do
    # Drop the leading size field; the rest of the line is the file name.
    fileA="${string#* }"
    # Remove the current file from the list of files still to compare.
    tail -n +2 "$remaining_files" > "$remaining_files.temp"
    mv "$remaining_files.temp" "$remaining_files"
    # Remove empty lines since they produce false positives.
    sed '/^$/d' "$fileA" > tempA
    A_len=$(wc -l < tempA)

    echo "Comparing $fileA with other files..."

    while IFS= read -r string; do
        fileB="${string#* }"
        sed '/^$/d' "$fileB" > tempB
        B_len=$(wc -l < tempB)
        # Skip files with no non-empty lines to avoid dividing by zero.
        [ "$B_len" -eq 0 ] && continue

        # Each line printed by sdiff -s is a line that differs.
        differences=$(sdiff -B -s tempA tempB | wc -l)
        common=$((A_len - differences))

        percentage=$((100 * common / B_len))
        if [[ $percentage -gt 15 ]]; then
            echo "  $percentage% duplication in ${fileB#./}"
        fi
    done < "$remaining_files"
    echo " "
done < "$all_files"

# Clean up temporary files.
rm -f tempA tempB "$all_files" "$remaining_files"

2013-09-09 11:12:57

https://superuser.com/questions/347560/is-there-a-tool-to-measure-file-difference-percentage has a nice solution to this:

wdiff -s file1.txt file2.txt

See man wdiff for more options.

2014-09-28 14:30:06 user159452
