How to categorize files in a Linux server based on their name?

How can I use the ls command and its options to list duplicate file names that are in different directories?

2017-03-04 MasterGL

Your question is a better fit for [Super User](http://superuser.com/tour). [Stack Overflow is a question and answer site for professional and enthusiast programmers](http://stackoverflow.com/tour). – Cyrus

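@Cyrus, that's a fair comment. It turns out that this is more complicated than a simple bash globbing question and requires what is arguably some (albeit slight) programming, although the asker didn't realize that when he asked. So I answered it here. But you're right that, as asked, it is more Super User oriented. – eewanco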

Don't forget to mark the answer as "accepted" if it works for you! – eewanco

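You can't do this with a single, basic ls command. You have to use a combination of other POSIX/Unix/GNU utilities. For example, first find the duplicated file names: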

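find . -type f -exec basename {} \; | sort | uniq -d > dupes

This means: find all files (-type f) in the entire directory hierarchy below the current directory (.), and execute (-exec) the command basename (which strips the directory part) on each one ({} stands for the file found, and \; ends the command). The names are then sorted and the duplicated lines are printed (uniq -d). The result goes into the file dupes. Now you have the file names that are duplicated, but you don't know which directories they are in. Use find again to locate them. Using bash as your shell: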

while IFS= read -r filename; do find . -name "$filename" -print; done < dupes

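This loops (while) over the contents of the file dupes and reads each line into the variable filename. For each line, find runs again, searches for that specific -name ($filename), and prints the match (-print, though it is implied, so it is redundant here).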

Truth be told, you can combine these without using an intermediate file:

find . -type f -exec basename {} \; | sort | uniq -d | while IFS= read -r filename; do find . -name "$filename" -print; done

If you're not familiar with it, the | operator means: use the output of the preceding command as the input to the following command. Example:

~$ mkdir test
~$ cd test
~/test$ mkdir 1 2 3 4 5
~/test$ mkdir 1/2 2/3
~/test$ touch 1/0000 2/1111 3/2222 4/2222 5/0000 1/2/1111 2/3/4444
~/test$ find . -type f -exec basename {} \; | sort | uniq -d | while IFS= read -r filename; do find . -name "$filename" -print; done
./1/0000
./5/0000
./1/2/1111
./2/1111
./3/2222
./4/2222
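To see what the sort | uniq -d stage does on its own, here is a small sketch that feeds it the seven base names from the example above directly. Using printf as a stand-in for find is an assumption made purely for illustration.

```shell
#!/usr/bin/env bash
# Sketch: printf stands in for the "find ... -exec basename" stage.
# uniq -d only compares adjacent lines, which is why sort has to run first.
dupes=$(printf '%s\n' 0000 1111 2222 2222 0000 1111 4444 | sort | uniq -d)
printf '%s\n' "$dupes"
```

This prints 0000, 1111 and 2222, one per line. 4444 is dropped because it occurs only once.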

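Disclaimer: The requirements stated that the file names were all numeric. Although I tried to design the code to handle file names with spaces (and it works when tested on my system), the code may break when it encounters special characters, newlines, nuls, or other unusual situations. Please note that the -exec parameter has special security considerations and should not be used by root over arbitrary user files. The simplified example provided is intended for illustrative and didactic purposes only. Please consult your man pages and the relevant CERT advisories for the full security implications.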
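As one possible way around the per-file -exec and the second find, the sketch below swaps in a different technique: a single find pass whose output awk groups by base name. It is not the method from this answer. The function name list_dupes_by_name and the tab-separated output format are made up for illustration, and it still assumes that no file name contains a newline.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions: no file name contains a newline.
# list_dupes_by_name is a hypothetical helper name, not an existing tool.
list_dupes_by_name() {
    find "${1:-.}" -type f -print |
        awk -F/ '
            # With "/" as field separator, the last field is the base name.
            { name[NR] = $NF; path[NR] = $0; count[$NF]++ }
            END {
                # Print "name<TAB>path" for every path whose base name repeats.
                for (i = 1; i <= NR; i++)
                    if (count[name[i]] > 1)
                        printf "%s\t%s\n", name[i], path[i]
            }' |
        LC_ALL=C sort
}
```

Run as list_dupes_by_name . inside the test directory from the example, this should list the same six paths, each prefixed by its base name and a tab.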

2017-03-04 16:41:54 eewanco

Touch a file named "file01 01 17" in several locations and try your code. –

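@GeorgeVasiliou, it works for me, I tried it. Besides, the poster indicated that his file names were all numeric. There was no requirement to handle file names with spaces, so I didn't specifically test the code for that scenario. I will add a disclaimer, though. – eewanco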

I have a function for finding duplicate files in my bash profile (bash 4.4). Indeed, find is the right tool.

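I use find combined with its -print0 option, which separates the results with the null character instead of a newline (find's default). That way I can capture all files under the current directory and its subdirectories.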

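This ensures that the results are correct even if file names contain special characters such as spaces or (in some rare cases) newlines. Instead of running find twice, you can build an array and then look for the duplicate file names within that array. You then use the "duplicates" as a pattern to grep the whole array.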
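The difference between the two delimiters can be checked with a small sketch like the following. The scratch directory and the three file names are made up for illustration, and $'\n' is bash-specific quoting.

```shell
#!/usr/bin/env bash
# Sketch only: the scratch directory and file names are made up for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain" "$demo_dir/with space" "$demo_dir/with"$'\n'"newline"

# Newline-delimited output: the embedded newline makes one file look like two lines.
by_newline=$(( $(find "$demo_dir" -type f -print | wc -l) ))

# Null-delimited output: exactly one NUL byte terminates each path.
by_nul=$(( $(find "$demo_dir" -type f -print0 | tr -cd '\0' | wc -c) ))

echo "newline-delimited records: $by_newline"
echo "null-delimited records:    $by_nul"
rm -rf "$demo_dir"
```

With three files, the newline-delimited count comes out as 4 and the null-delimited count as 3, so only the latter matches the real number of files.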

So something like this works OK for me in my function:

$ IFS= readarray -t -d '' fn< <(find . -name 'file*' -print0) 
$ dupes=$(LC_ALL=C sort <(printf '\<%s\>$\n' "${fn[@]##*/}") |uniq -d) 
$ grep -e "$dupes" <(printf '%s\n' "${fn[@]}") |awk -F/ '{print $NF,"==>",$0}' |LC_ALL=C sort

Here is a test:

$ IFS= readarray -t -d '' fn< <(find . -name 'file*' -print0) 
# find all files and load them in an array using null delimiter 
$ printf '%s\n' "${fn[@]}" #print the array 
./tmp/file7 
./tmp/file14 
./tmp/file11 
./tmp/file8 
./tmp/file9 
./tmp/tmp2/file09 99 
./tmp/tmp2/file14.txt 
./tmp/tmp2/file15.txt 
./tmp/tmp2/file$100 
./tmp/tmp2/file14.txt.bak 
./tmp/tmp2/file15.txt.bak 
./tmp/file1 
./tmp/file4 
./file09 99 
./file14 
./file$100 
./file1 

$ dupes=$(LC_ALL=C sort <(printf '\<%s\>$\n' "${fn[@]##*/}") |uniq -d) 
#Locate duplicate files 
$ echo "$dupes" 
\<file$100\>$ #Mind this one with special char $ in filename 
\<file09 99\>$ #Mind also this one with spaces 
\<file14\>$ 
\<file1\>$ 
#I have purposely enclosed the results in \<...\> to force grep to match whole words later, so that file1 does not match file1.txt or file11

$ grep -e "$dupes" <(printf '%s\n' "${fn[@]}") |awk -F/ '{print $NF,"==>",$0}' |LC_ALL=C sort 
file$100 ==> ./file$100   #File with special char correctly captured 
file$100 ==> ./tmp/tmp2/file$100 
file09 99 ==> ./file09 99  #File with spaces in name also correctly captured 
file09 99 ==> ./tmp/tmp2/file09 99 
file1 ==> ./file1 
file1 ==> ./tmp/file1 
file14 ==> ./file14    #other files named file14 like file14.txt and file14.txt.bak not captured since they are not duplicates. 
file14 ==> ./tmp/file14

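Tips:

This part, <(printf '\<%s\>$\n' "${fn[@]##*/}"), uses process substitution together with bash's built-in parameter expansion to get the basenames of the find results.
LC_ALL=C is needed so that sort orders the file names correctly.
In bash versions before 4.4, readarray does not accept the -d (delimiter) option. In that case, you can load the find results into an array with

while IFS= read -r -d '' res; do fn+=("$res"); done < <(find .... -print0)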
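Putting the read loop to work, here is a minimal sketch of a variant that counts base names in an associative array rather than building a grep pattern. This is a swapped-in technique, not the function from this answer. The name list_dupes is hypothetical, and associative arrays need bash 4.0 or later. The "==>" output format is borrowed from the test above.

```shell
#!/usr/bin/env bash
# Sketch: requires bash 4.0+ for associative arrays, but not readarray -d.
# list_dupes is a hypothetical helper name used only for this example.
list_dupes() {
    local dir=${1:-.} path name
    local -a fn=()
    local -A count=()

    # Load the null-delimited find results and count each base name.
    while IFS= read -r -d '' path; do
        fn+=("$path")
        name=${path##*/}
        count[$name]=$(( ${count[$name]:-0} + 1 ))
    done < <(find "$dir" -type f -print0)

    # Second pass over the array: keep paths whose base name occurred more than once.
    for path in "${fn[@]}"; do
        name=${path##*/}
        if [[ ${count[$name]} -gt 1 ]]; then
            printf '%s ==> %s\n' "$name" "$path"
        fi
    done | LC_ALL=C sort
}
```

Called as list_dupes . it prints one "name ==> path" line per duplicated file. Because names are compared as literal array keys, characters such as $ or spaces need no regex escaping. The final sort is still line-based, so names containing newlines would be split in the output.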

2017-03-05 01:15:17
