2017-04-22 117 views

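Reuse a calculated md5 (or any other checksum)

I'm trying to calculate incremental md5 digests for all files in a deep directory tree, but I'm not able to "reuse" the digests that have already been calculated.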

Here is my test code:

#!/usr/bin/env perl 
use 5.014; 
use warnings; 
use Digest::MD5; 
use Path::Tiny; 

# create some test-files in the tempdir 
my @filenames = qw(a b); 
my $testdir = Path::Tiny->tempdir; 
$testdir->child($_)->spew($_) for @filenames; #create 2 files 

dirmd5($testdir, @filenames); 
exit; 

sub dirmd5 { 
    my($dir, @files) = @_; 

    my $dirctx = Digest::MD5->new; #the md5 for the whole directory 

    for my $fname (@files) { 

     # calculate the md5 for one file 
     my $filectx = Digest::MD5->new; 
     my $fd = $dir->child($fname)->openr_raw; 
     $filectx->addfile($fd); 
     close $fd; 
     say "md5 for $fname : ", $filectx->clone->hexdigest; 

     # I want to somehow "add" the file's md5 above to the directory md5.
     # This doesn't work, even though $filectx isn't reset (note the "clone" above):
     #$dirctx->add($filectx);

     # Adding the file as below works,
     # but it calculates the md5 again,
     # i.e. each file is digested twice:
     # once for the file alone (above)
     # and a second time for the directory.
     # Too bad in the case of many large files, ;(
     # especially if I want to calculate the md5sum for whole directory trees.
     $fd = $dir->child($fname)->openr_raw; 
     $dirctx->addfile($fd); 
     close $fd; 
    } 
    say "md5 for dir: ", $dirctx->hexdigest; 
} 

The above prints:

md5 for a : 0cc175b9c0f1b6a831c399e269772661 
md5 for b : 92eb5ffee6ae2fec3ad71c777531578f 
md5 for dir: 187ef4436122d1cc2f40dc2b92f0eba0 

This is correct, but unfortunately it is an inefficient way of doing it (see the comments in the code).

Reading the docs, I didn't find any way to reuse an already calculated md5, e.g. as in the $dirctx->add($filectx); above. Probably it isn't possible.

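Is there any checksumming or digest method that allows already calculated checksums to be reused in some way, so that I could calculate the checksum/digest of whole directory trees without having to calculate the digest of each file multiple times?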

P.S.: I'm trying to solve, to some extent, this question.

Answers


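No. There is nothing that relates MD5(initial data) and MD5(new data) to MD5(initial data + new data), because the position of the data in the stream matters as well as its value. Otherwise it wouldn't be a very useful error check, as "aba", "aab" and "baa" would all have the same checksum.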
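
To make the point above concrete, here is a minimal sketch using only the functional interface of Digest::MD5. The variable names are illustrative and not part of the original answer. It shows that reordering the same bytes changes the digest, and that hashing the digests of the parts does not reproduce the digest of the concatenated data.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Same bytes in a different order: each string gets its own digest.
my %digest = map { $_ => md5_hex($_) } qw(aba aab baa);
printf "%s => %s\n", $_, $digest{$_} for sort keys %digest;

# The digest of the concatenated data ...
my $whole = md5_hex('ab');

# ... is not what you get by hashing the digests of the parts.
my $from_parts = md5_hex( md5_hex('a') . md5_hex('b') );

print "MD5(ab)                = $whole\n";
print "MD5(MD5(a) . MD5(b))   = $from_parts\n";
```

The first value is the same as the "md5 for dir" line in the question, because the two test files contain just "a" and "b".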

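If the files are small enough, you could read each one into memory and use that copy to add the data to both digests. That would avoid reading from mass storage twice.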

#!/usr/bin/env perl 

use 5.014; 
use warnings 'all'; 

use Digest::MD5; 
use Path::Tiny; 

# create some test-files in the tempdir 
my @filenames = qw(a b); 
my $testdir = Path::Tiny->tempdir; 
$testdir->child($_)->spew($_) for @filenames; # create 2 files 

dirmd5($testdir, @filenames); 

sub dirmd5 { 
    my ($dir, @files) = @_; 

    my $dir_ctx = Digest::MD5->new; #the md5 for the whole directory 

    for my $fname (@files) { 

     my $data = $dir->child($fname)->slurp_raw; 

     # calculate the md5 for one file 
     my $file_md5 = Digest::MD5->new->add($data)->hexdigest; 
     say "md5 for $fname : $file_md5"; 

     $dir_ctx->add($data); 
    } 

    my $dir_md5 = $dir_ctx->hexdigest; 
    say "md5 for dir: $dir_md5"; 
} 

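If the files are huge, then the only optimisation left is to avoid reopening the same file, and instead rewind it back to the start before reading it a second time.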

#!/usr/bin/env perl 

use 5.014; 
use warnings 'all'; 

use Digest::MD5; 
use Path::Tiny; 
use Fcntl ':seek'; 

# create some test-files in the tempdir 
my @filenames = qw(a b); 
my $testdir = Path::Tiny->tempdir; 
$testdir->child($_)->spew($_) for @filenames; # create 2 files 

dirmd5($testdir, @filenames); 

sub dirmd5 { 
    my ($dir, @files) = @_; 

    my $dir_ctx = Digest::MD5->new; # The digest for the whole directory 

    for my $fname (@files) { 

     my $fh = $dir->child($fname)->openr_raw; 

     # The digest for just the current file 
     my $file_md5 = Digest::MD5->new->addfile($fh)->hexdigest; 
     say "md5 for $fname : $file_md5"; 

     seek $fh, 0, SEEK_SET; 
     $dir_ctx->addfile($fh); 
    } 

    my $dir_md5 = $dir_ctx->hexdigest; 
    say "md5 for dir: $dir_md5"; 
} 
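
A further variation, which is not part of the answer above and is offered only as a sketch: read each file once in fixed-size blocks and pass every block to both digest contexts. Memory use stays bounded and no file is read twice. The helper name add_fh_to_all and the 64 KiB block size are assumptions chosen for illustration, and the sketch uses core modules (File::Temp, File::Spec) instead of Path::Tiny so that it runs without extra dependencies.

```perl
#!/usr/bin/env perl

use 5.014;
use warnings;

use Digest::MD5;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: read $fh once, block by block, and feed each
# block to every digest context passed in @ctxs.
sub add_fh_to_all {
    my ($fh, @ctxs) = @_;
    my $block_size = 64 * 1024;    # assumed block size, tune as needed

    while (1) {
        my $n = read($fh, my $buf, $block_size);
        die "read failed: $!" unless defined $n;
        last if $n == 0;           # end of file
        $_->add($buf) for @ctxs;
    }
    return;
}

# Create the same two test files as in the question
my @filenames = qw(a b);
my $testdir   = tempdir(CLEANUP => 1);
for my $fname (@filenames) {
    open my $out, '>:raw', File::Spec->catfile($testdir, $fname)
        or die "open: $!";
    print {$out} $fname;
    close $out or die "close: $!";
}

my $dir_ctx = Digest::MD5->new;    # the digest for the whole directory
my %file_md5;

for my $fname (@filenames) {
    open my $fh, '<:raw', File::Spec->catfile($testdir, $fname)
        or die "open: $!";

    my $file_ctx = Digest::MD5->new;    # the digest for just this file
    add_fh_to_all($fh, $file_ctx, $dir_ctx);
    close $fh;

    $file_md5{$fname} = $file_ctx->hexdigest;
    say "md5 for $fname : $file_md5{$fname}";
}

my $dir_md5 = $dir_ctx->hexdigest;
say "md5 for dir: $dir_md5";
```

For the two test files this should print the same three digests as the code in the question, since exactly the same bytes reach each context in the same order.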

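Ah, so. Then wanting to calculate a digest for a whole directory tree with many nested directories is pointless, because I would need to recalculate the digest of every file again for each directory above it... hm... :( I need to figure out some other "logic" for the [duplicate directory trees](http://stackoverflow.com/q/43560796/869025) question. Thanks. – cajwine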


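@cajwine: There's no need for that. Just keep a digest for each file and for every ancestor directory in the tree. The data in any file must be added to the digest of each of its ancestor directories. It's much the same as working out the size of every directory in a tree, except that you can't simply add up the values for the children at the end. – Borodin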
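
The idea in the comment above could be sketched roughly as follows. This is one possible reading of it, not Borodin's code: the name digest_tree, the %md5 result hash, the sorted traversal order, the skipping of symlinks and the 64 KiB block size are all assumptions made for the example. Each file is read exactly once, and every block is added to the file's own context, to its directory's context and to the contexts of all ancestor directories.

```perl
#!/usr/bin/env perl

use 5.014;
use warnings;

use Digest::MD5;
use File::Temp qw(tempdir);
use File::Spec;

my %md5;    # path => hex digest, for files and directories alike

# Hypothetical helper: digest everything below $dir. @ancestors holds the
# still-open digest contexts of all directories above $dir.
sub digest_tree {
    my ($dir, @ancestors) = @_;
    my $dir_ctx = Digest::MD5->new;

    opendir my $dh, $dir or die "opendir $dir: $!";
    # Sort the names so that the directory digest is reproducible
    my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    for my $name (@names) {
        my $path = File::Spec->catfile($dir, $name);
        next if -l $path;    # assumption: ignore symlinks to avoid loops

        if (-d $path) {
            digest_tree($path, @ancestors, $dir_ctx);
            next;
        }
        next unless -f $path;

        open my $fh, '<:raw', $path or die "open $path: $!";
        my $file_ctx = Digest::MD5->new;
        while (1) {
            my $n = read($fh, my $buf, 64 * 1024);
            die "read $path: $!" unless defined $n;
            last if $n == 0;
            # One read feeds the file, its directory and all ancestors
            $_->add($buf) for ($file_ctx, $dir_ctx, @ancestors);
        }
        close $fh;
        $md5{$path} = $file_ctx->hexdigest;
    }

    $md5{$dir} = $dir_ctx->hexdigest;
    return;
}

# Small test tree: root/a contains "a", root/sub/b contains "b"
my $root = tempdir(CLEANUP => 1);
my $sub  = File::Spec->catdir($root, 'sub');
mkdir $sub or die "mkdir $sub: $!";
for my $spec ([$root, 'a'], [$sub, 'b']) {
    my ($dir, $name) = @$spec;
    open my $out, '>:raw', File::Spec->catfile($dir, $name) or die "open: $!";
    print {$out} $name;
    close $out or die "close: $!";
}

digest_tree($root);
say "$_ : $md5{$_}" for sort keys %md5;
```

In this sketch a directory's digest covers only the file contents, in sorted traversal order; file names and the tree's shape are not part of it. With the test tree above, the digest for the root is the MD5 of "ab", the same value as the "md5 for dir" line in the question, and the digest for sub equals the digest of b.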