With ruby-vips, you could do something like this:
require 'vips'

# find normalised histogram of reference image
ref = VIPS::Image.new ARGV[0], :sequential => true
ref_hist = ref.hist.histnorm

# trigger a GC every few loops to keep memuse down
loop = 0

ARGV[1..-1].each do |filename|
  # find sample hist
  sample = VIPS::Image.new filename, :sequential => true
  sample_hist = sample.hist.histnorm

  # calculate sum of squares of differences, if it's over a threshold, print
  # the filename
  diff_hist = ref_hist.subtract(sample_hist).pow(2)
  diff = diff_hist.avg * diff_hist.x_size * diff_hist.y_size

  if diff > 100
    puts "#{filename}, #{diff}"
  end

  loop += 1
  if loop % 100 == 0
    GC.start
  end
end
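The occasional GC.start is necessary to make Ruby free things and stop memory from filling up. Even though it only runs once every 100 images, it sadly still spends a lot of time garbage collecting.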
$ vips crop ~/pics/k2.jpg ref.png 0 0 100 50
$ for i in {1..10000}; do cp ref.png $i.png; done
$ time ../similarity.rb ref.png *.png
real 2m44.294s
user 7m30.696s
sys 0m20.780s
peak mem 270mb
If you're willing to consider Python, it's quicker, because Python is reference counted and does not need to scan all the time.
import sys
from gi.repository import Vips

# find normalised histogram of reference image
ref = Vips.Image.new_from_file(sys.argv[1], access = Vips.Access.SEQUENTIAL)
ref_hist = ref.hist_find().hist_norm()

for filename in sys.argv[2:]:
    # find sample hist
    sample = Vips.Image.new_from_file(filename, access = Vips.Access.SEQUENTIAL)
    sample_hist = sample.hist_find().hist_norm()

    # calculate sum of squares of difference, if it's over a threshold, print
    # the filename
    diff_hist = (ref_hist - sample_hist) ** 2
    diff = diff_hist.avg() * diff_hist.width * diff_hist.height

    if diff > 100:
        # format() keeps this line valid on both Python 2 and Python 3
        print("{}, {}".format(filename, diff))
I see:
$ time ../similarity.py ref.png *.png
real 1m4.001s
user 1m3.508s
sys 0m10.060s
peak mem 58mb
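If it helps to see what the metric in both scripts is doing, here is a rough pure-Python sketch of the same idea with no libvips involved. The function names are made up for illustration, and the normalisation step assumes that hist_norm scales each histogram so its largest bin equals the highest bin index (255 for 8-bit data), which is my reading of the libvips docs rather than something the scripts above show.

```python
def histogram(pixels, bins=256):
    """Count how many pixels take each value in 0..bins-1."""
    counts = [0] * bins
    for p in pixels:
        counts[p] += 1
    return counts


def normalise(hist):
    """Scale so the largest bin equals the highest bin index.

    Assumption: this mirrors what hist_norm does in libvips.
    """
    peak = max(hist)
    if peak == 0:
        return [0.0] * len(hist)
    top = len(hist) - 1
    return [count * top / float(peak) for count in hist]


def sum_squared_diff(a, b):
    """Sum of squared bin-by-bin differences between two histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy one-band "images" given as flat lists of 8-bit pixel values.
ref = [0, 0, 10, 10, 10, 200]
shuffled = list(reversed(ref))      # same pixels, different positions
other = [0, 10, 10, 200, 200, 200]  # different mix of values

ref_hist = normalise(histogram(ref))
diff_shuffled = sum_squared_diff(ref_hist, normalise(histogram(shuffled)))
diff_other = sum_squared_diff(ref_hist, normalise(histogram(other)))

print(diff_shuffled, diff_other)
```

In this toy version a histogram only records how often each value occurs, so the shuffled image scores zero against the reference even though its pixels are in different places, while the image with a different mix of values lands well above the threshold of 100.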
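Take a look at [this question](http://stackoverflow.com/questions/4196453/simple-and-fast-method-to-compare-images-for-similarity). A lot of options have already been discussed there. – Uzbekjon
Another note regarding the comparison with 'File.binread'. Since you are only comparing file contents, and resources and performance are of the essence, it would be better to simply use bash for that. Look at: 'diff', 'cmp' or 'md5'. – Uzbekjon
If you need a classifier, this might be a job for [TensorFlow](https://www.tensorflow.org). – tadman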
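On the byte-for-byte comparison mentioned in the comments: if you would rather stay in a script than shell out to 'cmp', a minimal sketch using Python's standard filecmp module might look like the following. The helper name byte_identical is hypothetical. Note that this only finds exact duplicates of the file, not images that merely look similar.

```python
import filecmp


def byte_identical(ref_path, paths):
    """Return the paths whose contents match ref_path byte for byte."""
    # shallow=False forces a comparison of file contents rather than
    # trusting os.stat() signatures (size and modification time).
    return [p for p in paths if filecmp.cmp(ref_path, p, shallow=False)]
```

Something like byte_identical(sys.argv[1], sys.argv[2:]) would then take the same arguments as the scripts above.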