2014-10-20

Answer


If you want to crawl forever, here is the script you need:

#!/bin/bash

# Inject the seed URLs once; "urls" is the directory holding the seed list
./bin/nutch inject urls

# Loop forever: each iteration is one crawl round
while true
do
    # Select up to 10000 URLs to fetch in this round; adjust -topN as needed
    ./bin/nutch generate -topN 10000
    ./bin/nutch fetch -all
    ./bin/nutch parse -all
    ./bin/nutch updatedb
done
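If you would rather run a fixed number of rounds and stop when a step fails, the loop can be wrapped in functions. This is a minimal sketch, not part of the original answer. It reuses exactly the same `bin/nutch` commands shown above. The `NUTCH` variable and the `crawl_round` / `crawl_n_rounds` function names are hypothetical, and it assumes that a failing Nutch step exits with a non-zero status.

```shell
#!/bin/bash
# Hypothetical variant: run a fixed number of crawl rounds instead of looping forever.
# NUTCH is an assumed variable pointing at the Nutch launcher; override it if needed.
NUTCH="${NUTCH:-./bin/nutch}"

# Run one generate/fetch/parse/updatedb cycle; return non-zero on the first failing step.
crawl_round() {
    local top_n="$1"
    "$NUTCH" generate -topN "$top_n" || return 1
    "$NUTCH" fetch -all || return 1
    "$NUTCH" parse -all || return 1
    "$NUTCH" updatedb || return 1
}

# Inject the seeds once, then run up to $1 rounds (topN defaults to 10000).
# Stops early and returns 1 if any round fails.
crawl_n_rounds() {
    local rounds="$1" top_n="${2:-10000}" i
    "$NUTCH" inject urls || return 1
    for ((i = 1; i <= rounds; i++)); do
        echo "Round $i of $rounds"
        crawl_round "$top_n" || { echo "Round $i failed, stopping" >&2; return 1; }
    done
}
```

For example, `crawl_n_rounds 5 10000` would inject the seeds and then run five rounds of 10000 URLs each. Because the commands go through `"$NUTCH"`, you can point that variable at a different launcher path without editing the functions.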

Hope this helps.

李全安待辦事項