2014-04-02

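I have a problem with the inject step when running Nutch: the job fails. This is the command I ran:

bin/nutch inject bin/crawl/crawldb bin/urls

After running the above command, I get the following error: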

Injector: starting at 2014-04-02 13:02:29 
Injector: crawlDb: bin/crawl/crawldb 
Injector: urlDir: bin/urls/seed.txt 
Injector: Converting injected urls to crawl db entries. 
Injector: total number of urls rejected by filters: 2 
Injector: total number of urls injected after normalization and filtering: 0 
Injector: Merging injected urls into crawl db. 
Injector: overwrite: false 
Injector: update: false 
Injector: java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357) 
    at org.apache.nutch.crawl.Injector.inject(Injector.java:294) 
    at org.apache.nutch.crawl.Injector.run(Injector.java:316) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
    at org.apache.nutch.crawl.Injector.main(Injector.java:306) 

This is my first time running Nutch. I have checked that Solr and Nutch are installed correctly.

The following details are from the log file:

java.io.IOException: The temporary job-output directory file:/usr/share/apache-nutch-1.8/bin/crawl/crawldb/1639805438/_temporary doesn't exist! 
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250) 
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244) 
    at org.apache.hadoop.mapred.MapFileOutputFormat.getRecordWriter(MapFileOutputFormat.java:46) 
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:449) 
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:491) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398) 
2014-04-02 12:54:46,251 ERROR crawl.Injector - Injector: java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357) 
    at org.apache.nutch.crawl.Injector.inject(Injector.java:294) 
    at org.apache.nutch.crawl.Injector.run(Injector.java:316) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) 
    at org.apache.nutch.crawl.Injector.main(Injector.java:306) 
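Please help me. – Lussi

According to your log, you have a permissions problem. The job probably does not have permission to create folders in /usr/... – Mysterion

@Mysterion Thank you for your reply. I changed the permissions as you suggested, but I still get the same error. – Lussi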
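
One way to test the permission suggestion above is to check whether the current user can create files in the directory that will hold the crawl db. This is only a minimal sketch: `can_write_dir` is a hypothetical helper, and the `NUTCH_HOME` default is an assumption taken from the path shown in the log.

```shell
# Hypothetical helper: report whether the current user can create
# entries in the given directory (it must exist and be writable
# and searchable).
can_write_dir() {
  dir="$1"
  if [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
    echo "writable"
  else
    echo "not writable"
  fi
}

# Assumed install location, based on the path in the log file.
NUTCH_HOME="${NUTCH_HOME:-/usr/share/apache-nutch-1.8}"
echo "$NUTCH_HOME/bin: $(can_write_dir "$NUTCH_HOME/bin")"
```

If the directory is reported as writable and the job still fails, as in this thread, the cause is likely somewhere else.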

Answer

Using the command

bin/nutch inject bin/crawl/crawldb bin/urls

for injecting, instead of

bin/nutch inject crawl/crawldb bin/urls

solved the error.

To get the URLs fetched, I made changes to the regex-urlfilter.txt file, and now the URLs are fetched.
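
The Injector output in the question reports 2 URLs rejected by filters and 0 injected, which is consistent with the filter rules rejecting the seed URLs. The actual changes made to regex-urlfilter.txt are not shown here, so the following is only a rough sketch of how such rules are evaluated: the first rule whose pattern is found in the URL decides, `+` accepts, `-` rejects, and a URL that matches no rule is rejected. The rules file and the `check_url` helper are hypothetical, and `grep -E` only approximates the Java regular expressions that Nutch uses.

```shell
# Hypothetical rules in the regex-urlfilter.txt style
# (not the poster's actual file).
rules=$(mktemp)
cat > "$rules" <<'EOF'
# skip file:, ftp:, and mailto: URLs
-^(file|ftp|mailto):
# skip URLs containing characters that usually mark queries
-[?*!@=]
# accept anything else
+.
EOF

# Hypothetical helper: apply the rules in order; the first rule
# whose pattern is found in the URL decides the outcome.
check_url() {
  url="$1"
  rules_file="$2"
  while IFS= read -r rule; do
    case "$rule" in
      ''|'#'*) continue ;;   # ignore blank lines and comments
    esac
    sign=$(printf '%s' "$rule" | cut -c1)
    pattern=$(printf '%s' "$rule" | cut -c2-)
    if printf '%s\n' "$url" | grep -Eq -- "$pattern"; then
      if [ "$sign" = "+" ]; then echo "accepted"; else echo "rejected"; fi
      return
    fi
  done < "$rules_file"
  echo "rejected"            # no rule matched
}

check_url "http://example.com/" "$rules"           # prints: accepted
check_url "http://example.com/page?id=1" "$rules"  # prints: rejected
```

With rules like these, a seed URL that contains a query string would be dropped at inject time unless the `-[?*!@=]` line is relaxed or a more specific `+` rule is placed before it.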