2013-02-12

How to get a working Spring-Data-Hadoop project with Cloudera CDH4 and Maven

Since Spring-Data-Hadoop has not been released yet, it is hard to find a working example configuration for using it with Cloudera.

Which dependencies do I need to choose to get Spring-Data-Hadoop running with CDH4 (Hadoop 2.0.0-cdh4.1.3)?

Depending on which approach I try, I get one of these exceptions:

  1. NullPointerException

    Exception in thread "SimpleAsyncTaskExecutor-1" java.lang.ExceptionInInitializerError 
        at org.springframework.data.hadoop.mapreduce.JobExecutor$2.run(JobExecutor.java:183) 
        at java.lang.Thread.run(Thread.java:722) 
        Caused by: java.lang.NullPointerException 
        at org.springframework.util.ReflectionUtils.makeAccessible(ReflectionUtils.java:405) 
        at org.springframework.data.hadoop.mapreduce.JobUtils.<clinit>(JobUtils.java:123) 
        ... 2 more 
    
  2. Version mismatch (server IPC version 7 vs. client version 4)

    Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4 
        at org.apache.hadoop.ipc.Client.call(Client.java:1070) 
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225) 
        at $Proxy1.getProtocolVersion(Unknown Source) 
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396) 
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379) 
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119) 
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238) 
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203) 
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) 
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386) 
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) 
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238) 
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) 
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372) 
        at org.springframework.data.hadoop.mapreduce.JobFactoryBean.afterPropertiesSet(JobFactoryBean.java:208) 
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1545) 
        at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1483) 
    ... 12 more 
    

Answer


Here is an example of how to configure it.

Maven setup:

Notes:

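  • (Optional) Exclude hadoop-streaming and hadoop-tools from spring-data-hadoop
  • Add hadoop-common and hadoop-hdfs with the generic version: 2.0.0-cdhX.XX
  • Add hadoop-tools and hadoop-streaming with the MR1 version: 2.0.0-mr1-cdhX.XX
  • Spring Data Hadoop currently supports only MR1, so make sure you do not pull in MR2 through any other dependency. Check with mvn dependency:tree!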
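To turn the last note into a build-time check instead of a manual one, a sketch like the following could be added inside `<build><plugins>` of the pom. It uses the maven-enforcer-plugin `bannedDependencies` rule to fail the build if MR2/YARN client artifacts are resolved, directly or transitively. The list of banned artifacts is an assumption and not exhaustive; adjust it to what `mvn dependency:tree` actually shows (`mvn dependency:tree -Dincludes=org.apache.hadoop` limits the output to Hadoop artifacts). The plugin version is left out here; pin one in a real build.

```xml
<!-- Sketch: add inside <build><plugins>. Fails the build when any of the
     listed MR2/YARN artifacts (assumed, non-exhaustive list) is resolved. -->
<plugin>
 <groupId>org.apache.maven.plugins</groupId>
 <artifactId>maven-enforcer-plugin</artifactId>
 <executions>
  <execution>
   <id>ban-mr2</id>
   <goals>
    <goal>enforce</goal>
   </goals>
   <configuration>
    <rules>
     <bannedDependencies>
      <excludes>
       <exclude>org.apache.hadoop:hadoop-mapreduce-client-core</exclude>
       <exclude>org.apache.hadoop:hadoop-mapreduce-client-common</exclude>
       <exclude>org.apache.hadoop:hadoop-mapreduce-client-jobclient</exclude>
       <exclude>org.apache.hadoop:hadoop-yarn-api</exclude>
       <exclude>org.apache.hadoop:hadoop-yarn-common</exclude>
      </excludes>
     </bannedDependencies>
    </rules>
   </configuration>
  </execution>
 </executions>
</plugin>
```

Running `mvn validate` is enough to trigger the check, since the `enforce` goal binds to the validate phase by default.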

pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 

    <groupId>com.example</groupId> 
    <artifactId>com.example.main</artifactId> 
    <version>0.0.1-SNAPSHOT</version> 
    <packaging>jar</packaging> 

    <properties> 
     <java-version>1.7</java-version> 
     <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> 
     <spring.version>3.2.0.RELEASE</spring.version> 
     <spring.hadoop.version>1.0.0.BUILD-SNAPSHOT</spring.hadoop.version> 
     <hadoop.version.generic>2.0.0-cdh4.1.3</hadoop.version.generic> 
     <hadoop.version.mr1>2.0.0-mr1-cdh4.1.3</hadoop.version.mr1> 
    </properties> 

    <dependencies> 

     <dependency> 
      <groupId>org.springframework</groupId> 
      <artifactId>spring-core</artifactId> 
      <version>${spring.version}</version> 
      <exclusions> 
       <exclusion> 
        <groupId>commons-logging</groupId> 
        <artifactId>commons-logging</artifactId> 
       </exclusion> 
      </exclusions> 
     </dependency> 

     <dependency> 
      <groupId>org.springframework</groupId> 
      <artifactId>spring-context</artifactId> 
      <version>${spring.version}</version> 
     </dependency> 


     <dependency> 
      <groupId>org.springframework.data</groupId> 
      <artifactId>spring-data-hadoop</artifactId> 
      <version>${spring.hadoop.version}</version> 

      <exclusions> 
       <!-- Exclude these Hadoop dependencies so that they are not mixed
        with the ones provided by Cloudera. --> 
       <exclusion> 
        <artifactId>hadoop-streaming</artifactId> 
        <groupId>org.apache.hadoop</groupId> 
       </exclusion> 
       <exclusion> 
        <artifactId>hadoop-tools</artifactId> 
        <groupId>org.apache.hadoop</groupId> 
       </exclusion> 
      </exclusions> 

     </dependency> 

     <!-- Hadoop Cloudera Dependencies --> 
     <dependency> 
      <groupId>org.apache.hadoop</groupId> 
      <artifactId>hadoop-common</artifactId> 
      <version>${hadoop.version.generic}</version> 
     </dependency> 

     <dependency> 
      <groupId>org.apache.hadoop</groupId> 
      <artifactId>hadoop-hdfs</artifactId> 
      <version>${hadoop.version.generic}</version> 
     </dependency> 

     <dependency> 
      <groupId>org.apache.hadoop</groupId> 
      <artifactId>hadoop-tools</artifactId> 
      <version>${hadoop.version.mr1}</version> 
     </dependency> 

     <dependency> 
      <groupId>org.apache.hadoop</groupId> 
      <artifactId>hadoop-streaming</artifactId> 
      <version>${hadoop.version.mr1}</version> 
     </dependency> 

    </dependencies> 

    <build> 
     <plugins> 

      <plugin> 
       <groupId>org.apache.maven.plugins</groupId> 
       <artifactId>maven-compiler-plugin</artifactId> 
       <configuration> 
        <source>${java-version}</source> 
        <target>${java-version}</target> 
       </configuration> 
      </plugin> 

     </plugins> 
    </build> 

    <repositories> 
     <repository> 
      <id>spring-milestones</id> 
      <url>http://repo.springsource.org/libs-milestone</url> 
      <snapshots> 
       <enabled>false</enabled> 
      </snapshots> 
     </repository> 

     <repository> 
      <id>cloudera</id> 
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url> 
      <snapshots> 
       <enabled>false</enabled> 
      </snapshots> 
     </repository> 

     <repository> 
      <id>spring-snapshot</id> 
      <name>Spring Maven SNAPSHOT Repository</name> 
      <url>http://repo.springframework.org/snapshot</url> 
     </repository> 
    </repositories> 
</project> 

Spring setup (applicationContext.xml):

Set fs.default.name to the address of your NameNode:

<?xml version="1.0" encoding="UTF-8"?> 
<beans xmlns="http://www.springframework.org/schema/beans" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xmlns:hdp="http://www.springframework.org/schema/hadoop" 
    xsi:schemaLocation=" 
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd 
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd"> 

    <hdp:configuration id="hadoopConfiguration"> 
     fs.default.name=hdfs://example.com:8020 
    </hdp:configuration> 

    <hdp:job id="wordCountJob" 
     mapper="com.example.WordMapper" 
     reducer="com.example.WordReducer" 
     input-path="/user/christian/input/test" 
     output-path="/user/christian/output2" /> 

    <hdp:job-runner job-ref="wordCountJob" run-at-startup="true" 
     wait-for-completion="true" /> 

</beans> 

With this configuration in place, you should be able to access your cluster.
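The configuration above only sets the HDFS address. In MR1 the client-side default of mapred.job.tracker is `local`, so unless a mapred-site.xml on the classpath overrides it, the job would likely run in-process with the LocalJobRunner while still reading from and writing to HDFS. A hedged variant of the configuration bean that also points at the JobTracker might look like this; the host and port 8021 are assumptions, so take the real values from your cluster's mapred-site.xml:

```xml
<!-- Sketch: same bean id as in applicationContext.xml, plus an assumed
     JobTracker address so MR1 jobs are submitted to the cluster. -->
<hdp:configuration id="hadoopConfiguration">
 fs.default.name=hdfs://example.com:8020
 mapred.job.tracker=example.com:8021
</hdp:configuration>
```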

Some references:


Hey, you can download an example from https://github.com/spring-projects/spring-data-book.

How to build and run it is described in the README.


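While the link may answer the question, please consider adding the essential points or a summary to your answer. Doing so ensures that your answer stays useful even if the linked page becomes inactive. Link-only answers are discouraged on SO. – Harry 2013-11-26 04:56:15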


This answer is cryptic, to say the least. It does not answer the question. – waste 2016-01-20 06:30:41