Spark Streaming Kafka consumer doesn't like DStream

I'm using the Spark shell (Scala 2.10, with Spark Streaming org.apache.spark:spark-streaming-kafka-0-10_2.10:2.0.1) to test a Spark/Kafka consumer:

import org.apache.kafka.clients.consumer.ConsumerRecord 
import org.apache.kafka.common.serialization.StringDeserializer 
import org.apache.spark.streaming.kafka010._ 
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent 
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe 
import org.apache.spark._ 
import org.apache.spark.streaming._ 
import org.apache.spark.streaming.dstream.DStream 

val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "mykafka01.example.com:9092", 
    "key.deserializer" -> classOf[StringDeserializer], 
    "value.deserializer" -> classOf[StringDeserializer], 
    "group.id" -> "mykafka", 
    "auto.offset.reset" -> "latest", 
    "enable.auto.commit" -> (false: java.lang.Boolean) 
) 

val topics = Array("mytopic") 

def createKafkaStream(ssc: StreamingContext, topics: Array[String], kafkaParams: Map[String,Object]) : DStream[(String, String)] = { 
    KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams)) 
} 

def messageConsumer(): StreamingContext = { 
    val ssc = new StreamingContext(SparkContext.getOrCreate(), Seconds(10)) 

    createKafkaStream(ssc, topics, kafkaParams).foreachRDD(rdd => {
        rdd.collect().foreach { msg =>
            try {
                println("Received message: " + msg._2)
            } catch {
                case e @ (_: Exception | _: Error | _: Throwable) =>
                    println("Exception: " + e.getMessage)
                    e.printStackTrace()
            }
        }
    })

    ssc 
} 

val ssc = StreamingContext.getActiveOrCreate(messageConsumer) 
ssc.start() 
ssc.awaitTermination() 

When I run this, I get the following exception:

<console>:60: error: type mismatch; 
found : org.apache.spark.streaming.dstream.InputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]] 
required: org.apache.spark.streaming.dstream.DStream[(String, String)] 
        KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams)) 
                  ^

I've checked the Scala/API docs over and over, and this code looks like it should execute correctly. Any ideas where I'm going wrong?

Answer

Subscribe expects the topics parameter to be an Array[String], but you are passing a single String, per def createKafkaStream(ssc: StreamingContext, topics: String, .... Changing the parameter type to Array[String] (and adjusting the call site accordingly) will fix the problem.
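
For illustration, a minimal sketch of the change the answer suggests, assuming the question originally declared topics: String (the code above already shows the corrected Array[String] signature):

// As quoted in the answer, the earlier signature took a single topic name (hypothetical reconstruction):
// def createKafkaStream(ssc: StreamingContext, topics: String, kafkaParams: Map[String, Object]) = ...

// Suggested fix: accept the whole Array[String] and pass it straight to Subscribe
def createKafkaStream(ssc: StreamingContext, topics: Array[String], kafkaParams: Map[String, Object]) =
    KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

// ...and call it with the array declared earlier:
// createKafkaStream(ssc, topics, kafkaParams)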


@smeeb I don't have that version of the Kafka lib to try, but you can look at the overloaded 'createDirectStream' methods to see their return types and the parameters they take. It looks like the method you're calling returns 'DStream[ConsumerRecord[K,V]]' rather than the 'DStream[(K,V)]' you expect. Alternatively, if it's the only option, change your code to accept 'DStream[ConsumerRecord[K,V]]' and then map it to '(K,V)'. – khachik


Thanks again @khachik (+1) - when you say "*it looks like the method you're calling returns 'DStream[ConsumerRecord[K,V]]'*"... where do you see evidence of this? I looked at what I believe are the [correct javadocs](http://spark.apache.org/docs/2.0.1/api/java/org/apache/spark/streaming/kafka/KafkaUtils.html) and I don't see any overloaded 'createDirectStream' method that returns 'DStream[ConsumerRecord[K,V]]'. Any ideas? Thanks again!!! – smeeb


@smeeb See the [integration guide](https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html): the method you're calling returns a stream of 'ConsumerRecord's, and you should map over it to get the key/value pairs: 'stream.map(record => (record.key, record.value))'. The javadocs you posted seem to be for a different version. – khachik
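
Putting khachik's suggestion together, a minimal sketch of a createKafkaStream that satisfies the declared DStream[(String, String)] return type (a sketch following the 0-10 integration guide, not verified against this exact Spark version):

def createKafkaStream(ssc: StreamingContext, topics: Array[String], kafkaParams: Map[String, Object]): DStream[(String, String)] = {
    // In the 0-10 API, createDirectStream returns InputDStream[ConsumerRecord[String, String]],
    // so map each record down to a (key, value) tuple to match the declared return type.
    KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))
        .map(record => (record.key, record.value))
}

With that change, the rest of the question's code (including msg._2 in the foreachRDD loop) works unchanged, since each element of the stream is again a (key, value) tuple.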