2016-09-30 63 views
1

HBase的表我有讀HBase的表中的代碼,使其很好地格式化,然後將其轉換成數據幀:閱讀使用星火上齊柏林

import org.apache.spark._ 
import org.apache.spark.rdd.NewHadoopRDD 
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor} 
import org.apache.hadoop.hbase.client.HBaseAdmin 
import org.apache.hadoop.hbase.mapreduce.TableInputFormat 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.hbase.HColumnDescriptor 
import org.apache.hadoop.hbase.util.Bytes 
import org.apache.hadoop.hbase.client.Put; 
import org.apache.hadoop.hbase.client.HTable; 

val tableName = "my_table" 

val conf = HBaseConfiguration.create() 
// Add local HBase conf 
conf.addResource(new Path("file:///opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/etc/hbase/conf.dist/hbase-site.xml")) 
conf.set(TableInputFormat.INPUT_TABLE, tableName) 

val admin = new HBaseAdmin(conf) 

admin.isTableAvailable(tableName) 

val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], 
     classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], 
     classOf[org.apache.hadoop.hbase.client.Result]) 


case class MyClass(srcid: Long, srcLat: Double, srcLong: Double, dstid: Long, dstLat: Double, dstLong: Double, time: Int, duration: Integer) 

val parsed = hBaseRDD.map{ case(b, a) => val iter = a.list().iterator(); 
      (Bytes.toString(a.getRow()).toLong, 
      Bytes.toString(iter.next().getValue()).toDouble, 
      Bytes.toString(iter.next().getValue()).toDouble, 
      Bytes.toString(iter.next().getValue()).toLong, 
      Bytes.toString(iter.next().getValue()).toDouble, 
      Bytes.toString(iter.next().getValue()).toDouble, 
      Bytes.toString(iter.next().getValue()).toInt, 
      Bytes.toString(iter.next().getValue()) 
)}.map{ s => 
         val time = s._8.replaceAll("T", "") 
         val time2 = time.replaceAll("\\+03:00", "") 
         val format = new java.text.SimpleDateFormat("yyyy-MM-ddHH:mm:ss.SSS") 
         val date = format.parse(time2) 
         MyClass(s._1, 
         s._5, 
         s._6, 
         s._4, 
         s._2, 
         s._3, 
         date.getHours(), 
         //s(6), 
         s._7) }.toDF() 
parsed.registerTempTable("my_table") 

此代碼工作得很好火花殼。但是我想在齊柏林筆記本中使用它。我期待這樣的工作在paragrah上。然而,當我運行的代碼,它輸出的import語句以下錯誤:

<console>:28: error: object hbase is not a member of package org.apache.hadoop 
     import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor} 

我是否需要添加依賴於齊柏林使用HBase的星火。如果是這樣,我該怎麼做?在文檔描述

回答